Pauline Jones Gandrud Papers

From UA Libraries Digital Services Planning and Documentation
Revision as of 12:59, 6 January 2014


Organizational Differences

Terminology

Box = physical box you’re working from

  • Example: 4043b

Run = all the boxes with the same 4-digit number

  • Example: Run 4043 = Box 4043a, 4043b, and 4043c

File Naming

1. Item number = last 3 digits of run number + four-digit sequential number

2. The sequential numbering applies to the whole run, not just a single box.

  • Example: Box 4043a, the first to be digitized for the 4043 run, has 15 items. Their filenames look like this: u0003_0000555_0430001, u0003_0000555_0430002 … u0003_0000555_0430015. Box 4043b, the second to be digitized for the run, has 13 items in it. Their filenames look like this: u0003_0000555_0430016, u0003_0000555_0430017 … u0003_0000555_0430028.
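The naming rule can be written as a small function. This is a minimal Python sketch, not one of the project's scripts. It assumes the collection prefix u0003_0000555 from the examples above, and `item_filename` is a hypothetical helper name. The box letter plays no part in the name; only the run number and the position in the run-wide sequence do.

```python
def item_filename(run: str, seq: int, prefix: str = "u0003_0000555") -> str:
    """Build an item filename such as u0003_0000555_0430001 (hypothetical helper).

    run    -- the 4-digit run number, e.g. "4043" (box letter is not used)
    seq    -- the item's position in the run-wide sequence, starting at 1
    prefix -- collection identifier, assumed from the examples on this page
    """
    if len(run) != 4 or not run.isdigit():
        raise ValueError(f"expected a 4-digit run number, got {run!r}")
    if not 1 <= seq <= 9999:
        raise ValueError("sequence number must fit in four digits (1-9999)")
    # last 3 digits of the run number + zero-padded 4-digit sequence number
    return f"{prefix}_{run[-3:]}{seq:04d}"


# Box 4043a holds items 1-15 of the run; box 4043b continues at 16.
print(item_filename("4043", 1))   # u0003_0000555_0430001
print(item_filename("4043", 16))  # u0003_0000555_0430016
print(item_filename("4043", 28))  # u0003_0000555_0430028
```

Because the sequence continues across boxes, whoever starts box 4043b needs the last number used in 4043a, which is why the tracking spreadsheet records it.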

Box Assignment

1. There should never be either (a) multiple people working on a single box at once or (b) one or more people working on multiple boxes in the same run at once. This is so we can easily keep to the filenaming system -- a system which is designed to keep us from repeating filenames without having to count the number of items in a box before scanning.

  • Example: You’re working on 4043b...
    • No one else should be working in 4043b.
    • No one – including you – should be working in 4043a or 4043c either.

2. If you can’t finish a box or run, your progress should be well documented so that someone else can take over.

  • What folder did you leave off with? If you're in the middle of one, leave a marker.
  • What filename did you leave off with?
  • Example: You stop working at the end of folder 4 in box 4043b...
    • In the collection tracking spreadsheet, on the line for box 4043b:
      • note folder 4 complete
      • note the last filename used
  • Example: You stop working in the middle of folder 5 in box 4043b...
    • Leave a marker in the folder -- use something archive-friendly, like one of the folder strips we typically have around or a strip of acid-free paper
    • In the collection tracking spreadsheet, on the line for box 4043b:
      • note folder 5 in progress
      • note the last filename used
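Whoever picks up the run continues from the last filename recorded. The sketch below shows that step in Python. It assumes the filename pattern shown in the File Naming examples (u + 4 digits, underscore, 7 digits, underscore, 7 digits), and `next_filename` is a hypothetical helper, not part of the project's tooling.

```python
import re

# Assumed pattern: prefix (e.g. u0003_0000555), then 3 run digits + 4 sequence digits.
FILENAME_RE = re.compile(r"^(?P<prefix>u\d{4}_\d{7})_(?P<run3>\d{3})(?P<seq>\d{4})$")


def next_filename(last_used: str) -> str:
    """Return the filename that follows last_used within the same run."""
    m = FILENAME_RE.match(last_used)
    if m is None:
        raise ValueError(f"unrecognized filename: {last_used!r}")
    seq = int(m.group("seq")) + 1
    if seq > 9999:
        raise ValueError("run has exhausted its four-digit sequence")
    return f"{m.group('prefix')}_{m.group('run3')}{seq:04d}"


# Box 4043a ended at item 15, so work in box 4043b resumes at item 16.
print(next_filename("u0003_0000555_0430015"))  # u0003_0000555_0430016
```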

Batching

A batch is a separate container for the items scanned by one person. On the Share drive, that container is a single Scans folder. Tying each batch to one person is the best way to keep people's scans separate for QC.

  1. Batches are created at the time of capture; they are not assigned after the fact, just before upload. After you upload a batch:
    1. assign the next available batch number
    2. create a spreadsheet for that batch, and
    3. begin logging your work there.
  2. Batch numbers are completely unrelated to the run/box number; they are assigned sequentially within the group. (This means your own batch numbers will probably not be sequential.)
  3. Each batch is tied to a particular staff member.
  4. Each batch gets its own Scans folder.
  5. A batch may contain items from multiple boxes.

Examples:

  • You are working in box 4043a, which is batch number 25, and you've reached 900 captures. Your current batch is full (see What is a batch?), so you need a new batch number -- not necessarily the next one. You look and see that the last batch number being used (by one of your coworkers) is 28. You create batch 29 (and a new Scans folder) for your continuing work in box 4043a.
  • You took over the 6832 box number from someone else, and they had only done a few items toward batch 17. You do not continue batch 17. It can't be your batch because they're not your scans. You begin a new batch number (the next available) and a new Scans folder.
  • You reach the end of box 3600c, but your current batch, batch 55, only has a few items in it. When you begin work on box 3601a (or any other box!), you continue in batch 55. It's your batch number, and it's not full yet, so you keep it.
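The three examples follow one rule: keep your own open batch while it has room, and otherwise take the next number after the highest one anyone in the group has used. The following is a minimal Python sketch of that rule. The `Batch` record, the `batch_for` function, and the 900-capture capacity (taken from the first example, not from an official limit) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    number: int
    staff: str              # staff initials, e.g. "AD" (hypothetical field)
    captures: int           # captures logged so far
    uploaded: bool = False  # once uploaded, a batch is closed


def batch_for(staff: str, batches: list[Batch], capacity: int = 900) -> int:
    """Return the batch number `staff` should use for the next capture.

    A person keeps their own open, not-yet-full batch regardless of which box
    they are in; otherwise they take the next available number in the group.
    """
    mine = [b for b in batches
            if b.staff == staff and not b.uploaded and b.captures < capacity]
    if mine:
        return max(mine, key=lambda b: b.number).number
    return max((b.number for b in batches), default=0) + 1


group = [Batch(25, "AD", 900), Batch(27, "KM", 310), Batch(28, "JS", 40)]
print(batch_for("AD", group))  # 29: batch 25 is full, 28 is the highest in use
print(batch_for("KM", group))  # 27: still open and not full, so KM keeps it
```

Note that a coworker's open batch (like batch 28 above) is never returned for someone else, which matches the run 6832 example.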

Documentation Differences

Setup

At the start of work on a new batch, open the log file template, found in the collection’s admin folder, and save it with the next available batch number, named in this format:

  • [collnum].[batchnum].log.xlsx
  • Example: u0003_0000555.2.log.xlsx
  • The spreadsheet fields should be self-explanatory, except for Original ID. This field is for photographs only and holds the number that MAY be penciled in on the photograph (in this format: ###.####). If there is no such number, leave the field blank.

Then create a new Scans folder, labeled in this format:

  • Scans_[batchnum][staff initials]
  • Example: Austin, batch 22 = Scans_22AD
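The setup names can be generated rather than typed. This Python sketch assumes the collection number u0003_0000555 from the examples; `log_filename`, `scans_folder`, and `valid_original_id` are hypothetical helpers, and the Original ID check simply encodes the ###.#### format described above.

```python
import re

COLLECTION = "u0003_0000555"  # collection number assumed from the examples


def log_filename(batch: int, ext: str = "xlsx", coll: str = COLLECTION) -> str:
    """[collnum].[batchnum].log.[ext], e.g. u0003_0000555.2.log.xlsx"""
    return f"{coll}.{batch}.log.{ext}"


def scans_folder(batch: int, initials: str) -> str:
    """Scans_[batchnum][staff initials], e.g. Scans_22AD"""
    return f"Scans_{batch}{initials.upper()}"


def valid_original_id(value: str) -> bool:
    """Original ID is optional; when present it should look like ###.####."""
    return value == "" or re.fullmatch(r"\d{3}\.\d{4}", value) is not None


print(log_filename(2))          # u0003_0000555.2.log.xlsx
print(scans_folder(22, "AD"))   # Scans_22AD
```

The same `log_filename(2, "txt")` call gives the name of the text export described under Prep for Upload.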

Prep for Upload

The Excel version of the log file is for internal use only. We are not turning it over to the Metadata Unit. It houses the data during capture, to be exported as a .txt file, named in the same format as the Excel file:

  • Example: u0003_0000555.2.log.txt

Additional Internal Tracking

Please use the dry erase board in 215 to indicate which box you're working on!

The in-progress collection Admin folder contains a document called u0003_0000555.tracking.xlsx. Use it to record the following for each box:

  • Status
    • in progress: used when you're actively working on a box; if it’s helpful, use the notes column to mark what folder you’re on (although this can definitely be done physically with a paper marker instead!)
    • complete: used to note that capture is complete for a box (not necessarily that it's been uploaded)
    • incomplete: used if you haven’t completed a box and can’t do so for the foreseeable future, such as when the archivists request a return of a box before you’re done with it
  • Batch -- batch the box was uploaded with
  • Staff Member -- who is working on the box or last worked on it
  • Left off at -- if you stop in the middle of a box, indicate
    • what folder is next, OR
    • what folder is in progress (please leave a marker in the folder itself)
  • Last filename used -- for the most recently captured box in the run
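One way to think of a tracking row is as a small record with a handoff check. This is a Python sketch under assumptions. The field names mirror the columns above, but `TrackingRow`, `Status`, and `handoff_problems` are hypothetical, and the check only covers the two things the Box Assignment section asks for when you can't finish a box: where you left off and the last filename used.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"      # capture done; not necessarily uploaded
    INCOMPLETE = "incomplete"  # stopped for the foreseeable future


@dataclass
class TrackingRow:
    box: str                     # e.g. "4043b"
    status: Status
    staff_member: str            # who is working on it or last worked on it
    batch: Optional[int] = None  # batch the box was uploaded with
    left_off_at: str = ""        # e.g. "folder 5 in progress"
    last_filename: str = ""      # e.g. "u0003_0000555_0430028"

    def handoff_problems(self) -> list[str]:
        """List what is still missing before someone else could pick up this box."""
        if self.status is Status.COMPLETE:
            return []
        issues = []
        if not self.left_off_at:
            issues.append(f"{self.box}: note which folder is next or in progress")
        if not self.last_filename:
            issues.append(f"{self.box}: note the last filename used")
        return issues


row = TrackingRow("4043b", Status.IN_PROGRESS, "AD", left_off_at="folder 5 in progress")
print(row.handoff_problems())  # ['4043b: note the last filename used']
```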


Capture Differences

Workflow

We receive no item-level metadata from the archivists for this collection, so there is nothing to check your capture against. In this case, it is our responsibility to look at the material as we work and record data about it and our processing accurately.

  • survey the contents of a folder before doing any scanning on it
  • leave a marker if you have to stop in the middle of a folder (use a strip of archival paper or an archival folder strip)
  • make sure you don’t skip anything -- it's easier to do than you’d think, since there’s no metadata to guide you as you scan or help you discover that you’ve skipped something afterwards

What Is An Item?

Because of the lack of metadata to guide us, we will sometimes have to do some interpretation to determine what constitutes an item and how to arrange it.

Some of the time, folder and item arrangement get a little wonky, mainly because this collection is used frequently by patrons. Don’t be afraid of using your own judgment. You might see

  • Pages presented on their back sides, not their front (making a pre-capture survey of the folder all the more important)
  • Pages within an item out of order
  • Pages from an item separated from each other


According to what April told us, these are good questions to consider when establishing what an item is:

  1. Is it clipped together with a plastic clip? This means the material used to be fastened together and should be considered as a single item, even if it seems strange. Plastic clips trump all other considerations.
  2. Does it look like it used to be fastened together? If you can see a common staple mark or paperclip mark, it's probably a single item.
  3. Is it a letter with enclosures? Skim the end of the letter to see if the word "enclosed" or "enclosure" is used -- that probably means what comes next belongs with that letter.
  4. Does it appear to be the same document, content-wise? Things to look for:
    1. Numbered pages
    2. Headings
    3. Dates
    4. Continuation of text from one page to next
    5. Bottom of page has "over" or "next," indicating further pages
  5. Does it physically look like the same document? Things to look for:
    1. Stationery
    2. Paper
    3. Dogears or other common folds
    4. Ink
    5. Handwriting

Page Ordering

Contrary to our usual policy of capturing and presenting items the way we find them, page by page, this collection sometimes requires different ways of presenting pages within items.

Examples:

  • A typed letter is on the fronts of two pieces of paper, and handwritten notes are on the backs of both. Rather than scan the pages in the order found (letter, notes, letter, notes), keep the "subitems" together (letter, letter, notes, notes) while still treating them as a single item.
  • On the back of three separate letters (over six pages) is another document, a numbered series of typed pages, but its ordering doesn't correspond to the ordering of the letter pages. It begins on the back of the second page of letter two, runs over the letter one pages, then the letter three pages, then ends on the first page of letter two. In this case, it would be least confusing to treat them as four separate items, three letters and one typed document.
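The first example amounts to a stable regrouping: pages are gathered by subitem, and each subitem keeps its pages in the order they were found. The short Python sketch below shows only that regrouping. The page labels are hypothetical, and deciding what counts as a subitem (or, as in the second example, a separate item) remains a judgment call.

```python
def order_subitems(pages: list[tuple[str, str]]) -> list[str]:
    """Group pages by subitem, keeping each subitem's pages in found order.

    pages -- (subitem label, page id) pairs in the order the sides were found
    """
    groups: dict[str, list[str]] = {}
    for subitem, page in pages:
        groups.setdefault(subitem, []).append(page)  # dicts keep insertion order
    return [page for group in groups.values() for page in group]


found = [("letter", "sheet1-front"), ("notes", "sheet1-back"),
         ("letter", "sheet2-front"), ("notes", "sheet2-back")]
print(order_subitems(found))
# ['sheet1-front', 'sheet2-front', 'sheet1-back', 'sheet2-back']
```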

Photos

Images are mixed in with the documents in this collection. They should not be separated out, in terms of filenaming. They are u0003 (like the rest of the collection), NOT u0001. If they have a penciled-in photo number, make sure to include that in the spreadsheet column "Original ID."

Photocopies

Most of the photocopied material in the collection should be scanned. Since Mrs. Gandrud was a researcher, she sometimes made copies of things for her clients and her own files. These copies, like her notes, are part of her archive.

Condition of Items

Most of the collection is notes and letters from the mid-20th century, so it’s not particularly fragile. But be on the lookout for older documents and treat them carefully. You will find photographs mixed in with the papers in this collection. They should be treated as part of the manuscript collection, captured with the care we usually give images (use gloves), and given u0003 filenames within the sequence as they’re found.

Preparation and Upload Differences

Quality Control

For the QC script, see Scripts below.


Eyeball check

  • Missing page? Let the person who captured the item know so he or she can check the materials to see whether it's really missing.
  • You think the materials could've been divided into items better? Point it out for future reference, but unless it's drastic or can be fixed easily (i.e., it does not involve inserting or deleting item numbers), this will probably not be changed.
  • Image quality and orientation/alignment/cropping should be the same as for other collections.


Folder check

  • Admin
    • should contain -- .xml file (example: u0003_0000555.xml)
    • should contain -- .log.txt file (data that matches filename to box/folder + tracking data)(example: u0003_0000555.9.log.txt)
    • will not contain -- .m01.txt file (there is no metadata spreadsheet for this collection; minimal metadata is incorporated with the log file)
  • Metadata
    • may contain -- (1) the .xlsx version of the log file and/or (2) a copy of the .xml EAD -- both are necessary in the preparation phase but should be deleted before upload
  • Scans
    • should contain -- singletons + folders for multiple-file items, as with a normal collection
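The folder check can be partly automated. The following Python sketch is an illustration, not the massContentCheck script. It assumes the batch folder contains subfolders literally named Admin, Metadata, and Scans, and it encodes only the Admin and Metadata rules listed above. `check_batch_folder` is a hypothetical name.

```python
from pathlib import Path


def check_batch_folder(root: Path, coll: str, batch: int) -> list[str]:
    """Return problems found in a batch folder laid out as Admin/Metadata/Scans."""
    problems = []
    admin, metadata, scans = root / "Admin", root / "Metadata", root / "Scans"
    for d in (admin, metadata, scans):
        if not d.is_dir():
            problems.append(f"missing folder: {d.name}")
    # Admin should hold the .xml file and the text export of the log
    if not (admin / f"{coll}.xml").is_file():
        problems.append(f"Admin: missing {coll}.xml")
    if not (admin / f"{coll}.{batch}.log.txt").is_file():
        problems.append(f"Admin: missing {coll}.{batch}.log.txt")
    # there is no metadata spreadsheet for this collection
    if admin.is_dir():
        for p in admin.glob("*.m01.txt"):
            problems.append(f"Admin: unexpected {p.name}")
    # the .xlsx log and the local EAD copy must be deleted before upload
    if metadata.is_dir():
        for p in metadata.iterdir():
            if p.suffix in (".xlsx", ".xml"):
                problems.append(f"Metadata: delete {p.name} before upload")
    return problems
```

For example, `check_batch_folder(Path("Complete"), "u0003_0000555", 9)` would list anything missing or left over for batch 9.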


Scripts

Gandrud mass content has a somewhat different upload process and (mostly) different scripts than regular content.

For instructions on the process, see Uploading Gandrud Mass Content or consult the document DOTHIS.txt on the server at /home/ds/MassContent. This is just an overview.


Prep

Found on Share drive: Administrative\scripts\qc

  • massContentCheck
    • Equivalent to filenamesAndDupes and ScrapbookCheck
    • You can run this script only on the Complete folder. Please include only one batch at a time. The folder should be set up as noted above (under Quality Control)
    • After the script runs without errors, delete the Excel version of the log file (but KEEP the TEXT EXPORT) and the local copy of the EAD. The QC script needs that local copy for reference, while the upload scripts use the updated version on the server.
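As a rough idea of the "filenames and dupes" part of that check, the Python sketch below flags item names that don't match the collection's pattern and names that appear more than once. It is an assumption-based illustration, not the real script: `filename_problems` is hypothetical, and it expects item names with extensions already stripped (file stems for singletons, folder names for multiple-file items).

```python
import re
from collections import Counter

# Assumed pattern from the File Naming examples, e.g. u0003_0000555_0430001
ITEM_RE = re.compile(r"^u\d{4}_\d{7}_\d{7}$")


def filename_problems(item_names: list[str]) -> list[str]:
    """Flag malformed and duplicated item names (extensions already stripped)."""
    problems = [f"bad name: {n}" for n in item_names if not ITEM_RE.match(n)]
    counts = Counter(item_names)
    problems += [f"duplicate: {n}" for n, c in sorted(counts.items()) if c > 1]
    return problems


print(filename_problems(["u0003_0000555_0430001", "u0003_0000555_0430001"]))
# ['duplicate: u0003_0000555_0430001']
```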

Upload

Found libcontent server: MassContent/scripts

  • makeMassJpegs
    • Equivalent to part of makeJpegs
    • If all goes well, makeMassJpegs will move the log file from matchFile\ to \backups
  • linkContent
    • Creates MODS
    • Will be run over multiple batches at the same time
  • makeMassLive
    • Equivalent to relocate_all
    • Will be run over multiple batches at the same time
  • eadLive
    • updates the EAD online
    • Do NOT run this script until the content has been indexed (usually the next day).

Archiving

Found libcontent server: MassContent/scripts

  • moveMassContent
    • Equivalent to moveContent
    • Can be run before eadLive

Multiple Batches

Multiple batches MUST be processed through uploads (1) at the exact same time OR (2) completely separately.

If not uploading multiple batches at the same time, make sure any given batch has made it all the way through the process, including eadLive, before beginning the next batch. This is so that the EAD can be updated to incorporate already-indexed material without picking up changes related to material that hasn't been indexed yet, creating dead links.

Notes

  • Gandrud batches 1-3 were processed in the regular way, not as mass digitized content.
  • Many of these workflows and scripts were based on or derived from work done on the Cabaniss mass digitization project. The remainder were tailored to this collection by Jody DeRidder (scripts) and Kate Matheny (workflow).