Pauline Jones Gandrud Papers

From UA Libraries Digital Services Planning and Documentation
Revision as of 11:15, 28 May 2013 by Kgmatheny (talk | contribs) (Scripts)

Organizational Differences


Box = physical box you’re working from

  • Example: 4043b

Run = all the boxes with the same 4-digit number

  • Example: Run 4043 = Box 4043a, 4043b, and 4043c

File Naming

1. Item number = last 3 digits of run number + four-digit sequential number

2. The sequential numbering applies to the whole run, not just a single box.

  • Example: Box 4043a, the first to be digitized for the 4043 run, has 15 items. Their filenames look like this: u0003_0000555_0430001, u0003_0000555_0430002 … u0003_0000555_0430015. Box 4043b, the second to be digitized for the run, has 13 items in it. Their filenames look like this: u0003_0000555_0430016, u0003_0000555_0430017 … u0003_0000555_0430028.
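The naming rule above can be sketched in a few lines of Python. This is only an illustration, not a script used in the workflow: the function names `item_filename` and `box_filenames` are hypothetical, and the `u0003_0000555` prefix is simply copied from the examples above.

```python
COLLECTION_PREFIX = "u0003_0000555"  # copied from the examples above


def item_filename(run: str, sequence: int) -> str:
    """Build a filename from a 4-digit run number and a sequence number within that run."""
    if len(run) != 4 or not run.isdigit():
        raise ValueError("run must be a 4-digit number such as '4043'")
    if not 1 <= sequence <= 9999:
        raise ValueError("sequence must fit in four digits")
    # Item number = last 3 digits of the run + four-digit sequential number
    return f"{COLLECTION_PREFIX}_{run[-3:]}{sequence:04d}"


def box_filenames(run: str, first_sequence: int, item_count: int) -> list[str]:
    """Filenames for one box, continuing the run's numbering from first_sequence."""
    return [item_filename(run, first_sequence + i) for i in range(item_count)]


# Box 4043a: 15 items starting the run; box 4043b: 13 items continuing it
box_a = box_filenames("4043", 1, 15)
box_b = box_filenames("4043", 16, 13)
print(box_a[0], box_a[-1])
print(box_b[0], box_b[-1])
```

The point to notice is that the second box starts at 16 rather than 1, because the sequence belongs to the run and not to the box.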

Box/Run Assignment

1. There should never be (1) multiple people working on a single box at once or (2) one or more people working on multiple boxes in the same run at once

  • Example: If you’re working on 4043b…
    • No one else should be working in 4043b.
    • No one – including you – should be working in 4043a or 4043c either.

2. If you can’t finish a box or run, it should be completely turned over to someone else.

  • Example: If you can’t complete 4043b, find someone else to take responsibility for that box (and the remainder of the 4043 run). That person finishes box 4043b and moves on to 4043c. If you finish 4043b but can’t continue the run, find someone else to take responsibility for the 4043 run. That person moves on to 4043c.

3. The person of record for a run should be listed on the white board.


1. Batches are created before capture begins, not assigned after the fact, prior to upload. After you upload a batch, then (1) assign a new batch number, (2) create a spreadsheet for that batch, and (3) begin logging your work there.

2. Batch numbers are sequential, completely unrelated to the run/box number.

  • Example: You are working in box 4043a, which is batch number 25, and you have hit 300 items. It’s time for a new batch, and the last batch number being used (by one of your coworkers) is 28. You create batch 29 for your continuing work in box 4043a. You reach the end of that box, but your batch has only 72 items in it. You continue batch 29 with the next box you begin work on, even if it’s not in the 4043 run.

3. Each batch goes into its own Scans folder.

4. Each Scans folder is used by only one staff member, period.

  • Example: You took over the 6832 box number from someone else, and they had only done a few items toward batch 17. You do not continue batch 17. You begin a new batch number with a new Scans folder.

Documentation Differences


This collection may or may not have a collection info xml in the Admin folder. Don’t worry about making one if it doesn’t. It is not a new collection.

At the start of work on a new batch, open the match file template, found in the collection’s admin folder, and save it with the next available batch number, named in this format:

  • [collnum].[batchnum].match.xlsx
  • Example: u0003_0000555.2.match.xlsx

Then create a new Scans folder, labeled in this format:

  • Scans_[batchnum][staff initials]
  • Example: Austin, batch 22 = Scans_22AD
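As a small sketch of the two naming formats above, the following Python helpers build the match file name and the Scans folder name. The helper names are hypothetical and not part of any existing script; the `extension` parameter is an added convenience, since the text export of the match file uses the same pattern with `.txt`.

```python
def match_file_name(collection: str, batch: int, extension: str = "xlsx") -> str:
    """[collnum].[batchnum].match.[extension], e.g. u0003_0000555.2.match.xlsx"""
    return f"{collection}.{batch}.match.{extension}"


def scans_folder_name(batch: int, initials: str) -> str:
    """Scans_[batchnum][staff initials], e.g. Scans_22AD"""
    return f"Scans_{batch}{initials.upper()}"


print(match_file_name("u0003_0000555", 2))
print(scans_folder_name(22, "AD"))
```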

Prep for Upload

There is no log file export. The tracking data stays with the match data, in the Metadata folder. The Excel version of the match file is for internal use only. We are not turning it over to the Metadata Unit. It houses the data during capture, to be exported as a .txt file, named in the same format as the Excel file:

  • Example: u0003_0000555.2.match.txt

Additional Internal Tracking

Please use the dry erase board in 215 to indicate which box you’re working on

The in-progress collection Admin folder contains a document called u0003_0000555.tracking.xlsx. Use it to indicate the staff member working on a box, and the box’s status:

  • in progress: used when you're actively working on a box; if it’s helpful, use the notes column to mark what folder you’re on (although this can definitely be done physically with a paper marker instead!)
  • complete: used to note that capture is complete for a box (not necessarily that it's been uploaded)
  • incomplete: used if you haven’t completed a box and can’t do so for the foreseeable future, such as when the archivists request a return of a box before you’re done with it; use the notes column to mark where you left off (at the folder or even item level)

Capture Differences


We receive no item-level metadata from the archivists for this collection, so there is nothing to check your capture against. In this case, it is our responsibility to look at the material as we work and record data about it and our processing accurately.

  • survey the contents of a folder before doing any scanning on it
  • leave a marker if you have to stop in the middle of a folder (use a strip of archival paper or archival folder)
  • make sure you don’t skip anything -- it's easier to do than you’d think, since there’s no metadata to guide you as you scan or help you discover that you’ve skipped something afterwards

What Is An Item?

Because of the lack of metadata to guide us, we will sometimes have to do some interpretation to determine what constitutes an item and how to arrange it. Take these guidelines into consideration:

  1. Frequently, the pages of an item aren’t fastened together. Don’t be afraid to make judgment calls. Three pages in a row in the same ink, the first with a salutation and the last with a signature? Probably a single letter. Other things to look for as a guide:
    1. Numbered pages
    2. Same heading on each page (often with “cont” on the second and subsequent pages)
    3. Same date on each page
    4. Sentence makes sense when begun on one page and ended on the next
  2. Sometimes, a page of notes or a chart accompanies a letter. Skim the end of the letter to see if the word “enclosed” or “enclosure” is used -- that probably means what comes next belongs with that letter.
  3. Err on the side of leaving a page as its own item if you’re not sure where it goes. Since it’s presented within the context of a particular folder, and since folks will probably be looking at things from a whole folder at once, it’s okay if some pages show up as items.
  4. Some of the time, folder and item arrangement get a little wonky, mainly because this collection is used frequently by patrons. Once again, don’t be afraid of using your own judgment. You might see
    1. Pages presented on their back sides, not their front (making a pre-capture survey of the folder all the more important)
    2. Pages within an item out of order
    3. Pages from an item separated from each other


Most of the photocopied material in the collection should be scanned. Since Mrs. Gandrud was a researcher, she sometimes made copies of things for her clients and her own files. These copies, like her notes, are part of her archive.

Condition of Items

Most of the collection is notes and letters from the mid-20th century, so it’s not particularly fragile. But be on the lookout for older documents and treat them carefully. You will find photographs mixed in with the papers in this collection. They should be treated as part of the manuscript collection, captured with the care we usually give images (use gloves), and given u0003 filenames within the sequence as they’re found.

Preparation and Upload Differences

Quality Control

For scripts, see below

For eyeball (visual) checks:

  • Missing page? Let the capturing person know so he or she can look at the materials to see if it's really missing
  • You think the materials could've been divided into items better? Point it out for future reference, but unless it's drastic or can be fixed easily (i.e., it does not involve inserting or deleting item numbers), this will probably not be changed
  • Image quality and orientation/alignment/cropping should be the same as for other collections.


Gandrud mass content has a somewhat different upload process and (mostly) different scripts than regular content. For instructions on the process, see the document DOTHIS.txt on the server at /home/ds/MassContent.


Found on Share drive: Administrative\scripts\qc

  • massContentCheck (quality control)
    • must have two pieces of documentation in the Metadata folder before running this script: (1) match file .txt export and (2) EAD
    • can be run ONLY on the Complete folder


Found on libcontent1 server: MassContent/scripts

  • makeMassJpegs (equivalent to part of makeJpegs)
  • linkContent (creates MODS)
  • makeMassLive (equivalent to relocate_all)
  • eadLive (updates the EAD -- do NOT run this script until content has been indexed, usually the next day)


  • moveMassContent (equivalent to moveContent)



You can run massContentCheck only from the Complete folder. Please include only one batch at a time. The folder should be set up as normal, and should contain

  • Admin: empty (no log file or xml needed)
  • Metadata: match text export, EAD
  • Scans: as normal
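Before running massContentCheck, the layout above could be sanity-checked with something like the following Python sketch. It is not one of the project's scripts, and it rests on assumptions the page does not confirm: that the EAD is an `.xml` file, that the match text export ends in `.match.txt`, and that the Scans folder name begins with `Scans`. The function name `check_batch_folder` is hypothetical.

```python
from pathlib import Path


def check_batch_folder(batch_dir: str) -> list[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    root = Path(batch_dir)
    if not root.is_dir():
        return [f"batch folder not found: {batch_dir}"]

    problems = []
    for name in ("Admin", "Metadata"):
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")

    # Assumption: the Scans folder may be named "Scans" or "Scans_[batchnum][initials]"
    if not any(p.is_dir() and p.name.startswith("Scans") for p in root.iterdir()):
        problems.append("missing folder: Scans")

    metadata = root / "Metadata"
    if metadata.is_dir():
        if not list(metadata.glob("*.match.txt")):
            problems.append("Metadata has no match file text export (*.match.txt)")
        # Assumption: the EAD is stored as an .xml file; its name is not given on this page
        if not list(metadata.glob("*.xml")):
            problems.append("Metadata has no EAD (assumed *.xml)")
    return problems
```

Calling `check_batch_folder` on a batch folder and getting back an empty list suggests the two required pieces of documentation are in place; it says nothing about whether their contents are correct.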

After QC is done, please delete the Excel version of the match file (but KEEP the TEXT EXPORT) and the local copy of the EAD. The QC script needed the EAD for reference, but the upload scripts will use the updated version on the server.


If all goes well, makeMassJpegs will move the match file from \matchFile to \backups. linkContent and makeMassLive will be run over any batch/batches at the same time.

Multiple Batches

Multiple batches can be uploaded at the same time. If they instead need to be uploaded within a couple of days of each other, make sure batch A has made it all the way through the process, including eadLive, before beginning batch B. That way, the EAD can be updated to incorporate batch A after it is indexed without also picking up changes related to batch B, which would lead to dead links for batch B material.


  • Gandrud batches 1-3 were processed in the regular way, not as mass digitized content.
  • Many of these workflows and scripts were based on or derived from work done on the Cabaniss mass digitization project. The remainder were tailored to this collection, by Jody DeRidder (scripts) and Kate Matheny (workflow)