Linking out from EADS

From UA Libraries Digital Services Planning and Documentation
Revision as of 14:13, 6 February 2014

Currently, our Encoded Archival Description finding aids (EADs; see http://www.loc.gov/ead/ ) are created in Archivists' Toolkit (AT; see http://www.archiviststoolkit.org/ ). In early 2014, we expect to shift to ArchivesSpace.

We had originally set up capabilities for archivists to add links to digitized content into the EADs for small collections (see the instructions for Obtaining PURLs for online content), but they did not have the time to do so.

My first round of automation matched up the box and folder as entered in the MODS to the box and folder as entered in the EAD, normalizing these values as much as possible, and NOT linking anything into the EAD if the box/folder entry appeared more than once. Items were entered in the order digitized, as determined by the item number. After some tinkering, I pulled the titles from the MODS and added them to the EADs.
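The first-round matching described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the normalization rules, the function names, and the data shapes (tuples for EAD entries, dicts for MODS items) are all assumptions, and it assumes item identifiers are zero-padded so that sorting them gives digitization order.

```python
import re
from collections import Counter

def normalize(value):
    """Reduce a box or folder label to a comparable form, e.g. 'Box 03' -> '3'."""
    text = re.sub(r"^\s*(box|folder)\s*", "", value.strip().lower())
    text = text.rstrip(".")
    # Drop leading zeros from a numeric prefix but keep suffixes such as '3a'.
    return re.sub(r"^0+(?=\d)", "", text)

def box_folder_key(box, folder):
    """Build the normalized (box, folder) pair used for comparison."""
    return (normalize(box), normalize(folder))

def match_items(ead_entries, mods_items):
    """Map each unambiguous EAD box/folder key to its items, in identifier order.

    ead_entries: list of (box, folder) pairs as they appear in the EAD.
    mods_items: list of dicts with 'id', 'box', 'folder', and 'title' keys
                (a hypothetical shape for values pulled from the MODS).
    """
    counts = Counter(box_folder_key(b, f) for b, f in ead_entries)
    links = {}
    for item in sorted(mods_items, key=lambda i: i["id"]):
        key = box_folder_key(item["box"], item["folder"])
        # Link only when the EAD lists this box/folder exactly once.
        if counts.get(key) == 1:
            links.setdefault(key, []).append((item["id"], item["title"]))
    return links
```

Skipping any box/folder that appears more than once in the EAD keeps the script from guessing which of several entries an item belongs to.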

My second round of automation was for mass digitization, where we created stub MODS for the items automagically, and entered these into the EADs originally as Item 1, Item 2, etc., in the correct box/folder as indicated in the item identifier. When we then moved to digitizing larger collections (where the box and folder numbers could not fit into the item number), I modified this to pull the information from a simple tab-delimited spreadsheet export that indicated the box/folder value for each item.
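Reading such a tab-delimited export might look something like the sketch below. The column layout (item identifier, box, folder, with no header row) and the sample identifiers are hypothetical; the real export format is not documented here.

```python
import csv
import io

def read_box_folder_export(handle):
    """Return {item_id: (box, folder)} from a tab-delimited export.

    Assumed column layout (hypothetical): item identifier, box, folder,
    with no header row.
    """
    locations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or incomplete lines
        item_id, box, folder = (cell.strip() for cell in row[:3])
        locations[item_id] = (box, folder)
    return locations

# Made-up sample data standing in for a spreadsheet export.
sample = io.StringIO("coll_0001\t1\t4\ncoll_0002\t1\t5\n\ncoll_0003\t2\t1\n")
locations = read_box_folder_export(sample)
```

With a real file, the same function could be called on a handle from `open(path, newline="", encoding="utf-8")`.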

My third round of automation was for item-level described image collections, where the itemID value in the EAD contained something I could (usually) map to our item identifiers. For some collections, this requires "match" files (tab-delimited spreadsheet exports to match up original photo id with the assigned digital identifier).
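As a rough illustration of how a "match" file could be applied, the sketch below looks up each EAD itemID value in a table of (original photo id, digital identifier) pairs. The function names, the whitespace and case cleanup, and the identifier format are assumptions made for the example.

```python
def clean(label):
    """Collapse whitespace and case so 'Photo  12A' and 'photo 12a' compare equal."""
    return " ".join(label.split()).lower()

def resolve_item_ids(ead_item_ids, match_rows):
    """Map EAD itemID values to digital identifiers via match-file rows.

    match_rows: iterable of (original photo id, digital identifier) pairs,
    as they might be read from a tab-delimited match file.
    Returns (resolved, unresolved).
    """
    table = {clean(original): digital for original, digital in match_rows}
    resolved, unresolved = {}, []
    for raw in ead_item_ids:
        digital = table.get(clean(raw))
        if digital is None:
            unresolved.append(raw)  # left unlinked rather than guessed
        else:
            resolved[raw] = digital
    return resolved, unresolved
```

Returning the unresolved values separately makes it easy to report which entries still need a match file row.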

Then the archivists began adding item-level image entries (from a different collection number, as represented in our online database) into manuscript EADs that also had box/folder entries for the manuscript items.

They also became hard pressed to produce multiple EADs, with little time to spend on any one of them, and no time to flesh out older ones they had begun or that had never been processed even to the series level.

At this point, the linking process had become quite complex, and was taking several hours each month. So I combined all the above processes, and expanded our linking software to do several things:

  1. add links based on itemId into image EADs and manuscript EADs (regardless of digital collection number)
  2. add links based on box & folder value metadata matches, in order of digitization, pulling titles from the MODS
  3. if the box/folder value is linked multiple times, add a new box/folder entry following the last one, and insert the links there (thus not assuming that the title matches the entry)
  4. add boxes and folders that follow the ones in the EAD (assumes that processing started at the beginning of the collection, and/or boxes), and link in the content
  5. add boxes and folders to EADs with none, and link in the content
  6. add links for content to EADs where neither EAD nor MODS have box/folder entries
  7. transform itemID values indicating numerical sequence into potential item identifiers, check for a match, and link if available (example: "Item 2" or "Artifact 12")
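The last step in the list above, turning a value such as "Item 2" into a potential item identifier and linking only if it exists, could be sketched like this. The identifier format used here (a collection prefix, an underscore, and a zero-padded number) is purely hypothetical, as are the function names.

```python
import re

# Matches a single word followed by a number, e.g. "Item 2" or "Artifact 12".
SEQUENCE_PATTERN = re.compile(r"^\s*[A-Za-z]+\s+(\d+)\s*$")

def candidate_identifier(item_id_value, collection_prefix, width=7):
    """Turn 'Item 2' into a candidate such as '<prefix>_0000002'.

    The '<prefix>_<zero-padded number>' format is a made-up stand-in
    for whatever scheme the real item identifiers follow.
    """
    match = SEQUENCE_PATTERN.match(item_id_value)
    if match is None:
        return None
    return f"{collection_prefix}_{int(match.group(1)):0{width}d}"

def link_if_available(item_id_value, collection_prefix, known_identifiers):
    """Return the candidate identifier only if it is a known digitized item."""
    candidate = candidate_identifier(item_id_value, collection_prefix)
    return candidate if candidate in known_identifiers else None
```

Checking the candidate against the set of known identifiers before linking means a sequence number with no digitized counterpart simply stays unlinked.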

This still fails to link in:

  1. content from folders & boxes that precede those listed in the EADs
  2. content for which we cannot determine the correct identifier from the itemId in the EAD
  3. content for which we cannot match the box/folder entry in the MODS to what's in the EAD
  4. content for which we cannot parse the box/folder entry, or for which there is none in the MODS, but there are box/folder values in the EAD

Still, these changes have enabled me to link over 9500 more items into the finding aids. And combining the multiple linking scripts and workflows has enabled me to automate the process, saving me several hours per month.

We now have well over 68,000 digitized items linked into well over 300 collection EADs.

Here are specifics for what those tags may look like: Scripted Links in EADs

Examples of these can be found in the Amelia Gayle & Josiah Gorgas collection, UA Photos, and the Septimus Cabaniss Collection.


More info about this process can be found in EADs.

Diagram of the automated linking process, as of 17 January 2014:

LinkingProcess 20140131.png

Jlderidder (talk) 09:24, 19 December 2013 (CST)
