Linking out from EADs
Currently, our Encoded Archival Description finding aids (EADs; see http://www.loc.gov/ead/) are created in Archivists' Toolkit.
We originally set things up so that archivists could add links to digitized content to the EADs for small collections themselves (see the instructions for Obtaining PURLs for online content), but they did not have time to do so.
My first round of automation matched up the box and folder as entered in the MODS to the box and folder as entered in the EAD, normalizing these values as much as possible, and NOT linking anything into the EAD if the box/folder entry appeared more than once. Items were entered in the order digitized, as determined by the item number. After some tinkering, I pulled the titles from the MODS and added them to the EADs.
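As a rough illustration only (not the production script), the first-round matching could be sketched in Python as below. The function names, the tuple layouts, and the particular normalization rules are assumptions made for the example. It also assumes item identifiers are zero-padded so that sorting them reproduces the order of digitization.

```python
import re
from collections import Counter

def normalize(value):
    """Reduce a box or folder label to a comparable key, e.g. 'Box 03' -> '3'."""
    # Assumption: labels differ only in case, a leading "box"/"folder" word,
    # punctuation or spacing, and zero padding.
    text = re.sub(r"^(?:box|folder)", "", value.strip().lower())
    text = re.sub(r"[^a-z0-9]", "", text)
    return text.lstrip("0") or "0"

def match_items(ead_entries, mods_items):
    """Map each unambiguous EAD (box, folder) key to its items, in item-number order.

    ead_entries: list of (box, folder) labels as they appear in the EAD.
    mods_items:  list of (item_id, box, folder) tuples taken from the MODS.
    """
    counts = Counter((normalize(b), normalize(f)) for b, f in ead_entries)
    links = {}
    for item_id, box, folder in sorted(mods_items):
        key = (normalize(box), normalize(folder))
        if counts.get(key) == 1:  # skip entries that are missing or duplicated
            links.setdefault(key, []).append(item_id)
    return links
```

An entry that occurs twice in the EAD gets no links at all, which mirrors the cautious rule described above: it is safer to leave a gap than to attach items to the wrong folder.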
My second round of automation was for mass digitization, where we created stub MODS for the items automagically, and initially entered these into the EADs as Item 1, Item 2, etc., in the correct box/folder as indicated in the item identifier. When we then moved to digitizing larger collections (where the box and folder numbers would not fit into the item number), I modified this to pull the information from a simple tab-delimited spreadsheet export that indicated the box/folder value for each item.
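Reading such a spreadsheet export might look something like the sketch below. The column order (item identifier, box, folder), the absence of a header row, and the sample identifiers are assumptions for illustration. The real export may be laid out differently.

```python
import csv
import io

def read_locations(handle):
    """Read item_id<TAB>box<TAB>folder rows into {item_id: (box, folder)}."""
    locations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3 or not row[0].strip():
            continue  # skip blank or incomplete rows
        item_id, box, folder = (cell.strip() for cell in row[:3])
        locations[item_id] = (box, folder)
    return locations

# Hypothetical two-row export, held in memory instead of a file on disk.
sample = io.StringIO("item_0001\t1\t2\nitem_0002\t1\t3\n")
locations = read_locations(sample)
```

With a real export, the same function would be called on a file opened with `open(path, newline="", encoding="utf-8")`.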
My third round of automation was for item-level described image collections, where the itemID value in the EAD contained something I could (usually) map to our item identifiers. For some collections, this requires a Match file (tab-delimited spreadsheet exports to match up original photo id with the assigned digital identifier).
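One way to picture the itemID mapping is the sketch below, which tries the Match file first and then falls back to reading a sequence number out of labels such as "Item 2". Everything specific here is an assumption: the function names, the `coll_0001` prefix, the seven-digit padding, and the idea that the Match file has already been loaded into a dictionary.

```python
import re

def candidate_identifier(item_id_text, collection_prefix, width=7):
    """Turn a label like 'Item 2' into a zero-padded candidate identifier."""
    match = re.fullmatch(r"\s*[A-Za-z]+\s+(\d+)\s*", item_id_text)
    if match is None:
        return None
    # Hypothetical identifier layout: <collection prefix>_<padded number>.
    return f"{collection_prefix}_{int(match.group(1)):0{width}d}"

def resolve(item_id_text, known_ids, match_table, collection_prefix):
    """Return the digital identifier for an EAD itemID, or None if unresolved."""
    key = item_id_text.strip()
    if key in match_table:  # an explicit pairing from the Match file wins
        candidate = match_table[key]
    else:
        candidate = candidate_identifier(key, collection_prefix)
    # Only link when the candidate really exists among the digitized items.
    return candidate if candidate in known_ids else None
```

Checking the candidate against the set of known identifiers is what keeps a guessed identifier from producing a dead link.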
Then the archivists began adding item-level image entries (from a different collection number, as represented in our online database) into manuscript EADs that also had box/folder entries for the manuscript items. So I had to run the same EADs through multiple processes, every time they updated them.
They were also hard pressed to produce multiple EADs, with no real time to spend on any of them, and no time to flesh out older ones they had begun or had never processed even to the series level.
At this point, the linking process had become quite complex, and was taking several hours each month.
So I combined all the above processes, and expanded our linking software to do several things:
- add links based on itemId into image EADs and manuscript EADs (regardless of digital collection number)
- add links based on box and folder metadata matches, in order of digitization, pulling titles from the MODS
- if the box/folder value is linked multiple times, add a new box/folder entry after the last one and insert the links there (so it does not assume that the title matches the entry)
- add boxes and folders that follow the ones in the EAD (assuming that processing started at the beginning of the collection and/or of each box), and link in their content
- add boxes and folders to EADs that have none, and link in their content
- add links for content to EADs where neither the EAD nor the MODS has box/folder entries
- transform itemID values that indicate a numerical sequence (for example, "Item 2" or "Artifact 12") into potential item identifiers, check for a match, and link if one is available
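The box and folder rules in the list above (match an existing entry, add entries that follow the existing ones, or add entries to an EAD that has none) could be read as a three-way decision per item. The sketch below is one interpretation, not the actual software. It assumes box and folder values have already been parsed into integers, and the `classify` name and its return labels are made up for the example.

```python
def classify(ead_keys, item_key):
    """Decide how one item's (box, folder) pair relates to the EAD's entries.

    ead_keys: set of (box, folder) integer pairs already present in the EAD.
    item_key: the (box, folder) integer pair recorded for the item.
    """
    if item_key in ead_keys:
        return "match"   # link under the existing entry
    if not ead_keys:
        return "append"  # the EAD has no boxes or folders yet
    box, folder = item_key
    folders_in_box = [f for b, f in ead_keys if b == box]
    if folders_in_box:
        # Box is present: only a folder past the last listed one can be added.
        return "append" if folder > max(folders_in_box) else "skip"
    # Box is absent: only a box past the last listed one can be added.
    return "append" if box > max(b for b, _ in ead_keys) else "skip"
```

Anything classified as "skip" is left unlinked rather than guessed at, since there is no safe place in the existing hierarchy to put it.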
This still fails to link in:
- content from folders & boxes that precede those listed in the EADs
- content for which we cannot determine the correct identifier from the itemId in the EAD
- content for which we cannot match the box/folder entry in the MODS to what's in the EAD
- content for which we cannot parse the box/folder entry in the MODS, or for which the MODS has none, when the EAD does have box/folder values
Still, these changes have enabled me to link over 9,500 more items into the finding aids. And combining the multiple linking scripts and workflows has enabled me to automate the process, saving me several hours per month.
We now (January 2014) have well over 68,000 digitized items linked into well over 300 collection EADs.
Here are specifics for what those tags may look like: Scripted Links in EADs
More info about this process can be found in EADs.
Diagram of the automated linking process, as of 9 February 2016: