EADs

From UA Libraries Digital Services Planning and Documentation

Latest revision as of 11:59, 24 March 2015

The diagram below shows the creation of the EAD in Archivists Toolkit (AT) during processing. The archivists consider the copy in Archivists Toolkit to be the copy of record; however, we have found that reloading EADs containing links to component items modifies the links, and re-exporting modifies them further. So we have altered our workflow: while the EAD in AT remains the copy of record for analog material, the delivery EAD (containing the links to digitized content) is stored separately. Every time the archivists modify an EAD, it must go through the item-level linking process again.

EAD3.png


1) EADs are exported by archivists from Archivists Toolkit and placed in the "new" or "remediated" folder of the S:\Special Collections\Digital_Program_files\EAD directory on the share drive.


2) Every Friday night, a script called "getEADs" (in /srv/scripts/storing/EADs_auto/):

  • tests these EADs for correct naming (and repairs them if possible),
  • picks up these EADs,
  • makes a datestamped directory in the "uploaded" directory on the share drive (for example, "uploaded_new_20100803"),
  • copies the EADs to the corresponding "uploaded" directory (so the archivists will know what was picked up when), and
  • puts a copy into /srv/deposits/fromSC/.
  • Many EADs also come to us with embedded Word or PDF encodings, and some contain ampersands. These need to be corrected before the EADs go online, so getEADs then runs through the new EADs to replace these values,
  • puts the corrected version into /srv/deposits/EADs/unlinkedASCII/,
  • and puts it into the "notInDbase" directory on libcontent (under /srv/deposits/EADs/) to be used for linking.
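
As a rough illustration of two of the steps above (the datestamped directory name and the ampersand repair), here is a minimal Python sketch. It is not the getEADs script itself: the function names are hypothetical, and it assumes that "correcting" an ampersand means escaping any ampersand that does not already begin an XML entity.

```python
import re
from datetime import date

# Assumption: an ampersand needs repair when it does not already start
# an XML entity such as &amp; or &#169;.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def uploaded_dir_name(source_folder, day):
    """Build a name like 'uploaded_new_20100803' (hypothetical helper)."""
    return f"uploaded_{source_folder}_{day:%Y%m%d}"


def escape_bare_ampersands(text):
    """Replace unescaped ampersands with &amp;, leaving entities alone."""
    return BARE_AMPERSAND.sub("&amp;", text)


if __name__ == "__main__":
    print(uploaded_dir_name("new", date(2010, 8, 3)))
    print(escape_bare_ampersands("Smith & Sons &amp; Co."))
```

The embedded Word or PDF encodings mentioned above would need their own replacement table, which the source does not describe, so it is left out here.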


3) When content is digitized, the EAD may already be online. So the most recent makeJpegs upload scripts pull a copy of the corresponding EAD (if it exists) and place it in the /srv/scripts/eads/LINKME/ directory.


4) "eadModsTester" (in /srv/scripts/eads/):

  • checks the /srv/scripts/eads/LINKME/ directory for EADs corresponding to newly uploaded digitized content. If one is found, and there is not a new version of it already waiting to be linked, the script adds it to those in /srv/deposits/EADs/notInDbase/.
  • It then parses the EAD to locate box and folder numbers and itemIDs (all with corresponding ref values), and hunts through the web directories for MODS with item identifiers that correspond to the itemIDs, or for MODS within this collection (pulling box and folder values).
  • For each collection in which it can identify linkable content, the script then writes a tab-delimited text file to /srv/scripts/eads/linkrefs/; this file specifies the reference identifier in the EAD, the existing EAD box_folder value, the normalized EAD box_folder, the existing MODS box_folder value, the normalized MODS box_folder, and the MODS identifier if a match exists for an itemID value.
  • The script also outputs a few other files, such as lists of problems found in EADs (see /srv/scripts/eads/output/).

More discussion of the linking from EADs can be found in Linking_out_from_EADS.
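
To make the shape of a linkrefs row concrete, here is a small Python sketch that builds one tab-delimited line with the six columns listed above. The normalization rule shown (take the first two integers and drop leading zeros) is purely an assumption for illustration; the source does not say how eadModsTester normalizes box_folder values, and the function names and sample values are hypothetical.

```python
import re


def normalize_box_folder(raw):
    """Reduce a free-text location to 'box_folder'.

    Hypothetical rule: use the first two integers found, without
    leading zeros; return an empty string if fewer than two exist.
    """
    numbers = re.findall(r"\d+", raw)
    if len(numbers) < 2:
        return ""
    return f"{int(numbers[0])}_{int(numbers[1])}"


def linkrefs_row(ref_id, ead_raw, mods_raw, mods_identifier=""):
    """Join the six linkrefs columns with tabs."""
    columns = [
        ref_id,                          # reference identifier in the EAD
        ead_raw,                         # existing EAD box_folder value
        normalize_box_folder(ead_raw),   # normalized EAD box_folder
        mods_raw,                        # existing MODS box_folder value
        normalize_box_folder(mods_raw),  # normalized MODS box_folder
        mods_identifier,                 # MODS identifier, if an itemID matched
    ]
    return "\t".join(columns)


if __name__ == "__main__":
    print(linkrefs_row("ref12", "Box 01, Folder 003", "1.3"))
```

Keeping both the existing and the normalized value in each row makes it easy to see, when a match fails, whether the data entry or the normalization was at fault.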


4) "linkInContent" (in /srv/scripts/eads/):

  • uses the linkrefs files created by eadModsTester, and,
  • while referencing the EADs in /srv/deposits/EADs/notInDbase/, pulls the corresponding files from /srv/deposits/EADs/unlinkedASCII/.
  • If MODS box/folder values are indicated, the script goes through the Acumen directories to locate the corresponding items (otherwise, it uses the item identifier, if one is present in the last column).
  • It creates PURL links for each of these,
  • parsing through the EAD to add them in the correct location, and adding boxes and folders if necessary.
  • The linked version is placed in /srv/scripts/eads/LINKED/;
  • if this process is successful, the linked version is then written back to /srv/deposits/EADs/notInDbase/, overwriting the deposited version.
  • If no links are added, the version in the unlinkedASCII folder is written to the notInDbase folder, again overwriting the deposited version.
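
The write-back rule in the last two bullets can be sketched in Python as follows. This is an illustration under stated assumptions, not the linkInContent script: the function name and its parameters are hypothetical, and "successful" is simplified here to "at least one link was added".

```python
import shutil
from pathlib import Path


def write_back(ead_name, links_added, linked_dir, unlinked_dir, not_in_dbase_dir):
    """Overwrite the deposited EAD with the linked or the unlinked version.

    Assumption: the linked copy is used whenever links_added > 0;
    otherwise the cleaned, unlinked copy is used.
    """
    source_dir = Path(linked_dir) if links_added > 0 else Path(unlinked_dir)
    target = Path(not_in_dbase_dir) / ead_name
    shutil.copyfile(source_dir / ead_name, target)
    return target
```

Either way the file in notInDbase ends up as a cleaned version, so the next step never loads the raw deposited copy.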

5) "EadsToDbase" (in /srv/scripts/eads/):

  • pulls from /srv/deposits/EADs/notInDbase/,
  • updates the InfoTrack.allColls database table (including replacing changed titles and abstracts); the values in the database appear in the online collection list (http://www.lib.ua.edu/digital/browse),
  • puts the EADs live online in Acumen, and
  • moves the copy from notInDbase to /srv/deposits/EADs/new/ for archiving.


6) "LostLinks4" (in /srv/scripts/eads/):

  • collects all the links in the newly linked EADs in /srv/deposits/EADs/new/ and in the most recent version of the corresponding EADs in the archive.
  • If any links have been lost, the script creates a directory (named for the current date) in /srv/deposits/EADs/olderVersions/ and copies the current archived EAD there for reference, to give us time to compare the files and correct any errors before overwriting.
  • It then compiles a report of the changes, including the item number, box, folder, and reference id in the older EAD where each lost link existed.
  • The collected information for each EAD that has this problem is then emailed to the head of digital services for review and investigation.
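
The core comparison in this step is a set difference: links present in the archived EAD but missing from the new one. The Python sketch below shows that idea only. It assumes links appear as double-quoted href attributes (which also covers xlink:href), the function names are hypothetical, and the URLs in the usage example are placeholders rather than real PURLs.

```python
import re

# Assumption: every link of interest is a double-quoted href attribute.
HREF = re.compile(r'href="([^"]+)"')


def extract_links(ead_xml):
    """Return the set of link targets found in an EAD's text."""
    return set(HREF.findall(ead_xml))


def lost_links(archived_xml, new_xml):
    """List links that exist in the archived EAD but not in the new one."""
    return sorted(extract_links(archived_xml) - extract_links(new_xml))


if __name__ == "__main__":
    archived = (
        '<dao xlink:href="http://purl.example.org/item_0001"/>'
        '<dao xlink:href="http://purl.example.org/item_0002"/>'
    )
    new = '<dao xlink:href="http://purl.example.org/item_0001"/>'
    for link in lost_links(archived, new):
        print("lost:", link)
```

A regular expression is enough for counting and comparing link targets; reporting the box, folder, and reference id for each lost link, as the real script does, would require parsing the EAD structure.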


7) "cleanUpLinkingOutput" (in /srv/scripts/eads/):

  • checks for collections that have a linkrefs file in the /srv/scripts/eads/linkrefs/older directory and now have a problems file but no new linkrefs file; if any are found, it adds them to the error report, as this indicates a failure to link content into a finding aid that previously had links.
  • It moves linkrefs files and output files to "older" subdirectories in their respective areas
  • and compares these to the previous versions there:
  • if the problems file is smaller than the previous ones, the previous ones are deleted; if not, an error is generated.
  • The same goes for the linkrefs file.
  • If the new EAD has fewer links than the old one (stored in /srv/scripts/eads/backups/), a summary of the difference in the number of links is added to the error report.
  • The problem output files are copied to the archivist feedback area (/cifs-mount2/Feedback/ or S:\Special Collections\Digital_Program_files\EAD\Feedback) into the summary and byEAD directories (according to what each file is), in datestamped directories,
  • and older problem files for that particular collection are deleted from the Feedback area.
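
The "smaller than the previous ones" check can be pictured with the Python sketch below. It assumes the comparison is by file size in bytes, which the source does not state, and the function name and error wording are hypothetical.

```python
from pathlib import Path


def prune_previous(new_file, previous_files):
    """Delete older files that the new file is smaller than.

    Assumption: "smaller" means fewer bytes. Any older file the new
    one is not smaller than is kept and reported as an error.
    """
    new_file = Path(new_file)
    new_size = new_file.stat().st_size
    errors = []
    for old in map(Path, previous_files):
        if new_size < old.stat().st_size:
            old.unlink()  # superseded by the smaller new file
        else:
            errors.append(f"{new_file.name} is not smaller than {old.name}")
    return errors
```

Returning the messages rather than printing them lets the caller fold them into the same error report as the other checks in this step.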


Archiving

The archiving of EADs has now been integrated into the general archiving processes. See For Archiving.



Jlderidder (talk) 11:58, 24 March 2015 (CDT)

For reference, from the archivists:

Containers.jpg

[Text list of containers]
