For Archiving

From UA Libraries Digital Services Planning and Documentation
Revision as of 10:19, 14 December 2012 by Jlderidder

Currently, content uploaded for archival storage follows a specific organization (specified here: Share_Drive_Protocols).

Once this content is placed into the /srv/deposits/content/ directory on libcontent (a Linux server), we:

  1. verify that it copied correctly across the network,
  2. check the content with quality control verification scripts (such as File:TestIncoming.txt),
  3. upload the collection information file content into the database, via a web-side PHP script, to provide access to the online collection, and
  4. archive it.

Archiving means that we weed out extraneous files, re-order content (via copy) according to our storage organization (specified here: Organization_of_completed_content_for_long-term_storage), and version the metadata, XML, or text files. Only the versioned copy is linked into the manifest; the updated file overwrites the unversioned copy in the directory. Finally, we either create a LOCKSS manifest for this content or alter existing manifests to include it.

We have different versions of the archiving script (Relocating) on libcontent for different materials. Most content is archived using the relocating script in /srv/storing/; there is another for ETDs in the bornDigital subdirectory, one for MODS in the MODS subdirectory, one for EADs in the eads subdirectory, one for tags and transcriptions in the crowdsource subdirectory, and so forth. In each, uncommenting the $test = 1; line runs the script as a test, which will not change any existing manifests or copy content. Instead, it writes all the manifest changes and creations into one large file called RelocateManifests, and it still writes the list of which files it would copy where to the "moveme" file.
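The test switch described above can be illustrated with a minimal Python sketch. This is not the Relocating script itself: the function name relocate, the tab-separated moveme layout, the one-path-per-line manifest entries, and the workdir parameter are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def relocate(plan, test=False, workdir="."):
    """Copy files into storage, or only record what would happen.

    plan is a list of (source, destination, manifest) paths.
    The moveme list is always written. In test mode nothing is copied and
    manifest changes are diverted into a single RelocateManifests file.
    """
    workdir = Path(workdir)
    with open(workdir / "moveme", "w", encoding="utf-8") as moveme:
        for src, dst, manifest in plan:
            # Assumed layout: one "source<TAB>destination" pair per line.
            moveme.write(f"{src}\t{dst}\n")
            entry = f"{dst}\n"  # assumed manifest entry format
            if test:
                # Test run: collect every manifest change in one file.
                with open(workdir / "RelocateManifests", "a", encoding="utf-8") as out:
                    out.write(f"[{manifest}] {entry}")
            else:
                # Real run: copy the file, then link it into its manifest.
                Path(dst).parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
                with open(manifest, "a", encoding="utf-8") as out:
                    out.write(entry)
```

Calling relocate(plan, test=True) first lets you inspect moveme and RelocateManifests before anything in the archive changes.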

After running this script for real, run "checkem", which goes through the moveme file and does an MD5 comparison of each old file against its new copy. If they are the same, it deletes the old one in the deposits directory. If they are not the same, it outputs an error and leaves the original untouched.
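The compare-then-delete logic can be sketched in Python as follows. This is an illustration only, not the checkem script linked below; it assumes each line of the moveme file holds a tab-separated "old path, new path" pair, and the function names md5sum and check_moves are hypothetical.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_moves(moveme_path):
    """Delete each old file whose new copy has the same MD5; report the rest."""
    errors = []
    with open(moveme_path, encoding="utf-8") as moveme:
        for line in moveme:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed layout: "old<TAB>new" on each line.
            old, new = line.split("\t", 1)
            if not os.path.isfile(old) or not os.path.isfile(new):
                errors.append(f"missing file: {old} or {new}")
                continue
            if md5sum(old) == md5sum(new):
                os.remove(old)  # copy verified, original no longer needed
            else:
                # Mismatch: leave the original untouched and report it.
                errors.append(f"MD5 mismatch: {old} vs {new}")
    return errors
```

Returning the errors as a list, rather than printing them, makes it easy to confirm that a run finished cleanly before the deposits directory is reused.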

Here's the checkem script: File:Checkem.txt

We also have verification scripts, such as "checkArchive", which verifies that everything in each manifest is in the archive, and that everything we intended to link into the manifest is indeed linked there properly.
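A two-way check of this kind might look like the Python sketch below. It is not the checkArchive script: it assumes the manifest is an HTML page whose links are relative paths under an archive root directory, and the names LinkCollector, check_manifest, archive_root, and expected are hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_manifest(manifest_path, archive_root, expected=()):
    """Return (links missing from the archive, expected items not linked)."""
    parser = LinkCollector()
    parser.feed(Path(manifest_path).read_text(encoding="utf-8"))
    linked = set(parser.links)
    # Assumption: each link is a path relative to archive_root.
    missing_from_archive = sorted(
        link for link in linked if not (Path(archive_root) / link).is_file()
    )
    missing_from_manifest = sorted(set(expected) - linked)
    return missing_from_archive, missing_from_manifest
```

Two empty lists mean the manifest and the archive agree in both directions.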

When we digitize multiple tiny collections, we may combine their spreadsheets for simplicity. They must then be split out by collection for archiving: File:SplitExcel.txt
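The splitting step can be sketched in Python as below. This is not the SplitExcel script: it assumes the combined spreadsheet has been exported to CSV and has a column identifying each row's collection. The column name "collection" and the function name split_by_collection are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_collection(combined_csv, out_dir, column="collection"):
    """Write one CSV per distinct value in the given column."""
    groups = defaultdict(list)
    with open(combined_csv, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[column]].append(row)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in groups.items():
        # Each output keeps the original header row.
        target = out_dir / f"{name}.csv"
        with open(target, "w", newline="", encoding="utf-8") as out:
            writer = csv.DictWriter(out, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(target)
    return written
```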