File Naming and Linking for LOCKSS

From UA Libraries Digital Services Planning and Documentation

Once content is harvested into LOCKSS, that content should not change.

However, metadata changes regularly, and we want the current version of each metadata file to keep the same filename from version to version.

Hence, a copy of every metadata and documentation file that we store in LOCKSS is given a version number in its filename, and it is this versioned copy which is linked. However, we keep only the oldest and the most recent versions of metadata.

Thus, the first version of u0003_0000252_0000002.mods.xml will have a copy made and named u0003_0000252_0000002.mods.v1.xml which will be stored in the same directory, and it will be this file which is linked in LOCKSS manifests.

When the next version of the u0003_0000252_0000002.mods.xml is created, it overwrites the u0003_0000252_0000002.mods.xml in the storage directory, AND is copied to u0003_0000252_0000002.mods.v2.xml, which will be added to the LOCKSS manifest.

Subsequent versions of u0003_0000252_0000002.mods.xml also overwrite the u0003_0000252_0000002.mods.xml in the storage directory, AND overwrite u0003_0000252_0000002.mods.v2.xml (after backing it up, if the collection has been harvested by LOCKSS partners). Since the .v2 file is already linked into the LOCKSS manifest, it is not linked again. The backup copies have the suffix "_LOCKSS_yyyy-mm-dd" so we can measure the size of our possible LOCKSS content. In the future, we hope to modify LOCKSS storage to delete all but the most recent .mods.v2.xml and .ead.v2.xml files.

In this way, the delivery software which draws from our storage content will always pick up the most recent version of the metadata, and LOCKSS manifests themselves will not be impacted by our metadata upgrades.
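The versioning rules above can be sketched in Python as follows. This is only an illustration, not our production script: the function names, the staging/storage directory layout, the `harvested` flag, and the choice to append the "_LOCKSS_yyyy-mm-dd" suffix to the end of the full .v2 filename are all assumptions made for the example.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def versioned_name(filename: str, version: int) -> str:
    """Insert '.vN' before the final extension: x.mods.xml -> x.mods.vN.xml."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        raise ValueError(f"no extension in {filename!r}")
    return f"{stem}.v{version}.{ext}"


def store_metadata(new_file: Path, storage_dir: Path,
                   harvested: bool = False,
                   today: Optional[date] = None) -> Optional[Path]:
    """Store a new metadata version (hypothetical helper).

    new_file is assumed to live outside storage_dir (e.g. in a staging area).
    Returns the file that must be newly linked in the LOCKSS manifest,
    or None when the manifest already links the .v2 file.
    """
    current = storage_dir / new_file.name
    v1 = storage_dir / versioned_name(new_file.name, 1)
    v2 = storage_dir / versioned_name(new_file.name, 2)

    if not v1.exists():
        # First version: keep the working copy and a .v1 copy, and link .v1.
        shutil.copyfile(new_file, current)
        shutil.copyfile(new_file, v1)
        return v1

    first_v2 = not v2.exists()
    if not first_v2 and harvested:
        # Assumption: the date suffix is appended to the whole .v2 filename.
        stamp = (today or date.today()).isoformat()
        shutil.copyfile(v2, v2.with_name(f"{v2.name}_LOCKSS_{stamp}"))

    # Every later version overwrites both the working copy and .v2.
    shutil.copyfile(new_file, current)
    shutil.copyfile(new_file, v2)
    return v2 if first_v2 else None
```

In this sketch the .v1 file is written once and never touched again, which matches the rule that only the oldest and most recent versions are kept.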

As of September 2011, we have selected this content for long term storage AND linking into LOCKSS manifests:

  1. archival tiffs
  2. archival wav files
  3. METS xml containing Qualified Dublin Core
  4. tab-delimited text files containing metadata spreadsheets for entire collections
  5. EAD xml
  6. collection information xml
  7. a png icon for each collection
  8. MODS xml

Other files may exist in the directories which are important to us, but which are not necessary for inclusion in LOCKSS. Those files must be in archival formats.

The text of each link states what is linked. For example:

<a href="">Septimus D. Cabaniss Papers u0003_0000252 Manifest.html</a>

Our Manifests are named Manifest.html, and one exists in each Documentation directory on each level, linking to any levels beneath it which offer content for harvesting.

Each Manifest is divided into sections which use lists to provide access to the links:

  1. Administrative Information (contents of Documentation directory)
  2. Collection Level Metadata (contents of collection level Metadata directory)
  3. Metadata (for each item)
  4. Transcripts (if available)
  5. Content (archival files)

Each Manifest states the collection name (or level name) and number, for example: "Septimus D. Cabaniss Papers u0003_0000252 LOCKSS Manifest Page" or "University of Alabama Hoole Special Collections Manuscripts u0003 Manifest Page".

Each Manifest has the following statement at the bottom: "LOCKSS system has permission to collect, preserve, and serve this Archival Unit"
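As a rough illustration of the Manifest layout described above, the following Python sketch assembles a Manifest.html page from a set of links. The function name, the shape of the `links` argument, the HTML tags chosen (h1, h2, ul), and the decision to skip any section that has no links are assumptions made for this example; only the section names, the title wording, and the permission statement come from the description above.

```python
from html import escape

# Permission statement placed at the bottom of every Manifest.
PERMISSION = ("LOCKSS system has permission to collect, preserve, "
              "and serve this Archival Unit")

# Section order as listed above.
SECTIONS = [
    "Administrative Information",
    "Collection Level Metadata",
    "Metadata",
    "Transcripts",
    "Content",
]


def build_manifest(name, number, links):
    """Return Manifest.html text (hypothetical helper).

    links maps a section name to a list of (href, link_text) pairs.
    """
    title = f"{name} {number} LOCKSS Manifest Page"
    parts = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for section in SECTIONS:
        items = links.get(section, [])
        if not items:
            continue  # assumption: sections without links are omitted
        parts.append(f"<h2>{escape(section)}</h2>")
        parts.append("<ul>")
        for href, text in items:
            parts.append(f'<li><a href="{escape(href)}">{escape(text)}</a></li>')
        parts.append("</ul>")
    parts.append(f"<p>{PERMISSION}</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = build_manifest(
    "Septimus D. Cabaniss Papers",
    "u0003_0000252",
    {"Metadata": [("u0003_0000252_0000002.mods.v1.xml",
                   "MODS u0003_0000252_0000002.mods.v1.xml")]},
)
```

The link text in the usage example repeats the filename so that each link states what it points to.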