Organization of completed content for long-term storage
As our holdings expand to multiple collections, and to content from sources beyond Hoole, we need to organize files and folders systematically. The solution below follows a simple rule: to find the directory in which a file belongs, replace each underscore in its file name with a forward slash.
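A minimal Python sketch of this rule follows. The function name `storage_dir` and its `root` parameter (standing in for the specified directory on the storage server) are hypothetical, and the sketch assumes the file extension is everything after the last period.

```python
from pathlib import PurePosixPath


def storage_dir(filename: str, root: str = "/") -> PurePosixPath:
    """Return the directory that should hold `filename` (hypothetical helper).

    Each underscore-separated component of the file name, minus its
    extension, becomes one directory level beneath `root`.
    """
    stem = PurePosixPath(filename).stem  # drops the final extension, e.g. ".tif"
    return PurePosixPath(root, *stem.split("_"))


name = "u0004_0002061_0000345_0003.tif"
print(storage_dir(name) / name)
# /u0004/0002061/0000345/0003/u0004_0002061_0000345_0003.tif
```

The file itself keeps its full name inside the directory, so the path can always be rebuilt from the name alone.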
Based on the [Naming Schemes] we selected, I propose the following organization for materials that we have already digitized:
Within a specified directory on a slow server (to reduce the risk of damage and corruption through access):
1) By holding institution and subsidiary group
The first letter of a file name indicates the content's origin. All digital content whose name starts with "u" originated from holdings within the University of Alabama; other letters indicate origins elsewhere. The initial letter is followed by 4 digits that identify a grouping and an institution. Using these 5 characters to create the first-level directory ensures that all content is segregated by holding organization; for example, all Hoole Rare Books content will be in the folder "u0004", the code assigned to it. Thus, a file named u0004_0002061_0000345_0003.tif will be found in a subdirectory under /u0004/ in the file system.
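The origin check could be sketched as below. `origin_of` and `Origin` are hypothetical names, and the sketch assumes the code is always one lowercase letter followed by exactly 4 digits and an underscore.

```python
import re
from typing import NamedTuple


class Origin(NamedTuple):
    folder: str          # first-level directory, e.g. "u0004"
    from_alabama: bool   # True when the code starts with "u"


# Assumption: one lowercase letter, exactly four digits, then an underscore.
_ORIGIN = re.compile(r"([a-z])(\d{4})_")


def origin_of(filename: str) -> Origin:
    """Read the holding institution/grouping code from a file name."""
    m = _ORIGIN.match(filename)
    if m is None:
        raise ValueError(f"no institution code at start of {filename!r}")
    letter, digits = m.groups()
    return Origin(folder=letter + digits, from_alabama=(letter == "u"))


print(origin_of("u0004_0002061_0000345_0003.tif"))
# Origin(folder='u0004', from_alabama=True)
```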
2) By collection
In the 2nd level (within the folders on the 1st level), folders will be named according to the 2nd set of numbers in the file name: after the 1st underscore and before the 2nd underscore (or the period preceding the file extension). This is the number of the collection within the institution/grouping. In the file name u0004_0002061_0000345_0003.tif, "0002061" indicates collection number 2061, so the folder "0002061" will exist for this collection: /u0004/0002061/
- note that metadata and images for collections with no digitized items will be stored here, and there will be no 3rd or 4th level.
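Extracting the collection folder and number might look like this. The sketch assumes, from the example above, that collection numbers are zero-padded to 7 digits; `collection_location` is a hypothetical name.

```python
import re

# Assumption: institution code, underscore, then a 7-digit collection number
# followed by another underscore, a period (extension), or the end of the name.
_COLLECTION = re.compile(r"([a-z]\d{4})_(\d{7})(?:[_.]|$)")


def collection_location(filename: str) -> tuple[str, int]:
    """Return the collection folder path and the collection number."""
    m = _COLLECTION.match(filename)
    if m is None:
        raise ValueError(f"no collection number in {filename!r}")
    institution, collection = m.groups()
    # The folder keeps the zero padding; the number drops it.
    return f"/{institution}/{collection}/", int(collection)


print(collection_location("u0004_0002061_0000345_0003.tif"))
# ('/u0004/0002061/', 2061)
```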
3) By item
In the 3rd level (within the folders on the 2nd level), folders will be named according to the 3rd set of numbers in the file name: after the 2nd underscore and before the 3rd underscore (or the period preceding the file extension). This is the number assigned to the item within the specified collection. In the file name u0004_0002061_0000345_0003.tif, "0000345" indicates item number 345 in this collection, so the folder /u0004/0002061/0000345/ will contain all files relating to this item.
- note that items that do not have pages will be stored here, and there will not be a 4th or 5th level.
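To gather everything that belongs in one item folder, a script could group file names by their first three components, as in this sketch. `group_by_item` and the sample file names are hypothetical; collection-level files, which have no item folder, are skipped.

```python
from collections import defaultdict


def group_by_item(filenames: list[str]) -> dict[str, list[str]]:
    """Group file names by the item directory that should hold them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        parts = name.rsplit(".", 1)[0].split("_")
        if len(parts) < 3:
            continue  # collection-level file: it has no item directory
        groups["/" + "/".join(parts[:3])].append(name)
    return dict(groups)


files = [
    "u0004_0002061_0000345_0001.tif",
    "u0004_0002061_0000345_0002.tif",
    "u0004_0002061_0000346.xml",  # item without pages: stored at item level
    "u0004_0002061.xml",          # collection-level file: skipped
]
for folder, names in group_by_item(files).items():
    print(folder, names)
```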
4) By sequence for delivery
In the 4th level (within the folders on the 3rd level), folders will be named according to the 4th set of numbers in the file name: after the 3rd underscore and before the 4th underscore (or the period preceding the file extension). This is the number assigned to the file's position in the delivery sequence within the item. In the file name u0004_0002061_0000345_0003.tif, "0003" indicates the 3rd image in the sequence, so the directory /u0004/0002061/0000345/0003/ will contain this TIFF and all information associated with it.
- note that if there are subpages, such as in a scrapbook, there will be an additional level beneath this one, using the same reasoning.
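The same reasoning extends to any depth. This hypothetical `label_levels` helper pairs each component with its role and, as an assumption, treats every component after the sequence number as a nested subpage; the subpage file name in the example is invented for illustration.

```python
LEVELS = ("institution", "collection", "item", "sequence")


def label_levels(filename: str) -> list[tuple[str, str]]:
    """Pair each directory level derived from `filename` with its role."""
    parts = filename.rsplit(".", 1)[0].split("_")
    # Components beyond the sequence number are treated as nested subpages.
    roles = list(LEVELS[: len(parts)]) + ["subpage"] * (len(parts) - len(LEVELS))
    return list(zip(roles, parts))


for role, value in label_levels("u0004_0002061_0000345_0003_0002.tif"):
    print(f"{role:<12}{value}")
```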
OTHER SUBFOLDERS IN EACH OF THE ABOVE
Each of the levels above may contain any or all of the following folders:
- documentation: This directory holds administrative information. All documentation must be stored as XML or plain text, encoded in Unicode or ASCII.
- Within the text folder, subfolders should be named "ocr" for OCR text and "transcribed" for transcribed text, in either case encoded in ASCII or Unicode. All text should be stored either as XML or as plain text (.txt files).
This separation lets us distinguish potentially poor-quality text from better-quality text. OCR text that has been remediated should be stored under "transcribed".
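Routing a text file to the right subfolder could be sketched as follows. `text_destination` is a hypothetical helper, and the text file name in the example is invented, since no naming rule for text files is given here.

```python
from pathlib import PurePosixPath

TEXT_EXTENSIONS = (".txt", ".xml")  # text is stored as plain text or XML


def text_destination(level_dir: str, filename: str,
                     from_ocr: bool, remediated: bool = False) -> PurePosixPath:
    """Return where a text file belongs under a level's text folder."""
    if not filename.endswith(TEXT_EXTENSIONS):
        raise ValueError("text must be stored as XML or plain text (.xml or .txt)")
    # Uncorrected OCR goes to "ocr"; transcriptions and remediated OCR
    # go to "transcribed".
    subfolder = "ocr" if from_ocr and not remediated else "transcribed"
    return PurePosixPath(level_dir, "text", subfolder, filename)


page = "/u0004/0002061/0000345/0003"
print(text_destination(page, "u0004_0002061_0000345_0003.txt", from_ocr=True))
# /u0004/0002061/0000345/0003/text/ocr/u0004_0002061_0000345_0003.txt
```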
- Ideally, a METS file organizes the metadata and tags it with the appropriate namespaces; this METS file should contain XLinks to the archival-quality bitstreams. If no METS file is available, subfolders within the metadata folder should be named according to the type of metadata they contain: the type, followed by an underscore, followed by the version. The following shorthand is to be used:
- qdc for qualified Dublin Core
- udc for unqualified Dublin Core
- mods for MODS (Metadata Object Description Schema)
- mets for METS (Metadata Encoding and Transmission Standard)
- tei for TEI (Text Encoding Initiative)
- ead for EAD (Encoded Archival Description)
Thus, an example folder udc_1.1 can be expected to contain an unqualified Dublin Core record meeting the specifications of version 1.1. Likewise, a folder named mods_3.2 would contain a MODS version 3.2 metadata record.
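A folder name can be split back into its metadata type and version. This sketch (with the hypothetical `parse_metadata_folder`) accepts only the shorthand listed above and splits at the first underscore, so a version may itself contain periods; profile folders, described later in this section, are not covered by it.

```python
SHORTHAND = {
    "qdc": "qualified Dublin Core",
    "udc": "unqualified Dublin Core",
    "mods": "Metadata Object Description Schema",
    "mets": "Metadata Encoding and Transmission Standard",
    "tei": "Text Encoding Initiative",
    "ead": "Encoded Archival Description",
}


def parse_metadata_folder(name: str) -> tuple[str, str]:
    """Split a folder name such as "mods_3.2" into (type, version)."""
    kind, sep, version = name.partition("_")
    if not sep or not version or kind not in SHORTHAND:
        raise ValueError(f"not a recognized metadata folder name: {name!r}")
    return kind, version


for folder in ("udc_1.1", "mods_3.2", "ead_2002"):
    kind, version = parse_metadata_folder(folder)
    print(f"{folder}: {SHORTHAND[kind]}, version {version}")
```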
Note that each metadata record is stored at the appropriate level: an EAD would be stored at the collection level, since it is a collection-level record. If the collection contains only one item, that item should be labeled item 1, and its metadata would be stored in the item directory, to avoid confusion.
Thus, /u0004/0002061/metadata/ may contain mods_3.2, ead_2002, and udc_1.1 directories, each holding collection-level metadata that describes collection 2061.
If collection 2061 currently includes only one item, the file name for that item should be u0004_0002061_0000001, and the directory /u0004/0002061/0000001/metadata would contain the metadata about that item. If there is page-level metadata for that item, the metadata for the 1st page would be stored in /u0004/0002061/0000001/0001/metadata, the metadata for the 2nd page in /u0004/0002061/0000001/0002/metadata, and so forth.
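The level at which a record is stored can be computed from the numbers involved. In this sketch, `metadata_dir` is hypothetical, and the zero padding (7 digits for collections and items, 4 for pages) is assumed from the examples above.

```python
from typing import Optional


def metadata_dir(institution: str, collection: int,
                 item: Optional[int] = None, page: Optional[int] = None) -> str:
    """Return the metadata folder for a collection, an item, or a page."""
    parts = [institution, f"{collection:07d}"]
    if item is not None:
        parts.append(f"{item:07d}")
        if page is not None:
            parts.append(f"{page:04d}")
    elif page is not None:
        raise ValueError("page-level metadata needs an item number")
    return "/" + "/".join(parts) + "/metadata"


print(metadata_dir("u0004", 2061))        # /u0004/0002061/metadata
print(metadata_dir("u0004", 2061, 1))     # /u0004/0002061/0000001/metadata
print(metadata_dir("u0004", 2061, 1, 2))  # /u0004/0002061/0000001/0002/metadata
```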
- For metadata records that follow local profiles (for example, where the needed fields are drawn from different metadata schemas), an XML schema, a DTD, or a text/XML data dictionary is expected within a subsidiary "documentation" folder. The folder that contains both the metadata record and its "documentation" folder should be named as follows: "profile", followed by an underscore, followed by the 8-digit date (year, month, day), followed by an underscore, followed by the initials of the responsible party. For example, "profile_20080825_jld" would indicate the profile Jody Lynn DeRidder created on August 25, 2008. Note that the date is the date of the profile, not the date the metadata record was created or stored.
Thus, within the /u0004/0002061/0000345/metadata/profile_20080825_jld/ folder, you would find an XML metadata record conforming to the profile specified by the text or XML files in /u0004/0002061/0000345/metadata/profile_20080825_jld/documentation/; this record would describe the item u0004_0002061_0000345.
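Building and parsing profile folder names could be sketched as below. Both function names are hypothetical, and the sketch assumes initials are lowercase letters, as in "jld".

```python
import re
from datetime import date, datetime

# Assumption: an 8-digit date followed by lowercase initials.
_PROFILE = re.compile(r"profile_(\d{8})_([a-z]+)")


def profile_folder(profile_date: date, initials: str) -> str:
    """Build a folder name such as "profile_20080825_jld"."""
    return f"profile_{profile_date:%Y%m%d}_{initials.lower()}"


def parse_profile_folder(name: str) -> tuple[date, str]:
    """Split a profile folder name into the profile date and the initials."""
    m = _PROFILE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a profile folder name: {name!r}")
    stamp, initials = m.groups()
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(stamp, "%Y%m%d").date(), initials


folder = profile_folder(date(2008, 8, 25), "JLD")
print(folder)                        # profile_20080825_jld
print(parse_profile_folder(folder))  # (datetime.date(2008, 8, 25), 'jld')
```

The date in the name is the profile's date, so it is supplied explicitly rather than taken from the current date.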
Note that additional forms of metadata (structural, administrative, and technical) may be added within each metadata folder without confusion as to what the metadata is or what it describes. Ideally, these can later be incorporated into METS files, which would simplify this structure.