Intellectual vs Logical
There are times when the intellectual object does not correspond to the file division. This creates a problem in that the metadata file should be named for the intellectual object, and this is what should be presented to the user; however, the files which compose the intellectual object may not match up to the same boundaries.
We have two examples: one in which preserving the representation of the analog object requires retaining the logical representation (a letterbook), and one in which the logical representation is an unrelated artifact (audio reels).
The letterbooks we are digitizing have letters which may begin on one page and end on another, to be followed by another letter on the second page. Hence, the intellectual item is the letter, but it is also the book. The pages of the book do not correspond to the letters. Letter 12 may span page 6 and part of page 7. How do we number the files and provide intellectual access?
The letters for which we have typed transcripts and/or specific descriptive metadata have priority for delivery as intellectual items over the letters which do not.
Each letter to be delivered as an intellectual item, with associated metadata and transcription or OCR, is considered an item in the collection.
If that letter spans more than one page, it will have multiple page images, one per page containing a portion of the letter. In the previous example, where letter 12 spans page 6 and part of page 7, those pages would be stored in the item as follows (assuming this is in the Jemison collection, and letter 12 is the 12th intellectual item scanned in this collection):
u0003_0000753_0000012_0001.tif
u0003_0000753_0000012_0002.tif
Letters for which we do NOT have transcripts or specific descriptive metadata need not be considered separate items. However, they will be included in the book as an intellectual item.
The entire book will be delivered as a separate item within the collection, with its own descriptive metadata. Thus we will be storing some of the same tiff images under more than one file name.
What this means:
Again, if letter 12 spans letterbook pages 6 and part of page 7, and letter 13 is on page 7, and we have transcripts for letters 12 and 13, then the tiff for page 6 will be stored as a part of letter 12, and will also be stored as a part of the book item.
(assuming the book is scanned as the 451st intellectual item and that the page count started with the cover):
u0003_0000753_0000012_0001.tif (explained above)
u0003_0000753_0000451_0006.tif (the same tiff in the book item)
Page 7 will be stored 3 times: as part of letter 12, as part of letter 13, and as part of the book item.
u0003_0000753_0000012_0002.tif (explained above)
u0003_0000753_0000013_0001.tif
u0003_0000753_0000451_0007.tif
While this increases the storage cost, it disambiguates the situation while providing the best usability and representation of the intellectual item(s).
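The letterbook numbering above can be sketched in a few lines of Python. This is only an illustration, not a tool we use: the function names are hypothetical, and the field widths (7 digits for the item number, 4 for the page sequence) are inferred from the example file names rather than taken from a formal specification. It assumes, as above, that the book's page count starts with the cover, so the book item's image number equals the book page number.

```python
def page_image_name(collection, item, sequence, ext="tif"):
    """Build a name such as u0003_0000753_0000012_0001.tif.

    Field widths are inferred from the examples in this document.
    """
    return f"{collection}_{item:07d}_{sequence:04d}.{ext}"


def letterbook_copies(collection, book_item, letters):
    """Map each book page to every file name its image is stored under.

    `letters` maps a letter's item number to the ordered list of book
    pages it spans. Only pages that carry a separately delivered letter
    are listed; the book item itself would also hold every other page.
    """
    copies = {}
    for item, pages in sorted(letters.items()):
        # Within a letter item, images are numbered 1, 2, ... in page order.
        for sequence, page in enumerate(pages, start=1):
            copies.setdefault(page, []).append(
                page_image_name(collection, item, sequence))
    for page, names in copies.items():
        # Within the book item, the image number is the book page number.
        names.append(page_image_name(collection, book_item, page))
    return copies


# Letter 12 spans pages 6 and 7; letter 13 is on page 7; the book is item 451.
copies = letterbook_copies("u0003_0000753", 451, {12: [6, 7], 13: [7]})
for page, names in sorted(copies.items()):
    print(page, names)
```

Run on the example, page 6 comes out under two names and page 7 under three, which is the storage duplication described above.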
Audio reels may or may not correspond to the intellectual object. The capture of the performance may in no way relate to the number of reels used, or where that performance capture began on a reel. That is to say, a performance may span multiple reels, and a reel may contain multiple performances, and a performance may span part of one reel and part of another.
We currently are considering a "performance" to be an intellectual item. That "performance" may simply be a home practice session, a lecture, or some other sensible division of an intellectual whole.
Currently we are creating one WAV file per side of a reel. Obviously, the file names used for these WAV files may not at all correspond to the web delivery file names of the intellectual item.
Case 1: Multiple WAV files per single performance
The performance is the intellectual item. Assuming this is the 14th performance identified in the Rohlig collection, the identifier for this performance is u0008_0000002_0000014. Let's say this performance spans one entire, 2-sided reel and one side of a second reel. Since we have 2 WAV files per reel (one per side) this is 3 WAV files. The numbering to use would be this:
Here we are following the file naming convention used previously for multiple batches of spreadsheets (for large or continuing collections), to indicate that each is only a portion of the whole; to recreate the whole, all these numbered parts would need to be concatenated in the order indicated by the number preceding the extension.
This clarifies that all these files are part of the same performance, but they may NOT have a one-to-one correspondence with the intellectual division of this performance into works.
Thus, if this performance contains 5 different works, to be presented online as 5 MP3 files each with a separate metadata record, those intellectual sub-parts would be represented by the following identifiers:
Documentation in the Admin folder (the Documentation folder of the archive) will contain information spelling out how the separate intellectual items are to be derived from the WAV files.
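As a rough sketch of the concatenation rule in Case 1, the parts of a performance can be put in order by the number that precedes the file extension. The function names and the sample file names below are hypothetical placeholders, not our actual WAV names; the only assumption taken from the text is that the part number is the run of digits immediately before the extension.

```python
import re


def part_number(filename):
    """Return the integer immediately preceding the file extension."""
    match = re.search(r"(\d+)\.[^.]+$", filename)
    if match is None:
        raise ValueError(f"no part number before the extension: {filename!r}")
    return int(match.group(1))


def concatenation_order(filenames):
    """Sort the parts into the order in which they must be concatenated."""
    return sorted(filenames, key=part_number)


# Placeholder names only: three parts listed out of order.
parts = ["item_0003.wav", "item_0001.wav", "item_0002.wav"]
ordered = concatenation_order(parts)
print(ordered)
```

Sorting on the parsed integer rather than on the raw string keeps the order correct even if the zero padding were ever inconsistent.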
Case 2: Single WAV file containing multiple performances
Consider the situation where a WAV file spans multiple performances. Again, the performance is the intellectual item. In this case, the WAV file will be duplicated, named for each of the intellectual items it encompasses. If performances 16, 17, and 18 of this collection are on a single WAV file, the following files in storage will be identical:
Again, this increases storage cost but reduces ambiguity and confusion.
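To illustrate Case 2, the item identifiers under which a shared WAV file would be duplicated can be generated as below. This is a minimal sketch with hypothetical function names; it builds only the item identifier (in the form given above for performance 14, u0008_0000002_0000014) and deliberately leaves out any part number or extension.

```python
def item_identifier(collection, item):
    """Build an item identifier such as u0008_0000002_0000014."""
    return f"{collection}_{item:07d}"


def duplicate_under(collection, performances):
    """List the item identifiers a shared WAV file must be copied under,
    one per performance it contains."""
    return [item_identifier(collection, n) for n in performances]


# One WAV file containing performances 16, 17, and 18 of the Rohlig collection.
ids = duplicate_under("u0008_0000002", [16, 17, 18])
print(ids)
```

The number of identifiers returned is also the number of identical copies held in storage for that WAV file.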
Case 3: Part of a performance on part of one WAV file, part of it on another
Let's assume that performance 21 begins on the last part of the 2nd side of reel A (we'll call this WAV X) and ends on the first portion of the 1st side of reel B (we'll call this WAV Y). Two WAV files each contain a segment of performance 21.
This also implies that performance 20 (or part of it) is on WAV X, and performance 22 (or part of it) is on WAV Y. (This may not be true, but for argument's sake, let's say it is.)
WAV X will then have to be duplicated, once for performance 20 and once for performance 21; and WAV Y will also have to be duplicated, once for performance 21 and once for performance 22 (see Case 2 above).
The WAV files stored for performance 21 will be named:
This means that we cannot assume that a set of WAV files under an item number contains ONLY the intellectual item indicated; instead, we can only assume that the intellectual item is WITHIN the WAV files so numbered, and reconstruction of the intellectual item requires all the WAV files included in this set.
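The bookkeeping for Case 3 can be sketched as an inversion: starting from a record of which performances each captured WAV contains, work out which WAV files make up each performance's set and how many copies of each WAV are needed. The function names and the "X"/"Y" labels are illustrative only, and the sketch assumes the WAV files are listed in capture order.

```python
from collections import defaultdict


def wav_sets(wav_contents):
    """Invert {wav: [performances]} into {performance: [wavs]}.

    Each performance's list keeps the order in which the WAV files
    appear in `wav_contents` (assumed to be capture order).
    """
    sets = defaultdict(list)
    for wav, performances in wav_contents.items():
        for performance in performances:
            sets[performance].append(wav)
    return dict(sets)


def copies_required(wav_contents):
    """Each WAV file is stored once per performance it contains."""
    return {wav: len(perfs) for wav, perfs in wav_contents.items()}


# WAV X holds performances 20 and 21; WAV Y holds performances 21 and 22.
reels = {"X": [20, 21], "Y": [21, 22]}
print(wav_sets(reels))
print(copies_required(reels))
```

For performance 21 the set is both X and Y, yet neither file contains only performance 21, which is why the set can be assumed to contain the item but not to be limited to it.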
(updated 2/3/10, Jody DeRidder)