Intellectual vs Logical
There are times when the intellectual object does not correspond to the file division. This creates a problem in that the metadata file should be named for the intellectual object, and this is what should be presented to the user; however, the files which compose the intellectual object may not match up to the same boundaries.
We have two examples: one in which preserving the representation of the analog object requires retaining the logical representation (a letterbook), and one in which the logical representation is an unrelated artifact (audio reels).
The letterbooks we are digitizing have letters which may begin on one page and end on another, to be followed by another letter on the second page. Hence, the intellectual item is the letter, but it is also the book. The pages of the book do not correspond to the letters. Letter 12 may span page 6 and part of page 7. How do we number the files and provide intellectual access?
The letters for which we have typed transcripts and/or specific descriptive metadata have priority for delivery as intellectual items over the letters which do not.
Each letter to be delivered as an intellectual item, with associated metadata and transcription or OCR, is considered an item in the collection.
If that letter spans more than one page, it will have multiple page images, one per page containing a portion of the letter. In the previous example, where letter 12 spans page 6 and part of page 7, those pages would be stored in the item as follows (assuming this is in the Jemison collection, and letter 12 is the 12th intellectual item scanned in this collection):
u0003_0000753_0000012_0001.tif
u0003_0000753_0000012_0002.tif
Letters for which we do NOT have transcripts or specific descriptive metadata need not be considered separate items. However, they will be included in the book as an intellectual item.
The entire book will be delivered as a separate item within the collection, with its own descriptive metadata. Thus we will be storing some of the same tiff images under more than one file name.
What this means:
Again, if letter 12 spans letterbook pages 6 and part of page 7, and letter 13 is on page 7, and we have transcripts for letters 12 and 13, then the tiff for page 6 will be stored as a part of letter 12, and will also be stored as a part of the book item.
(assuming the book is scanned as the 451st intellectual item and that the page count started with the cover):
u0003_0000753_0000012_0001.tif (explained above)
u0003_0000753_0000451_0006.tif (the same tiff in the book item)
Page 7 will be stored 3 times: as part of letter 12, as part of letter 13, and as part of the book item.
u0003_0000753_0000012_0002.tif (explained above)
u0003_0000753_0000013_0001.tif
u0003_0000753_0000451_0007.tif
While this increases the storage cost, it disambiguates the situation while providing the best usability and representation of the intellectual item(s).
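The duplication described above can be sketched in code. This is only an illustration, not the project's actual tooling: the function names `storage_name` and `names_for_page` are hypothetical, and the field widths (a 7-digit item number and a 4-digit page sequence) are inferred from the example file names. Within a letter, a page is numbered by its position in the letter; within the book, by its page number counted from the cover.

```python
# Hypothetical sketch: field widths are inferred from the example file names.
PREFIX = "u0003_0000753"  # prefix taken from the examples in this section


def storage_name(item: int, sequence: int, prefix: str = PREFIX) -> str:
    """Build a TIFF name from an item number and a page sequence within that item."""
    return f"{prefix}_{item:07d}_{sequence:04d}.tif"


def names_for_page(book_page: int, letters: dict[int, list[int]], book_item: int) -> list[str]:
    """List every name under which the scan of one book page is stored.

    letters maps a letter's item number to the book pages it occupies, in order.
    """
    names = []
    for item, pages in sorted(letters.items()):
        if book_page in pages:
            # Sequence is the page's position within the letter, starting at 1.
            names.append(storage_name(item, pages.index(book_page) + 1))
    # The book item always holds the page, numbered by its position in the book.
    names.append(storage_name(book_item, book_page))
    return names


# Letter 12 spans pages 6 and 7, letter 13 is on page 7, the book is item 451.
letters = {12: [6, 7], 13: [7]}
print(names_for_page(6, letters, book_item=451))
print(names_for_page(7, letters, book_item=451))
```

Run against the example in the text, page 6 yields two names and page 7 yields three, matching the listings above.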
Please see Item_Numbering_Variations for information on file naming for the Emphasis and Working Lives audio collections.
Audio reels may or may not correspond to the intellectual object. The capture of the performance may in no way relate to the number of reels used, or to where that capture began on a reel. That is to say, a performance may span multiple discrete channels on a reel or multiple reels; a reel may contain multiple performances; and a performance may occupy only part of one reel and part of another.
We currently consider a "performance" to be an intellectual item. A "performance" may be a home practice session, a lecture, or some other sensible division of an intellectual whole.
Currently we create one WAV file per set of interconnected channels, per side, per reel. If a reel has two monaural sides, two monaural WAV files are made. If it has two stereo sides, two stereo WAV files are made. If one side is monaural and the other is stereo, one monaural WAV file and one stereo WAV file are made. If a reel consists of a 4-track recording, one stereo WAV file is made (i.e. each channel of the WAV file contains a mix of two channels on the reel). If there are two discrete monaural tracks on one side, two monaural WAV files are made. In short, only monaural and stereo WAV files are made: one WAV file houses each set of interconnected channels per side per reel, and a stereo WAV file is made by default whenever more than one analog channel is captured onto one WAV file.
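The rule above can be expressed as a small sketch. The data model here is an assumption made for illustration: each side of a reel is described as a list of channel groups, where each number is the count of interconnected analog channels in that group (1 for a monaural track, 2 for a stereo pair, 4 for a 4-track recording). The function names are hypothetical.

```python
def wav_formats_for_side(channel_groups: list[int]) -> list[str]:
    """Return the WAV format made for each group of interconnected channels."""
    formats = []
    for channels in channel_groups:
        if channels < 1:
            raise ValueError("a channel group must contain at least one channel")
        # One analog channel gives a monaural WAV; more than one gives stereo.
        formats.append("mono" if channels == 1 else "stereo")
    return formats


def wav_formats_for_reel(sides: list[list[int]]) -> list[str]:
    """Return the WAV formats made for a whole reel, side by side."""
    return [fmt for side in sides for fmt in wav_formats_for_side(side)]


print(wav_formats_for_reel([[1], [1]]))  # two monaural sides
print(wav_formats_for_reel([[2], [2]]))  # two stereo sides
print(wav_formats_for_reel([[1], [2]]))  # one monaural side, one stereo side
print(wav_formats_for_reel([[4]]))       # 4-track recording
print(wav_formats_for_reel([[1, 1]]))    # two discrete monaural tracks on one side
```

Each call mirrors one of the cases listed in the paragraph above; the number of entries returned is the number of WAV files made.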
Obviously, the file names used for these WAV files may not at all correspond to the web delivery file names of the intellectual item.
Presented below are three example scenarios and the file naming methodology that will be employed to address the issues they raise. For these examples it is assumed that no anomalous channel configurations exist; that is, per the rules outlined above, only one WAV file is made per side.
Case 1: Multiple WAV files per single performance
The performance is the intellectual item. Assuming this is the 14th performance identified in the Rohlig collection, the identifier for this performance is u0008_0000002_0000014. Let's say this performance spans one entire 2-sided reel and one side of a second reel. Since we have one WAV file per side, this makes 3 WAV files. The numbering to use would be this:
Here we follow the file naming convention used previously for multiple batches of spreadsheets (for large or continuing collections), which indicates that each file is only a portion of the whole. To recreate the whole, the numbered parts must be concatenated in the order indicated by the number preceding the extension.
This clarifies that all these files are part of the same performance, but they may NOT have a one-to-one correspondence with the intellectual division of this performance into works.
Thus, if this performance contains 5 different works, to be presented online as 5 MP3 files each with a separate metadata record, those intellectual sub-parts would be represented by the following identifiers:
Documentation in the Admin folder (the Documentation folder of the archive) will contain information spelling out how the separate intellectual items are to be derived from the WAV files.
Case 2: Single WAV file containing multiple performances
Consider the situation where a WAV file spans multiple performances. Again, the performance is the intellectual item. In this case, the WAV file will be duplicated, with one copy named for each of the intellectual items it encompasses. If performances 16, 17, and 18 of this collection are on a single WAV file, the following files in storage will be identical:
Again, this increases storage cost but reduces ambiguity and confusion.
Case 3: Part of a performance on part of one WAV file, part of it on another
Let's assume that performance 21 begins on the last part of the 2nd side of reel A (we'll call this WAV X) and ends on the first portion of the 1st side of reel B (we'll call this WAV Y). Thus two WAV files each contain a segment of performance 21.
This also implies that performance 20 (or part of it) is on WAV X, and performance 22 (or part of it) is on WAV Y. (This may not be true, but for argument's sake, let's say it is.)
WAV X will then have to be duplicated, once for performance 20 and once for performance 21; and WAV Y will also have to be duplicated, once for performance 21 and once for performance 22 (see Case 2 above).
The WAV files stored for performance 21 will be named:
This means that we cannot assume that a set of WAV files under an item number contains ONLY the intellectual item indicated; instead, we can only assume that the intellectual item is WITHIN the WAV files so numbered, and that reconstructing the intellectual item requires all the WAV files included in this set.
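The duplication in Cases 2 and 3 amounts to inverting a mapping: from "which performances does each WAV capture contain" to "which WAV copies must be stored under each performance". The sketch below illustrates this with the labels X and Y used above. The function name is hypothetical, and the sketch deliberately does not attempt to produce the stored file names.

```python
from collections import defaultdict


def copies_per_performance(wav_contents: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each performance to the WAV captures that must be stored under it."""
    copies = defaultdict(list)
    for wav, performances in wav_contents.items():
        for performance in performances:
            # One copy of the capture is stored per performance it contains.
            copies[performance].append(wav)
    return dict(copies)


# Case 3: WAV X holds performances 20 and 21; WAV Y holds 21 and 22.
wav_contents = {"X": [20, 21], "Y": [21, 22]}
stored = copies_per_performance(wav_contents)
print(stored)

# Total number of stored copies, counting duplicates.
print(sum(len(wavs) for wavs in stored.values()))
```

Performance 21 ends up with copies of both X and Y, each of which also contains material from a neighboring performance, which is why a set of WAV files under an item number cannot be assumed to contain only that item.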
(updated 2/3/10, Jody DeRidder)
(updated 7/01/10, Nitin Arora)