Sometimes the intellectual object does not correspond to the file divisions. This creates a problem: the metadata file should be named for the intellectual object, which is also what should be presented to the user, but the files that compose the intellectual object may not fall along the same boundaries.

We have two examples: one in which preserving the representation of the analog object requires retaining the logical representation (a letterbook), and one in which the logical representation is an unrelated artifact (audio reels).


The letterbooks we are digitizing contain letters that may begin on one page and end on another, with the next letter starting on that second page. Hence, the intellectual item is the letter, but it is also the book. The pages of the book do not correspond to the letters. For example, letter 12 may span page 6 and part of page 7. How do we number the files and provide intellectual access?

Letters for which we have typed transcripts and/or specific descriptive metadata have priority for delivery as intellectual items over letters which do not.

Each letter to be delivered as an intellectual item, with associated metadata and transcription or OCR, is considered an item in the collection.

If a letter spans more than one page, it will have multiple page images, one per page containing a portion of the letter. In the previous example, where letter 12 spans page 6 and part of page 7, those pages would be stored in the item as follows (assuming this is the Jemison collection, and letter 12 is the 12th intellectual item scanned in this collection):

u0003_0000753_0000012_0001.tif (page 6, the first page image of letter 12)

u0003_0000753_0000012_0002.tif (page 7, the second page image of letter 12)
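
The naming pattern can be illustrated with a minimal Python sketch. It assumes, based only on the file names shown in this document, that a name consists of a collection prefix, a 7-digit item number, and a 4-digit page sequence that restarts at 1 within each item. The function name page_filename and the constant PREFIX are hypothetical, introduced for illustration.

```python
def page_filename(collection_prefix: str, item_number: int, page_sequence: int) -> str:
    """Build a page image name: <prefix>_<7-digit item>_<4-digit page sequence>.tif"""
    return f"{collection_prefix}_{item_number:07d}_{page_sequence:04d}.tif"


# Assumed prefix for the example collection, copied from the file names in this document.
PREFIX = "u0003_0000753"

# Letter 12 is the 12th item; its page images are numbered from 1 within the item,
# regardless of which book pages they came from.
letter_12_files = [page_filename(PREFIX, 12, seq) for seq in (1, 2)]
for name in letter_12_files:
    print(name)
```

Note that the page sequence counts images within the item, so book page 6 becomes sequence 0001 of letter 12.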



Letters for which we do NOT have transcripts or specific descriptive metadata need not be considered separate items. However, they will still be included in the book, which is itself delivered as an intellectual item.

The entire book will be delivered as a separate item within the collection, with its own descriptive metadata. Thus we will be storing some of the same tiff images under more than one file name.

What this means:

Again, if letter 12 spans letterbook pages 6 and part of page 7, and letter 13 is on page 7, and we have transcripts for letters 12 and 13, then the tiff for page 6 will be stored as a part of letter 12, and will also be stored as a part of the book item.

(assuming the book is scanned as the 451st intellectual item and that the page count started with the cover:)

u0003_0000753_0000012_0001.tif (explained above)

u0003_0000753_0000451_0006.tif (the same tiff in the book item)

Page 7 will be stored 3 times: as part of letter 12, as part of letter 13, and as part of the book item.

u0003_0000753_0000012_0002.tif (explained above)

u0003_0000753_0000013_0001.tif

u0003_0000753_0000451_0007.tif

While this increases the storage cost, it disambiguates the situation while providing the best usability and representation of the intellectual item(s).
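
The duplication described above can be worked through in a short Python sketch, which lists every file name under which each book page would be stored. It is an illustration under stated assumptions, not an existing tool: the function plan_copies, the constant PREFIX, and the 7-page book length are hypothetical, and the naming pattern is inferred from the file names in this document.

```python
from collections import defaultdict

# Assumed collection prefix, copied from the file names in this document.
PREFIX = "u0003_0000753"


def plan_copies(book_item, book_pages, letters):
    """Map each book page number to every file name it is stored under.

    book_item  -- item number of the whole book
    book_pages -- iterable of book page numbers (counting starts with the cover)
    letters    -- dict: letter item number -> list of book pages, in reading order
    """
    copies = defaultdict(list)
    # Each letter item numbers its own page images from 1.
    for item, pages in sorted(letters.items()):
        for seq, page in enumerate(pages, start=1):
            copies[page].append(f"{PREFIX}_{item:07d}_{seq:04d}.tif")
    # The book item holds every page, numbered by its position in the book.
    for page in book_pages:
        copies[page].append(f"{PREFIX}_{book_item:07d}_{page:04d}.tif")
    return dict(copies)


# Hypothetical 7-page excerpt: letter 12 on pages 6-7, letter 13 on page 7.
plan = plan_copies(451, range(1, 8), {12: [6, 7], 13: [7]})
for page in (6, 7):
    print(page, plan[page])
```

In this sketch page 6 yields two file names and page 7 yields three, matching the example above, while pages that carry no separately delivered letter are stored only once, in the book item.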

Audio Reels

