From UA Libraries Digital Services Planning and Documentation
Revision as of 14:51, 11 September 2008 by Jjcolonnaromano

Situations arise in which we may have multiple versions of a particular image or object, for a variety of reasons.

Because we cannot foresee every situation, we will determine policies case by case.

Rule 1. Different format, same file.

If another format of the same file is being created, only the extension needs to change.


  • A JPEG made from the TIFF file u0003_000608_0000234_0001.tif would be named u0003_000608_0000234_0001.jpg.
  • A plain text derivation of the TIFF file u0003_000608_0000234_0001.tif (created by an OCR process or by hand-typing into a plain text editor) would be named u0003_000608_0000234_0001.txt.
  • If someone reads the letter into a microphone, producing an audio version of the text, no pagination is involved, so the filename of the ITEM ends in the extension of the audio format, for example: u0003_000608_0000234.mp3.
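The three cases above can be sketched in a few lines of Python. This is only an illustration, not an agreed tool: the function names derivative_name and item_level_name are hypothetical, and the sketch assumes that the last underscore-separated segment of the base name is the page sequence.

```python
from pathlib import PurePosixPath


def derivative_name(filename: str, new_ext: str) -> str:
    """Rule 1: keep the base name and change only the extension."""
    return str(PurePosixPath(filename).with_suffix("." + new_ext.lstrip(".")))


def item_level_name(filename: str, new_ext: str) -> str:
    """Name an unpaginated derivative (e.g. audio) after the item.

    Assumption: the last underscore-separated segment of the base name
    is the page sequence, so dropping it leaves the item identifier.
    """
    stem = PurePosixPath(filename).stem
    item_id, sep, _page = stem.rpartition("_")
    if not sep:
        raise ValueError(f"no page segment found in {filename!r}")
    return f"{item_id}.{new_ext.lstrip('.')}"


page = "u0003_000608_0000234_0001.tif"
print(derivative_name(page, "jpg"))   # JPEG made from the page TIFF
print(derivative_name(page, "txt"))   # plain text derived from the page TIFF
print(item_level_name(page, "mp3"))   # audio reading of the whole item
```

Keeping the item-level case in its own function makes the pagination assumption explicit rather than burying it in the extension change.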

Rule 2. Same format, different version.

If, however, another TIFF is being made of an alternate version of the same object, we need another method. For example, when we have a typed transcription of a handwritten letter and we scan that transcription in order to create OCR text, the scan needs a name of its own. This image exists only for OCR purposes, so we have decided that for the TIFF file u0003_000608_0000234_0001.tif, the TIFF to be used for OCR purposes (the one made of the transcription) is to be named u0003_000608_0000234_ocr_0001.tif.

(Is it too much to assume that the fidelity of the transcription is high enough that we can make the intellectual leap that the OCR comes from the handwritten letter, without bulking up the filename with the file's genealogy? Can we not assume equivalency of "a" and "c" if they share "b"?)

When this file is actually OCR'd, it will create a .txt file, which will then be named u0003_000608_0000234_ocr_0001.txt. The reasoning here is this: the typed transcription normally does NOT correspond with the pagination of the original scanned object. That is, page 1 of the transcription normally has all of page 1 and most of page 2 of the original object transcribed onto it. So page 2 of the transcription does not contain all and only the content of page 2 of the original object.

(Remember: if an OCR text file is made from the original scanned object, u0003_000608_0000234_0001.tif, then that text file would be named u0003_000608_0000234_0001.txt. This follows Rule 1 above, in that it is a different format of the original file.)
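As a rough sketch of Rule 2, the purpose marker can be inserted just before the page sequence, after which Rule 1 applies to the new file as usual. The names purpose_name and with_extension are hypothetical, and the code assumes that the last underscore-separated segment before the extension is the page sequence.

```python
def purpose_name(page_filename: str, purpose: str = "ocr") -> str:
    """Rule 2: insert a purpose marker before the page sequence.

    Assumption: the filename has the form <item id>_<page>.<ext>.
    """
    base, dot, ext = page_filename.rpartition(".")
    item_id, sep, page = base.rpartition("_")
    if not dot or not sep:
        raise ValueError(f"unexpected filename form: {page_filename!r}")
    return f"{item_id}_{purpose}_{page}.{ext}"


def with_extension(filename: str, new_ext: str) -> str:
    """Rule 1: same base name, different extension."""
    base, dot, _old_ext = filename.rpartition(".")
    if not dot:
        raise ValueError(f"no extension in {filename!r}")
    return f"{base}.{new_ext}"


original = "u0003_000608_0000234_0001.tif"
ocr_scan = purpose_name(original)             # scan of the typed transcription
ocr_text = with_extension(ocr_scan, "txt")    # text produced by OCR of that scan
direct_text = with_extension(original, "txt") # text made from the original scan
print(ocr_scan, ocr_text, direct_text)
```

Because the marker sits between the item identifier and the page sequence, the page numbers of the transcription stay separate from those of the original object, which matters since the two paginations normally differ.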

Hence the choice here is to add something standard into the filename that indicates the purpose of the new object, at the point where it makes the most sense. But does this scale? My first thought is, probably not.

I'm thinking: we create version 6.0 TIFFs right now, at 600 dpi. Ten years from now, we may decide we need version 8.0 TIFFs instead, though we will want to keep the originals. So what do we name the new TIFFs? Or do I need to worry about that now?

I welcome your thoughts!! Jlderidder