Revision as of 10:23, 17 September 2008 by Jlderidder

Situations arise in which we may have multiple versions of a particular image or object, for a variety of reasons.

Because we cannot foresee every situation, we will determine policies on a case-by-case basis.

Rule 1. Different format, same original.

If another format of the same file is being created, only the extension needs to change.


  • A jpeg made from the tif file u0003_000608_0000234_0001.tif would be named u0003_000608_0000234_0001.jpg.
  • A plain text derivative of the tif file u0003_000608_0000234_0001.tif (created by an OCR process or by hand-typing into a plain text editor) would be named u0003_000608_0000234_0001.txt.
  • Someone reads the letter into a microphone, digitizing the text in audio format. In this case, no pagination is involved, so the filename of the ITEM ends in the extension of the audio format, for example: u0003_000608_0000234.mp3.
  • A separate, typed version of the document exists, and we want to OCR it for its content, which will then be used for indexing and searching the scan of the handwritten (or original) document. In this case, the scan of the alternate (typed) version of the document should be put in a separate folder labeled "Transcriptions" and discarded after the OCR text file is created. (This ruling courtesy of Jeremiah Colonna-Romano.)
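The first three examples above can be sketched in Python. This is only an illustration, not an established tool: the function names derivative_name and item_level_name are hypothetical, and the sketch assumes that the last underscore-separated segment of a page-level filename is the page number.

```python
from pathlib import PurePosixPath


def derivative_name(source_name: str, new_ext: str) -> str:
    """Return the source filename with only its extension replaced (Rule 1)."""
    return str(PurePosixPath(source_name).with_suffix("." + new_ext.lstrip(".")))


def item_level_name(page_name: str, new_ext: str) -> str:
    """Name an unpaginated derivative (e.g. audio) after the ITEM, not the page.

    Assumption: the final underscore-separated segment of the stem is the
    page number, so it is dropped before the new extension is applied.
    """
    stem = PurePosixPath(page_name).stem
    item_stem, sep, _page = stem.rpartition("_")
    if not sep:
        raise ValueError(f"no page segment found in {page_name!r}")
    return f"{item_stem}.{new_ext.lstrip('.')}"


if __name__ == "__main__":
    tif = "u0003_000608_0000234_0001.tif"
    print(derivative_name(tif, "jpg"))   # page-level JPEG derivative
    print(derivative_name(tif, "txt"))   # page-level OCR text
    print(item_level_name(tif, "mp3"))   # item-level audio reading
```

Keeping the stem untouched means every derivative can be traced back to its source file by name alone.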

Rule 2. Same format, alternative original.

If, however, another tiff is being made of an alternate version of the same object, we need another method.

Following the reasoning of the last example in Rule 1, the alternate version will be stored in a separate, appropriately labeled folder. In this case, we are storing the scans of the typed transcripts of handwritten documents in a folder labeled "Transcriptions".

Rule 3. Same extension, different specifications, same original.

Following the decision in Rule 2, label the folder so that it disambiguates the versions: when the extensions are the same, separate folders prevent files from being overwritten. For example, if previous files were TIFF 6.0 at 600 dpi and you are making scans that are TIFF 7.0 at 300 dpi, then the folder should be labeled "tiff_7.0_300dpi".
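A minimal sketch of how such a folder label could be built consistently, so that two scans with the same filename never land in the same place. The function names and the base folder used in the usage lines are hypothetical; only the label pattern "tiff_7.0_300dpi" comes from the example above.

```python
from pathlib import PurePosixPath


def spec_folder_label(fmt: str, version: str, dpi: int) -> str:
    """Build a folder label of the form <format>_<version>_<dpi>dpi."""
    return f"{fmt.lower()}_{version}_{dpi}dpi"


def versioned_path(base_dir: str, filename: str,
                   fmt: str, version: str, dpi: int) -> str:
    """Place the file in a specification-labeled subfolder of base_dir.

    The filename itself is unchanged; only the folder distinguishes it
    from an earlier scan that has the same name and extension.
    """
    label = spec_folder_label(fmt, version, dpi)
    return str(PurePosixPath(base_dir) / label / filename)


if __name__ == "__main__":
    # "u0003_000608" as a base folder is an assumption for illustration.
    print(spec_folder_label("TIFF", "7.0", 300))
    print(versioned_path("u0003_000608", "u0003_000608_0000234_0001.tif",
                         "TIFF", "7.0", 300))
```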

Rule 4. Different format, alternative original.

Follow the ruling that is spelled out in Rule 1.