Difference between revisions of "OtherVersions"

From UA Libraries Digital Services Planning and Documentation

Revision as of 10:23, 17 September 2008

Situations arise in which we may have multiple versions of a particular image or object, for a variety of reasons.

As we cannot foresee every situation, we will determine policies case by case.


Rule 1. Different format, same original.

If another format of the same file is being created, only the extension needs to change.

Examples:

  • A jpeg made from the tif file u0003_000608_0000234_0001.tif would be named u0003_000608_0000234_0001.jpg.
  • A plain text derivation of the tif file u0003_000608_0000234_0001.tif (created by OCR process or hand-typing into a plain text editor) would be named u0003_000608_0000234_0001.txt.
  • Someone reads the letter into a microphone, digitizing the text in audio format. In this case, no pagination is involved, so the filename of the ITEM ends in the extension of the audio format, for example: u0003_000608_0000234.mp3.
  • A separate, typed version of the document exists, and we want to OCR it for its content, which will then be used to index and search the scan of the handwritten (or other original) document. In this case, the scan of the alternate (typed) version should be put in a separate folder labeled "Transcriptions" and discarded after the OCR text file is created. (This ruling courtesy of Jeremiah Colonna-Romano.)
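The extension-only naming in Rule 1 can be sketched in Python. This is a minimal illustration, not an adopted tool: the function name `derived_name` is hypothetical, and it assumes, as the examples suggest, that the last underscore-separated segment of the stem (the "0001" in u0003_000608_0000234_0001.tif) is the page number, which is dropped for item-level derivatives such as an audio reading.

```python
from pathlib import PurePosixPath


def derived_name(source_name: str, new_ext: str, paginated: bool = True) -> str:
    """Name a derivative of source_name in another format (Rule 1).

    Hypothetical helper. Assumes the last underscore-separated segment
    of the stem is the page number, as in the examples.
    """
    stem = PurePosixPath(source_name).stem  # drop the old extension
    if not paginated:
        # Item-level derivative (e.g. an audio reading): drop the page segment.
        stem = stem.rsplit("_", 1)[0]
    return f"{stem}.{new_ext.lstrip('.')}"


src = "u0003_000608_0000234_0001.tif"
print(derived_name(src, "jpg"))                   # u0003_000608_0000234_0001.jpg
print(derived_name(src, "txt"))                   # u0003_000608_0000234_0001.txt
print(derived_name(src, "mp3", paginated=False))  # u0003_000608_0000234.mp3
```

Only the extension (and, for unpaginated items, the page segment) changes, so every derivative still sorts and matches against its source by name.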


Rule 2. Same format, alternative original.

If, however, another TIFF is being made of an alternate version of the same object, a different method is needed.

Following on the reasoning in the last example in Rule 1, the alternate version will be stored in a separate folder labeled appropriately. In this case, we are storing the scans of the typed transcripts of handwritten documents in a folder labeled "Transcriptions".
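A minimal sketch of the folder rule in Rule 2, assuming (hypothetically) that each item's scans live in one directory and that the "Transcriptions" folder sits inside it. The helper name `transcription_scan_path` and the directory name `u0003_000608` are illustrative only; the policy fixes just the folder label.

```python
from pathlib import PurePosixPath

TRANSCRIPTIONS_DIR = "Transcriptions"  # folder label given in Rule 2


def transcription_scan_path(item_dir: str, scan_name: str) -> PurePosixPath:
    """Place a scan of an alternate (typed) version under item_dir/Transcriptions.

    Hypothetical helper; item_dir is wherever the original scans are stored.
    """
    return PurePosixPath(item_dir) / TRANSCRIPTIONS_DIR / scan_name


original = PurePosixPath("u0003_000608") / "u0003_000608_0000234_0001.tif"
alternate = transcription_scan_path("u0003_000608", "u0003_000608_0000234_0001.tif")
print(original)   # u0003_000608/u0003_000608_0000234_0001.tif
print(alternate)  # u0003_000608/Transcriptions/u0003_000608_0000234_0001.tif
```

Because the two paths differ only by folder, a transcription scan can share the original's file name without overwriting it.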

Rule 3. Same format (different specification), same original.

Following the decision in Rule 2, store the new files in an appropriately labeled folder to disambiguate them: because the extensions are the same, the new files would otherwise overwrite the existing ones. For example, if previous files have been TIFF 6.0 at 600 dpi and you are making scans that are TIFF 7.0 at 300 dpi, the folder should be labeled "tiff_7.0_300dpi".
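The folder label in Rule 3 follows a simple pattern (lowercase format, version, resolution), which can be sketched as below. The function names are hypothetical, and placing the spec folder inside the item directory is an assumption; only the label format "tiff_7.0_300dpi" comes from the rule.

```python
from pathlib import PurePosixPath


def spec_folder_label(fmt: str, version: str, dpi: int) -> str:
    """Build a folder label such as "tiff_7.0_300dpi" (Rule 3)."""
    return f"{fmt.lower()}_{version}_{dpi}dpi"


def rescan_path(item_dir: str, filename: str, fmt: str, version: str, dpi: int) -> PurePosixPath:
    """Hypothetical helper: put a new-specification scan in its own labeled folder."""
    return PurePosixPath(item_dir) / spec_folder_label(fmt, version, dpi) / filename


print(spec_folder_label("TIFF", "7.0", 300))
# tiff_7.0_300dpi
print(rescan_path("u0003_000608", "u0003_000608_0000234_0001.tif", "TIFF", "7.0", 300))
# u0003_000608/tiff_7.0_300dpi/u0003_000608_0000234_0001.tif
```

Encoding the specification in the folder name keeps the file names unchanged, so the new scans still pair with the originals page for page.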

Rule 4. Different format, alternative original.

Follow the ruling that is spelled out in Rule 1.


  Jlderidder