Creating OCR Files
The following is for creating OCR during the digitization process.
- For information and a script for creating OCR on the server (Linux), see For_Creating_Derivatives and OCR List.
- For information on OCR versions of transcripts, see Transcripts.
What is it?
OCR stands for Optical Character Recognition. This is the process by which typewritten or printed text is electronically translated into machine-editable text. We perform OCR on all images containing typewritten or printed text. OCR is a step in our digitization process.
Where do these files go? [this information is out of date]
OCR files are saved in a folder called Transcripts inside the collection directory. For large collections, OCR folders corresponding to the Scans folders should be created within the Transcripts folder (for example, OCR text files from Scans_3 would be placed in a folder called OCR_3 within the Transcripts folder). OCR text files are named in the following format, for example: u0003_0001577_0000233.ocr.txt
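The folder and filename convention described above can be sketched in Python. This is only an illustration of the layout as stated here (which is flagged as out of date); the function names, the collection directory name, and the .tif scan extension are assumptions, not part of our actual tooling.

```python
import re
from pathlib import Path


def ocr_folder_for(scans_folder_name: str) -> str:
    """Map a scans folder name such as 'Scans_3' to its OCR folder, 'OCR_3'."""
    match = re.fullmatch(r"Scans_(\d+)", scans_folder_name)
    if match is None:
        raise ValueError(f"not a Scans_N folder name: {scans_folder_name!r}")
    return f"OCR_{match.group(1)}"


def ocr_path_for(collection_dir: Path, scan_file: Path) -> Path:
    """Return where the OCR text file for a scan would be expected.

    Scans inside a numbered Scans_N folder map to Transcripts/OCR_N;
    any other scan maps directly into Transcripts.
    """
    ocr_name = f"{scan_file.stem}.ocr.txt"  # e.g. u0003_0001577_0000233.ocr.txt
    transcripts = collection_dir / "Transcripts"
    parent = scan_file.parent.name
    if re.fullmatch(r"Scans_\d+", parent):
        return transcripts / ocr_folder_for(parent) / ocr_name
    return transcripts / ocr_name
```

For a hypothetical scan at u0003_0001577/Scans_3/u0003_0001577_0000233.tif, ocr_path_for returns u0003_0001577/Transcripts/OCR_3/u0003_0001577_0000233.ocr.txt.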
OCR Process (Windows)
- Open Adobe Acrobat 9 Pro.
- Choose the Document tab across the top. From the drop-down, hover over OCR Text Recognition, and then choose Recognize Text in Multiple Files Using OCR.
- In the dialog box that opens, click the Add Files button and navigate to the files you want. When the files are selected, click OK.
- The Output Options box will now open. Choose these settings:
  - Under Target Folder, choose Specific Folder and navigate to the folder you have prepared for output.
  - Under Filenaming, choose Add to Original Filename. Under Insert After, type in ".ocr". Uncheck Overwrite Existing Files.
  - Under Output Format, choose Export File(s) to Alternate Format and select Text (Plain) from the drop-down menu.
- Click OK. Large files may take quite a bit of time.
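After a large batch finishes, it can be worth confirming that every input file produced an output. The following is a minimal sketch, not part of the Acrobat workflow itself: it assumes the ".ocr" and Text (Plain) settings above yield files named <original name>.ocr.txt in the target folder, and the function name is hypothetical.

```python
from pathlib import Path


def missing_ocr_outputs(input_files, output_dir):
    """Return the input files whose expected OCR text file is absent or empty.

    The expected name is '<input stem>.ocr.txt' inside output_dir (an assumption
    based on the Filenaming settings). Empty files are reported as well, since
    they may warrant a second look, although a blank page could be legitimate.
    """
    output_dir = Path(output_dir)
    missing = []
    for source in map(Path, input_files):
        expected = output_dir / f"{source.stem}.ocr.txt"
        if not expected.is_file() or expected.stat().st_size == 0:
            missing.append(source)
    return missing
```

Passing the list of files added in the Add Files step and the Target Folder returns the inputs that need to be rerun or checked by hand.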
OCR Process (Mac)
- Open Adobe Acrobat 8 Pro.
- Drag the file into Adobe Acrobat 8 Pro.
- Choose the Document tab across the top. From the drop-down, hover over OCR Text Recognition, and then choose Recognize Text Using OCR.
- On Mac, the file is not saved automatically. Once OCR is done, go to File > Save As > Text (plain) and save to your local transcripts folder.
Unlike Windows/Acrobat 9, which is capable of batch OCR, Mac/Acrobat 8 appears to support OCR for only one item at a time.
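Because each file on the Mac is saved and named by hand, a quick check of the filenames in the local transcripts folder may catch slips. The sketch below is an assumption-heavy illustration: the pattern is inferred from the single example u0003_0001577_0000233.ocr.txt, so the segment lengths may not hold for every collection, and the function name is hypothetical.

```python
import re
from pathlib import Path

# Inferred from one example filename; the letter and digit counts are assumptions.
OCR_NAME = re.compile(r"^[a-z]\d{4}_\d{7}_\d{7}\.ocr\.txt$")


def misnamed_ocr_files(folder):
    """Return sorted names of files in folder that do not match the OCR pattern.

    Hidden files (names starting with '.') and subfolders are skipped.
    """
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file()
        and not entry.name.startswith(".")
        and not OCR_NAME.match(entry.name)
    )
```

A file saved as u0003_0001577_0000234.txt, for instance, would be listed because it lacks the .ocr segment.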