Creating OCR Files
- For information on OCR versions of transcripts, see Transcripts.
OCR (Optical Character Recognition) is the process of electronically converting typewritten or printed text in an image into machine-editable text. OCR is a step in our digitization process: we perform it on all images that contain typewritten or printed text.
The following instructions cover creating OCR during the digitization process. For information on creating OCR on the server (Linux), and the script for doing so, see For_Creating_Derivatives.
OCR files are saved in a folder called Transcripts within the collection directory. For large collections, create OCR folders within the Transcripts folder that correspond to the Scans folders (for example, OCR text files from Scans_3 go in a folder called OCR_3 within the Transcripts folder). OCR text files are named in the following format, e.g. u0003_0001577_0000233.ocr.txt
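The folder and naming convention can be expressed as a small path helper. This is a minimal sketch that assumes numbered scan folders are named `Scans_N` and sit directly in the collection directory next to Transcripts, and that a scan file's base name (for example `u0003_0001577_0000233.tif`) is the identifier to reuse. The function name `ocr_path_for_scan` is hypothetical, not part of any existing script.

```python
import re
from pathlib import Path


def ocr_path_for_scan(scan_file: Path, collection_dir: Path) -> Path:
    """Return where the OCR text for a scan should be saved.

    Hypothetical helper. Assumes numbered scan folders are named Scans_N
    and live directly in collection_dir.
    """
    transcripts = collection_dir / "Transcripts"
    # Keep the scan's base identifier (e.g. u0003_0001577_0000233)
    # and append the ".ocr.txt" suffix used for OCR text files.
    filename = scan_file.stem + ".ocr.txt"

    # Large collections: Scans_3 maps to Transcripts/OCR_3.
    match = re.fullmatch(r"Scans_(\d+)", scan_file.parent.name)
    if match:
        return transcripts / f"OCR_{match.group(1)}" / filename
    # Otherwise the OCR file goes directly in Transcripts.
    return transcripts / filename
```

For example, with a collection directory assumed to be called `collection`, `ocr_path_for_scan(Path("collection/Scans_3/u0003_0001577_0000233.tif"), Path("collection")).as_posix()` gives `collection/Transcripts/OCR_3/u0003_0001577_0000233.ocr.txt`.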
OCR Process (Windows)
- Open Adobe Acrobat 9 Pro.
- From the Document menu across the top, hover over OCR Text Recognition, and then choose Recognize Text in Multiple Files Using OCR.
- In the dialog box that opens, click the Add Files button and navigate to the file(s) you want. When the files are selected, click OK.
- The Output Options box will now open. Choose these settings:
  - Under Target Folder, choose Specific Folder and navigate to the folder you have prepared for output.
  - Under Filenaming, choose Add to Original Filename. Under Insert After, type ".ocr". Uncheck Overwrite Existing Files.
  - Under Output Format, choose Export File(s) to Alternate Format and select Text (Plain) from the drop-down menu.
- Click OK. Processing large files may take a long time.
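Because a batch run on large files can take a long time, it may help to confirm afterward that every input produced a text file. This minimal sketch assumes the Filenaming settings above yield `<original name>.ocr.txt` in the target folder (for example, `u0003_0001577_0000233.pdf` becomes `u0003_0001577_0000233.ocr.txt`) and that the inputs are PDFs. The function name `missing_ocr_outputs` is hypothetical.

```python
from pathlib import Path


def missing_ocr_outputs(input_dir: Path, target_dir: Path, pattern: str = "*.pdf") -> list[str]:
    """List input files that have no matching .ocr.txt file in target_dir.

    Hypothetical helper. Assumes each input <stem>.pdf should have produced
    <stem>.ocr.txt in the target folder.
    """
    missing = []
    for source in sorted(input_dir.glob(pattern)):
        expected = target_dir / (source.stem + ".ocr.txt")
        if not expected.exists():
            missing.append(source.name)
    return missing
```

Calling `missing_ocr_outputs(Path("input folder"), Path("output folder"))` returns the names of any inputs that still need OCR, so only those need to be re-added in Acrobat.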
OCR Process (Mac)
- Open Adobe Acrobat 8 Pro.
- Drag the file into the Adobe Acrobat 8 Pro window.
- From the Document menu across the top, hover over OCR Text Recognition, and then choose Recognize Text Using OCR.
- On the Mac, the result is not saved automatically. Once OCR is complete, go to File > Save As > Text (Plain) and save the file to your local Transcripts folder.
Unlike Acrobat 9 on Windows, which can run OCR on multiple files in a batch, Acrobat 8 on the Mac appears to support OCR on only one file at a time.
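Since each Mac file is saved by hand, the saved names may not follow the `.ocr.txt` convention. This minimal sketch assumes the text files were saved as `<original name>.txt` in the local transcripts folder (Acrobat 8's default Save As name is an assumption here, not something these instructions confirm) and renames them to `<original name>.ocr.txt`. Like the Windows setting that leaves Overwrite Existing Files unchecked, it skips any file whose target name already exists. The function name `add_ocr_suffix` is hypothetical.

```python
from pathlib import Path


def add_ocr_suffix(folder: Path) -> list[str]:
    """Rename <name>.txt to <name>.ocr.txt and return the new file names.

    Hypothetical helper. Assumes plain-text exports were saved as <name>.txt.
    Existing .ocr.txt files are left alone and never overwritten.
    """
    renamed = []
    # sorted() materializes the listing before any renames happen.
    for path in sorted(folder.glob("*.txt")):
        if path.name.endswith(".ocr.txt"):
            continue  # already follows the naming convention
        target = path.with_name(path.stem + ".ocr.txt")
        if target.exists():
            continue  # do not overwrite an existing OCR file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Running `add_ocr_suffix(Path("Transcripts"))` after a session of one-at-a-time saves would bring the files in line with the naming format before they are moved into the collection directory.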