Most Content

From UA Libraries Digital Services Planning and Documentation

Revision as of 09:05, 8 June 2010

The first sections of the attached diagram which pertain to Digital Services are delineated in Preparing_Collections_on_the_S_Drive_for_Online_Delivery_and_Storage.

Here are the steps for creating derivatives and moving content off the share drive:

Getting content from the share drive to live in Acumen

  1. Make sure content is in the Completed directory, and that all quality control scripts have been run, and all corrections made.
  2. Create the MODS from the exported text spreadsheet (Making_MODS), and place them in a MODS directory within the Metadata directory.
  3. Log onto libcontent1 and change into the UploadArea/scripts directory. (see Command-line_Work_on_Linux_Server)
  4. Type in 'makeJpegs' and answer the questions that arise. If errors appear on the command line or in the output file, repair them. The output file is written to the UploadArea/output directory; the script will tell you its timestamped name.
    1. The makeJpegs script will do some minor QC, including checking to see if there is a MODS for each item-level object and an object for each item-level MODS.
    2. If there are no problems, it will copy the MODS files to the UploadArea/MODS directory on the server, copy the transcripts to the UploadArea/transcripts directory on the server, and will generate JPEG derivatives (a thumb ending in _128.jpg and a large image ending in _2048.jpg) and place them in the UploadArea/jpegs directory on the server.
    3. If you choose to OCR all content, it will OCR every TIFF you are uploading; by default it will OCR all transcript TIFFs unless a corresponding text file exists in the upload directory (indicating a corrected transcript).
    4. If you provide an ocrList.txt file in the Admin directory, the selected items and their child files will be OCR'd. The file lists one item number per line; each item to be OCR'd is followed by a tab and then a one ("1"), and items followed by a zero are ignored. If any items on that list are already online, the script will look for the TIFFs in the archive and for existing text files in Acumen; if it finds the former and not the latter, it will OCR the TIFF and place the result in Acumen.
  5. If there were no errors and the script completed these tasks, check the results on the server in the above-named directories. If all is well, proceed to the next step.
  6. In the UploadArea/scripts directory, type in 'relocate_all'. This script will move all the content just uploaded into the correct directories in Acumen. If anything remains in the above-mentioned directories after the script has completed, there is a problem which must be repaired. One thing to check, if there are no errors in the output, is Server Permissions.
  7. After about 24 hours, verify that the uploaded content has been indexed.
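
The pairing check in step 4.1 and the derivative names in step 4.2 can be illustrated with a minimal Python sketch. This is not the makeJpegs code itself. The function names and the sample file names (such as item_0001.tif and item_0001.mods.xml) are hypothetical, and the sketch assumes that an object and its MODS file share the part of the file name before the first dot.

```python
def stem(name: str) -> str:
    """Return the part of a file name before the first dot."""
    return name.split(".", 1)[0]


def pairing_problems(object_files, mods_files):
    """Report objects lacking a MODS file and MODS files lacking an object.

    Assumption: both kinds of file share the same leading identifier.
    """
    objects = {stem(name) for name in object_files}
    mods = {stem(name) for name in mods_files}
    return {
        "objects_without_mods": sorted(objects - mods),
        "mods_without_object": sorted(mods - objects),
    }


def derivative_names(item: str):
    """Return the thumbnail and large JPEG names described in step 4.2."""
    return (f"{item}_128.jpg", f"{item}_2048.jpg")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(pairing_problems(
        ["item_0001.tif", "item_0002.tif"],
        ["item_0001.mods.xml", "item_0003.mods.xml"],
    ))
    print(derivative_names("item_0001"))
```

An empty list under both keys would correspond to the "no problems" case in which the script goes on to copy the MODS files and generate derivatives.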

makeJpegs will be upgraded to include the creation of OCR files for either entire collection contents or item-level materials as specified in a text file containing a list of one item number per line (no file extensions).
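
As a rough illustration of the ocrList.txt format described in step 4 (item number, a tab, then "1" to select or "0" to skip), the following Python sketch shows one way such a file could be read. The function name parse_ocr_list and the sample item numbers are hypothetical; the real script may handle blank lines or malformed entries differently.

```python
def parse_ocr_list(text: str) -> list:
    """Return the item numbers flagged with "1" in ocrList.txt-style text.

    Lines flagged with "0", lines without a flag, and blank lines are skipped.
    """
    selected = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first tab: item number on the left, flag on the right.
        item, _, flag = line.partition("\t")
        if flag.strip() == "1":
            selected.append(item.strip())
    return selected


if __name__ == "__main__":
    # Hypothetical item numbers, for illustration only.
    sample = "item_0001\t1\nitem_0002\t0\nitem_0003\t1\n"
    print(parse_ocr_list(sample))
```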

Getting content from the share drive to the storage server to prepare for archiving

The moveContent script will copy content to the deposits directory on the storage server, where it will be prepared for archiving at a later date.

Additionally, it will check the collection-level xml file (in the Admin directory), then add it to the online database which feeds the collection list online. Thus, this script should NOT be run for new collections until the content is indexed (as described in the previous section); otherwise, the link from the collection listing will be dead.

After making the copy, it will verify that each file copied without alteration, and then delete the copy on the share drive.
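
The copy, verify, then delete sequence can be sketched in Python as follows. This page does not say how moveContent verifies a copy, so comparing SHA-256 checksums is an assumption here, and the function names and paths are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(source: Path, dest_dir: Path) -> bool:
    """Copy source into dest_dir, and delete source only if the copy matches."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / source.name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        # Leave the original in place so the failure stays visible.
        return False
    source.unlink()
    return True
```

Leaving the source file untouched when the checksums differ matches the behavior described on this page: anything still on the share drive afterwards signals a failed copy.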

If any files remain in the directories on the share drive, they did NOT copy to the server! Run the script again, as the copy across the network may have failed. If the rerun also fails, the files will need to be moved manually, and the problem encountered by the script must be resolved.
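
A quick way to look for leftover files after a run is to walk the directory tree and list whatever is still there. The Python sketch below assumes nothing about the scripts themselves; the function name remaining_files and the directory passed to it are hypothetical.

```python
from pathlib import Path


def remaining_files(root: Path) -> list:
    """Return every regular file still present under root, sorted by path."""
    return sorted(path for path in root.rglob("*") if path.is_file())


if __name__ == "__main__":
    # Hypothetical location; substitute the collection directory to check.
    for leftover in remaining_files(Path(".")):
        print(leftover)
```

An empty result means nothing was left behind; any path printed is a file that should be investigated before rerunning the script.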

Archiving content

See lines 9-25 and 31-33 on this page: Moving_Content_To_Long-Term_Storage


The movement of metadata is spelled out in Metadata_Movement and the process for uploading MODS is in Uploading_MODS.

We have yet to determine at what point MODS (and server-side generated OCR) are ready for archiving, but that will be sorted out soon.

Most.png
