Uploading Gandrud Mass Content

From UA Libraries Digital Services Planning and Documentation

Revision as of 12:03, 24 July 2013

Preparation

Follow the digitization and quality control procedures as outlined here.


A Note on Processing Multiple Batches

Multiple batches MUST be processed through uploads (1) at the exact same time OR (2) completely separately.

If you are not uploading multiple batches at the same time, make sure each batch has made it all the way through the process, including eadLive, before beginning the next batch. This lets the EAD be updated to incorporate already-indexed material without picking up changes for material that hasn't been indexed yet, which would create dead links.


Preliminary steps on the server

Note: you may not need to take these steps. The share should be mounted already.

  • First, make sure that the Windows share is mounted. In the SSH window on libcontent1, type: `ls /cifs-mount`. If no listing appears, or the window hangs, you need to mount the share drive.
  • To mount the Windows drive, type this on the command line on libcontent1: `sudo mount -t cifs -o username=jjcolonnaromano,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount` and enter Jeremiah's password for the share. If successful, the `ls` command from the previous step will show you the directories within the Digital Projects folder on Share.
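
The mount check above can be wrapped in a small shell function so that a hung share does not tie up the terminal. This is only a minimal sketch and is not one of the existing scripts: the function name `share_is_mounted` and the 10-second limit are assumptions, and it relies on the GNU coreutils `timeout` command being available on libcontent1.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MassContent scripts).
# Succeeds only if the directory lists at least one entry within the time limit.
share_is_mounted() {
    dir="$1"
    limit="${2:-10}"   # seconds to wait before treating the listing as hung (assumed value)
    # `timeout` (GNU coreutils) stops ls if the share hangs.
    listing=$(timeout "$limit" ls -A "$dir" 2>/dev/null) || return 1
    # An empty listing means nothing is mounted there.
    [ -n "$listing" ]
}

# Example use:
#   share_is_mounted /cifs-mount || echo "Share not mounted; run the mount command."
```

If the function fails, run the mount command from the step above and check again.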


Scripts

  1. Log in to SSH and open a command-line window. Type in: `cd MassContent/scripts`. This will change your working directory to the one where the scripts are.
  2. Run makeMassJpegs in the MassContent/scripts directory
    1. Type in: `makeMassJpegs`
    2. This script will ask you for the collection number, and will also ask which directories you want JPEGs created for (do NOT use it for transcripts, or their JPEGs will overwrite the content images).
    3. The script will create JPEGs for all the tiffs in the directories you've chosen, and will place them all in the MassContent/jpegs directory on the server.
    4. After creating all the JPEGs it will correct for any multiples created due to thumbnails in the TIFFs.
  3. Spot check the output
    1. There should be 2 jpegs for every tiff. One ends in _128, and one in _2048.
    2. These correspond to the number of maximum pixels to a side. The one ending in "_128.jpg" is a thumbnail. The one ending in "_2048.jpg" is a large image used in display.
  4. Run linkContent in MassContent/scripts directory
    1. type in: `linkContent`
    2. This script asks for the collection number, looks at all the "_2048" size jpegs in the jpeg directory that belong to this collection, and asks you a series of questions.
    3. It will look through your match file, find the box/folder for each item, make sure there's at least one large jpeg for it, and match up filenames with their box/folder location in the finding aid.
    4. Once the item is located in the EAD, it will obtain a persistent identifier link for each item from the MySQL database, and update the EAD in the EAD directory.
    5. It will also generate the MODS file for the item, using the "unittitle" for the folder as a prefix for "Item x" in the item title, filling out the template, and placing the new MODS (with new PURL) in the MassContent/MODS directory. The unmodified EAD version will have been copied to the MassContent/backups directory.
    6. This script depends upon a template specifically developed for your collection in the templates directory; without it, this will fail.
  5. Check the output file
    1. Look in the output directory for the latest Linking_README file (they are timestamped). Look for "ERROR" and note any warnings. If necessary, make repairs to files and rerun.
    2. If the EAD was trashed, the previous copy was placed in the backups directory with a timestamp on it. You can replace the trashed one with the previous version and rerun if you need to.
  6. Run makeMassLive in MassContent/scripts directory
    1. type in: `makeMassLive`
    2. This script distributes the jpegs and MODS in Acumen.
    3. It will delete the local copies of MODS and jpegs, so you can reuse the same directories over and over.
    4. Watch for errors on the command line.
  7. Run findMissing in UploadArea/scripts directory
    1. move up to the home area and down into the scripts directory there (cd ../../scripts), and run `findMissing`
    2. This script will hunt through Acumen to make sure there is a MODS file for every item, and at least one derivative for each MODS file. Any errors will be found in the output file written to the scripts/output directory.
    3. If errors are found, regenerate those MODS and/or JPEGs/MP3s and rerun makeMassLive. Then run this script again to ensure all errors have been remedied.
  8. Run moveMassContent in MassContent/scripts directory
    1. Return to the MassContent area (cd ../MassContent/scripts) and type in: `moveMassContent`
    2. This script will check for a collection icon and for collection information (you can upload a new collection xml file if you need to make changes).
    3. It will copy the content to the deposits directory on the storage drive.
    4. After checking the copy, it will delete the version on the share drive, including empty scans directories. Thus, anything left behind did NOT make it to the storage drive for some reason; don't delete it.
    5. It also creates a timestamped output file in the MassContent/outputs directory, and it will put errors there, as well as indicate when it has completed.
    6. moveMassContent will also create OCR for content specified in the match file.
  9. Wait for indexing to occur
    1. This should happen overnight, unless Will has stopped automatic indexing or changed his indexing schedule.
    2. Check online to locate one of the files you just uploaded. For example, if one of them was u0003_0000555_0420123, then look for http://acumen.lib.ua.edu/u0003_0000555_0420123 -- if it's there, indexing completed.
  10. Run eadLive in MassContent/scripts directory
    1. This script copies the updated EAD to the web directory.
    2. It's important to wait for indexing to be complete so the links in the EAD are valid.
  11. Check the output file from the moveMassContent script
    1. Also check the directories on the share drive, to see if anything did NOT get moved successfully to the storage drive.
    2. If anything failed, rerun the script; if it still does not move the content, flag the directory with "_NotInStorage" added to its header, and report it to Jody.
  12. Check the website and make sure everything is hunky dory.  :-)
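
The spot check in step 3 can be partly automated. The sketch below is an illustration only, not one of the existing scripts. It assumes that TIFFs end in `.tif` and that each derivative is named after the TIFF's base name plus `_128.jpg` or `_2048.jpg`. The function name `check_derivatives` and its two directory arguments are hypothetical, so adjust them to the actual layout.

```shell
#!/bin/sh
# Hypothetical helper: report TIFFs that lack a _128 or _2048 JPEG.
# Usage: check_derivatives TIFF_DIR JPEG_DIR
check_derivatives() {
    tiff_dir="$1"
    jpeg_dir="$2"
    missing=0
    for tiff in "$tiff_dir"/*.tif; do       # assumes a .tif extension
        [ -e "$tiff" ] || continue          # skip when the glob matched nothing
        base=$(basename "$tiff" .tif)
        for size in 128 2048; do
            if [ ! -e "$jpeg_dir/${base}_${size}.jpg" ]; then
                echo "MISSING: ${base}_${size}.jpg"
                missing=$((missing + 1))
            fi
        done
    done
    # Non-zero exit status if anything was missing.
    [ "$missing" -eq 0 ]
}
```

Run it after makeMassJpegs and before linkContent, passing the scans directory and the MassContent/jpegs directory. Any `MISSING` line names a derivative to regenerate.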