Mass Content

From UA Libraries Digital Services Planning and Documentation

Latest revision as of 10:46, 14 April 2016

Note: By "Mass Content" we mean a collection that is digitized without manually created item-level metadata and is intended for delivery online via links from the finding aid.

For the process specific to the Pauline Jones Gandrud papers, see Uploading Gandrud Mass Content.

Preliminary steps on the server

  1. Make sure the Windows share is mounted. In the SSH window on libcontent, type `ls /cifs-mount`. If no listing appears, or the window hangs, you need to mount the share drive (step 2). Otherwise, proceed to the workflow below.
  2. To mount the Windows drive, type this on the command line on libcontent: `sudo mount -t cifs -o username=jjcolonnaromano,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount` and enter Jeremiah's password for Share when prompted. If the mount succeeds, `ls /cifs-mount` from step 1 will list the directories within the Digital Projects folder on Share.
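Because `ls /cifs-mount` can hang when the share is unreachable, a check that never touches the share can be handier. Below is a minimal Python sketch, assuming a Linux server where the kernel's mount table is readable at `/proc/mounts`; `is_cifs_mounted` and `share_is_mounted` are hypothetical helpers, not part of the Mass Content software.

```python
from __future__ import annotations

from pathlib import Path


def is_cifs_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point is listed as a CIFS mount in /proc/mounts-style text.

    Each line has the form: device mount_point fs_type options dump pass.
    The kernel escapes spaces in paths as \\040, so decode that before comparing.
    """
    target = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        point = fields[1].replace("\\040", " ")
        if point == target and fields[2] == "cifs":
            return True
    return False


def share_is_mounted(mount_point: str = "/cifs-mount") -> bool:
    # Reads the kernel's mount table only; it does not access the share itself.
    return is_cifs_mounted(mount_point, Path("/proc/mounts").read_text())
```

If `share_is_mounted()` returns False, run the mount command from step 2.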

Here's the workflow for getting a Mass Content collection live

(Software here: Mass_Content_Software; Diagram of this workflow here: MassContent.png)

1) starting on the Share drive, make sure all TIFFs are in a Scans folder in S:\Digital Projects\Digital_Coll_Complete\{collection_number}_whatever, where {collection_number} is the number of your collection and "whatever" is the text you've appended to it so you can find your collection by name.
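A quick pre-check for this step can confirm that the collection folder exists and that no TIFFs sit outside its Scans folder. This Python sketch assumes only the `{collection_number}_whatever/Scans` layout described above; `find_collection_dirs` and `stray_tiffs` are hypothetical helpers, and a folder name such as `252_SampleCollection` is a made-up example.

```python
from __future__ import annotations

from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}


def find_collection_dirs(complete_root: Path, collection_number: str) -> list[Path]:
    """Directories named {collection_number}_whatever directly under complete_root."""
    prefix = collection_number + "_"
    return sorted(p for p in complete_root.iterdir()
                  if p.is_dir() and p.name.startswith(prefix))


def stray_tiffs(collection_dir: Path) -> list[Path]:
    """TIFF files anywhere in collection_dir that are NOT inside its Scans folder."""
    scans = collection_dir / "Scans"
    return sorted(p for p in collection_dir.rglob("*")
                  if p.is_file()
                  and p.suffix.lower() in TIFF_SUFFIXES
                  and scans not in p.parents)
```

Matching on `collection_number + "_"` keeps collection 252 from also picking up a folder for collection 2520.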

2) use the script S:\Digital Projects\Administrative\scripts\qc\BoxFolderCheck.pl to check filenames; rerun it until it reports no errors. File:BoxFolderCheck.txt

3) open an SSH session on libcontent and type: `cd MassContent/scripts`. This changes your working directory to the one containing the scripts. Screenshot of directory structure: Massdirs.jpg

4) run makeMassJpegs in MassContent/scripts directory (File:MakeMassJpegs.txt): type in: `makeMassJpegs`. This script asks for the collection number and for the directories you want JPEGs created for (do NOT use it for transcripts, or they will overwrite the content images). It creates JPEGs for all the TIFFs in the chosen directories and places them in the MassContent/jpegs directory on the server. After creating the JPEGs, it corrects for the extra copies produced by the thumbnails embedded in the TIFFs.
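Each JPEG size is defined by a maximum number of pixels per side (the "_128" and "_2048" suffixes checked in the next step). The arithmetic can be sketched in Python as follows, assuming the aspect ratio is preserved and that scans already smaller than the limit are not enlarged; this illustrates the sizing rule and is not the makeMassJpegs code itself.

```python
from __future__ import annotations


def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most max_side.

    Keeps the aspect ratio; images already within the limit keep their size
    (assumption: the real script does not upscale small scans).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For a 6000 x 4000 scan this gives 2048 x 1365 for the display image and 128 x 85 for the thumbnail.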

5) Spot check the output. There should be 2 JPEGs for every TIFF: one ending in "_128.jpg" and one ending in "_2048.jpg". The number is the maximum number of pixels on a side. The "_128.jpg" file is a thumbnail; the "_2048.jpg" file is the large image used for display.
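The spot check can be automated by pairing each TIFF with its two expected JPEGs. This Python sketch assumes each JPEG name is the TIFF's base name plus `_128.jpg` or `_2048.jpg`; `missing_derivatives` and `spot_check` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path

SIZES = (128, 2048)


def missing_derivatives(tiff_names: list[str], jpeg_names: set[str]) -> dict[str, list[str]]:
    """Map each TIFF to the expected NAME_128.jpg / NAME_2048.jpg files that are absent."""
    report: dict[str, list[str]] = {}
    for tiff in tiff_names:
        stem = Path(tiff).stem
        expected = [f"{stem}_{size}.jpg" for size in SIZES]
        absent = [name for name in expected if name not in jpeg_names]
        if absent:
            report[tiff] = absent
    return report


def spot_check(scans_dir: Path, jpeg_dir: Path) -> dict[str, list[str]]:
    # Compare the TIFFs in one Scans folder against the JPEG output directory.
    tiffs = sorted(p.name for p in scans_dir.iterdir() if p.suffix.lower() in {".tif", ".tiff"})
    jpegs = {p.name for p in jpeg_dir.iterdir() if p.suffix.lower() == ".jpg"}
    return missing_derivatives(tiffs, jpegs)
```

An empty result means every TIFF has both derivatives.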

6) run linkContent in MassContent/scripts directory: type in: `linkContent`. This script asks for the collection number, looks at all the "_2048" JPEGs in the jpegs directory that belong to this collection, and asks you a series of questions about how the finding aid labels its boxes. Some finding aids put the collection number before the box number; some left-pad box numbers with zeros to a fixed number of digits; and some add prefixes to box numbers that are not used in the filenames. For example, "252.012" is box 12 for collection MSS 252, but "415" is box 15 for collection MSS 535. The script needs this information to match filenames with their box/folder location in the finding aid. Once an item is located in the EAD, the script obtains a persistent identifier link for it from the MySQL database and updates the EAD in the EAD directory, copying the older version to the MassContent/backups directory. File:LinkContent.txt
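The three box-labeling conventions can be illustrated with a small normalizing function. This Python sketch assumes the two examples above are representative; `box_from_label` and its parameters are hypothetical, and linkContent's actual prompts and matching logic may differ.

```python
from __future__ import annotations


def box_from_label(label: str, collection: str | None = None, prefix: str = "") -> int:
    """Recover the plain box number from a finding-aid box label.

    collection: collection number written before the box, e.g. "252" in "252.012"
    prefix: extra leading characters not used in filenames, e.g. "4" in "415"
    Zero padding such as "012" disappears when the remainder is parsed with int().
    """
    text = label.strip()
    if collection and text.startswith(collection + "."):
        text = text[len(collection) + 1:]
    if prefix and text.startswith(prefix):
        text = text[len(prefix):]
    return int(text)
```

With these rules, `box_from_label("252.012", collection="252")` and `box_from_label("012")` both yield 12, and `box_from_label("415", prefix="4")` yields 15.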

7) check the output directory for the latest Linking_README file (they are timestamped). Look for "ERROR" and note any warnings. If necessary, repair the files and rerun. If the EAD was trashed, the previous copy was placed in the backups directory with a timestamp on it; you can replace the trashed EAD with that previous version and rerun if you need to.
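Finding the newest report and pulling out its ERROR lines can be sketched in Python as below; the same helpers would work for the MODS_README files in step 9. This assumes the newest file by modification time is also the one with the latest embedded timestamp; `latest_report` and `error_lines` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def latest_report(output_dir: Path, prefix: str) -> Path | None:
    """Newest file in output_dir whose name starts with prefix (e.g. "Linking_README").

    Assumption: file modification time matches the timestamp in the filename.
    """
    candidates = [p for p in output_dir.iterdir() if p.is_file() and p.name.startswith(prefix)]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def error_lines(report: Path) -> list[str]:
    # Lines containing "ERROR"; undecodable bytes are replaced rather than raising.
    return [line for line in report.read_text(errors="replace").splitlines() if "ERROR" in line]
```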

8) run generateMods in MassContent/scripts directory: type in: `generateMods`. This again asks for the collection number, then pulls the new PURL links from the database, creates minimal MODS for each item, and places them in the MassContent/MODS directory; it also copies them to the deposits directory for archiving with the TIFFs. File:GenerateMods.txt This script depends on a template developed specifically for your collection; without one, it will fail. An example, the template for the Joshua Foster collection, can be viewed here: Example_MODS_template
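To show what a "minimal" MODS record might contain, here is a Python sketch that builds one from just a title and a PURL using the MODS v3 namespace. The choice of `titleInfo/title` and `location/url` is an assumption for illustration; generateMods fills a collection-specific template instead, and `minimal_mods` is a hypothetical name.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def minimal_mods(title: str, purl: str) -> str:
    """Build a bare-bones MODS record holding a title and the item's persistent link."""
    def q(tag: str) -> str:
        # ElementTree's {namespace}tag notation.
        return f"{{{MODS_NS}}}{tag}"

    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    location = ET.SubElement(mods, q("location"))
    ET.SubElement(location, q("url")).text = purl
    return ET.tostring(mods, encoding="unicode")
```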

9) check the output directory for the latest MODS_README file (they are timestamped). Look for "ERROR" and note any warnings. If necessary, repair the files and rerun. The script overwrites existing MODS only if they have NOT yet been distributed live.

10) run makeMassLive script in MassContent/scripts directory (File:MakeMassLive.txt): type in: `makeMassLive`. This distributes the JPEGs and MODS to Acumen. It also deletes the local copies of the MODS and JPEGs, so you can reuse the same directories over and over. Watch for errors on the command line.

11) move up to the home area and down into the scripts directory there (cd ../../scripts), and run `findMissing` (File:FindMissing.txt). This script searches Acumen to make sure there is a MODS file for every item and at least one derivative for each MODS file. Any errors are written to the output file in the scripts/output directory. If errors are found, regenerate those MODS and/or JPEGs/MP3s and rerun makeMassLive, then run this script again to ensure all errors have been remedied.
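The two checks findMissing performs amount to set differences over item identifiers. This Python sketch assumes you already have the expected item identifiers and the MODS and derivative filenames, that identifiers contain no dots, and that derivatives end in `_128`/`_2048` or carry no size suffix (as an MP3 might); `item_id` and `find_missing` are hypothetical names.

```python
from __future__ import annotations

SIZE_SUFFIXES = {"128", "2048"}


def item_id(filename: str) -> str:
    """Strip everything after the first dot and any _128/_2048 size suffix."""
    stem = filename.split(".", 1)[0]
    base, _, tail = stem.rpartition("_")
    return base if base and tail in SIZE_SUFFIXES else stem


def find_missing(item_ids: set[str], mods_files: set[str],
                 derivative_files: set[str]) -> dict[str, set[str]]:
    mods_ids = {item_id(name) for name in mods_files}
    derivative_ids = {item_id(name) for name in derivative_files}
    return {
        # Items with no MODS record at all.
        "items_without_mods": item_ids - mods_ids,
        # MODS records with no JPEG or MP3 derivative.
        "mods_without_derivative": mods_ids - derivative_ids,
    }
```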

12) run moveMassContent script in MassContent/scripts directory (cd ../MassContent/scripts) (File:MoveMassContent.txt): type in: `moveMassContent`. This script checks for a collection icon and for collection information (you can upload a new collection XML file if you need to make changes), then copies the content to the deposits directory on the storage drive. After verifying the copy, it deletes the version on the share drive, including empty Scans directories. Thus, anything left behind did NOT make it to the storage drive for some reason; don't delete it. The script also creates a timestamped output file in the MassContent/outputs directory, where it records any errors and indicates when it has completed.
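The "copy, verify, then delete" pattern that makes leftovers meaningful can be sketched in Python. The source says only that the script checks the copy; SHA-256 comparison is a choice made for this sketch, the icon and collection-XML checks are omitted, and `move_verified` is a hypothetical name.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src_dir: Path, dest_dir: Path) -> list[Path]:
    """Copy every file under src_dir into dest_dir, deleting each source file only
    after its copy's checksum matches. Returns the files left behind on the source side."""
    left_behind: list[Path] = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dest = dest_dir / src.relative_to(src_dir)
        try:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            verified = sha256(src) == sha256(dest)
        except OSError:
            verified = False
        if verified:
            src.unlink()
        else:
            left_behind.append(src)
    # Remove subdirectories the move emptied (e.g. Scans), deepest first.
    subdirs = sorted((p for p in src_dir.rglob("*") if p.is_dir()),
                     key=lambda p: len(p.parts), reverse=True)
    for directory in subdirs:
        if not any(directory.iterdir()):
            directory.rmdir()
    return left_behind
```

Because a source file is removed only after its copy verifies, anything still on the share is exactly what failed.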

13) Wait for indexing to occur. This can take about 24 hours. If you want, you can watch the content being deposited on the storage server in the deposits/MassContent directory on libcontent.  :-)

14) Check online to locate one of the files you just uploaded. For example, if one of them was u0003_0000252_2010003, look for http://acumen.lib.ua.edu/u0003_0000252_2010003; if it's there, indexing has completed.
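This check can be scripted with Python's standard library. The sketch assumes that an item not yet indexed makes Acumen return an HTTP error status rather than a normal page; if Acumen instead serves a "not found" page with status 200, a content check would be needed. `item_url` and `is_indexed` are hypothetical names.

```python
from __future__ import annotations

import urllib.request

ACUMEN_BASE = "http://acumen.lib.ua.edu/"


def item_url(item_id: str) -> str:
    return ACUMEN_BASE + item_id


def is_indexed(item_id: str, timeout: float = 10.0) -> bool:
    """True if the item's Acumen page answers with HTTP 200.

    urlopen raises HTTPError for error statuses and URLError for network
    failures; both are subclasses of OSError.
    """
    try:
        with urllib.request.urlopen(item_url(item_id), timeout=timeout) as response:
            return response.getcode() == 200
    except OSError:
        return False
```

For example, `is_indexed("u0003_0000252_2010003")` requests http://acumen.lib.ua.edu/u0003_0000252_2010003.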

15) Once indexing has completed, run eadLive (File:EadLive.txt) to copy the EAD to the web directory. It's important to wait until indexing is complete so that the links in the EAD are valid.

16) Check the output file from the moveMassContent script, and also check the directories on the share drive, to see if anything did NOT get moved successfully to the storage drive. If anything failed, rerun the script; if it still does not move the content, flag the directory by appending "_NotInStorage" to its name, and report it to Jody.
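Flagging a leftover directory is a simple rename. This Python sketch is meant to be run on one specific collection directory after the rerun has failed, not over the whole share (which also holds collections not yet processed); `flag_if_not_in_storage` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

FLAG = "_NotInStorage"


def flag_if_not_in_storage(collection_dir: Path) -> Path | None:
    """Append _NotInStorage to a collection directory that still holds files.

    Returns the new path, or None if the directory is already flagged or has no files.
    """
    if collection_dir.name.endswith(FLAG):
        return None
    if not any(p.is_file() for p in collection_dir.rglob("*")):
        return None
    flagged = collection_dir.with_name(collection_dir.name + FLAG)
    collection_dir.rename(flagged)
    return flagged
```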

17) Check the website and make sure everything is hunky dory.  :-)

18) exit out of secure shell. Good work!!

See a visualization of the upload process: Mass Content Upload Process