''The current, working version of these instructions is located here on libcontent1: /srv/scripts/storing/DOTHIS.''
''updated 13 October 2010'' [[User:Jlderidder|Jlderidder]]
Revision as of 13:22, 16 May 2011
This is the current set of procedures for moving content from Share to long-term storage and linking it into manifests for LOCKSS.
These steps should happen in order, on a regular basis (weekly, if content is available):
TO MOVE CONTENT TO STORAGE
ACCESS and IDENTIFICATION
1) ssh to libcontent1.lib.ua.edu.
2) To get root access, run: sudo su (if you do not have sudo access, you cannot do this process)
3) To get access to the share drive from the libcontent1 Linux server, run: mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount (substituting your own username for jlderidder), then type in your password for the share drive when prompted.
4) Look through the directories in /srv/deposits/content for anomalies, badly named things, and new categories, meaning any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html (this manifest is in /srv/www/htdocs/lockss/). Anomalies might be directories which do NOT match the expected structure (Admin, Metadata, Scans, Transcripts) or the subfolder structure beneath them, spaces in filenames or directory names, etc. Report any problems to Digital Services staff and arrange for corrections before proceeding further.
5) Copy any *ocrList.txt files in the Admin directories to /srv/deposits/ocrMe for separate processing.
6) RUN testDeposits (in /srv/scripts/qc; script available here: File:TestDeposits.txt) and look at the output for problems. If this script runs across subdirectories named "Box" or "Folder" then it will spit out an error and tell you to run testBoxFolder (File:TestBoxFolder.txt) instead. This is for collections that are digitized without item-level metadata, with box and folder information in the filename to assist in linking from the correct part of the EAD. So if you get this message, run testBoxFolder on the specified collection(s) and look at the output for any problems.
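Part of the manual inspection in step 4 can be sketched in shell. This is only a minimal sketch, not one of the scripts in /srv/scripts and not a replacement for testDeposits: the function names find_space_names and find_unexpected_dirs are made up for illustration, and the second one assumes the four expected directories (Admin, Metadata, Scans, Transcripts) sit directly under the directory you pass in.

```shell
# Hypothetical helper: print every file or directory under the given root
# whose name contains a space.
find_space_names() {
    find "$1" -name '* *' -print
}

# Hypothetical helper: print the immediate subdirectories of the given
# directory that are NOT one of the four expected names from step 4.
find_unexpected_dirs() {
    for d in "$1"/*/; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -d "$d" ] || continue
        name=$(basename "$d")
        case "$name" in
            Admin|Metadata|Scans|Transcripts) ;;
            *) printf '%s\n' "$1/$name" ;;
        esac
    done
}
```

For example, find_space_names /srv/deposits/content would list the names to report to Digital Services staff; no output means no spaces were found. Neither function changes anything on disk.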
MOVE CONTENT TO ARCHIVE
7) a) Open /srv/scripts/storing/relocating (script available here: File:Relocating.txt) and set $test = 1.
b) Run "relocating > look". It will:
i) list the files it's going to move, from where and to where, in "moveme"
ii) write the new manifests into RelocateManifests so you can see what they're going to look like
iii) write the parent manifests to parentMans, so you can see what they're going to look like
iv) write errors to the "look" file, as well as other info
c) Look at the output in the "look" file, and the manifests concatenated in RelocateManifests and parentMans.
8) Make whatever repairs are necessary for the script to work. If there are any new categories (z0004?), then add them, and create top level manifests in those category directories (copy from like directories and modify to fit).
9) WARNING! Before running /srv/scripts/storing/relocating, do a "chmod -R 755 /srv/archive". This will enable you to write to the directories as root.
10) Open /srv/scripts/storing/relocating and edit out the line $test = 1;
11) Empty the file /srv/scripts/storing/RelocateManifests, copy moveme to another filename, and delete nohup.out.
12) Run "nohup relocating" -- this time for real. It will take hours; "nohup" keeps it running even if you disconnect. The output will be written to "nohup.out" -- check it occasionally, and check the archive directory for errors. This time it really copies files into the correct location in the archive (making a note of what goes where in "moveme" for the "checkem" script in a few steps), versions metadata and xml files, links versioned files, tiffs, and wave files into the correct manifests, and links new collections into the manifest pages of the holders (the next directory up from collections, such as u0003 for all Hoole Manuscript content).
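The housekeeping in step 11 could be wrapped up roughly as follows. This is a sketch under assumptions, not part of the relocating script: the function name prepare_real_run and the timestamped backup name (moveme.YYYYMMDD-HHMMSS) are invented here, and the working directory is passed in as an argument so the function can be tried on a scratch copy before pointing it at /srv/scripts/storing.

```shell
# Hypothetical helper sketching step 11. $1 is the directory that holds
# moveme, RelocateManifests and nohup.out.
prepare_real_run() {
    dir=$1
    stamp=$(date +%Y%m%d-%H%M%S)
    # Keep the file list from the test run under another name
    # (the naming scheme is an assumption).
    cp "$dir/moveme" "$dir/moveme.$stamp" || return 1
    # Empty the concatenated manifests without deleting the file itself.
    : > "$dir/RelocateManifests" || return 1
    # Remove the output of any earlier nohup run; -f ignores a missing file.
    rm -f "$dir/nohup.out"
}
```

It stops before touching RelocateManifests if the copy of moveme fails, so the record of the test run is not lost by accident.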
SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL
13) If the output tells you to modify or create manifests, do that next (if not done already) in the top level of the /srv/www/htdocs/lockss directory. The script does NOT create the top 2 levels of manifests, as it does not have sufficient information about new categories to fill this in.
14) Check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html -- drill down. If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP, and restart the apache web server: '/usr/sbin/apache2ctl restart' (be sure to change it back afterwards, and then restart the web server again)
15) Then run 'checkem > look' in /srv/scripts/storing (script available here: File:Checkem.txt) -- it will use the "moveme" file to verify md5 sums of content that was moved, and delete from the deposits directory if there's a match. If there's NOT a match, it will output "ERROR: " and the error. So look through the "look" file for problems. Again, if there's a lot of content, precede this command with "nohup " and then look later in the nohup.out file for the content that normally would go into "look".
16) Run checkArchive (script available here: File:CheckArchive.txt) and look at ArchiveERRORS; make repairs. If needed, alter and run doubleCheck to make sure things were copied over; checkMans to check that what's linked in the Manifest actually copied; addToMans to add content that was moved but not written to the Manifest. Problems occur especially when the server goes down during a transfer. Be especially aware that dozens of tiffs may have been copying when it went down, meaning multiple images are corrupted or damaged.
17) Do a "chmod -R 755 /srv/archive" to protect the archive from unauthorized alteration until next time.
18) copy current versions of EADs for content out of the archive to notInDbase and process for linking new content. Use directories in /srv/deposits for guides as to which collections to link.
19) Then go look in the /srv/deposits directory, make sure the folders are clean, and delete them. I do this:
ls /srv/deposits/content/*
ls /srv/deposits/content/*/*
ls /srv/deposits/content/*/*/*
ls /srv/deposits/content/*/*/*/*
If you find content, there's a problem. Solve it!! It may be necessary to rename things and go through steps 13 onward again.
If there is no content: rm -r /srv/deposits/content/* (this will recursively delete all content in the directory)
20) run ocrSelected (File:OcrSelected.txt) in /srv/scripts/surfacing. This will go through the *ocrList files in /srv/deposits/ocrMe, select out the items marked with a "1" -- locate the tiffs, create OCR files, create directories for them, and place them live in the content directory where they belong.
21) check results, and delete files in /srv/deposits/ocrMe
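Two of the checks above, the md5 comparison behind step 15 and the hunt for leftover content in step 19, can be illustrated with a small sketch. These are hypothetical helpers, not the checkem script itself: the names same_md5 and leftover_files are invented, nothing here reads the "moveme" file (its format is not described on this page), and it assumes the GNU md5sum command is available, as it normally is on a Linux server.

```shell
# Hypothetical helper: succeed only if both files exist and have the same
# MD5 checksum.
same_md5() {
    [ -f "$1" ] && [ -f "$2" ] || return 1
    sum_a=$(md5sum < "$1" | cut -d' ' -f1)
    sum_b=$(md5sum < "$2" | cut -d' ' -f1)
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}

# Hypothetical helper: print every regular file still present anywhere under
# the given directory, at any depth. Empty output means only (possibly
# empty) directories remain.
leftover_files() {
    find "$1" -type f -print
}
```

leftover_files /srv/deposits/content covers the same ground as the series of ls commands in step 19 without needing to guess how deep the tree goes. Neither function deletes anything; the rm -r in step 19 stays a deliberate manual step.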