(10 intermediate revisions by one user not shown) |
Line 1: |
Line 1: |
| Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder. | | Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder. |
| | | |
− | We are storing content long-term on a Linux server. | + | We are storing content long-term on a SUSE Linux server. |
| | | |
− | We are delivering content on another Linux server via CONTENTdm, and Tonio Loewald of OLT has developed an alternative (Linux-based) delivery solution, [[http://acumen.lib.ua./edu|Acumen]] which runs off a web-enabled version of our storage area. | + | We are delivering content on the same Linux server via Acumen, [http://acumen.lib.ua./edu http://acumen.lib.ua./edu] which runs off a web-enabled version of our storage area. |
| | | |
− | This is the current set of procedures for moving content from Share to long term storage, and linking them into manifests for LOCKSS.
| + | [[If moving from share]] |
− | This procedure should happen on a regular basis:
| + | |
| | | |
− | | + | [[If already in deposits]] (uploaded by staff) |
− | == TO MOVE CONTENT TO STORAGE==
| + | |
− | | + | |
− | | + | |
− | '''ACCESS and IDENTIFICATION'''
| + | |
− | -------------------------
| + | |
− | 1) ssh to libcontent1.lib.ua.edu.
| + | |
− | | + | |
− | 2) To get root access, run: sudo su (if you do not have sudo access, you cannot do this process)
| + | |
− | | + | |
− | 3) To get access to the share drive from the libcontent1 Linux server, run:
| + | |
− | mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
| + | |
− | and type in your password to the share drive. (substitute your own username there in the line above)
| + | |
− | | + | |
− | 4) To determine the status of content in the Digital Services folder on share,
| + | |
− | run: /srv/scripts/qc/status
| + | |
− | This script goes through the Digital_Coll_Complete and Digital_Coll_in_progress folders, and pulls out
| + | |
− | paths to folders that end in _ready (for Mary), _store and _online (for pickup for storage) and also
| + | |
− | _check, for digital services quality control. The lists of these paths are written to files in
| + | |
− | /srv/scripts/qc/lists:
| + | |
− | forMary are things waiting for her;
| + | |
− | checkme are for the DS folks
| + | |
− | storeMe are things ready to go online and be stored
| + | |
− | | + | |
− | 5) Share this output with Digital Services folks, to verify it is correct.
| + | |
− | | + | |
− | | + | |
− | '''MOVE TO LINUX SERVER'''
| + | |
− | ----------------------------------------
| + | |
− | 6) Once verified, look under /cifs-mount for content in share directory.
| + | |
− | Copy it over to /srv/deposits/content/.
| + | |
− | for example:
| + | |
− | cp -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/.
| + | |
− | This does a recursive copy of the entire specified directory and all its contents, and places the copy
| + | |
− | in the /srv/deposits/content directory on the libcontent1 server.
| + | |
− | | + | |
− | 7) do a diff from /srv/deposits/content to the location of the content on share drive area
| + | |
− | to see if we got the content ok; for example:
| + | |
− | diff -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/u0003_0002121_Aston_1939
| + | |
− | look at the output: if none, it's a match. If there's a variation, recopy the specified files,
| + | |
− | and then run the diff again. If it's a big directory, prepend the command with "nohup " so the
| + | |
− | process will continue to run if you log off or lose your connection. The output then will be
| + | |
− | in the nohup.out file in the directory where you ran the "diff" command.
| + | |
− | | + | |
− | 8) Once you're sure you have a clean copy (no output from "diff"), delete the specified directory from
| + | |
− | the share drive. For example: rm -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
| + | |
− | This is a powerful command, a recursive removal of a directory and everything in it,
| + | |
− | including subdirectories. Be careful with it.
| + | |
− | | + | |
− | | + | |
− | | + | |
− | '''PREPARE CONTENT '''
| + | |
− | ------------------------
| + | |
− | | + | |
− | 9) look through the directories in /srv/deposits/content for anomalies, badly named things,
| + | |
− | and new categories -- any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html
| + | |
− | ( this manifest is in /srv/www/htdocs/lockss/). Anomalies might be directories which do NOT match
| + | |
− | the expected structure (Admin, Metadata, Scans, Transcripts) or subfolder structure beneath them;
| + | |
− | spaces in filenames or directories, etc. Report any problems to Digital Services staff and arrange for corrections before
| + | |
− | proceeding further
| + | |
− | | + | |
− | 10) move any item-level text files out of scans to S:\Digital Projects\Administrative\collectionInfo\forMDlib\itemMD\
| + | |
− | and notify Mary
| + | |
− |
| + | |
− | 11) RUN testIncoming (in /srv/scripts/qc; script available here: [[Image:TestIncoming.txt]]) and look at the output for problems.
| + | |
− | | + | |
− | | + | |
− | | + | |
− | '''UPDATE DATABASE WITH COLLECTION INFO'''
| + | |
− | ------------------------------------
| + | |
− | 12) If the collection has an icon: make sure it is in the Admin folder, named for the collection.
| + | |
− | For example, u0003_0002121.icon.png for the collection above. (You will need to copy this
| + | |
− | to the CONTENTdm server (content.lib.ua.edu) and place it in the /usr/local/Content4/docs/cdm4/images
| + | |
− | directory, with 664 access rights "chmod 664 xyz.png"). List this icon in /srv/scripts/collstuff/icons,
| + | |
− | so the following script will know not to apply a default icon link.
| + | |
− | | + | |
− | 13) Then go to the collstuff directory /srv/scripts/collstuff and run: collToDbase
| + | |
− | to put the collection info in the database. This looks through the Admin folders in the top level
| + | |
− | directories placed in /srv/deposits, for collection-numbered XML files. It parses through the content,
| + | |
− | repairing encoding errors, puts the content into mysql InfoTrack.allColls, adds the expected canned link,
| + | |
− | and asks you for each collection and each delivery system whether or not the collection is online.
| + | |
− | The software will give you the expected URL to test; if it fails, and you know the content is in the system,
| + | |
− | locate the correct URL and provide it to the software, which will ask for a correction.
| + | |
− | | + | |
− | 14) Check the database entries... mysql, InfoTrack.allColls. For example:
| + | |
− | mysql -u jlderidder -p
| + | |
− | (enter password)
| + | |
− | use InfoTrack;
| + | |
− | select * from digColls where id_2009 like "u0003_0002121" and AnalogOrDigital like "D";
| + | |
− | (when done, enter "quit;")
| + | |
− | If not okay, make repairs from command line.
| + | |
− | | + | |
− | 15) Verify that links and display are correct. They will show up online here: [[http://www.lib.ua.edu/digital/browse|http://www.lib.ua.edu/digital/browse]].
| + | |
− | | + | |
− | | + | |
− | | + | |
− | | + | |
− | '''MOVE CONTENT TO ARCHIVE'''
| + | |
− | ----------------------------------
| + | |
− |
| + | |
− | 16)
| + | |
− | a) open /srv/scripts/storing/relocating (script available here: [[Image:Relocating.txt]]) and set $test = 1.
| + | |
− | b) run "relocating > look". It will:
| + | |
− | i) list the files it's going to move, from where and to where in "moveme"
| + | |
− | ii) write the new manifests into RelocateManifests so you can see what they're going to look like
| + | |
− | iii) write the parent manifests to parentMans, so you can see what they're going to look like
| + | |
− | iv) write errors to "look" file, as well as other info
| + | |
− | c) Look at the output in the "look" file, and the manifests concatenated in RelocateManifests and parentMans.
| + | |
− | | + | |
− | 17) Make whatever repairs are necessary for script to work. If there are any new categories (z0004?),
| + | |
− | then add them, and create top level manifests in those category directories (copy from like directories
| + | |
− | and modify to fit).
| + | |
− | | + | |
− | 18) Open /srv/scripts/storing/relocating and edit out the line $test = 1;
| + | |
− | | + | |
− | 19) Empty the file /srv/scripts/storing/RelocateManifests, and copy moveme to another filename.
| + | |
− | | + | |
− | 20) WARNING! Before running /srv/scripts/storing/relocating, do a "chmod -R 755 /srv/archive". This will enable you to
| + | |
− | write to the directories as root
| + | |
− | | + | |
− | 21) run "relocating > look" again, this time for real. It will take hours. Occasionally check the output in "look" and the
| + | |
− | archive directory for errors.
| + | |
− | | + | |
− | | + | |
− | | + | |
− | | + | |
− | '''SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL'''
| + | |
− | -------------------------------------------------
| + | |
− | | + | |
− | 22) If the output tells you to modify or create manifests, do that next (if not done already)
| + | |
− | in top level of /srv/www/htdocs/lockss directory
| + | |
− | The script does NOT create the top 2 levels of manifests, as it does not have sufficient
| + | |
− | information about new categories to fill this in.
| + | |
− | | + | |
− | 23) check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html -- drill down.
| + | |
− | If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
| + | |
− | and restart the apache web server: '/usr/sbin/apache2ctl restart'
| + | |
− | (be sure to change it back, and then restart the web server)
| + | |
− | | + | |
− | 24) then run 'checkem > look' in /srv/scripts/storing (script available here: [[Image:Checkem.txt]])
| + | |
− | -- it will use the "moveme" file to
| + | |
− | verify md5 sums of content that was moved,
| + | |
− | and delete from the deposits directory if there's a match. If there's NOT a match, it will output
| + | |
− | "ERROR: " and the error. So look through the "look file" for problems.
| + | |
− | Again, if there's a lot of content, precede this command with "nohup " and then look later in the
| + | |
− | nohup.out file for the content that normally would go into "look".
| + | |
− |
| + | |
− | 25) run checkArchive (script available here: [[Image:CheckArchive.txt]]) and look at ArchiveERRORS; make repairs. If needed, alter and run doubleCheck to make sure things were copied over; checkMans to check that what's linked in the Manifest actually copied; addToMans to add content that was moved but not written to the Manifest. Problems occur especially when the server goes down during a transfer. Be especially aware that dozens of tiffs may have been copying when it went down, meaning multiple images may be corrupted or damaged.
| + | |
− |
| + | |
− | 26) do a "chmod -R 555 /srv/archive" to protect the archive from unauthorized alteration until next time.
| + | |
− | | + | |
− | 27) then go look in the /srv/deposits directory, make sure folders are clean, and delete them.
| + | |
− | I do this: ls /srv/deposits/content/*
| + | |
− | ls /srv/deposits/content/*/*
| + | |
− | ls /srv/deposits/content/*/*/*
| + | |
− | ls /srv/deposits/content/*/*/*/*
| + | |
− | if you find content, there's a problem. Solve it!! It may be necessary to rename things and go back
| + | |
− | through the steps again, from step 13 onward.
| + | |
− | if no content: rm -r /srv/deposits/content/*
| + | |
− | (this will recursively delete all content in the directory)
| + | |
− | | + | |
− | 28) go to /srv/scripts/surfacing and run copychange (script available here: [[Image:Copychange.txt]]) to create derivatives from tiffs and wave files, and to copy transcripts over to the web directory for access.
| + | |
− | | + | |
− | 29) then run /srv/scripts/surfacing/repair to remediate the doubles created from tiffs which include both thumbs and large images.
| + | |
− | | + | |
− | 30) cd /home/Jeremiah and locate the MODS for what was just moved, in the MODS directory. Run "relocate.pl" to send them live.
| + | |
− | | + | |
− | 31) run /srv/scripts/metadata/findMissing to find out what doesn't have metadata, what doesn't have derivatives -- troubleshoot.
| + | |
− | | + | |
− | 32) Either index these directories immediately from the Acumen admin interface
| + | |
− | ([[http://acumen.lib.ua.edu/admin|http://acumen.lib.ua.edu/admin]]) or wait a little while for the cron indexer to pick
| + | |
− | up the content, then check the output: [[http://acumen.lib.ua.edu/|http://acumen.lib.ua.edu/]]. Content should appear
| + | |
− | already in the web directories located here:
| + | |
− | [[http://libcontent1.lib.ua.edu/content/|http://libcontent1.lib.ua.edu/content/]]. See [[Find our content online]] for
| + | |
− | detailed instructions.
| + | |
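The checksum comparison that step 24 attributes to checkem can be illustrated with a short sketch. This is not the checkem script (which is linked from step 24, and whose "moveme" file format is not shown here); same_md5 is a hypothetical name, and the sketch assumes GNU md5sum is available on the server.

```shell
# Hypothetical helper: report whether two files have the same MD5 checksum.
# Prints "OK: ..." on a match and "ERROR: ..." otherwise.
same_md5() {
    # Refuse to compare if either file is missing.
    if [ ! -f "$1" ] || [ ! -f "$2" ]; then
        echo "ERROR: missing file ($1 or $2)"
        return 1
    fi
    # Reading from stdin keeps the file name out of md5sum's output.
    sum_a=$(md5sum < "$1" | cut -d ' ' -f 1)
    sum_b=$(md5sum < "$2" | cut -d ' ' -f 1)
    if [ "$sum_a" = "$sum_b" ]; then
        echo "OK: $2"
    else
        echo "ERROR: checksum mismatch for $2"
        return 1
    fi
}
```

Unlike checkem as described in step 24, this sketch never deletes anything; it only reports, so it is safe to run while troubleshooting.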
− | | + | |
− | | + | |
− | | + | |
− | ''The current, working version of these instructions is located here on libcontent1: /srv/scripts/metadata/storing/DOTHIS.''
| + | |
− | | + | |
− | | + | |
− | | + | |
− | | + | |
− | ''updated 22 February 2010'' [[User:Jlderidder|Jlderidder]]
| + | |
Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.
We are storing content long-term on a SUSE Linux server.