Moving Content To Long-Term Storage

From UA Libraries Digital Services Planning and Documentation
 
Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.
 
We are storing content long-term on a SUSE Linux server.

We are delivering content on the same Linux server via Acumen, [http://acumen.lib.ua.edu  http://acumen.lib.ua.edu], which runs off a web-enabled version of our storage area.

This is the current set of procedures for moving content from Share to long-term storage and linking it into manifests for LOCKSS.  This procedure should happen on a regular basis.

[[If moving from share]]

[[If already in deposits]]  (uploaded by staff)

== TO MOVE CONTENT TO STORAGE ==

'''ACCESS and IDENTIFICATION'''
-------------------------
1) ssh to libcontent1.lib.ua.edu.

2) To get root access, run: sudo su  (if you do not have sudo access, you cannot perform this process)

3) To get access to the share drive from the libcontent1 Linux server, run:
  mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
  and type in your password to the share drive.  (Substitute your own username in the line above.)

4) To determine the status of content in the Digital Services folder on share,
  run: /srv/scripts/qc/status
  This script goes through the Digital_Coll_Complete and Digital_Coll_in_progress folders and pulls out
  paths to folders that end in _ready (for Mary), _store and _online (for pickup for storage), and
  _check (for Digital Services quality control).  The lists of these paths are written to files in
  /srv/scripts/qc/lists:
  forMary lists things waiting for her;
  checkme lists things for the DS folks to check;
  storeMe lists things ready to go online and be stored.
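
The status script itself is not reproduced on this page.  As a rough sketch of the sorting it is described as doing, the following assumes a hypothetical helper named classify_folders; only the folder suffixes and the three list file names come from the step above, and the demo runs against a scratch directory rather than /cifs-mount.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real /srv/scripts/qc/status script.
# classify_folders LISTDIR ROOT... : sort suffix-tagged folders into list files.
classify_folders() {
  local listdir="$1"; shift
  mkdir -p "$listdir"
  : > "$listdir/forMary"; : > "$listdir/checkme"; : > "$listdir/storeMe"
  # Only directories whose names end in one of the four suffixes matter.
  find "$@" -type d \( -name '*_ready' -o -name '*_store' \
                     -o -name '*_online' -o -name '*_check' \) | sort |
  while IFS= read -r dir; do
    case "$dir" in
      *_ready)          echo "$dir" >> "$listdir/forMary" ;;
      *_check)          echo "$dir" >> "$listdir/checkme" ;;
      *_store|*_online) echo "$dir" >> "$listdir/storeMe" ;;
    esac
  done
}

# Demo against a scratch tree (folder names are made up for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/Digital_Coll_Complete/u0003_0000001_store" \
         "$demo/Digital_Coll_Complete/u0003_0000002_check" \
         "$demo/Digital_Coll_in_progress/u0003_0000003_ready" \
         "$demo/Digital_Coll_in_progress/u0003_0000004_online" \
         "$demo/Digital_Coll_in_progress/u0003_0000005"
classify_folders "$demo/lists" "$demo/Digital_Coll_Complete" "$demo/Digital_Coll_in_progress"
cat "$demo/lists/storeMe"
```

Folders without a recognized suffix (the last one in the demo) are simply left out of every list, which matches the idea that they are still in progress.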

5) Share this output with Digital Services folks, to verify it is correct.


'''MOVE TO LINUX SERVER'''
----------------------------------------
6) Once verified, look under /cifs-mount for content in the share directory.
  Copy it over to /srv/deposits/content/.
  For example:
  cp -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/.
  This does a recursive copy of the entire specified directory and all its contents, and places the copy
  in the /srv/deposits/content directory on the libcontent1 server.

7) Run a diff between the copy in /srv/deposits/content and the original location of the content on the share drive
  to see if we got the content ok; for example:
  diff -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939  /srv/deposits/content/u0003_0002121_Aston_1939
  Look at the output:  if there is none, it's a match.  If there's a variation, recopy the specified files,
  and then run the diff again.  If it's a big directory, prepend the command with "nohup "  so the
  process will continue to run if you log off or lose your connection.  The output then will be
  in the nohup.out file in the directory where you ran the "diff" command.
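
Steps 6 and 7 can be combined into one helper so the diff is never skipped.  This is only a sketch: copy_and_verify is a hypothetical name, and the demo uses a scratch directory standing in for /cifs-mount and /srv/deposits/content.  It deliberately does not delete anything from the source.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one collection folder, then confirm the copy with diff -r.
copy_and_verify() {
  local src="${1%/}" dest_parent="${2%/}" name differences
  name=$(basename "$src")
  cp -r "$src" "$dest_parent/" || return 1
  # diff -r exits 0 with no output only when the two trees match.
  if differences=$(diff -r "$src" "$dest_parent/$name"); then
    echo "MATCH: $name"
  else
    echo "MISMATCH: $name"
    printf '%s\n' "$differences"
    return 1
  fi
}

# Demo in a scratch directory (paths and file names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/share/u0003_demo/Scans" "$demo/deposits"
echo "page one" > "$demo/share/u0003_demo/Scans/0001.txt"
result=$(copy_and_verify "$demo/share/u0003_demo" "$demo/deposits")
echo "$result"

# Simulate a damaged copy to show the failure path.
echo "corrupted" > "$demo/deposits/u0003_demo/Scans/0001.txt"
diff -r "$demo/share/u0003_demo" "$demo/deposits/u0003_demo" > /dev/null \
  || echo "copy differs, recopy needed"
```

For a big directory the same call could be run under nohup, as described in step 7.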

8) Once you're sure you have a clean copy (no output from "diff"), delete the specified directory from
    the share drive.  For example:  rm -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
    This is a powerful command, a recursive removal of a directory and everything in it,
    including subdirectories.  Be careful with it.


'''PREPARE CONTENT'''
------------------------

9) Look through the directories in /srv/deposits/content for anomalies, badly named things,
  and new categories (any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html;
  this manifest is in /srv/www/htdocs/lockss/).  Anomalies might be directories which do NOT match
  the expected structure (Admin, Metadata, Scans, Transcripts) or the subfolder structure beneath them,
  spaces in filenames or directories, etc.  Report any problems to Digital Services staff and arrange for
  corrections before proceeding further.
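
Part of the check in step 9 can be automated.  The sketch below assumes that Admin, Metadata, Scans and Transcripts sit directly inside each collection folder, which is one reading of the expected structure; find_anomalies is a hypothetical name, and the manifest comparison is not covered.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report spaces in names and unexpected folders
# directly under each collection directory.
find_anomalies() {
  local content_root="${1%/}" collection sub
  # Any file or directory whose own name contains a space is reported.
  find "$content_root" -mindepth 1 -name '* *' | sort | sed 's/^/SPACE IN NAME: /'
  for collection in "$content_root"/*/; do
    [ -d "$collection" ] || continue
    for sub in "$collection"*/; do
      [ -d "$sub" ] || continue
      case "$(basename "$sub")" in
        Admin|Metadata|Scans|Transcripts) ;;   # assumed expected layout
        *) echo "UNEXPECTED FOLDER: ${sub%/}" ;;
      esac
    done
  done
}

# Demo tree (collection and file names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/u0003_0000001/Admin" "$demo/u0003_0000001/Scans" \
         "$demo/u0003_0000001/Extra Stuff"
touch "$demo/u0003_0000001/Scans/page 1.tif"
report=$(find_anomalies "$demo")
echo "$report"
```

Anything the sketch reports would still need to go to Digital Services staff for correction, as the step says.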

10) Move any item-level text files out of the Scans folders to S:\Digital Projects\Administrative\collectionInfo\forMDlib\itemMD\
  and notify Mary.

11) Run testIncoming (in /srv/scripts/qc;  script available here:  [[Image:TestIncoming.txt]]) and look at the output for problems.


'''UPDATE DATABASE WITH COLLECTION INFO'''
------------------------------------
12) If the collection has an icon, make sure it is in the Admin folder, named for the collection.
  For example, u0003_0002121.icon.png for the collection above.  (You will need to copy this
  to the CONTENTdm server (content.lib.ua.edu) and place it in the /usr/local/Content4/docs/cdm4/images
  directory, with 664 access rights: "chmod 664 xyz.png".)  List this icon in /srv/scripts/collstuff/icons,
  so the following script will know not to apply a default icon link.

13) Then go to the collstuff directory, /srv/scripts/collstuff, and run: collToDbase
  to put the collection info in the database.  This looks through the Admin folders in the top-level
  directories placed in /srv/deposits for collection-numbered XML files.  It parses the content,
  repairs encoding errors, puts the content into mysql InfoTrack.allColls, adds the expected canned link,
  and asks you, for each collection and each delivery system, whether or not the collection is online.
  The software will give you the expected URL to test;  if it fails, and you know the content is in the system,
  locate the correct URL and provide it to the software, which will ask for a correction.

14) Check the database entries in mysql, InfoTrack.allColls.  For example:
  mysql -u jlderidder -p
  (enter password)
  use InfoTrack;
  select * from digColls where id_2009 like "u0003_0002121" and AnalogOrDigital like "D";
  (when done, enter "quit;")
  If not okay, make repairs from the command line.

15) Verify that links and display are correct.  They will show up online here: [http://www.lib.ua.edu/digital/browse  http://www.lib.ua.edu/digital/browse].



'''MOVE CONTENT TO ARCHIVE'''
----------------------------------
16)
  a) Open /srv/scripts/storing/relocating (script available here: [[Image:Relocating.txt]]) and set $test = 1.
  b) Run "relocating > look".  It will:
    i) list the files it's going to move, from where and to where, in "moveme"
    ii) write the new manifests into RelocateManifests so you can see what they're going to look like
    iii) write the parent manifests to parentMans, so you can see what they're going to look like
    iv) write errors to the "look" file, as well as other info
  c) Look at the output in the "look" file, and the manifests concatenated in RelocateManifests and parentMans.

17) Make whatever repairs are necessary for the script to work.  If there are any new categories (z0004?),
  then add them, and create top-level manifests in those category directories (copy from like directories
  and modify to fit).

18) Open /srv/scripts/storing/relocating and edit out the line $test = 1;

19) Empty the file /srv/scripts/storing/RelocateManifests, and copy moveme to another filename.

20) WARNING!  Before running /srv/scripts/storing/relocating, do a "chmod -R 755 /srv/archive".  This will enable you to
    write to the directories as root.

21) Run "relocating > look" again, this time for real.  It will take hours.  Occasionally check the output in "look" and the
    archive directory for errors.



'''SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL'''
-------------------------------------------------

22) If the output tells you to modify or create manifests, do that next (if not done already)
    in the top level of the /srv/www/htdocs/lockss directory.
    The script does NOT create the top 2 levels of manifests, as it does not have sufficient
    information about new categories to fill this in.

23) Check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html  -- drill down.
    If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
    and restart the apache web server:  '/usr/sbin/apache2ctl restart'
    (be sure to change it back, and then restart the web server)

24) Then run 'checkem > look' in /srv/scripts/storing  (script available here:  [[Image:Checkem.txt]]).
    It will use the "moveme" file to verify md5 sums of content that was moved,
    and delete from the deposits directory if there's a match.  If there's NOT a match, it will output
    "ERROR: "  and the error.  So look through the "look" file for problems.
    Again, if there's a lot of content, precede this command with "nohup " and then look later in the
    nohup.out file for the content that normally would go into "look".
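
The kind of check step 24 describes can be sketched as follows.  The layout of the real "moveme" file is not documented here, so this assumes one tab-separated "source, destination" pair per line; verify_moves is a hypothetical name, md5sum is the GNU coreutils tool, and unlike checkem this sketch only reports and never deletes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real checkem script.
# Assumed list format: source<TAB>destination, one pair per line.
verify_moves() {
  local list="$1" src dest src_sum dest_sum status=0
  while IFS=$'\t' read -r src dest; do
    [ -n "$src" ] || continue
    if [ ! -f "$src" ] || [ ! -f "$dest" ]; then
      echo "ERROR: missing file for $src -> $dest"
      status=1
      continue
    fi
    # Reading from stdin keeps the file name out of md5sum's output.
    src_sum=$(md5sum < "$src" | cut -d' ' -f1)
    dest_sum=$(md5sum < "$dest" | cut -d' ' -f1)
    if [ "$src_sum" = "$dest_sum" ]; then
      echo "OK: $src"
    else
      echo "ERROR: md5 mismatch for $src -> $dest"
      status=1
    fi
  done < "$list"
  return "$status"
}

# Demo with one good copy and one damaged copy (file names are made up).
demo=$(mktemp -d)
echo "good scan" > "$demo/a.tif"; cp "$demo/a.tif" "$demo/a_archived.tif"
echo "good scan" > "$demo/b.tif"; echo "truncated" > "$demo/b_archived.tif"
printf '%s\t%s\n' "$demo/a.tif" "$demo/a_archived.tif" \
                  "$demo/b.tif" "$demo/b_archived.tif" > "$demo/moveme"
report=$(verify_moves "$demo/moveme" || true)
echo "$report"
```

Keeping the reporting separate from deletion makes it possible to read the ERROR lines before anything is removed from deposits.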

25) Run checkArchive (script available here:  [[Image:CheckArchive.txt]]) and look at ArchiveERRORS; make repairs.
  If needed, alter and run doubleCheck to make sure things were copied over;  checkMans to check that what's linked in
  the Manifest actually copied; addToMans to add content that was moved but not written to the Manifest.  Problems occur
  especially when the server goes down during a transfer.  Be especially aware that dozens of tiffs may have been copying
  when it went down, meaning multiple images are corrupted or damaged.

26) Do a "chmod -R 555 /srv/archive" to protect the archive from unauthorized alteration until next time.
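
Because the archive is opened up with chmod before the move and locked again many steps later, it is easy to forget the second chmod.  One possible way to pair the two, shown only as a sketch with a hypothetical wrapper named with_writable_archive and a scratch directory in place of /srv/archive:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: open the archive, run one command, lock it again.
with_writable_archive() {
  local archive="$1"; shift
  chmod -R 755 "$archive" || return 1
  "$@"
  local status=$?
  # Restore the read-only mode whether or not the command succeeded.
  chmod -R 555 "$archive"
  return "$status"
}

# Demo on a scratch archive (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/archive/u0003"
chmod -R 555 "$demo/archive"
with_writable_archive "$demo/archive" touch "$demo/archive/u0003/new_item.txt"
ls -ld "$demo/archive"
```

This only fits if the follow-up checks do not themselves need to write to the archive; otherwise the manual ordering in the numbered steps still applies.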

27) Then go look in the /srv/deposits directory, make sure folders are clean, and delete them.
    I do this:  ls /srv/deposits/content/*
                ls /srv/deposits/content/*/*
                ls /srv/deposits/content/*/*/*
                ls /srv/deposits/content/*/*/*/*
    If you find content, there's a problem.  Solve it!!  It may be necessary to rename things and go back
      through steps 13 on again.
    If no content:  rm -r /srv/deposits/content/*
      (this will recursively delete all content in the directory)
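
The repeated ls calls can be replaced by a single find that looks at every depth.  The sketch below uses a hypothetical helper named clear_if_empty and, as a cautious substitution, removes the leftover folders with rmdir rather than rm -r, since rmdir refuses to delete anything that still has content.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to clear a deposits tree that still holds files.
clear_if_empty() {
  local content_dir="${1%/}" leftovers
  # Anything that is not a directory counts as leftover content.
  leftovers=$(find "$content_dir" -mindepth 1 ! -type d | sort)
  if [ -n "$leftovers" ]; then
    echo "PROBLEM: content remains under $content_dir"
    printf '%s\n' "$leftovers"
    return 1
  fi
  # Only empty folder shells remain; -depth removes children before parents.
  find "$content_dir" -mindepth 1 -depth -type d -exec rmdir {} \;
  echo "CLEAN: $content_dir"
}

# Demo: one tree with only empty folders, one with a stray file (made-up names).
demo=$(mktemp -d)
mkdir -p "$demo/clean/u0003_0000001/Scans" "$demo/dirty/u0003_0000002/Scans"
touch "$demo/dirty/u0003_0000002/Scans/0001.tif"
clean_result=$(clear_if_empty "$demo/clean")
dirty_result=$(clear_if_empty "$demo/dirty" || true)
echo "$clean_result"
echo "$dirty_result"
```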

28) Go to /srv/scripts/surfacing and run copychange (script available here:  [[Image:Copychange.txt]])
    to create derivatives from tiffs and wave files, and to copy transcripts over to the web directory for access.

29) Then run /srv/scripts/surfacing/repair to remediate the doubles created from tiffs which include both thumbs and large images.

30) cd /home/Jeremiah and locate the MODS for what was just moved, in the MODS directory.  Run "relocate.pl" to send them live.

31) Run /srv/scripts/metadata/findMissing to find out what doesn't have metadata and what doesn't have derivatives, then troubleshoot.

32) Either index these directories immediately from the Acumen admin interface
  ([http://acumen.lib.ua.edu/admin  http://acumen.lib.ua.edu/admin]) or wait a little while for the cron indexer to pick
  up the content, then check the output: [http://acumen.lib.ua.edu/  http://acumen.lib.ua.edu/].  Content should already
  appear in the web directories located here:
  [http://libcontent1.lib.ua.edu/content/  http://libcontent1.lib.ua.edu/content/].  See [[Find our content online]] for
  detailed instructions.


''The current, working version of these instructions is located here on libcontent1:  /srv/scripts/metadata/storing/DOTHIS.''


''updated 22 February 2010'' [[User:Jlderidder|Jlderidder]]

Latest revision as of 15:15, 18 December 2012
