Moving Content To Long-Term Storage

From UA Libraries Digital Services Planning and Documentation
Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.

We are storing content long-term on a SUSE Linux server.

We are delivering content on the same Linux server via Acumen, [http://acumen.lib.ua./edu  http://acumen.lib.ua./edu], which runs off a web-enabled version of our storage area. The steps below were written when content was delivered on another Linux server via CONTENTdm, and Tonio Loewald of OLT was developing an alternative (Linux-based) delivery solution, which runs off our storage area.

[[If moving from share]]

[[If already in deposits]] (uploaded by staff)

This is the current set of procedures for moving content from Share to long-term storage, and linking it into manifests for LOCKSS.
This procedure should happen every Friday:

              ''' TO MOVE CONTENT TO STORAGE  -- every Friday'''
 
'''ACCESS and IDENTIFICATION'''
-------------------------
1) ssh to libcontent1.lib.ua.edu.

2) To get root access, run: sudo su  (if you do not have sudo access, you cannot do this process)

3) To get access to the share drive from the libcontent1 Linux server, run:
  mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
  and type in your password to the share drive.  (Substitute your own username in the line above.)
 
4) To determine the status of content in the Digital Services folder on share,
  run: /srv/scripts/qc/status
  This script goes through the Digital_Coll_Complete and Digital_Coll_in_progress folders, and pulls out
  paths to folders that end in _ready (for Mary), _store and _online (for pickup for storage), and
  _check (for Digital Services quality control).  The lists of these paths are written to files in
  /srv/scripts/qc/lists:
  forMary lists things waiting for her;
  checkme lists things for the DS folks to check;
  storeMe lists things NOT going to CONTENTdm, but ready to store;
  online lists things ALREADY in CONTENTdm, ready to store.

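The suffix matching described in step 4 can be sketched roughly as below. This is not the real /srv/scripts/qc/status script; the function name list_by_suffix is made up for illustration, and it only assumes that folders are marked by a name ending such as _store or _check.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the real /srv/scripts/qc/status script.
# Prints every directory under the given root whose name ends in the given suffix.
list_by_suffix() {
    local root="$1" suffix="$2"
    find "$root" -type d -name "*${suffix}" | sort
}

# Example (assumes the share drive is mounted as in step 3):
# list_by_suffix /cifs-mount/Digital_Coll_Complete _store
```

Running it once per suffix (_ready, _store, _online, _check) would give four lists comparable to the files in /srv/scripts/qc/lists.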
 
5) Share this output with Digital Services folks, to verify it is correct.


'''MOVE TO LINUX SERVER'''
----------------------------------------
6) Once verified, look under /cifs-mount for the content in the share directory.
  Copy it over to /srv/deposits/content/.
  For example:
  cp -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/.
  This does a recursive copy of the entire specified directory and all its contents, and places the copy
  in the /srv/deposits/content directory on the libcontent1 server.
 
7) Do a diff between the copy in /srv/deposits/content and the original location on the share drive
  to see if we got the content ok; for example:
  diff -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939  /srv/deposits/content/u0003_0002121_Aston_1939
  Look at the output:  if there is none, it's a match.  If there's a variation, recopy the specified files,
  and then run the diff again.  If it's a big directory, prepend the command with "nohup " so the
  process will continue to run if you log off or lose your connection.  The output will then be
  in the nohup.out file in the directory where you ran the "diff" command.

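Steps 6 and 7 can be combined into one small function, sketched here. The name copy_and_verify is hypothetical (no such script is mentioned in this procedure); it relies only on cp -r and diff -r behaving as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a collection directory into a deposit directory,
# then compare the two trees with diff -r, as in steps 6 and 7.
copy_and_verify() {
    local src="$1" dest_dir="$2"
    local name
    name="$(basename "$src")"
    cp -r "$src" "$dest_dir"/ || return 1
    # diff -r prints nothing and exits 0 when the trees match.
    if diff -r "$src" "$dest_dir/$name"; then
        echo "MATCH: $name"
    else
        echo "MISMATCH: $name" >&2
        return 1
    fi
}

# Example:
# copy_and_verify /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content
```

For a big directory, the same nohup advice applies: put the call in a small script and start that script with "nohup ".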
 
8) Once you're sure you have a clean copy (no output from "diff"), delete the specified directory from
    the share drive.  For example:  rm -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
    This is a powerful command:  a recursive removal of a directory and everything in it,
    including subdirectories.  Be careful with it.

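Because rm -r is unforgiving, one option is to tie the delete to the diff result so the original can only be removed when the copy matches. The sketch below is an illustration, not part of the existing scripts; remove_if_identical is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: delete the original directory only when the copy
# exists and diff -r reports no differences.
remove_if_identical() {
    local original="$1" copy="$2"
    if [ -d "$copy" ] && diff -r "$original" "$copy" > /dev/null; then
        rm -r "$original"
        echo "removed: $original"
    else
        echo "kept: $original (copy missing or different)" >&2
        return 1
    fi
}

# Example:
# remove_if_identical /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 \
#                     /srv/deposits/content/u0003_0002121_Aston_1939
```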
 
'''UPDATE DATABASE WITH COLLECTION INFO'''
------------------------------------
9) If the collection has an icon:  make sure it is in the Admin folder, named for the collection.
  For example, u0003_0002121.icon.png for the collection above.  (You will need to copy this
  to the CONTENTdm server (content.lib.ua.edu) and place it in the /usr/local/Content4/docs/cdm4/images
  directory, with 664 access rights: "chmod 664 xyz.png".)  List this icon in /srv/scripts/collstuff/icons,
  so the following script will know not to apply a default icon link.

10) Then go to the collstuff directory, /srv/scripts/collstuff, and run: collToDbase
  to put the collection info in the database.  This looks through the Admin folders in the top-level
  directories placed in /srv/deposits, for collection-numbered XML files.  It parses the content,
  repairs encoding errors, puts the content into mysql InfoTrack.digColls, adds the expected canned link,
  and asks you for each collection whether or not the collection is online.  To answer, copy the title given and enter it
  into the CONTENTdm advanced search interface as the exact phrase in the Relation:isPartOf field.  Run the search and make sure
  this brings up the content, as this is the link created by the software.  Note that the links created
  are the ones expected for CONTENTdm -- the software needs updating when Tonio's software goes live.
  (It will need to ask where the content is live, verify the URL it creates, and allow you to correct the link
  on the command line.)
 
11) Check it there, in mysql, InfoTrack.digColls.  For example:
  mysql -u jlderidder -p
  (enter password)
  use InfoTrack;
  select * from digColls where id_2009 like "u0003_0002121";
  (when done, enter "quit;")
  If not okay, make repairs from the command line.
 
[Verify that links are correct. Update about.php on CONTENTdm server if collection page PHP is not yet written.]

12) Copy the metadata spreadsheets to /srv/scripts/metadata/spreadsheets, and make sure they get into the next batch
    processing of MODS files.
 
{Following step applies if we are keeping a use copy of the database on content.lib.ua.edu}
[Verify that links are correct. Update about.php on CONTENTdm server if collection page PHP is not yet written.]

12b) Back up the database and copy it to the content.lib.ua.edu server so the collection page will be updated
    by the new entries in the digColls table that you marked as "online."
****    NOTE:  do this early or late in the day to avoid disruptions to users. ******
  a) On libcontent1:
      i)  cd /srv/mysql  (this changes your directory location)
      ii) then back up the existing database:
      mysqldump -u jlderidder -p InfoTrack > ./backups/InfoTrack_20090625.sql (where 20090625 is today's date)
      iii) then copy it to your home directory on the CONTENTdm server --
        substitute your name, use your password:
      scp backups/InfoTrack_20090625.sql jlderidder@content.lib.ua.edu:.
    b) On content.lib.ua.edu :
      i) ssh there;
      ii) 'su root' first, to get root access
      iii) move the file where we need it:
        mv InfoTrack_20090625.sql /usr/local/mysql/var/backups/.
      iv) cd /usr/local/mysql/var/    (change your directory location)
      v) back up the existing database:  use yesterday's date, so as not to overwrite the new version:
      mysqldump -u jlderidder -p InfoTrack > ./backups/InfoTrack_20090624.sql (where 20090624 is yesterday)
      vi) drop the old database:  (NOTE:  the database will go down until you have refilled it!)
          mysqladmin -u jlderidder -p drop InfoTrack
      vii) create it again:
          mysqladmin -u jlderidder -p create InfoTrack
      viii) refill it with the newer backed-up version:
          mysql -u firestar -p InfoTrack < backups/InfoTrack_20090625.sql

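The dated filenames in step 12b do not have to be typed by hand. A minimal sketch, assuming GNU date (normal on Linux) for the -d option; backup_name is a hypothetical helper, not an existing script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a dated backup filename such as InfoTrack_20090625.sql.
# Assumes GNU date, whose -d option accepts words like "today" and "yesterday".
backup_name() {
    local when="${1:-today}"
    echo "InfoTrack_$(date -d "$when" +%Y%m%d).sql"
}

# Usage on libcontent1 (substitute your own username):
# mysqldump -u jlderidder -p InfoTrack > "./backups/$(backup_name today)"
# Usage on content.lib.ua.edu, for the safety copy of the old database:
# mysqldump -u jlderidder -p InfoTrack > "./backups/$(backup_name yesterday)"
```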
 
'''PREPARE & MOVE CONTENT TO ARCHIVE'''
----------------------------------
13) Look through the directories in /srv/deposits/content for anomalies, badly named things,
  and new categories --  any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html
  (this manifest is in /srv/www/htdocs/lockss/).  Anomalies might be directories which do NOT match
  the expected structure (Admin, Metadata, Scans, Transcripts) or the subfolder structure beneath them,
  spaces in filenames or directories, etc.

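Part of the visual check in step 13 can be automated. The sketch below is not testIncoming or any other existing script; find_anomalies is a made-up name, and it assumes each collection directory sits directly under the deposits directory with Admin, Metadata, Scans, and Transcripts folders inside it. It does not check for new categories or the deeper subfolder structure.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check, NOT one of the real scripts in /srv/scripts.
# Reports (1) any file or directory whose name contains a space, and
# (2) any folder directly inside a collection directory that is not
#     Admin, Metadata, Scans, or Transcripts.
find_anomalies() {
    local deposits="$1"
    find "$deposits" -mindepth 1 -name '* *' | sort | sed 's/^/SPACE: /'
    find "$deposits" -mindepth 2 -maxdepth 2 -type d \
        ! -name Admin ! -name Metadata ! -name Scans ! -name Transcripts |
        sort | sed 's/^/UNEXPECTED: /'
}

# Example:
# find_anomalies /srv/deposits/content
```

No output means nothing was flagged by these two checks; it does not replace looking through the directories yourself.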
 
13b) Move any item-level text files out of Scans to S:\Digital Projects\Administrative\collectionInfo\forMDlib\itemMD\
  and notify Mary.

13c) Run testIncoming (in /srv/scripts/qc) and look at the output for problems.

13d) Open /srv/scripts/storing/relocating and set $test = 1.  Run it and look at the output.  Make repairs.
 
14) Make whatever repairs are necessary for the script to work.  If there are any new categories (z0004?),
  then add them, and create top-level manifests in those category directories (copy from like directories
  and modify to fit).
 
15) Open /srv/scripts/storing/relocating and edit out the line $test = 1;

16) Empty the file /srv/scripts/storing/RelocateManifests, and copy moveme to another filename.

17) Run "relocating > look" and look at the output to make sure it's doing what you want.  It will:
  a) list the files it's going to move, from where and to where, in "moveme"
  b) write the new manifests into RelocateManifests so you can see what they're going to look like
  c) write the parent manifests to parentMans, so you can see what they're going to look like
  d) write errors to the "look" file, as well as other info
 
18) WARNING!  Before running /srv/scripts/storing/relocating, do a chmod -R 755 /archive
  (after running it, do chmod -R 555 /archive).
  This will enable you to write to the directories as root,
  and then close off that ability afterwards.

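The open-then-close pattern in step 18 is easy to forget halfway through, so it may help to wrap it. This is a sketch only: with_writable is a hypothetical name, and it uses the same two chmod commands given above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: open a directory tree for writing, run one command,
# then lock the tree again even if the command fails.
with_writable() {
    local tree="$1"
    shift
    chmod -R 755 "$tree" || return 1
    "$@"
    local status=$?
    chmod -R 555 "$tree"
    return "$status"
}

# Usage sketch (as root on libcontent1):
# with_writable /archive /srv/scripts/storing/relocating
```

The wrapper returns the exit status of the command it ran, so a failed run is still visible after the tree has been locked again.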
 
19) Repair the edits made to the relocating script and rerun, this time for real... check results.
  This time it will actually copy the content into the directories in /srv/archive
  and create new manifests and modify existing ones.
 
'''SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL'''
-------------------------------------------------
20) If the output tells you to modify or create manifests, do that next (if not done already)
    in the top level of the /srv/www/htdocs/lockss directory.
    The script does NOT create the top 2 levels of manifests, as it does not have sufficient
    information about new categories to fill this in.

21) Check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html  -- drill down.
    If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
    and restart the apache web server:  '/usr/sbin/apache2ctl restart'
    (be sure to change it back, and then restart the web server).
 
22) Then run 'checkem > look' in /srv/scripts/storing -- it will use the "moveme" file to
    verify md5 sums of content that was moved,
    and delete from the deposits directory if there's a match. If there's NOT a match, it will output
    "ERROR: "  and the error.  So look through the "look" file for problems.
    Again, if there's a lot of content, precede this command with "nohup " and then look later in the
    nohup.out file for the content that normally would go into "look".

22b) Run checkArchive and look at ArchiveERRORS;  make repairs.  If needed, alter and run doubleCheck to make sure things were copied over;  checkMans to check that what's linked in the Manifest actually copied;  addToMans to add content that was moved but not written to the Manifest.  Problems occur especially when the server goes down during a transfer.  Be especially aware that dozens of tiffs may have been copying when it went down, meaning multiple images are corrupted or damaged.

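The md5 comparison that step 22 describes can be pictured for a single file as below. This is not the checkem script itself, which reads its paths from "moveme"; verify_and_clean is a made-up name, and the sketch assumes the md5sum command is available.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the md5 check for ONE file; the real checkem script
# takes its source and destination paths from the "moveme" file.
verify_and_clean() {
    local deposited="$1" archived="$2"
    if [ ! -f "$deposited" ] || [ ! -f "$archived" ]; then
        echo "ERROR: missing file: $deposited or $archived"
        return 1
    fi
    local sum_a sum_b
    sum_a="$(md5sum < "$deposited" | cut -d' ' -f1)"
    sum_b="$(md5sum < "$archived" | cut -d' ' -f1)"
    if [ "$sum_a" = "$sum_b" ]; then
        rm "$deposited"          # safe to delete: the archive copy matches
    else
        echo "ERROR: md5 mismatch for $deposited"
        return 1
    fi
}
```

A mismatch leaves the deposited file in place, which is why leftover content in /srv/deposits after this step signals a problem.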
 
23) Then go look in the /srv/deposits directory, make sure the folders are clean, and delete them.
    I do this:  ls /srv/deposits/content/*
                ls /srv/deposits/content/*/*
                ls /srv/deposits/content/*/*/*
                ls /srv/deposits/content/*/*/*/*
    If you find content, there's a problem.  Solve it!!  It may be necessary to rename things and go back
      through steps 13 onward again.
    If there is no content:  rm -r /srv/deposits/content/*
      (this will recursively delete all content in the directory)

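Instead of running ls at each depth, find can look for leftover files at any depth in one pass. A rough sketch, assuming GNU find; leftover_files and clear_if_empty are hypothetical names, not existing scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list any regular files still left under a deposits
# directory, at any depth. Empty folders are ignored.
leftover_files() {
    find "$1" -type f | sort
}

# Only clear the deposits area when no files are left behind.
clear_if_empty() {
    local deposits="$1"
    if [ -n "$(leftover_files "$deposits")" ]; then
        echo "content remains; not deleting" >&2
        return 1
    fi
    # Remove the now-empty folder skeletons but keep the directory itself.
    find "$deposits" -mindepth 1 -delete
}

# Example:
# leftover_files /srv/deposits/content
# clear_if_empty /srv/deposits/content
```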
 
24) Run:  chmod -R 555 /archive
  This will protect the archive from accidental writes.

25) Go to /srv/scripts/surfacing and run copychange to create derivatives from tiffs and wave files, and to copy transcripts over to the web directory for access.

25b) Then run /srv/scripts/surfacing/repair to remediate the doubles created from tiffs which include both thumbs and large images.

26) Run getMets to get METS files and EADs.

27) Get MODS from 13.5 above, place in /srv/scripts/metadata/MODS and distribute into the correct web directories, using /srv/scripts/metadata/relocate.

28) Run /srv/scripts/metadata/findMissing to find out what doesn't have metadata and what doesn't have derivatives -- troubleshoot.
 
The current, working version of these instructions is located here on libcontent1:  /srv/scripts/metadata/storing/DOTHIS.


jlderidder, 14 July 2009
''updated 2 October 2009'' [[User:Jlderidder|Jlderidder]]

Latest revision as of 15:15, 18 December 2012

