Moving Content To Long-Term Storage

From UA Libraries Digital Services Planning and Documentation

Revision as of 16:53, 25 February 2010

Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.

We are storing content long-term on a Linux server.

We are delivering content on another Linux server via CONTENTdm, and Tonio Loewald (of the Office of Library Technology) has developed an alternative (Linux-based) delivery solution, Acumen (http://acumen.lib.ua.edu), which runs off a web-enabled version of our storage area.

This is the current set of procedures for moving content from Share to long-term storage and linking it into manifests for LOCKSS.

These steps should happen in order and on a regular basis (weekly, if content is available):


TO MOVE CONTENT TO STORAGE

ACCESS and IDENTIFICATION


1) ssh to  libcontent1.lib.ua.edu.
2)  To get root access, run: sudo su  (if you do not have sudo access, you cannot do this process)
3)  To get access to the share drive from the libcontent1 Linux server, run:
 mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
 and type in your password to the share drive.  (substitute your own username there in the line above)
4)  To determine the status of content in the Digital Services folder on share,
 run: /srv/scripts/qc/status
 This script goes through the Digital_Coll_Complete and Digital_Coll_in_progress folders, and pulls out
 paths to folders that end in _ready (for Mary), _store and _online (for pickup for storage) and also
 _check, for digital services quality control.  The lists of these paths are written to files in
 /srv/scripts/qc/lists:
 forMary are things waiting for her;
 checkme are for the DS folks
 storeMe are things ready to go online and be stored 
5) Share this output with Digital Services folks, to verify it is correct.
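
The folder scan described in step 4 can be approximated as below. This is a minimal sketch, not the real /srv/scripts/qc/status script: the function names list_status_folders and write_status_lists are hypothetical, and the output directory is passed in rather than fixed at /srv/scripts/qc/lists.

```shell
# Hypothetical sketch; NOT the real /srv/scripts/qc/status script.

# Print every directory under the given roots whose name ends in the suffix.
# Usage: list_status_folders SUFFIX ROOT [ROOT...]
list_status_folders() (
    suffix=$1
    shift
    find "$@" -type d -name "*${suffix}" | sort
)

# Write the three lists named in step 4 into an output directory.
# Usage: write_status_lists OUTDIR ROOT [ROOT...]
write_status_lists() (
    out=$1
    shift
    mkdir -p "$out"
    # _ready folders are waiting for Mary
    list_status_folders _ready "$@" > "$out/forMary"
    # _check folders are waiting for digital services quality control
    list_status_folders _check "$@" > "$out/checkme"
    # _store and _online folders are ready for pickup
    { list_status_folders _store "$@"; list_status_folders _online "$@"; } \
        | sort > "$out/storeMe"
)
```

For example, write_status_lists /tmp/lists /cifs-mount/Digital_Coll_Complete /cifs-mount/Digital_Coll_in_progress would write the three lists into /tmp/lists (a hypothetical location) without touching the share drive.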


MOVE TO LINUX SERVER


6) Once verified,  look under /cifs-mount for content in share directory.
  Copy it over to /srv/deposits/content/.
  for example:
  cp -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/.
  This does a recursive copy of the entire specified directory and all its contents, and places the copy
  in the /srv/deposits/content directory on the libcontent1 server.
7) do a diff from /srv/deposits/content to the location of the content on share drive area
  to see if we got the content ok; for example:
  diff -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939  /srv/deposits/content/u0003_0002121_Aston_1939
  look at the output:  if none, it's a match.  If there's a variation, recopy the specified files,
  and then run the diff again.  If it's a big directory, prepend the command with "nohup "  so the
  process will continue to run if you log off or lose your connection.  The output then will be
  in the nohup.out file in the directory where you ran the "diff" command.
8)  Once you're sure you have a clean copy (no output from "diff"), delete the specified directory from
   the share drive.  For example:  rm -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
   This is a powerful command, a recursive removal of a directory and everything in it,
   including subdirectories.  Be careful with it.
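
Steps 6 and 7 can be combined so that the copy is always followed by the comparison. The sketch below is only an illustration; copy_and_verify is a hypothetical helper name, and it deliberately never deletes the source, so step 8 stays a separate, manual decision.

```shell
# Hypothetical helper: copy a collection directory, then compare source and copy.
# Usage: copy_and_verify SOURCE_DIR DEST_PARENT
copy_and_verify() (
    src=${1%/}            # strip a trailing slash, if any
    dest_parent=${2%/}
    name=$(basename "$src")

    # Recursive copy of the whole directory (step 6).
    cp -r "$src" "$dest_parent/" || exit 1

    # Recursive comparison (step 7): no differences means a clean copy.
    if diff -r "$src" "$dest_parent/$name" > /dev/null; then
        echo "MATCH: $name"
    else
        echo "MISMATCH: $name (recopy before deleting anything)" >&2
        exit 1
    fi
)
```

For the example collection above, this would be called as: copy_and_verify /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content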


PREPARE CONTENT


9) look through the directories in /srv/deposits/content for anomalies, badly named things,
  and new categories --  any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html
 ( this manifest is in /srv/www/htdocs/lockss/).  Anomalies might be directories which do NOT match
  the expected structure (Admin, Metadata, Scans, Transcripts) or subfolder structure beneath them;
  spaces in filenames or directories, etc.  Report any problems to Digital Services staff and arrange for corrections before
  proceeding further.
10) move any item-level text files out of scans to S:\Digital Projects\Administrative\collectionInfo\forMDlib\itemMD\
  and notify Mary

11) RUN testIncoming (in /srv/scripts/qc;  script available here:  File:TestIncoming.txt) and look at the output for problems.
12) RUN cleanCollInfo (in /srv/scripts/qc) to clean up the collection xml record in the Admin directory of each collection.  
   This removes Word encodings and checks for valid content.  This also puts a copy of the collection xml record 
   in the /srv/deposits/hold directory to be used later to feed the database that feeds our online list of collections
   with links, descriptions, and titles, seen here:  Browse Digital Collections (http://www.lib.ua.edu/digital/browse)
13) If the collection has an icon:  make sure it is in the Admin folder, named for the collection.
  For example, u0003_0002121.icon.png for the collection above.  (You will need to copy this
  to the /srv/www/htdocs/digital/images/  directory, 
  with 664 access rights "chmod 664 xyz.png"). List this icon in /srv/scripts/collstuff/icons,
  so the collToDbase script will know not to apply a default icon link.
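
Part of the manual inspection in step 9 can be scripted. The sketch below is not the testIncoming script; find_names_with_spaces and missing_folders are hypothetical helpers, and treating all four folders (Admin, Metadata, Scans, Transcripts) as required for every collection is an assumption.

```shell
# Hypothetical helpers for the step 9 inspection; NOT the testIncoming script.

# Print every file or directory under ROOT whose name contains a space.
# Usage: find_names_with_spaces ROOT
find_names_with_spaces() (
    find "$1" -name '* *' | sort
)

# Print the expected top-level folders that a collection directory lacks.
# Assumption: every collection should have all four folders.
# Usage: missing_folders COLLECTION_DIR
missing_folders() (
    coll=$1
    for d in Admin Metadata Scans Transcripts; do
        [ -d "$coll/$d" ] || echo "$d"
    done
)
```

Both helpers only print what they find, so anything they report still needs to go to Digital Services staff for correction as described in step 9.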


MOVE CONTENT TO ARCHIVE


14) 
 a) open /srv/scripts/storing/relocating (script available here: File:Relocating.txt) and set $test = 1.  
 b) run "relocating > look".  It will:
    i) list the files it's going to move, from where and to where in "moveme"
    ii) write the new manifests into RelocateManifests so you can see what they're going to look like
    iii) write the parent manifests to parentMans, so you can see what they're going to look like
    iv) write errors to "look" file, as well as other info
 c) Look at the output in the "look" file, and the manifests concatenated in RelocateManifests and parentMans.  
15) Make whatever repairs are necessary for script to work.  If there are any new categories (z0004?),
  then add them, and create top level manifests in those category directories (copy from like directories
  and modify to fit).
16) WARNING!  Before running /srv/scripts/storing/relocating, do a "chmod -R 755 /srv/archive".  This will enable you to 
    write to the directories as root 
17) Open /srv/scripts/storing/relocating  and edit out the line $test = 1;
18) Empty the file /srv/scripts/storing/RelocateManifests, and copy moveme to another filename.

19) run "relocating > look" again, this time for real.  It will take hours.  Occasionally check the output in "look" and the 
    archive directory for errors.  This time the script really copies files into the correct location in the archive,
    noting what goes where in "moveme" for the "checkem" script (used in a few steps).  It also versions metadata and xml
    files, links versioned files, tiffs and wave files into the correct manifests, and links new collections into the
    manifest pages of the holders (the next directory up from collections, such as u0003 for all Hoole Manuscript content).
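
The housekeeping in step 18 can be done in one call. This is a minimal sketch: prepare_real_run is a hypothetical helper, and the timestamped backup name (moveme.YYYYMMDD-HHMMSS) is an assumption, since the step only asks for "another filename".

```shell
# Hypothetical helper for step 18: keep the test-run "moveme" list under a
# timestamped name, then empty RelocateManifests before the real run.
# Usage: prepare_real_run SCRIPT_DIR
prepare_real_run() (
    dir=$1
    stamp=$(date +%Y%m%d-%H%M%S)

    # Copy moveme to another filename (the naming scheme is an assumption).
    cp "$dir/moveme" "$dir/moveme.$stamp" || exit 1

    # Truncate RelocateManifests to zero length without deleting the file.
    : > "$dir/RelocateManifests"

    # Report where the backup went.
    echo "$dir/moveme.$stamp"
)
```

On the server this would be run as: prepare_real_run /srv/scripts/storing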



SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL


20) If the output tells you to modify or create manifests, do that next (if not done already)
   in top level of /srv/www/htdocs/lockss directory
   The script does NOT create the top 2 levels of manifests, as it does not have sufficient
   information about new categories to fill this in.
21) check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html  -- drill down.
   If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
   and restart the apache web server:  '/usr/sbin/apache2ctl restart'
   (be sure to change it back, and then restart the web server)
22) then run 'checkem > look' in /srv/scripts/storing  (script available here:  File:Checkem.txt) 
    -- it will use the "moveme" file to
   verify md5 sums of content that was moved,
   and delete from the deposits directory if there's a match.  If there's NOT a match, it will output
   "ERROR: "  and the error.  So look through the "look" file for problems.
   Again, if there's a lot of content, precede this command with "nohup " and then look later in the
   nohup.out file for the content that normally would go into "look".

 23) run checkArchive (script available here;  File:CheckArchive.txt) and look at ArchiveERRORS;  make repairs. 
 If needed, alter and run doubleCheck to make sure things were copied over;  checkMans to check that what's linked in 
 the Manifest actually copied; addToMans to add content that was moved but not written to the Manifest.  Problems occur
 especially when the server goes down during a transfer.  Especially be aware that dozens of tiffs may have been copying
 when it went down, meaning multiple images are corrupted or damaged.

 24)   do a "chmod -R 555 /srv/archive"  to protect the archive from unauthorized alteration until next time. 
 25) then go look in the /srv/deposits directory, make sure folders are clean, and delete them.
   I do this:  ls /srv/deposits/content/*
               ls /srv/deposits/content/*/*
               ls /srv/deposits/content/*/*/*
               ls /srv/deposits/content/*/*/*/*
   if you find content, there's a problem.  Solve it!!  It may be necessary to rename things and go back
     through the steps from 13 onward again.
   if no content:  rm -r /srv/deposits/content/*
     (this will recursively delete all content in the directory)
 26)  go to /srv/scripts/surfacing and run copychange (script available here:  File:Copychange.txt) 
   to create derivatives from tiffs, wave files and copy transcripts over to the web directory for access;   
 27) then run /srv/scripts/surfacing/repair  to remediate the doubles created from tiffs which include both thumbs 
     and large images.
 28) cd /home/jeremiah  and locate the MODS for what was just moved, in the MODS directory.  Run "relocate.pl" 
     to send them live.
 29) run /srv/scripts/metadata/findMissing to find out what doesn't have metadata, what doesn't have derivatives  -- 
     troubleshoot and make any necessary corrections.
 30) Either index these directories immediately from the Acumen admin interface 
  (http://acumen.lib.ua.edu/admin) or wait a little while for the cron indexer to pick 
  up the content, then check the output: http://acumen.lib.ua.edu/.  Content should appear
  already in the web directories located here:
  http://libcontent1.lib.ua.edu/content/.  See Find our content online for 
  detailed instructions.
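
The checks in steps 22 and 25 above can be illustrated as follows. This is a sketch under stated assumptions, not the checkem script: the real format of the "moveme" file is not documented here, so the example assumes one tab-separated "source<TAB>destination" pair per line. Unlike checkem, verify_moves never deletes anything. Both function names are hypothetical.

```shell
# Hypothetical sketch; NOT the checkem script.
# ASSUMPTION: each line of the list is "source<TAB>destination".

# Compare md5 sums for each source/destination pair; print OK or ERROR lines.
# Usage: verify_moves LIST_FILE
verify_moves() (
    list=$1
    status=0
    while IFS="$(printf '\t')" read -r src dest; do
        [ -n "$src" ] || continue          # skip blank lines
        if [ ! -f "$src" ] || [ ! -f "$dest" ]; then
            echo "ERROR: missing file for $src -> $dest"
            status=1
            continue
        fi
        a=$(md5sum < "$src" | cut -d' ' -f1)
        b=$(md5sum < "$dest" | cut -d' ' -f1)
        if [ "$a" = "$b" ]; then
            echo "OK: $dest"
        else
            echo "ERROR: checksum mismatch for $src -> $dest"
            status=1
        fi
    done < "$list"
    exit "$status"
)

# List any regular files still left under a directory, at any depth.
# Usage: leftover_files DIR
leftover_files() (
    find "$1" -type f | sort
)
```

Running leftover_files /srv/deposits/content covers the same ground as the four ls commands in step 25, but to any depth; empty output means no files remain.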


UPDATE DATABASE WITH COLLECTION INFO



31) Then go to the collstuff directory /srv/scripts/collstuff and run: collToDbase.holdDir
 to put the collection info in the database.  This takes those cleaned-up collection xml files that were
 copied into the /srv/deposits/hold/ directory (step 12) and does a number of things.  It puts them into place
 in Acumen, so they can be used by that software to provide a collection name and a bit of info about it.  It also
 parses through the content,
 repairing encoding errors, puts the content into mysql InfoTrack.allColls, adds the expected canned link,
 and asks you for each collection and each delivery system whether or not the collection is online.
 The software will give you the expected URL to test;  if it fails, and you know the content is in the system,
 locate the correct URL and provide it to the software, which will ask for a correction.
32) Check the database entries... mysql, InfoTrack.allColls.  For example:
  mysql -u jlderidder -p
  (enter password)
  use InfoTrack;
  select * from digColls where id_2009 like "u0003_0002121" and AnalogOrDigital like "D";
  (when done, enter "quit;")
  If not okay, make repairs from command line.
33) Verify that links and display are correct.  They will show up online here: http://www.lib.ua.edu/digital/browse.


The current, working version of these instructions is located here on libcontent1: /srv/scripts/metadata/storing/DOTHIS.



updated 22 February 2010 Jlderidder
