If already in deposits

From UA Libraries Digital Services Planning and Documentation
Latest revision as of 13:57, 10 January 2014

This is the current set of procedures for moving content from Share to long-term storage, and linking it into manifests for LOCKSS.

These steps should happen in order, on a regular basis: weekly, if content is available.


TO MOVE CONTENT TO STORAGE

ACCESS and IDENTIFICATION


1) ssh to libcontent.lib.ua.edu.
2) To get root access, run: sudo su  (if you do not have sudo access, you cannot do this process)
3) Check whether the mount point is working.  If it is, skip step 4.  To check, run:
   ls /cifs-mount/
   If nothing is listed, or you get an error, you probably need to do step 4.  Run this check twice, though,
   because the network is sometimes slow.
4) To get access to the share drive from the libcontent Linux server, run:
   mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
   and type in your password to the share drive.  (Substitute your own username in the line above.)
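
The check in step 3 can be sketched as a small shell helper. This is a minimal sketch, not part of the documented procedure: the function name mount_ready and the MOUNT_RETRY_DELAY variable are made up for illustration, and it treats "the directory lists at least one entry" as the sign of a working mount.

```shell
#!/bin/sh
# Sketch only: mount_ready and MOUNT_RETRY_DELAY are illustrative names,
# not part of the documented procedure.

# Exit 0 if the directory exists and lists at least one entry.
# The second argument is the number of attempts (default 2, because
# step 3 suggests checking twice when the network is slow).
mount_ready() {
    mr_dir=$1
    mr_tries=${2:-2}
    mr_i=0
    while [ "$mr_i" -lt "$mr_tries" ]; do
        if [ -d "$mr_dir" ] && [ -n "$(ls -A "$mr_dir" 2>/dev/null)" ]; then
            return 0
        fi
        mr_i=$((mr_i + 1))
        # Pause before the next attempt, but not after the last one.
        if [ "$mr_i" -lt "$mr_tries" ]; then
            sleep "${MOUNT_RETRY_DELAY:-5}"
        fi
    done
    return 1
}

# Example use, as root on libcontent (substitute your own username):
#   if ! mount_ready /cifs-mount; then
#       mount -t cifs -o username=YOURNAME,domain=lib \
#           //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
#   fi
```

An empty share would also look "not ready" to this helper, so treat a failure as a prompt to look, not as proof that the mount is down.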


PREPARE CONTENT


1) Look through the directories in /srv/deposits/content for anomalies, badly named things,
   and new categories (any which are not listed on http://libcontent.lib.ua.edu/lockss/Manifest.html;
   this manifest is in /srv/www/htdocs/lockss/).  Anomalies might be directories which do NOT match
   the expected structure (Admin, Metadata, Scans, Transcripts) or the subfolder structure beneath them,
   spaces in filenames or directories, etc.  Report any problems to Digital Services staff and arrange
   for corrections before proceeding further.
2) Copy any *ocrList.txt files in the Admin directory to /srv/deposits/ocrMe for separate processing.
3) RUN testDeposits (in /srv/scripts/qc; script available here: File:TestDeposits.txt) and look at the output
   for problems.  If this script runs across subdirectories named "Box" or "Folder", it will print an error and
   tell you to run testBoxFolder (File:TestBoxFolder.txt) instead.  This is for collections that are digitized without
   item-level metadata, with box and folder information in the filename to assist in linking from the correct part of the EAD.
   If you get this message, run testBoxFolder on the specified collection(s) and look at the output for any problems.
4) RUN waitCheck (in /srv/scripts/storing/).  This checks whether any of the deposited collections have been harvested by
   LOCKSS partners.  If so, this content may need to be held and staged for processing.  The script will ask if you want to
   proceed with depositing those collections.  If you indicate yes, it will back up the manifest with a datestamp and LOCKSS
   in the file name, which ensures that we capture the size of each version of the manifest separately.  The size of our
   preservation content affects the cost of our participation in LOCKSS.
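
The manual checks in steps 1 and 2 can be partly scripted. The following is a rough sketch under stated assumptions, not a replacement for testDeposits: the function names are hypothetical, and it assumes that the four expected folders (Admin, Metadata, Scans, Transcripts) sit directly inside each collection directory, which this page does not spell out.

```shell
#!/bin/sh
# Sketch only. The function names and the assumed collection layout are
# illustrative; they are not taken from testDeposits or any other script.

# Print every path under a root whose name contains a space.
find_spaces() {
    find "$1" -name '* *' -print
}

# Print the child directories of one collection directory that are not
# one of the four expected folders.
unexpected_dirs() {
    for ud_dir in "$1"/*/; do
        [ -d "$ud_dir" ] || continue
        ud_name=$(basename "$ud_dir")
        case $ud_name in
            Admin|Metadata|Scans|Transcripts) ;;
            *) printf '%s\n' "$ud_name" ;;
        esac
    done
}

# Copy every *ocrList.txt found inside an Admin directory under a root
# into a staging directory (which must already exist).
stage_ocr_lists() {
    find "$1" -path '*/Admin/*' -name '*ocrList.txt' -exec cp -p {} "$2"/ \;
}

# Example use:
#   find_spaces /srv/deposits/content
#   stage_ocr_lists /srv/deposits/content /srv/deposits/ocrMe
```

Two ocrList files with the same name would overwrite each other in the staging directory, so check the names before relying on stage_ocr_lists.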


MOVE CONTENT TO ARCHIVE


1)
   a) Open /srv/scripts/storing/relocating (script available here: File:Relocating.txt) and set $test = 1.
   b) Remove the file /srv/scripts/storing/RelocateManifests.
   c) Run "relocating > look".  It will:
      i) list the files it's going to move, from where and to where, in "moveme"
      ii) write the new manifests into RelocateManifests so you can see what they're going to look like
      iii) write the parent manifests to parentMans, so you can see what they're going to look like
      iv) write errors to the "look" file, as well as other info
   d) Look at the output in the "look" file, and at the manifests concatenated in RelocateManifests (check those at the end for modifications and additions) and parentMans.
2) Make whatever repairs are necessary for the script to work.  If there are any new categories (z0004?),
   then add them, and create top-level manifests in those category directories (copy from like directories
   and modify to fit).
3) Open /srv/scripts/storing/relocating and edit out the line $test = 1;
4) Delete nohup.out.

5) Run "nohup relocating" -- this time for real.  It will take hours; "nohup" keeps it running even if you disconnect.
   The output will be written to "nohup.out" -- check it, and the archive directory, occasionally for errors.
   This time it really copies files into the correct location in the archive (making a
   note of what goes where in "moveme2" for the "checkem2" script in a few steps).
   It also versions metadata and xml files, links versioned
   files, tiffs and wave files into the correct manifests, and links new collections into the manifest pages of the
   holders (the next directory up from collections, such as u0003 for all Hoole Manuscript content).
6) Double-check parentMans for manifests that may need to be created at the next level up; create them if you didn't already, including the links you find here.
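
Steps 3 to 5 depend on remembering to take the script out of test mode. A guard along these lines could catch a forgotten flag before a run that takes hours. It is only a sketch: test_mode_on is a hypothetical helper, and the pattern assumes the flag sits on a line of its own in the form $test = 1; (optionally preceded by "my"), which may not match how the relocating script is actually written.

```shell
#!/bin/sh
# Sketch only: test_mode_on is a hypothetical helper, and the pattern
# assumes the flag is written on its own line as:  $test = 1;

# Exit 0 if an uncommented "$test = 1;" line is present in the file.
test_mode_on() {
    grep -Eq '^[[:space:]]*(my[[:space:]]+)?\$test[[:space:]]*=[[:space:]]*1[[:space:]]*;' "$1"
}

# Example use before the real run (paths as given in the steps above):
#   cd /srv/scripts/storing
#   if test_mode_on relocating; then
#       echo "relocating still has \$test = 1; edit it out first" >&2
#   else
#       rm -f nohup.out
#       nohup relocating
#   fi
```

A line that has been commented out with # does not match, so "editing out" the flag by commenting it counts as test mode off.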


SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL


1) If the output tells you to modify or create manifests, do that next (if not done already)
   in the top level of the /srv/www/htdocs/lockss directory.
   The script does NOT create the top 2 levels of manifests, as it does not have sufficient
   information about new categories to fill this in.
2) Check links at http://libcontent.lib.ua.edu/lockss/Manifest.html -- drill down.
   If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
   and restart the apache web server:  '/usr/sbin/apache2ctl restart'
   (be sure to change it back, and then restart the web server again).
3) Then run 'checkem2 > look' in /srv/scripts/storing (script available here: File:Checkem.txt).
   It will use the "moveme2" file to verify md5 sums of content that was moved,
   and delete from the deposits directory if there's a match.  If there's NOT a match, it will output
   "ERROR: " and the error.  So look through the "look" file for problems.
   Again, if there's a lot of content, precede this command with "nohup " and then look later in the
   nohup.out file for the content that normally would go into "look".

4) Run checkArchive (script available here: File:CheckArchive.txt) and look at ArchiveERRORS; make repairs.
   If needed, alter and run doubleCheck to make sure things were copied over; checkMans to check that what's linked in
   the Manifest actually copied; and addToMans to add content that was moved but not written to the Manifest.  Problems occur
   especially when the server goes down during a transfer.  Be aware in particular that dozens of tiffs may have been copying
   when it went down, meaning multiple images may be corrupted or damaged.

5) Do a "chown -R jeremiah:archive" to correct permissions, and do a "chmod -R 775 /srv/archive" to protect the archive from unauthorized alteration until next time.
6) Then go look in the /srv/deposits directory, make sure folders are clean, and delete them.
   I do this:  ls /srv/deposits/content/*
               ls /srv/deposits/content/*/*
               ls /srv/deposits/content/*/*/*
               ls /srv/deposits/content/*/*/*/*
   If you find content, there's a problem.  Solve it!!  It may be necessary to rename things and go back
     through the steps in this section, from step 1 on, again.
   If no content:  rm -r /srv/deposits/content/*
     (this will recursively delete all content in the directory)
7) Run ocrSelected (File:OcrSelected.txt) in /srv/scripts/surfacing.  This will go through the
   *ocrList files in /srv/deposits/ocrMe, select the items marked with a "1",
   locate the tiffs, create OCR files, create directories for them, and place them live in the content directory where they belong.
8) Check results, and delete files in /srv/deposits/ocrMe.
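
Two of the checks above, the md5 comparison in step 3 and the "is deposits empty" check in step 6, can be sketched in shell for spot-checking by hand. This is an illustration under assumptions, not a description of what checkem2 does internally: same_md5 and deposits_clean are hypothetical names, the layout of the moveme2 file is not described on this page so it is not parsed, and md5sum from GNU coreutils is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: same_md5 and deposits_clean are hypothetical helpers.
# The layout of the moveme2 file is not described here, so it is not parsed.

# Exit 0 if both files exist and have identical MD5 sums, 1 if the sums
# differ, 2 if either file is missing.
same_md5() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    sm_a=$(md5sum < "$1" | cut -d' ' -f1)
    sm_b=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$sm_a" = "$sm_b" ]
}

# Exit 0 if no regular files remain anywhere under the directory
# (empty folders are allowed), 1 otherwise.
deposits_clean() {
    [ -z "$(find "$1" -type f -print 2>/dev/null)" ]
}

# Example use:
#   same_md5 /path/in/deposits/file.tif /path/in/archive/file.tif
#   if deposits_clean /srv/deposits/content; then
#       echo "no files left; safe to remove the empty folders"
#   fi
```

Unlike the four ls commands in step 6, find looks at every depth, so a file left deeper than four levels would still be reported.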

updated 1 July 2011 Jlderidder
