Most Content

From UA Libraries Digital Services Planning and Documentation
Latest revision as of 10:03, 12 July 2017



== Simpler overview is here: [[Workflow]] ==

[[File:DS_Workflow_June2017.png]]

== Overview/Checklist ==

* Make sure everything is properly prepared for upload (see [[Preparing_Collections_on_the_S_Drive_for_Online_Delivery_and_Storage | Preparing Collections]])
** Move the metadata spreadsheet to S:\Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation

'''(If new to this process, see [[Command-line_Work_on_Linux_Server]])'''

* Run the ''makeFits'' script on the server, in the scripts directory
** This will take some time, probably hours
** Check the corresponding output file in the scripts output directory
** If there are errors, run ''FixFits'' in the same directory; otherwise the output will say "Everything looks fine."
* If this is an ongoing collection, add today's date (no spaces) to the end of the collection directory name, to distinguish it from other sets of batches.
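"Check the output file" recurs after almost every script on this page. As a hedged sketch (the timestamped-output-file convention and the word "error" in failure lines are assumptions about how these scripts report problems), a pair of small helpers can find the newest output file and scan it:

```shell
# Hypothetical helpers for checking script output; the timestamped-file
# naming and "error" wording are assumptions, not the scripts' documented API.

latest_output() {
    # print the name of the newest file in the given output directory
    ls -t "$1" | head -n 1
}

check_output() {
    # show any error lines, or confirm a clean run
    if grep -qi "error" "$1"; then
        grep -i "error" "$1"
        return 1
    fi
    echo "Everything looks fine."
}
```

For example, right after ''makeFits'' finishes, something like <code>check_output "output/$(latest_output output)"</code> surfaces errors without scrolling through the whole file.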

'''Image of what should be in what directories before running holdContent: [http://www.lib.ua.edu/wiki/digcoll/index.php/File:B4holdContent.PNG]'''

* Run ''holdContent'', a script which:
** puts the TIFFs, FITS, OCR, and collection XML (if there) in the hold4metadata directory on the server
** makes a collection directory in For_Metadata_Librarians and puts the Metadata directory and its contents there
** generates smaller, non-watermarked images for the metadata librarians in a Jpegs directory there
** deletes the collection in Digital_Coll_in_Progress

'''Image of what should be in what directories AFTER running holdContent: [http://www.lib.ua.edu/wiki/digcoll/index.php/File:after_holdContent.PNG]'''

* Wait till you hear that the MODS are ready: the metadata librarians will move the collection to Return_To_Digital_Services
** a weekly script, ''checkStatusDS'', watches for content returned there, and will:
*** check their JPEGs against our TIFFs
*** check their MODS against their JPEGs
*** if all is good, delete their JPEGs, move the collection to Digital_Coll_Complete for processing, and then notify us
*** if there are problems, send us a list of errors and leave the collection in Return_To_Digital_Services
** If the collection is still in Return_To_Digital_Services, resolve the issues (this may require renaming TIFFs and regenerating FITS to replace those on the server), then rerun ''checkStatusDS'' (in /home/ds/scripts/) till all is good

<font color="red">NOTE: checkStatusDS will FAIL if the metadata librarians have renamed either the Jpegs folder OR the collection folder, as the script depends on these names to test against the TIFFs in the hold4metadata directory on the server. This is INTENTIONAL, so that we can give the metadata librarians multiple batches of scans within a single collection folder AND/OR multiple sets of content for a single collection, since we may digitize far more quickly than they create metadata; otherwise this would be an issue for ongoing large collections. Number/name batches and/or sets of collection content appropriately, and DO NOT REUSE any already in their directory.</font>
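A minimal sketch of the kind of name matching ''checkStatusDS'' is described as doing (the real script may compare more than filenames, and the extensions here are assumptions; matching by basename is exactly why renamed folders or files make the check fail):

```shell
# For each returned JPEG, is there a matching TIFF in hold4metadata?
# For each MODS file, is there a matching JPEG?
# Matching is assumed to be by basename.

jpegs_without_tiffs() {
    local jpeg_dir=$1 tiff_dir=$2 j base
    for j in "$jpeg_dir"/*.jpg; do
        [ -e "$j" ] || continue            # no JPEGs at all
        base=$(basename "$j" .jpg)
        [ -e "$tiff_dir/$base.tif" ] || echo "$base"
    done
}

mods_without_jpegs() {
    local mods_dir=$1 jpeg_dir=$2 m base
    for m in "$mods_dir"/*.xml; do
        [ -e "$m" ] || continue
        base=$(basename "$m" .xml)
        [ -e "$jpeg_dir/$base.jpg" ] || echo "$base"
    done
}
```

An empty result from both checks corresponds to the "all good" branch above; any names printed correspond to the error list left in Return_To_Digital_Services.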

'''Image of what should be in what directories before running jpegs2server: [http://www.lib.ua.edu/wiki/digcoll/index.php/File:B4jpegs2server.PNG]'''

* Run ''jpegs2server'', a script which:
** copies the MODS to the UploadArea for online delivery and to Deposits for archiving, and deletes the version on the Share drive
** copies any .txt files in the Metadata directory to Deposits for archiving
** creates watermarked JPEG derivatives and thumbs of all the TIFFs and puts them into the UploadArea
** generates MIX files and logs technical metadata to the database
** moves any Excel spreadsheets to Administrative\Pipeline\collectionInfo\Storage_Excel
** moves the TIFFs to Deposits
** uploads transcripts & OCR to the Acumen database
** checks the collection XML (if there is one; warns you if it needs one) and gets it into the database, online, and into Deposits
** removes the hold4metadata folder and the Digital_Coll_Complete folders if empty
** sends you an email to check the output, and then run relocate_all to get the content into Acumen

'''Image of what should be in what directories AFTER running jpegs2server: [http://www.lib.ua.edu/wiki/digcoll/index.php/File:after_jpegs2server.PNG]'''
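The derivative-generation step above can be sketched as follows. The <code>_128</code>/<code>_2048</code> suffixes follow the naming pattern used elsewhere in this workflow; the use of ImageMagick's <code>convert</code> is purely an assumption — the real jpegs2server may use different tooling, sizes, and watermarking.

```shell
# names of the two derivatives expected for a given TIFF
derivatives_for() {
    local base=${1%.tif}
    printf '%s_128.jpg %s_2048.jpg\n' "$base" "$base"
}

# hypothetical driver (requires ImageMagick; watermarking not shown)
make_derivatives() {
    local dir=$1 tif base
    for tif in "$dir"/*.tif; do
        [ -e "$tif" ] || continue          # skip if no TIFFs present
        base=${tif%.tif}
        convert "$tif" -resize 128x128 "${base}_128.jpg"      # thumbnail
        convert "$tif" -resize 2048x2048 "${base}_2048.jpg"   # large image
    done
}
```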

* Check the output of jpegs2server, and make sure all the content is gone from Digital_Coll_Complete; also eyeball the Upload directories to make sure the MODS and JPEGs are there, ready and waiting.
* Run ''relocate_all'', a script which puts the content in the UploadArea in the right places in Acumen
* Check the error report. If there are no errors, run ''findMissing''. Correct any problems found.
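A hedged sketch of the pairing rule ''findMissing'' enforces — every MODS file needs at least one derivative alongside it. Matching by basename is an assumption, and the real script walks Acumen's own directory tree rather than a pair of flat directories:

```shell
# list MODS basenames that have no derivative (JPEG, or MP3 for audio)
mods_missing_derivative() {
    local mods_dir=$1 deriv_dir=$2 m base d found
    for m in "$mods_dir"/*.xml; do
        [ -e "$m" ] || continue
        base=$(basename "$m" .xml)
        found=0
        for d in "$deriv_dir/$base"*.jpg "$deriv_dir/$base"*.mp3; do
            [ -e "$d" ] && { found=1; break; }
        done
        [ "$found" -eq 1 ] || echo "$base"
    done
}
```

Anything this prints corresponds to the "get Metadata folks to regenerate those MODS, or regenerate JPEGs/MP3s, then rerun relocate_all" branch described later on this page.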


Problems? See [[Troubleshooting]]

== Important things to remember about working with the server ==

* Think long and hard before you run upload scripts on more than one collection at a time. Yes, it is possible to run the same scripts on multiple collections at once, but just because it's possible doesn't mean you should. If something goes wrong, it can be very difficult to disentangle the separate collections in order to fix the problem.
* Do not run the same script more than once at a time on a particular collection. If a script encounters a problem such that it doesn't finish running, do not just run it again: we do not want two instances of the same script running simultaneously. This gets very confusing and multiplies the possibility of error exponentially!
* If you need to kill a script, please see someone who manages the scripts (Jody or the Repository Manager). Closing the current Terminal window (the command-line interface) of SSH Secure File Transfer does not kill the scripts that are running. This is especially important to remember for makeJpegs, which splits into two processes as soon as you run it (whether you see the results of this in the UploadArea or not).
* When checking a libcontent folder for ongoing upload status, hit Refresh. The SSH File Transfer window will not refresh on its own.

== Preliminary steps on the server ==

# Make sure that the Windows share is mounted: type `ls /cifs-mount` into the ssh window on libcontent. If no listing appears, or the window hangs, you need to mount the share drive; otherwise, proceed to getting content from the share drive.
# To mount the Windows drive, type this on the command line on libcontent: `sudo mount -t cifs -o username=jjcolonnaromano,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount` and use Jeremiah's password for Share. If successful, the command in the last step will show you the directories within the Digital Projects folder on Share.
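Step 1 can be wrapped in a tiny check; the mount point and mount command come from the steps above, and the helper function itself is only an illustrative sketch:

```shell
# true if the given mount point lists at least one entry
share_mounted() {
    [ -n "$(ls "$1" 2>/dev/null)" ]
}

# e.g., run the sudo mount command from step 2 only when needed:
#   share_mounted /cifs-mount || sudo mount -t cifs \
#       -o username=jjcolonnaromano,domain=lib \
#       //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
```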

== Getting content from the share drive to live in Acumen ==

=== Preparation ===

# Make sure that all quality control scripts have been run, and all corrections made.
# Add today's date (no spaces) to the collection directory name if this is an ongoing collection. Make sure scan batch numbers are not repeats.
# Log onto libcontent and change into the scripts directory. (See [[Command-line_Work_on_Linux_Server]])
# Create FITS with the '''makeFits''' script on the server (in the scripts directory, not in the UploadArea).
# Change into UploadArea/scripts and run '''holdContent''', which should email you.
# Check for any errors in the output; your directory should have disappeared from the working area, with part of it reappearing in For_Metadata_Librarians and part in the /srv/deposits/hold4metadata/ directory.

=== After Content Returns from Metadata Librarians ===

# If the content remains in Return_To_Digital_Services, check the problems list in the email notification.
## You may need to rename the collection directory or batch directories (on the share drive) to what they were originally.
## You may need to rename TIFFs in /srv/deposits/hold4metadata/ and either modify & rename the FITS files, or regenerate them.
## Then rerun '''checkStatusDS''' till all is clear and the collection gets moved to Digital_Coll_Complete.
# Run '''jpegs2server''' on the content that the last script moved to Digital_Coll_Complete, without changing anything.
# Check the output file and the MODS and JPEGs in the UploadArea, and ensure the collection disappeared from the Completed folder.
# If/when all is good, run '''relocate_all'''.
# Check the output file for any issues.
# Run '''findMissing''' in /home/ds/scripts/ and check the output. This script will hunt through Acumen to make sure there is a MODS file for every item, and at least one derivative for each MODS file. Any errors will be found in the output file written to the scripts/output directory. If errors are found, get the Metadata folks to regenerate those MODS -- or regenerate the JPEGs/MP3s yourself -- then rerun relocate_all. Then run this script again to ensure all errors have been remedied. (Script here: [[Image:findMissing.txt]])
# Check the indexing of the uploaded content the next day to verify. (Content is currently indexed overnight each day.)

=== Quick Steps ===

# Login as DS and navigate to scripts
# Type in '''makeFits''' (check output)
# Navigate to UploadArea/scripts (CLOSE ALL DIRECTORIES)
# Type in '''holdContent''' (check output)
# WAIT FOR CONTENT TO BE RETURNED
# Repair problems if any, and run '''checkStatusDS''' until clear
# Run '''jpegs2server''' (check output) (CLOSE ALL DIRECTORIES)
# Run '''relocate_all''' (check output)
# Run '''findMissing''' (check output)
# Done!

== Archiving content ==

The first sections of the attached diagram which pertain to Digital Services are delineated in [[Preparing_Collections_on_the_S_Drive_for_Online_Delivery_and_Storage]].

See lines 9-25 and 31-33 on this page: [[Moving_Content_To_Long-Term_Storage]]

The movement of metadata is spelled out in [[Metadata_Movement]] and the process for uploading modified MODS is in [[Uploading_MODS]].
