Most Content

From UA Libraries Digital Services Planning and Documentation

Revision as of 09:13, 6 August 2013


Overview/Checklist

  • Make sure everything is properly prepared for upload (see Preparing Collections)
    • Move the metadata spreadsheet to S:\Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation
  • Run makeJpegs, a script which
    • makes a copy of all the MODS
    • makes a copy of any .txt files in the Transcripts folder (if there is one)
    • creates JPEG derivatives of all the TIFFs
    • puts them into the UploadArea on libcontent
  • Wait for makeJpegs to finish, then...
  • Run relocate_all, a script which puts that new content in the right places in Acumen
  • If this is a new collection, wait for 24 hours or so -- for Acumen to index it -- before moving forward
  • Run moveContent, a script which
    • sends the collection XML to the Acumen database
    • moves the original files from the Share drive to storage, in preparation for archiving

Important things to remember about working with the server

  • Think long and hard before you run upload scripts on more than one collection at a time. Yes, it is possible to run the same scripts on multiple collections at a time. But just because it's possible doesn't mean you should do it. If something goes wrong, it can be very difficult to disentangle the separate collections in order to fix the problem.
  • Do not run the same script more than once at a time on a particular collection. If a script encounters a problem such that it doesn't finish running, do not just run it again. We do not want two instances of the same script running simultaneously. This gets very confusing and multiplies the possibility of error exponentially!
  • If you need to kill a script, please see someone who manages the scripts (Jody or the Repository Manager). Closing the current Terminal Window (the command line interface) of SSH Secure File Transfer does not kill the scripts that are running. This is especially important to remember for makeJpegs, which splits into two processes as soon as you run it (whether you see results of this in the UploadArea or not).
  • When checking a libcontent folder for ongoing upload status, hit Refresh. The SSH File Transfer Window will not refresh on its own.
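This page does not say that any of our scripts refuses a second concurrent run, so the rule about not running the same script twice on one collection is up to you. As an illustration only, a hypothetical Python wrapper could enforce it with a per-script, per-collection lock file taken via `fcntl.flock`. The lock directory, the file naming, and the collection IDs are all assumptions.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical location for lock files. Keep it on local disk, because flock
# is not reliable on network filesystems such as the CIFS share.
LOCK_DIR = "/tmp/upload-locks"


class AlreadyRunning(RuntimeError):
    """Raised when another process holds the lock for this script and collection."""


@contextmanager
def script_lock(script: str, collection: str, lock_dir: str = LOCK_DIR):
    """Hold an exclusive lock for one (script, collection) pair, or fail at once."""
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, f"{script}.{collection}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB: raise immediately instead of waiting for the other holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            raise AlreadyRunning(f"{script} is already running on {collection}") from None
        yield path
    finally:
        # Closing the descriptor releases the lock; it is also released when
        # the holding process exits, so a crash does not leave a stale lock.
        os.close(fd)
```

A wrapper would enter `script_lock("makeJpegs", collection)` around whatever launches the script. Because makeJpegs splits into two processes, a lock held only by the launching process can be released before the background half finishes. Treat this as a guard against accidental double starts, not as a way to stop or supervise a script; killing scripts still goes through Jody or the Repository Manager.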

Preliminary steps on the server

  1. Make sure that the Windows share is mounted. In the SSH window on libcontent, type `ls /cifs-mount`. If no listing appears, or the window hangs, you need to mount the share drive; otherwise, proceed to getting content from the share drive.
  2. To mount the Windows share, type this on the command line on libcontent: `sudo mount -t cifs -o username=jjcolonnaromano,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount` and use Jeremiah's password for the share. If the mount succeeds, `ls /cifs-mount` (step 1) will list the directories within the Digital Projects folder on Share.
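The mount check in step 1 can also be done from a script. This Python sketch (not an existing tool) first looks for `/cifs-mount` in the kernel's mount table (`/proc/mounts` on Linux), which does not touch the share itself. It then runs `ls` on the mount point in a child process with a timeout, treating an empty listing, or one that does not finish in time, like the manual "no listing / window hangs" case. The 10-second timeout and the `cifs` filesystem-type check are assumptions.

```python
import subprocess

MOUNT_POINT = "/cifs-mount"  # mount point used in the steps above


def in_mount_table(mount_point: str, mounts_file: str = "/proc/mounts",
                   fstype: str = "cifs") -> bool:
    """Check the kernel's mount table without touching the (possibly hung) share."""
    with open(mounts_file) as f:
        for line in f:
            fields = line.split()
            # Fields: source, mount point, filesystem type, options, dump, pass.
            if len(fields) >= 3 and fields[1] == mount_point and fields[2] == fstype:
                return True
    return False


def share_is_mounted(mount_point: str = MOUNT_POINT, timeout: float = 10.0,
                     mounts_file: str = "/proc/mounts", fstype: str = "cifs") -> bool:
    """Mirror the manual `ls /cifs-mount` check, treating a timeout as a failure."""
    if not in_mount_table(mount_point, mounts_file, fstype):
        return False
    try:
        # List in a child process with a timeout rather than in this process.
        result = subprocess.run(["ls", mount_point], capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0 and result.stdout.strip() != ""
```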

Getting content from the share drive to live in Acumen

Preparation

  1. Make sure content is in the Completed directory, and that all quality control scripts have been run, and all corrections made.
  2. Create the MODS from the exported text spreadsheet (Making_MODS), and place them in a MODS directory within the Metadata directory. If this is a batch, rather than a complete collection, see Batches for info on what parts of this process might be different.
  3. Log onto libcontent and change into the UploadArea/scripts directory. (see Command-line_Work_on_Linux_Server)
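makeJpegs checks that there is a MODS file for each item-level object and an object for each item-level MODS. To catch mismatches before starting it, a Python sketch of that pairing check might look like this. It assumes that MODS files are named `<item ID>.xml` and that TIFF file names can be mapped to an item ID; the `item_id` parameter and the directory layout are hypothetical, not our actual naming convention.

```python
from pathlib import Path
from typing import Callable, Iterable


def file_stems(paths: Iterable[Path], suffixes: set[str]) -> set[str]:
    """Stems of regular files whose extension (case-insensitive) is in suffixes."""
    return {p.stem for p in paths if p.is_file() and p.suffix.lower() in suffixes}


def pair_mods_with_objects(mods_dir: Path, object_dir: Path,
                           item_id: Callable[[str], str] = lambda stem: stem):
    """Return (objects without MODS, MODS without objects) as sorted ID lists.

    item_id maps a TIFF file stem to its item-level identifier; the default
    assumes one TIFF per item, named exactly like its MODS file.
    """
    mods_ids = file_stems(mods_dir.iterdir(), {".xml"})
    object_ids = {item_id(s) for s in file_stems(object_dir.rglob("*"), {".tif", ".tiff"})}
    return sorted(object_ids - mods_ids), sorted(mods_ids - object_ids)
```

For items with several page images, pass a hypothetical mapping such as `item_id=lambda s: s.rsplit("_", 1)[0]` so that `item1_0001.tif` and `item1_0002.tif` both count toward `item1`.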

Getting content to libcontent server

  1. Type in 'makeJpegs' and answer the questions that arise. If errors appear on the command line or in the output file, repair them. The output file is written to the UploadArea/output directory; its name is timestamped, and the script will tell you what it is. (Script: File:MakeJpegs.txt)
    1. The makeJpegs script will do some minor QC, including checking to see if there is a MODS for each item-level object and an object for each item-level MODS.
    2. If there are no problems, it will copy the MODS files to the UploadArea/MODS directory on the server, copy the transcripts to the UploadArea/transcripts directory on the server, and will generate JPEG derivatives (a thumb ending in _128.jpg and a large image ending in _2048.jpg) and place them in the UploadArea/jpegs directory on the server.
    3. OCR
      1. If you provide an ocrList.txt file in the Admin directory, the items it selects, and their child files, will be OCR'd.
      2. If you choose to OCR all content, it will OCR every TIFF you are uploading. By default it will OCR all transcript TIFFs unless a corresponding text file exists in the upload directory (indicating a corrected transcript). You may also choose to OCR the entire collection, including what is already online; it will handle discrepancies the same way it does when you provide a list (see OCR List for information on that).
  2. If there were no errors on the command line, check the output file in a few hours to verify that the script completed these tasks, and check the results in the server directories named above. If all is well, proceed to the next step.
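Two of the OCR rules above are simple enough to sketch. An earlier revision of this page described ocrList.txt as one item number per line, followed by a tab and then a 1 (OCR it) or a 0 (ignore it); the first function assumes that format. The second applies the default rule for transcripts: OCR a transcript TIFF only if no same-named text file (a corrected transcript) is in the upload directory. Both are illustrations in Python, not excerpts from makeJpegs.

```python
from pathlib import Path


def parse_ocr_list(text: str) -> list[str]:
    """Item numbers flagged 1 in an ocrList.txt-style file ('item<TAB>0|1' per line)."""
    selected = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        parts = line.split("\t")
        if len(parts) != 2 or parts[1].strip() not in {"0", "1"}:
            raise ValueError(f"line {lineno}: expected 'item<TAB>0|1', got {raw!r}")
        if parts[1].strip() == "1":
            selected.append(parts[0].strip())
    return selected


def transcripts_needing_ocr(tiff_names: list[str], txt_names: list[str]) -> list[str]:
    """Transcript TIFFs with no same-named .txt file (a corrected transcript)."""
    corrected = {Path(n).stem for n in txt_names}
    return sorted(n for n in tiff_names if Path(n).stem not in corrected)
```

If you read ocrList.txt from the share with Python, `open(path, encoding="utf-8-sig")` tolerates a byte-order mark that a Windows editor may add, and `splitlines()` already handles Windows line endings.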

Getting content to Acumen

  1. In the UploadArea/scripts directory, type in 'relocate_all' (File:Relocate all.txt). This script will move all the content just uploaded into the correct directories in Acumen. If anything remains in the UploadArea directories mentioned above after the script has completed, there's a problem which must be repaired. If the output shows no errors, one thing to check is Server Permissions.
  2. In the scripts directory (on the same level as the UploadArea, AUDIO and CABANISS directories), run 'findMissing'. This script will search Acumen to make sure there is a MODS file for every item, and at least one derivative for each MODS file. Any errors are written to an output file in the scripts/output directory. If errors are found, regenerate those MODS and/or JPEGs/MP3s and rerun relocate_all. Then run findMissing again to ensure all errors have been remedied. (Script: File:FindMissing.txt)
  3. After about 24 to 30 hours, check that the uploaded content has been indexed. For a new collection, wait until the content appears online in Acumen before doing the next step; for ongoing collections that already have a listing on the collection page, you may proceed to the next step without waiting. The collection list can be viewed here: Collection List Online.
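The check findMissing performs (a MODS file for every item, and at least one derivative for every MODS file) comes down to comparing two sets of item IDs. This Python sketch works on lists of file names. The `_128.jpg` and `_2048.jpg` endings come from the makeJpegs step; `<item>.mp3` for audio and `<item>.xml` for MODS are assumptions. It is not the findMissing script.

```python
from typing import Iterable, Optional

# The JPEG endings come from the makeJpegs step above; treating audio
# derivatives as "<item>.mp3" is an assumption.
DERIVATIVE_SUFFIXES = ("_128.jpg", "_2048.jpg", ".mp3")


def item_from_derivative(name: str) -> Optional[str]:
    """Strip a known derivative ending to recover the item ID, else None."""
    for suffix in DERIVATIVE_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return None


def find_missing(mods_names: Iterable[str],
                 derivative_names: Iterable[str]) -> dict[str, list[str]]:
    """Report items with derivatives but no MODS, and MODS with no derivative."""
    mods_ids = {n[: -len(".xml")] for n in mods_names if n.endswith(".xml")}
    derived_ids = {i for i in map(item_from_derivative, derivative_names) if i}
    return {
        "items_without_mods": sorted(derived_ids - mods_ids),
        "mods_without_derivative": sorted(mods_ids - derived_ids),
    }
```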

Getting content from the share drive to the storage server to prepare for archiving

Preparation

Check the file NOTIFICATIONS.txt in S:\Digital Projects\Digital_Coll_Complete to verify that none of the content belongs to collections that need special attention from Jody or the metadata unit.
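The format of NOTIFICATIONS.txt is not described on this page, so any automated check is only a rough filter. As a hypothetical Python sketch, you could list every line that mentions one of the collection IDs you are about to move, matching whole IDs case-insensitively so that one ID does not also match a longer ID that begins with it. A match only tells you which entries to read; it does not replace reading the file.

```python
import re
from typing import Iterable


def flagged_collections(notifications_text: str,
                        collection_ids: Iterable[str]) -> dict[str, list[str]]:
    """Map each collection ID to the NOTIFICATIONS.txt lines that mention it."""
    lines = [ln.strip() for ln in notifications_text.splitlines() if ln.strip()]
    hits: dict[str, list[str]] = {}
    for cid in collection_ids:
        # Match the ID as a whole token, case-insensitively, so that "coll_1"
        # does not also match "coll_10".
        pattern = re.compile(rf"(?<!\w){re.escape(cid)}(?!\w)", re.IGNORECASE)
        matched = [ln for ln in lines if pattern.search(ln)]
        if matched:
            hits[cid] = matched
    return hits
```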

Getting content to storage

moveContent (File:MoveContent.txt) will move content to the deposits directory on the storage server, where it will be prepared for archiving at a later date.

  1. In the UploadArea/scripts directory, run 'moveContent'. This script will:
    1. check the database for existing info about this collection and show you whatever we already know, so you can correct it with your collection XML file.
    2. update our database and online collection XML file, if yours is new or changed, adding the online link to the collection if it is new. This is why, for a new collection, you have to wait to run this script until the files have been indexed and content is viewable online: otherwise there will be a dead link in the collection list.
    3. update or add the icon image, if you are providing one as a collection thumbnail.
    4. copy the archival content to the deposits/content directory on the server, for processing into the archive.
    5. compare the copied content with what you have on the share drive and, if it copied correctly, delete it from the share drive.
    6. write any errors to a file in the output directory.
  2. Check back after a few hours and look at the output file to verify that there were no problems and that the script completed. If you want, you can watch the files being uploaded in the deposits/content directory on libcontent.  :-)
  3. Check the share drive directories for any files that still remain. If any files remain in the directories on the share drive, they did NOT copy to the server!! Run the script again, as there may have been a failure of the copy across the network. If this fails, the file will need to be moved manually, and the problem encountered by the script must be resolved. Please notify Jody ASAP.
  4. Exit out of secure shell. Good work!!
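The copy, compare, and delete sequence that moveContent follows, together with the leftover-file check in step 3, can be sketched in Python as below. This page only says that the script compares the copied content; using SHA-256 checksums for that comparison, and these function names, are assumptions rather than a description of moveContent.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large TIFFs are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_verified(src_root: Path, dest_root: Path) -> list[Path]:
    """Copy each file under src_root into dest_root and delete the source only
    after the copy's checksum matches. Returns the source files NOT moved."""
    failures = []
    # Build the full list first, because files are deleted as we go.
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        dest = dest_root / src.relative_to(src_root)
        try:
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            if sha256_of(src) == sha256_of(dest):
                src.unlink()
            else:
                failures.append(src)
        except OSError:
            failures.append(src)  # network or permission error: keep the original
    return failures


def remaining_files(src_root: Path) -> list[Path]:
    """Files still on the share; per the steps above, these did NOT copy."""
    return sorted(p for p in src_root.rglob("*") if p.is_file())
```

The key design point, which matches the instructions above, is that an original is deleted only after its copy has been verified, so any file left on the share is exactly a file that did not make it to the server.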

Archiving content

The first sections of the attached diagram which pertain to Digital Services are delineated in Preparing_Collections_on_the_S_Drive_for_Online_Delivery_and_Storage.

See lines 9-25 and 31-33 on this page: Moving_Content_To_Long-Term_Storage

The movement of metadata is spelled out in Metadata_Movement and the process for uploading MODS is in Uploading_MODS.

We have yet to determine at what point MODS are ready for archiving, but that will be sorted out soon.

Most.png
