Electronic Theses and Dissertations

From UA Libraries Digital Services Planning and Documentation

Revision as of 08:16, 14 June 2012

ETDs 20100426.png

Workflow Overview

A. ProQuest uploads deposits of zip files to the content.lib.ua.edu server via FTP into the ftpaccess home directory, and notifies Janet Lee-Smeltzer of the upload. Janet notifies the Metadata Librarian (in this case, Mary or Kayla) to process them.

B. Metadata Librarian runs the Perl script "moveContent" (File:MoveContentBD.txt), which is located in the scripts directory of her home area on content.lib.ua.edu. This script picks up all zip files sitting in the etd_deposits directory (which corresponds to the ftpaccess home area), identifies the date the files were deposited, and relocates the files into a directory named for this date of deposit (yyyymmdd) in etd_deposits. (This script and the following one encompass the tasks that Jody was doing, outlined here: Preprocessing ETDs)
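The relocation in step B can be sketched roughly as follows. This is an illustrative Python sketch, not the moveContent Perl script itself. The function name move_deposits is hypothetical, and taking the deposit date from each file's modification time is an assumption; the real script may identify the date differently.

```python
import shutil
import time
from pathlib import Path


def move_deposits(deposit_dir: Path) -> dict:
    """Move each zip file in deposit_dir into a yyyymmdd subdirectory.

    Assumption: the deposit date is the file's modification time.
    Returns a mapping of date directory name -> list of moved file names.
    """
    moved = {}
    # sorted() materializes the listing before any file is moved.
    for zip_path in sorted(deposit_dir.glob("*.zip")):
        mtime = zip_path.stat().st_mtime
        stamp = time.strftime("%Y%m%d", time.localtime(mtime))
        target = deposit_dir / stamp
        target.mkdir(exist_ok=True)
        shutil.move(str(zip_path), str(target / zip_path.name))
        moved.setdefault(stamp, []).append(zip_path.name)
    return moved
```

Called as move_deposits(Path("etd_deposits")), this would leave each zip file under etd_deposits/yyyymmdd/, which is the layout the next step expects to be asked about.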

C. Metadata Librarian then runs the Perl script "processEtds" (File:ProcessEtds.txt), which is located in her scripts directory. This script asks which directory to process and whether the files are for May, December, or September graduation, so it knows when to start the embargoes. It creates a subdirectory in the "working" directory whose name matches the selected directory. Within this it creates three subdirectories: OPEN, PRQ, and CONTENT. OPEN is where it unpacks the zip files. PRQ is where it puts the renamed and altered XML for processing. CONTENT is where it puts the renamed content files.
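As a rough illustration of the working-directory layout described in step C, the following Python sketch creates the three subdirectories for one deposit. It is not the processEtds script; the function name make_working_dirs and its parameters are hypothetical, and it does not attempt the graduation-month prompt or the embargo logic.

```python
from pathlib import Path

# Subdirectory names as given in step C.
SUBDIRS = ("OPEN", "PRQ", "CONTENT")


def make_working_dirs(working_root: Path, deposit_name: str) -> Path:
    """Create working_root/deposit_name with OPEN, PRQ, and CONTENT inside it."""
    batch_dir = working_root / deposit_name
    for name in SUBDIRS:
        # parents=True also creates the batch directory on the first pass.
        (batch_dir / name).mkdir(parents=True, exist_ok=True)
    return batch_dir
```

For a deposit directory named by date (yyyymmdd), make_working_dirs(Path("working"), that_name) would produce working/yyyymmdd/OPEN, working/yyyymmdd/PRQ, and working/yyyymmdd/CONTENT.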

The "processEtds" script performs these tasks:

  1. extracts from each metadata file the following information:
    1. title,
    2. author,
    3. year the manuscript was completed
    4. year the degree was awarded
    5. embargo code (if any)
  2. queries the InfoTrack.bornDigital MySQL database table on libcontent1.lib.ua.edu to find the next filenumber to assign and the InfoTrack.lookup table to determine the next persistent URL;
  3. records the item number assigned, the PURL assigned, the author and title in these tables
  4. inserts the assigned item number (filename, minus the extension) into a UA_identifier attribute and the assigned PURL into a UA_purl attribute within the DISS_submission field in a copy of the metadata
  5. places this altered copy of the metadata into a PRQ subdirectory; the copy will be named with the assigned filename followed by ".prq.xml" (thus a correctly named file would be: u0015_0000001_0000023.prq.xml) to indicate this is still ProQuest XML.
  6. copies all the bitstreams and renames them appropriately, placing them in a CONTENT subdirectory. The primary PDF will be named with the assigned filename followed by ".pdf"; subsidiary files will be numbered sequentially, with a 4-digit left-padded number attached to the assigned filename, followed by the extension. So the first subsidiary file for this file (if a jpeg) would properly be named u0015_0000001_0000023_0001.jpg, and the second (if a text file) would be named u0015_0000001_0000023_0002.txt.
  7. creates an entry for each record in a tab-delimited xmlList.xml file which contains the following fields:
    1. assigned filename
    2. original filename
    3. author
    4. title
    5. directory (created out of zip file name)
    6. year the manuscript was completed
    7. year the degree was awarded
    8. an indicator of the existence of subsidiary files (a count)
    9. the embargo code
    10. date item is made available via the web
    11. the assigned PURL
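
The naming rules in tasks 5 and 6 and the tab-delimited entry in task 7 can be sketched in Python as below. This is an illustration only, not code from processEtds: the function names and the FIELD_ORDER keys are hypothetical labels for the eleven fields listed above, and keeping each subsidiary file's original extension unchanged is an assumption.

```python
from pathlib import Path

# Hypothetical keys, in the order the eleven xmlList fields are listed above.
FIELD_ORDER = (
    "assigned_filename", "original_filename", "author", "title", "directory",
    "year_completed", "year_awarded", "subsidiary_count", "embargo_code",
    "web_available_date", "purl",
)


def metadata_name(assigned: str) -> str:
    """Name of the altered ProQuest XML copy placed in PRQ."""
    return f"{assigned}.prq.xml"


def content_names(assigned: str, subsidiary_files: list) -> list:
    """Primary PDF name first, then subsidiaries numbered 0001, 0002, ..."""
    names = [f"{assigned}.pdf"]
    for index, original in enumerate(subsidiary_files, start=1):
        extension = Path(original).suffix  # assumption: extension kept as-is
        names.append(f"{assigned}_{index:04d}{extension}")
    return names


def xml_list_row(record: dict) -> str:
    """One tab-delimited line; missing fields are written as empty strings."""
    return "\t".join(str(record.get(key, "")) for key in FIELD_ORDER)
```

With the assigned filename u0015_0000001_0000023 and subsidiary files map.jpg and notes.txt, content_names returns u0015_0000001_0000023.pdf, u0015_0000001_0000023_0001.jpg, and u0015_0000001_0000023_0002.txt, matching the examples in task 6.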

D. Metadata Librarian works with the deposited content to create valid MODS files meeting our local profile, which include the assigned identifier and PURL, and are named for the assigned identifier with a ".mods.xml" extension.

E. Metadata Librarian uploads the finished MODS and the associated renamed content into a datestamp-titled MODS directory in her home area on libcontent1.lib.ua.edu.

F. Metadata Librarian runs the File:Relocate all BD.txt script in her home directory. This places all the content and MODS (which are not under embargo) into the correct directories in Acumen, as well as copying everything to the deposits directory for upload into the storage archive.

G. Metadata Librarian checks the final display and access via Acumen to verify that no problems exist. If any problems are encountered, she contacts Jody and we work out how to fix them.  :-)

H. Jody runs a script (/srv/scripts/bornDigital/relocatingBd File:RelocatingBD.txt) which will move the files into the correct subdirectories for long-term storage, linking them into the LOCKSS manifests.

I. Kayla or Janet creates valid MARC files for upload into our OPAC system, which reference the assigned identifier and PURL. (Note: This step is for files without embargoes. For more information about files with embargoes, see steps M and O.)

J. Kayla or Janet will batch upload the MARC to WorldCat, in order to obtain OCLC numbers for the records.

K. Kayla or Janet then submits a batch upload of the MARC records into our catalog system.

L. Kayla or Janet checks the final display and access via the OPAC.

M. The embargo-checking script "File:CheckEmbargo.txt" will check the database on the 21st of each month for embargoes which are due to lift the next month; this script will email Jody and the Metadata Librarians with the filename, title, author, and date that the embargo is to lift.
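
The selection made in step M might look something like the Python sketch below. It is not the checkEmbargo script: the record layout (a dict with filename, title, author, and a lift_date) and both function names are assumptions made for illustration, and the database query and email are left out.

```python
from datetime import date


def next_month(today: date) -> tuple:
    """Return (year, month) of the calendar month after today."""
    if today.month == 12:
        return today.year + 1, 1
    return today.year, today.month + 1


def lifting_next_month(records: list, today: date) -> list:
    """Select records whose embargo lift date falls in the next calendar month.

    Each record is assumed to be a dict with a 'lift_date' of type date.
    """
    target = next_month(today)
    return [
        record for record in records
        if (record["lift_date"].year, record["lift_date"].month) == target
    ]
```

Run on the 21st of December, for example, this would pick out records lifting in January of the following year, which is the year rollover a month-only comparison would miss.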

N. Another script, "File:LiftEmbargo.txt", will run on the first of each month and will copy into the live web directory anything whose embargo has lifted.

O. Kayla or Janet will upload the no-longer-embargoed content into WorldCat and then the local OPAC.

P. Should the metadata require remediation, the Metadata Librarian will add a recordChangeDate field, upload the altered MODS files to libcontent1, and run File:Relocate all BD.txt. This will move the files to the live web directory (except for those under embargo) and also to a deposits directory for archival storage.

Q. Jody will transfer these into archival storage using the aforementioned script, File:RelocatingBD.txt.

Altered_Embargoes

Reference: Find_our_content_online

updated 4/26/10 jlderidder
