Preparing Collections on the S Drive for Online Delivery and Storage

From UA Libraries Digital Services Planning and Documentation

Latest revision as of 08:54, 22 April 2015

This page assumes that the content for upload has already been through the Quality Control process.

Preparation Procedure

Choose one of the following checklists:

One-Shot Collections are small enough to be completed in one batch, so they require no batch numbering or batching process.

Ongoing Collections are those which will have multiple batches for upload, so they require "batching."

One-Shot Collections

  1. Move collection folder from Digital_Coll_in_progress to Digital_Coll_Complete
  2. Remove extra text in folder name so that it is labeled with just the collection number
    1. Example: u0003_0000633_HarperTimetables --> u0003_0000633
  3. Finalize collection documentation
    • Tracking Data: COPY the Filename column from the metadata spreadsheet to a new sheet, MOVE the Tracking Data columns to the new sheet, and SAVE it as a tab-delimited .txt file in the Admin folder (see Tracking Data for more)
    • Metadata: save the spreadsheet (minus the tracking data you should’ve already removed) as a tab-delimited .txt file in the Metadata folder
  4. Create MODS
  5. Carefully check contents of the folders (see list below)
    • Make sure to remove any unnecessary files created during the capture process (for example, test scans or supplementary metadata or text file notes about progress)
  6. Once everything is okay, you’re ready to Upload Content!
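
Step 2 above can be sketched in Python. This is only an illustration, not part of the documented workflow: the function name is hypothetical, and the pattern (one lowercase letter, four digits, an underscore, seven digits) is an assumption inferred from the collection numbers shown on this page.

```python
import re

# Assumed shape of a collection number, inferred from examples such as
# u0003_0000633: one lowercase letter, 4 digits, underscore, 7 digits.
COLLECTION_NUMBER = re.compile(r"^[a-z]\d{4}_\d{7}(?=_|$)")

def collection_folder_name(folder_name):
    """Return just the collection number from a working folder name."""
    match = COLLECTION_NUMBER.match(folder_name)
    if match is None:
        raise ValueError(f"no collection number found in {folder_name!r}")
    return match.group(0)

print(collection_folder_name("u0003_0000633_HarperTimetables"))
```

A folder that is already named with just the collection number is returned unchanged, so the helper is safe to run more than once.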

Ongoing Collections

  1. Set up collection folder in Digital_Coll_Complete, and create inside it
    • Admin
    • Metadata
    • Transcripts (if necessary)
  2. Move Scans folder from ongoing collection folder in Digital_Coll_in_progress to this new collection folder in Digital_Coll_Complete
  3. Follow the procedures for Creating Batch Documentation
  4. Create MODS
  5. Double-check contents of the folders (see list below)
  6. Once everything is okay, you’re ready to Upload Content!
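
Step 1 above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the function name and parameters are hypothetical, and a temporary directory stands in for Digital_Coll_Complete. The Scans folder is not created here because step 2 moves it in from Digital_Coll_in_progress.

```python
import tempfile
from pathlib import Path

def set_up_collection_folder(complete_root, collection_id, with_transcripts=False):
    """Create the collection folder and its starting subfolders."""
    collection_dir = Path(complete_root) / collection_id
    names = ["Admin", "Metadata"]
    if with_transcripts:
        names.append("Transcripts")  # only if necessary
    for name in names:
        (collection_dir / name).mkdir(parents=True, exist_ok=True)
    return collection_dir

# Demonstration in a throwaway directory standing in for Digital_Coll_Complete.
with tempfile.TemporaryDirectory() as tmp:
    folder = set_up_collection_folder(tmp, "u0003_0000001", with_transcripts=True)
    created = sorted(p.name for p in folder.iterdir())

print(created)
```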


Checking Folders

The Collection Level folder contains subfolders, and their contents must meet certain specifications before the collection is considered ready to "ship" for online access and long-term storage.


The following folders must exist and be capitalized as shown:

  • Admin
  • Metadata
  • Scans
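
A quick way to check this rule is to compare the folder names exactly, since a case-insensitive file system will not complain about "metadata" versus "Metadata". The sketch below is illustrative; the function name is hypothetical and it takes a plain list of names rather than reading the S drive.

```python
REQUIRED_FOLDERS = ("Admin", "Metadata", "Scans")

def missing_required_folders(folder_names):
    """Return required folders that are absent or capitalized differently."""
    present = set(folder_names)  # exact, case-sensitive comparison
    return [name for name in REQUIRED_FOLDERS if name not in present]

print(missing_required_folders(["Admin", "metadata", "Scans"]))
```

In practice the list of names could come from something like `[p.name for p in collection_dir.iterdir() if p.is_dir()]`.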

Admin

  • MUST contain:
    • Collection information XML file
      • u0003_0000001.xml
      • If multiple Digital Collections spawn from the same Analog Collection, there can be more than one Collection Information XML file as follows: u0003_0000001.1.xml, u0003_0000001.2.xml, etc.
      • Make sure to refer to the Collection_Information page regarding acceptable data values.
    • Log file
      • u0003_0000001.log.txt
      • Include a text version of the log file with every batch.
  • MAY also contain:
    • Thumbs Icon - .png extension.
      • u0001_2007001.icon.png
    • Skipped items list - .skipped.txt extension. For batched collections: this should be present ONLY during the last upload, as it will contain information about skipped items across the entire collection.
      • u0003_0000193.skipped.txt
    • Match file - .txt extension
      • u0001_2007010.match.txt
    • Other relevant documents saved as plain .txt (ANSI or UTF-8 without BOM preferred). If possible, please incorporate any additional data into the log.txt file. For example, audio collections often have significant item-level notes that we want to retain. Additional notes can be saved as plain text files with a ".notes.txt" extension.
      • u0008_0000001_0000001.notes.txt
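
The MUST items in the Admin list can be checked with a short script. This is a sketch, not an official tool: the function name is hypothetical, and it assumes the log file is always named with the bare collection number (the page does not say whether batch logs carry a batch number).

```python
import re

def missing_admin_files(collection_id, filenames):
    """Return labels of required Admin files that are not present."""
    cid = re.escape(collection_id)
    required = {
        # u0003_0000001.xml, or u0003_0000001.1.xml, u0003_0000001.2.xml, etc.
        "collection information XML": re.compile(rf"^{cid}(\.\d+)?\.xml$"),
        # Assumption: the log is named <collection number>.log.txt
        "log file": re.compile(rf"^{cid}\.log\.txt$"),
    }
    return [
        label
        for label, pattern in required.items()
        if not any(pattern.match(name) for name in filenames)
    ]

print(missing_admin_files("u0003_0000001", ["u0003_0000001.xml", "u0003_0000001.icon.png"]))
```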

Metadata

  • MUST contain:
    • Text export of metadata spreadsheet
      • u0003_0000001.m01.txt or u0002_0000001.m03.txt or u0008_0000001.m02.txt
      • Note the type of spreadsheet is echoed in the segment before the ".txt" -- if this is a batch file, the batch number precedes the m0x value -- example: u0002_0000001.1.m01.txt.
        • check this file for: Diacritics, Quotes, UTF-8 encoding
        • see https://intranet.lib.ua.edu/cataloging/metadata/SpreadsheetRegistry for more information.
    • MODS folder
      • With MODS files created via Archivist Utility.
    • FITS folder
      • With FITS files created by script
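
The naming rule for the metadata export, and the UTF-8 part of the file check, can be illustrated as follows. Both function names are hypothetical, and the regular expression is an assumption inferred from the example filenames above. The diacritics and quotes checks are not covered because this page does not define them.

```python
import re

# <collection number>[.<batch number>].m0x.txt, inferred from the examples.
METADATA_EXPORT = re.compile(r"^([a-z]\d{4}_\d{7})(?:\.(\d+))?\.(m\d{2})\.txt$")

def parse_metadata_export_name(filename):
    """Return (collection number, batch number or None, spreadsheet type)."""
    match = METADATA_EXPORT.match(filename)
    if match is None:
        raise ValueError(f"unexpected metadata export name: {filename!r}")
    collection, batch, sheet_type = match.groups()
    return collection, int(batch) if batch else None, sheet_type

def is_valid_utf8(data):
    """Return True if the raw bytes of a file decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(parse_metadata_export_name("u0002_0000001.1.m01.txt"))
```

To test a real export, read it in binary mode first, for example `is_valid_utf8(path.read_bytes())`.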

Scans

  • MUST contain ONLY:
    • Scans (tiffs/wavs) of non-compound objects and compound objects (inside respective subfolders). All other file types will not be retained. Temporary files and thumbs.db files do not have to be deleted, since they will be removed upon transfer to Storage.
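
To see in advance which files in Scans would not be retained, a filter along these lines may help. It is a sketch only: the function name is hypothetical, and the accepted extensions (.tif, .tiff, .wav, compared case-insensitively) are an assumption based on "tiffs/wavs".

```python
from pathlib import PurePath

# Assumed extensions for "tiffs/wavs".
RETAINED_EXTENSIONS = {".tif", ".tiff", ".wav"}

def files_not_retained(filenames):
    """Return the names whose extension is not a retained scan type."""
    return [
        name
        for name in filenames
        if PurePath(name).suffix.lower() not in RETAINED_EXTENSIONS
    ]

print(files_not_retained(["u0003_0000001_0000001.tif", "Thumbs.db", "progress_notes.txt"]))
```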

Transcripts

  • This folder CAN exist ONLY IF transcripts exist.
  • Must only contain one or more of the following types of files:
    • u0003_0000001_0000001.tif, u0003_0000002_0000001_0001.tif, etc. - images that need to be OCRed
    • u0003_0000001_0000001.txt, u0003_0000002_0001.txt, etc. - plain text transcripts
    • u0003_0000001_0000001.ocr.txt, u0003_0000002_0001.ocr.txt, etc. - plain text OCR from images
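
The three permitted file types can be told apart by their endings, as long as ".ocr.txt" is tested before the plain ".txt" case. The function name and the returned labels below are hypothetical and chosen only for illustration.

```python
def transcript_file_kind(filename):
    """Classify a Transcripts file, or return None if it is not permitted."""
    lower = filename.lower()
    if lower.endswith(".ocr.txt"):  # must be checked before plain .txt
        return "ocr text"
    if lower.endswith(".txt"):
        return "transcript"
    if lower.endswith(".tif"):
        return "image to OCR"
    return None

for name in ["u0003_0000001_0000001.tif",
             "u0003_0000001_0000001.txt",
             "u0003_0000001_0000001.ocr.txt"]:
    print(name, "->", transcript_file_kind(name))
```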

needsRemediation

  • This folder always exists at the following location:
    • S:\Digital Projects\Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation
  • Must contain:
    • Excel formatted version of the batch metadata.
      • example: u0002_0000001.1.m01.xlsx.
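
The expected Excel filename can be derived from the name of the tab-delimited export in the Metadata folder. The sketch below only builds the path and does not touch the S drive; the function name is hypothetical, and it assumes the Excel file shares the export's name apart from the extension, as the example above suggests.

```python
from pathlib import PureWindowsPath

NEEDS_REMEDIATION = PureWindowsPath(
    r"S:\Digital Projects\Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation"
)

def remediation_path(metadata_export_name):
    """Return where the Excel version of a metadata export is expected."""
    # with_suffix swaps only the final extension (.txt becomes .xlsx).
    excel_name = PureWindowsPath(metadata_export_name).with_suffix(".xlsx").name
    return NEEDS_REMEDIATION / excel_name

print(remediation_path("u0002_0000001.1.m01.txt"))
```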