Batches

From UA Libraries Digital Services Planning and Documentation
(reworked instructions based on new work flow, incorporating tracking into SS)
Revision as of 17:14, 7 January 2013

A batch is simply a portion of a collection that goes through the upload process bundled together. This page contains or links to information related to the batching process and what differentiates it from a whole-collection upload.

When to make a batch

A new batch is created when you decide to upload content from an ongoing collection. This almost always happens when you've broken off and numbered a new Scans folder, but it is possible to upload several Scans folders at once. Whatever the case, the content from a particular collection which is uploaded together is a batch, and this content should be no more than 300 items or 4000 captures.
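The size limit can be checked before upload with something like the sketch below. It assumes the limit means a batch must satisfy both conditions (at most 300 items and at most 4000 captures); the function and constant names are illustrative and not part of any existing script.

```python
MAX_ITEMS = 300       # item limit stated on this page
MAX_CAPTURES = 4000   # capture limit stated on this page


def batch_within_limits(item_count: int, capture_count: int) -> bool:
    """Return True when the batch stays within both limits.

    Assumption: exceeding either limit makes the batch too large.
    """
    return item_count <= MAX_ITEMS and capture_count <= MAX_CAPTURES
```

If the check fails, the simplest remedy is probably to upload fewer Scans folders in that batch.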

Documentation for a batch

Dual role of the metadata spreadsheet

As of January 2013, tracking data is incorporated into the metadata spreadsheet during the capture process. For the purposes of this explanation, metadata refers to the data which goes online, while tracking data is the internal, administrative data we capture for our unit. The tracking columns are found at the end of the spreadsheet (after the Staff Notes and Batch columns).

Isolating the batch within the metadata spreadsheet

  • As you track your captures in the metadata spreadsheet, note in the Batch column that each item is part of the current batch.
  • Note which items have been skipped by labeling them 'skipped' in the Batch column.
  • When you are ready to upload a batch, sort the spreadsheet by the Batch column; this will bring all of the lines of metadata for that batch together.
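For anyone who works with an exported copy of the spreadsheet rather than sorting by hand, the same selection could be sketched as below. This is only an illustration: it assumes each row has been read into a dictionary keyed by column header, and that the Batch column holds either a batch number or the word 'skipped'. The function names are hypothetical.

```python
def rows_for_batch(rows, batch_number):
    """Return the rows whose Batch cell matches batch_number, in original order."""
    wanted = str(batch_number)
    return [row for row in rows if str(row.get("Batch", "")).strip() == wanted]


def skipped_rows(rows):
    """Return the rows labeled 'skipped' in the Batch column."""
    return [
        row for row in rows
        if str(row.get("Batch", "")).strip().lower() == "skipped"
    ]
```

Filtering gives the same result as sorting and then selecting the contiguous block of rows, without reordering the collection spreadsheet.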

Creating batch documentation

  • Batch metadata spreadsheet
    • Copy the spreadsheet headers and the rows of metadata for your batch to a new sheet
      • Do not copy the tracking data
    • Save that sheet as a new, separate file, so that you don't alter the original collection metadata
    • Naming format: collectionnumber.batchnumber.m0number.xlsx
      • Example: u0003_0000193.2.m01.xlsx
  • Batch text export for MODS creation
    • Export all the columns of metadata, but NOT the tracking data, as a tab-delimited file
    • Save to the collection's Metadata folder
    • Naming format: collectionnumber.batchnumber.m0number.txt
      • Example: u0003_0000193.2.m01.txt
  • Batch log file
    • Export all the columns of tracking data (but NOT the metadata), plus the Item Identifier column, as a tab-delimited file
    • Save to the collection's Admin folder
    • Naming format: collectionnumber.batchnumber.m0number.log.txt
      • Example: u0003_0000193.2.m01.log.txt
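The split between the metadata export and the log export could be sketched as follows. This is a rough illustration, not an existing script: it assumes every column up to and including Batch belongs in the metadata export and every column after it is tracking data, as described above. The tracking column names used in the usage example are invented.

```python
import csv
import io


def split_columns(headers, last_metadata_column="Batch"):
    """Split headers into (metadata columns, tracking columns).

    Assumption: tracking columns are everything after the Batch column.
    """
    cut = headers.index(last_metadata_column) + 1
    return headers[:cut], headers[cut:]


def to_tab_delimited(rows, columns):
    """Return the chosen columns of each row as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",   # silently drop columns not being exported
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A possible use, with hypothetical tracking columns "Scanned By" and "Scan Date":

```python
headers = ["Item Identifier", "Title", "Staff Notes", "Batch",
           "Scanned By", "Scan Date"]
metadata_columns, tracking_columns = split_columns(headers)
log_columns = ["Item Identifier"] + tracking_columns
```

The metadata text would then be written with `metadata_columns` and the log text with `log_columns`.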

Older spreadsheet?

  • For older spreadsheets that don't have m0numbers (m01, m02, m03, etc.), simply leave that part out of the file name
  • Examples
    • u0003_0001577.30.xlsx
    • u0001_2007010.4.log.txt
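The naming formats above, including the older form without an m0number, can be summarized in a small helper. This is a sketch for illustration only; the function name and the "spreadsheet", "text", and "log" labels are made up here.

```python
# File endings for the three kinds of batch documentation.
ENDINGS = {"spreadsheet": "xlsx", "text": "txt", "log": "log.txt"}


def batch_file_name(collection_number, batch_number, kind, m0number=None):
    """Build collectionnumber.batchnumber[.m0number].ending

    Leave m0number as None for older spreadsheets that do not have one.
    """
    parts = [collection_number, str(batch_number)]
    if m0number:
        parts.append(m0number)
    parts.append(ENDINGS[kind])
    return ".".join(parts)
```

For example, `batch_file_name("u0003_0000193", 2, "log", "m01")` gives `u0003_0000193.2.m01.log.txt`, and `batch_file_name("u0003_0001577", 30, "spreadsheet")` gives `u0003_0001577.30.xlsx`.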


Checklist: What's different about workflow for a batch

  • Preparation for Upload
    • Separate batch spreadsheet in Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation
    • Text file metadata in S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Metadata, with MODS created from that
    • Text file log in S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Admin
  • Upload
    • The script-running process should be identical
    • With a batch (unless it's the first), the collection info XML is already online, so you don't have to wait 24 hours for Acumen to index the items before you can run moveContent
  • Storage
    • You will not need the collection info XML file when you move the collection from the share drive to storage. In fact, the moveContent script will tell you to delete it, unless it is a new, improved version
  • Documentation
    • Unless you're working on the last batch of a collection, do not move the collection from in progress to completed on the Selection spreadsheet