Batches

From UA Libraries Digital Services Planning and Documentation

Revision as of 16:01, 23 April 2013

A batch is simply a portion of a collection that goes through the upload process bundled together. This page contains or links to information related to the batching process and what differentiates it from a whole-collection upload.

Creating a Batch

What is a batch?

A new batch is created when you decide to upload content from an ongoing collection. This almost always happens when you've broken off and numbered a new Scans folder, but it is possible to upload several Scans folders at once. Whatever the case, the content from a particular collection which is uploaded together is a batch, and this content should be no more than 300 items or 4000 captures.
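The size ceiling can be checked before upload with a few lines of Python. This is only a sketch: it assumes the limit means a batch must stay within both numbers (300 items and 4000 captures), and the function and constant names are hypothetical, not part of any existing script.

```python
# Ceilings quoted on this page; the names are illustrative only.
MAX_ITEMS = 300
MAX_CAPTURES = 4000


def batch_within_limits(item_count: int, capture_count: int) -> bool:
    """Return True when the batch stays within both ceilings.

    Assumption: "no more than 300 items or 4000 captures" is read as
    "stop at whichever limit is reached first".
    """
    return item_count <= MAX_ITEMS and capture_count <= MAX_CAPTURES
```

For example, a batch of 120 items with 4500 captures would fail the check even though the item count is low.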

Dual role of the metadata spreadsheet

As of January 2013, tracking data is incorporated into the metadata spreadsheet during the capture process. For the purposes of this explanation, metadata refers to the data which goes online, while tracking data is the internal, administrative information we capture for our unit. The tracking data columns are found at the end of the spreadsheet, and they should be removed before the spreadsheet is handed off to the Metadata Unit.

Isolating a batch within the metadata spreadsheet

  • As you track your captures in the metadata spreadsheet, make sure to note in the Batch column that the item is part of the current batch.
  • Note which items have been skipped by labeling them 'skipped' in the Batch column (see Skipped Items)
  • When you are ready to upload a batch, sort the spreadsheet by the Batch column; this will bring all of the lines of metadata for that batch together
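Sorting in the spreadsheet application is the documented method. If the spreadsheet has been read into Python as a list of dictionaries, the same isolation step might look like the sketch below. The column name "Batch" comes from this page; the function name, the "Item" column, and the assumption that batches are labeled with plain numbers are hypothetical.

```python
def rows_for_batch(rows, batch_label, batch_column="Batch"):
    """Return only the rows whose Batch cell matches batch_label.

    Rows labeled 'skipped' (or belonging to another batch) are left out
    because their Batch cell does not equal the requested label.
    """
    wanted = str(batch_label)
    return [
        row for row in rows
        if str(row.get(batch_column, "")).strip() == wanted
    ]


# Hypothetical sample data for illustration.
sample = [
    {"Item": "a", "Batch": "2"},
    {"Item": "b", "Batch": "skipped"},
    {"Item": "c", "Batch": "3"},
]
current_batch = rows_for_batch(sample, 2)
```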

Batch Documentation

Batch Spreadsheet

  • Used by: the Metadata Unit
  • Used for: remediating metadata that has already gone online

To create a batch spreadsheet...

  • Select just the rows that belong to the batch, plus the header row
  • Deselect the tracking data, leaving the metadata
  • Save the selection as a new, separate file: a Microsoft Excel document (.xlsx)
    • Naming format: collectionnumber.batchnumber.m0number.xlsx
    • Example: u0003_0000193.2.m01.xlsx
  • Save this file to the Metadata folder temporarily; just prior to upload, it will be moved to Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation

Batch text export

  • Used by: Digital Services
  • Used for: MODS creation

To create a batch text export...

  • Select just the rows that belong to the batch, plus the header row
  • Deselect the tracking data, leaving the metadata
  • Save the selection as a new, separate file: a tab-delimited text document (.txt)
    • Naming format: collectionnumber.batchnumber.m0number.txt
    • Example: u0003_0000193.2.m01.txt
  • Save this file to the collection's Metadata folder (S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Metadata)

Batch log file

  • Created for: archiving tracking data collected during the capture process

To create a batch log file...

  • Select just the rows that belong to the batch, plus the header row
  • Deselect the metadata, leaving the tracking data
  • Save the selection as a new, separate file: a tab-delimited text document (.txt)
    • Naming format: collectionnumber.batchnumber.m0number.log.txt
    • Example: u0003_0000193.2.m01.log.txt
  • Save this file to the collection's Admin folder (S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Admin)
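Both tab-delimited files above (the batch text export and the batch log file) come down to keeping some columns and dropping others. A minimal sketch using Python's standard csv module follows. Which columns count as tracking data depends on the spreadsheet, so the column names in the usage lines are hypothetical, as is the function name.

```python
import csv
import io


def export_tab_delimited(header, rows, drop_columns):
    """Return tab-delimited text containing every column not in drop_columns.

    header is a list of column names; rows is a list of lists in the
    same column order as header.
    """
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return buffer.getvalue()


# Hypothetical columns: "Scanned By" stands in for a tracking column.
header = ["Title", "Date", "Scanned By"]
rows = [["Letter", "1901", "jd"]]
metadata_text = export_tab_delimited(header, rows, {"Scanned By"})
log_text = export_tab_delimited(header, rows, {"Title", "Date"})
```

Passing the tracking columns as drop_columns gives the text export; passing the metadata columns gives the log file. The returned string can then be written to a .txt file named according to the formats above.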

Batch OCR list

  • Used by: Upload Script
  • Used for: telling MakeJpegs which files to process through Tesseract

To create a batch OCR list, see these instructions

Note on naming files made from older spreadsheets

  • For older spreadsheets that don't have m0numbers (m01, m02, m03, etc.), simply leave that part out of the file name
  • Examples
    • u0003_0001577.30.xlsx
    • u0001_2007010.4.log.txt
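The naming formats on this page, including the older form without an m0number, can be summarized in one small helper. This is a sketch for illustration; the function name and the "kind" keywords are hypothetical, while the resulting file names follow the examples given above.

```python
# Maps a hypothetical "kind" keyword to the extension used on this page.
SUFFIXES = {"xlsx": "xlsx", "txt": "txt", "log": "log.txt"}


def batch_file_name(collection, batch, m0number=None, kind="xlsx"):
    """Build collectionnumber.batchnumber[.m0number].extension.

    m0number is omitted for older spreadsheets that do not have one.
    """
    parts = [collection, str(batch)]
    if m0number:
        parts.append(m0number)
    parts.append(SUFFIXES[kind])
    return ".".join(parts)


spreadsheet_name = batch_file_name("u0003_0000193", 2, "m01", kind="xlsx")
older_log_name = batch_file_name("u0001_2007010", 4, kind="log")
```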

Checklist: What's different about workflow for a batch

  • Preparation for Upload
    • Create a collection folder in Digital_Coll_Complete, then:
      • Move the Scans folder to it
      • Create Admin and Metadata folders
      • Copy the appropriate documents from in_progress to Admin and Metadata
    • Create batch spreadsheet in Administrative\Pipeline\collectionInfo\forMDlib\needsRemediation
    • Create batch metadata text file in S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Metadata, with MODS created from that
    • Create batch log file in S:\Digital Projects\Digital_Coll_Complete\[collectionnumber]\Admin
  • Upload
    • The script-running process should be identical
    • With a batch (unless it's the first), the collection info xml is already online, so you don't have to wait for Acumen to index the items before you can run moveContent
  • Storage
    • You will not need the collection info xml file when you move the collection from the share drive to storage. In fact, the moveContent script will tell you to delete it...unless it is a new, improved version
  • Documentation
    • Unless you're working on the last batch of a collection, do not move the collection from in progress to completed on the Selection spreadsheet
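As a quick reference for the checklist, the destinations of the text export and the log file can be assembled from the collection number. The sketch below uses pathlib's PureWindowsPath so it only builds path strings and touches no files. The share-drive paths are the ones quoted on this page; the function name and dictionary keys are hypothetical.

```python
from pathlib import PureWindowsPath

# Root quoted on this page for completed collections.
COMPLETE_ROOT = PureWindowsPath(r"S:\Digital Projects\Digital_Coll_Complete")


def batch_destinations(collection, text_name, log_name):
    """Return where the batch text export and log file belong."""
    base = COMPLETE_ROOT / collection
    return {
        "metadata": base / "Metadata" / text_name,
        "log": base / "Admin" / log_name,
    }


destinations = batch_destinations(
    "u0003_0000193",
    "u0003_0000193.2.m01.txt",
    "u0003_0000193.2.m01.log.txt",
)
```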