Tracking Data

From UA Libraries Digital Services Planning and Documentation

Latest revision as of 10:04, 6 January 2014

This page refers to the current procedure for recording administrative metadata during the capture process. For older processes of recording tracking data, see TrackingFiles.

Procedure

Integrate the tracking data columns before capture begins

  1. Open the collection's metadata spreadsheet
  2. Open trackingcolumns_template.xlsx (found in S:\Digital Projects\Organization\Digital_Program_Logs\TrackingFiles\TrackingFiles_database_files)
  3. COPY tracking columns to the end of the metadata spreadsheet or enter these column headers manually, just as they are given here:
    Number of Captures
    Captured with
    Captured by
    Date
    OCR? (1=yes or 0=no)
    DS Notes
    Metadata changed
  4. Save the metadata spreadsheet (and close trackingcolumns_template.xlsx)
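
For anyone who prefers to script the header step, here is a minimal sketch of appending the tracking headers to an existing header row. It assumes the header row has already been read into a plain Python list. The function name add_tracking_headers is hypothetical and is not part of any existing tool.

```python
# Column headers exactly as listed in the procedure above.
TRACKING_HEADERS = [
    "Number of Captures",
    "Captured with",
    "Captured by",
    "Date",
    "OCR? (1=yes or 0=no)",
    "DS Notes",
    "Metadata changed",
]


def add_tracking_headers(header_row):
    """Return a copy of header_row with any missing tracking headers appended.

    Hypothetical helper: header_row is assumed to be a list of strings
    taken from the first row of the metadata spreadsheet.
    """
    result = list(header_row)
    for name in TRACKING_HEADERS:
        if name not in result:  # avoid adding a header twice
            result.append(name)
    return result


if __name__ == "__main__":
    print(add_tracking_headers(["Filename", "Title", "Format"]))
```

Because headers that are already present are skipped, running the helper twice on the same row does not duplicate columns.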

Use the tracking data columns during capture

  • Include data only for items that you actually capture, not things you skip (see Skipped Items for how to track those)
  • If you begin but don't finish an item (even if you're planning to finish it soon), note that in the DS Notes column. This is especially important for collections that multiple people are working on, or that you're going to put on the back burner for some reason. Basically, make sure someone else picking up the collection will know exactly where you are, even if you're in the middle of an item
  • Check your Number of Captures against the Format column in the metadata part of the spreadsheet -- the metadata should be made to match the number of captures you make (if there's a big discrepancy, see Procedural Anomalies)
  • Be sure to fill in the OCR? column so that Tesseract can do its job
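
As a rough illustration of the OCR? check, the sketch below flags captured items whose OCR? value is not 1 or 0. It rests on several assumptions: each row is a dict keyed by column header (as csv.DictReader would produce), cell values are strings, and an item counts as captured when its Number of Captures cell is not empty. The function name rows_missing_ocr is hypothetical.

```python
OCR_HEADER = "OCR? (1=yes or 0=no)"
CAPTURES_HEADER = "Number of Captures"


def rows_missing_ocr(rows):
    """Return 1-based data-row numbers of captured items with no valid OCR? value.

    Assumption: a row is treated as captured when its Number of Captures
    cell is non-empty; rows for skipped items are ignored.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        captured = str(row.get(CAPTURES_HEADER, "")).strip()
        if not captured:
            continue  # skipped item: no tracking data expected
        ocr = str(row.get(OCR_HEADER, "")).strip()
        if ocr not in ("0", "1"):
            problems.append(number)
    return problems
```

An empty result means every captured item has a usable OCR? value.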

Remove and save the tracking data columns

  • Do this after the spreadsheet has been batched, if batching is necessary
  • COPY Filename column from the metadata spreadsheet to a new sheet
  • MOVE Tracking Data columns to new sheet
  • Make sure you also copy the column headers
    • If you don't, you'll get an error on upload: the script assumes the first row is the header, so it will conclude that your first item in Scans hasn't been entered into the log
  • Save the new sheet as a tab-delimited .txt file [collNum.batchNum.log.txt] in the collection's Admin folder
    • Example: u0001_2007001.25.log.txt
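
The copy/move/save steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the script used on upload: the sheet is assumed to be a list of equal-length rows with the header first, and the names split_tracking, to_tab_delimited, and log_filename are hypothetical.

```python
import csv
import io

TRACKING_HEADERS = [
    "Number of Captures", "Captured with", "Captured by", "Date",
    "OCR? (1=yes or 0=no)", "DS Notes", "Metadata changed",
]


def log_filename(coll_num, batch_num):
    """Build the log name in the collNum.batchNum.log.txt pattern."""
    return f"{coll_num}.{batch_num}.log.txt"


def split_tracking(rows):
    """Split a sheet into (metadata_rows, log_rows); both keep the header row.

    The Filename column is copied into the log, while the tracking
    columns are moved out of the metadata.
    """
    header = rows[0]
    filename_idx = header.index("Filename")
    tracking_idx = [i for i, name in enumerate(header) if name in TRACKING_HEADERS]
    keep_idx = [i for i in range(len(header)) if i not in tracking_idx]
    log_idx = [filename_idx] + tracking_idx
    metadata_rows = [[row[i] for i in keep_idx] for row in rows]
    log_rows = [[row[i] for i in log_idx] for row in rows]
    return metadata_rows, log_rows


def to_tab_delimited(log_rows):
    """Serialize the log rows as tab-delimited text, header row first."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(log_rows)
    return buffer.getvalue()
```

Because the header row travels with the data in both outputs, the first line of the log is always the column headers, which is what the upload script expects. Calling log_filename("u0001_2007001", 25) reproduces the example name u0001_2007001.25.log.txt.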

Rationale for Change

Problem

In December of 2012, it was proposed that we rethink the current tracking model, for two reasons:

  1. Numerous fields in the TrackingFiles document were not being used, and they were overburdened with validation rules
  2. TrackingFiles documents were separate files, housed in a totally separate location from the rest of the collection's documentation

Advantages of New System

We then proposed the simplified and integrated tracking data procedure outlined above. This process has many advantages for our workflow:

  • There is no longer a need to keep two spreadsheets open during capture, which improves our data collection
  • With everything in one spreadsheet, it is easier to
    • check the metadata against the tracking data, which allows for more accurate recordkeeping
    • isolate batch tracking data (as the entire working metadata/tracking spreadsheet is being "batched")
  • The tracking data has been pared down to essential fields but is open to future reconfiguration if necessary

Effects and Potential Effects

The process has the following impacts on workflow:

  • No added step at preparation for capture, just a rewriting of the old one (adding template columns to existing sheet rather than saving template sheet as new document)
  • No added step at preparation for upload, just a reworking of the old (transferring and saving the data rather than simply saving a spreadsheet)
    • Admittedly, the copying/removing procedure introduces risk, but it is a risk we are accustomed to taking: we have always done something similar to extract the OCR list from TrackingFiles
    • If the columns are not removed, Archivist Utility will point them out during the MODS-making process, and the problem can be corrected
  • The new model contains most of the same data that was effectively used in the old TrackingFiles documents, and can be named and archived in the same way
  • Done properly, the new model should not impact the Metadata Unit at all; done poorly, it will not break our current system (columns will be mapped to MODS in a way that doesn't trigger a fatal error)
  • Legacy collections will not require manual migration to the new system
    • There was nothing inherently broken about the old system, so it can continue where it needs to
    • The new form of tracking can be instituted as outlined above at any point in ongoing collections because we document them in batches