Streamlining Metadata Creation

From UA Libraries Digital Services Planning and Documentation

From about 2009 to 2015, our process for most digitization was this:

  1. Archivists select content and generate initial metadata spreadsheets
  2. Digital Services digitizes the content and uploads it with initial MODS, handing off the initial metadata to the metadata librarians
  3. Metadata librarians remediate the spreadsheets and overwrite the initial MODS with final MODS (see SpreadsheetToMODSWorkflow.docx)

A more detailed review of this process can be found in MDworkflow_March2014.docx.

In 2015, we were mandated to streamline this process, generating completed MODS prior to upload.

The possible points for change are listed below, for reference and implementation.

  1. Reduce duplication of effort
    1. Incorporate necessary spreadsheet review into QC2?
    2. Do not enter "creator" value if a name has been entered into another creator field (such as "Sender")
    3. Check names as they are entered in the spreadsheet, and verify entire spreadsheet after tagging subjects (do not recheck names)
  2. Reduce excess effort
    1. If not significant, do not include names in subject headings
    2. If not significant material, do not spend more than 5 minutes assigning subjects or description
    3. Subjects should be "about" the content
    4. Document common forms of materials to assist in consistency
    5. Do not enter "unknown" if the creator, date, etc. is unknown; leave the field blank
    6. Combine small collections together (see Combo Collections)
    7. Repurpose unused column for names with other relator values, and standardize how they are entered (have script translate)
  3. Automate manual processes where possible
    1. Copy sender and receiver names to LCSH column, followed by "--Correspondence"
    2. Use ExcelConverter to transform spreadsheets to tab-delimited text
    3. Include item and collection PURL incorporation in getJpegs script to avoid extra steps?
  4. Reorder workflow where it makes sense to do so
    1. Take care of common "remediation" problems during initial data entry
    2. Do subject tagging and name authority work BEFORE digitization
    3. Review spreadsheets as part of QC2 during digitization process
    4. Run rawMods2Mods and validation scripts before upload
    5. Incorporate item PURLs and collection PURLs into MODS during upload process (makeJpegs script), eliminating preliminary scripts and current problems
    6. Incorporate recordCreation/change dates into upload process?
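
Several of the spreadsheet rules above lend themselves to a small script. The following is a minimal sketch, not one of the existing scripts: it assumes each spreadsheet row has already been read into a dictionary, that the columns are named "Creator", "Sender", "Recipient" and "LCSH", and that multiple subjects in one cell are separated by semicolons. All of those names, and the function name tidy_row, are hypothetical and would need to match the real spreadsheets.

```python
# Hypothetical column names and delimiter; the real spreadsheets may differ.
NAME_COLUMNS = ("Sender", "Recipient")
SUBJECT_COLUMN = "LCSH"
SUBJECT_DELIMITER = "; "


def tidy_row(row):
    """Return a cleaned copy of one spreadsheet row (a dict of column -> value)."""
    cleaned = {}
    for column, value in row.items():
        value = (value or "").strip()
        # Leave a field blank rather than recording "unknown".
        cleaned[column] = "" if value.lower() == "unknown" else value

    # Do not keep a "Creator" value when another creator field has a name.
    if "Creator" in cleaned and any(cleaned.get(col) for col in NAME_COLUMNS):
        cleaned["Creator"] = ""

    # Copy sender and receiver names to the subject column,
    # each followed by "--Correspondence", without adding duplicates.
    existing = cleaned.get(SUBJECT_COLUMN, "")
    subjects = [s.strip() for s in existing.split(";") if s.strip()]
    for col in NAME_COLUMNS:
        name = cleaned.get(col, "")
        heading = name + "--Correspondence"
        if name and heading not in subjects:
            subjects.append(heading)
    cleaned[SUBJECT_COLUMN] = SUBJECT_DELIMITER.join(subjects)
    return cleaned
```

Rows from a tab-delimited export could be fed to this function with the standard library's csv.DictReader using delimiter="\t". Running the same function twice on a row leaves it unchanged, so it would be safe to apply both at data entry and again before upload.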

Other aspects

  1. Improve existing scripts
    1. Pull names from subject headings for MADS
    2. Repair date transformations
    3. Improve PURL incorporation script if we continue to include it in rawModsToMods
    4. Improve VIAF search script if possible
    5. Rewrite PowerShell scripts (safety risk and too slow)
  2. Automate as much of existing remediation work as possible
    1. Add authority code to master spreadsheet list
    2. Compare subject headings to those in existing subject master list, pull replacement value, translate to proper MODS encoding; document failures
    3. Compare name entries to those in MADS list; if not there, attempt VIAF lookup and additions to MADS list; document failures
    4. Remove creator entries where sender and recipient exist
    5. Remove newlines in MODS fields, ensure typeOfContent contains controlled vocabulary, add creation/change dates, and check that the record validates
  3. Complete metadata processing on existing spreadsheets prior to upload
  4. Do complete metadata process on new content entering the queue prior to digitization, adding staffing to this effort
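
Two of the remediation steps listed above, comparing subject headings to the master list and removing newlines from field values, could be automated along these lines. This is a sketch under stated assumptions rather than a description of any existing script: it assumes the subject master list has been loaded into a dictionary that maps a normalized form of each known variant to its replacement value, and the function names are hypothetical. Translation to MODS encoding and the MADS and VIAF name lookups are not covered here.

```python
import re


def normalize_key(heading):
    """Collapse whitespace and ignore case so minor variants still match."""
    return re.sub(r"\s+", " ", heading).strip().casefold()


def remediate_subjects(headings, master_list):
    """Split headings into replacement values and documented failures.

    master_list is assumed to map normalize_key(variant) -> replacement value.
    """
    replaced, failures = [], []
    for heading in headings:
        key = normalize_key(heading)
        if not key:
            continue  # skip empty cells
        if key in master_list:
            replaced.append(master_list[key])
        else:
            failures.append(heading)  # keep the original text for review
    return replaced, failures


def collapse_newlines(value):
    """Replace each run of line breaks (and adjacent spaces) with one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", value).strip()
```

The failures list is returned rather than discarded so that unmatched headings can be written to a log or report for a metadata librarian to resolve by hand.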