Streamlining Metadata Creation
From UA Libraries Digital Services Planning and Documentation
From about 2009 to 2015, our process for most digitization was this:
- Archivists select content and generate initial metadata spreadsheets
- Digital Services digitizes content and uploads it with initial MODS, handing off the initial metadata to metadata librarians
- Metadata librarians remediate the spreadsheets and overwrite the initial MODS with final MODS (see SpreadsheetToMODSWorkflow.docx).
A more detailed review of this process can be found in MDworkflow_March2014.docx.
In 2015, we were mandated to streamline this process, generating completed MODS prior to upload.
The possible points for change are listed below, for reference and implementation.
- Reduce duplication of effort
  - Incorporate necessary spreadsheet review into QC2?
  - Do not enter a "creator" value if the name has already been entered in another creator field (such as "Sender")
  - Check names as they are entered in the spreadsheet, and verify the entire spreadsheet after tagging subjects (do not recheck names)
- Reduce excess effort
  - If not significant, do not include names in subject headings
  - If the material is not significant, do not spend more than 5 minutes assigning subjects or description
  - Subjects should be "about" the content
  - Document common forms of materials to assist in consistency
  - Do not enter "unknown" if the creator, date, etc. is unknown; leave the field blank
  - Combine small collections together (see Combo Collections)
  - Repurpose an unused column for names with other relator values, and standardize how they are entered (have a script translate them)
- Automate manual processes where possible
  - Copy sender and receiver names to the LCSH column, followed by "--Correspondence"
  - Use ExcelConverter to transform spreadsheets to tab-delimited text
  - Include item and collection PURL incorporation in getJpegs script to avoid extra steps?
- Reorder workflow where it makes sense to do so
  - Take care of common "remediation" problems during initial data entry
  - Do subject tagging and name authority work BEFORE digitization
  - Review spreadsheets as part of QC2 during the digitization process
  - Run rawMods2Mods and validation scripts before upload
  - Incorporate item PURLs and collection PURLs into MODS during the upload process (makeJpegs script), eliminating preliminary scripts and current problems
  - Incorporate recordCreation/change dates into the upload process?
- Improve existing scripts
  - Pull names from subject headings for MADS
  - Repair date transformations
  - Improve the PURL incorporation script if we continue to include it in rawModsToMods
  - Improve the VIAF search script if possible
  - Rewrite PowerShell scripts (they are a safety risk and too slow)
- Automate as much of existing remediation work as possible
  - Add authority code to master spreadsheet list
  - Compare subject headings to those in the existing subject master list, pull the replacement value, and translate to proper MODS encoding; document failures
  - Compare name entries to those in the MADS list; if not there, attempt a VIAF lookup and add to the MADS list; document failures
  - Remove creator entries where sender and recipient exist
  - Remove newlines in MODS fields, ensure typeOfContent uses controlled vocabulary, add creation/change dates, and check that the record validates
- Complete metadata processing on existing spreadsheets prior to upload
- Carry out the complete metadata process on new content entering the queue prior to digitization, adding staff to this effort
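Several of the row-level rules above (leave unknown values blank, drop the creator when a sender or recipient is recorded, and copy correspondent names to the LCSH column with "--Correspondence") could be scripted along the following lines. This is only a sketch: the column names "Creator", "Sender", "Recipient", and "LCSH", the semicolon separator between subjects, and the reading that either a sender or a recipient is enough to clear the creator are all assumptions, not our actual spreadsheet layout or scripts.

```python
def clean_row(row, sep=";"):
    """Apply the row-level cleanup rules to one spreadsheet row (a dict)."""
    # Trim whitespace, and leave a field blank rather than recording "unknown".
    cleaned = {}
    for key, value in row.items():
        value = (value or "").strip()
        cleaned[key] = "" if value.lower() == "unknown" else value

    # Assumed reading: a sender or recipient makes the creator entry redundant.
    if cleaned.get("Sender") or cleaned.get("Recipient"):
        cleaned["Creator"] = ""

    # Copy correspondent names into the subject column as "Name--Correspondence".
    subjects = [s.strip() for s in cleaned.get("LCSH", "").split(sep) if s.strip()]
    for field in ("Sender", "Recipient"):
        name = cleaned.get(field, "")
        if name:
            heading = f"{name}--Correspondence"
            if heading not in subjects:
                subjects.append(heading)
    cleaned["LCSH"] = (sep + " ").join(subjects)
    return cleaned
```

Applied to each row of the tab-delimited export, a function like this would cover cleanup that is currently done by hand during remediation.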
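The comparison of subject headings against the subject master list, with failures documented, might look roughly like this. The function names, the shape of the master list (pairs of entered variant and preferred heading), and the normalization rules (collapse whitespace, drop a trailing period, ignore case) are hypothetical; translation to MODS encoding is not shown.

```python
def normalize(heading):
    """Collapse whitespace, drop a trailing period, and ignore case."""
    return " ".join(heading.split()).rstrip(".").casefold()


def build_master(pairs):
    """Build a lookup from (variant, preferred heading) pairs."""
    return {normalize(variant): preferred for variant, preferred in pairs}


def reconcile_subjects(headings, master):
    """Return (replacement headings, headings not found in the master list)."""
    matched, failures = [], []
    for heading in headings:
        preferred = master.get(normalize(heading))
        if preferred is None:
            failures.append(heading)  # documented for manual review
        else:
            matched.append(preferred)
    return matched, failures
```

The failures list is what a metadata librarian would review by hand; everything else is replaced automatically.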
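Removing newlines from MODS fields could be handled with Python's standard XML library, as in the sketch below. It assumes each record is available as a string of XML in the standard MODS v3 namespace; the function name is hypothetical, and schema validation would still be a separate step.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize MODS as the default namespace instead of an "ns0:" prefix.
ET.register_namespace("", MODS_NS)


def strip_newlines(mods_xml):
    """Collapse newlines and runs of whitespace inside element text."""
    root = ET.fromstring(mods_xml)
    for element in root.iter():
        # Skip whitespace-only text, which is just indentation between tags.
        if element.text and element.text.strip():
            element.text = " ".join(element.text.split())
    return ET.tostring(root, encoding="unicode")
```

Because ET.fromstring raises ParseError on malformed XML, running this step also catches records that are not well-formed before upload.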