Parsing Metadata

From UA Libraries Digital Services Planning and Documentation
Revision as of 15:55, 25 June 2012 by Scpstu32


This page shows how to extract, from the full collection-level spreadsheet, only the metadata that corresponds to items that have actually been scanned. For example, if a collection has 100 items in total and only 25 have been scanned to date and are ready for Storage, this tutorial shows how to pull out the metadata for just those 25 scanned items.

What you need

  1. Collection-level metadata file (generally an .xlsx file)
  2. Microsoft Excel
  3. The QC script [found here: S:\Digital Projects\Administrative\scripts\qc]
  4. The Python script [found here: S:\Digital Projects\Administrative\scripts\Python]
  5. A text editor (preferably one like Notepad++ Portable)


  • Run the QC script across the Scans folder for which you want metadata. This ensures there are no problems with filenames, etc. If any errors are found, they MUST be corrected before proceeding to the next step.
  • Run the Python script across the Scans folder for which you want metadata. You may also run this script on a collection folder.
    • Do what the script says regarding the file "pasteFromMetadata.txt".
    • Open the text file called "pasteToMetadata.txt" made by the script, then select and copy its contents to your clipboard.
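
For readers who want a sense of what such a step involves, here is a minimal Python sketch of gathering filenames from a Scans folder into a text file. It is not the Digital Services script. The function names, the `extensions` parameter, and the choice to drop file extensions are all assumptions made for illustration.

```python
import os


def collect_scan_names(scans_dir, extensions=None):
    """Return sorted, de-duplicated base names of the files under scans_dir.

    Assumption: the metadata Filename column holds names without an
    extension, so the extension is dropped here. `extensions` is a
    hypothetical filter such as (".tif",); None keeps every file.
    """
    wanted = None if extensions is None else {e.lower() for e in extensions}
    names = set()
    # os.walk descends into subfolders, so a collection folder works too.
    for _root, _dirs, files in os.walk(scans_dir):
        for fname in files:
            stem, ext = os.path.splitext(fname)
            if wanted is None or ext.lower() in wanted:
                names.add(stem)
    return sorted(names)


def write_names(names, out_path):
    """Write one name per line, ready to paste into a spreadsheet column."""
    with open(out_path, "w", encoding="utf-8") as handle:
        for name in names:
            handle.write(name + "\n")
```

Calling `write_names(collect_scan_names(folder, extensions=(".tif",)), "pasteToMetadata.txt")` would produce a one-column list similar in spirit to the file described above, though the real script's output format may differ.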


  • Open your master, collection-level metadata.
  • Paste the contents of "pasteToMetadata.txt" at the bottom of the Filename column in the metadata Excel file, leaving approximately 10 blank rows between the last row of metadata and the pasted values.
  • In Excel, highlight the entire Filename column.
  • Find the Conditional Formatting tool in the middle of the Home ribbon. Click it and choose Highlight Cells Rules > Duplicate Values. Choose how you want duplicate values to be highlighted and click OK.
  • The rows in the metadata that correspond to your items from the Scans folder/folders will be highlighted as you have chosen.
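
The duplicate-highlighting trick is, in effect, a membership test: a metadata row is highlighted when its Filename also appears in the pasted list. As a rough sketch of the same idea outside Excel, the Python below filters a CSV export of the metadata against a list of names. It assumes the spreadsheet has been saved as CSV with a column headed "Filename" and that the list has one name per line; the function name and parameters are hypothetical.

```python
import csv


def select_scanned_rows(metadata_csv, names_txt, out_csv, column="Filename"):
    """Copy to out_csv only the metadata rows whose `column` value is listed
    in names_txt. Returns the number of rows written (header excluded)."""
    # Load the scanned-item names, ignoring blank lines.
    with open(names_txt, encoding="utf-8") as handle:
        scanned = {line.strip() for line in handle if line.strip()}

    # Keep every metadata row whose filename is in the scanned set.
    with open(metadata_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames or []
        rows = [row for row in reader
                if (row.get(column) or "").strip() in scanned]

    # Write the header plus the matching rows, preserving column order.
    with open(out_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Unlike Excel's Duplicate Values rule, this sketch does not flag filenames that are repeated within the metadata itself; it only matches metadata rows against the list of scans.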

Parsing Rohlig Audio Collection Metadata

  • As of July 2010, this tutorial and the accompanying script do not apply to metadata and scans for the Rohlig Audio Collection. The metadata for this collection is created per item by Digital Services from other sources, and file-naming conventions differ between audio collections and image collections. Therefore, to parse out metadata for items in this collection that are ready for storage/delivery, you must do so "by hand".