Parsing Metadata

From UA Libraries Digital Services Planning and Documentation
Revision as of 08:43, 26 June 2012 by Scpstu32 (talk | contribs) (Steps)


This page demonstrates how to parse out metadata that corresponds with actual scans from the entire, collection-level spreadsheet. In other words, if a collection has a grand total of 100 items and only 25 items have been scanned to date and are ready for Storage, this tutorial shows how to extract only the metadata for those scanned items from the total set.

What you need

  1. Collection-level metadata file (generally an .xlsx file)
  2. Microsoft Excel
  3. [found here: S:\Digital Projects\Administrative\scripts\qc]
  4. [found here: S:\Digital Projects\Administrative\scripts\Python]
  5. A text editor (preferably one like Notepad++ Portable)


  • Run across your Scans folder for which you want metadata. #*This ensures there are no problems with filenames, etc. If there are any errors found, they MUST be corrected before proceeding to the next step.
  • Run across the Scans folder for which you want metadata. You may also fun this file on a collection folder.
    • Do what the script says regarding the file "pasteFromMetadata.txt"
    • Open the text file called "pasteToMetadata.txt" made by and then select and copy the results to your clipboard.


  • Open your master, collection-level metadata.
  • Paste the results from "pasteToMetadata.txt" into the bottom of the Filename column in the metadata Excel file, skipping approximately 10 rows from where the metadata ends.
  • In Excel, highlight the entire Filename column.
  • Find the Conditional Formatting tool in the middle of the Home ribbon. Click here and choose Highlight Cells Rules>Duplicate Values. Chose how you want duplicate values to be highlighted and click OK.

Parsing 2.png

  • The rows in the metadata that correspond to your items from the Scans folder/folders will be highlighted as you have chosen.
  • Group the highlighted metadata together go to Sort & Filter on the right hand side of the Home ribbon. Choose custom sort. Sort by Filename, Sort on Cell Color, and choose to put the selected color of cells On Top or On Bottom.

File:Paring 3.png

  • The colored rows will group together at the top or the bottom, whichever you have chosen. The reason we left blanks in between what was pasted is so that the results from "pasteToMetadata.txt" would not intermix with the actual metadata. The highlighted items are the ones we want to parse out metadata for. The non-highlighted items are the ones we don't want metadata for at this time.
  • Note that ALL of the items you pasted from "pasteToMetadata.txt" should be highlighted. If not, that means that you have scans for items with no metadata. Identify and correct this problem immediately!
  • Extract the highlighted rows of metadata and place them, along with the header row, on another spreadsheet. Use this spreadsheet to give to the metadata librarian and to create the metadata text file that will go in the completed folder and be used to make Mods. Be sure to add the batch number to the Batch column. Name the file appropriately using the batch extention, i.e., u0003_0000001.4 or u0003_0000001.4.m01.
    • On some collections, we remove the metadata of uploaded items from the main metadata sheet and place it on a sheet of Uploaded Metadata for the collection. These sheets can be found in S:\Digital Projects\Administrative\collectionInfo\Storage_Excel.
  • On the original metadata spreadsheet, delete the rows you pasted in from "pasteToMetadata.txt", remove conditional formatting rules, and set the spreadsheet to once again sort on the filename values.

Parsing Rohlig Audio Collection Metadata

  • As of 2010 July this tutorial and the accompanying script do not apply to metadata and scans for the Rohlig Audio Collection as the metadata for this collection is made per item by Digital Services from other sources and because of filenaming differences for audio collections vs. image collections. Therefore, in order to parse out metadata for items for this collection ready for storage/delivery, one must parse out the metadata "by hand".