Collection Information

From UA Libraries Digital Services Planning and Documentation
Revision as of 13:06, 14 July 2016 by Cjchatnik

The collection level record is important for several reasons.

  1. It serves to provide basic information about the collection at the collection level in our preservation repository, which will enable archivists of the future to determine what is in this directory and whether to reconstruct it;
  2. It serves to provide the information that enables a script to select an appropriate icon for each collection, which shows up in the collection browse interface;
  3. It serves to provide the information displayed in the collection browse interface;
  4. It contains information that is fed into the InfoTrack database for tracking the digitized collections;
  5. It provides the title used for the LOCKSS manifest for the collection, to assist in digital preservation, and
  6. It provides a landing page in Acumen for collections that do not have EAD finding aids.

The collection XML file should be uploaded to the libcontent server with the first batch of TIFF files, using a moveContent script, after the uploaded delivery content (JPEGs, MP3s, metadata) has been indexed by Acumen. If the collection XML file is uploaded before indexing, the link to the collection from the collection browse page will be a dead link.

If you are the first person to work on a manuscript collection, you should create an xml file (using the 'Notepad' program on the PC; or 'TextEdit' on the Mac, saving as ANSI/ASCII text on Windows and as UTF-8 on the Mac) that contains the following information, in this order and using the following template, with no carriage returns (newlines) or formatting within any element.

Please make sure that the encoding of everything saved in these files (like the tab-delimited files from the Excel exports) is either ANSI/ASCII or UTF-8. That means that if any content was cut and pasted from Word or a PDF into these files, you must overwrite all the quotes, hyphens, apostrophes, and other non-keyboard characters with plain text or UTF-8 Unicode. Otherwise they will display as garbage. In addition, if you MUST use "&", encode it as &amp; so it won't break the XML.
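The cleanup described above can be sketched in Python. This is only an illustration, not a script that exists in our workflow: the function name clean_value and the replacement table are assumptions, and the table covers only the most common Word/PDF punctuation. It replaces "smart" punctuation with plain keyboard characters and encodes any bare ampersand as &amp; while leaving already-encoded entities alone.

```python
import re

# Common "smart" punctuation pasted from Word or PDF, mapped to plain ASCII.
# Illustrative table only; extend it if other characters turn up.
PUNCTUATION_MAP = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

# An ampersand that does NOT already start an entity such as &amp; or &#38;
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def clean_value(text: str) -> str:
    """Return text with smart punctuation flattened and bare '&' encoded."""
    for fancy, plain in PUNCTUATION_MAP.items():
        text = text.replace(fancy, plain)
    return BARE_AMPERSAND.sub("&amp;", text)
```

Run each value through a function like this before pasting it into the template. Since UTF-8 is also acceptable, flattening to ASCII is a choice made here for simplicity, not a requirement.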

Save the file in the Admin folder for the collection, naming it for the collection itself. For example, the file for the Cabaniss Papers is u0003_0000252.xml.

If more than one digital collection is created for a given analog collection, number the collection file names sequentially, as if they were pages. That is, if we return to the Cabaniss Papers and digitize some descriptions of tigers in Africa, the collection XML file for this second set of content would be saved as u0003_0000252_002.xml, and the original XML file should have "_001" appended to its name before the .xml extension. At that point, u0003_0000252.xml becomes the first in a sequence (u0003_0000252_001.xml) rather than a representation of the complete analog collection.
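The naming rule can be expressed as a small Python helper. This is a sketch under the assumption that the sequence suffix is always three digits, as in the "_001" and "_002" examples; the function name collection_filename is hypothetical.

```python
from typing import Optional


def collection_filename(collection_id: str, sequence: Optional[int] = None) -> str:
    """Build the collection XML file name.

    collection_id is the collection identifier, e.g. "u0003_0000252".
    sequence is None while only one digital collection exists; once there
    is more than one, pass 1, 2, 3, ... to get the "_001" style suffix.
    """
    if sequence is None:
        return f"{collection_id}.xml"
    # Assumption: the suffix is zero-padded to three digits.
    return f"{collection_id}_{sequence:03d}.xml"
```

For the Cabaniss Papers, collection_filename("u0003_0000252") gives the single-collection name, and passing 1 or 2 as the second argument gives the names used once a second digital collection exists.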


<?xml version="1.0" encoding="UTF-8"?>
<collInfo xmlns:xsi="" xsi:noNamespaceSchemaLocation="">

Step by step instructions

This section describes how to make Collection Information XML files from "scratch" using the file: S:\Digital Projects\Organization\Digital Program\Selection.xlsx.

In general, most of the values, such as Digital Collection Name, are already in the Descriptive Metadata spreadsheets handed to us by the archivists, as well as in the TrackingFilenames spreadsheet. That is to say, most of the work of determining these values has already been done by the archivists.

All of this information should be available in S:\Digital Projects\Organization\Digital Program\Selection.xlsx, which was filled in by the archivists. If any of it is missing from that spreadsheet, please contact the archivists to ask for it.

See this page for information on matching values across documents.

Entering information into the template

  • Insert values between the two tags that make up a field. For example: <sample_tag>text</sample_tag>
  • Do NOT add extra spaces before or after the value within a field. For example, this would be INCORRECT: <sample_tag> text </sample_tag>
  • Do NOT insert new lines into a field. For example, this would be INCORRECT: <sample_tag>text

more text</sample_tag>
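The two INCORRECT cases above can be caught automatically. The following Python sketch uses the standard library's xml.etree.ElementTree to report fields with stray spaces or line breaks. The function name find_field_problems is an assumption, and the sample uses a bare <collInfo> root without the namespace attributes from the template.

```python
import xml.etree.ElementTree as ET


def find_field_problems(xml_text: str) -> list:
    """Return one message per field that breaks the entry rules."""
    root = ET.fromstring(xml_text)
    problems = []
    for field in root:  # each direct child of the root is one field
        value = field.text or ""
        if value != value.strip():
            problems.append(f"{field.tag}: leading or trailing whitespace")
        if "\n" in value or "\r" in value:
            problems.append(f"{field.tag}: contains a line break")
    return problems


sample = (
    "<collInfo>"
    "<good_tag>text</good_tag>"
    "<spaced_tag> text </spaced_tag>"
    "<broken_tag>text\n\nmore text</broken_tag>"
    "</collInfo>"
)
for message in find_field_problems(sample):
    print(message)
```

An empty list means every field passed both checks.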


  • Enter value from "Project Name" column
  • This value -- exactly! -- must be entered in the digital collection column of the metadata spreadsheet for each object, so that we can retrieve all the contents of a collection with a search.


  • Enter value from "Alphabetize" column
  • IF THIS VALUE IS NOT AVAILABLE, enter what makes sense. This is NOT an optional field! It determines how things appear on our collections page.
  • DO NOT precede the value in the "Alphabetized By" element with "a", "an", or "the". We don't want to sort things by definite or indefinite articles.
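If you have to derive the alphabetizing value yourself, the article rule can be applied mechanically. This is a minimal Python sketch. The function name strip_leading_article is hypothetical, and it only removes a leading "a", "an", or "the"; it does not reorder names into "Last, First" form.

```python
import re

# A leading article followed by whitespace, in any capitalization.
LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)


def strip_leading_article(value: str) -> str:
    """Drop one leading 'a', 'an', or 'the' so sorting ignores it."""
    return LEADING_ARTICLE.sub("", value.strip())
```

Because the pattern requires whitespace after the article, names that merely begin with those letters (such as "Anderson") are left unchanged.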


  • Enter value from "Genre/Type" column
  • These are the only acceptable values: book, image, text, audio, video, mixed media, finding aid, score, other.
  • This value determines what icon will be displayed, if we do NOT have one for this collection -- so choose wisely. Most manuscript content should be "text". Sheet music should be "score".
  • USE ONE TYPE ONLY (for example, DO NOT enter 'image; text' -- choose ONE).
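The list of acceptable values lends itself to a simple check. The Python sketch below assumes the value must match one of the listed types exactly, in lowercase. The page does not say whether the icon script is case-sensitive, so treat that as an assumption, along with the function name is_valid_type.

```python
# The only acceptable Genre/Type values, as listed above.
ALLOWED_TYPES = (
    "book", "image", "text", "audio", "video",
    "mixed media", "finding aid", "score", "other",
)


def is_valid_type(value: str) -> bool:
    """True only when value is exactly one of the allowed types."""
    return value in ALLOWED_TYPES
```

A combined value such as 'image; text' fails the check, which matches the one-type-only rule.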


  • Enter value from "Name of Analog Collection" column
  • More information on the Analog Collection Title is here: Analog Collection Title


  • Enter value from "Manuscript Number" column
  • Always give in this format: MS ####.
  • Exception: When the official manuscript number is given in a different format, use that number as-is.
    • Example: RG.001 (manuscript number from University Archives).


  • Insert the collection purl into this field:
  • Search for the Analog Collection name in these 2 places:
  1. The new interface ... here, enter the collection number, for example, for MS 252, enter
  2. The old interface

If you find a live link to a finding aid, enter it in the above field. If you do not, leave the field empty. NOTE: If entering the collection number in Acumen returns something that is not a finding aid, it is probably a default link to the very XML file you are trying to create. That means the file is already in Acumen, and you don't need to make it. Do not put this link in the Finding Aid Link field.

If you did not find a Manuscript Number in the spreadsheet, but you do find a finding aid online, use the finding aid to fill in the Manuscript Number field. If possible, use 4 digits for the MS number, left-padding with zeros so that MS 14 would be written MS 0014. Always give the Manuscript Number in this format: MS 1632

Alternative forms the Manuscript number may currently have, in the filename or elsewhere:

  1. ms_1952.pdf would be MS 1952
  2. Reference Code USALM1952 would be MS 1952
  3. MSS.0630 would be MS 0630 or MS 630
  4. u0003_0000252 would be MS 0252 or MS 252
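The conversions listed above can be sketched as a Python function. The patterns are inferred only from the four examples, so they are assumptions rather than a complete description of every form in use. The function name normalize_ms_number is hypothetical. Anything that does not match, such as RG.001, is returned as-is, in line with the exception for official numbers in other formats.

```python
import re

# One pattern per alternative form listed above, plus the plain "MS ###" form.
# Each captures the digits of the manuscript number in group 1.
MS_PATTERNS = [
    re.compile(r"^ms_(\d+)\.pdf$", re.IGNORECASE),   # ms_1952.pdf
    re.compile(r"^USALM(\d+)$", re.IGNORECASE),      # Reference Code USALM1952
    re.compile(r"^MSS\.(\d+)$", re.IGNORECASE),      # MSS.0630
    re.compile(r"^u\d{4}_(\d{7})$", re.IGNORECASE),  # u0003_0000252
    re.compile(r"^MS\s*(\d+)$", re.IGNORECASE),      # MS 14
]


def normalize_ms_number(raw: str) -> str:
    """Return 'MS ####' (left-padded to 4 digits) for recognized forms."""
    text = raw.strip()
    for pattern in MS_PATTERNS:
        match = pattern.match(text)
        if match:
            return f"MS {int(match.group(1)):04d}"
    # Unrecognized formats (e.g. RG.001) are kept unchanged.
    return text
```

Always confirm the result against the finding aid or the spreadsheet; a pattern match is not proof that the number is the official one.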


  • Enter value from "Blurb" column


<?xml version="1.0" encoding="UTF-8"?>

<collInfo xmlns:xsi="" xsi:noNamespaceSchemaLocation="">

<Digital_Collection_Name>Septimus D. Cabaniss Papers</Digital_Collection_Name>

<Alphabetized_By>Cabaniss, S. D.</Alphabetized_By>


<Analog_Collection_Name>The S.D. Cabaniss Papers, 1820-1937</Analog_Collection_Name>

<Manuscript_Number>MS 0252</Manuscript_Number>


<Digital_Collection_Description>Materials from the papers of this nineteenth-century Madison County, Alabama, attorney who drafted a controversial will for wealthy planter Samuel Townsend which manumitted certain slaves and designated them as Townsend's primary heirs. Selected items include Townsend's will, a deposition given by S.D. Cabaniss concerning his role in the estate, and a report by Rev. William D. Chadick discussing the prospect of settling the newly-manumitted Townsend heirs in Ohio.</Digital_Collection_Description>