Managing Incoming Digital Content

From UA Libraries Digital Services Planning and Documentation
Revision as of 12:40, 10 June 2016 by Afmatheny (Guidelines)

Whether content arriving at Digital Services was digitized by someone else or is born digital, it raises much the same set of issues:

General Needs

We need to:

  1. protect the original (even opening the file or virus checking it can change it)
  2. virus check it before anything else
  3. find out everything we can about it, including file dates, file structure, who created it, when, what it is about, etc. (hence the latest documentation files)
  4. (if on media) capture a snapshot of it if possible (ISO image), and take MD5 checksums before and after transfer (then compare!)
  5. find out whether we are to put it online, and/or
  6. find out if we have to preserve it
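
The checksum comparison in step 4 can be sketched as follows. This is a minimal illustration, not Digital Services' documented tooling: the function names `md5_of_file` and `transfer_is_intact` are hypothetical, and it assumes the original is read from write-protected media.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    # Open read-only in binary mode so the bytes are hashed exactly as stored.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(original_path, copied_path):
    """Compare the checksum taken before transfer with the one taken after."""
    return md5_of_file(original_path) == md5_of_file(copied_path)
```

If `transfer_is_intact` returns `False`, the copy differs from the original and should not be treated as a faithful transfer.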

For the last two, more research may be needed:

  1. what rights do we have with regard to this content? Do we have the right to digitize it? To make copies? To make derivatives? To generate metadata? To preserve it? To provide open access to it? The answers to these questions may determine whether we do anything at all.
  2. what are the significant properties of the file (text? image? video? software? audio?)
  3. what are the appropriate file types to normalize to (both for web delivery and for archiving)?
  4. what quality should we use for storing it? (If the original is low quality, it won't make much sense to normalize it to a high-quality archival file.)
  5. what should we name it? This involves assignment to a collection, gathering collection metadata if we don't have it, and selecting appropriate item numbers
  6. what should we use for metadata?

Managing digital content created by anyone else is a whole new ball of wax. We are in the process of developing our procedures and workflows for this type of content, so the following is subject to change.

Upon receiving digital content in Digital Services, first notify the Head of Digital Services, who will contact the Associate Deans for prioritization and determination of preservation or web delivery needs.

Upon receiving born-digital content in Digital Services:

  1. Do not immediately open the media.
  2. Ask the person who provided you the content to fill out the [Incoming Digital Form] and the [Digital Services Permissions Agreement].
  3. Notify the Head of Digital Services.
  4. Fill in the BornDigitalDocumentation file to the best of your ability, asking questions of those who brought you the content.
  5. When we have the documentation we need, and the DS head authorizes you to move forward, see Born Digital Ingest for further instructions.

For content coming in from OUTSIDE UA Libraries, fill in the IncomingDigitalDocumentation file to the best of your ability, asking questions of those who brought you the content.

For content coming from INSIDE UA Libraries, at the Dean's request or with his approval: see In House Born Digital.


Do not immediately open the media.

All incoming media must undergo:

  1. Write protection, to avoid inadvertent alteration of content
  2. Virus-checking prior to any access or use

Bear in mind that if we are to preserve this content, attest that it remains unchanged, make it usable again, and provide context, we need to retain the original files as is, in the directory structure as is, together with any files that other files refer to or use, WITH the original file dates and information. We also have to capture as much information as we can about how the files were created, by whom, using what hardware and software (including versions), and why these files are important.

After virus-checking, IF this content is possibly going to be preserved:

  1. make an ISO capture of the DVD/CD onto the designated external hard drive for analysis
  2. if the content is not on media that allows this, capture all file information first, one by one, directory by directory, before copying the entire directory structure elsewhere.
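
Capturing file information directory by directory (step 2) might look like the sketch below. It is an assumption-laden illustration rather than the procedure Digital Services uses: the function name `capture_file_info`, the CSV column names, and the choice of MD5 are all hypothetical. The manifest should be written somewhere other than the original media.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone

FIELDS = ["relative_path", "size_bytes", "modified_utc", "md5"]


def _md5(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def capture_file_info(root, manifest_path):
    """Record path, size, modification time, and MD5 for every file under root."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a predictable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)  # read the file dates before opening the file
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "relative_path": os.path.relpath(full_path, root),
                "size_bytes": info.st_size,
                "modified_utc": modified.isoformat(),
                # Reading can update access times on some filesystems,
                # which is one reason the media should be write-protected.
                "md5": _md5(full_path),
            })
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Running the same function on the copied directory structure and comparing the two manifests gives a per-file check that nothing changed in transit.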

IF we are to make this content available online, try to identify the current delivery formats and extensions for the types of files included in the content, and document your findings. Communicate with the Head of Digital Services to determine how to proceed.

Support Levels

Each item of incoming born-digital content is assigned one of the following categories, each of which aligns with one or more Levels of Preservation:

  • Supported: we fully support the format. The format is sustainable (either open source or a widely used original version), and with continued funding and support, we can offer Level I or Level II preservation.
  • Known: we can recognize the format, but cannot guarantee full support. This format will most likely be migrated to the appropriate archival format. Level II preservation support is available.
  • Unsupported: we cannot recognize the format. If selected, only Level III (basic bit-level) preservation is applied.

See Formats and Preservation for more information on levels of support.


Archival formats identified for digital preservation thus far:

Text/Document Files

Format        | Extension  | Support Level | Preservation Format     | Access Format   | Normalization Tool
--------------|------------|---------------|-------------------------|-----------------|-------------------
Text          | txt        | Supported     | Original format         | Original format | none
HTML          | html, htm  | Supported     | Original format         | Original format | none
XML           | xml        | Supported     | Original format         | Original format | none
Adobe PDF     | pdf        | Known         | PDF/A or PDF/A-1a       | PDF             |
MS Word Doc   | doc, docx  | Known         | ODF (odt)               | PDF             |
MS Excel      | xls, xlsx  | Known         | CSV (csv) or ODF (ods)  | ODF             |
MS PowerPoint | ppt, pptx  | Known         | ODF (odp)               | PDF             |

Image Files

Format    | Extension | Support Level | Preservation Format | Access Format           | Normalization Tool
----------|-----------|---------------|---------------------|-------------------------|-------------------
TIFF      | tiff, tif | Supported     | Original format     | JPEG                    |
JPEG      | jpeg, jpg | Supported     | TIFF                | JPEG                    | none
JPEG 2000 | jp2       | Known         | TIFF                | JPEG                    |
GIF       | gif       | Unsupported   | Original format     | JPEG                    | none
PNG       | png       | Supported     | TIFF                | Original format or JPEG | none
SVG       | svg       | Supported     | SVG                 | Original format         | none

Audio Files

Format | Extension       | Support Level | Preservation Format | Access Format | Normalization Tool
-------|-----------------|---------------|---------------------|---------------|-------------------
WAVE   | wav             | Supported     | Original format     | MP3           |
AIFF   | aiff, aif, aifc | Supported     | WAVE                | MP3           |
FLAC   | flac            | Known         | WAVE                | MP3           |


We will retain at least one copy of all content in its original format when feasible. For other copies, as a general rule, we prefer to preserve digital files that are either in their original, non-proprietary formats or widely adopted formats, are uncompressed or use lossless compression, have some sort of metadata support, are well documented, and can be accessed using open source software (i.e., they do not require any proprietary software).


Textual documents include plain text (TXT, XML), formatted text, and word processing documents (PDF, DOC, XLS). Plain text files encoded in UTF-8 may be preserved in their original formats. Formatted text and word processing files may be migrated to PDF/A or the appropriate corresponding ODF format. Overall, plain text and PDF/A files are the preferred preservation formats. Various institutions (such as NCDCR and Archivematica) have reported formatting problems when converting MS Word files (DOC, DOCX) to PDF/A and ODT, but both the Library of Congress (LOC) and the National Archives and Records Administration (NARA) recommend both formats as preferred for preservation. In fact, LOC recommends the use of PDF/A, stating that "For pragmatic reasons, when PDF/A is mandated, PDF/A-1b is usually acceptable. Full PDF/A-1a compliance, with tagged document structure, is hard to achieve except in a workflow that anticipates that objective from initial document creation." Corrado and Moulaison also advocate for PDF/A in arguing that "PDF/A is specifically designed for digital archiving, as it removes some of the features of PDF that are less desirable for digital preservation, including the ability to link to fonts instead of embedding them within the document" (Digital Preservation for Libraries, Archives, and Museums, p. 109). Formats for online and in-house access will be PDF or plain text, as these formats are smaller, faster to access, and easier to upload to the web than preservation files.

Image Files

Image files include raster images (TIFF, JPEG, PNG) and vector images (SVG). All supported raster image formats may be migrated to TIFF version 6 for preservation, while unsupported raster formats will be preserved at bit level. While TIFF is a proprietary format owned by Adobe (see their TIFF documentation), it is widely adopted, well documented, and can store images uncompressed or with lossless compression. According to Corrado and Moulaison, "many Archives consider the uncompressed TIFF the best format for digital preservation" (Digital Preservation, p. 109). NARA, Library of Congress, Archivematica, Harvard, and many other leading institutions in digital preservation use TIFF as their archival preservation format. SVG is considered the standard vector graphic format for preservation (see W3C's SVG documentation), and these files will be retained in their original format. The format for online and in-house access generally will be JPEG, as it is smaller, faster to access/load, and easier to upload to the web than uncompressed preservation files.

Audio Files

Audio files include uncompressed audio (WAV), losslessly compressed audio (FLAC), and lossy compressed audio (MP3). All supported audio files will be migrated to WAV format for preservation, as it is an uncompressed format that will preserve the files at the highest quality. Exceptions may be made for any MP3 files that are determined to be of value to the institution; these files may be retained in the original format. NARA, Library of Congress, the British Library, Archivematica, and IASA all recommend WAV as an archival preservation format. The format for online and in-house access will be MP3 at a bit rate of 192 kbps, as it is smaller, faster to access/load, and easier to upload to the web than uncompressed preservation files.
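
As one possible way to produce the 192 kbps MP3 access copy, the sketch below builds an FFmpeg command line. The source does not name a normalization tool for audio, so the use of FFmpeg (with its `libmp3lame` encoder) is an assumption, and the function name `mp3_access_command` and the file names are hypothetical.

```python
import subprocess


def mp3_access_command(source_path, target_path, bitrate="192k"):
    """Build an ffmpeg command that encodes an MP3 access copy at a fixed bit rate."""
    return [
        "ffmpeg",
        "-n",                       # never overwrite an existing output file
        "-i", source_path,          # preservation master (e.g. WAV), read only
        "-codec:a", "libmp3lame",   # MP3 encoder; assumes ffmpeg was built with it
        "-b:a", bitrate,            # constant audio bit rate
        target_path,
    ]


def make_access_copy(source_path, target_path):
    """Run the command; raises CalledProcessError if ffmpeg reports failure."""
    subprocess.run(mp3_access_command(source_path, target_path), check=True)
```

Calling `make_access_copy("interview.wav", "interview.mp3")` would leave the WAV untouched and write the MP3 alongside it, provided FFmpeg is installed and on the path.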