Managing Incoming Digital Content

From UA Libraries Digital Services Planning and Documentation

Whether the digital content that comes to Digital Services was digitized by someone else or is born digital, it raises much the same set of issues:

We need to:

  1. protect the original (even opening the file or virus-checking it can change it)
  2. virus-check it before doing anything else
  3. find out everything we can about it, including file dates, file structure, who created it, when, what it is about, etc. (hence the latest documentation files)
  4. (if on media) capture a snapshot of it if possible (an ISO image), and take MD5 checksums before and after transfer (then compare them!)
  5. find out whether we are to put it online, and/or
  6. find out if we have to preserve it

For the last two, more research may be needed:

  1. what rights do we have with regard to this content? Do we have the right to digitize it? To make copies? To make derivatives? To generate metadata? To preserve it? To provide open access to it? The answers to these questions may determine whether we do anything at all.
  2. what are the significant properties of the file (text? image? video? software? audio?)
  3. what are the appropriate file types to normalize to (both for web delivery and for archiving)?
  4. what quality should we use for storing it (if the original is low quality, it makes little sense to normalize it to a high-quality archival file)?
  5. what should we name it? This involves assignment to a collection, gathering collection metadata if we don't have it, and selecting appropriate item numbers
  6. what should we use for metadata?

Managing digital content created by anyone else is a whole new ball of wax. We are in the process of developing our procedures and workflows for this type of content, so the following is subject to change.

Upon receiving digital content in Digital Services, first notify the Head of Digital Services, who will contact the Associate Deans for prioritization and determination of preservation or web delivery needs.

Upon receiving born-digital content in Digital Services:

  1. Do not immediately open the media.
  2. Ask the person who provided the content to fill out the [Incoming Digital Form] and the [Digital Services Permissions Agreement].
  3. Notify the Head of Digital Services.
  4. Fill in the BornDigitalDocumentation file to the best of your ability, asking questions of those who brought you the content.
  5. When we have the documentation we need, and the DS head authorizes you to move forward, see Born Digital Ingest for further instructions.

For content coming in from OUTSIDE UA Libraries, fill in the IncomingDigitalDocumentation file to the best of your ability, asking questions of those who brought you the content.

For content coming from INSIDE UA Libraries, at the Dean's request or with his approval: see In House Born Digital.


Do not immediately open the media.

All incoming media must undergo:

  1. Write protection, to avoid inadvertent alteration of content
  2. Virus-checking prior to any access or use

Bear in mind that if we are to preserve this content and attest that it remains unchanged, make it usable again, and provide context, we need to retain the original files as is: in their original directory structure, together with any files that other files refer to or use, and WITH their original file dates and information. We also have to capture as much information as we can about how the files were created, by whom, using what hardware and software (including versions), and why these files are important.
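Keeping the original structure and dates, together with the MD5 checks before and after transfer listed at the top of this page, can be sketched in Python. This is a minimal sketch, assuming the files have already been virus-checked and the source is mounted read-only (reading a file does not change its contents, but can update its last-access time); `md5_of` and `copy_and_verify` are hypothetical names. `shutil.copy2` cannot copy all metadata (the Python documentation notes that owners and ACLs are lost on POSIX and Windows), which is one reason an ISO capture is preferable when the media allow it.

```python
import hashlib
import os
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hex MD5 of a file, read in 1 MiB chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir: Path, dest_dir: Path, mtime_tolerance: float = 2.0) -> list:
    """Copy src_dir to dest_dir (which must not exist yet), keeping the
    directory structure and file modification times, then compare every
    copied file with its original. Returns a list of problems found."""
    # Checksums taken BEFORE the transfer, keyed by relative path.
    before = {}
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            path = Path(root) / name
            before[path.relative_to(src_dir)] = md5_of(path)

    # copy2 copies file contents plus modification/access times where the
    # filesystem allows it; it does NOT carry over owners, ACLs, etc.
    shutil.copytree(src_dir, dest_dir, copy_function=shutil.copy2)

    problems = []
    for rel, checksum in sorted(before.items()):
        original, copy = src_dir / rel, dest_dir / rel
        if md5_of(copy) != checksum:  # checksum taken AFTER the transfer
            problems.append(f"{rel.as_posix()}: checksum mismatch")
        # Allow some slack: FAT, for example, stores mtimes in 2-second steps.
        if abs(original.stat().st_mtime - copy.stat().st_mtime) > mtime_tolerance:
            problems.append(f"{rel.as_posix()}: modification date changed")
    return problems
```

An empty list means every file arrived with the same checksum and (within the tolerance) the same modification date.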

After virus-checking, IF this content may be preserved:

  1. make an ISO capture of the DVD/CD onto the designated external hard drive for analysis
  2. if the content is not on media that allows this, capture all file information first, file by file and directory by directory, before copying the entire directory structure elsewhere.
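Capturing file information file by file and directory by directory could look like the following minimal sketch. It assumes the manifest is written outside the directory being surveyed (otherwise the manifest would list itself); the function name and CSV column names are hypothetical, not an established Digital Services schema.

```python
import csv
import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path


def build_manifest(root: Path, manifest_path: Path) -> int:
    """Walk root directory by directory and write one CSV row per file
    (relative path, size in bytes, modification time in UTC, MD5).
    Returns the number of files recorded."""
    count = 0
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["relative_path", "size_bytes", "modified_utc", "md5"])
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # visit subdirectories in a stable order
            for name in sorted(filenames):
                path = Path(dirpath) / name
                info = path.stat()
                md5 = hashlib.md5()
                with open(path, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        md5.update(chunk)
                modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
                writer.writerow([
                    path.relative_to(root).as_posix(),
                    info.st_size,
                    modified.isoformat(),
                    md5.hexdigest(),
                ])
                count += 1
    return count
```

Running it again on the copied directory and comparing the two manifests is one way to confirm the copy.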

IF we are to make this content available online, try to identify the current delivery formats and extensions for the types of files included in the content, and document your findings. Communicate with the Head of Digital Services to determine how to proceed.
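A quick first pass at identifying the types of files in a delivery is to tally their extensions, as in this minimal sketch (the function name is hypothetical). Extensions can be missing or wrong, so signature-based identification tools such as DROID or Siegfried give a more reliable answer; the tally is only an overview to document and discuss.

```python
from collections import Counter
from pathlib import Path


def extension_survey(root: Path) -> Counter:
    """Count files under root by lower-cased extension ('' when there is none)."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower().lstrip(".")] += 1
    return counts
```

For example, `extension_survey(Path("/mnt/incoming")).most_common()` lists the most frequent extensions first (the path is illustrative).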

Formats & Support Levels

Incoming born-digital content is assigned one of the following categories and aligns with one or more Levels of Preservation:

  • Supported: we fully support the format. The format is sustainable (either open source or a widely used original version), and with continued funding and support we can offer Level I or Level II preservation.
  • Known: we can recognize the format but cannot guarantee full support. The format will most likely be migrated to the appropriate archival format. Level II preservation is available.
  • Unsupported: we cannot recognize the format. If the content is selected, only Level III (basic bit-level) preservation is applied.
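Assigning a category from a file's extension might be sketched as follows. The extension sets are transcribed from the format tables on this page and the names are hypothetical; classifying by extension alone is a simplification, and the format should be confirmed before a category is actually assigned.

```python
# Extensions transcribed from the format tables on this page.
SUPPORTED = {"txt", "html", "htm", "xml", "tiff", "tif", "jpeg", "jpg",
             "gif", "png", "wav", "aiff", "aif", "aifc"}
KNOWN = {"pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "jp2", "flac"}

# Preservation levels available to each category, per the list above.
LEVELS = {
    "Supported": ("Level I", "Level II"),
    "Known": ("Level II",),
    "Unsupported": ("Level III",),
}


def support_category(filename: str) -> tuple:
    """Return (category, available preservation levels) for a file name."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in SUPPORTED:
        category = "Supported"
    elif ext in KNOWN:
        category = "Known"
    else:
        category = "Unsupported"  # not recognized: bit-level preservation only
    return category, LEVELS[category]
```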

Archival formats identified for digital preservation thus far:

Text/Document Files
Format | Extension | Support Level | Preservation Format | Access Format | Normalization Tool
Text | txt | Supported | Original format | Original format | none
HTML | html, htm | Supported | Original format | Original format | none
XML | xml | Supported | Original format | Original format | none
Adobe PDF | pdf | Known | PDF/A or PDF/A-1a | PDF |
MS Word Doc | doc, docx | Known | ODF (odt) | ODF |
MS Excel | xls, xlsx | Known | CSV (csv) or ODF (ods) | ODF |
MS PowerPoint | ppt, pptx | Known | ODF (odp) | PDF |
Image Files
Format | Extension | Support Level | Preservation Format | Access Format | Normalization Tool
TIFF | tiff, tif | Supported | Original format | JPEG |
JPEG | jpeg, jpg | Supported | Original format | Original format | none
JPEG 2000 | jp2 | Known | JPEG or TIFF | JPEG |
GIF | gif | Supported | Original format | JPEG |
PNG | png | Supported | Original format | Original format | none
Audio Files
Format | Extension | Support Level | Preservation Format | Access Format | Normalization Tool
WAVE | wav | Supported | Original format | MP3 |
AIFF | aiff, aif, aifc | Supported | WAVE | MP3 |
FLAC | flac | Known | WAVE | MP3 |
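The tables can also be expressed as a lookup that tells staff whether a file needs a preservation migration, an access derivative, or both. This is a hypothetical sketch that only transcribes the Preservation Format and Access Format columns (the Normalization Tool column is left out because most of its cells are blank); it does not perform any conversion.

```python
from pathlib import Path
from typing import Optional

# (preservation format, access format) per extension, transcribed from the
# tables above; "Original format" means keep the file as received.
ORIGINAL = "Original format"
TARGETS = {
    "txt": (ORIGINAL, ORIGINAL), "html": (ORIGINAL, ORIGINAL),
    "htm": (ORIGINAL, ORIGINAL), "xml": (ORIGINAL, ORIGINAL),
    "pdf": ("PDF/A or PDF/A-1a", "PDF"),
    "doc": ("ODF (odt)", "ODF"), "docx": ("ODF (odt)", "ODF"),
    "xls": ("CSV (csv) or ODF (ods)", "ODF"),
    "xlsx": ("CSV (csv) or ODF (ods)", "ODF"),
    "ppt": ("ODF (odp)", "PDF"), "pptx": ("ODF (odp)", "PDF"),
    "tiff": (ORIGINAL, "JPEG"), "tif": (ORIGINAL, "JPEG"),
    "jpeg": (ORIGINAL, ORIGINAL), "jpg": (ORIGINAL, ORIGINAL),
    "jp2": ("JPEG or TIFF", "JPEG"),
    "gif": (ORIGINAL, "JPEG"), "png": (ORIGINAL, ORIGINAL),
    "wav": (ORIGINAL, "MP3"),
    "aiff": ("WAVE", "MP3"), "aif": ("WAVE", "MP3"), "aifc": ("WAVE", "MP3"),
    "flac": ("WAVE", "MP3"),
}


def normalization_plan(filename: str) -> Optional[dict]:
    """Look up the preservation and access targets for a file name."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext not in TARGETS:
        return None  # not listed in the tables; no target defined yet
    preservation, access = TARGETS[ext]
    return {
        "preservation_format": preservation,
        "access_format": access,
        "migrate_for_preservation": preservation != ORIGINAL,
        "make_access_copy": access != ORIGINAL,
    }
```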

As a general rule, we prefer to preserve digital files that are in their original non-proprietary formats or in widely adopted formats, are uncompressed or losslessly compressed, have some form of metadata support, and can be opened with open-source software (i.e., they do not require proprietary software).


Textual documents include plain text (TXT, XML), formatted text, and office documents (PDF, DOC, XLS). Plain text files encoded in UTF-8 may be preserved in their original formats. Formatted text and office files may be migrated to PDF/A or to the corresponding ODF format. Various institutions (such as NCDCR and Archivematica) have reported formatting problems when converting MS Word files (DOC, DOCX) to PDF/A and ODT, but both the Library of Congress (LOC) and the National Archives and Records Administration (NARA) list both formats as preferred for preservation. LOC recommends PDF/A, noting that "For pragmatic reasons, when PDF/A is mandated, PDF/A-1b is usually acceptable. Full PDF/A-1a compliance, with tagged document structure, is hard to achieve except in a workflow that anticipates that objective from initial document creation." Corrado and Moulaison also advocate PDF/A, arguing that "PDF/A is specifically designed for digital archiving, as it removes some of the features of PDF that are less desirable for digital preservation, including the ability to link to fonts instead of embedding them within the document" (Digital Preservation for Libraries, Archives, and Museums, p. 109).
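Since plain text is kept in its original format only when it is encoded in UTF-8, a quick check is whether the file's bytes decode as UTF-8. A minimal sketch (the function name is hypothetical): a pass shows only that the bytes are valid UTF-8, which is also true of pure ASCII files, not that the creator intended UTF-8. The whole file is read into memory, which is fine for typical text files.

```python
from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

Files that fail the check (for example, Latin-1 text with accented characters) need their actual encoding identified before deciding how to preserve them.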

Image Files

Audio Files
