Image Technical Metadata

From UA Libraries Digital Services Planning and Documentation
Revision as of 09:21, 27 August 2014 by Jlderidder (talk | contribs) (The md5sum database.)

For our TIFF images, we currently generate and store output from the File Information Tool Set (FITS), as well as MIX (NISO Metadata for Images in XML schema) files. Our MIX (version 2) profile is described in File:MIX-2.0.xlsx.

During quality control testing, the QC script used by staff members generates FITS files and tests them for errors. Staff members review the errors and, if warranted, repair the file and test it again. The currently tested and approved repair method is to open the TIFF in IrfanView and save it as an uncompressed TIFF at 100%. In our tests, Photoshop and ImageMagick failed to repair all of the critical errors (invalid or not well-formed files) that we found.
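A minimal sketch of such an error test follows. It is not our QC script: the function name fits_errors is hypothetical, and it assumes that FITS reports its findings in well-formed and valid elements whose text is "true" or "false", which should be checked against the output of the FITS version in use.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def fits_errors(fits_xml_text):
    """Return the names of the critical checks that a FITS document fails.

    Assumption: well-formedness and validity appear as <well-formed> and
    <valid> elements whose text is 'true' or 'false'.
    """
    root = ET.fromstring(fits_xml_text)
    errors = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if name not in ("well-formed", "valid"):
            continue
        if (elem.text or "").strip().lower() != "true":
            errors.append(name)
    return errors
```

A document with neither element yields an empty list here; a production script would probably want to treat that case as a failure too.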

The scripts that generate derivatives for the web check the upload directories to ensure there is a FITS file for each TIFF, and will not continue until there is.
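The check could be sketched as below. This is an illustration only: the function name tiffs_missing_fits is hypothetical, and it assumes that in the upload directory each FITS file sits beside its TIFF and follows the same ".fits.xml" naming used in the archive.

```python
from pathlib import Path


def tiffs_missing_fits(upload_dir):
    """Return the names of TIFFs in upload_dir that lack a FITS file.

    Assumption: the FITS file for 'name.tif' is 'name.fits.xml' in the
    same directory.
    """
    directory = Path(upload_dir)
    missing = []
    for tiff in sorted(directory.iterdir()):
        if tiff.suffix.lower() not in (".tif", ".tiff"):
            continue
        if not (directory / f"{tiff.stem}.fits.xml").is_file():
            missing.append(tiff.name)
    return missing
```

A derivative script could refuse to proceed whenever the returned list is non-empty.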

The scripts that upload the TIFFs for deposits to be processed for the archive retest the FITS files for errors. If none are found, these scripts also generate MIX files and upload some information to our md5sums:imageTechMed database table for quick reference. (More information on this database appears below.)

In the archive, the MIX and FITS files are named for the TIFF, with the extensions ".mix.xml" and ".fits.xml" respectively. If the file was repaired or modified, there will also be versioned copies of these files: the first version (fileNumber.mix.v1.xml and fileNumber.fits.v1.xml) describes the first version of the file, v2 the second version, and so on. The unversioned MIX and FITS files are the current ones. All technical metadata is stored in a Metadata directory at the level to which it applies. For example, technical metadata for a page is stored in a Metadata directory at the page level in the archive, together with any other metadata for that page.

Example: /srv/archive/u0003/0000581/0000003/0002/Metadata/ will contain metadata about page 2 of the third item in the WC Gorgas collection (Collection MS 581) in the Hoole Library manuscript holdings:

  • The TIFF is named u0003_0000581_0000003_0002.tif
  • The current MIX file is named u0003_0000581_0000003_0002.mix.xml
  • The current FITS file is named u0003_0000581_0000003_0002.fits.xml
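The naming convention in this example can be sketched in code. The function name metadata_paths is hypothetical, and the sketch assumes, based only on the example above, that each underscore-separated part of the identifier corresponds to one directory level under /srv/archive.

```python
from pathlib import PurePosixPath

# Archive root taken from the example path above.
ARCHIVE_ROOT = PurePosixPath("/srv/archive")


def metadata_paths(identifier, version=None):
    """Map an identifier to the paths of its MIX and FITS files.

    version=None selects the current (unversioned) files; version=1
    selects the '.v1.xml' files, version=2 the '.v2.xml' files, etc.
    Assumption: identifier parts map one-to-one to directory levels.
    """
    metadata_dir = ARCHIVE_ROOT.joinpath(*identifier.split("_")) / "Metadata"
    suffix = "xml" if version is None else f"v{version}.xml"
    return {
        "mix": metadata_dir / f"{identifier}.mix.{suffix}",
        "fits": metadata_dir / f"{identifier}.fits.{suffix}",
    }
```

For instance, metadata_paths("u0003_0000581_0000003_0002") gives the two current file paths in the Metadata directory shown above, and passing version=1 gives the paths of the first versioned copies.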

The md5sum database.

The information currently stored in the md5sums:imageTechMed table includes:

  1. the identifier,
  2. MIME type,
  3. format,
  4. format version,
  5. whether the file contains a thumb,
  6. date of testing,
  7. format registry,
  8. format registry key,
  9. whether there are any conflicts between the findings of the various FITS tools,
  10. and, if any conflicts were found in the identity of the file, the alternative:
    1. format,
    2. format version,
    3. and MIME type.
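As a rough illustration of how the identity fields and their alternatives might be pulled from a FITS file, consider the sketch below. It is not the upload script: the function name identity_summary and the dictionary keys are hypothetical, it assumes that each candidate identification is an identity element with format and mimetype attributes and an optional version child, and it treats the presence of more than one identity as a conflict. It does not cover conflicts between tools on other findings.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def identity_summary(fits_xml_text):
    """Summarise the file identification reported in a FITS document.

    Assumptions: identifications are <identity> elements carrying
    'format' and 'mimetype' attributes, with an optional <version>
    child; more than one <identity> is treated as a conflict.
    """
    root = ET.fromstring(fits_xml_text)
    identities = []
    for elem in root.iter():
        if local_name(elem.tag) != "identity":
            continue
        version = next(
            (child.text for child in elem if local_name(child.tag) == "version"),
            None,
        )
        identities.append({
            "format": elem.get("format"),
            "mimetype": elem.get("mimetype"),
            "version": version,
        })
    return {
        "primary": identities[0] if identities else None,
        "conflict": len(identities) > 1,
        "alternatives": identities[1:],
    }
```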

Other tables in this database include itemSums and modified.

The md5sums:itemSums table includes:

  1. the identifier
  2. file name
  3. file path
  4. md5sum
  5. byte size
  6. whether this file has been modified
  7. when it was first entered into the system
  8. and notes.
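The md5sum and byte size for such an entry can be computed in one pass over the file. The following is a minimal sketch using Python's standard hashlib module; the function name checksum_and_size is hypothetical and the sketch does not show how our scripts actually populate the table.

```python
import hashlib


def checksum_and_size(path, chunk_size=1 << 20):
    """Return (md5 hex digest, size in bytes) for the file at path.

    The file is read in chunks so that large TIFFs do not need to fit
    in memory.
    """
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size
```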

The md5sums:modified table includes:

  1. the identifier
  2. when the file was changed
  3. file name
  4. previous md5sum
  5. previous byte size
  6. the reason it was modified
  7. the party responsible.
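A sketch of how an entry for this table might be assembled when a file's content changes is given below. The function name modification_record and the dictionary keys are hypothetical stand-ins for the fields listed above, not the actual column names, and the comparison by md5sum is an assumption about how a change would be detected.

```python
import hashlib
from datetime import datetime, timezone


def modification_record(identifier, file_name, stored_md5, stored_size,
                        current_bytes, reason, responsible):
    """Return a dict describing a modification, or None if unchanged.

    The dict keeps the previous checksum and byte size, mirroring the
    fields listed for the modified table. Key names are hypothetical.
    """
    current_md5 = hashlib.md5(current_bytes).hexdigest()
    if current_md5 == stored_md5:
        return None
    return {
        "identifier": identifier,
        "changed": datetime.now(timezone.utc).isoformat(),
        "file_name": file_name,
        "previous_md5": stored_md5,
        "previous_size": stored_size,
        "reason": reason,
        "responsible": responsible,
    }
```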