Image Technical Metadata

From UA Libraries Digital Services Planning and Documentation

Revision as of 08:04, 27 August 2014

For our TIFF images, we currently generate and store output from the File Information Tool Set (FITS, http://projects.iq.harvard.edu/fits), as well as metadata in MIX (NISO Metadata for Images in XML schema, http://www.loc.gov/standards/mix/). Our MIX (version 2) profile is described in File:MIX-2.0.xlsx.

During quality control testing, the QC script used by staff members generates FITS files and tests them for errors. Staff members review the errors and, if warranted, repair the file and test again.
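As an illustration of what testing a FITS file for errors could look like, here is a minimal Python sketch. It is not the QC script itself. It assumes the usual FITS output layout, in which a filestatus element carries well-formed, valid, and message children in the FITS output namespace; the function name fits_errors and the sample report are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of FITS output XML (an assumption based on the FITS output schema).
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}


def fits_errors(fits_xml: str) -> list[str]:
    """Return human-readable problems reported in one FITS document."""
    root = ET.fromstring(fits_xml)
    problems = []
    status = root.find("fits:filestatus", FITS_NS)
    if status is None:
        return problems  # no status section, nothing reported
    # Flag a file that any tool marked as not well-formed or not valid.
    for tag in ("well-formed", "valid"):
        node = status.find(f"fits:{tag}", FITS_NS)
        if node is not None and (node.text or "").strip().lower() == "false":
            problems.append(f"file is not {tag}")
    # Collect the free-text messages emitted by the tools.
    for msg in status.findall("fits:message", FITS_NS):
        if msg.text:
            problems.append(msg.text.strip())
    return problems


# Invented sample report, for illustration only.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <filestatus>
    <well-formed>true</well-formed>
    <valid>false</valid>
    <message>example message from a validation tool</message>
  </filestatus>
</fits>"""

print(fits_errors(SAMPLE))
```

An empty list would mean the report contains nothing for staff to review.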

The scripts that generate derivatives for the web check the upload directories to ensure that there is a FITS file for each TIFF, and will not continue until there is.
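The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that the FITS file sits next to its TIFF in the upload directory and follows the ".fits.xml" naming used in the archive. The function name tiffs_missing_fits is hypothetical.

```python
from pathlib import Path


def tiffs_missing_fits(upload_dir: Path) -> list[Path]:
    """List TIFFs in upload_dir that have no sibling FITS file."""
    missing = []
    for tiff in sorted(upload_dir.iterdir()):
        if tiff.suffix.lower() not in {".tif", ".tiff"}:
            continue  # not a TIFF, ignore
        # Assumed naming: name.tif -> name.fits.xml in the same directory.
        if not tiff.with_name(tiff.stem + ".fits.xml").is_file():
            missing.append(tiff)
    return missing
```

A derivative script built this way could refuse to proceed while the returned list is non-empty.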

The scripts that upload the TIFFs for deposits to be processed for the archive retest the FITS files for errors. If none are found, they also generate MIX files and upload some information to our md5sum:imageTechMed database table for quick reference. The information currently stored in this table includes:

  1. the identifier,
  2. MIME type,
  3. format,
  4. format version,
  5. whether the file contains a thumbnail,
  6. date of testing,
  7. format registry,
  8. format registry key,
  9. whether there are any conflicts between the findings of the various FITS tools,
  10. and, if any conflicts were found in the identity of the file:
    1. the alternative format,
    2. format version,
    3. and MIME type.
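To make the list above more concrete, the following Python sketch pulls most of those fields out of a FITS report. It is a sketch under assumptions, not the upload script: it assumes FITS reports identities under an identification element whose status attribute is "CONFLICT" when tools disagree, and that the registry and key come from an externalIdentifier element. The TechRecord class, its field names, and the sample values are invented here, and the thumbnail flag and test date are left out.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Namespace of FITS output XML (an assumption based on the FITS output schema).
FITS_NS = {"fits": "http://hul.harvard.edu/ois/xml/ns/fits/fits_output"}


@dataclass
class TechRecord:
    """Hypothetical in-memory form of one quick-reference row."""
    identifier: str
    mime_type: Optional[str]
    format: Optional[str]
    format_version: Optional[str]
    registry: Optional[str]
    registry_key: Optional[str]
    has_conflict: bool
    alt_format: Optional[str] = None
    alt_format_version: Optional[str] = None
    alt_mime_type: Optional[str] = None


def _identity_fields(identity):
    """Return (format, version, mimetype) for one identity element."""
    version = identity.find("fits:version", FITS_NS)
    return (
        identity.get("format"),
        version.text if version is not None else None,
        identity.get("mimetype"),
    )


def summarize(identifier: str, fits_xml: str) -> TechRecord:
    root = ET.fromstring(fits_xml)
    ident = root.find("fits:identification", FITS_NS)
    identities = ident.findall("fits:identity", FITS_NS) if ident is not None else []
    if not identities:
        return TechRecord(identifier, None, None, None, None, None, False)
    fmt, version, mime = _identity_fields(identities[0])
    ext = identities[0].find("fits:externalIdentifier", FITS_NS)
    record = TechRecord(
        identifier, mime, fmt, version,
        ext.get("type") if ext is not None else None,
        ext.text if ext is not None else None,
        has_conflict=ident.get("status") == "CONFLICT",
    )
    # On a conflict, keep the second identity as the alternative finding.
    if record.has_conflict and len(identities) > 1:
        (record.alt_format, record.alt_format_version,
         record.alt_mime_type) = _identity_fields(identities[1])
    return record


# Invented sample report, for illustration only.
SAMPLE = """<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output">
  <identification status="CONFLICT">
    <identity format="Tagged Image File Format" mimetype="image/tiff">
      <version>6.0</version>
      <externalIdentifier type="puid">fmt/353</externalIdentifier>
    </identity>
    <identity format="Exchangeable Image File Format" mimetype="image/tiff">
      <version>2.2</version>
    </identity>
  </identification>
</fits>"""

record = summarize("item_0001", SAMPLE)
```

Keeping only the first alternative identity is a simplification; a report with three or more conflicting identities would need a different layout.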

Other tables in this database include itemSums and modified.

The md5sums:itemSums table includes:

  1. the identifier
  2. file name
  3. file path
  4. md5sum
  5. byte size
  6. whether this file has been modified
  7. when it was first entered into the system
  8. and notes.

The md5sums:modified table includes:

  1. the identifier
  2. when the file was changed
  3. file name
  4. previous md5sum
  5. previous byte size
  6. the reason it was modified
  7. the party responsible.
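The kind of bookkeeping these two tables support can be illustrated with a short Python sketch: compute a file's md5sum and byte size, compare them with the stored values, and build a row describing the change. This is an assumption about how such a check might work, not a description of our scripts; the function names and dictionary keys are hypothetical and do not reflect the actual column names.

```python
import hashlib
from pathlib import Path


def checksum_and_size(path: Path) -> tuple[str, int]:
    """Return (md5 hex digest, byte size), reading the file in chunks."""
    digest = hashlib.md5()
    size = 0
    with path.open("rb") as handle:
        # Read 1 MiB at a time so large TIFFs are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def detect_change(path: Path, stored_md5: str, stored_size: int):
    """Return a dict describing the change, or None if the file is unchanged."""
    md5, size = checksum_and_size(path)
    if (md5, size) == (stored_md5, stored_size):
        return None
    return {
        "file_name": path.name,
        "previous_md5": stored_md5,
        "previous_size": stored_size,
        "new_md5": md5,
        "new_size": size,
    }
```

The reason for the change and the party responsible cannot be derived from the file, so they would have to be supplied by whoever records the modification.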


In the archive, the FITS and MIX files are named for the TIFF, with the extensions ".fits.xml" and ".mix.xml" respectively. If the file was repaired or modified, there will also be versioned copies of these files: the first (fileNumber.mix.v1.xml and fileNumber.fits.v1.xml) describes the first version of the file, v2 the second version, and so on. The unversioned MIX and FITS files are the CURRENT ones. All technical metadata is stored in a Metadata directory at the level to which it applies. For example, technical metadata for a page is stored in a Metadata directory at the page level of the archive, alongside any other metadata for that page.
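The naming convention can be expressed as a small Python helper. The pattern comes from the file names given above; the function name metadata_names and the rule that version numbers start at 1 are assumptions made for this sketch.

```python
from typing import Optional


def metadata_names(tiff_stem: str, version: Optional[int] = None) -> dict[str, str]:
    """Build the FITS and MIX file names for a TIFF.

    With version=None the names are the unversioned (current) ones;
    with version=N they describe the Nth version of the file.
    """
    if version is not None and version < 1:
        raise ValueError("version numbers start at 1")
    suffix = "" if version is None else f".v{version}"
    return {
        "fits": f"{tiff_stem}.fits{suffix}.xml",
        "mix": f"{tiff_stem}.mix{suffix}.xml",
    }
```

For instance, metadata_names("fileNumber", 1) gives fileNumber.fits.v1.xml and fileNumber.mix.v1.xml, while metadata_names("fileNumber") gives the current fileNumber.fits.xml and fileNumber.mix.xml.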
