Removing a TIFF or WAV

From UA Libraries Digital Services Planning and Documentation
Revision as of 14:43, 15 May 2015 by Jlderidder (talk | contribs)

Generally speaking, this event should never occur. Once archival content is in our archive, it is expected to live there in perpetuity, safeguarded against a variety of threats.

If the image or audio represented online is what needs to be removed, that is far simpler and has far fewer repercussions than removing the archival file. If that will suffice, see Removing a derivative.

Content in the archive may already be in LOCKSS. If so, removing it locally does not reduce our preservation costs via that system (or any other preservation system). In fact, removing it complicates tracking, as there will be a discrepancy between the local content and the caches. DOCUMENTATION is therefore very important; otherwise, someone may think we have suffered an unintentional loss. Start with that:

1) In the collection documentation directory in the archive, edit or add to the problems.txt file to specify the identifier(s) of what is being removed, why, at whose request and on what date. For example:

  vi  /srv/archive/u9999/9999999/Documentation/problems.txt
  enter:  "u9999_9999999_0001234_0002.tif was removed from the archive on 5/5/2014 at the request of Fred Jones, because it is a duplicate of u9999_9999999_0001234_0001.tif"
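
If you would rather not open an editor, the same note can be appended from the command line. The following is only a sketch: the function name log_removal and the MM/DD/YYYY date format are assumptions for illustration, not an existing script on our servers.

```shell
#!/bin/sh
# Hypothetical helper (not an existing script): append one removal record
# to a collection's problems.txt.
# Usage: log_removal DOC_DIR FILE_NAME REQUESTER REASON
log_removal() {
    doc_dir=$1; file_name=$2; requester=$3; reason=$4
    # Refuse to write if the documentation directory does not exist.
    if [ ! -d "$doc_dir" ]; then
        echo "no such directory: $doc_dir" >&2
        return 1
    fi
    # Append (>>) so that earlier entries in problems.txt are preserved.
    printf '%s was removed from the archive on %s at the request of %s, because %s\n' \
        "$file_name" "$(date +%m/%d/%Y)" "$requester" "$reason" \
        >> "$doc_dir/problems.txt"
}
```

For the example above, the call would look like:

  log_removal /srv/archive/u9999/9999999/Documentation u9999_9999999_0001234_0002.tif "Fred Jones" "it is a duplicate of u9999_9999999_0001234_0001.tif"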

NEXT: consider all the different protection systems we have in place for this file. Each one must be addressed separately.

2) You will need to add an entry to the md5sums.modified table on the libcontent server, to document the size, existing md5sum, location, id number, date, and why this file was removed. We HAVE TO HAVE the size, as we need it to estimate our preservation costs for years and years to come, and we will no longer have the file itself to refer to. You should be able to locate the size and md5sum in the md5sums.itemSums table. Example:

  mysql -u ds -p
  {enter password}
  use md5sums;
  select byteSize, md5sum from itemSums where id like "u9999_9999999_0001234_0002";
  insert into modified (changedWhen, id_2009, fileName, previousMD, byteSize, reasonModified, partyResponsible) values(NULL, "u9999_9999999_0001234_0002","u9999_9999999_0001234_0002.tif","a9sd8fa9dhaksjdfhpasodfpa8uasd","393845632", "Removed because it's a duplicate of u9999_9999999_0001234_0001.tif", "Fred Jones");
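
Since the file still exists at this point, it may be worth cross-checking the size and md5sum from the database against the file itself before recording them. This is a sketch only: the function name file_size_and_md5 is hypothetical, and it assumes the GNU md5sum command is available on the server.

```shell
#!/bin/sh
# Hypothetical helper (not an existing script): print "<byteSize> <md5sum>"
# for one file, so the values can be compared with the itemSums row.
file_size_and_md5() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "not a regular file: $f" >&2
        return 1
    fi
    # wc -c counts bytes; tr strips the padding some wc versions add.
    size=$(wc -c < "$f" | tr -d ' ')
    # md5sum prints the 32-character hex digest first, then the file name.
    sum=$(md5sum "$f" | cut -d ' ' -f 1)
    printf '%s %s\n' "$size" "$sum"
}
```

Run it against the archival file and compare the two values with what the select statement returned. If they differ, stop and find out why before removing anything.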

3) Next, remove the entry for this file from the md5sums.itemSums table. For example:

  delete from itemSums where id like "u9999_9999999_0001234_0002";

4) Then, remove entries for this file from the md5sums.imageTechMed table:

  delete from imageTechMed where id like "u9999_9999999_0001234_0002";

5) Now, go find the flat file where we store the md5sums for comparison purposes on a weekly basis, and in case the database becomes corrupt.

 cd /srv/scripts/md5/cya/checksums

Here you will see directories that reflect the holder IDs that precede collection numbers in the file names, for example:

  u0003 u0008  p0003 w0001  u9999

Change into the correct directory for the content you're removing and look for the file named with the collection number, which will be only 7 digits. That file contains ALL the md5 checksums for that collection. You will need to edit this file. Example:

 cd u9999
 vi 9999999

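Hunt for the item number. In vi, type "/" followed by the item identifier (for example, /u9999_9999999_0001234_0002) and press Enter to jump to the matching line.

When you're on the line for that number, type in "dd" and that will delete that line. Save the file and close it (":wq").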
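
The same edit can be done without vi. The sketch below assumes each line of the flat file contains the file name of the item it describes (check this before relying on it); the function name remove_checksum_line and the separate backup directory are hypothetical. The backup goes in a separate directory in case the weekly comparison reads every file in the checksums directory.

```shell
#!/bin/sh
# Hypothetical helper (not an existing script): drop every line containing
# MATCH from a checksum flat file, keeping a backup copy in BACKUP_DIR.
# Usage: remove_checksum_line SUMS_FILE MATCH BACKUP_DIR
remove_checksum_line() {
    sums_file=$1; match=$2; backup_dir=$3
    # Stop early if nothing matches, so a mistyped identifier is noticed.
    if ! grep -qF -- "$match" "$sums_file"; then
        echo "$match not found in $sums_file" >&2
        return 1
    fi
    backup="$backup_dir/$(basename "$sums_file").bak"
    cp -p -- "$sums_file" "$backup" || return 1
    # Show the lines that are about to be dropped.
    grep -F -- "$match" "$backup"
    # grep -v keeps the non-matching lines (-F means literal text, no regex).
    # It exits 1 when no lines remain, which is acceptable here.
    grep -vF -- "$match" "$backup" > "$sums_file" || [ $? -eq 1 ]
}
```

Pass the full file name (u9999_9999999_0001234_0002.tif) rather than just the item number, since a shorter string could also match other lines that should stay.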

6) Now, navigate to the archival directory where that file resides:

 cd /srv/archive/u9999/9999999/0001234/0002

You will need to delete everything here: OCR, transcripts, technical metadata, and descriptive metadata, as well as the file itself:

  rm -r *      # use this with care: it removes everything in this directory, and all directories below this point!
  cd ..        # go up one level
  rmdir 0002   # remove the directory you were just in
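
Because "rm -r" is unforgiving if you are in the wrong directory, you may prefer a guarded version that names the directory explicitly. This is a sketch under assumptions: the function name remove_item_dir is hypothetical, and the rule that an item directory sits at least two levels below the archive root is inferred from the example path above.

```shell
#!/bin/sh
# Hypothetical helper (not an existing script): remove one item directory,
# but only if it lies inside ARCHIVE_ROOT.
# Usage: remove_item_dir ARCHIVE_ROOT ITEM_DIR
remove_item_dir() {
    archive_root=$1; item_dir=$2
    case "$item_dir" in
        *..*)
            # Reject paths that could climb back out of the archive.
            echo "refusing: $item_dir contains '..'" >&2
            return 1 ;;
        "$archive_root"/*/*)
            ;;  # at least two levels below the root: acceptable
        *)
            echo "refusing: $item_dir is not inside $archive_root" >&2
            return 1 ;;
    esac
    if [ ! -d "$item_dir" ]; then
        echo "no such directory: $item_dir" >&2
        return 1
    fi
    # List what is about to be deleted, then remove the directory and contents.
    ls -lR -- "$item_dir"
    rm -r -- "$item_dir"
}
```

For the example above, the call would be:

  remove_item_dir /srv/archive /srv/archive/u9999/9999999/0001234/0002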

7) Next, correct the Manifest used by LOCKSS. You will find this in the collection documentation directory:

  cd /srv/archive/u9999/9999999/Documentation/

There may be multiple manifests if it's a large directory: if you see Manifest2.html, for example, there is more than one to check. The simplest approach is to type in:

  grep "u9999_9999999_0001234_0002.tif"  *.html

This should show which Manifest.html contains the entry for this file. Back up the manifest: make a copy with "_LOCKSS_datestamp" on the end of the filename, or "_datestamp" on the end if it's not in LOCKSS (see Finding out if in LOCKSS).

Then edit the original file and delete the line that contains the filename in question. Save the file and exit.
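
The locate-and-back-up part of this step can be sketched as a small function. The name backup_manifests is hypothetical, and the YYYYMMDD datestamp format is an assumption; use whatever format the existing backups in the directory follow.

```shell
#!/bin/sh
# Hypothetical helper (not an existing script): back up every manifest in
# DOC_DIR that lists FILE_NAME. IN_LOCKSS is "yes" or "no".
# Usage: backup_manifests DOC_DIR FILE_NAME IN_LOCKSS
backup_manifests() {
    doc_dir=$1; file_name=$2; in_lockss=$3
    stamp=$(date +%Y%m%d)   # datestamp format is an assumption
    if [ "$in_lockss" = "yes" ]; then
        suffix="_LOCKSS_$stamp"
    else
        suffix="_$stamp"
    fi
    # grep -l prints only the names of files that contain the literal
    # string (-F); each of those manifests is copied with the suffix added.
    grep -lF -- "$file_name" "$doc_dir"/*.html | while IFS= read -r manifest; do
        cp -p -- "$manifest" "$manifest$suffix" && echo "backed up $manifest"
    done
}
```

After the backup exists, edit the original manifest as described above. For the example file, the call would be:

  backup_manifests /srv/archive/u9999/9999999/Documentation u9999_9999999_0001234_0002.tif yes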

8) Next, remove the web display. See Removing a derivative.