Checking Acumen vs. Archive
In general, we expect everything in Acumen to be in the archive, and vice versa (with some exceptions, noted below), so regularly comparing the contents of the two is a useful sanity check. We have had situations where TIFFs were lost before or during transition to the server, so we had content in Acumen without corresponding content in the archive. We have also had situations where content made it all the way to the archive but was never put online, or where the derivative-generating script was cut short by networking or other issues, so derivatives for some content were not in Acumen.
On libcontent, in the /srv/scripts/stats directory, there are two scripts for this purpose: acumenToArchiveDiff and archiveToAcumenDiff. The names reflect what they do. acumenToArchiveDiff looks at what's in Acumen and checks it against what's in the archive; archiveToAcumenDiff looks at what's in the archive and checks it against what's in Acumen. The result files are written to the output directory.
The scripts compare TIFFs and WAVs in the archive with large (2048 pixel) JPEGs and MP3s in Acumen. Since we're looking for a one-to-one comparison of ids, this process does NOT yet recognize MP3s when more than one is generated from a single WAV file. Such cases are listed in the results file under "OKAY TO DIFFER" -- and new collections that meet this description need to be noted in the scripts in order to be included.
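As a rough illustration of the one-to-one id comparison described above (not the actual acumenToArchiveDiff or archiveToAcumenDiff code), here is a minimal Python sketch. The file names, the "_2048" suffix on large JPEGs, the "_01"/"_02" suffixes on MP3s, and the function names are all assumptions made for the example; the real naming conventions may differ.

```python
from pathlib import PurePosixPath

# Archive masters considered by the comparison.
ARCHIVE_EXTS = {".tif", ".tiff", ".wav"}


def archive_ids(paths):
    """Ids of TIFF and WAV masters: the file name without its extension."""
    ids = set()
    for p in paths:
        path = PurePosixPath(p)
        if path.suffix.lower() in ARCHIVE_EXTS:
            ids.add(path.stem)
    return ids


def acumen_ids(paths, large_suffix="_2048"):
    """Ids of MP3s and large JPEGs.

    ASSUMPTION: large JPEGs are marked by a "_2048" suffix before the
    extension; other JPEG sizes are ignored.
    """
    ids = set()
    for p in paths:
        path = PurePosixPath(p)
        ext = path.suffix.lower()
        if ext == ".mp3":
            ids.add(path.stem)
        elif ext == ".jpg" and path.stem.endswith(large_suffix):
            ids.add(path.stem[: -len(large_suffix)])
    return ids


def diff(archive_paths, acumen_paths):
    """Return the ids found on only one side, in both directions."""
    in_archive = archive_ids(archive_paths)
    in_acumen = acumen_ids(acumen_paths)
    return {
        "archive_only": sorted(in_archive - in_acumen),
        "acumen_only": sorted(in_acumen - in_archive),
    }


# Hypothetical file listings, for illustration only.
archive = [
    "/archive/u0001_0000001_0000001.tif",
    "/archive/u0001_0000001_0000002.tif",
    "/archive/u0002_0000001_0000001.wav",
]
acumen = [
    "/acumen/u0001_0000001_0000001_2048.jpg",
    "/acumen/u0001_0000001_0000001_128.jpg",
    "/acumen/u0002_0000001_0000001_01.mp3",
    "/acumen/u0002_0000001_0000001_02.mp3",
]
result = diff(archive, acumen)
```

In this made-up listing the second TIFF has no large JPEG, so it is a genuine discrepancy. The WAV has two MP3s whose ids do not match the WAV's id, so it shows up as different in both directions even though nothing is missing, which is the kind of case that ends up under "OKAY TO DIFFER".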
Other content that should not be in one area or the other includes the following:
Not in the archive
- u0015_0000002 Undergraduate research projects
- t0003 Tuskegee finding aids
- u0007_0000001 UA Video collection "Realizing the Dream" is copyright protected; although we were required to put it online, we do not have the rights to preserve it or make copies, so it's not in the archive.
- Content that is in deposits awaiting archiving
Not in Acumen
- u0011_0000011 Publisher's Bindings Online (this is hosted elsewhere)
- u0005_0000002 County maps that were digitized by the cartographic lab (hosted elsewhere)
- Content that has been taken down for a reason, which should be noted in a problems.txt file in the collection Documentation directory in the archive.
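One way to keep these known exceptions out of the list of real problems is to match each differing id against the collection ids above. The following Python sketch is only an illustration of that idea, not how the existing scripts do it; the function name, the assumption that item ids begin with the collection id followed by an underscore, and the sample ids are all hypothetical.

```python
# Collections listed above as expected to be absent from the archive.
EXPECTED_MISSING_FROM_ARCHIVE = {
    "u0015_0000002": "Undergraduate research projects",
    "t0003": "Tuskegee finding aids",
    "u0007_0000001": 'UA Video collection "Realizing the Dream"',
}

# Collections listed above as expected to be absent from Acumen.
EXPECTED_MISSING_FROM_ACUMEN = {
    "u0011_0000011": "Publisher's Bindings Online",
    "u0005_0000002": "County maps digitized by the cartographic lab",
}


def split_expected(ids, expected):
    """Split ids into (unexpected, okay_to_differ).

    ASSUMPTION: an item belongs to a collection when its id equals the
    collection id or starts with the collection id plus an underscore.
    """
    unexpected, okay = [], []
    for item_id in sorted(ids):
        if any(item_id == c or item_id.startswith(c + "_") for c in expected):
            okay.append(item_id)
        else:
            unexpected.append(item_id)
    return unexpected, okay


# Hypothetical ids found in Acumen but not in the archive.
acumen_only = {
    "t0003_0000001",
    "u0015_0000002_0000001",
    "u0001_0000001_0000001",
}
unexpected, okay = split_expected(acumen_only, EXPECTED_MISSING_FROM_ARCHIVE)
```

Matching on the collection id plus an underscore, rather than on a bare prefix, avoids treating a different collection whose id merely begins with the same characters as an exception. Content in deposits awaiting archiving, and content taken down and noted in problems.txt, is not covered by this sketch and would still need to be checked by hand.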