Watching Our Backs
Currently, we store archival content (described here: formats) on a Linux server, in the directory structure described here: Organization_of_completed_content_for_long-term_storage. Content that we want included in LOCKSS (ADPNET) is linked into Manifests (as described here: File_Naming_and_Linking_for_LOCKSS).
Once this archival content is placed into storage, anything linked into the Manifest.html pages should NOT be changed. However, since our new delivery platform derives from this stored content, we need to be able to update the metadata as needed. Hence, the primary metadata file is copied as a versioned file, and the versioned file is what is linked into the Manifests for LOCKSS pickup. The metadata file which is NOT versioned is the most recent, and over-writeable, copy.
For the content which is not allowed to change, we have scripts running weekly to verify that the md5 checksum has not changed, prior to the full tape backup. If there's an error, we are notified in time to retrieve a good copy from a previous backup, before the corrupted item can be written to tape.
On libcontent, in /srv/scripts/md5/cya/, the script which calculates and checks sums is called "md5check".
First it looks for new holder areas (such as u0001, u0003, etc. in the archive). If there are any, it sets up a new location for checksums for that holder. Within each holder directory in the checksums area, there's a file for each collection. Each collection's file contains all the checksums for that collection.
Second, md5check goes through each holder area in turn, checking collections one at a time. If the collection exists, it checks the checksums for each file. If any have changed, it outputs an error. While traversing the archive, it notes new content and generates checksums for it before going on to the next collection.
Third, if any new collections were noted in the holder area, the script creates collection md5 files and generates checksums for the content before going on to the next holder area.
Fourth, the script goes through the new holder areas, finds new collections, and notes any new content in existing collections.
The script then logs an entry in the checkscripts database recording that it ran, the timestamp, and any errors.
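The verification step that md5check performs on a single collection can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function names (md5_of, check_collection) and the in-memory dictionary of known checksums are assumptions made for the example, since the real script keeps one checksum file per collection.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_collection(collection_dir, known_sums):
    """Check every file under collection_dir against known_sums.

    known_sums maps a relative path to its stored MD5 (hypothetical layout).
    Returns (errors, new_sums): errors lists files whose checksum changed,
    new_sums holds checksums for files that had no stored checksum yet.
    """
    root = Path(collection_dir)
    errors, new_sums = [], {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        current = md5_of(path)
        if rel not in known_sums:
            new_sums[rel] = current          # new content: record its checksum
        elif known_sums[rel] != current:
            errors.append(f"CHANGED {rel}")  # stored content must never change
    return errors, new_sums
```

A caller would merge new_sums into the collection's stored checksums and report anything in errors before the tape backup runs.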
Following this script, another script named storeFileSums runs. This one goes through all the gathered checksums, compares them to the ones in the md5sums MySQL database, verifies (again) that nothing has changed, and enters any new checksums it finds, with a timestamp.
This database is backed up monthly, and regular copies are kept on libcontent.lib.ua.edu.
Errors are sent to email@example.com and firstname.lastname@example.org.
This script also logs an entry in the checkscripts database recording that it ran, the timestamp, and any errors.
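The compare-then-insert behavior of storeFileSums can be sketched as below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for the MySQL database, and the table name (sums), its columns, and the function name store_file_sums are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def store_file_sums(conn, gathered, now=None):
    """Compare gathered {path: md5} against stored rows and add new ones.

    Returns the list of paths whose stored checksum differs from the
    gathered one. Stored checksums are never overwritten.
    """
    now = now or datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Hypothetical schema; the real md5sums database may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sums ("
        "path TEXT PRIMARY KEY, md5 TEXT NOT NULL, added TEXT NOT NULL)"
    )
    mismatches = []
    for path, md5 in sorted(gathered.items()):
        row = conn.execute(
            "SELECT md5 FROM sums WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            # First time we see this file: store its checksum with a timestamp.
            conn.execute(
                "INSERT INTO sums (path, md5, added) VALUES (?, ?, ?)",
                (path, md5, now),
            )
        elif row[0] != md5:
            mismatches.append(path)
    conn.commit()
    return mismatches
```

Keeping the original row untouched on a mismatch means the database still holds the last known-good checksum when we go looking for a clean copy in an earlier backup.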
Automated Script Verifications
Once a week, a script called "checkscripts" on libcontent.lib.ua.edu in /srv/scripts/cya looks through the entries in the checkscripts database for the past week, and compares them with the list of entries of existing cron scripts and when they are due to run.
The checkscripts database consists of 2 tables: "ran" and "scripts." Ran contains an auto-incrementing primary key (num), scriptid (identifier for the script, again a number), datestamp (for when this script ran) and errors.
Scripts contains a numerical id (which corresponds to the scriptid in "ran"), scriptname, server, the directory where the script resides, the cron job that calls it, a textual description of when it runs ("runswhen") and what it does ("doeswhat"), and the names of any scripts that it must precede ("preceeds") or follow ("succeeds") in order to work properly.
If any scheduled scripts did NOT run, or if any of them logged errors, this script sends email to notify us of the problems. It also sends email to reassure us when all scripts ran on time, and to report any extra runs that were logged. Of course, this script also logs an entry in the checkscripts database recording that it ran, and when.
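The weekly comparison can be sketched as a query over the two tables. This is a rough illustration, not the checkscripts code itself: it uses sqlite3 in place of MySQL, reads only the id and scriptname columns of "scripts", assumes datestamps are stored as sortable text, and takes the set of scripts due in the window (expected_ids) as a parameter, whereas the real script works that out from the cron schedule.

```python
import sqlite3


def weekly_report(conn, since, expected_ids):
    """Return (missing, failed) script names for runs logged since `since`.

    missing: scripts that were due but have no entry in "ran".
    failed:  scripts with at least one run that logged errors.
    expected_ids is an assumption standing in for the cron schedule.
    """
    names = dict(conn.execute("SELECT id, scriptname FROM scripts"))
    rows = conn.execute(
        "SELECT scriptid, errors FROM ran WHERE datestamp >= ?", (since,)
    ).fetchall()
    ran_ids = {scriptid for scriptid, _ in rows}
    missing = sorted(
        names.get(i, str(i)) for i in expected_ids if i not in ran_ids
    )
    failed = sorted(
        {names.get(s, str(s)) for s, errors in rows if errors}
    )
    return missing, failed
```

If both lists come back empty, the reassuring "all scripts ran on time" message applies; otherwise the two lists are what the notification email would need to contain.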
Since the checksums are backed up in a MySQL database, and a second MySQL database manages the scripts, there's a cron script (called "backups") on libcontent.lib.ua.edu (in /srv/scripts/cya/; backups go into /srv/backups/ ) which backs up selected databases monthly. The script deletes backups over a year old.
Currently the list of databases backed up on libcontent.lib.ua.edu is this:
- InfoTrack (see Tracking_for_the_long_term)
- md5sums (for more info, see Tracking_for_the_long_term and Image_Tech_Metadata)
- acumen_staging (the test database for Acumen)
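The one-year retention rule applied by the backups script can be sketched as follows. This is a minimal example under assumptions: it judges age by each file's modification time and treats a year as 365 days; the function name prune_old_backups is hypothetical, and the real script may decide age differently (for example, from a date in the file name).

```python
import time
from pathlib import Path

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def prune_old_backups(backup_dir, now=None, max_age=ONE_YEAR_SECONDS):
    """Delete regular files in backup_dir last modified more than max_age ago.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in sorted(Path(backup_dir).iterdir()):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return removed
```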
Critical File Backups
The Windows Share drive has a curious property in that any space not needed by current content is used for "shadow copies" (backups) of recently changed files. However, sometimes we have critical files that we do not touch for months; yet if they disappear, it can be a nightmare to try to reconstruct them (as we discovered with the Woodward master spreadsheet, for example). Files that go untouched will not have shadow copies, and hence have no backup.
Therefore, we have a system for capturing these files. In S:\Digital Projects\Administrative\EMERGENCY there is a spreadsheet called critical_files.xlsx, which contains 3 columns: document title, location (a file path), and type ("File Folder" needs to be indicated here if all subfiles are to be included). This is where we document what our critical files are. Whenever this changes, a new tab-delimited export must be made and placed in the same directory ("critical_files.txt"). All of these files and directories must have names that work on Linux: no spaces, commas, or punctuation other than underscores, periods, and hyphens.
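Before placing a new export, the naming rule can be checked mechanically. The sketch below is an illustration only: it assumes the tab-delimited export has exactly the three columns described above with no header row, and it checks only the final component of each location (the file or folder name itself). The function name unsafe_entries is hypothetical.

```python
import csv
import io
import re

# Letters, digits, underscores, periods, and hyphens only.
SAFE_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def unsafe_entries(tab_text):
    """Return (title, name) pairs whose file or folder name breaks the rule.

    tab_text is the content of the tab-delimited export:
    title <TAB> location <TAB> type, one entry per line.
    """
    problems = []
    for row in csv.reader(io.StringIO(tab_text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        title, location, _kind = row[:3]
        # Take the last path component, accepting either slash style.
        name = re.split(r"[\\/]", location.rstrip("\\/"))[-1]
        if not SAFE_NAME.fullmatch(name):
            problems.append((title, name))
    return problems
```

An empty result means every listed name uses only the permitted characters.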
Once a week, on the libcontent server, a script called "critFiles" (in /srv/scripts/cya/) picks up this list and crawls these directories, comparing the files with the copies in /srv/backups/criticalFiles/. Each file in these directories has a datestamp attached to the file name, after an underscore. If the most recent copy there does not match the current version, a new version is created in this directory.
If a file in the backup system does NOT exist any more on the share drive, an error message is emailed to the head of digital services and to the uadigitalservices Gmail account.
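The "new version only when changed" behavior of critFiles can be sketched as below. This is a rough illustration, not the script itself: it assumes the datestamp is appended to the full file name as an underscore followed by YYYYMMDD, and the function name backup_if_changed is hypothetical.

```python
import filecmp
import shutil
from datetime import date
from pathlib import Path


def backup_if_changed(source, backup_dir, today=None):
    """Copy source into backup_dir as '<name>_<YYYYMMDD>' if it differs
    from the most recent existing copy. Returns the new path, or None
    when the latest copy is already identical.
    """
    source, backup_dir = Path(source), Path(backup_dir)
    stamp = (today or date.today()).strftime("%Y%m%d")
    # YYYYMMDD stamps sort chronologically, so the last match is the newest.
    copies = sorted(backup_dir.glob(source.name + "_????????"))
    if copies and filecmp.cmp(source, copies[-1], shallow=False):
        return None
    target = backup_dir / f"{source.name}_{stamp}"
    shutil.copy2(source, target)
    return target
```

Because unchanged files produce no new copy, the backup directory only grows when a critical file has actually been edited.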
Since we rely heavily on this wiki, we want access to a copy if the network goes down or the server is compromised. Therefore, a script called grabWiki in /srv/scripts/crawler collects the wiki pages once a month, translates them into HTML, corrects the internal links, and then does 2 things with them:
- puts a zip archive of the entire wiki on the share drive, in the S:\Digital Projects\Administrative\EMERGENCY folder
- places the HTML versions in the /home/ds/public_html/wiki directory, so a snapshot can be accessed from the libcontent server as static HTML: http://libcontent.lib.ua.edu/~ds/wiki/.