After several discussions with our web programmer, Will Jones, we settled on a method for representing a variety of date forms with a single set of encodings:
- There can be a begin date and an end date, or a single date.
- The date can be an approximation ("circa") at the year, month, or day level.
- The date can be missing the day value, the month value, or both, but it must have a valid year.
- The date will be provided to the indexer in the form of a timestamp (yyyy-mm-ddThh:mm:ssZ).
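The requirements above fit naturally into a small record type. The Python sketch below is illustrative only. The `PartialDate` name is hypothetical, and the rule for filling gaps (a missing month or day becomes the earliest value for a begin date and the latest value for an end date) is an assumption, not necessarily what our scripts do.

```python
import calendar
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartialDate:
    """Hypothetical record: the year is required; month and day may be missing."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None
    circa: bool = False  # approximation applies at the most specific level present

    def to_timestamp(self, is_end: bool = False) -> str:
        """Render as yyyy-mm-ddThh:mm:ssZ for the indexer.

        Assumption: a missing month/day becomes the earliest value for a begin
        date and the latest value for an end date, so ranges stay inclusive.
        """
        month = self.month if self.month is not None else (12 if is_end else 1)
        if self.day is not None:
            day = self.day
        elif is_end:
            day = calendar.monthrange(self.year, month)[1]  # last day, leap-aware
        else:
            day = 1
        return f"{self.year:04d}-{month:02d}-{day:02d}T00:00:00Z"
```

For example, `PartialDate(1924, 2).to_timestamp(is_end=True)` gives `"1924-02-29T00:00:00Z"`, because 1924 is a leap year.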
The scripts therefore look first for the MODS key date (see the "Specific DLF/Aquifer Guidelines" for dates in the MODS originInfo element) and, if none is found, look for (in order of priority):
The script also looks for "point" attributes (point="start" and point="end") on dates, which indicate a time span. It also looks for date entries that contain a range of dates (for example, "1920-1925") and splits them into a start date and an end date.
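Here is a rough Python sketch of the key-date and span handling, using only the standard library's ElementTree. It assumes MODS v3 with the standard namespace and that `keyDate="yes"` sits on the `point="start"` element, as is common practice. The function names and the year-range pattern are hypothetical, and the fallback fields used when no key date exists are not shown.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

MODS = "{http://www.loc.gov/mods/v3}"
# Hypothetical pattern for simple year ranges such as "1920-1925" or "1920 to 1925".
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*(?:-|to)\s*(\d{4})\s*$")


def split_range(text: str) -> Tuple[str, Optional[str]]:
    """Split a year-range string into (start, end); a single date returns (text, None)."""
    m = YEAR_RANGE.match(text)
    if m:
        return m.group(1), m.group(2)
    return text.strip(), None


def key_date_span(mods_xml: str) -> Optional[Tuple[str, Optional[str]]]:
    """Find the originInfo date flagged keyDate="yes" and return raw (start, end)."""
    root = ET.fromstring(mods_xml)
    for info in root.iter(MODS + "originInfo"):
        for el in info:
            if el.get("keyDate") != "yes":
                continue
            text = (el.text or "").strip()
            if el.get("point") == "start":
                # The matching end is a sibling of the same element type with point="end".
                for sibling in info.findall(el.tag):
                    if sibling.get("point") == "end":
                        return text, (sibling.text or "").strip()
                return text, None
            return split_range(text)
    return None  # caller would fall back to other date fields (not shown)
```

Splitting ranges with a strict four-digit pattern keeps ISO-style dates such as "1923-09-05" from being mistaken for a range.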
Spelled-out months (such as "September") are converted to numeric values. "Circa", "ca", "approx.", and similar qualifiers set the "circa" value at the level of the most specific date value provided (year, month, or day).
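The month and circa handling might look roughly like the following Python sketch. The list of approximation markers and the `normalize` function are assumptions for illustration; the script's own list of "similar values" may be longer or different. The circa level is simply the most specific part that was found.

```python
import re
from typing import Optional, Tuple

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
# Hypothetical list of approximation markers; the script's actual list may differ.
CIRCA = re.compile(r"\b(?:circa|ca\.?|approximately|approx\.?|about)(?=\s|$)",
                   re.IGNORECASE)
ISO = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")

Parts = Tuple[Optional[int], Optional[int], Optional[int], Optional[str]]


def month_number(word: str) -> Optional[int]:
    """Map a spelled-out or abbreviated month ("Sept", "September") to 1-12."""
    word = word.lower()
    if len(word) < 3:
        return None
    for number, name in enumerate(MONTHS, start=1):
        if name.startswith(word):
            return number
    return None


def normalize(text: str) -> Parts:
    """Return (year, month, day, circa_level) parsed from a free-text date."""
    circa = CIRCA.search(text) is not None
    cleaned = CIRCA.sub(" ", text).strip()
    iso = ISO.match(cleaned)
    if iso:
        year, month, day = (int(g) if g else None for g in iso.groups())
    else:
        month = next((m for m in map(month_number, re.findall(r"[A-Za-z]+", cleaned))
                      if m is not None), None)
        numbers = [int(n) for n in re.findall(r"\d+", cleaned)]
        year = next((n for n in numbers if n >= 1000), None)
        # Only treat a small number as a day when a month was also found.
        day = next((n for n in numbers if n <= 31), None) if month else None
    # The circa flag applies to the most specific value present.
    level = "day" if day else "month" if month else "year"
    return year, month, day, (level if circa else None)
```

So "ca. September 1920" comes back as `(1920, 9, None, "month")`: the approximation applies to the month, the most specific value given.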
The script checks for the following errors:
- invalid years (prior to 1500 or later than the current year)
- end date prior to start date
- invalid number of days in a month (yes, it checks leap years too!)
- invalid month values (greater than 12)
- no year value
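The checks listed above can be sketched in Python as follows. This is a hypothetical stand-in for the script's library code. It relies on the standard `calendar.monthrange`, which already accounts for leap years. The `this_year` parameter exists only to make the year check testable. When comparing a start and an end date, missing parts are widened (earliest for the start, latest for the end), which is an assumption.

```python
import calendar
import datetime
from typing import List, Optional, Tuple

# (year, month, day); any part may be None if it was missing from the record.
DateParts = Tuple[Optional[int], Optional[int], Optional[int]]


def validate(start: DateParts, end: Optional[DateParts] = None,
             this_year: Optional[int] = None) -> List[str]:
    """Return a list of error messages; an empty list means the dates passed."""
    this_year = this_year or datetime.date.today().year
    errors: List[str] = []
    for label, parts in (("start", start), ("end", end)):
        if parts is None:
            continue
        year, month, day = parts
        if year is None:
            errors.append(f"{label}: no year value")
            continue
        if not 1500 <= year <= this_year:
            errors.append(f"{label}: invalid year {year}")
        if month is None:
            continue
        if not 1 <= month <= 12:
            errors.append(f"{label}: invalid month {month}")
        elif day is not None and 1 <= year <= 9999:
            # monthrange() is leap-year aware: February 1900 has 28 days, 2000 has 29.
            days_in_month = calendar.monthrange(year, month)[1]
            if not 1 <= day <= days_in_month:
                errors.append(f"{label}: invalid day {day} for {year}-{month:02d}")
    if end is not None and not errors:
        # Missing parts widen the range: earliest for the start, latest for the end.
        # Using 31 for a missing end day is safe for ordering even in shorter months.
        start_key = (start[0], start[1] or 1, start[2] or 1)
        end_key = (end[0], end[1] or 12, end[2] or 31)
        if end_key < start_key:
            errors.append("end date prior to start date")
    return errors
```

For example, `validate((1900, 2, 29), this_year=2024)` reports an invalid day because 1900, a century year not divisible by 400, was not a leap year.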
Errors are written to a file that identifies the item in which each one was found, and the errors are mailed to the metadata librarians when the script finishes.
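Here is a minimal Python sketch of the report-and-mail step. The tab-separated report layout, the subject line, and the sender and recipient addresses are hypothetical. The code only builds the message with the standard `email.message.EmailMessage`; how it is actually sent depends on the local mail setup.

```python
from email.message import EmailMessage
from pathlib import Path
from typing import Dict, List


def format_error_report(errors: Dict[str, List[str]]) -> str:
    """One "item<TAB>message" line per error, sorted by item identifier."""
    lines = [f"{item}\t{message}"
             for item, messages in sorted(errors.items())
             for message in messages]
    return "".join(line + "\n" for line in lines)


def write_error_report(errors: Dict[str, List[str]], report_path: Path) -> str:
    """Write the report file and return its text for mailing."""
    text = format_error_report(errors)
    report_path.write_text(text, encoding="utf-8")
    return text


def build_error_mail(report: str, sender: str, recipients: List[str]) -> EmailMessage:
    """Wrap the report in a plain-text message for the metadata librarians."""
    msg = EmailMessage()
    msg["Subject"] = "Date faceting errors"  # hypothetical subject line
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(report or "No date errors found.")
    return msg

# Sending is site-specific; with a local MTA it might look like:
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(msg)
```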
The acumenFacets script is in /srv/scripts/metadata/faceting/ (and uses a library there). It runs weekly as a cron job over MODS records that digital services and metadata librarians have collected, using upload scripts, into the inMODS directory there. Modified files are copied to the deposits directory for archiving (/srv/deposits/content/coll_number/Metadata/) and to Acumen for indexing and delivery.
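The archiving step amounts to a copy into a per-collection directory. Here is a hypothetical Python sketch of it. The text above does not say how the collection number is derived from a MODS file, so it is passed in as a parameter. The copy to Acumen and the cron schedule are not shown.

```python
import shutil
from pathlib import Path

DEPOSITS_ROOT = Path("/srv/deposits/content")


def deposit_path(coll_number: str, root: Path = DEPOSITS_ROOT) -> Path:
    """Archive directory for a collection: <root>/<coll_number>/Metadata/."""
    return root / coll_number / "Metadata"


def archive_modified(mods_file: Path, coll_number: str,
                     root: Path = DEPOSITS_ROOT) -> Path:
    """Copy one modified MODS file into its collection's Metadata directory."""
    dest_dir = deposit_path(coll_number, root)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves the file's timestamps, which is handy for archival copies.
    return Path(shutil.copy2(mods_file, dest_dir))
```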
There is also a version there (acumenFacetsAllDirs) that can be run over all Acumen content, should we change our approach. It was used when we first implemented this process.