Harvesting tags and transcripts

From UA Libraries Digital Services Planning and Documentation

Tagging and transcriptions are both now supported directly in Acumen.

As they are captured, both tags and transcripts are stored in a server-side database named (appropriately) "acumen". On the first day of every month, two cron scripts run to pull content out of this database, compare it against what's in the archive, and if it's newer or changed, place it in deposits for archiving.

tagsToDepo, in /srv/scripts/tagging, first pulls the item identifiers and then, for each identifier, pulls its tags; both queries run against the tags_assets table in the acumen database. The tags for each identifier are then formatted according to the schema and written to a tag file. Each new tag file is compared to the current version in the archive. If the new file differs, it is moved to /srv/deposits/crowdsourcing/tags/ for the archiving process; otherwise, the script deletes the new file.
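The compare-then-deposit step described above can be sketched in Python. This is only an illustration, not the actual tagsToDepo script: the function name deposit_if_changed, the flat archive directory, and the assumption that the archived copy has the same file name as the new file are all hypothetical.

```python
import filecmp
import shutil
from pathlib import Path


def deposit_if_changed(new_file: Path, archive_dir: Path, deposit_dir: Path) -> bool:
    """Move new_file into deposit_dir if it differs from the archived copy.

    Returns True if the file was deposited, False if it was deleted.
    Assumption: the archived copy, if any, sits directly in archive_dir
    under the same file name.
    """
    archived = archive_dir / new_file.name
    # shallow=False forces a content comparison instead of trusting file metadata.
    unchanged = archived.exists() and filecmp.cmp(new_file, archived, shallow=False)
    if unchanged:
        new_file.unlink()  # nothing new to archive, so discard the fresh copy
        return False
    deposit_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(new_file), str(deposit_dir / new_file.name))
    return True
```

A file with no archived counterpart is treated as changed and deposited, which matches the "newer or changed" rule stated earlier on this page.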

transAcToDepo, in /srv/scripts/transcripts, finds the most recent transcripts in the transcripts table that were written in the past month and writes each one to a file. It then compares these files to what is in the archive. Files that differ are moved to /srv/deposits/crowdsourcing/transcriptions/ for the archiving process; the rest are deleted.
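The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the actual transAcToDepo script: the row shape (item_id, written_at, text), the function name, reading "most recent" as the newest transcript per item, and treating "the past month" as a 31-day window are all hypothetical.

```python
from datetime import datetime, timedelta


def latest_recent_transcripts(rows, now, days=31):
    """Return {item_id: text} holding the newest transcript per item
    among those written within the last `days` days.

    rows is an iterable of (item_id, written_at, text) tuples; this
    shape is an assumption, not the real transcripts table layout.
    """
    cutoff = now - timedelta(days=days)
    newest = {}
    for item_id, written_at, text in rows:
        if written_at < cutoff:
            continue  # older than the monthly window
        if item_id not in newest or written_at > newest[item_id][0]:
            newest[item_id] = (written_at, text)
    return {item_id: text for item_id, (_, text) in newest.items()}
```

Each returned transcript would then be written to a file and passed through the same compare-and-deposit check as the tag files.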

For more information on the archiving process, see Archiving.