User transcription

From UA Libraries Digital Services Planning and Documentation

The information on this page is deprecated, now that Acumen has incorporated transcription crowdsourcing.

The workflows described here have multiple parts. Each part requires different tasks and uses different (though sometimes related) scripts.

  1. adding selected content: Adding_Transcription_Content
  2. harvesting content: Harvesting_Transcriptions
    1. harvesting content during the rotation
    2. harvesting content at the end of the rotation
  3. deleting content from the transcription software: Deleting_Transcription_Content
  4. archiving the transcripts: Archiving_Transcriptions
  5. collecting stats: Transcription_stats

The transcription software uses two databases: transcribe (used by Omeka) and trmediawiki (used by MediaWiki). I think both are used by the Scripto plugin for Omeka, which ties the two together. Some analysis of these databases can be found in /srv/scripts/transcripts/mediawiki/dbase and toExtractText, and in /srv/scripts/transcripts/omeka/dbase and doThis.

InfoTrack tables

Tables have been created in InfoTrack (a MySQL database) on libcontent to track the success or failure of content rotations.

  1. numItemPages lists each online item in u0003 and u0002, the number of pages per item, the number of pages transcribed, and the number of pages OCR'd. This table is updated by numPagesAndTrans in the /srv/scripts/transcripts directory.
  2. userTranscribed lists each page (for a multi-page item) or each item (for a single-page item), the collection number, the box (if one was specified during upload and identified from the MODS), and a Y if it was transcribed during a rotation or an N if it was not. These entries are initiated by the dbload script, and updated by the transLive scripts (for successes) and the deleteSelectedTrans script (for failures).
  3. dates_transcription lists the item identifiers, the date each was loaded into the transcription software, the date it was taken out, and the number of transcriptions harvested during that round. Because the same item may be reloaded multiple times, it may have multiple entries in this table. This table is also initiated by the dbload script when it loads content into the transcription software, and then updated by the deleteSelectedTrans script, which records the date the item was pulled.
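
The life cycle of a dates_transcription entry can be illustrated with a small sketch. This is not the actual InfoTrack schema or the code of the dbload and deleteSelectedTrans scripts: the column names (item_id, date_loaded, date_removed, num_harvested), the function names, and the item identifier are all hypothetical, and an in-memory SQLite database stands in for the MySQL database on libcontent so that the example runs on its own. It only shows how one item can accumulate one row per rotation.

```python
import sqlite3


def create_tables(conn):
    # Hypothetical columns; the real InfoTrack table layout is not documented here.
    conn.execute(
        """
        CREATE TABLE dates_transcription (
            item_id       TEXT NOT NULL,
            date_loaded   TEXT NOT NULL,
            date_removed  TEXT,      -- NULL while the item is still in rotation
            num_harvested INTEGER    -- NULL until the rotation is closed
        )
        """
    )


def load_item(conn, item_id, date_loaded):
    # Stand-in for what the dbload script is described as doing:
    # start a new entry when content goes into the transcription software.
    conn.execute(
        "INSERT INTO dates_transcription (item_id, date_loaded) VALUES (?, ?)",
        (item_id, date_loaded),
    )


def close_rotation(conn, item_id, date_removed, num_harvested):
    # Stand-in for what the deleteSelectedTrans script is described as doing:
    # record the date the item was pulled and how many transcriptions were harvested.
    conn.execute(
        "UPDATE dates_transcription "
        "SET date_removed = ?, num_harvested = ? "
        "WHERE item_id = ? AND date_removed IS NULL",
        (date_removed, num_harvested, item_id),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_tables(conn)
    # "item-A" is a made-up identifier, loaded in two separate rotations.
    load_item(conn, "item-A", "2012-01-01")
    close_rotation(conn, "item-A", "2012-02-01", 3)
    load_item(conn, "item-A", "2012-03-01")
    for row in conn.execute(
        "SELECT * FROM dates_transcription ORDER BY date_loaded"
    ):
        print(row)
```

In this sketch, closing a rotation only touches the row whose removal date is still empty, so earlier rotations of the same item keep their own dates and harvest counts.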