User transcription

From UA Libraries Digital Services Planning and Documentation

Latest revision as of 16:15, 18 December 2012

The workflows described here have multiple parts. Each part requires different tasks and uses different (though sometimes related) scripts.

  1. adding selected content: Adding_Transcription_Content
  2. harvesting content: Harvesting_Transcriptions
    1. harvesting content during the rotation
    2. harvesting content at the end of the rotation
  3. deleting content from the transcription software: Deleting_Transcription_Content
  4. archiving the transcripts: Archiving_Transcriptions
  5. collecting stats: Transcription_stats


The transcription software uses two databases: transcribe (used by Omeka) and trmediawiki (used by MediaWiki). I think both are used by the Scripto plugin for Omeka, which ties the two together. Some analysis of these databases can be found in /srv/scripts/transcripts/mediawiki/dbase and toExtractText and in /srv/scripts/transcripts/omeka/dbase and doThis.



InfoTrack tables

Tables have been created in InfoTrack (a MySQL database) on libcontent to track the success or failure of content rotations.

  1. numItemPages lists each online item in u0003 and u0002, along with the number of pages per item, the number of pages transcribed, and the number of pages OCR'd. This table is updated by numPagesAndTrans in the /srv/scripts/transcripts directory.
  2. userTranscribed lists each page (for a multi-page item) or each item (for a single-page item), the collection number, the box (if one was specified during upload and identified from the MODS), and a flag: Y if it was transcribed during a rotation, N if it was not. These entries are initiated by the dbload script, and updated by the transLive scripts (for successes) and the deleteSelectedTrans script (for failures).
  3. dates_transcription lists the item identifiers, the date each was loaded into the transcription software, the date it was taken out, and an integer for the number of transcriptions harvested during this round. The same item may be reloaded multiple times, so it may have multiple entries in this table. Like userTranscribed, this table is initiated by the dbload script when the content is loaded into the transcription software, and then updated by the deleteSelectedTrans script, which identifies the date something was pulled.
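
The life cycle of an entry in userTranscribed and dates_transcription, as described above, can be sketched roughly as follows. This is only an illustration, not the actual scripts: every column name, the function names load_item, mark_transcribed and remove_item, the sample identifiers, and the way the harvested count is derived are assumptions, and SQLite stands in for the real MySQL database so that the sketch runs on its own.

```python
import sqlite3

# Hypothetical schema: the real InfoTrack column names are not given here,
# and the real database is MySQL, not SQLite.
SCHEMA = """
CREATE TABLE userTranscribed (
    page_id     TEXT,   -- a page (multi-page item) or an item (single page)
    collection  TEXT,   -- collection number
    box         TEXT,   -- NULL if no box was specified
    transcribed TEXT    -- 'Y' or 'N'; NULL while the rotation is still open
);
CREATE TABLE dates_transcription (
    item_id       TEXT,
    date_loaded   TEXT,
    date_removed  TEXT,    -- NULL until the item is pulled
    num_harvested INTEGER  -- NULL until the item is pulled
);
"""


def load_item(conn, item_id, page_ids, collection, box, date_loaded):
    """Stand-in for the dbload step: open one entry per item and per page."""
    conn.execute(
        "INSERT INTO dates_transcription VALUES (?, ?, NULL, NULL)",
        (item_id, date_loaded),
    )
    conn.executemany(
        "INSERT INTO userTranscribed VALUES (?, ?, ?, NULL)",
        [(page_id, collection, box) for page_id in page_ids],
    )


def mark_transcribed(conn, page_id):
    """Stand-in for the transLive step: flag a harvested page as a success."""
    conn.execute(
        "UPDATE userTranscribed SET transcribed = 'Y' "
        "WHERE page_id = ? AND transcribed IS NULL",
        (page_id,),
    )


def remove_item(conn, item_id, page_ids, date_removed):
    """Stand-in for the deleteSelectedTrans step: close out the rotation."""
    not_transcribed = 0
    for page_id in page_ids:
        cur = conn.execute(
            "UPDATE userTranscribed SET transcribed = 'N' "
            "WHERE page_id = ? AND transcribed IS NULL",
            (page_id,),
        )
        not_transcribed += cur.rowcount  # rows still open for this page
    # Assumption: the harvested count is the number of pages flagged 'Y'.
    harvested = len(page_ids) - not_transcribed
    conn.execute(
        "UPDATE dates_transcription SET date_removed = ?, num_harvested = ? "
        "WHERE item_id = ? AND date_removed IS NULL",
        (date_removed, harvested, item_id),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
pages = ["item_A_0001", "item_A_0002"]  # made-up identifiers

# First rotation: one of two pages gets transcribed.
load_item(conn, "item_A", pages, "u0003", None, "2012-11-01")
mark_transcribed(conn, "item_A_0001")
remove_item(conn, "item_A", pages, "2012-12-01")

# Second rotation of the same item: nothing gets transcribed.
load_item(conn, "item_A", pages, "u0003", None, "2012-12-05")
remove_item(conn, "item_A", pages, "2012-12-18")

for row in conn.execute("SELECT * FROM dates_transcription ORDER BY date_loaded"):
    print(row)
```

Because each load opens a new row instead of overwriting the old one, the reloaded item ends up with two entries in dates_transcription, one per rotation, which matches the behaviour described for that table.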