Harvesting Tags

From UA Libraries Digital Services Planning and Documentation

Revision as of 10:05, 13 April 2012

During the rotation in the tagging software:

Use the `updateAcumenOnly` script in /srv/scripts/tagging/. The script expects a collection name on the command line, which specifies the collection to extract.

updateAcumenOnly extracts tags and image names from the steve_museum software, dedupes them, and checks for each item's XML file in Acumen. If no XML file exists, the script creates one and inserts the tags, along with the number of times each tag was used this round. If an XML file already exists, the script checks whether it contains any of the new tags; if so, it updates the count for those tags, and it adds any tags not yet entered. This version does NOT log retrieval in the InfoTrack database.
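The create-or-update step described above can be sketched roughly as follows. This is an illustration only, not the actual updateAcumenOnly code: the `tags` and `tag` element names and the `merge_tags` function are hypothetical, only the name "confidenceLevel" comes from this page, and the sketch assumes that "updating the count" means adding this round's count to the stored one.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def merge_tags(root, new_tags):
    """Merge one round of tags into a tag document and return its root.

    root is the parsed root element of an existing tag file, or None if
    the item has no tag file yet. The <tags>/<tag> layout is hypothetical.
    """
    # Dedupe: collapse repeated tags into (tag, times used this round).
    counts = Counter(t.strip() for t in new_tags if t.strip())

    if root is None:
        # No XML file yet: start a new document.
        root = ET.Element("tags")

    existing = {el.text: el for el in root.findall("tag")}
    for tag, used in sorted(counts.items()):
        if tag in existing:
            # Tag already recorded: add this round's count (assumption).
            el = existing[tag]
            el.set("confidenceLevel", str(int(el.get("confidenceLevel", "0")) + used))
        else:
            # Tag not yet entered: add it with this round's count.
            el = ET.SubElement(root, "tag", confidenceLevel=str(used))
            el.text = tag
    return root
```

Passing `None` as the root covers the "no XML file" case, so a single function handles both branches of the description.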

Copies of the tag files are written to two places: /srv/deposits/crowdsourcing/tags/ for archiving, and the Digital_Program_files/Tags/ directory on the Share drive in Special Collections, for access by the archivists. In the archive directory, the files are versioned. On the Share drive, older versions of tag files are overwritten by new ones.
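The difference between the two copy destinations can be illustrated with a short sketch. The function name `copy_tag_file` and the `.vN` version suffix are hypothetical; this page does not say how the archived files are actually versioned.

```python
import shutil
from pathlib import Path


def copy_tag_file(src, archive_dir, share_dir):
    """Copy a tag file to the archive (versioned) and the share (overwritten)."""
    src = Path(src)
    archive_dir, share_dir = Path(archive_dir), Path(share_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    share_dir.mkdir(parents=True, exist_ok=True)

    # Archive copy: never overwrite; use the next unused version number.
    # The ".vN" suffix is an assumed naming scheme.
    version = 1
    while (archive_dir / f"{src.name}.v{version}").exists():
        version += 1
    archived = archive_dir / f"{src.name}.v{version}"
    shutil.copy2(src, archived)

    # Share copy: one file per item, replaced on every run.
    shared = share_dir / src.name
    shutil.copy2(src, shared)
    return archived, shared
```

With this arrangement the archive keeps every round's file, while the archivists always see only the most recent one.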

When the rotation is complete:

Use the `getCollTags` script in /srv/scripts/tagging/. The script expects a collection name on the command line, which specifies the collection to extract.


getCollTags does the same as updateAcumenOnly, but ALSO logs retrieval in the InfoTrack database. Both of these scripts add a recordModificationDate in a comment within the XML file for the tags.

NOTE

Tags are stored in separate XML files using this simple schema. In Acumen, these files are located in the Metadata folder at the item level. The files are named according to the format itemID.tags.xml, where itemID is the item identifier for the item being described. The "confidenceLevel" is the number of times users have applied a specific tag to the item.
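As a small illustration of the naming convention above, the following sketch builds the expected location of an item's tag file. The function name `tag_file_path` and the example item directory are hypothetical; only the Metadata folder and the itemID.tags.xml pattern come from this page.

```python
from pathlib import PurePosixPath


def tag_file_path(item_dir, item_id):
    """Return the expected path of the tag file for one item.

    item_dir is the item-level directory in Acumen (its real layout is
    not described here, so the caller supplies it).
    """
    return PurePosixPath(item_dir) / "Metadata" / f"{item_id}.tags.xml"
```

For example, `tag_file_path("/path/to/item", "item1")` yields `/path/to/item/Metadata/item1.tags.xml`.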
