Scripts for spreadsheet work

From UA Libraries Digital Services Planning and Documentation
Revision as of 10:18, 22 March 2017 by Jlderidder (Talk | contribs)

Jump to: navigation, search

New workflow: After you have entered data in your spreadsheet including names and subjects. You will want to tag your subjects.

Remediation workflow: For spreadsheets that are in the metadata queue, you will want to review the spreadsheet and make edits. All names should be searched in the names database and replaced with database correspondence entry including the database number. All subjects should be searched in the subject database and replaced with subject database entry including its tags.

DB_SubjectTagging.pl (S:\Digital Projects\Administrative\scripts\Metadata\DB_SubjectTagging.pl) will read the text file version of the spreadsheet and make a list of Subjects that are in the spreadsheet and not in the subjects database. The list will be generated in the correct format needed to upload several at once with GUI_subsToDbase.pl (S:\Digital Projects\Administrative\scripts\Metadata\GUI_subsToDbase.pl) you would just need enter the tagged value. Needed for the script: Navigate to Text export of Metadata spreadsheet.

DB_SubjectReplace.pl (S:\Digital Projects\Administrative\scripts\Metadata\DB_SubjectReplace.pl) will do a find and replace with untagged subjects to tagged subjects using the database. If there are subjects that are in the spreadsheet but not in the database it will make a list of those subjects. Needed for the script: Navigate to Text export of Metadata spreadsheet.

spreadsheetCheck.pl (S:\Digital Projects\Administrative\scripts\Metadata\spreadsheetCheck.pl) Use to verify that your Metadata file is error free before creating MODS. Needed for the script: Navigate to Text export of Metadata spreadsheet. Which at the moment you can only do on the Digital Projects side. Error message will be located in (S:\Digital Projects\Administrative\output)


Subject Remediation (Database and Acumen)

  1. use /srv/scripts/metadata/subjectStuff/getSubjects to generate a list of alphabetized subjects in Acumen (writes alphaSubjects to output directory there).
  2. copy that to the share drive and open in Excel, then do these things:
    1. label the first column "SearchOn" (containing the output values), the 2nd column "Corrected", the third "Tagged Value", the 4th "Authority", the 5th "Collection", the 6th "Item Number"
    2. delete any that do not need correcting.
    3. separate out names for name authority remediation work.
    4. enter corrected value in Corrected, complete tagged value in Tagged Value, and the authority in Authority (use "local" if no authority assigned).
    5. if the entry should be deleted, put DELETE in the Corrected column
    6. if this is known to be in only one collection, add the collection number in the appropriate column.
    7. if this only needs to be remediated in a handful of known items (or less), add those IDs in the Item Number column (semicolon separated)
    8. check results for errors, failures to tag, etc.
    9. replace & with &
    10. export using ExcelConverter (in Share drive Digital Projects/Administrative/scripts/ExcelConverter) to unicode.
  3. copy export to /srv/scripts/metadata/regExStuff/subjectChanges/
  4. run fixSubjects
  5. check results
  6. return to spreadsheet and modify it in the following fashion:
    1. Check the database for any entries that contain DELETE, and delete by hand (InfoTrack:subjects)
    2. Remove any entries in the spreadsheet that contain DELETE
    3. Replace everything in the first column with ADD
    4. Replace the last 2 columns (Collection, Item Number) with Alternate Authority (one column)
    5. Split any multiple authorities into Authority and Alternate Authority
    6. export using ExcelConverter (in Share drive Digital Projects/Administrative/scripts/ExcelConverter) to unicode.
  7. copy export to S:\Metadata\Authorities\ToDatabase and name it databaseChanges.txt
  8. run GUI_subsToDbase_corrections.pl
  9. check the errors file (toSubsDbaseErrors.txt) created there
  10. make any necessary modifications to the database by hand (InfoTrack:subjects)
Personal tools