Scripts for spreadsheet work

From UA Libraries Digital Services Planning and Documentation
(Difference between revisions)
Jump to: navigation, search
(Subject Remediation (Database and Acumen))
(Subject Remediation (Database and Acumen))
Line 28: Line 28:
 
# run fixSubjects; this will find and correct the MODS in Acumen that match, and place the modified MODS in a local directory for review
 
# run fixSubjects; this will find and correct the MODS in Acumen that match, and place the modified MODS in a local directory for review
 
# check results
 
# check results
# copy MODS to /home/ds/UploadArea/MODS
+
# copy MODS to /home/ds/Metadata/MODS/TodaysDate/
 
# run `chown ds:www *` in that directory to correct permissions on the files
 
# run `chown ds:www *` in that directory to correct permissions on the files
# run relocate_all to set them live in Acumen
+
# run relocate.pl to set them live in Acumen
 
# return to spreadsheet and modify it in the following fashion:
 
# return to spreadsheet and modify it in the following fashion:
 
## Remove any entries in the spreadsheet that contain DELETE
 
## Remove any entries in the spreadsheet that contain DELETE

Revision as of 08:22, 13 April 2017

New workflow: After you have entered data in your spreadsheet including names and subjects. You will want to tag your subjects.

Remediation workflow: For spreadsheets that are in the metadata queue, you will want to review the spreadsheet and make edits. All names should be searched in the names database and replaced with database correspondence entry including the database number. All subjects should be searched in the subject database and replaced with subject database entry including its tags.

DB_SubjectTagging.pl (S:\Digital Projects\Administrative\scripts\Metadata\DB_SubjectTagging.pl) will read the text file version of the spreadsheet and make a list of Subjects that are in the spreadsheet and not in the subjects database. The list will be generated in the correct format needed to upload several at once with GUI_subsToDbase.pl (S:\Digital Projects\Administrative\scripts\Metadata\GUI_subsToDbase.pl) you would just need enter the tagged value. Needed for the script: Navigate to Text export of Metadata spreadsheet.

DB_SubjectReplace.pl (S:\Digital Projects\Administrative\scripts\Metadata\DB_SubjectReplace.pl) will do a find and replace with untagged subjects to tagged subjects using the database. If there are subjects that are in the spreadsheet but not in the database it will make a list of those subjects. Needed for the script: Navigate to Text export of Metadata spreadsheet.

spreadsheetCheck.pl (S:\Digital Projects\Administrative\scripts\Metadata\spreadsheetCheck.pl) Use to verify that your Metadata file is error free before creating MODS. Needed for the script: Navigate to Text export of Metadata spreadsheet. Which at the moment you can only do on the Digital Projects side. Error message will be located in (S:\Digital Projects\Administrative\output)


Subject Remediation (Database and Acumen)

  1. use /srv/scripts/metadata/subjectStuff/getSubjects to generate a list of alphabetized subjects in Acumen (writes alphaSubjects to output directory there).
  2. copy that to the share drive and open in Excel, then do these things:
    1. label the first column "SearchOn" (containing the output values), the 2nd column "Corrected", the third "Tagged Value", the 4th "Authority", the 5th "Collection", the 6th "Item Number"
    2. delete any that do not need correcting.
    3. separate out names for name authority remediation work.
    4. enter corrected value in Corrected, complete tagged value in Tagged Value, and the authority in Authority (use "local" if no authority assigned).
    5. if the entry should be deleted, put DELETE in the Corrected column
    6. if this is known to be in only one collection, add the collection number in the appropriate column.
    7. if this only needs to be remediated in a handful of known items (or less), add those IDs in the Item Number column (semicolon separated)
    8. check results for errors, failures to tag, etc.
    9. replace "&" with &
    10. export using ExcelConverter (in Share drive Digital Projects/Administrative/scripts/ExcelConverter) to unicode.
  3. copy export to /srv/scripts/metadata/regExStuff/subjectChanges/
  4. run fixSubjects; this will find and correct the MODS in Acumen that match, and place the modified MODS in a local directory for review
  5. check results
  6. copy MODS to /home/ds/Metadata/MODS/TodaysDate/
  7. run `chown ds:www *` in that directory to correct permissions on the files
  8. run relocate.pl to set them live in Acumen
  9. return to spreadsheet and modify it in the following fashion:
    1. Remove any entries in the spreadsheet that contain DELETE
    2. Replace everything in the first column with ADD
    3. Replace the last 2 columns (Collection, Item Number) with Alternate Authority (one column)
    4. Split any multiple authorities into Authority and Alternate Authority
    5. export using ExcelConverter (in Share drive Digital Projects/Administrative/scripts/ExcelConverter) to unicode.
  10. copy export to S:\Metadata\Authorities\ToDatabase and name it databaseChanges.txt
  11. run GUI_subsToDbase_corrections.pl -- this will add anything new from column 2, and correct any existing entries that match this value
  12. check the errors file (toSubsDbaseErrors.txt) created there
  13. make any necessary modifications to the database by hand (InfoTrack:subjects)
  14. return to original spreadsheet and modify it in the following fashion:
    1. Use conditional formatting to color the fields that are duplicate in columns 1 and 2
    2. Sort entire spreadsheet on first column containing that color
    3. delete those rows where column 1 and 2 are the same (we've already updated those in the last step).
    4. Use conditional formatting to color the fields that contain & in column 1 (this is an error of confusion; they have to be encoded like that in XML, but in the database, those are "&" already)
    5. Sort entire spreadsheet on first column containing that color
    6. Delete those rows (they don't need correcting)
    7. Now delete all but the first column; only the first column needs to be deleted from the database (if there); (corrections/additions were made in the last step)
    8. export using ExcelConverter (in Share drive Digital Projects/Administrative/scripts/ExcelConverter) to unicode.
  15. copy export to S:\Metadata\Authorities\ToDatabase and name it deleteThese.txt
  16. run GUI_subsToDbase_deletions.pl to remove the offending entries from the database (if they exist) -- this ONLY does an exact match on the display value.
  17. check results, and make any modifications by hand.
Personal tools