For Quality Control
From UA Libraries Digital Services Planning and Documentation
Quality Control checks happen in multiple parts of the work flow pipeline.
On Windows Share Drive, During Digitization
Image:Filenames.txt is a Windows Perl script for locating badly formed and mislocated file names. This version of the script is also known as 'filenamesAndTranscripts' because it also checks transcript directories, if they exist.
Image:Numfiles.txt is a Windows Perl script that looks through the scans directories in a selected collection directory, counts all files with the chosen extension, and outputs a tab-delimited text file listing each item number followed by its number of files.
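The counting step described above can be sketched as follows. This is a minimal Python illustration, not the Numfiles Perl script itself. It assumes (hypothetically) that each subdirectory of the collection directory is named for one item number; the function names `count_files_by_item` and `write_report` are invented for this example.

```python
import os


def count_files_by_item(collection_dir, extension):
    """Count files ending in `extension` under each item directory.

    Assumption: every immediate subdirectory of `collection_dir`
    is named for one item number (hypothetical layout).
    """
    counts = {}
    ext = extension.lower()
    for entry in sorted(os.listdir(collection_dir)):
        item_path = os.path.join(collection_dir, entry)
        if not os.path.isdir(item_path):
            continue  # skip stray files at the collection level
        total = 0
        for _root, _dirs, files in os.walk(item_path):
            total += sum(1 for name in files if name.lower().endswith(ext))
        counts[entry] = total
    return counts


def write_report(counts, out_path):
    """Write one line per item: item number, a tab, then the file count."""
    with open(out_path, "w", encoding="utf-8") as out:
        for item, number in sorted(counts.items()):
            out.write(f"{item}\t{number}\n")
```

Comparing the extension case-insensitively is a design choice made here so that `.tif` and `.TIF` are counted together; the original script may behave differently.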
Image:BoxFolderCheck.txt is a Windows or Macintosh Perl script for digital content in which the item number section of the identifier includes the box and folder number where the item is to be linked into the EAD finding aid, and in which the digital files are stored in directories named for the box, with subdirectories named for the folder. It verifies that files are named appropriately and are located in directories that reflect their file names.
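A check of this kind might look like the sketch below. It is a Python illustration under stated assumptions, not the BoxFolderCheck script. The file name pattern (a three-digit box followed by a three-digit folder inside the item number) and the directory names (ending in the box or folder number) are hypothetical; the real conventions are defined by the collection's identifiers.

```python
import re
from pathlib import PurePath

# Hypothetical convention: prefix_collection_BBBFFF_sequence.ext,
# e.g. "coll_0001_012003_0001.tif" for box 12, folder 3.
NAME_PATTERN = re.compile(
    r"^[A-Za-z0-9]+_\d+_(?P<box>\d{3})(?P<folder>\d{3})_\d+\.\w+$"
)


def trailing_number(dir_name):
    """Return the number at the end of a directory name, or None."""
    match = re.search(r"(\d+)$", dir_name)
    return int(match.group(1)) if match else None


def check_location(path):
    """Return None if the file is well named and well placed, else a message."""
    p = PurePath(path)
    match = NAME_PATTERN.match(p.name)
    if match is None:
        return "badly formed file name"
    if len(p.parts) < 3:
        return "not inside box/folder directories"
    box_dir, folder_dir = p.parts[-3], p.parts[-2]
    if trailing_number(box_dir) != int(match.group("box")):
        return "box directory does not match file name"
    if trailing_number(folder_dir) != int(match.group("folder")):
        return "folder directory does not match file name"
    return None
```

For example, `check_location("Box_012/Folder_004/coll_0001_012003_0001.tif")` reports a folder mismatch, because the file name says folder 3 while the directory says folder 4.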
On Linux Server, for Web Delivery
Once content is uploaded to the Linux server for archival storage, one of the scripts we run is Image:TestDeposits.txt, which verifies that all the archival filenames are correct and in the right directory and that no sequences are missing. For the Cabaniss content, the equivalent script is Image:TestNums.txt.
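The "no sequences are missing" part of that check can be illustrated in a few lines of Python. This is a sketch, not the TestDeposits script. It assumes (hypothetically) that the sequence number is the last underscore-separated numeric field before the extension and that sequences start at 1.

```python
import re
from collections import defaultdict

# Hypothetical convention: <item>_<sequence>.<ext>
SEQ_PATTERN = re.compile(r"^(?P<item>.+)_(?P<seq>\d+)\.\w+$")


def find_missing_sequences(filenames):
    """Map each item to the sorted sequence numbers absent from 1..max."""
    seen = defaultdict(set)
    for name in filenames:
        match = SEQ_PATTERN.match(name)
        if match:
            seen[match.group("item")].add(int(match.group("seq")))
    missing = {}
    for item, seqs in seen.items():
        gaps = sorted(set(range(1, max(seqs) + 1)) - seqs)
        if gaps:
            missing[item] = gaps
    return missing
```

Note that a gap at the end of a sequence cannot be detected this way, since the highest number present is taken as the expected total.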
To test content online in Acumen, and locate items that have no derivatives or no MODS record, use this Linux Perl script: Image:FindMissing.txt
For Cabaniss, this script checks the content in the web directory (in Acumen) against what is in the storage archive, to make sure nothing is missing: Image:FindMissingFile.txt
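Comparing the two locations amounts to a set difference. The Python sketch below is an illustration only, not the FindMissingFile script. It assumes (hypothetically) that a web derivative shares its base name with the archival file and differs only in extension; the function names are invented for this example.

```python
import os


def stems(directory):
    """Collect file names without extensions from a directory tree."""
    found = set()
    for _root, _dirs, files in os.walk(directory):
        for name in files:
            found.add(os.path.splitext(name)[0])
    return found


def missing_from_web(archive_dir, web_dir):
    """List archived base names that have no counterpart in the web tree."""
    return sorted(stems(archive_dir) - stems(web_dir))
```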
This script creates OCR files for the items listed in the *ocrList.txt files in the /srv/deposits/ocrMe directory and places those OCR files in the correct web location: Image:OcrSelected.txt
On Linux Server, the Storage Archive
To check the MD5 checksums of stored content prior to each full-tape backup, use Image:Dirs.txt (as described in Watching Our Backs).
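For readers unfamiliar with checksum verification, the idea can be sketched in Python with the standard `hashlib` module. This is not the Dirs script. The manifest format (a mapping of relative path to expected MD5 hex digest) and the function names are assumptions made for this example.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, base_dir):
    """Compare stored checksums with files on disk.

    `manifest` maps relative paths to expected MD5 hex digests
    (hypothetical format). Returns a list of (path, problem) pairs.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        full_path = os.path.join(base_dir, rel_path)
        if not os.path.isfile(full_path):
            problems.append((rel_path, "missing"))
        elif md5_of_file(full_path) != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems
```

Reading in chunks keeps memory use flat for large archival files such as uncompressed TIFFs.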
