DOTHIS Instruction sheet for University of Alabama Model for Mass Digitization

###################################
# Software Installation and setup #
###################################

1) Read the ASSUMPTIONS.txt file and ensure that the assumptions will be met. Note that if you are generating HTML from each EAD, those steps are outside this list of instructions.

2) Install the software listed in the dependencies list in ASSUMPTIONS.txt.

3) Ensure the software is available to the user running the scripts. To test, type each of the following on the command line (without quotes):

    "which convert" (if a path is output, ImageMagick is accessible for creating JPEGs)
    "which tesseract" (if a path is output, tesseract-ocr is accessible for creating OCR)
    "which perl" (if a path is output, perl is accessible for running the scripts)

If any of these output "which: no [something] in ( ...)" then the value in the parentheses is the current PATH for the user, and the indicated software package is not in any of those directories. Either the software needs to be installed, or the user's PATH needs to be modified to include the directory where the installed software is located. Normal locations in Linux are /usr/bin/ or /usr/local/bin/. To modify your PATH, see http://www.linuxheadquarters.com/howto/basic/classpath.shtml.

4) Ensure that the necessary Perl modules are installed. To test, type each of the following lines on the command line, pressing the Enter key after each line:

    perl -e 'use File::Copy; print "ok\n"'
    perl -e 'use Shell::Command; print "ok\n"'
    perl -e 'use Time::Local; print "ok\n"'
    perl -e 'use XML::Twig; print "ok\n"'

If the output for each of the four lines is "ok" then the four modules indicated are already installed. If not, see http://www.cpan.org/misc/cpan-faq.html#How_install_Perl_modules for installation instructions.

5) Note the path output when (in step 3) you typed in "which perl" on the command line.
If the path output is not /usr/bin/perl, open the changePerlPath.pl file in the scripts directory with a text editor. Change the top line that begins with #! by replacing the /usr/bin/perl path with whatever path is output when you type in "which perl". Do not remove the #! or add any spaces, and do not change anything else. Close the file and type in "chmod 555 changePerlPath.pl" (without the quotes). Then type in "perl changePerlPath.pl" (without the quotes) and hit Enter. This will modify the path at the top of each file in the scripts directory to reflect the correct path, so the other scripts will run.

6) Ensure all the scripts in the scripts directory have executable permissions: from within the scripts directory, type in (without quotes): "chmod 555 *".

#############################
### For each collection  ####
#############################

7) Ensure the user running the scripts has write permissions to the content directories, the output directory, and the web directories where content will be made available.

8) Determine what your collection identifier(s) will be, based on the information in the ASSUMPTIONS.txt file.

9) Copy the config.txt file in the config directory and rename the copy to reflect your collection identifier(s). For example, if your collection identifier is u3_abs42 then the config file for that collection will be named u3_abs42_config.txt, and must also be located in the config directory.

10) In your collection config file, fill in the values after the equals (=) signs.

11) Name your TIFF files according to the convention expected by the ASSUMPTIONS and indicated by your config file specifications.

12) Name your EAD for the collection identifier: for example, if your collection identifier is u3_abs42 then the EAD will be named u3_abs42.ead.xml. Place this EAD in the EADS directory.

13) Run the testFiles.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl testFiles.pl".
It will ask for the collection identifier and the path to the TIFF files. It will then test the TIFF filenames for compliance with the config file, and output any errors to the output directory. It will also test the EAD file name and box/folder values. Check the output file (the exact filename will be output on the command line) and repair any errors indicated (in the config file, the EAD, or the file names). Repeat until there are no errors.

14) Run the makeJpegs.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl makeJpegs.pl". Check the output directory record for any errors, and look in the content directory. This script will create a subdirectory there for this collection identifier and, within it, subdirectories for JPEGS and THUMBS; that is where it will output the images. Check the results, and if you are not pleased with them, make alterations as needed (for example, the size of the images is determined by config file parameters) and repeat. Delete all files in these directories prior to rerunning the script if file names are changed in any way!

15) If delivery is to be via static HTML pages as opposed to Acumen or some other delivery system, run the makeHTML.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl makeHTML.pl". This will generate item-level HTML files, as well as folder-level HTML files if specified in the config file. JavaScript and CSS templates will also be modified and placed in the content area HTML directory. Any errors will be found in the output directory. If there are problems, repair the config file and run the script again.

16) Run the linkContent.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl linkContent.pl". This will first back up a copy of your EAD into the backups directory (within EADS). Then it will check your config file to determine whether to link items or folders into the EAD.
If items, it will use the JPEG filenames and the config file to add links to your EAD in the correct locations. Check the output file for errors, and also check your EAD in the EADS directory. If there are any problems, make the necessary adjustments (to the file names, the EAD, or the config file) and rerun until there are no errors.

17) If you want MODS files generated for these items, either for delivery or preservation, run the makeMODS.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl makeMODS.pl". This will use the config file and the large image filenames to create a MODS record for each item. It will create a MODS directory within the collection area in the content directory and place the results there. Check these files, and the output file in the output directory, for any errors. If there are problems, make adjustments to the config file and repeat. THIS STEP IS NECESSARY for Acumen delivery.

18) If an HTML version of the EAD is to be placed in the collection web directory (for static HTML delivery), use other resources to convert the linked EAD to HTML, such as those listed here: http://www.archivists.org/saagroups/ead/tools.html
Ensure that the HTML version includes clickable links to the content. If you want it placed in the collection web directory, name the HTML file for the collection ID followed by ".html" (such as testColl.html) and place it into the HTML directory. It will be renamed index.html when it is moved to the web directory, so users accessing that directory will automatically see it.

19) To move the JPEGS (and the MODS or HTML) into the web area, run the makeItLive.pl script in the scripts directory: within the scripts directory, type in (without quotes): "perl makeItLive.pl". Collection HTML (if any) will be placed at the collection level and renamed index.html to ensure seamless user access.
Folder-level HTML (if any) will be located at the collection level (in a Folders subdirectory), named for the box and folder numbers. Item-level HTML will be named index.html and placed in item-level subdirectories. JavaScript and CSS files created by makeHTML.pl will be moved to includes directories. Each item will have its own subdirectory in the collection directory, and so will each page. This allows support for metadata at any level. Item-level MODS (if available) will be placed in a Metadata directory at the item level. Check the output file in the output directory for errors, and check the web directory. If there are problems, delete the web directories created, repair the config file, and run the script again.

20) If using another delivery method for your EAD besides Acumen, upload it. Note that most browsers will transform an XML file based on the rules in the XSL file it is linked to. Thus, if you modify an XSL file to your liking, name it ead.xsl, place it in the same directory as the EAD, and link it at the top like this: (just below the "