Moving Content To Long-Term Storage

From UA Libraries Digital Services Planning and Documentation

(Difference between revisions)
Current revision (21:15, 18 December 2012)
 
(4 intermediate revisions not shown.)
Line 1: Line 1:
Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.
Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.
-
We are storing content long-term on a Linux server.
+
We are storing content long-term on a SUSE Linux server.
-
We are delivering content on another Linux server via CONTENTdm, and Tonio Loewald of OLT has developed an alternative (Linux-based) delivery solution, Acumen [http://acumen.lib.ua.edu http://acumen.lib.ua.edu], which runs off a web-enabled version of our storage area.
+
We are delivering content on the same Linux server via Acumen [http://acumen.lib.ua.edu http://acumen.lib.ua.edu], which runs off a web-enabled version of our storage area.
-
This is the current set of procedures for moving content from Share to long-term storage, and linking it into manifests for LOCKSS.
+
[[If moving from share]]
-
This procedure should happen on a regular basis:
+
-
 
+
[[If already in deposits]] (uploaded by staff)
-
== TO MOVE CONTENT TO STORAGE ==
+
-
 
+
-
 
+
-
'''ACCESS and IDENTIFICATION'''
+
-
-------------------------
+
-
1) ssh to libcontent1.lib.ua.edu.
+
-
 
+
-
2) To get root access, run: sudo su (if you do not have sudo access, you cannot do this process)
+
-
 
+
-
3) To get access to the share drive from the libcontent1 Linux server, run:
+
-
mount -t cifs -o username=jlderidder,domain=lib //libfs1.lib.ua-net.ua.edu/share/Digital\ Projects/ /cifs-mount
+
-
and type in your password to the share drive. (substitute your own username there in the line above)
+
-
 
+
-
4) To determine the status of content in the Digital Services folder on share,
+
-
run: /srv/scripts/qc/status
+
-
This script goes through the Digital_Coll_Complete and Digital_Coll_in_progress folders, and pulls out
+
-
paths to folders that end in _ready (for Mary), _store and _online (for pickup for storage) and also
+
-
_check, for digital services quality control. The lists of these paths are written to files in
+
-
/srv/scripts/qc/lists:
+
-
forMary are things waiting for her;
+
-
checkme are for the DS folks
+
-
storeMe are things ready to go online and be stored
+
-
 
+
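The kind of search described in step 4 can be sketched in a few lines of shell. This is not the status script itself; the function name list_by_suffix is hypothetical, and the example path assumes the /cifs-mount mount point from step 3.

```shell
#!/bin/sh
# Hypothetical helper (not /srv/scripts/qc/status): list every directory
# under a root folder whose name ends in a given status suffix, such as
# _ready, _store, _online or _check.
list_by_suffix() {
    root=$1
    suffix=$2
    # -type d restricts matches to directories; sort gives stable output
    find "$root" -type d -name "*${suffix}" | sort
}

# Example call (path is an assumption based on step 3):
#   list_by_suffix /cifs-mount/Digital_Coll_Complete _store
```

Running it once per suffix and redirecting each result to its own file would give lists comparable to forMary, checkme and storeMe.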
-
5) Share this output with Digital Services folks, to verify it is correct.
+
-
 
+
-
 
+
-
'''MOVE TO LINUX SERVER'''
+
-
----------------------------------------
+
-
6) Once verified, look under /cifs-mount for content in share directory.
+
-
Copy it over to /srv/deposits/content/.
+
-
for example:
+
-
cp -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/.
+
-
This does a recursive copy of the entire specified directory and all its contents, and places the copy
+
-
in the /srv/deposits/content directory on the libcontent1 server.
+
-
 
+
-
7) Do a diff between /srv/deposits/content and the location of the content on the share drive
+
-
to see if we got the content ok; for example:
+
-
diff -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939 /srv/deposits/content/u0003_0002121_Aston_1939
+
-
look at the output: if none, it's a match. If there's a variation, recopy the specified files,
+
-
and then run the diff again. If it's a big directory, prepend the command with "nohup " so the
+
-
process will continue to run if you log off or lose your connection. The output then will be
+
-
in the nohup.out file in the directory where you ran the "diff" command.
+
-
 
+
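Steps 6 and 7 can be combined into one small function, sketched here under the assumption that the destination parent directory already exists. The name copy_and_verify is hypothetical and is not one of the scripts in /srv/scripts.

```shell
#!/bin/sh
# Hypothetical helper: copy a directory tree, then compare the copy
# against the source. Returns 0 only when diff finds no differences.
copy_and_verify() {
    src=$1            # e.g. /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
    dest_parent=$2    # e.g. /srv/deposits/content
    name=$(basename "$src")
    cp -r "$src" "$dest_parent/" || return 1
    # -r recurses into subdirectories; -q prints only which files differ
    diff -rq "$src" "$dest_parent/$name"
}
```

No output and a zero exit status correspond to the "it's a match" case in step 7; any line of output names a file that should be recopied.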
-
8) Once you're sure you have a clean copy (no output from "diff"), delete the specified directory from
+
-
the share drive. For example: rm -r /cifs-mount/Digital_Coll_Complete/u0003_0002121_Aston_1939
+
-
This is a powerful command, a recursive removal of a directory and everything in it,
+
-
including subdirectories. Be careful with it.
+
-
 
+
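Because the removal in step 8 cannot be undone, one option is to tie it to the diff result so that nothing is deleted unless the copy is clean. The following is only a sketch; remove_if_identical is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical guard: delete the original directory only when a recursive
# diff against the copy reports no differences.
remove_if_identical() {
    original=$1    # e.g. the directory under /cifs-mount
    copy=$2        # e.g. the directory under /srv/deposits/content
    if [ -d "$copy" ] && diff -rq "$original" "$copy" >/dev/null 2>&1; then
        rm -r "$original"
    else
        echo "NOT removed: $original does not match $copy" >&2
        return 1
    fi
}
```

If the copy is missing or differs in any file, the original is left in place and the function returns a non-zero status.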
-
 
+
-
 
+
-
'''PREPARE CONTENT'''
+
-
------------------------
+
-
 
+
-
9) look through the directories in /srv/deposits/content for anomalies, badly named things,
+
-
and new categories -- any which are not listed on http://libcontent1.lib.ua.edu/lockss/Manifest.html
+
-
( this manifest is in /srv/www/htdocs/lockss/). Anomalies might be directories which do NOT match
+
-
the expected structure (Admin, Metadata, Scans, Transcripts) or subfolder structure beneath them;
+
-
spaces in filenames or directories, etc. Report any problems to Digital Services staff and arrange for corrections before
+
-
proceeding further.
+
-
 
+
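Part of the visual check in step 9 can be assisted with find. The sketch below assumes that each collection directory sits directly under the root given, with Admin, Metadata, Scans and Transcripts directly beneath it; find_anomalies is a hypothetical name, and the -mindepth and -maxdepth options assume GNU find. It does not replace testIncoming or a look at the Manifest.

```shell
#!/bin/sh
# Hypothetical helper: print likely anomalies under a deposits root.
find_anomalies() {
    root=$1    # e.g. /srv/deposits/content
    # Any file or directory whose name contains a space
    find "$root" -name '* *'
    # Folders directly inside a collection directory that are not one of
    # the four expected names
    find "$root" -mindepth 2 -maxdepth 2 -type d \
        ! -name Admin ! -name Metadata ! -name Scans ! -name Transcripts
}
```

Empty output means only that these two checks passed; badly named items and new categories still need a human eye.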
-
10) move any item-level text files out of scans to S:\Digital Projects\Administrative\collectionInfo\forMDlib\itemMD\
+
-
and notify Mary.
+
-
+
-
11) RUN testIncoming (in /srv/scripts/qc; script available here: [[Image:TestIncoming.txt]]) and look at the output for problems.
+
-
 
+
-
 
+
-
 
+
-
'''UPDATE DATABASE WITH COLLECTION INFO'''
+
-
------------------------------------
+
-
12) If the collection has an icon: make sure it is in the Admin folder, named for the collection.
+
-
For example, u0003_0002121.icon.png for the collection above. (You will need to copy this
+
-
to the CONTENTdm server (content.lib.ua.edu) and place it in the /usr/local/Content4/docs/cdm4/images
+
-
directory, with 664 access rights "chmod 664 xyz.png"). List this icon in /srv/scripts/collstuff/icons,
+
-
so the following script will know not to apply a default icon link.
+
-
 
+
-
13) Then go to the collstuff directory /srv/scripts/collstuff and run: collToDbase
+
-
to put the collection info in the database. This looks through the Admin folders in the top level
+
-
directories placed in /srv/deposits, for collection-numbered XML files. It parses through the content,
+
-
repairing encoding errors, puts the content into mysql InfoTrack.allColls, adds the expected canned link,
+
-
and asks you for each collection and each delivery system whether or not the collection is online.
+
-
The software will give you the expected URL to test; if it fails, and you know the content is in the system,
+
-
locate the correct URL and provide it to the software, which will ask for a correction.
+
-
 
+
-
14) Check the database entries... mysql, InfoTrack.allColls. For example:
+
-
mysql -u jlderidder -p
+
-
(enter password)
+
-
use InfoTrack;
+
-
select * from digColls where id_2009 like "u0003_0002121" and AnalogOrDigital like "D";
+
-
(when done, enter "quit;")
+
-
If not okay, make repairs from command line.
+
-
 
+
-
15) Verify that links and display are correct. They will show up online here: [http://www.lib.ua.edu/digital/browse http://www.lib.ua.edu/digital/browse].
+
-
 
+
-
 
+
-
 
+
-
 
+
-
'''MOVE CONTENT TO ARCHIVE'''
+
-
----------------------------------
+
-
+
-
16)
+
-
a) open /srv/scripts/storing/relocating (script available here: [[Image:Relocating.txt]]) and set $test = 1.
+
-
b) run "relocating > look". It will:
+
-
i) list the files it's going to move, from where and to where in "moveme"
+
-
ii) write the new manifests into RelocateManifests so you can see what they're going to look like
+
-
iii) write the parent manifests to parentMans, so you can see what they're going to look like
+
-
iv) write errors to "look" file, as well as other info
+
-
c) Look at the output in the "look" file, and the manifests concatenated in RelocateManifests and parentMans.
+
-
 
+
-
17) Make whatever repairs are necessary for script to work. If there are any new categories (z0004?),
+
-
then add them, and create top level manifests in those category directories (copy from like directories
+
-
and modify to fit).
+
-
 
+
-
18) Open /srv/scripts/storing/relocating and edit out the line $test = 1;
+
-
 
+
-
19) Empty the file /srv/scripts/storing/RelocateManifests, and copy moveme to another filename.
+
-
 
+
-
20) WARNING! Before running /srv/scripts/storing/relocating, do a "chmod -R 755 /srv/archive". This will enable you to
+
-
write to the directories as root.
+
-
 
+
-
21) run "relocating > look" again, this time for real. It will take hours. Occasionally check the output in "look" and the
+
-
archive directory for errors.
+
-
 
+
-
 
+
-
 
+
-
 
+
-
'''SUPPLEMENTAL WORK, CLEAN-UP, AND QUALITY CONTROL'''
+
-
-------------------------------------------------
+
-
 
+
-
22) If the output tells you to modify or create manifests, do that next (if not done already)
+
-
in top level of /srv/www/htdocs/lockss directory
+
-
The script does NOT create the top 2 levels of manifests, as it does not have sufficient
+
-
information about new categories to fill this in.
+
-
 
+
-
23) check links at http://libcontent1.lib.ua.edu/lockss/Manifest.html -- drill down.
+
-
If you cannot access it, modify the .htaccess file in /srv/www/htdocs/lockss to allow your IP,
+
-
and restart the apache web server: '/usr/sbin/apache2ctl restart'
+
-
(be sure to change it back, and then restart the web server)
+
-
 
+
-
24) then run 'checkem > look' in /srv/scripts/storing (script available here: [[Image:Checkem.txt]])
+
-
-- it will use the "moveme" file to
+
-
verify md5 sums of content that was moved,
+
-
and delete from the deposits directory if there's a match. If there's NOT a match, it will output
+
-
"ERROR: " and the error. So look through the "look file" for problems.
+
-
Again, if there's a lot of content, precede this command with "nohup " and then look later in the
+
-
nohup.out file for the content that normally would go into "look".
+
-
+
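The check that step 24 describes (compare md5 sums, delete the deposit copy on a match, print "ERROR: " otherwise) can be illustrated for a single pair of files. This is a sketch and not the checkem script: the function names are hypothetical, the format of the "moveme" file is not assumed, and md5sum is assumed to be installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when both files exist and their
# md5 sums are equal.
same_md5() {
    [ -f "$1" ] && [ -f "$2" ] || return 2
    a=$(md5sum "$1" | cut -d' ' -f1)
    b=$(md5sum "$2" | cut -d' ' -f1)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Hypothetical helper: remove the deposit copy only after the archived
# copy has been confirmed to match it.
verify_then_delete() {
    deposit=$1     # file still in the deposits directory
    archived=$2    # the same file after the move to the archive
    if same_md5 "$deposit" "$archived"; then
        rm "$deposit"
    else
        echo "ERROR: checksum mismatch or missing file: $deposit"
        return 1
    fi
}
```

A missing file counts as a failure, so an interrupted transfer never leads to deletion of the only good copy.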
-
25) run checkArchive (script available here: [[Image:CheckArchive.txt]]) and look at ArchiveERRORS; make repairs.
+
-
If needed, alter and run doubleCheck to make sure things were copied over; checkMans to check that what's linked in
+
-
the Manifest actually copied; addToMans to add content that was moved but not written to the Manifest. Problems occur
+
-
especially when the server goes down during a transfer. Especially be aware that dozens of tiffs may have been copying
+
-
when it went down, meaning multiple images are corrupted or damaged.
+
-
+
-
26) do a "chmod -R 555 /srv/archive" to protect the archive from unauthorized alteration until next time.
+
-
 
+
-
27) then go look in the /srv/deposits directory, make sure folders are clean, and delete them.
+
-
I do this: ls /srv/deposits/content/*
+
-
ls /srv/deposits/content/*/*
+
-
ls /srv/deposits/content/*/*/*
+
-
ls /srv/deposits/content/*/*/*/*
+
-
if you find content, there's a problem. Solve it!! It may be necessary to rename things and go back
+
-
through steps 13 onward again.
+
-
if no content: rm -r /srv/deposits/content/*
+
-
(this will recursively delete all content in the directory)
+
-
 
+
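As an alternative to the four ls commands in step 27, find can report leftover files at any depth. This is a minimal sketch; leftover_files and deposits_are_empty are hypothetical names, and "content" is taken to mean regular files (empty folders are ignored).

```shell
#!/bin/sh
# Hypothetical helper: list every regular file remaining under a directory.
leftover_files() {
    find "$1" -type f
}

# Hypothetical helper: succeed only when no regular files remain.
deposits_are_empty() {
    [ -z "$(leftover_files "$1")" ]
}

# Example (path from step 27):
#   deposits_are_empty /srv/deposits/content || leftover_files /srv/deposits/content
```

Any path printed is content that was not moved and needs investigating before the rm -r in step 27.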
-
28) go to /srv/scripts/surfacing and run copychange (script available here: [[Image:Copychange.txt]])
+
-
to create derivatives from tiffs and wave files, and to copy transcripts over to the web directory for access.
+
-
 
+
-
29) then run /srv/scripts/surfacing/repair to remediate the doubles created from tiffs which include both thumbs and large images.
+
-
 
+
-
30) cd /home/Jeremiah and locate the MODS for what was just moved, in the MODS directory. Run "relocate.pl" to send them live.
+
-
 
+
-
31) run /srv/scripts/metadata/findMissing to find out what doesn't have metadata, what doesn't have derivatives -- troubleshoot.
+
-
 
+
-
32) Either index these directories immediately from the Acumen admin interface
+
-
([http://acumen.lib.ua.edu/admin http://acumen.lib.ua.edu/admin]) or wait a little while for the cron indexer to pick
+
-
up the content, then check the output: [http://acumen.lib.ua.edu/ http://acumen.lib.ua.edu/]. Content should appear
+
-
already in the web directories located here:
+
-
[http://libcontent1.lib.ua.edu/content/ http://libcontent1.lib.ua.edu/content/]. See [[Find our content online]] for
+
-
detailed instructions.
+
-
 
+
-
 
+
-
 
+
-
''The current, working version of these instructions is located here on libcontent1: /srv/scripts/metadata/storing/DOTHIS.''
+
-
 
+
-
 
+
-
 
+
-
 
+
-
''updated 22 February 2010'' [[User:Jlderidder|Jlderidder]]
+

Current revision

Currently, we are developing collections on a shared Windows drive called "Share" in the Digital Projects folder.

We are storing content long-term on a SUSE Linux server.

We are delivering content on the same Linux server via Acumen (http://acumen.lib.ua.edu), which runs off a web-enabled version of our storage area.

If moving from share

If already in deposits (uploaded by staff)
