Monthly Count

From UA Libraries Digital Services Planning and Documentation
Revision as of 09:14, 6 August 2013 by Kgmatheny (talk | contribs) (reflecting server switch from libcontent1 to libcontent)

What is our monthly count?

We're currently counting TIFF and WAVE files in the archive, deposits, and on the share drive, and MP3 and large JPEGs in Acumen.

Share side preparation

In October 2012, we changed how we collect the count so that it captures a snapshot across multiple locations and notes duplicates that are in the process of being repaired. The benefits of the new method are that we no longer try to capture a three-dimensional picture of our work in one fell swoop in a single spreadsheet, and that we can automate the count, saving a great deal of time and confusion. We also now work from the actual existence of files rather than from reported digitization counts, which avoids many errors caused by inattention, poor counting, and forgetfulness.

We also developed a method of tracking digitization that never becomes publicly visible. This can include:

  • digitization for donors, such as providing digital versions of content donated to Special Collections
  • digitization for publication, at the dean's request, often for cost recovery
  • weeded collections (such as those containing duplicates), and collections deleted due to rights issues
  • digitization of typed transcripts in order to obtain OCR to enable searchability of hand-written documents (we delete the transcript TIFFs)
  • recapture of badly digitized content
  • correction of errors, such as duplicate captures of a single page

For this process, see Documenting Invisible Digitization.

Part I: What's where right now?

Starting in October 2012, we are capturing monthly snapshots (an example can be viewed here) of image and audio content, with an intellectual item count pulled from the filenames, across the web directories (Acumen), the deposits directories (awaiting archiving), the digital archive, and the working share drive, with duplicate items/files there noted in the "Under Repair" columns for clarification.

More on this process can be found in Server-side Count Scripts.
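As a rough illustration of the snapshot idea (not the actual server-side script), the sketch below walks one location, counts files by extension, and derives a distinct item count from the filenames. The rule in `item_id` (strip the last underscore-delimited segment) and the example filename are assumptions made for illustration; the real naming scheme is not described on this page. The sketch also does not distinguish large JPEGs from other JPEGs.

```python
import os
from collections import defaultdict

# File types named on this page: TIFF and WAVE in storage, MP3 and JPEG in Acumen.
COUNTED_EXTENSIONS = {".tif", ".tiff", ".wav", ".mp3", ".jpg", ".jpeg"}


def item_id(filename):
    """Hypothetical rule: the item identifier is the file stem minus its last
    underscore-delimited segment, e.g. 'coll_item1_0001.tif' -> 'coll_item1'."""
    stem = os.path.splitext(filename)[0]
    head, sep, _ = stem.rpartition("_")
    return head if sep else stem


def snapshot(root):
    """Walk one location; return (file count per extension, distinct item count)."""
    files_by_ext = defaultdict(int)
    items = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in COUNTED_EXTENSIONS:
                files_by_ext[ext] += 1
                items.add(item_id(name))
    return dict(files_by_ext), len(items)
```

Running `snapshot` once per location (Acumen, deposits, archive, share) would give one row of counts per location, which is the shape of the monthly snapshot described above.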

Part II: How does this differ from last month? And how many GB do we have?

Since the snapshot does not show progress since the previous month, we will also generate a second synopsis specifying GB counts, total digitization thus far, and what was accomplished this past month.

For example, for October 2012, the synopsis was as follows:

  • Digitized 125 new items (8719 captures)
  • Total items digitized: 89972
  • Total captures: 323204
  • Total items in Acumen: 78704
  • Total captures in Acumen: 294350
  • Total GB in archive: 13852
  • Total GB on share: 474
  • Total GB: 14326
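The October 2012 figures above can be cross-checked with simple arithmetic. The sketch below is only an illustration; the dictionary keys are hypothetical names, not fields used by the real script, and the "not in Acumen" differences assume that everything in Acumen is also included in the overall totals.

```python
# October 2012 synopsis values, copied from the list above.
october_2012 = {
    "total_items": 89972,
    "total_captures": 323204,
    "acumen_items": 78704,
    "acumen_captures": 294350,
    "archive_gb": 13852,
    "share_gb": 474,
}


def derive(synopsis):
    """Compute the derived figures from one month's synopsis values."""
    return {
        "total_gb": synopsis["archive_gb"] + synopsis["share_gb"],
        "items_not_in_acumen": synopsis["total_items"] - synopsis["acumen_items"],
        "captures_not_in_acumen": synopsis["total_captures"] - synopsis["acumen_captures"],
    }


print(derive(october_2012)["total_gb"])  # 14326, matching "Total GB" above
```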

The script that generates this synopsis reads the tab-delimited text files that track invisible digitization (see Documenting Invisible Digitization) and incorporates their counts into the Total items and Total captures sums, to reflect the work we've done that can't be seen. It also captures GB counts across the archive/deposits directory and the following share drive directories in the Digital Projects area:

  • Digital_Coll_Complete
  • Digital_Coll_in_progress
  • Digital_Coll_ON_HOLD
  • Digital_Coll_Proposed

If a REVIEW folder exists in Digital_Coll_in_progress, the size of that folder is subtracted, as those files are assumed to be copies of TIFF files already in the archive.
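The share-side size calculation described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the production script: it assumes the REVIEW folder sits directly inside Digital_Coll_in_progress, that the four directories share one parent (the hypothetical `digital_projects_root` argument), and that a gigabyte means 1024**3 bytes.

```python
import os

GB = 1024 ** 3  # assumption: binary gigabytes; the real scripts may use 10**9

SHARE_DIRS = [
    "Digital_Coll_Complete",
    "Digital_Coll_in_progress",
    "Digital_Coll_ON_HOLD",
    "Digital_Coll_Proposed",
]


def tree_bytes(root):
    """Sum the sizes of all regular files under root (0 if root does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                total += os.path.getsize(path)
    return total


def share_bytes(digital_projects_root):
    """Total bytes in the four share directories, minus the REVIEW folder."""
    total = sum(tree_bytes(os.path.join(digital_projects_root, d)) for d in SHARE_DIRS)
    review = os.path.join(digital_projects_root, "Digital_Coll_in_progress", "REVIEW")
    if os.path.isdir(review):
        # REVIEW files are assumed to be copies of TIFFs already in the archive.
        total -= tree_bytes(review)
    return total


def to_gb(num_bytes):
    """Convert a byte count to gigabytes using the GB constant above."""
    return num_bytes / GB
```

Subtracting the REVIEW size after summing, rather than skipping the folder during the walk, mirrors the wording above and keeps the directory walk itself free of special cases.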

Part III: Synopsis over time...

A third script will collect monthly synopsis files, and generate a snapshot for the fiscal year thus far, with columns for each month and totals at the end and at the bottom.
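One way such a fiscal-year table could be assembled is sketched below. The input shape (a dictionary mapping a "YYYY-MM" month label to per-row counts) and the row labels are assumptions for illustration; the format of the real synopsis files is documented elsewhere. Only per-month counts are meaningful to sum this way; running totals such as "Total items digitized" would not be.

```python
def fiscal_year_table(monthly):
    """Build a table with one column per month, a Total column at the end,
    and a Total row at the bottom.

    monthly maps a month label (e.g. "2012-10") to {row label: count}.
    """
    months = sorted(monthly)  # "YYYY-MM" labels sort chronologically
    labels = sorted({label for counts in monthly.values() for label in counts})
    table = [["", *months, "Total"]]
    for label in labels:
        values = [monthly[m].get(label, 0) for m in months]
        table.append([label, *values, sum(values)])
    # Column totals cover each month column plus the row-total column.
    column_totals = [
        sum(row[i] for row in table[1:]) for i in range(1, len(months) + 2)
    ]
    table.append(["Total", *column_totals])
    return table
```

Each inner list is one row, so the result can be written out as tab-delimited text with `"\t".join(map(str, row))` per row.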

Again, more on this process can be found in Server-side Count Scripts.

16:04, 2 November 2012 (CDT) Jlderidder

The following information is deprecated as of 10/2012 (15:39, 2 November 2012 (CDT)Jlderidder)

To collect the count on the storage drive:

  1. Log onto libcontent via SSH and change into the scripts directory. (see Command-line_Work_on_Linux_Server)
  2. Run GBcount: type in `GBcount` and hit enter. It will output on the command line the total count in gigabytes for the archival storage directories on the server. It will also output a detailed report in the scripts/output directory.
  3. Run acumenCount: type in `acumenCount` and hit enter. It will output on the command line the total number of collections represented in Acumen, including the number which do not have EADs; the detailed report will be output in the scripts/output directory.
  4. Download the report created by acumenCount, and open it in Excel, with tabs as delimiters. This report lists collection identifiers, titles, item count, file count, and whether the collection has an EAD. At the end are the total counts for both items and files.

To collect the count on the Share drive, and combine the two for the monthly report:

For the size of content on Share, use the monthly GB counter described in Monthly_gb_counter.

For information on how we compile our monthly count, please refer to this document: File:Collecting numbers for use in the monthly count.docx