Indexing

From UA Libraries Digital Services Planning and Documentation

Indexing in Acumen takes place incrementally every night, with periodic full re-indexing (which can take several hours, and is usually scheduled on weekends).

Indexing in Acumen depends on three components staying coordinated:

  1. The XSL file for the particular type of XML must select the metadata to be included in each field, and name the field to be indexed.
  2. The SOLR schema file must declare the field name used in the XSL file, and specify how to index it.
  3. The database must support the same field name, for the cached results that provide speedy responses.

For example, suppose mods_to_solr.xsl contains the following template:

               <xsl:for-each select=".//mods:abstract">
                   <field name="abstract">
                       <xsl:value-of select="text()" />
                   </field>
               </xsl:for-each>

Then each MODS abstract element is extracted and sent to SOLR to be indexed under the field name "abstract".
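To make the effect of that template concrete, here is a minimal Python sketch that mimics it outside of XSLT. It is not how Acumen runs the transformation; it only reproduces the same selection (every mods:abstract, first text node) with the standard library. The sample record and the function name extract_abstract_fields are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard MODS v3 namespace, bound to the same "mods" prefix the XSL uses.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical sample record, not taken from Acumen.
RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:titleInfo><mods:title>Sample item</mods:title></mods:titleInfo>
  <mods:abstract>First abstract.</mods:abstract>
  <mods:abstract>Second abstract.</mods:abstract>
</mods:mods>"""


def extract_abstract_fields(mods_xml):
    """Build a <doc> holding one <field name="abstract"> per mods:abstract."""
    root = ET.fromstring(mods_xml)
    doc = ET.Element("doc")
    # Same selection as select=".//mods:abstract" in the template.
    for abstract in root.findall(".//mods:abstract", MODS_NS):
        field = ET.SubElement(doc, "field", {"name": "abstract"})
        # Like select="text()", this takes only the leading text node.
        field.text = abstract.text or ""
    return doc


doc = extract_abstract_fields(RECORD)
print(ET.tostring(doc, encoding="unicode"))
```

A record with two abstracts yields two "abstract" fields, which is why the corresponding schema field needs to accept multiple values.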


This "abstract" name must then match a field definition in the SOLR schema file (schema.xml; the staging one is in /home/acumen/solr/staging/conf/ ), which gives instructions on how to index it:

       <copyField source="abstract" dest="abstract_s" maxChars="300" />
       <field name="abstract" type="text_general" indexed="true" stored="false" multiValued="true" />
       <field name="abstract_s" type="string" indexed="true" stored="true" multiValued="true" />

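Here "abstract" is tokenized for full-text searching but not stored, while the copyField line copies up to the first 300 characters of each value into "abstract_s", a stored, untokenized string field. For more information, see the SOLR reference guide.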


Finally, the Acumen database on libcontent must have an entry in the authority_type table with the value "abstract", so that the extracted values can be stored there.


When search results are wrong or poor, all of these pieces need to be reviewed together, modified where necessary, and kept coordinated.
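Part of that review can be automated. The following is a minimal sketch, not an existing Acumen tool, that lists field names emitted by an XSL file but never declared in the schema. The two inline samples are shortened stand-ins for mods_to_solr.xsl and the schema file, the "note" field is a hypothetical addition that creates a mismatch, and all function names are invented here. The sketch ignores dynamicField patterns and does not check the authority_type table, whose layout is not described on this page.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for mods_to_solr.xsl; the "note" field is hypothetical.
XSL_SAMPLE = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mods="http://www.loc.gov/mods/v3">
  <xsl:template match="/">
    <doc>
      <xsl:for-each select=".//mods:abstract">
        <field name="abstract"><xsl:value-of select="text()" /></field>
      </xsl:for-each>
      <xsl:for-each select=".//mods:note">
        <field name="note"><xsl:value-of select="text()" /></field>
      </xsl:for-each>
    </doc>
  </xsl:template>
</xsl:stylesheet>"""

# Shortened stand-in for the schema file, using the definitions shown above.
SCHEMA_SAMPLE = """<schema name="example" version="1.5">
  <copyField source="abstract" dest="abstract_s" maxChars="300" />
  <field name="abstract" type="text_general" indexed="true" stored="false" multiValued="true" />
  <field name="abstract_s" type="string" indexed="true" stored="true" multiValued="true" />
</schema>"""


def field_names(xml_text):
    """Collect the name attribute of every un-namespaced <field> element."""
    root = ET.fromstring(xml_text)
    return {el.get("name") for el in root.iter("field") if el.get("name")}


def undeclared_fields(xsl_text, schema_text):
    """Return field names the XSL emits that the schema does not declare."""
    return sorted(field_names(xsl_text) - field_names(schema_text))


print(undeclared_fields(XSL_SAMPLE, SCHEMA_SAMPLE))  # prints ['note']
```

To run it against real files, read their contents into strings and pass them to undeclared_fields in place of the samples. An empty list means every field named in the XSL has a matching explicit schema definition.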