Sitemaps

From UA Libraries Digital Services Planning and Documentation
Revision as of 14:19, 24 March 2015 by Jlderidder (talk | contribs)

Sitemaps tell web search engine crawlers where to find the content on your site that you want them to index. Crawlers have no way of guessing the URLs that your database or delivery system creates on the fly to provide access to online materials. Database content is therefore like a black hole on the web: without help, it will not appear in the results of web search engines such as Google.


A sitemap cannot contain more than 50,000 URLs and must be no larger than 50 MB uncompressed. If you have multiple sitemaps, you need a sitemap index file that lists them all; you then submit the index file, rather than the individual sitemaps, to the search engine for indexing.
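As a rough illustration of the limit and the index file, the sketch below splits a list of URLs into groups of at most 50,000 and builds a sitemap index that names one file per group. It is not the makeSiteMap script; the function names and the example.org URLs are assumptions made for this example, and the sitemapN.xml naming simply mirrors the index file shown on this page.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive groups of at most `size` (hypothetical helper)."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, count, lastmod):
    """Return sitemap index XML listing sitemap1.xml .. sitemap<count>.xml (hypothetical helper)."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, count + 1):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap{n}.xml"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")


# Example with placeholder URLs: 120,001 URLs need three sitemap files.
urls = [f"http://example.org/item{i}" for i in range(120_001)]
groups = chunk_urls(urls)
index_xml = build_index("http://example.org/sitemaps", len(groups),
                        "2015-03-01T07:10:16+00:00")
```

This sketch only enforces the URL count; a real generator would also need to check the uncompressed file size of each sitemap.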


Our sitemaps are automatically regenerated once a month (by makeSiteMap in /srv/scripts/sitemaps/), using the file date for the <lastmod> value. All our entries are listed as changing "yearly", since the next more frequent option is "monthly" and they are rarely updated that often. The <priority> value is highest for finding aids and lowest for mass-digitized content (as that has little metadata to index).
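The following sketch shows how one entry might be assembled along these lines. It is an illustration only, not the makeSiteMap script (whose language and internals are not described here). It assumes that "file date" means the file's modification time, and the function name url_entry and the content-type keys are hypothetical; the numeric priorities are the ones this page quotes for each content type.

```python
import os
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Hypothetical content-type keys; values are the priorities quoted on this page.
PRIORITY = {"finding_aid": "1.0", "item": "0.8", "mass_digitized": "0.3"}


def url_entry(loc, path, kind):
    """Build one <url> element, using the file's modification time as <lastmod>."""
    mtime = os.path.getmtime(path)  # assumption: "file date" = modification time
    lastmod = datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(timespec="seconds")
    return (
        "<url>\n"
        f"  <loc>{escape(loc)}</loc>\n"  # escape &, <, > in the URL
        f"  <lastmod>{lastmod}</lastmod>\n"
        "  <changefreq>yearly</changefreq>\n"
        f"  <priority>{PRIORITY[kind]}</priority>\n"
        "</url>"
    )
```

Calling url_entry("http://acumen.lib.ua.edu/u0003_0004090", path_to_file, "finding_aid") would return an entry in the same shape as the sitemap excerpt on this page, with the timestamp taken from whatever file path_to_file names.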


Our sitemaps are located in /srv/www/htdocs/acumen/sitemaps and /srv/www/htdocs/sitemaps/, with corresponding sitemapIndex files in the directory just above each of these locations (visible via the web at http://acumen.lib.ua.edu/sitemapIndex.xml and http://libcontent.lib.ua.edu/sitemapIndex.xml).


One set is for Acumen and the other is for libcontent, but they contain the same links.

Here's what our current sitemap index file looks like:

 <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
     <loc>http://acumen.lib.ua.edu/sitemaps/sitemap1.xml</loc>
     <lastmod>2015-03-01T07:10:16+00:00</lastmod>
   </sitemap>
   <sitemap>
     <loc>http://acumen.lib.ua.edu/sitemaps/sitemap2.xml</loc>
     <lastmod>2015-03-01T07:10:17+00:00</lastmod>
   </sitemap>
   <sitemap>
     <loc>http://acumen.lib.ua.edu/sitemaps/sitemap3.xml</loc>
     <lastmod>2015-03-01T07:10:17+00:00</lastmod>
   </sitemap>
 </sitemapindex>


And here is the first part of one of the sitemaps:

 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>http://acumen.lib.ua.edu/u0003_0004090</loc>
     <lastmod>2015-02-24T16:22:54+00:00</lastmod>
     <changefreq>yearly</changefreq>
     <priority>1.0</priority>
   </url>
   <url>
     <loc>http://acumen.lib.ua.edu/u0003_0004091</loc>
     <lastmod>2015-02-27T22:59:57+00:00</lastmod>
     <changefreq>yearly</changefreq> 
     <priority>1.0</priority>
   </url>
   <url>
     <loc>http://acumen.lib.ua.edu/u0003_0004092</loc>
     <lastmod>2015-02-27T22:59:58+00:00</lastmod>
     <changefreq>yearly</changefreq>
     <priority>1.0</priority>
   </url>
   <url>
     <loc>http://acumen.lib.ua.edu/u0003_0004093</loc>
     <lastmod>2015-02-27T22:59:58+00:00</lastmod>
     <changefreq>yearly</changefreq>
     <priority>1.0</priority>
   </url>
   <url>
     <loc>http://acumen.lib.ua.edu/u0004_0000001_0000001</loc>
     <lastmod>2014-08-05T16:43:00+00:00</lastmod>
     <changefreq>yearly</changefreq>
     <priority>0.8</priority>
   </url>

As you can see, the first four links have priority 1.0 and go to finding aids. The last one is for an item (not mass-digitized) and has priority 0.8; mass-digitized content has priority 0.3.
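To check how priorities are distributed across a generated sitemap, a small script can tally the <priority> values. The sketch below is one possible way to do that with Python's standard library; the function name priority_counts and the example.org sample are assumptions made for illustration. Entries without a <priority> element are counted under 0.5, the default defined by the sitemap protocol.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def priority_counts(xml_text):
    """Count <url> entries per <priority> value in a complete sitemap document."""
    root = ET.fromstring(xml_text)
    return Counter(
        url.findtext("sm:priority", default="0.5", namespaces=NS)
        for url in root.findall("sm:url", NS)
    )


# Placeholder sitemap used only to exercise the function.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.org/a</loc><priority>1.0</priority></url>
  <url><loc>http://example.org/b</loc><priority>0.8</priority></url>
  <url><loc>http://example.org/c</loc><priority>1.0</priority></url>
  <url><loc>http://example.org/d</loc></url>
</urlset>"""

counts = priority_counts(SAMPLE)
```

The function expects a complete, well-formed sitemap file, so it would need the full sitemap rather than a truncated excerpt.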


To submit a new sitemap to Google, or to check our indexing progress, log in to Google Webmaster Tools (since renamed Google Search Console) with the web services credentials.