
Adding dynamic “sitemap.xml” file to Enki Blog Sites

Written on: December 21, 2010

After switching my site to the new Enki-based software, I realized that Google had stopped indexing the site’s pages. Thinking about the cause, I concluded that because all the pages and blog articles are dynamically generated, it would take ages for them to end up in the indexes of search engines. What I needed was a way to help the search engines by adding a sitemap to the setup. This article describes how I implemented sitemaps for klauskorner.com.

Sitemaps are important for sites whose pages are frequently created and updated, especially when those pages are generated dynamically. Surprisingly, many sites with dynamic content do not provide a sitemap.xml for search engines. You can check this yourself by appending sitemap.xml to a site’s root URL in your browser.

Since all pages and posts on this web site are created on the fly (dynamically, based on the content stored in the linked database), I started by searching the web for Rails-based sitemap solutions. I found many Rails plugins for managing sitemaps. The problem with those solutions was that they all created a static sitemap. After adding new content to the site, one would have to regenerate the sitemap either by executing a command inside the application or via a rake task. In some cases this could be automated by adding a cron entry, but there is still the danger of forgetting to regenerate the sitemap after adding new content. In addition, they all required far too much rework, adding extra controllers and models to the existing code.

I wanted something that generates the sitemap dynamically whenever a request is made. Refining my search terms, I found an article by Ilya Grigorik describing essentially the solution I was after. The article is a bit outdated, as it was written in 2006, but that did not matter: it is a very simple solution that required only minor changes to bring it in line with the changes Rails has undergone since then.

Below is the code I added and/or modified:

To start, I added a route for sitemap.xml to the config/routes.rb file:

map.connect 'sitemap.xml', :controller => 'site', :action => 'sitemap'

I decided to create a simple standalone site controller and added the following to the site_controller.rb file:

class SiteController < ApplicationController
  def sitemap
    # Instance variables, so that the builder view can see the collections.
    @pages = Page.find :all, :order => 'id DESC'
    @posts = Post.find :all, :order => 'id DESC'
    @tags  = Tag.find :all, :order => 'name ASC'
    respond_to do |format|
      format.xml
    end
  end
end

This allows the sitemap to list all posts, pages and categories. What is needed next is the builder template that generates the XML (views/site/sitemap.xml.builder):

xml.instruct!

xml.urlset "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9" do
  # Home page. Sitemap locations must be absolute, so use the configured site URL.
  xml.url do
    xml.loc         config[:url]
    xml.lastmod     Time.now.strftime("%Y-%m-%d")
    xml.changefreq  "always"
    xml.priority    1.0
  end

  # Blog posts
  @posts.each do |post|
    xml.url do
      xml.loc         post_path(post, :only_path => false)
      xml.lastmod     post.published_at.strftime("%Y-%m-%d")
      xml.changefreq  "always"
      xml.priority    0.8
    end
  end

  # Categories (tags)
  @tags.each do |tag|
    xml.url do
      xml.loc         category_path(tag, :only_path => false)
      xml.lastmod     Time.now.strftime("%Y-%m-%d")
      xml.changefreq  "always"
      xml.priority    0.7
    end
  end

  # Static pages
  @pages.each do |page|
    xml.url do
      xml.loc         page_path(page, :only_path => false)
      xml.lastmod     page.created_at.strftime("%Y-%m-%d")
      xml.changefreq  "always"
      xml.priority    0.6
    end
  end
end

It looks more complicated than it is! I simply iterate over each of my collections and provide the appropriate paths for the location URLs. The :only_path => false option tells the path helpers to produce an ‘absolute’ URL to the application, not a relative path (http://www…com/lists/view/id instead of /lists/view/id); this is a requirement for sitemap files.
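To make the shape of the generated document concrete, here is a minimal plain-Ruby sketch of the same loop outside Rails. It is not the template used on the site: the Entry struct, the sitemap_xml function and the example.com URLs are made up for illustration, and plain string assembly stands in for Builder so that the sketch has no dependencies.

```ruby
# Hypothetical stand-in for a Post/Page/Tag record: a URL, a date and a priority.
Entry = Struct.new(:loc, :lastmod, :priority)

# Assemble a sitemap document as a string, one <url> element per entry.
def sitemap_xml(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |entry|
    lines << "  <url>"
    # encode(:xml => :text) escapes &, < and > so the URL is valid XML text.
    lines << "    <loc>#{entry.loc.encode(:xml => :text)}</loc>"
    lines << "    <lastmod>#{entry.lastmod.strftime('%Y-%m-%d')}</lastmod>"
    lines << "    <changefreq>always</changefreq>"
    lines << "    <priority>#{entry.priority}</priority>"
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n")
end

puts sitemap_xml([
  Entry.new("http://www.example.com/", Time.now, 1.0),
  Entry.new("http://www.example.com/2010/12/21/some-post", Time.utc(2010, 12, 21), 0.8)
])
```

Every loc value here is an absolute URL, which is the requirement mentioned above.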

The three path methods are in the app/helpers/url_helper.rb file. Of the three, the post_path method already existed and needed no modification:

  def post_path(post, options = {})
    suffix = options[:anchor] ? "##{options[:anchor]}" : ""
    path = post.published_at.strftime("/%Y/%m/%d/") + post.slug + suffix
    path = URI.join(config[:url], path) if options[:only_path] == false
    path.untaint
  end
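The effect of the :only_path option is easiest to see outside Rails. The following is a small sketch under stated assumptions, not Enki's code: the Post struct, the SITE_URL constant (standing in for config[:url]) and the post_path_sketch name are all hypothetical.

```ruby
require "uri"

# Hypothetical stand-in for Enki's Post model.
Post = Struct.new(:slug, :published_at)

# Assumed value of config[:url]; replace with the real site root.
SITE_URL = "http://www.example.com"

def post_path_sketch(post, options = {})
  suffix = options[:anchor] ? "##{options[:anchor]}" : ""
  # Date-based path, e.g. /2010/12/21/my-post
  path = post.published_at.strftime("/%Y/%m/%d/") + post.slug + suffix
  # URI.join resolves the path against the site root; to_s turns the URI into a String.
  path = URI.join(SITE_URL, path).to_s if options[:only_path] == false
  path
end

post = Post.new("my-post", Time.utc(2010, 12, 21))
puts post_path_sketch(post)                       # relative path
puts post_path_sketch(post, :only_path => false)  # absolute URL for the sitemap
```

Without the option the helper returns a relative path suitable for links inside the site; with :only_path => false it returns the full URL that a sitemap entry needs.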

The page_path method needed an options parameter so that the :only_path option can be passed in:

BEFORE:

  def page_path(page)
    "/pages/#{page.slug}"
  end

AFTER:

  def page_path(page, options = {})
    path = "/pages/#{page.slug}"
    path = URI.join(config[:url], path) if options[:only_path] == false
    path.untaint
  end

The only code left is to add the category_path method. Note the string substitution to replace a ‘space’ with ‘%20’ to follow the URL rules:

  def category_path(tag, options = {})
    path = "/#{tag.name.downcase.gsub(" ","%20")}"
    path = URI.join(config[:url], path) if options[:only_path] == false
    path.untaint
  end
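The gsub above handles only spaces. If tag names may contain other characters that are not URL-safe (an ampersand or a plus sign, for example), a more general percent-encoding could be used instead. The sketch below uses ERB::Util.url_encode from Ruby's standard library; the category_segment name is made up for illustration, and whether the extra encoding is needed depends on which tag names the site actually uses.

```ruby
require "erb"

# Percent-encode a lower-cased tag name for use as a single URL path segment.
# ERB::Util.url_encode encodes a space as %20 (not "+"), matching the gsub approach.
def category_segment(name)
  ERB::Util.url_encode(name.downcase)
end

puts category_segment("Ruby on Rails")
puts category_segment("R&D")
```

For names that contain only letters, digits and spaces, this produces the same result as the gsub version.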

That’s it: a quick and easy way to keep a sitemap current. The cost of generating a new sitemap is incurred only when sitemap.xml is requested by a spider. One last comment: for details about the sitemap specification, visit sitemaps.org.
