After switching my site to the new Enki-based software, I realized that Google had stopped indexing the site’s pages. My conclusion was that, because all the pages and blog articles are generated dynamically, it would take ages for them to end up in search engine indexes. What I needed was a way to help the search engines by adding a sitemap to the setup. This article describes how I implemented sitemaps for klauskorner.com.
Sitemaps are important for sites whose pages are frequently created and updated, especially when those pages are generated dynamically. Surprisingly, many sites with dynamic content do not provide a sitemap.xml for search engines. You can check this yourself by appending sitemap.xml to a site's root URL in your browser.
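The same check can be scripted. This is a minimal sketch, assuming the sitemap lives at the conventional /sitemap.xml location. The helper names `sitemap_url` and `sitemap_present?` are made up for illustration, and example.com is a placeholder.

```ruby
require "net/http"
require "uri"

# Build the conventional sitemap location for a site's root URL.
def sitemap_url(root_url)
  URI.join(root_url, "/sitemap.xml")
end

# Returns true when the server answers the sitemap request with a 2xx status.
# Redirects are not followed in this sketch.
def sitemap_present?(root_url)
  response = Net::HTTP.get_response(sitemap_url(root_url))
  response.is_a?(Net::HTTPSuccess)
end

puts sitemap_url("http://www.example.com/")
```

Calling `sitemap_present?` with a real root URL performs the actual request.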
Since all pages and posts on this web site are created on the fly (dynamically, from content stored in a linked database), I started by searching the web for Rails-based sitemap solutions. I found many Rails plugins for managing sitemaps, but they all created a static sitemap. After adding new content to the site, one would have to regenerate it, either by executing a command inside the application or via a rake task. In some cases this could be automated with a cron entry, but otherwise there is the danger of forgetting to regenerate the sitemap after adding new content. In addition, they all required way too much rework, adding controllers and models to the existing code.
I wanted something that generates the sitemap dynamically whenever a request is made. Refining my search terms, I found an article by Ilya Grigorik describing essentially the solution I was after. The article was written in 2006 and is a bit outdated, but that did not matter: it is a very simple solution that required only minor changes to bring it in line with how Rails has changed since then.
Below is the code I added and/or modified:
To start, I added a route to handle sitemap.xml to the config/routes.rb file:
    map.connect 'sitemap.xml', :controller => 'site', :action => 'sitemap'
I decided to create a simple standalone site controller and added the following to the site_controller.rb file:

    class SiteController < ApplicationController
      def sitemap
        # Instance variables make the collections available to the view.
        @pages = Page.find :all, :order => 'id DESC'
        @posts = Post.find :all, :order => 'id DESC'
        @tags  = Tag.find :all, :order => 'name ASC'

        respond_to do |format|
          format.xml
        end
      end
    end

This allows the sitemap to list all posts, pages and categories. What is needed next is the builder template that generates the XML (views/site/sitemap.xml.builder):

    xml.instruct!
    xml.urlset "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9" do
      # Home page
      xml.url do
        xml.loc "/index.html"
        xml.lastmod Time.now.strftime("%Y-%m-%d")
        xml.changefreq "always"
        xml.priority 1.0
      end

      @posts.each do |post|
        xml.url do
          xml.loc post_path(post, :only_path => false)
          xml.lastmod post.published_at.strftime("%Y-%m-%d")
          xml.changefreq "always"
          xml.priority 0.8
        end
      end

      @tags.each do |tag|
        xml.url do
          xml.loc category_path(tag, :only_path => false)
          xml.lastmod Time.now.strftime("%Y-%m-%d")
          xml.changefreq "always"
          xml.priority 0.7
        end
      end

      @pages.each do |page|
        xml.url do
          xml.loc page_path(page, :only_path => false)
          xml.lastmod page.created_at.strftime("%Y-%m-%d")
          xml.changefreq "always"
          xml.priority 0.6
        end
      end
    end
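To get a feel for what such a template emits, here is a plain-Ruby sketch that builds the same structure by hand, without Rails or the Builder library. The `Entry` struct, the `render_sitemap` helper and the example URLs are assumptions made for illustration and are not part of Enki.

```ruby
require "cgi"

# Hypothetical stand-in for one sitemap entry.
Entry = Struct.new(:loc, :lastmod, :changefreq, :priority)

# Render a list of entries as a sitemaps.org 0.9 document.
def render_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>',
           '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
  entries.each do |e|
    lines << "  <url>"
    lines << "    <loc>#{CGI.escapeHTML(e.loc)}</loc>" # escape &, < and > in URLs
    lines << "    <lastmod>#{e.lastmod.strftime('%Y-%m-%d')}</lastmod>"
    lines << "    <changefreq>#{e.changefreq}</changefreq>"
    lines << "    <priority>#{e.priority}</priority>"
    lines << "  </url>"
  end
  lines << "</urlset>"
  lines.join("\n")
end

entries = [
  Entry.new("http://www.example.com/2010/12/21/some-post", Time.utc(2010, 12, 21), "always", 0.8),
  Entry.new("http://www.example.com/pages/about", Time.utc(2010, 11, 2), "always", 0.6)
]
puts render_sitemap(entries)
```

Each `url` element carries the same four children the builder template writes: `loc`, `lastmod`, `changefreq` and `priority`.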
It looks more complicated than it is! I simply iterate over each of my collections and provide the appropriate paths for the location URLs. The :only_path => false option forces Rails to produce an ‘absolute’ URL to the application, not a relative path (http://www…com/lists/view/id instead of /lists/view/id). This is a requirement for sitemap files.
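The difference between the two forms can be shown in isolation. This is a small sketch using Ruby's standard URI library; `site_url` is a placeholder standing in for the configured site address.

```ruby
require "uri"

site_url = "http://www.example.com" # placeholder for the configured site URL
relative = "/lists/view/42"         # what a path helper returns by default

# URI.join resolves the path against the site URL and returns a URI object.
absolute = URI.join(site_url, relative).to_s

puts relative # /lists/view/42
puts absolute # http://www.example.com/lists/view/42
```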
The three path methods are in the app/helpers/url_helper.rb file. Of the three, the post_path method already existed and needed no modification:
    def post_path(post, options = {})
      suffix = options[:anchor] ? "##{options[:anchor]}" : ""
      # Date-based path: /YYYY/MM/DD/slug, optionally followed by an anchor
      path = post.published_at.strftime("/%Y/%m/%d/") + post.slug + suffix
      path = URI.join(config[:url], path) if options[:only_path] == false
      path.untaint
    end
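The date portion of a post URL comes from strftime. Here is a short sketch, assuming the year/month/day layout this site uses for its post URLs; the slug is invented.

```ruby
published_at = Time.utc(2010, 12, 21) # example publication date
slug = "my-first-post"                # made-up slug

# %Y, %m and %d expand to the four-digit year and the zero-padded month and day.
path = published_at.strftime("/%Y/%m/%d/") + slug
puts path # /2010/12/21/my-first-post
```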
The page_path method needed to be modified by adding an options parameter so that the :only_path value can be passed in:
BEFORE:

    def page_path(page)
      "/pages/#{page.slug}"
    end

AFTER:

    def page_path(page, options = {})
      path = "/pages/#{page.slug}"
      path = URI.join(config[:url], path) if options[:only_path] == false
      path.untaint
    end
The only code left to add is the category_path method. Note the string substitution that replaces a space with ‘%20’ to follow the URL rules:
    def category_path(tag, options = {})
      path = "/#{tag.name.downcase.gsub(" ", "%20")}"
      path = URI.join(config[:url], path) if options[:only_path] == false
      path.untaint
    end
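Replacing spaces by hand covers the common case, but a tag name containing another reserved character (an ampersand, say) would still produce an invalid URL. One alternative, offered as a sketch rather than what this site runs, is to percent-encode the whole name with `ERB::Util.url_encode` from Ruby's standard library. The method name `encoded_category_path` is hypothetical.

```ruby
require "erb"

# Percent-encode every character that is not URL-safe, not only spaces.
def encoded_category_path(tag_name)
  "/" + ERB::Util.url_encode(tag_name.downcase)
end

puts encoded_category_path("Ruby on Rails") # /ruby%20on%20rails
puts encoded_category_path("Q&A")           # /q%26a
```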
That’s it: a quick and easy way to keep a sitemap current. The cost of generating a new sitemap is incurred only when sitemap.xml is requested by a spider. One last comment: for details about the sitemap specification, visit sitemaps.org.