How to Reach Enlightenment and Get Your Site Indexed by Search Engines
Update
If you don’t fancy getting your hands dirty with code or just love automation, then check out the mini web app we launched earlier this week, PingMyMap. It’s a free service for web developers and site admins which allows you to easily notify multiple search engines of updates to your site’s Sitemaps. Pingtastic!
For various reasons, some sites simply aren’t well indexed. Despite having highly relevant content, you just can’t get your site to show up in search engine results, or perhaps you have some pages that are unfortunately tucked away.
All Content is Not Created Equal
I personally advocate progressively enhanced, standards-based and accessible content solutions (damn I sound so buzzword!), but it’s pretty clear that there’s a lot of content out there that just isn’t so ready to expose itself to search engine crawlers.
Flash movies are notorious for suffering from indexing issues due to their non-semantic structure (although credit is due to Google, who are taking action). Complex JavaScript-heavy sites often have many potential pitfalls when it comes to crawlers sussing out the structure and content, such as hash links (e.g. <a href="#">Save</a>) and content that is dynamically loaded via AJAX requests.
Whether or not your content lives on the most accessible, standards-compliant site in the whole of the web, there’s a fantastic tool that will give you indexing bliss, no matter how your site is structured.
Google et Al To The Rescue
Google Sitemaps will be familiar to a lot of web developers as a mechanism to make Google aware of pages and content on your site. What many people still don’t know is that several other web giants support XML Sitemaps as a standard: Yahoo!, Microsoft, Ask.com, Moreover.com and IBM. These companies got on board in the couple of years after Google Sitemaps launched (June 2005) and the Sitemaps protocol was established.
The Sitemaps protocol is comprehensive and equips you with everything you need to know to create Sitemaps.
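To give a feel for what the protocol asks of you, here’s a minimal sketch in Ruby that builds a Sitemap from a list of pages. The build_sitemap and xml_escape helpers and the example pages are my own illustrative assumptions, not part of the protocol; only the urlset namespace and the loc, lastmod, changefreq and priority element names come from the Sitemaps protocol itself.

```ruby
# Namespace defined by the Sitemaps protocol, version 0.9.
SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Characters that must be entity-escaped inside XML text.
XML_ESCAPES = {
  '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
  '"' => '&quot;', "'" => '&apos;'
}.freeze

# Hypothetical helper: entity-escape a value for use in an XML element.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: build a Sitemap document from an array of hashes.
# Each hash needs :loc; :lastmod, :changefreq and :priority are optional.
def build_sitemap(pages)
  entries = pages.map do |page|
    tags = [:loc, :lastmod, :changefreq, :priority].map do |key|
      next unless page[key]
      "    <#{key}>#{xml_escape(page[key])}</#{key}>"
    end
    "  <url>\n#{tags.compact.join("\n")}\n  </url>"
  end

  [
    '<?xml version="1.0" encoding="UTF-8"?>',
    %(<urlset xmlns="#{SITEMAP_NS}">),
    *entries,
    '</urlset>'
  ].join("\n") + "\n"
end

# Example pages (placeholders)
puts build_sitemap([
  { loc: 'http://www.example.com/', changefreq: 'weekly', priority: '1.0' },
  { loc: 'http://www.example.com/about?lang=en&view=full' }
])
```

Note that the ampersand in the second URL comes out as &amp;amp; in the XML, which is what the protocol requires for URLs inside a Sitemap.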
Ping – A World of Discovery
Well, this is all well and good, and we’re jolly happy that these collaborating search engines can find our stuff, but how are we going to make sure that they keep checking our sitemaps for updates? It’s a chicken and egg situation, but one for which there is a tasteful solution.
Google, Yahoo!, Microsoft and Ask.com all offer ping services that you can submit your sitemap URLs to. This means that you can notify these search engines of sitemap updates on your terms and in an automated fashion without having to manually submit your sitemap URL to each service one by one.
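A ping is nothing more than an HTTP GET request to the search engine’s ping endpoint, with your Sitemap URL passed as a URL-encoded query parameter. Here’s a rough sketch of just the URL construction step. The build_ping_url helper is a name I’ve made up for illustration, and I’ve used URI.encode_www_form_component from Ruby’s standard library for the encoding; the Google endpoint is the same one used in the full script in this post.

```ruby
require 'uri'

# Hypothetical helper: build the full ping URL for one search engine.
# endpoint    - ping URL ending in its query parameter, e.g. '...?sitemap='
# sitemap_url - absolute URL of your Sitemap
def build_ping_url(endpoint, sitemap_url)
  # The Sitemap URL must be encoded so that its own ':' and '/'
  # characters survive as a single query parameter value.
  endpoint + URI.encode_www_form_component(sitemap_url)
end

puts build_ping_url('http://www.google.com/webmasters/sitemaps/ping?sitemap=',
                    'http://www.example.com/sitemap.xml')
# prints http://www.google.com/webmasters/sitemaps/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```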
Here’s a PHP script to help you get your Sitemaps ping action on the go. I’ve opted to use the cURL functions (which are compiled into most PHP installations) over file_get_contents() with the URL fopen wrappers (the allow_url_fopen setting), as it’s a more secure solution.
PHP Ping Script
For you Ruby divas out there, here’s an auto-magical script that will give you the same jazz.
Ruby Ping Script
require 'net/http'
require 'uri'
require 'cgi'

# Switch output on or off
OUTPUT = true
# Sitemap URL
SITEMAP_URL = 'http://www.example.com/sitemap.xml'
# Search engine pings configuration
pings_config = [
  {:name => 'Google',
   :url => 'http://www.google.com/webmasters/sitemaps/ping?sitemap='},
  {:name => 'Yahoo!',
   :url => 'http://search.yahooapis.com/SiteExplorerService/V1/ping?sitemap='},
  {:name => 'Ask.com',
   :url => 'http://submissions.ask.com/ping?sitemap='},
  {:name => 'Microsoft Live Search',
   :url => 'http://webmaster.live.com/ping.aspx?siteMap='},
  {:name => 'Moreover.com',
   :url => 'http://api.moreover.com/ping?u='}
]
# .................................................................
class Ping
  attr_accessor :pinged_count

  def initialize(pings_config)
    # Loop through sites to ping
    @pinged_count = 0
    pings_config.each do |ping_config|
      # Construct ping URL (the Sitemap URL must be URL-encoded)
      ping_url = ping_config[:url] + CGI.escape(SITEMAP_URL)
      begin
        # Execute HTTP request to ping URL
        response = Net::HTTP.get_response(URI.parse(ping_url))
        # Check HTTP status code (Net::HTTP exposes it as a string)
        result = response.code == '200'
      rescue StandardError
        # A network error counts as a failed ping rather than aborting the run
        result = false
      end
      # Increment the pinged_count if successfully pinged
      @pinged_count += 1 if result
      # Output ping result
      puts "[#{ping_config[:name]}] #{result ? 'OK' : 'Failed'}" if OUTPUT
    end
  end
end
# Count number of sites to ping
stat_pings_count = pings_config.size
# Ping the sites
ping = Ping.new(pings_config)
# Output ping results summary
puts "Pinged #{ping.pinged_count} out of #{stat_pings_count} sites successfully" if OUTPUT
# exit 0 # Needed if script is run from the CLI

(Thanks to Alistair for beautifying my wretched Ruby script!)
Let’s Automate, Baby
For bonus points you can schedule your ping script with cron so that it pings the search engines with your Sitemap every week, for example. You can either include it within a scheduled script that generates your Sitemaps (which will be a typical approach for CMS-driven sites), or simply call it directly.
If you’re calling the ping script directly by itself then make sure you uncomment the last line so that it returns an exit code when it’s executed. You’ll also want to add a shebang line to the start of the script (alter your path as required):
- PHP: #!/usr/bin/php
- Ruby: #!/usr/bin/ruby
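If you want cron (or a wrapping shell script) to be able to tell a good run from a bad one, you could go a step further than a plain exit 0 and derive the exit code from the ping results. This is only a sketch of one possible convention, assuming you treat anything less than a full house as a failure; the exit_status helper is hypothetical and not part of the scripts above.

```ruby
# Hypothetical helper: map ping results to a process exit status.
# Returns 0 when every ping succeeded and 1 otherwise, following the
# usual Unix convention that a non-zero status signals failure.
def exit_status(pinged_count, total_count)
  pinged_count == total_count ? 0 : 1
end

# At the end of the Ruby ping script you might then write:
#   exit exit_status(ping.pinged_count, stat_pings_count)
```

Whether a single failed ping deserves a non-zero exit code is a judgment call; with five endpoints you may prefer to fail only when none of them respond.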
Minor Caveat
Both scripts check that the HTTP status code returned by the search engine ping URL (not your Sitemap URL) is 200 (OK). The Microsoft Live Search ping endpoint appears to be the only one that returns a 404 (Not Found) status code if you submit a Sitemap URL that doesn’t exist. Unfortunately none of the search engines return a parsable response (unless you really want to parse inconsistent HTML!).
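Since the status code is all we reliably get back, one option is to sort responses into a few coarse buckets rather than a plain pass/fail. The sketch below assumes the behaviour described above (a 404 hinting at a missing Sitemap, which I’ve only seen from Live Search); classify_ping and its symbol names are made up for illustration.

```ruby
# Hypothetical helper: classify the HTTP status code from a ping endpoint.
# code - the status code as a String, which is how Net::HTTP exposes it.
def classify_ping(code)
  case code
  when '200' then :ok
  when '404' then :sitemap_not_found # assumption: endpoint rejected the Sitemap URL
  else :failed
  end
end

puts classify_ping('200') # prints ok
puts classify_ping('404') # prints sitemap_not_found
puts classify_ping('500') # prints failed
```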
No Blackhat Required!
Combined with mod_rewritten URLs (surely the greatest freebie in the SEO toolkit!), Sitemaps give us a perfect opportunity for shameless self-promotion. Instead of twiddling your thumbs and hoping that search engines will discover all of your site and its updates, you can now be confident that they’ll have the inside dope. Now that’s what I call standards satisfaction!
If anyone would like to contribute a port of the ping script shown above to Python, please drop me an e-mail at simon [at] seventytwo [dot] co [dot] uk.
Further Reading
- Official Sitemaps Website – Sponsored by Google, Yahoo! and Microsoft
- XML Sitemaps Generator – Bear in mind that this will only include pages/content that are accessible to their crawler
- Mephisto (Google) Sitemap – Plugin for Mephisto that generates Sitemaps
- Wikipedia on Sitemaps
