Posted by Kevsh, 03-28-2006, 03:10 PM
Google Sitemaps: A Little 411 For All

Oops... meant to start this as a new thread (kind of a 6K-post thing). Anyhow, it's a big read...

Here are a few tips and observations regarding Google Sitemaps. If you haven't heard of it, or don't know it well, I'd highly recommend getting up to speed.
(Btw, this is not the same as having a sitemap.html page on your site!)

After a lot of investigating, testing, frustration, and some success, I've come up with a few things to point out. Feel free to dispute, add to, etc.:

1) A sitemap, when done properly, is a great tool for getting your site indexed. It doesn't guarantee anything in terms of rankings, but it gets sub-pages indexed more quickly and gives you far more control over what appears in Google's index.
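If you've never looked at one, the sitemap itself is just an XML file listing your URLs. Here's a stripped-down example (the example.com URLs are placeholders, and grab the current schema URL from the FAQ rather than trusting my memory); lastmod, changefreq and priority are optional hints, not commands:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
    <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2006-03-28</lastmod>
      <changefreq>daily</changefreq>
      <priority>0.8</priority>
    </url>
    <url>
      <loc>http://www.example.com/galleries/page2.html</loc>
    </url>
  </urlset>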

2) Use the method Google recommends: build the sitemap with the Python script they provide, then ping them every time you update it. (Full details are in the Google Sitemaps FAQ.) From my experience, the other options are far less likely to help, and may actually hurt you. (I learned that the hard way.)
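For what it's worth, here's a rough Python sketch of the ping step to run after you've regenerated the file. The ping URL is what I recall from the FAQ and the sitemap location is a placeholder - double-check both against the FAQ before relying on this:

  import urllib.parse
  import urllib.request

  # Where your generated sitemap lives (placeholder - use your real URL)
  SITEMAP_URL = "http://www.example.com/sitemap.xml.gz"

  # Ping endpoint as I recall it from the Sitemaps FAQ - verify it there
  PING_URL = "http://www.google.com/webmasters/sitemaps/ping?sitemap="

  def ping_google(sitemap_url):
      """Tell Google the sitemap has been updated."""
      url = PING_URL + urllib.parse.quote(sitemap_url, safe="")
      with urllib.request.urlopen(url) as resp:
          # A 200 here means the ping was accepted, not that the sitemap is valid
          print("Ping response:", resp.status)

  if __name__ == "__main__":
      ping_google(SITEMAP_URL)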

3) This is from a Google engineer I talked to in New Orleans last year: your robots.txt will override the sitemap file you register if there are any conflicting details (e.g. if your robots.txt disallows a page that appears in your sitemap file, Google won't include that page).
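For example (made-up paths), with a robots.txt like the one below, any /members/ URL you happen to list in your sitemap simply won't be indexed - the disallow wins:

  User-agent: *
  Disallow: /members/
  Disallow: /cgi-bin/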

4) The Google Sitemaps control panel is useful for much more than monitoring your sitemap. It checks your robots.txt file, gives you one simple place to see links to your site, indexed pages, etc., and if you verify your site there's a wealth of additional detail available.

5) A big thing to note about the robots.txt file (and also worth checking if you find you cannot verify your site with the file they provide): if you are using an ErrorDocument directive in your .htaccess for 404s - redirecting to another page, something most of us do - there's a fair chance Google isn't reading your robots.txt file properly. It will also prevent Google from verifying your site if you try.

Use the Sitemaps control panel to see if Google is actually fetching your robots.txt. Just because your logs show Googlebot requesting it doesn't mean it's being read properly. If the control panel reports "file not found" for your robots.txt, it's a good bet this is why. The solution? Remove the ErrorDocument directive - but, of course, for many of us that's not a fair trade-off.
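A quick-and-dirty way to see what Google is actually getting is to request your robots.txt and a deliberately bogus filename and compare the status codes. If the bogus file comes back 200 instead of 404 (because your ErrorDocument/redirect is serving a normal page), that's the problem. A little Python sketch, assuming example.com stands in for your domain:

  import urllib.error
  import urllib.request

  def status_of(url):
      """Return the HTTP status code the server sends back for a URL."""
      try:
          with urllib.request.urlopen(url) as resp:
              return resp.status
      except urllib.error.HTTPError as e:
          return e.code

  site = "http://www.example.com"   # placeholder - your domain here
  print("robots.txt ->", status_of(site + "/robots.txt"))
  # This file shouldn't exist: a healthy setup returns 404, a soft-404 setup returns 200
  print("bogus file ->", status_of(site + "/no-such-page-12345.html"))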

6) There is some time overhead, but if you don't update your site daily it won't be a big chore. A programmer or admin can set up a cron job on your server to rebuild the sitemap file automatically whenever the site is updated (pages added, deleted, etc.). With a little coding you can make this work very well. Even if you run the Python script manually, it's only a minor inconvenience.
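If you go the cron route, it can be as simple as one line in your crontab. The path, schedule and --config flag below are assumptions based on how Google's script is usually run - adjust them to your setup and check the script's own docs:

  # Rebuild the sitemap nightly at 3:15am (add a ping step if the script doesn't do it for you)
  15 3 * * * cd /home/you/sitemap && python sitemap_gen.py --config=config.xml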

If anyone wants to discuss, get some advice (I'm no guru, but if I can't help directly I can at least point you to places that can) or get tips on starting out, feel free to hit me up. ICQ 312104564

I'll also be in Phoenix. You'll see me at the poker tables getting trounced, probably wearing a red "Canada" hat. Hey, I'm a homer. In any event, if anyone wants to talk Sitemaps there, I'd love to.

Lastly, before doing *anything*, thoroughly go through the Sitemaps FAQ or get a techie to do it. You don't want to screw things up - any reasonable mistake can be corrected, but it will be a pain until it is.

Good luck