Search Engine Optimization and Marketing for E-commerce

Yahoo officially rejects use of META Keywords tag

by Andrew Kagan 10. October 2009 11:00

Following up on Google's announcement a few weeks ago that it gave zero relevance to the META Keywords tag in its search engine rankings, #2 search engine Yahoo announced at SMX East this past week in NYC that it, too, gave no relevance to this deprecated tag.
Of the three major search engine databases, only Inktomi (which was purchased by Yahoo in 2002) had continued to use this META tag in its search rankings, although its relevance was vastly diminished over time.
In an open Q&A session during the conference, Cris Pierry, senior director of search at Yahoo, announced that support for the META Keywords tag in Yahoo's search engine in fact had ended several months ago.
Google has always maintained that it never supported the keywords tag, but made the official announcement recently to dispel lingering rumors about it. AltaVista officially dropped support of the tag in 2002.
Besides Yahoo, the Inktomi engine had provided ranking to MSN, AOL and others over the years. Microsoft shifted to its own Bing search engine earlier this year, and the #3 search provider has already announced that it grants no relevance to the keywords tag.
Still, many SEOs cling to the belief that somewhere out there the keywords tag may still have relevance. Expending any effort on this tag is a waste of time at this point, and a waste of client money that could be better spent on multivariate testing of pages to improve search engine rank.
Another argument defending use of keywords is that it provides additional keyword variant matching that might not be incorporated into the content of the page...but this argument fails if the META data is disregarded entirely. A better approach would be to identify the keyword variants with greatest value and incorporate them into the page content...this is completely white-hat and when written properly will boost the relevance of the page.



It's official...Google Ignores the Keywords META Tag

by Andrew Kagan 21. September 2009 11:31

Google officially announced today that it does not use the META Keywords tag at all in its search rankings. While it certainly did use this META tag earlier in the search engine wars 5-10 years ago, the tag has lost all relevance (to Google) because of rampant spamming and abuse of this tag.

The META Keywords tag was part of the "essential" hidden META data that search engines originally relied on when indexing pages. It provided a shortcut repository in which to list all the search words a webmaster felt were relevant to the webpage they appeared on. Like all META data, the keywords were embedded in the HEAD area of the webpage and were invisible to everyone except the search engines themselves.

META Keywords had significant value before search engines had the capacity to read, parse and index the entire content of every page they visited. For a time, it was believed that properly crafted keywords added to the overall relevance and ranking of a page, when they were in close agreement with the page content itself, but Google is slamming the door on that notion moving forward.

Google hastened to clarify that not all META data is "bad" or "ignored", and took pains to remind everyone that they still use the "Description" META tag in search results, when it is the best content summary of the page itself. A good META description is still very important to SEO, because when properly written and displayed in search results, it provides additional information to the user that may influence their likelihood of clicking through on a link. Without this description, you are leaving it up to the search engine to try to find the sentence that best synopsizes the page, and the results are usually poor when this happens.
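
As a rough illustration of the point above, here is a minimal Python sketch that reads the META data out of a page's HEAD and reports whether a Description is present and whether a Keywords tag is still being maintained. The sample page, the `MetaTagParser` class and the `audit_meta` helper are made up for this example; it assumes nothing about how any search engine actually parses pages.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # META names are case-insensitive, so normalize to lowercase.
            self.meta[name.lower()] = content


def audit_meta(html):
    """Return the description text (or None) and whether keywords are present."""
    parser = MetaTagParser()
    parser.feed(html)
    return {
        "description": parser.meta.get("description"),
        "has_keywords": "keywords" in parser.meta,
    }


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Blue Widgets</title>
<meta name="Description" content="Hand-made blue widgets, shipped worldwide.">
<meta name="keywords" content="widgets, blue widgets, cheap widgets">
</head><body><p>Welcome.</p></body></html>"""

report = audit_meta(SAMPLE_PAGE)
```

Running something like this across a site is a quick way to find pages with no Description at all, which are the ones where the search engine has to pick its own summary sentence.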

So, from the horse's mouth, don't bother with the Keywords META tag, and focus your SEO skills on the Description tag instead.



Server Location matters for TLD and Local placement (Google)

by Andrew Kagan 8. June 2009 08:14

Google's Matt Cutts recently posted a video reply on whether a server's physical location affects search rankings...and it does!

Matt pointed out that in the early days of Google (ca. 2000) the only locational reference used was the TLD (top-level domain) of a website, so if your URL ended in ".FR" then it was assumed your website was franco-centric and would be more relevant than a website ending in ".UK". 

The explosion of TLDs of late makes it harder to pinpoint relevance based on URL, so Google is also using the IP address (and parent NetBlock) of the server to identify its location, so a server located in France will receive more weight for French queries than a server located elsewhere. How much this contributes to overall rank is debatable, but likely it's more important for local search results.

So if you have a web presence in multiple countries, it might make sense to locate servers locally to your markets...certainly it might improve the latency of queries (although again, it would depend on the ISP). There are also IPPs that offer hosting on multiple netblocks in specific territories to achieve the same effect.



Amazon's 20-million-URL Sitemap

by Andrew Kagan 15. May 2009 09:57

The 18th annual International World Wide Web Conference, WWW 2009, was held this past April 20-24 in Madrid, Spain, and it has become the premier event to publish research and development on the evolution of our favorite medium.

A fascinating (if you're a web geek) paper was presented by Uri Schonfeld of UCLA and Narayanan Shivakumar of Google called "Sitemaps: Above and Beyond the Crawl of Duty". The main thrust of the paper was that traditional web crawlers employed by search engines are becoming overwhelmed by the number of new websites and pages appearing daily on the web; by one count, there are more than 3 trillion (!) pages that need to be indexed, deduplicated, and tracked for inbound/outbound links.

The Sitemaps protocol is becoming more and more important to search engines as they try to prioritize and filter this mound of information, and part of the problem is the rise of large-scale CMS systems, which dynamically generate pages regardless of whether there's any real content in them or not. They used the example of Amazon.com, which for any given product will have dozens of subsidiary pages, such as reader reviews, excerpts, images, specifications. Even if there is no content, the link to a dynamically generated page will still return a page with no data in it, creating literally tens of millions of unique URLs at Amazon.com, which "dumb" crawlers must follow and index.

The Sitemap protocol defines an XML file format for search engines to use which not only lists all the URLs that should be indexed, but also provides information on how important the page is, how often it's updated, and when it was last updated. Search engines can use this file to rapidly index the important content and ignore what isn't there, improving accuracy and reducing the time taken to index a website. Every site should have a sitemap, but as of October 2008 it was estimated that only 35 million sitemaps have been published, out of billions of URLs.
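
To make the file format concrete, here is a minimal Python sketch that builds a small sitemap with the per-URL fields described above (`loc`, `lastmod`, `changefreq`, `priority`). The example.com URLs, dates and values are placeholders, and the `build_sitemap` helper is hypothetical; a real site would generate the entries from its own page database.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of URL entry dicts."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # <loc> is the only required child; the others are optional hints.
        ET.SubElement(url, "loc").text = entry["loc"]
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entries for illustration only.
sitemap_xml = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2009-05-15",
     "changefreq": "daily", "priority": 1.0},
    {"loc": "http://www.example.com/about", "changefreq": "monthly",
     "priority": 0.3},
])
```

Saving `sitemap_xml` as sitemap.xml (with an XML declaration in front) gives a file in the shape the protocol describes.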

Amazon makes a concerted effort to publish accurate sitemap data, as it dramatically reduces the time required to index new content. Even so, Amazon's robots.txt file lists more than 10,000 sitemap files, each holding between 20,000 and 50,000 URLs, for a total of more than 20 million URLs on Amazon.com alone! The authors note that there is still a lot of content duplication and null content pages there, but the number is staggeringly large. After monitoring URLs on another website, they also noted that sitemap crawlers picked up new content significantly faster over time than when using the simple "discovery" method used when there is no sitemap file.
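
Sitemap files are advertised in robots.txt with `Sitemap:` lines, so counting them for any site is straightforward. Below is a minimal Python sketch that pulls those lines out of a robots.txt body; the sample file and the `sitemap_urls_from_robots` helper are invented for illustration and are not taken from Amazon's actual robots.txt.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons survive.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


# Hypothetical robots.txt content.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /cart/
Sitemap: http://www.example.com/sitemap-products-1.xml
sitemap: http://www.example.com/sitemap-products-2.xml  # second file
"""

found = sitemap_urls_from_robots(SAMPLE_ROBOTS)
```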

We said before that every website should have a properly constructed sitemap, as it will improve the quality and accuracy of search engines as a whole. Beyond creating the sitemap, registering it with major search engines will provide valuable feedback for the webmaster on crawl and index rates, and provide insights into what the search engine "sees" when it looks at your website. Please create a sitemap for your website today, or just ask us if you need help!



Auto-Submitting Sitemaps to Google...Necessary?

by Andrew Kagan 1. May 2009 10:13

Google's Webmaster Tools provides webmasters with a way to upload XML sitemaps to improve the accuracy of Google's index. Registering and maintaining an accurate sitemap (Google, Yahoo, and Microsoft all accept sitemap data) is important to proper indexing of your website pages, and Google provides two methods for notifying them when the sitemap is updated: manually through the Google website, and "semi-automatically" by sending an HTTP request that signals Google to reload the sitemap.

Ping me when you're ready

The second method can be automated through server-side scripting, so that when content on a website or blog is updated, the sitemap file is updated as well, and the update request is sent to Google at the same time. In theory, this should provide rapid updating of Google's index to include the latest content on your website.
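
As a sketch of what that server-side script might look like, the Python below builds the update request URL for a given sitemap. The ping address shown is an assumption based on what Google documented around the time of writing, so check the current Webmaster Tools documentation before relying on it; `build_ping_url` is a hypothetical helper, and the example only constructs the URL without sending anything.

```python
from urllib.parse import urlencode

# Assumption: ping address as documented by Google around the time of writing.
# Verify against current documentation before use.
GOOGLE_PING_ENDPOINT = "http://www.google.com/webmasters/tools/ping"


def build_ping_url(sitemap_url, endpoint=GOOGLE_PING_ENDPOINT):
    """Return the URL to request in order to signal that a sitemap changed."""
    # The sitemap location must be URL-encoded inside the query string.
    return endpoint + "?" + urlencode({"sitemap": sitemap_url})


ping_url = build_ping_url("http://www.example.com/sitemap.xml")
```

A CMS or blog engine would call something like this right after regenerating the sitemap file, then issue a plain HTTP GET against `ping_url`.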

Depending on a number of factors, Google will automatically reload your sitemap file without you specifically requesting it to do so. One factor is the content of the sitemap itself. Besides a list of URLs on your website, the sitemap file can also hold information about the date the URL was last updated, and how frequently it is updated. For example, if your homepage content changes every day, you can assign a frequency of "daily" to that URL, telling the search engine it should check that page every day.

It should be noted that incorrect use (or "abuse") of a sitemap, such as indicating pages are new when the content hasn't changed, can cause problems if the search engine recrawls the page too many times without seeing any new data. Empirical data have shown that pages may be dropped from the search engine index under this scenario, and new pages added to this "unreliable" sitemap may be ignored or crawled more slowly.
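
One way to avoid flagging unchanged pages as new is to tie the last-modified date to a fingerprint of the page content. The Python sketch below shows the idea; the record layout and the `update_lastmod` helper are assumptions made for this example, not part of the Sitemap protocol.

```python
import hashlib


def update_lastmod(record, new_content, today):
    """Return the sitemap record, bumping lastmod only if the content changed.

    record      -- dict with optional "digest" and "lastmod" keys (hypothetical layout)
    new_content -- current page text
    today       -- date string to store when a change is detected
    """
    digest = hashlib.sha256(new_content.encode("utf-8")).hexdigest()
    if record.get("digest") == digest:
        # Same content as last time: leave lastmod alone.
        return record
    return {**record, "digest": digest, "lastmod": today}


# The date only moves when the text actually changes.
first = update_lastmod({"loc": "http://www.example.com/"}, "hello", "2009-05-01")
second = update_lastmod(first, "hello", "2009-05-02")
third = update_lastmod(second, "hello world", "2009-05-03")
```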

It's a popularity contest

Another factor in sitemap reloading is link popularity. If a lot of websites are linking to particular pages on your website, search engine spiders will crawl those pages more often, and if the site is large, the sitemap will help prioritize which pages are crawled first.

To Submit, or Not to submit...

We have seen that once a sitemap is submitted and indexed by search engines, they will regularly come back and reload the sitemap looking for new URLs, whether you re-submit it or not. As your website's pagerank (on Google) and general link popularity grows, there's an increase in the frequency that the sitemap will be reloaded, without your taking any action. So do you need to submit it manually or automatically?

The answer is "it depends". Google itself warns webmasters not to resubmit sitemaps more than once per hour, probably because that's as fast as it's going to process the changes and redirect Googlebot to the URLs in the sitemap. If you are auto-submitting sitemaps more than once an hour, the "punishment" could range from the SE ignoring the subsequent re-submits, to something more dire...but no one really knows the consequences. It would probably be safer to resubmit sitemaps on a regular schedule, but we do not have any hard data about this at this time.

When you should re-submit a sitemap

So when should you re-submit a sitemap? The obvious answer is whenever your content changes, but not more than once an hour. Google does not yet provide an API to query when it last loaded your sitemap, although you can see this data in its Webmaster Tools. If you have some very timely news that the SE really needs to know about, then resubmit the sitemap. It may not increase the crawl rate, but it may impact which URLs are crawled first.
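
The "whenever content changes, but not more than once an hour" rule is easy to enforce in an auto-submit script. Here is a minimal Python sketch; the function name and the use of plain timestamps in seconds are choices made for this example only.

```python
# Google's stated limit: no more than one resubmission per hour.
MIN_INTERVAL_SECONDS = 60 * 60


def should_resubmit(content_changed, last_submit_time, now):
    """Decide whether to resubmit the sitemap.

    content_changed  -- True if the site content changed since the last submit
    last_submit_time -- timestamp (seconds) of the last submit, or None if never
    now              -- current timestamp (seconds)
    """
    if not content_changed:
        return False
    if last_submit_time is None:
        return True
    return now - last_submit_time >= MIN_INTERVAL_SECONDS
```

For example, `should_resubmit(True, 1000, 1000 + 1800)` is False because only half an hour has passed, while the same call with `1000 + 3600` returns True.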

The bottom line is that sitemaps are becoming increasingly important to search engines to help them prioritize the content they crawl, so use them, don't abuse them, help the internet be a better place!



Sitemap Crawling (cont'd)

by Andrew Kagan 30. April 2009 08:59

Following up on yesterday's post about Googlebot and crawlers, I see that Googlebot is coming back to read the sitemap on a regular basis without needing to submit it...probably based on the update-frequency parameter specified in the sitemap...a good thing I hope, although Google is not adding more pages to the index yet.

Registering the sitemap with MSN/Live and Yahoo resulted in immediate crawls by the MSNbot and Slurp crawlers, which of course is a good thing...will see what the indexing rate is in a future post.


General | SEO

Sitemap download / crawl frequency update

by Andrew Kagan 29. April 2009 11:22

To follow up on my previous post, after the initial long delay before Google downloaded the sitemap.xml for the new website, the sitemap was resubmitted 24 hours later and Google downloaded it within minutes.

A quick review of the server logs showed Googlebot hitting the website shortly thereafter, which corresponds to behavior reported by Adam at BlogIngenuity. Adam also reported pages quickly appearing in Google's index, but no additional pages appear to have made it in yet for the new website. This could be attributable to time-of-day as well as how recently the website was added to Google's index. Presumably the page text is "in the hopper" and being processed (wouldn't a progress bar be a cool webmaster's tool?).

Appropriately tagging the sitemap file with date/frequency/importance data for each URL will probably build the site reputation in Google's index and hopefully prioritize content indexing. We know that the better a website's reputation, the faster Google will add pages to the index.


General | SEO

Mission Control we have liftoff!

by Andrew Kagan 29. April 2009 04:57

Launching the website was an interesting experiment in measuring Google's crawl rate. The domain had been parked at a registrar for some time, nearly a year, so Googlebot and other crawlers would have known about it, but would not have found any content. This may have been a negative factor in the subsequent crawl rate.

Before launching the website, all the appropriate actions were taken to ensure a rapid crawl and index rate:


  • Creation of all relevant pages, with informational pages of high quality and narrow focus
  • Implementation of appropriate META data
  • Validation of all links and HTML markup
  • Implementation of crawler support files such as robots.txt and an XML sitemap 

Finally, a sitemap was registered with Google and the site brought online...and then the waiting began.

  • It took more than two days (approx. 57 hours) after registering the sitemap for Google to actually parse it. Google found no errors.
  • It took three more days after parsing the sitemap for Googlebot to actually crawl the site. 
  • More than 24 hours after crawling the site, Google had added only three pages to its index.

It seems that the days of "launch today, indexed tomorrow" are in the past. Even with publishing a website based on Google's best practices, it seems that Google is somewhat overwhelmed at this point and crawl rates for new sites are being delayed.

Two unknowns:
  • Does leaving a domain parked for a long time negatively impact the initial crawl rate?
  • Does the TLD -- "COM", "NET", "PRO" -- affect the crawl rate? Does Google give precedence to well-regarded TLDs over new/marginal TLDs?

I will be testing these hypotheses with additional sites in the near future.



General | SEO
