Tools and Tactics for Diagnosing Technical Search Issues Vanessa Fox
 
Diagnostic Checklists and Resources Search Accessibility Checklist Search Discoverability Checklist Diagnostic Tools janeandrobot.com
Search Engine Tools (created by NineByBlue.com)
Google Webmaster Central: http://www.google.com/webmasters
Microsoft Live Search Webmaster Center: http://webmaster.live.com
Yahoo! Site Explorer: http://siteexplorer.search.yahoo.com
Google Analytics: http://www.google.com/analytics
Google Search: http://www.google.com
Ranking and Diagnostic Tools (created by NineByBlue.com)
SEOBook Rank Checker: http://tools.seobook.com/firefox/rank-checker/
Firefox Web Developer Toolbar: https://addons.mozilla.org/en-US/firefox/addon/60
Firefox Firebug: http://getfirebug.com/
Firefox Live HTTP Headers: https://addons.mozilla.org/en-US/firefox/addon/3829
Google Search
Microsoft adCenter Keyword Forecast: http://adlab.msn.com/Keyword-Forecast/default.aspx
http://janeandrobot.com/resources
How Search Engines Work
Crawling: discover links; check robots rules; bandwidth considerations; URLs
Indexing: canonicalization; context extraction; topic association; web-wide value
Ranking: relevance; value; uniqueness; display
Search Engine Crawlers Haven’t Quite Grown Up Yet
Crawling: lack of discovery; crawl inefficiency; URL issues (infinite, redirects, dynamic); inaccessible links
Indexing: duplication; extraction issues; lack of exposed content; non-optimized media
Ranking: display issues; lack of quality links; guidelines violations; non-focused content
Step 1: Get the Data
Pages crawled
Pages indexed
Web traffic
Key ranking metrics
Crawling: Which pages have the search engines crawled? What kind of pages are they?
Indexing: Has the search engine indexed all of the crawled pages?
Ranking: How's the search engine traffic?
Benchmarking
Top ten queries that bring search traffic
Search results position
URL that ranks
 
Crawl Issues
Crawl Log Example: Apache Log Analyzer 2 Feed
http://code.simonecarletti.com/wiki/apachelog2feed

Filter by the full Googlebot user-agent string:

    /**
     * @see ApacheLogAnalyzer2Feed
     */
    require_once 'ApacheLogAnalyzer2Feed.php';

    // create a new instance, parse access.log, and write test.xml
    $tool = new ApacheLogAnalyzer2Feed('access.log', 'test.xml');
    // select entries matching the Googlebot user agent
    $tool->addFilter('User-Agent', 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
    // run
    $tool->run();

Filter with regular expression patterns:

    /**
     * @see ApacheLogAnalyzer2Feed
     */
    require_once 'ApacheLogAnalyzer2Feed.php';

    // create a new instance, parse access.log, and write test.xml
    $tool = new ApacheLogAnalyzer2Feed('access.log', 'test.xml');
    // select entries whose user agent matches a regular expression pattern
    $tool->addFilter('User-Agent', 'regexp:Googlebot');
    // select entries with Request matching a regular expression pattern
    $tool->addFilter('Request', 'regexp:/site/profile\.php');
    // run
    $tool->run();

All Pages Google's Crawled / All Profile Pages Google's Crawled
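If the ApacheLogAnalyzer2Feed library isn't available, the same Googlebot filtering can be sketched directly. This is a minimal, hypothetical Python example (not from the slides; the log format assumed is Apache combined) that counts Googlebot requests per URL:

```python
import re
from collections import Counter

# Combined log format: IP - - [date] "METHOD /path HTTP/1.1" status size "referrer" "user-agent"
LINE_RE = re.compile(r'"(?P<method>\w+) (?P<path>\S+) [^"]*" \d+ \S+ "[^"]*" "(?P<agent>[^"]*)"')

def googlebot_hits(log_lines):
    """Count requests per URL path where the user agent claims to be Googlebot."""
    hits = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m and "Googlebot" in m.group("agent"):
            hits[m.group("path")] += 1
    return hits

# Two illustrative log lines: one Googlebot fetch, one ordinary browser visit.
sample = [
    '66.249.66.1 - - [12/Jun/2009:10:00:00 -0700] "GET /site/profile.php HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [12/Jun/2009:10:00:01 -0700] "GET /site/profile.php HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows)"',
]
print(googlebot_hits(sample))
```

Note that a user agent claiming to be Googlebot isn't proof it is Googlebot; for a real audit, verify the IP with a reverse DNS lookup.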
Communicating with Search Robots
Extractable Link Issues: Flash
Extractable Link Issues: Images
Extractable Link Issues: AJAX
Extractable Link Issues: URL Errors
Extractable Link Issues: URLs That Expire
URL Discovery Checklist
Comprehensive external links
At least one internal link to every page
XML Sitemap referenced in robots.txt with the comprehensive list of canonical URLs
Comprehensive HTML sitemap
Ensure links load without JavaScript, images, or other rich media
Ensure robots.txt and the meta robots tag are used correctly
http://janeandrobot.com/library/managing-robots-access-to-your-website
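The robots.txt item on this checklist can be spot-checked with Python's standard urllib.robotparser. A minimal sketch; the rules and URLs below are illustrative, not from the slides:

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt, including the Sitemap directive the checklist recommends.
robots_txt = """\
User-agent: *
Disallow: /print/
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Verify that content pages are crawlable and print versions are blocked.
print(parser.can_fetch("Googlebot", "http://www.example.com/page1.php"))        # allowed
print(parser.can_fetch("Googlebot", "http://www.example.com/print/page1.php"))  # blocked
```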
URL Structure Checklist
Keep the number of parameters in dynamic URLs short
Don't use temporary URLs that expire
Ensure redirects are 301s and redirect chains are short
Use dashes rather than underscores when separating words
Use keywords in URLs for higher click-through and better anchor text
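The dashes-plus-keywords advice can be illustrated with a small slug helper. This is a hypothetical sketch, not part of the talk:

```python
import re

def make_slug(title):
    """Turn a page title into a keyword-rich, dash-separated URL slug."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)  # collapse spaces/punctuation into single dashes
    return slug.strip("-")

print(make_slug("Diagnosing Technical Search Issues"))
# diagnosing-technical-search-issues
```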
Canonicalization Checklist
Have only one URL for each page
Put all unneeded details (session IDs, tracking parameters) in cookies rather than URLs
Don't allow infinite parameters
Use 301 redirects for any URL changes
301 redirect between www and non-www
Use absolute URLs for internal links
Ensure the canonical version is in the XML Sitemap
Use the rel=canonical attribute for optional parameters
Block print and other alternate versions with robots.txt
http://janeandrobot.com/library/url-referrer-tracking
http://searchengineland.com/canonical-tag-16537
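The first few checklist items amount to a URL-normalization rule. A minimal Python sketch of that rule, assuming www is the canonical host; the tracking parameter names are illustrative and would differ per site:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative parameters that belong in cookies/analytics, not the canonical URL.
TRACKING_PARAMS = {"sessionid", "utm_source", "utm_medium", "utm_campaign", "ref"}

def canonicalize(url):
    """Normalize a URL to one canonical form: lowercase www host, no tracking parameters."""
    scheme, host, path, query, _fragment = urlsplit(url)
    host = host.lower()
    if not host.startswith("www."):
        host = "www." + host
    params = [(k, v) for k, v in parse_qsl(query) if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((scheme, host, path, urlencode(params), ""))

print(canonicalize("http://Example.com/page1.php?id=7&utm_source=feed&sessionid=abc"))
# http://www.example.com/page1.php?id=7
```

In practice this logic lives in the server's rewrite rules (issuing 301s), so every variant resolves to a single indexable URL.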
Crawl Efficiency Checklist
Ensure page load times aren't so slow as to reduce the number of pages crawled
Ensure the server is responsive
Return a 304 for unchanged content
Use compression
Return a 404 for not-found content
Ensure each page has at least one link
Avoid infinite redirects and redirect loops
Ensure the most important pages are linked from the home page
No JavaScript redirects or meta refresh redirects (if possible)
Reasonable crawl-delay setting (if used at all)
Reasonable use of the Google Webmaster Tools crawl setting
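The "return a 304 for unchanged content" item is the conditional-GET protocol: when the crawler sends If-Modified-Since and the page hasn't changed since then, the server answers 304 with no body, saving bandwidth. A sketch of just that decision, using standard-library date parsing (the function name is hypothetical):

```python
from email.utils import parsedate_to_datetime

def response_status(if_modified_since, last_modified):
    """Decide between 200 and 304 for a conditional GET.

    Both arguments are HTTP-date strings; returns the status code to send.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full page
    if parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since):
        return 304  # content unchanged since the crawler's cached copy
    return 200

print(response_status("Fri, 12 Jun 2009 10:00:00 GMT", "Thu, 11 Jun 2009 09:00:00 GMT"))
# 304
```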
Indexing Issues
Indexing Example: XML Sitemaps
http://sitemaps.org

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
      </url>
      <url>
        <loc>http://www.example.com/page1.php</loc>
      </url>
      <url>
        <loc>http://www.example.com/page2.php</loc>
      </url>
    </urlset>

http://www.google.com/webmasters
Pages Indexed From Sitemap
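A Sitemap in this format is simple enough to generate from a site's list of canonical URLs. A minimal Python sketch using only the standard library (real sitemaps may also carry optional lastmod/changefreq/priority elements, omitted here):

```python
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Build a minimal sitemaps.org-format XML Sitemap from a list of canonical URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(url)}</loc></url>" for url in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>"
    )

print(build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/page1.php",
    "http://www.example.com/page2.php",
]))
```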
Duplicate Content Issues
Partner Content http://www.google.co.uk/search?q=%22The+Radisson+Edwardian+Vanderbilt+Hotel+stands+among+a+row+of+Victorian+townhouses+located+in+the+fashionable+Kensington+district+of+London,+England%22&hs=cN0&filter=0
Indexing Diagnostic Checklist
Have the pages ever been indexed?
If deindexed, are you sure they are no longer in the index?
Is the indexing loss across all engines?
What was the percentage of loss? Is there a pattern?
Check Google Webmaster Tools for errors/blocking
Did you change infrastructure/CMS or implement redirects?
What's the linking pattern?
Indexing Checklist: Content Extraction
Ensure content is in text wherever possible
Ensure text isn't hidden in: JavaScript/AJAX, Flash, video, images
Avoid multiple URLs for the same page and very similar pages
Indexing Checklist: Semantic Markup
Use keywords in the title tag
Ensure each page has a unique meta description tag
Use keywords in a (single) H1 tag
Appropriate use of H2–H6 tags
Relevant anchor text in a href tags
Put JavaScript in a .js file (except onclick event functions) and style details in .css
Validate HTML to ensure it renders
Provide focus for each page
Ensure pages provide unique and valuable content beyond the boilerplate template and reused content
Optimizing Images
Don't put text in images
Use descriptive ALT text
Use descriptive filenames
Provide a caption and surrounding text
Be cautious about logo images
Consider blocking non-useful images with robots.txt
Don't provide alternate text using CSS that styles the text off the page (such as text-indent: -9999px)
http://janeandrobot.com/post/Effectively-Using-Images.aspx
Ranking Issues
How's the Search Engine Traffic?
Overall percentage
Percentage non-branded
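The two figures on this slide come straight from analytics data. A hypothetical sketch of the arithmetic (all visit counts below are made up for illustration):

```python
def traffic_metrics(total_visits, search_visits, branded_search_visits):
    """Return (overall search share %, non-branded share of search traffic %)."""
    overall_pct = 100.0 * search_visits / total_visits
    non_branded_pct = 100.0 * (search_visits - branded_search_visits) / search_visits
    return overall_pct, non_branded_pct

# Hypothetical: 10,000 total visits, 4,000 from search, 1,000 of those on brand queries.
print(traffic_metrics(10000, 4000, 1000))  # (40.0, 75.0)
```

The non-branded share matters because branded traffic would likely find the site anyway; non-branded traffic is what SEO work actually wins or loses.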
Do You Rank For the Right Things?

    Query:        arbor   snowboards   snowboard
    Google:       1       49           500+
    Yahoo:        1       80           500+
    Live Search:  3       128          500+
If ranking loss…
Drop for all keywords:
  Does the site rank for different queries than before?
  Did you substantially change the site content?
  Did you change the underlying site infrastructure?
  Was there a large change in linking behavior?
  Could there be a penalty?
Drop for only some keywords:
  Do different pages rank highest than the ones that used to rank?
  Are the pages that used to rank still indexed?
Ranking Checklist
Relevance: What is the page about? Are the pages ranking for the desired query more relevant? Do the pages use the language of the searcher?
Value: How many relevant links point to the page (and how authoritative are they)? What's the value of the page? (Do more useful pages rank above yours?)
SERP display: Are the title and snippet compelling? Do Sitelinks appear for navigational queries? What universal elements appear on the page? Does the site rank for non-branded queries?
The Webmaster Guidelines
http://google.com/support/webmasters/bin/answer.py?answer=35769
Common definition of spam:
On-page schemes: keyword stuffing, fake/stolen content, hidden text, hidden links, cloaking
Linking schemes: paid links, link exchanges, doorway pages, deceptive redirects
Getting Out of the Penalty Box
Check if you've been penalized: Live Search: http://webmaster.live.com; Google: http://google.com/webmasters ** maybe **
Review the webmaster guidelines (Google, Live Search, Yahoo)
Identify the issue
Fix it!
Request re-evaluation: Google: http://google.com/webmasters; Live Search: http://webmaster.live.com
Traffic Issues
Traffic Drop
Display Issues
Would you click this link?
Does the Result Inspire Clicks?
First step in diagnosis: find the root
Ninebyblue.com
Twitter.com/vanessafox
Jane and Robot Developer Summit, June 12th, 2009 – San Francisco. FREE for SMX attendees!
janeandrobot.com
Twitter.com/janeandrobot

Diagnosing Technical Issues With Search Engine Optimization


Editor's Notes

  • #2: Ask who in the audience is an SEO, a developer, or in product, and who already knows a lot about SEO.