Earlier this month, Google stopped providing the “not selected” statistic in the advanced section of “Index Status”. As a webmaster and SEO specialist, I found this statistic very helpful for seeing how many pages were being filtered out. The “not selected” category was pretty broad. Google can choose not to select a page for a number of reasons, but the most common were poor content and duplicate content. Although Google never gave us a list of which pages were not selected, the statistic did give us a general idea of how much of a site’s content Google considered poor.
I believe this statistic was removed for exactly the reason it was so helpful. It allowed webmasters to test the waters and see which content worked for their site and which didn’t. My guess is that Google was not happy about giving webmasters an inside look into its algorithm. If you monitored your indexed and not selected counts closely, you could use those numbers to test new pages, which gave you a window into what Google was specifically looking for in terms of content. Since content has been a growing factor in Google’s algorithm over the past two years, it makes sense that Google would want to protect it. By eliminating our ability to measure new content, Google is able to protect the largest part of its search engine equation. Although there has been no official word from Google on why this feature was removed, I strongly believe it was to protect the algorithm.
While I am sad to see this section go, there are still ways to measure which pages of your site are being read by Google. First, I recommend generating an XML sitemap of your website. This will give you a blueprint of every page on your site. It can be built by hand or with the aid of a sitemap generator. Next, go to Google.com and run a site search of your website by searching for “site:www.yourdomain.com”. This search will list the URLs that Google has indexed for your site. The last step is comparing your XML sitemap list to the list generated by Google. The URLs that do not appear in your Google column are the pages that Google has either not read or not selected. This simple exercise can help you determine how many pages still need to be indexed or need more work.
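If your site has more than a handful of pages, the comparison in the last step can be scripted. Below is a minimal Python sketch of that idea, not a finished tool. It assumes you have your sitemap as XML text and that you have copied the URLs from the “site:” search into a list by hand. The function names, the sample URLs, and the trailing-slash normalization are my own illustrative choices.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def normalize(url):
    """Trim whitespace and a trailing slash so near-identical URLs match.

    Assumption: treating '/about' and '/about/' as the same page is
    acceptable for your site. Adjust if it is not.
    """
    return url.strip().rstrip("/")


def sitemap_urls(sitemap_xml):
    """Return the set of normalized <loc> URLs found in a sitemap."""
    root = ET.fromstring(sitemap_xml.strip())
    return {
        normalize(loc.text)
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_index(sitemap_xml, indexed_urls):
    """Return sitemap URLs that do not appear in the indexed list, sorted."""
    indexed = {normalize(u) for u in indexed_urls}
    return sorted(sitemap_urls(sitemap_xml) - indexed)


# Hypothetical sample data for illustration only.
EXAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.yourdomain.com/</loc></url>
  <url><loc>http://www.yourdomain.com/about</loc></url>
  <url><loc>http://www.yourdomain.com/blog/new-post</loc></url>
</urlset>
"""

# URLs copied by hand from the "site:" search results (hypothetical).
EXAMPLE_INDEXED = [
    "http://www.yourdomain.com",
    "http://www.yourdomain.com/about/",
]

if __name__ == "__main__":
    for url in missing_from_index(EXAMPLE_SITEMAP, EXAMPLE_INDEXED):
        print("Not found in Google's results:", url)
```

With the sample data, the only URL reported is the new blog post, which is the kind of page you would then review for thin or duplicate content.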
I would recommend performing this type of exercise quarterly. If you are engaged in a blog program or you are adding new pages at a high rate, you may want to run it every two months. How often you complete this exercise really depends on how many new pages your website produces each year.
By Matthew Wilkos