How Does Google Know When to Stop Crawling a Site?
At SMX East, Google’s Webmaster Trends Analyst Gary Illyes shared a few factors that Googlebot takes into consideration when crawling a site.
One of the main factors that Google’s crawlers look at is how long it takes to connect to the server and load a page. If connecting to your site takes a long time, the crawlers will likely back off so that they do not overload your server.
A long load time is also detrimental to the user experience. No one appreciates having to wait forever for a website to finish loading, and if crawlers can’t access your pages, it will be very difficult to improve how those pages perform in search results.
Additionally, if Google’s crawlers encounter 5xx (server error) status codes, they will stop crawling your site.
However, just because crawlers have a tough time communicating with your server doesn’t mean they won’t crawl your site in the future. It simply means that Google does not want to overload your web server and will try again later.
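To make the back-off idea concrete, here is a minimal sketch of how a polite crawler might react to slow connections and 5xx responses. It is an illustration only, not Google’s actual logic: the `CrawlState` class, the `update_crawl_state` function, the 2-second threshold, and the doubling and halving of the delay are all assumptions made for this example.

```python
from dataclasses import dataclass

# All thresholds below are hypothetical values chosen for illustration.
SLOW_CONNECT_SECONDS = 2.0   # connections slower than this count as "slow"
MIN_DELAY_SECONDS = 1.0      # shortest pause between requests
MAX_DELAY_SECONDS = 3600.0   # longest pause between requests


@dataclass
class CrawlState:
    delay_seconds: float = MIN_DELAY_SECONDS  # pause before the next request
    paused: bool = False                      # True means "stop for now, retry later"


def update_crawl_state(state: CrawlState, connect_seconds: float, status_code: int) -> CrawlState:
    """Return the crawler's new state after one fetch attempt."""
    if 500 <= status_code <= 599:
        # Server error: stop crawling for now and wait longer before retrying.
        return CrawlState(min(state.delay_seconds * 2, MAX_DELAY_SECONDS), paused=True)
    if connect_seconds > SLOW_CONNECT_SECONDS:
        # Slow connection: keep crawling, but back off to reduce server load.
        return CrawlState(min(state.delay_seconds * 2, MAX_DELAY_SECONDS), paused=False)
    # Healthy response: resume crawling and ease the delay back down.
    return CrawlState(max(state.delay_seconds / 2, MIN_DELAY_SECONDS), paused=False)
```

In this sketch a 503 response pauses the crawl and doubles the wait, while a later fast, successful response clears the pause, which mirrors the "try again later" behavior described above.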
These two factors are not the only ones that Googlebot takes into consideration. One should not forget about robots.txt rules, nofollow attributes and disavowed links, which tell Google not to crawl a page, not to follow a link, or to ignore certain links pointing at your site.
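If you want to check what a robots.txt file actually permits, Python’s standard library includes a parser for it. The sketch below uses `urllib.robotparser` on a made-up robots.txt; the sample rules, the `example.com` URLs, and the helper name `is_crawl_allowed` are assumptions for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules without any network access
    return parser.can_fetch(user_agent, url)
```

For example, `is_crawl_allowed(SAMPLE_ROBOTS_TXT, "Googlebot", "https://example.com/private/page.html")` is rejected by the `Disallow: /private/` rule, while a URL under `/blog/` is allowed.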
To prevent problems with Google crawling your site, choose a reliable hosting provider and monitor your website’s load times frequently.
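Monitoring load times does not require special tooling to get started. The following is a rough sketch that times a single page fetch using only the standard library; the function names, the optional `fetch` parameter, and the 2-second threshold are assumptions for this example, and a real monitoring setup would take repeated measurements from several locations.

```python
import time
from urllib.request import urlopen


def measure_load_seconds(url, fetch=None, timeout=10.0):
    """Return how many seconds it takes to download url once.

    `fetch` is an optional callable taking the URL; it exists so the timing
    logic can be exercised without a network connection.
    """
    if fetch is None:
        def fetch(target):
            # Download the full response body so the timing covers the whole page.
            with urlopen(target, timeout=timeout) as response:
                return response.read()
    start = time.perf_counter()
    fetch(url)
    return time.perf_counter() - start


def is_slow(load_seconds, threshold_seconds=2.0):
    """Flag a measurement that exceeds the (hypothetical) threshold."""
    return load_seconds > threshold_seconds
```

Running `measure_load_seconds` on your own home page on a schedule and logging any result for which `is_slow` returns `True` is one simple way to spot hosting problems before they affect crawling.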