5. Google’s Crawl Budget
Crawl Rate
How often the site is visited. This is determined by:
• Crawl health
What is good health?
• Quick response time
• Limited server errors
(A quick way to check this from your server logs is sketched below.)
Fast site = increased crawl rate.
Slow site = Google takes it easy, slows down, visits less.
Crawl Demand
Popularity:
• How often URLs are visited from the web
• Removal of stale URLs to keep the SERPs fresh
There are two elements that decide how much crawl budget a website gets:
Crawl Rate + Crawl Demand = Crawl Budget
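Crawl health can be checked roughly from your access logs by looking at how many Googlebot requests end in server errors (response times need an extended log format, so they are left out here). A minimal Python sketch, assuming a standard combined access log named access.log and trusting the user-agent string rather than verifying Googlebot by reverse DNS:

import re
from collections import Counter

# Pull the requested URL and status code out of a combined-log-format line.
LINE = re.compile(r'"[A-Z]+ (?P<url>\S+) HTTP/[^"]*" (?P<status>\d{3})')

statuses = Counter()
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:        # user-agent match only; an assumption
            continue
        match = LINE.search(line)
        if match:
            statuses[match.group("status")] += 1

total = sum(statuses.values())
errors = sum(n for code, n in statuses.items() if code.startswith("5"))
if total:
    print(f"Googlebot requests: {total}, server errors: {errors} ({errors / total:.1%})")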
6. Factors affecting crawl budget
• Having many low-value-add URLs has a negative impact on budget. These include:
• Faceted navigation & session identifiers (a quick way to surface these is sketched below)
• On-site duplicate content
• Low-quality & spam content
• Soft error pages (pages that return 200 OK but only show an error message)
• Hacked pages
• Infinite spaces & proxies
Source: Google Webmaster Central Blog
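Faceted navigation and session identifiers usually show up as one path with many query-string variants, so grouping crawled URLs by path and counting the variants is a quick way to size the problem. A minimal sketch, assuming a plain urls.txt export (one URL per line) from whatever crawler or log tool you use:

from collections import defaultdict
from urllib.parse import urlsplit

variants = defaultdict(set)
with open("urls.txt") as f:
    for raw in f:
        url = raw.strip()
        if not url:
            continue
        parts = urlsplit(url)
        if parts.query:                      # only parameterised URLs are of interest
            base = f"{parts.scheme}://{parts.netloc}{parts.path}"
            variants[base].add(parts.query)

# Paths with many query-string variants are the usual low-value-add suspects.
for base, queries in sorted(variants.items(), key=lambda kv: -len(kv[1]))[:20]:
    print(len(queries), base)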
7. Why give a jot about the Bot:
Wasting server resources on pages like those mentioned will drain
crawl activity from pages that do actually have value.
This may cause a significant delay in discovering great content on
your or a client’s site.
8. Crawl Rate Limit
The crawl rate limit is designed to stop Google crawling your pages so much and so fast that it hurts your server.
In other words, if the Bot thinks your site can’t cope, it will take it easy. It will crawl more slowly and therefore reach less of your site.
9. Crawl Frequency
Crawl Frequency is the number of days per month that Googlebot
requests a URL. There is a clear relationship between traffic and
frequency of crawl, making this a critical SEO indicator.
Understanding what Google is crawling frequently is a good indicator of what Google thinks is worthwhile and needs to keep fresh in the index.
Understanding the characteristics of those pages can inform what you
might need to do to improve the remainder.
Source: Botify
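Botify measures this from log files; the same idea can be sketched by hand. A minimal Python example that counts distinct crawl days per URL per month, assuming you have already reduced your logs to (date, URL) pairs of Googlebot hits (the sample data below is illustrative only):

from collections import defaultdict
from datetime import date

hits = [
    (date(2023, 10, 1), "/"),
    (date(2023, 10, 1), "/category/widgets"),
    (date(2023, 10, 9), "/"),
    (date(2023, 10, 23), "/"),
    (date(2023, 10, 23), "/old-press-release"),
]

# {(year, month): {url: set of days on which the URL was crawled}}
crawl_days = defaultdict(lambda: defaultdict(set))
for day, url in hits:
    crawl_days[(day.year, day.month)][url].add(day.day)

for (year, month), urls in sorted(crawl_days.items()):
    print(f"{year}-{month:02d}")
    for url, days in sorted(urls.items(), key=lambda kv: -len(kv[1])):
        print(f"  crawled on {len(days)} day(s): {url}")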
10. Factors affecting crawl frequency
• Things that can reduce crawl frequency include:
• Site structure issues
• Duplicate content
• Publishing pages for which there’s no demand
• Publishing faster than Google is willing to admit pages to the index
Pages with more internal links tend to be crawled more frequently (a quick inlink count is sketched below).
Source: Botify
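Internal links are one of the few levers here that we fully control, so it helps to measure them. A minimal sketch, assuming a links.csv export of source_url / target_url pairs from your crawler of choice (the file name and column names are assumptions):

import csv
from collections import Counter

inlinks = Counter()
with open("links.csv", newline="") as f:
    for row in csv.DictReader(f):
        inlinks[row["target_url"]] += 1    # one inlink per (source, target) row

# Pages at the bottom of this list are candidates for extra internal links.
for url, count in inlinks.most_common():
    print(f"{count:4d}  {url}")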
11. Why give a jot about the Bot:
• Crawl budget is a precious resource. We need to use it wisely, especially on larger sites.
• The more pages crawled, the more pages that may be indexed.
• More bot energy is focussed on high-value pages rather than poor, low-value content.
• New content is discovered quickly and easily, giving it a chance to rank sooner.
12. What we want to achieve:
A well-organized site in which the most important content is easily accessible
from the homepage and other important entry points.
A speedy site running on healthy servers, so the Bot can get more content over the same number of connections.
13. What do we need to do?
Thin Pages
• Use Deepcrawl to identify thin pages
• Improve these pages where possible
• Consider combining with another page or removing
Duplicate Content
• Always do a ‘site:’ search for a variety of keywords related to your new page / content
• Make sure your content is unique
• Make sure your content sits logically in the site’s hierarchy
Internal Linking
• Remove orphaned pages by linking to them from relevant pages, ideally those that are indexed and do well (a quick orphan check is sketched below)
• Make sure important pages are linked to widely, from relevant variations of optimised anchor text
• Make sure sitemaps are up-to-date
• Use Search Console to discover the most linked-to pages & make sure these reflect your priorities
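One rough orphan check is to diff the sitemap against the pages that actually receive internal links. A minimal sketch, assuming a standard sitemap.xml and a linked_urls.txt list of internal link targets (both file names are assumptions; the link targets could come from your crawler’s link export):

import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# All URLs the sitemap says should be crawled and indexed.
tree = ET.parse("sitemap.xml")
sitemap_urls = {loc.text.strip() for loc in tree.findall(".//sm:loc", SITEMAP_NS) if loc.text}

# All URLs that receive at least one internal link.
with open("linked_urls.txt") as f:
    linked_urls = {line.strip() for line in f if line.strip()}

orphans = sitemap_urls - linked_urls
print(f"{len(orphans)} of {len(sitemap_urls)} sitemap URLs have no internal inlinks:")
for url in sorted(orphans):
    print(" ", url)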