Why Is ClaudeBot Crawling Your Site 71,000 Times for Every Single Visitor It Sends?
ClaudeBot crawled 70,900 pages per referral in June 2025. GPTBot: 1,700:1. Here's the bot-by-bot breakdown of what AI crawlers are actually doing — and which ones are worth letting in.
There's a number buried in a June 2025 network analysis that should change how you think about AI bot access control: 70,900. That's how many pages ClaudeBot crawled for every single referral session it generated that week. GPTBot's ratio for the same period? 1,700:1. Both bots were hitting millions of sites, both showed up in server logs in volume — and neither was sending meaningful traffic back.
So what's the right question to ask about AI bots? Not "block or allow?" It's: which bots, doing what, for whom? Because the answer to that determines whether you're protecting your content, giving it away for free, or accidentally blocking the small slice of AI bots that might actually send you users.
Where does this data come from?
The crawl-to-referral ratios come from a June 2025 dataset measuring verified referral sessions against total crawl volume by user-agent. Essentially: for each bot, how many pages were fetched for every user who actually clicked through from the corresponding AI platform? A network provider's analysis of billions of requests, independent robots.txt research across millions of sites, and request-level data from a major hosting platform's logs fill in the broader picture.
What are AI bots actually doing when they hit your site?
Most site owners have one mental model: "AI bot hits my site, learns things, maybe cites me." That model isn't wrong — it just describes one of at least three distinct behaviours.
Take the bots from the company behind ChatGPT as an example. They disclose three separate user-agent strings:
- GPTBot — an offline training crawler. It fetches pages to build model datasets. There's no user behind it and no referral potential. When it's done, your content is in a data pipeline somewhere.
- OAI-SearchBot — crawls to build the ChatGPT Search index. Pages indexed here can surface in real-time results when users search inside ChatGPT. This category has actual referral potential.
- ChatGPT-User — fires only when a live user shares a URL directly in a chat session. The rarest category, but the one most likely to generate a real visit back to your site.
Each of these user-agents can be targeted by its own robots.txt group, so the rules are independent. You can Allow OAI-SearchBot while Disallowing GPTBot. Most sites aren't doing this. They make a single decision that applies to the entire bot family from one operator, treating all three as interchangeable when they're not.
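As a minimal sketch of that split, the snippet below embeds a hypothetical robots.txt that blocks GPTBot while leaving OAI-SearchBot and ChatGPT-User open, then checks it with Python's standard urllib.robotparser. The example.com URL and the exact rules are illustrative assumptions, not a recommended policy, and the standard-library parser may not match every crawler's own reading of robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group per user-agent from the same operator.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

# Placeholder URL used only to probe the rules.
TEST_URL = "https://example.com/any-page"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
    verdict = "allowed" if parser.can_fetch(agent, TEST_URL) else "blocked"
    print(f"{agent}: {verdict}")
```

Running a check like this before deploying a change is a cheap way to confirm that a rule aimed at the training crawler has not also caught the search crawler.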
What is all that AI crawl traffic actually for?
Aggregate data covering 12 months through mid-2025 puts it plainly: roughly 80% of all AI bot requests are training crawls. About 18% are search indexing. User-driven fetches — requests triggered by a live human — account for around 2%.
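To see what the split looks like on your own traffic, you could bucket logged user-agent strings by purpose. The sketch below assumes a hand-maintained mapping from user-agent token to category (taken from the three user-agents described earlier) and uses a tiny synthetic sample; the sample strings and resulting percentages are made up for illustration and are not measured data.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed mapping from user-agent token to purpose; extend it for other operators.
BOT_PURPOSE = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user",
}


def classify(user_agent: str) -> Optional[str]:
    """Return the purpose of a known AI bot, or None for anything else."""
    lowered = user_agent.lower()
    for token, purpose in BOT_PURPOSE.items():
        if token.lower() in lowered:
            return purpose
    return None


def purpose_split(user_agents: Iterable[str]) -> dict:
    """Percentage of recognised AI bot requests per purpose."""
    counts = Counter(p for p in map(classify, user_agents) if p)
    total = sum(counts.values())
    return {purpose: round(100 * n / total, 1) for purpose, n in counts.items()}


# Synthetic sample: simplified user-agent strings, not real log data.
SAMPLE = (
    ["Mozilla/5.0 (compatible; GPTBot/1.1)"] * 7
    + ["Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"] * 2
    + ["Mozilla/5.0 (compatible; ChatGPT-User/1.0)"]
    + ["Mozilla/5.0 (Windows NT 10.0) Firefox/128.0"]  # ordinary browser, ignored
)

split = purpose_split(SAMPLE)
print(split)
```

User-agent strings can be spoofed, so a production version would also want to verify requests against each operator's published IP ranges.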
That 80/18/2 split is why the referral numbers look so bleak for most bots. The dominant category of AI traffic is not designed to send you users; it is data acquisition for model training. Whether that eventually benefits you, through your content being surfaced in AI answers, depends on how aggressively each platform cites indexed content versus using it only as a training signal. The crawl-to-referral ratios suggest many platforms do a great deal of the latter and very little of the former.
Which bots are growing fastest — and does it matter for traffic?
GPTBot was the standout growth story of 2025: raw request volume up 305% from May 2024 to May 2025, pushing it from rank #9 to rank #3 among all verified crawlers. One large hosting network captured 569 million GPTBot requests in a single month. Combined AI crawler traffic across major platforms reached roughly 28% of Googlebot's monthly volume by late 2024.
The highest percentage growth rate belonged to PerplexityBot: up over 157,000% year-over-year from a small base. User-driven fetches (the 2% category) grew about 15 times over 2025 — low volume, but the category that actually correlates with referrals.
Here's the thing: growth in raw crawl volume tells you almost nothing useful about referral potential. A bot can triple its crawl frequency while its referral ratio stays at 70,000:1. More requests, same outcome. Expressed as referrals per 1,000 crawl requests, the practical implication is clear: PerplexityBot returns 5. GPTBot returns 0.6. ClaudeBot returns about 0.01. If your goal is AI-driven traffic, volume of crawl activity isn't the metric that matters.
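The per-1,000 figures are just the crawl-to-referral ratios inverted. A small sketch of the arithmetic, using the GPTBot and ClaudeBot ratios quoted in this article and assuming 200:1 for PerplexityBot (the ratio implied by 5 referrals per 1,000 requests):

```python
def referrals_per_thousand(crawls_per_referral: float) -> float:
    """Convert an N:1 crawl-to-referral ratio into referrals per 1,000 crawls."""
    if crawls_per_referral <= 0:
        raise ValueError("ratio must be positive")
    return 1000 / crawls_per_referral


# Crawl requests per referral. The PerplexityBot value is an assumption
# derived from the 5-per-1,000 figure, not a separately sourced number.
RATIOS = {"PerplexityBot": 200, "GPTBot": 1_700, "ClaudeBot": 70_900}

per_thousand = {bot: referrals_per_thousand(r) for bot, r in RATIOS.items()}

for bot, value in per_thousand.items():
    print(f"{bot}: {value:.3f} referrals per 1,000 crawl requests")
```

The same function works on your own numbers: divide a bot's crawl requests by the referral sessions attributed to its platform, then pass the result in.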
What content do AI crawlers actually target?
Industry-level analysis of AI crawl patterns shows retail and e-commerce at the top, attracting more than 25% of observed AI crawling activity. Software documentation and developer resources follow at around 15.6%. Media companies and financial services appear lower on the list, likely because tighter access controls and paywalls make them harder to reach.
Within a page, content signals matter for crawl frequency. Research tracking re-crawl rates by page characteristics found that pages with data tables, multiple heading sections, and longer bodies get crawled more often and re-indexed faster. Pages in the 25–40 KB range (roughly 2,500–4,500 words of structured content) tend to hit a sweet spot.
One pattern worth noting for anyone running an image-heavy site: some AI crawlers fetch images at a notably higher rate than others. A blanket block might be cutting off a crawler that was specifically interested in your visual content — worth checking your logs before making that call.
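One way to check this in your own logs is to compute, per bot, the share of requests that target image files. The sketch below assumes access logs in the common "combined" format and uses two hypothetical bot names (ExampleBotA, ExampleBotB) with synthetic log lines; the regular expression and the list of image extensions are simplifications you would adapt to your setup.

```python
import re
from collections import defaultdict

# Matches the request path and user-agent in a combined-format log line.
LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".avif", ".svg")


def image_share(log_lines, bot_tokens):
    """Fraction of each bot's requests that fetch an image file."""
    totals = defaultdict(int)
    images = defaultdict(int)
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path, user_agent = match.groups()
        for token in bot_tokens:
            if token.lower() in user_agent.lower():
                totals[token] += 1
                # Ignore any query string when checking the extension.
                if path.split("?")[0].lower().endswith(IMAGE_EXTENSIONS):
                    images[token] += 1
                break
    return {token: images[token] / totals[token] for token in totals}


def _line(path, agent):
    """Build a synthetic combined-format log line for the demo."""
    return (f'203.0.113.5 - - [01/Jun/2025:00:00:00 +0000] '
            f'"GET {path} HTTP/1.1" 200 512 "-" "{agent}"')


SAMPLE_LOG = [
    _line("/products/1", "ExampleBotA/1.0"),
    _line("/img/a.png", "ExampleBotA/1.0"),
    _line("/img/b.jpg?w=800", "ExampleBotB/1.0"),
    _line("/img/c.webp", "ExampleBotB/1.0"),
    _line("/about", "ExampleBotB/1.0"),
]

shares = image_share(SAMPLE_LOG, ["ExampleBotA", "ExampleBotB"])
print(shares)
```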
What should site owners actually do?
The honest takeaway is that a blanket policy — block everything or allow everything — is suboptimal in almost every direction.
If your goal is protecting content from training use: the entries you want in your DISALLOW rules are the offline training user-agents, not the search-indexing ones. These often come from the same operator under different user-agent strings, and the blocking behaviour is independent. By May 2025, roughly 60% of sites had blocked at least one AI crawler, and 79% of top news publishers had blocked AI training bots specifically. The blocking trend is real and accelerating — and it mostly targets the training category.
If your goal is getting AI-driven referrals: the bots worth allowing are the search-indexing and user-triggered ones, not the training crawlers. PerplexityBot's referral ratio (under 200:1 as of late 2025) is orders of magnitude better than that of a training-only bot at 70,000:1. Blocking PerplexityBot to be "safe from AI" might be actively hurting your referral numbers.
The practical step is to audit your robots.txt at the individual user-agent level rather than treating "AI bots" as a monolithic category. GPTBot and OAI-SearchBot are different user-agents with different impacts. Getting this right is less about ideology around AI ("pro" or "anti") and more about matching your access rules to what each specific bot actually does when it hits your site.
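A rough sketch of such an audit, using Python's standard urllib.robotparser: it reports, for each bot named in this article, whether a given robots.txt lets it fetch the site root, and flags combinations that usually run against the goals above. The purpose labels in AI_AGENTS are assumptions based on how each bot is described here, and the BLANKET_POLICY text is a hypothetical example of a one-operator blanket block.

```python
from urllib.robotparser import RobotFileParser

# Assumed purpose of each user-agent; adjust to match operator documentation.
AI_AGENTS = {
    "GPTBot": "training",
    "OAI-SearchBot": "search",
    "ChatGPT-User": "user-triggered",
    "PerplexityBot": "search",
    "ClaudeBot": "training",
}


def audit(robots_txt: str, probe_path: str = "/") -> dict:
    """Map each known AI user-agent to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, probe_path) for agent in AI_AGENTS}


def findings(robots_txt: str) -> list:
    """List rules that look misaligned with a block-training, allow-referral policy."""
    notes = []
    for agent, allowed in audit(robots_txt).items():
        purpose = AI_AGENTS[agent]
        if purpose == "training" and allowed:
            notes.append(f"{agent}: training crawler is allowed")
        elif purpose != "training" and not allowed:
            notes.append(f"{agent}: {purpose} bot is blocked")
    return notes


# Hypothetical policy that treats one operator's three bots as a single unit.
BLANKET_POLICY = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ChatGPT-User
Disallow: /
"""

report = findings(BLANKET_POLICY)
for note in report:
    print(note)
```

Whether a flagged line is actually a problem depends on your goal; a site that wants no AI access at all would read the same report differently.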