Bot Traffic · June 7, 2025

AI Crawlers: Traffic Almost Level With Google, Referrals Nowhere Close

AI training crawlers consumed 4.2% of all HTML requests by late 2025, approaching Googlebot's 4.5% share. Their crawl-to-referral ratios tell a different story: hundreds to tens of thousands of pages crawled per click sent back.

AI training crawlers collectively consumed 4.2% of all HTML requests by December 2025, within 0.3 percentage points of Googlebot's 4.5% share. That near-parity in volume masks a profound asymmetry: the crawl-to-referral ratios for those same AI crawlers ranged from 857:1 to more than 23,000:1, versus roughly 5:1 for Googlebot. For site owners, knowing which bots are hitting their servers, and what those bots are taking, is now a first-order infrastructure question.

Method

This analysis draws on edge-network bot traffic reports covering H2 2025 through Q1 2026, aggregated across hundreds of billions of monthly HTTP requests logged at the network edge. Bot-mix percentages represent shares of bot-category traffic, not of total requests; the 4.2% and 4.5% figures above, by contrast, are shares of all HTML requests. Crawl-to-referral ratios are computed by cross-referencing each operator's inbound crawl volume against the referral traffic attributed to that operator's AI products over the same measurement period.

Bot-mix breakdown: April 2026

Googlebot held 30.28% of all bot-category traffic in April 2026. The field of AI training and search crawlers has grown large enough to account for most of the remaining share.

AI Crawler Share of Bot Traffic, April 2026

Percentage share of bot-category HTTP requests. Googlebot leads; the named AI training and search crawlers collectively account for just over half of the bot mix (51.40%).

Source: Edge network bot traffic analysis, Q1 2026

Meta-ExternalAgent held 14.91%, followed by ClaudeBot at 11.69%, GPTBot at 9.84%, and Applebot at 9.23%. Bytespider — serving several AI product lines for a major short-video platform — held 5.73% in April before nearly doubling to 10.25% in May 2026, passing ClaudeBot in a single month. An "other" bucket covering dozens of smaller crawlers accounts for the remaining 18.32%.
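The April figures can be cross-checked with a few lines of arithmetic. The Python sketch below simply sums the shares reported above. Grouping Applebot with the AI crawlers follows this section's "AI training and search crawlers" framing and is an assumption, as is the hypothetical `summarize` helper.

```python
# April 2026 shares of bot-category requests (percent), as reported above.
APRIL_2026_SHARES = {
    "Googlebot": 30.28,
    "Meta-ExternalAgent": 14.91,
    "ClaudeBot": 11.69,
    "GPTBot": 9.84,
    "Applebot": 9.23,
    "Bytespider": 5.73,
    "other": 18.32,
}

# Assumption: the five named non-Google crawlers form the "AI crawler" group.
AI_CRAWLERS = {"Meta-ExternalAgent", "ClaudeBot", "GPTBot", "Applebot", "Bytespider"}


def summarize(shares: dict[str, float]) -> dict[str, float]:
    """Total of all shares, the named-AI subtotal, and everything except Googlebot."""
    total = round(sum(shares.values()), 2)
    named_ai = round(sum(v for k, v in shares.items() if k in AI_CRAWLERS), 2)
    non_google = round(total - shares["Googlebot"], 2)
    return {"total": total, "named_ai": named_ai, "non_google": non_google}


print(summarize(APRIL_2026_SHARES))
# {'total': 100.0, 'named_ai': 51.4, 'non_google': 69.72}
```

The 51.40% subtotal is the "over half" in the chart caption, and it makes up most of the 69.72 points not held by Googlebot.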

Month-to-month volatility is significant. ClaudeBot led GPTBot in April (11.69% vs 9.84%) but trailed it in May (9.73% vs 11.48%). As Bytespider's jump from 5.73% to 10.25% shows, a single crawler's share can nearly double in one month, so monitoring that relies on point-in-time snapshots misses these swings entirely.
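One way to catch these swings is to compare consecutive months and flag any crawler whose share moved by more than a set number of percentage points. The sketch below uses the April and May figures above. The `share_shifts` function and its 1.75-point default threshold are hypothetical choices for illustration, not a recommended setting.

```python
# Shares of bot-category requests (percent), April and May 2026, from the figures above.
APRIL = {"ClaudeBot": 11.69, "GPTBot": 9.84, "Bytespider": 5.73}
MAY = {"ClaudeBot": 9.73, "GPTBot": 11.48, "Bytespider": 10.25}


def share_shifts(prev: dict[str, float], curr: dict[str, float],
                 threshold_pp: float = 1.75) -> dict[str, float]:
    """Percentage-point change per bot, keeping only moves of at least threshold_pp.

    Bots present in only one month are ignored. Results are ordered by the
    size of the move, largest first.
    """
    shifts = {}
    for bot in prev.keys() & curr.keys():
        delta = round(curr[bot] - prev[bot], 2)
        if abs(delta) >= threshold_pp:
            shifts[bot] = delta
    return dict(sorted(shifts.items(), key=lambda kv: abs(kv[1]), reverse=True))


print(share_shifts(APRIL, MAY))
# {'Bytespider': 4.52, 'ClaudeBot': -1.96}
```

GPTBot's 1.64-point rise falls just under this threshold; lowering `threshold_pp` to 1.5 would flag all three crawlers.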

What these crawlers are actually doing

Of AI-driven crawler volume from 2025 through early 2026, roughly 80% is attributable to training data collection, 18% to search indexing, and 2% to agentic tasks: autonomous browsing on behalf of a live user session in an AI assistant.

AI Crawler Activity by Purpose, 2025–Q1 2026

Training dominates. Only search-indexed and agentic crawlers reliably return referral traffic.

Source: AI crawler purpose analysis, edge network reports 2025

The training-heavy composition has a direct consequence for referral attribution. Training crawlers ingest content to update model weights and have no mechanism for returning referral clicks. Only search-indexing crawlers and, to a lesser extent, agentic crawlers convert crawl activity into downstream audience traffic. As a result, roughly 80% of AI-driven bot traffic by volume produces no measurable referral benefit to publishers at the time of crawling.
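To see how your own crawler traffic splits across these purposes, you can classify requests by user-agent token. The token-to-purpose mapping below is an illustrative assumption (for example, that `GPTBot` is used for training and `ChatGPT-User` for user-initiated fetches). Check each operator's published crawler documentation before relying on it. Any token not listed is counted as unclassified.

```python
from collections import Counter

# Illustrative token -> purpose mapping (an assumption; verify against each
# operator's published crawler documentation before relying on it).
PURPOSE_BY_TOKEN = {
    "GPTBot": "training",
    "ClaudeBot": "training",
    "Meta-ExternalAgent": "training",
    "Bytespider": "training",
    "OAI-SearchBot": "search",
    "PerplexityBot": "search",
    "ChatGPT-User": "agentic",
}


def classify(user_agent: str) -> str:
    """Return the purpose for the first known token found in the user-agent string."""
    ua = user_agent.lower()
    for token, purpose in PURPOSE_BY_TOKEN.items():
        if token.lower() in ua:
            return purpose
    return "unclassified"


def purpose_mix(user_agents) -> dict[str, float]:
    """Percentage of requests per purpose, rounded to one decimal place."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    if not total:
        return {}
    return {purpose: round(100 * n / total, 1) for purpose, n in counts.items()}


# Simplified user-agent strings, for illustration only.
sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
]
print(purpose_mix(sample))
# {'training': 50.0, 'search': 25.0, 'agentic': 25.0}
```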

Crawl intensity by bot

Measured across a 48-day window of server log data spanning multiple site categories, median daily crawl rates per site were approximately 4,200 requests for GPTBot, 1,800 for ClaudeBot, and 980 for PerplexityBot.
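A per-site median like this can be reproduced from raw access logs. The sketch below assumes the common Apache/Nginx "combined" log format, a log covering a single site, and simple substring matching on the user-agent field. The `median_daily_requests` name and the `BOTS` tuple are hypothetical. Days on which a bot sent no requests are counted as zero, which matters when taking a median.

```python
import re
from collections import defaultdict
from statistics import median

# Hypothetical set of bots to track; matched by substring in the user-agent.
BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def median_daily_requests(lines, bots=BOTS):
    """Median requests per day for each bot, over every day present in the log."""
    counts = {bot: defaultdict(int) for bot in bots}  # bot -> day -> requests
    days = set()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        day, ua = m.group("day"), m.group("ua")
        days.add(day)
        for bot in bots:
            if bot in ua:
                counts[bot][day] += 1
                break  # count each request toward at most one bot
    if not days:
        return {}
    # A day with no requests from a bot counts as 0, not as missing data.
    return {bot: median(counts[bot].get(d, 0) for d in days) for bot in bots}
```

Any iterable of log lines works as input, including an open file handle, so a multi-week log can be processed without loading it all into memory.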

Median Daily Crawl Rate by AI Bot (requests/site/day)

Medians across a 48-day server-log study. High-traffic sites attract multiples of these figures; training bots do not back off under load.

Source: 48-day server log analysis, multiple site categories, 2025

These are median figures; content-dense or high-traffic sites attract significantly higher crawl rates. Unlike traditional search crawlers, most AI training crawlers do not calibrate their crawl rate to server response latency: a site whose response times degrade will typically see training bots hold their crawl rate rather than back off. Crawl-delay directives in robots.txt are honored inconsistently, and some smaller training crawlers ignore them entirely.
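Python's standard-library `urllib.robotparser` shows what a compliant crawler would read from a given robots.txt, including `Crawl-delay`. The robots.txt below is hypothetical. The parser reports what the file requests, not what any crawler actually does; because some training crawlers ignore these directives, enforcement has to happen server-side. Note also that the parser matches on the product token (for example `GPTBot`), not on the full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot, slow ClaudeBot and keep it out of /api/,
# and ask every other crawler for a 5-second delay.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /api/

User-agent: *
Crawl-delay: 5
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for bot, url in [
    ("GPTBot", "https://example.com/blog/post"),
    ("ClaudeBot", "https://example.com/api/v1/items"),
    ("ClaudeBot", "https://example.com/docs/intro"),
    ("Googlebot", "https://example.com/blog/post"),
]:
    # can_fetch: is the URL allowed for this token? crawl_delay: requested delay in seconds.
    print(bot, url, parser.can_fetch(bot, url), parser.crawl_delay(bot))
```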

Crawl-path preferences diverge by bot. GPTBot operates breadth-first, concentrating on /blog/, /docs/, and /about/ paths with a mean crawl depth of 3.8 links from the homepage. ClaudeBot crawls deeper at an average depth of 5.2, with heavier concentration on /docs/ and /api/ paths. Understanding these patterns matters if you are serving AI-optimized or pre-rendered content selectively across path types.
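The figures above do not say exactly how crawl depth was measured. One plausible definition, assumed here, is the fewest internal links between the homepage and a requested page, which a breadth-first search over the site's link graph gives directly. The link graph, paths, and helper names below are hypothetical.

```python
from collections import deque
from statistics import mean

# Hypothetical internal-link graph: page -> pages it links to.
LINKS = {
    "/": ["/blog/", "/docs/", "/about/"],
    "/blog/": ["/blog/post-1"],
    "/docs/": ["/docs/guide"],
    "/docs/guide": ["/docs/guide/setup"],
    "/docs/guide/setup": ["/api/reference"],
}


def click_depths(links, root="/"):
    """Breadth-first search from the homepage: fewest link hops to each reachable page."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in depth:  # first visit in BFS order is the shortest path
                depth[nxt] = depth[page] + 1
                queue.append(nxt)
    return depth


def mean_crawl_depth(requested_paths, depths):
    """Average click depth of the pages a bot requested; unreachable pages are skipped."""
    known = [depths[p] for p in requested_paths if p in depths]
    return mean(known) if known else None


depths = click_depths(LINKS)
print(mean_crawl_depth(["/", "/blog/", "/about/", "/blog/post-1"], depths))  # 1
print(mean_crawl_depth(["/docs/guide", "/docs/guide/setup", "/api/reference"], depths))  # 3
```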

The crawl-to-referral gap

The bluntest measure of AI crawler value to a publisher is the crawl-to-referral ratio — pages crawled per referral click delivered. Across Q1 2026 measurements, ClaudeBot was observed crawling between 11,000 and 24,000 pages per referral sent back. GPTBot's ratio ranged from 857 to 1,276 pages per referral. Googlebot sits at approximately 5:1. DuckDuckGo's crawler reaches near-parity at roughly 1.5:1.
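Given request logs that carry both user-agent and referrer, a rough per-operator ratio can be computed by counting crawler hits and referral visits over the same window. The operator mappings below (crawler tokens, and referrer hostnames such as `chatgpt.com` and `claude.ai`) are illustrative assumptions and far from complete. This is a single-site sketch; it does not reproduce the cross-site edge-network methodology described earlier.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed mappings, for illustration only: crawler token -> operator,
# and referrer hostname -> operator. Real attribution needs a fuller list.
CRAWLER_OPERATOR = {"GPTBot": "openai", "ClaudeBot": "anthropic", "Googlebot": "google"}
REFERRER_OPERATOR = {"chatgpt.com": "openai", "claude.ai": "anthropic", "www.google.com": "google"}


def crawl_to_referral_ratios(requests):
    """requests: iterable of (user_agent, referer) pairs from the same time window.

    Returns pages crawled per referral for each operator seen crawling;
    infinity means crawls were observed but no referrals.
    """
    crawls, referrals = Counter(), Counter()
    for user_agent, referer in requests:
        for token, operator in CRAWLER_OPERATOR.items():
            if token in user_agent:
                crawls[operator] += 1
                break
        else:  # not a known crawler: check whether it arrived via an AI or search referral
            host = urlparse(referer).hostname if referer else None
            if host in REFERRER_OPERATOR:
                referrals[REFERRER_OPERATOR[host]] += 1
    return {
        op: crawls[op] / referrals[op] if referrals[op] else float("inf")
        for op in crawls
    }


# Synthetic sample for illustration.
sample = (
    [("Mozilla/5.0 (compatible; GPTBot/1.0)", "-")] * 900
    + [("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)", "https://chatgpt.com/")]
    + [("Mozilla/5.0 (compatible; Googlebot/2.1)", "-")] * 10
    + [("Mozilla/5.0 (Windows NT 10.0; Win64; x64)", "https://www.google.com/")] * 2
)
print(crawl_to_referral_ratios(sample))
# {'openai': 900.0, 'google': 5.0}
```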

AI assistants accounted for approximately 0.20% of all web referral traffic as of late 2025, despite the underlying crawlers generating nearly as many HTML requests as Googlebot. That gap has not meaningfully narrowed in recent quarters.

What this means for site owners

Three industries absorbed more than 95% of AI-driven crawler traffic in 2025: retail and e-commerce, streaming and media, and travel. If your site operates in one of these verticals, significant crawl load from every major training crawler is a near-certainty regardless of whether you have explicit access controls in place.

Analytics platforms that rely on JavaScript-based page-view tracking typically never record bot traffic, since most AI crawlers do not execute JavaScript. Server-log analysis is the only reliable measure of true crawler volume. A site showing 5,000 daily users in a JS-based analytics dashboard may be handling 40,000 to 60,000 daily bot requests that never appear there. That gap matters both for forecasting bandwidth costs accurately and for understanding what is actually driving server load.

The crawl-to-referral ratio should inform access-control decisions. Crawlers with ratios above 1,000:1 consume bandwidth and collect content for training datasets while returning minimal value in referral traffic. Whether that exchange is acceptable depends on your content strategy and business model, but making the decision without crawl-log data means you are defaulting, not deciding. The bot-mix volatility documented above also means access decisions need regular review: a rule set calibrated to April's bot landscape may be badly miscalibrated by June.
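Turning ratios into rules can be as simple as emitting a robots.txt group for every crawler above the threshold. The sketch below uses the 1,000:1 cutoff from this section, plus the low and high ends of GPTBot's reported range, to show how a crawler can cross the line between measurement periods. `robots_txt_for` is a hypothetical helper. Because robots.txt is advisory, crawlers that ignore it would still need blocking at the server or edge.

```python
def robots_txt_for(ratios: dict[str, float], threshold: float = 1000.0) -> str:
    """Emit a 'Disallow: /' group for each crawler token whose ratio exceeds threshold.

    `ratios` maps a crawler's robots.txt token to its crawl-to-referral ratio.
    """
    blocked = sorted(token for token, ratio in ratios.items() if ratio > threshold)
    groups = [f"User-agent: {token}\nDisallow: /" for token in blocked]
    return "\n\n".join(groups) + ("\n" if groups else "")


# Ratios drawn from the ranges reported above; GPTBot's straddles the threshold.
low_end = {"ClaudeBot": 11000.0, "GPTBot": 857.0, "Googlebot": 5.0}
high_end = {"ClaudeBot": 11000.0, "GPTBot": 1276.0, "Googlebot": 5.0}
print(robots_txt_for(low_end))   # blocks ClaudeBot only
print(robots_txt_for(high_end))  # blocks ClaudeBot and GPTBot
```

Re-running a helper like this on each month's log data is one way to keep the rule set from drifting out of date.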