OAI-SearchBot's 3.5× Surge and the Crawl-to-Refer Gap
OAI-SearchBot crawl events grew 3.5× after the August 2025 AI model release. Yet AI crawlers still need hundreds to thousands of crawls to generate a single referral. Here is what the data shows.
In enterprise log data, OAI-SearchBot crawl events per site grew 3.5× in the 90 days after August 2025, a surge that coincides with the release of a major AI assistant model. The instinct is to read that growth as a sign that AI search is sending more traffic to websites. The referral data says something different: at the extreme, ClaudeBot crawls 11,122 pages for every one user who arrives at a site from an AI assistant recommendation. The crawl-to-refer ratio, not raw crawl volume, is the metric that maps to business value from AI crawler traffic, and for most AI bots it is still catastrophically lopsided.
Method
This analysis draws on two datasets. Botify published a 7-billion-event crawl log analysis spanning November 2024 through March 2026 across its enterprise client base, tracking per-bot event volumes segmented by industry vertical and anchored on the August 2025 AI model release as the key inflection point. SEOmator cross-referenced crawler log volumes against referrer-header attribution data to compute crawl-to-refer ratios for major AI and traditional search bots, with figures updated through May 2026. Blocking behavior data comes from TechnologyChecker.io's analysis of robots.txt configurations across top websites as of June 2026.
OAI-SearchBot's Post-August Surge
Before August 2025, OAI-SearchBot and GPTBot ran at near-parity in Botify's enterprise dataset, at a ratio of 0.95:1. After the August 2025 model release, OAI-SearchBot grew 3.5× and GPTBot grew 2.9×. OAI-SearchBot's event volume rose above GPTBot's for the first time, a structural shift consistent with a rebalancing of the crawler fleet from training-mode collection toward retrieval-mode search serving.
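The crossover follows from the three figures quoted above. The sketch below is a quick arithmetic check, not part of Botify's analysis. It assumes the 0.95:1 figure means OAI-SearchBot events per GPTBot event, which is the reading consistent with OAI-SearchBot overtaking GPTBot afterwards.

```python
# Figures quoted in the text from Botify's enterprise dataset.
# Assumption: 0.95:1 is OAI-SearchBot events per GPTBot event.
pre_ratio = 0.95      # ratio before August 2025
oai_growth = 3.5      # OAI-SearchBot growth multiplier after the release
gpt_growth = 2.9      # GPTBot growth multiplier after the release

# Each side of the ratio scales by its own growth multiplier.
post_ratio = pre_ratio * oai_growth / gpt_growth

print(f"{post_ratio:.2f}:1")  # prints 1.15:1
```

Under that reading, OAI-SearchBot moves from slightly behind GPTBot to roughly 15% ahead of it.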
The growth was not uniform across site categories. Healthcare sites saw OAI-SearchBot events increase 740% versus the prior 90-day baseline. Media and publishing climbed 702%. Marketplaces grew 216%, software 190%, and travel only 30%. The distribution tracks with which content categories users are most likely to query through AI-assisted search: health information and news are common query targets, while travel remains dominated by transactional booking interfaces.
At total scale, the AI crawler fleet is a fraction of traditional search: Googlebot processes 18.2 billion events per month across Botify's client base, versus 887 million for OAI-SearchBot and GPTBot combined. Together, those two crawlers represent about 4.9% of Googlebot's monthly event volume.
The Crawl-to-Refer Ratio
Raw crawl volume is a poor proxy for value delivered to site owners. SEOmator's analysis computes how many pages a crawler reads before it generates one referring visit. For Googlebot, that ratio is approximately 4.9:1 — for every five pages crawled, roughly one user eventually arrives via a search click. DuckDuckGo's crawler runs at 1.5:1.
AI crawler ratios are on a different scale entirely. GPTBot's ratio as of Q1 2026 was 1,276:1. Perplexity's bot, which directly powers live search responses, sat at 111:1. ClaudeBot's ratio was 11,122:1 in Q1 2026, improved from an earlier peak but still nearly nine times worse than GPTBot's.
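To put these ratios on a common footing, the sketch below defines the crawl-to-refer ratio as crawl events divided by referral visits and converts each quoted ratio into expected referrals per million crawls. The function names are illustrative and not taken from SEOmator's methodology. Only the ratio values come from the text.

```python
def crawl_to_refer_ratio(crawl_events: int, referral_visits: int) -> float:
    """Pages crawled per referred visit; infinite when nothing was referred."""
    if referral_visits == 0:
        return float("inf")
    return crawl_events / referral_visits


def referrals_per_million_crawls(ratio: float) -> float:
    """Invert a crawl-to-refer ratio into referrals per 1,000,000 crawls."""
    return 1_000_000 / ratio


# Ratios quoted in the text (crawls per one referral).
QUOTED_RATIOS = {
    "DuckDuckGo crawler": 1.5,
    "Googlebot": 4.9,
    "Perplexity": 111,
    "GPTBot": 1_276,
    "ClaudeBot": 11_122,
}

for bot, ratio in QUOTED_RATIOS.items():
    per_million = referrals_per_million_crawls(ratio)
    print(f"{bot:20s} {ratio:>9,.1f}:1  ~{per_million:>9,.0f} referrals per 1M crawls")
```

Read this way, a million ClaudeBot crawls corresponds to about 90 referred visits, against about 9,000 for Perplexity's bot.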
The hundredfold difference between Perplexity's 111:1 and ClaudeBot's 11,122:1 reflects an architectural gap. Perplexity's crawler is tightly coupled to live query answering: it fetches a page because a user query triggered a retrieval request, and those results feed directly into a cited response. The result is a relatively tight crawl-to-referral loop. Other AI assistant crawlers are largely still operating in index-building mode: they read pages to populate a knowledge base from which the assistant may later synthesize answers, with no guarantee of citation and no direct referral path per crawled page.
Blocking Patterns
OAI-SearchBot is blocked by 12% of top websites as of June 2026. GPTBot is blocked by 25% of the top 1,000 sites and 5.52% across all analyzed domains. The gap between these rates reflects timing more than intent: OAI-SearchBot is a newer user-agent string that postdates many block lists written during the 2024 training crawler wave. Sites that blocked GPTBot then have not necessarily added OAI-SearchBot to their disallow lists.
This creates a structural problem for site owners who set broad bot blocks without distinguishing between the training fleet and the retrieval fleet. Blocking the retrieval bot, the one that generates citations and sends referrals, removes the crawl traffic with the best crawl-to-refer outlook. It does nothing about the training crawler, which was never going to send referrals regardless.
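A per-user-agent policy can be checked before deployment. The sketch below embeds a hypothetical robots.txt that blocks GPTBot while leaving OAI-SearchBot open, then tests it with Python's standard `urllib.robotparser`. The rules and the example.com URL are illustrative assumptions rather than a recommendation, and the current user-agent tokens should be confirmed against each vendor's own documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the training crawler, allow the retrieval crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("GPTBot", "OAI-SearchBot"):
    verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, "https://example.com/article") else "blocked"
    print(f"{agent}: {verdict}")
```

Running the same check against a live robots.txt, with the file's text substituted for `ROBOTS_TXT`, shows whether a broad block is catching the retrieval bot unintentionally.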
What This Means for Site Owners
Evaluate your bot blocking policy by user-agent string rather than by vendor. The retrieval-mode bots (Perplexity, OAI-SearchBot) and the training-mode bots (GPTBot in its current predominant use) operate on fundamentally different referral timelines. OAI-SearchBot's industry-specific growth in healthcare and media is worth tracking: if your site operates in those verticals, the crawler fleet with the best prospect of sending future referrals is already running at substantially elevated volumes compared to 2024.
Check your server-side rendering coverage for known bot user-agents. The crawl-to-refer ratio only measures outcomes that actually happened, and a crawler that hits a JavaScript-rendered page and receives a blank body never had a chance to index anything useful. The prerequisite for any AI-generated citation is a successful crawl with meaningful content. Static HTML delivery or server-side rendering for identified bot user-agents removes this failure mode before it affects the ratio.
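One rough way to spot the blank-body failure is to count the visible words in the HTML that a non-JavaScript client receives. The sketch below is a heuristic using only the standard library. The 50-word threshold and the function names are assumptions chosen for illustration, and the HTML is expected to have been fetched already, for example with a bot's user-agent string.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping content a crawler would not read as prose."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Count words in text nodes of the raw, unrendered HTML."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return sum(len(chunk.split()) for chunk in extractor.chunks)


def looks_empty_to_crawler(html: str, min_words: int = 50) -> bool:
    """Heuristic: too few visible words suggests a client-rendered shell."""
    return visible_word_count(html) < min_words


# A typical single-page-app shell: markup and script, but no readable content.
spa_shell = '<html><head><title>App</title><script>render();</script></head><body><div id="root"></div></body></html>'
print(looks_empty_to_crawler(spa_shell))  # prints True
```

A page that passes this check is not guaranteed to be indexed, but a page that fails it is delivering little or nothing to a crawler that does not execute JavaScript.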
Track crawl volume trends by user-agent alongside your referral attribution data month over month. Crawl growth without referral growth is a training-mode signal. Referral growth without proportional crawl growth — as you would expect from Perplexity's tighter retrieval loop — is a sign the crawler is in active search-serving mode and worth prioritizing for content accessibility. The ratio, not the raw crawl count, is the signal that maps to actual business outcomes.
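The month-over-month comparison can be reduced to a small rule. The sketch below is a minimal illustration of the two signals described above, assuming you already have monthly crawl and referral counts per user-agent. The 10% growth threshold, the class name, and the label strings are hypothetical choices, not figures from the source data.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStats:
    crawls: int      # crawl events from one user-agent in the month
    referrals: int   # referred visits attributed to that assistant


def growth(previous: int, current: int) -> float:
    """Fractional month-over-month growth; infinite when growing from zero."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def classify(prev: MonthlyStats, curr: MonthlyStats, threshold: float = 0.10) -> str:
    """Label one bot's trend using the two signals described in the text."""
    crawl_growth = growth(prev.crawls, curr.crawls)
    referral_growth = growth(prev.referrals, curr.referrals)
    if crawl_growth > threshold and referral_growth <= threshold:
        return "training-mode signal"
    if referral_growth > threshold and referral_growth > crawl_growth:
        return "search-serving signal"
    return "no clear signal"


# Crawls triple while referrals stay flat.
print(classify(MonthlyStats(10_000, 5), MonthlyStats(30_000, 5)))  # prints training-mode signal
```

The threshold is worth tuning per site, since low-traffic properties will see large percentage swings from a handful of visits.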