Bot Traffic · June 27, 2026

What are AI bots actually doing on your site — and is any of it helping you?

57.5% of web traffic is now bots, but most of that AI crawling won't put a single visitor in your analytics. Here's what the 2026 data shows about training crawlers, browsing agents, and the gap between.

57.5% of all HTML web traffic is now bots. As of June 3, 2026, automated requests outnumber human page loads for the first time in the web's history — and if you run a website, the odds are good that you've noticed something strange in your server logs: crawl requests you can't explain, from user-agents you've never seen before, hitting pages that don't get much human traffic.

So what are they actually doing on your site?

How we got this data

This piece draws on network-level telemetry published by Cloudflare Radar, whose CDN proxies a significant share of global web traffic, along with Adobe Analytics data covering more than one trillion US retail site visits tracked through Q1 2026. Unless stated otherwise, percentages refer to the most recent 28-day window, late May to mid-June 2026.

Which AI crawlers are actually in your logs?

GPTBot, OpenAI's crawler, accounted for 11.48% of AI bot HTTP requests in May 2026. ClaudeBot sat at 9.73%. Bytespider, ByteDance's crawler, jumped from 6.5% to 10.5% in the same period, its third consecutive month of growth and the biggest move of any AI crawler that month. Googlebot, which is not purely an AI crawler, remained the busiest individual named bot at 27.26% of AI-adjacent verified traffic.

The user-agents worth watching aren't the ones with "Bot" at the end. ChatGPT-User and similar real-time browsing agents have a completely different function: they fire when an actual person is asking a question. They're not batch-crawling your site for a training run that kicks off in six months. They're fetching your content right now to answer someone. That distinction — between harvesting and answering — is where the real split in AI crawler behaviour lies.
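To make the harvesting-versus-answering split concrete, here is a minimal Python sketch that sorts a User-Agent header into one of the two groups. The token lists are illustrative assumptions built from the names mentioned in this piece, not an authoritative registry, and the function name is hypothetical. Check each operator's own documentation before relying on the grouping.

```python
# Hypothetical grouping of user-agent tokens by purpose. Both tuples are
# assumptions for illustration and should be verified against operator docs.
TRAINING_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider")
USER_INITIATED_TOKENS = ("ChatGPT-User",)


def classify_user_agent(user_agent):
    """Return 'user-initiated', 'training', or 'other' for a User-Agent header."""
    ua = user_agent.lower()
    # Check user-initiated tokens first so a more specific agent name is never
    # shadowed by a crawler token from the same operator.
    if any(token.lower() in ua for token in USER_INITIATED_TOKENS):
        return "user-initiated"
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "training"
    return "other"


if __name__ == "__main__":
    print(classify_user_agent("Mozilla/5.0 (compatible; GPTBot/1.2)"))
    print(classify_user_agent("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))
```

Substring matching on the header is deliberately crude. User-Agent strings can be spoofed, so treat the result as a first pass rather than verification.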

AI Crawler Requests by Purpose (May 2026)

Training-only and mixed-purpose requests combined account for 87.5% of all AI crawler traffic. Real-time user-initiated requests represent just 3.2%.

Source: Cloudflare Radar, CDN network telemetry, May 2026

So what are all these crawlers actually here for?

Here's the number that changes how you should think about the AI bot traffic in your logs: in May 2026, 51.8% of AI crawler requests were classified as pure training. These are crawlers bulk-collecting text with no mechanism to send a human back to your site afterward. Add in "mixed-purpose" requests — crawlers that blend training data collection with retrieval indexing — and the combined training-component share climbs to 87.5% of all AI crawler traffic.

Pure search indexing — the kind that could actually put your page inside an AI-generated answer — accounted for only 9.3%. Requests that directly correlate with a real human actively browsing via an AI assistant? Just 3.2%.

That means for every 100 AI bot requests your server handles, roughly 88 carry a training component, and about 52 of those are pure training with no referral mechanism attached. For the most part, they take, and they don't give back.
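The purpose split quoted above can be checked with a few lines of arithmetic. This sketch uses only the four percentages given in this section. The variable names are our own, and the mixed-purpose share is derived by subtraction rather than quoted from the source.

```python
# Shares of AI crawler requests by purpose, in percent, as quoted above.
pure_training = 51.8
training_component = 87.5  # pure training plus mixed-purpose
search_indexing = 9.3
user_initiated = 3.2

# The mixed-purpose share is whatever remains of the training component.
mixed_purpose = round(training_component - pure_training, 1)

# The three top-level categories should account for every request.
total = round(training_component + search_indexing + user_initiated, 1)

print(f"mixed-purpose: {mixed_purpose}%")  # mixed-purpose: 35.7%
print(f"total: {total}%")                  # total: 100.0%

# Training-component requests per user-initiated request.
print(f"ratio: {training_component / user_initiated:.1f} to 1")  # ratio: 27.3 to 1
```

Put another way, for each request tied to a live human question, about 27 requests carry a training component.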

Training-Component Share of AI Crawler Requests (2024–2026)

Includes both training-only and mixed-purpose (training + retrieval) requests. The training component has risen consistently year over year.

Source: Cloudflare Radar annual and H1 2026 reports

Has that ratio been getting worse over time?

Yes, although the 2026 figure is not a strict like-for-like comparison. In 2024, training-purpose requests made up roughly 72% of AI crawler activity. By the full year 2025, that figure had risen to around 80%. Under 2026's updated methodology, which counts mixed-purpose traffic, the training component reaches 87.5%. The direction has been consistently upward as the appetite for training data grows alongside model scale.

Real-time browsing agents have grown in absolute terms too, but from a far smaller base. Their request volumes are rising, which is why referral traffic from AI sources can show triple-digit year-over-year growth in some industries — even while training crawls dominate the total count.

Does all this crawling actually send you any visitors?

Rarely, and unevenly. The retail sector is the clearest example of where the referral side of AI traffic does show up. Adobe Analytics tracked AI-referred traffic to US retail sites growing 393% year-over-year in Q1 2026. When those visitors arrive, they convert 42% better than organic search traffic, spend 48% longer on site, and browse 13% more pages per visit.

But that referral traffic comes from real-time browsing agents — a 3.2% sliver of the total AI crawler footprint. The training bots filling the rest of your logs don't appear in any referral report. They send no one anywhere.

The gap between crawl volume and referral traffic is the metric most site owners haven't started tracking yet. A site receiving 10,000 AI bot hits daily and 15 AI-attributed visits is in a completely different situation from one seeing 500 hits and 400 referrals. Raw hit counts alone won't tell you which of the two you are: you need both numbers side by side, because they represent opposite relationships with the AI ecosystem.
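One way to track that gap is a simple crawl-to-referral ratio: AI bot requests served per AI-referred visit received. The sketch below applies it to the two example sites from the paragraph above. The function name and the choice to return infinity when there are no referrals are our own assumptions, not an established metric definition.

```python
def crawl_to_referral_ratio(bot_hits, referred_visits):
    """AI bot requests served for each AI-referred visit received."""
    if referred_visits == 0:
        # All crawl and no referrals: report the ratio as unbounded.
        return float("inf")
    return bot_hits / referred_visits


# The two example sites described above.
site_a = crawl_to_referral_ratio(10_000, 15)
site_b = crawl_to_referral_ratio(500, 400)

print(f"site A: {site_a:.0f} bot hits per referred visit")    # site A: 667 ...
print(f"site B: {site_b:.2f} bot hits per referred visit")    # site B: 1.25 ...
```

A ratio in the hundreds means you are mostly a data source. A ratio near 1 means AI agents are behaving like a traffic channel.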

What should you actually do with this?

Decide whether you want to feed the training crawlers. robots.txt is the standard opt-out, and OpenAI and Anthropic both document that GPTBot and ClaudeBot honor it. Compliance is voluntary, though, so after adding rules for GPTBot, ClaudeBot, Bytespider, and their peers, check your logs to confirm the requests actually stop. If you're not comfortable with your content being used for AI training data in exchange for no referral traffic, that's the mechanism. Blocking data from Q1 2026, with GPTBot at 5.52% of Disallow rules and ClaudeBot at 4.88%, suggests a meaningful share of site owners have already made that call.
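Before deploying new rules, it can help to test a draft robots.txt locally. The sketch below uses Python's standard-library urllib.robotparser to confirm that a draft blocks the three training crawlers named here while leaving other agents alone. The draft rules, the example.com URL, and the allowed helper are illustrative assumptions. Adapt the rules to your own policy.

```python
from urllib.robotparser import RobotFileParser

# An illustrative draft: block three training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent_token, url, robots_txt=ROBOTS_TXT):
    """Return True if the draft rules permit agent_token to fetch url.

    Pass the product token (for example "GPTBot"), not the full User-Agent
    header, because the parser matches on the part before the first slash.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    for token in ("GPTBot", "ClaudeBot", "Bytespider", "ChatGPT-User"):
        print(token, allowed(token, "https://example.com/pricing"))
```

This only checks what the file says. Whether a given crawler obeys it is something only your access logs can confirm.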

Treat real-time browsing agents as a separate channel. ChatGPT-User, Perplexity-User, and similar agents are effectively a new traffic source, one that converts remarkably well when it does arrive. If your pages return near-blank HTML to a client that doesn't run JavaScript, or bury your main content behind lazy loading, you're invisible to the queries that would actually send someone to your site. Pre-rendered, semantically marked-up HTML is what these agents can work with.
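A rough way to see what a non-JavaScript client gets is to count the words present in the raw HTML response, ignoring script and style content. The sketch below does that with Python's standard-library html.parser. The class and function names are hypothetical, and the two HTML strings are made-up samples standing in for a client-rendered shell and a pre-rendered page.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(raw_html):
    """Count words a client would see without executing any JavaScript."""
    parser = VisibleText()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks).split())


# Made-up samples: a JavaScript shell and a pre-rendered page.
shell = '<html><body><div id="root"></div><script>render("all the copy")</script></body></html>'
rendered = "<html><body><main><h1>Pricing</h1><p>Plans start at ten dollars.</p></main></body></html>"

print(visible_word_count(shell))     # 0
print(visible_word_count(rendered))  # 6
```

A count near zero on your key pages is a sign that the content only exists after JavaScript runs.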

Start measuring the gap. Your analytics shows what traffic arrives. Your access logs show everything that hits your origin, including the training crawlers that will never become a visit. The difference between those two numbers is roughly how much content you're contributing to AI training for free. Whether that trade-off makes sense for your business is your call — but you should at least know how big it is.
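As a starting point for that measurement, here is a minimal sketch that tallies AI crawler hits from access-log lines. It assumes the common "combined" log format, where the User-Agent is the last double-quoted field. The token list, the function name, and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed crawler tokens to look for. Extend the tuple to match your own logs.
AI_TOKENS = ("GPTBot", "ClaudeBot", "Bytespider", "ChatGPT-User")

# In the combined log format, the User-Agent is the last double-quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_ai_hits(log_lines):
    """Return a Counter of hits per AI token found in the User-Agent field."""
    counts = Counter()
    for line in log_lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line is not in the expected format
        ua = match.group(1).lower()
        for token in AI_TOKENS:
            if token.lower() in ua:
                counts[token] += 1
                break  # count each request once
    return counts


# Made-up sample lines using documentation IP ranges.
sample = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /docs HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [01/Jun/2026:10:00:02 +0000] "GET /blog HTTP/1.1" 200 7310 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [01/Jun/2026:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '192.0.2.44 - - [01/Jun/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"',
]

hits = count_ai_hits(sample)
print(dict(hits))          # {'GPTBot': 2, 'ChatGPT-User': 1}
print(sum(hits.values()))  # 3
```

Compare the total against the AI-referred visits your analytics reports for the same period to get the size of the gap.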