Bot Traffic · June 6, 2026

The AI Crawler Mix Has Flipped: Bot Traffic Patterns in Mid-2026

Bots now account for 52% of all web requests — the first time automated traffic crossed the majority line. Bytespider nearly doubled its share in a single month. Here is what the 2026 AI crawler mix means for your infrastructure.

The 52% milestone no one planned for

For the first time in recorded internet history, automated traffic has crossed the 50% threshold: bots now account for approximately 52% of all global web requests as of mid-2026. That figure is drawn from HUMAN Security's analysis of over one quadrillion digital interactions published in April 2026, corroborated by network-layer data showing AI crawlers alone generating more than 50 billion requests per day across major CDN networks. The crossover was projected to arrive in 2027; it arrived a full year ahead of schedule.

Method

This post draws on three sources: HUMAN Security's 2026 State of AI Traffic & Cyberthreat Benchmark Report (April 2026, covering full-year 2025 data), a monthly AI crawler market-share tracker with data through May 2026 published at websearchapi.ai, and a 30-day per-site log study covering 12 production sites across e-commerce, SaaS, and media verticals published by Digital Applied in April 2026. Bot share figures are based on counts of identified HTTP requests at the network layer, not on user sessions or analytics pageviews.

Findings

1. The current bot mix: who is crawling in May 2026

The AI crawler landscape in May 2026 is more concentrated than aggregate traffic numbers suggest. Googlebot still holds the plurality of named-bot requests globally, but its share fell three percentage points in a single month to 27.3%. The second-largest crawler is Meta-ExternalAgent at 16.3%. GPTBot sits third at 11.5%, followed by Bytespider — ByteDance's training crawler — at 10.3%, and ClaudeBot at 9.7%. Applebot rounds out the top six at 9.1%.

Chart: AI Crawler Market Share, May 2026
Share of named-bot HTTP requests. AI-only crawlers collectively exceed 50% of all identified crawler traffic for the first time.

The velocity story in that chart matters more than the snapshot. Bytespider nearly doubled its named-bot share — from 5.7% to 10.3% — in a single calendar month, passing ClaudeBot to move from seventh to fourth. These are not gradual share migrations. They are step-function jumps correlated with training-run schedules and model release timelines. A site's top AI crawler today may rank fourth within a single monthly reporting period.
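The size of that jump is easier to see when the point change and the growth multiple are separated. A minimal Python sketch of the arithmetic, using only the two Bytespider figures quoted above (the function name is illustrative):

```python
def share_change(previous_pct: float, current_pct: float) -> tuple[float, float]:
    """Return (percentage-point change, growth multiple) between two share readings."""
    point_change = current_pct - previous_pct
    multiple = current_pct / previous_pct
    return round(point_change, 1), round(multiple, 2)

# Bytespider's named-bot share in consecutive months, as quoted in this post.
points, multiple = share_change(5.7, 10.3)
print(f"+{points} points, {multiple}x the prior month's share")
```

A gain of 4.6 points sounds modest; a multiple of roughly 1.8 in one month is what matters for anyone maintaining a bot inventory.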

2. Training still dominates, but agentic bots are the accelerating segment

Training-data crawlers accounted for approximately 74% of all AI bot traffic in 2025, down from roughly 90% in 2024. The vacated share moved to search-inference crawlers — bots that fetch real-time context for live user queries — and to agentic crawlers: autonomous agents that navigate forms, execute JavaScript, and take actions on behalf of users.

Chart: AI Bot Traffic by Purpose, 2024 vs 2025
Training crawlers shed 16 percentage points of share in 2025 as search-inference and agentic traffic expanded. Agentic volume grew 7,851% year-over-year despite its small absolute share.

Agentic traffic is under 4% of total AI crawler volume, but it grew 7,851% year-over-year in 2025. The category is small in absolute terms and large in risk-profile terms. These are not passive training scrapers reading your HTML for inclusion in a future dataset; they are active participants in user workflows, attempting purchases, account actions, and form submissions in real time. The intent profile confirms this: 77% of observed agentic AI activity landed on product and search pages, with 8.8% on account pages, 5% on authentication flows, and 2.3% on checkout pages.

3. Per-site crawl intensity: what AI bots cost per day

Across 12 monitored production sites in the March–April 2026 period, GPTBot averaged 4,200 hits per site per day. ClaudeBot trailed at approximately 1,800 hits per day, PerplexityBot at 980. On three e-commerce sites in the same study, Bytespider reached 6,500 hits per day — the highest single-bot figure observed in the sample, concentrated almost entirely on product listing and category pages.
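To produce comparable per-bot numbers for your own site, a rough tally from access logs is usually enough. The sketch below assumes logs in Combined Log Format and matches on user-agent substrings; the token list is illustrative rather than exhaustive, and the sample lines are made up for the example with simplified user-agent strings.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent header (illustrative, not exhaustive).
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Bytespider"]

# Combined Log Format: the date sits inside [...] and the user agent is the last quoted field.
LOG_LINE = re.compile(r'\[(?P<day>[^:]+):[^\]]+\].*"(?P<ua>[^"]*)"\s*$')

def hits_per_bot_per_day(lines):
    """Count requests per (day, bot) for user agents containing a known token."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        ua = match.group("ua").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in ua:
                counts[(match.group("day"), token)] += 1
                break
    return counts

# Made-up sample lines; addresses come from documentation-only IP ranges.
sample = [
    '203.0.113.5 - - [02/Apr/2026:10:00:01 +0000] "GET /p/1 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '203.0.113.5 - - [02/Apr/2026:10:00:09 +0000] "GET /p/2 HTTP/1.1" 200 640 "-" "Mozilla/5.0 (compatible; GPTBot/1.2)"',
    '198.51.100.7 - - [02/Apr/2026:10:01:30 +0000] "GET /c/shoes HTTP/1.1" 200 900 "-" "Mozilla/5.0 (compatible; Bytespider)"',
    '192.0.2.44 - - [02/Apr/2026:10:02:00 +0000] "GET / HTTP/1.1" 200 1200 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0"',
]

daily = hits_per_bot_per_day(sample)
for (day, bot), hits in sorted(daily.items()):
    print(day, bot, hits)
```

User-agent strings can be spoofed, so counts like these measure claimed identity, not verified identity.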

Revisit cadence amplifies the volume. GPTBot revisits high-traffic pages approximately every 2.4 days; Bytespider revisits commercial product pages every 1.8 days on retail sites. Unlike traditional search bots that modulate crawl rate based on content-change signals, several AI training crawlers operate on fixed-interval schedules regardless of last-modified headers or sitemap change frequencies. Pages with no meaningful content update are re-fetched at the same cadence as actively updated inventory and pricing pages.
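Revisit cadence can be estimated from the same logs by measuring the gap between consecutive fetches of one path by one bot. This is a sketch under simple assumptions: fetches are already parsed into (bot, path, timestamp) tuples, and the median gap is used as the cadence. The sample data is invented to mirror the 1.8-day figure above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

def median_revisit_days(fetches):
    """fetches: iterable of (bot, path, datetime). Returns {(bot, path): median gap in days}."""
    seen = defaultdict(list)
    for bot, path, when in fetches:
        seen[(bot, path)].append(when)
    result = {}
    for key, times in seen.items():
        times.sort()
        gaps = [(b - a).total_seconds() / 86400 for a, b in zip(times, times[1:])]
        if gaps:  # a single fetch has no gap to measure
            result[key] = round(median(gaps), 2)
    return result

# Invented sample: three Bytespider fetches of one category page, 43.2 hours apart.
fetches = [
    ("Bytespider", "/category/shoes", datetime(2026, 4, 1, 0, 0)),
    ("Bytespider", "/category/shoes", datetime(2026, 4, 2, 19, 12)),
    ("Bytespider", "/category/shoes", datetime(2026, 4, 4, 14, 24)),
    ("GPTBot", "/pricing", datetime(2026, 4, 1, 6, 0)),
]

cadence = median_revisit_days(fetches)
print(cadence)
```

Comparing this cadence against how often the page actually changes shows which re-fetches are pure overhead.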

A moderately sized e-commerce site that serves uncached responses to every inbound AI crawler request is absorbing what amounts to a permanent background load test: traffic arrives every day from clients that will never complete a purchase. The compute and bandwidth costs are real whether or not the site's inclusion in training data ultimately improves its discoverability.
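A back-of-envelope estimate makes that cost concrete. The hit counts below are the per-site daily averages from the log study; the 150 KB average response size is a placeholder assumption, not a figure from any of the sources, and should be replaced with a measured value. Note that the Bytespider figure comes from the three e-commerce sites only, so the total describes a retail-heavy case.

```python
# Average hits per site per day, from the log study cited in this post.
DAILY_HITS = {"GPTBot": 4200, "ClaudeBot": 1800, "PerplexityBot": 980, "Bytespider": 6500}

def daily_transfer_gb(daily_hits, avg_response_kb):
    """Estimate uncached transfer per day in decimal gigabytes (1 GB = 1,000,000 KB)."""
    total_hits = sum(daily_hits.values())
    return total_hits * avg_response_kb / 1_000_000

# ASSUMPTION: 150 KB per response is a placeholder, not a figure from the study.
estimate = daily_transfer_gb(DAILY_HITS, avg_response_kb=150)
print(f"{sum(DAILY_HITS.values())} hits/day -> {estimate:.2f} GB/day")
```

Bandwidth is only part of the picture: for dynamically rendered pages, per-request compute at the origin is often the larger cost, and it scales with the same hit count.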

4. Vertical concentration and the blocking compliance gap

Three industries absorbed more than 95% of AI-driven traffic in 2025: retail and e-commerce, streaming and media, and travel and hospitality. Sites outside those verticals are not insulated from AI crawlers but do receive structurally less volume per page. For agentic traffic specifically, retail and e-commerce absorbed 46.6% of requests, streaming and media 28.5%, and travel and hospitality 19.2%.

On blocking: an academic study found that AI-blocking among reputable sites increased from 23% in September 2023 to nearly 60% by May 2025. The practical problem is compliance asymmetry. Training-mode crawlers for the major AI operators broadly honor robots.txt Disallow directives. Their inference-mode and agentic-mode counterparts, which may use different user-agent strings, show inconsistent compliance. A site that disallowed the training-mode user-agent in 2024 may have no effective controls on the inference-mode variant crawling the same paths today.
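The coverage half of that gap can be checked mechanically. The sketch below uses Python's standard-library urllib.robotparser to ask which user-agent tokens a given robots.txt actually disallows. The robots.txt content is a made-up example of a 2024-era file, and the token list is illustrative; confirm current user-agent names against each operator's published documentation.

```python
from urllib.robotparser import RobotFileParser

# Made-up example: a robots.txt written in 2024 that blocks only a training crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# ASSUMPTION: illustrative tokens; verify names in each operator's documentation.
AGENTS_TO_CHECK = ["GPTBot", "OAI-SearchBot", "ChatGPT-User"]

def blocked_agents(robots_txt, agents, url="https://example.com/products/"):
    """Return {agent: True if robots.txt disallows this agent for the URL}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}

report = blocked_agents(ROBOTS_TXT, AGENTS_TO_CHECK)
for agent, blocked in report.items():
    print(f"{agent}: {'disallowed' if blocked else 'allowed'}")
```

This only shows what the file asks for. Whether a given crawler honors the directive has to be verified against access logs.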

What this means for site owners

Treat AI crawlers as a distinct traffic class with its own bandwidth and caching budget. A training-data crawler hitting your product catalog every 1.8 days generates real origin load from visitors who will never convert. Serving pre-rendered, cached responses to identified AI user-agents rather than routing every request through your application stack is a straightforward operational improvement. Measure the ratio of AI bot requests to human sessions on a quarterly basis; the ratio moved substantially in 2025 and is on track to move again.
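Both recommendations reduce to a small amount of logic. The following is a sketch, not a production rule set: the token list is illustrative, the "cache" and "origin" labels are hypothetical names for whatever backends your stack exposes, and matching on user-agent substrings assumes the crawler identifies itself honestly.

```python
# Illustrative tokens for self-identified AI crawlers (lowercase for matching).
AI_CRAWLER_TOKENS = ("gptbot", "claudebot", "bytespider", "perplexitybot", "meta-externalagent")

def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent header contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)

def choose_backend(user_agent: str) -> str:
    """Return 'cache' for identified AI crawlers, 'origin' for everything else."""
    return "cache" if is_ai_crawler(user_agent) else "origin"

def bot_to_human_ratio(ai_bot_requests: int, human_sessions: int) -> float:
    """AI bot requests per human session over the same reporting window."""
    if human_sessions <= 0:
        raise ValueError("human_sessions must be positive")
    return ai_bot_requests / human_sessions

print(choose_backend("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
print(choose_backend("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0"))
print(bot_to_human_ratio(4200, 2100))
```

Tracking the ratio over the same window each quarter keeps the comparison meaningful as both traffic classes change.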

The 7,851% year-over-year growth in agentic traffic is the number security teams should act on. Agentic bots land on authentication flows and checkout pages using browser-automation frameworks that are effectively indistinguishable from account-takeover toolkits at the session-fingerprinting layer. Rules built on the assumption that bots do not execute JavaScript no longer hold. Blanket user-agent blocks for known agentic strings will also capture legitimate shopping agents generating attributable revenue for the site.

The bot mix is not a stable configuration item. Bytespider nearly doubling its named-bot share in a single month is not an anomaly; it reflects how training-data acquisition schedules align with model release cycles. Bot inventories accurate in Q4 2025 are already stale. Rules based on static user-agent lists need review at least quarterly. Configurations that match against live, continuously updated user-agent data and surface new bots automatically reduce the time between a new crawler's appearance and your site's response.
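One low-cost way to surface new crawlers is to scan logged user agents for bot-like names that are missing from your inventory. The sketch below is a heuristic under stated assumptions: the known-bot set is illustrative, the suffix pattern (bot, spider, crawler, agent) will miss crawlers that do not follow that naming habit, and "ExampleNewBot" is a hypothetical name used only for the demonstration.

```python
import re
from collections import Counter

# Illustrative inventory of crawlers already covered by existing rules (lowercase).
KNOWN_BOTS = {"googlebot", "gptbot", "claudebot", "bytespider", "applebot", "meta-externalagent"}

# Heuristic: product tokens that look like crawler names, e.g. "ExampleNewBot/0.9".
BOT_LIKE = re.compile(r"([A-Za-z][A-Za-z0-9_-]*(?:bot|spider|crawler|agent))\b", re.IGNORECASE)

def unknown_bots(user_agents):
    """Count bot-like names in user-agent strings that are not in KNOWN_BOTS."""
    found = Counter()
    for ua in user_agents:
        for name in BOT_LIKE.findall(ua):
            if name.lower() not in KNOWN_BOTS:
                found[name] += 1
    return found

# Made-up sample; "ExampleNewBot" is hypothetical.
logged_agents = [
    "Mozilla/5.0 (compatible; Googlebot/2.1)",
    "Mozilla/5.0 (compatible; ExampleNewBot/0.9)",
    "Mozilla/5.0 (compatible; ExampleNewBot/0.9)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]

new_names = unknown_bots(logged_agents)
print(new_names.most_common())
```

Run on a schedule, a report like this shortens the window in which an unlisted crawler operates without any rule applied to it.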

Sources

  1. 2026 State of AI Traffic & Cyberthreat Benchmark Report
  2. Monthly AI Crawler Report: April 2026 Traffic Trends
  3. Agentic Crawler Behavior: 30-Day Site Log Study 2026
  4. Web Crawler Restrictions, AI Training Datasets & Political Biases