Bot Traffic · June 22, 2026

Bytespider Tripled Its Share in 90 Days. Your robots.txt Hasn't Caught Up.

A crawler that few sites block reached 10.5% of AI bot traffic in three months, yet it appears in only 4.23% of robots.txt DISALLOW rules, the widest blocking gap of any top-five AI bot in 2026.

Automated requests reached 57.5% of all HTML web traffic on June 3, 2026, the first recorded majority. Within that bot majority, AI-related crawling (training bots plus real-time retrieval) accounts for 26.7% of verified bot volume, a figure that grew 30% between January and May alone. Those headline numbers frame a competitive landscape most teams haven't mapped since late 2024: five AI crawlers each now hold more than 9% share, and one of them, Bytespider, operated by TikTok's parent company ByteDance, went from near-invisible to the fourth-largest AI bot on the web in a single quarter.

Method

Data in this post draws from three sources. The monthly AI crawler tracker from WebSearchAPI aggregates request-share data from CDN telemetry and public radar feeds; figures here are from the May 2026 edition. The robots.txt analysis comes from TechnologyChecker.io's Q1 2026 scan of the top 1 million domains. Per-site crawl-rate data comes from a 30-day log study of 12 production sites published by Digital Applied in April 2026. Bot-share percentages count HTTP requests; URL-coverage and per-site frequency tell a different story covered below.

The Current Top Five

The May 2026 ranking by request share shows Googlebot leading at 27.26%. Meta-ExternalAgent holds second at 13.10%, up from roughly 12% in January. GPTBot reclaimed third at 11.48% after ClaudeBot briefly led that slot in April. Bytespider sits fourth at 10.50% and ClaudeBot fifth at 9.73%.

Top 5 AI Crawlers by Request Share (May 2026)
Bytespider's three-month surge pushed it past ClaudeBot and Bingbot to become the #4 AI crawler in May 2026.

The clustering matters: the four slots from second through fifth are separated by under 4 percentage points (13.10% down to 9.73%). A swing of the size Bytespider delivered in May, plus 4 points in a single month, is enough to pass an established crawler. The January gap between third and fifth was roughly 6 points. By May it had collapsed to under 2.
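The spreads quoted above can be checked directly from the May shares. The following is a minimal sketch using only the figures in this post; the dictionary and function names are illustrative.

```python
# Request share by crawler (percent), May 2026, as quoted in this post.
may_share = {
    "Googlebot": 27.26,
    "Meta-ExternalAgent": 13.10,
    "GPTBot": 11.48,
    "Bytespider": 10.50,
    "ClaudeBot": 9.73,
}

def gap(a, b):
    """Percentage-point gap between two crawlers, rounded to two decimals."""
    return round(may_share[a] - may_share[b], 2)

# Rank crawlers from largest to smallest share.
ranked = sorted(may_share, key=may_share.get, reverse=True)

second_to_fifth = gap(ranked[1], ranked[4])  # spread across slots 2 through 5
third_to_fifth = gap(ranked[2], ranked[4])   # spread across slots 3 through 5

print(ranked)
print(second_to_fifth, third_to_fifth)
```

With these inputs the second-to-fifth spread is 3.37 points and the third-to-fifth spread is 1.75 points, which matches the "under 4" and "under 2" descriptions.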

One structural fact cuts across the ranking: Meta-ExternalAgent operates no search engine and returns zero referral traffic to publishers. Googlebot's requests lead to indexed pages and click-through. The other major crawlers — GPTBot, Bytespider, ClaudeBot — are overwhelmingly training-data pipelines. Across all AI bots tracked in the period, 89.4% of traffic is classified as training or mixed-purpose rather than search retrieval.

Bytespider's Three-Month Run

ByteDance's crawler registered 3.6% share in March, 6.5% in April, and 10.5% in May — three consecutive months of growth, the most sustained single-crawler surge tracked in 2026.

Bytespider Traffic Share: March–May 2026
Three consecutive months of growth, culminating in a 4-point jump in May (+61% relative), the largest single-month move in percentage points of any AI crawler in 2026.

May's 4-percentage-point move (+61% relative) was the largest monthly shift across the entire leaderboard. One infrastructure team reported that close to 90% of their total AI crawler traffic had shifted to Bytespider by April, pushing it ahead of every other non-Google AI bot on their edge. The crawler presents as Bytespider/1.0 in logs and respects robots.txt directives when explicitly listed. ByteDance, however, publishes no crawl-purpose declaration comparable to the documentation some other operators provide, and no public crawl-rate limit or politeness-interval specification.
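Percentage-point change and relative change are easy to conflate when reading these figures. As a rough sketch, the arithmetic behind the monthly numbers looks like this, using only the three shares quoted in this post (the function name is illustrative):

```python
# Monthly Bytespider request share (percent), as quoted in this post.
shares = {"March": 3.6, "April": 6.5, "May": 10.5}

def monthly_changes(series):
    """Return (month, point_change, relative_change_pct) for each month after the first."""
    months = list(series)
    out = []
    for prev, cur in zip(months, months[1:]):
        delta = series[cur] - series[prev]
        out.append((cur, round(delta, 2), round(100 * delta / series[prev], 1)))
    return out

changes = monthly_changes(shares)
for month, points, relative in changes:
    print(f"{month}: {points:+.1f} points, {relative:+.1f}% relative")
```

May's move is the larger one in percentage points (4.0 versus 2.9), while April's is larger in relative terms (about 81% versus about 62%), because April started from a smaller base.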

The bot has been active since at least 2022. What changed in early 2026 was intensity, not presence. Earlier log data showed infrequent, low-volume activity typical of indexing experiments; the 2026 pattern is high-volume, high-frequency re-crawling of the same URL sets — the signature of training-data collection at scale.

The Blocking Gap

When sites make a blocking decision about AI crawlers, GPTBot is the most common target. In Q1 2026, an analysis of robots.txt directives across the top 1 million domains found GPTBot appearing in 5.52% of DISALLOW rules — the highest rate of any AI crawler. CCBot came in at 5.08%, ClaudeBot at 4.88%, Google-Extended at 4.44%.

Bytespider sat last at 4.23%.

robots.txt DISALLOW Rate by AI Crawler (Q1 2026)
Bytespider carries the lowest DISALLOW rate (4.23%) of the five crawlers in the scan, despite being #4 by request share.

That 1.29-point gap between GPTBot and Bytespider works out to a net difference of approximately 12,900 domains in a 1-million-domain scan, so at least that many sites block GPTBot but not Bytespider. These are sites that made a deliberate robots.txt decision, most likely between 2023 and 2024, using block lists available at the time. Bytespider had negligible share in 2023 and does not appear in most third-party bot block lists published before mid-2025.
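The conversion from a rate gap to a domain count can be sketched as follows. This assumes the percentages are shares of the 1 million scanned domains, which is how the 12,900 figure is derived; the names are illustrative.

```python
SCANNED_DOMAINS = 1_000_000  # size of the Q1 2026 scan described in this post

# DISALLOW rate per crawler (percent), as quoted in this post.
disallow_rate = {
    "GPTBot": 5.52,
    "CCBot": 5.08,
    "ClaudeBot": 4.88,
    "Google-Extended": 4.44,
    "Bytespider": 4.23,
}

def net_domain_gap(a, b):
    """Net number of scanned domains that name crawler a but not crawler b.

    This is a net figure: domains that block b but not a are subtracted,
    so the result is a lower bound on domains blocking a but not b.
    """
    points = disallow_rate[a] - disallow_rate[b]
    return round(points / 100 * SCANNED_DOMAINS)

print(net_domain_gap("GPTBot", "Bytespider"))
```

The same function gives the gap for any other pair, for example CCBot against Bytespider.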

The blocking rate and the traffic share now point in opposite directions. Bytespider holds more request share than any crawler except Googlebot, Meta-ExternalAgent, and GPTBot, yet carries the lowest DISALLOW rate of the five crawlers in the robots.txt scan. Any site whose policy is to block training crawlers while permitting search-retrieval bots is executing that policy with a gap.

Per-Site Frequency and URL Coverage

Global share percentages describe how much of total AI crawler traffic each bot represents. Per-site hit rates describe how hard each bot hits an individual origin. A 30-day log study across 12 production sites in April measured GPTBot at 4,200 requests per site per day — the highest. ClaudeBot came in at 1,800 per day and PerplexityBot at 980.

High hit rates don't translate into broad URL coverage. A two-month coverage analysis across a large edge network found Googlebot reaching 1.76× more unique URLs than GPTBot and 1.70× more than ClaudeBot. The implication: training crawlers cycle through a narrow hot set of high-traffic URLs at high frequency rather than continuously expanding to new paths. A crawler making 4,200 daily requests on a site with 50,000 URLs can touch at most 8.4% of them in a day, and the coverage data suggests it is mostly re-fetching the same small subset rather than discovering new content.
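The ceiling implied by that example is simple to compute. The sketch below uses the 4,200 requests and 50,000 URLs from the paragraph above; the hot-set size of 200 is a hypothetical input chosen for illustration, and both function names are made up for this example.

```python
def max_daily_coverage(requests_per_day, total_urls):
    """Upper bound on the fraction of a site's URLs a crawler can touch in one day.

    The bound is reached only if every request goes to a different URL.
    """
    return min(1.0, requests_per_day / total_urls)

def refetches_per_url(requests_per_day, hot_set_size):
    """Average daily fetches per URL if all requests land on a fixed hot set."""
    return requests_per_day / hot_set_size

ceiling = max_daily_coverage(4200, 50_000)
repeat_rate = refetches_per_url(4200, 200)  # 200 is a hypothetical hot-set size

print(f"coverage ceiling: {ceiling:.1%}")
print(f"fetches per hot URL per day: {repeat_rate:.0f}")
```

Real coverage sits somewhere below the ceiling; how far below depends on how concentrated the crawler's requests are, which only the site's own logs can show.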

This matters for caching. If a training crawler revisits the same 200 URLs daily, those requests are cheap to serve from edge cache. If the same requests hit uncached dynamic pages or run database queries, compute cost scales with request volume rather than with any return traffic. Bytespider, with no public documentation on its revisit intervals, may behave differently from the bots measured in the April study.

What This Means for Site Owners

Any robots.txt file built from guidance written before mid-2025 is likely missing Bytespider. The fix is a two-line addition. Whether to add it is a policy question: if the site's position is to permit training crawlers in exchange for potential future visibility in AI systems that ByteDance builds, no action is needed. If the position is to block crawlers that return no referral traffic, Bytespider belongs on the list alongside Meta-ExternalAgent, CCBot, and similar bots.
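The two-line addition, and a way to confirm it takes effect, can be sketched with Python's standard-library robots.txt parser. The `legacy` file below is a hypothetical example of a pre-2025 block list, and the `Bytespider` token is taken from the log string quoted earlier in this post; it is worth confirming the token against your own logs.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule, in standard robots.txt syntax.
BYTESPIDER_BLOCK = "User-agent: Bytespider\nDisallow: /\n"

def is_blocked(robots_txt, user_agent, url="https://example.com/"):
    """Return True if the given robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# Hypothetical legacy file that names GPTBot and CCBot only.
legacy = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: CCBot\nDisallow: /\n"
updated = legacy + "\n" + BYTESPIDER_BLOCK

print(is_blocked(legacy, "Bytespider"))   # legacy file does not name Bytespider
print(is_blocked(updated, "Bytespider"))  # updated file does
```

This only checks what the file says. Whether a given crawler honors the rule is a separate question that logs have to answer.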

The bot mix is no longer stable enough to treat robots.txt as a set-once decision. Bytespider nearly tripled its share between March and May; another entrant could replicate that trajectory in the second half of 2026. Treating the block list as a quarterly review item, comparing current traffic share against current DISALLOW coverage, is now the defensible standard rather than the cautious one.
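One way such a review could work is to compute each crawler's share of AI bot requests from the site's own logs and flag any crawler above a threshold that the block list does not name. The sketch below is an assumption-laden illustration: the watch list, the 5% threshold, and the function names are all hypothetical, and matching is a plain substring check on the User-Agent header.

```python
from collections import Counter

# Hypothetical watch list; extend it as new crawlers appear in your logs.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "Bytespider", "Meta-ExternalAgent", "CCBot", "PerplexityBot"]

def bot_shares(user_agents):
    """Percent of AI-bot requests per token, from an iterable of User-Agent strings."""
    counts = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
                break  # count each request once
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bot: 100 * n / total for bot, n in counts.items()}

def unblocked_heavy_hitters(shares, blocked, threshold_pct=5.0):
    """Crawlers at or above the threshold share that the block list does not name."""
    return sorted(bot for bot, share in shares.items()
                  if share >= threshold_pct and bot not in blocked)
```

A usage example: feed `bot_shares` the User-Agent column from a month of access logs, pass the result to `unblocked_heavy_hitters` along with the set of tokens currently named in robots.txt, and review whatever comes back against the site's policy.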

For sites running server-side rendering or making database calls per request, bot-aware caching is the higher-leverage intervention. Serving training crawlers cached HTML at the edge costs essentially nothing and avoids the compute overhead from repeated requests. The question is whether the site's caching layer distinguishes bot traffic from human traffic, and whether Bytespider's user-agent string is included in that list.
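The decision a bot-aware caching layer has to make can be sketched in a few lines. This is not any CDN's API: the token list, the one-day TTL, and the returned dictionary shape are all hypothetical, and a real deployment would express the same logic in its own edge configuration.

```python
# Hypothetical list of training-crawler tokens, lowercased for matching.
TRAINING_BOT_TOKENS = ("gptbot", "claudebot", "bytespider", "meta-externalagent", "ccbot")

def is_training_bot(user_agent):
    """Substring match of the User-Agent header against the token list."""
    lowered = user_agent.lower()
    return any(token in lowered for token in TRAINING_BOT_TOKENS)

def cache_policy(user_agent, path):
    """Return illustrative cache settings for one request.

    Training crawlers get cached HTML with a long TTL (assumed: one day);
    everything else falls through to the site's normal handling.
    """
    if is_training_bot(user_agent):
        return {"serve_from_cache": True, "ttl_seconds": 86_400, "cache_key": f"bot:{path}"}
    return {"serve_from_cache": False, "ttl_seconds": 0, "cache_key": f"user:{path}"}
```

The point of the sketch is the list itself: if `bytespider` is missing from whatever token list the caching layer uses, its requests fall through to the origin like human traffic.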

Sources

  1. Monthly AI Crawler Report: May 2026 — Bytespider Surges to #4 as Applebot Reverses
  2. We Analyzed robots.txt Across the Web: AI Crawler Blocking Report Q1 2026
  3. Bot Traffic Passes Humans Online: Agentic AI Drove 57.5% Share
  4. Agentic Crawler Behavior: 30-Day Site Log Study 2026