Crawler Accessibility · June 8, 2026

llms.txt Is Live on 2% of the Web. AI Crawlers Still Can't Execute JavaScript.

The 2025 Web Almanac found llms.txt on 2.13% of sites, and 39.6% of those files were auto-generated by a plugin. An analysis of over 500 million AI crawler requests found zero JavaScript execution. Only one of these gaps causes active content delivery failures today.

The 2025 Web Almanac crawl of approximately 16 million websites found llms.txt files on 2.13% of desktop sites and 2.10% of mobile sites, 324,184 valid files in total. A separate analysis of over 500 million AI crawler requests, published in December 2024 by Vercel and MERJ, found zero JavaScript execution across the entire dataset. The two numbers are more striking together: on sites that inject content with client-side JavaScript, operators are publishing a navigation aid for content that most AI crawlers cannot read.

Method

llms.txt adoption data comes from the HTTP Archive's 2025 Web Almanac (July 2025, ~16 million sites). JavaScript rendering data comes from Vercel and MERJ's December 2024 analysis of over 500 million AI crawler requests across Vercel's edge network. The Majestic Million adoption figures come from Chris Green's independent crawl study, published May 2025. Citation effect data is from SE Ranking's November 2025 study of approximately 300,000 domains.

llms.txt: 2% Adoption, 40% Accidental

The 2025 Web Almanac gives the most comprehensive adoption figure available: 2.13% of desktop sites and 2.10% of mobile sites have a valid llms.txt file. Inside that headline number is a notable finding: 39.6% of all detected llms.txt files were auto-generated by the All in One SEO WordPress plugin. Roughly four in ten implementations were created automatically, without deliberate configuration by the site owner.

llms.txt Files by Origin, July 2025
39.6% of all llms.txt files detected in the 2025 Web Almanac crawl were auto-generated by the All in One SEO WordPress plugin, not deliberately authored.

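The auto-generation rate changes how the adoption figure should be read. The 2.13% includes a large share of files that may not reflect any deliberate decision about AI access policy. Deliberately authored files sit within the remaining 60.4%; if the plugin's share is similar across the sample, that puts intentional adoption at no more than roughly 1.3% of desktop sites (60.4% of 2.13%).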

Adoption is far lower among the most-linked sites. Chris Green's May 2025 crawl of the Majestic Million, the top one million sites ranked by inbound link authority, found only 105 valid llms.txt files, up from 15 in February 2025. That is about 0.011% of the dataset (105 of 1,000,000), against 2.13% in the Web Almanac sample. The two studies used different crawls and may apply different validity criteria, so the comparison is directional, but high-authority domains, which are plausibly among the most heavily crawled, appear to be the slowest adopters.

On citation effects, SE Ranking's November 2025 analysis of approximately 300,000 domains, using statistical correlation and a machine learning model, found no measurable relationship between having llms.txt and appearing in AI-generated responses. No major AI operator has publicly committed to honoring the specification when crawling or indexing; without that commitment, the file is a signal without a confirmed receiver.

The JavaScript Execution Gap

Vercel and MERJ's December 2024 analysis of over 500 million AI crawler requests across Vercel's edge network found zero JavaScript execution across the entire dataset. JavaScript files made up 11.5% of GPTBot's requests and 23.8% of ClaudeBot's. Both crawlers download those files; neither executes them.

AI Crawlers: JavaScript File Fetches vs Execution Rate
GPTBot fetches JavaScript files on 11.5% of its requests; ClaudeBot on 23.8%. Neither executes them. Vercel / MERJ analysis of 500M+ requests, December 2024.
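Site operators can run a rough version of the fetch measurement on their own access logs. The Python sketch below counts what share of each AI crawler's requests targeted `.js` files. It assumes the logs have already been parsed into (User-Agent, request path) pairs, and the token list and function names are hypothetical choices drawn from the crawlers named in this article, not Vercel and MERJ's methodology. Fetching a script is not the same as running it, so this reproduces only the first half of the finding.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical token list drawn from the crawlers named in this article; not exhaustive.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
                     "Meta-ExternalAgent", "Bytespider")


def crawler_token(user_agent: str) -> Optional[str]:
    """Return the first AI crawler token found in a User-Agent string, or None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def js_fetch_share(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each AI crawler to the fraction of its requests that targeted .js files.

    `records` holds (user_agent, request_path) pairs parsed from an access log.
    """
    total: Counter = Counter()
    js: Counter = Counter()
    for user_agent, path in records:
        token = crawler_token(user_agent)
        if token is None:
            continue  # not one of the AI crawlers tracked here
        total[token] += 1
        # Ignore query strings such as "app.js?v=3" when checking the extension.
        if urlsplit(path).path.endswith(".js"):
            js[token] += 1
    return {token: js[token] / count for token, count in total.items()}
```

For example, two GPTBot requests, one for `/static/app.js?v=3` and one for `/products`, yield `{"GPTBot": 0.5}`. User-Agent matching is only a first pass, since the header can be spoofed.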

Googlebot is the notable exception: it uses a two-phase system, crawling HTML first and then rendering pages in headless Chrome to execute client-side code before indexing. The major AI training and retrieval crawlers (GPTBot, ClaudeBot, Meta-ExternalAgent, Bytespider, PerplexityBot, OAI-SearchBot) operate on static HTML only.

For sites built with React, Vue, or Angular, where product listings, article text, and navigation menus are injected into the DOM after the initial page load, those elements are absent from the version of the page an AI crawler receives. The crawler does not encounter an error. It receives the initial server response, typically a minimal HTML shell containing a root container element and script tags, and treats that as the page's content. Practitioner server-log audits have estimated that 50–80% of the content on heavily JavaScript-dependent single-page applications is missing from the static HTML that AI crawlers receive.
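The difference between a client-side shell and a server-rendered page can be made concrete with a small text extractor. This Python sketch uses the standard library's `html.parser` to collect text nodes while skipping `<script>` and `<style>` contents, as a rough proxy for what a non-rendering client can read. The two sample pages and the function names are hypothetical, and real AI crawlers' extraction pipelines are not public.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style"}


class StaticTextExtractor(HTMLParser):
    """Collect text nodes from raw HTML, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def static_text(html: str) -> str:
    """Approximate the readable text of a page as served, without running any JavaScript."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical pages: a client-side-rendered shell and the same page rendered on the server.
CSR_SHELL = ('<html><head><title>Shop</title></head><body>'
             '<div id="root"></div><script src="/app.js"></script></body></html>')
SSR_PAGE = ('<html><head><title>Shop</title></head><body><div id="root">'
            '<h1>Blue Kettle</h1><p>In stock</p></div>'
            '<script src="/app.js"></script></body></html>')
```

Here `static_text(CSR_SHELL)` returns `"Shop"` (only the title survives), while `static_text(SSR_PAGE)` returns `"Shop Blue Kettle In stock"`.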

The volume involved is significant. Vercel's December 2024 data showed GPTBot generating 569 million requests and ClaudeBot 370 million in a single month across its network. Combined with PerplexityBot and Applebot, that is approximately 1.3 billion requests, about 28% of Googlebot's volume. The 939 million requests from GPTBot and ClaudeBot alone were all served against static HTML.

The Interaction Problem

A site that has published llms.txt while its primary pages render content via JavaScript faces both problems at once. The llms.txt file tells crawlers which content sections matter most and which URLs to prioritize, and can mark secondary material as optional. The JavaScript rendering constraint means those same pages return near-blank HTML to the crawler fetching them. The navigation aid points to content the crawler cannot read.
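One way to check for this overlap is to list every URL an llms.txt file points to and run each through a static-HTML check. The minimal Python sketch below assumes the markdown layout of the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections containing `- [Title](URL): notes` link lists, with a section named "Optional" for material that can be skipped. It handles only that simple layout, and the function name is hypothetical.

```python
import re
from typing import Dict, List

# A list item in the proposed llms.txt format: "- [Title](URL)" optionally followed by ": notes".
LINK_ITEM = re.compile(r"^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)")


def llms_txt_links(text: str) -> Dict[str, List[str]]:
    """Group the URLs listed in an llms.txt file under their H2 section headings."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        elif current is not None:
            match = LINK_ITEM.match(line)
            if match:
                sections[current].append(match.group(2))
    return sections
```

Keeping the "Optional" section separate lets an audit start with the URLs the file marks as most important.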

This is a priority-ordering problem. llms.txt addresses indexing policy: guidance whose effect depends on crawler behavior that no major AI operator has yet confirmed. JavaScript rendering addresses content delivery: what actually arrives in the HTTP response body on each request, today. Until the delivery problem is resolved, the policy file has nothing to work with.

The crawl test is direct: fetch any content URL with a plain HTTP client that does not execute JavaScript, and compare the response with the browser-rendered version. If the page's primary content is missing from the static HTML, AI crawlers miss it on every request. No structured data, robots.txt configuration, or llms.txt entry changes what is present in that initial server response.
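A minimal Python version of this test, using only the standard library, is sketched below. The default User-Agent string, the example URL, and the phrases are hypothetical placeholders. The phrases would be copied from the page as it appears in a browser, and passing an AI crawler's User-Agent string is one way to check a setup that varies responses by User-Agent.

```python
import urllib.request
from typing import List

# Hypothetical default User-Agent; pass a crawler's UA string to test a UA-based setup.
DEFAULT_UA = "static-html-check/0.1"


def fetch_static_html(url: str, user_agent: str = DEFAULT_UA, timeout: float = 15.0) -> str:
    """Return the raw response body exactly as served; no JavaScript is executed."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def missing_phrases(html: str, expected: List[str]) -> List[str]:
    """Return the expected phrases (copied from the rendered page) absent from the static HTML.

    This is a plain, case-insensitive substring check, so a phrase that appears
    only inside a <script> block (for example, embedded JSON state) still counts
    as present.
    """
    lowered = html.lower()
    return [phrase for phrase in expected if phrase.lower() not in lowered]


# Example usage (hypothetical URL and phrases; requires network access):
# html = fetch_static_html("https://example.com/products/blue-kettle")
# print(missing_phrases(html, ["Blue Kettle", "In stock"]))
```

An empty result means every phrase was found in the served HTML; any phrase returned is content a non-rendering crawler will not receive.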

What This Means for Site Owners

Fix the rendering problem before investing in the signaling layer. Server-side rendering, static HTML export, and dynamic rendering for identified bot user agents all address the same constraint: getting content into the initial server response. Dynamic rendering is often the most incremental path for an existing deployment: detect AI user agents and serve them pre-rendered HTML while human visitors continue to receive the unchanged client-side application. Google's own documentation, however, describes dynamic rendering as a workaround rather than a long-term solution, so server-side or static rendering remains the more durable fix.
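The dynamic rendering pattern can be sketched as a small WSGI application in Python. This is an illustration under stated assumptions, not a production setup: `prerendered` is a hypothetical cache mapping paths to HTML produced ahead of time (for example by a headless browser at build time), the token list is a hypothetical selection from the crawlers named in this article, and matching on User-Agent alone can be spoofed, so real deployments typically also verify crawler identity.

```python
from typing import Callable, Dict, Iterable

# Hypothetical token list drawn from the crawlers named in this article; not exhaustive.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
                     "Meta-ExternalAgent", "Bytespider")


def is_ai_crawler(user_agent: str) -> bool:
    """True if the User-Agent contains one of the tracked AI crawler tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def make_app(prerendered: Dict[str, str], spa_shell: str) -> Callable:
    """Build a WSGI app: AI crawlers get pre-rendered HTML, everyone else the SPA shell."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "/")
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_ai_crawler(user_agent) and path in prerendered:
            body = prerendered[path]
        else:
            body = spa_shell  # human visitors keep the client-side application
        data = body.encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(data))),
            ("Vary", "User-Agent"),  # tell caches the response differs by User-Agent
        ])
        return [data]

    return app
```

For local experiments, `wsgiref.simple_server.make_server("", 8000, make_app(cache, shell)).serve_forever()` serves the app; the `Vary` header keeps shared caches from handing crawler HTML to browsers or the reverse.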

For llms.txt, the 39.6% auto-generation figure shows that many existing implementations required no deliberate action. For sites without one, publishing is low-cost, but given SE Ranking's finding of no measurable citation effect, the case for it is positioning for possible future adoption if AI operators commit to the specification, not near-term visibility. The 97.87% of desktop sites without llms.txt are unlikely to be materially disadvantaged by its absence today. By contrast, the estimated 50–80% content gap on JavaScript-heavy sites that have not addressed static rendering is an active delivery failure on every crawl.
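For sites that do decide to publish, the file is short enough to generate from a site's own page inventory. The Python sketch below assumes the layout of the llms.txt proposal (H1 title, blockquote summary, H2 sections of markdown link lists), with the output served as plain text at `/llms.txt`. The function name, section names, and URLs are hypothetical.

```python
from typing import Dict, List, Tuple

# (link title, URL, optional note); an empty note omits the ": note" suffix.
Link = Tuple[str, str, str]


def build_llms_txt(title: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, then H2 link-list sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for name, url, note in links:
            lines.append(f"- [{name}]({url})" + (f": {note}" if note else ""))
        lines.append("")
    return "\n".join(lines)
```

Calling `build_llms_txt("Example Shop", "Kitchenware retailer.", {"Products": [("Kettles", "https://example.com/kettles", "Full catalogue")]})` produces a file whose first line is `# Example Shop` and whose Products section lists `- [Kettles](https://example.com/kettles): Full catalogue`. Generating the file does nothing for pages whose static HTML is empty, so it belongs after the rendering fix.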

Sources

  1. 2025 Web Almanac: Generative AI Chapter
  2. 2025 Web Almanac: SEO Chapter
  3. The Rise of the AI Crawler — Vercel / MERJ, December 2024
  4. Crawling a Million Websites in Search of LLMs.txt
  5. LLMs.txt Shows No Clear Effect On AI Citations, Based On 300k Domains