Crawler Accessibility · May 7, 2026

llms.txt Adoption in 2025: 0% of the Top 1,000 Sites, ~10% of the Broader Web

A scan of the top 1,000 most-visited sites finds zero valid llms.txt files. A separate study of 300,000 domains puts adoption at ~10%. Here's what the data actually shows — and why the gap matters for AI crawler accessibility.

Zero. That's how many of the top 1,000 most-visited websites globally publish a valid /llms.txt file, according to a monthly scan by Rankability last updated in June 2025. A broader SE Ranking study of nearly 300,000 domains puts adoption at 10.13% — a number that sounds more encouraging until you notice that high-traffic sites are less likely to adopt it than mid-tier ones. The standard proposed in September 2024 has spread through developer tooling and documentation platforms at speed, but it has essentially zero footprint among mainstream consumer destinations. At the same time, the AI crawlers that would theoretically benefit from these files are generating hundreds of millions of raw-HTML fetches every month — and virtually none of them execute JavaScript. The two problems are related.

Method

Data in this post comes from three published datasets: Rankability's automated monthly scan of the top 1,000 globally ranked domains (checking for HTTP 200 responses with valid text content at /llms.txt); SE Ranking's November 2025 study of approximately 300,000 domains across traffic tiers; and Cloudflare Radar's AI crawler traffic breakdowns (July 2025, via SoftwareSeni's analysis), cross-referenced with Vercel's own crawler analysis. We also checked the official llms.txt specification at llmstxt.org and the ALLMO.ai dataset covering 94,614 cited URLs across 11,867 AI responses collected between August and December 2025. Every URL fetched for this post is logged as a source below.

Finding 1: The top 1,000 have not adopted llms.txt

<cite index="1-11">A comprehensive scan of the top 1,000 websites reveals that not a single major platform has implemented llms.txt transparency standards.</cite> <cite index="3-10">Zero major consumer platforms like Google, Facebook, Amazon, or mainstream news sites have adopted it.</cite> Rankability's tracker counts a site only when /llms.txt returns a valid HTTP 200 response with text content; redirects and HTML responses are excluded. It reported a 0.3% rate across the top 1,000 (three sites) as of June 2025, and that rate had fallen back to 0% by the time of publication. Even the brief exceptions didn't last: <cite index="6-16">Walmart had an /llms.txt page publicly available in November 2025, but a follow-up check in January 2026 showed it had been removed.</cite>
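The validity rule described above (a direct HTTP 200 with text content, no redirects, no HTML) can be sketched in a few lines. This is an illustration of that rule, not Rankability's actual scanner: the function names `is_valid_llms_txt` and `check_domain`, the 64 KB read cap, and the HTML sniffing heuristic are all assumptions.

```python
import urllib.error
import urllib.request


def is_valid_llms_txt(status: int, content_type: str, body: str) -> bool:
    """Return True only for a direct HTTP 200 carrying non-empty, non-HTML text."""
    if status != 200:  # redirects (3xx) and errors do not count
        return False
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/html", "application/xhtml+xml"):
        return False
    # Assumed heuristic: some servers label an HTML error page as text/plain.
    head = body.lstrip()[:200].lower()
    if head.startswith("<!doctype html") or head.startswith("<html"):
        return False
    return bool(body.strip())


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 3xx surfaces as an HTTPError."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def check_domain(domain: str, timeout: float = 10.0) -> bool:
    """Fetch https://<domain>/llms.txt without following redirects and validate it."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(f"https://{domain}/llms.txt", timeout=timeout) as resp:
            body = resp.read(65536).decode("utf-8", errors="replace")
            content_type = resp.headers.get("Content-Type", "")
            return is_valid_llms_txt(resp.status, content_type, body)
    except (urllib.error.URLError, OSError):
        # HTTP errors, refused redirects, DNS failures, and timeouts all count as "no file".
        return False
```

Calling `check_domain("example.com")` would perform one network request; the pure `is_valid_llms_txt` function can be exercised offline against recorded responses.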

This is not simply a measurement artefact of looking at consumer properties. <cite index="3-14">The pattern is unmistakable: llms.txt has achieved rapid penetration in developer tools, AI companies, technical documentation, and SaaS platforms, while remaining completely absent from the mainstream web.</cite> <cite index="5-32">Over 844,000 websites had already implemented it according to BuiltWith's tracking as of October 25, 2025.</cite> The volume is real, but the distribution is narrow. <cite index="3-1">An independent crawl of the Majestic Million dataset found just 15 sites in February 2025 growing to 105 sites by May — a 600% increase from a near-zero base.</cite>

Chart: llms.txt Adoption, Top 1,000 Sites vs. Broader Web (2025)

Zero of the top 1,000 most-visited domains publish a valid llms.txt (Rankability, June 2025). Across ~300,000 domains surveyed by SE Ranking, ~10% do. The gap shows the standard lives almost exclusively outside the mainstream web.

Source: Rankability (top-1000 scan, June 2025) + SE Ranking (300K domain study, Nov 2025)

The Majestic Million growth curve and BuiltWith's 844K figure are not contradictory. The former covers only the very top of the domain authority distribution, while the latter counts any site with the file and is heavily weighted toward developer documentation hosted on platforms like Mintlify. <cite index="9-1,9-2">Adoption shifted sharply in November 2024 when Mintlify made a strategic decision to roll out support for /llms.txt across all docs sites it hosts, and practically overnight thousands of docs sites began supporting llms.txt.</cite> That single platform event likely accounts for a large fraction of the total count.

Finding 2: Adoption is flat across traffic tiers — and high-traffic sites are slightly behind

<cite index="2-10,2-11">Only about 1 in 10 websites use llms.txt. In a dataset of nearly 300,000 domains, just 10.13% had an llms.txt file in place.</cite> The more striking finding is the distribution across traffic tiers.

Chart: llms.txt Adoption Rate by Site Traffic Tier

Adoption is nearly flat regardless of traffic volume — high-traffic sites are marginally less likely to publish llms.txt than smaller sites. Across ~300,000 domains, overall adoption is ~10%.

Source: SE Ranking, analysis of ~300,000 domains (Nov 2025)

<cite index="2-15">Adoption rates were: websites with low traffic (0–100 visits): 9.88%; mid traffic (1,001–5,000 visits): 10.54%; high traffic (100,001+ visits): 8.27%.</cite> <cite index="2-17,2-18">Surprisingly, the biggest, most established sites are actually slightly less likely to use the file than mid-tier ones. The use of llms.txt isn't concentrated among industry leaders.</cite> If llms.txt were delivering a measurable competitive advantage in AI citations, you would expect the high-traffic cohort — which has the most to gain from AI search visibility — to be over-represented. It is not.

Finding 3: No measured effect on AI citation frequency

<cite index="2-13,2-14">There's no correlation between AI citations and llms.txt. Both statistical analysis and machine learning showed no effect of llms.txt on how often a domain is cited by LLMs.</cite> A separate analysis from ALLMO.ai, using a proprietary dataset, found the same pattern. <cite index="6-7,6-8">Across 94,614 cited URLs from 11,867 AI responses monitored between August and December 2025, the presence of llms.txt appeared in a negligible fraction of cited sources.</cite> <cite index="6-25,6-26">If ~10% of websites have adopted llms.txt, you would expect roughly ~10% of cited sources to include it — instead, the file was found in less than 1% of the 120 analyzed websites.</cite>
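The expectation in that comparison is simple arithmetic, and a rough sketch makes the size of the gap concrete. The snippet assumes, unrealistically, that cited sites are an independent random sample of the web at the 10.13% baseline, and it reads "less than 1% of the 120 analyzed websites" as at most one site. Both are assumptions made for illustration and are not part of either study.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Probability of at most k successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_sites = 120       # analyzed websites (from the ALLMO.ai figures quoted above)
baseline = 0.1013   # SE Ranking's overall adoption rate
observed_max = 1    # assumption: "less than 1% of 120" means at most one site

expected = n_sites * baseline
p_at_most_observed = binomial_cdf(observed_max, n_sites, baseline)

print(f"Expected sites with llms.txt at baseline: {expected:.1f}")
print(f"P(at most {observed_max} of {n_sites}) under that assumption: {p_at_most_observed:.2e}")
```

Under those assumptions roughly a dozen of the 120 sites would be expected to have the file, so an observation of one or none would be very unlikely to arise by chance. Because cited sources skew toward large sites, the independence assumption is the weakest part of this sketch.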

<cite index="2-32">As of now, there's no clear evidence that major AI platforms are actively using llms.txt in their data pipelines: Google has stated that its AI Overviews and AI Mode continue to rely on traditional SEO signals instead of llms.txt.</cite> <cite index="4-37,4-38">According to Search Engine Land, 8 out of 9 sites saw no measurable change in traffic after llms.txt implementation. John Mueller also reinforced this point, saying that none of the AI crawlers have claimed they extract information via llms.txt yet.</cite>

That said, there is one use case with both observed traffic and a technical rationale. <cite index="27-4">AI agents are actively visiting a site's llms-full.txt over twice as much as the standard llms.txt file.</cite> For documentation-heavy products where AI coding assistants need to load context at inference time, the file is plausibly useful. The benefit is not that it changes what a training crawler ingests, but that it gives agents a single endpoint to retrieve the full doc set without following hundreds of links.

Finding 4: The crawlers arriving at your site can't render JavaScript

Even if you perfect your llms.txt, the crawlers visiting your pages face a more fundamental problem. <cite index="20-4,20-5">While AI crawlers do fetch JavaScript files — with one major AI assistant's crawler fetching JavaScript in 11.50% of requests and another in 23.84% — they don't execute them. They can't read client-side rendered content.</cite>

Chart: AI Crawler Share of All Crawler Traffic (July 2025)

Training-purpose bots dominate. GPTBot and ClaudeBot together account for over half of all AI-only crawler traffic; both fetch raw HTML without executing JavaScript.

Source: Cloudflare Radar (via SoftwareSeni analysis, July 2025)

<cite index="40-11">Analysis of both Next.js applications and traditional web applications shows consistently that none of the major AI crawlers currently render JavaScript.</cite> The practical consequence: <cite index="11-7">if your product listings, prices, or "Add to Cart" buttons are rendered by JavaScript, AI crawlers see a blank slate — maybe just a header and footer.</cite> <cite index="12-4">Your SPA can rank position one on Google while being completely blank to every AI search system simultaneously.</cite>

<cite index="31-4">The AI crawler landscape saw significant shifts between May 2024 and May 2025, with the leading training crawler surging from 5% to 30% share of AI-only crawler traffic.</cite> <cite index="32-15">Over the past 12 months, 80% of AI crawling was for training, compared with 18% for search and just 2% for user actions.</cite> The crawlers doing the most fetching are also the ones with the worst crawl-to-referral ratios: <cite index="39-16,39-17,39-18">the worst ratio belongs to one major AI assistant's training crawler at 38,065:1 (July 2025), while the best among pure AI companies is approximately 195:1, with a major AI search product sitting between at approximately 1,091:1.</cite>
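Crawl-to-referral ratios are easier to compare when inverted into referrals per million fetches. The short sketch below does only that conversion on the three ratios quoted above; the labels and the helper name `referrals_per_million` are illustrative.

```python
def referrals_per_million(crawls_per_referral: float) -> float:
    """Convert an N:1 crawl-to-referral ratio into referrals per one million fetches."""
    return 1_000_000 / crawls_per_referral


# Ratios as quoted in the text (July 2025); labels are descriptive only.
ratios = {
    "worst (training crawler)": 38_065,
    "AI search product": 1_091,
    "best among pure AI companies": 195,
}

for label, ratio in ratios.items():
    print(f"{label}: about {referrals_per_million(ratio):,.0f} referrals per million fetches")
```

On these figures the best ratio returns roughly 195 times more visits per fetch than the worst (38,065 / 195), which is the spread the prioritization advice later in the post relies on.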

The JavaScript gap is not theoretical. <cite index="8-2">GPTBot's share of robots.txt encounters rose from 2.9% on desktop in 2024 to 4.5% in 2025 — a ~55% increase.</cite> More crawlers, more often, all hitting your raw HTML. <cite index="8-15">Another major AI crawler's bot nearly doubled its robots.txt encounter share, from under 2% on desktop in 2024 to 3.6% in 2025.</cite> These bots are now a meaningful fraction of crawler traffic, and they will miss any content that lives behind a client-side render.

What this means for site owners

Fix rendering before adding a navigation file. llms.txt is a map to your content. If the content itself is invisible — because it loads client-side in a React SPA, behind a lazy-load trigger, or inside a JavaScript-injected component — the map points to nothing. <cite index="16-9">For pages that require dynamic functionality alongside static content, deliver the substantive content — headings, body copy, product descriptions, FAQs, pricing context — in the initial HTML response, and layer dynamic enhancements on top via JavaScript.</cite> Run curl -A "GPTBot/1.1" https://yoursite.com/your-key-page to see exactly what AI crawlers receive. <cite index="12-7">If you see only an empty div and script tags, you are running client-side rendering and AI crawlers see nothing.</cite>
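The same check can be scripted so it runs against a list of key pages. The sketch below fetches raw HTML with the user-agent string from the curl command above, strips script and style content, and reports which must-have phrases are absent from the text a non-rendering crawler would receive. The function names and the phrase list are hypothetical, and real crawlers send longer user-agent strings and may be served differently by some sites, so treat the result as an approximation.

```python
import urllib.request
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring anything inside script, style, or template tags."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text present in the HTML without executing any JavaScript."""
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """List the phrases that do not appear in the non-rendered text (case-insensitive)."""
    text = visible_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


def fetch_raw_html(url: str, user_agent: str = "GPTBot/1.1", timeout: float = 10.0) -> str:
    """Fetch a page the way a non-rendering crawler would: one GET, no JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

A typical use would be `missing_phrases(fetch_raw_html("https://yoursite.com/your-key-page"), ["Add to Cart", "Pricing"])`; any phrase returned is content that exists only after client-side rendering.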

Treat llms.txt as a low-cost hedge, not a priority project. The data is consistent: there is currently no measurable citation lift from publishing the file, and adoption among high-authority sites is low. But the implementation cost is also genuinely low: <cite index="5-5">implementation takes 1–4 hours with no demonstrated downside if platforms eventually adopt the standard.</cite> For teams managing documentation or API references, the case is stronger, with one caveat: <cite index="25-8,25-9,25-10">the challenge with llms.txt is freshness: it's a static file, and when you publish a new blog post, add a service page, or update content, the llms.txt goes stale.</cite> Automate generation (Mintlify, Yoast, and GitBook all do this natively) so the file does not become a maintenance burden, because a stale index actively misleads AI agents about what your site contains.
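For sites not on one of those platforms, generation can be a small build step. The sketch below renders an llms.txt body from a page list, following the general shape described at llmstxt.org (an H1 title, a blockquote summary, then H2 sections of markdown links). The `Page` class, the `render_llms_txt` function, and the example URLs are hypothetical; a real pipeline would pull the page list from the CMS or sitemap on every deploy.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str = ""
    section: str = "Docs"


def render_llms_txt(site_name: str, summary: str, pages: list[Page]) -> str:
    """Render an llms.txt document: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]

    # Group pages by section, preserving first-seen order.
    sections: dict[str, list[Page]] = {}
    for page in pages:
        sections.setdefault(page.section, []).append(page)

    for section, items in sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for page in items:
            entry = f"- [{page.title}]({page.url})"
            if page.description:
                entry += f": {page.description}"
            lines.append(entry)
        lines.append("")

    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example Docs",
    "Hypothetical product documentation.",
    [
        Page("Quickstart", "https://example.com/quickstart.md", "Install and run"),
        Page("API", "https://example.com/api.md", section="Reference"),
    ],
)
print(example)
```

Regenerating the file from the same source of truth as the site itself is what keeps it from going stale; writing it by hand reintroduces the freshness problem.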

Prioritize the crawlers that send traffic back. <cite index="32-15,32-16">Over the past 12 months, 80% of AI crawling was for training, with only 18% for search and 2% for user actions. In the last six months, training's share rose to 82%.</cite> Training crawlers have no structural incentive to send visitors back to your site. The user-action segment — the bots that fetch your page in real time when someone asks an AI assistant a question — is still small but growing fast. <cite index="33-5,33-6">User-action crawling volume increased more than 15 times from January through early December 2025, closely matching the traffic pattern for the crawler that visits pages when users ask questions of a major AI assistant.</cite> That segment is where rendering quality converts directly into AI-generated referrals. Get your critical pages into the initial HTML response, then worry about the navigation file.