Crawler Accessibility · June 18, 2025

0 of the Top 1,000 Sites Use llms.txt — And AI Crawlers Still Can’t Render JavaScript

Not one of the world’s top 1,000 most-visited sites publishes an llms.txt file, and every major AI crawler skips JavaScript execution entirely. Data from June 2025 scans and a 500-million-request study explain the two-gap problem.

Effectively none of the world’s 1,000 most-visited websites publish an llms.txt file. Rankability’s June 2025 scan of the top 1,000 most-trafficked global domains, a systematic HTTP check of /llms.txt on each property, found adoption of roughly 0.3%, or about three sites. Against a wider backdrop of 10.13% adoption across roughly 300,000 domains studied in November 2025, the near-zero top-tier figure looks like a contradiction. It is not. It is the expected result when the sites with the most engineering resources are the slowest to adopt standards that primarily benefit smaller, more agile teams. And it compounds a second, more technically severe problem: the major AI crawlers operating today fetch static HTML and do not execute JavaScript. Any content that JavaScript generates after page load is invisible to them.

Method

Four sources underpin this post. Rankability publishes monthly llms.txt adoption data across both the top 1,000 global sites and a broader corpus of approximately 300,000 domains. Vercel and MERJ published an analysis of 500+ million GPTBot fetch requests recorded at the edge in early 2025. Adobe Digital Insights produced a report on U.S. retail site AI readability. Presenc AI’s State of llms.txt 2026 report covers sector-level adoption data through Q1 2026. Where ranges are reported, figures below use the midpoint or conservative bound.

The llms.txt Gap: Top Sites Trail the Web by 30×

llms.txt Adoption by Site Tier (Nov 2025)

Top-1,000 websites trail the broader web by roughly 30× (about 0.3% versus 10.13%). Larger sites are paradoxically slower to adopt the standard.

Source: Rankability LLMS.txt Adoption Research Report

Rankability’s June 2025 scan found approximately 0.3% llms.txt adoption among the top 1,000 domains — roughly three sites. The same team’s November 2025 broader scan (approximately 300,000 domains) showed 10.13% adoption overall. The traffic-tier breakdown is equally unusual: low-traffic sites (0–100 visits) adopted at 9.88%, mid-traffic (1,001–5,000 visits) at 10.54%, and high-traffic sites (100,001+ visits, excluding the top 1,000) at 8.27%.
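The "30×" gap in the heading follows directly from the two adoption rates quoted above. A minimal arithmetic sketch, using only those two figures (the variable names are illustrative):

```python
# Adoption rates quoted in the text, in percent.
top_1000_rate = 0.3    # Rankability, top 1,000 domains (June 2025)
broad_rate = 10.13     # Rankability, ~300,000 domains (November 2025)

# 0.3% of 1,000 domains is about three sites.
top_sites_with_file = round(1000 * top_1000_rate / 100)

# How many times higher adoption is on the broader web.
gap = broad_rate / top_1000_rate

print(top_sites_with_file)  # 3
print(round(gap, 1))        # 33.8
```

The ratio works out closer to 34× than 30×, so the headline figure is best read as a rounded, conservative bound. Because the two scans were run five months apart, the comparison is indicative rather than exact.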

The implication is that smaller sites are running ahead of large ones. This reflects the adoption curve for many developer-originated standards: technically sophisticated teams at early-stage companies implement them quickly because the cost is low and the signal value is high. Established enterprises with large engineering backlogs and legal review cycles move slower.

A separate data point from Rankability’s research is worth noting: statistical analysis of link datasets finds no measurable correlation between llms.txt publication and frequency of AI citation. The file is a machine-readable hint, not a ranking factor. This does not make it worthless, since a well-structured llms.txt reduces the chance that crawlers index low-quality pages instead of canonical content. It does mean that deferring it in order to prioritize rendering fixes is the right call for most engineering teams.

The Rendering Gap: JavaScript Execution Is Zero

GPTBot Interaction With JavaScript Files (500M+ Requests)

GPTBot downloads JavaScript files in 11.5% of requests, apparently as code training data, and executes JavaScript in 0% of them.

Source: Vercel and MERJ AI Crawler Analysis

Vercel and MERJ’s dataset is among the most comprehensive published studies of AI crawler behavior: 500+ million GPTBot requests observed at the edge. The JavaScript execution rate is zero. GPTBot does download JavaScript files in 11.5% of requests, but apparently only to collect the code as text, plausibly as training data for models that work with programming languages. There is no evidence of a runtime being invoked: no DOM mutations were observed and no event listeners were triggered.

The same pattern is reported for the other major AI crawlers: ClaudeBot, PerplexityBot, Meta’s ExternalAgent, and Bytespider fetch what the server returns and do not render it. Googlebot is the notable exception, using a deferred two-phase rendering pipeline that invokes a headless browser hours or days after initial discovery. The AI-specific crawlers named here see only what the server delivers in the initial HTML response.

For a React or Next.js site configured for client-side rendering, that initial response is typically a shell: a <head> block, a root <div>, and a set of <script> tags pointing at bundled JavaScript. The content those scripts would render — product descriptions, pricing, review text, FAQ answers, structured comparisons — is absent from the static payload. As far as AI crawlers are concerned, the page is blank.

AI Crawler Traffic Is Growing Faster Than Rendering Can Keep Up

AI Crawler Traffic Index: May 2024 vs May 2025

96% year-over-year growth. AI crawlers now represent approximately 28% of total crawl traffic across major sites.

Source: Composite CDN and log-analysis providers

AI crawler traffic grew 96% between May 2024 and May 2025, based on aggregate data from CDN and server log analysis sources. AI crawlers now represent approximately 28% of total crawl traffic across major properties, up from single-digit percentages two years prior. For retail specifically, one analysis found AI bots crawling product pages at 198 times the rate of Googlebot during the same period.

The gap between crawl volume and content visibility has widened, not narrowed. Sites that have not addressed client-side rendering are receiving more bot requests than ever, and those requests are returning empty HTML at an accelerating rate. Bandwidth and server capacity are consumed; the crawlers log a 200 OK response, record an empty document, and move on.

Adobe’s Digital Insights team examined U.S. retail sites for AI readiness and found that large portions of these properties were not fully machine-readable, with JavaScript-dependent rendering cited as the primary driver. The same sites are among the heaviest recipients of AI crawler traffic.

Sector-Level Adoption

llms.txt Adoption by Sector, Top-100 Domains (Q1 2026)

Technology and blockchain lead at ~20-30%; media, legal, and healthcare trail below 10%. Approximate midpoints of stated ranges.

Source: Presenc AI State of llms.txt 2026

Presenc AI’s Q1 2026 sector analysis provides the clearest breakdown of where llms.txt adoption is concentrated. Technology, cybersecurity, and blockchain cluster at roughly 20–30% adoption among top-100 domain holders in those categories. Media, legal, and healthcare properties land under 10%. Consumer retail, entertainment, and social platforms are not broken out separately but align with the near-zero top-site figure from Rankability’s scan.

The sectors with the highest AI crawler exposure — retail, news, media — have the lowest llms.txt adoption and, separately, the most JavaScript-dependent front-end architectures. The sectors with the highest adoption — developer tools, security research, technical documentation — are also most likely to use static generation or server-side rendering by default. The result is an inversion: those who need AI crawler optimization least are doing the most of it.

What This Means for Site Owners

Fix rendering before shipping llms.txt. The rendering gap is a hard technical barrier: if JavaScript generates your content, AI crawlers simply do not see it, and a well-written llms.txt cannot fix that. Render key pages (product descriptions, pricing, FAQs, knowledge base articles, category pages) server-side or at build time. For teams that cannot refactor rendering architecture quickly, serve pre-rendered HTML to detected crawler user agents. The user agents to target include GPTBot, ClaudeBot, PerplexityBot, Perplexity-User, Bytespider, and Meta-ExternalAgent.
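The crawler-detection step can be sketched as a small, framework-independent check on the User-Agent header. This is a minimal illustration, not a hardened implementation: the token list mirrors the user agents named above (with `Perplexity-User`, the form Perplexity documents for user-initiated fetches), and the function names `is_ai_crawler` and `choose_variant` are hypothetical. User-agent strings can be spoofed, so a production setup would likely pair this with IP-range verification.

```python
import re
from typing import Optional

# Tokens taken from the list in the text above. Real crawler user-agent
# strings carry extra details (versions, contact URLs), so match by substring.
AI_CRAWLER_TOKENS = (
    "GPTBot",
    "ClaudeBot",
    "PerplexityBot",
    "Perplexity-User",
    "Bytespider",
    "Meta-ExternalAgent",
)

_CRAWLER_PATTERN = re.compile(
    "|".join(re.escape(token) for token in AI_CRAWLER_TOKENS),
    re.IGNORECASE,
)


def is_ai_crawler(user_agent: Optional[str]) -> bool:
    """Return True if the User-Agent header contains a known AI crawler token."""
    if not user_agent:
        return False
    return _CRAWLER_PATTERN.search(user_agent) is not None


def choose_variant(user_agent: Optional[str]) -> str:
    """Pick which response to serve: 'prerendered' HTML or the 'client' app shell."""
    return "prerendered" if is_ai_crawler(user_agent) else "client"
```

In a request handler, `choose_variant(request_headers.get("User-Agent"))` would decide whether to return the pre-rendered snapshot or the normal client-side bundle. The pre-rendered HTML should carry the same content that users see, only delivered without requiring JavaScript.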

Run the baseline audit now. Fetch your highest-value pages with a curl command that sends an AI crawler user agent: curl -A "GPTBot/1.0" https://yourdomain.com/key-page. Inspect the response body length and content. If the body is under 2 KB or contains only a root element and script tags, the page is invisible to AI crawlers regardless of any other optimization. Every engineering team running a JavaScript-heavy front end should run this check before considering any other AI visibility work.
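For more than a handful of pages, the same check can be scripted. The sketch below uses only the Python standard library. The 2 KB threshold comes from the text above; the 200-character visible-text threshold and the function names (`fetch_as_crawler`, `visible_text`, `audit_html`) are assumptions made for illustration, and the regex-based tag stripping is a rough heuristic rather than a real HTML parser.

```python
import re
import urllib.request


def fetch_as_crawler(url, user_agent="GPTBot/1.0", timeout=10):
    """Fetch a URL with a crawler-style User-Agent and return the body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def visible_text(html):
    """Approximate the text a non-rendering crawler can read from raw HTML."""
    # Remove script and style blocks entirely, then strip the remaining tags.
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    without_tags = re.sub(r"(?s)<[^>]+>", " ", without_code)
    return " ".join(without_tags.split())


def audit_html(html, min_bytes=2048, min_text_chars=200):
    """Flag responses that look like an empty client-side rendering shell."""
    body_bytes = len(html.encode("utf-8"))
    text_chars = len(visible_text(html))
    return {
        "bytes": body_bytes,
        "text_chars": text_chars,
        # min_text_chars is an illustrative threshold, not a published figure.
        "likely_shell": body_bytes < min_bytes or text_chars < min_text_chars,
    }
```

A typical run would be `audit_html(fetch_as_crawler("https://yourdomain.com/key-page"))`. A result with `likely_shell` set to `True` is a prompt to view the raw response by hand, not a verdict, since short but legitimate pages will also trip the thresholds.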

On llms.txt: publish it, but do not prioritize it over rendering. The file is low-cost to produce — a minimal version pointing to your sitemap and a few canonical sections takes under an hour. It has plausible value as the ecosystem evolves and crawlers develop more sophisticated access protocols. The current data does not support it having a measurable impact on AI citations today. Treat it as insurance, not a priority.
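A minimal file of the kind described above can be generated from a short list of canonical pages. The sketch follows the layout of the public llms.txt proposal as commonly described (a title heading, a one-line summary in a blockquote, then sections of links); the function name `build_llms_txt` and every URL and label in the usage example are hypothetical placeholders.

```python
def build_llms_txt(title, summary, sections):
    """Build llms.txt content from a title, a summary, and link sections.

    `sections` maps a section heading to a list of (name, url, note) tuples;
    `note` may be an empty string.
    """
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example content; replace with your own canonical pages.
example = build_llms_txt(
    title="Example Co",
    summary="Product documentation and pricing for Example Co.",
    sections={
        "Core pages": [
            ("Pricing", "https://example.com/pricing", "Plans and prices"),
            ("FAQ", "https://example.com/faq", ""),
        ],
        "Optional": [
            ("Sitemap", "https://example.com/sitemap.xml", "Full URL list"),
        ],
    },
)
print(example)
```

The output is plain text meant to be served at /llms.txt on the site root. Generating it from the same source as the sitemap keeps the two from drifting apart.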