Crawler Accessibility · June 22, 2026

llms.txt Has 10% Adoption and No Proven Effect on AI Citations

Among the 50 domains most cited by AI assistants, only one publishes an llms.txt file. A 300,000-domain study found no correlation between the file and AI citation frequency. Here is what the data shows.

Among the 50 domains most frequently cited by AI assistants across 2025 and 2026, exactly one publishes an llms.txt file. SE Ranking's study of 300,000 domains found no statistically significant correlation between publishing the file and appearing in AI-generated answers. Adding llms.txt presence as a feature to their citation-prediction model actually decreased accuracy, which suggests the file contributes noise rather than signal. Adoption reached 10.13% of the measured web after 18 months of industry conversation, but the evidence that it changes what AI systems cite has not materialized.

Method

Three data sources underpin this analysis. SE Ranking examined approximately 300,000 domains, testing 14 domain-level features including llms.txt presence against AI citation frequency using Pearson correlation and an XGBoost gradient-boosted model; findings were published in November 2025. Ahrefs monitored /llms.txt request traffic across 137,000 domains in their user panel through May 2026, capturing total request volume and originating bot category for files that received any traffic. A 2026 AI crawler operator reference compiled official statements from major AI crawler operators on which file standards each system actively processes.

Adoption Versus AI Citation Presence

llms.txt Adoption Across Domain Populations (2025–2026)

Adoption is highest among technical and developer-facing panels. The 50 domains most cited by AI assistants carry only a 2% adoption rate, well below the 10.13% measured across the broader web.

Source: SE Ranking llms.txt Study 2025; Ahrefs llms.txt Analysis May 2026

The adoption figures reveal an inverse pattern relative to citation outcomes. SE Ranking's broad sample shows 10.13% of domains publish llms.txt — roughly one in ten sites. Among the Ahrefs panel, which skews toward developers and SEO practitioners, adoption was 28%. Both figures reflect growth well beyond early-mover pioneers. Yet the 50 domains SE Ranking found most frequently cited by AI assistants carried only a 2% adoption rate — one domain in fifty.

The pattern is consistent with SE Ranking's model-level finding. Their XGBoost model, trained to predict which domains receive AI citations, produced lower accuracy when llms.txt presence was included as a feature than when it was excluded. Any association between the file and citation frequency in their dataset is slightly negative and not statistically significant. Citation-leading domains are large, heavily indexed, freshly updated, and present across multiple platforms. These properties predate the llms.txt standard and operate independently of it.

What Web AI Crawlers Actually Do at /llms.txt

No major AI company has publicly committed to reading llms.txt as a signal in production search or answer systems. Google's search liaison has confirmed that Google does not support the file and is not planning to. GPTBot, ClaudeBot, OAI-SearchBot, PerplexityBot, and Google-Extended all bypass the file in typical crawl runs and retrieve HTML content directly. The Ahrefs request logs confirm the gap: among the fraction of the 137,000-domain panel whose llms.txt files received any traffic, AI search bots (the category that generates citations) sent 1% of requests. SEO audit tools sent 21% and unidentified bots 14%, a combined 35% that dwarfs the AI search share.
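A site owner can run the same kind of breakdown on their own access logs. The sketch below is a minimal version that assumes the log has already been parsed into (path, user_agent) pairs. The category names and the substring lists are illustrative assumptions, not Ahrefs' classification, and should be checked against each operator's published user-agent documentation.

```python
from collections import Counter

# Illustrative substring lists (assumptions); extend them from operator docs.
AI_SEARCH_BOTS = ("oai-searchbot", "perplexitybot")
AI_TRAINING_BOTS = ("gptbot", "claudebot", "ccbot")
SEO_TOOLS = ("ahrefsbot", "semrushbot")


def categorize(user_agent: str) -> str:
    """Map a raw User-Agent header to a coarse bot category."""
    ua = user_agent.lower()
    if any(token in ua for token in AI_SEARCH_BOTS):
        return "ai_search"
    if any(token in ua for token in AI_TRAINING_BOTS):
        return "ai_training"
    if any(token in ua for token in SEO_TOOLS):
        return "seo_tool"
    return "other"


def llms_txt_request_shares(requests):
    """Return {category: percent of /llms.txt requests} for (path, ua) pairs."""
    counts = Counter(
        categorize(ua) for path, ua in requests if path == "/llms.txt"
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100 * n / total for category, n in counts.items()}
```

If the ai_search share for /llms.txt is close to zero while the same bots request HTML pages regularly, the log tells the same story as the panel data.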

The only file standard with documented support from every major web AI crawler is robots.txt. The operators of all major crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended, Bytespider, Meta-ExternalAgent, CCBot, and Applebot) publish user-agent strings and document support for Disallow directives. The llms.txt specification has no equivalent operator commitment from any of them.
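To make the robots.txt mechanism concrete, here is a small sketch using Python's standard urllib.robotparser to evaluate a rule set against a crawler's user-agent token. The robots.txt content, the /private/ path, and the example.com URLs are hypothetical. The parser only shows how a compliant crawler would interpret the rules; it does not prove that any particular crawler complies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot from /private/, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/private/report.html"))
print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/docs/"))
print(can_crawl(ROBOTS_TXT, "ClaudeBot", "https://example.com/private/report.html"))
```

The first call is denied by the GPTBot group, the second is permitted because no GPTBot rule matches /docs/, and the third falls through to the wildcard group.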

Where llms.txt Does Work: AI Coding Agents

Official /llms.txt Processing by AI System Type (2026)

Five major AI coding assistants actively fetch /llms.txt for documentation ingestion. Zero of the nine major web AI search and training crawlers have issued a public commitment to read it.

Source: AI User-Agent Landscape 2026; operator documentation

The picture changes entirely for AI coding assistants. Cursor, GitHub Copilot, Aider, Cline, and Windsurf all actively read /llms.txt and /llms-full.txt when a developer points them at a documentation domain. When a developer asks a coding assistant to work with an unfamiliar API, the assistant fetches the /llms.txt manifest to get a structured index of documentation URLs, then ingests the relevant content. The file serves this purpose well: it provides the routing layer the agent needs to find the right documentation pages without parsing the full HTML site structure.
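The routing step can be sketched as a small parser. This assumes the layout described in the llms.txt proposal: an H1 title, an optional blockquote summary, and H2 sections containing markdown link lists. The sample file, the example.com URLs, and the function name parse_llms_txt are hypothetical, and the source does not document how any specific coding assistant actually parses the file.

```python
import re

# Matches list items such as "- [Title](https://example.com/page.md): note"
LINK_RE = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)

# Hypothetical manifest for illustration only.
SAMPLE_LLMS_TXT = """\
# Example API

> Hypothetical payments API used for illustration.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): First request in five minutes
- [Authentication](https://example.com/docs/auth.md)

## Optional

- [Changelog](https://example.com/changelog.md): Release history
"""


def parse_llms_txt(text: str) -> dict:
    """Return {section heading: [(title, url), ...]} for each H2 section."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append((match["title"], match["url"]))
    return sections


index = parse_llms_txt(SAMPLE_LLMS_TXT)
for section, links in index.items():
    print(section, [url for _, url in links])
```

An agent working from an index like this can fetch only the pages relevant to the developer's question instead of crawling the full HTML site.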

This is the actual production use case for the file. The early advocates for llms.txt were documentation-heavy projects — developer tooling, API platforms, libraries — where the target audience is developers using coding agents, not users of AI search engines. Sites like Stripe, Vercel, and Supabase publish well-structured llms.txt files primarily to help developers who work with AI coding tools, not to improve their rank in ChatGPT responses.

What This Means for Site Owners

The useful framing is to split the question by intended audience. For AI web search crawlers — the systems that generate citations and potentially send referral traffic — llms.txt has no documented effect and no operator commitments. For AI coding agents indexing technical documentation — the systems helping developers understand your API — llms.txt actively improves ingestion and is worth publishing. These are different systems with different technical architectures and different ways of routing to content.

If you publish an llms.txt today, the primary beneficiaries are developers querying a coding assistant while building integrations with your product. The search-retrieval bots that might cite your site have not committed to reading it, and the request data suggests they rarely fetch it.

For actual AI search visibility, SE Ranking's model points to different levers. Brand search volume was the strongest predictor of citation frequency in their dataset, with a correlation coefficient of 0.334. That is stronger than backlinks, which showed weak or neutral correlation. About 65% of AI crawler traffic targets content published within the past year. Brands with a consistent presence across four or more platforms show citation rates 2.8 times higher than single-platform brands.

These signals operate through the HTML content that AI crawlers actually retrieve and index, not through a manifest file that web AI crawlers skip. For access control — telling crawlers what they may or may not retrieve — robots.txt remains the only mechanism that all major AI crawlers actively follow. Any site using llms.txt to signal content permissions is directing that signal at systems that are not reading it.

The practical conclusion: publish an llms.txt if your site targets developers who use AI coding tools. Invest in content recency, structured markup, and cross-platform presence if the goal is AI search citation. Treat robots.txt as the single reliable mechanism for crawler access control. These three separate decisions address three separate technical requirements that llms.txt alone does not resolve.