AI Crawler Failure Modes: 57% of Top Sites Invisible, Lazy Load Fails for All
A May 2026 audit of the top 1,000 most-visited sites found that 57% of the sites it could reach serve content only after JavaScript runs. Most of the AI crawlers now generating referral traffic execute none of it. Here is the failure-mode breakdown.
Fifty-seven percent of the reachable sites among the top 1,000 most-visited domains serve no meaningful body content in their raw HTTP response. That number comes from ModPageSpeed's May 28, 2026 audit of the top 1,000 global domains, a systematic check of whether meaningful content appears before JavaScript executes. Most of the AI crawlers now sending measurable referral traffic execute no JavaScript. For the majority of the web's busiest properties, those two facts combine to produce responses that are, to any HTML-only crawler, effectively blank.
Method
Four sources underpin this analysis. ModPageSpeed audited the top 1,000 most-visited global sites on May 28, 2026, grading each on five dimensions: content fidelity, token efficiency, structured data, crawler access, and citation readiness. Writesonic tested 62 discrete webpage elements across six major AI crawlers in March 2026: Copilot, DeepSeek, Grok, and three HTML-only crawlers. searchVIU analyzed 1.3 billion AI crawler requests across 23 distinct crawlers in November 2025. Vercel and MERJ published an analysis of more than 500 million GPTBot fetch requests, measuring JavaScript execution behavior directly.
Which Sites Are Already Dark
Of the 1,000 sites ModPageSpeed audited, 289 were unreachable — blocked by authentication walls, returning HTTP errors, or inaccessible by automated fetch. Of the remaining 711, 403 (57%) revealed their content only after JavaScript executed. Among that group are the internet's highest-traffic properties: Instagram, YouTube, Facebook, X, TikTok, Pinterest, Reddit, Amazon, Spotify, and Booking.com.
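The arithmetic behind the headline figure is worth making explicit, because the 57% is a share of the 711 reachable sites rather than of all 1,000 audited. A minimal sketch using only the counts quoted above (the variable names are illustrative):

```python
# Counts taken from the audit summary above.
audited = 1000
unreachable = 289
js_only = 403

reachable = audited - unreachable          # sites that could be fetched
share_of_reachable = js_only / reachable   # the headline 57%
share_of_all = js_only / audited           # same count against all 1,000

print(f"reachable sites: {reachable}")
print(f"JS-only share of reachable sites: {share_of_reachable:.1%}")
print(f"JS-only share of all audited sites: {share_of_all:.1%}")
```

Measured against all 1,000 audited domains, the same 403 sites come to roughly 40%, with the unreachable 289 left unclassified.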
Sites scoring well share a structural characteristic: they deliver full content in the initial HTTP response before any script runs. Apple and Discord each scored 92 out of 100. Stripe, Zoom, and the BBC earned the top grade. Wikipedia, GitHub, Microsoft, and Stack Overflow earned a B. Two-thirds of the top 1,000 received a D or F on the five-part readability rubric.
The Rendering Tier Gap
searchVIU's November 2025 analysis of 1.3 billion requests across 23 AI crawlers found 69% incapable of JavaScript execution. A March 2026 Writesonic study that tested 62 webpage elements across six crawlers identified three rendering tiers. Copilot operates a full browser stack and found approximately 35% of the 62 test elements. DeepSeek and Grok ran headless environments with partial JavaScript support. The three HTML-only crawlers found between 21% and 22% of elements each — limited to whatever text, links, and metadata appear in the static server response.
The best-performing crawler in the study still retrieved only 35% of tested elements. Even full-browser AI crawlers do not scroll, do not interact with the page, and do not wait for deferred content. The gap between a full browser stack and an HTML-only parser is real, but both miss the majority of measurable page elements for reasons beyond JavaScript execution alone.
Three Failure Modes, in Order of Impact
The Writesonic study, with controlled tests across six crawlers and 62 elements, produces a failure-mode breakdown that holds across rendering tiers.
Lazy-loaded content (content triggered by scroll via IntersectionObserver) was invisible to all six tested crawlers. Elements placed 2,000 pixels below the initial viewport produced zero retrievals across every tier: HTML-only, headless JS, and full browser. No AI crawler scrolls. Product grids, review sections, and recommendation modules placed below the fold behind a scroll trigger are silently absent from every AI crawler's view of a page, even when the rest of the page is server-rendered.
Client-side rendered content — DOM built entirely by JavaScript in the browser — is invisible to the 69% of crawlers that do not execute JavaScript. For a React, Vue, or Angular SPA, the server response is typically a root div and a set of bundle script tags. GPTBot and ClaudeBot see that shell and nothing else. The three HTML-only crawlers in the Writesonic study each saw between 21% and 22% of test elements, limited to content present in the HTML shell.
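To make the "root div and bundle script tags" case concrete, the following is a rough sketch of what an HTML-only parser is left with once scripts and styles are discarded. It uses Python's standard html.parser; the VisibleText class, the visible_text helper, and both sample documents are hypothetical and not taken from any of the cited studies.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text that sits outside <script>, <style>, and <title>."""

    SKIP = {"script", "style", "title"}  # code and metadata, not body content

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the readable text a non-rendering parser could extract."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical SPA shell: a mount point plus a bundle reference.
spa_shell = (
    '<html><head><title>Shop</title><script src="/bundle.js"></script></head>'
    '<body><div id="root"></div></body></html>'
)

# Hypothetical server-rendered equivalent of the same page.
server_rendered = (
    "<html><head><title>Shop</title></head>"
    "<body><h1>Blue kettle</h1><p>Boils 1.7 litres.</p></body></html>"
)

print(repr(visible_text(spa_shell)))
print(repr(visible_text(server_rendered)))
```

The shell yields an empty string while the server-rendered variant yields its heading and paragraph text, which is the whole difference an HTML-only crawler perceives.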
JavaScript-injected metadata, including JSON-LD blocks written to the DOM by tag managers or client-side SEO plugins, was invisible to every crawler in 9 of the 11 tested metadata element types. The two types visible to some crawlers were those present in the static <head>: Open Graph tags and <title>. JSON-LD injected after hydration or by a script-triggered tag manager rule failed for all six crawlers — including the full-browser crawler — confirming that the common CMS plugin implementation pattern does not reach any AI crawler regardless of its JavaScript capability.
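One way to check whether structured data is present in the static response, as opposed to being injected later, is to look for JSON-LD script elements in the raw HTML without executing anything. The sketch below assumes Python's standard html.parser and json modules; JsonLdFinder, static_jsonld, and the two sample pages are hypothetical illustrations.

```python
import json
from html.parser import HTMLParser


class JsonLdFinder(HTMLParser):
    """Collects JSON-LD blocks that exist as markup in the static HTML."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skipped in this sketch


def static_jsonld(html: str) -> list:
    """Return parsed JSON-LD objects found without running any script."""
    finder = JsonLdFinder()
    finder.feed(html)
    finder.close()
    return finder.blocks


# Hypothetical page with server-rendered JSON-LD in the head.
static_page = (
    '<head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Product", "name": "Blue kettle"}'
    "</script></head>"
)

# Hypothetical page where a script would create the JSON-LD at runtime.
injected_page = (
    "<head><script>"
    'var s = document.createElement("script");'
    's.type = "application/ld+json";'
    's.text = JSON.stringify({"@type": "Product"});'
    "document.head.appendChild(s);"
    "</script></head>"
)

print(len(static_jsonld(static_page)), len(static_jsonld(injected_page)))
```

The injected variant returns an empty list because the JSON-LD only exists after the script runs, which mirrors what a non-executing crawler receives.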
What This Means for Site Owners
The 57% figure describes a population of sites, not a sliding scale for individual pages. A client-side rendered page is not partially dark: its dynamic content is typically 100% invisible to HTML-only crawlers, and its HTML shell is 100% visible. The diagnostic is direct: curl -s -A "GPTBot/1.0" https://yourdomain.com/page. Compare the response body against what a normal browser renders for the same URL. A response under 2 KB that contains only navigation links and script tag references, with no readable text, indicates the page is invisible to every HTML-only crawler in the AI ecosystem.
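The same check can be scripted. The sketch below is a rough Python equivalent of the curl command, using the standard urllib.request module; the function names are hypothetical, the user-agent string is the simplified one from the command above rather than a crawler's full header, and the size test covers only the 2 KB part of the rule of thumb, so the body still needs to be read for actual text.

```python
import urllib.request

CRAWLER_UA = "GPTBot/1.0"     # simplified string, as in the curl command
SHELL_THRESHOLD = 2 * 1024    # the 2 KB rule of thumb, in bytes


def crawler_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies itself with the crawler UA."""
    return urllib.request.Request(url, headers={"User-Agent": CRAWLER_UA})


def fetch_as_crawler(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw response body; no JavaScript is executed."""
    with urllib.request.urlopen(crawler_request(url), timeout=timeout) as resp:
        return resp.read()


def looks_like_shell(body: bytes) -> bool:
    """First-pass size check only; small bodies deserve a manual look."""
    return len(body) < SHELL_THRESHOLD


# Example use (performs a network request, so it is left commented out):
# body = fetch_as_crawler("https://yourdomain.com/page")
# print(len(body), looks_like_shell(body))
```

A body that passes the size check can still be a shell padded with inline scripts, so the byte count is a prompt for inspection rather than a verdict.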
Lazy loading is the most underestimated failure mode because it produces no obvious signal in standard tooling. A curl check on a lazy-loading page shows clean headline text and above-fold content, so the page appears readable. Meanwhile, FAQ sections, review aggregations, and product detail panels below the scroll threshold are silently absent. For AI crawlers deciding whether to cite a page, below-fold content is frequently the most specific and highest-value information on it. The fix is structural: deliver content in the initial HTML response rather than deferring it to a scroll event. Reserve loading="lazy" for secondary images, not content elements.
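Because a size or headline check will not reveal missing below-fold sections, one workable approach is to list a few phrases that should appear on the page and confirm each is present in the raw response. This is a minimal sketch; the function name, the sample markup, and the phrases are all hypothetical. It searches the raw markup, so a phrase that appears only inside an inline script would still count as present.

```python
def missing_from_static_html(html: str, expected_phrases: list[str]) -> list[str]:
    """Return the expected phrases that do not occur in the raw HTML."""
    # Collapse whitespace and ignore case so formatting differences do not matter.
    haystack = " ".join(html.split()).lower()
    return [
        phrase
        for phrase in expected_phrases
        if " ".join(phrase.split()).lower() not in haystack
    ]


# Hypothetical raw response: the headline is present, the review section
# is an empty placeholder that a scroll handler would fill in later.
raw_html = (
    "<h1>Blue kettle</h1>"
    '<section id="reviews" data-lazy="true"></section>'
)

missing = missing_from_static_html(
    raw_html, ["Blue kettle", "Rated 4.8 by 212 buyers"]
)
print(missing)
```

Here the headline is found and the review summary is reported missing, which is the pattern the paragraph above describes: the page looks readable while its most specific content never reaches the crawler.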
The fix hierarchy follows from the failure-mode order. Lazy loading is the simplest to address and affects even server-rendered pages: remove scroll triggers from content elements. Client-side rendering requires server-side rendering, static generation at build time, or dynamic pre-rendering served to detected crawler user agents — a larger change with the highest coverage impact. Metadata injection failures require moving JSON-LD blocks from script-injected to server-rendered positions — a configuration change in most frameworks rather than a rewrite, and one that benefits every crawler tier simultaneously.
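For the dynamic pre-rendering option, the routing decision usually comes down to a user-agent match. The sketch below shows only that decision, under stated assumptions: the token list contains just the two crawlers named in this article and would need extending, the function names are hypothetical, and user-agent strings can be spoofed, so this is a routing hint rather than verification.

```python
# Tokens to look for in the User-Agent header. Only the two crawlers named
# in the article are listed; a real deployment would maintain a longer list.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot")


def is_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the known crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


def choose_variant(user_agent: str) -> str:
    """Pick which version of a page to serve for this request."""
    return "prerendered" if is_ai_crawler(user_agent) else "client-rendered"


# Illustrative header values, not authoritative crawler strings.
print(choose_variant("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(choose_variant("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0"))
```

Server-side rendering or static generation avoids this branch entirely, since every client then receives the same complete HTML.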