70.6% of AI-Referred Clicks Land as "Direct" in GA4 — Here's How to Model What You're Missing
AI assistants strip referrer headers on the majority of outbound clicks. The result hides in your Direct channel. Server logs are the leading indicator that lets you recover the signal.
70.6% of AI traffic lands as "Direct" in GA4 — a figure drawn from real data across 446,000 visits. That isn't a rounding error or an edge case. It is the default outcome when an AI assistant recommends your content and a user clicks through from a mobile app. Thirty to fifty percent of AI-driven pipeline hides in "Direct" traffic at typical B2B SaaS companies because referrer data is stripped in transit. For consumer properties, where mobile usage skews higher, the figure is worse. The measurement stack you are using today was not designed to see any of this.
Method
The figures in this post are drawn from four published datasets and Wrenda's own server-log analytics. Primary external sources: Loamly's 446K-visit study (updated February 2026), Wheelhouse DMG's controlled server-log comparison (March 2026), the Retailgentic dark agentic commerce traffic analysis (April 2026), and Workshop Digital's 181.6M-session GA4 study across 22 clients (February 2026). We read each source in full, cross-referenced the per-platform referrer-loss rates, and layered in Wrenda's own top_paths signal to ground the deep-page examples.
1. The Size of the Gap
One analysis of just under 450,000 AI-adjacent visits found that 70.6% of them arrived without referrer headers and landed as Direct in GA4 — and that is a conservative midpoint; for consumer-facing brands where mobile AI usage skews higher, it is likely worse.
Per-platform, the picture fragments further. AI Mode (the major search-engine AI overviews surface) sends 91% of its traffic to Direct, so a converting recommendation from that surface is almost certainly being credited to "Direct" or "Branded Organic." Perplexity is the best citizen here, with only around 30% landing as dark traffic, because it sends cleaner referrer data than its peers. One major AI search platform only began appending utm_source to desktop citation links in mid-2025; traffic from its mobile app, and any citations generated before that date, passes no referrer information and appears as Direct in GA4.
The Workshop Digital study quantifies a specific slice of this problem: roughly 22% of sessions from one major AI assistant and a staggering 32% from another are dumped into the "(not set)" medium in GA4 — invisible even inside the Referral bucket. In one technology-brand case, visible AI referral sessions grew roughly 163% year-over-year, adding approximately 3,100 sessions. Direct traffic grew 42% over the same period, adding approximately 64,000 sessions — a mismatch that only makes sense if the majority of AI-originated visits were arriving without a referrer.
2. The Mechanism — Why Referrers Are Lost
Four distinct mechanisms strip the Referer header before a session reaches your server:
Mobile app webviews. AI assistant iOS and Android apps open external links in an in-app browser (WKWebView on iOS, Custom Tabs on Android) rather than a normal browsing context, and the Referer HTTP header is dropped along the way. The same applies to any native app delivering a chat-based AI interface: built-in browser features that open shared links in an internal sandbox strip the referrer entirely, and GA4 logs the session as "Direct" (or (not set)).
Referrer-Policy on the outbound link. A restrictive Referrer-Policy on the platform domain suppresses or truncates the header by design: no-referrer removes it outright, while strict-origin-when-cross-origin reduces it to the bare origin. On top of that, rel=noreferrer on paid-tier inline links strips the referrer regardless of the page-level policy.
User tier differences. Free-tier users of at least one major AI assistant have the referrer stripped from inline links in chat responses, while paid Plus and Team subscribers keep the referrer on the same kinds of links. Source citations at the bottom of responses carry UTM parameters regardless of tier. The consequence: a B2B SaaS publisher whose readers skew to paid subscribers will see more correctly labelled traffic than a consumer site dominated by free-tier users.
Copy-paste navigation. The dominant dark-traffic mechanism is URL copying: a user asks an AI assistant a question, the AI cites your page, the user copies the URL, pastes it into their browser, and the session is logged as Direct.
AI search platforms that are part of a wider search ecosystem present a separate tracking challenge because there is no distinct referrer passed when users click from an AI Overview to your site. Those visits look identical to standard organic search.
The server-side/client-side split. Standard analytics runs client-side JavaScript. AI crawlers don't execute JavaScript. So the AI assistant's server-side fetch — the one that decides whether to cite your page — is entirely invisible to GA4. Only server access logs record it.
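The split is easy to see in raw access logs. As a minimal sketch (the log format, sample lines, and helper name below are illustrative, not tied to any specific server), a few lines of Python separate agentic fetches from ordinary browser traffic:

```python
import re

# Agentic user-agent substrings to watch for. Illustrative list only;
# check each vendor's published UA documentation for the current strings.
AGENTIC_UA = re.compile(r"(?i)(ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User)")

def agentic_hits(log_lines):
    """Yield (path, user_agent) for requests made by AI agents.

    Assumes a combined-log-format line where the request line and the
    user agent both appear in double quotes (a common default).
    """
    for line in log_lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if len(quoted) < 3:
            continue
        request, user_agent = quoted[0], quoted[-1]
        if AGENTIC_UA.search(user_agent):
            parts = request.split()
            if len(parts) >= 2:
                yield parts[1], user_agent

# Hypothetical sample: one agentic fetch, one ordinary browser hit.
log = [
    '1.2.3.4 - - [01/Mar/2026:10:00:00 +0000] "GET /blog/pricing-guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
    '5.6.7.8 - - [01/Mar/2026:10:00:04 +0000] "GET /blog/pricing-guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"',
]
print(list(agentic_hits(log)))
```

Only the first line surfaces: the browser session is exactly what GA4 already sees, while the agentic fetch exists nowhere but the log.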
3. The Agentic Attribution Model
The core insight: server logs are a leading indicator of human traffic GA4 will misclassify. When an agentic user-agent — ChatGPT-User, Perplexity-User, OAI-SearchBot, <model-vendor-bot-1> — hits a deep page, it is because a real user mid-session asked about that URL. The human click follows within seconds to minutes. That click lands in GA4 as Direct.
Perplexity, for instance, uses PerplexityBot for indexing and Perplexity-User when the AI assistant browses on behalf of a live user — two distinct user-agent strings with two distinct purposes. The Perplexity-User hit is the cleaner signal: it means a real user is mid-query, waiting for an answer about your content.
The join logic exploits a narrow time window and URL-level matching:
-- Step 1: Pull agentic UA hits from server logs
-- (replace with your log table / access log format)
WITH ai_hits AS (
  SELECT
    request_path AS url,
    TIMESTAMP_TRUNC(hit_time, SECOND) AS hit_ts,
    user_agent
  FROM `project.logs.access_log`
  WHERE
    hit_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
      AND CURRENT_DATE()
    AND REGEXP_CONTAINS(user_agent,
      r'(?i)(ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User|<model-vendor-bot-1>|<model-vendor-ua-1>)')
    -- Exclude homepage: signal is too noisy there
    AND request_path != '/'
    AND LENGTH(request_path) > 10
),
-- Step 2: Pull Direct GA4 sessions from BigQuery export
ga4_direct AS (
  SELECT
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS url,
    TIMESTAMP_MICROS(event_timestamp) AS session_ts,
    user_pseudo_id
  FROM `project.analytics_XXXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    AND event_name = 'session_start'
    AND traffic_source.source = '(direct)'
    AND traffic_source.medium = '(none)'
)
-- Step 3: Join on matching URL within a ±5 minute window
SELECT
  a.url,
  COUNT(DISTINCT g.user_pseudo_id) AS agentic_attributable_sessions,
  COUNT(DISTINCT a.user_agent) AS distinct_ai_agents
FROM ai_hits a
JOIN ga4_direct g
  ON a.url = REGEXP_EXTRACT(g.url, r'https?://[^/]+(/.*)')  -- strip domain
  AND ABS(TIMESTAMP_DIFF(g.session_ts, a.hit_ts, SECOND)) <= 300  -- ±5 min
GROUP BY a.url
ORDER BY agentic_attributable_sessions DESC
LIMIT 50;
The delta this query produces — call it agentic_attributable_sessions — is your estimate of Direct sessions that are, with high probability, AI-originated. The math: subtract the pre-AI baseline Direct rate for each URL (use a lookback period from before widespread AI assistant adoption, e.g. Q1 2023) from the current Direct rate. The residual uplift on URLs that are also receiving agentic UA hits is the attributable fraction.
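As a toy illustration of that subtraction (the numbers and the helper name are invented for the example):

```python
def attributable_uplift(current_direct, baseline_direct, total_current):
    """Estimate AI-attributable Direct sessions for one URL.

    current_direct / baseline_direct are Direct-session *rates*
    (share of all sessions) for the current and pre-AI periods;
    total_current is total sessions in the current period.
    Clamped at zero: a falling Direct rate attributes nothing.
    """
    uplift_rate = max(current_direct - baseline_direct, 0.0)
    return uplift_rate * total_current

# e.g. a deep page whose Direct rate rose from 12% to 30%
# over 2,000 current-period sessions:
print(attributable_uplift(0.30, 0.12, 2000))  # ≈ 360 sessions
```

Only apply this to URLs that also appear in the ai_hits side of the join; uniform uplift across all pages points to brand or campaign effects instead.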
URL-normalisation matters. Strip the domain and trailing slash before the join. GA4's page_location includes the full https://domain.com/path?param=value; server logs record only /path. Use REGEXP_EXTRACT or a UDF to align them.
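A minimal normalisation helper might look like this (a Python sketch; the function name is ours, not a GA4 or BigQuery builtin):

```python
from urllib.parse import urlsplit

def normalise(url_or_path):
    """Reduce a GA4 page_location or a raw log path to a bare path.

    Strips scheme + domain, query string, fragment, and the trailing
    slash, so both sides of the join use the same key.
    """
    path = urlsplit(url_or_path).path or "/"
    if len(path) > 1:
        path = path.rstrip("/")
    return path

print(normalise("https://domain.com/blog/article-slug/?utm_source=x"))
# → /blog/article-slug
print(normalise("/blog/article-slug"))
# → /blog/article-slug
```

The same logic ports directly to a BigQuery UDF if you prefer to keep the normalisation inside the scheduled query.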
4. Validation — Signal vs. Noise
The crawl-to-referral ratio visualises why server logs carry signal at all: for every one human visitor one major AI assistant refers to your site, its crawler visits 500,000 times. Another assistant's ratio is 3,700:1; a third's is 700:1. The crawler hit is not a guarantee of a human click, but it is a necessary precondition — and that makes it a useful prior.
Three validation checks separate true signal from noise:
Homepage exclusion. Homepages receive high baseline Direct traffic from bookmarks, brand recall, and email clicks. An AI crawler hit on / cannot be cleanly disambiguated. Restrict the join to URLs with URL depth ≥ 2 (/blog/article-slug, /products/model-name/specs) and organic-search baseline below a threshold (pull this from GA4's Organic sessions on the same URL set).
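The depth filter is trivial to express. A Python sketch (the helper name and the candidate paths are our own, for illustration):

```python
def url_depth(path):
    """Count path segments: '/' is 0, '/pricing' is 1, '/blog/x' is 2."""
    return len([seg for seg in path.split("/") if seg])

# Keep only deep pages, where a Direct uplift can be cleanly disambiguated.
candidates = ["/", "/pricing", "/blog/article-slug", "/products/model-name/specs"]
deep = [p for p in candidates if url_depth(p) >= 2]
print(deep)  # → ['/blog/article-slug', '/products/model-name/specs']
```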
What makes the pattern hard to dismiss is where the Direct traffic is actually landing. One agency pulled the top pages being crawled by AI systems via server logs, then looked at how Direct sessions to those same pages had grown in GA4. The correlation was sharpest on deep, low-volume, long-tail pages — exactly the pages that have no other plausible explanation for sudden Direct uplift.
Pre/post baseline split. Compute Direct session rate for each URL across two periods: a pre-AI-adoption baseline (Q1 2023 or earlier) and current. URLs where current Direct rate is elevated and where AI UA hits in the log are concentrated are the high-confidence candidates. URLs where Direct grew uniformly across all pages are more likely brand-awareness or paid-campaign effects.
Cross-check with AI-UA crawl frequency. Pages receiving high Perplexity-User or ChatGPT-User (agentic, not training) hits but zero increase in Direct sessions are either being cited without clicks, or the user is copy-pasting into a new tab. Both are useful signals: the first tells you what AI is reading; the second tells you where your anchor text is weak or the CTA to click through is missing.
5. What to Do Operationally
Build the GA4 custom channel group first. It cannot recover the no-signal third of AI traffic, but it surfaces the referrer-bearing majority. In GA4, set the parameter to "Session source," choose "matches regex" as the operator, and input a pattern covering your known AI referrer domains including perplexity, chatgpt, and other AI platforms. Place this channel above "Referral" in priority order. AI engines launch new domains frequently. One major assistant's referrer patterns changed twice in 2025. Your regex from Q1 may miss Q3 traffic — review and update the custom channel group quarterly.
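Before pasting a pattern into GA4, sanity-check it against known source strings. A sketch in Python (the domain list is illustrative and will go stale, which is exactly why the quarterly review matters):

```python
import re

# Illustrative pattern only. Extend it with each AI platform's current
# referrer domains and revisit the list quarterly.
AI_SOURCE = re.compile(r"(?i)(perplexity|chatgpt|openai|copilot|gemini)")

def is_ai_source(session_source):
    """True if a GA4 session source looks like a known AI referrer."""
    return bool(AI_SOURCE.search(session_source))

for src in ["perplexity.ai", "chatgpt.com", "google", "(direct)"]:
    print(src, is_ai_source(src))
```

The same regex, minus the Python anchoring, is what goes into the "matches regex" operator on the Session source condition.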
Rename the Direct segments you can identify. After running the BigQuery join, you have a list of URLs with agentic_attributable_sessions counts. Create a GA4 custom segment: Source = (direct), Medium = (none), Landing page matches any of those URLs, Entry date after the earliest AI-UA hit on that URL. This segment is your "Probable AI Direct" cohort. Compare its conversion rate to the residual Direct cohort — dark AI traffic converts at 4.1× non-AI traffic, so the segment should stand out immediately.
Operationalise the server-log join as a scheduled pipeline. Export daily GA4 data to BigQuery (use the Daily batch export — streaming export drops traffic-source attribution for new users, and attribution is exactly the field you need here). Run the SQL join nightly. Feed the agentic_attributable_sessions count into a Looker Studio blended report alongside your visible AI Referral channel. The two numbers together are your upper and lower bound on true AI-originated sessions.
Treat URL-level crawl activity as a content signal. Identify agentic user-agents in your access logs. Correlate crawl frequency per page against GA4 referral data for the same pages. Pages that are heavily crawled but generate no visible referrals are being read by the assistant but not linked with clickable citations. That is a structured-data and anchor-text problem, not a traffic problem — and it is solvable.
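That crawled-but-never-clicked cross-check can be sketched in a few lines (the threshold of 20 crawl hits, the helper name, and the sample paths are arbitrary choices for the example):

```python
def citation_gap(crawl_hits, referral_sessions):
    """Flag pages the assistants read heavily but never send clicks to.

    crawl_hits / referral_sessions: dicts of path -> count, from server
    logs and GA4 referral data respectively. Returns paths sorted by
    crawl volume, heaviest first.
    """
    return sorted(
        (path for path, hits in crawl_hits.items()
         if hits >= 20 and referral_sessions.get(path, 0) == 0),
        key=lambda p: -crawl_hits[p],
    )

crawls = {"/blog/pricing-guide": 140, "/blog/changelog": 35, "/about": 3}
referrals = {"/blog/pricing-guide": 12}
print(citation_gap(crawls, referrals))  # → ['/blog/changelog']
```

Pages on that list are the ones to audit for structured data and clickable citation anchors.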
Add a "How did you hear about us?" field to high-value forms. This is currently the easiest tactical fix any team can ship to start seeing the AI journeys their analytics are missing. Self-reported attribution is noisy, but it provides a qualitative cross-check against your server-log model — and it costs nothing to deploy.
The fundamental constraint is architectural: the HTTP Referer header is set by the client. If the browser suppresses it via Referrer-Policy, rel=noreferrer, or in-app browser behaviour, server-side code sees no Referer to log. No analytics configuration recovers a header that was never sent. The server-log join is the closest available approximation — a probabilistic model, not a deterministic one. Treat its output as an estimate with a confidence range, not as a hard count.