70.6% of AI-Referred Clicks Land as "Direct" in GA4 — Here's How to Model What You're Missing
AI assistants strip referrer headers on the majority of outbound clicks. The result hides in your Direct channel. Server logs are the leading indicator that lets you recover the signal.
70.6% of AI traffic lands as "Direct" in GA4 — a figure drawn from real data across 446,000 visits. That isn't a rounding error or an edge case. It is the default outcome when an AI assistant recommends your content and a user clicks through from a mobile app. Thirty to fifty percent of AI-driven pipeline hides in "Direct" traffic at typical B2B SaaS companies because referrer data is stripped in transit. For consumer properties, where mobile usage skews higher, the figure is worse. The measurement stack you are using today was not designed to see any of this.
Method
The figures in this post are drawn from four published datasets and Wrenda's own server-log analytics. Primary external sources: Loamly's 446K-visit study (updated February 2026), Wheelhouse DMG's controlled server-log comparison (March 2026), the Retailgentic dark agentic commerce traffic analysis (April 2026), and Workshop Digital's 181.6M-session GA4 study across 22 clients (February 2026). We read each source in full, cross-referenced the per-platform referrer-loss rates, and layered in Wrenda's own top_paths signal to ground the deep-page examples.
1. The Size of the Gap
One analysis of just under 450,000 AI-adjacent visits found that 70.6% of them arrived without referrer headers and landed as Direct in GA4 — and that is a conservative midpoint; for consumer-facing brands where mobile AI usage skews higher, it is likely worse.
Per-platform, the picture fragments further. AI Mode (the major search-engine AI overviews surface) sends 91% of its traffic to Direct, so a converting recommendation from that surface is almost certainly being credited to "Direct" or "Branded Organic." Perplexity is the best citizen here, with only around 30% landing as dark traffic, because it sends cleaner referrer data than its peers. One major AI search platform only began appending utm_source to desktop citation links in mid-2025; traffic from its mobile app, and any citations generated before that date, passes no referrer information and appears as Direct in GA4.
The Workshop Digital study quantifies a specific slice of this problem: roughly 22% of sessions from one major AI assistant and a staggering 32% from another are dumped into the "(not set)" medium in GA4 — invisible even inside the Referral bucket. In one technology-brand case, visible AI referral sessions grew roughly 163% year-over-year, adding approximately 3,100 sessions. Direct traffic grew 42% over the same period, adding approximately 64,000 sessions — a mismatch that only makes sense if the majority of AI-originated visits were arriving without a referrer.
2. The Mechanism — Why Referrers Are Lost
Four distinct mechanisms strip the Referer header before a session reaches your server:
Mobile app webviews. AI assistant iOS and Android apps open external links in an in-app browser (WKWebView on iOS, Custom Tabs on Android) rather than a normal browsing context, and the Referer HTTP header is dropped along the way. The same applies to any native app delivering a chat-based AI interface: built-in browser features that open shared links in an internal sandbox strip the referrer entirely, and GA4 logs the session as "Direct" (or (not set)).
Referrer-Policy on the outbound link. A restrictive Referrer-Policy on the platform domain suppresses or truncates the header by design: no-referrer removes it outright, while strict-origin-when-cross-origin reduces it to the bare origin. On top of that, rel=noreferrer on paid-tier inline links strips the referrer regardless of the page-level policy.
User tier differences. Free-tier users of at least one major AI assistant have the referrer stripped from inline links in chat responses, while paid Plus and Team subscribers keep the referrer on the same kinds of links. Source citations at the bottom of responses carry UTM parameters regardless of tier. The consequence: a B2B SaaS publisher whose readers skew to paid subscribers will see more correctly labelled traffic than a consumer site dominated by free-tier users.
Copy-paste navigation. The dominant dark-traffic mechanism is URL copying: a user asks an AI assistant a question, the AI cites your page, the user copies the URL, pastes it into their browser, and the session is logged as Direct.
AI search platforms that are part of a wider search ecosystem present a separate tracking challenge because there is no distinct referrer passed when users click from an AI Overview to your site. Those visits look identical to standard organic search.
The server-side/client-side split. Standard analytics runs client-side JavaScript. AI crawlers don't execute JavaScript. So the AI assistant's server-side fetch — the one that decides whether to cite your page — is entirely invisible to GA4. Only server access logs record it.
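The split is easy to see in raw access logs. As a minimal sketch (the log format, sample lines, and helper name below are illustrative, not tied to any specific server), a few lines of Python separate agentic fetches from ordinary browser traffic:

```python
import re

# Agentic user-agent substrings to watch for. Illustrative list only;
# check each vendor's published UA documentation for the current strings.
AGENTIC_UA = re.compile(r"(?i)(ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User)")

def agentic_hits(log_lines):
    """Yield (path, user_agent) for requests made by AI agents.

    Assumes a combined-log-format line where the request line and the
    user agent both appear in double quotes (a common default).
    """
    for line in log_lines:
        quoted = re.findall(r'"([^"]*)"', line)
        if len(quoted) < 3:
            continue
        request, user_agent = quoted[0], quoted[-1]
        if AGENTIC_UA.search(user_agent):
            parts = request.split()
            if len(parts) >= 2:
                yield parts[1], user_agent

# Hypothetical sample: one agentic fetch, one ordinary browser hit.
log = [
    '1.2.3.4 - - [01/Mar/2026:10:00:00 +0000] "GET /blog/pricing-guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)"',
    '5.6.7.8 - - [01/Mar/2026:10:00:04 +0000] "GET /blog/pricing-guide HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15"',
]
print(list(agentic_hits(log)))
```

Only the first line surfaces: the browser session is exactly what GA4 already sees, while the agentic fetch exists nowhere but the log.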
3. The Agentic Attribution Model
The core insight: server logs are a leading indicator of human traffic GA4 will misclassify. When an agentic user-agent — ChatGPT-User, Perplexity-User, OAI-SearchBot, <model-vendor-bot-1> — hits a deep page, it is because a real user mid-session asked about that URL. The human click follows within seconds to minutes. That click lands in GA4 as Direct.
Perplexity, for instance, uses PerplexityBot for indexing and Perplexity-User when the AI assistant browses on behalf of a live user — two distinct user-agent strings with two distinct purposes. The Perplexity-User hit is the cleaner signal: it means a real user is mid-query, waiting for an answer about your content.
The join logic exploits a narrow time window and URL-level matching:
-- Step 1: Pull agentic UA hits from server logs
-- (replace with your log table / access log format)
WITH ai_hits AS (
  SELECT
    request_path AS url,
    TIMESTAMP_TRUNC(hit_time, SECOND) AS hit_ts,
    user_agent
  FROM `project.logs.access_log`
  WHERE
    hit_date BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
      AND CURRENT_DATE()
    AND REGEXP_CONTAINS(user_agent,
      r'(?i)(ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User|<model-vendor-bot-1>|<model-vendor-ua-1>)')
    -- Exclude homepage: signal is too noisy there
    AND request_path != '/'
    AND LENGTH(request_path) > 10
),
-- Step 2: Pull Direct GA4 sessions from BigQuery export
ga4_direct AS (
  SELECT
    (SELECT value.string_value
     FROM UNNEST(event_params)
     WHERE key = 'page_location') AS url,
    TIMESTAMP_MICROS(event_timestamp) AS session_ts,
    user_pseudo_id
  FROM `project.analytics_XXXXXXX.events_*`
  WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
      AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
    AND event_name = 'session_start'
    AND traffic_source.source = '(direct)'
    AND traffic_source.medium = '(none)'
)
-- Step 3: Join on matching URL within a ±5 minute window
SELECT
  a.url,
  COUNT(DISTINCT g.user_pseudo_id) AS agentic_attributable_sessions,
  COUNT(DISTINCT a.user_agent) AS distinct_ai_agents
FROM ai_hits a
JOIN ga4_direct g
  ON a.url = REGEXP_EXTRACT(g.url, r'https?://[^/]+(/.*)')  -- strip domain
  AND ABS(TIMESTAMP_DIFF(g.session_ts, a.hit_ts, SECOND)) <= 300  -- ±5 min
GROUP BY a.url
ORDER BY agentic_attributable_sessions DESC
LIMIT 50;
The delta this query produces — call it agentic_attributable_sessions — is your estimate of Direct sessions that are, with high probability, AI-originated. The math: subtract the pre-AI baseline Direct rate for each URL (use a lookback period from before widespread AI assistant adoption, e.g. Q1 2023) from the current Direct rate. The residual uplift on URLs that are also receiving agentic UA hits is the attributable fraction.
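As a toy illustration of that subtraction (the numbers and the helper name are invented for the example):

```python
def attributable_uplift(current_direct, baseline_direct, total_current):
    """Estimate AI-attributable Direct sessions for one URL.

    current_direct / baseline_direct are Direct-session *rates*
    (share of all sessions) for the current and pre-AI periods;
    total_current is total sessions in the current period.
    Clamped at zero: a falling Direct rate attributes nothing.
    """
    uplift_rate = max(current_direct - baseline_direct, 0.0)
    return uplift_rate * total_current

# e.g. a deep page whose Direct rate rose from 12% to 30%
# over 2,000 current-period sessions:
print(attributable_uplift(0.30, 0.12, 2000))  # ≈ 360 sessions
```

Only apply this to URLs that also appear in the ai_hits side of the join; uniform uplift across all pages points to brand or campaign effects instead.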
URL-normalisation matters. Strip the domain and trailing slash before the join. GA4's page_location includes the full https://domain.com/path?param=value; server logs record only /path. Use REGEXP_EXTRACT or a UDF to align them.
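A minimal normalisation helper might look like this (a Python sketch; the function name is ours, not a GA4 or BigQuery builtin):

```python
from urllib.parse import urlsplit

def normalise(url_or_path):
    """Reduce a GA4 page_location or a raw log path to a bare path.

    Strips scheme + domain, query string, fragment, and the trailing
    slash, so both sides of the join use the same key.
    """
    path = urlsplit(url_or_path).path or "/"
    if len(path) > 1:
        path = path.rstrip("/")
    return path

print(normalise("https://domain.com/blog/article-slug/?utm_source=x"))
# → /blog/article-slug
print(normalise("/blog/article-slug"))
# → /blog/article-slug
```

The same logic ports directly to a BigQuery UDF if you prefer to keep the normalisation inside the scheduled query.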
4. Validation — Signal vs. Noise
The crawl-to-referral ratio visualises why server logs carry signal at all: for every one human visitor one major AI assistant refers to your site, its crawler visits 500,000 times. Another assistant's ratio is 3,700:1; a third's is 700:1. The crawler hit is not a guarantee of a human click, but it is a necessary precondition — and that makes it a useful prior.
Three validation checks separate true signal from noise:
Homepage exclusion. Homepages receive high baseline Direct traffic from bookmarks, brand recall, and email clicks. An AI crawler hit on / cannot be cleanly disambiguated. Restrict the join to URLs with URL depth ≥ 2 (/blog/article-slug, /products/model-name/specs) and organic-search baseline below a threshold (pull this from GA4's Organic sessions on the same URL set).
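The depth filter is trivial to express. A Python sketch (the helper name and the candidate paths are our own, for illustration):

```python
def url_depth(path):
    """Count path segments: '/' is 0, '/pricing' is 1, '/blog/x' is 2."""
    return len([seg for seg in path.split("/") if seg])

# Keep only deep pages, where a Direct uplift can be cleanly disambiguated.
candidates = ["/", "/pricing", "/blog/article-slug", "/products/model-name/specs"]
deep = [p for p in candidates if url_depth(p) >= 2]
print(deep)  # → ['/blog/article-slug', '/products/model-name/specs']
```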
What makes the pattern hard to dismiss is where the Direct traffic is actually landing. One agency pulled the top pages being crawled by AI systems via server logs, then looked at how Direct sessions to those same pages had grown in GA4. The correlation was sharpest on deep, low-volume, long-tail pages — exactly the pages that have no other plausible explanation for sudden Direct uplift.
Pre/post baseline split. Compute Direct session rate for each URL across two periods: a pre-AI-adoption baseline (Q1 2023 or earlier) and current. URLs where current Direct rate is elevated and where AI UA hits in the log are concentrated are the high-confidence candidates. URLs where Direct grew uniformly across all pages are more likely brand-awareness or paid-campaign effects.
Cross-check with AI-UA crawl frequency. Pages receiving high Perplexity-User or ChatGPT-User (agentic, not training) hits but zero increase in Direct sessions are either being cited without clicks, or the user is copy-pasting into a new tab. Both are useful signals: the first tells you what AI is reading; the second tells you where your anchor text is weak or the CTA to click through is missing.
5. What to Do Operationally
Build the GA4 custom channel group first. It cannot recover the no-signal third of AI traffic, but it surfaces the referrer-bearing majority. In GA4, set the parameter to "Session source," choose "matches regex" as the operator, and input a pattern covering your known AI referrer domains including perplexity, chatgpt, and other AI platforms. Place this channel above "Referral" in priority order. AI engines launch new domains frequently. One major assistant's referrer patterns changed twice in 2025. Your regex from Q1 may miss Q3 traffic — review and update the custom channel group quarterly.
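Before pasting a pattern into GA4, sanity-check it against known source strings. A sketch in Python (the domain list is illustrative and will go stale, which is exactly why the quarterly review matters):

```python
import re

# Illustrative pattern only. Extend it with each AI platform's current
# referrer domains and revisit the list quarterly.
AI_SOURCE = re.compile(r"(?i)(perplexity|chatgpt|openai|copilot|gemini)")

def is_ai_source(session_source):
    """True if a GA4 session source looks like a known AI referrer."""
    return bool(AI_SOURCE.search(session_source))

for src in ["perplexity.ai", "chatgpt.com", "google", "(direct)"]:
    print(src, is_ai_source(src))
```

The same regex, minus the Python anchoring, is what goes into the "matches regex" operator on the Session source condition.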
Rename the Direct segments you can identify. After running the BigQuery join, you have a list of URLs with agentic_attributable_sessions counts. Create a GA4 custom segment: Source = (direct), Medium = (none), Landing page matches any of those URLs, Entry date after the earliest AI-UA hit on that URL. This segment is your "Probable AI Direct" cohort. Compare its conversion rate to the residual Direct cohort — dark AI traffic converts at 4.1× non-AI traffic, so the segment should stand out immediately.
Operationalise the server-log join as a scheduled pipeline. Export daily GA4 data to BigQuery (use the Daily batch export — streaming export drops traffic-source attribution for new users, and attribution is exactly the field you need here). Run the SQL join nightly. Feed the agentic_attributable_sessions count into a Looker Studio blended report alongside your visible AI Referral channel. The two numbers together are your upper and lower bound on true AI-originated sessions.
Treat URL-level crawl activity as a content signal. Identify agentic user-agents in your access logs. Correlate crawl frequency per page against GA4 referral data for the same pages. Pages that are heavily crawled but generate no visible referrals are being read by the assistant but not linked with clickable citations. That is a structured-data and anchor-text problem, not a traffic problem — and it is solvable.
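That crawled-but-never-clicked cross-check can be sketched in a few lines (the threshold of 20 crawl hits, the helper name, and the sample paths are arbitrary choices for the example):

```python
def citation_gap(crawl_hits, referral_sessions):
    """Flag pages the assistants read heavily but never send clicks to.

    crawl_hits / referral_sessions: dicts of path -> count, from server
    logs and GA4 referral data respectively. Returns paths sorted by
    crawl volume, heaviest first.
    """
    return sorted(
        (path for path, hits in crawl_hits.items()
         if hits >= 20 and referral_sessions.get(path, 0) == 0),
        key=lambda p: -crawl_hits[p],
    )

crawls = {"/blog/pricing-guide": 140, "/blog/changelog": 35, "/about": 3}
referrals = {"/blog/pricing-guide": 12}
print(citation_gap(crawls, referrals))  # → ['/blog/changelog']
```

Pages on that list are the ones to audit for structured data and clickable citation anchors.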
Add a "How did you hear about us?" field to high-value forms. This is currently the easiest tactical fix any team can ship to start seeing the AI journeys their analytics are missing. Self-reported attribution is noisy, but it provides a qualitative cross-check against your server-log model — and it costs nothing to deploy.
The fundamental constraint is architectural: the HTTP Referer header is set by the client. If the browser suppresses it via Referrer-Policy, rel=noreferrer, or in-app browser behaviour, server-side code sees no Referer to log. No analytics configuration recovers a header that was never sent. The server-log join is the closest available approximation — a probabilistic model, not a deterministic one. Treat its output as an estimate with a confidence range, not as a hard count.