GA4 Can Finally See AI Chatbot Traffic — But That Still Won't Tell You Which Channels Are Actually Incremental

by Allison Nick

Every analytics team in the industry spent the last two weeks doing the same thing: opening GA4, navigating to the traffic acquisition report, and staring at a new cluster of referral sources tagged as AI chatbot traffic. Google now automatically categorizes visits from ChatGPT, Perplexity, Claude, and other AI interfaces, so you can finally see how much traffic arrives through conversational search instead of traditional SERPs. Useful update. But if your reaction was “great, now I know where these users came from,” you stopped one question short of the one that actually matters: did your marketing cause these conversions, or would they have happened anyway?

Cleaner Traffic Labels Don’t Answer the Incrementality Question

Before the update, AI-referred visits were bucketed into direct, organic, or referral depending on how the chatbot interface passed referrer information during the handoff. Now they get their own channel grouping. That’s cleaner data hygiene, and it matters for directional reporting.

But cleaner labels aren’t causal evidence. Knowing that 8% of your converting traffic arrived via an AI chatbot tells you something about user behavior, but nothing about whether your spend on SEO, content marketing, or any other channel influenced that visit. The same gap exists in every row of your GA4 report. You can see that a user arrived from a Meta ad, clicked a Google Shopping listing, and converted after opening an email, yet still have no empirical evidence that any of those touchpoints changed behavior versus what would have happened with zero marketing intervention.

This isn’t a philosophical distinction. It’s the difference between allocating budget to channels that generate true lift and pouring money into touchpoints that are simply present at the point of conversion.

More Traffic Sources Make Multi-Touch Attribution Less Reliable, Not More

Every new traffic source — AI chatbots, retail media networks, connected TV, programmatic direct mail — adds another node to an already overloaded attribution model. Multi-touch attribution (MTA) was supposed to solve the last-touch problem by distributing credit across the journey. In practice, MTA models are breaking down faster than they’re being refined.

The credit an MTA model assigns to a channel shifts with the model type and the completeness of the underlying data, so the same customer journey can produce very different allocations. Add AI chatbot referrals into the mix (a channel with no click cost, no impression data, and no campaign structure) and the model has one more input it can’t properly weight. The result isn’t better attribution. It’s more sophisticated-looking attribution that’s still wrong.

Performance marketers accountable to CPA and direct mail ROAS can’t afford to optimize against misallocated credit. If your MTA model gives 15% credit to a channel that’s capturing demand rather than creating it, you’ll over-invest in that channel and under-invest in the one actually driving incremental conversions.

Holdout-Based Incrementality Testing Isolates True Lift Where Attribution Models Can’t

The fix isn’t a better attribution model. It’s a fundamentally different measurement approach.

Holdout-based incrementality testing splits your audience into a test group that receives a marketing treatment and a control group that doesn’t, then measures the difference in conversion rates. The delta is your incremental lift: the conversions, and the revenue attached to them, that would not have existed without your marketing.
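
To make the arithmetic concrete, here is a minimal sketch of the lift calculation described above. The function name, the group sizes, and the conversion counts are all hypothetical, and the pooled two-proportion z-test is one common significance check chosen for illustration, not a method the article prescribes.

```python
from math import erf, sqrt


def incremental_lift(test_size, test_conversions, control_size, control_conversions):
    """Compare conversion rates between a treated group and a holdout group."""
    test_rate = test_conversions / test_size
    control_rate = control_conversions / control_size

    # Absolute lift: difference in conversion rates (treated minus holdout).
    absolute_lift = test_rate - control_rate
    # Relative lift: that difference expressed against the holdout baseline.
    relative_lift = absolute_lift / control_rate if control_rate else float("inf")
    # Conversions in the treated group beyond what the baseline rate predicts.
    incremental_conversions = absolute_lift * test_size

    # Pooled two-proportion z-test as a rough check that the gap is not noise.
    pooled = (test_conversions + control_conversions) / (test_size + control_size)
    std_error = sqrt(pooled * (1 - pooled) * (1 / test_size + 1 / control_size))
    z_score = absolute_lift / std_error if std_error else 0.0
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z_score) / sqrt(2))))

    return {
        "test_rate": test_rate,
        "control_rate": control_rate,
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "incremental_conversions": incremental_conversions,
        "z_score": z_score,
        "p_value": p_value,
    }


# Hypothetical campaign: 90,000 treated households, 10,000 held out.
result = incremental_lift(90_000, 2_700, 10_000, 250)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these made-up inputs the treated group converts at 3.0% against a 2.5% baseline, so roughly 450 of the 2,700 treated conversions are incremental. The other 2,250 would have happened anyway, which is exactly the share an attribution report cannot separate out.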

This methodology is channel-agnostic, which is exactly why it works in an environment where traffic sources multiply every quarter. It doesn’t matter whether a user arrived via AI chatbot, paid search, or a QR code on a direct mail piece. What matters is whether the group that received your campaign converted at a higher rate than the group that didn’t.

Direct mail is uniquely well-suited for holdout testing because the audience is deterministic. Every household either receives a piece or doesn’t. There’s no frequency-capping ambiguity, no cross-device matching, no probabilistic modeling. Postie’s platform supports holdout group creation natively, matching test and control groups on demographics, purchase history, and behavioral signals. Conversion deltas are measured via deterministic matchback attribution against actual transaction records, which gives a clean read on true lift rather than modeled lift.
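
As an illustration of what deterministic matchback means in general, the sketch below joins transaction records to the mailed and holdout lists on an exact household ID. It is not Postie’s implementation: the `matchback` function, the `household_id` field, and the sample IDs are assumptions made up for this example.

```python
def matchback(mailed_ids, holdout_ids, transactions):
    """Match transactions to mailed or holdout households by exact ID.

    A household counts as converted once, no matter how many orders it placed.
    """
    mailed = set(mailed_ids)
    holdout = set(holdout_ids)
    converted_mailed = set()
    converted_holdout = set()

    for txn in transactions:
        household = txn["household_id"]  # hypothetical field name
        if household in mailed:
            converted_mailed.add(household)
        elif household in holdout:
            converted_holdout.add(household)
        # Transactions from households in neither list are ignored.

    return {
        "mailed_rate": len(converted_mailed) / len(mailed) if mailed else 0.0,
        "holdout_rate": len(converted_holdout) / len(holdout) if holdout else 0.0,
    }


# Hypothetical data: four mailed households, four held out, five transactions.
rates = matchback(
    mailed_ids=["H1", "H2", "H3", "H4"],
    holdout_ids=["H5", "H6", "H7", "H8"],
    transactions=[
        {"household_id": "H1"},
        {"household_id": "H1"},  # repeat purchase, counted once
        {"household_id": "H3"},
        {"household_id": "H5"},
        {"household_id": "H9"},  # not in either list
    ],
)
print(rates)
```

Because every match is an exact key lookup against a known list, there is no probability score to tune. A household is either in the mailed file, in the holdout file, or out of scope.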

One important note: holdout testing is a best practice that Postie supports and recommends, but it’s something clients opt into, not a default applied to every campaign. Incrementality is best measured across multiple campaigns over time, giving a more reliable read on whether the channel is driving behavior change.

What Sophisticated Performance Teams Are Actually Measuring

The teams producing the most defensible performance data aren’t debating which MTA model to use. They’re running holdout groups and matched-market tests across their channel mix to build an empirical incrementality stack — a channel-by-channel view of true lift that informs budget allocation with causal evidence rather than correlative credit.

Here’s what that looks like in practice for performance direct mail:

Holdout groups on campaigns where incrementality matters most: A percentage of the qualified audience is withheld from the mail send. Conversions in the holdout group represent baseline demand; conversions above that baseline in the mailed group represent incremental revenue. This is particularly valuable for CRM reactivation programs and any campaign where you need to prove channel-level ROI beyond correlation.
Matched-market geographic tests: Two comparable DMAs receive different treatments — one gets direct mail layered onto the existing digital mix, the other runs digital only. The revenue delta isolates the incremental contribution of mail.
Sequential testing across creative and audience: Once you’ve established that direct mail drives incremental lift, you test which audiences (including lookalike audiences built from first-party data) and which creative formats produce the highest marginal return. Not the highest raw ROAS, but the highest incremental ROAS.
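
The gap between raw ROAS and incremental ROAS in the list above can be sketched with a few lines of arithmetic. The function name and every figure below are hypothetical, and the sketch assumes the holdout’s revenue per household is a fair baseline for the mailed group.

```python
def roas_comparison(mailed_size, mailed_revenue, holdout_size, holdout_revenue, spend):
    """Contrast raw ROAS with incremental ROAS using a holdout baseline."""
    # Revenue per household in the holdout approximates baseline demand.
    baseline_per_household = holdout_revenue / holdout_size
    # Revenue the mailed group would likely have produced with no mail at all.
    baseline_revenue = baseline_per_household * mailed_size
    # Revenue above that baseline is what the campaign actually added.
    incremental_revenue = mailed_revenue - baseline_revenue

    return {
        "raw_roas": mailed_revenue / spend,
        "incremental_revenue": incremental_revenue,
        "incremental_roas": incremental_revenue / spend,
    }


# Hypothetical campaign: 50,000 mailed, 5,000 held out, $40,000 in spend.
summary = roas_comparison(
    mailed_size=50_000,
    mailed_revenue=400_000,
    holdout_size=5_000,
    holdout_revenue=30_000,
    spend=40_000,
)
print(summary)
```

In this made-up case the raw ROAS is 10.0 while the incremental ROAS is 2.5, because $300,000 of the mailed group’s revenue matches what the holdout baseline predicts. Ranking audiences or creative by the second number rather than the first is the point of the sequential testing step.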

Teams running holdout-based measurement frequently find that direct mail’s incremental contribution exceeds its attributed contribution in MTA models, because MTA tends to under-credit channels that operate earlier in the consideration window or outside the digital click path entirely.

The GA4 Update Is Useful. The Question It Can’t Answer Is More Important.

Explore your AI chatbot traffic in GA4. Understand the volume, the conversion rates, the landing pages. That’s good analytics hygiene.

But don’t mistake a new row in your traffic report for a new insight into what’s driving your business. The question that GA4, or any click-stream analytics platform, cannot answer is whether your marketing caused the conversion or simply observed it. The only way to answer that question is to remove the marketing for a matched control group and measure the difference.

If you want to see how holdout-based incrementality testing and matchback attribution work for programmatic direct mail, talk to Postie about the methodology and the math.
