AI Summaries Hallucinate 60% of the Time. Buyers Trust Them More Than Real Reviews.
A UC San Diego study found AI shopping summaries are wrong 60% of the time, but buyers trust them more than real reviews. What that means for brands.
Here’s the most uncomfortable finding in AI search this year: when researchers showed shoppers product-review summaries generated by six different LLMs, 84% said they would buy the product. When they showed the same shoppers the actual reviews the AI had summarized, only 52% said they would buy. And the summaries hallucinated, on average, in 60% of the high-stakes details that influenced the purchase decision.
That’s the trust paradox. AI gets your brand wrong, and buyers act on the wrong information faster than they would have acted on the truth.
The study, led by Abeer Alessa at UC San Diego, tested six large language models against 1,000 electronics reviews, 1,000 media interviews, and 8,500 news items. Then they put 70 subjects in front of either the originals or the AI summaries and tracked behavior. The summaries flipped the sentiment of real reviews in 26.5% of cases. They invented specifications. They mixed up brands. And the people who read them came away more confident, not less.
If you sell anything, this is your problem now. Not someday. Now.
Why hallucinations are worse than bad reviews
A bad review hurts. A hallucinated summary hurts differently, and most marketing teams haven’t built the muscle to deal with it.
A bad review is a signal you can answer. The customer is identifiable. The complaint is specific. You can apologize, fix it, escalate it, request a takedown, or write a response under the thread. The damage is local to the page where the review lives, and most prospective buyers will weigh it against the rest of the reviews they read.
A hallucinated AI summary skips all of that. There’s no author to contact. The wrong specification doesn’t appear on a page you can edit. It appears inside a conversation between a buyer and an assistant, alone in a private interface, with no context, no alternative voices, and no way to know which review or article the AI invented it from. The buyer doesn’t see “5 of 47 reviews mentioned this.” They see one polished sentence presented as fact.
Worse, AI summaries are read with a level of trust that real reviews don’t get. Coverage of the UCSD work noted that users were 30% more likely to trust incorrect AI outputs than equivalent claims in other formats. The polish reads as authority. The conversational tone reads as honesty. The fact that the assistant cites a few sources at the bottom reads as rigor, even when the claim above the citations doesn’t appear in any of them.
This is the part most brand teams underestimate. You’re not competing with bad reviews anymore. You’re competing with confident summaries that sound like a trusted advisor and are wrong more than half the time.
What “hallucination” actually looks like for a brand
The word “hallucination” gets thrown around like it means one thing. It doesn’t. For a brand, it shows up in at least four distinct ways, and each one has a different fix.
| Type | What it looks like | Where it usually comes from |
|---|---|---|
| Spec fabrication | The AI invents a feature, price, dimension, or version number you don’t ship | Old marketing copy, competitor confusion, training data drift |
| Sentiment flip | A negative review reads as positive in the summary, or vice versa | LLM averaging across mixed reviews, framing bias |
| Brand confusion | Your name gets attached to a competitor’s product or feature | Similar names, weak entity disambiguation, missing schema |
| Context collapse | A statement true in one context (e.g., a beta) becomes a general claim | LLMs strip qualifiers when summarizing |
The UCSD work focused on the first two. A separate analysis tracked by Damien Charlotin has now catalogued more than 1,000 legal cases in which AI hallucinations caused real-world harm, including the April 2026 incident where Sullivan & Cromwell formally apologized to a US bankruptcy court after opposing counsel discovered fabricated citations submitted on behalf of Prince Global Holdings. The fabrications weren’t malicious. They were what LLMs do when they don’t know.
Brand confusion and context collapse get less attention but cost more in aggregate. We’ve seen them in our own monitoring data: a B2B SaaS company quietly described as having a feature their main competitor ships and they don’t, repeated across half a dozen ChatGPT sessions before anyone on the marketing team noticed. The buyer who asked the question made a vendor decision before talking to either company.
Why this is getting worse, not better
The convenient story is that hallucinations are a temporary glitch — that GPT-6 or Gemini 3 will fix this and we can stop worrying. The data doesn’t support that.
A 2026 benchmark across 37 frontier models reported hallucination rates between 15% and 52% on knowledge-heavy tasks. Even on basic summarization, the best models hallucinate at least 0.7% of the time. Domain-specific rates are worse: 18.7% on legal questions, 15.6% on medical queries. The newer reasoning-heavy models are not categorically more accurate; in some benchmarks they’re less.
Two structural forces push the rate up rather than down. First, AI platforms are summarizing more aggressively to win user time. Google AI Mode, Perplexity, and ChatGPT Search all reduce the number of source clicks per session because the summary is the product. Second, AI training data is increasingly polluted by AI-generated content. When the next model trains on the previous model’s confident-but-wrong outputs, the wrongness compounds.
The shift toward agents makes this sharper. Perplexity’s revenue grew 50% in March after pivoting toward AI agents that book, shop, and email on behalf of users. When the agent is acting, not just summarizing, a 60% spec hallucination rate stops being an information problem and becomes a transaction problem. The agent buys the wrong thing. The user finds out at delivery.
The correction playbook
You can’t make AI stop hallucinating. You can change the inputs it pulls from, and you can catch the wrong outputs before they compound. Here’s the workflow we recommend to brand teams that take this seriously.
1. Audit what AI says about you across platforms. Run a fixed set of branded and category queries through ChatGPT, Perplexity, Google AI Mode, Claude, and Gemini. Use the same prompts every week. Record the brand mentions, the claims about your product, the source citations, and the sentiment. Compare to ground truth. The Conductor 2026 AEO/GEO Benchmarks Report (based on 3.3 billion sessions across 13,000+ domains) found that 94% of marketing leaders plan to increase AEO investment this year, but most are still measuring whether they appear, not whether what AI says is correct. Track both; a sketch of a basic audit script follows this list.
2. Build a source map for every recurring hallucination. When the AI claims something wrong, find the source. Is it an outdated comparison article from 2023? A G2 listing with an old version number? A Reddit thread describing a competitor’s feature under your brand’s name? Most hallucinations trace back to one or two specific URLs that AI keeps pulling from. The source is fixable. The model isn’t. A simple source-map record is sketched after this list.
3. Make your own site the cleanest source on the question. AI summarizers reward consistency and recency. A homepage that lists current pricing, current features, current versions, and uses the same phrasing across the site is easier for an LLM to extract from than a sprawling content tree where the truth is scattered across 200 pages. Define the canonical phrasing for your top facts and make sure it appears on the pages AI is most likely to retrieve; a quick phrasing-consistency check is sketched below.
4. Strengthen entity signals. Implement Organization, Product, and Brand schema with @id and sameAs links to your Wikidata entry, LinkedIn page, Crunchbase profile, and any other authoritative sources. The Geneo team’s misattribution playbook is a solid reference here. We’ve covered why generic schema markup actually hurts AI visibility and what entity-rich schema looks like in practice; a minimal entity-schema example also appears after this list.
5. Push corrections at the source level. If the hallucinated claim traces back to an outdated review article, contact the publisher with documentation and ask for an update. If it traces back to your own site, fix it. If it traces back to Wikipedia, propose an edit on the talk page with sources. AI platforms re-crawl. The next time they do, the source will be different.
6. Build a recurring monitoring loop. AI answers shift. We documented this in our piece on how AI answers change over time: the same query produces different answers across sessions, weeks, and model updates. A one-time audit doesn’t survive a model release. Weekly or daily monitoring on the queries that matter most is the only way to catch a hallucination before it compounds across thousands of buyer conversations. The last sketch below shows one way to flag week-over-week changes automatically.
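To make step 1 concrete, here is a minimal sketch of a weekly audit run. It queries a single platform through the OpenAI Python SDK and writes the raw answers to a dated CSV; the brand name in the prompts, the model name, and the file layout are placeholders to adapt, and the claims-versus-ground-truth review still happens by hand downstream.

```python
# audit_prompts.py - weekly snapshot of what one AI platform says about your brand.
# Sketch only: prompts, model, and output path are placeholders to adapt.
import csv
import datetime

from openai import OpenAI  # pip install openai

PROMPTS = [
    "What does Acme Analytics do and how much does it cost?",        # hypothetical brand
    "Acme Analytics vs. CompetitorX: which is better for SMBs?",
    "What do reviews say about Acme Analytics' reporting features?",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def snapshot(model: str = "gpt-4o") -> str:
    """Run the fixed prompt set and write the answers to a dated CSV."""
    today = datetime.date.today().isoformat()
    path = f"audit-{today}.csv"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["date", "model", "prompt", "answer"])
        for prompt in PROMPTS:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            writer.writerow([today, model, prompt, resp.choices[0].message.content])
    return path


if __name__ == "__main__":
    print("Wrote", snapshot())
```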
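For step 2, the source map needs less tooling than discipline: one consistent record per recurring hallucination. A sketch of one possible structure, with hypothetical field names, a hypothetical claim, and placeholder URLs:

```python
# source_map.py - one record per recurring hallucination, tracing the claim back
# to the URLs AI keeps pulling it from. Illustrative structure, not a prescribed format.
from dataclasses import dataclass


@dataclass
class HallucinationRecord:
    claim: str                    # what the AI says, verbatim
    truth: str                    # what is actually true
    platforms: list[str]          # where the claim has shown up
    suspected_sources: list[str]  # URLs the wrong claim likely traces back to
    owner: str = "unassigned"     # who is chasing the correction
    status: str = "open"          # open / correction requested / resolved


SOURCE_MAP = [
    HallucinationRecord(
        claim="Acme Analytics includes built-in SSO on the free plan",  # hypothetical
        truth="SSO is available on the Business plan only",
        platforms=["ChatGPT", "Perplexity"],
        suspected_sources=[
            "https://example.com/2023-comparison-post",    # placeholder URL
            "https://example.com/old-directory-listing",   # placeholder URL
        ],
    ),
]
```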
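For step 3, a crude consistency check is enough to catch drift: confirm that the canonical phrasing for each top fact still appears on the pages AI is most likely to retrieve. The URLs and phrases below are placeholders, and the sketch checks raw HTML only, so JavaScript-rendered pages need a different approach.

```python
# phrasing_check.py - verify canonical phrasing for top facts appears on key pages.
# Sketch only: CANONICAL_FACTS is a placeholder mapping of phrase -> pages to check.
import requests  # pip install requests

CANONICAL_FACTS = {
    "Pricing starts at $49/month": [
        "https://www.example.com/",
        "https://www.example.com/pricing",
    ],
    "SSO is included on the Business plan": [
        "https://www.example.com/pricing",
        "https://www.example.com/security",
    ],
}

for phrase, urls in CANONICAL_FACTS.items():
    for url in urls:
        html = requests.get(url, timeout=10).text
        status = "OK" if phrase.lower() in html.lower() else "MISSING"
        print(f"{status:8} {phrase!r} on {url}")
```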
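For step 4, the entity signals boil down to a small JSON-LD block repeated site-wide. A minimal sketch with placeholder identifiers and profile URLs; validate your real version with a schema testing tool before shipping it.

```python
# entity_schema.py - emit Organization JSON-LD with an explicit @id and sameAs links.
# Sketch only: every identifier and URL below is a placeholder for your own entity.
import json

organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://www.example.com/#organization",  # stable identifier for your entity
    "name": "Acme Analytics",                        # hypothetical brand
    "url": "https://www.example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",   # placeholder Wikidata ID
        "https://www.linkedin.com/company/acme-analytics",
        "https://www.crunchbase.com/organization/acme-analytics",
    ],
}

# Paste the output into a <script type="application/ld+json"> tag across the site.
print(json.dumps(organization, indent=2))
```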
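And for step 6, the monitoring loop can start as a simple diff between two audit snapshots. This sketch assumes the CSV layout written by the audit script above and uses a crude text-similarity threshold; swap in a semantic comparison once the query volume justifies it.

```python
# diff_snapshots.py - flag prompts whose answers changed between two audit runs.
# Sketch only: assumes the CSV columns from audit_prompts.py and a tunable threshold.
import csv
import difflib
import sys


def load(path: str) -> dict[str, str]:
    """Map each prompt to its recorded answer."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row["prompt"]: row["answer"] for row in csv.DictReader(f)}


def changed_prompts(old_path: str, new_path: str, threshold: float = 0.85):
    """Yield prompts whose answers drifted below the similarity threshold."""
    old, new = load(old_path), load(new_path)
    for prompt in old.keys() & new.keys():
        ratio = difflib.SequenceMatcher(None, old[prompt], new[prompt]).ratio()
        if ratio < threshold:
            yield prompt, ratio


if __name__ == "__main__":
    # e.g. python diff_snapshots.py audit-2026-05-04.csv audit-2026-05-11.csv
    for prompt, ratio in changed_prompts(sys.argv[1], sys.argv[2]):
        print(f"[changed, similarity={ratio:.2f}] {prompt}")
```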
What to stop doing
Three things to cut from your current playbook because they don’t help.
Stop optimizing only for citation count. A model can cite your domain at the bottom of an answer that misrepresents your product in the body. Citations without accuracy are worse than no citation at all, because they lend authority to the wrong claim.
Stop treating hallucinations as a tech problem for someone else. Engineering can’t fix what marketing won’t measure. The brand team owns the integrity of how the brand is described, in every interface, including the ones owned by OpenAI and Google.
Stop assuming “we’re a small brand, this doesn’t apply yet.” The UCSD study tested products at every price point. The 60% hallucination rate didn’t depend on how famous the product was. Smaller brands actually fare worse, because there’s less authoritative source material for the AI to anchor on, so it improvises more.
The real takeaway
AI summaries are persuasive, confident, and frequently wrong. They’ve already started making purchase decisions for your buyers, and the buyers don’t know they’re acting on hallucinated information. Your job is no longer just to be cited. Your job is to make sure that when you are cited, the claim above your name is one you’d be willing to put in a press release.
Most brand teams aren’t there yet. They’re still tracking whether they appear in AI answers. The harder question — and the one that will separate winners from also-rans over the next year — is whether what AI says about them is true.
That requires monitoring, not hoping.
Stop guessing about your AI search presence. Start your free RivalHound trial and get real data on what ChatGPT, Perplexity, and Google AI are saying about your brand.