Your AI Sentiment Score Says More About the Platform Than Your Brand

Run the same brand through six AI platforms and ask each one what it thinks. You will not get one answer. You will get six, and they will not even agree on the mood.

Superlines did exactly this. Across 6,447 brand mentions collected over 30 days, from January 14 to February 13, 2026, the share of mentions framed positively ranged from 90.9% on Microsoft Copilot down to 0% on Claude (Superlines). Same brands. Same window. A spread of nearly 91 points in how favorably they were described, and the only thing that changed was which model did the describing.

If you are reporting a single “AI sentiment” number to your CMO, you are reporting a coin flip and calling it a measurement.

The numbers

Here is the full breakdown of how positively each platform framed the brands it mentioned:

Platform	Share of mentions framed positively
Microsoft Copilot	90.9%
Perplexity	76.9%
Grok	58.2%
Google AI Mode	35.6%
ChatGPT	6.8%
Claude	0%
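To see what a single blended figure would look like, here is a minimal sketch that averages the six shares in the table above. The unweighted mean is an assumption about how a naive dashboard might combine them; Superlines does not publish a blended score, and the variable names are illustrative.

```python
from statistics import mean

# Positive-sentiment share per platform, in percent, as reported by Superlines.
positive_share = {
    "Microsoft Copilot": 90.9,
    "Perplexity": 76.9,
    "Grok": 58.2,
    "Google AI Mode": 35.6,
    "ChatGPT": 6.8,
    "Claude": 0.0,
}

# Assumption: a naive "AI sentiment" gauge takes an unweighted average.
blended = mean(positive_share.values())

# Distance of each platform from the blended figure, in percentage points.
gaps = {name: abs(share - blended) for name, share in positive_share.items()}
closest = min(gaps, key=gaps.get)

print(f"Blended score: {blended:.1f}%")
print(f"Closest platform: {closest}, still {gaps[closest]:.1f} points away")
```

Under that assumption the blended score lands near 44.7%, and even the nearest platform (Google AI Mode) sits about 9 points away from it. The average describes none of the six.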

Look at ChatGPT and Claude at the bottom of that table and your first instinct is probably alarm. ChatGPT handles the largest share of AI search traffic, and it framed brands positively less than 7% of the time. Claude managed zero.

That instinct is wrong, and getting it wrong is how teams waste a quarter chasing a problem that doesn’t exist.

Low positive does not mean negative

The trap here is treating sentiment as a single axis where everything that isn’t positive must be negative. It isn’t.

AI models rarely attack brands. As Superlines put it in a separate piece, “LLMs rarely criticize brands directly, and responses are typically either positive or neutral.” So when Claude scores 0% positive, it is not telling users your product is bad. It is describing your product the way a reference librarian would: flatly, factually, without adjectives. “Acme offers project management software with task tracking and reporting” carries no sentiment. It is also not an insult.

ChatGPT and Claude default to that neutral register. Copilot and Perplexity reach for warmer language. That difference is a property of how each model was trained and tuned, not a verdict on the brands passing through them. The number measures the platform’s house style first and your reputation second.

Which means the dangerous report isn’t “Claude scored us at zero.” It’s “ChatGPT sentiment dropped from neutral to actually negative” — a real signal hiding inside a metric most teams have already learned to ignore because the baseline looks scary.
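A small sketch makes the point concrete. It assumes each mention has already been labelled positive, neutral, or negative by whatever classifier your tooling uses; the function name and the sample data are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")

def sentiment_shares(labels):
    """Return the percentage of mentions carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100 * counts[label] / total for label in LABELS}

# Hypothetical data: two weeks of labelled mentions on one platform.
calm_week = ["neutral"] * 10
bad_week = ["neutral"] * 7 + ["negative"] * 3

for name, week in (("calm", calm_week), ("bad", bad_week)):
    print(name, sentiment_shares(week))
```

Both weeks score 0% positive. Only the three-way breakdown shows that the second week moved 30% of mentions from neutral to negative, which is the shift worth investigating.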

Why the platforms diverge

The sentiment gap is downstream of a sourcing gap. These models don’t share a brain, and they don’t share a source list.

We’ve written before about how only 11% of websites get cited by both ChatGPT and Perplexity — each platform pulls from an almost entirely different pool of sources. Perplexity leans heavily on Reddit and community discussion. ChatGPT leans on encyclopedic, authoritative pages. Grok weights real-time social posts. When the inputs differ that much, the tone of the output follows.

A platform that builds its answer from enthusiastic Reddit threads and review sites will sound enthusiastic. A platform that builds its answer from spec sheets and Wikipedia will sound clinical. Neither is hallucinating a feeling. They are both faithfully reflecting the emotional temperature of whatever they read.

The same fracturing shows up in raw visibility. In a separate Superlines analysis of 34,234 AI responses across 10 platforms in March 2026, citation rates for the same brand ranged from 27% on Grok to 0% on Claude, Mistral, and DeepSeek, a gap Superlines puts at 615x between the most and least visible platforms (Superlines). That multiple presumably rests on unrounded or lowest-nonzero rates, since no ratio can be taken against a true 0%. A brand can be loud and beloved on one platform and silent on another, and a single blended score erases both facts.

What actually drives the sentiment you can change

The platform sets the register. But the raw material — the sources each model reads about you — is the part you can influence.

Ahrefs studied 75,000 brands and found that branded web mentions correlate with AI visibility far more strongly than traditional SEO signals, with correlation coefficients of roughly 0.66 to 0.71, while backlink counts and domain authority trailed well behind (Ahrefs). The places people talk about you (forums, reviews, news, video) are the same places the models read to form a tone. Improve the conversation there and you shift the inputs every platform draws from, even the clinical ones.

That reframes sentiment work. You are not optimizing a model. You are improving the corpus the model summarizes.

How to read sentiment without overreacting

Most sentiment dashboards collapse all of this into one cheerful gauge. Don’t trust it. Here is the discipline that survives the noise:

1. Track sentiment per platform, never blended. A single average mixes Copilot’s warmth with Claude’s neutrality and produces a number that describes no real user’s experience. Break it out by platform or it tells you nothing actionable.

2. Calibrate to each platform’s baseline. Before you panic about a low score, learn the platform’s house style. ChatGPT framing a brand neutrally is normal. ChatGPT framing it negatively is the alarm. Judge yourself against the platform’s typical register, not against an absolute 100%.

3. Read the sentences, not just the score. A percentage tells you something changed; it never tells you why. The fix lives in the actual language — which claim, which comparison, which sourced complaint. Sentiment scores are smoke detectors, not diagnoses.

4. Separate neutral from negative. Insist your tooling distinguish “no opinion” from “bad opinion.” Collapsing them is how a perfectly fine neutral baseline gets mistaken for a crisis, and how a genuine reputation problem gets buried under one.

5. Weight by where your buyers are. A glowing 90.9% on Copilot is worth less than a mediocre score on the platform your customers actually open every morning. Match the attention you pay to the traffic each platform sends you, not to whichever one flatters you most.
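Steps 2, 4, and 5 can be combined into one rough triage score. The sketch below is one possible way to do it, not an established formula: the `PlatformReading` fields, the 5-point tolerance, and every number in the sample readings are assumptions invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class PlatformReading:
    platform: str
    negative_share: float     # % of this brand's mentions framed negatively
    baseline_negative: float  # % framed negatively for typical brands there
    traffic_share: float      # fraction of the brand's AI referrals (0 to 1)

def review_priority(reading, tolerance=5.0):
    """Score how urgently someone should read the underlying sentences.

    Returns 0.0 when negative framing is within `tolerance` points of the
    platform's own baseline; otherwise the excess, scaled by traffic share.
    """
    excess = reading.negative_share - reading.baseline_negative
    if excess <= tolerance:
        return 0.0
    return excess * reading.traffic_share

# Hypothetical readings: every number below is invented for illustration.
readings = [
    PlatformReading("ChatGPT", negative_share=18.0,
                    baseline_negative=3.0, traffic_share=0.60),
    PlatformReading("Microsoft Copilot", negative_share=4.0,
                    baseline_negative=2.0, traffic_share=0.10),
    PlatformReading("Perplexity", negative_share=25.0,
                    baseline_negative=5.0, traffic_share=0.05),
]

for r in sorted(readings, key=review_priority, reverse=True):
    print(f"{r.platform}: priority {review_priority(r):.1f}")
```

In this made-up example Perplexity shows the largest jump over its baseline, but ChatGPT ranks first because it carries most of the traffic. The score only decides where to look first; step 3 still applies, and the diagnosis comes from the sentences themselves.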

Here’s the contrast in plain terms:

Don’t	Do
Report one blended sentiment number	Report sentiment per platform
Treat low positive as negative	Separate neutral from negative
React to the score	Read the underlying sentences
Optimize for the friendliest platform	Prioritize where your buyers search

The takeaway

The cross-platform sentiment spread isn’t a flaw in the data. It’s the most honest thing the data is telling you: there is no such thing as “what AI thinks of your brand.” There is only what each model, reading its own slice of the web in its own voice, happens to say.

This is the same lesson as measurement noise within a single platform — that one prompt can’t measure your visibility — scaled up across every platform at once. The answer is the same too. Stop reaching for one tidy number. Measure each platform on its own terms, calibrate to its baseline, and read the words behind the score before you act. That’s also the difference between a GEO metric that earns a place on the dashboard and one that just looks reassuring.

A brand can be adored on Copilot and a footnote on Claude in the very same week. If your reporting can’t hold both of those truths at once, it isn’t measuring your reputation. It’s flattening it.

RivalHound tracks your brand’s sentiment and visibility across ChatGPT, Google AI, Perplexity, Copilot, and more — broken out by platform, not blended into one misleading number. Start monitoring to see where you actually stand.