
GPTBot Lost 85% of the Web. Your AI Bot Strategy Probably Isn't Ready.

A 66.7B request study shows GPTBot coverage crashed from 84% to 12%. Here's which AI bots to allow, which to block, and why it matters.

RivalHound Team
9 min read


OpenAI’s GPTBot used to reach 84% of websites. It now reaches 12%. That’s not a technical failure. It’s publishers making a choice.

A Hostinger analysis of 66.7 billion bot requests across more than five million websites shows a clear pattern: the web is splitting its approach to AI bots. Training crawlers are getting shut out. Assistant and search bots are being welcomed in. And a growing number of sites are stuck in the worst position of all, blocking everything or blocking nothing, because they haven’t figured out the difference.

If your team hasn’t reviewed your robots.txt in the past six months, your AI visibility strategy has a hole in it. And if you think robots.txt is still just an SEO housekeeping file, you’re about to fall behind teams that treat it as the strategic decision it’s become.

Four Types of AI Bots (and Why You Can’t Treat Them the Same)

The bot landscape used to be simple. Googlebot crawled your site. You let it. Maybe you blocked a few scrapers. That was it.

Now there are dozens of AI-related bots hitting your site, and they fall into four distinct categories with very different implications for your brand:

Search engine bots (Googlebot, Bingbot) still do what they’ve always done: index your content for search results. Nothing has changed here. Block these and you disappear from search entirely.

AI training bots (GPTBot, Google-Extended, Meta-ExternalAgent, CCBot) scrape your content to train large language models. They don’t send you traffic. They don’t cite you. They take your content and feed it into a model that may or may not mention you later. The value exchange is, at best, indirect.

AI search bots (OAI-SearchBot, PerplexityBot, Claude-SearchBot) index your content so AI search products can cite you in real-time answers. This is the direct pipeline to appearing as a source when someone asks ChatGPT or Perplexity a question about your industry.

User-action bots (ChatGPT-User, Claude-User) fire when an actual human asks an AI assistant to visit your page. Someone types “summarize this article” and pastes your URL. The bot visits on behalf of that specific user.

Each category has a different risk/reward profile. Treating them identically is like having one policy for journalists, cold callers, shoplifters, and paying customers walking through your door.

The Numbers: What 66.7 Billion Requests Tell Us

The Hostinger study tracked bot behavior across three sampling windows in 2024 and found the four categories moving in opposite directions:

| Bot Category | Example Bots (site coverage) | Traffic Share | Coverage Trend |
|---|---|---|---|
| Search engines | Googlebot (72%), Bingbot (58%) | 30.5% (20.3B requests) | Stable |
| AI training | GPTBot (12%), Meta-ExternalAgent (57%) | 15.1% (10.1B requests) | Declining fast |
| AI search/assistant | OAI-SearchBot (56%), Applebot (24%) | 6.9% (4.6B requests) | Expanding |
| SEO tools | AhrefsBot (60%), SemrushBot (25%) | 9.7% (6.4B requests) | Declining |

The standout number is GPTBot’s collapse. Going from 84% coverage to 12% means roughly seven out of eight websites that used to allow OpenAI’s training crawler have now blocked it. Publishers decided that letting OpenAI scrape their content for model training, with no direct benefit in return, wasn’t worth it.

But here’s what matters for brand visibility: OAI-SearchBot, the bot that actually determines whether you show up in ChatGPT search results, held steady at nearly 56% coverage. Most publishers blocking GPTBot are still allowing the search bot through. They’ve learned to distinguish between “training on my content” and “citing my content in answers.”

The ChatGPT-User Wildcard

On December 9, 2025, OpenAI quietly revised its crawler documentation. The change was subtle but significant.

Previously, OpenAI stated that robots.txt rules applied to all three of its user agents: GPTBot, OAI-SearchBot, and ChatGPT-User. The updated documentation removed ChatGPT-User from that list. OpenAI’s justification: “Because these actions are initiated by a user, robots.txt rules may not apply.”

The logic is that ChatGPT-User acts as a proxy for a human browsing the web, not as an autonomous crawler. When a person asks ChatGPT to “read this page,” the resulting bot visit is framed as user-initiated, like a browser, not a spider.

This matters for two reasons. First, it means you can’t fully control ChatGPT-User access through robots.txt alone. Second, it sets a precedent that other AI platforms could follow. If user-triggered browsing bypasses robots.txt, then the file governs less of the AI access to your site than you might think.

For brand visibility, though, ChatGPT-User visits are actually a good signal. They mean real people are asking AI to engage with your content. The concern is more about control and consent than about visibility loss.
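If you do want to control ChatGPT-User access anyway, robots.txt won't do it; the check has to happen server-side. Here's a minimal sketch of a user-agent gate written as WSGI middleware. The blocked token list and the substring match are illustrative assumptions, not an official OpenAI string, and `demo_app` is a stand-in for your real application:

```python
def block_user_agents(app, blocked=("ChatGPT-User",)):
    """Wrap a WSGI app and return 403 for requests whose User-Agent
    contains any blocked token (case-insensitive substring match)."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token.lower() in ua for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware

# Stand-in app for illustration; wrap your real WSGI app the same way.
def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"OK"]

guarded = block_user_agents(demo_app)
```

The same idea translates directly to a Cloudflare firewall rule or an nginx `if ($http_user_agent ...)` block if you'd rather enforce it at the edge.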

The Robots.txt Playbook for AI Visibility

Based on the data, here’s a straightforward framework for which bots to allow and which to block:

Allow these (they drive citations and visibility)

  • OAI-SearchBot: Powers ChatGPT search results. Blocking this means ChatGPT can’t cite you.
  • PerplexityBot: Indexes content for Perplexity answers. Perplexity is growing fast and sends real referral traffic.
  • Googlebot: Obviously. But also: Google’s AI Overviews pull from indexed content. No index, no AI Overview citations.
  • Bingbot: Powers Copilot search results and feeds into multiple AI products.
  • Applebot: Apple Intelligence is rolling out across all Apple devices. This bot is expanding its footprint quickly.

Block these (they take content without sending traffic)

  • GPTBot: Used exclusively for model training. Blocking it does not affect your ChatGPT search visibility (that’s OAI-SearchBot’s job).
  • Google-Extended: Google’s AI training bot, separate from Googlebot. Blocking it doesn’t impact your SEO rankings or AI Overview citations.
  • CCBot: Common Crawl’s bot, used to build training datasets.
  • Meta-ExternalAgent: Meta’s training crawler for its AI models.

The robots.txt snippet

Here’s what the actual directives look like:

# Allow AI search bots (drive citations)
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Block AI training bots (no direct visibility benefit)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

This is a starting point. Your specific situation may warrant different choices, especially if you’re in publishing (where training data licensing deals exist) or if you have proprietary content you want to protect more aggressively.
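Once the file is live, verify it actually says what you intended. Python's standard-library `urllib.robotparser` can answer "would this bot be allowed to fetch this page?" Here's a quick sketch against the snippet above; the URL is a placeholder, and in practice you'd point the parser at your live file with `set_url()` and `read()` instead of inlining the text:

```python
from urllib.robotparser import RobotFileParser

# Policy from the snippet above, inlined for the example.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Check each bot against a representative page.
for agent in ("OAI-SearchBot", "GPTBot", "CCBot"):
    allowed = rp.can_fetch(agent, "https://example.com/some-post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running a check like this in CI catches the classic failure mode where a one-character typo in a `User-agent` line silently opens (or closes) the door to the wrong bot.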

Three Mistakes Teams Keep Making

Mistake 1: Blocking everything. Some teams, spooked by the AI scraping headlines, block all AI bots indiscriminately. This kills their AI search visibility. If you block OAI-SearchBot and PerplexityBot, you won’t appear as a cited source in AI answers. You’ve essentially opted out of the fastest-growing discovery channel in search. As we covered in our post on how ChatGPT reads your content, the search pipeline requires crawl access to cite you.

Mistake 2: Blocking nothing. Other teams haven’t touched their robots.txt since 2023. They’re giving training bots free access to their entire content library with no compensation. While this doesn’t directly harm AI visibility, it does give AI companies your content for free when you could be selectively sharing.

Mistake 3: Not knowing which bot is which. The most common failure. Teams block “GPTBot” thinking they’re protecting their content from ChatGPT, without realizing that OAI-SearchBot is the one that actually matters for search visibility. Or they allow “Googlebot” thinking that covers AI Overviews, without understanding that Google-Extended is a separate bot for AI training that can be blocked independently.

Beyond Robots.txt: The Enforcement Gap

Robots.txt is a gentlemen’s agreement. It’s not access control. Well-behaved bots respect it. Others don’t.

The Hostinger data shows most major AI bots comply with robots.txt directives. But the ChatGPT-User change signals a shift in thinking. When AI companies frame bot visits as “user-initiated actions” rather than autonomous crawling, the robots.txt convention weakens.

If you need harder enforcement, consider these additional layers:

  • Web Application Firewall (WAF) rules: Cloudflare, AWS WAF, and similar services can block specific user agents at the network level, before the request hits your server. We wrote about Cloudflare’s default AI bot blocking when they changed their policy in 2025.
  • Rate limiting: Even for bots you allow, set rate limits to prevent aggressive crawling from affecting site performance. The Read the Docs project found that blocking AI crawlers dropped their bandwidth from 800GB to 200GB daily, saving $1,500 per month.
  • Server log monitoring: Check which bots are actually hitting your site. Compare the user-agent strings against your robots.txt policy to find bots that aren’t respecting your rules.
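Log checks don't need special tooling. Here's a sketch that tallies AI bot hits from access-log lines via a case-insensitive user-agent match; the token list and sample lines are illustrative (real user-agent strings vary), and in practice you'd read from your actual log file rather than a hardcoded list:

```python
from collections import Counter

# Tokens to look for in the User-Agent field; illustrative, not exhaustive.
AI_BOT_TOKENS = ["GPTBot", "OAI-SearchBot", "ChatGPT-User",
                 "PerplexityBot", "CCBot", "meta-externalagent"]

def count_ai_bot_hits(log_lines):
    """Count hits per AI bot token (case-insensitive substring match)."""
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in lowered:
                counts[token] += 1
    return counts

# Illustrative lines; in practice: open("/var/log/nginx/access.log")
sample = [
    '1.2.3.4 - - [09/Dec/2025] "GET /post HTTP/1.1" 200 "-" "Mozilla/5.0; GPTBot/1.2"',
    '5.6.7.8 - - [09/Dec/2025] "GET /post HTTP/1.1" 200 "-" "OAI-SearchBot/1.0"',
    '5.6.7.8 - - [09/Dec/2025] "GET /about HTTP/1.1" 200 "-" "OAI-SearchBot/1.0"',
]
hits = count_ai_bot_hits(sample)
```

Comparing these counts against your robots.txt policy is the fastest way to spot a bot you've disallowed that keeps crawling anyway.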

What This Means for AI Visibility Strategy

The robots.txt decision connects directly to your broader AI visibility approach. If you’re investing in content freshness and earning brand mentions across the web, you need to make sure AI search bots can actually find and index that work.

Here’s a quick audit checklist:

  1. Review your current robots.txt. Do you know which AI bots you’re allowing and which you’re blocking? If not, check now.
  2. Check your Cloudflare/CDN settings. Network-level bot blocking can override your robots.txt. Make sure your CDN isn’t blocking bots you want to allow.
  3. Separate training from search. Block GPTBot and Google-Extended. Allow OAI-SearchBot and PerplexityBot. This gives you citations without giving away training data for free.
  4. Monitor your server logs. Look for AI bot user agents and check their crawl frequency. If a bot you’ve blocked is still crawling, you need WAF rules.
  5. Revisit quarterly. New AI bots appear regularly. The landscape six months from now will look different. Set a calendar reminder.

The era where robots.txt was a “set it and forget it” file is over. It’s now one of the most consequential technical decisions for how your brand shows up in AI-generated answers. Treat it accordingly.

Want to know what AI platforms say about your brand? Try RivalHound free and find out.

#robots.txt #AI crawlers #GPTBot #AI visibility #technical SEO

Ready to Monitor Your AI Search Visibility?

Track your brand mentions across ChatGPT, Google AI, Perplexity, and other AI platforms.