How to Block ClaudeBot (Anthropic's Web Crawler)
ClaudeBot is Anthropic's web crawler. If you're reading this through their AI assistant Claude, well, irony aside: here's how to make sure your content doesn't end up in Claude's next training run.
Anthropic's crawler ecosystem
Anthropic runs multiple User-Agents:
| User-Agent | Purpose |
|------------|---------|
| ClaudeBot | Primary training data crawler |
| Claude-Web | Real-time browsing for Claude users |
| Anthropic-ai | General Anthropic crawling |
ClaudeBot is the main one for training data collection. Claude-Web is similar to OpenAI's ChatGPT-User—it's used when Claude needs to fetch a page in real-time for a user query.
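Not sure which of these actually visits you? A quick log check will show it (a rough sketch, assuming an Apache-style access log at the usual path; adjust for your server):

```bash
# Count occurrences of each Anthropic agent token in the access log.
grep -ioE "claudebot|claude-web|anthropic-ai" /var/log/apache2/access.log \
  | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
```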
The quick block
robots.txt:
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Anthropic-ai
Disallow: /
Done. Anthropic is pretty good about respecting robots.txt.
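Worth a quick sanity check that the deployed file actually contains the rules (yoursite.com is a placeholder):

```bash
# Fetch the live robots.txt and print each ClaudeBot group plus the line after it.
curl -s https://yoursite.com/robots.txt | grep -iA 1 "claudebot"
```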
Use our robots.txt generator to create these rules automatically.
robots.txt explained
Here's a more complete robots.txt that blocks Claude alongside other AI crawlers while preserving search engine access. Crawlers follow the most specific User-agent group that matches them, so these named blocks take precedence over the catch-all Allow at the end:
# Anthropic / Claude
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Anthropic-ai
Disallow: /
# OpenAI (while we're at it)
User-agent: GPTBot
Disallow: /
# Google AI training (not search!)
User-agent: Google-Extended
Disallow: /
# Keep search engines happy
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /
Partial blocking
Want Claude to access your public docs but not premium content?
User-agent: ClaudeBot
Disallow: /members/
Disallow: /premium/
Allow: /docs/
Allow: /help/
Why might you want this? If you have an API or developer tools, having Claude be able to answer questions about your public documentation could actually be useful for your users.
Server-level blocking
For an extra layer of protection, block at the server level too.
Apache (.htaccess)
<IfModule mod_rewrite.c>
RewriteEngine On
# Match any Anthropic user-agent, case-insensitively ([NC]); [OR] chains the conditions.
# "Anthropic" also covers the anthropic-ai agent.
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anthropic [NC]
# Return 403 Forbidden ([F]) and stop processing further rules ([L]).
RewriteRule .* - [F,L]
</IfModule>
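If mod_rewrite isn't available, mod_setenvif plus the 2.4-style Require directives can do the same job. A sketch, assuming Apache 2.4+ (the env var name anthropic_bot is arbitrary):

```apache
# Tag requests from Anthropic's agents, then refuse tagged requests with a 403.
BrowserMatchNoCase "ClaudeBot|Claude-Web|Anthropic" anthropic_bot
<RequireAll>
    Require all granted
    Require not env anthropic_bot
</RequireAll>
```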
Use our .htaccess generator to create these rules automatically.
Nginx
# Case-insensitive (~*) regex match against the User-Agent header
if ($http_user_agent ~* "Claude|Anthropic") {
    return 403;
}
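nginx's docs discourage `if` where it can be avoided; a `map` in the http block reduces the `if` to a trivial variable test, which is the safe usage. Same matching behavior, just restructured:

```nginx
# In the http block: flag Anthropic user-agents.
map $http_user_agent $block_ai {
    default 0;
    "~*(claude|anthropic)" 1;
}

# In your server block: refuse flagged requests.
if ($block_ai) {
    return 403;
}
```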
Cloudflare
- Security → WAF → Custom Rules
- Expression: (http.user_agent contains "Claude") or (http.user_agent contains "Anthropic")
- Action: Block
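One caveat: Cloudflare's contains operator is case-sensitive. The agents above capitalize their names consistently today, but for insurance against casing changes you can wrap the field in lower():

```
(lower(http.user_agent) contains "claude") or (lower(http.user_agent) contains "anthropic")
```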
Checking if it works
curl test
curl -A "ClaudeBot/1.0" -I https://yoursite.com/
Should return 403 Forbidden if your server-level block is active. Note that curl can only test the server-level rules; a robots.txt block relies on the crawler choosing to obey it, so it won't show up in this test.
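To exercise all three agents in one go, a small loop helps. The UA strings below are minimal stand-ins rather than the crawlers' real (longer) strings, but your rules match on substrings anyway:

```bash
# Expect a 403 from each agent if the server-level block is working.
for ua in "ClaudeBot/1.0" "Claude-Web/1.0" "anthropic-ai"; do
  printf '%-16s ' "$ua"
  curl -s -o /dev/null -w "%{http_code}\n" -A "$ua" https://yoursite.com/
done
```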
Log check
grep -i "claude\|anthropic" /var/log/apache2/access.log | tail -20
Look for 403 responses.
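To tally exactly which user-agent strings are hitting you and how often, assuming the combined log format (where the UA is the sixth quote-delimited field):

```bash
# Count hits per full user-agent string for Claude/Anthropic traffic.
grep -i "claude\|anthropic" /var/log/apache2/access.log \
  | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn
```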
Anthropic vs OpenAI
In my experience, Anthropic's crawlers are less aggressive than OpenAI's. I see GPTBot more frequently in my logs than ClaudeBot. But that could vary by site.
Both companies respect robots.txt, which is more than I can say for some others (looking at you, Bytespider). The robots.txt block should be sufficient for most cases, but the server-level block adds peace of mind.
Should you block Claude-Web?
Same trade-off as ChatGPT-User. Claude-Web is used for real-time browsing—when a user asks Claude "what does this article say" and pastes a link.
- Block it if you don't want any AI interaction with your content
- Keep it if you want Claude to be able to answer questions about your public pages
The real-time browsing traffic supposedly isn't used for training, but "supposedly" is doing a lot of work there. Your call.
The business reality
Look, Anthropic is a commercial company. They're training models to sell API access and subscriptions. If you're creating original content, you're providing free training data that generates revenue for them.
Whether that bothers you is a personal/business decision. Plenty of sites are fine with it. Others aren't. Either way, now you know how to block it.
For the full list of AI crawlers, see our 2025 AI Crawler List.