How to Block ClaudeBot (Anthropic's Web Crawler)
ClaudeBot is Anthropic's web crawler. If you're reading this through their AI assistant Claude, well, irony aside: here's how to make sure your content doesn't end up in Claude's next training run.
Anthropic's crawler ecosystem
Anthropic runs multiple User-Agents:
| User-Agent | Purpose |
|------------|---------|
| ClaudeBot | Primary training data crawler |
| Claude-Web | Real-time browsing for Claude users |
| Anthropic-ai | General Anthropic crawling |
ClaudeBot is the main one for training data collection. Claude-Web is similar to OpenAI's ChatGPT-User—it's used when Claude needs to fetch a page in real-time for a user query.
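Not sure which of these actually visits you? A quick log check will show it (a rough sketch, assuming an Apache-style access log at the usual path; adjust for your server):

```bash
# Count occurrences of each Anthropic agent token in the access log.
grep -ioE "claudebot|claude-web|anthropic-ai" /var/log/apache2/access.log \
  | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
```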
The quick block
robots.txt:
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Anthropic-ai
Disallow: /
Done. Anthropic is pretty good about respecting robots.txt.
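Worth a quick sanity check that the deployed file actually contains the rules (yoursite.com is a placeholder):

```bash
# Fetch the live robots.txt and print each ClaudeBot group plus the line after it.
curl -s https://yoursite.com/robots.txt | grep -iA 1 "claudebot"
```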
Use our robots.txt generator to create these rules automatically.
robots.txt explained
Here's a more complete robots.txt that blocks Claude alongside other AI crawlers while preserving search engine access. Crawlers follow the most specific User-agent group that matches them, so these named blocks take precedence over the catch-all Allow at the end:
# Anthropic / Claude
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Anthropic-ai
Disallow: /
# OpenAI (while we're at it)
User-agent: GPTBot
Disallow: /
# Google AI training (not search!)
User-agent: Google-Extended
Disallow: /
# Keep search engines happy
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /
Partial blocking
Want Claude to access your public docs but not premium content?
User-agent: ClaudeBot
Disallow: /members/
Disallow: /premium/
Allow: /docs/
Allow: /help/
Why might you want this? If you have an API or developer tools, having Claude be able to answer questions about your public documentation could actually be useful for your users.
Server-level blocking
For an extra layer of protection, block at the server level too.
Apache (.htaccess)
<IfModule mod_rewrite.c>
RewriteEngine On
# Match any Anthropic user-agent, case-insensitively ([NC]); [OR] chains the conditions.
# "Anthropic" also covers the anthropic-ai agent.
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anthropic [NC]
# Return 403 Forbidden ([F]) and stop processing further rules ([L]).
RewriteRule .* - [F,L]
</IfModule>
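If mod_rewrite isn't available, mod_setenvif plus the 2.4-style Require directives can do the same job. A sketch, assuming Apache 2.4+ (the env var name anthropic_bot is arbitrary):

```apache
# Tag requests from Anthropic's agents, then refuse tagged requests with a 403.
BrowserMatchNoCase "ClaudeBot|Claude-Web|Anthropic" anthropic_bot
<RequireAll>
    Require all granted
    Require not env anthropic_bot
</RequireAll>
```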
Use our .htaccess generator to create these rules automatically.
Nginx
# Case-insensitive (~*) regex match against the User-Agent header
if ($http_user_agent ~* "Claude|Anthropic") {
    return 403;
}
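nginx's docs discourage `if` where it can be avoided; a `map` in the http block reduces the `if` to a trivial variable test, which is the safe usage. Same matching behavior, just restructured:

```nginx
# In the http block: flag Anthropic user-agents.
map $http_user_agent $block_ai {
    default 0;
    "~*(claude|anthropic)" 1;
}

# In your server block: refuse flagged requests.
if ($block_ai) {
    return 403;
}
```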
Cloudflare
- Security → WAF → Custom Rules
- Expression: (http.user_agent contains "Claude") or (http.user_agent contains "Anthropic")
- Action: Block
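One caveat: Cloudflare's contains operator is case-sensitive. The agents above capitalize their names consistently today, but for insurance against casing changes you can wrap the field in lower():

```
(lower(http.user_agent) contains "claude") or (lower(http.user_agent) contains "anthropic")
```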
Checking if it works
curl test
curl -A "ClaudeBot/1.0" -I https://yoursite.com/
Should return 403 Forbidden if your server-level block is active. Note that curl can only test the server-level rules; a robots.txt block relies on the crawler choosing to obey it, so it won't show up in this test.
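To exercise all three agents in one go, a small loop helps. The UA strings below are minimal stand-ins rather than the crawlers' real (longer) strings, but your rules match on substrings anyway:

```bash
# Expect a 403 from each agent if the server-level block is working.
for ua in "ClaudeBot/1.0" "Claude-Web/1.0" "anthropic-ai"; do
  printf '%-16s ' "$ua"
  curl -s -o /dev/null -w "%{http_code}\n" -A "$ua" https://yoursite.com/
done
```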
Log check
grep -i "claude\|anthropic" /var/log/apache2/access.log | tail -20
Look for 403 responses.
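To tally exactly which user-agent strings are hitting you and how often, assuming the combined log format (where the UA is the sixth quote-delimited field):

```bash
# Count hits per full user-agent string for Claude/Anthropic traffic.
grep -i "claude\|anthropic" /var/log/apache2/access.log \
  | awk -F'"' '{print $6}' | sort | uniq -c | sort -rn
```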
Anthropic vs OpenAI
In my experience, Anthropic's crawlers are less aggressive than OpenAI's. I see GPTBot more frequently in my logs than ClaudeBot. But that could vary by site.
Both companies respect robots.txt, which is more than I can say for some others (looking at you, Bytespider). The robots.txt block should be sufficient for most cases, but the server-level block adds peace of mind.
Should you block Claude-Web?
Same trade-off as ChatGPT-User. Claude-Web is used for real-time browsing—when a user asks Claude "what does this article say" and pastes a link.
- Block it if you don't want any AI interaction with your content
- Keep it if you want Claude to be able to answer questions about your public pages
The real-time browsing traffic supposedly isn't used for training, but "supposedly" is doing a lot of work there. Your call.
The business reality
Look, Anthropic is a commercial company. They're training models to sell API access and subscriptions. If you're creating original content, you're providing free training data that generates revenue for them.
Whether that bothers you is a personal/business decision. Plenty of sites are fine with it. Others aren't. Either way, now you know how to block it.
For the full list of AI crawlers, see our 2025 AI Crawler List.