How to Block AI Crawlers with Cloudflare
Cloudflare sits in front of your server, so bot blocking happens before requests ever reach you. It's faster and more reliable than server-side blocking, and the free tier handles most use cases.
Method 1: Custom WAF Rules
The most precise approach is creating custom firewall rules.
Step-by-step
- Log into Cloudflare Dashboard
- Select your domain
- Go to Security → WAF → Custom Rules
- Click Create rule
Rule for AI crawlers
Rule name: Block AI Training Crawlers
Expression:
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ChatGPT-User") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Claude-Web") or
(http.user_agent contains "Anthropic") or
(http.user_agent contains "Google-Extended") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "PerplexityBot")
Action: Block
That's it. Click Deploy and requests from these bots will be blocked at Cloudflare's edge.
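If you manage infrastructure as code, the same rule can also be created through Cloudflare's Rulesets API instead of the dashboard. The sketch below builds the rule payload from a bot list; the endpoint and payload shape follow the Rulesets API for the `http_request_firewall_custom` phase, but verify against the current API docs, and note that `ZONE_ID` and `API_TOKEN` are placeholders you must supply yourself:

```javascript
// Sketch: build the custom WAF rule above programmatically.
// Payload shape follows Cloudflare's Rulesets API; verify against
// current docs before relying on it.

const AI_CRAWLERS = [
  "GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "Anthropic",
  "Google-Extended", "CCBot", "Bytespider", "PerplexityBot"
];

// Generate the same expression shown above from the bot list.
function buildRule() {
  const expression = AI_CRAWLERS
    .map(bot => `(http.user_agent contains "${bot}")`)
    .join(" or ");
  return {
    description: "Block AI Training Crawlers",
    action: "block",
    expression
  };
}

// Example request (placeholder credentials, not run here):
// await fetch(
//   `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}` +
//   `/rulesets/phases/http_request_firewall_custom/entrypoint`,
//   {
//     method: "PUT",
//     headers: {
//       "Authorization": `Bearer ${API_TOKEN}`,
//       "Content-Type": "application/json"
//     },
//     body: JSON.stringify({ rules: [buildRule()] })
//   }
// );
```

Keeping the bot list in one array means the expression stays consistent every time you redeploy the rule.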
Using the expression builder
If you prefer the visual builder:
- Field: User Agent
- Operator: contains
- Value: GPTBot
- Click "Or" to add more conditions
- Repeat for each bot
Method 2: Bot Fight Mode
Cloudflare has a built-in Bot Fight Mode that blocks known bad bots automatically.
Free plan: Security → Bots → Bot Fight Mode → Toggle On
Pro and higher: You get more granular controls under Super Bot Fight Mode.
Bot Fight Mode catches a lot of scrapers, but it's not specifically tuned for AI crawlers. Some legitimate AI bots might slip through because they're not flagged as "bad." Combine it with custom rules for best results.
Method 3: Managed Rulesets (Pro+)
On Pro plans and above, you can use managed rulesets:
- Security → WAF → Managed Rules
- Look for bot-related rulesets
- Enable and configure them
These are maintained by Cloudflare and update automatically. But for targeting specific AI crawlers, custom rules are still better.
Free tier limitations
The free tier is surprisingly capable:
| Feature | Free | Pro | Business |
|---------|------|-----|----------|
| Custom WAF rules | 5 | 20 | 100 |
| Bot Fight Mode | Basic | Super | Super |
| Rate limiting | Limited | Full | Full |
Five custom rules is enough to block the major AI crawlers with one well-crafted expression like the one above.
Blocking strategy
I recommend this setup:
- Custom WAF rule for the specific AI crawlers you want to block
- Bot Fight Mode enabled for general bad bot protection
- Challenge suspicious traffic instead of blocking (see below)
Challenge vs Block
Instead of outright blocking, you can challenge suspicious requests:
Action: Managed Challenge
This presents a CAPTCHA-like challenge. Real users pass through; bots don't. Useful if you're worried about false positives.
For known AI crawlers like GPTBot, straight blocking is fine. For less certain cases, challenge might be safer.
Rate limiting
Maybe you want to slow AI bots rather than block them entirely:
- Security → WAF → Rate Limiting Rules
- Create a rule matching AI User-Agents
- Set a low threshold (like 10 requests per minute)
- Action: Block or Challenge when exceeded
(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot")
Rate: 10 requests per minute
Action: Block
This lets them crawl slowly while preventing server hammering.
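To see what a 10-requests-per-minute threshold means mechanically, here is a small sliding-window sketch of the counting logic. This is purely illustrative: Cloudflare's implementation is internal, and a real Worker-based limiter would need shared state (Durable Objects or KV) since each isolate's memory is independent.

```javascript
// Illustrative sliding-window rate limiter: allow at most `limit`
// requests per `windowMs` milliseconds, per client key.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.hits = new Map(); // key -> array of request timestamps
  }

  // Returns true if the request is allowed, false if over the limit.
  allow(key, now = Date.now()) {
    // Keep only timestamps still inside the window.
    const recent = (this.hits.get(key) || [])
      .filter(t => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit; rejected request is not counted
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

With a limit of 10 and a 60-second window, the eleventh request inside a minute is rejected, which mirrors the threshold configured above.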
Logging and monitoring
Check what's being blocked:
- Security → Events
- Filter by Action: Block
- Look at User-Agents
This shows you which bots are hitting your firewall rules. Useful for:
- Verifying your rules work
- Spotting new bots to block
- Finding false positives
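Firewall events can also be pulled programmatically (Cloudflare exposes them through its GraphQL Analytics API; the query itself is beyond this post). Once you have events as JSON, tallying blocked User-Agents is a few lines. Note the `{ action, userAgent }` shape below is a simplified assumption, not Cloudflare's exact schema:

```javascript
// Count blocked requests per User-Agent from a list of firewall events.
// The { action, userAgent } fields are a simplified stand-in for
// whatever your export or GraphQL query actually returns.
function tallyBlockedAgents(events) {
  const counts = {};
  for (const e of events) {
    if (e.action !== "block") continue; // only count blocked requests
    counts[e.userAgent] = (counts[e.userAgent] || 0) + 1;
  }
  return counts;
}
```

Sorting that tally by count is a quick way to spot a new crawler worth adding to your rule.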
robots.txt still matters
Even with Cloudflare blocking, maintain your robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: ClaudeBot
Disallow: /
Use our robots.txt generator to create these rules automatically.
Why? Documentation of intent. If a bot somehow bypasses Cloudflare (unlikely but possible), the robots.txt is your fallback. It also establishes your policy for legal purposes.
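Generating these stanzas from the same bot list you use in the WAF rule keeps the two in sync. A minimal sketch:

```javascript
// Build robots.txt disallow stanzas from a list of crawler User-Agents.
function buildRobotsTxt(bots) {
  return bots
    .map(bot => `User-agent: ${bot}\nDisallow: /`)
    .join("\n\n") + "\n";
}
```

Calling `buildRobotsTxt(["GPTBot", "ClaudeBot"])` produces the two stanzas shown above.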
Edge caching considerations
If you use Cloudflare's caching (most people do), blocked requests never hit cache anyway—they're stopped at the WAF layer. No worries there.
One thing to confirm is that your WAF rules fire before cached responses are served. They do: Cloudflare's request flow is WAF → Cache → Origin, so custom rules are evaluated first and blocked bots never receive cached copies. You should be fine.
Workers (advanced)
For complex logic, Cloudflare Workers let you write custom JavaScript:
export default {
  async fetch(request) {
    // Missing User-Agent headers fall back to an empty string.
    const userAgent = request.headers.get("User-Agent") || "";

    // Substrings that identify common AI crawlers.
    const aiCrawlers = [
      "GPTBot",
      "ClaudeBot",
      "Anthropic",
      "Google-Extended",
      "CCBot",
      "Bytespider"
    ];

    // Block the request if any crawler substring appears in the UA.
    if (aiCrawlers.some(bot => userAgent.includes(bot))) {
      return new Response("Blocked", { status: 403 });
    }

    // Otherwise pass the request through to the origin.
    return fetch(request);
  }
};
This is overkill for most people. Custom WAF rules are simpler and don't consume Worker request limits.
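One wrinkle worth noting: if you block a bot everywhere, it can never read your robots.txt either. Whether that matters is debatable, but if you want crawlers to at least see your stated policy, a Worker can exempt that one path. A hedged variant of the same block logic (illustrative only; in a real Worker you'd `export default worker`):

```javascript
// Variant: block AI crawlers everywhere except /robots.txt, so they
// can still read your stated policy. Illustrative sketch only.
const AI_CRAWLERS = [
  "GPTBot", "ClaudeBot", "Anthropic",
  "Google-Extended", "CCBot", "Bytespider"
];

// Pure helper: should this request be blocked?
function shouldBlock(pathname, userAgent) {
  if (pathname === "/robots.txt") return false; // policy stays readable
  return AI_CRAWLERS.some(bot => userAgent.includes(bot));
}

const worker = {
  async fetch(request) {
    const { pathname } = new URL(request.url);
    const userAgent = request.headers.get("User-Agent") || "";
    if (shouldBlock(pathname, userAgent)) {
      return new Response("Blocked", { status: 403 });
    }
    return fetch(request); // pass through to the origin
  }
};
// export default worker;  // in a real Worker module
```

If you take this route, remember to exempt /robots.txt from the WAF rule too, since the WAF runs before the Worker.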
Testing
Check if Cloudflare is active
curl -I https://yoursite.com/ | grep -i cloudflare
You should see Cloudflare headers.
Test bot blocking
curl -A "GPTBot/1.0" -I https://yoursite.com/
Should return:
HTTP/2 403
server: cloudflare
...
Check events
After testing, go to Security → Events. You should see your test request blocked.
Common issues
"Rule not blocking anything"
- Check the rule is deployed (green checkmark)
- Verify the expression syntax
- Check Security → Events for errors
- Make sure you're on the right domain
"Blocking too much"
- Review Security → Events for false positives
- Use "Challenge" instead of "Block"
- Make expressions more specific
"Cloudflare not in front of site"
Check that your DNS is proxied (orange cloud icon). If it's gray (DNS only), traffic bypasses Cloudflare.
My setup
I use:
- One custom WAF rule blocking major AI crawlers (the expression from above)
- Bot Fight Mode enabled
- robots.txt for documentation
This catches everything without complexity. The free tier handles it fine.
Generate your blocking rules in seconds with our free tools.
See also: