
How to Block AI Crawlers with Cloudflare

December 3, 2024 (Updated: Dec 5, 2024) · 5 min read

Cloudflare sits in front of your server, so bot blocking happens before requests ever reach your origin. That makes it faster and more reliable than server-side blocking, and the free tier handles most use cases.

Method 1: Custom WAF Rules

The most precise approach is creating custom firewall rules.

Step-by-step

  1. Log into Cloudflare Dashboard
  2. Select your domain
  3. Go to Security → WAF → Custom Rules
  4. Click Create rule

Rule for AI crawlers

Rule name: Block AI Training Crawlers

Expression:

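(http.user_agent contains "GPTBot") or
(http.user_agent contains "ChatGPT-User") or
(http.user_agent contains "ClaudeBot") or
(http.user_agent contains "Claude-Web") or
(lower(http.user_agent) contains "anthropic") or
(http.user_agent contains "CCBot") or
(http.user_agent contains "Bytespider") or
(http.user_agent contains "PerplexityBot")

Action: Block

That's it. Click Deploy and requests from these bots will be blocked at Cloudflare's edge.

Two details matter here. The contains operator is case-sensitive, so the Anthropic condition lowercases the header first; that way it also matches lowercase tokens such as anthropic-ai. Google-Extended is left out because it is a robots.txt control token rather than a User-Agent string: Google crawls with its regular user agents, so a WAF condition on that name would never match.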
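If your block list changes often, generating the expression can be less error-prone than editing it by hand. Here is a minimal sketch in JavaScript, assuming a hypothetical helper named buildExpression (it is not part of any Cloudflare tooling). It lowercases both sides of the comparison so casing differences in the User-Agent don't matter:

```javascript
// Hypothetical helper: builds a Cloudflare rule expression from bot names.
// lower() and contains come from Cloudflare's Rules language; the helper
// itself is only an illustration.
function buildExpression(bots) {
  if (bots.length === 0) {
    throw new Error("Provide at least one bot name");
  }
  return bots
    // Lowercase each name and strip quotes/backslashes that would break the string literal.
    .map((bot) => bot.toLowerCase().replace(/["\\]/g, ""))
    .map((bot) => `(lower(http.user_agent) contains "${bot}")`)
    .join(" or\n");
}

const expression = buildExpression(["GPTBot", "ClaudeBot", "CCBot"]);
console.log(expression);
```

Paste the printed text into the rule's expression editor. Keeping the bot names in one array also makes it easy to reuse the same list elsewhere.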

Using the expression builder

If you prefer the visual builder:

  1. Field: User Agent
  2. Operator: contains
  3. Value: GPTBot
  4. Click "Or" to add more conditions
  5. Repeat for each bot

Method 2: Bot Fight Mode

Cloudflare has a built-in Bot Fight Mode that blocks known bad bots automatically.

Free plan: Security → Bots → Bot Fight Mode → Toggle On

Pro and higher: You get more granular controls under Super Bot Fight Mode.

Bot Fight Mode catches a lot of scrapers, but it's not specifically tuned for AI crawlers. Some legitimate AI bots might slip through because they're not flagged as "bad." Combine it with custom rules for best results.

Method 3: Managed Rulesets (Pro+)

On Pro plans and above, you can use managed rulesets:

  1. Security → WAF → Managed Rules
  2. Look for bot-related rulesets
  3. Enable and configure them

These are maintained by Cloudflare and update automatically. But for targeting specific AI crawlers, custom rules are still better.

Free tier limitations

The free tier is surprisingly capable:

| Feature | Free | Pro | Business |
|---------|------|-----|----------|
| Custom WAF rules | 5 | 20 | 100 |
| Bot Fight Mode | Basic | Super | Super |
| Rate limiting | Limited | Full | Full |

Five custom rules is enough to block the major AI crawlers with one well-crafted expression like the one above.

Blocking strategy

I recommend this setup:

  1. Custom WAF rule for the specific AI crawlers you want to block
  2. Bot Fight Mode enabled for general bad bot protection
  3. Challenge suspicious traffic instead of blocking (see below)

Challenge vs Block

Instead of outright blocking, you can challenge suspicious requests:

Action: Managed Challenge

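This presents a challenge that Cloudflare picks based on the request; often it is a brief automated check rather than a visible CAPTCHA. Real users pass through; most bots don't. Useful if you're worried about false positives.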

For known AI crawlers like GPTBot, straight blocking is fine. For less certain cases, challenge might be safer.
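If you manage rules through Cloudflare's API rather than the dashboard, the same decision can be written down in code. This is a rough sketch only: pickAction, buildRule, and the KNOWN_AI_CRAWLERS list are hypothetical names chosen for illustration. The action values block and managed_challenge and the rule fields shown are the ones the Rulesets API uses, but check the API reference before sending anything:

```javascript
// Illustrative list of crawlers we are confident about (an assumption, adjust to taste).
const KNOWN_AI_CRAWLERS = ["gptbot", "claudebot", "ccbot", "bytespider"];

// Block crawlers we recognize; challenge anything we are less sure about.
function pickAction(userAgentToken) {
  return KNOWN_AI_CRAWLERS.includes(userAgentToken.toLowerCase())
    ? "block"
    : "managed_challenge";
}

// Builds a rule object in the shape used by custom rules in the Rulesets API.
function buildRule(userAgentToken) {
  const token = userAgentToken.toLowerCase();
  return {
    description: `AI crawler rule for ${userAgentToken}`,
    expression: `(lower(http.user_agent) contains "${token}")`,
    action: pickAction(userAgentToken),
    enabled: true,
  };
}

console.log(buildRule("GPTBot"));
console.log(buildRule("SomeNewBot"));
```

The point of the split is that a wrong guess costs less with a challenge: a real visitor can still get through, whereas a block turns them away.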

Rate limiting

Maybe you want to slow AI bots rather than block them entirely:

  1. Security → WAF → Rate Limiting Rules
  2. Create a rule matching AI User-Agents
  3. Set a low threshold (like 10 requests per minute)
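  4. Action: Block or Challenge when exceeded

For example (the match fields and counting periods available depend on your plan, so you may need to adjust this):

(http.user_agent contains "GPTBot") or
(http.user_agent contains "ClaudeBot")

Rate: 10 requests per minute
Action: Block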

This lets them crawl slowly while preventing server hammering.
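To make "10 requests per minute" concrete, here is a small fixed-window counter in JavaScript. It is an illustration of the general idea only, not how Cloudflare implements rate limiting, and the FixedWindowLimiter class is a made-up name:

```javascript
// Illustration only: a fixed-window counter, not Cloudflare's implementation.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;       // max requests allowed per window
    this.windowMs = windowMs; // window length in milliseconds
    this.windows = new Map(); // key -> { start, count }
  }

  // Returns true if the request is allowed, false if it exceeds the limit.
  allow(key, now = Date.now()) {
    const current = this.windows.get(key);
    if (!current || now - current.start >= this.windowMs) {
      // First request for this key, or the previous window has expired.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= this.limit;
  }
}

const limiter = new FixedWindowLimiter(10, 60_000);
const results = [];
for (let i = 0; i < 12; i++) {
  // 12 requests from the same bot within the same minute.
  results.push(limiter.allow("GPTBot", 1_000 + i));
}
console.log(results.filter(Boolean).length); // 10
```

The first ten requests pass and the last two are rejected; once the window expires, the bot gets a fresh allowance.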

Logging and monitoring

Check what's being blocked:

  1. Security → Events
  2. Filter by Action: Block
  3. Look at User-Agents

This shows you which bots are hitting your firewall rules. Useful for:

  • Verifying your rules work
  • Spotting new bots to block
  • Finding false positives
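If you export events for offline review, a short script can tally which User-Agents are being blocked most. The sketch below assumes you have already converted the export into an array of objects with action and userAgent fields; those field names are an assumption for illustration and may not match Cloudflare's export format:

```javascript
// Hypothetical record shape: { action: "block", userAgent: "..." }.
function countBlockedUserAgents(events) {
  const counts = new Map();
  for (const event of events) {
    if (event.action !== "block") continue; // only count blocked requests
    counts.set(event.userAgent, (counts.get(event.userAgent) || 0) + 1);
  }
  // Return [userAgent, count] pairs, highest count first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Made-up sample data, for demonstration only.
const sampleEvents = [
  { action: "block", userAgent: "GPTBot/1.0" },
  { action: "block", userAgent: "CCBot/2.0" },
  { action: "managed_challenge", userAgent: "Mozilla/5.0" },
  { action: "block", userAgent: "GPTBot/1.0" },
];

console.log(countBlockedUserAgents(sampleEvents));
```

An unfamiliar User-Agent near the top of the list is a candidate for a new rule; a browser User-Agent there suggests a false positive.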

robots.txt still matters

Even with Cloudflare blocking, maintain your robots.txt:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

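Why? Documentation of intent. If a bot somehow bypasses Cloudflare (unlikely but possible), robots.txt is your fallback for crawlers that honor it. It also puts your crawling policy on record, which may help if a dispute ever arises. And it is where you opt out of Google-Extended, which is a robots.txt token rather than a crawler with its own User-Agent.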
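To keep robots.txt in sync with your block list, you can generate it from the same array of names. This is a minimal sketch; buildRobotsTxt is a hypothetical helper, and it only emits blanket Disallow rules like the ones above:

```javascript
// Hypothetical helper: emits one "Disallow: /" group per bot name.
function buildRobotsTxt(bots) {
  return bots
    .map((bot) => `User-agent: ${bot}\nDisallow: /`)
    .join("\n\n") + "\n";
}

console.log(buildRobotsTxt(["GPTBot", "ClaudeBot", "Google-Extended"]));
```

Write the output to robots.txt at your site root, merging it with any rules you already have for search engines.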

Edge caching considerations

If you use Cloudflare's caching (most people do), blocked requests never reach the cache; they're stopped at the WAF layer.

You don't need to configure anything for this. Cloudflare evaluates WAF rules before serving cached responses (WAF → Cache → Origin), so a cached page is not handed to a blocked bot.

Workers (advanced)

For complex logic, Cloudflare Workers let you write custom JavaScript:

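export default {
  async fetch(request) {
    // Lowercase the header so matching is case-insensitive.
    const userAgent = (request.headers.get("User-Agent") || "").toLowerCase();

    // Lowercase substrings of the User-Agent strings to block.
    // Google-Extended is omitted: it is a robots.txt token, not a User-Agent.
    const aiCrawlers = [
      "gptbot",
      "claudebot",
      "anthropic",
      "ccbot",
      "bytespider"
    ];

    if (aiCrawlers.some(bot => userAgent.includes(bot))) {
      return new Response("Blocked", { status: 403 });
    }

    // Everything else is passed through unchanged.
    return fetch(request);
  }
};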

This is overkill for most people. Custom WAF rules are simpler and don't consume Worker request limits.

Testing

Check if Cloudflare is active

curl -sI https://yoursite.com/ | grep -i cloudflare

The -s flag hides curl's progress meter, which otherwise clutters piped output. You should see a line such as server: cloudflare.

Test bot blocking

curl -A "GPTBot/1.0" -I https://yoursite.com/

Should return:

HTTP/2 403
server: cloudflare
...
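To check several User-Agents in one go, the curl test can be scripted. The sketch below assumes Node 18 or later for the built-in fetch; checkUserAgents is a hypothetical helper and yoursite.com is a placeholder. The fetch implementation is injectable so the function can be exercised without network access:

```javascript
// Sends a HEAD request per User-Agent and records the HTTP status for each.
// fetchImpl defaults to the global fetch (Node 18+), but can be swapped for a stub.
async function checkUserAgents(url, userAgents, fetchImpl = fetch) {
  const results = {};
  for (const ua of userAgents) {
    const response = await fetchImpl(url, {
      method: "HEAD",
      headers: { "User-Agent": ua },
      redirect: "manual", // report the first response, don't follow redirects
    });
    results[ua] = response.status;
  }
  return results;
}

// Usage (placeholder domain):
// checkUserAgents("https://yoursite.com/", ["GPTBot/1.0", "ClaudeBot/1.0", "Mozilla/5.0"])
//   .then(console.log);
```

With the rule deployed, you would expect 403 for the bot User-Agents and your normal status for the browser one.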

Check events

After testing, go to Security → Events. You should see your test request blocked.

Common issues

"Rule not blocking anything"

  • Check the rule is deployed (green checkmark)
  • Verify the expression syntax, and remember that contains is case-sensitive
  • Check Security → Events for errors
  • Make sure you're on the right domain

"Blocking too much"

  • Review Security → Events for false positives
  • Use "Challenge" instead of "Block"
  • Make expressions more specific

"Cloudflare not in front of site"

Check that your DNS is proxied (orange cloud icon). If it's gray (DNS only), traffic bypasses Cloudflare.

My setup

I use:

  1. One custom WAF rule blocking major AI crawlers (the expression from above)
  2. Bot Fight Mode enabled
  3. robots.txt for documentation

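This catches the major AI crawlers that identify themselves, without much complexity. Bots that spoof their User-Agent won't match the custom rule, which is where Bot Fight Mode helps. The free tier handles it fine.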
