nginx · server-config · htaccess

How to Block AI Crawlers with Nginx

December 2, 2024 (Updated: Dec 5, 2024) · 5 min read

Nginx gives you precise control over bot blocking. Unlike Apache's .htaccess, Nginx config changes require a reload, but the performance is better and the syntax is cleaner.

Basic blocking

Add this inside your server block in your Nginx configuration:

# Block AI crawlers
if ($http_user_agent ~* "GPTBot|ChatGPT|ClaudeBot|Claude-Web|Anthropic|Google-Extended|CCBot|Bytespider|PerplexityBot") {
    return 403;
}

That's the minimal version. A single regex catches the major AI crawlers and returns 403 for all of them.
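
If you're unsure where that snippet belongs, here's a minimal sketch of a complete server block with the check in place (the domain and paths are placeholders, not anything from a real deployment):

server {
    listen 80;
    server_name example.com;
    root /var/www/example.com;

    # Reject AI crawlers before any other processing
    if ($http_user_agent ~* "GPTBot|ChatGPT|ClaudeBot|Claude-Web|Anthropic|Google-Extended|CCBot|Bytespider|PerplexityBot") {
        return 403;
    }

    location / {
        try_files $uri $uri/ =404;
    }
}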

More readable version

If you prefer clarity over brevity:

# Block AI training crawlers
set $block_bot 0;

# OpenAI
if ($http_user_agent ~* "GPTBot") { set $block_bot 1; }
if ($http_user_agent ~* "ChatGPT") { set $block_bot 1; }
if ($http_user_agent ~* "OAI-SearchBot") { set $block_bot 1; }

# Anthropic
if ($http_user_agent ~* "ClaudeBot") { set $block_bot 1; }
if ($http_user_agent ~* "Claude-Web") { set $block_bot 1; }
if ($http_user_agent ~* "Anthropic") { set $block_bot 1; }

# Google AI (not search!)
if ($http_user_agent ~* "Google-Extended") { set $block_bot 1; }

# Others
if ($http_user_agent ~* "CCBot") { set $block_bot 1; }
if ($http_user_agent ~* "Bytespider") { set $block_bot 1; }
if ($http_user_agent ~* "PerplexityBot") { set $block_bot 1; }
if ($http_user_agent ~* "Omgilibot") { set $block_bot 1; }

if ($block_bot = 1) {
    return 403;
}

This makes it easy to add or remove specific bots without wrestling with regex.

Using map for efficiency

For high-traffic sites, using map is more efficient than multiple if statements:

# In http block (usually /etc/nginx/nginx.conf)
map $http_user_agent $block_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ChatGPT 1;
    ~*ClaudeBot 1;
    ~*Claude-Web 1;
    ~*Anthropic 1;
    ~*Google-Extended 1;
    ~*CCBot 1;
    ~*Bytespider 1;
    ~*PerplexityBot 1;
    ~*Omgilibot 1;
}

# In server block
server {
    ...
    if ($block_ai_bot = 1) {
        return 403;
    }
    ...
}

The map variable is evaluated lazily, at most once per request, so it scales better than a chain of if statements and keeps the whole bot list in one place.
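
One tidy way to organize this, assuming your distribution's nginx.conf already includes /etc/nginx/conf.d/*.conf inside the http block (the common default), is to give the map its own file:

# /etc/nginx/conf.d/block-ai-bots.conf
map $http_user_agent $block_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ClaudeBot 1;
    ~*CCBot 1;
    ~*Bytespider 1;
    ~*PerplexityBot 1;
}

Each server block that should enforce the policy then only needs the three-line if.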

Rate limiting instead of blocking

Maybe you don't want to block entirely, just slow them down:

# limit_req can't be used inside an if block, so match the bots with a
# map instead: the key is empty for normal visitors (empty keys are never
# limited) and the client IP for AI bots. Goes in the http block.
map $http_user_agent $ai_bot_limit_key {
    default "";
    ~*(GPTBot|ClaudeBot|Bytespider) $binary_remote_addr;
}

limit_req_zone $ai_bot_limit_key zone=ai_bots:10m rate=1r/s;

# In your server block
location / {
    limit_req zone=ai_bots burst=5 nodelay;
    # ... rest of your config
}

Requests with an empty key aren't counted against the zone, so regular users pass through at full speed while AI bots are held to roughly one request per second.
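
To sanity-check the limit, hammer the site with a bot user agent and watch the status codes. Once the burst of 5 is used up, Nginx answers with 503 by default (set limit_req_status 429; if you'd rather send a Too Many Requests response):

# Ten rapid requests as GPTBot: the first few should return 200,
# the rest 503 once the burst allowance is exhausted
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" -A "GPTBot/1.0" https://yoursite.com/
done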

Custom error page

Instead of a bare 403, serve a friendly message:

error_page 403 /blocked.html;

location = /blocked.html {
    root /var/www/html;
    internal;
}

Then create /var/www/html/blocked.html:

<!DOCTYPE html>
<html>
<head><title>Access Denied</title></head>
<body>
<h1>Access Denied</h1>
<p>Automated crawling is not permitted on this site.</p>
</body>
</html>

Where to put the config

Depending on your setup:

| Setup | Location |
|-------|----------|
| Single site | /etc/nginx/sites-available/yoursite.conf |
| Multiple sites | Each site's config file |
| Map directive | /etc/nginx/nginx.conf (http block) |

After editing, always test and reload:

sudo nginx -t
sudo systemctl reload nginx

Don't forget robots.txt

Nginx handles enforcement, but still add a robots.txt for documentation:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

# etc.

Put this in your webroot. In Nginx, that's the directory set by the root directive in your server block.
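
If your root points somewhere unexpected, or the site is proxied to an application server, you can also serve robots.txt from an explicit location so the request never falls through to the backend (the path below is just an example):

location = /robots.txt {
    root /var/www/yoursite;
}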

Want to skip the copy-paste?

Use our robots.txt generator to create these rules automatically.


Testing

curl -A "GPTBot/1.0" -I https://yoursite.com/

Should return:

HTTP/1.1 403 Forbidden
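
A request with an ordinary browser user agent should still come back with a 200, which confirms you haven't blocked real visitors:

curl -A "Mozilla/5.0" -I https://yoursite.com/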

If you're not getting the 403, check the access log first (blocked requests are logged there with a 403 status), then the error log for configuration problems:

tail -f /var/log/nginx/access.log /var/log/nginx/error.log

Common issues

"If is evil" warning

You might see warnings about Nginx's if directive being problematic. For simple User-Agent checks returning 403, it's fine. The issues arise when using if for more complex logic inside location blocks.

For safety, keep your if blocks simple:

# Good - simple return
if ($condition) { return 403; }

# Potentially problematic - complex logic
if ($condition) {
    rewrite ...;
    proxy_pass ...;
}

Config not reloading

# Check syntax
sudo nginx -t

# If OK, reload
sudo systemctl reload nginx

# Check status
sudo systemctl status nginx

Case sensitivity

The ~* operator makes the match case-insensitive. Use ~ for case-sensitive matching if needed:

# Case-insensitive (recommended)
if ($http_user_agent ~* "GPTBot") { ... }

# Case-sensitive
if ($http_user_agent ~ "GPTBot") { ... }

Performance considerations

Nginx is already fast, but for very high-traffic sites:

  1. Use map instead of multiple if statements
  2. Put bot blocking early in the request processing
  3. Consider blocking at the firewall level (iptables/nftables) for known malicious IPs

For most sites, the basic if approach is plenty fast. Don't over-optimize until you have a problem.
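
If you do reach for the firewall option above, the idea is simply to drop a scraper's confirmed address ranges before Nginx ever sees the traffic. For example, with iptables (the range shown is a documentation placeholder, not a real bot network):

# Drop a confirmed scraper range at the firewall
sudo iptables -A INPUT -s 192.0.2.0/24 -j DROP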

Combining with fail2ban

For really aggressive bots, you can feed Nginx logs into fail2ban:

# /etc/fail2ban/filter.d/nginx-ai-bots.conf
# Assumes the default "combined" access log format, where the 403 status
# comes before the quoted user agent at the end of the line
[Definition]
failregex = ^<HOST> .*" 403 \d+ ".*" ".*(GPTBot|ClaudeBot|Bytespider).*"$
ignoreregex =

This bans IPs that repeatedly try to crawl after being blocked. Overkill for most sites, but an option if you're dealing with persistent scrapers.
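
The filter does nothing on its own; it needs a jail pointing at your access log. A minimal sketch, with thresholds that are just reasonable starting points:

# /etc/fail2ban/jail.d/nginx-ai-bots.local
[nginx-ai-bots]
enabled  = true
port     = http,https
filter   = nginx-ai-bots
logpath  = /var/log/nginx/access.log
maxretry = 10
findtime = 60
bantime  = 3600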

My Nginx config

Here's what I use on my sites:

# In http block
map $http_user_agent $is_ai_bot {
    default 0;
    ~*GPTBot 1;
    ~*ChatGPT 1;
    ~*ClaudeBot 1;
    ~*Anthropic 1;
    ~*Google-Extended 1;
    ~*CCBot 1;
    ~*Bytespider 1;
    ~*PerplexityBot 1;
}

# In server block
server {
    ...

    # Block AI bots early
    if ($is_ai_bot) {
        return 403;
    }

    # Rest of config
    location / {
        ...
    }
}

Simple, efficient, and easy to maintain.


Ready to block AI crawlers?

Use our free generators to create your blocking rules in seconds.