apache · htaccess · server-config

How to Block AI Crawlers with Apache (.htaccess)

December 3, 2024 (Updated: Dec 5, 2024) · 5 min read

Apache's .htaccess gives you server-level bot blocking without needing root access. Perfect for shared hosting where you can't touch the main server config.

The standard approach

Add this to your .htaccess file in your website's root directory:

# Block AI Training Crawlers
<IfModule mod_rewrite.c>
RewriteEngine On

# OpenAI
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChatGPT-User [NC,OR]
RewriteCond %{HTTP_USER_AGENT} OAI-SearchBot [NC,OR]

# Anthropic
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Claude-Web [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anthropic [NC,OR]

# Google AI (NOT Googlebot!)
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC,OR]

# Common Crawl
RewriteCond %{HTTP_USER_AGENT} CCBot [NC,OR]

# ByteDance
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]

# Others
RewriteCond %{HTTP_USER_AGENT} PerplexityBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Omgilibot [NC]

RewriteRule .* - [F,L]
</IfModule>

The [NC] flag makes matching case-insensitive, [OR] chains conditions together (every condition needs it except the last), and the final [F,L] returns 403 Forbidden and stops processing further rules.
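If you'd rather keep the list compact, mod_rewrite also accepts regex alternation, and the [G] flag returns 410 Gone instead of 403, which signals the content is permanently off-limits. A minimal sketch (bot list shortened for brevity):

# Variant: one alternation pattern, 410 Gone response
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot|Bytespider|PerplexityBot) [NC]
RewriteRule .* - [G]
</IfModule>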

Want to skip the copy-paste?

Use our .htaccess generator to create these rules automatically.


Alternative: SetEnvIf method

Some prefer using SetEnvIf, which doesn't require mod_rewrite:

# Block AI crawlers
SetEnvIfNoCase User-Agent "GPTBot" bad_bot
SetEnvIfNoCase User-Agent "ChatGPT" bad_bot
SetEnvIfNoCase User-Agent "ClaudeBot" bad_bot
SetEnvIfNoCase User-Agent "Claude-Web" bad_bot
SetEnvIfNoCase User-Agent "Anthropic" bad_bot
SetEnvIfNoCase User-Agent "Google-Extended" bad_bot
SetEnvIfNoCase User-Agent "CCBot" bad_bot
SetEnvIfNoCase User-Agent "Bytespider" bad_bot
SetEnvIfNoCase User-Agent "PerplexityBot" bad_bot

<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>

Both methods work. The RewriteRule approach is more common and usually works on shared hosting. The SetEnvIf method is cleaner, but its Require syntax needs Apache 2.4+.
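If your host is still on Apache 2.2, the older Allow/Deny syntax does the same job. A sketch, reusing the bad_bot variable set above (the <IfModule !mod_authz_core.c> guard keeps it from conflicting with 2.4):

# Apache 2.2 fallback for the RequireAll block above
<IfModule !mod_authz_core.c>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</IfModule>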

Where to put .htaccess

Your .htaccess file goes in your website's document root. That's typically:

| Platform | Location |
|----------|----------|
| Standard Apache | /var/www/html/.htaccess |
| WordPress | Same folder as wp-config.php |
| cPanel hosting | public_html/.htaccess |

If .htaccess doesn't exist, create it. If it does, add your rules at the top (before any existing rules).

Before and after

If you already have an .htaccess (like WordPress creates), add the bot blocking before the existing content:

# Block AI Crawlers
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC]
RewriteRule .* - [F,L]
</IfModule>

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
# ... rest of WordPress rules
</IfModule>
# END WordPress

Order matters: bot blocking should happen before other rewrite rules.

Testing

Check syntax before uploading

If you have Apache locally, you can check the server's overall config with:

apachectl configtest

Be aware that configtest doesn't parse .htaccess files; they're only read at request time, so the curl test below is the real check.

Test with curl

curl -A "GPTBot/1.0" -I https://yoursite.com/

Expected response:

HTTP/1.1 403 Forbidden
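A normal browser user agent should still return 200. To spot-check several bots at once, a short shell loop works (yoursite.com is a placeholder):

# Print each bot name and the status code Apache returns for it
for ua in GPTBot ClaudeBot CCBot Bytespider PerplexityBot; do
  printf '%s: ' "$ua"
  curl -s -o /dev/null -w '%{http_code}\n' -A "$ua" https://yoursite.com/
done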

If you see 500 errors

A 500 Internal Server Error usually means a syntax problem in your .htaccess:

  1. Check for typos
  2. Make sure every RewriteCond has an [OR] flag except the last one in the chain
  3. Verify mod_rewrite is enabled on your server (a quick check follows this list)
  4. Check if your host restricts .htaccess usage
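For step 3, if you have shell access you can list the loaded modules directly (the command is apache2ctl on Debian/Ubuntu, apachectl elsewhere):

apachectl -M | grep rewrite

If nothing prints, mod_rewrite isn't loaded, and you'll need the SetEnvIf method or your host's help.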

Common issues

"mod_rewrite not enabled"

Some shared hosts disable mod_rewrite. Contact your host or try the SetEnvIf method instead.

"AllowOverride not set"

Your host might have AllowOverride None in the Apache config, which disables .htaccess entirely. You'll need to contact your host or upgrade to a plan with more control.

"Rules not applying"

Check:

  1. File is named exactly .htaccess (note the leading dot)
  2. File is in the document root
  3. File permissions allow Apache to read it (typically 644; the commands below check this)
  4. Clear any caching (browser, CDN, WordPress cache plugins)
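The first three are quick to verify from a shell, if you have access (/var/www/html is a placeholder for your document root):

# Confirm the name, location, and permissions in one listing
ls -la /var/www/html/.htaccess
# Make the file readable by Apache if needed
chmod 644 /var/www/html/.htaccess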

"Blocked wrong bot"

If you accidentally blocked Googlebot:

# DON'T DO THIS - blocks search engines!
RewriteCond %{HTTP_USER_AGENT} Google [NC]

Be specific:

# DO THIS - matches only the AI opt-out token
RewriteCond %{HTTP_USER_AGENT} Google-Extended [NC]

One caveat: per Google's documentation, Google-Extended is a robots.txt control token rather than a crawler with its own user-agent string, so the rewrite rule is belt-and-braces at best; declaring it in robots.txt (see below) is what Google actually honors.

Partial blocking

Maybe you want to block AI from certain directories only:

# Block AI from /premium/ only
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/premium/ [NC]
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC]
RewriteRule .* - [F,L]
</IfModule>

Or protect everything except certain paths:

# Block AI from everywhere EXCEPT /public/
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_URI} !^/public/ [NC]
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC]
RewriteRule .* - [F,L]
</IfModule>

Rate limiting

Apache can also rate-limit instead of blocking:

<IfModule mod_ratelimit.c>
# Flag AI crawlers instead of blocking them outright
SetEnvIfNoCase User-Agent "GPTBot" rate_limit
SetEnvIfNoCase User-Agent "ClaudeBot" rate_limit

<If "env('rate_limit') == '1'">
    # Throttle flagged requests; the rate-limit value is in KiB/s
    SetOutputFilter RATE_LIMIT
    SetEnv rate-limit 1
</If>
</IfModule>

This requires mod_ratelimit and the <If> directive (Apache 2.4+), which aren't always available on shared hosting.

Custom error page

By default, blocked bots see Apache's bare 403 page. To serve your own:

ErrorDocument 403 /blocked.html

Create a blocked.html file with a friendly message explaining that automated crawling isn't permitted.
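One gotcha: because the blanket RewriteRule matches every URL, the internal redirect to /blocked.html can itself be blocked, leaving bots with Apache's default error page. A sketch of the fix is to exclude the error page from the match:

<IfModule mod_rewrite.c>
RewriteEngine On
# Let the error page itself through, or the 403 handler is also denied
RewriteCond %{REQUEST_URI} !^/blocked\.html$
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ClaudeBot [NC]
RewriteRule .* - [F,L]
</IfModule>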

Don't forget robots.txt

.htaccess handles enforcement, but keep robots.txt for documentation:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

This documents your intent to block, which may matter if you ever need to dispute unauthorized scraping.


Performance note

.htaccess files are read on every request, which adds some overhead. On a high-traffic site with a VPS or dedicated server, consider moving these rules to your main Apache config (httpd.conf or apache2.conf) instead. Same rules, better performance.

For shared hosting, .htaccess is your only option, and the performance impact is minimal for most sites.
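A minimal sketch of the main-config version, assuming your document root is /var/www/html (adjust the path; the block goes inside your existing <VirtualHost>):

<Directory /var/www/html>
    # Only disable overrides if nothing else (e.g. WordPress) relies on .htaccess;
    # skipping the per-request .htaccess lookup is where the speedup comes from
    AllowOverride None

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot) [NC]
    RewriteRule .* - [F,L]
</Directory>

Reload Apache afterward (systemctl reload apache2 on Debian/Ubuntu).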




Ready to block AI crawlers?

Use our free generators to create your blocking rules in seconds.