How to Block AI Crawlers on Netlify

December 4, 2024 (Updated: Dec 5, 2024) · 5 min read

Netlify doesn't have a traditional server config, but between robots.txt and Edge Functions, you can block AI crawlers effectively. Here's how.

Method 1: robots.txt (simplest)

Create public/robots.txt (or robots.txt in your repo root, if the root is your publish directory). It needs to end up at /robots.txt on the deployed site:

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: Anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

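This is a polite request, not enforcement. Well-behaved bots honor it; others ignore it. The explicit Allow groups for Googlebot, Bingbot, and * are optional, since anything that isn't disallowed is allowed by default.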
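If the bot list changes often, you could generate the file at build time instead of editing it by hand. Here's a minimal sketch; buildRobotsTxt and BLOCKED_BOTS are names made up for illustration, not part of any Netlify tooling.

```typescript
// Hypothetical helper: builds a robots.txt body from a list of bot tokens.
const BLOCKED_BOTS: string[] = [
  "GPTBot",
  "ChatGPT-User",
  "ClaudeBot",
  "Claude-Web",
  "anthropic-ai",
  "Google-Extended",
  "CCBot",
  "Bytespider",
  "PerplexityBot",
];

function buildRobotsTxt(blocked: string[]): string {
  // One "User-agent / Disallow" group per blocked bot
  const groups = blocked.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  // Everything not listed above stays allowed
  groups.push("User-agent: *\nAllow: /");
  return groups.join("\n\n") + "\n";
}
```

Write the result of buildRobotsTxt(BLOCKED_BOTS) to public/robots.txt in a build step, and the list lives in one place.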

Method 2: Edge Functions (recommended)

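Netlify Edge Functions run at the edge and intercept requests before they reach your site. Unlike robots.txt, this is real enforcement: a matching request gets a 403 whether or not the bot chooses to cooperate. It only catches bots that identify themselves in the User-Agent header, though. A crawler that spoofs a browser User-Agent will pass through.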

Setup

Create netlify/edge-functions/block-ai-bots.ts:

import type { Config, Context } from "@netlify/edge-functions";

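// Matched as case-insensitive substrings of the User-Agent header
const AI_BOTS = [
  'GPTBot',
  'ChatGPT', // also covers ChatGPT-User
  'OAI-SearchBot',
  'ClaudeBot',
  'Claude-Web',
  'Anthropic', // covers anthropic-ai
  // Google-Extended is left out on purpose: it is a robots.txt-only token
  // and never appears in a User-Agent header, so matching it does nothing
  'CCBot',
  'Bytespider',
  'PerplexityBot',
  'Omgilibot',
];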

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get('user-agent') || '';

  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );

  if (isAIBot) {
    return new Response('Forbidden', { status: 403 });
  }

  // Continue to the next handler (your site)
  return context.next();
};

export const config: Config = {
  path: "/*",
};
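If you want to sanity-check the matching logic without deploying, it helps to pull it into a pure function. Here's a minimal sketch with a shortened bot list; matchAIBot and AI_BOT_NEEDLES are hypothetical names, not part of the Netlify API.

```typescript
// Shortened list for illustration; substrings are matched case-insensitively.
const AI_BOTS = ["GPTBot", "ChatGPT", "ClaudeBot", "Anthropic", "CCBot"];

// Lowercase once at module load instead of on every request
const AI_BOT_NEEDLES = AI_BOTS.map((bot) => bot.toLowerCase());

// Returns the first matching entry from AI_BOTS, or null for everyone else.
function matchAIBot(userAgent: string | null): string | null {
  const ua = (userAgent ?? "").toLowerCase();
  const index = AI_BOT_NEEDLES.findIndex((needle) => ua.includes(needle));
  return AI_BOTS[index] ?? null;
}
```

Inside the edge function you would call matchAIBot(request.headers.get('user-agent')) and return the 403 when the result is not null. Returning the matched name rather than a boolean also gives you something useful to log.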

Configuration in netlify.toml

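The config export in the function file already routes every path to the function. If you'd rather declare the route in netlify.toml, you can drop that export and add this instead: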

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"

Excluding static assets

You probably don't want to run the function for every static asset. Exclude those paths with excludedPath:

[[edge_functions]]
  path = "/*"
  function = "block-ai-bots"
  excludedPath = ["/_next/*", "/images/*", "/fonts/*", "/*.ico"]

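Or check in the function itself. The function still runs for every request this way, so excludedPath is the better choice if the goal is fewer invocations: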

export default async (request: Request, context: Context) => {
  const url = new URL(request.url);

  // Skip static assets
  if (url.pathname.match(/\.(js|css|png|jpg|svg|ico|woff|woff2)$/)) {
    return context.next();
  }

  const userAgent = request.headers.get('user-agent') || '';
  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );

  if (isAIBot) {
    return new Response('Forbidden', { status: 403 });
  }

  return context.next();
};
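The regex above is case-sensitive and only knows a handful of extensions. As an alternative, here's a small sketch that checks the extension against a set; isStaticAsset and STATIC_EXTENSIONS are illustrative names, and the extension list is an assumption you should adjust to your site.

```typescript
// Assumed list of asset extensions; extend it to match what your site serves.
const STATIC_EXTENSIONS = new Set([
  "js", "css", "png", "jpg", "jpeg", "gif", "webp", "svg", "ico", "woff", "woff2",
]);

function isStaticAsset(pathname: string): boolean {
  // Only look at the last path segment, so "/v1.2/page" is not treated as a file
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  const dot = lastSegment.lastIndexOf(".");
  // No dot means no extension; a leading dot means a dotfile, not an asset
  if (dot <= 0) return false;
  return STATIC_EXTENSIONS.has(lastSegment.slice(dot + 1).toLowerCase());
}
```

In the function, the early return becomes: if (isStaticAsset(url.pathname)) return context.next();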

Method 3: Netlify redirects (limited)

You can't block by User-Agent with _redirects or netlify.toml redirects, because their conditions don't match on the User-Agent header. Use Edge Functions instead.

However, you can use a rewrite to swap in a different robots.txt file:

# netlify.toml - not useful for User-Agent blocking
[[redirects]]
  from = "/robots.txt"
  to = "/robots-blocking.txt"
  status = 200
  force = true

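This is a limited use case. The rewrite above applies to every visitor; it only becomes conditional if you pair it with a condition that redirects do support, such as country or language.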

Logging blocked requests

import type { Config, Context } from "@netlify/edge-functions";

const AI_BOTS = ['GPTBot', 'ClaudeBot', 'Bytespider'];

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get('user-agent') || '';

  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );

  if (isAIBot) {
    console.log(`Blocked: ${userAgent} from ${request.url}`);
    return new Response('Forbidden', { status: 403 });
  }

  return context.next();
};

export const config: Config = {
  path: "/*",
};

Check the Edge Functions log in the Netlify dashboard (under Logs) to see blocked requests.
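Plain strings are fine for eyeballing, but JSON lines are easier to filter later. Here's a sketch of a structured log entry; buildBlockedLogEntry and the field names are my own choices for illustration, not a Netlify log format.

```typescript
// Hypothetical log shape; pick whatever fields you actually want to search on.
interface BlockedLogEntry {
  event: "ai_bot_blocked";
  bot: string;
  path: string;
  userAgent: string;
  at: string;
}

function buildBlockedLogEntry(
  bot: string,
  requestUrl: string,
  userAgent: string,
  now: Date = new Date(),
): string {
  const entry: BlockedLogEntry = {
    event: "ai_bot_blocked",
    bot,
    // Log the path only, without the query string
    path: new URL(requestUrl).pathname,
    // Truncate so an oversized header can't flood the log
    userAgent: userAgent.slice(0, 200),
    at: now.toISOString(),
  };
  return JSON.stringify(entry);
}
```

In the function, replace the template string with console.log(buildBlockedLogEntry('GPTBot', request.url, userAgent)), passing whichever bot name matched.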

Rate limiting

Netlify Edge Functions can implement rate limiting, though it's a bit more involved:

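import type { Config, Context } from "@netlify/edge-functions";

// Best-effort rate limiting with an in-memory counter.
// Assumption: memory is per edge instance and is lost on restart, so this
// only throttles bursts. Use shared storage for limits that must hold globally.

const AI_BOTS = ['GPTBot', 'ClaudeBot', 'Bytespider'];
const MAX_REQUESTS = 10;
const WINDOW_SECONDS = 60;

// Client IP -> request count and start time of the current window
const hits = new Map<string, { count: number; windowStart: number }>();

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get('user-agent') || '';

  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );

  if (isAIBot) {
    const ip = context.ip;
    const now = Date.now();
    const entry = hits.get(ip);

    if (!entry || now - entry.windowStart >= WINDOW_SECONDS * 1000) {
      // First request from this IP, or the previous window has expired
      hits.set(ip, { count: 1, windowStart: now });
    } else if (++entry.count > MAX_REQUESTS) {
      console.log(`Rate limited AI bot from ${ip}: ${userAgent}`);
      return new Response('Too Many Requests', {
        status: 429,
        headers: { 'Retry-After': String(WINDOW_SECONDS) },
      });
    }
  }

  return context.next();
};

export const config: Config = {
  path: "/*",
};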

For production rate limiting, consider using Netlify Blobs for state or a service like Upstash Redis.
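Whichever backend you pick, it helps to keep the limiter logic separate from the storage. Here's a sketch of a fixed-window limiter behind a small interface. CounterStore, MemoryCounterStore, and isRateLimited are hypothetical names; the real store would wrap your KV or Redis client, and the in-memory class exists only so the logic can be tested locally.

```typescript
// Hypothetical storage interface. A Redis-style backend would implement this
// with an atomic increment plus an expiry on the key.
interface CounterStore {
  // Increments the counter for `key` and resolves to the new value.
  // The backend should expire the key after roughly `ttlSeconds`.
  increment(key: string, ttlSeconds: number): Promise<number>;
}

// In-memory stand-in for local testing only; it never expires keys.
class MemoryCounterStore implements CounterStore {
  private counts = new Map<string, number>();

  async increment(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

const MAX_REQUESTS = 10;
const WINDOW_SECONDS = 60;

async function isRateLimited(
  store: CounterStore,
  clientId: string,
  nowMs: number = Date.now(),
): Promise<boolean> {
  // Fixed window: every request in the same 60-second bucket shares a counter
  const windowIndex = Math.floor(nowMs / (WINDOW_SECONDS * 1000));
  const count = await store.increment(`${clientId}:${windowIndex}`, WINDOW_SECONDS);
  return count > MAX_REQUESTS;
}
```

In the edge function you would call isRateLimited(store, context.ip) for bot requests and return a 429 when it resolves to true. A store without atomic increments will only give approximate counts under concurrent requests, which is usually acceptable for throttling crawlers.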

Partial blocking

Block AI from specific paths only:

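import type { Config, Context } from "@netlify/edge-functions";

const AI_BOTS = ['GPTBot', 'ClaudeBot', 'Bytespider'];
const PROTECTED_PATHS = ['/premium', '/members', '/api'];

export default async (request: Request, context: Context) => {
  const userAgent = request.headers.get('user-agent') || '';
  const url = new URL(request.url);

  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );

  // Match the path itself or anything beneath it, so '/api' covers
  // '/api/users' but not '/apiary'
  const isProtectedPath = PROTECTED_PATHS.some(path =>
    url.pathname === path || url.pathname.startsWith(path + '/')
  );

  if (isAIBot && isProtectedPath) {
    return new Response('Forbidden', { status: 403 });
  }

  return context.next();
};

export const config: Config = {
  path: "/*",
};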

Testing locally

Use Netlify CLI:

netlify dev

Then test:

curl -A "GPTBot/1.0" -I http://localhost:8888/

Testing on Netlify

After deploying:

curl -A "GPTBot/1.0" -I https://yoursite.netlify.app/

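This should return 403 Forbidden. Repeat the request with a regular browser User-Agent to confirm that normal visitors still get through.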
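To check several User-Agents in one go, you could script the same test. Here's a minimal sketch; checkBlocking, FetchLike, and CheckResult are names made up for this example, and the fetch function is passed in so the logic can be tested without a network.

```typescript
// Minimal shape of the fetch call this script needs (an assumption for
// illustration; the global fetch in Node 18+ fits it).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ status: number }>;

interface CheckResult {
  userAgent: string;
  status: number;
  blocked: boolean;
}

async function checkBlocking(
  url: string,
  userAgents: string[],
  fetchFn: FetchLike,
): Promise<CheckResult[]> {
  const results: CheckResult[] = [];
  for (const userAgent of userAgents) {
    // HEAD mirrors `curl -I`: we only care about the status code
    const res = await fetchFn(url, {
      method: "HEAD",
      headers: { "User-Agent": userAgent },
    });
    results.push({ userAgent, status: res.status, blocked: res.status === 403 });
  }
  return results;
}
```

In Node 18 or later, checkBlocking("https://yoursite.netlify.app/", ["GPTBot/1.0", "Mozilla/5.0"], fetch) should report the first entry as blocked and the second as not blocked.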

Framework-specific notes

Next.js on Netlify

You can use Next.js middleware instead of Netlify Edge Functions. Either works, but Next.js middleware might feel more natural if you're already using Next.js features.
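For reference, here's roughly what the middleware version could look like. Treat it as a sketch: it assumes Next.js 13.1 or later (where middleware may return a standard Response), a middleware.ts file at the project root, and a shortened bot list.

```typescript
// middleware.ts (assumed location: project root, next to package.json)

// Lowercased up front; matched as substrings of the User-Agent header
const AI_BOTS = ["gptbot", "claudebot", "bytespider"];

export function middleware(request: Request): Response | undefined {
  const userAgent = (request.headers.get("user-agent") ?? "").toLowerCase();

  if (AI_BOTS.some((bot) => userAgent.includes(bot))) {
    return new Response("Forbidden", { status: 403 });
  }

  // Returning nothing lets the request continue to the page
  return undefined;
}

export const config = {
  // Skip Next.js build assets and the image optimizer
  matcher: ["/((?!_next/static|_next/image|favicon.ico).*)"],
};
```

Don't run both this and the Netlify Edge Function for the same check; pick one so there is a single bot list to maintain.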

Gatsby

Gatsby generates static files. Use Netlify Edge Functions as shown above.

Hugo / Jekyll / other static

Same approach—Edge Functions work regardless of your static site generator.

Common issues

Edge Function not running

  • Check netlify.toml configuration
  • Verify function file is in netlify/edge-functions/
  • Check the Edge Functions log in the Netlify dashboard for errors

Wrong response

  • Check your logic isn't too broad (accidentally blocking real users)
  • Test with specific User-Agent strings

Performance concerns

Edge Functions are fast, but if you're worried:

  • Exclude static asset paths
  • Keep the bot list short
  • Use early return for non-bot requests

Using with Cloudflare

If you front Netlify with Cloudflare (via CNAME), you can do bot blocking in Cloudflare instead. See our Cloudflare guide. But Netlify Edge Functions work fine alone.

My recommendation

  1. Add robots.txt for documentation and compliant bots
  2. Add Edge Function for actual enforcement
  3. Check Netlify function logs to verify