How to Block AI Crawlers on Vercel (Next.js)
Vercel doesn't give you traditional server config access, but you've got options. Here's how to block AI crawlers on Next.js projects hosted on Vercel.
Method 1: robots.txt (simplest)
Create public/robots.txt:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: Anthropic-ai
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: Googlebot
Allow: /
User-agent: Bingbot
Allow: /
User-agent: *
Allow: /
This works for well-behaved bots that check robots.txt. But some bots (especially Bytespider) ignore it.
Use our robots.txt generator to create these rules automatically.
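If you'd rather keep the bot list in code, the rules above can also be generated from an array. This is a minimal sketch, not part of Next.js or Vercel: `buildRobotsTxt` and `BLOCKED_BOTS` are hypothetical names, and the shortened bot list is only an example.

```typescript
// Hypothetical helper: the name and the shortened bot list are illustrative.
const BLOCKED_BOTS: string[] = ['GPTBot', 'ChatGPT-User', 'ClaudeBot', 'CCBot', 'Bytespider'];

function buildRobotsTxt(blocked: string[]): string {
  // One "User-agent / Disallow" group per blocked crawler.
  const groups = blocked.map((bot) => `User-agent: ${bot}\nDisallow: /`);
  // Every other crawler keeps full access.
  groups.push('User-agent: *\nAllow: /');
  // Blank lines between groups keep the file readable.
  return groups.join('\n\n') + '\n';
}

console.log(buildRobotsTxt(BLOCKED_BOTS));
```

You could write the returned string to public/robots.txt from a build script, so the file and any middleware bot list come from the same source.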
Method 2: Next.js Middleware (recommended)
Middleware lets you intercept requests before they reach your pages. This is actual enforcement, not just a suggestion.
Create middleware.ts in your project root (or src/ if you use that structure):
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
const AI_BOTS = [
  'GPTBot',
  'ChatGPT-User',
  'ChatGPT',
  'OAI-SearchBot',
  'ClaudeBot',
  'Claude-Web',
  'Anthropic',
  'Google-Extended',
  'CCBot',
  'Bytespider',
  'PerplexityBot',
  'Omgilibot',
  'FacebookBot',
  'Meta-ExternalAgent',
];
export function middleware(request: NextRequest) {
  const userAgent = request.headers.get('user-agent') || '';
  // Check if User-Agent matches any AI bot
  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );
  if (isAIBot) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}
// Apply to all routes
export const config = {
  matcher: '/((?!_next/static|_next/image|favicon.ico).*)',
};
This blocks AI crawlers from all pages while allowing Next.js static assets to load normally.
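The User-Agent check is easier to unit test if you pull it out of the middleware into a plain function. Here's a minimal sketch of that idea. The `isAIBot` helper and the shortened list are assumptions for illustration, and the matching is the same case-insensitive substring check used above.

```typescript
// Illustrative subset of the bot list from the middleware above.
const AI_BOTS: string[] = ['GPTBot', 'ClaudeBot', 'CCBot', 'Bytespider', 'PerplexityBot'];

// Lowercase the list once at module load instead of on every request.
const AI_BOTS_LOWER: string[] = AI_BOTS.map((bot) => bot.toLowerCase());

// Hypothetical helper: true when the User-Agent contains any listed bot name.
function isAIBot(userAgent: string | null): boolean {
  if (!userAgent) return false; // a missing header is treated as "not a known bot"
  const ua = userAgent.toLowerCase();
  return AI_BOTS_LOWER.some((bot) => ua.includes(bot));
}

console.log(isAIBot('Mozilla/5.0 (compatible; GPTBot/1.0)')); // true
console.log(isAIBot('Mozilla/5.0 (Windows NT 10.0) Chrome/120.0')); // false
```

Inside the middleware you would call `isAIBot(request.headers.get('user-agent'))` and return the 403 when it comes back true.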
Middleware with logging
Want to see what's being blocked?
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
const AI_BOTS = [
  'GPTBot',
  'ClaudeBot',
  'Google-Extended',
  'CCBot',
  'Bytespider',
];
export function middleware(request: NextRequest) {
  const userAgent = request.headers.get('user-agent') || '';
  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );
  if (isAIBot) {
    // Log blocked requests (visible in Vercel function logs)
    console.log(`Blocked AI bot: ${userAgent} from ${request.url}`);
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}
export const config = {
  matcher: '/((?!_next/static|_next/image|favicon.ico).*)',
};
Check Vercel Dashboard → Logs to see blocked requests.
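Free-form log lines get hard to filter once traffic picks up. One option is to log a single JSON object per blocked request, including which bot name matched. This is a sketch under assumptions: `findMatchedBot`, `formatBlockLog`, and the `ai_bot_blocked` event name are made up for illustration and aren't part of Next.js or Vercel.

```typescript
const AI_BOTS: string[] = ['GPTBot', 'ClaudeBot', 'Google-Extended', 'CCBot', 'Bytespider'];

// Hypothetical helper: returns the first listed bot name found in the User-Agent.
function findMatchedBot(userAgent: string): string | null {
  const ua = userAgent.toLowerCase();
  return AI_BOTS.find((bot) => ua.includes(bot.toLowerCase())) ?? null;
}

// Hypothetical helper: builds one JSON log line, or null when nothing matched.
function formatBlockLog(userAgent: string, url: string): string | null {
  const bot = findMatchedBot(userAgent);
  if (bot === null) return null;
  return JSON.stringify({
    event: 'ai_bot_blocked', // assumed event name, pick whatever suits your logs
    bot,
    path: new URL(url).pathname, // log the path only, without the query string
    userAgent,
  });
}

const line = formatBlockLog('Mozilla/5.0 (compatible; ClaudeBot/1.0)', 'https://example.com/premium/post?x=1');
if (line !== null) console.log(line);
```

In the middleware you would pass `request.url` as the second argument and `console.log` the result before returning the 403.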
Method 3: vercel.json headers
You can't block requests with vercel.json headers, but you can attach an X-Robots-Tag response header that asks crawlers not to use your content for AI training. It's advisory only, so the use case is limited, but here's an example:
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        {
          "key": "X-Robots-Tag",
          "value": "noai, noimageai"
        }
      ]
    }
  ]
}
The noai directive is experimental and not widely respected yet. Stick with middleware for actual blocking.
Rate limiting (Edge Runtime)
Middleware runs on Vercel's Edge Runtime by default, so you can implement basic rate limiting there and slow AI bots down instead of blocking them outright:
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
// Simple in-memory rate limiting (resets on cold start)
const requestCounts = new Map<string, { count: number; timestamp: number }>();
const WINDOW_MS = 60000; // 1 minute
const MAX_REQUESTS = 10;
export function middleware(request: NextRequest) {
  const userAgent = request.headers.get('user-agent') || '';
  const isAIBot = ['GPTBot', 'ClaudeBot', 'Bytespider'].some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );
  if (isAIBot) {
    // Take the client IP from x-forwarded-for (request.ip was removed in Next.js 15)
    const ip = request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() || 'unknown';
    const now = Date.now();
    const record = requestCounts.get(ip);
    if (record && now - record.timestamp < WINDOW_MS) {
      record.count++;
      if (record.count > MAX_REQUESTS) {
        return new NextResponse('Rate limited', { status: 429 });
      }
    } else {
      // First request from this IP, or the previous window expired: start a new window
      requestCounts.set(ip, { count: 1, timestamp: now });
    }
  }
  return NextResponse.next();
}
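Note: This simple implementation keeps its counters in memory, so they reset whenever the Edge function cold starts and aren't shared between instances or regions. For production rate limiting, use a shared store such as Vercel KV or an external service.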
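If you stay with in-memory counting for now, it helps to put the window logic in a small class you can test without a running server. The sketch below is one way to do that. `FixedWindowLimiter` is a hypothetical name, the clock is passed in so tests can control time, and `prune` is an addition that keeps the map from growing without bound. It has the same cold-start and per-instance limits as the code above.

```typescript
interface WindowRecord {
  count: number;
  windowStart: number;
}

// Hypothetical fixed-window limiter; still in-memory, so it resets on cold start.
class FixedWindowLimiter {
  private readonly records = new Map<string, WindowRecord>();
  private readonly windowMs: number;
  private readonly maxRequests: number;

  constructor(windowMs: number, maxRequests: number) {
    this.windowMs = windowMs;
    this.maxRequests = maxRequests;
  }

  // Returns true if the request is allowed, false if it should get a 429.
  allow(key: string, now: number = Date.now()): boolean {
    const record = this.records.get(key);
    if (!record || now - record.windowStart >= this.windowMs) {
      // No record yet, or the previous window expired: start a new window.
      this.records.set(key, { count: 1, windowStart: now });
      return true;
    }
    record.count += 1;
    return record.count <= this.maxRequests;
  }

  // Drop expired windows so the map doesn't grow without bound.
  prune(now: number = Date.now()): void {
    this.records.forEach((record, key) => {
      if (now - record.windowStart >= this.windowMs) this.records.delete(key);
    });
  }

  get size(): number {
    return this.records.size;
  }
}

// Same limits as above: 10 requests per minute per key.
const limiter = new FixedWindowLimiter(60000, 10);
console.log(limiter.allow('203.0.113.7'));
```

In the middleware you would call `limiter.allow(ip)` for bot requests and return the 429 response when it returns false.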
Partial blocking
Block AI from specific routes only:
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';
const AI_BOTS = ['GPTBot', 'ClaudeBot', 'Bytespider'];
const PROTECTED_PATHS = ['/premium', '/members', '/api'];
export function middleware(request: NextRequest) {
  const userAgent = request.headers.get('user-agent') || '';
  const pathname = request.nextUrl.pathname;
  const isAIBot = AI_BOTS.some(bot =>
    userAgent.toLowerCase().includes(bot.toLowerCase())
  );
  const isProtectedPath = PROTECTED_PATHS.some(path =>
    pathname.startsWith(path)
  );
  if (isAIBot && isProtectedPath) {
    return new NextResponse('Forbidden', { status: 403 });
  }
  return NextResponse.next();
}
export const config = {
  matcher: '/((?!_next/static|_next/image|favicon.ico).*)',
};
This blocks AI bots from /premium, /members, and /api while allowing them to access public content.
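One thing to watch with a plain `startsWith` check: `/api` also matches unrelated routes such as `/apiary` or `/api-docs`. If that matters for your site, you can match on whole path segments instead. A minimal sketch, with `isProtectedPath` as a hypothetical helper name:

```typescript
const PROTECTED_PATHS: string[] = ['/premium', '/members', '/api'];

// Hypothetical helper: matches the prefix itself or anything nested under it,
// but not sibling routes that merely share the same leading characters.
function isProtectedPath(pathname: string): boolean {
  return PROTECTED_PATHS.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + '/')
  );
}

console.log(isProtectedPath('/api/users')); // true
console.log(isProtectedPath('/apiary')); // false
```

Whether you want the strict or the loose behavior depends on your routes. The original `startsWith` check is fine if no public path shares a prefix with a protected one.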
Testing locally
Run your Next.js dev server and test with curl:
curl -A "GPTBot/1.0" -I http://localhost:3000/
Should return 403 Forbidden if your middleware is working.
Testing on Vercel
After deploying:
curl -A "GPTBot/1.0" -I https://yoursite.vercel.app/
Check Vercel Dashboard → Logs for the blocked request.
Common issues
Middleware not running
- Check the file is named middleware.ts (not middleware.tsx)
- Check it's in the right location (root or src/, depending on project structure)
- Check the matcher config isn't too restrictive
Static files being blocked
Make sure your matcher excludes static paths:
export const config = {
  matcher: '/((?!_next/static|_next/image|favicon.ico|.*\\.(?:svg|png|jpg|jpeg|gif|webp)$).*)',
};
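To sanity-check a matcher before deploying, you can approximate it with a plain regular expression. This is only a rough sketch: Next.js compiles matchers with its own path-matching rules, so anchoring the string with `^` and `$` is an assumption that happens to work for this pattern, and `middlewareWouldRun` is a hypothetical helper.

```typescript
// Same pattern as the matcher above.
const MATCHER = '/((?!_next/static|_next/image|favicon.ico|.*\\.(?:svg|png|jpg|jpeg|gif|webp)$).*)';

// Approximation: treat the matcher as a regex anchored to the whole pathname.
const matcherRegex = new RegExp(`^${MATCHER}$`);

// Hypothetical helper: true when the middleware would run for this path.
function middlewareWouldRun(pathname: string): boolean {
  return matcherRegex.test(pathname);
}

const samples: string[] = ['/', '/about', '/logo.png', '/_next/static/chunk.js', '/favicon.ico'];
for (const path of samples) {
  console.log(path, middlewareWouldRun(path));
}
```

Pages like / and /about should come back true, while image files, /_next/static paths, and the favicon should come back false.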
Cold start logging gaps
Edge functions have cold starts. Your in-memory logging or rate limiting might reset. For persistent logging, use Vercel's built-in function logs or integrate with an external service.
App Router vs Pages Router
The middleware above works with both. Just make sure:
- App Router: middleware.ts in root or src/
- Pages Router: Same location
The config export works identically.
Using with Cloudflare
If you front Vercel with Cloudflare, you can do bot blocking there instead (see our Cloudflare guide). But the middleware approach works fine on its own, so there's no need for both unless you want defense in depth.
My recommendation
- Add robots.txt in public/ for documentation and compliant bots
- Add middleware for actual enforcement
- Check Vercel logs to verify it's working