statistics · ai-crawlers · research

AI Crawler Traffic Statistics (2025)

December 7, 2024 (Updated: Dec 7, 2024) · 5 min read

Everyone's talking about AI crawlers, but how much traffic are they actually generating? I dug into available data and talked to site operators to put together this picture.

How much AI crawler traffic are sites seeing?

This varies wildly by site type and size, but here are some patterns:

High-traffic content sites (100k+ monthly visitors):

  • GPTBot: 500-5,000 requests/day
  • ClaudeBot: 100-1,000 requests/day
  • CCBot: 200-2,000 requests/day
  • Bytespider: 1,000-50,000 requests/day (when active)

Medium sites (10k-100k monthly visitors):

  • GPTBot: 50-500 requests/day
  • ClaudeBot: 10-100 requests/day
  • Others: Variable

Small sites (under 10k monthly visitors):

  • Often see AI crawlers weekly rather than daily
  • But when they hit, they hit hard (hundreds of requests in a burst)

Bytespider is the outlier

In every dataset I've seen, Bytespider is the most aggressive crawler by volume. ByteDance's crawler regularly accounts for 50%+ of all AI bot traffic on sites that don't block it.

One site operator I spoke with reported:

  • 47,000 Bytespider requests in a single day
  • After blocking: 0 (they used server-level blocking)

The difference in server load was noticeable.

What percentage of sites are blocking?

Based on available research from late 2024:

| Source | Finding |
|--------|---------|
| Originality.AI study | 35.7% of the top 1,000 websites block GPTBot |
| Reuters Institute | 48% of news publishers block at least one AI crawler |
| Dark Visitors tracking | Blocking rates increasing month-over-month |

The trend is clearly upward. More sites are blocking as awareness grows.

Who blocks and who doesn't

Industries with highest blocking rates:

  1. News and media publishing (~50%)
  2. Stock photography and creative assets
  3. Academic and scientific publishers
  4. E-commerce with proprietary content

Industries with lowest blocking rates:

  1. Open-source projects
  2. Government and public information sites
  3. Marketing/promotional sites
  4. Developer documentation (though this is changing)

The crawl frequency question

How often do AI crawlers come back? From log analysis:

  • GPTBot: Every 1-7 days on active sites
  • ClaudeBot: Every one to two weeks
  • CCBot: Monthly snapshots (Common Crawl has a regular schedule)
  • Bytespider: Continuous when active, can hit multiple times per hour

The frequency depends on how "interesting" your content is to their algorithms. News sites get hit constantly; static brochure sites might see crawlers monthly.

Bandwidth impact

A rough estimate of bandwidth per AI crawler visit:

  • Each page crawl: 50-200 KB (HTML + some assets)
  • 1,000 crawls/day: 50-200 MB
  • 10,000 crawls/day: 500 MB - 2 GB

For sites paying for bandwidth or on performance-sensitive shared hosting, this matters. Multiply by multiple AI crawlers and it adds up.
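As a sanity check, the arithmetic above is easy to script. A quick sketch using this post's rough per-page estimates (the 100 KB default is an assumed midpoint, not a measured value):

```python
def daily_bandwidth_mb(crawls_per_day: int, kb_per_page: int = 100) -> float:
    """Estimate daily crawler bandwidth in MB from a request count
    and an assumed average page weight in KB."""
    return crawls_per_day * kb_per_page / 1024

# 1,000 crawls/day at ~100 KB each: roughly 98 MB/day
print(f"{daily_bandwidth_mb(1_000):.0f} MB/day")
# 10,000 crawls/day at the high end (200 KB/page): roughly 1.9 GB/day
print(f"{daily_bandwidth_mb(10_000, 200) / 1024:.1f} GB/day")
```

Run it against your own access-log counts and average response size to get a number that's actually yours rather than directional.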

The robots.txt adoption gap

Here's an interesting stat: many sites that have blocked GPTBot haven't blocked other AI crawlers.

From sampling various websites:

  • Block GPTBot: 35%
  • Block ClaudeBot: 25%
  • Block Google-Extended: 20%
  • Block CCBot: 15%
  • Block Bytespider: 10%

GPTBot gets the most attention because OpenAI is the most visible AI company. But if you're blocking for philosophical/business reasons, leaving other crawlers unblocked is inconsistent.
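For reference, closing that gap via robots.txt means listing each crawler explicitly. The entries below use the user-agent tokens as the vendors publicly document them, but verify against each vendor's current documentation before relying on them:

```txt
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /
```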

Server-level blocking is rare

Most blocking happens via robots.txt. Server-level blocking (.htaccess, nginx rules, WAF) is much less common:

  • robots.txt blocking: Very common
  • .htaccess/nginx blocking: Uncommon (~5-10% of blocking sites)
  • WAF rules: Even rarer

This matters because some crawlers (notably Bytespider) don't reliably respect robots.txt. Sites relying only on robots.txt may still be getting crawled.
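Since robots.txt is only advisory, server-level rules are what actually stop a non-compliant crawler. A minimal nginx sketch (the `map` block belongs in the `http` context; the user-agent list is illustrative and should track what shows up in your own logs):

```nginx
# Flag requests whose User-Agent matches a known AI crawler (case-insensitive)
map $http_user_agent $is_ai_crawler {
    default        0;
    ~*GPTBot       1;
    ~*ClaudeBot    1;
    ~*CCBot        1;
    ~*Bytespider   1;
}

server {
    # ... existing server configuration ...
    if ($is_ai_crawler) {
        return 403;
    }
}
```

The same idea translates to `.htaccess` with `RewriteCond %{HTTP_USER_AGENT}` rules, or to a WAF user-agent rule.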

Year-over-year trends

Comparing 2023 to 2024:

| Metric | 2023 | 2024 | Change |
|--------|------|------|--------|
| Sites blocking any AI crawler | ~15% | ~35% | +133% |
| GPTBot crawl volume | Lower | Higher | Increased |
| New AI crawlers identified | ~8 | ~15 | +87% |
| Average crawl aggressiveness | Moderate | High | Increased |

Both blocking and crawling are increasing. It's an arms race of sorts.

What this means for you

If you're not blocking:

  • AI crawlers are probably hitting your site regularly
  • The volume depends on your content freshness and popularity
  • This traffic is using your bandwidth and server resources

If you're blocking via robots.txt only:

  • You're stopping most well-behaved crawlers (GPTBot, ClaudeBot, CCBot)
  • Bytespider may still be getting through
  • Consider adding server-level blocking for complete coverage

If you're blocking at server level:

  • You're blocking effectively
  • Monitor logs to catch new crawlers with different User-Agents
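That log monitoring can start as simply as tallying bot-like user agents. A rough sketch, assuming the common combined log format where the user agent is the last quoted field (the marker list is illustrative and will need updating as new crawlers appear):

```python
import re
from collections import Counter

# Substrings that commonly identify AI crawlers; extend as new bots appear
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "CCBot", "Bytespider", "Google-Extended")

def count_ai_crawlers(log_lines):
    """Tally AI-crawler hits by marker from combined-format access log lines."""
    counts = Counter()
    for line in log_lines:
        # The user agent is the last double-quoted field in combined log format
        quoted = re.findall(r'"([^"]*)"', line)
        if not quoted:
            continue
        user_agent = quoted[-1]
        for marker in AI_BOT_MARKERS:
            if marker in user_agent:
                counts[marker] += 1
                break
    return counts

sample = [
    '1.2.3.4 - - [07/Dec/2024:10:00:00 +0000] "GET / HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '5.6.7.8 - - [07/Dec/2024:10:00:01 +0000] "GET /a HTTP/1.1" 200 999 "-" "Mozilla/5.0 (compatible; Bytespider)"',
]
print(count_ai_crawlers(sample))
```

Pipe in `access.log` line by line and anything with high counts but no marker match is a candidate new crawler worth investigating.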

Future projections

Where is this going? My educated guesses:

  1. More AI crawlers — New AI companies will launch their own crawlers
  2. More blocking — The trend toward blocking will continue
  3. More obfuscation — Some crawlers may start disguising themselves
  4. Legal clarity — Court decisions will start shaping norms
  5. New standards — Something like ai.txt may become official

The equilibrium hasn't been reached yet. We're in the middle of the disruption.

Methodology notes

Where did these numbers come from?

  • Third-party research (Originality.AI, Reuters Institute, etc.)
  • Server log analysis shared by site operators
  • My own monitoring of test sites
  • Publicly available bot monitoring services

Take all numbers as directional rather than precise. The web is messy and comprehensive data is hard to come by.

Taking action

If these numbers have convinced you to block AI crawlers:

Skip the manual work

Generate your blocking rules in seconds with our free tools.

The choice is yours. The data just helps you make it informed.
