How to Block Bytespider (ByteDance's Aggressive Crawler)
Bytespider is the web crawler run by ByteDance, the company behind TikTok. And unlike most AI crawlers, it has a reputation for not playing nice.
Bytespider has been widely reported to ignore robots.txt. You NEED server-level blocking for this one.
The Bytespider problem
Most AI crawlers respect robots.txt. You add a Disallow rule, they stop coming. Bytespider? Not so much.
I've seen multiple reports—and experienced it myself—of Bytespider continuing to crawl sites well after being blocked in robots.txt. ByteDance claims they respect robots.txt, but server logs tell a different story.
Whether this is intentional or a bug in their system, the effect is the same: robots.txt alone won't stop them.
What Bytespider crawls for
Bytespider collects data for:
- TikTok's recommendation algorithm
- ByteDance's AI products
- Douyin (TikTok's Chinese version)
- Various machine learning initiatives
The User-Agent is usually embedded in a longer Mozilla/5.0 string, but it contains:
Bytespider; spider-feedback@bytedance.com
They're pretty aggressive about crawl frequency too. I've seen Bytespider hit sites thousands of times in a single day.
The solution: Server-level blocking
Since robots.txt isn't reliable, you need to block at the server level.
Apache (.htaccess)
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Bytespider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bytedance [NC]
RewriteRule .* - [F,L]
</IfModule>
I block both "Bytespider" and "Bytedance" in case they use alternate User-Agent strings.
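If the rules don't seem to take effect, make sure mod_rewrite is actually loaded first. A minimal check, assuming a Debian/Ubuntu-style Apache install (commands differ on other distros):

# Enable mod_rewrite and restart Apache (Debian/Ubuntu layout assumed)
sudo a2enmod rewrite
sudo systemctl restart apache2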
Use our .htaccess generator to create these rules automatically.
Nginx
if ($http_user_agent ~* "Bytespider|Bytedance") {
return 403;
}
Add this inside your server block.
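A variation some admins prefer: nginx's non-standard status 444 closes the connection without sending any response at all, so you don't spend bandwidth on the bot:

if ($http_user_agent ~* "Bytespider|Bytedance") {
    return 444;  # nginx-specific: drop the connection, send nothing
}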
Cloudflare
This is honestly the easiest option if you're using Cloudflare:
- Go to Security → WAF → Custom Rules
- Create a rule with this expression:
(http.user_agent contains "Bytespider") or (http.user_agent contains "Bytedance")
- Set the action to Block
Cloudflare's Bot Fight Mode also helps catch Bytespider even when it tries to disguise itself.
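If you manage Cloudflare through its API instead of the dashboard, the same rule can be deployed via the rulesets endpoint. A sketch, assuming $ZONE_ID and $CF_API_TOKEN are set in your shell; note that this PUT replaces all existing rules in the custom WAF phase, so merge into your existing rules array if you already have some:

curl -X PUT "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/phases/http_request_firewall_custom/entrypoint" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{
    "rules": [{
      "action": "block",
      "expression": "(http.user_agent contains \"Bytespider\") or (http.user_agent contains \"Bytedance\")",
      "description": "Block ByteDance crawlers"
    }]
  }'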
Still add robots.txt
Even though Bytespider ignores robots.txt, add the rule anyway:
User-agent: Bytespider
Disallow: /
Why? Documentation. If you ever need to take legal action or report abusive behavior, having a clear robots.txt establishes your intent to block them. It shows you did everything the "right" way and they ignored it.
Use our robots.txt generator to create these rules automatically.
IP-based blocking
Some people go nuclear and block ByteDance's entire IP space. This is aggressive and might cause collateral damage, but if you're really fed up, it's an option (there's a minimal deny sketch after the list below).
The catch: ByteDance uses various IP ranges and rotates them frequently, so a hand-maintained blocklist goes stale fast. A more maintainable approach is to use services like:
- Cloudflare Bot Fight Mode
- AWS WAF with managed rule sets
- Your host's anti-bot features
These maintain updated blocklists so you don't have to.
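If you're determined to hand-maintain ranges anyway, nginx's deny directive is the usual tool. A sketch only; the CIDR below is an RFC 5737 documentation placeholder, not a verified ByteDance range, so substitute ranges from a source you trust:

# Placeholder CIDR (documentation range), NOT a real ByteDance range
deny 203.0.113.0/24;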
Checking your logs
See if Bytespider is hitting your site:
grep -i "bytespider\|bytedance" /var/log/apache2/access.log | tail -50
Look at the response codes:
- 200 = They're getting through
- 403 = Your block is working
If you're seeing 200s after implementing blocks, double-check your .htaccess syntax and that mod_rewrite is enabled.
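For a quick summary instead of scanning raw lines, tally the status codes the bot is receiving. This assumes Apache's default combined log format, where the status code is the ninth field:

# Count how many 200s vs 403s Bytespider/Bytedance requests got
grep -i "bytespider\|bytedance" /var/log/apache2/access.log | awk '{print $9}' | sort | uniq -c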
The bigger picture
Bytespider's behavior is frustrating but not surprising. TikTok/ByteDance has a history of aggressive data collection. They're optimizing for their business, not web etiquette.
This is why I recommend a layered approach for all AI crawlers:
- robots.txt (catches well-behaved bots)
- Server-level blocking (catches most others)
- Cloudflare or similar WAF (catches the rest)
Bytespider is just the most obvious example of why you need all three layers.
Alternative: Rate limiting
If outright blocking is too aggressive, you can rate limit instead. In Nginx:
# In the http context: the key is empty (no limiting) unless the UA matches
map $http_user_agent $bytedance_ua {
    default "";
    "~*(bytespider|bytedance)" "bytedance";
}
limit_req_zone $bytedance_ua zone=bytedance:10m rate=1r/m;
# In your server or location block (limit_req isn't valid inside an if block):
limit_req zone=bytedance burst=5 nodelay;
This caps Bytespider at roughly one request per minute, with a burst allowance of five; requests beyond that get a 503. They'll get your content eventually, but they won't hammer your server.
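To confirm the limit is live, fire a quick burst of requests with the bot's UA; after the burst allowance you should start seeing 503s (nginx's default status for rate-limited requests):

# Send 10 rapid requests as Bytespider and print just the status codes
for i in $(seq 1 10); do curl -s -o /dev/null -w "%{http_code}\n" -A "Bytespider" https://yoursite.com/; done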
Verification
curl -A "Bytespider" -I https://yoursite.com/
Should return 403 Forbidden.
If it returns 200, check:
- Is your .htaccess being processed? (AllowOverride setting)
- Is mod_rewrite enabled?
- Are there conflicting rules?
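It's also worth testing with a realistic full User-Agent, since the real crawler embeds the Bytespider token inside a longer Mozilla string and your rules should match on the substring (the string below is an approximation, not a verbatim copy):

curl -I -A "Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)" https://yoursite.com/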
My recommendation
For Bytespider specifically:
- Add robots.txt block (for documentation)
- Add .htaccess or nginx block (for enforcement)
- Consider Cloudflare if you're seeing persistent crawling
Don't rely on robots.txt alone. Bytespider has demonstrated they don't reliably respect it.
Generate your blocking rules in seconds with our free tools.