Bots are draining your bandwidth without you noticing

WebProject · September 2025

@ehhthing said:

@WebProject said:

@ehhthing said:

@davide said:
Has anyone written a web server log parser to detect non-human sessions? There are human patterns that crawlers don't follow. This kind of passive analysis would throw no JavaShit at the users' browsers, unlike hCaptcha or Cloudflare.

This is possible but then you can’t exactly block the connections outright because you risk accidentally blocking real traffic from a “weird” person. Eventually you do need to serve a captcha or some other verification method unless you want to lose business.

It is possible to use Cloudflare to block all AI bots and other non-search-engine bots using security rules. On one of our client websites, the Facebook bot alone was consuming around 50 GB of bandwidth per month.

Cloudflare uses JS as part of their bot detection system.

You can block requests with Cloudflare security rules or .htaccess by targeting specific user agent headers.
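Before writing rules, it helps to see which self-declared bots are actually eating the bandwidth (the kind of figure quoted above for the Facebook bot). A minimal Python sketch, assuming your server writes the standard "combined" log format; the token list is illustrative, and bots that spoof a browser user agent will simply land in "other".

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative, lower-cased tokens; self-declared user agents can be faked or omitted.
BOT_TOKENS = ["facebookexternalhit", "meta-externalagent", "gptbot",
              "claudebot", "bytespider", "amazonbot", "ccbot"]

def bytes_by_bot(lines):
    """Sum response bytes per matched bot token; everything else counts as 'other'."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        size = 0 if m["bytes"] == "-" else int(m["bytes"])
        ua = m["ua"].lower()
        label = next((t for t in BOT_TOKENS if t in ua), "other")
        totals[label] += size
    return totals
```

Feed it an open log file (for example `bytes_by_bot(open(path))`) and print `.most_common()` to get a per-bot ranking.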

JustPfff · September 2025

@Saahib said: @JustPfff, what method did you use to filter out bots?

I didn't. I was manually blocking entire /16 subnets until my firewall config bloated; it made my VPS much slower since it was filtering millions of IP addresses. I don't own a top-1-million website, but these bots were hammering my VPS like crazy.
I don't know what's happening, but I think my first VPS provider put no effort into filtering bad bot IPs at all.
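If you do end up blocking by range, merging overlapping and adjacent ranges keeps the rule count down. A minimal Python sketch using the standard `ipaddress` module; `aggregate_blocklist` is a hypothetical helper, it handles IPv4 only, and the `/16` default just mirrors the post above.

```python
import ipaddress

def aggregate_blocklist(addresses, prefix=16):
    """Widen each offending IPv4 address to its /prefix network, then merge
    duplicate, nested and adjacent networks into as few CIDR blocks as possible."""
    nets = {ipaddress.IPv4Network(f"{a}/{prefix}", strict=False) for a in addresses}
    return list(ipaddress.collapse_addresses(nets))
```

Loading the result into an ipset `hash:net` set or an nftables set with the `interval` flag, rather than one firewall rule per entry, is generally what keeps a large blocklist from slowing down the packet path.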

Saahib · September 2025

There was a time when higher traffic meant more money for website owners... but now...

TimboJones · September 2025

Instead of blocking them, how do we get it to view ads and earn passive income?

zed · September 2025

I thought this was a @DediRock post at first tbh

gremeyer · September 2025

@cu_olly said:
They will not, under any circumstance, remove Grok as a customer, no matter what levels of abuse are reported.

I'm sure xAI pays a ton of money for Google Cloud compute, so of course Google Cloud would turn a blind eye.

praburam · September 2025

Interesting thing is, sometimes I get real traffic from ChatGPT users.

topgamer · September 2025

@TimboJones said:
Instead of blocking them, how do we get it to view ads and earn passive income?

I don't think passive income will come from bots; you need to get real traffic.

JustPfff · September 2025

@TimboJones said: how do we get it to view ads and earn passive income?

This traffic won't show up in Statcounter either: they block JS, or they block all tracker/ad servers using some AdBlock list.

david · September 2025

For something simple, I like csswaf.

https://github.com/yzqzss/csswaf

It's still a proof-of-concept (says not for production), but I'm using it for some stuff that's not too important.

If the connection is poor, sometimes it has to try 2-3 times, but it mostly works on the first try.

NeedDeal · September 2025

If bots cause you too much hassle, then it means your webpage has too much junk on it.

jnd · September 2025

@david said:
For something simple, I like csswaf.

https://github.com/yzqzss/csswaf

It's still a proof-of-concept (says not for production), but I'm using it for some stuff that's not too important.

If the connection is poor, sometimes it has to try 2-3 times, but it mostly works on the first try.

I wonder why there aren't more semi-hidden links that normal users won't click but bots will, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.
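The poisoned-link idea can be sketched roughly like this in Python. Everything here is hypothetical: the `/t/` trap prefix, the `Honeypot` class and the in-memory ban set. The trap prefix should also be listed under `Disallow` in robots.txt, so that well-behaved search crawlers never follow it and get banned by mistake.

```python
import secrets

TRAP_PREFIX = "/t/"  # hypothetical trap path; also Disallow it in robots.txt

class Honeypot:
    def __init__(self):
        self.banned = set()  # in-memory only; a real setup would feed a firewall

    def trap_links(self, n=3):
        """HTML for n randomized links that humans never see or tab to."""
        links = []
        for _ in range(n):
            token = secrets.token_urlsafe(8)
            links.append(f'<a href="{TRAP_PREFIX}{token}" style="display:none" '
                         'tabindex="-1" aria-hidden="true" rel="nofollow">.</a>')
        return "\n".join(links)

    def check(self, ip, path):
        """Return True if the request may proceed; ban the IP on a trap hit."""
        if path.startswith(TRAP_PREFIX):
            self.banned.add(ip)
        return ip not in self.banned
```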

jnd · September 2025

@NeedDeal said:
If bots cause you too much hassle, then it means your webpage has too much junk on it.

Yeah, this too. Or just offer text-only content to the bots; no more bandwidth draining.

MeAtExampleDotCom · September 2025

@MikeA said:

@davide said:
Has anyone written a web server logs parser to detect non-human sessions? There are human patterns that crawlers don't follow.

There's no realistic way to do it. AI crawlers like xAI's use residential IPs and actively rotate fake user agents. They're not the only ones, but they're the most prevalent due to their size.

One method that works for the moment is proof-of-work based things like Anubis (which you can self-host). Anubis is the most common of its ilk ATM, but there are several other implementations out there if you prefer not to use that one (https://github.com/sequentialread/pow-bot-deterrent is the first that came out of a quick search). This is at least one of the techniques used by Cloudflare and similar services that offer scraper detection.

These methods work because a scraper won't want to spend a pile of CPU time running a tight JS or WebAssembly loop. If CPU time becomes cheap enough, it will stop working as a deterrent, though even then it might at least reduce the rate of requests sent by each bot.

This does cause trouble for people with JS turned off, so you need to decide whether you care to support them. Anubis does support non-JS challenges now, though I've not looked into how this is done to see whether it would be a significant inconvenience to a human user; presumably other similar systems support non-JS users too. It can also be a problem for low-power devices if the difficulty of the CPU challenge is set too high, so again you need to decide how much you care about that. A further issue for very low-end servers is that you are adding some processing on that side too: not much, but it might be significant if practically everything else you are hosting is static files.
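The general proof-of-work idea fits in a few lines of Python. This is a hashcash-style illustration under my own assumptions (SHA-256, a leading-zero-bits target, a `challenge:nonce` string format, `DIFFICULTY = 16`), not Anubis's actual protocol; in a real deployment `solve` would run in the visitor's browser as JS or WebAssembly.

```python
import hashlib
import secrets

DIFFICULTY = 16  # required leading zero bits; illustrative, tune for your visitors

def make_challenge():
    """Server side: a random, unguessable challenge string."""
    return secrets.token_hex(16)

def leading_zero_bits(digest: bytes) -> int:
    n = int.from_bytes(digest, "big")
    return len(digest) * 8 - n.bit_length()

def solve(challenge, difficulty=DIFFICULTY):
    """Client side: brute-force a nonce, about 2**difficulty hashes on average."""
    nonce = 0
    while True:
        d = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(d) >= difficulty:
            return nonce
        nonce += 1

def verify(challenge, nonce, difficulty=DIFFICULTY):
    """Server side: a single hash, so checking stays cheap even on a small VPS."""
    d = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(d) >= difficulty
```

The asymmetry (one hash to verify, thousands to solve) is what makes it a deterrent. The server also has to remember which challenges it issued and expire them, or sign them, otherwise a bot can solve one challenge and replay the answer.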

david · September 2025

@jnd said: I wonder why there aren't more semi-hidden links that normal users won't click but bots will, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

For csswaf, from the readme:

What is CSSWAF?
CSSWAF places random hidden empty.gif files in CSS animation
progress, allowing the browser to load these images one by one.
The backend measures the loading order. If the loading order is
correct, it passes the request to the target server. Otherwise, 🙅.

HoneyPot
CSSWAF places some honeypot empty.gif files in HTML <img>
tags but instructs the browser not to load them. If someone
loads the honeypot GIFs, 🙅. CSSWAF also places some
invisible <a> tags in HTML. If someone clicks the honeypot
links, 🙅.
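To make the load-order idea concrete, here is a hypothetical Python sketch of the server-side bookkeeping only. It is not csswaf's code: the class name, token format and counts are assumptions, and the CSS keyframes that reference each token's image URL at successive animation steps are left out.

```python
import secrets

class OrderChallenge:
    """Hypothetical per-visitor state for a 'load these images in order' check."""

    def __init__(self, steps=5, honeypots=2):
        # Image tokens the CSS animation requests one by one, in this order.
        self.expected = [secrets.token_hex(8) for _ in range(steps)]
        # Tokens placed in the page that a real browser is told never to load.
        self.honeypots = {secrets.token_hex(8) for _ in range(honeypots)}
        self.seen = []
        self.failed = False

    def record(self, token):
        """Call for each image request that belongs to this challenge."""
        if token in self.honeypots:
            self.failed = True
        else:
            self.seen.append(token)

    def passed(self):
        """Pass only if no honeypot was loaded and the order matched exactly."""
        return not self.failed and self.seen == self.expected
```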

mrTom · September 2025

@jnd said: I wonder why there aren't more semi-hidden links that normal users won't click but bots will, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

That worked for a while, but today's bots often use a fresh IP for every request.

In my logs I have endless "visitors" from faraway countries that request exactly one page and never come back.
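That pattern is easy to pull out of an access log. A rough Python sketch, assuming the combined log format and treating any path with a static-asset extension as an asset; the extension list and `one_page_visitors` are illustrative. A human with a warm browser cache can look the same, so treat the output as a report rather than a blocklist, and with one IP per request, banning the individual IPs it finds won't achieve much.

```python
import re
from collections import defaultdict

# Only the client IP and request path are needed from each combined-format line.
REQ_RE = re.compile(r'(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (?P<path>\S+)')
ASSET_EXT = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg",
             ".webp", ".woff2", ".ico")

def one_page_visitors(lines):
    """Return IPs that fetched exactly one page and no static assets."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        m = REQ_RE.match(line)
        if not m:
            continue
        path = m["path"].split("?", 1)[0].lower()  # ignore query strings
        if path.endswith(ASSET_EXT):
            assets[m["ip"]] += 1
        else:
            pages[m["ip"]] += 1
    return sorted(ip for ip, n in pages.items() if n == 1 and assets[ip] == 0)
```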

jnd · September 2025

@david said:

@jnd said: I wonder why there aren't more semi-hidden links that normal users won't click but bots will, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

For csswaf, from the readme:
What is CSSWAF?
CSSWAF places random hidden empty.gif files in CSS animation
progress, allowing the browser to load these images one by one.
The backend measures the loading order. If the loading order is
correct, it passes the request to the target server. Otherwise, 🙅.

HoneyPot
CSSWAF places some honeypot empty.gif files in HTML <img>
tags but instructs the browser not to load them. If someone
loads the honeypot GIFs, 🙅. CSSWAF also places some
invisible <a> tags in HTML. If someone clicks the honeypot
links, 🙅.

Yes I read that, that's why I replied.

jnd · September 2025

@mrTom said:

@jnd said: I wonder why there aren't more semi-hidden links that normal users won't click but bots will, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

That worked for a while, but today's bots often use a fresh IP for every request.

In my logs I have endless "visitors" from faraway countries that request exactly one page and never come back.

What about limiting random visitors who don't send the correct referrer? And those who do should have a previous visit logged. That could perhaps hit people who switch networks (mobile/Wi-Fi) between pages. I mean, there must be some countermeasures at the web server and firewall level, even before the request starts to be processed.
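The referrer idea might look like this as a decision function in the web app or reverse proxy. The Referer header only exists at the HTTP layer, so with HTTPS it can't be checked in a packet firewall. `SITE_HOST`, `recent_ips` and the entry-path exemption are assumptions for illustration. Referers are trivially forged and sometimes stripped, and visitors arriving from search results land on deep pages with an external referer, so this is better used to decide who gets a challenge than who gets blocked.

```python
from urllib.parse import urlsplit

SITE_HOST = "example.com"  # hypothetical: your own hostname

def needs_challenge(path, referer, ip, recent_ips, entry_paths=("/",)):
    """Return True if this request should get a captcha/PoW instead of the page.

    recent_ips: set of client IPs seen recently (how it is kept is up to you).
    """
    if path in entry_paths:
        return False                  # let people land on the front page
    host = urlsplit(referer or "").hostname
    if host != SITE_HOST:
        return True                   # deep link without a same-site referer
    return ip not in recent_ips       # same-site referer, but no earlier visit logged
```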

david · September 2025

@jnd said: Yes I read that, that's why I replied.

Ah, got it. So far it seems to be effective, though. Looking at the log, most of the bots that were hitting my site don't even try to load the CSS animation, and give up.

blip1945 · September 2025

I just added one domain to CF with no real A/AAAA records besides the ones pointing to the CF CDN, activated a rule to block everything just to see the logs, finished switching my registrar to the CF nameservers, and immediately saw all the nasty requests from AI bots getting blocked and logged. I pity the real webmasters who have to deal with that shit 24/7.

NeedDeal · September 2025

Real webmasters (cough, devs) don't bloat their websites.
AI bots are not the issue.

sb_th · September 2025

I would argue AI bots are the issue. They keep refreshing pages as if they changed every minute of every day, grabbing "new" content in a way that makes no sense.

blip1945 · September 2025

@NeedDeal said:
Real webmasters (cough, devs) don't bloat their websites.
AI bots are not the issue.

Bloated or slim static HTML isn't even the factor; AI bots will hammer it regardless, so AI bots are definitely the root issue.

mrTom · September 2025

@NeedDeal said: Real webmasters (cough, devs) don't bloat their websites.
AI bots are not the issue.

You clearly have no idea about the scale of the problem. If you don't block certain IP ranges, you are hammered with multiple requests per second, 24/7. Apart from the CPU load (for nothing in return), my log files grew so big that they became a problem of their own.
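A softer alternative to blocking whole ranges outright is to rate-limit per range. A minimal sliding-window sketch in Python that keys on the /24 (IPv4) or /48 (IPv6) containing the client; the limits, prefix lengths and class name are illustrative assumptions. Bots that rotate across many unrelated residential ranges will still slip through, and state here lives in one process's memory with no eviction of idle ranges.

```python
import ipaddress
import time
from collections import defaultdict, deque

class PrefixRateLimiter:
    """Allow at most `limit` requests per `window` seconds per address range."""

    def __init__(self, limit=60, window=60.0, v4_prefix=24, v6_prefix=48):
        self.limit, self.window = limit, window
        self.prefix = {4: v4_prefix, 6: v6_prefix}
        self.hits = defaultdict(deque)  # range -> timestamps of allowed requests

    def key(self, ip):
        """Map a client IP to the network range it is counted against."""
        addr = ipaddress.ip_address(ip)
        return ipaddress.ip_network(f"{addr}/{self.prefix[addr.version]}", strict=False)

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[self.key(ip)]
        while q and now - q[0] >= self.window:
            q.popleft()                 # drop timestamps that left the window
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```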

Fubuki · September 2025

I would just block at the ASN level regardless, e.g. Amazon and Google, and move on with my life.

dustinc · September 2025

We’ve noticed this too, it’s a pretty common scenario.

Speaking from experience with our shared hosting platform, cpGuard (with its built-in WAF rules) helps mitigate a lot of this junk traffic without impacting your search engine or AI visibility. Paired with LiteSpeed Cache, which we also include, it makes a noticeable difference. I had a customer run into a similar issue recently with a WordPress site; as soon as he enabled our ModSec rules and turned on LiteSpeed Cache, the issue disappeared immediately.

Might be worth exploring a mix of caching plus some WAF rules. As others mentioned, Cloudflare also gives you flexibility to customize WAF behavior etc.

Fritz · September 2025

Do you block AI bots?

Amazonbot (Amazon)
Applebot (Apple)
Bytespider (ByteDance)
ClaudeBot (Anthropic)
DuckAssistBot (DuckDuckGo)
Google-CloudVertexBot (Google)
GoogleOther (Google)
GPTBot (OpenAI)
Meta-ExternalAgent (Meta)
PetalBot (Huawei)
TikTokSpider (ByteDance)
CCBot (Common Crawl)
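Blocking the crawlers in that list by user agent could look like the following WSGI middleware sketch; the tokens come from the list, the function names are hypothetical. It only catches bots that identify themselves honestly, and Applebot and PetalBot also crawl for Apple's and Huawei's search features, so blocking them may cost you some search visibility.

```python
# Product tokens from the list above, lower-cased for substring matching.
AI_BOT_TOKENS = (
    "amazonbot", "applebot", "bytespider", "claudebot", "duckassistbot",
    "google-cloudvertexbot", "googleother", "gptbot", "meta-externalagent",
    "petalbot", "tiktokspider", "ccbot",
)

def is_declared_ai_bot(user_agent):
    """True if the User-Agent self-identifies as one of the crawlers above."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in AI_BOT_TOKENS)

def block_ai_bots(app):
    """WSGI middleware: answer 403 to declared AI crawlers, pass everything else."""
    def wrapper(environ, start_response):
        if is_declared_ai_bot(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper
```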

Saahib · September 2025

@NeedDeal said:
If bots cause you too much hassle, then it means your webpage has too much junk on it.

No sir, you definitely haven't dealt with this issue much.

Carrie · September 2025

@CloudHopper said:
SafeLine WAF has a really good anti-bot mechanism for those who don't want to use Cloudflare (and its exploit prevention is also far superior to CF's): https://github.com/chaitin/SafeLine

Thank you so much for recommending SafeLine WAF. I'm the operations manager of this product. We're trying to make it better and it's getting more and more popular as well.

Carrie · September 2025

You could try SafeLine web application firewall to effectively stop malicious bots and scrapers. The basic and core features are all free of charge forever.
