
Comments
You can block requests with Cloudflare security rules or .htaccess by targeting specific user agent headers.
I didn't. I was manually blocking entire /16 subnets until my firewall config became bloated. It made my VPS much slower, since it was filtering millions of IP addresses. I don't own a top-1-million website, but these bots were hammering my VPS like crazy.
I don't know what's happening, but I think my first VPS provider put no effort into filtering bad bot IPs at all.
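One way to keep a hand-built blocklist from bloating is to merge overlapping and adjacent ranges before loading them into the firewall. The following is a minimal sketch using Python's standard `ipaddress` module; the function name `collapse_blocklist` and the sample ranges (taken from reserved documentation and benchmarking space) are illustrative assumptions, not anything from the poster's setup.

```python
import ipaddress

def collapse_blocklist(cidrs):
    """Merge overlapping or adjacent CIDR ranges into the smallest equivalent list."""
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # collapse_addresses() rejects mixed IPv4/IPv6 input, so handle each family separately.
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    merged = list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in merged]

# Hypothetical blocklist: two adjacent /16s, a /24 already covered by one of them,
# and two /25 halves of the same /24.
blocked = [
    "198.18.0.0/16",
    "198.19.0.0/16",
    "198.18.5.0/24",
    "192.0.2.0/25",
    "192.0.2.128/25",
]
collapsed = collapse_blocklist(blocked)
print(collapsed)
```

Here five entries reduce to two. How much this helps in practice depends on how contiguous the blocked ranges actually are.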
There were days when higher traffic used to mean more money for website owners.. but now..
Instead of blocking them, how do we get it to view ads and earn passive income?
I thought this was a @DediRock post at first tbh
I'm sure xAI pays a ton of money for Google Cloud compute, so of course Google Cloud would turn a blind eye.
Interesting thing is, sometimes I get real traffic from ChatGPT users.
I think passive income will not come through bots, you need to get real traffic
This traffic won't show up in Statcounter either: they either block JS or block all tracker/ad servers using an AdBlock list.
For something simple, I like csswaf.
https://github.com/yzqzss/csswaf
It's still a proof-of-concept (says not for production), but I'm using it for some stuff that's not too important.
If the connection is poor, sometimes it has to try 2-3 times, but it mostly works on the first try.
If bots cause you too much hassle, then it means your webpage has too much junk on it.
I wondered why there aren't more hidden links that normal users won't click but bots do, after which they get insta-banned. Offer some randomized poisoned links here and there, more in the footer.
Yeah, this too. Or just offer text-only content to the bots, so no more bandwidth draining.
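The hidden-link idea can be sketched roughly as follows. This is a framework-free illustration only: the class name `HoneypotTrap`, the in-memory ban set, and the HTTP status codes returned are all assumptions, and a real deployment would hook into the web server or firewall instead.

```python
import secrets

class HoneypotTrap:
    """Hand out randomized hidden links and ban any client that follows one."""

    def __init__(self, n_links=3):
        # Random paths that no legitimate page links to visibly.
        self.trap_paths = {"/" + secrets.token_hex(8) for _ in range(n_links)}
        self.banned = set()

    def footer_html(self):
        # Links are hidden from human visitors and skipped by keyboard navigation.
        return "".join(
            f'<a href="{path}" rel="nofollow" style="display:none" '
            f'tabindex="-1" aria-hidden="true"></a>'
            for path in sorted(self.trap_paths)
        )

    def handle(self, client_ip, path):
        """Return an HTTP status code for this request (200 or 403)."""
        if client_ip in self.banned:
            return 403
        if path in self.trap_paths:
            self.banned.add(client_ip)  # first hit on a poisoned link bans the client
            return 403
        return 200

trap = HoneypotTrap()
poisoned = next(iter(trap.trap_paths))
print(trap.handle("203.0.113.5", "/index.html"))  # normal page
print(trap.handle("203.0.113.5", poisoned))       # bot follows a hidden link
print(trap.handle("203.0.113.5", "/index.html"))  # now banned
```

Banning is keyed on the client IP here, so this only helps against bots that reuse an address across requests.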
One method that works for the moment is proof-of-work based tools like Anubis (which you can self-host). Anubis is the most common of its ilk ATM, but there are several other implementations out there if you prefer not to use that one (https://github.com/sequentialread/pow-bot-deterrent is the first that came out of a quick search). This is at least one of the techniques used by Cloudflare and similar services that offer scraper detection.
These methods work because the scraper won't want to spend a pile of CPU time running a tight JS or WebAssembly loop. If CPU time becomes cheap enough it will stop working as a deterrent, though even then it might at least reduce the rate of requests sent by each bot. This does cause trouble for people with JS turned off, so you need to decide whether you care to support that or not. Anubis does support non-JS challenges now, though I've not looked into how this is done to see if it would be a significant inconvenience to a human user; presumably other similar systems support non-JS users too. It can also be a problem for low-power devices if the difficulty of the CPU challenge is set too high, so again that is something you need to decide how much you care about. A further issue for very low-end servers is that you are adding some processing on that side too: not much, but it might be significant if practically everything else you are hosting is static files.
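To make the cost asymmetry concrete, here is a generic hashcash-style sketch: the client has to search for a nonce, while the server verifies it with a single hash. This is not Anubis's actual protocol or that of any tool named above; the function names, the `challenge:nonce` format, and the difficulty value are assumptions chosen for illustration.

```python
import hashlib
import secrets

def make_challenge():
    """Server side: issue a random, single-use challenge string."""
    return secrets.token_hex(16)

def leading_zero_bits(digest):
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def verify(challenge, nonce, difficulty):
    """Server side: one hash, cheap regardless of difficulty."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

def solve(challenge, difficulty):
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce

challenge = make_challenge()
nonce = solve(challenge, difficulty=12)  # low difficulty so the demo finishes quickly
print(nonce, verify(challenge, nonce, 12))
```

Each extra bit of difficulty doubles the expected client work while leaving the server's verification cost unchanged, which is the trade-off to tune against low-power visitors.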
For csswaf, from the readme:
That worked for a while, but today's bots often use one fresh IP per request.
In my log I have endless "visitors" from faraway countries that request exactly one page and never come back.
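When every request arrives from a different address, per-IP counts show nothing, but grouping by subnet can still reveal where the traffic comes from. A rough sketch, assuming the client IPs have already been extracted from the access log; the function name `requests_per_subnet` and the sample addresses are hypothetical.

```python
from collections import Counter
import ipaddress

def requests_per_subnet(client_ips, prefix_len=16):
    """Count requests per IPv4 subnet of the given prefix length."""
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        if addr.version != 4:
            continue  # this sketch only groups IPv4 addresses
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts

# Hypothetical log extract: each address appears exactly once.
ips = ["198.51.100.1", "198.51.100.77", "198.51.7.9", "203.0.113.4"]
per_subnet = requests_per_subnet(ips)
print(per_subnet.most_common(2))
```

Subnets that dominate the counts are candidates for range-level blocking or rate limiting, at the risk of also catching legitimate visitors in the same range.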
Yes I read that, that's why I replied.
What about limiting random visitors if they don't provide the correct referrer? And those who do should have a previous visit logged. That could perhaps hit people who are switching networks (mobile/Wi-Fi) between pages. I mean, there must be some countermeasures at the webserver and firewall level, even before the request starts to be processed.
Ah, got it. So far it seems to be effective, though. Looking at the log, most of the bots that were hitting my site don't even try to load the css animation, and give up.
I just added one domain to CF, with no real A and AAAA records besides those pointing to the CF CDN, and activated a block-all rule just to see the logs. As soon as I finished adding the CF nameservers at my registrar, I immediately saw all the nasty requests from AI bots getting blocked and logged. I pity real webmasters who have to deal with that shit 24/7.
Real webmasters (cough, devs) don't bloat their websites.
AI bots are not the issue.
I would argue AI bots are the issue. They keep refreshing a page as if it changes every minute of every day to grab new content, in a way that does not make any sense.
Whether the HTML is bloated or slim and static isn't even a factor; AI bots will hammer it regardless, so AI bots are definitely the source of the issue.
You clearly have no idea about the scale of the problem. If you don't block certain IP ranges, you are hammered with multiple requests per second 24/7. Apart from the CPU load, for nothing in return, my logfiles grew so big that they became their own problem.
I would just block at the ASN level regardless, e.g. Amazon and Google, and move on with my life.
We’ve noticed this too, it’s a pretty common scenario.
Speaking from experience with our shared hosting platform, cpGuard (with its built-in WAF rules) helps mitigate a lot of this junk traffic without impacting your search engine visibility or AI visibility. Paired with LiteSpeed Cache, which we also include, it makes a noticeable difference. I had a customer run into a similar issue recently with a WordPress site; as soon as he enabled our ModSec rules and turned on LiteSpeed Cache, the issue disappeared.
Might be worth exploring a mix of caching plus some WAF rules. As others mentioned, Cloudflare also gives you flexibility to customize WAF behavior etc.
Do you block AI bots?
Amazonbot (Amazon)
Applebot (Apple)
Bytespider (ByteDance)
ClaudeBot (Anthropic)
DuckAssistBot (DuckDuckGo)
Google-CloudVertexBot (Google)
GoogleOther (Google)
GPTBot (OpenAI)
Meta-ExternalAgent (Meta)
PetalBot (Huawei)
TikTokSpider (ByteDance)
CCBot (Common Crawl)
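A user-agent check against the names listed above could look roughly like this. It is a sketch only: the function name `is_listed_ai_bot` and the sample User-Agent strings are illustrative, matching is a plain case-insensitive substring test, and it only catches crawlers that identify themselves honestly.

```python
# Crawler tokens taken from the list above.
AI_BOT_TOKENS = (
    "Amazonbot", "Applebot", "Bytespider", "ClaudeBot", "DuckAssistBot",
    "Google-CloudVertexBot", "GoogleOther", "GPTBot", "Meta-ExternalAgent",
    "PetalBot", "TikTokSpider", "CCBot",
)

def is_listed_ai_bot(user_agent):
    """Return True if the User-Agent header contains one of the listed tokens."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in AI_BOT_TOKENS)

# Illustrative header values, not captured from real traffic.
print(is_listed_ai_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))
print(is_listed_ai_bot("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Safari/537.36"))
```

The same token list can be translated into a Cloudflare rule or .htaccess condition; the matching logic is the part shown here.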
No sir, you definitely haven't dealt with this issue much.
Thank you so much for recommending SafeLine WAF. I'm the operations manager of this product. We're trying to make it better and it's getting more and more popular as well.
You could try SafeLine web application firewall to effectively stop malicious bots and scrapers. The basic and core features are all free of charge forever.