Bots are draining your bandwidth without you noticing

Comments

  • WebProject Veteran, 🚩 Host Rep Tag Suspended

    @ehhthing said:

    @WebProject said:

    @ehhthing said:

    @davide said:
    Has anyone written a web server logs parser to detect non-human sessions? There are human patterns that crawlers don't follow. This kind of passive analysis would throw no JavaShit to the users' browser like HCaptcha or Cloudflare.

    This is possible but then you can’t exactly block the connections outright because you risk accidentally blocking real traffic from a “weird” person. Eventually you do need to serve a captcha or some other verification method unless you want to lose business.

    It is possible to use Cloudflare to block all AI bots and other non-search engine bots using security rules. On one of our client websites, the Facebook bot alone was consuming around 50 GB of bandwidth per month.

    Cloudflare uses JS as part of their bot detection system.

    You can block requests with Cloudflare security rules or .htaccess by targeting specific user agent headers.

  • JustPfff Member
    edited September 2025

    @Saahib said: @JustPfff, what method did you use to filter out bots?

    I didn't. I was manually blocking entire /16 subnets until my firewall config bloated; it made my VPS much slower, since it was filtering millions of IP addresses. I don't own a top-1-million website, but these bots were hammering my VPS like crazy.
    I don't know what's happening, but I think my first VPS provider put no effort into filtering bad bot IPs at all.
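
One way to keep a hand-maintained subnet blocklist from bloating is to merge overlapping and adjacent ranges before loading them into the firewall. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the sample ranges (taken from reserved benchmark and documentation address space) are illustrative assumptions, not anything the poster used.

```python
import ipaddress

def collapse_blocklist(cidrs):
    """Merge overlapping and adjacent CIDR ranges into the fewest equivalent networks."""
    networks = [ipaddress.ip_network(c, strict=False) for c in cidrs]
    # collapse_addresses() cannot mix IPv4 and IPv6, so handle each family separately.
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    merged = list(ipaddress.collapse_addresses(v4)) + list(ipaddress.collapse_addresses(v6))
    return [str(n) for n in merged]

# Hypothetical blocklist: two adjacent /16s, one /24 already covered, one unrelated /24.
blocked = ["198.18.0.0/16", "198.19.0.0/16", "198.18.5.0/24", "203.0.113.0/24"]
print(collapse_blocklist(blocked))  # ['198.18.0.0/15', '203.0.113.0/24']
```

Fewer, wider rules are cheaper for the firewall to evaluate than millions of individual entries, though merging only helps when the blocked ranges actually sit next to each other.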

  • Saahib Host Rep, Veteran

    There were days when higher traffic used to mean more money for website owners.. but now..

  • Instead of blocking them, how do we get it to view ads and earn passive income?

  • I thought this was a @DediRock post at first tbh

  • @cu_olly said:
    They will not, under any circumstance, remove Grok as a customer, no matter what levels of abuse are reported.

    I'm sure xAI pays a ton of money for Google Cloud compute, so of course Google Cloud would turn a blind eye.

  • Interesting thing is, sometimes I get real traffic from ChatGPT users.

  • @TimboJones said:
    Instead of blocking them, how do we get it to view ads and earn passive income?

    I think passive income will not come through bots; you need to get real traffic.

  • JustPfff Member
    edited September 2025

    @TimboJones said: how do we get it to view ads and earn passive income?

    This traffic won't show up in Statcounter either: they either block JS or block all tracker/ad servers using an AdBlock list.

  • For something simple, I like csswaf.

    https://github.com/yzqzss/csswaf

    It's still a proof-of-concept (says not for production), but I'm using it for some stuff that's not too important.

    If the connection is poor, sometimes it has to try 2-3 times, but it mostly works on the first try.

    Thanked by 2: jnd, 0xC7
  • If bots cause you too much hassle, it means your webpage has too much junk on it.

    Thanked by 1: jnd
  • @david said:
    For something simple, I like csswaf.

    https://github.com/yzqzss/csswaf

    It's still a proof-of-concept (says not for production), but I'm using it for some stuff that's not too important.

    If the connection is poor, sometimes it has to try 2-3 times, but it mostly works on the first try.

    I wondered why there aren't more-or-less hidden links that normal users won't click but bots will, after which the bots get insta-banned. Offer some randomized poisoned links here and there, more in the footer.
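
The hidden-link idea can be sketched in a few lines. This is a minimal illustration under assumptions, not a production tool: the class name, the `/t/` path prefix, and the in-memory ban set are all hypothetical, and a ban keyed on IP address is weak against clients that change address between requests.

```python
import secrets

class HoneypotTrap:
    """Tracks hidden trap URLs and the clients that requested them."""

    def __init__(self, count=3):
        # Random, unguessable paths that no visible link points to.
        self.trap_paths = {"/t/" + secrets.token_hex(8) for _ in range(count)}
        self.banned = set()

    def hidden_links_html(self):
        # Links a human will not see or tab to; rel="nofollow" lets polite crawlers skip them.
        return "".join(
            f'<a href="{path}" rel="nofollow" style="display:none" tabindex="-1"></a>'
            for path in sorted(self.trap_paths)
        )

    def allow(self, client_ip, path):
        """Return False if the client is banned, banning it first if it hit a trap."""
        if path in self.trap_paths:
            self.banned.add(client_ip)
        return client_ip not in self.banned
```

A request handler would call `allow()` before serving anything and append `hidden_links_html()` to the footer of each page.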

  • @NeedDeal said:
    If bots cause you too much hassle, it means your webpage has too much junk on it.

    Yeah, this too. Or just offer text-only content to the bots, no more bandwidth draining.

  • @MikeA said:

    @davide said:
    Has anyone written a web server logs parser to detect non-human sessions? There are human patterns that crawlers don't follow.

    There's no realistic way to do it. AI crawlers like xAI use residential IPs and actively rotate fake user agents. They're not the only ones, but they're the most prevalent due to their size.

    One method that works for the moment is proof-of-work schemes like Anubis (which you can self-host). Anubis is the most common of its ilk ATM, but there are several other implementations out there if you prefer not to use that one (https://github.com/sequentialread/pow-bot-deterrent is the first that came out of a quick search). This is at least part of the techniques used by Cloudflare and similar services that offer scraper detection.

    These methods work because the scraper won't want to spend a pile of CPU time running a tight JS or WebAssembly loop. If CPU time becomes cheap enough it will stop working as a deterrent, though even then it might at least reduce the rate of requests sent by each bot. This does cause trouble for people with JS turned off, so you need to decide whether you care to support them. Anubis does support non-JS challenges now, though I've not looked into how this is done to see if it would be a significant inconvenience to a human user; presumably other similar systems support non-JS users too. It can also be a problem for low-power devices if the difficulty of the CPU challenge is set too high, so again that is something you need to decide how much you care about. A further issue for very low-end servers is that you are adding some processing on that side too: not much, but it might be significant if practically everything else you are hosting is static files.
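
To make the cost asymmetry concrete, here is a minimal hashcash-style sketch. It is not how Anubis or any named service implements its challenge; the function names, the `challenge:nonce` string format, and the use of SHA-256 with a leading-zero-bits target are assumptions chosen for illustration. The client has to search for a nonce, while the server verifies with a single hash.

```python
import hashlib
import secrets

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def make_challenge() -> str:
    # Server side: a random value the client cannot precompute against.
    return secrets.token_hex(16)

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # Server side: one hash, regardless of how long the client searched.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

def solve(challenge: str, difficulty: int) -> int:
    # Client side: brute force, about 2**difficulty hashes on average.
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce

challenge = make_challenge()
nonce = solve(challenge, difficulty=12)
print(verify(challenge, nonce, difficulty=12))  # True
```

Each extra bit of difficulty doubles the expected client work, which is why a setting that barely slows a desktop can be painful on a low-power device.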

  • @jnd said: I wondered why there aren't more-or-less hidden links that normal users won't click but bots will, after which the bots get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

    For csswaf, from the readme:

    What is CSSWAF?
    CSSWAF places random hidden empty.gif files in CSS animation
    progress, allowing the browser to load these images one by one.
    The backend measures the loading order. If the loading order is
    correct, it passes the request to the target server. Otherwise, 🙅.
    
    HoneyPot
    CSSWAF places some honeypot empty.gif files in HTML <img>
    tags but instructs the browser not to load them. If someone
    loads the honeypot GIFs, 🙅. CSSWAF also places some
    unvisible <a> tags in HTML, if someone clicks the honeypot
    links, 🙅.
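
The load-order check described in the readme can be pictured with a small sketch. This is not CSSWAF's actual code; the class, the random token names, and the in-memory state are assumptions made purely to illustrate the idea of comparing the order in which image requests arrive against the order the CSS animation should produce.

```python
import secrets

class LoadOrderChallenge:
    """Per-visitor record of which hidden images were requested, and in what order."""

    def __init__(self, steps=4):
        # Hypothetical image tokens, listed in the order a real browser should fetch them.
        self.expected = [secrets.token_hex(4) for _ in range(steps)]
        self.seen = []
        self.failed = False

    def record(self, token):
        """Record one image request; anything out of order marks the challenge failed."""
        position = len(self.seen)
        if position >= len(self.expected) or token != self.expected[position]:
            self.failed = True
        self.seen.append(token)

    def passed(self):
        # Pass only when every image was fetched, in exactly the expected order.
        return not self.failed and self.seen == self.expected
```

A client that skips the CSS entirely never completes the sequence, and one that fetches every image it finds in document order is unlikely to match it.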
    
  • @jnd said: I wondered why there aren't more-or-less hidden links that normal users won't click but bots will, after which the bots get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

    That worked for a while, but today's bots often use one fresh IP per request.

    In my log I have endless "visitors" from faraway countries that request exactly one page and never come back.
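
The one-page-and-gone pattern is easy to measure from an access log. The sketch below assumes the common/combined log format that Apache and nginx write by default; the regular expression and function name are illustrative, and a high share of one-hit clients is only a hint, since real visitors who land on one page and leave look the same.

```python
import re
from collections import Counter

# Start of a common/combined log line: client, identity, user, [time], "METHOD path ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+)')

def one_hit_clients(lines):
    """Return (clients that made exactly one request, their share of all clients)."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            hits[match.group(1)] += 1
    singles = sorted(ip for ip, count in hits.items() if count == 1)
    share = len(singles) / len(hits) if hits else 0.0
    return singles, share
```

Called as `one_hit_clients(open("access.log"))` (a hypothetical path), it returns the list of single-request addresses and the fraction of all clients they represent.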

  • @david said:

    @jnd said: I wondered why there aren't more-or-less hidden links that normal users won't click but bots will, after which the bots get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

    For csswaf, from the readme:

    What is CSSWAF?
    CSSWAF places random hidden empty.gif files in CSS animation
    progress, allowing the browser to load these images one by one.
    The backend measures the loading order. If the loading order is
    correct, it passes the request to the target server. Otherwise, 🙅.
    
    HoneyPot
    CSSWAF places some honeypot empty.gif files in HTML <img>
    tags but instructs the browser not to load them. If someone
    loads the honeypot GIFs, 🙅. CSSWAF also places some
    unvisible <a> tags in HTML, if someone clicks the honeypot
    links, 🙅.
    

    Yes I read that, that's why I replied.

  • @mrTom said:

    @jnd said: I wondered why there aren't more-or-less hidden links that normal users won't click but bots will, after which the bots get insta-banned. Offer some randomized poisoned links here and there, more in the footer.

    That worked for a while, but today's bots often use one fresh IP per request.

    In my log I have endless "visitors" from faraway countries that request exactly one page and never come back.

    What about limiting the random visitors if they don't provide the correct referrer? And those who do should have a previous visit logged. That could perhaps hit people who are switching networks (mobile/Wi-Fi) between pages. I mean, there must be some countermeasures at the web server and firewall level, even before the request starts to be processed.

  • @jnd said: Yes I read that, that's why I replied.

    Ah, got it. So far it seems to be effective, though. Looking at the log, most of the bots that were hitting my site don't even try to load the CSS animation; they just give up.

  • I just added one domain to CF with no real A or AAAA records besides pointing to the CF CDN, activated a rule to block everything just to see the logs, and finished adding the CF nameservers at my registrar. I immediately saw all the nasty requests from AI bots getting blocked and logged. I pity the real webmasters who have to deal with that shit 24/7.

  • Real webmasters (cough) devs don't bloat their websites.
    AI bots are not the issue.

    Thanked by 2: tentor, classy
  • sb_th Member, Patron Provider

    I would argue AI bots are the issue. They keep refreshing a page to grab new content as if it changed every minute of every day, in a way that does not make any sense.

  • @NeedDeal said:
    Real webmasters (cough) devs don't bloat their websites.
    AI bots are not the issue.

    Bloated or slim static HTML isn't even a factor; AI bots will hammer the site regardless, so AI bots are definitely the source of the issue.

    Thanked by 1: mrTom
  • @NeedDeal said: Real webmasters (cough) devs don't bloat their websites.
    AI bots are not the issue.

    You clearly have no idea about the scale of the problem. If you don't block certain IP ranges, you are hammered with multiple requests per second 24/7. Apart from the CPU load, for nothing in return, my log files grew so big that they became their own problem.

    Thanked by 2: JustPfff, 0xC7
  • I would just block at the ASN level regardless (Amazon, Google, and the like) and move on with my life.

  • dustinc Member, Patron Provider, Top Host

    We’ve noticed this too, it’s a pretty common scenario.

    Speaking from experience with our shared hosting platform, cpGuard (with its built-in WAF rules) helps mitigate a lot of this junk traffic without impacting your search engine visibility or AI visibility. Paired with LiteSpeed Cache, which we also include, it makes a noticeable difference. I had a customer run into a similar issue recently with a WordPress site; as soon as he enabled our ModSec rules and turned on LiteSpeed Cache, the issue disappeared.

    Might be worth exploring a mix of caching plus some WAF rules. As others mentioned, Cloudflare also gives you flexibility to customize WAF behavior etc.

  • Do you block AI bots?

    Amazonbot (Amazon)
    Applebot (Apple)
    Bytespider (ByteDance)
    ClaudeBot (Anthropic)
    DuckAssistBot (DuckDuckGo)
    Google-CloudVertexBot (Google)
    GoogleOther (Google)
    GPTBot (OpenAI)
    Meta-ExternalAgent (Meta)
    PetalBot (Huawei)
    TikTokSpider (ByteDance)
    CCBot (Common Crawl)
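
For anyone filtering in application code rather than at the web server or CDN, the list above can be turned into a simple User-Agent check. This is a minimal sketch with a hypothetical function name; it only catches crawlers that identify themselves honestly, and some of these tokens may also be used by a vendor's search crawler, so each one is worth checking against the vendor's documentation before blocking.

```python
# Tokens taken from the list in this post; matching is a case-insensitive substring test.
AI_BOT_TOKENS = (
    "Amazonbot", "Applebot", "Bytespider", "ClaudeBot", "DuckAssistBot",
    "Google-CloudVertexBot", "GoogleOther", "GPTBot", "Meta-ExternalAgent",
    "PetalBot", "TikTokSpider", "CCBot",
)

def is_listed_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains any token from the list."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_BOT_TOKENS)

print(is_listed_bot("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # True
print(is_listed_bot("Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"))  # False
```

A handler could return a 403 or a stripped-down text response whenever `is_listed_bot()` is true.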

  • Saahib Host Rep, Veteran

    @NeedDeal said:
    If bots cause you too much hassle, it means your webpage has too much junk on it.

    No sir, you definitely haven't dealt with this issue much.

  • Carrie Member, Host Rep

    @CloudHopper said:
    SafeLine WAF has a really good anti-bot mechanism for those who don't want to use Cloudflare (and its exploit prevention is far superior to CF's too): https://github.com/chaitin/SafeLine

    Thank you so much for recommending SafeLine WAF. I'm the operations manager of this product. We're trying to make it better and it's getting more and more popular as well.

  • Carrie Member, Host Rep

    You could try SafeLine web application firewall to effectively stop malicious bots and scrapers. The basic and core features are all free of charge forever.
