Honestly, I've had enough of bot traffic, so I blocked them all

Hello, a couple of months ago I posted about how bots were consuming most of my monthly bandwidth.
It happened for two reasons: 1) my first provider wasn’t blocking any of them, not even aggressive bots, and 2) none of these bots honor robots.txt (I only allow Google/Bing), and since I have a fairly large website, they probably thought it was worth indexing and kept hammering it consistently.

Even though my new provider does a much better job blocking most of these crazy bots, from what I can see in Cloudflare’s stats, they still hit my site from different IPs.

After I asked ChatGPT for the best solution, it suggested blocking these bots at the source completely via Cloudflare firewall custom rules, adding the most well-known server providers by their ASN.

I know this sounds extreme and I wouldn’t normally advise anyone to do it, but I did it anyway to buy peace of mind, so I don’t have to worry about security, hacking, or content grabbing (which is the main goal of these bots), since I can’t remember a single one of my sites that hasn’t been mirrored by someone at some point.
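
For reference, a Cloudflare custom rule that blocks by ASN comes down to a single filter expression. The sketch below builds one in Python. The ASN list is purely illustrative (the post does not say which providers were blocked), and `build_asn_block_expression` is a hypothetical helper name. The `ip.src.asnum` and `cf.client.bot` fields are from Cloudflare's Rules language; excluding `cf.client.bot` is an assumption about how to keep verified crawlers such as Googlebot and Bingbot out of the block.

```python
def build_asn_block_expression(asns, allow_verified_bots=True):
    """Return a Cloudflare rule expression matching traffic from the given ASNs."""
    if not asns:
        raise ValueError("at least one ASN is required")
    # Deduplicate and sort so the generated rule is stable between runs.
    asn_set = " ".join(str(n) for n in sorted(set(int(a) for a in asns)))
    expression = f"(ip.src.asnum in {{{asn_set}}})"
    if allow_verified_bots:
        # Assumption: verified crawlers should still get through.
        expression += " and not cf.client.bot"
    return expression


# Illustrative ASNs only: Amazon (16509), DigitalOcean (14061),
# OVH (16276), Hetzner (24940). Not the list used in the post.
EXAMPLE_ASNS = [16509, 14061, 16276, 24940]
print(build_asn_block_expression(EXAMPLE_ASNS))
```

The printed expression would be pasted into the rule's expression editor with the action set to Block. Real visitors on VPNs or corporate networks hosted at those providers get blocked too, which is the trade-off described above.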

Comments

  • remyremy Member
    edited February 6

    I'm not sure there is a perfect solution.
    The web is increasingly locked down with captchas, and filtering has become extremely aggressive. It feels like this is the direction things are heading, and honestly it's exhausting.

    Websites are dead, just ask AI I guess.
    Then scroll on tiktok or instagram or x
    Sleep and repeat.

    Thanked by 1tentor
  • I was in exactly the same situation: I was suffering from AI bots, SEO bots, and even hacking bots. I blocked them using Cloudflare's bot-blocking feature, and also blocked some IPs that were reaching more than 500k visits per day.

    But about two weeks ago I removed the blocks, because I completely rebuilt and optimized all my websites properly, implemented Redis caching, organized the databases, and solved the server load issues entirely. In your case, you’re dealing with bandwidth usage, while I personally don’t worry about bandwidth because I have a high limit. My main problem before was server resource consumption caused by heavy bot traffic. After the optimizations I made, my server can now handle up to 150 million visits without problems.

    Note: My server hosts 52 news websites, and each site gets 2 million+ daily visits. So even with bots, it’s no longer an issue for me.

    My advice: focus heavily on proper coding and caching, and let the bots access your site. After some time they lose interest and move on. As for AI bots, they won’t be a burden anymore; in fact, your content becomes a source for them and part of their training data.

    Thanked by 1JustPfff
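
The per-IP blocking mentioned in the comment above starts with knowing which addresses are the heavy hitters. A minimal sketch, assuming an access log where the client IP is the first whitespace-separated field (as in the common/combined log format); `heavy_hitters` is a hypothetical helper, and the sample lines are made up for illustration.

```python
from collections import Counter


def heavy_hitters(log_lines, threshold):
    """Return {ip: request_count} for IPs with more than `threshold` requests."""
    counts = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return {ip: n for ip, n in counts.items() if n > threshold}


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '203.0.113.7 - - [01/Jan/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2025:10:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    '198.51.100.2 - - [01/Jan/2025:10:00:02 +0000] "GET / HTTP/1.1" 200 512',
]
print(heavy_hitters(sample, threshold=1))
```

On a real day's log the threshold would be something like the 500k figure quoted above. Behind Cloudflare, the first field is only the visitor's address if the server is configured to restore the real client IP; otherwise it will be a Cloudflare edge address.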
  • rpqurpqu Member
    edited February 6

    @remy said:
    the web is increasingly locked down with captchas, and filtering has become extremely aggressive. It feels like this is the direction things are heading, and honestly it's exhausting.
    Websites are dead, just ask AI I guess.

    It's becoming more and more of a walled garden. And bot activity has grown out of control, to the point that reality has aligned itself with the dead internet theory.
    If you can't separate bot traffic from human traffic, it's better to add more nodes and more traffic allowance than to penny-pinch everything.

    Edge optimization is useless when the typical visitor has to wait 15-30s before they can visit the site.

    Thanked by 2nikio remy
  • nikionikio Member

    Why block bots when you can mess with them instead? Send them to an HTTP tarpit. Or 301 them to fbi.gov when they try visiting /wp-login.php.

    If you're concerned about web scraping, a simple thing you can do without inconveniencing real users is implement user accounts. No paywall or anything. Just an account. If you think an account is hammering your website too much (i.e. a bot registered), just ban it.

    The only traffic I personally block is Port 22 (cuts out 95% of bot activity) and any traffic originating in the EU because fuck the GDPR.

    @rpqu said: Edge optimization is useless when the typical visitor has to wait 15-30s before they can visit the site

    Exactly! Gotta love Cloudflare with one hand offering a CDN, and with the other hand, showing you a captcha.

    I have to wonder how long before websites start showing you ads in place of a captcha.

    Thanked by 2JasonM Frameworks
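
The "just ban the account that hammers the site" idea from the comment above needs some way to notice the hammering. A rough sketch of a sliding-window counter per account; `AccountRateMonitor`, its parameters, and the limits in the usage lines are all hypothetical, and a real site would likely keep this state in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class AccountRateMonitor:
    """Track recent request timestamps per account in a sliding window."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def record(self, account_id, now=None):
        """Record one request; return True if the account is over the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[account_id]
        hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


# Hypothetical limit: more than 3 requests within 60 seconds flags the account.
monitor = AccountRateMonitor(max_requests=3, window_seconds=60)
flags = [monitor.record("user-1", now=t) for t in (0, 1, 2, 3)]
print(flags)
```

Whether a flagged account gets banned, throttled, or just reviewed is a policy choice left outside the sketch.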
  • I'm also using the Cloudflare firewall to block at the ASN level and country level: China/Russia.

    Thanked by 1JustPfff
  • rpqurpqu Member
    edited February 6

    @nikio said:

    @rpqu said: Edge optimization is useless when the typical visitor has to wait 15-30s before they can visit the site

    Exactly! Gotta love Cloudflare with one hand offering a CDN, and with the other hand, showing you a captcha.

    I have to wonder how long before websites start showing you ads in place of a captcha.

    Netcup has a customized Anubis version installed, and it showed Techaro's new company logo. It's non-personalized, but it does promote the company.
    Anyway, PoW challenges and captchas are bad for web preservation because the archive.org Wayback Machine can't pass many of them. Meanwhile, people can just purchase API credits from a scraper-oriented PaaS.
    It's a tragedy of the anticommons: people retire or lock down their websites because they cannot monetize them. People will share less knowledge and know less outside of their own domain.

    Thanked by 1forest
  • JustPfffJustPfff Member
    edited February 6

    @nikio said:
    Why block bots when you can mess with them instead? Send them to an HTTP tarpit. Or 301 them to fbi.gov when they try visiting /wp-login.php.
    If you're concerned about web scraping, [...] Just an account. If you think an account is hammering your website too much (i.e. a bot registered), just ban it.
    The only traffic I personally block is Port 22 (cuts out 95% of bot activity) and any traffic originating in the EU because fuck the GDPR.
    Exactly! Gotta love Cloudflare with one hand offering a CDN, and with the other hand, showing you a captcha.

    OK guys, as I already said, the issue I have does not apply to everyone; it's a specific case on my sites. From long experience, these content grabbers will crawl your website from day one, even before Google. The problem is that my site relies heavily on the data/info I provide to visitors; if a bot grabs/mirrors it to another website or even a social media account, I'm done, as that content is the main traffic driver for my site.
    As for the Cloudflare JS challenge, it's stupid, especially after their recent update; it will scare visitors away. Honestly, I feel comfortable now banning most of these bots' server providers. I don't need them; I need real users reading my site, not bots.

  • rpqurpqu Member
    edited February 6

    @JustPfff said:

    @nikio said:
    Why block bots when you can mess with them instead? Send them to an HTTP tarpit. Or 301 them to fbi.gov when they try visiting /wp-login.php.
    If you're concerned about web scraping, [...] Just an account. If you think an account is hammering your website too much (i.e. a bot registered), just ban it.
    The only traffic I personally block is Port 22 (cuts out 95% of bot activity) and any traffic originating in the EU because fuck the GDPR.
    Exactly! Gotta love Cloudflare with one hand offering a CDN, and with the other hand, showing you a captcha.

    OK guys, as I already said, the issue I have does not apply to everyone; it's a specific case on my sites. From long experience, these content grabbers will crawl your website from day one, even before Google. The problem is that my site relies heavily on the data/info I provide to visitors; if a bot grabs/mirrors it to another website or even a social media account, I'm done, as that content is the main traffic driver for my site.

    Prime time to become a YouTube/TikTok influencer. The #1 dream job for Gen Z & Alpha around the world.

  • JustPfffJustPfff Member
    edited February 6

    @rpqu said: Prime time to become youtube/tiktok influencer. #1 dream job for gen z & alpha around the world

    What was stopping me so far is that I’m not a good-looking guy ( ^^ check my avatar ).
    But… there is AI, and I’ve already started building strong social profiles/pages for my next stage: creating an AI influencer kingdom and cashing out from it.

  • rpqurpqu Member

    @JustPfff said:

    @rpqu said: Prime time to become youtube/tiktok influencer. #1 dream job for gen z & alpha around the world

    What was stopping me so far is that I’m not a good-looking guy ( ^^ check my avatar ).

    Vtuber is ok too

  • @JasonM said:
    I'm also using the Cloudflare firewall to block at the ASN level and country level: China/Russia.

    CF ASN block is supreme. Unfortunately, with a free account, you cannot send custom error messages or redirects to requesters from blocked ASNs.

  •         location = /robots.txt {
               # default_type sets Content-Type once; add_header would send a second Content-Type header
               default_type text/plain;
               return 200 "User-agent: *\nDisallow: /\n";
            }

    Like this, no?
