New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.


Comments
Pull the plug or just localhost it. Nothing you can do about it. Nothing.
Use Anubis or a similar solution. In Cloudflare you can also enable similar bot protection.
cant fight them
then join them
give in
ref link possible?
Mentally strong people allow bots into all web properties.

When you're scraping and the site has Cloudflare's Under Attack mode enabled, with the puzzle thingie to confirm you're a human, that can really fuck with bots. You should have it enabled.
Or if you have found a way to circumvent it, let me know.
Become a bot yourself.
There's a saying (or at least I'm saying it): if you want to fight something or someone, get in their shoes and become one of them (figuratively). This will put you on par with them, or close to it, when combating them.
Hence, again, become a bot.
Try Cloudflare Pro with the proxy on! I think it will help.
Anubis does require JS and is way too bloated. Just call the police for DDoS!
a lawyer will help you as well
Fix your website. You should be able to handle that much traffic.
But it will block most of the bots.
Just put Nepenthes on your server:
https://zadzmo.org/code/nepenthes/
Demo page (what the AI sees with Nepenthes):
https://zadzmo.org/nepenthes-demo
Or if you want something that uses less CPU but is less evil to the bots, use iocaine:
https://iocaine.madhouse-project.org/
Demo page (what the AI sees with iocaine):
https://poison.madhouse-project.org/
warning from that page:
Indeed (although that points to their software being poorly optimized). That is why iocaine may be a better option, as it does not result in significant CPU load (but it also doesn't slow down bots as much).
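For anyone wondering what a "link maze" tarpit looks like in principle, here is a toy sketch. To be clear, this is not how Nepenthes or iocaine are actually implemented; the function name maze_page and the /maze/ URL prefix are made up for illustration. It only shows the basic idea of generating an endless, deterministic set of pages that link to more generated pages, without storing any of them.

```python
import hashlib
import random


def maze_page(path: str, links: int = 5) -> str:
    """Return a small HTML page whose links lead to more generated pages.

    The page is derived from a hash of the requested path, so the same URL
    always yields the same page and nothing has to be stored on disk.
    """
    # Seed a private RNG from the path so the output is reproducible.
    seed = int.from_bytes(hashlib.sha256(path.encode()).digest()[:8], "big")
    rng = random.Random(seed)

    items = []
    for _ in range(links):
        # Hypothetical URL scheme: every slug is just another maze page.
        slug = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))
        items.append(f'<li><a href="/maze/{slug}">{slug}</a></li>')

    return "<html><body><ul>" + "".join(items) + "</ul></body></html>"


if __name__ == "__main__":
    print(maze_page("/maze/start"))
```

A real deployment would sit behind a path excluded in robots.txt so polite crawlers never enter it, and would probably throttle its responses. Those parts are left out here.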
Are all bots (equally) bad? How do you make sure you only block the bad bots? And no humans?
You can whitelist IP ranges from well-behaved bots. I'm sure Cloudflare allows the Internet Archive and Google spiders, for example. As for people, there are generally three ways: IP reputation, fingerprinting, and proof-of-work.
IP reputation simply works by querying IP databases (or your own database if you're a big CDN like Cloudflare that can see a lot of the internet at once). Fingerprinting is based on analyzing browser behavior. It's possible for bots to get around that by running a genuine browser rather than a lightweight script and interacting with it with something like Selenium, but that's heavier and forces the bot owner to expend more resources. Finally, proof-of-work involves completing a mathematical puzzle. It's automated in JavaScript, but it takes 100% CPU for a fraction of a second. That doesn't bother humans (too much) because it just slightly increases page load time, but it's a huge barrier for bots which would otherwise be able to visit thousands of pages per second.
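To make the proof-of-work part concrete, here is a minimal hashcash-style sketch. It is an assumption about the general shape of such schemes, not the actual algorithm used by Anubis or Cloudflare: the server hands out a random challenge, the client searches for a nonce whose SHA-256 hash starts with a given number of zero bits, and the server checks the answer with a single hash. The names make_challenge, solve, and verify are made up for this example, and the client side is shown in Python rather than JavaScript just to keep everything in one file.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: a random, single-use challenge string."""
    return secrets.token_hex(16)


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check a submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force search, about 2**difficulty hashes on average."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    nonce = solve(challenge, difficulty=16)
    print(challenge, nonce, verify(challenge, nonce, 16))
```

The asymmetry is the point: solving costs the visitor many hashes per page, while checking costs the server one. Each extra bit of difficulty roughly doubles the work for the client.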
I don't bother fighting bots beyond basic measures like auth, rate limits, and firewalling off services that don't need to be open to the internet (e.g. SSH). In this case, I'm only referring to the bots scanning all types of shit, and not targeted scraping or similar.
If it causes noticeable load (e.g. with git crawling), then sure, I'll put in a bit more effort.
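For the rate-limit part, this is roughly the idea as a token bucket. It is only a sketch under my own assumptions (the class name TokenBucket and the numbers are made up); in practice you would more likely use whatever your web server or firewall already provides instead of rolling your own.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client is not penalised
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client address (hypothetical in-memory store).
buckets: Dict[str, TokenBucket] = {}


def allow_request(client_ip: str) -> bool:
    bucket = buckets.setdefault(client_ip, TokenBucket(rate=5.0, capacity=20.0))
    return bucket.allow()
```

The `now` parameter exists only so the behaviour can be checked without sleeping. Note that per-IP limits like this do little against scrapers that spread requests over thousands of addresses.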
It bothers me, I hate Anubis, and I just close the site whenever I see the Anubis page.
What is the resource bottleneck being exhausted?
Everything I do is basically on 2GB of RAM or less, 1 vCPU. Nginx and PHP. No bot specific protection.
What are we afraid of, exactly? That they'll access the publicly accessible data and information I willingly published to be accessible and available to everyone?
I wonder if there’s something like a bot black hole I can create to trap them.
wdym my $12/y vps cant handle the LOAD
Bandwidth. There are many sites right now, especially medium-sized forums and blogs, that are being crippled by the tremendous traffic that these scrapers bring. I doubt many people would care if the bots were polite like web spiders.
I hate it too and often do as well. Unfortunately, the alternative for some sites might be a timeout or 503.
A few posts up.
How can you have the AIRPLANE MODE activated and still be Wifi-connected??..
Just enable airplane mode and then enable wifi...