New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.

Comments
Embrace their scraping: serve an endless, auto-generated Markov-chain tarpit of garbage pages. There's Nepenthes and a few others for that.
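A minimal sketch of the Markov tarpit idea in Python, assuming a plain seed text and a hypothetical `/tarpit/` URL prefix. This is not Nepenthes's code, just the general technique: each page is word-salad generated from a word-level Markov chain, plus links that lead only to more tarpit pages.

```python
import html
import random
from collections import defaultdict

def build_chain(text: str) -> dict:
    """Map each word of the seed text to the words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def babble(chain: dict, length: int, rng: random.Random) -> str:
    """Walk the chain for `length` words, restarting at a random word on dead ends."""
    starts = list(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)

def tarpit_page(path: str, chain: dict, n_links: int = 5) -> str:
    """Render one garbage page; every link points deeper into the hypothetical /tarpit/ tree."""
    rng = random.Random(path)  # seeded by URL path, so a page looks stable on re-fetch
    body = html.escape(babble(chain, 80, rng))
    links = " ".join(
        f'<a href="/tarpit/{rng.getrandbits(64):016x}">{html.escape(babble(chain, 3, rng))}</a>'
        for _ in range(n_links)
    )
    return f"<html><body><p>{body}</p>{links}</body></html>"
```

Seeding the generator with the path keeps each URL consistent across fetches, so the tarpit looks like a real, stable site to a crawler, while the link space is effectively unbounded.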
I've been experimenting with that. Most of them are on 1G pipes only, which you can saturate if you have enough bandwidth, i.e. 10G+.
They will still keep coming, though, but with latencies in the 10,000+ ms range.
This increased latency can be a nice signal for identifying their IP pools, but you'll end up blocking hundreds of thousands of IPs, and legit people will start emailing you "hey, I'm blocked?" It turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device).
I'm making
> @yoursunny said:
Any mock in the AdSense code leads to a ban.
Add photos of Trump.
The point of serving the tarpit is that you don't block them, so there's no issue of legit users being blocked. You just rate-limit the main content pages and route to the tarpit when the rate is exceeded, or specifically add the tarpit endpoint to robots.txt as disallowed, so any bots that disrespect robots.txt will crawl the tarpit.
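Both routing options can be sketched in a few lines of Python. This assumes a hypothetical `/tarpit/` prefix and a per-client token bucket; what you use as `client_key` (IP, subnet, cookie) is up to you, and the next reply explains why a plain IP key struggles against rotating proxies.

```python
import time
from dataclasses import dataclass

# robots.txt served by the site: well-behaved crawlers never enter /tarpit/
ROBOTS_TXT = "User-agent: *\nDisallow: /tarpit/\n"

@dataclass
class Bucket:
    tokens: float
    updated: float

class TarpitRouter:
    """Token bucket per client key; over-limit requests go to the tarpit, not a 429."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate    # tokens refilled per second
        self.burst = burst  # maximum bucket size
        self.clock = clock
        self.buckets: dict = {}

    def route(self, client_key: str, path: str) -> str:
        if path.startswith("/tarpit/"):
            return "tarpit"  # anything that ignored robots.txt keeps getting garbage
        now = self.clock()
        bucket = self.buckets.setdefault(client_key, Bucket(self.burst, now))
        # Refill for the elapsed time, capped at the burst size
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return "content"
        return "tarpit"
```

Nobody gets an error page, so a misclassified human just sees odd content for a moment instead of being blocked. A production version would also need to evict idle buckets.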
In my case the log file grew so big that the server crashed constantly, and that's a 5-core EPYC, 10 GB VPS. So I would say it is an actual problem.
Rate-limit how? They use a new IP for every single request. Some dumb bots even use a new IP for HTTP redirects, which makes them easier to detect, but this also affects real users switching networks on the move.
And no, you can't rate-limit the ASN either, because of residential IPs.
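One way to catch the "new IP for the redirect" pattern is to bind a token to the IP that received the 3xx and check it when the redirect is followed. A rough Python sketch, assuming you control where the token goes in the redirect `Location` (how it is attached is left open here); the secret handling is a hypothetical setup.

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # hypothetical per-deployment secret

def redirect_token(client_ip: str) -> str:
    """Token to attach to the redirect target, bound to the IP that got the 3xx."""
    return hmac.new(SECRET, client_ip.encode(), hashlib.sha256).hexdigest()[:16]

def switched_ip_on_redirect(token: str, client_ip: str) -> bool:
    """True if the redirect was followed from a different IP than it was issued to."""
    return not hmac.compare_digest(token, redirect_token(client_ip))
```

As noted above, real mobile users switching networks trip this too, so it is better used as one signal among several than as a hard block.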
I tried to research this. It seems there are apps that pay users to resell their actual home ISP connection, so there are now (premium-priced?) scraping services that use actual residential IPs located in actual homes.
There's no way of blocking them, because two days later the same IP may be assigned to a legit user.
Anyone got more insight into this and how to deal with it?
Use a server-side condition:
All your content remains accessible to everyone.
AdSense is only visible to those who made good life choices and passed the vibe check.
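A minimal Python sketch of that server-side condition. The `X-Bot-Verdict` header is hypothetical, standing in for whatever bot check you trust (WAF verdict, challenge cookie, etc.). The ad snippet itself is emitted unmodified; only the decision to include it happens on the server. Whether this fits AdSense's policies is not something the sketch settles.

```python
AD_SNIPPET = "<!-- unmodified AdSense tags go here -->"  # placeholder

def passes_vibe_check(headers: dict) -> bool:
    """Hypothetical stand-in for a real bot check, e.g. a verdict set by your WAF."""
    return headers.get("X-Bot-Verdict") == "human"

def render_page(content_html: str, headers: dict) -> str:
    # Content is served to everyone; only the ad block is conditional.
    ads = AD_SNIPPET if passes_vibe_check(headers) else ""
    return f"<main>{content_html}</main>{ads}"
```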
Typically, the request headers sent by bots differ significantly from those of regular users. You can start by implementing rules that block based on these headers, which should address at least 80% of bots. For the remainder, you can use a WAF (Web Application Firewall) for blocking and bot detection/validation. Complete blocking isn't entirely realistic, but this approach should at least resolve your AdSense issues.
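A sketch of what header rules might look like, written as a Python scoring function. The weights, the threshold, and the list of tool user agents are illustrative assumptions, not a vetted ruleset, and the sketch does not verify the 80% figure. Scrapers that fully spoof browser headers will pass it.

```python
BOT_UA_MARKERS = ("python-requests", "curl/", "wget/", "go-http-client", "scrapy")

def header_bot_score(headers: dict) -> int:
    """Higher score = more bot-like. Header names are matched case-insensitively."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "").lower()
    score = 0
    if not ua:
        score += 3  # no User-Agent at all
    if any(marker in ua for marker in BOT_UA_MARKERS):
        score += 3  # common HTTP libraries and CLI tools
    if "headlesschrome" in ua:
        score += 2
    if "accept-language" not in h:
        score += 1
    if "accept" not in h:
        score += 1
    if "mozilla/" in ua and "sec-fetch-mode" not in h:
        score += 1  # current browsers send Sec-Fetch-* headers; many scripts don't
    return score

def looks_like_bot(headers: dict, threshold: int = 3) -> bool:
    return header_bot_score(headers) >= threshold
```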
Hi bro, you can build a Shadowsocks server shared with Chinese customers (mjj) so the IP gets reported to and blocked by the GFW, which implements crawl denial. I look forward to you sharing it in the ygkkk Telegram group. Or open a ticket with your IDC saying you need an IP that is blocked by China's GFW.
Maybe install CrowdSec and add some aggressive custom scenarios?
After the first 1-2 times, did you think about logrotate?
Guess what, that's when it crashed...
It's a catalogue-type site with ~10,000 items in more than 15 languages. Usually no problem; it ran on free webspace without issues for years, but when multiple bots scrape every single page in every language 24/7, it generates a huge load for zero in return.
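The load adds up quickly. A back-of-the-envelope sketch in Python, counting item pages only (listing, search, and pagination URLs would add more) and treating the number of bots and crawls per day as hypothetical inputs:

```python
ITEMS = 10_000           # "~10000 items" from the post
LANGUAGES = 15           # ">15 languages", so this is a lower bound
SECONDS_PER_DAY = 86_400

def pages_per_full_crawl(items: int = ITEMS, languages: int = LANGUAGES) -> int:
    """Item pages only; every page exists once per language."""
    return items * languages

def sustained_requests_per_second(bots: int, crawls_per_day: float = 1.0) -> float:
    """Average request rate if `bots` scrapers each re-crawl everything `crawls_per_day` times."""
    return bots * crawls_per_day * pages_per_full_crawl() / SECONDS_PER_DAY
```

That is at least 150,000 requests (and, with default access logging, 150,000 log lines) per bot per full crawl. A single bot re-crawling once a day averages about 1.7 requests per second around the clock, and it scales linearly with the number of bots.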
So rotate your logs more often and the problem is solved.
I set a rule in Cloudflare Security to block traffic from China.
Then I activated Bot Fight Mode.
Finally, I enabled I'm Under Attack Mode.
This.
While I don't use Cloudflare, this is essential.
According to my Cloudflare WAF logs, there are always lots of spam requests from Microsoft IPs.
They keep requesting even after being rejected by WAF rules, and they even switch IPs between different datacenters.
Lots of scrapers use IPs from Microsoft and Google, I guess because they hope people won't dare to block them.
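Blocking those cloud ranges doesn't have to hurt real search crawlers. Google and Microsoft both document verifying their crawlers with a reverse DNS lookup followed by a forward lookup that must return the same IP. A Python sketch of that check follows. The domain suffixes are the ones I know of for Googlebot and Bingbot; the resolvers are injectable so the logic can be tested without network access. `gethostbyname_ex` only covers IPv4.

```python
import socket

CRAWLER_DOMAINS = (".googlebot.com", ".google.com", ".search.msn.com")

def is_verified_crawler(ip: str,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve and confirm the IP."""
    try:
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not host.endswith(CRAWLER_DOMAINS):
        return False  # e.g. a scraper on a cloud VM claiming to be Googlebot
    try:
        _, _, addresses = forward(host)
    except OSError:
        return False
    return ip in addresses  # defeats a forged PTR record pointing at googlebot.com
```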
You can write JavaScript code to identify whether a real device is connecting to your page or just a bot running in a virtual machine; that will be stronger than the Cloudflare CAPTCHA.
I believe the copyright on the original Winnie-the-Pooh has expired, so you are free to plaster the page with Pooh references. I'm being serious. Hope it helps.
I want to hyperblast an ISP / ASN, how can I do it?
> @gremeyer said:
I like nuclear options!
How can I block a complete ASN?
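Blocking an ASN comes down to blocking every prefix it announces. A Python sketch using the standard `ipaddress` module; the ASN and prefixes below are placeholders (AS64500 and the RFC 5737 / RFC 3849 documentation ranges), and in practice you would pull the real prefix list from a routing or IRR data source.

```python
import ipaddress

# Hypothetical: prefixes announced by the ASN(s) you want to block.
BLOCKED_ASN_PREFIXES = {
    64500: ["192.0.2.0/24", "198.51.100.0/24", "2001:db8::/32"],
}

def build_blocklist(asn_prefixes: dict) -> list:
    return [ipaddress.ip_network(p) for prefixes in asn_prefixes.values() for p in prefixes]

BLOCKLIST = build_blocklist(BLOCKED_ASN_PREFIXES)

def is_blocked(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    # IPv4 addresses never match IPv6 networks (and vice versa); `in` just returns False
    return any(addr in net for net in BLOCKLIST)
```

A linear scan is fine for a handful of prefixes. For large lists, the same prefixes loaded into an ipset or nftables set let the firewall do the matching instead.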
You can try country blocking.
Oh my, I didn't know vpspricetracker was created by you. It's an awesome website, thank you for creating it!
Another thing, but it's a shame that it's been >100,000 addresses.
This may seem off-topic, but I once thought of scraping vpspricetracker too and ultimately decided not to. I'm wondering whether there's a way to download the whole database (an SQLite database?) from a public endpoint, if possible, since I would love to view the vpspricetracker database locally and run queries and similar stuff on it, if you don't mind. It's an awesome website and thank you for creating it!
I once read about a guy who said he was getting hammered with 600,000 (scrape?) requests on his git server, and after deploying Anubis it went down to 600.
He said that although he doesn't like Anubis as a solution, it's understandable, because his fake traffic got reduced 1000x.
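Anubis works by making the browser solve a SHA-256 proof-of-work puzzle before it gets the page. Here is a rough Python sketch of that general idea, not Anubis's actual protocol or difficulty encoding. The server issues a random challenge, and the client must find a nonce whose hash has enough leading zero bits.

```python
import hashlib
import secrets

def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first set bit in this byte
        break
    return bits

def new_challenge() -> str:
    return secrets.token_hex(16)

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap server-side check: a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty

def solve(challenge: str, difficulty: int) -> int:
    """Client-side work (JavaScript in the browser, for Anubis): about 2**difficulty hashes."""
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce
```

The asymmetry is the point: verifying costs one hash, solving costs thousands, which barely matters to one visitor but adds up for a scraper hitting every page. A real deployment would also bind challenges to the client and expire them.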
Food for thought
Why don't you reach out to him and say that for $200 per month you'll give him all the data? He's probably spending that much on proxies anyway.
Or the evil version: instead of blocking, feed false data.
Not really. Proxies are 0.3-0.8 per GB at scale, and I don't think his webpages are that big.
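The arithmetic behind that reply, as a small Python sketch. It assumes the per-GB price is in the same currency as the $200 offer and uses decimal units; the page size you plug in is a hypothetical input.

```python
BUDGET = 200.0                    # the $200/month offer
PRICE_LOW, PRICE_HIGH = 0.3, 0.8  # quoted proxy price per GB (currency assumed to match)

def gb_for_budget(budget: float, price_per_gb: float) -> float:
    """How much proxy traffic the budget buys."""
    return budget / price_per_gb

def proxy_cost_per_crawl(pages: int, page_kb: float, price_per_gb: float) -> float:
    """Proxy traffic cost of one full crawl (1 GB = 1,000,000 KB)."""
    gb = pages * page_kb / 1_000_000
    return gb * price_per_gb
```

With a hypothetical 100 KB page and the ~150,000 pages of the catalogue site mentioned earlier in the thread, one full crawl is about 15 GB, i.e. roughly 4.5 to 12 in proxy traffic, while 200 buys about 250 to 667 GB.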
Then there is another idea: when a proxy is detected, insert a GB of whitespace.
Blocking an ASN with Cloudflare's free plan works well. However, is there a trick I can use to display a custom error message to “visitors” from the defined ASN without having to switch to a paid Cloudflare plan?