
Help..! Chinese bots scraping my page

Comments

  • Embrace their scraping. Serve an endless, auto-generated Markov tarpit of garbage pages. There's Nepenthes and a few others for that.
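
For anyone wondering what a "Markov tarpit" looks like in practice, here is a minimal sketch of the idea. It is not how Nepenthes is implemented; the seed text, the `/tarpit/` path and all function names are made up for illustration. Each fake page is filler text generated from a word chain, plus links that only lead to more fake pages.

```python
import random
import zlib
from collections import defaultdict

# Hypothetical seed text; a real tarpit would train on a much larger corpus.
SEED_TEXT = "the quick brown fox jumps over the lazy dog and the quick cat naps"


def build_chain(text):
    """Map each word to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain, length, rng):
    """Random-walk the chain to produce `length` words of filler text."""
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # A word with no recorded successor is a dead end: restart the walk.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)


def tarpit_page(chain, path, words=50, links=3):
    """Render one garbage page whose links only lead to more garbage pages."""
    # Seed from the path so the same URL always returns the same page.
    rng = random.Random(zlib.crc32(path.encode("utf-8")))
    body = babble(chain, words, rng)
    hrefs = [f"/tarpit/{rng.getrandbits(32):08x}" for _ in range(links)]
    anchors = " ".join(f'<a href="{h}">{h}</a>' for h in hrefs)
    return f"<html><body><p>{body}</p><p>{anchors}</p></body></html>"
```

Seeding from the path means nothing has to be stored: the crawler can follow links forever and every URL still returns a stable page.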

  • @blip1945 said:
    Serve endless auto generated markov tarpit garbage page

    I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

    They will still keep coming though, but with latencies in the 10,000+ ms range

    This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

  • @sillycat said:

    @praburam said: my adsense RPM

    Oh no! The AI slop isn't making any money! Just don't serve ads to Chinese users.

    Anyways, got a link to your site? I'm interested.

    @yoursunny said:

    @praburam said:
    So direct entry on the page will not render the google adsense , is that what you are saying ?

    Correct.

    Any mock in adsense code leads to ban

  • @bokuyaba said:
    Hi, I’m Chinese.

    Sorry, I don’t really have a solution either. I can’t handle those spam bots coming from them, either. I sometimes use spam bots myself—once someone puts countermeasures in place, I’ll just figure out a new way around them, so there’s basically no solution.

    But if it’s Chinese websites scraping and stealing your content, maybe you can add some anti-China content into your pages. That might work—some YouTubers do this.

    add photos of Trump

  • blip1945 Member
    edited December 2025

    @W1zzardTPU said:
    I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

    They will still keep coming though, but with latencies in the 10,000+ ms range

    This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

    The point of serving the tarpit is that you don't block them, so there's no issue of legit users being blocked. You just rate limit the main content page and route to the tarpit if the rate is exceeded, or specifically add the tarpit endpoint to robots.txt, and any bots that disrespect robots.txt will crawl the tarpit.

    Thanked by 1: tentor
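
A rough sketch of the routing described above, under some loud assumptions: the window, the request budget, the `/tarpit/` prefix and the `TarpitRouter` name are all hypothetical, and the "client key" is whatever you can actually pin a client to (per-IP keys are weak against rotating proxies).

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60.0     # assumed sliding window
MAX_REQUESTS = 30         # assumed per-client budget inside the window
TRAP_PREFIX = "/tarpit/"  # hypothetical path, listed under Disallow in robots.txt


class TarpitRouter:
    """Decide per request whether to serve real content or the tarpit."""

    def __init__(self, max_requests=MAX_REQUESTS, window=WINDOW_SECONDS):
        self.max_requests = max_requests
        self.window = window
        self._hits = defaultdict(deque)  # client key -> recent request times

    def route(self, client_key, path, now):
        # Anything that ignores robots.txt and enters the trap stays there.
        if path.startswith(TRAP_PREFIX):
            return "tarpit"
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        hits.append(now)
        # Over budget: no block page, just quietly switch to garbage.
        return "tarpit" if len(hits) > self.max_requests else "content"
```

Nobody gets a hard block here. A legit user who trips the limit only sees junk until the window slides past.
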
  • @OpaqueRegistrant said: What's the actual problem?

    in my case the log file grew so big that the server crashed constantly, and that's a 5-core EPYC 10GB VPS. so i would say that is an actual problem.

  • W1zzardTPU Member
    edited December 2025

    @blip1945 said:

    @W1zzardTPU said:
    I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

    They will still keep coming though, but with latencies in the 10,000+ ms range

    This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

    The point of serving the tarpit is you don't block them so theres no issue of legit users being blocked. You just rate limit main content page and route to the tarpit if rate exceeded, or specifically add the tarpit endpoint to robots.txt and any bots that disrespect the robots.txt will crawl the tarpit.

    Rate limit how? New IP for every single request. Some dumb bots even use a new IP for HTTP redirects which makes them easier to detect, but also affects real users switching networks on the move

    and no you can't ratelimit the ASN either, because residential IPs

  • @W1zzardTPU said: Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

    i tried to research this, it seems there are apps that pay users to resell their actual home isp connection, so there are now (premium priced?) scraping services that use actual residential IPs located in actual homes.
    no way of blocking them, because two days later the same IP may be assigned to a legit user.

    anyone got more insight on this and how to deal with it?

  • yoursunny Member, IPv6 Advocate
    edited December 2025

    @praburam said:

    @yoursunny said:

    @praburam said:
    So direct entry on the page will not render the google adsense , is that what you are saying ?

    Correct.

    Any mock in adsense code leads to ban

    Use server side condition:

    <%
    If VibeCheck(Request) Then
      Response.Write ADSENSE_SNIPPET
    Else
      Response.Write "You cannot view ads. Please face the wall and think about your life choices."
    End If
    %>
    

    All your content remains accessible to everyone.
    AdSense is only visible to those who made good life choices and passed the vibe check.

  • Typically, the request header information from bots differs significantly from that of regular users. You can start by implementing rules to block these headers, which should address at least 80% of bots. For the remainder, you can use a WAF (Web Application Firewall) for blocking and bot detection/validation. Complete blocking isn't entirely realistic, but this approach should at least resolve your AdSense issues.
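
As a rough illustration of header-based filtering, here is a small scoring sketch. The heuristics, the weights and the `bot_score` name are assumptions for the example, not a tested rule set. The user-agent tokens are the default strings sent by some common HTTP clients, and any serious scraper will spoof them.

```python
# Default user-agent fragments of common HTTP libraries (easily spoofed).
SCRIPT_UA_TOKENS = ("python-requests", "curl", "scrapy", "go-http-client")


def bot_score(headers):
    """Count how many simple header heuristics a request trips."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "").lower()
    score = 0
    if not ua:
        score += 2  # browsers always send a User-Agent
    if any(token in ua for token in SCRIPT_UA_TOKENS):
        score += 2  # library default, not even trying to hide
    if "accept-language" not in h:
        score += 1  # browsers normally send this
    if "accept-encoding" not in h:
        score += 1
    if ua.startswith("mozilla/") and "accept" not in h:
        score += 1  # claims to be a browser but omits Accept
    return score
```

Above some threshold you could block or challenge the request. Where to put that threshold is a judgment call you would tune against your own logs.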

  • zbe Member
    edited December 2025

    @bokuyaba said:
    Hi, I’m Chinese.

    Sorry, I don’t really have a solution either. I can’t handle those spam bots coming from them, either. I sometimes use spam bots myself—once someone puts countermeasures in place, I’ll just figure out a new way around them, so there’s basically no solution.

    But if it’s Chinese websites scraping and stealing your content, maybe you can add some anti-China content into your pages. That might work—some YouTubers do this.

    hi bro u can Building shadowsocks shared china customer mjj Repatriation to gfw Implement denial of crawling; and I look forward to your sharing to telegram ygkkk group or ticket idc need china gfw blocked ip

    Thanked by 2: sillycat, jnd
  • Maybe install crowdsec and add some aggressive custom scenarios?

  • @mrTom said:

    @OpaqueRegistrant said: What's the actual problem?

    in my case the log file grew so big that the server crashed constantly, and thats a 5 Core EPYC 10GB VPS. so i would say that is an actual problem.

    After the first 1-2 times, did you think about logrotate?

    Thanked by 1: sillycat
  • @TimboJones said: After the first 1-2 times, did you think about logrotate?

    guess what, that's when it crashed...

    it's a catalogue type site with ~10000 items in >15 languages. usually no problem, it ran on a free webspace without issues for years, but when multiple bots scrape every single page in every language 24/7, it generates a huge load for zero in return.

  • @mrTom said:

    @OpaqueRegistrant said: What's the actual problem?

    in my case the log file grew so big that the server crashed constantly, and thats a 5 Core EPYC 10GB VPS. so i would say that is an actual problem.

    So rotate your logs more often and the problem is solved

  • aRNoLD Member
    edited December 2025

    I set a rule in Cloudflare security to block traffic from China.

    Then I activated Bot Fight Mode.

    Finally, I enabled I'm Under Attack mode.

  • Fourplex Member, Host Rep

    @aRNoLD said:
    I set a rule in cloudflare security to block traffic from China.

    Then, activate the bot fighting mode.

    Finally, I'm under attack mode enabled.

    This.

    While I don't use Cloudflare, this is essential.

  • According to my Cloudflare WAF logs, there are always lots of spam requests from IPs of Microsoft.

    They keep requesting even after being rejected by WAF rules, and even switch IPs between different datacenters.

  • @vpssh said: According to my Cloudflare WAF logs, there are always lots of spam requests from IPs of Microsoft.

    lots of scrapers use IPs from Microsoft and Google, i guess because they hope people will not dare to block them.

    Thanked by 2: tentor, oloke
  • JustPfff Member
    edited December 2025

    @davide said: I cannot use the big nuke against him or innocent civilians would be blasted too.

    You can write JavaScript to identify whether a real device is connected to your page or just a bot running on a virtual machine. That will be stronger than the Cloudflare captcha.

  • I believe the copyright for winnie the pooh animation has expired so you are free to plaster the page with pooh references. im being serious. hope it helps

  • @davide said:
    I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

    i want to hyperblast an ISP / ASN, how can i do it?

    @gremeyer said:

    Cloudflare UAM (Under Attack Mode) might solve it but it's a nuclear option.

    i like nuclear options!

  • @KASSA said:
    Many Chinese scrapers originate from Tencent, Alibaba, or ByteDance servers. Create a WAF rule to Challenge or Block traffic from these Autonomous System Numbers:

    Tencent: AS45090, AS132203

    ByteDance (TikTok/Douyin): AS138699, AS13045

    Alibaba: AS37963, AS45102

    how can i block a complete ASN?

    Thanked by 1: 384_cz
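
If you are not behind a WAF that can match on ASN directly, blocking "a complete ASN" comes down to blocking every prefix that ASN announces. A minimal sketch, assuming you already have an ASN-to-prefix table from some routing data source. The table below is placeholder data using documentation-only ASNs and prefixes, not the real ranges of any provider named above.

```python
import ipaddress

# Placeholder table: documentation ASNs and prefixes, NOT real provider data.
ASN_PREFIXES = {
    64496: ["192.0.2.0/24", "2001:db8::/32"],
    64497: ["198.51.100.0/24"],
}
BLOCKED_ASNS = {64496}  # hypothetical choice of which ASNs to block


def build_blocklist(asn_prefixes, blocked_asns):
    """Flatten the prefixes of every blocked ASN into network objects."""
    return [
        ipaddress.ip_network(prefix)
        for asn in sorted(blocked_asns)
        for prefix in asn_prefixes.get(asn, [])
    ]


def is_blocked(ip, networks):
    """True if the address falls inside any blocked prefix."""
    addr = ipaddress.ip_address(ip)
    # Only compare IPv4 with IPv4 and IPv6 with IPv6.
    return any(addr.version == net.version and addr in net for net in networks)
```

A linear scan is fine for a handful of prefixes. With thousands of them you would want a prefix tree or the firewall's own set type (ipset, nftables sets) instead.
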
  • u can try blocking by country

  • @davide said:
    I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

    Oh my I didn't know that vpspricetracker was created by you, its an awesome website, thank you for creating it :p

    Another thing: it's a shame that it's been >100_000 addresses

    This may seem off-topic, but I once thought of scraping vpspricetracker too and ultimately decided not to. I am wondering if there is a way to download the whole database (SQLite database?) from a public endpoint, since I would love to view the vpspricetracker database locally and run queries and similar stuff with it, if you don't mind. It's an awesome website and thank you for creating it!

  • @384_cz said:

    No, please no javascript BS blocking Lynx users

    I once read about some guy who said he was getting hammered with 600_000 (scrape?) requests on his git server, and after using Anubis it went down to 600

    He said that although he doesn't like Anubis as a solution, it's understandable because his fake traffic got reduced 1000x

    Food for thought
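
Challenge tools of this kind generally make the client burn a bit of CPU before it gets the page. Below is a generic hashcash-style sketch of that idea. It is not Anubis's actual protocol, and the challenge string, difficulty and function names are assumptions for illustration.

```python
import hashlib
from itertools import count


def verify(challenge, nonce, difficulty):
    """Server side: one hash to check the client's answer."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge, difficulty):
    """Client side: brute-force a nonce until the hash has enough leading zeros."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

The asymmetry is the point: solving takes about 16^difficulty hashes on average, checking takes one. A single visitor barely notices, but a scraper hitting hundreds of thousands of pages pays that cost on every fresh session.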

  • edited December 2025

    @davide said:
    I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

    Why don't you reach out to him and say for $200 per month you'll give him all the data. He's probably spending that much on proxies anyway.

    Or the evil version: instead of blocking, feed false data.

  • @OpaqueRegistrant said: He's probably spending that much on proxies anyway.

    Not really. Proxies are 0.3-0.8/gb at scale, and I don't think his webpages are that big.

  • @sillycat said:

    @OpaqueRegistrant said: He's probably spending that much on proxies anyway.

    Not really. Proxies are 0.3-0.8/gb at scale, and I don't think his webpages are that big.

    Then there is another idea: when a proxy is detected, insert a GB of whitespace.
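
A sketch of the padding idea, assuming your web framework can stream a response body from a generator (most can). The function name and chunk size are arbitrary.

```python
def whitespace_padding(total_bytes, chunk_size=64 * 1024):
    """Yield `total_bytes` of spaces in chunks, without holding it all in RAM."""
    chunk = b" " * chunk_size
    remaining = total_bytes
    while remaining > 0:
        piece = chunk if remaining >= chunk_size else chunk[:remaining]
        yield piece
        remaining -= len(piece)
```

Two caveats: a run of spaces compresses to almost nothing, so this only costs the scraper per-GB money if the response goes out uncompressed, and every padded response also eats a GB of your own outbound bandwidth.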

  • hyperblast Member
    edited December 2025

    blocking ASN with cloudflare's free plan works well. however, is there a trick i can use to display a custom error message to “visitors” from the defined ASN without having to switch to a paid cloudflare plan?
