Help..! Chinese bots scraping my page

blip1945 · December 2025

Embrace their scraping. Serve an endless auto-generated Markov tarpit of garbage pages. There's Nepenthes and a few others for that.
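For anyone wondering what "Markov garbage" looks like in practice, here is a minimal Python sketch. It is not how Nepenthes works internally; `build_chain`, `babble`, `tarpit_page`, and the seed text are all made up for illustration. The idea is that every page is derived from its URL, and every page links to more pages.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)

def babble(chain, length, rng):
    """Walk the chain for `length` words, jumping to a random word on dead ends."""
    starts = sorted(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)

# Hypothetical seed text; a real tarpit would use a much larger corpus.
SEED_TEXT = "the cheap vps offer is the best vps deal and the deal is cheap"
CHAIN = build_chain(SEED_TEXT)

def tarpit_page(path):
    """Build a garbage page from the URL path, with links to more garbage."""
    rng = random.Random(path)  # same path always gives the same page
    body = babble(CHAIN, 40, rng)
    base = path.rstrip("/")
    links = " ".join(
        f'<a href="{base}/{rng.randrange(10**6)}">more</a>' for _ in range(3)
    )
    return f"<html><body><p>{body}</p>{links}</body></html>"
```

Seeding the generator with the path keeps pages stable across visits, so the crawler sees what looks like a consistent site and keeps following links.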

W1zzardTPU · December 2025

@blip1945 said:
Serve an endless auto-generated Markov tarpit of garbage pages

I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

They will still keep coming though, but with latencies in the 10,000+ ms range

This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

praburam · December 2025

@sillycat said:

@praburam said: my adsense RPM

Oh no! The AI slop isn't making any money! Just don't serve ads to Chinese users.

Anyways, got a link to your site? I'm interested.

I'm making

@yoursunny said:

@praburam said:
So direct entry on the page will not render Google AdSense, is that what you are saying?

Correct.

Any mock in the AdSense code leads to a ban

cold · December 2025

@bokuyaba said:
Hi, I’m Chinese.

Sorry, I don’t really have a solution either. I can’t handle those spam bots coming from them, either. I sometimes use spam bots myself—once someone puts countermeasures in place, I’ll just figure out a new way around them, so there’s basically no solution.

But if it’s Chinese websites scraping and stealing your content, maybe you can add some anti-China content into your pages. That might work—some YouTubers do this.

add photos of Trump

blip1945 · December 2025

@W1zzardTPU said:
I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

They will still keep coming though, but with latencies in the 10,000+ ms range

This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

The point of serving the tarpit is that you don't block them, so there's no issue of legit users being blocked. You just rate limit the main content pages and route to the tarpit if the rate is exceeded, or specifically add the tarpit endpoint as a Disallow entry in robots.txt, and any bots that disrespect robots.txt will crawl the tarpit.
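A rough Python sketch of that routing, under stated assumptions: the `TarpitRouter` class, the `/trap/` prefix, and the limits are hypothetical, and the client key is whatever identifier you trust (an IP here, purely for illustration).

```python
import time
from collections import defaultdict, deque

class TarpitRouter:
    """Sliding-window limiter: clients over the limit get the tarpit, not a block."""

    def __init__(self, max_requests=30, window_seconds=60.0, tarpit_prefix="/trap/"):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.tarpit_prefix = tarpit_prefix
        self.hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def route(self, client_key, path, now=None):
        """Return "tarpit" or "content" for this request."""
        now = time.monotonic() if now is None else now
        # Anything under the robots.txt-disallowed prefix is tarpit by definition.
        if path.startswith(self.tarpit_prefix):
            return "tarpit"
        window = self.hits[client_key]
        window.append(now)
        # Drop timestamps that have fallen out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        return "tarpit" if len(window) > self.max_requests else "content"

# Well-behaved crawlers are told to stay out of the tarpit (hypothetical path).
ROBOTS_TXT = "User-agent: *\nDisallow: /trap/\n"
```

A legit user who trips the limit only sees garbage until the window slides past, so nobody ends up on a permanent blocklist.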

mrTom · December 2025

@OpaqueRegistrant said: What's the actual problem?

in my case the log file grew so big that the server crashed constantly, and that's a 5 Core EPYC 10GB VPS. so i would say that is an actual problem.

W1zzardTPU · December 2025

@blip1945 said:

@W1zzardTPU said:
I've been experimenting with that, most of them are on 1G pipes only, that you can saturate if you have enough bandwidth, i.e. 10G+

They will still keep coming though, but with latencies in the 10,000+ ms range

This increased latency can be a nice signal to identify their pools of IPs, but you'll end up blocking hundreds of thousands of IPs and legit people will start emailing you "hey I'm blocked?" Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

The point of serving the tarpit is that you don't block them, so there's no issue of legit users being blocked. You just rate limit the main content pages and route to the tarpit if the rate is exceeded, or specifically add the tarpit endpoint as a Disallow entry in robots.txt, and any bots that disrespect robots.txt will crawl the tarpit.

Rate limit how? New IP for every single request. Some dumb bots even use a new IP for HTTP redirects which makes them easier to detect, but also affects real users switching networks on the move

and no you can't ratelimit the ASN either, because residential IPs

mrTom · December 2025

@W1zzardTPU said: Turns out they share their home network IP somehow ("free" VPN, "free" money, malware, on some random device)

i tried to research this, it seems there are apps that pay users to resell their actual home isp connection, so there are now (premium priced?) scraping services that use actual residential IPs located in actual homes.
no way of blocking them, because two days later the same IP may be assigned to a legit user.

anyone got more insight on this and how to deal with it?

yoursunny · December 2025

@praburam said:

@yoursunny said:

@praburam said:
So direct entry on the page will not render Google AdSense, is that what you are saying?

Correct.

Any mock in the AdSense code leads to a ban

Use server side condition:

<%
If VibeCheck(Request) Then
  Response.Write ADSENSE_SNIPPET
Else
  Response.Write "You cannot view ads. Please face the wall and think about your life choices."
End If
%>

All your content remains accessible to everyone.
AdSense is only visible to those who made good life choices and passed the vibe check.

tdy0923 · December 2025

Typically, the request header information from bots differs significantly from that of regular users. You can start by implementing rules to block these headers, which should address at least 80% of bots. For the remainder, you can use a WAF (Web Application Firewall) for blocking and bot detection/validation. Complete blocking isn't entirely realistic, but this approach should at least resolve your AdSense issues.
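A minimal Python sketch of header-based scoring. The signals, weights, and thresholds are illustrative guesses rather than a vetted rule set, and `header_suspicion` and `decide` are hypothetical names.

```python
# Substrings of User-Agent values sent by common HTTP libraries (illustrative list).
LIBRARY_TOKENS = ("python-requests", "curl", "scrapy", "go-http-client")

def header_suspicion(headers):
    """Return a score; higher means the request looks less like a mainstream browser."""
    h = {k.lower(): v for k, v in headers.items()}
    ua = h.get("user-agent", "").lower()
    score = 0
    if not ua:
        score += 3  # no User-Agent at all
    if any(token in ua for token in LIBRARY_TOKENS):
        score += 3  # self-identified HTTP library
    for name in ("accept", "accept-language", "accept-encoding"):
        if name not in h:
            score += 1  # mainstream browsers normally send these
    return score

def decide(headers, block_at=3, challenge_at=2):
    """Map the score to an action; borderline cases go to a WAF challenge."""
    score = header_suspicion(headers)
    if score >= block_at:
        return "block"
    if score >= challenge_at:
        return "challenge"
    return "allow"
```

Sending the borderline cases to a challenge rather than a block leaves room for unusual but legitimate clients.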

zbe · December 2025

@bokuyaba said:
Hi, I’m Chinese.

Sorry, I don’t really have a solution either. I can’t handle those spam bots coming from them, either. I sometimes use spam bots myself—once someone puts countermeasures in place, I’ll just figure out a new way around them, so there’s basically no solution.

But if it’s Chinese websites scraping and stealing your content, maybe you can add some anti-China content into your pages. That might work—some YouTubers do this.

hi bro u can Building shadowsocks shared china customer mjj Repatriation to gfw Implement denial of crawling; and I look forward to your sharing to telegram ygkkk group or ticket idc need china gfw blocked ip

b00n · December 2025

Maybe install crowdsec and add some aggressive custom scenarios?

TimboJones · December 2025

@mrTom said:

@OpaqueRegistrant said: What's the actual problem?

in my case the log file grew so big that the server crashed constantly, and that's a 5 Core EPYC 10GB VPS. so i would say that is an actual problem.

After the first 1-2 times, did you think about logrotate?

mrTom · December 2025

@TimboJones said: After the first 1-2 times, did you think about logrotate?

guess what, that's when it crashed...

it's a catalogue type site with ~10000 items in >15 languages. usually no problem, did run on a free webspace without issues for years, but when multiple bots scrape every single page in every language 24/7, it generates a huge load for zero in return.

OpaqueRegistrant · December 2025

@mrTom said:

@OpaqueRegistrant said: What's the actual problem?

in my case the log file grew so big that the server crashed constantly, and that's a 5 Core EPYC 10GB VPS. so i would say that is an actual problem.

So rotate your logs more often and the problem is solved

aRNoLD · December 2025

I set a rule in Cloudflare security to block traffic from China.

Then I activated Bot Fight Mode.

Finally, I enabled I'm Under Attack Mode.

Fourplex · December 2025

@aRNoLD said:
I set a rule in Cloudflare security to block traffic from China.

Then I activated Bot Fight Mode.

Finally, I enabled I'm Under Attack Mode.

This.

While I don't use Cloudflare, this is essential.

vpssh · December 2025

According to my Cloudflare WAF logs, there are always lots of spam requests from Microsoft IPs.

They keep sending requests even after being rejected by WAF rules, and they even switch IPs between different datacenters.

mrTom · December 2025

@vpssh said: According to my Cloudflare WAF logs, there are always lots of spam requests from IPs of Microsoft.

lots of scrapers use IPs from Microsoft and Google, i guess because they hope people will not dare to block them.

JustPfff · December 2025

@davide said: I cannot use the big nuke against him or innocent civilians would be blasted too.

You can write JavaScript code to identify whether it's a real device connected to your page or just a bot running on a virtual machine. That will be stronger than the Cloudflare captcha.

eb1995 · December 2025

I believe the copyright for the original Winnie-the-Pooh book (not the Disney animation) has expired, so you are free to plaster the page with Pooh references. I'm being serious. Hope it helps.

hyperblast · December 2025

@davide said:
I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

i want to hyperblast an ISP / ASN, how can i do it?

@gremeyer said:

Cloudflare UAM (Under Attack Mode) might solve it but it's a nuclear option.

i like nuclear options!

hyperblast · December 2025

@KASSA said:
Many Chinese scrapers originate from Tencent, Alibaba, or ByteDance servers. Create a WAF rule to Challenge or Block traffic from these Autonomous System Numbers:

Tencent: AS45090, AS132203

ByteDance (TikTok/Douyin): AS138699, AS13045

Alibaba: AS37963, AS45102

how can i block a complete ASN?
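One possible approach, assuming Cloudflare custom rules: match on the visitor's AS number and pick Block or Managed Challenge as the action. The Python sketch below only builds the rule expression from the ASNs quoted above. The field name `ip.src.asnum` is, to my knowledge, the current Cloudflare Rules field for this, but treat it as an assumption and check the docs. `asn_rule_expression` is a hypothetical helper.

```python
# ASNs exactly as quoted in this thread; not independently verified here.
ASNS_FROM_THREAD = {
    "Tencent": [45090, 132203],
    "ByteDance": [138699, 13045],
    "Alibaba": [37963, 45102],
}

def asn_rule_expression(asn_groups, field="ip.src.asnum"):
    """Build one rule expression matching any of the listed AS numbers."""
    asns = sorted({asn for group in asn_groups.values() for asn in group})
    values = " ".join(str(asn) for asn in asns)
    return f"({field} in {{{values}}})"

print(asn_rule_expression(ASNS_FROM_THREAD))
# prints: (ip.src.asnum in {13045 37963 45090 45102 132203 138699})
```

The printed string goes into the expression editor of a custom rule. Keeping the list in one place makes it easier to add or drop an ASN later.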

AdamWilliam · December 2025

u can try blocking the country

whynotlearn · December 2025

@davide said:
I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

Oh my, I didn't know that vpspricetracker was created by you. It's an awesome website, thank you for creating it.

Another thing: it's a shame that it's been >100,000 addresses.

This may seem off-topic, but I once thought of scraping vpspricetracker too and ultimately decided not to. I am wondering if there is a way we could download the whole database (SQLite database?) from a public endpoint, since I would love to view the vpspricetracker database locally, run queries and do similar stuff with it, if you don't mind. It's an awesome website and thank you for creating it!

whynotlearn · December 2025

@384_cz said:

@ShadowLurker said:
tried anubis ?

No, please no javascript BS blocking Lynx users

I once read about some guy who said his git server was getting hammered with 600,000 (scraper?) requests, and after he started using Anubis it went down to 600.

He said that although he doesn't like Anubis as a solution, it's understandable because his fake traffic got reduced 1000x.

Food for thought
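As far as I understand it, Anubis makes the browser solve a small proof-of-work puzzle in JavaScript before letting it through, which is cheap for one visitor and expensive at scraper scale. A minimal Python sketch of that general idea (not Anubis's actual code or parameters; `solve` and `verify` are hypothetical names):

```python
import hashlib
from itertools import count

def _digest(challenge, nonce):
    """Hex SHA-256 of the challenge string followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()

def solve(challenge, difficulty):
    """Client side: find the first nonce whose hash starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        if _digest(challenge, nonce).startswith(target):
            return nonce

def verify(challenge, nonce, difficulty):
    """Server side: a single hash checks what took the client many attempts."""
    return _digest(challenge, nonce).startswith("0" * difficulty)
```

Each extra hex digit of difficulty multiplies the expected client work by 16, while verification stays at one hash.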

OpaqueRegistrant · December 2025

@davide said:
I have some guy that for months has been scraping VPS Price Tracker from over 200 ASNs each with at least /24 addresses. I blocked 100,000 addresses and he pops back up from mobile networks, residential, proxies ... looks to be some rented scraping service. Because of the residential ASNs I cannot use the big nuke against him or innocent civilians would be blasted too.

Why don't you reach out to him and say for $200 per month you'll give him all the data. He's probably spending that much on proxies anyway.

Or the evil version: instead of blocking, feed false data.

sillycat · December 2025

@OpaqueRegistrant said: He's probably spending that much on proxies anyway.

Not really. Proxies are 0.3-0.8/gb at scale, and I don't think his webpages are that big.

OpaqueRegistrant · December 2025

@sillycat said:

@OpaqueRegistrant said: He's probably spending that much on proxies anyway.

Not really. Proxies are 0.3-0.8/gb at scale, and I don't think his webpages are that big.

Then there is another idea: when a proxy is detected, insert a GB of whitespace.
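A minimal Python sketch of the padding idea, assuming the response body can be produced as a generator of byte chunks (WSGI-style). How a proxy gets detected is left out, and `whitespace_padding` and `padded_response` are hypothetical names.

```python
def whitespace_padding(total_bytes, chunk_size=64 * 1024):
    """Yield chunks of spaces adding up to exactly total_bytes, never holding it all in memory."""
    chunk = b" " * chunk_size
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_size, remaining)
        yield chunk[:n]
        remaining -= n

def padded_response(html, proxy_detected, padding_bytes=1024 ** 3):
    """Yield the real page, followed by padding only for detected proxies."""
    yield html.encode("utf-8")
    if proxy_detected:
        yield from whitespace_padding(padding_bytes)
```

Keep in mind that if the response goes out gzip- or brotli-compressed, a run of spaces shrinks to almost nothing on the wire, and uncompressed it costs your own bandwidth too.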

hyperblast · December 2025

blocking ASN with cloudflare's free plan works well. however, is there a trick i can use to display a custom error message to “visitors” from the defined ASN without having to switch to a paid cloudflare plan?
