discouraging http bot probes on server

Hi,

I get steady probes to my HTTP server from what appear to be bots. Nothing that's causing any disruption of service. My question: what is the best way to discourage this? They seem to be GETing the same pages over and over, which don't have any meaningful content on my server. Should I just have those routes respond with a 404, or is there another status that's better at convincing the bots to move on?

Thanks,

-Adam

Comments

  • Did you install fail2ban?

  • Return code 418 on requests, along with an image of a teapot. Works every time. Add the words "CONFIDENTIAL GOVERNMENT PUNISHMENT PENALTY AUTHORIZED USERS ONLY" and lots of bots will flag sites like that as government and mark them not to be scanned again.
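
    A minimal nginx sketch of that idea, if nginx is your web server; the matched paths and the /teapot.jpg image are placeholders for whatever your bots keep requesting:

    ```nginx
    # Hypothetical example: answer known probe paths with 418 and a small HTML teapot page.
    # The paths and /teapot.jpg are placeholders; adjust to what actually shows up in your logs.
    location ~ ^/(wp-login\.php|xmlrpc\.php|phpmyadmin) {
        default_type text/html;
        return 418 '<html><body><img src="/teapot.jpg" alt="I am a teapot"><p>CONFIDENTIAL GOVERNMENT PUNISHMENT PENALTY AUTHORIZED USERS ONLY</p></body></html>';
    }
    ```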

  • If you can clearly identify them as bots based on their user agent, redirect them to localhost ;)
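
    If nginx is in front, something along these lines could do it; the user-agent pattern below is purely illustrative, not a vetted list:

    ```nginx
    # Hypothetical example: send requests with scanner-looking user agents back to themselves.
    # Tune the regex to the agents you actually see in your logs.
    if ($http_user_agent ~* "(masscan|zgrab|python-requests|libwww)") {
        return 302 http://127.0.0.1/;
    }
    ```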

  • If you can clearly identify them as bots based on their user agent, redirect them to localhost ;)

    I like this :)

  • @AdamM said:
    They seem to be GETing the same pages over and over, which don't have any meaningful content on my server. Should I just have those routes respond with a 404, or is there another status that's better at convincing the bots to move on?

    It depends on what bots are trying to get at what resources. As I've noted on another thread, I think that common/expected files should get a 2xx code when you simply have no content to return. For spiders that come in scanning for things like PHP exploits, though, I let fail2ban stop the immediate abuse, and then I drop the whole network of the ones that bother me too much directly into the firewall. A 404 isn't going to stop a bot that's intentionally behaving badly.
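
    Roughly what that could look like on the nginx side, assuming nginx and treating the file names as placeholders; the network-wide drops happen in the firewall, outside the web server entirely:

    ```nginx
    # Hypothetical sketch: common/expected files that don't exist get an empty 2xx
    # so well-behaved clients stop retrying them; everything else keeps its normal 404.
    location = /favicon.ico          { access_log off; return 204; }
    location = /apple-touch-icon.png { access_log off; return 204; }
    # Exploit scanners are left to fail2ban, and persistently abusive networks to the firewall.
    ```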

  • Some people go for the whitelist approach. Normally you'd just want Googlebot, Bing, Yandex, possibly the Archive bot to come grab your stuff. Ban everything else, especially so if they didn't look for or obey robots.txt. If you're feeling extra stingy, you'd want to do a reverse and forward DNS lookup for new IPs to ensure the User Agent is what it says it is.

  • joepie91 (Member, Patron Provider):

    If they're existing resources, just let them crawl them. You can always ban individual user agents / crawling patterns if they become problematic. If they're non-existent resources, they should be returning a 404 anyway, regardless of bot behaviour.

    @ricardo said:
    Some people go for the whitelist approach. Normally you'd just want Googlebot, Bing, Yandex, possibly the Archive bot to come grab your stuff. Ban everything else, especially so if they didn't look for or obey robots.txt. If you're feeling extra stingy, you'd want to do a reverse and forward DNS lookup for new IPs to ensure the User Agent is what it says it is.

    That's really not a good idea, and is just going to lead to people pretending to be browsers. No need to help grow monopolies.

  • Oh, I was just paraphrasing someone I know who's dealt with this kind of thing for 15 years. You'd also want to set up a spider trap.

  • Redirect those bots via HTTP 302 to the biggest file you can find on the internet.
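
    In nginx terms that would be something like the sketch below; the path and the target URL are placeholders, not recommendations of any particular file:

    ```nginx
    # Hypothetical example: a probed path gets a 302 pointing at a very large file hosted elsewhere.
    location = /wp-login.php {
        return 302 https://example.com/some-very-large-file.iso;
    }
    ```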

  • sin (Member)

    For my nginx installs I set up fail2ban. I used https://petermolnar.net/secure-wordpress-with-nginx-and-fail2ban/ as a base, then tuned it and added my own custom filters, and it works really well.
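
    As a rough idea of the shape of such a jail (a sketch, not that guide's exact config; "nginx-probes" stands in for a filter you'd write yourself, and the thresholds are placeholders):

    ```ini
    # Hypothetical jail.local entry; the filter name and numbers are placeholders.
    [nginx-probes]
    enabled  = true
    port     = http,https
    filter   = nginx-probes
    logpath  = /var/log/nginx/access.log
    maxretry = 5
    findtime = 600
    bantime  = 86400
    ```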

  • I know it's not that effective, but give robots.txt a go as well.
    It might stop at least a few of them if they're obedient.
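
    A couple of Disallow lines is all it takes; the paths here are placeholders for whatever the bots keep asking for, and only crawlers that actually read robots.txt will care:

    ```
    # Hypothetical robots.txt: discourage crawling of the probed routes.
    User-agent: *
    Disallow: /wp-login.php
    Disallow: /xmlrpc.php
    ```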
