New type of spam? Getting URL seeding from Indonesia

So I have been trying to figure out what the F is going on with a few results pages that Google has cached for my website. I'm seeing a stupidly large spike in inbound traffic hitting URLs like

model-kebaya-modern-untuk-ibu-hamil.html/model-baju-kebaya-untuk-hamil

and variants. They started out in Indonesia, but now they are global. The visitors always have the shittiest connections and are fucking up my analytics badly. I don't want to filter or block this until I get a better handle on it, though. Anyone else experiencing the same?

Comments

  • I have seen this before, it's not new by any means but it's an annoyance.

    Just curious, is it only Google indexing those nonexistent pages, or are others doing so as well (Bing, etc.)?

  • @Jacob Bing is not. Actually, Bing is about 3 weeks behind, so it might show up later. Yahoo is also indexing the crap links.

    Do you know what the purpose of this is? It's so weird. I have not seen it with other sites. From what I can tell it's not a security risk, but it could be an issue for SEO/SEM.

  • Remove the Google Analytics include in 404 error pages :)

  • Yeah, it might wreck your PR, as these are crap and dead links, although this should be picked up initially anyway.

    In terms of Google, I'm aware that you can disavow individual backlink domains; however, I'm not sure about removing individual index results. It might just be a case of waiting.

    There are two approaches:

    If you're using a CMS, add an htaccess rule that searches for '.html' in the requested URI and redirects if found.

    Guidance: http://stackoverflow.com/questions/3220377/htaccess-redirect-if-url-contains-a-certain-string

    Or find the common location pattern and block those countries entirely. If they're not your audience, you may as well get rid of them.

    I'm not aware of anything else that can be done, really.
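
The first approach above (match '.html' anywhere in the requested URI, then redirect) can be sketched in Python for sites that do not sit behind Apache. This is only an illustration of the matching logic, not the linked htaccess rule: the function names, the `redirect_to` parameter, and the choice to answer 404 when no redirect target is given are all assumptions made for the example.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def is_seeded_url(request_uri: str, marker: str = ".html") -> bool:
    """Return True if the path part of the URI contains the marker.

    Only the path is inspected, so a query string such as ?file=a.html
    does not trigger a match.
    """
    path = urlsplit(request_uri).path
    return marker in path.lower()


def respond(request_uri: str,
            redirect_to: Optional[str] = None) -> Tuple[int, Optional[str]]:
    """Pick a status code and optional Location for a request.

    Hypothetical policy: seeded URLs get a 301 when a redirect target is
    supplied, otherwise a plain 404. Everything else is served normally.
    """
    if is_seeded_url(request_uri):
        if redirect_to is not None:
            return 301, redirect_to
        return 404, None
    return 200, None
```

This only makes sense on a site that, like the ones discussed in this thread, serves no legitimate .html URLs of its own.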

  • @luissousa said:
    Remove the Google Analytics include in 404 error pages :)

    Okay... LMFAO at myself here. So I removed html as a type from my nginx config. Problem in seat here. Not in computer. Conspiracy closed.

  • Probably just a variant of referer spam. As long as you're returning a 404, Google will typically understand. Google doesn't index 404s, though, so if these URLs are showing up in the index you should double-check what you're actually returning, especially in case it's a soft 404.

  • @ricardo said:
    Probably just a variant of referer spam. As long as you're returning a 404, Google will typically understand. Google doesn't index 404s, though, so if these URLs are showing up in the index you should double-check what you're actually returning, especially in case it's a soft 404.

    Yeah, turns out I was the issue. I was not returning ANY 404s due to an overly paranoid nginx conf.
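
A quick way to catch the situation described above (a server that never returns a real 404) is to request a path that cannot exist and look at the final status code. The sketch below uses only the Python standard library. `probe_status`, `verdict`, and the random `.html` path are hypothetical names and choices for this example, and the verdict labels are a rough rule of thumb rather than Google's own definition of a soft 404.

```python
import urllib.error
import urllib.request
import uuid


def probe_status(base_url: str, timeout: float = 10.0) -> int:
    """Request a random path that should not exist; return the final status.

    urllib follows redirects, so a 301 to the homepage ends up as 200 here,
    which is exactly the case worth flagging.
    """
    bogus = f"{base_url.rstrip('/')}/{uuid.uuid4().hex}.html"
    try:
        with urllib.request.urlopen(bogus, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return err.code


def verdict(status: int) -> str:
    """Classify the status returned for a page that does not exist."""
    if status in (404, 410):
        return "ok"
    if 200 <= status < 300:
        return "soft 404"
    return "check manually"
```

Usage would be something like `verdict(probe_status("https://example.com"))`, with your own domain in place of the placeholder.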

  • Or you can serve your best-converting page as the 404 and 403 page. I've yet to implement that myself.

  • <3 Indonesia

  • You can also try to block their IPs

  • Gorbvita said: You can also try to block their IPs

    Whose? Google discovers these links via hyperlinks or other means... there's no visitor involved other than Google.

  • Don't worry. I know Indonesian. Korsi Hosting :D @ChairHosting

  • @ricardo said:

    I'm pretty sure these hyperlinks had some old affiliation with the IP of the current server somehow. At least that's my working guess. It's a PITA, however, that dirtied up my analytics really badly. It is odd, though, that it just kind of "started" one day. It's not like the IP or site is new.

  • I am seeing the same thing on one of my domains. It was just after a WordPress hack attempt. The site was not hacked, but the seeding began the next day. I have been returning 404s, but Google seems to be indexing a lot of them. It stopped a month ago, then yesterday I received a notice of 14,500 404s from Google. They are all porn and sitemap links. I know it is not an old affiliation of the IP, as I have had it for years and the IP is shared, so the default is a different site.

    What I do not understand is why about 200 of the 404 URLs end up in Google's index but the other 14k+ do not.

    Since the site does not use any .html files, a buddy suggested blocking all .html attempts. Does anyone know if this is good or bad?

    -G

  • What I do not understand is why about 200 of the 404 URLs end up in Google index but the other 14k+ do not.

    Grep your log files. A lot of these WP hacks will cloak for a Googlebot IP, host or user-agent, so everyone sees 404 apart from Google.
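
As a rough take on "grep your log files", the sketch below tallies the status codes served for suspicious URLs, split by whether the user agent claims to be Googlebot. It assumes the common "combined" access log format used by default in nginx and Apache. `LOG_LINE`, `statuses_by_client`, and the `.html` marker are made up for this example. It also checks only the user-agent string, which is trivially spoofed, so cloaking keyed on Googlebot's IP or hostname would need a separate reverse DNS check.

```python
import re
from collections import Counter

# Combined log format:
# ip ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def statuses_by_client(lines, path_marker=".html"):
    """Count status codes for matching paths, Googlebot vs everyone else."""
    counts = {"googlebot": Counter(), "other": Counter()}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or path_marker not in m["path"]:
            continue  # unparseable line or a path we are not interested in
        who = "googlebot" if "googlebot" in m["agent"].lower() else "other"
        counts[who][int(m["status"])] += 1
    return counts
```

If the "googlebot" counter shows 200s for URLs where the "other" counter shows only 404s, that is the cloaking pattern described above.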

  • @ricardo I think I am following you here... I did check the log files. Everything shows a 404. Also, after the hack attempt I restored from an earlier image to be safe, and the vulnerable plug-in was patched before going live. Since the site is very small, I was able to look at everything.

    After a good bit of research I did find a number of sites listing my domain followed by the porn .html links. Some sites had thousands. But there's not much I can do about that.

  • That's all that matters. Generally you're looking at attacks to put content/links/malware on your site, but more obscurely there are other reasons, like messing around with Google's knowledge graph.

    As long as you send the appropriate HTTP response and are aware of what content is being served, the rest is just noise.

  • I'll chime back in here. I backtraced and found that a hacked old WordPress site was indeed associated with one of the edge IPs. My solution, since I run an ecommerce site, was to direct all these matches for Muslim garb to a belly dancer's hip scarf! LOL. I have even seen a few conversions so far. Priceless backfire.

  • @jollymon great idea. When this first happened I was getting hit hard from a lot of people in a few countries looking for the pages that were not there. I started using GeoIP and sending them to their National Police (aka FBI type) page.

    @ricardo Thanks for the input, I think I was worried about it more than my client as it is a locally targeted website.

  • I want to necro this for good reason, don't shoot me.

    So IF you are seeing this kind of random traffic, don't assume that it will go away. In fact, my situation got MUCH MUCH worse! Turns out the Indo-asshole pointed their domain to Cloudflare and somehow was able to get all the newly indexed pages indexed under THEIR domain name. 99K-plus new pages are all now associated with that domain. They still have my logo and a complete copy of my content all over them. So now I am in a DMCA loop with Cloudflare and Google over this, plus I set up a 301 redirect, which does handle the content.

    This is bad, as over the past 2 weeks my site's index in Google has stalled. Their index HAS GROWN, and now that content is seen by Google as being original to that domain. My domain DOES NOT show up for any of the KW search terms they are indexed for.

    FUCK!

  • Nginx reverse proxied you? Like http://lowendtalk.me?

    jollymon said: Turns out the Indo-asshole pointed their domain to Cloudflare and somehow was able to get all the newly indexed pages indexed under THEIR domain name.

  • @GM2015 said:
    Nginx reverse proxied you? Like http://lowendtalk.me?

    Reverse proxied me, yes, but they somehow got Cloudflare to do that for them... so since Cloudflare is so fast, guess which one Gbot would rather crawl.

  • GM2015 (Member), edited February 2016

    Are you on WordPress? There's a good plugin called Google XML Sitemaps that will ping the shit out of Googlebot.

    There's a manual function that's guaranteed to get your sitemap crawled. Watch your server log for proof.

    jollymon said: Reverse proxied me, yes, but they somehow got Cloudflare to do that for them... so since Cloudflare is so fast, guess which one Gbot would rather crawl.

  • @GM2015 said:
    Are you on WordPress? There's a good plugin called Google XML Sitemaps that will ping the shit out of Googlebot.

    There's a manual function that's guaranteed to get your sitemap crawled. Watch your server log for proof.

    No, it's a complete ground-up solution here. I can write something to ping Google with my sitemaps, but it's about 10 sitemaps with nearly 750K entries. Google won't G-rawl me at the speed I would like them to.

    You know what I forgot, however: rel=canonical tags.
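
For anyone wondering what the rel=canonical fix looks like on a ground-up site, here is a minimal sketch of generating the tag for a page's `<head>`. The idea is that a mirror which copies the HTML verbatim also copies a tag pointing back at the original domain. `canonical_link` and the `https://shop.example` origin are hypothetical, and dropping the query string is an assumption that will not suit sites where the query selects the content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(origin: str, request_path: str) -> str:
    """Build a <link rel="canonical"> tag that always names the origin host.

    The scheme and host come from `origin`, never from the incoming request,
    so a reverse proxy on another domain cannot change them.
    """
    parts = urlsplit(origin)
    # Keep only the path; query string and fragment are dropped (assumption).
    path = urlsplit(request_path).path or "/"
    url = urlunsplit((parts.scheme, parts.netloc, path, "", ""))
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'
```

A proxy that rewrites the HTML on the fly could still strip or alter the tag, so this is a mitigation rather than a guarantee.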

  • You can probably fuck them off when their reverse proxy updates their site with the new rel=canonical tags.

    I believe you can move sites with them without 301s.

    jollymon said: You know what I forgot, however: rel=canonical tags.

  • @jollymon said:
    I want to necro this for good reason, don't shoot me.

    Indo-asshole

    I 100% understand your frustration, but please never use "Indo-asshole" in those terms,

    because we don't know for sure whether he/she is truly Indonesian..

    and not all Indonesian people are ASSHOLES..

    Maybe he/she is a foreigner doing some black-hat SEO for Indonesian people, or

    maybe a foreigner only using an Indonesian proxy,

    because in LET history there is a foreigner who lives in Bandung, Indonesia, and always makes trouble with the LET community.

    Just my 2 cents, cheers :D

  • jollymon (Member), edited February 2016

    > I 100% understand your frustration, but please never use "Indo-asshole" in those terms,
    > because we don't know for sure whether he/she is truly Indonesian..

    I have the forensics on some attempts here; it's the same person and IP. Don't feel disparaged by my remark. If the person was from any other country, including my own, they would garner the same endearment from me. The fact is that in the past 16 years of my work, about 50% of the issues with attackers originated from Eastern Europe and the remaining 50% from Asia-Pacific regions. It's a well-earned reputation.

    > and not all Indonesian people are ASSHOLES..

    I certainly did not say that.

    > Maybe he/she is a foreigner doing some black-hat SEO for Indonesian people, or
    > maybe a foreigner only using an Indonesian proxy,

    Right... well-earned reputation.

    > because in LET history there is a foreigner who lives in Bandung, Indonesia, and always makes trouble with the LET community.

    What a coincidence! Who would have guessed the same major hub for Asia-Pacific troubles would be the same one that is causing me issues.

    > Just my 2 cents, cheers :D

    Cheers 2 you also.

  • I just have the ngx-test cookie module compiled into NGINX to weed out bots, reverse proxies, search engines (only Baidu and Yandex), layer 7 attacks and POST/GET attacks, since a challenge page is displayed. Basically, if someone mirrors the site, it would give them a nice smexy redirect back to the origin (which in this case is my site).

  • @doghouch said:
    I just have the ngx-test cookie module compiled into NGINX to weed out bots, reverse proxies, search engines (only Baidu and Yandex), layer 7 attacks and POST/GET attacks, since a challenge page is displayed. Basically, if someone mirrors the site, it would give them a nice smexy redirect back to the origin (which in this case is my site).

    I really do love nginx

  • doghouch (Member), edited February 2016

    The company I work at (I won't advertise the link) uses it to weed out layer 7 attacks. It does a good job, since it practically blocks everything unless your browser supports JavaScript/headers and cookies :)

    It is also set to validate the headers sent, which 1) verifies that the client isn't a bot and 2) works as a fallback in case JavaScript is disabled.
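
The general idea behind a cookie challenge can be sketched in a few lines of Python. This is not how the nginx module mentioned above is implemented. It is a simplified illustration in which the server derives a token from the client IP and a secret, hands it out on a challenge page, and only lets requests through once they send it back. `SECRET`, `COOKIE_NAME`, `expected_token`, and `gate` are all hypothetical names.

```python
import hashlib
import hmac

SECRET = b"change-me"      # hypothetical server-side secret
COOKIE_NAME = "challenge"  # hypothetical cookie name


def expected_token(client_ip: str, secret: bytes = SECRET) -> str:
    """Derive the token a given client IP must present."""
    return hmac.new(secret, client_ip.encode(), hashlib.sha256).hexdigest()


def gate(client_ip: str, cookies: dict) -> tuple:
    """Return ("pass", None) or ("challenge", token_to_set).

    A real deployment would deliver the token via a redirect or a small
    JavaScript snippet, which is what filters out clients that cannot
    store cookies or run scripts.
    """
    token = expected_token(client_ip)
    supplied = cookies.get(COOKIE_NAME, "")
    # Constant-time comparison avoids leaking the token through timing.
    if hmac.compare_digest(supplied, token):
        return ("pass", None)
    return ("challenge", token)
```

Because the token is tied to the client IP, a reverse proxy fetching pages from a different address gets the challenge page instead of the content, unless it plays along with the cookie dance.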
