Are you web scraping? How can VPN.fail help? :)

vpnfail Member
edited October 2023 in General

As you know, VPN.fail is a project dedicated to helping internet users from countries such as Iran, Russia or China bypass censorship and restrictions and access the free global internet.

Despite our main focus being anti-censorship technologies, our network inevitably also gets used by people from countries where the internet is free; we assume they just want to change their IP address.

So we're considering upgrading our VPN servers to offer a friendly solution for people who are web scraping.

If you're running a web-scraping project, we have some questions to help us better understand how we can help you:

  • How many unique IPs do you need? How often do you change them?

  • How many locations (and what locations) are you utilizing?

  • What type of IPs are you using: data center, residential, or mobile?

  • Tell us about the tech stack you prefer. Do you connect directly to SOCKS/HTTP proxy, do you prefer using a web API, or other technologies?

  • What volumes do you generate when scraping? Traffic / Requests per day / hour

  • What types of websites do you target for your scraping activity?

  • Last but not least, to ensure the scraping can be done without legal issues - what specific data are you interested in scraping?

Thank you 🙏

Comments

  • Welcome aboard 🚀

  • @Ganonk said:
    Welcome aboard 🚀

    🚀🚀🚀 thanks!

  • yoursunny Member, IPv6 Advocate

    @vpnfail said:
    How many unique IPs do you need? How often do you change them?

    Billions.
    Change IP after each request.

    How many locations (and what locations) are you utilizing?

    193 countries and 9999 cities.

    What type of IPs are you using: data center, residential, or mobile?

    Cellular.

    Tell us about the tech stack you prefer. Do you connect directly to SOCKS/HTTP proxy, do you prefer using a web API, or other technologies?

    WireGuard.

    What volumes do you generate when scraping? Traffic / Requests per day / hour

    112 Kbps.

    What types of websites do you target for your scraping activity?

    Dealz.

    Last but not least, to ensure the scraping can be done without legal issues - what specific data are you interested in scraping?

    CP.

    Central Park

  • If you invite scrapers to your network, you risk your human users being blocked from the sites they want to visit.

  • This looks like a genuine call for feedback, so I'll give my honest replies.

    How many unique IPs do you need? How often do you change them?

    Only a handful, really. Something in the range of 10-20 perhaps.
    It depends on how often you plan to reuse different IPs.
    If it's a pool of 20 different IP addresses, shared between all users, they can quickly be marked by various tools.

    How many locations (and what locations) are you utilizing?

    Northern European: Amsterdam and Frankfurt, mainly.
    This is just because of my servers' locations.

    What type of IPs are you using: data center, residential, or mobile?

    Datacenter is fine.
    Residential would open up possibilities for way more use cases, but for the regular scraping I'm doing, a transparent proxy is fine.

    Tell us about the tech stack you prefer. Do you connect directly to SOCKS/HTTP proxy, do you prefer using a web API, or other technologies?

    SOCKS5 is absolutely preferred. It's the most universal option and allows for connecting in many different ways.
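
    For example, routing a request through a SOCKS5 proxy from Python is straightforward with requests (a minimal sketch; the proxy address and credentials below are placeholders):

        # pip install requests[socks]
        import requests

        # socks5h:// resolves DNS through the proxy as well, avoiding DNS leaks
        proxy = "socks5h://user:pass@proxy.example.com:1080"
        proxies = {"http": proxy, "https": proxy}

        resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=30)
        print(resp.json())  # shows the proxy's exit IP, not yours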

    What volumes do you generate when scraping? Traffic / Requests per day / hour

    Very dependent on what I'm actually scraping, but for a low request count, 1-5 per day, and for a high count, 15-20 per minute.

    What types of websites do you target for your scraping activity?

    Online stores and social media.


    I'd like to add that if you plan to start selling premium subscriptions, a good way to get an initial load of cash is to run a lifetime-deal promotion.
    Basically, you sell lifetime licenses to the first early adopters, who will in turn give you feedback.
    I know this is how I prefer to pay for VPNs, at least.

    Thanked by 1: vpnfail
  • @mrTom said:
    If you invite scrapers to your network, you risk your human users being blocked from the sites they want to visit.

    thanks - this is a very valid point. we're considering deploying separate IP addresses for scrapers, as they usually need a big number of IPs. regular VPN users don't mind sharing the same IP; it actually improves their privacy/anonymity

  • @Foxx said:
    This looks like a genuine call for feedback, so I'll give my honest replies.

    highly appreciate the response 🙏 it's very helpful!

    If it's a pool of 20 different IP addresses, shared between all users, they can quickly be marked by various tools.

    fair point about sharing a few IPs with lots of users. IPs should ideally be dedicated to each user. it could work to share the same IP between a very small number of users, if their use cases are compatible and don't cause trouble for each other

    Northern European: Amsterdam and Frankfurt, mainly.
    This is just because of my servers' locations.

    Europe and the US are good for bandwidth costs 👍 Asia is where we see a bit of a challenge

    Datacenter is fine.
    Residential would open up possibilities for way more use cases, but for the regular scraping I'm doing, a transparent proxy is fine.

    we're thinking of starting to test with some extra IPs, which will of course be data center IPs

    SOCKS5 is absolutely preferred. It's the most universal option and allows for connecting in many different ways.

    I'd go for SOCKS5 myself, but still curious how popular API-based solutions are

    Very dependent on what I'm actually scraping, but for a low request count, 1-5 per day, and for a high count, 15-20 per minute.

    your usage sounds more than decent!

    I'd like to add that if you plan to start selling premium subscriptions, a good way to get an initial load of cash is to run a lifetime-deal promotion.
    Basically, you sell lifetime licenses to the first early adopters, who will in turn give you feedback.
    I know this is how I prefer to pay for VPNs, at least.

    thanks - duly noted!

  • @yoursunny said:

    How many locations (and what locations) are you utilizing?

    193 countries

    good to know you don't need a proxy in the Vatican!

    Thanked by 1: yoursunny
  • sillycat Member
    edited October 2023

    @yoursunny said: Change IP after each request.

    FYI, IPv6 proxies exist & changing IP after each request isn't something outlandish for any type of proxy.

  • Banning by a /64 isn't unheard of; in fact, I think it's kinda customary.
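
    A quick sketch with Python's standard ipaddress module shows why rotating inside a single /64 doesn't help (the addresses below are made up):

        import ipaddress

        # Two "different" IPv6 addresses from the same delegated prefix
        a = ipaddress.ip_address("2001:db8:1234:5678::1")
        b = ipaddress.ip_address("2001:db8:1234:5678:dead:beef::2")

        # Collapse each address to its covering /64: both land in the same
        # network, so one /64 ban catches the entire rotation
        net_a = ipaddress.ip_network(f"{a}/64", strict=False)
        net_b = ipaddress.ip_network(f"{b}/64", strict=False)
        print(net_a == net_b)  # True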

    Thanked by 2: analog, vpnfail
  • @vpnfail said: How many unique IPs do you need? How often do you change them?

    If I can get my hands on residential IPs, then most of the time even two or three of them are sufficient for my needs.

  • @vpnfail said: How many unique IPs do you need? How often do you change them?

    1-2 IPs per country.

    How many locations (and what locations) are you utilizing?

    6 in Europe, 4 in the US, 4 in APAC.

    What type of IPs are you using: data center, residential, or mobile?

    Residential. I prefer to pay a friend to keep my Orange Pi / rooted phone / NUC at their place, and then use them as exit nodes. But I do have a backup provider in case I need something on short notice. There are also devices donated for my use, but those don't count as my own property.
    Data center IPs / LET VPSes are for hosting API endpoints.

    Tell us about the tech stack you prefer. Do you connect directly to SOCKS/HTTP proxy, do you prefer using a web API, or other technologies?

    Each client acts independently using a preconfigured setup (just some standard headless Chrome and ArchiveBox instances); API endpoints are available in case a scraper client is looking for "tasks". Output data is sent via Filebeat to Elasticsearch. Traffic is mostly over PPTP / WireGuard.
    I like the flexibility of SoC devices. For the Orange Pi, for example, I created my own custom Armbian image with a specific bootstrap script, à la the Tailscale setup. Plus, they're cheap.
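
    A stripped-down version of the task-polling loop would look something like this (the endpoint names and fields are hypothetical, just to illustrate the flow):

        import time
        import requests

        API = "https://coordinator.example.com"  # hypothetical task endpoint

        while True:
            # Ask the coordinator for the next scraping task, if any
            task = requests.get(f"{API}/tasks/next", timeout=10).json()
            if not task:
                time.sleep(60)
                continue

            # Fetch the target (the real clients drive headless Chrome /
            # ArchiveBox here instead of doing a plain GET)
            page = requests.get(task["url"], timeout=30)

            # Report the text-only result; in production this goes through
            # Filebeat into Elasticsearch rather than straight back over HTTP
            requests.post(f"{API}/tasks/{task['id']}/result",
                          json={"text": page.text}, timeout=10)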

    What volumes do you generate when scraping? Traffic / Requests per day / hour

    20-30 website requests per day per IP, and 1,440 DNS queries per day per IP (i.e., one per minute).

    What types of websites do you target for your scraping activity?

    • Websites known for pushing A/B testing publicly / silently.
    • Random websites, just to look for securitytxt(.)org.
    • Image boards, but keeping text only.
    • The last one isn't exactly a website, but I do gather data about country-wide internet censorship (like DNS poisoning / SSL hijacking; remember DigiNotar?).

    Last but not least, to ensure the scraping can be done without legal issues - what specific data are you interested in scraping?

    I don't really care about legal issues; everyone working on my scraping project has agreed not to share the scraped data with the public. We only need text, no images. Then again, the data is publicly accessible to begin with.
    We don't do any form of cookie/session stuffing to see data behind a login gate, and if it's reCAPTCHA or hCaptcha, we just outsource it to some Indian captcha services.

  • How many unique IPs do you need? How often do you change them?

    The more the better, so I can rotate them per request
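
    In practice that's just cycling through the pool per request, e.g. (placeholder proxy URLs, just a sketch):

        from itertools import cycle
        import requests

        # Placeholder pool; a real one would be far larger
        pool = cycle([
            "socks5h://198.51.100.1:1080",
            "socks5h://198.51.100.2:1080",
            "socks5h://198.51.100.3:1080",
        ])

        def fetch(url):
            # Every call advances to the next proxy in the rotation
            proxy = next(pool)
            return requests.get(url, proxies={"http": proxy, "https": proxy},
                                timeout=30)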

    How many locations (and what locations) are you utilizing?

    No particular preference for my kind of projects

    What type of IPs are you using: data center, residential, or mobile?

    Residential preferably, but it seems any other type would be fine for some of the target sites

    Tell us about the tech stack you prefer. Do you connect directly to SOCKS/HTTP proxy, do you prefer using a web API, or other technologies?

    HTTP

    What volumes do you generate when scraping? Traffic / Requests per day / hour

    It differs, but sometimes 100k pages in a day, and it may be lower

    What types of websites do you target for your scraping activity?

    Corporate websites
    Ecommerce websites

    Last but not least, to ensure the scraping can be done without legal issues - what specific data are you interested in scraping?

    About pages
    Services
    Product pages

    Thanked by 1: vpnfail
  • vpnfail Member
    edited October 2023

    @yusra said: If I can get my hands on residential IPs, then most of the time even two or three of them are sufficient for my needs.

    do you need residential IPs in a specific geolocation, or does any country work?

  • @ScreenReader said:

    • The last one isn't exactly a website, but I do gather data about country-wide internet censorship (like DNS poisoning / SSL hijacking; remember DigiNotar?).

    thanks for your input; monitoring country-wide internet censorship is a very interesting project. do you publish this data somewhere?

    @vpnfail said: do you need residential IPs in a specific geolocation, or does any country work?

    specific countries (most of the time)

    Thanked by 1: vpnfail
  • @vpnfail said:

    @ScreenReader said:

    • The last one isn't exactly a website, but I do gather data about country-wide internet censorship (like DNS poisoning / SSL hijacking; remember DigiNotar?).

    thanks for your input; monitoring country-wide internet censorship is a very interesting project. do you publish this data somewhere?

    no. at the very least, there are no plans yet to publish it to the public. we're still looking for data in high-interest countries (like some countries in South Asia and the Middle East)

    Thanked by 1: vpnfail
  • @yusra said:

    @vpnfail said: do you need residential IPs in a specific geolocation, or does any country work?

    specific countries (most of the time)

    👍

  • @ScreenReader said:

    @vpnfail said:

    @ScreenReader said:

    • The last one isn't exactly a website, but I do gather data about country-wide internet censorship (like DNS poisoning / SSL hijacking; remember DigiNotar?).

    thanks for your input; monitoring country-wide internet censorship is a very interesting project. do you publish this data somewhere?

    no. at the very least, there are no plans yet to publish it to the public. we're still looking for data in high-interest countries (like some countries in South Asia and the Middle East)

    got it. good luck with the project going forward!

    Thanked by 1: ScreenReader