VPS for data scraping and running telegram bots

Kwoon Member
edited September 2018 in Requests

Hello,
I need a VPS for scraping data from Amazon and running some Telegram bots.

1 GB of RAM is fine. The problem is that my script uses Selenium with Chrome to scrape products from Amazon, which makes it a CPU-intensive task.
I've been testing it on the Linode 1GB plan and the CPU is constantly hitting 90%.

So I need a VPS with a dedicated core, or one where I'm allowed to use the CPU at its maximum.

My budget is $5-10 per month ($5 is better, it's not a critical business).

I saw BuyVM at $3.00 with 1/4 CPU. I don't know if it's allowed to use the CPU 24/7 at 100%.
I know that RamNode limits abusive CPU usage, and their VDS plans are too expensive.

I don't care about port speed (at least 20 Mbps), 500 GB of bandwidth is fine, and I need 5 GB of disk space.

Any suggestions, please?

Comments

  • Francisco Top Host, Host Rep, Veteran

    Kwoon said: saw BuyVM 3.00$ with 1/4 cpu. I don't know if it is allowed to use cpu 24/7 at 100%.

    You can run 100% of your 25% 24/7.

    If you're needing an actual full thread, then you need a 4GB+ plan.

    Francisco

    Thanked by 1vimalware
  • @Francisco said:

    Kwoon said: saw BuyVM 3.00$ with 1/4 cpu. I don't know if it is allowed to use cpu 24/7 at 100%.

    You can run 100% of your 25% 24/7.

    If you're needing an actual full thread, then you need a 4GB+ plan.

    Francisco

    Thanks Francisco for your quick response. Do you know if 1/4 of the core can handle a single chrome instance?

  • @Francisco said:

    Kwoon said: saw BuyVM 3.00$ with 1/4 cpu. I don't know if it is allowed to use cpu 24/7 at 100%.

    You can run 100% of your 25% 24/7.

    If you're needing an actual full thread, then you need a 4GB+ plan.

    Francisco

    is the ¼ Core @ 3.50 GHz always max at 0.875 GHz?

  • Francisco Top Host, Host Rep, Veteran

    @Chuck said:
    is the ¼ Core @ 3.50 GHz always max at 0.875 GHz?

    We only cap if we have to. We try to let users burst but if you sit there rimming a CPU, it's going to be cappin'

    Kwoon said: Thanks Francisco for your quick response. Do you know if 1/4 of the core can handle a single chrome instance?

    Depends what that Chrome instance is doing. Does it have a ton of plugins, etc.

    I got plenty of people that use the 1GB instances for browsers/ipmi boxes/etc and I hear no complaints.

    We have a 3 day refund policy so you can find out quickly if it's an issue or not.

    Francisco

    Thanked by 1vimalware
  • Thanks francisco!
    I'm using vanilla chrome, so I think it will be fine.
    Thanks for your help!

  • @Francisco any restock soon for EU plans?

  • Francisco Top Host, Host Rep, Veteran

    @Kwoon said:
    @Francisco any restock soon for EU plans?

    On the 1st when cancellations run.

    Francisco

    Thanked by 1vimalware
  • Chuck Member
    edited September 2018

    @Francisco said:

    @Chuck said:
    is the ¼ Core @ 3.50 GHz always max at 0.875 GHz?

    We only cap if we have to. We try to let users burst but if you sit there rimming a CPU, it's going to be cappin'

    Kwoon said: Thanks Francisco for your quick response. Do you know if 1/4 of the core can handle a single chrome instance?

    Depends what that Chrome instance is doing. Does it have a ton of plugins, etc.

    I got plenty of people that use the 1GB instances for browsers/ipmi boxes/etc and I hear no complaints.

    We have a 3 day refund policy so you can find out quickly if it's an issue or not.

    Francisco

    If customer uses the ¼ Core plan 24/7 at 100%, do you cap it max at 0.875 GHz?

    I'm just trying to understand the meaning of ¼ Core dedicated.
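
For reference, a rough sketch of the arithmetic behind the 0.875 GHz figure. It assumes a fractional share is enforced as a CPU-time quota per scheduling period; the 100 ms period is an assumption for illustration, not something BuyVM has stated, and `core_share` is a hypothetical helper.

```python
def core_share(clock_ghz, fraction, period_ms=100.0):
    """Return (equivalent clock in GHz, CPU time in ms allowed per period)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # A quarter share of a 3.50 GHz core averages out to 3.50 * 0.25 GHz,
    # or equivalently a quarter of each scheduling period at full clock.
    return clock_ghz * fraction, period_ms * fraction


equivalent_ghz, quota_ms = core_share(3.50, 0.25)
print(f"{equivalent_ghz:.3f} GHz equivalent, {quota_ms:.0f} ms of CPU per 100 ms")
```

Under this reading the core still runs at its full clock whenever the VM is scheduled; it just gets a quarter of the time once a cap is applied. Per Francisco's reply above, the cap only kicks in when it has to, so bursting above the share is possible.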

  • I assume you scrape with Python? Selenium is overkill because it's mainly used for web automation. Why not the requests-html library? https://github.com/kennethreitz/requests-html

    Thanked by 3ehab coreflux r00t4bl3
  • @creep said:
    I assume you scrape with Python? Selenium is overkill because it's mainly used for web automation. Why not the requests-html library? https://github.com/kennethreitz/requests-html

    I’m scraping amazon which heavily relies on JavaScript so requests is not suitable for this purpose

  • ricardo Member
    edited September 2018

    Still, it's pretty rare to require an actual browser. Any javascript manipulates existing data on the DOM or does HTTP requests for more data. In either event you can mimic those requests.

    For the most part you only need a full-blown browser for rendering the page or getting round anti-bot measures.

    Nonetheless, as you allude to... the headless browser consumes a lot of CPU.

    Thanked by 2mfs creep
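
A minimal sketch of what "mimicking those requests" can look like, using only the Python standard library. The endpoint and referer URLs are hypothetical placeholders (you would copy the real ones from the browser's network tab), and `build_api_request` and `fetch_json` are illustrative helper names, not part of any library.

```python
import json
import urllib.request


def build_api_request(url, referer):
    """Build a GET request carrying headers similar to a browser's XHR call."""
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "Accept": "application/json",
        "Referer": referer,
        "X-Requested-With": "XMLHttpRequest",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request, timeout=10):
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint, for illustration only; nothing is sent until fetch_json() is called.
request = build_api_request(
    "https://example.com/api/items?page=1",
    "https://example.com/items",
)
```

This only pays off when the data really comes from a plain HTTP endpoint. If the site needs a rendered page or has anti-bot checks, as ricardo notes, a real browser may still be required.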
  • @Kwoon said:

    I’m scraping amazon which heavily relies on JavaScript so requests is not suitable for this purpose

    requests-html is not the same as requests. It has JavaScript support. You just need to render the page and you're done.

  • Kwoon Member
    edited September 2018

    @creep said:

    requests-html is not the same as requests. It has JavaScript support. You just need to render the page and you're done.

    Wow, I'm going to check it, thanks

    Update: I checked it, and it uses Chromium to render JavaScript, so it's basically the same as using Selenium

  • @Kwoon said:

    Wow, I'm going to check it, thanks

    Update: I checked it, and it uses Chromium to render JavaScript, so it's basically the same as using Selenium

    Of course. But as I mentioned earlier, Selenium is overkill. You can achieve the same thing with requests-html (Chromium only, I guess).

    Thanked by 1vimalware
  • You need at least 4GB of RAM for full Selenium power, whether Grid or a regular driver. Btw, why don't you go with plain HTTP requests? Much faster and far less memory-hungry...

    Cheers

  • Or you can go with GeckoFX (Firefox core)... Much more lightweight, and it has almost all the same abilities as Selenium.

  • Kwoon Member
    edited September 2018

    @creep said:

    Of course. But as I mentioned earlier, Selenium is overkill. You can achieve the same thing with requests-html (Chromium only, I guess).

    Yep, but my Selenium setup is driving Chromium too... I need to render every page....
    I checked requests-html and it calls Chromium every time you call the render() function.

  • @Tumbleguy1 said:
    You need at least 4GB of RAM for full Selenium power, whether Grid or a regular driver. Btw, why don't you go with plain HTTP requests? Much faster and far less memory-hungry...

    Cheers

    Hi, I tried with a $5 Linode (1GB) and it's working fine. Now I'm moving everything to BuyVM. I'm loading only 1 page at a time. I have about 500MB of free RAM.
    The problem is cpu :neutral:

    Thanked by 1creep
  • Go phantomjs or gecko... Much better... I hate chrome but also use chrome for my bots. . but mostly I do http..

    Thanked by 1r00t4bl3
  • @Tumbleguy1 said:
    Go phantomjs or gecko... Much better... I hate chrome but also use chrome for my bots. . but mostly I do http..

    I have already done everything with django and selenium. In my next project I'll consider phantomjs or gecko.
    thanks for your suggestion

    the 3.50$ plan from buyvm is enough for selenium and chrome

  • @Kwoon said:

    I have already done everything with django and selenium. In my next project I'll consider phantomjs or gecko.
    thanks for your suggestion

    the 3.50$ plan from buyvm is enough for selenium and chrome

    Yes. BuyVM seems nice. I've been using a BuyVM slice for a few days and I'm getting good results.

    Thanked by 1Francisco
  • Francisco Top Host, Host Rep, Veteran

    Kwoon said: the 3.50$ plan from buyvm is enough for selenium and chrome

    Glad it's working well :)

    Francisco

  • Just in case someone was wondering how many resources Selenium with Chrome requires:

    top - 07:37:37 up 19:46, 1 user, load average: 0.88, 0.95, 1.19
    Tasks: 87 total, 2 running, 57 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 77.2/10.8 88[||||||||||||||||||||||||||||||||||||||||||||||| ]
    KiB Mem : 1009168 total, 207728 free, 559980 used, 241460 buff/cache
    KiB Swap: 1048572 total, 852732 free, 195840 used. 312432 avail Mem

    Thanked by 1vimalware
  • @Kwoon : Are you running chrome in headless mode (--disable-gpu --headless)

    Selenium with Headless Chrome

  • @rincewind said:
    @Kwoon : Are you running chrome in headless mode (--disable-gpu --headless)

    Selenium with Headless Chrome

    Yes, but without the --disable-gpu parameter

  • @Kwoon said:

    Yes, but without the --disable-gpu parameter

    Using --disable-gpu increased CPU usage :pensive:

  • @Kwoon said:

    Using disable-gpu increased the use of cpu :pensive:

    Some combination of these flags might help:
    --disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-client-side-phishing-detection --disable-default-apps
    --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost
    --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update
    --enable-automation --password-store=basic --use-mock-keychain --user-data-dir=/tmp --hide-scrollbars --mute-audio

    For a smaller project, I would recommend compiling your own binary with optimizations enabled (-O2 -march=native) for significant drops in CPU and memory usage, but Chrome is so fricking huge. You could get a stripped-down binary from the ungoogled-chromium project instead of using the default binaries.

    Thanked by 1nbn
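
A minimal sketch of passing a subset of the flags above to headless Chrome through Selenium's Python bindings. Which flags actually reduce CPU usage is something to measure on your own box; `HEADLESS_FLAGS`, `build_chrome_arguments` and `make_driver` are illustrative names, and `make_driver` assumes the `selenium` package and a matching chromedriver are installed.

```python
# Subset of the flags suggested in this thread; tune the list by measuring CPU usage.
HEADLESS_FLAGS = [
    "--headless",
    "--disable-background-networking",
    "--disable-extensions",
    "--disable-sync",
    "--disable-translate",
    "--disable-default-apps",
    "--mute-audio",
    "--hide-scrollbars",
    "--no-first-run",
]


def build_chrome_arguments(extra_flags=()):
    """Return the base flags plus any extras, duplicates removed, order preserved."""
    seen = set()
    arguments = []
    for flag in list(HEADLESS_FLAGS) + list(extra_flags):
        if flag not in seen:
            seen.add(flag)
            arguments.append(flag)
    return arguments


def make_driver(extra_flags=()):
    """Start Chrome via Selenium with the chosen flags."""
    # Imported lazily so build_chrome_arguments() works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for flag in build_chrome_arguments(extra_flags):
        options.add_argument(flag)
    return webdriver.Chrome(options=options)
```

Keeping the flag list separate from driver creation makes it easy to toggle one flag at a time (for example `--disable-gpu`, which raised CPU usage for Kwoon) and compare `top` output between runs.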
  • Thanks guys. I've been using Selenium because I couldn't find any alternative before. Now I know there are more options when it comes to scraping.

  • @rincewind said:

    Some combination of these flags might help:
    --disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-client-side-phishing-detection --disable-default-apps
    --disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost
    --disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update
    --enable-automation --password-store=basic --use-mock-keychain --user-data-dir=/tmp --hide-scrollbars --mute-audio

    For a smaller project, I would recommend compiling your own binary with optimizations enabled (-O2 -march=native) for significant drops in CPU and memory usage, but Chrome is so fricking huge. You could get a stripped-down binary from the ungoogled-chromium project instead of using the default binaries.

    thanks, I'm trying these parameters
