VPS for data scraping and running Telegram bots
Hello,
I need a VPS for scraping data from Amazon and running some Telegram bots.
1 GB of RAM is fine. The problem is that my script uses Selenium with Chrome to scrape products from Amazon, which makes it a CPU-intensive task.
I've been testing it on Linode's 1 GB plan and the CPU is constantly hitting 90%.
So I need a VPS with a dedicated core, or one where it's allowed to max out the CPU.
My budget is $5-10 per month ($5 is better; it's not a critical business).
I saw BuyVM's $3.00 plan with 1/4 CPU. I don't know if it's allowed to run the CPU at 100% 24/7.
I know that RamNode limits abusive CPU usage, and their VDS plans are too expensive.
I don't care about port speed (at least 20 Mbps), 500 GB of bandwidth is fine, and I need 5 GB of disk space.
Any suggestion please?
Comments
You can run 100% of your 25% 24/7.
If you're needing an actual full thread, then you need a 4GB+ plan.
Francisco
Thanks, Francisco, for your quick response. Do you know if 1/4 of a core can handle a single Chrome instance?
Is the ¼ core @ 3.50 GHz always capped at a max of 0.875 GHz?
We only cap if we have to. We try to let users burst, but if you sit there hammering a CPU, it's going to be cappin'.
Depends on what that Chrome instance is doing. Does it have a ton of plugins, etc.?
I've got plenty of people who use the 1GB instances for browsers/IPMI boxes/etc. and I hear no complaints.
We have a 3-day refund policy, so you can find out quickly whether it's an issue or not.
Francisco
Thanks, Francisco!
I'm using vanilla Chrome, so I think it will be fine.
Thanks for your help!
@Francisco any restock soon for EU plans?
On the 1st when cancellations run.
Francisco
If a customer uses the ¼ core plan 24/7 at 100%, do you cap it at a max of 0.875 GHz?
I'm just trying to understand the meaning of "¼ core dedicated".
I assume you scrape with Python? Selenium is overkill because it's mainly used for web automation. Why not the requests-html library? https://github.com/kennethreitz/requests-html
I'm scraping Amazon, which relies heavily on JavaScript, so requests is not suitable for this purpose.
Still, it's pretty rare to require an actual browser. Any JavaScript either manipulates existing data in the DOM or makes HTTP requests for more data. In either case you can mimic those requests.
For the most part you only need a full-blown browser for rendering the page or getting around anti-bot measures.
Nonetheless, as you allude to, a headless browser consumes a lot of CPU.
requests-html is not the same as requests: it has JavaScript support. You just need to render the page and you're done.
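For reference, the render step looks roughly like this (a sketch; assumes `pip install requests-html`, and the URL is a placeholder). Note that, as pointed out later in the thread, `render()` launches a headless Chromium under the hood, so the CPU cost is similar to Selenium's.

```python
def fetch_rendered(url):
    """Fetch a page and execute its JavaScript via requests-html.

    Third-party dependency imported lazily so the rest of a script can
    still load without it installed.
    """
    from requests_html import HTMLSession
    session = HTMLSession()
    r = session.get(url)
    r.html.render()      # spawns headless Chromium to run the page's JS
    return r.html.html   # HTML after JavaScript execution

if __name__ == "__main__":
    print(fetch_rendered("https://example.com")[:200])
```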
Wow, I'm going to check it out, thanks.
Update: I checked it, and it's based on Chromium to render JavaScript, so it's basically the same as using Selenium.
Of course. But as I mentioned earlier, Selenium is overkill. You can achieve the same thing with requests-html (Chromium only, I guess).
You need a minimum of 4 GB of RAM for full Selenium power, whether Grid or a regular driver. By the way, why don't you go with plain HTTP requests? Much faster and lower memory consumption.
Cheers
Or you can go with GeckoFX (the Firefox core). Much more lightweight, and it has almost all the same abilities Selenium has.
Yep, but Selenium is based on Chromium too... I need to render every page.
I checked requests-html and it calls Chromium every time you call the render() function.
Hi, I tried with a $5 Linode (1 GB) and it's working fine. Now I'm moving everything to BuyVM. I'm loading only one page at a time, and I have about 500 MB of free RAM.
The problem is the CPU.
Go with PhantomJS or Gecko... much better. I hate Chrome, but I also use Chrome for my bots. Mostly, though, I do plain HTTP.
I've already built everything with Django and Selenium. In my next project I'll consider PhantomJS or Gecko.
Thanks for your suggestion.
The $3.50 plan from BuyVM is enough for Selenium and Chrome.
Yes, BuyVM seems nice. I've been using a BuyVM slice for a few days now and getting good results.
Glad it's working well
Francisco
Just in case someone was wondering how many resources Selenium with Chrome requires:
top - 07:37:37 up 19:46, 1 user, load average: 0.88, 0.95, 1.19
Tasks: 87 total, 2 running, 57 sleeping, 0 stopped, 0 zombie
%Cpu(s): 77.2/10.8 88[||||||||||||||||||||||||||||||||||||||||||||||| ]
KiB Mem : 1009168 total, 207728 free, 559980 used, 241460 buff/cache
KiB Swap: 1048572 total, 852732 free, 195840 used. 312432 avail Mem
@Kwoon: Are you running Chrome in headless mode (--disable-gpu --headless)? See "Selenium with Headless Chrome".
Yes, but without the disable-gpu parameter.
Using --disable-gpu increased CPU usage.
Some combination of these flags might help:
--disable-background-networking --disable-background-timer-throttling --disable-breakpad --disable-client-side-phishing-detection --disable-default-apps
--disable-dev-shm-usage --disable-extensions --disable-features=site-per-process --disable-hang-monitor --disable-popup-blocking --disable-prompt-on-repost
--disable-sync --disable-translate --metrics-recording-only --no-first-run --safebrowsing-disable-auto-update
--enable-automation --password-store=basic --use-mock-keychain --user-data-dir=/tmp --hide-scrollbars --mute-audio
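The flag list above can be wired into Selenium like this. A sketch only: it assumes selenium is installed with a matching chromedriver on PATH, and includes just a representative subset of the flags; add or drop flags from the full list to suit.

```python
# A subset of the flags suggested above, as a plain list so it can be
# inspected or reused without selenium installed.
CHROME_FLAGS = [
    "--headless",
    "--disable-background-networking",
    "--disable-background-timer-throttling",
    "--disable-extensions",
    "--disable-sync",
    "--no-first-run",
    "--mute-audio",
    "--hide-scrollbars",
    # --disable-gpu is deliberately left out: the poster above reported it
    # *increased* CPU usage on their box, so benchmark it both ways.
]

def make_driver():
    # Lazy import: selenium is a third-party dependency.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    opts = Options()
    for flag in CHROME_FLAGS:
        opts.add_argument(flag)
    return webdriver.Chrome(options=opts)
```

Measure CPU with `top` before and after toggling individual flags; on a ¼-core slice the differences are easy to see.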
For a smaller project, I would recommend compiling your own binary with optimizations enabled (-O2 -march=native) for significant drops in CPU and memory usage, but Chrome is so fricking huge. You could get a stripped-down binary from the ungoogled-chrome project instead of using the default binaries.
Thanks, guys. I've been using Selenium because I couldn't find any alternative before. Now I know there are more options when it comes to scraping.
Thanks, I'm trying these parameters.