Suggestion needed for API business

praburam · June 2025

Hi, a friend of mine is planning to start an API business and store the data in a MySQL DB. Since it's at an early stage, we're planning to run two small OpenVZ VPS nodes, each with 12GB RAM and a 500GB HDD. My friend and I are noobs at load balancing. Even indexing and partitioning we'd have to do with the help of GPT. Data volume could reach 1 million rows.

Please suggest something plug and play with a web GUI, and it should be open source 🤣😂. We only know how to insert data into the DB and write Python code with the help of GPT.

Motion3549 · June 2025

I think you both need to start small.

FlamesRunner · June 2025

Eh... I wouldn't put SQL data on a hard disk, your random reads/writes will suffer greatly and your uncached lookups will be much slower.

I don't want to take away from your ambitions, but you'd do well to learn even the basics before attempting this yourself. God forbid something goes very wrong and you two aren't able to fix it - and there will be times where technical skills are required beyond what an LLM can help you with.

Alyx · June 2025

No matter what you are doing, just don't use OpenVZ in 2025!

zhujisou · June 2025

It's 2025 now, why OpenVZ?
Switch to KVM please. Hahaha

Dyingcat · June 2025

MySQL usually uses master-slave replication for balancing.

Do use NVMe SSDs for MySQL.

I am running a table with over 13,000,000 rows and a fulltext index, where a single query may take 0.3-0.4 seconds on a Ryzen 7900. It used to take 0.8 seconds on a Ryzen 3900 and 1.2 seconds on Contabo.
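To illustrate what master-slave balancing means on the application side, here is a minimal sketch of read/write splitting: writes go to the primary, reads rotate across replicas. The class name `ReadWriteRouter` and the host names are hypothetical, and this only picks a server name; it does not open MySQL connections or configure replication.

```python
import itertools


class ReadWriteRouter:
    """Pick a server per SQL statement: writes to the primary, reads round-robin over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads when no replica is configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def pick(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word in ("SELECT", "SHOW"):
            return next(self._read_cycle)
        # INSERT, UPDATE, DELETE, DDL and anything unrecognized go to the primary.
        return self.primary


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

With asynchronous replication a replica can lag behind the primary, so a read that must see a just-written row is usually safer sent to the primary.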

itachikonoha · June 2025

If you need ChatGPT for indexing, then you have a bigger problem, my friend.

tototo · June 2025

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

praburam · June 2025

@tototo said:

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

Yup

navneetkk · June 2025

@praburam said:

@tototo said:

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

Yup

How do you run scrapers without getting IP-blocked by Amazon, FK???

dedipromo · June 2025

What exactly is an API business?

praburam · June 2025

@navneetkk said:

@praburam said:

@tototo said:

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

Yup

How do you run scrappers without getting ip blocked by Amazon, FK???

😅 I'm using Tailscale with my home network and turning it off and on in case the IP gets blocked. Planning to set up automatic on and off. My broadband has 3TB of bandwidth each month 😁

navneetkk · June 2025

@praburam said:

@navneetkk said:

@praburam said:

@tototo said:

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

Yup

How do you run scrappers without getting ip blocked by Amazon, FK???

😅 Using tailscale with my home network and turn off & on incase ip blocked. Planning to setup auto on and off. My broadband has 3TB of bandwidth each month 😁

I read your comments on Tailscale but I thought there would be more secret sauce 🥲
Python, Node??
Cheerio or Puppeteer? 🥲🥲
How do you manage VPS usage going too high??
Which providers work best and what minimum specs do you recommend??
Thank you my boy 😊❤️

DediRock · June 2025

That's awesome, who's your end user and what other companies are doing something similar?

praburam · June 2025

@DediRock said:
That's awesome, who's your end user and what other companies are doing something similar?

Just going to offer it cheap on the RapidAPI marketplace

praburam · June 2025

@navneetkk said:

@praburam said:

@navneetkk said:

@praburam said:

@tototo said:

@praburam said:
planning to run small VPS openvz of 2 nodes each 12GB RAM and 500GB HDD disk

TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

Yup

How do you run scrappers without getting ip blocked by Amazon, FK???

😅 Using tailscale with my home network and turn off & on incase ip blocked. Planning to setup auto on and off. My broadband has 3TB of bandwidth each month 😁

I read your comments on tailscale but I thought there would be more secret sauce 🥲
Python, Node??
Cheerio or Puppeteer? 🥲🥲
How do you manage VPS usage going too high??
Which providers works best and what minimum specs you recommend??
Thank you my boy 😊❤️

The one and only Python + Selenium combo, mostly...
Most of the VPSes at my hosting provider are idle, so they won't bother. Right now I'm using @TNAHosting OpenVZ; if I earn more I will upgrade them to KVM 😅. I will buy VPSes and simply replicate and load balance the traffic, that's my plan

xxsl · June 2025

What the hell is the "API business" ?

praburam · June 2025

@xxsl said:
What the hell is the "API business" ?

Scrape store and sell 😁

xemaps · June 2025

Choose a well-known SQL database hosting service (resizable as you grow!) so you will be safe. You can then start with a small Ryzen VPS/VDS for your web front end (Cloudflare-cacheable if needed) and API C&C without killing it.

praburam · June 2025

@xemaps said:
Choose a well known SQL database hosting (resizable when you grow !), so you will be safe and can use at start a small ryzen VPS/VDS for your web (cloudflare cacheable if needed) and api C&C without kill it.

Thanks

duopluscloud · June 2025

just do it step by step

dodheimsgard · June 2025

I'm in the same boat, let me give you some feedback.
Dedicated server >>> VPS

You need to be able to benchmark your API performance in a reliable way.
If you are on a VPS you never know if it's your queries, your code, or just a crowded VPS node. Big database on a VPS = unpredictable outcome.
If you can, go for a dedicated server. OVH has some cheap dedicated servers with 2x480GB drives. This should give you way better performance than any VPS around.
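As a rough illustration of benchmarking in a repeatable way, the sketch below times a callable many times and reports the median, the 95th percentile, and the maximum. The function name `measure_latency` is made up for this example, and `func` stands in for whatever issues one API request or one query. Running it at different times of day on a VPS is one way to see how much the numbers drift.

```python
import math
import statistics
import time


def measure_latency(func, runs=50):
    """Call func() `runs` times and summarize wall-clock latency in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it.
    p95_index = min(len(samples) - 1, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a function that performs one real request or query.
    print(measure_latency(lambda: sum(range(100_000))))
```

A large gap between the median and the p95 on otherwise identical runs points at the environment (for example a busy node) rather than at the query itself.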

If you can't afford a dedicated server, you need an NVMe-based VPS; throw away the idea of an HDD-based VPS. Netcup root servers are a nice deal for the buck.

A million rows is not that big a table. Take the cheapest VPS you can find with hourly billing (Hetzner, for example), insert 1,000 or 10,000 sample rows, and check the table size to find out how much space you actually need. Maybe you need way less than you think and a 40GB SSD VPS will be enough.

It's also not a bad idea to fill the table with 1 million rows of dummy data and check performance to find out what you actually need. You can also use real data. Scraping 1 million records shouldn't be that hard. Back in the day I scraped something like 50 or 100 million users' data from one social site. It was fun
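The "insert sample rows and check the size" step can be sketched as follows. This uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; the `products` table and its columns are invented for illustration, and SQLite's on-disk layout differs from MySQL's, so the number is only a ballpark. On MySQL itself, the equivalent measurement is `DATA_LENGTH + INDEX_LENGTH` from `information_schema.TABLES`.

```python
import sqlite3


def estimate_bytes_for(target_rows, sample_rows=10_000):
    """Insert sample_rows dummy rows, measure the database size, and scale linearly."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; use your real column types for a meaningful estimate.
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, price REAL, url TEXT)"
    )
    rows = (
        (i, f"Sample product {i}", 9.99 + i % 100, f"https://example.com/item/{i}")
        for i in range(sample_rows)
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    # Database size = number of pages * bytes per page.
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()
    sample_bytes = page_count * page_size
    return sample_bytes * target_rows // sample_rows


if __name__ == "__main__":
    estimate = estimate_bytes_for(1_000_000)
    print(f"Estimated size for 1 million rows: {estimate / 1024 ** 2:.1f} MiB")
```

Linear scaling ignores secondary indexes and row-size variation in real data, so it is worth repeating the measurement with real rows and the indexes you plan to create.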

Good luck

P.S. Be prepared to learn a lot of stuff along the way and reserve time for it.

navneetkk · June 2025

@dodheimsgard said:

I'm in same boat, let me give you some feedback.
Dedicated server >>> vps

You need to be able to benchmark your API performance in reliable way.
If you are on VPS you never know if its your queries, your code or just crowded VPS node. Big database on VPS = unpredictable outcome.
If you can - go for dedicated server. Ovh has some cheap dedicated servers with 2x480gb drives. This should give you way better performance than any VPS around.

If you cant afford dedicated server - you need NVME based vps, throw away idea of hdd based VPS. Netcup root servers are nice deal for the buck.

Milion rows is not that big table. Take cheapest VPS you can find with hourly billing (hetzner for example), insert 1000 or 10000 sample rows, check table size to find out how much space you actually need. Maybe you need way less than you think and 40gb ssd VPS will be enough.

Also its not bad idea fill table with 1 milion rows of some dummy data and check performance to find out what you actually need. You can also use real data. Scraping 1 milion records shoudnt be that hard. Back in time I've scraped something like 50 or 100 milion users data from one social site. It was fun

Good luck

P.S. Prepare that you will need to learn alot stuff along the way and reserve time for this.

Can you give some tips for a noob like me, here or via DM?
I want to scrape 5000+ products from different ecom sites but I am not sure how to get results faster 😕
It takes a hell of a lot of time, like 5 products per minute..... Should I employ multiple VPSes, or are there some secrets which will yield better results..... And should I use a database, CSV, or JSON format??

dodheimsgard · June 2025

@navneetkk
You need to decide whether to scrape with a browser or without (ofc without a browser = better performance per buck spent on hardware).
You need to implement proxies in your code: datacenter, mobile, or residential.
Once that's done, you scale by using multiple processes or multiple threads.
For storage, a database is the only format that makes sense to me.
Either multiple VPSes or just one bigger dedicated server. From my experience, $20 spent on a dedi can outperform $40-50 spent on VPSes.
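The "scale with multiple threads, store in a database" part could look roughly like the sketch below. `fetch_product` is a placeholder that fabricates a title instead of making a real HTTP request, `scrape_all` and the `products` table are hypothetical names, and SQLite is used only so the example is self-contained. Proxy handling is left out.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_product(url):
    """Placeholder for the real HTTP request and parsing; returns (url, title)."""
    return url, f"title for {url}"


def scrape_all(urls, db_path=":memory:", workers=8):
    """Fetch all URLs concurrently, then store the results in one batch."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS products (url TEXT PRIMARY KEY, title TEXT)")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs fetch_product in worker threads and yields results in input order.
        results = list(pool.map(fetch_product, urls))
    # Write from the calling thread only, so the connection is never shared across threads.
    conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?)", results)
    conn.commit()
    return conn


if __name__ == "__main__":
    db = scrape_all([f"https://example.com/p/{i}" for i in range(20)])
    print(db.execute("SELECT COUNT(*) FROM products").fetchone()[0], "rows stored")
```

Threads suit this workload because each worker spends most of its time waiting on the network; the URL primary key makes re-running the job overwrite rows instead of duplicating them.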

natestamm · June 2025

✅️ Let’s break this into a feasible high-level solution that combines low-level control (BIOS-like orchestration), remote volume management (via NVMe-oF or RDMA), and tight ScyllaDB integration for high-performance operations. The goal is to run a minimal, BIOS-style control layer from a central server to manage and serve ScyllaDB volumes on physical or virtual nodes with maximum throughput and minimal OS overhead. The architecture includes a “BIOS-like” control layer built using coreboot, u-root, or a unikernel like OSv, which boots into a stripped-down runtime that connects to a central server, mounts remote volumes, and launches the ScyllaDB node. Volume management is handled centrally via RDMA, SPDK, or NVMe-over-TCP to expose high-speed block devices and assign them to nodes dynamically. Each node runs ScyllaDB directly from the mounted volume with zero OS/container overhead, configured via scylla.yaml and launched with performance-tuned flags. The stack includes a bare-metal runtime, remote volume protocol, central control daemon, PXE boot provisioning, and observability via Prometheus. The boot sequence involves PXE boot, identity fetch, volume assignment, mounting, and Scylla launch. Benefits include zero OS overhead, centralized storage control, modular upgrades, and ideal suitability for edge or datacenter racks. Security is ensured with signed runtimes, TLS provisioning, RBAC on volumes, and watchdog crash recovery. Remote updates are supported via lightweight downloads during boot. This BIOS-inspired runtime is a powerful foundation for private cloud or high-throughput infrastructure, enabling servers to boot, mount remote Scylla volumes, and join a cluster with minimal software and maximum efficiency.

praburam · June 2025

@dodheimsgard said:
@navneetkk
You need to scrape with browser or without (ofc without browser = better performance per buck spent on hardware).
You need to implement proxies in your code, datacenter, mobile or residential.
Once thats done you scale by using multiple processes or multiple threads.
For storage database is only format that makes sense for me.
Either multiple VPSes or just one bigger dedicated server. From my experience $20 spent on dedi can outperform $40-50 spent on vpses.

Cool

ahnlak · June 2025

How the hell does this crap not fall under the "LET is White Hat" rule?

tentor · June 2025

@ahnlak said:
How the hell does this crap not fall under the "LET is White Hat" rule?

"Residential proxy" request threads haven't been banned for a while, so I don't know if this rule is even enforced nowadays

ahnlak · June 2025

@tentor said:

@ahnlak said:
How the hell does this crap not fall under the "LET is White Hat" rule?

"Residential proxy" requests threads are not banned since a while, so I don't even know if this rule even works nowadays

True, but "I wanna build a scraping farm" goes even beyond the "residential proxy" nonsense.

But yeah, it feels more and more like the rules no longer apply.
