Suggestion needed for API business
Hi, a friend of mine is planning to start an API business and store the data in a MySQL DB. It's at an early stage, so we plan to run two small OpenVZ VPS nodes, each with 12GB RAM and a 500GB HDD. My friend and I are both noobs at load balancing and similar tech. Even indexing and partitioning we would have to do with the help of GPT. Data volume could reach 1 million rows.
Please suggest something plug and play with a web GUI, and it should be open source. We only know how to insert data into the DB and write Python code with the help of GPT.


Comments
I think you both need to start small.
Eh... I wouldn't put SQL data on a hard disk, your random reads/writes will suffer greatly and your uncached lookups will be much slower.
I don't want to take away from your ambitions, but you'd do well to learn even the basics before attempting this yourself. God forbid something goes very wrong and you two aren't able to fix it - and there will be times where technical skills are required beyond what an LLM can help you with.
No matter what you are doing, just don't use OpenVZ in 2025!
It's 2025 now, why OpenVZ?
Switch to KVM, please. Hahaha.
MySQL setups usually use master-slave replication for load balancing.
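A minimal sketch of what master-slave balancing means on the application side, assuming one primary and a couple of read replicas. The class name `ReadWriteRouter` and the host names are hypothetical, and the router only picks a host; it does not open any MySQL connection.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary if no replicas are configured yet.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def pick(self, sql):
        """Return the host that should run this SQL statement."""
        words = sql.split()
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT":
            return next(self._read_cycle)  # round-robin over replicas
        return self.primary  # INSERT/UPDATE/DELETE/DDL go to the primary


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for stmt in ("SELECT * FROM products", "INSERT INTO products VALUES (1)"):
        print(stmt, "->", router.pick(stmt))
```

Replication is normally asynchronous, so a read sent to a replica right after a write may still see the old data.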
Do use NVMe SSD for MySQL.
I am running a table with over 13,000,000 rows and a fulltext index, where a single query may take 0.3-0.4 seconds on a Ryzen 7900. It used to take 0.8 seconds on a Ryzen 3900 and 1.2 seconds on Contabo.
If you need ChatGPT for indexing, then you have a bigger problem, my friend.
TNAHosting?
https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest
Yup
How do you run scrapers without getting your IP blocked by Amazon, FK???
What exactly is an API business?
Using Tailscale with my home network, and I turn it off and on in case the IP gets blocked. Planning to set up automatic off and on. My broadband has 3TB of bandwidth each month.
I read your comments on Tailscale but I thought there would be more secret sauce.
Python, Node??
Cheerio or Puppeteer?
How do you manage VPS usage going too high??
Which providers work best and what minimum specs do you recommend??
Thank you my boy
That's awesome, who's your end user and what other companies are doing something similar?
Just going to offer it cheaply on the RapidAPI marketplace
Python + Selenium combo, mostly...
Most of the VPSes at my hosting provider are idle, so they won't mind. Right now I'm using @TNAHosting OpenVZ; if I earn more, I will upgrade to KVM. My plan is to buy more VPSes, replicate, and load balance the traffic.
What the hell is the "API business" ?
Scrape, store and sell
Choose a well-known SQL database host (one you can resize as you grow!) so your data is safe. Then you can start with a small Ryzen VPS/VDS for your website (cacheable via Cloudflare if needed) and your API C&C without overloading it.
Thanks
just do it step by step
I'm in the same boat, so let me give you some feedback.
Dedicated server >>> VPS
You need to be able to benchmark your API performance in a reliable way.
If you are on a VPS, you never know if it's your queries, your code or just a crowded VPS node. Big database on a VPS = unpredictable outcome.
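One way to make such a benchmark repeatable is to time the same call many times and look at the median and the tail instead of a single run. The sketch below is only an illustration: `benchmark` and `fake_api_call` are hypothetical names, and the fake call stands in for a real request to your API.

```python
import statistics
import time


def benchmark(call, runs=200, warmup=20):
    """Time `call` repeatedly and return latency stats in milliseconds."""
    for _ in range(warmup):
        call()  # warm caches and connections before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_api_call():
    # Hypothetical stand-in for a real request to your API endpoint.
    sum(range(10_000))


if __name__ == "__main__":
    print(benchmark(fake_api_call))
```

If the median stays flat but p95 and max jump around between runs at different times of day, that points at a noisy node rather than at your queries or code.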
If you can, go for a dedicated server. OVH has some cheap dedicated servers with 2x480GB drives. This should give you way better performance than any VPS around.
If you can't afford a dedicated server, you need an NVMe-based VPS; throw away the idea of an HDD-based VPS. Netcup root servers are a nice deal for the money.
A million rows is not that big a table. Take the cheapest VPS you can find with hourly billing (Hetzner, for example), insert 1,000 or 10,000 sample rows, and check the table size to find out how much space you actually need. Maybe you need far less than you think and a 40GB SSD VPS will be enough.
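The sample-and-extrapolate idea can be tried locally before renting anything. This is a rough sketch that uses SQLite from the Python standard library as a stand-in for MySQL, with a made-up `products` schema; real MySQL row and index sizes will differ, so treat the result as an order-of-magnitude guess. On MySQL itself, the `data_length` and `index_length` columns of `information_schema.tables` give the equivalent numbers.

```python
import os
import sqlite3
import tempfile


def estimate_size_bytes(sample_rows=10_000, target_rows=1_000_000):
    """Insert sample rows, measure the DB file, and extrapolate linearly."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.db")
        conn = sqlite3.connect(path)
        # Hypothetical schema; replace with your real columns.
        conn.execute(
            "CREATE TABLE products ("
            "id INTEGER PRIMARY KEY, title TEXT, price REAL, url TEXT)"
        )
        rows = (
            (i, f"Sample product {i}", 9.99 + i, f"https://example.com/p/{i}")
            for i in range(sample_rows)
        )
        conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
        conn.commit()
        conn.close()
        sample_bytes = os.path.getsize(path)
    # Linear extrapolation from the sample to the target row count.
    return sample_bytes * target_rows // sample_rows


if __name__ == "__main__":
    print(f"{estimate_size_bytes() / 1024**2:.0f} MiB estimated for 1M rows")
```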
It's also not a bad idea to fill the table with 1 million rows of dummy data and check performance to find out what you actually need. You can also use real data. Scraping 1 million records shouldn't be that hard. Back in the day I scraped something like 50 or 100 million users' data from one social site. It was fun.
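A dummy-data performance check could look roughly like this. It is a sketch, again using in-memory SQLite as a stand-in for MySQL, with a hypothetical `items` table; it times the same random lookups before and after adding an index. Raise `rows` to 1_000_000 to match the numbers discussed here.

```python
import random
import sqlite3
import time


def build_table(rows=200_000, seed=0):
    """Create an in-memory table filled with dummy data."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, sku INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO items (sku, price) VALUES (?, ?)",
        ((rng.randrange(rows), rng.random() * 100) for _ in range(rows)),
    )
    conn.commit()
    return conn


def time_lookups(conn, rows, lookups=200, seed=1):
    """Return total seconds spent on `lookups` random sku lookups."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(lookups):
        conn.execute(
            "SELECT COUNT(*) FROM items WHERE sku = ?", (rng.randrange(rows),)
        ).fetchone()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 200_000
    conn = build_table(n)
    before = time_lookups(conn, n)  # full table scan per lookup
    conn.execute("CREATE INDEX idx_items_sku ON items (sku)")
    after = time_lookups(conn, n)  # index lookup
    print(f"no index: {before:.3f}s, with index: {after:.3f}s")
```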
Good luck
P.S. Be prepared to learn a lot of stuff along the way, and reserve time for it.
...> @dodheimsgard said:
Can you give some tips for a noob like me, here or via DM?
Wanna scrape 5000+ products from different e-com sites but I am not sure how to get results faster.
It takes a hell of a lot of time, like 5 products per minute... Should I use multiple VPSes, or are there some secrets that will yield better results? And should I use a database, CSV or JSON format??
@navneetkk
You need to decide whether to scrape with a browser or without one (ofc without a browser = better performance per buck spent on hardware).
You need to implement proxies in your code: datacenter, mobile or residential.
Once that's done, you scale by using multiple processes or multiple threads.
For storage, a database is the only format that makes sense to me.
Either multiple VPSes or just one bigger dedicated server. From my experience, $20 spent on a dedi can outperform $40-50 spent on VPSes.
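The "no browser, multiple threads" part of the advice above can be sketched with the standard library alone. The function names `fetch` and `scrape_all` are made up for this example, proxy handling is left out, and `fetch_fn` is injectable so the parser or HTTP client can be swapped later.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch(url, timeout=10):
    """Download one page without a browser and return its HTML."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_all(urls, fetch_fn=fetch, workers=8):
    """Fetch many URLs concurrently; return (results, errors) keyed by URL."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_fn, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one page fails
                errors[url] = exc
    return results, errors
```

Threads fit here because the work is mostly waiting on the network. With 8 workers, a job that handles 5 products per minute one at a time can in principle get close to 8 times that, as long as the target site and your connection keep up.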
Let's break this into a feasible high-level solution that combines low-level control (BIOS-like orchestration), remote volume management (via NVMe-oF or RDMA), and tight ScyllaDB integration for high-performance operations. The goal is to run a minimal, BIOS-style control layer from a central server to manage and serve ScyllaDB volumes on physical or virtual nodes with maximum throughput and minimal OS overhead.
The architecture includes a "BIOS-like" control layer built using coreboot, u-root, or a unikernel like OSv, which boots into a stripped-down runtime that connects to a central server, mounts remote volumes, and launches the ScyllaDB node. Volume management is handled centrally via RDMA, SPDK, or NVMe-over-TCP to expose high-speed block devices and assign them to nodes dynamically. Each node runs ScyllaDB directly from the mounted volume with zero OS/container overhead, configured via scylla.yaml and launched with performance-tuned flags.
The stack includes a bare-metal runtime, remote volume protocol, central control daemon, PXE boot provisioning, and observability via Prometheus. The boot sequence involves PXE boot, identity fetch, volume assignment, mounting, and Scylla launch.
Benefits include zero OS overhead, centralized storage control, modular upgrades, and ideal suitability for edge or datacenter racks. Security is ensured with signed runtimes, TLS provisioning, RBAC on volumes, and watchdog crash recovery. Remote updates are supported via lightweight downloads during boot. This BIOS-inspired runtime is a powerful foundation for private cloud or high-throughput infrastructure, enabling servers to boot, mount remote Scylla volumes, and join a cluster with minimal software and maximum efficiency.
Cool
How the hell does this crap not fall under the "LET is White Hat" rule?
"Residential proxy" request threads haven't been banned for a while, so I don't even know if this rule still applies nowadays
True, but "I wanna build a scraping farm" goes even beyond the "residential proxy" nonsense.
But yeah, it feels more and more like the rules no longer apply.