Suggestion needed for API business

Hi, a friend of mine is planning to start an API business and store the data in a MySQL DB. It's at an early stage, so we're planning to run two small OpenVZ VPS nodes, each with 12GB RAM and a 500GB HDD. My friend and I are noobs at load balancing and that kind of tech stuff. Even indexing and partitioning we'd have to do with the help of GPT. Data volume could reach 1 million rows.

Please suggest something plug-and-play with a web GUI, and it should be open source 🤣😂. We only know how to insert data into the DB and write Python code with the help of GPT.

Comments

  • I think you both need to start small.

  • FlamesRunner Member
    edited June 2025

    Eh... I wouldn't put SQL data on a hard disk, your random reads/writes will suffer greatly and your uncached lookups will be much slower.

    I don't want to take away from your ambitions, but you'd do well to learn even the basics before attempting this yourself. God forbid something goes very wrong and you two aren't able to fix it - and there will be times where technical skills are required beyond what an LLM can help you with.

  • Alyx Member, Host Rep

    No matter what you are doing, just don't use OpenVZ in 2025!

  • It's 2025 now, why OpenVZ?
    Change to KVM pls. Hahaha

  • MySQL usually uses master-slave replication for load balancing.

    Do use an NVMe SSD for MySQL.

    I am running a table with over 13,000,000 rows and a fulltext index, where a single query may take 0.3-0.4 seconds on a Ryzen 7900. It used to take 0.8 seconds on a Ryzen 3900 and 1.2 seconds on Contabo.
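
To illustrate the master-slave idea above, here is a minimal sketch, assuming writes go to one primary and reads are spread round-robin over the replicas. The `ReadWriteRouter` class and the host names are hypothetical and not part of MySQL or any library. A real setup also has to cope with replication lag (a read right after a write may be stale), which this ignores.

```python
import itertools


class ReadWriteRouter:
    """Pick a database host per statement: writes to the primary, reads to replicas."""

    READ_KEYWORDS = {"SELECT", "SHOW"}

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def target_for(self, sql):
        words = sql.split()
        keyword = words[0].upper() if words else ""
        if keyword in self.READ_KEYWORDS:
            return next(self._replicas)  # round-robin over the replicas
        return self.primary


# Hypothetical host names, for illustration only.
router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
for statement in ("SELECT * FROM items", "INSERT INTO items VALUES (1)", "SELECT 1"):
    print(statement, "->", router.target_for(statement))
```

With only two nodes, as in the original plan, this would mean one primary and one replica.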

    Thanked by 1siemens
  • If you need ChatGPT for indexing, then you have a bigger problem, my friend.

    Thanked by 1sillycat
  • tototo Member

    @praburam said:
    planning to run two small OpenVZ VPS nodes, each with 12GB RAM and a 500GB HDD

    TNAHosting?
    https://lowendtalk.com/discussion/195596/12gb-ram-500gb-hdd-openvz-5-m-ssd-kvm-from-16-yr-dedi-servers-chicago-il#latest

  • How do you run scrapers without getting IP-blocked by Amazon, FK???

  • What exactly is an API business?

    Thanked by 2xxsl tentor
  • @navneetkk said:

    How do you run scrapers without getting IP-blocked by Amazon, FK???

    😅 Using Tailscale with my home network and turning it off & on in case the IP gets blocked. Planning to set up automatic off and on. My broadband has 3TB of bandwidth each month 😁

    Thanked by 1navneetkk
  • @praburam said:

    @navneetkk said:

    How do you run scrapers without getting IP-blocked by Amazon, FK???

    😅 Using Tailscale with my home network and turning it off & on in case the IP gets blocked. Planning to set up automatic off and on. My broadband has 3TB of bandwidth each month 😁

    I read your comments on Tailscale but I thought there would be more secret sauce 🥲
    Python, Node??
    Cheerio or Puppeteer? 🥲🥲
    How do you manage VPS usage going too high??
    Which providers work best and what minimum specs do you recommend??
    Thank you my boy 😊❤️

  • DediRock Member, Patron Provider

    That's awesome, who's your end user and what other companies are doing something similar?

  • @DediRock said:
    That's awesome, who's your end user and what other companies are doing something similar?

    Just going to offer it cheaply on the RapidAPI marketplace

    Thanked by 1navneetkk
  • @navneetkk said:

    @praburam said:

    @navneetkk said:

    How do you run scrapers without getting IP-blocked by Amazon, FK???

    😅 Using Tailscale with my home network and turning it off & on in case the IP gets blocked. Planning to set up automatic off and on. My broadband has 3TB of bandwidth each month 😁

    I read your comments on Tailscale but I thought there would be more secret sauce 🥲
    Python, Node??
    Cheerio or Puppeteer? 🥲🥲
    How do you manage VPS usage going too high??
    Which providers work best and what minimum specs do you recommend??
    Thank you my boy 😊❤️

    Mostly the one and only Python + Selenium combo...
    Most of the VPSes on my hosting provider are idle, so they won't bother. Right now I'm using @TNAHosting OpenVZ; if I earn more I will upgrade them to KVM 😅. I will buy more VPSes and simply replicate and load balance the traffic, that's my plan.

    Thanked by 1navneetkk
  • xxsl Member, LIR

    What the hell is the "API business" ?

    Thanked by 2tentor dedipromo
  • @xxsl said:
    What the hell is the "API business" ?

    Scrape, store, and sell 😁

    Thanked by 1xxsl
  • xemaps Member

    Choose a well-known SQL database hosting service (resizable as you grow!). That way you will be safe, and you can start with a small Ryzen VPS/VDS for your website (cacheable through Cloudflare if needed) and API C&C without killing it.

    Thanked by 1praburam
  • @xemaps said:
    Choose a well-known SQL database hosting service (resizable as you grow!). That way you will be safe, and you can start with a small Ryzen VPS/VDS for your website (cacheable through Cloudflare if needed) and API C&C without killing it.

    Thanks

  • just do it step by step

    Thanked by 1praburam
  • I'm in the same boat, so let me give you some feedback.
    Dedicated server >>> VPS

    You need to be able to benchmark your API performance in a reliable way.
    If you are on a VPS you never know if it's your queries, your code, or just a crowded VPS node. Big database on a VPS = unpredictable outcome.
    If you can, go for a dedicated server. OVH has some cheap dedicated servers with 2x480GB drives. This should give you way better performance than any VPS around.

    If you can't afford a dedicated server, you need an NVMe-based VPS; throw away the idea of an HDD-based VPS. Netcup root servers are a nice deal for the buck.

    A million rows is not that big a table. Take the cheapest VPS you can find with hourly billing (Hetzner, for example), insert 1,000 or 10,000 sample rows, and check the table size to find out how much space you actually need. Maybe you need way less than you think and a 40GB SSD VPS will be enough.

    It's also not a bad idea to fill the table with 1 million rows of dummy data and check performance to find out what you actually need. You can also use real data. Scraping 1 million records shouldn't be that hard. Back in the day I scraped something like 50 or 100 million users' data from one social site. It was fun ;)

    Good luck ;)

    P.S. Be prepared to learn a lot of stuff along the way, and reserve time for it.
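
The sizing advice above can be tried out before renting anything. This is a rough sketch, assuming Python's built-in `sqlite3` as a stand-in for MySQL and a made-up `products` schema. On-disk sizes will differ under MySQL/InnoDB, so treat the result as an order of magnitude only.

```python
import os
import sqlite3
import tempfile


def estimate_bytes_per_row(sample_rows: int = 10_000) -> float:
    """Insert sample rows into a throwaway database file and return bytes used per row."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.db")
        conn = sqlite3.connect(path)
        # Hypothetical schema; replace with the columns you actually plan to store.
        conn.execute(
            "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, url TEXT)"
        )
        rows = (
            (i, f"product-{i}", i * 0.01, f"https://example.com/item/{i}")
            for i in range(sample_rows)
        )
        conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
        conn.commit()
        conn.close()
        return os.path.getsize(path) / sample_rows


def projected_size_mb(bytes_per_row: float, target_rows: int = 1_000_000) -> float:
    """Scale the per-row footprint up to the target row count, in MiB."""
    return bytes_per_row * target_rows / (1024 * 1024)


per_row = estimate_bytes_per_row()
print(f"~{per_row:.0f} bytes/row, ~{projected_size_mb(per_row):.0f} MB for 1,000,000 rows")
```

On MySQL itself, the `data_length` and `index_length` columns of `information_schema.TABLES` report the same kind of figure for a real table.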

    Thanked by 2navneetkk Hetzner_OL
  • navneetkk Member
    edited June 2025

    @dodheimsgard said:

    I'm in the same boat, so let me give you some feedback.
    Dedicated server >>> VPS

    You need to be able to benchmark your API performance in a reliable way.
    If you are on a VPS you never know if it's your queries, your code, or just a crowded VPS node. Big database on a VPS = unpredictable outcome.
    If you can, go for a dedicated server. OVH has some cheap dedicated servers with 2x480GB drives. This should give you way better performance than any VPS around.

    If you can't afford a dedicated server, you need an NVMe-based VPS; throw away the idea of an HDD-based VPS. Netcup root servers are a nice deal for the buck.

    A million rows is not that big a table. Take the cheapest VPS you can find with hourly billing (Hetzner, for example), insert 1,000 or 10,000 sample rows, and check the table size to find out how much space you actually need. Maybe you need way less than you think and a 40GB SSD VPS will be enough.

    It's also not a bad idea to fill the table with 1 million rows of dummy data and check performance to find out what you actually need. You can also use real data. Scraping 1 million records shouldn't be that hard. Back in the day I scraped something like 50 or 100 million users' data from one social site. It was fun ;)

    Good luck ;)

    P.S. Be prepared to learn a lot of stuff along the way, and reserve time for it.

    Can you give some tips for a noob like me, here or by DM?
    I want to scrape 5000+ products from different e-commerce sites but I am not sure how to get results faster 😕
    It takes a hell of a lot of time, like 5 products per minute..... Should I employ multiple VPSes, or are there some secrets which will yield better results..... And should I use a database, CSV or JSON format??

  • @navneetkk
    You need to decide whether to scrape with a browser or without (ofc without a browser = better performance per buck spent on hardware).
    You need to implement proxies in your code: datacenter, mobile or residential.
    Once that's done, you scale by using multiple processes or multiple threads.
    For storage, a database is the only format that makes sense to me.
    Either multiple VPSes or just one bigger dedicated server. From my experience $20 spent on a dedi can outperform $40-50 spent on VPSes.
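
A minimal sketch of the "scale with multiple threads, store in a database" step described above. It assumes a stubbed `fetch_product` function (hypothetical; a real one would do the HTTP request and parsing) and uses `sqlite3` as a stand-in for whatever database is actually chosen. Threads suit this kind of work because each worker spends most of its time waiting on the network.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def fetch_product(url: str) -> tuple[str, str]:
    """Hypothetical stand-in for a real fetch-and-parse step; returns (url, title)."""
    return url, f"title for {url.rsplit('/', 1)[-1]}"


def scrape_all(urls: list[str], workers: int = 8) -> list[tuple[str, str]]:
    """Run fetch_product over all URLs on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_product, urls))


def store(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> None:
    """Write results from the main thread; the url column doubles as a dedup key."""
    conn.execute("CREATE TABLE IF NOT EXISTS products (url TEXT PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT OR REPLACE INTO products VALUES (?, ?)", rows)
    conn.commit()


# Placeholder URLs; no network access happens in this sketch.
urls = [f"https://example.com/item/{i}" for i in range(100)]
conn = sqlite3.connect(":memory:")
store(conn, scrape_all(urls))
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count, "rows stored")
```

Doing the database writes in one place, after the workers return, avoids sharing a connection across threads.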

    Thanked by 1navneetkk
  • βœ…οΈ Let’s break this into a feasible high-level solution that combines low-level control (BIOS-like orchestration), remote volume management (via NVMe-oF or RDMA), and tight ScyllaDB integration for high-performance operations. The goal is to run a minimal, BIOS-style control layer from a central server to manage and serve ScyllaDB volumes on physical or virtual nodes with maximum throughput and minimal OS overhead. The architecture includes a β€œBIOS-like” control layer built using coreboot, u-root, or a unikernel like OSv, which boots into a stripped-down runtime that connects to a central server, mounts remote volumes, and launches the ScyllaDB node. Volume management is handled centrally via RDMA, SPDK, or NVMe-over-TCP to expose high-speed block devices and assign them to nodes dynamically. Each node runs ScyllaDB directly from the mounted volume with zero OS/container overhead, configured via scylla.yaml and launched with performance-tuned flags. The stack includes a bare-metal runtime, remote volume protocol, central control daemon, PXE boot provisioning, and observability via Prometheus. The boot sequence involves PXE boot, identity fetch, volume assignment, mounting, and Scylla launch. Benefits include zero OS overhead, centralized storage control, modular upgrades, and ideal suitability for edge or datacenter racks. Security is ensured with signed runtimes, TLS provisioning, RBAC on volumes, and watchdog crash recovery. Remote updates are supported via lightweight downloads during boot. This BIOS-inspired runtime is a powerful foundation for private cloud or high-throughput infrastructure, enabling servers to boot, mount remote Scylla volumes, and join a cluster with minimal software and maximum efficiency.

  • @dodheimsgard said:
    @navneetkk
    You need to decide whether to scrape with a browser or without (ofc without a browser = better performance per buck spent on hardware).
    You need to implement proxies in your code: datacenter, mobile or residential.
    Once that's done, you scale by using multiple processes or multiple threads.
    For storage, a database is the only format that makes sense to me.
    Either multiple VPSes or just one bigger dedicated server. From my experience $20 spent on a dedi can outperform $40-50 spent on VPSes.

    Cool

    Thanked by 1navneetkk
  • ahnlak Member

    How the hell does this crap not fall under the "LET is White Hat" rule?

  • tentor Member, Host Rep

    @ahnlak said:
    How the hell does this crap not fall under the "LET is White Hat" rule?

    "Residential proxy" request threads haven't been banned for a while, so I don't even know if this rule still works nowadays

  • ahnlak Member

    @tentor said:

    @ahnlak said:
    How the hell does this crap not fall under the "LET is White Hat" rule?

    "Residential proxy" request threads haven't been banned for a while, so I don't even know if this rule still works nowadays

    True, but "I wanna build a scraping farm" goes even beyond the "residential proxy" nonsense.

    But yeah, it feels more and more like the rules no longer apply.
