New on LowEndTalk? Please Register and read our Community Rules.

I am having a horrible time with a VPS and its performance - can anybody please help me?

jackboy Member

I've been doing this for 20+ years. I have a live production server on hosting.com (formerly A2); it is one of about a dozen boxes I have running.

It runs MariaDB, Apache2, Redis and some other basic stuff. The server does marketing and generates a lot of data.

Near the end of last week, it slowed almost to a halt, and it has been unusable this week. The host claims nothing is wrong or different on their end.

On my side, I archived millions of rows of data, recompiled crucial binaries to improve their performance, tuned the database and other server settings... everything you can imagine, round and round. The server performs so poorly that things start to cascade and fail after a while.

Queries that ran fine last week on millions MORE rows of data suddenly time out. I've adjusted the buffer pool size (up and down), adjusted swap, and tried every trick in the book I know from 20+ years of doing this.

The performance just gets worse. Nothing I've done over the last 72+ hours has seemed to make a single dent in whatever is going wrong with that server.

It may have brief periods of lucidity where everything seems to run fine, especially after I restart the database and whatnot, but as soon as all the automations start to run, it takes no time at all for the server to act like it is entirely thrashed.

Could this be something on the host side, and they are just not being honest?

What would cause something like this to start happening suddenly?

On our end, we didn't really make any crazy changes to anything - not enough to explain this drop in performance. I was thinking maybe just the total cumulative size of the data + indexes had grown so large that we were having an issue keeping everything in RAM. Archiving millions of rows should have fixed that, but it didn't.
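
A rough way to test the "no longer fits in RAM" theory is to compare the InnoDB data + index size with the buffer pool, and to look at how often page reads miss the pool and go to disk. A minimal sketch, assuming the three inputs are pasted in from the queries listed in the comments (run in the mysql/mariadb client); the helper name and the example figures are hypothetical, not measurements from this server.

```python
# Hypothetical helper for the "data + indexes no longer fit in RAM" theory.
# Inputs are assumed to come from the mysql/mariadb client:
#   SELECT SUM(data_length + index_length) FROM information_schema.TABLES
#     WHERE engine = 'InnoDB';
#   SELECT @@innodb_buffer_pool_size;
#   SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';


def working_set_report(data_plus_index_bytes: int,
                       buffer_pool_bytes: int,
                       pool_reads: int,
                       pool_read_requests: int) -> dict:
    """Return the size ratio and the buffer pool miss ratio.

    pool_reads         -> Innodb_buffer_pool_reads (reads that went to disk)
    pool_read_requests -> Innodb_buffer_pool_read_requests (logical reads)
    Both counters are cumulative since the server last started.
    """
    size_ratio = data_plus_index_bytes / buffer_pool_bytes
    miss_ratio = pool_reads / pool_read_requests if pool_read_requests else 0.0
    return {
        "fits_in_pool": size_ratio <= 1.0,
        "size_ratio": round(size_ratio, 2),
        "miss_ratio_pct": round(100 * miss_ratio, 3),
    }

# Usage with example figures only (12 GiB of data+indexes, 8 GiB pool):
#   print(working_set_report(12 * 1024**3, 8 * 1024**3, 5_000_000, 400_000_000))
```

If the size ratio is above 1 and the miss ratio climbs once the automations start, that would fit the RAM theory; a low miss ratio would point elsewhere (disk, CPU steal, host contention). Because the status counters are cumulative since the last restart, two samples taken a few minutes apart say more than a single reading.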

Because poor performance is what causes the failures, every single cron job, script and query that runs has been thoroughly investigated and optimized. They all look like the culprit when performance is tanking, but I feel like I'm LOSING MY MIND.

I've replaced crucial components with Rust binaries. I've truncated off huge swaths of data. I've cleaned up old logs (tons of free disk, plenty of RAM, CPU doesn't always appear throttled). Nothing makes a difference. If anything, it has gotten WORSE.

I guess what I am really asking is: would it be best to just move to another host and get a different box? I'll have to pull an all-nighter to make sure the migration works, but I just can't physically contend with this server any more. It must be something on their end; I just can't pinpoint what the cause could be. I've never had such a problem before.

Thanks for your time in reading this and any responses I may get.

I'm also in the market for another VPS, I guess :)

Comments

  • oloke Member, Host Rep

    I'm also in the market for another VPS, I guess

    Please tell us your desired specs and acceptable budget :)

  • muddy Member

    Back in the old days, I used A2 Hosting for everything: decent pricing and good-quality support. It looks like they were recently acquired/merged into something that smells a little like the EIG/Newfold Digital situation. I could be totally wrong, but some quick research on Reddit seems to confirm those suspicions. Again, I could be wrong, but if you haven't had good luck with their support recently, it might be wise to look elsewhere. Plenty of other good options are available out there...

  • jackboy Member

    @oloke said:

    I'm also in the market for another VPS, I guess

    Please tell us your desired specs and acceptable budget :)

    $30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

  • 3K33 Member, Host Rep
    edited March 17

    @jackboy said:

    $30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

    Check our plans at https://strike.bz. The PRO line (in SG/PL) is discounted 50% on yearly plans. Our highest plan is Enterprise (in Poland) with 18GB/8vCPU/120GB NVMe disk; with the discount it comes out at 19.50€/mo. If you need more specs, let me know and we can work something out. Discount code: UK6WGYS59N

  • Heron Member

    If you're able, run PassMark PerformanceTest. It tests CPU, RAM and drives, and might show what is underperforming. If not, Geekbench can test the CPU.

  • JabJab Member

    I still have no idea what the issue is.
    IO and storage latency? CPU usage? Packet loss? OOM? Anything?

    I am very confused by the "20+ years of doing this".

  • jackboy Member
    edited March 17

    @JabJab said:
    I still have no idea what the issue is.
    IO and storage latency? CPU usage? Packet loss? OOM? Anything?

    I am very confused by the "20+ years of doing this".

    Well, here is a breakdown:

    I don't have dmesg access (it is in an OpenVZ container) - that limits what I can see.

    r_await is always listed as 0.00 and the %util is always reporting 0.00

    iostat is saying 10,980-11,701 reads/second at 4KB each with ~44-47 MB/s throughput

    %system is 43-45%, so nearly half of the CPU time is spent in the kernel (system calls, I/O).

    I don't have a GUI for PassMark, but I did run sysbench.

    CPU is at 1,380 events/s
    Memory is at 3,124 MB/s
    Disk is at 22.8 MB/s, 1,458 IOPS
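
    The iostat and sysbench disk figures above can be cross-checked with throughput ≈ IOPS × request size. A small sketch of that arithmetic, assuming binary units (4 KB = 4 KiB, MB = MiB). The post doesn't say which sysbench options were used, so the implied block size is only being compared with sysbench fileio's default --file-block-size of 16384 bytes.

```python
# Throughput ≈ IOPS × request size, used to cross-check the numbers above.
# Units are assumed to be binary (KiB / MiB).

KIB = 1024
MIB = 1024 * 1024


def throughput_mib_s(iops: float, request_bytes: int) -> float:
    """MiB/s moved by `iops` requests of `request_bytes` each."""
    return iops * request_bytes / MIB


def implied_request_kib(throughput_mib: float, iops: float) -> float:
    """Average request size in KiB implied by a throughput and an IOPS figure."""
    return throughput_mib * MIB / iops / KIB


if __name__ == "__main__":
    # iostat: 10,980-11,701 reads/s at 4 KiB each
    for iops in (10_980, 11_701):
        print(f"{iops} x 4 KiB = {throughput_mib_s(iops, 4 * KIB):.1f} MiB/s")
    # sysbench fileio: 22.8 MiB/s at 1,458 IOPS
    print(f"implied block size: {implied_request_kib(22.8, 1_458):.1f} KiB")
```

    This gives 42.9 and 45.7 MiB/s for the iostat range (about 45-48 MB/s in decimal units, in line with the ~44-47 MB/s reported) and an implied sysbench block size of 16.0 KiB. So the container was sustaining roughly 11k small reads per second while %util and r_await read 0.00; inside an OpenVZ container those two columns may simply not be populated, so they shouldn't be taken to mean the disk is idle.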

    If you want me to check or run anything else, let me know.

    It manifests as the box just being unusable. Queries time out, stuff fails to load, SSH is slow/unresponsive - it is across the board. I've tried restarts, disabling services, truncating huge swaths of data, adjusting settings... I've even disabled my replication over to the slave server. I've rewritten critical components into precompiled Rust binaries.

    I've been developing proprietary software for companies for 20+ years, using similar setups, often with exponentially MORE data and heavier workloads. While this particular server is busy, it handles a small fraction of the traffic and activity of some of my other projects, which are not struggling like this.

    That is why I say I am losing my mind over this. I'm not sure what else I could possibly do. I have a whole bag of tricks for when I need to juice more performance out of a box, but no matter what I do on this server, it just seems to get WORSE.

    This started suddenly last week. It seems to clear up sometimes, but any brief periods of reprieve quickly crumble, and before I know it the system is slammed again and can't do anything. Since it is too slow to complete the workloads, stuff piles up and the problem cascades. I've spent three days addressing individual "issues", none of which turned out to be the culprit.

    I even contacted the host and told them what I was going through, but they claim nothing is wrong on their end, which sent me back to trying more stuff: more configuration changes, more data removed, more processes stopped. All for naught, as nothing has remedied the issue yet. As I said, if anything, it has become WORSE.

  • layer7 Member, Host Rep, LIR

    @jackboy said:

    I don't have dmesg access (it is in an OpenVZ container) - that limits what I can see.

    I don't have a GUI for PassMark, but I did run sysbench.

    Hi,

    You do not need a GUI for PassMark; there is also a CLI version. But that only tests CPU and RAM, not disk or network.

    You can (and should) also use something like YABS, which includes a comparable CPU test but also runs disk and network tests.

    So stop everything and start doing some intense testing.
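
    For a single, repeatable disk number outside of YABS, fio (which YABS uses for its own disk tests) can write JSON that is easy to log and compare. A minimal sketch, assuming fio is installed with the libaio engine available; the job parameters (4 KiB random reads, queue depth 32, 1 GiB file, 30 s) and the helper names are illustrative choices, not the exact job YABS runs.

```python
import json
import subprocess


def run_fio_randread(path: str, seconds: int = 30) -> str:
    """Run a 4 KiB random-read fio job against `path`; return fio's JSON output."""
    cmd = [
        "fio", "--name=randread", f"--filename={path}", "--rw=randread",
        "--bs=4k", "--size=1G", "--ioengine=libaio", "--iodepth=32",
        "--direct=1", f"--runtime={seconds}", "--time_based",
        "--output-format=json",
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def summarize(fio_json: str) -> dict:
    """Extract read IOPS, bandwidth and mean latency from fio's JSON."""
    read = json.loads(fio_json)["jobs"][0]["read"]
    return {
        "iops": round(read["iops"]),
        "mib_s": round(read["bw"] / 1024, 1),  # fio reports bw in KiB/s
        "mean_lat_ms": round(read.get("lat_ns", {}).get("mean", 0) / 1e6, 2),
    }

# Usage (hypothetical scratch path; delete the file afterwards):
#   print(summarize(run_fio_randread("/root/fio-test.bin")))
```

    Pointing it at a scratch file on the same filesystem as the database, and running it once on the struggling box and once on any candidate replacement, gives a like-for-like comparison.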

    Independently of all this, switching away from OpenVZ wouldn't be a bad idea.

  • concept Member
    edited March 17

    @jackboy said:

    $30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

    Sounds like you need a dedicated server, but if a VPS works, I would check out OVH VPS. You can get a lot of vCPU and RAM for cheap.
    https://www.ovhcloud.com/en/vps/

    Since @layer7 is here already, they can get you an EPYC VPS or VDS with lots of cores and RAM in Germany or France.

  • ralf Member

    @jackboy said:
    It may have brief periods of lucidity where it seems everything is running fine, especially after I restart the database and whatnot - but as soon as all the automations start to run, it takes no time at all for the server to act like it is entirely thrashed.

    What would cause something like this to start happening suddenly?

    Are you sure something isn't consuming a lot more RAM for some reason, so that it's now swapping when before everything fit in memory?
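
    One quick way to check is to look at how much swap is in use and, more importantly, whether pages are actively moving in and out. A minimal sketch that reads /proc/meminfo and two /proc/vmstat samples, assuming a Linux guest that exposes the pswpin/pswpout counters (inside an OpenVZ container these files may be virtualized or incomplete, in which case the sketch raises KeyError); the function names are hypothetical.

```python
import time


def parse_kv(text: str) -> dict:
    """Parse 'key value' (/proc/vmstat) or 'Key: value kB' (/proc/meminfo) lines."""
    out = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            out[parts[0]] = int(parts[1])
    return out


def swap_rates(before: dict, after: dict, seconds: float) -> tuple:
    """Pages swapped in and out per second between two /proc/vmstat samples."""
    return ((after["pswpin"] - before["pswpin"]) / seconds,
            (after["pswpout"] - before["pswpout"]) / seconds)


def read_proc(path: str) -> dict:
    with open(path) as f:
        return parse_kv(f.read())


def check_swap(interval: float = 10.0) -> None:
    """Print swap in use and the swap-in/swap-out rate over `interval` seconds."""
    mem = read_proc("/proc/meminfo")
    used_mib = (mem["SwapTotal"] - mem["SwapFree"]) / 1024  # meminfo is in kB
    before = read_proc("/proc/vmstat")
    time.sleep(interval)
    after = read_proc("/proc/vmstat")
    sin, sout = swap_rates(before, after, interval)
    print(f"swap used: {used_mib:.0f} MiB, "
          f"in: {sin:.1f} pages/s, out: {sout:.1f} pages/s")

# Usage: run check_swap() while the automations are running.
```

    A steady, non-trivial swap-in rate while the cron jobs run would support this theory; swap that is allocated but idle matters much less.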

  • dbadude Member

    @jackboy said:
    What would cause something like this to start happening suddenly?

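    After an upgrade, database indexes can sometimes get corrupted...
    mysqlcheck --repair --all-databases
    or the MariaDB equivalent (mariadb-check)... Note that --repair only applies to storage engines that support REPAIR TABLE (e.g. MyISAM, Aria), not InnoDB.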

  • TrK Member

    When working with crucial money-generating stuff, people (including me) tend to get dedicated servers or set up clusters of VPSes, with the database on a separate cluster of servers. I mean, there's usually a failsafe approach to doing things in production. Or just go with the big cloud operators if you have deep pockets.

  • You need to set up proper monitoring so you can watch the behaviour over time, rather than relying on snapshot testing that may not catch an intermittent issue while it's occurring.

    Personally, I use Zabbix and Prometheus/Grafana/Node Exporter in parallel. Lots of people here have recommended HetrixTools, but I have no experience with it.

    But whatever you do, you need something that helps you identify the issue and demonstrate it to the provider, so you're going to have to deploy a proper monitoring solution.
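
    As a stopgap until Zabbix or Prometheus is in place, even a tiny probe run from cron can put timestamps on the slow periods. A minimal sketch, assuming Python 3 and a writable directory on the same disk as the database; it times rounds of 4 KiB write + fsync (roughly the kind of flush a database commit can wait on) and prints one log line per run. The function names, sample count and the /var/tmp location are hypothetical choices.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies_ms(directory: str, samples: int = 20) -> list:
    """Time `samples` rounds of 4 KiB write + fsync in `directory`, in ms."""
    block = os.urandom(4096)
    results = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the write through to storage
            results.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)  # don't leave probe files behind
    return results


def summary_line(latencies: list) -> str:
    """One log line: timestamp, median and worst latency."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    return (f"{stamp} fsync_ms median={statistics.median(latencies):.2f} "
            f"max={max(latencies):.2f}")

# Usage (e.g. every minute from cron, appending stdout to a log file):
#   print(summary_line(fsync_latencies_ms("/var/tmp")))
```

    Appending that line to a file every minute gives a crude latency history you can line up against the cron schedule and show to the provider.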

  • alfatarsos Member, Host Rep

    The answer is in the first part of that last message: OpenVZ. The technology itself is not that bad, but some providers oversell it like hell, with IOPS limits on top. You'll need KVM there... or a decent OpenVZ host. But KVM.

  • 22MB/s disk is shitty. OpenVZ is shitty.

  • Honestly, I feel your pain. 22MB/s disk speed for a production database is just brutal. It really sounds like you've hit the ceiling of what that OpenVZ node can handle, or the host is just overselling it way too much. Moving to a KVM setup would likely solve 90% of these 'ghost' issues instantly. Don't let it drive you crazy; at this point it's almost certainly the hypervisor's fault, not your config.

  • forest Member
    edited March 19

    @jackboy said: I don't have dmesg access (it is in an OpenVZ container) - that limits what I can see.

    It'll be virtually impossible to do any real profiling on OpenVZ. It's a glorified shell account with pretend root. Switch away from it and get a real VPS. Then you can adjust sysctls as needed, run real profilers like perf and bpftrace, check performance counters, etc.
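
    One signal OpenVZ does expose without dmesg is /proc/user_beancounters: a non-zero failcnt means the container has hit one of its resource limits (memory, process count, etc.) since the counters were last reset. A minimal sketch that lists only the failing rows, assuming the usual layout (a Version line, a header line, then resource/held/maxheld/barrier/limit/failcnt columns, with the uid only on the first row) and root access inside the container. The function name is hypothetical, and this file says nothing about I/O throttling.

```python
def beancounter_failures(text: str) -> list:
    """Return (resource, held, limit, failcnt) for rows with failcnt > 0."""
    failures = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip blank lines, the version line and the column header
        if fields[0].endswith(":"):
            fields = fields[1:]  # first resource row is prefixed with "<uid>:"
        if len(fields) != 6:
            continue  # not a resource row in the expected layout
        resource, held, _maxheld, _barrier, limit, failcnt = fields
        if int(failcnt) > 0:
            failures.append((resource, int(held), int(limit), int(failcnt)))
    return failures

# Usage (inside the container, usually as root):
#   with open("/proc/user_beancounters") as f:
#       for row in beancounter_failures(f.read()):
#           print(row)
```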

  • Could be a few different things depending on your setup. If it’s a cheaper VPS, it might be overselling or CPU steal, where other users on the same node affect your performance. I’ve seen cases where everything looks fine on paper but the node itself is just overloaded. You could try checking CPU steal in htop and also see how disk I/O behaves, since slow storage can make everything feel laggy. If it’s already slow even when the system is mostly idle, there’s a good chance it’s just a bad node.
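
    The steal figure htop shows comes from /proc/stat, and the same counters give the %system share mentioned earlier in the thread. A minimal sketch that diffs two samples, assuming a Linux guest whose /proc/stat has at least the eight standard columns up to steal (inside an OpenVZ container steal may read as zero regardless of contention, so a zero is not proof of a healthy node); the function names are hypothetical.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> dict:
    """Read the aggregate 'cpu' line of /proc/stat into named tick counters."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:len(FIELDS) + 1]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(before: dict, after: dict) -> dict:
    """Share of CPU time per field between two samples, in percent."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values()) or 1  # avoid dividing by zero
    return {k: round(100 * v / total, 1) for k, v in deltas.items()}


def sample(interval: float = 5.0) -> dict:
    """Take two /proc/stat samples `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    return cpu_percentages(first, second)

# Usage: print(sample()) and watch the "steal" and "system" entries.
```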

    What provider and specs are you using?

  • Beanzy Member

    Ask the merchant to help you resolve the issue

  • TierNet Member, Patron Provider

    This is why I always recommend, even here on LET, avoiding cheap, overselling providers; it always ends in complaints about downtime, performance issues, etc.

    Run YABS on your server and compare it with ours at Tier.net:

    Sat Oct 18 05:50:22 AM EDT 2025
    
    Basic System Information:
    ---------------------------------
    Uptime     : 0 days, 18 hours, 1 minutes
    Processor  : QEMU Virtual CPU version 2.5+
    CPU cores  : 8 @ 3594.596 MHz
    AES-NI     : ❌ Disabled
    VM-x/AMD-V : ❌ Disabled
    RAM        : 31.3 GiB
    Swap       : 4.9 GiB
    Disk       : 44.3 GiB
    Distro     : Debian GNU/Linux 12 (bookworm)
    Kernel     : 6.1.0-9-amd64
    VM Type    : KVM
    IPv4/IPv6  : ✔ Online / ❌ Offline
    
    IPv4 Network Information:
    ---------------------------------
    ISP        : Tier.Net Technologies LLC
    ASN        : AS397423 Tier.Net Technologies LLC
    Host       : Tier.Net Technologies LLC
    Location   : Binghamton, New York (NY)
    Country    : United States
    
    fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
    ---------------------------------
    Block Size | 4k            (IOPS) | 64k           (IOPS)
      ------   | ---            ----  | ----           ---- 
    Read       | 366.02 MB/s  (91.5k) | 5.10 GB/s    (79.7k)
    Write      | 366.98 MB/s  (91.7k) | 5.12 GB/s    (80.1k)
    Total      | 733.01 MB/s (183.2k) | 10.23 GB/s  (159.8k)
               |                      |                     
    Block Size | 512k          (IOPS) | 1m            (IOPS)
      ------   | ---            ----  | ----           ---- 
    Read       | 21.96 GB/s   (42.9k) | 27.80 GB/s   (27.1k)
    Write      | 23.13 GB/s   (45.1k) | 29.65 GB/s   (28.9k)
    Total      | 45.10 GB/s   (88.0k) | 57.45 GB/s   (56.1k)
    
    iperf3 Network Speed Tests (IPv4):
    ---------------------------------
    Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
    -----           | -----                     | ----            | ----            | ----           
    Clouvider       | London, UK (10G)          | 1.93 Gbits/sec  | 1.18 Gbits/sec  | 69.0 ms        
    Eranium         | Amsterdam, NL (100G)      | 2.55 Gbits/sec  | 1.83 Gbits/sec  | 81.9 ms        
    Uztelecom       | Tashkent, UZ (10G)        | 1.01 Gbits/sec  | 69.7 Mbits/sec  | 159 ms         
    Leaseweb        | Singapore, SG (10G)       | 529 Mbits/sec   | 36.9 Mbits/sec  | --             
    Clouvider       | Los Angeles, CA, US (10G) | 1.57 Gbits/sec  | 97.9 Mbits/sec  | 69.6 ms        
    Leaseweb        | NYC, NY, US (10G)         | 9.18 Gbits/sec  | 4.27 Gbits/sec  | 2.32 ms        
    Edgoo           | Sao Paulo, BR (1G)        | 1.65 Gbits/sec  | 1.13 Gbits/sec  | 109 ms         
    
    Geekbench 6 Benchmark Test:
    ---------------------------------
    Test            | Value                         
                    |                               
    Single Core     | 1006                          
    Multi Core      | 5499                          
    Full Test       | https://browser.geekbench.com/v6/cpu/14521407
    