I am having a horrible time with a VPS and its performance - can anybody please help me?

jackboy · March 17

I've been doing this for 20+ years. I have a live production server on hosting.com (formerly A2); it is one of about a dozen boxes I have running.

It runs MariaDB, Apache2, Redis and some other basic stuff. The server does marketing and generates a lot of data.

Near the end of last week, it slowed almost to a halt, and it has been unusable this week. The host claims nothing is wrong or different on their end.

From my side, I archived millions of rows of data, recompiled crucial binaries to improve their performance, tweaked the database and other server settings... everything you can imagine, round and round. The server performs so poorly that things start to cascade and fail after a while.

Queries that worked fine last week on millions MORE rows of data suddenly time out. I've adjusted the buffer pool size (up and down), adjusted swap, and tried every trick in the book from 20+ years of doing this.

The performance just gets worse. Nothing I've done over the last 72+ hours has seemed to make a single dent in whatever is going wrong with that server.

It may have brief periods of lucidity where it seems everything is running fine, especially after I restart the database and whatnot - but as soon as all the automations start to run, it takes no time at all for the server to act like it is entirely thrashed.

Could this be something on the host side, and they are just not being honest?

What would cause something like this to start happening suddenly?

On our end, we didn't really make any crazy changes to anything - not enough to explain this drop in performance. I was thinking maybe just the total cumulative size of the data + indexes had grown so large that we were having an issue keeping everything in RAM. Archiving millions of rows should have fixed that, but it didn't.
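
One rough way to test the "data + indexes no longer fit in RAM" theory is to total DATA_LENGTH and INDEX_LENGTH from information_schema.TABLES and compare the sum with innodb_buffer_pool_size. The Python sketch below assumes you have already exported those rows with a query like the one in its comments; the 80% headroom factor, table names, and sizes are hypothetical, and InnoDB reports these lengths as estimates.

```python
# Minimal sketch: does the InnoDB data + index footprint fit in the buffer pool?
# Assumes rows were exported with something like:
#   SELECT table_schema, table_name, data_length, index_length
#   FROM information_schema.TABLES WHERE engine = 'InnoDB';
# InnoDB sizes are estimates, so treat the answer as a rough guide only.
from typing import Iterable, Tuple

Row = Tuple[str, str, int, int]  # (schema, table, data_length, index_length)

GIB = 1024 ** 3


def footprint_bytes(rows: Iterable[Row]) -> int:
    """Sum data + index bytes across all exported tables."""
    return sum(data + index for _schema, _table, data, index in rows)


def fits_in_pool(rows: Iterable[Row], buffer_pool_bytes: int, headroom: float = 0.8) -> bool:
    """True if the footprint fits in `headroom` x the buffer pool (headroom is an arbitrary assumption)."""
    return footprint_bytes(rows) <= buffer_pool_bytes * headroom


if __name__ == "__main__":
    # Hypothetical tables and sizes; replace with your own export.
    rows = [
        ("marketing", "events", 9 * GIB, 3 * GIB),
        ("marketing", "leads", 2 * GIB, 1 * GIB),
    ]
    pool = 12 * GIB  # hypothetical innodb_buffer_pool_size
    total = footprint_bytes(rows)
    print(f"footprint: {total / GIB:.1f} GiB, pool: {pool / GIB:.1f} GiB")
    print("fits" if fits_in_pool(rows, pool) else "does not fit")
```

If the footprint was well inside the pool both before and after archiving, that points away from the working-set theory rather than proving anything about the host.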

Because performance is what's causing the issues, every single cron job, script, and query that runs has been thoroughly investigated and optimized. They all look like the culprit when performance is tanking, but I feel like I'm LOSING MY MIND.

I've replaced crucial components with Rust binaries. I've truncated off huge swaths of data. I've cleaned up old logs (tons of free disk, plenty of RAM, CPU doesn't always appear throttled). Nothing makes a difference. If anything, it has gotten WORSE.

I guess what I'm really asking is what I asked above: would it be best to just move to another host and get a different box? I'll have to pull an all-nighter to make sure the migration works, but I just can't physically contend with this server anymore. It must be something on their end; I just can't pinpoint what the nuisance could be. I've never had such a problem before.

Thanks for your time in reading this and any responses I may get.

I'm also in the market for another VPS, I guess

oloke · March 17

I'm also in the market for another VPS, I guess

Please tell us your desired specs and acceptable budget

muddy · March 17

Back in the old days, I used A2hosting for everything. Decent pricing and good quality support. It looks like they were recently acquired by/merged into something that smells a little like the EIG/Newfold Digital situation. I could be totally wrong, but some quick research on Reddit seems to confirm those suspicions. Again, I could be wrong, but if you haven't had good luck with their support recently, it might be wise to look elsewhere. Plenty of other good options are available out there...

jackboy · March 17

@oloke said:

I'm also in the market for another VPS, I guess

Please tell us your desired specs and acceptable budget

$30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

3K33 · March 17

@jackboy said:

$30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

Check our plans at https://strike.bz. The PRO line (in SG/PL) is discounted 50% on yearly plans. Our highest plan is Enterprise (in Poland), with 18GB/8vCPU/120GB NVMe disk; with the discount it comes out to 19.50€/mo. If you need more specs, let me know and we can work something out. Discount code: UK6WGYS59N

Heron · March 17

If you're able, run PassMark PerformanceTest. It tests CPU, RAM, and drives, and might show what is underperforming. If not, Geekbench can test the CPU.

JabJab · March 17

Still no idea what the issue is.
IO and storage latency? CPU usage? Packet loss? OOM? Anything?

I am very confused by the "20+ years of doing this."

jackboy · March 17

@JabJab said:
Still no idea what the issue is.
IO and storage latency? CPU usage? Packet loss? OOM? Anything?

I am very confused by the "20+ years of doing this."

Well, here is a breakdown:

I don't have dmesg access (it is in an OpenVZ container) - that limits what I can see.

r_await is always listed as 0.00, and %util always reports 0.00.

iostat shows 10,980-11,701 reads/second at 4KB each, with ~44-47 MB/s throughput.

%system is 43-45%, so roughly half the CPU time is spent in the kernel (system calls/I/O).

I don't have a GUI for PassMark, but I did run sysbench.

CPU is at 1,380 events/s
Memory is at 3,124 MB/s
Disk is at 22.8 MB/s, 1,458 IOPS
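
As a sanity check, throughput is roughly IOPS x request size, so the iostat and sysbench figures can be cross-checked against each other. This sketch assumes iostat's "4KB" means 4 KiB requests and that sysbench fileio ran with its default 16 KiB block size (--file-block-size=16384) and reported MiB/s; neither setting is stated above.

```python
# Back-of-the-envelope check: throughput = IOPS x request size.
# Assumptions (not stated in the post): iostat's "4KB" means 4 KiB requests,
# and sysbench fileio used its default 16 KiB block size, reporting MiB/s.
KIB = 1024
MB = 1000 ** 2   # decimal megabyte
MIB = 1024 ** 2  # binary mebibyte


def throughput_bytes_s(iops: float, request_bytes: int) -> float:
    """Bytes per second moved at a given IOPS and fixed request size."""
    return iops * request_bytes


if __name__ == "__main__":
    # iostat range from the post, expressed in decimal MB/s
    for reads_per_s in (10_980, 11_701):
        mb_s = throughput_bytes_s(reads_per_s, 4 * KIB) / MB
        print(f"iostat: {reads_per_s} r/s x 4 KiB = {mb_s:.1f} MB/s")

    # sysbench: 1,458 IOPS at an assumed 16 KiB block size, in MiB/s
    mib_s = throughput_bytes_s(1458, 16 * KIB) / MIB
    print(f"sysbench: 1458 IOPS x 16 KiB = {mib_s:.1f} MiB/s")
```

If both lines land close to the reported figures (about 45-48 MB/s against the ~44-47 MB/s from iostat, and 22.8 MiB/s for sysbench), the two tools are at least consistent with each other, and the gap worth explaining is between the ~11k small reads/s the live workload issues and the ~1.5k IOPS the benchmark sustained.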

If you want me to check or run anything else, let me know.

It manifests as the box just being unusable. Queries time out, stuff fails to load, SSH is slow/unresponsive - it is across the board. I've tried restarts, disabling services, truncating huge swaths of data, adjusting settings... I've even disabled replication to the slave server. I've rewritten critical components as precompiled Rust binaries.

I've been developing proprietary software for companies for 20+ years, using similar setups, often with exponentially MORE data and heavier workloads. While this particular server is busy, it handles a small fraction of the traffic and activity of some of my other projects, which are not struggling the same way.

That is why I say I am losing my mind over this. I'm not sure what else I could possibly do. I have a whole bag of tricks for when I need to juice more performance out of a box, but no matter what I do on this server, it just seems to get WORSE.

This was a sudden thing that started last week. It seems to clear up sometimes, but any brief reprieve quickly crumbles, and before I know it the system is slammed again and can't do anything. Since it is too slow to complete the workloads, stuff piles up and the problem cascades.

I've spent three days addressing individual "issues", none of which turned out to be the culprit. I even contacted the host and told them what I was going through, but they claim nothing is wrong on their end. That sent me back to trying more stuff: more configuration changes, more data removed, more processes stopped. All for naught, as nothing has fixed the issue yet. As I said, if anything, it has become WORSE.
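
The pile-up described above follows from simple arithmetic: a job started every `interval` minutes that takes `duration` minutes has about ceil(duration / interval) copies running at once, and if every extra copy slows the others down, the run times feed on themselves. The toy model below is purely illustrative; the 5-minute interval, the run times, and the linear `penalty` per extra copy are made-up assumptions, not measurements from this server.

```python
# Toy model of a cron pile-up: a job starts every `interval` minutes and takes
# `duration` minutes. Assumption (made up for illustration): each extra copy
# running at the same time adds `penalty` x the base run time to every copy.
import math
from typing import List, Tuple


def concurrent_copies(interval: float, duration: float) -> int:
    """Steady-state number of overlapping runs for a job started every `interval`."""
    return math.ceil(duration / interval)


def simulate(interval: float, base_duration: float, penalty: float, rounds: int) -> List[Tuple[int, float]]:
    """Repeatedly recompute the overlap and the resulting slowed-down run time."""
    duration = base_duration
    history: List[Tuple[int, float]] = []
    for _ in range(rounds):
        copies = concurrent_copies(interval, duration)
        duration = base_duration * (1 + penalty * (copies - 1))
        history.append((copies, duration))
    return history


if __name__ == "__main__":
    for base in (4, 6):  # hypothetical job run times, in minutes
        copies = [c for c, _ in simulate(interval=5, base_duration=base, penalty=1.0, rounds=6)]
        print(f"{base}-minute job every 5 minutes: overlapping copies per round {copies}")
```

With the 4-minute job nothing overlaps and the model stays at one copy; with the 6-minute job the overlap count keeps climbing round after round, the same qualitative shape as a backlog that never drains.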

layer7 · March 17

@jackboy said:

I don't have a GUI for PassMark, but I did run sysbench.

Hi,

You do not need a GUI for PassMark; there is also a CLI version. But it only tests CPU and RAM, not disk or network.

You can (and should) also use something like YABS, which includes a comparable CPU test (Geekbench) but also runs disk (fio) and network (iperf3) tests.

So stop everything and start doing some intense testing.

Independent of all this, switching away from OpenVZ wouldn't be a bad idea either way.

concept · March 17

@jackboy said:

$30-$40 a month would be ideal, or if I could get a yearly plan for less than that per month, I would possibly be interested in that. It really depends, I like a good deal, but I need to initiate this migration ASAP - this system generates a lot of money and it is essentially offline at the moment, for all intents and purposes. As much RAM and as many vCPU as I could get would be ideal.

It sounds like you need a dedicated server, but if a VPS works, I would check out OVH's VPS line. You can get a lot of vCPU and RAM for cheap.
https://www.ovhcloud.com/en/vps/

Since @layer7 is here already, they can get you an EPYC VPS or VDS with lots of cores and RAM in Germany or France.

ralf · March 17

@jackboy said:
It may have brief periods of lucidity where it seems everything is running fine, especially after I restart the database and whatnot - but as soon as all the automations start to run, it takes no time at all for the server to act like it is entirely thrashed.

What would cause something like this to start happening suddenly?

Are you sure something isn't consuming a lot more RAM for some reason, so that the box is now swapping when before everything fit in memory?
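
One way to answer that from inside the box is to sample the kernel's cumulative swap counters twice and look at the rate. The sketch below reads pswpin/pswpout from /proc/vmstat and SwapTotal/SwapFree from /proc/meminfo; the 5-second interval is an arbitrary choice, and under OpenVZ these files may be virtualized or show zeros, so a zero rate is not proof that nothing is swapping.

```python
# Minimal sketch: is the box actively swapping right now?
# pswpin/pswpout in /proc/vmstat are cumulative pages swapped in/out since boot,
# so two snapshots taken `interval` seconds apart give a rate.
# Under OpenVZ these counters may be virtualized or zero; treat results as hints.
import os
import sys
import time
from typing import Dict


def parse_kv(text: str) -> Dict[str, int]:
    """Parse 'key value' (/proc/vmstat) or 'Key: value kB' (/proc/meminfo) lines."""
    out: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            out[parts[0]] = int(parts[1])
    return out


def read_proc(path: str) -> Dict[str, int]:
    with open(path) as f:
        return parse_kv(f.read())


def swap_rates(before: Dict[str, int], after: Dict[str, int], seconds: float) -> Dict[str, float]:
    """Pages per second swapped in and out between two /proc/vmstat snapshots."""
    return {
        key: (after.get(key, 0) - before.get(key, 0)) / seconds
        for key in ("pswpin", "pswpout")
    }


if __name__ == "__main__":
    if not os.path.exists("/proc/vmstat"):
        print("no /proc/vmstat here; run this on the Linux box itself", file=sys.stderr)
    else:
        interval = 5.0  # arbitrary sampling window, in seconds
        first = read_proc("/proc/vmstat")
        time.sleep(interval)
        rates = swap_rates(first, read_proc("/proc/vmstat"), interval)
        mem = read_proc("/proc/meminfo")
        used_mib = (mem.get("SwapTotal", 0) - mem.get("SwapFree", 0)) / 1024
        print(f"swap in use: {used_mib:.0f} MiB")
        print(f"swap-in: {rates['pswpin']:.1f} pages/s, swap-out: {rates['pswpout']:.1f} pages/s")
```

Sustained non-zero swap-in/swap-out during the slow periods would support the swapping theory; a large "swap in use" figure with zero rates just means pages were swapped out at some point and left there.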

dbadude · March 17

@jackboy said:
Queries that worked fine last week on millions MORE rows of data suddenly time out. I've adjusted the buffer pool size (up and down), adjusted swap, and tried every trick in the book from 20+ years of doing this.

After an upgrade, database indexes can sometimes get corrupted...
mysqlcheck --repair --all-databases
or the MariaDB equivalent (mariadb-check)...

TrK · March 17

When working with crucial money-generating stuff, people (including me) tend to get dedicated servers or set up clusters of VPSes, with the database on a separate cluster of servers. There's usually a failsafe way of doing things in production. Or just go with the big cloud operators if you have deep pockets.

CloudHopper · March 17

You need to set up proper monitoring so you can watch the behaviour over time, rather than relying on snapshot testing that may not catch an intermittent issue while it's occurring.

Personally, I use Zabbix and Prometheus/Grafana/Node Exporter in parallel. Lots of people here have recommended HetrixTools, but I have no experience with it.

But whatever you do, you need something that helps you identify the issue and demonstrate it to the provider, so you're going to have to deploy a proper monitoring solution.
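
Since iostat's r_await and %util read 0.00 inside the container, a crude stopgap until real monitoring is in place is to time a small write + fsync from user space at a fixed interval and log it with timestamps, so slowdowns can be lined up against cron schedules or shown to the provider. This is a minimal sketch, not a substitute for Zabbix or Prometheus/node_exporter; the 4 KiB payload, 30-second interval, and sample count are assumptions, and the probe directory should sit on the same filesystem as the database (but not inside its data directory).

```python
# Minimal user-space disk latency probe, logged over time as CSV.
# Times a small write + fsync because in-container iostat latency reads 0.00.
# A rough stand-in for a real monitoring stack, not a replacement for one.
import csv
import os
import sys
import tempfile
import time
from datetime import datetime, timezone


def fsync_latency_ms(directory: str, size: int = 4096) -> float:
    """Write `size` bytes to a temp file in `directory`, fsync it, return elapsed ms."""
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        os.write(fd, payload)
        os.fsync(fd)
        return (time.perf_counter() - start) * 1000.0
    finally:
        os.close(fd)
        os.unlink(path)


def run(directory: str, interval_s: float, samples: int) -> None:
    """Write a CSV header, then one (UTC timestamp, latency ms) row per interval to stdout."""
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerow(["utc_time", "fsync_ms"])
    for _ in range(samples):
        ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
        writer.writerow([ts, f"{fsync_latency_ms(directory):.2f}"])
        sys.stdout.flush()
        time.sleep(interval_s)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: fsync_probe.py DIRECTORY [INTERVAL_S] [SAMPLES]", file=sys.stderr)
    else:
        directory = sys.argv[1]
        interval = float(sys.argv[2]) if len(sys.argv) > 2 else 30.0
        samples = int(sys.argv[3]) if len(sys.argv) > 3 else 120
        run(directory, interval, samples)
```

Run it as, for example, `python3 fsync_probe.py /var/tmp 30 120 > fsync.csv` (the script name and path are hypothetical); spikes that recur on a schedule would point at your own jobs, while spikes that don't line up with anything you run are the kind of evidence the host would have a harder time dismissing.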

alfatarsos · March 17

The answer is in the first part of that last message: OpenVZ. The technology itself isn't that bad, but some providers oversell it like hell and put IOPS limits on it. You'll need KVM there... or a decent OpenVZ host. But really, KVM.

TimboJones · March 18

22MB/s disk is shitty. OpenVZ is shitty.

brainjava · March 18

Honestly, I feel your pain. 22MB/s disk speed for a production database is just brutal. It really sounds like you've hit the ceiling of what that OpenVZ node can handle, or the host is just overselling it way too much. Moving to a KVM setup would likely solve 90% of these 'ghost' issues instantly. Don't let it drive you crazy - at this point, it’s almost certainly the hypervisor's fault, not your config.

forest · March 19

@jackboy said: I don't have dmesg access (it is in an OpenVZ container) - that limits what I can see.

It'll be virtually impossible to do any real profiling on OpenVZ. It's a glorified shell account with pretend root. Switch away from it and get a real VPS. Then you can adjust sysctls as needed, run real profilers like perf and bpftrace, check performance counters, etc.

muhbootloader · March 19

Could be a few different things depending on your setup. If it’s a cheaper VPS, it might be overselling or CPU steal, where other users on the same node affect your performance. I’ve seen cases where everything looks fine on paper but the node itself is just overloaded. You could try checking CPU steal in htop and also see how disk I/O behaves, since slow storage can make everything feel laggy. If it’s already slow even when the system is mostly idle, there’s a good chance it’s just a bad node.

What provider and specs are you using?
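
For the CPU side, steal and iowait can also be read straight from the aggregate cpu line of /proc/stat, which is where top and htop get them. The sketch below diffs two samples taken a few seconds apart and prints the share of time in each state; it would also show whether the 43-45% %system figure mentioned earlier persists. Under OpenVZ these counters may be virtualized, so treat the output as indicative rather than authoritative; the 5-second window is an arbitrary choice.

```python
# Minimal sketch: share of CPU time per state (including iowait and steal)
# between two /proc/stat samples. Aggregate "cpu" line fields, in order:
# user nice system idle iowait irq softirq steal [guest guest_nice]
# guest/guest_nice are already counted in user/nice, so they are ignored here.
import os
import sys
import time
from typing import Dict, List

FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(text: str) -> List[int]:
    """Return the first eight counters from the aggregate 'cpu ' line."""
    for line in text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            # Pad in case an older kernel reports fewer fields.
            return (values + [0] * len(FIELDS))[: len(FIELDS)]
    raise ValueError("no aggregate 'cpu' line found")


def breakdown(before: List[int], after: List[int]) -> Dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas) or 1  # guard against a zero-length interval
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}


def read_stat() -> List[int]:
    with open("/proc/stat") as f:
        return parse_cpu_line(f.read())


if __name__ == "__main__":
    if not os.path.exists("/proc/stat"):
        print("no /proc/stat here; run this on the Linux box itself", file=sys.stderr)
    else:
        first = read_stat()
        time.sleep(5)  # arbitrary sampling window, in seconds
        pct = breakdown(first, read_stat())
        print(", ".join(f"{name} {value:.1f}%" for name, value in pct.items()))
```

A persistently high steal share while your own load is modest is the classic sign of a crowded node; high iowait with low steal points more toward storage.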

Beanzy · March 20

Ask the merchant to help you resolve the issue

TierNet · March 20

This is why I always recommend, even here on LET, avoiding cheap, overselling providers: it always ends in complaints about downtime, performance issues, etc.

Run YABS on your box and compare it to ours at Tier.net:

Sat Oct 18 05:50:22 AM EDT 2025

Basic System Information:
---------------------------------
Uptime     : 0 days, 18 hours, 1 minutes
Processor  : QEMU Virtual CPU version 2.5+
CPU cores  : 8 @ 3594.596 MHz
AES-NI     : ❌ Disabled
VM-x/AMD-V : ❌ Disabled
RAM        : 31.3 GiB
Swap       : 4.9 GiB
Disk       : 44.3 GiB
Distro     : Debian GNU/Linux 12 (bookworm)
Kernel     : 6.1.0-9-amd64
VM Type    : KVM
IPv4/IPv6  : ✔ Online / ❌ Offline

IPv4 Network Information:
---------------------------------
ISP        : Tier.Net Technologies LLC
ASN        : AS397423 Tier.Net Technologies LLC
Host       : Tier.Net Technologies LLC
Location   : Binghamton, New York (NY)
Country    : United States

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 366.02 MB/s  (91.5k) | 5.10 GB/s    (79.7k)
Write      | 366.98 MB/s  (91.7k) | 5.12 GB/s    (80.1k)
Total      | 733.01 MB/s (183.2k) | 10.23 GB/s  (159.8k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 21.96 GB/s   (42.9k) | 27.80 GB/s   (27.1k)
Write      | 23.13 GB/s   (45.1k) | 29.65 GB/s   (28.9k)
Total      | 45.10 GB/s   (88.0k) | 57.45 GB/s   (56.1k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
-----           | -----                     | ----            | ----            | ----           
Clouvider       | London, UK (10G)          | 1.93 Gbits/sec  | 1.18 Gbits/sec  | 69.0 ms        
Eranium         | Amsterdam, NL (100G)      | 2.55 Gbits/sec  | 1.83 Gbits/sec  | 81.9 ms        
Uztelecom       | Tashkent, UZ (10G)        | 1.01 Gbits/sec  | 69.7 Mbits/sec  | 159 ms         
Leaseweb        | Singapore, SG (10G)       | 529 Mbits/sec   | 36.9 Mbits/sec  | --             
Clouvider       | Los Angeles, CA, US (10G) | 1.57 Gbits/sec  | 97.9 Mbits/sec  | 69.6 ms        
Leaseweb        | NYC, NY, US (10G)         | 9.18 Gbits/sec  | 4.27 Gbits/sec  | 2.32 ms        
Edgoo           | Sao Paulo, BR (1G)        | 1.65 Gbits/sec  | 1.13 Gbits/sec  | 109 ms         

Geekbench 6 Benchmark Test:
---------------------------------
Test            | Value                         
                |                               
Single Core     | 1006                          
Multi Core      | 5499                          
Full Test       | https://browser.geekbench.com/v6/cpu/14521407
