Does anyone have memory allocation failures on Racknerd KVM?

davide · November 2022

I haven't been able to track the problem to its source, but every couple of weeks or so I find one of two things. Either some daemon processes, such as the webserver and the NodeJS instances, have been terminated for no apparent reason, or the KVM instance itself has rebooted unexpectedly, and after the restart some of the automatically started daemons (again, the webserver and the NodeJS instances) are not running.

There are no system logs reporting either OOM conditions or errors, not even the random system reboots are logged, as if the KVM instance is hard-reset without issuing an ACPI signal; only the subsequent boots are logged, but not the preceding shut-downs, which happen accidentally and randomly every few weeks.
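For anyone wanting to repeat this check inside the guest, here is a minimal sketch. `scan_oom` is a made-up helper name, and the usage lines assume a systemd distribution with persistent journald storage, which may not match the setup described in this thread.

```shell
#!/bin/bash
# Sketch: look for traces of the in-guest OOM killer in kernel log text.
# scan_oom is a hypothetical helper; it filters log text read from stdin.
scan_oom() {
    # "invoked oom-killer" and "Out of memory: Kill(ed) process" are the usual
    # kernel message fragments; exact wording varies between kernel versions.
    grep -E -i 'invoked oom-killer|out of memory: kill'
}

# Example usage (assumes persistent journald storage):
#   journalctl -k -b -1 --no-pager | scan_oom   # kernel log of the previous boot
#   last -x shutdown reboot | head              # compare reboot and shutdown records
```

If the first command prints nothing and `last -x` shows reboot records without a matching shutdown record, that is at least consistent with a hard reset from outside the guest, though it does not prove one.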

I suspect that Racknerd KVM instances may be overcommitted with respect to the host server's memory, and that this may cause crashes of guest processes and of the VM itself. I don't think this is implausible considering the extremely low prices.

Anyone else noticing similar weirdness?

Sapcedor · November 2022

I actually run three VPS servers with them and have never had a similar issue. Everything is working as expected. I had a similar issue with another provider, and I asked them to migrate my VPS; they did it, and the problem is gone.

Sorry, I can't be of much assistance!

WebProject · November 2022

Contact support and ask them to resolve the issue and double the bandwidth and RAM; if they do oversell it, it costs them peanuts.

Sapcedor · November 2022

Did you check your resource usage? Even if the RAM is not dedicated, you would need high RAM usage for that to happen. If you install Netdata, you will be able to track resource usage on your VPS, but that will probably not help you identify the problem. I was never really able to identify the problem with my previous provider; I just had to report that the instance got rebooted every day without my doing anything.

dustinc · November 2022

Hi @davide -- Thank You for being our valued customer. This definitely should not be the case - our services are not overloaded, and quite the opposite of what we're known for (we overall have a very solid reputation for providing solid performance and services). In addition to that, we proactively monitor the health and status of all of our host nodes (though we do not monitor individual VM's with our unmanaged services).

If you don't mind, shoot me an e-mail at [email protected] with your VPS IP, I'll double check things on our end, and help you out with a resolution path.

pedagang · November 2022

escalated quickly and resolved ... as usual ... imaging is number one

davide · November 2022

@Sapcedor

I'm monitoring the instance with Munin, I have graphical plots of everything. Memory usage is somewhat high, but stable, with little variance over time: 70% is allocated by user processes, 25% is cache, and 5% is for buffers or unused. I copied this filesystem tree (/etc, /home, ...) as-is from my previous VPS provider from a VM with the same amount of memory (2 GB) but at a significantly higher price. With the same OS, config files and processes, it ran for 10 years there without such issues.

AXYZE · November 2022

@dustinc said:
Hi @davide -- Thank You for being our valued customer. This definitely should not be the case - our services are not overloaded, and quite the opposite of what we're known for (we overall have a very solid reputation for providing solid performance and services). In addition to that, we proactively monitor the health and status of all of our host nodes (though we do not monitor individual VM's with our unmanaged services).

If you don't mind, shoot me an e-mail at [email protected] with your VPS IP, I'll double check things on our end, and help you out with a resolution path.

After you "reintroduced" Amsterdam during this BF, the performance of my Amsterdam VPS, which I have had for 1.5 years, went down by 40-50%.

For example, the multicore (2-core) Geekbench 5 score is 600-700.
Will you also take a look at my node if I send you mail?

dustinc · November 2022

@AXYZE said:

@dustinc said:
Hi @davide -- Thank You for being our valued customer. This definitely should not be the case - our services are not overloaded, and quite the opposite of what we're known for (we overall have a very solid reputation for providing solid performance and services). In addition to that, we proactively monitor the health and status of all of our host nodes (though we do not monitor individual VM's with our unmanaged services).

If you don't mind, shoot me an e-mail at [email protected] with your VPS IP, I'll double check things on our end, and help you out with a resolution path.

After you "reintroduced" Amsterdam during this BF, the performance of my Amsterdam VPS, which I have had for 1.5 years, went down by 40-50%.

For example, the multicore (2-core) Geekbench 5 score is 600-700.
Will you also take a look at my node if I send you mail?

Hi @AXYZE -- Thank You for your business over the years (wow, time goes by quick!). Happy to hear that our services have been working out well for the better part of the 1.5 years you've been with us.

We're not seeing any monitoring events in Amsterdam at the moment, nonetheless, it doesn't hurt to double check - feel free to send me an e-mail with your VPS IP and I'd be happy to take a look, we can evaluate real usage, etc. [email protected]

We sincerely appreciate your business and look forward to working with you for another 1.5 years, and then some more. Thank You again.

Carlin0 · November 2022

I have a VPS in LA and so far no problem

davide · November 2022

@dustinc said:
Hi @davide -- Thank You for being our valued customer

Hi dustin,

we have already discussed this issue months ago, and to no avail you blamed me for incompetence in improperly sizing the VPS, prompting me to buy an expansion pack of even more (oversold) system memory.

As you know, the same malfunctions I described here were already reported multiple times on TrustPilot by numerous customers, me included. These negative reviews have now mysteriously disappeared. My own review went down under the pretext that I violated TrustPilot's ToS despite being honest, eloquent and polite.

Only one of these reviews can still be found on archive.org

Daniel15 · November 2022

@davide said: I'm monitoring the instance with Munin

Munin runs every 5 minutes, so you'll likely miss any anomalies unless they last longer than a few minutes. I'd recommend upgrading to Netdata instead, which updates its stats every second. I switched a few years ago and would recommend it.

dustinc · November 2022

@davide said:

@dustinc said:
Hi @davide -- Thank You for being our valued customer

Hi dustin,

we have already discussed this issue months ago, and to no avail you blamed me for incompetence in improperly sizing the VPS, prompting me to buy an expansion pack of even more (oversold) system memory.

As you know, the same malfunctions I described here were already reported multiple times on TrustPilot by numerous customers, me included. These negative reviews have now mysteriously disappeared. My own review went down under the pretext that I violated TrustPilot's ToS despite being honest, eloquent and polite.

Only one of these reviews can still be found on archive.org

Hi @davide -- appreciate you following up. I was able to pull up your account, and in the one and only support ticket I am able to see under your account (let me know if I'm mistaken, or if you may have created a ticket from a different email, etc), it looks like our team had reached out to you as a follow up (but we did not hear back from you after our response on July 7), suggesting that you may have OOM'd as a result of the applications running within your VPS.

You mentioned here you purchased an expansion/upgrade pack, but I am not seeing any upgrades or addons purchased; your VPS is still intact with the original amount of RAM included with the package (again - feel free to correct me if I'm mistaken), nor do I see any communication within your account related to purchasing an upgrade.

In your ticket, you mentioned some concerns about your VM utilizing swap space, and I do see that our team followed up and made some suggestions for you with regards to optimizing the vm.swappiness level within your VM's operating system. By default most Linux distributions have vm.swappiness set to 60, and also keep in mind https://www.linuxatemyram.com/ - so some swap usage is normal unless you specify otherwise via the vm.swappiness level within your OS.
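For readers unfamiliar with the setting, the current value can be read from `/proc/sys/vm/swappiness`. The sketch below is illustrative only: `show_swappiness` is a made-up helper, the value 10 and the file name `99-swappiness.conf` are arbitrary examples, and none of this is the specific advice given in the ticket.

```shell
#!/bin/bash
# show_swappiness is a hypothetical helper. The optional argument exists only
# so the function can be pointed at a different file, e.g. for testing.
show_swappiness() {
    local path="${1:-/proc/sys/vm/swappiness}"
    local value
    value=$(cat "$path") || return 1
    echo "vm.swappiness = $value"
}

# To change the value (root required; 10 is an arbitrary illustrative value):
#   sysctl -w vm.swappiness=10                                     # until next reboot
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/99-swappiness.conf   # persistent
#   sysctl --system                                                # reload sysctl config files
```

A lower value makes the kernel less eager to swap out process memory in favour of page cache. It does not by itself change how much memory the applications need.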

Furthermore, I just personally checked the node your VPS resides on (DAL107KVM) and do not see any current issues nor any resource contention (this node actually looks to be pretty quiet in terms of resource utilization). Even still we'd like to work with you on a resolution path here, if you don't mind, ultimately we're only happy if our customers are happy, so let's work together on identifying an amicable solution. I'll be reaching out to you to the e-mail on file shortly and we can go from there.

jackb · November 2022

Just a thought @dustinc

There are no system logs reporting either OOM conditions or errors, not even the random system reboots are logged, as if the KVM instance is hard-reset without issuing an ACPI signal; only the subsequent boots are logged, but not the preceding shut-downs, which happen accidentally and randomly every few weeks.

I've seen this sort of thing a few times before -- where there was sufficient memory available on the host but VMs were OOM killed anyway.

This would be visible in the qemu log file and messages log file on the host.

Probably worth a look. I found that the disk cache mode of the VM was a trigger -- in writeback the issue would occur; in none - it wouldn't.

dustinc · November 2022

@jackb said:
Just a thought @dustinc

There are no system logs reporting either OOM conditions or errors, not even the random system reboots are logged, as if the KVM instance is hard-reset without issuing an ACPI signal; only the subsequent boots are logged, but not the preceding shut-downs, which happen accidentally and randomly every few weeks.

I've seen this sort of thing a few times before -- where there was sufficient memory available on the host but VMs were OOM killed anyway.

This would be visible in the qemu log file and messages log file on the host.

Probably worth a look. I found that the disk cache mode of the VM was a trigger -- in writeback the issue would occur; in none - it wouldn't.

Appreciate this @jackb -- I can confirm that by default we do have disk cache set to ‘none’, unless specifically requested otherwise by the customer.

I’ve reached out to the OP via email shortly after my latest reply here (currently pending his response) -- always happy to dig deeper and curious to do so. Based on our very last discussion as commented above, we’d be happy to see where things are at today within his environment, and take it from there.

davide · November 2022

@Daniel15 said:
Munin runs every 5 minutes, so you'll likely miss any anomalies unless they last longer than a few minutes. I'd recommend upgrading to Netdata instead, which updates its stats every second. I switched a few years ago and would recommend it.

That's a good idea, to monitor memory usage at shorter intervals. My Munin setup is too customized and too integrated with a fleet of other monitored VMs to throw away casually, so for the moment I wrote this Bash script to log memory usage on my Racknerd VM:

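#!/bin/bash
# Log a timestamp and memory usage (in MB) once per second.

while :; do
    date
    free -m
    echo    # blank line between samples

    sleep 1
done >monitor.log    # note: truncates monitor.log on each start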

Right now it's printing values close to these (all values in MB):

Tue 29 Nov 2022 00:33:19 AM CET
              total        used        free      shared  buff/cache   available
Mem:           1995        1427         264           1         302         617
Swap:           510           0         510

So far, with 20 minutes logged, there's almost no variance in memory consumption over time, consistent with the Munin charts.
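Once the log grows, eyeballing it gets impractical. A rough sketch for pulling out the worst sample could look like this; `min_available` is a made-up helper name, and it assumes the log has exactly the layout shown above (a `date` line followed by `free -m` output with an "available" column).

```shell
#!/bin/bash
# min_available is a hypothetical helper: it reads a log made of date lines
# followed by `free -m` output and reports the sample with the least
# "available" memory.
min_available() {
    awk '
        # Any non-empty line that is not part of the free output is the timestamp.
        NF && $1 != "total" && $1 != "Mem:" && $1 != "Swap:" { stamp = $0 }
        # On the Mem: row, column 7 is "available" in this free layout.
        $1 == "Mem:" && (min == "" || $7 + 0 < min) { min = $7 + 0; at = stamp }
        END { if (min != "") printf "min available: %d MB at %s\n", min, at }
    ' "$@"
}

# Example usage:
#   min_available monitor.log
```

If a crash happens, the last few samples before the gap in timestamps are the interesting ones, although the final seconds may never reach the disk after a hard reset.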

If I keep getting crashes that are not justified by memory exhaustion within the VM, I'll find a way to further reduce memory allocations in userspace, until the Racknerd contract expires in a few months. We'll see.
