Help with OVH/Kimsufi Server Issues (Randomly restarting)
I've been having issues with my OVH server (which I've had for 4 years) randomly rebooting. Well, I say rebooting, but when I check the logs I can't see it shutting down, only starting up, unless I'm missing something obvious.
At one stage a few months back the server went offline and wouldn't start up, so I had to rebuild it. The issue has continued since the rebuild, though, which makes me think it isn't software related.
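A startup with no matching shutdown in the logs usually points to a hard reset or power loss rather than an OS-initiated reboot. The following is only a rough sketch of how that could be counted, assuming `last -x` (util-linux) is available and the wtmp file has not been rotated; the function name `count_unclean_boots` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `last -x shutdown reboot` output on stdin
# (newest entry first) and counts boots that have no shutdown record
# between them and the boot before, i.e. the earlier session ended abruptly.
count_unclean_boots() {
  awk '
    $1 == "reboot"   { if (prev == "reboot") n++; prev = "reboot" }
    $1 == "shutdown" { prev = "shutdown" }
    END { print n + 0 }
  '
}

# Intended use on the affected server (not executed here):
#   last -x shutdown reboot | count_unclean_boots
```

On a systemd machine with a persistent journal, `journalctl --list-boots` and `journalctl -b -1 -e` should also show where the previous boot's log simply stops.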
I reported the issues to OVH and they went on to check the power and said that they replaced the power supply and also the 'cooling system'. Unfortunately the problem still persists.
They said that they would run 'extensive diagnostics' which would result in the server being down for 'several hours'. For whatever reason they completed this test in just under 6 minutes. I'm not convinced they did any real diagnostics in this short period of time.
I was just looking for any guidance, logs to check or suggestions on what may be causing this or what I can do to narrow down the issue. I've been looking at it on and off for so long now that I may just be missing something obvious.
Normally I would just take out a new OVH server and move everything across, but despite having the server for 4 years and renewing monthly, I recently paid upfront for 12 months when the price increase came along to secure the lower price. Talk about bad timing!!
Any suggestions would be greatly appreciated.
Thanks in advance.
Comments
Which server model, and did you check the temperatures yourself?
OVH has a guide for Hardware diagnostics
https://help.ovhcloud.com/csm/en-dedicated-servers-hardware-diagnostics?id=kb_article_view&sysparm_article=KB0043506
Hope it helps
OVH has been the most pathetic provider for the last 2 years.
Read their reviews for ovh.it: https://www.trustpilot.com/review/ovh.it
They removed their site ovh.com from Trustpilot because it had the worst reviews there.
It is the Kimsufi KS-10 with an Intel i5-2400 @ 3.10GHz, 16GB RAM and 2TB HDD.
Yes, I checked the temperatures and everything seemed OK. Here are the results using lm_sensors:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +40.0°C (high = +80.0°C, crit = +99.0°C)
Core 0: +30.0°C (high = +80.0°C, crit = +99.0°C)
Core 1: +33.0°C (high = +80.0°C, crit = +99.0°C)
Core 2: +40.0°C (high = +80.0°C, crit = +99.0°C)
Core 3: +35.0°C (high = +80.0°C, crit = +99.0°C)
nct6775-isa-0290
Adapter: ISA adapter
Vcore: +1.21 V (min = +0.00 V, max = +1.74 V)
in1: +1.10 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.30 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.30 V (min = +2.98 V, max = +3.63 V)
in4: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +1.03 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +1.06 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.36 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.31 V (min = +2.70 V, max = +3.63 V)
fan1: 0 RPM (min = 0 RPM, div = 128)
fan2: 0 RPM (min = 0 RPM, div = 128)
fan3: 0 RPM (min = 0 RPM, div = 128)
fan4: 0 RPM (div = 128)
SYSTIN: +31.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = CPU diode
CPUTIN: +29.0°C (high = +80.0°C, hyst = +75.0°C) sensor = CPU diode
AUXTIN: -128.0°C (high = +80.0°C, hyst = +75.0°C) sensor = CPU diode
PCH_CHIP_TEMP: +49.0°C
PECI Agent 1: +0.0°C (high = +80.0°C, hyst = +75.0°C)
PECI Agent 0: +36.0°C (high = +80.0°C, hyst = +75.0°C)
cpu0_vid: +0.000 V
intrusion0: OK
If it passes the rescue mode diagnostics then ask nicely for a replacement motherboard
+1, as I read the thread it was my first thought too.
@happyman How much do you pay for this KS-10 service? Is traffic capped at 100 Mbps in all directions?
Thanks, I'll give that a go and see if they are helpful and willing to do that without me providing proof that it is the cause.
€15.59 per month including tax.
@happyman you still around?
On my server (same specs as yours) I suddenly got all sorts of failures in dmesg (USB, SATA, Ethernet). Then it developed high packet loss and stopped responding to pings entirely. They did an intervention and, according to their report, replaced the motherboard. That fixed all of it.
Except that with the new motherboard it now suddenly reboots, so far about once a day.
How did the issue end for you? I feel there might be a comical situation here: they replaced your motherboard, put it through some shallow testing, "hum, seems fine", and put it into the stockpile of spare motherboards to install into other servers. And now I may have got yours.
@rm_
I struggled to get them to replace the motherboard, but they did in the end (only about a week ago). Unfortunately the system has rebooted a few times since then, so I'm not entirely sure what else it could be apart from the actual power cabling or supply coming into the server.
Did you try to correlate it with CPU load (or idling)?
One guess I had is that the VRM, at some point in a power-saving mode, could be supplying too low a voltage to the CPU, causing it to reboot (putting aside the root causes of that).
If it happens often enough for you, first you can try disabling the frequency switching and see if that helps:
for I in {0..3}; do cpufreq-set -g performance -c $I; done
If not, then add some actual CPU load at the lowest priority and see if that affects things:
nice -n 19 dd if=/dev/zero | nice -n 19 md5sum & disown
On mine, after two reboots a day apart, I have not had any more so far (so I have not tried these workarounds yet).
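To correlate reboots with load or frequency state, one option is to log a small sample every few seconds and flush it to disk, so that the last line before a reboot survives. This is only a sketch: the function name `sample_line` and the log path are made up, and the cpufreq sysfs file is assumed to exist (the function prints `NA` if it does not).

```shell
#!/bin/sh
# Hypothetical helper: print one sample line with the epoch time,
# the 1-minute load average and the current cpu0 frequency in kHz.
# Both file paths can be overridden, which makes the function testable.
sample_line() {
  loadavg_file=${1:-/proc/loadavg}
  freq_file=${2:-/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq}
  read -r load1 rest < "$loadavg_file"
  if [ -r "$freq_file" ]; then
    read -r freq < "$freq_file"
  else
    freq=NA
  fi
  printf '%s load1=%s cpu0_khz=%s\n' "$(date +%s)" "$load1" "$freq"
}

# Intended use on the affected server (not executed here):
#   while :; do sample_line >> /root/reboot-probe.log; sync; sleep 10; done
```

After the next unexpected reboot, the last few lines of the log show what the load and clock speed were shortly before the machine went down.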
This issue is still occurring. It doesn't seem to be related to CPU load or idling.
I'm at a loss as to what else to try and what else to suggest OVH swap out.
I've asked if I can get a new server from them and transfer the prepaid amount over, but their billing department says it isn't possible.
When I had issues similar to this in the past (many, many years ago, using the old i7s) I had motherboards replaced and it worked fine afterwards. I only had that happen with their old SYS Intel Core model boards.