New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
Is that why I read about constant reboot complaints regarding LEBs? Is this most likely due to an OVZ exploit? If that's the most vulnerable virtualization technology, it's probably a good idea for me to stay away..
The .32 OpenVZ kernels are a bit buggier than the .18 ones, and with more changes on each release :-)
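For anyone unsure which branch their node reports, a quick sketch (the version patterns are just the two series mentioned in this thread):

```shell
#!/bin/sh
# Report whether the running kernel is on the OpenVZ .32 or .18 branch.
# 2.6.32-042stab* is the .32 series; 2.6.18-*stab* is the older .18 series.
rel=$(uname -r)
case "$rel" in
  2.6.32-*) echo "$rel: .32 branch" ;;
  2.6.18-*) echo "$rel: .18 branch" ;;
  *)        echo "$rel: not an OpenVZ 2.6.x stab kernel" ;;
esac
```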
Thanks for that info! I need to make sure I avoid getting placed on that kernel then. :P Would love to get on a Prometeus 128MB yearly but from what I've read those are indefinitely out of stock? I didn't get into LEBs till Nov 2012 so I missed a lot of great yearly opportunities from the seemingly small list of high uptime providers.
But once you get the right combo, don't mess with it
BlueVM has been pretty consistent for me, and this is on a $10/year package as far as I'm aware.
Are all your VZ nodes already on .32?
of course:
[root@pm18 ~]# w
07:14:20 up 143 days, 23:20, 1 user, load average: 0.83, 1.19, 1.12
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
Linux pm18.prometeus.net 2.6.32-042stab059.7 #1 SMP Tue Jul 24 19:12:01 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@pm25 ~]# w
07:17:20 up 107 days, 23:29, 1 user, load average: 0.85, 1.28, 1.54
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
Linux pm25.prometeus.net 2.6.32-042stab061.2 #1 SMP Fri Aug 24 09:07:21 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@pm27 ~]# w
07:20:25 up 94 days, 11:41, 1 user, load average: 0.51, 1.48, 2.60
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
[root@pm17 ~]# w
07:21:08 up 89 days, 8:29, 2 users, load average: 0.69, 0.49, 0.38
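The "up N days" figure that `w` prints in the outputs above comes from the kernel's uptime counter; a tiny Linux-only sketch that computes it directly:

```shell
#!/bin/sh
# Print uptime in whole days, the same figure `w` shows above.
# The first field of /proc/uptime is seconds since boot.
secs=$(cut -d' ' -f1 /proc/uptime | cut -d. -f1)
days=$(( secs / 86400 ))
echo "up $days days"
```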
However, with the exact same hardware/software configuration, there are/were cases of random reboots.
Which are not 100% random, in the sense that it doesn't happen all the time on all nodes; we isolated the problem to a few containers on node 38. After each reboot we halve them, but it will take quite a few more reboots to track the problem down.
One thing is sure: not everything runs stably on OVZ, and some software can crash the kernel. We just need to find out which.
I think the right combination has a bit of luck in it. You need to find the hw/kernel/containers combination that works, and works for long at that. If there were a reboot every hour, we would have found the problem fast; with a reboot every week it is going to take a bit longer. Committing 2 full nodes to a few containers to apply the halving method is not really a good idea when our stock is always low, and putting them on other nodes risks rebooting those too, hindering even more innocent customers. So it is really a hard situation here. This is why Uncle doesn't really like OVZ, but the people have spoken: more than half our VPSes are on OVZ, even with the heavily discounted KVM promo we run now.
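The halving method is essentially a bisection over container IDs. A toy sketch, with made-up IDs, that simulates the crashing container always landing in the half kept on the test node (in reality each round means waiting for the next crash and a real container migration):

```shell
#!/bin/sh
# Bisection over suspect containers: after each crash, keep only the half
# that was running when the node went down, until one suspect remains.
suspects="101 102 103 104 105 106 107 108"

first_half() {
  # print the first half of a whitespace-separated list
  set -- $1
  half=$(( $# / 2 ))
  i=0
  for id in "$@"; do
    i=$(( i + 1 ))
    [ "$i" -le "$half" ] && printf '%s ' "$id"
  done
}

round=1
while [ "$(echo $suspects | wc -w)" -gt 1 ]; do
  echo "round $round: testing [$suspects]"
  # Simulate: the guilty container was in the first half, so keep it.
  suspects=$(first_half "$suspects")
  round=$(( round + 1 ))
done
echo "isolated suspect:" $suspects
```

With 8 suspects this converges in 3 rounds, which matches the complaint above: at one crash per week, even a handful of suspects takes weeks to narrow down.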
Why don't you just redistribute each of the containers you have narrowed it down to across other nodes and then see which one crashes?
Anyway, OpenVZ seems like horrible software when stability is important.
That would mean crashing nodes and customers that are not at fault. It might also be a case where 2 containers interact to create some race condition or something. It would help a lot to be able to look at what they run, but customer privacy is very important, and it is not certain I will get more than a list of "suspects".
E3s? I have zero issues on E5 systems, but E3s are hit or miss.
12:47:11 up 211 days, 1:13, 1 user, load average: 0.00, 0.00, 0.00
Hostigation
I had the reverse experience. All the E5s but one I threw in were plagued by reboots, while of the 14-15 E3s only one is bad and two have rebooted once since they went into production (with more than one month between reboots).
Actually, we had to abandon the E5s due to these lockups. However, we moved the containers from pm14, which we had for the special offer (dual E5, 64 GB RAM, SSD RAID 10), to other E3 nodes with 16 GB RAM, but one of them still crashes, which didn't happen before. I mean, we had quite a few 2xE5s (3 at least) that we had to move people from because of these reboots, but once we moved them to E3s the reboots stopped. This time they didn't, and it is almost certain there is some software incompatibility.
I hope to find the container and see what that software is; hopefully the user will allow access.
dedicated server: 4:02PM up 313 days, 23:58, 2 users, load averages: 0.10, 0.10, 0.08
ec2: 21:02:27 up 300 days, 11:15, 1 user, load average: 0.00, 0.01, 0.05
I just have one LEB (Hostigation) which has 17 days uptime (up since I purchased it), hopefully later this year I can report some nice LEB uptimes. I am getting a 2nd LEB this week or next, using this thread as guidance to pick my 2nd provider. So keep those uptimes coming. :P
I'm glad everything is working out for you
SuperMicro or some other FoxConn failure?
Supermicro. The first locks took the form of filesystem locks (they were ext4 related) and were solved with a kernel upgrade (I don't remember which version), but at that point I had several nodes on ext3.
But at least you had some clue about the problems, some dumps and so on.
Then I had these reboots where you don't see anything. It's as if somebody pressed a reset button. I started thinking about some IPMI vulnerability :-)
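When a box reboots as if someone hit reset, the BMC's event log is worth a look before blaming software. A hedged sketch using ipmitool (guarded, since not every node has it installed, and reading the SEL usually needs root and a working BMC):

```shell
#!/bin/sh
# Dump the tail of the IPMI System Event Log, where a BMC-initiated
# reset or power event would normally be recorded.
if command -v ipmitool >/dev/null 2>&1; then
  ipmitool sel elist 2>/dev/null | tail -n 20
else
  echo "ipmitool not found; skipping SEL check"
fi
```

An empty SEL around the reboot timestamps at least suggests the reset did not come through the BMC.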
I was seeing the same thing on an E3 node. I bricked it while remotely updating the BIOS, since the RAID card firmware didn't like it, and then again with another E3 that is my newest KVM node now.
@prometeus: Have you tried updating the BIOS? A sudden reboot that only happens with specific software/OS configurations could be the result of a processor bug, the kind of issue that Intel routinely fixes with processor microcode updates. That reminds me of a similar issue I fixed with a BIOS update some years ago. The list of known bugs is published by Intel every few months; it is called a "Specification Update". The latest Intel Xeon E5 specification update lists more than 200 bugs: one of the worst I have seen in recent times. By contrast, the Xeon E3 and E3-v2 specification updates are a lot shorter. It may be that the E3 processor is less tested than the E5, but I have some doubts. With so many bugs to patch, a fresh microcode update is crucial.
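On Linux you can at least read the microcode revision the kernel sees, before and after a BIOS or microcode update, straight from /proc/cpuinfo (x86-specific; the field is absent on some platforms, hence the fallback):

```shell
#!/bin/sh
# Show the CPU model and currently loaded microcode revision (x86 Linux).
grep -m1 'model name' /proc/cpuinfo
grep -m1 'microcode' /proc/cpuinfo || echo "microcode revision not exposed here"
```

If the revision does not change after flashing, the update likely did not include newer microcode for that stepping.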
I did, on the latest E5 I put in production, without any difference. Also, the problem is OpenVZ related, as they rock as KVM nodes...
Probably another OpenVZ bug..
Since I have 7 E5 nodes, all with LSI cards, and zero issues, I am curious what you might be doing differently, as it seems we are sharing the same troubles, just with the E3s for me and E5s for yourself. I do agree something is odd with the OVZ kernel; I just haven't seen it produce any trouble for me on E5s. I've been using the SM X9DRL-IF-O motherboard and either the LSI 9260, 9266, or 9271 RAID card.
Must be perception, as VZ is based on the RHEL 2.6.32 kernel, so even Debian underneath is still Red Hat when running VZ.
I tried to talk Uncle into Xen-PV instead of KVM and he didn't want to hear about it. I am sure it has some advantages, and I would definitely do it, as I have done and still do at home and on my colo server, since I have to try a demo cloud setup, but it is not the way Uncle likes it. He is used only to the best hardware and half-empty VMware/RHEV clusters kept ready for big companies and streaming for Italian media.
A node which is over 30% CPU usage is "overloaded" in his view. He really likes to keep spare capacity for abusers so they don't hinder other people, to give them time to sort out their issues, etc.
Imagine how much money the people that oversell are making, if we can survive in this market...
I have been hearing about motherboard incompatibility and instability issues with RHEL-based systems, but I forget which exact models.
These poor unused servers
I like Sal's attitude. ;-)
Agreed. Probably why they were #1 in Q4 2012.
It's really really...
We love you Uncle