New on LowEndTalk? Please Register and read our Community Rules.
All new Registrations are manually reviewed and approved, so a short delay after registration may occur before your account becomes active.
Comments
Is that why I read about constant reboot complaints regarding LEBs? Is this most likely due to an OVZ exploit? If that's the most vulnerable virtualization technology, it's probably a good idea for me to stay away..
The .32 OpenVZ kernels are a bit buggier than the .18 ones, and with more changes on each release :-)
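For anyone unsure which branch their node reports, a quick sketch (the version patterns are just the two series mentioned in this thread):

```shell
#!/bin/sh
# Report whether the running kernel is on the OpenVZ .32 or .18 branch.
# 2.6.32-042stab* is the .32 series; 2.6.18-*stab* is the older .18 series.
rel=$(uname -r)
case "$rel" in
  2.6.32-*) echo "$rel: .32 branch" ;;
  2.6.18-*) echo "$rel: .18 branch" ;;
  *)        echo "$rel: not an OpenVZ 2.6.x stab kernel" ;;
esac
```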
Thanks for that info! I need to make sure I avoid getting placed on that kernel then. :P Would love to get on a Prometeus 128MB yearly but from what I've read those are indefinitely out of stock? I didn't get into LEBs till Nov 2012 so I missed a lot of great yearly opportunities from the seemingly small list of high uptime providers.
But once you get the right combo, don't mess with it
BlueVM has been pretty consistent for me, and this is on a $10/year package as far as I'm aware.
Are all your VZ nodes already on .32?
of course:
[root@pm18 ~]# w
07:14:20 up 143 days, 23:20, 1 user, load average: 0.83, 1.19, 1.12
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
Linux pm18.prometeus.net 2.6.32-042stab059.7 #1 SMP Tue Jul 24 19:12:01 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@pm25 ~]# w
07:17:20 up 107 days, 23:29, 1 user, load average: 0.85, 1.28, 1.54
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
Linux pm25.prometeus.net 2.6.32-042stab061.2 #1 SMP Fri Aug 24 09:07:21 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux
[root@pm27 ~]# w
07:20:25 up 94 days, 11:41, 1 user, load average: 0.51, 1.48, 2.60
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
[root@pm17 ~]# w
07:21:08 up 89 days, 8:29, 2 users, load average: 0.69, 0.49, 0.38
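The "up N days" figure that `w` prints in the outputs above comes from the kernel's uptime counter; a tiny Linux-only sketch that computes it directly:

```shell
#!/bin/sh
# Print uptime in whole days, the same figure `w` shows above.
# The first field of /proc/uptime is seconds since boot.
secs=$(cut -d' ' -f1 /proc/uptime | cut -d. -f1)
days=$(( secs / 86400 ))
echo "up $days days"
```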
However, with the exact same hardware/software configuration, there are/were cases of random reboots.
Which are not 100% random, in the sense that it doesn't happen all the time on all nodes; we isolated the problem to a few containers on node 38. After each reboot we halve them, but it will take quite a few more reboots to track the problem down.
One thing is sure: not everything runs stably on OVZ, and some software can crash the kernel. We just need to find out which.
I think the right combination has a bit of luck in it. You need to find the hw/kernel/containers combination that works, and works for long at that. If there were a reboot every hour, we would have found the problem fast; with a reboot every week it is going to take a bit longer. Committing 2 full nodes to a few containers to apply the halving method is not really a good idea when our stock is always low, and putting them on other nodes risks rebooting those too, hindering even more innocent customers. So it is really a hard situation here. This is why Uncle doesn't really like OVZ, but the people have spoken: more than half our VPSes are on OVZ, even with the heavily discounted KVM promo we run now.
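The halving method is essentially a bisection over container IDs. A toy sketch, with made-up IDs, that simulates the crashing container always landing in the half kept on the test node (in reality each round means waiting for the next crash and a real container migration):

```shell
#!/bin/sh
# Bisection over suspect containers: after each crash, keep only the half
# that was running when the node went down, until one suspect remains.
suspects="101 102 103 104 105 106 107 108"

first_half() {
  # print the first half of a whitespace-separated list
  set -- $1
  half=$(( $# / 2 ))
  i=0
  for id in "$@"; do
    i=$(( i + 1 ))
    [ "$i" -le "$half" ] && printf '%s ' "$id"
  done
}

round=1
while [ "$(echo $suspects | wc -w)" -gt 1 ]; do
  echo "round $round: testing [$suspects]"
  # Simulate: the guilty container was in the first half, so keep it.
  suspects=$(first_half "$suspects")
  round=$(( round + 1 ))
done
echo "isolated suspect:" $suspects
```

With 8 suspects this converges in 3 rounds, which matches the complaint above: at one crash per week, even a handful of suspects takes weeks to narrow down.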
Why don't you just redistribute each of the containers you have narrowed it down to across other nodes and then see which one crashes?
Anyway, OpenVZ seems like horrible software when stability is important.
That would mean crashing nodes and customers that are not at fault. It might also be a case where 2 containers interact to create some race condition or something. It would help a lot to be able to look at what they run, but customer privacy is very important, and it is not certain I will get more than a list of "suspects".
E3s? I have zero issues on E5 systems, but E3s are hit or miss.
12:47:11 up 211 days, 1:13, 1 user, load average: 0.00, 0.00, 0.00
Hostigation
I had the reverse experience. All the E5s but one I threw in were plagued by reboots, while of the 14-15 E3s only one is bad and two have rebooted once since they went into production (with more than one month between reboots).
Actually, we had to abandon the E5s due to these lockups. However, we moved the containers from pm14, which we had for the special offer (dual E5, 64 GB RAM, SSD RAID 10), to other E3 nodes with 16 GB RAM, but one of them still crashes, which didn't happen before. I mean, we had quite a few 2xE5s (3 at least) that we had to move people from because of these reboots, but once we moved them to E3s the reboots stopped. This time they didn't, and it is almost certain there is some software incompatibility.
I hope to find the container and see what that software is; hopefully the user will allow access.
dedicated server: 4:02PM up 313 days, 23:58, 2 users, load averages: 0.10, 0.10, 0.08
ec2: 21:02:27 up 300 days, 11:15, 1 user, load average: 0.00, 0.01, 0.05
I just have one LEB (Hostigation) which has 17 days uptime (up since I purchased it), hopefully later this year I can report some nice LEB uptimes. I am getting a 2nd LEB this week or next, using this thread as guidance to pick my 2nd provider. So keep those uptimes coming. :P
I'm glad everything is working out for you
SuperMicro or some other FoxConn failure?
Supermicro. The first locks took the form of filesystem locks (they were ext4 related) and were solved with a kernel upgrade (I don't remember which version), but at that point I had several nodes on ext3.
But at least you had some clue about the problems, some dumps and so on.
Then I had these reboots where you don't see anything. It's as if somebody pressed a reset button. I started thinking about some IPMI vulnerability :-)
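When a box reboots as if someone hit reset, the BMC's event log is worth a look before blaming software. A hedged sketch using ipmitool (guarded, since not every node has it installed, and reading the SEL usually needs root and a working BMC):

```shell
#!/bin/sh
# Dump the tail of the IPMI System Event Log, where a BMC-initiated
# reset or power event would normally be recorded.
if command -v ipmitool >/dev/null 2>&1; then
  ipmitool sel elist 2>/dev/null | tail -n 20
else
  echo "ipmitool not found; skipping SEL check"
fi
```

An empty SEL around the reboot timestamps at least suggests the reset did not come through the BMC.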
I was seeing the same thing on an E3 node. I bricked it while remotely updating the BIOS, since the RAID card firmware didn't like it, and then again with another E3 that is my newest KVM node now.
@prometeus: Have you tried updating the BIOS? A sudden reboot that only happens with specific software/OS configurations could be the result of a processor bug, the kind of issue that Intel routinely fixes with processor microcode updates. That reminds me of a similar issue I fixed with a BIOS update some years ago. The list of known bugs is published by Intel every few months; it is called a "Specification Update". The latest Intel Xeon E5 specification update lists more than 200 bugs: one of the worst I have seen in recent times. By contrast, the Xeon E3 and E3-v2 specification updates are a lot shorter. It may be that the E3 processor is less tested than the E5, but I have some doubts. With so many bugs to patch, a fresh microcode update is crucial.
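On Linux you can at least read the microcode revision the kernel sees, before and after a BIOS or microcode update, straight from /proc/cpuinfo (x86-specific; the field is absent on some platforms, hence the fallback):

```shell
#!/bin/sh
# Show the CPU model and currently loaded microcode revision (x86 Linux).
grep -m1 'model name' /proc/cpuinfo
grep -m1 'microcode' /proc/cpuinfo || echo "microcode revision not exposed here"
```

If the revision does not change after flashing, the update likely did not include newer microcode for that stepping.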
I did, on the latest E5 I put in production, without any difference. Also, the problem is OpenVZ related, as they rock as KVM nodes...
Probably another OpenVZ bug..
Since I have 7 E5 nodes, all with LSI cards, and zero issues, I am curious what you might be doing differently, as it seems we are sharing the same troubles, just with the E3s for me and E5s for yourself. I do agree something is odd with the OVZ kernel; I just haven't seen it produce any trouble for me on E5s. I've been using the SM X9DRL-IF-O motherboard and either the LSI 9260, 9266, or 9271 RAID card.
Must be perception, as VZ is based on the RHEL 2.6.32 kernel, so even Debian underneath is still Red Hat when running VZ.
I tried to talk Uncle into Xen-PV instead of KVM and he didn't want to hear about it. I am sure it has some advantages, and I would definitely do it, as I have done and still do at home and on my colo server, since I have to try a demo cloud setup, but it is not the way Uncle likes it. He is used only to the best hardware and half-empty VMware/RHEV clusters kept ready for big companies and streaming for Italian media.
A node which is over 30% CPU usage is "overloaded" in his view. He really likes to keep spare capacity for abusers so they don't hinder other people, to give them time to sort out their issues, etc.
Imagine how much money the people that oversell are making, if we can survive in this market...
I have been hearing about motherboard incompatibility and instability issues with RHEL-based systems, but I forget which exact models.
These poor unused servers
I like Sal's attitude. ;-)
Agreed. Probably why they were #1 in Q4 2012.
It's really really...
We love you Uncle