OpenVZ Node Slow?
Having some trouble here. SolusVM says the following about my node "openvz1":
18:11:44 up 37 days, 5:24, 1 user, load average: 11.91, 13.58, 12.76
Obviously that is not acceptable. I checked with iotop to see what is going on, and... almost nothing!
Same with top, even though it showed 49.9%wa?
Here's the RAM info about the node: 2.93 GB of 15.57 GB used / 12.64 GB free
Comments
Hi, even though it is probably only showing a few hundred K/s, are there any high percentages in the IO> column? This is often missed.
Ben
Is the node using software RAID?
@BenND not on my computer anymore, I'll have to SSH from my phone to look, so it might take me a minute
@Alex_LiquidHost yes software RAID 1
Is the RAID healthy? (cat /proc/mdstat)
This says it all. 49.9% of the server is waiting for disk i/o to complete. Another way to put it: every other second is spent waiting for disk i/o. Or, 30 seconds out of every minute.
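If you want to see which disk the wait is sitting on, iostat is handy (a quick sketch; iostat comes from the sysstat package, which may not be on a default install):
iostat -x 1        # extended per-device stats, refreshed every second
                   # watch the %util and await columns for the saturated disk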
@MiguelQ what should it say?
Something like this:
md0 : active raid1 sda1[0] sdb1[1]
      131060 blocks super 1.2 [2/2] [UU]
yeah it says that twice.
Do you have smartmontools installed? If so, check both disks for errors. While you are at it, check dmesg output for errors as well.
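Something along these lines should do it (assuming a CentOS-based node, which the RHEL6 OpenVZ kernel string later in the thread suggests):
yum install -y smartmontools          # skip if it's already there
smartctl -H /dev/sda                  # quick overall health verdict
smartctl -H /dev/sdb
dmesg | grep -iE 'error|ata|fail'     # kernel-side disk complaints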
I don't know. When I'm at home I'll find and install those things.
Or I could just call OVH; I'm sure they could fix it.
If it's an OVH default installation, smartctl is probably blocked (no permissions), so you'll have to fix the permissions first.
OK, well, I rebooted it from the OVH Manager, and now my load is at 6, which is more manageable IMO than 15. But there is still something wrong... it looks to me like it's rebuilding the RAID array? Does that mean a drive is bad?
Try doing:
mdadm --detail /dev/md2
It will give you information on what it's doing.
Got an error three times before I realized I was typing that into the wrong PuTTY window, the one for the new ipxcore VPS I just got. lol...
[root@openvz1 ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Sun Dec 23 12:33:14 2012
Raid Level : raid1
Array Size : 1932012480 (1842.51 GiB 1978.38 GB)
Used Dev Size : 1932012480 (1842.51 GiB 1978.38 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Resync Status : 4% complete
[root@openvz1 ~]#
I don't blame you for asking for help... but you have 49.9% wa and you are asking what's wrong....
@Corey: I assumed that was abnormal, we are just trying to find the root cause of it so my clients can be happy. Do you know if my RAID issue might have anything to do with it? Thank you.
I thought that was pretty obvious?
Yes, but what I'm trying to understand is: should I get OVH to put in a new disk, or is it normal to rebuild?
If the rebuild succeeds, the disks should be fine. If you rebooted uncleanly (not from the console) I'm not surprised it's resyncing.
Either way, you just have to wait out the resync. Probably an hour's wait, that's about it. If iowait is still high after that, your RAID1 isn't cutting it.
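If you want to keep an eye on it while you wait, something like this works:
watch -n 10 cat /proc/mdstat    # shows percent complete and md's own finish estimate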
You might want to increase your resync speed.
You're currently resyncing at 11 MB/s; at that rate, it says it will finish in 2 days lol.
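Roughly like this; the exact values are just a guess at what your disks can spare (the kernel defaults are 1000 min / 200000 max, in KB/s):
echo 50000 > /proc/sys/dev/raid/speed_limit_min     # raise the floor so md stops throttling itself
echo 500000 > /proc/sys/dev/raid/speed_limit_max    # optionally raise the ceiling too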
That's OK, it can take two days, I don't care.
I rebooted from OVH Manager.
Well, I guess SMART is fine, because:
[root@openvz1 ~]# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-042stab068.8] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@openvz1 ~]# smartctl -H /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-042stab068.8] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
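Side note: -H only prints the drive's overall verdict. If you want the actual attributes, something like this pulls out the usual early-failure indicators (reallocated, pending, and uncorrectable sector counts):
smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrect'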
mdadm resyncing is 'normal', but an annoying consequence of unclean shutdowns, and it will strike at the worst times.
Here's the top header for a server that we're considering full:
This server is running a really good hardware RAID card, the LSI MegaRAID SAS 9260-4i. You may want to consider including a hardware RAID card in your next server; it's the difference between night and day.
@Damian: yeah the VPS i have with you is performing very well
Does that RAID card need a BBU?
Yes. Newer cards, such as the 9266, can use either a standard BBU, or the "CacheVault", which is a slightly-more-expensive cache that uses flash memory instead of a BBU to keep DRAM alive: http://www.lsi.com/channel/products/storagecomponents/Pages/MegaRAIDSAS9266-4i.aspx
The CacheVault is more expensive initially, but pays for itself when you don't have to replace BBUs anymore.
@Damian What are the rest of the specs on that server? I'm guessing Xeon E5 with 32 or 64GB RAM?
Also what types of disks do you prefer?
I mean, honestly, right now I'm stuck with software RAID1 or RAID0 (which I would never use) on OVH. I could do software RAID10 with DataShack or WholeSaleInternet... but then I'd have to increase my prices.
Why on earth would he speed up the rebuild rate? It's already causing performance issues; speeding it up would just make performance worse for the end users. Either take the node down and let it rebuild at full speed, or let it crawl along as slowly as you can stomach and minimize the impact.
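If you go the minimize-impact route, the same knob works in the other direction (value in KB/s; pick whatever your users can tolerate):
echo 5000 > /proc/sys/dev/raid/speed_limit_max    # cap the resync at ~5 MB/s while guests run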
I took down a couple of VPSes that were using the most resources for a short while, and the resync jumped ahead about 5%, nicely. But I booted them back up and I'm waiting it out for now. If anybody complains about performance, I'll happily give them some account credit or something.
I shut down most of the VPSes on the node, the rebuild progressed significantly more quickly, and now everybody's back up with great performance.
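For reference, a quick dd write test now gives the result below (the thread only shows the output line; this is the classic command that matches the 1 GiB figure, so treat it as an assumption):
dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync    # assumed command, not shown in the thread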
1073741824 bytes (1.1 GB) copied, 7.81418 s, 137 MB/s
Thanks to everybody who contributed to the thread, and thanks to my customers for waiting through the period of poor performance.