OpenVZ Node Slow?
Having some trouble here. SolusVM says the following about my node "openvz1":
18:11:44 up 37 days, 5:24, 1 user, load average: 11.91, 13.58, 12.76
Obviously that is not acceptable. I checked with iotop to see what is going on, and... almost nothing!
Same with top, even though it showed 49.9%wa?
Here's the RAM info about the node: 2.93 GB of 15.57 GB used / 12.64 GB free
Comments
Hi, even though it is probably only showing a few hundred K/s, are there any high percentages in the IO> column? This is often missed.
Ben
Is the node using software RAID?
@BenND not on my computer anymore, I'll have to SSH from my phone to look, so it might take me a minute
@Alex_LiquidHost yes software RAID 1
Is the RAID healthy? (cat /proc/mdstat)
This says it all. 49.9% of the server is waiting for disk i/o to complete. Another way to put it: every other second is spent waiting for disk i/o. Or, 30 seconds out of every minute.
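If you want to see which disk the wait is sitting on, iostat is handy (a quick sketch; iostat comes from the sysstat package, which may not be on a default install):
iostat -x 1        # extended per-device stats, refreshed every second
                   # watch the %util and await columns for the saturated disk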
@MiguelQ what should it say?
Something like this:
md0 : active raid1 sda1[0] sdb1[1]
      131060 blocks super 1.2 [2/2] [UU]
yeah it says that twice.
Do you have smartmontools installed? If so, check both disks for errors. While you are at it, check dmesg output for errors as well.
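Something along these lines should do it (assuming a CentOS-based node, which the RHEL6 OpenVZ kernel string later in the thread suggests):
yum install -y smartmontools          # skip if it's already there
smartctl -H /dev/sda                  # quick overall health verdict
smartctl -H /dev/sdb
dmesg | grep -iE 'error|ata|fail'     # kernel-side disk complaints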
I don't know. When I'm at home I'll find and install those things.
Or I could just call OVH; I'm sure they could fix it.
If it's an OVH default installation, smartctl is probably blocked (no permissions), so you'll have to fix the permissions first.
OK, well, I rebooted it from the OVH Manager, and now my load is at 6, which is more manageable IMO than 15. But there is still something wrong... it looks to me like it's rebuilding the RAID array? Does that mean a drive is bad?
Try doing:
mdadm --detail /dev/md2
It will give you information on what it's doing.
Got an error three times before I realized I was typing that into the wrong PuTTY window, the one for the new ipxcore VPS I just got. lol...
[root@openvz1 ~]# mdadm --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Sun Dec 23 12:33:14 2012
Raid Level : raid1
Array Size : 1932012480 (1842.51 GiB 1978.38 GB)
Used Dev Size : 1932012480 (1842.51 GiB 1978.38 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 2
Persistence : Superblock is persistent
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Resync Status : 4% complete
[root@openvz1 ~]#
I don't blame you for asking for help... but you have 49.9% wa and you are asking what's wrong....
@Corey: I assumed that was abnormal, we are just trying to find the root cause of it so my clients can be happy. Do you know if my RAID issue might have anything to do with it? Thank you.
I thought that was pretty obvious?
Yes, but what I'm trying to understand is: should I get OVH to put in a new disk, or is it normal to rebuild?
If the rebuild succeeds, the disks should be fine. If you rebooted uncleanly (not from the console) I'm not surprised it's resyncing.
Either way, you just have to wait out the resync. Probably an hour's wait, that's about it. If iowait is still high after that, your RAID1 isn't cutting it.
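If you want to keep an eye on it while you wait, something like this works:
watch -n 10 cat /proc/mdstat    # shows percent complete and md's own finish estimate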
You might want to increase your resync speed.
You're currently resyncing at 11 MB/s; at that rate, it says it will finish in 2 days lol.
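Roughly like this; the exact values are just a guess at what your disks can spare (the kernel defaults are 1000 min / 200000 max, in KB/s):
echo 50000 > /proc/sys/dev/raid/speed_limit_min     # raise the floor so md stops throttling itself
echo 500000 > /proc/sys/dev/raid/speed_limit_max    # optionally raise the ceiling too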
That's OK, it can take two days, I don't care.
I rebooted from OVH Manager.
Well, I guess SMART is fine, because:
[root@openvz1 ~]# smartctl -H /dev/sda
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-042stab068.8] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
[root@openvz1 ~]# smartctl -H /dev/sdb
smartctl 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-042stab068.8] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
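Side note: -H only prints the drive's overall verdict. If you want the actual attributes, something like this pulls out the usual early-failure indicators (reallocated, pending, and uncorrectable sector counts):
smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrect'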
mdadm resyncing is 'normal', but an annoying consequence of unclean shutdowns, and it will strike at the worst times.
Here's the top header for a server that we're considering full:
This server is running a really good hardware RAID card, the LSI MegaRAID SAS 9260-4i. You may want to consider including a hardware RAID card in your next server; it's the difference between night and day.
@Damian: yeah the VPS i have with you is performing very well
Does that RAID card need a BBU?
Yes. Newer cards, such as the 9266, can use either a standard BBU, or the "CacheVault", which is a slightly-more-expensive cache that uses flash memory instead of a BBU to keep DRAM alive: http://www.lsi.com/channel/products/storagecomponents/Pages/MegaRAIDSAS9266-4i.aspx
The CacheVault is more expensive initially, but pays for itself when you don't have to replace BBUs anymore.
@Damian What are the rest of the specs on that server? I'm guessing Xeon E5 with 32 or 64GB RAM?
Also what types of disks do you prefer?
I mean, honestly, right now I'm stuck with software RAID1 or RAID0 (which I would never use) on OVH. I could do software RAID10 with DataShack or WholeSaleInternet... but then I'd have to increase my prices.
Why on earth would he speed up the rebuild rate? It's already causing performance issues; speeding it up would just make performance worse for the end users. Either take the node down and let it rebuild at full speed, or let it crawl along as slowly as you can stomach and minimize the impact.
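If you go the minimize-impact route, the same knob works in the other direction (value in KB/s; pick whatever your users can tolerate):
echo 5000 > /proc/sys/dev/raid/speed_limit_max    # cap the resync at ~5 MB/s while guests run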
I took down a couple of VPSes that were using the most resources for a short while, and the resync jumped ahead about 5%, nicely. But I booted them back up and I'm waiting it out for now. If anybody complains about performance, I'll happily give them some account credit or something.
I shut down most of the VPSes on the node, the rebuild progressed significantly more quickly, and now everybody's back up with great performance.
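For reference, a quick dd write test now gives the result below (the thread only shows the output line; this is the classic command that matches the 1 GiB figure, so treat it as an assumption):
dd if=/dev/zero of=testfile bs=1M count=1024 conv=fdatasync    # assumed command, not shown in the thread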
1073741824 bytes (1.1 GB) copied, 7.81418 s, 137 MB/s
Thanks to everybody who contributed to the thread, and thanks to my customers for waiting through the period of poor performance.