New on LowEndTalk? Please Register and read our Community Rules.
Slow KVM I/O speeds when host node much faster (virtio)
I'm running a single CentOS 6 virtual machine on a SolusVM KVM hostnode using virtio.
On the hostnode, I can get (elevator: deadline):
# dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 5.38596 s, 199 MB/s
but on the VM, I only get (elevator: noop):
root@grasshopper [~]# dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 18.4232 s, 58.3 MB/s
I can actually feel the sluggishness within the VM itself...
The hostnode has 4 x 1TB in RAID-10 with hardware RAID + BBU. In your experience, does 199 MB/s sound reasonable for a hard-drive-only array?
Does anyone know why this is, and what I can do to fix it? Perhaps the LVM is misaligned? If so, how can I tackle that problem?
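One quick sanity check for misalignment is to verify that the partition's start sector is a multiple of the array's full stripe. This is a sketch with hypothetical numbers — substitute your own start sector from `fdisk -l` and the strip (chunk) size from the controller config; a 4-disk RAID-10 has 2 data disks per stripe:

```shell
# Hypothetical values: read the real start sector from `fdisk -l` and the
# per-disk strip size from the RAID controller's settings.
start_sector=2048                               # partition start, in 512-byte sectors
strip_kib=64                                    # per-disk strip (chunk) size in KiB
stripe_sectors=$((2 * strip_kib * 1024 / 512))  # full stripe (2 data disks), in sectors

if [ $((start_sector % stripe_sectors)) -eq 0 ]; then
    echo "aligned"
else
    echo "misaligned"
fi
```

If the start sector is not a multiple of the stripe, every guest write can straddle two stripes and turn into read-modify-write on the array.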
Thanks in advance,
Comments
Check your RAID to see if it's OK, and maybe rebuild the RAID mirrors.
Yea that makes no sense at all
The host node array is healthy. It gets ~200 MB/s writes, which I think is reasonable? I'm not too sure though, as I usually deal with RAID-6 arrays with 6+ drives. Perhaps someone can correct me, or comment on that.
And yes, I'm not sure why I'm seeing such a big discrepancy when there's a SINGLE virtual machine running on the host node with no load, as I'm still testing the performance.
@Spencer it's 5am here so I might have worded it wrong.
I created an LV, mounted it directly on the host node itself, and ran the test:
So it looks like the bottleneck is the LVM/PV layer, but I'm not sure how to fix it.
The partition table looks aligned to me:
data_alignment_detection = 1
is also set in /etc/lvm/lvm.conf.
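For the LVM layer itself, the on-disk data offset of each PV can be inspected directly; `pe_start` should also land on a stripe boundary. A sketch of the check (needs root and an LVM2 setup):

```shell
# Show where LVM2 starts laying out data on each physical volume.
# pe_start should be a multiple of the array's full stripe size.
pvs -o +pe_start --units k
```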
Software raid?
Is your LVM's PE Size set to 32M?
+1 on this
Check 'vgdisplay' and check PE size like others mentioned!
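For reference, a sketch of that check (run as root on the host node):

```shell
# Print the physical extent size for every volume group.
vgdisplay | grep "PE Size"
```

Worth noting: the PE size is only LVM's allocation granularity, not part of the I/O path, so on its own it is unlikely to explain a 3-4x throughput gap.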
What RAID card & drives are you using? Maybe the BBU is charging? 199 MB/s still seems low with a RAID controller and a working BBU.
4 x 1TB in HW RAID = 200MB/s is quite bad. You can achieve that with SW RAID.
HW RAID's main advantage is consistency, and the controller you're using doesn't look like a performance controller.
But I have never used KVM so it may just be KVM, or your partitioning on the HN.
Tell us more and we can probably give better answers.
I would say something must be afoul in your setup. I have an old Intel 5450 (32GB RAM) with a degraded hardware RAID-10 array of 3 x 10k RPM SAS disks (one drive died last week), with Xen as the hypervisor. On an Ubuntu 12 VM (sharing the host with 6 other VMs), I am seeing about 120-130 MB/s using dd on this degraded RAID-10 array. The host does 300+ MB/s with dd.
I also have an older 5340 (8GB RAM) machine with a single 7200 RPM drive running CentOS 6 and QEMU/KVM. I just installed a Debian 5 virtual machine, and dd shows about 115-125 MB/s. The host machine shows 450 MB/s via dd, with a single disk and one VM running.
I am thinking that a RAID-10 setup with 7200 RPM drives should be speedier than 199 MB/s on the host and 60 MB/s in a VM. I have seen vibration cause MAJOR I/O issues. The company I worked for bought cheap servers and was having VERY slow I/O on some RAID-10 and RAID-6 setups. After lots of troubleshooting, it turned out the cage that holds the disks was flimsy and needed reinforcement. The company that sold us these cheap servers came out and drilled holes and added screws to keep the disk cage from vibrating so much.
This helped a lot; it more than doubled the I/O of the machines, to acceptable levels. If you're using CHEAP servers, this is something to think about: sometimes these machines are not tested too well, and you end up being the beta tester for a shoddy chassis design.
Video that I saw once about vibration and IO performance.
Otherwise, I would explore your raid controller and disk firmware to see if there is anything that can be done to solve the root cause of your slow disk i/o.
What is your default caching? Change it to write back.
Oh no, it's set to 4MB.
I tried changing it, but it's not letting me:
Also, this server has an older LSI 8888-ELP:
Probably a bad BBU that's disabling the write-back cache...
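If the card is managed by MegaCli (the LSI 8888-ELP is a MegaRAID SAS controller), a sketch of checking and fixing this follows. The binary name and path vary by distro, and forcing write-back with a bad BBU risks data loss on power failure:

```shell
# Check battery state and the current logical-drive cache policy.
MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL
MegaCli64 -LDGetProp -Cache -LALL -aALL

# Re-enable write-back once the BBU is healthy again.
MegaCli64 -LDSetProp WB -LALL -aALL

# Dangerous: keep write-back even with a bad/charging BBU
# (data loss on power cut) — only as a temporary measure.
# MegaCli64 -LDSetProp CachedBadBBU -LALL -aALL
```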
Also, thank you to everyone for the help so far, even though I forgot to mention that this server is only for personal use and I won't be providing VMs to anyone (I'm not a competitor!)
@OttoYiu Guest cache, not the LVM. Use writeback caching.
How much do you achieve with hardware RAID?
delete
??
If caching is enabled, you are looking at 250+, but with regular RAID, nothing special.
It's not; the sales/marketing point of the controller he says he's using is large external SATA connections. It's not the best controller for internal drives, but it's still much better than software RAID.
I asked the datacenter to replace the BBU for this particular server.
I'm still having trouble setting the PE size though. Do I have to recreate the VG? Also, how does the PE size affect performance?
I use this card for several large RAID-6 arrays, and they work pretty well. This is my first time using this card for a small RAID-10 however.
Yes, it's a great card for large arrays, but it's kinda unspectacular for smaller arrays. But the price was probably right, and it still works better than software RAID
So I was reading up on the PE size, and I couldn't find any references regarding performance hits with a small PE size and LVM2. Can anyone point me in the right direction?
Your speed will improve once you update your guest caching to writeback.
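On a libvirt-managed KVM host this maps to the disk driver's `cache` attribute. A sketch of checking the current mode — the domain name `kvm101` is only an example, substitute your own:

```shell
# Show the current cache mode of the guest's disks.
virsh dumpxml kvm101 | grep -o "cache='[^']*'"

# The disk element should end up looking like:
#   <driver name='qemu' type='raw' cache='writeback'/>
# It can be changed with `virsh edit kvm101` (takes effect after a guest restart).
```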
So, the BBU has finished its learning cycle and writeback is now enabled on the hostnode:
On a LV mounted on the host node:
On the VM (with guest writeback enabled, as @NHRoel suggested):
Why is there still such a big discrepancy between the three?
Did you ever manage to get a handle on this?
Nope, I still have no idea what causes the discrepancy.
What do you have the Guest Disk Cache set as on the node settings?
I have set it as 'Default'. I overrode it to writeback in the VM settings of the specific VM I was using to test the speeds. Would it matter in this case?
I was having the same issue you're having. I set mine to None, which helped. You may want to give this a try. If it works on that VM, I would make it a server-wide setting.
Wow. You're a life saver! First the internal IP problem that I was facing, then this one.
Tried this with a fresh VM with guest cache set to 'None', and bam - faster speeds all around.
I guess it was double-caching...
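That matches how QEMU's cache modes behave: `writeback` routes guest writes through the host page cache on top of the controller's cache, while `none` opens the backing device with O_DIRECT and skips the host page cache. Roughly, in raw QEMU `-drive` terms (the device path is a hypothetical example):

```shell
# cache=writeback: guest writes also land in the host page cache, so data is
# cached twice (guest page cache + host page cache) before hitting the array.
qemu-system-x86_64 -drive file=/dev/vg0/kvm101_img,if=virtio,cache=writeback

# cache=none: the backing LV is opened with O_DIRECT, so writes bypass the host
# page cache and go straight to the (BBU-backed) controller cache.
qemu-system-x86_64 -drive file=/dev/vg0/kvm101_img,if=virtio,cache=none
```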
Thanks again Brian and everyone who helped!
Edit: It seems there is a correlation between the size of the logical volume and write speeds...
A fresh VM that is 160G in size as follows:
vs 40G in size:
Both VMs are deployed with the same template.
That's weird :O