Inconsistent fio results on similar VPSes on the same node
I am seeing some inconsistency in the fio tests from yabs runs today on similar-spec VPSes on the same node. Please see three examples below.
I'm unclear on what's happening, whether it's related to any single VPS that I happen to be testing, whether it's related to some other VPS or some node process using high file I/O at certain times, or maybe something else.
I've been watching iotop -b 3 -o a little. So far, no obvious insight.
Ideas? Thanks!
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 193.28 MB/s (48.3k) | 1.78 GB/s (27.8k)
Write | 193.79 MB/s (48.4k) | 1.79 GB/s (28.0k)
Total | 387.07 MB/s (96.7k) | 3.57 GB/s (55.9k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 2.12 GB/s (4.1k) | 2.18 GB/s (2.1k)
Write | 2.23 GB/s (4.3k) | 2.33 GB/s (2.2k)
Total | 4.35 GB/s (8.5k) | 4.51 GB/s (4.4k)
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 193.53 MB/s (48.3k) | 1.95 GB/s (30.4k)
Write | 194.04 MB/s (48.5k) | 1.96 GB/s (30.6k)
Total | 387.57 MB/s (96.8k) | 3.91 GB/s (61.1k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 957.00 KB/s (1) | 18.49 MB/s (18)
Write | 1.12 MB/s (2) | 20.26 MB/s (19)
Total | 2.07 MB/s (3) | 38.75 MB/s (37)
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/vda1):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 16.51 MB/s (4.1k) | 1.84 GB/s (28.7k)
Write | 16.52 MB/s (4.1k) | 1.85 GB/s (28.9k)
Total | 33.03 MB/s (8.2k) | 3.69 GB/s (57.7k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 2.11 GB/s (4.1k) | 2.19 GB/s (2.1k)
Write | 2.22 GB/s (4.3k) | 2.34 GB/s (2.2k)
Total | 4.34 GB/s (8.4k) | 4.53 GB/s (4.4k)


Comments
Please let me add that the server is running hardware RAID 10. The RAID controller describes the state of the array as "Optimal."
All of the VMs use the same controller and disk set.
Today, so far, things seem better!
Now, same VM as above:
And again, same VM as above:
Is it just that, on this node, 4k RW IOPS fluctuates between about 8.3k and 100k?
Seems like too much variation?
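To put a number on "too much variation", here is a minimal sketch that computes the max/min ratio of the total IOPS per block size across the three yabs results posted above. The figures are copied from those tables; the `spread` helper is just an illustrative name, and the max/min ratio is only one crude way to measure run-to-run variation.

```python
# Total IOPS (read + write) per block size, copied from the three yabs results above.
total_iops = {
    "4k":   [96_700, 96_800, 8_200],
    "64k":  [55_900, 61_100, 57_700],
    "512k": [8_500, 3, 8_400],
    "1m":   [4_400, 37, 4_400],
}


def spread(values):
    """Return the max/min ratio, a crude measure of run-to-run variation."""
    return max(values) / min(values)


for block_size, runs in total_iops.items():
    print(f"{block_size:>4}: min={min(runs):>6} max={max(runs):>6} "
          f"ratio={spread(runs):.1f}x")
```

The 64k results stay within about 10% of each other, while the 4k, 512k, and 1m results each contain one outlier that is one to three orders of magnitude lower than the others.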
These are spinning disks, no?
@TimboJones Yes! Eight in hardware RAID 10.
Spinning disks are physically limited in what they can really do. Think about it: mechanical head repositioning takes around 4 ms on average for most models. Divide one second by that and you get no more than about 250 IOPS for a single disk (on average).
Even 8 of them in RAID 10 will only bring you to roughly 1k in writes, maybe a bit more in reads, depending on the controller's capabilities.
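The back-of-the-envelope arithmetic above can be written out as a short sketch. The 4 ms seek time and the eight-disk RAID 10 layout come from this thread; the read upper bound assumes, optimistically, that every read can be served by either disk of a mirror pair, which a real controller may not achieve.

```python
SEEK_TIME_MS = 4   # average head repositioning time, as estimated above
DISKS = 8          # eight spinning disks in RAID 10, per this thread

# One random I/O per seek: 1000 ms / 4 ms = 250 IOPS per disk.
per_disk_iops = 1000 / SEEK_TIME_MS

# RAID 10 stripes across mirror pairs; every write lands on both disks of a pair.
mirror_pairs = DISKS // 2
write_iops = mirror_pairs * per_disk_iops

# Idealized assumption: reads are spread perfectly over all disks.
read_iops_upper_bound = DISKS * per_disk_iops

print(f"per disk:         {per_disk_iops:.0f} IOPS")
print(f"array writes:     {write_iops:.0f} IOPS")
print(f"array reads (max): {read_iops_upper_bound:.0f} IOPS")
```

Compared with the roughly 48k write IOPS reported at 4k in the first test, even the optimistic estimate is far too low, which is why caching has to be involved.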
Now, you are seeing much higher numbers, which shows the magic caching can do. Your HW RAID probably comes with some battery-backed cache memory, which is doing all the heavy lifting.
Depending on how all the layers on top (page cache, filesystem, and schedulers on the host node and in the guests) are configured, you will of course see fluctuating numbers, as you can't break physics. If all caches force a flush at the same time, you might run into a traffic jam, limited by what the HDDs are really capable of.
You can only try to match the caching and scheduler strategies throughout the layers for better balance and less clashing situations.
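The "traffic jam" can be illustrated with a deliberately simple model: a write-back cache absorbs a burst at full speed until it fills, after which throughput falls back to whatever the disks can drain. This is only a sketch under stated assumptions; the function name and all parameter values are hypothetical, and nothing here reflects the actual controller or cache sizes of the server in this thread.

```python
def seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s):
    """Time a write-back cache can absorb a burst before it is full.

    Returns float('inf') when the disks drain at least as fast as data arrives,
    because the cache then never fills.
    """
    if incoming_mb_s <= drain_mb_s:
        return float("inf")
    return cache_mb / (incoming_mb_s - drain_mb_s)


# Hypothetical numbers, chosen only to show the shape of the effect.
cache_mb = 1000        # assumed cache size
incoming_mb_s = 600    # assumed burst write rate from the guests
drain_mb_s = 100       # assumed rate at which the HDDs can actually commit data

t = seconds_until_cache_full(cache_mb, incoming_mb_s, drain_mb_s)
print(f"cache absorbs the burst for {t:.1f} s, then writes slow to {drain_mb_s} MB/s")
```

In this model a benchmark that finishes before the cache fills reports cache speed, while one that runs into a full cache (for example because another guest filled it first) reports disk speed.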
Very helpful to think about these issues beginning with the average mechanical head repositioning time!
Thanks very much @Falzo!
But what @Falzo said is true for all your tests. And while I can confirm that quite some spread over multiple identical tests is totally normal, the numbers you show are absolutely not normal and IMO can't be explained by normal circumstances!
So far I didn't comment, mainly because I don't care much about inconsistent testing, but also because I assume you are running Linux, which caches very aggressively and is IMO weird, so I usually don't care about it on servers and didn't have a lot to say about your numbers.
But as Falzo (laudably) started to seriously introduce some technical and relevant factors, and as those factors, at least the usual ones, do NOT explain what you showed, I chimed in. I'd suggest looking at the factors Falzo mentioned with a solid dose of mistrust, not because what he said is wrong (it is not) but because, without a very solid understanding of the concrete config of that particular server, the whole thing is largely a black box. To really and properly analyze and troubleshoot this, far more extensive testing would be needed, including e.g. testing those disks (each and all of them) under clearly specified conditions, such as connected to a plain controller (no cache whatsoever).
"Some" disks (what brand, model, etc.?) on "some" RAID controller (brand, model, memory type, battery and type, etc.?) in "some" server (I guess the point is clear by now) under "some" Linux distro (my guess) and "some" config ... basically means you're poking around in dense fog.
I wish you success in finding the problem.