
Comments
My guess would be your VPS has no swap space, and your DEDI has swap space to use.
How large is the file you're trying to SUM?
Thanks
Ryan
Correct!
sha256sum target file is 4.6 GB. The VPS has 4 GB RAM and zero swap space. The dedi has 64 GB of RAM and 4 GB swap space.
Thanks for helping, Ryan!
The question in my mind now is, "How can we understand the cause of the persistent I/O wait?"
Are you suggesting that the cause of the persistent file I/O wait is simply that there isn't enough RAM in the VPS to accept the size of the target file?
I haven't looked at the source for sha256sum, but wouldn't it be surprising if sha256sum required the entire target file to be moved into RAM?
In the above vmstat output, it looks like there consistently is free RAM shown by the second column, titled "free." But this is the first time I've ever tried vmstat so I have a lot to learn, that's for sure!
Thanks again!
Best!
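On the question of whether sha256sum needs the whole file in RAM: SHA-256 is a streaming hash, so the input can be consumed piece by piece. Here is a minimal sketch, assuming a Linux box with GNU coreutils. The names stream_hash and direct_hash are made up for illustration. Feeding the file through a pipe, where only a small buffer is in flight at any time, should give the same digest as hashing the file directly.

```shell
# Hash a file by streaming it through a pipe instead of opening it directly.
# A pipe holds only a small kernel buffer, so the whole file is never
# handed to sha256sum at once.
# stream_hash is a hypothetical helper name, not part of coreutils.
stream_hash() {
    cat -- "$1" | sha256sum | cut -d ' ' -f 1
}

# Hash the same file the usual way, for comparison.
direct_hash() {
    sha256sum < "$1" | cut -d ' ' -f 1
}
```

The kernel may still fill the page cache with file data while this runs, but that is the OS caching reads, not memory that sha256sum itself requires.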
Your E3 has 4 cores vs 2 cores. Run comparisons limiting E3 to two cores.
When doing your testing make sure to clear caches so that the dedi is not given an advantage from potential caching. This will then reveal a CPU or disk bottleneck.
i.e.
sync && echo 3 > /proc/sys/vm/drop_caches
Or you could generate, say, a 2GB random file, copy it to the other machine, load it into RAM (maybe ramdisk for determinism) on both, and then compare again.
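The suggestions above could be strung together roughly like this. It is only a sketch, assuming Linux with GNU coreutils and util-linux (for taskset). The helper names, the /tmp/test.bin path and the CPU list 0,1 are all assumptions for illustration. Pinning with taskset is one way to limit a run to two cores, not necessarily what was meant.

```shell
# Pin a hash run to specific CPUs, e.g. "0,1" for two cores.
# taskset comes from util-linux; hash_on_cpus is a hypothetical wrapper.
hash_on_cpus() {
    taskset -c "$1" sha256sum "$2"
}

# Flush dirty pages, then drop page cache, dentries and inodes.
# Must run as root.
drop_caches() {
    sync && echo 3 > /proc/sys/vm/drop_caches
}

# Create a file of random bytes: make_random_file PATH SIZE_IN_MIB
make_random_file() {
    head -c "$(( $2 * 1048576 ))" /dev/urandom > "$1"
}

# Example (not run here): 2 GiB test file, cold cache, two cores.
#   make_random_file /tmp/test.bin 2048
#   drop_caches
#   time hash_on_cpus 0,1 /tmp/test.bin
```

Dropping caches before each timed run keeps a warm page cache on one machine from skewing the comparison.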
Did another run to test consistency.
So, chunk size is the answer, at least on the Ryzen VPS?
I can see how the E3 could just keep on reading because the E3 has plenty of memory. But the VPS, even without much memory, seems to work fast when we use 1 M block size with dd. So it seems that the chunk size issue is at least partly independent of overall memory size. Maybe the root of the issue is not in the memory size but, instead, in the increased number of read operations when the chunk size is small? How is the chunk size issue (memory size or number of operations, or maybe both) avoided on the E3 Dedi?
Thanks yet again!
That is not a contributing factor for single-threaded processes.
You’re correct in suspecting that the chunk size plays a key role in performance during read operations, and that it’s not entirely tied to the overall memory size.
When performing read operations, especially with tools like dd, the block size (or chunk size) determines how much data is read from the disk in a single operation.
A smaller block size results in more frequent read operations, which can increase overhead due to the need to manage and process each request. Larger block sizes reduce the number of operations needed but require more memory to handle larger chunks of data.
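To put rough numbers on "more frequent read operations", here is a small back-of-the-envelope sketch. It assumes the 4.6 GB file is exactly 4,600,000,000 bytes (the thread gives only the rounded size), and the 4 KiB block size is a hypothetical small value, not a claim about what sha256sum uses internally.

```shell
# Number of read() calls needed to cover a file: ceil(size / block size).
# reads_needed FILE_BYTES BLOCK_BYTES   (hypothetical helper name)
reads_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 4.6 GB taken as 4,600,000,000 bytes (an assumption, see above).
reads_needed 4600000000 4096      # 4 KiB blocks
reads_needed 4600000000 1048576   # 1 MiB blocks
```

That works out to roughly 1.1 million reads at 4 KiB against about 4,400 at 1 MiB, so any fixed per-request cost is paid about 256 times more often with the smaller block.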
From the OP:
Now, using dd to increase block size for sha256sum.
This looks enough faster than the E3 Dedi.
But this one is almost twice as long. Why?
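For readers who want to try the dd approach themselves, a minimal sketch follows. It is not the OP's exact command line. It assumes GNU coreutils; hash_with_bs and bigfile.iso are made-up names.

```shell
# Read FILE in BLOCK_BYTES-sized chunks with dd and pipe them to sha256sum.
# hash_with_bs FILE BLOCK_BYTES   (hypothetical helper name)
hash_with_bs() {
    dd if="$1" bs="$2" 2>/dev/null | sha256sum | cut -d ' ' -f 1
}

# Example (not run here): time a 1 MiB block size against the plain tool.
#   time hash_with_bs bigfile.iso 1048576
#   time sha256sum bigfile.iso
```

The digest is the same whatever block size is chosen; only the size of each read request issued by dd changes.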
I added some swap. Swap didn't seem to help the I/O wait. I also wanted to see whether another file I/O program would show I/O wait like sha256sum. So, just for some quick fun, I ran a yabs, which calls fio. I ran vmstat while the Yabs was running. Note that the interval for this vmstat was 6 seconds instead of 1 second. This time I ran vmstat in the background instead of in a separate terminal. Here's the Yabs result, and I will post the vmstat results in a moment.
Here's adding some swap in case anyone's interested.
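For anyone wanting to try the same, a common way to add a swap file on Linux looks roughly like the sketch below. It is not necessarily what was done here. The /swapfile path and the 2 GiB size are assumptions, and the helper only prints the commands so that nothing runs by accident.

```shell
# Print the usual commands for adding a swap file; nothing is executed.
# swapfile_commands PATH SIZE_IN_MIB   (hypothetical helper name)
# The printed commands need to be run as root.
swapfile_commands() {
    printf 'dd if=/dev/zero of=%s bs=1M count=%s\n' "$1" "$2"
    printf 'chmod 600 %s\n' "$1"
    printf 'mkswap %s\n' "$1"
    printf 'swapon %s\n' "$1"
}

swapfile_commands /swapfile 2048
```

Afterwards, swapon --show or free -m should list the new swap space.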
Looks like these two lines show significant I/O wait, 43% and 22%. So it seems that both fio and sha256sum experience significant I/O wait. Why?
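Picking the I/O wait figure out of a long vmstat log by eye gets tedious, so here is a small sketch that pulls the peak value of the "wa" column. It assumes the standard procps vmstat layout with a header row naming the columns; max_iowait and vmstat.log are made-up names.

```shell
# Print the largest value in the "wa" (I/O wait) column of vmstat output
# read from stdin. The column is located by name from the header line.
# max_iowait is a hypothetical helper name.
max_iowait() {
    awk '
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        $col ~ /^[0-9]+$/ && $col + 0 > max { max = $col + 0 }
        END { print max + 0 }
    '
}

# Example (not run here): sample every 6 seconds in the background while a
# benchmark runs, then summarize.
#   vmstat 6 > vmstat.log &
#   ...run the benchmark...
#   kill $!
#   max_iowait < vmstat.log
```

Note that the first data row vmstat prints is an average since boot, so it may be worth ignoring when reading the result.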
The provider of this very nice VPS asked me to retest with RAM doubled to 4 GB. The short answer seems to be that increasing the RAM doesn't help. Even with 4 GB RAM, there still is a lot of I/O wait.
"very nice VPS" is not sarcasm. It really is a great VPS! And the provider's customer service is great as well!
Could you ask your provider to temporarily increase your ram so that you can test using a ramdisk? At this point it’s really hard to tell if it is storage bounded or there are other limiting factors.
Sorry @Not_Oles but I stopped reading the whole thread relatively early; too much quoted, too many (IMO) irrelevant numbers.
What you experience doesn't surprise me at all, I see that all the time (I'm developing a lot in the crypto field, not as in "using crypto libraries" but as in "implementing algorithms").
IMO the decisive - and so very often overlooked - point is memory, both as in speed and size. Explanation: crypto algorithms breathe (or suffocate) in memory, preferably L1 and L2, but of course (a) those are very limited, and (b) the data must be fetched first from normal RAM anyway, and also written back to it. In fact, I often saw memory being a factor that is more relevant than processor (speed, computing power, etc).
Now, in particular with a large data size (file size), memory enters the game yet again, in particular as (file) cache.
So, your (oldish, "smallish") dedi with 64 GB memory has multiple advantages over the Ryzen VM. Among others, it highly likely can cache the whole file; and keep in mind that the VM does not read/write directly from/to the disk but has to go through the host. Moreover, the dedi can directly access its memory; that is, it can both get and put the SHA results from/to the memory, while the Ryzen VM has to go through the hypervisor.
Short version: Your E3 dedi may have lower SHA throughput in the processor but it can transfer the data faster: the whole circle is more balanced, while your Ryzen may process the data much faster but that is of limited value if it takes a small eternity to shove the data to/from memory and disk!
Btw, you can largely ignore SSD vs NVME, this file system vs. that file system, etc. That plays, if noticeable at all, quite a minor role. The significant factor wrt disk is that the dedi can comfortably shove many GB of data into (comparably fast) memory and have the OS shove them to/from disk without breaking a sweat, while the Ryzen VM is wasting cycles waiting for relatively small blocks of data to be shoved to/from disk.
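The "can it cache the whole file" point can be checked with a quick sketch like this one. It assumes Linux, where /proc/meminfo exposes a MemAvailable line in KiB. The helper names and bigfile.iso are hypothetical, and the comparison is deliberately crude since it ignores everything else competing for memory.

```shell
# Read MemAvailable (in KiB) from a meminfo-style file.
# The path defaults to /proc/meminfo on Linux.
mem_available_kib() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Say whether a file of FILE_BYTES could sit entirely in AVAIL_KIB of memory.
# fits_in_cache FILE_BYTES AVAIL_KIB   (hypothetical helper name)
fits_in_cache() {
    if [ "$1" -le $(( $2 * 1024 )) ]; then
        echo "fits"
    else
        echo "does not fit"
    fi
}

# Example (not run here), for a hypothetical file name:
#   fits_in_cache "$(wc -c < bigfile.iso)" "$(mem_available_kib)"
```

With the numbers from this thread, a 4.6 GB file cannot fit in a VPS with 4 GB of RAM in total, while the 64 GB dedi has room to spare.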
My provider previously suggested a temporary RAM increase. I believe they still would do it. Maybe they will see this comment and the RAM increase might happen automagically?
Additionally, I might have previous test results or maybe could do a RAM test on a different server.
Give me a day or two, please. I will post something.
Thanks for your interest @zakkuuno!
What I did this morning was create a 2 core, 4 GB VPS on my E3 machine. Then I ran a test on the E3 VPS. Here are the results. Almost no I/O wait!
Why is there so much I/O wait on the Ryzen VPS results shown above and practically none on the E3 VPS?
Because there is contention for resources on the Ryzen VPS (doesn't sound like you are the sole user of that server) while on the E3 machine you alone are using it. I'm not suggesting the Ryzen VPS is oversaturated, but any other users are going to slow things down with regard to I/O wait.
'right answer' ∈ [lots of answers].
Maybe increase VM memory in 100 MB steps and each time "benchmark" the hell out of it?
(or spot the answer within the rich set of comments (and "benchmarks")).
Friendly greetings
Seems to be heavily storage/IO bound.
The E3 VM has far fewer context switches. My guess is that its processing power is too weak to saturate the IO, so it does not have to wait for IO, and thus the number of context switches stays low.
I'd suggest you test on a smaller file (~3G) and load the file into ramdisk for both ryzen and e3 servers to compare the results.
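A ramdisk run along those lines might look like the sketch below. It assumes /dev/shm is a tmpfs mount, which is the case on most Linux systems; stage_in_ram, ramtest.bin and test3g.bin are made-up names.

```shell
# Copy SRC into RAMDIR (a tmpfs mount such as /dev/shm) and print the path
# of the copy, so the hash run that follows never touches storage.
# stage_in_ram SRC RAMDIR   (hypothetical helper name)
stage_in_ram() {
    cp -- "$1" "$2/ramtest.bin" && echo "$2/ramtest.bin"
}

# Example (not run here):
#   f=$(stage_in_ram test3g.bin /dev/shm)
#   time sha256sum "$f"
#   rm -f "$f"
```

One thing to watch: a tmpfs mount is usually capped at half of RAM by default, so a ~3G file may need a bigger mount (or more RAM) on a small VPS.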
The kind provider and I finally got around to testing, as @zakkuuno suggested, with increased RAM on the Ryzen VPS.
Bingo! No I/O wait!
Why does the Ryzen VPS need more RAM than the E3 VPS needs?
In this scenario, ram is useful as the storage io is the constraint, because you can cache the file into ram. E3 requires less io because it runs checksum slower than the Ryzen. It's not that Ryzen VPS needs more RAM than the E3 VPS needs, but io is more likely to be a constraint for the faster CPU. Your Ryzen VPS probably runs on a more saturated host whereas the E3 is basically dedicated for you to test, which makes IO more of a problem.
Possible answer: The Ryzen needs more RAM because it is using NVMe instead of the SSD that is in the E3? So it's not the single threaded sha256sum that makes the difference here? Instead it's the difference in the disk setup?
Clearly a case of

@Not_Oles
After you seemed to have completely ignored my serious comment and I just happened to come across you mentioning "swap" I'll undertake a last serious attempt.
Remember: the system needs memory for both the data (or at least a significant chunk of it) and the file. Now add the fact that Linux caches drives & files very aggressively, which means that in a worst case scenario you need a lot more memory than the data you hash (and the result). Also note that when your processor has a large L2 and / or L3 cache - and the Ryzen has - it'll likely read and write larger chunks from memory (which of course needs backfilling from disk).
And all that on a VM (as opposed to the E3 dedi). Adding swap doesn't help, quite the contrary (same goes for RAM disk).
What you need is 2 factors, (a) lots of RAM, and (b) nothing but the bare OS running (and of course your SHA program).
Nope! I appreciate your serious comments! I thanked your serious comment to which you refer. You can look and see. My thanks is there for you and for everyone to see.
It's clear that you understand these topics about RAM and disk operations very well! I believe you understand more deeply, more thoroughly, and more comprehensively than I understand.
It's Low End here. I was trying to see what I could do on an inexpensive VPS compared to my E3 dedicated server. I discovered what could be described as a "limitation" of the VPS. With help from the kind provider and from other LET members, I compared what happened when we increased the VPS cores and RAM. I also made an equivalent spec VPS on the E3 and compared that VPS' performance.
There's more brewing behind the scenes, because, with even more help from the kind provider, it looks like I might be going to learn a little about certain parameters and how these differ between Debian and Ubuntu default installs. Probably, we're also going to compare with an Intel E5 CPU VPS. I still might post more here if, as, and when I find out more.
So this has been a great thread for me -- thanks to you, @jsg
-- and thanks to other members
-- and thanks to the awesome Provider of this VPS.
I hope this has been a good thread for all of you guys too! Best wishes @jsg! Best wishes, everyone!
This could be the case, or it could be IO throttling, which is common on a VPS: the resources are shared, and the provider wants to fill the host for the most profit while keeping it from getting overloaded and causing noisy-neighbor effects later. This assumes it really is a new host that is not overloaded at the time.
This is where dedicated and unthrottled VDS is unmatched (if it's true VDS and not a marketing gimmick shilling customers).
Disk I/O is still a shared resource for VDS, unless single use exclusive disk storage is attached to the VDS.
Do you offer such exclusive resource for your VDS? Most likely, not.
Also, a VDS with a single “dedicated core HT” is not a true VDS either, as one HT's performance is not independent of its counterpart HT. That is why a VDS with a physical cpu core (with both its HTs) is recommended for a true VDS experience and better performance.
Yes, I do have a VDS with dedicated cpu core from a premium LET provider. @heartbeat_IT
My point is that IO can be rate limited regardless; assume that is often, though not always, the case on a VPS. VDS can use the same disks, yes, but they should not be IO capped, and if they are, the cap should be just enough to keep the rest of the VDS on the host from suffering IOPS depletion because one VDS is running some long, IO-intensive operation. It's a balancing act when it happens.
CharityHost.org VDS are not currently IO capped, but it is something to be careful with VDS for sure.
Regarding single use disk storage, it will be very hard to find any provider that does this in a virtualized environment. If you need that, that's what dedicated servers are for, but you lose the benefits that virtualization gives you other than costs.
Should I share performance of my VDS?
And it is more cost effective as well
In my opinion, true VDS should have dedicated cpu core. cpu core HT performance is not truly independent of its counterpart HT.