Why does E3 SSD dedi seem faster than Ryzen 7950X NVMe 2 vCore VPS at `sha256sum -c`?
Yesterday I uploaded a 4.6 GB Chromebook backup to storage. I uploaded the same file to both an E3 dedicated server and to a 2 vCore Ryzen 7950X VPS.
When I checked the integrity of the uploaded files, I was surprised that running `sha256sum -c` seemed to take longer on the Ryzen 7950X VPS than on the E3 dedi. For the E3 dedi the wall clock "real" time was 22.655 seconds, and for the Ryzen, 30.658 seconds.
The E3 dedi has SSD Raid 10, and the Ryzen 7950X 2 vCore VPS has NVMe Raid 10.
The E3 is running Ubuntu 22.04.4 LTS. The Ryzen is running Debian 12.7.
Geekbench and `fio` scores from Yabs on both machines are shown below along with both machines' `time` results.
Note that the `fio` scores and the Geekbench single-core score are significantly higher on the Ryzen 2 vCore VPS than on the E3 Dedi, while the Geekbench multi-core scores are about the same (4414 vs. 4430).
I guess the `sha256sum -c` execution time might depend on how many threads are being used by `sha256sum`. From both machines' `top` results, also shown below, it seems like `sha256sum` might be single threaded.
I haven't seen any steal on the Ryzen VPS, and I was told that it was on a new node.
In the `time` results shown below, please note that "real", "user", and "sys" each differ.
I decided to try `time sha256sum -c` also with another, bigger 17 GB file. For the 17 GB file, the wall clock "real" time was 1m18.146s on the E3 dedi and 2m16.960s on the Ryzen. So, again, the E3 Dedi seemed to beat the 2 vCore Ryzen 7950X.
It really seems like I must be missing something basic here! It doesn't seem sensible that the E3 would be faster than the Ryzen for `sha256sum -c`. But what explains the time differences? Why does the E3 seem faster?
Both processors have integrated graphics. The E3 graphics are enabled on the bare metal, and the Ryzen graphics are passed through into and are enabled on the Ryzen VPS. But are the graphics processors involved with `sha256sum`?
Assuming these results are not from way out in left field, and assuming the lack of some easy explanation that I am missing, does anybody here have experience checking the time expended on the various operations performed by the `sha256sum` program and also the detailed E3 and Ryzen architectural specifications? Why does E3 seem faster than Ryzen on `sha256sum -c`?
E3 Dedi
OS: Ubuntu 22.04.4 LTS
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/mapper/vg0-root):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 199.08 MB/s (49.7k) | 180.73 MB/s (2.8k)
Write | 199.61 MB/s (49.9k) | 181.68 MB/s (2.8k)
Total | 398.69 MB/s (99.6k) | 362.42 MB/s (5.6k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 301.57 MB/s (589) | 323.23 MB/s (315)
Write | 317.60 MB/s (620) | 344.75 MB/s (336)
Total | 619.17 MB/s (1.2k) | 667.98 MB/s (651)
Geekbench 6 Benchmark Test:
---------------------------------
Test | Value
|
Single Core | 1344
Multi Core | 4430
Full Test | https://browser.geekbench.com/v6/cpu/6406219
top - 23:16:48 up 1 day, 5:59, 2 users, load average: 1.21, 0.93, 0.90
Tasks: 216 total, 2 running, 214 sleeping, 0 stopped, 0 zombie
%Cpu0 : 4.7 us, 0.3 sy, 0.0 ni, 94.3 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu1 : 1.3 us, 2.3 sy, 0.0 ni, 96.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 10.6 us, 1.7 sy, 0.0 ni, 87.1 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu3 : 93.0 us, 1.7 sy, 0.0 ni, 5.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 2.0 us, 2.3 sy, 0.0 ni, 95.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu5 : 4.0 us, 1.0 sy, 0.0 ni, 94.4 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu6 : 1.0 us, 1.0 sy, 0.0 ni, 97.3 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu7 : 6.9 us, 2.3 sy, 0.0 ni, 90.5 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
MiB Mem : 64084.4 total, 27018.1 free, 7788.8 used, 29277.5 buff/cache
MiB Swap: 4096.0 total, 4096.0 free, 0.0 used. 55571.9 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
58585 root 20 0 5792 1056 968 R 100.0 0.0 0:24.90 sha256sum
root@E3-Dedi:~# time sha256sum -c chronos-20240904.tgz.cpt.SHA256
chronos-20240904.tgz.cpt: OK
real 0m22.655s
user 0m22.090s
sys 0m0.564s
root@E3-Dedi:~#
root@E3-Dedi:~# time sha256sum -c Documents.tgz.cpt.SHA256
Documents.tgz.cpt: OK
real 1m18.146s
user 1m16.245s
sys 0m1.892s
root@E3-Dedi:~#
Ryzen 7950X 2 vCore VPS
OS: Debian 12.7
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sda1):
---------------------------------
Block Size | 4k (IOPS) | 64k (IOPS)
------ | --- ---- | ---- ----
Read | 331.18 MB/s (82.7k) | 1.68 GB/s (26.3k)
Write | 332.05 MB/s (83.0k) | 1.69 GB/s (26.5k)
Total | 663.23 MB/s (165.8k) | 3.38 GB/s (52.8k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | --- ---- | ---- ----
Read | 4.98 GB/s (9.7k) | 4.53 GB/s (4.4k)
Write | 5.24 GB/s (10.2k) | 4.83 GB/s (4.7k)
Total | 10.23 GB/s (19.9k) | 9.36 GB/s (9.1k)
Geekbench 6 Benchmark Test:
---------------------------------
Test | Value
|
Single Core | 2534
Multi Core | 4414
Full Test | https://browser.geekbench.com/v6/cpu/7541417
top - 23:15:06 up 4 days, 2:11, 2 users, load average: 0.44, 0.21, 0.09
Tasks: 91 total, 2 running, 89 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.7 sy, 0.0 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu1 : 94.6 us, 4.4 sy, 0.0 ni, 0.0 id, 1.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3915.5 total, 191.6 free, 329.2 used, 3616.6 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 3586.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23848 root 20 0 5484 904 812 R 99.0 0.0 0:25.57 sha256sum
root@Ryzen-2-vCore-VPS:~# time sha256sum -c chronos-20240904.tgz.cpt.SHA256
chronos-20240904.tgz.cpt: OK
real 0m30.658s
user 0m9.694s
sys 0m0.668s
root@Ryzen-2-vCore-VPS:~#
root@Ryzen-2-vCore-VPS:~# time sha256sum -c Documents.tgz.cpt.SHA256
Documents.tgz.cpt: OK
real 2m16.960s
user 0m34.364s
sys 0m1.840s
root@Ryzen-2-vCore-VPS:~#
Comments
I know close to nothing when it comes to CPUs, but I wonder if sha256sum is able to use the SHA256 extensions in the E3 CPU, and for some reason the hypervisor of the VPS does not pass them to the VPS?! Hopefully someone who actually knows something will help you!
My guess is that you have the entire files cached in memory on your dedicated.
real includes time spent waiting on IO. If you look at user+sys, it is lower on the vps.
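To put numbers on that, here is a minimal sketch that subtracts user + sys from real, using the `time` figures from the original post for the 4.6 GB file. The helper name `wait_seconds` is made up for illustration, and reading the remainder as "time spent waiting" is an assumption: it is mostly I/O wait in this case, but it also covers any time the process was simply not scheduled.

```shell
# Wall-clock time not accounted for by CPU time: real - (user + sys).
# All three arguments are in seconds. wait_seconds is a hypothetical helper.
wait_seconds() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN { printf "%.3f\n", r - (u + s) }'
}

# 4.6 GB file, figures copied from the `time` output in the original post
echo "E3 dedi:   $(wait_seconds 22.655 22.090 0.564) s not on CPU"
echo "Ryzen VPS: $(wait_seconds 30.658  9.694 0.668) s not on CPU"
```

With these inputs the E3 is off-CPU for about 0.001 s, while the Ryzen VPS spends roughly 20.3 of its 30.7 seconds not computing at all.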
I have no idea either but this was my hunch.
In addition:
`sha256sum -c` could be more affected by I/O latency or overhead related to virtualization, impacting performance even if disk speeds are theoretically higher.
You stated the E3 is a dedicated server and the Ryzen is a VPS...
File is probably cached in RAM. Maybe also check `sha_ni` in `/proc/cpuinfo`.
You might also be interested in other faster hashing algorithms if you just wanted to check integrity.
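A quick way to do the `sha_ni` check, as a sketch only. It assumes an x86 Linux system where `/proc/cpuinfo` has a `flags` line, and `has_cpu_flag` is a made-up helper name. Seeing the flag only says the CPU advertises the instructions to this system; it does not say whether the installed `sha256sum` build actually uses them.

```shell
# Succeed if a CPU flag appears on a "flags" line of a cpuinfo-style file.
# The file defaults to /proc/cpuinfo. has_cpu_flag is a hypothetical helper.
has_cpu_flag() {
    flag="$1"
    file="${2:-/proc/cpuinfo}"
    # -w matches the flag as a whole word, -q keeps grep quiet
    grep '^flags' "$file" 2>/dev/null | grep -qw -- "$flag"
}

if has_cpu_flag sha_ni; then
    echo "sha_ni: present"
else
    echo "sha_ni: not reported"
fi
```

Running it on both the dedi and the VPS shows whether the two machines differ on this point.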
So what does that imply? The GB6 single-core score of the VPS is clearly higher than the dedi's, and since `sha256sum` only runs for about 2 minutes (shorter than GB6), throttling isn't really a concern.
Thanks to everyone who has commented.
Just now remembering that the E3 is running ext4 with LVM, whereas the Ryzen is running xfs.
I haven't yet studied up on the effect of these filesystem differences. Ideas, please?
I don't think the file system matters much. I'm 99% sure the speed difference is because you have the files cached in ram on your dedicated.
Your `top` output on the dedi shows 29277.5 MiB of buff/cache, which means you have about 29 GB of files cached in RAM.
User + sys is much lower on your ryzen, but "Real" is higher because it includes time spent waiting on IO.
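One way to test the caching idea is to verify the same file twice, once after asking the kernel to drop its page cache and once immediately afterwards. This is only a sketch: `cold_warm` is a made-up name, dropping caches needs root on Linux, and elapsed time is measured in whole seconds with `date +%s`, which is coarse but enough for runs of this length.

```shell
# Verify a checksum file twice: once after trying to drop the page cache
# (cold), then again straight away (warm). cold_warm is a hypothetical name.
cold_warm() {
    sumfile="$1"
    sync    # flush dirty pages so the drop below can release them
    # Dropping caches needs root; without it the "cold" run may be warm.
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null ||
        echo "could not drop caches (not root?)" >&2
    for run in cold warm; do
        start=$(date +%s)
        sha256sum -c "$sumfile"
        end=$(date +%s)
        echo "$run run: $((end - start)) s wall clock"
    done
}

# Usage, with a checksum file name from the original post:
# cold_warm chronos-20240904.tgz.cpt.SHA256
```

If the dedi's cold run comes out much slower than its warm run, caching explains the original numbers. On a machine with less RAM than the file size, the two runs would be expected to look about the same.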
Here is some additional data about caching, including a single run where the Ryzen was really fast.
Reboot plus two successive runs.
Another reboot plus another two successive runs.
Yet another reboot plus yet another two successive runs.
One more double try.
Why are the times on the first set of reruns here so different? The wall clock "real" time was 0m11.104s for the first run of the first set and 0m27.489s for the second run of the first set.
We might expect the second run of each set (the cached run) to be faster if caching is a significant factor. That's not what happened.
Indeed, the first run of the first set of reruns here, on the Ryzen VPS at 11.104s, was the fastest of all the runs, including the original E3 and Ryzen runs at 22.655s and 30.658s, respectively.
Having Ryzen take 11 seconds for something an E3 does in 22 seconds seems reasonable. But, here, Ryzen doesn't consistently run that fast in these tests.
More ideas, please?
Yes, excellent idea, and sha_ni shows up in the /proc/cpuinfo flags for both CPU 0 and CPU 1.
Hi! Thanks! Haven't done any long term testing as yet. For what it's worth, the Geekbench Single Core scores seem reasonably consistent.
@fredo1664 Yeah, good idea, the sha_ni seems to be available inside the VPS. Thanks very much!
Virtualized environments often utilize CPU frequency scaling to balance load and power consumption. Depending on how the virtual machine’s CPU cores are allocated and the overall load on the host system, performance scaling could result in inconsistent performance between runs. This could explain why the Ryzen core took 11 seconds in one run but 35 seconds in another.
As for the first-run performance spike, it could be that the system briefly clocked the CPU cores to a higher performance state due to recent activity (like your first checksum run), but later, due to idle or less-demanding tasks, the cores scaled down in frequency.
This explains why subsequent runs, though cached, might have slower execution times.
You're not getting much caching on the vps if any. The file you are processing is much larger than 4 gb.
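The size comparison can be spelled out with the numbers already posted. In this sketch `fits_in_ram` is a made-up helper, "4.6 GB" is assumed to mean 4.6e9 bytes, and the memory totals are the `MiB Mem` figures from the two `top` outputs converted to KiB. Total RAM is an upper bound on what the page cache could ever hold, so "fits" only means caching is possible, not that it happened.

```shell
# Succeed if a file of SIZE_BYTES could fit in MEM_KIB of RAM at all.
# fits_in_ram is a hypothetical helper; total RAM is only an upper bound
# on what the page cache can hold.
fits_in_ram() {
    awk -v size="$1" -v mem_kib="$2" 'BEGIN { exit !(size <= mem_kib * 1024) }'
}

file_bytes=4600000000    # "4.6 GB", assumed to be decimal gigabytes
vps_kib=4009472          # 3915.5 MiB total on the Ryzen VPS
dedi_kib=65622425        # 64084.4 MiB total on the E3 dedi, rounded down

fits_in_ram "$file_bytes" "$vps_kib"  && echo "VPS: could be cached"  || echo "VPS: cannot be fully cached"
fits_in_ram "$file_bytes" "$dedi_kib" && echo "dedi: could be cached" || echo "dedi: cannot be fully cached"
```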
It seems to me that CPU frequency probably could be measured inside a VPS. But I've never looked into it.
Based on the frequency reports in the Yabs tests I have run, the frequency seems precisely consistent.
Beginning at line 225 of yabs.sh
I'd have to look some more to see how the frequency in /proc/cpuinfo is determined. I'm guessing the frequency is not actually tested, because, as shown above, the frequency reports of multiple Yabs runs are precisely identical.
Do you have experience with measuring CPU frequency scaling inside a VPS?
Thanks!
Yes. You're right.
Note that there are two files in the original post, 4.6 GB, and 17 GB. Even the smaller file is bigger than the entire memory allocated to the VPS.
Thanks!
my understanding is that only base frequency is reported from inside the vps.
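For anyone who wants to see what the guest reports, here is a small sketch that samples the `cpu MHz` field a few times. It assumes an x86 Linux `/proc/cpuinfo`, and `reported_mhz` is a made-up helper name. If the guest only sees a static value, as suggested above, every sample will be identical, and that says nothing about the real clock on the host.

```shell
# Print the "cpu MHz" value reported for each logical CPU in a
# cpuinfo-style file (default /proc/cpuinfo). Hypothetical helper name.
reported_mhz() {
    awk -F': *' '/^cpu MHz/ { print $2 }' "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Take three samples one second apart, one line per sample.
for i in 1 2 3; do
    reported_mhz | tr '\n' ' '
    echo
    sleep 1
done
```

Running this while `sha256sum -c` is busy in another terminal would show whether the reported value ever moves under load.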
2 cores (if ht then technically just 1) vs full cpu
i don't see what is so hard to understand
From the results of `top` in the OP, it seems like `sha256sum` is only single core and not multicore in this context. Thus, only one core is used, no matter how many cores there are.
What is hard to understand is why the single, old E3 core is usually, but not always, faster on this task than the single, new, fast Ryzen core.
Does that help?
Interesting.
Maybe install WordPress site with Elementor, WooCommerce, bunch of elementor addons on both, then compare the doc timing.
`sha256sum` uses a single logical core, and only one hyper-thread (if HT is enabled) will be used for the computation.
E31270v6 single core isn't too bad? What's the e3?
Hi @akaemu!
In this case the E3 seems to beat Ryzen in some jobs. Not too bad for an older processor.
This one is Intel(R) Xeon(R) CPU E3-1245 v5 @ 3.50GHz.
Does that answer your question?
My guess is that a virtualized environment might have some overhead penalty, even if the node is relatively quiet.
I think a more fair comparison in my opinion would be to create a qemu VM on your dedicated server and try and see how it compares to the Ryzen VPS. This way you can also "limit" caching by setting a max of 4GB RAM to the VM.
You haven't mentioned if the VPS is KVM or LXC/OVZ (if you have, please disregard this part) but the latter will probably be closer to baremetal than a full fledged VM.
Any disk I/O related activity on a VPS (and on a VDS) depends on the activity level of the other VMs, because the disk is a shared resource. Unless a dedicated disk is assigned to the VM, it cannot compete with a dedicated server, even one running an older CPU.
Assign dedicated space to that 2 vCore Ryzen and one will see marked improvement in any task that is disk I/O dependent.
A true VDS will also have disk space dedicated exclusively to a single VM.
E3 Xeon CPUs, while older, are optimized for strong single-threaded performance like checksum...
Also, with that 2 vCore AMD, are you sure you get 200% from that CPU?
Hi @Pixels!
Yes, for sure!
Yes. Of course, all this didn't start with the idea of doing a virtualization overhead comparison. Instead, I just did the same thing on VPS and dedi and was surprised that the VPS was actually slower in a case when I assumed wrongly that the VPS would be much faster.
The VPS is KVM.
Sure, but that overhead is tiny.
I can't see that being the difference between minutes.
Hello @AndreiPerju!
Nice to meet you!
Of course!
No, not sure. Unable to compare node and VM CPU performance because, on this machine, I don't have access to the node.
What I do have is Geekbench scores from inside the VM, e.g.,
Not too bad!
Thanks and best wishes!
At the kind suggestion of @cmeerw, I ran `sha256sum -c` in the KVM Ryzen VPS together with `vmstat` and GNU `time`.
The results below show numbers in the twenties in the second to last column on the right, which is I/O wait (wa).
The VM is running xfs and RAID 10.
Now the question becomes, "Why so much I/O wait time?" Maybe I am going to investigate by running `fio` together with `vmstat` to see whether the wait time generalizes to other applications or is somehow limited to `sha256sum`.
Thanks again to everyone who has posted! More ideas on the I/O wait time?
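For comparing I/O wait across different workloads without reading `vmstat` columns by eye, one option is to take two snapshots of the aggregate `cpu` line in `/proc/stat` and compute the iowait share between them. This is a sketch, not what was run above: `iowait_pct` is a made-up helper, and it assumes the usual Linux field order (user, nice, system, idle, iowait, irq, softirq, steal). The result is averaged over all CPUs, so on a 2 vCore VM a single process stuck in I/O wait shows up as roughly half of what one core would report.

```shell
# Percentage of CPU time spent in iowait between two snapshots of the
# aggregate "cpu" line from /proc/stat. iowait_pct is a hypothetical helper.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= 9; i++) total += $i    # user through steal
          io = $6 }                               # 6th field is iowait
        NR == 1 { t0 = total; w0 = io }
        NR == 2 { dt = total - t0
                  if (dt > 0) printf "%.1f\n", 100 * (io - w0) / dt
                  else print "0.0" }'
}

# Usage: snapshot, run the workload, snapshot again.
# before=$(grep '^cpu ' /proc/stat)
# sha256sum -c chronos-20240904.tgz.cpt.SHA256
# after=$(grep '^cpu ' /proc/stat)
# iowait_pct "$before" "$after"
```

Wrapping an `fio` run the same way would give a directly comparable figure for the question of whether the wait is specific to `sha256sum`.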