
NVMe Speed Issue :: CentOS vs Windows

Mahfuz_SS_EHL Host Rep, Veteran

Hello,

I'm seeing an NVMe speed difference between CentOS and Windows. On Windows it pulls ~3.5 GB/s read and ~2 GB/s write.

But on CentOS 7/CentOS 8 it pulls only 1.2-1.3 GB/s write.

[root@localhost ~]# dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 0.997785 s, 1.1 GB/s

The configuration is: AMD Ryzen 3700X + Asrock Rack X470D4U + 32 GB RAM + NVMe adapter (PCIe x16).

Can anyone shed some light on whether I'm making any mistake?

Regards.

Comments

  • How about Debian 11?

  • Or maybe use yabs.sh/fio as standard disk benchmark software.

    You must know that different software gives different results, as the methods of measurement differ.
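
    For reference, a minimal way to run it is shown below; this is just a sketch assuming curl is available, using the yabs.sh URL from this comment (the script runs its fio disk, iperf3 network and Geekbench tests by default):

    curl -sL yabs.sh | bash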

  • Filesystem?

  • skorupion Member, Host Rep

    Different size of bytes

  • tetech Member
    edited October 2021

    @skorupion said:
    Different size of bytes

    Uh... sort of like different weights of 1 kg?

  • @chocolateshirt said:
    Or maybe use yabs.sh/fio as standard disk benchmark software.

    You must know that different software gives different results, as the methods of measurement differ.

    here is an idea. do the dd or yabs on windows too, using wsl.
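
    A rough sketch of that idea, run from a Windows prompt (assumes a WSL distro is already installed; the target path is only an example, and writes to /mnt/c go through WSL's Windows-filesystem layer, so the number may not reflect raw NVMe speed):

    wsl dd if=/dev/zero of=/mnt/c/ddtest.bin bs=1M count=1k conv=fdatasync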

  • Mahfuz_SS_EHL Host Rep, Veteran

    @tetech said:
    Filesystem?

    NTFS in Windows, XFS in CentOS

  • Mahfuz_SS_EHL Host Rep, Veteran

    @skorupion said:
    Different size of bytes

    Probably not. It's the same.

  • coolice Member
    edited October 2021

    CentOS 7/8 has kernels too old to see the full potential of your Ryzen, and maybe of the NVMe too (all drivers are in the kernel). Try Ubuntu 20.04.3 with the HWE kernel (5.11) or Proxmox.

    https://askubuntu.com/questions/248914/what-is-hardware-enablement-hwe

    Brand new hardware devices are released to the public always more frequently. And we want such hardware to be always working on Ubuntu, even if it has been released after an Ubuntu release. Six months (the time it takes for a new Ubuntu release to be made) is a very long period in the IT field. Hardware Enablement (HWE) is about that: catching up with the newest hardware technologies.
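
    A minimal sketch of checking the running kernel and pulling in the HWE stack on Ubuntu 20.04 (the package name is the one documented by Ubuntu for 20.04; adjust for other releases):

    uname -r
    sudo apt install --install-recommends linux-generic-hwe-20.04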

  • Mahfuz_SS_EHL Host Rep, Veteran

    @yokowasis said:

    @chocolateshirt said:
    Or maybe use yabs.sh/fio as standard disk benchmark software.

    You must know that different software gives different results, as the methods of measurement differ.

    here is an idea. do the dd or yabs on windows too, using wsl.

    Yabs provides 500+700 MB, so around 1.2GB/s. I'll post the benchmark as soon as I'm free.

  • edited October 2021

    @Mahfuz_SS_EHL said:

    @yokowasis said:

    @chocolateshirt said:
    Or maybe use yabs.sh/fio as standard disk benchmark software.

    You must know that different software gives different results, as the methods of measurement differ.

    here is an idea. do the dd or yabs on windows too, using wsl.

    Yabs provides 500+700 MB, so around 1.2GB/s. I'll post the benchmark as soon as I'm free.

    Which one? 4k? 64k? 512k? Or 1m?

  • Try some other OS like Debian 11 or Ubuntu 20.04/21.04 - CentOS has an old garbage 4.x kernel: https://repology.org/project/linux/versions

  • Try ext4 rather than XFS.

  • @tetech said:

    @skorupion said:
    Different size of bytes

    Uh... sort of like different weights of 1 kg?

    while I get the joke, he is right ;-)

    testing 1M blocksize in windows vs 64k in linux obviously leads to different results, if you keep in mind that bandwidth is the result of iops*blocksize ...
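
    To make that concrete with illustrative (made-up) numbers:

    20,000 IOPS x 64 KiB ≈ 1.2 GiB/s
     3,500 IOPS x  1 MiB ≈ 3.4 GiB/s

    Same drive, different blocksize, very different "sequential speed".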

  • Mahfuz_SS_EHL Host Rep, Veteran
    edited October 2021

    @Falzo said:

    @tetech said:

    @skorupion said:
    Different size of bytes

    Uh... sort of like different weights of 1 kg?

    while I get the joke, he is right ;-)

    testing 1M blocksize in windows vs 64k in linux obviously leads to different results, if you keep in mind that bandwidth is the result of iops*blocksize ...

    So, what will be the bs & count size if I want to match it to Windows? I thought he was talking about the allocation unit of the partition.

  • Falzo Member
    edited October 2021

    @Mahfuz_SS_EHL said:

    @Falzo said:

    @tetech said:

    @skorupion said:
    Different size of bytes

    Uh... sort of like different weights of 1 kg?

    while I get the joke, he is right ;-)

    testing 1M blocksize in windows vs 64k in linux obviously leads to different results, if you keep in mind that bandwidth is the result of iops*blocksize ...

    So, what will be the bs & count size if I want to match it to Windows? I thought he was talking about the allocation unit of the partition.

    I haven't used CrystalDiskMark in a while, but from your screenshot it seems you are using a test file of 1 GB and the blocksize for the first sequential test is 1MB

    dd if=/dev/zero of=test bs=1M count=1k conv=fdatasync

    should be closest to that (1M blocksize and a 1GB testfile)
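
    If page-cache effects are a concern, a variant worth trying (a sketch, assuming the filesystem supports O_DIRECT, which XFS does) is:

    dd if=/dev/zero of=test bs=1M count=1k oflag=direct

    oflag=direct bypasses the page cache entirely, so conv=fdatasync isn't needed for the throughput figure.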

  • Tr33n Member
    edited October 2021

    What Falzo says is correct, you are testing with different blocksizes. Apart from that, I also observed that dd returns worse results for "small" files (1 GB is small in that case) with a fast storage backend, also under CentOS 7.

    I invested some time to find the cause of this, but could not figure it out, and I can't remember the exact details either. However, I believe that there is a short delay when dd is started where some preliminary setup is done, which is counted as runtime. With storage as fast as NVMe and such small test files, this has a big effect on the result.

    If you let dd write a larger file (with 1M blocksize) you should get a similar result as under Windows. By larger file I mean e.g. 10 GB.
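
    A minimal sketch of that suggestion (the file name is arbitrary; this writes ~10 GiB, so make sure the space is there and remove the file afterwards):

    dd if=/dev/zero of=bigtest bs=1M count=10k conv=fdatasync
    rm bigtest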

  • @Tr33n said:
    What Falzo says is correct, you are testing with different blocksizes. Apart from that, I also observed that dd returns worse results for "small" files (1 GB is small in that case) with a fast storage backend, also under CentOS 7.

    I invested some time to find the cause of this, but could not figure it out, and I can't remember the exact details either. However, I believe that there is a short delay when dd is started where some preliminary setup is done, which is counted as runtime. With storage as fast as NVMe and such small test files, this has a big effect on the result.

    If you let dd write a larger file (with 1M blocksize) you should get a similar result as under Windows. By larger file I mean e.g. 10 GB.

    yes definitely possible, as with small files you are in sub-second territory already, so this is a good suggestion. for comparison it seems reasonable to also use a larger testfile in windows as well. might also help with caching related things, though I haven't checked how crystaldiskmark handles that.

    always difficult to compare different benchmarks anyway. maybe use fio under centos instead of dd, as I have found it more reliable or at least more 'tunable' to replicate the same settings somehow.

  • Mahfuz_SS_EHL Host Rep, Veteran

    @Falzo said:

    @Mahfuz_SS_EHL said:

    @Falzo said:

    @tetech said:

    @skorupion said:
    Different size of bytes

    Uh... sort of like different weights of 1 kg?

    while I get the joke, he is right ;-)

    testing 1M blocksize in windows vs 64k in linux obviously leads to different results, if you keep in mind that bandwidth is the result of iops*blocksize ...

    So, what will be the bs & count size if I want to match it to Windows? I thought he was talking about the allocation unit of the partition.

    I haven't used CrystalDiskMark in a while, but from your screenshot it seems you are using a test file of 1 GB and the blocksize for the first sequential test is 1MB

    dd if=/dev/zero of=test bs=1M count=1k conv=fdatasync

    should be closest to that (1M blocksize and a 1GB testfile)

    bs=1M count=1k pulls the same ~1.2 GB/s. But if I increase the file size, the sequential speed increases too. I have got fio, but the command I found is:

    fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1

    I think I need to change the parameters here to match CrystalDiskMark (SEQ1M Q8T1 means a sequential test with a 1 MiB block size, queue depth 8, on 1 thread).
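
    A rough fio equivalent of that CrystalDiskMark run, assuming Q8T1 maps to a single job with iodepth=8 (the job name, file name and 1 GB size are only picked to mirror the CDM defaults; --direct=1 keeps the page cache out of the picture):

    fio --name=seq1m-q8t1 --filename=cdm-test --rw=write --bs=1M --size=1g --ioengine=libaio --iodepth=8 --numjobs=1 --direct=1 --end_fsync=1

    Swap --rw=write for --rw=read to mirror the sequential read line.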

  • Mahfuz_SS_EHL Host Rep, Veteran
    [root@localhost ~]# fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
    random-write: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T) 64.0KiB-64.0KiB, ioengine=posixaio, iodepth=16
    ...
    fio-3.19
    Starting 16 processes
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    random-write: Laying out IO file (1 file / 256MiB)
    Jobs: 16 (f=16): [w(9),F(1),w(5),F(1)][100.0%][w=1513MiB/s][w=24.2k IOPS][eta 00m:00s]
    random-write: (groupid=0, jobs=1): err= 0: pid=2221: Mon Oct  4 23:05:49 2021
      write: IOPS=1750, BW=109MiB/s (115MB/s)(6656MiB/60840msec); 0 zone resets
        slat (nsec): min=510, max=447420, avg=4725.32, stdev=4749.57
        clat (usec): min=35, max=2210.7k, avg=7554.41, stdev=120393.52
         lat (usec): min=38, max=2210.7k, avg=7559.13, stdev=120393.49
    .
    .
    .
    Run status group 0 (all jobs):
      WRITE: bw=1835MiB/s (1925MB/s), 109MiB/s-119MiB/s (115MB/s-125MB/s), io=109GiB (117GB), run=60020-60959msec
    
    Disk stats (read/write):
      nvme0n1: ios=0/905877, merge=0/3, ticks=0/5909119, in_queue=5909119, util=98.30%
    
  • Falzo Member
    edited October 2021

    yeah you're getting closer.

    with fio obviously --bs=1M would represent the correct blocksize and --size=1GB the overall testfile size. you don't need 16 jobs; with fio they run in parallel, but I think Q8T1 in CDM means it breaks down its test into 8 parts that still run one after another (1 thread), so you'd rather want only one job. yet if CPU power plays a role windows could still handle that differently - I don't know.

    however deep you dive into it, IMHO your takeaway should be: do not overengineer it!

    the nvme speeds won't change just because of the os. there might be slight differences depending on filesystem and such, but I'd say these are rather negligible.
    everything you see in the benchmarks that comes across as big difference reflects an artificial combination of different settings, and as said before even caching could play a role...

    also maybe try vpsbench if you want to see linux beat windows anyway.
    ...sorry @jsg , but I simply couldn't resist ;-)

  • Mahfuz_SS_EHL Host Rep, Veteran

    @Falzo said:
    yeah you're getting closer.

    with fio obviously --bs=1M would represent the correct blocksize and --size=1GB the overall testfile size. you don't need 16 jobs; with fio they run in parallel, but I think Q8T1 in CDM means it breaks down its test into 8 parts that still run one after another (1 thread), so you'd rather want only one job. yet if CPU power plays a role windows could still handle that differently - I don't know.

    however deep you dive into it, IMHO your takeaway should be: do not overengineer it!

    the nvme speeds won't change just because of the os. there might be slight differences depending on filesystem and such, but I'd say these are rather negligible.
    everything you see in the benchmarks that comes across as big difference reflects an artificial combination of different settings, and as said before even caching could play a role...

    also maybe try vpsbench if you want to see linux beat windows anyway.
    ...sorry @jsg , but I simply couldn't resist ;-)

    Actually, the motherboard has NVMe slots at PCIe Gen3 x2 & PCIe Gen2 x4. I wanted to use PCIe Gen3 x4; that's why I arranged adapters. But then I got confused by this. Now this is clear to me. Thanks for helping out <3
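
    A quick way to sanity-check what the adapter actually negotiated under Linux (assuming the drive shows up as nvme0; these sysfs attributes are standard for PCI devices):

    lspci -nn | grep -i -e nvme -e 'non-volatile'
    cat /sys/class/nvme/nvme0/device/current_link_speed
    cat /sys/class/nvme/nvme0/device/current_link_width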

  • jsg Member, Resident Benchmarker

    @coolice quoted:
    Brand new hardware devices are released to the public always more frequently. And we want such hardware to be always working on Ubuntu, even if it has been released after an Ubuntu release. Six months (the time it takes for a new Ubuntu release to be made) is a very long period in the IT field. Hardware Enablement (HWE) is about that: catching up with the newest hardware technologies.

    (a) "HWE" is mainly about stuff that actually needs newer or specific driver like graphics cards.
    (b) For NVMe there is a standard driver because NVMe, unlike e.g. graphics cards, is a standard. So, unless one is on a really old kernel like 2.6 there is no need at all to worry about v.5.x vs v.4x

    @Mahfuz_SS_EHL said:
    Yabs provides 500+700 MB, so around 1.2GB/s. I'll post the benchmark as soon as I'm free.

    Makes no sense on diverse levels. For one the testing methods are totally different (e.g. read+write in 1 test vs. read test + write test), plus Windows and Unix/linux are totally different beasts on many levels.

    @Falzo said:

    dd ... conv=fdatasync

    Are you sure that CrystalMark reads/writes sync?
    Anyway, I think you nailed it well by hinting at the diverse differences between benchmarks (not to even talk between Windows and Unix/linux).

    @Mahfuz_SS_EHL said:
    Actually, the motherboard has NVMe slots at PCIe Gen3 x2 & PCIe Gen2 x4. I wanted to use PCIe Gen3 x4; that's why I arranged adapters. But then I got confused by this. Now this is clear to me. Thanks for helping out <3

    G3 x2 is roughly equal to G2 x4 (see the rough per-lane numbers below). Btw, in such a case you should go with G2 x4, because in case the NVMe itself is G2 (unlikely, but it could be) it will at least see - and use - 4 lanes.

    Finally a piece of advice: use whatever OS you'll run in production for benchmarking. Unless you plan to use Windows don't waste time on Windows benchmarks. Also don't be overly concerned about fine details, because you don't know how VM users (and the software they use) will use the system anyway. A database for example has very different needs and ways to operate than say some kind of a file server.
    As a provider you should just care that no test (e.g. random writes) is particularly bad, plus learn valuable information for your node's caching strategy.
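
    For rough context on the G3 x2 vs G2 x4 point above (approximate usable per-lane throughput after encoding overhead; real-world numbers land a bit lower):

    PCIe Gen2: ~500 MB/s per lane -> x4 ≈ 2.0 GB/s
    PCIe Gen3: ~985 MB/s per lane -> x2 ≈ 2.0 GB/s, x4 ≈ 3.9 GB/s

    which lines up with ~3.5 GB/s sequential reads needing a Gen3 x4 link.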
