is this acceptable for nvme?

nszerver · December 2024

I bought the machine yesterday and have been using it, but it sometimes slows down during installation. The machine is freshly reinstalled and the storage mode is set to AHCI in the BIOS. Is this acceptable for NVMe, or could the drive be faulty?

/dev/nvme0n1 SN204508909101 GIGABYTE GP-GSM2NE3128GNTD
1 128.04 GB / 128.04 GB 512 B + 0 B EDFM00.5
yabs:

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/nvme0n1p4):

Block Size	4k (IOPS)	64k (IOPS)
Read	157.10 MB/s (39.2k)	246.88 MB/s (3.8k)
Write	157.52 MB/s (39.3k)	248.18 MB/s (3.8k)
Total	314.62 MB/s (78.6k)	495.07 MB/s (7.7k)

Block Size	512k (IOPS)	1m (IOPS)
------	--- ----	---- ----
Read	339.29 MB/s (662)	371.23 MB/s (362)
Write	357.32 MB/s (697)	395.96 MB/s (386)
Total	696.61 MB/s (1.3k)	767.20 MB/s (748)

How can I check what the error is? Sometimes it's very fast, sometimes it's slow.

ralf · December 2024

Use smartctl -a /dev/nvme0n1 (or whatever device) and look at the data. The most important fields are "Percentage Used", which shows how much of the stated lifetime write capacity has been consumed (note that drives can survive way past 100%), and "Available Spare", which is normally 100% and drops as memory cells start degrading and the drive uses some of its spare (unadvertised) capacity to replace the dead cells. You'd have to look at the specs for the drive to know how much of an issue this is, but I'd definitely be monitoring it regularly if it was below 100%, and making sure my backup strategy was in place or replacing the drive, depending on how quickly the number was decreasing.
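
A minimal sketch of pulling those fields out of the smartctl text output, assuming the usual smartmontools NVMe labels ("Percentage Used:", "Available Spare:", "Available Spare Threshold:"). The function name parse_health is made up for illustration, and the sample values are the ones reported for this drive in this thread.

```python
import re

def parse_health(smartctl_text: str) -> dict:
    """Extract the wear-related percentages from `smartctl -a` text output."""
    fields = {
        "percentage_used": r"Percentage Used:\s+(\d+)%",
        "available_spare": r"Available Spare:\s+(\d+)%",
        "spare_threshold": r"Available Spare Threshold:\s+(\d+)%",
    }
    result = {}
    for key, pattern in fields.items():
        match = re.search(pattern, smartctl_text)
        if match:  # fields that are not present are simply skipped
            result[key] = int(match.group(1))
    return result

# Sample lines in the smartctl format (values from this thread).
SAMPLE = """\
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
"""

health = parse_health(SAMPLE)
print(health)
```

In practice the text would come from running smartctl as root on the whole-drive device rather than from a hard-coded string.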

MikeA · December 2024

Looks normal since that NVMe drive is very low end/old.
https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf
Sequential read speeds up to 1550 MB/s.
Sequential write speeds up to 550 MB/s.

Nobody should advertise NVMe and deliver a dedicated server with that drive in it.

nszerver · December 2024

Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 26 C
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 5%
data_units_read : 1,904,474
data_units_written : 3,190,542
host_read_commands : 26,572,446
host_write_commands : 70,195,517
controller_busy_time : 640
power_cycles : 694
power_on_hours : 3,610
unsafe_shutdowns : 632
media_errors : 0
num_err_log_entries : 1,298
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 52 C
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0

Falzo · December 2024

looks totally fine. I suggest to rather not look at benchmarks, when you have no idea how to read them ;-)

smartctl (smartmontools) can give you additional data, but I don't think that there will be anything wrong with it.

that it sometimes slows down is often related to its internal cache: once this is filled, rates drop quickly. could also be due to throttling if it gets too hot, which totally depends on the environment and the case it has been built into.

ralf · December 2024

As for the speed, look at the specs here: https://gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf

Sequential write speed tops out at 550 MB/s and sequential read speed at 1550 MB/s. Your write speed isn't far from that for 1 MB blocks, but reads are quite slow. It might just be that the drive has a RAM cache and the quoted read speeds are only reached when the data is already in the cache.

nszerver · December 2024

@ralf said:
Use smartctl -a /dev/nvme0n1 (or whatever device) and look at the data. The most important fields are "Percentage Used", which shows how much of the stated lifetime write capacity has been consumed (note that drives can survive way past 100%), and "Available Spare", which is normally 100% and drops as memory cells start degrading and the drive uses some of its spare (unadvertised) capacity to replace the dead cells. You'd have to look at the specs for the drive to know how much of an issue this is, but I'd definitely be monitoring it regularly if it was below 100%, and making sure my backup strategy was in place or replacing the drive, depending on how quickly the number was decreasing.

Error:
=== START OF INFORMATION SECTION ===
Model Number: GIGABYTE GP-GSM2NE3128GNTD
Serial Number: SN204508909101
Firmware Version: EDFM00.5
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 128,035,676,160 [128 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Dec 9 12:23:37 2024 CET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Other
Optional NVM Commands (0x005e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Other
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.50W - - 0 0 0 0 0 0
1 + 2.70W - - 1 1 1 1 0 0
2 + 2.16W - - 2 2 2 2 0 0
3 - 0.0700W - - 3 3 3 3 1000 1000
4 - 0.0020W - - 4 4 4 4 5000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 1
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

cybertech · December 2024

Which provider is giving out such an NVMe?

ralf · December 2024

@nszerver said:

@ralf said:
Use smartctl -a /dev/nvme0n1

Error:
=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

Are you passing a partition device to smartctl or the actual drive? If it ends in something like p1 then drop the p1 part.
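
As a small sketch of what "drop the p1 part" means for NVMe device names (the helper name strip_partition is just for illustration): namespaces look like nvme0n1 and partitions add a pN suffix, so the suffix can be removed like this.

```python
import re

def strip_partition(dev: str) -> str:
    """Turn an NVMe partition path (…nvme0n1p4) into its namespace path (…nvme0n1)."""
    # Only a trailing "p<number>" directly after "nvme<X>n<Y>" is removed.
    return re.sub(r"(nvme\d+n\d+)p\d+$", r"\1", dev)

print(strip_partition("/dev/nvme0n1p4"))  # /dev/nvme0n1
print(strip_partition("/dev/nvme0n1"))    # /dev/nvme0n1 (already the drive)
```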

nszerver · December 2024

@cybertech said:
Which provider is giving out such an NVMe?

No, I bought the machine used.
It's at home.

nszerver · December 2024

@ralf said:

@nszerver said:

@ralf said:
Use smartctl -a /dev/nvme0n1

Error:
=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002

Are you passing a partition device to smartctl or the actual drive? If it ends in something like p1 then drop the p1 part.

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 23 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 1,904,481 [975 GB]
Data Units Written: 3,190,836 [1.63 TB]
Host Read Commands: 26,572,642
Host Write Commands: 70,206,548
Controller Busy Time: 640
Power Cycles: 694
Power On Hours: 3,610
Unsafe Shutdowns: 632
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,299
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 49 Celsius
Error Information (NVMe Log 0x01, max 16 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1299 0 0x001d 0x4004 0x004

layer7 · December 2024

@nszerver said:
I bought the machine yesterday and have been using it, but it sometimes slows down during installation. The machine is freshly reinstalled and the storage mode is set to AHCI in the BIOS. Is this acceptable for NVMe, or could the drive be faulty?

/dev/nvme0n1 SN204508909101 GIGABYTE GP-GSM2NE3128GNTD
1 128.04 GB / 128.04 GB 512 B + 0 B EDFM00.5
yabs:

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/nvme0n1p4):

Block Size	4k (IOPS)	64k (IOPS)
Read	157.10 MB/s (39.2k)	246.88 MB/s (3.8k)
Write	157.52 MB/s (39.3k)	248.18 MB/s (3.8k)
Total	314.62 MB/s (78.6k)	495.07 MB/s (7.7k)

Block Size	512k (IOPS)	1m (IOPS)
------	--- ----	---- ----
Read	339.29 MB/s (662)	371.23 MB/s (362)
Write	357.32 MB/s (697)	395.96 MB/s (386)
Total	696.61 MB/s (1.3k)	767.20 MB/s (748)

How can I check what the error is? Sometimes it's very fast, sometimes it's slow.

Hi,

deeeep consumer grade hardware:

https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf

=> Warranty: Limited 5-year or 110TBW

That's all normal for this kind of hardware...
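
To put the 110 TBW rating next to the counters posted above, here is a rough calculation. It assumes the NVMe convention that one "data unit" is 1000 × 512 bytes, which is consistent with the [1.63 TB] that smartctl prints for this drive.

```python
# One NVMe data unit = 1000 blocks of 512 bytes (assumption stated above).
DATA_UNIT_BYTES = 1000 * 512

data_units_written = 3_190_836   # "Data Units Written" from the smartctl output
rated_tbw = 110                  # warranty limit from the spec page, in TB

written_tb = data_units_written * DATA_UNIT_BYTES / 1e12
fraction = written_tb / rated_tbw

print(f"{written_tb:.2f} TB written, {fraction:.1%} of {rated_tbw} TBW")
# prints: 1.63 TB written, 1.5% of 110 TBW
```

The drive's own "Percentage Used" (5%) is a vendor-specific estimate, so it doesn't have to match this ratio exactly.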

Falzo · December 2024

as said before. totally fine. might not be the fastest drive on the planet for sure. but fine for what it is.

maybe try monitoring the temperature and if the slowing down happens when it becomes hot. if that is what happens here, check if you can vent it better.

also might wanna check, how it actually is connected. directly on the mainboard? or with a slot adapter etc.

still nothing really wrong with it I would say.
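
For the temperature check, a sketch of comparing the readings against the drive's own warning threshold. It assumes the smartctl labels seen in this thread ("Temperature:", "Temperature Sensor N:", "Warning Comp. Temp. Threshold:"); the function name temperature_headroom is made up, and the sample values are ones posted in this thread.

```python
import re

def temperature_headroom(smartctl_text: str) -> dict:
    """Return, per temperature reading, how many degrees remain below the warning threshold."""
    warn = re.search(
        r"Warning\s+Comp\.\s+Temp\.\s+Threshold:\s+(\d+) Celsius", smartctl_text
    )
    if warn is None:
        raise ValueError("warning threshold not found in output")
    limit = int(warn.group(1))
    # Matches "Temperature:" and "Temperature Sensor N:" at the start of a line.
    readings = re.findall(
        r"^(Temperature(?: Sensor \d+)?):\s+(\d+) Celsius", smartctl_text, re.M
    )
    return {name: limit - int(value) for name, value in readings}

SAMPLE = """\
Warning Comp. Temp. Threshold: 85 Celsius
Temperature: 35 Celsius
Temperature Sensor 1: 61 Celsius
"""

headroom = temperature_headroom(SAMPLE)
print(headroom)  # {'Temperature': 50, 'Temperature Sensor 1': 24}
```

Running something like this while the slowdown is happening would show whether the headroom shrinks towards zero at the same time.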

ralf · December 2024

Yeah, if you believe the numbers, that looks like a basically new drive, although the "Power On Hours" counter might have overflowed, as it's unlikely that a drive with only about 150 days of power-on time would have had 632 unsafe shutdowns and 694 power cycles. The units read/written numbers also look reasonable, at around 10x the drive capacity, which would suggest it hasn't been used for heavy load.
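
The arithmetic behind those estimates, as a sketch. It assumes one NVMe data unit is 1000 × 512 bytes (which matches the TB figures smartctl prints) and uses the counters and capacity from the output posted above.

```python
DATA_UNIT_BYTES = 1000 * 512          # assumed size of one NVMe data unit

power_on_hours = 3_610
capacity_bytes = 128_035_676_160      # "Total NVM Capacity"
units_read = 1_904_481
units_written = 3_190_836

days_powered = power_on_hours / 24
reads_x = units_read * DATA_UNIT_BYTES / capacity_bytes
writes_x = units_written * DATA_UNIT_BYTES / capacity_bytes

print(f"powered on for about {days_powered:.0f} days")
print(f"read {reads_x:.1f}x and written {writes_x:.1f}x the drive capacity")
```

So "around 10x" is the rough middle of the read and write figures.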

The only thing that looks suspicious to me is the 1299 error log entries. That might suggest a fault, I'm not sure. Keep running the test over the next few days and see if it increases. It's possible it's related to the high number of power cycles, and could be reporting on the same single bad block each time.

Personally, I wouldn't care too much about drive speed from yabs, unless you notice it being an actual problem with your use case. It's quite possible there are other factors limiting performance, e.g. if you have a slow CPU as well as a slow drive. Just run smartctl every week or so and just worry if the numbers are changing too much.

Falzo · December 2024

@ralf could be that it was in some external case and often plugged/unplugged or the like. each unsafe shutdown might have produced at least one entry in the error log and so on... I wouldn't worry too much. maybe spend 50 bucks to replace it with a shiny new drive with much more capacity on top.

nszerver · December 2024

Motherboard:
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: A320M-S2H-CF
It is in the M2 slot on the motherboard.

nszerver · December 2024

@layer7 said:

@nszerver said:
I bought the machine yesterday and have been using it, but it sometimes slows down during installation. The machine is freshly reinstalled and the storage mode is set to AHCI in the BIOS. Is this acceptable for NVMe, or could the drive be faulty?

/dev/nvme0n1 SN204508909101 GIGABYTE GP-GSM2NE3128GNTD
1 128.04 GB / 128.04 GB 512 B + 0 B EDFM00.5
yabs:

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/nvme0n1p4):

Block Size	4k (IOPS)	64k (IOPS)
Read	157.10 MB/s (39.2k)	246.88 MB/s (3.8k)
Write	157.52 MB/s (39.3k)	248.18 MB/s (3.8k)
Total	314.62 MB/s (78.6k)	495.07 MB/s (7.7k)

Block Size	512k (IOPS)	1m (IOPS)
------	--- ----	---- ----
Read	339.29 MB/s (662)	371.23 MB/s (362)
Write	357.32 MB/s (697)	395.96 MB/s (386)
Total	696.61 MB/s (1.3k)	767.20 MB/s (748)

How can I check what the error is? Sometimes it's very fast, sometimes it's slow.

Hi,

deeeep consumer grade hardware:

https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf

=> Warranty: Limited 5-year or 110TBW

That's all normal for this kind of hardware...

Then I think I should get a new NVMe SSD before long.
Which one would be the best and fastest at a medium price? It needs to be M.2 2280.

cybertech · December 2024

entry level motherboard with entry grade NVMe.

looks normal.

try to allocate memory for HMB and see if it improves.

layer7 · December 2024

@nszerver said:
Then I think I should get a new NVMe SSD before long.
Which one would be the best and fastest at a medium price? It needs to be M.2 2280.

Hi,

we use Seagate Firecuda and WD SN700 mostly.

Corsair MP510 or higher could be also an option.

I am not sure if that's the price range you are looking for, but they all have quite high TBW ratings and they are definitely not slow.

But you should definitely check what is causing this:

Controller Busy Time: 640
Power Cycles: 694
Power On Hours: 3,610
Unsafe Shutdowns: 632

3,600 power-on hours with 700 power cycles and 600 unsafe shutdowns? So roughly every 5 hours an unclean shutdown with a power cycle? Are you hard-resetting your server every few hours? ^^;

This, together with the controller busy time which is according to Intel:

"
Controller Busy Time (in minutes)

Contains the amount of time the controller is busy with I/O commands. The controller is busy when there is a command outstanding to an I/O Queue. (Specifically, a command was issued by way of an I/O Submission Queue Tail doorbell write and the corresponding completion queue entry has not been posted yet to the associated I/O Completion Queue.) This value is reported in minutes.
"

looks like your server has some very strange problem.

Either your NVMe drive or whatever holds your M.2 drive (PCIe card or onboard slot) seems to be not OK.
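
For reference, the averages implied by those counters can be worked out directly. A small sketch using only the four SMART values quoted above:

```python
controller_busy_minutes = 640
power_cycles = 694
power_on_hours = 3_610
unsafe_shutdowns = 632

hours_per_cycle = power_on_hours / power_cycles
unsafe_share = unsafe_shutdowns / power_cycles
busy_share = controller_busy_minutes / (power_on_hours * 60)

print(f"average uptime per power cycle: {hours_per_cycle:.1f} h")
print(f"share of power cycles that ended unsafely: {unsafe_share:.0%}")
print(f"controller busy for {busy_share:.1%} of power-on time")
```

These are lifetime averages for the drive, so they describe its whole history, not only its time in the current machine.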

darkimmortal · December 2024

@ralf said:
As for the speed, look at the specs here: https://gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf

Sequential write speed tops out at 550 MB/s and sequential read speed at 1550 MB/s. Your write speed isn't far from that for 1 MB blocks, but reads are quite slow. It might just be that the drive has a RAM cache and the quoted read speeds are only reached when the data is already in the cache.

YABS tests mixed read/write simultaneously, so the numbers will always be lower than the specs.
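
To get a number that is closer to what a spec sheet measures, a read-only sequential test is needed instead of the mixed 50/50 one. A sketch that only builds and prints such a fio command, without running it. The job name, test file path, size, queue depth and runtime are my own assumptions, not what YABS or Gigabyte use.

```python
import shlex

# Hypothetical test file; it should live on the NVMe filesystem being tested.
TEST_FILE = "./fio-seqread.test"

cmd = [
    "fio",
    "--name=seqread",          # arbitrary job name
    f"--filename={TEST_FILE}",
    "--rw=read",               # pure sequential read, no writes mixed in
    "--bs=1M",                 # large blocks, like the 1m column in YABS
    "--size=1G",
    "--direct=1",              # bypass the page cache
    "--ioengine=libaio",
    "--iodepth=32",
    "--runtime=30",
    "--time_based",
    "--group_reporting",
]

# Print the command so it can be reviewed and pasted into a shell.
print(shlex.join(cmd))
```

Swapping --rw=read for --rw=write gives the matching sequential write test.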

DataRecovery · December 2024

@nszerver said:
is this acceptable in the case of nvme or could it be faulty?
Product Name: A320M-S2H-CF

Yes, for drives like this one, such speeds are rather typical.

Even the "major" brands have budget NVMe SSDs, which aren't much faster than the SATA ones. E.g. see Intel 600p.

Falzo · December 2024

@layer7 said: looks like your server has some very strange problem.

it's not a server from what he wrote. at least nothing constantly powered on in some datacenter, but just some machine with used parts that now runs at his home.

one can probably only speculate, what the usage scenario of that drive has been before it ended up in that box.

nszerver · December 2024

I'll put in a new SSD and see how it works.

nszerver · December 2024

A new line has appeared in the error log:
Error Information (NVMe Log 0x01, max 16 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1300 0 0x0013 0x4004 0x004 0 1 -
1 1299 0 0x001d 0x4004 0x004 0 1
nvme error-log /dev/nvme0
Error Log Entries for device:nvme0 entries:16
.................
Entry[ 0]
.................
error_count : 1300
sqid : 0
cmdid : 0x13
status_field : 0x4004(INVALID_FIELD)
parm_err_loc : 0x4
lba : 0
nsid : 0x1
vs : 0
.................
Entry[ 1]
.................
error_count : 1299
sqid : 0
cmdid : 0x1d
status_field : 0x4004(INVALID_FIELD)
parm_err_loc : 0x4
lba : 0
nsid : 0x1
vs : 0
.................
Entry[ 2]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 3]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 4]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 5]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 6]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 7]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 8]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 9]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[10]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[11]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[12]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[13]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[14]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[15]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................

nszerver · December 2024

Finally found the error.
I edited the GRUB defaults and added this to the default kernel command line:
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Now there are no errors.

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 1,912,417 [979 GB]
Data Units Written: 3,195,359 [1.63 TB]
Host Read Commands: 26,606,154
Host Write Commands: 70,588,430
Controller Busy Time: 641
Power Cycles: 694
Power On Hours: 3,617
Unsafe Shutdowns: 632
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,300
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 61 Celsius

Error Information (NVMe Log 0x01, max 16 entries)
No Errors Logged

ralf · December 2024

Have you just rebooted? From googling, 0x4004 might be unrecognised data in some field, so probably the computer trying to speak a newer version of the NVMe protocol than your drive understands. If that's the case, and the computer is doing that twice during every boot, it neatly explains why the error count was about double the power cycle count.

If so, try rebooting a few more times and see if it goes up by 2 every time. If so, I'd say, you don't have anything to worry about.
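
A quick sketch of both checks: the ratio of error log entries to power cycles from the posted output, and a helper for testing whether a series of readings goes up by exactly 2 per reboot. The helper name and the example readings are hypothetical, not measurements from this machine.

```python
error_log_entries = 1_299
power_cycles = 694

ratio = error_log_entries / power_cycles
print(f"errors per power cycle: {ratio:.2f}")  # prints: errors per power cycle: 1.87

def grows_by_two(counts: list) -> bool:
    """True if there are at least two readings and each one is exactly 2 above the previous."""
    return len(counts) >= 2 and all(b - a == 2 for a, b in zip(counts, counts[1:]))

# Hypothetical readings taken after successive reboots, for illustration only.
print(grows_by_two([1300, 1302, 1304]))  # True
print(grows_by_two([1300, 1301]))        # False
```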

nszerver · December 2024

@ralf said:
Have you just rebooted? From googling, 0x4004 might be unrecognised data in some field, so probably the computer trying to speak a newer version of the NVMe protocol than your drive understands. If that's the case, and the computer is doing that twice during every boot, it neatly explains why the error count was about double the power cycle count.

If so, try rebooting a few more times and see if it goes up by 2 every time. If so, I'd say, you don't have anything to worry about.

Thank you. I added nvme_core.default_ps_max_latency_us=0 pcie_aspm=off to the GRUB kernel command line,
ran update-grub,
and rebooted.
Now there are no errors and it's fast again.
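
For anyone repeating this, a sketch of the edit itself. It assumes a Debian/Ubuntu-style /etc/default/grub with a GRUB_CMDLINE_LINUX_DEFAULT="..." line (the use of update-grub suggests that layout, but it is an assumption); the starting value "quiet" is only an example. As far as I know, the first parameter stops the NVMe driver from using the drive's autonomous power state transitions and the second turns off PCIe Active State Power Management.

```python
import re

EXTRA_PARAMS = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]

def add_kernel_params(line: str, extra=EXTRA_PARAMS) -> str:
    """Append any missing parameters to a GRUB_CMDLINE_LINUX_DEFAULT="..." line."""
    match = re.fullmatch(r'(GRUB_CMDLINE_LINUX_DEFAULT=")(.*)(")\s*', line)
    if match is None:
        return line  # some other line: leave it untouched
    params = match.group(2).split()
    params += [p for p in extra if p not in params]  # skip ones already present
    return match.group(1) + " ".join(params) + match.group(3)

before = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"'  # example starting value
after = add_kernel_params(before)
print(after)
```

The function only transforms one line of text; writing the file back, running update-grub and rebooting are still manual steps.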

layer7 · December 2024

@nszerver said:

@ralf said:
Have you just rebooted? From googling, 0x4004 might be unrecognised data in some field, so probably the computer trying to speak a newer version of the NVMe protocol than your drive understands. If that's the case, and the computer is doing that twice during every boot, it neatly explains why the error count was about double the power cycle count.

If so, try rebooting a few more times and see if it goes up by 2 every time. If so, I'd say, you don't have anything to worry about.

Thank you. I added nvme_core.default_ps_max_latency_us=0 pcie_aspm=off to the GRUB kernel command line,
ran update-grub,
and rebooted.
Now there are no errors and it's fast again.

Hi,

with that you should have actually seen something in the kernel log / dmesg... just for the future.

TimboJones · December 2024

@MikeA said:
Looks normal since that NVMe drive is very low end/old.
https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf
Sequential read speeds up to 1550 MB/s.
Sequential write speeds up to 550 MB/s.

Nobody should advertise NVMe and deliver a dedicated server with that drive in it.

If they have to replace under warranty, this is bad for provider.
If they charge remote hands to fix this, it'll be a cash maker and provider is an asshole.

cybertech · December 2024
