Is this acceptable for NVMe?
I bought the machine yesterday and have been using it, but it sometimes slows down during installation. The machine is freshly reinstalled and the NVMe is set to AHCI in the BIOS. Is this acceptable for an NVMe drive, or could it be faulty?
/dev/nvme0n1 SN204508909101 GIGABYTE GP-GSM2NE3128GNTD 1 128.04 GB / 128.04 GB 512 B + 0 B EDFM00.5
yabs:
fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/nvme0n1p4):
Block Size | 4k (IOPS) | 64k (IOPS) |
---|---|---|
Read | 157.10 MB/s (39.2k) | 246.88 MB/s (3.8k) |
Write | 157.52 MB/s (39.3k) | 248.18 MB/s (3.8k) |
Total | 314.62 MB/s (78.6k) | 495.07 MB/s (7.7k) |
Block Size | 512k (IOPS) | 1m (IOPS) |
---|---|---|
Read | 339.29 MB/s (662) | 371.23 MB/s (362) |
Write | 357.32 MB/s (697) | 395.96 MB/s (386) |
Total | 696.61 MB/s (1.3k) | 767.20 MB/s (748) |
How can I check what the error is? Sometimes it's very fast, sometimes it's slow.
Comments
Use smartctl -a /dev/nvme0n1 (or whatever device) and look at the data. The most important values are "Percentage Used", which shows how much of the rated lifetime write capacity has been consumed (note that drives can survive way past 100%), and "Available Spare", which is normally 100% and drops as memory cells start degrading and the drive uses some of its spare (unadvertised) capacity to replace the dead cells. You'd have to look at the specs for the drive to know how much of an issue this is, but if it was below 100% I'd definitely be monitoring it regularly, making sure my backup strategy was in place, and possibly replacing the drive depending on how quickly the number was decreasing.
Looks normal since that NVMe drive is very low end/old.
https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf
Sequential read speeds up to 1550 MB/s.
Sequential write speeds up to 550 MB/s.
Nobody should advertise NVMe and deliver a dedicated server with that drive in it.
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 26 C
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 5%
data_units_read : 1,904,474
data_units_written : 3,190,542
host_read_commands : 26,572,446
host_write_commands : 70,195,517
controller_busy_time : 640
power_cycles : 694
power_on_hours : 3,610
unsafe_shutdowns : 632
media_errors : 0
num_err_log_entries : 1,298
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1 : 52 C
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
looks totally fine. I'd suggest not reading too much into benchmarks when you have no idea how to read them ;-)
smartctl (smartmontools) can give you additional data, but I don't think that there will be anything wrong with it.
the occasional slowdowns are often related to the drive's internal cache: once it is filled, rates drop quickly. it could also be throttling if it gets too hot, which depends entirely on the environment and the case it has been built into.
As for the speed, look at the specs here: https://gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf
Sequential write speed tops out at 550MB/s and sequential read speed tops out at 1550MB/s. Your write speed isn't far from that for 1MB blocks, but your reads are quite slow. It might just be that the drive has a RAM cache and the quoted read speeds are only reached if the data is already in the cache.
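To put numbers on that comparison, here is a minimal sketch that expresses the 1m yabs results as a percentage of the rated sequential speeds. The values are copied from the yabs table and the spec lines quoted in this thread; the function name percent_of_rated is made up for illustration. Keep in mind that yabs runs reads and writes at the same time, so this is not a like-for-like comparison with a pure sequential spec figure.

```python
# Rated sequential speeds from the Gigabyte spec lines quoted in the thread (MB/s).
RATED_READ_MBPS = 1550.0
RATED_WRITE_MBPS = 550.0

# 1m block results from the yabs mixed 50/50 run posted in the thread (MB/s).
YABS_1M_READ_MBPS = 371.23
YABS_1M_WRITE_MBPS = 395.96


def percent_of_rated(measured_mbps: float, rated_mbps: float) -> float:
    """Return measured throughput as a percentage of the rated figure."""
    return 100.0 * measured_mbps / rated_mbps


read_pct = percent_of_rated(YABS_1M_READ_MBPS, RATED_READ_MBPS)
write_pct = percent_of_rated(YABS_1M_WRITE_MBPS, RATED_WRITE_MBPS)

print(f"read:  {read_pct:.0f}% of rated sequential read")
print(f"write: {write_pct:.0f}% of rated sequential write")
```

With the thread's numbers this comes out at roughly a quarter of the rated read speed and a bit under three quarters of the rated write speed.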
Error:
=== START OF INFORMATION SECTION ===
Model Number: GIGABYTE GP-GSM2NE3128GNTD
Serial Number: SN204508909101
Firmware Version: EDFM00.5
PCI Vendor/Subsystem ID: 0x1987
IEEE OUI Identifier: 0x6479a7
Total NVM Capacity: 128,035,676,160 [128 GB]
Unallocated NVM Capacity: 0
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 128,035,676,160 [128 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Mon Dec 9 12:23:37 2024 CET
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Other
Optional NVM Commands (0x005e): Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Other
Maximum Data Transfer Size: 64 Pages
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 4.50W - - 0 0 0 0 0 0
1 + 2.70W - - 1 1 1 1 0 0
2 + 2.16W - - 2 2 2 2 0 0
3 - 0.0700W - - 3 3 3 3 1000 1000
4 - 0.0020W - - 4 4 4 4 5000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 1
1 - 4096 0 0
=== START OF SMART DATA SECTION ===
Read NVMe SMART/Health Information failed: NVMe Status 0x2002
Which provider is handing out an NVMe like this?
Are you passing a partition device to smartctl or the actual drive? If it ends in something like p1 then drop the p1 part.
No, I bought the machine used. It is at home.
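For the partition-versus-drive question above, a small sketch of the "drop the p1 part" rule may help. It assumes the usual Linux naming scheme /dev/nvme<controller>n<namespace>p<partition>; the function name strip_nvme_partition is hypothetical, and anything that does not match the pattern is returned unchanged.

```python
import re


def strip_nvme_partition(device: str) -> str:
    """Turn an NVMe partition path into its namespace path.

    '/dev/nvme0n1p4' becomes '/dev/nvme0n1'. Paths that are not NVMe
    partition paths are returned untouched.
    """
    return re.sub(r"^(/dev/nvme\d+n\d+)p\d+$", r"\1", device)


# The yabs run in the thread was on partition /dev/nvme0n1p4.
print(strip_nvme_partition("/dev/nvme0n1p4"))
```

The result is the path you would hand to smartctl -a.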
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 23 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 1,904,481 [975 GB]
Data Units Written: 3,190,836 [1.63 TB]
Host Read Commands: 26,572,642
Host Write Commands: 70,206,548
Controller Busy Time: 640
Power Cycles: 694
Power On Hours: 3,610
Unsafe Shutdowns: 632
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,299
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 49 Celsius
Error Information (NVMe Log 0x01, max 16 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1299 0 0x001d 0x4004 0x004
Hi,
deeeep consumer grade hardware:
https://www.gigabyte.com/SSD/GIGABYTE-NVMe-SSD-128GB#kf
=> Warranty: Limited 5-year or 110TBW
That's all normal for this kind of hardware...
as said before. totally fine. might not be the fastest drive on the planet for sure. but fine for what it is.
maybe try monitoring the temperature and check whether the slowdown happens when it gets hot. if that is what happens here, check if you can vent it better.
also might wanna check, how it actually is connected. directly on the mainboard? or with a slot adapter etc.
still nothing really wrong with it I would say.
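The temperature check suggested above can be sketched as a small parser over smartctl -a text output. This is only an illustration: the sample lines are copied from outputs posted in this thread (single spaces, as they appear here), the helper names parse_celsius and warning_headroom are made up, and real smartctl output may use different spacing, which is why labels are whitespace-normalised.

```python
import re
from typing import Dict

# Lines copied from the smartctl outputs posted in the thread.
SAMPLE_REPORT = """\
Warning Comp. Temp. Threshold: 85 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Temperature: 35 Celsius
Temperature Sensor 1: 61 Celsius
"""


def parse_celsius(report: str) -> Dict[str, int]:
    """Collect every 'Label: NN Celsius' line, keyed by normalised label."""
    readings = {}
    for line in report.splitlines():
        match = re.match(r"^\s*(.+?):\s+(\d+) Celsius\s*$", line)
        if match:
            label = " ".join(match.group(1).split())
            readings[label] = int(match.group(2))
    return readings


def warning_headroom(report: str) -> int:
    """Degrees between the hottest reported sensor and the warning threshold."""
    readings = parse_celsius(report)
    hottest = max(v for k, v in readings.items() if k.startswith("Temperature"))
    return readings["Warning Comp. Temp. Threshold"] - hottest


print(f"headroom before warning threshold: {warning_headroom(SAMPLE_REPORT)} C")
```

Logging this value while the slowdown happens would show whether the drive is anywhere near its own warning threshold at that moment.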
Yeah, if you believe the numbers that looks like a basically new drive, although the "Power On Hours" might have overflowed, as it's unlikely that a drive that was only 150 days old would have had 632 unsafe shutdowns and 694 power cycles. The units read/write numbers also look reasonable, at around 10x the drive capacity, which would suggest it's not been used for heavy load.
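The arithmetic behind those estimates can be checked with a few lines. The counters are copied from the smartctl output posted in this thread; the only outside fact is that NVMe reports "Data Units" in thousands of 512-byte units, which matches the [1.63 TB] figure smartctl prints next to the raw count. Variable names are for illustration only.

```python
# Counters copied from the smartctl output posted in the thread.
POWER_ON_HOURS = 3_610
POWER_CYCLES = 694
UNSAFE_SHUTDOWNS = 632
DATA_UNITS_WRITTEN = 3_190_836
CAPACITY_BYTES = 128_035_676_160

# One NVMe data unit is 1000 blocks of 512 bytes.
BYTES_PER_DATA_UNIT = 1000 * 512

days_powered = POWER_ON_HOURS / 24
hours_per_cycle = POWER_ON_HOURS / POWER_CYCLES
unsafe_share = 100 * UNSAFE_SHUTDOWNS / POWER_CYCLES
bytes_written = DATA_UNITS_WRITTEN * BYTES_PER_DATA_UNIT
full_drive_writes = bytes_written / CAPACITY_BYTES

print(f"powered on for about {days_powered:.0f} days")
print(f"about {hours_per_cycle:.1f} hours per power cycle")
print(f"{unsafe_share:.0f}% of power cycles ended in an unsafe shutdown")
print(f"{bytes_written / 1e12:.2f} TB written, {full_drive_writes:.1f}x capacity")
```

So the drive has been written roughly a dozen times over, and on average it was powered for about five hours between power cycles.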
The only thing that looks suspicious to me is the 1299 error log entries. That might suggest a fault, I'm not sure. Keep running the test over the next few days and see if it increases. It's possible it's related to the high number of power cycles, and could be reporting on the same single bad block each time.
Personally, I wouldn't care too much about drive speed from yabs, unless you notice it being an actual problem with your use case. It's quite possible there are other factors limiting performance, e.g. if you have a slow CPU as well as a slow drive. Just run smartctl every week or so and just worry if the numbers are changing too much.
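One way to do that weekly comparison is to save the smartctl output each time and diff the counters. The sketch below assumes the plain-text layout shown in this thread ("Label: number", optionally with thousands separators, a % sign or a bracketed size); the helper names parse_counters and changed_counters are hypothetical, and the two snapshots are excerpts of the outputs the original poster pasted.

```python
import re
from typing import Dict

# Excerpts of two smartctl outputs posted in the thread.
BEFORE = """\
Available Spare: 100%
Percentage Used: 5%
Data Units Written: 3,190,836 [1.63 TB]
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,299
"""
AFTER = """\
Available Spare: 100%
Percentage Used: 5%
Data Units Written: 3,195,359 [1.63 TB]
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,300
"""


def parse_counters(report: str) -> Dict[str, int]:
    """Extract 'Label: 1,234' style integer counters from smartctl text."""
    counters = {}
    for line in report.splitlines():
        match = re.match(r"^\s*([^:]+):\s+([\d,]+)(?:\s|%|$)", line)
        if match:
            counters[match.group(1).strip()] = int(match.group(2).replace(",", ""))
    return counters


def changed_counters(before: str, after: str) -> Dict[str, int]:
    """Return only the counters whose value differs, as after minus before."""
    old, new = parse_counters(before), parse_counters(after)
    return {k: new[k] - old[k] for k in old if k in new and new[k] != old[k]}


for name, delta in changed_counters(BEFORE, AFTER).items():
    print(f"{name}: {delta:+d}")
```

A drop in Available Spare or a rise in Media and Data Integrity Errors would be the changes worth acting on; a steady rise in Data Units Written is just normal use.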
@ralf could be that it was in some external case and often plugged/unplugged or the likes. each unsafe shutdown might have produced at least one entry in the error log and so on... I wouldn't worry too much. maybe spend 50 bucks to replace it with a shiny new drive with much more capacity on top.
Motherboard:
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: A320M-S2H-CF
It is in the M2 slot on the motherboard.
then I think I should slowly add a new nvme ssd.
Which one would be the best and fastest at a medium price? It has to be M.2 2280.
entry level motherboard with entry grade NVMe.
looks normal.
try to allocate memory for HMB and see if it improves.
Hi,
we use Seagate Firecuda and WD SN700 mostly.
Corsair MP510 or higher could be also an option.
I am not sure if that's the price range you are looking for, but they all have quite high TBW ratings and they are definitely not slow.
But you should definitely check what is causing this:
3600 power-on hours with 700 power cycles and 600 unsafe shutdowns? So an unclean shutdown with a power cycle roughly every 5 hours? Are you hard-resetting your server every few hours? ^^;
This, together with the controller busy time which is according to Intel:
"
Controller Busy Time (in minutes)
Contains the amount of time the controller is busy with I/O commands. The controller is busy when there is a command outstanding to an I/O Queue. (Specifically, a command was issued by way of an I/O Submission Queue Tail doorbell write and the corresponding completion queue entry has not been posted yet to the associated I/O Completion Queue.) This value is reported in minutes.
"
looks like your server has some very strange problem.
Either your NVMe drive or whatever holds your M.2 drive (PCIe card or onboard slot) seems to be not OK.
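Since the quoted definition says Controller Busy Time is reported in minutes, it can be set against Power On Hours to see how much of the powered-on time the controller actually spent on I/O. A rough sketch using the two values from the SMART output in this thread (variable names are illustrative):

```python
# Values copied from the SMART output posted in the thread.
CONTROLLER_BUSY_MINUTES = 640
POWER_ON_HOURS = 3_610

power_on_minutes = POWER_ON_HOURS * 60
busy_percent = 100 * CONTROLLER_BUSY_MINUTES / power_on_minutes

print(f"controller busy for {busy_percent:.2f}% of powered-on time")
```

That is well under one percent, which fits a drive that has mostly sat idle while powered on.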
YABS tests mixed read/write simultaneously so the numbers will always be lower than specs
Yes, for drives like this one, such speeds are rather typical.
Even the "major" brands have budget NVMe SSDs, which aren't much faster than the SATA ones. E.g. see Intel 600p.
it's not a server from what he wrote. at least nothing constantly powered on in some datacenter, but just some machine with used parts that now runs at his home.
one can probably only speculate, what the usage scenario of that drive has been before it ended up in that box.
I'll put in a new SSD and see how it works.
New line in the error log:
Error Information (NVMe Log 0x01, max 16 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 1300 0 0x0013 0x4004 0x004 0 1 -
1 1299 0 0x001d 0x4004 0x004 0 1
nvme error-log /dev/nvme0
Error Log Entries for device:nvme0 entries:16
.................
Entry[ 0]
.................
error_count : 1300
sqid : 0
cmdid : 0x13
status_field : 0x4004(INVALID_FIELD)
parm_err_loc : 0x4
lba : 0
nsid : 0x1
vs : 0
.................
Entry[ 1]
.................
error_count : 1299
sqid : 0
cmdid : 0x1d
status_field : 0x4004(INVALID_FIELD)
parm_err_loc : 0x4
lba : 0
nsid : 0x1
vs : 0
.................
Entry[ 2]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 3]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 4]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 5]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 6]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 7]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 8]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[ 9]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[10]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[11]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[12]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[13]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[14]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
Entry[15]
.................
error_count : 0
sqid : 0
cmdid : 0
status_field : 0(SUCCESS)
parm_err_loc : 0
lba : 0
nsid : 0
vs : 0
.................
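The two non-empty entries above both carry status_field 0x4004 and parm_err_loc 0x4. A minimal sketch of how those raw values break down, assuming the bit layout of the NVMe error log entry as I read the base specification (bit 0 is the phase tag, bits 8:1 the status code, bits 11:9 the status code type, bit 14 "More", bit 15 "Do Not Retry"; for the parameter error location, bits 7:0 are the byte and bits 10:8 the bit within the command). The function names and the small lookup table are made up for illustration, so check them against the spec before relying on them.

```python
from typing import Dict

# A few generic (status code type 0) status codes from the NVMe base spec.
GENERIC_STATUS_NAMES = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
}


def decode_status_field(raw: int) -> Dict[str, int]:
    """Split the 16-bit status_field of an error log entry into its parts."""
    return {
        "phase_tag": raw & 0x1,
        "status_code": (raw >> 1) & 0xFF,
        "status_code_type": (raw >> 9) & 0x7,
        "more": (raw >> 14) & 0x1,
        "do_not_retry": (raw >> 15) & 0x1,
    }


def decode_parameter_error_location(raw: int) -> Dict[str, int]:
    """Split parm_err_loc into the byte and bit of the offending command field."""
    return {"byte": raw & 0xFF, "bit": (raw >> 8) & 0x7}


status = decode_status_field(0x4004)
location = decode_parameter_error_location(0x4)
print(status, GENERIC_STATUS_NAMES.get(status["status_code"], "unknown"))
print(location)
```

Decoded this way, 0x4004 is the generic "Invalid Field in Command" status with the More bit set, matching the INVALID_FIELD label nvme-cli prints, and shifting it right by one bit gives 0x2002, the same status value smartctl reported when reading the SMART log failed earlier in the thread.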
finally found the error.
EDITED ... grub add ...
grub default cmd:
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
and no error.
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 5%
Data Units Read: 1,912,417 [979 GB]
Data Units Written: 3,195,359 [1.63 TB]
Host Read Commands: 26,606,154
Host Write Commands: 70,588,430
Controller Busy Time: 641
Power Cycles: 694
Power On Hours: 3,617
Unsafe Shutdowns: 632
Media and Data Integrity Errors: 0
Error Information Log Entries: 1,300
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 61 Celsius
Error Information (NVMe Log 0x01, max 16 entries)
No Errors Logged
Have you just rebooted? From googling, 0x4004 might be unrecognised data in some field, so probably the computer trying to speak a newer version of the NVMe protocol than your drive understands. If that's the case, and the computer is doing that twice during every boot, it neatly explains why the error count was about double the power cycle count.
If so, try rebooting a few more times and see if it goes up by 2 every time. If so, I'd say, you don't have anything to worry about.
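The "goes up by 2 every time" test can be written down as a small check over (power cycles, error log entries) pairs noted after each reboot. The first print uses the lifetime totals from the thread; the sample list further down is invented purely to show the call and is not data from the drive. The function name errors_per_power_cycle is hypothetical.

```python
from typing import List, Tuple


def errors_per_power_cycle(samples: List[Tuple[int, int]]) -> List[float]:
    """For consecutive (power_cycles, error_entries) samples, oldest first,
    return new error entries per new power cycle."""
    rates = []
    for (cycles_a, errors_a), (cycles_b, errors_b) in zip(samples, samples[1:]):
        if cycles_b > cycles_a:
            rates.append((errors_b - errors_a) / (cycles_b - cycles_a))
    return rates


# Lifetime ratio from the thread: 1,299 error entries over 694 power cycles.
lifetime_ratio = 1299 / 694
print(f"lifetime: {lifetime_ratio:.2f} error entries per power cycle")

# HYPOTHETICAL readings after two further reboots, for illustration only.
made_up_samples = [(694, 1300), (695, 1302), (696, 1304)]
print(errors_per_power_cycle(made_up_samples))
```

If the per-reboot rate stays at a constant small number, the entries are most likely logged during boot rather than caused by a failing drive.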
Thank you. I added nvme_core.default_ps_max_latency_us=0 pcie_aspm=off to the GRUB kernel command line, ran update-grub and rebooted.
No errors any more, and it is fast again.
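For anyone repeating this fix, here is a sketch of the edit itself as a string operation. It assumes a Debian/Ubuntu-style setup (suggested by the use of update-grub) where the default kernel command line lives in a GRUB_CMDLINE_LINUX_DEFAULT="..." line in /etc/default/grub; the existing value "quiet" in the example and the function name add_kernel_params are assumptions, not something the poster showed. As far as I know, nvme_core.default_ps_max_latency_us=0 stops the kernel from using the drive's autonomous power state transitions and pcie_aspm=off disables PCIe Active State Power Management.

```python
import re
from typing import List

PARAMS_FROM_THREAD = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]


def add_kernel_params(grub_line: str, params: List[str]) -> str:
    """Append missing parameters to a GRUB_CMDLINE_LINUX_DEFAULT="..." line."""
    match = re.match(r'^GRUB_CMDLINE_LINUX_DEFAULT="(.*)"\s*$', grub_line)
    if match is None:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")
    existing = match.group(1).split()
    for param in params:
        if param not in existing:  # keep the edit idempotent
            existing.append(param)
    return 'GRUB_CMDLINE_LINUX_DEFAULT="' + " ".join(existing) + '"'


# ASSUMED starting value; your /etc/default/grub may differ.
original_line = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"'
print(add_kernel_params(original_line, PARAMS_FROM_THREAD))
```

After changing the file, update-grub and a reboot are still needed, and cat /proc/cmdline shows whether the running kernel picked the parameters up.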
Hi,
with that you should have actually seen something in the kernel log / dmesg... just for the future.
If they have to replace under warranty, this is bad for provider.
If they charge remote hands to fix this, it'll be a cash maker and provider is an asshole.
chargeback yesterday