A seriously big problem with ZFS in HBA mode with 4Kn HDDs, whoever helps me I'll pay $40
Hello LET users, I know this isn't the most professional way to approach the community, but I have a problem I can't understand and I'm at my wits' end.
So, I have 24x 14 TB SAS 4Kn HDDs.
I put these HDDs into a ZFS RAIDZ2 pool. At a certain point, for example after 10 TB, 32 TB, or 40 TB of space has been used, one or more HDDs start to show as FAULTED in zpool status and the pool disappears. If I reboot the node, sometimes all the HDDs return to normal, the pool comes back, and all the VPSes work again, but sometimes it disappears completely and I cannot recover the data. If I test the HDDs individually, they show no problems. How can I solve this? I'm getting desperate.
What is the problem? Bad HDDs? A poor RAID configuration? RAM problems? Any help is very welcome.
Regards
Comments
Oh my god, is it $40 per 15 minutes?
@Hotmarer I'll pay $40 to whoever has a solution or knows what causes this.
Regards
You should deploy ZFS on servers without anything sitting between the OS and the HDDs, be it software or hardware. So yeah... I bet your $40 problem is with that "HBA mode".
Hello @amarc, the RAID is made directly from Proxmox; Proxmox doesn't recommend using a RAID controller with ZFS.
Just an example:
Regards,
Calin
ZFS RAID needs a lot of RAM, so if the server runs out of RAM it can break the pool.
A common rule of thumb is 1 GB of RAM per TB of usable disk (since you lose some to parity).
For example, if you have 16 TB of usable space, you would want 16 GB of RAM.
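Rough math for your setup, assuming that (debated) guideline: 24 x 14 TB in RAIDZ2 is about (24 - 2) x 14 = 308 TB of usable space, so by that rule you'd want something on the order of 300 GB of RAM. In practice plain pools run fine with far less; the guideline mostly matters if dedup is enabled.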
@Calin check the HDDs for bad sectors.
Sometimes, if an HDD is too slow to respond to I/O from ZFS, ZFS thinks it has failed.
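If it helps, a quick way to check each drive (just a sketch; adjust /dev/sdX per disk and make sure smartctl is installed):
smartctl -a /dev/sdX          # health, error log, grown defects / reallocated sectors
smartctl -t long /dev/sdX     # start a long self-test, read the result later with -a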
"I pay" or "I'm paid", big difference.
You don't understand. If your disks are connected to the server through some kind of RAID layer (hardware, software, or one of those dumb fake-RAID ones), for example an HP XYZ controller, that is or could be the issue. Basically, HP can decide "this disk is bad (it's not HP certified)" and ditch it, even though there is nothing wrong with it.
Anyone who connects more than 4 disks uses a hardware HBA or RAID controller to make the physical connections.
As long as that HBA is used only for the physical connection, there are no problems (we also have a storage server with 16 HDDs on Proxmox and it works perfectly).
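A simple sanity check, assuming the HBA is really in pass-through/IT mode: every disk should show up individually with its own model and serial, e.g.
ls -l /dev/disk/by-id/
lsblk -o NAME,MODEL,SERIAL,SIZE
If you see one big virtual device instead of 24 separate disks, the controller is still doing RAID on its own.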
https://github.com/openzfs/zfs/issues/14734
Collect and check the logs.
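For example (the pool name is just a placeholder):
journalctl -k -b              # kernel messages from the current boot
zpool events -v               # ZFS event history: I/O errors, device removals, etc.
zpool status -v yourpool      # per-device read/write/checksum error counters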
Thanks for the $40.
Are you using ECC RAM and how much?
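(You can usually check from the OS with something like: dmidecode -t memory | grep -i 'error correction' , which should report e.g. Multi-bit ECC.)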
Hello, thanks everyone for the help.
>
I'll start implementing this on my test server to check.
Thanks, I'll start implementing this. I read it and it seems to be a dedup problem.
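To confirm it really is dedup, something like this should show it (pool name is a placeholder):
zpool list                    # the DEDUP column shows the dedup ratio
zfs get -r dedup yourpool     # which datasets have dedup enabled
zpool status -D yourpool      # dedup table (DDT) statistics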
>
It's on my task list to re-check all the HDDs.
All 3 of these are on our task list. Unfortunately I don't have many test nodes with a similar configuration, so I will take 1 task per week. I will come back here over the next 3 weeks and see who will receive the money.
Thanks again, everyone, for the help!
Regards
I am not sure. Faulty memory is usually so bad that it is unlikely the system would not generate kernel panics or something similar.
I have had intermittent problems like that and it was never a memory issue, neither RAM nor cache in the controller. When there were memory problems, the failure was catastrophic and fast, unless the memory was way more than needed, which is very unlikely the case here.
I remember when I bought a large number of disks (32) and the vendor didn't have the whole batch, so they ordered more from their supplier, but those ended up being a different revision with a different set of chips. Until I returned all the older-revision disks and replaced them with the newer ones, I couldn't get a stable RAID; it kept failing mysteriously with various errors.
https://discourse.practicalzfs.com/
Try this resource, they can help.
@Calin Looking at the Proxmox documentation, ZFS uses up to 50% of the host memory for the ARC by default.
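If you want to cap it, the usual way on Proxmox is the zfs_arc_max module parameter (value in bytes; 32 GiB here is just an example), then update-initramfs -u and a reboot:
cat /sys/module/zfs/parameters/zfs_arc_max                       # current limit
echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf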
Do not use ZFS if you don't understand it. XFS and ext4 are uncomplicated, simple filesystems. ZFS, Btrfs, and the like are complicated, and they require you to read the documentation and have a strong background in Linux generally.
And please post more (kernel) logs. I assume it's the unaligned write command issue...
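Something like this should surface them if that's the case (the pattern is just a guess at the usual suspects):
dmesg -T | grep -iE 'unaligned|sense|i/o error|reset'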
Could this be something as mundane as @Calin forgetting to account for the startup power of all those drives when choosing the PSU? 24 drives could draw an extra ~250-300 W during spin-up. Also, those drives probably need to be split across separate power rails.
45Drives has always said that they spin up the drives in sections.
my 2 cents
TrueNAS SCALE (not CORE!) is way better for ZFS than plain Proxmox.
Check the simple things, cables and power; 24 drives spinning up can peak at insane current (2-5 A per drive isn't unusual).
For 24 drives at 14 TB each I would recommend around, uhh, 200-ish gigs; 256 GB should be plenty. I hope I don't have to mention ECC, but let's be sure...
ZFS is very, very complicated and should not be used without a very good understanding of how it works.
I use Proxmox with software RAID at home. I am hoping this server will get stable enough to use as a backup location.