A seriously big problem with ZFS in HBA mode with 4Kn HDDs, whoever helps me I'll pay $40
Hello LET users, I know this isn't the most professional way to approach the community, but I have a problem I can't understand and I'm at my wits' end.
So, I have 24x 14 TB SAS 4Kn HDDs.
I put these HDDs into a ZFS RAIDZ2 pool. At a certain point, for example after 10 TB, 32 TB, or 40 TB of space has been used, one or more HDDs start to show as FAULTED in zpool status and the pool disappears. If I reboot the node, sometimes all the HDDs return to normal, the pool comes back, and all the VPSes work again, but sometimes it disappears completely and I cannot recover the data. If I test the HDDs individually, they show no problems. How can I solve this? I'm getting desperate.
What is the problem? Bad HDDs? A poor RAID configuration? RAM problems? Any help is very welcome.
Regards
Comments
Oh my god, is it $40 per 15 minutes?
@Hotmarer I'll pay $40 to whoever has a solution or knows what causes this.
Regards
You should deploy ZFS on servers without anything sitting between the OS and the HDDs, be it software or hardware. So yeah... I bet your $40 problem is with that "HBA mode".
Hello @amarc, the RAID is made directly from Proxmox; Proxmox doesn't recommend using a RAID controller with ZFS.
Just an example:
Regards,
Calin
ZFS RAID needs a lot of RAM, so if the server runs out of RAM it can break the pool.
A common rule of thumb is 1 GB of RAM per TB of usable disk (since you lose some to parity).
For example, if you have 16 TB of usable space, you would want 16 GB of RAM.
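Rough math for your setup, assuming that (debated) guideline: 24 x 14 TB in RAIDZ2 is about (24 - 2) x 14 = 308 TB of usable space, so by that rule you'd want something on the order of 300 GB of RAM. In practice plain pools run fine with far less; the guideline mostly matters if dedup is enabled.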
@Calin check the HDDs for bad sectors.
Sometimes, if an HDD is too slow to respond to I/O from ZFS, ZFS thinks it has failed.
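If it helps, a quick way to check each drive (just a sketch; adjust /dev/sdX per disk and make sure smartctl is installed):
smartctl -a /dev/sdX          # health, error log, grown defects / reallocated sectors
smartctl -t long /dev/sdX     # start a long self-test, read the result later with -a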
"I pay" or "I'm paid", big difference.
You don't understand. If your disks are connected to the server through some kind of RAID layer (hardware, software, or one of those dumb fake-RAID ones), for example an HP XYZ controller, that is or could be the issue. Basically, HP can decide "this disk is bad (it's not HP certified)" and ditch it, even though there is nothing wrong with it.
Anyone who connects more than 4 disks uses a hardware HBA or RAID controller to make the physical connections.
As long as that HBA is used only for the physical connection, there are no problems (we also have a storage server with 16 HDDs on Proxmox and it works perfectly).
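A simple sanity check, assuming the HBA is really in pass-through/IT mode: every disk should show up individually with its own model and serial, e.g.
ls -l /dev/disk/by-id/
lsblk -o NAME,MODEL,SERIAL,SIZE
If you see one big virtual device instead of 24 separate disks, the controller is still doing RAID on its own.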
https://github.com/openzfs/zfs/issues/14734
Collect and check the logs.
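For example (the pool name is just a placeholder):
journalctl -k -b              # kernel messages from the current boot
zpool events -v               # ZFS event history: I/O errors, device removals, etc.
zpool status -v yourpool      # per-device read/write/checksum error counters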
Thanks for the $40.
Are you using ECC RAM and how much?
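(You can usually check from the OS with something like: dmidecode -t memory | grep -i 'error correction' , which should report e.g. Multi-bit ECC.)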
Hello, thanks everyone for the help.
>
I'll start implementing this on my test server to check.
Thanks, I'll start implementing this. I read it and it seems to be a dedup problem.
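To confirm it really is dedup, something like this should show it (pool name is a placeholder):
zpool list                    # the DEDUP column shows the dedup ratio
zfs get -r dedup yourpool     # which datasets have dedup enabled
zpool status -D yourpool      # dedup table (DDT) statistics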
>
It's on my task list to re-check all the HDDs.
All 3 of these are on our task list. Unfortunately I don't have many test nodes with a similar configuration, so I will take 1 task per week. I will come back here over the next 3 weeks and see who will receive the money.
Thanks again, everyone, for the help!
Regards
I am not sure. Faulty memory is usually so bad that it is unlikely the system would not generate kernel panics or something similar.
I have had intermittent problems like that and it was never a memory issue, neither RAM nor cache in the controller. When there were memory problems, the failure was catastrophic and fast, unless the memory was way more than needed, which is very unlikely the case here.
I remember when I bought a large number of disks (32) and the vendor didn't have the whole batch, so they ordered more from their supplier, but those ended up being a different revision with a different set of chips. Until I returned all the older-revision disks and replaced them with the newer ones, I couldn't get a stable RAID; it kept failing mysteriously with various errors.
https://discourse.practicalzfs.com/
Try this resource, they can help.
@Calin Looking at the Proxmox documentation, ZFS uses up to 50% of the host memory for the ARC by default.
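If you want to cap it, the usual way on Proxmox is the zfs_arc_max module parameter (value in bytes; 32 GiB here is just an example), then update-initramfs -u and a reboot:
cat /sys/module/zfs/parameters/zfs_arc_max                       # current limit
echo "options zfs zfs_arc_max=34359738368" > /etc/modprobe.d/zfs.conf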
Do not use ZFS if you don't understand it. XFS and ext4 are uncomplicated, simple filesystems. ZFS, Btrfs, and the like are complicated, and they require you to read the documentation and have a strong background in Linux generally.
And please post more (kernel) logs. I assume it's the unaligned write command issue...
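Something like this should surface them if that's the case (the pattern is just a guess at the usual suspects):
dmesg -T | grep -iE 'unaligned|sense|i/o error|reset'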
Could this be something as mundane as @Calin forgetting to account for the startup power of all those drives when choosing the PSU? 24 drives could draw an extra ~250-300 W during spin-up. Also, those drives probably need to be split across separate power rails.
45Drives has always said that they spin up the drives in sections.
my 2 cents
TrueNAS SCALE (not CORE!) is way better for ZFS than plain Proxmox.
Check the simple things, cables and power; 24 drives spinning up can peak at insane current (2-5 A per drive isn't unusual).
For 24 drives at 14 TB each I would recommend around, uhh, 200-ish gigs; 256 GB should be plenty. I hope I don't have to mention ECC, but let's be sure...
ZFS is very, very complicated and should not be used without a very good understanding of how it works.
I use Proxmox with software RAID at home. I am hoping this server will get stable enough to use as a backup location.