
A seriously big problem with ZFS and HBA mode on 4kn HDDs. Whoever helps me, I pay $40

CalinCalin Member, Patron Provider
edited August 2023 in General

Hello LET users. I know it's not the most professional way to express myself to the community, but I have a problem that I can't understand and I'm at my wits' end.

So, I have 24x 14 TB SAS 4kn HDDs.

I put these HDDs in a ZFS RAIDZ2. At a certain point, for example after space usage reaches 10 TB, or 32 TB, or 40 TB, one or more HDDs start to appear as "FAULTED" in zpool status and the raid disappears. If I reboot the node, sometimes all HDDs return to normal, the raid comes back, and all the VPSes work again, but sometimes it disappears completely and the data cannot be recovered. If I test the HDDs individually, they have no problems. How can I solve this? I'm desperate.

What is the problem? Bad HDDs? A bad raid configuration? RAM problems? Any help is very welcome.

Regards

Comments

  • Oh my god, it is $40 per 15 minutes?

  • CalinCalin Member, Patron Provider

@Hotmarer I'm paid 40$ to whoever has a solution, or knows what causes this

    Regards

  • amarcamarc Veteran

@Calin said: How can I solve this? I'm desperate.

You deploy ZFS on servers without anything controlling the HDDs, be it software or hardware. So yeah... I bet your $40 problem is with "HBA mode".
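
    A rough sketch of how one might check what actually sits between the disks and the OS (plain lsblk/lspci output, nothing vendor-specific assumed):

        # Sketch: show how the disks are attached and whether a RAID/SAS
        # controller sits in the path. Purely illustrative.
        import subprocess

        # Transport per disk (sas, sata, ...) straight from lsblk.
        print(subprocess.run(["lsblk", "-d", "-o", "NAME,MODEL,SIZE,TRAN"],
                             capture_output=True, text=True).stdout)

        # Any RAID or SAS controllers on the PCI bus.
        pci = subprocess.run(["lspci"], capture_output=True, text=True).stdout
        print("\n".join(l for l in pci.splitlines() if "RAID" in l or "SAS" in l))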

  • CalinCalin Member, Patron Provider

Hello @amarc, the raid is made directly from Proxmox; Proxmox doesn't recommend using any controller with ZFS.

Just an example:

    Regards,
    Calin

  • ZFS raid needs a lot of RAM, so if the server runs out of RAM it can break the raid.

    1 GB per TB of actual disk (since you lose some to parity).

    For example, if you have 16 TB in physical disks, you need 16 GB of RAM.
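
    A quick back-of-the-envelope sketch of that rule applied to the OP's 24x 14 TB RAIDZ2 (illustrative arithmetic, not a hard ZFS requirement):

        # Rule-of-thumb RAM sizing for a ZFS pool (illustrative only).
        drives = 24
        size_tb = 14               # per-drive capacity
        parity_drives = 2          # RAIDZ2 loses two drives to parity
        usable_tb = (drives - parity_drives) * size_tb
        ram_gb = usable_tb         # ~1 GB RAM per TB of usable disk
        print(f"usable: {usable_tb} TB -> suggested RAM: ~{ram_gb} GB")
        # 24x 14 TB RAIDZ2 -> 308 TB usable -> ~308 GB by this rule,
        # which is why the rule is usually relaxed for plain storage pools.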

    Thanked by koly1 and Veldanava
  • @Calin check the HDDs for bad sectors.
    Sometimes if an HDD is too slow to respond to a sync from ZFS, ZFS thinks it has failed.
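
    A minimal sketch of such a check, assuming smartmontools is installed (the device names are placeholders to adjust for the actual system):

        # Sketch: scan disks for SMART defect indicators.
        import subprocess

        DISKS = [f"/dev/sd{c}" for c in "abcd"]   # placeholder device names

        for disk in DISKS:
            out = subprocess.run(["smartctl", "-a", disk],
                                 capture_output=True, text=True).stdout
            for line in out.splitlines():
                # SAS drives report a grown defect list and error counters;
                # SATA drives report Reallocated/Pending sector counts.
                if any(k in line for k in ("defect", "Reallocated", "Pending")):
                    print(disk, line.strip())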

  • yoursunnyyoursunny Member, IPv6 Advocate
    edited August 2023

    "I pay" or "I'm paid", big difference.

  • amarcamarc Veteran

@Calin said: Hello @amarc, the raid is made directly from Proxmox; Proxmox doesn't recommend using any controller with ZFS.

You don't understand. If your disks are connected to the server via some kind of RAID (hardware, software, or those dummy ones), for example HP XYZ... that is, or could be, the issue. Basically HP can say "this disk is bad (it's not HP certified)" and ditch it, even though there is nothing wrong with it.

  • FlorinMarianFlorinMarian Member, Host Rep

    @amarc said:

@Calin said: Hello @amarc, the raid is made directly from Proxmox; Proxmox doesn't recommend using any controller with ZFS.

You don't understand. If your disks are connected to the server via some kind of RAID (hardware, software, or those dummy ones), for example HP XYZ... that is, or could be, the issue. Basically HP can say "this disk is bad (it's not HP certified)" and ditch it, even though there is nothing wrong with it.

    Anyone who connects more than 4 disks uses a hardware HBA or raid controller to make the physical connections.
    As long as that HBA interface is used only for the physical connection, there are no problems (we also have a storage server with 16 HDDs on Proxmox and it works perfectly).

  • Collect and check the logs.

    Thanks for the $40.

    Are you using ECC RAM and how much?
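
    For a first pass, a hedged sketch like this would do (assumes a systemd host such as Proxmox; the keyword list is just a starting point):

        # Sketch: pull disk-related lines out of the kernel log.
        import subprocess

        log = subprocess.run(["journalctl", "-k", "--no-pager"],
                             capture_output=True, text=True).stdout
        for line in log.splitlines():
            if any(k in line.lower() for k in ("i/o error", "sd ", "mpt", "zfs")):
                print(line)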

    Thanked by koly1
  • CalinCalin Member, Patron Provider

Hello, thanks everyone for the help.

    @wii747 said: For example, if you have 16 TB in physical disks, you need 16 GB of RAM.


    I'm starting to implement this on my test server to check.

    Thanks; from what I've read, it could also be a dedup problem.

    @comXyz said: Sometimes if the HDD is too slow to response to sync from zfs, zfs think it failed.


    It's on the task list to re-check all the HDDs.

    All 3 of these are on our task list. Unfortunately I don't have many test nodes with a similar configuration, so I will take on 1 task per week, come back here over the next 3 weeks, and see who will receive the money.

    Thanks again, everyone, for the help!

    Regards

  • koly1koly1 Member

    1. How many GBs of RAM do you have? How many TBs do your drives have?
    2. Are there any errors in the system logs?
    3. Did you test your RAM for memory errors (with memtest or something)? The hardware might be faulty.
  • MaouniqueMaounique Host Rep, Veteran
    edited August 2023

    @koly1 said: Did you test your RAM for memory errors (with memtest or something)? The hardware might be faulty

    I am not sure; faulty memory is so bad that it is unlikely the system would not generate kernel panics or similar.
    I have had intermittent problems like that and it was never a memory issue, neither the RAM nor the cache in the controller. When there were memory problems, the failure was catastrophic and fast, unless there was way more memory than needed, which is very unlikely the case here.

    I remember when I bought a large number of disks (32) and the vendor didn't have the whole batch, so they ordered more from their supplier, but those ended up being a different revision with a different set of chips. Until I returned all the older-revision disks and replaced them with the newer ones, I couldn't get a stable raid; it kept failing mysteriously with various errors.
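
    For what it's worth, a small sketch like this would expose a mixed batch (standard lsblk columns, nothing assumed beyond that):

        # Sketch: list model + firmware revision per drive to spot mixed batches.
        import subprocess

        print(subprocess.run(["lsblk", "-d", "-o", "NAME,MODEL,REV,SERIAL"],
                             capture_output=True, text=True).stdout)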

  • emreemre Member, LIR

@Calin said: I put these HDDs in a ZFS RAIDZ2.

    https://discourse.practicalzfs.com/

    Try this resource; they can help.

    Thanked by Calin
  • wii747wii747 Member
    edited August 2023

    @Calin Looking at the Proxmox documentation, ZFS uses 50% of the host memory for the ARC by default:

    Limit ZFS Memory Usage
    ZFS uses 50 % of the host memory for the Adaptive Replacement Cache (ARC) by default. Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage. For example, if you have a pool with 8 TiB of available storage space then you should use 10 GiB of memory for the ARC.
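
    A small sketch of how that rule could be turned into the actual module option (zfs_arc_max is the standard OpenZFS parameter; the pool size here is the docs' example):

        # Sketch: compute a zfs_arc_max value per the Proxmox rule of thumb
        # (2 GiB base + 1 GiB per TiB of storage) and print the line that
        # would go into /etc/modprobe.d/zfs.conf to cap the ARC.
        pool_tib = 8                       # example pool size from the docs
        arc_gib = 2 + pool_tib             # 2 GiB base + 1 GiB/TiB
        arc_bytes = arc_gib * 1024**3
        print(f"options zfs zfs_arc_max={arc_bytes}")   # 10 GiB for 8 TiB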

  • LeviLevi Member

    Do not use ZFS if you don't understand it. XFS and ext4 are uncomplicated, simple filesystems. ZFS, Btrfs, and the like are complicated, and they require you to read the documentation and have a strong Linux background in general.

    Thanked by tentor
  • @Calin said:
    Hello LET users. I know it's not the most professional way to express myself to the community, but I have a problem that I can't understand and I'm at my wits' end.

    So, I have 24x 14 TB SAS 4kn HDDs.

    I put these HDDs in a ZFS RAIDZ2. At a certain point, for example after space usage reaches 10 TB, or 32 TB, or 40 TB, one or more HDDs start to appear as "FAULTED" in zpool status and the raid disappears. If I reboot the node, sometimes all HDDs return to normal, the raid comes back, and all the VPSes work again, but sometimes it disappears completely and the data cannot be recovered. If I test the HDDs individually, they have no problems. How can I solve this? I'm desperate.

    What is the problem? Bad HDDs? A bad raid configuration? RAM problems? Any help is very welcome.

    Regards

    1. Check the SAS cables
    2. Check the PSU (ZFS is very picky about stable voltage)
    3. Lower the SAS link speed to 3 Gb/s (via kernel parameter; see the sketch after this list)
    4. Change the SAS controller
    5. Change the OS. BSD works better with OpenZFS
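
    For item 3, before changing anything, the currently negotiated link rate of each SAS phy can be read from sysfs; a sketch (these paths exist on hosts using the Linux SAS transport class):

        # Sketch: print the negotiated link rate of every SAS phy.
        import glob
        from pathlib import Path

        for phy in sorted(glob.glob("/sys/class/sas_phy/*")):
            rate = (Path(phy) / "negotiated_linkrate").read_text().strip()
            print(Path(phy).name, "->", rate)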
  • And please post more (kernel) logs. I assume it's an unaligned write command issue...
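
    Since these are 4Kn drives, the pool's ashift is worth verifying alongside those logs; a sketch (the pool name "tank" is hypothetical):

        # Sketch: confirm the vdevs were created with ashift=12 (what 4Kn
        # drives need) and scan the kernel log for unaligned-write messages.
        import subprocess

        POOL = "tank"  # hypothetical pool name; substitute your own

        cfg = subprocess.run(["zdb", "-C", POOL],
                             capture_output=True, text=True).stdout
        print([l.strip() for l in cfg.splitlines() if "ashift" in l])

        dmesg = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
        print([l for l in dmesg.splitlines() if "unaligned" in l.lower()])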

  • Could this be something as mundane as @Calin forgetting to account for the startup power of all those drives when choosing the PSU? 24 drives could draw an extra ~250-300 W during startup. Also, those drives probably need to be split across separate power rails.

    45Drives has always said that they spin the drives up in sections.

  • My 2 cents:
    TrueNAS SCALE (not Core!) is way better for ZFS than pure Proxmox.
    Check the simple things, cables and power; 24 drives spinning up can peak at insane current (2-5 A per drive isn't unusual).
    For 24 drives at 14 TB each I would recommend around 200-ish gigs; 256 GB of RAM should be plenty. I hope I don't have to mention ECC, but let's be sure...

    ZFS is very, very complicated and should not be used without a very good understanding of how it works.

  • I use Proxmox with software RAID at home. I am hoping this server will get stable enough to use as a backup location.
