RAID1 or ZFS RAIDZ2
I have some old hard drives that were running in RAIDZ2 and haven't been booted up in a long time. I'd like to build a new pool in the next few days, but I'm deciding between RAID1 and RAIDZ2.
Due to the parity design of ZFS, I'm wondering if the wear and tear on the hard drives will be greater compared to RAID1, shortening their lifespan and costing even more money. One thing I really like about ZFS is that it detects silent corruption effectively.
It will be a pool of 8 hard drives; RAIDZ2 will give me 75% effective storage while RAID1 gives only 50%. There will be more reads than writes, and deletes will be rare. Any suggestions or recommendations?
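The capacity numbers above can be sketched quickly. This is a rough back-of-the-envelope calculation, not ZFS's exact accounting (it ignores metadata, padding, and reserved space); the 4 TB drive size is a hypothetical for illustration.

```python
def usable_tb(n_drives, drive_tb, layout):
    """Rough usable capacity, ignoring ZFS metadata/padding overhead."""
    if layout == "mirror":   # RAID1 / striped mirrors: half the raw space
        return n_drives * drive_tb / 2
    if layout == "raidz2":   # RAIDZ2: two drives' worth of parity per vdev
        return (n_drives - 2) * drive_tb
    raise ValueError(layout)

print(usable_tb(8, 4, "mirror"))   # 16.0 -> 50% of 32 TB raw
print(usable_tb(8, 4, "raidz2"))   # 24   -> 75% of 32 TB raw
```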
Comments
I use RAIDZ2 on my NAS at home with 6 x 3TB drives, giving me 12TB of usable storage. I've been running this setup for about 2 years and have had some issues with the Seagate 3TB drives (3 failures), but failures with the 3TB Seagates seem to be pretty common from what I've read online. I've had no issues with the WD drives in the array over the same period.
The thing I really like about ZFS is that it's hardware-agnostic: I can export my array from one machine, move the drives to a new one, and import the array. I actually migrated this array from FreeBSD to Ubuntu about a year ago and had no issues with the export/import process.
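The export/import workflow mentioned above boils down to a couple of `zpool` commands. A rough sketch, assuming a pool named `tank` (hypothetical); these obviously need a machine with the actual drives attached:

```shell
# On the old box: cleanly detach the pool before pulling the drives.
zpool export tank

# ...physically move the drives to the new machine...

# On the new box: list pools available for import, then import by name.
zpool import
zpool import tank   # add -f only if the pool wasn't exported cleanly
```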
Performance-wise I've been pretty happy - I get around 90-100MB/s read and 80MB/s write from my Win 10 workstation over Samba.
Seconding Z2.
Thanks for the response. Seagate is really problematic, as Backblaze's stats show. I'm planning to get HGST hard drives anyway and hope that saves me some hassle.
I'm now praying the old drives can still be read at least once so I can clone the data to the new pool (2 of the drives are Seagates). Hope I can migrate it successfully too.
Seagate is mostly fine again; they had a bad batch of 3TB drives a while ago.
If you're going for HGST, just be sure to read reviews of the specific model as they've released a few dodgy NAS branded drives recently.
I've got a handful of drives in BTRFS RAID1 for about a year or more without issue. I'd throw that in the mix for consideration.
I have a pretty similar setup and can say that I lost three out of four 3 TB Seagate drives (array never faulted thanks to RAID though). That particular model is just defective and should have been recalled.
That aside, I ran RAIDZ2 for years without issue. I found some noticeable performance impacts when the array reached about 70% of capacity. I've just recently moved from RAIDZ2 to RAID-10, but I only have four drives to work with, so the performance gain was a fair trade-off for me. Rebuild times on Z2 will be slower than RAID-1, but I never felt they were horrible considering the size of the drives and whatnot.
tl;dr
Given your 8 drives I think going with RAIDZ2 is a no-brainer. You'll get a good amount of usable storage and excellent redundancy.
With 8 drives you could even use RAID10, which gives you:
Speed gain: up to 8x read and 4x write
Fault tolerance: at least 1 drive failure per RAID1 group
That means you could survive 4 drive failures, one in each group of 2 disks. But if 2 disks in the same group fail, you have a problem. So I always recommend backing up to an external server/storage as well.
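The "2 disks in the same group" risk above can be quantified by brute force. A small sketch, assuming the 8 drives are grouped into 4 hypothetical mirror pairs and that any two drives are equally likely to be the ones that fail:

```python
from itertools import combinations

# 8 drives arranged as 4 mirror pairs (hypothetical grouping).
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

# Enumerate every possible pair of failed drives and count the fatal
# cases: both failures landing inside the same mirror pair.
all_two_drive_failures = list(combinations(range(8), 2))
fatal = sum(1 for combo in all_two_drive_failures if combo in pairs)

print(fatal, len(all_two_drive_failures))  # 4 of 28 -> a 1-in-7 chance
```

So with RAID10 a second random failure kills the array roughly 1 time in 7, whereas RAIDZ2 survives any two failures.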
From my experiments, ZFS was much slower than RAID10 in a 4-drive setup.
Keep in mind though, that IOPS seem to increase approximately with the square root of the number of drives, not linearly.
I would personally recommend RAIDZ2, as I'm a sucker for storage capacity. However, after experimenting with it a bit on my own NAS, ZFS on Linux seems to require some additional configuration (it was eating all the RAM it could get its hands on, as the ARC max size was apparently set to unlimited), so if you want a very easy solution, that's not it.
^^ That as well.
I haven't tinkered with any RAID10, but I can tell you that my RAIDZ1 for some reason is slow as balls, although it can still decently fill a gigabit port.
Yes, you're totally right about this, but in my experience so far, in the long run I would choose RAID10 with 8 drives or more over a ZFS build.
Keep in mind ZFS checksums blocks. Even a single, non-redundant drive can know when a bit has corrupted; and with multiple drives in a redundant setup ZFS can correct the failure.
I would have thought to go with a ZFS mirror pool for drives over 4 years old.
Effectively, if you don't care that it's a bit slow and it's just for file storage, ZFS should do fine. (You should still max out a 1Gbps port with this.)
If you are hosting things off of it, like VMs or something... Ehhh... At that point you probably need a SSD cache drive and a few good chunks of RAM.
You will likely need to tune your ZFS config and disable some features, as it's going to eat up most if not all of your RAM.
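The runaway-ARC behaviour described above is usually tamed by capping `zfs_arc_max`, which is a real OpenZFS-on-Linux module parameter. A sketch, assuming an 8 GiB cap (the byte value is an example, not a recommendation):

```shell
# Persist an 8 GiB ARC cap across reboots (8 * 1024^3 bytes).
echo "options zfs zfs_arc_max=8589934592" | sudo tee /etc/modprobe.d/zfs.conf

# Apply it immediately to the running module without rebooting.
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```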
Despite that, they still bought 35 million Seagate drives.
Yeah, HGST should save you hassle. They are rock solid.
HDDs really don't care that much about wear and tear. I'd second Z2. We use Z3 for our company NAS and it's stable and quick as fk.
Do you have ECC RAM and if yes, enough RAM?
Thanks for all the comments above, I decided to continue with RAIDZ2 due to its durability.
Besides that, any recommended external HDDs, just to store some critical files outside the pool? Or should I just get a SATA-to-USB adapter with an internal HGST drive...
I just read some info about btrfs, but it seems like it is deprecated.
That's why Z2 is better in a sense that any 2 of them can fail
Yes, this is one important factor, I believe if a single bit is corrupted in RAID1, there is no way to know which drive contains the correct bit.
It is mainly for archival purposes; no heavy load on it.
Yes, it will be running on 16GB ECC RAM
Good, also take a look at this perhaps http://wintelguy.com/zfs-calc.pl
Also to quote from http://olddoc.freenas.org/index.php/Hardware_Recommendations
"For systems with large disk capacity (greater than 8 TB), a general rule of thumb is 1 GB of RAM for every 1 TB of storage. This post describes how RAM is used by ZFS."
(Just a rough rule of thumb for you to look at)
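The quoted rule of thumb is trivial to apply. A sketch; the 8 GB floor is an assumption for small pools (a commonly cited FreeNAS baseline, not from the quote above):

```python
def suggested_ram_gb(pool_tb, base_gb=8):
    """FreeNAS-style guideline: ~1 GB RAM per 1 TB of storage for
    pools over 8 TB; base_gb is an assumed minimum for smaller pools."""
    return max(base_gb, pool_tb)

print(suggested_ram_gb(32))  # 32 -> 32 GB suggested for a 32 TB pool
```

Purely a guideline; actual ARC needs depend heavily on workload and features like deduplication.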
Sort of.
If the bit is read from a drive and the hard drive itself returns an error (i.e. it tried to read the data, but failed because of a bad sector) the array knows which drive has the bad data; it can then read the correct block from the other drive and (assuming that drive also doesn't return an error) re-write it to a different sector on the drive with an error. All is well. This is one of the reasons it's important to still run regular scrubs on a mirror RAID (be it RAID-1 or RAID-10) because you can still recover some data.
On the other hand, if the bit is read from the drive and there was some sort of silent corruption so the hard drive thinks the data is fine and returns it as such, then both hard drives report that data is fine yet return different data. In that case, recovery is not possible without intervention (which is to say for most of us, recovery is not possible).
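The difference described above can be shown with a toy model: two mirror copies of a block disagree after silent corruption. A plain mirror has no way to arbitrate, but a checksum stored at write time (as ZFS keeps per block) identifies the good copy. The data and hash choice here are illustrative; ZFS uses its own per-block checksums, not this exact scheme.

```python
import hashlib

good = b"important archive data"
stored_checksum = hashlib.sha256(good).hexdigest()  # recorded at write time

copy_a = good
copy_b = b"important archive dat\x00"  # silently corrupted mirror copy

# Plain RAID1 view: both drives "successfully" return data, yet it differs,
# and nothing tells the array which copy is correct.
print(copy_a == copy_b)  # False

# Checksumming view: verify each copy against the stored checksum.
for name, data in [("copy_a", copy_a), ("copy_b", copy_b)]:
    ok = hashlib.sha256(data).hexdigest() == stored_checksum
    print(name, "valid" if ok else "corrupt")
```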
Thanks, the calculator webpage is pretty cool on calculating effective space.
Yes, I read somewhere that RAM should scale with storage (1 GB per 1 TB). If that's the case I might have to increase my RAM, because the pool will be 32TB.
I had 4x4 TB drives with 16 GB of RAM while running ZFS and often found I had only megabytes of RAM free at any point in time. ZFS is extremely aggressive in using RAM for caching (and dumping huge amounts of that to actually use for user processes was noticable). I'd definitely recommend upgrading RAM if it's within your means.
So how do datacenters usually handle this (for example, Backblaze or Amazon Glacier) to ensure data integrity? Or don't they?
I have no idea because I've never done anything to that scale. However, I'm assuming they're not using mirrored RAID levels. Something like RAID-6 or RAIDZ2 is able to fix things from parity and what not.
See the following, rather interesting:
https://www.backblaze.com/blog/vault-cloud-storage-architecture/
HGST drives are HOT and LOUD. That doesn't mean they'll fail, but it would be wise to have airflow across them.