
Experiences with ZFS?


Comments

  • Crandolph Member
    edited February 2018

    I've been munching on my popcorn for a while now, thanks to this thread.

    Keep it up guys, don't stop.

    I don't even have to pay for this entertainment.

  • @bsdguy said:
    @mmoris

    Funnily you also seem to still fail to understand the difference between "I had no data loss" and "I noticed no data loss".

    Even accepting your interpretation, no other FS I know of goes as far as zfs (by design/engineering) in an attempt to safeguard data. No other FS does checksums for everything like zfs does (I think btrfs gets close or maybe even matches zfs but let's not talk about its reliability).
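
    For what it's worth, that end-to-end checksumming can be exercised on demand: a scrub re-reads every block in the pool, verifies it against its checksum, and repairs it from redundancy where possible. A minimal sketch ("tank" is just a placeholder pool name):

      # walk the whole pool and verify every block against its checksum
      zpool scrub tank
      # show progress and any read/write/checksum errors that were found
      zpool status -v tank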

  • @bsdguy I don't want to attack or belittle your opinions; I want to know what you suggest for redundant and, hopefully, reliable data storage. I switched to ZFS from ext4 because I had data corruption on ext4. I don't think ZFS is a perfect solution, but in my experience it is a good one, and I have not noticed data loss yet. I've also had RAID cards die and leave my array broken, so hardware RAID isn't perfect either; granted, I didn't have a high-end, expensive RAID controller. Or is the answer a workflow to ensure data integrity, such as making a backup and validating that the data on ZFS (or your favorite filesystem) exactly matches the backup?

  • @bsdguy - I understand your arguments, but I am not aware of any combination of standalone tools that can together deliver what zfs does. I would be very interested to know what tools you would use to deliver (a) (zfs) send/receive, (b) no write hole (by design), (c) the same level of data checksumming you get from zfs, (d) snapshots with cloned filesystems, (e) atomic writes.

    This is a good summary of some of the features I depend on every day.

    No other FS that I know of lets me simply roll back to a guaranteed known good state if I somehow messed something up on my system.
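
    To make the rollback and send/receive points concrete, here is a minimal sketch (pool, dataset, and snapshot names are placeholders, and the remote host is hypothetical):

      # take a point-in-time snapshot before a risky change
      zfs snapshot tank/home@before-upgrade
      # roll the dataset back to that known good state
      zfs rollback tank/home@before-upgrade
      # replicate the snapshot to another pool, possibly on another machine
      zfs send tank/home@before-upgrade | ssh backuphost zfs receive backup/home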

  • mmoris Member
    edited February 2018

    @bsdguy said:

    P.S. My aunt has a spare key under a flower pot in front of the house (I'm serious) and there has never been a burglary. Does that somehow prove that keeping keys under flower pots is safe and secure?

    Every few years a meteorite is known to hit the Earth's surface; are you implying it's unsafe for people to walk on the street?

    This is the kind of reasoning that runs through your entire argumentation. Please note that so far you haven't added anything useful to the discussion, except that ZFS requires lots of RAM (no shit, Sherlock).

    Apart from this, you've only mentioned that ZFS is not a good FS, without providing any substantial info.

  • @bsdguy said:
    Looking closer one notes, to name but one example, that max net usage of storage devices with zfs is about 75%.

    Seriously? Care to better define "max net usage"?

    If you refer to usable disk space, in reality it depends on vdev device count and raidz level (none/1/2/3). For a vdev of 10 disks with raidz1 it is 90%, raidz2 80%, raidz3 70%. No different from any equivalent hardware raid array.

    If you refer to the write performance loss as the array fills up, you should know that, for example, NetApp recommends growing (adding disks to) an aggregate when utilization approaches 80%-85% for better performance. And ZFS has a similar recommended utilization: here
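
    To put numbers on the usable-space point, a sketch with placeholder pool and disk names: a single 10-disk raidz2 vdev keeps roughly 8 disks' worth of capacity for data and 2 for parity, i.e. about 80% of raw space before metadata overhead.

      # one raidz2 vdev of 10 disks: roughly 80% of raw capacity is usable
      zpool create tank raidz2 sda sdb sdc sdd sde sdf sdg sdh sdi sdj
      # compare raw pool size with the space actually available to datasets
      zpool list tank
      zfs list tank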

  • @Crandolph show is over AFAIAC, need to get some sleep. Sorry about it - hopefully others will take it further for your benefit.

  • @aaraya1516 said:
    @bsdguy I don't want to attack or belittle your opinions; I want to know what you suggest for redundant and, hopefully, reliable data storage. I switched to ZFS from ext4 because I had data corruption on ext4. I don't think ZFS is a perfect solution, but in my experience it is a good one, and I have not noticed data loss yet. I've also had RAID cards die and leave my array broken, so hardware RAID isn't perfect either; granted, I didn't have a high-end, expensive RAID controller. Or is the answer a workflow to ensure data integrity, such as making a backup and validating that the data on ZFS (or your favorite filesystem) exactly matches the backup?

    First, kindly note that I expressly wrote that my position is not "zfs is shit".

    Maybe I should have made my position even clearer. You see, I'm not happy with hardware RAID either, among other things because it has its own set of problems (e.g. finding a working replacement if a controller fails, or, the route I chose, keeping a spare one on hand right away).

    My viewpoint is largely influenced by my job, which relies strongly on formal logic. And from that point of view zfs simply isn't reliable (nor are most other filesystems). What we know is that complexity drastically decreases the chances of getting something safe and reliable. Just look at the OpenBSD guys, whose first and major step with libressl was to rip out lots and lots of cruft and unnecessary complexity. Accordingly, I think that a set of smaller, simpler, single-task components is a much better way to go than a complex all-in-one behemoth like zfs.

    Well noted, again: my point is not "zfs is shit" - rather, it is "I'm confident that zfs is not as reliable and safe as many fans believe", and, as I laid out, that is based on well-established reasoning and experience.

    I would even understand if zfs fans told me that they love the comfort and the many features of zfs's all-in-one approach. The point where I object is when they assert - without any tangible and tenable basis - that zfs also is ultra reliable and safe.

  • @mmoris said:
    ... without providing any substantial info.

    Grave error. You seem to assume that I must prove anything or somehow convince you. Wrong. Those who make the claim that zfs is ultra reliable and safe owe the proof.

  • @bsdguy said:

    @mmoris said:
    ... without providing any substantial info.

    Grave error. You seem to assume that I must prove anything or somehow convince you. Wrong. Those who make the claim that zfs is ultra reliable and safe owe the proof.

    popcorn eating intensifies

  • @Crandolph said:

    @bsdguy said:

    @mmoris said:
    ... without providing any substantial info.

    Grave error. You seem to assume that I must prove anything or somehow convince you. Wrong. Those who make the claim that zfs is ultra reliable and safe owe the proof.

    popcorn eating intensifies

    Get some more popcorn. Chances that they actually provide proof for their "zfs is totally awesomely secure!" credo are quite slim.

  • Very interesting.

  • @bsdguy said:


    History and context. zfs, like every technology, was conceived and designed under a set of assumptions. One important assumption was "large data" (which probably made sense for Sun), another was full control of the hardware. Hence it was questionable (to avoid saying idiotic) to simply transplant that technology to a segment where far less than 1% is about large data ("large" as in "many exabytes") and where not only is there virtually no control of hardware (for the zfs guys) but, in fact, hardware is a brutal, cut-corners-everywhere commodity business.

    This is just saying you do not like software RAID in general.

    zfs is very memory hungry and does a lot in memory - which, however, often has worse reliability than hard disks and is in any case nowhere near highly reliable (keep in mind that ECC is pretty much always single-error correction, "error" typically meaning a single bit).

    ZFS is memory hungry because it caches data in the ARC (Adaptive Replacement Cache), whose size is tunable. You can also add a read cache to a zpool in the form of an extra drive such as a cheap SSD (which is what I do, for reads only), called the L2ARC, or Layer 2 ARC. Going directly to your RAM argument: if you have issues with your RAM, you will have other things to worry about than just ZFS.
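
    A minimal sketch of both knobs, assuming ZFS on Linux and placeholder pool/device names (on FreeBSD the ARC cap is the vfs.zfs.arc_max sysctl instead):

      # cap the ARC at 8 GiB; applies at module load, or at runtime via
      # /sys/module/zfs/parameters/zfs_arc_max
      echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
      # add a cheap SSD to the pool "tank" as an L2ARC read cache
      zpool add tank cache /dev/nvme0n1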

    A related problem is that of locality and responsibility. What we really want is a disk system that offers a simple interface and just works, and works reliably. Why? For diverse reasons, one of them being that we want a high payload-vs-admin ratio: we want as few resources (cpu cycles, memory, ...) as possible spent on housekeeping and as many as possible on user jobs. That was, in fact, one of the main reasons for hardware raid controllers: high payload-vs-admin ratio and locality (have some specialized device deal with specific (e.g. disk) details and offer a simple, resource-cheap interface to the main system).

    If you want hardware RAID, that is perfectly fine. ZFS is just a version of software RAID on steroids. Someone could develop a PCI-E card running Linux + ZFS, but no one sees the point when the overhead is not an issue and LSI HBAs are very cheap. Going back to the overhead issue: IPv4 and IPv6 network stacks on Intel x86 CPUs convert little-endian byte ordering to big-endian (network byte order), yet can push tens of gigabits per second when tuned properly.

    Looking at that, one finds another questionable view, namely something boiling down to "today's cpus can do checksumming and other (disk-related) things much faster than hardware raid controllers anyway". While that's true, it quickly turns out to be questionable when properly framed: sure, a 20-core xeon is much, much faster than, say, an arm-based raid processor, but a) that doesn't mean the arm raid processor is too slow (after all, its job is quite limited), and b), more importantly, it's not eating away user cycles like the xeon! Moreover, it's bullshit anyway because most relevant algorithms run much faster on a lowly but specialized raid processor than in software on a xeon.

    No shit, Sherlock: of course offloading tasks to purpose-built ASICs, or just to CPUs other than the host's, will increase performance. But ARM really does not have a place in the server environment for this to be an issue.

    Safety, security. Here we are at another assumption issue. Sun could and did have very experienced high-level developers. Most foss developers, however, are mediocre; a few superstars don't change that fact. Moreover, we fucking know - and from plenty of pain - that complexity and size are the natural enemies of quality. In case you still have doubts, just have a look at openssl (or linux or windows or ...). So the unix fathers did have concrete and heavyweight pragmatic reasons for the "do 1 thing and do it well" credo.


    What does it matter what the UNIX fathers thought? Sun and OpenZFS developers are not UNIX forefathers; they wrote code to achieve a new way to manage data on hard drives. This really just sounds like you have a beef with the open-source community, more than anything.

    Religion and zealotry. Let's be honest: maybe not you, but the vast majority of people are herd animals, always following what's cool or fashionable. That is the single most important reason for everyone and his dog preaching how great zfs is. Looking closer, one notes, to name but one example, that max net usage of storage devices with zfs is about 75%. Pardon me, you want to tell me to follow a religion that generously throws away the one holy resource it's all about? Thanks, no.


    Or maybe it's because ZFS has a feature set that caters to everyone's desires? Snapshots, send/receive of datasets, key/value metadata storage, customizable read/write caching, on-the-fly compression and decompression, raw volume support; the list just goes on, my friend. Most, if not all, of the features listed above are something every VPS provider could have.
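
    Two of those as a minimal sketch (dataset and volume names are placeholders): on-the-fly compression on a dataset, and a raw volume a VPS provider could hand to a guest as a block device.

      # enable transparent LZ4 compression for newly written data
      zfs set compression=lz4 tank/data
      # create a 20 GiB zvol (raw block device, e.g. a VM disk)
      zfs create -V 20G tank/guest0
      # the volume shows up under /dev/zvol/tank/guest0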

  • @bsdguy said:
    Now look at zfs: it adds a shitload of work to the main processor. For your home server or a small company's thingy that might be quite OK, but in a data center it's often nonsensical.

    Umm, what? If you are concerned about ZFS chewing up valuable disk I/O, then I would seriously take a look at your network stack, which wastes time flipping bits on all your precious line-rate network traffic.

    Moreover, an important aspect of data storage I mentioned above just happens to bite us again: the fact that raid cards are so expensive is due to social and economic reasons. Technically, a raid card (I'm talking about a professional 6x sas/sata card here!) is a few chips, namely (typically) a specialized (xor, galois) arm core, typically 1 or 2 (cheap) port driver chips, and some memory, all together costing less than $50, plus a bunch of (ridiculously) cheap electronics plus firmware; that's it. Yet you pay $500 for it.

    All the more reason why LSI HBA hardware plus ZFS software makes for a very cheap solution. To quote a famous Bellheads vs. Netheads piece, "just deliver the bits, stupid".

    According to an Adaptec knowledge base article, you cannot import foreign multi-level RAID arrays if your $500 RAID card takes a shit, which I find quite disturbing.

  • techhelper1 Member
    edited February 2018

    @bsdguy said:


    My viewpoint is largely influenced by my job, which relies strongly on formal logic. And from that point of view zfs simply isn't reliable (nor are most other filesystems). What we know is that complexity drastically decreases the chances of getting something safe and reliable. Just look at the OpenBSD guys, whose first and major step with libressl was to rip out lots and lots of cruft and unnecessary complexity. Accordingly, I think that a set of smaller, simpler, single-task components is a much better way to go than a complex all-in-one behemoth like zfs.

    What does OpenBSD's philosophy have to do with ZFS? If an OS really wants ZFS, its developers will either bend it to make it work in that environment or a user can port it over; it's really that simple.

    Well noted, again: my point is not "zfs is shit" - rather, it is "I'm confident that zfs is not as reliable and safe as many fans believe", and, as I laid out, that is based on well-established reasoning and experience.

    Then I would question my confidence in you as an engineer, for not keeping your mind sharp on the latest storage technologies and not keeping an open mind about what the future holds. Someone had to trust UFS, XFS, and ext at some point; why is this any different?

    I would even understand if zfs fans told me that they love the comfort and the many features of zfs's all-in-one approach. The point where I object is when they assert - without any tangible and tenable basis - that zfs also is ultra reliable and safe.

    The point is asserted because the code is open: you can verify it, you can even make corrections, add features (for example, make it fire up ClamAV on every file write), and compile your own version. I don't think you can do that with your $500 RAID card.

  • @bsdguy said:

    Grave error. You seem to assume that I must prove anything or somehow convince you. Wrong. Those who make the claim that zfs is ultra reliable and safe owe the proof.

    Exactly; you, on the other hand, are supposed to say whatever comes to mind without providing any useful information to back it up. But as you probably know, making noise is not the same as making music, I'm afraid.

    If all the arguments presented so far cannot counter your lack of knowledge about ZFS, then I'm afraid nothing can.

    I think "don't feed the troll" is the only viable approach in this case...

  • @techhelper1

    Unfortunately, you pretty much limited yourself to fanboy ranting and failed to understand what you were ranting against. Example: OpenBSD, which I only used as an illustration of a highly important principle, namely that complexity works against reliability. Another example is "Arm": I wasn't talking about Arm-based servers; I was talking about an architecture that is used by many RAID, network, and other controllers, including, btw, ASICs.

    You see, as engineers we are used to terms meaning something. To say that xyz (e.g. zfs) is reliable - or even enhances reliability! - has to actually and concretely mean something.

    Let me quote an example from Seagate to show you what I'm talking about:

    Seagate said:
    The product shall achieve an Annualized Failure Rate - AFR - of 0.73% (Mean Time Between Failures - MTBF - of 1.2 Million hrs) when operated in an environment that ensures the HDA case temperatures do not exceed 40°C. Operation at case temperatures outside the specifications in Section 2.9 may increase the product Annualized Failure Rate (decrease MTBF). AFR and MTBF are population statistics that are not relevant to individual units.

    AFR and MTBF specifications are based on the following assumptions for business critical storage system environments:

    8,760 power-on-hours per year.
    250 average motor start/stop cycles per year.
    Operations at nominal voltages.
    Systems will provide adequate cooling to ensure the case temperatures do not exceed 40°C. Temperatures outside the specifications in Section 2.9 will increase the product AFR and decrease MTBF.

    Empirical data from Backblaze suggest that one should reasonably assume typical AFRs in the range of 1% - 3% for the kind of disks they use (which are also used in many servers). Moreover, typical UBE rates (unrecoverable bit errors on reading) of hard drives are on the order of 1 in 10^14 bits (roughly 2^46.5), which works out to roughly one error per 2^43.5 bytes, i.e. about one unrecoverable error per ~12 TB read - roughly one full pass over a large modern drive. Not exactly reassuring ~ not highly reliable. Well noted, those are purely statistical values based on observation of a large number of drives. Moreover, we know that hard drives age, i.e. error density increases over time.
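
    As an aside, the AFR and MTBF figures in the Seagate quote above are two views of the same number. A back-of-the-envelope check, assuming the stated 8,760 power-on hours per year and an exponential failure model:

      AFR ≈ 1 - e^(-8760 / MTBF) = 1 - e^(-8760 / 1,200,000) ≈ 0.73% per year

    which matches the 0.73% Seagate states.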

    Now how reliable is any given software system interacting with hard drives? What are the major factors?

    One very major factor - probably the single most important one - that can be found in both the scientific literature and in empirical data is complexity. To make the problem worse, the trouble caused by complexity does not grow linearly but exponentially. ALL highly reliable mechanisms and devices we have developed so far have one factor in common: avoidance of complexity.

    The major forms of complexity are algorithmic, temporal, and spatial. The first is obvious and the typical target of study, the second is even more obvious, and the third is, to summarize it in a somewhat pragmatic way, about storage (typically memory).

    That is why the unix dogma "have programs do one thing and do it well" is of high significance. It is, in a way, a one-line summary of what we have learned from our studies of complexity. And the rule is as simple as it is important: avoid complexity!

    And that's also the reason behind the OpenBSD team's decision to quickly make openssl (in the form of libressl) safer, more reliable, and more secure by trimming out loads of cruft and exotic cases. Simple reason: avoid (unnecessary) complexity. The same goes for the people specifying and implementing TLS 1.3 in F*, which is a largely functional programming language, for a reason: functional languages very much minimize space complexity and quite some time complexity.

    And now some people want to tell me that zfs - a complex, large conglomerate comprising disk handling, a file system, diverse caches, plus a plethora of utilities - is reliable and safe? Sorry, that goes against pretty much everything mankind has learned about systems so far.

    Well noted, again: I'm not saying that zfs is shitty and bad. What I'm saying is that I want to see tangible data and evidence instead of "bsdguy you are an idiot" and "zfs is great" fanboy exclamations.

    Show me the evidence.

  • omelas Member
    edited March 2018

    @bsdguy
    Well, I think the Unix philosophy at the program level is redundant in structured programming languages (like any language with classes), as each class should follow the philosophy (each class does one thing, and together they become a toolset), and a 'program' is just a wrapper around them, like a metapackage.

  • Well, a lot of people think a lot of things. However, a statement like "xyz is safe" needs evidence to have any weight.

    Since you brought up programming: there is evidence, both logical and empirical, for statements like "Ada offers good mechanisms to create safe software" or "functional programming is safer than imperative languages".

  • lwt Member

    @bsdguy. You wrote: "Show me the evidence." What type of evidence are you looking for?

    Also. What evidence do you have to prove that ext4 is reliable? Or more reliable than zfs? Can I see it?

    I agree that everything else being equal, the more complex a program, the more likely to have more bugs.

    But you need to compare like with like.

    Show me a combination of tools that give the same benefits/features as zfs, but with a smaller code base and I would certainly adopt them.

    When you compare zfs with ufs or other filesystems you assert that it HAS to be less reliable simply because it is more complex. NO. It is likely to have more bugs, yes, but that is not a given. And again, you should compare like with like as I wrote above.

    For someone who needs the combination of features zfs offers, there's simply nothing else available on the market, except maybe NetApp's filesystem, which you cannot compare because you do not really know the size of its codebase.

    So again, I will accept that zfs is likely to have more bugs than say ufs, but this is meaningless because you do not compare like with like.

  • @lwt said:
    Also. What evidence do you have to prove that ext4 is reliable? Or more reliable than zfs? Can I see it?

    None - and I need none as I didn't assert that ext4 is somehow safe.

    I agree that everything else being equal, the more complex a program, the more likely to have more bugs.

    Yes, and the bugs don't simply increase in a linear fashion (except for trivial, short code). The complexity of (finding) the bugs increases, too.

    But you need to compare like with like.

    True - but that is, in fact, one of the points that makes me cautious. Note that if we tried to put a normal fs plus a bunch of tools and utilities which together provide something very similar to zfs up against zfs, we could compare like with like. And I can tell you the outcome: chances are very good that zfs would look worse, for a simple reason: higher complexity.

    Funnily, you do see and understand complexity, but only in the UI. Your "zfs is all-in-one, simple, and handy to use" attitude (which I can well understand on a human level) also means that the complexity you avoid in the UI is simply somewhere else, namely in the code.

    But once more: my position is not to drive anyone away from zfs. My point is about not simply accepting zfs evangelism. You like it and don't care about what I say? Great, have fun and enjoy zfs - just don't preach to me how great and safe and reliable it is.

  • lwt Member

    @bsdguy "Note that if we tried and put a normal fs + a bunch of stuff and utilities which together provide something very similar to zfs against zfs we could compare like with like."

    And that is my point precisely. I have no knowledge of any combination of independent standalone tools that would give something comparable to what zfs provides.

    If there were, I would consider moving away from zfs, on the basis that they might have fewer bugs than the more monolithic zfs.

    But there is no such combination and therefore the argument that zfs is somehow to be avoided because monolithic = full of bugs does not have a leg to stand on.

  • @lwt said:
    @bsdguy "Note that if we tried and put a normal fs + a bunch of stuff and utilities which together provide something very similar to zfs against zfs we could compare like with like."

    And that is my point precisely. I have no knowledge of any combination of independent standalone tools that would give something comparable to what zfs provides.

    If there were, I would consider moving away from zfs, on the basis that they might have fewer bugs than the more monolithic zfs.

    But there is no such combination and therefore the argument that zfs is somehow to be avoided because monolithic = full of bugs does not have a leg to stand on.

    Pardon me, premise error. You assume that the zfs feature set is required - well, it is not (as gazillions of non-zfs servers demonstrate).

    But I respect your personal attitude as one (among other) views.

  • lwt Member

    @bsdguy Hehe, I do not assume it is required. I make the choice to require it because it makes my life easier. :-). And this is a personal choice, I agree, and I also understand it is a choice you made in the opposite direction.
    rgds
