Help Needed! Proxmox + KVM Guest; Randomly hangs without any reason, no console, watchdog useless

Comments

  • mailcheap Member, Host Rep
    edited January 25

    @Levi said:
    If the issue is business-breaking - pay for a Proxmox support request. They are decent.

    And reasonable pricing too, Proxmox is awesome!

    @PulsedMedia said:
    Survey: people don't like to advertise the fact that their beloved ZFS has failed, but they let it slip sometimes. Well-known case example: Jake from LTT is as obvious a ZFS zealot as they come, but even he let it slip on one of their videos that ZFS has nuked LTT data, multiple times as I understood it, plus config issues in the same video.

    LTT has so many videos I don't watch all of them, but if you're referencing that one video where they had 15-wide RAIDZ3 arrays with no monitoring for failed drives, then it was a complete shitshow on the LTT admin's part (rewatched the video, there was no admin: they set it up and never once took a look at it again until more than 3 drives failed, taking the whole array with it 😂). They lost a bunch of their archived videos because of poor monitoring/maintenance.

    When the monitoring in question is a one-line config change, it's inexcusable.
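    (For reference, the one-liner being referred to here is presumably ZED's mail setting; a minimal sketch with a placeholder address, assuming zfs-zed is installed and local mail delivery actually works:)

    # /etc/zfs/zed.rc -- config for the ZFS event daemon
    ZED_EMAIL_ADDR="admin@example.com"   # mail pool fault/degrade events here
    ZED_NOTIFY_VERBOSE=1                 # also notify on scrub finishes, not only faults
    # then: systemctl restart zed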

    Pavin.

    Thanked by 1 fluffernutter
  • @mailcheap said: We'll have to agree to disagree on ZFS, it has been rock solid for us these past few years.

    I definitely side with @mailcheap here. If anything, ZFS gives you an assurance (and feedback) on integrity (and accepted, that one does pay a performance penalty for it - like with many things, it's a tradeoff - for many folks, the integrity is important enough that they're willing to give up some performance for it).

    I still miss using it (simplicity and performance) as our primary datastore (moved to ceph) but we still use it for our backups.

    I'd politely disagree on "simplicity". ZFS is a relatively complex beast with lots of knobs, bells and whistles to tune/tweak/trash. It does require some good planning and understanding - and with an appropriate investment (esp. technical), you do reap the rewards. It has evolved (with its fair share of issues over time) to a point where it is robust enough for a wide variety of use cases (yes, there are still niggling issues here and there which can risk the loss of an entire pool - just because you use ZFS it doesn't give you an excuse to avoid backups).

    Once you do things well and build a good foundation, the simplicity (relatively speaking) kicks in and then it's snapshots and zfs send+recv all the way.
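    (A minimal sketch of that workflow, with placeholder dataset and host names:)

    zfs snapshot tank/data@2024-01-25
    # incremental send of only the changes since the previous snapshot
    zfs send -i tank/data@2024-01-24 tank/data@2024-01-25 | ssh backuphost zfs recv -u backup/data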

    Thanked by 1 mailcheap
  • SagnikS Member, Host Rep

    We had the same issue with VMs locking up on a few servers. I didn't get to test it much, but it seems like it was introduced with the kernel 6.2 update and later updates fixed it, at least for us.

  • PulsedMedia Member, Patron Provider
    edited January 25

    @yoursunny said: Lies lies lies.

    Yes, you are lying.
    Some providers have confided to us that they have given you free or near-free services (loss leaders) because otherwise you would harass them. You demanded this from us as well.

    You might not even realize this yourself, but you have been such a nasty boy that they silently just do what you ask unless it's super expensive. Some people simply can't do introspection / lack self-awareness, and potentially you might be one of those people if you have not realized these things.

    @angstrom said: If you or any other user here feels harassed by @yoursunny or any other user, then please document the alleged harassment, letting me or another mod/admin know, and I'll (we'll) look into it. In this connection, I would simply note that a user may sometimes say something that a provider doesn't like to hear -- for example, "I'd like to pay less" -- which wouldn't count as harassment per se

    What is freedom of speech if not for your worst enemy?
    We believe everyone is free to express themselves. They'll have to face the consequences tho, in Yoursunny's case being made fun of ;)
    [insert obligatory pushup image here!]

    I'll let you know when something actually unlawful happens.

    @angstrom said: Again, unless you have evidence that @yoursunny acts on behalf of the CCP, please refrain from further cheap CCP rhetoric/accusations

    His own posts and actions. We will not be stopped from expressing ourselves.

    Why do you always run to protect yoursunny, but apparently don't act when he harasses others?

    Just today another provider confided to us that Yoursunny had lately been harassing yet another provider a lot here, but you were completely MIA when that happened. I didn't see that popcorn drama myself tho.


    @nullnothere said: ZFS is a relatively complex beast with lots of knobs, bells and whistles to tune/tweak/trash.

    I still remember the days it was advertised "no tuning needed, it just works! so you don't need those settings available" lol

    @nullnothere said: and accepted, that one does pay a performance penalty for it

    Same thing here, ZFS used to be advertised as THE highest performance solution.

    @mailcheap said: 15 wide RAIDz3 arrays with no monitoring for failed drives

    Most of the world runs stuff like that. I've even seen a super-premium cloud provider not check on their RAID50 array for more than a decade, then panic when the 3rd drive failed and a bunch of others were on the edge. Of course, no backups. And to imagine that from their half rack of servers they used to sell hosting to very high-profile clients paying them tens of thousands monthly for very little resources.

    3 parity drives out of 15 (3/15) == a 20% redundancy factor; you should expect that to be highly reliable. That has something like 600k years mean time to data loss mathematically, if failed drives are swapped immediately AND ZFS doesn't nuke it by itself.

    Albeit mission-critical data like that should be backed up multiple times, in a plethora of ways.

    @nullnothere said: ZFS gives you an assurance (and feedback) on integrity

    So does MD. It's called a RAID check, or simply reading from the array. Have you noticed how mysteriously MD sometimes starts a resync / check by itself? It's because a chunk failed to verify, but the drive was still accessible so it was not dropped from the array.

    Also, by default an array check is done every month.
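    (On Debian-based systems that monthly pass is typically the mdadm checkarray cron job or mdcheck timer; a minimal sketch for kicking off and inspecting a check by hand, assuming the array is md0:)

    echo check > /sys/block/md0/md/sync_action   # start a verify pass over the whole array
    cat /proc/mdstat                             # watch progress
    cat /sys/block/md0/md/mismatch_cnt           # non-zero after the check = blocks that failed to verify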

    @mailcheap said: When the monitoring in question is a one-line config change, it's inexcusable.

    If you are talking about email monitoring, getting email to go through is not a "one line config"; it might be a rabbit hole of pain and suffering depending on your setup. It used to be that simple, but these days you have to send from a valid domain with SPF and DKIM set, from MTAs with history and clean subnets, etc.
    Email has become one of the most unreliable and most burdensome communication methods.

    Within your own domain you could send email, but a provider like us? If our servers can send email as pulsedmedia.com (or any other domain) we'd need to restrict those MTAs to only target @pulsedmedia.com -- but alas, then a malicious user could register a 9-cent seedbox with us and spam 10 000 000 emails to our helpdesk every hour, bringing all other email down, so it still has to go through the regular antispam checks etc.

    It gets complicated wicked fast when you can't just whitelist IPs. That means setting up credentials for every server independently, sendmail or something on each server. Again, more complication.

    It used to be that you really just said yes to "send monitoring email", it would pick whatever system name it had, and the mail would go through. These days ... yeah, no valid SPF or DKIM, dropped.

    So it might be easier to set up rsyslog, or fetch dmesg logs by some other means, etc. (every setup is different; these are just some challenges with email monitoring these days).

    The same is true for MD and SMART too, btw; you can make them send email.
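    (A minimal sketch of those two, with a placeholder address; both still depend on the box being able to deliver mail somewhere:)

    # /etc/mdadm/mdadm.conf -- mdadm --monitor mails on Fail/DegradedArray events
    MAILADDR admin@example.com

    # /etc/smartd.conf -- smartd mails on SMART health/attribute problems
    DEVICESCAN -a -m admin@example.com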

    @SagnikS said:
    We had the same issue with VMs locking up on a few servers. I didn't get to test it much, but it seems like it was introduced with the kernel 6.2 update and later updates fixed it, at least for us.

    Glad to hear your issue got resolved. I can't check what kernel we are on right now, but we have both VE7 and VE8 running, and we've not been as lucky :(
    Must be a different kind of halt / race condition.
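    (For what it's worth, checking the running kernel and package versions on a Proxmox node is quick; a minimal sketch:)

    uname -r                      # kernel the node is actually booted into
    pveversion -v | head -n 5     # Proxmox VE / pve-kernel package versions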

    Thanked by 1 quicksilver03
  • yoursunny Member, IPv6 Advocate
    edited January 25

    @PulsedMedia said:

    @yoursunny said: Lies lies lies.

    Yes, you are lying.
    Some providers have confided to us that they have given you free or near free services (loss leaders) because otherwise you would harass them.

    Lies lies lies.
    Which service in my free compute services list was obtained through extortion?

  • PulsedMedia Member, Patron Provider

    @yoursunny said:

    @PulsedMedia said:

    @yoursunny said: Lies lies lies.

    Yes, you are lying.
    Some providers have confided to us that they have given you free or near free services (loss leaders) because otherwise you would harass them.

    Lies lies lies.
    Which service in my free compute services list was obtained through extortion?

    Didn't know you have issues with reading comprehension too.

    Further, you are putting conditions on that as a feint. That's dishonest. Why not just ask "tell me which service I got via extortion"? But you just had to put filters and conditions on it. That's not how honest people work.

    Let's say;

    I have never sold any hosting service [for minecraft]

    Now we both know that is bullshit, but try to prove it's bullshit. No one knows what our customers use their services for; pretty sure there's someone running a Minecraft server, even on a seedbox, but low-key enough not to trigger abuse control.

    I could just as easily say for plex streaming.

    Thanked by 2 0xOkami, sillycat
  • 0xOkami Member
    edited January 25

    @angstrom said: Again, unless you have evidence that @yoursunny acts on behalf of the CCP, please refrain from further cheap CCP rhetoric/accusations

    Not fair to see you saying that; he actively does this to people. And I am not even defending @PulsedMedia here -- I don't even have active services with them at this time, nor did I have contact with the provider before, just to head off the "yeah he is prolly a customer bro" replies.

    @PulsedMedia said: The post was about a classroom where someone made a post online which they should not have (according to the CCP), and that they had to arrest the whole classroom instead of a single individual because they could not track the person behind NAT.

    I can agree on this, the above is true: he has made multiple statements. Please see his old posts and the threads where he replied and talked to people. Some posts have been changed tho.

    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    Okami

    Thanked by 1 PulsedMedia
  • yoursunny Member, IPv6 Advocate

    @0xOkami said:
    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    https://github.com/risharde/let-nosunny
    https://github.com/FatGrizzly/no-let
    Choose your plugin.
    You're welcome.

  • @yoursunny said:

    @0xOkami said:
    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    https://github.com/risharde/let-nosunny
    https://github.com/FatGrizzly/no-let
    Choose your plugin.
    You're welcome.

    Funny. How about you removing yourself without any help of a plugin?

    Okami

    Thanked by 1 PulsedMedia
  • Moral of the story, kids: everyone sucks.

    Thanked by 1 totally_not_banned
  • yoursunny Member, IPv6 Advocate

    @0xOkami said:

    @yoursunny said:

    @0xOkami said:
    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    https://github.com/risharde/let-nosunny
    https://github.com/FatGrizzly/no-let
    Choose your plugin.
    You're welcome.

    Funny. How about you removing yourself without any help of a plugin?

    Okami

    echo 192.0.2.1 lowendtalk.com | sudo tee -a /etc/hosts
    

    You're welcome.

  • @PulsedMedia said: ZFS is the bar-none worst "filesystem" out there, it's in its own class of badness.

    Yea there is hype around it, but if you start asking people, you are hard-pressed to find a single person who's not had all their data nuked due to ZFS.

    We've had a perfect 100% track record with ZFS tho, and this is not 1 or 2 setups, but like 50+. A perfect 100% track record of data getting nuked. ;)

    Can you explain how exactly ZFS nuked your data? Was it some bug?

  • yoursunny Member, IPv6 Advocate

    @PulsedMedia said:
    Further, you are putting conditions on that as a feint. That's dishonest. Why not just ask "tell me which service I got via extortion"? But you just had to put filters and conditions on it. That's not how honest people work.

    Lies lies lies.
    We are mentally strong people and don't practice extortion.

  • SagnikS Member, Host Rep

    @PulsedMedia said: Glad to hear your issue got resolved. I can't check what kernel we are on right now, but we have both VE7 and VE8 running, and we've not been as lucky

    If possible, try live migrating the VMs to a node running the latest kernel + QEMU version and see if it still happens. I have a bug now where Proxmox VMs go read-only randomly; I'll need to look into that more. I use plain old mdadm however, no ZFS (too much of a performance penalty + bugs).
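    (A minimal sketch of that on the Proxmox CLI, with a placeholder VM ID and target node; note it won't work for VMs with passed-through devices:)

    qm migrate 101 node2 --online   # live-migrate VM 101 to node2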

  • alfatarsos Member
    edited January 25

    @PulsedMedia let me just give you some probably very stupid ideas about your issue... and try to help.

    I've had issues in the past with Linux and CPUs, be it virtual or bare-metal ones. That behaviour you're seeing is probably related to some error that pops up when Proxmox does an illegal operation to the CPU or any intermediary in the pathway between the software and the CPU.

    The usage of that VM stays constant because it's a direct halt. I've seen time and again Linux show constant usage even on bare-metal CPU when that happens. Linux actually doesn't stop doing some things when that happens.

    It is something access-related that either has its access denied or makes an illegal operation. Less chance of that happening, but it could be also the OS inside the VM itself trying to pull some weird CPU instruction and, while Proxmox denies, it simply doesn't know what to do from there with that "container". It could also be related to some incorrect energy management by the CPU governor inside that specific VM.

    It's either the cache program or something around this ballpark, I'd believe...

    Since CPUs and configurations were switched, the issue has to be agnostic to the bare-metal hardware and instead be software-triggered.

    Thanked by 1 PulsedMedia
  • @yoursunny said:

    @0xOkami said:

    @yoursunny said:

    @0xOkami said:
    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    https://github.com/risharde/let-nosunny
    https://github.com/FatGrizzly/no-let
    Choose your plugin.
    You're welcome.

    Funny. How about you removing yourself without any help of a plugin?

    Okami

    echo 192.0.2.1 lowendtalk.com | sudo tee -a /etc/hosts
    

    You're welcome.

    I understand that you are not quite up to speed mentally, but this goes a long way. I give you an instruction, and then you give a totally stupid response. Memory of a monkey :smile:

    Okami

    Thanked by 2 valk, PulsedMedia
  • edited January 26

    One of the main problems with ZFS (well, rather with Linux, but oh well...) is its license. It simply can't be merged with GPL'd code and therefore will forever stay a second-class citizen. Besides, when i look at the requirements it demands to perform well, i basically lose interest. Admittedly i've found with Btrfs that all those clever features are something i really don't need, so i'm obviously biased, but even if those were somewhat beneficial to me i'd still pass on it, and if it's anything like Btrfs when things go wrong (painful...) i wouldn't want it anywhere near me either.

    TL;DR: Not a ZFS fan.

    Thanked by 1 PulsedMedia
  • yoursunny Member, IPv6 Advocate

    @0xOkami said:
    I understand that you are not quite up to speed mentally, but this goes a long way. I give you an instruction, and then you give a totally stupid response. Memory of a monkey :smile:

    We are a rabbit, not a monkey.

  • Maounique Host Rep, Veteran
    edited January 26

    I have seen some weird Proxmox issues through the years.
    For example, I have experienced such lock-ups with the Virtio driver for network. Changing to another emulation solved the problem completely, but it took me months to reach the exasperation state in which I started to make random changes.
    That was some time ago, though, and it didn't re-occur; I suppose it was some flawed virtio/kernel issue which solved itself through some upgrade. But it might be relevant: if you change the emulation and it solves the issue, that could at least put you on the track towards the real culprit.
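    (A minimal sketch of swapping the NIC model on a Proxmox VM, with a placeholder VM ID; note that re-specifying net0 without the old MAC generates a new one, so check the current config first if the guest cares:)

    qm config 101 | grep net0              # note the current model and MAC
    qm set 101 --net0 e1000,bridge=vmbr0   # swap virtio-net for the emulated e1000 model
    # power-cycle the VM so the guest picks up the new NIC model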

  • FatGrizzly Member, Host Rep

    @yoursunny said:

    @PulsedMedia said:
    Further, you are putting conditions on that as a feint. That's dishonest. Why not just ask "tell me which service I got via extortion"? But you just had to put filters and conditions on it. That's not how honest people work.

    Lies lies lies.
    We are mentally strong people and don't practice extortion.

    "We"?

  • edited January 26

    @FatGrizzly said:

    @yoursunny said:

    @PulsedMedia said:
    Further, you are putting conditions on that as a feint. That's dishonest. Why not just ask "tell me which service I got via extortion"? But you just had to put filters and conditions on it. That's not how honest people work.

    Lies lies lies.
    We are mentally strong people and don't practice extortion.

    "We"?

    He's a Nigerian princess.

  • PulsedMedia Member, Patron Provider

    @0xOkami said:

    @angstrom said: Again, unless you have evidence that @yoursunny acts on behalf of the CCP, please refrain from further cheap CCP rhetoric/accusations

    Not fair to see you saying that; he actively does this to people. And I am not even defending @PulsedMedia here -- I don't even have active services with them at this time, nor did I have contact with the provider before, just to head off the "yeah he is prolly a customer bro" replies.

    @PulsedMedia said: The post was about a classroom where someone made a post online which they should not have (according to the CCP), and that they had to arrest the whole classroom instead of a single individual because they could not track the person behind NAT.

    I can agree on this, the above is true: he has made multiple statements. Please see his old posts and the threads where he replied and talked to people. Some posts have been changed tho.

    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    Okami

    Nope, it's only OK if sunniest of the sunnies does the harassment etc. ;)
    He is protected ;)

    .... joking aside, there's a reason people are making browser plugins to filter out The Sunny, and the funniest thing is that he pointed that out himself. Talk about a lack of self-reflection and self-awareness! Or maybe he is just a narcissistic sociopath. Who knows. He likes to derail threads like this and cause general annoyance tho.

    @itsdeadjim said:

    @PulsedMedia said: ZFS is the bar-none worst "filesystem" out there, it's in its own class of badness.

    Yea there is hype around it, but if you start asking people, you are hard-pressed to find a single person who's not had all their data nuked due to ZFS.

    We've had a perfect 100% track record with ZFS tho, and this is not 1 or 2 setups, but like 50+. A perfect 100% track record of data getting nuked. ;)

    Can you explain how exactly ZFS nuked your data? Was it some bug?

    Bugs, and an utter and complete failure of basic logic. Drives never fail, RIIIIIGHT?
    At the time, if a drive dropped from the array, ZFS continued to "write to it", causing the whole array to corrupt. Further, at the time there was no "e2fsck"-type utility either.

    Not sure if those 2 have been fixed.

    One bug still exists tho: presume your drives were part of an MD array, maybe partitioned etc. (I'm hazy on the exact layout of the drives), BUT you forgot to completely wipe the first 20MiB or so of the drives.
    You create the ZFS array, start using it, all works fine, then you reboot -> the ZFS pool/vdevs are gone, as if there was no ZFS at all. No way to recover.

    Sometimes just a simple system crash or server reboot will also cause ZFS metadata to vanish, and once again: no tools to recover, unlike ext4 which places backup superblocks on the partition.
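    (For anyone reusing disks, a minimal sketch of the kind of pre-wipe that avoids that stale-metadata trap; /dev/sdx is a placeholder and these commands destroy whatever is on it:)

    mdadm --zero-superblock /dev/sdx   # if the disk was previously an MD member
    wipefs -a /dev/sdx                 # clear leftover filesystem/RAID signatures
    zpool labelclear -f /dev/sdx       # clear old ZFS labels at the start and end of the device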

    @SagnikS said:

    @PulsedMedia said: Glad to hear your issue got resolved. I can't check what kernel we are on right now, but we have both VE7 and VE8 running, and we've not been as lucky

    If possible, try live migrating the VMs to a node running the latest kernel + QEMU version and see if it still happens. I have a bug now where Proxmox VMs go read-only randomly; I'll need to look into that more. I use plain old mdadm however, no ZFS (too much of a performance penalty + bugs).

    Already running on the latest, and since these are device passthroughs, live migration is not possible.

    The data amounts are so vast that even if they were on, say, LVM or qcow2 images (with the performance impact etc.) the live migrations would take absurdly long. The only way to do live migrations with these data mounts is to have a SAN. One VM could have 100TB of data.

    Perhaps share the bugs you've encountered, since others asking are still under the ZFS illusion spell? :)

    Thanked by 1 0xOkami
  • PulsedMedia Member, Patron Provider

    @alfatarsos said:
    @PulsedMedia let me just give you some probably very stupid ideas about your issue... and try to help.

    I've had issues in the past with Linux and CPUs, be it virtual or bare-metal ones. That behaviour you're seeing is probably related to some error that pops up when Proxmox does an illegal operation to the CPU or any intermediary in the pathway between the software and the CPU.

    The usage of that VM stays constant because it's a direct halt. I've seen time and again Linux show constant usage even on bare-metal CPU when that happens. Linux actually doesn't stop doing some things when that happens.

    It is something access-related that either has its access denied or makes an illegal operation. Less chance of that happening, but it could be also the OS inside the VM itself trying to pull some weird CPU instruction and, while Proxmox denies, it simply doesn't know what to do from there with that "container". It could also be related to some incorrect energy management by the CPU governor inside that specific VM.

    It's either the cache program or something around this ballpark, I'd believe...

    Since CPUs and configurations were switched, the issue has to be agnostic to the bare-metal hardware and instead be software-triggered.

    Could very well be, and I agree, it has to be on the software side.

    There's the very occasional case where a drive drops from the array, or times out, but those are very rare and easy to distinguish from the host dmesg.

    @Maounique said:
    I have seen some weird Proxmox issues through the years.
    For example, I have experienced such lock-ups with the Virtio driver for network. Changing to another emulation solved the problem completely, but it took me months to reach the exasperation state in which I started to make random changes.
    That was some time ago, though, and it didn't re-occur; I suppose it was some flawed virtio/kernel issue which solved itself through some upgrade. But it might be relevant: if you change the emulation and it solves the issue, that could at least put you on the track towards the real culprit.

    Been at the random-changes stage of this for many months now :(
    I tried CPU settings and the driver for block device passthrough, but the net driver is untested.

    We've got a system testing those kernel parameters now; if that proves to be unfruitful, perhaps I'll try changing the network driver.

    Thanked by 1 alfatarsos
  • @PulsedMedia said: I still remember the days it was advertised "no tuning needed, it just works! so you don't need those settings available" lol

    That was if you set it up correctly - it is a relatively low-maintenance FS once you get everything set up properly and pay attention to alerts.

    Also, ZFS has evolved a fair bit over the past few years (especially once it has been "unified" into zfsonlinux.org).

    @PulsedMedia said: Same thing here, ZFS used to be advertised as THE highest performance solution.

    Context matters. For the features it offers, once built correctly with the right "parts" for the usecase, it works wonderfully well AND can give you amazing performance (especially when you consider that it could all be large scale spinning rust augmented with fast nvme/ssd devices of a significantly smaller size).

    @PulsedMedia said: So does MD. It's called a RAID check, or simply reading from the array. Have you noticed how mysteriously MD sometimes starts a resync / check by itself? It's because a chunk failed to verify, but the drive was still accessible so it was not dropped from the array.

    MD's check pales in comparison to what ZFS does. In a mirror, MD has no idea which is the "right" copy.

    Another (convoluted) example - when you start with a fresh "pool" there's no sync required for ZFS. With MD, you're going to be spending the first chunk of your time "syncing" that empty drive set. Similarly, a scrub or a resilver is going to be data-driven, NOT raid/disk-size driven. Add in compression (and all the features you get with snapshots, clones, etc.) and ZFS clearly wins on many layers of convenience, speed as well as what I consider guaranteed integrity. It takes a lot of things to bring that feature set to MD land.

    At the end, it's a choice - it may not be "comfortable" (or even the right choice) for many people/use-cases and that's fine - there are other file systems to choose from and work with. YMMV.

    Nobody is claiming ZFS is perfect (and neither are the other FSs out there) - but taking a completely contrary stance and saying it is useless is taking things to the other (unjustifiable, IMHO) extreme.

    Given ZFS's relative complexity (and non-forgiving nature), it's easy to mess up (and pay the price for it) - but I would stop short of blaming the FS for what is most likely configuration, setup or user error.

    Thanked by 1 iKeyZ
  • PulsedMedia Member, Patron Provider

    @nullnothere said: That was if you set it up correctly - it is a relatively low-maintenance FS once you get everything set up properly and pay attention to alerts.

    conditions conditions and more conditions.
    That way you can make even a pile of turd into diamonds! ;)

    You don't need any special config or care when doing MD+Ext4 for it to be a low-maintenance, reliable setup. None at all. Literally none with SSDs, to get the performance too. Just mdadm create; mkfs and you are done with them.
    (ofc fine tuning can always yield some benefits)
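    (A minimal sketch of that whole flow, with placeholder devices and RAID level:)

    mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]
    mkfs.ext4 /dev/md0
    mdadm --detail --scan >> /etc/mdadm/mdadm.conf   # persist the array definition
    mount /dev/md0 /srv/data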

    @nullnothere said: Also, ZFS has evolved a fair bit over the past few years (especially once it has been "unified" into zfsonlinux.org).

    Tried it again as of late, after that news broke; the same issues still plagued it. It's still complete utter shite.

    @nullnothere said: Context matters. For the features it offers, once built correctly with the right "parts" for the usecase, it works wonderfully well AND can give you amazing performance (especially when you consider that it could all be large scale spinning rust augmented with fast nvme/ssd devices of a significantly smaller size).

    conditions conditions and more conditions.
    In the past it used to be advertised as The Highest Performance Solution. Period. Without any conditions or caveats.

    Amazing performance my ass! BY DESIGN it cannot offer you amazing performance. That's the friggin' design goal.

    The only situation where it can outperform anything else is a single-thread, or very-few-threads, sequential workload.
    I don't know of anything but backups or large single-file transfers which are like that. Do you?

    @nullnothere said: Another (convoluted) example - when you start with a fresh "pool" there's no sync required for ZFS. With MD, you're going to be spending the first chunk of your time "syncing" that empty drive set.

    Yes, also "fresh pool" fails when you reboot it with high degree of likelyhood, nor does the drives get tested if they are working.

    The MD initial sync verifies that the data is coherent and the drives are functioning; sure, it takes time, but it's also part of the QA process and an actual assurance that you have a sane, performing, functional storage system. Unlike ZFS, which couldn't give a rat's ass whether the underlying storage works or not.

    @nullnothere said: Add in compression (and all the features you get with snapshots, clones, etc.) and ZFS clearly wins on many layers of convenience

    Yes, incremental rsync is too complicated to set up. Everyone needs snapshots, compression, dedup, clones etc. every day, for certain. NOT.

    None of that matters when your underlying storage fails with a degree of certainty. And for our in-production use case? Those features add negative value, just like for most others. At best they are just a little bit of extra bloat.

    None of that saves you from needing actual backups either.

    @nullnothere said: ZFS clearly wins on many layers of convenience, speed as well as what I consider guaranteed integrity.

    I agree. You never have to worry about free space, because your FS just crashed and you get a fresh empty pool to use once again ^_^
    So convenient to never run out of free storage space! ;)

    @nullnothere said: It takes a lot of things to bring that feature set to MD land.

    Those are not features which belong to MD/RAID at all.

    This is another big issue with ZFS, and a thing you fail to realize completely: you need to separate things into layers for management and maintainability.
    ZFS tries to do everything, and does everything poorly, nothing well.

    MD does an excellent job at its purpose; it's narrowly focused just to get that job done: build RAID arrays on top of regular drives.

    Ext4 does an excellent job at its purpose; it's narrowly focused just to get that job done: offer a highly performant, reliable filesystem.

    ZFS doesn't do a good job at either.

    @nullnothere said: At the end, it's a choice - it may not be "comfortable" (or even the right choice) for many people/use-cases and that's fine - there are other file systems to choose from and work with. YMMV.

    Yes, ZFS is The Right Choice absolutely, for your worst enemy >;)

    @nullnothere said: Nobody is claiming ZFS is perfect (and neither are the other FSs out there) - but taking a completely contrary stance and saying it is useless is taking things to the other (unjustifiable, IMHO) extreme.

    Not extreme. Experience-based.

    Any storage system's primary job, The One job which always needs to go right, is to be a storage system! ZFS fails this because it's too unreliable to call a storage system. AT BEST it's an Ephemeral Storage System.

    @nullnothere said: Given ZFS's relative complexity (and non-forgiving nature), it's easy to mess up (and pay the price for it) - but I would stop short of blaming the FS for what is most likely configuration, setup or user error.

    conditions conditions and more conditions.
    ZFS was originally advertised as the hands-free, no-knobs-to-turn, no-tuning-needed system: just create your vdevs and pools and be done with it!

    If it needs the skills of 25 years of experience to function for more than 1 month, it's a dud.

    Meanwhile, Ext4 works if you just do mkfs.ext4 ... so does NTFS, so does FAT, as do ext2, ext3 and XFS. All of these require you to click a button or issue 1 command and be done with it, and they work. Just work.

    No experience needed, no tuning needed, no special hardware needed, no "11 magic herbs and spices" required. Nada.

  • @PulsedMedia said: conditions conditions and more conditions.

    All I'm hearing is rants, rants and more rants.

    Nothing concrete in terms of here's what we did (or tried to do) - in fact I don't think I've ever seen ANY post from you on any attempts related to ZFS. Without so much as sharing something about your experiences (not just we tried and it crashed, but with some detail on WHAT you tried etc.) how do you expect your posts to either be convincing to others or even get listened to/read?

    There are clearly MANY MANY satisfied ZFS users out there (myself included) who have "cracked" ZFS enough to trust it with our data. Granted, it may not be invaluable Linux ISOs or readily available whatever-else that is in abundant supply in the torrentiverse. Yet here we are, continuing to use ZFS, defending it even - so clearly we are happy with it and seem to have got our act together to use it for daily workhorses and even production systems. It works well for us and we're quite happy with it (and even consider some of its features non-negotiable).

    Unlike, say, a bug in a VM where you just restart with "transient" losses, data isn't something that's easy to lose+rebuild.

    In your experience ZFS isn't cutting it - and it's perfectly reasonable (in your world view) to go off and use mdadm+ext4. I believe that there's much more to be gained by using ZFS, but maybe it's not all that important for your primary use case of seedboxes (with RAID-0 and the clear expectation that all is lost on drive failure).

    Don't blame the tool when it doesn't work for you (and when it does work for others). It's not your cup of tea and I'll leave it at that (and this also seems to apply to IPv6, which you loathe for whatever reason).

    Thanked by 2 iKeyZ, fluffernutter
  • edited January 26

    @nullnothere Genuinely interested: what would i gain from using ZFS? The only thing that seems semi-worthwhile to me is compression, and even that is pretty situational. Snapshots seem pointless to me (and the idea of having to take some kind of shadow data into account when assessing the amount of actual storage space i'm using doesn't feel good either), since like @PulsedMedia says they won't replace backups anyway, and if i want to keep historical versions of my data i'll just use a VCS. ZFS arrays also seem to have no clear advantage over RAID to me, and all of this comes with a lot of added complexity and requirements. What am i missing here? I'm not saying it's necessarily bad or anything, i just fail to see the point.

    Thanked by 1 nullnothere
  • @totally_not_banned said: What would i gain from using ZFS?

    That totally depends on your use case. For you (from what you've said), it doesn't look like you care much about snapshots (or it's easy enough for you to just restore a backup or go back in time to a VCS). Likewise, maybe compression isn't useful for you.

    As you start to go up in terms of the number of people using a file server of sorts, along with the volume of data (changes) and the recovery expectations (RTO+RPO), ZFS starts to become more and more useful. Just knowing if there's corruption, getting it auto-fixed and having that comfort that there's no strange issue with a set of file(s) somewhere is a huge sleep enabler. Compression (now there's zstd, with early abort), dynamic quotas (no more LVM-esque grow volumes), clones that you can promote into first-class datasets (etc.), persistent L2ARC ... there are obviously a lot more.
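    (A minimal sketch of a couple of those, on a placeholder dataset:)

    zfs set compression=zstd tank/projects   # transparent zstd compression on an existing dataset
    zfs set quota=500G tank/projects         # grow/shrink the cap without any volume-resize dance
    zfs get compressratio tank/projects      # see what the compression is actually saving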

    Like Pavin (@mailcheap) said earlier - dealing with a large collection of files is quite painful and ZFS really excels in terms of being able to snapshot a system to another without really trying to find out which file has changed (ala rsync). I'm a huge user of rsync but when you have millions of files nested at different levels, even traversing the filesystem to figure out WHAT has changed takes too much time. Most people don't hit rsync limits (I do) and have to work around with parallel rsync invocations and breaking things down into nested folders just to get things to work (i.e. complete in reasonable time or even know that it's working, and to be fair, even rsync has improved a lot over the years with much better memory utilization, hash algorithms for fast change detection etc.).

    Of course one can argue that it was not designed properly from the ground up (etc.) and yes - hindsight is 20/20 (or 6/6). Often you don't get the luxury of simply throwing away existing stuff because of limitations.

    My only suggestion is to give it a shot and see how it simplifies your life (or if it isn't worth the pain). In my case it has tremendously helped (and I'm not a ZFS zealot by any standard - I use a healthy mix of mdadm+lvm+ext4, mdadm+lvm+zfs, plain-old-zfs, luks+zfs and other weird combinations as well).

    What's also really nice about zfs is that you can actually get a lot of solid playtime by using files as devices and testing things for yourself.
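    (A minimal sketch of that kind of throwaway playground, using sparse files as vdevs:)

    truncate -s 1G /tmp/zd1 /tmp/zd2
    zpool create playground mirror /tmp/zd1 /tmp/zd2
    zpool status playground
    # ...experiment, pull a "disk", scrub, etc., then:
    zpool destroy playground && rm /tmp/zd1 /tmp/zd2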

    If you live in file-system land long enough, you'll come face to face with various limitations (of any FS) and you'll pick your poison accordingly - nothing is perfect of course so continue to use whatever backups and 5-4-3-2-1 or whatever else makes you sleep at night.

    I really encourage you to try it out (just scratch the itch/surface even) and see what you find - I'm positive you'll change your mind even if you don't use it.

  • angstrom Moderator

    @PulsedMedia said:

    @0xOkami said:

    @angstrom said: Again, unless you have evidence that @yoursunny acts on behalf of the CCP, please refrain from further cheap CCP rhetoric/accusations

    Not fair to see you saying that; he actively does this to people. And I am not even defending @PulsedMedia here -- I don't even have active services with them at this time, nor did I have contact with the provider before, just to head off the "yeah he is prolly a customer bro" replies.

    @PulsedMedia said: The post was about a classroom where someone made a post online which they should not have (according to the CCP), and that they had to arrest the whole classroom instead of a single individual because they could not track the person behind NAT.

    I can agree on this, the above is true: he has made multiple statements. Please see his old posts and the threads where he replied and talked to people. Some posts have been changed tho.

    @yoursunny is definitely very annoying and doesn't add anything valuable to this site.

    LET is becoming a shithole, as told by multiple people.

    Okami

    Nope, it's only OK if sunniest of the sunnies does the harassment etc. ;)
    He is protected ;)

    I said this earlier, but again, if you feel that you're being harassed by someone, document your case and let me/us know, and I'll/we'll review it

    I would ask you to stop the cheap CCP rhetoric/accusations no matter who the targeted user was. There's no special treatment for @yoursunny here

  • edited January 26

    @nullnothere said:
    Just knowing if there's corruption, getting it auto-fixed and having that comfort that there's no strange issue with a set of file(s) somewhere

    That's actually a very valid gain in my opinion.

    Like Pavin (@mailcheap) said earlier - dealing with a large collection of files is quite painful and ZFS really excels in terms of being able to snapshot a system to another without really trying to find out which file has changed (ala rsync). I'm a huge user of rsync but when you have millions of files nested at different levels, even traversing the filesystem to figure out WHAT has changed takes too much time.

    Makes sense. Personally i don't tend to run into that though.

    Of course one can argue that it was not designed properly from the ground up (etc.) and yes - hindsight is 20/20 (or 6/6). Often you don't get the luxury of simply throwing away existing stuff because of limitations.

    To be perfectly honest, i seriously don't know or care, as for evaluating that to make sense i'd need to find a use first. I mean, if i'm not going to use it, why spend a ton of time thinking about its design (obviously not checking a new array for sanity doesn't sound ideal, but as long as i can force the check manually it's really not much more than a different usage pattern). I still think it's problematic that due to its license it'll never be able to get first-class support on Linux though. Still, if that turned out to be the major concern there are always the BSDs (and what's left of Solaris, but oh well...), which at least partly offer way tighter integration and therefore more support.

    My only suggestion is to give it a shot and see how it simplifies your life (or if it isn't worth the pain).

    Actually that's highly unlikely. I'm kind of anticipating the conclusion here but the only major advantage i see for me personally would be the error correction functionality. That's something i'd like to have. I already had a run-in with Btrfs because i thought compression would be advantageous for my use case (it really wasn't all that much in the end) and the only part where i really noticed i was running a modern filesystem was when things broke and repair proved to be difficult.

    In my case it has tremendously helped (and I'm not a ZFS zealot by any standard - I use a healthy mix of mdadm+lvm+ext4, mdadm+lvm+zfs, plain-old-zfs, luks+zfs and other weird combinations as well).

    I'm extremely boring. It's almost always just mdadm+ext4 or mdadm+xfs (decrypt_keyctl renders lvm superfluous in a lot of my setups). I've had a few funky setups involving nbd in the past but those were really just experimental messing around.

    What's also really nice about zfs is that you can actually get a lot of solid playtime by using files as devices and testing things for yourself.

    Well, i might be missing something here but that sounds suspiciously like losetup functionality without losetup.

    If you live in file-system land long enough, you'll come face to face with various limitations (of any FS) and you'll pick your poison accordingly - nothing is perfect of course so continue to use whatever backups and 5-4-3-2-1 or whatever else makes you sleep at night.

    Obviously. I'm probably not the person to push filesystems towards the edge in any shape or form.

    I really encourage you to try it out (just scratch the itch/surface even) and see what you find - I'm positive you'll change your mind even if you don't use it.

    Well, like i've already said above, it's highly unlikely. I'd really like to have the error correction, but given that i don't run into situations where i'd have to back up massive filesystem structures, that's seemingly the only thing i'd gain, and i don't think i could justify the overhead for just this. Thanks for the insights though. Those are very much appreciated :)
