Critical Data Backup - Page 3

Comments

  • jsg Member, Resident Benchmarker

    @darkimmortal said:
    Without RAID, when you hit a URE and if you notice it, you are left with a filesystem-specific manual faff to identify the file affected and restore it from backup

    (a) URE != corrupted data
    (b) If you back up files from a RAIDed disk, the backup will contain the (possibly "repaired" by RAID) data.
    (c) If the problem isn't a URE but corrupted data, RAID will not magically repair it.
    Note that 'corrupted' isn't limited to e.g. bit flips but can - and often does - result from an application writing out corrupted data.

    The whole thing gets much more complicated when you look closer. Example: drive 1 of a RAID 1 has some bits flipped, drive 2 has some bits flipped too, and neither (or both) report a CRC error.
    In short, checksums (very significantly) increase the chance of detecting errors - but they aren't a guarantee, nor do they address all possible situations, and they usually cannot correct data.
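
A rough sketch of the RAID 1 example above, assuming a CRC32 was recorded for the block at write time. `flip_bit` and `classify_mirror` are made-up helper names for illustration, not part of any RAID tool. It shows the checksum detecting a bad copy, and that all it can do is pick a surviving good copy, not rebuild one:

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def classify_mirror(copy_a: bytes, copy_b: bytes, stored_crc: int) -> str:
    """Compare both RAID 1 copies against the checksum recorded at write time."""
    a_ok = zlib.crc32(copy_a) == stored_crc
    b_ok = zlib.crc32(copy_b) == stored_crc
    if a_ok and b_ok:
        # Only means "no error detected": a CRC collision or logically
        # wrong data would still pass here.
        return "both copies match the checksum"
    if a_ok:
        return "copy B is bad, copy A can be used"
    if b_ok:
        return "copy A is bad, copy B can be used"
    # The checksum detects the damage but cannot repair either copy.
    return "both copies are bad, restore from backup"


original = b"critical data block"
crc = zlib.crc32(original)

# One mirror has a flipped bit: detected, and the other copy is still good.
print(classify_mirror(original, flip_bit(original, 3), crc))
# → copy B is bad, copy A can be used

# Both mirrors have (different) flipped bits: detected, but not fixable.
print(classify_mirror(flip_bit(original, 3), flip_bit(original, 40), crc))
# → both copies are bad, restore from backup
```

Without the stored checksum, the second case is just two copies that disagree, with no way to tell which one (if any) is right.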

    Erasure codes, on the other hand, can correct corrupted data (e.g. bit flips).
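
To make "can correct" concrete, here is a toy sketch using Hamming(7,4). This is a stand-in chosen because it fits in a few lines: it is a small error-correcting code, not the kind of code storage systems actually use (those are typically Reed-Solomon style and work on whole blocks or disks). The function names are just for illustration. It stores 4 data bits in 7 bits and can locate and repair any single flipped bit:

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code_bits):
    """Return (data_bits, syndrome); syndrome is the 1-based position fixed, or 0."""
    b = list(code_bits)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s4 = b[3] ^ b[4] ^ b[5] ^ b[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        # The syndrome is the position of the flipped bit, so flip it back.
        b[syndrome - 1] ^= 1
    return [b[2], b[4], b[5], b[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a single bit flip on the device (position 5)
print(hamming74_decode(stored))
# → ([1, 0, 1, 1], 5)
```

Two flipped bits in the same 7-bit word would be "corrected" to the wrong data, so this too only covers the error patterns the code was designed for.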

    One must also differentiate, most importantly between data that was written correctly but got corrupted afterwards (e.g. bit flips on the device) and "logically corrupt" data, i.e. data that is stored faithfully (and is often CRC-correct) but is corrupt at a higher level. Example: due to some internal problem an app writes out hex data with some digits between 'g' and 'z', and that wrong data is written out and stored correctly.
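
The hex example can be sketched like this, assuming the payload is supposed to be a plain ASCII hex string. `storage_intact` and `is_valid_hex` are hypothetical names. The point is that the storage-level check passes because the bytes on disk are exactly what the app wrote, and only a check that knows what the data should look like catches the problem:

```python
import string
import zlib

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F


def storage_intact(payload: bytes, stored_crc: int) -> bool:
    """Storage-level check: are the bytes the same as when they were written?"""
    return zlib.crc32(payload) == stored_crc


def is_valid_hex(payload: bytes) -> bool:
    """Domain-level check: does the payload contain only hex digits?"""
    text = payload.decode("ascii", errors="replace")
    return len(text) > 0 and all(ch in HEX_DIGITS for ch in text)


# The buggy app emits 'z' and 'q'; the checksum is computed over that output.
buggy_output = b"3fa9zq01"
crc_at_write = zlib.crc32(buggy_output)

print(storage_intact(buggy_output, crc_at_write))  # → True
print(is_valid_hex(buggy_output))                  # → False
```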

  • darkimmortal Member
    edited December 2021

    @jsg said:

    @darkimmortal said:
    Without RAID, when you hit a URE and if you notice it, you are left with a filesystem-specific manual faff to identify the file affected and restore it from backup

    (a) URE != corrupted data
    (b) If you back up files from a RAIDed disk, the backup will contain the (possibly "repaired" by RAID) data.
    (c) If the problem isn't a URE but corrupted data, RAID will not magically repair it.
    Note that 'corrupted' isn't limited to e.g. bit flips but can - and often does - result from an application writing out corrupted data.

    The whole thing gets much more complicated when you look closer. Example: drive 1 of a RAID 1 has some bits flipped, drive 2 has some bits flipped too, and neither (or both) report a CRC error.
    In short, checksums (very significantly) increase the chance of detecting errors - but they aren't a guarantee, nor do they address all possible situations, and they usually cannot correct data.

    Erasure codes, on the other hand, can correct corrupted data (e.g. bit flips).

    One must also differentiate, most importantly between data that was written correctly but got corrupted afterwards (e.g. bit flips on the device) and "logically corrupt" data, i.e. data that is stored faithfully (and is often CRC-correct) but is corrupt at a higher level. Example: due to some internal problem an app writes out hex data with some digits between 'g' and 'z', and that wrong data is written out and stored correctly.

    Right, nothing other than domain-specific checks can detect logically corrupt data such as that from software bugs. On server-grade hardware there should be nothing else in between that and UREs. So it is an argument about semantics: when I say corrupt data I mean the only type of corruption that one could expect to run into and be able to detect/fix, i.e. UREs (due to a transient disk issue at write time or bitrot in place).
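
For the no-RAID case, the "identify the file affected and restore it from backup" step can be approximated at the file level. This is only a sketch, assuming a SHA-256 manifest was taken while the data was known to be good. `build_manifest` and `files_to_restore` are hypothetical names, not from any backup tool:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def files_to_restore(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are unreadable, missing, or no longer match the manifest."""
    damaged = []
    for rel_path, expected in manifest.items():
        try:
            actual = sha256_of(root / rel_path)
        except OSError:
            # A read error (which is how a URE normally surfaces to the
            # application) or a missing file both end up here.
            actual = None
        if actual != expected:
            damaged.append(rel_path)
    return damaged
```

Run `build_manifest` right after a known-good backup and `files_to_restore` on a schedule. A file that was legitimately modified since the manifest was taken will also show up, so the manifest has to be refreshed along with the backup.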
