ZFS Data Corruption – What you need to know for TrueNAS, UnRAID, Proxmox and Ubuntu


“Digital Spaceport”

Impacted by the recent ZFS data corruption bug? You may be affected and not even know it! Find out what is happening, who is impacted, and what you can do. Stay till the end to see my plan of action.



10 Comments

  1. Great discussion, but man this is not good right as I'm planning out the move from ext4-based md arrays to zfs and kvm to Proxmox. I feel like I cannot trust Proxmox at all now… Also, those older WD enterprise drives are incredible, I still have a pair of 2TBs here that have to be around the 10 year+ mark as well.

  2. Your description at around 3:30 is incorrect in a very important way: corruption is not introduced by a normal read. Rather, a hole may be reported where there is none, inside the very tight window for this race condition. If your workload does not check for holes (and most do not, for various reasons, or did not prior to the coreutils update on Linux), you are entirely unaffected; see the sketch after the comments for what such a hole check looks like. Unmodified data is also entirely unaffected, as the issue is data corruption on *reads*, as sent to higher layers, and no internal ZFS operations would involve this bug. VMs are not very likely to be affected, outside of some byzantine scenario with a CoW filesystem running in the VM, backed by a file instead of a zvol (zvol operations would be unaffected by virtue of not dealing with holes).
    In practical terms, backups are very likely to be safe as you would not be backing up data that was just written, over and over again in a tight loop, while checking for holes.
    The unfortunate other side of the coin is that there is no simple way (nor can there be) of detecting files affected by this issue without a ton of false positives (namely all files that are supposed to be full of strings of zeros for whatever reason).

    The best news overall is that it's clear that the baseline error rate is minimal, as the issue went undetected for literal decades until the right combination of factors led to it being easily reproduced and thus investigated.

    The worst news is that suddenly a lot of people are going to look at this and think they were affected, even though they were not and just had a different problem (bad disks, misbehaving applications, whatever).

  3. Well F…..I had to rebuild my Proxmox cluster last week, and I have some mission-critical VMs running that I'm afraid may have corruption. I do have backups, and they are on my Synology, but I really don't even know where to begin checking whether all my VMs are OK.

  4. I was late migrating from the old ext4 storage system to the new one on ZFS (the electrical installation in the new server room was delayed). I think my guardian angel helped me avoid this bug 🙂 but it is madness, and terrifying.

  5. I've been using ZFS on my NAS since 2019 and I've never had any corrupted files.
    Nor have I had any data corruption on any ZFS pools on any systems for that matter.
    Other than the block copy bug on Linux, I've never seen any other instances of such a ZFS bug. 🤔

  6. I think it’s impacted me. I recently (2-3 weeks ago) upgraded from TrueNAS Core 12.x (don’t recall exactly which) to 13.0-U6, and within days both of my pools went ‘unhealthy’. I initially thought I had a bad disk or something, but no, there was no hardware issue that I could detect. I scrubbed the pools, and although the scrubs complete successfully and flag ‘errors’, they did not fix any of those errors. Running zpool status -v mypool lists the files that have ‘errors’, but further scrubbing does not fix them and indeed seems to introduce further errors. I had deleted plot files thinking they were just bad, even though a plot check said they were good. Have I been impacted? When did this issue first get embedded in the ZFS code? Did upgrading from TrueNAS Core 12.x to 13.0-U6 actually introduce this problem for me?
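
To make the point in comment 2 about "checking for holes" concrete, here is a minimal sketch of such a check. It is an illustration only, assuming Python on a platform that supports SEEK_HOLE; the helper name first_hole_offset is made up for this sketch, not something from the video or the comments. Sparse-aware copy tools (such as recent GNU cp) rely on this same lseek(SEEK_HOLE) probe, and the race described above could make it report a hole over data that had just been written.

```python
import os
import sys

def first_hole_offset(path):
    """Return the offset of the first hole the filesystem reports in
    `path`, or None if the file is reported as fully dense.

    This is the same SEEK_HOLE probe that sparse-aware copy tools use;
    in the bug's tight race window that probe could report a hole over
    freshly written data."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        try:
            off = os.lseek(fd, 0, os.SEEK_HOLE)
        except OSError:
            return None  # SEEK_HOLE not supported on this filesystem/OS
        # Every file has an implicit hole at end-of-file, so only an
        # offset strictly before EOF indicates an actual sparse region.
        return off if off < size else None
    finally:
        os.close(fd)

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(p, first_hole_offset(p))
```

A reported hole does not by itself indicate corruption; genuinely sparse files are common, which is exactly why, as comment 2 notes, there is no reliable after-the-fact scan for affected files.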
