Filesystem Data Integrity: Checksums, Scrubbing, and Silent Corruption Detection
Understand how modern filesystems protect against data corruption with checksums, scrubbing, and error correction. Explore integrity mechanisms through interactive visualizations.
The Silent Data Corruption Problem
Traditional filesystems (ext4, XFS, FAT) have a critical flaw: they trust the storage layer. If a disk returns corrupted data, the filesystem serves it—no questions asked.
Sources of corruption:
- Bit rot: Cosmic rays, magnetic decay, aging
- Buggy firmware: RAID controller errors, SSD bugs
- Silent failures: Disk returns wrong data (no error reported)
- Memory errors: Corrupted during transfer (no ECC RAM)
- Misdirected writes: Wrong block written (cache/firmware bugs)
The problem: Traditional filesystems detect corruption only during reads—and often not even then.
Modern Integrity Solutions
Checksum-based filesystems (ZFS, Btrfs, ReFS) solve this with:
- End-to-End Checksums: Verify data from disk to application
- Self-Healing: Automatic corruption repair (with redundancy)
- Scrubbing: Proactive corruption detection
- Metadata Protection: Checksums for all metadata too
How Data Integrity Works: Interactive Exploration
See checksum verification, corruption detection, and self-healing in action:
Interactive Data Integrity Demo
Checksum Detection: Finding Silent Corruption
Step 1: Initial Write (Checksum Computed)
What's happening:
- Application writes PDF data (128KB)
- Filesystem computes a checksum: sha256(data) = abc123def456
- Data is written to Block 5280
- The checksum is stored in the PARENT metadata (not alongside the data)
- Storing the checksum separately means a corrupted block can't also corrupt its own checksum (a userspace analogue is sketched below)
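The same write-then-record-a-checksum flow can be mimicked in userspace to build intuition. This is only an analogy using hypothetical paths and filenames; in ZFS/Btrfs the checksumming happens inside the filesystem, per block, not per file:

# Write the data, then store its checksum somewhere separate from the data
cp report.pdf /data/report.pdf                              # hypothetical file and path
sha256sum /data/report.pdf > /metadata/report.pdf.sha256    # checksum kept apart from the data

# Later, verification recomputes the checksum and compares it to the stored one
sha256sum -c /metadata/report.pdf.sha256                    # prints "OK" only if the data still matches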
Checksum Mechanisms
ZFS Checksums
Every block checksummed:
Data Block: [file data, 128KB]
Checksum:   sha256(data)
Location:   stored in parent metadata (NOT with the data)
Why parent storage?
- Corruption can't affect its own checksum
- Read path: Fetch metadata (checksum) → Fetch data → Verify
- Mismatch = Corruption detected
Checksum algorithms:
- fletcher2: Fast, weak (legacy)
- fletcher4: Fast, good (default)
- sha256: Strong, slower (critical data)
- sha512: Strongest, slowest
Configure per dataset:
zfs set checksum=sha256 tank/important
zfs set checksum=fletcher4 tank/bulk      # default
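To confirm which algorithm a dataset is actually using (and whether it was set locally or inherited):

zfs get checksum tank/important
# the VALUE column shows the algorithm; SOURCE shows local vs. inherited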
Btrfs Checksums
Data and metadata checksummed:
Checksum:     crc32c (default)
Alternatives: xxhash, sha256, blake2b
Location:     data checksums live in a dedicated checksum tree; each metadata block carries its own checksum in its header
Configurable at creation time:
# Set the checksum algorithm when creating the filesystem
mkfs.btrfs --checksum xxhash /dev/sda1
# The algorithm is chosen at mkfs time and applies to the whole filesystem;
# it cannot be changed per file afterwards.
Nodatasum option:
# Disable data checksums for specific files (faster, but no protection)
chattr +C /var/lib/mysql/data    # NOCOW also disables data checksums
# Note: +C only takes effect on empty files, or on a directory before files are created in it
ext4 Metadata Checksums
ext4 has limited checksums (metadata only):
# Enable metadata checksums at format time
mkfs.ext4 -O metadata_csum /dev/sda1

# Or convert an existing (unmounted) filesystem
tune2fs -O metadata_csum /dev/sda1    # requires e2fsck afterwards
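To check whether an existing ext4 filesystem already has the feature enabled (the device name is an example):

dumpe2fs -h /dev/sda1 | grep -i features
# look for "metadata_csum" in the feature list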
What's protected:
- Superblock
- Group descriptors
- Inode tables
- Directory entries
- Journal
What's NOT protected:
- File data (no data checksums!)
- Can't detect silent data corruption
Corruption Detection Flow
Read Path with Checksums
Traditional filesystem (ext4):
1. Application: read(file, offset)
2. Filesystem: look up block number
3. Disk: return block data
4. Filesystem: return data to application
❌ No verification: corrupt data is silently served
Checksum filesystem (ZFS/Btrfs):
1. Application: read(file, offset)
2. Filesystem: look up block pointer + checksum
3. Disk: return block data
4. Filesystem: compute checksum of the returned data
5. Compare computed vs. stored checksum
   ✅ Match → return data
   ❌ Mismatch → corruption detected!
6. If a redundant copy exists:
   - read the mirror/parity copy
   - verify its checksum
   - return the good copy
   - repair the corrupted copy
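The detect-on-read principle can be demonstrated in userspace. The sketch below flips a single byte in a scratch file and shows the resulting checksum mismatch; it illustrates the principle only, not ZFS/Btrfs internals (filenames are arbitrary):

# Create a test file and record its checksum
dd if=/dev/urandom of=testfile bs=1M count=1 status=none
sha256sum testfile > testfile.sha256

# Simulate silent corruption: overwrite one byte in place, no I/O error is reported
printf '\x00' | dd of=testfile bs=1 seek=4096 count=1 conv=notrunc status=none
# (assumes the original byte at that offset wasn't already 0x00)

# Verification catches the change
sha256sum -c testfile.sha256    # reports: testfile: FAILED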
Write Path with Checksums
ZFS/Btrfs write:
1. Application: write(data)
2. Filesystem: compute checksum(data)
3. Write data to a new location (CoW)
4. Update parent metadata with:
   - pointer to the new data block
   - checksum value
5. Commit the transaction (atomic)
Integrity guarantee:
- Checksum stored BEFORE data is referenced
- Corruption during write detected on next read
- Old data preserved (CoW) until commit
Self-Healing
Requirements for Self-Healing
Need redundancy:
- RAID-1/10: Mirror copies
- RAID-5/6: Parity reconstruction
- ZFS RAID-Z: Parity with checksums
- Btrfs RAID: Mirror or RAID-5/6
Self-healing flow:
1. Read block from disk 1
2. Checksum mismatch → corruption!
3. Try the mirror copy (disk 2)
4. Checksum matches → good copy found
5. Repair the corrupted block:
   - write the good data back to disk 1
   - verify the checksum
6. Log the repair: "Corrected 1 block"
ZFS Self-Healing
Automatic on every read:
# Read a file: healing is automatic if a block is corrupted
cat /tank/data/file.txt
# ZFS detects the corruption and repairs it from the mirror/parity copy

# Check healing stats
zpool status -v tank
# shows checksum errors detected and blocks repaired
Scrub for proactive healing:
# Read and verify EVERYTHING in the pool
zpool scrub tank

# Monitor progress
zpool status tank
# scan: scrub in progress, 45% done
Btrfs Self-Healing
Automatic on read (with RAID):
# Reading a corrupted file
cat /mnt/btrfs/file
# Btrfs detects the corruption and repairs it from the mirror

# Check error counters
btrfs device stats /mnt
# shows corruption/repair counts per device
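Once the underlying cause has been addressed, the per-device counters can be printed and reset so that any new errors stand out:

btrfs device stats -z /mnt    # print the counters, then reset them to zero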
Scrub for proactive healing:
# Scrub all data and metadata
btrfs scrub start /mnt

# Monitor progress
btrfs scrub status /mnt
# shows errors found and corrected
Scrubbing: Proactive Verification
What Is Scrubbing?
Scrub = Read every block, verify checksums, repair corruption
Purpose:
- Find corruption before you need the data
- Detect bit rot early (before spreading)
- Verify RAID parity consistency
- Background integrity maintenance
ZFS Scrubbing
Manual scrub:
# Start a scrub
zpool scrub tank

# Check status
zpool status tank
# Output:
#   scan: scrub in progress since Sun Jan 12 12:00:00 2025
#   45.2G scanned at 1.5G/s, 12.1G to go
#   0 repaired, 78.9% done

# Stop the scrub (if needed)
zpool scrub -s tank
Automatic scrubbing (recommended):
# Weekly scrub via the OpenZFS systemd timer (shipped since OpenZFS 2.1.3)
systemctl enable --now zfs-scrub-weekly@tank.timer

# Or via cron
0 2 * * 0 zpool scrub tank    # every Sunday at 2 AM
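To confirm the scrub is actually scheduled:

systemctl list-timers | grep -i scrub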
Scrub results:
zpool status -v tank
# Shows either:
#   errors: No known data errors                          ✅
# or:
#   errors: Permanent errors have been detected in:
#           /tank/important/file.txt                      ❌
#   (a corrupted block with no redundancy left to repair it)
Btrfs Scrubbing
Manual scrub:
# Start a scrub
btrfs scrub start /mnt

# Monitor
btrfs scrub status /mnt
# Output:
#   Scrub started: Sun Jan 12 12:00:00 2025
#   Status: running
#   Total to scan: 100GB
#   ...

# Detailed per-device stats
btrfs scrub status -d /mnt
Automatic scrubbing:
# Monthly scrub via systemd (timer unit shipped with some distros' btrfs-progs;
# the instance name is the systemd-escaped mount path)
systemctl enable --now btrfs-scrub@mnt.timer

# Or via cron
0 3 1 * * btrfs scrub start /mnt    # monthly
Scrub results:
btrfs scrub status -d /mnt
# Per-device output:
#   Data extents scrubbed: 12345
#   Checksum errors:       10
#   Corrected errors:      10    ✅
#   Uncorrectable errors:  0
Corruption Types and Detection
Detectable Corruption
With checksums (ZFS/Btrfs):
- ✅ Bit flips in data
- ✅ Bit flips in metadata
- ✅ Misdirected writes (block written to wrong location)
- ✅ Torn writes (partial block write)
- ✅ Firmware bugs returning wrong data
- ✅ Memory corruption during transfer
Undetectable Corruption
Even with checksums:
- ❌ Corruption during write (before the checksum is computed)
  - Mitigation: ECC RAM
- ❌ Application writes wrong data
  - Mitigation: application-level checksums
- ❌ Encryption key corruption
  - Mitigation: key backup and verification
Unrecoverable Corruption
Corruption detected but can't repair:
1. Read block: checksum mismatch
2. Try redundant copy: also corrupted (or doesn't exist)
3. Try parity reconstruction: parity also corrupted
4. Result: permanent data loss
ZFS response:
zpool status -v tank
# errors: Permanent errors have been detected in the following files:
#         /tank/data/important.txt
Btrfs response:
# Reads of the affected range return an I/O error
# dmesg reports a checksum error with no good copy available
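After restoring the damaged files from backup, the pool's error log can be cleared and the repair verified with a fresh scrub (ZFS shown; the pool name is an example):

# Restore the affected files from backup, then:
zpool clear tank     # reset the error counters
zpool scrub tank     # re-verify the whole pool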
Checksum Overhead
Performance Impact
Read path:
- Checksum verification: ~1-5% CPU overhead
- Modern CPUs (with AES-NI, SSE4.2): Negligible
- Usually bottlenecked by disk, not checksum
Write path:
- Checksum computation: ~2-10% CPU overhead
- Depends on algorithm (fletcher4 < sha256)
- Often hidden by disk latency
Benchmarks:
No checksum (ext4):  1000 MB/s read
ZFS (fletcher4):      980 MB/s read  (-2%)
ZFS (sha256):         920 MB/s read  (-8%)
Btrfs (crc32c):       990 MB/s read  (-1%)

Bottleneck: usually disk speed, not the checksum
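Figures like these vary widely with hardware, record size, and caching. To measure on your own system, a simple sequential-read test with fio (assuming fio is installed and /tank/testfile is scratch space you can overwrite) looks like:

fio --name=seqread --rw=read --bs=1M --size=4G --filename=/tank/testfile
# fio lays out the file on the first run; repeat the run for steady-state read numbers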
Space Overhead
Checksum storage:
- ZFS: 1/1024 blocks (~0.1% overhead)
- Btrfs: Stored in metadata (~0.5% overhead)
- ext4 metadata_csum: less than 1% for metadata only
Negligible space cost for significant protection
Comparison: Integrity Features
| Filesystem | Data Checksums | Metadata Checksums | Self-Healing | Scrubbing |
|---|---|---|---|---|
| ext4 | ❌ No | ✅ Optional | ❌ No | ❌ No |
| XFS | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Btrfs | ✅ Yes | ✅ Yes | ✅ With RAID | ✅ Yes |
| ZFS | ✅ Yes | ✅ Yes | ✅ With RAID | ✅ Yes |
| ReFS | ✅ Optional (integrity streams) | ✅ Yes | ✅ With Storage Spaces | ✅ Yes |
| NTFS | ❌ No | ❌ No | ❌ No | ❌ No |
| APFS | ❌ No | ✅ Yes | ❌ No | ❌ No |
Integrity leaders: ZFS and Btrfs (and ReFS on Windows). APFS checksums metadata only, not file data.
Best Practices
1. Use Checksums
For critical data:
# ZFS: use strong checksums for critical datasets
zfs set checksum=sha256 tank/important

# Btrfs: checksums are enabled by default
mkfs.btrfs /dev/sda1    # crc32c enabled
2. Enable Redundancy
Checksums detect corruption, redundancy repairs it:
# ZFS: RAID-Z or mirror
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc

# Btrfs: RAID1 or RAID10
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
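To confirm the redundancy profile actually in use (device names and mount points above are examples):

zpool status tank           # shows the raidz/mirror vdev layout
btrfs filesystem df /mnt    # shows the Data/Metadata profiles (e.g. RAID1)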
3. Regular Scrubbing
Monthly for normal use, weekly for critical:
# ZFS monthly scrub (OpenZFS 2.1.3+ systemd timer)
systemctl enable --now zfs-scrub-monthly@tank.timer

# Btrfs weekly scrub via cron
0 3 * * 0 btrfs scrub start /mnt
4. Monitor Errors
Check for corruption regularly:
# ZFS
zpool status -v tank | grep -i error

# Btrfs
btrfs device stats /mnt
5. Use ECC RAM
Protect in-memory data:
- Checksums protect on-disk data
- ECC RAM protects in-memory data
- Recommended for ZFS/Btrfs servers (see the check below)
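On Linux you can check whether ECC memory is present (requires root; exact wording varies by vendor):

dmidecode -t memory | grep -i 'error correction'
# "Multi-bit ECC" (or similar) indicates ECC; "None" means non-ECC RAM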
6. Test Restores
Verify backups can detect corruption:
# Scrub before backing up
zpool scrub tank
# wait for completion, then run the backup
# (ensures the backup does not silently capture corrupted data)
Limitations
What Checksums Don't Protect
- Application-level corruption: the application writes wrong data
  - Solution: application checksums (e.g., database page checksums)
- Corruption during write: data corrupted in memory before the checksum is computed
  - Solution: ECC RAM
- No redundancy: corruption can be detected but not repaired
  - Solution: RAID or replication
- Complete disk failure: all copies lost
  - Solution: offsite backups
Performance Considerations
When to disable checksums:
- Never for metadata (always checksum metadata)
- Rarely for data (only if proven bottleneck)
- Databases: may already provide their own page checksums, or rely on dm-integrity at the block layer
Disable data checksums (Btrfs):
# Per-file (also disables CoW)
chattr +C /var/lib/mysql/data

# Or as a mount option (entire filesystem)
mount -o nodatasum /dev/sda1 /mnt
Advanced: DM-Integrity (ext4/XFS)
Device-mapper integrity for non-checksum filesystems:
# Create the integrity device
integritysetup format /dev/sda1
integritysetup open /dev/sda1 integrity-dev

# Format with ext4
mkfs.ext4 /dev/mapper/integrity-dev

# Mount
mount /dev/mapper/integrity-dev /mnt
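To inspect the resulting mapping and its integrity parameters (subcommand available in recent cryptsetup/integritysetup releases):

integritysetup status integrity-dev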
Provides:
- Block-level checksums (below filesystem)
- Works with any filesystem
- Performance: 10-30% overhead
- See: man integritysetup
Related Concepts
- Copy-on-Write: Enables atomic checksum updates
- Snapshots: Immutable copies for data protection
- ZFS: End-to-end checksums and self-healing
- Btrfs: Checksums and scrubbing
- RAID: Redundancy for self-healing
Key Takeaways
- Silent Corruption: Traditional filesystems serve corrupted data unknowingly
- Checksums: Detect corruption at read time (ZFS, Btrfs, APFS)
- Self-Healing: Automatic repair with redundancy (RAID)
- Scrubbing: Proactive verification (find corruption early)
- Overhead: Minimal (~1-5% CPU, less than 1% space)
- Best Practice: Checksums + Redundancy + Scrubbing + ECC RAM
- Limitations: Can't fix corruption without redundancy
- ext4/XFS: Use DM-Integrity for block-level checksums