Filesystem Data Integrity: Checksums, Scrubbing, and Silent Corruption Detection
Understand how modern filesystems protect against data corruption with checksums, scrubbing, and error correction. Explore integrity mechanisms through interactive visualizations.
The Silent Data Corruption Problem
Traditional filesystems (ext4, XFS, FAT) have a critical flaw: they trust the storage layer. If a disk returns corrupted data, the filesystem serves it—no questions asked.
Sources of corruption:
- Bit rot: Cosmic rays, magnetic decay, aging
- Buggy firmware: RAID controller errors, SSD bugs
- Silent failures: Disk returns wrong data (no error reported)
- Memory errors: Corrupted during transfer (no ECC RAM)
- Misdirected writes: Wrong block written (cache/firmware bugs)
The problem: Traditional filesystems detect corruption only during reads—and often not even then.
Modern Integrity Solutions
Checksum-based filesystems (ZFS, Btrfs, ReFS) solve this with:
- End-to-End Checksums: Verify data from disk to application
- Self-Healing: Automatic corruption repair (with redundancy)
- Scrubbing: Proactive corruption detection
- Metadata Protection: Checksums for all metadata too
How Data Integrity Works: Interactive Exploration
See checksum verification, corruption detection, and self-healing in action:
Interactive Data Integrity Demo
Checksum Detection: Finding Silent Corruption
Step 1: Initial Write (Checksum Computed)
What's happening:
- Application writes PDF data (128KB)
- Filesystem computes a checksum: sha256(data) = abc123def456
- Data is written to Block 5280
- The checksum is stored in the PARENT metadata (not alongside the data)
- Storing the checksum separately means a corrupted block can't also corrupt its own checksum (a userspace analogue is sketched below)
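The same write-then-record-a-checksum flow can be mimicked in userspace to build intuition. This is only an analogy using hypothetical paths and filenames; in ZFS/Btrfs the checksumming happens inside the filesystem, per block, not per file:

# Write the data, then store its checksum somewhere separate from the data
cp report.pdf /data/report.pdf                              # hypothetical file and path
sha256sum /data/report.pdf > /metadata/report.pdf.sha256    # checksum kept apart from the data

# Later, verification recomputes the checksum and compares it to the stored one
sha256sum -c /metadata/report.pdf.sha256                    # prints "OK" only if the data still matches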
Checksum Mechanisms
ZFS Checksums
Every block checksummed:
Data Block: [file data, 128KB]
Checksum:   sha256(data)
Location:   stored in parent metadata (NOT with the data)
Why parent storage?
- Corruption can't affect its own checksum
- Read path: Fetch metadata (checksum) → Fetch data → Verify
- Mismatch = Corruption detected
Checksum algorithms:
- fletcher2: Fast, weak (legacy)
- fletcher4: Fast, good (default)
- sha256: Strong, slower (critical data)
- sha512: Strongest, slowest
Configure per dataset:
zfs set checksum=sha256 tank/important
zfs set checksum=fletcher4 tank/bulk      # default
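To confirm which algorithm a dataset is actually using (and whether it was set locally or inherited):

zfs get checksum tank/important
# the VALUE column shows the algorithm; SOURCE shows local vs. inherited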
Btrfs Checksums
Data and metadata checksummed:
Checksum:     crc32c (default)
Alternatives: xxhash, sha256, blake2b
Location:     data checksums live in a dedicated checksum tree; each metadata block carries its own checksum in its header
Configurable at creation time:
# Set the checksum algorithm when creating the filesystem
mkfs.btrfs --checksum xxhash /dev/sda1
# The algorithm is chosen at mkfs time and applies to the whole filesystem;
# it cannot be changed per file afterwards.
Nodatasum option:
# Disable data checksums for specific files (faster, but no protection)
chattr +C /var/lib/mysql/data    # NOCOW also disables data checksums
# Note: +C only takes effect on empty files, or on a directory before files are created in it
ext4 Metadata Checksums
ext4 has limited checksums (metadata only):
# Enable metadata checksums at format time
mkfs.ext4 -O metadata_csum /dev/sda1

# Or convert an existing (unmounted) filesystem
tune2fs -O metadata_csum /dev/sda1    # requires e2fsck afterwards
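To check whether an existing ext4 filesystem already has the feature enabled (the device name is an example):

dumpe2fs -h /dev/sda1 | grep -i features
# look for "metadata_csum" in the feature list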
What's protected:
- Superblock
- Group descriptors
- Inode tables
- Directory entries
- Journal
What's NOT protected:
- File data (no data checksums!)
- Can't detect silent data corruption
Corruption Detection Flow
Read Path with Checksums
Traditional filesystem (ext4):
1. Application: read(file, offset)
2. Filesystem: look up block number
3. Disk: return block data
4. Filesystem: return data to application
❌ No verification: corrupt data is silently served
Checksum filesystem (ZFS/Btrfs):
1. Application: read(file, offset)
2. Filesystem: look up block pointer + checksum
3. Disk: return block data
4. Filesystem: compute checksum of the returned data
5. Compare computed vs. stored checksum
   ✅ Match → return data
   ❌ Mismatch → corruption detected!
6. If a redundant copy exists:
   - read the mirror/parity copy
   - verify its checksum
   - return the good copy
   - repair the corrupted copy
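The detect-on-read principle can be demonstrated in userspace. The sketch below flips a single byte in a scratch file and shows the resulting checksum mismatch; it illustrates the principle only, not ZFS/Btrfs internals (filenames are arbitrary):

# Create a test file and record its checksum
dd if=/dev/urandom of=testfile bs=1M count=1 status=none
sha256sum testfile > testfile.sha256

# Simulate silent corruption: overwrite one byte in place, no I/O error is reported
printf '\x00' | dd of=testfile bs=1 seek=4096 count=1 conv=notrunc status=none
# (assumes the original byte at that offset wasn't already 0x00)

# Verification catches the change
sha256sum -c testfile.sha256    # reports: testfile: FAILED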
Write Path with Checksums
ZFS/Btrfs write:
1. Application: write(data)
2. Filesystem: compute checksum(data)
3. Write data to a new location (CoW)
4. Update parent metadata with:
   - pointer to the new data block
   - checksum value
5. Commit the transaction (atomic)
Integrity guarantee:
- Checksum stored BEFORE data is referenced
- Corruption during write detected on next read
- Old data preserved (CoW) until commit
Self-Healing
Requirements for Self-Healing
Need redundancy:
- RAID-1/10: Mirror copies
- RAID-5/6: Parity reconstruction
- ZFS RAID-Z: Parity with checksums
- Btrfs RAID: Mirror or RAID-5/6
Self-healing flow:
1. Read block from disk 1
2. Checksum mismatch → corruption!
3. Try the mirror copy (disk 2)
4. Checksum matches → good copy found
5. Repair the corrupted block:
   - write the good data back to disk 1
   - verify the checksum
6. Log the repair: "Corrected 1 block"
ZFS Self-Healing
Automatic on every read:
# Read a file: healing is automatic if a block is corrupted
cat /tank/data/file.txt
# ZFS detects the corruption and repairs it from the mirror/parity copy

# Check healing stats
zpool status -v tank
# shows checksum errors detected and blocks repaired
Scrub for proactive healing:
# Read and verify EVERYTHING in the pool
zpool scrub tank

# Monitor progress
zpool status tank
# scan: scrub in progress, 45% done
Btrfs Self-Healing
Automatic on read (with RAID):
# Reading a corrupted file
cat /mnt/btrfs/file
# Btrfs detects the corruption and repairs it from the mirror

# Check error counters
btrfs device stats /mnt
# shows corruption/repair counts per device
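Once the underlying cause has been addressed, the per-device counters can be printed and reset so that any new errors stand out:

btrfs device stats -z /mnt    # print the counters, then reset them to zero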
Scrub for proactive healing:
# Scrub all data and metadata
btrfs scrub start /mnt

# Monitor progress
btrfs scrub status /mnt
# shows errors found and corrected
Scrubbing: Proactive Verification
What Is Scrubbing?
Scrub = Read every block, verify checksums, repair corruption
Purpose:
- Find corruption before you need the data
- Detect bit rot early (before spreading)
- Verify RAID parity consistency
- Background integrity maintenance
ZFS Scrubbing
Manual scrub:
# Start a scrub
zpool scrub tank

# Check status
zpool status tank
# Output:
#   scan: scrub in progress since Sun Jan 12 12:00:00 2025
#   45.2G scanned at 1.5G/s, 12.1G to go
#   0 repaired, 78.9% done

# Stop the scrub (if needed)
zpool scrub -s tank
Automatic scrubbing (recommended):
# Weekly scrub via the OpenZFS systemd timer (shipped since OpenZFS 2.1.3)
systemctl enable --now zfs-scrub-weekly@tank.timer

# Or via cron
0 2 * * 0 zpool scrub tank    # every Sunday at 2 AM
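To confirm the scrub is actually scheduled:

systemctl list-timers | grep -i scrub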
Scrub results:
zpool status -v tank
# Shows either:
#   errors: No known data errors                          ✅
# or:
#   errors: Permanent errors have been detected in:
#           /tank/important/file.txt                      ❌
#   (a corrupted block with no redundancy left to repair it)
Btrfs Scrubbing
Manual scrub:
# Start a scrub
btrfs scrub start /mnt

# Monitor
btrfs scrub status /mnt
# Output:
#   Scrub started: Sun Jan 12 12:00:00 2025
#   Status: running
#   Total to scan: 100GB
#   ...

# Detailed per-device stats
btrfs scrub status -d /mnt
Automatic scrubbing:
# Monthly scrub via systemd (timer unit shipped with some distros' btrfs-progs;
# the instance name is the systemd-escaped mount path)
systemctl enable --now btrfs-scrub@mnt.timer

# Or via cron
0 3 1 * * btrfs scrub start /mnt    # monthly
Scrub results:
btrfs scrub status -d /mnt
# Per-device output:
#   Data extents scrubbed: 12345
#   Checksum errors:       10
#   Corrected errors:      10    ✅
#   Uncorrectable errors:  0
Corruption Types and Detection
Detectable Corruption
With checksums (ZFS/Btrfs):
- ✅ Bit flips in data
- ✅ Bit flips in metadata
- ✅ Misdirected writes (block written to wrong location)
- ✅ Torn writes (partial block write)
- ✅ Firmware bugs returning wrong data
- ✅ Memory corruption during transfer
Undetectable Corruption
Even with checksums:
- ❌ Corruption during write (before the checksum is computed)
  - Mitigation: ECC RAM
- ❌ Application writes wrong data
  - Mitigation: application-level checksums
- ❌ Encryption key corruption
  - Mitigation: key backup and verification
Unrecoverable Corruption
Corruption detected but can't repair:
1. Read block: checksum mismatch
2. Try redundant copy: also corrupted (or doesn't exist)
3. Try parity reconstruction: parity also corrupted
4. Result: permanent data loss
ZFS response:
zpool status -v tank
# errors: Permanent errors have been detected in the following files:
#         /tank/data/important.txt
Btrfs response:
# Reads of the affected range return an I/O error
# dmesg reports a checksum error with no good copy available
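After restoring the damaged files from backup, the pool's error log can be cleared and the repair verified with a fresh scrub (ZFS shown; the pool name is an example):

# Restore the affected files from backup, then:
zpool clear tank     # reset the error counters
zpool scrub tank     # re-verify the whole pool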
Checksum Overhead
Performance Impact
Read path:
- Checksum verification: ~1-5% CPU overhead
- Modern CPUs (with AES-NI, SSE4.2): Negligible
- Usually bottlenecked by disk, not checksum
Write path:
- Checksum computation: ~2-10% CPU overhead
- Depends on algorithm (fletcher4 < sha256)
- Often hidden by disk latency
Benchmarks:
No checksum (ext4):  1000 MB/s read
ZFS (fletcher4):      980 MB/s read  (-2%)
ZFS (sha256):         920 MB/s read  (-8%)
Btrfs (crc32c):       990 MB/s read  (-1%)

Bottleneck: usually disk speed, not the checksum
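Figures like these vary widely with hardware, record size, and caching. To measure on your own system, a simple sequential-read test with fio (assuming fio is installed and /tank/testfile is scratch space you can overwrite) looks like:

fio --name=seqread --rw=read --bs=1M --size=4G --filename=/tank/testfile
# fio lays out the file on the first run; repeat the run for steady-state read numbers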
Space Overhead
Checksum storage:
- ZFS: 1/1024 blocks (~0.1% overhead)
- Btrfs: Stored in metadata (~0.5% overhead)
- ext4 metadata_csum: less than 1% for metadata only
Negligible space cost for significant protection
Comparison: Integrity Features
| Filesystem | Data Checksums | Metadata Checksums | Self-Healing | Scrubbing |
|---|---|---|---|---|
| ext4 | ❌ No | ✅ Optional | ❌ No | ❌ No |
| XFS | ❌ No | ✅ Yes | ❌ No | ❌ No |
| Btrfs | ✅ Yes | ✅ Yes | ✅ With RAID | ✅ Yes |
| ZFS | ✅ Yes | ✅ Yes | ✅ With RAID | ✅ Yes |
| ReFS | ✅ Optional (integrity streams) | ✅ Yes | ✅ With Storage Spaces | ✅ Yes |
| NTFS | ❌ No | ❌ No | ❌ No | ❌ No |
| APFS | ❌ No | ✅ Yes | ❌ No | ❌ No |
Integrity leaders: ZFS and Btrfs (and ReFS on Windows). APFS checksums metadata only, not file data.
Best Practices
1. Use Checksums
For critical data:
# ZFS: use strong checksums for critical datasets
zfs set checksum=sha256 tank/important

# Btrfs: checksums are enabled by default
mkfs.btrfs /dev/sda1    # crc32c enabled
2. Enable Redundancy
Checksums detect corruption, redundancy repairs it:
# ZFS: RAID-Z or mirror
zpool create tank raidz /dev/sda /dev/sdb /dev/sdc

# Btrfs: RAID1 or RAID10
mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
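To confirm the redundancy profile actually in use (device names and mount points above are examples):

zpool status tank           # shows the raidz/mirror vdev layout
btrfs filesystem df /mnt    # shows the Data/Metadata profiles (e.g. RAID1)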
3. Regular Scrubbing
Monthly for normal use, weekly for critical:
# ZFS monthly scrub (OpenZFS 2.1.3+ systemd timer)
systemctl enable --now zfs-scrub-monthly@tank.timer

# Btrfs weekly scrub via cron
0 3 * * 0 btrfs scrub start /mnt
4. Monitor Errors
Check for corruption regularly:
# ZFS
zpool status -v tank | grep -i error

# Btrfs
btrfs device stats /mnt
5. Use ECC RAM
Protect in-memory data:
- Checksums protect on-disk data
- ECC RAM protects in-memory data
- Recommended for ZFS/Btrfs servers (see the check below)
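On Linux you can check whether ECC memory is present (requires root; exact wording varies by vendor):

dmidecode -t memory | grep -i 'error correction'
# "Multi-bit ECC" (or similar) indicates ECC; "None" means non-ECC RAM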
6. Test Restores
Verify backups can detect corruption:
# Scrub before backing up
zpool scrub tank
# wait for completion, then run the backup
# (ensures the backup does not silently capture corrupted data)
Limitations
What Checksums Don't Protect
- Application-level corruption: the application writes wrong data
  - Solution: application checksums (e.g., database page checksums)
- Corruption during write: data corrupted in memory before the checksum is computed
  - Solution: ECC RAM
- No redundancy: corruption can be detected but not repaired
  - Solution: RAID or replication
- Complete disk failure: all copies lost
  - Solution: offsite backups
Performance Considerations
When to disable checksums:
- Never for metadata (always checksum metadata)
- Rarely for data (only if proven bottleneck)
- Databases: may already provide their own page checksums, or rely on dm-integrity at the block layer
Disable data checksums (Btrfs):
# Per-file (also disables CoW)
chattr +C /var/lib/mysql/data

# Or as a mount option (entire filesystem)
mount -o nodatasum /dev/sda1 /mnt
Advanced: DM-Integrity (ext4/XFS)
Device-mapper integrity for non-checksum filesystems:
# Create the integrity device
integritysetup format /dev/sda1
integritysetup open /dev/sda1 integrity-dev

# Format with ext4
mkfs.ext4 /dev/mapper/integrity-dev

# Mount
mount /dev/mapper/integrity-dev /mnt
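To inspect the resulting mapping and its integrity parameters (subcommand available in recent cryptsetup/integritysetup releases):

integritysetup status integrity-dev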
Provides:
- Block-level checksums (below filesystem)
- Works with any filesystem
- Performance: 10-30% overhead
- See: man integritysetup
Related Concepts
- Copy-on-Write: Enables atomic checksum updates
- Snapshots: Immutable copies for data protection
- ZFS: End-to-end checksums and self-healing
- Btrfs: Checksums and scrubbing
- RAID: Redundancy for self-healing
Key Takeaways
- Silent Corruption: Traditional filesystems serve corrupted data unknowingly
- Checksums: Detect corruption at read time (ZFS, Btrfs, APFS)
- Self-Healing: Automatic repair with redundancy (RAID)
- Scrubbing: Proactive verification (find corruption early)
- Overhead: Minimal (~1-5% CPU, less than 1% space)
- Best Practice: Checksums + Redundancy + Scrubbing + ECC RAM
- Limitations: Can't fix corruption without redundancy
- ext4/XFS: Use DM-Integrity for block-level checksums