Copy-on-Write (CoW): Never Overwrite, Always Preserve

The Traditional Problem: In-Place Updates

Traditional filesystems (ext4, XFS, FAT) use in-place updates:

Read existing block
Modify content
Overwrite same block
Old data gone forever

Problems:

Not atomic: Power failure = partially written block (corruption)
No history: Can't undo or snapshot without copying entire filesystem
Dangerous: One wrong write destroys data permanently

The Copy-on-Write Solution

Core Principle: Never modify data in place. Instead:

Read existing block
Allocate NEW block
Write modified data to new block
Update pointer (metadata)
Old data remains untouched until no longer needed

Benefits:

Atomic writes: Either old state or new state (never corrupted)
Free snapshots: Old data already preserved!
Time travel: Keep references to old blocks = instant history
Data integrity: Never risk overwriting good data

How Copy-on-Write Works

Simple Write Operation: CoW in Action

Initial State: File with 3 Blocks

Block 100: Block A (refs: 1)
Block 101: Block B (refs: 1)
Block 102: Block C (refs: 1)

file.txt
Pointers: [100, 101, 102]
Size: 12KB

File "file.txt" consists of 3 data blocks (100, 101, 102) stored on disk. The file's metadata points to these blocks, and each block has a reference count of 1. The user wants to modify Block B (the middle block).
Key CoW Concepts

1. Write-Anywhere Allocation

Traditional: "Write block 1000 to sector 1000"
CoW: "Write data anywhere free, update pointer"

Traditional (in-place):
Block 1000: [old data] → [new data]  ❌ Old data lost

CoW (write-anywhere):
Block 1000: [old data] ← still exists!
Block 5280: [new data] ← written here
Pointer: 1000 → 5280 ✅ Old data preserved

2. Metadata Updates Are Key

CoW depends on atomic metadata updates:

1. Allocate new block (5280)
2. Write data to new block
3. Update parent pointer: 1000 → 5280  ← Atomic!
4. Old block (1000) now unreferenced

If crash happens:

Before step 3: Old data still referenced (no change visible)
After step 3: New data referenced (change complete)
Never half-updated!
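The crash behaviour can be simulated with a small model. This is a sketch under stated assumptions: the `disk` and `root` dictionaries and the `SimulatedCrash` exception are hypothetical stand-ins, and a real filesystem relies on the storage device to make the final pointer write atomic.

```python
class SimulatedCrash(Exception):
    """Stands in for a power failure in this toy model."""


def cow_update(disk, root, new_block, new_data, crash_before_commit=False):
    disk[new_block] = new_data        # steps 1-2: allocate a new block and write to it
    if crash_before_commit:
        raise SimulatedCrash("power lost before the pointer update")
    old_block = root["ptr"]
    root["ptr"] = new_block           # step 3: the single pointer swap (the commit)
    return old_block                  # step 4: this block is now unreferenced


disk = {1000: "old data"}
root = {"ptr": 1000}

# Crash before step 3: the pointer never moved, so readers still see the old data.
try:
    cow_update(disk, root, 5280, "new data", crash_before_commit=True)
except SimulatedCrash:
    pass
visible_after_crash = disk[root["ptr"]]

# Retry without a crash: the pointer moves and the change becomes visible at once.
unreferenced = cow_update(disk, root, 5280, "new data")
visible_after_commit = disk[root["ptr"]]

print(visible_after_crash)   # old data
print(visible_after_commit)  # new data
```

After the simulated crash, block 5280 holds data that nothing points to. That is harmless, because the block is still considered free and can simply be reused.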

3. Reference Counting

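Blocks are freed only when no references remain. In this example, block 1000 is still shared by the current file and a snapshot, while block 5280 was written after the snapshot was taken:

Block 1000: refs=2 (current file + snapshot)
Block 5280: refs=1 (only current file)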

Delete snapshot:
Block 1000: refs=1 → Can't free yet
Block 5280: refs=1 → Keep

Delete file:
Block 1000: refs=0 → NOW free!
Block 5280: refs=0 → Free
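The same bookkeeping can be expressed as a short sketch. The helper names `add_owner` and `drop_owner` are hypothetical, and the model assumes the current file references blocks 1000 and 5280 while the snapshot references only block 1000.

```python
from collections import Counter


def add_owner(refs, blocks):
    """A file or snapshot takes one reference on each block it points to."""
    for block in blocks:
        refs[block] += 1


def drop_owner(refs, blocks):
    """Drop one owner's references; return the blocks that became free."""
    freed = []
    for block in blocks:
        refs[block] -= 1
        if refs[block] == 0:
            del refs[block]
            freed.append(block)
    return freed


refs = Counter()
current_file = [1000, 5280]
snapshot = [1000]
add_owner(refs, current_file)
add_owner(refs, snapshot)
initial = dict(refs)                              # {1000: 2, 5280: 1}

freed_by_snapshot = drop_owner(refs, snapshot)    # nothing can be freed yet
after_snapshot_delete = dict(refs)                # {1000: 1, 5280: 1}

freed_by_file = drop_owner(refs, current_file)    # both blocks become free
```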

Snapshots: The Magic of CoW

With CoW, snapshots are nearly free:

Traditional (non-CoW) Snapshot:

Copy entire filesystem: 100GB → 100GB copy
Time: Minutes to hours
Space: 200GB total

CoW Snapshot:

1. Create new root pointer → same blocks
2. Mark: "preserve current state"
Time: Instant (milliseconds)
Space: Almost nothing initially (only a little new metadata)!

After modifications:

Modified blocks: New copies created (CoW kicks in)
Unmodified blocks: Shared between original and snapshot
Space used = only changed data
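A toy model makes the sharing visible. It is only a sketch: the ten-block file, the block numbers, and the helpers `take_snapshot` and `cow_write` are assumptions for illustration, and the snapshot is modelled as nothing more than a copy of the pointer list.

```python
def take_snapshot(pointers):
    """A snapshot copies only the pointer list, never the data blocks."""
    return tuple(pointers)


def cow_write(disk, pointers, index, data):
    """Write the new version to a fresh block and repoint the live file."""
    new_block = max(disk) + 1
    disk[new_block] = data
    pointers[index] = new_block


# A file made of 10 blocks, numbered 100..109.
disk = {n: f"v1 of block {n}" for n in range(100, 110)}
live = list(disk)

snap = take_snapshot(live)
blocks_after_snapshot = len(disk)        # still 10: no data was copied

cow_write(disk, live, 3, "v2 of block 103")
blocks_after_write = len(disk)           # 11: exactly one new block
shared = set(live) & set(snap)           # 9 blocks used by both versions

print(disk[snap[3]])   # v1 of block 103
print(disk[live[3]])   # v2 of block 103
```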

Snapshot Space Efficiency

Original: 100GB
Snapshot: 0GB (just metadata)

Modify 10GB:
Original: points to 90GB old + 10GB new = 100GB
Snapshot: points to 100GB old
Total space: 110GB (not 200GB!)

Efficiency: Only changed blocks duplicated
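The arithmetic above generalizes to a two-line calculation. This sketch assumes a single snapshot and ignores metadata overhead. The function names are made up for the example.

```python
def full_copy_total(original_gb):
    """Space used when a snapshot is a complete second copy."""
    return 2 * original_gb


def cow_total(original_gb, modified_gb):
    """Space used with one CoW snapshot: the old data plus only the rewritten part."""
    return original_gb + modified_gb


original_gb, modified_gb = 100, 10
print(full_copy_total(original_gb))                                         # 200
print(cow_total(original_gb, modified_gb))                                  # 110
print(full_copy_total(original_gb) - cow_total(original_gb, modified_gb))   # 90
```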

Atomic Operations

CoW makes complex operations atomic:

Example: Rename Directory

Traditional filesystem (in-place updates, no journal):

1. Update old parent: remove entry
2. Update new parent: add entry
3. Update directory: change ".." link
❌ Crash between steps = inconsistent directory tree!

CoW filesystem:

1. Create new metadata tree with changes
2. Update root pointer (atomic!)
✅ Either all changes visible or none

Example: Database Transaction

1. Write new data blocks (CoW)
2. Write new index blocks (CoW)
3. Write new metadata (CoW)
4. Update root (atomic commit!)

Crash before step 4: Old state intact
Crash after step 4: New state complete
Never inconsistent!
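One way to picture "build a new tree, then swap the root" is path copying on nested dictionaries. The sketch below is a simplified model, not the on-disk format of Btrfs or ZFS: the directory layout and the helpers `cow_set` and `cow_remove` are hypothetical. Only the nodes on the path from the root to the change are copied, and everything else is shared with the old tree.

```python
def cow_set(node, path, value):
    """Return a new tree with `path` set to `value`, copying only nodes on the path."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    new_node = dict(node)                       # shallow copy: siblings stay shared
    new_node[head] = cow_set(node.get(head, {}), rest, value)
    return new_node


def cow_remove(node, path):
    """Return a new tree without `path`, copying only nodes on the path."""
    head, rest = path[0], path[1:]
    new_node = dict(node)
    if rest:
        new_node[head] = cow_remove(node[head], rest)
    else:
        del new_node[head]
    return new_node


live_root = {"projects": {"draft": {"a.txt": "hello"}}, "archive": {}}

# Rename /projects/draft to /archive/draft: stage every change in a new tree.
moved = live_root["projects"]["draft"]
staged = cow_remove(live_root, ["projects", "draft"])
staged = cow_set(staged, ["archive", "draft"], moved)

# A crash here would leave live_root exactly as it was.
old_root = live_root
live_root = staged                              # the commit: one root pointer swap

print(old_root)    # {'projects': {'draft': {'a.txt': 'hello'}}, 'archive': {}}
print(live_root)   # {'projects': {}, 'archive': {'draft': {'a.txt': 'hello'}}}
```

Every node from the changed entry up to the root gets copied. This is the same effect that makes metadata updates cascade up the tree in a real CoW filesystem.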

CoW in Different Filesystems

Btrfs

Full CoW: Data and metadata
B-tree based: All structures use CoW
Subvolumes: Lightweight CoW containers
Reflinks: Share blocks between files
Command: cp --reflink=always src dest (instant copy!)
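From a script, a reflink copy can be attempted with a plain fallback for systems where it is not available. This is a hedged sketch: `clone_file` is a hypothetical helper, it assumes GNU `cp` is on the PATH, and it uses `--reflink=auto` rather than `always` so that `cp` itself falls back to a normal copy on filesystems without reflink support.

```python
import os
import shutil
import subprocess
import tempfile


def clone_file(src, dst):
    """Try a reflink-capable copy via GNU cp; fall back to a normal byte copy."""
    try:
        subprocess.run(
            ["cp", "--reflink=auto", "--", src, dst],
            check=True,
            capture_output=True,
        )
        return "cp --reflink=auto"
    except (OSError, subprocess.CalledProcessError):
        # cp is missing or does not understand --reflink (e.g. non-GNU cp).
        shutil.copyfile(src, dst)
        return "shutil.copyfile"


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "src.bin")
    dst = os.path.join(workdir, "dest.bin")
    with open(src, "wb") as handle:
        handle.write(b"x" * 4096)
    method = clone_file(src, dst)
    with open(dst, "rb") as handle:
        copied = handle.read()

print(method, len(copied))
```

Whether blocks are actually shared depends on the filesystem underneath. The return value only reports which copy path ran.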

ZFS

Full CoW: Data and metadata
Pooled storage: Write anywhere in pool
Checksums: Every block verified
Snapshots: Recursive across datasets
Clones: Writable snapshots

APFS (Apple)

CoW for metadata: File data is shared copy-on-write when files are cloned
Space sharing: Multiple volumes, one pool
Clones: Instant file copies

Performance Implications

Advantages

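Flexible write placement: New blocks can go into any free space, so writes can be batched sequentially
SSD friendly: Writes spread across the device instead of rewriting the same blocks
Fast snapshots: No data copying
No overwrite risk: Every write lands in a "fresh" block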

Challenges

Write amplification: Metadata updates cascade up tree
Fragmentation: Related blocks scattered
Space accounting: Hard to predict free space
Performance: Can degrade when full (>80%)
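The fragmentation effect is easy to reproduce in a toy model. The sketch assumes a naive allocator that always hands out the next block at the end of the disk, which real allocators improve on. `count_extents` is a hypothetical helper that counts runs of physically consecutive blocks.

```python
def count_extents(pointers):
    """Count runs of physically consecutive blocks in a file's pointer list."""
    if not pointers:
        return 0
    extents = 1
    for previous, current in zip(pointers, pointers[1:]):
        if current != previous + 1:
            extents += 1
    return extents


# A freshly written file: 10 blocks laid out contiguously (one extent).
pointers = list(range(100, 110))
extents_before = count_extents(pointers)

# Rewrite three blocks in the middle; CoW puts each new version elsewhere.
next_free = 110
for index in (2, 5, 8):
    pointers[index] = next_free
    next_free += 1
extents_after = count_extents(pointers)

print(extents_before)  # 1
print(extents_after)   # 7
```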

Optimization Tips

# Btrfs: Defragment (breaks CoW links!)
btrfs filesystem defragment -r /mnt

# ZFS: Match recordsize to the workload (default is 128K; databases
# usually want their page size, e.g. 16K)
zfs set recordsize=16K tank/database

# Keep free space >20% for performance
df -h /mnt  # Monitor usage
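
The 20% rule can also be checked from a script. This is a minimal sketch: `needs_cleanup` and `path_needs_cleanup` are hypothetical helpers built on Python's `shutil.disk_usage`, and the threshold is the rule of thumb from this section. On CoW filesystems the reported free space is itself an estimate, so treat the result as a hint.

```python
import shutil


def needs_cleanup(total_bytes, free_bytes, min_free=0.20):
    """True when the free fraction is below the headroom threshold."""
    return free_bytes / total_bytes < min_free


def path_needs_cleanup(path, min_free=0.20):
    """Apply the same check to the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)   # named tuple: total, used, free
    return needs_cleanup(usage.total, usage.free, min_free)


print(needs_cleanup(total_bytes=1000, free_bytes=150))  # True  (15% free)
print(needs_cleanup(total_bytes=1000, free_bytes=250))  # False (25% free)
print(path_needs_cleanup("."))
```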

Space Reclamation

Old blocks freed when no longer referenced:

# Btrfs: Delete old snapshots to free space
btrfs subvolume delete /mnt/.snapshots/old

# ZFS: Destroy snapshots
zfs destroy tank/data@old-snapshot

# Btrfs: Check space used by snapshots (requires quotas: btrfs quota enable /mnt)
btrfs qgroup show /mnt

# ZFS: Check space used by snapshots
zfs list -t snapshot -o space

When CoW Hurts: Disable It

Some workloads conflict with CoW:

Databases (Random Writes)

# Btrfs: Disable CoW for database directory
chattr +C /var/lib/mysql  # Before creating files!

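# ZFS: CoW cannot be disabled (copies=1 is already the default and
# compression is unrelated to CoW). Tune the dataset instead, e.g.
# match recordsize to the database page size
zfs set recordsize=16K tank/database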

VM Disk Images

# Btrfs: Disable CoW for VM images
chattr +C /var/lib/libvirt/images

# Or use nodatacow mount option
mount -o nodatacow /dev/sda1 /mnt

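Note: On Btrfs, disabling CoW (chattr +C or nodatacow) also disables data checksums and compression for that data. Snapshots still work, but the first write to each block after a snapshot is copied anyway.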

CoW vs Journaling

Aspect	CoW (Btrfs/ZFS)	Journaling (ext4/XFS)
Consistency	Always atomic	Via journal replay
Snapshots	Free, instant	Need LVM/external
Write pattern	Anywhere	Mostly sequential
Metadata overhead	Higher (tree updates)	Lower (journal only)
Maturity	Newer	Very mature
Recovery	Always consistent	Replay journal

Best Practices

Keep 20% free space: CoW performance degrades when full
Monitor snapshots: Delete old snapshots to reclaim space
Disable CoW for databases: Use chattr +C on Btrfs
Use reflinks: Instant file copies with cp --reflink
Regular scrubbing: Verify checksums (Btrfs/ZFS)
Balance space: btrfs balance for optimal allocation

Related Concepts

Journaling: Alternative consistency mechanism
Snapshots: CoW enables instant snapshots
Btrfs: Linux's CoW filesystem
ZFS: Advanced CoW filesystem with pooled storage
Data Integrity: CoW enables checksum verification

Key Takeaways

Never Overwrite: CoW writes new blocks, preserves old data
Atomic by Design: All operations either complete or don't happen
Snapshots are Free: No copying needed—just preserve references
Space Efficient: Share unchanged blocks between versions
Trade-offs: Some overhead for databases, needs free space
Modern Default: Btrfs, ZFS, APFS all use CoW for reliability
