Copy-on-Write (CoW): Never Overwrite, Always Preserve
Discover how Copy-on-Write filesystems enable instant snapshots, atomic operations, and data integrity by never overwriting existing data.
The Traditional Problem: In-Place Updates
Traditional filesystems (ext4, XFS, FAT) use in-place updates:
- Read existing block
- Modify content
- Overwrite same block
- Old data gone forever
Problems:
- Not atomic: Power failure = partially written block (corruption)
- No history: Can't undo or snapshot without copying entire filesystem
- Dangerous: One wrong write destroys data permanently
The Copy-on-Write Solution
Core Principle: Never modify data in place. Instead:
- Read existing block
- Allocate NEW block
- Write modified data to new block
- Update pointer (metadata)
- Old data remains untouched until no longer needed
Benefits:
- Atomic writes: Readers see either the old state or the new state, never a half-written mix
- Free snapshots: Old data already preserved!
- Time travel: Keep references to old blocks = instant history
- Data integrity: Never risk overwriting good data
How Copy-on-Write Works: Step by Step
See CoW in action, from simple writes to instant snapshots:
Simple Write Operation: CoW in Action
Initial State: File with 3 Blocks
- File "file.txt" consists of 3 data blocks (A, B, C)
- Blocks 100, 101, and 102 are stored on disk
- Metadata points to these blocks
- Each block has a reference count of 1
- The user wants to modify Block B, the middle block (block 101)
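The write to the middle block can be sketched as a toy model. This is only an illustration: the dictionaries standing in for the disk and metadata, the `cow_write` helper, and the "next unused number" allocator are all assumptions made for the example, not how any real filesystem lays out its structures.

```python
# Toy model: the "disk" maps block numbers to contents, and the file's
# metadata is an ordered list of block numbers.
disk = {100: b"AAAA", 101: b"BBBB", 102: b"CCCC"}
refcount = {100: 1, 101: 1, 102: 1}
file_blocks = [100, 101, 102]          # metadata for file.txt

def cow_write(index: int, data: bytes) -> int:
    """Replace the block at `index` without touching the old block."""
    old = file_blocks[index]
    new = max(disk) + 1                # hypothetical allocator: next unused number
    disk[new] = data                   # write the modified copy to a new block
    refcount[new] = 1
    file_blocks[index] = new           # repoint the metadata
    refcount[old] -= 1                 # the old block loses this reference
    return old

old = cow_write(1, b"bbbb")
print(file_blocks)                     # [100, 103, 102]
print(disk[old], refcount[old])        # b'BBBB' 0
```

Block 101 still holds its original contents after the write. Its reference count has dropped to 0 only because nothing else, such as a snapshot, points to it.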
Key CoW Concepts
1. Write-Anywhere Allocation
Traditional: "Write block 1000 to sector 1000"
CoW: "Write data anywhere free, update pointer"
    Traditional (in-place):
      Block 1000: [old data] → [new data]   ❌ Old data lost

    CoW (write-anywhere):
      Block 1000: [old data]   ← still exists!
      Block 5280: [new data]   ← written here
      Pointer: 1000 → 5280     ✅ Old data preserved
2. Metadata Updates Are Key
CoW depends on atomic metadata updates:
    1. Allocate new block (5280)
    2. Write data to new block
    3. Update parent pointer: 1000 → 5280   ← Atomic!
    4. Old block (1000) now unreferenced
If a crash happens:
- Before step 3: Old data still referenced (no change visible)
- After step 3: New data referenced (change complete)
- Never half-updated!
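To make the crash argument concrete, here is a small simulation of the steps above with a "power failure" injected at different points. The `Crash` exception, the `cow_update` and `visible_after` helpers, and the dictionary used as a disk are hypothetical names for this sketch. It assumes the pointer update in step 3 is a single indivisible store, which is the property a real filesystem has to guarantee.

```python
class Crash(Exception):
    """Simulated power failure."""

def cow_update(disk, parent, new_data, new_block=5280, crash_before=None):
    """Run the CoW steps, optionally 'losing power' just before one of them."""
    def checkpoint(step):
        if crash_before == step:
            raise Crash(f"power lost before step {step}")

    # Step 1 (allocate): the caller passes in a free block number.
    checkpoint(2)
    disk[new_block] = new_data         # step 2: write data to the new block
    checkpoint(3)
    parent["ptr"] = new_block          # step 3: single pointer store, the commit point
    checkpoint(4)
    # Step 4: the old block is now unreferenced and can be reclaimed later.

def visible_after(crash_before):
    """Return the data a reader sees after a (possibly interrupted) update."""
    disk = {1000: "old data"}
    parent = {"ptr": 1000}
    try:
        cow_update(disk, parent, "new data", crash_before=crash_before)
    except Crash:
        pass
    return disk[parent["ptr"]]

results = {step: visible_after(step) for step in (2, 3, 4, None)}
print(results)
# {2: 'old data', 3: 'old data', 4: 'new data', None: 'new data'}
```

Wherever the crash lands, the pointer refers to a block that was completely written, so the reader sees one version or the other.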
3. Reference Counting
Blocks are freed only when no references remain:
    Block 1000: refs=2 (unmodified block: current file + snapshot)
    Block 5280: refs=1 (only current file)

    Delete snapshot:
      Block 1000: refs=1 → Can't free yet
      Block 5280: refs=1 → Keep

    Delete file:
      Block 1000: refs=0 → NOW free!
      Block 5280: refs=0 → Free
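The same bookkeeping can be sketched in a few lines. This is a simplified model, treating block 1000 as a block referenced by both the file and a snapshot. The `owners` table and `delete` helper are invented for the example; real filesystems track references per extent in on-disk trees.

```python
from collections import Counter

# Which blocks each owner references (hypothetical layout for illustration).
owners = {
    "file": [1000, 5280],   # current file: one shared block, one newer block
    "snapshot": [1000],     # the snapshot still references the shared block
}

refs = Counter()
for blocks in owners.values():
    refs.update(blocks)     # refs[1000] == 2, refs[5280] == 1

def delete(owner):
    """Drop an owner's references and return the blocks that became free."""
    freed = []
    for block in owners.pop(owner):
        refs[block] -= 1
        if refs[block] == 0:
            del refs[block]
            freed.append(block)
    return freed

freed_by_snapshot = delete("snapshot")
freed_by_file = delete("file")
print(freed_by_snapshot)    # []
print(freed_by_file)        # [1000, 5280]
```

Deleting the snapshot frees nothing because the file still holds block 1000. Space comes back only when the last reference is dropped.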
Snapshots: The Magic of CoW
With CoW, snapshots are nearly free:
Traditional (non-CoW) Snapshot:
    Copy entire filesystem: 100GB → 100GB copy
    Time: Minutes to hours
    Space: 200GB total
CoW Snapshot:
    1. Create new root pointer → same blocks
    2. Mark: "preserve current state"
    Time: Instant (milliseconds)
    Space: 0 bytes of data initially (only a little metadata)
After modifications:
- Modified blocks: New copies created (CoW kicks in)
- Unmodified blocks: Shared between original and snapshot
- Space used = only changed data
Snapshot Space Efficiency
    Original: 100GB
    Snapshot: 0GB (just metadata)

    Modify 10GB:
      Original: points to 90GB old + 10GB new = 100GB
      Snapshot: points to 100GB old
      Total space: 110GB (not 200GB!)

    Efficiency: Only changed blocks duplicated
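The arithmetic above can be checked by counting distinct blocks. In this sketch each block stands for 1GB, and the block numbering and the `unique_space_gb` helper are assumptions made for the example; metadata overhead is ignored.

```python
def unique_space_gb(*block_sets):
    """Space used = number of distinct blocks across all versions (1 block = 1GB)."""
    return len(set().union(*block_sets))

original = set(range(100))             # blocks 0..99: 100GB of data
snapshot = set(original)               # the snapshot shares every block
assert unique_space_gb(original, snapshot) == 100   # snapshot costs 0GB of data

# Modify 10GB: the live filesystem swaps 10 old blocks for 10 new ones.
rewritten = set(range(10))
live = (original - rewritten) | set(range(100, 110))

total = unique_space_gb(live, snapshot)
print(len(live), len(snapshot), total)   # 100 100 110
```

Both versions still see a full 100GB, but the 90 unmodified blocks are counted once.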
Atomic Operations
CoW makes complex operations atomic:
Example: Rename Directory
Traditional filesystem (in-place updates, no journal):
    1. Update old parent: remove entry
    2. Update new parent: add entry
    3. Update directory: change ".." link
    ❌ Crash between steps = corruption!
CoW filesystem:
    1. Create new metadata tree with changes
    2. Update root pointer (atomic!)
    ✅ Either all changes visible or none
Example: Database Transaction
    1. Write new data blocks (CoW)
    2. Write new index blocks (CoW)
    3. Write new metadata (CoW)
    4. Update root (atomic commit!)

    Crash before step 4: Old state intact
    Crash after step 4: New state complete
    Never inconsistent!
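One way to picture "build new blocks, then swap the root" is path copying in an immutable tree. The sketch below uses a plain unbalanced binary search tree rather than the B-trees real CoW filesystems use, and `Node` and `cow_set` are hypothetical names. The idea it illustrates is the same: only the nodes on the path from the root to the change are rewritten, and the commit is a single root assignment.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)                 # frozen: nodes can never be modified in place
class Node:
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def cow_set(node: Optional[Node], key: int, value: str) -> Node:
    """Return a new tree with `key` set, copying only the root-to-key path."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        return Node(node.key, node.value, cow_set(node.left, key, value), node.right)
    if key > node.key:
        return Node(node.key, node.value, node.left, cow_set(node.right, key, value))
    return Node(key, value, node.left, node.right)

# Build an initial tree.
root = None
for k, v in [(50, "m"), (30, "d"), (70, "s"), (20, "b")]:
    root = cow_set(root, k, v)

snapshot = root                         # keeping the old root is already a snapshot
new_root = cow_set(root, 20, "B")       # steps 1-3: new nodes are built off to the side
root = new_root                         # step 4: one pointer swap commits everything

print(snapshot.left.left.value, root.left.left.value)   # b B
print(root.right is snapshot.right)                      # True
```

Changing one leaf rewrote three nodes (50, 30, 20) while the untouched subtree under 70 is shared. That cascade up the tree is also where CoW's metadata write amplification comes from.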
CoW in Different Filesystems
Btrfs
- Full CoW: Data and metadata
- B-tree based: All structures use CoW
- Subvolumes: Lightweight CoW containers
- Reflinks: Share blocks between files
- Command: cp --reflink=always src dest (instant copy!)
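A reflink can also be requested from a program. The sketch below assumes Linux on a common architecture such as x86-64 or arm64, where the `FICLONE` ioctl has the request number shown; the constant is hard-coded here as an assumption rather than taken from a library. The `clone_or_copy` helper is a hypothetical name, and it falls back to an ordinary byte copy when the filesystem cannot share blocks.

```python
import fcntl
import os
import shutil
import tempfile

# Assumed Linux ioctl request number for FICLONE: _IOW(0x94, 9, int).
FICLONE = 0x40049409

def clone_or_copy(src: str, dst: str) -> str:
    """Try a reflink clone; fall back to a byte copy. Returns the method used."""
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "reflink"
        except OSError:
            pass                        # no reflink support here: copy instead
    shutil.copyfile(src, dst)
    return "copy"

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.bin")
    dst = os.path.join(tmp, "dst.bin")
    with open(src, "wb") as f:
        f.write(b"x" * 4096)
    method = clone_or_copy(src, dst)
    with open(dst, "rb") as f:
        copied = f.read()

print(method, len(copied))
```

On Btrfs the method should come back as "reflink"; on ext4 or tmpfs it will be "copy". Either way the destination has the same contents, which is roughly what cp --reflink=auto does.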
ZFS
- Full CoW: Data and metadata
- Pooled storage: Write anywhere in pool
- Checksums: Every block verified
- Snapshots: Recursive across datasets
- Clones: Writable snapshots
APFS (Apple)
- CoW for metadata: File data is copied on write only when shared with a clone or snapshot
- Space sharing: Multiple volumes, one pool
- Clones: Instant file copies
Performance Implications
Advantages
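- Sequential-friendly writes: New blocks can be placed together, so random updates cost fewer seeks
- SSD friendly: No block is rewritten in place over and over, which helps spread wear
- Fast snapshots: No data copying
- Fresh allocations: Every write goes to free space, so new files start out contiguous (fragmentation builds up later; see Challenges)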
Challenges
- Write amplification: Metadata updates cascade up tree
- Fragmentation: Related blocks scattered
- Space accounting: Hard to predict free space
- Performance: Can degrade when full (>80%)
Optimization Tips
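    # Btrfs: Defragment (breaks CoW links, so shared extents get duplicated!)
    btrfs filesystem defragment -r /mnt

    # ZFS: Set recordsize for workload (128k is the default; databases often
    # do better with the engine's page size, e.g. 16k for InnoDB)
    zfs set recordsize=16k tank/database

    # Keep free space >20% for performance
    df -h /mnt   # Monitor usage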
Space Reclamation
Old blocks are freed once nothing references them:
    # Btrfs: Delete old snapshots to free space
    btrfs subvolume delete /mnt/.snapshots/old

    # ZFS: Destroy snapshots
    zfs destroy tank/data@old-snapshot

    # Both: Check space used by snapshots
    btrfs qgroup show /mnt          # requires quotas: btrfs quota enable /mnt
    zfs list -t snapshot -o space
When CoW Hurts: Disable It
Some workloads conflict with CoW:
Databases (Random Writes)
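    # Btrfs: Disable CoW for database directory
    chattr +C /var/lib/mysql   # Before creating files! Only new files inherit the flag

    # ZFS: CoW cannot be disabled. copies=1 is already the default, and
    # turning compression off only saves CPU; neither changes CoW behavior
    zfs set copies=1 tank/database
    zfs set compression=off tank/database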
VM Disk Images
    # Btrfs: Disable CoW for VM images
    chattr +C /var/lib/libvirt/images

    # Or use nodatacow mount option
    mount -o nodatacow /dev/sda1 /mnt
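Note: On Btrfs, disabling CoW also disables data checksums and compression for those files. Snapshots still work, but the first write to each block after a snapshot is copied anyway.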
CoW vs Journaling
Aspect | CoW (Btrfs/ZFS) | Journaling (ext4/XFS) |
---|---|---|
Consistency | Always atomic | Via journal replay |
Snapshots | Free, instant | Need LVM/external |
Write pattern | Anywhere | In place, plus a sequential journal |
Metadata overhead | Higher (tree updates) | Lower (journal only) |
Maturity | Newer | Very mature |
Recovery | Always consistent | Replay journal |
Best Practices
- Keep 20% free space: CoW performance degrades when full
- Monitor snapshots: Delete old snapshots to reclaim space
- Disable CoW for databases: Use chattr +C on Btrfs
- Use reflinks: Instant file copies with cp --reflink
- Regular scrubbing: Verify checksums (Btrfs/ZFS)
- Balance space: btrfs balance for optimal allocation
Related Concepts
- Journaling: Alternative consistency mechanism
- Snapshots: CoW enables instant snapshots
- Btrfs: Linux's CoW filesystem
- ZFS: Advanced CoW filesystem with pooled storage
- Data Integrity: CoW enables checksum verification
Key Takeaways
- Never Overwrite: CoW writes new blocks, preserves old data
- Atomic by Design: A committed change is either fully visible or not visible at all
- Snapshots are Free: No copying needed—just preserve references
- Space Efficient: Share unchanged blocks between versions
- Trade-offs: Some overhead for databases, needs free space
- Modern Default: Btrfs, ZFS, APFS all use CoW for reliability