Copy-on-Write (CoW): Never Overwrite, Always Preserve

Discover how Copy-on-Write filesystems enable instant snapshots, atomic operations, and data integrity by never overwriting existing data. Explore CoW mechanics through interactive visualizations.

The Traditional Problem: In-Place Updates

Traditional filesystems (ext4, XFS, FAT) use in-place updates:

  1. Read existing block
  2. Modify content
  3. Overwrite same block
  4. Old data gone forever

Problems:

  • Not atomic: Power failure = partially written block (corruption)
  • No history: Can't undo or snapshot without copying entire filesystem
  • Dangerous: One wrong write destroys data permanently

The Copy-on-Write Solution

Core Principle: Never modify data in place. Instead:

  1. Read existing block
  2. Allocate NEW block
  3. Write modified data to new block
  4. Update pointer (metadata)
  5. Old data remains untouched until no longer needed
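
The five steps above can be sketched with a toy in-memory "disk". This is only an illustration under simplifying assumptions: `BlockStore`, `allocate`, and `cow_write` are hypothetical names, block numbers start at 100 for readability, and a real filesystem would persist both the blocks and the pointer list.

```python
class BlockStore:
    """Toy disk: maps block number -> bytes. Existing blocks are never overwritten."""

    def __init__(self, first_free=100):
        self.blocks = {}
        self.next_free = first_free

    def allocate(self, data):
        """Write data into a fresh block and return its block number."""
        number = self.next_free
        self.next_free += 1
        self.blocks[number] = data
        return number


def cow_write(store, pointers, index, new_data):
    """Return a new pointer list in which pointers[index] refers to a fresh block."""
    new_block = store.allocate(new_data)  # steps 2-3: allocate, write new data
    new_pointers = list(pointers)         # copy the metadata instead of mutating it
    new_pointers[index] = new_block       # step 4: redirect the pointer
    return new_pointers                   # step 5: the old block is left untouched


store = BlockStore()
file_v1 = [store.allocate(b"A"), store.allocate(b"B"), store.allocate(b"C")]
file_v2 = cow_write(store, file_v1, 1, b"B2")

print(file_v1, file_v2)      # old and new pointer lists
print(store.blocks[101])     # the original middle block still holds b"B"
```

Keeping `file_v1` around is all it takes to retain the previous version of the file, which is why the benefits below follow almost directly from the write path.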

Benefits:

  • Atomic writes: Readers see either the old state or the new state, never a half-written mix
  • Free snapshots: Old data already preserved!
  • Time travel: Keep references to old blocks = instant history
  • Data integrity: Never risk overwriting good data

How Copy-on-Write Works: Interactive Exploration

See CoW in action—from simple writes to instant snapshots:

Simple Write Operation: CoW in Action

Initial State: File with 3 Blocks

    Block 100: Block A (refs: 1)
    Block 101: Block B (refs: 1)
    Block 102: Block C (refs: 1)
    file.txt: Pointers: [100, 101, 102], Size: 12KB

  • File "file.txt" consists of 3 data blocks
  • Blocks 100, 101, 102 stored on disk
  • Metadata points to these blocks
  • Each block has reference count = 1
  • User wants to modify Block B (middle block)

Key CoW Concepts

1. Write-Anywhere Allocation

    Traditional: "Write block 1000 to sector 1000"
    CoW: "Write data anywhere free, update pointer"

    Traditional (in-place):
      Block 1000: [old data] → [new data]   ❌ Old data lost

    CoW (write-anywhere):
      Block 1000: [old data]   ← still exists!
      Block 5280: [new data]   ← written here
      Pointer: 1000 → 5280     ✅ Old data preserved

2. Metadata Updates Are Key

CoW depends on atomic metadata updates:

    1. Allocate new block (5280)
    2. Write data to new block
    3. Update parent pointer: 1000 → 5280   ← Atomic!
    4. Old block (1000) now unreferenced

If crash happens:

  • Before step 3: Old data still referenced (no change visible)
  • After step 3: New data referenced (change complete)
  • Never half-updated!
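
The crash cases can be checked with a small simulation. This is a sketch, not how any real filesystem is implemented: the dictionary-based `disk`, the `SimulatedCrash` exception, and the `crash_at` parameter are all hypothetical, and the pointer assignment stands in for a single atomic on-disk write.

```python
class SimulatedCrash(Exception):
    """Raised to model power loss just before a chosen step."""


def cow_update(disk, new_data, crash_at=None):
    """Run the four-step update, optionally 'crashing' before step `crash_at`."""
    def checkpoint(step):
        if crash_at == step:
            raise SimulatedCrash(f"power lost before step {step}")

    checkpoint(1)
    new_block = disk["next_free"]          # step 1: allocate a new block
    disk["next_free"] += 1
    checkpoint(2)
    disk["blocks"][new_block] = new_data   # step 2: write data to the new block
    checkpoint(3)
    disk["pointer"] = new_block            # step 3: the single atomic pointer swap
    checkpoint(4)
    # step 4: the old block is now unreferenced and can be reclaimed later


def visible_data(disk):
    """What a reader sees after reboot: whatever the pointer references."""
    return disk["blocks"][disk["pointer"]]


def fresh_disk():
    return {"blocks": {1000: b"old"}, "pointer": 1000, "next_free": 5280}


outcomes = {}
for step in (1, 2, 3, 4, None):            # None means no crash at all
    disk = fresh_disk()
    try:
        cow_update(disk, b"new", crash_at=step)
    except SimulatedCrash:
        pass
    outcomes[step] = visible_data(disk)

print(outcomes)
```

Every crash point yields either the complete old data or the complete new data. A crash after step 2 but before step 3 leaves an orphaned block 5280, which costs space until it is reclaimed but never affects what readers see.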

3. Reference Counting

Blocks are freed only when no references remain:

    Block 1000: refs=2 (original file + snapshot)
    Block 5280: refs=1 (only current file)

    Delete snapshot:
      Block 1000: refs=1 → Can't free yet
      Block 5280: refs=1 → Keep

    Delete file:
      Block 1000: refs=0 → NOW free!
      Block 5280: refs=0 → Free
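
The same sequence can be replayed in a few lines of code. This is a minimal sketch assuming one counter per block; `RefCountedBlocks`, `add_ref`, and `drop_ref` are hypothetical names, and real filesystems track references in more elaborate on-disk structures.

```python
class RefCountedBlocks:
    """Tracks how many owners (files, snapshots) reference each block."""

    def __init__(self):
        self.refs = {}    # block number -> reference count
        self.freed = []   # blocks returned to the free pool, in order

    def add_ref(self, block):
        self.refs[block] = self.refs.get(block, 0) + 1

    def drop_ref(self, block):
        self.refs[block] -= 1
        if self.refs[block] == 0:      # last reference gone: safe to free
            del self.refs[block]
            self.freed.append(block)


blocks = RefCountedBlocks()
current_file = [1000, 5280]   # block 1000 is shared, block 5280 is file-only
snapshot = [1000]

for block in current_file + snapshot:
    blocks.add_ref(block)
refs_initial = dict(blocks.refs)

for block in snapshot:        # delete the snapshot
    blocks.drop_ref(block)
refs_after_snapshot_delete = dict(blocks.refs)
freed_after_snapshot_delete = list(blocks.freed)

for block in current_file:    # delete the file
    blocks.drop_ref(block)

print(refs_initial, refs_after_snapshot_delete, blocks.freed)
```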

Snapshots: The Magic of CoW

With CoW, snapshots are free:

Traditional (non-CoW) Snapshot:

    Copy entire filesystem: 100GB → 100GB copy
    Time: Minutes to hours
    Space: 200GB total

CoW Snapshot:

    1. Create new root pointer → same blocks
    2. Mark: "preserve current state"
    Time: Instant (milliseconds)
    Space: 0 bytes initially!

After modifications:

  • Modified blocks: New copies created (CoW kicks in)
  • Unmodified blocks: Shared between original and snapshot
  • Space used = only changed data
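
A snapshot that copies only pointers, followed by one modification, might look like the sketch below. The flat pointer list and the helper names `take_snapshot` and `block_sharing` are assumptions for illustration; real filesystems snapshot a tree root rather than a flat list.

```python
def take_snapshot(pointers):
    """A snapshot copies the pointer list only; no data blocks are duplicated."""
    return tuple(pointers)   # immutable, so the snapshot cannot change later


def block_sharing(live, snap):
    """Classify blocks by who references them."""
    live_set, snap_set = set(live), set(snap)
    return {
        "shared": live_set & snap_set,
        "live_only": live_set - snap_set,
        "snapshot_only": snap_set - live_set,
    }


live = [100, 101, 102]
snap = take_snapshot(live)

# A CoW write to the middle block: new data lands in block 103 and the live
# file's pointer is redirected. The snapshot still references block 101.
live[1] = 103

sharing = block_sharing(live, snap)
print(sharing)
```

Only block 101 is held exclusively by the snapshot, so deleting the snapshot would reclaim exactly one block.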

Snapshot Space Efficiency

    Original: 100GB
    Snapshot: 0GB (just metadata)

    Modify 10GB:
      Original: points to 90GB old + 10GB new = 100GB
      Snapshot: points to 100GB old

    Total space: 110GB (not 200GB!)
    Efficiency: Only changed blocks duplicated
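
The arithmetic generalizes to a two-line model. It assumes modifications overwrite existing data (so the live size stays constant) and ignores metadata overhead; `full_copy_space_gb` and `cow_space_gb` are hypothetical helper names.

```python
def full_copy_space_gb(size_gb, copies=1):
    """Space used when each snapshot is a full copy of the filesystem."""
    return size_gb * (1 + copies)


def cow_space_gb(size_gb, changed_gb):
    """Space used with one CoW snapshot: live data plus the old versions of changed blocks."""
    return size_gb + changed_gb


print(full_copy_space_gb(100))   # 200
print(cow_space_gb(100, 10))     # 110
```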

Atomic Operations

CoW makes complex operations atomic:

Example: Rename Directory

Traditional filesystem (in-place, without a journal):

    1. Update old parent: remove entry
    2. Update new parent: add entry
    3. Update directory: change ".." link
    ❌ Crash between steps = inconsistent directory tree!

CoW filesystem:

    1. Create new metadata tree with changes
    2. Update root pointer (atomic!)
    ✅ Either all changes visible or none
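
One way to picture the CoW rename is path copying on an immutable tree, sketched here with nested dictionaries. The helpers `lookup`, `with_entry`, `without_entry`, and `rename` are hypothetical, the `superblock` dictionary stands in for the on-disk root pointer, and the ".." link is not modeled.

```python
def lookup(node, path):
    """Follow `path` (a tuple of names) down from `node`."""
    for name in path:
        node = node[name]
    return node


def with_entry(node, path, value):
    """Return a copy of `node` with `value` at `path`; only nodes on the path are copied."""
    if not path:
        return value
    new_node = dict(node)
    new_node[path[0]] = with_entry(node.get(path[0], {}), path[1:], value)
    return new_node


def without_entry(node, path):
    """Return a copy of `node` with the entry at `path` removed."""
    new_node = dict(node)
    if len(path) == 1:
        del new_node[path[0]]
    else:
        new_node[path[0]] = without_entry(node[path[0]], path[1:])
    return new_node


def rename(root, src, dst):
    """Build a new metadata tree with `src` moved to `dst`; `root` is not modified."""
    moved = lookup(root, src)
    return with_entry(without_entry(root, src), dst, moved)


old_root = {"home": {"alice": {"notes.txt": 100}}, "srv": {}}
superblock = {"root": old_root}

new_root = rename(old_root, ("home", "alice"), ("srv", "alice"))  # step 1
superblock["root"] = new_root                                     # step 2: one pointer swap

print(old_root)
print(new_root)
```

Until the final assignment, anything reading through `superblock` sees the old tree in full. The moved subtree itself is shared between both trees rather than copied.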

Example: Database Transaction

    1. Write new data blocks (CoW)
    2. Write new index blocks (CoW)
    3. Write new metadata (CoW)
    4. Update root (atomic commit!)

    Crash before step 4: Old state intact
    Crash after step 4: New state complete
    Never inconsistent!

CoW in Different Filesystems

Btrfs

  • Full CoW: Data and metadata
  • B-tree based: All structures use CoW
  • Subvolumes: Lightweight CoW containers
  • Reflinks: Share blocks between files
  • Command: cp --reflink=always src dest (instant copy!)
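
For programs that want the same effect as `cp --reflink`, Linux exposes a `FICLONE` ioctl. The sketch below is hedged accordingly: `reflink_or_copy` is a hypothetical helper, the constant is the `FICLONE` value from `linux/fs.h` on common Linux architectures, and the fallback to a plain copy (closer to `--reflink=auto` than `--reflink=always`) is an assumption about the desired behavior.

```python
import fcntl
import shutil

# ioctl request number for FICLONE (clone a whole file), as defined in linux/fs.h.
FICLONE = 0x40049409


def reflink_or_copy(src, dst):
    """Try to clone src into dst so both share blocks; fall back to a byte copy.

    Returns True if the filesystem performed a reflink, False if data was copied.
    """
    with open(src, "rb") as source, open(dst, "wb") as target:
        try:
            fcntl.ioctl(target.fileno(), FICLONE, source.fileno())
            return True
        except OSError:
            pass  # filesystem or platform does not support cloning
    shutil.copyfile(src, dst)
    return False
```

On a filesystem without reflink support the function returns False and the destination is an ordinary independent copy, so callers get correct contents either way.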

ZFS

  • Full CoW: Data and metadata
  • Pooled storage: Write anywhere in pool
  • Checksums: Every block verified
  • Snapshots: Recursive across datasets
  • Clones: Writable snapshots

APFS (Apple)

  • CoW for metadata: Data optionally
  • Space sharing: Multiple volumes, one pool
  • Clones: Instant file copies

Performance Implications

Advantages

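  • Write-anywhere placement: New blocks can be written together, which reduces seeks for random writes
  • SSD friendly: Writes are spread across the device instead of repeatedly hitting the same locations
  • Fast snapshots: No data copying
  • No overwrite hazards: Every write lands in a fresh block (at the cost of fragmentation over time, see Challenges)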

Challenges

  • Write amplification: Metadata updates cascade up tree
  • Fragmentation: Related blocks scattered
  • Space accounting: Hard to predict free space
  • Performance: Can degrade when full (>80%)

Optimization Tips

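    # Btrfs: Defragment (unshares extents held by snapshots and reflinks, so space use can grow!)
    btrfs filesystem defragment -r /mnt

    # ZFS: Set recordsize for workload
    # (128k is the default; 16k here assumes a database with 16 KB pages)
    zfs set recordsize=16k tank/database

    # Keep free space >20% for performance
    df -h /mnt  # Monitor usage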
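
The free-space check can be automated with the standard library. This is a sketch: `free_fraction` and `needs_cleanup` are hypothetical names, the 20% threshold comes from the tip above, and on Btrfs the figures reported through this interface are only estimates.

```python
import shutil


def free_fraction(path):
    """Return the fraction of the filesystem containing `path` that is free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_cleanup(path, min_free=0.20):
    """True when free space has dropped below the recommended threshold."""
    return free_fraction(path) < min_free


if needs_cleanup("/"):
    print("Less than 20% free: consider deleting old snapshots")
```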

Space Reclamation

Old blocks freed when no longer referenced:

    # Btrfs: Delete old snapshots to free space
    btrfs subvolume delete /mnt/.snapshots/old

    # ZFS: Destroy snapshots
    zfs destroy tank/data@old-snapshot

    # Both: Check space used by snapshots
    btrfs qgroup show /mnt  # Requires quotas: btrfs quota enable /mnt
    zfs list -t snapshot -o space

When CoW Hurts: Disable It

Some workloads conflict with CoW:

Databases (Random Writes)

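    # Btrfs: Disable CoW for database directory
    chattr +C /var/lib/mysql  # Before creating files! Only new files inherit the flag

    # ZFS: CoW itself cannot be disabled; these settings only trim overhead
    zfs set copies=1 tank/database         # Single copy of each block (the default)
    zfs set compression=off tank/database  # Skip compression work on writes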

VM Disk Images

    # Btrfs: Disable CoW for VM images
    chattr +C /var/lib/libvirt/images

    # Or use nodatacow mount option
    mount -o nodatacow /dev/sda1 /mnt

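Note: On Btrfs, disabling CoW also disables checksums and compression for that data. Snapshots still work, but the first write to each shared block after a snapshot is copied anyway, so the benefit shrinks if you snapshot often.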

CoW vs Journaling

    Aspect             CoW (Btrfs/ZFS)         Journaling (ext4/XFS)
    Consistency        Always atomic           Via journal replay
    Snapshots          Free, instant           Need LVM/external
    Write pattern      Anywhere                Mostly sequential
    Metadata overhead  Higher (tree updates)   Lower (journal only)
    Maturity           Newer                   Very mature
    Recovery           Always consistent       Replay journal

Best Practices

  1. Keep 20% free space: CoW performance degrades when full
  2. Monitor snapshots: Delete old snapshots to reclaim space
  3. Disable CoW for databases: Use chattr +C on Btrfs
  4. Use reflinks: Instant file copies with cp --reflink
  5. Regular scrubbing: Verify checksums (Btrfs/ZFS)
  6. Balance space: btrfs balance for optimal allocation

Related Concepts

  • Journaling: Alternative consistency mechanism
  • Snapshots: CoW enables instant snapshots
  • Btrfs: Linux's CoW filesystem
  • ZFS: Advanced CoW filesystem with pooled storage
  • Data Integrity: CoW enables checksum verification

Key Takeaways

  • Never Overwrite: CoW writes new blocks, preserves old data
  • Atomic by Design: All operations either complete or don't happen
  • Snapshots are Free: No copying needed—just preserve references
  • Space Efficient: Share unchanged blocks between versions
  • Trade-offs: Some overhead for databases, needs free space
  • Modern Default: Btrfs, ZFS, APFS all use CoW for reliability
