Filesystem Journaling: How Write-Ahead Logging Prevents Data Loss

Understand the journaling mechanism that protects filesystems from crashes. Explore write-ahead logging, transaction states, and crash recovery through a step-by-step walkthrough.

The Problem: Crashes During Writes

Imagine you're updating a file when suddenly—power failure! Without protection, your filesystem could be left in an inconsistent state:

  • Half-written metadata: Directory entries point to freed blocks
  • Orphaned data: Allocated blocks with no file reference
  • Corrupted structures: Inconsistent inode tables, bitmaps

Traditional filesystems required full disk scans (fsck) after crashes—potentially hours on large drives. Journaling solves this with write-ahead logging.

The Journaling Solution

Core Idea: Before making any changes, write your intentions to a journal (transaction log). If a crash occurs, replay the journal to complete or undo partial operations.

Think of it like a chef's prep notes: write down what you're about to cook before you start. If interrupted, check your notes to know what state you're in.
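The core discipline can be sketched in a few lines of Python. This is a toy model, not real filesystem code; the `journal.log` file and the `wal_write` helper are invented for illustration. The key invariant is that the intent record is forced to stable storage before the in-place change happens:

```python
import json
import os

JOURNAL = "journal.log"   # hypothetical journal file for this sketch

def wal_write(fs_state, key, value):
    """Write-ahead: record the intent durably before touching fs_state."""
    record = {"op": "set", "key": key, "value": value}
    with open(JOURNAL, "a") as j:
        j.write(json.dumps(record) + "\n")
        j.flush()
        os.fsync(j.fileno())   # the intent reaches stable storage first
    fs_state[key] = value      # only now apply the real change

state = {}
wal_write(state, "doc.txt", "hello")
# If a crash hits between the fsync and the apply, the record is still
# in journal.log, so replaying the journal re-applies the change.
```

Because the apply step is always strictly after the durable intent, a crash can never leave a change half-done without a journal record describing it.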

How Journaling Works

Let's follow the journaling mechanism step by step, from transaction start to commit, and through crash recovery:

Normal Operation: Transaction Flow

Every transaction begins from a consistent baseline. In the initial state:

  • The filesystem is consistent: doc.txt (10KB, blocks 100 and 101) is intact on disk
  • The journal is empty, or contains only old, already-checkpointed transactions
  • The last checkpoint is at position 0
  • The filesystem is ready to process a new write operation

The user now wants to append data to doc.txt, which starts a new transaction.

Journal Modes: Safety vs Performance

Different journaling modes offer varying guarantees:

1. Journal Mode (Full Journaling)

  • What's journaled: Both metadata AND data
  • Process: Write data to journal → Write metadata to journal → Commit → Write to final location
  • Safety: Highest - complete consistency
  • Performance: Slowest - everything written twice
  • Use case: Critical data (financial systems)

2. Ordered Mode (Default)

  • What's journaled: Only metadata
  • Process: Write data to disk → Write metadata to journal → Commit → Write metadata to final location
  • Safety: High - metadata consistent, data may be old
  • Performance: Balanced - data written once
  • Use case: Most systems (ext4 default, XFS)

3. Writeback Mode

  • What's journaled: Only metadata
  • Process: Write metadata to journal → Commit → Write data and metadata to disk (any order)
  • Safety: Lower - metadata consistent, data may be garbage
  • Performance: Fastest - no ordering constraints
  • Use case: Non-critical data, scratch disks
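A rough way to compare the three modes is to encode each mode's write ordering and check whether file data is durable before the commit record lands. This is a deliberate simplification (real implementations batch and reorder within these constraints), and the step names are invented for the sketch:

```python
# Simplified write orderings enforced by each journaling mode.
MODE_ORDER = {
    "journal": [
        "data -> journal", "metadata -> journal", "commit",
        "data -> final", "metadata -> final",
    ],
    "ordered": [
        "data -> final", "metadata -> journal", "commit",
        "metadata -> final",
    ],
    "writeback": [
        "metadata -> journal", "commit",
        "data -> final (any time)", "metadata -> final",
    ],
}

def data_safe_after_commit(mode: str) -> bool:
    """True if file data is guaranteed durable once the commit record lands."""
    steps = MODE_ORDER[mode]
    before_commit = steps[:steps.index("commit")]
    return any(s.startswith("data") for s in before_commit)

# journal and ordered persist data before committing; writeback does not,
# so after a crash the metadata may point at stale or garbage data blocks.
```

This makes the safety ranking concrete: the modes differ only in where data writes sit relative to the commit record.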

The Transaction Lifecycle

  1. Transaction Start
  2. Write Intent to Journal (WAL)
  3. Wait for Journal Flush
  4. Commit Transaction (Commit Record)
  5. Apply Changes to Filesystem
  6. Mark Journal Entries as Completed (Checkpoint)
  7. Reuse Journal Space (Circular Buffer)
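Steps 6 and 7 are worth dwelling on: the journal is a fixed-size circular buffer, and space is only reclaimed once a transaction's changes have reached their final location. A toy model (the `Journal` class is hypothetical and sized in transactions rather than bytes) shows the pressure this creates:

```python
from collections import deque

class Journal:
    """Toy circular journal: entries occupy space until checkpointed."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()   # transaction records, oldest first

    def begin(self, txid: int) -> None:
        if len(self.entries) >= self.capacity:
            raise RuntimeError("journal full: checkpoint needed first")
        self.entries.append({"txid": txid, "committed": False})

    def commit(self, txid: int) -> None:
        """Write the commit record for a transaction."""
        for e in self.entries:
            if e["txid"] == txid:
                e["committed"] = True

    def checkpoint(self) -> None:
        """Once changes reach their final location, reclaim journal space."""
        while self.entries and self.entries[0]["committed"]:
            self.entries.popleft()   # this space can now be reused

j = Journal(capacity=2)
j.begin(1); j.commit(1)
j.begin(2); j.commit(2)
j.checkpoint()   # both transactions applied, so the journal drains to empty
```

When the journal fills before a checkpoint runs, new transactions stall, which is why journal size tuning (below) matters for write-heavy workloads.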

Recovery After Crash

When a filesystem mounts after a crash:

  1. Scan Journal: Read from last checkpoint
  2. Check Commit Records: Find complete vs incomplete transactions
  3. Replay Complete: Apply committed but not-yet-applied changes
  4. Rollback Incomplete: Ignore uncommitted transactions
  5. Mount Filesystem: Now in consistent state

Recovery Time: Seconds to minutes (scanning journal only), not hours (scanning entire disk).
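The recovery scan above can be sketched as a replay function: transactions that have a commit record are applied, everything else is ignored. This is a toy model using JSON records rather than on-disk block images; the record format is invented:

```python
import json

def recover(journal_lines):
    """Replay: apply only transactions that reached a commit record."""
    txs, committed = {}, set()
    for line in journal_lines:
        rec = json.loads(line)
        if rec["type"] == "commit":
            committed.add(rec["tx"])
        else:
            txs.setdefault(rec["tx"], []).append(rec)

    state = {}
    for tx, ops in txs.items():
        if tx in committed:          # replay complete transactions
            for op in ops:
                state[op["key"]] = op["value"]
        # uncommitted transactions are simply ignored (rolled back)
    return state

# A journal as it might look after a crash: tx 2 never committed.
crash_journal = [
    '{"type": "set", "tx": 1, "key": "a.txt", "value": "v1"}',
    '{"type": "commit", "tx": 1}',
    '{"type": "set", "tx": 2, "key": "b.txt", "value": "v2"}',
]
print(recover(crash_journal))   # only tx 1's change survives
```

Note that replay is idempotent here: applying tx 1 a second time produces the same state, which is why recovery can safely re-apply transactions that may already have reached disk.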

Journaling in Different Filesystems

ext4

  • Journal: Dedicated journal inode or external device
  • Modes: journal, ordered (default), writeback
  • Journal size: Configurable (default ~128MB)
  • Command: tune2fs -o journal_data /dev/sda1

XFS

  • Journal: Metadata-only (always ordered mode)
  • Log size: Configurable with -l size=128m
  • External log: Optional separate device (mkfs.xfs -l logdev=...) for faster sync writes
  • Efficient: Only logs metadata changes

NTFS

  • $LogFile: Transaction log for metadata
  • Mode: Metadata journaling only
  • Recovery: Automatic on mount (chkdsk if needed)
  • USN Journal: Separate change journal for applications

Btrfs / ZFS

  • No traditional journal: Use Copy-on-Write instead
  • Atomic operations: CoW provides transaction semantics
  • See: Copy-on-Write mechanism

Performance Impact

Journal Placement

```shell
# Internal journal (default)
mkfs.ext4 /dev/sda1

# External journal (faster, separate device)
mkfs.ext4 -J device=/dev/sdb1 /dev/sda1
```

External journal benefits:

  • Reduced seek time (journal on SSD, data on HDD)
  • Parallel I/O
  • Better for write-heavy workloads

Journal Size Tuning

```shell
# Larger journal = more buffering, less frequent commits
tune2fs -J size=400 /dev/sda1   # 400MB journal

# For databases: larger journal reduces checkpoint frequency
```

Best Practices

  1. Use ordered mode for most workloads (default)
  2. Enable full journaling only for critical data
  3. External journal on separate SSD for performance
  4. Monitor journal wraps: dumpe2fs /dev/sda1 | grep -i journal
  5. Disable journaling only for scratch/tmp filesystems

Journaling vs Copy-on-Write

| Aspect | Journaling | CoW (Btrfs/ZFS) |
| --- | --- | --- |
| Method | Write-ahead log | Never overwrite |
| Overhead | Write twice (journal + final) | Write once (new location) |
| Recovery | Replay journal | Always consistent |
| Snapshots | Not supported | Free with CoW |
| Maturity | Very mature | Newer (Btrfs) |
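For contrast, the CoW alternative can be sketched in a few lines: new versions are written to fresh blocks, and a single pointer update acts as the commit. This is a toy in-memory model; real CoW filesystems do this with block trees and superblock updates:

```python
# Toy copy-on-write store: updates go to fresh blocks; a single
# pointer swap publishes the new version atomically.
blocks = {0: "old contents"}   # block number -> contents (in-memory stand-in)
root = 0                       # points at the current version

def cow_update(new_data: str) -> None:
    global root
    new_block = max(blocks) + 1
    blocks[new_block] = new_data   # write out of place first
    root = new_block               # the pointer update is the commit point

cow_update("new contents")
# A crash before the pointer swap leaves root -> block 0, which is
# still the complete old version: no journal replay is ever needed.
```

This is why CoW filesystems skip the journal and get snapshots cheaply: keeping an old root pointer around is a snapshot.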

When to choose:

  • Journaling (ext4, XFS): Maximum maturity, proven reliability
  • CoW (Btrfs, ZFS): Want snapshots, checksums, modern features

Key Takeaways

  • Journaling = Insurance: Protects against crashes with write-ahead logging
  • Fast Recovery: Seconds instead of hours (no full fsck needed)
  • Configurable Safety: Choose journal mode based on data criticality
  • Performance Trade-off: More safety = more writes (journal overhead)
  • Universal: Used by ext4, XFS, NTFS, and most modern filesystems

If you found this explanation helpful, consider sharing it with others.
