Hazard Detection: Pipeline Dependencies and Solutions

Master pipeline hazards through interactive visualizations of data dependencies, control hazards, structural conflicts, and advanced detection mechanisms.


Understanding Pipeline Hazards

Pipeline hazards are situations that prevent the next instruction from executing during its designated clock cycle. They're the primary obstacles to achieving ideal pipeline performance and require sophisticated hardware mechanisms to detect and resolve.

Modern processors dedicate significant silicon area to hazard detection and resolution, making it one of the most critical aspects of CPU design.

Interactive Hazard Detection Demo

Explore how different types of hazards occur and how modern CPUs detect and resolve them:

[Interactive demo: steps a classic 5-stage pipeline (Fetch, Decode, Execute, Memory, Write-back) through the instruction sequence below at an adjustable speed, showing the register file along with live counters for RAW, WAR, WAW, structural, and load-use hazards, plus total cycles, stalls, and forwards.]

I0: ADD R1, R2, R3
I1: SUB R4, R1, R5
I2: AND R6, R4, R7
I3: OR  R8, R6, R9
I4: XOR R10, R11, R12

Current Demo: RAW (Read After Write) Hazards

RAW (Read After Write) hazards occur when an instruction needs a value that a previous instruction will write. These are true dependencies that cannot be eliminated, only mitigated through forwarding or stalling.

Types of Pipeline Hazards

1. Structural Hazards

Resource conflicts when hardware cannot support all possible combinations of instructions.

Cycle:    1    2    3    4    5    6    7
Load:     IF   ID   EX   MEM  WB
Add:           IF   ID   EX   MEM  WB
Store:              IF   ID   EX   MEM  WB
I4:                      IF   <- Conflict!
                         ^^
In cycle 4, the Load's data access (MEM) and the next
instruction fetch both need the single memory port.

Common Structural Hazards:

  • Single memory port (instruction + data access)
  • Single ALU (computation + address calculation)
  • Single register file write port
  • Limited functional units

Solutions:

  • Duplicate resources (separate I-cache/D-cache)
  • Pipeline functional units
  • Multiple register file ports
  • Resource arbitration
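
To make the detection side concrete, here is a minimal C-style sketch of the arbitration decision for a single shared memory port: fetch stalls whenever the instruction in MEM needs the data port. The structure and field names are illustrative, not taken from any real design.

// Hypothetical single-memory-port arbitration: if the instruction in MEM
// needs the data port this cycle, instruction fetch must wait one cycle.
typedef struct {
    int valid;      // stage holds a real instruction
    int is_mem_op;  // instruction is a load or store
} StageState;

// Returns 1 if instruction fetch must stall this cycle.
int structural_stall(StageState mem_stage, int want_fetch) {
    int mem_port_busy = mem_stage.valid && mem_stage.is_mem_op;
    return want_fetch && mem_port_busy;   // both want the single port
}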

2. Data Hazards

Dependencies between instructions involving registers or memory.

RAW (Read After Write) - True Dependency

ADD R1, R2, R3   # R1 = R2 + R3
SUB R4, R1, R5   # R4 = R1 - R5 (needs R1)

The most common and problematic type - cannot be eliminated, only mitigated.

WAR (Write After Read) - Anti-dependency

ADD R1, R2, R3   # R1 = R2 + R3 (reads R2)
SUB R2, R4, R5   # R2 = R4 - R5 (writes R2)

Only becomes a problem when instructions can execute out of order; eliminated by register renaming.

WAW (Write After Write) - Output Dependency

ADD R1, R2, R3   # R1 = R2 + R3
SUB R1, R4, R5   # R1 = R4 - R5 (overwrites R1)

Also only an issue with out-of-order execution; again eliminated by register renaming.
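
Since WAR and WAW hazards are only name conflicts, renaming each destination to a fresh physical register removes them. A minimal C sketch of a rename map, assuming a trivial free list; all names and sizes here are illustrative:

#include <stdio.h>

#define NUM_ARCH 16   // architectural registers R0..R15
#define NUM_PHYS 64   // physical registers P0..P63

int rename_map[NUM_ARCH];   // architectural -> physical mapping
int next_free = NUM_ARCH;   // trivial free "list": hand out P16, P17, ...

// Rename one instruction: sources read the current mapping,
// the destination gets a brand-new physical register.
void rename(int rd, int rs1, int rs2) {
    int ps1 = rename_map[rs1];
    int ps2 = rename_map[rs2];
    int pd  = next_free++;    // fresh register: no WAW/WAR with older writers
    rename_map[rd] = pd;
    printf("dest R%d -> P%d, sources R%d=P%d, R%d=P%d\n",
           rd, pd, rs1, ps1, rs2, ps2);
}

int main(void) {
    for (int i = 0; i < NUM_ARCH; i++) rename_map[i] = i;  // identity at start
    rename(1, 2, 3);   // ADD R1, R2, R3
    rename(2, 4, 5);   // SUB R2, R4, R5  (WAR on R2 disappears: writes P17)
    rename(1, 4, 5);   // SUB R1, R4, R5  (WAW on R1 disappears: writes P18)
    return 0;
}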

3. Control Hazards

Disruptions in instruction flow due to branches and jumps.

BEQ R1, R2, label   # Branch if equal
ADD R3, R4, R5      # Fetched speculatively
SUB R6, R7, R8      # May need to be flushed

Impact:

  • 15-20% of instructions are branches
  • Deep pipelines = high misprediction penalty
  • Critical for performance
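
As a rough illustration: if 18% of instructions are branches, the predictor is right 95% of the time, and each misprediction flushes about 15 cycles of work, the branch contribution to CPI is roughly 0.18 × 0.05 × 15 ≈ 0.14 extra cycles per instruction, which is why prediction accuracy matters so much in deep pipelines.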

Hazard Detection Mechanisms

1. Combinational Logic Detection

Simple pipelines use combinational logic to detect hazards:

// RAW hazard detection
wire raw_hazard_ex  = (id_rs1 == ex_rd  && ex_rd  != 0) ||
                      (id_rs2 == ex_rd  && ex_rd  != 0);
wire raw_hazard_mem = (id_rs1 == mem_rd && mem_rd != 0) ||
                      (id_rs2 == mem_rd && mem_rd != 0);
wire stall = raw_hazard_ex || raw_hazard_mem;

2. Scoreboarding

Track instruction status and dependencies dynamically:

Scoreboard Table:
┌─────────┬────────┬────────┬────────┬─────────┐
│ FU      │ Busy   │ Op     │ Dest   │ Sources │
├─────────┼────────┼────────┼────────┼─────────┤
│ ALU1    │ Yes    │ ADD    │ R1     │ R2, R3  │
│ ALU2    │ No     │ -      │ -      │ -       │
│ Load    │ Yes    │ LOAD   │ R4     │ R5+100  │
│ Store   │ No     │ -      │ -      │ -       │
└─────────┴────────┴────────┴────────┴─────────┘

Register Status:
R1: ALU1 (writing)
R4: Load (writing)

Scoreboard Algorithm:

  1. Issue: Check for structural and WAW hazards
  2. Read Operands: Wait for RAW hazards to clear
  3. Execute: Perform operation when operands ready
  4. Write Result: Check for WAR hazards
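
A minimal C sketch of the bookkeeping behind these four steps; the tables mirror the example above, and all field names are illustrative rather than taken from any real design.

// Per-functional-unit scoreboard entry (one row of the table above).
typedef struct {
    int busy;          // FU currently executing an instruction
    int op;            // opcode
    int dest;          // destination register
    int src1, src2;    // source registers
    int src1_ready;    // operand available (no outstanding RAW)?
    int src2_ready;
} FUEntry;

// Register status: which FU (if any) will write each register.
// -1 means no pending write, i.e. the value is ready.
int reg_writer[32];

// Issue check: structural hazard (FU busy) or WAW hazard
// (another FU is already set to write the same destination).
int can_issue(const FUEntry *fu, int dest) {
    if (fu->busy) return 0;                // structural hazard
    if (reg_writer[dest] != -1) return 0;  // WAW hazard
    return 1;
}

// Read-operands check: wait until no earlier instruction
// still owes us a source value (RAW hazard).
int operands_ready(int src1, int src2) {
    return reg_writer[src1] == -1 && reg_writer[src2] == -1;
}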

3. Tomasulo's Algorithm

More sophisticated out-of-order execution with register renaming:

Reservation Stations:
┌──────┬──────┬────┬────┬──────┬──────┬──────┐
│ Name │ Busy │ Op │ Vj │ Vk   │ Qj   │ Qk   │
├──────┼──────┼────┼────┼──────┼──────┼──────┤
│ RS1  │ Yes  │ADD │ 10 │ 20   │ -    │ -    │
│ RS2  │ Yes  │SUB │ -  │ 15   │ RS1  │ -    │
│ RS3  │ No   │ -  │ -  │ -    │ -    │ -    │
└──────┴──────┴────┴────┴──────┴──────┴──────┘

Common Data Bus (CDB): Broadcasts results

Key Features:

  • Distributed hazard detection
  • Dynamic scheduling
  • Register renaming via reservation stations
  • Eliminates WAR and WAW hazards
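
A compact C sketch of a reservation-station entry and the CDB broadcast that wakes up waiting instructions; the Vj/Vk/Qj/Qk fields follow the table above, everything else is illustrative.

#define NUM_RS 8

// One reservation station: Vj/Vk hold operand values once known,
// Qj/Qk hold the tag of the station that will produce them (0 = ready).
typedef struct {
    int  busy;
    int  op;
    long vj, vk;   // operand values (valid when the matching Q field is 0)
    int  qj, qk;   // producing station tags; 0 means the value is in Vj/Vk
} RS;

RS rs[NUM_RS];

// Common Data Bus broadcast: when station 'tag' finishes with 'result',
// every waiting station that names it as a producer captures the value.
void cdb_broadcast(int tag, long result) {
    for (int i = 0; i < NUM_RS; i++) {
        if (!rs[i].busy) continue;
        if (rs[i].qj == tag) { rs[i].vj = result; rs[i].qj = 0; }
        if (rs[i].qk == tag) { rs[i].vk = result; rs[i].qk = 0; }
    }
}

// A station may begin execution once both operands are present.
int ready_to_execute(const RS *r) {
    return r->busy && r->qj == 0 && r->qk == 0;
}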

Data Forwarding (Bypassing)

Forward results directly from pipeline stages without waiting for writeback:

Forwarding Paths

Pipeline Stages:
┌────┐   ┌────┐   ┌────┐   ┌─────┐   ┌────┐
│ IF │ → │ ID │ → │ EX │ → │ MEM │ → │ WB │
└────┘   └────┘   └────┘   └─────┘   └────┘
                    ↑          ↑        ↑
                    └──────────┴────────┘
                       Forwarding Paths

Forwarding Logic

// EX/MEM to EX forwarding
if (EX_MEM.RegWrite && EX_MEM.RegisterRd != 0 &&
    EX_MEM.RegisterRd == ID_EX.RegisterRs1) {
    ForwardA = 2;  // Forward from EX/MEM
}

// MEM/WB to EX forwarding
if (MEM_WB.RegWrite && MEM_WB.RegisterRd != 0 &&
    MEM_WB.RegisterRd == ID_EX.RegisterRs1 &&
    !(EX_MEM.RegWrite && EX_MEM.RegisterRd == ID_EX.RegisterRs1)) {
    ForwardA = 1;  // Forward from MEM/WB
}

Forwarding Priority

When multiple stages can forward:

  1. Most recent value takes priority
  2. EX/MEM over MEM/WB
  3. Check for register 0 (hardwired to zero)
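
Folding that priority into a single selection, a small C sketch (field names mirror the snippet above and are illustrative): returning 2 before 1 encodes "EX/MEM wins over MEM/WB", and the register-zero check covers rule 3.

// Select the source for ALU operand A:
//   2 = forward from EX/MEM, 1 = forward from MEM/WB, 0 = register file.
int forward_a(int ex_mem_regwrite, int ex_mem_rd,
              int mem_wb_regwrite, int mem_wb_rd,
              int id_ex_rs1) {
    if (ex_mem_regwrite && ex_mem_rd != 0 && ex_mem_rd == id_ex_rs1)
        return 2;   // most recent producer wins
    if (mem_wb_regwrite && mem_wb_rd != 0 && mem_wb_rd == id_ex_rs1)
        return 1;
    return 0;       // no hazard: read the register file
}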

Advanced Hazard Detection

1. Load-Use Hazards

Special case requiring a stall even with forwarding:

LOAD R1, 0(R2)    # Load into R1
ADD  R3, R1, R4   # Uses R1 immediately

Detection:

if (ID_EX.MemRead &&
    ((ID_EX.RegisterRt == IF_ID.RegisterRs) ||
     (ID_EX.RegisterRt == IF_ID.RegisterRt))) {
    stall_pipeline = true;
}
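
When that condition fires, the pipeline inserts a one-cycle bubble. A hedged C sketch of what that means in hardware terms; the signal names are illustrative, not a specific design.

// Pipeline control signals for one cycle (names illustrative).
typedef struct {
    int pc_write;       // 1 = PC advances, 0 = hold
    int if_id_write;    // 1 = IF/ID latches a new instruction, 0 = hold
    int id_ex_bubble;   // 1 = replace ID/EX contents with a NOP
} StallControl;

// A load-use stall holds the front of the pipeline for one cycle and
// injects a bubble so the load result can be forwarded a cycle later.
StallControl apply_stall(int stall_pipeline) {
    StallControl c = { .pc_write = 1, .if_id_write = 1, .id_ex_bubble = 0 };
    if (stall_pipeline) {
        c.pc_write     = 0;   // PC keeps its value
        c.if_id_write  = 0;   // dependent instruction is re-decoded next cycle
        c.id_ex_bubble = 1;   // clear RegWrite/MemRead/MemWrite in ID/EX
    }
    return c;
}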

2. Memory Hazards

Store-Load dependencies through memory:

STORE R1, 0(R2)   # Store to address
LOAD  R3, 0(R2)   # Load from same address

Solutions:

  • Store-Load forwarding
  • Memory disambiguation
  • Load speculation with verification
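
A simplified C sketch of store-to-load forwarding through a small store buffer: the youngest pending store to the same address supplies the value, otherwise the load goes to the cache. The names and the fixed-size buffer are illustrative.

#include <stdint.h>

#define SB_SIZE 8

// One pending (not yet committed) store.
typedef struct {
    int      valid;
    uint64_t addr;
    uint64_t data;
} StoreBufEntry;

// Entries ordered oldest (index 0) to youngest (index count-1).
typedef struct {
    StoreBufEntry entry[SB_SIZE];
    int count;
} StoreBuffer;

// Try to satisfy a load from the store buffer. Returns 1 and writes
// *value on a hit; returns 0 if the load must access the cache/memory.
int store_to_load_forward(const StoreBuffer *sb, uint64_t load_addr,
                          uint64_t *value) {
    for (int i = sb->count - 1; i >= 0; i--) {   // search youngest first
        if (sb->entry[i].valid && sb->entry[i].addr == load_addr) {
            *value = sb->entry[i].data;          // forward the store data
            return 1;
        }
    }
    return 0;
}

Real load/store units also have to handle partially overlapping accesses and stores whose addresses are not yet known, which is where memory disambiguation and load speculation come in.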

3. Cross-Iteration Dependencies

Loop-carried dependencies:

for (i = 0; i < n; i++) {
    a[i] = a[i-1] + b[i];   // RAW dependency
}

Techniques:

  • Software pipelining
  • Loop unrolling
  • Modulo scheduling

Hardware Implementation

Dependency Check Matrix

For N-way superscalar, check all instruction pairs:

      I0   I1   I2   I3
I0    -    ✓    ✓    ✓
I1    x    -    ✓    ✓
I2    x    x    -    ✓
I3    x    x    x    -

✓ = check needed (a later instruction's sources against an earlier instruction's destination)
x = not needed (an earlier instruction cannot depend on a later one)

Complexity: O(N²) comparisons per cycle
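
A small C sketch of those O(N²) comparisons for one issue group: each instruction's sources are checked against the destinations of all older instructions in the group. Structure and names are illustrative.

#define GROUP 4   // issue width

typedef struct {
    int rd;          // destination register (0 = writes nothing)
    int rs1, rs2;    // source registers
} Inst;

// Mark which instructions in the group depend on an older group member.
// Roughly GROUP*(GROUP-1)/2 pairs, hence O(N^2) comparators in hardware.
void check_group(const Inst g[GROUP], int depends[GROUP]) {
    for (int younger = 0; younger < GROUP; younger++) {
        depends[younger] = 0;
        for (int older = 0; older < younger; older++) {
            if (g[older].rd != 0 &&
                (g[older].rd == g[younger].rs1 ||
                 g[older].rd == g[younger].rs2)) {
                depends[younger] = 1;   // RAW within the issue group
            }
        }
    }
}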

CAM-Based Detection

Content-Addressable Memory for fast lookups:

Register Tag CAM:
┌─────┬──────────┐
│ Tag │ Producer │
├─────┼──────────┤
│ R1  │ ROB #5   │
│ R2  │ ROB #3   │
│ R3  │ Ready    │
└─────┴──────────┘

Parallel search of all entries
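
Functionally, the structure answers "who produces this register?" and, on completion, "who is waiting on this tag?" in a single step. The C sketch below models that parallel match with a loop; the ROB tags and the Ready encoding are illustrative.

#define NUM_REGS 32
#define READY    -1    // value already available in the register file

// producer[r] = ROB tag of the in-flight instruction that will write r,
// or READY once the value has been produced.
int producer[NUM_REGS];

// When ROB entry 'rob_tag' writes back, every register waiting on that
// tag becomes ready. Hardware performs this as one parallel CAM match
// across all entries; the loop merely models that search sequentially.
void broadcast_complete(int rob_tag) {
    for (int r = 0; r < NUM_REGS; r++) {
        if (producer[r] == rob_tag)
            producer[r] = READY;
    }
}

// Dependence check at rename/dispatch: a source is either ready or
// must wait for the tagged producer.
int source_tag(int reg) {
    return producer[reg];   // READY, or e.g. 5 meaning "wait for ROB #5"
}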

Performance Impact

Hazard Frequency

Typical program characteristics:

  • RAW hazards: 20-25% of instructions
  • Control hazards: 15-20% (branches)
  • Structural hazards: < 5% (with good design)
  • WAR/WAW: < 5% (in-order) or eliminated (OoO)

CPI Impact

CPI_actual = CPI_ideal + Stalls_structural + Stalls_data + Stalls_control

Where:

  • CPI_ideal = 1.0 for a scalar pipeline
  • Stalls_data ≈ 0.1-0.3 with forwarding
  • Stalls_control ≈ 0.1-0.2 with good prediction
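
Plugging in typical numbers: with 0.05 structural, 0.20 data, and 0.15 control stall cycles per instruction, CPI_actual = 1.0 + 0.05 + 0.20 + 0.15 = 1.40, meaning the pipeline runs about 40% slower than its ideal rate even with forwarding and branch prediction in place.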

Compiler Techniques

1. Instruction Scheduling

Reorder to minimize hazards:

# Original (2 stalls)
LOAD R1, 0(R2)
ADD  R3, R1, R4   # Stall
LOAD R5, 4(R2)
ADD  R6, R5, R7   # Stall

# Scheduled (0 stalls)
LOAD R1, 0(R2)
LOAD R5, 4(R2)
ADD  R3, R1, R4   # No stall
ADD  R6, R5, R7   # No stall

2. Software Pipelining

Overlap loop iterations:

// Original loop
for (i = 0; i < n; i++) {
    load(a[i]);
    compute();
    store(b[i]);
}

// Software pipelined
load(a[0]);
for (i = 1; i < n; i++) {
    compute(i-1);
    load(a[i]);
    store(b[i-1]);
}
compute(n-1);
store(b[n-1]);

3. Predication

Convert control dependencies to data dependencies:

# Branching version
      CMP R1, R2
      BNE skip
      ADD R3, R4, R5
skip:

# Predicated version
      CMP   R1, R2
      ADDEQ R3, R4, R5   # Execute only if equal
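
At the source level the same idea often shows up as branchless code. The C below may be lowered to a conditional-move or predicated instruction on targets that support one; whether it actually is remains up to the compiler.

// Branching form: a control dependence the CPU must predict.
int select_branch(int r1, int r2, int r3, int r4, int r5) {
    if (r1 == r2)
        r3 = r4 + r5;
    return r3;
}

// Branchless form: the comparison becomes a data dependence, which a
// compiler can often turn into a conditional move/select instead of a branch.
int select_predicated(int r1, int r2, int r3, int r4, int r5) {
    return (r1 == r2) ? r4 + r5 : r3;
}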

Modern Hazard Detection Examples

Intel Skylake

  • 224-entry reorder buffer
  • 97-entry scheduler
  • 7 execution ports
  • Zero-cycle register move
  • Memory disambiguation predictor

AMD Zen 3

  • 256-entry reorder buffer
  • Improved branch predictor
  • Op cache for decoded instructions
  • Enhanced load/store unit

ARM Cortex-A78

  • Out-of-order execution
  • Macro-op fusion
  • Complex branch predictor
  • Load/store clustering

Best Practices

1. Algorithm Level

  • Minimize dependencies in inner loops
  • Use cache-friendly access patterns
  • Reduce unpredictable branches

2. Code Level

// Avoid tight dependencies
sum = a + b + c + d;      // Chain of dependencies

// Better: tree reduction
t1 = a + b;
t2 = c + d;
sum = t1 + t2;            // Parallel execution

3. Compiler Flags

# GCC/Clang
-O3              # Aggressive optimization
-march=native    # Target CPU features
-ffast-math      # Relax FP dependencies
-funroll-loops   # Reduce branch hazards

# Profile-guided optimization
gcc -fprofile-generate prog.c
./a.out          # Run with typical data
gcc -fprofile-use prog.c

Debugging Hazards

Performance Counters

perf stat -e \
  resource_stalls.any,\
  resource_stalls.sb,\
  resource_stalls.rs,\
  int_misc.rat_stall_cycles,\
  cycle_activity.stalls_total \
  ./program

Intel VTune Metrics

  • Pipeline slots analysis
  • Dependency chains
  • Port utilization
  • Stall reasons

Future Directions

1. Machine Learning

  • Neural hazard predictors
  • Dynamic scheduling optimization
  • Workload-specific adaptation

2. Quantum Computing

  • Superposition of states
  • No classical hazards
  • New paradigm needed

3. Neuromorphic Computing

  • Event-driven execution
  • Asynchronous operation
  • Different hazard model


Conclusion

Hazard detection is the unsung hero of modern CPU performance. While pipelines promise parallel execution, hazards threaten to serialize it. Through sophisticated detection mechanisms, forwarding paths, and dynamic scheduling, modern processors achieve remarkable performance despite frequent dependencies. Understanding these mechanisms is crucial for both hardware designers creating efficient CPUs and software developers writing high-performance code.

If you found this explanation helpful, consider sharing it with others.
