Pipelining Hazards

Definition

Hazards are conditions that will lead to incorrect results if not mitigated

Mitigation strategies: add hardware, stall the pipeline or fix the software (e.g. add no-ops)

Hazard types: data, control, structural

Stalling

Definition

Stalling the pipeline means to halt an instruction, not allowing it to progress.

Steps

To stall an instruction in ID:

Disable writes to the PC and IF/ID pipeline register
Set control signals in ID/EX to 0 (inserts a no-op into the pipeline)

Data Hazards

Example

ADD  X2, X0, X1
ADDI X3, X2, #1

Hazard: First instruction writes to X2, second instruction uses it, but we expect that it has been updated at this point

Solution: We need a 2 cycle stall

We assume something called forwarding through the register file (i.e it writes in first half and reads in second half of the clock cycle)

Observation: the result of X0 + X1 is present in the pipeline after clock cycle 3 (and this matters because that means we can use it earlier instead of stalling 2 cycles, see #Data Forwarding)

Data Forwarding

Definition

Data forwarding allows us to access data as soon as it's ready, so we don't have to wait for writeback to complete.

Example

ADD  X2, X0, X1
ADDI X3, X2, #1
ADD  X4, X5, X2

Now instruction 3 also uses X2. Should we stall the pipeline again? That would defeat the point of pipelining, we might as well just use a single-cycle processor. Instead, we can forward the data.

Info

The forwarding unit controls the ALU input (A and B) mux selects. The logic for A is as follows:

// If the current instruction writes to a register and we use that register in the next instruction
if (EX/ME.RegWrite && EX/ME.Rd == ID/EX.Rn) {
    ForwardA = 2 // Forward from ME
} else if (ME/WB.RegWrite && ME/WB.Rd == ID/EX.Rn) {
    ForwardA = 1 // Forward from WB
} else {
    ForwardA = 0 // No forwarding
}

Load Hazards

Definition

A load hazard is a special case of a data hazard: the value is produced in ME instead of EX. The following dependent instruction must stall for 1 cycle (even with forwarding).

Example

MOV X1, #3
LDR X2, [X0, #8]
ADD X3, X1, X2

We need to stall instruction 3 (and consequently any proceeding instructions). After 1 cycle, the load hazard is gone and the stall signal is de-asserted.

Alternatively, instead of stalling, a compiler can schedule (re-order) instructions to avoid a load hazard: ```armasm LDR X2, [X0, #8] MOV X1, #3 ADD X3, X1, X2 ```

Info

Logic to detect load hazards while in ID:

if (ID/EX.MemRead && (ID/EX.Rd == IF/ID.Rn || ID/EX.Rd = IF/ID.Rm)) {
    stall = 1
} else {
    stall = 0
}

in other words, if the instruction in EX is LDR (only LDR has MemRead control signal) and the destination register of the instruction in EX is equal to any of the src registers of instructions in ID, the stall.

Control Hazards

Definition

Control hazards are caused by branch instructions. They are an issue because we are branching to new code while there are other instructions lined up in the pipeline.

Diagram

Info

The branch penalty can be reduced to 1 cycle by:

Moving the branch adder from EX to ID and connecting its output to the PC mux
Conditional branches also require forwarding paths from ME and WB to ID for checking the branch condition

Diagram

Conditional Branching

Info

If a conditional branch depends on a preceding:

ALU instr, it must stall 1 cycle
LDR instr, it must stall 2 cycles

Example

Case 1: conditional branch dependent on preceding ADD

ADD X1, X2, X3
CBZ X1, label
ORR X4, X5, X6

Note: if the CBZ is taken, the ORR will be squashed

Impact on CPI (cycles per instruction)

Assume 20% of dynamic instruction mix are branches
50% of branches are unconditional
50% of branches are conditional (half are taken)

Calculating the branch penalty:

Ideal CPI = 1
Average CPI: 1.15

Conditional Branch Prediction

With deeper pipelines (e.g 10-20 stages) and wider pipelines (4-8 instr. / stage), branch penalties are more significant
Branch prediction is worth the extra hardware / chip area

Steps

Predict the branch decision (taken or not taken) and the branch target address
Speculatively execute the predicted path
Execute the branch to check the prediction and recover if wrong (squash the mispredicted instructions and start fetching correct instructions)

Static prediction (without runtime history):

Forward branch: predict not taken (easy)
Backward branch: predict taken (more likely -- loops)

Dynamic Prediction (with runtime history):

Low-order PC bits index into the BHT (e.g 1200134)

if (instr.address == PC && predictor == T) {
    PC <- targetAddress
} else {
    PC <- PC + 4
}

On a correct prediction, the branch penalty is 0 cycles

Structural Hazards

Occur when different instructions need the same resource (at the same time)
E.g instr. and data share same mem. IF and ME (for LDR/STR) can't operate in parallel
- stall in IF
E.g no branch adder: branch target addr. must be calculated with the ALU in EX
- 2 cycle branch penalty

Solution: add more hardware!