2,525 views
A processor $\text{X}_{1}$ operating at $2 \; \text{GHz}$ has a standard $5-$stage $\text{RISC}$ instruction pipeline having a base $\text{CPI (cycles per instruction)}$ of one without any pipeline hazards. For a given program $\text{P}$ that has $30 \%$ branch instructions, control hazards incur $2$ cycles stall for every branch. A new version of the processor $\text{X}_{2}$ operating at same clock frequency has an additional branch predictor unit $\text{(BPU)}$ that completely eliminates stalls for correctly predicted branches. There is neither any savings nor any additional stalls for wrong predictions. There are no structural hazards and data hazards for $\text{X}_{1}$ and $\text{X}_{2}.$ If the $\text{BPU}$ has a prediction accuracy of $80 \%,$ the speed up $\textit{(rounded off to two decimal places)}$ obtained by $\text{X}_{2}$ over $\text{X}_{1}$ in executing $\text{P}$ is _______________.

Execution Time = No.of Instructions x Clocks per Instructions x Clock cylce time

$Speedup = \frac{\text{Old system execution time}}{\text{New system execution time}}$

When no.of Instructions and clock cycle time are same, then

$Speedup = \frac{\text{Old system CPI}}{\text{New system CPI}}$

Normal processor CPI = 1, without any pipeline hazards.

Given that, Program P has 30% branch instructions where each instruction will lead to 2 stall cycles.

Processor X1 has NO BPU, therefore CPI = $1+\overset{\text{branch instructions penalty}}{\overbrace{(0.30*2)}}$ = 1.60

Processor X2 has BPU, therefore CPI = $1 + \overset{\text{branch instructions penalty}}{\overbrace{(0.30* ({ \underset{\text{BPU correctly predicted}}{\underbrace{0.80* 0}} \;\;\;+ \underset{\text{BPU wrongly predicted}}{\underbrace{0.2*2}} }) )}}$ = 1.12

$Speedup = \frac{\text{1.60}}{\text{1.12}}=1.4285$

## There is neither any savings nor any additional stalls for wrong predictions.

What is the significance of this line??

Shouldn't it mean that no penalty for wrong prediction.

This line had confused me, and I assumed that there is no penalty for incorrect prediction which is, by common seems absurd but still, I thought maybe the question is asking us to deliberately assume that, very very bad understanding :(, and it cost me 2 marks :(

I am regretting now for not believing my common sense

The answer given by mahisingh is correct. In fact, it is more accurate than the answer given by the IIT.  He took base CPI as 5 i.e, each stage will run for 1 cycle.

But he missed one convention used to calculate execution times. The Very first instruction in both processors should be taken as only one stage execution i.e, the first instruction takes only 1 cycle(1/5 CPI)  rather than 5 Cycles(1 CPI). So there is an extra 4 in both numerator and denominator.

So instead of 164/116 which is more accurate by all means, by convention, the answer is (164-4/116-4) => 160/112 = 1.428

Dealing with percentages can be counterintuitive so let the program has 100 instructions, then it has 30 branch instructions.

The effective time taken by each instruction in X1 and X2 due to pipelining is 1 Cycle, without any pipeline hazards. CPI is 5 without pipelining. But Pipeline hazards are created by branch instructions (only) in both processors.

In X1, 70 instructions take 1 Cycle since CPI = 1 and another 30 instructions take 3 Cycles (Here an extra 2 is added to 1 for each branch instruction because of the 2-cycle stall caused by every branch instruction)

So for executing the program P with 100 instructions, the time taken by X1 is  (70 * 1) Cycles + (30 * 3)  Cycles = 160 cycles.

Therefore effective CPI (cycles per instruction) with X1 = 160/100 = 1.6

Similarly,

In X2, 70 instructions take 1 cycle. 80 percent of 30 instructions, i.e, 24 instructions take 1 cycle because BPU of X2 has a prediction accuracy of 80%. And 20 percent of 30 instructions, i.e, 6 instructions take 3 cycles. (since BPU prediction goes wrong for 20% of the instructions).

So for executing the program P with 100 instructions, the time taken by X2 is  (70 * 1) Cycles + (24 * 1) Cycles + (6 * 3) Cycles = 112 cycles.

Therefore effective CPI (cycles per instruction) with X2 = 112/100 = 1.12

Speedup Factor =  (N1 * C1 * T1 / N2 * C2 * T2) = 1.60/1.12 = 1.428… = 1.43 (rounded off)

Here N1 = N2 = 100, T1 = T2 = 1/(2 GHZ)

C1 = CPI of X1 and C2 = CPI of X2.

by

Speed up factor will always be greater than 1