3,920 views
A processor $\text{X}_{1}$ operating at $2 \; \text{GHz}$ has a standard $5-$stage $\text{RISC}$ instruction pipeline having a base $\text{CPI (cycles per instruction)}$ of one without any pipeline hazards. For a given program $\text{P}$ that has $30 \%$ branch instructions, control hazards incur $2$ cycles stall for every branch. A new version of the processor $\text{X}_{2}$ operating at same clock frequency has an additional branch predictor unit $\text{(BPU)}$ that completely eliminates stalls for correctly predicted branches. There is neither any savings nor any additional stalls for wrong predictions. There are no structural hazards and data hazards for $\text{X}_{1}$ and $\text{X}_{2}.$ If the $\text{BPU}$ has a prediction accuracy of $80 \%,$ the speed up $\textit{(rounded off to two decimal places)}$ obtained by $\text{X}_{2}$ over $\text{X}_{1}$ in executing $\text{P}$ is _______________.

Execution Time = No.of Instructions x Clocks per Instructions x Clock cylce time

$Speedup = \frac{\text{Old system execution time}}{\text{New system execution time}}$

When no.of Instructions and clock cycle time are same, then

$Speedup = \frac{\text{Old system CPI}}{\text{New system CPI}}$

Normal processor CPI = 1, without any pipeline hazards.

Given that, Program P has 30% branch instructions where each instruction will lead to 2 stall cycles.

Processor X1 has NO BPU, therefore CPI = $1+\overset{\text{branch instructions penalty}}{\overbrace{(0.30*2)}}$ = 1.60

Processor X2 has BPU, therefore CPI = $1 + \overset{\text{branch instructions penalty}}{\overbrace{(0.30* ({ \underset{\text{BPU correctly predicted}}{\underbrace{0.80* 0}} \;\;\;+ \underset{\text{BPU wrongly predicted}}{\underbrace{0.2*2}} }) )}}$ = 1.12

$Speedup = \frac{\text{1.60}}{\text{1.12}}=1.4285$

This line had confused me, and I assumed that there is no penalty for incorrect prediction which is, by common seems absurd but still, I thought maybe the question is asking us to deliberately assume that, very very bad understanding :(, and it cost me 2 marks :(

I am regretting now for not believing my common sense

reshown ago by
The answer given by mahisingh is correct. In fact, it is more accurate than the answer given by the IIT.  He took base CPI as 5 i.e, each stage will run for 1 cycle.

But he missed one convention used to calculate execution times. The Very first instruction in both processors should be taken as only one stage execution i.e, the first instruction takes only 1 cycle(1/5 CPI)  rather than 5 Cycles(1 CPI). So there is an extra 4 in both numerator and denominator.

So instead of 164/116 which is more accurate by all means, by convention, the answer is (164-4/116-4) => 160/112 = 1.428
edited

There is neither any savings nor any additional stalls for wrong predictions.

Sir isn’t this statement a bit ambiguous? Like can't it be understood like this as well → In case of a BUP unit being installed it eliminates additional stalls be it a correct prediction or a misprediction?? Going by that 1.6 can as well be an acceptable solution right?

Speed up factor will always be greater than 1

Dealing with percentages can be counterintuitive so let the program has 100 instructions, then it has 30 branch instructions.

The effective time taken by each instruction in X1 and X2 due to pipelining is 1 Cycle, without any pipeline hazards. CPI is 5 without pipelining. But Pipeline hazards are created by branch instructions (only) in both processors.

In X1, 70 instructions take 1 Cycle since CPI = 1 and another 30 instructions take 3 Cycles (Here an extra 2 is added to 1 for each branch instruction because of the 2-cycle stall caused by every branch instruction)

So for executing the program P with 100 instructions, the time taken by X1 is  (70 * 1) Cycles + (30 * 3)  Cycles = 160 cycles.

Therefore effective CPI (cycles per instruction) with X1 = 160/100 = 1.6

Similarly,

In X2, 70 instructions take 1 cycle. 80 percent of 30 instructions, i.e, 24 instructions take 1 cycle because BPU of X2 has a prediction accuracy of 80%. And 20 percent of 30 instructions, i.e, 6 instructions take 3 cycles. (since BPU prediction goes wrong for 20% of the instructions).

So for executing the program P with 100 instructions, the time taken by X2 is  (70 * 1) Cycles + (24 * 1) Cycles + (6 * 3) Cycles = 112 cycles.

Therefore effective CPI (cycles per instruction) with X2 = 112/100 = 1.12

Speedup Factor =  (N1 * C1 * T1 / N2 * C2 * T2) = 1.60/1.12 = 1.428… = 1.43 (rounded off)

Here N1 = N2 = 100, T1 = T2 = 1/(2 GHZ)

C1 = CPI of X1 and C2 = CPI of X2.

by

5 Stage RISC has these stages

 IF ID EX MA WB

Processor has a frequency of 2GHz  means clock pulse = 0.5 nano Sec (tp = 0.5 nsec)

As per the question CPI is one(1) when there is no any hazard. Now 30% instructions cause 2 cycle stall.

let us suppose we have some x number of instructions.  0.3*x instructions suffer stall of 2 cycles each means total stall cycles is 0.6*x. So total number of cycles for x instructions = 1.6*x.

Avg CPI is 1.6 cc

Now this version is improved which has now a Branch predictor unit. It is mentioned that if prediction goes wrong there is no any penalty(extra stall) means 2 cc stall will still be there which was there in older version.

But when we have a correct branch prediction those extra 2 cc stall in the older version would be eliminated.

$\textup{80% of 30%}$ instructions has zero(0) stall but  $\textup{20% of 30%}$ instructions has 2 cycle stall.

Number of stall cycles in the new version is $\frac{20}{100} * \frac{30}{100}* 2 = 0.12$ cc and overall CPI is 1.12 cc

Now speedup of older version X1 :

$\frac{Non - pipeline \, execution\, time}{Pipelined \, execution\, time}$ = $\frac{5*x* 0.5}{1.6*x* 0.5} = \frac{5}{1.6} = 3.125$

Speedup of New version X2 :

$\frac{Non - pipeline \, execution\, time}{Pipelined \, execution\, time}$ = $\frac{5*x* 0.5}{1.12*x* 0.5} = \frac{5}{1.12} = 4.464$

Now Ratio of Speedup of X2 over X1 = $\frac{Speedup\, X2}{Speedup\, X1} = \frac{4.64}{3.125} = 1.4285$