retagged by
9,890 views
18 votes
18 votes
A processor $\text{X}_{1}$ operating at $2 \; \text{GHz}$ has a standard $5-$stage $\text{RISC}$ instruction pipeline having a base $\text{CPI (cycles per instruction)}$ of one without any pipeline hazards. For a given program $\text{P}$ that has $30 \%$ branch instructions, control hazards incur $2$ cycles stall for every branch. A new version of the processor $\text{X}_{2}$ operating at same clock frequency has an additional branch predictor unit $\text{(BPU)}$ that completely eliminates stalls for correctly predicted branches. There is neither any savings nor any additional stalls for wrong predictions. There are no structural hazards and data hazards for $\text{X}_{1}$ and $\text{X}_{2}.$ If the $\text{BPU}$ has a prediction accuracy of $80 \%,$ the speed up $\textit{(rounded off to two decimal places)}$ obtained by $\text{X}_{2}$ over $\text{X}_{1}$ in executing $\text{P}$ is _______________.
retagged by

5 Answers

12 votes
12 votes
Execution Time = No.of Instructions x Clocks per Instructions x Clock cylce time

$Speedup = \frac{\text{Old system execution time}}{\text{New system execution time}}$

When no.of Instructions and clock cycle time are same, then  

$Speedup = \frac{\text{Old system CPI}}{\text{New system CPI}}$

 

Normal processor CPI = 1, without any pipeline hazards.

Given that, Program P has 30% branch instructions where each instruction will lead to 2 stall cycles.

 

Processor X1 has NO BPU, therefore CPI = $1+\overset{\text{branch instructions penalty}}{\overbrace{(0.30*2)}}$ = 1.60

Processor X2 has BPU, therefore CPI = $1 + \overset{\text{branch instructions penalty}}{\overbrace{(0.30* ({ \underset{\text{BPU correctly predicted}}{\underbrace{0.80* 0}} \;\;\;+ \underset{\text{BPU wrongly predicted}}{\underbrace{0.2*2}} }) )}} $ = 1.12

$Speedup = \frac{\text{1.60}}{\text{1.12}}=1.4285$
4 votes
4 votes

As the question contains percentage so let’s assume the total number of instructions = 100.

Total Number of Instruction = 100 (Let).

For X1 :

As 30% are branch instructions and branch instruction incurs 2 stalls. So these branch instruction will take 3 cycles.

Total number of clock for branch Instruction = 90 clocks.

Total number of clock for Non-branch Instruction = 70 clocks.

Total clocks for 100 instructions will be = 90 + 70 = 160 clocks.

As the processor is operating at 2GHz. So, 1 clock cycle will take 0.5 ns.

So, Total time required for 100 instructions = 160 * 0.5 ns = 80 ns.

Processor X1 will take 80 ns.

 

For X2 :

As for improvement in X1 we are provided with a Branch Prediction Unit(BPU) with accuracy of 80%.

So, Out of 30 Branch Instruction, BPU will eliminate stalls for 80%. For rest 20% question is saying that :

“ there is neither any savings nor any additional stalls for wrong predictions”. So, for 20% of instructions

they will incur 2 stalls (means 3 clock) as it is.

So total number of clocks = 24 + 18 + 70 = 112 clocks.

As X2 is also operating at 2GHz. So, 1 clock cycle will take 0.5 ns.

So, Total time required for 100 instruction = 112 * 0.5 = 56 ns. 

Processor X2 will take 56 ns.

 

So, overall speed up = (Time Taken By X1) / (Time Taken By X2) = 80 / 56 = 1.42 (2 decimal places).

edited by
2 votes
2 votes

Dealing with percentages can be counterintuitive so let the program has 100 instructions, then it has 30 branch instructions.

The effective time taken by each instruction in X1 and X2 due to pipelining is 1 Cycle, without any pipeline hazards. CPI is 5 without pipelining. But Pipeline hazards are created by branch instructions (only) in both processors.

In X1, 70 instructions take 1 Cycle since CPI = 1 and another 30 instructions take 3 Cycles (Here an extra 2 is added to 1 for each branch instruction because of the 2-cycle stall caused by every branch instruction)

So for executing the program P with 100 instructions, the time taken by X1 is  (70 * 1) Cycles + (30 * 3)  Cycles = 160 cycles.

Therefore effective CPI (cycles per instruction) with X1 = 160/100 = 1.6

Similarly,

In X2, 70 instructions take 1 cycle. 80 percent of 30 instructions, i.e, 24 instructions take 1 cycle because BPU of X2 has a prediction accuracy of 80%. And 20 percent of 30 instructions, i.e, 6 instructions take 3 cycles. (since BPU prediction goes wrong for 20% of the instructions).

So for executing the program P with 100 instructions, the time taken by X2 is  (70 * 1) Cycles + (24 * 1) Cycles + (6 * 3) Cycles = 112 cycles.

Therefore effective CPI (cycles per instruction) with X2 = 112/100 = 1.12

 

Speedup Factor =  (N1 * C1 * T1 / N2 * C2 * T2) = 1.60/1.12 = 1.428… = 1.43 (rounded off)

Here N1 = N2 = 100, T1 = T2 = 1/(2 GHZ)

C1 = CPI of X1 and C2 = CPI of X2.

 

edited by
Answer:

Related questions

0 votes
0 votes
0 answers
1
rishabhdevsingh1 asked Nov 10, 2018
687 views
consider a5 stage pipeline processor, 20% load instructions, 25% branches, 20% stores, 20% of all instructions are data dependent on instructions in front of them and bra...
4 votes
4 votes
0 answers
2
jayadev asked Feb 3, 2022
386 views
Consider a 5—stage pipeline processor used to execute 200 number of instructions and among those 100 instructions cause 3 stall cycles each. What is the total cycles re...
0 votes
0 votes
0 answers
3
Ritabrata Dey asked May 21, 2019
389 views
How to find number of stall cycles and branch penalty & CPI in a branched instruction pipelining?