6.3k views
An instruction pipeline has five stages, namely, instruction fetch (IF), instruction decode and register fetch (ID/RF), instruction execution (EX), memory access (MEM), and register writeback (WB) with stage latencies $1$ ns, $2.2$ ns, $2$ ns, $1$ ns, and $0.75$ ns, respectively (ns stands for nanoseconds). To gain in terms of frequency, the designers have decided to split the ID/RF stage into three stages (ID, RF1, RF2) each of latency $2.2/3$ ns. Also, the EX stage is split into two stages (EX1, EX2) each of latency $1$ ns. The new design has a total of eight pipeline stages. A program has $20\%$ branch instructions which execute in the EX stage and produce the next instruction pointer at the end of the EX stage in the old design and at the end of the EX2 stage in the new design. The IF stage stalls after fetching a branch instruction until the next instruction pointer is computed. All instructions other than the branch instruction have an average CPI of one in both the designs. The execution times of this program on the old and the new design are $P$ and $Q$ nanoseconds, respectively. The value of $P/Q$ is __________.

edited | 6.3k views
+4

Is this solution from gateforum correct? I couldn't understand how stall clocks are 2 and 5 they should be 1 and 4 instead. Please explain

0
2nd last line shud be  Q = (.8 + .12) * 1ns = 2 ns . else correct.
+1
Thanks
0

For explanation with diagrams see this thread

0
This is absolutely correct.You look at the fact that in first design pipeline will be stalled for 2 stages where as in second design pipeline will be stalled for 5 stages due to intermediate breakdown of stages.
+1
1 mark should be awarded for reading the question only.. XD
0
Do u think , GATE is such easy walk XD

Five stages:

(IF), instruction decode and register fetch (ID/RF),

instruction execution (EX),

memory access (MEM), and register writeback (WB)

P old  design:

with stage latencies $\text{1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns}$

$\text{MAX( 1 ns, 2.2 ns, 2 ns, 1 ns, and 0.75 ns) = 2.2nsec}$

AVG instruction execution time is

$\text{Tavg=(1+no of stalls$\times $branch penality)$\times $cycle time}$

$=(1+0.20\times 2)2.2$  { branch peanlity is $2$ because the next instruction
pointer at the end of the EX stage in the old design.}

$=3.08 \text{ nsec}$

Q :new DESIGN:

the designers decided to split the ID/RF stage into three stages $\text{(ID, RF1, RF2)}$
each of latency $\dfrac{2.2}{3}\text{ ns}$. Also, the $EX$ stage is split into two stages
$\text{(EX1, EX2)}$ each of latency $1\text{ ns}$.
The new design has a total of eight pipeline stages.

Time of stages in new design $=\text{{1 ns, 0.73ns, 0.73ns, 0.73ns , 1ns,1ns, 1 ns, and 0.75 ns}}$

(IF), instruction decode

register fetch (ID/RF) $\rightarrow$ further divided into $3$ ie with latency $0.73$ of each

instruction execution (EX) $\rightarrow$ further divided int $1\text{ nsec}$ of each)

memory access (MEM)

register writeback (WB)

$\text{MAX( 1 ns, 0.73ns, 0.73ns, 0.73ns , 1ns,1ns, 1 ns, and 0.75 ns) =1 nsec}$

AVG instruction execution time is

$\text{Tavg=(1+no of stalls$\times $branch penality)$\times $cycle time}$

$=(1+0.20\times 5)1$ { branch penalty is $5$ because the next instruction pointer
at the end of the $EX2$ stage in the new design.}

$=2 \text{nsec}$

final result

$\dfrac{P}{Q}=\dfrac{3.08}{2}=1.54$

by Boss (19.9k points)
edited
0
I think in both case,branch penalty is  .2...and ni of stalls in first case 2,and second case 5..
0

in both case,no of stalls is same 0.2  ....and BRANCH PENALTY  in first case 2,and second case 5.. rt???

0
@Arjun  sir.

as stall is a stage in pipeline..that is waiting stage..so how it is in fraction??
0

Could anyone will tell how to calculate (the no. of stall) ?

+3

@ashiwa see this, 0
Their is a doubt, I just can't understand in the formula of Tavg as u used above why we are adding 1 in the no. Of stalls ...pls explain it coz it is confusing me a lot
0
if you see clearly he had take 2.2 as common. so, actually he is adding 2.2(first case) and 1(second case) not 1
0

WHY ISNT THE ANSWER LIKE THIS ? Difference is marked in the equation
** Since only 80% instructions have CPI=1

My interpretation is as follows. (It arrives at a different answer for the question)
P old  design:
Tavg  P =(1+no of stalls×branch penality)×cycle time =(0.8+0.20×2)2.2

Q :new DESIGN:
Tavg  Q =(1+no of stalls×branch penality)×cycle time =(0.8+0.20×5)1

Hence P/Q = 1.4667
cpi for first case $=2.2(1+2\times .2)$ as the stall required is $2$ and $2.2$
is the maximum stage delay.

cpi for second state $=1\times (1+5\times .2)$ as now stall increase to $5$ as
there are five stages before the address is calculated and the maximum stage
delay now is $1.$

$\dfrac{\text{cpu_time1}}{\text{cpu_time2}}=\dfrac{3.08}{2}=1.54$
by Active (3.8k points)
edited
+1
plzz explain how to find stall cycles here ??
0
to calculate stalls in question you have to  find the branch instructions . because branch instruction  cause the non overlapping of the subsequent instruction due to which waiting is there hence stalls occurs .!
0

@Arpit : The approach is logically correct, but shouldn't the maximum stage delay for the new design be 3ns instead of 1ns since it has been explicitly mentioned in the new design as per the question :

"split the ID/RF stage into three stages (ID, RF1, RF2) each of latency 2.2/3 ns" ?

So we should get the ratio as : 2.2*(1 + 0.2*2 ) / [3*(1 + 0.2*5)] = 0.5133.  (which is not really an improvement w.r.t the old design.)

0
In the first design, total no. of stages before EX is two and branch penalty is also two. Similarly in the second design total no. of stages before EX2 is 5 and branch penalty is also 5. Is this a general identity?
0
How  to find s tall cycle ?? Im getting stall cycles as 3 and 6
+7

Video Solution:

(Explained very nicely!)

and he calculated 3 and 6 CPI for Branch instead of 2 and 5 CPI

0
@arpit

Hey i guess you have mentioned wrong formula for CPI

CPI= 1+stall frequency*no of stall cycle

avg execution time= CPI*cycle time   where cycle time =max(stage delay)+register delay

what you have calculated is average execution time.

I know this does not affect the solution but using wrong notation or formula causes confusion.
+1 vote
OLD DESIGN :

phase time = 2 ns branch penalty= 2 (since execution is 3rd stage penalty will be of 2 cycles) ,                                                 other than all branch instrucions CPI = 1

branch instrucion = 20% , other instruction = 80%

total execution time = CPI*Phase time

= (1+2*.2)*2ns

P = 3.08ns

NEW DESIGN :

phase time = 1ns branch penalty = 5 (since EX2 is 6th stage penalty will be of 5 cycles)

total execution time = (1+.2*5)*1ns

Q= 2 ns

therefor P/Q = 3.08/2 = 1.54ns
by Active (1.9k points)
+1 vote
Old design:

Two bubbles appear in the pipeline when a branch is taken. So clock cycles needed to complete the instruction after a branch is 3. Two cycles wasted by bubbles and one more to actually execute the instruction.

P = (0.8 * 1 + 0.2 * 3) * 2.2 = 3.08 ns

In new design, by the same reasoning, clock cycles for a branch instruction becomes 6. (5 bubbles + 1 )

Q = (0.8 * 1 + 0.2 * 6) * 1 = 2 ns

P/Q = 1.5
by (51 points) This might help.

by Active (4.7k points)