# GATE2017-1-50

8.1k views

Instruction execution in a processor is divided into 5 stages, Instruction Fetch (IF), Instruction Decode (ID), Operand fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively. A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementation of the processor are contemplated:

1. a naive pipeline implementation (NP) with 5 stages and
2. an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively.

The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is _________ .

retagged
1
why synchronous pipeline is assumed when total instruction are constant
0
As the delay for different stages are different so  the time for the first instruction for NP pipelined processor is

5+2+4+2+20+2+10+2+3=50

So time to execute 20 instructions will be

50+19*22=468

But here we are using the following formula

5+19*22=423

Why??
5

$Pipeline-1:$ $(1\times 4\times 22)+20+19\times 1\times 22=108+418=526ns$

$Pipeline-2:$ $(1\times 5\times 14)+12+19\times 1\times 14=82+266=348ns$

$Speedup=\dfrac{526}{348}=1.51$

Official answer key $1.49-1.52$

CASE 1

Stages $5,$ max delay $= 22\text{ (after adding buffer delay), number of instructions}= 20$

CASE 2

Stages $6,$ (since OF is split), max delay $= 14,\text{ number of instructions}=20$

So, execution time is $(K+N-1)\times \text{ Max delay}$

Speed Up $=\dfrac{528}{350}=1.508$ (Execution time case 1/Execution time case $2$)

edited
0

if u have written 1.50 also correct ans..bcz they mentioned Speedup (Correct to 2 decimal places)

0
Why we are not taking CPI=1 here ?
Because, A successful pipeline have CPI(Cycles Per Instruction)=1.
2

@Bikram Sir @Debashish @mcjoshi

What is meaning of " The speedup (correct to two decimal places) achieved by EP over NP "

Why is it EP/NP and not NP/EP? Plz tell me. I am doing wrong here.

Isnot speedup value less than 1?

Here speedup value greater than 1. So plz clarify my doubt here

12

" The speedup (correct to two decimal places) achieved by EP over NP "

means NP/ EP .

it means time taken without EP / Time taken with EP = speedup

because speed up is defined as

=  average instruction  time taken without pipeline /  average instruction time taken with pipeline

so that's why it is NP / EP and NOT EP/NP .

where , NP = Naive Pipeline, EP = Effective Pipeline .

4
Why hadn't we use here this formula

Execution time = (5+4+20+10+3)+4*2+(20-1)*(22)
0
Why are we considering total time here instead of average time??

@bikram sir  as you defined speed up as

Speed up= average time without pipeline/ average time with pipeline

Do we have to consider total time while comparing or calculating spped up of two pipelines always??
0

speedup is a number that measures the relative performance of two systems processing the same problem

0
why is it n-1 and not just n in the formula?
0

@, check only for 3 instruction with general pipeline approach. you will understand that why it is not n.

2

the processor requires buffering between each pair of consecutive stages with a delay of 2 ns.

So for 5 stages, we will have 4 buffers and for 6 stages, we have 5 buffers. The final ratio should be 526/348 = 1.51. It doesn't change the answer by much though

0
if we use same method for Ep;

$Execution time of ep(Tep) = (5+2+4+2+12+2)_{execution-of-one instruction-till-max-delay-segment} + ( 20-1)\times(12+2)_{time-for-executing-remaining-12ns-segments} + (8+2+10+2+3+2)_{time-to-execute-remaining-segments-in-lst-instruction}$

Tep = 318ns

Similarly Tnp = 5+4+20+10+3+2*4+(20-1)*(20+2) = 468ns

$speedup = \frac{468}{318} = 1.47$

NP=5,4,20,10,3    ,latch=2  ,clock cycle time=20+2=22

EP=5,4,12,8,10,3 ,latch=2 ,clock cycle time=12+2=14

For 20 instructions

NP=(5+19)*22=528

EP=(6+19)*14=350

speedup=NP/EP=528/350=1.508

edited
0
I rounded it off to 1.51. Will my answer be correct?
2

i have written 1.50 in exam.

i think they will have range from 1.50 to 1.51 like below gateoverflow key,so don't worry.

### Case -1 Naive Pipeline

Since the cycle time is chosen as the largest stage time, so here the cycle time would be 20ns

We are given that there is an interstage delay of 2ns after each stage. So effectively each stage would take 22ns.  (Except the last stage of last instruction being executed)

So, let us choose the cycle time as 22ns

So, 20 instructions would take

[1st instruction x (20ns max stage time + 2ns stage delay) x total stages] +
[Rest 19 instructions x (20ns max stage + 2ns stage delay)] - 2ns (as last stage would not take buffering time)
[22 x 5] + [19 x 22 ] - 2
110 + 418 - 2
526

### Case - 2 Efficient Pipeline

Here cycle time is chosen as the largest stage time (i.e of stage OF2), so here the cycle time would be 12ns

We are given that there is an interstage delay of 2ns after each stage. So effectively each stage would take 14ns.  (Except the last stage of last instruction being executed)

So, let us choose the cycle time as 22ns

So, 20 instructions would take

[1st instruction x (12ns max stage time + 2ns stage delay) x total stages] +
[Rest 19 instructions x (12ns max stage + 2ns stage delay)] - 2ns (as last stage would not take buffering time)
[14 x 6] + [19 x 14 ] - 2
84 + 266 - 2
348

Speedup is given by

Time taken without EP / Time taken with EP
526 / 348
1.51

edited
0
22ns isnot for easy calculation.

it is because cycle time of synchronous pipeline=the stage with max  time + buffer time
0
No. In reality 20ns will be cycle time and stage delay (buffer) would be 2ns.

Just in case the question has asked for Frequency, then I found have found not using 1 / 20ns = 50 Mhz

and not 1/22 = 45 Mhz

Right ?
0
why do u think 1/22 ns. is not correct?
0
Sorry.. I gave a thought on it.. It has to be 22ns. Delay of 2ns has to be added to 20ns. This is because each stage must complete in a single clock cycle. If I take as 20 ns, then after 20ns, other cycle will start and would only end after 20ns. So next stage would be able to start at 40th ns.

This is not desired and this is not intended. So, I was wrong. Thanks for pointing out
0
why are we considering naive pipeline answer in numerator of speedup formula?

Speedup= avg. Instruction execution time without pipeline/avg.instruction execution time with pipeline

naive pipeline has pipeline execution and we want without pipeline in numerator.??

For Naive pipelined CPU

K = 5, Tseg = max(5,4,20,10,3) + 2(delay) = 22 ns, n = 20.

Total time needed for 20 instructions

TNP =(k+n-1) x Tseg = (5 + 20 –1) x 22 ns = 24 x 22 ns = 528 ns

For Efficient pipelined processor

K = 6,Tseg = max(5,4,12,2,10,3) + 2(delay) = 14 ns; k = 6, n = 20

Total time for 20 instructions

TEP =(k+n-1) x Tseg = (6 + 20 – 1)  x 14 ns = 350 ns.

Speed up =528/350 =1.508

Naive pipeline Implementation:

IF                      ID                     OF                     EX                 WB

5ns                  4ns                20ns                  10ns             3ns

delay = 2 ns

Phase time =largest stage delay + delay = 20 + 2  =22 ns

Time to Execute N instructions in pipelined  processor = k * phase time + (N-1)  * phase time                      k=number of stages

= 5 * 22  + 19 * 22

= 528 ns

Similarly, Efficient pipeline Implementation:

IF                      ID                     OF1               OF2                     EX                 WB

5ns                  4ns                      12ns             8ns                     10ns                3ns

delay = 2 ns

Phase time =largest stage delay + delay = 12 + 2  =14 ns

Time to Execute N instructions in pipelined  processor = k * phase time + (N-1)  * phase time                      k=number of stages

= 6 *14   +  19 * 14

= 350 ns

The speedup (correct to two decimal places) achieved by EP over NP =  528 /350 = 1.50
The exact answer is 528/350 = 1.50

In GATE 2017 official answer key, answer is given in range 1.49 to 1.52.

Question no. 50(cs set 1)

edited
Naive Pipeline implementation:
The stage delays are 5, 4, 20, 10 and 3. And buffer delay = 2ns
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 20, 10,3)+2
= 20+2
= 22ns
Execution time for n-instructions in a pipeline with k-stages = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case execution time for 20 instructions in the pipeline with 5-stages
= (5+20-1)*22ns
= 24*22
= 528ns
Efficient Pipeline implementation:
OF phase is split into two stages OF1, OF2 with execution times of 12ns, 8ns
New stage delays in this case = 5, 4, 12, 8, 10, 3
Buffer delay is the same 2ns.
So clock cycle time = max of stage delays + buffer delay
= max(5, 4, 12, 8, 10,3) + 2
= 12+2
= 14ns
Execution time = (k+n-1) clock cycles
= (k+n-1)* clock cycle time
In this case no. of pipeline stages, k = 6
No. of instructions = 20
Execution time = (6+20-1)*14 = 25*14 = 350ns
Speed up of Efficient pipeline over native pipeline
= Naive pipeline execution time / efficient pipeline execution time
= 528 / 350
≌ 1.51

## Related questions

1
10.2k views
A cache memory unit with capacity of $N$ words and block size of $B$ words is to be designed. If it is designed as a direct mapped cache, the length of the TAG field is 10 bits. If the cache unit is now designed as a 16-way set-associative cache, the length of the TAG field is ____________ bits.
Consider a $2-$way set associative cache with $256$ blocks and uses $LRU$ replacement. Initially the cache is empty. Conflict misses are those misses which occur due to the contention of multiple blocks for the same cache set. Compulsory misses occur due to first time access ... $10$ times. The number of conflict misses experienced by the cache is _________ .
Consider a RISC machine where each instruction is exactly $4$ bytes long. Conditional and unconditional branch instructions use PC-relative addressing mode with Offset specified in bytes to the target location of the branch instruction. Further the Offset is always with respect to ... $i,$ then the decimal value of the Offset is ____________ .
Consider a two-level cache hierarchy with $L1$ and $L2$ caches. An application incurs $1.4$ memory accesses per instruction on average. For this application, the miss rate of $L1$ cache is $0.1$; the $L2$ cache experiences, on average, $7$ misses per $1000$ instructions. The miss rate of $L2$ expressed correct to two decimal places is ________.