10,018 views

Consider a pipeline processor with $4$ stages $S1$ to $S4$. We want to execute the following loop:

for (i = 1; i < = 1000; i++)
{I1, I2, I3, I4} 

where the time taken (in ns) by instructions $I1$ to $I4$ for stages $S1$ to $S4$ are given below:

$$\begin{array}{|c|c|c|c|c|} \hline & \textbf {S _1} &\textbf {S _2} & \textbf {S _3} & \textbf{S _4 } \\\hline \textbf{I1}& \text{1} & \text{2} & \text{1} & \text{2} \\\hline \textbf{I2} & \text{2} & \text{1} & \text{2} & \text{1}\\\hline \textbf{I3}& \text{1} & \text{1} & \text{2} & \text{1} \\\hline \textbf{I4} & \text{2} & \text{1} & \text{2} & \text{1} \\\hline \end{array}$$

The output of  $I1$ for $i = 2$ will be available after

1. $\text{11 ns}$
2. $\text{12 ns}$
3. $\text{13 ns}$
4. $\text{28 ns}$

Wrt the question and the selected answer given here, why aren't we considering all the above 4 pipeline stages to be of the same duration, i.e., the duration having the maximum length of 2 ns ??

Is it because in that question, constant clocking rate has been asked to be considered ?

edited
@the_bob

If your doubt is cleared then please let me know the reason too. What I believe is if there are synchronous transfer / constant clocking rate terms are used then we need to use max(d1, d2, d3) where d1, d2, and d3 is the delay of stages S1, S2, S3 because then we need to have constant transfer for all the stages. And if those above-mentioned terms aren’t used then consider time for each stage of instruction, then using summation and the required formula we can easily calculate as we have done in this problem. If I am wrong then please correct me.

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {t1} & \textbf {t2} & \textbf {t3} & \textbf {t4} & \textbf {t5} & \textbf {t6} & \textbf {t7} & \textbf {t8} & \textbf {t9} & \textbf {t10} & \textbf {t11} & \textbf {t12} & \textbf {t13} \\\hline \textbf{I1}& \text{s _1} & \text{s _2} & \text{s _2} & \text{s _3} & \text{s _4} & \text{s _4} \\\hline \textbf{I2} & & \text{s _1} & \text{ s _1} & \text{ s _2} & \text{s _3} & \text{s _3} & \text{s _4}\\\hline \textbf{I3}& & & &\text{s_1} & \text{s_2} & \text{--}&\text{s_3} & \text{s _3} & \text{s _4}\\\hline \textbf{I4} & &&&&\text{s_1} & \text{s_1} & \text{s_2} & \text{--} & \text{s_3} & \text{s _3}& \text{s _4}\\\hline \textbf{I5} & &&&&&&\text{s_1} & \text{--} & \text{s_2} & \text{s _2} & \text{s_3} & \text{s _4}& \text{s _4}\\\hline \end{array}$$

So, total time would be $13\;ns$

Option (c).
by

You are right.
edited

The answer is 13 ns, but in the 2nd time for I1 ,S2 phase will not start at 8th cycle bcz S3 for I4 has not stated yet.So S2 will start at 9th cylcle and so both 9th and 10th cycle will be used by S2 of I1 (in the 2nd time) thus the total clock cycles still remain 13.

 1 2 3 4 5 6 7 8 9 10 11 12 13 I1 s1 s2 s2 s3 s4 s4 I2 s1 s1 s2 s3 s3 s4 I3 s1 s2 s3 s3 s4 I4 s1 s1 s2 s3 s3 s4 I1 s1 s2 s2 s3 s4 s4
yes, thats correct.
@Arjun @Uddipto

I read on some gate question here only that sufficient buffers can be assumed in the pipeline questions. So why can't we start S2 of I1 at 8th cycle?
Buffers are considered for operand forwarding.

Since nothing is mentioned about operand forwarding, we don't consider buffers here
edited by
Why we have put s1 stage of i4 instruction in 6th cycle when i3 instruction is waiting for s3 stage in that cycle Kkk got it as s1 stage  of  i4  depends  on s2 stage of  i3 and It has been  completed

Can some one tell me why loop unrolling concepts is not used here like in this one:-https://gateoverflow.in/1314/gate2009-28 .Can some one explain when to use loop enrolling and when not to.Although answer will be same but with loop unrolling for second iteration of I1 it can do work in 8th and 9th clock cycle and not in 9th and 10th clock cycle.Please help here

I am not able to understand why S2 started at 9th clock and not at 8th clock, I am not able to understand the reason behind it.
In pipeline execution, each stage produces output to a buffer and next stage takes it from there. So, unless the previous instruction goes to the next stage, the current instruction cannot execute a given stage. But this problem can be avoided if multiple buffers are used (or a queue).
Thanks Sir, got it now!

Sir but why are we not assuming the use of queue as in  https://gateoverflow.in/1314/gate2009-28

Because it doesnt make a difference here. As long as all stages are taking same cycles, we dont need a queue.

@rahul sharma 5

Even you consider loop level parallelism for it you will still get answer as 13ns.

@Arjun sir. In pipelined architecture, don't we consider the slowest stage time as the time fr each stage?

Then why don't we consider in this question that since the maximum time taken by any instruction in any stage is 2 units, therefore every stage will be synchronized with a clock so that each stage takes 2 unit time? I am having a lot of confusion in this subject. Can you please tell me some resource to study this from.

@Arjun sir,You mentioned

Because it doesnt make a difference here. As long as all stages are taking same cycles, we dont need a queue.

But here also different stage delays are there.I am getting same answer with both(with and without loop level enrollment),but it will not be same always as in https://gateoverflow.in/1314/gate2009-28

same doubt.

In the below example, it was said that "multiple stage buffers which can queue the result of the stages"

then in the second iteration s2 should start in 8 clk cycle. Is it ?

edited

When should we consider stage buffers as queue and when not ..because of which answers may be different here it is same but for this https://gateoverflow.in/1512/gate1999-13

BY DEFAULT ..we should consider a single buffer right not as queue right ???? ..that is unless prev instruction leaves stage and get into next stage current instruction can't enter into that stage

yes single stage buffer

Suppose $I_{1}$ Decode stage started, with that $I_{2}$ instruction Fetch stage can start
Can someone please tell where to study the concept of multistage bufffer

I have the same doubt. Did you get an answer @Rishabh Gupta 2 ? Slowest stage time for each stage is considered in this question: https://gateoverflow.in/1063/gate2004-69. So why are we not doing the same here? Arjun sir also mentioned in a comment for that answer that if no information about clock rate is given, we must assume uniform clock rate. Therefore, clock period for this question should be considered as 2ns. Please correct me if I am wrong

There’s mistake here although this is giving the correct ans but for I5 s2 shd be in t8 and t9 and not in t9 and t10. correct me if i am wrong

ans) 13

### 1 comment

why have you done I1 I2 I3 I4 and again I1 for each stage.

Silly mistake should be taken care of

Remember we have to find second iteration of Instruction I1 finish time.

by

### 1 comment

Actually you made a silly mistake here. Check the answers given by others, there will be a stall at 8th clock cycle when I1 is executing for second time.