41 votes

Consider a $4$ stage pipeline processor. The number of cycles needed by the four instructions $I1, I2, I3, I4$ in stages $S1, S2, S3, S4$ is shown below:

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {$S _1$} &\textbf {$S _2$} & \textbf {$S _3$} & \textbf{$S _4$ } \\\hline \textbf{I1}& \text{$2$} & \text{$1$} & \text{$1$} & \text{$1$} \\\hline \textbf{I2} & \text{$1$} & \text{$3$} & \text{$2$} & \text{$2$}\\\hline \textbf{I3}& \text{$2$} & \text{$1$} & \text{$1$} & \text{$3$} \\\hline \textbf{I4} & \text{$1$} & \text{$2$} & \text{$2$} & \text{$2$} \\\hline \end{array}$$

What is the number of cycles needed to execute the following loop?

For $(i=1$ to $2)$ {I1; I2; I3; I4;}

- $16$
- $23$
- $28$
- $30$

0

Then the question should state whether interstage buffers are used or not, else the question will likely to be cancelled.

0

Here if we consider loop level parallelism then it is taking 25 cycles but no option is matching so we must execute without loop level parallelism hence ans is 30 cycles

53 votes

Best answer

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling $23$ cycles are required for execution. Therefore, correct answer would be **(B).**

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

$$\scriptsize \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &C_1&C_2&C_3&C_4&C_5&C_6&C_7&C_8&C_9&C_{10}&C_{11}&C_{12}&C_{13}&C_{14}&C_{15}&C_{16}&C_{17}&C_{18}&C_{19}&C_{20}

&C_{21}&C_{22}&C_{23}\\\hline

\bf{I_1}&S_1&S_1&S_2&S_3&S_4\\\hline

\bf{I_2}&&&S_1&S_2&S_2&S_2&S_3& S_3&S_4&S_4\\\hline

\bf{I_3}&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&S_4& S_4&S_4\\\hline

\bf{I_4}&&&&&&S_1&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\bf{I_1}&&&&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4\\\hline

\bf{I_2}&&&&&&&&&S_1&\color{red}{-}&S_2&S_2&S_2&S_3& S_3&\color{red}{-}&S_4&S_4\\\hline

\bf{I_3}&&&&&&&&&&S_1&S_1&\color{red}{-}&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&S_4& S_4&S_4\\\hline

\bf{I_4} &&&&&&&&&&&&S_1&\color{red}{-}&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\end{array} $$

1

@Arjun sir

In the T_{22 } & T_{23} cycle , I4 instruction second iteration, it should be S4 instead of 2.

1

@Arjun sir @srestha

I am little bit confused now...

1)If we do it normally (considering as 8 instructions) 24 cycles

2)If no loop level parallelism : for iteration 1=15cycles, for iteration 2=15

Total 30 cycles

3)in option 23 (nearest to 24, less clock cycles hence better), 30 both are present. So what to choose?

4) should we answer a question in such a case?

Please help :(

I am little bit confused now...

1)If we do it normally (considering as 8 instructions) 24 cycles

2)If no loop level parallelism : for iteration 1=15cycles, for iteration 2=15

Total 30 cycles

3)in option 23 (nearest to 24, less clock cycles hence better), 30 both are present. So what to choose?

4) should we answer a question in such a case?

Please help :(

1

How can we start the next iteration before the previous loop has got over? After each loop the condition has to be checked first and then only the next instruction can go. Please help me with this doubt.

3

@Prateek Since the loop is running for 2 times(i.e constant times), we can say that **loop unrolling** will be done by the compiler(loop optimization topic in compiler design). So, the code will be transformed as:

$I_1, I_2, I_3, I_4,I_1, I_2, I_3, I_4 $

And loop instructions, like GOTO will be removed (by the COMPILER), so there will be no jumps and branching.

0

Sir, for i=2 it will be like taking branch so all earlier excution will be flush out . so when you are going to start for i=2 start instruction cycle from 15. isnt it? so ans should be 30 otherwise 23 is correct.

1

If no option is given ans should be 24

right?

As, in i=2 in I2, S3 should start 1 stage later, than the pic shown

right?

As, in i=2 in I2, S3 should start 1 stage later, than the pic shown

5

@Arjun sir, @sushmita , @Ayush Upadhyaya

If it would have been numerical answer type..then what will be the answer??

When should we consider buffers as queue and when as single stage buffer ????

0

according to me, if loop condition is a constant one here i=1 to 2, we can use loop unrolling and the answer should be 23 cycles.

If the second iteration depends on execution of the last instruction then we should do accordingly without unrolling.

If the second iteration depends on execution of the last instruction then we should do accordingly without unrolling.

0

why is stage 1 of I4 happening at clock cycle C6?

shouldn't it start at C7?

otherwise I4 will occupy I3's stage 1 buffer?

could u help here?

0

It is concept of multistage buffer

right??

chk here:https://gateoverflow.in/165929/doubt-in-pipelining-questions

0

There are some difference, but that should explicitly mentioned in question, and I think no question came like that , other than these two Rishabh Gupta mentioned. Is there any other question in came like this? Otherwise , that concept is not very clear to me too. We can do it just like normal pipeline.

20 votes

this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations

14 votes

When using the following timeline we get **23** cycles:

In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).

Hence , option (b) is true.

0

Pipelining, if suppose 1 instruction i1 has completed s1 stage but it cant go to s2,s2 is occupied by i0 instruction Then i2 came, can it enter first stage i mean if prev instruction hv completed 1st stage but not moved to next state.. Can new instruction be fetched in that cycle

2

Ok, in above diag, in 6th cycle we are doing it introducing i4 while i3 hasn't gone to next stage

In this ques if we take that approach, we will get diff ans i think i4 will be introduced in 7th clock.

but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction.

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here

7 votes

Concept for solving this question:

Stage S_{i} is allocated to instruction I_{i }iff

1. Eligibility: I_{i} must have completed S_{i-1} ... X time

2. Availability: I_{i-1 }must have released S_{i }... Y time

Time of allocation = Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

6 votes

0

Only after excution of I4 , the loop enter next iteration right?

For one iteration I get 15. So for two iterations it would be 30 only right?

For one iteration I get 15. So for two iterations it would be 30 only right?

0

@Anil Khatri @Nithish @rahul sharma 5 @Bikram What if we have only **Single Stage Buffer**?

By default, we need to take **Multiple Stage Buffer**?

0 votes

http://www.techtud.com/example/consider-4-stage-pipeline-processor

i think this link will help you in much more clear way