The Gateway to Computer Science Excellence
First time here? Checkout the FAQ!
+28 votes

Consider a $4$ stage pipeline processor. The number of cycles needed by the four instructions $I1, I2, I3, I4$ in stages $S1, S2, S3, S4$ is shown below:

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {$S _1$} &\textbf {$S _2$} & \textbf {$S _3$} &  \textbf{$S _4$ } \\\hline \textbf{I1}& \text{$2$} & \text{$1$}  & \text{$1$} & \text{$1$} \\\hline \textbf{I2} & \text{$1$} & \text{$3$} & \text{$2$}  & \text{$2$}\\\hline  \textbf{I3}& \text{$2$} & \text{$1$}  & \text{$1$} & \text{$3$} \\\hline \textbf{I4} & \text{$1$} & \text{$2$} & \text{$2$}  & \text{$2$} \\\hline  \end{array}$$


What is the number of cycles needed to execute the following loop?

For $(i=1$ to $2)$ {I1; I2; I3; I4;}

  1. $16$
  2. $23$
  3. $28$
  4. $30$
asked in CO & Architecture by Veteran (52k points)
edited by | 9.3k views
If inter stage buffers are used, then 23 cycles are required, else answer will be 25 cycles.
If 25 would have been an option then what would you choose ?
Then the question should state whether interstage buffers are used or not, else the question will likely to be cancelled.
what if have to find speed gain for I2 instruction and for all overall instruction?

 @Arjun sir should loop level parallelism be taken by default or not?

Here if we consider loop level parallelism then it is taking 25 cycles but no option is matching so we must execute without loop level parallelism hence ans is 30 cycles
Thanks I got that was just asking if by default parallelism should be considered or not in the case where both had been given perhaps
It will be considered

8 Answers

+37 votes
Best answer

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling $23$ cycles are required for execution. Therefore, correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

$$\small \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &C_1&C_2&C_3&C_4&C_5&C_6&C_7&C_8&C_9&C_{10}&C_{11}&C_{12}&C_{13}&C_{14}&C_{15}&C_{16}&C_{17}&C_{18}&C_{19}&C_{20}
\bf{I_2}&&&S_1&S_2&S_2&S_2&S_3& S_3&S_4&S_4\\\hline
\bf{I_3}&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_2}&&&&&&&&&S_1&\color{red}{-}&S_2&S_2&S_2&S_3& S_3&\color{red}{-}&S_4&S_4\\\hline
\bf{I_3}&&&&&&&&&&S_1&S_1&\color{red}{-}&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4} &&&&&&&&&&&&S_1&\color{red}{-}&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\end{array} $$

answered by Loyal (5.8k points)
edited by

@Prateek Since the loop is running for 2 times(i.e constant times), we can say that loop unrolling will be done by the compiler(loop optimization topic in compiler design). So, the code will be transformed as:

$I_1, I_2, I_3, I_4,I_1, I_2, I_3, I_4 $

And loop instructions, like GOTO will be removed (by the COMPILER), so there will be no jumps and branching.


If single stage buffer is used ... is the below way of solving correct ?

Got it. Thanks
Sir, for i=2 it will be like taking branch so all earlier excution will be flush out . so when you are going to start for i=2 start instruction cycle from 15. isnt it? so ans should be 30 otherwise 23 is correct.
When can/should we use multiple stage buffers?
If no option is given ans should be 24
As, in i=2 in I2, S3 should start 1 stage later, than the pic shown
Yes @sresha it's $24$ following the way questions are being solved in this board.
How can we know that we can store multiple results  in the buffer.

@Arjun sir, @sushmita , @Ayush Upadhyaya

If it would have been numerical answer type..then what will be the answer??

When should we consider buffers as queue and when as single stage buffer ????

according to me, if loop condition is a constant one here i=1 to 2, we can use loop unrolling and the answer should be 23 cycles.

If the second iteration depends on execution of the last instruction then we should do accordingly without unrolling.
+19 votes
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations
answered by Active (2.6k points)
how is this a loop level parallelism question?
+13 votes

When using the following timeline we get 23 cycles:


In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).  

Hence , option (b) is true.

answered by Active (5k points)
reshown by
There's some problem here rt? S3 stages of I1 and I2
S3 stage of I1 is 1 cycle and of I2 is 2 cycles as per question.
yes, I got confused with stages and instructions.
Pipelining, if suppose 1 instruction i1 has completed s1 stage but it cant go to s2,s2 is occupied by i0 instruction Then i2 came, can it enter first stage i mean if prev instruction hv completed 1st stage but not moved to next state.. Can new instruction be fetched in that cycle
no in a normal pipeline.

Ok,  in above diag,  in 6th cycle we are doing it introducing i4 while i3 hasn't  gone to next stage

In this ques if we take that approach,  we will get diff ans i think i4 will be introduced in 7th clock. 

  but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction. 

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here

Process-2 can't get into slot-9 untill it completes first time!

@Arjun sir,

Why loop enrolling is not used in

+7 votes

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

answered by Boss (13.2k points)
In second iteration I2 completes in 20th cycle and not 18th cycle. So we have to go with 30 only as 25 is not present in options.
23 is not correct u will get 25 but as this is not in the options loop level parallelism wont work here and u have to go drctly by 2*15=30
+6 votes

ans 23

answered by Active (2.4k points)
Only after excution of I4 , the loop enter next iteration right?

For one iteration I get 15. So for two iterations it would be 30 only right?
Is there any reference to this concept in Hamacher or stallings or any other book?
@rahul sharma 5

This is loop level parallelism . you can read this topic .
+2 votes

here is solution::

answered by Loyal (9.7k points)
I4 cycle 6
is wrong filled
0 votes

i think this link will help you in much more clear way

answered by Active (5k points)
–1 vote
Correct answer is 23 Cycles. here loop level parallelism used.
answered by (429 points)

Related questions

Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true
49,535 questions
54,120 answers
71,034 users