search
Log In
41 votes
13.4k views

Consider a $4$ stage pipeline processor. The number of cycles needed by the four instructions $I1, I2, I3, I4$ in stages $S1, S2, S3, S4$ is shown below:

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {$S _1$} &\textbf {$S _2$} & \textbf {$S _3$} &  \textbf{$S _4$ } \\\hline \textbf{I1}& \text{$2$} & \text{$1$}  & \text{$1$} & \text{$1$} \\\hline \textbf{I2} & \text{$1$} & \text{$3$} & \text{$2$}  & \text{$2$}\\\hline  \textbf{I3}& \text{$2$} & \text{$1$}  & \text{$1$} & \text{$3$} \\\hline \textbf{I4} & \text{$1$} & \text{$2$} & \text{$2$}  & \text{$2$} \\\hline  \end{array}$$

 

What is the number of cycles needed to execute the following loop?

For $(i=1$ to $2)$ {I1; I2; I3; I4;}

  1. $16$
  2. $23$
  3. $28$
  4. $30$
in CO and Architecture
edited by
13.4k views
1
If inter stage buffers are used, then 23 cycles are required, else answer will be 25 cycles.
0
If 25 would have been an option then what would you choose ?
0
Then the question should state whether interstage buffers are used or not, else the question will likely to be cancelled.
0
what if have to find speed gain for I2 instruction and for all overall instruction?
0

 @Arjun sir should loop level parallelism be taken by default or not?

0
Here if we consider loop level parallelism then it is taking 25 cycles but no option is matching so we must execute without loop level parallelism hence ans is 30 cycles
0
Thanks I got that was just asking if by default parallelism should be considered or not in the case where both had been given perhaps
1
It will be considered

7 Answers

53 votes
 
Best answer

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling $23$ cycles are required for execution. Therefore, correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

$$\scriptsize \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &C_1&C_2&C_3&C_4&C_5&C_6&C_7&C_8&C_9&C_{10}&C_{11}&C_{12}&C_{13}&C_{14}&C_{15}&C_{16}&C_{17}&C_{18}&C_{19}&C_{20}
&C_{21}&C_{22}&C_{23}\\\hline
\bf{I_1}&S_1&S_1&S_2&S_3&S_4\\\hline
\bf{I_2}&&&S_1&S_2&S_2&S_2&S_3& S_3&S_4&S_4\\\hline
\bf{I_3}&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4}&&&&&&S_1&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline
\bf{I_1}&&&&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4\\\hline
\bf{I_2}&&&&&&&&&S_1&\color{red}{-}&S_2&S_2&S_2&S_3& S_3&\color{red}{-}&S_4&S_4\\\hline
\bf{I_3}&&&&&&&&&&S_1&S_1&\color{red}{-}&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4} &&&&&&&&&&&&S_1&\color{red}{-}&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\end{array} $$


edited by
1

@Arjun sir

In the T22  & T23 cycle , I4 instruction second iteration, it should be S4 instead of 2.

0
Yes. It'll be corrected soon.
1
@Arjun sir @srestha

I am little bit confused now...

1)If we do it normally (considering as 8 instructions) 24 cycles

2)If no loop level parallelism : for iteration 1=15cycles, for iteration 2=15

Total 30 cycles

3)in option 23 (nearest to 24, less clock cycles hence better), 30 both are present. So what to choose?

4) should we answer a question in such a case?

Please help :(
1
How can we start the next iteration before the previous loop has got over? After each loop the condition has to be checked first and then only the next instruction can go. Please help me with this doubt.
3

@Prateek Since the loop is running for 2 times(i.e constant times), we can say that loop unrolling will be done by the compiler(loop optimization topic in compiler design). So, the code will be transformed as:

$I_1, I_2, I_3, I_4,I_1, I_2, I_3, I_4 $

And loop instructions, like GOTO will be removed (by the COMPILER), so there will be no jumps and branching.

9

If single stage buffer is used ... is the below way of solving correct ?

0
Got it. Thanks
0
Sir, for i=2 it will be like taking branch so all earlier excution will be flush out . so when you are going to start for i=2 start instruction cycle from 15. isnt it? so ans should be 30 otherwise 23 is correct.
0
When can/should we use multiple stage buffers?
1
If no option is given ans should be 24
right?
As, in i=2 in I2, S3 should start 1 stage later, than the pic shown
0
Yes @sresha it's $24$ following the way questions are being solved in this board.
1
How can we know that we can store multiple results  in the buffer.
5

@Arjun sir, @sushmita , @Ayush Upadhyaya

If it would have been numerical answer type..then what will be the answer??

When should we consider buffers as queue and when as single stage buffer ????
 

0
according to me, if loop condition is a constant one here i=1 to 2, we can use loop unrolling and the answer should be 23 cycles.

If the second iteration depends on execution of the last instruction then we should do accordingly without unrolling.
0

why is stage 1 of I4 happening at clock cycle C6?
shouldn't it start at C7?
otherwise I4 will occupy I3's stage 1 buffer?

 

@srestha

could u help here?

0

@aditi19

It is concept of multistage buffer

right??

chk here:https://gateoverflow.in/165929/doubt-in-pipelining-questions

0

@srestha

I have heard of interstage buffers

what is multistage buffer in pipeline?

0
means parallely many buffer working at a time.
0
are they same as interstage buffers?
0

@aditi19

There are some difference, but that should explicitly mentioned in question, and I think no question came like that , other than these two Rishabh Gupta mentioned. Is there any other question in came like this? Otherwise , that concept is not very clear to me too. We can do it just like normal pipeline.

20 votes
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations
0
how is this a loop level parallelism question?
14 votes

When using the following timeline we get 23 cycles:

 

In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).  

Hence , option (b) is true.


reshown by
0
There's some problem here rt? S3 stages of I1 and I2
0
S3 stage of I1 is 1 cycle and of I2 is 2 cycles as per question.
0
yes, I got confused with stages and instructions.
0
Pipelining, if suppose 1 instruction i1 has completed s1 stage but it cant go to s2,s2 is occupied by i0 instruction Then i2 came, can it enter first stage i mean if prev instruction hv completed 1st stage but not moved to next state.. Can new instruction be fetched in that cycle
0
no in a normal pipeline.
2

Ok,  in above diag,  in 6th cycle we are doing it introducing i4 while i3 hasn't  gone to next stage

In this ques if we take that approach,  we will get diff ans i think i4 will be introduced in 7th clock. 

  but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction. 

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here

0
Process-2 can't get into slot-9 untill it completes first time!
1

@Arjun sir,

Why loop enrolling is not used in https://gateoverflow.in/3690/gate2004-it-47

7 votes

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

0
In second iteration I2 completes in 20th cycle and not 18th cycle. So we have to go with 30 only as 25 is not present in options.
0
23 is not correct u will get 25 but as this is not in the options loop level parallelism wont work here and u have to go drctly by 2*15=30
6 votes

ans 23

0
Only after excution of I4 , the loop enter next iteration right?

For one iteration I get 15. So for two iterations it would be 30 only right?
0
Is there any reference to this concept in Hamacher or stallings or any other book?
0
@rahul sharma 5

This is loop level parallelism . you can read this topic .
0

@Anil Khatri @Nithish @rahul sharma 5 @Bikram What if we have only Single Stage Buffer

By default, we need to take Multiple Stage Buffer

0
What if we have only Single Stage Buffer ?

 

https://www.electronics-tutorials.ws/logic/logic_9.html
1

Thanks @Bikram Sir. All my doubts are cleared.

0
@Anil Khatri.

Your table is wrong. For I2, Stage 1 it is being scheduled at time = 8 and getting completed at t = 9.
2 votes

here is solution::

0
I4 cycle 6
is wrong filled
0 votes

http://www.techtud.com/example/consider-4-stage-pipeline-processor

i think this link will help you in much more clear way

Answer:

Related questions

17 votes
2 answers
1
5.1k views
Consider a $4$-way set associative cache (initially empty) with total $16$ cache blocks. The main memory consists of $256$ blocks and the request for memory blocks are in the following order: $0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155.$ Which one of the following memory block will NOT be in cache if LRU replacement policy is used? $3$ $8$ $129$ $216$
asked Sep 22, 2014 in CO and Architecture Kathleen 5.1k views
21 votes
4 answers
2
7.3k views
A CPU generally handles an interrupt by executing an interrupt service routine: As soon as an interrupt is raised. By checking the interrupt register at the end of fetch cycle. By checking the interrupt register after finishing the execution of the current instruction. By checking the interrupt register at fixed time intervals.
asked Sep 22, 2014 in CO and Architecture Kathleen 7.3k views
2 votes
0 answers
3
546 views
Consider this question and its selected answer: https://gateoverflow.in/3690/gate2004-it-47 And this question: https://gateoverflow.in/1314/gate2009-28 Both questions are somewhat similar. In the first one's answer, instruction $I_1$ (when i = 2), the $S_2$ stage is ... there any point that I am missing? PS. From where can I study this? Hamacher book doesn't contain pipelining in this much detail.
asked Nov 4, 2017 in CO and Architecture Rishabh Gupta 2 546 views
0 votes
1 answer
4
976 views
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 I1 S1 S1 S2 S3 S4 I2 S1 S2 S2 S2 S3 S3 S4 S4 I3 S1 S1 X S2 X S3 X S4 S4 S4 I4 S1 X S2 S2 S3 S3 X S4 S4 I1 S1 S1 S2 X X S3 X S4 I2 S1 X X S2 S2 S2 S3 S3 S4 S4 ... S4 S4 S4 I4 S1 X S2 S2 S3 S3 X S4 S4 Given ans is 23 and my ans 25.Where i'm going wrong? It is also gate previous year question :https://gateoverflow.in/1314/gate2009-28
asked Dec 31, 2016 in CO and Architecture Prajwal Bhat 976 views
...