The Gateway to Computer Science Excellence
First time here? Checkout the FAQ!
+26 votes

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

  S1 S2 S3 S4
I1 2 1 1 1
I2 1 3 2 2
I3 2 1 1 3
I4 1 2 2 2


What is the number of cycles needed to execute the following loop?

For (i=1 to 2) {I1; I2; I3; I4;}

  1. 16
  2. 23
  3. 28
  4. 30
asked in CO & Architecture by Veteran (59.4k points)
retagged by | 6.3k views
If inter stage buffers are used, then 23 cycles are required, else answer will be 25 cycles.
If 25 would have been an option then what would you choose ?
Then the question should state whether interstage buffers are used or not, else the question will likely to be cancelled.

9 Answers

+28 votes
Best answer

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling 23 cycles are required for execution. Therefore correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

answered by Loyal (5.6k points)
selected by

@Arjun sir

In the T22  & T23 cycle , I4 instruction second iteration, it should be S4 instead of 2.

Yes. It'll be corrected soon.
@Arjun sir @srestha

I am little bit confused now...

1)If we do it normally (considering as 8 instructions) 24 cycles

2)If no loop level parallelism : for iteration 1=15cycles, for iteration 2=15

Total 30 cycles

3)in option 23 (nearest to 24, less clock cycles hence better), 30 both are present. So what to choose?

4) should we answer a question in such a case?

Please help :(
How can we start the next iteration before the previous loop has got over? After each loop the condition has to be checked first and then only the next instruction can go. Please help me with this doubt.

@Prateek Since the loop is running for 2 times(i.e constant times), we can say that loop unrolling will be done by the compiler(loop optimization topic in compiler design). So, the code will be transformed as:

$I_1, I_2, I_3, I_4,I_1, I_2, I_3, I_4 $

And loop instructions, like GOTO will be removed (by the COMPILER), so there will be no jumps and branching.


If single stage buffer is used ... is the below way of solving correct ?

Got it. Thanks
Sir, for i=2 it will be like taking branch so all earlier excution will be flush out . so when you are going to start for i=2 start instruction cycle from 15. isnt it? so ans should be 30 otherwise 23 is correct.
When can/should we use multiple stage buffers?
If no option is given ans should be 24
As, in i=2 in I2, S3 should start 1 stage later, than the pic shown
+16 votes
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations
answered by Active (2.5k points)
+12 votes

When using the following timeline we get 23 cycles:



In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).  

Hence , option (b) is true.

answered by Loyal (5.9k points)
reshown by
There's some problem here rt? S3 stages of I1 and I2
S3 stage of I1 is 1 cycle and of I2 is 2 cycles as per question.
yes, I got confused with stages and instructions.
Pipelining, if suppose 1 instruction i1 has completed s1 stage but it cant go to s2,s2 is occupied by i0 instruction Then i2 came, can it enter first stage i mean if prev instruction hv completed 1st stage but not moved to next state.. Can new instruction be fetched in that cycle
no in a normal pipeline.

Ok,  in above diag,  in 6th cycle we are doing it introducing i4 while i3 hasn't  gone to next stage

In this ques if we take that approach,  we will get diff ans i think i4 will be introduced in 7th clock. 

  but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction. 

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here


Process-2 can't get into slot-9 untill it completes first time!

@Arjun sir,

Why loop enrolling is not used in

+7 votes

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

answered by Boss (13.5k points)
In second iteration I2 completes in 20th cycle and not 18th cycle. So we have to go with 30 only as 25 is not present in options.
23 is not correct u will get 25 but as this is not in the options loop level parallelism wont work here and u have to go drctly by 2*15=30
+5 votes

ans 23

answered by Active (2.4k points)
Only after excution of I4 , the loop enter next iteration right?

For one iteration I get 15. So for two iterations it would be 30 only right?
Is there any reference to this concept in Hamacher or stallings or any other book?
@rahul sharma 5

This is loop level parallelism . you can read this topic .
+2 votes

here is solution::

answered by Loyal (8.6k points)
I4 cycle 6
is wrong filled
0 votes

i think this link will help you in much more clear way

answered by (445 points)
–1 vote
Correct answer is 23 Cycles. here loop level parallelism used.
answered by (265 points)
–2 votes

As in the given question we are having instruntion with the given cycle per section so we have to calculate the total clock.

The given loop is executing 2 times so it will be double of what we get in 1st LOOP. 1 Loop requires 15 cycle. Loop is excuting 2 times so cycle required will be = 2*15 = 30

So Correct answer is 30 (D).

 loop execution for i=0


answered by (191 points)

Related questions

Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true

35,528 questions
42,803 answers
42,166 users