5.8k views

Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1, I2, I3, I4 in stages S1, S2, S3, S4 is shown below:

 S1 S2 S3 S4 I1 2 1 1 1 I2 1 3 2 2 I3 2 1 1 3 I4 1 2 2 2

What is the number of cycles needed to execute the following loop?

For (i=1 to 2) {I1; I2; I3; I4;}

1. 16
2. 23
3. 28
4. 30
retagged | 5.8k views
If inter stage buffers are used, then 23 cycles are required, else answer will be 25 cycles.
If 25 would have been an option then what would you choose ?
Then the question should state whether interstage buffers are used or not, else the question will likely to be cancelled.

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling 23 cycles are required for execution. Therefore correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

selected by

@Arjun sir

In the T22  & T23 cycle , I4 instruction second iteration, it should be S4 instead of 2.

Yes. It'll be corrected soon.
@Arjun sir @srestha

I am little bit confused now...

1)If we do it normally (considering as 8 instructions) 24 cycles

2)If no loop level parallelism : for iteration 1=15cycles, for iteration 2=15

Total 30 cycles

3)in option 23 (nearest to 24, less clock cycles hence better), 30 both are present. So what to choose?

4) should we answer a question in such a case?

How can we start the next iteration before the previous loop has got over? After each loop the condition has to be checked first and then only the next instruction can go. Please help me with this doubt.

@Prateek Since the loop is running for 2 times(i.e constant times), we can say that loop unrolling will be done by the compiler(loop optimization topic in compiler design). So, the code will be transformed as:

$I_1, I_2, I_3, I_4,I_1, I_2, I_3, I_4$

And loop instructions, like GOTO will be removed (by the COMPILER), so there will be no jumps and branching.

If single stage buffer is used ... is the below way of solving correct ?

Got it. Thanks
Sir, for i=2 it will be like taking branch so all earlier excution will be flush out . so when you are going to start for i=2 start instruction cycle from 15. isnt it? so ans should be 30 otherwise 23 is correct.
When can/should we use multiple stage buffers?
If no option is given ans should be 24
right?
As, in i=2 in I2, S3 should start 1 stage later, than the pic shown
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations

When using the following timeline we get 23 cycles:

In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).

Hence , option (b) is true.

reshown by
There's some problem here rt? S3 stages of I1 and I2
S3 stage of I1 is 1 cycle and of I2 is 2 cycles as per question.
yes, I got confused with stages and instructions.
Pipelining, if suppose 1 instruction i1 has completed s1 stage but it cant go to s2,s2 is occupied by i0 instruction Then i2 came, can it enter first stage i mean if prev instruction hv completed 1st stage but not moved to next state.. Can new instruction be fetched in that cycle
no in a normal pipeline.

Ok,  in above diag,  in 6th cycle we are doing it introducing i4 while i3 hasn't  gone to next stage

In this ques if we take that approach,  we will get diff ans i think i4 will be introduced in 7th clock.

but why cant we use that stage if its free?cant we just use intermediate buffers to store prev instruction output. We will require those buffers only at end of that stage for new instruction.

Ok so i hv read discussion regarding this doubt in below ans.. So in which cases we can use that particular stage.. As we are using here

Process-2 can't get into slot-9 untill it completes first time!

@Arjun sir,

Why loop enrolling is not used in https://gateoverflow.in/3690/gate2004-it-47

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

In second iteration I2 completes in 20th cycle and not 18th cycle. So we have to go with 30 only as 25 is not present in options.
23 is not correct u will get 25 but as this is not in the options loop level parallelism wont work here and u have to go drctly by 2*15=30

ans 23

Only after excution of I4 , the loop enter next iteration right?

For one iteration I get 15. So for two iterations it would be 30 only right?
Is there any reference to this concept in Hamacher or stallings or any other book?
@rahul sharma 5

This is loop level parallelism . you can read this topic .

here is solution::

I4 cycle 6
is wrong filled
–1 vote
Correct answer is 23 Cycles. here loop level parallelism used.

As in the given question we are having instruntion with the given cycle per section so we have to calculate the total clock.

The given loop is executing 2 times so it will be double of what we get in 1st LOOP. 1 Loop requires 15 cycle. Loop is excuting 2 times so cycle required will be = 2*15 = 30

So Correct answer is 30 (D).

loop execution for i=0