13.2k views

Consider an instruction pipeline with five stages without any branch prediction:

Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are $\text{5 ns, 7 ns, 10 ns, 8 ns and 6 ns},$ respectively. There are intermediate storage buffers after each stage and the delay of each buffer is $1\ ns.$ A program consisting of $12$ instructions $\text{I1, I2, I3,}\ldots,\text{ I12}$ is executed in this pipelined processor. Instruction $\text{I4}$ is the only branch instruction and its branch target is $\text{I9}.$ If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

1.  $132$
2.  $165$
3.  $176$
4.  $328$

edited | 13.2k views
0

There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns.

What is the significance of this statement in given problem?

Where are we using this information?

0

Hi @Arjun Sir,

First of all thanks for sharing following example -->

JZ LOOP

Here we can know whether we have to do a Jump or not only after the Execute stage of the JZ (Jump on Zero) instruction. Similar is the case for any conditional branch instruction.

Good point but in case of unconditional branch i think till ID phase decision could be taken. Please correct me if i am wrong.

+1

Chhotu yes in case of unconditional branch by decoding itself we can take decision IFF the operand fetch phase is merged with the decode phase then we can directly get the location (target ) by decode phase itself hence once this decode phase is done we can flush the pipleline and restart from I9

0
I think following is the generalized formula. Anyone confirm.

No of cycles = No. of instructions actually appeared in the datapath + No of stalls + No. of stages - 1

$N = 8 (I_{1,2,3,4,9,10,11,12}) \ +\ 3 \ +\ 5 -1 = 15$

$=> T_n = 15*(10+1) = 165$
0
@Venkat Sai

if in question it is given that it is conditional branch instruction then no of instructions=9

if in question it is given that it is unconditional branch instruction then no of instruction =11

but since nothing given therefore we assume target address will be available in last stage only thefore no. of instructions=12

if i m wrong then please correct me
0
Why we assume branch decision on ex stage???

0

Why Instruction I8 is not considered in Instruction cycle ? The question should not mention so Please clear in topics Very deep.  @Arjun Sir

0

@ because it is the best case, that we come to know about branch decision in EX stage

0
Always count for branch instruction separately. first instruction=5*11 now remaining (I2 I3 I10 I11 I12 I4)=6*11 and at last branch instruction will take 4*11=(5+6+4)*11=165

After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is $10$ ns. Buffer delay given is 1ns. So, each stage takes $11$ ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be
$$15 \times 11 = 165$$
$$\small \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &\bf{t_1}&\bf{t_2}&\bf{t_3}&\bf{t_4}&\bf{t_5}&\bf{t_6}&\bf{t_7}&\bf{t_8}&\bf{t_9}&\bf{t_{10}}&\bf{t_{11}}&\bf{t_{12}}&\bf{t_{13}}&\bf{t_{14}}&\bf{t_{15}}\\ \hline \textbf{I1}&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I2}&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I3}&&&\text{FI}&\text{DI}&\text{FO} &\text{EI}&\text{WO}\\ \textbf{I4}&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ &&&&&\color{red}{\text{stall}}\\ &&&&&&\color{red}{\text{stall}}\\ &&&&&&&\color{red}{\text{stall}}\\ \textbf{I9}&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I10}&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I11}&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I12}&&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \hline\end{array}$$

Correct Answer: $B$
by Boss (16.1k points)
edited
0
For this kind of questions should we always draw space time diagram to solve?

Can't we use any direct formula?

Bit confused about when to use [1 + stall frequency*stall cycle]*d formula?

+3
Thing here is we don't know what is actually practical there in architecture- we are in abstract world and concrete world is kind of vague. So, we believe certain things given in book and understand them. If you happen to work in Intel or AMD, you will see how different things are - whatever we learn in book are outdated long back but they are still the basics. As far as GATE questions are concerned they usually come from the most common stuffs given in standard books- if they ask otherwise as has happened sometimes - just be happy that everyone will be wasting time on it, and you can solve other questions :P
+1

Actually im bit confused with this computer architecture ,

So much confusion with this . write through write back dma pipeline anyways thanx :)

+1

Actually to do with that- that is allow different stage delays with pipelining processor must use different clocks- which is not something common- I don't know if any exist. And doing so doesn't add any advantage also as the final output depends on the slowest stage anyway- it doesn't really make sense to do some stage faster as the slowest stage is there as bottle neck. So, for simplicity (I guess) all stages are done using the same clock.

http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/pipe_title.html

0

in the question it is not mentioned that whether it is a synchronous or asynchronous pipeline , how do  we conclude this is a synchronous pipeline ??

i read all the cmnts but still need help ?

0
Why operand forwarding is not applied..? Only when explicitly given in question to take into account operand forwarding then only it is applied.Is it so...?
+2
It is not mentioned as WHEN IS THE TARGET ADDRESS OF INSTRUCTION evaluated. Generally it is done in DI stage but here you have done that in Execution phase of  I4. Why so . Is it implicitly assumed here .if yes why ??
0
small doubt ... in the question it is not mentioned that whether it is a synchronous or asynchronous pipeline... so why are we taking the cycle time as 11ns (max stage delay) ?
+1

It is mentioned in the question itself
Consider an instruction pipeline with five stages "without any branch prediction"  i.e. target address is available after completion of instruction.

0
How can operand forwarding be applied here?
+1
yes sir, it cannot be applied. I had this doubt earlier. But now, i got this. :)
+1
"Without any branch prediction" should mean that target address is availabe only after the completion of the instruction. ie. after WO stage. Why isnt that considered here?
+2

Why should the WO stage be completed? After Execution stage, target is known- this is not prediction. In branch prediction, based on the history of the branch, further instructions will be fetched. During successful prediction, there will be no branch overhead. On a prediction failure, the pipeline must be flushed.

+1
Got it. Thanks. :)
0
we should have considered stalls due to stage delays itself right?

when EI and FO execute in parallel the pipeline stalls for 2ns. Same for EI and WO.
0
0
if EI stage is being executed in 1 instruction and FO in the next one parallely since delay of FO is 2ns more than EI pipeline will stall for 2ns
0

Why is the maximum delay being considered in Clk8, Clk 9, Clk 10 and Clk 15? Also why is it considered in Clk1, Clk2, Clk3?
{Since in those clock cycles, EI stage is not functioning, hence the time for those clock cycles should be taken as the maximum delay of the stages which are active in those cycles, isn't it?}

+1
It is not mentioned as WHEN IS THE TARGET ADDRESS OF INSTRUCTION evaluated. Generally it is done in DI stage but here you have done that in Execution phase of I4. Why so . Is it implicitly assumed here .if yes why ?? ans : pipeline with five stages "without any branch prediction" i.e. target address is available after completion of instruction. sir it means that I9 should be fetched after EI stage of I4 instruction.?? plzz clear my doubt..
0

sir , why  FI of I9 can start only after the EI of I4??

+1
JZ LOOP

Here we can know whether we have to do a Jump or not only after the Execute stage of the JZ (Jump on Zero) instruction. Similar is the case for any conditional branch instruction.
0
k sir, means in DI stage we can know only wt type of operation are involved in the instruction..? but in case of conditional branch instruction . BRANCh is know to be in EI stage only.?? am i right.??
+1
Yes.
0

In the given ans there would be stall for I8 too,plz clarify why only 3 stalls,.Plz explain someone.

0
Hi ,

How is that clock cycles are 15??
0
Instruction number 12 WO will be at T15 which is not shown due to insufficient space

so total number of clock =15
0

When the 1st instruction is in FI stage, why are we considering that it is taking 11 ns of time and not 7 ns, because FI is taking 7 ns? Do we assume a time of 11 ns for all the stages even though an instruction is not being executed in the stage of 11 ns?

+1
How to know when the branch address is resolved whether during Execute phase or Decode phase since it is not mentioned explicity in the question??
+1
@Gaurav yes, once pipelined all stages take same time or else synchronization of stages won't work.

@sushmita I assumed conditional branch since it is mentioned "branch was taken during execution" as for an uncoditional branch it will be taken everytime. And for conditional branch, condition will be evaluated only during EI stage.
0
Thanx Arjun sir.
0
@Arjun Sir
Why we have to wait for EX stage in conditional branch ?
Why can't we move branch decision logic to ID stage only in Conditional branches also ?
please see side 11 here http://courses.cs.vt.edu/cs2506/Spring2013/Notes/L13.BranchPrediction.pdf
0
@Arjun sir

Why we r not taking branch instruction in the EX stage as it is mentioned in the ques branch is taken during the execution state.
0
becz it is not given in the question
0
Sir, Why u are not showing about Instructions no. 5,6,7,8 what about these? what happen about these..
+3

Instruction I4 is the only branch instruction and its branch target is I9. If the branch is taken during the execution of this program

+1
if the question was like that the branch is NOT TAKEN would the execution finish at I8 or we will go till the I12 instruction?
+4
Total instruction executed will be 1 to 4 and then 9 to 12.Whihc is total 8.Now because of I4 is the branch and its target will be available in 4th stage.So 3 more instructions(NOP) will be fetched.Which means total 11 instructions to be executed by pipeline.

Now we know Time taken by pipeline will be k+n-1 cycles.which is 5+11-1=15 cycles

Cycle time is 11 ns ,so total time would be 11*15=165
0
Sir,

Can't we push I9 after decoding the instruction I4.Why should I wait till EI of I4?
0
Putting aside the options, what about the delay due to branch? wont 1.25nsec be added to 165nsec due to pipeline stall? Mr. @Arjun?
0

@Arjun vetran

Sir, In question there is not mention about operator forwarding.but still we fetch I9 in T8.I think it should be fetch in T9.

0
Pls Explain I8 are not  added stall Cycle if you have any reason ? Briefly Explain Anyone ?
0

gatecse Veteran:

in the question they did'nt mention in which state of I4 target address is available.

when without branch prediction given in question ,then we assume that target address available at the last stage but then stalls present will be 4.it also fetched I8 also.

0
Can somebody clarify what if the instruction is unconditional jump??
+1
@arjun sir how you saying branch address availiable after EI stage..?
0

@Nils

Fetch will happen in T8. Even if there is a RAW hazard between I4 and I9, I9 will get the updated data at decode phase of the instruction.

0
Why we are not using the stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively and taking only highest one?

At least for first Instruction we suppose to use it if its given.

first instruction will take 5+7+10+8+6=36 and 36+5 (buffer delay)= 41 ns and remaining instruction will take (upto 4 only) 11 ns * 3= 33. thus total time after 4th instruction execution 41+33 =74 ns.

similary, form 9 to 12 th instruction it will take 41 ns (for 9th) + 33 ns (from 10 to 12th)=74ns.

and thus total time required for execution 74 + 74 =148 ns (No option is given)

What I am saying here is that if they are giving time for each stage then why we are not using it and just solve it by assuming only highest delay.
0
It is because the highest delay in the pipeline is 10ns. What we are doing is we increase the delay of all other stages to 10ns to ensure continuous flow. now, every stage has a delay of 10ns. Refer the best answer above. He has explained it nicely.
+1

Can anyone guide me why can't we do like this ?

0
Its the clock period which determines when the instruction will move to next hardware in the datapath - Not the execution time of datapath hardwares. I hope you now understand why it is 10+1 = 11 ns.
0
why FI of I9 is not taken  at  T9?
0

we are not doing it like this becoz pipeline's time is decided by its slowest unit!!

if it is still not clear to u, then tell me ill give u an example;

0
HOW TO GET 165  I8 IS NOT COUNTED IN SOLUTION
TOTAL CYCLE ARE 16 ->16 *11=176 ANS PLS ELABORATE `

0
after completing the branch instruction you have to come back so then why the answer is b
0
hahah
+2

cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next.
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

by Boss (30.5k points)
0
In the given ans there would be stall for I8 too,plz clarify why only 3 stalls,.Plz explain someone.
–1

Updated.

+2
can somebody tell me pls differnce in flushing pipeline and stalling?here in this example we stall pipeline or flush pipeline .
0

### Could you explain why structural hazard has not been considered?

0
Instruction number 8 is omitted,  any particular reason?
0
Yes. Not able to understand . Could someone explain in detail ?
0
Usually when we drew timing diagram if fetch stage is 5 ns then we will place decode staage in 6th clock cycle . Here every stage is taking 1 clock cycle only. Why it is so ?
+2
Here we are stalling the pipeline not flushing. Because it is specified in question that no prediction has to be made. So we cant predict that "branch will not be taken and fetch I5". Hence to stall is the only way possible.

Flushing is done once we fetched the instruction and started executing it and later realize that it is not the next instruction to be executed. Howerver if fetched instruction is the next instruction to be executed no flushing is done and performance improves.

To stall means "not performing any operations in those cycles." It will always degrade the performance.
0
why you dont consider the I8 i am totally confused for this
0
How do you know that
at EI-4 it was notified to the system that instruction 9 has to be loaded next.
Plz explain.
Clock Time = max stage delay + Buffer Delay

= 10+1= 11ns

I1 - Finish at 5th clock

I2 - Finish at 6th clock

I3 - Finish at 7th clock

I4- Finish at 8th clock

Due to branching at I4 pipelining halts and starts after  EI stage of I4 and performs FI of I9 at 8th clock.

I9 - Finish at 12th clock

I10 - Finish at 13th clock

I11- Finish at 14th clock

I12 - Finish at 15 clock

Total time to complete program = 11*15= 165 ns
by Active (2.2k points)
0
Nice logic.No confusion.
+4

Short Trick :

Branching descion will be taken in Execute Instruction (EI) phase (4th phase) so there will be 3 stalls

first I1 will complete in 5 cycles + (I2,I3,I4,3 STALLS,I9,I10,I11,I12) WILL TAKE ONE-ONE CYCLE=15 CYCLE

cycle time=(largest cycle time + buffer delay)=10+1=11

Execution time =11*15=165

+1 vote
clock cycle time= max. stage delay+ buffer delay

= 10+1

= 11 ns

clock cycle =15

execution time= 11*15=165
by Active (1.8k points)
After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is 1010 ns. Buffer delay given is 1ns. So, each stage takes 1111 ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be

15×11=165ns
by (355 points)
0
how you got 15

.......

by Boss (34.6k points)
0
How you identified the branch instruction is  conditional branch (like if else).
if it was a branch instruction like a function call  ?
0
In question it clearlly mention I4 is a branch instruction.
0
Does the question need more clarity about the branch instruction ?

1
2