13.9k views

Consider an instruction pipeline with five stages without any branch prediction:

Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are $\text{5 ns, 7 ns, 10 ns, 8 ns and 6 ns},$ respectively. There are intermediate storage buffers after each stage and the delay of each buffer is $1\ ns.$ A program consisting of $12$ instructions $\text{I1, I2, I3,}\ldots,\text{ I12}$ is executed in this pipelined processor. Instruction $\text{I4}$ is the only branch instruction and its branch target is $\text{I9}.$ If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

1.  $132$
2.  $165$
3.  $176$
4.  $328$

edited | 13.9k views
0

There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns.

What is the significance of this statement in given problem?

Where are we using this information?

+1

Hi @Arjun Sir,

First of all thanks for sharing following example -->

JZ LOOP

Here we can know whether we have to do a Jump or not only after the Execute stage of the JZ (Jump on Zero) instruction. Similar is the case for any conditional branch instruction.

Good point but in case of unconditional branch i think till ID phase decision could be taken. Please correct me if i am wrong.

+2

Chhotu yes in case of unconditional branch by decoding itself we can take decision IFF the operand fetch phase is merged with the decode phase then we can directly get the location (target ) by decode phase itself hence once this decode phase is done we can flush the pipleline and restart from I9

0
I think following is the generalized formula. Anyone confirm.

No of cycles = No. of instructions actually appeared in the datapath + No of stalls + No. of stages - 1

$N = 8 (I_{1,2,3,4,9,10,11,12}) \ +\ 3 \ +\ 5 -1 = 15$

$=> T_n = 15*(10+1) = 165$
0
@Venkat Sai

if in question it is given that it is conditional branch instruction then no of instructions=9

if in question it is given that it is unconditional branch instruction then no of instruction =11

but since nothing given therefore we assume target address will be available in last stage only thefore no. of instructions=12

if i m wrong then please correct me
0
Why we assume branch decision on ex stage???

0

Why Instruction I8 is not considered in Instruction cycle ? The question should not mention so Please clear in topics Very deep.  @Arjun Sir

0

@ because it is the best case, that we come to know about branch decision in EX stage

0
Always count for branch instruction separately. first instruction=5 Clock, now remaining (I2 I3 I10 I11 I12 I4)=6 clock and at last branch instruction will take 4 clock.

Total=(5+6+4)*11=165 ns

After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is $10$ ns. Buffer delay given is 1ns. So, each stage takes $11$ ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be
$$15 \times 11 = 165$$
$$\small \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &\bf{t_1}&\bf{t_2}&\bf{t_3}&\bf{t_4}&\bf{t_5}&\bf{t_6}&\bf{t_7}&\bf{t_8}&\bf{t_9}&\bf{t_{10}}&\bf{t_{11}}&\bf{t_{12}}&\bf{t_{13}}&\bf{t_{14}}&\bf{t_{15}}\\ \hline \textbf{I1}&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I2}&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I3}&&&\text{FI}&\text{DI}&\text{FO} &\text{EI}&\text{WO}\\ \textbf{I4}&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ &&&&&\color{red}{\text{stall}}\\ &&&&&&\color{red}{\text{stall}}\\ &&&&&&&\color{red}{\text{stall}}\\ \textbf{I9}&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I10}&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I11}&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I12}&&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \hline\end{array}$$

Correct Answer: $B$
by Boss (16.8k points)
edited
0

@Arjun vetran

Sir, In question there is not mention about operator forwarding.but still we fetch I9 in T8.I think it should be fetch in T9.

0
Pls Explain I8 are not  added stall Cycle if you have any reason ? Briefly Explain Anyone ?
+1

gatecse Veteran:

in the question they did'nt mention in which state of I4 target address is available.

when without branch prediction given in question ,then we assume that target address available at the last stage but then stalls present will be 4.it also fetched I8 also.

0
Can somebody clarify what if the instruction is unconditional jump??
+1
@arjun sir how you saying branch address availiable after EI stage..?
0

@Nils

Fetch will happen in T8. Even if there is a RAW hazard between I4 and I9, I9 will get the updated data at decode phase of the instruction.

0
Why we are not using the stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively and taking only highest one?

At least for first Instruction we suppose to use it if its given.

first instruction will take 5+7+10+8+6=36 and 36+5 (buffer delay)= 41 ns and remaining instruction will take (upto 4 only) 11 ns * 3= 33. thus total time after 4th instruction execution 41+33 =74 ns.

similary, form 9 to 12 th instruction it will take 41 ns (for 9th) + 33 ns (from 10 to 12th)=74ns.

and thus total time required for execution 74 + 74 =148 ns (No option is given)

What I am saying here is that if they are giving time for each stage then why we are not using it and just solve it by assuming only highest delay.
0
It is because the highest delay in the pipeline is 10ns. What we are doing is we increase the delay of all other stages to 10ns to ensure continuous flow. now, every stage has a delay of 10ns. Refer the best answer above. He has explained it nicely.
+1

Can anyone guide me why can't we do like this ? 0
Its the clock period which determines when the instruction will move to next hardware in the datapath - Not the execution time of datapath hardwares. I hope you now understand why it is 10+1 = 11 ns.
0
why FI of I9 is not taken  at  T9?
0

we are not doing it like this becoz pipeline's time is decided by its slowest unit!!

if it is still not clear to u, then tell me ill give u an example;

+1
HOW TO GET 165  I8 IS NOT COUNTED IN SOLUTION
TOTAL CYCLE ARE 16 ->16 *11=176 ANS PLS ELABORATE 

0
after completing the branch instruction you have to come back so then why the answer is b
0
hahah
+2

0
In conditional branch statements, the target address is computed in which stage? in Hamacher  is given as "branch condition can also be moved to decode stage to reduce branch penalty"

then in this question why are you waiting till the writeback stage to get the targeted address?
0
There will be 4 stalls ,so answer is C
0

In conditional branch instructions, condition and target address are resolved at EX' stage and bubbles are fed during its execution to the pipeline. (By default)

But we can also resolve it at ID -stage using more complex pipeline architecture( add simple ALU and computation unit at ID stage) but don't use this architecture unless otherwise stated.

0

without any branch prediction

then we should consider I5 also ????

If we don't use any method to remove control dependency, then I5 will be executed. cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next.
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

by Boss (30.7k points)
0
In the given ans there would be stall for I8 too,plz clarify why only 3 stalls,.Plz explain someone.
–1

Updated.

+2
can somebody tell me pls differnce in flushing pipeline and stalling?here in this example we stall pipeline or flush pipeline .
0

### Could you explain why structural hazard has not been considered?

0
Instruction number 8 is omitted,  any particular reason?
0
Yes. Not able to understand . Could someone explain in detail ?
0
Usually when we drew timing diagram if fetch stage is 5 ns then we will place decode staage in 6th clock cycle . Here every stage is taking 1 clock cycle only. Why it is so ?
+2
Here we are stalling the pipeline not flushing. Because it is specified in question that no prediction has to be made. So we cant predict that "branch will not be taken and fetch I5". Hence to stall is the only way possible.

Flushing is done once we fetched the instruction and started executing it and later realize that it is not the next instruction to be executed. Howerver if fetched instruction is the next instruction to be executed no flushing is done and performance improves.

To stall means "not performing any operations in those cycles." It will always degrade the performance.
0
why you dont consider the I8 i am totally confused for this
0
How do you know that
at EI-4 it was notified to the system that instruction 9 has to be loaded next.
Plz explain.
Clock Time = max stage delay + Buffer Delay

= 10+1= 11ns

I1 - Finish at 5th clock

I2 - Finish at 6th clock

I3 - Finish at 7th clock

I4- Finish at 8th clock

Due to branching at I4 pipelining halts and starts after  EI stage of I4 and performs FI of I9 at 8th clock.

I9 - Finish at 12th clock

I10 - Finish at 13th clock

I11- Finish at 14th clock

I12 - Finish at 15 clock

Total time to complete program = 11*15= 165 ns
by Active (2.3k points)
0
Nice logic.No confusion.
+4

Short Trick :

Branching descion will be taken in Execute Instruction (EI) phase (4th phase) so there will be 3 stalls

first I1 will complete in 5 cycles + (I2,I3,I4,3 STALLS,I9,I10,I11,I12) WILL TAKE ONE-ONE CYCLE=15 CYCLE

cycle time=(largest cycle time + buffer delay)=10+1=11

Execution time =11*15=165

+1 vote
clock cycle time= max. stage delay+ buffer delay

= 10+1

= 11 ns

clock cycle =15

execution time= 11*15=165
by Active (1.8k points)
After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is 1010 ns. Buffer delay given is 1ns. So, each stage takes 1111 ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be

15×11=165ns
by Junior (545 points)
0
how you got 15

....... by Boss (35.6k points)
0
How you identified the branch instruction is  conditional branch (like if else).
if it was a branch instruction like a function call  ?