39,739 views

Consider an instruction pipeline with five stages without any branch prediction:

Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are $\text{5 ns, 7 ns, 10 ns, 8 ns and 6 ns},$ respectively. There are intermediate storage buffers after each stage and the delay of each buffer is $1\ \text{ns}.$ A program consisting of $12$ instructions $\text{I1, I2, I3,}\ldots,\text{ I12}$ is executed in this pipelined processor. Instruction $\text{I4}$ is the only branch instruction and its branch target is $\text{I9}.$ If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

1.  $132$
2.  $165$
3.  $176$
4.  $328$

@ because it is the best case, that we come to know about branch decision in EX stage

edited
Always count for branch instruction separately. first instruction=5 Clock, now remaining (I2 I3 I10 I11 I12 I4)=6 clock and at last branch instruction will take 4 clock.

Total=(5+6+4)*11=165 ns
Because during execution only we came to know where we have to branch. During execution  effective address for the next instruction will be calculated and loaded into PC

After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is $10$ ns. Buffer delay given is $1$ ns. So, each stage takes $11$ ns in total.

FI of $\text{I9}$ can start only after the EI of $\text{I4}.$ So, the total execution time will be
$$15 \times 11 = 165$$
$$\small \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &\bf{t_1}&\bf{t_2}&\bf{t_3}&\bf{t_4}&\bf{t_5}&\bf{t_6}&\bf{t_7}&\bf{t_8}&\bf{t_9}&\bf{t_{10}}&\bf{t_{11}}&\bf{t_{12}}&\bf{t_{13}}&\bf{t_{14}}&\bf{t_{15}}\\ \hline \textbf{I1}&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I2}&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I3}&&&\text{FI}&\text{DI}&\text{FO} &\text{EI}&\text{WO}\\ \textbf{I4}&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ &&&&&\color{red}{\text{stall}}\\ &&&&&&\color{red}{\text{stall}}\\ &&&&&&&\color{red}{\text{stall}}\\ \textbf{I9}&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I10}&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I11}&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \textbf{I12}&&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\ \hline\end{array}$$

Correct Answer: $B$
by

There is no branch prediction used. So how can $I9$ start after the EI of $I4$. It should start after $WO$ right? So we would get $4 stalls$ instead of $3$. Answer should be : $16*11 = 176 ns.$

@Abhrajyoti00 bro we are starting I9 after EI because it is clearly mentioned in question that we have to branch during the execution of the program .

@abir_banerjee That’s not exactly correct. The meaning of that sentence is we have a conditional branch instruction like “Jump on Carry” and the condition is true during the execution of the program and thus the branch is taken.

@Abhrajyoti00 Even without a branch predictor can’t we start I9 as soon as we know it is the branch target? Which pipeline feature facilitates this?

cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next.
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

Here we are stalling the pipeline not flushing. Because it is specified in question that no prediction has to be made. So we cant predict that "branch will not be taken and fetch I5". Hence to stall is the only way possible.

Flushing is done once we fetched the instruction and started executing it and later realize that it is not the next instruction to be executed. Howerver if fetched instruction is the next instruction to be executed no flushing is done and performance improves.

To stall means "not performing any operations in those cycles." It will always degrade the performance.
why you dont consider the I8 i am totally confused for this
How do you know that
at EI-4 it was notified to the system that instruction 9 has to be loaded next.
Plz explain.
Clock Time = max stage delay + Buffer Delay

= 10+1= 11ns

I1 - Finish at 5th clock

I2 - Finish at 6th clock

I3 - Finish at 7th clock

I4- Finish at 8th clock

Due to branching at I4 pipelining halts and starts after  EI stage of I4 and performs FI of I9 at 8th clock.

I9 - Finish at 12th clock

I10 - Finish at 13th clock

I11- Finish at 14th clock

I12 - Finish at 15 clock

Total time to complete program = 11*15= 165 ns

Nice logic.No confusion.

Short Trick :

Branching descion will be taken in Execute Instruction (EI) phase (4th phase) so there will be 3 stalls

first I1 will complete in 5 cycles + (I2,I3,I4,3 STALLS,I9,I10,I11,I12) WILL TAKE ONE-ONE CYCLE=15 CYCLE

cycle time=(largest cycle time + buffer delay)=10+1=11

Execution time =11*15=165

clock cycle time= max. stage delay+ buffer delay

= 10+1

= 11 ns

clock cycle =15

execution time= 11*15=165
by