The Gateway to Computer Science Excellence
First time here? Checkout the FAQ!
x
+43 votes
9.1k views

Consider an instruction pipeline with five stages without any branch prediction:

Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI and WO are $\text{5 ns, 7 ns, 10 ns, 8 ns and 6 ns},$ respectively. There are intermediate storage buffers after each stage and the delay of each buffer is $1\ ns.$ A program consisting of $12$ instructions $\text{I1, I2, I3,}\ldots,\text{ I12}$ is executed in this pipelined processor. Instruction $\text{I4}$ is the only branch instruction and its branch target is $\text{I9}.$ If the branch is taken during the execution of this program, the time (in ns) needed to complete the program is

  1.  $132$   
  2.  $165$ 
  3.  $176$
  4.  $328$
asked in CO & Architecture by (209 points)
edited by | 9.1k views
0

There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns.

What is the significance of this statement in given problem?

Where are we using this information?

0

Hi @Arjun Sir,

First of all thanks for sharing following example -->

JZ LOOP
ADD X, Y

Here we can know whether we have to do a Jump or not only after the Execute stage of the JZ (Jump on Zero) instruction. Similar is the case for any conditional branch instruction.

Good point but in case of unconditional branch i think till ID phase decision could be taken. Please correct me if i am wrong.

+1

Chhotu yes in case of unconditional branch by decoding itself we can take decision IFF the operand fetch phase is merged with the decode phase then we can directly get the location (target ) by decode phase itself hence once this decode phase is done we can flush the pipleline and restart from I9

0
I think following is the generalized formula. Anyone confirm.

No of cycles = No. of instructions actually appeared in the datapath + No of stalls + No. of stages - 1

$N = 8 (I_{1,2,3,4,9,10,11,12}) \ +\ 3 \ +\ 5 -1 = 15$

$=> T_n = 15*(10+1) = 165$
0
@Venkat Sai

if in question it is given that it is conditional branch instruction then no of instructions=9

if in question it is given that it is unconditional branch instruction then no of instruction =11

but since nothing given therefore we assume target address will be available in last stage only thefore no. of instructions=12

if i m wrong then please correct me

4 Answers

+64 votes
Best answer
After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is $10$ ns. Buffer delay given is 1ns. So, each stage takes $11$ ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be

$$15 \times 11 = 165$$
$$\small \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline
&\bf{t_1}&\bf{t_2}&\bf{t_3}&\bf{t_4}&\bf{t_5}&\bf{t_6}&\bf{t_7}&\bf{t_8}&\bf{t_9}&\bf{t_{10}}&\bf{t_{11}}&\bf{t_{12}}&\bf{t_{13}}&\bf{t_{14}}&\bf{t_{15}}\\
\hline
\textbf{I1}&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\textbf{I2}&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\textbf{I3}&&&\text{FI}&\text{DI}&\text{FO}
&\text{EI}&\text{WO}\\
\textbf{I4}&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
&&&&&\color{red}{\text{stall}}\\
&&&&&&\color{red}{\text{stall}}\\
&&&&&&&\color{red}{\text{stall}}\\
\textbf{I9}&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\textbf{I10}&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\textbf{I11}&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\textbf{I12}&&&&&&&&&&&\text{FI}&\text{DI}&\text{FO}&\text{EI}&\text{WO}\\
\hline\end{array}$$
answered by Boss (18.1k points)
edited by
0
Pls Explain I8 are not  added stall Cycle if you have any reason ? Briefly Explain Anyone ?
0

gatecse Veteran:

in the question they did'nt mention in which state of I4 target address is available.

when without branch prediction given in question ,then we assume that target address available at the last stage but then stalls present will be 4.it also fetched I8 also.

0
Can somebody clarify what if the instruction is unconditional jump??
0
@arjun sir how you saying branch address availiable after EI stage..?
0

@Nils

Fetch will happen in T8. Even if there is a RAW hazard between I4 and I9, I9 will get the updated data at decode phase of the instruction.

0
Why we are not using the stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively and taking only highest one?

At least for first Instruction we suppose to use it if its given.

first instruction will take 5+7+10+8+6=36 and 36+5 (buffer delay)= 41 ns and remaining instruction will take (upto 4 only) 11 ns * 3= 33. thus total time after 4th instruction execution 41+33 =74 ns.

similary, form 9 to 12 th instruction it will take 41 ns (for 9th) + 33 ns (from 10 to 12th)=74ns.

and thus total time required for execution 74 + 74 =148 ns (No option is given)

What I am saying here is that if they are giving time for each stage then why we are not using it and just solve it by assuming only highest delay.
0
It is because the highest delay in the pipeline is 10ns. What we are doing is we increase the delay of all other stages to 10ns to ensure continuous flow. now, every stage has a delay of 10ns. Refer the best answer above. He has explained it nicely.
0

Can anyone guide me why can't we do like this ?

0
Its the clock period which determines when the instruction will move to next hardware in the datapath - Not the execution time of datapath hardwares. I hope you now understand why it is 10+1 = 11 ns.
0
why FI of I9 is not taken  at  T9?
+18 votes

answer = option B

cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next. 
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

answered by Boss (30.8k points)
0
In the given ans there would be stall for I8 too,plz clarify why only 3 stalls,.Plz explain someone.
–1

Updated.

+2
can somebody tell me pls differnce in flushing pipeline and stalling?here in this example we stall pipeline or flush pipeline .
0

Could you explain why structural hazard has not been considered?

–1
Instruction number 8 is omitted,  any particular reason?
0
Yes. Not able to understand . Could someone explain in detail ?
0
Usually when we drew timing diagram if fetch stage is 5 ns then we will place decode staage in 6th clock cycle . Here every stage is taking 1 clock cycle only. Why it is so ?
+2
Here we are stalling the pipeline not flushing. Because it is specified in question that no prediction has to be made. So we cant predict that "branch will not be taken and fetch I5". Hence to stall is the only way possible.

Flushing is done once we fetched the instruction and started executing it and later realize that it is not the next instruction to be executed. Howerver if fetched instruction is the next instruction to be executed no flushing is done and performance improves.

To stall means "not performing any operations in those cycles." It will always degrade the performance.
+6 votes
Clock Time = max stage delay + Buffer Delay

 = 10+1= 11ns

I1 - Finish at 5th clock

I2 - Finish at 6th clock

I3 - Finish at 7th clock

I4- Finish at 8th clock

Due to branching at I4 pipelining halts and starts after  EI stage of I4 and performs FI of I9 at 8th clock.

I9 - Finish at 12th clock

I10 - Finish at 13th clock

I11- Finish at 14th clock

I12 - Finish at 15 clock

Total time to complete program = 11*15= 165 ns
answered by Active (1.4k points)
0
Nice logic.No confusion.
0

Short Trick : 

Branching descion will be taken in Execute Instruction (EI) phase (4th phase) so there will be 3 stalls

first I1 will complete in 5 cycles + (I2,I3,I4,3 STALLS,I9,I10,I11,I12) WILL TAKE ONE-ONE CYCLE=15 CYCLE

cycle time=(largest cycle time + buffer delay)=10+1=11

Execution time =11*15=165

+1 vote
clock cycle time= max. stage delay+ buffer delay   

         = 10+1

         = 11 ns

clock cycle =15

execution time= 11*15=165
answered by Active (2.1k points)
Answer:

Related questions



Quick search syntax
tags tag:apple
author user:martin
title title:apple
content content:apple
exclude -tag:apple
force match +apple
views views:100
score score:10
answers answers:2
is accepted isaccepted:true
is closed isclosed:true

39,828 questions
46,802 answers
140,987 comments
58,943 users