8.1k views

Consider an instruction pipeline with five stages without any branch prediction:

Fetch Instruction (FI), Decode Instruction (DI), Fetch Operand (FO), Execute
Instruction (EI) and Write Operand (WO). The stage delays for FI, DI, FO, EI
and WO are $\text{5 ns, 7 ns, 10 ns, 8 ns and 6 ns},$ respectively.
There are intermediate storage buffers after each stage and the delay of each
buffer is $1\ ns.$ A program consisting of $12$ instructions $\text{I1, I2, I3, …, I12}$
is executed in this pipelined processor. Instruction $\text{I4}$ is the only branch instruction
and its branch target is $\text{I9}.$ If the branch is taken during the execution of this
program, the time (in ns) needed to complete the program is

1.  $132$
2.  $165$
3.  $176$
4.  $328$
edited | 8.1k views
0

There are intermediate storage buffers after each stage and the delay of each buffer is 1 ns.

What is the significance of this statement in given problem?

Where are we using this information?

0

Hi @Arjun Sir,

First of all thanks for sharing following example -->

JZ LOOP

Here we can know whether we have to do a Jump or not only after the Execute stage of the JZ (Jump on Zero) instruction. Similar is the case for any conditional branch instruction.

Good point but in case of unconditional branch i think till ID phase decision could be taken. Please correct me if i am wrong.

+1

Chhotu yes in case of unconditional branch by decoding itself we can take decision IFF the operand fetch phase is merged with the decode phase then we can directly get the location (target ) by decode phase itself hence once this decode phase is done we can flush the pipleline and restart from I9

0
I think following is the generalized formula. Anyone confirm.

No of cycles = No. of instructions actually appeared in the datapath + No of stalls + No. of stages - 1

$N = 8 (I_{1,2,3,4,9,10,11,12}) \ +\ 3 \ +\ 5 -1 = 15$

$=> T_n = 15*(10+1) = 165$

After pipelining we have to adjust the stage delays such that no stage will be waiting for another to ensure smooth pipelining (continuous flow). Since we can not easily decrease the stage delay, we can increase all the stage delays to the maximum delay possible. So, here maximum delay is 10ns. Buffer delay given is 1ns. So, each stage takes 11ns in total.

FI of I9 can start only after the EI of I4. So, the total execution time will be

$$15 \times 11 = 165$$

T1  T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15
I1 FI DI FO EI WO
I2   FI DI FO EI WO
I3     FI DI FO EI WO
I4       FI DI FO EI WO
stall
stall
stall
I9               FI DI FO EI WO
I10                 FI DI FO EI WO
I11                   FI DI FO EI WO
I12                     FI DI FO EI WO

selected by
0
Putting aside the options, what about the delay due to branch? wont 1.25nsec be added to 165nsec due to pipeline stall? Mr. @Arjun?
0

@Arjun vetran

Sir, In question there is not mention about operator forwarding.but still we fetch I9 in T8.I think it should be fetch in T9.

0
Pls Explain I8 are not  added stall Cycle if you have any reason ? Briefly Explain Anyone ?
0

gatecse Veteran:

in the question they did'nt mention in which state of I4 target address is available.

when without branch prediction given in question ,then we assume that target address available at the last stage but then stalls present will be 4.it also fetched I8 also.

0
Can somebody clarify what if the instruction is unconditional jump??
0
@arjun sir how you saying branch address availiable after EI stage..?
0

@Nils

Fetch will happen in T8. Even if there is a RAW hazard between I4 and I9, I9 will get the updated data at decode phase of the instruction.

0
Why we are not using the stage delays for FI, DI, FO, EI and WO are 5 ns, 7 ns, 10 ns, 8 ns and 6 ns, respectively and taking only highest one?

At least for first Instruction we suppose to use it if its given.

first instruction will take 5+7+10+8+6=36 and 36+5 (buffer delay)= 41 ns and remaining instruction will take (upto 4 only) 11 ns * 3= 33. thus total time after 4th instruction execution 41+33 =74 ns.

similary, form 9 to 12 th instruction it will take 41 ns (for 9th) + 33 ns (from 10 to 12th)=74ns.

and thus total time required for execution 74 + 74 =148 ns (No option is given)

What I am saying here is that if they are giving time for each stage then why we are not using it and just solve it by assuming only highest delay.
0
It is because the highest delay in the pipeline is 10ns. What we are doing is we increase the delay of all other stages to 10ns to ensure continuous flow. now, every stage has a delay of 10ns. Refer the best answer above. He has explained it nicely.
0

Can anyone guide me why can't we do like this ?

cycles in pink are stall cycles, at EI-4 it was notified to the system that instruction 9 has to be loaded next.
We have completed execution in a total of 15 cycles where each cycle was (10+1)ns long,

Hence, answer = $15 \times 11 = 165$ns

0
In the given ans there would be stall for I8 too,plz clarify why only 3 stalls,.Plz explain someone.
–1

Updated.

+2
can somebody tell me pls differnce in flushing pipeline and stalling?here in this example we stall pipeline or flush pipeline .
0

### Could you explain why structural hazard has not been considered?

–1
Instruction number 8 is omitted,  any particular reason?
0
Yes. Not able to understand . Could someone explain in detail ?
0
Usually when we drew timing diagram if fetch stage is 5 ns then we will place decode staage in 6th clock cycle . Here every stage is taking 1 clock cycle only. Why it is so ?
+2
Here we are stalling the pipeline not flushing. Because it is specified in question that no prediction has to be made. So we cant predict that "branch will not be taken and fetch I5". Hence to stall is the only way possible.

Flushing is done once we fetched the instruction and started executing it and later realize that it is not the next instruction to be executed. Howerver if fetched instruction is the next instruction to be executed no flushing is done and performance improves.

To stall means "not performing any operations in those cycles." It will always degrade the performance.
Clock Time = max stage delay + Buffer Delay

= 10+1= 11ns

I1 - Finish at 5th clock

I2 - Finish at 6th clock

I3 - Finish at 7th clock

I4- Finish at 8th clock

Due to branching at I4 pipelining halts and starts after  EI stage of I4 and performs FI of I9 at 8th clock.

I9 - Finish at 12th clock

I10 - Finish at 13th clock

I11- Finish at 14th clock

I12 - Finish at 15 clock

Total time to complete program = 11*15= 165 ns
0
Nice logic.No confusion.
+1 vote
clock cycle time= max. stage delay+ buffer delay

= 10+1

= 11 ns

clock cycle =15

execution time= 11*15=165