search
Log In
57 votes
8.3k views

An instruction pipeline has five stages where each stage take 2 nanoseconds and all instruction use all five stages. Branch instructions are not overlapped. i.e., the instruction after the branch is not fetched till the branch instruction is completed. Under ideal conditions,

  1. Calculate the average instruction execution time assuming that 20% of all instructions executed are branch instruction. Ignore the fact that some branch instructions may be conditional.
  2. If a branch instruction is a conditional branch instruction, the branch need not be taken. If the branch is not taken, the following instructions can be overlapped. When 80% of all branch instructions are conditional branch instructions, and 50% of the conditional branch instructions are such that the branch is taken, calculate the average instruction execution time.
in CO and Architecture
edited by
8.3k views
8
someone answer this beautiful question please.
22

$A)\ 20\%$ Branch instruction $=4\ stalls$ $\&\ 80\%$ other instruction $=0\ stalls$ 

$Note: Branch\ penalty=last\ stage \#-1=5-1=4$

$\#\ of\ stalls/inst=0.2\times 4=0.8$

$Avg. inst\ exe.\ time=(1+\#\ of\ stalls/inst)\times cycle time=(1+0.8)\times 2=3.6ns$

$B)\ $Branch instructions are not overlapped (given in the question) means $20\%$ of un-conditional branch instruction$=4\ stalls$. $80\%$ of conditonal branch instruction is further divided into branch taken and branch not taken.

If the branch is not taken, the following instructions can be overlapped. $50\%$ of the conditional branch instructions are such that the branch is taken (given in the question)

$50\%$ of the conditional branch instructions are such that the branch is taken $=4\ stalls$

$\#\ of\ stalls/inst=0.2\times 0.2\times 4+0.2\times 0.8\times 0.5\times 4=0.16+0.32=0.48$

$Avg.inst\ exe.\ time=(1+\#\ of\ stalls/inst)\times cycle time=(1+0.48)\times 2=2.96ns$

 

Image courtesy: @@just_bhavana

 

8 Answers

64 votes
 
Best answer

Each stage is $2$ns. So, after $5$ time units each of $2$ns, the first instruction finishes (i.e., after $10$ns), in every $2$ns after that a new instruction gets finished. This is assuming no branch instructions. Now, once the pipeline is full, we can assume that the initial fill time doesn't matter our calculations and average execution time for each instruction is $2$ns assuming no branch instructions.

  1. Now, we are given that $20\%$ of instructions are branch (like JMP) and when a branch instruction is executed, no further instruction enters the pipeline. So, we can assume every $5$th instruction is a branch instruction. So, with this assumption, total time to finish $5$ instruction will be $5 * 2 + 8 = 18$ ns (as when a branch instruction enters the pipeline and before it finishes, $4$ pipeline stages will be empty totaling $4 * 2 = 8$ ns, as it is mentioned in question that the next instruction fetch starts only when branch instruction completes). And this is the same for every set of 5 instructions, and hence the average instruction execution time $= 18/5 = 3.6$ ns
     
  2. This is just a complex statement. But what we need is to identify the $\%$ of branch instructions which cause a branch to be taken as others will have no effect on the pipeline flow.
    $20\%$ of instructions are branch instructions. $80\%$ of branch instructions are conditional.
    That means .$2*.8 = 16\%$ of instructions are conditional branch instructions and it is given that $50\%$ of those result in a branch being taken.
    So, $8\%$ of instructions are conditional branches being taken and we also have $20\%$ of $20\% = 4\%$ of unconditional branch instructions which are always taken.


So, percentage of instructions where a branch is taken is $8+4 = 12\%$ instead of $20\%$ in (A) part.

So, in $100$ instructions there will be $12$ branch instructions. We can do a different calculation here as compared to (A) as $12$ is not a divisor of $100$. Each branch instruction causes a pipeline delay of $4*2 = 8$ ns. So, $12$ instructions will cause a delay of $12 * 8 = 96$ ns. For $100$ instructions, we need $100 * 2 = 200 $ ns without any delay and with delay we require $200 + 96 = 296$ ns for $100$ instructions.

So, average instruction execution time $= 296/100 = 2.96$ ns

(We can also use this method for part (A) which will give $100 * 2 + 20*8 = 360$ ns for $100$ instructions)


edited by
0
Thank You Arjun Sir for an elaborate and understandable explanation.
Could you help me with a link for such kind of pipeline problems with delays .?
23
part b)

if branch not taken following instructions can be overlapped , implies stall=0

if branch taken, it would be simply after k-1 stall = 4 stalls.

20% are branch

80% of branch are conditional ,implies 80% of 20% are conditional

which means 20% are unconditional(always takes), implies 20% of 20% are unconditional which are taken

50% of branch conditional are taken, implies 50% of 80% of 20% are taken

considering for all cases where there is stall :
therefore, Tavg=(1+stall cyle*stall freq) clocks

                      = (1+ ( 0.50*0.80*0.20*4 ) + (0.20*0.20*4) ) clocks

                      = 1.48 clocks= 1.48 * max{stage dealy}= 1.48*2ns= 2.96 ns
0
sir am not getting second part ???
0
Nicely explained sir
31

2nd part

0
It isn't the branch instruction but the instruction following it which incurs the overhead of 10 units, am I correct?
0

if branch taken, it would be simply after k-1 stall

can someone please elaborate this statement ? thanks 

0
"as when a branch instruction enters the pipeline and before it finishes, 4 pipeline stages will be empty totaling 4 * 2 = 8 ns, as it is mentioned in question that the next instruction fetch starts only when branch instruction completes)."

This 8 ns delay will be in 6 th instruction? As 5 th instruction is branch instruction the delay will happen in 6th instruction. I am confused which instruction will have delay of 8 ns and will face stalls if 5th instruction is branch then next instruction should be fetched only once 5 th instruction is fiinhsed with its execution.

To execute 5 instruction it will take 18 ns but to execute 6th it will include 8 ns delay plus 2 ns to execute total 10 ns. I don't know if my understanding is correct please help.
0

@arjun sir,

Now, once the pipeline is full, we can assume that the initial fill time doesn't matter our calculations 

If ideal condition is not mentioned in question ..we should consider initial pipeline filling time also right by, just adding extra time of filling in whatever is calculated considering ideal condition.

i.e , in part (A) 3.6 + 4(2) 

Please correct me if i m wrong 

 

3
@ jatin khachane 1

As we don't know total number of instructions, so we can't find average time taking into account initial pipeline fill time, because it will be $3.6 + (4*2) / total Instructions$ in case of (A). so we need to ignore second term, that's why answer is $3.6ns$
1
@Arjun sir

@just_bhavana

Here it is given that if the branch is not taken next instructions can be overlapped ..But whether instruction takes branch or not is known after completion of branch instruction execution..how it is decided here that if branch taken then instructions can overlapped and not overlapped if branch taken ...how can we use two different mechanism for diif branch instructions .. ??

if we use stalling(next ) mechanism in both cases it will not overlap next instruction till we get EA

If we use flushing then we can overlap and flush depending on branch taken / not
0

@Arjun sir

Can you please explain why in case of unconditional branch instruction, next instruction is known @ the end of instruction only and not @the end of decode or execution stage??

0
In part b) "for 100 iinstructions, we need 100*2=200ns without any delay".  It should be (5+99*1)*2=204ns without any delay,isnt it?
0

@Vishal Rana, 

As we don't know total number of instructions, so we can't find average time taking into account initial pipeline fill time,

But in both A and B we have assumed number of instructions to be 100 for calculation. So, shouldn't we use (k+n-1) method of including initial instruction's total time. 

@Arjun sir, please help with the same.

74 votes

if an instruction branches then it takes $2ns \times 5 = 10ns$, coz if branch is taken then the instruction after that branch instruction is not fetched until entire current branch instruction is completed, this means it will go through all stages.

if an instruction is non-branch or branching does not happen then, it takes $2ns$ to get completed. 

a) $\text{average time taken} = 0.8 \times 2ns + 0.2 \times 10ns = 3.6ns$

b) 

$\text{Average time taken,}$

$ = 0.8 \times 2ns + 0.2 \Big( 0.2 \times 10ns + 0.8 \big( 0.5\times 10ns + 0.5 \times 2ns\big)\Big)$

$ =2.96ns$


edited by
0
nice answer :)
0
For first Part

First instruction will always take n stage * t ns i,e first instruction will take 5*2=10 ns. is nt?

Assume 100 instruction

so average time= (2*5+79*2+20*5*2)/100

=3.68 ns

Please tell What is wrong in it?
0

@Abhisek Tiwari 4 ,I am also getting 3.68ns 

5

Abhisek Tiwari 4  QUESTION SAYS UNDER IDEAL CONDITION SO  IN PIPELINE IDEAL CONDITION IS CPI=1

0

@amarVashishth

Kindly help me in this. I followed the approach given in your solution , got answer as 62.5 but that is wrong.

https://gateoverflow.in/285418/go2019-flt1-61

4 votes

A))

Let total number of instruction is n

Tavg=n x 20% x 5 x 2ns + n x 80% x 1 x 2ns/n

Tavg=3.6ns

B))

Let us suppose we have 100 instruction

20% are branch instruction and 80% Not branch instruction

out of 20% ,80% of 20% are cnditional branch and 20% of 20% are unconditional

Now 50% of 80% of 20% are branch instruction where branch is taken and 50% of 80% of 20% branch not taken.

Clock average=148/100=1.48

Tavg=1.48 x 2ns=2.96ns

3 votes

.........


edited by
0
New CPI = OLD CPI + stall-rate*stall-cycle

execution time = new CPI * cycle time.
2 votes

Part A:

suppose total instructions = n

time for each stage = 2 ns , CPI=stall + 1

average instruction execution time = ([no branch instrution * cpi * stage time] + [branch instrution * cpi *stage time])/total instruction

for branch instruction :

no. of stall = 4 , so, CPI=stall+1= 5

20 % of  total instructions are branch instructions = n * 0.2 

for no branch instruction :

no. of stall = 0 , so; CPI=stall + 1= 1

80 % of  total instructions are no branch instructions = n * 0.8

average time = ([n * 0.2 * 5 * 2]+[n * 0.8 * 1 * 2])/n = 3.6 ns

 

Part B:

this will be the graph to understand better :

this will be little hard to solve in terms of n , so take total instruction n= 100

then no branch instn= 80 , branch instn(conditional branch=16(branch taken=8 , branch not taken=8) , unconditional branch=4)

average instruction execution time = ([80*1*2]+[8*5*2]+[8*1*2]+[4*5*2])/100 = 2.96 ns

1 vote

People who are still facing difficulty in understanding the question.Refer this I hope it will remove all your doubts.

a )Read the line very carefully "the instruction after Brach is not fetched till the branch is completed"

See the diagram below Considering there are 100 instructions.

20 % are branch means there are 20 branch and 80 instructions without any branch.

Solution 1-Consedering the case where every 5 the instruction is branch .

In that case 1st instruction takes 10ns and from 2nd to 5th it takes 2*4=8ns(following concept of pipelne).

So in total 5  instruction takes 18ns.

There are in total 20 sets of 5  instruction so total time is 20*18 =360ns.

Average time=360/100=3.6ns.

Solution -2 for part a

Considering all non branch instructions are executed in at once and all branch instructions at once.

Time for 80 instructions (non-branch)

10+79*2=168ns

Time for 20 instructions (branch)

20*10=200ns(as till branch is not completed another branch cannot fetched).

So total time=168+200=368ns

Average=368/100=3.68ns.

Solution 3 for part a-refering to Arjun sir solution

What sir has assumed is that is that the all non branch instruction takes=80*2 and not 10+79*2 reason being we can neglect the first instruction time as it will not affect much.

Total time for non branch=2*80=160ns

Non branch=20*10=200ns

Total time =200+160=360

Average time =360/100=3.6ns

Hope you now can do b part on your own.Still having issues comment.

1 vote

 

 

I hope the solution is more clear now, I hope you got why we are taking 4 stalls.

As it is mentioned in the question itself that next instruction fetch is after the complete execution of the previous branch instruction.

 

Thanks

1
Thanks bruh, it's much more clear to me now. Btw how can I reach you? I'm having a few questions related to exam!?
1
send me a message
0
I can't message you directly, as I'm not verified yet here on GateOverflow. I can't start the conversation.

One of the main issue is:

In COA, how to distinguish between simultaneous and hierarchical one, as in GATE they directly not mention it most of the time. How to get it right?
0 votes

Execution time formula T1= $\frac{N*S}{R}$

Where N = dynamic instruction count

             S = Avg no. Of cycles to fecth and execte one instruction

            R = clock rate in cycles per second

 

Avg execution time T = $\frac{T1}{N}$

                                   T= $\frac{S}{R}$

 

Now according to question R = $\frac{1}{2 ns}$

So T = S *2 ns

For ideal pipelines S = 1

For non-ideal pipelines S = 1 + S' ( stall)

 

Now according to question when branch instruction is fetched next instruction is not fetched until after it's completion which means there is a stall of 4 cycles.

 

A).     20% instructions are branch instructions so 

            S' = 0.2*4 = 0.8

           S =1 + 0.8 = 1.8

 

So     T =1.8 * 2 ns

             =3.6 ns    (Ans)

 

B).      For Conditional Branch

                Branch prediction = Not taken

                If prediction true = overlapping done 

          This case gives us 0 stall.

                Prediction false = no overlapping

           This case gives us 4 stalls.

                S' = 0.2 * 0.8 *0.5 *4  ( 20 % branch instruction, 80% Conditional Branch , 50% branch taken giving false prediction)

                 S'  = 0.32

 

           For unconditional branch

                S' = 0.2 * 0.2 * 4

                    = 0.16

 

          So S = 1+0.32+0.16 = 1.48

 

              T = 1.48*2 ns

                 =2.96 ns (Ans)

       

   

 


edited by

Related questions

32 votes
4 answers
1
6.2k views
Comparing the time T1 taken for a single instruction on a pipelined CPU with time T2 taken on a non-pipelined but identical CPU, we can say that T1 ≤ T2 T1 ≥ T2 T1 < T2 T1 and T2 plus the time taken for one instruction fetch cycle
asked Sep 14, 2014 in CO and Architecture Kathleen 6.2k views
6 votes
0 answers
2
741 views
Consider the following 8085 program segment, where registers B and C contain BCD values: S1: MVI A, 99H MVI D, 00H SUB C ADD B DAA S2: JC S3 MOV E, A MVI A, 99H SUB E MOV E, A JZ S4 MVI D, FFH JMP S4 S3:INC A DAA MOV E, A S4: ......... For the two ... the values in register D and E when control reaches S4. What, in general, is the value of D and E as a function of B and C when control reaches S4?
asked Sep 14, 2014 in CO and Architecture Kathleen 741 views
0 votes
0 answers
3
414 views
Consider the 8085 instruction IN 09H stored as follows: Memory Address Machine Code 3050 3051 DA 09 and the following incomplete timing diagram for the instruction: Write the contents of the boxes, A, B, C and D in hexadecimal in your answer sheet. Do not draw any ... to put the data on the bus? Answer by completing the following statement in your answer book: By combining signals.........
asked Sep 14, 2014 in CO and Architecture Kathleen 414 views
21 votes
3 answers
4
4.4k views
The most appropriate matching for the following pairs$\begin{array}{ll} \text{X: Indirect addressing} & \text{1: Loops } \\ \text{Y: Immediate addressing } & \text{2: Pointers} \\ \text{Z: Auto decrement addressing } & \text{3: Constants } \\ \end{array}$ is $X - 3, Y - 2, Z - 1$ $X - 1, Y - 3, Z - 2$ $X - 2, Y - 3, Z - 1$ $X - 3, Y - 1, Z - 2$
asked Sep 14, 2014 in CO and Architecture Kathleen 4.4k views
...