6.5k views
Consider a $6$-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this $6$-stage pipeline, the speedup achieved with respect to non-pipelined execution if $25$% of the instructions incur $2$ pipeline stall cycles is ____________

edited | 6.5k views
+1

Assume that there is no cycle-time overhead of pipelining ? what is meaning for this

0
There is no cycle time overhead of pipelining means all stage delays are same ...Having different stage delays will cause cycle time overhead ...
0
What's a perfectly balanced pipeline?
0
0

http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/pipe_title.html

but in this site they mentioned

If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to Time per instruction on nonpipelined machine
Under these conditions, the speedup from pipelining equals the number of pipe stages.

Time without pipeline $=6 \text{ stages}=6 \text{ cycles}$

Time with pipeline $=1+\text{stall freqency}\times \text{stall cycle}$

$=1+.25\times 2$
$=1.5$

Speed up $=\dfrac{6}{1.5}=4$
by (429 points)
edited
+9
Is this approach fine :

Suppose each instruction is taking n time without pipelining.

Time for 100 instructions : 100n

For pipelined So each stage will be n/6.

Time for 100 instructions : n+99*n/6+25*2*n/6  = 155n/6

So speed up is : 100*6/155 ~= 4
+1
yes, but you can avoid adding "n" and use 100 for 99 as for average time wee can assume a very large sequence of instructions- not just 100.
+5

2 stall for 25 instruction, lets take total instruction 100.

6 stage pipeline 100 instruction will complete in pipeline 6+(100-1) & for stall 2 cycle extra for 25 instructions so pipeline time 2+(25-1)

total pipeline time 6+99+2+24=131

Non pipeline = 6*100 =600

Speedup = Non pipeline / Pipeline = 600/131= 4.58 ≈ 4

It should be more accurate 4.58

0

not understood

Time with pipeline =1+stall freqency*stall cycle

0
when we take total 4 instructions, 2 stall for 1 instruction ( as per above calculation for 100 instructions!!)

Speedup = Non pipeline / Pipeline = 24/11= 2.18

Can somebody make this clear please?
0
@arjun is it correct to say those 25% instruction incurring stalls cycles should be contigous?Stalling of i leads to i+1 and so on?.Having 2 stall cycles doesn't neccesarily make CPI 3 for those 25% instruction, as pointed out in other answers.Please correct me if wrong.
0

Non-pipeline does not suffer from a stall . here 2 is branch penalty.

0
@Anu007

Actually the question asked by manoja was that, suppose there are 4 instructions.

For non pipeline = 6 stages * 4 instructions = 24

For pipeline (if you draw the pipelining diagram considering 25% i.e 1 instruction containing 2 stall cycles, you get answer as) = 11

Speed up = non pipeline/ pipeline

I.e 24/11= 2.18

I was unable to understand your explanation on this question. So will you please elaborate it again for me?
Speed Up= 4

Speed up = Time without Pipeline /  Time with Pipeline

Time without pipeline = 6 clock cycle

25% of the instructions incur 2 pipeline stall cycles means CPI = 3

Time with pipeline = 0.25*( 3 ) + 0.75*1 = 1.5 clock cycle

speed up = 6 clock cycle / 1.5 clock cycle   = 4
by Loyal (6.9k points)
+1
how did you get CPI as 3?
0
B/C it have 2 stalls
+7
For those who didn't get how CPI=3:

1    2    3   4   5    6   7    8    9

S1 S2 S3 S4 S5 S6

X    X  S1 S2 S3 S4 S5 S6

The first instruction takes 6 clock cycles. S1 stage of next instruction should start at clock cycle 2. But because of 2 stall cycles S1 stage starts at clock cycle 4. Thus 1st instruction will get completed at 6 , second instruction will get completed at 9 so on...

So CPI=3
0
very good explanation
+4

cpi=1+stall cycle

It was a numerical digit type question so answer must be 4.

As for 6 stages, non-pipelining takes 6 cycles.

There were 2 stall cycles for pipelining for 25% of the instructions

So pipe line time = (1+(25/100)*2) = 1.5

Speed up = Non pipeline time/Pipeline time = 6/1.5 = 4
by Boss (10k points)

Suppose 'n' instructions are there and n is very large...and Tp time of each cycle. Thus we are having 0.25n branched and 0.75n non-branched instructions

now 0.25n instructions create 2 stalls and thus the time taken is:

=    #f cycles * Tp

=   (6+0.25n-1 + 2*0.25n)Tp

=   (5+0.75n)Tp

Now similarly for non branch, time= (6+0.75n-1)Tp = (5+0.75n)Tp

thus total time taken in Pipeline =  (5+0.75n)Tp + (5+0.75n)Tp =(10 + 1.5n)Tp

For Non pipeline time taken for n instruction = #f cycles for per instruction * each cycle time * #f instructions

= 6*Tp*n = 6n*Tp

now, speedup achieved =Time(non-pipe)/Time(pipe)

= (6n Tp) / (10+1.5n)Tp

= 6n / (10+1.5n)

as  n → , taking limits,  speedup= 6/1.5 =4

by Active (1.9k points)
The question has been answered before. However, when I looked at it, I couldn't get it. So I am providing an alternate explanation here.

By the basic performance equation, we know that $$\text{Execution Time} = \frac{N \times S}{ R}$$ where N is the number of instructions, S is the average clock cycles taken per instruction and R is the number of clock cycles per second.

Another useful metric is throughput, which is the number of instructions executed per second. It is defined as:

$$\text{Throughput} = \frac{R}{S}$$.

Now in an non-pipelined processor, the number of stages is equal to the number of cycles i.e $S = \text{No. of stages}$

In a pipelined processor, the value of $S = 1$ because an instruction can enter the pipeline in every cycle.

However, this is the case for an ideal pipeline. In case of a pipeline with stall, the general formula is:

$$S = 1 + \delta_{stall} + \delta_{branch} + \delta_{cache}$$ where $\delta$ is the penalty associated with each of the misses.

Here, $\delta_{stall}$ is given as $0.25 \times 2$, hence the value of $S$ is $1+ 0.5 = 1.5$.

So according to the question, non-pipelined $T_{np} = R/6$ and for the pipelined, $T_{p} = R/1.5$.

From this, we can see that the speedup is $4$ times.
by Loyal (6.9k points)
In non-pipelined , for 100 instructions, we need 6*100 = 600 cycles. As 6 stages are there and no cycle time overhead.

Now in pipelined architecture we need = 6+ (100-1) = 105 cycles and 25 instructions additionally take 2 cycle stalls so 25*2 = 50 cycles more, so 105+50 = 155 cycles in pipelined architecture.

So speedup = 600/155 = 3.87 ~ 4.

4 is the ans.
by Active (1.2k points)