Dark Mode

14,594 views

48 votes

Consider a $6$-stage instruction pipeline, where all stages are perfectly balanced. Assume that there is no cycle-time overhead of pipelining. When an application is executing on this $6$-stage pipeline, the speedup achieved with respect to non-pipelined execution if $25$% of the instructions incur $2$ pipeline stall cycles is ____________

http://web.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/pipe_title.html

but in this site they mentioned

If the stages are perfectly balanced, then the time per instruction on the pipelined machine is equal to __Time per instruction on nonpipelined machine__

Under these conditions, the speedup from pipelining equals the number of pipe stages.

2

Check this appendix taken from Hennessy and Patterson CO – https://www.cs.ucf.edu/~dcm/Teaching/CDA5106-Fall2015/Appendices/appendix_c.pdf.

The page C12 describes pipeline performance in case of stalling, and even talks about a perfectly balanced pipeline.

1

69 votes

Best answer

0

0

65 votes

Speed Up= 4

Speed up = Time without Pipeline / Time with Pipeline

Time without pipeline = 6 clock cycle

25% of the instructions incur 2 pipeline stall cycles means CPI = 3

Time with pipeline = 0.25*( 3 ) + 0.75*1 = 1.5 clock cycle

speed up = 6 clock cycle / 1.5 clock cycle = 4

Speed up = Time without Pipeline / Time with Pipeline

Time without pipeline = 6 clock cycle

25% of the instructions incur 2 pipeline stall cycles means CPI = 3

Time with pipeline = 0.25*( 3 ) + 0.75*1 = 1.5 clock cycle

speed up = 6 clock cycle / 1.5 clock cycle = 4

This is obviously one of best explanation but here is one flaw according to me as sequencing of these 25% instructions will also matter.

If all 25% instruction are at last then in that case those 25 % will get stablized and avg CPI will be 1 for them as well so total time they take effectively will be(for n instructions) = .75n*1 + 2(moving from normal to stalled instructions) + .25n*1 = n+2

for non pipelined = n*6

so speed up in that case = 6*n/(n+2) and for very large n that will be effectively ~6

but if these 25% instuctions are not together and there is 1 such instruction after 3 normal instructions in that case avg CPI for these 25% instructions will be = 3

in that case we can apply above formula...

please correct if i am wrong somewhere.

0

4 votes

The question has been answered before. However, when I looked at it, I couldn't get it. So I am providing an alternate explanation here.

By the basic performance equation, we know that $$ \text{Execution Time} = \frac{N \times S}{ R} $$ where N is the number of instructions, S is the average clock cycles taken per instruction and R is the number of clock cycles per second.

Another useful metric is throughput, which is the number of instructions executed per second. It is defined as:

$$\text{Throughput} = \frac{R}{S}$$.

Now in an non-pipelined processor, the number of stages is equal to the number of cycles i.e $S = \text{No. of stages}$

In a pipelined processor, the value of $S = 1$ because an instruction can enter the pipeline in every cycle.

However, this is the case for an ideal pipeline. In case of a pipeline with stall, the general formula is:

$$S = 1 + \delta_{stall} + \delta_{branch} + \delta_{cache}$$ where $\delta$ is the penalty associated with each of the misses.

Here, $\delta_{stall}$ is given as $0.25 \times 2$, hence the value of $S$ is $1+ 0.5 = 1.5$.

So according to the question, non-pipelined $T_{np} = R/6$ and for the pipelined, $T_{p} = R/1.5$.

From this, we can see that the speedup is $4$ times.

By the basic performance equation, we know that $$ \text{Execution Time} = \frac{N \times S}{ R} $$ where N is the number of instructions, S is the average clock cycles taken per instruction and R is the number of clock cycles per second.

Another useful metric is throughput, which is the number of instructions executed per second. It is defined as:

$$\text{Throughput} = \frac{R}{S}$$.

Now in an non-pipelined processor, the number of stages is equal to the number of cycles i.e $S = \text{No. of stages}$

In a pipelined processor, the value of $S = 1$ because an instruction can enter the pipeline in every cycle.

However, this is the case for an ideal pipeline. In case of a pipeline with stall, the general formula is:

$$S = 1 + \delta_{stall} + \delta_{branch} + \delta_{cache}$$ where $\delta$ is the penalty associated with each of the misses.

Here, $\delta_{stall}$ is given as $0.25 \times 2$, hence the value of $S$ is $1+ 0.5 = 1.5$.

So according to the question, non-pipelined $T_{np} = R/6$ and for the pipelined, $T_{p} = R/1.5$.

From this, we can see that the speedup is $4$ times.