11,393 views
Consider a non-pipelined processor operating at $2.5$ GHz. It takes $5$ clock cycles to complete an instruction. You are going to make a $5$- stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at $2$ GHz. In a given program, assume that $30\%$ are memory instructions, $60 \%$ are ALU instructions and the rest are branch instructions. $5 \%$ of the memory instructions cause stalls of $50$ clock cycles each due to cache misses and $50 \%$ of the branch instructions cause stalls of $2$ cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (round off to $2$ decimal places) is_____________.

edited
Why for a cache miss the non-pipelined processor has no stalls, and why does a pipelined processor has a whopping 50 cycle stall?
The range for this Qn needs to be corrected. 2.16 is not accepted as correct answer.
@ gorgeus This is very great question not a dumb one. Please comment and let us know if you get it clarified.

Time taken by non-pipelined processor to finish executing the $n$ instructions $: \frac{5n}{2.5}=2n\;\text{ns}$

Now, for pipelined processor,

$\small \begin{array}{|c|c |c| c|} \hline \text{Instruction type} & \text{Number of such instructions} & \% \text{Causing stalls} & \text{Number of stall cycles} \\\hline \text{Memory} & 0.3n & 5\% \;\text{of}\; 0.3n & 50 \\\hline \text{ALU} & 0.6n & \text{None} & 0 \\\hline \text{Branch} & 0.1n & 50\% \;\text{of}\;0.1n & 2 \\\hline \end{array}$

Therefore, time taken by pipelined processor:

$0.6n(1) + 0.3n[0.05(1+50) + 0.95(1)] + 0.1n[0.5(1+2) + 0.5(1)]$ cycles

$= 1.85n$ cycles

$= \frac{1.85n}{2}\;\text{ns}$

$= 0.925n\;\text{ns}$

Speedup $= \frac{2n}{0.925n} = 2.162 \approx 2.16.$
by

@hollow_mind

How will pipeline stall when there is no pipeline?

In question (and in general)stalls are no ops done for instructions so it should be considered(as far as my knowledge is concerned).

What about the overhead due to the first 4 instructions? Why are we not considering the same?
Non-pipelined processor operates at 2.5GHz ==> T$_{Non-pipelined}=\frac{1}{2.5GHz}= 0.4 ns$

CPI of Non-pipelined processor = 5 clocks.

pipelined processor operates at 2GHz ==> T$_{Pipelined}=\frac{1}{2GHz}= 0.5 ns$

Memory instructions = 30 %

in those memory instructions 5 %, each creating stalls of 50 cycles

ALU instructions = 60%, these are creating no stalls

Branch instructions = 10%

in those branch instructions 50 %, each creating stalls of 2 cycles.

CPI of Pipelined processor = 1 + extra cycles. = 1 + stalls by memory instructions + stalls by Branch instructions.

= $1 + (\frac{30}{100}*\frac{5}{100}*50) + (\frac{10}{100}*\frac{50}{100}*2)$ = 1.85

Execution time = no.of instructions * No.of clock cycles * clock time = CPI * clock time.

Speed up achieved = $\frac{5*0.4}{1.85*0.5}$ = 2.16

sir can u explain why we use (1+50)?
This is the approach I was looking for. Thank you sir.:)

GopiChand

CPI(cycles per instruction) = (K+n-1)/n   (where K=stages , n= total number of instructions ) .Now , if there are some stall cycles due to memory or any other instructions then we have to add those extra cycles also   CPI= [(K+n-1)+x]/n  and considering n>>>>K  we will get  CPI=(n+x)/n and by simplifying further you will get  CPI=1+x/n (where x/n  refers to the number of instructions which are causing the stall cycles out of n instructions) so we can write it as f*c ( f=frequency of instructions and c=stall cycles caused by them). that's how the +1 came into the picture.

hope it helps:)

by