Why for a cache miss the non-pipelined processor has no stalls, and why does a pipelined processor has a whopping 50 cycle stall?

Dark Mode

11,393 views

21 votes

Consider a non-pipelined processor operating at $2.5$ GHz. It takes $5$ clock cycles to complete an instruction. You are going to make a $5$- stage pipeline out of this processor. Overheads associated with pipelining force you to operate the pipelined processor at $2$ GHz. In a given program, assume that $30\%$ are memory instructions, $60 \%$ are ALU instructions and the rest are branch instructions. $5 \%$ of the memory instructions cause stalls of $50$ clock cycles each due to cache misses and $50 \%$ of the branch instructions cause stalls of $2$ cycles each. Assume that there are no stalls associated with the execution of ALU instructions. For this program, the speedup achieved by the pipelined processor over the non-pipelined processor (round off to $2$ decimal places) is_____________.

25 votes

Best answer

Time taken by non-pipelined processor to finish executing the $n$ instructions $: \frac{5n}{2.5}=2n\;\text{ns}$

Now, for pipelined processor,

$\small \begin{array}{|c|c |c| c|} \hline \text{Instruction type} & \text{Number of such instructions} & \% \text{Causing stalls} & \text{Number of stall cycles} \\\hline \text{Memory} & 0.3n & 5\% \;\text{of}\; 0.3n & 50 \\\hline \text{ALU} & 0.6n & \text{None} & 0 \\\hline \text{Branch} & 0.1n & 50\% \;\text{of}\;0.1n & 2 \\\hline \end{array}$

Therefore, time taken by pipelined processor:

$0.6n(1) + 0.3n[0.05(1+50) + 0.95(1)] + 0.1n[0.5(1+2) + 0.5(1)]$ cycles

$= 1.85n$ cycles

$= \frac{1.85n}{2}\;\text{ns}$

$= 0.925n\;\text{ns}$

Speedup $= \frac{2n}{0.925n} = 2.162 \approx 2.16.$

Now, for pipelined processor,

$\small \begin{array}{|c|c |c| c|} \hline \text{Instruction type} & \text{Number of such instructions} & \% \text{Causing stalls} & \text{Number of stall cycles} \\\hline \text{Memory} & 0.3n & 5\% \;\text{of}\; 0.3n & 50 \\\hline \text{ALU} & 0.6n & \text{None} & 0 \\\hline \text{Branch} & 0.1n & 50\% \;\text{of}\;0.1n & 2 \\\hline \end{array}$

Therefore, time taken by pipelined processor:

$0.6n(1) + 0.3n[0.05(1+50) + 0.95(1)] + 0.1n[0.5(1+2) + 0.5(1)]$ cycles

$= 1.85n$ cycles

$= \frac{1.85n}{2}\;\text{ns}$

$= 0.925n\;\text{ns}$

Speedup $= \frac{2n}{0.925n} = 2.162 \approx 2.16.$

0

35 votes

Non-pipelined processor operates at 2.5GHz ==> T$_{Non-pipelined}=\frac{1}{2.5GHz}= 0.4 ns $

CPI of Non-pipelined processor = 5 clocks.

pipelined processor operates at 2GHz ==> T$_{Pipelined}=\frac{1}{2GHz}= 0.5 ns $

Memory instructions = 30 %

in those memory instructions 5 %, each creating stalls of 50 cycles

ALU instructions = 60%, these are creating no stalls

Branch instructions = 10%

in those branch instructions 50 %, each creating stalls of 2 cycles.

CPI of Pipelined processor = 1 + extra cycles. = 1 + stalls by memory instructions + stalls by Branch instructions.

= $1 + (\frac{30}{100}*\frac{5}{100}*50) + (\frac{10}{100}*\frac{50}{100}*2)$ = 1.85

Execution time = no.of instructions * No.of clock cycles * clock time = CPI * clock time.

Speed up achieved = $\frac{5*0.4}{1.85*0.5}$ = 2.16

CPI of Non-pipelined processor = 5 clocks.

pipelined processor operates at 2GHz ==> T$_{Pipelined}=\frac{1}{2GHz}= 0.5 ns $

Memory instructions = 30 %

in those memory instructions 5 %, each creating stalls of 50 cycles

ALU instructions = 60%, these are creating no stalls

Branch instructions = 10%

in those branch instructions 50 %, each creating stalls of 2 cycles.

CPI of Pipelined processor = 1 + extra cycles. = 1 + stalls by memory instructions + stalls by Branch instructions.

= $1 + (\frac{30}{100}*\frac{5}{100}*50) + (\frac{10}{100}*\frac{50}{100}*2)$ = 1.85

Execution time = no.of instructions * No.of clock cycles * clock time = CPI * clock time.

Speed up achieved = $\frac{5*0.4}{1.85*0.5}$ = 2.16

CPI**(cycles per instruction)** = **(K+n-1)/n** ** (where K=stages , n= total number of instructions ) .**Now , if there are some stall cycles due to memory or any other instructions then we have to add those extra cycles also

hope it helps:)

1