edited by
35,650 views
53 votes
53 votes

Consider a $4$ stage pipeline processor. The number of cycles needed by the four instructions $I1, I2, I3, I4$ in stages $S1, S2, S3, S4$ is shown below:

$$\begin{array}{|c|c|c|c|c|} \hline \textbf{} & \textbf {S1} &\textbf {S2} & \textbf {S3} &  \textbf{S4 } \\\hline \textbf{I1}& 2 & 1  & 1 & 1 \\\hline \textbf{I2} & 1 & 3 & 2  & 2\\\hline  \textbf{I3}& 2 & 1  & 1 & 3 \\\hline \textbf{I4} & 1 & 2 & 2  & 2 \\\hline  \end{array}$$

 

What is the number of cycles needed to execute the following loop?

For (i=1 to 2) {I1; I2; I3; I4;}

  1. $16$
  2. $23$
  3. $28$
  4. $30$
edited by

10 Answers

Best answer
70 votes
70 votes

Here bound of the loop are constants, therefore compiler will do the loop unrolling(If compiler won't then prefetcher will do) to increase the instruction level parallelism. And after loop unrolling $23$ cycles are required for execution. Therefore, correct answer would be (B).

PS: We assume the buffers between the pipeline stages can store multiple results in the form of a queue.

$$\tiny \begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline &C_1&C_2&C_3&C_4&C_5&C_6&C_7&C_8&C_9&C_{10}&C_{11}&C_{12}&C_{13}&C_{14}&C_{15}&C_{16}&C_{17}&C_{18}&C_{19}&C_{20}
&C_{21}&C_{22}&C_{23}\\\hline
\bf{I_1}&S_1&S_1&S_2&S_3&S_4\\\hline
\bf{I_2}&&&S_1&S_2&S_2&S_2&S_3& S_3&S_4&S_4\\\hline
\bf{I_3}&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4}&&&&&&S_1&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline
\bf{I_1}&&&&&&&S_1&S_1&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4\\\hline
\bf{I_2}&&&&&&&&&S_1&\color{red}{-}&S_2&S_2&S_2&S_3& S_3&\color{red}{-}&S_4&S_4\\\hline
\bf{I_3}&&&&&&&&&&S_1&S_1&\color{red}{-}&\color{red}{-}&S_2&\color{red}{-}&S_3&\color{red}{-}&\color{red}{-}&S_4& S_4&S_4\\\hline
\bf{I_4} &&&&&&&&&&&&S_1&\color{red}{-}&\color{red}{-}&S_2&S_2&S_3&S_3&\color{red}{-}&\color{red}{-}&\color{red}{-}&S_4&S_4\\\hline

\end{array} $$

edited by
22 votes
22 votes
this is the loop level level paralellism question but when we apply the loop level parallelism then we get 25 cycles so dis is not in option so we have to do it without loop level parallelism and the frst time loop output the result at 15 cc so total 30 cc for 2 iterations
18 votes
18 votes

When using the following timeline we get 23 cycles:

 

In the 6th cycle, instruction 4 is entering stage 1 while instruction 3 has not started stage 2. This requires additional output buffer for stage 1 and we can assume this is available. (This additional buffer is not required if all stage delays are the same).  

Hence , option (b) is true.

reshown by
8 votes
8 votes

Concept for solving this question:

Stage Si is allocated to instruction Ii iff

1. Eligibility: Ii must have completed Si-1 ... X time

2. Availability: Ii-1 must have released Si ... Y time

Time of allocation =  Max(X,Y)

In first iteration:

I1 completes after 5th cycle

I2 completes after 10th cycle

I3 completes after 13th cycle

I4 completes after 15th cycle

In second iteration

I1 completes after 16th cycle

I2 completes after 18th cycle

I3 completes after 21th cycle

I4 completes after 23th cycle

Answer : B

Answer:

Related questions

15.0k
views
2 answers
24 votes
Kathleen asked Sep 22, 2014
15,034 views
Consider a $4$-way set associative cache (initially empty) with total $16$ cache blocks. The main memory consists of $256$ blocks and the request for memory blocks are in...
16.1k
views
4 answers
26 votes
Kathleen asked Sep 22, 2014
16,116 views
A CPU generally handles an interrupt by executing an interrupt service routine:As soon as an interrupt is raised.By checking the interrupt register at the end of fetch cy...
1.6k
views
0 answers
2 votes
Rishabh Gupta 2 asked Nov 4, 2017
1,565 views
Consider this question and its selected answer: https://gateoverflow.in/3690/gate2004-it-47And this question: https://gateoverflow.in/1314/gate2009-28Both questions are s...
10.8k
views
5 answers
34 votes
go_editor asked Apr 23, 2016
10,790 views
A hard disk has $63$ sectors per track, $10$ platters each with $2$ recording surfaces and $1000$ cylinders. The address of a sector is given as a triple $\langle c, h, s...