edited by
23,046 views
56 votes
56 votes

Following table indicates the latencies of operations between the instruction producing the result and instruction using the result.
$$\begin{array}{|l|l|c|} \hline \textbf {Instruction producing the result} &  \textbf{Instruction using the result }& \textbf{Latency} \\\hline \text{ALU Operation} & \text{ALU Operation} & 2 \\\hline \text{ALU Operation} & \text{Store} & \text{2}\\\hline \text{Load} & \text{ALU Operation} & \text{1}\\\hline \text{Load} & \text{Store} & \text{0} \\\hline \end{array}$$

Consider the following code segment:

 Load R1, Loc 1; Load R1 from memory location Loc1
 Load R2, Loc 2; Load R2 from memory location Loc 2
 Add R1, R2, R1; Add R1 and R2 and save result in R1
 Dec R2;         Decrement R2
 Dec R1;         Decrement R1
 Mpy R1, R2, R3; Multiply R1 and R2 and save result in R3
 Store R3, Loc 3; Store R3 in memory location Loc 3


What is the number of cycles needed to execute the above code segment assuming each instruction takes one cycle to execute?

  1. $7$
  2. $10$
  3. $13$
  4. $14$
edited by

5 Answers

Best answer
184 votes
184 votes

In the given question there are $7$ instructions each of which takes $1$ clock cycle to complete. (Pipelining may be used)
If an instruction is in execution phase and any other instructions can not be in the execution phase. So, at least $7$ clock cycles will be taken.
Now, it is given that between two instructions latency or delay should be there based on their operation. Ex- $1^{st}$ line of the table says that between two operations in which first is producing the result of an ALU operation and the $2^{nd}$ is using the result there should be a delay of $2$ clock cycles.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c||} \hline \text {Clock cycle} & \text{T1} & \text{T2}&  \text{T3 }& \text{T4} & \text{T5} & \text{T6} & \text{T7}&  \text{T8 }& \text{T9} & \text{T10} & \text{T11} & \text{T12}&  \text{T13 }\\\hline & \text{I1} & \text{I2} &  & \text{I3}&  \text{I4 } & & \text{I5} & & & \text{I6} & & & \text{I7}\\\hline \end{array}$$

 

  1. Load R1, Loc 1; Load R1 from memory location Loc1
    Takes 1 clock cycle, simply loading R1 on loc1.
     
  2. Load R2, Loc 2; Load R2 from memory location Loc2
    Takes 1 clock cycle, simply loading r2 on loc2.
     
  3. (Add R1, R2, R1; Add R1 and R2 and save result in R1
    R1=R1+R2;
    Hence, this instruction is using the result of R1 and R2, i.e. result of Instruction 1 and Instruction 2.
    As instruction 1 is load operation and instruction 3 is ALU operation. So, there should be a delay of 1 clock cycle between instruction 1 and instruction 3, which is already there due to I2.
    As instruction 2 is load operation and instruction 3 is ALU operation. So, there should be a delay of 1 clock cycle between instruction 2 and instruction 3.
     
  4. Dec R2; Decrement R2
    This instruction is dependent on instruction 2 and there should be a delay of one clock cycle between Instruction 2 and
    Instruction 4. As instruction 2 is load and 4 is ALU, which is already there due to Instruction 3.
     
  5. Dec R1 Decrement R1
    This instruction is dependent on Instruction 3
    As Instruction I3 is ALU and I5 is also ALU so a delay of 2 clock cycles should be there between them of which 1 clock cycle delay is already there due to I4 so one clock cycle delay between I4 and I5.
     
  6. MPY R1, R2, R3; Multiply R1 and R2 and save result in R3
    R3=R1*R2;
    This instruction uses the result of Instruction 5, as both instruction 5 and 6 are ALU so there should be a delay of 2 clock cycles.
  7. Store R3, Loc 3 Store R3 in memory location Loc3
    This instruction is dependent on instruction 6 which is ALU and instruction 7 is store so there should be a delay of 2 clock cycles between them.

Hence, a total of 13 clock cycles will be there.

Correct Answer: $C$

edited by
19 votes
19 votes

Answer is (C)

Here each instruction takes $1$ cycle but apart from that we have to consider latencies b/w
instruction: If there are two ALU operations by $I1$ and $I2$ such that $I2$ uses the value
produced by $I1$ in some register then $I2$ will be executed ONLY after waiting TWO more
cycles after $I1$ has executed because latency b/w two ALU operations is $2$

See here:

Clock         1       2         3         4         5            6           7           8         9          10           11          12           13             14

Inst.           I1       I2       -           I3       I4           -           I5           -          -             I6            -             -              I7

$I3$ is ALU operation which uses the result of LOAD in $I2,$ so latency is of $1$ cycle.

$I5$ is ALU operation using result of ALU in I3, therefore, has to wait for $2$ cycles after $I3$

$I6$ is ALU and uses result of ALU in I5, therefore waits for $2$ cycles

edited by
2 votes
2 votes

As per the given table and the assumption that 
The instruction of Type R1 <- R1 + R2 requires 
2 x (load - alu op) type

1 x (alu op - store) type

Then the answer should be 14

Analysis:
Instr | Cost
==== ====
1. 0
2. 0
3. 1+1+2 = 4
4. 1+2 = 3
5. 1+2 = 3
6. 1+1+2 = 4
7. 0

AND 
if, R1 <- R1+R2 
is only an (alu-store) type.. then intrs. 3 and 6 take 2 time units each
resulting in the answer as 10..

Unless, somebody presents a different interpretation.. 
smiley

Answer:

Related questions

30 votes
30 votes
2 answers
4
go_editor asked Apr 23, 2016
9,840 views
Consider the following program segment. Here $\text{R1, R2}$ and $\text{R3}$ are the general purpose registers.$$\begin{array}{|l|l|l|c|} \hline & \text {Instruction} & \...