5.2k views

The following table indicates the latencies of operations between the instruction producing the result and the instruction using the result.
$$\begin{array}{|l|l|c|} \hline \textbf {Instruction producing the result} & \textbf{Instruction using the result }& \textbf{Latency} \\\hline \text{ALU Operation} & \text{ALU Operation} & 2 \\\hline \text{ALU Operation} & \text{Store} & \text{2}\\\hline \text{Load} & \text{ALU Operation} & \text{1}\\\hline \text{Load} & \text{Store} & \text{0} \\\hline \end{array}$$

Consider the following code segment:

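Load R1, Loc 1;  Load R1 from memory location Loc 1
Load R2, Loc 2;  Load R2 from memory location Loc 2
Add R1, R2, R1;  Add R1 and R2 and save result in R1
Dec R2;          Decrement R2
Dec R1;          Decrement R1
Mpy R1, R2, R3;  Multiply R1 and R2 and save result in R3
Store R3, Loc 3; Store R3 in memory location Loc 3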

What is the number of cycles needed to execute the above code segment assuming each instruction takes one cycle to execute?

1. $7$
2. $10$
3. $13$
4. $14$

0
What is the answer? 13 or 14?
0
Instruction I4 is a decrement, i.e. an arithmetic operation. It is a case of Load followed by an ALU operation, so the latency is 1.

You haven't included that.
+3

No. The explanation given above is correct, and the answer is 13 only.

See:

I3 is an ALU operation that uses the result of the Load in I2, so the latency is 1 cycle (after I2).

I4 is an ALU operation that uses the result of the Load in I2, so the latency is 1 cycle (after I2). But I3 is executing at cycle 4, so I4 executes at cycle 5.

I5 is an ALU operation that uses the result of the ALU operation in I3, so it has to wait 2 cycles after I3.

I6 is an ALU operation that uses the result of the ALU operation in I5, so it waits 2 cycles.

In the given question there are $7$ instructions, each of which takes $1$ clock cycle to complete. (Pipelining may be used.)
While one instruction is in the execution phase, no other instruction can be in the execution phase, so at least $7$ clock cycles are needed.
In addition, a latency (delay) is required between two dependent instructions, based on their operations. For example, the $1^{st}$ row of the table says that when an ALU operation produces a result and a $2^{nd}$ ALU operation uses it, there must be a delay of $2$ clock cycles between them.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline \text{Clock cycle} & \text{T1} & \text{T2} & \text{T3} & \text{T4} & \text{T5} & \text{T6} & \text{T7} & \text{T8} & \text{T9} & \text{T10} & \text{T11} & \text{T12} & \text{T13} \\\hline \text{Instruction} & \text{I1} & \text{I2} & & \text{I3} & \text{I4} & & \text{I5} & & & \text{I6} & & & \text{I7} \\\hline \end{array}$$

3. Add R1, R2, R1; Add R1 and R2 and save result in R1
R1=R1+R2;
This instruction uses the values of R1 and R2, i.e. the results of Instruction 1 and Instruction 2.
Instruction 1 is a Load and Instruction 3 is an ALU operation, so there should be a delay of 1 clock cycle between Instruction 1 and Instruction 3, which is already there due to I2.
Instruction 2 is a Load and Instruction 3 is an ALU operation, so there should be a delay of 1 clock cycle between Instruction 2 and Instruction 3.

4. Dec R2; Decrement R2
This instruction is dependent on Instruction 2. As Instruction 2 is a Load and Instruction 4 is an ALU operation, there should be a delay of one clock cycle between Instruction 2 and Instruction 4, which is already there due to Instruction 3.

5. Dec R1; Decrement R1
This instruction is dependent on Instruction 3.
As I3 is an ALU operation and I5 is also an ALU operation, there should be a delay of 2 clock cycles between them. One of those cycles is already covered by I4, so one more clock cycle of delay is needed between I4 and I5.

6. Mpy R1, R2, R3; Multiply R1 and R2 and save result in R3
R3=R1*R2;
This instruction uses the result of Instruction 5. As both Instruction 5 and Instruction 6 are ALU operations, there should be a delay of 2 clock cycles between them.

7. Store R3, Loc 3; Store R3 in memory location Loc 3
This instruction is dependent on Instruction 6, which is an ALU operation, and Instruction 7 is a Store, so there should be a delay of 2 clock cycles between them.

Hence, a total of 13 clock cycles are needed.
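
The schedule above can be checked with a small simulation. This is only a sketch under the assumptions used in this answer: instructions issue in order, at most one per cycle, and a consumer waits for the table latency after the most recent producer of each register it reads. The names `LATENCY`, `PROGRAM` and `issue_cycles` are made up for illustration, and I1 and I2 are taken to be the two Loads into R1 and R2 that the explanation refers to.

```python
# Latency table from the question: (producer kind, consumer kind) -> delay cycles.
LATENCY = {
    ("alu", "alu"): 2,
    ("alu", "store"): 2,
    ("load", "alu"): 1,
    ("load", "store"): 0,
}

# (kind, destination register or None, source registers), in program order.
PROGRAM = [
    ("load", "R1", []),             # I1: load R1
    ("load", "R2", []),             # I2: load R2
    ("alu", "R1", ["R1", "R2"]),    # I3: Add R1, R2, R1
    ("alu", "R2", ["R2"]),          # I4: Dec R2
    ("alu", "R1", ["R1"]),          # I5: Dec R1
    ("alu", "R3", ["R1", "R2"]),    # I6: Mpy R1, R2, R3
    ("store", None, ["R3"]),        # I7: Store R3
]


def issue_cycles(program, latency):
    """Return the cycle in which each instruction executes (1-based)."""
    last_writer = {}  # register -> (cycle, kind) of its most recent producer
    cycles = []
    for kind, dest, sources in program:
        # In-order, one instruction per cycle.
        earliest = cycles[-1] + 1 if cycles else 1
        for reg in sources:
            if reg in last_writer:
                produced_at, producer_kind = last_writer[reg]
                delay = latency.get((producer_kind, kind), 0)
                earliest = max(earliest, produced_at + 1 + delay)
        cycles.append(earliest)
        if dest is not None:
            last_writer[dest] = (earliest, kind)
    return cycles


schedule = issue_cycles(PROGRAM, LATENCY)
print(schedule)        # cycle of I1..I7
print(schedule[-1])    # total number of cycles
```

Under these assumptions the last instruction lands in cycle 13, matching the table of clock cycles above.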

Correct Answer: $C$

by Loyal (6.2k points)
edited
+1
well explained!!
0
Best explanation!!
+8
I just have one doubt: why are the Decrement instructions considered ALU instructions?

Increment and Decrement can also be done using counter circuits without the need of using the ALU for the same.
0
Wonderful explanation
0
Greatly explained! Thank you
+1
If DEC were not considered an ALU operation, the answer would be 10: there would be no latency between DEC and MPY, which reduces the answer by 2, and the additional delay that was added between the two DEC instructions could also be removed.
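
A quick way to check this claim is to rerun the schedule with DEC given a kind that has no entry in the latency table. This is a sketch only: treating every pair missing from the table as latency 0 is an assumption made here (the question does not say so), and `total_cycles` and the `"counter"` kind are hypothetical names.

```python
# Latencies from the question's table. Any pair NOT listed is treated as 0,
# which is an assumption of this sketch, not something the question states.
LATENCY = {("alu", "alu"): 2, ("alu", "store"): 2,
           ("load", "alu"): 1, ("load", "store"): 0}


def total_cycles(dec_kind):
    """Total cycles for the 7-instruction segment, with Dec classified as dec_kind."""
    program = [
        ("load", "R1", []),              # I1
        ("load", "R2", []),              # I2
        ("alu", "R1", ["R1", "R2"]),     # I3: Add
        (dec_kind, "R2", ["R2"]),        # I4: Dec R2
        (dec_kind, "R1", ["R1"]),        # I5: Dec R1
        ("alu", "R3", ["R1", "R2"]),     # I6: Mpy
        ("store", None, ["R3"]),         # I7: Store
    ]
    writer, cycle = {}, 0
    for kind, dest, sources in program:
        start = cycle + 1  # in order, one instruction per cycle
        for reg in sources:
            if reg in writer:
                at, producer = writer[reg]
                start = max(start, at + 1 + LATENCY.get((producer, kind), 0))
        cycle = start
        if dest is not None:
            writer[dest] = (cycle, kind)
    return cycle


print(total_cycles("alu"))      # Dec treated as an ALU operation
print(total_cycles("counter"))  # Dec treated as a non-ALU operation
```

With Dec as an ALU operation this gives 13, and with Dec outside the table it gives 10, which agrees with the comment above.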
0
what a lovely explanation
0
Why aren't we considering dependencies here? There is a WAR dependency between I3 and I4! @madhab
0
The first two instructions are a Load and a Store, which have a latency of 0 according to the table given in the question. But in your explanation you have taken the latency as 1. Why?

+1
Please help. The latency between Instruction 4 and Instruction 2 is 1, I got that. But if you look at the diagram, there is a difference of 2 cycles (T3 and T4) between I4 and I2. Please tell me where I am going wrong.
+1

It is already mentioned in the question that each instruction takes one cycle to execute.

So, for those instructions, you must add one more clock cycle to whatever latency you got from the table.

0

@Mostafize Mondal So what you are saying is that T4 is not counted as latency for Instruction 4. If that is the case, how would you define the number of cycles between Instruction 5 and Instruction 3? Is it 1 or 2?

Pictorially, the difference in the number of cycles between Instruction 2 and Instruction 4 is similar to that between Instruction 3 and Instruction 5, yet the latencies are 1 and 2 respectively.

Please help.

Thanks.

0

Instruction 4 is dependent on Instruction 2. As Instruction 2 is a Load and Instruction 4 is an ALU operation, there should be a delay of one clock cycle between those instructions, which is already there due to Instruction 3.

We know that the CPU executes instructions one by one. Because of pipelining, some work can be done in parallel. So, Instruction 4 will complete after Instruction 3 completes.

0
Amazing Explanation :)
0
Beautiful explanation, loved it.
0
Why do we consider DEC as an ALU operation?
0
For I3, why are we not considering a delay of 2 more cycles, since we are performing an ALU operation and then storing the result? According to the table, we need a 2-cycle delay between them.

Here each instruction takes $1$ cycle, but apart from that we have to consider the latencies between instructions. If two ALU operations $I_a$ and $I_b$ are such that $I_b$ uses the value produced by $I_a$ in some register, then $I_b$ will be executed ONLY after waiting TWO more cycles after $I_a$ has executed, because the latency between two ALU operations is $2$.

See here:

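$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline \text{Clock} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 \\\hline \text{Inst.} & \text{I1} & \text{I2} & - & \text{I3} & \text{I4} & - & \text{I5} & - & - & \text{I6} & - & - & \text{I7} \\\hline \end{array}$$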

$I3$ is an ALU operation that uses the result of the Load in $I2$, so the latency is $1$ cycle.

$I5$ is an ALU operation that uses the result of the ALU operation in $I3$, so it has to wait for $2$ cycles after $I3$.

$I6$ is an ALU operation that uses the result of the ALU operation in $I5$, so it waits for $2$ cycles.
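
The waiting rule used here can also be written as a recurrence. The notation is introduced only for this sketch: $t_k$ is the cycle in which instruction $k$ executes, $P(k)$ is the set of instructions whose results instruction $k$ reads, and $L(p,k)$ is the table latency between producer $p$ and consumer $k$. In-order execution with one instruction per cycle is assumed.

$$t_1 = 1, \qquad t_k = \max\Bigl(t_{k-1}+1,\ \max_{p \in P(k)} \bigl(t_p + 1 + L(p,k)\bigr)\Bigr)$$

$$\begin{array}{rcl} t_2 & = & 2 \\ t_3 & = & \max(3,\ 1+1+1,\ 2+1+1) = 4 \\ t_4 & = & \max(5,\ 2+1+1) = 5 \\ t_5 & = & \max(6,\ 4+1+2) = 7 \\ t_6 & = & \max(8,\ 7+1+2,\ 5+1+2) = 10 \\ t_7 & = & \max(11,\ 10+1+2) = 13 \end{array}$$

So the last instruction executes in cycle $13$.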

by Loyal (7.8k points)
edited
0
Increase the font size of your answer, it will look better.
+1
The calculation is not clear.
0
I3 is an ALU operation that uses the result of the Load in I2, hence the latency should be two cycles.

I4 is also an ALU operation that uses the result of the Load in I2, hence latency 2.

I5 is ALU to ALU, hence 2.

I6: latency 2.

I7: latency 1.

In this way the answer comes out to be 16.

Can you please clear my doubt, sir?

As per the given table, and assuming that an instruction of the type R1 <- R1 + R2 requires
2 x (Load to ALU operation) latency and

1 x (ALU operation to Store) latency,

the answer should be 14.

Analysis:
Instr | Cost
==== ====
1. 0
2. 0
3. 1+1+2 = 4
4. 1+2 = 3
5. 1+2 = 3
6. 1+1+2 = 4
7. 0

And if R1 <- R1+R2 is only of the (ALU operation to Store) type, then instructions 3 and 6 take 2 time units each,
resulting in an answer of 10.

Unless somebody presents a different interpretation.

by (361 points)
0
I am not sure between (A) and (B). If we go by the table, it comes out to be 10 clocks, assuming 1 latency is 1 clock.

Next, it is clearly given: "Assuming each instruction takes one cycle to execute". There are 7 instructions here, so 7 cycles, isn't it? So which should we follow, the table or the given statement?
0

by Junior (655 points)