A $5$ stage pipelined CPU has the following sequence of stages:

• IF – instruction fetch from instruction memory
• RD – Instruction decode and register read
• EX – Execute: ALU operation for data and address computation
• MA – Data memory access – for write access, the register read at RD state is used.
• WB – Register write back

Consider the following sequence of instructions:

• $I_1$: $L$ $R0, loc1$; $R0 \Leftarrow M[loc1]$
• $I_2$: $A$ $R0$, $R0$; $R0 \Leftarrow R0 +R0$
• $I_3$: $S$ $R2$, $R0$; $R2 \Leftarrow R2-R0$

Let each stage take one clock cycle.

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of $I_1$?

1. $8$
2. $10$
3. $12$
4. $15$

A) Without operand forwarding

1) Split-phase: clock cycles $= 11$

2) Without split-phase: clock cycles $= 13$

B) With operand forwarding

Clock cycles $= 8$
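
As a rough cross-check of these three counts, assume the usual ideal-pipeline count of $k + (n-1)$ cycles for $k$ stages and $n$ instructions, plus the stall cycles of $I_2$ and $I_3$. The formula is not stated in the question; the stall counts are the ones argued for in this thread.

$$\begin{array}{lll} \text{No forwarding, without split-phase:} & 5 + (3-1) + (3 + 3) & = 13 \\ \text{No forwarding, with split-phase:} & 5 + (3-1) + (2 + 2) & = 11 \\ \text{With forwarding:} & 5 + (3-1) + 1 & = 8 \end{array}$$
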
@KUSHAGRA गुप्ता So what should we assume in such questions where nothing is mentioned: operand forwarding or no operand forwarding?

Go for the best case!

Without data forwarding:

13 clock cycles: the WB and RD stages do not overlap.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline & \textbf{T1} & \textbf{T2} & \textbf{T3} & \textbf{T4} & \textbf{T5} & \textbf{T6} & \textbf{T7} & \textbf{T8} & \textbf{T9} & \textbf{T10} & \textbf{T11} & \textbf{T12} & \textbf{T13} \\\hline I_1 & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & & & & & \\\hline I_2 & & \text{IF} & & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & \\\hline I_3 & & & & & & \text{IF} & & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

Here, the WB and RD stages operate in non-overlapping mode.

11 clock cycles: the WB and RD stages overlap.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} \hline & \textbf{T1} & \textbf{T2} & \textbf{T3} & \textbf{T4} & \textbf{T5} & \textbf{T6} & \textbf{T7} & \textbf{T8} & \textbf{T9} & \textbf{T10} & \textbf{T11} \\\hline I_1 & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & & & \\\hline I_2 & & \text{IF} & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & \\\hline I_3 & & & & & \text{IF} & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

Split-phase access between WB and RD means:

The WB stage writes its result on the rising edge of the clock, and the RD stage reads it on the falling edge of the same cycle.

The question mentions:

for write access, the register read at RD state is used.

This means that when writing an operand back to memory, the register value read in the RD stage is used (no operand forwarding for STORE instructions).

Note

• As with any question in any subject, unless otherwise stated we always consider the best case. So do overlap, unless stated otherwise. But this applies only to WB/RD.
1. Why is there a stall for I2 in T3 and T4?
RD is instruction decode and register read. If we perform RD of I2 in T3, the data from memory has not yet been written to R0, so the proper operands are not available at T3. I2 has to wait until I1 writes the loaded value to R0 in its WB stage.
2. Why do WB of I1 and RD of I2 operate in the same clock cycle?
If nothing is mentioned in the question, this scenario is assumed by default. The loaded value is ready after MA, so WB can write R0 in the first half of the cycle and RD can read it in the second half; hence RD and WB can overlap.
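
The two timing tables above can be reproduced with a minimal Python sketch. It assumes an in-order pipeline in which an instruction can enter a stage only once the previous instruction has moved on to the next one, and in which a dependent instruction performs RD only after the producer's WB (in the same cycle with split-phase, one cycle later without). The function name `schedule_no_forwarding` and the `(dest, sources)` encoding of the instructions are illustrative, not part of the question.

```python
STAGES = ["IF", "RD", "EX", "MA", "WB"]

def schedule_no_forwarding(instrs, split_phase):
    """Return one {stage: cycle} dict per instruction, in program order.

    instrs is a list of (dest_register, source_registers) pairs.
    """
    gap = 0 if split_phase else 1   # split-phase: RD may share the WB cycle
    timeline = []
    last_wb = {}                    # register -> WB cycle of its latest writer
    for dest, sources in instrs:
        prev = timeline[-1] if timeline else None
        cycles = {}
        for k, stage in enumerate(STAGES):
            # Each stage follows this instruction's previous stage.
            earliest = 1 if k == 0 else cycles[STAGES[k - 1]] + 1
            if prev is not None:
                # A stage is free once the previous instruction has moved on.
                if k + 1 < len(STAGES):
                    earliest = max(earliest, prev[STAGES[k + 1]])
                else:
                    earliest = max(earliest, prev[stage] + 1)
            if stage == "RD":
                # No forwarding: wait for the producer's write-back.
                for reg in sources:
                    if reg in last_wb:
                        earliest = max(earliest, last_wb[reg] + gap)
            cycles[stage] = earliest
        last_wb[dest] = cycles["WB"]
        timeline.append(cycles)
    return timeline

# I1: L R0, loc1   I2: A R0, R0   I3: S R2, R0
program = [("R0", []), ("R0", ["R0"]), ("R2", ["R2", "R0"])]
total_plain = schedule_no_forwarding(program, split_phase=False)[-1]["WB"]
total_split = schedule_no_forwarding(program, split_phase=True)[-1]["WB"]
print(total_plain, total_split)  # 13 11
```

The last instruction's WB cycle is the total cycle count, since counting starts at the fetch of I1 in cycle 1.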

With data forwarding

(This should be the case here, as the question only rules out operand forwarding for the register that a STORE instruction writes to memory.)

8 clock cycles

1. Why is there a stall for I2 in T4?
Data is forwarded from MA of I1 to EX of I2. The MA operation of I1 must complete before the correct data is available for forwarding.
2. Why is RD of I2 in T3? Will it not fetch incorrect information if executed before the operand is forwarded from MA of I1?
Yes. RD of I2 will definitely fetch INCORRECT data at T3. But don't worry about it: the operand forwarding technique takes care of it.
3. Why can't RD of I2 be placed in T4?
We can place RD of I2 in T4 as well, but what is the fun in that? Pipelining is a technique used to reduce the execution time of instructions, so why add an extra stall? Moreover, there is one more problem, discussed in the next point. After reading it, just think about what would happen if we had created a stall at T3!
4. Why can't RD of I3 be placed at T4?
This cannot be done. I3 cannot use RD because the previous instruction I2 must start its next stage (EX) before the current instruction (I3) can use the RD stage. This is because I2's data is still residing in the buffers.
5. Can an operand be forwarded from one clock cycle to the same clock cycle?
No, the previous clock cycle must complete before the data is forwarded, unless the split-phase technique is used.
6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
This is not possible. Look at what I1 does: it is a memory read, so the data is available only after the memory access. So data cannot be forwarded from EX of I1.
7. In some cases data is forwarded from MA and in some cases from EX. Why is that?
Data is forwarded as soon as it is ready, which depends solely on the type of instruction.
8. When to use split-phase?
We can use split-phase if the data is readily available, as between WB/RD, and also when operand forwarding happens from EX to ID, but not from EX to EX. We cannot do split-phase access between EX-EX because the instruction execution may not complete in the first phase. (This is not mentioned in any standard resource but was said by Arjun Suresh, considering practical implementation and how previous years' GATE questions have been framed.)

[Mostly the question states that there is operand forwarding from stage A to stage B, e.g. https://gateoverflow.in/8218/gate2015-2_44 ]

Split-phase can be used even when there is no operand forwarding, because the two are unrelated.
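
The rules in the points above can be sketched in Python under the following assumptions: a load's result can be forwarded from the end of its MA stage, an ALU result from the end of its EX stage, the consumer's EX starts in the next cycle at the earliest (no split-phase into EX), and an instruction can enter a stage only after the previous instruction has moved on. The function name `schedule_with_forwarding` and the `(kind, dest, sources)` tuples are illustrative.

```python
STAGES = ["IF", "RD", "EX", "MA", "WB"]

def schedule_with_forwarding(instrs):
    """Return one {stage: cycle} dict per instruction, in program order.

    instrs is a list of (kind, dest_register, source_registers) tuples,
    where kind is "load" or "alu".
    """
    timeline = []
    ready = {}  # register -> cycle in which its newest value is produced
    for kind, dest, sources in instrs:
        prev = timeline[-1] if timeline else None
        cycles = {}
        for k, stage in enumerate(STAGES):
            earliest = 1 if k == 0 else cycles[STAGES[k - 1]] + 1
            if prev is not None:
                # A stage is free once the previous instruction has moved on.
                if k + 1 < len(STAGES):
                    earliest = max(earliest, prev[STAGES[k + 1]])
                else:
                    earliest = max(earliest, prev[stage] + 1)
            if stage == "EX":
                # Forwarded operands arrive one cycle after they are produced.
                for reg in sources:
                    if reg in ready:
                        earliest = max(earliest, ready[reg] + 1)
            cycles[stage] = earliest
        # Loads produce their value in MA, ALU instructions in EX.
        ready[dest] = cycles["MA"] if kind == "load" else cycles["EX"]
        timeline.append(cycles)
    return timeline

# I1: L R0, loc1   I2: A R0, R0   I3: S R2, R0
program = [("load", "R0", []), ("alu", "R0", ["R0"]), ("alu", "R2", ["R2", "R0"])]
forwarded = schedule_with_forwarding(program)
print(forwarded[-1]["WB"])  # 8
```

In this sketch I2 gets RD in T3 and EX in T5 (one stall in T4), and I3 gets RD in T5, which matches points 1 and 4.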


I have a doubt regarding the 8th point in the answer.

It's true that EX-EX operand forwarding can't use split-phase. But I think it's not because of this:

“We cannot do split phase access between EX-EX because here the instruction execution may not be possible in the first phase”.

The reason is probably that you can't do split-phase when forwarding from any stage to EX: in split-phase the write happens in the first phase and the read in the second phase, so there is no time left to execute the instruction after reading the forwarded operand in the second phase. I think that is also why we don't use split-phase for MA → EX.

@fred20978 yes, that’s what the other sentence is also trying to convey.


@Deepak Poonia @gatecse Sir is there split phase between $I_1\text{ WB}$ and $I_3 \text{ RD}$? Because buffer value will change after $I_2$ executes its $\text{EX}$ phase, so how data is forwarded from $I_1 \text{ WB}$ to $I_3 \text{ EX}$?

"For write access the register read at RD stage is used": this means that a STORE instruction cannot receive a forwarded operand and uses only the value read in the RD stage. So, we can assume data forwarding is possible for all other instructions.

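$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline & \textbf{T1} & \textbf{T2} & \textbf{T3} & \textbf{T4} & \textbf{T5} & \textbf{T6} & \textbf{T7} & \textbf{T8} \\\hline I_1 & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & \\\hline I_2 & & \text{IF} & \text{RD} & & \text{EX} & \text{MA} & \text{WB} & \\\hline I_3 & & & \text{IF} & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

MA -> EX forwarding is done between I1 and I2.

EX -> EX forwarding is done between I2 and I3.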

http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html

@arjun sir, is operand forwarding assumed by default?

@sushmita Yeah, as nothing is mentioned in the question, which technique should be used?

Hi Arjun Sir,

Instead of forwarding R0's value from EX stage of I2 to EX stage of I3, is it not possible to use split phase technique to forward R0's value from EX stage of I2 to RD stage of I3 in the 5th clock cycle T5 itself?

$8$ cycles required with operand forwarding.

It is not given that the RD and WB stages can overlap.

I feel Hamacher's book says something different about point 6 in the answer:

6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
This is not possible. Look at what I1 does: it is a memory read, so the data is available only after the memory access. So data cannot be forwarded from EX of I1.

In 6th edition of Hamacher's book, section 6.4. Data Dependencies, it gives following instructions:

Add R2, R3, #100
Subtract R9, R2, #30

Without operand forwarding, the solution is given as follows:

With operand forwarding, it is given as follows:

The author says the ALU's output can be fed back to its input to achieve this.

Based on this, I feel it should be possible for EX of I2 to execute in T4. What am I missing here?

I1: R0 <= M[loc1]

Let's break it down into all the stages.

While the given program was executing on the pipelined processor, at some point the PC got the address of the above instruction, while a previous instruction (say I0) was still executing.

Now it is I1's turn.

First, in the IF cycle, the instruction is fetched from the memory location pointed to by the PC.

Now imagine, hypothetically,

that the instruction was 0-101-11-10 (say, an 8-bit instruction word).

In the RD phase it is decoded like this:

0 means direct addressing, i.e. loc1 = 10

11 means register R0

10 is the address of loc1 (we still don't know the data in loc1)

No register needs to be read for this instruction, as you can see.

In the EX phase, if it were an indirect, indexed or relative address, the effective address would be computed.

We still don't know the data in loc1.

In the MA phase the actual load happens:

the data at memory location loc1 (i.e. 10) is read for R0.

So the correct value for R0 is available only after the MA phase.

That's why the operand is forwarded from MA and not from EX.
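
The hypothetical encoding in this comment can be illustrated with a small Python sketch. The field layout (1 mode bit, a 3-bit field assumed to be the opcode, 2 register bits, 2 address bits) is read off the example string `0-101-11-10`; the comment does not say what `101` stands for, so treating it as the opcode is an assumption, and the function name `decode` is illustrative.

```python
def decode(word):
    """Split a dash-separated bit string such as "0-101-11-10" into fields."""
    mode, opcode, reg, addr = word.split("-")
    return {
        "mode": "direct" if mode == "0" else "other",
        "opcode": int(opcode, 2),   # assumed opcode field; not named in the comment
        "reg_field": int(reg, 2),   # the comment maps the bits 11 to R0
        "address": int(addr, 2),    # address of loc1, known after decode
    }

fields = decode("0-101-11-10")
print(fields)
```

Decoding yields only the address of loc1; the data stored there is still unknown until the MA stage, which is the point the comment makes.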

There are lots of things going on here.

Without data forwarding and without split-phase:

it is simply 13 clock cycles (fully understandable).

Without data forwarding but with split-phase:

it is 11 clock cycles.

Why?

Split-phase means doing 2 different things in the two halves of 1 cycle.

We use the first half to write and the other half to read.

We could do both in the first half alone, since we have different hardware lines for reading and writing (suppose),

but that would lead to reading a wrong or old value.

In split-phase we use PIPO shift registers as buffers between 2 successive stages. This lets the processor perform the write and the read in 1 clock cycle.

Now coming to operand/data forwarding (8 clock cycles):

we have extra hardware, mostly comparators, to check for RAW dependencies.

We say the domain of an instruction is the set of registers the operation is performed on (its source registers),

and the range of an instruction is the register to which the output is written.

So we can perform a comparison between

range(instruction i-1) & domain(instruction i)

So by default assume that when operand forwarding is not used there is no such hardware; otherwise there is.

This is why, even with operand forwarding, it does not matter that we read the registers in the T3 cycle: the extra circuitry checks for RAW, and since a RAW dependency exists it leads to 1 stall cycle.

We forward dummy signals (all 0) to the next execution phase.

That's why, for I2, the EX phase does nothing in T4 after RD.
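
The comparison range(instruction i-1) & domain(instruction i) described in this comment can be sketched in Python as a set intersection. The function name `raw_registers` and the `(range, domain)` pairs for I1 to I3 are illustrative; real hardware would compare register numbers with comparators rather than build sets.

```python
def raw_registers(prev_range, curr_domain):
    """Registers written by instruction i-1 that instruction i reads."""
    return set(prev_range) & set(curr_domain)

# (range, domain) pairs for the three instructions of the question.
program = [
    ({"R0"}, set()),          # I1: R0 <= M[loc1]
    ({"R0"}, {"R0"}),         # I2: R0 <= R0 + R0
    ({"R2"}, {"R2", "R0"}),   # I3: R2 <= R2 - R0
]

hazards = [
    raw_registers(program[i - 1][0], program[i][1])
    for i in range(1, len(program))
]
print(hazards)  # [{'R0'}, {'R0'}]
```

A non-empty result means a RAW dependency on the previous instruction, which is when the forwarding or stall logic has to act.
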
Everywhere the explanation for this question is wrong. The correct explanation is:

OPERAND FORWARDING:

(RAW):

1. In the case of LOAD instructions, data forwarding cannot remove the stall completely, because the operand is available only in the MA stage of the instruction. Here: I1 (MA) and I2 (RD).

2. In the case of ALU-type instructions, the operand is available in the EX stage of the instruction. Here: I2 (EX) and I3 (RD).

This is the correct way to do such questions.

Statement 1 above is the drawback of operand forwarding, due to which it cannot resolve all such dependencies.

Because I am thinking the same as you are.

Hope this helps you.

This has some conceptual issues. The pipeline won't stall in the RD2 stage for no reason. This decision is taken by the CPU control unit after seeing (i.e. decoding) the instruction. So immediately after IF2 it will perform RD2.
