A $5$-stage pipelined CPU has the following sequence of stages:

• IF – instruction fetch from instruction memory
• RD – Instruction decode and register read
• EX – Execute: ALU operation for data and address computation
• MA – Data memory access – for write access, the register read at RD state is used.
• WB – Register write back

Consider the following sequence of instructions:

• $I_1$: $L$ $R0$, loc $1$; $R0 <= M[loc1]$
• $I_2$: $A$ $R0$, $R0$; $R0 <= R0 +R0$
• $I_3$: $S$ $R2$, $R0$; $R2 <= R2-R0$
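
In this two-operand format the first operand is also the destination. The sketch below makes a few assumptions: a hypothetical tuple encoding for the instructions, Python dicts standing in for the register file and data memory, and arbitrary values M[loc1] = 5 and R2 = 20. It runs the three instructions strictly one after another, with no pipelining, and so gives the result that any pipelined schedule must reproduce:

```python
def run(program, regs, mem):
    """Execute instructions sequentially (no pipelining) on plain dicts.

    Hypothetical encoding: (opcode, dest, source), where source is a memory
    label for L and a register name for A and S.
    """
    for op, dest, src in program:
        if op == "L":      # load:     dest <= M[src]
            regs[dest] = mem[src]
        elif op == "A":    # add:      dest <= dest + src
            regs[dest] = regs[dest] + regs[src]
        elif op == "S":    # subtract: dest <= dest - src
            regs[dest] = regs[dest] - regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs

program = [
    ("L", "R0", "loc1"),  # I1: R0 <= M[loc1]
    ("A", "R0", "R0"),    # I2: R0 <= R0 + R0
    ("S", "R2", "R0"),    # I3: R2 <= R2 - R0
]
regs = run(program, {"R0": 0, "R2": 20}, {"loc1": 5})
print(regs)  # {'R0': 10, 'R2': 10}
```

I2 reads the R0 that I1 loads, and I3 reads the R0 that I2 computes. Both pairs are therefore read-after-write (RAW) dependencies, and these dependencies are what cause the stalls discussed below.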

Let each stage take one clock cycle.

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of $I_1$?

1. $8$
2. $10$
3. $12$
4. $15$

What should we assume about bypassing here?
In questions of this type, how do we know whether to apply data forwarding or not? If data forwarding is allowed, then from which stage to which stage?
Is it necessary for the question to mention bypassing?
"If an instruction is stalling somewhere in the pipeline, it is still in some stage of the pipeline."

IF of I3 is placed in T5 and not in T3 because the preceding instruction I2 is stuck in the IF stage of the pipeline, so I3 cannot enter a stage that is already occupied. Once I2 enters the RD stage, it leaves the IF stage, and only then can I3 enter that stage.
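
The rule can be written as a small helper. This sketch follows the convention used in the timing tables here. Each instruction is described by the cycles in which it enters IF, RD, EX, MA and WB. An instruction may enter a stage in the same cycle that the previous instruction moves out of it. The helper name `earliest_entry` is illustrative.

```python
STAGES = ("IF", "RD", "EX", "MA", "WB")

def earliest_entry(prev_cycles, stage):
    """Earliest cycle an instruction may enter `stage`, given the cycles in
    which the previous instruction entered IF, RD, EX, MA and WB.

    A stage holds one instruction at a time, so the previous instruction must
    first move on to the next stage (both moves may happen in the same cycle).
    WB, the last stage, is free one cycle after the previous instruction entered it.
    """
    i = STAGES.index(stage)
    if i + 1 < len(STAGES):
        return prev_cycles[i + 1]
    return prev_cycles[i] + 1

# I2 in the 11-cycle (split-phase) schedule: IF T2, RD T5, EX T6, MA T7, WB T8.
i2 = (2, 5, 6, 7, 8)
print(earliest_entry(i2, "IF"))  # 5: I3 is fetched only when I2 moves to RD
```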
What is meant by the statement given for MA, i.e.

• MA – Data memory access – for write access, the register read at RD state is used.

The question says it is a 5-stage pipeline. By default, a 5-stage pipeline is the one used in RISC processors, and RISC processors have operand forwarding by default.

What is the exact meaning of this line?

A) Without operand forwarding

1) Split-phase: clock cycles = 11

2) Without split-phase: clock cycles = 13

B) With operand forwarding

Clock cycles = 8
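
The three counts also follow from the usual formula for $k$ stages and $n$ instructions: total cycles $= k + (n-1) + \text{stall cycles}$. The stall counts are read off the timing tables on this page. Each RAW dependency costs 3 stall cycles when WB and RD cannot overlap, and 2 with split-phase access. With forwarding, only the load-use pair I1, I2 costs 1 stall, because I2 to I3 uses EX-to-EX forwarding with no stall.

$$\begin{aligned} \text{cycles} &= k + (n-1) + \text{stalls}, \qquad k = 5,\ n = 3 \\ \text{no forwarding, no split-phase:} &\quad 5 + 2 + (3 + 3) = 13 \\ \text{no forwarding, split-phase:} &\quad 5 + 2 + (2 + 2) = 11 \\ \text{operand forwarding:} &\quad 5 + 2 + (1 + 0) = 8 \end{aligned}$$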

Without data forwarding:

13 clocks: WB and RD stages non-overlapping.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline & \textbf {T1} & \textbf {T2} & \textbf {T3} & \textbf {T4} & \textbf {T5} & \textbf {T6} & \textbf {T7} & \textbf {T8} & \textbf {T9} & \textbf {T10} & \textbf {T11} & \textbf {T12} & \textbf {T13} \\\hline \textbf{I1} & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & & & & & \\\hline \textbf{I2} & & \text{IF} & & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & \\\hline \textbf{I3} & & & & & & \text{IF} & & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

Here, the WB and RD stages operate in non-overlapping mode.

11 clocks: WB and RD stages overlapping.

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} \hline & \textbf {T1} & \textbf {T2} & \textbf {T3} & \textbf {T4} & \textbf {T5} & \textbf {T6} & \textbf {T7} & \textbf {T8} & \textbf {T9} & \textbf {T10} & \textbf {T11} \\\hline \textbf{I1} & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & & & \\\hline \textbf{I2} & & \text{IF} & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & \\\hline \textbf{I3} & & & & & \text{IF} & & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

Split-phase access between WB and RD means:

The WB stage produces its output on the rising edge of the clock, and the RD stage fetches it on the falling edge. Both can therefore happen in the same cycle.
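
A minimal sketch of the idea follows. It assumes a dict as the register file, with one WB write and one RD read landing in the same cycle. The function name and values are illustrative.

```python
def same_cycle_read(regs, write, read_reg, split_phase):
    """One clock cycle in which WB writes `write=(reg, value)` and RD reads `read_reg`.

    With split-phase access the write happens in the first half of the cycle
    and the read in the second half, so RD sees the new value. Without it,
    the write only becomes visible from the next cycle, so RD sees the old value.
    """
    reg, value = write
    if split_phase:
        regs[reg] = value          # first half: WB writes the register file
        return regs[read_reg]      # second half: RD reads it
    old = regs[read_reg]           # RD reads the register file as it was
    regs[reg] = value              # write takes effect at the end of the cycle
    return old

# I1's WB writes the loaded value into R0 in the same cycle as I2's RD reads R0.
print(same_cycle_read({"R0": 0}, ("R0", 5), "R0", split_phase=True))   # 5 (new value)
print(same_cycle_read({"R0": 0}, ("R0", 5), "R0", split_phase=False))  # 0 (stale value)
```

Without split-phase access, each dependent RD has to move one cycle later. This is exactly the difference between the 13-cycle and 11-cycle tables.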

The question mentions:

"for write access, the register read at RD state is used."

This means that when operands are written to memory, the register value read in the RD stage is used. In other words, there is no operand forwarding for STORE instructions.
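
Under this reading, a STORE's data register is never forwarded, so the STORE's RD has to wait for the producer's WB. The sketch below assumes a hypothetical STORE of R0 placed right after I2. In the 8-cycle forwarding schedule, I2's WB falls in T7. The sketch also assumes that split-phase WB/RD is available unless stated otherwise.

```python
def store_rd_cycle(producer_wb, split_phase=True):
    """Earliest RD cycle of a STORE whose data register is written by an earlier instruction.

    Assumption (one reading of the question): the STORE's data operand is never
    forwarded. It is taken from the register read in RD, so RD waits for the
    producer's WB.
    """
    # With split-phase access, WB (first half) and RD (second half) can share a cycle.
    return producer_wb if split_phase else producer_wb + 1

# Hypothetical STORE of R0 right after I2 (I2's WB is in T7).
print(store_rd_cycle(7))                     # 7
print(store_rd_cycle(7, split_phase=False))  # 8
```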

Note

• As in any question in any subject, unless otherwise stated we always consider the best case. So overlap WB and RD unless otherwise stated. This applies only to WB/RD.
1. Why is there a stall for I2 in T3 and T4?
RD is instruction decode and register read. If we executed RD of I2 in T3, the data loaded from memory would not yet have been written to R0, so the correct operand would not be available in T3. I2 has to wait until I1 writes the loaded value into R0 in its WB stage.
2. Why do WB of I1 and RD of I2 operate in the same clock cycle?
If nothing is mentioned in the question, this scenario is assumed by default. With split-phase access, WB writes the register in the first half of the cycle and RD reads it in the second half, so RD and WB can overlap.

With data forwarding

(This should be the case here. The question rules out operand forwarding only for the data operand of STORE instructions.)

8 clock cycles

1. Why is there a stall for I2 in T4?
Data is being forwarded from MA of I1 to EX of I2. The MA operation of I1 must complete so that the correct data is available.
2. Why is RD of I2 in T3? Will it not fetch incorrect data if executed before the operand is forwarded from MA of I1?
Yes. RD of I2 will definitely fetch INCORRECT data in T3. But don't worry about it: operand forwarding takes care of it by supplying the correct value to EX.
3. Why can't RD of I2 be placed in T4?
We could place RD of I2 in T4 as well, but there is no benefit. Pipelining is a technique for reducing the execution time of instructions, so why add an extra stall? There is also one more problem, discussed in the next point. After reading it, consider what would happen if we had created a stall at T3!
4. Why can't RD of I3 be placed in T4?
This cannot be done. I3 cannot use RD until the previous instruction I2 moves on to the next stage (EX). Until then, I2's data is still held in the RD stage buffers.
5. Can an operand be forwarded within the same clock cycle?
No. The producing clock cycle must complete before data can be forwarded, unless the split-phase technique is used.
6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
This is not possible. Look at what I1 does: it is a memory read, so the data becomes available only after the memory read (MA). Hence nothing can be forwarded from EX of I1.
7. Why is data forwarded from MA in some cases and from EX in others?
Data is forwarded as soon as it is ready, and that depends solely on the type of instruction. A load's value is ready after MA, and an ALU result is ready after EX.
8. When should split phase be used?
We can use split phase when the data is readily available, as between WB and RD, and also when operand forwarding happens from EX to ID. We cannot use it from EX to EX, because the instruction's execution may not complete in the first half of the cycle. (This is not mentioned in any standard resource. It was said by Arjun Suresh, based on practical implementation and on how previous-year GATE questions have been framed.)
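
Points 5 to 7 can be condensed into one rule: forward from MA for a load and from EX for an ALU instruction, and only after that stage has finished. The sketch below applies that rule to the stage cycles of the 8-cycle forwarding schedule: I1 occupies T1 to T5, and I2 has IF T2, RD T3, EX T5, MA T6, WB T7. The dict layout and function name are illustrative.

```python
def forward_source(producer):
    """Return (source stage, earliest EX cycle of the dependent instruction).

    `producer` maps stage names to the cycle the producing instruction occupies
    them, plus an "is_load" flag. A load's value exists only after MA; an ALU
    result exists after EX. The source stage must finish first, so the
    consumer's EX can run at the earliest in the following cycle.
    """
    stage = "MA" if producer["is_load"] else "EX"
    return stage, producer[stage] + 1

i1 = {"IF": 1, "RD": 2, "EX": 3, "MA": 4, "WB": 5, "is_load": True}   # L R0, loc1
i2 = {"IF": 2, "RD": 3, "EX": 5, "MA": 6, "WB": 7, "is_load": False}  # A R0, R0
print(forward_source(i1))  # ('MA', 5): EX of I2 in T5, one stall in T4
print(forward_source(i2))  # ('EX', 6): EX of I3 in T6, no stall
```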

[Usually, the question states that there is operand forwarding from stage A to stage B, e.g. https://gateoverflow.in/8218/gate2015-2_44 ]

Split phase can be used even without operand forwarding, because the two are unrelated.
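
All three answers (13, 11 and 8 cycles) can be reproduced with one small scheduler. It is a sketch built only from the rules used in this answer, not a model of any real CPU. The rules are:

- One instruction per stage: an instruction may enter a stage in the cycle that the previous instruction leaves it.
- Without forwarding, a consumer's RD waits for the producer's WB (the same cycle with split phase, the next cycle otherwise).
- With forwarding, a consumer's EX runs one cycle after the producer's MA (for a load) or EX (for an ALU op).

The class and function names are hypothetical.

```python
from dataclasses import dataclass

IF, RD, EX, MA, WB = range(5)

@dataclass
class Instr:
    name: str
    dest: str
    srcs: tuple
    is_load: bool = False

def schedule(prog, forwarding=False, split_phase=False):
    """Return, for each instruction, the cycles in which it enters IF, RD, EX, MA, WB."""
    times = []
    last_writer = {}  # register -> index of the latest earlier instruction writing it
    for ins in prog:
        prev = times[-1] if times else None
        # RAW producers: (stage cycles, is_load) of the latest writer of each source.
        producers = [(times[last_writer[r]], prog[last_writer[r]].is_load)
                     for r in ins.srcs if r in last_writer]
        t = []
        for s in range(5):
            earliest = 1 if s == IF else t[s - 1] + 1
            if prev is not None:
                # One instruction per stage: wait until the previous one moves on.
                earliest = max(earliest, prev[s + 1] if s < WB else prev[WB] + 1)
            for p, p_is_load in producers:
                if not forwarding and s == RD:
                    # Register read must see the producer's WB write.
                    earliest = max(earliest, p[WB] if split_phase else p[WB] + 1)
                if forwarding and s == EX:
                    # Forward from MA for a load, from EX for an ALU instruction.
                    earliest = max(earliest, (p[MA] if p_is_load else p[EX]) + 1)
            t.append(earliest)
        last_writer[ins.dest] = len(times)
        times.append(t)
    return times

prog = [
    Instr("I1", "R0", (), is_load=True),  # L R0, loc1 : R0 <= M[loc1]
    Instr("I2", "R0", ("R0", "R0")),      # A R0, R0   : R0 <= R0 + R0
    Instr("I3", "R2", ("R2", "R0")),      # S R2, R0   : R2 <= R2 - R0
]

for label, opts in [("no forwarding, no split-phase", {}),
                    ("no forwarding, split-phase", {"split_phase": True}),
                    ("operand forwarding", {"forwarding": True})]:
    print(f"{label}: {schedule(prog, **opts)[-1][WB]} cycles")
```

Switching the rules in and out one at a time shows which assumption each of the three answers depends on.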

@pC
Please read my comment on that question and tell me if it makes sense.

pC,

I don't see a problem with placing RD (of I2) and IF (of I3) in T4 rather than T3, with the rest unchanged. Why would it introduce an additional delay, as mentioned in point 3?

@Harsh181996,
Pipelining is introduced to reduce the number of stall cycles as much as possible, and you want to create a stall intentionally. That is the opposite idea.
You are absolutely correct that in this question there is NO problem (no extra stall cycle) in placing it like that. But it is not preferred, because it may introduce more stalls in the instructions that follow (point 4 explains how).
We always go for the best case.

@Arjun SIR, @Sachin Mittal_1, @pC

Why haven't they considered split phase by default?

Sir, how are we forwarding:

1. MA->EX directly,

or

2. MA->ID->EX? And if it is the latter, is this MA->ID step what is called split phase?

You should read the given reference link or the relevant part of a standard book. Otherwise, your doubts will grow exponentially, and you won't understand many things even when someone answers.

"In some cases data is forwarded from MA and in others from EX. Why is that?
Data is forwarded when it is ready. It depends solely on the type of instruction."

(Quoted from the solution.)

Can you please list the operations for which data is forwarded from MA and those for which it is forwarded from EX?

I feel Hamacher's book says something different about point 6 in the answer:

6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
This is not possible. Look at what I1 does: it is a memory read, so the data is available in the register only after the memory read. So data cannot be forwarded from EX of I1.

The 6th edition of Hamacher's book, section 6.4 (Data Dependencies), gives the following instructions:

Add R2, R3, #100
Subtract R9, R2, #30

Without operand forwarding, the book gives the solution as follows: (figure not reproduced here)

With operand forwarding, it is given as follows: (figure not reproduced here)

The author says the ALU's output can be fed back to its input to achieve this.

Based on this, I feel it should be possible for EX of I2 to execute in T4. What am I missing here?

@GateAspirant999

The example you quote is a different case. It shows EX-to-EX forwarding between two ALU instructions, which does not apply here (see Arjun Sir's comments above).

Regarding "I feel it should be possible for EX of I2 to execute in T4":

It is not possible, because I1 is a LOAD instruction and the contents of R0 are updated only when I1 completes MA.

We cannot load data from memory into R0 (the local buffer) and perform the EX stage of I2 in the same cycle. R0 (the local buffer) is updated only at the end of T4, so in that same cycle EX cannot be sure that the buffer has already been updated.

@Arjun sir

I have a doubt here.

Why can't I3 go into the ID stage at T4? Why can't we assume multiple buffers here?

Maybe because all stages have a uniform delay.

@pC

"for write access, the register read at RD state is used."

Does this apply to this question with operand forwarding or without it?

If this line implies no operand forwarding, then the question must be asking for the case without operand forwarding.

Then why is operand forwarding used here?

(This is now a numerical question without options, so what should the answer be?)

@GateAspirant999

I had a similar doubt and thought the program could execute in 7 cycles. But the problem is that I1 is not a normal instruction like add/sub. It is a load, whose result becomes available only during the memory access stage, when the operand is loaded from memory into a temporary register. During the write back stage, the value is then stored from the temporary register into the register file.

That's why nothing more can be done.

If this is clear to you, the following question will make sense:

https://gateoverflow.in/447/gate2008-36

Consider the following sequence of instructions:

• $I_1$: $L$ $R0$, loc $1$; $R0 <= M[loc1]$
• $I_2$: $A$ $R0$, $R0$; $R0 <= R0 +R0$
• $I_3$: $S$ $R2$, $R0$; $R2 <= R2-R0$

Please explain the meaning of these instructions.

@Arjun sir

"This means that when operands are written to memory, the register value read in the RD stage is used (no operand forwarding for STORE instructions)."

"As in any question in any subject, unless otherwise stated, we always consider the best case."

By default, we use hierarchical organization for cache memory, which is the worst-case scenario. What do you have to say about this?

@pC Brother, awesome answer! Thanks for the other references too!

Why can't we put RD of I3 in T4? (I read the explanation given in the answer.) My thinking is this: RD of I2 fetches incorrect operands in T3 anyway, so shouldn't it make no difference if we put RD of I3 in T4?

"For write access, the register read at RD stage is used": this means a STORE instruction cannot get its operand forwarded and must use the value read in the RD stage. So we can assume data forwarding is possible for all other instructions.

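$$\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline & \textbf {T1} & \textbf {T2} & \textbf {T3} & \textbf {T4} & \textbf {T5} & \textbf {T6} & \textbf {T7} & \textbf {T8} \\\hline \textbf{I1} & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & \\\hline \textbf{I2} & & \text{IF} & \text{RD} & & \text{EX} & \text{MA} & \text{WB} & \\\hline \textbf{I3} & & & \text{IF} & & \text{RD} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

MA -> EX forwarding is done between I1 and I2; EX -> EX forwarding is done between I2 and I3.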
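
To double-check a table like this, a small helper can print the grid from each instruction's stage-entry cycles. This is an illustrative sketch. The `timeline` name is hypothetical, and marking stalled cycles with "-" is an assumption made here for readability.

```python
STAGES = ("IF", "RD", "EX", "MA", "WB")

def timeline(schedule):
    """Turn per-instruction stage-entry cycles into a grid (one row per instruction).

    Cells between an instruction's IF and WB that start no new stage are marked
    "-": the instruction is stalled, still occupying its previous stage.
    """
    total = max(cycles[-1] for cycles in schedule)
    grid = []
    for cycles in schedule:
        row = [""] * total
        for c in range(cycles[0], cycles[-1] + 1):
            row[c - 1] = "-"
        for stage, c in zip(STAGES, cycles):
            row[c - 1] = stage
        grid.append(row)
    return grid

# Stage-entry cycles (IF, RD, EX, MA, WB) for the 8-cycle forwarding schedule.
forwarding_8 = [
    (1, 2, 3, 4, 5),  # I1
    (2, 3, 5, 6, 7),  # I2: EX waits for I1's MA (load, MA-to-EX forwarding)
    (3, 5, 6, 7, 8),  # I3: RD waits until I2 leaves RD; EX-to-EX forwarding
]

grid = timeline(forwarding_8)
print("    " + "".join(f"T{c:<4}" for c in range(1, len(grid[0]) + 1)))
for name, row in zip(("I1", "I2", "I3"), grid):
    print(f"{name:<4}" + "".join(f"{cell:<5}" for cell in row))
```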

http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html

Sir, why is there a stall in the data forwarding case? Please explain it in detail.
Because the EX stage of I2 needs the data in R0, and R0 is filled only by the MA stage of I1.
Ok. So it is MA-to-EX data forwarding because for I1: R0 <= M[loc1], a memory access at location loc1 has to be made, and that is why there is nothing useful in the EX/MEM register. Right?
Yes. Exactly.

Hi Arjun Sir,

Could you please elaborate on:
"For write access, the register read at RD state is used." Does this mean no operand forwarding for memory writes?
I'm unable to understand.

I got 13 without operand forwarding, unlike the explanation given above.

Without OF, RD should come after WB, i.e., at T6.

The same holds for the 3rd instruction: RD should come at T9 (after WB). Correct me if I'm wrong.
Without data forwarding:

The contents of memory location loc1 are read in T4 and stored in T5.

The decode stage of instruction I2 should be done in T6, because decode requires R0 and the write to R0 takes place in T5.

So the answer will be 13 without data forwarding.
Nope. This is solved by producing the output of the WB stage in the first half of a clock cycle and reading the input of the RD stage in the second half. So both can happen in the same clock cycle.

No operand forwarding for memory writes??

Nope. It is mentioned: "for write access, the register read at RD state is used."
Sir, what is "write access", and what is operand forwarding for it?

Until now, when we had to perform operations and the operand was available in the pipeline, we used operand forwarding.

But what is operand forwarding for a write?
That is not relevant to this question, as there is no STORE operation.

Okay.

Arjun Sir,

My query is about the "with operand forwarding" case:

Query 1: Are you forwarding from "WB of I1 to EX of I2" or from "MA of I1 to EX of I2"?

Query 2: If the answer to Query 1 is "WB of I1 to EX of I2":

If we can forward from WB of I1 to EX of I2, why can't we forward from MA of I1 to EX of I2?

Query 3: If the answer to Query 1 is "MA of I1 to EX of I2":

The output of the MA stage of I1 is needed as input to the EX stage of I2, so you placed EX of I2 just after MA of I1. Why didn't you place EX of I2 directly below MA of I1 (in the same cycle), as you did in the following question?

https://gateoverflow.in/8218/gate2015-2_44

Your approach in this GATE 2005 question contradicts your approach in the GATE 2015 question. There, you applied the "write on the rising edge, read on the falling edge" logic. Why is this logic not applicable in this (GATE 2005) question?

No, they can come in the same cycle. This is ensured by doing the register file write in the first half of the cycle and the register file read in the second half.

Sir, the question says the RD stage is "RD – Instruction decode and register read".

Then how can RD of I2 be in T3, right below EX of I1? Shouldn't it be in T5, where the value of R0 is available from the MA stage of I1?
Arjun Sir,

The question says that only the address computation or ALU operation takes place in the EX stage. Then how can the content of loc1 be forwarded to the next instruction, I2, before the MA phase? In the data forwarding table, I am confused by RD of I2 taking place in T3, when I1's memory access happens only in T4.
Data forwarding is not to the RD stage of I2. It is to the EX stage of I2.

But the register read takes place only in the RD phase (here, reading R0). So won't a wrong R0 value be used in the EX phase of I2, since the EX phase is only meant for the ALU operation?

Okay, so without data forwarding, a register value updated in write back can simultaneously be used in the next instruction's decode and operand fetch stage. When data forwarding is implemented, the value computed by the ALU in the EX stage is forwarded directly to the next instruction's EX phase. Here, however, the data is forwarded from the MEM phase to the next instruction's EX phase. So whatever R0 value was read in the RD phase is replaced in the EX phase by the forwarded data. This is what I have understood. Please correct me if I am wrong.

"simultaneously be used"

Yes, you are correct. This is done using the split-phase technique.

Your other explanation is also correct: here, forwarding happens from the MEM stage.

In I3, why is there a stall between IF and RD? I know it will not change the answer, because EX of I3 happens only after EX of I2 anyway. But I want to know the general logic behind placing the RD stage in the operand forwarding case. Please answer, Arjun sir.
Only after the next stage accepts the output of a stage can that stage proceed with the next instruction. Otherwise, the previously produced output could be overwritten.
So if an instruction is stalled in, say, the IF stage, no instruction can enter the IF stage until the previous instruction proceeds to the ID stage?
@arjun sir,

With data forwarding:

Why is there a stall in I2?
Data is being forwarded from MA of I1 to EX of I2. They are different stages, so they can execute in the same clock. Why did you place EX of I2 in T4?
@arjun sir, for I1 and I2 data forwarding is from MA to EX, but for I2 and I3 it is from EX to EX. Is this correct? If yes, why is it EX to EX for I2 and I3?
@arjun sir, is operand forwarding assumed in the default case?

@sushmita Yeah, nothing is mentioned in the question, so which technique should be used?

$8$ cycles are required with operand forwarding.

It is not given that the RD and WB stages can overlap.

Currently, overlapping is the default scenario.
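$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} \hline & \textbf {T1} & \textbf {T2} & \textbf {T3} & \textbf {T4} & \textbf {T5} & \textbf {T6} & \textbf {T7} & \textbf {T8} & \textbf {T9} & \textbf {T10} & \textbf {T11} \\\hline \textbf{I1} & \text{IF} & \text{RD} & \text{EX} & \text{MA} & \text{WB} & & & & & & \\\hline \textbf{I2} & & \text{IF} & \text{RD} & \text{-} & \text{-} & \text{EX} & \text{MA} & \text{WB} & & & \\\hline \textbf{I3} & & & \text{IF} & \text{-} & \text{-} & \text{RD} & \text{-} & \text{-} & \text{EX} & \text{MA} & \text{WB} \\\hline \end{array}$$

Arjun Sir, I think the table for the no-forwarding case should look like this, because how can the CPU know about an instruction until it decodes it? Tell me if anything is incorrect.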

Reading a register needs the RD stage. The registers are updated only in the WB stage, so how can you read the value earlier, as you are doing in $I_2$?
@amarVashishth: Actually, when I2 decodes its instruction, it learns that it requires operand R0. So we need stalls: without doing any write to the PC or to memory, we keep repeating stage 2 (RD). Then, in the first half of the WB cycle, R0 is written by I1, and in the second half I2 reads R0.
Correct your without-operand-forwarding case.
When the question said that the hardware supports split-phase access, it was used, as here:
https://gateoverflow.in/753/gate2001_12

Since it is not mentioned in this question, it has not been used here, and there is a note about this at the bottom.
@amarVashishth: Then how can you tell that the instruction wants to read register R0? Only by decoding it. After decoding the instruction, the compiler sees that it requires R0, which has not been written yet, and we do not have operand forwarding. So it keeps repeating RD. When WB of the first instruction is done in the first half of a cycle, R0 is read in the second half for the second instruction.

Ok, I got your point. But why are you so sure that the instruction is decoded first and only then are the operands fetched (read)?

Who knows? The hardware could be implemented to read (decode) the instruction $\text{A R0, R0}$ from the rightmost end, fetching the operands first and only then understanding that they are to be added.

We should minimize (or simply avoid) assumptions based on our own or someone else's intuition or experience.

What I have provided as an answer leaves no scope for such a conflict.

Happy to see such a high level of discussion going on :)

@Gate Very good doubt, but it is not the "compiler" here:

"Now After decoding the instruction, compiler saw that it requires"

The compiler's job, generating instructions, is done long before this. The ID unit is inside the CPU. As per your doubt about operand forwarding, "someone" must "predict" the operand use, right? That is a very valid doubt. There are units inside the CPU that most CO books may not cover, such as the instruction reorder buffer and the branch prediction unit. Your doubt also applies to instruction reordering, right? So these units must be pre-analyzing instructions. Each instruction initiates a set of control signals, so operand forwarding can be enabled by new control signals added by "whoever" does the pre-analysis of instructions. For more details, I would have to ask some people working on this.

@Arjun: So, Sir, what do you suggest for such questions in GATE 2016?
You need not worry about "who" finds the dependency or "how". You can assume that whatever handles the pipeline knows it.

With data forwarding:

Why is there a stall in I2?
Data is being forwarded from MA of I1 to EX of I2. They are different stages, so they can execute in the same clock. Why did you place EX of I2 in T5? Why can't it be placed in T4?

They cannot execute in the same cycle because the MA stage must complete before the operand can be forwarded to EX. Split phase cannot solve this as it does for WB/RD, because the operand to be forwarded is not ready. It becomes ready only after MA completes, which takes a full clock cycle.
@Arjun, the operand required by I2 is already available once I2's read operation is complete. So there is no dependency once the read of I2 is complete.

@gokou.

10 cycles is correct with data forwarding.

They have said that whenever the result is written into a register, the register that was used at the read stage is used again. So the EX stage can't overlap with MA and WB until the ALU result is written back into the register in the WB stage, since it could overwrite the result in the ALU.

@Arjun, gokou. I say this because of how almost all pipeline problems without per-stage storage buffers are solved. In those problems, we do 'operand fetch' for the 2nd instruction only after the 'execute' stage of the 1st instruction is completed. Likewise, we do the 'execute' stage of the 2nd instruction only after the 'write back' of the 1st instruction is completed.

Only if storage buffers are allocated to hold each stage's results can we overlap the 'execute' stage of the 2nd instruction with the 'write back into register' stage of the 1st.

@Arjun, am I right?


I1: R0 <= M[loc1]

Let's break it down into all stages.

While the given program was executing on a pipelined processor, at some point the PC received the address of the above instruction during the execution of a previous instruction, say I0.

Now it is I1's turn.

First, in the IF cycle, the instruction is fetched from the memory location pointed to by the PC.

Now, hypothetically, imagine:

Say the instruction is 0-101-11-10 (an 8-bit instruction word).

In the RD phase it is decoded like this:

0 means direct addressing, i.e., loc1 = 10

11 means register R0

10 is the address of loc1 (we still don't know what data is in loc1)

As you can see, no register needs to be read for this instruction.

In the EX phase, the effective address would be computed if this were an indirect, indexed, or relative address.

We still don't know what data is in loc1.

Now, in the MA phase, the actual load happens:

the data from memory location loc1 (i.e., 10) is placed in R0.

So only after the MA phase do we have the correct value in R0.

That's why the operand is forwarded from MA and not from EX.
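
The decode step in this hypothetical example only splits the word into fields. No register is read, and the value at loc1 is still unknown. The sketch below performs that split under two assumptions: the field layout implied by the example (1-bit mode, 3-bit opcode, 2-bit register, 2-bit address), and that 101 means load. Neither describes a real ISA.

```python
def decode(word):
    """Split a hypothetical 8-bit instruction word into fields.

    Assumed layout (illustration only, not a real ISA):
    mode (1 bit) | opcode (3 bits) | register (2 bits) | address (2 bits)
    """
    if len(word) != 8 or set(word) - {"0", "1"}:
        raise ValueError("expected an 8-character bit string")
    return {
        "mode": int(word[0], 2),      # 0 = direct addressing in the example
        "opcode": int(word[1:4], 2),  # 101, assumed here to mean "load"
        "reg": int(word[4:6], 2),     # 11 stands for R0 in the example
        "addr": int(word[6:8], 2),    # 10 = address of loc1
    }

fields = decode("0" "101" "11" "10")  # the example word 0-101-11-10
print(fields)  # {'mode': 0, 'opcode': 5, 'reg': 3, 'addr': 2}
```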

There are lots of things going on here.

Without data forwarding and split phase:

It is simply 13 clock cycles (fully understandable).

Without data forwarding but with split phase:

It is 11 clock cycles.

Why?

Split phase means doing two different things in the two halves of one cycle:

in the first half we write, and in the other half we read.

We could do both in the first half, since we (suppose) have different hardware lines for reading and writing,

but that would lead to reading the wrong (old) value.

In split phase, we use PIPO shift registers as buffers between two successive stages. This lets the processor perform a write and a read in one clock cycle.

Now coming to operand/data forwarding (8 clocks):

We have extra hardware, mostly comparators, to check for RAW dependencies.

We say the domain of an instruction is the set of registers on which the operation is performed (its source registers),

and the range of an instruction is the register to which the output is written.

So we can compare

range(instruction i-1) and domain(instruction i).

By default, assume that there is no such hardware when operand forwarding is not used, and that it is present otherwise.

This is why, even with operand forwarding, it does not matter that we read the registers in cycle T3. The extra circuitry checks for RAW, and since a RAW dependency exists, it inserts 1 stall cycle.

We forward dummy signals (all 0s) to the next execute phase.

That's why, for I2, the EX phase does nothing in T4 after RD.
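
The comparator check described above compares range(instruction i-1) with domain(instruction i), and can be sketched as follows. Like the description, this sketch compares only adjacent instructions; a real hazard unit would typically also compare against other instructions still in flight. The tuple layout is illustrative.

```python
def raw_hazards(program):
    """Compare the range (destination) of each instruction with the domain
    (source registers) of the next one, and report the RAW pairs found."""
    hazards = []
    for prev, cur in zip(program, program[1:]):
        prev_name, prev_range, _ = prev
        cur_name, _, cur_domain = cur
        if prev_range in cur_domain:
            hazards.append((prev_name, cur_name, prev_range))
    return hazards

# (name, range = destination register, domain = source registers)
program = [
    ("I1", "R0", ()),            # L R0, loc1
    ("I2", "R0", ("R0", "R0")),  # A R0, R0
    ("I3", "R2", ("R2", "R0")),  # S R2, R0
]
print(raw_hazards(program))  # [('I1', 'I2', 'R0'), ('I2', 'I3', 'R0')]
```
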
The explanation for this question is wrong everywhere. The correct explanation is:

OPERAND FORWARDING:

(RAW):

1. For LOAD instructions, data forwarding cannot remove the stall, because the operand becomes available only in the MA stage of the instruction. Here: I1 (MA) and I2 (RD).

2. For ALU-type instructions, the operand is available in the EX stage of the instruction. Here: I2 (EX) and I3 (RD).

This is the correct way to solve such questions.

Statement 1 above is the drawback of operand forwarding: because of it, forwarding cannot resolve all such dependencies.

Hope this helps.