+59 votes
7.6k views

A 5-stage pipelined CPU has the following sequence of stages:

• IF – instruction fetch from instruction memory
• RD – Instruction decode and register read
• EX – Execute: ALU operation for data and address computation
• MA – Data memory access – for write access, the register read at RD state is used.
• WB – Register write back

Consider the following sequence of instructions:

• $I_1$: L R0, loc1; R0 <= M[loc1]
• $I_2$: A R0, R0; R0 <= R0 + R0
• $I_3$: S R2, R0; R2 <= R2 - R0

Let each stage take one clock cycle.

What is the number of clock cycles taken to complete the above sequence of instructions starting from the fetch of $I_1$?

A. 8
B. 10
C. 12
D. 15
What should we assume about bypassing here?
In this type of question, how do we know whether to apply data forwarding? And if forwarding is allowed, from which stage to which stage?
Is it necessary for the question to mention bypassing explicitly?
"If an instruction is stalling somewhere in the pipeline, it is still in some stage of the pipeline."

The IF of I3 is placed in T5 and not in T3 because the preceding instruction I2 is stuck in the IF stage, and I3 cannot enter a stage that is already occupied. Once I2 enters the RD stage, it leaves the IF stage, and only then can I3 enter it.

What is meant by the statement given for MA, i.e.

• MA – Data memory access – for write access, the register read at RD state is used.

## 3 Answers

+74 votes
Best answer

Answer here is option A

Without data forwarding:

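13 clock cycles: WB and RD stages not overlapping.

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 | T12 | T13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| I1 | IF | RD | EX | MA | WB | | | | | | | | |
| I2 | | IF | | | | RD | EX | MA | WB | | | | |
| I3 | | | | | | IF | | | | RD | EX | MA | WB |

Here, the WB and RD stages operate in non-overlapping mode.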

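11 clock cycles: WB and RD stages overlapping.

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 | T9 | T10 | T11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I1 | IF | RD | EX | MA | WB | | | | | | |
| I2 | | IF | | | RD | EX | MA | WB | | | |
| I3 | | | | | IF | | | RD | EX | MA | WB |

Split-phase access between WB and RD means:

The WB stage produces its output on the rising edge of the clock, and the RD stage fetches that output on the falling edge of the same cycle.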

The question mentions:

for write access, the register read at RD state is used.

This means that when an operand is written to memory, the register value read in the RD stage is used (that is, no operand forwarding for STORE instructions).

Note

• As with any question in any subject, unless otherwise stated we assume the best case. So overlap WB and RD unless told otherwise. This default applies only to WB/RD.
1. Why is there a stall for I2 in T3 and T4?
RD is instruction decode and register read. If RD of I2 executed in T3, the data from memory would not yet be stored in R0, so the proper operands would not be available. I2 therefore has to wait until I1 writes the loaded value to R0.
2. Why do WB of I1 and RD of I2 operate in the same clock cycle?
If the question says nothing to the contrary, this scenario is assumed by default. With split-phase access, R0 is written in the first half of the WB cycle and can be read by RD in the second half, so RD and WB can overlap.
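
The two counts above (13 and 11) can be checked with a small Python sketch of the stall rules used in this answer. This is only an illustration under my own assumptions, not something given in the question: the names `PROGRAM` and `cycles_without_forwarding` are hypothetical, and I assume a stage frees up only in the cycle in which the previous instruction moves on to the next stage.

```python
STAGES = ["IF", "RD", "EX", "MA", "WB"]

# (name, destination register, source registers), taken from the question
PROGRAM = [
    ("I1", "R0", []),            # L R0, loc1
    ("I2", "R0", ["R0"]),        # A R0, R0
    ("I3", "R2", ["R2", "R0"]),  # S R2, R0
]

def cycles_without_forwarding(program, split_phase):
    """Total cycles when operands can only be read from the register file."""
    done = []  # done[i][stage] = cycle in which instruction i performs that stage
    for i, (_, _, sources) in enumerate(program):
        t = {}
        for s, stage in enumerate(STAGES):
            # An instruction needs one cycle per stage, in order.
            earliest = 1 if s == 0 else t[STAGES[s - 1]] + 1
            if i > 0:
                prev = done[i - 1]
                if stage == "WB":
                    earliest = max(earliest, prev["WB"] + 1)
                else:
                    # Assumed rule: enter a stage only when the previous
                    # instruction has moved to the following stage.
                    earliest = max(earliest, prev[STAGES[s + 1]])
            if stage == "RD":
                for j in range(i):
                    if program[j][1] in sources:
                        wb = done[j]["WB"]
                        # Split phase lets RD share the producer's WB cycle.
                        earliest = max(earliest, wb if split_phase else wb + 1)
            t[stage] = earliest
        done.append(t)
    return done[-1]["WB"]

print(cycles_without_forwarding(PROGRAM, split_phase=False))  # 13
print(cycles_without_forwarding(PROGRAM, split_phase=True))   # 11
```

Changing `split_phase` is the only difference between the two cases, which matches the explanation that the overlap concerns only WB/RD.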

With data forwarding

(This should be the case here, since the question rules out operand forwarding only for the register written to memory by STORE instructions.)

8 clock cycles

1. Why is there a stall for I2 in T4?
Data is forwarded from MA of I1 to EX of I2. The MA operation of I1 must complete before the correct data is available to forward.
2. Why is RD of I2 in T3? Will it not fetch incorrect data if it executes before the operand is forwarded from MA of I1?
Yes, RD of I2 will definitely fetch incorrect data for R0 at T3. That does not matter, because operand forwarding supplies the correct value at EX.
3. Why can't RD of I2 be placed in T4?
It can. But pipelining exists to reduce execution time, so there is no reason to add an extra stall. It would also cause the problem discussed in the next point: consider what would happen to I3 if we had created a stall at T3.
4. Why can't RD of I3 be placed at T4?
I3 cannot use RD until the previous instruction I2 has started its next stage (EX), because I2's data is still residing in the RD stage buffers.
5. Can an operand be forwarded within the same clock cycle in which it is produced?
No. The producing clock cycle must complete before the data is forwarded, unless the split-phase technique is used.
6. Can't there be forwarding from the EX stage (T3) of I1 to the EX stage (T4) of I2?
This is not possible. I1 is a memory read, so its data is available only after the memory access. Data therefore cannot be forwarded from EX of I1.
7. Why is data forwarded from MA in some cases and from EX in others?
Data is forwarded as soon as it is ready, which depends solely on the type of instruction.
8. When should split-phase be used?

[Mostly when the question states that there is operand forwarding from stage A to stage B, e.g. https://gateoverflow.in/8218/gate2015-2_44]

Split-phase can be used even when there is no operand forwarding, because the two are unrelated.
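
The forwarding rules in points 1 to 7 can be sketched in Python as follows. This is a rough model under my own assumptions rather than anything stated in the question: the names `PROGRAM` and `schedule_with_forwarding` are hypothetical, a load's value is forwardable after MA and an ALU result after EX, forwarded data is usable only in the next cycle (no split phase for forwarding), and a stage frees up only when the previous instruction moves on.

```python
STAGES = ["IF", "RD", "EX", "MA", "WB"]

# (name, destination register, source registers, is_load)
PROGRAM = [
    ("I1", "R0", [], True),             # L R0, loc1
    ("I2", "R0", ["R0"], False),        # A R0, R0
    ("I3", "R2", ["R2", "R0"], False),  # S R2, R0
]

def schedule_with_forwarding(program):
    """Return, per instruction, the cycle in which each stage is performed."""
    done = []
    for i, (_, _, sources, _) in enumerate(program):
        t = {}
        for s, stage in enumerate(STAGES):
            earliest = 1 if s == 0 else t[STAGES[s - 1]] + 1
            if i > 0:
                prev = done[i - 1]
                if stage == "WB":
                    earliest = max(earliest, prev["WB"] + 1)
                else:
                    # Assumed rule: a stage is free only once the previous
                    # instruction has entered the following stage.
                    earliest = max(earliest, prev[STAGES[s + 1]])
            if stage == "EX":
                for j in range(i):
                    _, dest, _, is_load = program[j]
                    if dest in sources:
                        # Loads produce data in MA, ALU instructions in EX.
                        ready = done[j]["MA" if is_load else "EX"]
                        earliest = max(earliest, ready + 1)
            t[stage] = earliest
        done.append(t)
    return done

schedule = schedule_with_forwarding(PROGRAM)
for (name, *_), times in zip(PROGRAM, schedule):
    print(name, times)
print("total cycles:", schedule[-1]["WB"])  # 8
```

Under these assumptions I2 performs EX in T5 and I3 performs RD in T5, which are the two stalls discussed in points 1 and 4.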


answered by Veteran (24.4k points)

Sir, how are we forwarding?

1. MA->EX directly

or

2. MA->ID->EX? And if it is this case, is the MA->ID step what is called phase split?

Please help here @arjun sir.
You should read the given reference link or the relevant part of a standard book. Otherwise your doubts will grow exponentially, and you won't understand many things even when someone answers.

In some case data is forwarded from MA and some case data is forwarded from EX Why it is so ?
Data is forwarded when it is ready . It solely depends on the type of instruction .

quoted from the solution.

Can you please mention the operations for which data is forwarded from MA and operation for which data is forwarded from EX ?

I feel Hamacher's book says something different about point 6 in the answer

6. Cant there be a forwarding from EX stage(T3) of I1 to EX stage(T4) of I2 ?
This is not possible . See what is happening in I1 . It is Memory Read .So data will be available in register after memory read only .So data cannot be forwarded from EX of I1.

In 6th edition of Hamacher's book, section 6.4. Data Dependencies, it gives following instructions:

Add R2, R3, #100
Subtract R9, R2, #30

Without operand forwarding, the solution is given as follows:

With operand forwarding, it is given as follows:

The author says the ALU's output can be fed back to its input to achieve the above.

Based on this, I feel it should be possible for EX of I2 to execute in T4. What am I missing here?

@GateAspirant999

The example you quote is a different case: it shows forwarding from EX to EX, which is not possible according to one of the above comments by Arjun Sir.

And regarding "I feel it should be possible for EX of I2 to execute in T4":

It is not possible because I1 is a LOAD instruction, and the contents of R0 are available only once I1 completes MA.

We cannot load data from memory into R0 (local buffer) and run the EX stage of I2 in the same cycle, because the local buffer is updated only at the end of T4. Within that cycle we cannot be sure whether the buffer has been updated yet.
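
The difference between the Hamacher example (Add followed by Subtract) and the load in this question can be put into a few lines of Python. This is a sketch under assumptions of my own: the function names are hypothetical, the producer itself is not stalled, and forwarded data is usable only in the cycle after it is produced.

```python
# Stage offsets from the cycle in which an instruction is fetched (IF = +0).
OFFSET = {"IF": 0, "RD": 1, "EX": 2, "MA": 3, "WB": 4}

def earliest_dependent_ex(producer_fetch_cycle, producer_is_load):
    """Earliest cycle in which the next instruction can run EX with forwarding."""
    # A load has its value only after MA; an ALU instruction has it after EX.
    ready_stage = "MA" if producer_is_load else "EX"
    ready_cycle = producer_fetch_cycle + OFFSET[ready_stage]
    # Assumption: forwarded data is usable only in the following cycle.
    return ready_cycle + 1

def stalls_for_next_instruction(producer_fetch_cycle, producer_is_load):
    # With no hazard, the next instruction reaches EX one cycle after the
    # producer's EX.
    unstalled_ex = producer_fetch_cycle + 1 + OFFSET["EX"]
    return earliest_dependent_ex(producer_fetch_cycle, producer_is_load) - unstalled_ex

print(earliest_dependent_ex(1, producer_is_load=False))  # Add -> Subtract: 4
print(earliest_dependent_ex(1, producer_is_load=True))   # L -> A here: 5
```

So EX-to-EX forwarding removes the stall for an ALU producer, while a load producer still costs one stall cycle because its data exists only after MA.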

@Arjun sir

I have a doubt here.

Why can't I3 go in ID stage at T4? Why can't we assume multiple buffers here??

May be because all stages are having uniform delay.

Not sure. Please confirm :)
@pC

"for write access, the register read at RD state is used."

Does this apply to the question with operand forwarding or without it?

If this line implies no operand forwarding, then the question must be asking for the case without operand forwarding.

Then why is operand forwarding used here?

(Now that numerical questions have no options, what should the answer be?)

@GateAspirant999

I was also having the similar doubt and thinking that I can execute the program in 7 cycles. But the problem here is that I1 is not a normal instruction like add/sub but a load instruction whose result is only available during memory access stage where we load the operand from memory to some temporary register and then during write back stage we store it from temporary register to register file.

That's why we can't do anything.

If you are through this then the following question will make sense to you.

https://gateoverflow.in/447/gate2008-36

Consider the following sequence of instructions:

• $I_1$: L R0, loc1; R0 <= M[loc1]
• $I_2$: A R0, R0; R0 <= R0 + R0
• $I_3$: S R2, R0; R2 <= R2 - R0

Please explain the meaning of the instructions.
+21 votes

answer = option A

$8$ cycles required with operand forwarding.

it is not given that RD and WB stage could overlap.

answered by Veteran (31k points)
Currently overlapping is the default scenario.
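| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I1 | IF | RD | EX | MA | WB | | | | | | |
| I2 | | IF | RD | - | - | EX | MA | WB | | | |
| I3 | | | IF | - | - | RD | - | - | EX | MA | WB |

Arjun Sir, I think the table for the case without forwarding should look like this, because the CPU cannot know about an instruction until it decodes it. Tell me if anything is incorrect.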

Your second table is correct.

To read a register we need the RD stage. Until the registers are updated (which happens in the WB stage), how can you read the value earlier, as you are doing for $I_2$?

@amarVashishth: Actually, when I2 is decoded, it becomes known that it requires operand R0. So we need stalls: without writing anything to the PC or to memory, we keep repeating the RD stage. Then R0 is written by I1 in the first half of the WB cycle, and I2 reads R0 in the second half.
Correct your without-operand-forwarding case.
When it was mentioned that the hardware enables split-phase access, it was used, as here:
https://gateoverflow.in/753/gate2001_12

But it is not mentioned in this question, so its use has been left out, with a note at the bottom.
@amarVashisth: Then how can you tell that the instruction wants to read register R0? That is known only when you decode it. Now After decoding the instruction, compiler saw that it requires R0, which is not written yet, and we do not have the operand forwarding technique. In that case it will just keep executing RD; when the WB of the 1st instruction finishes in the first half of the cycle, R0 will be read in the second half for the 2nd instruction.

OK, I got your point. But why are you so sure that instruction decode is done first and the operands are fetched (read) only after that?

Who knows? The hardware implementation may be such that it reads (decodes) the instruction $\text{A R0, R0}$ from the rightmost end, fetches the operands first, and only then understands that they are to be added.

We should minimize (or simply avoid) assumptions based on anyone's intuition or experience, including our own.

What I have provided as answer has no scope of conflict as such.

Happy to see high level of discussion going on :)

@Gate Very good doubt, but it is not "compiler" here:

"Now After decoding the instruction, compiler saw that it requires"

The compiler's job, generating instructions, is long done before this. The ID unit is inside the CPU and, as per your doubt about operand forwarding, "someone" must "predict" the operand use, right? That is a very valid doubt. There are units inside the CPU that might not be covered in most CO books, such as the instruction reorder buffer and the branch prediction unit. Your doubt also applies to instruction reordering, right? So these units must be pre-analyzing instructions. Each instruction initiates a set of control signals, so operand forwarding can be enabled by adding a new control signal from "whoever" does the pre-analysis of instructions. For more details I have to ask people working on this.

@Arjun: So Sir what do you suggest for such questions in GATE 2016?
You need not worry about "who" and "how" the dependency is found. You can assume whoever handles the pipeline know that.
Updated.

With data forwarding

Why is there a stall in I2?
Data is being forwarded from MA of I1 to EX of I2. They are different stages, so they can execute in the same clock. Why did you place EX of I2 in T5? Why can't it be placed in T4?

They cannot execute in the same cycle because the MA stage must complete before the operand can be forwarded to EX. This cannot be done in split phase as we do for WB/RD, because the forwarded operand is not ready until MA completes, which takes at least a full clock cycle.
@Arjun, the operand required by I2 is already available when the read operation of I2 is complete. So there is no dependency once the read of I2 is complete.

@gokou.

10 cycles is correct with data forwarding.

They have said that whenever you want to write the result into a register, the register that was used at the read stage is used again. So the EX stage can't overlap with MA and WB until the result in the ALU is written back into the register at the WB stage, since it could overwrite the result in the ALU.

@Arjun, gokou: I am saying that because in almost all pipeline problems where storage buffers are not allocated for each stage, we do the operand fetch of the 2nd instruction only after the execute stage of the 1st instruction is completed, and the execute stage of the 2nd instruction only after the write back of the 1st instruction is completed.

Only if storage buffers are allocated to hold each stage's results can we overlap the execute stage of the 2nd instruction with the write-back stage of the 1st instruction.

@Arjun, am I right?


+18 votes

"For write access, the register read at RD stage is used": this means that a STORE instruction cannot receive a forwarded operand; it gets its operand only from the RD stage. So we can assume data forwarding is possible for all other instructions.

| | T1 | T2 | T3 | T4 | T5 | T6 | T7 | T8 |
|---|---|---|---|---|---|---|---|---|
| I1 | IF | RD | EX | MA | WB | | | |
| I2 | | IF | RD | | EX | MA | WB | |
| I3 | | | IF | | RD | EX | MA | WB |

MA -> EX forwarding is done between I1 and I2.
EX -> EX forwarding is done between I2 and I3.

Hence, the answer will be 8.

http://www.cs.iastate.edu/~prabhu/Tutorial/PIPELINE/forward.html

answered by Veteran (346k points)
Data forwarding is not to RD stage of I2. It is to EX stage of I2.

But the register read takes place only in the RD phase (here, reading R0). So won't the wrong R0 value be used in the EX phase of I2, since the EX phase is meant only for the ALU operation?

Okay. Then without data forwarding, the register value updated in write back can simultaneously be used in the next instruction's decode and operand fetch stage. When data forwarding is implemented, the value computed by the ALU in the EX stage is forwarded directly to the next instruction's EX phase. But here the data is forwarded from the MEM phase to the next instruction's EX phase. So whatever value of R0 was read in the RD phase is replaced in the EX phase by the forwarded data. This is what I have understood. Please correct me if I am wrong.

"simultaneously be used"

Yes, you are correct: this is done using the split-phase technique.

And your other explanation is also correct: here forwarding happens from the MEM stage.

In instruction I3, why is there a stall between IF and RD? I know it will not change the answer, because the EX phase of I3 can occur only after the EX phase of I2 anyway. But I want to know the general logic behind the placement of RD under operand forwarding. Please answer, Arjun sir.
A stage can proceed with the next instruction only after the next stage has accepted its output. Otherwise, the previously produced output may get overwritten.
Does that mean that if an instruction is stalled in, say, the IF stage, no instruction can enter IF until the previous instruction proceeds to ID?
Yes.
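
This occupancy rule can be sketched in Python on its own, ignoring data hazards. It is only an illustration under my assumptions: the function name `follow` is hypothetical, and the stage times for I2 are the ones from the 8-cycle schedule in this answer (IF in T2, RD in T3, EX in T5, MA in T6, WB in T7).

```python
STAGES = ["IF", "RD", "EX", "MA", "WB"]

def follow(prev):
    """Earliest stage times for the instruction right behind `prev`,
    considering only stage occupancy (no data hazards)."""
    t = {}
    for s, stage in enumerate(STAGES):
        # One cycle per stage, in order.
        earliest = 1 if s == 0 else t[STAGES[s - 1]] + 1
        if stage == "WB":
            # Last stage: just wait for the previous instruction to finish it.
            earliest = max(earliest, prev["WB"] + 1)
        else:
            # A stage is free only once the previous instruction has been
            # accepted by the following stage.
            earliest = max(earliest, prev[STAGES[s + 1]])
        t[stage] = earliest
    return t

i2 = {"IF": 2, "RD": 3, "EX": 5, "MA": 6, "WB": 7}
i3 = follow(i2)
print(i3)  # {'IF': 3, 'RD': 5, 'EX': 6, 'MA': 7, 'WB': 8}
```

I3 is fetched in T3 but cannot perform RD until T5, because I2 still occupies RD during T4. That is the stall between IF and RD asked about above.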
@arjun sir,

With data forwarding

Why is there a stall in I2?
Data is being forwarded from MA of I1 to EX of I2. They are different stages, so they can execute in the same clock. Why did you place EX of I2 in T4?
@arjun sir, for I1 and I2, data forwarding is from MA to EX, but for I2 and I3 it is from EX to EX. Is that correct? If yes, why is the forwarding for I2 and I3 from EX to EX?
@arjun sir, is operand forwarding assumed by default?