3.5k views

Delayed branching can help in the handling of control hazards

The following code is to run on a pipelined processor with one branch delay slot:

I1: ADD $R2 \leftarrow R7 + R8$

I2: Sub $R4 \leftarrow R5 – R6$

I3: ADD $R1 \leftarrow R2 + R3$

I4: STORE Memory $[R4] \leftarrow R1$

BRANCH to Label if $R1 == 0$

Which of the instructions I1, I2, I3 or I4 can legitimately occupy the delay slot without any program modification?

1. I1
2. I2
3. I3
4. I4
retagged | 3.5k views
0

What is Delayed Branching ?
One way to maximize the use of the pipeline, is to find an instruction that can be safely exeucted whether the branch is taken or not, and execute that instruction. So, when a branch instruction is encountered, the hardware puts the instruction following the branch into the pipe and begins executing it, just as in predict-not-taken. However, unlike in predict-not-taken, we do not need to worry about whether the branch is taken or not, we do not need to clear the pipe because no matter whether the branch is taken or not, we know the instruction is safe to execute.
Moving $I_1$ after branch

• $I1$ is updating the value of  $R2$
• $R2$ which is used to determine branch condition $R1$
• Value of $R2$ is available after branch
$\Rightarrow$ Cannot be moved

Moving $I_3$ after branch

• value of $R1$ is computed in this instruction
• $R1$ is the branch condition
$\Rightarrow$ Cannot be moved

Moving $I_4$ after branch

• $I4$ is simple store instruction used to store R1 in memory

• program execution will have no effect if this is placed after conditional branch
$\Rightarrow$ Can be moved

Moving $I_2$ after branch

• It update the memory location to place the storing of conditional branch instruction $R1$
• If moved after branch , when compiler reaches $I4$ program execution will stop
$\Rightarrow$ Cannot be moved

edited
0
Why do u think $I_{4}$ is better choice than $I_{2}$

in $I_{2}$ $R_{4}$ is a location of register

while in $I_{4}$ ,$R_{4}$ is memory location.

So, which one better to put in delay slot?
0
@srestha , just think what happens when you put I2 after braching ? ( when compiler reaches I4  ) will it be able to countinue ?

Well You can move I2 and I4 after brach . that option is not given so i choosed I4 .
If Anything wrong with my thinking  pls do  point it out
+7
The branch condition is dependent on R1 which is in turn dependent on R2. So moving instructions that modify R1 and R2 after the branch instruction is not the Option. So we can eliminate I1 and I3.

If we move I2 after branch, and keep all the I1, I3 and I4 before the branch, then a time will come when I4 is about to get executed. At that time the compiler would be clueless(or will get wrong/old value) as to what the value of R4 is. This is because R4 gets a value at I2 but that has been shifted after branch. So we cannot move I2 also.

The only remaining option left is I4.

Option D

@pC please check this and point out the mistakes (if any) in my understanding.
0
good explanation @PC boss
we have 5 inst. and let pipeline stages will be 5.so we can process all the inst. in (n+k-1)=(5+5-1)=9 clock cycle.

if in 9 clock cycle we are able to arrange in such a way that the dependancy will not violate.but the order may be different.  then it will be ok.

I1 CANT BE AFTER I3 (DUE TO DEPENDENCY ON R2)

I2 CANT BE AFTER I4 (COZ  I4 REQUIRE MEM[R4] OF I2)

I3 CANT BE AFTER I4 (DUE TO DEPENDENCY ON R1)

SO  I1 THEN I3

I2,I3 THEN I4

=>I4 CANT BE BEFORE I1,I2,I3.

=>LET I1 FIRST(5 CLOCK)

I2(1 CLOCK)

I3(1 CLOCK)

BRANCH(1 CLOCK)

I4 (1 CLOCK)

TOTAL 9 CLOCK AND THAT IS WHAT WE WANT

It can't be l1 or l3, because directly or indirectly they are taking part in the branching decision. Now we can have both l2 and l4 after the branching decision statement and  the order of I2 and I4 matters because in I2 we are getting the final value in register R4 and in instruction we are saving contents of R1 in memory whose address is stored in the register. So If we made I2 to be the instruction after branch then the value in the first loop itself the value stored in memory location whose address is stored in R4 would be wrong because it actually should have been updated first by R5-R6.So  I4 is correct. So (D) is correct option.

You should know about Delayed Branching concept first for such type of problem. It is the concept where in delay slot(the instruction space following branch instruction) we insert that instruction which is always executed, whether branch is taken or not.

Hence, let's consider branch instruction as Ij and the instruction following branch instruction as Ij+1, so in delay slot, we can't place Ij+1 , as it may be discarded when branch is taken, hence we can place instruction preceding to branch instruction in delay slot, So as per above program I4 should be placed in Delay Slot.

So remember in Delayed branching, we are not logically changing order of instructions, only branch instruction is executed one instruction later(Delayed).

1
2