602 views
A pipelined processor has two branch delay slots. An optimizing compiler can fill one of these slots $85$ % of the time, and can fill the second slot only $20$ % of the time.

If we assume that $20$ % of the instructions executed are branch instructions, then the percentage improvement in performance achieved by this optimization is ________%.

Number of cycles needed to execute 100 instructions:

1. For 100 instructions 20 instructions creates 2 stall cycle ( because they mention a pipeline processor has two branch delay slots.)  because of branch instruction.

So total no of stall cycle/instruction = 0.80*0 + 0.20*2 = 0.40 Stall/instruction

Total time = 1 + 0.40 = 1.40 cycle/instruction .

now for 100 instructions total time takes 1.40 * 100 = 140 ( this is without optimization)

so 140 is without optimization value,

2.  With optimization ( 140 - 20 * 0.85- 20 * 0.2 ) = 119

In question it is given "we assume that 20 percent of the instructions executed are branch instructions "

here 20 is number of instructions out of 100 instructions

so 20* 0.85 as (optimizing compiler can fill one of these slots 85 percent of the time - is given in question)  and 20 * 0.2 ( as given - can fill the second slot only 20 percent of the time. )  and subtract that from 140, so it gives 119

see here

{140 -( 20 * 0.85) - (20* 0.2) } = 119

now to calculate  percentage improvement in performance achieved by this optimization they use the formula :

(without optimization time / with optimization time ) * 100

that means {(140 / 119 ) - 1 * 100} = 17.64 %

Earlier performance was 1 and now it is  1.1764 ( 140/119 = 1.1764)   so performance improvement by the optimization is 17.64%

by

Instead of this if we use Throughput method then it will be more intuitive for the readers also....

Throughput initial(Ti)= 100/1.4=71.428

Throughput new(Tn)=100/1.19=84.0336

Thus percentage increase = ((Tn-Ti)/Ti)*100 = ((84.0336-71.428)/71.428)*100= (12.6056/71.428)*100 = 17.647

And in fact Throughput is an actual performance measure as written in Hamacher also.

--------------------------------------------------------------------------
First of all, there are 2 slots, and both of them are independent of each other means it is not same a combinatorial problem filling the first box then second so it is not that you'll get 0 stalls by 0.85 * 0.20 times and 1 stall 0.85*(1-.20) times, THIS IS WRONG APPROACH.

85% times compiler can fill the first slot, it means (1-0.85) times compiler can't fill i.e. 0.15 times there will be 1 stall(we can't say anything about 2nd slot so we can't write 0.85 * 1 by thinking that 0.85 times there will be no stall from 1st slot but there will be from 2nd slot, both slots are independent to each other).
and the second slot is also filled 20% time i.e (1-0.20)=0.80 times there will be 1 stall (notice they are independent of each other). THIS IS CORRECT APPROACH.
Rest you can get by any other answer given above or answer given by vijaycs.

performance improvement should not it be ((140-119)/140) * 100 which comes 15% @Bikram ??

1 vote