Dark Mode

4,619 views

12 votes

Consider the following $\text{ANSI C}$ code segment:

z=x + 3 + y->f1 + y->f2; for (i = 0; i < 200; i = i + 2) { if (z > i) { p = p + x + 3; q = q + y->f1; } else { p = p + y->f2; q = q + x + 3; } }

Assume that the variable $y$ points to a $\textsf{struct}$ (allocated on the heap) containing two fields $\textsf{f1}$ and $\textsf{f2}$, and the local variables $\textsf{x, y, z, p, q,}$ and $\textsf{i}$ are allotted registers. Common sub-expression elimination $\text{(CSE)}$ optimization is applied on the code. The number of addition and the dereference operations (of the form $\textsf{ y ->f1}$ or $\textsf{y ->f2}$) in the optimized code, respectively, are:

- $403$ and $102$
- $203$ and $2$
- $303$ and $102$
- $303$ and $2$

0

19 votes

Best answer

t1 = x + 3 // 1 addition t2 = y->f1; // 1 dereference t3 = y->f2; // 1 dereference z = t1 + t2 + t3 // 2 additions for (i = 0; i < 200; i += 2) { if (z > i) { p = p + t1; // 1 addition q = q + t2; // 1 addition } else { p = p + t3; // 1 addition q = q + t1; // 1 addition } }

Whether we take if or else block we get $2$ additions, the loop runs exactly $\frac{200}{2} = 100$ times, so from loop we get $2\times100 = 200$ additions plus $100$ additions for incrementing the value of $i,$ before loop we had perform $3$ additions, so total additions $303.$

We only do two de-reference outside the for loop, so total de-references $= 2.$

Option D.

@Hira Thakur write the values of i which will be i = 0,2,4,6,8,…...198.

at 200 the loop will be become false so loop will not run for 200 so we are not including 200 in series.

i = 0,2,4,6,8….198 forms an AP .

So use formula for AP i.e Tn = a + (n-1)d , here a is 0 .

198 = 0 + (n-1)*2

from here you will get n = 100(number of times the loop will run).

1

8 votes

t1 = x + 3; // 1 addition t2 = y -> f1; // 1 dereference t3 = y -> f2; // 1 dereference z = t1 + t2 + t3; // 2 additions for (i = 0; i < 200; i += 2) { // 100 additions if (z > i) { p = p + t1; // 1 addition q = q + t2; // 1 addition } else { p = p + t3; // 1 addition q = q + t1; // 1 addition } }

So, in total we get $1 + 2 + 100 + 100 * 2 = 303$ additions and $ 2$ defrerences. Since all the variables are mentioned to be in registers and any way p and q are not struct objects there is no pointer aliasing issue (say if y was pointing to object p or q, we cannot move the sub expression out of the loop – they are no longer loop invariant.

Option D

Not asked in this question. But lets do some more optimizations here.

t1 = x + 3; t2 = y -> f1; t3 = y -> f2; z = t1 + t2 + t3; for (i = z+1 + (z%2); i < 200; i += 2) { p = p + t1; q = q + t2; } for (i = 0; i <= z; i += 2) { p = p + t3; q = q + t1; } }

The above optimization is loop splitting. The advantage here is now we have one less branch inside the loop – less chance of branch miss prediction and more expected instruction level parallelism – remember pipeline stalls due to branch instructions in COA. Also, now we can optimize the code even further as follows:

t1 = x + 3; t2 = y -> f1; t3 = y -> f2; z = t1 + t2 + t3; p = p + ((200-z-1-(z%2))/2) * t1; q = q + ((200-z-1-(z%2))/2) * t2; p = p + ((z+1)/2) * t3; q = q + ((z+1)/2)* t1;

Again doing sub-expression elimination:

t1 = x + 3; t2 = y -> f1; t3 = y -> f2; z = t1 + t2 + t3; t4 = ((200-z-1-(z%2))/2); p = p + t4 * t1; q = q + t4 * t2; //previous t4 usage is dead here t4 = ((z+1)/2); p = p + t4 * t3; q = q + t4 * t1;

So, finally, $4$ multiplications, $3$ divisions/mod, $11$ additions/subtractions.

That's what compiler does freely for you :)

1

I think it is because of the following reason. See in the main program, $i$ has an even sequence. $i=0,2,4,6,8,…,198$

Now suppose for example, $z=100$, then the block under $\text{if (z>i)}$ should work for $i=0,2,4,6,8,98$ and the block under else should work for the other values of $i=100, 102,...,198$.

Now suppose for example, $z=101$, then the block under $\text{if (z>i)}$ should work for $i=0,2,4,6,8,98,100$ and the block under else should work for the other values of $i=102, 104,...,198$.

Based on this I feel that the first loop shall be:

for( i=0;i<z;i+=2) { p=p+t1; q=q+t2; }

And the second loop shall be :

for( i=z+ (z%2) ;i<200;i+=2) { p=p+t3; q=q+t1; } // such that for z=100. the loop starts with z=100. But if z=101. the loop starts with z=102

2