4,619 views

​​​​​​Consider the following $\text{ANSI C}$ code segment:

z=x + 3 + y->f1 + y->f2;
for (i = 0; i < 200; i = i + 2)
{
if (z > i)
{
p = p + x + 3;
q = q + y->f1;
} else
{
p = p + y->f2;
q = q + x + 3;
}
}

Assume that the variable $y$ points to a $\textsf{struct}$ (allocated on the heap) containing two fields $\textsf{f1}$ and $\textsf{f2}$, and the local variables $\textsf{x, y, z, p, q,}$ and $\textsf{i}$ are allotted registers. Common sub-expression elimination $\text{(CSE)}$ optimization is applied on the code. The number of addition and the dereference operations (of the form $\textsf{ y ->f1}$ or $\textsf{y ->f2}$) in the optimized code, respectively, are:

1. $403$ and $102$
2. $203$ and $2$
3. $303$ and $102$
4. $303$ and $2$

Here, had the loop contained the updation as “i++” instead of “i = i+2”, will we consider i++ as an addition since it can directly be incremented in the register itself without actually performing the addition?

in this question why dereference is not 102 times  ? please explain this @Mittal sachin

@harshschauhan  i++ is nothing but i=i+1,

but here we are incrementing by 2.

t1 = x + 3 // 1 addition
t2 = y->f1; // 1 dereference
t3 = y->f2; // 1 dereference
z = t1 + t2 + t3 // 2 additions
for (i = 0; i < 200; i += 2) {
if (z > i) {
p = p + t1; // 1 addition
q = q + t2; // 1 addition
} else {
p = p + t3; // 1 addition
q = q + t1; // 1 addition
}
}

Whether we take if or else block we get $2$ additions, the loop runs exactly $\frac{200}{2} = 100$ times, so from loop we get $2\times100 = 200$ additions plus $100$ additions for incrementing the value of $i,$ before loop we had perform $3$ additions, so total additions $303.$

We only do two de-reference outside the for loop, so total de-references $= 2.$

Option D.

by

the loop runs exactly $\frac{200}{2}=100$, I didn’t understand this, please explain this point.
See the increment operation for $i$ in the for loop it’s $i+=2$

@Hira Thakur write the values of i which will be i = 0,2,4,6,8,…...198.

at 200 the loop will be become false so loop will not run for 200 so we are not including 200 in series.

i = 0,2,4,6,8….198 forms an AP .

So use formula for AP i.e Tn = a + (n-1)d , here a is 0 .

198 = 0 + (n-1)*2

from here you will get n = 100(number of times the loop will run).

t1 = x + 3;       // 1 addition
t2 = y -> f1;      // 1 dereference
t3 = y -> f2;      // 1 dereference
z = t1 + t2 + t3; // 2 additions
for (i = 0; i < 200; i += 2) { // 100 additions
if (z > i) {
p = p + t1; // 1 addition
q = q + t2; // 1 addition
} else {
p = p + t3; // 1 addition
q = q + t1; // 1 addition
}
}

So, in total we get $1 + 2 + 100 + 100 * 2 = 303$ additions and $2$ defrerences. Since all the variables are mentioned to be in registers and any way p and q are not struct objects there is no pointer aliasing issue (say if y was pointing to object p or q, we cannot move the sub expression out of the loop – they are no longer loop invariant.

Option D

Not asked in this question. But lets do some more optimizations here.

t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;

for (i = z+1 + (z%2); i < 200; i += 2) {
p = p + t1;
q = q + t2;
}

for (i = 0; i <= z; i += 2) {
p = p + t3;
q = q + t1;
}
}

The above optimization is loop splitting. The advantage here is now we have one less branch inside the loop – less chance of branch miss prediction and more expected instruction level parallelism – remember pipeline stalls due to branch instructions in COA. Also, now we can optimize the code even further as follows:

t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;

p = p + ((200-z-1-(z%2))/2) * t1;
q = q + ((200-z-1-(z%2))/2) * t2;

p = p + ((z+1)/2) * t3;
q = q + ((z+1)/2)* t1;

Again doing sub-expression elimination:

t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;

t4 = ((200-z-1-(z%2))/2);
p = p + t4 * t1;
q = q + t4 * t2;
//previous t4 usage is dead here
t4 = ((z+1)/2);
p = p + t4 * t3;
q = q + t4 * t1; 

So, finally, $4$ multiplications, $3$ divisions/mod, $11$ additions/subtractions.

That's what compiler does freely for you :)

by

Sir , not able to understand why we needed (z%2) in the first for loop …

I think it is because of the following reason. See in the main program, $i$ has an even sequence. $i=0,2,4,6,8,…,198$

Now suppose for example, $z=100$, then the block under $\text{if (z>i)}$ should work for $i=0,2,4,6,8,98$ and the block under else should work for the other values of $i=100, 102,...,198$.

Now suppose for example, $z=101$, then the block under $\text{if (z>i)}$ should work for $i=0,2,4,6,8,98,100$ and the block under else should work for the other values of $i=102, 104,...,198$.

Based on this I feel that the first loop shall be:

for( i=0;i<z;i+=2)
{
p=p+t1;
q=q+t2;

}

And the second loop shall be :

for( i=z+ (z%2) ;i<200;i+=2)
{
p=p+t3;
q=q+t1;
}

// such that for z=100. the loop starts with z=100. But  if z=101. the loop starts with z=102

@HitechGa

Nice explanation...

@HitechGa

Thankyou Bhai !!