t1 = x + 3; // 1 addition
t2 = y -> f1; // 1 dereference
t3 = y -> f2; // 1 dereference
z = t1 + t2 + t3; // 2 additions
for (i = 0; i < 200; i += 2) { // 100 additions
if (z > i) {
p = p + t1; // 1 addition
q = q + t2; // 1 addition
} else {
p = p + t3; // 1 addition
q = q + t1; // 1 addition
}
}
So, in total we get $1 + 2 + 100 + 100 * 2 = 303$ additions and $ 2$ defrerences. Since all the variables are mentioned to be in registers and any way p and q are not struct objects there is no pointer aliasing issue (say if y was pointing to object p or q, we cannot move the sub expression out of the loop – they are no longer loop invariant.
Option D
Not asked in this question. But lets do some more optimizations here.
t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;
for (i = z+1 + (z%2); i < 200; i += 2) {
p = p + t1;
q = q + t2;
}
for (i = 0; i <= z; i += 2) {
p = p + t3;
q = q + t1;
}
}
The above optimization is loop splitting. The advantage here is now we have one less branch inside the loop – less chance of branch miss prediction and more expected instruction level parallelism – remember pipeline stalls due to branch instructions in COA. Also, now we can optimize the code even further as follows:
t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;
p = p + ((200-z-1-(z%2))/2) * t1;
q = q + ((200-z-1-(z%2))/2) * t2;
p = p + ((z+1)/2) * t3;
q = q + ((z+1)/2)* t1;
Again doing sub-expression elimination:
t1 = x + 3;
t2 = y -> f1;
t3 = y -> f2;
z = t1 + t2 + t3;
t4 = ((200-z-1-(z%2))/2);
p = p + t4 * t1;
q = q + t4 * t2;
//previous t4 usage is dead here
t4 = ((z+1)/2);
p = p + t4 * t3;
q = q + t4 * t1;
So, finally, $4$ multiplications, $3$ divisions/mod, $11$ additions/subtractions.
That's what compiler does freely for you :)