A compiler can perform some optimizations based only on local information. For example, consider the following code:
x = a + b;
x = 5 * 2;
It is easy for an optimizer to recognize that:
- The first assignment to x is a "useless" assignment, since the value it computes is overwritten by the second assignment before it is ever used (and thus the first statement can be eliminated from the program).
- The expression 5 * 2 can be computed at compile time, simplifying the second assignment statement to x = 10;
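The second bullet is the classic constant-folding transformation. As a rough illustration, here is a minimal sketch of folding over a toy expression representation (the tuple encoding and the fold function are assumptions for this example, not a real compiler's IR; variables are plain strings and are left alone):

```python
# Toy expression representation: an int literal, a variable name (str),
# or a tuple (op, left, right). Illustrative only.

def fold(expr):
    """Recursively replace constant subexpressions with their values."""
    if isinstance(expr, (int, str)):   # literal or variable: nothing to fold
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "*":
            return left * right
    return (op, left, right)

# The right-hand side of the second assignment, 5 * 2, folds to 10:
print(fold(("*", 5, 2)))           # prints 10
# A subexpression involving a variable is left symbolic:
print(fold(("+", ("*", 5, 2), "a")))  # prints ('+', 10, 'a')
```

Note that this works with purely local information: each subexpression is folded by looking only at its own operands.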
Some optimizations, however, require more "global" information. For example, consider the following code:
a = 1;
b = 2;
c = 3;
if (...) x = a + 5;
else x = b + 4;
c = x + 1;
In this example, the initial assignment to c (the statement c = 3;) is useless, and the expression x + 1 can be simplified to 7, since x is assigned the value 6 on both branches of the if. It is less obvious how a compiler can discover these facts, because they cannot be deduced by examining only one or two consecutive statements. A more global analysis is needed so that the compiler knows at each point in the program:
- which variables are guaranteed to have constant values, and
- which variables will be used before being redefined.
To discover these kinds of properties, we use dataflow analysis. Dataflow analysis is usually performed on the program's control-flow graph (CFG); the goal is to associate with each program component (each node of the CFG) information that is guaranteed to hold at that point on all executions.
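As a concrete sketch, constant propagation can be phrased as a forward dataflow analysis over the example's CFG: each node's incoming facts are the merge of its predecessors' outgoing facts (keeping only the variable-to-constant bindings on which all predecessors agree), and each assignment updates the facts. Everything below is illustrative: the node names, the statement encoding, and the NOT_CONST marker are assumptions for this sketch, not a standard representation.

```python
# Constant propagation over the CFG of the example above (a sketch).
# A "fact set" maps variable names to known constants; NOT_CONST marks
# a variable with no single constant value at that point.

NOT_CONST = object()

def eval_expr(expr, env):
    """Evaluate an expression to a constant if the facts allow it."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env.get(expr, NOT_CONST)
    op, left, right = expr
    lv, rv = eval_expr(left, env), eval_expr(right, env)
    if isinstance(lv, int) and isinstance(rv, int):
        return lv + rv if op == "+" else lv * rv
    return NOT_CONST

def merge(e1, e2):
    """Keep a binding only if both predecessors agree on it."""
    return {v: c for v, c in e1.items() if e2.get(v) == c}

# CFG node -> (list of (target, expr) statements, list of successors).
cfg = {
    "entry": ([("a", 1), ("b", 2), ("c", 3)], ["then", "else"]),
    "then":  ([("x", ("+", "a", 5))], ["join"]),   # x = a + 5
    "else":  ([("x", ("+", "b", 4))], ["join"]),   # x = b + 4
    "join":  ([("c", ("+", "x", 1))], []),          # c = x + 1
}

def analyze(cfg):
    preds = {n: [] for n in cfg}
    for n, (_, succs) in cfg.items():
        for s in succs:
            preds[s].append(n)
    out = {n: {} for n in cfg}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n, (stmts, _) in cfg.items():
            env = {}
            if preds[n]:
                env = dict(out[preds[n][0]])
                for p in preds[n][1:]:
                    env = merge(env, out[p])
            for var, expr in stmts:
                env[var] = eval_expr(expr, env)
            if env != out[n]:
                out[n], changed = env, True
    return out

facts = analyze(cfg)
# x is 6 on both paths into the join, so c = x + 1 simplifies to c = 7,
# and the earlier c = 3; is seen to be useless (c is redefined before use).
print(facts["join"]["x"], facts["join"]["c"])  # prints 6 7
```

The merge step is what makes the analysis "global": the fact x = 6 holds at the join only because both predecessor paths agree on it, which no one- or two-statement window could establish.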