There are a couple more ways of improving loop efficiencies:

### Loop unrolling

Every iteration of a loop requires modification and testing of the loop variable. This over- head can be reduced if the number of iterations of the loop is small, say less than five. This is called unrolling the loop. The previous example in *loop fusion,* can be *unrolled,* further reducing overhead:

int i, a[1000], b[1000]; for (i=0; i<1000; i=i+2) { a[i] = 0; b[i] = 0; a[i+1] = 0; b[i+1] = 0; }

The most extreme case is to eliminate the loop completely and work as sequential line code. For example:

for (i=1; i<=3; i=i+1) { x = x + sqrt(i); }

would arguably be better expressed as:

x = x + sqrt(1) + sqrt(2) + sqrt(3);

### removing loop-independent expressions

Expressions in loops that perform a calculation independent of the loop should be evalu- ated outside the loop. This is sometimes known as loop streamlining:

for (i=1; i<=1000; i=i+1) sum = sum + pow(x,4.0);

should be replaced by

powX = pow(x,4.0); for (i=1; i<=1000; i=i+1) sum = sum + powX;

We remove the expression pow(x,4.0) because its calculation is independent of the loop and would otherwise be calculated 1000 times. Consider the following nested loop:

for (i=1; i<=100; i=i+1) for (j=1; j<=100; j=j+1) for (k=1; k<=100; k=k+1)

The nested structure loops 100 × 100 × 100 = 1,000,000 times, meaning that a statement such as **pow(x,4.0)** would be executed 1 million times. For a nested loop struc- ture, the deeper a loop is nested, the higher the dividend with respect to efficiency. Always optimize inner loops first. The same principle can be applied to nested loops, by moving independent expres- sions from an inner loop outward. For example, some repeated calculations make use of the subscript as part of the calculation:

for (i=1; i<=100; i=i+1) for (j=1; j<=100; j=j+1) A[i][j] = B[i][j] + c/i;

The calculation** c/i** is a repeat calculation and should be removed from the inner loop. A better way to code this is

for (i=1; i<=100; i=i+1){ ci = c / i; for (j=1; j<=100; j=j+1) A[i][j] = B[i][j] + ci; }

Now the value **c/i** is calculated 100 times instead of 10,000 times.