Efficient compilers = less need for optimization?

There was a time in the history of programming when code optimization was a popular topic. Books like “Efficient C” written by Thomas Plum and Jim Brodie in 1985 discussed methods of measuring time and optimizing arithmetic operations. Their work was performed on machines like the 16-bit PDP-11/23, 32-bit VAX-11/780, exploring how long it took to perform basic arithmetic operations (in C) in microseconds (one millionth of a second). Here are some of the sample times per instruction (sec), calculated using 100,000 calculations:


This expose clearly showed that operations such as division and exponentiation are disproportionately expensive, as opposed to addition. Over two decades later, the differential has disappeared. If we perform the same calculations on a Macintosh 2.16Ghz Intel Core 2 Duo, we get across the board values of zero seconds. We can only get some idea of the magnitude of each of the arithmetic operations if we increase the number of calculations to 1,000,000,000, a 10,000-fold increase:


Even then the time taken to execute each of these operations is trivial. So does optimization matter anymore? In some situations, such as designing algorithms for embedded or mobile devices, where there are constraints on the size and speed of processors, there is a dire need for optimization strategies. More likely bottlenecks will occur in dealing with large pieces of data, such as manipulating image arrays with 20 million elements. If we compare the double addition for the PDP-11/23 and the Intel Core 2 Duo, we get a 20,000 fold decrease in the amount of time taken to perform the operation. Moore’s Law basically stipulates that chip density will double every 18 months, which gels well with the idea that increases in CPU speed will fix speed problems in algorithms. For those that do optimize their code it often comes with a price: intricate and convoluted code that is hard to decipher. At the end of the day, performance enhancements aren’t that critical for most programs. Where they do matter is in embedded systems with limited system resources.

Plum and Brodie also make a case for speed increases in array manipulation based on the interchangeability of subscripted array references and pointer references. They consider four scenarios: 

int a[1000][1000], b[1000000];
int i, j, *p;

// Using a two-subscript loop
for (i=0; i<1000; i=i+1)
    for (j=0; j<1000; j=j+1)
        a[i][j] = 0;

// A single subscript loop over a 1000000-element array
for (i=0; i<1000000; i=i+1)
    b[i] = 0;

// A pointer
register int *p;
for (p=b; p<b; ++p)
    *p = 0;

// Combine the increment with the indirection
    *p++ = 0;

We have increased the number of elements in the arrays from 100 to 1,000,000, but even then when they run, all four pieces have an identical running time: 0 seconds. Systems handle arithmetic operations so efficiently, it is hard to fathom even thinking about optimizing them.

NOTE: And of course this code was compiled with gcc, there are more efficient commercial C compilers out there that would likely do an even better job.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.