Caveats of algorithm complexity (i)

The birth of the modern computer in the 1940s allowed vast problems to be solved in an expedited manner. In these early years the use of computers, such as the Colossus, were restricted to military applications, such as for cracking German encrypted messages during the Second World War. Indeed Colossus, a machine running the equivalent of 5.8Mhz was an immense structure capable of simplistic, yet highly repetitive tasks. Skip forward over seventy years, the speed of CPUs has increased in excess of 5GHz, they are being grouped into clusters, and the applications they run are inherently more complex. The question we have to ask ourselves is whether a complex algorithm actually performs any better than a simple one?

Brute force = simplicity

In the early days of chess-playing programs, many of the algorithms were based on human inspired heuristics, and didn’t perform very well. In the late 1970’s, Ken Thompson and Joe Condon built Belle, the first chess playing computer,  which took advantage of what computers do best: searching through as many possibilities as they can as quickly as they can. IBMs Deep Blue, which played Garry Kasparov in 1997, worked on this principle of brute force. Deep Blue contained 480 special purpose chess chips and was capable of evaluating 200 million positions per second. Brute force, also known as exhaustive search, is a simple algorithm which tries every possible combination of candidates for a solution until the most appropriate is found. There are situations where a simple algorithm is the best approach. A dictionary is sorted alphabetically, so a brute force approach to finding the word Templar would require testing every word from the letter a. Not the most efficient algorithm, but it is simple, and it does work. Were the dictionary unsorted, then brute force would likely be the best approach.

Consider the task of finding relevant images on flickrflickr is a visually-based social networking site devoted to photographs. It currently maintains in excess of two billion images, with 7-10 million new images added each day. The easiest way of searching for this information is by way of tags associated with each image. A tag is a non-hierarchical keyword assigned to an image. At the other end of the complexity spectrum is contentbased image retrieval (CBIR). It would ideally allow searches to be performed based on a keyword or a given image. The search would be performed, by matching information extracted from the content of each of the images in the database. The problem with CBIR is the complexity involved in extracting relevant information from images. A human can look at a photograph and intrinsically find objects within the image, and easily identify the context of the photograph, a computer can not. A computer must partition an image into what it considers relevant objects, extract information on those objects and try and make some prediction about what is in the image based on its knowledge base. By trying to model algorithms on the basis of the human visual system, we are adding complexity to the problem, and building algorithms which may never work.