The art of tagging (i)

“One picture is worth ten thousand words”, a phrase from an ad by Fred R. Barnard in a 1927 issue of the trade journal Printers’ Ink, probably spawned the classic phrase “a picture’s worth a thousand words”. The phrase refers to the idea that a complex story can be conveyed by a single image. But how many words do we need to describe a picture? Until recently this may not have mattered, but with the advent of immense online image repositories there is now a real need to tag images. A tag is a keyword or term associated with a piece of information; it describes the item and facilitates text-based classification and searching. It is a form of metadata. In the context of digital image repositories, tagging allows images to be described, and is easier than trying to find some form of feature which describes the image. It is used to label content. Part of the notion of tagging relates to viewing an image and identifying items which help describe it. Identifying items requires the innate ability of the mind’s image-processing system to segment an image. Having described an image, this information can be used in image search queries.

In recent years there has been immense interest in the application of computer-based techniques to “retrieve” images. The foremost limitation of computer-based visual systems is that computers interpret numbers, yet we try to design algorithms which mimic the human visual system, a system with 100 million years of evolutionary design behind it, which works on dynamic images in real time. Humans can look at a cereal package, determine it is a rectangular box, and even estimate its approximate dimensions, primarily because humans perceive in 3D. By interpreting the text and images on the sides of the cereal package, humans are able to infer its contents. Deriving an algorithm to do the same using an image from a digital video stream is more of a challenge. The brain has models of the world, and thirty distinct information-processing regions to deal with colour, texture, and so on. The only reason optical character recognition algorithms are so successful is that the task is much less complex: text consists of words which are well-defined models of concepts, and images of text usually consist of black characters on a near-white background. As such it is relatively simple to extract the characters from the background and then match them against templates to determine their meaning. Consider the example below, where some text, the word “Trajan”, has been extracted from the image of the text. The segmentation task, whilst not trivial, is not difficult either.
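As a rough illustration of this extract-and-match idea, here is a minimal sketch assuming OpenCV is available; the file names, the single-character template, and the 0.8 score threshold are all hypothetical.

import cv2

# Load the scanned text and a template for one character, both as greyscale.
page = cv2.imread("trajan_text.png", cv2.IMREAD_GRAYSCALE)       # hypothetical scan
template = cv2.imread("template_T.png", cv2.IMREAD_GRAYSCALE)    # hypothetical template

# Normalised cross-correlation scores every placement of the template on the page.
scores = cv2.matchTemplate(page, template, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(scores)

if best_score > 0.8:    # threshold chosen purely for illustration
    print("Character found at", best_loc, "with score", round(best_score, 2))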

Simplicity is the key to robust algorithms

The key to a robust algorithm of any sort lies in its simplicity. A case in point is image processing. Over the years we have seen numerous algorithms try to do numerous things. Some things, like robust, generic segmentation, are beyond the scope of current algorithms. A good example of simplicity is in areas like fire or skin detection. One could design quite a complex algorithm to detect skin in an image, but in reality a simple one does the job. Consider this simple algorithm based on the YCbCr colour space [1]:
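A minimal sketch of such an algorithm, assuming OpenCV and NumPy are available; the Cb and Cr ranges are the ones commonly quoted from [1], and the file name is a placeholder.

import cv2
import numpy as np

def skin_mask(bgr_image):
    # Convert to YCrCb (OpenCV orders the channels Y, Cr, Cb) and split.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    _, cr, cb = cv2.split(ycrcb)
    # Keep pixels whose chrominance falls inside the skin ranges quoted from [1].
    mask = (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
    return mask.astype(np.uint8) * 255

# img = cv2.imread("faces.jpg")            # hypothetical input image
# cv2.imwrite("skin.png", skin_mask(img))  # white = candidate skin pixels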

YCbCr is just a colour space which separates colour from “structural” intensity information. The two colour components, Cb and Cr, are used together with some constraint ranges to extract potential “skin” regions. Simple, right? You may think it doesn’t work well across the range of skin tones, but it actually performs quite well. Consider the following image containing a number of faces:

Here is the result of applying the algorithm:

As you can see, it extracted all 16 face regions. Yes, some of the extracted regions also include hair, because it is similarly coloured, but there is enough information in these images to extract facial regions for further analysis, such as eye or mouth localization. The algorithm is simple, robust, and efficient. What more could one want from an algorithm (hey, and no AI needed!).

[1] Chai, D., Ngan, K.N., “Face segmentation using skin colour map in video phone applications”, IEEE Trans. on Circuits and Systems for Video Technology, Vol.9(4), pp.551-564 (1999)


Caveats of algorithm complexity (ii)

Increased intelligence = increased complexity

Consider the design of an automatic fastener recognition system for retailers. The idea is that when a customer wishes to purchase a “stainless-steel masonry bolt”, the fastener would be placed on a system which acquires an image, and returns a series of search results which could be selected by the cashier. The image processing problem is simple (sometimes). Obtain an image, subtract from it a “template” image of the background without any objects on it, turn the resulting image into a black-and-white image containing the background (white) and the object (black), and proceed to extract features from the object such as length, width, circularity, etc. At the same time the weight of the bolt is obtained using a built-in scale. When all the relevant information has been gathered, some form of learned algorithm is used to produce a list of the most probable items. Each sub-component is relatively simple in nature; the complexity may only arise when an attempt is made to integrate the systems.
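A rough sketch of that pipeline, assuming OpenCV 4 and NumPy are available; the file names, the threshold value, and the scale reading are hypothetical placeholders (and here the thresholded object ends up white rather than black, which makes no difference to the features).

import cv2
import numpy as np

background = cv2.imread("empty_tray.png", cv2.IMREAD_GRAYSCALE)   # "template" background
scene = cv2.imread("bolt_on_tray.png", cv2.IMREAD_GRAYSCALE)      # fastener on the tray

# 1. Subtract the background template and threshold to a binary image.
diff = cv2.absdiff(scene, background)
_, binary = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

# 2. Extract simple shape features from the largest object found.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
obj = max(contours, key=cv2.contourArea)
area = cv2.contourArea(obj)
perimeter = cv2.arcLength(obj, True)
_, (width, length), _ = cv2.minAreaRect(obj)            # oriented bounding box
circularity = 4 * np.pi * area / (perimeter ** 2)

# 3. Combine with the weight from the built-in scale and hand off to a classifier.
weight_grams = 12.5                                      # placeholder scale reading
features = [length, width, circularity, weight_grams]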

Difficulties arise when there is a belief that such systems can be adapted to work with things such as identifying vegetables and fruit. The idea seems simple enough, but there is no guarantee that fruit and vegetables will always conform to a specific weight or shape category. It may be easy to distinguish an orange from a banana by shape or colour, but how does the system distinguish between different types of oranges, or even organic from non-organic bananas? Now the system becomes inherently more complex, in part because colour information, and maybe even 3D shape information, must be incorporated into the algorithm. However, it may be impossible to derive a system which can distinguish between hot-house and field-grown tomatoes. That’s why fruit and vegetables come with little stickers that bear product codes, or barcodes. Besides which, barcodes can provide more information than just the type of fruit, for instance whether a piece of fruit is organic or genetically modified, and even where it was grown.

Decrease complexity by altering the data

Sometimes the failure of an algorithm to perform is an indication of limitations in the data. Automatic number plate recognition (ANPR) systems have the capability of scanning one plate per second on cars travelling at up to 160 km/h. ANPR systems typically use optical character recognition to identify the characters on a number plate. Characters such as “P” and “R” are similar and may lead to false classifications. To resolve this, some countries such as the Netherlands have introduced small gaps in some letters to make them easier to identify. This is a good case for not making the algorithm more complex, but making the data easier to process. Handwriting recognition is another good example. Whilst early pen-based personal organizers struggled with handwriting recognition algorithms, the Palm used a simplified alphabet (Graffiti). Training a person is often easier than training software to recognize millions of different styles of writing. Handwriting recognition for address identification has improved over the years, but has this been because of the increase in machine-printed addresses? Royal Mail processes about 50 million items per day, with approximately 70% being machine printed.

A Dutch number plate using the font Kenteken. O, P, and R have anti-tampering cuts sliced out of the characters.

Complexity is based on need

Text segmentation is a good example of how complexity is driven by need. Accurate text segmentation is a precursor to optical character recognition (OCR), whereby written text is recognized. In the case of simple machine-printed text, segmentation is relatively easy if the context is dark text on a fairly uniform white background. If the paper is aged, yellowed, water-stained or contains foxing, the algorithm becomes more complex, yet combinations of simple algorithms should still be able to extract the text. In contrast, extracting characters from handwritten text is more challenging, partially because everyone’s handwriting is unique. In many cases it is easier for a human to decipher handwritten postal addresses than to design a more complex algorithm which somehow has to be tailored to individual writing styles.
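For the simple case (dark text on a fairly uniform light background), a minimal sketch might be nothing more than an automatic threshold; this assumes OpenCV, a hypothetical scanned page, and illustrative parameter values.

import cv2

page = cv2.imread("scanned_page.png", cv2.IMREAD_GRAYSCALE)   # hypothetical scan

# Otsu's method picks the threshold separating the two intensity populations
# (ink versus paper); THRESH_BINARY_INV leaves the text white on black.
_, text_mask = cv2.threshold(page, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# For aged or stained paper, a locally adaptive threshold is one simple next step.
adaptive = cv2.adaptiveThreshold(page, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 31, 15)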

Simple designs nearly always work. The volume of data being produced can limit the usefulness of some complex algorithms. We live in a world mired in complexity. Have you ever wondered why handwriting recognition and voice-based systems haven’t made a huge impact?


Caveats of algorithm complexity (i)

The birth of the modern computer in the 1940s allowed vast problems to be solved in an expedited manner. In those early years the use of computers, such as Colossus, was restricted to military applications, such as cracking encrypted German messages during the Second World War. Indeed Colossus, a machine running at the equivalent of 5.8 MHz, was an immense structure capable of simplistic, yet highly repetitive tasks. Skip forward over seventy years: CPU speeds now exceed 5 GHz, processors are being grouped into clusters, and the applications they run are inherently more complex. The question we have to ask ourselves is whether a complex algorithm actually performs any better than a simple one.

Brute force = simplicity

In the early days of chess-playing programs, many of the algorithms were based on human-inspired heuristics, and didn’t perform very well. In the late 1970s, Ken Thompson and Joe Condon built Belle, a chess-playing machine built around special-purpose hardware, which took advantage of what computers do best: searching through as many possibilities as they can, as quickly as they can. IBM’s Deep Blue, which played Garry Kasparov in 1997, worked on this principle of brute force. Deep Blue contained 480 special-purpose chess chips and was capable of evaluating 200 million positions per second. Brute force, also known as exhaustive search, is a simple algorithm which tries every possible candidate solution until the most appropriate one is found. There are situations where a simple algorithm is the best approach. A dictionary is sorted alphabetically, yet a brute-force approach to finding the word “Templar” would require testing every word from the letter A. Not the most efficient algorithm, but it is simple, and it does work. Were the dictionary unsorted, brute force would likely be the best approach.
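To make the dictionary example concrete, here is a small sketch comparing a brute-force (linear) search with a binary search that exploits the alphabetical ordering; the tiny word list is obviously a stand-in for a real dictionary.

from bisect import bisect_left

def brute_force_search(words, target):
    # Test every word in turn, starting from the letter "a".
    for i, word in enumerate(words):
        if word == target:
            return i
    return -1

def binary_search(sorted_words, target):
    # Exploit the alphabetical ordering to halve the search space each step.
    i = bisect_left(sorted_words, target)
    return i if i < len(sorted_words) and sorted_words[i] == target else -1

dictionary = sorted(["abbey", "knight", "templar", "zeal"])
print(brute_force_search(dictionary, "templar"))   # works, but scans from the start
print(binary_search(dictionary, "templar"))        # same answer, far fewer probes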

Adding complexity to a problem

Consider the task of finding relevant images on flickr, a visually-based social networking site devoted to photographs. It currently maintains in excess of two billion images, with 7-10 million new images added each day. The easiest way of searching this information is by way of tags associated with each image. A tag is a non-hierarchical keyword assigned to an image. At the other end of the complexity spectrum is content-based image retrieval (CBIR). It would ideally allow searches to be performed based on a keyword or a given image, with the search carried out by matching information extracted from the content of each of the images in the database. The problem with CBIR is the complexity involved in extracting relevant information from images. A human can look at a photograph, intrinsically find objects within it, and easily identify its context; a computer cannot. A computer must partition an image into what it considers relevant objects, extract information about those objects, and try to make some prediction about what is in the image based on its knowledge base. By trying to model algorithms on the human visual system, we are adding complexity to the problem, and building algorithms which may never work.


Image statistics from a pixel perspective?

Sometimes algorithms talk about the mean or variance of a region of an image. But what do they mean? To obtain a better perspective, it’s best to actually explore some images. Consider the image shown in Fig. 1.

Fig. 1: The original image

We have extracted four 200×200 regions from the image, shown in Fig. 2.

Fig. 2: Different image regions

Statistics are given at the end of the post. Region B represents a part of the background sky. Region A is the same region processed using a median filter to smooth out discontinuities. Comparing regions A and B, they have similar means: 214.3 and 212.37 respectively. Yet their appearance is different: one is uniform, and the other seems to contain some variation, something we might attribute to noise in some circumstances. The variance of A is 2.65, versus 70.57 for B. Variance is a poor descriptive statistic because it is hard to visualize, so it is often converted to the standard deviation (SD), which is just the square root of the variance. For A and B, this is 1.63 and 8.4 respectively.

As shown in regions C and D, more complex scenes introduce increased variance and SD. The variances can be related to the spread of the histograms shown in Fig. 3.

Fig. 3: Comparison of histograms

When comparing two regions, a lower variance (or, in practice, a lower SD) usually implies a more uniform region of pixels. Generally, mean and variance are not good descriptors for an image, because two totally different images can have the same mean and variance.
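For the curious, here is a minimal sketch of how such region statistics might be computed, assuming NumPy and Pillow are available; the file name and region position are placeholders.

import numpy as np
from PIL import Image

# Load the image as a greyscale array of floats.
img = np.asarray(Image.open("original.png").convert("L"), dtype=np.float64)

def region_stats(image, row, col, size=200):
    # Mean, variance and standard deviation of a size-by-size region.
    region = image[row:row + size, col:col + size]
    return region.mean(), region.var(), region.std()

mean, var, sd = region_stats(img, 0, 0)   # region position chosen for illustration
print("Mean = %.2f, Variance = %.2f, SD = %.2f" % (mean, var, sd))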

Image A
Mean = 214.3
Variance = 2.65
SD = 1.63

Image B
Mean = 212.37
Variance = 70.57
SD = 8.4

Image C
Mean = 87.22
Variance = 4463.11
SD = 66.81

Image D
Mean = 56.33
Variance = 2115.24
SD = 46.0


Ghosts in the machine: Why Fortran failed to maintain its hold

Fortran dominated the world of computing in the 1960s and early 1970s, but even though it is still widely used in scientific circles, it did not continue to dominate the programming world. One of the core reasons might be the rise of the personal computer in the late 1970s and beyond. People were now able to write programs at home on their (probably quite expensive) personal computers.

Fortran wasn’t popular on these machines, partly because it was a compiled language – which meant that in the world of the early 1980s, a Fortran program could take ten minutes to compile on a system which used floppy disks. Once compiled, the Fortran program would offer fast execution. Interpreted languages, on the other hand, executed programs slowly, but this mattered less because interpreted programs were often small enough that execution speed was not a concern. Fortran’s libraries, such as the math library, also consumed large amounts of memory. The micro world was to be dominated by the easy-to-learn BASIC, and later by the likes of Pascal. This was augmented by the fact that many home systems of the 1980s had a ROM-resident BASIC interpreter, making it easy to write BASIC programs. Fortran did make an appearance in the form of a few compilers for the PC, such as MS Fortran, but just didn’t make the inroads of BASIC, or of Pascal in the form of the Turbo Pascal family of languages.

The other issue was Fortran’s unstructured nature. As the industry moved towards structured programming in the early 1970s, the two core languages designed in this context, Pascal and C, were making inroads. By the time Fortran became more structured with Fortran 77, the damage had been done. Fortran would evolve and carve out a niche in scientific programming, but would not dominate programming as it had in the 1960s.

Coding: A small note on Cyclomatic complexity

One way of deriving a measure of complexity is cyclomatic complexity, a measure introduced by Thomas McCabe in 1976. It basically treats the program’s structure as a graph. What is a graph? Similar in many respects to a flow chart, a graph has nodes (actions) and edges (program flow between actions). If E is the number of edges, N the number of nodes, and P the number of connected components (for a single piece of code P is equal to 1), the cyclomatic complexity can be derived using:

C = E – N + 2P

Consider the following program segment, and associated cyclomatic graph:

[Figure: program segment and its cyclomatic graph]

In this case the number of nodes is 8 and the number of edges is 9, so C = 9 - 8 + (2*1) = 3. Works nicely for a small program, but I hate to think what a large piece of code would look like!
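As a small illustration, the calculation is easy to automate when the control-flow graph is available; the graph below is hypothetical, chosen to reproduce the numbers above (8 nodes, 9 edges, one connected component).

def cyclomatic_complexity(graph, components=1):
    # C = E - N + 2P for a control-flow graph given as an adjacency list.
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2 * components

# Hypothetical control-flow graph: node -> list of successor nodes.
cfg = {
    1: [2],       # entry
    2: [3, 4],    # decision (if)
    3: [5],
    4: [5],
    5: [6],
    6: [7, 2],    # decision (loop back)
    7: [8],
    8: [],        # exit
}
print(cyclomatic_complexity(cfg))   # 3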

McCabe, T.J., “A complexity measure”, IEEE Trans. on Software Engineering, SE-2(4), pp.308-320 (1976)

Ghosts in the machine: Algol 60

When Algol 60 first appeared, it heralded a new era for programming languages, one which encompassed the idea of “blocks” of code, amongst other things. So why did such a promising language fail? There are likely a number of reasons.

Firstly, Algol 60 had no I/O in the specification. This meant that when Algol was implemented on a particular machine, I/O had to be added, and so the specifications for I/O differed among implementations. Secondly, the 1960s were awash with new programming languages – everyone wanted theirs to be the next “big thing”, and it is harder to compete in that sort of environment. There was also very little time between the release of Algol 60 and the beginning of work on its successor, Algol 68. Algol may have been more successful if effort had been spent on improving Algol 60, rather than creating a whole new language.

Thirdly, and likely the biggest issue, was IBM. Algol 60 made some headway in Europe, but never really established itself in North America. Why? IBM’s US domestic installed market share for computer systems was around 71.4% in the 1960s; its main competitors, Honeywell and Univac, held 8.1% and 9.6% of the market respectively [1]. Algol 60 was supposed to take over some of Fortran’s territory, but due to IBM’s dominance this never happened. In fact Fortran so dominated the industry that even IBM’s other language, PL/I (1964), failed to make any headway. On the business side, IBM adopted Cobol in 1962 as its primary development language.

By the time the tide started to turn against Fortran in the late 1960s, driven by the surge of interest in structured programming, Algol 60 had been usurped by Algol 68. Algol 68 was too complex a language, and other players in the guise of Pascal (and later C) would take over. Pascal was block-oriented and structured, and would lead the way for programming in the 1970s. By the time S-algol appeared in the late 1970s, it was over for Algol. Which is a pity, because S-algol showed real promise. Here’s a bubblesort in S-algol.

[1] Arnst, C., “Charts show IBM outran others in ’60s”, Computerworld, Nov. 7, p. 57 (1977)

The recursive factorial began life as Algorithm 33

When did the first recursive factorial program appear? As a nondescript piece of code flagged as Algorithm 33 in Communications of the ACM, Volume 4, Issue 2, Feb. 1961. The code is written in Algol 60, as Algol was the first procedural language to allow recursion (Lisp allowed recursion, but was not a procedural language).

Strangely enough, the algorithm is so overlooked that searching for “Algorithm 33” in the ACM digital library does not find it… you have to search manually through the issues online.

Why Algol 68 was weird (and ultimately failed)

Look at this piece of code written in Algol 68, which finds the largest element of an array:

proc max = (ref[,] real a) real:
begin real big := -maxreal
   co assign the maximum real number allowed co;
   for i from 1 lwb a to 1 upb a do
   for j from 2 lwb a to 2 upb a do
   if a[i,j] > big then big := a[i,j] fi
   od od;
   big
end

It summarizes many of the things that were wrong with the language, and likely why it would have been a challenging language to teach people to program in.

  1. It uses a *lot* of acronyms. lwb is short for lower-bound.
  2. It uses control-structure terminators (for bracketing) that are the reverse of something in the beginning of the control structure… but not consistently. For example the for-do loop ends with od, and yet the if-then clause ends with fi. Shouldn’t that end in neht, or should the for loop end in rof? Not the smartest choice in the world.

There are elements of Algol68 that are inherently functional; however, on the whole the language is mired in complexity. For example, consider the six forms of coercion (down from eight in the first report), something that would likely confuse the novice programmer: deproceduring, dereferencing, uniting, widening, rowing, and voiding. Algol68 suffered from (i) being developed by committee – far too many people wanting far too many things incorporated into the language – and (ii) not being implemented widely enough, because implementation was a tricky process.

Check out this piece of code, which calculates a factorial:

Which is not exactly easy to read. Guess they didn’t care much about formatting either – not unusual for the time.

Reading a copy of the “Report on the Algorithmic Language ALGOL 68” might send you into a mild seizure, and even the Algol68 Genie learning guide is not for the faint-of-heart. I think if my first class in programming had involved Algol68, I would not have continued in computer science.

I think Algol 68 is summed up by this quote from Aldous Huxley in his book Those Barren Leaves: “It is the great dead language of the future.” By the time Algol68 made its appearance, there were other hounds already barking at the gates… simpler languages that would take the lead.