“One picture is worth ten thousand words”, a phrase from an advertisement by Fred R. Barnard in a 1927 issue of the trade journal Printers’ Ink, probably spurred the classic saying “a picture is worth a thousand words”. The phrase captures the idea that a complex story can be conveyed by a single image. But how many words do we need to describe a picture? Until recently this may not have mattered, but with the advent of immense online image repositories, tagging images has become a necessity. A tag is a keyword or term associated with a piece of information: a form of metadata that describes the item and facilitates text-based classification and searching. In the context of digital image repositories, tagging allows images to be described and labelled, and is easier than trying to extract some visual feature which characterises the image. Part of the notion of tagging involves viewing an image and identifying the items which help describe it, a task which relies on the innate ability of the mind’s image processing system to segment an image. Having described an image, this information can be used in image-search queries.
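The text-based searching that tags enable can be sketched with an inverted index: a mapping from each tag to the set of images that carry it. The sketch below is a minimal illustration, and the image filenames and tag vocabulary are hypothetical, not drawn from any real repository.

```python
from collections import defaultdict

class TagIndex:
    """A minimal inverted index mapping tags to image identifiers."""

    def __init__(self):
        self._index = defaultdict(set)

    def tag(self, image_id, *tags):
        # Associate each keyword with the image, case-insensitively.
        for t in tags:
            self._index[t.lower()].add(image_id)

    def search(self, *tags):
        # Return only the images that carry every queried tag.
        sets = [self._index.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

index = TagIndex()
index.tag("img_001.jpg", "cereal", "box", "breakfast")
index.tag("img_002.jpg", "cereal", "bowl")
print(sorted(index.search("cereal", "box")))  # → ['img_001.jpg']
```

Intersecting the per-tag sets is what makes multi-keyword queries cheap: no image content is inspected at search time, only the metadata attached when the image was tagged.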
In recent years there has been immense interest in the application of computer-based techniques to “retrieve” images. The foremost limitation of computer-based visual systems is that computers interpret numbers, yet we try to design algorithms which mimic the human visual system, a system shaped by 100 million years of evolutionary design, which works on dynamic images in real time. Humans can look at a cereal package, determine that it is a rectangular box, and even estimate its approximate dimensions, primarily because humans perceive in 3D. By interpreting the text and images on the sides of the package, humans can infer its contents. Deriving an algorithm to do the same using an image from a digital video stream is more of a challenge. The brain holds models of the world, and some thirty distinct information-processing regions to deal with colour, texture, and so on. The only reason optical character recognition algorithms are so successful is that the task is much less complex. The reason is simple: text consists of words, which are well-defined models of concepts, and images of text usually consist of black characters on a near-white background. It is therefore relatively simple to extract the characters from the background and then match them against templates to determine their meaning. Consider the example below, in which the word “Trajan” has been extracted from an image of the text. The segmentation task, whilst not trivial, is not difficult either.
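The two steps described above, separating dark characters from a near-white background and matching them against templates, can be sketched in a few lines. The toy 5×5 glyphs and grey-level values below are illustrative assumptions, not a real font or scanner output.

```python
import numpy as np

# Tiny binary templates for two characters (1 = ink, 0 = background).
# These glyphs are invented for illustration, not taken from a real font.
TEMPLATES = {
    "T": np.array([[1, 1, 1, 1, 1],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0]]),
    "L": np.array([[1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0],
                   [1, 1, 1, 1, 1]]),
}

def segment(image, threshold=128):
    """Separate dark ink from a near-white background by thresholding."""
    return (image < threshold).astype(int)

def recognise(glyph):
    """Score the glyph against each template by pixel agreement
    and return the best-matching label."""
    scores = {label: (glyph == tpl).mean() for label, tpl in TEMPLATES.items()}
    return max(scores, key=scores.get)

# Synthesise a greyscale "scan" of the letter T: ink around 20, paper around 240.
image = np.where(TEMPLATES["T"] == 1, 20, 240)
print(recognise(segment(image)))  # → T
```

The high contrast between ink and paper is what makes the threshold step reliable here; segmenting objects in a natural scene, where no such clean separation exists, is the far harder problem the paragraph contrasts it with.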