Why image segmentation is difficult (1)

Segmentation is one of the harder tasks in image processing. It is the process of partitioning an image in such a manner that the different objects in it can be extracted. In its simplest form, segmentation is a thresholding problem: I have an image with an object and a background, and they are distinct enough that I can separate them. But not all images come in this cookie-cutter form. In fact, the majority of them don’t.
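To make the "simplest form" concrete, here is a minimal sketch of global thresholding in plain Python. The tiny image, the function name `threshold`, and the cut-off of 128 are all made up for illustration; a real image would rarely separate this cleanly.

```python
def threshold(image, t):
    """Return a binary mask: 1 where a pixel is brighter than t, else 0."""
    return [[1 if pixel > t else 0 for pixel in row] for row in image]


# Hypothetical 4x5 grayscale image: a bright object (values near 200)
# sitting on a dark background (values near 30).
image = [
    [30, 28, 35, 31, 29],
    [32, 198, 205, 33, 30],
    [29, 201, 210, 199, 31],
    [31, 30, 34, 32, 28],
]

# The cut-off 128 is an assumption; it works only because the object
# and the background do not overlap in intensity.
mask = threshold(image, t=128)
for row in mask:
    print(row)
```

This works only because every object pixel is brighter than every background pixel. As soon as the two intensity ranges overlap, no single value of `t` separates them.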

Why is segmentation important? Well, it is the first step in trying to automatically determine what is in an image. But it isn’t an easy task, and there is no segmentation algorithm out there that is effective on all images. But why?

The main issue may be that we are hindered by our own vision system: humans can easily extract object information from what we see. We are even able to detect movement and estimate how far away an object is. Yet we try to design algorithms which mimic the human visual system, a system with 100 million years of evolutionary design, which works on dynamic images in real time. Humans can look at a cereal package, determine that it is a rectangular box, and even estimate its approximate dimensions. By interpreting the text and images on the sides of the package, humans are able to infer its contents. Deriving an algorithm to do the same using an image from a digital video stream is more of a challenge. The brain has models of the world, and thirty distinct information processing regions to deal with colour, texture, etc.

Here is a picture of a tiger in a zoo (Wikipedia: Eddy1988):

This is what the human vision system (focusing on the tiger) sees:

Canny edge detection sees a bunch of lines, which depend on the specific parameters the algorithm is given (3 in this case). Sometimes it is hard even for a human to decipher anything from this jumble of lines, let alone extract the mere shape of the tiger.
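The parameter dependence can be illustrated with something much simpler than Canny. The sketch below is not Canny: it only thresholds a horizontal intensity gradient, which is roughly the idea underneath gradient-based edge detectors. The row of pixel values, the function name `edge_map`, and both threshold values are made up for illustration.

```python
def edge_map(image, threshold):
    """Mark pixels whose horizontal intensity change exceeds threshold."""
    edges = []
    for row in image:
        out = [0] * len(row)
        # Central difference; the first and last pixels are left unmarked.
        for x in range(1, len(row) - 1):
            gradient = abs(row[x + 1] - row[x - 1])
            if gradient > threshold:
                out[x] = 1
        edges.append(out)
    return edges


# Hypothetical single-row image: faint texture (steps of 20) on the left
# and one strong object boundary (a jump of 150) on the right.
image = [[50, 50, 70, 70, 50, 50, 200, 200, 200]]

strict = edge_map(image, threshold=100)  # keeps only the strong boundary
loose = edge_map(image, threshold=10)    # also marks the texture

print(strict[0])
print(loose[0])
```

The same image yields two different edge maps. Which one is "right" depends on whether the texture counts as part of the object, and the algorithm has no way of knowing that.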

K-means clustering-based segmentation (with 4 “objects”) sees this, which is somewhat better, but even here the tiger’s light-coloured coat is marked the same as parts of the foreground. This algorithm is also dependent on some value k, representing the number of objects to find.
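Both problems, the dependence on k and different objects ending up with the same label, show up even in a toy version. Here is a minimal sketch of k-means on pixel intensities alone, in plain Python. The pixel values, the function name `kmeans_1d`, and the evenly spaced starting centres are assumptions for illustration; real implementations usually cluster colour vectors and start from random centres.

```python
def kmeans_1d(values, k, max_iterations=100):
    """Cluster scalar intensities into k groups; return (labels, centres)."""
    lo, hi = min(values), max(values)
    # Deterministic start (an assumption): centres spread evenly over the range.
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each value joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centres[c])) for v in values
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


# Hypothetical intensities: dark, mid-grey and bright pixels.
pixels = [10, 12, 14, 100, 104, 98, 200, 205, 210]

labels_k2, _ = kmeans_1d(pixels, k=2)
labels_k3, _ = kmeans_1d(pixels, k=3)

print(labels_k2)
print(labels_k3)
```

With k=2 the mid-grey pixels are lumped in with the dark ones, and only k=3 keeps them apart. The algorithm groups by similarity of value, not by object, and it needs to be told how many groups to look for.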

There isn’t an algorithm which screams “Look, here is the tiger!”.