Like many things in image processing, the quality of the output from image binarization depends directly on the quality of the original image. Sometimes we believe an image will be easy to segment, but our belief is clouded by our eyes' ability to extract objects, and we forget that algorithms see things in a different light. The human visual system gets a complete picture of whatever it is looking at, and can therefore interpret it in context. A computer gets a bunch of pixels.
Here are three images, each with different content. The first is an aerial image with a unimodal histogram, the second has an indistinct histogram biased towards the high intensity values, and the third is bimodal even though the image itself does not contain two distinct features. All three images would be challenging to binarize.
Sometimes, even though the data seems like it will process well, the subtleties are where the problems lie. Some images cannot be effectively turned into binary images; in fact, some images cannot be effectively segmented in any manner. The next three images show an indistinct multimodal histogram, a unimodal histogram with mini-peaks, and a unimodal peak which spans the entire range of intensity values. These three images are also challenging to binarize.
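One quick way to see which of these categories an image falls into is to inspect its histogram programmatically. The sketch below is a rough diagnostic, not a standard algorithm: it smooths the normalized grayscale histogram and counts local maxima, and the smoothing width and height cutoff are arbitrary illustrative choices. The synthetic test images are built from Gaussian-shaped histograms purely for demonstration.

```python
import numpy as np

def count_histogram_peaks(img, smooth=7, min_height=0.001):
    """Rough modality check: smooth the normalized grayscale histogram
    and count its local maxima. 'smooth' and 'min_height' are
    illustrative knobs, not standard values."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    hist /= hist.sum()
    h = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    # A peak: a bin strictly above both neighbours and above the noise floor.
    return sum(1 for i in range(1, 255)
               if h[i] > h[i - 1] and h[i] > h[i + 1] and h[i] > min_height)

# Synthetic images whose histograms follow Gaussian-shaped curves.
levels = np.arange(256)
def make_image(counts):
    return np.repeat(levels, counts.astype(int)).astype(np.uint8)

bimodal = make_image(100 * (np.exp(-(levels - 60) ** 2 / 200)
                            + np.exp(-(levels - 190) ** 2 / 200)))
unimodal = make_image(100 * np.exp(-(levels - 128) ** 2 / 200))
print(count_histogram_peaks(bimodal), count_histogram_peaks(unimodal))  # 2 1
```

A count of two well-separated peaks is a hint, not a guarantee, that thresholding will work; as the images above show, a bimodal histogram can come from an image with no two distinct features at all.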
There are also images that *seem* like they could be thresholded, yet the data within the image complicates matters. Both of the first two images have bimodal histograms, but what exactly would be segmented?
With many image binarization algorithms, the threshold is calculated from some global statistic, so although the histogram shows a bimodal distribution, this only represents how the intensity values are distributed in the image, not where they are distributed. A hundred different images could have the same histogram but radically different thresholding outcomes, because global methods do not take the spatial distribution of the pixels into account.
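To make that concrete, here is a sketch using a hand-rolled Otsu's method as a representative global technique (the image sizes and intensity values are arbitrary). Two images with identical histograms, one with two clean regions and one with the same pixels shuffled at random, receive exactly the same threshold, because the computation only ever sees the histogram:

```python
import numpy as np

def otsu_threshold(img):
    """Otsu's method: choose the level that maximizes between-class
    variance. It operates purely on the histogram, so the spatial
    layout of the pixels never enters the computation."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                 # class-0 probability up to each level
    mu = np.cumsum(p * np.arange(256))   # class-0 cumulative mean
    mu_t = mu[-1]                        # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    return int(np.nanargmax(sigma_b))

# Two images with IDENTICAL histograms but different spatial layouts:
# a clean two-region image, and the same pixels shuffled at random.
clean = np.zeros((64, 64), dtype=np.uint8)
clean[:, 32:] = 200                      # left half dark, right half bright
rng = np.random.default_rng(0)
shuffled = rng.permutation(clean.ravel()).reshape(64, 64)

print(otsu_threshold(clean) == otsu_threshold(shuffled))  # True
```

Binarizing the first image recovers two coherent regions; binarizing the second produces salt-and-pepper noise, even though both passed through the algorithm identically.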