Image processing was once simple

The concept of image processing was once quite simple. Images taken in the 1960s were often photographs taken by spacecraft, or remote sensing/aerial photographs, that contained artifacts from the acquisition process. These artifacts needed to be removed to create a more aesthetically pleasing image for interpretation, devoid of things that could distract the viewer. An example is shown below of an image of the moon’s surface taken by Lunar Orbiter 1. The original contains a series of (periodic) horizontal line artifacts, which can be almost magically removed using a Fast Fourier transform and a little effort.

Image of Taruntius Crater on the Moon, taken by Lunar Orbiter 1, (frame M-31), August 20, 1966. The original image (left), versus the de-striped images (right).
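
As a rough illustration of the idea (not the actual processing applied to the Lunar Orbiter frame), here is a minimal NumPy sketch: horizontal stripes concentrate their energy along the vertical axis of the frequency spectrum, so suppressing that axis (apart from the lowest frequencies) removes most of the striping. The width of the protected low-frequency region is an arbitrary assumption.

import numpy as np

def destripe_horizontal(img, keep=4):
    # img: 2D float array containing periodic horizontal line artifacts
    F = np.fft.fftshift(np.fft.fft2(img))
    rows, cols = img.shape
    cy, cx = rows // 2, cols // 2
    # zero the vertical frequency axis, except a small region around DC
    F[:cy - keep, cx] = 0
    F[cy + keep + 1:, cx] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))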

But periodic noise and artifacts are often easy to remove. Let’s not forget that many of the early image processing techniques evolved due to limitations in optics. Images needed to be processed to enhance them – to reduce noise, or improve acuity. These early algorithms are often the same ones we use now, things like unsharp masking to improve acuity. Why have we not moved on to more modern algorithms? The reason is simple – newer algorithms often don’t work any better. They are often more complex, making use of some fancy new method of manipulating data, but they rarely produce results which one could call awe-inspiring. There are hundreds of new image processing and analysis algorithms created each year, but many are just incremental changes to some existing algorithm, and are often poorly tested.

Part of the allure of historical algorithms is their simplicity. They are easy to program, and often quite efficient to run. It is hard for newer algorithms to justify their existence because they cannot sufficiently differentiate themselves.

Median image filtering: efficiency

In a previous post I showed the results of applying a mean filter to a 14MP grayscale image in various languages. But what about the more computationally intensive median filter? I applied a 5×5 median filter to the same 2144×6640 pixel, 8-bit image, and the results are interesting. I ran the code in C, Fortran, Julia, and Python.

  • In Julia, I coded it plain and simple, using the built-in median() function to calculate the median of each neighbourhood region. I also coded the same algorithm using parallel loops.
  • In Python, I coded it using plain loops, no vectorization, and the built-in median() function (a sketch of this version appears after the list).
  • In Fortran, I coded the algorithm two ways: using a simple bubblesort() to sort the neighbourhood before finding the median, and selecting the median directly using Hoare’s quickSelect() algorithm.
  • In C, I coded the algorithm using three methods of finding the median: the built-in qsort(), sorting using a bubblesort() function, and selecting the median using Hoare’s quickSelect() algorithm.
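
Below is a minimal sketch of the Python version, assuming the built-in median() refers to the standard library’s statistics.median(); the image is assumed to already be in memory as a list of row lists of 8-bit values (the file I/O is omitted).

import statistics

def median_filter_5x5(img, dx, dy):
    # img: dx rows by dy columns of 8-bit pixel values
    out = [row[:] for row in img]      # copy, so border pixels are preserved
    w = 2                              # half-width of the 5x5 neighbourhood
    for i in range(w, dx - w):
        for j in range(w, dy - w):
            block = [img[i + x][j + y]
                     for x in range(-w, w + 1)
                     for y in range(-w, w + 1)]
            out[i][j] = int(statistics.median(block))
    return out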

The outcomes are shown below:


Parallelism in Julia (ii) – image processing

How can we expand parallel programming to something more practical? What about some image processing, where the operations on a pixel are independent? For example, applying some form of LUT (look-up table) to each pixel, or performing some neighbourhood filtering. Let’s consider the task of median filtering a grayscale image. Here is a serial function to perform the task:

function serial_median(img)
   dx, dy = size(img)
   imgOut = zeros(dx,dy)
   for i=2:dx-1, j=2:dy-1
      block = img[i-1:i+1,j-1:j+1]
      newPixel = median(block)
      imgOut[i,j] = trunc(UInt8,newPixel)
   end
   return imgOut
end

Nothing inherently special about this. Here is the parallel version, which uses the @parallel macro to parallelize the outer loop:

function parallel_median(img)
   dx, dy = size(img)
   imgOut = zeros(dx,dy)
   @parallel for i = 2:dx-1
      for j = 2:dy-1
         block = img[i-1:i+1,j-1:j+1]
         newPixel = median(block)
         imgOut[i,j] = trunc(UInt8,newPixel)
      end
   end
   return imgOut
end

This effectively parallelizes the processing of the image’s rows, with each worker taking a chunk of the outer loop.

Now let’s test these algorithms. First let’s apply them to a small 1007×788 image (793,516 pixels). Using the @time macro, it is possible to time each of the functions.

Parallel:  1.223862 seconds (2.15 M allocations: 101.490 MB, 5.29% gc time)
Serial:  0.262374 seconds (3.17 M allocations: 223.417 MB, 6.80% gc time)

In this first case the parallel version actually took longer than the serial version. Why? Largely because the data set is too small for the parallelization to pay off – the overhead of distributing the work to the workers outweighs the time saved.

Now let’s try the algorithms using a larger dataset, an image that is 6640×2144 (14,236,160) pixels. Now the serial version is roughly four times slower than the parallel version.

Parallel:  1.557181 seconds (2.15 M allocations: 203.976 MB)
Serial:  5.999110 seconds (56.88 M allocations: 3.920 GB, 6.64% gc time)

A further example parallelizes both loops in the nested loop:

function parallelNmedian(img)
   dx, dy = size(img)
   imgOut = zeros(dx,dy)
   @parallel for i = 2:dx-1
      @parallel for j = 2:dy-1
         block = img[i-1:i+1,j-1:j+1]
         newPixel = median(block)
         imgOut[i,j] = trunc(UInt8,newPixel)
      end
   end
   return imgOut
end

It leads to a slight decrease in runtime for the parallel algorithm, at 1.49 seconds.


Segmenting eyes through thresholding (ii)

Having done some experimentation, it is now time to expand the exploration to testing a series of images with the algorithms that showed the most promise, namely:

  • Global: Otsu, Triangle
  • Localized: Phansalkar

Here are four images, with the image used in our original experimentation in the top-right.


Global threshold with Otsu

In three of the images the iris has been quite well segmented; however, in the image where the iris is most differentiated from the white region of the eye (top-left), Otsu has failed to threshold the iris successfully.

Global threshold with Triangle

In the first test image, thresholding using the Triangle algorithm must have been somewhat of an outlier, as it failed to threshold the pupil; however, in the lower two images it has successfully extracted the iris.

Localized threshold with Phansalkar

The localized algorithm, Phansalkar, has extracted the pupils successfully in each image. Each pupil could then be extracted as the largest circular-shaped “blob” in each image, after some post-processing to clean up the image. Eyelashes are also easily extracted (for anyone wanting to do eyelash detection!).

Segmenting eyes through thresholding (i)

Let’s evaluate the usefulness of using simple thresholding (binarization) to extract different regions of interest from the image of an eye. In this case let’s try for pupil and iris segmentation. There are two ways we could proceed here: using a global thresholding algorithm, or a more localized one. When choosing a thresholding algorithm, it is often a case of trial-and-error, so having a means of testing a whole series of thresholding algorithms is an effective way of doing this. This option is available in ImageJ, and it’s one of the reasons ImageJ is my go-to application for playing around with images during the exploration phase. Let’s consider this image of the eye:

Now let’s test this from the perspective of global and local thresholding algorithms (in grayscale). Let’s do the global thresholding algorithms first.
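
For anyone who prefers scripting to ImageJ, here is a minimal sketch of the two global algorithms using OpenCV (the experiments in this post were actually done with ImageJ’s Auto Threshold plugin, and ‘eye.png’ is a hypothetical filename). Note that Phansalkar is not built into OpenCV; it is available via ImageJ’s Auto Local Threshold plugin.

import cv2

gray = cv2.imread('eye.png', 0)

# Otsu: threshold chosen by minimizing the intra-class variance
tOtsu, binOtsu = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Triangle: threshold chosen geometrically from the histogram peak
tTri, binTri = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)

cv2.imwrite('eyeOtsu.png', binOtsu)
cv2.imwrite('eyeTriangle.png', binTri)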

The results are what one would consider expected. Most algorithms have delineated the iris quite effectively, with the Triangle algorithm actually extracting the pupil. Here is the histogram, to get some perspective:

There are two peaks in this histogram, at either end of the spectrum, representing the white of the eye and the pupil respectively. In the middle is a large plateau representing the rest of the image. However, not every image of the eye will look the same.
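
For reference, the histogram itself can be computed in a couple of lines of OpenCV (a sketch, with a hypothetical filename):

import cv2

gray = cv2.imread('eye.png', 0)
# hist[i] is the number of pixels with grey level i
hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
print(int(hist.argmax()))    # grey level of the tallest peak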

Moving on to the localized thresholding algorithms:

Here only one algorithm, that of Phansalkar, actually segments the pupil in a reasonable manner. Localized algorithms don’t rely on the global histogram, so they aren’t distracted by differences between images. So where does this leave us? With some preliminary data about the algorithms’ capacities to segment parts of the eye – not perfectly of course, because there is no thresholding algorithm that is perfect, largely because there is no image that is truly perfect.

Next we will see how these three algorithms perform on three other images:

  • Global: Otsu, Triangle
  • Localized: Phansalkar


Pupil segmentation with blobs

Extracting the pupil from an image of the eye didn’t exactly go so well using the Hough transform. How about blobs? Or rather BLOBs – Binary Large OBjects – a term that refers to a group of connected pixels in a binary image. OpenCV (CV2) actually incorporates a means of finding blobs (an indeterminate shape), in the guise of SimpleBlobDetector_create(). How does it work?

Now SimpleBlobDetector_create() has a bunch of parameters that can be set – which of course is a double-edged sword, and requires some tinkering. First it creates several binary images from the original image using the parameters minThreshold, maxThreshold, and thresholdStep. So if minThreshold=0, maxThreshold=255, and thresholdStep=1, it will produce 256 binary images. Then, in each binary image, connected white pixels are grouped together into blobs. Next the centres of the blobs are calculated, and any closer together than the parameter minDistBetweenBlobs are merged. Finally the centres and radii of the new blobs are computed and returned.
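
For illustration, here is how that thresholding stage might be configured (the values are arbitrary examples, not recommendations):

import cv2

params = cv2.SimpleBlobDetector_Params()
params.minThreshold = 0          # first threshold applied
params.maxThreshold = 255        # last threshold applied
params.thresholdStep = 1         # step between successive thresholds
params.minDistBetweenBlobs = 10  # centres closer than this are merged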

It is also possible to filter blobs by colour, size and shape:

  • Colour:
    • filterByColor=1, and  blobColor=0 (for dark blobs)
  • Area:
    • filterByArea=1, minArea=100, maxArea=500 (blobs between 100 and 500 pixels)
  • Shape:
    • filterByCircularity = 1, then set minCircularity, and maxCircularity (circle = 1)
    • filterByConvexity = 1, then set minConvexity, and maxConvexity
    • filterByInertia=1, then set minInertiaRatio, and maxInertiaRatio (circle=1, ellipse = 0→1)

Lots to think about right? But there is nothing too complicated about these parameters. Once they are assigned, a detector can be created, and then blobs detected:

detector = cv2.SimpleBlobDetector_create(params)
keyPoints = detector.detect(im)

Easy right? Hold on, not so fast. You have to make sure you pick the right parameters. For the pupil, I set up filterByArea, and filterByCircularity. Considering the pupil is a circular object, circularity makes sense. Usually in an application sense, the size of the image of the eye being taken will be relatively uniform, and hence setting parameters for the upper and lower bounds of the area of the pupil makes sense. In this case the image is about 600×600 in size. Here is the Python code:

import cv2
import math
import numpy as np

filename = input('Enter a file name: ')

# Read image (im=gray, img=colour)
im = cv2.imread(filename, 0)
img = cv2.imread(filename)

# Setup SimpleBlobDetector parameters.
params = cv2.SimpleBlobDetector_Params()

params.filterByArea = True
params.minArea = 3000
params.maxArea = 6000
params.filterByCircularity = True
params.minCircularity = 0.5

# Set up the detector with default parameters.
detector = cv2.SimpleBlobDetector_create(params)

# Detect blobs.
keyPoints = detector.detect(im)

for keypoint in keyPoints:
   x = int(keypoint.pt[0])
   y = int(keypoint.pt[1])
   s = keypoint.size
   r = int(math.floor(s/2))
   print(x, y)
   cv2.circle(img, (x, y), r, (255, 255, 0), 2)

cv2.imwrite("blobOutput.png", img)

Most of the code is taken up with setting the parameters. The blob area is restricted to 3000→6000 pixels, and the minimum circularity is set to 0.5. The detector is then applied, and any detected blobs are marked on the image.

Does it work? Here is the output:

So the algorithm effectively finds the pupil. But does it work effectively with other eye images? The reality is no.  Consider the image below (which failed):

Eyes where the contrast between the pupil and the surrounding iris is reduced, due to iris colour, may cause the algorithm to fail. There are inherently many issues: (i) pigmentation of the iris: green, gray, brown, hazel; (ii) position of the eyelids, i.e. from the viewpoint of obstruction; (iii) how dilated the pupil is; (iv) interference from things such as make-up; and (v) reflections from light.

There is no easy algorithm in image processing.


An example of the Hough transform – pupil segmentation

In a previous post I talked about using the Hough transform to count logs in a pile. Now let’s turn our sights to a simpler problem – that of extracting the pupil from the image of an eye, the first step in many eye-tracking algorithms. Now extracting the pupil from an image of the eye is usually quite simple, even though taking an image of the eye is not without problems. Taking an image of an eye can lead to reflections of what the eye is looking at, and also lighting artifacts. Here is a sample image (from Wikipedia).

The pupil is normally uniformly dark in colour, and the iris which surrounds it is more variegated, but also well differentiated from the remainder of the eye. This of course is a near perfect image, because both the pupil and the iris are clearly visible. In many images of the eye, portions of the iris may be obstructed by the eyelids.

So extracting the pupil must be an easy task using the Hough transform, right? Think again. Even beautifully round objects are hard to extract automatically, partially because of the number of differing parameters required by a function like OpenCV’s  cv2.HoughCircles(). Some parameters control the type of edge based segmentation performed on the grayscale version of the above image, and getting those parameters right often requires some intensive tweaking. Once you have found the pupil in one image, try replicating that in another. It’s not easy. Oh, it works fine in many of the examples showing how to use the function because they are often easy examples. It is a perfect illustration of an algorithm that seems useful, until you actually use it. There is no free lunch.

Here is an example of using the Hough transform. First we take an image with what are essentially some randomly placed black circles on a white background. Should be an easy task to find the circles using a Hough transform.

Below is the code we used:

import cv2
import numpy as np

filename = input('Enter a file name: ')

# Read in grayscale version of image
imgG = cv2.imread(filename,0)
# Read in colour version of image
imgO = cv2.imread(filename)

# Process the image for circles using the Hough transform
circles = cv2.HoughCircles(imgG, cv2.HOUGH_GRADIENT, 1, 50, 
              param1=30, param2=15, minRadius=0, maxRadius=0)

# Determine if any circles were found
if circles is None:
    print "No circles found"
else:
    # convert the (x, y) coordinates and radius 
    # of the circles to integers
    circles = np.round(circles[0, :]).astype("int")

    # draw the circles
    for (x, y, r) in circles:
        cv2.circle(imgO, (x, y), r, (255, 255, 0), 1)

    # display the image
    cv2.imshow("output", imgO)
    cv2.waitKey(0)

Below is the result, with the cyan circles denoting the circles found by the Hough transform.

It’s by no means perfect, *but* it has found all the circles. So now let’s test the same code on the eye image. First we convert the image to grayscale (done in the program), and then apply HoughCircles() with the same parameters. Here is the result:

Did it find the pupil? … Not really. It found plenty of things that aren’t circular objects, and it got close, but certainly didn’t outline the pupil in any sort of effective manner (never mind all the false-positives). What about if we preprocess it? Say, grayscale it, then apply a local threshold (say the algorithm of Phansalkar). This is the pre-processed image:

Now here is the Hough transformed image:

Worse? Oh, SO much worse. It has turned every eyelash into a potential circle (because they are arcs, I would guess). Whether or not it actually found the pupil is hard to tell. We could likely get better results by tweaking the parameters – but how long would this take?

Another approach might be to apply Canny thresholding before submitting the image to cv2.HoughCircles(). Will this work? The problem with Canny is inherently that it too has parameters, in the guise of high and low thresholds. This problem can be alleviated by using Otsu thresholding to determine the thresholds for Canny [1]. Here is the modified code:

import cv2
import numpy as np

filename = input('Enter a file name: ')

imgG = cv2.imread(filename,0)
imgO = cv2.imread(filename)

thV,thI = cv2.threshold(imgG,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)
highTH = thV
lowTH = thV / 2

# Find the binary image with edges from the thresholded image
imgE = cv2.Canny(imgG, threshold1=lowTH, threshold2=highTH)
cv2.imwrite('eyeCanny.png', imgE)

# Process the image for circles using the Hough transform
circles = cv2.HoughCircles(imgE,cv2.HOUGH_GRADIENT, 2, 30, 
          param1=30, param2=150, minRadius=0, maxRadius=150)

# Determine if any circles were found
if circles is None:
    print "No circles found"
else:
    # convert the (x, y) coordinates and radius of the 
    # circles to integers
    circles = np.round(circles[0, :]).astype("int")

    # draw the circles
    for (x, y, r) in circles:
        cv2.circle(imgO, (x, y), r, (255, 255, 0), 1)

    cv2.imwrite("houghOutput.png", imgO)

We have modified some of the HoughCircles() parameters, and set maxRadius to 150. Here is the output from the Canny edge detector:

Here is the subsequent output from the Hough transform:

Did it find the pupil? No. So it seems that extracting the pupil within an image of the eye is not exactly a trivial task using a circular Hough transform.

[1] The Study of an Application of Otsu Method in Canny Operator


How many logs can a woodchuck count?

In the forestry industry there is some value in pre-processing logs, with respect to size and shape, before they arrive at the mill. Ideally, each log would be somehow run through a 3D scanner, and optimal milling decisions made. The reality is that the volume of logs is so great that 3D scanning them is likely never going to be a reality. Sometimes even a trivial task, such as determining the individual diameters of a series of logs in a pile, isn’t so trivial when we try to create an algorithm to do it. In reality, this is very similar to the problem of mushroom counting using computer vision. What is easy for the human eye to do may not be as easy to translate into a generic vision-based solution. Mushrooms are almost easier because they are often white… except for when they are brown. You get the picture. Wood, on the other hand, can have varying colours even within a particular species, making automated vision challenging. Consider the following image of a pile of logs:

The first thing that comes to mind is of course to find circular objects in the image. Hough transforms maybe? Well, that would work, but it requires pre-processing to extract the edges of each of the logs. How do we deal with the differential nature of the wood colour? Could we just use colour? Unlikely, due to the small difference in hues – and what if I wanted to use the algorithm on walnut? Let’s first try to homogenize the colours in the image. We can do this using a Mean-Shift filter with a spatial radius of 9, and a colour distance of 25. Here is the result:

Mean-shift filter
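
A rough OpenCV equivalent of this step is sketched below (the filtering above was actually done in ImageJ, and ‘logs.png’ is a hypothetical filename); cv2.pyrMeanShiftFiltering() takes the spatial radius and colour distance as its second and third arguments.

import cv2

img = cv2.imread('logs.png')
# spatial radius 9, colour distance 25, as described above
shifted = cv2.pyrMeanShiftFiltering(img, 9, 25)
cv2.imwrite('logsMeanShift.png', shifted)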

Now ideally we could convert this image to grayscale, and apply some form of edge detection in order to find edges. This, however, will only be as successful as the edges present in the image allow. You can clearly see the edges, and the human eye can distinguish individual logs, however algorithms see things in a different light. First we convert it to grayscale:

Converted to grayscale.

Then we could use the Canny edge detection algorithm to extract the edges of each of the logs.

Canny edge detection: Parameters – Gaussian kernel radius = 1.7, low threshold=2.5, high threshold =7.5
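
Here is a hedged sketch of the grayscale conversion and Canny step in OpenCV. Note that the parameters quoted above (Gaussian kernel radius 1.7, thresholds 2.5/7.5) are those of ImageJ’s Canny plugin; cv2.Canny() takes its thresholds on the gradient-magnitude scale, so the values below are illustrative assumptions, not a direct translation.

import cv2

shifted = cv2.imread('logsMeanShift.png')
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.7)   # sigma roughly matching the radius above
edges = cv2.Canny(blurred, 50, 150)             # illustrative thresholds only
cv2.imwrite('logsCanny.png', edges)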

How successful was this? Canny is one of those algorithms which is highly dependent on a series of input parameters to direct the algorithm. Too low a Gaussian will find more edges, but also more spurious edges, i.e. edges which may not necessarily be useful, and which are more likely to confuse post-processing algorithms. Too high a Gaussian, and the outlines of the logs will become more abstract, i.e. they will lose edge profile, but there will be fewer spurious edges. Ideally we would like an edge detection algorithm which is automatic – yet few (if any) of those exist that are generic enough to be applied to all images. Adding to the problem are overlapping logs, portions of the sides of logs visible in the image, and differential colours.

The result could now be passed to a Hough transform, which may or may not work well. Here is the result of using a small piece of Python code (using the HoughCircles function from the computer vision library CV2).

Here is the code that did it:

circles = cv2.HoughCircles(img,cv2.HOUGH_GRADIENT,1,20,
 param1=40,param2=25,minRadius=40,maxRadius=80)

As you can see, HoughCircles has *7* parameters apart from the image to be processed, so finding the sweet spot is challenging. Made harder by the fact that the logs are by no means homogeneous. Many of the examples showing how to use the Hough transform to find circular objects use ideal circular objects, and maybe only a couple of them. Finding logs using the Hough transform isn’t perfect, but it has managed to find some. Modifying the parameters may find more. But then one set of parameters may not work on another image. There is also the problem of Hough itself, which works on the principle of using a “model” to find features in what is typically an edge image. For HoughCircles, that model is a perfect circle, and not all log cross-sections conform to a perfect circle, nor are they all perfectly aligned to eliminate shadows.

Now as a problem, counting logs is a good one, but there may be a reason why there is so little literature out there in the ether – because it may be inherently challenging to produce a system that will work in the diversity of environments, and diversity of tree species/sizes. There was a paper written in 1994 (pre-digital image) [1] on using the Hough transform to perform this very task, but the testing was more than lacklustre, and no real images were presented. A more recent paper [2] takes a different approach to the problem, without using Hough, which shows some promise.

If someone is interested in doing this for a MSc. project, we could look into industrial funding.

REFS:

[1] Brodie, J.R., Hansen, A.C., Reid, J.F., “Size assessment of stacked logs via the Hough transform”, Trans. of the ASAE, 37(1), pp.303-310 (1994).
[2] Rahman Shaik, A., “Image processing technique to count the number of logs in a timber truck”,

P.S. Here is a good tutorial on using cv2.HoughCircles.
Note: The pre-processing was done in ImageJ, but it wouldn’t be any different in a specifically written program.


Image mean filtering (iii) – in C

The last piece of code related to the mean filter is written in C. The first thing to notice is that the images are stored using static arrays, which means they live in static storage rather than on the stack (put succinctly – zero-initialized static data goes in .BSS (Block Started by Symbol), non-zero-initialized static data goes in .DATA). To be able to properly store large images, it’s best to use dynamic arrays stored in heap memory, but that’s another story altogether. Here is the C code, with the relevant sections marked by comments.

#include <stdio.h>
#define dx 2144
#define dy 6640

int main(void)
{
   FILE *ifp, *ofp;
   int i, j, x, y, w=2, sum;
   static int image[dx][dy];
   static int imageN[dx][dy];

   /* image input */
   ifp = fopen("pano.txt","r");
   for (i=0; i<dx; i=i+1)
      for (j=0; j<dy; j=j+1)
         fscanf(ifp,"%d", &image[i][j]);
   fclose(ifp);

   /* array copy (preserves the unfiltered border pixels) */
   for (i=0; i<dx; i=i+1)
      for (j=0; j<dy; j=j+1)
         imageN[i][j] = image[i][j];

   /* mean filtering: 5x5 neighbourhood of (2*w+1)*(2*w+1) = 25 pixels */
   for (i=w; i<dx-w; i=i+1)
      for (j=w; j<dy-w; j=j+1){
         sum = 0;
         for (x=-w; x<=w; x=x+1)
            for (y=-w; y<=w; y=y+1)
               sum = sum + image[i+x][j+y];
         imageN[i][j] = sum / 25;
       }

   /* filtered image output */
   ofp = fopen("meanPano.txt","w");
   for (i=0; i<dx; i=i+1){
      for (j=0; j<dy; j=j+1)
         fprintf(ofp,"%d ", imageN[i][j]);
      fprintf(ofp,"\n");
   }
   fclose(ofp);

   return 0;
}


The C version of the mean filtering program is clearly the fastest, at 0.875 seconds. But what it has in speed it lacks in usability, at least from the novice programmer’s perspective. The I/O sections of code require two for loops each, to iterate through the file to read/write each element. Copying the image also requires a nested loop, and calculating the mean at each pixel requires four nested for loops. It isn’t that loops are bad, they are just tedious. Ten for loops versus the one (albeit nested) loop in Julia. For the novice programmer wishing to do a little image processing, C just provides too many hurdles to do it in an expedient manner.

No rapid prototyping, and *no* free lunch in C.

P.S. You can find more information on C memory on this blog or a good overview here.


Creating photographic filters (i) – desaturation and saturation

How does one create photographic filter effects such as those found in Instagram? Are they easy to replicate? In Photoshop they likely are – there are numerous tutorials on the net about how to make a filter effect – but it requires each image to be processed separately, which may take some of the “instant coolness” away from actually processing the image. So what are these filters? Usually they are a combination of effects – the use of some curve profiles, different blending modes, and the addition of colour hues. Basically, if you can create an effect in Photoshop, you can convert it to an automated process. Note that there is no single way to produce an effect. There are probably thousands of ways of giving a photograph a retro or vintage effect.

What are some of the tricks to adding effects?

Desaturation (i.e. a muted look)

Desaturation is the process of reducing saturation in an image. Saturation in the context of colour images is the “colorfulness of an area judged in proportion to its brightness” – saturation can be viewed as how far a colour differs from pure white. Effectively, desaturation reduces this colourfulness. Removing colour from images may be more effective than enhancing it. A more colourful image lends itself towards a more “cheerful” interpretation, whereas reduced colour is more of a mood enhancer. Photographs that benefit more from desaturation tend to be those that have darker undertones. This can include earthy outdoor photos, cityscapes, and scenes that contain fog, or rain.

Saturation is one component in colour spaces such as HSV and HSB, and so it is relatively easy to modify. It basically involves converting an image from RGB to HSV colour space, and then modifying the “S” component by multiplying it by some value in the range 0 to 1.0. Reducing saturation to 0% effectively reduces the image to grayscale.
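
Here is a minimal OpenCV sketch of that process (the filename is hypothetical, and the 0.6 factor matches the 60% example below; a factor greater than 1.0, such as the 1.3 used later for saturation, works the same way as long as the result is clipped to 255):

import cv2
import numpy as np

img = cv2.imread('oslo.png')
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)

factor = 0.6                      # < 1.0 desaturates, > 1.0 saturates
hsv[:, :, 1] = np.clip(hsv[:, :, 1] * factor, 0, 255)

out = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
cv2.imwrite('osloMuted.png', out)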

Here is an example of desaturation. The image below was taken in August in Oslo, Norway. It is an image with a substantial amount of “colourfulness”, both in the vegetation, and the wood on the building.

Now if the image is converted to HSB, and the saturation component of the image is multiplied by 0.6 (60% of saturation), we get an image with much more muted tones.

Saturation (i.e. a vibrant look)

Obviously saturation is the opposite to desaturation. This is usually applied to images that are characterized as “cheerful”, and may contain a number of vibrant colours. Saturation is often overdone, producing colours which were unlikely to appear in real life. Consider the photograph of two snow-blower railcars in Norway. The scene might be a good candidate for saturation, providing a more vibrant colour to the railcars (even though in real-life they are faded).

Again, converting to HSB colour space first, then multiplying the saturation component by 1.3 (30% increase), results in the following saturated image:

The sky becomes bluer, the grass greener, and the railcars have had their faded paintwork restored.