SURF, SIFT? What about ORB?

When they first appeared, feature detection/description algorithms were a great idea. SIFT (Scale-Invariant Feature Transform) arrived in 1999, followed in 2006 by SURF (Speeded Up Robust Features), which was supposed to be several times faster than SIFT. These algorithms are used for a huge range of computer vision and image processing applications: panorama stitching, object recognition, gesture recognition. The problem with both is that they are patented, which makes them harder for the broader community to use. It also makes them harder to build into packages such as OpenCV.

A good alternative is ORB, short for Oriented FAST and Rotated BRIEF [1], first presented in 2011 as a fast and efficient alternative to SIFT and SURF. It’s also available in OpenCV, and isn’t patented. It’s basically a blend of the FAST keypoint detector and the BRIEF descriptor, with performance enhancements. I used it to match the Prokudin-Gorskii R-G-B images, and it works very well. I’ll write a post on it next week.

Other feature detectors in OpenCV (3.X) include:

  • BRIEF (Binary Robust Independent Elementary Features) [2] – the first binary descriptor published (a descriptor only, no detector).
  • BRISK (Binary Robust Invariant Scalable Keypoints) [3]
  • FREAK (Fast Retina Keypoint) [4] – found in OpenCV’s xfeatures2d as an experimental algorithm.
  • KAZE (a Japanese word that means wind) [5] – also has an accelerated variant, AKAZE.

[1] Rublee, E., Rabaud, V., Konolige, K., Bradski, G., “ORB: an efficient alternative to SIFT or SURF”, in IEEE Int. Conf. on Computer Vision, pp. 2564-2571 (2011).
[2] Calonder, M., Lepetit, V., Strecha, C., Fua, P., “BRIEF: Binary robust independent elementary features”, in European Conference on Computer Vision, Springer, pp. 778-792 (2010).
[3] Leutenegger, S., Chli, M., Siegwart, R.Y., “BRISK: Binary robust invariant scalable keypoints”, in IEEE Int. Conf. on Computer Vision (2011).
[4] Alahi, A., Ortiz, R., Vandergheynst, P., “FREAK: Fast retina keypoint”, in IEEE Conf. on Computer Vision and Pattern Recognition (2012).
[5] Alcantarilla, P.F., Bartoli, A., Davison, A.J., “KAZE features”, in European Conference on Computer Vision (2012).

Software is like Lego bricks… except it isn’t

Some people liken the building blocks of software to Lego bricks, and I can see why the analogy is made. There are different sized bricks in Lego which do different things. Similarly, in software there are modules, libraries, packages, subroutines etc. which contribute different functionality to programs. Theoretically one could reuse a function like sqrt() ad infinitum, just like the standard Lego block; it won’t wear out. I can also use the functions of OpenCV to build applications which process images, so in a sense I am constructing software. But Lego bricks have not changed since the modern plastic brick appeared in the late 1950s. Software has changed, and continues to evolve, sometimes at a rapid pace. Building blocks such as OpenCV rely on an underlying infrastructure, including a programming language (Python? C++?), and on evolving *versions* of the building block itself. Because Lego bricks have not changed since their inception, they are backwards compatible. Sure, new bricks have been added, but the method of connecting them hasn’t.

However, when a package like OpenCV changes (for instance to deal with the fact that algorithms like SIFT and SURF have patents), software breaks because it is no longer backwards compatible. Then the system has to be changed to compensate. Lego doesn’t need to change. The other difference is that Lego is a constructive entity: it can be used to build anything, from a spaceship to a real house. Software can’t.

Python and its “dependency” issues

I don’t think image processing in Python is great, *unless* you use OpenCV – although OpenCV also makes life better in C/C++. Great package. Lots of books and online information – it is almost overwhelming. The only problem is that configuring it to work with Python is nasty (maybe many things are like this?). Dependencies are what kill image processing in Python. I’m not just talking about libraries you can easily include… it is also the issue of correct versions, and complex install scripts (I like pip and homebrew, but if something goes wrong, the average person can get horribly lost). Python’s issue, of course, is partially the fact that it exists in two forms: 2.7.X and 3.X. Here’s a brief article on the key differences. A better idea would have been to name Python 3.X something else – viper? adder? constrictor?

To compound things, OSX also has the built-in Python in /usr/bin/python, and a choice of others, including the homebrew version in /usr/local/bin/python. The version on my laptop is Python 2.7.10 (default, Oct 23 2015). Seems old, right? So I updated to homebrew Python… and got 2.7.13.

The problem is you have to first choose a version of Python, then a version of OpenCV. Do I continue with OpenCV 2.4.13, or move to OpenCV 3.2? Even more complicated is the fact that the feature detection algorithms SIFT and SURF are no longer included in OpenCV 3 by default; they require adding an extra module. How much time do I need to spend making sure the dependencies are all up to scratch? How long before a book goes out of date? (Quite quickly, if libraries are moved around.) The other issue is that existing code imports OpenCV using “import cv2”, which is cryptic, because it seems like it’s importing OpenCV 2.X – it actually imports whatever the core version is. To find out what you are playing with requires “cv2.__version__“. Then you find out it’s ‘’ and that’s the reason some functions don’t work.

So I upgraded OpenCV to V3.2, and even for someone versed in the command line, it wasn’t entirely trivial. The install itself via homebrew on OSX was easy:

brew install opencv3

But after that it requires some changes to the PATH in .bash_profile, and a modification of the link in /usr/local/lib/python2.7/site-packages to point to the new version. This means doing this:

ln -s /usr/local/opt/opencv3/lib/python2.7/site-packages/ /usr/local/lib/python2.7/site-packages/

Which means the link now looks like this: -> /usr/local/opt/opencv3/lib/python2.7/site-packages/

And now… it all *seems* to work, although some of the functions from OpenCV 2.X are different in OpenCV 3.X. Oh, the drama of it all! I found a bunch of replacements for SIFT, one of which is ORB (Oriented FAST and Rotated BRIEF), and it seems to work okay. The only thing I don’t like is that the OpenCV documentation could use more examples, and maybe a better explanation of some function parameters.

Look, I do like coding in Python. No issues there. But what I will say is that to get any of this to work properly, and to have an understanding of how it works, you have to know how to use the command line. You also have to have a good handle on searching Google, because information on how some things work is not easy to find. However, if a Julia-native OpenCV were to appear (or a better interface), I would drop Python like a hot potato.


Here’s a guide to installing OpenCV and Python 2.7 on OSX. Adrian Rosebrock has a load of excellent tutorials on pyimagesearch. Here’s another guide.


A stylistic viewpoint on compressed code

I talked about this in a previous post (Did old code have weird style?). Compressed style really appeared with Algol 60, and disappeared somewhere in the 1980s. Partially, I think, it had to do with compressing code onto fewer lines to reduce the number of punch cards needed. Algol syntax was also heavily used in journal articles, where compressed code saved space that a sprawling listing would otherwise have consumed.

Here’s an example from Algol 60:

procedure maxmin(a,m,n,large,small); value m,n;
    real array a; integer m,n; real large, small;
    begin real big, little; integer i, j;
        big := little := a[1,1];
        for j := 1 step 1 until n do
        for i := 1 step 1 until m do
        if a[i,j] > big then big := a[i,j] else
        if a[i,j] < little then little := a[i,j];
        large := big; small := little
    end

The programmer of course had to know their syntax… there is very little in the way of indenting. This is largely because Algol was the first language to really embrace “free” style; Fortran and Cobol were still somewhat fixated on fixed format.

If we were to write this now (in, say, C), we might wind-bag it out:

void maxmin(int m, int n, double a[m][n], double *large, double *small)
{
    double big, little;
    int i, j;
    big = a[0][0];
    little = a[0][0];
    for (i = 0; i < m; i = i + 1)
        for (j = 0; j < n; j = j + 1)
            if (a[i][j] > big)
                big = a[i][j];
            else if (a[i][j] < little)
                little = a[i][j];
    *large = big;
    *small = little;
}

What about if we wrote the code in “compressed” Julia?

function maxmin(a,m,n)
    little = big = a[1,1]
    for i=1:m, j=1:n
        if (a[i,j] > big) big = a[i,j]
        elseif (a[i,j] < little) little = a[i,j] end
    end
    return little, big
end
Would it be terrible? I have to believe that, from a stylistic viewpoint – and even from a readability viewpoint – some use of compressed code is optimal.
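Taking the compression one step further: in a language with built-in reductions the whole routine shrinks to a single line. A Python sketch, just as another data point:

```python
def maxmin(a):
    # smallest and largest element of a 2-D list, via per-row reductions
    return min(map(min, a)), max(map(max, a))
```

Whether that is still readable is, of course, the whole question.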

Building a contrast enhancement filter in Julia (ii)

This is what the filter we are building in Julia currently looks like:

function localStatENH(img, k, n)

    dx, dy = size(img)
    imgN = copy(img)
    w = div(n,2)

Next to be added is the pixel processing, built using a nested loop. If the image to be filtered has dx=100, dy=100, and w=2 (n=5), then i=3:98 and j=3:98. (This allows for a 2-pixel wide border to deal with edge effects, as discussed in the previous post.)

for i=w+1:dx-w, j=w+1:dy-w


Within the loop, the neighbourhood surrounding a pixel is extracted – we call this a block. 

block = img[i-w:i+w, j-w:j+w]

This effectively extracts a slice of the 2D array. The pixel at location [3,3] therefore extracts the region img[3-2:3+2, 3-2:3+2], which is effectively img[1:5,1:5] – the block bounded by rows 1 to 5 and columns 1 to 5, containing 25 pixels.
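The same slicing idea can be mimicked in Python – a sketch with plain nested lists rather than a real image array (note that Python indexes from 0, so the ranges shift by one relative to the Julia code):

```python
def extract_block(img, i, j, w):
    # img is a list of rows; returns the (2w+1) x (2w+1) neighbourhood
    # centred on img[i][j] (0-indexed; the caller must stay at least
    # w pixels away from the border)
    return [row[j - w : j + w + 1] for row in img[i - w : i + w + 1]]
```

For a 5×5 image, extract_block(img, 2, 2, 2) returns the whole image, just as img[1:5,1:5] does in the 1-indexed Julia version.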

The one statistic needed from this block is the mean, which can be calculated using the Julia function mean(). (In Julia 0.7 and later, mean() lives in the Statistics standard library, so the function needs a `using Statistics` beforehand.)

statB = mean(block)

Now the equation to perform the enhancement and calculate the new pixel can be formulated:

newP = statB + k * (img[i,j] - statB)

Here is what the function looks like now. All that needs to happen now is some tidying up.

function localStatENH(img, k, n)

    dx, dy = size(img)
    imgN = copy(img)
    w = div(n,2)

    for i=w+1:dx-w, j=w+1:dy-w
        block = img[i-w:i+w, j-w:j+w]
        statB = mean(block)
        newP = statB + k * (img[i,j] - statB)

First, any values of newP below 0 or above 255 must be clipped, and finally the value of newP is assigned to the appropriate pixel in the enhanced image imgN (after being truncated and converted to a UInt8, i.e. a value between 0 and 255).
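The clamp-and-truncate step is language-independent; here is a quick Python sketch of the same logic:

```python
def clamp_to_byte(p):
    # clip to [0, 255], then truncate the fraction -- the same effect
    # as trunc(UInt8, newP) in Julia
    p = max(0.0, min(255.0, p))
    return int(p)  # int() truncates toward zero for non-negative values
```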

function localStatENH(img, k, n)

    dx, dy = size(img)
    imgN = copy(img)
    w = div(n,2)

    for i=w+1:dx-w, j=w+1:dy-w
        block = img[i-w:i+w, j-w:j+w]
        statB = mean(block)
        newP = statB + k * (img[i,j] - statB)
        if (newP < 0)
            newP = 0
        elseif (newP > 255)
            newP = 255
        end
        imgN[i,j] = trunc(UInt8,newP)
    end
    return imgN
end

Voilà! The code has been written. But how does it work? We’ll check that out in the next post.


Why reading code is important

One of the biggest problems with programmers today is that few have mastered the art of reading code. Read code? *Why* would you want to do that?

The reality is that the ability to read code is just as important as the skill of writing code. The main reason is to gain a better understanding of how a language works, and to improve writing skills. In this way it is no different to reading and writing a human language. The more you read, the better writer you become. It is the subliminal intake of word-craft, and writing style that makes this happen. It is no different with programming.

Reading a program helps improve quality and style – and hence readability. It also helps absorb information about how things work. How many programmers look towards a piece of example code when trying to determine how things work? A description of a function findBlob() is fine, but seeing how it works is better for understanding how it can be used. Sometimes, as in the case of existing programs, you have to read the code to comprehend what it does – there is no guarantee that it is well documented. The ability to port a piece of code, or make improvements to a program, relies on good comprehension skills – the ability to look beyond what is printed on the page and understand what the code is doing. Understanding a program is also a major linchpin of effective software maintenance, and of system evolution.

Take for example, a piece of code written in Fortran:

There are of course many ways of reading code. Viewing it on screen works, but becomes a problem if the code is inherently long – large programs are hard to read. An alternative, likely not very 21st-century, is to print out the code. This allows it to be annotated in a manner that makes it easier to understand, and marked up with colours. It lets you explore far more than looking at the code on a screen does.

Now look at the same code from above printed out and annotated.

This allows a flowchart to be created. A visual representation often helps in analyzing code.

Now this can be used to re-engineer the code by migrating it to a newer version of Fortran, or translating it to C, Julia??

Reading code will help you become a better programmer.

* Code and flowchart from “Introduction to Digital Computing and Fortran IV with MTS Applications”, (1971).

Building a contrast enhancement filter in Julia (i)

How hard is it to implement an image processing filter in Julia? Not hard at all, and you don’t even need much programming experience. The simplest filters are often the most meaningful, and the easiest to implement. Numerous filters make use of statistics from the image – like the mean and variance of intensity values – to facilitate enhancement. Each pixel is surrounded by a neighbourhood of pixels which influence its interpretation: a pixel whose neighbours all have similar values lies in a uniform region, whereas a pixel whose neighbours are vastly different may lie on an edge. Figure 1 shows a 3×3 neighbourhood region centred on the pixel [2,2], which has the value 4.

Fig.1: A 3×3 neighbourhood

Some algorithms use both the local mean and variance to enhance a pixel. The neighbourhood in Fig.1 has a mean of ≈ 10, and a variance of 36.4. Using the variance directly tends to result in subtle details being enhanced at the expense of the features of interest. Consider instead the following equation, which maintains the local mean yet modifies the variance:

E[i,j] = m[i,j] + k × (I[i,j] − m[i,j])

where E is the enhanced pixel, I is the original pixel, m is the mean of the n×n neighbourhood surrounding the pixel, and k is the gain (the ratio of the new local standard deviation to the original). When k > 1 the image is sharpened, and when 0 ≤ k < 1 it is smoothed; if k = 0, the filter is equivalent to mean filtering. (Note: i and j represent the row and column indices.)
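Using the Fig.1 numbers (pixel value 4, neighbourhood mean ≈ 10), the effect of k can be checked by hand – a small Python sketch of the equation:

```python
def enhance(pixel, mean, k):
    # E = m + k * (I - m): preserves the local mean, and scales the
    # deviation of the pixel from that mean by the gain k
    return mean + k * (pixel - mean)

print(enhance(4, 10, 2.0))   # k > 1: deviation doubled (sharpening) -> -2.0
print(enhance(4, 10, 0.5))   # 0 <= k < 1: deviation halved (smoothing) -> 7.0
print(enhance(4, 10, 0.0))   # k = 0: pure mean filtering -> 10.0
```

Note the k = 2 case drives the value below zero – which is exactly why the filter will later need to clip its output.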

So how do we build a Julia function to perform this operation? Let’s start with the basic structure of the function, named localStatENH():

function localStatENH(img, k, n)


There are three input parameters, or values we want to give the function:

img - the grayscale image to enhance
k   - the gain
n   - the size of the local neighbourhood (odd)

Now there are some basic things to add to the function, like finding the dimensions of the image, and calculating the radius of the neighbourhood. The first line below uses the function size(), which returns the number of rows and columns in the image img and stores them in dx and dy respectively. The second line makes a copy of img and stores it in imgN. The last line divides n by 2, returning a whole number – so if n=5, w=2, which represents the “radius” of the neighbourhood.

dx, dy = size(img)
imgN = copy(img)
w = div(n,2)

Now the function needs to process each pixel in the image.

NOTE: The trick with this is dealing with the edges of the image. As can be seen in Fig.1, the neighbourhood lies fully within the image when centred on pixel [2,2]; if the 3×3 neighbourhood were centred on [1,1], it would extend outside the image. This problem is solved in one of two ways: (i) pad the image, or (ii) offset the start of the neighbourhood processing. This example chooses the latter, so that for a 3×3 neighbourhood, processing begins at pixel [2,2]. The dashed line in Fig.2 shows the inner boundary of pixels to be processed.

Fig 2: Dealing with edges


In the software world..

A. All that glitters is not gold.
B. Never count your breakthrough chickens before they’re hatched.
C. Tools cannot substitute for talent.
D. Breakthroughs cannot be scheduled.
E. Quality control is often only skin deep.

Robert L. Glass, Boeing Aerospace Company

Image binarization (9) : Why image processing involves tinkering

Sometimes images seem like good candidates for thresholding because of the content they contain. But not all algorithms work all the time. Consider the following image of a stamp, and its grayscale equivalent. It has a lot of extremely fine detail, and might be a candidate for localized thresholding.

The original colour image and its grayscale representation

Now consider the histogram of the grayscale image. There are three peaks: one near the lower intensities, representing the background of the stamp, and an overlapped bimodal pair of peaks at the opposite end, representing the stamp itself.

Original histogram

Below are the results of four thresholding algorithms applied to the image.

Thresholding: Otsu (top-left); Percentile (top-right); Sauvola (bottom-left), and Bernsen (bottom-right)

Here we can see the failure of both global Otsu and local Sauvola. Otsu fails because of the trimodal nature of the histogram – the threshold value calculated is 108. The unlikely algorithms that succeed are Bernsen (local) and the percentile algorithm (global). The latter uses a fraction of foreground pixels set to 50%, making the threshold 176.
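The percentile method can be sketched from its description alone: walk the histogram until the chosen fraction of the pixels falls below the candidate threshold. An illustrative Python version (a sketch of the idea, not the exact implementation used for the figures above):

```python
def percentile_threshold(hist, fraction=0.5):
    # hist[i] = number of pixels with intensity i
    # returns the smallest intensity t such that the cumulative pixel
    # count up to t reaches `fraction` of all pixels
    total = sum(hist)
    cum = 0
    for t, h in enumerate(hist):
        cum += h
        if cum >= fraction * total:
            return t
    return len(hist) - 1
```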

Now if the contrast in the grayscale image is stretched to remove the background, and the intensities redistributed, a bimodal histogram is created.

Contrast stretched histogram

If this contrast-stretched image is then thresholded using Otsu, a better binary image appears, with a threshold value of 126.
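Otsu’s method itself is simple to state: choose the threshold that maximizes the between-class variance of the histogram. A Python sketch of the usual histogram-based formulation (OpenCV’s implementation differs in details):

```python
def otsu_threshold(hist):
    # hist[i] = number of pixels with intensity i
    # returns the threshold t maximizing between-class variance
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    wb = 0       # background weight (pixel count so far)
    sumb = 0     # background intensity sum so far
    best_t, best_var = 0, -1.0
    for t, h in enumerate(hist):
        wb += h
        if wb == 0:
            continue
        wf = total - wb
        if wf == 0:
            break
        sumb += t * h
        mb = sumb / wb               # background mean
        mf = (sum_all - sumb) / wf   # foreground mean
        var = wb * wf * (mb - mf) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```

On a cleanly bimodal histogram this lands between the two peaks; on a trimodal one, as the stamp example shows, the “best” split can fall in an unhelpful place.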

Contrast stretched image, and resulting binary image (Otsu)

A crazy command line

Here is a piece of code from unix.stackexchange which counts the number of occurrences of words in a text file.

sed -e 's/[^[:alpha:]]/ /g' vader.txt | tr '\n' " " |  tr -s " " | tr " " '\n'| tr 'A-Z' 'a-z' | sort | uniq -c | sort -nr | nl 

It basically does the following:

  1. Substitutes all non-alphabetic characters with a space.
  2. Converts all line breaks to spaces.
  3. Squeezes runs of spaces down to a single space.
  4. Converts all spaces to line breaks, so each word sits on its own line.
  5. Translates everything to lower case, so ‘Hello’ and ‘hello’ are not counted as different words.
  6. Sorts the words.
  7. Counts and collapses duplicate lines (uniq -c).
  8. Sorts in reverse numeric order, so the most frequent words come first.
  9. Adds a line number to each word, giving its rank in the whole.

Now what it really does is show the power of Unix command line utilities.
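For comparison, the same word count can be sketched in Python using only the standard library (with ties falling in reverse alphabetical order, mimicking the behaviour of sort -nr in the pipeline):

```python
import re
from collections import Counter

def word_frequencies(text):
    # split on anything non-alphabetic, lowercase, count, then sort by
    # frequency (descending); ties fall in reverse alphabetical order
    words = re.findall(r"[a-z]+", text.lower())
    return sorted(Counter(words).items(),
                  key=lambda kv: (kv[1], kv[0]), reverse=True)
```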

Here’s the input file vader.txt:

I’ve been waiting for you Obi-Wan. We meet again, at last.
The circle is now complete; when I left you, I was but the
learner, now I am the master. Only a master of evil, Darth.

and the output:

1 4 i
2 3 the
3 2 you
4 2 now
5 2 master
6 1 when
7 1 we
8 1 was
9 1 wan
10 1 waiting
11 1 ve
12 1 only
13 1 of
14 1 obi
15 1 meet
16 1 left
17 1 learner
18 1 last
19 1 is
20 1 for
21 1 evil
22 1 darth
23 1 complete
24 1 circle
25 1 but
26 1 been
27 1 at
28 1 am
29 1 again
30 1 a