In Python loops can be lethal

I like Python, it’s a great language. What don’t I like? It’s slow. The thing is, you can’t just convert code from C to Python and hope it will run at the same speed. It won’t. This message came across loud and clear when I started doing image processing using Python. Algorithms that ran efficiently in C and Matlab became bogged down in Python. Maybe the core reason is that Python code executes slower because it is interpreted, rather than being compiled into native machine code. Built-in operations that improve efficiency in Python are typically pre-optimized – that’s why it’s good to use them. One of the biggest problems it seems is loops.

Write a small program using loops and this isn’t at all obvious. Write code to process a 3000 x 4000 pixel image, and you’ll see why. Consider the following Python code which converts an image from RGB to another colour space YIQ. Note that img is a numpy array, but the calculations are done using primitive operations.

for i in range(0,img.shape[0]):
    for j in range(0,img.shape[1]):
        imgY[i,j,0] = 0.299 * img[i,j,0] + 0.587 * img[i,j,1] + 0.114 * img[i,j,2]
        imgI[i,j,1] = 0.596 * img[i,j,0] - 0.275 * img[i,j,1] - 0.321 * img[i,j,2]
        imgQ[i,j,2] = 0.212 * img[i,j,0] - 0.523 * img[i,j,1] + 0.311 * img[i,j,2]

This code runs through each pixel individually, converting it from RGB (img), to the individual components of YIQ (imgY, imvI, imgQ). And it is slow. How slow? For a RGB image that is 3000 x 4000 pixels in size – 1004 seconds.

Now consider a better way of writing this piece of code, written using numpy operations.  Numpy is based on the “ndarray”, for n-dimensional array, data structure. These arrays are strided views on memory – where strides are  the number of bytes to skip in memory to proceed to the next element. Numpy also uses vectorization, where an operation is applied to all elements in the array. Vectorized operations in NumPy are implemented in C.

r, g, b = img[:,:,0], img[:,:,1], img[:,:,2]
imgY = 0.299 * r + 0.587 * g + 0.114 * b
imgI = 0.596 * r - 0.275 * g - 0.321 * b
imgQ = 0.212 * r - 0.523 * g + 0.311 * b

This piece of code works by performing the operations as a complete block, avoiding the use of loops altogether. The operations are automatically mapped to each element. How long does this take? 1 second.

Which goes to show that understanding how a programming language works goes a long way to writing efficient programs. Another image filtering example in Python will follow soon.

NB: An excellent paper on the Numpy data structure can be found here.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.