Read your own punch cards

Ever wonder how punch cards worked? Before programs could simply be edited on a machine, programmers had to deal with punch cards. It doesn't seem like a big deal to us now, but what was involved in using them? Every line of code in a program had its own card, so if your program had 1000 LOC, you had 1000 cards. Don't trip now…

punchcard

Most punch cards had the statement they represented printed along the top of the card. The example above shows the statement Z(1) = Y + W(1). This Fortran card has 80 columns and clearly shows the fixed-format layout, in which the first six columns are reserved (statement label and continuation fields) and the statement proper begins in column 7. Interpreting the card (let's assume we don't know what's printed on it) requires EBCD, a subset of EBCDIC (Extended Binary Coded Decimal Interchange Code).

Notice that there are nine rows numbered 1 to 9, a row 0, and two unmarked rows above that: the top one is usually called Y, and the one below it X. Each character of the statement is encoded using zero to three punches in a column, with no punches representing a space. For example, Z is encoded as punches in rows 0 and 9; using the table below, one can see that the intersection of 0 and 9 is Z. The digit 1 has a single punch in row 1. The coding for + is Y-6-8.

ebcdpunch
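
To make the encoding concrete, here is a minimal sketch (in Julia) of the letter and digit portion of the scheme. The zone/digit rules follow the table above; multi-punch special characters such as + (Y-6-8) are left out, and the function name is my own.

# Decode one card column from its zone punch ('Y', 'X', '0', or ' ' for none)
# and its numeric punch (1-9, or 0 for none). Letters and digits only;
# multi-punch specials such as + (Y-6-8) are omitted from this sketch.
function decode_punch(zone::Char, digit::Int)
    if zone == ' '
        return digit == 0 ? ' ' : '0' + digit       # no punches = space, lone punch = digit
    elseif zone == 'Y'
        return 'A' + (digit - 1)                    # Y plus 1-9 gives A-I
    elseif zone == 'X'
        return 'J' + (digit - 1)                    # X plus 1-9 gives J-R
    elseif zone == '0'
        return digit == 0 ? '0' : 'S' + (digit - 2) # 0 plus 2-9 gives S-Z
    end
    return '?'
end

decode_punch('0', 9)   # punches in rows 0 and 9: 'Z'
decode_punch(' ', 1)   # a single punch in row 1: '1'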

Of course there were many different types of cards as well. This encoding allows for 64 characters to be used in a program, which from today's perspective isn't a lot. Notice what's missing? Lowercase characters. This is one of the reasons early programs were written in uppercase, not lowercase. It would of course have been possible to enlarge the card to accommodate lowercase characters or an extended character set, but the extra punches might have compromised the structural integrity of the card, perhaps requiring thicker card stock.

Check out this video on how to make punch cards:

We don’t know how lucky we are.

Python’s Achilles heel

I have mentioned Python's inefficiency at image processing before, but there is a bigger issue, and I am reminded of it every time I leave Python code alone for a few months. If the code has dependencies, then when I come back to it, one of those dependencies always seems to have broken. And herein lies Python's biggest Achilles' heel: its dependence on libraries. Look, there is nothing wrong with external libraries, but a language should not have so many issues with them. It starts with Python 2 vs. 3… maybe Python 3 should have been called something else. Boa? Anaconda? Something else. There is a reason C++ wasn't named C 2 (although a better name than C++ would have been great).

So this morning I tried to run some code I probably wrote a year ago. First issue? OpenCV wasn't installed on this machine (obviously I had run the code on my old machine).

ImportError: No module named cv2

Okay, so the problems are compounded by the fact that I can choose between OpenCV 2 and OpenCV 3… but to install OpenCV I first had to update brew, then install OpenCV (version 2):

brew update
brew install opencv

Okay, fine. Then it told me I had to make more changes if I wanted to actually import cv2:

mkdir -p /Users/username/Library/Python/2.7/lib/python/site-packages
echo 'import site; site.addsitedir("/usr/local/lib/python2.7/site-packages")' >> /Users/username/Library/Python/2.7/lib/python/site-packages/homebrew.pth

Then I tried to run my code, and here’s what I got:

RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9
ImportError: numpy.core.multiarray failed to import

Like, are you kidding me? So okay, I thought maybe I needed to update numpy? Nope. Apparently there were two versions of numpy on the system (something to do with pip vs. brew). So thanks to Stack Overflow, I figured out I had to run:

sudo easy_install numpy

Now it runs. But I have to tell you, what a hassle! Ultimately there should be an easier way of dealing with these dependencies. That's probably why writing code in C or Julia is nice: you can import libraries, but it is easy to do, and most things that are critically needed are included in the language. Life would be better if numpy were just integrated into the language. At the end of the day, I don't want to deal with these issues, and it affects how portable code is. If I post code in Julia, all anyone has to do is have Julia installed and run the code. With Python, they have to make sure all the dependencies are also installed.

That’s one of the reasons Python is challenging for novice programmers to use.

End-of-rant.

P.S. To add insult to injury, I am now running some code that was working fine before… it still works, but now one of the other libraries is complaining…

/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/scipy/signal/signaltools.py:194: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
 if rank(in1) == rank(in2) == 0: # scalar inputs

Re-engineering loops for efficiency (ii)

There are a couple more ways of improving loop efficiency:

Loop unrolling

Every iteration of a loop requires modifying and testing the loop variable. This overhead can be reduced by unrolling the loop: replicating the loop body so that fewer iterations are needed. If the number of iterations is small, say less than five, the loop can even be eliminated entirely. The previous example, from loop fusion, can be partially unrolled, further reducing overhead:

int i, a[1000], b[1000];
for (i=0; i<1000; i=i+2)
{
    a[i] = 0;
    b[i] = 0;
    a[i+1] = 0;
    b[i+1] = 0; 
}

The most extreme case is to eliminate the loop completely and replace it with straight-line code. For example:

for (i=1; i<=3; i=i+1) {
    x = x + sqrt(i);
}

would arguably be better expressed as:

x = x + sqrt(1) + sqrt(2) + sqrt(3);

Removing loop-independent expressions

Expressions in loops that perform a calculation independent of the loop should be evaluated outside the loop. This is sometimes known as loop streamlining:

for (i=1; i<=1000; i=i+1)
    sum = sum + pow(x,4.0);

should be replaced by

powX = pow(x,4.0);
for (i=1; i<=1000; i=i+1)
    sum = sum + powX;

We remove the expression pow(x,4.0) because its calculation is independent of the loop and would otherwise be calculated 1000 times. Consider the following nested loop:

for (i=1; i<=100; i=i+1)
    for (j=1; j<=100; j=j+1)
        for (k=1; k<=100; k=k+1)
            sum = sum + pow(x,4.0);

The nested structure loops 100 × 100 × 100 = 1,000,000 times, meaning that a statement such as pow(x,4.0) placed in the innermost loop would be executed one million times. The deeper a loop is nested, the greater the payoff from optimizing it, so always optimize inner loops first. The same principle applies within nested loops: move independent expressions from an inner loop outward. For example, some repeated calculations use a subscript from an outer loop as part of the calculation:

for (i=1; i<=100; i=i+1)
    for (j=1; j<=100; j=j+1)
        A[i][j] = B[i][j] + c/i;

The calculation c/i does not depend on j, so it is recomputed needlessly in the inner loop. A better way to code this is

for (i=1; i<=100; i=i+1){
    ci = c / i;
    for (j=1; j<=100; j=j+1)
        A[i][j] = B[i][j] + ci;
}

Now the value c/i is calculated 100 times instead of 10,000 times.

 

Quote from “The Mind of Mechanical Man”

The early development of "electronic brains" prompted much debate on AI. In June 1949, Sir Geoffrey Jefferson (professor of neurosurgery at the University of Manchester) made the following statement in a speech entitled "The Mind of Mechanical Man":

“Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain – that is, not only write it but know that it had written it. No machine could feel pleasure at its success, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or miserable when it cannot get what it wants.”

Re-engineering loops for efficiency (i)

When it comes to program efficiency, people tend to ignore it most of the time. I mean, computers get faster all the time, so inefficiencies in a program will eventually dissipate, right? One of the more interesting areas for efficiency is loops. Why? Because loops, more often than not, are used to process large pieces of data, like images. Sometimes it is the language being used that is inherently inefficient when it comes to loops. Prime example? Python.

Loop Fusion

The best way of cutting down the amount of time spent in loops is to decrease the number of loops, hence decreasing overhead associated with incrementing indexes and testing. This is termed loop fusion, jamming, or merging. For example, consider the following two loops, each of which initializes an array:

int a[1000], b[1000], i;
for (i=0; i<1000; i=i+1)    
    a[i] = 0;
for (i=0; i<1000; i=i+1)
    b[i] = 0;

These loops involve 2000 index increments and tests. This can be cut in half by merging the loops:

int a[1000], b[1000], i;

for (i=0; i<1000; i=i+1){
    a[i] = 0;
    b[i] = 0;
}

Loop Peeling

Loop peeling (or loop splitting) attempts to simplify a loop or eliminate dependencies by breaking a loop into multiple loops that have the same bodies but iterate over different contiguous portions of the index range. A useful special case is simplifying a loop with a problematic first (or first few) iteration by performing that iteration separately before entering the loop. Here is an example of loop peeling. Suppose the original code looks like this:

int vector[100], r[100], i;
for (i=0; i<100; i=i+1){
    r[i] = vector[i-1] * vector[i];
}

The problem here is that the loop cannot safely handle vector[i-1] when i = 0: the index becomes -1, which is out of bounds. A better solution is to peel off the first iteration:

int vector[100], r[100], i;
r[0] = vector[0];
for (i=1; i<100; i=i+1){
    r[i] = vector[i-1] * vector[i];
}

An artistic filter (in Julia)

Here is a cool filter which applies a "jitter" to an image. The filter basically replaces each pixel with one selected at random from a small neighborhood around it in the original image (up to about ±5 pixels in each direction with the default setting).

copenhagen
Before the filter is applied
copenhagenj
After the filter is applied

Here’s the Julia code:

function jitter(img, amnt=10)
    # img is assumed to be a 3-D array: rows × cols × colour channels
    dx,dy = size(img)
    imgN = copy(img)

    for x=1:dx, y=1:dy
        # pick a random nearby pixel, up to ±amnt/2 away in each direction
        nx = round(Int, x + (rand()-0.5) * amnt)
        ny = round(Int, y + (rand()-0.5) * amnt)
        # clamp the coordinates to the image boundaries
        if (nx < 1)
            nx = 1
        elseif (nx > dx)
            nx = dx
        end
        if (ny < 1)
            ny = 1
        elseif (ny > dy)
            ny = dy
        end
        # copy the randomly chosen pixel into the output image
        imgN[x,y,:] = img[nx,ny,:]
    end

    return imgN
end
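
For a quick test, the filter can be run on a random array standing in for a 3-channel image (how you load a real image depends on which image library you use):

img = rand(100, 100, 3)   # a random 100×100 "image" with 3 colour channels
imgJ = jitter(img, 10)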

Simpler Japanese toilet user interfaces…

It appears that due to confusion among visitors (and likely some of the populace as well), the Japanese toilet industry has decided to standardize the symbols used on its technologically savvy toilets.

Complexity made more challenging by a lack of standardized symbols… until now.

Here are the standardized pictograms:

Want to learn more? Apparently Toto, one of Japan's biggest toilet makers, has just opened a $60M toilet museum in Japan.

How to process problematic images?

Some photographs are extremely challenging to fix by means of post-processing. The worst ones may be those that are overexposed. Such a photograph will have a histogram skewed towards the higher intensities. Worse still are those where only part of the image is overexposed while the rest looks reasonable. As an example, consider the image below, taken in the carpentry shop at Fort George (NY). The sunlight streaming in from the window has caused quite an overexposure on the right side of the image.

IPexampleUNDERexp
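
As a rough illustration of that histogram skew, here is a minimal Julia sketch that flags a likely overexposed grayscale image by the fraction of pixels sitting near saturation. It assumes intensities scaled to [0,1], and the 0.98 and 10% thresholds are arbitrary choices for illustration, not established rules.

# Flag likely overexposure: is more than `frac` of the image near white?
function looks_overexposed(img, thresh=0.98, frac=0.1)
    nbright = count(p -> p >= thresh, img)   # pixels in the near-saturated tail
    return nbright / length(img) > frac
end

looks_overexposed(rand(100, 100))   # a random image: almost surely false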

It is *extremely* challenging to fix an image like this. It might be possible given multiple exposures, or by somehow creating a "map" of the overexposed region, but that is a stretch.

Let’s face it, some images cannot be saved.