What is code complexity?

Code complexity is something the software community has struggled to define. In its simplest form it is often expressed in terms of LOC, or lines of code: the more LOC, the more complex a piece of software is, or so the theory goes. The reality is quite different, of course. There is no definitive way of defining the complexity of a piece of code. A 1000-line program could be made up of simple statements, or it could include a plethora of logical decisions; both are the same length, but the latter has an inherently more complex algorithmic/logical structure. A program with a single if-else statement has two possible paths. A similar program with 10 if-else constructs has 1024 (2^10) distinct paths, making it a more complex program.

Complexity could relate to a program's complexity in terms of computational complexity or psychological complexity. A "rough" measure of computational complexity, logical complexity is a measure of the degree of decision-making logic within a program, i.e. the number of "binary" decisions. If a program had 1000 statements, and 300 of those were "if" statements, then its logical complexity is likely higher than if it only had 20. Psychological complexity, on the other hand, refers to those characteristics of a program which make it difficult for a human to comprehend [1].
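As a rough illustration, here is a minimal sketch of my own (not a formal metric) that estimates the logical complexity of some Julia source code by counting its binary decision points; the function name and the file name "floodfill.jl" are placeholders.

# A crude estimate of logical complexity: count the binary decision points in a piece
# of source code. The keyword list is deliberately minimal (ternary expressions and
# loop conditions are ignored); this is an illustration, not a formal metric.
function logical_complexity(src::String)
    kws = length(collect(eachmatch(r"\b(if|elseif|while)\b", src)))  # decision keywords
    ops = length(collect(eachmatch(r"&&|\|\|", src)))                # short-circuit operators
    return kws + ops
end

logical_complexity(read("floodfill.jl", String))   # "floodfill.jl" is a placeholder name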

Here are some factors which affect the complexity of programs:

Program Form

  1. Presence or absence of well-placed, meaningful comments.
  2. Placement of declarations.
  3. Paragraphing.
  4. Variable names – mnemonic and length.
  5. Use of same variable names in an inner scope.
  6. Use of “magic” numbers.

Control flow

  1. Complexity of control flow.
  2. Choice of control construct.
  3. Length of program segments (e.g. loops)
  4. Passing functions as parameters.
  5. Recursion.
  6. Levels of nesting.

Data flow

  1. Scope of variables.
  2. Clustering of data references.
  3. Declaration and use of complex data structures.
  4. Use and manipulation of pointers.

[1] Chaudhary, B.D., "A study in dimensions of psychological complexity of programs", Int. J. Man-Machine Studies, 23, 113-133 (1985).

Machine code in BASIC

In the 1980s a good portion of home programming was done in BASIC. A lot of this programming consisted of data which was essentially machine code. Companies would publish books with "games" in them that you had to code yourself, and to be honest the games were crap. Here is a good example, a piece of code for a game called "Space Blaster", written for the Commodore 64. This, like many pieces of machine code embedded in BASIC programs, is written in hexadecimal.

[Image: BASIC listing with machine-code DATA statements for "Space Blaster"]

BASIC used two keywords, PEEK and POKE, to read a byte stored at a memory address and to modify it, respectively. This was just a horrible way to code.

Learning to code 1980s style

When personal computers first came out in the 1980s, there were no free development tools except BASIC; most systems came with a BASIC interpreter of some sort. Then came the proliferation of "games" written in BASIC, published in books that the reader had to type in by hand.

One book I had when I was in high school was Computer Spacegames (1982), which seemed really cool looking at the cover, but was actually really crap. But then again, my computer was an Apple IIe, and I had no real clue about programming; I just copied the code into a file and ran it. Here's one program, Moonlander.

10 CLS
20 PRINT "MOONLANDER"
30 LET T=0
40 LET H=500
50 LET V=50
60 LET F=120
70 PRINT "TIME";T
75 PRINT "HEIGHT";H
80 PRINT "VELOCITY";V
85 PRINT "FUEL";F
90 IF F=0 THEN GOTO 140
100 PRINT "BURN? (0-30)"
110 INPUT B
120 IF B<0 THEN LET B=0
130 IF B>30 THEN LET B=30
140 IF B>F THEN LET B=F
150 LET V1=V-B+5
160 LET F=F-B
170 IF (V1+V)/2>=H THEN GOTO 220
180 LET H=H-(V1+V)/2
190 LET T=T+1
200 LET V=V1
210 GOTO 70
220 LET V1=V+(5-B)*H/V
230 IF V1>5 THEN PRINT "YOU CRASHED-ALL DEAD"
240 IF V1>1 AND V1<=5 THEN PRINT "OK-BUT SOME INJURIES"
250 IF V1<=1 THEN PRINT "GOOD LANDING"
260 STOP

It taught some basics about program code, most notably IF statements to make decisions, LET statements to assign values, and of course GOTO. Short programs sometimes worked, but more often than not, the longer the program was, the less likely it was to work.

Want to have fun with [crappy] basic programs? Try this site.

Julia breaks recursion!

I do love recursion.

However, it seems Julia does not. And you know, that's okay; recursion isn't the panacea that some people think it is. I think it is an interesting problem-solving methodology (I am writing a book on it), but there are other ways of doing things. I recently wrote an algorithm to flood fill a portion of an image, which basically involves replacing the colour of an interconnected region in an image with another colour. I wrote two algorithms: one recursive and one iterative; it is the recursive one that causes some issues. Here is the recursive code in Julia:

# Floodfill using recursive algorithm
function floodfill(img, seedx, seedy, newC)

  function flood(x, y, oldC, newC)
      if img[x,y] != oldC       # the base case
          return
      end
      img[x,y] = newC
      flood(x+1, y, oldC, newC) # right
      flood(x-1, y, oldC, newC) # left
      flood(x, y+1, oldC, newC) # down
      flood(x, y-1, oldC, newC) # up
  end
 
  oldC = img[seedx,seedy]
  flood(seedx, seedy, oldC, newC)
 
  return img
end

Note that Julia allows nested functions, which makes passing large structures like arrays easier: you only have to pass them into the outer function, and the inner recursive function has access to them. I tested the algorithm on the following 10×10 sample image, with a starting point of (3,4) and a flood-fill intensity of 7.

testI = [0 0 0 0 0 0 0 0 0 0; 
         0 0 1 1 1 1 1 0 0 0;
         0 1 1 1 0 0 1 0 0 0;
         0 0 1 1 1 1 1 1 0 0;
         0 0 0 0 0 1 1 1 0 0;
         0 1 1 1 0 0 0 0 0 0;
         0 1 1 0 0 0 0 0 0 0;
         0 1 1 1 0 0 0 0 0 0;
         0 0 1 1 1 1 1 1 0 0;
         0 0 0 0 0 0 0 0 0 0]
 
testIf = floodfill(testI, 3, 4, 7)

Here is the output:

[0 0 0 0 0 0 0 0 0 0
 0 0 7 7 7 7 7 0 0 0
 0 7 7 7 0 0 7 0 0 0
 0 0 7 7 7 7 7 7 0 0
 0 0 0 0 0 7 7 7 0 0
 0 1 1 1 0 0 0 0 0 0
 0 1 1 0 0 0 0 0 0 0
 0 1 1 1 0 0 0 0 0 0
 0 0 1 1 1 1 1 1 0 0
 0 0 0 0 0 0 0 0 0 0]

No problem at all. I then tested it on a 1000×1000 image with a 282,000-pixel region to be filled. It doesn't take long to break though. The function flood() is called a total of 24835 times before the program fails with the message "ERROR: LoadError: StackOverflowError:". There is obviously a limit to the stack size in Julia, but for the life of me I could not find any information pertaining to this, nor how to increase the size of the stack. So, similar to Python, large recursive processes should likely be avoided.

The iterative version of the flood fill, on the other hand, runs without any problem, regardless of the size of the image.
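For comparison, here is a minimal sketch of what an iterative version can look like (my own reconstruction, not the code from the original program): the call stack is replaced by an explicit stack of (x, y) coordinates, with a bounds check so the fill stops at the image edges.

# Iterative flood fill: an explicit stack of coordinates replaces recursion,
# so there is no risk of a StackOverflowError on large regions.
function floodfill_iter(img, seedx, seedy, newC)
    oldC = img[seedx, seedy]
    oldC == newC && return img          # nothing to do
    dx, dy = size(img)
    stack = [(seedx, seedy)]
    while !isempty(stack)
        x, y = pop!(stack)
        if 1 <= x <= dx && 1 <= y <= dy && img[x, y] == oldC
            img[x, y] = newC
            push!(stack, (x+1, y), (x-1, y), (x, y+1), (x, y-1))
        end
    end
    return img
end

It is called in the same way as the recursive version, e.g. floodfill_iter(testI, 3, 4, 7).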


Making a heterogeneous stack in Julia

So, data structures in Julia can be a little tricky. The Collections module provides a priority queue and some heap functions. Arrays are nice, but sometimes a more complex structure is needed, say a heterogeneous stack. Firstly, a composite type can be created; this is essentially a form of record, equivalent to a struct in C. Consider the following example:

type Point      # written as mutable struct Point in Julia 0.6 and later
  x::Int
  y::Int
end

In Julia this is known as a "user-defined concrete type", and it contains two fields, x and y, annotated with types (Int in this case). A variable of this type can then be created in the following manner:

p = Point(7,12)

Therefore p.x has the value 7 and p.y has the value 12. The type here is applied like a function and is termed a constructor. Now, how to turn this into a stack? First, create an empty array of the type Point:

stackP = Point[]

Then simply add to the stack using push!, and extract from the stack using pop!.

push!(stackP,Point(3,4))
x = pop!(stackP)

It works nicely; the only issue was finding some sample code that actually explained how this would work. Again, the official documentation is not the best at explaining things like this, and it is certainly lacking in examples.
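For completeness, here is a short usage sketch of my own showing the LIFO behaviour, draining the stack with isempty:

# Push a few Points, then pop them back off; pop! returns the most recently pushed element.
stackP = Point[]
for (a, b) in [(1,2), (3,4), (5,6)]
    push!(stackP, Point(a, b))
end
while !isempty(stackP)
    p = pop!(stackP)
    println(p.x, " ", p.y)    # prints 5 6, then 3 4, then 1 2
end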

If you want a more comprehensive set of data structures nicely packaged up, check out the DataStructures package on GitHub.


Why add a string type to C?

One of the caveats of C is its lack of a string type. The only way strings can be defined is through the use of a character array, which isn't always the best solution. The biggest problem with strings in C (for the novice anyway) is the end-of-string (null) terminator '\0'. Mainly it causes problems because a declaration like this:

char str[100];

Technically it only stores a string of 99 characters, because one element is needed for the '\0' terminator. To the experienced programmer this isn't an issue; to the novice it is. The other issue is that people see the "shortcut" of using:

char *str;

This of course seems very seductive, because the novice does not realize that memory must be allocated, and frankly C does nothing to stop them. Consider the following two segments of code:

/* Segment 1: a pointer initialized with a string literal */
char *str = "Luke Skywalker? I Thought He Was A Myth.";
printf("%s", str);

/* Segment 2: an uninitialized pointer passed to scanf */
char *str;
scanf("%s", str);
printf("%s", str);

The first piece of code works because a text string has been associated with the "pointer to char", str. But it only works because this is a string literal, stored in read-only memory; try to modify the string and an error will occur. The second piece of code does not work on most systems, because str points to no allocated memory. In fact, on my MacBook it produces a segmentation fault, and on a Raspberry Pi it produces a bus error.

Sometimes when an algorithm involving a string is created, the programmer forgets to add a terminating '\0', which can lead to problems. Consider the following snippet of code.

char str[10];
str[0] = 'J';
str[1] = 'e';
str[2] = 'd';
str[3] = 'i';

Here str may cause undefined behaviour if used as a string, because it has not been null-terminated. Ironically, the code often appears to work, because the remaining elements of the array happen to contain zero bytes, but this is not something that can be relied upon.

C would be better served by a dedicated string type.

Technology overload?

Our lives may be full of far too much technology. Every day we read about something new and fascinating that will radically change our lives: self-driving cars that eliminate the need for traffic lights, smart houses, pre-prepared fruit, etc. It's all supposed to make our lives easier, but the only thing it really does is clutter our lives with stuff, most of which is disposable. Sure, I have a workshop full of hand tools, but those tools will likely still be around in 200+ years (if properly cared for). The Apple MacBook I'm writing this on probably has a life-span of 4-5 years, and that's only because of the solid-state memory. When the electricity goes out in an ice storm, an axe or wood-saw might be useful, both for getting fuel and for clearing trees, yet how many modern houses have either of these?

People talk a lot about smart houses, but are they necessary to make things more efficient? How about building more passive houses, which require very little energy for heating or cooling the space inside? It may be time to build better houses (or even smaller ones?) before we throw technology at the problem.

Image processing in Julia (i)

Some may think that C is a good language for image processing, and it is, certainly from the perspective of speed. However, it suffers somewhat from usability issues with arrays (at least from my viewpoint). Partly this is because large 2D arrays used to store images have to be allocated dynamically, and partly because I do not want to deal with arrays that begin at index 0. Python, on the other hand, just suffers from being slow, and frankly I don't want to spend the energy to completely vectorize things.

COLUMN VS. ROW MAJOR

Most people used to languages such as C deal with 2D arrays in row-major order, which means arrays are stored row by row. Julia, however, harks back to Fortran (and MATLAB) and uses column-major arrays. So an array z of the form:

1 2 3 4
5 6 7 8
9 10 11 12

would be stored as 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12. Therefore accessing z[8] would return the value 7. This is a little tricky, but it's just about how things are stored, and shouldn't interfere too much with processing an array. The only caveat occurs when "flattening" the array, as it is flattened by column instead of by row. It's really all about memory layout.
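This is easy to verify in the REPL (a quick sketch, using the array above):

z = [1 2 3 4; 5 6 7 8; 9 10 11 12]
z[8]       # 7, the 8th element in column-major order
vec(z)     # 1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12, flattened by column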

1D VS. 2D

So the way to create arrays in Julia is quite interesting. Doing the following:

z = [1 2 3 4 5 6 7 8]

Creates a 2D array, with 1 row and 8 columns. Remember Julia is column-major. To create what is essentially a vector, one writes:

z = [1, 2, 3, 4, 5, 6, 7, 8]

Trying to access the third element in both cases will work with z[3], although z[3,1] works with the vector, and z[1,3] works with the 2D array. A 3×3 array can be created in the following manner:

z = [1 2 3;4 5 6;7 8 9]

There are a lot of built-in functions which can be used on arrays, for example vec(x), which converts the array x, whatever its shape, into a vector.
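A quick sketch summarizing the shapes discussed above:

z1 = [1 2 3 4 5 6 7 8]         # 1×8 two-dimensional array (a single row)
z2 = [1, 2, 3, 4, 5, 6, 7, 8]  # 8-element vector
z1[3], z1[1,3]                 # both return 3
z2[3], z2[3,1]                 # both return 3 (a trailing index of 1 is allowed)
vec(z1)                        # converts the 1×8 array into an 8-element vector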

ARRAYS BEGIN WITH 1

As a person who does some image processing, I find arrays that begin with an index of 1 perfectly fine. Images are basically 2D arrays, and when processing neighbourhoods it is way easier, from an algorithmic viewpoint, to deal with 1..n than with 0..n-1.

ARRAY SLICING IS *SO* CONVENIENT

C does not allow for array slicing, which is an annoyance. So processing an entire image by extracting a 5×5 neighbourhood around each pixel requires C code of the following form (where z=2 is the width of the boundary needed to accommodate the 5×5 neighbourhood):

for (i=z; i<dx-z; i=i+1)
  for (j=z; j<dy-z; j=j+1)
    for (x=-2; x<=2; x=x+1)
      for (y=-2; y<=2; y=y+1)
        block[x+2][y+2] = image[i+x][j+y];

Sure, it’s fast, but it really is inconvenient to have to write code like this. The same piece of code in Julia is:

for i=z+1:dx-z, j=z+1:dy-z
    block = img[i-2:i+2,j-2:j+2]
end

The Julia syntax is far easier to write.

IMAGE I/O

One of the inconveniences of any programming language used for image processing is reading in specific file formats; trying to read TIFF or JPEG images is often a bugbear. One alternative is to store images as text files containing the raw 8-bit values (for grayscale anyway). Julia provides an easy method of reading in such an image, of the form:

imgIn = readdlm(fname,UInt8::Type)

The function readdlm reads the image in based on the rows and columns of the image inside the file given by fname. The argument UInt8::Type specifies the element type to be stored in the array imgIn. Note that just using Int can lead to the 8-bit image being stored as an array of 32/64-bit integers, which uses far more memory and can cause problems when processing large images.
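The counterpart for writing is writedlm; a minimal sketch (the output file name is a placeholder):

# Write the 8-bit image back out as a text file, one image row per line.
writedlm("imgOut.txt", imgIn, ' ')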

IT’S ALL ABOUT THE DATA

One of the caveats of dealing with any image is how to store the data. In an ideal world, an 8-bit image would be stored as UInt8 in Julia, giving values from 0 to 255. This works well if any calculations only involve that set of discrete values; however, if a function applies a kernel with negative coefficients, using UInt8 is no longer feasible, and neither is Int8. It is then necessary to take a step up and store/process images using Int16, which is overkill from a storage point of view, but saves a lot of hassle when processing images.
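A minimal sketch of the issue (my own toy example, using a 3×3 block and a Laplacian-style kernel; all of the names here are made up):

img8   = UInt8[10 10 10; 10 200 10; 10 10 10]
img16  = Int16.(img8)                     # widen to Int16 before filtering
kernel = Int16[0 -1 0; -1 4 -1; 0 -1 0]   # negative coefficients
val    = sum(kernel .* img16)             # 760 here, but could just as easily be negative
out    = UInt8(clamp(val, 0, 255))        # clamp back into the 0..255 range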


Testing Julia for speed (iii)

The last example for testing speed in Julia will be a 5×5 mean filter. A mean filter is simply a way of smoothing an image, i.e. suppressing irregularities in the intensities, and is commonly used for noise suppression. It works by stepping through each pixel in an image, obtaining the mean of the 5×5 region around that pixel, and using this value as the corresponding pixel in the filtered image. As such it is an intensive algorithm.

In this particular example, we shall apply a mean filter to a 2144×6640 pixel, 8-bit image: effectively 14 megapixels. The algorithm is analyzed using code written in C, Fortran, Python and Julia.

[Image: chateau, the 14MP 8-bit test image]

The image is processed using the following Julia code:

# On Julia 1.0 and later, mean() comes from Statistics and readdlm()/writedlm() from DelimitedFiles
using Statistics, DelimitedFiles

function process(img)

    dx,dy = size(img)
    imgN = copy(img)
    e = div(5,2)
 
    for i=e+1:dx-e, j=e+1:dy-e
        block = img[i-e:i+e,j-e:j+e]
        mn = mean(block)
        imgN[i,j] = round(UInt8,mn)
    end
    return imgN 
end

img = readdlm("pano.txt",Int16::Type)
@time imgN = process(img)
writedlm("panoJ.txt",imgN)

Since Julia provides efficient means of reading an entire block of integers in from an ASCII file, the code is much shorter than it would be in C. In addition, Python, Julia and Fortran provide facilities to slice arrays, making extraction of the 5×5 neighbourhood easy.

The code has been timed on the mean filtering alone, excluding extras such as file I/O. In most cases this involves a nested loop to process each pixel in the image. In the case of C, it includes a second set of nested loops to extract the 5×5 neighbourhood.

[Chart: mean-filter timing results for C, Fortran, Python and Julia]

As the results show, C was able to process the image in less than a second, with Fortran in second place at marginally over a second. At just over 3 seconds, the Julia code could be considered exceptionally fast as well. The loser in this instance is again Python, which is invariably slow, even with only one pair of nested loops. In a future post, I will explore this algorithm in each of the languages cited in the context of image processing.