Occupational hazards

From a Computerworld article from 1998 (Occupational hazards unmasked!)

Algolholism: Sufferers can’t stop writing Algol programs, even though no one has sighted a working Algol program since 1977. Victims spend their days rummaging though old computer output recycling bins looking for a line or two of pure Algol code. The best treatment? Wean addicts off Algol with a synthetic derivative called Cobol.

Fibonacci in Swift

So, I didn’t really need another distraction, but Swift has an interactive environment, so now I’m playing. Here are two functions for Fibonacci, first an iterative one, and then a recursive one.

Here’s the iterative version:

func fibonacciI(n: Int) {
    var f1=1, f2=1, fib=0
    for i in 3...n {
        fib = f1 + f2
        print("Fibonacci: \(i) = \(fib)")
        f1 = f2
        f2 = fib
    }
}

And now the recursive version:

func fibonacciR(n: Int) -> Int {
    if (n == 0){
        return 0
    } else if (n == 1) {
        return 1
    }
    return fibonacciR(n-1) + fibonacciR(n-2)
}

And now the version using arrays:

func fibonacciA(n: Int) {
    var fib: [Int] = []
    fib.append(1)
    fib.append(1)
    for i in 2..<n {
        fib.append(fib[i-1]+fib[i-2])
    }
    for i in 0..<n {
        print("\(fib[i])")
    }
}

Nothing terribly different about these implementations from other languages.  I do like the way it is possible to create an array and then append items to it. The range constraints in loops, i.e. and ..< are kind of interesting as well.

 

 

Experiential programming (with Julia) (ii)

So once the novice programmer is comfortable with dealing with small pieces of data, the purview of problems can be extended to say deal with 2D data such as simple monochromatic images. So let’s look at the “noise” problem. Some images can contain a type of noise called “impulse” or salt-and-pepper noise – random black (0) and white (255) pixels. Below is an (extreme) example of noise on a small image of a dome:

In this case, the algorithm is again a simple one, based on median filtering the image. This involves looking at each pixel in the image, calculating the median value of some region around that pixel, and then assigning this value to a new image in the same location. This could be explained i using some form of diagram showing how a picture is made up of individual pixels. Basically a median filter removes “extreme” pixels, which could represent the noise.

It will extend the experiences with both arrays, loops, and the median() function. In algorithmic terms.

  1. Read in a monochrome image called noisyimage.txt (it’s a simple text image), and store it in a 2D array (matrix).
  2. Find the size of the [image] array, and create a new array to store the “cleaned” image.
  3. For every element (pixel) in the 2D array, select a 3×3 neighbourhood around the pixel, and calculate the median value.
  4. Assign this median value as the new pixel in the output image.
  5. Write the output image in a file called cleanimage.txt.

An important part of experiential learning with respect to programming is also program comprehension. So instead of creating the code, from the algorithm it is also possible to provide the code and allow students to read the code and try and associate the parts of the code with the parts of the algorithm.  The Julia program to perform this noise suppression is only 9 lines in length. Here is what the program looks like:

imgIn = readdlm("noisyimage.txt",Int16::Type)
dx, dy = size(imgIn)
imgOut = zeros(dx,dy)

for i=2:dx-1, j=2:dy-1
   block = imgIn[i-1:i+1,j-1:j+1]
   newPixel = median(block)
   imgOut[i,j] = trunc(UInt8,newPixel)
end

writedlm("cleanimage.txt",imgOut)

Colour-coding helps to associate like regions of code. For example, the code in red here is associated with input/output, blue with setting data up, and purple with actually performing the median filtering.

Swift fixes C’s “issues”

Regardless of what you think about Apple’s new language Swift – it has fixed many of the underlying issues with C, and it’s descendants. The number one thing may be that they actually gave the language a new name that it *not* related in anyway to C. Some will see it as some form of evolution of Objective-C, but it’s not. Language design has always involved taking design prompts from existing languages. Since its inception in the early 1970s, few have attempted to fix some of the inadequacies of its language structure. What are some of its “fixes”?

Strings are fully supported

Unlike C, where strings are character arrays, strings in Swift are represented by the String type. This makes life easier, because operations like equality can now be done using ==. The downside? It is  not possible to access an element of a string via an integer index (although it is still possible to access it). There is a cornucopia of information on Swift strings here at Ole Begemann’s blog.

Assignments do not return a value

One of the problems in C is code of the form:

if (x = y){
    i = x + 1;
    vortex[i] = 0;
}

This effectively does not compare x and y, but assigns the value of y to x which is always true (unless y is zero). The assignment operator in Swift does not itself return a value, preventing the use of  i=0 instead of i==0 in if statements.

No need to use break statements in switch blocks.

The case statements in Swift do not fall through the bottom of each case and into the next one by default. To do that you have to explicitly use the fallthrough clause.

switch does intervals.. AND compound cases

Finally! A switch statement that allows for intervals in the case clauses.

let grade = 73
var lettergrade: String
switch grade {
    case 0..<50: lettergrade = "F"
    case 50..<60: lettergrade = "D"
    case 60..<70: lettergrade = "C" 
    case 70..<80: lettergrade = "B"
    case 80..100: lettergrade = "A"
}

And even compound cases.

let ch: Character = "o"
switch ch {
    case "a", "e", "i", "o", "u": print("vowel")
    default: print("consonant")
}

Integer overflows are trapped as a run-time error.

Can be allowed through the use of a special integer operators: &+, &-, &*, &/, and &%.

Braces in if are not optional

In C, you can write a single statement after an if, and there is no issue. Multiple statements require the use of braces { }, to contain the statements. This can lead to programming errors in code when people forget to include the braces. For example:

if (x > y)
    printf("%d is greater than %d\n", x, y);
    max = x;

Here the last statement is executed regardless of whether x is greater than y or not, because the braces have been omitted. Swift does not allow for the omission of braces around the statement, they are mandatory… so no more issues like the example above. (Or dangling-else for that matter)

The use of for-in

Swift provides a for of the for loop which allows iteration over “containers”, or numeric ranges. For example:

for index in 1...5 {
    print("\(index) times 5 is \(index * 5)")
}

printf be gone!

As you will notice from the previous example, the values are printed out using an overloaded print statement.

do-while becomes repeat-while

Just a small thing I guess, but the word repeat is more meaningful from a looping perspective than do (especially to novice programmers).

NO GOTO

Another language decides that enough of goto is enough. It’s not there, so don’t look for it. Surprisingly you can create a similar feature (and here is a blog post that shows you how).

FUNCTIONS RETURN MORE THAN ONE VALUE

Yes, like many of its contemporaries, Swift allows more than one value to be returned via a tuple.

func circle(radius: Double) -> (Double, Double) {
    var area=0.0, circumference=0.0 
    area = 3.14159 * radius * radius
    circumference = 2.0 * 3.14159 * radius
    return (area, circumference)
}

This is called in this manner:

let values = circle(7.0)
print("area is \(values.0) and circumference is \(values.1)")

There are *many* other features, like the use of optional parentheses in control structures. I kind of like the type inference as well. Both these are the same.

var life:Int = 42
var life = 42

You can also run Swift as an interactive session from the terminal. For more basics try Swift by Example.

 

Experiential programming (with Julia) (i)

In experiential programming, the concepts of learning to program are taught as a side-effect of solving a problem. This is sometimes called problem-based learning. The syntax of a language is introduced when it is needed to solve a problem. The novice programmer then obtains an understanding of how a particular piece of programming language syntax can be used in an algorithmic context. Consider a simple example which finds the median of a series of user-input numbers. In its simplest context, the problem involves three steps:

  1. Obtain the user-input numbers.
  2. Calculate the median value.
  3. Output the median to the user.

So first, we ask the user how many numbers they wish to enter.

println("How many numbers? ")

This introduces the most fundamental output device in Julia, the function println(). Next this piece of information has to be obtained from the user.

n = parse(chomp(readline()))

This seems more complicated, but it basically reads the line of input, “chomps” the <return> from the end, and then turns the input into a number.

Next, a storage medium is created to store the values entered. This vehicle is an array of numbers, in this case with zeros as starting values.

t = zeros(n)

Now a loop can be introduced to allow for the n numbers to be input.

for i=1:n
   t[i] = parse(chomp(readline()))
end

The final part of the puzzle involves calculating the median value of this set, and outputting  this value out to the user. This is relatively easy, using the function median().

m = median(t)

Now output it:

println("The median value = ", m)

This 8-line program works quite nicely and introduces some of the basic programming ideas:

  • input from and output
  • using a container to store more than one piece of data
  • iterative structures called loops
  • using a built in function, in this case a statistical function

There is no need at this point to discuss datatypes – the way the program has been written, all input numbers will be assumed to be float numbers¹. The great thing about this Julia program is the lack of things that can make life more complicated for a novice programmer. There are no objects, no datatypes, and no having to deal with memory, or formatting instructions for input/output statements.

println("How many numbers? ")
n = parse(chomp(readline()))
t = zeros(n)
for index=1:n
   t[index] = parse(chomp(readline()))
end
m = median(t)
println("The median value = ", m)

Yes, there are some words which will seem strange to novice programmers,i.e. parse and chomp. But, there are many other things that exist in other languages that are transparent here. Performing a median calculation in C would require building a function called median(). And unlike Python, (and C) where arrays begin at element 0, arrays in Julia begin at element 1… so an array with 10 elements is indexed from 1 to 10. Now this code could be easily modified so that median() is replaced with mean(), or var() (variance).

¹Experienced programmers may loathe this idea of using floating-point numbers as the default, but it does not harm anything and makes life way easier for the novice. They may also be troubled by the dynamic typing, but this takes away the problems associated with declaring the variables index for example.

Now this could be used as the jump off point for asking the novice to modify the code to add a feature like calculating the sum of the numbers, and outputting this to the user. (In Julia this is as easy as using the sum() function.

 

Why programming for non-CS students is different

There is a lot of literature in the CS community about the reasons students drop-out of introductory programming courses. There are two streams of students: those who will go on to immersive careers in computer science (e.g. software engineering), and those who will use programming as a tool in their discipline, maybe to analyze process data. The latter includes fields like finance, physics, and bioinformatics. They are two vastly different groups of people, and it is almost impossible to create one course that suits everyone. Teaching a course based on a language such as C or C++, which requires knowledge of low level concepts such as memory management, will alienate non-CS students. Spend too much time on high-level use of a language, and the CS-majors will miss critical information on low-level concepts.

Often it comes down to difficulties associated with the programming language being used to teach programming concepts. For example, non-CS students don’t really need to know about object-oriented programming – it often just complicates things from the perspective of building programs. Game developer John Carmack once said “Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function“. They also don’t need to know that much about memory, and certainly not to the complexities of dynamic memory, or indeed complex ways of coding things to make a program more efficient, e.g. vectorization.

So what to teach a non-CS student in the way of learning to program?

  1. How programs can be used to solve real-world problems. This often involves the use of real-data, e.g. digital images. Computer-centric examples are often irrelevant for non-CS individuals.
  2. Focus on problem solving.
  3. How to formulate a problem as an algorithm. For example if the problem is  removing random noise from a monochromatic photograph – how do we do this?
  4. Learning the basic constructs of programming – making decisions, repeating things, how to store data (from a high-level perspective).
  5. Writing the programs to implement the the algorithms
  6. How to test programs.

Choice of language is extremely important, and from that perspective Julia is a perfect language to teach novice programmers who don’t want to make a career of it, but want the skills to help them in their disciplines. Teaching programming can also be moved to a more experiential process, whereby languages syntax is introduced as they are used in the problem solving context.

CamelCase or Underscore?

One thing people always seem to argue over is naming conventions in programs. Should we allow the use of Camelcase, or underscore when naming variables, and functions? CamelCase is defined as writing compound words, such that each word in the middle of the phrase begins with a capital letter, with no intervening spaces or punctuation.

boundingbox    - not CamelCase
bounding_box   - separated by underscore
boundingBox    - CamelCase

Arguably we do use CamelCase even in daily text we write – eBay, iPhone.

Some have argued that the use of Camelcase reduces readability. The reality is that programs written in UPPERCASE reduced readability because words in a program lost their shape, and word shape is one of the cues which helps us read effectively. Is it better than underscore? I would say yes.  There have been studies on what is known as an “interword filler”, in this case the _ character, that show that they cause lateral distraction, reducing reading speed. Although the alternate universe would argue that underscores better resemble natural writing, with the underscore simulating a space.  Epelboim [1] found that removing spaces from text slowed reading by only 10-20%.

CamelCase is aesthetically pleasing and underscores are ugly? Which of these is truly better?

playerIdentifier, playerID, playerId, player_ID

Likely the use of CamelCase is still somewhat controversial.

writetofile();
     VS
WriteToFile();
writeTOfile();
Write2File();
write2File();

Obviously you would only use CamelCase where appropriate.

Deißenböck [2] cite that approximately 70% of the source code of a software system consists of identifiers. They go on to describe the naming of identifiers in real-world software systems as being “close to obfuscation”. This of course is somewhat frightening.

Ultimately its up to you to make the right choice when it comes to selecting names for variables and functions. I like the notion of using a lowercase letter for the starting word, e.g. write2File. I would do the same thing for external modules, such as in Julia. For example a function readPGM() from module imageIO, would be accessed as imageIO.readPGM(). There is no right or wrong way – but the names must be readable and used consistently. And nowadays, I avoid underscores as much as I can.

 

Ref(s):
[1] Epelboim, J., Booth, J.R., Ashkenazy, R., Taleghani, A., Steinmans, R.,  “Fillers and spaces in text: The importance of word recognition during reading”, Vision Research, 37(20), pp.2899-2914 (1997)
– This study looks at fillers in words: Latin letters, Greek letters, digits and shaded boxes, and conclude that fillers in text disrupt reading by affecting word recognition directly.

[2] Deißenböck, F., Pizka, M., “Concise and consistent naming”, In: Proc. of the 13th Int. Workshop on Program Comprehension (2005)

 

The design of dead languages…

Once you are versed in a few languages, in principle you should be able to read any piece of code and decipher some idea of what it is doing. Take for example the following piece of code below. Obviously from the code you can quickly discern that it calculates Fibonacci numbers, but what happens in the code, and what language is it?

Firstly, don’t fret about the language, likely you have never seen it before… it is S-algol, short for St. Andrews Algol (a dialect of Algol 60). It doesn’t differ terribly from any other procedural language. The first part of the program revolves around the two phrases let and write. Obviously write is used for output. The use of let likely seems strange to many programmers. It basically “introduces” a variable, i.e. declares it with an initial value. Interestingly Swift uses let to assign constants. The while loop is fairly trivial (no parentheses, compound statement delineated by begin-end).

The rest of the program is pretty self-explanatory. There is an if-do statement, which uses = for equality, and relegates assignment to the classic := operator. Basically this program prints the first 42 Fibonacci numbers in a 6 column by 7 row format – the “‘n” outputs a newline after six columns. You’ll notice that semicolon is used, but only to separate two or more statements on a single line (that almost makes sense).

Once you know the basic control structures used in programming, it shouldn’t be that hard to decipher most pieces of code (languages like APL are an exception of course). Even dead languages have a tale to tell about their design, and what features could evolve into future languages. What would I choose from S-algol?

  • The use of the words reads and write is extremely intuitive – print always has associations with a physical printer. The reads function is quiet unique because it is used in the form x := reads , which does not require parameters.
  • The language uses dynamic typing, which is odd for a compiled language, however the language was designed to “deduce” the type from its declaration using let.
  • Strings are integral types (Yah!). String constants are easy to create: let s := “olympus” . The ++ operator can be used to concatenate strings.
  • Arrays are vectors, such that let x := vector 1::8 of 0 creates an 8-element integer array containing zeroes. very clean syntax. The use of the term vector is good because it builds on existing knowledge.

Negatives? The languages uses both = and := for assignment in vectors to distinguish between constants and variables. This seems kind-of vague from the examples I have seen, and could lead to confusion.

P.S. If you want to play with the S-algol compiler you can find the sourse code here.

History of the horrid switch statement (iii): C and beyond

C is where the concept of a multi-way statement was to fall off the rails. Partially because the control structure is hard to teach to novice programmers, and lacks the elegance of what came before. Consider the structure of C’s  switch statement:

switch (n) {
    case C1 : S1;
    case C2 : S2;
    ...
    case Cn : Sn;
    default : Sd;
}

The problem with the C switch statement, is that it allows fall-through, only single case, i.e. no ranges are allowed, and only integers for comparison (which might have been derived from Algol 68, which only allowed integer expressions). Fall-through is a unique problem (and in some cases is useful), which means that if n=C2, then after performing S3, the program will proceed to perform S4, etc., and the only way around this fall-through is the use of the keyword break. This harkens back to the notion that C is assembly in disguise. Break acts like it a separator, but in reality is more of a goto, transferring control to the end of the statement. It also suffered from an inability to allow ranges of values. So one could end up writing a piece of code that looks like this:

switch (n) {
    case 1 : ;
    case 2 : ;
    case 3 : ;
    case 4 : x = x + 1; break;
    case 5 : x = x + 2;
}

Here if the value of n is 1, 2 or 3, it just falls through to the statement at n=4, effectively mimicking “case 1..4” . Messy right? (But I have seen code like this). Better to have used a if statement to begin with. Finally, if the program requires a series of statements under each case, then you have to use { }.

The multi-selection of Ada combines the best characteristics of both Algol 68, and Pascal:

case expr of
    when C1 => S1;
    when C2 => S2;
    ...
    when Cn => Sn;
    when others => Sm;
end case;

C1,…,Cn are possible values, for example:

1
3..10
11 | 14
2 | 20..30

The case statement in Ada is functionally identical to that of C, however, unlike C, Ada supports ranges of values, and does not require a break statement. Here’s an example:

case currentmonth is
    when May..Jul => prices := summerrates;
    when Aug|Sep|Feb..Apr => prices := standardrates;
    when Dec => prices := holidayrates;
    when other => prices := winterrates;
end case;

Which is quite elegant. Fortran 90 provided a selectcase structure, which was essentially a replacement for the computed goto:

select case (i)
    case (C1)
        S1
    case (C2)
        S2
    case default
        Sd
end select

Here C1 and C2 are selectors which can consist of a list and/or range of integers, character or logical constants. The switch statement in Swift is a derivative of that of C:

switch C {
    case P1 : S1;
    case P2 : S2;
    case P3 : S3;
    default : Sd;
}

The major difference is that in Swift case statements do not implicitly fall-through. Swift case statements can match multiple cases on the same line – each value is separated by a comma before the colon. Also it allows ranges (X…Y). Here’s an example:

switch X {
    case 0,2,4,6,8 : println(“even”)
    case 1,3,5,7,9 : println(“odd”)
    default : println(“NaN”)
}

Finally, an elegant fix to the limitations of C’s switch.

Note that neither Lua, nor Python, nor Julia have a case or switch equivalent.

What’s in a name?

There are many good languages that have poor names.

C → Because it comes after B?
Java → Was it meant to be named after coffee, or the island?
Python → a big snake?
C++ → Better than C?
Perl → **BORING**

And anything that is just an acronym is just shear laziness. There are some names that I think are okay – Pascal(Blaise Pascal), Julia (from a random conversation), Oberon (moon of Uranus). But what about some really cool names? Here are some words that I like, and I think would invoke more interesting discussions than “C”.

zephyr → a soft gentle breeze, commonly used to name trains, a famous one which was streamlined. Now the name of an operating system.
rhubarb → a plant. Might be a good name for a recursive language.
Awa → Māori for river. Easy to say, short, sweet.
foehn → German for a dry warm wind. Just sounds nice. More European?
chinook → wind, helicopter (is there a wind thing happening here?)
Pålegg → anything put on a piece of bread (Norwegian). A language with the lot!
Boketto → no real translation from Japanese (staring at the sky without a thought?)

Sometimes the best name for a language is something that has no meaning. Maybe it rolls off the tongue nicely, or maybe it just sounds cool.