Abbreviations in programming languages

Ever wondered where those wonderful abbreviations in C came from? You know, +=? Delve into Algol 68 and you will find some of them there.

Until Algol 68, datatypes were mostly indicated by full keywords such as integer, real, and character. As the new kid on the block, Algol 68 introduced abbreviations that, unlike in PL/I, were not optional. So integer became int, character became char, and Boolean became bool (pity C didn’t make booleans integral from day one). For the novice programmer, this does tend to make programs more challenging to read. Algol 68 also coined the term void. It also allowed datatypes to be lengthened and shortened, in a way which will be instantly familiar to C programmers:

long real, long int, short real, short int, long long real, long long int

See the similarity? Of course it doesn’t end there. Algol 68 also allowed a form of optimization as it relates to operators:

sum plusab x

Look familiar? This is the “plus and becomes” operator, which can also be written as +:=, and in C as +=. It’s like a piece of assembler jumping into your code… and its presence in the code was meant to signal to the compiler that the code could be optimized. There were others, of course: minusab, timesab, divab, and overab/modab for integers.
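To make the correspondence concrete, here is a small C sketch (the function names compound_ops and divab_demo are hypothetical) that applies C’s compound assignment operators in turn; each comment names the Algol 68 operator with the same meaning. In Algol 68, divab is real division, so for integers the closer match to C’s /= is overab.

```c
/* Applies each C compound assignment operator in turn.
   The comments name the Algol 68 operator with the same meaning. */
int compound_ops(int sum, int x)
{
    sum += x;   /* plusab  (+:=): sum = sum + x              */
    sum -= 1;   /* minusab (-:=): sum = sum - 1              */
    sum *= 3;   /* timesab (*:=): sum = sum * 3              */
    sum /= 2;   /* overab:  integer division by 2            */
    sum %= 7;   /* modab:   remainder after division by 7    */
    return sum;
}

/* divab (/:=) is real division; C uses /= on floating types as well. */
double divab_demo(double a, double b)
{
    a /= b;
    return a;
}
```

For example, compound_ops(10, 4) goes 14, 13, 39, 19 and finally returns 5.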

 

Program re-engineering: code cocooning

As we have discussed before, re-engineering can involve migrating code to a new dialect of the same language, translating code to a new language, or porting code to a new system. One additional method of re-engineering is something we will term code cocooning. Cocooning involves taking a piece of existing legacy code written in one language and calling it from a wrapper written in another language. This insulates the legacy code: it is not modified in any manner, and exists almost in stasis. Because languages such as Fortran are backwards compatible, there is no issue with compiling the legacy code into object form and having the wrapper language access the object code. The wrapper language must, however, be capable of proper parameter passing (Fortran, for instance, passes arguments by reference).

Here is an example. Consider a series of three Fortran (F77) subroutines used for biomass computation¹. The three subroutines, RATIOS, STEM and NSTEM, perform the computations required to determine the dry weights of wood and bark for various components of a tree. The subroutines all work, but require a main program to interface data and produce results. Let’s say this main program was written in Julia. With the Fortran code cocooned, it does not need to be modified in any way: the Julia program accesses the generated object code and uses its subroutines accordingly.
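The parameter-passing side of cocooning can be sketched in C (used here instead of Julia only so the sketch compiles on its own). Fortran passes arguments by reference, so the wrapper must hand the legacy routine addresses rather than values; gfortran, for example, also lowercases the routine name and appends an underscore by default. The routine area_ below is a hypothetical stand-in, written in C, for a legacy subroutine AREA(R, A); it is not one of RATIOS, STEM or NSTEM, whose argument lists are not given here.

```c
/* Hypothetical stand-in for a legacy Fortran subroutine AREA(R, A).
   Fortran passes arguments by reference, so the C view takes pointers.
   In a real cocoon this definition would be replaced by the declaration
       extern void area_(double *r, double *a);
   and the object file compiled from the untouched Fortran source. */
void area_(const double *r, double *a)
{
    *a = 3.14159 * (*r) * (*r);
}

/* The cocoon: a wrapper with a natural by-value interface.
   The legacy routine itself is never modified. */
double cocoon_area(double radius)
{
    double result = 0.0;

    area_(&radius, &result);   /* pass addresses, Fortran style */
    return result;
}
```

The caller only ever sees cocoon_area(2.0); the by-reference details stay inside the wrapper.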

Another, more drastic approach is a process we call entombing. This is when the unmodified legacy functions are encapsulated within a newer dialect of the language in question; for example, the Fortran 77 subroutines from the example above could be encased in a Fortran 2003 layer. From the user’s perspective, the functionality of the legacy code is available, yet the code is entombed. Again, there is no requirement to modify the legacy code in any way (unless it contains things like old-style “C” comment lines, or deleted features).

Of course neither of these techniques is perfect, but they offer a mechanism for re-engineering that may work in certain circumstances.

¹Singh, T., Campbell, D., “Fortran Subroutines for Biomass Computation”, Note No.22, Northern Forest Research Centre, Canadian Forestry Service, Environment Canada (1983)

Indenting… why not use 3-space indents?

Anyone would think I was interested in indenting. Well, it is the kind of thing many people ignore. I used to be very much an automaton of 4 spaces… but in recent years (maybe the wisdom acquired with age?), I have begun to explore the computer science literature of the 1960s and 70s, and have realized the wealth of knowledge many have forsaken. I recently wrote an article on the origins of indenting. In 1980 Dennis Leinbaugh wrote a paper titled “Indenting for the Compiler”. In the opening sentence of the abstract he wrote:

A simple indentation rule, indent statements from the control statement they belong to, is sufficient to express a program’s block structure without the use of compound statements or closing keywords.

This suggests that, as late as 1980, indenting wasn’t that well known. It likely all began with formatting Algol programs in the 1960s, yet wasn’t explicitly discussed until papers on formatting Pascal programs appeared in the late 1970s. Leinbaugh cites an indentation rule:

“all statements directly belonging to a control statement are right indented an equal amount from the beginning of the control statement”

It is a rule still commonly used today. But one has to wonder whether some of the indenting rules we use today are useful. I have talked before about writing control structures on single lines, and about 2-space indenting. I’m almost willing to advocate 3-space indenting as a compromise between 2 and 4. So why not combine these concepts?

Here is a piece of C code that prints out 3-digit Armstrong numbers:

#include <stdio.h>

int main(void)
{
    int i, j, k, arm;

    printf("The following are 3-digit Armstrong numbers:\n");
 
    for (i=1; i<=9; i=i+1)
        for (j=0; j<=9; j=j+1)
            for (k=0; k<=9; k=k+1)
            {
                arm = i*i*i + j*j*j + k*k*k;
                if (arm == i*100+j*10+k)
                    printf("%d ", arm);
            }

    return 0;
}

Now here is the code with modified style:

#include <stdio.h>

int main(void)
{
   int i, j, k, arm;

   printf("The following are 3-digit Armstrong numbers:\n");
 
   for (i=1; i<=9; i=i+1)
      for (j=0; j<=9; j=j+1)
         for (k=0; k<=9; k=k+1){
            arm = i*i*i + j*j*j + k*k*k;
            if (arm == i*100+j*10+k) printf("%d ", arm);
         }
   return 0;
}

The changes are subtle, but do they make for a more readable piece of code? It’s amazing what a difference one fewer space per indent makes. The modified style also moves the opening brace onto the line of the innermost for, and puts an if clause that has only one statement on a single line.
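The same 3-space style can be applied to a small helper function. This sketch (the names is_armstrong3 and count_armstrong3 are hypothetical) separates the Armstrong test from the printing, which also makes it easy to check that there are exactly four 3-digit Armstrong numbers: 153, 370, 371 and 407.

```c
/* Returns 1 if n is a 3-digit Armstrong number, i.e. the sum of the
   cubes of its digits equals n (for example 153 = 1 + 125 + 27). */
int is_armstrong3(int n)
{
   int i = n / 100, j = (n / 10) % 10, k = n % 10;

   if (n < 100 || n > 999) return 0;
   return i*i*i + j*j*j + k*k*k == n;
}

/* Counts the 3-digit Armstrong numbers. */
int count_armstrong3(void)
{
   int n, count = 0;

   for (n=100; n<=999; n=n+1)
      if (is_armstrong3(n)) count = count + 1;
   return count;
}
```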

Refs:
Leinbaugh, D.W., “Indenting for the compiler”, ACM SIGPLAN Notices, 15(5), pp.41-48 (1980)
Crider, J.E., “Structured formatting of Pascal programs”, ACM SIGPLAN Notices, 13(11), pp.15-22 (1978)
Gustafson, G.G., “Some practical experiences formatting Pascal programs”, ACM SIGPLAN Notices, 14(9), pp.42-49 (1979)
Peterson, J.L., “On formatting Pascal programs”, ACM SIGPLAN Notices, 12(12), pp.74-75 (1979)

Occupational hazards

From a Computerworld article from 1998 (Occupational hazards unmasked!)

Algolholism: Sufferers can’t stop writing Algol programs, even though no one has sighted a working Algol program since 1977. Victims spend their days rummaging through old computer output recycling bins looking for a line or two of pure Algol code. The best treatment? Wean addicts off Algol with a synthetic derivative called Cobol.

Fibonacci in Swift

So, I didn’t really need another distraction, but Swift has an interactive environment, so now I’m playing. Here are three functions for Fibonacci: an iterative one, a recursive one, and one using arrays.

Here’s the iterative version:

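func fibonacciI(n: Int) {
    // 3...n traps at run time if n < 3, so return early for small n
    guard n >= 3 else { return }
    var f1 = 1, f2 = 1, fib = 0
    for i in 3...n {
        fib = f1 + f2
        print("Fibonacci: \(i) = \(fib)")
        f1 = f2
        f2 = fib
    }
}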

And now the recursive version:

func fibonacciR(n: Int) -> Int {
    if n == 0 {
        return 0
    } else if n == 1 {
        return 1
    }
    // Swift 3 and later require the argument label in the recursive calls
    return fibonacciR(n: n - 1) + fibonacciR(n: n - 2)
}

And now the version using arrays:

func fibonacciA(n: Int) {
    guard n > 0 else { return }     // 0..<n traps if n is negative
    var fib: [Int] = []
    fib.append(1)
    fib.append(1)
    if n > 2 {                      // 2..<n traps if n < 2
        for i in 2..<n {
            fib.append(fib[i-1] + fib[i-2])
        }
    }
    for i in 0..<n {
        print(fib[i])
    }
}

Nothing terribly different about these implementations from other languages. I do like the way it is possible to create an array and then append items to it. The range operators in loops, i.e. ... (closed) and ..< (half-open), are kind of interesting as well.
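For comparison, here is a C sketch of the iterative version (the name fib_iter is hypothetical). It returns the nth number instead of printing each one, using the same starting values as fibonacciI (F(1) = F(2) = 1). C’s for loop has no range operators, so Swift’s closed range 3...n becomes the condition i <= n, whereas a half-open range such as 2..<n would become i < n.

```c
/* Returns the nth Fibonacci number, with fib(1) = fib(2) = 1.
   Values of n below 1 return 0. */
long fib_iter(int n)
{
    long f1 = 1, f2 = 1, fib = 1;
    int i;

    if (n < 1) return 0;
    for (i = 3; i <= n; i++) {   /* Swift: for i in 3...n */
        fib = f1 + f2;
        f1 = f2;
        f2 = fib;
    }
    return fib;
}
```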

 

 

Experiential programming (with Julia) (ii)

Once the novice programmer is comfortable dealing with small pieces of data, the scope of problems can be extended to, say, 2D data such as simple monochromatic images. So let’s look at the “noise” problem. Some images can contain a type of noise called “impulse” or salt-and-pepper noise: random black (0) and white (255) pixels. Below is an (extreme) example of noise on a small image of a dome:

In this case, the algorithm is again a simple one, based on median filtering the image. This involves looking at each pixel in the image, calculating the median value of some region around that pixel, and then assigning this value to the same location in a new image. This could be explained using some form of diagram showing how a picture is made up of individual pixels. Basically, a median filter removes “extreme” pixels, which could represent the noise.

It extends the novice’s experience with arrays, loops, and the median() function. In algorithmic terms:

  1. Read in a monochrome image called noisyimage.txt (it’s a simple text image), and store it in a 2D array (matrix).
  2. Find the size of the [image] array, and create a new array to store the “cleaned” image.
  3. For every element (pixel) in the 2D array, select a 3×3 neighbourhood around the pixel, and calculate the median value.
  4. Assign this median value as the new pixel in the output image.
  5. Write the output image in a file called cleanimage.txt.

An important part of experiential learning with respect to programming is also program comprehension. So instead of having students create the code from the algorithm, it is also possible to provide the code and ask them to read it, associating each part of the code with a step of the algorithm. The Julia program to perform this noise suppression is only 9 lines long. Here is what the program looks like:

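imgIn = readdlm("noisyimage.txt", Int16)   # read the text image as a matrix of integers
dx, dy = size(imgIn)
imgOut = zeros(UInt8, dx, dy)              # 8-bit output, so writedlm() writes whole numbers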

for i=2:dx-1, j=2:dy-1
   block = imgIn[i-1:i+1,j-1:j+1]
   newPixel = median(block)
   imgOut[i,j] = trunc(UInt8,newPixel)
end

writedlm("cleanimage.txt",imgOut)

Colour-coding helps to associate like regions of code. For example, the readdlm() and writedlm() lines (input/output) could be shown in red, the size() and zeros() lines (setting the data up) in blue, and the loop (actually performing the median filtering) in purple.
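For readers coming from C, the same algorithm takes noticeably more code. The sketch below assumes the image is already in memory as a row-major array of 8-bit grey levels (file input and output are left out), and the names median9 and median_filter3x3 are hypothetical. Unlike the Julia version, which leaves the border pixels of imgOut at zero, it copies the border pixels unchanged.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for unsigned char values */
static int cmp_uchar(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/* Median of a 3x3 neighbourhood: sort a copy of the 9 values, take the middle one */
unsigned char median9(const unsigned char v[9])
{
    unsigned char tmp[9];

    memcpy(tmp, v, sizeof tmp);
    qsort(tmp, 9, sizeof tmp[0], cmp_uchar);
    return tmp[4];
}

/* 3x3 median filter over a rows x cols row-major image */
void median_filter3x3(const unsigned char *in, unsigned char *out,
                      int rows, int cols)
{
    int i, j, di, dj, n;
    unsigned char block[9];

    memcpy(out, in, (size_t)rows * cols);   /* border pixels keep their values */
    for (i = 1; i < rows - 1; i++)          /* Julia: i = 2:dx-1 */
        for (j = 1; j < cols - 1; j++) {    /* Julia: j = 2:dy-1 */
            n = 0;
            for (di = -1; di <= 1; di++)
                for (dj = -1; dj <= 1; dj++)
                    block[n++] = in[(i + di) * cols + (j + dj)];
            out[i * cols + j] = median9(block);
        }
}
```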

Swift fixes C’s “issues”

Regardless of what you think about Apple’s new language Swift, it has fixed many of the underlying issues with C and its descendants. The number one thing may be that they actually gave the language a new name that is *not* related in any way to C. Some will see it as some form of evolution of Objective-C, but it’s not. Language design has always involved taking design cues from existing languages, yet since C’s inception in the early 1970s, few languages have attempted to fix some of the inadequacies of its language structure. What are some of Swift’s “fixes”?

Strings are fully supported

Unlike C, where strings are character arrays, strings in Swift are represented by the String type. This makes life easier, because operations like equality can now be done using ==. The downside? It is not possible to access a character of a string via an integer index; characters are instead reached through String.Index values (for example, str[str.startIndex]). There is a cornucopia of information on Swift strings at Ole Begemann’s blog.
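To see what Swift is fixing here: in C, == applied to two strings compares the addresses of the character arrays, not their contents, so comparisons have to go through strcmp() from string.h. A minimal sketch (the helper name same_text is hypothetical):

```c
#include <string.h>

/* Returns 1 if the two C strings hold the same characters.
   Writing a == b instead would only compare the two pointers. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

In Swift, == on two String values compares their contents directly, so no helper is needed.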

Assignments do not return a value

One of the problems in C is code of the form:

if (x = y){
    i = x + 1;
    vortex[i] = 0;
}

This does not compare x and y at all: it assigns the value of y to x, and the condition is then true whenever y is non-zero. The assignment operator in Swift does not return a value, so writing if x = y instead of if x == y is a compile-time error rather than a silent bug.

No need to use break statements in switch blocks.

The case statements in Swift do not fall through the bottom of each case into the next one by default. To get that behaviour you have to explicitly use the fallthrough keyword.
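The C behaviour that Swift avoids is shown in this sketch (the function name fallthrough_demo is hypothetical): without a break, execution runs on into the next case.

```c
/* Shows C's implicit fallthrough between switch cases. */
int fallthrough_demo(int n)
{
    int r = 0;

    switch (n) {
    case 1:
        r += 1;
        /* fall through: no break, so case 2 runs as well */
    case 2:
        r += 10;
        break;
    default:
        r = -1;
    }
    return r;
}
```

Here fallthrough_demo(1) returns 11, not 1. In Swift, reaching case 2 from case 1 would need an explicit fallthrough statement.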

switch does intervals… AND compound cases

Finally! A switch statement that allows for intervals in the case clauses.

let grade = 73
let lettergrade: String
switch grade {
    case 0..<50: lettergrade = "F"
    case 50..<60: lettergrade = "D"
    case 60..<70: lettergrade = "C"
    case 70..<80: lettergrade = "B"
    case 80...100: lettergrade = "A"
    default: lettergrade = "?"    // a switch over Int must be exhaustive
}
print(lettergrade)

And even compound cases.

let ch: Character = "o"
switch ch {
    case "a", "e", "i", "o", "u": print("vowel")
    default: print("consonant")
}
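Standard C case labels must be single constant expressions, so the interval-based grade example above is usually written as an if/else chain in C. (GCC and Clang also accept case 0 ... 49: as a non-standard extension.) A sketch, with the hypothetical name letter_grade and '?' for out-of-range scores:

```c
/* Converts a 0-100 score to a letter grade; returns '?' if out of range. */
char letter_grade(int grade)
{
    if (grade < 0 || grade > 100) return '?';
    if (grade < 50) return 'F';
    if (grade < 60) return 'D';
    if (grade < 70) return 'C';
    if (grade < 80) return 'B';
    return 'A';
}
```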

Integer overflows are trapped as a run-time error.

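Overflow can be allowed explicitly by using the overflow operators &+, &-, and &*, which wrap around instead of trapping. (Early versions of Swift also had &/ and &%, but these were later removed.)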
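C does not trap: signed integer overflow is undefined behaviour, so portable C code has to test before it adds. Below is a sketch of a checked addition (the name checked_add is hypothetical) that reports overflow instead of performing it. This is roughly the check Swift’s + performs automatically, except that Swift stops the program rather than returning a flag.

```c
#include <limits.h>

/* Stores a + b in *result and returns 1, or returns 0 (leaving *result
   untouched) if the sum would overflow an int. The tests themselves
   cannot overflow: INT_MAX - b with b > 0, INT_MIN - b with b < 0. */
int checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *result = a + b;
    return 1;
}
```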

Braces in if are not optional

In C, you can write a single statement after an if, and there is no issue. Multiple statements require the use of braces { }, to contain the statements. This can lead to programming errors in code when people forget to include the braces. For example:

if (x > y)
    printf("%d is greater than %d\n", x, y);
    max = x;

Here the last statement is executed regardless of whether x is greater than y, because the braces have been omitted. Swift does not allow the braces around the body of an if to be omitted; they are mandatory… so no more issues like the example above (or the dangling else, for that matter).
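The dangling-else problem is easy to demonstrate in C: an else always binds to the nearest unmatched if, whatever the indentation suggests. In this sketch (the function name dangling is hypothetical) the else is indented as if it belonged to the outer if, but it belongs to the inner one. Compilers such as GCC will typically warn about this code with -Wall, which is rather the point.

```c
/* The else below is indented to look as if it pairs with if (x > 0),
   but C attaches it to the nearest if, namely if (y > 0). */
int dangling(int x, int y)
{
    int r = 0;

    if (x > 0)
        if (y > 0)
            r = 1;
    else
        r = 2;
    return r;
}
```

So dangling(-1, 5) returns 0, not 2: when x is not positive, neither assignment runs.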

The use of for-in

Swift provides a form of the for loop which allows iteration over “containers” (such as arrays), or numeric ranges. For example:

for index in 1...5 {
    print("\(index) times 5 is \(index * 5)")
}

printf be gone!

As you will notice from the previous example, values are printed using the print() function, with values inserted into the string through interpolation, \(...).

do-while becomes repeat-while

Just a small thing I guess, but the word repeat is more meaningful from a looping perspective than do (especially to novice programmers).
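Both loops test their condition at the bottom, so the body always runs at least once. That makes them a natural fit for, say, counting the digits of a number, where 0 must still count as one digit. A C sketch (the name count_digits is hypothetical); the Swift version would use repeat { ... } while n != 0.

```c
/* Counts the decimal digits of n. The body runs at least once,
   so count_digits(0) is 1. */
int count_digits(long n)
{
    int digits = 0;

    do {
        digits++;
        n /= 10;   /* C99 division truncates toward zero, so negatives work too */
    } while (n != 0);
    return digits;
}
```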

NO GOTO

Another language decides that enough goto is enough. It’s not there, so don’t look for it. Surprisingly, you can create a similar feature (at least one blog post shows you how).

FUNCTIONS RETURN MORE THAN ONE VALUE

Yes, like many of its contemporaries, Swift allows more than one value to be returned via a tuple.

func circle(radius: Double) -> (Double, Double) {
    let area = 3.14159 * radius * radius
    let circumference = 2.0 * 3.14159 * radius
    return (area, circumference)
}

This is called in this manner (Swift 3 and later require the radius: label):

let values = circle(radius: 7.0)
print("area is \(values.0) and circumference is \(values.1)")
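C functions cannot return a tuple; the usual workaround is to return a struct (or to pass pointers to variables that receive the results). A sketch of the same circle function in C, where the struct type circle_props is a hypothetical name:

```c
/* C has no tuples, so both results travel in a small struct. */
struct circle_props {
    double area;
    double circumference;
};

struct circle_props circle(double radius)
{
    struct circle_props p;

    p.area = 3.14159 * radius * radius;
    p.circumference = 2.0 * 3.14159 * radius;
    return p;
}
```

After struct circle_props v = circle(7.0); the fields v.area and v.circumference play the roles of values.0 and values.1.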

There are *many* other features, like the use of optional parentheses in control structures. I kind of like the type inference as well. These two declarations are equivalent:

var life: Int = 42   // type given explicitly
var life = 42        // type inferred as Int (use one form or the other, not both)

You can also run Swift as an interactive session from the terminal. For more basics try Swift by Example.

 

Experiential programming (with Julia) (i)

In experiential programming, the concepts of learning to program are taught as a side-effect of solving a problem. This is sometimes called problem-based learning. The syntax of a language is introduced when it is needed to solve a problem. The novice programmer then obtains an understanding of how a particular piece of programming language syntax can be used in an algorithmic context. Consider a simple example which finds the median of a series of user-input numbers. In its simplest context, the problem involves three steps:

  1. Obtain the user-input numbers.
  2. Calculate the median value.
  3. Output the median to the user.

So first, we ask the user how many numbers they wish to enter.

println("How many numbers? ")

This introduces the most fundamental output function in Julia, println(). Next, this piece of information has to be obtained from the user.

n = parse(chomp(readline()))

This seems more complicated, but it basically reads the line of input, “chomps” the <return> from the end, and then turns the input into a number.

Next, a container is created to store the values entered: an array of numbers, in this case with zeros as starting values.

t = zeros(n)

Now a loop can be introduced to allow for the n numbers to be input.

for i=1:n
   t[i] = parse(chomp(readline()))
end

The final part of the puzzle involves calculating the median value of this set and outputting it to the user. This is relatively easy, using the function median().

m = median(t)

Now output it:

println("The median value = ", m)

This 8-line program works quite nicely and introduces some of the basic programming ideas:

  • input and output
  • using a container to store more than one piece of data
  • iterative structures called loops
  • using a built-in function, in this case a statistical function

There is no need at this point to discuss datatypes: the way the program has been written, all input numbers will be treated as floating-point numbers¹. The great thing about this Julia program is the lack of things that can make life more complicated for a novice programmer. There are no objects, no datatypes, no memory to deal with, and no formatting instructions for input/output statements.

println("How many numbers? ")
n = parse(chomp(readline()))
t = zeros(n)
for index=1:n
   t[index] = parse(chomp(readline()))
end
m = median(t)
println("The median value = ", m)

Yes, there are some words which will seem strange to novice programmers, namely parse and chomp. But many things that exist in other languages are transparent here. Performing a median calculation in C would require building a function called median(). And unlike Python and C, where arrays begin at element 0, arrays in Julia begin at element 1… so an array with 10 elements is indexed from 1 to 10. This code could also easily be modified so that median() is replaced with mean(), or var() (variance).

¹Experienced programmers may loathe this idea of using floating-point numbers as the default, but it does no harm and makes life much easier for the novice. They may also be troubled by the dynamic typing, but this removes the problems associated with declaring variables such as index.

This could then be used as the jumping-off point for asking the novice to modify the code to add a feature like calculating the sum of the numbers and outputting it to the user. (In Julia this is as easy as using the sum() function.)
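As noted above, C has no built-in median(), so it has to be written by hand. A minimal sketch (the name median and its signature are assumptions, not a standard library function): sort the values with qsort(), then take the middle value, or the mean of the two middle values when the count is even. Note that it sorts the caller’s array in place.

```c
#include <stdlib.h>

/* qsort comparison function for doubles */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n values (n > 0). Sorts t in place. */
double median(double *t, size_t n)
{
    qsort(t, n, sizeof t[0], cmp_double);
    if (n % 2 == 1)
        return t[n / 2];
    return (t[n / 2 - 1] + t[n / 2]) / 2.0;
}
```

Even before reading input, this is already more code than the whole Julia program, which is rather the point of the comparison.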

 

Why programming for non-CS students is different

There is a lot of literature in the CS community about the reasons students drop out of introductory programming courses. There are two streams of students: those who will go on to immersive careers in computer science (e.g. software engineering), and those who will use programming as a tool in their discipline, perhaps to analyze process data. The latter includes fields like finance, physics, and bioinformatics. These are two vastly different groups of people, and it is almost impossible to create one course that suits everyone. Teaching a course based on a language such as C or C++, which requires knowledge of low-level concepts such as memory management, will alienate non-CS students. Spend too much time on high-level use of a language, and the CS majors will miss critical information on low-level concepts.

Often it comes down to difficulties associated with the programming language being used to teach programming concepts. For example, non-CS students don’t really need to know about object-oriented programming; it often just complicates things from the perspective of building programs. Game developer John Carmack once said “Sometimes, the elegant implementation is just a function. Not a method. Not a class. Not a framework. Just a function.” They also don’t need to know that much about memory, certainly not the complexities of dynamic memory, nor complex ways of coding things to make a program more efficient, e.g. vectorization.

So what to teach a non-CS student in the way of learning to program?

  1. How programs can be used to solve real-world problems. This often involves the use of real-data, e.g. digital images. Computer-centric examples are often irrelevant for non-CS individuals.
  2. Focus on problem solving.
  3. How to formulate a problem as an algorithm. For example, if the problem is removing random noise from a monochromatic photograph, how do we do this?
  4. Learning the basic constructs of programming – making decisions, repeating things, how to store data (from a high-level perspective).
  5. Writing the programs to implement the algorithms.
  6. How to test programs.

Choice of language is extremely important, and from that perspective Julia is a perfect language to teach novice programmers who don’t want to make a career of it, but want the skills to help them in their disciplines. Teaching programming can also become a more experiential process, whereby language syntax is introduced as it is needed in the problem-solving context.

CamelCase or Underscore?

One thing people always seem to argue over is naming conventions in programs. Should we use CamelCase or underscores when naming variables and functions? CamelCase is defined as writing compound words such that each word in the middle of the phrase begins with a capital letter, with no intervening spaces or punctuation.

boundingbox    - not CamelCase
bounding_box   - separated by underscore
boundingBox    - CamelCase

Arguably, we use CamelCase even in the everyday text we write: eBay, iPhone.

Some have argued that the use of CamelCase reduces readability. The reality is that programs written in UPPERCASE had reduced readability, because words in a program lost their shape, and word shape is one of the cues that helps us read effectively. Is CamelCase better than underscores? I would say yes. There have been studies of what is known as an “interword filler” (in this case the _ character) which show that such fillers cause lateral distraction, reducing reading speed. The opposing camp would argue, however, that underscores better resemble natural writing, with the underscore simulating a space. Epelboim et al. [1] found that removing spaces from text slowed reading by only 10-20%.

CamelCase is aesthetically pleasing and underscores are ugly? Which of these is truly better?

playerIdentifier, playerID, playerId, player_ID

Likely the use of CamelCase is still somewhat controversial.

writetofile();
     VS
WriteToFile();
writeTOfile();
Write2File();
write2File();

Obviously you would only use CamelCase where appropriate.

Deißenböck and Pizka [2] state that approximately 70% of the source code of a software system consists of identifiers. They go on to describe the naming of identifiers in real-world software systems as being “close to obfuscation”. This is, of course, somewhat frightening.

Ultimately it’s up to you to make the right choice when it comes to selecting names for variables and functions. I like the notion of using a lowercase letter for the starting word, e.g. write2File. I would do the same thing for external modules, such as in Julia. For example, a function readPGM() from module imageIO would be accessed as imageIO.readPGM(). There is no right or wrong way, but the names must be readable and used consistently. And nowadays, I avoid underscores as much as I can.

 

Ref(s):
[1] Epelboim, J., Booth, J.R., Ashkenazy, R., Taleghani, A., Steinmans, R.,  “Fillers and spaces in text: The importance of word recognition during reading”, Vision Research, 37(20), pp.2899-2914 (1997)
– This study looks at fillers in words (Latin letters, Greek letters, digits and shaded boxes), and concludes that fillers in text disrupt reading by affecting word recognition directly.

[2] Deißenböck, F., Pizka, M., “Concise and consistent naming”, In: Proc. of the 13th Int. Workshop on Program Comprehension (2005)