The bustling evolution of languages

Languages evolve – to a point. The English language evolved over hundreds of years to its modern incarnation, and some would argue it’s still evolving, although whether for the better depends on your viewpoint. It was cobbled together from words from other languages, in particular German and French, due in part to their proximity and close associations. Old English evolved to Middle English and then to modern English. Programming languages also evolve, but their evolution is more methodical, or logical if you will. Old FORTRAN evolved to middle Fortran (77) and then to modern Fortran. The difference is that modern Fortran is still backwards compatible.

What does the evolution of a programming language look like? Fortran and Cobol evolved by means of evolutionary changes to the core of their structures, even though their DNA retains obfuscated features such as the arithmetic if… almost how humans retain the appendix, if you will. Languages like C have evolved through the addition of features, but their DNA remains largely the same as it was when they crawled out of the PDP-11 they came from. Their less-than-useful features still exist, a sign that beneficial evolution doesn’t always happen. Other languages have evolved from what we could best term gene manipulation… or maybe Frankensteinism… taking the best features of other languages and putting them together. This of course relies on the designers’ perception of what “best” really means. No system is perfect, and no language is perfect.

Yet if survival of the fittest is to be applied to programming languages, then Fortran and Cobol both have strong characteristics. Others have tried to sideline them, but in their environments they excel. The language kingdom is one in which various species live in various environments and often thrive. What threatens them sometimes is the idea that another language could pervade their environment, though it rarely happens. The languages that have failed to take hold often suffer from faulty DNA. Algol seemed like a good idea, but dissatisfaction in the community led to the evolution of Algol 68, and its disastrous syntax… truly a murky gene pool. It died out. APL (A Programming Language) first appeared in 1966; however, its mathematical notation made it hard to understand, and today only a few isolated communities are left.

Not every modern language is globally successful either, and sometimes it has nothing to do with its syntactic structure. Lua, which appeared in 1993, is a language largely used in embedded applications, and as such does not have an ecosystem that supports standalone apps (Lua is often embedded into C or C++). Any language which ends up as a supplementary language lacks the visibility to evolve successfully. That’s not to say that Lua has bad DNA. Its embedded nature and constrained use likely limit how it can evolve.

The evolution of if (iv): Ada and beyond

In 1977, Fortran underwent what was likely its greatest metamorphosis, from an unstructured to a quasi-structured language. At the eleventh hour, the revision for the F77 standard was modified to reduce the reliance on goto statements, bringing Fortran into line with other languages, where the influence of goto was minimal or even non-existent. The changes made Fortran 77 vastly different from its predecessor, Fortran 66.

Of major importance was the inclusion of a “block IF”, which took the following form:

IF (E) THEN
...
END IF

The use of THEN as a new keyword allowed a block of statements to be incorporated until the terminating keyword END IF (which can also be written ENDIF) was reached. This also solved the dangling-else problem. The block IF was augmented by the addition of the keyword ELSE, which allowed a group of statements to be executed if the condition of the preceding IF was not satisfied.

IF (E) THEN
    ...
ELSE
    ...
ENDIF

By the mid-70s, Fortran was likely coerced into making these changes due to the competition from C and Pascal, both of which offered these conditionals. These new F77 constructs allowed for improved program readability, especially through eliminating the need for statement labels, and goto statements. Here is an example:

IF (K.GT.0) THEN
    POSNUM = POSNUM + 1
ELSE IF (K.LT.0) THEN
    NEGNUM = NEGNUM + 1
ELSE
    ZEROS = ZEROS + 1
ENDIF

The emergence of Ada did nothing to evolve the if statement. Like Pascal and F77, it used a then keyword, borrowed the else-if idea from Algol 68, renaming it elsif, and used the same structure terminator as F77, written end if. By this stage, if statements had likely evolved as far as they would, and new languages were just selecting appropriate concepts from existing languages.

if C1 then
    S1
elsif C2 then
    S2
elsif Cn then
    Sn
else
    S(n+1)
end if;

Fortran 90 would go on to finally make the arithmetic if obsolescent. Python would alter very little, adopting the elif of Algol 68 and its lack of parentheses.

if x == 0:
    zeroes = zeroes + 1
elif x < 0:
    negnum = negnum + 1
else:
    posnum = posnum + 1
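For comparison, the arithmetic if that Fortran 90 marked obsolescent was a three-way jump: IF (E) L1, L2, L3 transferred control to label L1, L2 or L3 depending on whether E was negative, zero or positive. Here is a rough Python sketch of that dispatch. The function name arithmetic_if and the label numbers are made up purely for illustration.

```python
def arithmetic_if(value, negative, zero, positive):
    """Return the label an arithmetic IF would jump to.

    Mirrors Fortran's IF (E) L1, L2, L3: L1 if E < 0, L2 if E == 0,
    L3 if E > 0. The name and the labels are hypothetical.
    """
    if value < 0:
        return negative
    elif value == 0:
        return zero
    else:
        return positive


# Rough equivalent of: IF (K) 10, 20, 30
for k in (-5, 0, 7):
    print(k, "->", arithmetic_if(k, 10, 20, 30))
```

With one statement doing the work of a whole if-elif-else chain, but only by way of labels, it is easy to see why structured conditionals won out.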

Julia, too, uses an amalgam of structural pieces.

if x < 0
    negnum = negnum + 1
elseif x > 0
    posnum = posnum + 1
else
    zeroes = zeroes + 1
end

We are now in the age of mix-and-match, and it is unlikely the if statement will evolve to any great extent.

The evolution of if (iii): Algol 68, Pascal and C

The design of the if statement in Algol 60 was likely the pinnacle of its evolution. From here on, every language tweaked its syntax, but there were no major changes. Languages like Algol 68, C, and Pascal all had conditional statements. Algol 68, although it shared the “Algol” moniker, was a different language altogether.

Whereas Algol 60 required the use of explicit compound statements within an if statement if more than one statement was being controlled, Algol 68 incorporated the use of control structure terminators. For the if statement this meant the use of the reversed keyword fi. Algol 68 still lacked the parentheses of Fortran, but also had no requirements for compound statements, as each section was self-delineated. It had the following general form:

if C then
    ...
else
    ...
fi

This had the added effect of eliminating the dangling-else problem of Algol 60. Algol 68 also added the keyword elif, a short-hand to allow for a series of else-if statements:

if C1 then
    ...
elif C2 then
    ...
elif C3 then
    ...
else
    ...
fi

Here is an example:

if x>0 then
    posNum := posNum + 1
elif x<0 then
    negNum := negNum + 1
else
    zeros := zeros + 1
fi

The if statement of C simplified that of Algol 60, deleting the then keyword, and adding parentheses to enclose the conditional expression. It had the following general form:

if (C) 
    statement1; 
else 
    statement2;

However, similar to Algol 60, groups of statements require the use of compound statements delineated by { }, and C also suffers from the dangling-else problem of Algol 60. Here is an example:

if (x>0)
    posNum = posNum + 1;
else if (x<0)
    negNum = negNum + 1;
else
    zeros = zeros + 1;
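C settles the dangling else by attaching each else to the nearest preceding if that does not yet have one. The pairing can be sketched in Python as a toy model: it works on a flat list of tokens, ignores braces entirely, and the function name match_else is hypothetical.

```python
def match_else(tokens):
    """Pair each 'else' with the nearest preceding 'if' that has no else yet.

    Toy model of C's rule: tokens is a flat list with no braces.
    Returns a dict mapping the index of each else to the index of its if.
    """
    unmatched_ifs = []  # stack of indices of ifs still waiting for an else
    pairs = {}
    for index, token in enumerate(tokens):
        if token == "if":
            unmatched_ifs.append(index)
        elif token == "else" and unmatched_ifs:
            pairs[index] = unmatched_ifs.pop()
    return pairs


# Models: if (a) if (b) s1; else s2;
statement = ["if", "a", "if", "b", "s1", "else", "s2"]
print(match_else(statement))  # {5: 2}
```

The else at position 5 binds to the inner if at position 2, whatever the indentation might suggest. Getting it to bind to the outer if requires wrapping the inner statement in { }.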

Pascal, which arrived at a similar time to C, has an if statement similar to that of C, except that its logical expression needs no parentheses, and it uses the then keyword, like Algol 60. Like Algol 60, it also suffers from the dangling-else problem, and requires the use of begin-end delimiters for a compound statement.

if C then
    S1
else
    S2;

The evolution of if (ii): Fortran IV and 66

Fortran did not make any inroads into modifying the if statement until later. Likely spurred on by Algol 60, Fortran IV introduced the logical if statement in 1962. It had the following form:

IF (E) STATEMENT

Here E was a logical expression, using operators of the form .EQ. for = and .LE. for ≤. The statement could be any executable statement except a DO statement or another logical IF. However, unlike Algol 60, there were no compound statements, and no keyword corresponding to else. Both had to be achieved by means of goto statements, and in this way an if-else statement could be mimicked. Consider the example below:

    IF (A .LE. 0) GOTO 15
    W = X ** A
    GOTO 20
15  W = 0
20  ...

In this case, if the value of A is less than or equal to zero, the program jumps to statement 15, setting W to 0. Otherwise it calculates W=X**A, and jumps to statement 20. Notice that the Fortran conditional “operators” are stropped by the use of periods, as in .EQ. and .LE. above. This was done to avoid potential ambiguity: the expression A LE 0 could also have been interpreted as the variable ALE0. Fortran 66, the first industry standard, made no changes to the if statement.
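The ambiguity comes from the fact that Fortran of this era ignored blanks in the statement field, so spacing could not be used to separate an operator from its operands. Here is a small Python sketch of the effect. The helper names squeeze and tokenize are invented for illustration, and the regular expression covers only the handful of token types needed here, not real Fortran.

```python
import re


def squeeze(source):
    """Drop all blanks, much as early Fortran compilers did before scanning."""
    return source.replace(" ", "")


def tokenize(source):
    """Split a blank-free expression into dotted operators, names and integers."""
    # Dotted operators are tried first, then names, then integer literals.
    return re.findall(r"\.[A-Z]+\.|[A-Z][A-Z0-9]*|\d+", squeeze(source))


print(tokenize("A .LE. 0"))  # ['A', '.LE.', '0']
print(tokenize("A LE 0"))    # ['ALE0']
```

Without the periods, the three pieces collapse into what looks like a single identifier.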

There were a number of differences between Fortran (IV) and Algol (60):

Fortran used mnemonics to represent conditional operators, e.g. .LE., versus Algol’s ≤ (in some implementations <= was used due to the non-availability of ≤).
Fortran used parentheses, ( ), to separate the logical expression from the statement, whereas Algol used the additional keyword then.
Fortran (66) required that the arithmetic expressions on either side of a conditional operator be of the same datatype. This is because A.GT.B was often translated to A-B.GT.0. (This restriction disappeared in F77.)

By all accounts, Fortran IV and Fortran 66 were extremely deficient with respect to conditional statements. The next major changes were not to appear until Fortran 77.

Consider code that looked like this in Algol 60:

if k>0 then posNum := posNum + 1
       else if k<0 then negNum := negNum + 1
                   else zeros := zeros + 1

The equivalent in Fortran 66 would be:

    IF (K.GT.0) GOTO 30
    IF (K.LT.0) GOTO 31
    ZEROS = ZEROS + 1
    GOTO 47
30  POSNUM = POSNUM + 1
    GOTO 47
31  NEGNUM = NEGNUM + 1
47  ...

How did if evolve in other languages? Algol 68, C, Pascal?

The problem with evolving languages today

Unlike in the 1960s, today very few new languages evolve. Those that do are often moulded from an existing form. In the earlier days of programming language design, languages were implemented once the complete specification had been designed. Algol 60 evolved through such a specification, and although specific compilers always contained small variations, the core concept was the same. Radical changes to the underlying structures came at broad intervals, say 5-7 years, which allowed the language to attract users.

One of the major problems today is the pace of language roll-out. Good examples are Julia and Swift. Swift was introduced in 2014, with V1.2 following in April 2015, Swift V2 in September 2015, and V2.2 in March 2016. Swift 3.0 arrived in Sept. 2016. All this in a little over two years. What’s wrong with this? The problem is that development of these languages has become far too fluid. I understand that modern languages are often behemoths that naturally require tweaking, but sometimes features disappear or are radically altered between versions of a language – and that just shouldn’t happen. Why? Because radically modifying structures in a language on such a regular basis can lead to painful code migration, and frustrated programmers who then have to spend time rewriting code.

Let’s look at a case in point, the Swift string. Strings in Swift are their own entities; for example, an empty string can be created in the following manner:

var emptyString = ""

Easy right? It’s also easy to do things with strings. For example:

var string1 = "darth"
var string2 = "vader"
string1 = string1 + string2
// string1 now contains darthvader

Many would argue this is much nicer than C. The problem lies in Swift’s evolution from V2 to V3, and involves naming conventions. Swift 2 had a number of functions to advance an index to traverse a string, functions like successor() and predecessor(). Okay, so they DO seem verbose. Some would likely argue that succ() and pred() would have been better… but that’s taking us off topic. In Swift V3 these changed. The properties startIndex and endIndex remain the same. Okay, so let’s look at an example:

let sith = "vader"

// Swift 2
sith[sith.startIndex]               // "v"
sith[sith.startIndex.successor()]   // "a"

In V3, the functions successor() and predecessor() have been replaced by index(after:) and index(before:), which are called on the string itself rather than on the index. So the above code now looks like this:

// Swift 3
sith[sith.startIndex]               // "v"
sith[sith.index(after: sith.startIndex)]  // "a"

Which maybe just doesn’t seem as intuitive anymore. Why change it? Anyway, the point I’m trying to make is that continuous changes to the structure of a language mean that programmers have to constantly migrate code, which is not ideal. The other issue is: when does a language become stable? Does it continuously evolve? And what about backwards compatibility?

The truth is, programmers will become very hesitant to write libraries for new languages if the structure of the language changes too often. It is hard to devote time to an endeavour, only to realize that the codebase will have to be deeply modified on a yearly basis.

Here’s a thought. Design a language. Implement the language. Release the language. Let people USE the language for 3-4 years, whilst reviewing what works and what doesn’t. Then make subtle changes, and allow for backwards compatibility.

The Craft of Coding

Musings on programming and education

programming language evolution