I type, therefore I am – The conundrum of data types in languages.

In programming languages, there are two kinds of typing, i.e. assigning a datatype to a variable: static and dynamic. More traditional languages such as Pascal, C, Ada and Java require the programmer to assign a type to a variable before it can be used. For example in C:

double aNumber;

This declares the variable aNumber as type double, meaning that it can store double-precision floating point numbers and nothing else. This is called static typing, and really means that type checking occurs when the program is compiled. Languages such as Python, Perl, and Julia use something often called dynamic typing, where the type is determined when a variable is assigned a value. In dynamic typing, type checking occurs at runtime. For example in Julia:

aNumber = "hello"
aNumber = 47

The first assignment makes the variable aNumber a string, and the following one makes it an integer. This means that a variable doesn’t really have a fixed type.
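
The same behaviour can be sketched in Python, another of the dynamically typed languages mentioned above. This is only an illustration: the names a_number and add_one are made up for the example, and the point is simply that the type travels with the value, so a type mistake only shows up when the offending line runs.

```python
# A name can be rebound to values of different types; each value carries its own type.
a_number = "hello"
first_type = type(a_number).__name__    # the value is currently a str

a_number = 47
second_type = type(a_number).__name__   # the same name now refers to an int


def add_one(value):
    """Add 1 to value; nothing is checked until this line actually runs."""
    return value + 1


try:
    add_one("hello")                    # str + int is not defined
    error_seen = None
except TypeError as exc:
    error_seen = type(exc).__name__     # the mistake surfaces only at run time
```

A statically typed language would reject the call add_one("hello") at compile time; here it is only caught when the call is executed.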

What sets them apart? Many would argue that statically typed languages are more robust – dynamically typed languages are said to behave more erratically, with run-time errors, and a difficulty in achieving the same level of correctness. The single biggest benefit of dynamic typing is that for the novice programmer there is a very shallow learning curve. There are no type declarations to write (although some languages also allow variables to be typed if required), and therefore no knock-on dependencies, such as type-specific I/O. For example, C provides the following integer types:

int, unsigned int
short, unsigned short
long, unsigned long
long long, unsigned long long

This can be overwhelming for the novice programmer to remember, considering the actual value range of each depends on the particular system and C compiler being used, and then there are the specific format codes to deal with for I/O, e.g. %d for int, %ld for long, etc.
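
To see how system-dependent these types are, here is a small sketch that uses Python's standard ctypes module to ask how large each C integer type is on the machine it runs on. The dictionary names are my own, and the printed ranges assume 8-bit bytes and two's complement representation, which holds on mainstream platforms but is an assumption nonetheless.

```python
import ctypes

# Sizes in bytes of C's integer types as seen by the C compiler that built
# this Python interpreter; the numbers can differ from one platform to another.
c_integer_types = {
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
}

sizes = {name: ctypes.sizeof(ctype) for name, ctype in c_integer_types.items()}

for name, size in sizes.items():
    # Assumes 8-bit bytes and two's complement; the unsigned variant of each
    # type has the same size but a range starting at 0.
    bits = 8 * size
    print(f"{name:10s} {size} bytes, signed range {-(2 ** (bits - 1))} to {2 ** (bits - 1) - 1}")
```

On many 64-bit Linux systems long comes out at 8 bytes, while on 64-bit Windows it is typically 4, which is exactly the kind of detail a novice should not have to memorize.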

When did indenting appear in programs?

Program today and you would think that indenting has been around forever – but this is not the case. Don’t forget that until the mid-1960s many programming languages were fixed-format, with the first few columns of every line reserved, for instance to hold any labels used in the program (in FORTRAN, statement labels occupied columns 1 to 5 and the code itself began in column 7).

It wasn’t until the early to mid 1970s that people started to study the benefit of indentation and white space in making programs look “better”, i.e. easier to follow and modify. Some of these studies indicated that indentation inhibited program comprehension. Others found that programs with goto statements did not lend themselves to indentation, which isn’t surprising considering the unstructured nature of jumps. In fact the 1970s were awash with studies on the effects of indentation, and commenting (another concept that seemed to be new).

Here is a Pascal program from Niklaus Wirth’s “Pascal: User Manual and Report” in 1975. Notice two things: indentation is 3 spaces, and multiple statements exist on single lines, for example on line 3 of the program. This is not atypical of the time: indenting was haphazard, and no attempt was made to give every statement its own line.

program exponentiation(input, output);
var e,y: integer; u,x,z: real;
begin read(x,y); write(x,y);
   z := 1; u := x; e := y;
   while e>0 do
   begin
      while not odd(e) do
         begin e := e div 2; u := sqr(u);
         end;
      e := e-1; z := u*z
   end;
   writeln(z)
end.

So where did the use of 2 and 4 spaces come from? It is hard to trace exactly how these amounts of whitespace became the norm. In 1983, Miara et al. [1] undertook a study to determine the impact of the level of indentation on program comprehension. Testing programs with 0, 2, 4, and 6 spaces of indentation on novice and expert programmers, their results favoured 2 or 4 spaces.

[1] Miara, R.J., Musselman, J.A., Navarro, J.A., Shneiderman, B., “Program indentation and comprehensibility”, Communications of the ACM, 26(11), pp.861-868 (1983)

Memories of memory

One of the limitations (or benefits if you look at it another way) of contemporary programming languages is that they deal with memory management so you don’t have to. Programmers don’t have to worry about stacks and heaps – they are hidden away in the dark recesses of the machine. C exposes both the stack and the heap to the programmer, but the reality is that in many modern languages, values that look as if they live on the stack are really stored on the heap, and most programmers have no clue.

Python, for instance, is implementation-dependent in how it manages memory, although in most cases it is handled internally by some form of memory manager. For example, CPython uses a private heap containing all Python objects and data structures.

Does it really matter? In some cases no, not at all. There are situations when I don’t really care where or how something is stored. But there are other situations, such as embedded applications, where knowledge of memory management is paramount (hence the use of C-like languages on embedded systems).

Design of a string type for C

As mentioned in a previous post, strings in C are kind-of blah. Part of this has to do with the use of a string terminator, and the fact that they are not first-class objects. What about using length counts instead? How would this be achieved? One way would be to store the length count in a single byte at the start of the string, say at index 0. This does have the effect of limiting the length of a string to 255 characters, but this really shouldn’t be a problem. If you are storing larger strings, there is likely a better data structure, or you could simply use an array of characters. Differentiating it in this way is similar to how Fortran deals with arrays. Too complicated? Unlikely, especially for the novice programmer, who no longer has to deal with the string-terminator fiasco. Also strings could be indexed from 1..n without the loss of the precious “0-index”, which is used for storing the length of the string.
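
To make the layout concrete, here is a rough model of it in Python, using a bytearray to stand in for the raw bytes a C implementation would use. It is only a sketch of the idea: the function names (make_counted, counted_length and so on) are invented for the illustration, and ASCII text is assumed so that one character is one byte.

```python
MAX_LEN = 255  # the largest count that fits in a single length byte


def make_counted(text):
    """Build a 256-byte buffer: byte 0 is the length, bytes 1..n the characters."""
    data = text.encode("ascii")
    if len(data) > MAX_LEN:
        raise ValueError("a one-byte length count cannot exceed 255")
    buf = bytearray(MAX_LEN + 1)
    buf[0] = len(data)
    buf[1:1 + len(data)] = data
    return buf


def counted_length(buf):
    """The length is read straight from byte 0; no scan for a terminator."""
    return buf[0]


def counted_char(buf, i):
    """Return character i, counting from 1."""
    if not 1 <= i <= buf[0]:
        raise IndexError("index outside 1..length")
    return chr(buf[i])


def counted_text(buf):
    """Convert back to an ordinary Python string."""
    return buf[1:1 + buf[0]].decode("ascii")


s = make_counted("hello")
```

Here counted_length(s) is read in a single step, and counted_char(s, 1) is the first character, with no terminator anywhere in the buffer.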

So what would a rejigged string look like? A simple string, with a maximum length of 255, might look like this (adopted from the keyword used by ALGOL 68):

string s;

This means that s[1] to s[255] would contain the characters in the string and s[0] would contain the length of the string. A longer string might be achieved through using the moniker long. For example:

long string[2000] s;

This would create a string with 2000 characters. Making an array of strings might be accomplished by:

string[30] s[40];

This would create an array of 40 strings, each 30 characters in length. Another annoying feature of C is that it provides functions for string to number, but not vice versa. This could be fixed by having a cast operator (string), which avoids having to use sprintf().

Of course, in an ideal world these strings would be even more efficient if they were coupled with the ability to use substrings s[i:j], overload the + operator for concatenation, == for equality and use !s to return the string length, but then, maybe I’m thinking of another language…

How lucid is Lucid?

In 1976, a language appeared from the University of Waterloo named Lucid. What is interesting about this unconventional “data flow” programming language is that the order of the statements in the language is irrelevant, and assignment statements are equations. It seemed to be primarily designed to carry out mathematical proofs, describing an algorithm in terms of assignments and loops. Here is a simple Lucid program to calculate the integer square root of a number N:

1  N = first input
2  first I = 0
3  first J = 1
4  next J = J + 2×I + 3
5  next I = I + 1
6  output = I as soon as J>N

Now let’s look at how it works:

  1. inputs N
  2. initialize the loop variable I
  3. initialize the loop variable J
  4. repeated, generates successive value of J
  5. repeated, generates successive value of I
  6. terminates the loop and outputs the result
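
For readers more at home with conventional control flow, the six steps above can be mimicked with an explicit loop. This is my own Python rendering, not anything from the Lucid authors, and the function name lucid_isqrt is made up. Since J always equals (I+1) squared, the loop stops at the integer part of the square root of N.

```python
def lucid_isqrt(n):
    """Mirror the six Lucid equations with an explicit loop."""
    i = 0                 # first I = 0
    j = 1                 # first J = 1
    while not j > n:      # output = I as soon as J > N
        # Both "next" values are computed from the current I and J,
        # so the order in which the two equations are written does not matter.
        i, j = i + 1, j + 2 * i + 3
    return i
```

For example, lucid_isqrt(10) steps J through 1, 4, 9 and 16, and returns 3 once J exceeds 10.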

Of course to the average programmer, this seems kind-of intuitive, not completely out of left field like some languages (Lisp anyone?). The authors describe this language as being spartan – containing NO procedures (as differentiated from functions), data structures, control structures or I/O. They go on to remark that “not having to worry about control flow is remarkably liberating” [1].

I doubt there is still a compiler anywhere, but it is a cool language to at least explore on paper.

[1] Ashcroft, E., Wadge, B., “Some common misconceptions about Lucid”, ACM SIGPLAN Notices, 15(10), pp.15-26 (1980).

The shortcut if

In a paper critiquing Pascal in 1973, A. N. Habermann commented on the if statement. He suggested that the code:

i := if i=7 then 1 else i+1

more clearly expresses that a value is assigned to i than the statement:

if i=7 then i:=1 else i:=i+1

What do you think? Is the embedded statement easier to read? Maybe for an experienced programmer, maybe not so much for a novice. It is similar to the problem found in C with the ternary operator:

i = i==7 ? 1 : i+1;

This suffers from a lack of readability, mostly related to the use of two symbols ? and : to represent then and else. Using the actual words if, then and else would be too verbose. That, and it doesn’t seem logical to everyone to embed a decision statement within an assignment. But it does make nice compact code.
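
For comparison, Python spells the same idea with words rather than symbols, although it puts the condition in the middle. The sketch below places the expression form next to the statement form; the two function names are hypothetical and exist only so the two versions can be compared.

```python
def next_index_expression(i):
    """Habermann's preferred form: one assignment whose value is chosen by a condition."""
    i = 1 if i == 7 else i + 1
    return i


def next_index_statement(i):
    """The statement form: the assignment is repeated in each branch."""
    if i == 7:
        i = 1
    else:
        i = i + 1
    return i
```

Both functions return the same value for every input; whether the first reads more clearly is, as with Habermann's Pascal example, largely a matter of experience.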

Habermann, A.N., “Critical comments on the programming language Pascal”, Acta Informatica, 3, pp.47-57 (1973).

Pascal’s Achilles Heel

The Pascal programming language was designed for teaching. Anyone who learned programming in the 1970s and 80s likely did so using Pascal. One of the main idiosyncrasies in the design of Pascal is the use of semicolons. In C, semicolons perform the task of terminating statements, so it is hard to use them in the wrong context. In Pascal, semicolons are statement separators adopted from the syntax of ALGOL. This basically means that they do not exist in places where the layout of the program would make them redundant. For example, consider this piece of code in C:

1 while (!odd(y))
2 {
3    y = y / 2;
4    x = x * x;
5 }

whereas in Pascal the code would look like this:

1 while not odd(y) do
2 begin
3    y := y div 2;
4    x := sqr(x)
5 end;

The two statements on lines 3 and 4 are separated by a semicolon. The semicolon after the end on line 5 separates the while loop from the next statement. Most Pascal compilers will also accept the following, because the extra semicolon on line 4 simply introduces an empty statement before end:

1 while not odd(y) do
2 begin
3    y := y div 2;
4    x := sqr(x);
5 end;

However, failing to add the semicolon at the end of line 5 is a problem when further statements follow, as in:

1 while not odd(y) do
2 begin
3    y := y div 2;
4    x := sqr(x);
5 end
6 y := y - 1;
7 z := x * z;

This will result in an error of the form:

Fatal: Syntax error, ";" expected but "identifier Y" found
Fatal: Compilation aborted

Similarly, a semicolon before an else statement will effectively chop the if statement in two, causing an error. The following is the correct way:

if i > j
then maxi := i
else maxi := j;

Basically if you are writing programs in Pascal, remember the following two rules:
a semicolon before ELSE is wrong;
a semicolon before END is unnecessary. 

The devolution of usability

One of my hobbies is woodworking, and one magazine I liked a lot before it disappeared was Woodworking Magazine. Or rather it disappeared by merging with Popular Woodworking, to become Popular Woodworking Magazine. What is amazing is the website evolution, because normally websites improve over time. Not so in this case. The first image shows the webpage of Woodworking Magazine in 2005. The magazine carried no ads, and its website reflected this with no ads and a very clean front page. It is clear that the current issue is the mainstay of the page.

[Image: Woodworking Magazine website, 2005]

In comparison, consider the Popular Woodworking site in 2005. It too depicted the current issue of the magazine, and was quite clean, even though there were some ads on the website. The information on the left side of the webpage is well organized, making it easy to find relevant information.

[Image: Popular Woodworking website, 2005]

After the merger circa 2010, the website too evolved into a hybrid (shown here in 2011). The menu has transformed from vertical to horizontal, and a video stream has been added.

[Image: Popular Woodworking Magazine website, 2011]

Finally, a snapshot of the website from 2016. It is now an extremely busy website festooned with advertising.

[Image: Popular Woodworking Magazine website, 2016]

Compare this to the Fine Woodworking website, which offers a much cleaner browsing experience. There are ads, but they are lower down, so as to not crowd out the content on the opening portion of the website. Everything is easy to find, and the use of whitespace makes things stand out.

[Image: Fine Woodworking website, October 2016]

Structural erosion in old code, i.e. rot

Old code rots. Not in the traditional sense of the word of course. There is no physical decay, and it doesn’t smell; it usually manifests itself as code that slows over time, or just stops working. So in reality it may be more accurate to say that it erodes in the same way that steel rusts, slowly weakening. Structural erosion of code occurs for a number of reasons. Sometimes it is because the environment itself changes. Code is ported from an old system to a new one, and runs. Yet the increased speed of the system may have a negative impact on the old code due to the type of algorithm used in the code, for example a timing loop calibrated to the speed of the old processor.

A good example is old technology. iPhones (or any phone for that matter) last for a certain number of years. At some point the technology changes, and they are no longer capable of updating the operating system. The iPhone then becomes a tomb for the software within it. Without an OS update, it is likely that before long apps will no longer be able to be updated either. Or maybe an app is no longer supported, so it may not function properly if transferred to a new iPhone. It sometimes happens with compilers as well. Really old code rots, because it requires a *lot* of changes to make it function, possibly because the software development environment has changed too much.

It is also present in websites, where dependencies such as links no longer exist, causing the website to become dysfunctional.

The user interfaces of Star Trek – vocal

One of the more interesting aspects of the computer systems on the Enterprise is the human-computer interface. Computer stations are equipped with audio I/O, and a seemingly unlimited set of words, in unrestricted English. Here’s an example:

Computer. Digest log recordings for past five solar minutes. Correlate hypotheses. Compare with life forms register. Question: Could such an entity within discussed limits exist in this galaxy? (Episode: Wolf in the Fold)

With current technology there is no way we could achieve such an understanding of English, or any language for that matter. The request also implies quite a high level of intelligence for the computer itself.

What about the whole speech thing?
So the Enterprise relies heavily on speech recognition and semantic comprehension of a natural language. Speech recognition takes phonemes (speech sounds) and tries to make them into words. In Star Trek, recognition of spoken words has been completely solved. In 1977 the capabilities were akin to 1000 words recognized for one speaker. Is it any better today? Today we have Siri, maybe the forefront of speech I/O. Microsoft apparently has a word-error-rate (WER) of only 6.3%, slightly lower than IBM’s Watson team at 6.9%. In 1995, the WER was 43% (IBM). Speech recognition has always been challenging because every person’s speech is so different, but great strides are being made.
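
For anyone wondering what a word error rate actually measures, it is conventionally the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. Here is a minimal sketch of that calculation; the function name and the simple whitespace tokenization are my own simplifications, and real evaluations normalize text far more carefully.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1       # reference word missing from the output
            insertion = current[j - 1] + 1   # extra word in the output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

A recognizer that gets one word wrong in a four-word sentence scores 0.25 by this measure, so a figure of 6.3% means roughly one word in sixteen is wrong.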

Aside from this, semantic comprehension, or understanding, is a completely different ballgame. What progress has there been on the design of algorithms to analyze statements?

Schmucker, K.J., Tarr, R.M., “The computers of Star Trek”, BYTE, Dec. pp.12-14, 172-183 (1977)