Calling a C function from Cobol

So, you may not like C, but imagine a world where it’s possible to call C from Cobol!

That world is this world! Here is an example of a C function say() which prints out two 6-character strings:

#include <stdio.h>

int say(char hello[6], char world[6])
{
    int i;
    for (i=0; i<6; i=i+1)
        putchar(hello[i]);
    for (i=0; i<6; i=i+1)
        putchar(world[i]);
    putchar('\n');
    return 0;
}

This code is then compiled into an object file:

gcc -c say.c -o say.o

Here is the corresponding Cobol program which calls it.

identification division.
program-id. hello.
environment division.
data division.
working-storage section.
01 hello pic x(6) value "hello ".
01 world pic x(6) value "world!".
procedure division.
  call "say" using hello world.
  stop run.

Which is compiled in the following manner:

cobc -x -Wall -free -O hello.cob say.o

Okay, so this *looks* easy… and it is. But it gets harder with integers, and arrays of integers, largely because Cobol and C represent numeric values differently. Since cobc compiles to C, this shouldn’t be an issue, but it seems to be.
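To make the numeric mismatch concrete, here is a minimal sketch of the C side of two integer hand-offs. The function names twice() and digits_to_int() are made up for illustration, and the Cobol declarations in the comments are assumptions about GnuCOBOL (that binary-long is a native 32-bit integer, and that an unsigned pic 9(n) display item is stored as ASCII digit characters); check them against your compiler before relying on them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical companion to say(). Assumed Cobol side:
 *   01 n usage binary-long value 21.
 *   call "twice" using n.
 * Cobol passes arguments by reference by default, so C receives a pointer. */
int twice(int *n)
{
    assert(n != NULL);
    *n = *n * 2;
    return 0;
}

/* Hypothetical helper. Assumed Cobol side:
 *   01 d pic 9(4) value 42.
 * A display item holds the characters "0042", not a binary integer,
 * so the C side has to convert the digits itself. */
int digits_to_int(const char *digits, int len)
{
    int i, value = 0;
    assert(digits != NULL && len >= 0);
    for (i = 0; i < len; i = i + 1)
        value = value * 10 + (digits[i] - '0');
    return value;
}
```

The point of the sketch is that the same Cobol number can arrive either as a native integer or as a run of digit characters, depending on how it was declared.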

UPPERCASE 2 lowercase conversion made easy

Got a legacy file in UPPERCASE – horrible, a crime against coding. But a lot of old code is like this (BASIC anyone??). In Unix it’s super easy to convert using the command tr. tr translates characters, and in the case below maps UPPERCASE characters to lowercase.

tr '[:upper:]' '[:lower:]' < inputFile > outputFile
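For the curious, what tr is doing here can be sketched in a few lines of C. This is only an illustration of the idea, not how tr itself is implemented; the function names lower_in_place() and lower_stream() are invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Convert every uppercase character of a string to lowercase, in place. */
void lower_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s = s + 1)
        *s = (char)tolower((unsigned char)*s);
}

/* Copy one stream to another, lowercasing as it goes.
 * Returns the number of characters copied. */
long lower_stream(FILE *in, FILE *out)
{
    int c;
    long count = 0;
    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        putc(tolower(c), out);
        count = count + 1;
    }
    return count;
}
```

Calling lower_stream(stdin, stdout) from a main() gives a filter that can be used with the same < inputFile > outputFile redirection as the tr command.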

History of languages: 1990s – super blah

The great age of language design, already waning in the 80’s, declined further in the 1990s. Okay, so Java appeared in 1995, a creation of Sun Microsystems (now Oracle), and has been successful. BUT, at the time the thinking was that it would displace many of the ingrained languages in their specific domains. This never really happened. Many universities took to teaching Java, but by the early 2000s this trend had already started to reverse.

The greatest contribution of the 90’s was likely Python. Python, developed by Guido van Rossum, first appeared in 1991. Although not a “compiled” language, it was easy to learn, and extend.

Towards the end of the decade, the C standard moved to C99, incorporating new data types such as long long and complex, variable length arrays, and single-line comments in the form of // adopted from C++. Fortran too evolved, with substantial modifications, first to F90, then to F95. Fortran 90 was quite a radical revision, and one of its most interesting changes may have been the transition from fixed to free-form source formatting. F90 also introduced modules, user-defined datatypes, array operations and features (sub-arrays, slicing), pointers, and of course recursion. F95 was, in comparison, a minor revision. However the changes made in the 1990’s allowed Fortran to remain relevant as a programming language.

Calculating π

In the second season Star Trek episode “Wolf in the Fold” (1967), Captain Kirk and Mr. Spock force an evil entity out of the starship Enterprise’s computer by commanding the computer to “compute to the last digit the value of π”, thus sending the computer into an infinite loop. There is of course no last digit of π. In October 2014, a multi-threaded program called y-cruncher was used to derive π to 13.3 trillion digits (achieved by anonymous user “houkouonchi”) – it took 208 days on a system with two Xeon E5-4650L processors (2.6 GHz). Compare this to Ludolph van Ceulen (1540–1610), a German mathematician who spent most of his life calculating π. He managed 35 digits:

3.14159265358979323846264338327950288

The history of π encompasses many centuries. In the ancient world the value of π was frequently taken to be 3. A Babylonian clay tablet also gives a value of 3 + 1/8 or 3.125, while the Rhind Papyrus, which is dated 1650 B.C., gives a value of 4(8/9)² or 3.16. The first attempt to compute π seems to be due to Archimedes of Syracuse (c.287 – 212 B.C.), who used a sequence of regular polygons inscribed in and circumscribed about a circle. The perimeters may be used to give lower and upper bounds for the value of π, and as the number of sides increases these bounds give better and better estimates. Beginning with regular hexagons and doubling the number of sides successively, Archimedes computed the perimeters of inscribed and circumscribed polygons with 6, 12, 24, 48 and 96 sides. For a 96-sided polygon he found that

3 + 10/71 < π < 3 + 10/70

which is equivalent to a value for π of 3.14 to two decimal places.
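The polygon-doubling idea can be sketched in C. This is a modern restatement rather than Archimedes’ own hand arithmetic: it assumes a circle of radius 1, tracks the half-perimeters of the circumscribed and inscribed polygons, and uses the standard harmonic-mean and geometric-mean doubling formulas. The names newton_sqrt() and archimedes_bounds() are invented for the example, and the square root is done with Newton’s iteration so that nothing beyond the standard C library is needed.

```c
#include <assert.h>
#include <stdio.h>

/* Square root by Newton's iteration (a helper for this sketch only). */
static double newton_sqrt(double v)
{
    double r = v > 1.0 ? v : 1.0;
    int k;
    assert(v > 0.0);
    for (k = 0; k < 60; k = k + 1)
        r = 0.5 * (r + v / r);
    return r;
}

/* Bounds on pi from polygons with 6, 12, 24, ... sides around a unit circle.
 * doublings = 4 gives the 96-sided polygons. */
void archimedes_bounds(int doublings, double *lower, double *upper)
{
    double a = 2.0 * newton_sqrt(3.0);  /* circumscribed hexagon, half-perimeter */
    double b = 3.0;                     /* inscribed hexagon, half-perimeter */
    int k;
    assert(doublings >= 0 && lower != NULL && upper != NULL);
    for (k = 0; k < doublings; k = k + 1) {
        a = 2.0 * a * b / (a + b);      /* harmonic mean: new circumscribed */
        b = newton_sqrt(a * b);         /* geometric mean: new inscribed */
    }
    *lower = b;
    *upper = a;
}
```

With four doublings the two bounds land inside the interval Archimedes gave for the 96-sided polygons, and each further doubling narrows the gap.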

In 1946 ENIAC (Electronic Numerical Integrator and Computer) was unveiled, a “giant brain” in its time, capable of 5000 additions a second. It is hard to compare the “speed” of these antiquated systems with modern systems, but a good way is to compare the time taken to perform various calculations. ENIAC calculated 2037 digits in 70 hours in 1949. The Apple II, with 16KB of RAM, performed the same calculation in 40 hours in 1978. No doubt the same algorithm now would run in a fraction of a second. Computing has come a long way in a little over six decades.

The question of course is why do we need to calculate pi to such accuracy? Partially as a benchmark. It takes a lot of effort to crunch numbers, and is a good indicator of how well a system can deal with a certain algorithm. Accuracy? Unlikely. The International Space Station Guidance Navigation and Control (GNC) subsystem performs calculations using 15 digits of π. Calculating π to 39 digits allows you to measure the circumference of the observable universe to within the width of a single hydrogen atom.

So why do humans strive to calculate as many digits as possible for π? Not because of any practical application – but rather to push the boundaries of what can be calculated, which is no different than trying to break the land-speed record.

Why nested functions can be cool.

Have you ever tried to use nested functions? ANSI C doesn’t like them, and only GNU C supports them. Why are they good, aren’t they as evil as goto statements and global variables? I don’t think so. Most people aren’t used to them because they generally don’t occur in newer languages, but in languages such as Pascal, Ada, and Modula-2, they reign. Why are they useful? For information hiding of course!

Let’s consider a version of the Factorial algorithm in which the actual recursive function fact(), has no parameters, and returns nothing, using global variables instead. But not “global” variables in the same sense as C – variables which are accessible from a non-recursive wrapper function, factorial(). The wrapper function is called by the user and initiates the call to fact(), maintaining all the variables (n), and returning the result. The user has no access to fact(), as it is nested within factorial().

function factorial(x: integer): integer;
  var n: integer;
  
  function fact(): integer;
  begin
    if n = 0 then
      fact := 1
    else begin
      n := n - 1;
      fact := (n+1) * fact();
    end
  end;

begin
  n := x;
  factorial := fact();
end;

It involves some modifications of the recursive function fact(), most noticeably the decrement of the variable n, prior to the recursive call. In earlier times, the replacement of parameters and local variables by “global” variables was a means of eliminating inefficient run-time stack manipulations of the variables and parameters.
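For comparison, here is a sketch of the same wrapper written with GNU C nested functions. It relies on a GCC extension, so it is not standard C and should not be expected to compile elsewhere. One change from the Pascal is deliberate: C does not fix the order in which the two operands of * are evaluated, so the value of n is saved in a local (k, a name introduced here) before the recursive call changes it.

```c
#include <assert.h>

int factorial(int x)
{
    int n;                  /* shared with fact(), invisible to callers */

    int fact(void)          /* nested function: a GNU C extension */
    {
        int k;
        if (n == 0)
            return 1;
        k = n;              /* save n before the recursive call alters it */
        n = n - 1;
        return k * fact();
    }

    assert(x >= 0);
    n = x;
    return fact();
}
```

As in the Pascal version, code outside factorial() cannot call fact() at all, which is the information hiding being argued for.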

Programs written in other languages: German

Everyone writes programs in English don’t they… I mean programming languages are written in English, so it stands to reason that the rest of the program is in English too, right?

Not so. We quickly forget that just because the language is in English does not mean the program is. Take for example the following program from the German book “PASCAL in 100 Beispielen” (1983). Can you figure out what it does?

[Figure: germanPascal]

It basically asks a farmer the length and width of his field, then calculates the area in m², following up with a request for the cost per m², and a calculation. If you code and run the program you can figure out roughly what it is doing. Google Translate would help with the German words, although it’s not perfect. It translates the phrase in the first write statement to: “Bauer Ignaz how long is your field.” For some reason it couldn’t translate Bauer to farmer. Give it the word by itself, and it does translate it. So what would a Pascal program look like if we converted the core keywords of Pascal to German?

program = Programm 
input = Eingang
output = Ausgabe
var = Var
begin = Start
end = Ende
write = schreiben
read = lesen
writeln = schreiben + Linie (Line) = schreibenLn
readln = lesen + Linie (Line) = lesenLn

Interesting? Some of the keywords are similar to the English ones, while others, constructed in the same manner as their English counterparts, could be quite long.

Is code optimization evil?

In Donald Knuth’s 1974 paper Structured Programming with go to Statements¹, he makes the following statement:

“There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”

This was the early 1970s, a time when computers still had very little memory, so optimization was key in using all the available resources. Is code optimization still evil? Likely not as much as it once was. In the age of mobile and embedded systems which require efficient code, it may be as important as ever. For example, inefficient code could lead to premature battery drain on a mobile device. In code from inexperienced programmers, inefficiency often manifests itself as simple things such as redundancy due to the recomputation of common expressions, or loops containing loop-independent expressions. For example, take the simple quadratic equation, which would be transformed into the following statements in C:

r1 = (-b + sqrt(b * b - 4.0 * a * c)) / (2.0 * a);
r2 = (-b - sqrt(b * b - 4.0 * a * c)) / (2.0 * a);

The problem with this code is that each expression contains elements that are repeated: the square root of the discriminant, √(b²-4ac), and the subexpression 2a are both computed twice. This code would be more efficient if written as:

denom = 2.0 * a;
sdisc = sqrt(b * b - 4.0 * a * c);
r1 = (-b + sdisc) / denom;
r2 = (-b - sdisc) / denom;

As another example, consider the following piece of code:

x = 3.4;
for (i=0; i<100; i=i+1)
    y = y + a[i] * (x*x + 3.0*x + 2.0);

Here the expression x²+3x+2 is independent of the loop variable i, but would still be independently calculated 100 times. This would be more efficient if replaced by:

x = 3.4;
z = (x*x + 3.0*x + 2.0);
for (i=0; i<100; i=i+1)
    y = y + a[i] * z;

It is sometimes the small things that make all the difference, maybe not in languages such as C, but certainly in languages such as Python which are more susceptible to slow code.

¹ACM Computing Surveys, 6(4), p.268, (1974)

Classic computer science papers you should read.

Many of the early papers on programming languages published in journals such as “Communications of the ACM” prior to the 1990’s were extremely readable, written by the computer science pioneers. They were often very opinionated, but ironically resonate even today with the challenges posed within the industry. Here is a sample of the more interesting papers.

  • Dijkstra, E.W., “Programming considered as a human activity”, EWD117 (1964-67)
  • Dijkstra, E.W., “go to statement considered harmful”, Communications of the ACM, 11(3), pp.147-148 (1968)
  • Dijkstra, E.W., “The humble programmer”, Communications of the ACM, 15(10), pp.859-866 (1972)
  • Holt, R.C., “Teaching the fatal disease (or) Introductory computer programming using PL/I”, SIGPLAN Notices, pp.8-23 (1973)
  • Wirth, N., “On the composition of well-structured programs”, Computing Surveys, 6(4), pp.247-259 (1974)
  • Wirth, N., “On the design of programming languages”, in IFIP Congress 74, pp.386-393 (1974)
  • Wirth, N., “An assessment of the programming language Pascal”, IEEE Trans. Software Engineering, 1(2), pp.192-198 (1975)
  • Hoare, C.A.R., “The emperor’s old clothes”, Communications of the ACM, 24(2), pp.75-83 (1981)

If you program, you should read these papers, to gain a historical insight into the field today.

How smart is AI?

So Google’s AI beats a human playing Go (4:1), an ancient game played all over East Asia. It’s not the first time AI has beaten a human playing a game. In 1997, chess master Garry Kasparov lost to IBM’s Deep Blue. AI has also won games of checkers, Othello and Jeopardy!. But what does this really mean? It means that artificial intelligence is good at winning games where the rules are quite succinctly defined. That is not to say Go is an easy game to play; winning it did involve a level of real artificial intelligence known as deep learning. The game is played on a 19×19 board, with the goal being to gain the most territory by placing and capturing black and white stones. The average game is 150 moves, with a possible 10^170 board configurations.

The question of course is how smart is AI really? Games like Sudoku are *easy* to solve because they have nice, concise rules. In fact an algorithm to solve Sudoku using recursion is relatively simple. Other games are not so. Will AI ever be smart enough to play a game of Jenga? Snakes and ladders? Probably not… in fact any game that involves some form of creativity will likely not be solved by AI. Creativity in general is not something computers do well.
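To back up the claim about Sudoku, here is a minimal sketch of a recursive backtracking solver in C. It is one straightforward way to do it, not the only one, and the names can_place() and solve() are chosen for this example. The grid is a 9×9 array in which 0 marks an empty cell.

```c
#include <assert.h>

/* Return 1 if digit d can go at row r, column c without breaking the rules. */
static int can_place(int g[9][9], int r, int c, int d)
{
    int i;
    for (i = 0; i < 9; i = i + 1) {
        if (g[r][i] == d || g[i][c] == d)
            return 0;                       /* already in this row or column */
        if (g[r - r % 3 + i / 3][c - c % 3 + i % 3] == d)
            return 0;                       /* already in this 3x3 box */
    }
    return 1;
}

/* Fill the empty cells from position pos (0..80) onward.
 * Returns 1 if the grid was completed, 0 if there is no solution. */
int solve(int g[9][9], int pos)
{
    int r, c, d;
    assert(pos >= 0 && pos <= 81);
    if (pos == 81)
        return 1;                           /* ran off the end: solved */
    r = pos / 9;
    c = pos % 9;
    if (g[r][c] != 0)
        return solve(g, pos + 1);           /* given cell, skip it */
    for (d = 1; d <= 9; d = d + 1) {
        if (can_place(g, r, c, d)) {
            g[r][c] = d;
            if (solve(g, pos + 1))
                return 1;
            g[r][c] = 0;                    /* undo and try the next digit */
        }
    }
    return 0;
}
```

Calling solve(grid, 0) fills the grid in place. The whole program is little more than the rules of the game written down, which is exactly why this kind of puzzle is easy for a computer.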

Exceptions are the Wookiees of Ada

“It’s not nice to upset a Wookiee.” 
— Han Solo

Wookiees are quite strong, and are known to rip people’s arms out of their sockets when provoked. In Ada, the role of the Wookiee is taken over by the exception. For those who have only ever coded in C, Ada is a bit of a culture shock. Partially because it won’t allow things that C-like languages do. And when Ada doesn’t like something it’s like being whacked by a Wookiee.

Array goes out of bounds… WHACKED BY A WOOKIEE.
File doesn’t exist?… WHACKED BY A WOOKIEE.

But the reason people find exceptions challenging is because they, like a Wookiee, can be somewhat in-your-face. Isn’t that a good thing? Here’s an example piece of Ada code which reads in 10 integers and prints them in reverse:

with Ada.Text_IO, Ada.Integer_Text_IO;
use Ada.Text_IO, Ada.Integer_Text_IO;

procedure arrayRev is
type intArray is array (1..10) of integer;
Value : intArray;
begin
 for i in 1..10 loop
   Get(Item => Value(i));
 end loop;
 for i in reverse 1..11 loop
   Put(Item => Value(i));
   New_Line;
 end loop;
end arrayRev;

Now compile and run this code, entering the following numbers:

1 2 3 4 5 6 7 8 9 10 11 12

The program reads the first 10 numbers in fine and stores them, and the remaining two are simply never read. The problem arises when the program attempts to access the 11th element of the array Value, which of course does not exist. Here is the “constraint” exception raised by Ada when this is attempted:

raised CONSTRAINT_ERROR : arrayrev.adb:12 index check failed

This indicates an index check failed on line 12 of the program, in the call to Put. Better still, see what happens when we attempt to enter a character in the input:

1 2 3 4 c 5 6 7 8 9

Ada does not like the fact that the letter “c” is not an integer, and raises an “IO” exception:

raised ADA.IO_EXCEPTIONS.DATA_ERROR : a-tiinio.adb:87 instantiated at a-inteio.ads:18

C would allow both these things to happen quite happily. Ada won’t.
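To see the difference from the C side, here is a minimal sketch of the checks a C programmer has to write by hand to get similar protection. Both function names, checked_get() and parse_int(), are invented for this example; nothing like them is built into the language, which is rather the point.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Bounds-checked array read: returns 0 on success,
 * -1 if index i is outside 0..len-1. */
int checked_get(const int *a, size_t len, size_t i, int *out)
{
    assert(a != NULL && out != NULL);
    if (i >= len)
        return -1;
    *out = a[i];
    return 0;
}

/* Parse an integer from text: returns 0 on success,
 * -1 if the text does not start with a number or the value does not fit. */
int parse_int(const char *text, int *out)
{
    char *end;
    long v;
    assert(text != NULL && out != NULL);
    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}
```

Unlike an Ada exception, both return values can be silently ignored by the caller, and a plain a[i] without the check compiles without complaint.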