Are you writing safety critical code in C?

If you’re writing safety-critical code, e.g. for real-time systems, it is worth following a set of guidelines such as the Jet Propulsion Laboratory (JPL) C coding standard. Here are some of its main points:

  • All loops shall have a statically determinable upper-bound on the maximum number of loop iterations.
  • There shall be no direct or indirect use of recursive function calls.
  • There shall be no use of dynamic memory allocation after task initialization, i.e. no use of malloc(), sbrk(), alloca(), and similar routines.
  • The goto statement shall not be used. There shall be no calls to the functions setjmp or longjmp.
  • In compound expressions with multiple sub-expressions the intended order of evaluation shall be made explicit with parentheses.
  • The evaluation of a Boolean expression shall have no side effects.
  • Use of the C preprocessor shall be limited to file inclusion and simple macros.
  • There should be no more than one statement or variable declaration per line. A single exception is the C for-loop, where the three controlling expressions (initialization, loop bound, and increment) can be placed on a single line.
  • Functions should be no longer than 60 lines of text and define no more than 6 parameters.
  • The validity of function parameters shall be checked at the start of each public function. The validity of function parameters to other functions shall be checked by either the function called or by the calling function.
  • Compile with all possible warnings active; all warnings should then be addressed before the release of the software.

Further Reading

Holzmann, G., The Power of Ten – Rules for Developing Safety Critical Code

Making funny characters visible

Sometimes files in Unix contain weird “non-printing” characters, so called because they are invisible when displayed. This often happens because of the way text is copied from one place to another. There are a bunch of ways to find these characters, for example using cat -v, but it’s just as much fun to code it yourself. This program, vis, again comes from “The UNIX Programming Environment”, with a few tweaks. The program has two modes: (i) by default it displays non-printing characters as octal values, and (ii) with the “-s” option it strips non-printing characters from the output rather than displaying them. Here is the program, vis.c:

#include <stdio.h>
#include <ctype.h>
#include <string.h>

int main(int argc, char *argv[])
{
   int c, strip=0;
   if (argc > 1 && strcmp(argv[1], "-s") == 0)
      strip = 1;
   while ((c = getchar()) != EOF)
      if (isascii(c) && (isprint(c) || isspace(c)))
         putchar(c);
      else if (!strip)
         printf("\\%03o", c);
   return 0;
}

The program is quite simple. Lines 8-9 check whether the “-s” flag is being used and, if so, set the variable strip to 1 (true). The loop (line 10) then processes every character in the input until end-of-file is encountered. If the character c is ASCII and is either printable or whitespace (line 11), it is output as is (line 12). Otherwise it is a non-printing character: if strip is 0 it is printed in octal form (lines 13-14), otherwise it is simply dropped. The program is run as either of:

vis -s <inputfile
vis <inputfile

The first test file contains a Ctrl-H (backspace), which makes the word spaghetti display as spaghett. Running it through the program produces:

spaghetti\010

This indicates that there is a backspace character embedded at the end of the word. The second test file contains quotes (by Dijkstra) copied directly from the internet. Here is the text:

“Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” “The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.”

Running it produced the following output:

\342\200\234Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.\342\200\235 \342\200\234The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.\342\200\235

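Clearly the text contains one of the biggest troublemakers: curly double quotes. The octal sequence \342\200\234 is the UTF-8 encoding of the “left double quotation mark” (U+201C), and \342\200\235 encodes the “right double quotation mark” (U+201D). Because each is a multi-byte, non-ASCII sequence, vis shows every one of its bytes in octal.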

Ever wondered why goto is considered harmful?

Whenever the subject of goto comes up, there is always a slew of people who say that calling goto harmful is an overreaction. I doubt those people have ever tried to re-engineer, or even comprehend, a legacy program. The main reason goto is considered “harmful” is that as the complexity of a program increases, unstructured jumps make it hard to understand what is happening, and this is true even of small programs. Maintaining programs that use goto is also a nightmare. Saying “goto is a tool, and tools should be used” is a simplistic argument. A rock is a tool, but we have moved far beyond using rocks as hammers, or for anything else for that matter (yes, you could use one as a hammer if stuck in the wilderness, but that analogy doesn’t work for programming).

To illustrate the complexity that results from the inclusion of a single arbitrary goto, consider the following flowcharts [1].

Simple versus complex

The flowchart on the left has 72 unique paths. The flowchart on the right, in which the dotted line signifies an unstructured jump, has 93,748,416,840 unique paths [2].

The complexity added by a single unstructured jump makes such a program extremely hard to verify. Now imagine a program with 10 or 100 such jumps. This is one of the principles underpinning Dijkstra’s formative 1968 letter, “Go To Statement Considered Harmful”: programs with a lot of goto statements become complex, and therefore hard to understand. Anyone who has tried to decipher even a small Fortran IV program containing a cornucopia of arithmetic IFs will understand.

Sources:

  1. Krause, K.W. et al. Optimal Software Test Planning Through Automated Network Analysis (1973)
  2. Gilb, T., Software Metrics, Winthrop Publishers Inc. (1977)

Coding Cobol: Many ways of doing things

There are often different ways of doing things in Cobol. For example, in the post on reading arrays of data from a file, there is a segment of code that prints out the values in a table (array). The code uses a loop to call the paragraph print-out-planets:

perform print-out-planets varying i from 1 by 1 until i=6.
...
print-out-planets.
   display "Planet: ", i, " ", planets(i).

This could also be written without the need for a separate paragraph:

perform varying i from 1 by 1 until i=6
   display "Planet: ", i, " ", planets(i)
end-perform.

In the same way, the paragraph read-planet could be rewritten. The code from the initial post is:

read-planet.
   read input-file at end move 'y' to feof.
   if feof is not = 'y'
      move planet-info to planets(i)
      add 1 to i
   end-if.

This can be re-written as:

read-planet.
   read input-file
      at end move 'y' to feof
      not at end
         move planet-info to planets(i)
         add 1 to i.

Or in a simpler form:

read-planet.
   read input-file
   at end move 'y' to feof
   not at end move planet-info to planets(i), add 1 to i.

It is also possible to absorb the paragraph read-planet back into the calling sequence. The calling sequence currently looks like this:

perform read-planet until feof='y'.

It can instead be changed to:

   perform with test before until feof='y'
      read input-file
         at end move 'y' to feof
         not at end move planet-info to planets(i), add 1 to i
   end-perform.

What these examples show is that in Cobol there are many ways to write the same thing. Some are a little trickier than others, and code that works in a paragraph cannot always be simply cut and pasted into a loop.

Coding Cobol: Tricks with arrays (or tables)

Arrays in Cobol are called tables, and they are a bit odd. For example, the following code creates a 1D table with 5 elements, each of type x(5), i.e. a “string” of 5 ASCII characters.

01 planets occurs 5 times.
   03 swplanet pic x(5).

Accessing each of the strings is done using the construct planets(x), where x is the index. This is basically equivalent to:

01 planets pic x(5) occurs 5 times.

To make the individual characters of each string accessible, the declaration can be modified to:

01 planets occurs 5 times.
   03 swplanet pic x occurs 5 times.

Now planets is still a 1D table, but each of its elements is composed of swplanet, which is itself a 1D table, so in effect swplanet is a 2D table. For example, if the data stored in planets is:

Endor
Jakku
Naboo
Jedha
Yavin

Then planets(1) is Endor, planets(2) is Jakku, etc. Now swplanet(i,j) accesses the jth element of the ith row, e.g. the value of swplanet(2,1) is J (from Jakku). The code below prints the individual elements of the 2D table on a single line.

   perform varying i from 1 by 1 until i=6
      perform varying j from 1 by 1 until j=6
         display swplanet(i,j), " " with no advancing
      end-perform
   end-perform.

Here is the output (note that without “with no advancing”, each element would be output on a separate line):

E n d o r J a k k u N a b o o J e d h a Y a v i n

You can also associate indexes directly with a table (meaning they don’t need to be declared independently). For example:

01 planets occurs 5 times indexed by i.
   03 swplanet pic x occurs 5 times indexed by j.

Now, to change the indexes, we need to use the set statement rather than move or add. For example, to read in the data from a file, the code changes from:

move 1 to i.
perform with test before until feof='y'
   read input-file
      at end move 'y' to feof
      not at end move planet-info to planets(i), add 1 to i
end-perform.

to:

set i to 1.
perform with test before until feof='y'
   read input-file
      at end move 'y' to feof
      not at end move planet-info to planets(i), set i up by 1
end-perform.

Coding Cobol: Reading in an array of things

Arrays in Cobol are actually tables, and reading them in can be tricky, depending on how the data is stored in the file. Unlike in other languages, it makes a big difference whether the data items sit next to each other on one line or on separate lines. Here is an example: a list of Star Wars planets shown in two forms.

EndorJakkuNabooJedhaYavin
Endor
Jakku
Naboo
Jedha
Yavin

Notice that the data only contains planets with 5-letter names; names of any other length would require padding, because Cobol reads fixed-length fields. So let’s look at a program which reads the first form of the data. Remember, Cobol is designed to read and process specific types of data. Here is one program that reads in the first set of data and processes it.

identification division.
program-id. swp.

environment division.
input-output section.
file-control.
select input-file assign to dynamic ws-fname.

data division.
file section.
fd input-file
   record contains 25 characters.
01 planet-info.
   03 planet pic x(5) occurs 5 times.

working-storage section.
77 i pic 99.
77 ws-fname pic x(30).

procedure division.
   display "Enter the filename: " with no advancing.
   accept ws-fname.

   open input input-file.
   read input-file
   end-read.
   close input-file.

   move 1 to i.
   perform print-out-planets
      until i is greater than 5.
   stop run.

print-out-planets.
   display "Planet: ", i, " ", planet(i).
   add 1 to i.

Try using the second set as input and it won’t work. Why? Because the input contains end-of-line characters, which end up inside the 25-character record and garble the fields. This program works okay, but in practice input records should not be used directly for further processing. It is better to read in the values one at a time, and then store them in a table (array). First make sure there is an “organization is line sequential” clause in the file-control section.

identification division.
program-id. swp.

environment division.
input-output section.
file-control.
select input-file assign to dynamic ws-fname
   organization is line sequential.

data division.
file section.
fd input-file.
01 planet-info.
   03 planet pic x(5).

working-storage section.
77 i        pic 99.
77 ws-fname pic x(30).
77 feof     pic a(1).
01 planets occurs 5 times.
   03 swplanet pic x(5).

procedure division.
   display "Enter the filename: " with no advancing.
   accept ws-fname.

   move 1 to i.
   open input input-file.
   perform read-planet until feof='y'.
   close input-file.

   perform print-out-planets varying i from 1 by 1 until i=6.
   stop run.

read-planet.
   read input-file at end move 'y' to feof.
   if feof is not = 'y'
      move planet-info to planets(i)
      add 1 to i
   end-if.

print-out-planets.
   display "Planet: ", i, " ", planets(i).

Note that the file description for input-file describes a single record, which the paragraph read-planet uses to read in one planet at a time and, if end-of-file has not been reached, store it in the table planets. This way the whole file is read sequentially. Note also the use of separate paragraphs to perform specific tasks. Paragraphs are effectively similar to functions in other languages, except that all variables are global.

A process zapper in C

WARNING: Programmer discretion is advised.
Post contains system programming, C, terminated processes.
Please don’t play with systems programming unless you know what you are doing.

As useful as “Force Quit” is in OSX, sometimes it doesn’t do the job properly, and you may be forced to dig a little deeper using the command line. While it is easy to terminate a process using the Unix command kill, it is sometimes less easy to find the actual process ID. In “The UNIX Programming Environment”, Kernighan and Pike introduce a program called zap, written in C, which is an “interactive process killer”. I have reproduced the code below with a few fixes and tweaks to the original. It is a good example of how C can be used for basic systems programming, and why it is such a powerful language in this realm.

The basic premise of the program is the use of the Unix command ps, which shows active processes. Used without any flags, ps only lists processes that have a controlling terminal, so applications won’t be shown. To show them we need the -ax flags, which list the processes of all users, including those without a controlling terminal. The program uses the function popen(), which is similar to fopen() except that it runs a command and opens a pipe to its output. The program zap.c is shown below. The process to search for is provided as an argument to zap, e.g. zap firefox.

#include <stdio.h>
#include <signal.h>
#include <string.h>
char *progname;
char *ps = "ps -ax";

int main(int argc, char *argv[]){
   FILE *fin;
   char buf[BUFSIZ];
   char rsp;
   int pid;

   progname = argv[0];
   if ((fin = popen(ps, "r")) == NULL){
      fprintf(stderr, "%s: can't run %s\n", progname, ps);
      return(1);
   }
   fgets(buf, sizeof(buf), fin);
   while (fgets(buf, sizeof(buf), fin) != NULL)
      if (argc == 1 || strstr(buf, argv[1]) != NULL) {
         buf[strlen(buf)-1] = '\0';
         printf("force quit: %s? [y/n]", buf);
         scanf(" %c",&rsp);
         if (rsp == 'y'){
            sscanf(buf, "%d", &pid);
            printf("terminating process %d\n", pid);
            kill(pid, SIGKILL);
         }
      }
   pclose(fin);
   return(0);
}

The code on lines 14-17 checks whether popen() can run the command “ps -ax”. If it can’t, an error message is printed. Line 18 reads and discards the header line generated by “ps -ax”. The loop in line 19 iterates through each remaining line of the output. If no argument was given, or the line (buf) contains the search string (argv[1]), the line is processed further (line 20). Line 21 removes the trailing '\n'. Lines 22-23 prompt the user to indicate whether the process is to be terminated. If the response is yes (line 24), the first value in the line, an integer representing the process ID, is extracted from buf and stored in the variable pid (line 25). Finally, the process is terminated using the function kill(), declared in signal.h (line 27).

Here is a sample run:

% zap top
force quit: 38555 ttys039    0:00.79 top? [y/n]y
terminating process 38555
force quit: 38556 ttys046    0:00.01 ./a.out top? [y/n]n

Note that the changes from the original code include the use of strstr() replacing K&P’s function strindex(), and the replacement of K&P’s function ttyin() with a simple scanf(). There was no strstr() function in the standard string library at the time K&P wrote the book.

This works on OSX, and may have to be modified for other Unix variants.

The Myths of Fortran

I find it funny when people say things like “It is an old language written for older mainframes”, or even better “It is a very specialized language that require speed of calculations so mostly used in scientific computation or on legacy systems.”. The people that make these comments are clueless. Clueless and naive. They probably think OO is great, and C++ is a perfectly good language.

Yes, Fortran is old, but then so too is C, and few people seem to complain about C, or its derivatives, which are marginal modifications at best. Fortran is 65 years old this year, C is 50. The age of a language doesn’t matter at all. What matters is its ability to do things, and Fortran can do a *lot*. Speed-wise it’s up there with C, and it makes processing things like arrays very easy. There’s also less to worry about in terms of memory than in, say, C. People just don’t like Fortran because they have had little or no exposure to it. They like C, or Java, or C++ because that’s all they know. Sure, Fortran is an old language, and there is still plenty of legacy code out there, but modern Fortran is nothing like Fortran I.

Let’s dispel some of the myths:

  • Myth: Fortran is old.
    • Reality: Define old? Modern versions of Fortran are just as useful as any other modern language, without some of the baggage, and still allowing for backwards compatibility. Don’t be ageist.
  • Myth: Fortran has no pointers.
    • Reality: Pointers were actually introduced in Fortran 90, and are many times easier to understand than those in C.
  • Myth: Fortran doesn’t allow recursion.
    • Reality: Recursion was introduced in Fortran 90… if you really need to use it. People tend to think recursion is some sort of panacea, but it’s not.
  • Myth: Fortran only does implicit typing.
    • Reality: Fortran does have implicit typing rules, but since F77 they can be overridden by declaring the variable. Also, using “implicit none” enforces explicit declaration of all variables.
  • Myth: Fortran is fixed format.
    • Reality: Fortran allows free-formatting since F90.
  • Myth: Fortran uses lots of goto statements.
    • Reality: Fortran did once use a lot of goto statements in many differing guises, e.g. the ubiquitous arithmetic IF. Modern Fortran does not; in fact, the arithmetic IF is now an obsolescent feature. Besides which, goto statements don’t write themselves.
  • Myth: Fortran doesn’t use modern programming constructs.
    • Reality: All control structures in Fortran are modern.
  • Myth: Fortran promotes spaghetti code.
    • Reality: No, no, and no again. Yes, there was once a time when Fortran’s structures promoted the use of spaghetti code, but the programmer mentality at the time didn’t help much. You can write unstructured spaghetti code in any language.
  • Myth: Fortran isn’t portable.
    • Reality: Sure, once upon a time there were different Fortran compilers on different systems. People also used 8″ floppy disk drives. But Fortran 66 was the first “standard”. Now, Fortran is no less portable than C. There are standard Fortran compilers for every major platform that exists today.
  • Myth: Fortran is dead.
    • Reality: No, people write Fortran programs every day. The biggest problem is that institutions of higher learning don’t teach Fortran, so many programmers are exposed only to languages of the C family.
  • Myth: Fortran is hard to learn.
    • Reality: Fortran is easier to learn than many languages. The syntax of the language is very easy for anyone who can understand English. It does not overuse symbols to make code more succinct (like C does), does not push pointers on the user (like C does), and does not do stupid things (like C does, e.g. the dangling else).

Fortran is not outdated, nor is it complex, nor a state of mind. Fortran is just as relevant today as any other language, and probably just a tad less complex.

Coding Cobol: A note on strings

There are two types of string in Cobol, and they are very different. The first type treats the item created as a single entity, i.e. it is not possible to index this type of string. These strings were created in Cobol to store things like people’s names, and Cobol provides a lot of functionality to process them; they just cannot be subscripted because they are not arrays. Here is an example of a program that reads in a 10-character string and tries to print it out element by element.

identification division.
program-id. strings.

data division.
working-storage section.
01 str1 pic x(10).
77 i pic 99.

procedure division.
    accept str1.
    perform varying i from 1 by 1 until i=11
       display str1(i)
    end-perform.
stop run.

When compiled, this will produce the error message “error: 'str1' cannot be subscripted”. It cannot be subscripted because it is a string, but it can be sub-stringed (reference modification) using str1(i:x), where i is the starting position and x is the length of the substring (x=1 for a single element), as noted in the comments.

In order to create a string that can be subscripted, you need an array of characters (or rather a table as an array is known in Cobol). Below is the same example as above using an array.

identification division.
program-id. strings.

data division.
working-storage section.
01 str-arr.
   03 str2 pic x occurs 10 times.
77 i pic 99.

procedure division.
    accept str-arr.
    display str-arr.
    perform varying i from 1 by 1 until i=11
       display str2(i)
    end-perform.

stop run.

Notice two things. First, the array is created with both a group name, str-arr (line 6), and an indexed name, str2 (line 7). Using str-arr allows the array of characters to be read in as a whole, rather than with a loop (line 11), and also output as a whole (line 12). Second, if you want to index the array, you use str2, as shown in the loop in lines 13-15. The input and output from the above code is shown below:

photograph
photograph
p
h
o
t
o
g
r
a
p
h

Dijkstra on PL/1

“…I must mention PL/1, a programming language for which the defining documentation is of a frightening size and complexity. Using PL/1 must be like flying a plane with 7000 buttons, switches and handles to manipulate in the cockpit. I absolutely fail to see how we can keep our growing programs firmly within our intellectual grip when by its sheer baroqueness the programming language —our basic tool, mind you!— already escapes our intellectual control.”

Dijkstra, The Humble Programmer (1972)