Ever wondered why goto is considered harmful?

Whenever the subject of goto comes up there is always a slew of people of people who say that goto being harmful is considered an overreaction. I doubt those people have ever tried to reengineer or even comprehend a legacy program. The main reason it is considered “harmful”, is because as complexity of a program increases, unstructured jumps can lead to an inability to understand what is happening in a program, even in small programs. Maintenance of programs with goto is also a nightmare. Saying “goto is a tool, and tools should be used” is somewhat of a simplistic argument. A rock is a tool, but we have moved far beyond using rocks for hammers, or anything for that matter (yes you could use one as such if stuck in the wilderness, but that analogy doesn’t work for programming).

To illustrate the complexity that results from the inclusion of a single arbitrary goto, consider the following flowcharts [1].

The flowchart on the left has 72 unique paths. The flowchart on the right with the dotted line signifying an unstructured jump has 93,748,416,840 unique paths [2]. Here’s the math for those interested:

The complexity added by a single unstructured jump makes it extremely hard to verify such as program. Now imagine a program where there are 10 or 100 such jumps? This is one of the principles underpinning Dijkstra’s formative letter of 1968, that programs with a lot of goto statements become complex, and therefore hard to understand. Anyone who has tried to decipher even a small Fortran IV program containing a cornucopia of arithmetic if‘s will understand.

Sources:

Krause, K.W. et al. Optimal Software Test Planning Through Automated Network Analysis (1973)
Gilb, T., Software Metrics, Winthrop Publishers Inc. (1977)

Ditching a tricky arithmetic if

A three-way branch known as the arithmetic-if, can be tricky to remove sometimes. Consider the code below:

    do 6 i = 1,3
       if (dif - c(i)) 7,7,6
  6 continue
    i = 4
  7 ffm = b(i)*exp(a(i)*dif)

The arithmetic-if works by branching based on evaluating the expression: the result is either less-than zero, zero, or greater-than zero. Here if the value of dif-c(i) is ≤0 then the program branches to label 7, otherwise if >0 it branches to label 6, which is the next iteration of the loop. Basically the loop exits when dif is less than or equal to c(i), and the value of i is used in the calculation of ffm. If the condition is never met, then the loop ends, and the value of i is set to 4, and ffm is calculated.

The first thing to do when re-engineering this, is to update the do-loop to F90+ standards, and reconfigure how the if statement works, essentially transforming it to a regular logical if, that uses goto to branch to label 7 if the difference is ≤ 0.

    do i = 1,3
       if (dif - c(i) .le. 0) then
          go to 7
       end if
    end do
    i = 4
  7 ffm = b(i)*exp(a(i)*dif)

The problem here is removing the goto and still being able to skip the statement i=4. The trick is to incorporate i=4 into the loop. For example:

do i = 1,4
   if (i == 4) then
      exit
   else if (dif <= c(i)) then
      exit
   end if
end do
ffm = b(i)*exp(a(i)*dif)

The loop runs through 4 times, when the index i becomes 4, the loop exits, and ffm is calculated. Otherwise for i is 1…3, the loop only exits if dif is less than or equal to c(i). This is only one approach, there are others of course.

Algol – Breaking the sequential flow

Algol was the first real “algorithmic” language, more so than Fortran because the latter contained a lot of structures that we couldn’t consider pleasant, and most of them had to do with what is considered breaking the sequential flow. In early Fortran there were a lot of “jump” statements, either explicitly goto, or thinly veiled as goto (see arithmetic if). In early languages goto (or go to) was often used in place of repetitive statements, either because the programmer was use to jumps (from assembler), mistrusted loops, or just didn’t consider them. Algol used the go to statement and associated labels to break the sequential flow.

Consider the arithmetic sequence, S+1/n/n where S=0 initially, and n takes on a sequence of values 1,…,∞. This can be expressed as:

Of course ∞ makes for a lot of work, so it is easier to stop the process at some point, let’s say 1.6449 (π²/6). Here is an Algol program to perform the task:

This piece of code has two jumps. The first one, which uses the label L1 mimics a loop, repetitively processing the next value of n, until such time as S is greater-than-or-equal-to 1.6449, then the second jump is invoked to label L2, effectively exiting the loop.

Removing goto’s that jump backwards

One of the issues when re-engineering an old Fortran program is the global goto’s that jump backwards. They kind-of emulate a loop, but sometimes cross over. Here is an example:

In this situation, if the conditional, C is true, then the program goto’s to label 20, otherwise it goto’s to label 10. In both situations, the code in B is executed, but only if C is false is the code in A executed. This code could look something like this:

   program gotoWITHloop
   integer :: turn
   turn = 1
10 write(*,*) "code A"
   ! This part of the code only happens when turn is 1
20 write(*,*) "code B"
   ! This part of the code only happens when turn is both 1 and 2

   if (turn .eq. 1) then
      turn = 2
      goto 20
   else
      turn = 1
      goto 10
   end if
   end

The conditional is related to whether the variable turn is 1 or 2. First time into the loop both code A and B are activated, and then the conditional is applied. If turn is equal to 1, the program jumps to label 20, then back to the conditional etc. If turn is equal to 2, then the program jumps to label 10 at the top, and flows through etc.

How can this be fixed, to remove the goto statements, and their associated labels?

One way of re-engineering this involves the use of a loop, and a subroutine.

program gotoWITHloop
   integer :: turn
   turn = 1
   do
      if (turn .eq. 1) then
         call codaA()
      end if
      write(*,*) "code B"
      ! This part of the code only happens when turn is both 1 and 2
      if (turn .eq. 1) then
         turn = 2
      else
         turn = 1
      end if
   end do
end
! This part of the code only happens when turn is 1
subroutine codeA()
   implicit none
   write(*,*) "code A"
end subroutine codeA

As the stuff in code A is only activated with turn equal to 1, it can be situated in an if statement. It also might be nice, to encapsulate the code itself into a subroutine. The labels can then be removed, and the entire code situated inside a loop.

This of course is just one way to deal with this sort of a situation. In many cases removing these goto’s involves a combination of loops, if statements, and subroutines.

Is goto evil?

What is goto?

A goto statement typically performs an unstructured jump within a program, and may be a legacy of the jump instructions found in assembler code. Its use was popular in the early years of programming, before the advent of structured programming in the 1960s and 1970s. Its first use was likely Fortran, and subsequently Cobol. Fortran eventually incorporated structured programming in the 1970s, and the dependence on goto declined. It was still incorporated into languages that followed such as C and Ada, but is missing from Java and Python. Many controls structures had their origins in the goto statement. The conditional goto used in Fortran to jumping over code evolved into the if statement with a compound body. The conditional goto for repetition turned into the structured repetitive loops: for, while, do-while, repeat-until etc. The conditional goto for jumping out a loop evolved into the break statement.

Is goto still considered harmful?

In 1968 Edsger Dijkstra wrote an article published in Communications of the ACM titled “Go-to statement considered harmful”. The paper was frequently referenced in the ensuing years, and may have laid the groundwork for the demise of unstructured programming. Ironically, Dijkstra acknowledges in “What led to ‘Notes on Structured Programming’” that the original paper was titled ‘A case against the goto statement’, and was changed by the editor, Niklaus Wirth.

A program composed entirely of assignment statements is easy to trace, because its flow is sequential, there are no unstructured branches. The addition of conditional statements doesn’t change things that much, the process still remains linearly sequential, although different paths through the code can be taken. Adding repetition statements makes the code more complex, but does not make it unstructured – it is always possible to easily trace through the code. The addition of unstructured jumps makes it much more challenging to trace through a program. In Dijkstra’s words, the goto statement is “just too primitive”, a structure which makes proving program correctness much more challenging.

So why is goto considered harmful? The #1 reason may be that it is possible to use goto in such a manner that it obscures the structure of a program to the point where it becomes impossible to decipher. William Wulf in his paper “A Case Against the GOTO” puts it another way: it is possible to use the goto to fabricate a “rat’s nest” of control flow. Let’s not forget that early languages had little in the way of structure – nearly everything was achieved using goto statements. In early Fortran, if statements were limited to a single statement, compound statements required the use of a goto statement. To perform a block of code if the value of a variable X is less than 100 required writing a statement of the form:

    IF (X .GE. 100) GO TO 50
    ...code executed if X<100
50

There was also no else statement, so a corresponding statement can be similarly fashioned:

    IF (X .GE. 100) GO TO 50
    ...code executed if X<100
    GO TO 100
50
    ...code executed if X>=100
100

Loops were created in the following manner:

   I = 0
10 I = I + 1
    ...do something in the loop
   IF (I .NE. N) GO TO 10

This may have precipitated the move to structured programming.

One can take the adage that “goto doesn’t create bad code, bad programmers do”, but essentially before structured programming there was goto, often producing a spaghetti like code. One doesn’t really appreciate the context of this until having translated a program from a “legacy” language such as Cobol or Fortran – one that contained a myriad of goto statements disguised in various forms. For example, in Fortran, there was:

the computed GO TO:

GO TO (210,220,230,240,250), i

the unconditional branch

GO TO 400

the assigned goto

GO TO N, (10,20,30,40)

and the arithmetic if (performs the same sort of unstructured branching)

IF (N) 10,20,30

Consider the following Fortran code containing a series of arithmetic if statements, and two go to statements which mimic the effect of loops:

One or two goto’s in a small program may not seem like a travesty, when they start to comprise one of the core structures in a program, the program becomes too unstructured to analyze properly. The sample Fortran program shown above is 22 lines long, and contains six unstructured jumps. Imagine what longer programs would look like?

Code containing gotos is also harder to format, and reduces the efficiency of compiler optimizations. Source code optimizations differ from the efficiencies obtained at compiler time. The use of the goto also violates the principles of structured programming. In languages such as Fortran, especially earlier dialects, loops often existed in the form of free loops, hand built structures that were dependent on the proper branching logic. For example consider the following code shown in both Fortran and C:

The problem here is that the program iterates infinitely, a legacy of the minus sign in I=I-1, which should be a + sign. I becomes negative, and the if statement never fails. Using a for loop in the case of the C program would not allow this to happen.

Modern dialects of Fortran have all but made obsolete several of these Fortran goto’s including computed go to and assigned go to.

When to jump around?

Sparse use of goto statements can be okay, for experienced programmers in specific circumstances. For example, using a single goto and single label to exit several levels of scope, or using a goto in the middle of complex construction to short cut to the top of a loop. It may also be used in error handling where there in the absence of exceptions.

Ada and goto

Every language has its hidden “features”, and Ada, like most other languages has a goto statement. Any statement could be labelled by an identifier enclosed in double angle brackets, << >>. For example:

<<gohere>> g = 12.3; ...
goto gohere;

However, the goto in Ada is somewhat better behaved. It cannot transfer control outside the current subprogram or package body; it cannot transfer control inside a structure (e.g. from else to then in an if statement); and it cannot transfer control from the outside of a structured statement into the body of a structured statement. Consider the following code:

with ada.Text_IO; use Ada.Text_IO;
with ada.Integer_Text_IO; use ada.Integer_Text_IO;

procedure gotoada is
   x,y,z : integer;
 
begin
         put_line("Enter a number: ");
         get(x);
<<lbl1>> if x = 1 then
            y := 1;
            z := 1;
<<lbl2>> else
            y := 2;
            z := 2;
         end if;

         case y is
            when 1 =>
               put_line("jump to top");
               goto lbl1;
            when others =>
               put_line("jump to middle");
               goto lbl2;
         end case;

end gotoada;

The goto associated with lbl1 is allowed. However the goto associated with lbl2, which jumps into the middle of the if statement above it, is not allowed. Try to compile this, and you’ll get the following compiler message:

gotoada.adb:24:21: target of goto statement is not reachable

Languages like Pascal would easily allow this sort of code, but Ada was designed to prevent issues with uber unstructured code.

Fortran Re-engineering: Debt repayment schedule (iii)

Now that the program looks nicer, we will deal with the first of the legacy issues, the declaration of the variables. This is simple, as it just requires the addition of the ::. For example:

integer year, bdate
real i, int, intamt

becomes

integer :: year, bdate
real :: i, int, intamt

Next we will deal with the label based loop. Here is the code that forms the loop:

      do 15 j=1,k
      year=bdate+n+j
      int=cmpamt*i
      balrd=pmtamt-int
      cmpamt=cmpamt-balrd
 15   write(*,4) year,pmtamt,int,balrd,cmpamt
 4    format(5x,i4,6x,f8.0,3x,f8.0,5x,f8.0,6x,f8.0)

To deal with this we move to the newer spec for loops, i.e. do/end do. The loop above now becomes:

do j = 1,k
   year=bdate+n+j
   int=cmpamt*i
   balrd=pmtamt-int
   cmpamt=cmpamt-balrd
   write(*,4) year,pmtamt,int,balrd,cmpamt
   4 format(5x,i4,6x,f8.0,3x,f8.0,5x,f8.0,6x,f8.0)
end do

The label has been removed, and the loop properly terminated. Notice the label 4 associated with the format statement has been indented. There is no right or wrong way of doing this, doing it this way just makes it fall in line with the rest of the indented code.

The next unstructured jump to dispose of is that associated with label 10, which basically cycles through multiple debts in the input. It looks like this:

 10   read(*,1,end=50) amt,i,n,k,bdate
      ...
      ...
      goto 10

In the read statement, the clause end=50 implies that at the end of the input, jump to label 50, which in the case of this program, is the end of the program. The first thing to do is update this. In newer versions of Fortran, one can use the parameter iostat. For example the code above is modified to:

 10   read(*,1,iostat=reason) amt,i,n,k,bdate

The status of the I/O is then returned in the integer variable reason. If reason is less than 0, then the input has ended. A simple if statement following this will deal with exiting the program if this is the case.

if (reason < 0) then
   stop
end if

The structure of the program can now be modified to incorporate a global do/end do loop. This would look like this:

do 
   read(*,1,iostat=reason) amt,i,n,k,bdate
   if (reason < 0) then 
      stop 
   end if
   ...
   ...
   stop
end do

Basically the read statement deals with the input, returning a status in the variable reason. If reason is less than zero, the program terminates, otherwise the program processes the data.

All the unstructured jumps are now gone!

How many goto’s in old code?

In 1971, Donald Knuth published a paper, “Empirical study of FORTRAN programs”, in Software Practice and Experience. In it he studied over 250,000 lines of Fortran code from Lockheed and 11,000 lines from Stanford University. Knuth determined that the four most popular statement types account for over 70% of the lines of code. What were they?

So GOTO accounted for 13% of statements in the Lockheed code, or 24,942 times. Of the 28,783 IF statements found in both studies, 8,858 statements were arithmetic IFs, while the remaining 19,925 had the form of IF(…), and 71% of those were IF (…) GO TO. So statements involving jumps of some form abounded! Now imagine re-engineering one of these *behemoth* legacy programs with *that* many goto’s.

Knuth, D.E., “Empirical study of FORTRAN programs”, Software-Practice and Experience, 1, pp.105-133 (1971).

goto not always considered harmful (in Ada)

You never thought I would say that did you? But in the right context, goto can be used for good. In languages such as Ada, which has no continue construct for its loops, there are less options. So can a goto be used to build one? The answer is yes. Here is an example:

procedure gotoEG is
  i : integer;
begin
  i := 0;
  while i<5 loop 
    if i rem 2 = 1 then 
        goto Continue; 
    end if; 
    Put(i);
    New_Line;
  <<Continue>> 
    i := i + 1;
  end loop;
end gotoEG;

This works in a very simple way. In the above snippet of code, if i evaluates to an odd integer, control is immediately transferred to the label <<Continue>>, which means the next iteration of the loop is invoked. Why doesn’t Ada have something like a continue (or skip)? Likely because it’s not really a pivotal construct. On a side note, for some reason, it wouldn’t allow the code to appear in the following manner:

  ...
<<Continue>> 
  end loop;
end gotoEG;

When trying to compile this, it threw an Ada “compilation error” because it didn’t like that there was no instruction between the label <<Continue>> and the end of the loop. Is there a way of doing this without using a goto? Yes, but it is a little bit more convoluted. In the code below, an exception is used

procedure nogoto is
  i : integer;
  Continue : exception;
begin
  i := 0;
  while i<5 loop
    i := i + 1;
    Put_Line("start loop");
    begin
      if i rem 2 = 1 then
        raise Continue;
      end if;
      Put(i);
      New_Line;
      exception
        when Continue => null;
    end;
  end loop;
end nogoto;

An example of why goto is horrible

Here is a piece of code which does a horrible thing.

It runs the loop 10 times, then the goto statement is activated and jumps into the loop, and activates the printf statement once again (with the value of the loop index i being 11). Now i is checked against the loop conditional, and because it is greater than 10, the loop is exited again, and the goto statement is again activated. This continues infinitum.

int i;
for (i=1; i<=10; i=i+1)
    OMG: printf("%d where are you when you go to?\n", i);
goto OMG;

It is not he fact that the goto has been used, but rather how it has been used. C by no means should allow a goto to jump into a loop. But C is in reality a high level abstraction of assembler, in which goto’s are fairly standard.

However it is not the goto that is the problem, it is the programmer’s who use goto’s in such a manner that they produce unstructured code, making it extremely difficult to follow.

The Craft of Coding

Musings on programming and education

goto statement