Re-engineering versus Refactoring

When dealing with legacy software, it is important to understand what can be done with it. Legacy software often consists of programs that have been left to run for a long time with few substantive changes: the “don’t fix what isn’t broken” strategy. Because compilers for languages such as Fortran are largely backward compatible, it is often possible to compile and run these old programs. Yet at some point it becomes necessary to deal with the old code. So how is this achieved? Is the code to be re-engineered or refactored?

Re-engineering means making fundamental changes to the code. Here are three core methods of re-engineering:

  1. Porting – programs are modified to work on a new hardware platform.
  2. Translation – programs are translated from a legacy language to a contemporary one.
  3. Migration – programs are converted from a legacy dialect of a language to a newer dialect of the same language.

In essence, this is no different from the work that would be done on an old building. It might be moved in its entirety to a new location, it might be completely rebuilt, or it might be made new, incorporating only the facade of the original building.

Refactoring, on the other hand, leaves things more intact. Refactoring involves changing a piece of software in such a manner that the external behaviour of the code remains unchanged, but its internal structure and architecture are enhanced. This is akin to modernizing the plumbing and electrical system of an old building: it still functions and looks the same way, but the infrastructure has been improved. Refactoring takes control of decaying code, improving the readability and maintainability of existing code. It is done to fix short-cuts, eliminate duplication and dead code, make the design and logic clear, and make better and clearer use of the programming language. It does not necessarily imply that the code is migrated to a new dialect of the language. Refactoring is often a part of the life-cycle of software, and may not be targeted specifically at legacy code.

Re-engineering and refactoring look very similar, and there are likely areas, such as migration, where they overlap. In reality, the process of dealing with legacy code often begins with refactoring and progresses to re-engineering. Where the code base is too complex, it might be worthwhile first trying to improve efficiency by improving its algorithms. If this doesn’t work, however, re-engineering might be in the cards.

Here’s an example of the possibilities when dealing with a piece of legacy code, say Fortran IV. The refactoring may involve processes such as:

  1. eliminating EQUIVALENCE statements, which specify that two or more variables or arrays in a program unit share the same memory.
  2. eliminating COMMON blocks, the main mechanism for shared storage in Fortran programs prior to Fortran 90 (which introduced modules).
  3. removing dead code: code that is never executed.


Re-engineering, on the other hand, could involve a port to a new platform, a translation to C, or a migration to Fortran 95.



An example of why goto is horrible

Here is a piece of code which does a horrible thing.

The loop runs 10 times. When it exits, the goto statement jumps back into the loop body and executes the printf statement once again, this time with the loop index i equal to 11. Control then continues with the loop's increment, so i becomes 12, fails the loop condition i <= 10, and the loop exits again, only for the goto to fire once more. The program therefore prints 11, 12, 13, and so on, ad infinitum.

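```c
#include <stdio.h>

int main(void)
{
    int i;
    for (i=1; i<=10; i=i+1)
        OMG: printf("%d where are you when you go to?\n", i);
    goto OMG; /* jumps back into the loop body; the program never terminates */
}
```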

It is not the fact that goto has been used, but rather how it has been used. C arguably should not allow a goto to jump into a loop, yet it does. But C is in reality a high-level abstraction of assembler, in which jumps like goto are standard fare.

In short, the goto itself is not the problem; the problem is programmers who use gotos in ways that produce unstructured code, making it extremely difficult to follow.

Cobol – the elephant in the room

The elephant in the room is likely Cobol (and, to a lesser extent, Fortran). It is still widely used in industry, yet often maligned in academic circles: who needs to teach that language anymore? Cobol isn’t well liked because of its verbosity, its English-like syntax (who would have thought), and its lack of structured programming. Yet it is said that more than 70% of the world’s business runs on Cobol. True, Cobol isn’t cool. It is criticized for being “out-of-date, not attractive, complex, and expensive”. Sure, there isn’t a huge demand for people with an understanding of Cobol, but neither is there a glut of people who can decipher and re-engineer Cobol programs. See, the crux is re-engineering, not coding per se.

Think everyone will be porting their Cobol code to Java? Not likely. Here is a case in point: healthcare insurer BlueCross BlueShield (check out this Computerworld article written by Robert L. Mitchell). They process nearly 10% of all healthcare claims in the U.S., and run millions of lines of optimized Cobol to process 19.4 billion online healthcare transactions annually. Cobol handles transactional workloads on mainframes extremely well, and is hard to replace in this role. Contrary to popular belief, the mainframe is not dead: Unisys, Groupe Bull, Fujitsu, and IBM all still make mainframes.

Food for thought.

The reality is that in 10 years’ time, iOS will have been replaced by something even newer, just as application development on iOS devices is morphing from Objective-C to Swift. Such platforms and applications are transitional, i.e. they don’t have a long lifespan. Yet in 20 years, Cobol will still be with us, in some form or other.

Learn a legacy language if you want to differentiate yourself from everyone else. Or better still – learn a legacy language, AND how to re-engineer it.

Want to read more? Here’s a great article by Micro Focus.