Programs that are too messy to work with directly, especially those that qualify as “spaghetti code”, are often better handled by reverse engineering. Reverse engineering analyzes the program to determine exactly what it does and then uses that information to construct a new program with the same functionality. This example is well suited to the approach because of its arithmetic `if` statements and the general confusion they cause. In terms of time, it is not worth the effort to reengineer the code piece by piece in the traditional sense.
This program basically does the following:
1. Read in a series of characters from the standard input into a buffer string, `bufr`.
2. Set the value of the index, `kt`.
3. Parse the string `bufr`, using the index `kt`. Each element from the string is converted to its respective integer ASCII value, `c`.
4. The following actions are then performed:
   - If `c` is a period, “.”, it designates the end of a sentence, and the sentence counter, `ns`, is incremented.
   - If `c` is a slash, “/”, it designates the end of input, and the program proceeds to output the statistics and terminate.
   - If `c` is not a comma, “,”, semicolon, “;”, or dash, “-”, (i.e. any other character) the character counter, `nc`, is incremented.
5. The index value `kt` is incremented. If `kt` is less than the length of the buffer, the program loops back to 3, otherwise it loops back to 1.
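The steps above can be sketched in structured Fortran. This is only an illustration of the listed logic, not the original code: the module name `scan_sketch`, the subroutine `scan_buffer`, and the `done` flag are hypothetical, and we assume the three tests are mutually exclusive and applied in the order given. `ns` and `nc` stand for the sentence and character counters. Word counting is left out because the listed steps do not describe it.

```fortran
module scan_sketch
   implicit none
contains
   ! Scans one buffer following the listed steps (hypothetical helper).
   ! ns: sentence counter, nc: character counter, both updated in place.
   ! done is set to .true. when a slash (end of input) is found.
   subroutine scan_buffer(bufr, ns, nc, done)
      character(len=*), intent(in) :: bufr
      integer, intent(inout) :: ns, nc
      logical, intent(out) :: done
      integer :: kt, c

      done = .false.
      do kt = 1, len(bufr)
         c = iachar(bufr(kt:kt))     ! integer ASCII value of the element
         select case (c)
         case (46)                   ! period: end of a sentence
            ns = ns + 1
         case (47)                   ! slash: end of input
            done = .true.
            return
         case (44, 59, 45)           ! comma, semicolon, dash: ignored
            continue
         case default                ! any other character (blanks included)
            nc = nc + 1
         end select
      end do
   end subroutine scan_buffer
end module scan_sketch
```

A caller would read a buffer, call `scan_buffer`, and read the next buffer only while `done` is false, which corresponds to looping back to step 1.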
Once the program has been reverse-engineered, it is then time to make a choice. Do we build a new program based on the logic of the original algorithm, or do we tweak the algorithm? Taking the first approach means that we likely have to do twice the work, as once the program is written it will have to be enhanced to improve the algorithm and remove nonsense code. Taking the second approach means that we end up with an improved algorithm. We will take the latter approach.
The actual algorithm is not really that complicated, even if the code is horrible to read. In reality the program is nonsensical in places. Where it tests for the presence of commas, semicolons, and dashes (so they can be ignored), it overlooks every other punctuation character. For example, the term “clover-like” would be treated as one word. It also discounts the existence of sentences terminated with ? or !. A better way would be to only consider characters in the range a..z and A..Z.
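To make the a..z and A..Z idea concrete, here is a minimal sketch. It is not the reengineered program itself: the module `char_class` and the functions `is_letter` and `count_words` are hypothetical names, and we assume that a word is simply a maximal run of ASCII letters.

```fortran
module char_class
   implicit none
contains
   ! Returns .true. when ch is an ASCII letter (a..z or A..Z).
   pure logical function is_letter(ch)
      character(len=1), intent(in) :: ch
      integer :: code
      code = iachar(ch)
      is_letter = (code >= iachar('A') .and. code <= iachar('Z')) .or. &
                  (code >= iachar('a') .and. code <= iachar('z'))
   end function is_letter

   ! Counts words in text, where a word is a maximal run of letters.
   pure integer function count_words(text)
      character(len=*), intent(in) :: text
      integer :: i
      logical :: in_word
      count_words = 0
      in_word = .false.
      do i = 1, len(text)
         if (is_letter(text(i:i))) then
            ! A letter that follows a non-letter starts a new word.
            if (.not. in_word) count_words = count_words + 1
            in_word = .true.
         else
            in_word = .false.
         end if
      end do
   end function count_words
end module char_class
```

Under this definition `count_words('clover-like')` gives 2 rather than 1, because the dash ends the first run of letters. The trade-off is that digits and apostrophes also act as separators, which may or may not be what we want.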
The basic structure of the program can now be used to reconstruct a new program. The top portion of the program (declarations, data assignments) remains basically the same. A program header is added at the top, and `implicit none` is included.
    program wordstats
    implicit none
Next is the variable and array declarations section. Here the only real difference is that the declarations are modified to modern Fortran standards (inclusion of the `::` separator), and the string `bufr` is also modified to modern standards. Notice that the variables for comma, semicolon, and dash have been removed, as they won’t be needed in the reengineered version of the program. The variable `blank` has been replaced with `space`, mainly because `blank` could also imply a tab.
    integer :: nw, nc, ns
    integer :: c, kt, space, slsh, perod, pc
    character(len=100) :: bufr
    real :: aws, asw
    data space,slsh,perod/32,47,46/
    data c/32/
    data nw,nc,ns,pc/0,0,0,0/
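As an aside, the ASCII codes set by the `data` statement could also be written as named constants derived from the characters themselves. The sketch below is one possible alternative, not what the program above does: the module name `ascii_codes` is hypothetical, and it relies on `iachar` being permitted in constant expressions, which holds for standard-conforming modern compilers.

```fortran
module ascii_codes
   implicit none
   ! Named constants built from the characters, so no magic numbers
   ! are needed and the values cannot be changed by accident.
   integer, parameter :: space = iachar(' ')
   integer, parameter :: slsh  = iachar('/')
   integer, parameter :: perod = iachar('.')
end module ascii_codes
```

A program that adds `use ascii_codes` could then drop the first `data` statement and the matching declarations of `space`, `slsh`, and `perod`.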
The `format` statements come next. These have been reduced from four to two. Format statement 104 has been augmented by the addition of a `format` for the number of words counted. The statements have also been cleaned up to make them easier to read.
    101 format(35x,'input text')
    104 format(///,21x,'number of sentences =',i8,/, &
               8x,'average number of words/sentence =',f8.2,/ &
               10x,'average number of symbols/word =',f8.2,/ &
               20x,'number of characters =',i8,/ &
               25x,'number of words =',i8)
The tail end of the program that produces the output also remains largely unchanged. The only real change here is the addition of an extra line of output for the number of words – I mean they are calculated, so why not output them?
    99 aws = float(nw) / ns
       asw = float(nc) / nw
       write(*,104) ns,aws,asw,nc,nw
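One thing to be aware of: if the input contains no sentences or no words, `ns` or `nw` is zero. Depending on the compiler and its settings, the divisions above then produce an infinity, a NaN, or a runtime error. A guarded version might look like the sketch below. The module `stats_sketch` and the function `safe_ratio` are hypothetical names, and returning 0.0 for an empty count is an assumption on our part, not something the original program does.

```fortran
module stats_sketch
   implicit none
contains
   ! Returns numer/denom as a real, or 0.0 when denom is zero.
   pure real function safe_ratio(numer, denom)
      integer, intent(in) :: numer, denom
      if (denom == 0) then
         safe_ratio = 0.0
      else
         safe_ratio = real(numer) / real(denom)
      end if
   end function safe_ratio
end module stats_sketch
```

With this in place the two averages could be written as `aws = safe_ratio(nw, ns)` and `asw = safe_ratio(nc, nw)`. The generic `real` intrinsic is used here in place of the older specific name `float`.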
Now we have to deal with the part of the program that has changed – the actual processing code.