Reengineering a messy “arithmetic if” in Fortran (iii) – reverse engineering

Programs that are too messy to deal with, especially those that qualify as “spaghetti code”, are often better treated using reverse engineering. Reverse engineering analyzes the program to determine exactly what it does, and then uses this information to construct a new program with the same behaviour. This example is ideally suited to the approach because of the arithmetic if statements and the general confusion they cause. From the perspective of time, it is not worth the effort to reengineer the code in the traditional, statement-by-statement sense.

This program basically does the following:

  1. Read in a series of characters from the standard input into a buffer string, bufr.
  2. Set the value of the index kt to 1.
  3. Parse the string bufr using the index kt. Each element of the string is converted to its respective integer ASCII value, c.
  4. The following actions are then performed:
    • If c is a period, “.”, it designates the end of a sentence, and the sentence counter ns and the word counter nw are incremented.
    • If c is a slash, “/”, it designates the end of input, and the program proceeds to output the statistics and terminate.
    • If c is a blank, “ ”, it designates the end of a word, and the word counter nw is incremented.
    • If c is not a comma, “,”, semicolon, “;”, or dash, “-”, (i.e. any other character) the character counter, nc, is incremented.
  5. The index value kt is incremented. If kt is less than the length of the buffer, the program loops back to 3, otherwise it loops back to 1.
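
The steps above can be sketched as a structured loop. This is only an illustration of the logic as described, not the final reengineered program: it scans a fixed sample string instead of reading standard input, and the program name and the text constant are assumptions made for the example. As in the original code, a period also ends a word, and the blank that follows it is skipped.

```fortran
program scan_sketch
   implicit none
   ! Sample sentence from part (i); a slash marks the end of the text.
   character(len=*), parameter :: text = &
      'The cat in the hat came back. Wrecked a lot of habits. /'
   integer :: kt, c, nw, nc, ns

   nw = 0
   nc = 0
   ns = 0
   kt = 1
   do while (kt <= len(text))
      c = ichar(text(kt:kt))
      if (c == ichar('/')) exit          ! slash: end of input
      if (c == ichar('.')) then          ! period: end of a sentence and of a word
         ns = ns + 1
         nw = nw + 1
         kt = kt + 1                     ! skip the blank that follows the period
      else if (c == ichar(' ')) then     ! blank: end of a word
         nw = nw + 1
      else if (c /= ichar(',') .and. c /= ichar(';') .and. c /= ichar('-')) then
         nc = nc + 1                     ! anything else counts as a character
      end if
      kt = kt + 1
   end do
   write(*,*) 'sentences =', ns, ' words =', nw, ' characters =', nc
end program scan_sketch
```

For the sample sentence this should report 2 sentences, 12 words and 41 characters, which agrees with the averages printed by the original program (12/2 = 6.00 and 41/12 ≈ 3.42).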

Once the program has been reverse-engineered, it is then time to make a choice. Do we build a new program based on the logic of the original algorithm, or do we tweak the algorithm? Taking the first approach means that we likely have to do twice the work, as once the program is written it will have to be enhanced to improve the algorithm and remove nonsense code. Taking the second approach means that we end up with an improved algorithm. We will take the latter approach.

The actual algorithm is not really that complicated, even if the code is horrible to read. In reality the program is nonsensical in places. Where it tests for the presence of commas, semicolons, and dashes (so they can be ignored), it overlooks every other punctuation character that might appear. For example the term “clover-like” would be treated as one word. It also discounts the existence of sentences terminated with ? or !. A better way would be to only consider characters in the range a..z and A..Z.
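
A minimal sketch of such a letter test is shown below. The names letter_test and is_letter are made up for this example. The standard intrinsics lge and lle compare characters using the ASCII collating sequence, so the test does not depend on the machine's native character set.

```fortran
program letter_test
   implicit none
   ! Prints one logical value for each of the four test characters.
   write(*,*) is_letter('a'), is_letter('Z'), is_letter('-'), is_letter('?')
contains
   ! True only for the ASCII letters a..z and A..Z.
   logical function is_letter(ch)
      character(len=1), intent(in) :: ch
      is_letter = (lge(ch,'a') .and. lle(ch,'z')) .or. &
                  (lge(ch,'A') .and. lle(ch,'Z'))
   end function is_letter
end program letter_test
```

With a test like this, the character counter would only be incremented for letters, and punctuation such as ? or ! would no longer be counted as part of a word.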

The basic structure of the program can now be used to reconstruct a new program. The top portion of the program (declarations, data assignments) remains basically the same, except that a header is added and “implicit none” is included.

program wordstats
   implicit none

Next is the variable/array declarations section. Here the only real difference is that the declarations are modified to modern Fortran standards (inclusion of the :: separator), and the string bufr is also modified to modern standards. Notice that the variables for comma, semicolon, and dash have been removed, as they won’t be needed in the reengineered version of the program. The variable blank has been replaced with space, mainly because blank could also imply a tab.

   integer :: nw, nc, ns
   integer :: c, kt, space, slsh, perod, pc
   character(len=100) :: bufr
   real :: aws, asw
      
   data space,slsh,perod/32,47,46/
   data c/32/
   data nw,nc,ns,pc/0,0,0,0/

The next part is the format statements. These have been reduced from four to two. Format statement 104 has been augmented by the addition of a format for the number of characters counted. The statements have also been cleaned up to make them easier to read.

   101 format(35x,'input text')
   104 format(///,21x,'number of sentences =',i8,/, &
                   8x,'average number of words/sentence =',f8.2,/ &
                  10x,'average number of symbols/word =',f8.2,/ &
                  20x,'number of characters =',i8,/ &
                  25x,'number of words =',i8)

The tail end of the program that produces the output also remains largely unchanged. The only real change here is the addition of an extra line of output for the number of characters – I mean they are counted, so why not output them?

99 aws = float(nw) / ns
   asw = float(nc) / nw
   write(*,104) ns,aws,asw,nc,nw

Now we have to deal with the part of the program that has changed – the actual processing code.

The art of naming variables

Naming variables is one of those tricky things few people seem to talk about in introductory programming courses. Maybe it is assumed there is some sort of magic knowledge.

Identifiers can make up to 70% of a program, so it is important to get them right the first time. Classically, identifiers may be built from alphabetic characters, digits, and a limited set of symbols, whether they name variables or functions. The symbols include the likes of the underscore character. Early programming languages such as Lisp also allowed the use of the hyphen between the words that form what we could term compound identifiers, e.g. END-OF-FILE. Cobol also allowed the use of the hyphen, but that was because its operators were individual English words, i.e. SUBTRACT was used instead of -.

Good names make a program more readable. Consider the following example:

a = b * l;

This is syntactically correct, but its meaning is opaque. Contrast this with:

area = breadth * length;

Here meaning is provided without having to write comments. A good name should be short, descriptive and precise. Long names mean that everything becomes longer. Simple expressions become long, and less readable. For example:

double sphere_surface_area, sphere_radius, pi;
sphere_surface_area = 4.0 * pi * sphere_radius * sphere_radius;

versus:

double sphere_SA, sphere_R, pi;
sphere_SA = 4.0 * pi * sphere_R * sphere_R;

The first example is certainly very descriptive, however the expression has become extremely long and awkward. A name should be descriptive, but this does not mean that it has to be long. The second example uses shorter words which are just as descriptive in the context they are being used. In this case the context is surface areas of geometric solids, so we have used standardized suffixes to denote characteristics such as surface area (SA) and radius (R). Where sphere_SA denotes the surface area of a sphere, cylinder_SA could similarly be used to denote the surface area of a cylinder. We could also have chosen suffixes such as surfA, sArea.

A good name is worth a paragraph of comments. Here are some general guidelines:

  • Variable names should start with a lowercase character. For example: sphere_Radius, sphere_radius, sphereRadius
  • Temporary variables should have names which are short and reflect the fact that they offer temporary storage. For example, tmp, temp or tempInt.
  • The prefix n should be used with variables representing the number of objects. For example, nBacteria, n_edges.
  • Make all variable names either singular or plural. Identifiers should not only differ by the addition of the letter s. For example, cell and cells isn’t too good, it would be better to use cell and cell_array.
  • Variables which are indices in a loop should be named or prefixed with i, j, k, etc.

Negative boolean names should be avoided! When these are coupled with a negation operation, you get a double negative, which can be hard to decipher. For example, if we have the variable does_NotExist, then !does_NotExist is a double negative. Best to use something like: exists. In a similar vein, be careful when using the letters I (capital i) and l (lowercase ell) or the digit 1 (one) in a variable name, as they are easily confused.
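
The same point can be made in Fortran, where negation is written .not. rather than !. The sketch below is only an illustration: the program name and the file name data.txt are made up for the example.

```fortran
program bool_names
   implicit none
   logical :: does_not_exist, exists

   ! 'data.txt' is a hypothetical file name used only for this example.
   inquire(file='data.txt', exist=exists)
   does_not_exist = .not. exists

   ! Double negative: the reader has to untangle "not does-not-exist".
   if (.not. does_not_exist) write(*,*) 'file found'

   ! Positive name: the condition reads directly.
   if (exists) write(*,*) 'file found'
end program bool_names
```

Both tests are equivalent, but the second one can be read aloud without any mental gymnastics.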

When it comes to variables for descriptive items with more than one word, e.g. “monthly rainfall”, it is possible to use mnemonics, or perhaps camelCase, e.g. monthlyRainfall.

What are the most common website usability mistakes?

No matter how well a website is designed there are always small things that can be found that impact the usability of a website. This happens even in “templates” that are available for WordPress, and Squarespace. What are the most common of these mistakes, which can generally be found on the homepage?

  • Thin, light, tiny fonts – Fonts that are so small, and light that they are hard to see for the average user, let alone anyone who is visually impaired. This sometimes occurs in places like menus, or even logos that have a tiny font that is impossible to read. There is no point putting content on a website if it is unreadable.
    • Solution: Text must be legible. Use a font size that people can actually read. Use at least 16px font for the body text.
  • Lack of contrast – Small fonts are one thing, but a lack of contrast is an even bigger pitfall: a light typeface combined with low contrast seriously impedes readability, and with it usability.
    • Solution: Test text/background combinations in a checker like Colorable.
  • Minuscule logos – If a business has a logo, it should display it in a prominent place. Why then do some websites have a logo which is so small it’s barely perceptible?
    • Solution: Create logos that are easily identifiable, and stand out on the website.
  • Micro-clickable areas – Hyperlinks are designed to be clicked, so to make them usable, it makes sense to ensure that they’re easy to click. A large clickable area makes it easier to hover the mouse cursor over the link.
    • Solution: Use larger fonts or increase the padding to create a larger clickable area around the text.
  • Lack of relevant information – User engagement is very important, and a good way to encourage it is to have an FAQ on the website. A website should also provide an easy way for users to communicate with the company/individual – and many don’t. This sometimes extends to simple information like “hours of operation.” There is nothing worse than looking at a restaurant website that fails to include the hours they are open (or whether they are open on a holiday), or indeed some sense of the menu they cook.
    • Solution: Add an email, contact form, or even (heaven forbid) a phone number. Online forums are a good place for customers to ask/answer questions. Good customer service will bring them back.
  • No search function – Sometimes this is the first thing people look for on a website (these are called Search-Dominant users). Whether you are running an e-commerce business or writing a blog, everyone needs a search feature. The exceptions to the rule are the simplest of websites, which offer a handful of items for sale and have nothing else on the site, e.g. no blog.
    • Solution: Add a search feature. You don’t even have to code one, Google, Bing, Yahoo all work well, and can be easily added to a website (to search that site).
  • Dead links – A link that isn’t meant to change (i.e., is hard-coded) is called a permalink. When the target changes on the server (i.e., a webpage is deleted or moved, or a domain changes), the hard-coded link points nowhere and becomes a dead link.
    • Solution: Check your links frequently for dead ones. It couldn’t be easier using a link checker… like W3C.
  • Lack of an About page – People want to know who the individual/company is. Not having an About page (or having a lacklustre one) makes people wonder about the validity of the website. It helps people form a connection, which is extremely important, especially if the website is for a small business.
    • Solution: Add an About page.
  • Lack of a shopping cart – If a website has a store/shop, then it should also have a shopping cart. Failure to provide a visible shopping cart can lead to distrust from the user. It should also be a shopping cart which is identifiable.
    • Solution: Add a shopping cart, and don’t make it anything boutique.
  • Poorly written content – A website should not only have good design but good copy – the text content of a website. Visitors might enjoy looking at the beautiful pictures and animations, but they will still need to read text to process information. But people don’t read web content from top to bottom (as in books). They start reading whatever pops out at them first, then jump to the next thing that captures their interest, or to another website altogether (yeah, we all do it).
    • Solution: Provide a few areas of focus (i.e., things that attract visitors’ attention). This can be achieved using high-contrast colours, alluring images, and large fonts. Use descriptive headings – informative yet concise. The text should be short and easy to digest – nobody wants to read large novels online.
  • Sameness – Lots of websites seem to use the same template. While this is not exactly a usability thing, doing the same thing can be construed as being somewhat bland.
    • Solution: Try something new, and use a template that’s a little left of centre. Things that are a little different will be remembered.

Exploring website usability (ii)

After exploring numerous websites looking for honey (hey, it’s tasty, and healthy), I came to the conclusion that certain design traits seem to be endemic amongst small business websites: specifically small menu fonts and small logos, particularly in website headers. This could be in part because of the use of web-templates. Here is another example, where you really have to look closely to even try and decipher the logo.

Here is a website with a better designed header, with a nicely centred logo, and menu underneath. Top-right shows the link to Instagram, and a search bar on the top-left. It’s an aesthetically pleasing website, with lots of white-space. The one thing missing is a shopping cart – going to the SHOP and selecting an item takes the user straight to the checkout, which isn’t ideal if somebody wants to buy more than one item.

The third website homepage, shown below, is again very aesthetically pleasing. The logo is well placed and sized (although some of the text around the website’s name is small). The menu is well situated, and there is a shopping cart, with content indicator. The image is beautiful as well. Here the downside is the lack of a search feature. Now if a store has fewer than 10 items, it is a trade-off as to whether a search facility makes sense; however, if the site also has a blog, a search feature could make it more convenient for visitors to find things.

Is C++ just pure dark magic?

I started coding in C++ in 1991. It was the focus of my honours thesis, “OO Problem Solving in C++”, or something like that. It reformulated common programming problems using the OO characteristics of C++. During my master’s degree I took a course in OO design, which opened my eyes to the complete nonsense that OO was (taught by someone who knew little more about OO than we did). After that, I didn’t touch OO much, let alone C++. OO was meant to be some utopian way of programming, and was meant to change how program design and implementation were done. But one only had to read Rumbaugh’s Object-Oriented Modeling and Design (a horrible book) to realize that what it really was, was a cult.

Objects, objects everywhere… where they go? Nobody cares.

C++ was obviously the first vessel of the pure darkness that was OO. C++ started life as “C with classes”, created by Bjarne Stroustrup. In 1983 it was renamed to C++.

In 2007 Linus Torvalds wrote the following to a newsgroup responding to the use of C in the source code of Git:

C++ is a horrible language. It’s made more horrible by the fact that a lot of substandard programmers use it, to the point where it’s much much easier to generate total and utter crap with it. Quite frankly, even if the choice of C were to do nothing but keep the C++ programmers out, that in itself would be a huge reason to use C.

You really only have to look at the circus that is the code for the F-35, written in C++, to realize that the language is likely beyond its use-by date. The F-35 supposedly has 8-million LOC, most of which we presume were written in C++. Why they didn’t use Ada for a real-time system is beyond me (supposedly because they couldn’t find enough Ada programmers). It is probably a mix of poorly written C++ code, and the fact that they are using a language which was never meant to be used to fly a plane, that is causing the issues. Poorly written C++ code is likely causing issues with memory management.

The difference between C++ and Ada of course is a subtle one – Ada won’t allow the sort of nonsense that C++, and indeed C, does. I mean it was Stroustrup himself who said that “C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off.” Not exactly reassuring from the guy who created the language.

Or maybe C++ isn’t pure dark magic, programs written in it are just poorly constructed, possibly by people who don’t have enough knowledge about C++ to code in it effectively… or perhaps they are just poor programmers (I mean they do exist, and many have degrees in computer science). I gave up on C++ in the early ’90s, before it even gained a standard. The language is just too complex. The C++98/03 specification was 879 pages long. In comparison, the C++20 standard is 1834 pages in length.

Want to learn more? Check out Belay the C++, a blog dedicated to enlightening people about poor software practices in C++.

Exploring website usability (i)

There are few perfect websites in the world from a design perspective. Most websites have things that just don’t work from a usability viewpoint. This series will look at the flaws on the home pages of a series of websites. These reviews have nothing to do with the content of the website, but relate to aspects of its design, perhaps fonts, colours, or pictures. The first website has a very nice looking green and yellow design (names have been blurred).

Unfortunately there are a number of usability issues on this homepage. The first is the fact that the menu text is somewhat small. A bigger issue is actually the lack of contrast between the green font, and the yellow background. Using the WebAIM Contrast Checker, it is easy to determine that the contrast ratio between the green (81890C) and yellow (FDCB51) is 2.5:1. Web accessibility requirements need a contrast ratio of at least 4.5:1 for normal text. The use of the black text in the logo box has a nice contrast ratio of roughly 11:1.
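
The 2.5:1 figure can be checked by hand. The sketch below, written in Fortran to match the other code on this blog, applies the WCAG 2.x relative-luminance and contrast-ratio formulas to the two colours quoted above, with the hex values converted to decimal channel values. The program and function names are made up for the example, and it is only an illustration, not a substitute for a proper checker.

```fortran
program contrast_sketch
   implicit none
   real :: lum_green, lum_yellow, ratio

   lum_green  = luminance(129, 137, 12)    ! 81890C
   lum_yellow = luminance(253, 203, 81)    ! FDCB51
   ! Contrast ratio: (lighter + 0.05) / (darker + 0.05)
   ratio = (max(lum_green, lum_yellow) + 0.05) / &
           (min(lum_green, lum_yellow) + 0.05)
   write(*,'(a,f5.2,a)') 'contrast ratio = ', ratio, ':1'
contains
   ! Relative luminance of an sRGB colour given as 0..255 channel values.
   real function luminance(r, g, b)
      integer, intent(in) :: r, g, b
      luminance = 0.2126*linear(r) + 0.7152*linear(g) + 0.0722*linear(b)
   end function luminance

   ! Undo the sRGB gamma encoding for a single channel.
   real function linear(v)
      integer, intent(in) :: v
      real :: s
      s = real(v) / 255.0
      if (s <= 0.03928) then
         linear = s / 12.92
      else
         linear = ((s + 0.055) / 1.055)**2.4
      end if
   end function linear
end program contrast_sketch
```

The result comes out at roughly 2.5:1, in line with the checker, and well short of the 4.5:1 needed for normal text.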

The other real issue is the use of the logo. It seems to be pasted in the middle of the main image, rather than in the traditional top-left, next to the menu bar. That means when the user scrolls up the page, the logo disappears. I also don’t understand the “noticeboard” framing, and other text, including the large “Start Shopping” button. The shopping “bag” is well placed, and the image of the bee is very nice. There also does not seem to be a search bar of any sort. I know for small websites it seems like too much effort, but it’s amazing how many times people search for things rather than browsing through pages.

There may also be some issues for colour-blind users. To check these out one can use Coblis, a colour blindness simulator. For example Deuteranopia is the most common form of red-green colour deficiency. Below is a sample of what the home page might look like to someone with Deuteranopia.

The website viewed from the perspective of someone with Deuteranopia colour deficiency

Some of these issues are not uncommon on small e-commerce websites, because people who make the websites don’t necessarily think about them. Below is a second website focused on honey. Overall it’s a nice website, with huge and vibrant product images. The issues here are small. Firstly, unless the page is zoomed to the full size of the screen, the menu exists as a small hamburger-type menu in the top-right. There’s nothing wrong with that when the window is made smaller, but hiding the menu on a large-view screen means an extra step for the user.

When the text does appear, it is *tiny*. Like really small (see picture below). This seems to be a common flaw among small business websites today. Maybe it is the use of templates to build the website? Nobody seems to question the design decisions made. It may be that there are too many menu options? Six of the menu options are product categories, so it would be better to see these in a drop-down menu. The logo is also incredibly small. It would be better to make the “header” deeper, and incorporate a decent sized logo. The final sting? Well, there are two actually – (i) no search functionality, and (ii) no icon for the shopping “Panier” or basket. Having an icon makes it easy for anyone to find the shopping cart (words are not needed, as long as the icon used is universal).

Very small logo, small font size, and lack of a shopping cart icon.

Reengineering a messy “arithmetic if” in Fortran (ii) – the analysis

Unlike many industrial programs, this is a very small program to reengineer. Nevertheless, there are some steps that are universal. Usually these involve converting the program to lowercase, modifying the comment delimiter from C to !, and changing any machine dependent I/O references. Thankfully this program only needs to be converted to lowercase; otherwise it will compile as is. Some might argue that if it compiles and runs, then why change anything? Mostly for the purposes of maintainability, and to remove features that Fortran no longer really supports. When compiled with gfortran, this program will produce numerous warnings of the form:

Warning: Fortran 2018 deleted feature: Arithmetic IF statement at (1)

It will still compile, but these issues should be dealt with. Now firstly, we may want to actually analyze the program before diving in head-first. The easiest way to do this is to print out a copy of the program, and mark the issues of note, and the flow of the program. First the general legacy issues with the program (see Figure 1). These are things that are generic issues with most legacy Fortran programs. Here is a list of some issues:

  1. Add a program header, as there isn’t one.
  2. Add an “implicit none” statement because we want all variables declared.
  3. Convert the variable declarations to include the :: separator.
  4. Modify all arrays to modern declarations.
  5. Change the line continuation character from a prefix * to a suffix &
Fig.1: Generic legacy Fortran issues

The next set of issues involves the structures associated with the program flow, i.e. the arithmetic if’s, and the lone goto. The extent of the problem can be visualized in Figure 2. The arithmetic if’s form various kinds of structures, from if-elseif-else constructs, to loops of differing forms.

Fig.2: Program flow issues

In reality it is, to use a British saying, a bit of a “dog’s breakfast” of spaghetti code. It is possible to reduce some of the more benign pieces of code – for example the arithmetic if’s at labels 40, 50, 55, and 60 can be easily transformed into a nested if construct. Some of the other labels accept jumps from various arithmetic if’s in the program, and even constructing loops to replace some is, well, tedious. Honestly, this program is one which is better suited to some reverse engineering.
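
As a rough sketch of that benign case, the arithmetic if’s at labels 40 to 60 collapse to the structure below, written here as an if / else if chain rather than literally nested. The surrounding loop, the sample string and the program name are assumptions added so that the fragment runs on its own; the constants are the ASCII values from the original DATA statement.

```fortran
program classify_sketch
   implicit none
   integer, parameter :: blank = 32, comma = 44, scol = 59, dash = 45
   character(len=*), parameter :: sample = 'clover-like, so'
   integer :: kt, c, nw, nc

   nw = 0
   nc = 0
   do kt = 1, len(sample)
      c = ichar(sample(kt:kt))
      if (c == blank) then
         ! Labels 40 and 45: a blank ends a word.
         nw = nw + 1
      else if (c /= comma .and. c /= scol .and. c /= dash) then
         ! Labels 50, 55, 60 and 65: count anything except , ; and -
         nc = nc + 1
      end if
      ! Label 70 (kt = kt + 1) is handled by the do loop itself.
   end do
   write(*,*) 'blanks =', nw, ' characters =', nc
end program classify_sketch
```

The four labelled statements and the goto disappear, which is why this part of the program is the easy one; the jumps into labels 20, 25 and 35 are what make the rest tedious.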

Reengineering a messy “arithmetic if” in Fortran (i)

The arithmetic if can be easy to remove if it just forms a basic if-then-else statement, that is one which is in close proximity. Removing one where the jumps go beyond the local region is more challenging. Consider the following Fortran IV or 66 program which counts sentences, words and characters in a line of text. A slash marks the end of the text. It is derived from a program on pages 28-29 of Kernighan and Plauger’s “The Elements of Programming Style” (1974). I had to change some things from the original program so that it would actually compile (some things are not so backwards compatible).

      INTEGER NW,NC,NS
      INTEGER C,KT,BLANK,COMMA,SCOL,DASH,SLSH,PEROD
      CHARACTER BUFR(72)
      REAL AWS,ASW
      DATA BLANK,COMMA,SCOL,DASH,SLSH,PEROD/32,44,59,45,47,46/
      DATA C/32/
      DATA NW,NC,NS,KT/0,0,0,73/
  101 FORMAT(35X,'INPUT TEXT')
  102 FORMAT(72A1)
  103 FORMAT(4X,72A1)
  104 FORMAT(///,21X,'NUMBER OF SENTENCES=',I8,/,7X,'AVERAGE NUMBER OF
     *WORDS/SENTENCE=',F8.2,/10X,'AVERAGE NUMBER OF SYMBOLS/WORD=',F8.2,
     */25X,'NUMBER OF WORDS=',I8)
      WRITE(*,101)
   10 READ(*,102) BUFR
      WRITE(*,103) BUFR
      KT=KT-72
      IF(C-PEROD) 20,35,20
   20 C=ICHAR(BUFR(KT))
   25 IF(C-PEROD) 40,30,40
   30 NS=NS+1
      NW=NW+1
      KT=KT+2
      IF(KT-72) 35,35,10
   35 C=ICHAR(BUFR(KT))
      IF(C-SLSH) 25,75,25
   40 IF(C-BLANK) 50,45,50
   45 NW=NW+1
      GO TO 70
   50 IF(C-COMMA) 55,70,55
   55 IF(C-SCOL) 60,70,60
   60 IF(C-DASH) 65,70,65
   65 NC=NC+1
   70 KT=KT+1
      IF(KT-72) 20,20,10
   75 AWS=FLOAT(NW)/NS
      ASW=FLOAT(NC)/NW
      WRITE(*,104) NS,AWS,ASW,NW
      CALL EXIT
      END

Here is a sample input and output from the program.

                                   INPUT TEXT
The cat in the hat came back. Wrecked a lot of habits. /
    The cat in the hat came back. Wrecked a lot of habits. /

                     NUMBER OF SENTENCES=       2
       AVERAGE NUMBER OF  WORDS/SENTENCE=    6.00
          AVERAGE NUMBER OF SYMBOLS/WORD=    3.42
                         NUMBER OF WORDS=      12

This piece of code, 40 lines long, has 9 arithmetic if statements. That’s incredibly horrible, and frankly it is a poster-child for spaghetti code.
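
For anyone who has not met one before, an arithmetic if of the form IF (e) l1,l2,l3 jumps to label l1 when e is negative, l2 when it is zero, and l3 when it is positive. The sketch below shows how the simplest kind, where the first and third labels are the same, maps onto a modern if-then-else. The program name and the messages are made up for the example.

```fortran
program arith_if_sketch
   implicit none
   integer, parameter :: perod = 46    ! ASCII value of '.'
   integer :: c

   c = ichar('.')
   ! Old form:  IF (C-PEROD) 40,30,40
   ! jumps to 40 when c-perod < 0, to 30 when it is 0, and to 40 when it is > 0.
   ! Because the first and third labels match, this is just an equality test:
   if (c == perod) then
      write(*,*) 'period: end of sentence (label 30)'
   else
      write(*,*) 'not a period (label 40)'
   end if
end program arith_if_sketch
```

All nine arithmetic if’s in the program above are of this two-way kind; the difficulty lies in where the labels are, not in the tests themselves.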

Coding Fortran: Writing an image as a text image file

The corresponding subprogram to write a “text image” is rather simpler. It really just involves opening a file for writing, and then writing each row of the image to file. Simple really.

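subroutine write_textimage(fname,img,nrows,ncols)
   character (len=25), intent(in) :: fname
   integer, allocatable, dimension(:,:), intent(in) :: img
   ! ncols is part of the interface but is not needed: img(i,:) writes a whole row
   integer, intent(in) :: nrows, ncols
   integer :: i

   ! status='new' means the open fails if the file already exists
   open(unit=40,file=fname,status='new',action='write')

   ! write one row of the image per record
   do i = 1,nrows
      write(40,*) img(i,:)
   end do
   close(40)

end subroutine write_textimage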

Translating BASIC to Fortran: Twin primes

Some programs seem a bit like enigmas. Such is the case with BASIC programs. BASIC was the first language I saw, on an old Apple IIe in the mid 1980s, although at the time I had no clue what a programming language was. I had no word processing software and so used BASIC to write my essays as print statements (hey it worked).

Consider the following BASIC program to generate twin primes. Every prime other than 2 is odd, so any two consecutive primes greater than 2 must be at least 2 apart. Pairs of primes with the shortest distance, 2, are called twin primes. For example the following numbers are twin primes: 3,5; 5,7; 11,13; 17,19; 29,31; etc.

Straight away you will notice one of the core problems with BASIC… its lack of indenting. This makes it hard to read even a small program. That, and every line has a numeric label associated with it, even if the label does next-to-nothing. It begs a little analysis. Looking at the code you will notice a few things:

  • The FOR loops are somewhat structured, i.e. they use the keyword NEXT to return to the next iteration of the loop.
  • The IF statements use implicit goto’s, i.e. just by specifying a label, the program jumps to the statement at the label. In the case of this program, the IF statements all jump to the end of the loop, giving a C-like “continue” behaviour.
  • Variables are somewhat dynamically allocated, by using the keyword LET, which assigns a variable a value.
  • The keyword DIM is used to create arrays of a certain dimension.
  • Most labels do nothing, they just identify a line of code.
  • For some reason (not sure exactly why) the array B has the same name as the loop variable B.
  • The BASIC function sqr() calculates the square-root.

Translating the program just needs some tweaks. Below is the BASIC program converted to Fortran.

program twinPrime
   integer, dimension(1000) :: A
   integer, dimension(400) :: B
   integer :: c, x, d
   real :: s

   A = 0
   c = 0
   s = sqrt(1000.0)
   do d = 2, 1000
      if (A(d) >= 0) then
         c = c + 1
         B(c) = d
         if (d <= s) then
            do x = d,1000,d
               A(x) = -1
            end do
         end if
      end if
   end do
   write(*,*) 'Twin primes'
   do x = 2,c
      if (B(x)-B(x-1) == 2) then
         write(*,*) B(x-1), B(x)
      end if
   end do
end program twinPrime

Here are the changes:

  • In the BASIC program, the FOR loop in lines 120-140 just sets the elements in the array A to zero. This can be done in Fortran in a single line (Line 7).
  • The implicit goto’s in the BASIC IF statements can be removed by modifying the logic within the do loop starting on Line 10. Instead of evaluating a condition, and jumping to the next iteration of the loop, the logic of both if statements can be inverted, and the second BASIC IF nested within the first.
  • The same logic can be applied to the last IF statement in the BASIC program. Rather than find the difference between two consecutive primes, and continue to the next iteration if the difference is greater than 2, simply only print the two consecutive primes that have a difference of 2 (Lines 23-25).
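
An alternative that stays closer to the BASIC control flow is Fortran’s cycle statement, which skips to the next iteration of a loop much like a jump to the NEXT line. The translation above does not use it; the sketch below is only a hypothetical illustration of the idea, with a made-up program name.

```fortran
program cycle_sketch
   implicit none
   integer :: i

   do i = 1, 10
      ! Skip even numbers: control passes straight to the next iteration.
      if (mod(i,2) == 0) cycle
      write(*,*) i
   end do
end program cycle_sketch
```

Inverting the condition and nesting, as done in the translation, avoids the jump entirely, but cycle can be handy when the nesting would otherwise get deep.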