Coding Fortran: Writing an image as a text image file

The corresponding subprogram to write a “text image” is rather simpler. It really just involves opening a file for writing, and then writing each row of the image to file. Simple really.

subroutine write_textimage(fname,img,nrows,ncols)
   character (len=25), intent(in) :: fname
   integer, allocatable, dimension(:,:), intent(in) :: img
   integer, intent(in) :: nrows, ncols
   integer :: i

   open(unit=40,file=fname,status='new',action='write')

   do i = 1,nrows
      write(40,*) img(i,:)
   end do
   close(40)

end subroutine write_textimage

Translating BASIC to Fortran: Twin primes

Some programs seem a bit like enigmas. Such is the case with BASIC programs. BASIC was the first language I saw, on an old Apple IIe in the mid 1980s, although at the time I had no clue what a programming language was. I had no work processing software and so used basic to write my essays as print statements (hey it worked). I use

Consider the following BASIC program to generate twin primes. Primes are odd, so any two consecutive primes must have a distance that is greater than or equal to 2. Pairs of primes with the shortest distance, 2, are called twin primes. For example the following numbers are twin primes: 3,5; 5,7; 11,13; 17,19; 29,31; etc. Straight away you will notice one of the core problems with BASIC… its lack of indenting. This makes it hard to read even a small program. That, and every line has a numeric label associated with it, even if the label does nothing. It begs a little analysis.

Straight away you will notice one of the core problems with BASIC… its lack of indenting. This makes it hard to read even a small program. That, and every line has a numeric label associated with it, even if the label does next-to-nothing. It begs a little analysis. Looking at the code you will notice a few things:

  • The FOR loops are somewhat structured, i.e. they use the keyword NEXT to return to the next iteration of the loop.
  • The IF statements use implicit goto‘s, i.e. just by specifying a label, the program jumps to the statement at the label. In the case of this program, the IF statements all jump to the end of the loop, invoking a C-like “continue” continuum.
  • Variables are somewhat dynamically allocated, by using the keyword LET, which assigns a variable a value.
  • The keyword DIM is used to create arrays of a certain dimension.
  • Most labels do nothing, they just identify a line of code.
  • For some reason (not sure exactly why) the array B has the same name as the loop variable B.
  • The BASIC function sqr() calculates the square-root.

Translating the program just needs some tweaks. Below is the BASIC program converted to Fortran.

program twinPrime
   integer, dimension(1000) :: A
   integer, dimension(400) :: B
   integer :: c, x, d
   real :: s

   A = 0
   c = 0
   s = sqrt(1000.0)
   do d = 2, 1000
      if (A(d) >= 0) then
         c = c + 1
         B(c) = d
         if (d <= s) then
            do x = d,1000,d
               A(x) = -1
            end do
         end if
      end if
   end do
   write(*,*) 'Twin primes'
   do x = 2,c
      if (B(x)-B(x-1) == 2) then
         write(*,*) B(x-1), B(x)
      end if
   end do
end program twinPrime

Here are the changes:

  • In the BASIC program, the FOR loop in lines 120-140 just sets the elements in the array A to zero. This can be done in Fortran in a single line (Line 7).
  • The implicit goto‘s in the BASIC IF statements can be removed by modifying the logic within the do loop starting on Line 10. Instead of evaluating a condition, and jumping to the next iteration of the loop, the logic of both if statements can be inverted, and the second BASIC IF nested within the first.
  • The same logic can be applied to the last IF statement in the BASIC program. Rather than find the difference between two consecutive primes, and continue to the next iteration if the difference is greater than 2, simply only print the two consecutive primes that have a difference of 2 (Lines 23-25).

The caveat of fixed-point numbers

So what is a fixed-point number? A fixed-point number is a real number has a specific number of digits for the integer part of a number, and a specific number for the fractional part. For example, a fixed point number of the form IIII.FFFF would have 9999.9999 as the largest representable number. The smallest non-zero number would be 0000.0001. Typically fixed-point numbers are implemented as integers, i.e. π with a value of 3.14 as a fixed-point number would be represented as 314, with 100 as the scaling factor. For example, a weight scale for baking is rarely accurate below 0.001kg (1 gram), and so no more decimal points are needed.

There are some advantages to fixed-point numbers. Firstly, decimal fractions can be exactly represented. Secondly, fixed-point arithmetic can be faster. Thirdly, and perhaps the most important, addition and subtraction don’t introduce any errors. However all roads do not glitter in gold – multiplication is more of an issue, due to things like overflow. Fixed-point numbers are heavily used in digital signal processing hardware, but few languages support fixed-point datatypes – Cobol and Ada are good examples of some that do. It may be one of the reasons that Cobol has held on to its kingdom so well – nobody wants round-off errors in financial calculations. Cobol declares numerics as fixed-point. For example:

X PIC 9999V99

This declares X to be a fixed-point variable with four digits before the decimal place, and two digits after.

Now I get why fixed-point is used in hardware, for example helping to control resources by eliminating the logic requirements for dynamic range shifts inherent in floating-point operations. Maybe the main benefit of fixed-point arithmetic is that it is as straight-forward and efficient as integers arithmetic. So I understand why fixed-point numbers are used to represent things like money. What I don’t get is the use of fixed-point numbers to represent integers. For example an 8-bit image which has integers in the range 0-255. There are systems which convert 0-255 to the range 0-1, and then store those values as fixed-point numbers. Here’s the issue.

Converting the value 1 to a value between 0-1 is (1/255) = 0.003921568627. If this is stored in the form I.FFFFFFFF then it becomes 0.00392156, likely represented as an integer as 00392156. But what exactly is the point, when 1 could simply be represented as the 8-bit integer 1? Sometimes I think people like to make things more complicated than they have to be. Is there any real benefit? If we calculate the average of a neighbourhood of pixels, it likely results in a real-number – but this real number is truncated, or rounded in some manner to again provide an integer.

Coding Fortran: Reading a text image from a file

Recently I was working on some code to read a “text image” from a file into a Fortran program. Something that is a trivial task in languages like C is somewhat more convoluted in Fortran. This is largely because of the nature of a “text image”, which is essentially a large matrix of integers stored in a file with no identifying information with respect to number of rows or columns. This is likely the simplest form of image. So it is a bit trickier, because Fortran has to figure out these parameters before reading in the data.

So in this post I will outline a subprogram to input a text image from file. The subroutine, read_textimage(), will read an 8-bit (grayscale) image from a ASCII file. Let’s start by constructing the basic code for reading in the text image. First we need to open the file for reading.

open(unit=20,file=fname,status='old',action='read', access='sequential')

This piece of code opens the file associated with the string fname. It assumes the file exists (old), is opened for reading, and access is sequential. Next the number of rows and columns have to be calculated. For the number of rows, a simple loop is constructed which uses a dummy read statement (i.e. does not assign a value to a variable), to facilitate the return of the I/O status specifier error. When error = -1, and end-of-file is encountered, and the loop terminates, otherwise the variable nrows is incremented, counting the number of rows.

do
   read(20,'(I3)',iostat=error)
   if (error == -1) exit
   nrows = nrows + 1
end do

Next we will attempt to find the number of elements per row, i.e. the number of columns. The code is shown below. Note that the first statement is a call to rewind(), which basically resets the file pointer to the top of the file.

rewind(20)
read(20,'(a)') line
do i = 1,3000
   read(line, *, iostat=error) (p, j=1,i)
   if (error .ne. 0) then
      ncols = i - 1
      exit
   end if
end do

The code works by reading a single line of the file, and storing it as a string, line. The code then uses a loop to read the integers in that line over and over again until there are no more, and read generates an error, i.e. error is not 0. An arbitrary value used to represent the maximum number of integers in the string, e.g. 3000. The read() statement uses the string line as input, reading integers until the error occurs. The value ncols is the number of columns in the text image.

Now that the subroutine has calculated both nrows, and ncols, a 2D dynamic array can be created to store the data from the file. The data can be read in as a block of values, straight into the array, with one small caveat. Fortran is a column-major, which means that when read() performs the read from file, it reads the data into column-wise rather than row-wise. So if the data file looks like this:

    210     205     196     192     194     182
    197     202     208     207     194     189
    190     192     201     192     192     188
    185     197     197     177     176     188

It reads the values in the first row into the column of an array, e.g. the first four numbers in the first row will form the first column in the output. This means that the resulting 4×6 array will look like this:

210 194 208 190 192 197
205 182 207 192 188 177
196 197 194 201 185 176
192 202 189 192 197 188

To read in the array properly, we have to read the 4×6 elements from the file into a 6×4 array and then just transpose it (using the built-in function transpose()). First create two dynamic arrays:

integer, allocatable, dimension(:,:) :: img
integer, allocatable, dimension(:,:) :: imgTemp

Then allocate them memory:

allocate(imgTemp(ncols,nrows))
allocate(img(nrows,ncols))

Next, the file pointer is reset to the start, and the entire image is read into the variable imgTemp, which is then transposed into the actual image, img. Once this is done, the memory associated with imgTemp can be deallocated.

rewind(20)
read(20,*) imgTemp
img = transpose(imgTemp)
deallocate(imgTemp)

Here is the whole subroutine:

subroutine read_textimage(fname,img,nrows,ncols)
   character (len=25), intent(in) :: fname

   integer, allocatable, dimension(:,:), intent(out) :: img
   integer, intent(out) :: nrows, ncols
   integer, allocatable, dimension(:,:) :: imgTemp
   integer :: error
   integer :: p, i, j
   character (len=3000) :: line

   io = 0
   nrows = 0
   open(unit=20,file=fname,status='old',action='read', access='sequential')

   do
      read(20,'(I3)',iostat=error)
      if (error == -1) exit
      nrows = nrows + 1
   end do

   rewind(20)
   read(20,'(a)') line
   do i = 1,3000 !max number of integers
      read(line, *, iostat=error) (p, j=1,i)
      if (error .ne. 0) then
         ncols = i - 1
         exit
      end if
   end do
   write(*,*) 'rows=',nrows,' cols=',ncols
   allocate(imgTemp(ncols,nrows))
   allocate(img(nrows,ncols))
   rewind(20)
   read(20,*) imgTemp
   img = transpose(imgTemp)
   deallocate(imgTemp)
   close(20)

end subroutine read_textimage

Coding Fortran: Funny read behaviour from files

All languages do not work in the same way. Consider the following piece of Fortran code which should read 10 integers from a file.

program readfromfile
   implicit none
   integer :: i, n
   open(unit = 100, file = 'test.dat', status = 'old', action = 'read')
   do i = 1,10
      read(100,*) n
      print *, n
   end do
end program readfromfile

Here is the sample input, with each integer on a separate line:

1
2
3
4
5
6
7
8
9
10

The program functions properly when run with this data. However if the data were changed to:

1 2 3 4 5 6 7 8 9 10

Then the program would have some issues. Namely when run it produces an error of the form:

           1
At line 8 of file fileread.f95 (unit = 100, file = 'test.dat')
Fortran runtime error: End of file

It basically reads the first integer, and then baulks. Why does this happen? Well Fortran, by default reads a value, and then proceeds to the next line to read the next value. When the integers are all on the same line in the file, the first read statement reads one integer, then advances to the beginning of the next line to read the next integer but there isn’t a next line, so a runtime error is generated. How can this be avoided?

There are two solutions to this problem. The first is that Fortran allows multiple values to be read into an array in one statement. The code to do this is:

integer, dimension(10) :: data
open(unit = 100, file = 'test.dat', status = 'old', action = 'read')
read(100,*) data
print *, data

A second solution is to specify non-advancing input. This tells read() not to skip to the beginning of the next line after each read statement.

do i = 1,10
   read(100,'(i2)',advance='no') n
   print *, n
end do

Of course if there is more data on another line, the file pointer will have to be moved there using a dummy read(100,*).

The low down on legacy code and languages

It is easy to get terms confused in computing, because every new week there seem to be new ones. When we talk about legacy programs, we are generally talking about old code that is still being used and maintained. Legacy programs are generally coded in legacy languages, i.e. languages or versions of languages which have not evolved. For example it is possible for a program written in 1968 in Fortran IV to still be running on a particular machine (which may itself be a legacy system). This is possible largely because most Fortran compilers are backwards compatible. Fortran IV is a legacy language because it existed in a particular slice in time. In the same vein, K&R C, released in 1978 is also a legacy language.

For languages such as Algol-68, because there are few if any modern compilers, it is more like a dead language. Programs which exist written in Algol-68 are again legacy programs.

Contemporary versions of legacy languages are not legacy languages. For example while Fortran 77 is a legacy language, Fortran 95 or 2003, 2008, or 2018 are modern languages. Modern Fortran is just as relevant as modern C, or Swift.

Why rankings are kind of hogwash

It’s funny how the world revolves around rankings. We seem to have rankings for everything, most notably cities. There’s a ranking for the “quality of living”, a “global cities index”, a “safe cities” index, rankings based on diversity and cultural programming, most livable city, blah, blah, blah. Some are well done, other’s not so much. But here’s the thing, lists and rankings are just one opinion, usually garnered by looking at a few indicators. Academic programs are no different, but here the rankings are often skewed. How you ask? One word – research. Although teaching is the primary role of most universities, it often barely gets a look-in… and if it does there are few if any indicators of good pedagogy, it’s usually stupid things like library budget.

There is one particular ranking in Canada which is prominent over all others… so much so that they surely make a fortune selling their yearly guide to universities. Now I’m sure people read it quite diligently when making a choice, but it’s not the full story. Often even rankings designed to help students to choose a university are biased towards research… and here’s the thing, while research matters, only a limited number of undergraduate students will ever be exposed to it. Research is often too esoteric to make it into the mainstream curriculum, and even more bothersome, some faculty who may be exceptional researchers are abysmal teachers. That’s right, just because someone if brilliant in their subject area does not mean they can teach (if you’re not convinced, watch the Big Bang Theory, it’s closer to the truth than you might think).

This Canadian ranking also provides a ranking in a number of popular fields of which CS is one. The ranking is based on quality and research, and is derived 50% from reputation, and 50% from research reputation. So effectively not really that useful for someone looking for a CS program. It is based on a survey of faculty in the specific fields asking what institutions they considered offered the best programs, and conducted the best research. Hardly an unbiased opinion. What about an institutions ability to teach? What about asking alumni of the programs?

The problem with many of these rankings is that they aren’t exactly comprehensive, especially when it comes to programs like CS. Nobody asks if the CS programs contain relevant and interesting courses, or perhaps innovative pedagogical methods. Do they have innovative undergraduate programs, i.e. one not from the 1980s? Do they have faculty with industry ties or experience? Many faculty in institutions of higher learning have little or no industry experience – so often don’t understand the nuances of software at the commercial level. Rankings don’t take any of this into account, but that’s not a surprise. So to do well in the survey an institution has to have a good reputation for quality amongst the CS community, and have a good research reputation (what does this mean – number of journal papers published?).

So rankings have to be interpreted with a grain of salt. If you are a secondary student looking for an institution to do an undergrad CS degree in, look closely at the courses that are offered by the CS department, and the type of faculty. An institution that ranks in the top 10 of a list may not provide a stellar undergraduate experience. They may have a great research program… but research isn’t everything.

The origins of the word “program” in computing

The word program stems from the Greek prographein = pro (before) + graphein (write), meaning to write publicly. Eventually it evolved to the Greek and late Latin programma, relating to “public notice”. The spelling programme, established in Britain, is from French in modern use and began to be used early 19th century, originally in the “playbill” sense. This eventually evolved to program which is the word most often used in a computer context.

One might expect the computing related context of the word program to stem from the 19th century with the likes of Babbage. The work on his Analytic Engine contains no reference to the word, nor does the work of Ada Lovelace, which only describes “weaving algebraic patterns“.

If we looked up the word program (or programme) in a dictionary from the first half of the 20th century it would define it as “a regular plan of action in any undertaking“. By 1945 the development of ENIAC allowed for the computation of problem solutions based on a “program of operations” [3]. The ENIAC whitepaper [3] discusses the problem of “programming” the ENIAC, and uses the term “program” extensively, but in this context it really refers to the actions required to set up ENIAC for running a computation, in some way a precursor to programs in the modern context. This made sense as the ENIAC predated the use of even the most basic machine code.

In 1948 Goldstine and von Neumann [1] defined the coded sequence of a problem as a “routine”, rather than a program. By 1949 “Mathematics Dictionary“, define programming as “...the process of planning the logical sequence of steps to be taken by the machine.“. This made sense considering the definitions provided by David Wheeler (one of the designers of the EDSAC system) at the “Conference on high speed automatic calculating-machines” in 1949 [2]:

A PROGRAMME is a flowchart showing the operations (in block form) corresponding to the action of the calculator during the solution of the problem.
A ROUTINE is the programme written in the detailed code of a particular machine.

In this context, a routine was likely written in the instruction set of a particular system. A paper in 1950 described a program for playing chess [4] – “The computer operates under the control of a ‘program’. The program consists of a sequence of elementary ‘orders’“. The same author published an article in Scientific American [5] where a program is described as “a sequence of elementary computer orders“. The author also describes “sub-programs”. The program in this instance is still really just an “algorithm” converted to whatever instruction set was used by a particular computer.

The definition likely slightly changed with the appearance of early human-readable “high-level” languages such as Autocode. Brooker, who developed the Mark I Autocode ca. 1955, described “programs” as being written in an idealized language or “instruction code” which would then be converted by the machine to precise instructions [6]. In reality the use of the word “program” likely evolved from the idea of a set of operations, morphing to describe the precursor of an algorithm, and eventually to the code itself.

  1. Goldstine, H.H., von Neumann, J., “Planning and Coding of Problems for an Electronic Computing Instrument”, Part II, Volume III, The Institute for Advanced Study (1948)
  2. Wheeler, D.J., “Planning the use of a paper library”, in Proc. Conference on high speed automatic calculating-machines, p.36-40 (1950)
  3. Eckert Jr., J.P., et al., “Description of the ENIAC and Comments on Electronic Digital Computing Machines“, Moore School of Electrical Engineering, U.Pennsylvania (1945)
  4. Shannon, C.E., “Programming a computer for playing chess”, Philosophical Magazine, 41(314) (1950)
  5. Shannon, C.E., “A chess-playing machine”, Scientific American, 182(2), pp.48-51 (1950)
  6. Brooker, R.A., “The Autocode programs developed for the Manchester University Computers”, The Computer Journal, pp.15-21 (1958)

Why Fortran is easy to learn

Some may think Fortran is an archaic language, and certainly not one that should be taught, but in any respects modern Fortran is an exceptionally good language to teach programming in. Certainly better than C and its idiosyncrasies. Better than Python or Julia you say? In many cases yes. Certainly things like I/O are still problematic for the novice programmer in these languages. Partially it is because modern Fortran is easy to read and comprehend, which is an important factor for the individual learning to program for the first time. The idea of introductory languages is not necessarily that they are suited to commercial endeavours, but rather that they are (i) easy to implement algorithm in, (ii) readable, and (iii) do not contain things that will confuse and bewilder.

Let’s perform the simple task of calculating the distance to the horizon. The calculation is quite simple, all that is required is the height of your eye, i.e. the distance that your eyes are off the surface of the water. The formula for calculating the distance, d, is d=√(2rh), where r is the radius of the earth (6367.45km), and h is the height of the object in metres (divided by 1000 to convert to km). So for an observer standing in a tower 42m above sea level, the distance to the horizon is d=√(2×6367.45×(42/1000)) = 23.13km. So a program to calculate this in Fortran would look like this:

program horizon
   real :: height, dist
   write(*,*) 'Height above sea level(m): '
   read(*,*) height
   dist = sqrt(2*6367.45*(height/1000.0))
   write(*,*) 'Distance to horizon = ', dist, 'km'
end program horizon

There is very little the novice programmer could not understand in this program – even without having any real notion of the syntax of Fortran. The terms read and write imply something is being input and output, the distance calculation is effectively just an equation. The only real sticking point maybe the variable declaration on the second line. The equivalent C program is somewhat more challenging.

#include <stdio.h>
#include <math.h>
int main(void)
{
   float height, dist;
   printf("Height above sea level(m): ");
   scanf("%f", &height);
   dist = sqrt(2*6367.45*(height/1000.0));
   printf("Distance to horizon = %.2fkm\n", dist);
   return 0;
}

Firstly it is four lines longer, and the only thing the same between the two programs is the equation. Two of the lines in the C program are taken up including libraries for I/O and math functions – which Fortran doesn’t require. The printf function deals with output, and is quasi-intuitive… the scanf function is not at all intuitive. What about int main(void) – muddy. What about the Python program, surely it must be better than both Fortran and C?

import math
height = float(input('Height above sea level(m): '))
dist = math.sqrt(2*6367.45*(height/1000.0))
print('Distance to horizon = ', dist, 'km')

Python dispenses with the notion of defining the “program”, allowing the algorithm to take place with little fanfare… I mean it is only 4 lines. But the math library still has to be imported. The prompt for the height is combined together with reading in the value, which is nice and compact, but the user will still question what float() is doing. And Julia?

println("Height above sea level(m): ")
height = parse(Float64, chomp(readline()))
dist = sqrt(2*6367.45*(height/1000.0))
println("Distance to horizon = ", dist, "km")

It suffers from input issues as well, chomping and parsing things – things novice programmers don’t care to know about. The commonality among all these programs is of course the calculation. The usability of a programming language is likely associated with how well it uses appropriate words to describe functions and actions, and uses symbols appropriately.