The code is effectively disassociated from anything legacy at this point… but it still needs some tidying up. Firstly we could combine the final two if statements into an if-else statement, which is effectively what they are. Therefore this code:
if (dist.le.30) vel = 2.425+0.00175*dist*dist
if (dist.gt.30) vel = 0.625+0.12*dist-0.00025*dist*dist

becomes this:

if (dist .le. 30) then
   vel = 2.425 + 0.00175*dist*dist
else
   vel = 0.625 + 0.12*dist - 0.00025*dist*dist
end if
Next, all the old-fashioned conditional operators could be changed, i.e. from .le. to <=. Finally, we could improve readability by adding some spaces to the equations, and removing the stop statement.
Here’s the final piece of code:
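Applying those three changes to the fragment above, a sketch of the result:

```fortran
! velocity calculation, re-engineered: if-else structure,
! modern comparison operators, whitespace around operators
if (dist <= 30) then
   vel = 2.425 + 0.00175*dist*dist
else
   vel = 0.625 + 0.12*dist - 0.00025*dist*dist
end if
```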
There is nothing inherently difficult about the process of re-engineering. This is of course a very simple example, and re-engineering a 10,000-line Fortran program *may* be somewhat of a nightmare… but it *is* possible. Most of all, one has to develop a rigorous process to re-engineer a program. You can’t just go and start editing code, especially as a *lot* of code may not be well documented, or may rely on algorithms which seem cryptic. Stand-alone Fortran functions can be somewhat easier to deal with, as they can always be called from other languages such as Julia.
Got this message the other day at a McDonald’s self-service kiosk. Nice message… doesn’t tell you much about the error… was the printer out of paper? Jammed? Pressed Yes 5 times – same message. Pressed No and it gave me an order number – which didn’t mesh with what was being processed on the board. Overall these kiosks work great, they just run into problems with the physical hardware. What about an app where I could somehow place an order when in the store, and receive a digital receipt? No paper issues at all then.
We have covered calling C from Julia, and calling Fortran is not that different. It is not uncommon for glue languages such as Python and Julia to be able to access existing code from other languages. It makes sense from the viewpoint of not having to re-engineer code that is known to work. However, in Python you have to use something like f2py, the Fortran-to-Python interface generator, or make the Fortran code callable from C and then bind that with Cython (see link). Seems like a pain, right?
Julia makes this much easier, with no “wrappers” or anything similar required. It works the same way as calling C code, except for a couple of differences. So imagine a piece of Fortran code (contained in SG.f90) with three subroutines:
savgol() – performs Savitzky-Golay smoothing on 1D data.
ludcmp() – performs LU decomposition (called by savgol())
lubksb() – solves a series of linear equations (called by savgol())
Now we want to create a piece of Julia code which calls savgol(), without the need to translate the code to Julia. The first thing that needs to happen, of course, is that the Fortran code needs to be compiled into a shared library:
gfortran -shared -fPIC SG.f90 -o SG.so
This creates the shared library SG.so. However, if you try to use the function savgol in ccall, you will likely get an error of the form:
ERROR: ccall: could not find function savgol in library
This is because Fortran compilers tend to mangle, or disfigure, names. This has to do with case insensitivity in Fortran, and is only problematic in the sense that every compiler does things a little differently. In the case of GNU Fortran, it converts all characters to lowercase and appends an underscore. If you aren’t sure, you can always check using the utility nm, which lists the symbol table (or name list) of an object file. So if it is applied to SG.so (nm SG.so), we obtain:
0000000000000ecd T _lubksb_
00000000000011cf T _ludcmp_
000000000000189d T _savgol_
You can see savgol also has an underscore prepended to it (a platform-level symbol convention); this can be ignored. Now it is possible to define the call to ccall() (covered in the next post).
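As a taste of what that looks like, here is a sketch of such a call. The argument list here is purely hypothetical – the real savgol will expect its own parameters – what matters is the mangled symbol name and the fact that Fortran passes every argument by reference:

```julia
# Hypothetical sketch only: savgol's real argument list will differ.
n = 100
signal   = rand(Float64, n)      # input data (illustrative)
smoothed = zeros(Float64, n)     # output buffer

# Note the mangled name :savgol_, and the by-reference passing:
# Ptr{Float64} for arrays, Ref{Int32} for scalar integers.
ccall((:savgol_, "./SG.so"), Cvoid,
      (Ptr{Float64}, Ptr{Float64}, Ref{Int32}),
      signal, smoothed, Ref(Int32(n)))
```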
In the end, not every image can be segmented using thresholding. The best images for thresholding are those where there is good separation between the objects of interest and the background. There are many images which cannot be properly segmented by any means (the presence of colour in images does help, but is not a panacea for segmentation). Good examples of content that leads to reasonable thresholding results include text images, B&W drawings, and images where objects can be easily differentiated. Below are some examples.
Choosing a global or localized thresholding algorithm really depends on the content you wish to segment, and whether or not it is compromised by other things in the image, e.g. a non-uniform background, overlapping objects, etc. There is no perfect thresholding algorithm, and some images are not optimal for turning into a binary image, or even an image with four sections. It would be like trying to segment the image below.
Now we come to the crux of the re-engineering problem: goto
This is, in reality, a small program with uncomplicated unstructured constructs, but everyone has to start somewhere. The first four goto statements effectively form a goto-based if-else statement. These can be converted, which means the code below can become much more readable and actually produce less code.
You can see how this works in the marked-up code shown earlier. This is re-engineered into the following structure:
5 if (totdis .le. 250) then
     tower = 1
     dist = totdis
  else if (totdis .le. 750) then
     tower = 2
     dist = iabs(totdis - 500)
  else
     tower = 3
     dist = 1000 - totdis
  end if
This means that the conditional statements have to change, because the original code jumped to another point in the program if the conditional was true, and only carried out the code directly after the if when the conditional was false. So the first conditional must change from (totdis.gt.250) to (totdis.le.250), and the second conditional changes from (totdis.gt.750) to (totdis.le.750). The code at label 15 becomes the else portion of the if-else structure.
Don’t forget that the newer Fortran if statement uses the then clause, and is terminated with end if. Now that the goto statements have been removed, the associated labels 10, 15 and 20 can also be deleted. Here’s the code at this point:
The next step involves fixing the small stuff, to get the program to actually compile. Most old Fortran programs can be compiled as-is thanks to the backwards compatibility of Fortran compilers. However, change the extension from .for to .f90 or .f95, and something will break. The first thing to break will be the comments: all the “C” delimiters have to be changed to “!”. The “line continuation” operator will also have to change, from + to a trailing &. Sometimes there are other finicky things that have to be fixed as well, but unstructured code will compile… usually.
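As an illustrative fragment (not taken from the program itself), the same statement in the two forms:

```fortran
C     fixed form (.for): "C" comment, "+" continuation in column 6
      vel = 0.625 + 0.12*dist
     +      - 0.00025*dist*dist
```

```fortran
! free form (.f90): "!" comment, trailing "&" continuation
vel = 0.625 + 0.12*dist &
      - 0.00025*dist*dist
```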
Next comes the simple stuff: compressing the first format statement from 3 lines to 2, adding some whitespace in places (e.g. if( → if ( ), and forming single goto statements from “go to“. It is important to keep a sequence of code files when re-engineering. Make some changes, document the changes in the file, and compile the program to make sure nothing breaks. Then make a copy, and make the next round of changes in that file. DO NOT make all the changes in the same file. That’s a CRAZY idea.
Finally, the variable declarations are modified to use :: (e.g. integer tower becomes integer :: tower).
The next step in the re-engineering process involves reading the code. People often forget to read the code. I can see the eyes rolling now… “that may work for small programs, but what if my code is 3000 lines long?”. Yes, I agree, it’s not easy to read a piece of code that long… but what other way is there to understand what is happening? Documentation? – yes that helps, *if* it exists. Maybe it doesn’t. Anyways, let’s look at the code (on paper).
Notice from this analysis that there are two main pieces of unstructured code. The first relates to four goto statements and essentially constructs an if-else if-else statement. In the first if statement, if totdis is greater than 250, then it jumps to the code at label 10. If it is less-than-or-equal-to 250, then it performs the code directly below the if statement, then jumps to label 20, which represents the end of the conditional. Similarly with the next if statement. So by converting this piece of code, four goto statements disappear.
The remaining goto is associated with a global loop, encompassing more of the working code. It’s easy to convert this to a loop. Of course there are a myriad of smaller things, but this analysis has provided a basis for the next step.
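The pattern looks something like this – the loop body and the exit condition (again) are stand-ins, not the actual program’s:

```fortran
! Before: an unstructured loop built from a label and a goto.
!    5 continue
!      ... body ...
!      if (again) goto 5
!
! After: the same control flow as a structured loop.
do
   ! ... body ...
   if (.not. again) exit
end do
```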
Image binarization is intrinsically linked to the image histogram, that 1D representation of the distribution of greys within an image. There are many forms of histogram, as outlined in a previous post. The trick is turning a grayscale image into a binary image without losing key information, and to that end binarization is somewhat of an art. Why, you ask? Shouldn’t it be a science? If you are processing images that are all very much alike, some experimentation will provide a means of thresholding the images so you get very similar results in all cases. For example, consider a factory that makes building toys and uses an image verification system to check that parts are accurate. All parts are photographed using consistent lighting, and there is sufficient contrast between objects and background to make the task easy and reproducible. Consider the image of the gear below (with suboptimal lighting, as it is just an illustration). The task may be to determine that none of the gear’s teeth are defective.
The histogram of this image looks like this:
This represents a bimodal distribution: one mode represents the gear, the other the background region in the image. This is an ideal scenario: a threshold can be selected somewhere in the “valley” between the two peaks. Naturally this image is not completely perfect, as a machine vision system would have optimal lighting which would illuminate the object in a uniform manner. If we binarize the image using Otsu’s algorithm, we get a threshold value of 108.
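For the curious, Otsu’s method is straightforward to sketch: it picks the threshold that maximizes the between-class variance of the two histogram modes. This is a generic sketch working directly on a 256-bin histogram – implementations such as ImageJ’s will differ in details like tie-breaking:

```julia
# Otsu's method over a 256-bin histogram (bin i = count of grey i-1).
function otsu_threshold(hist::Vector{<:Real})
    p = hist ./ sum(hist)              # grey-level probabilities
    best_t, best_var = 0, -Inf
    for t in 1:255                     # candidate split after grey t-1
        w0 = sum(p[1:t])               # weight of the "dark" class
        w1 = 1.0 - w0                  # weight of the "bright" class
        (w0 == 0.0 || w1 == 0.0) && continue
        mu0 = sum((0:t-1) .* p[1:t]) / w0      # dark-class mean
        mu1 = sum((t:255) .* p[t+1:end]) / w1  # bright-class mean
        v = w0 * w1 * (mu0 - mu1)^2    # between-class variance
        if v > best_var
            best_var, best_t = v, t - 1
        end
    end
    return best_t                      # grey level of the threshold
end
```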
The algorithm produces a binary image which shows the teeth quite nicely; however, the inner portion of the gear is missing one side and two of the holes. This is not such a problem if we only care about the gear’s teeth. Would another algorithm do a better job?
Here are the results from applying 16 different binarization algorithms (courtesy of ImageJ):
As you can see from the results, none of them is perfect. Many extract the teeth, but not the inner holes, others extract the inner holes at the expense of some of the teeth. It’s not a perfect system.
Now imagine trying to create a generic thresholding algorithm. Difficult? Yes, highly. So binarization becomes an art, because you often have to take a creative approach to finding the appropriate thresholding algorithm – and in some cases the image may simply be too complex for any algorithm to produce good results.
It is interesting to read about new algorithms in image processing. As mentioned in the previous post, there are a multitude of contrast enhancement algorithms out there. Go on… read one or two of the papers associated with the algorithms. Then *try* to reproduce the algorithm. In some cases it will be easy. In others? Well, let’s just say that with the information given, the algorithm cannot be reproduced. Worse still? The author has taken the liberty of testing the algorithm on images which show dramatic results, or has failed to compare the results against a simple algorithm. Here’s one of the images used to test the algorithm:
Pretty blah, right? That’s because the histogram is skewed completely to the left.
Now many of the algorithms the authors use to compare their algorithm against fail on this image. Not surprising… it has one peak. Histogram equalization? It works quite nicely.
The published algorithm also works quite well… but then again, so does stretching the histogram.
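Histogram stretching really is the simplest of baselines – a linear remap of the occupied grey range onto [0, 255], sketched here for an 8-bit grayscale image:

```julia
# Minimal contrast-stretching sketch: linearly remap the occupied
# grey range of an 8-bit image onto the full [0, 255] range.
function stretch(img::AbstractMatrix{<:Integer})
    lo, hi = extrema(img)
    hi == lo && return copy(img)       # flat image: nothing to stretch
    return round.(Int, (img .- lo) .* (255 / (hi - lo)))
end
```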
This is one of the reasons it’s hard to get excited about the field of image processing sometimes. It has become overwhelmed with lackluster algorithms.