Memory in C – peeking at the HEX addresses (i)

The nice thing about C is that it allows you to peer into the machine using printf and the %p specifier to output hexadecimal addresses of variables and the like. In this post we’ll look briefly at how the four segments of memory are represented. (The entire program is at the end of the post).

Initialized Data Segment

More commonly called the data segment,which contains global variables and initialized static variables. For example, global variables (outside main):

int globalVar = 24;
char globalStr[] = "global string";
const char *strlit = "string literal";

in addition to localized static variables (inside main):

 static int staticVar = 12;

If the hexadecimal addresses are output:

INITIALIZED DATA SEGMENT
global variable: 24              at 0x105e75038
  global string: global string   at 0x105e7503c
 string literal: string literal  at 0x105e74db8
static variable: 12              at 0x105e75058

Now let’s look at the memory a little closer (with HEX converted to decimal).

INITIALIZED DATA SEGMENT 
globalVar:    size=4    memory position=4394012728
globalStr:    size=14   memory position=4394012732
   strlit:    size=8    memory position=4394012088
staticVar:    size=4    memory position=4394012760

The first global variable, globalVar, is stored at position 4394012728, which at 4 bytes in size takes us to the start position of the global string, globalStr. The string literal, strlit, is 8 bytes in length ( on a 64-but machine), and signifies a pointer. The global character array globalStr starts at 4394012732, and is 14 bytes in length (13 + 1 for ‘\0’).

Note that the data segment can be further classified into initialized read-only, and initialized read-write regions. The variables globalvar, globalStr, and staticVar are stored in the read-write region. In the case of strlit, the string literal, “string literal” is stored in the read-only region, and the character pointer variable strlit is stored in the read-write region.

UNInitialized Data Segment

The uninitialized data segment, often called the BSS (block started by symbol.), contains all global variables and static variables that are initialized to zero or do not have explicit initialization in source code. A global variable  (outside main) with no initial value can be created in the following context:

int globalVarUninit;

This is stored in the uninitialized data segment in the following manner:

UNINITIALIZED DATA SEGMENT 
uninitialized global: 0 at 0x105e7505c

Now let’s look at the memory a little closer (with HEX converted to decimal).

UNINITIALIZED DATA SEGMENT 
globalVarUninit:    size=4    memory position=4394012764

Notice that the memory location is after the last memory position in the initialized data segment.

 

 

Advertisements

Segmenting eyes through thresholding (ii)

Having done some experimentation, it is now time to expand the exploration to testing a series of images with the algorithms that how the most promise, namely:

  • Global: Otsu, Triangle
  • Localized: Phansalkar

Here are four images, with the image used in our original experimentation in the top-right.

 

Global threshold with Otsu

In three of the images, the iris has been quiet well segmented, however in the image where the iris is most differentiated from the white region of the eye (top-left), Otsu has failed to threshold the iris successfully.

Global threshold with Triangle

In the first test image, thresholding using the Triangle algorithm must have been somewhat of outlier as it failed to threshold the pupil, however in the lower two images, it has successfully extracted the iris.

Localized threshold with Phansalkar

The other localized algorithm, Phansalkar, has extracted the pupils successfully in each image. Each pupil could then be extracted as the largest circular shaped “blob” in each image, after some post-processing to clean up the image. Eyelashes are also easily extracted (for anyone wanting to do eyelash detection!

Segmenting eyes through thresholding (i)

Let’s evaluate the usefulness of using simple thresholding (binarization) to extract different regions of interest from the image of an eye. In this case let’s try for pupil and iris segmentation. There are two ways we could proceed here: using a global thresholding algorithm, or a more localized one. When choosing a thresholding algorithm, it is often a case of trial-and-error, so having a means of testing a whole series of thresholding is an effective way of doing this. This option is available in ImageJ, and it’s one of the reasons ImageJ is my go-to application for playing around with images, during the exploration phase. Let’s consider this image of the eye:

Now let’s test this from the perspective of global and local thresholding algorithms (in grayscale). Let’s do the global thresholding algorithms first.

The results are what one would consider expected. Most algorithms have delineated the iris quite effectively, with the Triangle algorithm actually extracting the pupil. Here is the histogram, to get some perspective:

There are two peaks in this histogram, at either end of the spectrum representing the white of the eye, and the pupil. In the middle is a large plateau representing the rest of the image. However not every image of the eye will look the same.

Moving on to the localized thresholding algorithms:

Here only one algorithm, that of Phansalkar, actually segments the pupil in a reasonable manner. Localized algorithms don’t rely on the global histogram, so aren’t distracted by differences between images. So where does this leave us? With some preliminary data about the algorithms capacities to segment parts of the eye – not perfectly of course, because there is no thresholding algorithm that is perfect, largely because there is no image that is truly perfect.

Next we will see how these three algorithms perform on three other images:

  • Global: Otsu, Triangle
  • Localized: Phansalkar

 

Programming languages are the interface to the machine

We interact with machines in many ways, but the most primeval means is by writing programs. Programs become the soul of the machine, without them, the machine is useless, soulless. The programming language is the medium between the world of humans and the world of 1’s and 0’s.  Yet despite 60 odd years of programming language design, we are yet to design one which is easy for everyone to use. This is partially due to the intricacies of the machine, but also due to the fact that we don’t design for language usability per se. The easiest language for people to learn programming language concepts is a simple one – very little syntax, easy to decipher. The problem of course is more complex than that because programming does not only involve writing code.

There is the task of solving a problem, then developing an algorithm, and converting that algorithm into a programming language. Nothing is trivial. For example it is easy to state a problem of the form: “find all faces in a photograph“. The problem *seems* simple because to humans it is. Developing an algorithm is considerably more complex.

There are also differences between natural language and the programming language, what some would term linguistic transfer. Programming is difficult for novices because of how humans interpret a language. A case in point is the use of  while to indicate a loop.

while (n < 1000)
  -- do something constructive
end

The term while could be construed as a period of time, but more often it is misinterpreted as being applied continuously (i.e. a continuously active test), versus being evaluated once per iteration of the loop. In English one could say “while the sun is shining, play outside“, which is a loop of sorts, but not one we actively think about. We don’t look at our environment every minute to determine whether the sun is still shining. A novice programmer may misconstrue a while loop to terminate the minute the condition is satisfied.  It is an instinctive thing. Ada may actually have had a good idea with the concept of a loop where the exit is introduced within the loop itself.

loop
   exit when (n < 1000);
   -- do something
end loop;

We know a physical loop is continuous, be it in the form of a rope loop, or a loop in a road (or even the new Apple HQ building at Apple Park). Leaving the loop in the road involves making a conscious choice to leave while in the loop. This actually makes sense in many ways because there is one generic loop to deal with most situations. It could also deal with the case where one wants to do something n times, but that may itself be better expressed using something like a repeat statement:

repeat n times
   -- do something
end

Or what about this?

do repeat i with values 1 to n
   -- do something
end do

Too wordy? Likely. The same issues are encountered by the if statement. Novice programmers might not yet understand the sequential life of a program, and so they might consider an if statement to be “waiting” continuously, instead of being executed in sequence. Something like “if it starts raining, open an umbrella“. It might seem easy to the experienced programmer, but not so to the novice.

Is there a definitive solution? Unlikely. The use of the existing programming vernacular is ingrained into programmers. We have never really delved into the terms used for loops since the for loop appeared in Algol, and the while loop shortly afterwards. It might be possible to make small changes over time… but computer science rarely does things in small ways. We likely have to find better ways of teaching people to program.

 

 

Coding Fortran: file I/O

File I/O in Fortran is not that hard, and there are a lot of options provided. Let’s look at it from the perspective of a small program that performs file writing.

program fileIOwrite

   character (len=25) :: fname
   integer :: i,n,fsize
   logical :: lexist
   character :: formL

   write(*,*) 'filename? '
   read(*,*) fname
   n = 100

   inquire(file=fname, exist=lexist, form=formL, size=fsize)
   if (lexist) then
      write(*,*) 'File exists and is ', formL
      write(*,*) 'The file is ',fsize,' bytes in size'
      write(*,*) 'It will be overwritten'
      open(unit=20,file=fname,status='replace',action='write')
   else
      write (*,*) 'File does not exist - will be created.'
      open(unit=20,file=fname,status='new',action='write')
   end if

   do i = 1,n
      write(20,*) i
   end do
   close(20)

end program

The first function, inquire(), is used to determine if the file entered by the user exists, and if it does, extracts some information from the file specs. If the file exists (i.e. lexist is true), then some information about it is output to standard output, and the file is opened as an overwrite (status=’replace’) (i.e. the data in it is lost). If the file doesn’t exist, open() is used to create a new file.

Note that the subprogram for opening the file, open(), has a number of parameters:

  • status :  ‘old’, ‘new’, ‘replace’, ‘scratch’ or ‘unknown’. If ‘old’ is specified the file must exist; if ‘new‘ the file must not exist; if ‘replace‘ and the file exists it will be deleted before a new file is created; and if ‘scratch‘ the file will be deleted when closed. In general use ‘old‘ for input and ‘new‘ for output.
  • action : read, write, or readwrite.

There are a bunch of other parameters including, iostat (execution success), form (formatted or unformatted), access (sequential or direct), or position (asis, rewind, append), to name but a few. Note that open() associates a “unit number specifier”, with the file being opened. In the example above that is the number 20. All interactions with the file now specify this number. This is different to many programming languages where you have to specify a file pointer. The file is closed using close().

Note also how the user can specify the filename.

Coding Ada: packages (+ records and file I/O) (ii)

Now, let’s deal with the package body (imageP.adb). It really just contains the implementations for each of the three subprograms outlined in the specification. This piece of code actually illustrates a number of concepts in Ada.

package body imageP is

procedure readImage(img: out image; fname: in unbounded_string) is
   infp: file_type; 
begin
   open(infp,in_file,to_string(fname)); 
   get(infp,img.dx);
   get(infp,img.dy);
   for i in 1..img.dx loop
      for j in 1..img.dy loop
         get(infp,img.pixel(i,j));
      end loop;
   end loop;
   close(infp);
end;

function meanFilter(img: in image) return image is
   sum: integer;
   filtimg : image;
begin
   filtimg.dx := img.dx;
   filtimg.dy := img.dy;
   for i in 1..img.dx loop
      for j in 1..img.dy loop
         filtimg.pixel(i,j) := 0;
      end loop;
   end loop;

   for i in 2..img.dx-1 loop
      for j in 2..img.dy-1 loop
         sum := 0;
         for x in -1..1 loop
            for y in -1..1 loop
               sum := sum + img.pixel(i+x,j+y);
            end loop;
         end loop;
         filtimg.pixel(i,j) := integer(float(sum)/9.0);
      end loop;
   end loop;
 
   return filtimg;
end;

procedure writeImage(img: in image; fname: in unbounded_string) is 
   outfp: file_type;
   ans: character;
begin
   if exists(to_string(fname)) then
      put_line("File exists - overwrite (Y/N)? ");
      get(ans);
      if ans = 'Y' then
         create(outfp,out_file,to_string(fname));
      else
         put_line("exiting - image not saved");
         return;
      end if;
   else
      create(outfp,out_file,to_string(fname)); 
   end if;
 
   set_output(outfp);
   put(img.dx,width=>4); new_line;
   put(img.dy,width=>4); new_line;
   for i in 1..img.dx loop
      for j in 1..img.dy loop
         put(img.pixel(i,j),width=>4);
      end loop;
      new_line;
   end loop;
   set_output(standard_output);
   close(outfp);
end;

end imageP;

There is nothing intrinsically different with any of the subprograms, so we will not go through them too deeply, except to point out the new features. Both the procedures readImage() and writeImage() contain a number of file I/O directives. Let’s consider readImage() first:

1 infp: file_type; 
2 open(infp,in_file,to_string(fname)); 
3 close(infp);

The code on line 1 creates a file pointer.  The next piece of code opens the file specified by the user with the file pointer, and opens it for input using the specifier in_file. The code on the third line just closes the file. Note that because the filename fname specified by the user is input as an unbounded_string, it must be converted to a fixed string using to_string(). In the procedure writeImage() there are a couple of neat things. The first is the function exists(), which is from the package directories, which determines whether or not a file exists (also used in the main program at the end of the discussion).

exists(to_string(fname))

To create a file that does not exist, or overwrite one that does, we can use the function create().

create(outfp,out_file,to_string(fname));

Now the interesting part is a function called set_ouput(), which can be set to the output file pointer, outfp, and then the file pointer does not need to be specified explicitly for each individual output function. This continues until the output is reset to standard_output. i.e. we can use put(img.dx,width=>4) instead of put(outfp, img.dx,width=>4).

set_output(outfp); 
set_output(standard_output);

Here is a sample calling program:

with ada.Text_IO; use Ada.Text_IO;
with ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with ada.strings.unbounded; use ada.strings.unbounded;
with ada.strings.unbounded.Text_IO; use ada.strings.unbounded.Text_IO;
with ada.directories; use ada.directories;
with imageP; use imageP;

procedure testpack is
   filename: unbounded_string;
   imgO, imgF: image;
begin
   loop
      put_line("Enter input image filename: ");
      get_line(filename);
      exit when exists(to_string(filename));
   end loop;

   readImage(imgO,filename);
   imgF := meanFilter(imgO);
   put_line("Enter output image filename: ");
   get_line(filename);
   writeImage(imgF,filename);

end testpack;

Note also that in Ada, it is possible to return an array or record, as shown in function meanFilter().

 

Coding Ada: packages (+ records and file I/O) (i)

Like the modules in Fortran, Ada has its own “collection”, and it’s called a package. In the following example, I have created a small “image processing” package which has three functions, and a data structure to hold the image data in the form of a record. The code also contains some ideas for file I/O, which is an interesting part of Ada, because it never really was a priority, as Ada was designed for embedded systems. Obviously image processing in languages like Ada is not ideal, mostly from the perspective of verbosity – too many loops to iterate through pixels.

A package is made of two files: a specification and a body. The specification contains information about the core structure of the package, and the body contains the actual subprogram implementations. Here is the specification for the package imageP (imageP.ads).

with ada.Text_IO; use Ada.Text_IO;
with ada.Integer_Text_IO; use Ada.Integer_Text_IO;
with ada.strings.unbounded; use ada.strings.unbounded;
with ada.strings.unbounded.Text_IO; use ada.strings.unbounded.Text_IO;
with ada.directories; use ada.directories;

package imageP is

   type pixelMatrix is array(1..200,1..300) of integer;
 
   type image is 
      record
         pixel : pixelMatrix; 
         dx : integer;
         dy : integer;
      end record;
 
   procedure readImage(img: out image; fname: in unbounded_string);
   function meanFilter(img: in image) return image;
   procedure writeImage(img: in image; fname: in unbounded_string);
 
end imageP;

First an integer array type pixelMatrix is created. Adding an array directly in the record will result in a compiler error of the form: “anonymous arrays not allowed as components“.  The reason why is that anonymous arrays are extremely restricted because they make new incompatible types. If we created two different records, each containing an array, they would be considered dissimilar, even if syntactically identical. Creating a type alleviates this problem. The record image contains a pixelMatrix item, and two integers, dx, and dy to hold the dimensions of the image. Note that individual components of the record can be accessed as follows:

imgI : image;
imgI.dx := 100;
imgI.dy := 100;
imgI.pixel(i,j) := 204;

The remainder of the specification contains the three subroutines: readImage(), meanFilter(), and writeImage().

  • readImage() – Procedure to read a “text image” from an ASCII file. The text image is comprised of the number of rows, number of columns, followed by the pixel data.
  • meanFilter() – Function to performs a 3×3 neighbourhood mean filtering on the image. Returns a filtered  image. Does not modify the original image.
  • writeImage() – Procedure to write an image to an ASCII file as a “text image”. Creates a file if it does not exist, and if an attempt is made to overwrite,

The specification also includes any package dependencies required by the package body. Once both specification and body are completed, they can be included in an Ada program in the same way any other packages are. Compiling a program using a package is as easy as compiling the main program.

 

Why I like organic coding

If you have looked through my blog, you will know that I’m a big fan of programming in Julia. What I don’t like is how Julia and other languages such as Python have added a plethora of more complexity as they evolve. If you look at the language as a whole, it is extremely well designed. A novice programmer can easily written effective code to analyze and process data. The problem comes when they try to understand code that isn’t very organic. What do I mean by organic code? Consider the following piece of Julia code which performs median filtering on a grayscale image.

The code is fairly easy to decipher. There is no need for complex vectorization, dynamic memory management, or fancy syntax. I’m not a big fan of fancy syntax. To me, organic coding is coding that is plain and simple, not obfuscated in any manner. While a language such as Julia can be quite complex, it is possible to use a subset of the language, and make coding easy. Simple code doesn’t need things like objects. Organic code is also open-source. I once did a bit of programming in Matlab¹, but to be honest its closed-source nature makes it hard for people to learn transferable skills.

¹If you want to read more about why Matlab is not ideal, read this blog post, I Hate Matlab: How an IDE, a Language, and a Mentality Harm, by Olivia Guest.

Coding Ada: strings (ii) – unbounded

If variable length strings are needed, then an unbounded string should be used, specifying the appropriate packages:

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Strings.Unbounded.Text_IO; use Ada.Strings.Unbounded.Text_IO;

Unbounded strings are not arrays of characters, and are stored on the heap (and automatically deallocated). For example, word is declared as an unbounded string, and can have any size:

word : unbounded_string;

The unbound string can be input using get_line().

get_line(word);

unbounded string ELEMENTS

Due to the fact that unbounded strings are not really arrays, it is more challenging to access them. Consider the following piece of code, which just prints out each of the elements of string word:

 for i in 1..length(word) loop
    put(word(i));
 end loop;

Compiling this will likely result in an error of the form: “array type required in indexed component“. This is because an unbounded string can’t be indexed in the same way a fixed string can. In order to make this work, it requires the use of the “function” element(), where the parameters are the name of the unbounded string (word), and the element to access (i):

 for i in 1..length(word) loop
    put(element(word,i) & " ");
 end loop;

The same with modifying an element. For this you need to use the function replace_element(), for example:

replace_element(word,1,'v');

This will replace the first element of the string word with the character “v“.

Functions

As seen in the above example, the length of an unbounded string can be determined using the function length().

Conversions

It is not possible to just assign strings to unbound strings, they must be converted using functions:

  • to_unbouded_string – convert from a fixed string to an unbounded string
  • to_string – convert from an unbounded string to a fixed string.

 

Coding Ada: strings (i) – fixed

Strings are tricky in Ada. There I said it.

Strings in Ada have their own type.  Let’s say we want to declare a string with with 10 characters in it (this is an explicit, uninitialized string):

str : string(1..10);

Now Ada strings are fixed length, which means its length will never change. It also means a string variable can only be assigned string that match its length. If you then want to use get(str) to read the string from standard input, it will expect to read in 10 characters, and it will wait until all 10 have been read. So if you were to input:

sith
sith
sith

The string stored would be sithsithsi. You can’t store strings less than 10 characters in str. Strings can be declared in a number of ways:

s1 : string := "Lord Vader";
s2 : string(1..10);
s3 : string(1..10) := "Lord Vader";

Here are some characteristics of arrays:

  • Array assignment requires arrays of same length on both sides.
  • Strings of different lengths can be compared.
  • Slices of strings, e.g. s1(2..4) can be compared.
  • A string of length 1 is not the same as a character.
  • For I/O, get_line() and put_line() are defined for strings.

To read in strings, use get_line(). It’s basic syntax is get_line(S,L), where the output parameters are: S, the input string, and L, the number of characters that were input.

Note that strings do lack flexibility. If you were to read in a file of words (e.g. dictionary), where the length of the words varies, it is difficult to do this using a normal array of strings, because of the need to fill up the words. To do this we need to use unbounded strings. In essence, Ada has three types of strings:

  • fixed – the strings as discussed above
  • bounded – the current size and the string are combined in one type. The size of a string can change, but it must always be less than some predefined maximum.
  • unbounded – the size can change as required.