Want to learn about legacy programming languages and re-engineering?

cis3190ad

Advertisements

Programming moisture vaporators

Every year in my introductory programming class there are a bunch of students who say they are taking programming because they “have to”, and complain that they will never need to program.

Really?

30 years ago, I would have said this was true, but the now reality is quite different. Yes, there are likely some skills that will never be that useful. I count chemistry amongst those. Sure, had they taught more about *real* everyday chemistry it might have been useful, but laboratory chemistry just isn’t. It’s not like we had chemistry sets to tinker with (I mean it wasn’t the 1950s). Even calculus… useful for a small minority, likely less useful for the rest. Statistics? – yes, one of the few things that would be extremely useful if more people learned about it.

But programming? It didn’t matter so much when we had little data to process, or processed the data by hand. Now all these scientific fields – chemistry, physics, biology – they all have BIG data to parse. So knowing how to program makes sense if you want to have the ability to process the data. Even people like mechanical engineers need to have some idea how to program. Some laugh at this… but they won’t laugh forever. Just like mechanical engineering relies on the ability to build things, so too in the age of computers does it rely heavily on automated systems such as CNC machines. And guess what? There is no fancy way of creating code to run these machines. Many of them run using a language called APT, or Automated Program Tool, first appearing in the late 1950s, and still a standard internationally.

So very few can afford to ignore programming. Okay, so it may be slightly mundane at times. But, by learning to program you will gain a skill which may be of use to you later in you career. You may never have to program in the binary language moisture vaporators – but you never really know do you?

 

Star Wars and Valerian

With the release of “The Force Awakens”, it seems appropriate to delve into the “inspirations” for some of the early concepts in the Star Wars movie. Star Wars was the quintessential sci-fi movie of its era, and frankly few movies have surpassed it. Now one usually does not question the origin of these things – that is until one reads the French graphical novels of  Valérian: Spatio-Temporal Agent, written by Pierre Christin and artist Jean-Claude Mézières. The graphic novels first started appearing in 1967, with the series focusing on the adventures of the dark-haired Valérian, a spatio-temporal agent, and his redheaded female companion, Laureline, as they travel the universe through space and time. The early graphic novels are available in English from publisher Cinebook.

There are aspects of the movies which seem mysteriously similar to parts of the books – which came first. Firstly, there is the Millennium Falcon (Star Wars, 1977) – it looks very similar in design to Valerian’s XB982 astroship (Welcome to Alfolol, 1972).

valerian6

The slave-girl costume worn by Laureline in World Without Stars (1972) and the costume worn by the character Princess Leia  in the scenes where she is enslaved by Jabba the Hutt in Return of the Jedi (1983).

valerian2

A scene in Empire of a Thousand Planets  (1971) where Valérian is encased in a liquid plastic and a scene in The Empire Strikes Back where the character Han Solo is encased in a substance called carbonite (1980).

valerian4

A scene in Empire of a Thousand Planets (1971) where one of the “enlighteneds”  removes his helmet to reveal his radiation burned and scarred face underneath and a scene in Return of the Jedi where the character Darth Vader removes his helmet to reveal the burned face of Anakin Skywalker.

valerian3

A second scene, also from Empire of a Thousand Planets, shows the enlighteneds with their helmet on. Note the similarity in the curves of the helmet to Vaders?

valerian1

The alien Shingouz from Ambassador of the Shadows (1975) and the character Watto seen in The Phantom Menace (2003).

valerian7

What about the resemblance between the ships from Terran fleet in  Ambassador of the Shadows (1975) and the Mon Calamari cruisers from Return of the Jedi?

valerian5

And finally, the incredible likeness between the Ambassador from Point Central (Ambassador of the Shadows, 1975) , and Grand Moff Tarkin (Star Wars, 1977)?

valerian8

One does have to query the similarity between the Valerian books and the Star Wars trilogy. So much so, that the authors included a scene in one of of their later comics where Valerian and Laureline meet Luke and Leia in a cantina. Leia says something to the effect of “you look very familiar to us” and Laureline responds with “oh, no, we’ve been here for years”.

expo_starwars

If you have never read Valérian: Spatio-Temporal Agent, then order an English translation from Cinebook (there seem to be 21 issues in French). The latest English translation “The Ghosts of Inverloch” is due out in March. And watch out for summer 2017, when the movie “Valerian and the City of a Thousand Planets” is released.

 

The heap – malloc() and free()

I don’t want to delve too deep into pointers in C. But here’s some additional information on the two most commonly used functions associated with heap memory – malloc() and free().

malloc()

The malloc() function takes as its argument an unsigned integer which represents the size of the memory block requested from the heap (in bytes). Then malloc() returns a pointer to a new heap block if the process is successful., and NULL otherwise (usually because the heap is full). An easy way of calculating the number of bytes needed is by using the sizeof() function. For example, sizeof(int), will return the size, in bytes of an int.

free()

The function free() takes a pointer to a heap block, and returns it to the free pool for reuse later. The pointer that is the argument to free() must be the same one returned earlier by malloc(). There is no requirement for providing the size of the heap block, the memory manager will deal with this. Failure to deallocate memory in programs where the use of the heap is prolific can result in memory leaks, and nasty thing happening.

An example

void BadHeap() 
{
    // allocate the pointer, but not the pointer
    int *p;
    // this dereference is a serious runtime error
    *p = 42; 
    // this will cause a Segmentation fault
    printf("%d\n",*p);
}

Compare this with the proper way of doing things:

void GoodHeap() 
{
    // allocate the pointer
    int *p;
    // allocate heap memory to the pointer
    p = malloc(sizeof(int));
    // allocate a value to the heap memory
    *p = 42;
    // print out the value 
    printf("%d\n", *p);
    // free the allocated memory
    free(p);
}

If you like this, there is more fun to be had with calloc(), valloc(), and realloc().

 

 

Memory and C – how the stack works

To illustrate memory in C, consider the following program:

 #include <stdio.h>
 double pi = 3.14159;
 double circleArea(double r) 
 {
     return pi * r * r;
 }
 int main(void)
 {
     double area, radius;
     scanf(“%lf”, &radius);
     area = circleArea(radius);
     printf(“Area = %.2f\n”, area);
     return 0;
 }

Now consider the following diagram relating to the programs memory when it is run:

stackmemory2

The left portion of the diagram represents the stack up until the point in the program where the user has entered a value for the radius and it has been stored in the variable radius. The value stored in the global variable pi is stored in the static region. The middle portion of the diagram represents the call to the function circleArea, which has created a new frame in the stack showing various internal variables. The right portion of the diagram shows the stack after the call to circleArea has been completed. The function stack frame has been deleted and not the variable area contains the value of the area calculated, which was returned from the function.

Memory in C – the stack, the heap, and static

The great thing about C is that it is so intertwined with memory – and by that I mean that the programmer has quite a good understanding of “what goes where“. C has three different pools of memory.

static: global variable storage, permanent for the entire run of the program.
stack: local variable storage (automatic, continuous memory).
heap: dynamic storage (large pool of memory, not allocated in contiguous order).

stackmemory4

Static memory

Static memory persists throughout the entire life of the program, and is usually used to store things like global variables, or variables created with the static clause. For example:

int theforce;

On many systems this variable uses 4 bytes of memory. This memory can come from one of two places. If a variable is declared outside of a function, it is considered global, meaning it is accessible anywhere in the program. Global variables are static, and there is only one copy for the entire program. Inside a function the variable is allocated on the stack. It is also possible to force a variable to be static using the static clause. For example, the same variable created inside a function using the static clause would allow it to be stored in static memory.

static int theforce;

Stack memory

The stack is used to store variables used on the inside of a function (including the main() function). It’s a LIFO, “Last-In,-First-Out”, structure. Every time a function declares a new variable it is “pushed” onto the stack. Then when a function finishes running, all the variables associated with that function on the stack are deleted, and the memory they use is freed up. This leads to the “local” scope of function variables. The stack is a special region of memory, and automatically managed by the CPU – so you don’t have to allocate or deallocate memory. Stack memory is divided into successive frames where each time a function is called, it allocates itself a fresh stack frame.

Note that there is generally a limit on the size of the stack – which can vary with the operating system (for example OSX currently has a default stack size of 8MB). If a program tries to put too much information on the stack, stack overflow will occur. Stack overflow happens when all the memory in the stack has been allocated, and further allocations begin overflowing into other sections of memory. Stack overflow also occurs in situations where recursion is incorrectly used.

A summary of the stack:

  • the stack is managed by the CPU, there is no ability to modify it
  • variables are allocated and freed automatically
  • the stack it not limitless – most have an upper bound
  • the stack grows and shrinks as variables are created and destroyed
  • stack variables only exist whilst the function that created them exists

Heap memory

The heap is the diametrical opposite of the stack. The heap is a large pool of memory that can be used dynamically – it is also known as the “free store”. This is memory that is not automatically managed – you have to explicitly allocate (using functions such as malloc), and deallocate (e.g. free) the memory. Failure to free the memory when you are finished with it will result in what is known as a memory leak – memory that is still “being used”, and not available to other processes. Unlike the stack, there are generally no restrictions on the size of the heap (or the variables it creates), other than the physical size of memory in the machine. Variables created on the heap are accessible anywhere in the program.

Oh, and heap memory requires you to use pointers.

A summary of the heap:

  • the heap is managed by the programmer, the ability to modify it is somewhat boundless
  • in C, variables are allocated and freed using functions like malloc() and free()
  • the heap is large, and is usually limited by the physical memory available
  • the heap requires pointers to access it

An example of memory use

Consider the following example of a program containing all three forms of memory:

#include <stdio.h>
#include <stdlib.h>
int x;          

int main(void) 
{
    int y;   
    char *str; 

    y = 4;
    printf("stack memory: %d\n", y);

    str = malloc(100*sizeof(char)); 
    str[0] = 'm';
    printf("heap memory: %c\n", str[0]); 
    free(str);          
    return 0;
}

The variable x is static storage, because of its global nature. Both y and str are dynamic stack storage which is deallocated when the program ends. The function malloc() is used to allocate 100 pieces of of dynamic heap storage, each the size of char, to str. Conversely, the function free(), deallocates the memory associated with str.

stackmemory3

 

How to store large images in C without pointers

So in the last post we looked at how to store large arrays dynamically in C. Is there any way to avoid this? Well, there are a couple of tricks.

Trick #1

When dealing with grayscale images, who only have values from 0-255 – which is literally how MOST images are stored. Even colour images contain tuples of Red-Green-Blue values all within the range 0-255. One way of course is to use a short datatype, which is typically smaller than an int, and takes less storage: 2 bytes versus 4 on many systems. But this is system dependent, and still there are many values not needed, i.e. 256-32K. So a better bet? Try using an unsigned char. These datatypes typically use 1 byte of information, so ¼ the data of an int, and they fit exactly in the 0-255 space needed.

Here is an 2000×2000 array created in this manner:

unsigned char image[2000][2000];

This array has a size of 4 million bytes.

Trick #2

The second trick, is rather sly. Create the array using static. For example:

static int image[2000][2000];

This means when the program runs, it will use statically allocated memory instead of the automatically allocated memory (i.e. the stack).

This array has a size of 16 million bytes.

 

How to store large data in C?

When C was designed, data was small, and life was good. One could learn C, and avoid all the nastiness of the pointers associated with it. But as data has grown, so too has the need for novice programmers to learn about memory, and how to manipulate it. Okay, so there is nothing wrong with learning about memory, but sometimes you just want to solve a problem. Take the example of digital images. When the age of digital cameras dawned in the mid 1990s, the resolution of images was below 0.5 megapixels – something like 640×480 pixels. It was easy to store such images in an array created on the stack, something of the form:

int array[500][700];

However now, the size of images has ballooned. A 12 MP image will be 4000×3000 pixels in size – 12 million pixels, each of which is stored using a 4-byte int. This is 48 megabytes (assuming a grayscale image). The normal stack size may be around 5MB. Trying to store this in an array in C:

int array[3000][4000];

would cause a segmentation fault when the program runs. Just NOT enough memory, and there is only one way to solve the problem – use the heap. By using the heap, we can create a dynamic array of any size. There are two ways of doing this, (i) allocating a single piece of n×m memory on the heap, OR (ii) allocate an array of arrays – allocate one array of n rows, and to each row allocate m columns.

Method 1: A single array

First create a pointer to an int, and then allocate memory to it:

int *image;
image = (int *)malloc(sizeof(int)*n*m);

Now we can set every element to zero:

for (i=0; i<n; i=i+1)
    for (j=0; j<m; j=j+1)
        image[i*m+j] = 0;

Notice that there is no image[i][j] syntax being used here. The compiler has no idea where the next row starts because it is a 1D array, so a single index value must be used which is an amalgam of the row and column indices and the dimension of the column.

Method 2: An array of arrays

The second method allows the use of the traditional [i][j] indexing. In this situation an array of int arrays is created:

int **image;

Now allocating memory to this is a little trickier. First we have to use malloc to allocate an array of n pointers to ints.

image = malloc(n * sizeof(int *));

Then for each row, malloc allocates m buckets of memory:

for (i=0; i<n; i=i+1)
    image[i] = (int *)malloc(m * sizeof(int *));

Now using this method, every element can be set to zero in the following way:

for (i=0; i<n; i=i+1)
    for (j=0; j<m; j=j+1)
        image[i][j] = 0;

How does one pass this “array” to a function? Consider putting the zeroing code above in a function called setArrayZero():

void setArrayZero(int **img, int n, int m)
{
    int i, j;
    // Set all the elements to zero.
    for (i=0; i<n; i=i+1)
        for (j=0; j<n; j=j+1)
            img[i][j] = 0;
}

This would be called in a conventional manner:

setArrayZero(image,n,m);

NB: Don’t forget that with either method, the memory has to be deallocated when you’re done with it.

Why does int = long in C?

In C, the integer datatypes are shortint, long, and long long. Now technically, each of these should contain a certain range of (signed) numbers, for example (according to the C99 standard):

short:     -32767 to 32767
int:       -32767 to 32767
long:      -2147483647 to 2147483647
long long: -9223372036854775807 to 9223372036854775807

This would make sense, but things aren’t always that simple. It always depends on the system. Here are the specs for a 64-bit OSX system:

size of short     : 2 bytes
size of int       : 4 bytes
size of long      : 8 bytes
size of long long : 8 bytes
range of short     : -32768 to 32767
range of int       : -2147483648 to 2147483647
range of long      : -9223372036854775808 to 9223372036854775807
range of long long : -9223372036854775808 to 9223372036854775807

So long and long long are the same on this system.

On a Raspberry Pi (with a 32-bit ARM processor) however, a different story emerges:

size of short     : 2
size of int       : 4
size of long      : 4
size of long long : 8
range of short     : -32768 to 32767
range of int       : -2147483648 to 2147483647
range of long      : -2147483648 to 2147483647
range of long long : -9223372036854775808 to 9223372036854775807

Here, int and long are the same.

The moral of the story? CHECK the limits of your datatypes before you use them if you plan to process large numbers.