How to store large data in C?

When C was designed, data was small, and life was good. One could learn C and avoid much of the nastiness of the pointers associated with it. But as data has grown, so too has the need for novice programmers to learn about memory and how to manipulate it. There is nothing wrong with learning about memory, but sometimes you just want to solve a problem. Take the example of digital images. When the age of digital cameras dawned in the mid-1990s, the resolution of images was below 0.5 megapixels – something like 640×480 pixels. It was easy to store such images in an array created on the stack, something of the form:

int array[500][700];

However, the size of images has since ballooned. A 12 MP image is 4000×3000 pixels in size – 12 million pixels. If each pixel is stored using a 4-byte int, that is 48 megabytes (assuming a grayscale image). The default stack size is usually far smaller, often somewhere between 1 MB and 8 MB depending on the platform. Trying to store the image in a local array in C:

int array[3000][4000];

will most likely cause a segmentation fault (a stack overflow) when the program runs. There is simply not enough stack memory, and the usual way to solve the problem is to use the heap. By using the heap, we can create a dynamic array whose size is limited only by the memory available. There are two ways of doing this: (i) allocate a single block of n×m elements on the heap, or (ii) allocate an array of arrays – allocate one array of n row pointers, and for each row allocate an array of m elements.
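Before looking at the two methods, the 48-megabyte figure can be checked with a short sketch. The helper name imageBytes is hypothetical, and the result depends on sizeof(int), which is assumed to be 4 bytes in the numbers quoted here.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes needed to store a
   rows-by-cols image in which each pixel is one int. */
size_t imageBytes(size_t rows, size_t cols)
{
    return rows * cols * sizeof(int);
}
```

With a 4-byte int, imageBytes(3000, 4000) gives 48,000,000 bytes, whereas the 500×700 array shown earlier needs only 1,400,000 bytes.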

Method 1: A single array

First create a pointer to an int, and then allocate memory for it (malloc is declared in stdlib.h):

int *image;
image = (int *)malloc(sizeof(int)*n*m);

Now we can set every element to zero:

for (i=0; i<n; i=i+1)
    for (j=0; j<m; j=j+1)
        image[i*m+j] = 0;

Notice that the image[i][j] syntax cannot be used here. Because image is a 1D array, the compiler has no idea where each row starts, so a single index must be computed from the row index, the column index, and the number of columns m: element (i,j) is stored at position i*m+j.
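The single-block approach can be wrapped in a couple of small functions. This is only a sketch: the names createImage and pixelAt are hypothetical, size_t is used for the dimensions instead of int, and the overflow guard is an added precaution that the code above does not have.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an n-by-m image as one block on the heap.
   Returns NULL if the size would overflow size_t or if malloc fails. */
int *createImage(size_t n, size_t m)
{
    /* Reject sizes for which n * m * sizeof(int) does not fit in size_t. */
    if (n != 0 && m > SIZE_MAX / sizeof(int) / n)
        return NULL;
    return malloc(n * m * sizeof(int));
}

/* Hypothetical helper: pointer to element (i,j), using the same
   row-major index i*m+j as the loop above. */
int *pixelAt(int *image, size_t m, size_t i, size_t j)
{
    return &image[i * m + j];
}
```

A 12 MP image would be requested with createImage(3000, 4000). The result should be compared against NULL before it is used, and a single call to free() releases the whole block.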

Method 2: An array of arrays

The second method allows the use of the traditional [i][j] indexing. In this situation an array of int arrays is created:

int **image;

Now allocating memory to this is a little trickier. First we have to use malloc to allocate an array of n pointers to ints.

image = malloc(n * sizeof(int *));

Then for each row, malloc allocates an array of m ints. Note that the element size here is sizeof(int), not sizeof(int *):

for (i=0; i<n; i=i+1)
    image[i] = (int *)malloc(m * sizeof(int));

Now using this method, every element can be set to zero in the following way:

for (i=0; i<n; i=i+1)
    for (j=0; j<m; j=j+1)
        image[i][j] = 0;
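The allocation steps for this method do not check whether malloc succeeded. A more defensive version might look like the sketch below. The function name allocRows is hypothetical, and n and m are assumed to be greater than zero.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate n rows of m ints each.
   Returns NULL on failure, after releasing any rows already allocated.
   Assumes n > 0 and m > 0. */
int **allocRows(size_t n, size_t m)
{
    size_t i;
    int **image = malloc(n * sizeof(int *));

    if (image == NULL)
        return NULL;

    for (i = 0; i < n; i = i + 1) {
        image[i] = malloc(m * sizeof(int));
        if (image[i] == NULL) {
            /* Undo the partial allocation before reporting failure. */
            while (i > 0) {
                i = i - 1;
                free(image[i]);
            }
            free(image);
            return NULL;
        }
    }
    return image;
}
```

If allocRows(n, m) returns a non-NULL pointer, every image[i][j] with i < n and j < m is safe to read and write. The elements are not initialised, so they still need to be set before being read.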

How does one pass this “array” to a function? Consider putting the zeroing code above in a function called setArrayZero():

void setArrayZero(int **img, int n, int m)
{
    int i, j;
    // Set all the elements to zero.
    for (i=0; i<n; i=i+1)
        for (j=0; j<m; j=j+1)
            img[i][j] = 0;
}

This would be called in a conventional manner:

setArrayZero(image,n,m);

NB: Don’t forget that with either method, the memory has to be deallocated when you’re done with it.
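Deallocation differs between the two methods, so here is a minimal sketch of both. The function names freeSingle and freeRows are hypothetical; the only library call used is free() from stdlib.h.

```c
#include <stdlib.h>

/* Method 1: the image is one block, so one call to free() releases it. */
void freeSingle(int *image)
{
    free(image);
}

/* Method 2: free every row first, then the array of row pointers.
   Freeing the pointer array first would lose the row addresses. */
void freeRows(int **image, int n)
{
    int i;

    if (image == NULL)
        return;
    for (i = 0; i < n; i = i + 1)
        free(image[i]);
    free(image);
}
```

For an image built with Method 2, the call would be freeRows(image, n), with the same n that was used when the rows were allocated.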
