Arrays always seem challenging for novice programmers because there aren’t too many parallels in real life. The closest we come are eggs in a carton (2×6), or wine bottles in a case (3×4), and these are two-dimensional arrays; one-dimensional arrays seem more elusive. Pez in a dispenser act more like a stack. So how does one convey the idea of a container which holds only one type of item? The closest one gets to an illustrative example is something like a divided storage box, but even then, it illustrates a container with a lid, which arrays don’t have. Arrays are essentially containers without walls, or lids, or physical constraints (leaving memory constraints aside).

So while arrays are challenging to conceptualize, they can also be challenging to implement. I have talked previously about the use of 0-based indexing. Whilst it might have seemed natural to Dijkstra, the average person learning to program does not understand why a container has portions that are marked as “division 0”. First let’s consider arrays in C. Here is an integer array with 100 elements in C:
int a[100];
The problem with this declaration is that nothing in it names the concept of an array: the only cue is the pair of square brackets. There is also no indication that indices start at zero (although in C, this is the norm). C likely has the highest cognitive load because the syntax gives the novice no intuitive information. Consider, as an alternative, how other programming languages specify arrays.
Pascal    a : array[1..100] of integer;
Ada       a : array (1..100) of integer;
Fortran   integer, dimension(1:100) :: a
Julia     a = Array{Int64}(undef, 100)
In all cases, the type of the array is clearly specified. In the case of Pascal, Julia, and Ada, the term array is used to indicate that it is a container. In all cases the extent of the array is clearly indicated, and in addition, this often also indicates the indices which can be used. In the case of Fortran, there is no explicit use of the term array, however it does use the attribute dimension to specify the range of values. Also, unlike C, not all languages are limited to starting at index 0. Pascal, for instance, allows both positive and negative indices to be used, allowing the language to conform to an algorithm, rather than the algorithm conforming to the language.
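To make the last point concrete, the sketch below shows the index translation a programmer has to do by hand in a 0-based language when an algorithm is naturally indexed from, say, -5 to 5. It is written in Python purely for illustration; the class name RangedArray and its methods are hypothetical and not part of any library.

```python
class RangedArray:
    """Fixed-size array indexed from `low` to `high` inclusive.

    Hypothetical helper that mimics a Pascal declaration such as
    `array[-5..5] of integer` on top of a 0-based Python list.
    """

    def __init__(self, low, high, fill=0):
        if high < low:
            raise ValueError("high must not be less than low")
        self.low = low
        self.high = high
        self._items = [fill] * (high - low + 1)

    def _offset(self, index):
        # Translate a user-facing index into the 0-based position.
        if not self.low <= index <= self.high:
            raise IndexError(f"index {index} outside {self.low}..{self.high}")
        return index - self.low

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value

    def __len__(self):
        return len(self._items)


# Usage: store i*i for i in -5..5 without shifting the index by hand.
squares = RangedArray(-5, 5)
for i in range(squares.low, squares.high + 1):
    squares[i] = i * i
```

In Pascal or Ada the subtraction hidden in _offset is done by the compiler, which is the sense in which the language conforms to the algorithm.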
In all the cases above it is easier for the novice programmer to understand that a is an array. Specifying the range of indices for the array is also easier for the novice. Take for example the Ada array. Creating a loop to process this array, giving each element a value, is simple:
for i in 1..100 loop
   a(i) := i;
end loop;
This is due in part to the loop being quite easy to understand. Consider the same loop in C:
for (i=0; i<100; i=i+1) a[i] = i;
This loop is just not as intuitive for the novice programmer, partially because it uses the indices 0 to 99. What is likely to happen is that the novice programmer might write a loop in one of the following forms:
for (i=0; i=100; i=i+1)
for (i=0; i==100; i=i+1)
for (i=0; i!=100; i=i+1)
The first will result in an infinite loop, because i=100 is an assignment whose value is always non-zero (true). The second results in no iterations of the loop, because i==100 is false when i is 0. The last one will work, but is very unintuitive. Here there is a lot of scope for the novice to make errors, whereas it is much more difficult to go wrong using the Ada loop.

Consider now arrays in Python, often touted as a very easy language to learn. Firstly, as core Python has no built-in array type (only lists), an array is typically created using NumPy. Here is the same specification as for the languages above, an integer array with 100 values:
import numpy as np
a = np.arange(0, 100, dtype=np.int64)
First, there is nothing which intuitively screams “array”. Next, and probably most problematic for the novice programmer, is the fact that the upper bound is exclusive: the range must be specified as 0..99+1, because specifying “0..99” would result in an array with only 99 elements (0 to 98).
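The half-open convention can be checked with a short sketch. To avoid any third-party dependency it uses the standard library's array module and the built-in range, which follows the same start-inclusive, stop-exclusive rule as numpy.arange; the variable names are illustrative only.

```python
from array import array

# range(start, stop) is half-open: stop itself is never produced.
a = array("q", range(0, 100))          # "q" = signed 64-bit integers; values 0..99
too_short = array("q", range(0, 99))   # stops at 98, so only 99 elements

print(len(a), a[0], a[-1])             # prints: 100 0 99
print(len(too_short), too_short[-1])   # prints: 99 98
```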
Of course there are also issues with using arrays, and the structure of the arrays. The use of square brackets, [ ], may be more intuitive than simple parentheses. So languages like Pascal, Julia and C are likely better than Fortran and Ada here, because it is easy to confuse array parentheses with those used in function calls or expressions. When one extends the arrays to multiple dimensions, how this is specified also contributes to cognitive load. C does this using separate brackets, e.g. a[i][j], which is less intuitive than Fortran’s use of the integrated index, a(i,j).

Finally there are issues with what can be done with arrays. In some languages such as Fortran, setting the entire array to a particular value is easy. For example, in Fortran, the following code creates a 20×20 integer array (theArray), sets the whole array to the value 10, and then sets the central 8×8 region to the value 5:
integer, dimension(20,20) :: theArray
theArray = 10
theArray(7:14,7:14) = 5
This is achieved through whole-array operations, and array slicing, which are convenient features of many programming languages. In C, unless you want to set the whole array to zero, there is no easy way of performing this task, and array slicing is not allowed. This means the code in C might look something like this:
int i, j, theArray[20][20];
for (i=0; i<20; i=i+1)
   for (j=0; j<20; j=j+1)
      theArray[i][j] = 10;
for (i=6; i<=13; i=i+1)
   for (j=6; j<=13; j=j+1)
      theArray[i][j] = 5;
Which one of Fortran or C is easier for the novice programmer to understand and implement?
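For comparison, here is a rough sketch of the same task in plain Python, using nested lists rather than NumPy so that it runs without extra packages. Python lists do support slicing, but the indices are 0-based and the upper bound of a slice is excluded, so the Fortran range 7:14 becomes 6:14. The name the_array simply mirrors the examples above.

```python
ROWS, COLS = 20, 20

# Whole-array initialisation: every element starts at 10.
# A comprehension is used so that each row is a separate list object.
the_array = [[10] * COLS for _ in range(ROWS)]

# Rows and columns 6..13 (0-based) form the central 8x8 block; the slice
# 6:14 excludes its upper bound, matching Fortran's 7:14 (1-based, inclusive).
for row in the_array[6:14]:
    row[6:14] = [5] * 8
```

With NumPy installed, the equivalent would typically be a = np.full((20, 20), 10) followed by a[6:14, 6:14] = 5, which is closer in spirit to the Fortran version but carries the same index translation.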
In conclusion, it seems that languages such as C and Python increase the cognitive load for novice programmers, partially because the use of 0-based indexing is not intuitive, and partially because of the ease of making errors when specifying loops to manipulate the array. Other languages provide a better basis for understanding arrays. This, together with the added features of languages such as Fortran and Julia which allow array slicing, means that the novice programmer can concentrate more on the problem-solving aspects of their algorithm, and less on the syntax of the language.