The way arrays are stored in a language doesn’t get much discussion. C is row-major (as are C++, Pascal, and Python’s NumPy by default), while Fortran is column-major (as are Julia, R, and Matlab). But what does this really mean? It’s all about how arrays are stored in memory: row-major stores an array row by row, and column-major stores it column by column. In the overall scope of things, it doesn’t really matter which is used. Here’s what it means from a visual perspective.
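A quick way to see the difference in Julia, which is column-major, is to flatten a small matrix with vec, which returns the elements in the order they sit in memory. The 3 x 3 matrix below is an assumption, numbered row by row to match the 1 to 9 layout the figure describes. Row-major order is simulated by transposing first, since Julia’s built-in Array has no row-major mode. The two offset helpers are hypothetical names for the standard index formulas.

```julia
# A hypothetical 3x3 matrix, numbered row by row as in the figure
A = [1 2 3;
     4 5 6;
     7 8 9]

# Julia stores A column-major, so vec(A) lists the elements in memory order
colmajor = vec(A)                 # [1, 4, 7, 2, 5, 8, 3, 6, 9]

# Row-major order (as C would store it) is the column-major order of the transpose
rowmajor = vec(permutedims(A))    # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# 1-based linear position of element (i, j) under each layout
colmajor_offset(i, j, nrows) = i + (j - 1) * nrows
rowmajor_offset(i, j, ncols) = j + (i - 1) * ncols

println(colmajor)
println(rowmajor)
println(colmajor_offset(2, 3, 3), " ", rowmajor_offset(2, 3, 3))
```

Element A[2, 3] (the value 6) lands at position 8 in column-major memory but at position 6 in row-major memory.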
Does it affect processing efficiency? Kind of, but only in the sense that you should always traverse the data in the order it is laid out. In a 2D column-major array, it is the row index that changes fastest as you move through memory. This means that in a nested loop it is more efficient to iterate over the columns in the outer loop and the rows in the inner loop. In a 2D row-major array, the opposite is true. Of course, if you traverse the indices in the “wrong” order, it is unlikely to make a noticeable difference in a small 2D array, because the whole array fits in the cache anyway.
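To make the loop-order point concrete, the sketch below records which column-major memory position each of the two nested-loop orders visits next. In Julia’s combined form for j in 1:n, i in 1:m, the last range (here i) is the innermost loop. The function name visit_order is hypothetical; LinearIndices is a Julia Base type that maps (i, j) to its linear position.

```julia
# Record the order in which elements of an m x n matrix are visited,
# expressed as positions in column-major memory (1 = first element stored).
function visit_order(m, n; columns_outer::Bool)
    L = LinearIndices((m, n))      # maps (i, j) to its column-major position
    order = Int[]
    if columns_outer
        for j in 1:n, i in 1:m     # i (row index) changes fastest
            push!(order, L[i, j])
        end
    else
        for i in 1:m, j in 1:n     # j (column index) changes fastest
            push!(order, L[i, j])
        end
    end
    return order
end

println(visit_order(3, 3; columns_outer=true))   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
println(visit_order(3, 3; columns_outer=false))  # [1, 4, 7, 2, 5, 8, 3, 6, 9]
```

With columns in the outer loop the positions run 1, 2, 3, ... with no gaps; with rows in the outer loop each step jumps ahead by the number of rows.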
In a large 2D array, however, only part of the array fits in the cache at any one time. Accessing a column-major array in column-row order (columns in the outer loop, rows in the inner loop) is efficient because consecutive accesses are adjacent in memory, so each cache line fetched is fully used and the CPU can prefetch the data required next. In the figure above, this means the data would be accessed as 1, 4, 7. Accessing it in row-column order is less efficient because consecutive accesses are not adjacent: each step jumps ahead in memory by the number of rows. In the figure above, this means the data would be accessed as 1, 2, 3. The result is more cache misses and a reduction in efficiency.
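The cache argument can be put in rough numbers. The sketch below assumes a 64-byte cache line and an array that starts exactly on a line boundary (both idealizations, not figures from this post) and counts how many distinct cache lines the first k accesses touch in each loop order for a 1000 x 1000 Float64 matrix. strides is a Julia Base function; CACHE_LINE_BYTES, byte_offset and lines_touched are hypothetical names.

```julia
const CACHE_LINE_BYTES = 64   # assumed line size: typical, but not universal

# Byte offset of element (i, j) in a column-major Float64 matrix with nrows rows
byte_offset(i, j, nrows) = ((i - 1) + (j - 1) * nrows) * sizeof(Float64)

# Count distinct cache lines touched by the first k accesses, assuming the
# array starts on a cache-line boundary (an idealization).
function lines_touched(nrows, ncols, k; columns_outer::Bool)
    lines = Set{Int}()
    n = 0
    if columns_outer
        for j in 1:ncols, i in 1:nrows          # rows innermost
            push!(lines, div(byte_offset(i, j, nrows), CACHE_LINE_BYTES))
            n += 1
            n == k && break                     # exits the whole combined loop
        end
    else
        for i in 1:nrows, j in 1:ncols          # columns innermost
            push!(lines, div(byte_offset(i, j, nrows), CACHE_LINE_BYTES))
            n += 1
            n == k && break
        end
    end
    return length(lines)
end

A = zeros(1000, 1000)
println(strides(A))                                          # (1, 1000)
println(lines_touched(1000, 1000, 64; columns_outer=true))   # 8
println(lines_touched(1000, 1000, 64; columns_outer=false))  # 64
```

In this idealized model, 64 column-order accesses touch 8 cache lines (8 Float64 values per line), while 64 row-order accesses each land on a different line, which is the extra “cache work” described above.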
In Julia, this means that for a 2D array the following code, whose inner loop runs over the row index i, is more efficient:
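```julia
# Assumes img, imgY, imgCb and im_e3 all have the same dimensions
dx, dy = size(img)
for j = 1:dy, i = 1:dx    # inner loop over i walks down each column (contiguous)
    if imgY[i, j] > imgCb[i, j]
        im_e3[i, j] = 255
    end
end
```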
than:
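```julia
# Assumes img, imgY, imgCb and im_e3 all have the same dimensions
dx, dy = size(img)
for i = 1:dx, j = 1:dy    # inner loop over j jumps across columns (strided)
    if imgY[i, j] > imgCb[i, j]
        im_e3[i, j] = 255
    end
end
```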
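To check the claim on your own machine, the sketch below wraps both loop orders in functions and times them with @elapsed. The function names, the 4000 x 4000 random UInt8 inputs and the single warm-up call are assumptions for illustration; the actual speed difference depends on the CPU and the array size, so no particular timing is claimed here.

```julia
using Random

# Hypothetical helper names; the loop bodies mirror the example above.
function mask_columns_outer!(out, Y, Cb)
    dx, dy = size(Y)
    for j in 1:dy, i in 1:dx      # i (row) innermost: sequential in memory
        if Y[i, j] > Cb[i, j]
            out[i, j] = 255
        end
    end
    return out
end

function mask_rows_outer!(out, Y, Cb)
    dx, dy = size(Y)
    for i in 1:dx, j in 1:dy      # j (column) innermost: stride of dx elements
        if Y[i, j] > Cb[i, j]
            out[i, j] = 255
        end
    end
    return out
end

Random.seed!(1)
Y  = rand(UInt8, 4000, 4000)
Cb = rand(UInt8, 4000, 4000)
out1 = zeros(UInt8, size(Y))
out2 = zeros(UInt8, size(Y))

# Run each function once so compilation time is not included in the timings
mask_columns_outer!(out1, Y, Cb); mask_rows_outer!(out2, Y, Cb)

t_col = @elapsed mask_columns_outer!(out1, Y, Cb)
t_row = @elapsed mask_rows_outer!(out2, Y, Cb)
println("columns outer: $t_col s, rows outer: $t_row s")
```

Both versions produce identical output; only the traversal order differs. For more reliable measurements, the @btime macro from the BenchmarkTools.jl package is the usual choice.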
So know how the data is laid out in the language you are using, and access it accordingly.
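If you would rather not track the layout by hand, Julia can choose the traversal order for you: iterating over CartesianIndices(A) varies the first index fastest, which matches the column-major layout of Array. A sketch of the same thresholding loop written that way, with a hypothetical function name and small made-up inputs:

```julia
# Generic version of the thresholding loop: CartesianIndices iterates with
# the first index changing fastest, i.e. in column-major memory order.
function mask_generic!(out, Y, Cb)
    @assert size(out) == size(Y) == size(Cb)
    for I in CartesianIndices(Y)
        if Y[I] > Cb[I]
            out[I] = 255
        end
    end
    return out
end

Y  = UInt8[10 200; 50 30]
Cb = UInt8[20 100; 40 90]
out = mask_generic!(zeros(UInt8, 2, 2), Y, Cb)
println(out)

# Show the visiting order for a 2x3 index space
visited = Tuple{Int,Int}[]
for I in CartesianIndices((2, 3))
    push!(visited, Tuple(I))
end
println(visited)
```

The visiting order runs (1, 1), (2, 1), (1, 2), ..., so the row index changes fastest, exactly the memory-friendly order for a column-major array.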