Why 1-based indexing is *OK*.

Some programmers strongly object to 1-based array indexing in programming languages. The objection resurfaces every time a new language appears that uses it, the latest being Julia. But why? One of the arguments is that 0-based indexing is more natural. Really? Ever start counting anything at 0? How many eggs are in an egg carton – ahhh, 12? But when I take out the first, is it egg number 0?

Starting array indexing at 1 *is* natural for counting. So an array with 100 elements will be indexed from 1 to 100, not 0 to 99. Zero-based indexing is more of an artifact of C than anything else. Languages like C and C++ treat an array index as an offset from a pointer to the first element, so it makes more sense to have their indices start at zero. Finding an array element in contiguous memory is then easy:

Element address = address of array + index × (size of type)
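As a minimal sketch of that formula in Python (the function names and the example base address are hypothetical, chosen only for illustration), the 0-based version uses the index directly, while a 1-based index needs one subtraction before scaling:

```python
def element_address_zero_based(base, index, size):
    """Address of element `index` when the first element has index 0."""
    return base + index * size


def element_address_one_based(base, index, size):
    """Address of element `index` when the first element has index 1."""
    return base + (index - 1) * size


base = 0x1000  # assumed start address of the array, for illustration only
size = 4       # assumed element size in bytes

# Both calls refer to the first element, so both give the base address.
print(hex(element_address_zero_based(base, 0, size)))
print(hex(element_address_one_based(base, 1, size)))
```

A compiler for a 1-based language can usually fold the extra subtraction into the base address, so the difference is about notation more than run-time cost.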

Arrays basically simulate matrices and vectors in mathematics, which used 1-based indices before the computer era, and still do in many cases. Both Fortran and Matlab use 1-based indexing, as do most matrices and vectors. Pascal and Ada allow for indexing to be generic. You want array indexing from -7 to 7… no problem – maybe generic indexing is the best approach? Some types of array manipulation are easier using zero-based indexing, and in certain circumstances zero-based indexing produces cleaner code. Part of the reason why many people like 0-based indexing is the same reason they get nauseous looking at a Cobol program – they are used to programming predominantly in C-like languages, where 0-based indexing is the norm.
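The generic indexing mentioned above can be imitated in a 0-based language with a thin wrapper. The following is only a sketch: `OffsetArray` is a hypothetical helper written for this example, not a standard Python class, and it shows how bounds such as -7 to 7 map onto ordinary storage.

```python
class OffsetArray:
    """Fixed-size array indexed from `low` to `high` inclusive (hypothetical helper)."""

    def __init__(self, low, high, fill=0):
        if high < low:
            raise ValueError("high must be >= low")
        self.low = low
        self.high = high
        self._data = [fill] * (high - low + 1)

    def _offset(self, index):
        # Translate a user-facing index into a 0-based position in the list.
        if not self.low <= index <= self.high:
            raise IndexError(f"index {index} outside {self.low}..{self.high}")
        return index - self.low

    def __getitem__(self, index):
        return self._data[self._offset(index)]

    def __setitem__(self, index, value):
        self._data[self._offset(index)] = value

    def __len__(self):
        return len(self._data)


a = OffsetArray(-7, 7)
a[-7] = "first"
a[7] = "last"
print(len(a), a[-7], a[0], a[7])  # prints: 15 first 0 last
```

Note that a negative index here is a real position in the array rather than Python's usual "count from the end" shorthand.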

Dijkstra, in his article “Why numbering should start at zero”, argues that 0 ≤ i < N is a “nicer” range than 1 ≤ i < N+1 and doesn’t really care for 1 ≤ i ≤ N. This really doesn’t make much sense – there are no aesthetic qualities associated with conditional statements. Guido van Rossum gives a rationale for why Python uses 0-based indexing, which boils down to being “swayed by the elegance of half-open intervals”. He does make a good point that splitting a string into three components using a[:i], a[i:j], and a[j:] is nice. But one also has to like the fact that i:j means from i up to, but not including, j.
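The three-way split can be tried directly in Python. For comparison, the sketch also includes `closed_slice`, a hypothetical helper written for this example that imitates a 1-based, inclusive first:last range, so the bookkeeping of both conventions sits side by side.

```python
a = "indexing"
i, j = 2, 5

# Half-open, 0-based slices: each piece ends where the next begins.
head, middle, tail = a[:i], a[i:j], a[j:]
print(head, middle, tail)  # prints: in dex ing
assert head + middle + tail == a
assert len(middle) == j - i


def closed_slice(s, first, last):
    """Characters first..last inclusive, counting from 1 (hypothetical helper)."""
    return s[first - 1:last]


# 1-based, closed ranges: each piece starts one past the end of the previous.
n = len(a)
parts = (closed_slice(a, 1, i), closed_slice(a, i + 1, j), closed_slice(a, j + 1, n))
print(*parts)  # prints: in dex ing
```

Both versions produce the same pieces; the difference is whether the boundary value is reused as-is or adjusted by one.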

Sure, for veteran programmers, 0-based indexing is no big issue, but for novice programmers, 0-based indexing is a huge usability issue. An array has 10 elements, but the 1st is actually the 0th, and the 10th is actually the 9th. Mathematically-focused languages like Matlab, Fortran, and Julia are okay having 1-based indexing. To each their own. Look, some people like 0-based indexing and others like 1-based. There may be no right answer… and maybe that little bit of extra brain work in 1-based languages isn’t such a bad thing.


16 thoughts on “Why 1-based indexing is *OK*.”

  1. Hi,
    I found here the smartest and the most objective analysis I was able to find on the net about this controversy. Bravo !

    This must figure inside of any computer science class as “first paradigm” to understand as tool for programming tasks.

    Personally I started to program in Basic, Fortran, and Pascal, and 10 years later moved on to Gauss, Matlab, Maple, Mathematica, Python and now Cuda. Today (after a natural fixed point in Matlab because of its readability and ease of coding mathematics) I have ended up with Julia and Cuda Fortran as the most elegant ways of expressing scientific programming (i.e. 1-based indexing for CPU and GPU coding).

    Best regards from France !
    Constant

    1. Thank you! Yes, having started in Pascal and Fortran, I don’t buy the “0 is better” concept. What I have found is that too many students learn C/Java/C++ as a first language and have never delved into languages that differ substantially, like Fortran. They get into a mindset that there is no other approach. It’s similar with terminating control structures, like those in Julia. They don’t like them because “C doesn’t do them”.

  2. Great explanation! I’ve been arguing this for years, but it seems to fall on deaf ears.

    I am completely, totally, and utterly agnostic about indexing. I’ve used both since the 1970s and I have no reason to prefer one over the other. My mind simply adapts to the language I use, whether that’s C or Smalltalk.

  3. If integers start at 0, why shouldn’t arrays also?

    The RC4 stream cipher for example uses arrays and is really hard to do correctly without zero-indexing. It becomes necessary to litter the code with a bunch of “+1” everywhere.

    1. On the flip-side there are many algorithms where the opposite is true. Why not allow user-defined indexing like Fortran does? The whole 0-indexing thing started with C… not that that’s bad, but there are two sides to every argument. I mean integers include whole negative numbers as well.

  4. It’s all fun and games until you want to stride the arrays, then 0 is the only rational choice because otherwise the code will be much harder to read.

  5. Actually there is a purely mathematical rationale for using 0-based indices. As we know, natural numbers consist of a base number and its recursive successors under the Peano axioms. For Peano arithmetic it does not matter how the base number is named – whether it is 0 or 1 (or if we’re brave enough, we could choose 2 or a monkey). The Peano arithmetic will be just the same. In this sense the discussion of whether indices should be 0 or 1 based is obsolete. The problems arise when natural numbers are embedded into integers and integer arithmetic is used to operate with the natural numbers. In this case the discussion is far from obsolete. Integers have a special number, the neutral number, which is 0. If natural numbers are embedded into integers, there is only one ‘canonical’ choice of the embedding, which is to state that the base number of natural numbers should be chosen to be the same as the neutral number of integers and the successor of b is given by b+1. This choice makes the equation b+0=b well defined for the natural numbers.

  6. I’ve programmed with both 1-based and 0-based, and I prefer 1-based. The 1st item being 1, and the Nth item being N causes me much less trouble, and I’ve found 1:N to include 1 and N to be the most intuitive (so I’m not interested in the half-open intervals argument). Python’s range(N) function feels very odd.

    When it comes to arguments about strides etc. … well having had to deal with n-dim arrays manually in C++, I can understand where people are coming from (0-based does help there), but frankly Julia (and other 1-based languages) seem to have given me all the tools I need to never have to worry about it. Most of my loops look like “for x in Xs …” or rely heavily on broadcasting so index rarely even matters, but when it does, writing x[2] for the 2nd entry just seems logical.

    If you really need alternative indexing and neither 0 nor 1 are enough, say you’re doing dates and you’d like entries for 1966 to 2016, then Julia has you covered there too.

    As for Peano arithmetic, I can’t say it’s ever come up.

  7. This is a fundamental misreading of Dijkstra’s piece on this. The argument should not be based purely on aesthetics or even intuition; Dijkstra is making an argument based on the length of the subsequences based on the bounds, and on the representation of a zero length array. In the end, it results in keeping code as simple as possible.
    In my experience of scientific programming, I have come across a range of problems:
    * a collection of objects, which can be indexed either 0 to n-1, or 1 to n, and no-one cares because the index is not used in any calculations
    * a length n sequence or array from 0 to n-1 in which the index is used in calculations, and the first element logically goes with a 0 index; this is quite common
    * a length n sequence or array from 1 to n in which the index is used in calculations and there is no meaning to a 0th element; this is quite rare; often when it happens it is because the 0th element is a constant, and so can be optimized into the calculation.
    * a length n+1 sequence or array from 0 to n inclusive in which the index is used in calculations; not particularly common
    * a sequence which starts from 2 or some other number; these are not common, but are an argument for being able to specify the range in a language
    Since for half the problems it doesn’t matter, and most of the rest fit into the 0 to something category, it makes more sense for a language to index from 0 rather than having to have ‘+1’ all over the place, or for the language to provide arbitrary index ranges. I’ve noticed particularly in converting Fortran math routines to C/C++ that I’ve often been able to get rid of most of the ‘+1’ idiosyncrasies; the code is objectively simpler because there is less arithmetic in it.

    1. At the end of the day, everyone is entitled to their view on 1 versus 0 based indexing, and their interpretation of Dijkstra’s work.
      I have never liked 0-based indexing, but then I grew up with languages that were mostly 1-based. Having taught C to novice programmers for 15 years, I have found that for people without any programming knowledge 0-based indexing is problematic, in a language that is already problematic for novices.

  8. We actually do start counting at zero. In the case of your eggs example, what if the carton is empty? Then you have zero eggs. You don’t usually notice this because you usually count things when there are some items to count, creating the illusion of starting at one.

    1. If the carton is empty, and there are no (zero) eggs, there is nothing to mark, and therefore no zero egg. There are no objects to count. That’s inherently the problem with 0-based indexing, as it assumes that an element exists at index 0. If there are no eggs, then all there is is a carton, and therefore the carton no longer needs to exist either. If I look at an open field, you could say there are zero trees, but likely most people would say there are no trees, or not even mention it. It becomes somewhat of a philosophical issue.

    2. What if we have several cartons? Carton number 1 contains 6 eggs. Carton number 2 is empty and contains 0 eggs. Carton number 3 (etc…). Our array would then look like [6,0,5,…].

      If you have no (0) cartons, then it doesn’t make sense to assign one a label or a count. We’re counting the eggs, but we’re ordering the cartons.

  9. I’ve used both one and zero-based indexing languages over 33 years of programming. Zero-based won me over (pun intended). The main reasons have been well elaborated. Just want to add one.

    In processing streams of, for example, bytes, it’s very common to have an array embedded inside another array. For example, tokenizing words out of a stream of text.

    If an algorithm needs to access the byte before the start of the embedded array, under one based indexing, the preceding byte, at an offset of -1, would be indexed at 0. The byte before that, 2 positions behind, offset -2, would be indexed at -1. That seems illogical. The code would be messy and hard to understand.

    Under zero-based indexing, the byte 2 positions behind is at -2. Logically. Yet another reason why it’s normal to index like that.

    I find that all the discussions of one-based indexing tend to rely on the “that’s how we learned to count as children/mathematicians” argument, neglecting the fact that array indexing is *indexing*, not *counting*. Also, zero-based indexing is common in mathematics, if not the norm. Usually the zero isn’t written out; “n, n+1, n+2…”.

    As soon as a mathematical or worked programming example is given that can’t be argued against, the argument shifts to “oh, well, it’s just a matter of personal preference”. Of course everyone can have their own opinion. But if someone suggested a 2-based indexing system, what would you make of them? It’s a valid opinion. The language could support it efficiently. But would you want to write code with this person? Most programmers would decline.

    Ultimately, I believe it’s about losing face. When those quiet but open-minded programmers, currently using one-based indexing languages, realize that zero-based is more practical, they just quietly move over and get on with their lives. But if they’ve been banging on about counting eggs in boxes for years, when it was actually indexing eggs that mattered, then those same people are gonna look or feel somewhat asinine when they U-turn. But the reality is most programmers don’t care about face, they only care about how easy it is to get work done.

    1. Look, 0- or 1-based languages are neither good nor bad. People that like 0-based indexing always seem very quick to defend their position for some reason or another. Maybe it stems from the hordes of people using C-based languages over the years. People did a lot of mathematical calculations before 0-based indexing appeared and nobody seemed to mind. I do find people that use languages like Fortran a bit more open-minded to using whatever index is necessary to get the job done. Having taught C to beginners for 15 years, I found that the concept of arrays starting at zero always caused problems because beginners just don’t understand how an index begins at zero. To them, it is counting. Experienced programmers can do whatever they please.

      1. It’s not counting, it’s ordering. Say I have 10 boxes, and I’m throwing balls randomly into them and seeing where they land. The 1st box might have 0 balls.

        It’s fine to count from 0, but we still order the boxes from 1 to 10. If there were 0 boxes, there would be no box to label.
