How long should an identifier be?

One thing I still see a lot of in student programs is long identifiers. I mean really long. Early programs in languages such as Fortran often used simple one- or two-character identifiers, likely because memory was limited and long descriptive names took up more space. Later, many programming languages limited identifiers to something like 32 characters. Well, technically they didn't limit the length, but only the first 32 or so characters were significant (i.e. the compiler ignored the rest). The past 20-30 years have seen a steady increase in the allowable length of identifiers.

Many languages now allow identifiers of unlimited length. That's a problem. Consider, for example, an identifier for a variable that stores the "number of sentences" counted in a text analysis program. Here are some options:

number_of_sentences
numberofsentences
NumberOfSentences
nosentences
no_sentences
numSentences
numSntncs
nSentence
nSent
...

The possibilities seem endless, don't they?

While longer identifiers aid comprehension, they can also become so long that they impede a program's readability and place extra demands on the reader's memory. Imagine an extreme identifier such as total_number_of_characters_in_a_word.

Part of creating a readable program is choosing identifiers well, and their length does play a role in readability. Excessive identifier length can slow both reading speed and program comprehension. Which of the following expressions for calculating compound interest is the most readable?

A = P*(1+(r/n))^(n*t)

amount = principalAmount * (1+(annualInterestRate/compoundedTimesPerYear))^(compoundedTimesPerYear*numberOfYears)

amount = prinAmnt * (1 + (intRate/timesPY))^(timesPY*nYears)
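To make the comparison concrete, here is a minimal Python sketch that writes the compound interest formula three times, once with each set of names above. It assumes Python's `**` for exponentiation (in Python, `^` is bitwise XOR, not a power operator). The function names `amount_short`, `amount_long`, and `amount_abbrev` and the sample inputs are hypothetical; the parameter names deliberately mirror the post rather than Python's usual snake_case convention.

```python
# Three spellings of the same compound interest formula: A = P(1 + r/n)^(nt).
# Python uses ** for exponentiation; ^ would be bitwise XOR.


def amount_short(P, r, n, t):
    """Single-letter names, as in the textbook formula."""
    return P * (1 + (r / n)) ** (n * t)


def amount_long(principalAmount, annualInterestRate,
                compoundedTimesPerYear, numberOfYears):
    """Fully spelled-out names."""
    return principalAmount * (1 + (annualInterestRate / compoundedTimesPerYear)) ** (
        compoundedTimesPerYear * numberOfYears
    )


def amount_abbrev(prinAmnt, intRate, timesPY, nYears):
    """Abbreviated names."""
    return prinAmnt * (1 + (intRate / timesPY)) ** (timesPY * nYears)


if __name__ == "__main__":
    # Hypothetical inputs: 1000 at 5% interest, compounded monthly, for 10 years.
    args = (1000.0, 0.05, 12, 10)
    for f in (amount_short, amount_long, amount_abbrev):
        print(f.__name__, round(f(*args), 2))
```

All three functions compute exactly the same value; only the names differ. The interpreter doesn't care which version you write, but the person reading it does.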

So identifier length has to strike a balance between being informative and being readable.

A study by New et al. [1] looked at the effect of word length on visual word recognition (and there are a *lot* of published works on word length and what they term inhibitory trends). They found that average reaction times were lowest for words 7 characters long, and that the graph of reaction times for words of 3-13 characters was U-shaped: performance flagged for both uber-short and long words. Why? One of their conclusions is that words of 6-9 characters have the highest chance of being processed with a single "fixation" (the point where your eye comes to rest when you read). Shorter words are often skipped, and longer words require more than one fixation.

What does this mean? Choose your identifiers carefully.
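As a rough, hypothetical aid, the sketch below pulls the identifiers out of a line of code and labels each one as short, in the 6-9 character range, or long. The thresholds come from the word-recognition result above; treating identifiers like English words is an assumption (a camelCase or snake_case name is really several words glued together), so the labels are a prompt for judgment, not a rule. The names `classify`, `report`, `LOW`, and `HIGH` are illustrative.

```python
import re

# Identifier-like tokens: a letter or underscore followed by letters,
# digits, or underscores. Numeric literals such as 1 are not matched.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Assumed thresholds, borrowed from the 6-9 character range reported for
# English words; applying them to identifiers is an assumption.
LOW, HIGH = 6, 9


def classify(name: str) -> str:
    """Label a name as 'short', 'in range', or 'long' by character count."""
    if len(name) < LOW:
        return "short"
    if len(name) > HIGH:
        return "long"
    return "in range"


def report(line: str) -> dict[str, str]:
    """Classify each distinct identifier in a line, in order of first appearance."""
    result: dict[str, str] = {}
    for name in IDENTIFIER.findall(line):
        result.setdefault(name, classify(name))
    return result


if __name__ == "__main__":
    line = "amount = prinAmnt * (1 + (intRate/timesPY))^(timesPY*nYears)"
    for name, label in report(line).items():
        print(f"{name:12} {len(name):2}  {label}")
```

Run on the abbreviated compound interest line, every name lands in the 6-9 range. The fully spelled-out version is mostly "long" (compoundedTimesPerYear alone is 22 characters), and every name in the single-letter version is "short".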

[1] New, B., Ferrand, L., Pallier, C., Brysbaert, M., "Reexamining the word length effect in visual word recognition: New evidence from the English Lexicon Project", Psychonomic Bulletin & Review, 13(1), pp. 45-52 (2006).
