Sally sells C strings by the C shore

Is there anything that quantifies evil in C better than strings?

Let’s look at the following piece of C code:

char aString[10];
scanf("%s", aString);
printf("%s\n", aString);

Now if we input the ASCII string “thecatsat”, nothing goes wrong, because the string is 9 characters long and fits into the character array aString with room left for the terminator. Now try “thecatsatonthemat”. Depending on the system you’re running it on, this may not cause any visible problem, which is itself an issue. When we run it with “thecatsatonthemat1” (on OSX), it fails with an “Abort trap: 6”. On another Unix system it doesn’t baulk until I type “thecatsatonthemat1234567”, then it produces a segmentation fault. The problem is that there is only “official” storage for 9 characters and the EOS (End-Of-String) terminator '\0'. Writing past that is undefined behaviour, so what happens varies from system to system. But C allows this fiasco to go on.
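One common way to keep the read inside the array is to give %s a field width one less than the array size. The sketch below is only an illustration: read_word is a hypothetical helper, not something from the code above, and it uses sscanf on an in-memory string so it can be exercised without a terminal. The same "%9s" width works with scanf.

```c
#include <stdio.h>

/* Hypothetical helper: extract one whitespace-delimited word from
   `input` into a 10-byte buffer. The width 9 caps the number of
   characters stored, leaving the tenth byte for the '\0' terminator.
   Returns 1 if a word was stored, 0 otherwise. */
int read_word(const char *input, char aString[10])
{
    return sscanf(input, "%9s", aString) == 1;
}
```

With this, the input “thecatsatonthemat1234567” is cut down to “thecatsat” instead of running off the end of aString. The excess characters are simply left unread, so the caller has to decide what to do with them.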

And the compiler won’t find these sorts of bugs either. But it gets worse. Now we code something like this:

char aString[10];
char bString[10] = "1234567890";

scanf("%s", aString);
printf("String A: %s\n", aString);
printf("String B: %s\n", bString);

Notice that the string bString is initialized with exactly 10 characters, leaving no room for the EOS terminator. What happens when this compiles (OSX)? Nothing, there are no complaints. What happens when it runs, and we enter “thecat”? Here’s the output:

String A: thecat
String B: 1234567890thecat

Because bString has no terminator, printf keeps reading past the end of the array and into whatever follows it in memory, which here happens to be aString. On the other Unix system nothing unusual happened. Make the initialized string the right length, and it works fine. Make it too long, and the compiler will complain, with a message something along the lines of:

warning: initializer-string for array of chars is too long
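A minimal sketch of two ways to sidestep the missing terminator, assuming nothing beyond the standard library. Leaving the array size out lets the compiler count the characters and add the terminator itself. The helpers is_terminated and print_bounded are hypothetical names introduced here for illustration only.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* No explicit size: the compiler allocates ten digits plus '\0',
   so sizeof bString is 11. */
char bString[] = "1234567890";

/* Hypothetical helper: returns 1 if a '\0' occurs within the first
   `size` bytes of `buf`, 0 otherwise. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}

/* Hypothetical helper: print at most `size` bytes of `buf`, stopping
   early at a '\0'. The precision in "%.*s" bounds the read, so this
   stays inside the array even when it has no terminator. */
void print_bounded(const char *label, const char *buf, size_t size)
{
    printf("%s%.*s\n", label, (int)size, buf);
}
```

Calling print_bounded("String B: ", bString, sizeof bString) prints only the ten digits, whatever happens to sit next to the array in memory.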

What is obvious is that C can’t be trusted to find coding inaccuracies for you. Other languages do strings better. In Fortran, a string is declared in the following manner (and is different from a character array):

character (len=10) :: str

Then if “thecatsatonthemat” is input, only the first 10 characters, “thecatsato”, are stored – the rest is discarded. Fortran protects from the craziness that C doesn’t. And it doesn’t make the programmer deal with the EOS thing; in fact, Fortran strings are not null-terminated.
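For comparison, here is a rough C sketch of that fixed-length behaviour, assuming the usual Fortran rule that a shorter value is padded with blanks. STR_LEN and fixed_assign are hypothetical names, and this imitates the semantics only. It is not how any Fortran compiler implements them.

```c
#include <stddef.h>
#include <string.h>

#define STR_LEN 10  /* mirrors character (len=10) */

/* Hypothetical helper: store `src` in a fixed-length, non-terminated
   buffer. Longer input is truncated to STR_LEN characters; shorter
   input is padded on the right with blanks. No '\0' is written. */
void fixed_assign(char dst[STR_LEN], const char *src)
{
    size_t n = strlen(src);
    if (n > STR_LEN)
        n = STR_LEN;
    memcpy(dst, src, n);
    memset(dst + n, ' ', STR_LEN - n);
}
```

Because the buffer carries no terminator, it has to be handled with length-aware calls such as memcmp or printf("%.*s", STR_LEN, str), never with strlen or a bare %s.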


