Computer science education should come out of the dark ages

One of the problems with computer science education is that it is stuck in the dark ages. Don’t get me wrong, there are many good initiatives to modernize the way curricula are taught. But one has to ask the question – in a field that moves so quickly, is teaching classes even relevant anymore? Look, I get the importance of learning the theoretical underpinnings of a subject, but that makes it more of a science, and computer science, even though it has the word science in its title, is more of a craft than a science.

So maybe it should be taught a different way. Think apprenticeship – and I know it’s a radical idea, but if universities *really* want to be innovative in how they teach things, they have to let go of what they know and move into the light.

The model I would propose is similar in concept to Forest Schools, found in the kindergarten and early elementary setting. Here children spend anywhere from a half day to a full day outdoors in local woodlands and green spaces: urban and near-urban parks, natural spaces adjacent to or on school grounds, or natural playgrounds and outdoor classrooms. The basis of these programs is emergent, experiential, inquiry-based, play-based learning. Why shouldn’t higher education be treated in a similar manner? Okay, I’m not suggesting playing in forests. But hey, why not – if it could be incorporated into some practical project.

And therein lies the whole concept: turning the current system away from lecture-based learning, towards informative, experiential learning. How can this be accomplished? First of all, ditch the current concept of classes. You learn programming by doing, not by sitting and learning about syntax. It does seem radical, but innovative instruction is about being radical.

Next, base the learning experience on the idea of spending the period from September to April working in small groups, creating solutions to real problems. Intersperse this with 3-5 day coding bootcamps to help solidify certain ideas or introduce new concepts. Maybe have people from industry give 1-2 day workshops. The projects would hopefully mimic what real-world projects go through to produce an outcome, and the outcomes would have to be real. If students had to learn certain things, they would just have to do so. To take the forest analogy, one project could be to use inexpensive drones to measure forest canopies – that might stand out as a good upper-year project. So what about first year? Maybe first year could be about developing something small, like an app? Students would likely gain incredible skills in designing, implementing and marketing it.

What are the benefits of this approach? Firstly, we could likely cut a degree back to three years. That’s right, three years. Why do we need 4-5 years for a degree? If students wanted to do co-op, they wouldn’t be restricted by “classes they have to take to fulfill the degree requirements”, so they could do it whenever they liked. There likely would be issues with ancillary courses such as math or science, but maybe those would need to change their perspective too.

Okay, so some will argue that “practice” is what co-op is for, and I get that. But the reality is that *every* student should have the same experience. Worst-case scenario, opt for a hybrid model where the first two years of a degree are traditional and the last two years follow the apprenticeship model (or 1.5/1.5). Will this ever happen? Unlikely. Few institutions have the guts required to take this big jump into the unknown. But you know what? It may just actually work.

Why floating-point numbers aren’t really real

There are two types of numbers in programming (let’s ignore complex numbers) – floating-point numbers and integers. Integers are whole numbers, and are therefore always precise: assigning 4 to an integer variable, no matter the language, will always result in the number 4 being stored exactly. Floating-point numbers are numbers with a decimal point in them; 3.14159 is a floating-point number. So what is a real number? A real number is the “analogue” version of a floating-point number – real numbers have infinite precision, are continuous, and are “nonlossy”. This means that π is represented as:

3.1415926535897932384626433832795028841971693993751058209…

There are no bounds. The difference between reals and floating-point numbers is that the latter have limited precision – how much depends on the number of bits the machine uses to store them. Floating-point numbers are essentially approximations of real numbers. Some languages, such as Fortran, use the term real to denote floating-point numbers – but they aren’t real. And because they are approximations, they are susceptible to errors. Here’s an example in Python: the value of the expression 0.1+0.1+0.1-0.3 is 5.551115123125783e-17. Do this on paper and you will get zero. That’s because computers don’t store all floating-point numbers precisely.
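
You can confirm this at the Python prompt:

>>> 0.1 + 0.1 + 0.1 - 0.3
5.551115123125783e-17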

Floating-point numbers are represented in computers as binary (base 2) fractions. Therefore to understand why floating-point numbers are approximations, it’s easiest to do a conversion from a decimal fraction to a binary fraction. This can be done using the following steps:

  1. Multiply the number by two.
  2. Take the integer part of the result as the next binary digit.
  3. Take the fractional part as the starting point for the next step.
  4. Repeat until you either reach 0 or the digits become periodic.
  5. Read the digits from the top – the first result is the first digit after the binary point.
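
These steps can also be coded up directly. Here is a minimal Python sketch (the function name frac_to_binary and the 20-digit cut-off are my own choices); note that the argument is itself a float, so for values like 0.1 it prints the bits of the stored approximation:

def frac_to_binary(frac, max_digits=20):
    """Return the binary expansion of a decimal fraction as a string."""
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= 2                  # step 1: multiply by two
        digit = int(frac)          # step 2: the integer part is the next digit
        digits.append(str(digit))
        frac -= digit              # step 3: carry the fraction into the next step
    return "0." + "".join(digits)

print(frac_to_binary(0.125))   # prints 0.001
print(frac_to_binary(0.1))     # prints 0.00011001100110011001 (the 0011 pattern repeats)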

So converting 0.125 to binary gives:

0.125 × 2 = 0.25   → digit 0
0.25 × 2 = 0.5     → digit 0
0.5 × 2 = 1.0      → digit 1

So the binary number is 0.001. Now 0.1 *seems* like it would convert easily. But here’s the problem… let’s convert it:

0.1 × 2 = 0.2   → digit 0
0.2 × 2 = 0.4   → digit 0
0.4 × 2 = 0.8   → digit 0
0.8 × 2 = 1.6   → digit 1
0.6 × 2 = 1.2   → digit 1
0.2 × 2 = 0.4   → digit 0
0.4 × 2 = 0.8   → digit 0
0.8 × 2 = 1.6   → digit 1
0.6 × 2 = 1.2   → digit 1

So the binary equivalent is 0.00011(0011), where the 0011 repeats ad infinitum. This means 0.1 won’t be stored exactly. Converting the stored number back to a decimal fraction gives a number pretty close to, but not exactly, 0.1. Many systems will print a decimal approximation of the true decimal value of the binary approximation stored by the machine (whew!), so you might actually see 0.1 printed. Some languages, such as Python, let you see what is happening behind the scenes. This can be done using the Decimal module:

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')

This is known as representation error – certain decimal fractions simply can’t be represented exactly as binary fractions.
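
Another way to peek behind the scenes – again in Python, as a small illustrative aside – is the fractions module, which reveals the exact rational number the machine actually stores for 0.1:

>>> from fractions import Fraction
>>> Fraction(0.1)
Fraction(3602879701896397, 36028797018963968)

The denominator is 2^55 – the stored value is indeed a binary fraction, just not exactly the one we asked for.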

What is dead code?

Dead or unreachable code is code that will never be executed. It manifests itself as variables that are declared but never used, functions that are never called, or code that is skipped because of a branch. Since dead code is never executed, it usually signals an error, often a logic error. Dead code is created by cavalier programmers. An easy way to create dead code is by using a goto statement and a label. Here is an example:

goto process_image;
j = j + 1;   // dead code here.
process_image:
    /* processing continues here */

Since we have an unconditional branch to the label process_image on the first line, the assignment statement on the second line can never be executed. A return statement can also create dead code. Here is an example:

int div_0(double a, double b)
{
    if (b == 0)
        return 1;
    else 
        return 0;
    return a/b;   // dead code here.
}

The arithmetic expression a/b appears after two return statements inside an if-else. Since the if-else combination guarantees that one of those return statements is executed, the final return statement becomes an orphan – it is impossible to reach. That is, because every path through the if-else returns, any code after it cannot possibly be executed.

Note that most compilers will not flag orphan code by default; Clang, for example, only reports it when the -Wunreachable-code warning is turned on.

DON’T fix what isn’t broken

The biggest problem with some programmers is that they like to fix what isn’t broken. This starts out innocuously enough – a piece of code that works, but isn’t exactly an “ideal” piece of code. Then the programmer starts to fiddle with it. This is all well and good, unless they take no precautions, i.e. they work on the original code and fail to make a copy. What usually happens is they end up breaking the code. Then they can’t fix it.

I’ll say it again – make a COPY of the code.

If you do want to modify code that already works, consider the following:

  • Modify the code so that it has the same functionality, but is shorter or more efficient.
  • Modify the code so that it has the same functionality, but replaces user-defined code with built-in functionality (see the sketch below).
  • Modify the code so that it has the same functionality, but is easier to read and understand.
  • Modify the code so that it has the same functionality, but orphaned code has been removed.

Note that any modification can introduce new errors into the code.
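
As a small, hypothetical illustration of the second point above (the function names largest_original and largest_refactored are invented for the example), here is a hand-rolled routine and a refactored version with the same functionality, where a built-in does the work:

# Before: a user-defined loop to find the largest value
def largest_original(values):
    biggest = values[0]
    for v in values[1:]:
        if v > biggest:
            biggest = v
    return biggest

# After: same functionality, but the built-in does the work
def largest_refactored(values):
    return max(values)

# Keep the original around until the new version is verified against it
data = [3, 9, 4, 1]
assert largest_original(data) == largest_refactored(data) == 9

The assert is the cheapest possible check that the behaviour really is unchanged – and the original (or a copy of it) stays around until that check passes.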

Clang – YIKES – a C compiler with user-friendly error messages!

One of the inherent problems with many programming languages is that compilers are often less than friendly when it comes to their messages. Not so with Clang – a compiler front end for C, C++, Objective-C and Objective-C++. Developed largely by Apple, this compiler actually produces messages a novice can understand. Consider the following piece of nonsensical code, containing a number of syntactic errors:

1  #include <stdio.h>
2
3  int main(void)
4  {
5      int anArray[n];
6      int i, n=10;
7      char aString[20];
8
9      scanf("%s", &aString);
10     i = 0;
11     while(true){
12         anArray[i] = 0
13     }
14     return 0;
15 }

When the code is compiled with gcc, we get the following messages:

$ gcc -Wall -std=c99 errors.c
errors.c: In function ‘main’:
errors.c:5: error: ‘n’ undeclared (first use in this function)
errors.c:5: error: (Each undeclared identifier is reported only once
errors.c:5: error: for each function it appears in.)
errors.c:9: warning: format ‘%s’ expects type ‘char *’, but argument 2 has type ‘char (*)[20]’
errors.c:9: warning: format ‘%s’ expects type ‘char *’, but argument 2 has type ‘char (*)[20]’
errors.c:11: error: ‘true’ undeclared (first use in this function)
errors.c:13: error: expected ‘;’ before ‘}’ token
errors.c:6: warning: unused variable ‘n’
errors.c:5: warning: unused variable ‘anArray’

Clearly, from these messages, there are issues with the code. Here are the actual problems with the syntax:

  1. On line 5 of the code, n is used to size the integer array anArray, however n is not declared until line 6.
  2. On line 6, n is declared but never used.
  3. On line 9, the scanf() function passes &aString, but the array name already acts as a char pointer, so the & causes the type mismatch the warning describes.
  4. On line 11, the identifier true is undeclared – in C99 it only exists if stdbool.h is included.
  5. On line 12, the statement is not terminated by a ;.

The inherent problem with gcc, and classically with most compilers, is that the error messages are not always worded in a way that helps novice programmers. On top of that, no context is given, requiring the programmer to refer back to the code.

Here are the messages produced using Clang:

cc errors.c
errors.c:5:17: error: use of undeclared identifier 'n'
    int anArray[n];
                ^
errors.c:9:13: warning: format specifies type 'char *' but the argument has type
 'char (*)[20]' [-Wformat]
    scanf("%s", &aString);
           ~^ ~~~~~~~~
errors.c:11:11: error: use of undeclared identifier 'true'
    while(true){
          ^
1 warning and 2 errors generated.

Firstly, you’ll notice that this compiler colour-codes its messages, which makes it easier to differentiate errors from warnings. Secondly, it reproduces the offending line of code, so the programmer does not need to refer back to the source to understand the problem. Thirdly, the position of the error on the line is pinpointed with the ^ character. Not all of the errors are shown at once, which may actually benefit the novice, so as not to overwhelm them.

Fixing these two errors and the warning, and re-compiling, results in the final problem being rooted out:

errors.c:12:23: error: expected ';' after expression
    anArray[i] = 0
                  ^
                  ;
1 error generated.

A compiler well worth considering.

Zen-like algorithms

A simpler algorithm may sometimes be more challenging to find, but consider the benefits.

Simple algorithms are:

  • easier to understand
  • easier to implement
  • easier to verify and debug
  • easier to maintain
  • more compact
  • more efficient
  • more fun

For example, consider the following snippets of code to find the minimum value of three numbers, x, y, and z.

if (x < y)
    if (x < z)
        min = x;
    else 
        min = z;
else
    if (y < z)
        min = y;
    else
        min = z;

It doesn’t seem that complex; however, the code could be easier to understand. Let’s try something else:

if (x < y && x < z)
    min = x;
else if (y < z)
    min = y;
else 
    min = z;

This is simpler, of course, but is there another, even simpler, Zen-like way to code the algorithm?

min = x;
if (y < min)
    min = y;
if (z < min)
    min = z;

The computers of Star Trek

The original starship USS Enterprise (circa 2245) was a marvel of engineering. Powered by a controlled matter-antimatter reaction, the USS Enterprise was brimming with technology, with the computer tasked with controlling much of it, from monitoring life-support systems to food synthesis. That technology tackled problems computer science is still dealing with – things like voice control, automatic programming, and computer analysis of complex problems.

In 1977, Schmucker and Tarr wrote a paper with the same title as this blog post [1], on which this discussion is based. They postulated the size of the Enterprise’s data repository as 10²² bits, with an access time of 10⁻¹⁵ seconds – a femtosecond. The repository would have to be extremely large to hold the amount of data it does. Now 10²² bits is equivalent to 1.25 zettabytes, which is about 1.25 trillion gigabytes. To put this into perspective, a recent study estimated that nearly 8 zettabytes of data will be generated globally in 2015, set to increase to 35 zettabytes by 2020. So we are generating a lot of data, and that doesn’t even take into account data which already exists, or is yet to be digitized. Imagine storing imagery of the entire planet. So we’re talking a lot of data. Now couple all this with a couple of centuries more of data, plus data from the other planets of the United Federation of Planets, and it is almost impossible to predict how much storage would be needed. Likely somewhat more than 10²² bits.

[Image: Isolinear chips]

Does the Enterprise use magnetic memory? I doubt it. Later starships made use of Isolinear Optical Chips (circa 2349), each about the size of a modern USB stick. One chip holds 2.15 kiloquads, which by one estimate is equivalent to 2²¹⁵ bytes of data. Now this might make more sense from a data storage point of view. Imagine being able to carry around the equivalent of 5.26561458 × 10⁴³ zettabytes of information. The nice thing about these chips is that they integrate storage and processing into one entity, which may reduce the reliance on access time. The Galaxy-class ships had three redundant cores, each with 2048 storage modules of 144 isolinear chips, for a total of 294,912 chips per core.
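
As a quick back-of-the-envelope check of the two figures above, here is a small Python snippet (assuming decimal units, i.e. a zettabyte of 10²¹ bytes):

ZETTABYTE = 10**21                        # decimal zettabyte, in bytes

enterprise_bits = 10**22                  # Schmucker and Tarr's estimate
print(enterprise_bits / 8 / ZETTABYTE)    # 1.25 zettabytes

isolinear_bytes = 2**215                  # 2.15 kiloquads, per the estimate above
print(isolinear_bytes / ZETTABYTE)        # about 5.27e43 zettabytes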

So from a futuristic perspective, I doubt the ability to store data will be an issue. Current solid-state technology allows for capacities of 4GB, which I’m sure will increase. There has to be a better way – and maybe there is. Scientists at the Swiss Federal Institute of Technology are working on a way to encode data onto DNA; one gram of DNA can potentially hold up to 455 exabytes of data – truly amazing. This leads us down the road of bio-inspired computing, which is certainly possible; it will just take time.

As for access time? It’s hard to compare without more information on the type of storage. Network speeds are moving towards 32 terabytes per second on a single piece of glass fiber, enough to transfer a 1GB movie in 0.03 milliseconds. This may be fast enough for a contained environment like the Enterprise. The downside is, of course, data access from the storage medium – current technology puts this at 500MB/s for a top-of-the-line solid-state drive: fast, but still a bottleneck.

[1] K.J. Schmucker and R.M. Tarr, “The Computers of Star Trek”, BYTE, December 1977, pp. 12-14, 172-183.

Read some code today

Programmers like to write code. But how many like to read code? Likely very few.

So why is this the case? The main reason may be that reading somebody else’s code is not a lot of fun, particularly if it is poorly documented or poorly structured. Oh yeah, “those things don’t really matter”, I hear you tell yourself. Try reading a piece of code that lacks proper formatting, is full of “shortcuts”, and has no documentation – you will quickly abandon it. One person’s elegance is another person’s character dump.

But why would you want to read code? Boooooorrrring. Well, here’s one reason – it is possible to learn new ways of writing algorithms, creating data structures, or handling exceptions. If you are a novice programmer, reading code will help you understand how certain control structures and data structures are used in the context of a program that works. You can take a piece of existing code and modify it – see what it does. You may learn more than you would from reading some boring book on programming.

If you have some programming experience, go back and read some of your own code. Do you still know what it does? I bet it lacks some structure, and probably isn’t commented very well, is it? Maybe consider looking at an algorithm written in different languages. Reading code will improve your programming skills in much the same way that reading copious literature hones your writing skills: it improves your language vocabulary and your ability to craft code in different ways.

Artificial intelligence ??

Watch a TV series such as Battlestar Galactica and ponder the future of intelligence. Is it indeed possible to create a “skinjob” – essentially an entity which is biologically similar in every way to humans, except with the processing power of a multi-processor system and enhanced senses? Can we encapsulate the workings of the human brain inside silicon? The truth is that it is very unlikely. It may be possible to meld the human mind with computers of some form, as with Star Trek’s Borg, but an autonomous intelligent entity? Maybe more like the hubots from the Swedish sci-fi show Real Humans.

Contemporary robots can do rudimentary thinking: they can walk up stairs, play chess, and play soccer. But these are not revolutionary algorithms. Vision systems are used extensively in factory robots to search for defects in products; the algorithms are fast and efficient, but they are not truly intelligent. They follow a series of rules to process data. Sometimes algorithms “learn”, but they have to be given the data in the first place – they cannot search for it themselves, nor can they interpret the data the way the brain does. A case in point is the perception of beauty. Robots may one day have the intelligence to perceive and interpret a scene the way human vision does. But will they be able to look at a field of sunflowers and appreciate their beauty? Or peer into the heavens and wonder what lies beyond the stars?

Is it possible to make a machine that is more intelligent than a human? It may process things a couple of billion times faster, but is it smarter? Everything an electronic brain does should be within the scope of a human brain, because it is the human brain that designs the algorithms the machine uses. In the end, AI is not really AI; it is more likely some form of quasi-intelligent computability. True artificial intelligence requires traits such as intuition and flexibility – the ability to look beyond one’s programming. Humans are not given vast amounts of data to learn from… well, that’s not exactly true. Humans are surrounded by data, which we process into information. We learn to associate the word “tree” with a woody plant that can be deciduous or coniferous, but once we know what a tree is, our brains can identify the roughly 100,000 species of trees by sight – and differentiate some of them based on size, shape, or leaf structure. A computer could do the same, but it has to be given the information; it cannot infer it.

Besides which, do we really need artificial intelligence? There are enough intelligent humans and animals on the planet that we don’t need to create intelligence just for the sake of it. As HAL from 2001 put it…

“I know I’ve made some very poor decisions recently.”

Humans make enough poor decisions; we don’t need artificial intelligence making them as well.