Why physical usability really matters

We teach computer science, but usually relegate its human-computer interaction and usability aspects to upper-year classes, where their relevance may be “too little, too late”. Software is a means to an end, whether it makes a washing machine function, controls a self-driving car, or runs a social media app. Software should possess seven qualities: user experience, availability, performance, scalability, adaptability, security, and economy. But if you get the user experience wrong, then none of the rest matters. Unfortunately, software with bad user experiences abounds, from terrible websites to lacklustre apps and awful appliance interfaces. Part of the problem is that software designers rely on advice from people who *aren’t the end-users*; sometimes they run focus groups, but apparently not often enough.

A case in point is the Metrolinx Presto readers on various transit systems. I don’t know who chooses the device interfaces, but I imagine it is the transit systems themselves. On some systems, the readers are designed so that once a card is tapped, the user gets confirmation via a sound and a brief display of the card’s balance on a two-line screen. This is useful, because otherwise the user is forced to log in to the Presto website to find the balance. The TTC, on the other hand, provides a huge screen area, yet the only feedback it gives is success or failure, with no information on the current balance. There are supposedly 10,000+ readers out there, and I would imagine consistency would be a key factor, with all the readers functioning the same way from a user experience point of view. Consider the TTC Presto reader found on the buses.

The area for tapping the Presto card represents roughly 12% of the reader’s front surface area. The feedback screen, on the other hand, takes up 30% of the real estate. Many people using the reader for the first time will wrongly try to tap the screen, because it is the first thing the eye is drawn to, not the small area below it. There are two problems. The first is that the actual human-machine interface is very small. The second is that the large screen basically mimics the visual instructions already on the tap area. A better interface would have concentrated more on the tap area, and less on a huge feedback area (which really serves no other purpose). These other Presto card readers do the job *way* better from the perspective of a person interacting with a reader. Feedback is provided by means of the two-line LED display, and it’s almost impossible to get the card-tapping wrong.

Considering the market, you would also think that Presto would have a mobile app, not force users into using a mobile web site. It just makes sense.



Optical blur and the circle of “non-sharpness”

Most modern cameras automatically focus a scene before a photograph is acquired. This is way easier than manual focusing in the ancient world of analog cameras. When part of a scene is blurry, we consider it to be out of focus. This can be achieved in one of two ways. The first is to use the Aperture Priority setting on a camera. Blur occurs when there is a shallow depth of field. Opening up the aperture to f/2.8 lets in more light, and the camera compensates with a faster shutter speed. It also means that objects not in the focal plane will be blurred. The second way is to focus the lens manually.
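The "compensates with a faster shutter speed" step can be made concrete with a minimal sketch. It assumes the usual simplification that, at a fixed ISO and scene brightness, exposure is proportional to t/N² (shutter time over f-number squared). Real cameras also round to the nearest available shutter setting, which is not modelled here.

```python
def compensated_shutter(t1: float, n1: float, n2: float) -> float:
    """Shutter time (s) that keeps exposure constant when moving from f/n1 to f/n2.

    Assumes exposure is proportional to t / N**2 at a fixed ISO, so
    t2 / n2**2 == t1 / n1**2.
    """
    if t1 <= 0 or n1 <= 0 or n2 <= 0:
        raise ValueError("shutter time and f-numbers must be positive")
    return t1 * (n2 / n1) ** 2


# Opening up from f/8 at 1/60 s to f/2.8.
t = compensated_shutter(1 / 60, 8.0, 2.8)
print(f"f/2.8 needs about 1/{1 / t:.0f} s")
```

Going from f/8 to f/2.8 is about three stops, so the shutter time shrinks by roughly a factor of eight.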

Either way, the result is optical blur. But optical blur is by no means shapeless, and has a lot to do with a concept known as the circle of confusion (CoC). The CoC occurs when the light rays passing through the lens are not perfectly focused. It is sometimes known as the disk of confusion, circle of indistinctness, blur circle, or blur spot. CoC is also associated with the concept of Bokeh, which I will discuss in a later post. Honestly, though, circle of confusion may not be the best term. In German the term used is “Unschärfekreis”, which translates to “circle of non-sharpness”, which inherently makes more sense.

A photograph is basically an accumulation of many points, which represent the exact points in the real scene. Light striking an object reflects off many points on the object, and the lens then redirects this light onto the sensor. Each of these points is reproduced by the lens as a circle. When in focus, the circle appears as a sharp point; otherwise the out-of-focus region appears as a circle to the eye. The “circle” normally takes the shape of the aperture, because the light passes through it. The following diagram illustrates the circle of confusion. A photograph is exactly sharp only on the focal plane, with more or less blur around it. The amount of blur depends on an object’s distance from the focal plane: the further away, the more distinct the blur. The blue lines signify an object in focus. Both the red and purple lines show objects not in the focal plane, creating large circles of confusion (i.e. non-sharpness = blur).
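The relationship in the diagram can be put into numbers. The sketch below uses the standard thin-lens approximation for the blur-circle diameter on the sensor:

c = A·|S₂ − S₁|/S₂ · f/(S₁ − f)

Here A = f/N is the aperture diameter, S₁ is the focus distance, and S₂ is the object distance. Real lenses deviate from the thin-lens model, and the 50 mm f/2.8 values are purely illustrative.

```python
def blur_circle_diameter(f_mm: float, n: float, focus_mm: float, subject_mm: float) -> float:
    """Blur-circle diameter (mm, on the sensor) under the thin-lens approximation.

    f_mm: focal length, n: f-number, focus_mm: distance the lens is focused at,
    subject_mm: distance of the point being imaged (all measured from the lens).
    """
    if focus_mm <= f_mm or subject_mm <= 0:
        raise ValueError("focus must lie beyond the focal length; subject distance must be positive")
    aperture = f_mm / n                       # aperture (entrance pupil) diameter
    magnification = f_mm / (focus_mm - f_mm)  # magnification at the focal plane
    return aperture * abs(subject_mm - focus_mm) / subject_mm * magnification


# 50 mm lens at f/2.8 focused at 2 m: blur grows as points move away from the focal plane.
for distance in (1000, 2000, 4000, 10000):
    print(distance, "mm ->", round(blur_circle_diameter(50, 2.8, 2000, distance), 3), "mm")
```

Stopping down (a larger n) shrinks A, and with it every blur circle. This is why a wide aperture such as f/2.8 gives a shallow depth of field.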

Here is a small example. The photograph below was taken in Bergen, Norway. The merlon on the battlement is in focus, while the rest of the photograph beyond it is blurry. Circles of confusion are easiest to spot as small bright objects on darker backgrounds. Here a small white sign becomes a blurred circle of confusion.

Here is a second example, of a forest canopy, taken using manual focus. The CoCs are very prevalent.

As we defocus the image further, the CoCs become larger, as shown in the example below.

Note that because the amount of blur varies across a photograph, it may be challenging to apply a “sharpening” filter to the whole image.


30-odd shades of gray – the importance of gray in vision

Gray (or grey) means a colour “without colour”… and yet it is a colour. But in image processing we more commonly use gray as a term synonymous with monochromatic (although monochrome literally means single colour). Grayscale images can potentially come with limitless levels of gray, but while this is practical for a machine, it’s not useful for humans. Why? Because the human eye is built primarily around a system for conveying colour information. This allows humans to distinguish between approximately 10 million colours, but only about 30 shades of gray.
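A machine has no trouble with 256 (or far more) gray levels. To get a feel for how coarse roughly 30 distinguishable shades are, one could map each 8-bit gray value onto the nearest of 30 evenly spaced levels. The even spacing is a simplifying assumption, since perceived lightness is not linear in pixel value.

```python
def quantize_gray(value: int, levels: int = 30) -> int:
    """Map an 8-bit gray value (0-255) to the nearest of `levels` evenly spaced grays."""
    if not 0 <= value <= 255:
        raise ValueError("expected an 8-bit gray value")
    if levels < 2:
        raise ValueError("need at least two levels")
    step = 255 / (levels - 1)
    return round(round(value / step) * step)


def quantize_image(image, levels=30):
    """Apply quantize_gray to every pixel of a grayscale image (a list of rows)."""
    return [[quantize_gray(v, levels) for v in row] for row in image]


# The 30 gray values that survive out of the original 256.
print(sorted({quantize_gray(v) for v in range(256)}))
```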

The human eye has two core forms of photoreceptor cells: rods and cones. Cones deal with colour vision, while rods allow us to see grayscale in low-light conditions, e.g. at night. The human eye has three types of cones, sensitive to blue, green, and yellow-to-red light. Each type of cone reacts to a range of wavelengths, and these ranges overlap; for example, blue light also stimulates the green receptors. However, of all the possible wavelengths of light, our eyes detect only a small band, typically in the range of 380-720 nanometres, which we know as the visible spectrum. The brain then combines signals from the receptors to give us the impression of colour. So every person will perceive colours slightly differently, and this might also differ depending on location, or even culture.

After the light is absorbed by the cones, the responses are transformed into three signals: a black-white (achromatic) signal and two colour-difference signals, red-green and blue-yellow. This theory was put forward by the German physiologist Ewald Hering in the late 19th century. It is important for a vision system to properly reproduce blacks, grays, and whites. Deviations from these norms are usually very noticeable, and even a small amount of hue can produce a visible defect. Consider the following image, which contains a number of regions that are white, gray, and black.

Now consider the photograph with a slight blue colour cast. The whites, grays, *and* blacks have all taken on the cast (giving the photograph a very cold feel).
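One rough way to see why a cast is so noticeable is to split each pixel into one achromatic signal and two colour-difference signals, in the spirit of Hering's opponent channels. The weights below are a common textbook simplification, assumed for illustration; they are not a physiological model. A neutral gray has both colour-difference signals at zero, while the same gray with a slight blue cast produces a non-zero blue-yellow signal.

```python
from typing import Tuple


def opponent(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Simplified opponent-colour signals for an RGB pixel (illustrative weights)."""
    achromatic = (r + g + b) / 3      # black-white signal
    red_green = r - g                 # > 0 reddish, < 0 greenish
    blue_yellow = b - (r + g) / 2     # > 0 bluish, < 0 yellowish
    return achromatic, red_green, blue_yellow


neutral_gray = (128, 128, 128)
cold_gray = (128, 128, 140)  # the same gray with a slight (made-up) blue cast
print(opponent(*neutral_gray))
print(opponent(*cold_gray))
```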

The grayscale portion of our vision also provides contrast, without which images would have very little depth. Losing it is equivalent to removing the intensity component of an image. Consider the following image of some rail snowblowers on the Oslo-Bergen railway in Norway.

Now, let’s take away the intensity component (by converting the image to HSB, and setting the B component to its maximum value, i.e. 255). This is what you get:

The image shows the hue and saturation components, but no contrast, making it appear extremely flat. The other issue is that sharpness depends much more on the luminance than the chrominance component of images (as you will also notice in the example above). It does make a nice art filter though.
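The HSB experiment is easy to reproduce per pixel with Python's standard colorsys module. That module calls the model HSV: V plays the role of B and runs from 0 to 1 rather than 0 to 255. The sketch keeps hue and saturation and forces V to its maximum. It assumes the image is a list of rows of 8-bit (r, g, b) tuples; loading and saving real files is left out.

```python
import colorsys


def drop_intensity(pixel):
    """Keep a pixel's hue and saturation but push HSB brightness (V) to maximum."""
    r, g, b = (c / 255 for c in pixel)
    h, s, _v = colorsys.rgb_to_hsv(r, g, b)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, 1.0))


def drop_intensity_image(image):
    """Apply drop_intensity to an image stored as a list of rows of (r, g, b) tuples."""
    return [[drop_intensity(p) for p in row] for row in image]


# A dark red and a bright red become the same pure red; any gray becomes white.
print(drop_intensity_image([[(128, 0, 0), (255, 0, 0), (50, 50, 50)]]))
```

Every dark and light shade of the same hue collapses onto one colour. All the shading that conveyed depth disappears, which produces the flat look.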


Software – a step back in time

It has been 60 years since we crawled out of the primordial muck of assembly coding and started to program with the aid of Fortran. Fortran was our friend in the early days of coding, as much as Cobol and the many other languages that were to evolve. However, times were different: programs were small, and usually geared towards performing a task faster than the human mind could. Ironically, our friends Fortran and Cobol are still with us, often to such an extent that it is now impossible to extricate ourselves from them. The problem is that we are still doing things the same way we were 30 years ago, while software has ballooned to gargantuan proportions. It’s not only size, it’s also complexity. Writing a 1,000-line program to perform some mathematical calculation is vastly different from designing a 50-million-line piece of software to run a car. The more complex the software, the greater the chance that it conceals hidden bugs, and the greater the probability that no one really understands how it works. It’s not the languages that are the problem; it is the methodology we use to design software.

Learning to program is not fundamentally difficult. Note that I emphasize the word program, which encompasses taking a problem, solving the problem, designing an algorithm, *coding* the algorithm in a language, and testing the program to make sure it works. This of course assumes the problem can be solved. Can I write a program for cars to detect STOP signs using a built-in camera? Probably. Can it be tested? Probably. Can you create autonomous vehicles? Yes. Can you guarantee they will operate effectively in all environments? Unlikely. What happens to a self-driving car in a torrential downpour? Or a snowstorm? Autonomous trains and subways work well because they run on tracks in a controlled space. Aerospace software works well, perhaps because avionics are taken more seriously. Cars? Well, who knows. A case in point: a Boeing 787 has 6.5 million lines of code behind its avionics and online support systems, while the software in a modern vehicle runs to 100 million LOC. Something is not quite right there…

Fundamentally I don’t think we really know how to build systems 100 million LOC in size. We definitely don’t know how to test them properly. We don’t teach anyone to build these huge systems. We teach methods of software development like waterfall and agile, which have benefits and limitations, but maybe weren’t designed to cope with these complex pieces of software. What we need is something more immersive, some hybrid model of software development that takes into account that software is a means to an end, relies heavily on usability, and may be complex.

Until we figure out how to create useful, reliable software, we should probably put things like AI back on the shelf. Better not to let the genie out of the bottle until we know how to handle it properly.

I highly recommend reading this article: A small group of programmers wants to change how we code—before catastrophe strikes.

Coding? Why kids don’t need it.

There has been a big hoopla in the past few years about learning to code early. The number of articles on the subject is immense. In fact, some people even think two-year-olds should start to code. I think it is all getting a bit out of hand. Yeah, I guess coding is somewhat important, but I think two-year-olds should be doing other things, like playing for instance. People seem to forget that kids should be kids. Not coders. Some places can’t even get a math curriculum right. For years Ontario has used an inquiry-based learning system for math in elementary schools. This goes by a number of different names, such as discovery learning or constructivism, and focuses on things like open-ended problem solving. The problem is, it doesn’t really work. Kids can’t even remember the times tables. So if you can’t get the basics of math right, then you likely won’t get coding right either.

Why do we code? To solve problems, right? Coding, however, is just a tool, like a hammer. We use a hammer to build things, but not until we have some sort of idea *what* we are building. To understand how to build something, we first have to understand the context in which it will be built: the environment, the materials, the design. You can’t teach coding in isolation. That’s almost like giving someone a hammer, a bucket of nails and a skid of lumber and saying “build something”. The benefits of teaching kids to code are supposedly numerous, from increased exploration and creativity to mastery of new skills and new ways of thinking. Now, while it may be fun to build small robots and write small programs to make them do things, it is far from the reality of real computer science. A great deal of the code written every year is still in Cobol, to maintain the legacy code base that underpins the world’s financial system (and other systems alike). A world far removed from learning to code in Alice.

We have been here once before. In the 1980s there was an abundance of literature exploring the realm of teaching children to program (using languages such as Logo). It didn’t draw masses of students into programming then, and adding coding classes to the elementary school curriculum now won’t do it either. Some kids will enjoy coding; many likely won’t. Steve Jobs apparently once said “Everyone should learn how to program a computer, because it teaches you how to think.” But it doesn’t. You can’t write programs if you don’t know how to solve the problems the programs are meant to deal with. Nearly anyone can learn to write basic programs in languages like Python and Julia, but if you don’t have a purpose, then what’s the point? It’s nice to have coding as a skill, but not at the expense of learning about the world around you. Far too many people are disassociated from the world around them. Sitting in front of a machine learning to code may not be the best approach to broadening the minds of our youth. We could simply start by having kids in school do more problem-solving tasks from the wider world: build things with Lego, make paper airplanes, find ways of building a bridge across a creek, cook. The tasks are practical, and foster problem solving. With a bag of problem-solving skills, they will be able to deal with lots of things better in life.

Learning to code is not some form of magic, and sometimes kids should just be allowed to be kids.


Why human eyes are so great

Human eyes are made of gel-like material. It is interesting, then, that together with a 3-pound brain composed predominantly of fat and water, we are capable of the feat of vision. Yes, we don’t have super-vision, and aren’t capable of zooming in on objects in the distance, but our eyes are magical. Eyes are able to focus almost instantaneously, on objects as close as 10 cm and as far away as infinity. They also automatically adjust to various lighting conditions. Our vision system can quickly decide what an object is and perceive 3D scenes.

Computer vision algorithms have made a lot of progress in the past 40 years, but they are by no means perfect, and in reality can be easily fooled. Here is an image of a refrigerator section in a grocery store in Oslo. The context of the content within the image is easily discernible. If we load this image into “Google Reverse Image Search” (GRIS), the program says that it is a picture of a supermarket – which is correct.

Now what happens if we blur the image somewhat? Let’s say a Gaussian blur with a radius of 51 pixels. This is what the resulting image looks like:


The human eye is still able to decipher the content in this image, at least enough to determine that it is a series of supermarket shelves. Judging by the shape of the blurry items, one might go so far as to say it is a refrigerated shelf. So how does the computer compare? The best GRIS could come up with was “close-up”, presumably because it had nothing to compare against. The Wolfram Language “Image Identification Program” (IIP) does a better job, identifying the scene as “store”. Generic, but not a total loss. Let’s try a second example. This photo was taken in the train station in Bergen, Norway.

GRIS identifies similar images, and guesses the image is “Bergen”. Now this is true, however the context of the image is more related to railway rolling stock and the Bergen station, than Bergen itself. IIP identifies it as “locomotive engine”, which is right on target. If we add a Gaussian blur with radius = 11, then we get the following blurred image:

Now GRIS thinks this scene is “metro”, identifying similar images containing cars. It is two trains, so this is not a terrible guess. IIP identifies it as a subway train, which is a good result. Now let’s try the original with a Gaussian blur of radius 21.

Now GRIS identifies the scene as “rolling stock”, which is true, however the images it considers similar involve cars doing burn-out or stuck in the snow (or in one case a rockhopper penguin). IIP on the other hand fails this image, identifying it as a “measuring device”.

So as the image gets blurrier, it becomes harder for computer vision systems to identify its content, whereas the human eye does not have these problems. Even in the worst-case scenario, where the Gaussian blur filter has a radius of 51, the human eye is still able to decipher the content. But GRIS thinks it’s a “photograph” (which *is* true, I guess), and IIP says it’s a person.
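To reproduce these blur experiments, here is a minimal separable Gaussian blur for a grayscale image stored as a list of rows. The post specifies blurs by radius (11, 21, 51) but not by standard deviation. The sketch therefore assumes sigma = radius / 3 purely for illustration, because image editors differ in how they map a "radius" to sigma. Edges are handled by clamping to the nearest pixel.

```python
import math
from typing import List, Optional


def gaussian_kernel(radius: int, sigma: Optional[float] = None) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    if radius < 1:
        raise ValueError("radius must be at least 1")
    if sigma is None:
        sigma = radius / 3  # assumption: editors map "radius" to sigma differently
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_line(values: List[float], kernel: List[float]) -> List[float]:
    """Convolve one row (or column) with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur(image: List[List[float]], radius: int) -> List[List[float]]:
    """Separable Gaussian blur: a horizontal pass followed by a vertical pass."""
    kernel = gaussian_kernel(radius)
    rows = [blur_line(row, kernel) for row in image]
    cols = [blur_line(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A single bright pixel spreads out into a softer, dimmer blob as the radius grows.
spot = [[255 if (x, y) == (5, 5) else 0 for x in range(11)] for y in range(11)]
for r in (1, 3, 5):
    print(r, round(gaussian_blur(spot, r)[5][5], 1))
```

As the radius grows, each output pixel averages over a wider neighbourhood. Fine texture and edges therefore fade long before the overall layout of the scene does.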

When things aren’t quite what they seem

Some algorithms I don’t get. I just read about an algorithm, “Treepedia”, which measures the canopy cover in cities using Google imagery, or rather, it measures the “Green View Index” (GVI). As their website describes, they don’t count the number of individual trees, but have created a “scaleable and universally applicable method by analyzing the amount of green perceived while walking down the street”. This gives Toronto a Green View Index of 19.5%. Nice. But look a little closer and they also state “The visualization maps street-level perception only, so your favorite parks aren’t included!” So it focuses on street trees, which is interesting, but doesn’t represent the true picture. The green cover in an urban environment cannot be estimated from a simple series of street-view photographs.
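The distinction can be made concrete with a deliberately naive "greenness" score: the share of pixels in a street-level photo whose green channel clearly dominates red and blue. This is not Treepedia's implementation, and the 20-level margin is an arbitrary assumption. It does illustrate the point, though. A score like this measures green pixels visible from the street. Lawns, hedges or a green wall count, while trees hidden behind houses or deep in a park do not.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]


def green_fraction(pixels: Iterable[Pixel], margin: int = 20) -> float:
    """Share of (r, g, b) pixels whose green exceeds both red and blue by more than `margin`."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    green = sum(1 for r, g, b in pixels if g - r > margin and g - b > margin)
    return green / len(pixels)


# Hypothetical pixels: foliage, a gray facade, a teal sign and a black shadow.
sample = [(20, 120, 30), (200, 200, 200), (10, 90, 80), (0, 0, 0)]
print(green_fraction(sample))
```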

Now that’s a problem. The algorithm clearly does *not* calculate the amount of land covered by trees. That matters because Toronto has a bunch of ravines, and numerous parks filled with trees. Official estimates put the actual tree canopy cover at somewhere between 26.6% and 28%. Vancouver is the opposite: Treepedia gives it a rating of 25.9%, whereas official documents put the figure closer to 18% (2013). That canopy is split between private property (62%), parks (27%), and streets (11%).

So why didn’t they include parks and forests? Because it’s not a trivial thing to do. Green spaces in aerial images of urban areas also include grassed areas, gardens, shrubs, etc. But perhaps the algorithm is not entirely to blame; maybe the logic behind it is? Look, representing canopy cover is important, but that’s not what this algorithm does. It calculates some measure of “greenness” at street level. It doesn’t take into account the 70ft silver maple in my back yard. It might be a useful tool for filling in the street canopy with new plantings. In the words of the website, though, a metric “by which to evaluate and compare canopy cover” it is clearly not. The problem is compounded when the media run articles that misinterpret the data. Even cities publish the findings. The City of Toronto website, for instance, proclaims “Toronto was named one of the greenest cities in the world”. They fail to mention that only 25 cities are shown. They also mislead readers with statements like “Each city has a percentage score that represents the percentage of canopy coverage”, which is clearly not the case. Besides, you can hardly calculate the GVI of only 25 cities and call it a “world ranking”.

P.S. For those interested, there is an abundance of literature relating to actual canopy cover estimation using airborne LiDAR, aerial imagery, and satellite imagery.



Fixing photographs (e.g. travel snaps) (ii)


Some images contain shafts of light. Sometimes this light helps highlight certain objects in the photograph, be it as hard light or soft light. Consider the following photo of a Viking carving from the Viking Ship Museum in Oslo. There are some nice shadows caused by the light streaming in from the right side of the scene.

One way to reduce the effects of light is to convert the photograph to black-and-white.

By suppressing the role colour plays in the image, the eyes become more fixated on the fine details, and less on the light and shadows.
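Converting to black-and-white is usually a weighted sum of the three channels rather than a plain average, because green contributes most to perceived brightness. The sketch below uses the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), which are one common choice among several. Photo editors often offer other channel mixes, and the pixel values in the example are made up.

```python
def to_gray(pixel):
    """Luma of an 8-bit (r, g, b) pixel using ITU-R BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_gray_image(image):
    """Convert an image stored as a list of rows of (r, g, b) tuples to grayscale."""
    return [[to_gray(p) for p in row] for row in image]


# Pure red, green and blue of equal strength end up as quite different grays.
print(to_gray_image([[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]))
```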

4. Improving on sharpness

Sometimes it is impossible to take a photograph with enough sharpness. Tweaking the sharpness just slightly can help bring an extra crispness to an image. This is especially true of macro photographs, or photographs with fine detail. If the image is genuinely blurry, however, there is every likelihood that it cannot be salvaged; there is only so much magic that image processing can perform. Here is a close-up of some water droplets on a leaf.

If we sharpen the image using unsharp masking, we get:
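Unsharp masking works in three steps. It blurs a copy of the image, subtracts the blur to isolate fine detail, and adds a scaled amount of that detail back: sharpened = original + amount × (original − blurred). The sketch below applies this to a grayscale image stored as a list of rows. A 3×3 box blur stands in for the Gaussian blur that photo editors typically use, and the default `amount` is illustrative.

```python
def box_blur(image):
    """3x3 mean filter; the neighbourhood shrinks at the image borders."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(y - 1, 0), min(y + 2, h))
                    for i in range(max(x - 1, 0), min(x + 2, w))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def unsharp_mask(image, amount=1.0):
    """Add back `amount` times the detail (original minus blur), clamped to 0-255."""
    blurred = box_blur(image)
    return [[min(255, max(0, round(p + amount * (p - b)))) for p, b in zip(row, brow)]
            for row, brow in zip(image, blurred)]


# A step between two gray levels gets a darker and a brighter rim.
edge = [[50, 50, 50, 200, 200, 200]]
print(unsharp_mask(edge))
```

The exaggerated rims on either side of the edge are what read as extra crispness. Push `amount` too far and they become visible halos.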

5. Saturating colour

Photographs of scenes containing vivid colour may sometimes appear quite dull, or maybe you just want to boost the colour in the scene. By adjusting the colour balance, or manipulating the colour histogram, it is possible to boost the colours in a photograph, although the processed image may end up with “unrealistic” colours. Here is a street scene of some colourful houses in Bergen, Norway.

Here the image has been processed with a simple contrast adjustment, although the blue parts of the sky have all but disappeared.
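The exact contrast adjustment isn't specified, so the sketch below assumes the simplest one: a linear stretch of each channel around mid-gray (128), clamped to 0-255. It suggests one plausible reason pale sky blues vanish. The channels that are already bright clip at 255, so the differences between channels shrink, and the saturation shrinks with them. The sky colour is an illustrative value, not one measured from the photograph.

```python
import colorsys


def adjust_contrast(pixel, factor, pivot=128):
    """Linear contrast change of each 8-bit channel around `pivot`, clamped to 0-255."""
    return tuple(min(255, max(0, round((c - pivot) * factor + pivot))) for c in pixel)


def saturation(pixel):
    """HSV saturation (0-1) of an 8-bit (r, g, b) pixel."""
    return colorsys.rgb_to_hsv(*(c / 255 for c in pixel))[1]


pale_sky = (200, 220, 255)
boosted = adjust_contrast(pale_sky, 1.5)
print(boosted, round(saturation(pale_sky), 3), round(saturation(boosted), 3))
```

Boosting saturation directly in HSV, rather than stretching RGB contrast, would largely avoid this particular side effect. The trade-off is a greater risk of "unrealistic" colours.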



Fixing photographs (e.g. travel snaps) (i)

When travelling, it is not always possible to get a perfect photograph. You can’t control the weather: sometimes it is too sunny, and other times there is not enough light. So the obvious option is to modify the photographs in some way, fixing what is considered “unaesthetic”. The problem lies in the fact that cameras, as good as they are, don’t always capture a scene the way human eyes do. Your eyes and brain correct for many things that aren’t possible with a camera. Besides, we are all tempted to make photographs look brighter, a legacy of the filters in apps like Instagram. Should we fix photographs? Being able to do so is one of the reasons the RAW file format exists: it lets us easily modify an image’s characteristics. At the end of the day, we fix photographs to make them more aesthetically pleasing. I don’t own a copy of Photoshop, so I don’t spend copious hours editing my photographs; it’s usually a matter of adjusting the contrast, or performing some sharpening.

There is, of course, the adage that photographs shouldn’t be modified too much. I think performing hundreds of tweaks on a photograph results in an over-processed image that may not really represent what the scene actually looked like. But a couple of fixes to improve the aesthetic appeal seem reasonable.

So what sort of fixes can be done?

1. Fixing for contrast issues

Sometimes it’s not possible to take a photograph with the right amount of contrast. In an ideal world, the histogram of a “good” photograph would be roughly uniformly distributed. Sometimes things like an overcast sky get in the way. Consider the following photo, which I took from a moving train using shutter priority under an overcast sky.

The photograph seems quite nice, right? Does it truly reflect the scene I encountered? Likely not quite. If we investigate the (intensity) histogram, we notice that there is one large peak towards the low end of the intensity range. There is also a small spike near the higher intensities, most likely related to light regions such as the sky.

So now if we stretch the histogram, the contrast in the image will improve, and the photograph becomes more aesthetically pleasing, with much brighter tones.
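A histogram stretch can be as simple as a linear remapping that sends the darkest pixel to 0 and the brightest to 255. The sketch below assumes a grayscale image stored as a list of rows. It also builds the intensity histogram with collections.Counter so the before and after can be compared. Because a single very bright or very dark pixel defines the endpoints, practical tools often clip a small percentage at each end first; that refinement is left out here.

```python
from collections import Counter


def histogram(image):
    """Count how many pixels take each intensity value."""
    return Counter(v for row in image for v in row)


def stretch_image(image):
    """Linearly map the image's [min, max] intensity range onto [0, 255]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a flat image has no range to stretch
        return [row[:] for row in image]
    scale = 255 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in image]


dull = [[20, 40, 40], [70, 20, 40]]
print(sorted(histogram(dull).items()))
print(sorted(histogram(stretch_image(dull)).items()))
```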

2. Fixing for straight lines

In the real world, the lines of buildings are most often straight. The problem with lenses is that they are curved, and sometimes this affects the form of the photograph being acquired. The wider the lens, the more straight lines converge towards the centre of the image, especially when the camera is tilted upwards. The worst case is fish-eye lenses, which can have a field of view of up to 180°, and produce barrel distortion. Take a photograph of a building, and the building will appear distorted. Our visual system compensates for this with the knowledge that it is a building and that its sides should be parallel; we do not consciously notice converging vertical lines. However, when you view a photograph, things are perceived differently: it often appears as though the building is leaning backwards. Here is a photograph of a building in Bergen, Norway.


Performing a perspective correction creates an image where the vertical lines of the building are truly vertical. The downside is of course that the lower portion of the image has been compressed, so if the plan is to remove distortion in this manner, make sure to allow enough foreground in the image.
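Perspective correction of this kind is commonly modelled as a projective transform, or homography. Pick four points that should form a rectangle, such as the corners of a facade, and solve for the 3×3 matrix that maps them there. The sketch below solves the resulting 8×8 linear system with plain Gaussian elimination. The corner coordinates are made up. Warping a full image would also require mapping each output pixel back through the inverse transform and resampling, which is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 projective transform (bottom-right entry fixed to 1) taking 4 src points to dst."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(h: List[List[float]], point: Point) -> Point:
    """Apply the homography to one point, including the perspective divide."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical facade corners: the top edge is narrower because the building "leans back".
leaning = [(10, 0), (90, 0), (100, 100), (0, 100)]
upright = [(0, 0), (100, 0), (100, 100), (0, 100)]
H = homography(leaning, upright)
print([tuple(round(c, 6) for c in map_point(H, p)) for p in leaning])
```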

Obviously it would be better to avoid these problems when photographing buildings.



Making a simple panorama

Sometimes you want to photograph something close up, but the whole scene won’t fit into one photo, and you don’t have a fisheye lens on you. So what to do? Enter the panorama. Many cameras now provide some level of built-in panorama generation. Some will guide you through taking a sequence of photographs to be stitched into a panorama off-camera. Others stitch the panorama in-situ (I would avoid this, as it eats battery life). Or you can take a bunch of photographs of a scene and use an image-stitching application such as AutoStitch or Hugin. For simplicity’s sake, let’s generate a simple panorama using AutoStitch.
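Stitching tools such as AutoStitch and Hugin match local features and fit full projective warps with blending, which is far more than fits here. The sketch below shows only the core idea of registration. It assumes two grayscale frames of equal height, taken side by side with the camera held level. It searches for the horizontal shift that minimizes the mean squared difference in the overlap. The frames are synthetic test data.

```python
def best_horizontal_offset(left, right, min_overlap=3):
    """Column in `left` where `right` should start, by brute-force mean squared error.

    Both images are lists of rows with the same height; only pure horizontal
    shifts are considered (no rotation, scaling or perspective).
    """
    height, width = len(left), len(left[0])
    best_dx, best_err = None, float("inf")
    for dx in range(0, width - min_overlap + 1):
        overlap = min(width - dx, len(right[0]))
        err = sum((left[y][dx + j] - right[y][j]) ** 2
                  for y in range(height) for j in range(overlap)) / (overlap * height)
        if err < best_err:
            best_dx, best_err = dx, err
    return best_dx


def stitch(left, right, dx):
    """Join two frames, taking `left` up to column dx and `right` from there on."""
    return [lrow[:dx] + rrow for lrow, rrow in zip(left, right)]


# Synthetic scene, photographed as two overlapping 12-pixel-wide frames.
scene = [[(7 * x + 13 * y) % 17 for x in range(20)] for y in range(5)]
left = [row[0:12] for row in scene]
right = [row[8:20] for row in scene]
dx = best_horizontal_offset(left, right)
print(dx, stitch(left, right, dx) == scene)
```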

In Oslo, I took three pictures of a building because obtaining a single photo was not possible.

This is a very simple panorama, with feature points that are easy to find because of all the detail on the buildings. Here is the result:

It’s not perfect, in that it has some barrel distortion, but this could be removed. In fact, AutoStitch does an exceptional job, without having to set 1001 parameters. There are no visible seams, and the photograph seems like it was taken with a fisheye lens. Here is a second example, composed of three photographs taken on the hillside next to Voss, Norway. This panorama has been cropped.

This scene is more problematic, largely because of the fluid nature of some of the objects. There are some things that just aren’t possible to fix in software. The most problematic object is the tree in the centre of the picture. Because tree branches move with the slightest breeze, it is hard to register the leaves between two consecutive shots. In the enlarged segment below, you can see the ghosting effect of the leaves, which almost gives that region in the resulting panorama a blurry effect.

Ghosting of leaves.

So panoramas containing natural objects that move are more challenging.
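Once two frames are registered, one simple way to locate likely ghosting is to compare them pixel by pixel in the overlap. Static scenery agrees closely, while leaves that moved between shots do not. The sketch below flags such pixels in two aligned grayscale overlap regions. The threshold of 30 is an arbitrary assumption, and the arrays are made-up values. A stitcher could then take the flagged regions from a single frame instead of averaging the two, which is one common way of hiding ghosts.

```python
def ghost_mask(a, b, threshold=30):
    """True where two aligned grayscale overlaps differ by more than `threshold`."""
    return [[abs(p - q) > threshold for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def ghost_ratio(a, b, threshold=30):
    """Fraction of overlap pixels flagged by ghost_mask."""
    mask = ghost_mask(a, b, threshold)
    flagged = sum(v for row in mask for v in row)
    total = sum(len(row) for row in mask)
    return flagged / total if total else 0.0


# Hypothetical overlaps: most pixels agree, but a leaf has moved between shots.
shot_1 = [[10, 10, 200], [10, 10, 10]]
shot_2 = [[12, 10, 40], [10, 90, 10]]
print(ghost_mask(shot_1, shot_2), ghost_ratio(shot_1, shot_2))
```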