Debunking TV technology: Reflections in drinking glasses

Apart from “enhancement”, another TV myth is the idea of useful reflections, either in drinking glasses or, heaven forbid, in people’s eyes (post to follow). Case in point: “Crossing Lines” Episode 6, “Special Ops – Part 2”. Working from a video-stream image of a glass half full of water in a remote farmhouse, the team sharpen the image until a reflection in the glass – seen through a small opening in the window curtain – reveals an object on a distant hill, which they then use to pinpoint the house. Here’s a representation of the glass.

[Image: representation of the glass from the episode]

And here’s what the set-up would look like:

[Image: diagram of the reflection set-up]

Now the problem here is one of simple optics. The glass in the picture is a multi-faceted glass. The challenge is having the correct series of angles to reflect the outside scene, which we presume is about 90 degrees to the right of the camera. Below are three photographs of my backyard taken through: (i) a 9-sided glass with water in it, (ii) a round glass with water in it, and (iii) a round glass with no water in it. The 9-sided glass gives no view of anything but what lies directly behind it, and the images are compressed horizontally, one on each facet of the glass. The round glass with water flips the image horizontally, while also distorting it. This is because light “bends” (refracts) when it passes from one substance into another of a different density – Snell’s law, n1 sin θ1 = n2 sin θ2, where the refractive index n is about 1.00 for air, 1.33 for water, and roughly 1.5 for typical glass. Light from the scene traveled through the air, through the near side of the glass, through the water, through the far side of the glass, and back through the air before reaching the camera. The round glass with no water in it gives an undistorted, un-flipped image that is slightly blurry.

[Image: the backyard photographed through the 9-sided glass with water, the round glass with water, and the empty round glass]

Experiment: Fill a multi-faceted glass with water and put an object to one side of the glass. Rotate the glass through 360 degrees. You will never see the object. To really see “around a corner” you have to use a prism, or something like the Super Secret Spy Lens from photojojo.

Here is an experiment performed using a window with two panes of glass (and an inert gas in between). Opening the window and taking a photograph out of it, with the camera lens perpendicular to the window frame, results in an image of the side of the house. The image is a reflection, and it is doubled because of the two panes. So it is possible – just not with a multi-faceted drinking glass.

[Image: doubled reflection photographed in the 2-pane window]


Why facial recognition shouldn’t scare you.

Some people are paranoid about going to a place like London because of all the cameras. There are reputed to be 1.85 million CCTV cameras in the UK, or roughly one camera for every 32 people. Big Brother is watching. People are being tracked wherever they go; it’s almost like an episode of “Person of Interest”. But let’s think about it for a minute. In London alone there are 10,524 CCTV cameras, and the London Tube network has over 13,000 more. That’s a whole lot of data which, even if it isn’t stored for any great length of time, still has to be processed.

The real problem? Too many low-resolution cameras. It doesn’t matter how many low-resolution cameras are installed in an area if facial details or the characters on a license plate are blurred. Minimum specifications for traffic cameras are about 720 × 576 pixels. Is higher resolution any better? Consider the following photograph of Canary Wharf Station in London (image courtesy of James Peacock, Creative Commons, flickr ID# 4056621356). This 4272 × 2848 (12MP) image has a file size of 14.4MB.

[Image: “Home time” – Canary Wharf Station, London]

Cameras capturing high-resolution photographs still only give an overview of what is happening (and most high-resolution CCTV cameras are probably only 3-5MP). Three extracted 100×100 sub-images are shown below, and they illustrate the inherent difficulty with CCTV images – even high-resolution ones. The first sub-image shows a person’s face, but not clearly, and certainly not one that could be enlarged or used for facial recognition. The second sub-image shows some text large enough to be recognizable. The third sub-image vaguely shows a person’s face, but it is extremely dark due to the uneven lighting in the photograph. It may be possible to extract generic information about a person’s face, but a positive identification is extremely unlikely.

[Image: three 100×100 sub-images extracted from the photograph]
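Mechanically, pulling those sub-images out of the big photograph is trivial – it’s the information they contain that is lacking. As a rough illustration (my own sketch in C, not code from any real CCTV system; the function name and buffer layout are assumptions), extracting a 100×100 crop from an 8-bit grayscale image is just a matter of copying rows:

#include <stdint.h>
#include <string.h>

/* Copy a crop x crop window, whose top-left corner is (x0, y0), out of a
   row-major 8-bit grayscale image that is src_w pixels wide. The caller is
   assumed to have checked that the window lies inside the source image. */
void extract_subimage(const uint8_t *src, int src_w,
                      int x0, int y0,
                      uint8_t *dst, int crop)
{
    for (int row = 0; row < crop; row++)
        memcpy(dst + row * crop,
               src + (size_t)(y0 + row) * src_w + x0,
               (size_t)crop);
}

A 100×100 crop of a face is only 10,000 pixels – whatever detail wasn’t captured in those pixels in the first place cannot be recovered afterwards.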

The biggest caveat? Facial recognition itself. Facial recognition algorithms are hindered by the constraints of the imaging device: poor or uneven lighting, low-resolution images, faces too far from the camera. Recognition works quite well for full-frontal faces, or those up to about 20 degrees off-axis; beyond that it struggles.

What is really needed are high-definition CCTV systems with hemispheric (360-degree) fields of view, optical zoom that can be triggered remotely, and some form of intelligent screening – not unlike “The Machine” from Person of Interest, a mass-surveillance machine.

1984 anyone?

Finding faces in a crowd: Debunking a TV myth

Watch any crime show on TV and eventually someone will use a piece of software to “enhance” an image – zoom in, deblur, remove noise. For example, enlarge and sharpen a blurred image of a person’s face to the point where it is crisp and identifiable. The reality is quite different, though. This post looks at the basic problems of enlarging images. Nonsense like zooming in, looking at images in reflections, and enhancing is only possible if the information is actually there.

That’s not to say facial recognition won’t work. It may work great if you have a gigapixel image – you can zoom in to your heart’s content. If you have a 2-megapixel image? Good luck.
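To see why, consider what “enlarging” actually does. Below is a minimal sketch (my own, in C – not any particular product’s algorithm) of a 2× bilinear enlargement of an 8-bit grayscale image. Every output pixel is nothing more than a weighted average of pixels that already exist, which is why enlarging a blurry low-resolution face cannot conjure up a crisp, identifiable one.

#include <stdint.h>

/* Enlarge a w x h 8-bit grayscale image to 2w x 2h using bilinear
   interpolation. dst must hold (2*w)*(2*h) bytes. */
void upscale2x_bilinear(const uint8_t *src, int w, int h, uint8_t *dst)
{
    int W = 2 * w, H = 2 * h;
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            /* Where this output pixel sits in source coordinates. */
            float sx = x / 2.0f, sy = y / 2.0f;
            int x0 = (int)sx, y0 = (int)sy;
            int x1 = (x0 + 1 < w) ? x0 + 1 : x0;
            int y1 = (y0 + 1 < h) ? y0 + 1 : y0;
            float fx = sx - x0, fy = sy - y0;

            /* Blend the four surrounding source pixels - no new detail. */
            float top = src[y0 * w + x0] * (1 - fx) + src[y0 * w + x1] * fx;
            float bot = src[y1 * w + x0] * (1 - fx) + src[y1 * w + x1] * fx;
            dst[y * W + x] = (uint8_t)(top * (1 - fy) + bot * fy + 0.5f);
        }
    }
}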


Why internet searching is more like rummaging

In the beginning, information was stored in the collective memories of people and passed down through the ages. Then came the “written” word, albeit in symbol form on cave walls, on papyrus, or etched into stone tablets. In the mid-15th century came the Gutenberg press and the printed book. To find some information we first searched an encyclopedia, and if we needed more in-depth information, a specialized book on the topic. Then in the early 90s came the world wide web – the “net” as it is commonly called – and a new way to distribute knowledge. In reality there was always a wealth of knowledge out there in the aether; the means of distributing it had been missing. Recipes from a particular region were often collected in a book of “local fare”, but its influence rarely went beyond the region. Now we find ourselves with a wealth of information, available in the form of entries on wikipedia, information repositories (e.g. flickr), personal websites, and blogs.

But is there too much information? Searching a book for content usually involves looking through the index for keywords. How well we are able to find the information correlates directly with how well the book has been indexed. A poorly indexed book forces the reader to search by brute force, leafing through every page in an attempt to find the relevant information. The same could be said of searching for information on the net – except that there is no “index” for the net, and no reliable way of searching it.

Google is undoubtedly the most successful search engine, built on the notion of information linkage. Google’s algorithm uses a patented system called PageRank to help rank the web pages that match a given search string. PageRank computes a recursive score for each web page, based on a weighted sum of the PageRanks of the pages linking to it. Websites with many links pointing to them get a high score, and therefore a greater chance of “floating” to the top of the Google heap. It doesn’t mean they are the best matches, only that they are the most linked, so pertinent information may be all but impossible to find other than by chance. Visual information is even more challenging, because a good search relies on an image having appropriate tags associated with it. Content Based Image Retrieval (CBIR) is a concept which heralds a new way of searching, but it relies on extracting meaning from an image’s content, which is not going to happen any time soon. In reality, searching the internet for information is more akin to rummaging than practical searching. Rummaging can be described as “searching unsystematically and untidily”. See a pattern? A real search engine would be capable of indexing web pages by extracting keywords from the page, and maybe incorporating meta-data such as the time stamp of when the page was last modified.
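To make the ranking idea concrete, here is a toy version in C (my own sketch of the published PageRank formulation, not Google’s actual code; the 4-page link graph and the 0.85 damping factor are illustrative assumptions). Each page’s score is a damped, weighted sum of the scores of the pages linking to it, computed by simple iteration:

#include <stdio.h>

#define N    4      /* pages 0..3                        */
#define D    0.85   /* the commonly used damping factor  */
#define ITER 50

/* link[i][j] = 1 means page i links to page j (an invented example graph). */
static const int link[N][N] = {
    {0, 1, 1, 0},
    {0, 0, 1, 0},
    {1, 0, 0, 1},
    {0, 0, 1, 0},
};

int main(void) {
    double pr[N], next[N];
    int outdeg[N] = {0};

    for (int i = 0; i < N; i++) {
        pr[i] = 1.0 / N;                        /* start with a uniform score */
        for (int j = 0; j < N; j++) outdeg[i] += link[i][j];
    }

    for (int it = 0; it < ITER; it++) {
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int i = 0; i < N; i++)         /* pages i that link to j     */
                if (link[i][j] && outdeg[i] > 0)
                    sum += pr[i] / outdeg[i];
            next[j] = (1.0 - D) / N + D * sum;
        }
        for (int j = 0; j < N; j++) pr[j] = next[j];
    }

    for (int j = 0; j < N; j++)
        printf("page %d: %.3f\n", j, pr[j]);
    return 0;
}

The page everyone links to ends up with the highest score – regardless of whether it is actually the best answer to the query.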

In truth, it may sometimes be easier to find information by going to a library and looking through books than to find it on the internet.

Software bloat

Newer software often has a tendency to be larger, or to use more system resources, than previous versions of the same software. Such software is called bloatware, and it results from code bloat – code that has become undesirably long, slow, or wasteful of resources.


When the software industry emerged in the 1970s, there were significant limitations on the amount of disk space and memory available, and developers expended considerable effort making programs fit into the available resources. The situation in the new millennium is completely reversed: the rapid advancement of technology has made memory and processing power plentiful and inexpensive. In 1995, Niklaus Wirth summed up the situation in Wirth’s Law: “Software gets slower faster than hardware gets faster”. Some of this bloat may be due to inefficient programming practices, lack of optimization, or the overuse of features.

A good example of software bloat is operating systems. Microsoft Windows ballooned from 15 million LOC in Windows 95 to somewhere close to 50 million LOC for Vista in 2007. Apple OS X Tiger (10.4) contained 86 million LOC, whilst the Linux kernel grew from 5,929,913 LOC in 2003 (2.6.0) to 14,998,651 LOC in 2012 (3.2). Let’s put that into context. A million lines of code printed out would run to approximately 18,000 pages. So in the case of OS X Tiger, about 1,548,000 pages – or, to visualize it better, roughly 4994 copies of the original edition of The Hobbit. Code complexity cannot be completely described by merely citing lines of code (LOC), but it is somewhat revealing. The more code a program has, the greater its complexity, and the greater the likelihood that it will become slower. There is also a danger of reduced reliability – the more code, the more potential for erroneous code.
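For the curious, the page count is simple arithmetic – here is a quick check in C using only the figures quoted above (the ~310-page count for the original edition of The Hobbit is the value implied by the post’s own numbers):

#include <stdio.h>

int main(void) {
    double loc_millions      = 86.0;      /* OS X Tiger, as quoted above     */
    double pages_per_million = 18000.0;   /* printed pages per million LOC   */
    double hobbit_pages      = 310.0;     /* approx. pages, original edition */

    double pages = loc_millions * pages_per_million;
    printf("%.0f pages, or about %.0f copies of The Hobbit\n",
           pages, pages / hobbit_pages);  /* 1,548,000 pages, ~4994 copies   */
    return 0;
}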

Sometimes it seems as though software bloat is inversely proportional to the size of a device: as an electronic device decreases in size, the size of its software increases. The Android OS used in mobile devices has over 3 million LOC. The software that runs Paris Metro Line 14? 87,000 LOC. It controls the line’s train traffic, regulates train speed, manages several alarm devices, and allows automatic and non-automatic trains to share the same line.

One small piece of code controls moving trains, one huge piece of code controls a mobile device.
Food for thought.

Using Moore’s Law to solve program inefficiencies

As the processing ability of computers increases, we sometimes forget that some devices have limitations. Mobile devices have to contend with processors that draw minimal power, in a device with minimal space. Washing machines and fridges also have minimal embedded systems within them. And then there are devices such as the Mars rover Curiosity. Curiosity has less processing power than an iPhone 5, but it also had to contend with a 253-day, 352 million-mile flight to Mars. Its electronics are radiation-hardened to withstand radiation from space: the RAD750 processor can withstand 200,000 to 1,000,000 rads and temperatures between –55 °C and 125 °C, and requires 5 watts of power. Its software is 2.5 million LOC, written in C.

– Processor: Curiosity’s runs at 132 MHz; the iPhone 5’s at 1.3 GHz.
– Memory: Curiosity has 128 MB; the iPhone 5 has 1 GB.
– Storage: Curiosity holds 4 GB; the iPhone 5 holds 64 GB.

Writing code for devices with limited resources is intrinsically more challenging than writing it for devices with abundant resources, and it requires a working knowledge of how to make programs more efficient. In his 1974 paper “Structured programming with go to statements” [1], Knuth stated: “we should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil”. But with CPUs such as the IBM zEC12, with a clock rate of 5.5 GHz, we may have forsaken the task of writing efficient code altogether. Why? For the simple reason that, as time has progressed, we have relied on Moore’s Law to solve program inefficiencies – a slow algorithm can be made acceptable by running it on a faster CPU. Machines now process things so quickly that it is hard for students to grasp the differences between algorithms; they end up designing inefficient code because they assume machine speed-ups will make up for any inefficiencies.

In the 1970s it was commonplace to do comparative studies of algorithms running on various platforms, to root out inefficiencies and produce leaner, more compact code. For example, compare a Bubble-sort with a Quick-sort by sorting 1000 random integers. The result? Both return immediately, with run-times of 0.0 seconds. This is a common problem – algorithms that had clearly different run-times 20 years ago now seem equally efficient. Now run the two algorithms on 100,000 random numbers: roughly 0.01 seconds for Quick-sort, approximately 40 seconds for Bubble-sort. Who would wait 40 seconds for an app to run? Likely no one. That’s why designing the best program involves knowing something about algorithm efficiency. If both algorithms seem equally efficient, the novice programmer will choose the code that is shortest, i.e. the Bubble-sort. The experienced programmer will realize that the Quick-sort is more efficient, even though it relies on recursion, which uses more stack resources. Maybe an Insertion-sort, which comes in at around 6 seconds, as a compromise?
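If you want to see the gap for yourself, here is a minimal sketch in C (my own, not from any course material; it uses the C library’s qsort as the Quick-sort and N = 100,000 as in the example above). Exact timings will obviously vary by machine, but the orders-of-magnitude difference will not:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N 100000

/* O(n^2) Bubble-sort. */
static void bubble_sort(int *a, int n) {
    for (int i = 0; i < n - 1; i++)
        for (int j = 0; j < n - 1 - i; j++)
            if (a[j] > a[j + 1]) {
                int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
            }
}

static int cmp_int(const void *x, const void *y) {
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

int main(void) {
    int *orig = malloc(N * sizeof *orig);
    int *work = malloc(N * sizeof *work);
    srand((unsigned)time(NULL));
    for (int i = 0; i < N; i++) orig[i] = rand();

    /* Quick-sort (via the library's qsort), O(n log n) on average. */
    memcpy(work, orig, N * sizeof *work);
    clock_t t0 = clock();
    qsort(work, N, sizeof *work, cmp_int);
    printf("Quick-sort:  %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    /* Bubble-sort on an identical copy of the data. */
    memcpy(work, orig, N * sizeof *work);
    t0 = clock();
    bubble_sort(work, N);
    printf("Bubble-sort: %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    free(orig);
    free(work);
    return 0;
}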

Bottom line – we cannot rely on faster CPUs to fix inefficient programs. Nor can we ignore the fact that many devices have limitations that prevent them from running algorithms in the same manner as they would on devices with limitless resources.

[1] Knuth, D.E., “Structured programming with go to statements”, Computing Surveys, 6(4), pp.261-301 (1974).

Does the number of pixels matter?

Okay, here’s the situation. The Nokia Lumia 1020 has a 41-megapixel camera (38MP at 4:3, 34MP at 16:9). It has a six-element Carl Zeiss lens and supposedly produces fantastic photographs – if you print them 4 feet wide by 3 feet tall. When it comes to megapixels, 41 is probably too many for the average user, unless you want to zoom in a lot or capture great detail in macro photographs. In reality 12 megapixels is probably enough. Want a bigger image? Take a bunch of images and stitch them together. So double the megapixels means double the image quality, right? Not so. Case in point: any series of progressive cameras, like the Olympus E-series.

Olympus E-1 = 2560 × 1920 = 4,915,200 (5MP)
Olympus E-3 = 3648 × 2736 = 9,980,928 (10MP)
Olympus E-5 = 4032 × 3024 = 12,192,768 (12.2MP)

From the E-1 to the E-3 there is only a 42% increase in the width of the image, not the doubling the megapixel counts might suggest. Doubling an image’s linear dimensions requires quadrupling the megapixels – i.e. to double the resolution of a 12MP image would require a roughly 50MP sensor.

Sensor sizes really haven’t changed much, but as the number of pixels increases, the size of each pixel gets smaller. Smaller pixels mean less ability to gather light, and reduced sensitivity. The sensor on the Olympus E-5 is 18 × 13.5mm, with an imaging area of 17.3 × 13mm (about 225mm²). That works out to approximately 54,190 pixels per mm² and a pixel size of approximately 4.3µm. The Lumia 1020 has a pixel size of 1.1µm, which is compensated for by the sheer number of pixels and the type of sensor.
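For anyone who wants to check the arithmetic, here is a small back-of-the-envelope sketch in C (my own, using only the figures quoted in the last two paragraphs) that computes the E-1 to E-3 width increase, the E-5’s pixel density, and its approximate pixel pitch:

#include <stdio.h>
#include <math.h>

int main(void) {
    /* Width increase from the E-1 (2560 px wide) to the E-3 (3648 px wide). */
    printf("E-1 to E-3 width increase: %.0f%%\n",
           (3648.0 / 2560.0 - 1.0) * 100.0);            /* ~42%, not 100%    */

    /* Olympus E-5: 4032 x 3024 pixels on a 17.3 x 13 mm imaging area. */
    double pixels   = 4032.0 * 3024.0;                  /* 12,192,768        */
    double area_mm2 = 17.3 * 13.0;                      /* ~225 mm^2         */
    double density  = pixels / area_mm2;                /* pixels per mm^2   */
    double pitch_um = 1000.0 / sqrt(density);           /* pitch in microns  */

    printf("Density: ~%.0f pixels/mm^2\n", density);    /* ~54,200           */
    printf("Pitch:   ~%.1f micrometres\n", pitch_um);   /* ~4.3 um           */
    return 0;
}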

More pixels does not always mean a better image. More megapixels means larger files, longer processing times, and apps that take longer to run. The question you really have to ask yourself is: are you really going to take a 38MP photograph of the plate of food you’re eating at a restaurant?

Why building a better iPhone is challenging

So Apple introduces a new iPhone with some updates, and everyone is aghast that it isn’t completely redesigned. But it’s getting harder and harder to re-invent mobile devices – they do, after all, need to stay mobile. Adding more features can lead to increases in size. The iPhone 5 is fit-in-the-pocket comfortable; the Samsung Galaxy Mega is not – you may as well get an iPad mini. Want a bigger camera, say 16 megapixels, or a faster processor? That requires a device which is going to eat more power, which means a larger battery, and before you know it the device begins to expand. 16-megapixel photographs also mean that apps will have to work harder to process them, and they’ll need more storage and likely more memory. What else do you want the iPhone to do? Make you lunch? Drive your car? It’s a mobile device – it does amazing things, but there are limits to what it can (and should) do. Sure, compact optics, more efficient processors and better batteries will improve things, but maybe not in the next few development cycles.

The software will likely evolve at a faster pace: improvements in how the operating system and apps work, increased battery life, and maybe more algorithmic intelligence. The problem is that some of these enhancements are invisible to the user, so while it seems as though very little has improved, the opposite is often true. Case in point: Apple Maps. When it first debuted there were many issues with the mapping service, but the real genius lies in its use of resolution-independent vector maps. This is quite different from the raster images used by others, which aren’t as efficient on slow or low-bandwidth data networks. In vector maps, roads, coastlines, and other features are represented as mathematical lines rather than fixed images. Because the data is dynamic, map labels can reorient themselves and text scales smoothly. Vector maps save memory, allow maps to be cached on the device, and reduce bandwidth.
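As a rough illustration of why vectors are cheaper to ship over a network, here is a small sketch in C (my own, with purely invented sizes – this is not Apple’s or anyone else’s map format): a road stored as a 50-point polyline versus the same area delivered as a single 256×256 RGBA raster tile.

#include <stdio.h>

struct point { float lat, lon; };   /* one vertex of a road polyline */

int main(void) {
    /* A road described by 50 vertices, 8 bytes each. */
    size_t vector_bytes = 50 * sizeof(struct point);

    /* The same area as one uncompressed 256 x 256 RGBA raster tile. */
    size_t raster_bytes = 256 * 256 * 4;

    printf("vector polyline: %6zu bytes\n", vector_bytes);  /*     400 bytes */
    printf("raster tile:     %6zu bytes\n", raster_bytes);  /* 262,144 bytes */
    return 0;
}

Real tiles are compressed and real vector data carries styling and labels, so the gap is smaller in practice, but the vector representation also scales and re-labels itself without re-downloading anything.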

Apple Maps is apparently upwards of 5 times more efficient than Google Maps. In an experiment performed by Gizmodo, an identical series of activities was performed using Google Maps and Apple Maps: on Google Maps the average data download was 1.3MB, while Apple Maps came in at 271KB. Reduced data means reduced user costs.

Sometimes the most evolutionary changes are those you can’t see.

Why COBOL doesn’t cripple the mind

I teach a course on legacy programming (more on that in a later post). One of the languages taught in that class is Cobol. It’s been around for over half a century, yet very few institutions teach it anymore. In most students’ eyes, Cobol is a mythical entity – some language supposedly used in the dark ages of programming, supplanted by C and Java; not used in industry; a dead language. It was Dijkstra who in the 1970s said “the use of COBOL cripples the mind”.

The truth of course is much different. COBOL supports 90% of Fortune 500 business systems every day, and 70% of all critical business systems are written in COBOL. We can’t live without Cobol, and it’s not going away anytime soon. Oh sure, some systems can be re-engineered using other languages, but for the most part it is just too cost-prohibitive. Look at how much Y2K cost – roughly US$300 billion – just to change a 2-digit date to a 4-digit date and deal with some leap-year issues.

One thing I found very interesting is that many students immensely dislike programming in Cobol. Comments are often along the lines of:

“I hate this language so much.”

Why do they hate the language so much? Part of the reason is the English-like syntax. Students have a hard time dealing with it; they are used to programming in one of the C-family of languages: C, C++, C#, Java, etc. Most have never programmed in the likes of Fortran or Ada, let alone Cobol. It becomes even more challenging with older Cobol code that needs to be re-engineered. For example, consider the following code in a C-based language:

tax = rate * income

In Cobol this becomes one of:

multiply rate by income giving tax.
compute tax = rate * income.

Also data has to be specified exactly. Coding something like

77 sumtotal pic 99.

means you can store the values 0 to 99 – no negative numbers, and nothing larger than two digits; there is simply no facility to store them. There is also minimal modularity: nearly all variables are global, there are no pointers, and there are few built-in functions. Should I go on? There are of course many positives – file handling, for example, is a breeze. Does it cripple the mind? Hardly. It just requires thinking a little differently about the syntax of the language.

Re-engineering Cobol code is expensive, tiresome, and may not be as much fun as designing a mobile app. But want a guaranteed job for the foreseeable future? Learn Cobol.

Cobol is very much alive for being a dead language.

Oh, and learn a pattern parsing language such as awk while you’re at it.

Computer science is about knowing how to program

Programming is the linchpin of software. Sure, the task of coding a piece of software may take up only 20% of a software design project, but it is a required skill. In addition you have to know how to solve problems, design the software, test it, and write readable documentation.

Do students get enough practice programming? Are the languages used relevant?

Most places teach something like C, Java, C#, Objective-C, or maybe C++ – some variant of the C family of languages. They have been around for many years, and most are used in some form in industry. Other places teach Python, or more esoteric languages such as Scheme or Haskell, used mainly for teaching purposes. Nothing is really wrong with using any of these languages – even Pascal makes a nice teaching language – but teaching an industry-relevant language does help students, especially those who do industry co-op placements. The problem is that students may not be getting enough programming experience, or at least not enough relevant coding experience.

That experience includes programming mid-sized systems, understanding a diverse range of languages, being able to integrate languages, designing testing strategies, parallel programming, and real-time programming. Ideally it would also include some real-world programming of mobile systems, and even a look at legacy systems written in languages such as Cobol. It is hard to scale from writing programs 500-1000 lines in length to the 100,000-1M line behemoths regularly encountered in industry without practice.

“The only way to learn a new programming language is by writing programs in it.”
Kernighan & Ritchie