Why higher education needs more problem solving courses

Problem solving is an integral part of the computer science experience; without it, the craft of computing boils down to simple programming. There is no need for programming if the quintessential underpinnings of the solution to a problem do not exist. There are of course many problems in computer science that are still open problems – why? Because nobody has solved the problems underpinning them. Some problems seem simple but are in reality quite challenging, and image processing is a good example. Take the simple task of cleaning up a picture from an old manuscript that has been digitized. Figure 1 shows an example of a piece of a typewritten note, where the paper shows the marks of age.

Fig.1: A deteriorated text image

How does one go about producing a clean, crisp version of the image where the text can be easily extracted? What steps are involved? Will the solution be one focused on the specifics of the particular image, or a more generic solution? There are many differing scenarios and approaches that could be considered. Some students will read the literature and take the tried and true approach to cleaning up the image. Others will throw their lot in with some AI-based scheme (i.e. black magic). Many students forget that understanding how to solve the problem first requires an understanding of the actual data involved – an aged piece of paper with typewritten text, where the paper has been damaged by a liquid of some sort, making it discoloured in spots.

There was a time when computer science focused more on problem solving skills, even if in an implicit manner. Students were given a problem and derived a solution. Then the internet arrived, vast stores of information became available, and problem solving skills began to disappear. It is easier to try and find a solution on the internet than to actually look at the problem and attempt to solve it. I do sympathize with this approach, as the internet is full of readily available information – whether it is information or disinformation is debatable. When I was a student, and there was no Web (there was an internet, but information was hard to find), you were forced to derive a solution to a problem, or seek solace in the computing section of the library.

It is however not the student’s fault. We have failed in teaching any form of real-world problem solving skills. We assume that they learn these things in secondary school, but this is hardly the case. We provide few if any courses dedicated to real-world problem solving. All students in all programs at university should take a real-world problem solving course, even if it is under the guise of a more fanciful “critical thinking“.

The sort of course I am thinking of would involve elements of individual and group-based problem solving, using real-world problems that are both tactile and theoretical in nature. Math problems are all well and good, but their solutions are often pretty well defined, whereas problems from other realms can be messier – and messy is good. There are many hands-on activities that can be used to engender problem solving skills. These problems help students to:

  • Discover and explore problems and solutions
  • Learn new concepts in thinking
  • Become more creative / innovative
  • Become more open-minded and learn how to avoid mental blocks
  • Use intuition and common sense in problem solving
  • Exercise the “more than one solution” approach

Skills that were once developed because a problem could be approached using very little information must now contend with a vast amount of both information and disinformation. Information can be overwhelming and actually hinder the thought process. For some tasks it is of course challenging to think of new ways to solve a problem. Consider a task like: “develop a new algorithm for sorting a series of integers. You may not use any existing algorithms. It must be completely new.” This would be challenging because genuinely new sorting algorithms rarely appear; the existing ones have changed very little.

As for cleaning up the deteriorated image? The solution is quite simple, but it requires some experimentation. Firstly, the colour image is problematic because of, well, the complexities of colour. So one approach is to convert the image to grayscale, reducing the problem space (Fig.2A). Next the background can be modelled using something like a polynomial, to create an image which represents everything but the text (Fig.2B). This background is then subtracted from the grayscale image (Fig.2C). Finally, the image can be binarized (separated into text and background) using a thresholding algorithm (Fig.2D).

Fig.2A: Convert into a grayscale image.
Fig.2B: Model the background.
Fig.2C: Subtract 2B from 2A.
Fig.2D: Extract the text.
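
For the curious, here is a rough sketch of what such a pipeline might look like in Fortran. This is not the code used to produce Fig.2 – the routine name and parameters are made up, and the polynomial background model is replaced with a simpler large-window local mean – but the four steps are the same.

subroutine clean_text_image(dx, dy, r, g, b, binary)
   ! sketch of the clean-up pipeline: grayscale, background model,
   ! subtraction, thresholding (illustrative values throughout)
   integer, intent(in) :: dx, dy
   real, dimension(dx,dy), intent(in) :: r, g, b     ! colour planes (0..255)
   integer, dimension(dx,dy), intent(out) :: binary  ! 1 = text, 0 = background
   real, dimension(dx,dy) :: gray, bkg
   integer, parameter :: w = 15           ! background window "radius" (assumed)
   real, parameter :: thresh = 30.0       ! intensity difference threshold (assumed)
   integer :: i, j, i1, i2, j1, j2

   ! Step 1 (Fig.2A): luminance-weighted grayscale conversion
   gray = 0.299*r + 0.587*g + 0.114*b

   ! Step 2 (Fig.2B): model the background as a heavily smoothed image
   do j = 1, dy
      do i = 1, dx
         i1 = max(1,i-w);  i2 = min(dx,i+w)
         j1 = max(1,j-w);  j2 = min(dy,j+w)
         bkg(i,j) = sum(gray(i1:i2,j1:j2)) / real((i2-i1+1)*(j2-j1+1))
      end do
   end do

   ! Steps 3 and 4 (Fig.2C, 2D): subtract the background and binarize;
   ! pixels that differ enough from the background are taken to be text
   binary = merge(1, 0, abs(gray - bkg) > thresh)
end subroutine clean_text_image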

There are of course other approaches to the problem, some of which might be even better. The goal here is to explore a problem, and come up with solutions – good or bad. The future lies in solving problems, and we need to create an educational environment where we promote the idea, rather than allow it to be overrun by a cacophony of nonsense.

Can computers think?

In one of the earlier articles on machines playing chess [1], applied mathematician Claude E. Shannon poses the question of whether electronic computers can “think”. Today we are still asking these questions, and the answers are likely the same. Shannon lists several advantages of a machine over a human player:

  • A machine can make individual calculations at much greater speed.
  • A machine’s play is free of errors other than those due to the deficiencies of the program, whereas human players often make simple and obvious blunders.
  • A machine is free from laziness, or the temptation to make an instinctive move without proper analysis of the position.
  • A machine is free from “nerves”, so it will make no blunders due to overconfidence or defeatism.

Shannon concluded that since there was no precise definition for “thinking“, there is no definite answer as to whether machines can think. From a behaviouristic point of view, the machine acts as though it is thinking. From the psychological perspective thinking is characterized by looking at solutions to a problem, and choosing the best solution by mental evaluation – this is exactly what a chess-playing algorithm does. Shannon concludes with the notion that in the end, the machine does what it has been told to do. There is no freedom of thought. The machine makes decisions, but those decisions are guided by the design of the algorithm.

The reality is that modern AI is not much different – although it has the ability to “think”, our perception of that form of thinking may not be synonymous with human thinking. AI still relies on algorithms that tell the machine what to do, albeit algorithms that can make some limited decisions based on what they interpret in the data. In many respects AI algorithms are able to learn from their experiences, but thinking? That’s a whole different thing.

  1. Shannon, C.E., “A chess-playing machine“, Scientific American, 182(2), pp.48-51 (1950)

The conundrum of languages in a CS curriculum

One of the conundrums of CS education is choosing which languages to teach, and how many of them. Choices are usually made based on faculty interests or industrial use, with little weight given to actual pedagogical studies on what language actually works for teaching programming. CS departments are hesitant to teach too many languages, and there are some good reasons for this. There is often not enough time in the course of a degree to devote more than a few courses to programming. Languages can also become repetitive with respect to syntax. So courses on languages often exist in the form of comparative studies, grouping a dozen or more languages into a course which looks at different design aspects of languages.

In introductory programming courses, the language is used as a tool to teach the concepts of programming, e.g. variables, decisions, repetitions, modular code etc. Of course many of these courses end up being courses about the language syntax. This often happens with languages that pose difficulties for the novice programmer (here’s looking at you, C). After this there may be one other language introduced, normally in the context of object-oriented programming, e.g. C++ or Java – these too often devolve into courses about the language. Naturally many of these languages have syntax which is very similar, especially those derived from C (which in reality covers most widely used languages).

There are many issues with the use of languages in a curriculum. While industrial use is a good characteristic of a language, it is a poor criterion for choosing an introductory language, because the onus there should be on teaching programming concepts, not language syntax. This is what makes the likes of C and Java poor choices as introductory languages. There is nothing wrong with choosing a language which is easy for the novice programmer to understand, yet covers all the basic programming constructs. It doesn’t have to be industrially relevant, and any choice is good as long as it is not an academic language (i.e. one restricted to pure academic use). Fortran is a good choice, as is Python, or even Pascal. None have the trip-ups that are found in C.

Once students grasp the concepts of programming, then a second language can be introduced – be it C, Java, or even Swift. The first and second languages should be fundamentally different. Fortran followed by C allows the student to compare and contrast language structures and syntax. Having a curriculum which focuses solely on C-based languages produces students who have a biased and narrow view of programming languages. Now while C is a lingua franca for many developers, it is not the only language in the world, and should not be treated as such. To modify a well known quote about a hammer:

“If the only programming language you know is C, you will start treating all your problems like they contain pointers.”

After the second language, students generally have enough programming knowledge to be able to understand and use any language. Beyond that, there is scope for specialist languages in certain courses, or language courses which explore different languages, their history, design and application. But there is arguably no reason to have more courses on the syntax of particular languages, unless there is some fundamental design construct which makes them interesting, e.g. functional languages.

I have heard students quite often make comments of the form “but we weren’t taught to program in X”. The reality is that a CS degree doesn’t exist to teach people to program in ten different languages. Once a student has a basic knowledge base, it is actually easy to learn new programming languages. Nobody says “I’m not going to learn Cobol because I’m a C programmer”. The reality is that most programmers are polyglots, and most of the languages they know are self-taught. Languages are not fundamentally different from one another really, each has its own idiosyncrasies, but that’s what makes them interesting.


The value of a liberal arts degree

People might think that because I work in computer science, I view liberal arts degrees as being frivolous – but nothing could be further from the truth. What I have learned from years of being in institutions of higher learning is that many STEM degrees just don’t provide a very broad view of the world. A degree in biology focuses on biology, but offers very little in the way of intellectual growth beyond the sciences. I know it’s a broad statement, but these sorts of degrees rely heavily on learning facts – they are methodical and scientific. They don’t delve very much into the interpretative sphere, not in the same way that a fine arts major interprets a painting, or an archaeologist interprets how people lived based on their dig findings. Now it’s okay to be solely focused on science, but it does tend to skew one’s perspective of the world. Arts isn’t a precise field, but life in its entirety is not precise.

Arts graduates often have one huge advantage over most STEM graduates – the ability to write – and by this I do not mean scientific writing, which often suffers from being monotone, dry and boring. I find it funny when potential PhD students say they are “a great writer” because they have written a couple of papers. Yawn. I have talked about the importance of good writing before, and frankly many scientists suffer from an inability to write well. That may stem from the fact that they write for journals which expect a certain level of dullness, or they don’t read anything outside their discipline. And if great writing is the sole outcome of a liberal arts degree, I would take it in a heartbeat. Too many scientists succumb to method-based writing, and I understand that for documenting experimental results it makes sense, it’s just not interesting.

An arts degree is intrinsically flexible, with students taking courses in many differing disciplines. STEM students typically take only STEM courses, with a lot of whining when they are required to take an arts or social science course. The history of engineering – why would that be relevant? To learn from one’s mistakes of course, or perhaps to reengineer some old method. This brings us to the other relevant part of an arts degree – research. While STEM is a research-heavy field, most people are only introduced to research in postgraduate degrees. In an arts degree, research is part of the entire experience. Writing a history paper requires someone to research primary and secondary material, and more often than not include an opinion or two. Few STEM students ever delve into the world of scientific articles (maybe because they are boring?). Few computer science students ever read a seminal paper from the 1960s written by a CS pioneer who could actually write. Fewer still understand historical context. This is even more interesting today with the broad spectrum of historic material available in digital form.

As for earnings, which is always where people say things like “engineers earn more”. Hold on. Rein it in. A recent (and very long) report from The Andrew W. Mellon Foundation [1] should reassure everyone that a liberal arts degree is a good investment. The authors conclude that while STEM graduates earn more at the beginning of their careers, the advantage decreases over time. In recent years some business leaders, including CEOs of tech companies, have suggested that liberal arts degrees hold real value. In a recent Wall Street Journal blog post, David Kalt (founder of reverb.com) said “A well-rounded liberal arts degree establishes a foundation of critical thinking. Critical thinkers can accomplish anything. Critical thinkers can master French, Ruby on Rails, Python or whatever future language comes their way. A critical thinker is a self-learning machine.”

This means that a liberal arts degree is a good choice, if that is what you want to do.

  1. Hill, C.B., Davidson Pisacreta, E., The Economic Benefits and Costs of a Liberal Arts Education, Research Report, The Andrew W. Mellon Foundation (2019)

Big-O and speed

Ultimately Big-O relates most to time complexity. The easiest way of visualizing algorithmic complexity is to relate Big-O to execution time on a machine capable of performing 1 million operations (or instructions) per second, i.e. one operation per microsecond (μsec). The table shows, for a series of values of n, the number of operations (NO) followed by the execution time (1 sec = 10⁶ μsec = 10³ msec). One million operations per second is circa mid-1970s vintage minicomputer, like the VAX-11.

O()            n = 10             n = 10²                n = 10⁴               n = 10⁶
O(1)           1; 1 μsec          1; 1 μsec              1; 1 μsec             1; 1 μsec
O(log₂ n)      3.32; 3 μsec       6.64; 7 μsec           13.3; 13 μsec         19.93; 20 μsec
O(n)           10; 10 μsec        10²; 100 μsec          10⁴; 10 msec          10⁶; 1 sec
O(n log₂ n)    33.2; 33 μsec      664; 664 μsec          133×10³; 133 msec     199×10⁵; 20 sec
O(n²)          10²; 100 μsec      10⁴; 10 msec           10⁸; 1.7 min          10¹²; 11.6 days
O(n³)          10³; 1 msec        10⁶; 1 sec             10¹²; 11.6 days       10¹⁸; 31,709 yrs
O(2ⁿ)          1024; 1 msec       10³⁰; 3.17×10¹⁷ yrs    10³⁰¹⁰; –             10³⁰¹⁰³⁰; –

Of course the faster the machine, the faster the algorithm. In this context, processing a million items with a quadratic algorithm, O(n²), takes over 11 days. For something like the Apple M1 chip, which may do 11 trillion calculations per second, this would take 0.09 seconds. For a computer like the HPE El Capitan supercomputer, which can perform 2 quintillion calculations per second, i.e. 2×10¹⁸ calculations, it would take 5×10⁻⁷ seconds, or 0.5 μsec. A cubic algorithm (something like naive matrix multiplication) would only take 0.5 seconds instead of 31,709 years on this machine.

So a quadratic algorithm that might have been horrible to run in 1977 for large datasets, is now quite presentable, but only because machines got faster. In fact to the human, the perceivable difference between 0.09 seconds and 0.5 μsec is negligible.
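
As a quick sanity check on that arithmetic, here is a throwaway Fortran snippet (the operation count and machine rates are simply the rough figures quoted above):

program bigo_speed
   implicit none
   real(kind=8) :: ops
   ops = (1.0d6)**2      ! a quadratic algorithm on one million items: ~10^12 operations
   write(*,'(a,f8.1,a)')   '10^6 op/s machine:     ', ops/1.0d6/86400.0d0, ' days'
   write(*,'(a,f8.2,a)')   '11x10^12 op/s machine: ', ops/11.0d12, ' sec'
   write(*,'(a,es10.2,a)') '2x10^18 op/s machine:  ', ops/2.0d18, ' sec'
end program bigo_speed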

Vacations of the mind (or the burden of data)

Have you ever had your mind just go blank? It may be something that happens due to age, or maybe it is an effect of the age of data. Every day we are bombarded with data from electronic sources around us. Forms, ads, email, spreadsheets, documents, TV, etc. Is there too much data in our lives? Should we have data-free days when we can just allow our minds to rest? It is hard to think of new things when you are constantly bombarded with things your mind must think about. People never used to have to deal with so much data. You could read a book, but you weren’t being bothered by other things.

Maybe not all of the ways we have progressed as humans are really progress. A car still gets us from point A to point B, but the mechanics of the car are much more complex. Is more complex better? Data has also become more complex, but is it better information? Hardly; there is just more of it. Nobody has really explored the impact of data on our lives. Like many other innovations, we assume there are no side effects. Unlike eating, which involves many differing senses, or building, which is a very tactile endeavour, data usually involves mostly visual interpretation, be it text or images/videos. Some people like to distinguish between text and images, and they are of course different. One can visualize in an image the same data that might take a page of text to describe, and the image can be absorbed at once, versus the text. Our minds are of course very adept at interpreting visual data – more than 80% of the information our brains process is visual.

We have almost come to the point of craving data, and some would question whether we aren’t somewhat addicted to it. For some people it’s YouTube, watching copious videos of many things, often at 1.5x or greater speed to get the data fix quicker. Sometimes we can’t even wait to watch a show, we just read the synopsis – why watch something when we already know what is going to happen (go on, you know you do it). Everyone has some form of data vice – for me it’s digitized books. Websites like archive.org offer a cornucopia of digitized books on just about any topic, from books sometimes hundreds of years old. I likely spend far too much time browsing material (arguably it’s easier than traipsing through dusty libraries). Too much data can lead to a feeling of being overwhelmed. Our brains start to jump from one thought to another to another, a mental multitasking of sorts which eventually takes a toll on concentration and focus.

How to fix this? Make at least one day a week data free. Turn off the machines – I don’t mean do nothing, but read a physical book instead, or draw something, or just sit under a tree or in a cafe and think (or don’t think). Progress is made not through the copious amount of new data produced but by the relevance and tangibility of what is produced. Or maybe it is that some humans care too much about money and not much about anything else. To reduce our dependency on data we should maybe endeavour to go more old school, off-data, similar to off grid perhaps. I’m not saying remove the internet, but perhaps wean ourselves off binging on information feeds. We all do it, but is it productive for our lives? Maybe we should have weeks where we just refrain from digital binging? Vacations of the mind of sorts. Emptiness is not necessarily a bad thing. Maybe this could be Nordic thinking? It is as simple as walking through a forest, or on a beach.

Big-O – Some examples

Big-O : Bubblesort

As one example to illustrate Big-O, let’s consider the humble Bubblesort. Here is the algorithm for a Bubblesort:

  1. N integers are in an array A(N).
  2. Set the counter L to N-1.
  3. Find the largest among the first L+1 integers and exchange it with the number in the (L+1)st element in the array.
  4. If the counter L has reached 1, the algorithm terminates. Otherwise reduce L by 1 and repeat steps (3) and (4).

Here is what the algorithm looks like implemented in Fortran:

  subroutine bubblesort(n,A)
    integer, intent(in) :: n
    integer, dimension(n), intent(inout) :: A
    integer :: L, j, temp

    ! after each outer pass the largest remaining value has
    ! "bubbled" to the end of the unsorted portion
    do L = 1,n-1
        do j = 1,n-L
            if (A(j) > A(j+1)) then
                ! swap the out-of-order pair
                temp = A(j)
                A(j) = A(j+1)
                A(j+1) = temp
            end if
        end do
    end do

  end subroutine bubblesort

The Bubblesort in its worst case makes n-1 comparisons, and the same number of swaps in the first pass; n-2 comparisons in the second pass, and so on, until it makes a single comparison in the (n-1)th pass. Therefore for an input of size n>1, the maximum number of swaps (K) is:

K = (n-1) + (n-2) + ... + 1 = n(n-1)/2

Since n(n-1)/2 ≈ n²/2, the worst case efficiency for a Bubblesort is proportional to n², so it is an O(n²) algorithm. This is why it is considered one of the slower sorting algorithms.

Big-O : Image mean filter

As an example of Big-O notation, let’s consider the task of applying a mean filter to a grayscale image. The image I is dx×dy pixels in size, and a mean filter calculates the mean of a square neighbourhood around each pixel image(i,j). This mean value becomes the corresponding pixel in the mean-filtered image. This sort of algorithm involves using nested loops to process each of the dx×dy pixels. If the “radius” of the square neighbourhood is w=2, then the filter spans -2..2 in each dimension, giving a 5×5 mean filter. Below is a snippet of C code to implement this.

/* mean filter with neighbourhood "radius" w=2, i.e. a 5x5 window;
   pixels within w of the image edge are left unprocessed */
for (i=w; i<dx-w; i=i+1) {
   for (j=w; j<dy-w; j=j+1) {
      sum = 0;
      /* accumulate the 5x5 neighbourhood centred on (i,j) */
      for (x=-w; x<=w; x=x+1)
         for (y=-w; y<=w; y=y+1)
            sum = sum + image[i+x][j+y];
      imageN[i][j] = (int)(sum/25.0);   /* 25 = (2w+1)^2 pixels */
   }
}

The C code consists of four nested for loops. The outer two loops process each pixel in the image (except the w=2 rows and columns of boundary pixels): the first loop processes dx-4 rows, while the second processes dy-4 columns. The inner two loops calculate the mean of the 5×5 neighbourhood. So for every pixel in the image there are 25 operations to sum the neighbourhood, and one operation to calculate the mean, giving roughly (dx⋅dy)⋅(25+1) operations in all. If n=dx⋅dy, this becomes O(26n), or really O(n).

When this algorithm was applied to an image 2144×6640 pixels in size, the C implementation ran in 0.8 seconds. The problem is that it is hard to understand what O(n) really means in relation to the timings. The same algorithm implemented in Fortran runs in 0.49 seconds (see below), and has the same O(n) as the C code (or more exactly (dx⋅dy)⋅3 = O(3n) = O(n)).

do i = w+1,dx-w
   do j = w+1,dy-w
      ! the array slice is the 5x5 neighbourhood; sum() is an intrinsic
      blk = image(i-w:i+w,j-w:j+w)
      sumB = sum(blk)
      imageN(i,j) = int(sumB/25.0)
   end do
end do

It is possible that the reduced runtime is due to the fact that the inner nested loop in the C implementation has been replaced with an array slice, and use of a built-in function sum() in the Fortran version.
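
Timings like these can be gathered with the standard cpu_time intrinsic; a minimal sketch wrapped around the Fortran loops above might look like:

real :: t0, t1
call cpu_time(t0)
! ... run the mean filter loops from above ...
call cpu_time(t1)
write(*,'(a,f6.2,a)') 'mean filter took ', t1-t0, ' seconds'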

Big O notation – What is it good for?

Big O notation is something that I could never really understand as a novice programmer. It is basically a measure of algorithm complexity, measuring an algorithm’s behaviour (runtime) in terms of how quickly it grows relative to the input. It effectively helps differentiate more efficient algorithms from less efficient ones.

What is Big O?

Now normally the runtime of an algorithm could be determined by timing it (well, sort of – it is never 100% accurate because systems run so many different processes). The problem is that machines have different types and numbers of processors, differing amounts of memory, different operating systems, etc. So Big O notation is a means of describing an algorithm’s runtime independent of the machine. The O (a capital letter O, not a zero) is sometimes called Bachmann–Landau notation, after the German mathematicians Paul Bachmann and Edmund Landau, who invented the notation. The letter O was used because the growth rate of a function is also called its order.

Big O notation says that, given the size of the input, denoted by n, the runtime of the algorithm grows “on the order of” the size of the input. The mathematical notation used is the order of magnitude, O(f(n)), where f() is some function. Now in certain circumstances that matters. If I am sorting large data sets of something, then I would prefer to do that as efficiently as possible. Let’s face it, sorting is where Big O normally appears, usually in some course on algorithms. Here are the most common functions.

O(1)

The simplest algorithms are characterized by O(1), or “order of a constant“. This means that the algorithm will perform in the same time, regardless of the size of the input.

subroutine big0_constant(a)
   integer, dimension(:), intent(inout) :: a
   ! a single assignment - the same amount of work regardless of size(a)
   a(1) = 0
end subroutine big0_constant

O(n)

Algorithms that are O(n) grow linearly with the size of the input. A good example of this is finding the largest element in an unordered set by simply examining each item in the set.

subroutine big0_linear(a)
   integer, dimension(:), intent(in) :: a
   integer :: i
   do i=1,size(a)
      write(*,'(i5)', advance='no') a(i)
   end do
end subroutine big0_linear

O(n²)

Algorithms that are O(n²) grow quadratically. As the input gets larger these algorithms become impractical, e.g. some sorting algorithms.

subroutine big0_quadratic(a)
   integer, dimension(:), intent(in) :: a
   integer :: i, j
   do i=1,size(a)
      do j=1,size(a)
         write(*,'(i5)', advance='no') a(i)*j
      end do
   end do
end subroutine big0_quadratic

O(log n)

An algorithm that does more work than an O(1) algorithm, but grows more slowly than a linear function, is logarithmic, or O(log n). Good examples are binary search and Euclid’s algorithm.
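
Here is a sketch of binary search in the same style as the routines above (the function name is mine, and it assumes the array a is sorted in ascending order):

integer function binsearch(a, key) result(pos)
   integer, dimension(:), intent(in) :: a   ! must be sorted in ascending order
   integer, intent(in) :: key
   integer :: lo, hi, mid
   pos = -1                      ! -1 means "not found"
   lo = 1
   hi = size(a)
   do while (lo <= hi)
      mid = (lo + hi) / 2        ! halve the search interval each pass
      if (a(mid) == key) then
         pos = mid
         return
      else if (a(mid) < key) then
         lo = mid + 1
      else
         hi = mid - 1
      end if
   end do
end function binsearch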

O(n log n)

Behaviour that is somewhere between linear and quadratic is O(n log n); the efficient comparison sorts, such as mergesort and heapsort, fall into this class.
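
A sketch of mergesort, purely illustrative: each level of recursion does about n operations of merging, and there are roughly log₂ n levels.

recursive subroutine mergesort(n, a)
   integer, intent(in) :: n
   integer, dimension(n), intent(inout) :: a
   integer, dimension(n) :: tmp
   integer :: mid, i, j, k

   if (n <= 1) return
   mid = n/2
   call mergesort(mid, a(1:mid))        ! sort the left half
   call mergesort(n-mid, a(mid+1:n))    ! sort the right half

   ! merge the two sorted halves into tmp
   i = 1;  j = mid+1;  k = 0
   do while (i <= mid .and. j <= n)
      k = k + 1
      if (a(i) <= a(j)) then
         tmp(k) = a(i);  i = i + 1
      else
         tmp(k) = a(j);  j = j + 1
      end if
   end do
   do while (i <= mid)
      k = k + 1;  tmp(k) = a(i);  i = i + 1
   end do
   do while (j <= n)
      k = k + 1;  tmp(k) = a(j);  j = j + 1
   end do
   a = tmp
end subroutine mergesort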

O(2ⁿ)

This is exponential time, O(2ⁿ), and is emblematic of costly algorithms. A good example is naively computing Fibonacci numbers recursively. It denotes an algorithm whose work doubles with each addition to the input data set. The growth of the curve is exponential, starting off shallow and then increasing meteorically.
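
A sketch of the naive recursive Fibonacci (1-indexed, so fib(1) = fib(2) = 1); every call above the base cases spawns two more calls, which is where the exponential growth comes from:

recursive integer function fib(n) result(f)
   integer, intent(in) :: n
   if (n <= 2) then
      f = 1                     ! base cases
   else
      f = fib(n-1) + fib(n-2)   ! two recursive calls per level
   end if
end function fib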

Drop the constants

Normally when describing an algorithm in terms of Big O, constants are ignored. For example, if an algorithm has two loops which each process n elements, the Big O would be O(2n), which reduces to O(n). Multiplying by two has no real effect on describing the growth of the algorithm. Similarly, if an algorithm had a single loop processing n elements followed by a nested loop processing n elements, the runtime would be O(n + n²), which is just O(n²). In most cases, as n increases, these add-ons become insignificant.
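
As a tiny illustration, the following routine makes two separate passes over the data – roughly 2n operations – but is still just O(n):

subroutine big0_two_passes(a)
   integer, dimension(:), intent(inout) :: a
   integer :: i, total
   total = 0
   do i = 1, size(a)                 ! first pass: ~n operations
      total = total + a(i)
   end do
   do i = 1, size(a)                 ! second pass: another ~n operations
      a(i) = a(i) - total/size(a)
   end do
end subroutine big0_two_passes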

Some examples

Here are some examples of input values and corresponding Big-O.

n      log n   n log n   n²       2ⁿ
1      0       0         1        2
2      1       2         4        4
4      2       8         16       16
8      3       24        64       256
16     4       64        256      65,536
32     5       160       1,024    4,294,967,296
64     6       384       4,096    very large
128    7       896       16,384   über large
256    8       2,048     65,536   just don’t ask

Of course in many cases the Big-O quoted is just a worst-case scenario. For example, Quicksort is O(n log n) on average, but O(n²) in the worst case.

Fig.: Big-O functions compared

Big-O and sorting

Big-O is most commonly used when comparing clusters of like algorithms, such as sorting algorithms.

Algorithm        O() best     O() average    O() worst
Quicksort        n log n      n log n        n²
Mergesort        n log n      n log n        n log n
Heapsort         n log n      n log n        n log n
Bubblesort       n            n²             n²
Insertion sort   n            n²             n²
Selection sort   n²           n²             n²
Shellsort        n log n      n(log n)²      n(log n)²

Postscript

To be honest, the problem with Big-O is that it is somewhat theoretical in nature. While I’m sure it is useful for determining an algorithm’s behaviour, some algorithms may be too complex to obtain a good estimate of Big-O. Or perhaps they use a series of functions for which there is no clear understanding of the behaviour.

Some software engineers in the real world may use Big O in the context of making software go fast, but in other cases accuracy outshines speed. There are of course many things that affect an algorithm’s speed apart from complexity – cache, use of GPU processing, use of concurrency, programming language, etc. Efficiency is more than just knowing Big O, but I’m sure it helps to have some understanding of it.

The overall reality is that I don’t know if I actually care that much about Big-O. I think it was more important when machines had limited resources and algorithm efficiency really mattered, and I’m sure it is still important for things that require extremely efficient algorithms today (e.g. things that go into space, which have limited resources). But today? It’s hard to compare many algorithms because they all run super fast.

Tail recursion – What’s so special?

Tail recursion can be used to make things more efficient. Consider the following recursive function which sums up all the integers less than or equal to the input value x:

int sum(int x)
{
   if (x == 1)
      return x;
   else
      return x + sum(x-1);
}

If this function were called as sum(5), this is what would be evaluated:

sum(5)
5 + sum(4)
5 + (4 + sum(3))
5 + (4 + (3 + sum(2)))
5 + (4 + (3 + (2 + sum(1))))
5 + (4 + (3 + (2 + 1)))
15

Note here that every recursive call has to complete before the sum can actually be calculated. Now consider a tail-recursive version of the same function:

int tailsum(int x, int runSum)
{
   if (x == 0)
      return runSum;
   else
      return tailsum(x-1,runSum+x);
}

Now consider the sequence of events that would occur if we called tailsum(5,0).

tailsum(5,0)
tailsum(4,5)
tailsum(3,5+4)
tailsum(2,9+3)
tailsum(1,12+2)
tailsum(0,14+1)
15

The sum is performed in a progressive manner. So technically, when the next recursive call is made, the current stack frame isn’t needed any longer (no pending work is stored in it), which opens the door to some optimization.

With no additional instructions to execute after the recursive call, there is no need to keep the caller’s activation record on the stack; the only thing left to do once the recursive call exits is to return to the previous function. Instead of pushing a new activation record for each call, a tail call can simply overwrite the current one, so stack space stays constant rather than growing with the depth of recursion. Once the recursion terminates, the function returns directly to the original caller.

Amicable numbers in Fortran

How can numbers be friendly? Two numbers are considered amicable based on a relationship between their proper divisors. A proper divisor of a natural number n is any natural number that divides n, excluding n itself. For example, 12 has the proper divisors 1, 2, 3, 4 and 6. Two numbers are amicable if the sum of the proper divisors of one number equals the second number, and the sum of the proper divisors of the second number equals the first. For example, consider the amicable number pair 220 and 284, already known to Pythagoras.

The proper divisors of 220 are: 1, 2, 4, 5, 10, 11, 20, 22, 44, 55, and 110. Their sum is 284.
The proper divisors of 284 are: 1, 2, 4, 71, and 142. Their sum is 220.

In the 9th century, Arabian mathematician Thābit ibn Qurra (Thabit ben Korrah) (826/836-901) gave the following formula for finding amicable numbers:

p = 3(2ⁿ) - 1
q = 3(2ⁿ⁻¹) - 1
r = 9(2²ⁿ⁻¹) - 1

If n is greater than 1, and p, q, and r are all primes, then 2ⁿpq and 2ⁿr constitute an amicable pair of numbers. For example, if n=4 then p=47, q=23, and r=1151, which are all primes, and the two amicable numbers are 17296 and 18416. Unfortunately this formula does not account for all amicable pairs. In 1636 Fermat rediscovered the rule, and two years later Descartes gave an equivalent rule. It was Euler in 1750 who added 59 pairs to the list. (A quick Fortran check of Thābit’s rule is sketched after the algorithm below.) A better algorithm is [1]:

  1. Start with a number, n.
  2. Find the sum of its factors, S, including 1, but not n.
  3. If S is more than n, find S1, the sum of the factors of S.
  4. If S1 = n, print the amicable pair: n, S.
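
Before the full program, here is a quick sketch that checks Thābit’s rule for n = 4 (the program and its little primality test are purely illustrative):

program thabit
   implicit none
   integer :: n, p, q, r
   n = 4
   p = 3*2**n - 1
   q = 3*2**(n-1) - 1
   r = 9*2**(2*n-1) - 1
   if (isprime(p) .and. isprime(q) .and. isprime(r)) then
      write(*,*) 'Amicable pair:', 2**n * p * q, 2**n * r   ! 17296 and 18416
   end if
contains
   logical function isprime(k)
      integer, intent(in) :: k
      integer :: i
      isprime = (k >= 2)
      do i = 2, int(sqrt(real(k)))
         if (mod(k,i) == 0) isprime = .false.
      end do
   end function isprime
end program thabit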

Here is a program in Fortran to calculate a series of amicable numbers:

program amicable
   implicit none
   integer :: n, s, s1

   write(*,*) 'Amicable numbers:'
   do n = 2,100000
      s = sumFactors(n)
      if (s > n) then
         s1 = sumFactors(s)
         if (s1 == n) then
            write(*,*) n, s
         end if
      end if
   end do

contains

   integer function sumFactors(n)
      integer, intent(in) :: n
      integer :: i, s, total
      total = 1
      s = int(sqrt(real(n)))
      ! add each divisor i <= sqrt(n) together with its partner n/i
      do i = 2, s
         if (mod(n, i) == 0) total = total + i + n/i
      end do
      ! if n is a perfect square, sqrt(n) was added twice above
      if (s*s == n) then
         total = total - s
      end if
      sumFactors = total
   end function sumFactors

end program amicable

Here is a run of the program:

 Amicable numbers:
         220         284
        1184        1210
        2620        2924
        5020        5564
        6232        6368
       10744       10856
       12285       14595
       17296       18416
       63020       76084
       66928       66992
       67095       71145
       69615       87633
       79750       88730

  1. Spencer, D.D., Computers in Number Theory, Computer Science Press (1982)