One really nice thing about Julia is its ability to do parallelism intrinsically, i.e. with no packages needed. For someone wishing to learn a bit of parallel coding, it’s an ideal environment. Julia provides a multiprocessing environment based on message passing, allowing programs to run on multiple processors in shared or distributed memory.
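By default, though, Julia starts with a single process, so before any parallel construct can help, worker processes have to be added, either at startup with the -p flag (e.g. julia -p 4) or from the REPL. A minimal sketch, using the same pre-0.7 Julia syntax as the rest of this post:

addprocs(4)         # spawn 4 additional local worker processes
println(nprocs())   # total process count: 1 master + 4 workers = 5
println(nworkers()) # number of processes that do the actual loop work

All the timings later in this post depend on how many workers are available, so keep that in mind when comparing numbers.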
One of the easiest methods is parallelizing loops. Note that although these loops *look* like normal loops, they behave completely differently. The most interesting part is that iterations do not happen in any specific order, and since the iterations run on different processes, writes to variables or arrays are not visible from a global perspective. Some of these limitations can be overcome with shared arrays (a sketch follows below). A parallel loop can use any number of processors. Here is a simple case, where we are simulating the number of times a coin toss ends in a tail:
ntails = @parallel (+) for i = 1:2000000
    rand(Bool)
end
This specifies a parallel loop whose iterations are assigned to multiple processors, and whose results are combined using the specified reduction operator, in this case (+). The reduction operator can also be omitted if the logic does not need one. This is of course a trivial example, but it shows how the parallel loop can be used.
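When the reducer is omitted, the loop simply scatters its iterations across the workers; in the pre-0.7 Julia used here it then returns without waiting, so @sync is needed to block until completion. Pairing this with the shared arrays mentioned above makes the writes visible again, since a SharedArray is backed by memory all local workers can see. A hedged sketch, using the old SharedArray(T, dims) constructor syntax:

# Shared memory, visible to all local worker processes
squares = SharedArray(Float64, 10)

# No reduction operator; @sync blocks until every iteration has run
@sync @parallel for i = 1:10
    squares[i] = i^2   # writes land in shared memory, so they persist
end

println(squares)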
But parallel does not always mean faster, and here Julia nicely provides the @time macro. This macro executes an expression and prints the time it took to execute, the number of allocations, and the total number of bytes the execution caused to be allocated, before returning the value of the expression. Parallel loops can also be resource hungry, i.e. there is no free lunch. For example, let’s look at a simple function which sums the value 1, n times. Here is the serial version:
function serialSum(n)
    sum = 0
    for i = 1:n
        sum = sum + 1
    end
    return sum
end
Nothing inherently challenging about that piece of code. Here is the code parallelized by using a parallel loop:
function parallelSum(n)
    sum = @parallel (+) for i = 1:n
        1
    end
    return sum
end
The variable sum is assigned the result of all the partial sums: @parallel marks the loop as a parallel loop, and the (+) means the results of the parallel runs are summed.
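The timings below come from wrapping the calls in @time, along these lines (a sketch; the exact numbers will of course vary with machine and worker count):

n = 1000000
serialSum(n)     # warm-up calls, so compilation time is not measured
parallelSum(n)
@time serialSum(n)
@time parallelSum(n)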
What happens when we run the code with n = 1,000,000?
parallel 0.288343 seconds (231.73 k allocations: 10.216 MB)
serial 0.003774 seconds (1.37 k allocations: 64.265 KB)
The parallel algorithm comes in at 0.288 seconds, the serial one at 0.004 seconds, so in this case the use of a parallel loop makes no sense. Apart from time, it also performs far more allocations and allocates an enormous number of bytes more. Now it’s also possible to use a parallel nested loop. Here is the same concept extended to the realm of nested loops:
function serialSum(n)
    sum = 0
    for i = 1:n, j = 1:n
        sum = sum + 1
    end
    return sum
end
And here we have parallelized both loops in the nested loop:
function parallelSum(n)
    sum = @parallel (+) for i = 1:n
        @parallel (+) for j = 1:n
            1
        end
    end
    return sum
end
What happens when we run these, again with n = 1,000,000?
parallel 12.503481 seconds (47.27 M allocations: 2.440 GB, 2.52% gc time)
serial 0.012687 seconds (2.58 k allocations: 116.727 KB)
Not exactly inspiring. Every outer iteration now launches its own inner @parallel loop, so the scheduling and communication overhead multiplies, which is where the 47 million allocations come from. Notice also that the parallel version now has something called gc time associated with it, a whole 2.52%. What is this? Julia is a language which performs garbage collection: it keeps track of current allocations and frees the memory once it isn’t needed anymore. It doesn’t do this continuously, though; collection runs in periodic pauses, and @time reports the fraction of the run spent in them.
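None of this means @parallel is useless; it just means each iteration has to do enough work to outweigh the cost of shipping it to a worker. A hedged sketch of the kind of loop where the parallel version can start to win, with an inner loop standing in for some genuinely expensive per-item computation:

function parallelHeavy(n)
    # Each iteration is now costly, so distributing it can pay off
    return @parallel (+) for i = 1:n
        s = 0.0
        for j = 1:10000   # stand-in for expensive per-item work
            s += sin(j) * cos(j)
        end
        s
    end
end

With enough workers, a loop like this can actually beat its serial counterpart, because the per-iteration work dwarfs the parallel overhead.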
Can we use this to make image processing faster? Check the next blog post.