Wednesday, May 19, 2010

Parallel Programming in Haskell

A Parallel Programming Tutorial


What is parallel programming?
 
        Parallel programming is a technique by which computations can be executed simultaneously on more than one processor to save time. These processors could be working together on the same machine, or could be distributed across a network. Parallel programming will never yield asymptotic gains in efficiency: a quadratic algorithm will still be quadratic, but you can improve computation time by a constant factor, bounded roughly by the number of processors. When you have a combination of a computationally difficult task, a highly parallel machine to work on, and a parallel approach to solving the problem, the gain in efficiency from parallelism can be enormous. The gain is still limited, however: adding more processors yields diminishing returns, as the sequential parts of the computation come to dominate the running time. Parallel programming also generally comes with a non-trivial cost in extra complexity.


If it's so great, why am I not using parallel programming already?

        Parallel programming does incur a cost, and most people's machines are still single-processor machines which cannot directly benefit from parallel programming. This is changing rapidly, but most machines still do not offer enough parallelism to make parallel programming truly compelling. In addition to these hardware obstacles, programmers have not generally had access to good tools for parallel programming. Fortunately, Haskell with the Glasgow Haskell Compiler makes this easy. The arguments for learning parallel programming in Haskell can be listed as follows:
  • It is easy to learn to write parallel algorithms in the Haskell Programming Language
  • The Haskell Programming Language makes parallel annotations cheap
  • Hardware manufacturers have been improving parallelism instead of clock speed
  • Distributed and highly parallel computers have long been available in academic and industrial settings


Operational semantics: seq and friends

Take a look at the following type signatures:
const :: b -> a -> b
seq   :: a -> b -> b
pseq  :: a -> b -> b
par   :: a -> b -> b
Note: These type signatures may look slightly different from what shows up in GHCi. a and b are used to illustrate the similarity of these functions.
        What's going on here? From the type signatures, you should be suspecting that these functions don't actually do anything at all to their arguments. You would be absolutely right. Here is how we could implement them:
const a b = a
seq   a b = b
pseq  a b = b
par   a b = b
         None of these functions appear to do anything useful with these definitions. However, seq, pseq, and par all give information about how the code should be evaluated. This is important: parallelism doesn't say anything about what is being evaluated, just about how it is being evaluated. For this reason, seq, pseq, and par cannot actually be defined as above: each has different operational semantics from the others, even though they all share the same denotational semantics. Parallelism is a matter of operational semantics, not denotational semantics, because (reiterating our earlier observation) computing a value in parallel does not change what the result of the computation is, only how it is arrived at. Let's look at the operational semantics of these functions one by one:
  • const does not have any special operational semantics. It is evaluated normally.
  • seq ensures that if the right side is evaluated, the left side is also evaluated.
  • pseq is like seq, but additionally ensures that the left side is evaluated before the right side.
  • par is like seq, but allows the right side to be evaluated while the left side is still evaluating.
Hint: An easy way to understand the difference between seq and pseq is to note that the expression "b `seq` a `seq` b" has the same operational semantics as the expression "a `seq` b", while the same property does not hold for pseq.
        Note that, because the first arguments to seq, pseq, and par are ignored, they can be of any type, including the unit type (). However, because this argument will be evaluated, it cannot be undefined (try this in GHCi! Note that you may hear undefined referred to as bottom and see it written as ⊥). par is particularly interesting because it does not guarantee that the evaluation will actually happen in parallel: it only hints that the left side could usefully be evaluated in parallel, while guaranteeing, like seq, that both sides will be evaluated. The fact that you aren't guaranteed a parallel evaluation is a good thing: it means you can safely use par wherever it seems you may benefit, without needing to worry about the overhead of real parallelism when none is available. pseq is important because if the left hand operand is your complicated parallel evaluation strategy, you will want it to complete all its work before you start evaluating the right operand (remember, seq does not guarantee this).
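The behavior described above can be observed directly. The following sketch (assuming the parallel package is installed, which provides Control.Parallel) shows seq forcing an undefined left operand, while par and pseq both simply return their right operand:

```haskell
import Control.Exception (SomeException, evaluate, try)
import Control.Parallel (par, pseq)

main :: IO ()
main = do
  -- seq forces its left operand: evaluating this expression throws
  r <- try (evaluate (undefined `seq` "unreachable"))
         :: IO (Either SomeException String)
  case r of
    Left _  -> putStrLn "seq forced the undefined left operand"
    Right _ -> putStrLn "the left operand was not forced"
  -- par returns its right operand; the left side is merely sparked
  putStrLn (sum [1 .. 1000 :: Int] `par` "par returned its right operand")
  -- pseq forces its left operand before the right side is evaluated
  putStrLn (sum [1 .. 1000 :: Int] `pseq` "pseq forced its left operand first")
```

Note that without the "-threaded" option the sparks created by par never run in parallel, but the program behaves identically either way: only the operational semantics change, never the result.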
        You now know enough to start writing parallel algorithms in Haskell. There are easier ways to do this than to explicitly use par and pseq, and we will be covering those later when we cover parallel strategies (recall that pseq can use a left hand operand of type (), this will be important later). However, for the purposes of example, let's implement a simple parallel algorithm right now:

pmap :: (a -> b) -> [a] -> [b]
pmap _ [] = []
pmap f (x:xs) = hd `par` tl `pseq` hd:tl
    where
       hd = f x
       tl = pmap f xs
Note: It is not enough to simply annotate your code to get your program to actually run in parallel! You will first need to pass the Glasgow Haskell Compiler the "-threaded" option on the command line when compiling your program. You will also need a multi-core machine to observe any parallelism.
         This is a strict map over a list (don't try this on an infinite list!). tl is evaluated before pmap returns, and hd is allowed to be evaluated in parallel with it. No parallelism is guaranteed for pmap, but you have told the compiler that you think it would be a good idea to evaluate hd and tl in parallel.
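To see pmap doing real work, give each spark something expensive to evaluate. Here is a small sketch (slowFib is a deliberately slow hypothetical workload, not anything from a library); compile with "-threaded" and run with "+RTS -N" to give the sparks a chance to run in parallel:

```haskell
import Control.Parallel (par, pseq)

-- the pmap from above
pmap :: (a -> b) -> [a] -> [b]
pmap _ []     = []
pmap f (x:xs) = hd `par` tl `pseq` hd : tl
  where
    hd = f x
    tl = pmap f xs

-- a deliberately expensive function, so each spark has real work to do
slowFib :: Int -> Integer
slowFib n | n < 2     = fromIntegral n
          | otherwise = slowFib (n - 1) + slowFib (n - 2)

main :: IO ()
main = print (pmap slowFib [28, 28, 28, 28])
```

The result is the same with or without parallelism; only the wall-clock time should differ.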

If you have any doubts, this would be a good point to stop and read before continuing.

Parallel programming with Strategies


         So far we have only talked about parallel programming with the parallel annotation. Although these parallel annotations are already at a high level of abstraction, we still need to explicitly structure our annotations to get the evaluation order we want. Functions over commonly used data structures are often annotated for parallelism in the same ways; why not create special functions which have the sole purpose of evaluating the members of a data structure? These strategies live in Control.Parallel.Strategies and two type synonyms live in this module's source code:
  • Done, a synonym for ()
  • Strategy a, a synonym for a -> (), defined with the Done synonym above.
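Spelled out as code, the synonyms look like this (noWork is a hypothetical example of my own, not part of the module, included only to show the shape of a strategy):

```haskell
-- the two synonyms, as they appear in the module's source
type Done = ()

type Strategy a = a -> Done

-- a do-nothing strategy: denotationally, every Strategy looks like this
noWork :: Strategy a
noWork _ = ()
```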
        As you may already realize, a Strategy returns nothing and any Strategy can be written in the form f _ = () without any change of denotational semantics. A Strategy encodes operational semantics over an input type; you will generally define them in terms of the functions covered in the previous chapter. The Strategy type constructor takes a single type parameter: the type which it can evaluate using its evaluation strategy (its operational semantics). Control.Parallel.Strategies deserves an article to itself, but for now, let's take a look at some of the functions defined in Control.Parallel.Strategies, some of which will be needed to understand a new implementation of pmap:
  • using: applies its right hand argument, a Strategy a, to its left hand argument, an a, then returns the left hand argument. This guarantees that the left hand argument will be evaluated with the given strategy when using itself is evaluated.
  • demanding: evaluates its right hand argument, a Done (intended to be a fully applied strategy), then returns its left hand argument. You'd generally use it when the left hand argument can be computed more quickly after the fully applied parallel strategy on the right has been evaluated.
  • r0 is a strategy that does not do anything at all.
  • rwhnf is a strategy that evaluates its argument to weak head normal form
  • parList applies a strategy to every element of a list, sparking each application in parallel
  • parMap maps a function over a list, then applies a strategy to every element of the result using parList
Hint: You may be wondering how parMap works. Won't the mapped function be applied sequentially before there is a chance to use the parallel strategy parList? Recall that Haskell is a non-strict, or lazy, language, and the list type is also lazy. When "map f" is applied to the list, no evaluation is actually done; all that happens is that the list xs is replaced by a thunk for "map f xs".
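Before using parMap directly, it may help to see pmap written with using, which makes the two halves of the hint above explicit: build the lazy "map f xs" first, then evaluate it with a parallel strategy. (This sketch uses the Control.Parallel.Strategies API as it existed at the time of writing; in later versions of the parallel package, rwhnf is spelled rseq and Strategy is defined differently.)

```haskell
import Control.Parallel.Strategies (parList, rwhnf, using)

-- the lazy map builds the thunks; parList rwhnf sparks each element
pmap :: (a -> b) -> [a] -> [b]
pmap f xs = map f xs `using` parList rwhnf
```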
        We can now re-implement our earlier pmap function using some of these terms:
pmap = parMap rwhnf
        As you can see, this definition is much simpler and higher-level. It simply says that our parallel map evaluates each application of its function argument to weak head normal form, in parallel. More on Control.Parallel.Strategies will be coming. For now, I highly recommend you take a look at the source code, as you can learn a lot from reading it. Good luck, and have fun!


PS: Seems like "WYSIWYG" does not quite hold for Blogspot. Sorry about the formatting hiccups.

1 comment:

  1. For a more compelling reason as to why you'd want to do all of this 'par' and 'pseq' stuff, here's a link to "the next step" - Nested Data Parallelism (where you have lists of lists, but not all the same size - sparse matrices, anyone?) on YouTube - trust me, it's not a rick roll...maybe...

    http://www.youtube.com/watch?v=NWSZ4c9yqW8
