Wednesday, May 19, 2010

Parallel Programming in Haskell

A Parallel Programming Tutorial


What is parallel programming?
 
        Parallel programming is a technique by which computations can be executed simultaneously on more than one processor to save time. These processors could be working together on the same machine, or could be distributed across a network. Parallel programming will never allow asymptotic gains in efficiency, a quadratic algorithm will still be quadratic, but you can improve computation time by a constant factor. When you have a combination of a computationally difficult task, a highly parallel machine to work on, and a parallel approach to solving the problem, the gain in efficiency from parallelism can be enormous. The gain is still limited, however: adding more processors yields diminishing returns as the improvement in computational efficiency begins to become asymptotic. Parallel programming also generally comes with a non-trivial cost in extra complexity.


If it's so great, why am I not using parallel programming already?

        Parallel programming does incur a cost, and most people's machines are still single-processor machines which cannot directly benefit from parallel programming. This is changing rapidly, but most machines still do not offer enough parallelism to make parallel programming truly compelling. In addition to these hardware obstacles, programmers have not generally had access to good tools for parallel programming. Fortunately, Haskell with the Glasgow Haskell Compiler makes this easy. The arguments for learning parallel programming in Haskell can be listed as follows:
  • It is easy to learn to write parallel algorithms in the Haskell Programming Language
  • The Haskell Programming Language makes parallel annotations cheap
  • Hardware manufacturers have been improving parallelism instead of clock speed
  • Distributed and highly parallel computers have long been available in academic and industrial settings


Operational semantics: seq and friends

Take a look at the following type signatures:
const :: b -> a -> b
seq   :: a -> b -> b
pseq  :: a -> b -> b
par   :: a -> b -> b
Note: These type signatures may look slightly different from what shows up in GHCi. a and b are used to illustrate the similarity of these functions.
        What's going on here? From the type signatures, you should be suspecting that these functions don't actually do anything at all to their arguments. You would be absolutely right. Here is how we could implement them:
const a b = a
seq   a b = b
pseq  a b = b
par   a b = b
         None of these functions appear to do anything useful with these definitions. However,seqpseq, and par all give information about the how the code should be evaluated. This is important: parallelism doesn't say anything about what is being evaluated, just about how it is being evaluated. To maintain this behavioral difference as to how things are evaluated, seq, pseq, and par cannot be defined like this: they each have different operational semantics from each other, even though they have the same denotational semantics. Parallelism is operational semantics, not denotational semantics, because (reiterating our earlier observation) computing a value in parallel does not change what the result of the computation is, only how it is arrived at. Let's look at the operational semantics of these functions one by one:
  • const does not have any special operational semantics. It is evaluated normally.
  • seq ensures that if the right side is evaluated, the left side is also evaluated.
  • pseq is like seq, but additionally ensures that the left side is evaluated before the right side.
  • par is like seq, but allows the right side to be evaluated while the left side is still evaluating.
Hint: An easy way to understand difference between seq and pseq is that note that the expression "b `seq` a `seq` b" has the same operational semantics as the expression "a `seq` b", while the same property does not hold for pseq.
        Note that, because the first arguments to seq, pseq, and par are ignored, they can be of any type, including the unit type (). However, because this argument will be evaluated, it cannot be undefined (try this in GHCi! Note that you may hear undefined referred to as bottom and see it written as ). par is particularly interesting because it does not guarantee that the evaluation will actually happen in parallel, it only hints that it could happen in parallel, only guaranteeing that both will be evaluated, like seq. The fact that you aren't guaranteed a parallel evaluation is a good thing, it means you can safely use par wherever it seems you may benefit without needing to worry about the overhead of real parallelism. pseq is important because if the left hand operand is your complicated parallel evaluation strategy, you will want that to complete all its work before you start evaluating the right operand (remember, seq does not guarantee this).
        You now know enough to start writing parallel algorithms in Haskell. There are easier ways to do this than to explicitly use par and pseq, and we will be covering those later when we cover parallel strategies (recall that pseq can use a left hand operand of type (), this will be important later). However, for the purposes of example, let's implement a simple parallel algorithm right now:

pmap :: (a -> b) -> [a] -> [b]
pmap _ [] = []
pmap f (x:xs) = hd `par` tl `pseq` hd:tl
    where
       hd = f x
       tl = pmap f xs
Note: It is not enough to simply annotate your code to get your program to actually run in parallel! You will first need to pass the Glasgow Haskell Compiler the "-threaded" option on the command line when compiling your program. You will also need a multi-core machine to observe any parallelism.
         This is a strict map over a list (don't try this on an infinite list!). hd and tl are calculated before pmap returns, and they are allowed to be calculated in parallel. No parallelism is guaranteed for pmap, but you have told the compiler that you think it would be a good idea to evaluate hd and tl in parallel.

If you have any doubts, this would be a good point to stop and read before continuing.

Monday, May 10, 2010

Who I am, and why I am starting this blog

Hello reader,

My name is Barend Venter. As of writing this, I am an undergraduate studying Computer Science at Central Washington University at Ellensburg WA. I am writing this blog to document my experiences using functional programming on Windows, so that others may benefit from my solutions or contribute their own. The second purpose of this blog is to advocate functional programming.

My languages of choice at this point are F# and Haskell. I have written many applications in those languages, most of them trivial but some a bit more ambitious. The most substantial of these was a full implementation of the Rabin public key cipher for key exchange and a key generator to use with it. As time permits, I may also write on languages that have piqued my interest but in which I have not yet coded, such as Erlang or Scala.

As I pointed out, I intend to advocate functional programming. Functional programming needs advocates: many programmers are barely aware it exists, and do not understand how functional programming works or why people use it. People who advocate the use of functional languages and explain how to tackle difficult problems not covered in beginners' textbooks could do a lot to raise awareness of the discipline. If people are aware of the motivation behind functional programming, they can provide better support for functional styles of programming when writing libraries which functional programmers will want to use. People who are aware of the functional programming style may also find that the functional programming style simplifies some difficult programming problems and makes them more approachable.

I am still speaking in very vague terms here. In the coming months, I will be covering more specific examples of how I address my personal projects and class assignments in a functional way, and presenting my experiences with the tools and libraries available with for functional programmers (probably just Haskell, but I may touch on Visual Studio tools that pique my interest).

My intended audience are programmers who have already perused free online resources like Real World Haskell, Learn You a Haskell for Great Good, and A Gentle Introduction to Haskell. I will however try to write such that anyone clever and determined enough should be able to understand what I am writing about.

Here are some links to the resources mentioned above:
I would recommend you glance over these briefly if you are unfamiliar with the Haskell programming language.

If you want to learn F#, I'd recommend buying Expert F# by Don Syme. The F# language has been changing at a fast pace and a lot of previously "built-in" features have been removed from the language and placed in libraries. You will want to pay attention to the examples in that book, as some of them may no longer work with the current version of F#.

My plan is to update at a minimum once every 2 months. My first planned post will be on parallel programming in Haskell, but I may deviate if I find something else more interesting to write about. I look forward to writing to you.

I am also a novice functional programmer, so I welcome any criticism and will gladly edit my posts accordingly to make them more instructive. So please read my blog postings critically, for your sake and mine, so we can learn from each other - or at least so that I can learn from you.