tobold.org

correct • elegant • free

△ Why Haskell Is My Favourite Programming Language △

◅ Lazy

Appendix: When Type Inference Fails ▻

Idioms

The optimistically-named A Gentle Introduction to Haskell includes "this concise definition of everybody's favorite sorting algorithm".

quicksort  []           =  []
quicksort (x:xs)        =  quicksort [y | y <- xs, y<x ]
                        ++ [x]
                        ++ quicksort [y | y <- xs, y>=x]

The same idea can be expressed in many different styles, here we explore some of them.

Our starting point is impressively brief. It's also opaquely idiomatic. Let's start by removing some of those idioms. Actually, this example demonstrates how easy it is to overuse the much-loved (x:xs) idiom. I think it should be reserved for the case of treating each quasi-anonymous element of a list identically (not unlike some uses of $_ in Perl); the expectation is that x will walk the list while xs dwindles.

But that's not the case here: and x at least plays such an important role in the algorithm that it must be worth naming it. I'd also lay it out a bit differently. So here's quicksort0.hs.

quicksort0 [] = []
quicksort0 (pivot:rest) =  quicksort0 [ x | x <- rest, x < pivot ]
                            ++ [ pivot ]
                            ++ quicksort0 [ x | x <- rest, x >= pivot ]

Personally I think this is already a huge improvement. The algorithm is much clearer (and also the fatal flaw in this version of it, but more on that later).

The next idiom to squash is list comprehensions. They're neat, but I have trouble remembering the syntax, so I rarely use them. In any case, it's nice to give names to some other important parts of the algorithm: quicksort1.hs.

quicksort1 [] = []
quicksort1 (pivot:rest) =
  let below = filter (< pivot) rest
      above = filter (>= pivot) rest
  in quicksort1 below ++ [pivot] ++ quicksort1 above

I suppose that introduces the idiom of sections, but I find them quite intuitive. The section (< pivot) is the predicate "less than pivot": in other words, it is a Boolean valued function of one argument. The section syntax works for any binary operator, so (+ 1) is the function that adds one to something, (- 2) the function that subtracts two from something, and (2 -) is the function that subtracts something from two. Sections work really well in combination with map, and foldl.

Another idiom is pattern-matching on function arguments. This is another good and easy one, but we can do without it. A recovering Scheme programmer might write quicksort2.hs.

quicksort2 this =
  if null this then []
  else
    let pivot = head this
        rest = tail this
        below = filter (< pivot) rest
        above = filter (>= pivot) rest
    in quicksort2 below ++ [pivot] ++ quicksort2 above

Let's assume we've decided to keep pattern matching. Where else can we go? The repitition of filter ... rest suggests we might be missing a trick, and that trick is partition, lurking just over there in Data.List. That yields quicksort3.hs.

import Data.List (partition)

quicksort3 [] = []
quicksort3 (pivot:rest) =
  let (below, above) = partition (< pivot) rest
  in quicksort3 below ++ [pivot] ++ quicksort3 above

I think this one is very elegant. I like the fact that partition guarantees we've got the entire list, and we only use one comparison operator. In all the preceding examples, if you mistype >= as >, then you get a different function (one that sorts but also removes duplicates). So I claim that quicksort3 is a definite improvement.

You might also suspect that it's more efficient, as the two-filter (and equivalent list comprehension) versions appear to scan the list twice, comparing each element against pivot twice: once with < and once with >=. But you might be surprised: Haskell is better than you have any reasonable right to expect at making your programs blazingly fast, no matter how you write them. I suspect that the default instances of < and >= in the Ord typeclass (which both reduce to compare), together with graph reduction, mean that there is no difference in efficiency between the two-filter versus the one-partition versions. (I will check this in reality, and report back).

Unfortunately, so far, all these definitions of quick sort are for pedagogical purposes only, as using the first element of the list for the pivot is very bad. If the list is already sorted, below is always empty, and we have a O(n2) selection sort. The usual way to avoid this is to select the median of the first, last, and middle element of the list. Messing around with these ideas, I came up with quicksort4.hs.

import Debug.Trace (trace)

quicksort4 this =
  case l of
    0 -> []
    1 -> this
    2 -> let (a:b:[]) = this
          in if b > a then this else b : [a]
    _ ->
      let pivot = findPivot
          (below, equal, above) = partitionCompare pivot
      in quicksort4 below ++ equal ++ quicksort4 above
    where
      l = length this; m = l `div` 2
      findPivot = median3 (this !! 0) (this !! m) (this !! (l - 1))
      partitionCompare p = p3 ([], [], []) p this
      p3 r _ [] = {- trace (show r) -} r
      p3 (b, a, e) p (x:xs) =
        case compare x p of
          LT -> p3 (b ++ [x], a, e) p xs
          EQ -> p3 (b, a ++ [x], e) p xs
          GT -> p3 (b, a, e ++ [x]) p xs

median3 a b c =
  case (compare a b, compare b c, compare a c) of
    (LT, LT,  _) -> b
    (LT, GT, LT) -> c
    (LT, GT, GT) -> a
    (GT, LT, LT) -> a
    (GT, LT, GT) -> c
    (GT, GT,  _) -> b
    (EQ,  _,  _) -> a
    (_,  EQ,  _) -> b
    (_,  _,  EQ) -> c

This is an unusual stable quicksort by virtue of the custom partition function, inspired by Haskell's Ordering type, that also extracts elements equal to the pivot. (I wonder if I've hit on something significant here, but it's surely such an obvious optimization that somebody must have thought of it before.)

So quicksort4 is, perhaps, a halfway-decent implementation of quick sort. The median-of-three cheaply improves performance in most cases, but it can be defeated. Authorities also concur that it is better to switch to a different sort algorithm when the list is short, which this version doesn't.

Of course, unless you're a language / library implementor, you really shouldn't be writing sort functions. There is a perfectly serviceable sort in the Prelude; it would be a poor Haskell implementation if this weren't at least as good as quicksort4.

If you really need blazing performance, you'll want to move from standard lists to a data structure with linear lookups, such as Data.Vector. There are a bunch of different sort algorithms available for Vectors, including David R Musser's introsort, which is an optimized quicksort (as used in the C++ standard template library). It's a slight pain to use, as the result comes in a monad (your choice of IO or ST); introsort.hs shows the idea.

import Data.Vector (freeze, fromList, thaw, toList, unsafeFreeze, unsafeThaw)
import Data.Vector.Algorithms.Intro (sort)

introsort l = do
  v <- thaw $ fromList l
  sort v
  r <- freeze v
  return $ toList r

introsort' l = (thaw . fromList) l >>= \v -> sort v >>
                freeze v >>= return . toList

introsort'' l = do { v <- thaw $ fromList l; sort v;
  r <- freeze v; return $ toList r }

introsort3 l = do
  v <- unsafeThaw $ fromList l
  sort v
  r <- unsafeFreeze v
  return $ toList r

main = do
  x' <- introsort x
  print x'

x = [3,1,4,1,5,9,2,6,5,3,5,8] :: [Int]

△ Why Haskell Is My Favourite Programming Language △

◅ Lazy

Appendix: When Type Inference Fails ▻