tobold.org

correct • elegant • free

△ Why Haskell Is My Favourite Programming Language △

◅ Idioms

Appendix: When Type Inference Fails

In Types, I said "you almost never need to declare a type". Since this article is written for Persons of a Skeptical Disposition, I thought the weasel word "almost" should be clarified.

Optional Type Declaration

The Haskell compiler doesn't mind if you sprinkle your program with type signatures - it will ensure that they are accurate. There are quite a few occasions when it can be beneficial to declare a type explicitly:

  1. it provides documentation;
  2. it can help your code run faster;
  3. it can elicit more useful type errors from the compiler.

Let's look at some examples of those.

1. Documentation

A lot of my Haskell code involves books. A feature of my bookselling website is the "Related Books" widget. This is automatically constructed from the database, essentially by building a matrix that holds a "score", representing how closely related each book is to every other book. From that matrix, we can then construct for each book a list of the most closely related books. Since most books are not related at all, the matrix is sparse, so it is implemented using a nested Map -- an efficient, hash-based data structure.

The scores in the matrix are built up by examining relatedness in various different ways. For example, when 2 books are by the same author, that scores 9 points. If the titles of 2 books share a keyword, that scores 1 point. (Amusing aside: an important part of keyword relatedness is getting the list of "stop words" right. I started with the obvious ones: "a", "an", "the", "of", "from"; then iterated by examining the results. During that process I ended up adding "about", "book", and "illustrated" to the list, which means that An Illustrated Book About Birds in the song Plateau has just one useful keyword: "birds". The song was written by Curt Kirkwood of the Meat Puppets, although I only know it from Nirvana's Unplugged rendition.)

The crucial part of building the matrix is this function:

updateScoreMap :: ScoreMap -> Scorer -> [Book] -> ScoreMap

I added the explicit type signature, partly because ghc -W warns about top-level bindings without type signatures, but mainly because it virtually describes what the function does. It takes an existing ScoreMap, a Scorer (whatever that is), and a list of Books, and it returns another ScoreMap. Taking the type signature together with the name, it is (surely) reasonably obvious that it is updating a ScoreMap by Scoring each Book in the list.

For the curious, here's the complete definition of the function, although I'll grant that this is fairly impenetrable:

updateScoreMap :: ScoreMap -> Scorer -> [Book] -> ScoreMap
updateScoreMap scoreMap (Scorer fn targetMap score) = foldl' upd scoreMap
  where
    upd mp bk = foldl' (updOut $ bookBook bk) mp (fn bk)
    updOut key mp rel =
      let relBks = filter (/= key) $ fromMaybe [] $ M.lookup rel targetMap
      in foldl' (updOut1 key) mp relBks
    updOut1 key mp bk = M.alter (updIn bk) key mp
    updIn bk mbMp =
      let mp = fromMaybe M.empty mbMp
      in Just $ M.insertWith' (+) bk score mp

2. Performance

Now, when I was developing updateScoreMap, I didn't start with the type signature (which was arguably wrong), nor did I have the Scorer type, which combines the scoring function, a TargetMap (which maps from the result of the scoring function back to list of books which produce that result) and the Score itself:

data Scorer = Scorer ScoreFn TargetMap Score

These three related items were originally separate arguments to updateScoreMap, with inferred types. Since (what is now called) Score consists of a sum of literal integers, the compiler inferred a type of Integer -- an arbitrary precision integer, what Lispers would call a "bigint".

Word8: updateScoreMap = 324372584 bytes Integer: updateScoreMap =

3. Debugging

(to be written)

Mandatory Type Declaraton

Very rarely is it necessary to declare a type. Just to see what that looks like, consider doubler0.hs; it reads a number from standard input, and doubles it.

main = do
  ln <- getLine
  let val = read ln
  print (val * 2.0)

No type problems here. But suppose we had a different sort of doubling in mind: doubler1.hs is supposed to read a number from standard input, and print it twice.

-- Wrong!
main = do
  ln <- getLine
  let val = read ln
  print val
  print val

But the snag is that there's nothing here to say that val is a number: the standard functions read and print are defined for just about every type. In the previous example, the appearance of the literal 2.0 gave the type inference engine enough information to deduce a type for val, but now it can't:

doubler1.hs:5:3:
    Ambiguous type variable `a0' in the constraints:
      (Show a0) arising from a use of `print' at doubler1.hs:5:3-7
      (Read a0) arising from a use of `read' at doubler1.hs:3:13-16
    Probable fix: add a type signature that fixes these type variable(s)
    In the expression: print val

And, to be fair, the program is pretty nonsensical. If we just want to echo the input twice, we don't need to parse it at all, as in doubler2.hs.

main = do
  ln <- getLine
  print ln
  print ln

Or, if we do anything else with val, that will be enough to give it a type, as in doubler3.hs. (Note that the entire program undergoes type inference, even though lazy evaluation means that val2 will never actually be calculated.)

main = do
  ln <- getLine
  let val = read ln
      val2 = val * 2.0
  print val
  print val

Or, finally, we can simply patch it up by declaring an explicit type, as in doubler4.hs.

main = do
  ln <- getLine
  let val = read ln :: Double
  print val
  print val

△ Why Haskell Is My Favourite Programming Language △

◅ Idioms