tobold.org

correct • elegant • free

△ How to do IO in Haskell △

◅ Error Handling

Reflections ▻

IO encumbrance can be cumbersome

We have said that IO actions return a type which is "IO encumbered". For example, readFile :: FilePath -> IO String, so readFile file is not a plain String, but an IO String. We can remove the IO encumbrance by binding a name inside a do expression, but then that name is only in scope within the do expression.

Consider a program where several different functions need the contents of a file. It would be nice to write a top-level function that returns the contents as a plain String. Then we could call that function whenever we needed it. Here's badlineschars.hs.

file = "/etc/passwd"

lineCount = length (lines contents)
charCount = length contents

{- This cannot be done in Haskell! -}
contents :: String
contents = do
  s <- readFile file
  s

main = do
  putStrLn (show lineCount ++ " lines")
  putStrLn (show charCount ++ " characters")

It can't be done. The type system ensures that any function which calls readFile (or any other IO action) will itself have an IO type.
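To make the types concrete, here is a minimal sketch of the nearest relative of contents that the compiler will accept. The type signatures and the name countChars are my additions for illustration, not part of the original program.

```haskell
file :: FilePath
file = "/etc/passwd"

-- The nearest legal relative of the rejected definition:
-- the IO stays in the type.
contents :: IO String
contents = readFile file

-- A pure function over the text; its type has no IO in it,
-- so it cannot read the file itself.
countChars :: String -> Int
countChars = length

main :: IO ()
main = do
  s <- contents   -- s :: String, but only inside this do expression
  putStrLn (show (countChars s) ++ " characters")
```

The plain String exists only as the name s inside main; anything outside the do expression that wants it must either receive it as an argument or acquire an IO type of its own.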

There are two options. The first option is to call readFile once at the top level, and pass the result down to each function that needs it, as in lineschars0.hs.

file = "/etc/passwd"

lineCount s = length (lines s)
charCount s = length s

main = do
  x <- readFile file
  putStrLn (show (lineCount x) ++ " lines")
  putStrLn (show (charCount x) ++ " characters")

Note that in this case, the functions lineCount and charCount avoid IO types. However, they have an extra argument.

The second option is to give the subsidiary functions IO types, as in lineschars1.hs.

file = "/etc/passwd"

lineCount = do
  s <- readFile file
  return (length (lines s))

charCount = do
  s <- readFile file
  return (length s)

main = do
  ls <- lineCount
  putStrLn (show ls ++ " lines")
  cs <- charCount
  putStrLn (show cs ++ " characters")

In this version, lineCount and charCount no longer take an argument (as in the original, broken attempt), but they now have IO types. Note also that the file is read twice.

Obviously, these are trivial examples, but you will encounter similar situations time and again. In general, the first style seems preferable (it confines "IO encumbrance" to the top level, and in any case does less work, since the file is read only once), although it can be a nuisance to pass all that state down to lower-level functions.
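One way to take some of the sting out of passing state down is to bundle everything read at the top level into a single record, so that each lower-level function takes one extra argument rather than several. The following is only a sketch: the record Env, its fields, and the function report are hypothetical names introduced for this illustration.

```haskell
-- Hypothetical record holding everything main has read.
data Env = Env
  { envText  :: String    -- contents of the file, read once in main
  , envLabel :: FilePath  -- where the text came from, for messages
  }

lineCount :: Env -> Int
lineCount env = length (lines (envText env))

charCount :: Env -> Int
charCount env = length (envText env)

-- A pure function some distance from main: it needs only the record.
report :: Env -> [String]
report env =
  [ envLabel env ++ ": " ++ show (lineCount env) ++ " lines"
  , envLabel env ++ ": " ++ show (charCount env) ++ " characters"
  ]

main :: IO ()
main = do
  let path = "/etc/passwd"
  text <- readFile path
  mapM_ putStrLn (report (Env text path))
```

Only main has an IO type here. If the program later needs a second file, it becomes a new field of the record and the signatures of the intermediate functions stay the same.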

Similar problems can occur with output. Consider sudan.hs.

sudan n x y | n == 0    = x + y
            | y == 0    = x
            | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])

The sudTup and sudList functions are there just to make sudan a "low-level" function, some distance from main. Suppose we want to create a variant of sudan that prints a trace of how it is called. Easily done, but the new sudan will have an IO type, and this will propagate through all the intermediate functions up to main. Here's sudantrace0.hs.

sudan n x y | n == 0    = trace (x + y)
            | y == 0    = trace x
            | otherwise = do
                trace 0
                sudan' <- sudan n x (y - 1)
                sudan (n - 1) sudan' (sudan' + y)
    where
      trace r = do
        putStrLn ("sudan " ++ show n ++ " " ++ show x ++ " " ++ show y)
        return r

sudTup (n, x, y) = do { p <- sudan n x y; return p }
sudList [n, x, y] = do { q <- sudTup (n, x, y); return q }

main = do
  r <- sudList [1, 5, 4]
  print r

Not only are the changes to the function sudan invasive, but also sudList, sudTup, and even main itself all have to change. This is, to put it mildly, a nuisance. Worse still, the output is not a trace of the original sudan function, since by rewriting it with an IO type we are explicitly specifying (some of) the evaluation order. As with the input example, though, there is no way around this.

At this point, you may well be tempted to go back to programming in <insert name of your favourite programming language before you discovered Haskell>. I will endeavour to offer some crumbs of comfort.

First, the restrictions on IO were not capriciously foisted upon us by ivory tower academics in order to keep Haskell pure. Restrictions of this kind are what make it possible to do IO safely in a lazy language, where the order of evaluation is otherwise not under the programmer's control.

Secondly, if you find yourself chasing up & down a program adding "IOness" to lots of functions (as a permanent feature), you probably didn't design the program right in the first place. (You have my sympathy: lots of my programs are scarcely designed at all, they started as quick hacks and "just growed". Haskell offers its sympathy by making it much easier than most languages to implement redesigns.) Of course, if you are writing anything larger than a very tiny program, it is well worth pausing before you start to decide which parts of the program need to perform IO.

Thirdly, if you are debugging and really just need to see what some data structure looks like ("bung in a printf"), there are a couple of handy kludges you can use. In the module System.IO.Unsafe there is a function unsafePerformIO :: IO a -> a. As the extraordinary type indicates, unsafePerformIO strips "IOness" from a value. The downside is, as the name implies, this operation is unsafe: there are no guarantees that it will do what you expect when you expect it to. And it breaks the Haskell type system. In theory unsafePerformIO should not exist, but in practice it's sometimes so useful that it is allowed to persist.

Fortunately (at least for supervisors and code reviewers) you will have to import System.IO.Unsafe at the top of any module that uses unsafePerformIO, so a quick glance will reveal this transgression of good Haskell [1].

[1] We used to joke that the requirement on predeclaring labels (line numbers) in Pascal was so that supervisors could quickly reject any program that used goto, without having to read past the first few lines of code.

With all the caveats out of the way, how do we use unsafePerformIO? Here's sudanunsafe.hs. (If you don't understand the use of the $ operator here, please see my note about it.)

import System.IO.Unsafe (unsafePerformIO)

sudan n x y = unsafePerformIO $ do
                putStrLn ("sudan " ++ show n ++ " " ++ show x ++ " " ++ show y)
                return (realSudan n x y)

realSudan n x y | n == 0    = x + y
                | y == 0    = x
                | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])

The output of this example may not be pretty: it actually varies from one Haskell environment to another, which underlines my point that the moment at which an unsafePerformIO action runs is unpredictable. However, the example demonstrates that it is possible to get some handle on what sudan is doing without percolating "IOness" up and down the entire program: note that sudTup, sudList, and main have not changed at all. In a tight debugging spot, this is just the ticket; but please do tidy up the code and remove unsafePerformIO once the bugs have been squashed!
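A smaller sketch may help to show why the timing is so slippery. The helper noisyDouble below is a hypothetical name of my own; it announces itself whenever its result is actually computed, which under lazy evaluation need not be where (or even whether) the call appears in the source.

```haskell
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical helper: prints a message when, and only when,
-- its result is actually demanded.
noisyDouble :: Int -> Int
noisyDouble n = unsafePerformIO $ do
  putStrLn ("doubling " ++ show n)
  return (n * 2)

main :: IO ()
main = do
  let used    = noisyDouble 21
      _unused = noisyDouble 100   -- never demanded below
  putStrLn "before"
  print (used + used)             -- first point at which used is needed
  putStrLn "after"
```

I would expect an unoptimised GHC build to print the message for 21 once (even though used appears twice), somewhere after "before", and never to print the message for 100. None of this is guaranteed, though: another compiler, or the same compiler with optimisation turned on, is entitled to do something different.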

As a convenience, in the module Debug.Trace is the function trace :: String -> a -> a. This function writes its first argument to standard error, then returns its second argument. You will not be surprised to learn that trace utilizes unsafePerformIO, and therefore all the same caveats apply. Calls to trace must be excised from your program before you can consider it finished. Here's sudantrace1.hs, which traces the sudan function using trace.

import Debug.Trace (trace)

sudan n x y = trace msg realSudan n x y
    where msg = "sudan " ++ show n ++ " " ++ show x ++ " " ++ show y

realSudan n x y | n == 0    = x + y
                | y == 0    = x
                | otherwise = sudan (n - 1) sudan' (sudan' + y)
    where sudan' = sudan n x (y - 1)

sudTup (n, x, y) = sudan n x y
sudList [n, x, y] = sudTup (n, x, y)

main = print (sudList [1, 5, 4])
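Debug.Trace also exports a few relatives of trace, among them traceShow :: Show a => a -> b -> b, which saves building the message by hand when the value you want to see already has a Show instance. Here is a short sketch of both in use; the functions factorial and square are hypothetical examples, not part of the sudan programs.

```haskell
import Debug.Trace (trace, traceShow)

-- Hypothetical example: a factorial that logs each argument to stderr.
factorial :: Integer -> Integer
factorial n = traceShow n (if n <= 1 then 1 else n * factorial (n - 1))

-- trace with a hand-built message, wrapped around the result.
square :: Int -> Int
square x = trace ("square " ++ show x) (x * x)

main :: IO ()
main = do
  print (factorial 5)
  print (square 7)
```

The trace messages go to standard error and the results to standard output, so the two streams may interleave differently from run to run. All the caveats that apply to trace apply equally to traceShow.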

Exercises

17. Try sudantrace1.hs and sudanunsafe.hs in all the Haskell environments you have available. Is the output the same?

18. Write a program that uses unsafePerformIO and dumps core, or otherwise crashes.
