tobold.org

correct • elegant • free

△ How to do IO in Haskell △

◅ Summary of the do expression

Error Handling ▻

The Handle type

As in other programming languages, most operations on files in Haskell require that you first open the file. This produces a file handle, which references the file for future operations. Normally, you must close the handle when you are finished with it. In Haskell, a file handle is a value of type Handle.

Here's a slightly different version of the line-counting example which uses file handles, countlines1.hs. The motivation for this will become clear soon, as we gradually improve the program.

import System.IO
    
main = do
  l <- countLinesFile "/etc/passwd"
  putStrLn (show l)
    
countLinesFile f = do
  h <- openFile f ReadMode
  hCountLines h
    
hCountLines h = do
  x <- hGetContents h
  return (length (lines x))

The function hCountLines has type hCountLines :: Handle -> IO Int; following the convention of the functions defined by the Haskell report, we prefix an h to the name to indicate that a Handle argument is expected. The handle comes, of course, from the call to openFile, which has type openFile :: FilePath -> IOMode -> IO Handle. The first argument, of type FilePath (a synonym for String), names the file to be opened. The second argument is of the enumerated type IOMode, and is one of ReadMode, WriteMode, AppendMode, or ReadWriteMode.
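To see some of the other IOMode values at work, here is a small sketch that is not one of the countlines programs. The names writeGreeting, appendLine, and firstLine are made up for illustration, and the code assumes the System.IO module of a current GHC. Each function closes its handle explicitly with hClose when it has finished.

```haskell
import System.IO

-- Hypothetical helper: create the file (or truncate it) and write two lines.
writeGreeting :: FilePath -> IO ()
writeGreeting path = do
  h <- openFile path WriteMode
  hPutStrLn h "first line"
  hPutStrLn h "second line"
  hClose h

-- Hypothetical helper: add one more line at the end of an existing file.
appendLine :: FilePath -> String -> IO ()
appendLine path s = do
  h <- openFile path AppendMode
  hPutStrLn h s
  hClose h

-- Hypothetical helper: read only the first line, then close the handle.
firstLine :: FilePath -> IO String
firstLine path = do
  h <- openFile path ReadMode
  l <- hGetLine h
  hClose h
  return l
```

In GHCi, writeGreeting "demo.txt" followed by appendLine "demo.txt" "third line" should leave a three-line file, after which firstLine "demo.txt" returns "first line". The name demo.txt is only a placeholder.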

The function hGetContents :: Handle -> IO String returns, as a String, the entire (remaining) contents of the file referenced by the handle argument. It also implicitly closes the handle [1]. So hGetContents does for a file Handle what readFile does for a file name.

[1] Strictly, hGetContents and readFile put the handle into a semi-closed state, but the upshot is the same: we don't need to close the handle explicitly.
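If you are curious about the semi-closed state, the following sketch lets you observe it. It assumes GHC's System.IO (for hIsOpen and hIsClosed) and Control.Exception (for evaluate); the names handleState and inspectCount are invented for this illustration. With GHC, we would expect the handle to be neither open nor closed immediately after hGetContents, and to be closed once the whole file has been read.

```haskell
import Control.Exception (evaluate)
import System.IO

-- Hypothetical helper: report (hIsOpen, hIsClosed) for a handle.
handleState :: Handle -> IO (Bool, Bool)
handleState h = do
  open   <- hIsOpen h
  closed <- hIsClosed h
  return (open, closed)

-- Hypothetical helper: count the lines of a file, recording the handle's
-- state before and after the contents are actually consumed.
inspectCount :: FilePath -> IO (Int, (Bool, Bool), (Bool, Bool))
inspectCount f = do
  h <- openFile f ReadMode
  x <- hGetContents h                -- nothing has been read yet
  before <- handleState h            -- state of the semi-closed handle
  n <- evaluate (length (lines x))   -- forces the file to be read to the end
  after <- handleState h             -- state once the contents are exhausted
  return (n, before, after)
```

This also shows why the line count is available even though we never call hClose: reading the contents to the end is what finishes off the handle.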

As you might expect, three values of type Handle already exist when the program starts: stdin, stdout, and stderr. There is a related function, getContents, which is equivalent to hGetContents stdin; in other words, it reads the remainder of the program's standard input.
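The predefined handles can be passed to any function that expects a Handle. Here is a minimal sketch, assuming System.IO from a current GHC; countStdin and report are names invented for this example.

```haskell
import System.IO

-- Hypothetical helper: count the lines remaining on standard input.
countStdin :: IO Int
countStdin = do
  x <- getContents            -- equivalent to hGetContents stdin
  return (length (lines x))

-- Hypothetical helper: send the result to standard output and a
-- diagnostic message to standard error.
report :: Int -> IO ()
report n = do
  hPutStrLn stderr "finished counting"
  hPutStrLn stdout (show n)   -- the same as putStrLn (show n)
```

Keeping diagnostics on stderr means that redirecting the program's output to a file captures only the count.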

Now, we can combine countLinesFile and getArgs to produce a program that counts the lines in the file(s) given as command line arguments. We'll build up to it in stages, starting with a program that works for just one file, countlines2.hs.

import System.IO
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  let f = head args
  output f
    
output f = do
  l <- countLinesFile f
  putStrLn (f ++ ": " ++ show l)
    
{- definitions of countLinesFile and hCountLines as before -}

This program, of course, ignores all but its first command line argument. It also fails with an exception if there are no command line arguments, because head is not defined for an empty list. So let's move on to countlines3.hs.

{- imports as before -}
    
main = do
  args <- getArgs
  let acts = map output args
  sequence_ acts

{- definitions of output, countLinesFile, and hCountLines as before -}

This probably appears a bit mysterious. Let's look closely at what's going on. First, we know that args :: [String], and output :: String -> IO (). So the type of acts is acts :: [IO ()]; in other words, it is a list of IO actions! Can we create such a thing? Of course: this is Haskell after all. (Hey, we can even create an infinite list of IO actions if we want.)

Merely defining acts like this doesn't perform any IO, though. To do that, we need to sequence_ the list. As you can probably guess by now, sequence_ has the type sequence_ :: [IO ()] -> IO (): it takes a list of IO actions and performs them in order, which is itself an IO action [2].

[2] OK, so this is another simplification. Like return, sequence_ applies to all sorts of Monads, and its type really is sequence_ :: Monad m => [m a] -> m ().
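Both claims, that a list of actions does nothing until it is sequenced and that sequence_ works in other Monads, can be tried out with a short sketch. The names numbered, runFirst, and allPresent are made up for this example, and only the Prelude is needed.

```haskell
-- An infinite list of IO actions. Building the list performs no IO.
numbered :: [IO ()]
numbered = [putStrLn ("action " ++ show n) | n <- [1 :: Int ..]]

-- Perform only the first k actions, in order.
runFirst :: Int -> IO ()
runFirst k = sequence_ (take k numbered)

-- The same sequence_ used in the Maybe monad: the result is Nothing
-- as soon as any element of the list is Nothing.
allPresent :: [Maybe Int] -> Maybe ()
allPresent = sequence_
```

Evaluating runFirst 3 prints "action 1", "action 2", and "action 3", one per line; the rest of the infinite list is never touched. In the Maybe monad, allPresent [Just 1, Just 2] is Just (), while allPresent [Just 1, Nothing] is Nothing.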

Running sequence_ over the result of a map is a common pattern; so much so that the Prelude includes the following definition:

mapM_ f as = sequence_ (map f as)

You might guess that the M here stands for Monad. We'll talk more about the trailing underscore later. Using mapM_ we can express our line counter even more succinctly, as in countlines4.hs.

{- imports as before -}
    
main = do
  args <- getArgs
  mapM_ output args

{- definitions of output, countLinesFile, and hCountLines as before -}
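To convince yourself that mapM_ really is just sequence_ after map, the two spellings can be placed side by side. This is only a sketch: announce is a stand-in for output that names the file without opening it, and viaMapM_ and viaSequence_ are names invented here.

```haskell
-- Stand-in for the tutorial's output function: it only names the file.
announce :: String -> IO ()
announce f = putStrLn ("would count: " ++ f)

-- The Prelude's mapM_, applied directly.
viaMapM_ :: Monad m => (a -> m b) -> [a] -> m ()
viaMapM_ f xs = mapM_ f xs

-- The same thing spelled out: build the list of actions, then sequence it.
viaSequence_ :: Monad m => (a -> m b) -> [a] -> m ()
viaSequence_ f xs = sequence_ (map f xs)
```

Both viaMapM_ announce ["a", "b"] and viaSequence_ announce ["a", "b"] print the same two lines in the same order.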

A couple of flaws remain in this program. One is that if you hand this program no command line arguments at all, it produces no output at all: reasonable, but perhaps not very useful. A Unix program would instead count the lines in its standard input in this case. We can fix that quite easily, with countlines5.hs.

import System.IO
import System.Environment (getArgs)
    
main = do
  args <- getArgs
  case args of
    [] -> do
         l <- hCountLines stdin
         putStrLn (show l)
    xs -> mapM_ output xs
    
output f = do
  l <- countLinesFile f
  putStrLn (f ++ ": " ++ show l)
    
countLinesFile f = do
  h <- openFile f ReadMode
  hCountLines h
    
hCountLines h = do
  x <- hGetContents h
  return (length (lines x))

Finally, the motivation for hCountLines becomes clear! If we can count the lines for any Handle, then we can use the same function for either named files or standard input.
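The same idea can be pushed one step further by making the choice of handle a function of its own. The following is a sketch under the assumption that "no file name" is represented by Nothing; pickHandle and countSource are hypothetical names, while hCountLines is the definition from the text, here given its type signature.

```haskell
import System.IO

-- As in the text: count the lines available from any handle.
hCountLines :: Handle -> IO Int
hCountLines h = do
  x <- hGetContents h
  return (length (lines x))

-- Hypothetical helper: Nothing selects standard input,
-- Just f opens the named file for reading.
pickHandle :: Maybe FilePath -> IO Handle
pickHandle Nothing  = return stdin
pickHandle (Just f) = openFile f ReadMode

-- Hypothetical helper: count lines from whichever source was selected.
countSource :: Maybe FilePath -> IO Int
countSource src = do
  h <- pickHandle src
  hCountLines h
```

Here countSource (Just "/etc/passwd") counts the lines of that file, and countSource Nothing counts whatever arrives on standard input, with hCountLines unchanged in both cases.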

The other flaw with this program is that it gives up if it cannot read a file. Here's an interactive session that demonstrates this problem:

$ runhaskell countlines5.hs /etc/passwd /etc/printcap
/etc/passwd: 35
/etc/printcap: 4

$ runhaskell countlines5.hs /etc/passwd /etc/nonesuch /etc/printcap
/etc/passwd: 35
*** Exception: /etc/nonesuch: openFile: does not exist (No such file or directory)

In the second case, a Unix program would display an error message, and then proceed to count the lines in /etc/printcap. We can do this in Haskell, but not until we've looked at error handling.
