tobold.org

correct • elegant • free

△ How to do IO in Haskell △

◅ IO encumbrance can be cumbersome

Reflections

I wrote this tutorial for people like the person I was a year ago: a Haskell programmer whose grasp of IO was still a little shaky. I'd been writing small programs in Haskell for some years, and had read a number of articles and tutorials on monadic IO, plus the Haskell Report itself, but I still lacked confidence about the whole business of IO.

About a year ago, it became clear that a large shell script I had been hacking on for some time was in need of a serious rewrite. Despite my best efforts, it had become too large, too messy, too inflexible, and too slow for the job. Had I been under pressure from an employer, I would probably have chosen Perl for the rewrite. One of the best things about working for yourself is that sometimes you can choose what's right over what's expedient, and I decided instead at least to try the rewrite in Haskell.

Well, I learned a lot along the way, and eventually the Haskell version worked well enough to replace the original script. Since then, I have learned even more while cleaning up my original Haskell code. Realising that I do now have a pretty confident grasp of how to do IO in Haskell, I felt I should try to lay out my particular learning curve, in the hope that it will help others.

I think the major drawback, for me, of the other articles and tutorials I've read concerning IO in Haskell is that they talk too much about monad theory. Monads are indeed a minor miracle. (I'm old enough to have programmed in a pure functional language, called "glide" if I remember correctly, which lacked monadic IO. It severely limited what the language could do.) But I've come to realise that it isn't necessary to understand monads to exploit the power of the IO monad! So in this tutorial there is almost no talk of monads (these few paragraphs excepted).

Perhaps I should give a concrete example of the kind of bewilderment I felt. At least a couple of different articles I had read on the subject started with the monadic binding functions >> and >>=, then introduced do notation as "syntactic sugar". As a result, I would sometimes take a broken do expression and rewrite it with the monadic operators, having got the idea that I somehow needed to scrape under do's sugar coating. Unsurprisingly, this rarely brought enlightenment.
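To make the "syntactic sugar" remark concrete, here is a minimal sketch of one small action written both ways. The names shoutDo and shoutBind are invented for this illustration and appear nowhere else in the tutorial; the input action is passed in as a parameter only so that the example can run without waiting on the keyboard.

```haskell
import Data.Char (toUpper)

-- Written with do notation: print a prompt, run the supplied
-- input action, and return the result in upper case.
shoutDo :: IO String -> IO String
shoutDo getInput = do
  putStrLn "Say something:"
  line <- getInput
  return (map toUpper line)

-- The same action written with the binding functions directly.
-- This is roughly what the do block above stands for:
-- a statement with no result is joined with >>,
-- and "line <- getInput" becomes >>= with a lambda.
shoutBind :: IO String -> IO String
shoutBind getInput =
  putStrLn "Say something:" >>
  (getInput >>= \line ->
    return (map toUpper line))

main :: IO ()
main = do
  -- A canned input (hypothetical stand-in for getLine).
  a <- shoutDo (return "hello")
  b <- shoutBind (return "hello")
  -- Both versions should produce the same result.
  print (a == b)
```

Passing getLine instead of the canned input, as in shoutDo getLine, reads a line from the terminal. The two definitions behave identically, which is why rewriting a broken do expression in the second form seldom reveals anything new.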

Here's a rough analogy. Pattern matching in function definitions is, it turns out, just syntactic sugar for case expressions. But I've yet to see a Haskell tutorial that starts by teaching function definition with case, before moving on to the normal, sweetened, syntax. Indeed, when I want to write a case expression, I think "backwards" from function definition, not the other way round! Similarly, now that I feel I have grasped do, I'm reasonably comfortable with the monadic binding functions.
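For readers who have not seen the two forms side by side, the analogy can be sketched as follows. The function names describe and describeCase are hypothetical, chosen only for this illustration.

```haskell
-- The normal, sweetened syntax: one equation per pattern.
describe :: [a] -> String
describe []      = "empty"
describe [_]     = "one element"
describe (_:_:_) = "two or more elements"

-- The same function with the sugar scraped off:
-- a single equation whose body is a case expression.
describeCase :: [a] -> String
describeCase xs = case xs of
  []      -> "empty"
  [_]     -> "one element"
  (_:_:_) -> "two or more elements"

main :: IO ()
main = do
  let samples = [[], [1], [1, 2, 3 :: Int]]
  -- Print the description of each sample list.
  mapM_ (putStrLn . describe) samples
  -- Check that the two definitions agree on every sample.
  print (map describe samples == map describeCase samples)
```

Few people would reach for the second form first, and the same goes for writing IO code with >> and >>= before being comfortable with do.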

This is not to disparage those other tutorials. No two people learn in the same way, and doubtless moving from monadic generalities to IO specifics is more efficient if it works for you. I am not claiming that this tutorial is better than any other, just different, and there's no harm in having more choice.

By the way, you might be wondering how my Haskell rewrite of that large, messy, inflexible, and slow shell script panned out. Well, the initial Haskell version was about the same length (in terms of lines of code) as the shell script. After some cleanup, it would be somewhat shorter, except that it has acquired more features in the meantime. Less messy and more flexible, undoubtedly: the shell script had reached the stage where even small changes required a strong cup of coffee and several minutes poring over the code to understand how it worked before changing a line. By contrast, the Haskell version is reasonably clean, has proved easy to extend, and of course with real data structures I can do things of which I had never previously dreamed.

Speed was not the primary motivation for the rewrite, although it was extremely frustrating to work with a shell script that took nearly 2 minutes to run. (The spaghetti nature of the code also made it hard to extract small pieces to work on in isolation.) So I was surprised and delighted that the Haskell version runs in just 3 seconds. (That's with ghc -O2, but even with runghc it takes only about 10 seconds.) Given that the output is about 6M spread over 1500 files, that's shifting some. Mind, it does gobble up a lot of memory!

Tim Goodwin
November 2007

Credits

This tutorial was written by Tim Goodwin.

© Copyright 2007, 2009 Tim Goodwin

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 UK: England & Wales License.

Haskell code appearing in this document has been placed in the Public Domain: you can do absolutely anything you want with it.

All feedback is very welcome. Please email me if you have any comments whatsoever (good or bad!), questions, or suggestions for improvements.
