Introductory Haskell Programming in the UNIX Environment
A few months back I was chatting to Don Stewart about scripting in Haskell, and he pointed me towards some Haskell shell scripts he's written.
This weekend, Don wrote some introductory tutorials. Part 1 introduces Haskell in a similar style to how the Camel book introduces Perl -- quite readable, and fairly low on mathematical jargon. Part 2 introduces character and file IO, which I'll dig into below.
Why bother?
It turns out that you can re-implement the core of many simple UNIX tools as one-liners in Haskell. This is interesting because, like C, Haskell compiles to a binary and runs like a real program. It's also interesting because, unlike C, Haskell provides lots of error checking, as well as garbage collection and memory safety that rule out most segfaults and forgotten-free leaks, for free.
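To make the one-liner claim concrete, here is a minimal sketch of my own (not taken from Don's tutorials) of three filters built on the standard interact function. The names catLike, wcLines and upcase are made up for illustration, and wcLines only approximates wc -l.

```haskell
import Data.Char (toUpper)

-- Each "tool" is a pure String -> String function; interact connects it
-- to standard input and standard output.

-- cat: pass the input through unchanged.
catLike :: String -> String
catLike = id

-- Roughly wc -l: count the lines of the input.
wcLines :: String -> String
wcLines s = show (length (lines s)) ++ "\n"

-- Roughly tr a-z A-Z: upper-case everything.
upcase :: String -> String
upcase = map toUpper

main :: IO ()
main = interact wcLines   -- swap in catLike or upcase for the other tools
```

Compiled with ghc, this can be used like any other filter, for example ./wcl < somefile.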
Lazy evaluation
Consider the following implementation of cp (from Part 2), which copies the file named by its first argument to the file named by its second:
import System.Environment

main = do
    [infile, outfile] <- getArgs
    s <- readFile infile
    writeFile outfile s
Although this is pretty simple to understand, it looks like it reads the entire contents of the input file into the variable s, and then writes that to the output file. That would be a huge memory hog, so let's take a look at what's actually going on.
Haskell compiles to a binary, so we can strace the resulting program:
$ strace -o /tmp/cp.out ./cp bigfile.ogg /tmp/bigfile-copy.ogg
$ less /tmp/cp.out
...
read(3, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192
write(4, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192
read(3, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
write(4, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
...
We see that it has actually set up an 8K temporary buffer to funnel data back and forth, keeping the memory requirements very low. So the code was not a memory hog at all, even though it's pretty simple to understand.
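For comparison, the read/write pattern in that trace can be written out by hand. This is only a sketch of the equivalent loop, not what the runtime literally executes. It assumes the bytestring package (which ships with GHC), and copyChunked is a made-up name; the 8192 is taken from the trace above.

```haskell
import qualified Data.ByteString as B
import System.Environment (getArgs)
import System.IO

-- Copy src to dst through a fixed-size chunk, as an explicit loop.
copyChunked :: FilePath -> FilePath -> IO ()
copyChunked src dst =
  withBinaryFile src ReadMode $ \hIn ->
    withBinaryFile dst WriteMode $ \hOut ->
      let loop = do
            chunk <- B.hGet hIn 8192          -- read at most 8192 bytes
            if B.null chunk
              then return ()                  -- an empty chunk means EOF
              else B.hPut hOut chunk >> loop  -- write it, then go again
      in loop

main :: IO ()
main = do
  args <- getArgs
  case args of
    [infile, outfile] -> copyChunked infile outfile
    _                 -> hPutStrLn stderr "usage: cp-chunked INFILE OUTFILE"
```

The three-line version gets the same bounded memory use without spelling out the loop, which is rather the point.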
The way this works is that s is not an ordinary, fully loaded String
at all. readFile returns an IO String whose contents are read
lazily: the characters of s are fetched from the file only when something
demands them. So s lives in a very beautiful, transient and continually
changing state of interaction, where the program might read some chars,
write some, read some, write some, and so on until EOF. This is all that a
lazy String could want from its brief yet pristine existence, and nothing more.
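A small sketch can make this on-demand behaviour visible. The helper name firstLine is made up for illustration; the program prints only the first line of a file, so only the start of s is ever demanded.

```haskell
import System.Environment (getArgs)

-- First line of a string. takeWhile stops at the first newline, so the
-- rest of the string is never examined.
firstLine :: String -> String
firstLine = takeWhile (/= '\n')

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      s <- readFile path        -- opens the file; contents are read on demand
      putStrLn (firstLine s)    -- forces only as much of s as one line needs
    _ -> putStrLn "usage: firstline FILE"
```

Running this under strace on a large file should show the reads stopping long before the end of the file, although the exact buffer size depends on the GHC version. Note that firstLine is just as happy with an infinite string, for the same reason.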
Giving this lazily read string a name is
conceptually similar to the use of named pipes in shell scripts. A direct
translation of the above Haskell script into sh might be:
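#!/bin/sh
infile=$1
outfile=$2
s="${TMPDIR-/tmp}/$$.fifo"
mkfifo "$s"
cat < "$s" > "$outfile" &    # reader: drains the pipe into the output file
cat < "$infile" > "$s"       # writer: feeds the input file into the pipe
wait                         # let the background reader finish
rm "$s"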
Of course, this example is trivial; you'd only use named pipes for more complex tasks, such as setting up transcoding pipelines, where you might not know the names or parameters of the commands to be run up front. So, what if your shell script doesn't need to be so complex? What if you don't need to name your intermediate pipe?
cat $infile | cat > $outfile
Well, that's fine in Haskell too:
readFile f >>= writeFile g
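To see that this is the same program as before, here is a sketch of the steps from the do-block to the one-liner. The function names copyDo, copyLambda and copyBind are made up for illustration; all three should behave identically.

```haskell
import System.Environment (getArgs)

-- The do-block version, with the intermediate result named s.
copyDo :: FilePath -> FilePath -> IO ()
copyDo infile outfile = do
  s <- readFile infile
  writeFile outfile s

-- What the do-block desugars to: >>= hands the result to a lambda.
copyLambda :: FilePath -> FilePath -> IO ()
copyLambda infile outfile = readFile infile >>= \s -> writeFile outfile s

-- The lambda only passes s along, so it can be dropped entirely.
copyBind :: FilePath -> FilePath -> IO ()
copyBind infile outfile = readFile infile >>= writeFile outfile

main :: IO ()
main = do
  args <- getArgs
  case args of
    [infile, outfile] -> copyBind infile outfile
    _                 -> putStrLn "usage: cp-bind INFILE OUTFILE"
```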
No more naming our intermediate IO String. But now we know that
it's still there, lurking inside that little >>=. This uses

