blog.kfish.org

My name is Conrad Parker, and I live in Kyoto, Japan. I work with Renesas in Tokyo, designing the Linux multimedia architecture for a new line of mobile processors; and for Wikimedia Foundation, working on Ogg integration for Mozilla Firefox. I am also working towards a PhD in Computer Science at Kyoto University. Free software projects include the Sweep sound editor and the Annodex media system, and various smaller ones that you can read about here.

Sunday, 17 December 2006

Introductory Haskell Programming in the UNIX Environment

A few months back I was chatting to Don Stewart about scripting in Haskell, and he pointed me towards some Haskell shell scripts he's written.

This weekend, Don wrote some introductory tutorials. Part 1 introduces Haskell in a similar style to how the Camel book introduces Perl -- quite readable, and fairly low on mathematical jargon. Part 2 introduces character and file IO, which I'll dig into below.

Why bother?

It turns out that you can re-implement the core of many simple UNIX tools as one-liners in Haskell. This is interesting because, like C, Haskell compiles to a binary and runs like a real program. It's also interesting because, unlike C, Haskell provides lots of error checking, as well as guarantees against segfaults and memory leaks, for free.
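To give a flavour of what such one-liners can look like, here is a minimal sketch of my own (not taken from Don's tutorials). It leans on the Prelude's interact, which feeds standard input through a String -> String function and prints the result. The names wcLines and headCore, and the small argument dispatch in main, are made up for illustration.

```haskell
import System.Environment (getArgs)

-- Hypothetical core of "wc -l": count the lines of the input.
-- This agrees with wc -l when the input ends in a newline.
wcLines :: String -> String
wcLines = (++ "\n") . show . length . lines

-- Hypothetical core of "head": keep the first ten lines.
headCore :: String -> String
headCore = unlines . take 10 . lines

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["wc"]   -> interact wcLines
    ["head"] -> interact headCore
    _        -> interact id   -- with no recognised argument, behave like cat
```

Assuming the compiled binary is called tools (a placeholder name), ./tools wc < somefile would print the number of lines in somefile.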

Lazy evaluation

Consider the following implementation of cp (from Part 2), which copies the file named by its first argument to the file named by its second:

import System.Environment

main = do
  [infile, outfile] <- getArgs
  s <- readFile infile 
  writeFile outfile s

Although this is pretty simple to understand, it looks like it reads the entire contents of the input file into the variable s, and then writes that to the output file. That would be a huge memory hog, so let's take a look at what's actually going on.

Haskell compiles to a binary, so we can strace the resulting program:

$ strace -o /tmp/cp.out ./cp bigfile.ogg /tmp/bigfile-copy.ogg
$ less /tmp/cp.out
...
read(3, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192
write(4, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192 
read(3, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
write(4, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
...

We see that it has actually set up an 8K temporary buffer to funnel data through, keeping the memory requirements very low. So the code was not a memory hog at all, even though it's pretty simple to understand.

The way this works is that s is not a normal, fully loaded String at all. Strictly speaking, readFile infile is the IO String, and the s it hands us is a lazy String whose characters are only read from the file when writeFile asks for them. It lives in a very beautiful, transient and continually changing state of interaction where it might read some chars, write some, read some, write some, and so on until EOF. This is all that an IO String could want from its brief yet pristine existence, and nothing more.
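The same demand-driven behaviour can be seen without touching any files. This is a small sketch of my own, with made-up names (endless, shouted): the string below never ends, yet the program terminates, because only the characters that something actually asks for are ever computed. My understanding is that readFile applies this same idea to file contents, but treat that as an assumption rather than a description of GHC's internals.

```haskell
import Data.Char (toUpper)

-- An endless stream of text; nothing is computed until it is demanded.
endless :: String
endless = cycle "read some, write some, "

-- Transforming the stream is lazy too: no character is upper-cased
-- until a consumer asks for it.
shouted :: String
shouted = map toUpper endless

main :: IO ()
main = putStrLn (take 22 shouted)   -- only 22 characters are ever produced
```

Here take plays the part that writeFile plays in cp: the consumer decides how much of the string comes into existence.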

Pass the pipe

Giving our instance of this IO String a name is conceptually similar to the use of named pipes in shell scripts. A direct translation of the above Haskell script into sh might be:

#!/bin/sh

infile=$1
outfile=$2

s="${TMPDIR-/tmp}/$$.fifo"
mkfifo "$s"

cat < "$s" > "$outfile" &
cat < "$infile" > "$s"

rm "$s"

Of course, this example is trivial; you'd only use named pipes for more complex tasks, such as setting up transcoding pipelines, where you might not know the names or parameters of the commands to be run up front. So, what if your shell script doesn't need to be so complex? What if you don't need to name your intermediate pipe?

cat "$infile" | cat > "$outfile"

Well, that's fine in Haskell too:

readFile f >>= writeFile g

No more naming our intermediate IO String. But now we know that it's still there, lurking inside that little >>=. This uses lazy evaluation, and we read in the Camel book that laziness is the first virtue of a programmer; Haskell gives it to you in spades.
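The anonymous pipe also has room for a filter in the middle, much as cat infile | tr a-z A-Z > outfile would in sh. The following is a minimal sketch under my own assumptions: the helper shout and the usage message are hypothetical, and the program is written with >>= throughout instead of do notation to keep the plumbing visible.

```haskell
import Data.Char (toUpper)
import System.Environment (getArgs)

-- Hypothetical filter stage: upper-case everything passing through.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = getArgs >>= \args -> case args of
  -- (.) binds tighter than (>>=), so this reads as
  -- readFile f >>= (writeFile g . shout)
  [f, g] -> readFile f >>= writeFile g . shout
  _      -> putStrLn "usage: shout INFILE OUTFILE"
```

The String flowing from readFile through shout into writeFile still never gets a name.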


Sunday, 3 December 2006

Living in Kyoto

Autumn leaves are falling, and the days are becoming cooler. Awesome hacking weather.

It's now two months since I came to Japan. I spend half my time studying Japanese, half my time coding and doing research, and half my time partying. I've done a fair bit of sightseeing around Kyoto's temples, shrines and gardens, partied in Tokyo with Alex, Horms and Raster a couple of times, and learned my way around the local ramen joints. My favourite cocktail bar stocks over 20 different varieties of gin, including their own pepper-infused variety which tastes great with stingray (flambéed in your face, with spiritus). Student life is hell.

I've been car-free for two months now; it feels great to kick the gasoline habit. I ride a mama-chari, a steel-framed bicycle with a basket on the front. At first I thought of it as the cheap, chunky brick of a bike that it is. Then I realised that I have about the same strength to bike-weight ratio as I did when I was a kid on a BMX. So now I take every opportunity to bunny-hop random obstacles, get air off pavement and jump gutters. For some reason old people look at me funny when I do that.

As soon as I arrived and got my alien registration sorted, I sat down on a borrowed laptop to review the 350 awesome paper submissions for linux.conf.au 2007. The programme is packed full of more awesome than a noseful of wasabi. Go there or lose, your choice.

More recently I've been helping organise FOMS 2007, a developer workshop for free and open source media software, the week before LCA. It too will rock, and give us a great opportunity to plan some real integration between projects, like video editors + annodex + wikipedia. Rock on!

Recent hacking:

  • HOgg: a new commandline tool and Haskell library for manipulating Ogg files.
  • Blender Scripts: Getting Blender to model, animate and render non-interactively.
  • Croquet: An open source, fully hackable 3D metaverse. I'm just learning my way around -- come play in #croquet on irc.freenode.net!
