blog.kfish.org

My name is Conrad Parker, and I live in Kyoto, Japan. I am working towards a PhD in Computer Science at Kyoto University, finishing September 2009. I also work on some free software projects including the Sweep sound editor and the Annodex media system, and various smaller projects which you can read about here.

Sunday, 17 December 2006

Introductory Haskell Programming in the UNIX Environment

A few months back I was chatting to Don Stewart about scripting in Haskell, and he pointed me towards some Haskell shell scripts he's written.

This weekend, Don wrote some introductory tutorials. Part 1 introduces Haskell in a similar style to how the Camel book introduces Perl -- quite readable, and fairly low on mathematical jargon. Part 2 introduces character and file IO, which I'll dig into below.

Why bother?

It turns out that you can re-implement the core of many simple UNIX tools as one-liners in Haskell. This is interesting because, like C, Haskell compiles to a native binary and runs like a real program. It's also interesting because, unlike C, Haskell provides lots of compile-time error checking and automatic memory management for free, so ordinary code is protected against segfaults and forgotten frees.
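As a rough illustration of the one-liner claim, here is a minimal sketch of the core of wc -l. It is not taken from Don's tutorials: the name countLines is made up for this example, while interact, lines and length are standard Prelude functions.

```haskell
-- Sketch of the core of `wc -l`: count the lines on standard input.
-- countLines is a hypothetical helper name, not from the tutorials.
countLines :: String -> Int
countLines = length . lines

main :: IO ()
main = interact (\input -> show (countLines input) ++ "\n")
```

Compiled with ghc, this can be fed a file on standard input in the same way as the real wc. The interesting part is the single line length . lines; everything else is plumbing.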

Lazy evaluation

Consider the following implementation of cp (from Part 2), which copies the file named by its first argument to the file named by its second:

import System.Environment

main = do
  [infile, outfile] <- getArgs
  s <- readFile infile 
  writeFile outfile s

Although this is pretty simple to understand, it looks like it reads the entire contents of the input file into the variable s, and then writes that to the output file. That would be a huge memory hog, so let's take a look at what's actually going on.

Haskell compiles to a binary, so we can strace the resulting program:

$ strace -o /tmp/cp.out ./cp bigfile.ogg /tmp/bigfile-copy.ogg
$ less /tmp/cp.out
...
read(3, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192
write(4, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306<\216"..., 8192) = 8192 
read(3, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
write(4, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
...

We see that it has actually set up an 8K temporary buffer to funnel data through, keeping the memory requirements very low. So the code is not a memory hog at all, even though it's pretty simple to understand.

The way this works is that s is not a fully loaded String at all. readFile is an IO String action, and the String it hands back is lazy: its characters are only read from the file when something asks for them. As writeFile consumes s, the program lives in a very beautiful, transient and continually changing state of interaction where it reads some chars, writes some, reads some, writes some, and so on until EOF, and whatever has already been written can be garbage-collected. This is all that a lazy String could want from its brief yet pristine existence, and nothing more.
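One caveat for anyone trying this today: on current GHC versions, the Prelude's readFile decodes text using the locale's encoding, so a binary file such as bigfile.ogg may be rejected as invalid text. A minimal sketch of the same lazy copy for binary data, assuming the bytestring package that ships with GHC, could look like this. The name copyLazily is made up for the example.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)

-- Copy src to dst as raw bytes. BL.readFile returns a lazy ByteString,
-- so the file is read chunk by chunk as BL.writeFile consumes it.
-- copyLazily is a hypothetical name used only for this sketch.
copyLazily :: FilePath -> FilePath -> IO ()
copyLazily src dst = BL.readFile src >>= BL.writeFile dst

main :: IO ()
main = do
  args <- getArgs
  case args of
    [infile, outfile] -> copyLazily infile outfile
    _                 -> putStrLn "usage: cp INFILE OUTFILE"
```

The shape of the program is unchanged; only the type of the intermediate value differs. Running it under strace should again show alternating read and write calls, though the chunk size need not be 8K.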

Pass the pipe

Giving this lazy String a name is conceptually similar to the use of named pipes in shell scripts. A direct translation of the above Haskell program into sh might be:

#!/bin/sh

infile=$1
outfile=$2

s="${TMPDIR-/tmp}/$$.fifo"
mkfifo "$s"

cat < "$s" > "$outfile" &
cat < "$infile" > "$s"

rm "$s"

Of course, this example is trivial; you'd only use named pipes for more complex tasks, such as setting up transcoding pipelines, where you might not know the names or parameters of the commands to be run up front. So, what if your shell script doesn't need to be so complex? What if you don't need to name your intermediate pipe?

cat "$infile" | cat > "$outfile"

Well, that's fine in Haskell too:

readFile f >>= writeFile g

No more naming our intermediate String. But now we know that it's still there, lurking inside that little >>=, which feeds the result of readFile straight into writeFile. This all rests on lazy evaluation, and we read in the Camel book that laziness is the first virtue of a programmer; Haskell gives it to you in spades.
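To make the correspondence concrete, here is a small sketch that puts the named and the unnamed version side by side. The function names copyNamed and copyPiped are invented for this example; the do block is just notation for the same >>= chain.

```haskell
import System.Environment (getArgs)

-- The do-block version: the intermediate String gets the name s.
-- copyNamed and copyPiped are hypothetical names for this sketch.
copyNamed :: FilePath -> FilePath -> IO ()
copyNamed f g = do
  s <- readFile f
  writeFile g s

-- The same thing without the name: >>= passes the result of
-- readFile f directly to writeFile g.
copyPiped :: FilePath -> FilePath -> IO ()
copyPiped f g = readFile f >>= writeFile g

main :: IO ()
main = do
  args <- getArgs
  case args of
    [f, g] -> copyPiped f g
    _      -> putStrLn "usage: copy INFILE OUTFILE"
```

Both functions behave identically; which one to write is a matter of whether the name s helps the reader, much like choosing between a named pipe and a plain | in sh.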

Tuesday, 27 September 2005

Transcoding through fifos

When converting media files from one format to another (transcoding), a lot of intermediate data is generated: you plug a decoder into an encoder, and many megabytes of raw, uncompressed media data are shovelled between the two. Ideally you don't want to write this data to a disk file. The old-school Unix way to handle this is to use a named pipe:

#!/bin/sh

INPUT=$1
BASE=`basename "$INPUT"`
OUTPUT=`echo "$BASE" | sed -e "s/\\.[^.]*$/.spx/g"`

FIFO="${TMPDIR-/tmp}/$BASE.$$"
mkfifo "$FIFO"

# Decode to WAV into the fifo in the background, and encode from it
mpg123 -w "$FIFO" "$INPUT" &
speexenc "$FIFO" "$OUTPUT"

rm -f "$FIFO"

m2anx is a much longer script using this technique to transcode (and optionally annodex) from any format supported by mplayer to Ogg Theora, Ogg Vorbis or Ogg Speex. It uses multiple named pipes to handle the audio, video, and muxing in parallel, and uses a shell trap to clean up the named pipes on exit.
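The cleanup-on-exit idea is not specific to sh. As a rough sketch only, and not how m2anx does it, the same guarantee can be written in Haskell with bracket_ from Control.Exception and the fifo functions from System.Posix.Files in the unix package that ships with GHC on POSIX systems. The helper name withFifo and the path /tmp/example.fifo are made up for this example.

```haskell
import Control.Exception (bracket_)
import System.Posix.Files
  ( createNamedPipe, removeLink, getFileStatus, isNamedPipe, fileExist
  , ownerReadMode, ownerWriteMode, unionFileModes )

-- Create a named pipe at path, run the action, and remove the pipe
-- afterwards even if the action throws an exception.
-- withFifo is a hypothetical helper, not part of any library.
withFifo :: FilePath -> IO a -> IO a
withFifo path = bracket_ (createNamedPipe path mode) (removeLink path)
  where
    mode = ownerReadMode `unionFileModes` ownerWriteMode

main :: IO ()
main = do
  let fifo = "/tmp/example.fifo"  -- hypothetical path, must not exist yet
  inside <- withFifo fifo (isNamedPipe <$> getFileStatus fifo)
  after  <- fileExist fifo
  -- inside: was it a fifo while the action ran? after: does it still exist?
  print (inside, after)
```

In a real transcoder the action passed to withFifo would start the decoder and encoder processes; here it only checks that the fifo exists, so that the sketch runs without any media tools installed.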

The new-school Unix way to handle this problem is to build media pipelines using GStreamer. If you're just after a simple and reliable command-line Theora encoder, try ffmpeg2theora.
