Sunday, 18 September 2011

Iteratees at Tsuru Capital

Tsuru Capital is a small company. We build our internal systems for live trading and offline analysis in Haskell, and we're proud to be sponsoring ICFP 2011. We use iteratees throughout our systems, and have actively encouraged all our staff to contribute changes upstream and participate in community design discussions. By being part of the open source community and taking part in peer-review, we all end up with better software.

Over time various Tsuru staff members have worked on tools using iteratees, including (grepping the CONTRIBUTORS files): Bryan Buecking, Michael Baikov, Elliott Pace, Conrad Parker, Akio Takano, and Maciej Wos. There have been some lively discussions, and many small patches providing functions that we use in production every day.

Last year Conal Elliott provided some mentoring to Tsuru staff, during which we worked through a denotational semantics for iteratees. This resulted in discussions on both the iteratee project list and haskell-cafe under the subject "Semantics of iteratees, enumerators, enumeratees".

By using iteratees in production we've contributed various simple but practical functions, including:

  • enumFdFollow, an enumerator (data source) which allows you to process the growing tail of a log file as it is being written.
  • ioIter, an iteratee that uses an IO action to determine what to do. Typically this action involves some user interaction, such as a user issuing commands like play/pause/next/prev.
  • ListLike functions: last (an iteratee that efficiently returns the last element of a stream), mapM_ and foldM.
  • mapChunksM_, a more efficient version of mapM_ that operates on the underlying chunks, eg. logger = mapChunksM_ (liftIO . print).
  • takeWhile, and its enumeratee variant takeWhileE.
  • endianRead8, an iteratee for reading 64-bit values with a given endianness. I've used this in ght as well as in an internal project.
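The real iteratee package uses ListLike chunks, control messages for errors and seeks, and a monadic interface, so the following is only a toy, self-contained model to make the idea concrete. It shows why functions like last, takeWhile and endianRead8 have to cope with chunk boundaries. The names Stream, Iteratee, enumChunks, lastI, takeWhileI and readWord64 are hypothetical and are not the package's API.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word64)

-- Toy model for illustration only: NOT the iteratee package's types.
data Stream a = Chunk [a] | EOF

data Iteratee a b
  = Done b [a]                       -- result, plus input left unconsumed
  | Cont (Stream a -> Iteratee a b)  -- needs more input

-- Toy enumerator: feed xs to an iteratee in chunks of n, then EOF.
-- Returns Nothing if the iteratee still wants input after EOF.
enumChunks :: Int -> [a] -> Iteratee a b -> Maybe b
enumChunks _ _ (Done b _) = Just b
enumChunks n xs (Cont k)
  | null xs   = case k EOF of
                  Done b _ -> Just b
                  Cont _   -> Nothing
  | otherwise = let (c, rest) = splitAt (max 1 n) xs
                in enumChunks n rest (k (Chunk c))

-- Like 'last': only the final element of each chunk is inspected.
lastI :: Iteratee a (Maybe a)
lastI = go Nothing
  where
    go acc = Cont step
      where
        step (Chunk []) = go acc
        step (Chunk c)  = go (Just (last c))
        step EOF        = Done acc []

-- Like 'takeWhile': the first failing element may be in a later chunk;
-- the rest of that chunk is handed back as unconsumed input.
takeWhileI :: (a -> Bool) -> Iteratee a [a]
takeWhileI p = go []
  where
    go acc = Cont step
      where
        step EOF = Done (concat (reverse acc)) []
        step (Chunk c) = case span p c of
          (yes, [])   -> go (yes : acc)
          (yes, rest) -> Done (concat (reverse (yes : acc))) rest

data Endian = BigEndian | LittleEndian

-- In the spirit of endianRead8: collect 8 bytes, which may straddle
-- chunks, then combine them in the given byte order.
readWord64 :: Endian -> Iteratee Word8 (Maybe Word64)
readWord64 e = go 8 []
  where
    go :: Int -> [Word8] -> Iteratee Word8 (Maybe Word64)
    go 0 acc = Done (Just (combine (reverse acc))) []
    go n acc = Cont step
      where
        step EOF = Done Nothing []
        step (Chunk c) =
          let (h, t) = splitAt n c
          in case go (n - length h) (reverse h ++ acc) of
               Done r _ -> Done r t   -- hand back the bytes we did not need
               more     -> more
    combine bytes = foldl (\a b -> (a `shiftL` 8) .|. fromIntegral b) 0 $
      case e of
        BigEndian    -> bytes
        LittleEndian -> reverse bytes
```

For example, enumChunks 3 [1 .. 10] lastI gives Just (Just 10), and reading the bytes 0,0,0,0,0,0,1,2 in chunks of three with readWord64 BigEndian gives Just (Just 258), even though the value is spread over three chunks.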

Stream conversion

We've done quite a bit of work on stream conversion, as we use a few different layers of data processing. The iteratee architecture allows you to isolate the data source, conversion and processing functions; much of what we've worked on involves ensuring the converters (enumeratees) can control or translate control messages, so that commands like "seek" do not get lost. We've also built combinators to simplify the task of creating new stream converters.
  • convStateStream, which converts one stream into another while continually updating an internal state. Importantly for variable bitrate binary data, it can produce elements of the output stream from data that spans stream chunks.
  • (><>) and (<><). These allow stream converters to be composed without rewriting boilerplate. John Lato gives a good example of their use in his StackOverflow answer to "Attoparsec Iteratee".
  • zip, zip[345], sequence_ for using multiple iteratees to process a single stream instance, and (for zip*) collecting the results.
  • eneeCheckIfDone*: This family of functions (eneeCheckIfDoneHandle, eneeCheckIfDonePass, eneeCheckIfDoneIgnore) can be used with
    unfoldConvStreamCheck to make a version of unfoldConvStream which respects seek messages.
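As a rough, pure sketch of the convStateStream idea (an assumption about its behaviour based on the description above, not its actual signature), convState threads a state through a chunked stream. The converter pairUp is a hypothetical example: it builds big-endian 16-bit words and keeps an odd byte in the state, so a value that spans two chunks still comes out whole.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word16)

-- Hypothetical sketch, not the iteratee package's convStateStream:
-- convert a chunked stream, threading a state from one chunk to the next.
convState :: (s -> [a] -> (s, [b])) -> s -> [[a]] -> [[b]]
convState _ _ [] = []
convState f s (c : cs) = out : convState f s' cs
  where
    (s', out) = f s c

-- Example converter: big-endian 16-bit words from bytes. An odd trailing
-- byte is kept in the state and paired with the first byte of the next chunk.
pairUp :: [Word8] -> [Word8] -> ([Word8], [Word16])
pairUp leftover chunk = go (leftover ++ chunk)
  where
    go (hi : lo : rest) =
      let (l, ws) = go rest
      in (l, ((fromIntegral hi `shiftL` 8) .|. fromIntegral lo) : ws)
    go l = (l, [])
```

Here convState pairUp [] [[0x01], [0x02, 0x03], [0x04]] yields [[], [258], [772]], that is 0x0102 and 0x0304, although neither value lies within a single input chunk.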


Parallel stream processing

We often want to do multiple unrelated analysis tasks on a data stream. Whereas sequence_ takes a list of iteratees to run simultaneously and handles each input chunk by running mapM over that list (so the iteratees take turns in a single thread), psequence_ runs each input iteratee in a separate forkIO thread. For a real-world example, see Michael Baikov's post about psequence, psequence_, parE, parI.
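The difference can be sketched with plain base-library concurrency. This is not the iteratee package's implementation: parFolds is a hypothetical helper in which each consumer is a left fold over chunks, run in its own forkIO thread and fed through its own Chan.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- Hypothetical sketch in the spirit of psequence_ (not the package's code):
-- each consumer is a left fold over chunks running in its own thread.
-- Every chunk is broadcast to each thread's Chan; Nothing signals EOF.
parFolds :: [b -> [a] -> b] -> b -> [[a]] -> IO [b]
parFolds steps z chunks = do
  workers <- forM steps $ \step -> do
    chan <- newChan
    result <- newEmptyMVar
    let loop acc = do
          msg <- readChan chan
          case msg of
            Nothing -> putMVar result acc
            Just c  -> let acc' = step acc c in acc' `seq` loop acc'
    _ <- forkIO (loop z)
    return (chan, result)
  -- Broadcast every chunk, then end-of-stream, to every consumer.
  forM_ chunks $ \c -> forM_ workers $ \(chan, _) -> writeChan chan (Just c)
  forM_ workers $ \(chan, _) -> writeChan chan Nothing
  -- Wait for each thread's final result, in the order the folds were given.
  mapM (takeMVar . snd) workers
```

Compile with -threaded and run with +RTS -N for the threads to use several cores. Because Chan is unbounded, a slow consumer buffers chunks rather than blocking the producer; a real implementation might choose to apply back-pressure instead.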

Thanks

Thanks to John Lato for consistently and reliably maintaining the iteratee package, providing thoughtful feedback and graciously suggesting improvements.

Friday, 30 July 2010

A Haskell template for GTK, Glade, Cairo apps

I just uploaded cairo-appbase to Hackage. This is a template for building new GUI applications using GTK, Glade and Cairo.

To install it:

$ cabal update
$ cabal install gtk2hs-buildtools
$ cabal install cairo-appbase

Then, run cairo-appbase:

$ cairo-appbase

The GTK widget layout is done via a Glade XML file which can be edited visually using glade. This template includes working callbacks to handle the File and Help menus and File Save/Open dialogs, with dummy handlers for selecting filenames and the Edit menu's cut/copy/paste. The main canvas uses Cairo for graphics rendering, and includes example code from the cairo package.

To build your own application on top of this, first grab the code. You can either grab it from Hackage with cabal unpack cairo-appbase, or clone the git repo:

To add widgets, install glade from your distribution's packages and run glade data/main.glade. Note that you must run cabal install to put the glade file in the correct place for your application to pick it up. To modify the code, edit src/cairo-appbase.hs. Hooking up functions to widgets is very simple: get a widget by name (which you set in the glade file), and hook one of its signals (which you can find in the Signals tab in glade) to an IO () action:

cut1 <- get G.castToMenuItem "cut1"
G.onActivateLeaf cut1 $ myCut

The template code includes a trivial definition of myCut:

myCut :: IO ()
myCut = putStrLn "Cut"

A real application will want to pass data to the callback. In C, this is fairly tedious as you only have a single void * to pass to callbacks as "user_data", and applications typically do lots of marshalling and unmarshalling to pass data around. In Haskell, however, you can write a more complex callback handler and use a partially applied (curried) version of it in each instance:

cut1 <- get G.castToMenuItem "cut1"
G.onActivateLeaf cut1 $ myComplexCut project phase 7

...

-- 'when' comes from Control.Monad
myComplexCut :: Project -> MoonPhase -> LuckyNumber -> IO ()
myComplexCut project phase num = do
    let selection = currentSelection project
    when (phase == Full) howl
    when (num /= 7) $ fail "unlucky number"
    doActualCut selection
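The GTK calls above need a display, so here is a GTK-free sketch of the same currying idea. Project, MoonPhase, complexCut and callbacksFor are hypothetical stand-ins: a signal slot only ever sees an IO () action, and partial application packs the extra context into that action, in place of C's void *user_data.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-ins for the application's own types.
data MoonPhase = New | Full deriving (Eq, Show)

newtype Project = Project { projectLog :: IORef [String] }

-- A handler that needs extra context before it can be used as a callback.
complexCut :: Project -> MoonPhase -> Int -> IO ()
complexCut project phase num = modifyIORef (projectLog project) (msg :)
  where
    msg = "cut during " ++ show phase ++ " moon, lucky number " ++ show num

-- What a GTK signal expects: a plain IO () action.
type Callback = IO ()

-- Partially applying complexCut yields ordinary callbacks.
callbacksFor :: Project -> [Callback]
callbacksFor project = [complexCut project Full 7, complexCut project New 3]

-- Stand-in for GTK emitting the signals one after another.
fireAll :: [Callback] -> IO ()
fireAll = sequence_
```

Each partially applied complexCut carries its own project, phase and number, with no marshalling through an untyped pointer.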

Erik de Castro Lopo discussed currying at length in his April 2006 post, GTK+ Callbacks in OCaml. The Haskell GTK+ bindings have been around a long time, but were only recently cabalized and uploaded to Hackage. I put together cairo-appbase in August 2006 when I was playing with them, and now that I have more time for Haskell I've updated it and uploaded it to Hackage. Enjoy, and hack away!

Tuesday, 15 June 2010

Speeding up cross-compiling with ccache and distcc on Debian

The conventional way of doing embedded development is to cross-compile everything then copy it onto the target, but working natively allows you to use "normal" tools and workflows. We want to issue commands directly to a shell on the development board or phone prototype, and speed up the compilation step by distributing it to a faster machine such as your workstation. This isn't the usual way to do things, but I like working this way, and here's how to make it work faster.

This article explains how to configure a Debian PC host and a Debian target system so that development done on the target invokes the cross-compiler on the host. The advantage offered by this approach is a speed-up of compile times. Note that this does not speed up other aspects of building, such as source configuration (which can be slow for packages using GNU autotools), linking or installation.

We assume that a full Debian system is available for development on the target: packages can be built natively using gcc and a full toolchain (binutils, ld etc.), and tools such as automake, autoconf, libtool, version control systems etc. are available.

The setup we work with uses Debian on both the host PC and the target. The examples use debian-sh4 on the target, with the sh4-linux-gnu-gcc cross-compiler installed on the build host. For other target architectures, simply replace all instances of sh4-linux-gnu- with the appropriate prefix, eg. arm-linux-gnueabi-.

In this article, commands executed natively on the target device use the prompt target$ (or target# when run as root), and commands executed on the x86 build host use the prompt host$ (or host# when run as root).

The first step is to ensure you can build software natively on the target. For GCC:

target$ gcc hello.c -o hello
and for autotools projects:
target$ ./configure
target$ make

ccache

Next, install ccache:

target# apt-get install ccache

ccache keeps a cache of compiled object files, such that the same compilation does not need to be repeated. This cache exists outside of your source tree, so it persists across invocations of 'make clean'. It compares the pre-processed source files, so that compilation of a source file will happen if it or any of its included headers is changed. The usual way to use ccache is to simply set your C compiler to be "ccache gcc".
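A toy model of this caching idea, assuming a stand-in "compiler" function and a one-level #include expander (Headers, preprocess, Cache and cachedCompile are hypothetical names, not ccache's code). Real ccache hashes the preprocessed output together with details such as the compiler and its options rather than keeping the text itself as a key, but the effect is the same: editing an included header changes the key and forces a recompile.

```haskell
import qualified Data.Map.Strict as Map

-- Header name -> header contents.
type Headers = Map.Map String String

-- Toy preprocessor: replace lines of the form  #include "name"
-- with the contents of that header (one level only).
preprocess :: Headers -> String -> String
preprocess hs = unlines . map expand . lines
  where
    expand l = case words l of
      ["#include", q] | Just h <- Map.lookup (filter (/= '"') q) hs -> h
      _ -> l

-- Cache keyed on (compiler name, preprocessed source).
type Cache = Map.Map (String, String) String

-- On a hit, reuse the stored object; on a miss, run the compiler and store it.
-- Returns the object, the updated cache, and whether it was a hit.
cachedCompile :: String -> (String -> String) -> Headers -> String
              -> Cache -> (String, Cache, Bool)
cachedCompile cc compile hs src cache =
  case Map.lookup key cache of
    Just obj -> (obj, cache, True)
    Nothing  -> let obj = compile pre in (obj, Map.insert key obj cache, False)
  where
    pre = preprocess hs src
    key = (cc, pre)
```

Compiling the same file twice gives a miss and then a hit; changing only the included header gives another miss, because the preprocessed text differs.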

target$ ccache gcc hello.c -o hello
and for autotools projects:
target$ CC="ccache gcc" ./configure
target$ make

Debian also sets things up so that if you put /usr/lib/ccache ahead of /usr/bin in your PATH, ccache will be used for native builds whenever gcc is invoked. That is useful, but not necessary for the distcc setup described here.

An aside about compiler naming

Before we move on to cross compiling, it's important to realize that the native compiler is also available with its full architecture prefix:

target$ ls -l /usr/bin/sh4-linux-gnu-gcc
lrwxrwxrwx 1 root root 7 Mar 17 01:45 /usr/bin/sh4-linux-gnu-gcc -> gcc-4.4

The binary called sh4-linux-gnu-gcc does the same thing on both the host and target: you can simply think of it as a program that takes in a C file and produces an sh4 binary:


                +-------------------+ 
    C source -> | sh4-linux-gnu-gcc | -> sh4 binary
                +-------------------+ 

The distinction between "native" and "cross-" compiling is then just a matter of what machine you are running this compiler program on. If you run sh4-linux-gnu-gcc on an x86 machine, you are cross-compiling, but if you run sh4-linux-gnu-gcc on an sh4 machine then you are just compiling. Of course the compiler binaries are different; the point is that a shell script which calls the compiler by its full name would work without modification on either machine.

distcc

distcc allows you to use a compiler running on a different, faster machine. This involves running a server (distccd) there, and it is far easier to set up than it would seem.

First, ensure that we can cross-compile on the build host:

host$ sh4-linux-gnu-gcc hello.c -o hello
host$ file hello
hello: ELF 32-bit LSB executable, Renesas SH, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

Next, we install distcc on the build host:

host# apt-get install distcc

To activate the server and tell it what clients to allow, edit /etc/default/distcc:

STARTDISTCC="true"
ALLOWEDNETS="127.0.0.1 10.0.0.0/16"

and restart it:

host# /etc/init.d/distcc restart

You can check that it is running:

host# netstat -pant | grep distcc
tcp 0 0 10.0.0.1:3632 0.0.0.0:* LISTEN 16142/distccd

To confirm that compilation is really running on the host, watch this log file in a separate window:

host# tail -f /var/log/distccd.log

Then, on the client (ie. the target system) we also install distcc:

target# apt-get install distcc

We do not need to modify the distcc configuration on the target as it will not be running the server, so Debian's defaults are fine. However, we do need to set an environment variable to specify which machine[s] to compile on.

target$ export DISTCC_HOSTS='host'

You run distcc in a similar manner to ccache, by simply setting your C compiler. Note that we are only distributing compilation, not linking, so we just run the compilation step:

target$ distcc sh4-linux-gnu-gcc -c hello.c

This should turn up in the host's distcc logs:

host# tail -f /var/log/distccd.log
distccd[16390] (dcc_job_summary) client: 10.0.1.103:45983 COMPILE_OK exit:0 sig:0 core:0 ret:0 time:46ms sh4-linux-gnu-gcc hello.c

And back on the target, we have the hello.o file which was generated by the sh4-linux-gnu-gcc cross-compiler on the build host:

target$ ls -l *.o
total 16
-rw-r--r-- 1 conrad conrad 884 Jun 11 07:28 hello.o
target$ file hello.o
hello.o: ELF 32-bit LSB relocatable, Renesas SH, version 1 MathCoPro/FPU/MAU Required (SYSV), not stripped

The C file was transferred over the network to the host, where distccd invoked the cross-compiler and then sent the results back to the target. The end result is the same as if sh4-linux-gnu-gcc had been run directly on the target, but we avoided using the slower CPU of the target system.

To fully take advantage of distcc, you can run distccd on multiple build hosts, and specify all their names in the DISTCC_HOSTS environment variable on the target. Then use eg. "make -j 10" to run multiple compiles in parallel, which will each then get farmed out to different build hosts.

Combining ccache and distcc

You can quite simply put these two tools together by setting CCACHE_PREFIX to "distcc" before calling ccache:

target$ export CCACHE_PREFIX="distcc"
target$ ccache sh4-linux-gnu-gcc -c hello.c

(Thanks to Joel Rosdahl for the correction: an earlier version of this post suggested running "ccache distcc sh4-linux-gnu-gcc -c hello.c" instead.)

The first time we run this, the code is cross-compiled on the build host and sent back to the target, and ccache keeps track of the result. The second time we run this, ccache notices that it already has a stored copy of the output hello.o, and uses that rather than calling the compiler. (ccache sees the compiler as sh4-linux-gnu-gcc; CCACHE_PREFIX only tells it to run that compiler via distcc when there is a cache miss.)

For autotools projects, you can simply do the following before calling ./configure:

target$ export CCACHE_PREFIX="distcc"
target$ export CC="ccache sh4-linux-gnu-gcc"

The ./configure step then writes Makefiles that compile with ccache, so the rest of your build (eg. make -j 10) works as normal without any new settings or other changes to your workflow.

For more discussion of combining distcc with ccache, see the distcc(1) man page.

Summary

By combining both ccache and distcc we can:

  1. avoid redundant compilations, and
  2. distribute required compilations to a faster build host.
The result is faster build times, which speeds up your development cycle and allows you to work more efficiently on the target system itself.

Monday, 24 May 2010

Monday Music: Heyoo by Kobi

Made with AUBE on Linux last November, this is Heyoo by Kobi:

AUBE/Metadecks Live is a music production tool designed for live use. A track like this is made by setting up a bunch of sample, rhythm and effects units, playing them for a while and recording the result.

This post uses the HTML5 <audio> tag. If the audio controls are not present then the problem may simply be that your browser does not support HTML5 <audio> with Ogg Vorbis (in which case upgrade to one that does). If you are reading this in a feed reader or via a planet aggregator, then the problem may be that the reader or aggregator strips the HTML5 <audio> tag -- in which case you might want to switch to a more modern reader, or upgrade your planet.

Monday, 17 May 2010

Monday Music: Deika by Kobi

Made with AUBE on Linux a few years ago, this is Deika by Kobi:

AUBE/Metadecks Live is a music production tool designed for live use. A track like this is made by setting up a bunch of sample, rhythm and effects units, playing them for a while and recording the result.

Tuesday, 11 May 2010

Streaming Ogg Vorbis with sighttpd 1.1.0

I just released Sighttpd version 1.1.0, which includes support for streaming Ogg Vorbis from standard input. In an earlier post introducing a new HTTP streaming server (sighttpd 1.0.0), I described how sighttpd could be used to stream raw data, such as plain text:

$ while true; do date; sleep 1; done | sighttpd

and H.264 elementary video streams but not Ogg, because an Ogg stream needs to have setup headers prepended for each codec stream. "Instead, we would need to do something like Icecast: buffering these headers and serving them first to each client that connects before continuing with live Ogg pages".

So, that's exactly what version 1.1.0 introduces with a new <OggStdin> module. The sighttpd.conf setup is similar to the normal <Stdin> configuration:

Listen 3000

# Streaming Ogg Vorbis from stdin, using the special
# OggStdin module that caches Ogg Vorbis headers
<OggStdin>
        Path "/stream.ogg"
        Type "audio/ogg"
</OggStdin>

You can run this with a shell pipeline like:

$ arecord -c 2 -r 44100 -f S16_LE -t wav | oggenc -o - - | sighttpd -f examples/sighttpd-oggstdin.conf
And you can connect to it as an Ogg stream, eg:
$ ogg123 http://localhost:3000/stream.ogg

At the start of an Ogg Vorbis file or stream are three mandatory header packets:

  1. The Vorbis BOS (beginning of stream) header, which describes basic information like the number of channels and the sample rate of the audio.
  2. Metadata in VorbisComment format, which basically consists of text values like "ARTIST=Richard Feynman".
  3. The setup header, which includes "codec setup information as well as the complete VQ and Huffman codebooks needed for decode".

We can view the raw contents of these packets with oggz dump:

$ oggz dump Kobi-Birk_20011125.ogg |head -n 30
00:00:00.000: serialno 0639825516, granulepos 0, packetno 0 *** bos: 30 bytes
    0000: 0176 6f72 6269 7300 0000 0002 44ac 0000  .vorbis.....D...
    0010: 18fc ffff 00f4 0100 18fc ffff b801       ..............

00:00:00.000: serialno 0639825516, calc. gpos 0, packetno 1: 94 bytes
    0000: 0376 6f72 6269 7320 0000 0058 6970 686f  .vorbis ...Xipho
    0010: 7068 6f72 7573 206c 6962 566f 7262 6973  phorus libVorbis
    0020: 2049 2032 3030 3130 3831 3303 0000 000a   I 20010813.... 
    0030: 0000 0074 6974 6c65 0042 6972 6b0b 0000  ...title.Birk ..
    0040: 0061 7274 6973 7400 4b6f 6269 0d00 0000  .artist.Kobi ...
    0050: 6461 7465 0032 3030 3131 3132 3501       date.20011125.

00:00:00.000: serialno 0639825516, granulepos 0, packetno 2: 2.820 kB
    0000: 0576 6f72 6269 7325 4243 5601 0040 0000  .vorbis%BCV..@..
    0010: 8020 9a19 a7b1 945a 6bad 1d72 9a42 abb5  . .....Zk..r.B..
    0020: d65a 6bad 2594 5a5b adb5 d65a 6bad b5d6  .Zk.%.Z[...Zk...
    0030: 5a6b adb5 d65a 6b8d 81d0 9055 0000 1000  Zk...Zk....U....
    0040: 0021 0c55 0651 c99c d65a 6b44 1064 0649  .! U.Q...ZkD.d.I
    0050: e920 d65a 6be8 a0a5 105a 4cad d65a 6bad  . .Zk....ZL..Zk.
    0060: b5d6 5a6b adb5 d61a 6320 3464 1500 0004  ..Zk....c 4d....
    0070: 00c0 1863 8c31 0619 6410 5248 21a5 9452  ...c.1..d.RH!..R
    0080: 8c31 e618 74d2 5147 9d76 da71 6821 9594  .1..t.QG.v.qh!..
    0090: 5acc 2de7 9c73 ceb9 d61a 080d 5905 0024  Z.-..s..... Y..$
    00a0: 0000 a838 8664 5886 0584 86ac 0200 3200  ...8.dX.......2.
    00b0: 0004 1024 4353 34c7 d554 cf34 5d55 0542  ...$CS4..T.4]U.B
    00c0: 4356 0100 4000 0002 8000 0a18 4451 1445  CV..@..... .DQ.E
    00d0: 5114 4551 1445 d1f3 3ccf f33c cff3 3ccf  Q.EQ.E..<..<..<.
    00e0: f33c cff3 3ccf f33c cf03 4243 5601 0009  .<..<..<..BCV.. 
    00f0: 0000 1a8a a228 8ee2 00a1 21ab 0080 0c00  .....(....!... .
    0100: 0001 0cc7 9014 49d1 244d d22c cff2 80d0  .. ...I.$M.,....
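To make the layout of the first packet concrete, here is a small parser for the Vorbis identification header, following the field layout in the Vorbis I specification: packet type 1, the string "vorbis", then little-endian version, channel count, sample rate and three bitrate fields, then the block sizes and a framing bit (30 bytes in total). The byte list birkIdHeader is copied from packetno 0 in the dump above; VorbisId, le32 and parseVorbisId are hypothetical names.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- Selected fields of a Vorbis identification header (the first packet).
data VorbisId = VorbisId
  { vorbisVersion  :: Word32
  , audioChannels  :: Word8
  , sampleRate     :: Word32
  , bitrateNominal :: Word32
  } deriving (Eq, Show)

-- Little-endian 32-bit value from the first four bytes.
le32 :: [Word8] -> Word32
le32 = foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 . take 4

-- Byte offsets: 0 type, 1-6 "vorbis", 7-10 version, 11 channels,
-- 12-15 sample rate, 16-19 max bitrate, 20-23 nominal, 24-27 min,
-- 28 block sizes, 29 framing.
parseVorbisId :: [Word8] -> Maybe VorbisId
parseVorbisId bs
  | length bs >= 30
  , take 7 bs == 1 : map (fromIntegral . fromEnum) "vorbis" =
      Just VorbisId
        { vorbisVersion  = le32 (drop 7 bs)
        , audioChannels  = bs !! 11
        , sampleRate     = le32 (drop 12 bs)
        , bitrateNominal = le32 (drop 20 bs)
        }
  | otherwise = Nothing

-- packetno 0 of Kobi-Birk_20011125.ogg, as shown by oggz dump.
birkIdHeader :: [Word8]
birkIdHeader =
  [ 0x01, 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73, 0x00, 0x00, 0x00, 0x00, 0x02, 0x44, 0xac, 0x00, 0x00
  , 0x18, 0xfc, 0xff, 0xff, 0x00, 0xf4, 0x01, 0x00, 0x18, 0xfc, 0xff, 0xff, 0xb8, 0x01 ]
```

Applied to birkIdHeader, parseVorbisId reports version 0, 2 channels, a sample rate of 44100 Hz (the bytes 44 ac 00 00) and a nominal bitrate of 128000 bit/s (00 f4 01 00).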

When a client connects to a stream somewhere in the middle of a song, these headers from the beginning are required in order to decode the audio data. sighttpd writes the pages containing the 3 header packets to a temporary file (created with mkstemp(3)). When a new client connects, the contents of that file are sent to it with sendfile(2) before jumping into the current contents of the stream.
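The header-caching behaviour can be modelled in a few lines. This is a hypothetical sketch, not sighttpd's C code: Page and joinStream are invented names, and the cache is simply the leading run of header pages, since in an Ogg Vorbis stream the header packets come before any audio data.

```haskell
-- Hypothetical model of header caching, not sighttpd's implementation.
data Page = Page { pageNo :: Int, isHeaderPage :: Bool } deriving (Eq, Show)

-- What a client joining at page index n receives: the cached header pages,
-- then the live stream from n onwards (never repeating a header page).
joinStream :: [Page] -> Int -> [Page]
joinStream pages n = headers ++ drop (max n (length headers)) pages
  where
    headers = takeWhile isHeaderPage pages
```

A client joining at page 3 of a stream whose first two pages are headers gets pages 0, 1, 3, 4, ...; in sighttpd the equivalent of headers lives in the temporary file that is sent with sendfile(2).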

I'm not trying to make a replacement for Icecast, but instead building a more general streaming server -- and of course I want it to have good Ogg support! So, please try it out, and leave some feedback in the comments or in email to me or ogg-dev :)

Monday, 10 May 2010

Monday Music: Birk by Kobi

Made with AUBE on Linux a few years ago, this is Birk by Kobi:

AUBE/Metadecks Live is a music production tool designed for live use. A track like this is made by setting up a bunch of sample, rhythm and effects units, playing them for a while and recording the result.

The rhythms are made with a simple drum machine, which is basically a matrix of triggers tied up to sample players. These are fed through a cascade of delays to get the rolling effect -- I love feeding a short delay to provide echo into a longer delay which matches the beat, so that the individual sounds combine with each other to make a more complex rhythm.
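The cascade can be sketched over lists of samples. This is a hypothetical feed-forward version (delay and cascade are invented names, not AUBE code, and the delay times are a few samples rather than realistic lengths): each delay adds a single scaled echo, whereas a real delay unit usually feeds its output back to produce repeating echoes.

```haskell
-- Single-tap feed-forward delay: the input plus a copy delayed by d
-- samples and scaled by gain g. Output length equals input length.
delay :: Int -> Double -> [Double] -> [Double]
delay d g xs = zipWith (+) xs (replicate d 0 ++ map (* g) xs)

-- A short echo feeding into a longer, beat-length delay.
cascade :: [Double] -> [Double]
cascade = delay 4 0.5 . delay 1 0.5
```

Feeding a single click through it, cascade [1, 0, 0, 0, 0, 0, 0] gives [1.0, 0.5, 0.0, 0.0, 0.5, 0.25, 0.0]: the short echo is itself echoed by the longer delay, so one hit becomes a small rhythmic pattern.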

The rhythm is sent through a resonant low-pass filter; as the track starts off, the cutoff of that filter is raised to give the effect of opening up the whole track. It's a pretty simple technique, used in tracks like Fatboy Slim's Right Here, Right Now.

The filtered version is called the "wet" part of the mix, and the unfiltered version is the "dry" part. Changing the amount of these is useful: the dry part provides definition (the attacks of each drum are clearly audible), and the wet part has a more interesting texture. In a sequencer you might program the "wetness" of the effect; I like to work with it more directly by feeding the two versions into a cross-fader and switching between them live. If you are quick enough with the controls then your other arm is free for doing handstands :)
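The cross-fader itself is simple arithmetic. A minimal sketch, assuming a linear fade between lists of samples (crossfade is a hypothetical name, and hardware faders often use other curves):

```haskell
-- Linear cross-fade between the dry (unfiltered) and wet (filtered)
-- signals: t = 0 is all dry, t = 1 is all wet; t is clamped to [0, 1].
crossfade :: Double -> [Double] -> [Double] -> [Double]
crossfade t dry wet = zipWith mix dry wet
  where
    t' = max 0 (min 1 t)
    mix d w = (1 - t') * d + t' * w
```

With t = 0.25 the output is three parts dry to one part wet.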