<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-9101292118679422945</atom:id><lastBuildDate>Mon, 21 Apr 2008 01:04:19 +0000</lastBuildDate><title>blog.kfish.org</title><description/><link>http://blog.kfish.org/</link><managingEditor>Conrad Parker</managingEditor><generator>Blogger</generator><openSearch:totalResults>55</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-8373941523429907668</guid><pubDate>Thu, 17 Apr 2008 14:45:00 +0000</pubDate><atom:updated>2008-04-18T00:30:28.061+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>firefox</category><category domain='http://www.blogger.com/atom/ns#'>vimperator</category><category domain='http://www.blogger.com/atom/ns#'>rikaichan</category><title>:rikaichan for Vimperator</title><description>&lt;p&gt;
Some of my favourite Firefox plugins are:
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.polarcloud.com/rikaichan/"&gt;Rikaichan&lt;/a&gt;, a Japanese dictionary, which adds instant translation popups when you mouse over a word;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://vimperator.mozdev.org/"&gt;Vimperator&lt;/a&gt;, which provides a &lt;tt&gt;vi&lt;/tt&gt;-like user interface;&lt;/li&gt;
&lt;li&gt;and &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/1337"&gt;Hide Tab Bar&lt;/a&gt;, because Vimperator's buffer list is more useful.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Vimperator hides the menu bar by default. &lt;b&gt;Tools-&gt;Toggle Rikaichan&lt;/b&gt; has no default keybinding, and the keybindings to navigate the menubar are not available if the menubar is not visible, so Rikaichan can no longer be activated.
&lt;/p&gt;
&lt;p&gt;
The following adds a vimperator command &lt;tt&gt;:rikaichan&lt;/tt&gt;; save it to &lt;tt&gt;.vimperator/plugin/toggleRikaichan.js&lt;/tt&gt;:

&lt;blockquote&gt;&lt;pre&gt;
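(function(){
    // Register :rikaichan (alias :rikai) as a Vimperator command.
    vimperator.commands.add(new vimperator.Command(
        ['rikaichan', 'rikai'],
        function(){
            // Toggle Rikaichan, like the Tools menu entry.
            rcxMain.inlineToggle();
        }
    ));
}) ();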
&lt;/pre&gt;&lt;/blockquote&gt;
&lt;/p&gt;
&lt;p&gt;
It is aliased to &lt;tt&gt;:rikai&lt;/tt&gt; for short, but unfortunately vimperator won't recognize &lt;tt&gt;:理解&lt;/tt&gt;.
Thanks to ktsukagoshi for the explanation of how to write a vimperator plugin (&lt;a href="http://d.hatena.ne.jp/ktsukagoshi/20080305/1204730962"&gt;vimperatorのプラグインの作成&lt;/a&gt;).
&lt;/p&gt;
&lt;p&gt;
&lt;i&gt;Remember, &lt;a href="http://www.vergenet.net/~conrad/syre/"&gt;the interface is inside your mind&lt;/a&gt;.&lt;/i&gt;
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/04/rikaichan-for-vimperator.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-8478267578838678327</guid><pubDate>Mon, 14 Apr 2008 07:30:00 +0000</pubDate><atom:updated>2008-04-14T17:10:56.059+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>continuation fest</category><category domain='http://www.blogger.com/atom/ns#'>haskell</category><title>Continuation Fest 2008: Continuations for video decoding and scrubbing</title><description>&lt;p&gt;
Yesterday was &lt;a href="http://logic.cs.tsukuba.ac.jp/Continuation/"&gt;Continuation Fest 2008&lt;/a&gt;,
at the University of Tokyo's campus in Akihabara (a very nice venue!).
It was very well attended; latecomers overflowed to a second room and participated by video conference. It was a little strange to see so many people interested in such an
&lt;strike&gt;obscure, troublesome and malignant&lt;/strike&gt; expressively powerful
programming construct; the breadth of talks made for a very inspiring and practical introduction to the theory, applications and implementation of continuations in many different languages.
&lt;/p&gt;
&lt;p&gt;
I recommend reading
&lt;a href="http://pllab.is.ocha.ac.jp/~asai/"&gt;Kenichi Asai&lt;/a&gt;'s
introduction to delimited continuations
(&lt;a href="http://pllab.is.ocha.ac.jp/~asai/papers/contfest08slide.pdf"&gt;slides&lt;/a&gt; [PDF]).
He introduced the &lt;tt&gt;shift&lt;/tt&gt; and &lt;tt&gt;reset&lt;/tt&gt; operators
through the problem of expressing exceptional control flow, and
then explained how to use these to type (i.e. determine a concrete type for)
&lt;tt&gt;printf&lt;/tt&gt;. The main point was that
&lt;tt&gt;shift/reset&lt;/tt&gt; provide a high-level abstraction over control flow, with minimal impact
on the implementation of your existing functions.
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://okmij.org/ftp/"&gt;Oleg Kiselyov&lt;/a&gt; demonstrated some new code for transactional
web applications, using delimited continuations for explicit state sharing between parallel connections. The result is that the user has a consistent view when multiple tabs are open on the same site, and the state is transactional so that there is no need for warnings like "Do not press the BUY button more than once!". He said that everyone already understands delimited continuations; they just don't realize it.
&lt;/p&gt;
&lt;p&gt;
The topic of my presentation at Continuation Fest was
&lt;b&gt;Continuations for video decoding and scrubbing&lt;/b&gt;:
&lt;/p&gt;
&lt;blockquote&gt;&lt;p&gt;
Playback of encoded video involves scheduling the decoding of audio and video frames and synchronizing their playback. "Scrubbing" is the ability to quickly seek to and display an arbitrary frame, and is a common user interface requirement for a video editor. The implementation of playback and scrubbing is complicated by data dependencies in compressed video formats, which require setup and manipulation of decoder state.
&lt;/p&gt;&lt;p&gt;
We present the preliminary design of a continuation-based system for video decoding, reified as a cursor into a stream of decoded video frames. Frames are decoded lazily, and decoder states are recreated or restored when seeking. To reduce space requirements, a sequence of decoded frames can be replaced after use by the continuation which created them.
&lt;strike&gt;We outline implementations in Haskell and C.&lt;/strike&gt;
&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://seq.kfish.org/~conrad/static/continuation-fest-2008/continuations-for-video.pdf"&gt;Slides&lt;/a&gt; [383KB PDF]&lt;/li&gt;
&lt;li&gt;&lt;a href="http://seq.kfish.org/~conrad/static/continuation-fest-2008/continuations-for-video.article.pdf"&gt;Article&lt;/a&gt; [215KB PDF]&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;

&lt;p&gt;I'll be introducing the code for this over the next few months.
Whereas in my presentation about
&lt;a href="http://blog.kfish.org/2008/03/bossa-2008-video-player-internals.html"&gt;video player internals&lt;/a&gt; at BOSSA I outlined the problem space in designing a multimedia architecture,
at Continuation Fest I tried to break it down into subproblems and considered
useful data structures and programming techniques for dealing with them.
&lt;/p&gt;
&lt;p&gt;
I got a lot of great feedback, and I think I succeeded in my mission to introduce this problem space to some really smart people.
Thanks particularly to
&lt;a href="http://www.cs.rutgers.edu/~ccshan"&gt;Chung-chieh Shan&lt;/a&gt; for some insightful ideas
about how to deal with existing stateful codec implementations. It was also very interesting to
talk with
&lt;a href="http://www.ie.u-ryukyu.ac.jp/~kono/index-e.html"&gt;Shinji Kono&lt;/a&gt; about
&lt;a href="http://sourceforge.jp/projects/cbc/"&gt;Continuation-based C (cBc)&lt;/a&gt;
(&lt;a href="http://www.ie.u-ryukyu.ac.jp/~kono/tmp/cf08-kono.tgz"&gt;slides&lt;/a&gt; [HTML tarball]),
a C-like language capable of expressing continuations, non-local jumps, multiple function entry-points, and assorted other ways to shoot yourself in the foot. He suggested that it was designed for exactly the kind of thing I'm doing, and I'll be interested to try it
out. It is implemented in a modified GCC 4.x as an RTL code generator, so should now be (fairly)
architecture-independent.
&lt;/p&gt;
&lt;p&gt;
Thanks to the organizers of Continuation Fest 2008 for putting together such a useful and interesting event. I look forward to implementing just some of the things I learned :-)
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/04/continuation-fest-2008-continuations.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-6484626782594667327</guid><pubDate>Fri, 11 Apr 2008 21:43:00 +0000</pubDate><atom:updated>2008-04-12T06:50:09.367+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>sweep</category><category domain='http://www.blogger.com/atom/ns#'>security</category><title>Release: Sweep 0.9.3</title><description>This is a bugfix release of &lt;a href="http://www.metadecks.org/software/sweep/"&gt;Sweep&lt;/a&gt;,
addressing &lt;a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2008-1686"&gt;CVE-2008-1686&lt;/a&gt;.
For details, see my earlier post about
&lt;a href="http://blog.kfish.org/2008/04/release-libfishsound-091.html"&gt;libfishsound 0.9.1&lt;/a&gt;.
Thanks to Peter Shorthose for managing this release.</description><link>http://blog.kfish.org/2008/04/release-sweep-093.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-4753778705188094077</guid><pubDate>Mon, 07 Apr 2008 01:08:00 +0000</pubDate><atom:updated>2008-04-07T12:29:34.660+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><category domain='http://www.blogger.com/atom/ns#'>fishsound</category><category domain='http://www.blogger.com/atom/ns#'>security</category><title>Release: libfishsound 0.9.1</title><description>&lt;p&gt;
This is a maintenance release, fixing a security vulnerability in Speex header processing as outlined in &lt;a href="http://www.ocert.org/advisories/ocert-2008-2.html"&gt;oCERT 2008-02&lt;/a&gt;.
When used in a client for web video content, as in the
&lt;a href="http://www.annodex.net/"&gt;OggPlay Firefox Plugin&lt;/a&gt; or the
&lt;a href="http://www.illiminable.com/ogg/"&gt;Ogg DirectShow filters&lt;/a&gt;, a specially crafted Ogg Speex stream hosted on a server could be used to allow an attacker to execute arbitrary code on the client system. The OggPlay plugin binaries available from &lt;a href="http://www.annodex.net/"&gt;www.annodex.net&lt;/a&gt; have already been updated.
&lt;/p&gt;
&lt;h4&gt;Details&lt;/h4&gt;
&lt;p&gt;
The &lt;a href="http://wiki.xiph.org/OggSpeex"&gt;Speex header&lt;/a&gt; contains a 32-bit &lt;tt&gt;modeID&lt;/tt&gt; field, interpreted by libspeex as a signed int (&lt;tt&gt;spx_int32_t&lt;/tt&gt;).
The normal way to use this is to index into a global mode list to retrieve a SpeexMode *:
&lt;blockquote&gt;&lt;pre&gt;
mode = (SpeexMode *)speex_mode_list[modeID];
&lt;/pre&gt;&lt;/blockquote&gt;

and then use that to set up a decoder:
&lt;blockquote&gt;&lt;pre&gt;
st = speex_decoder_init(mode);
&lt;/pre&gt;&lt;/blockquote&gt;

This calls &lt;tt&gt;speex_decoder_init()&lt;/tt&gt; in libspeex, which looks like:

&lt;blockquote&gt;&lt;pre&gt;
void *speex_decoder_init(const SpeexMode *mode)
{
   return mode-&gt;dec_init(mode);
}
&lt;/pre&gt;&lt;/blockquote&gt;

So if you don't check that the &lt;tt&gt;modeID&lt;/tt&gt; given in the stream header is within the bounds of &lt;tt&gt;speex_mode_list[]&lt;/tt&gt;, arbitrary code can be executed.
&lt;tt&gt;libfishsound&lt;/tt&gt; was checking the upper bound (&lt;tt&gt;modeID &amp;lt; SPEEX_NB_MODES&lt;/tt&gt;) but was not checking against negative values.
&lt;/p&gt;
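&lt;p&gt;
A minimal sketch of checking both bounds before indexing, for illustration only. The names &lt;tt&gt;sketch_mode_rates&lt;/tt&gt;, &lt;tt&gt;SKETCH_NB_MODES&lt;/tt&gt; and &lt;tt&gt;sketch_mode_rate()&lt;/tt&gt; are hypothetical stand-ins, not the libspeex or libfishsound code, and the table holds sample rates rather than &lt;tt&gt;SpeexMode&lt;/tt&gt; pointers to keep the example self-contained:
&lt;/p&gt;
&lt;blockquote&gt;&lt;pre&gt;
```c
/* Hypothetical stand-ins for this sketch, not the libspeex definitions. */
#define SKETCH_NB_MODES 3

/* Sample rate in Hz for each mode: narrowband, wideband, ultra-wideband. */
static const int sketch_mode_rates[SKETCH_NB_MODES] = { 8000, 16000, 32000 };

/*
 * Look up the sample rate for a mode ID read from an untrusted header.
 * The ID is signed, so a negative value must be rejected as well as one
 * that is too large. Returns -1 if mode_id is out of range.
 */
int sketch_mode_rate(int mode_id)
{
    if (0 > mode_id || mode_id >= SKETCH_NB_MODES)
        return -1;
    return sketch_mode_rates[mode_id];
}
```
&lt;/pre&gt;&lt;/blockquote&gt;
&lt;p&gt;
With both checks in place, &lt;tt&gt;sketch_mode_rate(-1)&lt;/tt&gt; and &lt;tt&gt;sketch_mode_rate(3)&lt;/tt&gt; return -1 instead of reading outside the table.
&lt;/p&gt;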
&lt;h4&gt;Discussion&lt;/h4&gt;
&lt;p&gt;
This header processing is all boilerplate, and a reference implementation is given in
&lt;a href="http://svn.xiph.org/trunk/speex/src/speexdec.c"&gt;speexdec.c&lt;/a&gt;.
I took a copy of that about 7 years ago for
&lt;a href="http://www.metadecks.org/software/sweep/"&gt;Sweep&lt;/a&gt;, which I then adapted for libfishsound. The current reference speexdec.c does not have this bug.
&lt;/p&gt;
&lt;p&gt;
For the Symbian port of Speex we created a function which returns the desired mode given a modeID, rather than having application code index into a global mode list.
I wrote and committed &lt;a href="https://trac.xiph.org/changeset/7511"&gt;speex_get_mode()&lt;/a&gt;
to libspeex in September 2004, and it does the correct bounds checking.
So if I'd been using that function in libfishsound then today's problem would never have happened. As it turns out, the libfishsound svn trunk version of
&lt;a href="http://svn.annodex.net/libfishsound/trunk/src/libfishsound/speex.c"&gt;speex.c&lt;/a&gt;
does use that function. As far as I am aware, the OggPlay plugin binaries have always been built against the libfishsound svn trunk, so they were never vulnerable in the first place. However, recent tarball releases of libfishsound have been coming off a separate branch, so the advisory is valid for applications linked against those releases.
&lt;/p&gt;  
&lt;p&gt;
Finally, I sent a patch to 
&lt;a href="http://people.xiph.org/~jm/"&gt;Jean-Marc Valin&lt;/a&gt; yesterday which entirely removes the possibility of this bug happening again by bounding the mode values returned by &lt;tt&gt;speex_packet_to_header()&lt;/tt&gt; in libspeex. It will be available very soon in a libspeex release.
&lt;/p&gt;
&lt;h4&gt;Acknowledgements&lt;/h4&gt;
&lt;p&gt;
Thanks to the team at &lt;a href="http://www.ocert.org/"&gt;oCERT&lt;/a&gt; for the efficient reporting of this advisory, and to the anonymous submitter for the details.
I was able to patch the offending branches, which allowed 
&lt;a href="http://v2v.cc/~j/"&gt;j^&lt;/a&gt; to build and upload new OggPlay plugin binaries (within 24 hours of contact by oCERT).
&lt;/p&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://lists.xiph.org/pipermail/speex-dev/2008-April/006636.html"&gt;libfishsound 0.9.1 release notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.ocert.org/advisories/ocert-2008-2.html"&gt;oCERT 2008-02&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/04/release-libfishsound-091.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-8395424576443155275</guid><pubDate>Mon, 24 Mar 2008 18:23:00 +0000</pubDate><atom:updated>2008-03-25T05:22:20.038+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>haskell</category><category domain='http://www.blogger.com/atom/ns#'>ogg</category><title>Release: HOgg 0.4.0</title><description>&lt;p&gt;
&lt;a href="http://www.vergenet.net/~conrad/software/hogg/"&gt;HOgg&lt;/a&gt;
is a Haskell library and commandline tool for manipulating Ogg files.
This release contains a bunch of code written during &lt;a href="http://blog.kfish.org/2008/02/foms-lca-2008-roundup.html"&gt;FOMS and LCA 2008&lt;/a&gt;, including
a new sort subcommand and proper handling of Skeleton when merging and ripping files. Full details are in the
&lt;a href="http://www.vergenet.net/~conrad/software/hogg/release_notes/hogg-0.4.0.txt"&gt;release notes&lt;/a&gt;.
&lt;/p&gt;

&lt;h3&gt;sort implementation&lt;/h3&gt;
&lt;p&gt;
My favourite part is the implementation of the new &lt;tt&gt;sort&lt;/tt&gt; subcommand:
&lt;blockquote&gt;
&lt;pre&gt;
sort :: [OggPage] -&gt; [OggPage]
sort = sortHeaders . listMerge . demux
&lt;/pre&gt;
&lt;/blockquote&gt;
&lt;/p&gt;
&lt;p&gt;
This is somewhat shorter than the equivalent C implementation,
&lt;a href="http://svn.annodex.net/liboggz/trunk/src/tools/oggz-sort.c"&gt;oggz-sort.c&lt;/a&gt; &amp;mdash;
&lt;b&gt;Haskell affords abstraction whereas in C it's a trade-off&lt;/b&gt;.
&lt;tt&gt;sortHeaders&lt;/tt&gt; is a long (21 line) function that re-orders header pages according to
the Theora and Skeleton specifications, and &lt;tt&gt;listMerge&lt;/tt&gt; is a generic list merging function, also used in the &lt;tt&gt;merge&lt;/tt&gt; subcommand. &lt;tt&gt;demux&lt;/tt&gt; is tiny:
&lt;blockquote&gt;
&lt;pre&gt;
demux :: (Serialled a) =&gt; [a] -&gt; [[a]]
demux = classify serialEq
&lt;/pre&gt;
&lt;/blockquote&gt;
You can read that as "demux is classification by serial number": &lt;tt&gt;classify&lt;/tt&gt; is a generic list function, classifying list elements according to some criterion you give it. Here, for example, the list of pages:
&lt;blockquote&gt;
&lt;tt&gt;[Video0, Audio0, Video1, Audio1, Audio2, Audio3, Video2, Audio4, Video3, ...]&lt;/tt&gt;
&lt;/blockquote&gt;
will get classified into two separate lists:
&lt;blockquote&gt;
&lt;pre&gt;
[[Video0, Video1, Video2, Video3, ...],
 [Audio0, Audio1, Audio2, Audio3, Audio4, ...]]
&lt;/pre&gt;
&lt;/blockquote&gt;
This is done lazily, meaning that the processing is done on the fly and big intermediate lists are not constructed in memory. &lt;tt&gt;Video0&lt;/tt&gt;, &lt;tt&gt;Audio0&lt;/tt&gt; will be passed through &lt;tt&gt;listMerge&lt;/tt&gt; and &lt;tt&gt;sortHeaders&lt;/tt&gt; and written to disk by the consumer of &lt;tt&gt;sort&lt;/tt&gt; well before &lt;tt&gt;Video103&lt;/tt&gt; and &lt;tt&gt;Audio5007&lt;/tt&gt; are seen.
&lt;/p&gt;
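&lt;p&gt;
For comparison, a rough C sketch of the same classification by serial number. It is eager rather than lazy and is not taken from liboggz; the names &lt;tt&gt;sketch_demux()&lt;/tt&gt; and &lt;tt&gt;SKETCH_MAX_TRACKS&lt;/tt&gt;, and the fixed limit on the number of tracks, are assumptions made for the example:
&lt;/p&gt;
&lt;blockquote&gt;&lt;pre&gt;
```c
/* Hypothetical limit on the number of logical bitstreams in one file. */
#define SKETCH_MAX_TRACKS 8

/*
 * Classify pages by serial number. serials[i] is the serial number of
 * page i; track_of[i] receives the index of the track that page belongs
 * to, with tracks numbered in order of first appearance.
 * Returns the number of distinct tracks found, or -1 if there are more
 * than SKETCH_MAX_TRACKS.
 */
int sketch_demux(const long *serials, int npages, int *track_of)
{
    long seen[SKETCH_MAX_TRACKS];
    int ntracks = 0;
    int i, t;

    for (i = 0; i != npages; i++) {
        /* Look for a track that already has this serial number. */
        for (t = 0; t != ntracks; t++) {
            if (seen[t] == serials[i])
                break;
        }
        if (t == ntracks) {
            /* First page with this serial number: start a new track. */
            if (ntracks == SKETCH_MAX_TRACKS)
                return -1;
            seen[ntracks++] = serials[i];
        }
        track_of[i] = t;
    }
    return ntracks;
}
```
&lt;/pre&gt;&lt;/blockquote&gt;
&lt;p&gt;
Unlike the Haskell version, this needs the whole input up front and a caller-supplied output array, which is the kind of trade-off mentioned above.
&lt;/p&gt;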

&lt;h3&gt;Documentation improvements and self-checking&lt;/h3&gt;
&lt;p&gt;
The help for each subcommand now contains long descriptions, mostly similar to the man pages of the
&lt;a href="http://www.annodex.net/software/liboggz/index.html"&gt;&lt;tt&gt;Oggz&lt;/tt&gt;&lt;/a&gt; tools.
The descriptions also have explicit sections describing how Theora, Skeleton and chained files are handled.
The example commandlines for each subcommand use the
&lt;a href="http://wiki.xiph.org/index.php/MIME_Types_and_File_Extensions"&gt;Ogg MIME types and file extensions&lt;/a&gt; that we are now recommending in Xiph.Org.
&lt;/p&gt;
&lt;p&gt;
The best bit though is &lt;tt&gt;hogg selfcheck&lt;/tt&gt;, which checks that the help examples are valid.
It checks that all the example commandlines pass through getOpt without errors, and that all file extensions used in options are valid. This is the kind of nice touch which would have been a pain to code up in C, but fell out cleanly in the Haskell implementation. As it is fairly cheap to run (and printing help text is hardly a performance-critical operation), this option is also silently run after printing out any help output at all, so that such errors are more likely to be found
and reported. The same commit that introduced &lt;tt&gt;hogg selfcheck&lt;/tt&gt; also fixed two such documentation errors which were found by this option :-)
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/03/release-hogg-040.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-2394818974149938437</guid><pubDate>Mon, 24 Mar 2008 18:17:00 +0000</pubDate><atom:updated>2008-03-25T05:16:49.585+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>xsel</category><title>Release: xsel 1.2.0</title><description>&lt;p&gt;
&lt;a href="http://www.kfish.org/software/xsel/"&gt;XSel&lt;/a&gt; is a command-line tool for manipulating the X selection.
This is a maintenance release, improving argument handling, documentation and X11 library detection.
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.vergenet.net/~conrad/software/xsel/download/xsel-1.2.0.tar.gz"&gt;xsel-1.2.0.tar.gz&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="http://svn.kfish.org/xsel/trunk/release_notes/xsel-1.2.0.txt"&gt;Release notes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.kfish.org/2008/03/release-xsel-120.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-3484153005411885323</guid><pubDate>Mon, 24 Mar 2008 17:29:00 +0000</pubDate><atom:updated>2008-03-25T03:15:30.394+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>bossa</category><title>BOSSA 2008: Video Player Internals</title><description>&lt;p&gt;Last week I attended
&lt;a href="http://www.bossaconference.indt.org/"&gt;BOSSA&lt;/a&gt;, a conference on open source software
for mobile embedded platforms, organized by &lt;a href="http://www.indt.org.br/"&gt;INdT&lt;/a&gt;. It was held in the town of Porto de Galinhas, Brazil.
Since then I have been hanging out in the INdT labs in Recife, hacking on xine, catching up with friends and exploring the old city.
&lt;/p&gt;&lt;p&gt;
The topic of my presentation at BOSSA was &lt;b&gt;Video Player Internals&lt;/b&gt;:
&lt;blockquote&gt;
Embedded platforms put demands on latency and memory use. Video playback
makes these difficult to guarantee. This presentation discusses the
architecture of video players, and the problems imposed on them by the
design of video codecs and their containers. To explain these problems
we look at both proprietary and open source formats (MPEG, Ogg, Theora,
Dirac, etc.) and evaluate open source video players in this context.
We particularly examine xine and GStreamer, and introduce the minimal
architecture of OggPlay.
&lt;/blockquote&gt;
&lt;/p&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://seq.kfish.org/~conrad/static/bossa-2008/video-player-internals.pdf"&gt;Slides&lt;/a&gt; [613KB PDF]&lt;/li&gt;
&lt;li&gt;&lt;a href="http://seq.kfish.org/~conrad/static/bossa-2008/video-player-internals.article.pdf"&gt;Article&lt;/a&gt; [330KB PDF]&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;
&lt;p&gt;
I'm very grateful to INdT for the opportunity to attend; it was an awesome conference in a very beautiful part of the world.
&lt;/p&gt;
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/03/bossa-2008-video-player-internals.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-4341732321478649420</guid><pubDate>Fri, 15 Feb 2008 08:54:00 +0000</pubDate><atom:updated>2008-02-15T18:24:33.429+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><title>Release: liboggz 0.9.7</title><description>&lt;p&gt;
There's been a whole bunch of work on
&lt;a href="http://www.annodex.net/software/liboggz/index.html"&gt;liboggz&lt;/a&gt; recently; it deserves a few more weeks of
shaking out and perhaps some updated Win32/MacOS support before it gets 1.0 slapped on it.
&lt;/p&gt;
&lt;p&gt;
&lt;a href="http://lists.xiph.org/pipermail/ogg-dev/2008-February/000847.html"&gt;liboggz 0.9.7&lt;/a&gt;
includes a new tool called oggz-sort, which addresses a problem with some encoders that
Shane Stephens brought up at
&lt;a href="http://www.annodex.org/events/foms2008/pmwiki.php/Main/Proceedings"&gt;FOMS&lt;/a&gt;. The
discussion was going around in circles, so my response was to write this C code. It implements a function that Shane has written but not yet released in his OCaml implementation of Ogg
(&lt;a href="http://svn.annodex.net/oogg/trunk/"&gt;oogg&lt;/a&gt;), and
which I've written but not yet released in my Haskell implementation (&lt;a href="http://www.kfish.org/software/hogg/"&gt;HOgg&lt;/a&gt;). Of course, people will take this version more seriously because it's written in C.
&lt;/p&gt;
&lt;p&gt;
From &lt;tt&gt;&lt;b&gt;oggz-sort (1)&lt;/b&gt;&lt;/tt&gt;:
&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;oggz-sort&lt;/b&gt; sorts an Ogg file, interleaving pages in order  of  presentation  time.  It  correctly  interprets the granulepos timestamps of Ogg
Vorbis, Speex, FLAC and Theora bitstreams, and all bitstreams  of  Annodex files.
&lt;/p&gt;&lt;p&gt;
Some  encoders produce files with incorrect page ordering; for example,
some audio and video pages may occur out of order. Although these files
are  usually  playable, it can be difficult to accurately seek or scrub
on them, increasing the likelihood of glitches during playback. Players
may  also need to use more memory in order to buffer the audio and
video data for synchronized playback, which can be a problem  when  the
files are viewed on low-memory devices.
&lt;/p&gt;&lt;p&gt;
The  tool  &lt;b&gt;oggz-validate&lt;/b&gt;  can be used to check the relative ordering of
packets in a file. If out of order packets are reported, use &lt;b&gt;oggz-sort&lt;/b&gt;
to fix the problem.
&lt;/p&gt;
&lt;/blockquote&gt;
&lt;/p&gt;
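&lt;p&gt;
The interleaving step can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the oggz-sort source: each track is assumed to already carry its page times as plain integers in increasing order (the real tool has to derive them from granulepos), and the names &lt;tt&gt;sketch_track&lt;/tt&gt;, &lt;tt&gt;sketch_next_track()&lt;/tt&gt; and &lt;tt&gt;sketch_interleave()&lt;/tt&gt; are hypothetical:
&lt;/p&gt;
&lt;blockquote&gt;&lt;pre&gt;
```c
/* One logical bitstream: page times in milliseconds, already in order. */
struct sketch_track {
    const long *times;  /* presentation time of each page */
    int count;          /* number of pages in the track */
    int next;           /* index of the next page not yet written */
};

/* Presentation time of the next unwritten page of a track. */
static long sketch_head_time(const struct sketch_track *t)
{
    return t->times[t->next];
}

/*
 * Pick the track whose next page has the earliest presentation time.
 * Returns the track index, or -1 when every track is exhausted.
 * Ties go to the lower-numbered track.
 */
int sketch_next_track(const struct sketch_track *tracks, int ntracks)
{
    int best = -1;
    int i;

    for (i = 0; i != ntracks; i++) {
        const struct sketch_track *t = tracks + i;
        if (t->next == t->count)
            continue;  /* no pages left in this track */
        if (best == -1 || sketch_head_time(tracks + best) > sketch_head_time(t))
            best = i;
    }
    return best;
}

/*
 * Write the track index of each page to out[] in presentation order.
 * out must have room for the total number of pages in all tracks.
 * Returns the number of pages written.
 */
int sketch_interleave(struct sketch_track *tracks, int ntracks, int *out)
{
    int written = 0;
    int which;

    while ((which = sketch_next_track(tracks, ntracks)) != -1) {
        out[written++] = which;
        tracks[which].next++;
    }
    return written;
}
```
&lt;/pre&gt;&lt;/blockquote&gt;
&lt;p&gt;
For a video track with pages at 0, 40 and 80 ms and an audio track with pages at 0, 20, 40 and 60 ms, this writes the pages in the order video, audio, audio, video, audio, audio, video.
&lt;/p&gt;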
&lt;p&gt;
This release also adds support for the experimental
&lt;a href="http://lists.xiph.org/pipermail/ogg-dev/2007-December/000706.html"&gt;CELT&lt;/a&gt; audio codec, which is being developed
by Jean-Marc Valin (the primary author of &lt;a href="http://www.speex.org/"&gt;Speex&lt;/a&gt;). CELT is
designed as a low-latency codec for high-quality audio. When wiretapping conversations
encoded in CELT, we recommend that you record using the Ogg container format. You can then use oggz-tools to help with your analysis.
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/02/release-liboggz-097.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-7742603208750161824</guid><pubDate>Sat, 09 Feb 2008 23:30:00 +0000</pubDate><atom:updated>2008-02-10T10:55:09.111+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>xsel</category><title>Release: xsel 1.1.0</title><description>&lt;p&gt;
This is a story about the meaning of "version 1.0".
A few weeks ago I released
&lt;a href="http://blog.kfish.org/2008/01/release-xsel-100.html"&gt;version 1.0 of xsel&lt;/a&gt;, a simple commandline utility
for manipulating the X selection and clipboard.
I chose to call it 1.0 after recalling a discussion with
&lt;a href="http://www.algorithm.com.au/"&gt;Andr&amp;eacute; Pang&lt;/a&gt;,
about how the meaning of version numbers in open source software tends to differ from that in other software communities. For example, it is often advised not to buy the first version of a proprietary software product as it is sure to be buggy and incomplete; open source projects on the other hand often aspire to 1.0 being a major milestone, bug-free and fully-functional. The Windows and Mac freeware and shareware communities tend to follow a middle ground, content to release a useful but incomplete version 1.0, but thereafter avoiding the quick version creep that afflicts companies with marketing departments (and version-limited support contracts).
&lt;/p&gt;
&lt;p&gt;
I'll argue that that middle way makes for more meaningful version numbers. Putting the label "1.0" on a release should be your way of saying that it's the first version that:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;won't hose a user's system, and&lt;/li&gt;
&lt;li&gt;hopefully does something useful.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;
Any version number less than 1.0 is sending out a signal that the software isn't quite ready yet; perhaps that you could lose or damage data by using it. Many people intuitively wait for version 1.0 before trying out some software, and this is fair enough. In fact, we &lt;i&gt;need&lt;/i&gt; a way of warning that a project isn't ready for widespread adoption, that it could damage data, that the tarball is only out there so that other people can grab the code and help fix bugs. That's what version numbers less than 1.0 mean.
&lt;/p&gt;
&lt;p&gt;
After 1.0, you can keep adding features and bumping the version number, working towards version 2.0 which perhaps does useful things in a different way. And from 2.0, onwards to 3.0 and beyond; integers are cheap. The important thing is not to fall into the trap of thinking of 1.0 as some kind of asymptotic upper bound representing the perfect release.
&lt;/p&gt;
&lt;p&gt;
Back to xsel. At first I wrote up release notes as version 0.9.7, but
then remembered that discussion with Andr&amp;eacute; and realized that it should really
just be 1.0.
More to the point the previous release (in July 2001, which went five years without a bug report or patch) should have been 1.0. So yeah, it was a good feeling to just write "1.0" and send it out.
&lt;/p&gt;
&lt;p&gt;
The morning after releasing 1.0 I got a report from someone who couldn't get
it to compile -- turns out they didn't have the X11 development libraries
installed, and for some reason I had commented out the checks for that
in &lt;tt&gt;configure.ac&lt;/tt&gt; while testing something or other a while ago.
As a result, the configure script wasn't checking for its only dependency.
I considered doing a canonical 1.0.1 (LOL) release. Within the next day, though,
I got a report about how to fix handling of COMPOUND_TEXT, an archaic way of
handling international text (since superseded by UTF8_STRING). And there follows
the next lesson (as berated by &lt;a href="http://www.rasterman.com/"&gt;Raster&lt;/a&gt;): random
bug reports stream in &lt;em&gt;after&lt;/em&gt; a release, not before.
&lt;a href="http://www.mega-nerd.com/"&gt;Erik de Castro Lopo&lt;/a&gt; is up to his 20th
pre-release of libsndfile 1.0.18; each pre-release he gets bombarded with reports.
&lt;/p&gt;
&lt;p&gt;
Anyway, the post-1.0 bug reports have died down, so today I'm releasing version 1.1.0 of
&lt;a href="http://www.vergenet.net/~conrad/software/xsel"&gt;xsel&lt;/a&gt;.
&lt;i&gt;"This release adds basic support for COMPOUND_TEXT and fixes a configuration bug"&lt;/i&gt;.
And I'm still waiting to hear good uses of &lt;b&gt;&lt;tt&gt;xsel --append&lt;/tt&gt;&lt;/b&gt; and
&lt;b&gt;&lt;tt&gt;xsel --follow&lt;/tt&gt;&lt;/b&gt;.
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/02/release-xsel-110.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-663768690450268455</guid><pubDate>Fri, 08 Feb 2008 05:58:00 +0000</pubDate><atom:updated>2008-02-08T16:49:39.606+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>linux.conf.au</category><category domain='http://www.blogger.com/atom/ns#'>foms</category><title>FOMS, LCA Multimedia 2008: Videos</title><description>&lt;p&gt;
I arrived back in Japan after a few awesome weeks in Australia for
&lt;a href="http://www.annodex.org/events/foms2008/pmwiki.php/Main/HomePage"&gt;FOMS&lt;/a&gt; and &lt;a href="http://linux.conf.au/"&gt;LCA&lt;/a&gt;. The weather in Melbourne was great, and the food was fantastic.
&lt;/p&gt;
&lt;p&gt;
Between FOMS and LCA, dozens of free multimedia software developers were in town. It was the first time that developers of Dirac, Speex, Theora, Vorbis, Ogg, and most of the Annodex crew were all in the same place, so we spent most of the week of LCA holed up in a room designing content description and packaging formats. One immediate outcome will be finalization of the Dirac mapping into the Ogg container.
&lt;/p&gt;
&lt;p&gt;
I organised the multimedia miniconf on the Monday of LCA, which was jam-packed with excellent presentations and lightning talks. Thanks to everyone who came, and talked, and video recorded. There were plenty of comments along the lines of it being "pretty hardcore for a miniconf".
If you are interested in helping with next year's LCA Multimedia, or have friends in Hobart who might be able to help, let's start throwing around ideas. In particular, quite a few people asked what happened to the audio miniconf parties from a few years ago, and it might be a good chance to revive those ...
&lt;/p&gt;
&lt;h4&gt;Videos&lt;/h4&gt;
&lt;p&gt;
The following pages contain embedded videos of the presentations from these events, and the multimedia-related presentations from LCA:
&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.annodex.org/events/foms2008/pmwiki.php/Main/Proceedings"&gt;FOMS Proceedings&lt;/a&gt;: introductions by the participants&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.annodex.org/events/lca2008_mmm/pmwiki.php/Main/Schedule"&gt;LCA Multimedia&lt;/a&gt;: Dirac, Xiph, EngageMedia, FFADO and many others&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.annodex.org/events/lca2008_mmm/pmwiki.php/Main/LCA"&gt;Multimedia talks @LCA&lt;/a&gt;: PulseAudio, Ogg, Theora, Telepathy, Farsight ...&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;
The videos on these pages are embedded with &lt;a href="http://metavid.ucsc.edu/wiki/index.php/Mv_embed"&gt;mv_embed&lt;/a&gt;, which supports playback via the &lt;a href="http://www.annodex.net/"&gt;OggPlay plugin for Firefox&lt;/a&gt;, vlc-plugin or generic application/ogg.
mv_embed is a JavaScript library by Michael Dale of &lt;a href="http://metavid.org/"&gt;MetaVid&lt;/a&gt;. It is really easy to use: you just include that library (&lt;tt&gt;&amp;lt;script src="..."&amp;gt;&lt;/tt&gt;) and then write &lt;tt&gt;&amp;lt;video src="..."&amp;gt;&lt;/tt&gt; anywhere in your page. No need to wait for native HTML5 support in your browser :-)
&lt;/p&gt;
&lt;/blockquote&gt;</description><link>http://blog.kfish.org/2008/02/foms-lca-2008-roundup.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-1253829388926355286</guid><pubDate>Sun, 13 Jan 2008 05:11:00 +0000</pubDate><atom:updated>2008-01-13T14:54:16.500+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><title>Release: liboggz 0.9.6</title><description>&lt;p&gt;
This release of
&lt;a href="http://lists.xiph.org/pipermail/ogg-dev/2008-January/000717.html"&gt;Oggz 0.9.6&lt;/a&gt; contains a new tool, &lt;b&gt;&lt;tt&gt;oggz-comment&lt;/tt&gt;&lt;/b&gt;, which can be used to edit the basic metadata (title, producer, copyright etc.) of Ogg Theora files.
The library also has some pretty major improvements to the way it works out timestamps and does seeking, mostly the work of Shane Stephens.
&lt;/p&gt;
&lt;p&gt;
In media files, timing and synchronization are extremely important. If the image and audio start to go out of sync, it is very noticeable and the video quickly becomes unwatchable. When you scan through a file you often need to decode a lot more data than you actually display. This is particularly the case when you jump backwards, which is common in a user interface that supports scrubbing. As video frames are stored as a difference relative to earlier (or later) frames, you end up needing to secretly jump further back in the file to the previous keyframe, and then decode many frames up to the one you actually want to show. For a smooth user experience you need to do this as quickly as possible.
&lt;/p&gt;
&lt;p&gt;
Ogg has some interesting framing properties. Given that timing is so important, you might expect that every packet has its precise timing information associated with it. In Ogg, it turns out not to be so. Packets are stored in pages, and there is only one timestamp per page. It is common for many audio packets to be crammed onto one page; the timing information for all the rest is not stored in the file. On the other hand, the encoded data for video keyframes is usually much larger, and spans multiple pages. Only the last packet on a page has its timestamp recorded, so if the keyframe is followed by a much smaller packet of frame data in the same page, the timestamp for the keyframe will be lost. For these reasons I tend to refer to Ogg as a "lossy" container.
&lt;/p&gt;
&lt;p&gt;
In order to minimize these problems, liboggz now inspects the encoded data in order to reconstruct the expected granulepos (corresponding to a timestamp) for every packet in an Ogg stream. This allows applications to use reliable timestamps, even though these are only sparsely recorded in most Ogg streams.
This is not as easy as it sounds, particularly for Ogg Vorbis.
To get a flavour of what's involved, read Shane's rant in the comments, explaining how to 
&lt;a href="http://trac.annodex.net/browser/liboggz/trunk/src/liboggz/oggz_auto.c#L468"&gt;calculate Vorbis timestamps&lt;/a&gt;.
&lt;/p&gt;
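&lt;p&gt;
As a rough sketch of the idea (this is not liboggz's actual code, which is written in C and has to handle codec-specific details): suppose a page records the granulepos of its last packet only, and that the duration of each packet on the page, in samples, can be worked out from the encoded data. The positions of the earlier packets then follow by working backwards. The function name &lt;tt&gt;backfill&lt;/tt&gt; and the example numbers are made up for illustration, and real Vorbis streams need more care than this:
&lt;/p&gt;
&lt;pre&gt;
-- Hypothetical helper, for illustration only.
-- Given the granulepos recorded for the last packet on a page, and the
-- duration in samples of every packet on that page (in order), return a
-- granulepos for each packet by subtracting the durations that follow it.
backfill :: Integer -&gt; [Integer] -&gt; [Integer]
backfill lastGranule durations =
  map (lastGranule -) (tail (scanr (+) 0 durations))

-- backfill 1000 [100, 200, 300] == [500, 700, 1000]
&lt;/pre&gt;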
&lt;p&gt;
For an in-depth discussion, come to Ralph Giles' talk at linux.conf.au,
&lt;a href="http://linux.conf.au/programme/detail?TalkID=68"&gt;Seeking is hard: Ogg design internals&lt;/a&gt;.
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/01/release-liboggz-096.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-2578249400119137180</guid><pubDate>Sat, 12 Jan 2008 15:53:00 +0000</pubDate><atom:updated>2008-01-13T01:29:18.168+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>xsel</category><title>Release: xsel 1.0.0</title><description>&lt;p&gt;
&lt;a href="http://www.vergenet.net/~conrad/software/xsel/"&gt;XSel&lt;/a&gt; is a command-line program for getting and setting the contents of the X selection. You can use &lt;tt&gt;xsel &lt;/tt&gt;in shell scripts and desktop keybindings, so that the contents of the X selection are available to command arguments:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;b&gt;&lt;tt&gt;mozilla --remote "openurl(`xsel`)"&lt;/tt&gt;&lt;/b&gt;
&lt;/blockquote&gt;

&lt;p&gt;
This release adds UTF-8 support and fixes various bugs. The last version of XSel was 0.9.6, released sometime around 2001. It may have been the first version also. For some reason a bunch of patches came in recently, and I've had the joy of revisiting this project.
&lt;/p&gt;
&lt;p&gt;
For old time's sake, my
&lt;a href="http://lists.slug.org.au/archives/slug-chat/2001/July/msg00054.html"&gt;thoughts on ICCCM&lt;/a&gt;. (Warning: explicit language).
Back then I made a point of implementing as much of that crack as possible. You can even tell applications to delete their selected text:
&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;To delete the contents of the selection: &lt;b&gt;&lt;tt&gt;xsel --delete&lt;/tt&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;
(This really works: you can try it on &lt;tt&gt;xedit&lt;/tt&gt; to remotely delete text in the editor window.)
&lt;/p&gt;

&lt;p&gt;
This time around, of course, nothing does what the docs say anymore.
So we ignore the details in the 2001 proposal for Inter-Client
Exchange of Unicode Text and just grunt atoms at the selection owner
until they yield all their secrets. And now, finally, &lt;tt&gt;xsel&lt;/tt&gt; works on
Japanese.
&lt;/p&gt;
&lt;p&gt;
&lt;em&gt;People have come up with some interesting uses for &lt;tt&gt;xsel&lt;/tt&gt; over the years, but nobody has yet come up with a nifty use for the following options:&lt;/em&gt;
&lt;/p&gt;
&lt;blockquote&gt;
&lt;ul&gt;
&lt;li&gt;To append to the X selection: &lt;b&gt;&lt;tt&gt;xsel --append &amp;lt; file&lt;/tt&gt;&lt;/b&gt;&lt;/li&gt;
&lt;li&gt;To follow a growing file: &lt;b&gt;&lt;tt&gt;xsel --follow &amp;lt; file&lt;/tt&gt;&lt;/b&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;
&lt;p&gt;
Any ideas?
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/01/release-xsel-100.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-2095151563909900908</guid><pubDate>Sat, 12 Jan 2008 10:34:00 +0000</pubDate><atom:updated>2008-01-12T20:31:17.896+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><category domain='http://www.blogger.com/atom/ns#'>flac</category><title>Release: libfishsound 0.9.0</title><description>&lt;p&gt;
Now &lt;a href="http://lists.xiph.org/pipermail/flac-dev/2008-January/002472.html"&gt;libfishsound 0.9.0&lt;/a&gt; supports
&lt;a href="http://flac.sourceforge.net/"&gt;FLAC&lt;/a&gt;, the Free Lossless Audio Codec.
The &lt;a href="http://www.annodex.net/software/libfishsound/libfishsound-flac/"&gt;patches&lt;/a&gt;
were originally contributed by Tobias Gehrig in 2004. There hasn't been much use of Ogg FLAC, whereas FLAC in its native encoding is very popular. However, the point of the Ogg mapping is to allow FLAC to be used in parallel with other codecs, in particular as the audio codec for video files.
The combination of Theora video and FLAC audio can be very useful for music videos, where you might not care too much if the image has lost some quality but you want the sound to be as good as possible.
&lt;/p&gt;
&lt;p&gt;
However, creating such a file isn't so easy. Let's say you have a source video, like
&lt;a href="http://www.archive.org/details/gtv204_jacobfredjazzodyssey"&gt;GrooveTV #204 - Jacob Fred Jazz Odyssey&lt;/a&gt;. I took the MPEG-1 file as recommended; for clarity, let's call it &lt;tt&gt;source.mpg&lt;/tt&gt;. To make a video to test on, I did:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
&lt;b&gt;&lt;tt&gt;ffmpeg2theora source.mpg&lt;/tt&gt;&lt;/b&gt;&lt;br/&gt;
to encode the video into an Ogg file containing Theora video and Vorbis audio. This produces &lt;b&gt;&lt;tt&gt;source.ogv&lt;/tt&gt;&lt;/b&gt;.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;&lt;tt&gt;oggzrip -c theora source.ogv -o video-theora.ogv&lt;/tt&gt;&lt;/b&gt;&lt;br/&gt;
to extract only the Theora video track, into &lt;b&gt;&lt;tt&gt;video-theora.ogv&lt;/tt&gt;&lt;/b&gt;.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;&lt;tt&gt;mpg123 -w source.wav source.mpg&lt;/tt&gt;&lt;/b&gt;&lt;br/&gt;
to extract the audio to a wav file, &lt;b&gt;&lt;tt&gt;source.wav&lt;/tt&gt;&lt;/b&gt;. Here the audio in the source material was encoded as MPEG-1 Layer II; obviously if you were producing a music video, you'd skip this step and encode FLAC from the original recording. I didn't have that here, and I just wanted a file I could test on.
&lt;/p&gt;&lt;p&gt;
However, at the least this step means that no further artifacts are introduced into the audio, other than those which were present in the MPEG encoding. If the only source material you have is already encoded, you don't want to degrade it further by re-encoding it with a different codec.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;&lt;tt&gt;flac --ogg source.wav -o audio-flac.oga&lt;/tt&gt;&lt;/b&gt;&lt;br/&gt;
to encode the audio. This produces an Ogg FLAC file called &lt;b&gt;&lt;tt&gt;audio-flac.oga&lt;/tt&gt;&lt;/b&gt;.
&lt;/p&gt;
&lt;p&gt;
&lt;b&gt;&lt;tt&gt;oggzmerge video-theora.ogv audio-flac.oga -o final.ogv&lt;/tt&gt;&lt;/b&gt;&lt;br/&gt;
to merge the video and audio tracks into the final Ogg video file, &lt;b&gt;&lt;tt&gt;final.ogv&lt;/tt&gt;&lt;/b&gt;.
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Note that we're using the recently recommended 
&lt;a href="http://wiki.xiph.org/index.php/MIME_Types_and_File_Extensions"&gt;file extensions for Ogg video and audio&lt;/a&gt;.
&lt;/p&gt;
&lt;p&gt;
If you know an easier way to create Ogg Theora+FLAC files, please leave a note in the comments :-)
&lt;/p&gt;</description><link>http://blog.kfish.org/2008/01/release-libfishsound-090.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-3881116361912966419</guid><pubDate>Tue, 11 Dec 2007 04:04:00 +0000</pubDate><atom:updated>2007-12-11T14:02:31.219+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><category domain='http://www.blogger.com/atom/ns#'>html5</category><title>HTML5 for free media: Today on #whatwg</title><description>&lt;p&gt;There has been a bit of &lt;a href="http://lists.xiph.org/pipermail/advocacy/2007-December/001469.html"&gt;FUD about Ogg Theora&lt;/a&gt; recently
[&lt;a href="http://yro.slashdot.org/yro/07/12/09/2045200.shtml"&gt;2&lt;/a&gt;]
[&lt;a href="http://www.boingboing.net/2007/12/09/nokia-to-w3c-ogg-is.html"&gt;3&lt;/a&gt;].
So, over on &lt;a href="http://wiki.whatwg.org/wiki/IRC"&gt;#whatwg&lt;/a&gt;, one day before the &lt;a href="http://www.w3.org/2007/08/video/"&gt;W3C Video on the Web Workshop&lt;/a&gt;:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;11:35:59&lt;/td&gt;&lt;td&gt; *       Hixie casually removes Ogg from the spec and sees what happens&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;11:36:43&lt;/td&gt;&lt;td&gt; *       othermaciej_ takes shelter&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;
The editor of the HTML5 draft specification, Ian Hickson (Hixie), sent &lt;a href="http://lists.w3.org/Archives/Public/public-html/2007Dec/0136.html"&gt;this message&lt;/a&gt;:
&lt;/p&gt;

&lt;blockquote&gt;
I've temporarily removed the requirements on video codecs from the HTML5
spec, since the current text isn't helping us come to a useful
interoperable conclusion. When a codec is found that is mutually
acceptable to all major parties I will update the spec to require that
instead and then reply to all the pending feedback on video codecs.
&lt;/blockquote&gt;

&lt;blockquote&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;12:05:02&lt;/td&gt;&lt;td&gt; &amp;lt;kfish&amp;gt; Hixie!&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;12:11:47&lt;/td&gt;&lt;td&gt; *       kfish throws a tantrum on behalf of the free software community&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;&amp;nbsp;&lt;/td&gt;&lt;td&gt;...&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;
However, the change didn't turn out to be so bad after all. The new text reads:
&lt;/p&gt;

&lt;blockquote&gt;
...; we need a codec that is known to not require per-unit or per-distributor licensing, that is compatible with the open source development model, that is of
sufficient quality as to be usable, and that is not an additional submarine patent risk for large companies.
&lt;/blockquote&gt;

&lt;p&gt;
The previous draft stated no such requirements. As no rationale was given for choosing Ogg, that recommendation was easy to attack.
Members of the &lt;a href="http://www.mpegla.com/"&gt;MPEG LA&lt;/a&gt;, the cabal whose members receive money when people use content in MPEG formats, then had a fairly easy job of inciting &lt;a href="http://www.whatwg.org/issues/#graphics-video-codec"&gt;flamewars&lt;/a&gt;
on the whatwg list.
&lt;/p&gt;
&lt;p&gt;
The new, clearer wording should allow more productive technical discussion, so that we can actually build an &lt;a href="http://perens.com/OpenStandards/Definition.html"&gt;open standard&lt;/a&gt; which encourages anyone, anywhere, to publish their videos freely.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;table&gt;
&lt;tr&gt;&lt;td&gt;12:29:48&lt;/td&gt;&lt;td&gt; *       kfish reads the replacement text and revokes the tantrum&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;12:30:15&lt;/td&gt;&lt;td&gt; &amp;lt;kfish&amp;gt; Hixie, actually you didn't casually remove Ogg, you made the case for Ogg stronger, so thankyou :-)&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;12:35:37&lt;/td&gt;&lt;td&gt; &amp;lt;Dashiva&amp;gt;       "Lift the cat who was amongst the pigeons up and put him back on his pedestal for now."&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;12:35:40&lt;/td&gt;&lt;td&gt; &amp;lt;Dashiva&amp;gt;       Poetic&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;12:37:49&lt;/td&gt;&lt;td&gt; &amp;lt;Hixie&amp;gt; kfish: :-)&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;
&lt;/blockquote&gt;</description><link>http://blog.kfish.org/2007/12/html5-for-free-media-today-on-whatwg.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-4589947712058568287</guid><pubDate>Thu, 06 Dec 2007 13:31:00 +0000</pubDate><atom:updated>2007-12-06T23:29:15.824+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>haskell</category><category domain='http://www.blogger.com/atom/ns#'>ogg</category><title>Release: HOgg 0.3.0</title><description>&lt;a href="http://www.kfish.org/software/hogg/"&gt;Hogg&lt;/a&gt; is a commandline tool for manipulating Ogg files. It has subcommands, like &lt;tt&gt;hogg chop&lt;/tt&gt; for cutting out bits of video, &lt;tt&gt;hogg info&lt;/tt&gt; for telling you about the codecs, and &lt;tt&gt;hogg dump&lt;/tt&gt; for hexdumping the packet data.

It's basically a re-implementation of most of the stuff in &lt;a href="http://www.annodex.net/software/liboggz/index.html"&gt;liboggz&lt;/a&gt;, but the new features in
&lt;a href="http://www.kfish.org/software/download/hogg-0.3.0.tar.gz"&gt;hogg 0.3.0&lt;/a&gt;
such as chopping out a section of a file and adding &lt;a href="http://wiki.xiph.org/OggSkeleton"&gt;Ogg Skeleton&lt;/a&gt; metadata are not yet in &lt;tt&gt;oggz-tools&lt;/tt&gt;.

&lt;pre&gt;
$ hogg help chop
chop: Extract a section (specify start and/or end time)
Usage: hogg chop [options] filename ...

Examples:
  Extract the first minute of file.ogg:
    hogg chop -e 1:00 file.ogg

  Extract from the second to the fifth minute of file.ogg:
    hogg chop -s 2:00 -e 5:00 -o output.ogg file.ogg

  Extract only the Theora video stream, from 02:00 to 05:00, of file.ogg:
    hogg chop -c theora -s 2:00 -e 5:00 -o output.ogg file.ogg

  Extract, specifying SMPTE-25 frame offsets:
    hogg chop -c theora -s smpte-25:00:02:03::12 -e smpte-25:00:05:02::04 -o output.ogg file.ogg
&lt;/pre&gt;

Nevertheless, I'm continuing to work on both &lt;tt&gt;liboggz&lt;/tt&gt; and &lt;tt&gt;hogg&lt;/tt&gt;. &lt;tt&gt;liboggz&lt;/tt&gt;, in pure C, is faster; &lt;tt&gt;hogg&lt;/tt&gt;, in pure (but unoptimised) Haskell, is more correct.
I spent a few hours earlier today tracking down a corner case in &lt;tt&gt;liboggz&lt;/tt&gt;, coincidentally triggered by the chopping routines in &lt;tt&gt;libannodex&lt;/tt&gt;. It reminded me that one of my first realizations about Haskell was that its sanity-checker often tells you about forgotten corner cases of algorithms.</description><link>http://blog.kfish.org/2007/12/release-hogg-030.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-280132888872617567</guid><pubDate>Thu, 15 Nov 2007 11:50:00 +0000</pubDate><atom:updated>2007-11-16T19:24:10.085+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>utf8</category><category domain='http://www.blogger.com/atom/ns#'>haskell</category><category domain='http://www.blogger.com/atom/ns#'>unicode</category><title>Survey: Haskell Unicode support</title><description>Haskell source is interpreted as UTF-8, but internally the data is stored as Unicode code points. However the generic show method does not serialize Strings as UTF-8
(when using GHC).
So, when reading or writing documents it is necessary to introduce an explicit conversion from or to the desired character set. This article outlines how to use Unicode in Haskell, and surveys three alternatives for character set conversion: &lt;i&gt;iconv&lt;/i&gt;, &lt;i&gt;utf8-string&lt;/i&gt; and &lt;i&gt;encoding&lt;/i&gt;, providing working examples for each.

&lt;h2&gt;Unicode in Haskell source&lt;/h2&gt;

The Haskell Prime standardization wiki contains discussions of
&lt;a href="http://hackage.haskell.org/trac/haskell-prime/wiki/UnicodeInHaskellSource"&gt;Unicode in Haskell Source&lt;/a&gt;, and of ways of handling
&lt;a href="http://hackage.haskell.org/trac/haskell-prime/wiki/CharAsUnicode"&gt;Char as Unicode&lt;/a&gt;.
In particular, GHC (as of release 6.6, October 2006) interprets source files as UTF-8. Hence the following is a valid source file:

&lt;pre&gt;
import System.Time

main :: IO ()
main = do
  time &lt;- getClockTime
  cal &lt;- toCalendarTime time
  putStrLn $ dayName $ ctWDay cal

dayName :: Day -&gt; String
dayName d = case d of
              Monday -&gt; "月曜日"
              Tuesday -&gt; "火曜日"
              Wednesday -&gt; "水曜日"
              Thursday -&gt; "木曜日"
              Friday -&gt; "金曜日"
              Saturday -&gt; "土曜日"
              Sunday -&gt; "日曜日"
&lt;/pre&gt;

The &lt;tt&gt;dayName&lt;/tt&gt; function provides the Japanese name for a given &lt;tt&gt;Day&lt;/tt&gt;. However, the &lt;tt&gt;main&lt;/tt&gt; function, which writes that name to &lt;tt&gt;stdout&lt;/tt&gt; with &lt;tt&gt;putStrLn&lt;/tt&gt;, dumps it without any character set conversion, truncating each character to 8 bits. In order to control the output charset, we need to use a Unicode conversion library. The three libraries
&lt;i&gt;iconv&lt;/i&gt;, &lt;i&gt;utf8-string&lt;/i&gt; and &lt;i&gt;encoding&lt;/i&gt; have similar purposes but some different features.
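
To see what that truncation does, here is a small sketch using only &lt;tt&gt;base&lt;/tt&gt;. The name &lt;tt&gt;truncate8&lt;/tt&gt; is mine, and it only imitates the behaviour described above (keeping the low 8 bits of each code point) rather than reproducing GHC's IO code:

&lt;pre&gt;
import Data.Char (chr, ord)

-- Keep only the low 8 bits of each code point, imitating a byte-oriented
-- putStr that performs no character set conversion.
truncate8 :: String -&gt; String
truncate8 = map (chr . (`mod` 256) . ord)

-- 日 is U+65E5, so only the byte 0xE5 survives:
-- map ord (truncate8 "日") == [0xE5]
&lt;/pre&gt;

The high byte of each code point is simply lost, which is why the output is unreadable rather than merely in the wrong encoding.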

&lt;h2&gt;&lt;a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/iconv"&gt;iconv&lt;/a&gt;&lt;/h2&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Description:&lt;/th&gt;&lt;td&gt;Binding to C iconv() function&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Author:&lt;/th&gt;&lt;td&gt;Duncan Coutts&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;darcs get:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;&lt;a href="http://code.haskell.org/iconv/"&gt;http://code.haskell.org/iconv/&lt;/a&gt;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Exports:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;Codec.Text.IConv&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Interface:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;ByteString.Lazy&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Advantages:&lt;/th&gt;&lt;td&gt;Speed, coverage of charset support&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Disadvantages:&lt;/th&gt;&lt;td&gt;Portability: requires POSIX &lt;tt&gt;iconv()&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

This is a Haskell binding to the &lt;tt&gt;iconv()&lt;/tt&gt; C library function, providing a lazy ByteString interface.
The only module exported is &lt;tt&gt;Codec.Text.IConv&lt;/tt&gt;, which provides a single
function:

&lt;pre&gt;
-- | Convert fromCharset toCharset input output
convert :: String -&gt; String -&gt; Lazy.ByteString -&gt; Lazy.ByteString
&lt;/pre&gt;

where &lt;tt&gt;fromCharset&lt;/tt&gt; and &lt;tt&gt;toCharset&lt;/tt&gt; are the names of the input and output character set encodings, and input and output are the input and output text
as lazy ByteStrings.

An example program to convert the encoding of an input file, similar to the
GNU iconv program, is given in
&lt;a href="http://haskell.org/~duncan/iconv/examples/hiconv.hs"&gt;examples/hiconv.hs&lt;/a&gt;.
The guts of that program is:

&lt;pre&gt;
        output = convert (fromEncoding config) (toEncoding config) input
&lt;/pre&gt;

which is somewhat clearer than the
&lt;a href="http://lists.slug.org.au/archives/coders/2006/12/msg00003.html"&gt;brain-damaged&lt;/a&gt; interface exported by the C library. Exceptions are provided for handling unsupported conversions, invalid and incomplete characters. These errors can be silently ignored if desired by calling &lt;tt&gt;convertFuzzy&lt;/tt&gt; instead.

As this library wraps the system &lt;tt&gt;iconv()&lt;/tt&gt; implementation, all character sets supported on the underlying system are available. The Lazy.ByteString interface works directly on the memory buffers used by the C library, which may give a speed advantage for large conversions.
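
As a minimal sketch of how this might be used: the function name &lt;tt&gt;latin1ToUtf8&lt;/tt&gt; is mine, and the charset names are passed straight through to the system &lt;tt&gt;iconv()&lt;/tt&gt;, so which names are accepted depends on your platform.

&lt;pre&gt;
import qualified Data.ByteString.Lazy as L
import Codec.Text.IConv (convert)

-- Convert ISO-8859-1 text to UTF-8.
latin1ToUtf8 :: L.ByteString -&gt; L.ByteString
latin1ToUtf8 = convert "ISO-8859-1" "UTF-8"

-- Filter stdin to stdout, converting lazily as the input is read.
main :: IO ()
main = L.interact latin1ToUtf8
&lt;/pre&gt;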

Note however that the &lt;tt&gt;iconv()&lt;/tt&gt; C library function is defined by POSIX.1-2001 and may not be available on some older systems. In most such cases it should be possible to install
&lt;a href="http://www.gnu.org/software/libiconv/"&gt;GNU libiconv&lt;/a&gt; separately.

&lt;h2&gt;&lt;a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string"&gt;utf8-string&lt;/a&gt;&lt;/h2&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Description:&lt;/th&gt;&lt;td&gt;Simple UTF-8 conversion library&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Author:&lt;/th&gt;&lt;td&gt;Eric Mertens&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;darcs get:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;&lt;a href="http://code.haskell.org/utf8-string/"&gt;http://code.haskell.org/utf8-string/&lt;/a&gt;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Exports:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;Codec.Binary.UTF8.String, System.IO.UTF8&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Interface:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;String&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Advantages:&lt;/th&gt;&lt;td&gt;Simplicity&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Disadvantages:&lt;/th&gt;&lt;td&gt;Only supports UTF-8 conversions&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

This library contains both a simple module for data conversion with a String interface, and a useful IO module.

The String conversion module, &lt;tt&gt;Codec.Binary.UTF8.String&lt;/tt&gt;, provides two pairs of complementary encoding and decoding functions:

&lt;pre&gt;
-- | Encode a string using 'encode' and store the result in a 'String'.
encodeString :: String -&gt; String

-- | Decode a string using 'decode' using a 'String' as input.
-- | This is not safe but it is necessary if UTF-8 encoded text
-- | has been loaded into a 'String' prior to being decoded.
decodeString :: String -&gt; String

-- | Encode a Haskell String to a list of Word8 values, in UTF8 format.
encode :: String -&gt; [Word8]

-- | Decode a UTF8 string packed into a list of Word8 values, directly to String
decode :: [Word8] -&gt; String
&lt;/pre&gt;

I guess "not safe" in the comment for &lt;tt&gt;decodeString&lt;/tt&gt; refers to type-safety; for example this function doesn't stop you from trying to decode the same text twice, whereas if you tried that with the plain &lt;tt&gt;decode&lt;/tt&gt; function, the compiler would point out your bug for you.
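
Here is a short sketch of that difference, using only the four functions listed above. The top-level names (&lt;tt&gt;greeting&lt;/tt&gt;, &lt;tt&gt;bytes&lt;/tt&gt;, &lt;tt&gt;roundTrip&lt;/tt&gt;, &lt;tt&gt;twice&lt;/tt&gt;) are just for illustration:

&lt;pre&gt;
import Codec.Binary.UTF8.String (encode, decode, encodeString, decodeString)
import Data.Word (Word8)

greeting :: String
greeting = "おはよう"

-- The UTF-8 bytes of the greeting.
bytes :: [Word8]
bytes = encode greeting

-- Decoding the bytes gives back the original code points.
roundTrip :: String
roundTrip = decode bytes

-- This type-checks, although decoding twice is almost certainly a bug:
twice :: String
twice = decodeString (decodeString (encodeString greeting))

-- This does not type-check, since decode returns a String, not [Word8]:
-- oops = decode (decode bytes)
&lt;/pre&gt;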

To see how this might look in the wild, the following is a complete "Hello World" web application (err, CGI script) in Japanese:

&lt;pre&gt;
import Codec.Binary.UTF8.String
import Network.CGI hiding (Html)
import Text.Html

main :: IO ()
main = runCGI $ handleErrors cgiMain

cgiMain :: CGI CGIResult
cgiMain = do
    setHeader "Content-Type" "text/html; charset=utf-8"
    output $ renderHtml $ h1 &lt;&lt; encodeString "おはよう御座います！"
&lt;/pre&gt;

The &lt;i&gt;utf8-string&lt;/i&gt; library also includes an entire IO module, &lt;tt&gt;System.IO.UTF8&lt;/tt&gt;, exporting
&lt;tt&gt;print, putStr, putStrLn, getLine, readLn, readFile, writeFile, appendFile, getContents, hGetLine, hGetContents, hPutStr, hPutStrLn&lt;/tt&gt;. These essentially wrap the default IO functions in &lt;tt&gt;encodeString&lt;/tt&gt; and &lt;tt&gt;decodeString&lt;/tt&gt;, which you may find convenient if you are doing lots of UTF-8 processing.

This library is tiny, and implemented natively in Haskell so there are no portability issues. Although it works on plain Strings and lists of &lt;tt&gt;Word8&lt;/tt&gt; rather than ByteStrings, it should be sufficiently fast for most practical purposes. Of course, if you need to do conversions to or from character sets other than UTF-8, you will need to use a different library.

&lt;h2&gt;&lt;a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding"&gt;encoding&lt;/a&gt;&lt;/h2&gt;

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Description:&lt;/th&gt;&lt;td&gt;Native Haskell charset conversion library&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Author:&lt;/th&gt;&lt;td&gt;Henning Günther&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;darcs get:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;&lt;a href="http://code.haskell.org/encoding/"&gt;http://code.haskell.org/encoding/&lt;/a&gt;&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Exports:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;Data.Encoding.*, System.IO.Encoding&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Interface:&lt;/th&gt;&lt;td&gt;&lt;tt&gt;ByteString.Lazy&lt;/tt&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Advantages:&lt;/th&gt;&lt;td&gt;Portable; covers more charsets than &lt;i&gt;utf8-string&lt;/i&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;th&gt;Disadvantages:&lt;/th&gt;&lt;td&gt;Covers fewer charsets than &lt;i&gt;iconv&lt;/i&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

&lt;tt&gt;Data.Encoding&lt;/tt&gt; provides native Haskell implementations for encoding and decoding of many common character sets: ASCII, UTF8, UTF16, UTF32, ISO8859[1-16],
CP125[0-8], KOI8R, and GB18030, as well as BootString (for &lt;a href="http://www.ietf.org/rfc/rfc3492.txt"&gt;Punycode&lt;/a&gt;). For each of these, it implements an &lt;tt&gt;Encoding&lt;/tt&gt; interface:

&lt;pre&gt;
{- | Represents an encoding, supporting various methods of de- and encoding.
     Minimal complete definition: encode, decode
 -}
class Encoding enc where
        -- | Encode a 'String' into a strict 'ByteString'. Throws the
        --   'HasNoRepresentation'-Exception if it encounters an unrepresentable
        --   character.
        encode :: enc -&gt; String -&gt; ByteString
        -- | Encode a 'String' into a lazy 'Data.ByteString.Lazy.ByteString'.
        encodeLazy :: enc -&gt; String -&gt; LBS.ByteString
        encodeLazy e str = LBS.fromChunks [encode e str]
        -- | Whether or not the given 'Char' is representable in this encoding. Default: 'True'.
        encodable :: enc -&gt; Char -&gt; Bool
        encodable _ _ = True
        -- | Decode a strict 'ByteString' into a 'String'. If the string is not
        --   decodable, a 'DecodingException' is thrown.
        decode :: enc -&gt; ByteString -&gt; String
        decodeLazy :: enc -&gt; LBS.ByteString -&gt; String
        decodeLazy e str = concatMap (decode e) (LBS.toChunks str)
        -- | Whether or not a given 'ByteString' is decodable. Default: 'True'.
        decodable :: enc -&gt; ByteString -&gt; Bool
        decodable _ _ = True
&lt;/pre&gt;

Notice that this interface provides exceptions for handling unrepresentable characters.

Instances of &lt;tt&gt;Encoding&lt;/tt&gt; can be found by importing charset-specific modules; each simply exports a value with the same name as the module, e.g. &lt;tt&gt;Data.Encoding.ISO88592&lt;/tt&gt; exports &lt;tt&gt;ISO88592&lt;/tt&gt;, which is an instance of &lt;tt&gt;Encoding&lt;/tt&gt;. Here is a "Hello World" CGI in Polish, using ISO-8859-2:

&lt;pre&gt;
import Data.Encoding
import Data.Encoding.ISO88592
import Data.ByteString.Char8
import Network.CGI hiding (Html)
import Text.Html

main :: IO ()
main = runCGI $ handleErrors cgiMain

cgiMain :: CGI CGIResult
cgiMain = do
    setHeader "Content-Type" "text/html; charset=iso-8859-2"
    output $ renderHtml $ h1 &lt;&lt; (unpack $ encode ISO88592 "Cześć")
&lt;/pre&gt;

You'll notice the call to &lt;tt&gt;unpack&lt;/tt&gt; to convert the &lt;tt&gt;ByteString&lt;/tt&gt; into a plain &lt;tt&gt;String&lt;/tt&gt; as expected by &lt;tt&gt;Html&lt;/tt&gt;.

The &lt;i&gt;encoding&lt;/i&gt; library also provides a way to select an encoding by name:

&lt;pre&gt;
-- | Takes the name of an encoding and creates a dynamic encoding from it.
encodingFromString :: String -&gt; DynEncoding
&lt;/pre&gt;

(Anything which is a DynEncoding is by definition an instance of Encoding). So we could choose the encoding at runtime, or we can just be lazy and pick encodings by name. If we do this, we don't need to import the charset-specific module, and we can replace the last line of our CGI with:

&lt;pre&gt;
    let enc = encodingFromString "ISO-8859-2"
    output $ renderHtml $ h1 &lt;&lt; (unpack $ encode enc "Cześć")
&lt;/pre&gt;

The &lt;i&gt;encoding&lt;/i&gt; library also provides a pair of functions for converting character sets directly between two ByteStrings:

&lt;pre&gt;
-- | This decodes a string from one encoding and encodes it into another.
recode :: (Encoding from,Encoding to) =&gt; from -&gt; to -&gt; ByteString -&gt; ByteString

recodeLazy :: (Encoding from,Encoding to) =&gt; from -&gt; to -&gt; Lazy.ByteString -&gt; Lazy.ByteString
&lt;/pre&gt;
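
For example, a sketch of converting ISO-8859-2 bytes to UTF-8 with &lt;tt&gt;recode&lt;/tt&gt;, written against the interface quoted above. It assumes that &lt;tt&gt;recode&lt;/tt&gt; is exported from &lt;tt&gt;Data.Encoding&lt;/tt&gt; and that &lt;tt&gt;Data.Encoding.UTF8&lt;/tt&gt; exports a value &lt;tt&gt;UTF8&lt;/tt&gt;, following the naming pattern described earlier; the name &lt;tt&gt;latin2ToUtf8&lt;/tt&gt; is mine, and other releases of the library may differ:

&lt;pre&gt;
import Data.ByteString (ByteString)
import Data.Encoding (recode)
import Data.Encoding.ISO88592
import Data.Encoding.UTF8

-- Decode the input as ISO-8859-2 and re-encode it as UTF-8.
-- (Assumes the module layout described in the text above.)
latin2ToUtf8 :: ByteString -&gt; ByteString
latin2ToUtf8 = recode ISO88592 UTF8
&lt;/pre&gt;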

The &lt;tt&gt;System.IO.Encoding&lt;/tt&gt; module does not try to provide as many convenience functions as the similar module provided by &lt;i&gt;utf8-string&lt;/i&gt;, providing only the generic &lt;tt&gt;hGetContents&lt;/tt&gt; and &lt;tt&gt;hPutStr&lt;/tt&gt;. However, it does provide a way of retrieving the current system's default encoding (when used on systems supporting POSIX.1-2001 &lt;tt&gt;nl_langinfo()&lt;/tt&gt;), which &lt;i&gt;utf8-string&lt;/i&gt; lacks.

&lt;pre&gt;
-- | Like the normal 'System.IO.hGetContents', but decodes the input using an
--   encoding.
hGetContents :: Encoding e =&gt; e -&gt; Handle -&gt; IO String

-- | Like the normal 'System.IO.hPutStr', but encodes the output using an
--   encoding.
hPutStr :: Encoding e =&gt; e -&gt; Handle -&gt; String -&gt; IO ()

-- | Returns the encoding used on the current system.
getSystemEncoding :: IO DynEncoding
&lt;/pre&gt;

As this library is native Haskell it is portable, and as it uses lazy ByteStrings it can be fast. While it does not (yet) provide as many character sets as your system's &lt;tt&gt;iconv()&lt;/tt&gt;, it does support many of the most commonly used ones.

&lt;h2&gt;Notes&lt;/h2&gt;

The libraries surveyed here are under fairly active maintenance, and there are rumours of unifying their implementations. Nevertheless the existing interfaces are fairly similar where common functionality exists.

&lt;strike&gt;
Historically, all serialized data was handled in Haskell as Strings, and there was a legitimate concern that transparently converting the character set of arbitrary Strings could mangle data.
The newer ByteString and Binary interfaces may allow future Haskell standards to clearly disambiguate binary and textual data, and simply serialize Strings as UTF-8 by default.
&lt;/strike&gt;

Although it might be nice to "simply" serialize Strings as UTF-8, &lt;tt&gt;show&lt;/tt&gt; is the wrong place to do it. Haskell's &lt;tt&gt;Read/Show&lt;/tt&gt; serialization serializes to &lt;tt&gt;String&lt;/tt&gt;, which is a list of &lt;tt&gt;Char&lt;/tt&gt;, ie. a list of abstract Unicode code points. Character set conversion should rather happen on conversion to &lt;tt&gt;[Word8]&lt;/tt&gt;, at which point byte values become significant. This also encompasses direct conversions to &lt;tt&gt;ByteString&lt;/tt&gt;, and the internals of primitive IO functions such as:

&lt;pre&gt;
putChar    :: Char -&gt; IO ()
putChar    =  primPutChar

getChar    :: IO Char
getChar    =  primGetChar
&lt;/pre&gt;

as well as &lt;tt&gt;getContents&lt;/tt&gt;, &lt;tt&gt;readFile&lt;/tt&gt;, &lt;tt&gt;writeFile&lt;/tt&gt; and &lt;tt&gt;appendFile&lt;/tt&gt;, all defined in the &lt;a href="http://www.haskell.org/onlinereport/standard-prelude.html"&gt;Haskell Prelude&lt;/a&gt;, and the various character IO functions on &lt;tt&gt;Handle&lt;/tt&gt;s defined in &lt;a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base-3.0.0.0/System-IO.html"&gt;System.IO&lt;/a&gt;.
Whether or not this conversion can be done everywhere transparently, and backwards-compatibly, is an open issue for Haskell Prime. Meanwhile these libraries provide useful interfaces for explicit &lt;tt&gt;[Word8]&lt;/tt&gt; and &lt;tt&gt;ByteString&lt;/tt&gt; conversion, and various IO wrappers.
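
As a minimal sketch of what "conversion to &lt;tt&gt;[Word8]&lt;/tt&gt;" means in practice, here is a hand-rolled UTF-8 encoder that uses only &lt;i&gt;base&lt;/i&gt;. The names &lt;tt&gt;encodeChar&lt;/tt&gt; and &lt;tt&gt;encodeString&lt;/tt&gt; are invented for this illustration and are not taken from any of the libraries surveyed. It makes no attempt to reject surrogate code points, so treat it as a demonstration rather than a replacement for those libraries:

&lt;pre&gt;
import Data.Bits (shiftR, (.&amp;.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- | Encode a single Unicode code point as its UTF-8 byte sequence.
--   (Hypothetical helper, for illustration only.)
encodeChar :: Char -&gt; [Word8]
encodeChar c
  | n &lt; 0x80    = [fromIntegral n]
  | n &lt; 0x800   = [0xC0 .|. top 6,  cont 0]
  | n &lt; 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- the bits above position s, which form the payload of the lead byte
    top s  = fromIntegral (n `shiftR` s)
    -- a continuation byte: 10xxxxxx carrying the six bits starting at position s
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&amp;. 0x3F)

-- | Byte values only become significant here, not in the String itself.
encodeString :: String -&gt; [Word8]
encodeString = concatMap encodeChar

main :: IO ()
main = print (encodeString "Cześć")
&lt;/pre&gt;

Compiled from a UTF-8 source file, this should print &lt;tt&gt;[67,122,101,197,155,196,135]&lt;/tt&gt;: the five abstract code points become seven bytes, and it is only at this step that a character set has to be chosen.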

&lt;h2&gt;Summary&lt;/h2&gt;

Although all Haskell Strings are Unicode, Haskell98 does not specify a character set representation for their IO. Unicode strings can be written directly into Haskell source files and hence exist as data within a program, but character set conversion is required if you wish to read or write these Strings in files, user input or on the network.

We looked at ways of dealing with Unicode in Haskell, surveyed some useful libraries and provided working examples. Although we might hope that a future version of Haskell will provide a way to handle UTF-8 conversions, in the meantime we need to choose an appropriate library for each project that handles Unicode text.

&lt;h2&gt;Updates&lt;/h2&gt;

&lt;b&gt;Fri Nov 16&lt;/b&gt;: Edited to incorporate some feedback from #haskell:

&lt;ul&gt;
&lt;li&gt;Thanks to Tim Newsham for clarifying GHC's default &lt;a href="http://hpaste.org/3908"&gt;character encoding&lt;/a&gt; when printing Strings.&lt;/li&gt;
&lt;li&gt;Thanks to Stefan N. O'Rear for pointing out that Show/Read is not the right place for serialization, but that it should instead occur on conversion to/from &lt;tt&gt;[Word8]&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.kfish.org/2007/10/survey-haskell-unicode-support.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-7020532669878400468</guid><pubDate>Fri, 09 Nov 2007 07:37:00 +0000</pubDate><atom:updated>2007-11-09T16:41:48.024+09:00</atom:updated><title>Release: libfishsound 0.8.1</title><description>libfishsound provides a simple programming interface for decoding and
encoding audio data using Xiph.Org codecs (Vorbis and Speex). 
&lt;a href="http://www.annodex.net/software/libfishsound/download/libfishsound-0.8.1.tar.gz"&gt;libfishsound 0.8.1&lt;/a&gt; is a maintenance release, fixing a build error when configured with encoding disabled.

Full &lt;a href="http://www.annodex.net/software/libfishsound/html/"&gt;documentation of the FishSound API&lt;/a&gt;, customization and installation,
and complete examples of Ogg Vorbis and Speex decoding and encoding are
provided.</description><link>http://blog.kfish.org/2007/11/release-libfishsound-081.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-7769382099342914655</guid><pubDate>Mon, 10 Sep 2007 12:56:00 +0000</pubDate><atom:updated>2007-09-10T22:16:11.642+09:00</atom:updated><title>Type-Level Instant Insanity</title><description>&lt;p&gt;
I wrote an introductory tutorial about type-level programming,
&lt;a href="http://haskell.org/haskellwiki/User:ConradParker/InstantInsanity"&gt;"Type-Level Instant Insanity"&lt;/a&gt;, which is now available in &lt;a href="http://www.haskell.org/sitewiki/images/d/dd/TMR-Issue8.pdf"&gt;the Monad.Reader, Issue 8&lt;/a&gt; [PDF]. The tutorial implements a solution to the puzzle
&lt;a href="http://www.geocities.com/jaapsch/puzzles/insanity.htm"&gt;Instant Insanity&lt;/a&gt;, entirely in the Haskell type-system.
&lt;/p&gt;

&lt;blockquote&gt;
Familiarity with the syntax of the Haskell Type System is a prerequisite for understanding the details of general Haskell programming. What better way to build familiarity with something than to hack it to bits?
&lt;/blockquote&gt;

&lt;p&gt;
The point of the exercise is to demonstrate how the type system can be used to tell the compiler about details of logic, such as pre- and post-conditions of functions. Although type annotations are optional in Haskell, they can often be used to more clearly describe what the shape of a correct solution is; or, conversely, to tell the compiler what a bug looks like.
&lt;/p&gt;

&lt;p&gt;
There is also a 
&lt;a href="http://haskell.org/haskellwiki/User_talk:ConradParker/InstantInsanity"&gt;talk page&lt;/a&gt; for this article, so feel free to add random thoughts there, or here.
&lt;/p&gt;

&lt;p&gt;
&lt;a href="http://www.haskell.org/haskellwiki/The_Monad.Reader"&gt;The Monad.Reader&lt;/a&gt;
is "a quarterly magazine about functional programming. It is less formal than a journal, but somehow more enduring than a wiki page or blog post."
&lt;/p&gt;

&lt;p&gt;
Please enjoy :-)
&lt;/p&gt;</description><link>http://blog.kfish.org/2007/09/type-level-instant-insanity.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-8661185421897865477</guid><pubDate>Sat, 09 Jun 2007 08:09:00 +0000</pubDate><atom:updated>2007-06-09T17:10:26.397+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>haskell</category><category domain='http://www.blogger.com/atom/ns#'>xml</category><category domain='http://www.blogger.com/atom/ns#'>review</category><title>Review: TagSoup</title><description>&lt;p&gt;This week I've been playing with &lt;a href="http://www-users.cs.york.ac.uk/~ndm/tagsoup/"&gt;TagSoup&lt;/a&gt;, a Haskell library by Neil Mitchell and Henning Thielemann "for extracting information out of unstructured HTML code, sometimes known as tag-soup". This article introduces the basic usage of TagSoup, and discusses the functional approach to mining XML-like data.
&lt;/p&gt;
&lt;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Name:&lt;/strong&gt; TagSoup&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Version:&lt;/strong&gt; darcs as at June 9, 2007; prior to release version 0.2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Functionality:&lt;/strong&gt; Parsing possibly malformed HTML/XML&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inputs:&lt;/strong&gt; String (also includes a &lt;em&gt;URL -&gt; IO String&lt;/em&gt; helper function)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;License:&lt;/strong&gt; BSD3&lt;/li&gt;
&lt;/ul&gt;
&lt;/p&gt;
&lt;h3&gt;parseTags :: String -&gt; [Tag]&lt;/h3&gt;
&lt;p&gt;The first thing you do with TagSoup is parse the document into a list of &lt;tt&gt;[Tag]&lt;/tt&gt;. The &lt;tt&gt;Tag&lt;/tt&gt; type is fairly general, and can represent the various things that can occur when reading an HTML document: an opening or closing tag, the text between tags (like &lt;tt&gt;&amp;lt;strong&amp;gt;&lt;strong&gt;this&lt;/strong&gt;&amp;lt;/strong&amp;gt;&lt;/tt&gt;), comments, or special tags like &lt;tt&gt;&amp;lt;!DOCTYPE ...&amp;gt;&lt;/tt&gt;. It is also used to mark the location of syntax errors, though of course these are not fatal as the whole point of the library is to robustly work around badly-formed input.
&lt;/p&gt;
&lt;h3&gt;XML Parsing&lt;/h3&gt;
&lt;p&gt;
TagSoup actually contains no HTML-specific code, other than that it knows about HTML entities. To demonstrate that it can be used for other kinds of possibly malformed XML, I added an example which extracts information from an RSS feed (now part of &lt;a href="http://www.cs.york.ac.uk/fp/darcs/tagsoup/Example/Example.hs"&gt;Example.hs&lt;/a&gt;):
&lt;/p&gt;
&lt;pre&gt;
-- rssCreators Example: prints names of story contributors on
-- sequence.complete.org. This content is RSS (not HTML), and the selected
-- tag uses a different XML namespace "dc:creator".

rssCreators :: IO [String]
rssCreators = do
    tags &lt;- liftM parseTags $ openURL "http://sequence.complete.org/node/feed"
    return $ map names $ partitions (~== "dc:creator") tags
    where
      names xs = innerText $ xs !! 1
&lt;/pre&gt;
&lt;p&gt;This function is of type &lt;tt&gt;IO [String]&lt;/tt&gt;: it uses &lt;tt&gt;IO&lt;/tt&gt;, and returns a list of &lt;tt&gt;String&lt;/tt&gt;s -- the names of contributors. The first line:
&lt;pre&gt;
    tags &amp;lt;- liftM parseTags $ openURL "http://sequence.complete.org/node/feed"
&lt;/pre&gt;
uses &lt;tt&gt;openURL&lt;/tt&gt; (part of TagSoup) to read the contents of the given RSS feed into a String, and then runs &lt;tt&gt;parseTags&lt;/tt&gt; on that, calling the result &lt;tt&gt;tags&lt;/tt&gt;.
&lt;/p&gt;
&lt;h4&gt;Extracting information&lt;/h4&gt;
&lt;p&gt;
Now that we can use the XML as a list of &lt;tt&gt;Tag&lt;/tt&gt;s, the second line:
&lt;pre&gt;
    return $ map names $ partitions (~== "dc:creator") tags
&lt;/pre&gt;
splits it up into separate partitions, starting a new partition wherever there is a tag that roughly matches &lt;tt&gt;&amp;lt;... dc:creator ...&amp;gt;&lt;/tt&gt;. It then runs the function &lt;tt&gt;names&lt;/tt&gt; on each partition, and returns the result. &lt;tt&gt;names&lt;/tt&gt; takes the second item of each partition (&lt;tt&gt;xs !! 1&lt;/tt&gt;, the item immediately after the opening tag) and returns its text, ie. the content of the &lt;tt&gt;&amp;lt;dc:creator&amp;gt;&lt;/tt&gt; element. Done:
&lt;pre&gt;
*Example.Example&gt; rssCreators 
["dons","dons","dons","jgoerzen","dons","dons","dons","dons","dons","dons"]
&lt;/pre&gt;
&lt;/p&gt;
&lt;h3&gt;A more complex example, using an external HTTP library&lt;/h3&gt;
&lt;p&gt;
&lt;a href="http://research.microsoft.com/~simonpj/"&gt;Simon Peyton-Jones&lt;/a&gt; is a Free Software developer working on the GHC compiler at Microsoft Research in Cambridge, England. One of the examples given in &lt;a href="http://www.cs.york.ac.uk/fp/darcs/tagsoup/Example/Example.hs"&gt;Example.hs&lt;/a&gt; attempts to extract a list of his current research papers. Using TagSoup's convenient but simple HTTP library, it fails to terminate due to a hanging server. Here is a working version of that example  using the new lazy ByteString version of the &lt;a href="http://www.dtek.chalmers.se/~tox/site/http.php4"&gt;Haskell HTTP&lt;/a&gt; library:
&lt;/p&gt;
&lt;pre&gt;
-- compile with: ghc --make -o spj -Ldist/build -lHSHTTP1-3000.0.0 spj.hs

module Main where

import Text.HTML.TagSoup

import qualified Data.ByteString.Lazy.Char8 as BS
import Network.HTTP (rspBody)
import Network.HTTP.UserAgent as UA

spjPapers :: IO ()
spjPapers = do
        rsp &lt;- UA.get "http://research.microsoft.com/~simonpj/"
        let tags = parseTags $ BS.unpack $ rspBody rsp
        let links = map f $ sections (~== "a") $
                    takeWhile (~/= TagOpen "a" [("name","haskell")]) $
                    drop 5 $ dropWhile (~/= TagOpen "a" [("name","current")]) tags
        putStr $ unlines links
    where
        f :: [Tag] -&gt; String
        f = dequote . unwords . words . innerText . head . filter isTagText

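        -- guard against a lone quote character, where 'last' would fail on an empty list
        dequote ('\"':xs) | not (null xs) &amp;&amp; last xs == '\"' = init xs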
        dequote x = x

main = spjPapers
&lt;/pre&gt;
&lt;p&gt;This example is obviously a little more involved, but this is typical for a real-world example of scraping HTML -- the main content of the page is handwritten, and the enclosing content management system uses many non-standard elements and attributes.
&lt;/p&gt;&lt;p&gt;The guts of that example is the &lt;tt&gt;links = ...&lt;/tt&gt; declaration. It looks for the part of the page roughly between &lt;tt&gt;&amp;lt;a name="current"&amp;gt;&lt;/tt&gt; and &lt;tt&gt;&amp;lt;a name="haskell"&amp;gt;&lt;/tt&gt;, breaks it up into sections (starting each section wherever there is a new &lt;tt&gt;&amp;lt;a ...&amp;gt;&lt;/tt&gt; tag), and then runs some function &lt;tt&gt;f&lt;/tt&gt; on each of those sections. To see this in more detail, we can read the declaration from the far end backwards: with the parsed list of &lt;tt&gt;tags :: [Tag]&lt;/tt&gt;, do the following:
&lt;/p&gt;&lt;ol&gt;
&lt;li&gt;drop everything until you get an opening tag matching
&lt;tt&gt;&amp;lt;a ... name="current" ...&amp;gt;&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;drop the next 5 tags (no matter what they are)&lt;/li&gt;
&lt;li&gt;take everything until you get an opening tag matching
&lt;tt&gt;&amp;lt;a ... name="haskell" ... &amp;gt;&lt;/tt&gt;&lt;/li&gt;
&lt;li&gt;We now have a big list of tags (&lt;tt&gt;:: [Tag]&lt;/tt&gt;): split it up into sections, starting each section wherever there is a new &lt;tt&gt;&amp;lt;a ...&amp;gt;&lt;/tt&gt; tag. This will give us a list of lists of tags (&lt;tt&gt;:: [[Tag]]&lt;/tt&gt;).&lt;/li&gt;
&lt;li&gt;run &lt;tt&gt;f&lt;/tt&gt; on each section's &lt;tt&gt;[Tag]&lt;/tt&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;And what about this magical function &lt;tt&gt;f&lt;/tt&gt;? It takes the first item of &lt;tt&gt;TagText&lt;/tt&gt; (text between tags) in a section, runs &lt;tt&gt;unwords . words&lt;/tt&gt; on it to clean up the whitespace (by splitting it up into words, then joining the words back up with a single space between each), and finally removes any surrounding quotes if present. You could use a function like &lt;tt&gt;f&lt;/tt&gt; to clean up tag text in any page you are scraping. Here, the final result is a nice, clean, plaintext list of the titles of Simon Peyton Jones's current research papers:
&lt;pre&gt;
Constructor specialisation for Haskell programs
Faster laziness using dynamic pointer tagging
Scrap your type applications
A History of Haskell: being lazy with class
...
&lt;/pre&gt;
&lt;/p&gt;
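&lt;p&gt;The text-cleaning half of &lt;tt&gt;f&lt;/tt&gt; can be tried on its own, without TagSoup or a network connection. The following is a minimal sketch using only the Prelude; the name &lt;tt&gt;clean&lt;/tt&gt; and the sample string are made up for illustration:
&lt;/p&gt;
&lt;pre&gt;
-- Collapse runs of whitespace (including newlines) into single spaces,
-- then strip one pair of surrounding double quotes if both are present.
clean :: String -&gt; String
clean = dequote . unwords . words
  where
    dequote ('"':rest)
      | not (null rest) &amp;&amp; last rest == '"' = init rest
    dequote s = s

main :: IO ()
main = putStrLn (clean "  \"A History of\n   Haskell\"  ")
&lt;/pre&gt;
&lt;p&gt;This should print &lt;tt&gt;A History of Haskell&lt;/tt&gt;. Note that &lt;tt&gt;words&lt;/tt&gt; discards leading and trailing whitespace as well, so no separate trimming step is needed before the quotes are removed.
&lt;/p&gt;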

&lt;h3&gt;Functions for extracting information&lt;/h3&gt;

&lt;p&gt;TagSoup provides a few useful functions for pulling apart web pages. Taken together with Haskell's list processing functions, the result is a very expressive, concise language for extracting information from web data.&lt;/p&gt;

&lt;p&gt;In the above examples, we've seen how various String and list handling functions (like &lt;tt&gt;&lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:lines"&gt;lines&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:words"&gt;words&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:unlines"&gt;unlines&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:unwords"&gt;unwords&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:takeWhile"&gt;takeWhile&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:drop"&gt;drop&lt;/a&gt;, &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#v:filter"&gt;filter&lt;/a&gt;&lt;/tt&gt;, defined in the &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html"&gt;Haskell Prelude&lt;/a&gt;) can be used together with predicates like &lt;tt&gt;&lt;a href="http://www.cs.york.ac.uk/fp/haddock/tagsoup/Data-Html-TagSoup.html#v:isTagText"&gt;isTagText&lt;/a&gt;&lt;/tt&gt; from TagSoup. TagSoup also provides operators for inexact matching, &lt;tt&gt;~==&lt;/tt&gt; and its negative &lt;tt&gt;~/=&lt;/tt&gt;. These allow you to do a loose match on tag contents, so that your HTML-scraping application has some resilience to minor changes in page generation.
&lt;/p&gt;&lt;p&gt;Lastly, functions like &lt;tt&gt;&lt;a href="http://www.cs.york.ac.uk/fp/haddock/tagsoup/Data-Html-TagSoup.html#v:sections"&gt;sections&lt;/a&gt;&lt;/tt&gt;  make use of the above predicates to divide the page up into similar parts. As we have seen in the above examples, this is useful for the common case where a page contains a list of items, and we want to extract the same kind of information from each of those items. We can write a function &lt;tt&gt;f&lt;/tt&gt; to handle one item, then simply map &lt;tt&gt;f&lt;/tt&gt; across all the contents of the page.
&lt;/p&gt;
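&lt;p&gt;To make the difference between the two splitting functions used above concrete, here is a rough sketch of how they behave on an ordinary list, using only &lt;i&gt;base&lt;/i&gt;. It is not the library's source: &lt;tt&gt;sectionsSketch&lt;/tt&gt; and &lt;tt&gt;partitionsSketch&lt;/tt&gt; are hypothetical names, and the sketch assumes that &lt;tt&gt;sections&lt;/tt&gt; keeps everything from each match to the end of the list, while &lt;tt&gt;partitions&lt;/tt&gt; stops each chunk just before the next match:
&lt;/p&gt;
&lt;pre&gt;
import Data.List (tails)

-- Every suffix of the list that begins with a matching element.
sectionsSketch :: (a -&gt; Bool) -&gt; [a] -&gt; [[a]]
sectionsSketch p xs = [s | s@(x:_) &lt;- tails xs, p x]

-- Non-overlapping chunks, each beginning with a matching element.
partitionsSketch :: (a -&gt; Bool) -&gt; [a] -&gt; [[a]]
partitionsSketch p xs = case dropWhile (not . p) xs of
  []     -&gt; []
  (y:ys) -&gt; let (chunk, rest) = break p ys
            in (y : chunk) : partitionsSketch p rest

main :: IO ()
main = do
  -- strings starting with '#' stand in for the tags we want to match
  let items    = ["intro", "#a", "x", "#b", "y", "z"]
      isMark s = take 1 s == "#"
  print (sectionsSketch isMark items)
  print (partitionsSketch isMark items)
&lt;/pre&gt;
&lt;p&gt;Under those assumptions the first line printed is &lt;tt&gt;[["#a","x","#b","y","z"],["#b","y","z"]]&lt;/tt&gt; and the second is &lt;tt&gt;[["#a","x"],["#b","y","z"]]&lt;/tt&gt;. When the function mapped over the result only looks at the first few items of each piece, as in both examples above, either splitting works.
&lt;/p&gt;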
&lt;h3&gt;Comparisons&lt;/h3&gt;
&lt;p&gt;
The similarly-named &lt;a href="http://ccil.org/~cowan/XML/tagsoup/"&gt;Java TagSoup&lt;/a&gt; is "a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild". This tries to make badly formed HTML usable through a conventional SAX interface. This makes it useful for more general applications than the Haskell TagSoup, but far less expressive for the task of scraping a known web page.
&lt;/p&gt;&lt;p&gt;
The Python &lt;a href="http://www.crummy.com/software/BeautifulSoup/"&gt;Beautiful Soup&lt;/a&gt; (and similarly its Ruby counterpart &lt;a href="http://www.crummy.com/software/RubyfulSoup/"&gt;Rubyful Soup&lt;/a&gt;) "can turn even invalid markup into a parse tree." This takes quite a similar approach to the Haskell TagSoup, and also provides functions to print out the modified parse tree. It attempts to create a full DOM parse tree (using HTML-specific heuristics), so extracting information can involve walking the tree with syntax like &lt;tt&gt;head.nextSibling.contents[0].nextSibling&lt;/tt&gt;. Nevertheless it provides content-based search functions like &lt;tt&gt;soup.find('p', align="center")&lt;/tt&gt;, and &lt;a href="http://www.crummy.com/software/BeautifulSoup/documentation.html"&gt;with a few lambda functions&lt;/a&gt; can be quite expressive.&lt;/p&gt;

&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;
This &lt;a href="http://www-users.cs.york.ac.uk/~ndm/tagsoup/"&gt;TagSoup&lt;/a&gt; does one thing and does it very well: it provides a small set of useful abstractions for extracting information from HTML pages and other XML data, As the markup you are scraping is badly formed, TagSoup provides operators for inexact matching and does not attempt to coax the page into a conventional tree structure. The result is a very fast, list-based representation of page content which can be mined using TagSoup's &lt;tt&gt;Tag&lt;/tt&gt;-specific functions and Haskell's usual &lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Data-List.html"&gt;list&lt;/a&gt; operations.&lt;/p&gt;</description><link>http://blog.kfish.org/2007/06/review-tagsoup.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-4245879337638539399</guid><pubDate>Sun, 03 Jun 2007 06:48:00 +0000</pubDate><atom:updated>2007-06-04T19:23:30.646+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>ogg</category><category domain='http://www.blogger.com/atom/ns#'>release</category><title>Release: libfishsound 0.8.0</title><description>&lt;a href="http://www.annodex.net/software/libfishsound/index.html"&gt;libfishsound&lt;/a&gt; provides a simple and consistent programming interface for decoding and encoding audio data using Xiph.Org codecs (&lt;a href="http://www.vorbis.com/"&gt;Vorbis&lt;/a&gt; and &lt;a href="http://www.speex.org/"&gt;Speex&lt;/a&gt;).

This release includes compatibility with the floating point portion of the libfishsound development trunk API, in preparation for use with &lt;a href="http://wiki.xiph.org/index.php/OggPlay"&gt;liboggplay&lt;/a&gt;. To build a minimal version of libfishsound for use with liboggplay,
&lt;a href="http://www.annodex.net/software/libfishsound/html/group__configuration.html"&gt;configure with encoding disabled&lt;/a&gt; in order to produce a smaller binary and to remove the dependency on libvorbisenc.</description><link>http://blog.kfish.org/2007/06/release-libfishsound-080.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-8813214115514127724</guid><pubDate>Fri, 04 May 2007 13:28:00 +0000</pubDate><atom:updated>2007-06-03T12:41:48.775+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>annodex</category><category domain='http://www.blogger.com/atom/ns#'>xtech</category><title>XTech 2007</title><description>&lt;a href="http://2007.xtech.org/"&gt;&lt;img src="http://2007.xtech.org/public/asset/asset/4"/&gt;&lt;/a&gt;
&lt;p&gt;
I'll be giving a general talk about &lt;a href="http://www.annodex.net/"&gt;Annodex&lt;/a&gt; at &lt;a href="http://2007.xtech.org/"&gt;XTech 2007&lt;/a&gt; in Paris. The conference theme is "The Ubiquitous Web", and covers emerging standards for web technology.
&lt;/p&gt;&lt;p&gt;
My presentation slides and a short paper are available:
&lt;a href="http://2007.xtech.org/public/schedule/detail/101"&gt;Building video webs with Annodex&lt;/a&gt;.
&lt;/p&gt;</description><link>http://blog.kfish.org/2007/05/xtech-2007.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-3610534075491606598</guid><pubDate>Mon, 12 Feb 2007 12:35:00 +0000</pubDate><atom:updated>2007-06-03T10:11:28.983+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>latex</category><title>Greek, Cyber-virgins, in LaTeX</title><description>&lt;p&gt;
For some reason, recently I've been reading a lot of math and writing a lot
of LaTeX. I was getting lost in the symbols so I wrote up a chart of the
&lt;a href="http://www.kfish.org/tech/latex/greek-alphabet.html"&gt;Greek alphabet
in LaTeX math mode&lt;/a&gt;, in pretty black, white and grey.
The source is annotated with instructions for Web 2.0 programmers to replace
the grey with cornflower blue.
&lt;/p&gt;

&lt;p&gt;
While putting this together I came across the book
&lt;a href="http://www.eijkhout.net/tbt/"&gt;TeX by Topic, a TeXnician's Reference&lt;/a&gt;, by Victor Eijkhout.
The original book, published in 1992, has been out of print for a few years,
but it is available from the author's web site as a 289pp PDF. Chapter 1
covers the low-level input processor and expansion, and the subsequent 269
pages go below that, into the nasty implementation details of TeX:
&lt;/p&gt;

&lt;blockquote&gt;
"The four levels [of input processing] (corresponding roughly to the 'eyes',
'mouth', 'stomach' and 'bowels' respectively in Knuth's original terminology
..."
&lt;/blockquote&gt;

&lt;p&gt;
Thanks to the quick search capabilities of my PDF reader I was able to find
the details I needed about macro expansion within the few minutes of sanity
that this book affords the casual reader.
If you are planning to start a neo-tribalistic techno-cult,
I highly recommend this book as a means of torturing hapless
cyber-virgins during their last few hours before sacrifice.
&lt;/p&gt;

&lt;p&gt;
And keep it real: Respect Knuth's original terminology.
&lt;/p&gt;</description><link>http://blog.kfish.org/2007/02/greek-cyber-virgins-in-latex.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-4902773631788194611</guid><pubDate>Tue, 06 Feb 2007 15:03:00 +0000</pubDate><atom:updated>2007-06-03T10:10:12.461+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>microcontroller</category><category domain='http://www.blogger.com/atom/ns#'>ubuntu</category><title>Atmel AVR ISP mkII, avrdude, Ubuntu</title><description>&lt;p&gt;
Inspired by
&lt;a href="http://lca2007.linux.org.au/talk/23"&gt;Vik Olliver&lt;/a&gt; and
&lt;a href="http://lca2007.linux.org.au/talk/185"&gt;Geoffrey Bennett&lt;/a&gt; at
&lt;a href="http://lca2007.linux.org.au/"&gt;LCA 2007&lt;/a&gt;,
and by seeing a class full of 9 year old Japanese kids building and programming robots (part of a field trip at &lt;a href="http://www.cm.is.ritsumei.ac.jp/c5-07/"&gt;C5 2007&lt;/a&gt;), I grabbed an &lt;a href="http://en.wikibooks.org/wiki/Embedded_Systems/Atmel_AVR"&gt;AVR microcontroller&lt;/a&gt; and started playing.
&lt;/p&gt;

&lt;p&gt;
This post just contains some gotchas for what seem to be common problems.
&lt;/p&gt;

&lt;p&gt;
My aim tonight was just to build and program a simple test circuit, as outlined in
Guido Socher's article &lt;a href="http://www.linuxfocus.org/English/November2004/article352.shtml"&gt;Programming the AVR microcontroller with GCC&lt;/a&gt;. As I have plenty of USB ports but no parallel port, I bought a USB-based in-system
programmer, the AVR ISP mkII. My laptop is running Ubuntu 6.10 (Edgy), but most
of the software required comes straight out of universe, ie. basically pulled directly from Debian.
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;To use the mkII rather than a parallel port programmer, refer to the
&lt;a href="http://www.equinox-tech.com/products/details.asp?ID=362#f622"&gt;pin-out for the AVR ISP header&lt;/a&gt;. Add a separate Vcc line to the programmer, the rest
of the circuit is unchanged.
&lt;/li&gt;

&lt;li&gt;The cross-compiler toolchain is packaged in Debian, as is avrdude (to upload and download the code, dude):
&lt;pre&gt;
apt-get install binutils-avr gcc-avr avr-libc avrdude
&lt;/pre&gt;
There are some buildable examples in &lt;tt&gt;/usr/share/doc/avr-libc/examples&lt;/tt&gt;.
&lt;/li&gt;

&lt;li&gt;
Guido Socher provides
&lt;a href="http://www.linuxfocus.org/common/src2/article352/avrm8ledtest-0.2.tar.gz"&gt;an updated tarball&lt;/a&gt; for the demo code in his tutorial. It contains a pre-built image and a useful Makefile.
&lt;/li&gt;

&lt;li&gt;
To add an entry for the AVR ISP mkII to udev, add the following to
&lt;tt&gt;/etc/udev/rules.d/40-permissions.rules&lt;/tt&gt;:

&lt;pre&gt;
#AVRISP mkII
SUBSYSTEM=="usb_device",SYSFS{idVendor}=="03eb",SYSFS{idProduct}=="2104",MODE="0666"
&lt;/pre&gt;
(from &lt;a href="http://yuki-lab.jp/linux/ubuntu.html"&gt;Yuki's Linux Memo&lt;/a&gt;).
&lt;/li&gt;

&lt;li&gt;
Apparently the factory setting for the clock on these chips is fairly random,
so you need to set that explicitly before you try doing anything else like
uploading code. If you don't do that, then you'll get a bunch of nasty timeout
errors like
&lt;tt&gt;avrdude: stk500v2_paged_write: write command failed with 128&lt;/tt&gt;.
So set the clock by dropping into the avrdude terminal:
&lt;pre&gt;
$ avrdude -c avrispmkII -P usb -p m8 -F -u -t
avrdude&gt; sck 10 
&gt;&gt;&gt; sck  10
Using p = 10.37 us for SCK (param = 7)
avrdude&gt; quit
&gt;&gt;&gt; quit 

avrdude done.  Thank you.
&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.kfish.org/2007/02/atmel-avr-isp-mkii-avrdude-ubuntu.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-5061965354011642226</guid><pubDate>Sun, 17 Dec 2006 14:42:00 +0000</pubDate><atom:updated>2007-06-03T10:07:14.086+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>haskell</category><category domain='http://www.blogger.com/atom/ns#'>scripting</category><title>Introductory Haskell Programming in the UNIX Environment</title><description>&lt;p&gt;
A few months back I was chatting to
&lt;a href="http://www.cse.unsw.edu.au/~dons/"&gt;Don Stewart&lt;/a&gt;
about scripting in Haskell, and he pointed me towards some
&lt;a href="http://bother.kfish.org/wiki/HaskellShellScripts"&gt;Haskell shell scripts&lt;/a&gt; he's written.
&lt;/p&gt;
&lt;p&gt;
This weekend, Don wrote some introductory tutorials.
&lt;a href="http://cgi.cse.unsw.edu.au/~dons/blog/2006/12/16#programming-haskell-intro"&gt;Part 1&lt;/a&gt; introduces Haskell in a similar style to how the Camel book
introduces Perl -- quite readable, and fairly low on mathematical jargon.
&lt;a href="http://cgi.cse.unsw.edu.au/~dons/blog/2006/12/17#programming-haskell-part-2"&gt;Part 2&lt;/a&gt; introduces character and file IO, which I'll dig into below.
&lt;/p&gt;

&lt;b&gt;Why bother?&lt;/b&gt;

&lt;p&gt;
It turns out that you can re-implement the core of many
&lt;a href="http://haskell.org/haskellwiki/Simple_unix_tools"&gt;simple UNIX tools&lt;/a&gt;
as one-liners in Haskell. This is interesting because, like C, Haskell
compiles to a binary and runs like a real program. It's also interesting because,
unlike C, Haskell provides lots of error checking, as well as guarantees
against segfaults and memory leaks, for free.
&lt;/p&gt;

&lt;b&gt;Lazy evaluation&lt;/b&gt;

&lt;p&gt;
Consider the following implementation of &lt;tt&gt;cp&lt;/tt&gt; (from &lt;a href="http://cgi.cse.unsw.edu.au/~dons/blog/2006/12/17#programming-haskell-part-2"&gt;Part 2&lt;/a&gt;),
which copies the file named by its first argument to the file named by its second:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
import System.Environment

main = do
  [infile, outfile] &lt;- getArgs
  s &lt;- readFile infile 
  writeFile outfile s
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Although this is pretty simple to understand, it looks like it reads the
entire contents of the input file into the variable s, and then writes that
to the output file. That would be a huge memory hog, so let's take a look
at what's actually going on.
&lt;/p&gt;

&lt;p&gt;
Haskell compiles to a binary, so we can
&lt;a href="http://www-128.ibm.com/developerworks/aix/library/au-unix-strace.html"&gt;strace&lt;/a&gt; the resulting program:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
$ strace -o /tmp/cp.out ./cp bigfile.ogg /tmp/bigfile-copy.ogg
$ less /tmp/cp.out
...
read(3, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306&lt;\216"..., 8192) = 8192
write(4, "\300\23n\261\205\v\fD$\r\330,\260\2172Zp\241h\306&lt;\216"..., 8192) = 8192 
read(3, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
write(4, "\2646\353t\304\300\f9|\36\10|O@r|\3149\3\340v{4\366|\17"..., 8192) = 8192
...
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
We see that it has actually set up an 8K temporary buffer to funnel data back
and forth, keeping the memory requirements very low. So the code was not
a memory hog at all, even though it's pretty simple to understand.
&lt;/p&gt;
&lt;p&gt;
The way this works is that &lt;code&gt;s&lt;/code&gt; is not a fully evaluated &lt;code&gt;String&lt;/code&gt;
at all. &lt;code&gt;readFile&lt;/code&gt; is an &lt;code&gt;IO String&lt;/code&gt; action, and the
&lt;code&gt;String&lt;/code&gt; it hands back is lazy: its characters are only read from the
file as &lt;code&gt;writeFile&lt;/code&gt; demands them. It lives in a very
beautiful, transient and continually changing state of interaction where it
might read some chars, write some, read some, write some, and so on until EOF.
This is all that a lazy &lt;code&gt;String&lt;/code&gt; could want from its brief yet
pristine existence, and nothing more.
&lt;/p&gt;

&lt;b&gt;Pass the pipe&lt;/b&gt;
&lt;p&gt;
Giving this lazily read &lt;code&gt;String&lt;/code&gt; a name is
conceptually similar to the use of named pipes in shell scripts. A direct
translation of the above Haskell script into &lt;code&gt;sh&lt;/code&gt; might be:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
#!/bin/sh

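infile=$1
outfile=$2

# Create a uniquely named pipe to play the role of s
s="${TMPDIR-/tmp}/$$.fifo"
mkfifo "$s"

# Reader runs in the background; writer feeds it through the pipe
cat &lt; "$s" &gt; "$outfile" &amp;
cat &lt; "$infile" &gt; "$s"

# Let the background reader finish before removing the pipe
wait
rm "$s"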
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Of course, this example is trivial; you'd only use named pipes for more
complex tasks, such as setting up
&lt;a href="http://blog.kfish.org/software/scripting/transcoding_fifos.html"&gt;transcoding pipelines&lt;/a&gt;, where you might not know the names or parameters of the
commands to be run up front.
So, what if your shell script doesn't need to be so complex? What if you don't
need to name your intermediate pipe?
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
cat "$infile" | cat &gt; "$outfile"
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
Well, that's fine in Haskell too:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
readFile f &gt;&gt;= writeFile g
&lt;/pre&gt;
&lt;/blockquote&gt;

&lt;p&gt;
No more naming our intermediate &lt;code&gt;IO String&lt;/code&gt;. But now we know that
it's still there, lurking inside that little &lt;code&gt;&amp;gt;&amp;gt;=&lt;/code&gt;. This uses
&lt;em&gt;lazy evaluation&lt;/em&gt;, and we read in the
&lt;a href="http://www.hhhh.org/wiml/virtues.html"&gt;Camel book&lt;/a&gt; that laziness
is the first virtue of a programmer; Haskell gives it to you in spades.
&lt;/p&gt;
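
&lt;p&gt;
To make the equivalence concrete, here is a sketch with both spellings side by
side. The function names &lt;code&gt;copyNamed&lt;/code&gt; and &lt;code&gt;copyPiped&lt;/code&gt;
are hypothetical, chosen only for this comparison; both use
&lt;code&gt;readFile&lt;/code&gt; and &lt;code&gt;writeFile&lt;/code&gt; from the Prelude exactly as
above.
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
import System.Environment (getArgs)

-- The intermediate value is given a name, like the named pipe.
copyNamed :: FilePath -&gt; FilePath -&gt; IO ()
copyNamed f g = do
  s &lt;- readFile f
  writeFile g s

-- The same action with the name removed, like the anonymous pipe.
copyPiped :: FilePath -&gt; FilePath -&gt; IO ()
copyPiped f g = readFile f &gt;&gt;= writeFile g

main :: IO ()
main = do
  args &lt;- getArgs
  case args of
    [f, g] -&gt; copyPiped f g
    _      -&gt; putStrLn "usage: cp INFILE OUTFILE"
&lt;/pre&gt;
&lt;/blockquote&gt;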

&lt;!--
&lt;p&gt;
And yes, we can change the size of that buffer using 
&lt;a href="http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html#v%3AhSetBuffering"&gt;hSetBuffering&lt;/a&gt;:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;pre&gt;
import System.Environment
import System.IO

main = do
  [f,g] &lt;- getArgs
  fh &lt;- openFile f ReadMode
  gh &lt;- openFile g WriteMode
  mapM (\h -&gt; hSetBuffering h (BlockBuffering (Just 102400))) [fh, gh]
  hGetContents fh &gt;&gt;= hPutStr gh
&lt;/pre&gt;
&lt;/blockquote&gt;
--&gt;</description><link>http://blog.kfish.org/2006/12/introductory-haskell-programming-in.html</link><author>Conrad Parker</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-9101292118679422945.post-7635455074903435207</guid><pubDate>Sun, 03 Dec 2006 10:52:00 +0000</pubDate><atom:updated>2007-06-03T10:05:50.939+09:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>kyoto</category><title>Living in Kyoto</title><description>&lt;p&gt;
Autumn leaves are falling, and the days are becoming cooler. Awesome hacking
weather.
&lt;/p&gt;
&lt;p&gt;
It's now two months since I came to Japan. I spend half my time studying
Japanese, half my time coding and doing research, and half my time partying.
I've done a fair bit of sightseeing around Kyoto's
&lt;a href="http://images.google.co.jp/images?q=kyoto+temple"&gt;temples&lt;/a&gt;,
&lt;a href="http://images.google.co.jp/images?q=kyoto+shrine"&gt;shrines&lt;/a&gt; and
&lt;a href="http://images.google.co.jp/images?q=kyoto+garden"&gt;gardens&lt;/a&gt;,
partied in Tokyo with
&lt;a href="http://www.e-sa.org/"&gt;Alex&lt;/a&gt;,
&lt;a href="http://www.vergenet.net/~horms/"&gt;Horms&lt;/a&gt; and
&lt;a href="http://www.rasterman.com/"&gt;Raster&lt;/a&gt; a couple of times,
and learned my way around the local
&lt;a href="http://www.kyopro.kufs.ac.jp/dp/dp01.nsf/ecfa8fdd6a53a7fc4925700e00303ed8/fa5329ae6259f7924925704a00285658!OpenDocument"&gt;ramen&lt;/a&gt;
joints.
My favourite cocktail bar stocks over 20 different varieties of gin,
including their own pepper-infused variety which tastes great with
stingray (flamb&amp;eacute;d in your face, with spiritus). Student life is hell.
&lt;/p&gt;

&lt;p&gt;
I've been car-free for two months now; it feels great to kick the gasoline
habit. I ride a mama-chari, a steel-framed bicycle with a basket on the front.
At first I thought of it as the cheap, chunky brick of a bike that it is. Then
I realised that I have about the same strength to bike-weight ratio as I did
when I was a kid on a BMX. So now I take every opportunity to bunny-hop random
obstacles, get air off pavement and jump gutters. For some reason old people
look at me funny when I do that.
&lt;/p&gt;

&lt;p&gt;
As soon as I arrived and got my alien registration sorted, I sat down on a
borrowed laptop to review the 350 awesome paper submissions for
&lt;a href="http://lca2007.linux.org.au/"&gt;linux.conf.au 2007&lt;/a&gt;. The
&lt;a href="http://lca2007.linux.org.au/Programme"&gt;programme&lt;/a&gt; is packed
full of more awesome than a noseful of wasabi. Go there or lose, your choice.
&lt;/p&gt;

&lt;p&gt;
More recently I've been helping organise
&lt;a href="http://www.annodex.org/events/foms2007/"&gt;FOMS 2007&lt;/a&gt;, a developer
workshop for free and open source media software, the week before LCA.
It too will rock, and give us a great opportunity to plan some real integration
between projects, like video editors + annodex + wikipedia. Rock on!
&lt;/p&gt;

&lt;p&gt;
&lt;b&gt;Recent hacking&lt;/b&gt;:
&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://www.annodex.net/~conrad/software/hogg.html"&gt;HOgg&lt;/a&gt;:
a new commandline tool and Haskell library for manipulating Ogg files.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://bother.kfish.org/wiki/BlenderNotes"&gt;Blender Scripts&lt;/a&gt;:
Getting Blender to &lt;em&gt;model&lt;/em&gt;, animate and render non-interactively.&lt;/li&gt;
&lt;li&gt;&lt;a href="http://www.opencroquet.org/"&gt;Croquet&lt;/a&gt;: An open source,
fully hackable 3D metaverse. I'm just learning my way around -- come play in
#croquet on irc.freenode.net!&lt;/li&gt;
&lt;/ul&gt;</description><link>http://blog.kfish.org/2006/12/living-in-kyoto.html</link><author>Conrad Parker</author></item></channel></rss>