blog.kfish.org

My name is Conrad Parker, and I live in Kyoto, Japan. I work with Renesas in Tokyo, designing the Linux multimedia architecture for a new line of mobile processors; and for Wikimedia Foundation, working on Ogg integration for Mozilla Firefox. I am also working towards a PhD in Computer Science at Kyoto University. Free software projects include the Sweep sound editor and the Annodex media system, and various smaller ones that you can read about here.

Follow me on Twitter: @conradparker.

Wednesday, 8 April 2009

Release: libfishsound 0.9.2

Fishsound has moved to Xiph.org! The new home page is at http://www.xiph.org/fishsound/.

New in this release

This release contains security and other bugfixes:

  • Security fixes related to Mozilla bugs 468293, 480014, 480521, 481601.
  • Fix bounds checking of mode in Speex header
  • Handle allocation failures (out of memory) throughout
  • Added support for libFLAC 1.1.3
  • Add conditional support for speex_lib_get_mode() from libspeex 1.1.7. If available, this function is used in place of static mode definitions. For ticket:419
  • Check for Vorbis libs via pkgconfig, required for MacPorts etc.

A proposal for generalizing the byte-range referral HTTP Response header

Re: the Media Fragments WD. Here I am using the term "byte-range referral" for multiple concatenated HTTP requests, for the purpose of improving cacheability; this is called a "4-way handshake" in the current working draft.

Shortcomings of the existing byte-range referral scheme

The above WD, and the current Annodex scheme, are specified to allow sharing of non-header data between different temporal views of media resources. They limit the positioning of custom data to the media headers. This allows different segments to have different headers, which is useful for Ogg but not necessarily so for other formats.

Even for Ogg, it could be useful to refer to the codebooks separately from the Skeleton for more finely-grained data re-use. Then a client can locally cache the codebooks and know not to bother retrieving them over and over; but to get the updated skeleton and keyframe data for temporal segment requests.

Hence, I am proposing that we should specify an ordered list of tuples of (URI, byte range), the concatenation of which is byte-wise identical to the byte contents of the requested URI.

This response can also contain data, so if you want to refer to this response you can include a tuple of (this, range) where this is the literal string "this", and refers to the body of the current response.

This syntax then allows the server to include parts from many different URLs. The custom data is then centralized in this response, and can be used for any part of the constructed response, including tail data (such as ID3 tags, DivX seek tables etc.).

List and tuple separator characters

The list separator should be a comma, as this then allows the list to be split over multiple HTTP response header lines (without re-ordering).

Hence the tuple separator should not be a comma; it can simply be whitespace:

Range-Referral: http://www.example.com/video.ogv?headers 0-1280
Range-Referral: http://content1.example.com/video.ogv 5380-48204
Range-Referral: this 0-950
Range-Referral: http://content1.example.com/video.ogv 60880-238382

By comma replacement, this set of headers is equivalent to the single header:

Range-Referral: http://www.example.com/video.ogv?headers 0-1280, http://content1.example.com/video.ogv 5380-48204, this 0-950, http://content1.example.com/video.ogv 60880-238382
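To make the separator rules concrete, here is a minimal Python sketch of a parser for the proposed header. The function and type names are made up for illustration. It assumes that ranges are inclusive (as in HTTP byte ranges) and that URIs contain no unescaped commas or whitespace; neither assumption is settled by the proposal itself.

```python
from typing import List, NamedTuple


class Referral(NamedTuple):
    uri: str    # a URI, or the literal "this" for the current response body
    first: int  # first byte offset (assumed inclusive)
    last: int   # last byte offset (assumed inclusive)


def parse_range_referral(header_values: List[str]) -> List[Referral]:
    """Parse one or more Range-Referral header values, preserving order."""
    # Joining repeated headers with commas is the "comma replacement" step.
    combined = ", ".join(header_values)
    referrals = []
    for item in combined.split(","):      # list separator: comma
        item = item.strip()
        if not item:
            continue
        # Tuple separator: whitespace between the URI and the byte range.
        uri, byte_range = item.rsplit(None, 1)
        first, last = byte_range.split("-")
        referrals.append(Referral(uri, int(first), int(last)))
    return referrals
```

Parsing the four separate headers and parsing the single combined header should give the same ordered list of tuples.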

Interpretation of other response headers

The body of this response is simply all the custom parts for this view, concatenated bytewise. The Range-Referral header explains how to use this data.

Content-Length: is the length of the body.

A Range request is made relative to the body. So for example a client could just do a HEAD request to get the Range-Referral headers, and then do multiple Range requests to retrieve the required parts in sequence (rather than locally caching all the data for tailers etc.). Coherence of the concatenated responses can be assured by the use of existing HTTP/1.1 caching identifiers.

So, this constructed response is only special in that a user agent knows how to use it in conjunction with other URI response data to display a media segment. Otherwise it is standard HTTP, and can have caching headers/tags attached, be cached by intermediate proxies, and itself be the subject of range requests.
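The client side of this could look roughly like the Python sketch below. It is only an illustration: `assemble` and the `fetch` callable are hypothetical names, `fetch` stands in for whatever issues a byte-range request (or consults a cache) for a URI, and the ranges are assumed to be inclusive.

```python
from typing import Callable, Iterable, Tuple


def assemble(referrals: Iterable[Tuple[str, int, int]],
             body: bytes,
             fetch: Callable[[str, int, int], bytes]) -> bytes:
    """Concatenate the referred byte ranges, in order, into the segment view.

    referrals: (uri, first, last) tuples; uri may be the literal "this".
    body:      the body of the current response (the custom data).
    fetch:     hypothetical helper returning bytes first..last of a URI.
    """
    parts = []
    for uri, first, last in referrals:
        if uri == "this":
            # Custom data comes from the body of this response.
            parts.append(body[first:last + 1])
        else:
            # Re-usable data comes from elsewhere and may already be cached.
            parts.append(fetch(uri, first, last))
    return b"".join(parts)
```

Because the re-usable ranges are keyed by URI and byte range, `fetch` can be backed by an ordinary HTTP cache.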

Generalization to other segment types

This mechanism allows a complex sequence of byte-ranges to be specified. It explicitly marks data ranges which are re-usable, allowing them to be cached. It generalizes so that any complex data subview can be served, where re-usable data is keyed canonically and can be cached on the network.

For example, it may be useful for specifying the data for a spatial subrange of video.

Friday, 3 April 2009

liboggplay, liboggz, libfishsound migrated to git.xiph.org

The source repositories for some Ogg libraries developed as part of the Annodex project have moved from svn.annodex.net to git.xiph.org. These libraries are:

  • liboggplay, an Ogg Theora playback library used by Mozilla Firefox;
  • libfishsound, a simplified API for using audio codecs, used by liboggplay and by the DirectShow Oggcodecs; and
  • liboggz, a library for seeking, reading and writing Ogg (used by liboggplay), and tools for managing Ogg streams. This includes oggz-chop, which is used by various sites including the Internet Archive to serve Ogg files.

Reasons for the migration

Xiph.org, which develops free codecs (Ogg Vorbis, Theora, Dirac, Speex, CELT, FLAC), already provided the hosting for Annodex.net projects. The move to the xiph.org domain reflects that these libraries are recommended for general use by projects requiring Ogg support.

The move from Subversion to Git allows for distributed development, letting developers without write access to the central Subversion repository develop code using a version control system, and making it easier for developers and packagers to track multiple independent changes. Among distributed version control systems, Git was chosen for its flexibility and popularity. It is already used within Xiph.org for Speex, the ultra-low latency, high quality audio codec CELT, and the experimental text overlay codec Kate.

Checking out the sources

To do a fresh checkout of the code, clone the git repository. This assumes that you begin with an empty working directory:

$ git clone git://git.xiph.org/liboggz.git

Adding a remote to an existing git-svn checkout

Many developers already used git-svn to access the previous svn repositories. In this case you will already have a local git clone of the sources, perhaps with your own local changes. Simply add a new remote to your existing repository, eg.:

$ git remote add xiph git://git.xiph.org/liboggz.git

Wednesday, 1 April 2009

Discovery and fallback for media segment addressing over HTTP

This post concerns the use of queries or fragments in the URI specification for accessing segments of media over HTTP. We outline the user-visible differences between the two approaches, including the form of the URIs seen by users in each scenario and the consequent user interface activity, and then explain the HTTP request and response mechanisms that result. The purpose of this analysis is to better understand the trade-offs in usability and the impact on network performance, with reference to existing implementations rather than hypothetical scenarios.

I will make the case that the user-visible differences between the two syntaxes are immaterial, and that a more important distinction is that they induce different protocols. I will also claim that the use of the fragment syntax introduces unnecessary complexity in that it lacks a discovery mechanism and has no useful fallback to existing HTTP.

User-visible differences

We are constructing a URI syntax for addressing segments of media data. Taking the simple case of addressing some video content beginning at an offset of 10 seconds, we consider the two forms:

  • Query syntax: http://www.example.com/media.ogv?t=10
  • Fragment syntax: http://www.example.com/media.ogv#t=10

For simplicity here we are using a shortened segment identifier t=10; I touched on the topic of segment identifiers in a recent article about pretty printing durations.

Regarding the direct HTTP semantics of these two forms, if the user is already viewing the specified media.ogv, the query syntax reloads the portion from 10 seconds as a new resource, whereas the fragment syntax modifies the view of the current resource.

Although developers are rightly wary of a page refresh due to the time required to render complex HTML, in practice no visible change occurs when reloading a video. The query syntax has been used to control video seeking in JavaScript (using the Java cortado video player plugin, or an earlier Oggplay plugin), and also natively in the current Firefox 3.5 implementation.

In any case, this distinction is only user-visible if the video is the top-level resource. In the common case of a web page that embeds a video, the user-visible resource is the HTML page. In this case, the mechanism for controlling video is under the control of the embedding web page via JavaScript.

For example, URIs to YouTube pages allow a time segment to be appended using a fragment syntax. However, this fragment is used by JavaScript to control the embedded Flash video player; the mechanism for retrieving video data is then managed by the Flash player. Similarly, in HTML5 Ogg <video> implementations, a fragment identifier appended to the HTML page may be interpreted by JavaScript to control seeking in the <video> source using a non-fragment mechanism, like query syntax.

Differences in request mechanisms

Either way we introduce a new behaviour that user agents can use to retrieve media segments over HTTP.

When handling a media segment which is specified by a query, the user agent initiates a standard HTTP request. It connects to port 80 on the specified host, and uses the entire path, including the query specifier, in the GET request. The server then begins transferring the required data representing that segment of the media.

To retrieve the URI http://www.example.com/media.ogv?t=10:

GET /media.ogv?t=10 HTTP/1.1
Host: www.example.com

However the proposed request mechanism for handling a segment specified by a fragment is not standard HTTP. In conventional HTTP, a fragment specifier is stripped by the user agent and not sent to the server at all; rather, the server sends the requested response (representing the entire resource), and after retrieval, the user-agent uses the fragment specifier to select the view shown to the user.

A recently proposed behaviour for handling media segments involves placing the segment specifier into the Range HTTP Request header, with a new unit of seconds.

To retrieve the URI http://www.example.com/media.ogv#t=10:

GET /media.ogv HTTP/1.1
Host: www.example.com
Range: seconds=10-
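The difference between the two request forms can be sketched in Python as follows. This is an illustration only: `build_request` is a made-up helper, and the mapping from a `t=` fragment to a `Range: seconds=` header is the proposed behaviour described above, not standard HTTP.

```python
from urllib.parse import urlsplit


def build_request(uri: str) -> str:
    """Build the HTTP/1.1 request text a user agent might send for uri."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        # Query syntax: the segment specifier is part of the request target.
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/1.1", f"Host: {parts.netloc}"]
    if parts.fragment.startswith("t="):
        # Fragment syntax: the fragment is never sent in the request target;
        # under the proposed behaviour it moves into a Range header instead.
        lines.append(f"Range: seconds={parts.fragment[2:]}-")
    return "\r\n".join(lines) + "\r\n\r\n"
```

For the query form the request is plain HTTP. For the fragment form the request target is the whole resource, and only the extra header distinguishes it from a request for the entire file.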

Response mechanism: byte-range redirection

The byte-range redirection response mechanism involves identifying parts of the segment view which are byte-wise identical to the original resource, and specifying redirections to those.

How discovery works

A user-agent will only receive a byte-range redirection response if it has indicated that it is capable of interpreting that, by including an extra HTTP request header. For example, here using a media segment URL specified with a query parameter:

GET /media.ogv?t=10 HTTP/1.1
Host: www.example.com
X-Accept-Range-Redirect: bytes

If the server is capable of handling the byte-range redirection mechanism, it will do so and indicate that it has done so explicitly in its response headers.

Query syntax has a sensible fallback to standard HTTP

However if the extra request header is not present, the server will simply send an entire response corresponding to the requested segment. Similarly if the header is present but the server is not capable of this new mechanism, it will simply continue with a standard HTTP response. The client can tell if the response is a segment response or not by the presence of an acknowledging response header.

If either client or server does not understand the byte-range redirection protocol, the request falls back to standard HTTP and the required segment is correctly returned. The cost of this fallback, compared to the case where both client and server understand the new request/response headers, is a loss of cacheability for subsequent overlapping segment requests.

Fragment syntax has a high cost of failure

The mechanism involving the fragment specifier does not have a fallback to standard HTTP: if the client does not understand that it should add the Range header with newly defined units, then it will end up simply requesting the entire resource. Similarly, if the server does not understand the new header then it will simply respond with the entire resource. If the cost of failure is to download some number of hours of extra video, as it would be in the case of MetaVid's congress proceedings, that is a prohibitive cost.
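As a rough way to compare the two failure modes, the following Python sketch models how many bytes end up being transferred in each case. The function name and its parameters are made up for illustration, and it assumes the server implements `?t=` segment queries at all; only the behaviour it encodes comes from the discussion above.

```python
def bytes_transferred(syntax: str, client_ok: bool, server_ok: bool,
                      segment_bytes: int, resource_bytes: int) -> int:
    """Bytes sent for one segment request (a hypothetical model).

    client_ok / server_ok: whether each side understands the new
    request/response headers.
    """
    if syntax == "query":
        # The server resolves ?t=... itself. Without the new headers only
        # cacheability is lost; the segment is still what gets sent.
        return segment_bytes
    if syntax == "fragment":
        # Both sides must understand "Range: seconds=..."; otherwise the
        # entire resource is transferred.
        return segment_bytes if (client_ok and server_ok) else resource_bytes
    raise ValueError("syntax must be 'query' or 'fragment'")
```

With a segment of a few megabytes inside a multi-hour recording, the fragment case with an unaware client or server returns the full `resource_bytes`, which is the cost of failure described above.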

Summary

  • The distinction is one of protocol mechanism
  • For the common case of video displayed in HTML, the distinction is not user-visible
  • The use of fragment specifiers does not have a fallback to standard HTTP
  • The cost of discovery failure for fragments is high (retrieval of entire resource)

Actions

  • To clarify within the Media Fragments WG how queries can be used effectively, for both considered user scenarios.
  • To consider how the byte-range redirection mechanism can be generalized for other segment specifiers, such as spatial regions.

Tuesday, 24 February 2009

Is OpenMAX important for Free Software?

Much as OpenGL gives you access to 3D hardware, OpenMAX allows you to take advantage of hardware codecs. This is a brief overview introducing what OpenMAX is, explaining why it is useful for the open source community, and outlining steps for integration with free codecs, and open source multimedia frameworks and applications.

What is OpenMAX?

OpenMAX is a set of C APIs specified by the Khronos Group (who also co-ordinate standards like OpenGL and OpenAL). Whereas media frameworks like GStreamer and DirectShow are quite generic, providing all capabilities from codec integration through to synchronization of playback and recording and network access, OpenMAX more strictly defines three layers of operation:
  • OpenMAX IL (Integration Layer) is an interface to multimedia codecs implemented in hardware or software. It does not provide any interfaces for synchronized capture or playback of video and audio.
  • OpenMAX DL (Development Layer) APIs "specify audio, video and imaging functions that can be implemented and optimized on new CPUs, hardware engines, and DSPs and then used for a wide range of accelerated codec functionality such as MPEG-4, H.264, MP3, AAC and JPEG."
  • OpenMAX AL (Application Layer) provides acceleration of capture and presentation of audio, video, and images.
The significance of this layering is that it allows hardware and software developers to implement conformance to a particular layer, so that device manufacturers can more reliably integrate components from each. This creates a free market for media components as commodities; and of course open source businesses are well suited to operating in such an environment.

OpenMAX is already available in generally open source platforms like Maemo and Android. As part of my work with Renesas I've been developing OpenMAX IL components for the video encoding and decoding hardware on the SH-Mobile processor series. (However, this post does not necessarily reflect the views of my employer.)

Open Source implementations

OpenMAX components implement a specific C API. All components need to manage their ports and synchronize access to their input and output data buffers, so implementations generally include a shared library for the IL core, as well as some OpenMAX components required to pass Khronos conformance tests. There are (at least) three open source implementations of OpenMAX IL:
  • Bellagio, developed mainly by STMicroelectronics and Nokia.
  • TI have an implementation of OpenMAX for OMAP.
  • OpenCore, the multimedia framework used by the Android platform, includes an open source implementation of OpenMAX IL. [gitweb]
So far I've been working with Bellagio, which has an active open source community. It has a good balance between commercial concerns like manufacturer deadlines and conformance testing, and openness to the community by encouraging and integrating development forks, and having a responsive mailing list and bug tracker.

Xiph OpenMAX

I haven't mentioned specific codecs yet; OpenMAX currently encourages use of non-free codecs like MP3, MPEG-4 and H.264. This in itself is not good for the aims of Free Software, but I think that the API standardization that OpenMAX offers can simplify the productization of hardware implementations of free codecs.

Xiph.org develops free codecs (Ogg Vorbis, Theora, Dirac, Speex, CELT, FLAC). Ogg Vorbis is required by the OpenMAX IL specification, but there are not yet any OpenMAX IL implementations of the other codecs. Developing software OpenMAX IL components will allow application developers to implement Ogg support ahead of hardware support. It would also give hardware manufacturers a set of specific, well-defined goals for implementing Ogg support, with the understanding that the hardware components, when shipped with these software control APIs, will work in a variety of open source applications with minimal modifications.

There were a few Xiph.org people at FOMS 2009, so I introduced what we'd need to do to implement OpenMAX IL components for Xiph.org codecs:

  • Choose an OpenMAX IL framework
  • Implement generic Ogg mux/demux components (instead of single Ogg Vorbis component)
  • Implement IL components for each codec (Theora, Dirac, Speex, CELT, FLAC)
  • Implement GStreamer OpenMAX plugins for each codec

A recent thread, [Flac-dev] FLAC support for Android?, discusses requirements for implementing an OpenMAX IL component for the lossless audio codec FLAC.

Free Software application support

In order to make use of OpenMAX components, applications need to either use the OpenMAX APIs directly or use a framework which does. For example, there is already an OpenMAX-GStreamer project which implements GStreamer plugin wrappers for Bellagio OpenMAX IL components. This allows any GStreamer application to take advantage of hardware codecs when they are available, or fall back to software implementations otherwise. This fits well with the GStreamer project's stated aim of not implementing codecs, but providing routing, discovery and synchronization.

Other applications will need to use OpenMAX directly; good candidates would be applications that target mobile/embedded systems like Gnash, Fennec, WebKit and VoIP clients, as well as server-side transcoding or rendering software that needs high throughput.

Remember this:

  • Mobile processors increasingly have hardware units for video encoding and decoding, as well as audio and image processing
  • OpenMAX gives you access to hardware codecs (audio/video, image processing etc.)
  • Implementing OpenMAX components for free codecs will give manufacturers a clear path to hardware implementation

At some point in the near future it'd be great to get a few open source OpenMAX implementers together at a conference, ideally at a more general multimedia workshop like FOMS to discuss application integration. Perhaps at FOMS 2010, or FOMS Europe? In any case it'd be good to get some more discussion going: do you think OpenMAX is important for Open Source, and for Free Software? What other barriers do you think there are to hardware support for free codecs? And would you be interested in helping out with developing and testing OpenMAX support for your favourite codecs, and in your favourite applications?

Tuesday, 23 December 2008

Release: HOgg 0.4.1

A new release of HOgg is available on Hackage.

This contains updates to work with Hackage, the Haskell source package system; and also a new hogg man subcommand to generate man pages for subcommands.

Updated for Hackage

Hackage is Haskell's source packaging system. It makes it very easy to keep up to date with bleeding-edge releases.

You'll need the cabal command. This is already in Gentoo (emerge cabal) and Arch Linux (pacman -S cabal-install). If you're on a system where cabal is not already packaged, you'll first need to install GHC (eg. apt-get install ghc6 on Ubuntu 8.10 or Debian Lenny systems), then:

$ wget http://hackage.haskell.org/packages/archive/cabal-install/0.6.0/cabal-install-0.6.0.tar.gz
$ tar zxf cabal-install-0.6.0.tar.gz
$ cd cabal-install-0.6.0
$ chmod +x bootstrap.sh
$ ./bootstrap.sh

This will download and build the packages required to set up cabal. From there, a new Haskell package like hogg can be installed by simply doing:

$ cabal update
$ cabal install hogg

This will build and install hogg into $HOME/.cabal/bin (which of course you should add to your $PATH if you actually want to use anything you install via cabal :-)

man page output of self-documentation

hogg already generated its own help text, with runtime checking of example syntax. This release adds a hogg man subcommand which generates the same help text in Unix man page format:

$ hogg man man

.TH HOGG 1 "December 2008" "hogg" "Annodex"
.SH SYNOPSIS

.B hogg
.RI man
[
.I OPTIONS
]


.SH DESCRIPTION
Generate Unix man page for a specific subcommand (eg. "hogg man chop")

.SH OPTIONS
  -h, -?  --help     Display this help and exit
  -V      --version  Output version information and exit

.SH EXAMPLES
.PP
Generate a man page for the "hogg chop" subcommand:
.PP
.RS
\f(CWhogg man chop\fP
.RE
.SH AUTHORS

hogg was written by Conrad Parker

This manual page was autogenerated by
.B hogg man man.

Please report bugs to <ogg-dev@xiph.org>

Friday, 4 July 2008

Release: liboggz 0.9.8

liboggz 0.9.8 includes the first release of oggz-chop, as well as support for the new karaoke codec OggKate.

oggz-chop can be used to serve time ranges of Ogg media over HTTP by any web server that supports CGI. The oggz-chop binary simply checks if it is being run as a CGI script by checking some environment variables, and if so acts based on the CGI query parameter t=, much like mod_annodex. It accepts all the time specifications that mod_annodex accepts (npt and various smpte framerates), and start and end times separated by a /.

All you need to do is set up the following Apache config:

ScriptAlias /oggz-chop /usr/bin/oggz-chop
Action application/ogg /oggz-chop

With this in place, all your Ogg files will be handled by oggz-chop, which means that you can put a time range on the end, like:

http://www.example.com/candidate_speech.ogv?t=00:23/00:26

The minimal amount of data required to play the section between 23 and 26 seconds will be sent to you, such that it plays back immediately from the time requested. As for caching, it generates Last-Modified HTTP headers, and responds correctly to If-Modified-Since conditional GET requests.
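To illustrate how a `t=` value such as `00:23/00:26` maps to a time range, here is a small Python sketch. It is not oggz-chop's actual parser: the function names are made up, it only handles plain colon-separated clock values (no `npt:` prefix, no smpte frame rates), and it treats a missing end time as "until the end of the stream".

```python
from typing import Optional, Tuple


def parse_clock(text: str) -> float:
    """Convert "SS", "MM:SS" or "HH:MM:SS" (seconds may be fractional) to seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def parse_time_range(value: str) -> Tuple[float, Optional[float]]:
    """Split "START/END" into (start, end) in seconds; end is None if absent."""
    start, _, end = value.partition("/")
    start_seconds = parse_clock(start) if start else 0.0
    end_seconds = parse_clock(end) if end else None
    return start_seconds, end_seconds
```

For the URL above, `parse_time_range("00:23/00:26")` yields a start of 23 seconds and an end of 26 seconds.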

It implements the same chopping algorithm as the Haskell version hogg chop, released in HOgg 0.3.0. It inserts an Ogg Skeleton track which gives players hints about the time at which the in-sync audio and video data should start being rendered. If any of the input files include Skeleton information, that is preserved, and the output contains only one Skeleton track.

Many thanks to Michael Dale, j^ and John Ferlito for testing out oggz-chop during its development.

Monday, 7 April 2008

Release: libfishsound 0.9.1

This is a maintenance release, fixing a security vulnerability in Speex header processing as outlined in oCERT 2008-02. When used in a client for web video content, as in the OggPlay Firefox Plugin or the Ogg DirectShow filters, a specially crafted Ogg Speex stream hosted on a server could be used to allow an attacker to execute arbitrary code on the client system. The OggPlay plugin binaries available from www.annodex.net have already been updated.

Details

The Speex header contains a 32-bit modeID field, interpreted by libspeex as a signed int (spx_int32_t). The normal way to use this is to index into a global mode list to retrieve a SpeexMode *:

mode = (SpeexMode *)speex_mode_list[modeID];

and then use that to set up a decoder:

st = speex_decoder_init(mode);

This calls speex_decoder_init() in libspeex, which looks like:

void *speex_decoder_init(const SpeexMode *mode)
{
   return mode->dec_init(mode);
}

So if you don't check that the modeID given in the stream header is within the bounds of speex_mode_list[], arbitrary code can be executed. libfishsound was checking the upper bound (modeID < SPEEX_NB_MODES) but was not checking against negative values.
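The gap in the check is easy to demonstrate outside of C. The Python sketch below decodes a 4-byte field as a signed 32-bit integer and compares the two checks. The helper names are made up, and both the value of `SPEEX_NB_MODES` (3) and the little-endian byte order are assumptions made for the illustration.

```python
import struct

SPEEX_NB_MODES = 3  # assumed value, for illustration only


def read_mode_id(field: bytes) -> int:
    """Decode a 4-byte field as a signed 32-bit integer (little-endian assumed)."""
    return struct.unpack("<i", field)[0]


def upper_bound_only(mode_id: int) -> bool:
    """The incomplete check: negative values are accepted."""
    return mode_id < SPEEX_NB_MODES


def full_check(mode_id: int) -> bool:
    """Both bounds checked before the value is used as an index."""
    return 0 <= mode_id < SPEEX_NB_MODES
```

A header whose modeID bytes are all `0xff` decodes to -1, which passes `upper_bound_only` but is rejected by `full_check`. In C, such a value indexes memory before the start of the mode list.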

Discussion

This header processing is all boilerplate, and a reference implementation is given in speexdec.c. I took a copy of that about 7 years ago for Sweep, which I then adapted for libfishsound. The current reference speexdec.c does not have this bug.

For the Symbian port of Speex we created a function which returns the desired mode given a modeID, rather than having application code index into a global mode list. I wrote and committed speex_lib_get_mode() to libspeex in September 2004, and it does the correct bounds checking. So if I'd been using that function in libfishsound then today's problem would never have happened. As it turns out, the libfishsound svn trunk version of speex.c does use that function. As far as I am aware, the OggPlay plugin binaries have always been built against the libfishsound svn trunk, so they were never vulnerable in the first place. However, recent tarball releases of libfishsound have been coming off a separate branch, so the advisory is valid for applications linked against those releases.

Finally, I sent a patch to Jean-Marc Valin yesterday which entirely removes the possibility of this bug happening again by bounding the mode values returned by speex_packet_to_header() in libspeex. It will be available very soon in a libspeex release.

Acknowledgements

Thanks to the team at oCERT for the efficient reporting of this advisory, and to the anonymous submitter for the details. I was able to patch the offending branches, which allowed j^ to build and upload new OggPlay plugin binaries (within 24 hours of contact by oCERT).

Tuesday, 25 March 2008

Release: HOgg 0.4.0

HOgg is a Haskell library and commandline tool for manipulating Ogg files. This release contains a bunch of code written during FOMS and LCA 2008, including a new sort subcommand and proper handling of Skeleton when merging and ripping files. Full details are in the release notes.

sort implementation

My favourite part is the implementation of the new sort subcommand:

sort :: [OggPage] -> [OggPage]
sort = sortHeaders . listMerge . demux

This is somewhat shorter than the equivalent C implementation, oggz-sort.c. Haskell affords abstraction whereas in C it's a trade-off. sortHeaders is a long (21 line) function that re-orders header pages according to the Theora and Skeleton specifications, and listMerge is a generic list merging function, also used in the merge subcommand. demux is tiny:

demux :: (Serialled a) => [a] -> [[a]]
demux = classify serialEq

You can read that as "demux is classification by serial number": classify is a generic list function, classifying list elements according to some criterion you give it. Here, for example, the list of pages:

[Video0, Audio0, Video1, Audio1, Audio2, Audio3, Video2, Audio4, Video3, ...]

will get classified into two separate lists:

[[Video0, Video1, Video2, Video3, ...],
 [Audio0, Audio1, Audio2, Audio3, Audio4, ...]]

This is done lazily, meaning that the processing is done on the fly and big intermediate lists are not constructed in memory. Video0, Audio0 will be passed through listMerge and sortHeaders and written to disk by the consumer of sort well before Video103 and Audio5007 are seen.
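For readers who don't speak Haskell, the demux and merge steps can be approximated in Python as below. This is a loose analogue rather than a port of HOgg: pages are reduced to hypothetical (serial, time) tuples, the merge is assumed to order by presentation time, sortHeaders is left out, and unlike the Haskell version this `demux` is eager, so it reads all pages before returning.

```python
import heapq
from typing import Iterable, List, Tuple

# (serial number, presentation time): a simplified stand-in for an Ogg page.
Page = Tuple[str, float]


def demux(pages: Iterable[Page]) -> List[List[Page]]:
    """Classify pages by serial number, keeping first-seen track order."""
    tracks = {}
    for page in pages:
        tracks.setdefault(page[0], []).append(page)
    return list(tracks.values())


def list_merge(tracks: List[List[Page]]) -> List[Page]:
    """Merge per-track lists into one list ordered by presentation time."""
    # heapq.merge expects each input to be sorted by the key already.
    return list(heapq.merge(*tracks, key=lambda page: page[1]))
```

Composing the two, `list_merge(demux(pages))`, interleaves the tracks by time as long as each individual track is already in order.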

Documentation improvements and self-checking

The help for each subcommand now contains long descriptions, mostly similar to the man pages of the Oggz tools. The descriptions also have explicit sections describing how Theora, Skeleton and chained files are handled. The example commandlines for each subcommand use the Ogg MIME types and file extensions that we are now recommending in Xiph.Org.

The best bit though is hogg selfcheck, which checks that the help examples are valid. It checks that all the example commandlines pass through getOpt without errors, and that all file extensions used in options are valid. This is the kind of nice touch which would have been a pain to code up in C, but fell out cleanly in the Haskell implementation. As it is fairly cheap to run (and printing help text is hardly a performance-critical operation), this option is also silently run after printing out any help output at all, so that such errors are more likely to be found and reported. The same commit that introduced hogg selfcheck also fixed two such documentation errors which were found by this option :-)

Friday, 15 February 2008

Release: liboggz 0.9.7

There's been a whole bunch of work on liboggz recently; it deserves a few more weeks of shaking out and perhaps some updated Win32/MacOS support before it gets 1.0 slapped on it.

liboggz 0.9.7 includes a new tool called oggz-sort, which addresses a problem with some encoders that Shane Stephens brought up at FOMS. The discussion was going around in circles, so my response was to write this C code. It implements a function that Shane has written but not yet released in his OCaml implementation of Ogg (oogg), and which I've written but not yet released in my Haskell implementation (HOgg). Of course, people will take this version more seriously because it's written in C.

From oggz-sort (1):

oggz-sort sorts an Ogg file, interleaving pages in order of presentation time. It correctly interprets the granulepos timestamps of Ogg Vorbis, Speex, FLAC and Theora bitstreams, and all bitstreams of Annodex files.

Some encoders produce files with incorrect page ordering; for example, some audio and video pages may occur out of order. Although these files are usually playable, it can be difficult to accurately seek or scrub on them, increasing the likelihood of glitches during playback. Players may also need to use more memory in order to buffer the audio and video data for synchronized playback, which can be a problem when the files are viewed on low-memory devices.

The tool oggz-validate can be used to check the relative ordering of packets in a file. If out of order packets are reported, use oggz-sort to fix the problem.

This release also adds support for the experimental CELT audio codec, which is being developed by Jean-Marc Valin (the primary author of Speex). CELT is designed as a low-latency codec for high-quality audio. When wiretapping conversations encoded in CELT, we recommend that you record using the Ogg container format. You can then use oggz-tools to help with your analysis.

Sunday, 13 January 2008

Release: liboggz 0.9.6

This release of Oggz 0.9.6 contains a new tool, oggz-comment, which can be used to edit the basic metadata (title, producer, copyright etc.) of Ogg Theora files. The library also has some pretty major improvements to the way it works out timestamps and does seeking, mostly the work of Shane Stephens.

In media files, timing and synchronization are extremely important. If the image and audio start to go out of sync, it is very noticeable and the video quickly becomes unwatchable. When you scan through a file you often need to decode a lot more data than you actually display. This is particularly the case when you jump backwards, which is common in a user interface that supports scrubbing. As video frames are stored as a difference relative to earlier (or later) frames, you end up needing to secretly jump further back in the file to the previous keyframe, and then decode many frames up to the one you actually want to show. For a smooth user experience you need to do this as quickly as possible.
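The "jump back to the previous keyframe" step can be sketched in Python as follows. This is a simplified model, not liboggz code: frames are identified by plain indices, `keyframes` is assumed to be a sorted list of the indices that are keyframes, and the function name is made up.

```python
import bisect
from typing import List


def frames_to_decode(keyframes: List[int], target: int) -> range:
    """Return the frame indices that must be decoded to display `target`."""
    # Find the last keyframe at or before the target frame.
    i = bisect.bisect_right(keyframes, target) - 1
    if i < 0:
        raise ValueError("no keyframe at or before the target frame")
    # Decode from that keyframe up to and including the target.
    return range(keyframes[i], target + 1)
```

With keyframes every 64 frames, showing frame 70 means decoding frames 64 through 70, even though only the last one is displayed.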

Ogg has some interesting framing properties. Given that timing is so important, you might expect that every packet has its precise timing information associated with it. In Ogg, it turns out not to be so. Packets are stored in pages, and there is only one timestamp per page. It is common for many audio packets to be crammed onto one page; the timing information for all the rest is not stored in the file. On the other hand, the encoded data for video keyframes is usually much larger, and spans multiple pages. Only the last packet on a page has its timestamp recorded, so if the keyframe is followed by a much smaller packet of frame data in the same page, the timestamp for the keyframe will be lost. For these reasons I tend to refer to Ogg as a "lossy" container.

In order to minimize these problems, liboggz now inspects the encoded data in order to reconstruct the expected granulepos (corresponding to a timestamp) for every packet in an Ogg stream. This allows applications to use reliable timestamps, even though these are only sparsely recorded in most Ogg streams. This is not as easy as it sounds, particularly for Ogg Vorbis. To get a flavour of what's involved, read Shane's rant in the comments, explaining how to calculate Vorbis timestamps.
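The basic idea can be sketched for the easy case. This is not the liboggz implementation: it assumes the duration of every packet on the page is already known, which is exactly the part that is hard for Vorbis, where a packet's duration depends on the block sizes of neighbouring packets. The function name and arguments are hypothetical.

```python
from typing import List, Sequence


def reconstruct_granulepos(durations: Sequence[int], page_granulepos: int) -> List[int]:
    """Recover a granulepos for every packet that ends on one page.

    `durations` holds each packet's length in granules (e.g. audio
    samples), in file order. `page_granulepos` is the single value
    stored on the page: the end position of its last packet. Walking
    backwards and subtracting durations gives the earlier positions.
    """
    positions = [0] * len(durations)
    pos = page_granulepos
    for i in range(len(durations) - 1, -1, -1):
        positions[i] = pos
        pos -= durations[i]
    return positions


# Three packets of 960 samples each on a page stamped 2880.
print(reconstruct_granulepos([960, 960, 960], 2880))
```

With a per-packet granulepos in hand, an application can convert each one to a timestamp using the stream's sample rate or frame rate.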

For an in-depth discussion, come to Ralph Giles' talk at linux.conf.au, Seeking is hard: Ogg design internals.

Labels:

Saturday, 12 January 2008

Release: libfishsound 0.9.0

Now libfishsound 0.9.0 supports FLAC, the Free Lossless Audio Codec. The patches were originally contributed by Tobias Gehrig in 2004. There hasn't been much use of Ogg FLAC, whereas FLAC in its native encoding is very popular. However, the point of the Ogg mapping is to allow FLAC to be used in parallel with other codecs, in particular as the audio codec for video files. The combination of Theora video and FLAC audio can be very useful for music videos, where you might not care too much if the image has lost some quality but you want the sound to be as good as possible.

However, creating such a file isn't so easy. Let's say you have a source video, like GrooveTV #204 - Jacob Fred Jazz Odyssey. I took the MPEG-1 file as recommended; for clarity, let's call it source.mpg. To make a video to test on, I did:

ffmpeg2theora source.mpg
to encode the video into an Ogg file containing Theora video and Vorbis audio. This produces source.ogv.

oggzrip -c theora source.ogv -o video-theora.ogv
to extract only the Theora video track, into video-theora.ogv.

mpg123 -w source.wav source.mpg
to extract the audio to a wav file, source.wav. Here the audio in the source material was encoded as MPEG-1 Layer II; obviously if you were producing a music video, you'd skip this step and encode FLAC from the original recording. I didn't have that here, and I just wanted a file I could test on.

However, at the least this step means that no further artifacts are introduced into the audio, other than those which were present in the MPEG encoding. If the only source material you have is already encoded, you don't want to degrade it further by re-encoding it with a different codec.

flac --ogg source.wav -o audio-flac.oga
to encode the audio. This produces an Ogg FLAC file called audio-flac.oga.

oggzmerge video-theora.ogv audio-flac.oga -o final.ogv
to merge the video and audio tracks into the final Ogg video file, final.ogv.
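The steps above can be collected into a small script. This is just a convenience sketch that replays the same five commands; the function names are my own, and it assumes the source file is in the current directory and that ffmpeg2theora names its output after the source with an .ogv extension, as it did here.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(source: str) -> List[List[str]]:
    """Return the command lines for the Theora+FLAC recipe, in order."""
    stem = Path(source).stem
    encoded = f"{stem}.ogv"  # assumed output name of ffmpeg2theora
    wav = f"{stem}.wav"
    return [
        ["ffmpeg2theora", source],
        ["oggzrip", "-c", "theora", encoded, "-o", "video-theora.ogv"],
        ["mpg123", "-w", wav, source],
        ["flac", "--ogg", wav, "-o", "audio-flac.oga"],
        ["oggzmerge", "video-theora.ogv", "audio-flac.oga", "-o", "final.ogv"],
    ]


def run_all(source: str) -> None:
    """Run each step, stopping at the first command that fails."""
    for argv in build_commands(source):
        subprocess.run(argv, check=True)


for argv in build_commands("source.mpg"):
    print(" ".join(argv))
```

Calling run_all("source.mpg") requires all five tools to be installed; build_commands alone only prints what would be run.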

Note that we're using the recently recommended file extensions for Ogg video and audio.

If you know an easier way to create Ogg Theora+FLAC files, please leave a note in the comments :-)

Labels: ,

Tuesday, 11 December 2007

HTML5 for free media: Today on #whatwg

There has been a bit of FUD about Ogg Theora recently [2] [3]. So, over on #whatwg, one day before the W3C Video on the Web Workshop:

11:35:59 * Hixie casually removes Ogg from the spec and sees what happens
11:36:43 * othermaciej_ takes shelter
 ...

The editor of the HTML5 draft specification, Ian Hickson (Hixie), sent this message:

I've temporarily removed the requirements on video codecs from the HTML5 spec, since the current text isn't helping us come to a useful interoperable conclusion. When a codec is found that is mutually acceptable to all major parties I will update the spec to require that instead and then reply to all the pending feedback on video codecs.

12:05:02 <kfish> Hixie!
12:11:47 * kfish throws a tantrum on behalf of the free software community
 ...

However, the change didn't turn out to be so bad after all. The new text reads:

...; we need a codec that is known to not require per-unit or per-distributor licensing, that is compatible with the open source development model, that is of sufficient quality as to be usable, and that is not an additional submarine patent risk for large companies.

The previous draft stated no such requirements. As no rationale was given for choosing Ogg, that recommendation was easy to attack. Members of the MPEG LA, the cabal whose members receive money when people use content in MPEG formats, then had a fairly easy job of inciting flamewars on the whatwg list.

The new, clearer wording should allow more productive technical discussion, so that we can actually build an open standard which encourages anyone, anywhere, to publish their videos freely.

12:29:48 * kfish reads the replacement text and revokes the tantrum
12:30:15 <kfish> Hixie, actually you didn't casually remove Ogg, you made the case for Ogg stronger, so thankyou :-)
12:35:37 <Dashiva> "Lift the cat who was amongst the pigeons up and put him back on his pedestal for now."
12:35:40 <Dashiva> Poetic
12:37:49 <Hixie> kfish: :-)

Labels: ,

Thursday, 6 December 2007

Release: HOgg 0.3.0

Hogg is a commandline tool for manipulating Ogg files. It has subcommands, like hogg chop for cutting out bits of video, hogg info for telling you about the codecs, and hogg dump for hexdumping the packet data. It's basically a re-implementation of most of the stuff in liboggz, but the new features in hogg 0.3.0, such as chopping out a section of a file and adding Ogg Skeleton metadata, are not yet in oggz-tools.
$ hogg help chop
chop: Extract a section (specify start and/or end time)
Usage: hogg chop [options] filename ...

Examples:
  Extract the first minute of file.ogg:
    hogg chop -e 1:00 file.ogg

  Extract from the second to the fifth minute of file.ogg:
    hogg chop -s 2:00 -e 5:00 -o output.ogg file.ogg

  Extract only the Theora video stream, from 02:00 to 05:00, of file.ogg:
    hogg chop -c theora -s 2:00 -e 5:00 -o output.ogg file.ogg

  Extract, specifying SMPTE-25 frame offsets:
    hogg chop -c theora -s smpte-25:00:02:03::12 -e smpte-25:00:05:02::04 -o output.ogg file.ogg
Nevertheless, I'm continuing to work on both liboggz and hogg. liboggz, in pure C, is faster; hogg, in pure (but unoptimised) Haskell, is more correct. I spent a few hours earlier today tracking down a corner case in liboggz, coincidentally triggered by the chopping routines in libannodex. It reminded me that one of my first realizations about Haskell was that its sanity-checker often tells you about forgotten corner cases of algorithms.
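As an aside on the SMPTE-25 offsets in the help text above, here is a sketch of how such an offset maps to seconds. This is not hogg's parser (which is in Haskell): the reading of the syntax as smpte-RATE:HH:MM:SS::FRAMES is inferred from the example alone, and only integer frame rates are handled, with no drop-frame support.

```python
from fractions import Fraction


def smpte_to_seconds(spec: str) -> Fraction:
    """Convert e.g. 'smpte-25:00:02:03::12' to an exact number of seconds."""
    scheme, rest = spec.split(":", 1)
    if not scheme.startswith("smpte-"):
        raise ValueError(f"not an SMPTE offset: {spec!r}")
    fps = int(scheme[len("smpte-"):])
    clock, frames = rest.split("::")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    whole = hours * 3600 + minutes * 60 + seconds
    # Frames are a fraction of a second at the given frame rate.
    return Fraction(whole) + Fraction(int(frames), fps)


start = smpte_to_seconds("smpte-25:00:02:03::12")
print(float(start))  # 2 minutes, 3 seconds and 12/25 of a second
```

Using Fraction keeps the frame offset exact, so 12 frames at 25 fps stays 12/25 of a second rather than a rounded float.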

Labels: ,

Sunday, 3 June 2007

Release: libfishsound 0.8.0

libfishsound provides a simple and consistent programming interface for decoding and encoding audio data using Xiph.Org codecs (Vorbis and Speex). This release includes compatibility with the floating point portion of the libfishsound development trunk API, in preparation for use with liboggplay. To build a minimal version of libfishsound for use with liboggplay, configure with encoding disabled; this produces a smaller binary and removes the dependency on libvorbisenc.

Labels: ,

Wednesday, 31 August 2005

Trivial unit testing and coverage checking for C libraries

Recently I added a version script and the start of a test suite (under valgrind) to the Theora video codec. Apart from locating a trivial leak in the encoder, doing so turned up some interesting oddities.

The Version Script lists all public symbols, and tells the linker to only export these. This was added to avoid symbol clashes with other libraries. This usage is similar to a .def file for MSVC. A Version Script also allows pattern matching and definition of multiple API versions.

The tests I added are fairly trivial. One is a 'noop' test which simply creates and destroys each kind of data structure the library provides. By using GNU Automake's TESTS_ENVIRONMENT to (optionally) run the tests under valgrind, we can determine if the library contains memory leaks in its constructors and destructors.

One of the tests uses all of the theora_comment_*() API functions, and checks the correctness of return values and errors. If we're happy that a set of tests covers all API functions, then we can be reasonably happy that if it passes, the API:

  • is completely exported by the linker (the test runs at all),
  • does not contain any memory leaks (as valgrind doesn't complain),
  • and is correctly implemented (as the test passes).
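Whether a set of tests really touches every API function can be checked roughly by comparing the Version Script against the test sources. The following is a sketch of that idea, not something from the Theora tree: the helper names and the EXAMPLE_1.0 version node are made up, the parser only understands the simple one-symbol-per-entry layout, and a textual match shows that a function is mentioned, not that the call is executed.

```python
import re
from typing import List


def exported_symbols(version_script: str) -> List[str]:
    """List the explicitly named symbols in the global: section."""
    symbols = []
    in_global = False
    for raw in version_script.splitlines():
        line = raw.strip()
        if line.startswith("global:"):
            in_global = True
            line = line[len("global:"):].strip()
        elif line.startswith("local:") or line.startswith("}"):
            in_global = False
        if in_global:
            for entry in line.split(";"):
                entry = entry.strip()
                # Skip wildcard patterns; keep plain identifiers only.
                if re.fullmatch(r"[A-Za-z_]\w*", entry):
                    symbols.append(entry)
    return symbols


def untested(symbols: List[str], test_source: str) -> List[str]:
    """Return the symbols that are never called in the test source text."""
    return [
        name for name in symbols
        if not re.search(r"\b" + re.escape(name) + r"\s*\(", test_source)
    ]


script = """
EXAMPLE_1.0 {
  global:
    theora_comment_init;
    theora_comment_add;
    theora_comment_clear;
  local:
    *;
};
"""
tests = "theora_comment_init (&tc); theora_comment_clear (&tc);"
print(untested(exported_symbols(script), tests))
```

Anything this reports is an exported function that no test mentions, which is a cheap first approximation to what gcov would tell you properly.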

When using GNU Automake, make distcheck will fail if any tests fail. make distcheck should be used to create distribution tarballs, the point being that you ensure all tests pass before release. make distcheck also has other nice benefits like testing that install and uninstall work correctly.

One of the requirements for the theora reference implementation is to minimize dependencies. More detailed testing and analysis can be achieved with check and gcov, but the above is a fairly low-impact approach suitable for most C libraries.

Labels: ,

Sunday, 19 June 2005

Creative Commons tagging

To: linux-audio-user
Cc: advocacy@xiph.org

I've been following the rise and rise of music made with Linux: the releases that have been announced on this list, and that Jan Weil has been listing.

Many of the released files have no licensing information. In most parts of the world, this implies "All Rights Reserved". If you are making music, or samples, that you are happy to share with others then you should consider tagging your files with a CreativeCommons license.

Embedding licensing information allows people using music browsers and search engines to _find your stuff_ (songs, samples, source materials -- it's up to you). We want Linux distributions to provide tools for people to find and use free media, and music made with Linux should be ready for that.

Creative Commons provide a guide to embedding licensing information and also more specific information about putting licensing information in Ogg Vorbis files.

This basically involves adding a LICENSE comment, such as:

LICENSE=Licensed to the public under http://creativecommons.org/licenses/by-sa/2.5/ verify at http://example.com/cclicenses.html

Using the commandline vorbis-tools, these tags can be added easily. To add licensing information to an existing Ogg Vorbis file:

  vorbiscomment -a -t "LICENSE=Licensed to the public ..." file.ogg

To add licensing information while encoding a WAV file to Ogg Vorbis:

  oggenc -c "LICENSE=Licensed to the public ..." file.wav

Please include the URL of the license you choose in the LICENSE tag. Information on CreativeCommons license choices is here.
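If you are tagging many files, the LICENSE comment can be assembled programmatically. This is a small sketch with hypothetical helper names; it follows the wording of the example comment above and only builds the oggenc command line, it does not run it.

```python
from typing import List


def license_tag(license_url: str, verify_url: str) -> str:
    """Build a LICENSE comment in the form shown above."""
    if not license_url.startswith("http://creativecommons.org/licenses/"):
        raise ValueError("expected a Creative Commons license URL")
    return (
        f"LICENSE=Licensed to the public under {license_url} "
        f"verify at {verify_url}"
    )


def oggenc_args(wav_file: str, tag: str) -> List[str]:
    """Command line for encoding a WAV file with the tag attached."""
    return ["oggenc", "-c", tag, wav_file]


tag = license_tag(
    "http://creativecommons.org/licenses/by-sa/2.5/",
    "http://example.com/cclicenses.html",
)
print(oggenc_args("file.wav", tag))
```

Passing the arguments as a list avoids shell quoting problems with the spaces in the tag.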

Looking forward to a web of free music,

Conrad.

Labels: ,

Thursday, 9 June 2005

I ate snails

This week I have been staying with Thomas and Kristien, and their crazy cat Lunya. Last night I went to dinner at a Basque restaurant; when I got back, Thomas asked how it was.

<kfish> I ate snails.
<thomasvs> Did you remove the shit from the end of the snails?
<kfish> The wha???

In related work, we've been having Ogg fun preparing the GUADEC recordings.

Labels: