HTML5 Audio APIs – How Low Can We Go?

By Mark Boas

“O ye’ll tak’ the high road, and Ah’ll tak’ the low (road)
And Ah’ll be in Scotlan’ afore ye.”
(The Bonnie Banks o’ Loch Lomond)

The web audio community are a vibrant bunch. No sooner had the standard <audio> API been established than developers were clamouring for more. Just playing audio wasn’t enough; we wanted to analyse, react to and manipulate our audio. Happily, the browser makers obliged, with first Mozilla and then Google producing enhanced web audio APIs for their browsers. The only problem was that they were two very different implementations. The Audio Data API implemented in Firefox exposed the data at a fairly low level, while WebKit’s Web Audio API provided a higher-level abstraction with a number of predefined functions.

Luckily, it didn’t take long for the JavaScript community to react and start bridging the gap between the two by writing libraries that provided a common API, such as sink.js, which smooths over low-level differences. In turn, sink.js was used by ‘higher-level’ libraries like audiolib.js (a general-purpose audio toolkit) and Audiolet (which provides a more musically intuitive API, with similar objectives to WebKit’s in-browser solution). There are many others, such as XAudioJS, which sports Flash® and base64 data-URL WAV-generation fallbacks; older projects like dynamic.js, which just provides a Flash® fallback for the Audio Data API; and DSP.js, a digital signal processing library.
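To make the “common API” idea a little more concrete, here is a minimal TypeScript sketch of how a bridging library might hide two differently shaped backends behind a single “fill this buffer” callback. It assumes a push-style backend (script hands samples to the engine when it likes, roughly the Audio Data API model) and a pull-style backend (the engine calls back asking for samples, roughly the model of the Web Audio API’s JavaScript processing node). The interfaces and the names `createSink` and `pump` are hypothetical and for illustration only; this is not sink.js’s actual API.

```typescript
// Hypothetical backend shapes, loosely modelled on the two browser approaches.
// Push style: script hands samples to the engine whenever it likes.
interface PushBackend {
  kind: "push";
  write(samples: Float32Array): void;
}

// Pull style: the engine calls back whenever it needs more samples.
interface PullBackend {
  kind: "pull";
  onRequest(callback: (buffer: Float32Array) => void): void;
}

type Backend = PushBackend | PullBackend;

// The single API the (hypothetical) library exposes: "fill this buffer".
type FillFn = (buffer: Float32Array) => void;

interface Sink {
  // Only does work for push backends; pull backends drive themselves.
  pump(): void;
}

function createSink(backend: Backend, fill: FillFn, blockSize: number): Sink {
  if (backend.kind === "pull") {
    // The engine decides when to ask; just forward its requests to `fill`.
    backend.onRequest(fill);
    return { pump: () => {} };
  }
  // Push: we own the buffer, fill it, and hand the engine a copy.
  const push: PushBackend = backend;
  const block = new Float32Array(blockSize);
  return {
    pump: () => {
      fill(block);
      push.write(block.slice());
    },
  };
}
```

The caller writes one `fill` function and never needs to know which style of backend is underneath, which is the whole point of the abstraction layer.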

People really love messing about with audio.

Notice that all this cool functionality didn’t come from a W3C spec. Similarly, the advanced audio APIs were not the result of a W3C think-tank, but of two competing visions of what an advanced audio API should look like. Now it looks like the Web Audio API will be implemented in Safari as well as Chrome.

Once you create compelling functionality, developers will immediately start to use it. It may be experimental, but developers will start to rely on it to make cool stuff. Cutting-edge technology is seductive like that. I’m surer than sure that the Web Audio API has been well researched and has taken much inspiration from tried and tested APIs that exist outwith our lovely browser-based world (Apple’s Core Audio frameworks, I believe), but I’m not convinced that you can really tell what web developers need or want until you give them something to play with.

Mozilla’s approach was to expose a very comprehensive low-level API, which potentially allows JavaScript developers to create all the functionality of WebKit’s Web Audio API and then some. As a result, libraries like JSMad have cropped up. What does JSMad do? Significantly, it allows you to play MP3s in Firefox*. Is JavaScript fast enough? Apparently so. This was a ‘this changes everything’ moment for me, and since then a similar approach has been taken by pdf.js and, more recently, Broadway.js, which decodes H.264 on the fly.

*Neither Firefox nor Opera supports MP3 natively, due to patent concerns.
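The bet behind a low-level API is that raw sample access is enough for script to build the higher-level pieces itself. As a minimal sketch of that layering (not code from JSMad or any library mentioned here), the TypeScript below treats the “browser” as nothing more than a loop that asks for buffers of raw samples, then builds an oscillator and a gain stage on top of that in plain script. The names `sineOscillator`, `gain` and `render` are hypothetical; `render` stands in for the audio engine and collects samples in memory rather than playing them.

```typescript
// Hypothetical low-level contract: fill `buffer` with raw samples in [-1, 1].
type Source = (buffer: Float32Array, sampleRate: number) => void;

// A higher-level building block written purely in script: a sine oscillator.
// Phase is kept between calls so consecutive buffers join up seamlessly.
function sineOscillator(frequency: number): Source {
  let phase = 0;
  return (buffer, sampleRate) => {
    const step = (2 * Math.PI * frequency) / sampleRate;
    for (let i = 0; i < buffer.length; i++) {
      buffer[i] = Math.sin(phase);
      phase += step;
      if (phase >= 2 * Math.PI) phase -= 2 * Math.PI; // keep phase bounded
    }
  };
}

// Another building block: a gain stage that can wrap any source.
function gain(source: Source, amount: number): Source {
  return (buffer, sampleRate) => {
    source(buffer, sampleRate);
    for (let i = 0; i < buffer.length; i++) buffer[i] *= amount;
  };
}

// Stand-in for the audio engine: pulls `blocks` buffers from the graph
// and concatenates them (no real audio output).
function render(graph: Source, blocks: number, blockSize: number, sampleRate: number): Float32Array {
  const out = new Float32Array(blocks * blockSize);
  const block = new Float32Array(blockSize);
  for (let b = 0; b < blocks; b++) {
    graph(block, sampleRate);
    out.set(block, b * blockSize);
  }
  return out;
}
```

For example, `render(gain(sineOscillator(440), 0.5), 4, 1024, 44100)` yields 4,096 samples of a 440 Hz tone at half amplitude. Note that `gain` knows nothing about oscillators: anything that can fill a buffer can be wrapped, which is exactly the kind of composition a low-level API leaves to library authors.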

I’m not saying Mozilla’s Audio Data API is perfect; there are issues with audio sharing a thread with the UI, and with keeping multiple streams in sync. However, these are being addressed in the MediaStreams Processing proposal, and it’s worth taking a look at it, even if only for an insight into what future implementations could look like.

I’m digressing. The point is, if browser makers expose the low-level API, developers will quickly come in and start writing libraries on top of it. As is often the case, the developer community will start making things that the browser makers had never even considered. It makes sense: there are many more web developers than browser developers. Sure, web developers will bridge the gaps and polyfill over the cracks, which, let’s face it, has been the only reasonable way of moving forward with HTML5, but crucially they will also make new libraries that other developers can use, and all with a very fast turnaround. Of course, the common-or-garden JavaScript developer has a series of enormous advantages over the browser API developer or the standards bodies that seek to define these APIs. I’m gonna name three here:

  1. Strong community: web developers have a huge, active and open community to draw from.

  2. Lower barrier to entry: once something is put on a site like GitHub, the barrier to participation is virtually zero.

  3. Room to manoeuvre: nothing web developers write is ever set in stone; JavaScript is a much more fluid abstraction than less flexible native browser code.

OK, so bear with me here, and this is more of a question than a proposal: what if we separate concerns between browser makers and web developers when it comes to creating standards? Browser makers could concentrate on security, privacy, performance and exposing low-level APIs in such a way that web developers can build libraries and APIs in the fluid, dynamic, iterative and extremely reactive manner that the web as a medium allows. Once these libraries reach an acceptable level of adoption, browser makers can get together, decide which features they want to adopt based on tried and tested use cases and, yes, make them a standard and build them into the browser. Wouldn’t we move forward more quickly that way? And as a bonus, no browser would be left behind, as we’d be building the polyfills along the way.

In short, what I’m saying is that if the standards bodies put their energy into defining low-level APIs, the high-level APIs will look after themselves, or rather the community will look after them. After all, it seems that the W3C itself wants a more community-based approach to standards, and besides, we all know that bottom-up trumps top-down, right?

Outside my flat is an open space that the local council didn’t quite know what to do with. I’m sure they considered adding basketball hoops, concrete tables, a kids’ playground and all kinds of things. In the end, they created a decent flat surface and pretty much left it at that. The users of this space, mostly children, decided it was perfect for playing soccer and improvised a hand-drawn goal and pitch markings. If the council really wanted to make something permanent, they could take inspiration from this and install real goals and solid pitch markings.

It’s probably too late to change WebKit’s implementation of the Web Audio API significantly, but I would strongly urge its developers to include a more comprehensive low-level API in future releases. What’s the worst that could happen?

Improvised Goal Posts

A big thanks to Jussi Kalliokoski and Dustin Blake for helping with this post and deep respect and gratitude to all those hacking on audio.



Tuesday, November 15th, 2011. Categories: Audio, JavaScript