HTML5 Audio APIs – How Low can we Go?

Tuesday, November 15th, 2011

By Mark Boas

“O ye’ll tak’ the high road, and Ah’ll tak’ the low (road)
And Ah’ll be in Scotlan’ afore ye.”
(The Bonnie Banks o’ Loch Lomond)

The web audio community are a vibrant bunch. No sooner had the standard <audio> API been established than developers were clamouring for more. Just playing audio wasn’t enough; we wanted to analyse, react to and manipulate our audio. Happily, the browser makers obliged, with first Mozilla and then Google producing enhanced web audio APIs for their browsers – the only problem was, they were two very different implementations. The Audio Data API implemented in Firefox exposed the data at a fairly low level, while Webkit’s Web Audio API offered a higher-level abstraction with a number of predefined functions. Luckily, it didn’t take long for the JavaScript community to react and start bridging the gap between the two by writing libraries that provide a common API – libraries such as sink.js, which smooths over the low-level differences. In turn, sink.js was used by ‘higher level’ libraries like audiolib.js (a general-purpose audio toolkit) and Audiolet (which provides a more musically intuitive API, with similar objectives to Webkit’s in-browser solution). There are many others, such as XAudioJS (which sports a Flash® fallback and base64 data-URL WAV generation), older projects like dynamic.js (which simply provides a Flash® fallback for the Audio Data API) and DSP.js, a digital signal processing library.
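To get a feel for the difference, here’s a rough sketch of playing a one-second 440Hz tone with each – not production code, and the exact method names have shifted between browser versions. With Firefox’s Audio Data API you write raw samples to an audio element yourself:

    // Audio Data API: push raw samples straight to an audio element
    var output = new Audio();
    output.mozSetup(1, 44100);                 // 1 channel at 44.1kHz
    var samples = new Float32Array(44100);     // one second of samples
    for (var i = 0; i < samples.length; i++) {
      samples[i] = Math.sin(2 * Math.PI * 440 * i / 44100);
    }
    output.mozWriteAudio(samples);

With Webkit’s Web Audio API you fill a buffer and connect it through a graph of nodes instead:

    // Web Audio API: create a buffer source and wire it to the destination
    var context = new webkitAudioContext();
    var buffer = context.createBuffer(1, 44100, 44100);
    var data = buffer.getChannelData(0);
    for (var i = 0; i < data.length; i++) {
      data[i] = Math.sin(2 * Math.PI * 440 * i / 44100);
    }
    var source = context.createBufferSource();
    source.buffer = buffer;
    source.connect(context.destination);
    source.noteOn(0);                          // later renamed to start()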

People really love messing about with audio.

Notice that all this cool functionality didn’t come about from a W3C spec. Similarly, the advanced audio APIs were not the result of a W3C think-tank, but of two competing visions of what an advanced audio API should look like. Now it looks like the Web Audio API will be implemented in Safari as well as Chrome.

Once you create compelling functionality, developers will immediately start to use it. It may be experimental, but developers will start to rely on it to make cool stuff. Cutting-edge technology is seductive like that. I’m surer than sure that the Web Audio API has been well researched and has taken much inspiration from tried and tested APIs that exist outwith our lovely browser-based world (Apple’s Core Audio frameworks, I believe), but I’m not convinced that you can really tell what web developers need or want until you give them something to play with.

Mozilla’s approach was to expose a very comprehensive low-level API, which potentially allows JavaScript developers to create all the functionality of Webkit’s Web Audio API and then some. As a result we get libraries like JSMad cropping up. What does JSMad do? Significantly, it allows you to play MP3s in Firefox*. Is JavaScript fast enough? Apparently so. This was a ‘this changes everything’ moment for me, and since then a similar approach has been taken by pdf.js and, more recently, Broadway.js, which decodes H.264 on the fly.

*Neither Firefox nor Opera supports MP3 natively due to patent concerns.
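The trick behind this is simple in outline: decode the compressed audio in JavaScript and feed the resulting samples straight to the low-level API. A minimal sketch of the pattern, where decodeNextFrame() is a hypothetical stand-in for whatever decoder you use (JSMad or otherwise):

    // Keep the Audio Data API's output topped up with freshly decoded samples
    var output = new Audio();
    output.mozSetup(1, 44100);
    var written = 0;
    setInterval(function () {
      // stay roughly half a second ahead of the current playback position
      while (written - output.mozCurrentSampleOffset() < 22050) {
        var samples = decodeNextFrame();          // hypothetical: one frame decoded in JS
        written += output.mozWriteAudio(samples); // returns how many samples were accepted
      }
    }, 100);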

I’m not saying Mozilla’s Audio Data API is perfect: there are issues with audio sharing the same thread as the UI, and sync issues between multiple streams. However, these are being addressed in the MediaStreams Processing proposal, and it’s worth taking a look at it, even if it’s just for an insight into what future implementations could look like.
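The threading issue is easy to picture: because samples are generated and written on the main thread, anything else that thread is busy with can delay your writes and cause audible dropouts. A contrived sketch, where nextChunk() and heavyDomWork() are hypothetical stand-ins:

    // Feeding audio from the UI thread works fine, until the UI thread gets busy
    var output = new Audio();
    output.mozSetup(1, 44100);
    (function pump() {
      output.mozWriteAudio(nextChunk());       // hypothetical: next block of samples
      setTimeout(pump, 50);
    })();

    heavyDomWork(); // while this blocks the thread, pump() can't fire and playback glitches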

I’m digressing. The point is, if browser makers expose the low-level API, developers will quickly come in and start writing libraries on top of that API. As is often the case, the developer community will start making things that the browser makers had never even considered. It makes sense: there are many more web developers than browser developers. Sure, web developers will bridge the gaps and polyfill over the cracks, which, let’s face it, has been the only reasonable way of going forward with HTML5, but crucially they will also make new libraries that other developers can use – and all of this at very high rates of turnaround. Of course, the common-or-garden JavaScript developer has a series of enormous advantages over the browser API developer or the standards bodies that seek to define these APIs. I’m gonna name three here:

  1. Strong community — Web developers have a huge active and open community to draw from.

  2. Lower barrier to entry — Once something is put on GitHub or a similar platform, the barrier to participation is virtually zero.

  3. Room to manoeuvre — Nothing web developers write is ever set in stone; JavaScript represents a much more fluid abstraction than the less flexible native browser code.

Ok, so bear with me here, and this is more of a question than a proposal – what if we separated concerns between browser makers and web developers when it comes to creating standards? Browser makers could concentrate on security, privacy, performance and exposing low-level APIs in such a way that web developers can start to build libraries and APIs in the fluid, dynamic, iterative and extremely reactive manner that the web as a medium allows. Once these libraries reach an acceptable level of adoption, browser makers can get together and decide which of these features they want to adopt based on tried and tested use cases and, yes, make it a standard and build it into the browser. Wouldn’t we move forward more quickly that way? And as a bonus, no browser would be left behind, as we’d be building the polyfills along the way.

In short, what I’m saying is that if the standards bodies put their energy into defining low-level APIs, the high-level APIs will look after themselves – or rather, the community will look after them. After all, it seems that the W3C itself wants a more community-based approach to standards, and besides, we all know that bottom-up trumps top-down, right?

Outside my flat is an open space that the local council didn’t quite know what to do with; I’m sure they considered adding basketball hoops, concrete tables, a kids’ playground and all kinds of things. As it turned out, they created a decent flat surface and pretty much left it at that. The users of this space, mostly children, decided it was a perfect space for playing soccer and improvised a hand-drawn goal and pitch markings. If the council really wanted to make something permanent, they could take inspiration from this and create real goals and solid pitch markings.

It’s probably too late to change the Webkit implementation of the Web Audio API significantly, but I would strongly urge its developers to include a more comprehensive low-level API in future releases. What’s the worst that could happen?

[Image: Improvised goal posts]

A big thanks to Jussi Kalliokoski and Dustin Blake for helping with this post and deep respect and gratitude to all those hacking on audio.



17 Comments to HTML5 Audio APIs – How Low can we Go?

  • azakai says:

    Great post! I think exactly the same. Browsers should expose low-level APIs and let the web community innovate on top of that. This is working great with WebGL (a low-level API, with nice high-level JS libraries on top), and the same can work for audio.

  • Niloy Mondal says:

    Nice post. Loved the local council open space example.

  • Robert O'Callahan says:

    Oh, you already mentioned it! Great :-)

  • Dan Schultz says:

    Just to play devil’s advocate: it seems to me that this plan would fall prey to the same issues that come from any crowdsourced design. It will be really great for a lot of important things (i.e. identifying vital use cases) but it doesn’t do it all. In particular you risk losing touch with big pictures and under-represented needs (basically, you will align well with the general tool builder and developer, but that doesn’t mean you align well with the niche developer or future developers!)

    In short: if you only look at current developments how do you ensure a beautiful and well crafted future? What about the use case that isn’t as popular but is still incredibly important?

    If what you are saying is that the browser should support the lowest level access possible then, for APIs dealing with data streams, I totally agree. If you are saying that we should design all our standards based on how the tools are being used today and the current shape of the web then I worry about how agile our standards will be for the long term / for furthering larger visions and designs.

  • Jon Rimmer says:

    WRT. “what’s the worst that can happen”, I’m not an expert on the audio APIs, but Chris Rogers’ comparison of them on the standards list ( http://lists.w3.org/Archives/Public/public-xg-audio/2011Feb/0011.html ) suggests that the low-level nature of the Mozilla API caused problems around latency and glitching (alongside some other criticisms). Seems a little troubling.

  • PanG says:

    Isn’t it possible you want both high level and low level control? My understanding is that the JavaScriptAudioNode in the web audio api gives you exactly the low level, raw-array primitive that you want.

    Even beyond that, I also think that calling the web audio api high level is a bit of a stretch. Have you tried reading that spec? :) Some of the options available (like the convolution stuff) aren’t exactly “press a button, have something play.” It seems like the flexibility you want is there, and there is plenty of room to build the audio libraries we need on top of it.

  • MarkB says:

    @Dan – well I guess it’s all about scratching an itch. Yes, that old chestnut! :) If it looks like new functionality is required, given the appropriate building blocks, JS developers can build any functionality that they require, and of course others can benefit from that. Later on the decision can be made as to whether to build that into the browser.

    So the API landscape is continually evolving to fit developers’ and hence users’ needs. A reactive approach as opposed to a proactive one.

    It’s worth noting that this is the approach that has been taken with WebGL and we are reaping the benefits of that.

  • MarkB says:

    @Jon I mention the limitations of the Audio Data API in my post. Steps are being taken to resolve these issues. I am more concerned with the approach than the actual implementation. Bugs and glitches have a habit of being fixed, but if we take the wrong approach, the consequences can be much more serious.

    @PanG I’m more concerned about the low level stuff being implemented properly than the high level functionality being removed. My understanding is that the Web Audio API’s AudioNodes don’t quite cut the mustard. I applaud both the Firefox and Webkit attempts to take <audio> to the next level – I just think that some tweaking is required so that we can let things flourish.

    Hopefully somebody who works with both these APIs on a day-to-day basis can come in and explain the differences and pros and cons in detail.

  • Jussi says:

    @PanG The Web Audio API’s lower level API, JavaScriptAudioNode, is quite incomplete imho; the whole idea of the higher level API there is to shield you from what happens if you run audio in the same thread as the UI or other blocking operations, such as synchronous XHR, or whatever else happens to be running. But then, at some point, you’d like to introduce a JavaScriptAudioNode and all these benefits are lost. This is a problem that could be solved by having createJavaScriptAudioNode() take a Web Worker as an argument instead, and it is currently in discussion in the Audio WG.

    Then the other problem is this bug: http://code.google.com/p/chromium/issues/detail?id=73062 , which prevents the developer from specifying the sample rate at which the node should work, so it’s kind of ironic that there are all these fancy things, convolution etc., but no resampling, which is very inefficient to do in JS because memory access is one of the slowest things in JS, not to mention the impact such operations have on the garbage collector.

    Then there’s the bigger problem: it’s not addressing enough use cases imho. Imagine you’re making a video editing tool, and you want to be able to add crossfades and such. You can take in audio from one video, and another one as well, and mix them nicely, but you’ll probably be streaming them. Then imagine the other video starts buffering, and the videos lose sync, and the users start complaining. Some of these other use cases for an audio API are addressed by the MediaStreamProcessing proposal.

    The thing is, I’d rather have a comprehensive audio API that addresses as many use cases as possible and is usable in most places in a similar way; the tools we can build ourselves, but if there’s not enough to build on, it’s not much use. The way the Web Audio API is going, it focuses on these tools instead of the use cases, and then new use cases get implemented as new features (or nodes) of the API, which makes for a horrible feature-detection mess. Requirements first, nice-to-have things next.

  • Schorny says:

    Hi.

    The problem with this approach is that you tie browsers down. It is very hard to change an implementation if you need to keep a low-level API stable. You can’t change your underlying infrastructure – which means we slow down progress and invention.

    Things like JavaScript virtual machines would be nearly impossible if we were granted low-level access to the interpreter. Low-level audio APIs may restrict access to hardware-accelerated decoding.

    Low-level APIs are great for developers, don’t get me wrong – but they have their costs. We need to take that into account.

  • Saul says:

    Your proposal pretty much reflects the current process except that it is not formalised as such.

    Perhaps it even should not be formalised… artificial separation of concerns is bound to stir up a whole mountain of religious wars and disagreements. I am not sure such things move the web forward.

    I for one am perfectly happy with things as they are. Let the ecosystem run its own ways; creativity hates process, especially when it is not confined to a single organization and isn’t spontaneous.
