community

HTML5 Audio APIs – How Low can we Go?

By Mark Boas

“O ye’ll tak’ the high road, and Ah’ll tak’ the low (road)
And Ah’ll be in Scotlan’ afore ye.”
(The Bonnie Banks o’ Loch Lomond)

The web audio community are a vibrant bunch. No sooner had the standard <audio> API been established, than developers were clamouring for more. Just playing audio wasn’t enough, we wanted to analyse, react to and manipulate our audio. Happily, the browser makers obliged with first Mozilla, then Google producing enhanced web audio APIs for their browsers – the only problem was, they were two very different implementations. The Audio Data API implemented in Firefox exposed the data at a fairly low level, while Webkit’s Web Audio API provided a higher level abstraction providing a number of predefined functions. Luckily, it didn’t take long for the JavaScript community to react and start bridging the gap between the two, by writing libraries that provided a common API, libraries such as sink.js which smooths over low level differences. In turn, sink.js was used by ‘higher level’ libraries like audiolib.js – (a general purpose audio toolkit) and Audiolet (which provides a more musically intuitive API, with similar objectives to Webkit’s in-browser solution). There are many others, such as XAudioJS which sports a Flash® and base64 data url wav generation fallback, older projects like dynamic.js that just provides a Flash® fallback for the Audio Data API and DSP.js a Digital Signal Processing Library.

People really love messing about with audio.

Notice that the process of creating all this cool functionality didn’t come about from a W3C spec. Similarly, the Advanced Audio APIs were not the result of a W3C think-tank, but from two competing visions of what an advanced audio API should look like. Now it looks like the Web Audio API will be implemented in Safari as well as Chrome.

Once you create compelling functionality, developers will immediately start to use it. It may be experimental but developers will start to rely on it to make cool stuff. Cutting edge technology is seductive like that. I’m surer than sure that the Web Audio API has been well researched and has taken much inspiration from tried and tested APIs that exist outwith of our lovely browser based world (Apple’s Core Audio Frameworks, I believe), but I’m not convinced that you can really tell what web developers need or want until you give them something to play with.

Mozilla’s approach was to expose a very comprehensive low level API, which potentially allows JavaScript developers to create all the functionality of Webkit’s Web Audio API and then some. As a result we get libraries like JSMad cropping up. What does JSMad do? Significantly, it allows you to play MP3s in Firefox*. Is JavaScript fast enough? Apparently so. This was a ‘this changes everything’ moment for me and since then a similar approach has been taken by pdf.js and more recently Broadway.js which decodes H.264 on the fly.

*Neither Firefox or Opera support MP3 natively due to patent concerns.

I’m not saying Mozilla’s Audio Data API is perfect, there are issues with audio using the same thread as the UI and synch issues with multiple streams. However this is being addressed in the MediaStreams Processing proposal and it’s worth taking a look at it, even if it’s just for an insight into what future implementations could look like.

I’m digressing. The point is, if browser makers expose the low level API, developers will quickly come in and start writing libraries on top of that API. As is often the case, the developer community will start making things that the browser makers had never even considered. It makes sense, there are many more web developers than browser developers. Sure, web developers will bridge the gaps and polyfill over the cracks, which let’s face it, has been the only reasonable way of going forward with HTML5, but crucially they will also make new libraries that other developers can use – and all of this at very high rates of turnaround. Of course, the common-or-garden JavaScript developer has a series of enormous advantages over the browser API developer or the standards bodies that seek to define these APIs. I’m gonna name three here:

  1. Strong community — Web developers have a huge active and open community to draw from.

  2. Lower barrier to entry — The barrier of participation once something is put on something like github is virtually zero.

  3. Room to manouevre — Nothing web developers write is ever set in stone, JavaScript represents a much more fluid abstraction than the less flexible native browser code.

Ok, so bear with me here, and this is more of a question than a proposal – What if we separate concerns between browser makers and web developers when it comes to creating standards? Browser makers could concentrate on security, privacy, performance and exposing low level API’s in such a way that web developers can start to build libraries and APIs in the fluid, dynamic, iterative and extremely reactive manner that the web as a media allows. Once these libraries reach an acceptable level of adoption, browser makers can get together and decide which of these features they want to adopt based on tried and tested use cases, and yes make it a standard and build it into the browser. Wouldn’t we move forward more quickly that way? And as a bonus, no browser would be left behind as we’d be building the polyfills along the way.

In short, what I’m saying is that if the standard bodies put their energy into defining low level APIs, the high level APIs will look after themselves, or rather the community will look after them. After all it seems that the W3C themselves want a more community based approach to standards and besides we all know that bottom-up trumps top-down, right?

Outside my flat is an open space that the local council didn’t quite know what to do with, I’m sure they considered adding basket-ball hoops, concrete tables, a kid’s playground and all kinds of things. As it turned out they created a decent flat surface and pretty much left it as that. The users of this space, mostly children, decided this was a perfect space for playing soccer and improvised the space to include a hand drawn goal and pitch markings. If the council really wanted to make something permanent, they could take inspiration from this and create real goals and solid pitch markings.

It’s probably too late to change the Webkit implementation of the Web Audio API significantly, but I would strongly urge the developers of it to include a more comprehensive low level API in future releases. What’s the worst that could happen?

Improvised Goal Posts

A big thanks to Jussi Kalliokoski and Dustin Blake for helping with this post and deep respect and gratitude to all those hacking on audio.

➸ Follow me on Twitter

Tags: , , , , ,

Tuesday, November 15th, 2011 Audio, javascript 17 Comments

Accessibility, Community and Simplicity

Mark Boas

In last week’s Mozilla News Lab we were treated to webinars from Chris Heilmann, John Resig and Jesse James Garrett.

Chris (a Technical Evangelist for Mozilla), took us on a useful whistle-stop tour of HTML5 and friends, technologies that I certainly wish to take advantage of in my final project. I feel that there’s so much potential yet to be tapped, it’s difficult to know where to start, but I’ll definitely be using the new audio and video APIs and looking into offline storage. Chris also touched on accessibility, something that I want to bake into any tool I come up with. To me accessibility is interesting, not just from the perspective of users with disabilities but also because I hope to do is make news (and the tools to create news) more accessible to all people.

John’s talk was fascinating to me as a project coordinator of a JavaScript library, relating directly to the News Lab I found it interesting that many successful projects are a product of both good tools and strong community. The community aspect cannot be underestimated and is perhaps more important than the quality of the tool or service (which can always be improved by the community). I spoke about ‘Community Driven Development’ this year at FOSSCOMM and so is close to my heart, but to hear this confirmed by someone who has experienced success at the very highest level was great. I am now convinced that whatever ever form my final project takes, it will not only be community driven but it will have community built in, which to me means:

  • Providing forums and methods of communication and feedback with the initial release
  • Building extensibility into the tool so people can add functionality via plugins
  • Encouragement of collaboration – make it easy for people to get involved
  • Making sure that the documentation is comprehensive and clear (and can be contributed to)
  • Fanatical support and encouragement

I also attended the ‘extra’ webinar given by Jesse James. In some ways this was the most useful to me as it focussed on designing the user experience or UX, something I’m keen to learn more about and who better to tell me than the author of The Elements of User Experience. It was fascinating to take a peak at how UX experts like Jesse approached the subject of design. I sat there transfixed as he broke down the process and discussed ideas such as psychology, emotion and how the user feels and their relationship with a given product or tool. Jesse spoke about the personality of a product and how users relate to tools as they would another person, using similar parts of the brain as they do when relating to people.

I really want the tool I create to be as simple as possible to use, so my question to Jesse was :
Is it possible to make a product so simple that it doesn’t need a personality, or needs less? Or is the simplicity its personality?

The answer : “Simplicity is a personality.

Great. Hopefully I’m on the right track then.

Tags: , , , , , ,

Monday, July 25th, 2011 HTML5, Uncategorized 1 Comment