Further Experimentation with Hyper Audio

Wednesday, May 4th, 2011

Mark Boas

Following the fun we had making the Hyperdisken demo, I was happy to be asked by Mozilla, in collaboration with Radiolab and SoundCloud to help create another demo to show off the possibilities of hyper audio. This time we had an excellent Radiolab program as audio material and we wanted to get a little more ‘involved’. What was required was an application that would consist of many of the features of the Hyperdisken demo but also integrate deeply with SoundCloud API, and on top of this something extra, something to catch the eye.

Here’s what we came up with.

symmetry

I was fortunate again to work with the ideas-forge known as Henrik Moltke, who collaborated early on with Paul Rouget to produce something he dubbed the ‘Word River’ – a CSS3 manipulated flowing river of words that dynamically picked up content from an HTML transcript. We were also keen to make a pure HTML5 based solution and Paul helped figure out the hooks into the SoundCloud API that would allow us to achieve that. We were also very lucky to be given a great design by the multi-talented Lee Martin, SoundCloud’s experimenter extraordinaire.

So with proof of concept and some visual bling firmly in hand I was tasked with making this baby fly. Luckily I had help. SoundCloud engineers were at hand to answer any questions and crucially we had great support and code contributions from the popcorn.js group. I also managed to talk jPlayer author and all round JS media guru and of course, colleague Mark Panaghiston into giving me a hand. So despite the tight deadlines we were pretty much set.

Henrik has already blogged about the ideas and functionality that make up the demo. I want to write a little about the technology used.

Although I found out in retrospect, not strictly essential, we once again used jPlayer as our audio base, we’re familiar with it and we can move fast using it. It also meant that we could take much of the functionality developed in previous demos and plug it right in. Again, the excellent Popcorn.js was the engine that drove all the time based display of text and images and dealt with the parsing of data. Steven Weerdenberg, active Popcornista from Seneca College, very kindly wrote a plugin that grabbed, parsed and presented comments (amongst other things) from the track we used hosted on SoundCloud. This is where the Popcorn framework comes into its own as a plugin oriented architecture, something we took advantage of when we converted both the transcript and the word river functionality into plug-ins.

So where did the data come from? Well again the transcript HTML doubled as the source for richer interaction when used by the word river plugin. I like this approach I have to say. It means you don’t have to be a programmer to come in and immediately understand the content and change it accordingly. I also like the fact that the transcript is a separate HTML file, it pleases the separatist in me and means that it works as a standalone resource. We also used the standard speaker notations as a type of meta-data, the word river plugin hiding these parts for the purpose of display but using them to colour code each speaker’s text.

This part of the transcript:

Transcript Markup
is used to create this ‘word river’:

Word River

and this interactive transcript :

Transcript Text

Data-wise, everything else came via the SoundCloud API, this included their trademark wave-form, both ogg and mp3 audio sources and all of the comments. We also hijacked the comments to make a crude content management system. The idea being that any comments posted by the Radiolab account with references to images in them, were picked up and displayed as images in the main content area, and did not show up as comments on the timeline. If two images were present it meant they were square, one and it was ‘widescreen’ a blank image was used to remove images when they were no longer needed.

The last pieces of the puzzle and one we’ve still some polish to apply to, (if polishing puzzles makes any sense to you) was getting it all working on the majority of tablets and mobile devices. Since this demo didn’t use Flash this was actually a possibility and we got our web designer Silvia Benvenuti to come out of maternity leave and sort this out for us at the 9th hour, leaving me quite literally holding the baby.

This was a tough gig but all in all I’m happy with what we achieved, everyone seemed to really enjoy taking part in the process, and I certainly enjoyed bringing it all together. Hopefully it will inspire both program makers, designers and developers to come together and explore the limits of what hyper-audio can do. As Inspiral Carpets would say, moo!

moo!

Source code for this project and other demos can be found on github

Follow me on Twitter if you want to hear more about this sort of thing.

Tags: , , , , , , , ,

Hyper Audio – A New Way to Interact

Friday, April 8th, 2011

Mark Boas

Recently I had the privilege of working on a very interesting project with a few folk from Mozilla – it’s the type of project I love to work on, as it involves web audio and its deep integration into the general web experience.

Web audio is no longer consigned to being the passive play and pause experience of yesteryear, it has the potential to be much more, it can be a driver of much richer interactions, something Henrik Moltke explores with something he dubs Hyper Audio. The remit of the project was to take various media elements of a radio interview broadcast by Danish Radio station DR; audio, subtitles, transcripts, footnotes etc and link these in an intuitive and useful manner.

To say this project was right up my street would be an understatement – this project was in my flat, raiding my fridge and drinking my beerz. I was already fascinated by the concept. I’d been playing about, creating audio related demos for a couple of years and in November last year I decided to attend the Mozilla Drumbeat festival and created a demo for the event. The demo was accepted to be exhibited at the science fair on the opening evening and garnered some interesting feedback both on and offline, what it effectively demonstrated was the synchronization and bi-directional control of text and audio.

When Henrik asked me to work on this project, I naturally jumped at the opportunity. Due to time differences, pressing deadlines and the luxury of having a nice quiet office, I stayed up late most nights for a week, happily hacking away and helped out and supported by various Mozillians and the popcorn.js community.

So that’s the back-story, here’s the demo.

Screenshot from HyperDisken Demo (Hyper Audio)

Some things to try :

  • Switch the audio from English to Danish – it should continue from the same point in Danish, subtitles and the transcript should also change appropriately.
  • Try clicking on words in the transcript – the audio should start playing from the corresponding point.
  • Highlight a passage of transcript text – this should add a tweetable excerpt to the ‘share’ box. The URL included should just play that part of the audio.
  • Clicking the music note icons in the ‘media’ box should take you to the point of the audio where that resource was mentioned.

How did we achieve this? We used popcorn.js to display subtitles, footnotes and other time-related resources. In fact a lot of this was already in place when I picked up the project. I then integrated jPlayer for the audio playback and deeper interaction. Popcorn allows us to associate timings with actions and have these actions triggered by media when they hit said timings. So pretty much perfect for our needs. jPlayer provided a solid abstraction above the native audio API, it allowed me to easily synchronize and switch audio tracks and jump to specific points or sections in the audio, with very few lines of code. Importantly it also protected us from any cross-browser issues and allowed our designers to effortlessly create a custom skin for the player.

So this was the control, but what about the media? Well this part was a massive team effort. Henrik managed to provide a very accurately timed transcript. We had hoped to use the subtitles in SRT format but for convenience we parsed them or rather Scott Downe parsed them into JSON format.

One of the bigger issues we encountered was that we only had the transcript in English and the timings for the Danish transcript were naturally different. Luckily we had accurately timed Danish subtitles and legendary Bobby Richter on hand to convert the subtitles to individual words complete with their timings, which he did by cunningly interpolating the timing of words (based on word length) and based on their in-subtitle position. All knocked out in about 10 minutes and in 20 lines of code. It worked surprisingly well, of course you need to be able to understand Danish to truly tell. We could have probably parsed the subtitles into the transcript on the fly but due to time limitations we made them static.

Perhaps an aside not directly related to audio, I managed to hack together some code that allowed highlighted transcript text to be placed in the ‘share’ box, and grab the timings of the first and last words, from there it was pretty much straightforward to make this excerpt tweetable.

This whole endeavor was very much a group effort, a huge thanks to the popcorn.js team, who made joining their IRC feel like walking into a pub full of friends.

Special credit and thanks then should go to Scott Downe, Bobby Richter, Barry Threw, David Humphrey, Brett Gaylor, Ben Moskowitz, Christian Valentiner, Silvia Benvenuti and of course Henrik ‘Tank’ Moltke whose baby all this was. It was great being part of such a talented team. Awesomesauce indeed.

Mark B

Tags: , , , , ,

P2P Web Apps – Brace yourselves, everything is about to change

Monday, March 7th, 2011

Mark Boas

The old client-server web model is out of date, it’s on its last legs, it will eventually die. We need to move on, we need to start thinking differently if we are going to create web applications that are robust, independent and fast enough for tomorrow’s generation. We need to decentralise and move to a new distributed architecture. At least that’s the idea that I am exploring in this post.

It seems to me that a distributed architecture is desirable for many reasons. As things stand, many web apps rely on other web apps to do their thing, and this is not necessarily as good a thing as we might first imagine. The main issue being single points of failure (SPOF). Many developers create (and are encouraged by the current web model to create) applications with many points of failure, without fallbacks. Take the Google CDN for jQuery for example – no matter how improbable, you take this down and half the web’s apps stop functioning. Google.com is probably the biggest single point of failure the world of software has ever known. Not good! Many developers think that if they can’t rely on Google, then who can they rely on? The point is that they should not rely on any single service, and the more single points you rely on, the worse it gets! It’s almost like we have forgotten the principles behind (and the redundancy built into) the Internet – a network based on the nuclear-war-tolerant ARPANET.

I believe it’s time to start looking at a new more robust, decentralised and distributed web model, and I think peer-to-peer (P2P) is a big part of that. Many clients, doubling as servers should be able to deliver the robustness we require, specifically SPOFs would be eliminated. Imagine Wikileaks as a P2P app – if we eliminate the single central server URL mode and something happens to wikileaks.com, the web app continues to function regardless. It’s no coincidence that Wikileaks chose to distribute documents on bittorrent. Another benefit of a distributed architecture is that it scales better, we already know this of course – it’s one of the benefits of client-side code. Imagine Twitter as a P2P web app, imagine how much easier it could scale with the bulk of processing distributed amongst its clients. I think we could all benefit hugely from a comprehensive web based P2P standard, essentially this boils down to browser-built-in web servers using an established form of inter-browser communication.

I’m going to go out on a limb here for a moment, bear with me as I explore the possibilities. Let’s start with a what if. What if, in addition to being able to download client-side JavaScript into your cache manifest you could also download server-side JavaScript? After all what is the server-side now, if not little more than a data/comms broker? Considering the growing popularity of NoSQL solutions and server-side JavaScript (SSJS) it seems to me that it wouldn’t take much to be able to run SSJS inside the browser, granted you may want to deliver a slightly different version than runs on the server, and perhaps this is little more than an intermediary measure. We already have Indexed Database API on the client, if we standardise the data storage API part of the equation, we’re almost there, aren’t we?

So let’s go through a quick example – let’s take Twitter again, how would this all work? Say we visit a twitter.com that detects our P2P capabilities and redirects to p2p.twitter.com, after the page has loaded we download the SSJS and start running it on our local server. We may also grab some information about which other peers are running Twitter. Assuming the P2P is done right, almost immediately we are independent of twitter.com, we’ve got rid of our SPOF. \o/ At the same time if we choose to run offline we have enough data and logic to be able to do all the things we would usually do, bar transmit and receive. Significantly load on twitter.com could be greatly reduced by running both server-side and client-side code locally. To P2P browsers, twitter.com would become little more than a tracker or a source of updates. Also let’s not forget, things run faster locally :)

Admittedly the difference between server-side and client-side JS does become a little blurred at this point. But as an intermediary measure at least, it could be useful to maintain this distinction as it would provide a migration path for web apps to take. For older browsers you can still rely on the single server method while providing for P2P browsers. Another interesting beneficial side-effect is that perhaps we can bring the power of ‘view-source’ to server-side code. If you can download it you can surely view it.

The thing that I am most unsure about is what role HTTP plays in all this. It might be best to take an evolutionary approach to such a revolutionary change. We could continue to use HTTP to communicate between clients and servers, allowing the many benefits of a RESTful architecture, restricting the use of a more efficient TCP based comms to required real-time ‘push’ aspects of applications.

It seems to me that with initiatives such as Opera Unite (where P2P and a web server are already being built into the browser) and other browser makers looking to do the same, this is the direction in which web apps are heading. There are too many advantages to ignore: robustness, speed, scalability – these are very desirable things.

I guess what is needed now is some deep and rigorous discussion and if considered feasible, standards to be adopted so all browsers can move forward with a common approach for implementation. Security will obviously feature highly as part of these discussions and so will common library re-use on the client.

If web apps are going to compete with native apps, it seems that they need to be able to do everything native apps do and that includes P2P communication and enhanced offline capability, it just so happens that these two important aspects seem to be linked. Once suitable mechanisms and standards are in place, the fact that you can write web applications that are equivalent or even superior to native apps, using one common approach for all platforms, is likely to be the killer differentiator.

Thanks to @bluetezza, @quinnirill, @getify, @felixge, @trygve_lie, @brucel and @chrisblizzard for their inspiration and feedback.

Further reading : Freedom In the Cloud: Software Freedom, Privacy, and Security for Web 2.0 and Cloud Computing, The Foundation for P2P Alternatives, Social Networking in the cloud – Diaspora to challenge facebook, The P2P Web (part one)

Tags: , , ,

Monday, March 7th, 2011 development, HTML5, javascript, P2P, Web Apps 26 Comments

A few HTML5 questions that need answering

Thursday, February 17th, 2011

Mark Boas

Challenge Accepted :)

Earlier this week Christian Heilmann posted a talk and some interesting questions on his blog. Perhaps foolishly I decided to accept the challenge of answering them and what follows are my answers to his questions. My hope is to stimulate a bit of civilised debate and so I hope that people challenge what I have written. I’ve tried not to be too opinionated and qualify things with ‘I think’, ‘probably’ etc where possible, but these are inevitably my opinions (and nobody else’s) and should be interpreted as just that.

I think Christian has done a great job putting these questions together – if they make other people think as much as they made me think they’ve certainly been worthwhile. Like many of life’s great questions the answers don’t always boil down to yes and no. In fact usually there is a grey area in which a judgment call is required, and so I apologise in advance if some of the answers are simply ‘it depends’.

Can innovation be based on “people never did this correctly anyways”?

Yes, I’d say that innovation can be based on almost anything. It’s probably better if innovation is based on solid foundations, however I believe the web is more of a bazaar than a cathedral and so solid foundations aren’t always feasible.

If we look at some older browsers where let’s say the spec was interpreted in an unexpected way, take specifically IE5.5 and the CSS box model – hacks are required to get around its ‘creative’ implementation and due to the huge win of being able to write cross-browser web-pages with the same markup, hacks are almost always found. Often hacks by their very nature will be ugly, but this can help serve to remind us of their existence.* Once the hack is found innovation continues again at its usual pace. I’d argue that innovation on the web is built on a series of (often ugly) hacks.

To answer the question in the context of HTML5′s tolerance for sloppy markup, it’s interesting to note that HTML5 does allow you to markup as XHTML if you so choose, personally and maybe because I’m over fastidious, this is what I try to do. Whether the browser picks up my mistakes or just silently corrects them is another issue. Modern editors’ text highlighting helps a lot but if you are taking your markup seriously you may want to validate it. At the end of the day innovation will take place regardless.

*http://tantek.com/log/2005/11.html

Is it HTML or BML? (HyperText Markup Language or Browser Markup Language)

I think HTML should be accessible to more than what we traditionally view as the browser, yes. From an accessibility standpoint it should be a responsibility to make sure that pages are reasonably accessible from a range of devices such as screenreaders. What ‘reasonably accessible’ means could be cause for considerable debate. Additionally it is in most people’s interest to make their pages readable by a search-engine bot. Some developers may find that they want their pages to be parseable in which case they should probably mark them up in XHTML.

I think though, that the most effective way to encourage people to mark up their pages for devices other than browsers is to make clear the benefits of doing so. Search bots and screenreaders will eventually catch up and I think at that point we will be in a better place.

On a side note hypertext is what the web is founded on – it’s the essence of the web if you like. If you stripped down the web to just hypertext you’d still have a web, albeit a limited version of what we have today. Hypertext is really just text or documents endowed with the ability to link to other documents. This ability to link is of course extremely powerful and enables anyone to create a web page that is instantly ‘part’ of the web. Some companies are starting to rely less on the humble hyperlink and invent markup for themselves that relies on JavaScript to actually work, Facebook’s FBML is an example of this and I would perhaps put this ‘extension’ of HTML in the BML camp.

But while so much has been added to HTML, the hyperlink still remains and developers are encouraged to use it, even if in some cases it is just a fallback for more complex interactions. Only the other day we experienced the fallout from Google’s hashbang method of ajaxifying a link, I feel that a large part of this stemmed from the fact that developers felt that they no longer had to populate their hrefs, but in reality it is still highly recommended they do. So no matter how complex a web application becomes I’d argue that it should always be based fundamentally on the good old hyperlink in which case, strictly speaking it would be HTML.


Should HTML be there only for browsers? What about conversion Services? Search bots? Content scrapers?

Some would argue that information published publicly on the web is there to be consumed by however or whatever chooses to consume it. However a semantic web not only makes content more accessible to humans but also to algorithm running computers that seek to diseminate and categorise the wealth of information out there, this new information could then be presented back to a human and so the wheel turns and the machine cranks on. Again I feel developers should be shown the benefits.

Touching again on accessibility, laws exist and I think rightly so, to mandate that certain websites should be accessible to all, just as they do for certain buildings especially public ones. Whether your house or say a merry-go-round should be accessible is less clear. During the development of jPlayer we decided to make elements of it accessible because we felt it was the right thing to do, but the very first versions weren’t. The ‘right-thing-to-do’ can often be more a powerful motivator than any rule, especially if it is difficult to enforce.


Should we shoe-horn new technology into legacy browsers?

I think we should aim to support as many legacy browsers as reasonably possible – yes. What is ‘reasonably possible’ has always been a contentious issue but I think most developers broadly agree on what this means.

A key decision is how consistent you want each browser’s interpretation of your web page or application to be. Note that the experience will never be identical across all browsers, even the modern ones, so a certain degree of comprimise is always required. I really just view the shoe-horning of new technology into legacy browsers as just another hack, although instead of correcting a misimplemented feature its aim is to augment the legacy browser’s feature set to more closely match newer browsers.


Do patches add complexity as we need to test their performance? (there is no point in giving an old browser functionality that simply looks bad or grinds it down to a halt)

If we take patches to mean any code styling or markup that can adequately add missing functionality to a browser, (are also referred to as shims, shivs or more recently polyfills), yes patches do add complexity, but well written patches that can easily be dropped into a project should mask the complexities from the web developer. You could argue that patches are essential if we are to start using new technologies like HTML5 in the wild. Again a decision needs to be made as to whether a patch slows down a browser unacceptedly, whether to fallback to a less complete solution for legacy browsers or whether not to use the new technology at all yet.


How about moving IE fixes to the server side? Padding with DIVs with classes in PHP/Ruby/Python after checking the browser and no JS for IE?

The trouble with moving fixes to the server-side is that, in my experience at least, front-end developers and especially designers (whose lives these fixes affect) like to be able to develop applications with as little contact with the server as possible. Some web pages don’t even require any real server-side interaction and so we would be creating an added burden to the designer or developer’s development cycle. That said if the web developer could continue to work as usual on a web application that relied on the server-side anyway, safe in the knowledge that the server would take care of any differences then I suppose that would be one possible solution.

Another issue is server-side solutions tend to be less transparent and this could cause unneccessary resistance to those who might seek to change and evolve these fixes. You would also have to deal with the situation where several server-side fixes existed (PHP/Ruby/Python etc), the exception perhaps being mods to the actual webserver (although webservers do vary).

It is true that developers or designers are unlikey to work without a webserver these days but on balance I think there are a lot of issues to be resolved before all but the simplest fixes get moved to the server-side. So I’d say some IE fixes on the server-side could well make sense, anything more complex probably isn’t warranted.


Can we expect content creators to create video in many formats to support an open technology?

I don’t think we can expect content creators to create a video in many formats to support a certain technology just because it is open. It’s more likely to depend on the content creators and the market that they are appealing to – whether they are creating a true native cross-platform solution or whether they are willing to fallback to say Flash to deliver content that a browser doesn’t natively support. I think this is an area where some sort of server-side video format conversion module could be useful in encouraging developers to support native video in every browser. Ironically any tool that converts to or from certain types of encoded video could be subject to royalties (I’d love to be proved wrong here). A content creator could decide to support only open video formats but for example if H.264 is assumed not to be open, the big question is how do you create a video fallback for iOS whose browser doesn’t support Flash or similar.


Can a service like vid.ly be trusted for content creation and storage?

The question that immediately springs to mind is what happens if vid.ly goes down? You are essentially introducing another point of failure and it’s probably a good idea to keep these to a minimum, depending of course on how crucial it is that that content is always available. You could also argue that depending on CDNs to host your favourite JavaScript library is a bad idea so I guess it’s a judgment call on how likely it is that a particular service is to go down. Leaving aside the issue of reliability, what happens if a service like vid.ly gets bought out or hacked or decide to change their policy? All these are important questions that we’d probably do well to ask ourselves before relying on such a service.

Content creation could be a winner though, in the sense that it would be nice to upload one format and several other formats be made available to you. It’s a bit unfashionable but I sometimes wonder whether the WordPress model of being able to download and drop in a useful module on to your LAMP stack makes more sense. I’d love to be able to host my own personal version of vid.ly for use on my website only – it also makes sense from a distributed computing point of view.


Is HTML5 not applicable for premium content?

If we take premium content to mean content available for a fee, it is certainly more difficult to protect that content from being downloaded with HTML5. Being able to view source is arguably one of the most important aspects of open technology and HTML5 is a group of open technologies. However there is almost always a way to grab premium content whether it is based upon open technology or not, so at the end of the day we’re really talking about how difficult you make this.


Mark B

Tags:

Thursday, February 17th, 2011 AJAX, configuration, CSS, development, HTML, HTML5, javascript, SEO 1 Comment

Drumbeat Demo – HTML5 Audio Text Sync

Sunday, December 5th, 2010

Last month I had the pleasure of travelling to Barcelona to participate in Mozilla’s Drumbeat festival (of which more details are to come).

I very much wanted to demo the capabilities of HTML5 audio and so set about creating a demo in keeping with the theme of the festival – ‘Learning, Freedom and the Web’. I ended up with a very rough prototype of a web app that synchronised audio to text, word for word, more accurately it provided an interface that allowed a person to synchronise the audio to text and then demonstrated a couple of things that were made possible once this synchronisation had taken place.

So what was possible?

The first thing I found was that once I had the timings I could easily create a mechanism to control the audio from the text. Now by clicking on individual words I could jump to the corresponding part of the audio, useful for navigating audio and also potentially as an aid to learning language. I took this a little further by allowing the user to highlight areas of the text and having just that part of the text played back, which was um, an interesting exercise*.

*This feature is very very experimental and needs some love.

Finally I added a bit of razzmatazz, tacking on ‘Image Overlay Mode’, which is really a text overlayed on image mode, but that was a bit too wordy. To achieve image overlay mode in my limited time, I used canvas, however I’m aware that CSS3 is probably a better fit for this type of ‘animation’.

The code is very crude – it really was just flung together in a desperate rush to get it working for the Drumbeat Science Fair, so I hesitate to say feel free to take it and do with it what you will. But please do consider all demos posted on our blog open source and dual licensed under the MIT and GPL licenses.

Once again the jPlayer library came in handy, providing a useful abstraction and ensuring that the solution works on various platforms.

The demo.

Instructions :

1. In Sync Mode – press play on the player and use the space bar or sync button to synchronise the words with the audio.

2. Switch to Playback Mode to see the words synched to the audio. Click on the words to play the audio from that point. Try selecting areas of text.

3. Hit Image Overlay Mode if you are so inclined and have a canvas enabled browser.

4. Try Hack mode if you want to adjust any part of the timings and/or words.

Alternatively watch a screen capture of the demo :

mp4 | ogv

(Flash version coming soon, honest.)

Feedback and ideas on this demo are particularly appreciated. I’m also interested in possible uses (apart from karaoke ;) ) and perhaps other projects that I could collaborate with to make something genuinely useful.

Thanks to @elmook, @f1lt3r, @sroux, @aulentina and others who gave feedback and encouragement. Special thanks to @bluetezza for the original idea.

Mark B

Sunday, December 5th, 2010 Audio, HTML5, javascript, jPlayer, jQuery 26 Comments