By Mark Boas
H.264 is a video compression format which, although fairly recently open-sourced, is ‘patent encumbered’; in other words, royalties for its use could be claimed at any time. Currently the latest versions of Chrome, Internet Explorer and Safari all support H.264. Firefox and Opera do not, but they do support an alternative format known as WebM (as does Chrome). However, around 13 months ago Google stated their intention to drop H.264 support from Chrome ‘in the coming months’.
A Flash fallback is where we detect whether HTML5 or a particular format is supported and fall back to a Flash solution if not. At this point it’s worth mentioning that Mobile Safari does not support Flash. NBD in this case means no big deal (for Chrome). The special Flash deal probably alludes to the deal between Google and Adobe to bundle Flash with Chrome, and to Adobe’s intention to drop support for Flash on Linux for browsers other than Chrome. Note that Adobe plan not to support ‘new mobile device configurations’ going forward.
B2G refers to Boot-2-Gecko, a new initiative by Mozilla to create a web-based, open source mobile operating system.
Note: whereas there is more room to maneuver with desktop browsers, with mobile devices speed and battery life are of prime concern. Since most, if not all, mobile devices that play video currently support H.264 decoding directly in hardware, it would be difficult to enter the market with a mobile OS that didn’t support H.264.
I encourage you to expand their tweets to see who else has chimed in. You can see the full discussion here.
WebM vs H.264 Showdown
The second discussion involves David Storey (former Opera Web Evangelist, now working for Motorola, which has recently been acquired by Google) and Faruk Ateş (‘Entreprenerd’ and former web-standards specialist for Apple). Excerpts follow.
This is the interesting part for me. H.264 and VP8 (the video part of WebM) are both based on the inverse discrete cosine transform for decoding, so in theory at least this could happen.
It’s worth pointing out that services exist that will do this for you.
Upload once and have multiple formats available automatically. Vid.ly comes to mind but I’m sure there are others. I do think, though, that it would be nice to have a package that did something similar which you could install on a server of your choice.
Last week, as part of the Mozilla News Lab, I took part in webinars with Shazna Nessa (Director of Interactive at the Associated Press in New York), Mohamed Nanabhay (an internet entrepreneur and Head of Online at Al Jazeera English) and Oliver Reichenstein (CEO of iA, Information Architects, Inc.).
I have a few ideas on how we can create tools to help journalists. I mean journalists in the broadest sense: casual bloggers as well as hacks working for large news organizations. In previous weeks I have been busy absorbing all the fantastic and varied information coming my way. Last week things started to fall into place. A seed of an idea that I’ve had at the back of my mind for some time pushed its way to the front and started to evolve.
Something that cropped up time and again was that if you are going to create tools for journalists, you should try to make them as easy to use as possible. The idea I hope to run with is a simple tool that allows users to assemble audio or video programs from different sources, using a paradigm that most people are already familiar with. I hope to build on my work on something I’ve called hypertranscripts, which strongly couple text and the spoken word in a way that is easily navigable and shareable.
Editing, compiling and assembling audio or video usually requires fairly complex tools. This is compounded by the fact that it’s very difficult to ascertain the content of the media without actually playing through it.
I propose that we step back and consider other ways of representing this media content. In the case of journalistic pieces, this content usually includes the spoken word which we can represent using text by transcribing it. My idea is to use the text to represent the content and allow that text to be copied, pasted, dragged and dropped from document to document with associated media intact. The documents will take the form of hypertranscripts and this assemblage will all work within the context of my proposed application, going under the working title of the Hyperaudio Pad. (Suggestions welcome!) Note that the pasting of any content into a standard editor will result in hypertranscripted content that could exist largely independently of the application itself.
Some examples of hypertranscripts can be found in a couple of demos I worked on earlier this year:
As the interface is largely text-based, I’m taking a great deal of inspiration from the elegance and simplicity of Oliver’s iA Writer. Here are a couple of rough sketches:
I’m happy to say that last week I found myself collaborating with other members of the News Lab, namely Julien Dorra and Samuel Huron, both of whom are working on related projects. These guys have some excellent ideas relating to meta-data and mixed media that tie in with my own, and I look forward to working with them in the future. Exciting stuff!
As part of the Knight Mozilla News Challenge I’m taking part in, I attended two webinars. The first, given by Aza Raskin, concerned prototyping and communication; the second, given by founder and journalist Burt Herman, told the story of Storify and the Hacks/Hackers community.
I found both these webinars extremely valuable. I had actually seen Aza’s talk in November in Barcelona when he spoke at the Drumbeat Festival, and it was as inspiring now as it was then. The main thrust is that prototypes are important as a way to get your idea out there, put it in front of people and iterate upon it! The Storify story gave a different perspective and I found myself agreeing with Burt’s thoughts on following your passion, building community and teams, and again getting your product out there!
However, the point in Burt’s talk that really stood out for me was his emphasis on keeping things simple. The Minimum Viable Product approach is something which resonates strongly with me (I have to say I’m a big fan), but what he said next really struck home. In response to a question on how Burt and his team went from early prototypes to the current incarnation of Storify, he re-emphasised the iterative approach of gathering feedback from early minimal releases … and then BAM – out it came, mentioned almost as an aside:
“We built this thing that made it easy to embed a tweet in a blogpost”
“so you could just type in the ID of a tweet and it would give you a nice formatted HTML thing”
So beautifully simple, yet hugely powerful. And it was built by co-founder Xavier Damman in a day! Unsurprisingly, TechCrunch picked it up almost immediately and many people started using it soon after. Once they realised what they had, they built upon it and worked it into their main product. This for me was such a great example of what the ‘just build it’ mentality can achieve. Prior to that, they had been experimenting with slightly more complex systems, but they tried a few variations on a theme and with the help of good feedback they hit upon a fantastic solution.
I have a few ideas for applications for the ‘Unlocking Video’ challenge. I’m not entirely sure of any of them, but one thing’s for certain: I’m going to get the very core of an idea out there early and then iterate wildly in response to as much feedback as I can gather, and hope, as Burt and his team did, to strike gold.
Related resources:
Storify storified – SXSW Winners [webmission.be]
Tweet embedding tool: http://media.twitter.com/blackbird-pie
Following the fun we had making the Hyperdisken demo, I was happy to be asked by Mozilla, in collaboration with Radiolab and SoundCloud, to help create another demo to show off the possibilities of hyper audio. This time we had an excellent Radiolab program as audio material and we wanted to get a little more ‘involved’. What was required was an application that would include many of the features of the Hyperdisken demo but also integrate deeply with the SoundCloud API and, on top of this, something extra, something to catch the eye.
I was fortunate again to work with the ideas-forge known as Henrik Moltke, who collaborated early on with Paul Rouget to produce something he dubbed the ‘Word River’ – a CSS3 manipulated flowing river of words that dynamically picked up content from an HTML transcript. We were also keen to make a pure HTML5 based solution and Paul helped figure out the hooks into the SoundCloud API that would allow us to achieve that. We were also very lucky to be given a great design by the multi-talented Lee Martin, SoundCloud’s experimenter extraordinaire.
So with proof of concept and some visual bling firmly in hand, I was tasked with making this baby fly. Luckily I had help. SoundCloud engineers were on hand to answer any questions and, crucially, we had great support and code contributions from the popcorn.js group. I also managed to talk jPlayer author, all-round JS media guru and, of course, colleague Mark Panaghiston into giving me a hand. So despite the tight deadlines we were pretty much set.
Henrik has already blogged about the ideas and functionality that make up the demo. I want to write a little about the technology used.
Although, as I found out in retrospect, it wasn’t strictly essential, we once again used jPlayer as our audio base. We’re familiar with it and we can move fast using it. It also meant that we could take much of the functionality developed in previous demos and plug it right in. Again, the excellent Popcorn.js was the engine that drove all the time-based display of text and images and dealt with the parsing of data. Steven Weerdenberg, active Popcornista from Seneca College, very kindly wrote a plugin that grabbed, parsed and presented comments (amongst other things) from the track we used, hosted on SoundCloud. This is where the Popcorn framework comes into its own as a plugin-oriented architecture, something we took advantage of when we converted both the transcript and the word river functionality into plugins.
So where did the data come from? Well, again the transcript HTML doubled as the source for richer interaction when used by the word river plugin. I have to say I like this approach. It means you don’t have to be a programmer to come in, immediately understand the content and change it accordingly. I also like the fact that the transcript is a separate HTML file; it pleases the separatist in me and means that it works as a standalone resource. We also used the standard speaker notations as a type of meta-data, with the word river plugin hiding these parts for the purpose of display but using them to colour code each speaker’s text.
This part of the transcript:
is used to create this ‘word river’:
and this interactive transcript:
Data-wise, everything else came via the SoundCloud API. This included their trademark waveform, both ogg and mp3 audio sources and all of the comments. We also hijacked the comments to make a crude content management system. The idea was that any comments posted by the Radiolab account with references to images in them were picked up and displayed as images in the main content area, and did not show up as comments on the timeline. If two images were present it meant they were square; if only one, it was ‘widescreen’. A blank image was used to remove images when they were no longer needed.
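To make the comment-as-CMS idea concrete, here is a minimal sketch in Python (chosen for brevity; it is not the demo's code). The comment shape (`user`, `time`, `body`), the `radiolab` account name, the image URL pattern and the idea of spotting the blank image by the word "blank" in its URL are all assumptions made for illustration, not the real SoundCloud API fields or the project's actual rules.

```python
import re

# Assumption: an image reference is any http(s) URL ending in a common
# image extension. The real demo may have matched images differently.
IMAGE_URL = re.compile(r"https?://\S+\.(?:png|jpe?g|gif)", re.IGNORECASE)

def route_comments(comments, cms_user="radiolab", blank_marker="blank"):
    """Split comments into ordinary timeline comments and image cues."""
    timeline, image_cues = [], []
    for comment in comments:
        images = IMAGE_URL.findall(comment["body"])
        if comment["user"] == cms_user and images:
            if any(blank_marker in url for url in images):
                layout = "clear"       # blank image: remove what is showing
            elif len(images) == 2:
                layout = "square"      # two images: a pair of squares
            else:
                layout = "widescreen"  # one image: a single wide image
            image_cues.append(
                {"time": comment["time"], "layout": layout, "images": images}
            )
        else:
            # Everything else stays on the timeline as a normal comment.
            timeline.append(comment)
    return timeline, image_cues

# Hypothetical sample data.
comments = [
    {"user": "listener1", "time": 12.0, "body": "Love this bit!"},
    {"user": "radiolab", "time": 30.0,
     "body": "http://example.com/a.jpg http://example.com/b.jpg"},
    {"user": "radiolab", "time": 60.0, "body": "http://example.com/wide.png"},
    {"user": "radiolab", "time": 90.0, "body": "http://example.com/blank.gif"},
]
timeline, image_cues = route_comments(comments)
for cue in image_cues:
    print(cue["time"], cue["layout"])
```

The appeal of this kind of routing is that whoever runs the account can change the visuals just by posting or deleting comments, with no code changes.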
The last piece of the puzzle, and one we’ve still some polish to apply to (if polishing puzzles makes any sense to you), was getting it all working on the majority of tablets and mobile devices. Since this demo didn’t use Flash this was actually a possibility, and we got our web designer Silvia Benvenuti to come out of maternity leave and sort this out for us at the 9th hour, leaving me quite literally holding the baby.
This was a tough gig but all in all I’m happy with what we achieved. Everyone seemed to really enjoy taking part in the process, and I certainly enjoyed bringing it all together. Hopefully it will inspire program makers, designers and developers alike to come together and explore the limits of what hyper-audio can do. As Inspiral Carpets would say, moo!
Source code for this project and other demos can be found on GitHub.
Follow me on Twitter if you want to hear more about this sort of thing.
Recently I had the privilege of working on a very interesting project with a few folk from Mozilla – it’s the type of project I love to work on, as it involves web audio and its deep integration into the general web experience.
Web audio is no longer consigned to being the passive play and pause experience of yesteryear. It has the potential to be much more: it can be a driver of much richer interactions, something Henrik Moltke explores with what he dubs Hyper Audio. The remit of the project was to take the various media elements of a radio interview broadcast by Danish radio station DR (audio, subtitles, transcripts, footnotes etc.) and link these in an intuitive and useful manner.
To say this project was right up my street would be an understatement – this project was in my flat, raiding my fridge and drinking my beerz. I was already fascinated by the concept. I’d been playing about, creating audio-related demos for a couple of years, and in November last year I decided to attend the Mozilla Drumbeat festival and created a demo for the event. The demo was accepted for exhibition at the science fair on the opening evening and garnered some interesting feedback both on and offline. What it effectively demonstrated was the synchronization and bi-directional control of text and audio.
When Henrik asked me to work on this project, I naturally jumped at the opportunity. Due to time differences, pressing deadlines and the luxury of having a nice quiet office, I stayed up late most nights for a week, happily hacking away, helped out and supported by various Mozillians and the popcorn.js community.
So that’s the back-story, here’s the demo.
Some things to try:
- Switch the audio from English to Danish – it should continue from the same point in Danish, subtitles and the transcript should also change appropriately.
- Try clicking on words in the transcript – the audio should start playing from the corresponding point.
- Highlight a passage of transcript text – this should add a tweetable excerpt to the ‘share’ box. The URL included should just play that part of the audio.
- Clicking the music note icons in the ‘media’ box should take you to the point of the audio where that resource was mentioned.
How did we achieve this? We used popcorn.js to display subtitles, footnotes and other time-related resources. In fact a lot of this was already in place when I picked up the project. I then integrated jPlayer for the audio playback and deeper interaction. Popcorn allows us to associate timings with actions and have these actions triggered by the media when it hits said timings. So pretty much perfect for our needs. jPlayer provided a solid abstraction above the native audio API. It allowed me to easily synchronize and switch audio tracks and jump to specific points or sections in the audio, with very few lines of code. Importantly, it also protected us from any cross-browser issues and allowed our designers to effortlessly create a custom skin for the player.
So this was the control, but what about the media? Well, this part was a massive team effort. Henrik managed to provide a very accurately timed transcript. We had hoped to use the subtitles in SRT format, but for convenience we parsed them (or rather Scott Downe parsed them) into JSON format.
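For readers who haven't met SRT, the conversion is roughly the following. This is a minimal sketch in Python rather than the code Scott wrote; the function names and the JSON field names (`start`, `end`, `text`) are assumptions made for illustration, and the sample subtitles are invented.

```python
import json
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured timing fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000

def srt_to_cues(srt_text):
    """Parse SRT text into a list of {start, end, text} dictionaries."""
    cues = []
    normalised = srt_text.replace("\r\n", "\n").strip()
    # SRT blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                fields = match.groups()
                cues.append({
                    "start": to_seconds(*fields[:4]),
                    "end": to_seconds(*fields[4:]),
                    # Everything after the timing line is subtitle text.
                    "text": " ".join(l.strip() for l in lines[i + 1:] if l.strip()),
                })
                break
    return cues

# Invented sample input.
SAMPLE = """1
00:00:01,000 --> 00:00:04,500
Hello and welcome.

2
00:00:05,000 --> 00:00:07,250
This is a
two-line subtitle.
"""

cues = srt_to_cues(SAMPLE)
print(json.dumps(cues, indent=2))
```

Once the subtitles are plain JSON like this, they are easy to hand to whatever is driving the timed display.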
One of the bigger issues we encountered was that we only had the transcript in English and the timings for the Danish transcript were naturally different. Luckily we had accurately timed Danish subtitles and the legendary Bobby Richter on hand to convert the subtitles into individual words complete with their timings, which he did by cunningly interpolating the timing of each word based on its length and its position within the subtitle. All knocked out in about 10 minutes and in 20 lines of code. It worked surprisingly well, although of course you need to be able to understand Danish to truly tell. We could probably have parsed the subtitles into the transcript on the fly, but due to time limitations we made them static.
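One plausible reading of that interpolation, sketched in Python: give each word a share of the subtitle's duration in proportion to its length, and start it after the words that precede it. This is not Bobby's code; the function name, the output shape and the sample subtitle are assumptions made for illustration.

```python
def interpolate_word_timings(start, end, text):
    """Estimate a start time for each word in a timed subtitle.

    Each word gets a slice of the subtitle's duration proportional to its
    character count, so longer words are assumed to take longer to say.
    """
    words = text.split()
    total_chars = sum(len(word) for word in words)
    if total_chars == 0:
        return []
    duration = end - start
    timings = []
    chars_so_far = 0
    for word in words:
        # Position within the subtitle: how many characters came before.
        word_start = start + duration * chars_so_far / total_chars
        timings.append({"word": word, "start": round(word_start, 3)})
        chars_so_far += len(word)
    return timings

# Invented example: a three-second subtitle starting at 10 seconds.
word_timings = interpolate_word_timings(10.0, 13.0, "Hej med dig verden")
for timing in word_timings:
    print(timing["word"], timing["start"])
```

It ignores pauses and speaking rate, which is why the result is an estimate rather than a true alignment, but for click-to-play on a transcript it only needs to land close to the right word.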
As an aside not directly related to audio, I managed to hack together some code that placed highlighted transcript text in the ‘share’ box and grabbed the timings of the first and last words. From there it was pretty straightforward to make the excerpt tweetable.
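The gist of that step can be sketched as follows, in Python for brevity. The word list shape, the `example.com` address and the `#t=start,end` fragment (borrowed from the W3C Media Fragments syntax) are assumptions made for illustration; the demo's actual URL format may well have been different.

```python
def excerpt_share_link(words, first_index, last_index, base_url):
    """Build a shareable excerpt from a selected run of timed words."""
    selected = words[first_index:last_index + 1]
    start = selected[0]["start"]   # timing of the first highlighted word
    end = selected[-1]["end"]      # timing of the last highlighted word
    text = " ".join(word["word"] for word in selected)
    # The "g" format drops trailing zeros, so 12.0 is written as 12.
    url = f"{base_url}#t={start:g},{end:g}"
    return {"start": start, "end": end, "url": url, "tweet": f'"{text}" {url}'}

# Invented word timings, as they might come from a timed transcript.
words = [
    {"word": "Web", "start": 10.0, "end": 10.6},
    {"word": "audio", "start": 10.6, "end": 11.4},
    {"word": "rocks", "start": 11.4, "end": 12.0},
]
excerpt = excerpt_share_link(words, 1, 2, "https://example.com/demo")
print(excerpt["tweet"])
```

On the receiving end, the page only has to read the two times back out of the URL and play that slice of the audio.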
This whole endeavor was very much a group effort. A huge thanks to the popcorn.js team, who made joining their IRC feel like walking into a pub full of friends.
Special credit and thanks then should go to Scott Downe, Bobby Richter, Barry Threw, David Humphrey, Brett Gaylor, Ben Moskowitz, Christian Valentiner, Silvia Benvenuti and of course Henrik ‘Tank’ Moltke whose baby all this was. It was great being part of such a talented team. Awesomesauce indeed.
- The Hyperaudio Pad – Next Steps and Media Literacy
- Breaking Out – The Making Of
- Breaking Out – Web Audio and Perceptive Media
- Playing web audio offline on mobile Safari. Mission impossible?
- Flash vs HTML5 Video and the Codec thing
- Altrepreneurial vs Entrepreneurial and Why I am going to Work with Al Jazeera
- HTML5 Audio APIs – How Low can we Go?
- Hyperaudio at the Mozilla Festival
- The Hyperaudio Pad – a Software Product Proposal
- Introducing the Hyperaudio Pad (working title)
- Accessibility, Community and Simplicity
- Build First, Ask Questions Later
- Further Experimentation with Hyper Audio
- Hyper Audio – A New Way to Interact
- P2P Web Apps – Brace yourselves, everything is about to change