Drumbeat Demo – HTML5 Audio Text Sync

Sunday, December 5th, 2010

Last month I had the pleasure of travelling to Barcelona to participate in Mozilla’s Drumbeat festival (of which more details are to come).

I very much wanted to demo the capabilities of HTML5 audio and so set about creating a demo in keeping with the theme of the festival – ‘Learning, Freedom and the Web’. I ended up with a very rough prototype of a web app that synchronised audio to text, word for word. More accurately, it provided an interface that allowed a person to synchronise the audio to text, and then demonstrated a couple of things that were made possible once this synchronisation had taken place.

So what was possible?

The first thing I found was that once I had the timings I could easily create a mechanism to control the audio from the text. Now, by clicking on individual words, I could jump to the corresponding part of the audio, which is useful for navigating audio and potentially as an aid to language learning. I took this a little further by allowing the user to highlight areas of the text and have just that part played back, which was, um, an interesting exercise*.

*This feature is very very experimental and needs some love.
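The click-to-seek and selection playback ideas above can be sketched roughly as follows. This is a minimal illustration and not the demo's actual code: the `timings` array (one start time in seconds per word), `totalDuration`, and the helper names `seekTimeForWord`, `wordIndexAtTime` and `playbackRangeForSelection` are all hypothetical.

```javascript
// Hypothetical data: start time (in seconds) of each word, in reading order.
const timings = [0.0, 0.4, 0.9, 1.5, 2.2];
const totalDuration = 3.0; // assumed length of the audio clip in seconds

// Clicking word `index` should seek the audio to that word's start time.
function seekTimeForWord(timings, index) {
  if (index < 0 || index >= timings.length) {
    throw new RangeError("word index out of range");
  }
  return timings[index];
}

// During playback, find the word being spoken at `time` with a binary search:
// the last word whose start time is <= time. Returns -1 before the first word.
function wordIndexAtTime(timings, time) {
  let lo = 0;
  let hi = timings.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (timings[mid] <= time) {
      found = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}

// Selecting words first..last: play from the first word's start until the
// start of the word after the selection (or the end of the clip).
function playbackRangeForSelection(timings, first, last, duration) {
  const start = seekTimeForWord(timings, first);
  const end = last + 1 < timings.length ? timings[last + 1] : duration;
  return { start, end };
}
```

In a browser, the value returned by `seekTimeForWord` could be assigned to an audio element's `currentTime`, and `wordIndexAtTime` could be called from a `timeupdate` handler to move the highlight. Stopping at the `end` of a selected range would also need a check in that handler.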

Finally I added a bit of razzmatazz, tacking on ‘Image Overlay Mode’, which is really a ‘text overlaid on image’ mode, but that was a bit too wordy. To achieve image overlay mode in my limited time I used canvas, although I’m aware that CSS3 is probably a better fit for this type of ‘animation’.

The code is very crude – it really was just flung together in a desperate rush to get it working for the Drumbeat Science Fair, so I hesitate to say feel free to take it and do with it what you will. But please do consider all demos posted on our blog open source and dual licensed under the MIT and GPL licenses.

Once again the jPlayer library came in handy, providing a useful abstraction and ensuring that the solution works on various platforms.

The demo.

Instructions:

1. In Sync Mode – press play on the player and use the space bar or sync button to synchronise the words with the audio.

2. Switch to Playback Mode to see the words synced to the audio. Click on the words to play the audio from that point. Try selecting areas of text.

3. Hit Image Overlay Mode if you are so inclined and have a canvas-enabled browser.

4. Try Hack mode if you want to adjust any part of the timings and/or words.
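The Sync Mode and Hack mode steps above could be modelled along these lines. This is only a sketch under stated assumptions, not the code behind the demo: `createSyncRecorder` and its `tap`, `adjust` and `result` methods are hypothetical names, and the times passed in stand in for the player's current position in seconds.

```javascript
// Hypothetical recorder: each "tap" (space bar or sync button) stamps the
// next unsynced word with the audio position at that moment.
function createSyncRecorder(words) {
  const timings = [];
  return {
    // `currentTime` would come from the audio player, in seconds.
    tap(currentTime) {
      if (timings.length >= words.length) {
        return false; // every word already has a timing
      }
      timings.push(currentTime);
      return true;
    },
    // Hack-mode style adjustment of a single word's recorded timing.
    adjust(index, newTime) {
      if (index < 0 || index >= timings.length) {
        throw new RangeError("no timing recorded for that word");
      }
      timings[index] = newTime;
    },
    // Pair each synced word with its time, ready for playback mode.
    result() {
      return timings.map((time, i) => ({ word: words[i], time }));
    },
  };
}

// Example run with made-up words and times.
const recorder = createSyncRecorder(["Learning", "Freedom", "Web"]);
recorder.tap(0.2);
recorder.tap(0.9);
recorder.tap(1.6);
recorder.adjust(1, 1.0); // nudge the second word
const synced = recorder.result();
```

In a browser, `tap` could be wired to a `keydown` listener that checks for the space bar and passes in the audio element's `currentTime`.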

Alternatively, watch a screen capture of the demo:

mp4 | ogv

(Flash version coming soon, honest.)

Feedback and ideas on this demo are particularly appreciated. I’m also interested in possible uses (apart from karaoke ;) ) and perhaps other projects that I could collaborate with to make something genuinely useful.

Thanks to @elmook, @f1lt3r, @sroux, @aulentina and others who gave feedback and encouragement. Special thanks to @bluetezza for the original idea.

Mark B

Tags: Audio, HTML5, javascript, jPlayer, jQuery

26 Comments to Drumbeat Demo – HTML5 Audio Text Sync

  • Dawn says:

    What’s wrong with karaoke? ;) Nice demo. I love the image overlay option – it makes for a more dynamic, interesting presentation of audio. If this could be used for radio, then being hard of hearing I might actually choose to listen to radio more often if I could follow it with the text, and jump to relevant or interesting text if it’s a catch-up service.

  • MarkB says:

    Dawn, thanks. We are trying to figure out what this technology could be used for going forward, so your comment is exactly the sort of feedback we’re looking for. :)

  • This is really cool. The first thing I thought of is being able to play a movie scene based on the dialogue. You would obviously need a text transcription but this would allow you to search by keyword, then click on the keyword and the movie would start playing at that point.

  • I have a friend who works for a professional subtitles company. I’ll have to pass this along and see what he thinks. I myself am quite impressed and am thrilled to see this being pulled off in HTML5 and JS. I personally need to look into the canvas feature a bit more.

  • MarkB says:

    @cancel bubble – great idea!

    @Jose – Please do pass this by your friend and let us know what he thinks. Regarding canvas – I’m sure a lot more could be done, I really did just throw that bit together – hope to find more time to play. A CSS3 approach could also be interesting.

  • SIMON Allan says:

    We definitely need this for tatoeba.org; it could help learners listen to only the difficult part of a sentence.

  • [...] think that I’m basically looking for something like this… except that instead of having a word by word transcription playing on the screen, you would see [...]

  • Barton says:

    If you’re looking for an interesting application of the technology, I can give you a couple:

    1) Screen casting of text data. Most screen capture grabs the screen as a video stream, with audio synced… unfortunately the quality of the video is usually poor, and it doesn’t scale well to different screen sizes. Also, the text (usually source code or command line) isn’t selectable from the video. My thought on this is that the voice portion would actually be narration/explanation, and the text would be source code or use of a command line interface (although having a split screen with the transcript in one pane and the source code in another might be useful as well). Search YouTube for ‘screencast’; that should give you a good feel for the market for this technology.

    2) Web-based IVR systems. Tech support typically costs about $0.25 for an automated response, vs. $6.00 for a human response. Creating a troubleshooting system/call tree that people would actually use would save companies that offer tech support a lot of money.

  • Michael says:


    please look


    Press Space and then click on any word.

    Sound works OK only in the latest FF and Opera. In Chrome the sound plays OK, but you cannot click on a word to set the playback position.

  • [...] demo to inspire the students was Mark Boas’ audio and text sync example which shows how you could sync the text of a speech with the audio file for later [...]

  • Chuka says:

    Hi Mark.
    I’ve been looking for exactly this for several days. I will use it on my website for language learning.
    I saw that your demo synchronizes word by word. Is it possible to sync paragraph by paragraph?
    Do you sell the code? Please quote a price for this nice player.
    chka_66 66 (a t) y ah oo . c o m

    Best regards,

  • Bobby says:

    It would be helpful if hack mode allowed switching the audio (pointing to audio available online via URL).

    This is useful to make a time code file to sync audio with text…

  • Art Aldana says:

    Hello Mark,

    This is exactly what I’ve been looking for. I need a web-based application that can sync audio to text as you demonstrated in this application. I am an educator and I want to be able to offer new readers, and others with learning challenges, reading tools that will facilitate comprehension. What is the status of this application? Are you going to see it through? Would you be interested in sharing your knowledge so this application can be shared by others?

  • Mark Cain says:

    Having spent a significant part of my adult life trying to master dead languages and exploring how to enable proficiency, my thoughts, while viewing your demo, ran to the possible utilization of this technology in teaching such languages.

  • Dalton says:

    This looks great! I work for a company that produces short audio pieces for public radio (our site already uses jPlayer: http://storycorps.org/listen) and we’ve been looking for a way to display subtitles for audio in a clean, non-flash way. I think this might be it! I’ll let you know if we are able to use it.


  • [...] of years now. In November last year I decided to attend the Mozilla Drumbeat festival and created a demo for the event. To my surprise the demo was accepted to be exhibited at the science fair on the [...]

  • This looks fantastic!

    I’m in search of such a subtitle sync to help kids who are dyslexic. It’s very helpful for them to read along with the story that is shown in a video. Reading what you’re hearing is essential, and this will make it possible for video to work as a tutor for them!

    Please continue this development!

  • Scott says:

    Hi Mark,

    We are interested in talking to you about helping us out with a project syncing audio/text.

    Can you please contact me so we can discuss? My email is scottpjr@gmail.com



  • [...] Audio sync is another demo of connecting transcripts and audio [...]

  • Lalitha says:

    Synchronizing audio with text is a very challenging task. This method is good, but the accuracy is not that high. By using Flash we can achieve better accuracy, but it is quite complicated.

  • [...] was only a year ago when I started tinkering with text and audio and was asked to show off a very small demo at the first Mozilla Festival in Barcelona last [...]

  • [...] started playing with synchronizing text and audio in HTML5 about a year ago. A few months later, he stumbled across a blog post by Henrik Molte about [...]

  • Adam says:

    Was wondering how you make this code available or rather do you make this code available?

  • Luci Rios says:

    This could be a great tool for synchronizing animation to sound and also to teach a foreign language

  • John Adams says:

    Love this demo.. stole the code but couldn’t figure it out..I’m a 4 out of 10 with any scripting..Still I did manage to figure out highlighting text …My projects with this technology are:
    Putting the 51 Volumes of Harvard Classics online with interactive texting…(now to be highlighted because my 13 year old kept getting lost)…I have about 25 hours of them up with no highlighting…Youtube search “HC5Books”…but will redo to highlight the words…
    My next project, which I will start in 2 years after the Harvard Classics, is the WPA Guides written in the 1930s-40s, with image overlay…
    Very tedious work though…eats time and more time and more time….
    Thanks for the demo

  • Sarika says:

    This is a really wonderful demo. I am working on a project that requires the same audio text synchronization, embedded in a video that also contains some background sound, all without using Flash.
    Your demo closely matches my requirements. Please advise me how to achieve that. I have googled a lot but have not found anything worthwhile.

    Thanks in advance.