Hyperaudio - A Future of Audio
Hyperaudio
Making Audio a First Class Citizen of the Web.
hy·per·au·di·o
- Non passive
- Dynamically generated
- Integrated into the web experience
As hypertext is to text, hyperaudio is to audio.
What makes audio so special?
- Requires only partial attention
- Conveys emotion well
- Goes deeper ...
?
Demo 1 - Martin Luther King
What do we need to make audio interactive?
Simply put ...
- Set the audio playhead position
- Get the audio playhead position
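In the browser, both operations map onto the `currentTime` property of an HTML media element, which can be read and written. A minimal sketch, assuming an object shaped like an `<audio>` element; the helper names `getPlayhead` and `setPlayhead` are hypothetical, not part of any library mentioned here.

```javascript
// Read the playhead position, in seconds, from any object shaped like an
// HTMLMediaElement (an <audio> or <video> element in the browser).
function getPlayhead(media) {
  return media.currentTime;
}

// Move the playhead, clamping the request to the range [0, duration].
// If the duration is not a finite number (for example NaN before metadata
// has loaded), only the lower bound is applied.
function setPlayhead(media, seconds) {
  const upper = Number.isFinite(media.duration) ? media.duration : Infinity;
  media.currentTime = Math.min(Math.max(seconds, 0), upper);
  return media.currentTime;
}
```

In a page this might be called as `setPlayhead(document.querySelector('audio'), 42)`.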
Word time-aligned transcripts. Hypertranscripts?
<span data-t="123">Hello</span>
<span data-t="456">Edinburgh!</span>
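Given markup like the above, finding the word that matches the current playhead is a lookup over sorted start times. A minimal sketch, assuming `data-t` holds each word's start time in milliseconds (the slides do not state the unit); the function name `activeWordIndex` is hypothetical.

```javascript
// startTimes: word start times sorted ascending, as they might be read from
// the data-t attributes. Treating them as milliseconds is an assumption.
// Returns the index of the last word whose start time is <= timeMs,
// or -1 if the playhead is before the first word.
function activeWordIndex(startTimes, timeMs) {
  let lo = 0;
  let hi = startTimes.length - 1;
  let found = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (startTimes[mid] <= timeMs) {
      found = mid;      // candidate; look for a later one
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found;
}
```

In the browser the times could be collected with `Array.from(document.querySelectorAll('span[data-t]'), s => Number(s.dataset.t))`, and a click on a word could set the playhead to that word's time.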
Demo 2 - Hyperdisken
Transcripts break audio out of its black box
And make it ...
- Navigable
- Searchable
- Shareable
Navigable
A transcript can give you a very good idea of media content.
We can scroll through a transcript and take in content more easily than we can scrub audio or video.
Searchable
Transcribing your audio makes it much more findable and shareable.
Search engines can index your media and you can also search through text for keywords.
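Keyword search over a timed transcript can return playhead positions rather than just text matches. A minimal sketch, assuming each word is an object with hypothetical `text` and `t` (start time) fields; `findKeyword` is an illustrative name.

```javascript
// words: [{ text: 'Hello', t: 123 }, ...] where t is the word's start time.
// Returns the start times of every word equal to the keyword, ignoring
// case and punctuation.
function findKeyword(words, keyword) {
  const normalise = (s) => s.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, '');
  const target = normalise(keyword);
  if (target === '') return [];
  return words
    .filter((w) => normalise(w.text) === target)
    .map((w) => w.t);
}
```

Each returned time can be handed straight to the player to jump to that mention.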
Shareable
Now that we have a good grip on our content's word timings, we can allow people to link to excerpts of audio and their associated text.
We can add social mechanisms.
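One way to link to an excerpt is to put its time range in the URL fragment, using the Media Fragments temporal form `#t=start,end` (in seconds). A minimal sketch, assuming word timings in milliseconds; `excerptLink` is a hypothetical helper, and whether the receiving page acts on the fragment depends on the browser or on the page's own script.

```javascript
// Build a link to an excerpt using a Media Fragments temporal range
// (#t=start,end, in seconds). Times are passed in milliseconds (assumed).
function excerptLink(baseUrl, startMs, endMs) {
  const toSeconds = (ms) => String(Math.round(ms) / 1000);
  const url = new URL(baseUrl);
  url.hash = `t=${toSeconds(startMs)},${toSeconds(endMs)}`;
  return url.toString();
}
```

For example, `excerptLink('https://example.com/talk.mp3', 1500, 4000)` gives `https://example.com/talk.mp3#t=1.5,4`.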
Language Switching
Precise timings mean we can switch languages on-the-fly.
Use case: language-learning applications.
Demo 3 - RadioLab
Technology
- Popcorn.js
- jPlayer
- jQuery
Popcorn.js
- ✔ Light
- ✔ Modular
- ✔ Time based events built in
- ✔ Active community
- ✔ Library independent
- ✔ IE8+ compatible to an extent
jPlayer
- ✔ Light
- ✔ Modular
- ✔ Skinnable
- ✔ Active community
- ✔ Flash fallback built in
- ✔ IE6+ compatible
Popcorn.js + jPlayer
A winning combo?
- Supports IE8 using a custom player
- Uses one audio/video format
- Masks browser differences
Going further with text
- Highlight words as they are spoken.
- Use colour and size of text.
- Provide a mobile device experience.
Demo 4 - Radio24Syv
The elephant in the room
How do we create word-aligned timed transcripts?
- Use third-party services
- By hand
- Both
Third-party transcription services (paid) $
- 3PlayMedia
- Koemei
- DotSub
- Dragon Speech Engine
- PlyMedia
- Ramp
- SpeakerText
- VoiceBase (freemium)
Third-party transcription services (free) ☮
- Amara (formerly Universal Subtitles)
- CMU Sphinx
- Shout Toolkit
- Something that hasn't been made yet!
Hypervideo?
As most video has an audio track, the techniques we use to control audio apply equally to video.
Demo 5 - US Presidential Debates
Demo 6 - State of the Union Speech
Idea
When we synchronise audio with text, we can manipulate audio in the same way we manipulate text.
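As a rough illustration of the idea: once every word carries timings, selecting a run of words is the same as selecting an audio clip, and an ordered list of such clips is a remix. A minimal sketch, assuming a hypothetical word shape `{ text, start, end }` with times in milliseconds; this is not the Hyperaudio Pad's actual data model.

```javascript
// words: [{ text, start, end }] with times in milliseconds (assumed shape).
// A selection of words from index first to index last (inclusive) maps to
// one audio clip covering the same span.
function clipFromSelection(words, first, last) {
  const picked = words.slice(first, last + 1);
  if (picked.length === 0) {
    throw new RangeError('selection is empty');
  }
  return {
    text: picked.map((w) => w.text).join(' '),
    start: picked[0].start,
    end: picked[picked.length - 1].end,
  };
}

// A remix is an ordered list of clips; its running time is the sum of
// the clip lengths.
function remixDuration(clips) {
  return clips.reduce((total, c) => total + (c.end - c.start), 0);
}
```

Cutting, copying and pasting text then becomes an edit list of `start`/`end` pairs that a player can step through.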
Demo 7 - Hyperaudio Pad
Hyperaudio Pad
- Create audio/video programmes easily.
- Web based intuitive interface.
- Each programme comes with a hypertranscript.
- Remix the remixes.
- Programmes come with source intact.
Nothing is left on the cutting room floor.
Hyperaudio Pad Applications
- Citizen journalism.
- Mainstream journalism when time is an issue.
- Prototype (first cut).
- Casual mash-ups. Art?
Hyperaudio Ecosystem
... and now for something completely different.
We've talked a lot about speech-to-text.
What about text-to-speech?
Demo 8 - Perceptive Media
Dynamically generated audio
We are starting to see the ability to dynamically generate audio content.
- Libs like Speak.js can generate audio on the fly.
- Advanced audio APIs allow 'real-time' effects.
- Standards for future audio are being forged!
The Web Audio API
The Web Audio API is a high-level JavaScript API for processing and synthesizing audio in web applications.
It takes a node-based approach: each node performs an audio function, and nodes are connected together to define the overall audio rendering.
It can be as simple as this ...
or as complex as this ...
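As a minimal sketch of the node-based approach, the function below wires an oscillator node through a gain node to the output. It takes the audio context as a parameter; in a browser that would be `new AudioContext()`. The function name `playTone` and the 0.2 gain value are illustrative choices.

```javascript
// Build a three-node graph: oscillator -> gain -> destination (speakers).
// ctx is expected to be an AudioContext.
function playTone(ctx, frequencyHz, seconds) {
  const osc = ctx.createOscillator();   // source node
  const gain = ctx.createGain();        // volume node
  osc.frequency.value = frequencyHz;
  gain.gain.value = 0.2;                // keep the volume modest
  osc.connect(gain);
  gain.connect(ctx.destination);
  osc.start(ctx.currentTime);           // start now
  osc.stop(ctx.currentTime + seconds);  // stop after the given duration
  return { osc, gain };
}
```

In current browsers an audio context usually has to be created or resumed in response to a user gesture, such as a click, before sound is produced.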
Web Audio API - Three Audio
What's around the corner?
Lots of new media related web technology:
- WebRTC
- Web Speech API
- Media Fragments
WebRTC
A technology to facilitate P2P comms.
WebRTC
- Designed for browser-to-browser comms
- Handles live streaming of audio and video
- Facilitates the streaming of any type of data
Useful for ...
- Voice Calling / Video Chat
- P2P File sharing
- Inter-application comms
Opus Audio Codec
- Mandatory part of the WebRTC standard
- Support for both constant bit-rate (CBR) and variable bit-rate (VBR)
- Audio bandwidth from narrowband to fullband
- Dynamically adjustable bitrate, audio bandwidth, and frame size
- Good loss robustness and packet loss concealment (PLC)
Live Streaming is Possible
- Pausing streams will cause audio to buffer. Reconnect to keep live.
- Streams can break. The browser generally recovers on its own, but detecting the break and reconnecting yourself is faster.
- Only Flash can handle RTMP streams.
- Chrome does not like the ICY response headers (often used by SHOUTcast). AAC will break.
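Reconnecting can be sketched as resetting the media element's source so that buffered audio is dropped and playback rejoins the stream. This is a rough sketch only: the helper names `reconnectLive` and `keepLive` are hypothetical, and which events are worth reacting to is a judgment call that depends on the stream and the browser.

```javascript
// Drop whatever has been buffered and rejoin the stream.
// audio is expected to be an <audio> element.
function reconnectLive(audio, streamUrl) {
  audio.src = streamUrl;   // assigning src restarts resource loading
  audio.load();
  return audio.play();     // returns a Promise in modern browsers
}

// Reconnect whenever the element reports an error or an unexpected end.
// Returns the handler so it can also be called manually, e.g. after a pause.
function keepLive(audio, streamUrl) {
  const retry = () => reconnectLive(audio, streamUrl);
  for (const type of ['error', 'ended']) {
    audio.addEventListener(type, retry);
  }
  return retry;
}
```

A production version would likely add a retry limit or back-off so that a dead stream does not cause a tight reconnect loop.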
Media Fragments
The three main parts are:
- Spatial Dimension - e.g. #xywh=160,120,320,240
- Temporal Dimension - e.g. #t=9,20
- Track - e.g. #track=audio
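To make the three forms concrete, here is a minimal sketch of a parser for them. It is deliberately partial: it handles only plain seconds for `t` and plain pixel values for `xywh`, and does not cover prefixed forms such as `npt:` or `percent:`. The function name `parseMediaFragment` and the shape of the returned object are assumptions for illustration.

```javascript
// Parse the fragment of a media URL into temporal (t), spatial (xywh)
// and track parts. Unknown or malformed pairs are ignored.
function parseMediaFragment(url) {
  const hash = url.includes('#') ? url.slice(url.indexOf('#') + 1) : '';
  const result = {};
  for (const pair of hash.split('&')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue;
    const name = pair.slice(0, eq);
    const value = decodeURIComponent(pair.slice(eq + 1));
    if (name === 't') {
      // "9,20" -> start 9, end 20; "9" -> open end; ",20" -> start 0
      const [start, end] = value.split(',');
      result.t = {
        start: start === '' ? 0 : Number(start),
        end: end === undefined ? null : Number(end),
      };
    } else if (name === 'xywh') {
      const [x, y, w, h] = value.split(',').map(Number);
      result.xywh = { x, y, w, h };
    } else if (name === 'track') {
      result.track = value;
    }
  }
  return result;
}
```

Dimensions can be combined with `&`, so `video.webm#t=9,20&track=audio` yields both a `t` and a `track` entry.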
Web Speech API
- An API for speech based input
- Currently Chrome only
- Server based
Mobile is Getting Better
But watch out for older mobile browsers.
- Cannot autoplay
- May not be able to affect volume
- May not be able to play simultaneous audio
- Will most likely not preload
Web Native Audio & Video - The Stats
- ◼◼◼◼◼◼ Supports
- ◼ Mixed
- ◼ Does Not Support
source: statcounter.com
The Future is Now
The audio element is very well supported. Already we can break audio out of its black box and integrate it into our web experiences in new ways and forms.
Very soon we will have cross-browser support for advanced audio and real-time communications.
source: statcounter.com