At BBC newsHACK VIII last week, I was part of a WSJ team that put together Autocast, a small hands-free app that reads out the latest news. The idea was that those commuting by car could press play and have a stream of the latest articles read out to them, potentially ordered by their personal interests.
The HTML5 SpeechSynthesis API is a relatively new standard which does what it says on the tin: it allows text to be converted to (robotic) speech in-browser. It’s part of the Speech API, an open standard which also includes a SpeechRecognition API [2].
The API is, in theory, pretty straightforward: create a new instance of SpeechSynthesisUtterance and then read it out with speechSynthesis.speak():
var msg = new SpeechSynthesisUtterance( "Hello I am browser" );
window.speechSynthesis.speak( msg );
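For anything beyond a one-liner, it helps to wrap that call in a small function. What follows is a minimal sketch, not Autocast's actual code: the name speakText and its options object are made up for illustration, while lang, rate and onend are standard properties of SpeechSynthesisUtterance. The synth and Utterance parameters default to the browser globals and exist only so the function can be tried with stand-ins outside a browser.

```javascript
// Sketch: speak one piece of text, with feature detection and a completion callback.
// `speakText` and its `options` shape are hypothetical names for this example.
function speakText(text, options = {}, synth = globalThis.speechSynthesis,
                   Utterance = globalThis.SpeechSynthesisUtterance) {
  if (!synth || !Utterance) {
    return null; // speech synthesis is not available in this environment
  }
  const utterance = new Utterance(text);
  if (options.lang) utterance.lang = options.lang;    // e.g. "en-GB"
  if (options.rate) utterance.rate = options.rate;    // 1 is the default speed
  if (options.onEnd) utterance.onend = options.onEnd; // called when speech finishes
  synth.speak(utterance);
  return utterance;
}
```

In a browser, speakText( "Hello I am browser", { lang: "en-GB" } ) should behave like the two-line version above, with the difference that it returns null instead of throwing when the API is missing.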
Unfortunately, even in the process of putting together our simple app at newsHACK I encountered a host of annoyances:
Then there are inexplicable cross-browser bugs:
speechSynthesis.cancel() should clear any currently-playing or queued speech. However, on Safari v8.0.6 (OS X) it sometimes flat-out doesn’t work. (For example, try skipping forward and then pausing in the Autocast demo.)
speechSynthesis.speak() causes the new speech instance to be skipped. In Autocast, it means skipping to the next item skips through almost everything in the queue (~100 articles).
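One way to rely less on the browser's own queue is to keep the list of articles in the app and hand the browser a single utterance at a time. The sketch below illustrates that idea only. It is not what Autocast does, createPlayer and its methods are hypothetical names, and I have not tested whether it avoids either bug described above. The generation counter is there so that an end event from an utterance cancelled by skip() cannot advance the queue a second time.

```javascript
// Sketch: app-managed queue, with at most one utterance handed to the browser at a time.
// `createPlayer`, `play`, `skip` and `currentIndex` are hypothetical names for this example.
function createPlayer(texts, synth = globalThis.speechSynthesis,
                      Utterance = globalThis.SpeechSynthesisUtterance) {
  let index = -1;      // position of the item currently being read
  let generation = 0;  // bumped on every skip so stale end events can be ignored

  function speakCurrent() {
    if (index >= texts.length) return; // nothing left to read
    const myGeneration = generation;
    const utterance = new Utterance(texts[index]);
    utterance.onend = () => {
      // Ignore end events from utterances that skip() has already abandoned.
      if (myGeneration !== generation) return;
      index += 1;
      speakCurrent();
    };
    synth.speak(utterance);
  }

  return {
    play() { index = 0; speakCurrent(); },
    skip() {
      generation += 1; // invalidate the current utterance's onend handler
      synth.cancel();  // the browser queue holds at most this one utterance
      index += 1;
      speakCurrent();
    },
    currentIndex() { return index; },
  };
}
```

Because the app decides what is spoken next, a misbehaving cancel() has at most one queued utterance to get wrong, rather than ~100.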
Combined, these problems are a nightmare that cost me hours of valuable time during the hackathon and could easily scupper a production-level app. The SpeechSynthesis API is a neat idea, but until these issues are addressed it isn’t much more than that.
[1] Factiva is a Dow Jones product that provides access to articles from thousands of different publications. I’m not sure if the API is publicly available.
[2] A while back, I used the SpeechRecognition API in Kanji Voice Quiz. It worked OK with recognising set phrases – so could work well for voice commands – but I certainly wouldn’t attempt to use it for transcribing arbitrary speech.
Posted 07 Jun 2015