By: Peter Houston
You probably can’t tell from my Publishing Executive blogs, but I’m Scottish.
Like most Scots, my heritage is a source of pride and I love it when people, mostly Americans to be fair, tell me they love my accent. For someone who grew up with their mother telling them to ‘speak properly’, that’s a great feeling.
Unfortunately, Amazon’s Alexa appears to share my mother’s disdain for the way I speak. It doesn’t compliment me on my accent, it doesn’t love my brogue, and it doesn’t understand me half the time.
If you want to share my pain, take a few minutes out and watch these two hapless Scotsmen try to operate a voice-activated elevator. Be warned, they get quite angry and a little sweary.
I actually might be one of the worst people in the world to write about the rise of Voice User Interfaces in publishing, or maybe I’m one of the best. My frustrations are a sure-fire guarantee that I won’t be falling for the Star Trek-inspired hype vortex that currently swirls around voice technology.
Make no mistake – recent progress in the sector has been amazing. Between Apple Siri, Microsoft Cortana, Google Assistant, Amazon Alexa and a host of start-up apps, voice interfaces are developing at an astonishing rate. Capabilities and usage are soaring and a looming price war among the leading suppliers of voice-controlled speakers will only speed things along.
Voice tech isn’t new – before Siri, Cortana and Alexa, there was Audrey, a Bell Labs system that could recognize single spoken numbers back in 1952. But the computing muscle needed to process voice commands no longer fills a room; it fits on a phone or a tiny desktop device thanks to Cloud processing.
Amazon’s Alexa has led the field to this point, especially in Europe, where the Google Home has only just launched. From a standing start in 2014, there are now more than 10,000 ‘Skills’ (think voice-controlled apps) available. Amazon doesn’t do sales figures, but it has expressed a desire to sell 10 million Alexa Echo devices in 2017. More broadly, analysis from VoiceLabs estimates that 24.5 million voice-driven devices will be shipped this year, up from 6.5 million in 2016 and just 1.7 million in 2015.
Early Days of Voice Applications
The fact that more and more voice applications are being developed is a mark of consumer interest, but unless you’re in news, don’t worry about missing the boat.
A survey conducted this time last year by Experian and Creative Strategies shows Amazon Echo usage dominated by the commands “Play me a song” and “Set a timer or alarm”. The only area where publishing features is “Read me the news”.
And even in news, publishers are having to curb their enthusiasm. An early entrant, possibly down to Jeff Bezos’ ownership, The Washington Post quickly found that Skills are better for requesting quick-hit information. It gave up on trying to design a voice interface to deliver detailed Olympic results in favor of basic information on medal tables.
It’s telling that the voice command “Tell me a joke” is 6th on the Experian-Creative Strategies popular usage list. That is less indicative of users’ need for a laugh than of the relative simplicity of developing a ‘call and response’ style joke-telling skill.
Microsoft designer Cheryl Platz explains this in a Medium post offering advice to would-be voice-interface designers. “These devices are all specialized for allowing customers to complete tasks using their voice – less of a conversation more of a request.”
Bezos himself has said the Echo’s Alexa assistant is “primarily” about using voice tech to simplify tasks around the house.
How Should Publishers Think About Voice?
The first thing is to think of voice as an experiment. 1950s prototypes aside, these are very early days for voice interface technology.
Publishers on the Alexa platform – most major newspapers plus publications that include The Skimm, Quartz, Time Out, and The Economist – are treating their efforts, rightly, as a test. They are going where their audience are to see if they can engage them there.
Second, echoing The Washington Post’s experience, most seem to be avoiding complicated Skills development for Alexa and opting for simpler “Flash Briefing” content. Like apps, more complex skills need to follow some level of logic to deliver from a range of content options. The Flash Briefing command is simply a trigger for a specific content package, like a news or weather update, usually running at about 90 seconds.
Third, don’t think of voice as a platform – think of it as just the latest content command and control system. From keyboard, to mouse, to touch screen, to voice interface — just one more way for your audience to activate your content.
Fourth, and at the risk of stating the obvious, voice interfaces work best with audio content. Looking ahead, AI might be able to deliver long-form text in an acceptable fashion, but as voice technology currently exists, it’s unlikely to replace reading. If you don’t believe me, ask Alexa to read one of your Kindle books — it’s far from easy listening.
It works extremely well, however, as a way to activate existing audio content (The Guardian’s Longread Podcasts for example) or custom-made audio news or updates.
Fifth, don’t expect any revenue any time soon. As with so many bleeding-edge publishing technologies, income doesn’t appear to be too high on the agenda.
For Amazon, there is a clear link to ecommerce, but surveys show shopping way down the list for actual usage, which suggests that the model has a ways to go. The most likely scenario for publishers is 10-second pre-roll style sponsor messages like those being trialled by VoiceLabs.
The company says customer response to the voice-first ads has been positive, spotlighting a 6-second ad unit for insurance company Progressive that says: “Thank you for using our skill, which is brought to you by Progressive.” VoiceLabs CEO Adam Marchick told Forbes the company has tested the ad unit extensively, and there’s ‘no downside for brands’.
But Google Home, likely to favor an ad-driven model, has had a slightly less positive experience with early voice ad efforts. It had to pull a Beauty and the Beast ad, with Engadget reporting negative customer response, writing, “Users didn’t pay $130 to get audio ads.”
Beyond unwanted ads, other worries may stand in the way of universal acceptance of voice tech – the biggest may be privacy.
In January, technology strategist Shelly Palmer wrote in Ad Age about consumers’ ‘Willing Suspension of our privacy’. Basically, he wonders if the convenience of the voice interface will be enough for consumers to allow an always-on listening device into their homes. Amazon has reassured customers that it doesn’t store the ‘pre-processing’ audio Alexa is constantly recording, but what if microphones were hacked?
There are also concerns that voice output amplifies the fake news phenomenon.
“Ask a question – Get an answer” seems fine unless the question is “Are Republicans fascists?” and Google Home’s response is “Republicans equal Nazis.” Whatever you think of the GOP, that answer, which Danny Sullivan of Search Engine Land shared on Twitter, is not helpful to considered political debate.
We know all about the bad information on the internet, but the lack of context is a problem when the voice that reads you the news, tells you the weather, or manages your shopping list is also a direct conduit to every batshit conspiracy theory online.
The growth in device sales and Skills or voice app development suggests that consumers do see a value, whether that’s an interface that works from the other side of the room or one that responds instantly to whatever pops into your head.
Ultimately, the success or failure of voice interfaces will depend on their ability to deliver. Can they make domestic chores easier? Can they discover and deliver quality content effectively? Can they learn to deal with my Scottish accent?