Heresy? Hear me out.
Many believe that speech is the most natural way to interact with the world.
We’re born, we learn to talk, boom.
And with the advent and eager adoption of speech-enabled home devices, marketing buzzwords like ‘speech’, ‘artificial intelligence (AI)’, and ‘conversational interface’ have taken on the momentum of a charging bull.
Speech Recognition Technology
Don’t get me wrong, I love speech recognition technology. I have been involved in the design and development of speech-enabled systems for over a decade, following a brief stint on the R&D side of things, where I worked on optimizing acoustic models for various languages and environments.
But speech recognition, in and of itself, is merely a technology, a tool to employ when it makes sense to help an end user get something done.
You want to play some music while you’re cooking dinner?
“Alexa, put on some Nickelback.”
“Sorry, I can’t expose your impressionable ears to slick, commercially-minded post-grunge.”
Oh, well that didn’t work quite how I’d expected…
What about customer care? Many tasks are simplified tremendously when speech recognition is deployed intelligently:
System: “How can I help you today?”
Caller: “A beaver bit through our cable and now I can’t get online.” #WeTheNorth
System: “Let me schedule a technician to check on that cable for you. Shall I also contact animal control?”
Cooking Dinner and Customer Experience
But let’s go back to that cooking dinner scenario for a moment. Add to the background noise your 5-year-old singing ‘Let It Go!’ at the top of his lungs while your 3-year-old figures out that a metal spoon banging against a pot is waaayyy more entertaining than a wooden spoon, and that interaction falls apart completely. You’ll end up firing up Spotify on your computer instead, getting goose fat drippings all over the keyboard.
And that customer care experience over the phone?
Try being understood using an automated system on your cellphone while walking outdoors through a wind tunnel. Granted, it wouldn’t be much easier talking to a real person in that scenario, but that’s actually the point: Speech can suck.
Speech Can Suck
In these, and countless other scenarios, speech is not the most natural way to get something done. In fact, it’s possibly the WORST way, and will only lead to frustration. People know this intuitively, of course, and will avoid making a customer service phone call while their husband is vacuuming, or trying to use their voice-enabled personal assistant while on the subway during rush hour.
Other factors also make speech clunky to use, such as:
- User expectations of the technology (poisoned by decades of poorly-designed phone systems)
- Lack of discoverability of features. What can I do? Or more importantly, what CAN’T I do?
- Temporal nature of using speech: what you say ‘disappears’ as soon as you say it. Same with what the system says. Maintaining context is challenging.
- Systems are rarely ‘natural’. You can’t ‘say anything’, and what you do say needs to be planned, something the human brain is poor at doing in real time.
- Lack of non-speech communication cues: facial expression, gesture, and any visual feedback are all typically not part of speech-enabled systems.
5 Ways to Fail
I'll bet you know of a natural language IVR project that didn't go well.
Maybe someone ignored the fact that no out-of-the-box solution is ‘good enough’ for your enterprise needs.
Maybe someone ignored one of the 5 points listed above.
Then what happened was not pretty:
Costs skyrocketed. Schedules dragged out.
Reprimands ensued. Bonuses evaporated.
Do you know how to avoid these mistakes so the same doesn't happen to you?
In future posts I'm going to dig deeper into each of the 5 points and show you how natural language understanding technology works. And I'll teach you how to avoid these traps. Stay tuned.