Intuitive interactions depend on a solid design process grounded in user research and usability testing. Users can say anything to the VUI, so designers must gain a deep understanding of users’ needs to inform their design. A VUI should be able to anticipate what users will say and the contexts that are most important to them. Once the user interface is implemented, usability testing is vital in determining where the VUI may fail at handling users’ requests.
The irony of enabling natural interactions is that the biggest barrier to achieving them is technology itself. Though the temptation to speak to a bot in a series of keywords is great, it’s not how people talk, and it defeats the point of creating a natural language between people and machines. The minute a VUI breaks away from human-style interaction, it’s asking people to change behaviors, and that creates the very friction a VUI is designed to erase.
Everyone speaks just a little differently, so VUI designers have to think of all the different ways a person might ask a question. As Heidi Culbertson, founder and CEO of Marvee, a company dedicated to voice design for older adults, said, “Expect the unexpected, and then expect to be surprised that you didn’t expect something.” Voice-enabled software starts with the end user, so continual research, user testing, and iteration to improve the language model are key.
Think about late-night television show hosts. They’re masters of conversation. They have a few minutes with each guest to ask relevant questions, get the answers the audience wants to hear, and to ask follow-up questions that make sense in context with the guest’s previous responses. Since their time is limited, they have to ensure the questions don’t require more brainwork than the guest or the audience can process in that time. A well-designed VUI should work the same way.
The VUI should have the ability to provide facts that the user might find useful and still stay relevant and sensitive to context. For instance, if the user asks for the weather or news nearly every day, have your VUI suggest those domains more frequently when prompted. Through this design, you can establish a proper relationship between your interface and the user. The VUI understands what the user wants and is able to adapt to those wants, creating a small but shared hub of memory and conversations.
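One way to picture this kind of adaptation is a simple usage counter that surfaces a user's most frequent domains. This is a minimal sketch, not how any product mentioned here works. The class name `DomainSuggester`, the `min_uses` threshold, and the domain labels are all assumptions made for illustration.

```python
from collections import Counter


class DomainSuggester:
    """Counts how often a user asks about each domain (hypothetical helper)."""

    def __init__(self, min_uses: int = 3):
        # Assumed threshold: only suggest domains the user has asked for
        # at least this many times.
        self.min_uses = min_uses
        self.usage = Counter()

    def record(self, domain: str) -> None:
        """Call once each time the user makes a request in this domain."""
        self.usage[domain] += 1

    def suggestions(self, limit: int = 2) -> list:
        """Return up to `limit` of the most-used domains that meet the threshold."""
        return [d for d, n in self.usage.most_common(limit) if n >= self.min_uses]


suggester = DomainSuggester()
for domain in ["weather"] * 5 + ["news"] * 3 + ["sports"]:
    suggester.record(domain)

top_domains = suggester.suggestions()
```

With this sample history, `top_domains` contains weather and news but not sports, which was only requested once. A real system would likely also weigh recency and time of day.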
Mihai Antonescu, a senior engineer at Mercedes-Benz R&D, has been working on MBUX, an in-car voice experience powered by Houndify. It’s his team’s job to make sure the car and driver can get along without distraction and provide the luxury experience inherent to the Mercedes brand. “Our biggest learning is that context matters,” Antonescu said in a 2019 Houndify webinar. “In our case, the customer in the car is probably driving. He has a lot of questions about the car, so we make sure that the VUI can answer all the questions in that context. Then, we expand on new things, and see where the context takes us. When the user says ‘Hey Mercedes,’ the MBUX responds to natural-speech voice queries and commands that range from the weather to nearby restaurants, to features like turning on the air conditioning, remembering the driver’s preferred temperature settings and rolling up the windows.”
How the VUI responds is the make-or-break moment for a user. This is when they learn whether they can trust your product to hold up its end of the deal. Did the voice assistant understand the request and acknowledge that it understood? Can it handle multi-part questions like “Find me Asian restaurants within 10 miles that are not Chinese, have at least 3 stars and are open late” that require access to multiple domains? Did it provide a short, relevant response and the opportunity for follow-up questions?
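To see why a multi-part question is demanding, it helps to write out the structured query the assistant has to build once it has understood the sentence. The sketch below is illustrative only. The `Restaurant` and `RestaurantQuery` types, the sample listings, and the reading of "open late" as "closes at 23:00 or later" are all assumptions, not part of any real assistant's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Restaurant:
    name: str
    cuisine: str          # e.g. "thai", "chinese"
    cuisine_group: str    # e.g. "asian"
    distance_miles: float
    stars: float
    closing_hour: int     # 24-hour clock; 25 means 1 a.m. the next day


@dataclass(frozen=True)
class RestaurantQuery:
    """Every constraint from the spoken request, captured as one object."""
    cuisine_group: str
    excluded_cuisines: frozenset
    max_distance_miles: float
    min_stars: float
    open_until_hour: int

    def matches(self, r: Restaurant) -> bool:
        return (
            r.cuisine_group == self.cuisine_group
            and r.cuisine not in self.excluded_cuisines
            and r.distance_miles <= self.max_distance_miles
            and r.stars >= self.min_stars
            and r.closing_hour >= self.open_until_hour
        )


def search(query: RestaurantQuery, restaurants: list) -> list:
    """Return the names of all restaurants that satisfy every constraint."""
    return [r.name for r in restaurants if query.matches(r)]


# "Asian restaurants within 10 miles that are not Chinese,
#  have at least 3 stars and are open late"
QUERY = RestaurantQuery("asian", frozenset({"chinese"}), 10.0, 3.0, 23)

# Made-up sample listings for illustration only.
SAMPLE = [
    Restaurant("Thai Orchid", "thai", "asian", 4.2, 4.5, 24),
    Restaurant("Golden Wok", "chinese", "asian", 2.0, 4.0, 25),
    Restaurant("Sushi Bar", "japanese", "asian", 12.5, 4.8, 24),
    Restaurant("Pho Corner", "vietnamese", "asian", 6.0, 2.5, 23),
    Restaurant("Seoul Kitchen", "korean", "asian", 8.9, 3.0, 23),
]

results = search(QUERY, SAMPLE)
```

Each clause in the spoken sentence becomes one field on the query, so a single missed clause (for example the "not Chinese" negation) yields a confidently wrong answer.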
At Pandora, which is known for its proprietary algorithms that help the brand personalize its music-streaming service, context was just as important for its voice assistant. “Music is one of those contexts where I don’t have to think,” said Ananya Sharan, product manager for Pandora’s voice mode. “Sometimes I just want Pandora to know what I like. I want to say ‘Play me something awesome,’ and it does. Thinking of the context really helped us to prioritize what we wanted to build and how we wanted to design the interaction.”
Ben Steele of RAIN feels the threshold for natural conversations with our voice assistants is a long way off, even with recent notable tech innovations. “We’re able to design a pretty frictionless experience, but what we’re really looking forward to is for our assistants to interact on our level instead of us on theirs, right? Still, there are plenty of best practices that can reduce friction and improve the perception that you and your voice assistant are in an ongoing contextualized conversation.”
The best way to ensure context and personalization is through data collection. And the best way to collect data is through usage. The more data you collect, the more customer signals you have to help optimize the experience. Whether it’s gradually learning your musical tastes and what you listen to at various times of the day or remembering your driving habits and temperature preferences, machine learning and AI can automate your favorites and deliver them right to you.
A cool feature in the MBUX voice assistant is that you can just say “Hey Mercedes” and mention the state that you’re in, and the system will recommend things for you. Antonescu explains: “I’ll give you a couple of examples: Let’s say that you’re hungry, or feeling tired or cold. You can just say ‘Hey Mercedes, I’m hungry’ and the system will recommend restaurants because you’ll probably want to look for food in the near future, and this is related to the state you just communicated to the assistant.”
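The idea of mapping a stated condition to a recommendation can be sketched as a small lookup. This is not how MBUX is implemented. The function name and table are hypothetical, and only the "hungry" to restaurants pairing comes from the example above. The suggestions for "tired" and "cold" are placeholders invented for illustration.

```python
import re
from typing import Optional

# Hypothetical mapping from a stated condition to a recommendation domain.
# Only "hungry" -> restaurants comes from the article; the rest are assumed.
STATE_TO_DOMAIN = {
    "hungry": "restaurants",
    "tired": "rest_stops",        # assumption
    "cold": "climate_control",    # assumption
}


def recommend_for_state(utterance: str) -> Optional[str]:
    """Return a recommendation domain for the first known state word, if any."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for word in words:
        if word in STATE_TO_DOMAIN:
            return STATE_TO_DOMAIN[word]
    return None


domain = recommend_for_state("Hey Mercedes, I'm hungry")
```

A keyword table like this breaks quickly on phrasing such as "I'm not hungry," which is one reason production systems rely on a trained language model rather than word matching.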
Heidi Culbertson founded a voice design company to help her aging parents navigate the world of technology to improve their independence, and in doing so, became an expert on creating VUIs that are more accessible and cater to different audiences. Her specialty is creating voice experiences for older audiences, but paying attention to users’ needs is important for users of any age. “As you’re designing a VUI, you need to really know who your audience is and start to personalize experiences. It’s matching how they’re comfortable speaking,” Culbertson explains. “The length of phrases, their word choices, their cadence, their rate of speech. VUI design is truly about the listener.”
Antonescu agrees. “Older people (tend to) ask full questions, and then they say ‘please search for this.’ That’s very different from the way younger generations search for something. That was breaking our model, so we had to adjust it so we can handle things like ‘thank you’ and ‘please.’” One mistake voice designers make at the beginning of the process is to assume that they’ve written enough utterances. “You think, ‘How would I ask this?’” Antonescu said. “And then you ask a friend and he’ll ask the same question very differently. So keep asking people about how they would ask for one specific intent, and then constantly try to improve your language model.”
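One possible way to keep courtesy words from derailing intent matching is to normalize utterances before they reach the matcher. The sketch below is an assumption about how such a step might look, not a description of what the Mercedes team did. The phrase list and the `normalize` function are hypothetical.

```python
import re

# Hypothetical list of courtesy phrases to ignore during intent matching.
COURTESY_PHRASES = ("thank you", "thanks", "please")

_COURTESY_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in COURTESY_PHRASES) + r")\b",
    re.IGNORECASE,
)


def normalize(utterance: str) -> str:
    """Strip courtesy phrases and punctuation, lowercase, collapse whitespace."""
    without_courtesy = _COURTESY_RE.sub(" ", utterance)
    without_punctuation = re.sub(r"[^\w\s']", " ", without_courtesy)
    return " ".join(without_punctuation.lower().split())


polite = normalize("Please search for Italian restaurants, thank you.")
terse = normalize("search for Italian restaurants")
```

Here `polite` and `terse` normalize to the same string, so both phrasings can be matched against a single intent. The alternative is to add polite variants to the training utterances, which avoids discarding words that may carry meaning elsewhere.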
The need for personalization based on audience reaches across verticals. Sharan remembers working on Pandora’s language model with expectations that most people would say ‘Hey, Pandora, play me the xxx song.’ When it came to user testing, she was surprised. “People were asking for it by lyrics,” she said. “People were singing lyrics to the voice assistant. There were kids talking and other noise in the background. So we had to adapt and react to all of that. One solution was to clarify with the user what they meant and present them with options. But how much conversation do you engage in so that it doesn’t start to feel like we’re insulting the user?”
“Over time, remove as many barriers to get to the point as reasonable, but be aware that you’re closing doors when you do so. Weigh the value of an efficient experience against the constraints it places on the user, and offer avenues for them to remove or reconsider those constraints,” Steele adds. “In general, be easier than a mobile app, then once you are, wean your users off their screens by facilitating habits around voice.”
Top VUI best practices from Mercedes, Pandora, and Marvee
It goes without saying that voice assistants need to be polite, but people, cultures, and languages have different ideas about the concept of “politeness.” Linguist Julie Belião, Director of Quality at Unbabel, thinks about this question daily for the company, which provides AI-powered multi-lingual customer support and service. She follows nine specific rules that define politeness and civility around the world. “You need to adjust the politeness level and reply to your customers in a way that shows you care about them,” she explains.
Her guidelines include tips like using softer language, rather than being direct. “Being very factual can sometimes sound too direct or give the wrong expectations,” she said. “So you can add vague expressions like about, kind of, sort of, stuff, and things.” Another way to soften the tone (and seem more polite) is to use modal expressions (could, might, should and would, for example). Use positive, affirming words to come across as assuring and energetic. Good choices include words like amazing, awesome, great, good, gladly, definitely, absolutely, and understand, which make a customer feel more comfortable and reassured.
In the next chapter, learn the best practices for user prompting and error recovery. It’s hard to follow a path you can’t ‘see,’ or in this case ‘hear.’ Learn how prompts and error processing can help improve user experience.