Emotionally Intelligent Voice Assistants
Apr 14, 2021

Is Emotional Intelligence the Next Step for Voice AI?

Advanced speech recognition, including the ability to understand and accurately respond to complex and compound queries spoken in multiple languages and in imprecise, natural speech, is in many cases a solved problem.

For a few years now, advanced voice AI technologies have made good on the promise of branded voice assistants, bringing conversational AI to leading brands and their customers.

Through these customized voice assistants, people can interact with the products and services around them simply by speaking. When employed in mobile apps or for voice ordering and voice shopping, voice AI reduces friction and delivers easier, more convenient, hands-free interactions.

Although the movie “Her” imagined a future in which voice assistants take on human attributes that make them indistinguishable from people, that level of voice experience hasn’t yet made its way into today’s products and mobile apps.

That said, voice AI technology developers are already focusing on creating more human-like voice assistants. Emotion detection is currently helping voice assistants deliver more personalized experiences and providing customer sentiment analytics for call centers.

Some companies, like OTO, have developed voice analytics models that can detect speaker attributes, such as voice gender and language, and identify the speaker’s emotion and energy level. Call centers are using these insights as a new way to measure sentiment and customer satisfaction.

Emotion detection in speech technology is only the beginning of an effort to make our voice experiences more human-like, personal, and branded. The next natural step in this process is the awakening of emotional intelligence in voice AI systems, so they can create emotionally appropriate responses based on the emotions detected in the user’s voice.

Emotion Design

User experience designers have always focused on making things easier and more pleasurable to access and use. Until now, voice assistants have followed the same design principles established for visual interfaces. To advance human-to-machine interaction, emotion design must become part of voice user interface development.

With the challenge of accurate speech recognition behind us, designers can look beyond ease of use, speed, and accuracy. Through purposeful emotion design, voice assistants have the potential to become trusted companions and advisors that successfully navigate the full range of human emotions and motivations.

In 2003, Donald A. Norman published “Emotional Design,” a book that highlighted research on how attractiveness affects people’s perceptions of how well things work. Through a series of observations, he concluded that “Attractive things work better than ugly ones.” That conclusion was based on an experiment carried out in Japan in which researchers tested different ATMs on users. Although all the machines had identical functions and buttons, some were more visually pleasing, with better layouts and screens.

Users consistently found that the more attractive machines worked better than the unattractive ones. Norman explains the results by saying:

“Attractive things make people feel good, which in turn makes them think more creatively. How does that make something easier to use? Simple, by making it easier for people to find solutions to the problems they encounter.”

Donald A. Norman

The ATM experiment results also imply that when something is attractive, the person interacting with it feels more relaxed and is therefore better able to solve problems. The same design concepts can be applied to a voice user interface. If the voice assistant speaks in a tone, cadence, and volume that respond to emotional cues in the user’s voice, driving can become safer, contact center calls can help build positive customer relationships, and difficult situations can become more manageable.

TTS: It’s not what you say, it’s how you say it

Text-to-speech (TTS) systems can already be trained to sound more or less excited. A video game character’s TTS voice might be aggressive or angry, while other characters sound happy, depressed, or tired. Simply raising or lowering both the volume and the rate of speech lets current TTS technology communicate some basic emotions. The combination of emotion detection technology and the ability to adjust the voice assistant’s tone and cadence is leading us toward a world of more emotionally intelligent voice assistants.
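
As a rough illustration of the volume-and-rate idea, the sketch below turns a detected emotion label into SSML (the W3C Speech Synthesis Markup Language, which many TTS engines accept), using the standard `<prosody>` element’s `rate`, `volume`, and `pitch` attributes. The emotion labels, the `to_ssml` helper, and the specific attribute values are hypothetical; a real system would tune them per voice and per engine, and SSML support varies between engines.

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from a detected emotion label to SSML <prosody> settings.
# The labels and values are illustrative starting points, not tuned parameters.
PROSODY_BY_EMOTION = {
    "happiness": {"rate": "fast", "volume": "loud", "pitch": "high"},
    "anger": {"rate": "fast", "volume": "x-loud", "pitch": "low"},
    "sadness": {"rate": "slow", "volume": "soft", "pitch": "low"},
    "tiredness": {"rate": "x-slow", "volume": "x-soft", "pitch": "low"},
}
NEUTRAL_PROSODY = {"rate": "medium", "volume": "medium", "pitch": "medium"}

SPEAK_OPEN = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="en-US">'
)


def to_ssml(text: str, emotion: str) -> str:
    """Wrap text in an SSML document whose prosody reflects the given emotion.

    Unknown emotion labels fall back to neutral prosody.
    """
    attrs = PROSODY_BY_EMOTION.get(emotion, NEUTRAL_PROSODY)
    attr_str = " ".join(f'{name}="{value}"' for name, value in attrs.items())
    # escape() turns &, <, and > into entities so user text cannot break the XML.
    return f"{SPEAK_OPEN}<prosody {attr_str}>{escape(text)}</prosody></speak>"


print(to_ssml("Your order is on its way!", "happiness"))
```

Keeping the mapping in a plain table rather than scattering it through the code lets voice designers adjust the assistant’s emotional palette without touching the synthesis call.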

Training TTS to mimic human emotions may begin with the six basic human emotions: happiness, sadness, fear, anger, surprise, and disgust. Still, our voice assistants won’t be fully emotionally intelligent until they can mimic all 27 distinct categories of emotion, which influence everything we do, including decision making.

According to Antonio Damasio, author of “Descartes’ Error,” humans cannot separate their feelings from their thoughts. His research has shown that emotion is so intertwined with reason that people cannot make a decision without both.

In the future, we may interact with voice assistants in every aspect of our lives: as interfaces to machines, mobile apps, and services, and as companions. In this voice-enabled world, we will likely expect those voice entities to adhere to societal norms by combining logic with emotion, as we do.

As emotionally intelligent entities, voice assistants will be able to recognize emotions and react appropriately, providing the sort of guidance and advice you might get from another person. In some situations, this level of interaction could be as dramatic as a life-saving intervention or as simple as the voice cues needed to help defuse an emotional situation.

Emotion detection in the car

Imagine a driver who is frustrated by traffic, driving in an unfamiliar area, and asking for directions to avoid further delays. The voice assistant could detect frustration and confusion in the tone and tenor of the driver’s voice and respond by giving further directions in a slow, calm, measured voice.

When the in-car voice assistant detects a potentially dangerous driving situation based on the driver’s state of mind, it can make proactive suggestions. Lowering the cabin temperature, turning on the driver’s seat massage, dimming the dashboard lights, or playing soothing music may help lower the driver’s stress level.
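
The in-car scenario can be sketched as a simple rule: if the emotion detector reports a stressed state with enough confidence, switch to a calm speaking style and offer the calming actions listed above. This is a minimal sketch under stated assumptions: the emotion labels, the `plan_response` function, and the 0.7 confidence threshold are hypothetical, and a real system would use whatever labels and scores its own emotion model produces.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantResponse:
    speaking_style: str  # e.g. "calm" (slow, soft delivery) or "neutral"
    suggestions: list[str] = field(default_factory=list)


# Hypothetical labels that a speech emotion model might report for the driver.
STRESSED_STATES = {"frustrated", "confused", "angry", "anxious"}

# Proactive cabin actions taken from the scenario described in the text.
CALMING_ACTIONS = [
    "lower the cabin temperature",
    "turn on the driver's seat massage",
    "dim the dashboard lights",
    "play soothing music",
]


def plan_response(emotion: str, confidence: float, threshold: float = 0.7) -> AssistantResponse:
    """Choose a speaking style and proactive suggestions from the detected emotion."""
    if emotion in STRESSED_STATES and confidence >= threshold:
        # Copy the list so callers cannot mutate the shared module-level actions.
        return AssistantResponse("calm", list(CALMING_ACTIONS))
    return AssistantResponse("neutral")


print(plan_response("frustrated", 0.85))
```

The confidence threshold is a deliberate design choice: it keeps a low-confidence reading from triggering interventions that could themselves distract the driver.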

In some cases, driving can simply be a boring, lonely experience. Today, a voice assistant that acts as a co-pilot, providing navigation, destination information, and in-car controls, makes driving more convenient and safer through hands-free control.

If that same voice assistant has emotional intelligence, it can act as the front-seat passenger, providing companionship and alleviating boredom. Drivers on long road trips would then have a human-like companion to talk to while they drive, joking around and sharing information. Carrying on a conversation with another entity adds a level of interactivity to in-car entertainment that could dramatically decrease fatigue and daydreaming and make driving alone safer.

Empathetic call center agents

Frustration, anxiety, and stress are common emotions among people calling customer contact centers. Callers are often trying to solve a problem or get information to help inform a buying decision.

Reaching a voice assistant spares these customers the added frustration of pushing a series of buttons and repeating information at various stages of the call. Efficiency and fast, accurate information create a better user experience and, ultimately, greater customer satisfaction, but they don’t build relationships.

The best call center agents can perceive the customer’s emotional state, listen patiently to their concerns, and respond with empathy and efficiency. Even among human contact center agents, a lack of empathy for the caller can escalate the issue instead of resolving it.

Enter an emotionally intelligent voice assistant that can not only recognize how the caller is feeling but also react appropriately. A pleasing voice that adapts its tone to the caller’s emotional state, combined with rapid, accurate information, can make a contact center more efficient by reducing the number of calls escalated up the line. After these interactions, customers are likely to feel more satisfied and less frustrated the next time they call for support.

Currently, emotion detection is used to inform action after the call is complete. Callers who end the call frustrated are flagged so that an agent can follow up. Imagine the possibilities if that caller never hangs up in frustration.

If the voice assistant can detect the caller’s emotions and respond in a way that ends in customer satisfaction on the first call, the result would be greater efficiency and higher customer sentiment.
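
Moving from post-call flagging to in-call response could look something like the sketch below. It assumes a hypothetical emotion model that scores each caller turn for frustration between 0 and 1, smooths those scores with an exponential moving average, and reports the first turn where the smoothed value reaches a threshold: the moment the assistant might soften its tone or offer a human agent. The smoothing method, the `first_intervention_turn` helper, and the 0.6 threshold are illustrative choices, not a description of any shipping contact center product.

```python
from typing import List, Optional


def smoothed_frustration(turn_scores: List[float], alpha: float = 0.5) -> List[float]:
    """Exponential moving average of per-turn frustration scores in [0, 1].

    A higher alpha reacts faster to the latest turn; a lower alpha smooths more.
    """
    smoothed = []
    ema = None
    for score in turn_scores:
        ema = score if ema is None else alpha * score + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed


def first_intervention_turn(
    turn_scores: List[float], threshold: float = 0.6, alpha: float = 0.5
) -> Optional[int]:
    """Index of the first turn where smoothed frustration reaches the threshold.

    Returns None if the caller never crosses it during the call.
    """
    for turn, value in enumerate(smoothed_frustration(turn_scores, alpha)):
        if value >= threshold:
            return turn
    return None


# Hypothetical per-turn frustration scores for one call.
print(first_intervention_turn([0.1, 0.2, 0.9, 0.95]))
```

Smoothing means a single sharp turn (scores of 0.1, 0.9, 0.1, for example) does not trigger an intervention on its own, while sustained frustration does.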

Applying emotion design to voice assistants

The research Donald Norman described concluded that aesthetics matter: people are calmer, less stressed, and better able to solve problems when the machine they are working with is visually appealing. If the same concepts apply to voice assistants, it is reasonable to expect that people would be more likely to interact with a voice assistant, keep engaging, and even engage more if its user interface were pleasing to talk to.

Engaging in small talk, adjusting tone of voice and volume, and using the lexicon of empathy and emotion bring the voice experience closer to human-to-human interaction. If we imagine each voice assistant as an individual with specific personality traits, we can see those entities becoming characters in the brand narrative. To achieve this, the voice AI designers and developers of the future may work with scriptwriters to create dialogue for these voice assistant personalities.

In addition, these unique voice assistants will be closely associated with individual companies and will become a key element of their brand portfolios. The sound, personality, and ability to relate to customers at every stage of the customer journey will draw people into the brand.

The same voice will be heard across channels and will adjust to perform the appropriate functions with an attitude and emotional intelligence that reflects its purpose and the values of the company it represents.

Until now, we have focused on the IQ of voice assistants. In many cases, the problem of understanding complex and compound queries and delivering accurate, rapid responses has been solved. Now, companies are looking to improve the EQ of their voice user interfaces as a way to improve voice experiences, retain customers, and remain competitive.

To assume that people set aside their emotions when making decisions is to fall into the trap of “Descartes’ Error.” In the new world of voice AI, “I think, therefore I am” will soon be replaced with “I think and feel, therefore I am more useful.”

Karen Scates is a storyteller with a passion for helping others through content. Argentine tango, good books and great wine round out Karen’s interests.

Interested in Learning More?

Subscribe today to stay informed and get regular updates from SoundHound Inc.