Sep 02, 2021

Why Voice Assistants Need to Understand Accents

A voice assistant that responds with low accuracy fails at its primary function: understanding the user. While voice assistants have made strides in recent years at understanding multiple languages, accents have not yet received the industry-wide attention they deserve. According to Translate Day, there are an estimated 160 dialects of the English language in the world. If your voice user interface doesn’t understand accents, a sizable share of the population won’t be able to use your brand’s voice assistant.

Accents affect the very core of the user experience. Being understood is the user’s first touchpoint, from the wake word through to queries. For some accents, such as West Coast versus East Coast American English, the voice assistant may only have difficulty with specific words and will still understand either user at a fair rate of accuracy. People with other accents, such as those whose primary language is Spanish, Chinese, or one of the languages of India, may not even be able to activate their voice assistant, which is the ultimate failure in delivering a good user experience.

Having a voice assistant that understands a diverse variety of accents is essential for reaching customers in different locations and expanding into new markets. A voice assistant that only understands Californian English or East Coast English excludes not only native English speakers with other accents but also speakers whose primary language is not English.

Diverse populations and voice assistants

According to a Center for Immigration Studies analysis of Census Bureau data, 67.3 million residents (20%) in the U.S. speak a language other than English at home. Many of them are likely to bring an accent shaped by that language to their interactions with a voice assistant. Others may decide not to purchase a voice assistant at all if they expect it won’t understand them.

Then there are accents based on different geographic regions for English speakers within the U.S. Babbel identified 14 main groups of accents for American English, including New England, Pacific Northwest, Midwestern, Southern, New Orleans, East Coast, African American Vernacular, and Native American. Each of these accents has its own distinct vocabularies, syntax structures, and phonetic rules. For example, a Boston accent often drops r’s, which could lead the voice assistant to misunderstand common words, such as park, car, card, start, and color, creating user frustration, bad reviews, or failed purchases. 

If a voice assistant can’t complete its primary function of understanding the user, the customer will stop using the technology, maybe forever. The stakes become even higher when the voice assistant is a vital part of interactions with the world, such as a voice-enabled drive-thru at a quick-service restaurant (QSR) or a voice AI call center. A user can’t simply abandon these systems the way they could abandon an in-car assistant or a smart TV. Instead, they have to suffer through not being understood while trying to complete the task, which makes for a frustrating user experience.

Excluding specific portions of the population, even unintentionally through a lack of training data, contributes to discrimination. Companies should carefully consider accents when embarking on their voice-first strategy so they can act on their values of inclusivity and also reach a larger target market.

The challenges of training voice models with accents

Accents are much like another language, complete with their own vocabulary, syntax, and phonetic rules. The complexities of language differences make creating accent-agnostic speech recognition systems nearly as challenging as offering distinct languages. In some situations, creating a completely different model for the accent as if it were another language is the best option to ensure it is trained for all patterns that may arise. 

The data shows that creating voice assistants that understand accents is no minor feat, and even large companies fall behind. According to a study by The Washington Post and Pulse Labs, Google Home and Amazon Alexa showed a 30% accuracy difference between native speakers and speakers with non-native accents. Spanish and Chinese accents were among the most difficult for the voice assistants to understand, while Western U.S. and Southern U.S. accents were the easiest. Another study, by Uswitch, compared 30 different British accents: Cardiff, Wales and Glasgow, Scotland had the lowest accuracy rates, while London had the highest. These studies suggest that which accents a voice assistant understands depends heavily on the training data used.
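For teams that want to check for this kind of gap in their own system, one common approach is to compute word error rate (WER) separately for each accent group and compare the results. The following is a minimal sketch, not the method used in either study cited above. The function names, the accent labels, and the two sample transcripts are all made up for illustration; a real evaluation would use many recordings per accent.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def wer_by_accent(samples):
    """samples: iterable of (accent_label, reference_text, recognized_text)."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for accent, reference, hypothesis in samples:
        e, n = word_errors(reference, hypothesis)
        errors[accent] += e
        words[accent] += n
    return {a: errors[a] / words[a] for a in words if words[a] > 0}


# Hypothetical transcripts: what the user said vs. what the assistant heard.
samples = [
    ("western_us", "park the car near the start", "park the car near the start"),
    ("boston", "park the car near the start", "pack the cah near the start"),
]
rates = wer_by_accent(samples)
for accent, rate in sorted(rates.items()):
    print(f"{accent}: WER = {rate:.2f}")
```

Reporting the rate per accent, rather than one overall number, is the point of the sketch: an average across all users can look acceptable while one group is understood far less often than another.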

When embarking on a voice-first strategy, companies should consider these best practices for training a voice assistant to understand accents: 

  • Have a diverse user base in training data
  • Use conversational data 
  • Start from the beginning 

Many companies fail to introduce enough accents into their training data in a conversational manner. They rely on a Californian accent reading from newspapers like The Wall Street Journal as their primary data corpus, instead of a variety of accents that reflect the different vocabularies and syntaxes users speak in daily conversation.

Instead, companies hoping to extend their product functionality and improve user experiences should introduce a diverse variety of accents into their training model so that it can learn as much as possible about different vocabularies, syntaxes, and phonetic rules. If only one accent is introduced, then the voice assistant won’t be able to deliver a positive user experience for a large portion of your customers.
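One simple way to catch a lopsided corpus early is to count how much of the training data comes from each accent and flag the accents that fall below a chosen share. The sketch below assumes a hypothetical manifest in which every recording carries an "accent" label. The field name, the accent labels, and the 10% threshold are illustrative assumptions, not an industry standard.

```python
from collections import Counter


def accent_shares(manifest):
    """Fraction of recordings per accent label in the manifest."""
    counts = Counter(row["accent"] for row in manifest)
    total = sum(counts.values())
    return {accent: n / total for accent, n in counts.items()}


def underrepresented(manifest, target_accents, min_share=0.05):
    """Target accents whose share of the corpus is below min_share."""
    shares = accent_shares(manifest)
    return sorted(a for a in target_accents if shares.get(a, 0.0) < min_share)


# Hypothetical corpus: 18 Californian recordings, 1 Boston, 1 Southern.
manifest = (
    [{"accent": "californian"}] * 18
    + [{"accent": "boston"}]
    + [{"accent": "southern"}]
)
targets = ["californian", "boston", "southern", "spanish_l1"]
gaps = underrepresented(manifest, targets, min_share=0.10)
print(gaps)
```

The list of target accents should come from the markets the product is meant to serve, so that an accent with no recordings at all, like the hypothetical "spanish_l1" here, is flagged rather than silently missing.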

Accented language is not something that can be added at a later point in the process. Training data and language models are key elements of building a voice assistant that will provide accurate results and exceptional user experiences. Companies need to be proactive about accent understanding and take the important steps at the very beginning of the project. It’s not an easy fix halfway through the process or in the final steps. Often, it’s only after the voice assistant has launched that companies realize how large the gap in accent understanding is.

Quick fixes, such as introducing gathered queries into the training model or adding specific vocabulary words to the dictionary, are only a band-aid that patches a single facet instead of the larger pattern. Companies that take this approach often find they have to go back to the very beginning of the project and retrain their models to understand accents. Instead of getting caught in a voice assistant redo, carefully evaluate your approach up front and introduce a diverse, conversational data set from the start.

The importance of vocabulary for conversational AI

The material the voice actor uses to provide training data matters as much as the accent. Training models will learn and adapt to whatever vocabulary and syntax they are presented with, which means brands should also be aware of the reading material used to create the language corpus. For the best results, the content should be diverse and reflect how users speak casually and informally. 

If the content is from a source such as The Wall Street Journal or another newspaper where the writing is structured, grammatical, and formal, the voice assistant ultimately won’t understand how people speak in a conversational vernacular. It will frustrate users when it encounters slang, contractions, and ungrammatical sentences. Formal content also won’t teach the model vocabulary words that are unique to specific accents. Companies should get data from a wide variety of sources, even Twitter, to teach their training model how users speak in an everyday manner.

Build, buy, or partner for accent-agnostic voice AI

Accents play a significant role in users’ experiences, which is why it’s so important for companies to carefully evaluate whether they want to build, buy, or partner for their voice assistant. Does the team building the assistant, or the company they buy from or partner with, understand the importance of accents and know how to ensure a variety of them are introduced in the training data?

If a company builds its own voice assistant, it has to collect all the training data with an assortment of accents itself, ensuring that the data comes from a wide variety of sources and is diverse and conversational. When buying a voice assistant, the company has to be satisfied with the training data used by a third-party platform developer, with no control over what is used or how it is implemented. When partnering, the company has the flexibility to customize the training data and can carefully select a partner who is knowledgeable and practiced in training models to understand accents.

When embarking on a voice-first strategy, there are many factors to take into account, including custom wake words, sonic branding, the voice assistant’s tone, personality, and gender, and which domains should be used. With 160 dialects in the English language, 14 regional accents in the U.S., and 20% of the population speaking a language other than English at home, accents should absolutely be added to the list of considerations, as they affect such a large portion of the user base. 

Companies looking to build, buy, or partner for their voice assistant need to understand the significance of a voice assistant being able to complete its primary function—to understand the user. Without understanding, the voice assistant is little more than a technology that can only serve a very small portion of the population, and customers will ultimately turn to voice-enabled competitors that do understand them. 

When developing a voice-first strategy, consider your audiences and seek voice AI solutions that have the ability to respond accurately to the accented language of your target market. Before you decide to build, buy, or partner to develop a voice assistant, take into account the time and resources required to train voice assistant models to understand accents. If you’re looking for a voice AI platform partner, check to see what accents have already been developed and which will have to be added to the training data to match those of your users. 

At SoundHound Inc., we have all the tools and expertise needed to create custom voice assistants and a consistent brand voice. Explore Houndify’s independent voice AI platform at Houndify.com and register for a free account. Want to learn more? Talk to us about how we can help bring your voice strategy to life.

Kristen is a content writer with a passion for storytelling and marketing. When she’s not writing, she’s hiking, reading, and spending time with her nieces and nephew.
