Custom wake words
Feb 09, 2021

5 Keys to Better Wake Word Design

A wake word is not just a voice trigger. It's the entrance to your brand and your business. Your wake word should be responsive without barging in uninvited, enticing your customers to come in and get their needs met quickly, accurately, and without frustration. For companies adopting a voice assistant, the wake word is often the first step toward customization. It can also be the tip-off that developing a customized voice assistant that meets customer needs and demands isn't as easy as it seems.


If you're contemplating adding a custom wake word to your product, there are many factors to consider, beginning with selecting the actual wake word or wake phrase that will represent your company. More elements go into this decision than one might think. For example, your wake word should be unique and represent your brand, yet not rhyme with many other words.

It’s also good to think about the number of syllables that your wake phrase will have. An industry standard is to have a minimum of three syllables. This will ensure that you don’t rhyme or overlap with too many other words in your target language and risk false positives—causing the device to respond uninvited.
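A rough syllable count can be sanity-checked programmatically by counting groups of consecutive vowels. This is a heuristic sketch, not real phonetic analysis; the function name and vowel rule are illustrative:

```python
def estimate_syllables(phrase: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels in each word."""
    vowels = set("aeiouy")
    total = 0
    for word in phrase.lower().split():
        groups = 0
        prev_was_vowel = False
        for ch in word:
            is_vowel = ch in vowels
            if is_vowel and not prev_was_vowel:
                groups += 1  # a new vowel run starts a new syllable
            prev_was_vowel = is_vowel
        total += max(groups, 1)  # every word has at least one syllable
    return total

# A candidate phrase meets the three-syllable guideline if:
# estimate_syllables(candidate) >= 3
```

A heuristic like this is only a first filter; native-speaker review and pronunciation testing should still decide the final choice.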

When developing a custom wake word for our partners, we follow a series of steps and best practices to ensure the best customer experience possible. Here are five tips to get you thinking about your custom wake word implementation.

1. Collecting accurate training data

After you have chosen a wake phrase that is both unique and doesn’t sound like other words in the lexicon of your users, it’s time to begin collecting data to train the actual model. 

When selecting training data, be sure to include a large variety of voice samples from speakers with a wide range of pronunciations. This is where customer knowledge and a clear vision of the product roadmap are critical. Your wake word must respond to the way your users speak, including understanding accents and vocal differences by gender, age, and geographical region.


To avoid inherent biases, you’ll want to balance male and female data, and collect voice samples from a wide range of age groups, regional accents, and cultural demographics to use as your training data. There are also options for artificially generating training data, which may help increase the volume or rate at which you can develop a model.
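One simple way to reduce demographic imbalance is to downsample every group to the size of the smallest. A minimal sketch, assuming each voice sample carries a metadata dict (the `gender` field and file names here are hypothetical, and real pipelines would balance across several dimensions at once):

```python
import random
from collections import defaultdict

def balance_by(samples, key, seed=0):
    """Downsample so every group under `key` contributes equally."""
    groups = defaultdict(list)
    for s in samples:
        groups[s[key]].append(s)
    n = min(len(g) for g in groups.values())  # size of the smallest group
    rng = random.Random(seed)
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n))  # sample without replacement
    return balanced

# Hypothetical metadata: 120 female clips, 80 male clips
samples = (
    [{"gender": "female", "path": f"f{i}.wav"} for i in range(120)]
    + [{"gender": "male", "path": f"m{i}.wav"} for i in range(80)]
)
balanced = balance_by(samples, "gender")  # 80 clips per group
```

Downsampling discards data; when the smallest group is tiny, collecting more samples for it (or weighting during training) is usually the better fix.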

2. Designing wake words for the user’s environment

In some cases, you might want to collect the audio samples on the device itself. This is especially important if different DSP algorithms or noise cancellation software are embedded in the voice assistant. The environment, and any sounds that will enter the audio pathway, are critical data points for the ultimate effectiveness of the wake word. Introducing those artifacts into the overall training data will ensure they don't degrade the quality of your voice experience.

For example, if your voice assistant will be resident in a car, the ambient noise it must filter will be very different from the sounds that a device in the home or in the workplace will encounter. When developing training data, try to mimic your user’s environment as much as possible. One of the best ways to ensure your wake word and your voice assistant will operate accurately is to collect audio from the actual user environment. 


Environmental considerations in-car should include the type of car, road noise audible in the car, changes in noise with increased or decreased speed, and with windows up or down. In the home, you might want to think about the size of the room, how different furniture might absorb or reflect sound, and various layouts and distances from the user to the device.
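When field recordings of background noise are available, they can be mixed into clean wake-word clips at a controlled signal-to-noise ratio to simulate the target environment. A minimal NumPy sketch; the function name and SNR convention are assumptions, not a specific toolkit's API:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix background noise into a clean clip at a target SNR in dB."""
    # Loop the noise recording and trim it to the speech length
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale noise so speech_power / scaled_noise_power == 10 ** (snr_db / 10)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Example: simulate a noisier in-car clip at 5 dB SNR
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0, 8 * np.pi, 1600))   # stand-in for a wake-word clip
road_noise = rng.normal(size=400)                 # stand-in for recorded noise
augmented = mix_at_snr(clean, road_noise, 5.0)
```

Sweeping a range of SNRs (for example, 0 to 20 dB) per clip yields a training set that covers both quiet and harsh conditions.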

3. Determining the machine learning architecture

The next step—after you've collected training data for your unique wake phrase—is to decide which type of architecture you'll use to train the model. There are multiple options to choose from, including a fully connected neural network or a convolutional neural network. Depending on which you choose, you may be facing certain trade-offs.

One thing to consider is the number of parameters or features that you’ll use. The more you have, the longer the training cycle. In some cases, you might be opening yourself up to problems of overfitting the model—resulting in a less responsive and accurate wake word.

The goal is to achieve the highest degree of accuracy possible within the time you have for training the model. When determining the machine learning architecture, the size of the model and the number of parameters and features you choose to include will most likely be based on the amount of time you have for training.
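To see why the choice of architecture affects model size and training time, compare the parameter counts of a fully connected layer and a small convolutional layer over the same input. The feature dimensions below are hypothetical:

```python
def dense_params(in_dim: int, out_dim: int) -> int:
    """Weights plus biases for a fully connected layer."""
    return in_dim * out_dim + out_dim

def conv2d_params(in_ch: int, out_ch: int, kh: int, kw: int) -> int:
    """Weights plus biases for a 2-D convolution layer (kernels are shared
    across the whole feature map, so size is independent of input size)."""
    return in_ch * out_ch * kh * kw + out_ch

# Hypothetical input: 40 mel bins x 100 frames of audio features
dense = dense_params(40 * 100, 128)   # 512,128 parameters for one dense layer
conv = conv2d_params(1, 32, 3, 3)     # 320 parameters for one conv layer
```

The convolutional layer's weight sharing is one reason convolutional architectures are popular for small-footprint wake word models that must run continuously on-device.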


4. Evaluating the voice model’s performance 

At the performance evaluation stage, there are two numbers you'll want to look at: the false reject rate of the model and the false acceptance rate of the model. A false reject, or false negative, occurs when the model fails to respond to the wake phrase. A false accept, or false positive, occurs when the model responds to words other than the wake word.

When assessing the model’s performance, you’ll want to make sure that you have a control set of data that was not included in the training portion of the model. Keeping a set of test data ensures you can assess the wake word without placing a bias on the result or clouding the results of the false reject rate of the model. 

To measure the voice model’s performance, run the control data against the model and record the rate of false positives, false negatives, and accurate responses that result from the total number of samples in the control group. For the false acceptance rate, you can take any amount of data that is not the wake phrase, run it against the model, and see how often it wakes up. 
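Given per-clip detection scores from the control set, both rates reduce to simple counts against a trigger threshold. A minimal sketch with made-up scores (real evaluations use thousands of clips and hours of negative audio):

```python
def false_reject_rate(wake_scores, threshold):
    """Fraction of true wake-word clips that fail to trigger the model."""
    return sum(s < threshold for s in wake_scores) / len(wake_scores)

def false_accept_rate(other_scores, threshold):
    """Fraction of non-wake-word clips that incorrectly trigger the model."""
    return sum(s >= threshold for s in other_scores) / len(other_scores)

# Hypothetical detection scores from a held-out control set
wake_scores = [0.92, 0.85, 0.40, 0.97, 0.88]   # clips containing the wake phrase
other_scores = [0.10, 0.55, 0.05, 0.20]        # clips without the wake phrase

frr = false_reject_rate(wake_scores, 0.5)   # 1 of 5 missed -> 0.2
far = false_accept_rate(other_scores, 0.5)  # 1 of 4 triggered -> 0.25
```

In practice the false acceptance rate is often reported per hour of negative audio rather than per clip, since devices listen continuously.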


Augment these tests by adding background noises, background speech, and any effects that may be present in the user's environment. Examples of background noise include music, radio, or human speech. Observe and record the performance of your wake word or wake phrase in a non-studio environment to understand actual user experience and accuracy rates.

5. Improving the wake word model

Before you set out to improve the wake word model, you’ll need to make some decisions about how tolerant you want to be about false rejects and false accepts. Depending on the use case, the user experience could be worse if your wake word is too hard to trigger or annoying if it triggers too easily. In either case, you’ll have to accept the trade-offs between these two outcomes.

A Detection Error Tradeoff (DET) curve is a popular way to visualize the trade-off between false rejects and false accepts as you vary the detection threshold. DET curves are also helpful when comparing models against one another.
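A DET curve can be traced by sweeping the trigger threshold and recording the (false accept, false reject) pair at each operating point. A minimal sketch with hypothetical scores:

```python
def det_points(wake_scores, other_scores, thresholds):
    """(FAR, FRR) operating point at each threshold, for plotting a DET curve."""
    points = []
    for t in thresholds:
        frr = sum(s < t for s in wake_scores) / len(wake_scores)
        far = sum(s >= t for s in other_scores) / len(other_scores)
        points.append((far, frr))
    return points

# Hypothetical detection scores
wake = [0.9, 0.8, 0.7, 0.3]    # clips containing the wake phrase
other = [0.6, 0.2, 0.1, 0.05]  # clips without it

curve = det_points(wake, other, [0.0, 0.25, 0.5, 0.75, 1.0])
# Raising the threshold pushes false accepts down and false rejects up
```

Picking the deployed threshold then amounts to choosing the point on this curve that best matches your tolerance for each kind of error.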


Once you have a first model, you’ll want to improve it. Start by determining exactly what it is you’re trying to improve. Then, add more data, subtract data, or change the features that you’re using to match those goals. For example, if your first model had a high false reject rate, it could indicate that you didn’t have enough variance or variability in your speaker set and that you need to add additional data.

One of the greatest benefits of a branded wake word is in the customization of the experience for users. Homing in on the exact set of speakers who make up your user group, tailoring the experience to match the hardware, service, or app where it will be used, and analyzing and responding to the environment are the elements that make a custom wake word valuable to the user and to the brand.

For many internal teams, collecting all the data from different environments, voice types and accents, and training the models can be significant hurdles to developing and deploying a custom, branded wake word. 

At SoundHound, Inc. we’ve had 15 years to perfect our training data and processes. Our teams are experienced at compiling the right data to create custom wake words that result in exceptional customer experiences and positive user feedback. 

Explore Houndify’s independent voice AI platform or register for a free account. Want to learn more? Talk to us about how we can help you bring your voice strategy to life.

Kayla Ragulski

Kayla Ragulski is a technical program manager who loves helping her coworkers succeed as much as she loves a craft beverage after a long run.
