How and Why Voice Assistants Are Moving to the Edge
Feb 03, 2022

How Edge Voice Assistants Open Up Possibilities for Device Manufacturers

While cloud-connected voice assistant solutions continue to grow in popularity, manufacturers across industries are only now realizing the possibilities and benefits of voice assistants on the edge. Edge voice assistants can help manufacturers meet growing customer demands and expand product functionality while lowering costs and easing installation requirements, and that potential is significant.

Edge voice assistants, which run embedded voice AI technology without depending on the cloud to function, can deliver greater user privacy and safety. When a query does require the cloud, edge technology can become part of a hybrid solution that combines the benefits of embedded voice AI with the broader domain knowledge available through cloud connectivity.

Recently, SoundHound’s COO, Mike Zagorsek, sat down with Qualcomm’s VP of Product Management, Ziad Asghar, and SoundHound’s CEO and Co-Founder, Keyvan Mohajer, to discuss how and why voice assistants are moving to the edge. During the webinar, they talked about the benefits of edge technology, noisy environments, hybrid connectivity, and more.

Read some of the highlights of the discussion below, or view the entire webinar here.

Q: What do we mean by edge technology? 

Keyvan: So edge technology, or edge computing, means that data is processed at the edge, where it is produced, rather than sent over the network to another location for processing. A good example of edge computing is autonomous driving, where a large amount of data needs to be processed quickly.

It’s not practical to send that data over the network. Voice AI has its own edge use cases too; a prominent one these days is a device that listens for a wake word followed by a query.

So when you say, “Hey Pandora, play the next song,” detecting the wake word is an example of edge computing. Wake word technology needs to listen to and process audio continuously, so it’s not practical to transmit all of that audio to the cloud.

First of all, it’s not cost-effective. Constantly transferring that audio would also raise privacy concerns.
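The wake word pattern Keyvan describes can be sketched as a gate that inspects every audio frame on the device and passes along only the frames that follow a detected wake word; everything else is dropped locally. This is a minimal sketch under stated assumptions, not SoundHound’s implementation: real wake word detection uses a trained model, so `is_wake_word` is a hypothetical caller-supplied predicate, and the frame format and fixed query length (`query_frames`) are illustrative simplifications (production systems typically end a query by detecting silence).

```python
from typing import Callable, Iterable, List

# Hypothetical frame format: one short chunk of audio samples.
Frame = List[float]


def gate_query_audio(
    frames: Iterable[Frame],
    is_wake_word: Callable[[Frame], bool],
    query_frames: int,
) -> List[List[Frame]]:
    """Return the audio captured after each wake word detection.

    Every frame is inspected on the device. Frames heard before a wake
    word are discarded immediately; only the `query_frames` frames that
    follow a detection are kept for further processing (or upload).
    """
    queries: List[List[Frame]] = []
    current: List[Frame] = []
    remaining = 0  # frames still to capture for the active query

    for frame in frames:
        if remaining > 0:
            # Inside a query window: keep this frame.
            current.append(frame)
            remaining -= 1
            if remaining == 0:
                queries.append(current)
                current = []
        elif is_wake_word(frame):
            # Wake word heard: open a window for the query that follows.
            remaining = query_frames
        # Otherwise the frame is simply dropped and never leaves the device.

    if current:
        # Audio stream ended mid-query: keep the partial query.
        queries.append(current)
    return queries
```

For example, with a toy detector that treats the frame `[9.0]` as the wake word, `gate_query_audio([[0.1], [9.0], [1.0], [2.0], [0.0]], lambda f: f == [9.0], 2)` keeps only `[[1.0], [2.0]]`; the frames before the wake word are never retained.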

Q: How does edge technology open up possibilities for companies that may not have considered a voice AI solution in the past?

Keyvan: I can highlight five possibilities or advantages.

  1. Reducing costs
  2. The product can work out of the box
  3. User privacy
  4. The device can be always listening, even without a wake word
  5. The AI models in each device can learn and adapt to their environment individually 

Q: Given Qualcomm’s focus on device technology, can you give us a quick overview of what edge means for you?

Ziad: Absolutely. You have to process data where it is created. It is created on the device. It is created on the edge. If you look at a smartphone, it has all of that information about what the user is saying.

Then, you have the queries that are being captured by the microphone on the device. It makes sense to keep that data there for various reasons, and we have built a number of technologies into our products to support that.

With edge, the device can listen at all times with extremely low power consumption. You can even place the device farther away from you, and it can still pick up and process what you say. What we have done is create a capability on Snapdragon that lets you do this processing at less than one milliampere of current.

We’ve added more and more AI processing power and capability over time to be able to do much better noise suppression, echo cancellation, and natural language processing. 

Edge also means that you can have the same experience on a smartphone or smartwatch as you do in your car. You want that same seamless experience as you move across devices. Edge is all of those things coming together.

Q: How do edge voice assistants help with overcoming accuracy challenges, like in noisy environments?

Ziad: That is one of the bigger challenges. Let’s take the example of an automotive scenario. Say you have the windows down and are enjoying the weather. The voice assistant needs to remove the noise coming in through the windows and deliver pristine noise suppression.

Some of the techniques are traditional approaches to noise and echo cancellation. But over time, the results we’re seeing from applying AI models far exceed anything we were able to achieve with conventional methods.

We’ve also been able to do more processing on the device. We just launched our new Snapdragon 8, which has vastly more capability than what we offered a year ago.

Another thing we’ve done to get to a much better experience is use multiple microphones. The combination of all of these technologies is what completely changes the experience on the edge.
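Ziad doesn’t say which multi-microphone technique Qualcomm uses. One classic approach is delay-and-sum beamforming: because the speaker’s voice reaches each microphone at a slightly different time, shifting each channel to line the voice up and then averaging reinforces the voice while uncorrelated noise tends to cancel. The sketch below assumes the per-microphone delays are already known whole-sample values (the `delays` parameter); real systems estimate them continuously, use fractional delays, and combine beamforming with the AI-based suppression Ziad mentions.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Align microphone channels by integer sample delays and average them.

    delays[i] is how many samples later the target voice reaches mic i
    than the earliest mic. Reading each channel `delays[i]` samples ahead
    lines the voice up across channels so it adds coherently.
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need one non-negative delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")

    # The output can only be as long as the shortest aligned channel.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    if length <= 0:
        return []

    out: List[float] = []
    for n in range(length):
        # Sum the time-aligned samples from every microphone, then average.
        total = sum(ch[n + d] for ch, d in zip(channels, delays))
        out.append(total / len(channels))
    return out
```

For instance, if mic 0 hears `[0, 1, 0, -1, 0, 0]` and mic 1 hears the same voice two samples later, `[0, 0, 0, 1, 0, -1]`, then `delay_and_sum([mic0, mic1], [0, 2])` recovers `[0.0, 1.0, 0.0, -1.0]`.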

Q: What are some of the use cases where a hybrid cloud and edge solution provides benefits by having the two work together?

Keyvan: Connectivity is a good thing. Imagine how much less useful your smartphone would be without connectivity. We want devices to be connected.

Sometimes we need to go to the cloud. If a user asks to turn on the lights or set a timer, that can be done on the device without going to the cloud. But if the user asks for the weather forecast, stock prices, sports scores, or restaurant reservations, then the device needs to go to the cloud. 

Even when we can understand the user’s question by performing speech recognition and natural language understanding on the edge, cloud models can sometimes outperform edge models. In the cloud, you have almost unlimited memory and computing power, while the memory and processing power on the edge are limited by the device.

So some product creators use a hybrid architecture that runs the query both on the edge and in the cloud, then uses arbitration logic to pick the better result. The edge path also keeps the assistant useful without a connection. For example, if a car goes through a tunnel and loses connection, the user can still perform some tasks, such as controlling the air conditioning, turning on the radio, or adjusting the volume.
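Keyvan doesn’t describe the arbitration rule itself, so the following is only a hypothetical sketch of what such logic might look like. It assumes each recognizer returns an intent plus a confidence score between 0 and 1, that a made-up set of intents (`EDGE_CAPABLE`) can be fulfilled on the device, and that a missing cloud result (`None`) means the connection was lost. The edge result is preferred for on-device tasks unless the cloud is noticeably more confident (the hypothetical `margin` parameter); everything else goes to the cloud.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    source: str        # "edge" or "cloud"
    intent: str        # e.g. "set_timer", "weather" (hypothetical labels)
    confidence: float  # recognizer score, assumed to be 0.0 to 1.0


# Hypothetical intents the device can fulfil without a network connection.
EDGE_CAPABLE = {"set_timer", "lights_on", "climate_control", "radio_on", "volume"}


def arbitrate(edge: Result, cloud: Optional[Result], margin: float = 0.1) -> Result:
    """Choose which of the two parallel results to act on."""
    if cloud is None:
        # Offline (e.g. in a tunnel): the edge result is all we have. If its
        # intent needs cloud data, the caller must tell the user it's unavailable.
        return edge
    if edge.intent in EDGE_CAPABLE and edge.confidence + margin >= cloud.confidence:
        # On-device task and the edge is about as confident: stay local.
        return edge
    # Cloud-only domain (weather, sports scores, ...) or a clearly better cloud result.
    return cloud
```

Under these assumptions, a timer request recognized on the edge at 0.9 confidence and in the cloud at 0.92 is handled locally, while a weather query is always answered from the cloud when a connection is available.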

We are seeing this hybrid architecture a lot in the automotive industry, and we expect its adoption to increase in other types of IoT devices as well.

Q: What are some of the customer experiences you can create thanks to the edge technology that’s coming?

Keyvan: The first thing that comes to mind is improved privacy. With edge, the voice is processed on the device, and only the query goes to the internet. If users know that the voice assistant is listening but the audio is not leaving their home, they feel more comfortable with that experience.

Q: How might enterprises benefit from moving to the edge (e.g., financial services, which seems like a good fit for privacy)? Are there any other enterprise support models that could see a measurable customer experience improvement at the edge?

Keyvan: Financial services are a good example, but healthcare, I think, is an even stronger one, because doctor-patient confidentiality makes privacy especially important. Voice AI can be very helpful in those settings.

Voice biometrics is another example of something that should be done on the edge. Edge technology can also enable new use cases, like factories where thousands of workers perform the same tasks repeatedly every day. Then you bring in voice AI and ask: can you make them 10% or 20% more efficient?
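Keeping voice biometrics on the edge means the enrolled voiceprint and the comparison never leave the device. A common pattern for speaker verification, sketched here under stated assumptions, is to compare a fixed-length speaker embedding from the new utterance against the enrolled one using cosine similarity. The embedding model is not shown (the vectors are assumed to come from some on-device speaker-embedding network), and the 0.8 `threshold` is a hypothetical value that would need tuning on real data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero embedding carries no speaker information
    return dot / (norm_a * norm_b)


def verify_speaker(
    enrolled: Sequence[float], candidate: Sequence[float], threshold: float = 0.8
) -> bool:
    """Accept the speaker if the new embedding is close enough to the enrolled one.

    Both embeddings are assumed to be computed and stored on the device,
    so no voice data is sent anywhere for this check.
    """
    return cosine_similarity(enrolled, candidate) >= threshold
```

The threshold trades false accepts against false rejects; a stricter value makes impersonation harder but rejects the genuine user more often.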

Q: What are the first steps that will help companies get the best solution for their unique customer needs? What should they be thinking about now, what steps should they take, and what questions should they ask before moving forward?

Keyvan: The obvious ones are picking the best technology for your needs, knowing your hardware, and understanding your users and what they want. Then, find a provider that can be your true partner on this journey. These journeys are long, and it’s important to know that AI models are never perfect. Some software products can be built and tested with 100% predictability, but voice AI models will have errors, and you need to accept that.

The key is to make sure you have visibility into analytics, statistics, and data usage so that you can use that information to improve your product. There are providers that hijack your product, and you don’t see the data or analytics. You don’t know who’s using it, how many people are using it, or what they’re saying. If it’s broken, you won’t have the chance to fix it.

You are in the best position to know your users. It’s essential to have visibility so that you can act on it and improve these voice AI models that are never perfect. 

Ziad: I also think it’s important to know essentially what you’re trying to achieve. It’s absolutely critical to have flexibility. Once you see the power of what’s possible, you will end up doing a lot more. So, it’s good to have that flexibility so that you are able to continue to add more capabilities. 

When we build technology, we often have a certain use case in mind, but our customers frequently bring in an idea or perspective that we were not even thinking about. That’s where innovation really comes in.


At SoundHound Inc., we have all the tools and expertise needed to create custom voice assistants and a consistent brand voice. Explore SoundHound’s independent voice AI platform at SoundHound.com and register for a free account here. Want to learn more? Talk to us about how we can help bring your voice strategy to life.
