Kane Simms, Dustin Coates, and Mike Zagorsek
Nov 02, 2020
7 MIN READ

How Voice AI is Disrupting Industries: Recap of a VUX World Podcast Episode

A recent VUX World podcast took a deep dive into how voice AI is disrupting multiple industries. Special guest Mike Zagorsek, VP of Product Marketing at SoundHound Inc., spoke with hosts Kane Simms and Dustin Coates of VUX World about the Houndify Voice AI platform. He highlighted how some of our partners (Mercedes-Benz, Pandora, and Mastercard) are using voice to create deeper relationships with their customers and to extend the functionality and convenience of their products and services.

The following is a recap of some of that conversation. You can watch and listen to the podcast in its entirety here.

Kane: So Houndify has the ability for brands, developers, whoever to create experiences, exchanges, conversations on their devices, but you can also ask that assistant questions outside of the domain of what you’ve built for, because you can call upon the domains that SoundHound has available. Is that it? 

Mike: That’s a great way to characterize it. When you think about why you would want to add voice to a product, one reason is to extend the value of that product through a voice interface.

For instance, if you have an IoT product or an app or you’re in a car, the ability to control the product and do things that you’d normally expect that product to do through voice is empowering. It’s faster, it’s more convenient.

The other reason is to add new value beyond simple command and control. That’s accomplished by leveraging domains beyond the product itself, answering general knowledge questions and even adding services dedicated to your product that you couldn’t offer before without a voice interface. And that’s something that we’re really excited to see. You can think of it as product extension, and then value creation above and beyond that.

Kane: What do you mean when you say you can offer things outside of your service that you wouldn’t have been able to offer before?

Mike: Imagine you have an appliance that is voice enabled. If it’s a coffee maker or a washing machine, you can start and stop it and control it. But if you want it to order more detergent or coffee, if you have a question about another product, or if you have a service issue, all of a sudden this product becomes a gateway into the brand in ways that in its previous form, it never could have been. What companies are starting to see, at least the ones that are more progressive, is that they can add value to their products by voice enabling them and they can start creating their own ecosystem and a stronger connection with the customer that is uniquely unlocked with voice.

Our position is that if you’re really trying to create value between the user and your product, you should extend that value through your own voice experience.

Kane: I want to help listeners understand the role that SoundHound plays. For example, you’ve got Alexa and Google Assistants and if you want to have a skill or an action, you can create your conversations in there and deploy them.

It sounds as though Houndify offers something similar to that, the only difference being that rather than putting it onto Alexa, where Amazon controls the distribution, you can control distribution yourself.

Mike: Alexa and Google are third-party services that are voice-enabled. We provide a very similar capability, but ours extends the brand of the company. Companies must decide if they want a third party to take control of their product experience and risk having the product subsumed by it.

We think you should have a branded voice experience. We provide a custom wake word so that when you say something like, “Hello Hyundai,” you set the expectation that you are interacting with the product and the company. When it answers the question, you’re getting the answer from the product and the company.

It’s really about strengthening the customer’s relationship with the brand. We also feel it’s completely acceptable to have multiple assistants in one environment. So if you want to have Alexa and Google live alongside it, that’s perfectly fine. The user gets the choice without your handing over the keys to your brand to a third party. Instead, you’re maintaining that connection. And the last thing is, with the Houndify platform, you get to see and keep all the user data, whereas with Alexa and Google, you don’t, and that results in a loss of brand control and a loss of data control, which are pretty significant trade-offs.

Dustin: Kane mentioned skills and actions, but it sounds like you are actually competing with Alexa for device makers and an assistant embedded inside devices. 

Mike: We don’t like to think of it as competition, because they can coexist. We’re mostly competing against the prioritization of adding voice to products. If a company makes a choice between adding skills to Alexa or creating their own assistant, then you could say that’s a competitive situation, but that’s actually a fundamentally different strategy. 

About a month ago, there was an ad for Buick, an Alexa-enabled car. The passenger said, “It’s a Buick,” and the driver said, “No, it’s an Alexa.” They were really just saying this car has Alexa, and they were willing to state that it’s not a Buick, it’s an Alexa. That’s a strategy.

Companies want to maintain their relationships with their users, and we’re not against them adding multiple assistants, but we don’t feel it should be at the expense of the core relationship.

Kane: What are the benefits to auto manufacturers of having their own unique voice assistant?

Mike: The data, which means understanding how users are interacting with the product and getting full visibility into that. Then there’s the brand extension, which is really powerful, and the product relationships.

If you have a question about a feature in the car or you want to control something, then what? The automaker then needs to hand the core functionality of the vehicle over to a third-party voice assistant. So you’re not saying, “Hello, Kia, roll down the window,” you’re saying, “Alexa, roll down the window.” If you add multiple assistants, then all of those third-party assistants need to understand how the car works. If one of them updates their service, you now have to maintain all of these platforms for what should really just be an extension of the vehicle experience or the product experience.

Kane: In the Buick ad, the brand is actually saying that you should forget Buick and call it Alexa. Contrast that with Mercedes’ Super Bowl ad from the year before last. The entire ad was this guy running down the street, talking, making stuff happen. The whole experience was about Mercedes having capability, Mercedes being able to do all the stuff that Alexa does, but it’s Mercedes. It’s a different way of building relationships.

On the one hand, you’ve got Amazon Alexa moving into vehicles and becoming the dominant thing in that vehicle. The dominant relationship you will have in a Buick is with Alexa. Then, if Volvo launches a car with Alexa in it, how loyal are you going to be to Buick, when your loyalty might really be with Alexa?

On the Mercedes side, they have an assistant with their own custom wake word and their own custom functionality. You’re having a conversation with Mercedes, and if Mercedes wanted to offer you the ability to connect to something like Spotify, Pandora, or other third-party services, then Mercedes is the one in the middle, owning the customer relationship.

Mike: In the Mercedes-Benz ad, they really positioned the voice assistant as a superpower. The driver said, “Hey Mercedes,” and then started making random comments about things in the world around him, which is obviously hyperbole. But the point they were making was that when you control your environment with your voice, you have almost God-like powers, and that driving a Mercedes, with that voice connection between you and the car, suddenly creates this unbelievable experience. That’s really brand extension and product extension. Users aren’t calling upon a third-party service; they’re getting this unique, integrated package that elevates what they are able to do.

That resonated really well, and it was exciting to see how invested Mercedes has been. They’ve taken a leadership position from an awareness standpoint, but other leading automakers are also on board, including Hyundai, Kia, Honda, and PSA.

Kane: How can Houndify keep pace with Amazon and Google? Machine learning is progressing fast, and we know that some of these new models take a lot of data.

Mike: We benefit from our 10 years of R&D experience and our proprietary, unique voice interaction platform. It has two key components: Speech-to-Meaning® and Deep Meaning Understanding™. Our platform is different in many ways and has an advantage over traditional voice interaction platforms.

Other voice assistants use a two-step process: first they transcribe your query, then they decipher it using natural language understanding (NLU) and provide the result. Our founders knew that for a voice system to really work, it has to work the way people think. We’re actually processing the speech in real time and determining the context of what you’re saying as you speak. That allows us to deconstruct and reconstruct queries in ways that traditional speech recognition and NLU systems can’t. The benefit is that people get a superior solution.

I could do a quick demonstration to explain. In the U.S., we have a voice experience app, Hound. If you’re in the U.S., you can download Hound and try it out yourself. The ability to handle complex statements is one of the hallmarks of our platform, and it ends up being fast and accurate.

Those are the types of experiences that often trip up other assistants, including Google and Alexa. It’s really our ability to handle complexity: exclusions, inclusions, and compound statements. You could try the same queries with any other voice assistant and it will fail quickly. The work that we put in gives our partners a leg up on a conversational voice experience that would otherwise be limited, even with some of these bigger players.

Kane: What does that integration process look like for brands who want to integrate domains like Yelp?

Mike: In Houndify, we have access to hundreds of third-party data domains. You can integrate Yelp, weather services, sports services, and navigation. When you light up these domains, they leverage the core Deep Meaning Understanding technology, and you instantly have the ability to ask and answer complex questions. That’s part of what has helped us attract a lot of companies: they don’t have to build these data relationships, because they come with the platform.

Kane: Can you explain a little bit about how SoundHound enables brands to voice-enable software as opposed to hardware?

Mike: It’s always a relationship. The benefit of mobile apps is that they all live within hardware that’s voice-ready. In a smart speaker or a device with no screen, the software is somewhat invisible. If it’s an app, then you have a multimodal experience. For example, Pandora introduced Voice Mode to solve the challenges of music search and discovery. They wanted to create a voice experience that empowers their users but also gives them the data, because Pandora lives in multiple environments: Alexa, Google, browsers, cars, you name it.

We created a wake word for them, and that’s a really important distinction, allowing users to play music much more directly and efficiently.

Kane: I think you’ve touched on how voice technology itself is actually disrupting the way that we use technology in the first place. If you think about Google search and when Google search first came about, that basically trained people on how to use search. Whenever you visit any website, you expect the search to work because you’ve been trained by Google. When you buy something online, you just expect it to work, you expect it to be trusted, you expect it to be fast because Amazon has shaped your expectation of what online shopping should be. 

I’m wondering whether increased access to voice-enabled devices will start to change the way people expect to be able to communicate with brands and whether there will be an expectation to be able to interact directly with a brand’s voice assistant?

Mike: We can’t envision a world without a voice-based connection to the brand itself. Your brand extension has to be consistent across the board, and your voice experience has to be consistent. That’s where the value of the brand really pays off. 

So right now, a lot of people are using voice as an extension of a tool experience, and we feel that, because we have the conversational capability to handle more complex questions, people will evolve to start asking more needs-based questions like, “I’m in the mood for some music that is relaxing.”

Kane: One of the things that we spoke about previously is the concept of voice technologies having the potential to allow brands to do things that they wouldn’t previously have thought about doing. In fact, Mercedes and the other auto brands you partner with are in a position to generate conversations and build relationships with customers over time, and anyone who knows anything about marketing and branding knows that it is all about long-term relationship building and lifetime value. Think about what it will do for Mercedes’ brand loyalty when people know what it’s like to have a personal relationship with Mercedes in 15 years.

Voice technology has managed to get in a position where it’s allowing the opportunity for brands to do things that they wouldn’t have previously thought possible. And one of those is Mastercard launching a drive-thru platform using SoundHound.

Tell us a little bit about that.

Mike: A lot of payment companies are looking at the ecosystem and how they can add value above and beyond the transaction. The progressive ones, like Mastercard, are putting a lot behind innovation to be more than the point of transaction and create a point of sale experience.

Many countries have drive-thrus, but no one does it like the United States, and the demands of voice ordering from the car can be really complex, so Mastercard incorporated our Houndify platform. The drive-thru platform is a digital interface: you place an order using voice the same way you would talk to a person, which our platform is perfect for, so we provide the voice experience.

They’re tying the interface into the payment system. They also have a partnership with a company that will read the license plate and recognize the car, so the order can be tied to the customer’s information.

It’s a perfect example of how voice interfaces are powerful. Often, you need to make these shifts into new disruptive experiences, but it needs to work in concert with other features and capabilities. So the product maker or the product integrator has to really think holistically about the role of voice alongside other things. And then they create the disruption and it becomes repeatable. 

Kane: If I wanted to implement this as a developer, how do I go about doing that? 

Mike: You can very quickly build an assistant with a degree of custom command and control that takes advantage of the domains that we have. The default wake word we offer is “Okay Hound.” If you want a custom wake word or a custom domain, those are currently things we partner with you to build. The roadmap is to open up the platform so that developers can be more independent.

To get more of the conversation and hear reactions from the audience, watch the podcast and video here:

Developers interested in exploring Houndify’s independent voice AI platform can visit Houndify.com to register for a free account or talk to us about how we can help you bring your voice strategy to life.

Interested in Learning More?

Subscribe today to stay informed and get regular updates from SoundHound Inc.