In our recent expert webinar, “How Top Automakers are Improving Voice Experiences with Houndify,” SoundHound Inc. engineers Scott and Kyle shared insights they’ve gleaned from working with some of the largest auto brands, including Mercedes-Benz, Honda, Hyundai, and Kia, and revealed why leading automakers are choosing Houndify’s Voice AI platform to help them continuously improve in-car user experiences and build brand loyalty.
During the webinar, attendees saw a demonstration of Houndify’s powerful, independent Voice AI technology, and how it helps auto companies customize and differentiate, while retaining control over their brand and users.
After the webinar, participants engaged in a lively Q&A session with our experts. Here is a summary of the questions and responses from that session:
Q. Can your speech-to-intent algorithm function fully offline for basic vehicle controls like entertainment, lights, etc.?
A. Yes. Depending on hardware capabilities, we can handle even complex queries, like address search and POI search, entirely offline. We also offer a hybrid solution that leverages the full power of the cloud when a network is available and still provides all expected features when completely offline.
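For a rough sense of how a hybrid setup can behave, here is a minimal sketch in Python. The engine interfaces and connectivity check are invented for illustration; this is not the actual Houndify API.

```python
import socket

class HybridVoiceAssistant:
    """Route queries to the cloud when a network is available,
    falling back to the embedded (on-device) engine otherwise.
    Hypothetical sketch; engine objects are assumed, not real."""

    def __init__(self, embedded_engine, cloud_engine, host="8.8.8.8", port=53):
        self.embedded = embedded_engine
        self.cloud = cloud_engine
        self.host, self.port = host, port

    def _network_available(self, timeout=1.0):
        # Cheap reachability probe; a real system would track connectivity events.
        try:
            socket.create_connection((self.host, self.port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def handle(self, query):
        if self._network_available():
            try:
                return self.cloud.respond(query)   # full cloud power when online
            except Exception:
                pass                               # transient failure: fall through
        return self.embedded.respond(query)        # offline coverage of core features
```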
Q. Are all your speech-to-intent models hosted locally on the vehicle?
A. That depends on the vehicle’s hardware capabilities. If we can leverage an onboard GPU, we can host complex models locally on the vehicle. With sufficient memory and GPU power, most essential car capabilities can be hosted locally.
Q. When it comes to data queries, how does your tech differentiate between a person, a restaurant, a car, etc.?
A. This is part of what makes Speech-to-Meaning™ so powerful. We use a number of different internal techniques to decide on the most likely meaning, using context, data sources, popularity, our own models, and other signals. This is all done in real time, on a per-query basis.
There are various approaches to entity detection: some techniques use grammatical approaches, while others use statistical approaches that require training data.
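As a toy illustration of combining such signals into a single decision, consider the sketch below. The signal names and weights are invented for this example and do not reflect SoundHound’s actual models.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    entity: str           # e.g. "Mercedes" as a person vs. a car brand
    kind: str             # "person" | "restaurant" | "car" ...
    context_score: float  # fit with conversation context and active domain
    popularity: float     # prior from usage / data-source statistics
    model_score: float    # confidence from a trained entity model

# Invented weights, purely illustrative.
WEIGHTS = {"context": 0.5, "popularity": 0.2, "model": 0.3}

def most_likely(candidates):
    """Blend the signals into one score and pick the most likely meaning."""
    def score(c):
        return (WEIGHTS["context"] * c.context_score
                + WEIGHTS["popularity"] * c.popularity
                + WEIGHTS["model"] * c.model_score)
    return max(candidates, key=score)

best = most_likely([
    Candidate("Mercedes", "person", 0.2, 0.3, 0.4),
    Candidate("Mercedes", "car", 0.9, 0.8, 0.7),
])
print(best.kind)  # -> "car"
```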
Q. What are the main languages that are supported?
A. We are actively working on more than a dozen languages across the globe.
Q. Speaking of Collective AI, can you give an example of a cross-domain query and explain the way it’s handled?
A. One example is: “What’s the weather and turn on the AC.” Our Speech-to-Meaning technology uses Query Glue™ to evaluate the possible combinations dynamically and determine whether there’s a match in a single domain or across several domains. We then automatically combine the responses so that the relevant information is returned.
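Here is a toy sketch of that idea, with invented domain handlers standing in for real Houndify domains; the matching logic is deliberately simplistic.

```python
# Two pretend domains, each answering only the part of the query it understands.
DOMAINS = {
    "weather": lambda q: "It's 72°F and sunny." if "weather" in q else None,
    "climate_control": lambda q: "Turning the AC on." if "ac" in q else None,
}

def handle(query):
    query = query.lower()
    # Try every domain; keep each match, then stitch the responses together.
    responses = [r for handler in DOMAINS.values() if (r := handler(query))]
    return " ".join(responses) if responses else "Sorry, I didn't get that."

print(handle("What's the weather and turn on the AC"))
# -> "It's 72°F and sunny. Turning the AC on."
```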
Q. Which framework do you use? PyTorch or TensorFlow?
A. We constantly experiment with different frameworks, both public and proprietary, always looking for better performance.
Q. What about self-driving cars? How does that change what you build based on what multiple passengers want?
A. As self-driving technology evolves along the L2-to-L5 arc, the focus will shift from drivers to passengers. We expect new use cases for controlling the in-car environment (such as climate control and media control for different zones inside the cabin), a deeper level of personalization across multiple vehicle and ride-share companies, new use cases related to destination entry (with intelligent waypoint routing), and new types of information being shared between riders.
Q. How does the accuracy of your embedded (offline) models compare to your cloud models?
A. It depends on the model and use cases. For example, for some embedded use cases we can tune the accuracy of our automotive solutions so that the offline and online models achieve a similar word error rate (WER).
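For reference, WER is the word-level edit distance between the recognized text and the reference transcript, divided by the number of reference words. A minimal implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One deletion ("the") + one substitution ("conditioning" -> "conditioner")
# over 5 reference words gives a WER of 0.4.
print(wer("turn on the air conditioning", "turn on air conditioner"))  # 0.4
```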
Q. Do you have data that supports that it’s better to have different wake words in the car vs. what users are already used to at home?
A. Our research with users and companies suggests that the wake word is best associated with the brand relationship, helping create positive brand associations and loyalty. So within a car, using a “Hello Hyundai” or “Hey Pandora” phrase reinforces the user’s interaction with the car and the brand perception as a whole. At home, a user may choose a different voice-enabled product for various home-related activities. For controlling things within your home from your car, it’s simply a matter of integrating smart home capabilities into the car’s voice AI platform.
Q. When you do inference, do you use GPU on device?
A. Yes, on automotive devices that have a GPU available. Since we provide solutions for a wide range of use cases and partners, we make use of the GPU when it’s available, but we can also provide a solution on hardware without one, with the obvious compute power limitations that implies.
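A common pattern for this kind of graceful degradation, shown here with PyTorch purely as an illustration (not a statement about SoundHound’s internal stack), is to select the GPU when present and fall back to CPU:

```python
import torch

# Use the GPU when the head unit has one; otherwise run the same
# model on CPU (slower, but still functional).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(80, 40)  # stand-in for a real acoustic/NLU model
model = model.to(device).eval()

features = torch.randn(1, 80, device=device)  # stand-in audio features
with torch.no_grad():
    output = model(features)
print(f"ran inference on {device}")
```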
Q. Do you store customer data? How do you protect their privacy?
A. Please refer to our privacy policy: https://www.soundhound.com/en/privacy
Q. Where does Houndify get the data to answer general knowledge questions?
A. For general knowledge questions, Houndify uses a number of different data sources and partners to get the raw data. We then apply our NLU on top of this data to understand what the user is looking for and provide the correct answer.
Houndify is designed to be an extensible platform that can allow different knowledge bases to plug in to answer questions like this. SoundHound can provide knowledge bases for many domains and others can also add their data to our platform to help answer these questions.
Q. How many intents and slots can your system handle?
A. The Houndify platform has more than 100 public domains (and counting) and many more private domains available. Using our Speech-to-Meaning technology, those domains can handle very complicated queries with an almost unlimited number of permutations.
Q. Who is responsible for software updates to voice API? Is it the auto OEM or SoundHound?
A. For our cloud solution, software updates to our servers are made in the cloud by us; they do not require any updates to the client. Any updates to the client or app are made by the auto OEM. We work with the OEM to make updates available, but each relationship can be unique and customized.
Q. Can updates be delivered over the air?
A. Yes, client updates can be delivered over the air.
Q. What is the usage perspective for motorcyclists? Are they able to use navigation hands-free?
A. Because Houndify is a voice AI platform, we can understand navigation-related queries in any environment, and can handle the conversation flow for initiating navigation and disambiguating between potential destinations without a screen. We are partnering with manufacturers across the globe and have seen interest in motorcycle-based use cases.
Q. User-friendly solutions make applications powerful. What solutions do you offer to assist users?
A. We provide a variety of features to make this possible. You can download the Hound app for iOS or Android (available in the US) to get a demonstration of the power of our voice assistant.
Q. Who determines which vendor is chosen in the responses if there’s more than one option (e.g., McDonald’s vs. Starbucks vs. Dunkin’ or 7-Eleven)?
A. The Houndify platform is all about giving developers and companies options that best fit their products. The idea of Collective AI is that anyone can contribute to the platform and make their data available to other developers.
The Houndify client has the flexibility to choose which domains to enable. Within a selected domain, there are many heuristics that determine the precedence and order of the results that are returned. For instance, if you are looking for gas stations, distance and gas price will determine the order of results. If there are two retail domains, one for 7-Eleven and one for QuickMart, the client can choose to enable one or the other. If both are enabled, various heuristics determine which domain can give the user a better response.
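As a toy example of such an ordering heuristic, the sketch below ranks gas stations by a weighted blend of distance and price; the weights are invented for illustration only.

```python
# Lower is better for both signals, so a lower blended score ranks first.
stations = [
    {"name": "Station A", "distance_km": 1.2, "price": 4.09},
    {"name": "Station B", "distance_km": 0.4, "price": 4.39},
    {"name": "Station C", "distance_km": 3.1, "price": 3.89},
]

def score(s, w_distance=0.6, w_price=0.4):
    return w_distance * s["distance_km"] + w_price * s["price"]

for s in sorted(stations, key=score):
    print(s["name"])  # -> Station B, Station A, Station C
```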
Q. Do you have data on how long a typical user is likely to wait for a response before starting to think the system has crashed or gotten off track? NNG’s usual 1-second/10-second limits don’t seem to fit voice systems.
A. Each application of our platform may be different, so we provide a mechanism to adjust the time to respond through Voice Activity Detection (https://docs.houndify.com/reference/VoiceActivityDetection).
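As a rough sketch of what tuning that could look like, here is an illustrative request configuration. The field name below is a placeholder, not the real parameter; the actual names are in the linked docs.

```python
import json

# Hypothetical request settings for illustration only.
request_info = {
    "UserID": "demo-user",
    "RequestID": "demo-request-1",
    # Placeholder knob: how long (ms) of trailing silence to wait
    # before deciding the user has finished speaking.
    "VadSilenceTimeoutMs": 900,
}
print(json.dumps(request_info, indent=2))
```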
Q. Does your system provide graphical outputs like display cards for weather?
A. Yes, as shown in today’s demo, in addition to the voice response, Houndify also provides the information necessary for graphical outputs like display cards. Our consumer apps Hound and SoundHound are great reference applications illustrating that, so do check those out.
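As an illustration, a client might turn the response payload into a simple card. The field names below are simplified stand-ins, not the exact Houndify response schema.

```python
# Pretend weather response with both speakable text and structured data.
response = {
    "SpokenResponse": "It's 72 degrees and sunny in San Jose.",
    "WrittenResponse": "72°F, Sunny",
    "NativeData": {"City": "San Jose", "TempF": 72, "Conditions": "Sunny"},
}

def render_card(r):
    """Render the structured data as a crude text 'display card'."""
    data = r["NativeData"]
    temp = f"{data['TempF']}°F"
    print("+----------------------+")
    print(f"| {data['City']:<20} |")
    print(f"| {temp:<8}{data['Conditions']:<12} |")
    print("+----------------------+")

render_card(response)
```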
Q. How many different types of embedded systems do you support on the vehicles?
A. We provide solutions to many different automotive partners, and we do our best to support whatever systems they may use as part of our Houndify solutions.
Q. What about user privacy?
A. We are GDPR compliant and you can find additional information in our privacy policy: https://www.soundhound.com/en/privacy
Q. Can a brand change the voice by gender, regional accents, etc. or even add its own voice?
A. Yes, we offer a wide variety of voice options with different genders, accents, etc. We can also work with our partners to create a custom voice for their brand like we did with Pandora.
Q. There are many screen types and sizes across car brands. How do you adapt your solutions to these screens?
A. Houndify returns user-friendly written and spoken responses and provides the data relevant to the user’s query. It’s up to the client application to manage the display across the various screen types and sizes.
Q. Can users teach the system to remember preferences? For example: “Call my girlfriend.” “Who is your girlfriend?” “Andrea.” “Okay, I’ll call your girlfriend.”
A. Yes, the Houndify system can bring together multiple pieces of technology to make that capability possible. We allow for dialog and follow-up queries. We also allow custom data to be stored, like your wife’s name, your home and work addresses, or your mom’s phone number. We could also extend it to include Speaker Identification so the system knows who “my” refers to, depending on who is speaking in the car.
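A toy sketch of a speaker-keyed preference store (all names and identifiers invented) shows the idea:

```python
# speaker_id -> {relation: contact}; who "my girlfriend" is depends on
# which speaker the (hypothetical) Speaker Identification step reports.
preferences = {}

def remember(speaker_id, relation, contact):
    preferences.setdefault(speaker_id, {})[relation] = contact

def resolve(speaker_id, relation):
    return preferences.get(speaker_id, {}).get(relation)

remember("driver", "girlfriend", "Andrea")
print(resolve("driver", "girlfriend"))     # -> "Andrea"
print(resolve("passenger", "girlfriend"))  # -> None (prompt a follow-up question)
```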
Q. What are the biggest challenges on the NLU and Dialogue side? Is there opportunity for collaboration?
A. There are many challenges in NLU, such as handling mis-recognitions and tricky queries (“Show restaurants except Chinese”) and creating natural responses, but our Speech-to-Meaning and Collective AI technologies are able to tackle these challenges efficiently.
A. Houndify is an open platform where anyone can sign up and build voice-enabled products. Our goal in the near future is to enable anyone to use the Houndify platform to build solutions and then sell those to car manufacturers, appliance makers, or other software companies.