Nov 17, 2022

How a New Voice AI Breakthrough Will Change Human-Computer Interaction Forever – Starting With How You Order Takeout

Back in 2007, Apple co-founder Steve Jobs showcased the iPhone touchscreen, and human-computer interfaces as we knew them changed forever. The company's next-generation smartphone effortlessly leapfrogged its competition and allowed all of us to interact with devices in a more natural and intuitive way.

With the same objective, SoundHound today unveiled Dynamic Interaction™, a technology that marks the next big evolution in how people interact with computers. It allows them to use their voices as seamlessly as they currently use their fingers to search, scroll, and select.

In tech speak, what we're introducing is known as a "multimodal full-duplex interface." It combines fragment-level, real-time natural language understanding with audio and touch input, both of which can be used interchangeably alongside audio-visual output.

In practice, this means Dynamic Interaction is built to understand and act on (not just transcribe) complex human speech. It breaks speech down into small individual fragments and processes them as fast as they are spoken. The interface then shares what it has understood back to the user "live," either as audio (verbal) or as visuals (on screen). The user can in turn edit or modify a query or request in real time, using speech or a touchscreen, creating effortless two-way multimodal communication.

Gone are the clunky wake words, awkward pauses, rigid turn-taking, mishearings, and the need to speak in a stilted way to be understood. It's as fluent as human-computer interaction has ever been.

The Dynamic Interaction voice interface is also trained to ignore off-topic speech, is multimodal (so it can give live feedback and allow modifications, changes, and edits through voice or touch across a variety of devices), and will make proactive suggestions based on the topic and content of a conversation. 

And yes, to steal Jobs’ line: “Boy, have we patented it.”

But technology should always be a case of “show” and not “tell.” 

The test case

Because we believe so strongly in Dynamic Interaction, we'll be rolling it out in one of the toughest environments imaginable: restaurants. Food ordering is a challenge that many organizations are trying to solve, with varying degrees of success.

With this technological leap forward, we can meet the industry's ever-increasing demand. For example, this live, multimodal system will let a customer speak their order and watch it build on screen, edit or modify it at any point by natural speech or touch, and talk to family members in the car at the same time.

Although voice technology is already being deployed in such scenarios, current systems lack this kind of simplicity and user-friendliness. That shortfall has slowed adoption and fostered mistrust in the capabilities of voice technology.

We believe that, like Apple's touchscreen, Dynamic Interaction can leapfrog earlier attempts at voice ordering and deliver a far more natural human-machine interaction that vastly improves the customer experience.

Category-changing

SoundHound’s Dynamic Interaction will ultimately help restaurants, customer service businesses, and a number of other service industries automate without exhausting or losing customers along the way. 

We also believe it will change how the category is perceived, as we take another important step toward a future where we communicate fluently with technology in the most natural way we know: using our voices.

Keyvan Mohajer

As Co-Founder and CEO of SoundHound, Keyvan Mohajer envisions a world where custom voice assistants transform how we interact with machines, making our lives more convenient and productive.

