Nov 30, 2021

How Keyvan Mohajer Shaped SoundHound Into a Successful Voice AI Solution

“We have 400 employees now and are growing, but it started with a couple of kids in a dorm room,” says SoundHound Inc.’s CEO, Keyvan Mohajer. Although voice AI adoption is increasing at exponential rates now, 20+ years ago, when Keyvan first thought of the idea, it was a SciFi fantasy. 

Recently, Keyvan Mohajer, Co-Founder and CEO, SoundHound Inc., joined Iranian Students of California (ISC) in the 2nd episode of The Tale of a Success, a series of interviews with successful Iranians to share their journey with students. 

During the interview, Keyvan discussed how SoundHound Inc. evolved into the successful voice AI platform it is today, the inspiration behind the technology, the challenges faced along the way, and predictions for the future.

Read the recap below or watch the full talk with Keyvan Mohajer here.

Q: Can you tell us about SoundHound and how you developed the Houndify Voice AI technology? 

Keyvan: Absolutely. So we started SoundHound with my co-founders in a dorm room at Stanford. Our vision was to make computers better than humans in understanding natural language.

We noticed that computers are better than humans at computing, but they are not always better than humans at performing certain tasks. For example, we used to beat computers at chess, but now computers beat us. When it comes to language understanding, we still don’t think computers are as good as humans. While we have complex conversations with other humans, when we talk to computers or voice assistants, we lower our expectations. We wanted to change that. 

We wanted to make computers better than humans. We thought if we could do that, the world would be a better place, and people would be more productive. It’s a vision that will continue for a long time. 

The SoundHound journey

For SoundHound, the vision of the company was always voice recognition and Natural Language Understanding (NLU). We wanted to build the core technology in-house instead of licensing it from others or using open-source because we thought that there was room for innovation. We also wanted to own it so we could have complete freedom of what we could do with it.

That turned out to be a 10-year journey. It’s not practical for a startup to spend 10 years in R&D. So, we launched several products along the way. One of them that became very successful was our music identification app, SoundHound Music.

It detects recorded music that’s playing in the background. You can also sing or hum a tune into it. The app was downloaded a few hundred million times and brought revenue to the company while allowing us to attract more funding and talent. It paved the way for us to spend 10 years building the Houndify Voice AI platform.

Q: Can you share with us some of the experiences you had when you were in Iran? 

Keyvan: Yeah, so I was born in Tehran and lived there for the first 17 years of my life. Then in 1995, my family immigrated to Canada. My dad, when he was 17, went to the United States for university, and then he went back to Iran after he got his degree. 

When my brothers and I were getting older (I have two brothers), my parents decided that they would do the same thing. Instead of sending us, we all immigrated. Canada had an immigration program, so in 1995, we moved there. I haven’t been back to Iran, but obviously, I am a big fan of the culture, and I follow it.

The biggest thing that I remember is that my parents were really into Persian art and Persian traditional music, which influenced our upbringing. I really enjoyed those days, and I still think it’s a big part of who I am. 

Q: So it’s 1995, you move to Canada, and then the next year you get admitted to the number one school in Canada, the University of Toronto, where you started 3 companies while you were doing your bachelor’s. Can you share with us some of the experiences you had with immigrating and overcoming challenges?

Keyvan: It was very hard. My English wasn’t very good. I remember when I arrived at the Toronto airport, the immigration officer asked me, “How old are you?” and I didn’t understand what he meant. 

We also didn’t really know anyone. So it was lonely, and we were in a different status financially. I knew that my family had sacrificed a lot to bring us there, so I felt responsible to not let them down. I worked very hard. I finished one year of high school in Canada and got really good grades.

I found out about this program at the University of Toronto called Engineering Science that was supposed to be the most difficult and prestigious Engineering program. So I applied to that program. 

In Iranian culture, when you talk about yourself, it’s not very polite to promote yourself. It’s not good to say, “I’m very good.” You are supposed to be humble, but in North American culture, you have to always say, “I am the best,” which was so unfamiliar to me. 

So in my application to the University of Toronto, I had to write about myself. I actually wrote, “I’m not very good,” which was a very stupid thing to do. So I didn’t get accepted even though I had very good grades. That was very disappointing, and I didn’t know what to do.

My older brother, who’s one year older than me, taught me the concept of not taking no for an answer. So I said, let’s go there, and my brother asked them, “Why didn’t you accept my brother? Look at his grades. How can you not accept this person?”

They went into the back room, looked at my grades, and they came back and said, “Okay, we accept you.” 

But they didn’t accept me into the Engineering Science program. They accepted me into Civil Engineering, which wasn’t my top choice. I debated whether to go into the program or go back to high school for a year before applying again. I eventually decided that I’m going to just get in and take it from there. 

So I studied one semester in Civil Engineering. I got really good grades. Then I went to the Engineering Science program. I said, “I’m going to transfer from Civil Engineering to Engineering Science.” They told me no one has ever done that. Usually, students transfer out of Engineering Science because they don’t want to do all the hard work.

At the end of my second semester in the Engineering Science program, I actually got the highest grades in the whole class. I finished the four years with the highest grades as well. I worked hard. It wasn’t easy.

I also wanted to be an entrepreneur, and I really liked to build things. So I started 3 companies: the first when I was 19, and the others when I was 20 and 21.

Q: Why did you decide to move to the U.S. and continue grad school at Stanford? What was the decision-making process of doing a Ph.D. instead of going to the industry directly? 

Keyvan: I realized that Stanford is a great school, and a lot of companies like Yahoo came out of Stanford. So it matched with what I wanted to do. 

After I started 3 companies, I realized that I wanted to be a technical founder of a very high-tech company that would make a big impact in the world, and that I would want to spend a big part of my life on it. So instead of being a serial entrepreneur and starting one company for a year, I wanted to make a company that would take decades. With that thinking, I thought I should go to grad school.

Q: How did you choose your Ph.D. research topic? 

Keyvan: You have to have some idea about what you like in engineering. I was interested in signal processing and machine learning. So I decided to specialize in it, which aligned with my mission to create a high-tech company. People should align their decision with their life’s passion and ambitions. 

Then I started asking the question: what could happen in my lifetime that doesn’t currently exist? To answer that question, I turned to SciFi—Star Wars and Star Trek. If you look at SciFi by definition, it’s futuristic. The author comes up with things that they predict will happen reasonably in the future. 

If you look at, for example, Star Trek, there are several concepts that are normal in their world, but they don’t exist in our world. For example, going faster than the speed of light, teleportation, holodecks, and replicators. All of those things would be disruptive and change our world. The one that I thought would happen in my lifetime was voice AI.

In Star Trek, I saw that people were talking to robots and computers, and it wasn’t just dictation. You could talk to it, and it would understand you and look up things for you. That seemed like a very great thing to have that didn’t exist at the time. 

There were the concepts of speech recognition, speech-to-text, and Automatic Speech Recognition (ASR) that were used for dictation, but no voice AI. The technology at the time wasn’t very good. But combining Natural Language Understanding and having a conversation was the ambition.

The year was 2000, and I thought it would happen within 20 years, and I wanted to be a part of it. So with that, I chose my thesis to be in the area of speech recognition and machine learning. I set out to learn everything there is to learn about it.

Q: So it’s now the year 2004, and you decided to start SoundHound as a company. Can you share some of the stories from that time?

Keyvan: Yeah, so I wanted it to be a voice AI company. I was doing my Ph.D. from the year 2000 to 2004, and I started going to the VC investor community. I told them in 20 years we will talk to computers and they will talk back to us and that will change computing. I asked them to give me some funding so that I can build that company. 

The feedback from the VC community was, well, first of all, 20 years is way too long for our business. Most traditional VCs want to have a return on investment in a few years, not 20. The concept of 20 years was scary for them. They told me that I needed to come back with some built technologies that can be productized.

That feedback was actually reasonable and an eye-opening experience. I went back to my dorm room and wondered what to do. Then I remembered that when I was a TA, I had a student, his name was James Hom, who is now one of our co-founders, and he was telling me about this idea of a query by humming. So you can hum a song, and a search engine could tell you what it is.

I thought I could do that because I had been working on analyzing human voices. Instead of extracting speech contents, maybe I could extract the melody. Then I just had to build the technology for talking and humming for the user and matching it against the data of melodies. 

I looked up James Hom on the Stanford directory. I found him and wrote him an email. I didn’t end up sending him the email because I worried that I wouldn’t be excited about it the next day. I saved a draft of the email and thought that if in a couple of days I was still feeling excited, I would reach out.

Then the next day, as I was driving on campus, I stopped at a stop sign and James Hom was crossing right in front of me. I thought, wow, that’s meant to be. I rolled down the window and said, “James, get in the car.” 

I managed to convince him to get in the car and we went for coffee. I told him that I wanted to build a voice AI company, but the first product was going to be a query by humming. The core technology was something that we could build within a year. Then we could productize it within 3 years.

While we were in a dorm room working on the technology, I kept thinking of that famous Godfather melody. We had a database of 20,000 media files that were being searched to tell us what the song was. 

Two weeks prior to Christmas, we actually didn’t leave our room. We kept working on this technology, and then it was on Christmas Eve that I finally hummed this Godfather soundtrack, and it told me, “You’re singing The Godfather.” 

We were so emotional and happy. We wanted to celebrate, but it was Christmas Eve, and nothing was open. We found a fast-food Mexican restaurant that was the only place open. We went and bought 3 burritos, and they were the best burritos we’ve ever had.

Q: To put this in perspective, this is 2 years before YouTube, 3 years before iPhone 1 was announced by Steve Jobs, 7 or 8 years before Siri was acquired by Apple. How did you go from asking VCs for funding to raising 5 million a year or two after you created your first MVP? 

Keyvan: We built the humming search, and that was very compelling because it was something that nobody else could do, and we did it extremely well. It basically showed the capability of our team, our technology, and our ability to solve really hard problems. We pitched it and raised the first 5 million. 

It was a difficult journey. When you’re pursuing a long-term vision, you have to constantly motivate people, and you come across a lot of discouraging and negative remarks, not just from VCs but also from people you’re trying to hire.

So, I became very good at inspiring people all the time. I would inspire them, and that would maybe last for a few months. Then somebody would raise a flag, and I would have to do it all over again. A big part of my energy was always to inspire people. The rest would be just building the core technology and realizing the vision.

Keyvan: When we announced the Houndify platform and the companion app, Hound, that was a big moment because we had kept it a secret for 10 years. For 6 of those years, there was a lot of research, and we thought our vision might not even succeed because we had a new approach to voice AI. Everyone else was doing it differently.

We said, let’s do it like the brain does. Everyone else does it in two steps: speech-to-text and then text-to-meaning. But the brain doesn’t go from speech to text. It goes from speech to meaning. If we could do that, we could improve speed and accuracy, and that’s what our Speech-to-Meaning® technology does.

We also wanted the voice assistant to understand really complex queries. We set the bar very high. The first 6 years were iterations and things not working. We learned from it and tried again. 

The next 3 to 4 years were spent making it really good and fast and productizing it. That was also really hard because some people said, just announce what you have and get it out there. We just said no. It has to be really good first. 

2015 was a very important moment because we unveiled what we had been working on for 10 years. 

Then I had this concern: what if the world is not ready for what we have? History is full of stories of people being ahead of their time. Take artists like Van Gogh, who died poor but whose paintings are now worth hundreds of millions of dollars.

I didn’t want to be that. I wanted to be appreciated for what we’ve done. So, there was a bit of fear and concern. I knew that what we had was really good, but what if the world was not ready for it? 

But our announcement was fantastic. There was this video that went viral with 2 million views. I remember waking up on the day of the announcement, and it had 10,000 views. Then, it grew to 100,000 an hour later, and to 2 million views by the end of the day.

So we saw that people were ready for it. After that, it brought a lot of adoption, customers, and funding. Our company grew by an order of magnitude. 

Q: This question is for young entrepreneurs as they’re developing technology that Google, Amazon, or Microsoft are working on. There is a lot of competition with these giants. What was your strategy? 

Keyvan: Just the idea of competing with Google, for example, was scary, not just for investors but also when you’re hiring people. They say, “Oh, you’re going up against Google. I’m not going to join you. You’re not going to succeed.” 

For the majority of our history, that was a problem. It’s not as much of a problem anymore for two reasons. One, we have already established ourselves as a major player and force with good technology and good adoption. The second reason is that a trend of rising disruptors has shown that it is possible to beat the legacy giants. 

When we started, everybody was afraid, but you just have to have the courage to do it. It is possible to compete with the giants and beat them. My advice is to not be afraid. If you think you have something to offer, just go after it.

Q: What’s your vision for the next 10 or 20 years? What major disruptors will there be?

Keyvan: AI and machine learning. If you look at what happened with the industrial revolution, machines became better than humans at doing heavy lifting, which transformed the industry. Before that, being physically fit and strong was a requirement to succeed.

Now, it’s still attractive, but not necessary. The same could happen with being smart and intelligent. It could be attractive, but machines will become smarter than humans, so it actually won’t be necessary to succeed. 

Machines will be able to solve previously unsolved problems. It’s a fascinating prediction, and there are many ways to think about how it could transform the world. If you could cure diseases or solve other problems at a faster pace, it could also be scary. So, we have to keep it under control to a certain extent, but I think that it will happen. There will be a new intellectual revolution where computers will become smarter than humans at solving problems.

About Iranian Students of California

Iranian Students of California is a nonprofit student-led organization dedicated to uniting and supporting students of Iranian heritage in their professional career paths. Find out more on their website here.
