Text to Speech in Gaming
Nov 23, 2020

How Voice AI and Text-to-Speech are Redefining the Gaming Industry Today

Video gaming has reached a new level of popularity as people turn to online gaming to fill the entertainment void created by the Covid-19 pandemic. According to NPD Group, gaming sales in the U.S. reached $29.34 billion in August 2020, up 23% from the same period in 2019. In other words, gaming is hot and is becoming the number one segment in the entertainment industry.

Gaming industry growth has also been spurred by new hardware releases, major acquisitions, and ultra-popular franchises, including Fortnite, Call of Duty, Pokémon, and others. That growth has also intensified competition among independent game producers looking for ways to capitalize on gaming trends while staying mindful of spend and the bottom line.

The most common application of Text-to-Speech (TTS) in the gaming environment is to add voices to characters or to use TTS during game prototyping. But as voice AI continues to evolve, more and more game producers are looking for ways to use TTS to make gaming more accessible and to add functionality not available through a single interface.

Advanced TTS adds voice to characters in games

As recently as a few years ago, the quality of TTS solutions did not meet the exacting standards of highly produced, big-budget games. At the time, digital voices could not accurately represent the various character types in a gaming scenario or differentiate the tone and expression of an evil character from a silly one. The robotic nature of the synthetic voices quickly earned TTS a bad reputation that persists among many game makers.

Just as video games have evolved toward more realistic movement and interactions between characters, TTS technologies have matured. Today’s neural voices deliver the character nuances previously available only from human actors voicing the character conversations. Emotion, laughter, and other paralinguistic sounds and expressions can be added to bring gaming characters to life and mimic the speaking style of the human voice.

Neural voices provide the immersive experience games demand while giving game producers a cost-effective alternative to hiring top voice talent or even Hollywood actors to voice game characters. Text-to-speech character voices can impart emotion, adopt a specific speaking style, and represent nuances in character personality to give life to stories and games.

Once a voice character has been created, that same digital file can be saved for use in future games that use voice recognition or as the foundation for creating other characters and story narration. With just a few tweaks to remove a specific speaking filter, the same voice can be altered from an ‘evil’ speaking character and made softer to become the hero, for example. 
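As a rough illustration of reusing one saved character voice as the basis for another, the Python sketch below stores a voice as a small profile, writes it to JSON, and derives a softer "hero" variant from the "evil" original. Everything in it is an assumption made for illustration: the VoiceProfile fields, the character names, and the "voice-a" identifier are hypothetical and do not correspond to any particular TTS product.

```python
import json
from dataclasses import dataclass, asdict, replace


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical description of a character voice (not a real TTS API)."""
    name: str        # character name
    base_voice: str  # hypothetical identifier of the underlying digital voice
    pitch: str       # SSML-style label such as "low" or "medium"
    rate: str        # SSML-style label such as "slow" or "medium"
    style: str       # hypothetical speaking-style tag


# The original "evil" character voice.
villain = VoiceProfile("Morgrath", "voice-a", pitch="low", rate="slow", style="menacing")

# Save the profile so a later project can load it again.
saved = json.dumps(asdict(villain))
restored = VoiceProfile(**json.loads(saved))

# Derive a hero from the same base voice by changing only a few settings.
hero = replace(restored, name="Aldric", pitch="medium", rate="medium", style="warm")

print(saved)
print(hero)
```

Keeping the profile immutable and deriving new characters with replace() leaves the saved original untouched, so the same base voice can seed any number of variations.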

TTS to prototype game dialogue

Even when live voice actors are used to create the final game dialogue, TTS can be used to reduce production costs and time. Instead of waiting to make adjustments to the script during the voice acting sessions, TTS can be used to test scripts for flaws in the dialogue or narration.

Using TTS in the prototyping phase of game development allows designers and producers to rapidly swap lines of dialogue and listen to variations in real time to ensure that they accurately represent the character, scene, scenario, or story. Making adjustments in the prototyping phase allows designers to fine-tune the storyline without the pressures and constraints of the studio voiceover environment.

Multilingual and other variations in TTS capabilities allow designers to have scripts “read” in different languages using multiple genders to ensure that dialogue is consistent across audiences and demographics. Using TTS as a prototyping tool can significantly reduce the time and money spent in production and help get the game to market sooner.
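One way to picture this kind of rapid line swapping is to generate a batch of candidate lines as SSML (the W3C Speech Synthesis Markup Language) and hand each one to whatever TTS engine the team uses. The sketch below only builds the markup and does not call any engine; the function names build_ssml and prototype_variants, the sample lines, and the character label are hypothetical, and individual engines may support only part of SSML.

```python
from xml.sax.saxutils import escape


def build_ssml(line, rate="medium", pitch="medium", lang="en-US"):
    """Wrap one line of dialogue in a minimal SSML document.

    rate and pitch use standard SSML prosody labels, for example
    "slow", "medium", "fast" and "low", "medium", "high".
    """
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">'
        f'<prosody rate="{rate}" pitch="{pitch}">{escape(line)}</prosody>'
        '</speak>'
    )


def prototype_variants(character, variants, **prosody):
    """Return (label, ssml) pairs, one per candidate line for a character."""
    return [
        (f"{character}#{i}", build_ssml(text, **prosody))
        for i, text in enumerate(variants, start=1)
    ]


if __name__ == "__main__":
    # Hypothetical candidate lines for the same moment in a scene.
    candidates = [
        "You should not have come here.",
        "You were warned not to come here.",
        "Leave. Now.",
    ]
    for label, ssml in prototype_variants("villain", candidates, rate="slow", pitch="low"):
        # In a real pipeline each ssml string would go to the TTS engine.
        print(label, ssml)
```

Because each variant is plain text until it is synthesized, a writer can edit the list of candidate lines and listen again without booking studio time.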

Voice AI and TTS provide greater accessibility and functionality

The promise of voice AI and TTS to bring greater convenience and ease to human interactions with machines has reached the gaming industry. Gameplay instructions that have historically been provided through a combination of graphic and text cues can now be expressed through voice, making games more accessible to individuals who cannot read the on-screen text or do not read the language it is written in.

In addition, TTS can help bring the story to life. Instead of simply providing the storyline in text form, game producers can use natural-sounding digital voices to narrate the storyline and further set the stage with the appropriate emotion and emphasis.

The growing demand for greater interactivity in the gaming community will continue to be a driving force for game designers and producers to use TTS in more ways in the future. Gamers can access more functionality if they are able to use a combination of voice and keyboard or controller input. Game producers who are able to enhance the gaming experience through voice will quickly take the lead in a highly competitive market.

New neural voices that both save time and give character voices a lifelike quality will drive the adoption of TTS in the gaming industry. Using the improved TTS technologies of today, developers can bring games to market faster while reducing costs.

To learn more about how TTS can voice your games and to hear some of the latest gaming library voice samples, contact ReadSpeaker.

Nate Murray

Nate Murray is the North American Marketing Director for ReadSpeaker. Nate has over ten years of experience building SaaS brands, and as a native New Englander enjoys skiing, hiking, and spending time on the coast.
