ASR transcription for auto-captioning
Mar 03, 2022

How ASR Transcriptions Are Elevating User Generated Content Experiences

As voice AI continues to expand into every aspect of our lives, social media platforms, such as Snapchat and TikTok, are looking for ways to improve the video-sharing experience for their subscribers. As part of this trend, these social media platform providers have added automated subtitles to their content creation platforms through ASR transcriptions. 

Closed captioning makes the videos created by their subscribers easier to access for viewers in noisy environments and in public places where sound would be inappropriate. Importantly, the availability of closed captioning for social media content takes us one step closer to closing the digital divide by making all audio content available to people with varying degrees of hearing impairment. 

The technology behind real-time closed captioning is Automatic Speech Recognition (ASR). Using advanced ASR, the audio portion of any video can be transcribed even as the content creator is speaking, so closed captions can be added anytime and anywhere without interrupting the creator's experience.

User-generated content platform providers have always understood the need for exceptional experiences and the benefits of frictionless, enjoyable interactions, including brand loyalty, recognition, and evangelists. ASR transcription services make it easier for users to create content that can be shared anytime and anywhere with anyone. The competition for subscribers to these platforms is fierce, and voice AI is helping social media platforms keep their competitive edge.

SoundHound provides auto-captioning for Snapchatter videos

SoundHound has expanded its partnership with Snap Inc. to provide auto-captioning services for Snapchatter videos. SoundHound’s advanced automatic speech recognition (ASR) software and its ability to convert speech to text in real time enable Snapchatters to automatically add a complete transcription of the audio portion of the Snaps they create.

SoundHound is helping Snapchat innovate with auto-captions, making content accessible to more people. The ongoing partnership to provide a variety of voice AI solutions is giving Snapchatters hands-free convenience, more options for creating and sharing content, and the ability to stay on the go while using the platform.

After a Snapchatter records a video, they simply tap the quote icon, found in the Toolbar on the right side of their screen. They can also edit the captions and drag, resize, or rotate them. From there, they can send Snaps to friends or add them to their Story.

Auto-captions allow Snapchatters to watch Snaps in noisy environments and still get a sense of the audio. Including auto-captions narrows the digital accessibility gap, while providing greater convenience for Snapchatters.

SoundHound’s partnership with Snap Inc. extends back to June 2018, when its music discovery app became a launch partner for Snap Kit, Snapchat’s third-party integration platform for iOS and Android. Within the SoundHound app, Snapchatters can create a customizable music Snap containing song details, an exclusive animation, innovative effects, and a playback link. More recently, the two companies announced the availability of Voice Scan that allows Snapchatters to quickly find Lenses by using their voice.

Social media looks to ASR transcription for speed and accuracy

When users read transcriptions, they judge the experience by accuracy and speed, even when the audio comes from a noisy environment or is in a different language. Accuracy is essential and can be the difference between a user understanding or misunderstanding the context of a video. If auto-captioning consistently produces incorrect transcriptions, users will eventually stop using it.
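To make "accuracy" concrete, one widely used measure for ASR output is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into a reference, divided by the number of words in the reference. The sketch below is illustrative only. The article does not say which metric SoundHound or Snap use, and the function name and sample sentences are assumptions made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name); lowercases and splits on
    whitespace, with no punctuation handling.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of five in the reference.
wer = word_error_rate("add captions to my snap", "add caption to my snap")
print(f"WER: {wer:.2f}")
```

A lower WER means fewer caption errors. A value of 0 means the transcript matches the reference word for word.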

ASR improves transcription accuracy through the integration of Natural Language Understanding (NLU) and through custom vocabularies for a specific industry or business, added on top of language libraries of millions of words. For example, if the audience is younger, media companies can ensure that Gen Z and Millennial slang is included.

With noise filtering and speaker identification, ASR transcription can also produce accurate results in noisy environments or for videos with more than one speaker. In real-world environments, there are background noises, music playing, interruptions, and echoes that can all be addressed through data augmentation, far-field recognition, and noisy environment filtering for a level of precision not previously possible. 
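As a rough illustration of the data augmentation mentioned above, a common approach is to mix recorded background noise into clean speech at a chosen signal-to-noise ratio (SNR), so that a model sees noisy examples during training. The sketch below assumes audio is a plain list of float samples. The function name `mix_at_snr` and the synthetic signals are hypothetical and are not taken from SoundHound's pipeline.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the target SNR in dB.

    Hypothetical helper for illustration; expects equal-length sample lists.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins: a 440 Hz tone for "speech", uniform random noise.
sample_rate = 16000
speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(1600)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(1600)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
print(len(noisy), "samples mixed at 10 dB SNR")
```

Lower SNR values simulate louder surroundings, so training examples can be generated across a range of values to cover quiet rooms through busy streets.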

ASR transcription can also handle multiple languages and accented speech with speed and accuracy. User-generated content platforms serve a wide range of demographics, and it’s vital that users are able to talk comfortably in their native language, an accented second language, or a regional speech variation and still be understood by the transcription service.

User-generated content platform companies are expanding their user experiences with voice AI technology for more intuitive, convenient, fast, and accessible interactions. The trend to voice-enable experiences is growing among a variety of media companies, including Netflix, Pandora, and Snap Inc.

At SoundHound Inc., we have all the tools and expertise needed to create custom voice assistants and a consistent brand voice. Explore SoundHound’s independent voice AI platform at SoundHound.com and register for a free account here. Want to learn more? Talk to us about how we can help bring your voice strategy to life.
