Sound2Sound (S2S) Search Science
Sound2Sound (S2S) Search Science is a revolutionary technology, created by SoundHound, that recognizes a range of sound inputs, including music and speech. It offers a breakthrough combination of speed and accuracy unattainable through traditional approaches to sound recognition.
S2S performs recognition by extracting features from the input signal and converting them to a compact and flexible Crystal representation. This Input Crystal is then matched against a database of Target Crystals which have been derived from searchable content.
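The extract-then-match flow described above can be pictured with a small Python sketch. It is an illustration only, not SoundHound's actual Crystal format or matching algorithm: the feature extractor (mean absolute amplitude per time slice), the cosine similarity measure, and the names `extract_features`, `cosine`, and `best_match` are all assumptions chosen for brevity.

```python
import math

def extract_features(samples, bands=4):
    """Toy stand-in for a Crystal: mean absolute amplitude in each of
    `bands` equal-width time slices (trailing samples are ignored)."""
    size = max(1, len(samples) // bands)
    return [
        sum(abs(x) for x in samples[i * size:(i + 1) * size]) / size
        for i in range(bands)
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match(input_crystal, target_crystals):
    """Return the name of the target whose features are most similar."""
    return max(
        target_crystals,
        key=lambda name: cosine(input_crystal, target_crystals[name]),
    )

# Hypothetical database of two targets, derived once from searchable content.
targets = {
    "song_a": extract_features([0.1, 0.1, 0.9, 0.9, 0.1, 0.1, 0.9, 0.9]),
    "song_b": extract_features([0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1]),
}
# A noisy query that resembles song_a.
query = extract_features([0.2, 0.1, 0.8, 1.0, 0.1, 0.2, 0.9, 0.8])
match = best_match(query, targets)
```

The point of the sketch is that the query and the targets are compared in the same feature space, so no intermediate conversion to another representation is needed.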
Target Crystals are automatically generated from a range of data formats including both audio data (such as recorded music and user voice recordings) and non-audio data. Traditional text-based content, for example, can be ingested by S2S, which uses synthesis techniques to derive audio-oriented features from the text.
In all cases, recognition is performed directly on the Crystal representation – matching “sound to sound” – avoiding the error-ridden conversions (such as from “sound to text”) present in traditional systems. Furthermore, S2S is fully parallelizable, so it can achieve speed and accuracy together rather than sacrificing one for the other.
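One way to picture a parallelizable search is to split the target database into shards, search each shard independently, and keep the best result overall. The sketch below shows only that structure, under assumptions: the squared-distance measure, the non-empty shard layout, and the function names are hypothetical, and threads in standard CPython do not speed up CPU-bound work, so a real system would spread shards across processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_shard(query, shard):
    """Return (distance, name) for the closest target in one non-empty shard."""
    return min((distance(query, vec), name) for name, vec in shard.items())

def parallel_search(query, shards):
    """Search every shard concurrently and return the best name overall."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda s: search_shard(query, s), shards))
    return min(results)[1]

# Hypothetical database split into two shards.
shards = [
    {"a": [0.0, 0.0], "b": [1.0, 1.0]},
    {"c": [5.0, 5.0], "d": [0.9, 1.1]},
]
winner = parallel_search([1.0, 1.05], shards)
```

Because each shard is searched exhaustively, adding shards shortens the wall-clock time of a search without skipping any candidates.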
S2S is a versatile technology that solves many different sound recognition problems. Examples include:
S2S Applied to Speech Recognition
S2S enables users to search text-based content with their voices, achieving game-changing accuracy and “instant” response times even when searching very large databases. S2S matches the user's voice to audio-oriented features extracted from the text database during a pre-processing synthesis phase, thus bypassing the brittle speech-to-text conversion phase employed by traditional engines.
S2S Applied to Music Identification
S2S has enabled SoundHound to create the world’s fastest music recognition system. Recognition takes place in real time, eliminating the latency perceived by the user. Using rich and flexible audio features, S2S can successfully identify songs using as little as 2.5 seconds of audio, even in noisy environments and when searching against millions of audio clips.
S2S Applied to Singing & Humming Search
S2S has enabled SoundHound to offer the industry’s only viable singing and humming recognition engine. S2S matches multiple aspects of the user's rendition (including melody, rhythm, and lyrics) with millions of user recordings from midomi.com. The matching technology works regardless of the user's key or tempo and takes advantage of lyrics (when present), independent of language.
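Matching that works regardless of key or tempo can be illustrated by describing a melody in relative terms: pitch intervals between consecutive notes do not change under transposition, and ratios between consecutive note durations do not change when the tempo is scaled. The Python sketch below is a generic textbook technique, not a description of how S2S is implemented. The `(midi_pitch, duration_seconds)` note format and the name `melody_signature` are assumptions.

```python
def melody_signature(notes):
    """Describe a melody by relative pitch and relative timing.

    `notes` is a list of (midi_pitch, duration_seconds) pairs.
    Returns (intervals, ratios): semitone steps between consecutive
    notes, and duration ratios between consecutive notes.
    """
    pairs = list(zip(notes, notes[1:]))
    intervals = [b[0] - a[0] for a, b in pairs]
    ratios = [round(b[1] / a[1], 3) for a, b in pairs]
    return intervals, ratios

# A reference rendition and a user rendition that is sung
# five semitones higher and at half the speed.
reference = [(60, 0.5), (62, 0.5), (64, 1.0), (60, 0.5)]
rendition = [(65, 1.0), (67, 1.0), (69, 2.0), (65, 1.0)]

same_tune = melody_signature(reference) == melody_signature(rendition)
```

A practical matcher would also need to tolerate wrong or missing notes, which an exact comparison like this one does not.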
S2S Applied to Text Search
S2S enhances SoundHound's text search by providing the ability to understand pronunciation, enabling correct results for a range of misspelled searches.
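A simple way to see how pronunciation can rescue a misspelled query is to index titles by a rough phonetic key, so that spellings that sound alike map to the same entry. The sketch below is a deliberately crude, hypothetical key (a handful of English letter substitutions, vowels dropped after the first letter, repeated letters collapsed) and is not the pronunciation model used by S2S. The names `phonetic_key`, `build_index`, and `search` are assumptions.

```python
import re

def phonetic_key(text):
    """Crude sound-alike key: normalize a few spellings, drop vowels
    after the first letter, and collapse repeated letters."""
    w = re.sub(r"[^a-z]", "", text.lower())
    for pattern, repl in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        w = w.replace(pattern, repl)
    if not w:
        return ""
    key = w[0] + re.sub(r"[aeiouyhw]", "", w[1:])
    return re.sub(r"(.)\1+", r"\1", key)

def build_index(titles):
    """Group titles under their phonetic key."""
    index = {}
    for title in titles:
        index.setdefault(phonetic_key(title), []).append(title)
    return index

def search(query, index):
    """Return every title whose key matches the query's key."""
    return index.get(phonetic_key(query), [])

index = build_index(["Metallica", "Beatles", "Nirvana"])
hits = search("Metalica", index)
```

Here the misspelled queries "Metalica" and "Beetles" resolve to the intended titles because they share a key with the correct spelling, while an unrelated query returns nothing.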



