Coming soon – offline speech recognition on your phone
More than one in four people currently integrate speech recognition into their daily lives. A new algorithm developed by a University of Copenhagen researcher and his international colleagues makes it possible to interact with digital assistants like “Siri” without any internet connection. The innovation allows for speech recognition to be used anywhere, even in situations where security is paramount.
Talking to a computer was once the stuff of science fiction. Nowadays, saying “Hey Siri” or addressing Alexa, Google Assistant or another digital assistant on a smartphone or other gadget has become commonplace. In the future, speech recognition may become even more important.
Studies suggest that one in four people already use these technologies regularly, and if predictions hold true, the number of devices equipped with speech recognition will exceed the planet’s population by 2025. The technology is also still evolving.
Until now, speech recognition beyond a handful of keywords has generally relied on an internet connection. This is because the algorithms typically used for the task require significant amounts of temporary working memory (random access memory, or RAM), which is usually provided by powerful servers in data centers. Indeed, try switching your smartphone to airplane mode and see how far your voice commands get you. But change is in the air.
A new algorithm developed by Professor Panagiotis Karras from the University of Copenhagen’s Department of Computer Science, together with linguist Nassos Katsamanis of the Athena Research Center in Greece and researchers from Aalto University in Finland and KTH in Sweden, allows even small devices such as smartphones to decode speech without substantial memory or an internet connection.
The algorithm, recently presented in a scientific article, uses a clever strategy: it “forgets” what it doesn’t need, in real time.
“Speech recognition fundamentally works by matching the small speech sounds we use to form words and sentences, known as phonemes, with a library of corresponding sounds,” explains Panagiotis Karras. “Probabilities are calculated for matches and the subsequent combinations that go on to form our words and sentences. The most likely sequences are calculated, and the software translates these sounds into text.”
Current algorithms need more memory the longer one speaks, because all alternative combinations must be kept open until the final sound has been analyzed. The new algorithm is designed to avoid this problem.
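The matching and probability calculation described above is commonly formalized as decoding a hidden Markov model, and the classic decoder is the Viterbi algorithm named later in this release. The sketch below is a minimal textbook Viterbi decoder, not the researchers' code; the three phoneme states (`"k"`, `"ae"`, `"t"`), the coarse acoustic labels (`"K"`, `"A"`, `"T"`) and every probability are invented for illustration. Note the `backptrs` list: it gains one entry per frame of speech, which is why memory grows the longer one speaks.

```python
import math

# Hypothetical toy model (invented values, not from the study):
# three phoneme states and three coarse acoustic labels.
STATES = ("k", "ae", "t")


def _logs(table):
    """Convert probabilities to log-probabilities to avoid numeric underflow."""
    return {key: math.log(p) for key, p in table.items()}


LOG_START = _logs({"k": 0.8, "ae": 0.1, "t": 0.1})
LOG_TRANS = {s: _logs(row) for s, row in {
    "k": {"k": 0.05, "ae": 0.9, "t": 0.05},
    "ae": {"k": 0.05, "ae": 0.5, "t": 0.45},
    "t": {"k": 0.05, "ae": 0.05, "t": 0.9},
}.items()}
LOG_EMIT = {s: _logs(row) for s, row in {
    "k": {"K": 0.7, "A": 0.1, "T": 0.2},
    "ae": {"K": 0.1, "A": 0.8, "T": 0.1},
    "t": {"K": 0.2, "A": 0.1, "T": 0.7},
}.items()}


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best log-probability, most likely state sequence) for `obs`."""
    # score[s]: log-probability of the best path ending in state s so far.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    backptrs = []  # one dict per frame: state -> best predecessor state
    for o in obs[1:]:
        new_score, bp = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][o]
            bp[s] = prev
        score = new_score
        backptrs.append(bp)  # this table grows with every frame of speech
    # Trace back from the best final state through the stored backpointers.
    last = max(states, key=score.get)
    path = [last]
    for bp in reversed(backptrs):
        path.append(bp[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    best, path = viterbi(["K", "A", "A", "T"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
    print(path)  # ['k', 'ae', 'ae', 't']
```

For a T-frame utterance and K states, `backptrs` ends up holding T × K entries, all of which must stay in memory until the trace-back at the very end.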
“The algorithm, conceived by Panos and developed further by our team, does something entirely new,” says co-developer and co-author Nassos Katsamanis. “Unlike the existing gold standard algorithm used since speech recognition’s early days, our algorithm stores only a fraction of the processing data, serving as a set of ‘coordinates.’ With these, an entire sequence can be reconstructed, which makes speech recognition possible with significantly less RAM.”
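One well-known, generic way to keep only a fraction of the processing data and still rebuild the whole sequence is checkpointing: store the score column only at every `stride`-th frame, then, while tracing back, recompute the backpointers for one short segment at a time. The sketch below illustrates that memory-for-recomputation trade-off on an invented toy model. It is not the patented algorithm, whose details this release does not give; `checkpoint_viterbi`, `_step` and `stride` are hypothetical names chosen for the example.

```python
import math

# Hypothetical toy model (invented values, not from the study).
STATES = ("k", "ae", "t")


def _logs(table):
    return {key: math.log(p) for key, p in table.items()}


LOG_START = _logs({"k": 0.8, "ae": 0.1, "t": 0.1})
LOG_TRANS = {s: _logs(row) for s, row in {
    "k": {"k": 0.05, "ae": 0.9, "t": 0.05},
    "ae": {"k": 0.05, "ae": 0.5, "t": 0.45},
    "t": {"k": 0.05, "ae": 0.05, "t": 0.9},
}.items()}
LOG_EMIT = {s: _logs(row) for s, row in {
    "k": {"K": 0.7, "A": 0.1, "T": 0.2},
    "ae": {"K": 0.1, "A": 0.8, "T": 0.1},
    "t": {"K": 0.2, "A": 0.1, "T": 0.7},
}.items()}


def _step(score, o, states, lt, le):
    """Advance one frame; return the new score column and best predecessors."""
    new, bp = {}, {}
    for s in states:
        prev = max(states, key=lambda p: score[p] + lt[p][s])
        new[s] = score[prev] + lt[prev][s] + le[s][o]
        bp[s] = prev
    return new, bp


def checkpoint_viterbi(obs, states, ls, lt, le, stride):
    """Viterbi decoding that stores score columns only every `stride` frames."""
    T = len(obs)
    # Forward pass: no backpointers kept, only periodic "checkpoint" columns.
    score = {s: ls[s] + le[s][obs[0]] for s in states}
    checkpoints = {0: score}
    for t in range(1, T):
        score, _ = _step(score, obs[t], states, lt, le)
        if t % stride == 0:
            checkpoints[t] = score
    path = [None] * T
    path[-1] = max(states, key=score.get)
    best = score[path[-1]]
    # Backward pass: recompute one segment at a time, newest segment first.
    end = T - 1
    while end > 0:
        start = (end - 1) // stride * stride  # last checkpoint before `end`
        col, bps = checkpoints[start], []
        for t in range(start + 1, end + 1):
            col, bp = _step(col, obs[t], states, lt, le)
            bps.append(bp)  # bps[i] belongs to frame start + 1 + i
        for t in range(end, start, -1):
            path[t - 1] = bps[t - start - 1][path[t]]
        end = start
    return best, path


if __name__ == "__main__":
    obs = ["K", "A", "A", "A", "T", "T"]
    print(checkpoint_viterbi(obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT, 2)[1])
    # ['k', 'ae', 'ae', 'ae', 't', 't']
```

Because each segment is recomputed from an exact stored column, the result matches full Viterbi decoding; with `stride` near √T, the decoder holds on the order of √T columns instead of T, at the cost of roughly one extra forward pass.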
From Keywords to Entire Sentences
This maneuver may sound simple, but it involves an entirely new algorithm, for which the researchers have applied for a patent. The algorithm reduces the need for working memory (RAM) without sacrificing recognition quality. Although it requires slightly more time and computing power, the researchers say the difference is negligible given the capabilities of modern devices.
Moreover, it works without an internet connection, enabling speech recognition anywhere, even in the depths of the Amazon jungle. In the future, the researchers hope, it could also enable real-time language translation.
Because current software must store alternative sequences along with libraries of possible sound interpretations, it can generally handle single words or very short sentences. As sentences grow longer and the possible word combinations more complex, however, the demand for RAM increases.
“Certain small devices can already recognize and act based upon a few words without internet connectivity. For example, a smart home system can recognize keywords such as "turn on" or "turn off". This is known as small-vocabulary speech recognition. With our algorithm, it will be possible to recognize more extensive instructions or, in principle, entire languages – without an internet connection. This is referred to as large-vocabulary speech recognition,” says Professor Karras.
Enhanced Inclusion, Security, and Energy Savings
According to the researchers, the invention opens up a range of possibilities, from practical, security-related and societal benefits to significant energy savings.
For instance, many people could benefit from being able to translate foreign languages while traveling, regardless of internet access, and this is one goal the researchers hope to reach. But the societal impact of linguistic accessibility, both now and in the future, could be far more significant.
Nassos Katsamanis sees great promise in the technology: “This algorithm can help democratize language technology by making information more accessible. Making translation tools and speech assistants available regardless of internet access will allow more people to engage in society. In particular, it will help people without written language skills or those with physical disabilities, by enabling them to understand and influence societal decisions.”
Another key advantage of the new algorithm concerns security. Where security is paramount, it addresses a significant problem: internet connections can be hacked. By removing the need for an internet connection, the algorithm removes that point of attack.
Furthermore, while the energy used by data centers to support current speech recognition technology may be invisible to consumers, it is highly relevant in a world facing climate change. If the growing demand for this technology is met with the new algorithm, it could yield significant energy savings by reducing the enormous need for temporary memory.
“It is vital to reduce energy consumption to minimize reliance on fossil fuels, as many data centers still use these energy sources,” concludes Professor Karras.
*
Facts: Phonemes
Phonemes are the smallest units of sound in a language that distinguish one word from another: replacing a single phoneme can change the meaning of what is spoken, as in “bat” versus “pat.” According to the Danish Language Council, phonemes are “speech sounds with meaning-distinguishing functions.”
Speech recognition algorithms use phonemes as data units to recognize and process linguistic expressions by matching spoken sounds with text.
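To picture phonemes as data units, a recognizer's pronunciation lexicon can be thought of as a table from words to phoneme sequences. The tiny lexicon and the ARPAbet-style symbols below are illustrative assumptions, not the researchers' data, and `phonemes_to_word` and `is_minimal_pair` are hypothetical helpers. The minimal-pair check mirrors the definition above: two words that differ in exactly one phoneme.

```python
# Hypothetical toy pronunciation lexicon using ARPAbet-style phoneme symbols.
LEXICON = {
    "bat": ("B", "AE", "T"),
    "pat": ("P", "AE", "T"),
    "cat": ("K", "AE", "T"),
    "cap": ("K", "AE", "P"),
}
# Reverse index: phoneme sequence -> word, used to turn decoded sounds into text.
PRON_TO_WORD = {pron: word for word, pron in LEXICON.items()}


def phonemes_to_word(phonemes):
    """Return the word whose pronunciation matches exactly, or None."""
    return PRON_TO_WORD.get(tuple(phonemes))


def is_minimal_pair(w1, w2):
    """True if the two words differ in exactly one phoneme position."""
    a, b = LEXICON[w1], LEXICON[w2]
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


if __name__ == "__main__":
    print(phonemes_to_word(["K", "AE", "T"]))  # cat
    print(is_minimal_pair("bat", "pat"))       # True
```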
Facts: Speech Recognition Applications
Speech recognition software is widely used in many types of digital devices, particularly smartphones and home assistants, through services such as Siri, Google Assistant and Amazon’s Alexa.
These tools can manage tasks ranging from controlling home lighting to communicating with refrigerators. They also expand the functionality of modern automobiles, allowing drivers to keep their hands on the wheel and eyes on the road.
This software is also crucial for transcription services, translation apps and language learning tools.
More info: A Linguistic Pathfinder
To understand how computers manage speech recognition, imagine solving a maze with a pencil.
Traditional algorithms approach speech recognition in much the same way: they explore all possible paths and remember every dead end until the maze is essentially memorized and the goal is reached. This places a heavy load on temporary memory, as the algorithm tracks thousands of probabilities.
Panagiotis Karras’s new algorithm uses a principle that halves the problem at every step. Instead of remembering the entire maze, it keeps track of key points, recalculating paths as needed. In speech recognition, these key points are phonemes, which are stored as "coordinates" to reconstruct the optimal sequence later. This dramatically reduces memory requirements while maintaining accuracy.
The gold standard for this exhaustive approach is an older algorithm called Viterbi. It places heavy demands on a computer's temporary memory (RAM), because it must calculate and remember the probability of every possible position in the maze at every step along the way. If the maze is long enough, the algorithm may have to keep track of millions of probabilities.
Panagiotis Karras's new algorithm instead employs a principle that repeatedly halves the problem. For each stretch of its path through the maze, it remembers only the midpoint. The result is a significantly reduced need for temporary memory, as the route between these "midpoints" is recalculated before the final route is presented.
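To make "millions of probabilities" concrete, here is a back-of-the-envelope count for a full Viterbi decoder that keeps one backpointer per state per frame. The inputs are assumptions chosen for illustration: 100 frames per second (a 10 ms frame shift, common in speech recognition but not stated in this release) and a hypothetical 5,000 decoding states; `viterbi_table_entries` is a hypothetical helper.

```python
def viterbi_table_entries(seconds, frames_per_second, num_states):
    """Entries a full Viterbi decoder keeps: one backpointer per state per frame."""
    frames = round(seconds * frames_per_second)
    return frames * num_states


if __name__ == "__main__":
    # Assumed figures: 100 frames/s (10 ms shift) and a hypothetical 5,000 states.
    print(viterbi_table_entries(10, 100, 5000))  # 5000000
    print(viterbi_table_entries(20, 100, 5000))  # 10000000
```

The count grows linearly with the length of the utterance: doubling the speaking time doubles the table.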
In speech recognition, these points are represented by phonemes, the smallest units of sound in a language, each calculated as the best match for what is spoken at a given point in the sentence being analyzed. These phonemes and their probabilities are stored as something like coordinates along the path that the algorithm identifies as optimal as it navigates between the first and last sounds of a sentence.
Ultimately, they can be used to reconstruct the entire "path" and provide the best possible interpretation of the spoken input as text.
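The halving idea resembles a classic divide-and-conquer trick, known from Hirschberg's linear-space algorithm for sequence alignment, that can also be applied to Viterbi decoding: run a forward pass that keeps only one score column plus, for each state, where the best path stood at the midpoint frame; fix that midpoint; then solve the two halves recursively. The sketch below applies that generic technique to an invented toy model. It is offered as an illustration under those assumptions, not as Karras's algorithm, and `midpoint_viterbi` and `_step` are hypothetical names.

```python
import math

NEG_INF = float("-inf")

# Hypothetical toy model (invented values, not from the study).
STATES = ("k", "ae", "t")


def _logs(table):
    return {key: math.log(p) for key, p in table.items()}


LOG_START = _logs({"k": 0.8, "ae": 0.1, "t": 0.1})
LOG_TRANS = {s: _logs(row) for s, row in {
    "k": {"k": 0.05, "ae": 0.9, "t": 0.05},
    "ae": {"k": 0.05, "ae": 0.5, "t": 0.45},
    "t": {"k": 0.05, "ae": 0.05, "t": 0.9},
}.items()}
LOG_EMIT = {s: _logs(row) for s, row in {
    "k": {"K": 0.7, "A": 0.1, "T": 0.2},
    "ae": {"K": 0.1, "A": 0.8, "T": 0.1},
    "t": {"K": 0.2, "A": 0.1, "T": 0.7},
}.items()}


def _step(score, o, states, lt, le):
    """Advance one frame; return the new score column and best predecessors."""
    new, bp = {}, {}
    for s in states:
        prev = max(states, key=lambda p: score[p] + lt[p][s])
        new[s] = score[prev] + lt[prev][s] + le[s][o]
        bp[s] = prev
    return new, bp


def midpoint_viterbi(obs, states, ls, lt, le):
    """Best state path without ever storing a full T x K backpointer table."""
    T = len(obs)
    path = [None] * T

    def solve(lo, hi, s_lo, s_hi):
        # Fill path[lo..hi] given the state at hi; s_lo is None only when lo == 0.
        if hi - lo == 1:
            if s_lo is None:  # free start: best first state leading into s_hi
                s_lo = max(states, key=lambda s: ls[s] + le[s][obs[0]] + lt[s][s_hi])
            path[lo], path[hi] = s_lo, s_hi
            return
        mid = (lo + hi) // 2
        if s_lo is None:
            score = {s: ls[s] + le[s][obs[lo]] for s in states}
        else:  # path is pinned to s_lo at frame lo
            score = {s: 0.0 if s == s_lo else NEG_INF for s in states}
        at_mid = None  # at_mid[s]: state at frame `mid` on the best path ending in s
        for t in range(lo + 1, hi + 1):
            score, bp = _step(score, obs[t], states, lt, le)
            if t == mid:
                at_mid = {s: s for s in states}
            elif t > mid:
                at_mid = {s: at_mid[bp[s]] for s in states}
        m = at_mid[s_hi]  # the remembered midpoint
        solve(lo, mid, s_lo, m)
        solve(mid, hi, m, s_hi)

    # One pass holding a single score column finds the best final state.
    score = {s: ls[s] + le[s][obs[0]] for s in states}
    for o in obs[1:]:
        score, _ = _step(score, o, states, lt, le)
    last = max(states, key=score.get)
    if T == 1:
        path[0] = last
    else:
        solve(0, T - 1, None, last)
    return path


if __name__ == "__main__":
    obs = ["K", "A", "A", "A", "T", "T"]
    print(midpoint_viterbi(obs, STATES, LOG_START, LOG_TRANS, LOG_EMIT))
    # ['k', 'ae', 'ae', 'ae', 't', 't']
```

Each forward pass keeps memory proportional to the number of states rather than to the length of the utterance, while the recursion costs about log2(T) times as much computation as a single pass: a small trade of time for memory of the kind the researchers describe.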
About the study
The following researchers have contributed to the project:
Martino Ciaperoni
Athanasios (Nassos) Katsamanis
Aristides Gionis
Panagiotis Karras
Contacts
Panagiotis Karras, Professor, Department of Computer Science, University of Copenhagen
Alternative email: piekarras@gmail.com
Kristian Bjørn-Hansen, Journalist and Press Contact, Faculty of Science, Copenhagen University
Tel: +45 93516002, kbh@science.ku.dk
Athanasios (Nassos) Katsamanis, Principal Researcher, Institute for Language and Speech Processing, Athena Research Center, Greece
Tel: +30 210 6875405, nkatsam@athenarc.gr
ABOUT THE FACULTY OF SCIENCE
The Faculty of Science at the University of Copenhagen – or SCIENCE – is Denmark's largest science research and education institution.
The Faculty's most important task is to help solve the major challenges facing a rapidly changing world, including growing pressure on natural resources and significant climate change, both nationally and globally.