Københavns Universitet - Det Natur- og Biovidenskabelige Fakultet

Coming soon – offline speech recognition on your phone


More than one in four people currently integrate speech recognition into their daily lives. A new algorithm developed by a University of Copenhagen researcher and his international colleagues makes it possible to interact with digital assistants like “Siri” without any internet connection. The innovation allows for speech recognition to be used anywhere, even in situations where security is paramount.

Woman in desert appears to speak to her phone
A new algorithm eliminates the need for an internet connection when using speech recognition on small devices such as smartphones and digital assistants. This means that talking to "Siri" will be possible in the middle of nowhere - or on the plane home. Photo: Getty Images

Talking to a computer was once the stuff of science fiction. Nowadays, saying “Hey Siri,” or addressing Alexa, Google or another digital assistant on a smartphone or other interactive gadget, has become commonplace. Yet speech recognition may play an even more important role in the future.

Studies suggest that one in four people already use these technologies on a regular basis. If predictions hold true, by 2025 the number of devices equipped with speech recognition will exceed the planet’s population. And the technology is still evolving.

Until now, speech recognition has relied upon a device being connected to the internet. This is because the algorithms typically used for this process require significant amounts of temporary random access memory (RAM), which is usually provided by powerful data center servers. Indeed, try switching your smartphone to airplane mode and see how far your voice commands get you. But change is in the air.

A new algorithm developed by Professor Panagiotis Karras from the University of Copenhagen’s Department of Computer Science, together with linguist Nassos Katsamanis of the Athena Research Center in Greece, and researchers from Aalto University in Finland and KTH in Sweden, allows even smaller devices like smartphones to decode speech without needing substantial memory—or internet access.

The code, recently presented in a scientific article, employs a clever strategy: it "forgets" what it doesn’t need in real-time.

“Speech recognition fundamentally works by matching the small speech sounds we use to form words and sentences, known as phonemes, with a library of corresponding sounds,” explains Panagiotis Karras. “Probabilities are calculated for matches and the subsequent combinations that go on to form our words and sentences. The most likely sequences are calculated and the software translates these sounds into text.”

Current algorithms require increased memory the longer one speaks as all alternative combinations must remain open until the final sound is analyzed. The new algorithm does away with this problem.

“The algorithm conceived by Panos and developed further by our team does something entirely new,” says co-developer and co-author Nassos Katsamanis. “Unlike the existing gold standard algorithm used since speech recognition’s early days, our algorithm only stores a fraction of the processing data, serving as a set of ‘coordinates.’ With these, an entire sequence can be reconstructed, which makes speech recognition possible with significantly less RAM.”

From Keywords to Entire Sentences

This maneuver may sound simple, but it involves an entirely new and unique code for which the researchers have sought a patent. The algorithm reduces the need for critical memory without sacrificing recognition quality. And though it requires slightly more time and computational power, the researchers say the difference is negligible given the processing power of modern devices.

Moreover, it works without an internet connection, enabling speech recognition anywhere, even in the depths of the Amazon jungle. The researchers hope that it could enable real-time language translation in the future as well.

Single words or very short sentences are generally manageable when current software needs to store alternative sequences and libraries of potential sound interpretations. However, as sentences become longer and potential word combinations more complex, the demand for RAM increases.

“Certain small devices can already recognize and act based upon a few words without internet connectivity. For example, a smart home system can recognize keywords such as "turn on" or "turn off". This is known as small-vocabulary speech recognition. With our algorithm, it will be possible to recognize more extensive instructions or, in principle, entire languages – without an internet connection. This is referred to as large-vocabulary speech recognition,” says Professor Karras.

Enhanced Inclusion, Security, and Energy Savings

According to the researchers, the invention opens up a range of possibilities, from practical, security-related and societal benefits to significant energy savings.

For instance, many people could benefit from the ability to translate foreign languages while traveling, regardless of internet access. This is one possibility that the researchers hope to achieve. But the societal impact of linguistic accessibility, both now and in the future, could be far more significant.

Nassos Katsamanis sees great promise in the technology: “This algorithm can help democratize language technology by making information more accessible. Making translation tools and speech assistants available regardless of internet access will allow more people to engage in society. In particular, it will help people without written language skills or those with physical disabilities, by enabling them to understand and influence societal decisions.”

Another key advantage of this speech recognition invention is its security implications. When security is paramount, the new algorithm addresses a significant problem: internet connections can be hacked. By eliminating the need for internet access, the algorithm enhances security.

Furthermore, while the energy used by data centers to support current speech recognition technology may be invisible to consumers, it is highly relevant in a world facing climate change. The growing demand for this technology, when met by this invention, could lead to significant energy savings by reducing the enormous need for temporary memory.

“It is vital to reduce energy consumption to minimize reliance on fossil fuels, as many data centers still use these energy sources,” concludes Professor Karras.

*

Facts: Phonemes

Phonemes are the smallest units of sound in a language that cannot be replaced without altering the meaning of what is spoken. According to the Danish Language Council, phonemes are “speech sounds with meaning-distinguishing functions.”

Speech recognition algorithms use phonemes as data units to recognize and process linguistic expressions by matching spoken sounds with text.


Facts: Speech Recognition Applications

Speech recognition software is widely used in all types of digital devices, particularly smartphones and home assistants like Siri, Google Assistant, and Amazon’s Alexa.

These tools can manage tasks ranging from controlling home lighting to communicating with refrigerators. They also expand the functionality of modern automobiles, allowing drivers to keep their hands on the wheel and eyes on the road.

This software is also crucial for transcription services, translation apps and language learning tools.


More info: A Linguistic Pathfinder

To understand how computers manage speech recognition, imagine solving a maze with a pencil.

Traditional algorithms approach speech recognition in much the same way, by exploring all possible paths and remembering every dead end until the maze is essentially memorized and the goal is reached. This process places a heavy load on temporary memory as it tracks thousands of probabilities.

Panagiotis Karras’s new algorithm uses a principle that halves the problem at every step. Instead of remembering the entire maze, it keeps track of key points, recalculating paths as needed. In speech recognition, these key points are phonemes, which are stored as "coordinates" to reconstruct the optimal sequence later. This dramatically reduces memory requirements while maintaining accuracy.

The gold standard for this method is an older algorithm called Viterbi. The process described above places demands on a computer's temporary RAM storage, as it must calculate and remember the probability for every possible position in the maze at every step along the way. This can result in the algorithm having to keep track of millions of probabilities should the maze be long enough.
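For readers who want to see this in code, here is a minimal Python sketch of the classic Viterbi approach. It is an illustration only: the two-state model, its probabilities and the function name are invented toy assumptions, not taken from the researchers' article. The point to notice is the `back` table, which gains one row for every sound that is processed.

```python
import math

def log_table(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Toy model (invented numbers): 2 hidden states, 3 possible observed sounds.
init = [math.log(0.6), math.log(0.4)]
trans = log_table([[0.7, 0.3], [0.4, 0.6]])            # trans[r][s]: r -> s
emit = log_table([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emit[s][o]

def viterbi(obs, init, trans, emit):
    """Classic Viterbi decoding.

    Returns the most likely state sequence and the number of
    backpointers that had to be kept in memory to recover it.
    """
    K = len(init)
    score = [init[s] + emit[s][obs[0]] for s in range(K)]
    back = []  # one row of K backpointers per time step after the first
    for o in obs[1:]:
        new_score, ptr = [], []
        for s in range(K):
            # Best predecessor r for landing in state s at this step.
            r = max(range(K), key=lambda r: score[r] + trans[r][s])
            ptr.append(r)
            new_score.append(score[r] + trans[r][s] + emit[s][o])
        score = new_score
        back.append(ptr)
    # Walk the stored backpointers from the best final state to the start.
    state = max(range(K), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, len(back) * K

path, stored = viterbi([0, 1, 2], init, trans, emit)
print(path, stored)  # [0, 0, 1] 4
```

With K states and T sounds, this version stores (T - 1) x K backpointers, so the memory it needs grows with the length of the utterance.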

Panagiotis's new algorithm employs a principle that continuously halves the problem. At every stretch along its path through the maze, it only remembers the midpoint. The result is a significantly reduced need for temporary memory, as these "midpoints" are recalculated before the final route is presented.

In speech recognition, these points are represented by phonemes – the smallest units of sound in text that are calculated as the best match for what is spoken at any given point in the sentence being analyzed. These phonemes and their probabilities are stored as something like coordinates along a path that the algorithm identifies as optimal, as it works to navigate between the first and last sounds in a sentence.

Ultimately, they can be used to reconstruct the entire "path" and provide the best possible interpretation of the spoken input as text.
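The midpoint principle can be sketched in Python as follows. This is not the researchers' patent-pending code, only a simplified illustration of the divide-and-conquer idea described above. The toy model, its probabilities and the names `forward_track` and `decode_low_memory` are assumptions made for this example.

```python
import math

def log_table(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Toy model (invented numbers): 2 hidden states, 3 possible observed sounds.
init = [math.log(0.6), math.log(0.4)]
trans = log_table([[0.7, 0.3], [0.4, 0.6]])            # trans[r][s]: r -> s
emit = log_table([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])   # emit[s][o]

def forward_track(obs, lo, hi, start_scores, anchor, trans, emit, end_state=None):
    """Forward pass over times lo..hi that keeps only the current scores and,
    for each state, the state its best partial path occupied at time `anchor`."""
    K = len(start_scores)
    score = list(start_scores)
    via = list(range(K))  # correct as-is when anchor == lo
    for t in range(lo + 1, hi + 1):
        new_score, new_via = [], []
        for s in range(K):
            r = max(range(K), key=lambda r: score[r] + trans[r][s])
            new_score.append(score[r] + trans[r][s] + emit[s][obs[t]])
            new_via.append(s if t == anchor else via[r])
        score, via = new_score, new_via
    if end_state is None:
        end_state = max(range(K), key=lambda s: score[s])
    return via[end_state], end_state

def decode_low_memory(obs, init, trans, emit):
    """Recover the most likely path by repeatedly pinning down midpoints."""
    T, K = len(obs), len(init)
    path = [None] * T
    start = [init[s] + emit[s][obs[0]] for s in range(K)]
    # One pass finds the first and last state of the best overall path.
    path[0], path[T - 1] = forward_track(obs, 0, T - 1, start, 0, trans, emit)

    def fill(lo, hi):
        if hi - lo < 2:
            return  # nothing left between two known points
        mid = (lo + hi) // 2
        pinned = [-math.inf] * K
        pinned[path[lo]] = 0.0  # the path must start in the known state
        path[mid], _ = forward_track(obs, lo, hi, pinned, mid,
                                     trans, emit, path[hi])
        fill(lo, mid)
        fill(mid, hi)

    fill(0, T - 1)
    return path

print(decode_low_memory([0, 1, 2], init, trans, emit))  # [0, 0, 1]
```

Apart from the output itself, each pass keeps only two short lists per state instead of a table that grows with the sentence. The price is that stretches of the input are processed more than once, which matches the trade-off of slightly more computation for much less memory.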

About the study

The following researchers have contributed to the project:

Martino Ciaperoni
Athanasios (Nassos) Katsamanis
Aristides Gionis
Panagiotis Karras


Contacts

Panagiotis Karras
Professor, Department of Computer Science, University of Copenhagen

Alternative email: piekarras@gmail.com

Tel: +45 91 41 64 69
Email: paka@di.ku.dk


ABOUT THE FACULTY OF SCIENCE

The Faculty of Science at the University of Copenhagen – or SCIENCE – is Denmark's largest science research and education institution.

The Faculty's most important task is to contribute to solving the major challenges facing the rapidly changing world with increased pressure on, among other things, natural resources and significant climate change, both nationally and globally.
