The changing nature of language and Automatic Speech Recognition

Language, whether spoken or written, is an ever-evolving phenomenon: new words and phrases are regularly introduced into its vocabulary. In modern Homo sapiens history, language first evolved roughly 50,000–150,000 years ago, and since then more than 7,000 distinct spoken languages have been established. This versatility and adaptability make it difficult for Automatic Speech Recognition (ASR) service providers to keep their language models current with real-world terminology and usage.

The need for ASR service providers to update their language models

ASR providers regularly introduce updated features and train models on a variety of data sets to increase recognition accuracy and to ensure that this accuracy holds up in real-world conditions. An ASR service should, for instance, be able to function and provide robust support at all times.

There will, however, always be a lag between the time it takes to identify new terms in a spoken language and the time it takes to add them to training data, retrain language models, and make the freshly trained models available to users.

Words gaining popularity, mainly through social media

Some words or linguistic elements evolve organically over time, and this has little bearing on how accurately they can be recognized. Words like “selfie,” “hashtag,” and “reels” are added to speech recognition language models naturally as they appear in the data used to retrain and improve language data sets, and are then released as part of regular product updates.

These words gradually become part of the language and often grow in popularity, but they are still used relatively infrequently. The impact of such newly coined words and phrases not being immediately added to an ASR language model is minimal, as their exclusion is unlikely to raise word error rates noticeably.

For example, if the word ‘selfie’ appeared once within a 1,000-word transcript and was the only word transcribed incorrectly, the resulting word error rate would be just 0.1% (1 error in 1,000 words). Compared with the time and money required to update the language model for this word (or a handful of words), resources that could instead be used to add support for a new language or supply new feature capability, this impact is insignificant.
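The arithmetic above can be sketched as a quick word error rate calculation. This is a minimal illustration, not any particular provider’s scoring code: it computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the reference length, which is the standard definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over word tokens
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One mistranscribed word in a 1,000-word transcript -> WER of 0.1%
reference = " ".join(["word"] * 999 + ["selfie"])
hypothesis = " ".join(["word"] * 999 + ["selfy"])
print(word_error_rate(reference, hypothesis))  # 0.001
```

Libraries such as jiwer implement the same measure for production use; the point here is simply that one unknown word in a long transcript moves the metric very little.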

Collecting the correct data

Speech data is used to train language models. However, ongoing examination of this training data is necessary to ensure that language models remain current. It is also crucial that the data used is of the proper quality.

The problem of vocabulary and word diversity might seem solvable by simply throwing more data at it, but it is not that simple. This big-data strategy risks introducing bias or uncleansed, low-quality data, both of which would have a detrimental impact on the transcribed output delivered to clients.

Obtaining high-quality data presents another challenge for ASR providers. Although some service providers rely on mining client data from their cloud services, savvy customers and businesses alike are growing increasingly concerned about how their personal data is utilised.

Words gaining unprecedented popularity: The case of COVID-19

In some instances, words or phrases are coined and become widely used almost overnight. The “Brexit” vote and, more recently, “coronavirus” and “COVID-19” are prime examples. Given their importance, these terms are used exponentially more frequently within a brief period of time.

These phrases frequently have broad applications and are relevant across a wide variety of channels. Such terms start to dominate conversations across platforms, from broadcast media to social media and contact centres. They not only appear regularly but also appear prominently within a piece of text, such as in headlines. When one of these terms is transcribed incorrectly, as opposed to a less frequently used word, the effect on word error rate is significantly more noticeable, and the impact on end users significantly greater.

Disclaimer: The opinions expressed here by the bloggers and those providing comments are theirs alone and do not reflect the opinions of Crimson Interactive or any employee thereof. Crimson Interactive is not responsible for the accuracy of any of the information supplied by the bloggers. While every caution has been taken to provide readers with the most accurate information and honest analysis, please use your discretion before taking any decisions based on the information in this blog. The author will not compensate you in any way whatsoever if you ever happen to suffer a loss, inconvenience, or damage because of, or while making use of, the information in this blog.

Copyright © 2023 Understanding Transcription | All Rights Reserved.