The changing nature of language and Automatic Speech Recognition

Language, whether spoken or written, is an ever-evolving phenomenon: new words and phrases are regularly introduced into its vocabulary. In modern Homo sapiens history, language first evolved roughly 50,000–150,000 years ago, and since then more than 7,000 distinct spoken languages have been established. This versatility and adaptability make it difficult for Automatic Speech Recognition (ASR) service providers to keep their language models current with real-world terminology and usage.

The need for ASR service providers to update their language models

ASR providers regularly introduce updated features and train models on a variety of data sets to increase recognition accuracy and to ensure that this accuracy holds up in real-world conditions. An ASR service should, for instance, be able to function and provide robust support at all times.

There will, however, always be a lag between the time it takes to identify new terms in a spoken language and the time it takes to add them to training data, retrain language models, and make the freshly trained models available to users.

Words gaining popularity, mainly through social media

Some words or linguistic elements evolve organically over time, and this has little bearing on how accurately they can be recognized. Words like “selfie,” “hashtag,” and “reels” are added to speech recognition language models naturally as they appear in the data used to retrain and improve language data sets, and are then released as part of regular product updates.

These words gradually become part of the language and often grow in popularity, but they are still used relatively infrequently. The impact of such newly coined words and phrases not being immediately added to an ASR language model is minimal, as their exclusion is unlikely to raise word error rates noticeably.

For example, if the word ‘selfie’ appeared once within a 1,000-word transcript and was the only word transcribed incorrectly, the resulting word error rate would be just 0.1% (1 error in 1,000 words). Compared with the time and money required to update the language model for this word (or a handful of words), resources that could instead be used to add support for a new language or supply new feature capability, this impact is insignificant.
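The arithmetic above can be sketched as a quick word error rate calculation. This is a minimal illustration, not any particular provider’s scoring code: it computes WER as the word-level edit distance (substitutions, insertions, deletions) divided by the reference length, which is the standard definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over word tokens
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One mistranscribed word in a 1,000-word transcript -> WER of 0.1%
reference = " ".join(["word"] * 999 + ["selfie"])
hypothesis = " ".join(["word"] * 999 + ["selfy"])
print(word_error_rate(reference, hypothesis))  # 0.001
```

Libraries such as jiwer implement the same measure for production use; the point here is simply that one unknown word in a long transcript moves the metric very little.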

Collecting the correct data

Speech data is used to train language models. However, ongoing examination of this training data is necessary to ensure that language models remain current. It is also crucial that the data used is of the proper quality.

The problem of vocabulary and word diversity might seem solvable by simply throwing more data at it, but it is not that simple. This big-data strategy risks introducing bias or uncleansed, low-quality data, both of which would have a detrimental impact on the transcribed output delivered to clients.

Obtaining high-quality data presents another challenge for ASR providers. Although some service providers rely on mining client data from their cloud services, savvy customers and businesses alike are growing increasingly concerned about how their personal data is utilised.

Words gaining unprecedented popularity: The case of COVID-19

In some instances, words or phrases are coined and become widely used almost overnight. The “Brexit” vote and, more recently, “coronavirus” and “COVID-19” are prime examples. Given their importance, these terms are used exponentially more frequently within a brief period of time.

These phrases frequently have broad applications and are relevant across a wide variety of channels. Such terms start to dominate conversations across platforms, from broadcast media to social media and contact centres. They not only appear regularly but also appear prominently within a piece of text, such as in headlines. When one of these terms is transcribed incorrectly, as opposed to a less frequently used word, the effect on word error rate is significantly more noticeable, and the impact on end users significantly greater.

Disclaimer: The opinions expressed here by the bloggers and those providing comments are theirs alone and do not reflect the opinions of Crimson Interactive or any employee thereof. Crimson Interactive is not responsible for the accuracy of any of the information supplied by the bloggers. While every caution has been taken to provide readers with the most accurate information and honest analysis, please use your discretion before taking any decisions based on the information in this blog. The author will not compensate you in any way whatsoever if you ever happen to suffer a loss, inconvenience, or damage because of, or while making use of, the information in this blog.

Copyright © 2023 Understanding Transcription | All Rights Reserved.