Companies are addressing issues related to language with AI

January 19, 2021

Artificial intelligence (AI) is reshaping not only human lives but entire industries. Its tools, like deep learning, can increasingly teach themselves how to perform complex tasks: self-driving cars are about to hit the streets, and diseases are being treated with the help of AI technology.

Yet despite these impressive advances, one fundamental capability remains elusive: language. Systems like Siri, Amazon’s Alexa and IBM’s Watson can follow simple spoken or typed commands and answer basic questions, but they can’t hold a conversation and have no real understanding of the words they use. If artificial intelligence is to be truly transformative, this must change.

Humans can find it hard to communicate even when they come from the same city or country and speak the same language. The variety of accents and dialects within a single language can be massive, and understanding them all is challenging even for a human being. The same applies to automatic speech recognition (ASR) technology: the engine is required to understand a range of accents, dialects, and even slang within a single language. Whether as a human or an ASR engine, one needs to understand what is being said to get value out of what people are saying.

Accents and dialects add an extra barrier to communication. For ASR technology, speech needs to be understood and acted on simply and easily. The challenge for speech technology is to break down the language barrier and deliver understanding, context, and value to a conversation or speaker.

Some speech recognition companies, such as Speechmatics and Paperspace, are exploring solutions to the problems that accents and dialects create.

One possible solution is to design a speech recognition engine around accent-specific language models. This means creating, for instance, one language pack for Mexican Spanish, another for the Spanish spoken in Spain, and so on. With this approach, one gets great accuracy for one specific accent, and academically, one will get highly accurate results in most cases. However, this approach requires matching the right model to the right speech, and there are circumstances where that is not possible.
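The accent-specific approach, and its weakness, can be sketched in a few lines. This is a hypothetical illustration: the pack names, locale codes, and lookup function are assumptions for the example, not any vendor's real API.

```python
# Hypothetical sketch: routing a request to an accent-specific language pack.
# Pack names and locales are illustrative, not a real ASR product's catalog.

ACCENT_PACKS = {
    "es-MX": "spanish_mexico.pack",     # Mexican Spanish
    "es-ES": "spanish_spain.pack",      # Spanish spoken in Spain
    "es-AR": "spanish_argentina.pack",  # Argentinian Spanish
}

def pick_accent_pack(locale: str) -> str:
    """Return the accent-specific model for an exact locale match."""
    if locale not in ACCENT_PACKS:
        # The weakness of the approach: if no pack exists for the
        # speaker's accent, there is no good model to fall back on.
        raise KeyError(f"No language pack for locale {locale!r}")
    return ACCENT_PACKS[locale]
```

The failure mode is the `KeyError` branch: a Chilean Spanish speaker (`es-CL`) simply has no matching pack, which is exactly the circumstance where this solution breaks down.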

Another solution is to build an any-context speech recognition engine that understands all Spanish accents regardless of region, accent, or dialect. This approach has its challenges, both in the technical ability to build an engine this way and in the time it takes to build. However, the results speak for themselves, with frictionless and seamless user and customer experiences.
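By contrast, an any-context engine keys its models on the language alone, so every accent of that language resolves to the same model. Again a hedged sketch with made-up pack names, not a real product's interface:

```python
# Hypothetical sketch: an accent-agnostic engine maps any locale to a
# single language-level model, so no accent is left without coverage.

LANGUAGE_PACKS = {
    "es": "spanish_global.pack",  # one model for all Spanish accents
    "en": "english_global.pack",  # one model for all English accents
}

def pick_language_pack(locale: str) -> str:
    """Map any locale (e.g. 'es-MX', 'es-ES', 'es-CL') to its language model."""
    language = locale.split("-")[0]  # keep only the language part
    return LANGUAGE_PACKS[language]
```

Note that `es-CL`, which had no home in the accent-specific design, now resolves to the same global Spanish model as every other variant; the trade-off the article describes is that this single model is much harder and slower to build.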

At first, ASR engineers considered only the accent-specific solution a viable way to address the accent gap. From an engineering perspective, it made sense to constrain the problem to a single-accent model, because that was the best way to deliver the highest accuracy for a specific accent or dialect.

This approach also required ASR providers to build specific models for specific markets. For instance, a medical company needs an entirely different vocabulary from a utility company, which raises a huge challenge for ASR technology. Looking back to the late 1990s, engines required the user to train the ASR on their own voice rather than being speaker-independent.
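The market-specific vocabulary problem can be illustrated with a toy example. The word lists and the `build_vocab` helper below are invented for illustration; real ASR systems use far larger lexicons plus pronunciation dictionaries and language-model rescoring.

```python
# Hypothetical sketch: extending a shared base vocabulary with the
# domain-specific terms a given market needs the engine to recognize.

BASE_VOCAB = {"appointment", "reading", "account", "schedule"}

DOMAIN_TERMS = {
    "medical": {"tachycardia", "hypertension", "angioplasty"},
    "utility": {"kilowatt", "substation", "outage"},
}

def build_vocab(domain: str) -> set:
    """Combine the shared base vocabulary with one domain's terms."""
    return BASE_VOCAB | DOMAIN_TERMS.get(domain, set())
```

The point of the sketch is that neither domain's vocabulary is a superset of the other's: a model tuned for the medical market has no reason to know "substation", which is why providers historically shipped separate models per market.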

As compute and machine learning have improved over the past decade, ASR providers have been able to widen the boundaries of what is possible with voice technology. As the technology became more widely adopted, it became apparent to engineers that one would never know the accent or dialect of a speaker in advance, only the language.

With an all-encompassing language model, one may not get the best accuracy for a specific speaker, but one is likely to get the best accuracy across the board for that language. Companies like Speechmatics and Paperspace set out to build any-context recognition engines with accent-agnostic language models, and they found a way to make those models small enough in footprint for their ASR to be consumable in the real world.