The Golden Age of Natural Language Processing

December 17, 2019

Natural Language Processing (NLP), the automatic manipulation of natural language by software, has been around for decades, and in its early days it produced some amusing results, like translating “out of sight, out of mind” as “blind fool.” A computer would have trouble understanding a sentence like “The girls are sewing buttons on the second floor” (is the second floor where the girls are, or where the buttons are being sewn?) because the machine lacks general real-world knowledge. Machines may be good at mathematics, but it is difficult to translate a language problem involving speech and text into mathematical formulas. However, the past few years have brought major breakthroughs in NLP technology that are ushering in a new “Golden Age” of NLP, with total revenue projections of over $22 billion by 2025, according to Tractica.

The first breakthrough of this current Golden Age came in 2013 with the Word2Vec algorithm, which reads through giant datasets of text and automatically learns the patterns of association and relationship between the words inside them. Rather than attempting to “understand” text, Word2Vec simply looks for statistical correlations among neighboring words.

For example, in the previous sentence, the words “rather” and “than” precede the word “attempting,” and the word “attempting” is surrounded by the words “rather,” “than,” “to,” and “understand.” Word2Vec then compresses this information into a lower-dimensional space to create a compact encoding of the learned vocabulary. Just as squinting makes vaguely similar objects look more alike, this compression forces individual word correlations to form logical patterns that express word relationships. The result is a mathematical vector for each word that expresses that word’s relationship to the compressed patterns. These vectors can be used to deliver highly accurate results for diverse NLP tasks such as machine translation, question answering and sentiment analysis. The word vectors can also be used to create fascinating mathematical relationships between words, e.g., “gospel – God = jazz” and “fish + music = bass.” Word2Vec has its weaknesses: for example, it can store only a single representation for words like “pit” or “bat” that have multiple distinct meanings. Nevertheless, it provided a quantum leap in NLP accuracy.
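Word2Vec itself learns dense vectors with a shallow neural network; the raw material it compresses, though, is just the kind of neighboring-word statistics described above. The following stdlib-only sketch (the corpus and all names are illustrative, not from any real Word2Vec implementation) shows how plain co-occurrence counts already make similar words look similar:

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Build a sparse context-count vector for each word: how often
    every neighbor appears within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            for j in range(max(0, i - window), min(len(words), i + window + 1)):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = lambda v: sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))

corpus = [
    "the cat sat on the mat",
    "the dog sat on the mat",
    "the cat chased the dog",
    "the car drove on the road",
    "the truck drove on the road",
]
vectors = cooccurrence_vectors(corpus)
# "cat" and "dog" share contexts (sat, on, chased), so they score higher
# than "cat" and "car", which share little beyond "the" and "on".
print(cosine(vectors["cat"], vectors["dog"]) > cosine(vectors["cat"], vectors["car"]))  # True
```

Real Word2Vec goes further by compressing these statistics into short dense vectors, which is what makes the arithmetic tricks like “fish + music = bass” possible.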

The next NLP advance, in early 2018, was ELMo (Embeddings from Language Models). Unlike Word2Vec, which uses a fixed vector for each word, ELMo calculates vectors for individual words while taking the surrounding sentence or paragraph into account. ELMo also uses deep learning, with multiple hidden layers that help capture word correlations. This enables ELMo to learn different levels of linguistic representation at different layers. For example, an early layer may focus on primitive information such as distinguishing nouns from verbs, while a later layer may focus on higher-level information such as distinguishing “Janine” from “Jamie.” As a result, ELMo outperforms Word2Vec in a wide variety of NLP tasks.
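ELMo’s deep bidirectional language model is far too large to reproduce here, but its key contract (the same word receives a different vector in every sentence) can be illustrated with a deliberately toy sketch. Everything below is hypothetical: the 2-d “static” vectors are invented, and blending in an average of neighbor vectors merely stands in for what ELMo actually computes with stacked recurrent layers.

```python
def contextual_vector(word, sentence, static, window=2, mix=0.5):
    """Toy contextual embedding: blend a word's fixed vector with the
    average of its neighbors' fixed vectors, so the same word gets a
    different vector in each sentence. (ELMo does this with a deep
    bidirectional language model, not a simple average.)"""
    words = sentence.split()
    i = words.index(word)
    neighbors = [static[w] for w in words[max(0, i - window):i + window + 1]
                 if w != word and w in static]
    if not neighbors:
        return list(static[word])
    avg = [sum(dims) / len(neighbors) for dims in zip(*neighbors)]
    return [mix * s + (1 - mix) * a for s, a in zip(static[word], avg)]

# Invented 2-d static vectors for a tiny vocabulary.
static = {"pit": [1.0, 1.0], "cherry": [0.9, 0.1],
          "bat": [0.1, 0.9], "cave": [0.2, 0.8], "in": [0.5, 0.5]}

fruit_pit = contextual_vector("pit", "cherry pit", static)
cave_pit = contextual_vector("pit", "bat in cave pit", static)
# Same word, two different vectors, depending on context.
print(fruit_pit != cave_pit)  # True
```

This is exactly the weakness of Word2Vec that ELMo fixes: a single “pit” vector cannot capture both the fruit sense and the cave sense at once.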

Later in 2018 came an even more dramatic NLP breakthrough with the release of BERT (Bidirectional Encoder Representations from Transformers); as you can see, NLP researchers are fond of whimsical algorithm names. BERT is based on the attention mechanism, which starts with the observation that humans process information by paying attention to important details and tuning out noise. Intuitively, then, in an NLP problem like machine translation, given a training corpus of correlated sentences (e.g., source text translated into a different language), the attention mechanism can learn the set of positions in the source sentence where the most relevant information is concentrated.
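At its core, one attention step scores each position against a query, converts the scores into weights that sum to one, and mixes the positions’ values accordingly. The stdlib sketch below shows only that weighting step, with invented toy 2-d vectors; a real Transformer also applies learned projections to produce the queries, keys and values, and scales the scores before the softmax.

```python
from math import exp

def attention(query, keys, values):
    """One attention step: score each key against the query (dot product),
    softmax the scores into weights, and return the weighted mix of values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)                      # subtract the max for numerical stability
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    mixed = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
    return weights, mixed

# Toy 2-d vectors: the query points in the same direction as the second
# key, so the second position gets the largest attention weight.
query = [1.0, 0.0]
keys = [[0.0, 1.0], [1.0, 0.1], [0.3, 0.3]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, mixed = attention(query, keys, values)
print(weights.index(max(weights)))  # 1
```

The “set of positions where the most relevant information is concentrated” is simply the set of positions that receive the largest weights.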

NLP researchers were able to dramatically improve the performance of the attention mechanism by running multiple attention distributions in parallel, creating an algorithm known as the Transformer. BERT takes the Transformer one step further by introducing “masked language modeling,” where 15% of the words in a sentence are randomly masked. A Transformer is then used to generate a prediction for each masked word based on the unmasked words surrounding it, both to the left and the right. BERT may seem like nothing more than a minor tweak on the Transformer, which itself may seem like a minor tweak on the attention mechanism. Yet small tweaks like these can produce dramatic improvements in the quality of NLP outputs.
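The masking step itself is simple enough to sketch in a few lines of stdlib Python. The 15% selection rate matches BERT’s paper, as does the 80/10/10 split among the selected positions (replaced by a mask token, replaced by a random word, or left unchanged); the token list and vocabulary below are purely illustrative.

```python
import random

MASK = "[MASK]"
VOCAB = ["the", "cat", "sat", "on", "mat", "dog"]  # toy vocabulary (illustrative)

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: pick ~15% of positions as prediction targets.
    Of those, 80% are replaced by [MASK], 10% by a random word, and 10%
    are left unchanged; the model must recover the original at each one."""
    rng = rng or random.Random(0)
    masked = list(tokens)
    targets = {}                       # position -> original token to predict
    for i, token in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = token
            roll = rng.random()
            if roll < 0.8:
                masked[i] = MASK
            elif roll < 0.9:
                masked[i] = rng.choice(VOCAB)
            # else: keep the original token as a (hidden) prediction target
    return masked, targets

tokens = ("the cat sat on the mat and the dog sat on the cat " * 20).split()
masked, targets = mask_tokens(tokens)
```

Because the model sees unmasked words on both sides of every target, it is forced to learn bidirectional context, which is exactly what the “B” in BERT stands for.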

At the moment, BERT is the darling of the NLP research community, producing results for some NLP tasks, like question answering, that exceed human-level accuracy. However, NLP likely won’t stay this way for long. Here is a fearless prediction: in a year, we will have forgotten BERT in favor of an even better algorithm named ERNIE. While the name may end up being wrong, the principle stands: we are at the dawn of a new age in NLP, where better algorithms are being discovered on what seems like a monthly basis, with no end in sight. Many of these algorithms are released to the public, not just as research papers, but as Open Source code that almost immediately finds its way into commercial NLP products. Welcome to the Golden Age of NLP!

As a consumer, you have probably already noticed the quantum leap in the computer’s ability to understand human speech. From Alexa or Siri waking us up in the morning with the day’s to-do list to chatbots answering our customer service calls, we are increasingly surrounded by NLP machines. With the technology to efficiently aggregate data and derive actionable insights from it now available, organizations that do not embrace NLP will find themselves at a disadvantage and left behind.

So how could your enterprise benefit from NLP? Some internal use cases include:

  • HR: automated resume scanning to rapidly screen relevant candidates
  • Knowledge sharing: finding topic experts; searching the enterprise knowledge base
  • Legal: analyzing contracts to identify contradictions or violations

There are also many customer-facing use cases, such as:

  • Customer service chatbots
  • Automated customer survey analysis
  • Market intelligence based on web and social media content

Additionally, there are many vertical-dependent use cases, such as:

  • Healthcare: efficient billing from unstructured physicians’ notes
  • Education: automated grading of homework
  • Media: automatic subtitling and translation

The bottom line: NLP is real, it’s here, and it can improve your business’s profitability, both by increasing customer satisfaction and by saving back office costs.

Bio: Moshe Kranc, Chief Technology Officer

Moshe Kranc is the chief technology officer at Ness Digital Engineering. Kranc has extensive experience in leading adoption of bleeding-edge technologies, having worked for large companies as well as entrepreneurial start-ups. He previously headed the Big Data Center of Excellence at Barclays’ Israel Development Centre (IDEC). He has worked in the high-tech industry for over 40 years in the U.S. and Israel. He was part of the Emmy award-winning team that designed the scrambling system for DIRECTV, and he holds 6 patents in areas related to pay television, computer security and text mining.


About Ness Digital Engineering
Ness Digital Engineering designs, builds, and integrates digital platforms and enterprise software that help organizations engage customers, differentiate their brands, and drive profitable growth. Our customer experience designers, software engineers, data experts, and business consultants partner with clients to develop roadmaps that identify ongoing opportunities to increase the value of their digital solutions and enterprise systems. Through agile development of minimum viable products (MVPs), our clients can test new ideas in the market and continually adapt to changing business conditions—giving our clients the leverage to lead market disruption in their industries and compete more effectively to grow their business. For more information, visit

