Ashish Kumar: Pushing Teams to Innovate Faster-to-deploy Tech Solutions for Businesses

Data science and artificial intelligence have brought about a remarkable transformation in businesses. Disruptive technologies are strengthening every aspect of global industries, from retail, manufacturing, finance, and education to defense. AI technologies are driving businesses towards efficiency and success through advanced analytics, machine learning, and deep learning.

With advanced analytics as an anchor, Indium Software provides technology solutions backed by deep expertise in digital and QA services. The company delivers advanced digital solutions to its customers through big data, data engineering, stream processing, advanced analytics, and other services.

Ashish Kumar is the Principal Data Scientist at Indium Software. With over eight years of experience in data science and advanced analytics, Ashish anchors the advanced analytics practice to drive ML-based business solutions. His responsibilities include project delivery management and team building, presales and ML solution architecture, and R&D and innovation on tougher problems.

Besides this, Ashish is responsible for building and growing the company's flagship innovation, the NLP framework called teX.ai.

Ashish is highly experienced in data science consulting. He has solved ML problems across industries such as semiconductors, consumer finance, transportation and logistics, pharma, and real estate. He is well versed in Python data science packages and advanced SQL, and has a deep understanding of the mathematics behind data science algorithms.

He has authored two books in the data science domain, Learning Predictive Analytics with Python and Mastering Pandas. He is a mentor/SME at several leading online data science training companies. Ashish holds a B.Tech degree from IIT Madras and a Postgraduate Diploma in Leadership from Ashoka University.

Defining Success through Innovation

Under Ashish's leadership, the company created a new SaaS product called teX.ai. It offers end-to-end text analytics services, including but not limited to: text extraction from a variety of sources, viz. PDFs, images, and websites; text summarization of large document corpora, e.g. customer reviews and legal documents; and text classification of large text corpora into pre-defined categories for better cataloging and/or targeted marketing.

teX.ai builds on popular machine learning, natural language processing, and deep learning techniques and algorithms to power its core. The company's innovation lies in creating different recipes from these techniques and algorithms, such as unusual combinations, hyperparameter optimization, and new variants, to derive business insights efficiently. A hybrid of linguistic and ML approaches for key-phrase extraction is a good example of this.

A few other examples of novel uses of these algorithms are:

  • Using CNN-like methods to identify table-like or chart-like areas and pre-trained OCR methods to extract the tabular data, then marking non-tabular data as peripheral and using Conditional Random Fields to structure it.
  • Using edge detection methods to identify the cell boundaries of a table, extracting each cell, and running OCR at the cell level for high accuracy, with a new CNN trained for handwritten digit and character recognition.
  • Using K-means clustering on ELMo embeddings of extracted key phrases to semantically cluster these key phrases.
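
The last item above can be illustrated with a minimal sketch. It is not teX.ai's implementation: the two-dimensional vectors are made-up stand-ins for real ELMo embeddings (which have far more dimensions), the function names are hypothetical, and the farthest-point initialization is an assumption chosen only to keep the example deterministic.

```python
import math

def init_centroids(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy 2D vectors standing in for ELMo embeddings of extracted key phrases.
phrases = [
    "late delivery",
    "slow shipping",
    "package arrived late",
    "great build quality",
    "sturdy materials",
]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]

labels = kmeans(vectors, k=2)

# Group the phrases by their assigned cluster.
clusters = {}
for phrase, label in zip(phrases, labels):
    clusters.setdefault(label, []).append(phrase)
print(clusters)
```

With these toy vectors, the three delivery-related phrases fall into one cluster and the two quality-related phrases into the other. In practice, the vectors would come from an embedding model and the number of clusters would need tuning.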

Ashish was proud to share with us that teX.ai was recognized as one of the top machine learning products by Forbes in 2019.

Meeting Demands with Industry-Centric Solutions

The world is inundated with unstructured data, especially text. This has led to an ever-growing market for easy-to-use NLP solutions, particularly because NLP skills are hard to learn and implement. Indium Software's teX.ai caters to that demand with faster-to-deploy solutions for the following use cases:

  • Extraction of tabular and selected peripheral data from vendor invoices stored as PDFs into JSON, XML, and other formats for further automation of the analytics pipeline.
  • Extraction of text data from images containing the marks of students in different subjects for automatic storage in a structured format.
  • Extraction of credit information from tabular-format credit reports published by MLCB, to be used in credit scoring.
  • Extraction of tables and charts from PDF reports prepared by BFSI or BFSI research agencies to usable CSV formats.
  • Automated contextual extraction and ML (CRF)-powered conversion of Metes and Bounds PDF files to Traverse files to be fed into GIS tools like ESRI.
  • Extraction of bank statements, with each transaction as an element of a nested JSON document, together with peripheral data such as bank name, account holder name, and account holder address, using custom NER powered by CRF and LSTM-CRF algorithms with F1 scores above 90%. This can handle any structure of tabular and peripheral data as long as the quality of the PDF is good.
  • Extraction of tabular data (such as the percentage composition of chemicals) from Certificate of Analysis PDF files, together with peripheral data such as organization name, certificate number, and manufacturing date, using custom NER powered by CRF and LSTM-CRF algorithms. This can handle any structure of tabular and peripheral data as long as the quality of the PDF is good.
  • Finding the distribution of positive and negative sentiment about selected attributes of the product(s), and competitor analysis of similar products from different brands.
  • Find negative and positive key phrases in reviews under areas like service quality, delivery, product quality, and others to identify the improvement and reinforcement areas.
  • Perform unsupervised semantic clustering of the extracted key phrases to visualize the phrases in 2D with similar phrases grouped together.
  • Create a semantic search engine to find reviews and chats that are contextually similar to the query term.
  • Automatically categorize the product catalog of an e-commerce company based on the product description for better conversion.
  • Automatically flag spam emails, messages, and other errors to help mass advertisers send only relevant information to their potential and existing customers.
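
To make the bank-statement use case above more concrete, here is a minimal sketch of how extracted table rows and peripheral fields might be combined into a nested JSON record. The article does not describe teX.ai's actual output schema, so the field names, the `statement_to_json` helper, and the sample values are all hypothetical.

```python
import json

def statement_to_json(peripheral, header, rows):
    """Combine peripheral fields and table rows into one nested record.

    `peripheral` stands in for fields found by an NER step, and
    `header`/`rows` stand in for a table recovered by OCR.
    """
    # Each table row becomes one transaction object keyed by column name.
    transactions = [dict(zip(header, row)) for row in rows]
    return {"account": peripheral, "transactions": transactions}

# Hypothetical values standing in for real OCR/NER output.
peripheral = {"bank_name": "Example Bank", "account_holder": "A. Customer"}
header = ["date", "description", "amount"]
rows = [
    ["2020-01-03", "ATM withdrawal", "-100.00"],
    ["2020-01-05", "Salary credit", "2500.00"],
]

record = statement_to_json(peripheral, header, rows)
print(json.dumps(record, indent=2))
```

Keeping each transaction as its own object under a single `transactions` key lets downstream analytics iterate over transactions without re-parsing the table layout.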

Laurels and Recognition

Ashish states that the company's latest innovation has had several positive effects in the industry, including faster go-to-market for NLP solutions, faster processing of and insight generation from unstructured data such as PDFs, customer reviews, logs, and images, and lower costs and turnaround times through the automation of unstructured data processing.

The innovation has proved to be a major success for the company. It has opened up an additional revenue stream, increased visibility, and raised the number of leads from large Fortune 500 companies. It also led to the creation of an in-house talent pool and CoE for NLP and ML problems, opened up and materialized several cross-sell opportunities with many of the company's customers, and built R&D and detail-oriented rigor into the company's solution delivery.

The company also partnered with AWS as the cloud platform for hosting the innovation. AWS also listed Indium Software in its marketplace to increase visibility. In addition, the company engaged various growth consultants to expand its sales footprint and used many open-source Python libraries and open-source Hugging Face modules.

Challenges that Paved the Road to Success

Ashish mentions that the innovation still needs a few changes. For example, the selective extraction of entities from documents requires retraining whenever a new entity has to be extracted. Different kinds of documents give the best results with different methods, so some trial and error is needed to reach the best result. Likewise, key-phrase identification yields the best results when it is trained on a dataset specific to the prediction use case, i.e. a new use case calls for training on a new dataset.

Analytics Insight
www.analyticsinsight.net