GigaSpaces Technologies: Integrating Data Science and IT Operations with MLOps Capabilities

May 16, 2020

Yoav Einav

Machine learning is regarded as one of the key drivers of the fourth industrial revolution. As more businesses weave machine learning insights into their software, DevOps methods are being applied to machine learning models to overcome the complex process of deploying them. This emerging practice in professional machine learning applications is called MLOps. GigaSpaces Technologies, a computer software company, provides in-memory computing platforms for real-time insight to action and extreme transactional processing. With GigaSpaces, enterprises can operationalize machine learning and transactional processing to gain real-time insights on their data and act on them in the moment.

In an interview with Analytics Insight, Yoav Einav, Vice President of Product at GigaSpaces Technologies, shares how the company is breaking down the difference between MLOps and DevOps for better data analytics and management.


Kindly give us a brief overview of MLOps.

Businesses are adopting Machine Learning Operations (MLOps), an approach developed to facilitate communication between data scientists and the operations or production team to help manage the production machine learning life cycle. This covers every step: data preparation, model training and validation, packaging, deployment, monitoring of model accuracy, and closing the loop for retraining. Similar to DevOps in the software development world, MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. This collaboration includes the ability to deploy machine learning (ML) projects using today’s production infrastructures like Spark and Kubernetes, both on-premise and in the cloud.


Could you tell us which problems MLOps solves?

Adopting an MLOps approach will help organizations close the loop between gaining insight and turning it into actionable business value.

Many organizations underestimate the amount of effort it takes to incorporate machine learning into production applications. This leads to entire projects being abandoned halfway (75% of data science projects never make it to production) or to the consumption of far more resources and time than first anticipated. Deploying new models in production remains a challenge, particularly around data integration, building the feature vector at scale, and serving the model in a production environment. In fact, the ML pipeline to deploy new models can take weeks or months, with many models never making it to production.


What are the challenges solved by MLOps? 

One of the goals of MLOps is to help machine learning (ML) engineers and data scientists leverage cloud assets (public or on-premise) for ML workloads. Data scientists typically use tools like pandas’ read_csv to load data into memory; these tools are designed for small data sets that reside on a single machine and do not work at scale. Pandas is used in the lab for small data tasks, the code is then migrated to PySpark to leverage Spark for large data sets, and finally data engineers rewrite it as a Spark job in Scala as part of the CI/CD process. This process carries a great deal of manual overhead and is very error-prone.
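To make the memory point concrete, here is a minimal sketch of a streaming aggregation that keeps only running totals in memory instead of loading the whole file. It uses only the Python standard library; the function name, column names, and sample data are hypothetical and chosen for illustration. pandas offers a comparable pattern through the chunksize argument of read_csv, and Spark distributes the same kind of aggregation across machines.

```python
import csv
import io
from collections import defaultdict


def streaming_mean_by_key(rows, key_field, value_field):
    """Compute a per-key mean in one pass, holding only running totals in memory."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[key_field]
        totals[key] += float(row[value_field])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical sample standing in for a file too large to load at once.
SAMPLE_CSV = "customer,amount\na,10\nb,4\na,20\nb,6\n"


def demo():
    # DictReader yields one row at a time, so memory use does not grow with file size.
    reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
    return streaming_mean_by_key(reader, "customer", "amount")


if __name__ == "__main__":
    print(demo())
```

In a real pipeline the reader would wrap an open file handle rather than an in-memory string; the aggregation logic stays the same.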

In addition, naive data preparation methodologies used to build models won’t work on real-world data that is constantly changing. Latency or computation power constraints require a fundamentally different data pipeline, one that depends on stream processing, fast key/value stores, and time-series databases to deliver information or actions in real time. There is also the challenge of tracking highly dynamic privacy requirements and regulations, which requires a resilient compliance process.


Can you highlight the difference between DevOps and MLOps? 

MLOps is an ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operation (Ops). Practicing MLOps means that you advocate for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment, and infrastructure management.

An ML project team can include researchers who are not software engineers. Machine learning is experimental in nature, and it’s difficult to track what worked and what didn’t. In addition to unit and integration tests, machine learning products require validating and evaluating the models in an ongoing manner. Models can experience a degradation in performance due to constantly evolving data profiles and the nature of data seasonality (e.g., Christmas, weekends, Brexit, COVID-19). Notifications need to be sent when data values deviate from expectations.
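One simple way to raise such a notification is to compare a recent window of a feature against a training-time baseline. The sketch below is an illustration under stated assumptions, not a description of any particular product: the function name, the z-score test on the window mean, and the default threshold of 3.0 are all hypothetical choices.

```python
from statistics import mean, stdev


def drift_alert(baseline, recent, z_threshold=3.0):
    """Return an alert message if the recent mean strays from the baseline, else None."""
    base_mean = mean(baseline)
    base_std = stdev(baseline)
    recent_mean = mean(recent)
    if base_std == 0:
        # A constant baseline: any change in the mean counts as drift.
        if recent_mean == base_mean:
            return None
        return f"drift: recent mean {recent_mean:.2f} vs constant baseline {base_mean:.2f}"
    # Standard error of the mean for a window of len(recent) observations.
    std_err = base_std / len(recent) ** 0.5
    z = (recent_mean - base_mean) / std_err
    if abs(z) > z_threshold:
        return f"drift: recent mean {recent_mean:.2f} vs baseline {base_mean:.2f} (z={z:.1f})"
    return None


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
    print(drift_alert(baseline, [10, 9, 11, 10]))   # no drift expected
    print(drift_alert(baseline, [15, 16, 14, 15]))  # drift expected
```

A production check would usually look at more than the mean (for example, the full distribution or the rate of missing values), but the shape is the same: a baseline, a recent window, and a threshold that triggers the notification.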


How can the performance of machine learning models be improved? 

An in-memory analytics platform such as GigaSpaces InsightEdge, which combines sub-second processing of your operational data, elasticity to handle peak events, and co-location of data and ML models, can significantly accelerate machine learning models in production. The entire ML pipeline is automated and shortened dramatically by connecting to the core infrastructure, either cloud or on-premise, to operational data sources (CRM, ERP, operational RDBMS), and to analytical data sources (DWH, data lake, analytical RDBMS), while also building the feature vector, serving the model, and acting based on its score. The data is shared, while the model itself is stored locally in each partition, and the feature vector is built in production using massively parallel processing to reduce data serialization and shuffling, which provides extreme performance and minimal networking and OS overhead. The streaming data can also be contextualized with external data sources, like news feeds or weather information, or with object stores and file systems, like Amazon S3 or HDFS, using advanced indexing techniques and event triggers to accelerate this process by an order of magnitude.


How can machine learning models be monitored? 

The machine learning data pipeline can be monitored to identify small problems before they affect general performance. Administration tools let users view performance metrics and system alerts at the cluster level, and measure the model’s accuracy in production to identify problem areas immediately. Since the data used to train models becomes stale quickly, continuous monitoring and regular retraining of the model are essential to maintain the model’s accuracy. Monitoring tools also make it possible to control versions and to apply A/B testing, blue-green deployments, and canary testing with no downtime, thus providing higher levels of validation and effectiveness.
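As a rough illustration of measuring accuracy in production, the sketch below tracks accuracy over a sliding window of recent predictions and flags when retraining may be due. The class name, window size, and accuracy threshold are hypothetical, and the sketch assumes that the true outcome for each prediction eventually becomes available.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent predictions and flag degradation."""

    def __init__(self, window=100, min_accuracy=0.9):
        # deque with maxlen drops the oldest outcome once the window is full.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        """Store whether a prediction matched the outcome observed later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Return accuracy over the current window, or None if it is empty."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag retraining only once the window is full, to avoid noisy early alerts."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


if __name__ == "__main__":
    monitor = AccuracyMonitor(window=4, min_accuracy=0.75)
    for predicted, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
        monitor.record(predicted, actual)
    print(monitor.accuracy(), monitor.needs_retraining())
```

The same monitor can be run separately for each model version during an A/B or canary rollout, so that the versions are compared on live traffic before one is promoted.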