Incorporating ML For Data Analytics Can Fathom Big Data Storage Concerns

by August 14, 2019

Machine learning algorithms are uncovering hidden business value in existing data across multiple industries. However, ML for data analytics also brings threats and challenges of its own when it comes to storage infrastructure.

Because any data may contain hidden value, organizations tend to pay less attention to removing old and aging data, which leads to storage problems and complicates capacity planning. In addition, the analytical processes themselves place extra load on the existing storage infrastructure.

In response, several vendors have started using artificial intelligence as a tool for solving the problems generated by big data analytics. At present, these vendors have not built their ML-for-analytics efforts around a single technology but around a diverse set of technologies.


The Speed Layer and Batch Layer

When considering AI for workload profiling and capacity planning, organizations must focus on having access to current data about storage use and health. However, depending on real-time data is not always desirable.

The disadvantage of relying on real-time streaming data is that it is raw and uncurated, so it may contain imperfections, and working in real time limits the amount of processing that can be done on it.

To get around this, ML for analytics can work with relatively current (but not real-time) data, which allows more information to be processed.

Alternatively, one can opt for the Lambda Architecture, which addresses this problem by routing data through two separate layers, known as the batch layer and the speed layer. The batch layer’s job is to store data that is not being acted on in real time. The batch layer can also be used to improve data quality. In certain models, it also makes data available to a third layer, the serving layer, which provides batch views in response to query requests.

In the speed layer, meanwhile, inbound data is streamed to provide real-time data views. For the Lambda Architecture to work efficiently, it must offer low latency and enough scalability to accommodate the inbound data.
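The interplay of the three layers can be illustrated with a toy, in-memory sketch. This is only an illustration under stated assumptions, not a production design: the class name LambdaStore, the (volume, bytes_written) event shape, and the rule that negative values count as malformed are all hypothetical choices made for the example.

```python
from collections import defaultdict


class LambdaStore:
    """Toy Lambda Architecture over storage events (volume, bytes_written)."""

    def __init__(self):
        self.master_log = []                # batch layer: append-only raw data
        self.batch_view = {}                # serving layer: precomputed totals
        self.speed_view = defaultdict(int)  # speed layer: totals since last batch run

    def ingest(self, volume, bytes_written):
        # Every inbound event is recorded by both the batch and speed layers.
        self.master_log.append((volume, bytes_written))
        self.speed_view[volume] += bytes_written  # raw and uncurated

    def run_batch(self):
        # Recompute the batch view from all raw data, skipping malformed
        # records (assumed here to be negative sizes) to improve quality.
        totals = defaultdict(int)
        for volume, bytes_written in self.master_log:
            if bytes_written >= 0:
                totals[volume] += bytes_written
        self.batch_view = dict(totals)
        self.speed_view.clear()  # the batch view now covers these events

    def query(self, volume):
        # Answer a query by merging the batch view with the real-time view.
        return self.batch_view.get(volume, 0) + self.speed_view.get(volume, 0)


store = LambdaStore()
store.ingest("vol1", 100)
store.ingest("vol1", -5)            # malformed event
before_batch = store.query("vol1")  # 95: the speed layer still includes the bad record
store.run_batch()
after_batch = store.query("vol1")   # 100: the batch run filtered it out
```

The point of the split is visible in the last four lines: the speed layer answers immediately but with imperfect data, and the next batch run replaces that answer with a curated one.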


Custom Field Programmable Gate Arrays

Although custom FPGAs have been used in electrical engineering for a long time, they are a relatively new idea to the IT industry. Hardware vendors have now started to use them as an alternative to CPUs and GPUs in ML for data analytics offerings. Notably, Intel spent $16.7 billion in 2015 to buy the FPGA manufacturer Altera.

In electronics engineering, an FPGA eliminates the need to fabricate a custom integrated circuit (IC). Unlike other ICs, an FPGA is programmable, which allows an electronics engineer to configure it to act like a custom-built IC. Beyond this, what makes FPGAs desirable is their ability to achieve low latency and perform floating-point computation.


Storage Via Containers

Containerized storage is also viable for machine learning. Because ML processes tend to be relatively lightweight, they are increasingly executed within containers.

Although containers are best known as a platform for running business applications, they are also well suited to machine learning. TensorFlow is a good example of an ML technology that is often containerized.

The most compelling reason for containerizing TensorFlow is that TensorFlow applications can then be run at scale. Companies can distribute computational graphs across TensorFlow clusters and containerize the servers in those clusters.
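As a rough illustration of how containerized workers find each other, TensorFlow's multi-worker distribution strategies read the cluster layout from the TF_CONFIG environment variable. The sketch below builds that value for each worker container using only the Python standard library. The helper name build_tf_config and the hostnames are hypothetical; in practice the addresses would come from whatever orchestrator launches the containers.

```python
import json


def build_tf_config(worker_hosts, task_index):
    """Return the TF_CONFIG JSON string for one worker container."""
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must refer to one of worker_hosts")
    return json.dumps({
        # Every worker receives the same cluster description...
        "cluster": {"worker": list(worker_hosts)},
        # ...and a task entry identifying its own position in it.
        "task": {"type": "worker", "index": task_index},
    })


# Hypothetical container hostnames and port, used for illustration only.
workers = ["tf-worker-0:2222", "tf-worker-1:2222", "tf-worker-2:2222"]
configs = [build_tf_config(workers, i) for i in range(len(workers))]
```

Each string in configs would be passed to the matching container as its TF_CONFIG environment variable before the training script starts.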


Vendor Support to Storage

The Dell EMC PowerMax family is one of the first storage products to embrace ML for data analytics capabilities. The company advertises PowerMax as the world’s fastest storage array because it supports up to 10 million IOPS and up to 150 GBps of bandwidth.

Dell EMC is not the only vendor to use machine learning to make storage more intelligent. Hewlett Packard Enterprise (HPE) uses machine learning for data analytics to bring intelligence to hybrid cloud storage. HPE’s intelligent storage technology examines workloads to gain insight into their underlying requirements and then relocates data to the optimal location. The relocation is based on storage cost, performance, available capacity, and proximity to where the data is being used. It also adapts to changing conditions in real time and relocates data on demand.
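To make the idea of metric-based relocation concrete, here is a minimal sketch of a weighted placement score over the four metrics named above. It is not HPE's algorithm, which the article does not describe. The Location fields, the normalization, the weights, and the sample figures are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Location:
    name: str
    cost_per_gb: float  # storage cost, lower is better
    latency_ms: float   # performance proxy, lower is better
    distance_km: float  # proximity to where the data is used, lower is better
    free_gb: float      # available capacity


def choose_location(locations, size_gb, weights=(1.0, 1.0, 1.0)):
    """Pick the lowest-scoring location that has room for the data set."""
    # Capacity acts as a hard constraint rather than a scored metric.
    candidates = [loc for loc in locations if loc.free_gb >= size_gb]
    if not candidates:
        return None

    def normalised(value, values):
        # Scale each metric to [0, 1] so the weights are comparable.
        high = max(values)
        return value / high if high else 0.0

    costs = [c.cost_per_gb for c in candidates]
    latencies = [c.latency_ms for c in candidates]
    distances = [c.distance_km for c in candidates]
    w_cost, w_perf, w_prox = weights

    def score(loc):
        return (w_cost * normalised(loc.cost_per_gb, costs)
                + w_perf * normalised(loc.latency_ms, latencies)
                + w_prox * normalised(loc.distance_km, distances))

    return min(candidates, key=score)


# Made-up locations for illustration only.
sites = [
    Location("on_prem", cost_per_gb=0.10, latency_ms=2, distance_km=0, free_gb=500),
    Location("cloud_a", cost_per_gb=0.02, latency_ms=40, distance_km=800, free_gb=100_000),
    Location("cloud_b", cost_per_gb=0.03, latency_ms=25, distance_km=300, free_gb=50),
]
best = choose_location(sites, size_gb=200)
```

Re-running choose_location whenever the metrics change is one simple way to model the "adapts to changing conditions" behavior: a shift in cost or free capacity changes the winner, and the data would be moved accordingly.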



Companies that wish to harness the power of machine learning for storage should begin by deciding what they hope to gain. Once their needs have been identified, organizations can look for products that directly address those requirements.

Depending on those needs, companies may not have to buy new storage hardware to get the benefits of ML. Instead, a software application could handle a task such as automated capacity planning.
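As a sketch of what software-only capacity planning might look like at its simplest, the function below fits a least-squares line to daily usage samples and extrapolates to the point where the array would be full. A straight-line fit is a deliberately basic stand-in for the models a real product would use. The function name days_until_full and the sample figures are hypothetical.

```python
def days_until_full(used_gb, capacity_gb):
    """Fit a least-squares line to daily usage samples and extrapolate.

    Returns the number of days after the last sample at which the fitted
    line reaches capacity_gb, or None if usage is flat or shrinking.
    """
    n = len(used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    # Sample i is treated as having been taken on day i.
    mean_x = (n - 1) / 2
    mean_y = sum(used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_gb))
    slope = sxy / sxx  # estimated growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    # Clamp at zero in case the fitted line is already past capacity.
    return max(0.0, day_full - (n - 1))


# Five made-up daily samples growing by 10 GB per day on a 200 GB array.
remaining = days_until_full([100, 110, 120, 130, 140], capacity_gb=200)
```

With these samples the fitted growth rate is 10 GB per day, so the estimate is six days after the last sample.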