In a competitive environment like this, organizations must get data preparation right to reap the benefits of data analytics. According to Gartner, data preparation is so crucial that some organizations invest around 70 percent of their time in data preparation tasks.
Ehtisham Zaidi, senior director analyst on Gartner’s Data and Analytics team and lead author of Gartner’s Market Guide for Data Preparation Tools, said: “Finding, accessing, cleaning, transforming, and sharing the data, with the right people and in a timely manner, continues to be one of the most time-consuming roadblocks in data management and analytics.”
According to Jonathan Martin, chief marketing officer of Hitachi Vantara, for those who wish to transform their business with analytics, the major issue is less about mastering AI and more about mastering the data pipeline. He added: “The data preparation piece is the most challenging. How do I identify where all this data is? Am I able to build a portfolio? Am I able to engineer the pipelines to connect all those data sources in an automated and managed and governed way to allow us to get that data to the right place, the right person, the right machine in the right time frame?”
Let’s look at the significant data challenges that organizations face.
• The number and complexity of data sources and data types needed to support analytics initiatives are increasing exponentially.
• Accessing these data sources across a distributed data ecosystem, both internal and external to the organization, requires significant time, resources, skills, and tools.
• The volume of requests for self-service data access and integration is straining IT teams, which suggests that the centralized IT model of data integration no longer works.
• Because business analysts, citizen integrators, line-of-business users, data engineers, and data scientists have varied demands, data requirements keep changing from project to project.
What are the next-gen tools for data preparation?
As data preparation tools have advanced, the challenges have shifted. Earlier, the question was which data sources to connect and which data to prepare; today, organizations are grappling with data governance, lineage, traceability, and quality. Their difficulty lies in ensuring that the right people, with the necessary skills, get access to the right data through data preparation tools.
Stewart Bond, research director of the Data Integration and Integrity Software service at IDC, calls this an issue of “data intelligence,” meaning metadata about the data.
He explained: “It’s the intelligence to know where the data is, what the data means, who’s using it, who can get access to it, why we have the data, how long we need to keep the data, and how people are using it.”
The data preparation tools market is evolving, adding new features to address these issues. Next-gen tools also include capabilities for sharing findings and prepared models with IT teams for operationalization. Their data management features include data cataloging, which enables users to view and search for connected data assets.
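Bond’s list of questions maps naturally onto a catalog record. The following is a minimal, hypothetical sketch and does not reflect how any particular product works. A `DatasetEntry` record holds one field per “data intelligence” question, and a `Catalog` class supports keyword search over registered data assets plus a simple role-based access check. All class names, fields, dataset names, and locations here are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class DatasetEntry:
    """Hypothetical catalog record with one field per "data intelligence" question."""
    name: str
    location: str            # where the data is
    description: str         # what the data means
    allowed_roles: set[str]  # who can get access to it
    purpose: str             # why we have the data
    retention_days: int      # how long we need to keep the data
    created: date
    users: set[str] = field(default_factory=set)  # who is using it

    def expires_on(self) -> date:
        # Retention deadline = creation date + retention period
        return self.created + timedelta(days=self.retention_days)


class Catalog:
    """Minimal in-memory catalog for registering and searching data assets."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        # Case-insensitive substring match on dataset name or description
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        )

    def can_access(self, name: str, role: str) -> bool:
        return role in self._entries[name].allowed_roles

    def record_use(self, name: str, user: str, role: str) -> None:
        # Log usage only for roles permitted to access the dataset
        if not self.can_access(name, role):
            raise PermissionError(f"role {role!r} may not access {name!r}")
        self._entries[name].users.add(user)


catalog = Catalog()
catalog.register(DatasetEntry(
    name="sales_orders",
    location="warehouse.sales.orders",  # hypothetical location
    description="Customer orders by region and day",
    allowed_roles={"analyst", "data_engineer"},
    purpose="Revenue reporting",
    retention_days=365,
    created=date(2024, 1, 1),
))
catalog.register(DatasetEntry(
    name="web_sessions",
    location="lake/raw/web_sessions/",  # hypothetical location
    description="Website clickstream sessions",
    allowed_roles={"data_scientist"},
    purpose="Product analytics",
    retention_days=90,
    created=date(2024, 3, 1),
))

print(catalog.search("orders"))                       # ['sales_orders']
print(catalog.can_access("web_sessions", "analyst"))  # False
catalog.record_use("sales_orders", "alice", "analyst")
```

Keeping access rules and usage tracking in the same record as location and meaning is one way to let a single lookup answer both “what is this data?” and “who may use it?”. Real tools typically persist this metadata and harvest much of it automatically rather than relying on manual registration.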
What are the essentials for a data preparation tool?
According to Zaidi, data preparation tools should offer these key capabilities:
• Data ingestion and profiling
• Data cataloging and basic metadata management
• Data modeling and transformation
• Data security
• Basic data quality and governance support
• Data enrichment
• User collaboration and operationalization
• Data source access/connectivity
• Machine learning
• Hybrid and multi-cloud deployment options
• Domain- or vertical-specific offerings or templates
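To make the first capability, data profiling, more concrete: profiling usually means summarizing each column of a dataset so that problems surface before transformation. The following is a minimal standard-library sketch under stated assumptions. Records are a list of dicts with scalar (hashable) values, missing keys are treated as nulls, and the `profile` function and its output fields are hypothetical rather than any tool’s API.

```python
from __future__ import annotations

from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return a per-column summary: null count, distinct count, types, min/max."""
    columns = sorted({key for row in rows for key in row})
    report: dict[str, dict[str, Any]] = {}
    for col in columns:
        # Missing keys are treated the same as explicit None values
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        types = sorted({type(v).__name__ for v in present})
        summary: dict[str, Any] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),  # assumes hashable scalar values
            "types": types,
        }
        # min/max are only computed when all non-null values share one type
        if present and len(types) == 1:
            summary["min"] = min(present)
            summary["max"] = max(present)
        report[col] = summary
    return report


rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": None},
    {"id": 3, "region": "eu", "amount": 75.5},
    {"id": 4, "amount": 75.5},
]
for column, stats in profile(rows).items():
    print(column, stats)
```

In this sample, the `region` column reports three distinct values because “EU” and “eu” differ only in case. This is the kind of quality issue that profiling is meant to expose before the data is cleaned and transformed.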
According to Gartner, data preparation tools fall into four categories:
• Standalone data preparation tools
• Data integration tools
• Modern analytics and BI platforms
• Data science and machine learning platforms
The global research firm also sees new categories of tools emerging with data preparation capabilities:
• Data management/data lake enablement platforms
• Data engineering platforms
• Data quality tools
• Data integration specialists
Notable data preparation tools include:
• Alteryx Designer
• Cambridge Semantics Anzo
• Datameer Enterprise
• Infogix Data3Sixty Analyze
• Trifacta Wrangler
• Talend Data Preparation