Data has become the most important asset for fuelling growth and driving change in this age of technology. Data science, along with technologies such as machine learning and artificial intelligence, is having a profound impact on businesses and is rapidly becoming a key business differentiator in a highly competitive world. Seth DeLand, Data Analytics Product Manager at MathWorks, in an exclusive interview with Analytics Insight, talks about how data science platforms help organizations make better decisions and deliver superior results in less time.
Analytics Insight: What features do data-science platforms offer and what are the advantages of using such a platform?
Seth: Data science platforms make tools for data science accessible to a wider audience of users. This includes tools for connecting to data sources, preprocessing data, combining data from various sources, applying machine learning techniques, and deploying analytics to production systems. Traditionally, these tools have only been accessible to those with strong computer programming skills and were not designed to be easily integrated as part of a larger workflow. By providing these tools in an easy-to-use package, data science platforms offer data science access to a wider audience, helping organizations cope with the resourcing pains due to the small number of practitioners with formal data science training. Data science platforms also serve as an internal standard for data science workflows, facilitating collaboration between teams on data science projects.
Analytics Insight: What sorts of companies would have use for a data science platform?
Seth: We see interest in data science from a wide range of businesses – from industrial equipment companies looking to use data science for predictive maintenance, to financial services companies using data science to develop new trading strategies. In each of these applications, data is available or can be collected to solve long-standing business problems. For example, in predictive maintenance, operators of expensive industrial assets have found that they can reliably predict remaining useful life and optimize service schedules by applying machine learning techniques to the data generated by equipment sensors. Data science platforms make it possible for the engineering teams that develop and maintain the equipment to leverage their wealth of knowledge about how the equipment should operate. This idea of empowering the engineers, or “domain experts”, is often more appealing than hiring data scientists who have little knowledge of how the equipment operates.
Analytics Insight: For what purposes are data science platforms suitable, and for which are they not?
Seth: Data science platforms are suitable for exploratory data analysis, such as understanding trends, removing outliers, and performing statistical analysis, as well as for building machine learning models. To offer a complete workflow, some data science platforms include capabilities for developing a complete algorithm that contains the machine learning model as an important component. Additionally, data science platforms may provide tools that are specific to the type of data being used – for example, image processing techniques for image data, signal processing techniques for sensor data, and text analytics for text data.
Data science platforms are not suitable for the authoring of production code for applications such as networking infrastructure or web development. While some data science platforms may come close, they do not serve as a full replacement for developer-level integrated development environments.
Analytics Insight: Which criteria should companies use to choose a data science platform?
Seth: Companies should consider who in the organization will be using the data science platform. Business units will have expertise in their respective lines of business, but will likely need an easier-to-use tool than a centralized data science team would. Organizations should also consider which types of data they will be using. Many data science platforms were designed for working with marketing and sales data and will not scale to newer data sources such as image, video, audio, and sensor data. Another factor to consider is how important it is for the organization to differentiate with its data science programs. In competitive markets, organizations should look to adopt tools that provide more flexibility to customize the analysis so that they don’t end up competing on the data alone.
Analytics Insight: What distinguishes your data science platform from the competition?
Seth: MATLAB provides a wide variety of ways to access data from several sources, ranging from business data in databases, data warehouses, and Hadoop to engineering data from sensors, data historians, and industry-specific protocols. Innovation often occurs when various data sources are combined, so connecting to data, regardless of the format, is very important. We also focus heavily on making the tools used by data scientists available to domain experts. We deliver this by providing easy-to-use functions and apps for machine learning, deep learning, computer vision, signal processing, numerical optimization, and other advanced analytics technologies. Lastly, MATLAB provides deployment paths for running analytics on embedded devices as well as on IT infrastructure and in the cloud. The offering of multiple deployment paths continues to gain importance as the Internet of Things trend causes teams to rethink whether processing should be conducted on embedded devices at “the edge”, in “the cloud”, or in a hybrid manner.