5 Elements of Big data requirements

by November 2, 2019

The vast amount of data generated by various systems is driving a rapidly increasing demand for its consumption at various levels. Applications, IoT devices, and transaction systems generate billions or even trillions of records every day, which creates new opportunities. Analyzed well, these datasets yield meaningful insights and more accurate predictions for day-to-day business, which can improve the quality of services and profitability. This hurricane of data in the form of text, pictures, sound, and video, the so-called big data, warrants a specific framework to source it and move it through multiple layers of treatment before it is consumed. Knowing customers, market conditions, customer buying patterns, and current status, along with a steady environment, plays an important role in a business.

Understanding business needs for big data necessitates a new model for the software engineering life cycle, because defining the requirements of big data systems is different from defining those of traditional systems.

The needs of the big data system should be discovered in the initial stage of the software life cycle. The traditional requirements engineering framework and processes are insufficient to fulfill the needs of the organization. Traditional methods are limited to functional and a few non-functional requirements and focus on generic user requirements. Most of the time, users do not have enough insight into the potential of analytics and its features, which leads to a generic BI solution.


Requirement analysis is the process of defining user needs and requirements for a new or existing solution that is built to help users perform their work. It involves discussions with all the stakeholders to identify their business needs in the form of functional and non-functional requirements. These requirements are collated, validated, prioritized, analyzed, and measured before they are made part of the life cycle. The potential of big data can be discovered only if we have the insights. The insights may contain unknown patterns that can be explored with an in-depth analysis of the use case.

Requirement analysis is primarily grouped into 5 elements, namely:

1. Functional requirements

2. Non-functional requirements

3. BI and analytics use cases

4. Data exploration

5. Agile

1. Functional requirements – These are the requirements for the big data solution that needs to be developed, including all the functional features, business rules, system capabilities, and processes, along with assumptions and constraints. Though the functional requirements contain detailed information, they lack a 360-degree view. For example, suppose a channel management dashboard should be generated every day. When the requirements are collated for the channel dashboard, they may fail to cover all the aspects of channel management, resulting in a partial analysis.


2. Non-functional requirements – These define how the developed system should work. Apart from usability, reliability, performance, and supportability, there are many other aspects that the solution should address. Some of the important requirements are:

a. Security – Multiple levels of security, such as firewalls, network isolation, user authentication, encryption at rest using keys, encryption of data in transit using SSL/TLS, end-user training, intrusion protection, and intrusion detection systems (IDS), are key requirements for many modern data lakes.

b. Compliance – As big data solutions mature, industry-standard compliance requirements and regulations are taking center stage. The ever-increasing tangle of standards, rules, regulations, and contractual obligations multiplies the risk of non-compliance. Regulations like HIPAA (healthcare) and GDPR (European Union) protect customer privacy, while other regulations mandate that organizations keep track of customers' information for reasons such as fraud prevention. As a result, complying with an older law may conflict with, or even violate, a newer one.

c. Cloud platform – The selection of a cloud platform is specific to each organization. However, adherence to compliance and regulations, security, data governance, technology footprint, roadmap and partnership, migration supportability, regional availability of components and services, and cost are the prime factors when selecting a cloud service provider.

d. Self-serve data prep – This is an upcoming concept that enables business users, analysts, and data scientists to analyze and prepare datasets for further use without relying on data specialists.

e. Latency, data volume, and archival – Latency is how long it takes for data to get from the application to the data lake/datamart, where a business user can access it. Data volume is how much data is extracted daily from a source application into the data lake. It also covers the historical data required in the data lake/datamart to cater to the data needs of business users. Older data that is infrequently used needs to be taken out of the datamart/data lake. This periodic extraction of data from the datamart/data lake to low-cost storage is part of data archival.
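The latency and archival requirements above can be pinned down as simple, testable rules. The sketch below is only an illustration: the `Partition` structure, the 365-day retention window, and the 90-day idle threshold are assumptions made for the example, not values prescribed by this article.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass
class Partition:
    """One slice of a datamart table (hypothetical structure)."""
    name: str
    load_date: date       # day the slice was loaded into the datamart
    last_accessed: date   # last day a business user queried it


def latency_hours(created_at: datetime, available_at: datetime) -> float:
    """Hours between a record's creation in the source and its availability in the lake."""
    return (available_at - created_at).total_seconds() / 3600


def select_for_archival(partitions, today, retention_days=365, idle_days=90):
    """Pick slices that are past retention AND idle; both thresholds are assumptions."""
    retention_cutoff = today - timedelta(days=retention_days)
    idle_cutoff = today - timedelta(days=idle_days)
    return [
        p for p in partitions
        if p.load_date < retention_cutoff and p.last_accessed < idle_cutoff
    ]
```

Writing the rule down this way forces the stakeholders to agree on concrete numbers (how old, how idle) during requirements rather than during implementation.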

While we focus on functional and non-functional requirements, other important facets also define the success of a big data engagement. BI use cases and analytics patterns are the game changers: they act as a nucleus that helps ensure the big data engagement is accepted by the business community and that there are no surprises during implementation.

3. Use case – Use cases are grouped into 2 categories, namely BI and analytics use cases, depending on the requirements.

3.1 BI use case – A use case defines the actions needed to achieve a particular goal, along with the required features, so that the relevant KPIs can be defined and tracked.

This section starts where the functional requirements end. It provides a detailed view of the functional requirements by listing all the use cases, whether or not they are used in the engagement. Doing so creates a complete 360-degree view of the solution. Further, a use case is divided into multiple sub-sections, and each sub-section has its own detailed analysis. For example, in insurance, channel management is one of the popular use cases that many BI applications offer. Channel management has sub-sections like:

•  Sales from various channels for specific products

•  Sales behavior of sales associates, agents, and partners

•  Impact of rewards on various sales associates and partners

•  Partner retention strategy

•  Claims by each channel

•  Revisiting the product strategy based on business expansion and on underwriting processes, which in turn are based on claims ratios, and many more

These granular requirements for each use case ensure that there are no gaps in understanding the use case and its patterns. If any of these are missed during requirements gathering and taken up later in the program, they may derail the schedule and result in cost overruns.

3.2 Analytics use case – The first step for an analytics model is the identification of business use cases. These use cases differ from BI use cases in that they focus primarily on analytical needs. Building a requirements model to specify a use case at the beginning of the analytics work is key. Just defining the use case is not enough; these use cases need to be explored through the following critical items:

•  Business objective and measure

•  Characteristics like business processes, relationships, and dependencies

•  Selection and preparation of the data

•  Data validation

To illustrate, product optimization and pricing are popular use cases in insurance. The business objective is to build and optimize a product that is best suited for dynamic and risky market conditions, i.e., to determine at what price the product with its features can be sold.

Along with the objective, characteristics like market conditions, risk patterns, claims history, cost, revenue, expenses, profit, buying patterns, pricing sensitivity, customer behavior, customer choice, and geography need thorough analysis. In some cases, additional factors like weather, population, the age of the population, and many more need to be considered. All of these play an active role in deciding the pricing strategy. For instance, if an insurance company is strategizing its product pricing for the state of Florida, it needs to consider many of these factors along with additional ones such as the age of the population. The aging population is a key input in deciding the pricing, as in counties like Sumter and Charlotte it makes up, on average, 40% to 50% of the population. All the characteristics need to be analyzed in detail.

The specified requirement model consists of all the characteristics, with their relationships and dependencies, that influence the decision-making process of a use case. The decision-making process also brings out the direct and indirect impacts on other organizational measures and processes.
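One way to make such a requirement model concrete is to record the characteristics as a small dependency graph and then ask which measures a given characteristic touches, directly or indirectly. The following is a minimal sketch; the `INFLUENCES` edges are purely illustrative assumptions for the pricing use case and would need to be agreed with the business.

```python
from collections import deque

# characteristic -> characteristics it directly influences.
# Every edge here is a hypothetical example, not a finding of this article.
INFLUENCES = {
    "age of population": ["risk patterns"],
    "weather": ["risk patterns"],
    "risk patterns": ["claims history"],
    "claims history": ["cost"],
    "cost": ["price"],
    "pricing sensitivity": ["price"],
}


def impacted_by(characteristic, influences):
    """Return everything directly or indirectly influenced by `characteristic`."""
    seen = set()
    queue = deque([characteristic])
    while queue:
        current = queue.popleft()
        for nxt in influences.get(current, []):
            if nxt not in seen:   # guard against revisiting (and against cycles)
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Calling `impacted_by("weather", INFLUENCES)` walks the graph breadth-first, which is a convenient way to review indirect impacts with stakeholders before any model is built.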

4. Data exploration – Effective data selection and preparation are the key ingredients for the success of a use case that is to be used for accurate and decisive predictions. First, the data required to implement a use case needs to be identified. During this process, an additional dataset may be required as a reference (in Florida, reference data on climate patterns and their changes over the last 5 years is a key input in formulating the pricing of an insurance product). Data selection is followed by data correction activities such as removal of duplicates, standardization, data invention, masking, and integration of data, which fix all or most of the data issues that are the number one barrier for analytical models.
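Three of the correction activities named above (standardization, duplicate removal, and masking) can be sketched in a few lines. This is an illustration under stated assumptions: the record fields (`customer_id`, `state`, `email`) are hypothetical, and the truncated SHA-256 digest stands in for whatever masking technique the organization's security and compliance requirements actually call for.

```python
import hashlib


def standardize(record):
    """Trim whitespace and normalize case so equal values compare equal."""
    return {
        "customer_id": record["customer_id"].strip(),
        "state": record["state"].strip().upper(),
        "email": record["email"].strip().lower(),
    }


def mask_email(email):
    """Replace the email with a short digest (illustrative only, not a vetted scheme)."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()[:12]


def prepare(records):
    """Standardize, drop duplicates on (customer_id, email), then mask the email."""
    seen = set()
    cleaned = []
    for raw in records:
        rec = standardize(raw)
        key = (rec["customer_id"], rec["email"])
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        rec["email"] = mask_email(rec["email"])
        cleaned.append(rec)
    return cleaned
```

Standardization runs before the duplicate check on purpose: two records that differ only in case or stray spaces would otherwise both survive.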

After data preparation, the accuracy of the analytical model depends largely on data validation activities. The prime goal of validation is to define a dataset that is used to verify the quality of the analytical models and to nullify or limit issues like noisy data, outliers, overfitting, and underfitting.
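As a rough illustration of the validation idea, the sketch below holds back part of the data as a validation set and compares the model's error on the two sets. The function names, the 20% hold-out fraction, and the 25% tolerance are assumptions chosen for the example; a real engagement would pick them per use case.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold back a fraction of rows for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def looks_overfit(train_error, validation_error, tolerance=0.25):
    """Flag a model whose validation error exceeds training error by more than `tolerance`."""
    return validation_error > train_error * (1 + tolerance)
```

A model that scores well on the training rows but trips `looks_overfit` on the held-back rows has probably memorized noise; a model with high error on both sets is more likely underfitting.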

The details of data preparation and validation activities will be taken up in an upcoming post, since the focus of this article is on requirements.


5. Agile – Agile is a methodology for executing a big data engagement incrementally and systematically within fixed time frames, so that the business can see benefits within a short period rather than waiting for a longer duration. In agile, user stories are the means of defining and collecting functional and non-functional requirements in chunks that are of value to the customer. Below are a few examples of user stories.

•  As an underwriter, I would like to view the claim ratio with geography and time.

•  As a sales manager, I would like to assess the impact of rewards on various sales associates and partners over a specific period so that I can decide on a new rewards plan.

•  As a marketing strategist, I would like to analyze the effect of a recent campaign to understand cannibalization of products.

As per Bill Wake’s INVEST model, user stories should be independent, negotiable, valuable, estimable, small, and testable so that they can be modularized for effective implementation. The user stories from the product backlog are prioritized before being added to the sprint backlog during sprint planning and are “burned down” over the duration of the sprint. Dependencies, story points, team capacity, productivity, and timelines are also discussed during sprint planning. Finally, after all the stories and sprints have been implemented, the backlog is flagged as completed.
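The step of moving prioritized stories from the product backlog into a sprint backlog can be sketched as a simple capacity check. This is a deliberately naive illustration: the `Story` fields, the priorities, and the story points are hypothetical, and real sprint planning also weighs dependencies and team judgment.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    priority: int  # 1 = highest priority
    points: int    # estimated story points


def plan_sprint(product_backlog, capacity_points):
    """Take stories in priority order while they still fit the team's capacity."""
    sprint_backlog = []
    remaining = capacity_points
    for story in sorted(product_backlog, key=lambda s: s.priority):
        if story.points <= remaining:
            sprint_backlog.append(story)
            remaining -= story.points
    return sprint_backlog, remaining
```

The leftover capacity returned by `plan_sprint` is the starting point of the burn-down: as stories are completed during the sprint, their points are subtracted from the committed total.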

If you have any queries or comments, please write to basu.darawan@gmail.com