Agile and Big Data – A Match Winning Combination

by August 31, 2019

Agile and Big Data

A business needs a results-oriented framework, supported by tools and technologies, to analyze the available data and make the required strategic, tactical and operational decisions. That analysis can draw on data found in OLAP reports, dashboards, or scorecards.

BI, Big Data analytics, predictive analytics and mining models provide the necessary operational insights. These insights are crucial for decision-making and can have far-reaching implications for a business's outcomes.

However, the general perception in the software industry is that BI/Big Data works well with a waterfall or iterative model. These traditional models follow a fixed sequence that begins with analysis, moves to design and development, and ends with deployment. The issue is that most businesses do not assess the risks, the costs involved, and the time required until the end of the project life cycle. So, if any correction needs to be incorporated, the business has to wait until the end of the project.

Hence, an Agile methodology is needed to achieve a higher success rate for BI projects.


Agile Methodology

Agile is a software development methodology for executing a project incrementally and systematically within fixed time frames, so that the business sees benefits within a short period rather than waiting for a long one. The appeal of Agile is that it delivers output periodically, so results can be validated and corrective action taken early. Traditional methods such as iterative or waterfall run from analysis through design and development to deployment, and the business does not realize any value until the end of the project life cycle. These methods are risky, costly and less efficient because the business must wait until the end of the project to take any corrective action.

As mentioned above, Agile is a methodology, or an organizational philosophy, for developing a software solution incrementally in short cycles of 2 to 4 weeks so that the development process stays aligned with changing business needs. Instead of a big-bang, single-pass effort that runs for several months from development to deployment, all the requirements and risks are discussed upfront. Agile follows a framework of frequent feedback in which a working product increment is delivered after each iteration of 2 to 4 weeks.

What A Regular Project Flow Looks Like

A regular project begins with an analysis of the various components, followed by an ETL strategy, then data model design, and ends with report development. Taking these components from analysis through development/testing to deployment takes several months before the business can see any result or benefit. The consequences of this flow are:

•   Going back and changing the design after testing and before deployment is quite difficult

•   Checking whether the product works is not possible until the last day

•   Some of the risks remain unknown until the last day

•   Any delay hampers downstream applications


BI/Big Data Projects

A BI/Big Data project gives a business the necessary framework, supported by tools and technologies, to take the required strategic, tactical and operational decisions on the basis of analysis. The analysis could take the form of reports, dashboards, scorecards, and analytics. BI/Big Data provides the strategic and operational insights into the business that are crucial for decision-making. These insights are delivered through dashboards, OLAP reports, predictive analytics, and mining models, and they help the organization take key decisions with far-reaching implications.


Agile for BI/BigData Projects

As noted earlier, the general perception in the software industry is that BI/Big Data works well with a waterfall or iterative model. Even though BI projects fail more often than they succeed, the industry's first choice is still the non-agile model. Most BI projects have the following components to develop:

•   ETL packages/mappings

•   Data model

•   Data Quality

•   Reports/Dashboards

The project begins with an analysis of the various components and proceeds through the ETL strategy and the design of the data model to the development of reports. Taking these components from analysis through development/testing to deployment takes several months before the business can see any result or benefit. This results in the following:

•   It is difficult, and extremely risky, to go back and change the design once testing is done and before deployment

•   Users cannot see any working product until the last day

•   Some of the risks remain unknown until the last day

•   Any delay will further hamper the downstream applications

So how can we avoid all these issues and ensure that the BI project is executed successfully? The answer is Agile. Are you wondering how one delivers a BI project with Agile? Is it riskier than the waterfall or iterative model? How does one ensure that the BI project is a blockbuster? Without a well-thought-out plan, Agile is even more dangerous than any other model. The following paragraphs spell out the mantra for using Agile successfully in BI projects.

For illustration, we use the insurance domain and its related modules. Every project starts with specific business requirements and related topics so that the specific problems and pain areas can be addressed.

An insurance policy is essentially a legal contract between an insurance company and a policyholder that provides liability coverage for many types of risk. It covers property and assets against loss or damage in the event of disasters such as fire, theft and natural calamities. The insurance domain consists of many business processes that define the nature of the business. Some of the key processes are policies, claims, underwriting and renewal. The key asks from any insurance firm are as follows:

•   How will one analyze the data?

•   What extent of granularity can be applied to the analysis?

•   Is there any analysis to help one in setting the premium pricing?

•   How will one differentiate between risky and non-risky properties?

So, the insurance industry relies heavily on analytics to address these key queries. The data is scattered across multiple systems in various structured and unstructured forms. Data volumes are growing every day, which challenges insurance firms to continuously turn these data sets into meaningful insight. Big Data technologies and tools allow firms to find these insights and help them take key decisions. Big Data offers a framework to continuously bring these data sets in, even in real time, by addressing key data issues, building models and analyzing the data.

Bringing in Modularity – The Agile concept

By adopting modularity and functionality, any BI/Big Data program can be implemented in Agile and the business can reap all its benefits. The Agile concept is built on the basic principle of modularity. The backlog items/requirements are collected and collated in a highly structured way.

A backlog is an important Agile artifact in which all the user stories are stored. These backlog items (user stories) are aligned with the various business processes. It takes a good amount of time to interview the business users through brainstorming sessions, surveys, and one-on-one and group discussions, followed by verification of the understanding. Once identified and confirmed, the business processes are further separated into modules and sub-modules based on functionality. This step is called mapping: it maps the modules to business processes. Mapping is a key ingredient of the overall Agile framework, and the success of an Agile project depends a great deal on how well the modularity and functionality are structured. A good structure lets the team manage changes to the project at any stage and manage the outcome efficiently.
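The mapping step can be pictured as a simple grouping of user stories by business process and module. The following is a minimal sketch in Python, not a prescribed tool: the `UserStory` class, the `map_backlog` function, and the module names (`Issuance`, `Servicing`, `Intake`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    title: str
    business_process: str  # e.g. "Policy" or "Claims"
    module: str            # hypothetical module name within the process


# Illustrative backlog; the process/module assignments are assumptions.
backlog = [
    UserStory("Policy issuance", "Policy", "Issuance"),
    UserStory("Policy servicing (endorsement)", "Policy", "Servicing"),
    UserStory("Notice of loss", "Claims", "Intake"),
    UserStory("Claim estimation", "Claims", "Assessment"),
    UserStory("Claim settlement", "Claims", "Settlement"),
]


def map_backlog(stories):
    """Group story titles by business process, then by module."""
    mapping = defaultdict(lambda: defaultdict(list))
    for story in stories:
        mapping[story.business_process][story.module].append(story.title)
    # Convert to plain dicts so the result is easy to print and compare.
    return {process: dict(modules) for process, modules in mapping.items()}


process_map = map_backlog(backlog)
for process, modules in process_map.items():
    print(process, "->", modules)
```

Keeping the process and module on each story means a changed requirement can be traced to the affected module with a simple lookup.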

The next big exercise for an Agile team is to identify module dependencies. Based on the functionality, it is key to identify the inter-module and inter-business-process dependencies so that a mapping of the relationships between the various modules is developed. This exercise helps the Agile team in many ways:

•   Know-how of changes and their impact on the various modules

•   Divide and rule: parallel development can proceed where there are no dependencies between modules, significantly cutting elapsed time

•   Seamless integration

•   Easy adaptation when the requirements change at any stage of the project

Once the modularity and the constraints or dependencies are identified and baselined, the modules are prioritized so that the Agile team, along with the business, can decide on the order of development and deployment.
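One way to turn a dependency mapping into a development order is a topological sort that groups modules into waves, where every module in a wave has all its dependencies satisfied and can be built in parallel with the others. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the module names and the dependencies between them are invented for illustration and are not taken from a real project.

```python
from graphlib import TopologicalSorter

# Hypothetical module dependencies: each key depends on the modules in its set.
dependencies = {
    "Policy data model": set(),
    "Claims data model": set(),
    "Policy ETL": {"Policy data model"},
    "Claims ETL": {"Claims data model", "Policy data model"},
    "Policy reports": {"Policy ETL"},
    "Claim ratio report": {"Claims ETL", "Policy ETL"},
}


def development_waves(deps):
    """Return lists of modules; modules in the same list can be built in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = development_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

With these assumed dependencies the two data models form the first wave, the two ETL modules the second, and the reports the third. Business priority would still decide the order within and across waves where the dependencies allow a choice.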


Sprint Planning

The user stories in the backlog are prioritized but not yet estimated. Estimation requires a detailed discussion as part of sprint planning. Hence the Agile team takes part in a grooming session to discuss all the user stories along with their priority and dependencies, story points, the team's capacity, productivity and timelines. Once each story is assigned story points, the sprint commitment is set based on how many stories the team has the capacity to develop in each sprint. This exercise yields the number of sprints needed to develop all the stories in the backlog.
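The arithmetic behind this can be sketched as follows. This is a simplified illustration that assumes a fixed capacity per sprint and fills sprints strictly in priority order; the function name `plan_sprints`, the story titles, the story points and the capacity of 20 are all hypothetical.

```python
def plan_sprints(stories, capacity):
    """Assign stories, already in priority order, to consecutive sprints.

    stories: list of (title, story_points) tuples, highest priority first.
    capacity: story points the team commits to per sprint.
    """
    sprints, current, used = [], [], 0
    for title, points in stories:
        if points > capacity:
            raise ValueError(f"{title!r} exceeds sprint capacity; split the story")
        if used + points > capacity:
            # The current sprint is full: close it and start the next one.
            sprints.append(current)
            current, used = [], 0
        current.append(title)
        used += points
    if current:
        sprints.append(current)
    return sprints


# Hypothetical estimates produced in a grooming session.
estimated_backlog = [
    ("Policy issuance data model", 8),
    ("Policy issuance ETL", 8),
    ("Endorsement ETL", 5),
    ("Policy issuance report", 5),
    ("Endorsement report", 3),
]

sprint_plan = plan_sprints(estimated_backlog, capacity=20)
print(f"Sprints needed: {len(sprint_plan)}")
for number, titles in enumerate(sprint_plan, start=1):
    print(f"Sprint {number}: {titles}")
```

Because the sketch never pulls a lower-priority story forward to fill leftover capacity, it may leave a few points unused in a sprint. A real team would weigh that trade-off during planning.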


Sprint Execution

To make this concrete, consider a data lake/data mart solution for an insurance firm. The solution encompasses pricing models, product analysis, risk cost per policy, claim activity management and claim scoring/forecasting. First, all the backlog items should be arranged and aligned with the business processes. In this case, the business processes are Underwriting, Reinsurance, Policy, and Claims. The backlog items are grouped under these processes, for example:


•   Need to see the evaluation and sharing of the risk factors

•   Need a report for Primary Factor Analysis


•   Rating and Pricing of a product analysis

•   User can see various Product Quotes

•   Quote Acceptance

•   Quote bind

•   Price optimization decisions


•   Policy issuance

•   Policy Servicing (Endorsement)


•   Notice of loss

•   Claim estimation

•   Claim ratio

•   Claim settlement

Once the business processes are identified, the order in which they are taken up for BI/Big Data implementation needs to be decided. To start, the business owner identifies the first business process for implementation. In this case, it is Policy.

The backlog items for this business process are created in a sprint grooming session that covers user stories, story point estimation, acceptance criteria, definition of ready, definition of done and priorities. The product owner then prioritizes the user stories in the backlog across sprints for development.

To deliver this business process seamlessly, the Scrum ceremonies should be conducted in every sprint: Daily Scrum (where the team identifies dependencies and blockers), Sprint Planning (where the stories for the current sprint are prioritized), Sprint Review (where the feature is demonstrated) and Sprint Retrospective (where the team identifies improvements to its Scrum process).

The modularity and the identified dependencies ensure that development is carried out with careful planning and reduce the risk of dependency-related bottlenecks. Also, if a story is blocked during the current sprint, it can be replaced, with the product owner's acceptance, by another priority story that was clarified during the grooming session, lowering the risk of an incomplete sprint.
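The swap of a blocked story can be sketched as a small selection rule: take the highest-priority groomed story that fits within the points freed by the blocked one. The function `replace_blocked`, the story titles and the points below are hypothetical, and the result is only a proposal that the product owner still has to accept.

```python
def replace_blocked(sprint, blocked_title, groomed_backlog):
    """Swap a blocked story for the highest-priority groomed story that fits.

    sprint: list of (title, points) in the current sprint.
    groomed_backlog: list of (title, points), highest priority first,
        already clarified in grooming and not yet scheduled.
    Returns (new_sprint, replacement_title or None).
    """
    blocked = next((s for s in sprint if s[0] == blocked_title), None)
    if blocked is None:
        raise ValueError(f"{blocked_title!r} is not in the sprint")
    remaining = [s for s in sprint if s[0] != blocked_title]
    freed_points = blocked[1]
    for candidate in groomed_backlog:
        if candidate[1] <= freed_points:
            # Proposal only: the product owner must still accept the swap.
            return remaining + [candidate], candidate[0]
    return remaining, None


# Hypothetical sprint content and groomed backlog.
current_sprint = [("Policy issuance ETL", 8), ("Policy issuance report", 5)]
groomed = [
    ("Claim ratio report", 13),
    ("Endorsement report", 5),
    ("Notice of loss ETL", 3),
]

new_sprint, replacement = replace_blocked(
    current_sprint, "Policy issuance report", groomed
)
print("Replacement:", replacement)
print("Sprint now:", new_sprint)
```

In this example the 13-point story is skipped because it does not fit the 5 points freed, so the next story in priority order is proposed instead.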

The completion of the first business process (which is policy) will lead to the next business process based on the order in which the business would like to get it implemented.

So, by adopting modularity and functionality, any BI/Big Data program can be implemented in Agile and reap all the benefits. The whole exercise needs to be planned well with all the key stakeholders so that there is no ambiguity and the whole team can work on these modules with greater focus.