A Product Is Only as Good as Its Data: 4 Common Data Mistakes Businesses Make

February 20, 2020


Years ago, product companies talked constantly about the need for data. The power of collecting more data from a larger variety of sources was becoming apparent, and it is part of the reason every company is now a tech company.

Today, the conversation is shifting toward examination of data quality and control, and I saw this reflected at this year’s Product World 2020. It turns out that a whole lot of data can get messy. When different departments of a product company are trying to solve the same problem in their own way, they end up working with different sources of data and, as a result, inconsistent information. They then build or iterate on a product with that inconsistent information, and wonder why they’re struggling to remain competitive.

In the 20 years I’ve worked with large-scale data analytics, a period that includes the advent of technological innovations that let us do more with it, I have seen businesses describe themselves as “data-driven” when they aren’t, really. They wrestle with a few issues around how they analyze and apply data-driven insights, but most notably, there’s a general lack of examination of how the data is mined and where it’s taken from, and no clear understanding of data’s evolution throughout the decision-making process.

If I had to pick the four most common mistakes businesses make around their data-driven processes, they would be these:


Mistaking Correlation For Causation

Eleven years ago, the Scheie Eye Institute in Pennsylvania conducted a study with the goal of understanding why the nation was seeing increased rates of nearsightedness. Data gathered from hundreds of families led the research team to conclude that a night light in a baby’s room increased the chances of developing myopia. One year later, that study was debunked: it had mistaken correlation for causation and omitted other influencing factors, such as the families’ genetic predispositions. (Parents with nearsightedness are more likely to use a night light, and their babies may have inherited their myopia genes.) This was a lower-stakes study, but as we increasingly rely on data analytics to drive bigger decisions, failing to consider confounders has the potential to wreak havoc.
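To see how a confounder can manufacture a link like the night-light one, here is a minimal simulation sketch. Every rate in it is invented for illustration and is not taken from the study: parental myopia raises both the chance of using a night light and the chance of the child being myopic, while the night light itself has no effect at all.

```python
import random

def simulate(n=50000, seed=0):
    """Generate hypothetical families as (parent_myopic, night_light, child_myopic)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        parent_myopic = rng.random() < 0.3
        # Assumed rates: myopic parents use a night light more often.
        night_light = rng.random() < (0.7 if parent_myopic else 0.2)
        # Child myopia depends only on the parent, never on the night light.
        child_myopic = rng.random() < (0.4 if parent_myopic else 0.1)
        rows.append((parent_myopic, night_light, child_myopic))
    return rows

def myopia_rate(rows, night_light, parent_myopic=None):
    """Share of myopic children in a group, optionally restricted by parent status."""
    group = [
        r for r in rows
        if r[1] == night_light and (parent_myopic is None or r[0] == parent_myopic)
    ]
    return sum(r[2] for r in group) / len(group)

rows = simulate()

# Naive comparison: night light vs. no night light, ignoring the parents.
raw_gap = myopia_rate(rows, True) - myopia_rate(rows, False)

# Same comparison within each parent group, which controls for the confounder.
gap_myopic_parents = myopia_rate(rows, True, True) - myopia_rate(rows, False, True)
gap_other_parents = myopia_rate(rows, True, False) - myopia_rate(rows, False, False)

print(f"raw gap: {raw_gap:.3f}")
print(f"gap among myopic parents: {gap_myopic_parents:.3f}")
print(f"gap among other parents: {gap_other_parents:.3f}")
```

In this toy setup, the naive comparison shows a clear gap, while the gap inside each parent group stays close to zero. Splitting by a suspected confounder is only one simple check, not a full causal analysis.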


Inaccurate Data Collection

In a business context, working with incomplete data or unreliable data sources can waste a ton of time and money. Let’s say an app developer wants to measure how users prefer to log in to the app (using email vs. using Facebook login). Surprisingly, they see that Facebook logins as a percentage of all login attempts come out higher than 100%. So, what happened? It turns out the data was not distinguishing logins from general Facebook activity on the app, so even when a user shared their app activity on Facebook, it was recorded as a Facebook login event.
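The login example can be sketched in a few lines. The event names, channels, and helper functions below are hypothetical and only illustrate the kind of sanity check that would have caught the problem early: a share of a total can never fall outside 0 to 100%.

```python
# Hypothetical event log as (event_name, channel) pairs, invented for illustration.
events = [
    ("login", "email"),
    ("login", "facebook"),
    ("share", "facebook"),
    ("share", "facebook"),
    ("login", "email"),
    ("share", "facebook"),
    ("share", "facebook"),
]

def share_of_logins(events, counts_as_facebook_login):
    """Events matching the predicate, divided by all login attempts."""
    logins = [e for e in events if e[0] == "login"]
    matched = [e for e in events if counts_as_facebook_login(e)]
    return len(matched) / len(logins)

def is_valid_share(value):
    """A share of a total must lie between 0 and 1."""
    return 0.0 <= value <= 1.0

# Buggy definition: any Facebook-related event is treated as a login.
buggy_share = share_of_logins(events, lambda e: e[1] == "facebook")

# Fixed definition: only login events made through Facebook are counted.
fixed_share = share_of_logins(
    events, lambda e: e[0] == "login" and e[1] == "facebook"
)

for name, value in [("buggy", buggy_share), ("fixed", fixed_share)]:
    status = "ok" if is_valid_share(value) else "IMPOSSIBLE, check event definitions"
    print(f"{name}: {value:.0%} ({status})")
```

A range check like this does not prove the metric is right, but it cheaply flags definitions where the numerator is not a subset of the denominator.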

Now imagine this happening with educational apps, something we’re seeing more of as education moves further into the 21st century. Companies applying data analytics to personalized, data-driven curricula are working with an extremely diverse group of users, with variances ranging from geographical to cultural. Are we accurately assessing learning patterns in the absence of this contextual data, or what I call benchmark analytics?

Businesses can avoid this by first understanding the goal of the analysis. Have a hypothesis before conducting tests, as that is the only way to determine whether the kind of data being used makes sense, but don’t waste time trying to prove an assumption or belief. Every A/B test is expensive: each one warrants code changes and takes time, especially pricing tests. If someone insists US$2.99 is the best price point for the app store, don’t spend months trying to prove it. In the end, it will only increase the complexity of your code and waste time.


Being Stingy With Data

One of the biggest issues we found in data analytics is data quality. In the midst of mission control, few of us stop to ask, “do I trust my data?”

Data evolves with any product, consumers change their minds, and the more agile a company becomes, the harder it is to keep track of the changes.

Traditional data analytics is full of great charts and dashboards. But when someone needs something slightly different than the usual, it typically requires opening a ticket–followed by the person on the other end providing data that isn’t exactly what was needed…and the start of a 20-email thread over a few days. Then repeat this process again next time.

This does not work well with agile products that require a lot of on-the-fly change, which is why the industry needs to see independent data investigation open to more people across an organization. By allowing any team member to pull up data and go deeper (or in different directions, without opening a ticket), an organization is easing bottlenecks in the crucial discovery phase–all without learning to code or catching up on complex background knowledge.

Businesses should do three things to make this happen. First, provide teams with a user-friendly query portal that anyone can intuitively navigate; otherwise, the barrier to entry for non-analysts is too high. Second, facilitate communication in one place. A huge pitfall of being constantly reachable (email, Slack, phone, etc.) is that different pockets of information are communicated by different people over different mediums (and that Slack channel likely also has some conversations about where to go for lunch). Create a space within the analytics environment where multi-team conversations can take place and remain searchable and transparent. (This will also prevent teams from doing double work.) Finally, incorporate automation using machine learning technology. The more employees within a business generate insights around data, the smarter the diagnostic aspect of data analytics technology will get.

Time is another significant factor in maintaining data quality, but as teams race to be the ones to find the solution to the problem, it’s tempting to cut corners. After testing, time remains a factor: what holds true today for user behavior may look entirely different in six months.

The solution is to run queries across a longer time frame and get context for specificity. Look at volumes, look at patterns. And remember that how and why teams are looking at data will impact their results, just like those blinding night lights.
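As a small illustration of why the time frame matters, the sketch below uses made-up daily user counts with a weekend dip. The numbers and helper names are hypothetical. A query over only the last two days suggests a collapse, while four weeks of data show the same dip recurring every weekend.

```python
from statistics import mean

# Hypothetical daily active users for one week (Mon..Sun), with a weekend dip.
week = [100, 102, 101, 99, 98, 60, 58]
daily_users = week * 4  # four identical weeks, for simplicity

def window_mean(series, days):
    """Average of the most recent `days` values."""
    return mean(series[-days:])

def same_weekday_values(series, period=7):
    """Values on the same weekday as the latest data point, newest first."""
    return series[::-1][::period]

short_view = window_mean(daily_users, 2)   # last weekend only
long_view = window_mean(daily_users, 28)   # four full weeks

print(f"2-day average:  {short_view:.1f}")
print(f"28-day average: {long_view:.1f}")
print(f"latest value vs. earlier same weekdays: {same_weekday_values(daily_users)}")
```

Comparing the latest value against the same weekday in earlier weeks is one simple way to get context; real data would also need to account for trends, holidays, and product changes.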


Allowing Communication Disarray

It’s 7 a.m. and the CEO sends out an angry email saying there’s a glitch in the product, and someone, anyone, needs to fix it immediately. Different teams will race toward the solution, each pulling data from different sources (or using the same data, but pulled from different time frames). Whoever makes the data match up with a proposed solution wins.

This causes a lot of problems. Primarily, it’s sloppy–all of this is communicated over group emails, often among multiple threads. By the time it comes to pull the trigger on a decision in the meeting room, nobody can really say where that chart came from, or whether it’s up-to-date.

To avoid this, enable employees to build on ideas while also being able to organize and track their data’s journey. This guarantees that analysis comes out of a single source of truth, and that the limited time in that meeting isn’t spent asking how this data came together–it should already be apparent.


Not Aligning Infrastructure With Resources

Data is the new oil, and if product businesses can learn anything from the oil barons, it’s that it is always wise to have control over one’s data, from both security and access standpoints.

For businesses, that means owning their own data warehouses. But “owning” data can mean different things to different businesses.

Thanks to cloud and SaaS providers, there’s no need to reinvent the wheel; businesses don’t need to implement huge infrastructure initiatives with built-in best practices–those options are already out there.

On the other hand, organizations that choose to build their own enterprise software for big data analytics will require a totally different set of expertise, big cultural changes and long query-to-resolution cycles. Businesses that do go this route need to first take stock of whether they can take on all of the engineers and product managers required to run it. Second, they’ll need to become well-versed in how the best companies out there have successfully done this. It should never happen in a vacuum.

All of these missteps touch on a foundational lesson every leader will learn as we enter a new era of analytics capabilities: data analytics is a mindset and a process, not just a piece of software or a feature. To remain on the cutting edge of analytics, businesses must find ways to create a data-driven culture that values a single source of truth, sufficient context, and efficient, purposeful collaboration.


About the Author

Alex Li is founder of Kubit, a smart analytics platform that makes deep data discovery and insights accessible for everyone.