Is Uncertainty in Data Analysis a Big Challenge for Business?

We are not good at thinking about uncertainty in data analysis, and in 2022 we need to be.

When people talk about uncertainty in data analysis, and when they discuss big data, quantitative finance, and business analytics, we use a broader notion of what data analysis is. We would like to push the idea that data analysis is any time you use data to answer questions and to guide decision-making. That includes a lot of science, which is often about answering questions; a lot of engineering, where you design a system to achieve a particular goal; and, of course, decision-making at the individual, business, or national public policy level. Uncertainty in data analysis is involved in all of those pieces.

One part of data science is making predictions, but the fact that we live in an uncertain world is interesting in its own right, because as a culture and a society we use probability to think about uncertainty. That raises the question of whether we as humans are actually good at thinking probabilistically. We do seem to have some instinct for probabilistic thinking, even as young children. We do something like a Bayesian update: a statistical method that assigns probabilities or distributions to events (such as rain tomorrow) or parameters (such as a population mean) based on experience or best guesses before experimentation and data collection, and then applies Bayes' theorem to revise them in light of the data.
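To make the idea of a Bayesian update concrete, here is a minimal Python sketch. The function name and all of the numbers (a 30% prior for rain, and how often dark clouds precede rainy and dry days) are made up for illustration; only Bayes' theorem itself is standard.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' theorem."""
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    evidence = numerator + (1 - prior) * p_evidence_if_false
    return numerator / evidence


# Hypothetical numbers: a 30% prior belief in rain tomorrow, and dark evening
# clouds that appear before 80% of rainy days but only 20% of dry days.
posterior = bayes_update(prior=0.30, p_evidence_if_true=0.80, p_evidence_if_false=0.20)
print(f"P(rain | dark clouds) = {posterior:.2f}")
```

The prior belief is revised upward because the evidence is more likely when the hypothesis is true than when it is false.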

When we get new data or new evidence, we update our beliefs, and in some cases we do a pretty good approximation of an accurate Bayesian update, typically for things in the middling range of probability, from about 25% to 75%. At the same time, we are terrible at very rare events and pretty bad at small probabilities, and there are a number of ways we can be consistently fooled because we are not doing the math. We are doing approximations to it, and those approximations fail consistently in ways that behavioral psychologists have pointed out, such as confirmation bias and other cognitive failures. All of this knowledge about how we can be good and bad at thinking probabilistically bears on the role of uncertainty in decision-making in general. We make decisions based on the data we have, but much of the time the quality of a decision is rated by the quality of its outcome, which isn't necessarily the correct way to think about these things.

Where is Uncertainty in data analysis prevalent in society?

We can witness uncertainty in data analysis in election forecasting and in issues related to health and safety. Those are all cases where we are weighing risks and interventions that have certain probabilities of good outcomes and certain probabilities of side effects, where sometimes our heuristics are good and other times we make really consistent cognitive errors. There are a lot of cognitive biases, and one that we fall prey to constantly, whatever its proper name, is generalizing from a small sample: we see something occur several times and conclude that's probably the way things work.

Doctors have had a version of that in the past. They would often make treatment decisions based on their own patients ("Such-and-such a drug has not worked well for my patients, and I've seen bad outcomes with my patients") rather than on large randomized trials. We now have a lot of evidence that randomized trials are a more reliable form of evidence than generalizing from small numbers. The ways we get health wrong are some of the same ways we get safety wrong, and there are ways to combat this. Certainly, one of the problems is that we are very bad at small risks and small probabilities.
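The gap between a handful of one's own patients and a large trial can be put in rough numbers. The sketch below assumes a hypothetical drug that truly helps 60% of patients and uses the textbook normal approximation for the margin of error of a proportion; that approximation is crude for very small samples, so read the small-sample figure as indicative only.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p observed in n cases."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical scenario: a drug that truly helps 60% of patients.
true_rate = 0.60
for n in (10, 100, 10_000):
    moe = margin_of_error(true_rate, n)
    print(f"n = {n:>6}: observed rate plausibly within +/- {moe:.1%} of {true_rate:.0%}")
```

With ten patients, the observed success rate can easily land far from the true one, which is why impressions formed from a few cases are unreliable.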

There's some evidence that we can do a little bit better if we express things in terms of natural frequencies. If we tell you that something has a 0.01% probability, you might have a really hard time making sense of that, but if we tell you that it's something like one person out of 10,000, then you might have a way to picture it. You could say, "Well, okay. At a baseball game, there might be 30,000 people, so there could be three people here right now who have such-and-such a condition." So expressing things in terms of natural frequencies might be one thing that helps. Essentially, these are linguistic technologies: adopting forms of language that we know work. We think graphical visualizations are important, too. We have an incredibly powerful tool in our vision system, which can take in a huge amount of data and process it quickly, so visualization is, we think, one of the best ways to get information off a page and into someone's brain.
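The natural-frequency rephrasing above is simple arithmetic, sketched here in Python. The helper names are hypothetical and exist only for this illustration.

```python
def as_natural_frequency(probability):
    """Describe a probability as 'about 1 in N'."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return f"about 1 in {round(1 / probability):,}"


def expected_count(probability, crowd_size):
    """Expected number of people affected in a crowd of the given size."""
    return probability * crowd_size


p = 0.0001  # 0.01% written as a fraction
print(as_natural_frequency(p))
print(f"expected in a crowd of 30,000: {expected_count(p, 30_000):.0f}")
```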

The most important misconceptions about uncertainty that data-oriented educators need to correct concern probabilistic predictions. We think that's a big one. Another arises when you summarize a distribution: if we just tell you the mean, people generally assume the distribution is something like a bell-shaped curve, and we have some intuition for what that's like. If we tell you that the average human being is about 165 centimeters tall, or perhaps a bit more than that, you get a sense of it: "So probably some people are over 200, and probably some people are less than 60, but there probably isn't anybody who is a kilometer tall." We have a sense of that distribution.

Technologies best suited for communicating Uncertainty in data analysis 

Beyond Bayesian inference, there are a couple of visualizations that people use all the time, and the classic one is the histogram. It is the most appropriate for a general audience, because most people understand histograms. Violin plots are similar: essentially two smoothed histograms back-to-back. Those are good because people understand them, but we've seen people point out several times that you have to get histograms right. If the bin size is too big, you smooth away a lot of information that you might care about. If the bin size is too small, you get a lot of noise, and it can be hard to see the shape of the distribution through it. So one thing we can advocate for is using CDFs, rather than histograms or PDFs, as the default visualization. When you are exploring a data set, you can almost always look at CDFs, because they give the best view of the shape of the distribution: you can see modes, central tendencies, and spread. Weird outliers jump out, and repeated values show up clearly, with less visual noise to distract from the important stuff. The only problem is that people don't understand CDFs. But we think this is another case where the audience is getting educated: the more people consume data journalism, the more they see visualizations like this, and some implicit learning goes on.
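The bin-size problem and the CDF alternative can be seen without any plotting. The following is a minimal sketch using only the Python standard library; the sample data and the helper names are made up for illustration.

```python
from bisect import bisect_right


def histogram_counts(values, bin_width):
    """Count values per bin; the result depends on the chosen bin_width."""
    counts = {}
    for v in values:
        left_edge = (v // bin_width) * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))


def make_ecdf(values):
    """Return a function giving the fraction of values <= x (no bins to choose)."""
    ordered = sorted(values)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


# Made-up sample with a repeated value (5) and an outlier (40).
data = [1, 2, 2, 3, 5, 5, 5, 5, 6, 7, 8, 40]

print(histogram_counts(data, bin_width=20))  # wide bins hide the repeated 5s
print(histogram_counts(data, bin_width=1))   # narrow bins are sparse and noisy

ecdf = make_ecdf(data)
for x in (4, 5, 8, 39, 40):
    print(f"fraction of values <= {x}: {ecdf(x):.2f}")
```

In the empirical CDF, the repeated value appears as a large jump at 5, and the outlier appears as a long flat stretch before the final step at 40, with no bin width to tune.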

Call to Action for Uncertainty in data analysis 

We think that if you have not yet had a chance to study data science, you should. There are a lot of great resources available now that weren't around until recently. And especially if you took a statistics class in high school or college and it did not connect with you, the problem is not necessarily you. The standard statistics curriculum has, for a long time, not been right for most people. It spends way too much time on esoteric hypothesis tests, and it gets bogged down in statistical philosophy that is not very good philosophy of science.

If you come back to it now from a data science point of view, it's much more likely that you will find classes and educational resources that are relevant. They will be based on data, and they will be much more compelling. So give it another shot. And for people who already have data science skills, there are a lot of ways to use them to do social good in the world. A lot of data scientists end up in quantitative finance and business analytics; those are the two big application domains. There's nothing wrong with that, but there are also a lot of ways to use the skills you've got to do something good: find stories about what's happening, get those stories out, and use them as a way to effect change. Or, if nothing else, just answer questions about the world. If there's something that interests you, very often you can find data and answer questions.

In short, this is about uncertainty in data science: how we think about prediction, how we can think probabilistically, how we do it right, and how we can get it wrong as well. It is also about the idea that data science isn't necessarily only for data scientists, and that it could actually be of interest to everyone.

Analytics Insight
www.analyticsinsight.net