Exploring the Usefulness of Data Science in Public Policy and Governance

May 20, 2020


Data science has expanded its reach into public policy and governance. Its powerful framework deepens the evidence-based understanding of policymaking and can directly improve service delivery.

It is useful to differentiate between two broad (though highly interdependent) trends that define data science. The first is a gradual expansion of the types of data and statistical methods that can be used to glean insights for policy studies, such as predictive analytics, clustering, big data methods, and the analysis of networks, text, and images. The second trend is the emergence of a set of tools and the formalization of standards in the data analysis process. These tools include open-source programming languages, data visualization, cloud computing, reproducible research, and data collection and storage infrastructure.
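To make the first trend concrete, consider clustering, one of the methods named above. The sketch below is a minimal, illustrative k-means implementation in pure Python; the district figures and column meanings are hypothetical, invented here solely to show how an analyst might group similar jurisdictions before designing targeted programs.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data itself
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, labels

# Hypothetical (unemployment rate %, median income in $1000s) per district.
districts = [(3.1, 72), (3.4, 69), (2.9, 75), (8.2, 38), (7.9, 41), (8.5, 36)]
centroids, labels = kmeans(districts, k=2)
```

In practice an analyst would reach for a vetted library rather than hand-rolling the algorithm, but the two alternating steps shown here (assign, then update) are the core of the method.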

According to Brookings research, perhaps not coincidentally, these two trends align reasonably well with the commonly cited data science Venn diagram. While it is a simplification, it is still a useful and meaningful starting point. Moreover, the expanded view of data and statistics has meaningful repercussions for both policy analysts and consumers of that analysis.


The Usefulness of Data Science for Public Policy and Governance

Evaluating data is becoming a core component of government oversight. The actions of private companies are more frequently recorded in databases than in file cabinets, and having that digital information obscured from regulators would undermine our societal safeguards. Government agencies should already be acting to evaluate problematic AI hiring software and seeking to uncover biases in models that determine who gets health interventions. As algorithmic decision-making becomes more common, it will be necessary to have a core of talented civic data scientists to audit its use in regulated industries.
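One simple check an auditor might run on hiring software is a disparate impact test, such as the conventional "four-fifths rule": compare selection rates across groups and flag ratios below 0.8. The sketch below is illustrative only; the decision data and group labels are hypothetical, and a real audit would involve far more than this single statistic.

```python
def selection_rates(outcomes):
    """outcomes: dict mapping group name -> list of 0/1 model decisions."""
    return {group: sum(d) / len(d) for group, d in outcomes.items()}

def disparate_impact_ratio(outcomes):
    """Ratio of the lowest group selection rate to the highest.
    Values below 0.8 are a conventional red flag (the four-fifths rule)."""
    rates = selection_rates(outcomes)
    return min(rates.values()) / max(rates.values())

# Hypothetical decisions from an AI hiring screen (1 = advanced to interview).
decisions = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1, 1, 1],  # 8 of 10 selected
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],  # 4 of 10 selected
}
ratio = disparate_impact_ratio(decisions)  # 0.4 / 0.8 = 0.5, below the 0.8 flag
```

A finding like this would not prove discrimination on its own, but it is the kind of quantitative screen a civic data scientist could apply at scale across regulated systems.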

Even for public servants who never write code themselves, it will be critical to have enough data science literacy to meaningfully interpret the proliferation of empirical research. Despite recent setbacks—such as proposed cuts to evidence-building infrastructure in the Trump administration’s budget proposal—evidence-based policymaking is not going anywhere in the long term. There are already 125 federal statistical agencies, and the Foundations of Evidence-Based Policymaking Act, passed early last year, expands the footprint and impact of evidence across government programs.

Further, the mindset of a data scientist is tremendously valuable for public servants: It forces people to confront uncertainty, consider counterfactuals, reason about complex patterns, and wonder what information is missing. It makes people skeptical of anecdotes, which, while often emotionally powerful, are not sufficient sources of information on which to build expansive policies. The late and lauded Alice Rivlin knew all this in 1971 when she published "Systematic Thinking for Social Action." Arguing for more rigor and scientific processes in government decision-making, Rivlin wrote a pithy final line: "Put more simply, to do better, we must have a way of distinguishing better from worse."


Implementing Data-Scientific Thinking and Evidence-Based Policies

The tools and data to distinguish better from worse are more available than ever before, and more policymakers must know how to use and interpret them. Continued expansion of evidence-based decision-making relies on many individuals, in many different roles, adopting practices that encourage data-scientific thinking. Managers in government agencies can hire analysts with a rigorous understanding of data in addition to a background in policy. They can also work to open up their datasets, contributing to Data.gov and the broader evidence infrastructure. Grant-making organizations have a critical role, too. They should mandate an evaluation budget, at least 5% of a grant, to collect data and determine whether the programs they fund actually work. When they fund research, they should require replicable methods and open-data practices.