Policy analysis is a data-driven field, but in this work we come into contact with many people who don’t have much experience working rigorously with data. Lobbyists, advocates, communications professionals, and many others all play a role in getting policymakers the information they need to make better-informed decisions.
Today I want to speak to colleagues who share the goal of evidence-based policymaking but may not have formal training in statistics, and explain one of the most common challenges that data analysts face.
Descriptive vs. inferential statistics
When most people hear the word “statistics,” they think of the specific numbers we use to describe things in the world. Here, however, I want to talk about statistics as a branch of mathematics: the practice of statistical analysis. Statistical analysis almost always falls into one of two broad categories: descriptive or inferential.
Descriptive statistics are used to summarize and describe data that we already have. If I tell you the average income in a county, the unemployment rate in a state, or the percentage of students who passed an exam, I am describing observed information. These statistics help us understand what happened, but they do not necessarily tell us why it happened or whether the same thing would happen somewhere else.
Inferential statistics go a step further. They use data from a sample to draw conclusions about a larger population. This is important because collecting information from every person, business, or household is often impossible. Instead, we gather information from a smaller group and use statistical methods to estimate what is likely true for the broader population.
How samples help us understand populations
Suppose you wanted to know the average height of every adult in Ohio. Measuring all 9 million adults would be extraordinarily expensive and time-consuming. Instead, you might randomly select a few thousand people and measure them.
If the sample is chosen properly, the average height in the sample will probably be very close to the average height in the full population. It will not be exactly correct, but it will usually be close enough to answer most practical questions.
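To make this concrete, here is a minimal Python sketch that simulates the idea. It assumes a made-up population of 200,000 adult heights (not real Ohio data, and smaller than 9 million so it runs quickly), drawn from a normal distribution with a mean of 67 inches and a standard deviation of 4 inches. It then draws a random sample of 2,000 people and compares the sample average to the true population average.

```python
import random
import statistics

# Hypothetical population: 200,000 simulated adult heights in inches.
# These numbers are invented for illustration; they are not real Ohio data.
random.seed(42)  # fixed seed so the example is reproducible
population = [random.gauss(67.0, 4.0) for _ in range(200_000)]

# The true population average, which we can only compute here
# because we simulated every member of the population.
population_mean = statistics.fmean(population)

# In practice we would only measure a random sample of 2,000 people.
sample = random.sample(population, k=2_000)
sample_mean = statistics.fmean(sample)

print(f"Population mean: {population_mean:.2f} inches")
print(f"Sample mean:     {sample_mean:.2f} inches")
print(f"Difference:      {abs(sample_mean - population_mean):.2f} inches")
```

Running it with different seeds typically shows the sample average landing within a small fraction of an inch of the population average, even though the sample contains only 1% of the simulated population.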
The same principle applies throughout policy analysis. Pollsters survey a few thousand voters to estimate public opinion. Economists analyze a subset of households to estimate poverty rates. Researchers study a group of patients to learn about the effectiveness of medical treatments.
The key idea is that we are rarely interested in the sample alone. We use the sample because we want to learn something about a much larger population.
Notice how quickly the lines between descriptive and inferential statistics can blur. Calculating the average height of a group is a descriptive task, but deliberately measuring a representative sample in order to estimate the average height of the whole population is inferential. This might seem a bit pedantic, like arguing about whether we should say “this data” or “these data,” but I would argue that this distinction is essential to understand.
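The same numbers can serve either purpose, and the difference shows up in what we claim about them. The minimal Python sketch below assumes a simulated sample of 2,000 heights (invented data) and computes the same average two ways: as a description of the people measured, and as an estimate of the population average with a 95% confidence interval. The interval uses the common normal-approximation formula (mean ± 1.96 standard errors), which is one standard approach among several, not the only way to do inference.

```python
import math
import random
import statistics

# Hypothetical sample of 2,000 measured heights in inches (simulated, not real data).
random.seed(7)
sample = [random.gauss(67.0, 4.0) for _ in range(2_000)]

# Descriptive: summarize the people we actually measured.
sample_mean = statistics.fmean(sample)

# Inferential: use the sample to estimate the population average,
# with a 95% confidence interval from the normal approximation.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)  # roughly 1.96
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Average height of the people we measured: {sample_mean:.2f} inches")
print(f"Estimated population average (95% CI): {lower:.2f} to {upper:.2f} inches")
```

The first line of output is a statement about 2,000 specific people; the second is a claim about everyone they were drawn to represent, and it only holds if the sample really was chosen randomly.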
The challenge of generalization
One of the most common mistakes people make when interpreting research is assuming that a finding from one population will automatically apply to another. Generalizing can work when we are careful about how we design our analysis, but it quickly falls apart, and leads to incorrect conclusions, when we interpret results too broadly.
Imagine a workforce development program that increases employment among participants in one city. That result tells us something useful about that specific program in that specific location. However, it does not necessarily tell us what would happen if the exact same program were implemented somewhere else.
The local economy may be different. Participants may have different educational backgrounds. Employers may have different hiring practices. Transportation systems, housing costs, and labor market conditions may all vary. Even when the underlying program is identical, the environment surrounding it may not be.
This problem appears constantly in policy research. A tax policy that works well in one state may produce different outcomes in another. An educational intervention that improves test scores in one district may have little effect elsewhere. Researchers refer to this issue as external validity: how well the results of one study can be generalized beyond the group that was originally examined.
This doesn’t mean that we can’t learn anything from past studies or from programs implemented in other locations. Often in policy analysis, this is the only kind of data we have access to. What it does mean is that we need to think carefully about the unique factors that might influence these outcomes, and we need to conduct sensitivity analysis to understand what might happen if some of our assumptions don’t hold.
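As an illustration, here is a minimal Python sketch of a simple one-way sensitivity analysis for the workforce program example. Every number in it is hypothetical: it assumes 1,000 participants in a new city and a 10-percentage-point employment gain in the original city, and the `projected_new_jobs` function and its `transfer` parameter (the share of the original effect assumed to carry over) are invented for this example. The point is to vary one uncertain assumption and see how much the projection moves.

```python
# Hypothetical inputs for illustration only; none of these come from a real study.
PARTICIPANTS = 1_000     # people expected to enroll in the new city
ORIGINAL_EFFECT = 0.10   # employment gain (10 percentage points) seen in the original city


def projected_new_jobs(participants: int, effect: float, transfer: float) -> float:
    """Projected additional employed participants if only `transfer`
    (a share between 0 and 1) of the original effect carries over."""
    return participants * effect * transfer


# One-way sensitivity analysis: vary a single uncertain assumption
# (how well the result transfers to the new city) and compare projections.
for transfer in (1.0, 0.75, 0.5, 0.25, 0.0):
    jobs = projected_new_jobs(PARTICIPANTS, ORIGINAL_EFFECT, transfer)
    print(f"If {transfer:.0%} of the effect transfers: "
          f"about {jobs:.0f} additional people employed")
```

A real analysis would usually vary several assumptions (local labor demand, participant characteristics, program costs), but even this simple version makes it clear how much a projection depends on the assumption that one city’s results apply to another.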
Understanding the difference between descriptive and inferential statistics is important for anyone who works with data regularly. It is essential to be able to tell whether a piece of information is a simple statement of the way things are, or whether it has been carefully constructed to let us generalize about a larger population.

