How to get census data during the government shutdown - or any other time

With the ongoing government shutdown, anyone who has tried to use a website maintained by a federal agency has probably run into difficulties. Some of these obstructions exist because the websites aren’t being maintained as normal, while others are simply unnecessary roadblocks.

The main example I want to talk about today is data.census.gov, the online portal many analysts use frequently to get data collected by the Census Bureau, particularly American Community Survey data.

The notice on the top of the website says that inquiries won’t be answered until appropriations are made. As far as I can tell, there’s no reason you shouldn’t be able to query this website during the shutdown. The data is public, and while we will certainly have to wait for appropriations before we can expect any new data to be added, we should be able to access the data that is already there. 

The good news is that there are ways you can get this data if you have some technical chops. I’m going to briefly explain two methods you can use to get American Community Survey data during the shutdown. 

IPUMS  

If you work with American Community Survey data, you should be aware of IPUMS. They provide individual survey responses from a wide range of surveys in the United States and abroad, including the American Community Survey.

This is a useful resource because their website makes it easy to get the data you need. You can shop around for different survey questions and make yourself a bespoke dataset that has all the information you need without any fluff. 

The main catch with IPUMS data is that because it consists of individual survey responses rather than aggregated results, you need to calculate any summary statistics yourself. If your goal is to find out the poverty rate in Wisconsin, you need to get all the respondents from Wisconsin and calculate a weighted average using the survey weights. A benefit of getting the individual survey responses is that you can get more specific answers than you might find on data.census.gov.
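To make the weighted average concrete, here is a rough Python sketch. It assumes your extract uses the IPUMS variable names STATEFIP (state FIPS code, 55 for Wisconsin), PERWT (person weight), and POVERTY (family income as a percentage of the poverty threshold, with 0 meaning not applicable). Check the codebook for your own extract before relying on those codes. The records below are made-up toy rows, not real survey data.

```python
def weighted_poverty_rate(records):
    """Return the weighted share of people below the poverty line."""
    total_weight = 0.0
    poor_weight = 0.0
    for person in records:
        poverty = person["POVERTY"]
        if poverty == 0:
            # Assumed code for "poverty status not determined"; skip these rows.
            continue
        weight = person["PERWT"]
        total_weight += weight
        if poverty < 100:
            # Income below 100% of the poverty threshold.
            poor_weight += weight
    if total_weight == 0:
        raise ValueError("no records with a determined poverty status")
    return poor_weight / total_weight


WISCONSIN_FIPS = 55

# Toy rows for illustration only.
toy_records = [
    {"STATEFIP": 55, "PERWT": 100, "POVERTY": 80},
    {"STATEFIP": 55, "PERWT": 300, "POVERTY": 250},
    {"STATEFIP": 55, "PERWT": 50, "POVERTY": 0},
    {"STATEFIP": 27, "PERWT": 200, "POVERTY": 50},
]

wisconsin = [r for r in toy_records if r["STATEFIP"] == WISCONSIN_FIPS]
rate = weighted_poverty_rate(wisconsin)
print(f"{rate:.1%}")  # prints 25.0%
```

Each respondent stands in for PERWT people, so the rate is the sum of weights for people below the line divided by the sum of weights for everyone with a determined poverty status. A plain unweighted mean of the rows would give a different, and wrong, answer.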

The other downside I want to point out is that because you get individual responses, it is much more difficult to get estimates for small geographic areas. If you are dealing with areas that have fewer than 200,000 residents, this data is going to be harder to work with.

Tidycensus

If you can code in R, then I’d highly recommend becoming familiar with the tidycensus package. This lets you directly interface with the Census Bureau’s data API. You can take any table you would otherwise look up on data.census.gov and download the results directly in R.

Using the package is not too difficult, and the basic usage page has everything you need to get started. One frustrating thing right now is figuring out the exact code for the variable you are interested in. Normally, I’d search for the variable on the main website to figure out exactly what table to ask for and which columns/rows to draw from, but with the main website down that isn’t an option. Fortunately, the package can load all that information into a table (through its load_variables() function), so you can find it with a little bit of effort.
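If you want to do the same lookup outside of R, the idea is just a keyword search over the list of variable codes. As far as I know, the Census API publishes that list as JSON for each dataset (for example at https://api.census.gov/data/2022/acs/acs5/variables.json), with a "variables" object keyed by code. The Python sketch below assumes that shape, and the sample entries are hand-typed for illustration, so the real labels may be worded differently. The function name find_variables is my own.

```python
def find_variables(variables, keyword):
    """Return the sorted variable codes whose concept or label mentions keyword."""
    keyword = keyword.lower()
    matches = []
    for code, info in variables.items():
        text = f"{info.get('concept', '')} {info.get('label', '')}".lower()
        if keyword in text:
            matches.append(code)
    return sorted(matches)


# Hand-typed sample in the assumed shape of the "variables" object.
sample_variables = {
    "B17001_001E": {
        "concept": "POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE",
        "label": "Estimate!!Total:",
    },
    "B17001_002E": {
        "concept": "POVERTY STATUS IN THE PAST 12 MONTHS BY SEX BY AGE",
        "label": "Estimate!!Total:!!Income in the past 12 months below poverty level:",
    },
    "B19013_001E": {
        "concept": "MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS",
        "label": "Estimate!!Median household income in the past 12 months",
    },
}

print(find_variables(sample_variables, "poverty"))
print(find_variables(sample_variables, "below poverty level"))
```

The first search returns every variable in the poverty table, and the second narrows it down to the one column that counts people below the poverty line.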

The advantage of this method is that it entirely bypasses the fact that data.census.gov prevents people from downloading data on the front end. With the right query, this package gets you exactly the same data. Additionally, this lets us work with smaller geographic regions than IPUMS allows, because the Census Bureau publishes these tables already aggregated for small areas.

The existence of this API suggests that there are other ways to directly query American Community Survey data without using R. A quick Google search led me to this Python library which appears to do the same thing.
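You don’t strictly need a library either, since the API is just a URL that returns JSON. Here is a minimal Python sketch using only the standard library. It assumes the 2022 five-year ACS endpoint and the poverty table columns B17001_001E (people with a determined poverty status) and B17001_002E (people below the poverty line). The helper names are my own, the sample response uses made-up numbers, and heavier use may require a free API key, so treat this as a starting point rather than a finished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.census.gov/data"


def build_acs_url(year, variables, geography, within=None, dataset="acs/acs5"):
    """Build a Census API query URL for the given variables and geography."""
    params = {"get": ",".join(["NAME"] + list(variables)), "for": geography}
    if within:
        params["in"] = within
    # Keep ':', '*' and ',' readable instead of percent-encoding them.
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params, safe=':*,')}"


def rows_to_dicts(payload):
    """The API returns a list of lists whose first row is the header."""
    header, *rows = payload
    return [dict(zip(header, row)) for row in rows]


def fetch(url):
    """Download and decode a response (needs network access; not called here)."""
    with urlopen(url) as response:
        return json.load(response)


url = build_acs_url(2022, ["B17001_001E", "B17001_002E"], "county:*", within="state:55")
print(url)

# Made-up response in the shape the API returns; values arrive as strings.
sample_payload = [
    ["NAME", "B17001_001E", "B17001_002E", "state", "county"],
    ["Example County, Wisconsin", "1000", "120", "55", "001"],
]
for row in rows_to_dicts(sample_payload):
    poverty_rate = int(row["B17001_002E"]) / int(row["B17001_001E"])
    print(row["NAME"], f"{poverty_rate:.1%}")
```

Calling rows_to_dicts(fetch(url)) would run the real query for every county in Wisconsin. Because these tables are already aggregated, the poverty rate is a simple ratio of two columns, with no weighting required.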

Hopefully this is helpful to someone who’s been waiting for data.census.gov to come back online. This data is important, and access to it is critical for so much analysis. It is a shame that access to it has become another casualty of politics.