How Big Data Informs Economics

In A Fistful of Dollars, Clint Eastwood challenges Gian Maria Volonte with the words, “When a man with a .45 meets a man with a rifle, you said, the man with a pistol’s a dead man. Let’s see if that’s true. Go ahead, load up and shoot.”

Those are the right words for challenging big data, which recently reappeared in economics debates (Noah Smith, Chris House via Mark Thoma). Big data is the rifle here, but it doesn’t necessarily win. Economists need special reasons to abandon small datasets and start messing with more numbers.

Unlike business, which only recently discovered the sexiest job of the future, economists have been doing analytics for the last 150 years. They have dealt with “big data” for half of that period (I count from 1940, when the CPS started). So how can the new big data be useful to them?

Let’s find out what big data offers. First of all, more information, of course. Notable cases include predicting the present with Google and Joshua Blumenstock’s use of mobile phone records in development economics. Less notable cases run into the same problem: a decline in data quality. Compare the long surveys development economists collect when they run experiments with what Facebook dares to ask its most loyal users. Despite Facebook having 1.5 bn. observations, economists end up with much better evidence. And it’s not about depth alone. Social scientists ask clearer questions, find representative respondents, and take nonresponse seriously. Doing a responsible job means constructing smaller but better samples like these.

Second, big data comes with its own tools, which, like econometrics, are deeply rooted in statistics but ignorant of causation:

Figure: big data tools

The slogan is: predict and classify. But economics does care about cause-and-effect relationships. Data scientists dispense with these relationships because the professional penalty for misidentification is lower than in economics. And, honestly, at this stage they have more important problems to solve; much of their time still goes into capacity building and data wrangling.

Hal Varian shows a few compelling technical examples in his 2014 paper, “Big Data: New Tricks for Econometrics.” One example comes from Kaggle’s Titanic competition:

Figure: Varian (2014), “Big Data: New Tricks for Econometrics”

The task is to predict whether a passenger survived the shipwreck. The chart shows that children were more likely to survive than elderly passengers, while for everyone else age didn’t matter much. A regression tree captures this nonlinearity in age, while a plain logit regression does not. Hence, the big data tool does better than the economics tool.
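For concreteness, here is a minimal sketch of that comparison in Python with scikit-learn. It assumes Kaggle’s Titanic training file, train.csv, with its ‘Age’ and ‘Survived’ columns; the file name, the single age feature, and the tree depth are illustrative choices, not Varian’s exact setup.

```python
# Sketch: a shallow tree vs. a plain logit, both predicting survival from age.
# Assumes Kaggle's Titanic train.csv with 'Age' and 'Survived' columns.
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("train.csv").dropna(subset=["Age"])
X = df[["Age"]].to_numpy()
y = df["Survived"].to_numpy()

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)   # finds the age cutoffs itself
logit = LogisticRegression().fit(X, y)                 # forces a smooth, monotone effect of age

ages = np.arange(1, 81).reshape(-1, 1)
comparison = pd.DataFrame({
    "age": ages.ravel(),
    "tree_p_survive": tree.predict_proba(ages)[:, 1],
    "logit_p_survive": logit.predict_proba(ages)[:, 1],
})
# The tree's predictions jump at the child cutoff; the logit's drift smoothly.
print(comparison.iloc[::10])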

But an economist who remembers to “always plot the data” is ready for this. As with other big data tools, it’s useful to know about trees, but something similar is already available on the econometrics workbench.
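And here is a sketch of that workbench answer, under the same assumptions about train.csv: tabulate survival rates by age bin first, then let the logit carry the nonlinearity through a child dummy. The age-15 cutoff and the bin edges are illustrative choices, not estimates from the data.

```python
# Sketch: "always plot the data," then put the nonlinearity into the logit.
# Assumes Kaggle's Titanic train.csv with 'Age' and 'Survived' columns.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("train.csv").dropna(subset=["Age"])

# Step 1: eyeball survival rates by coarse age bins (a table stands in for the plot)
df["age_bin"] = pd.cut(df["Age"], bins=[0, 15, 30, 45, 60, 80])
print(df.groupby("age_bin", observed=True)["Survived"].mean())

# Step 2: a logit with a child dummy captures the same kink the tree found
df["child"] = (df["Age"] < 15).astype(int)
model = smf.logit("Survived ~ Age + child", data=df).fit()
print(model.summary())
```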

There’s nothing ideological in these comments on big data. More data available for research is better than less, and data scientists do things economists can’t. The objection is this: economists mostly deal with problems of two types. Type One is figuring out how a few big variables, like inflation and unemployment, interact with each other. Type Two is making practical policy recommendations for people who typically read nothing beyond executive summaries. While big data can inform top-notch economics research, these two problems are easier to solve with simple models and small data. So the pistol turns out to be better than the rifle.