Can Learning Change Your Mind?

Adam Ozimek asks, “Can Economics Change Your Mind?”

In this skeptical view, economists and those who read economics are locked into ideologically motivated beliefs—liberals versus conservatives, for example—and just pick whatever empirical evidence supports those pre-conceived positions. I say this is wrong and solid empirical evidence, even of the complicated econometric sort, changes plenty of minds.

Just to be clear, only a person can change his own mind; economics can’t. And since the question is basically about learning, not economics, I reformulate it accordingly: Can Learning Change Your Mind?

The rest turns out to be simple. If I want to change my mind big time, I take an issue I know nothing about and read some research. There will be surprises.

But if I happen to discover big surprises in my own area of competence, I become suspicious. Evidence doesn’t drop down like Newtonian apples. It flows like a river. So learning is a flow, too. It’s a continuous process that brings no surprises if you learn constantly.

Where does continuity come from? First, from discounting new studies. New studies have standard limitations even when they are factually and methodologically correct. The most frequent limitations concern long-term relationships, external validity, and general equilibrium effects. Second, from the nature of the economy itself. Research in economics often speaks in yes-no terms, while economic processes are continuous. For marketing purposes, researchers formulate questions like “Does X cause Y?”, a yes-no question tested with regressions. But causation is not about p-values in handpicked models. Causation is also about the degree of impact, and this degree jumps wildly even across different specifications of a single model. That means I need a lot of similar studies to change my mind about X and Y.
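To see how much the degree of impact can move between specifications, here is a minimal sketch on synthetic data. The variables x, y, and z and all the numbers are assumptions made up for illustration, not taken from any study. The estimated effect of x on y depends heavily on whether a correlated control z is in the regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data-generating process (an assumption for illustration):
# z drives both x and y, and the true effect of x on y is 1.0.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_short = ols(y, x)     # specification 1: y on x only
beta_long = ols(y, x, z)   # specification 2: y on x and z

print(f"effect of x without the control: {beta_short[1]:.2f}")
print(f"effect of x with the control:    {beta_long[1]:.2f}")
```

Both regressions would report a "significant" effect of x, yet the size of that effect differs by roughly a factor of two. A single study showing one of these numbers says little about the degree of impact.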

Removing one letter from Bertrand Russell, “One of the symptoms of approaching nervous breakdown is the belief that one work is terribly important.”

Going back to Adam’s initial (yes-no) question, I’d say yes, some economists “are locked into ideologically motivated beliefs,” and yes, some economists produce knowledge that other people can learn from. These two groups overlap, but it’s no obstacle to good learning.

PS: In his post, Adam Ozimek also asked readers to submit studies that changed their minds. Since I see mind-changing potential as a function of novelty, I’d recommend a simple source of mind-changing studies: visit RePEc’s list of top cited studies and carefully read the papers you haven’t read yet. There will be surprises.

Machine Learning for Economists: An Introduction

A crash course for economists who would like to learn machine learning.

Why should economists bother at all? Machine learning (ML) generally outperforms econometrics in prediction. That is why ML is becoming more popular in operations, where econometrics’ advantage in tractability is less valuable. So it’s worth knowing both and choosing the approach that suits your goals best.

An Introduction

These articles have been written by economists for economists. Other readers may not appreciate constant references to economic analysis and should start from the next section.

  1. Athey, Susan, and Guido Imbens. “NBER Lectures on Machine Learning,” 2015. A shortcut from econometrics to machine learning. Key principles and algorithms. Comparative performance of ML.
  2. Varian, “Big Data: New Tricks for Econometrics.” Some ML algorithms and new sources of data.
  3. Einav and Levin, “The Data Revolution and Economic Analysis.” Mostly about new data.

Applications

Practical applications get little publicity, especially if they are successful. But these materials do give an impression of what the field is about.

Government

  1. Bloomberg and Flowers, “NYC Analytics.” NYC Mayor’s Office of Data Analysis describes their data management system and improvements in operations.
  2. UK Government, Tax Agent Segmentation.
  3. Data.gov, Applications. Some are ML-based.
  4. StackExchange, Applications.

Governments use ML sparingly. Developers emphasize open data more than algorithms.

Business

  1. Kaggle, Data Science Use cases. An outline of business applications. Few companies have the data to implement these things.
  2. Kaggle, Competitions. (Make sure you choose “All Competitions” and then “Completed”.) Each competition has a leaderboard. When users publish their solutions on GitHub, you can find links to these solutions on the leaderboard.

Industrial solutions are more powerful and complex than these examples, but they are not publicly available. Data-driven companies post some details about this work on their blogs.

Emerging applications

Various prediction and classification problems. For ML research, see the last section.

  1. Stanford’s CS229 Course, Student projects. See “Recent years’ projects.” Hundreds of short papers.
  2. CMU ML Department, Student projects. More advanced problems, compared to CS229.

Algorithms

A tree of ML algorithms:

[Figure: tree of machine learning algorithms]

Econometricians may check the math behind the algorithms and find it familiar. Mathematical background:

  1. Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning. Standard reference. More formal approach. [free copy]
  2. James et al., An Introduction to Statistical Learning. Another standard reference, sharing the authors Hastie and Tibshirani with The Elements. More practical approach with coding. [free copy]
  3. Kaggle, Metrics. ML problems are all about minimizing prediction errors. These are various definitions of errors.
  4. (optional) Mitchell, Machine Learning. Close to Hastie, Tibshirani, and Friedman.
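As an illustration of what “definitions of errors” means in practice, here is a minimal sketch of three common metrics: root mean squared error, mean absolute error, and log loss. The function names are my own, and the choice of these three metrics is an assumption rather than a summary of Kaggle’s list.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: every unit of error counts the same."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def log_loss(y_true, p_pred, eps=1e-15):
    """Log loss for binary outcomes, given predicted probabilities of y = 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

print(rmse([1, 2, 3], [1, 2, 5]))
print(mae([1, 2, 3], [1, 2, 5]))
print(log_loss([1, 0], [0.5, 0.5]))
```

The same predictions can rank differently under different metrics, so it helps to fix the metric before comparing models.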

For what makes ML different from econometrics, see chapters “Model Assessment and Selection” and “Model Inference and Averaging” in The Elements.
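One idea from the “Model Assessment and Selection” chapter is to judge a model by its error on data it has not seen, for instance with K-fold cross-validation. The sketch below applies it to choosing a polynomial degree. The data, the function name cv_mse, and the candidate degrees are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a nonlinear relationship plus noise (an assumption).
n = 200
x = rng.uniform(-3, 3, size=n)
y = np.sin(x) + rng.normal(scale=0.3, size=n)

def cv_mse(x, y, degree, k=5, seed=0):
    """Estimate out-of-sample MSE of a polynomial fit by k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(order, k)
    errors = []
    for fold in folds:
        train = np.setdiff1d(order, fold)        # observations outside the fold
        coefs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coefs, x[fold])        # predict the held-out fold
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))

scores = {degree: cv_mse(x, y, degree) for degree in (1, 3, 5)}
best_degree = min(scores, key=scores.get)

for degree, score in scores.items():
    print(f"degree {degree}: CV MSE = {score:.3f}")
print("selected degree:", best_degree)
```

The model is selected by its prediction error on held-out folds, not by in-sample fit or the significance of coefficients. That is the main shift in habit for someone coming from econometrics.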

Handy cheat sheets by KDnuggets, Microsoft, and Emanuel Ferm. Also this guideline:

[Figure: screenshot of the guideline]

Software and Hardware

Stata does not support many ML algorithms. Its counterpart in the ML community is R. R is a language, so you’ll need more tools to make it work:

  1. RStudio. A standard coding environment. Similar to Stata.
  2. CRAN packages for ML.
  3. James et al., An Introduction to Statistical Learning. This text introduces readers to R. Again, it is available for free.

Python is the closest alternative to R. Packages “scikit-learn” and “statsmodels” do ML in Python.

If your datasets and computations get heavier, you can run code on virtual servers from Google and Amazon. They have ML-ready instances that execute code faster. It takes a few minutes to set one up.

Summary

I limited this survey to economic applications. Other applications of ML include computer vision, speech recognition, and artificial intelligence.

The advantage of ML approaches (like neural networks and random forests) over econometrics (linear and logistic regressions) is substantial in these non-economic applications.

Economic systems often have linear properties, so ML is less impressive here. Nonetheless, it does predict things better, and more practical solutions are being built the ML way.

Research in Machine Learning

  1. arXiv, Machine Learning. Drafts of important papers appear here first. Then they get published in journals.
  2. CS journals. Applied ML research also appears in engineering journals.
  3. CS departments. For example: CMU ML Department, PhD dissertations.

How Big Data Informs Economics

In A Fistful of Dollars, Clint Eastwood challenges Gian Maria Volonte with the words, “When a man with a .45 meets a man with a rifle, you said, the man with a pistol’s a dead man. Let’s see if that’s true. Go ahead, load up and shoot.”

Those are the right words to challenge big data, which recently reappeared in economics debates (Noah Smith, Chris House via Mark Thoma). Big data is a rifle, but not necessarily a winning one. Economists must have special reasons to abandon small datasets and start messing with more numbers.

Unlike business, which only recently discovered the sexiest job of the future, economists have been doing analytics for the last 150 years. They have dealt with “big data” for half of that period (I count from 1940, when the CPS started). So, how can the new big data be useful to them?

Let’s find out what big data offers. First of all, more information, of course. Notable cases include predicting the present with Google and Joshua Blumenstock’s use of mobile phones in development economics. Less notable cases run into the same problem: a decline in the quality of data. Compare the long surveys that development economists collect when they run experiments with what Facebook dares to ask its most loyal users. Although Facebook has 1.5 billion observations, economists end up with much better evidence. That’s not about depth alone. Social scientists ask clearer questions, find representative respondents, and take nonresponse seriously. If you do a responsible job, you have to construct smaller but better samples like these.
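The point about representative respondents can be checked with a small simulation. This is a minimal sketch on a made-up population. The outcome, the opt-in rule, and the sample sizes are all assumptions for illustration. A huge self-selected sample is compared with a small random one.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population: one million people with an outcome y
# (mean 50, standard deviation 10). All numbers are assumptions.
population = rng.normal(loc=50, scale=10, size=1_000_000)
true_mean = population.mean()

# "Big" sample: people opt in, and the chance of opting in rises with y.
opt_in_prob = 1 / (1 + np.exp(-(population - 50) / 10))
big_sample = population[rng.uniform(size=population.size) < opt_in_prob]

# "Small" sample: 1,000 people drawn at random, as in a designed survey.
small_sample = rng.choice(population, size=1_000, replace=False)

big_error = abs(big_sample.mean() - true_mean)
small_error = abs(small_sample.mean() - true_mean)

print(f"big sample:   n={big_sample.size:,}, error={big_error:.2f}")
print(f"small sample: n={small_sample.size:,}, error={small_error:.2f}")
```

Under these assumptions the self-selected sample is hundreds of times larger and still misses the population mean by more, because extra observations shrink the noise but not the selection bias.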

Second, big data comes with its own tools, which, like econometrics, are deeply rooted in statistics but are ignorant of causation:

[Figure: Big data tools]

The slogan is: to predict and to classify. But economics does care about cause and effect relations. Data scientists dispense with these relations because the professional penalty for misidentification is lower than in economics. And, honestly, at this stage, they have more important problems to solve. For example, much time still goes into capacity building and data wrangling.

Hal Varian shows a few compelling technical examples in his 2014 paper. One example comes from Kaggle’s Titanic competition:

[Figure: Varian (2014), “Big Data: New Tricks for Econometrics”]

The task is to predict whether a passenger survived the sinking. The chart says that children had a better chance of surviving than older passengers, while for the rest age didn’t matter. A regression tree captures this nonlinearity in age, while a logit regression does not. Hence, the big data tool does better than the economics tool.
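The comparison can be reproduced in miniature. The sketch below does not use the Titanic data or Varian’s code. It simulates passengers under an assumed survival rule (children under 10 survive more often, and age is otherwise irrelevant), then compares a logit in age with a one-split regression tree on held-out data. The function names and all numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Synthetic passengers (an assumption for illustration, not the Titanic data):
# children under 10 survive with probability 0.8, everyone else with 0.35.
age = rng.uniform(0, 80, size=n)
survived = (rng.uniform(size=n) < np.where(age < 10, 0.8, 0.35)).astype(float)

half = n // 2
age_train, y_train = age[:half], survived[:half]
age_test, y_test = age[half:], survived[half:]

def fit_logit(x, y, steps=25):
    """Fit P(y=1) = sigmoid(b0 + b1 * x) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def fit_stump(x, y):
    """One-split regression tree: pick the age cutoff with the lowest squared error."""
    best = None
    for cutoff in range(1, 80):
        left, right = y[x < cutoff], y[x >= cutoff]
        if left.size == 0 or right.size == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, cutoff, left.mean(), right.mean())
    return best[1:]

def brier(y, p):
    """Mean squared error of predicted probabilities (lower is better)."""
    return float(np.mean((y - p) ** 2))

beta = fit_logit(age_train, y_train)
cutoff, p_young, p_old = fit_stump(age_train, y_train)

p_logit = 1 / (1 + np.exp(-(beta[0] + beta[1] * age_test)))
p_tree = np.where(age_test < cutoff, p_young, p_old)

print(f"tree cutoff: age {cutoff}")
print(f"Brier score, logit: {brier(y_test, p_logit):.4f}")
print(f"Brier score, tree:  {brier(y_test, p_tree):.4f}")
```

The logit can only bend survival smoothly along age, so it spreads the children’s advantage across all ages. The single split finds the threshold and predicts better out of sample.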

But an economist who remembers to “always plot the data” is ready for this. As with other big data tools, it’s useful to know the trees, but something similar is already available on the econometrics workbench.

There’s nothing ideological in these comments on big data. More data potentially available for research is better than less data. And data scientists do things economists can’t. The objection is the following. Economists mostly deal with two types of problems. Type One: figuring out how n big variables, like inflation and unemployment, interact with each other. Type Two: making practical policy recommendations for people who typically read nothing more than executive summaries. While big data can inform top-notch economics research, these two problems are easier to solve with simple models and small data. So, a pistol turns out to be better than a rifle.