Research Is as Good as Its Reproducibility

Complex systems happen to have probabilistic, rather than deterministic, properties, and this fact has made the social sciences look deficient next to the "real" hard sciences (as if the hard sciences predicted weather or earthquakes better than economics predicts financial crises).

What’s the difference? When today’s results differ from yesterday’s results, it’s not because the authors got the science wrong. In most cases, these authors just study slightly different contexts and so may obtain seemingly contradictory results. Still, to benefit from generalization, it’s easier to take “slightly different” as “the same” and treat the result as a random variable.

In this case, “contradictions” get resolved surprisingly simply: by replicating the experiment and collecting more data. In the end, you have a distribution of the impact over studies, not simply of the impact within a single experiment.
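To make "a distribution of the impact over studies" concrete, here is a minimal sketch of how estimates from several replications could be pooled. It uses inverse-variance weighting with a DerSimonian-Laird estimate of the between-study variance, which is one common choice and not a method taken from any of the papers mentioned here. The function name `pool_estimates` and the numbers in the example are hypothetical and serve illustration only.

```python
import math


def pool_estimates(effects, std_errors):
    """Pool per-study effect estimates (hypothetical helper).

    Returns (pooled_effect, pooled_std_error, tau2), where tau2 is the
    DerSimonian-Laird estimate of the variance of the true effect
    across studies (0.0 when the studies agree up to sampling noise).
    """
    k = len(effects)
    # Fixed-effect step: weight each study by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed_mean = sum(w * e for w, e in zip(weights, effects)) / total_w

    # Cochran's Q measures how far studies scatter around the fixed mean.
    q = sum(w * (e - fixed_mean) ** 2 for w, e in zip(weights, effects))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects step: add the between-study variance to each study.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    total_re = sum(re_weights)
    pooled = sum(w * e for w, e in zip(re_weights, effects)) / total_re
    pooled_se = math.sqrt(1.0 / total_re)
    return pooled, pooled_se, tau2


if __name__ == "__main__":
    # Made-up effect sizes (in standard deviations) from three replications.
    effects = [0.5, -0.3, 0.1]
    std_errors = [0.05, 0.05, 0.05]
    pooled, pooled_se, tau2 = pool_estimates(effects, std_errors)
    print(f"pooled effect: {pooled:.3f} +/- {pooled_se:.3f}, tau^2: {tau2:.4f}")
```

A positive `tau2` says the studies disagree by more than their own sampling errors can explain, which is exactly the "slightly different contexts" story: the pooled standard error widens accordingly instead of pretending that one precise study settles the question.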

Schoenfeld and Ioannidis show the dispersion of results in cancer research (“Is everything we eat associated with cancer?”, 2012):


Each point indicates a single study that estimates how much a given ingredient may contribute to getting cancer. The bad news: onion is more useful than bacon. The good news: we can say that a single estimate is never enough. A single study is not systematic, even after peer review.

The recent attempt to reproduce 100 major studies in psychology confirms the divergence: “A large portion of replications produced weaker evidence for the original findings.” In this case, they also found a bias in reporting.

In economics, reported effects also vary across papers. From Eva Vivalt (2014):


This chart reports how conditional cash transfers affect different outcomes, measured in standard deviations. Cash transfers exemplify the rule: the impact is often absent; otherwise it varies (sometimes for the worse). For more, see:

  • AidGrade: Programs by outcomes. A curated collection of popular public programs with their impact compared across programs and outcomes.
  • Social Science Registry. Registering a randomized trial in advance reduces the positive-effect bias in publications and spares economists the data-mining effort when nothing interesting comes out of Plan A.

The dispersion of the impact is not a unique feature of randomized trials. Different estimates from similar papers appear elsewhere in economics. It’s most evident in literature surveys, especially those with nice summary tables: Xu, “The Role Of Law In Economic Growth”; Olken and Pande, “Corruption in Developing Countries”; DellaVigna and Gentzkow, “Persuasion.”

The problem, of course, is that evidence is only as good as its reproducibility. And reproducibility requires data on demand. But how many authors can claim that their results can be replicated? A useful classification by Levitt and List (2009):


Naturally occurring data occurs naturally, so we cannot replicate it at will. A lot of highly cited papers rely on naturally occurring data from the right-hand side methods. That’s, in fact, the secret. When an author finds a nice natural experiment that escapes accusations of endogeneity, the paper becomes an authority on the subject. (Either because natural experiments happen rarely and competing papers don’t appear, or because the identification looks so elegant that readers fall in love with the paper.) But this experiment is only one point on the scale. It doesn’t become reliable just because we don’t know where the other points would be.

The work based on controlled data gets less attention, but this work gives a systematic account of causal relationships. Moreover, these papers cover treatments of a practical sort: well-defined actions that NGOs and governments can implement. This seamless connection is a big advantage, since taping “naturally occurring” evidence to policies adds another layer of distrust between policy makers and researchers. For example, try to connect this list of references in labor economics to government policies.

Though to many researchers “practical” is an obscene word (and I don’t emphasize this quality), reproducible results are a scientific issue. What do reproducible results need? More cooperation, simpler inquiries, and less reliance on chance. More on this is coming.
