When ROI Hits the Roof


The Coalition for Evidence-Based Policy has a nice compilation of low-cost program evaluations. Example #2 tells us about a $75 million education program that improved nothing. The cost of finding this out was $50K. The simple math says the return on the money invested in evaluation reached 150,000%. That kind of outperforms the S&P 500.
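The arithmetic behind that number is a back-of-the-envelope one. Here's a quick check using the figures from the text, treating the program spending flagged as wasted as the evaluation's gross return:

```python
# Figures from the text: a $75M program with no impact, discovered for $50K.
program_cost = 75_000_000
evaluation_cost = 50_000

# Treat the flagged program spending as the evaluation's gross return.
roi_percent = program_cost / evaluation_cost * 100
print(f"{roi_percent:,.0f}%")  # → 150,000%
```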

What’s the trick? First, as mentioned before, maybe the program accomplishes something else. This one aimed to improve student results and attendance, and it didn’t. But the teachers got $3K more each and bought themselves useful things. Nothing wrong with that, but we need other ideas to improve education.

Second, so-called unconditional money transfers rarely motivate better performance, counterintuitive as that may seem. And not only in education; public services just happen to be in full view of everybody. The ROI of evaluation, then, depends on how much a government or business puts into unchecked programs. This time it was $75 million; next time it’s $750 million. Big policies promise big returns on evaluation, either through better selection or faster rejection.

Third, such opportunities exist because big organizations evaluate execution, not impact. Execution is easier to monitor, so public corporations have to have independent auditors who ensure that employees don’t steal. In contrast, an efficiency audit requires management’s genuine interest in rigorous evaluation, and there are no incentives for that. After all, stealing is a crime everywhere, while incompetence is not (despite being more wasteful).

With that said, an ROI of 150,000% is a fact. If you spend $75M on a policy meant to do X and the policy does nothing for X, that $75M is left on the table. Without the $50K evaluation, you’d lose it.

Making Informed Choices in a Complex World

Human nature is complex. The world has billions of interconnections. But it’s surprisingly simple to start understanding them.

Here’s a practical question: how can the government improve education? Google returns 600 million answers. Unfortunately, most suggestions can’t help. But we can still find out which of them would. The MIT Poverty Action Lab did a series of evaluations:


This long page says that half of the programs developed by top experts had no impact on test scores (their horizontal lines touch the vertical zero line in the left-hand plot). Though these programs may be useful for something else, they are money wasted as far as learning itself is concerned. The plot on the right is scarier: it shows the cost-effectiveness of the programs on a log scale. You can see a 100-fold difference in cost-effectiveness between scholarships and information provision, with respect to their impact on test scores. It’s like having two shops on one street: one sells 1 apple for $100, the other sells 100 apples for the same $100.
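To make the 100-fold gap concrete, here is a toy comparison. The unit and the numbers are invented to mirror the gap described above, not read off the plot:

```python
# Hypothetical cost-effectiveness in extra test-score units per $100 spent.
# These figures are made up to illustrate a ~100x gap between programs.
programs = {
    "scholarships": 0.01,           # 0.01 units per $100
    "information provision": 1.00,  # 1.00 units per $100
}

ratio = programs["information provision"] / programs["scholarships"]
print(f"{ratio:.0f}x")  # → 100x
```

Same budget, a hundred times the learning: that is what a 100x cost-effectiveness gap means in practice.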

Take Google or Microsoft, which question the impact of their actions too. Instead of education, they care about profits. They ran similar evaluations and found that 80–90% of their ideas don’t work.

The world is complex and punishes unjustified self-confidence. Health care, finance, government, and nonprofits employ policies that are supposed to work but don’t when tested. And these policies remain in force because, well, someone is already paid for being very confident in them. Besides, admitting the opposite requires courage and doubt, both of which look harmful to a career. Costs and complexity aren’t the problem; evaluations are simple and often very cheap. The aforementioned studies separated out the impact with randomized evaluations, but the choices are many:


And more on them later.

Ordinary Government Failures

(comparing public policies against one of the deadliest diseases; source)

Governments make mistakes. But not those that typically get into the press.

Stories about government failures, the sort Brookings and the Heritage Foundation publish here and there, are inconclusive. It’s unclear where a “failure” starts because there is no baseline for “success.” As a result, the press and think tanks criticize governments for events anyone could be blamed for, mostly chosen for their huge negative effects.

The financial crisis of 2008 is a conventional government failure in public narratives. So is September 11. But neither was predicted by alternative institutions. The individual economists who forecast the 2008 crash came from different backgrounds, organizations, and countries. Their diverse frameworks, though valuable as dissent, are not a systematic improvement over mainstream economics. Predicting 9/11 has an even weaker record (Nostradamus and the like).

Governments make other, more systematic, mistakes. Studying and reporting these mistakes makes sense because a government can do better in the next iteration. The government can’t learn from the Abu Ghraib abuse, however terrible it was. But it can learn to improve domestic prisons, in which basically similar things happen routinely.

Systematic problems are easier to track, predict, and resolve. A good example comes, unexpectedly, from the least developed nations. Well, from the international organizations and nonprofits that run anti-poverty programs there. These organizations use randomized evaluations and quasi-experimental methods to separate out the impact of public programs on predefined goals. The results show manifold differences in the efficacy of the programs, and that’s a huge success.
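One quasi-experimental method of the kind mentioned above, difference-in-differences, fits in a few lines: compare the before/after change in a program region against the change in a comparison region, so trends common to both cancel out. The numbers below are made up for illustration:

```python
# Made-up average outcomes (say, attendance rates in %) before and after
# a program runs in one region but not the other.
outcomes = {
    "program region":    {"before": 70.0, "after": 78.0},
    "comparison region": {"before": 68.0, "after": 73.0},
}

change = {region: v["after"] - v["before"] for region, v in outcomes.items()}

# Subtracting the comparison change removes trends shared by both regions,
# leaving an estimate of the program's own contribution.
impact = change["program region"] - change["comparison region"]
print(f"estimated impact: {impact:.1f} points")  # → 3.0 points
```

The program region improved by 8 points, but 5 of those would likely have happened anyway (the comparison region gained them too), so the method credits the program with only 3.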

Organizations such as the MIT Poverty Action Lab and Innovations for Poverty Action have evaluated hundreds of public policies over the last ten years. Now, guess how much press coverage they got. Zero. The NYT can’t even find a mention of the Lab among its articles. Google returns 34 links for the same query, most of them to hosted blogs.

One explanation is the storytelling tradition in newspapers. Journalists are taught to tell stories (which is what readers like). Presenting systematic evidence makes for a bad story. There is little drama in numbers, however important they are. And telling numbers reduces your readership, which is incompatible with a successful journalist’s career. Even the new data journalism comes from blogs, not well-established publishers.

More fundamentally, the mass media’s choice of priorities leaves little attention for systematic problems in general. Each day brings hot news that sounds interesting, however irrelevant and impractical it may be. Reporting on public policy research can’t compete in hotness with political speeches and dangerous new enemies. It took a couple of decades for climate change to become a somewhat regular topic, and the survival rates of other important issues are much lower.