## An Economist’s Starter Kit

I put together some links for economists. These are standard tools and services, and you may have seen most of them already.

But I’m sure this list is missing something you know of. So I’m asking you to have a look and contribute to posterity.

This is the list. And here you can edit it.

## Can Learning Change Your Mind?

In this skeptical view, economists and those who read economics are locked into ideologically motivated beliefs—liberals versus conservatives, for example—and just pick whatever empirical evidence supports those pre-conceived positions. I say this is wrong and solid empirical evidence, even of the complicated econometric sort, changes plenty of minds.

To be clear, only a person can change his own mind; economics can’t. And since the question is basically about learning, not economics, I reformulate it accordingly: Can Learning Change Your Mind?

The rest turns out to be simple. If I want to change my mind big time, I take an issue I know nothing about and read some research. There will be surprises.

But if I happen to discover big surprises in my area of competence, I become suspicious. Evidence doesn’t drop down like Newtonian apples. It flows like a river. Learning, then, is a flow, too. It’s a continuous process that brings no surprises if you learn constantly.

Where does continuity come from? First, from discounting new studies. New studies have standard limitations, even when they are factually and methodologically correct. The most frequent limitations concern long-term relationships, external validity, and general equilibrium effects. Second, from the nature of the economy itself. Research in economics often speaks in yes-no terms, while economic processes are continuous. For marketing purposes, researchers formulate questions and answers like “Does X cause Y?”, which is a yes-no question tested with regressions. But causation is not about p-values in handpicked models. Causation is also about the degree of impact, and this degree jumps wildly even across different specifications of a single model. That means I need a lot of similar studies to change my mind about X and Y.

Removing one letter from Bertrand Russell, “One of the symptoms of approaching nervous breakdown is the belief that one work is terribly important.”

Going back to Adam’s initial (yes-no) question, I’d say yes, some economists “are locked into ideologically motivated beliefs,” and yes, some economists produce knowledge that other people can learn from. These two groups overlap, but it’s no obstacle to good learning.

PS: In his post, Adam Ozimek also asked readers to submit studies that changed their minds. Since I see mind-changing potential as a function of novelty, I’d recommend a simple source of mind-changing studies: visit RePEc’s top cited studies list and carefully read the papers you haven’t read yet. There will be surprises.

## Software for Researchers: New Data and Applications

The tools mentioned here help manage reproducible research and handle new types of data. Why should you go after new data? New data provides new insights. For example, the recent Clark Medal winners used unconventional data in their major works. These data were large and unstructured, so Excel, Word, and email wouldn’t do the job.

I write for economists, but other social scientists can also find these recommendations useful. These tools have a steep learning curve and pay off over time. Some improve small-data analysis as well, but most gains come from new sources and real-time analysis.

Each section ends with a recommended reading list.

## Standard Tools

LaTeX and Dropbox streamline collaboration. The recommended LaTeX editor is LyX. Zotero and its browser plugin manage the references. LyX supports Zotero via another plugin.

Stata and Matlab do numerical computations. Both are paid and have good support and documentation. Free alternatives: IPython and RStudio for Stata, Octave for Matlab.

Mathematica does symbolic computations. Sage is a free alternative.

1. Frain, “Applied LATEX for Economists, Social Scientists and Others.” Or a shorter intro to LaTeX by another author.
2. UCLA, Stata Tutorial. This tutorial fits the economist’s goals. To make it shorter, study Stata’s very basic functionality and then google specific questions.
3. Varian, “Mathematica for Economists.” Written 20 years ago. Mathematica has become more powerful since then. See their tutorials.

## New Data Sources

The most general source is the Internet itself. Scraping info from websites sometimes requires permission (see the website’s terms of use and robots.txt).
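Checking robots.txt before scraping can be automated. Below is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content, the `example.org` URLs, the `research-bot` agent name, and the helper `allowed` are all hypothetical and inlined so the example runs offline. A real script would fetch the file from the target website, and it would still need to respect the terms of use, which robots.txt doesn’t cover.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt: str, url: str, agent: str = "research-bot") -> bool:
    """Return True if the given robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The first path is not covered by any Disallow rule; the second one is.
print(allowed(ROBOTS_TXT, "https://example.org/data/table.html"))
print(allowed(ROBOTS_TXT, "https://example.org/private/table.html"))
```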

Some websites have APIs, which send data in structured formats but limit the number of requests. Site owners may alter the limit by agreement. When the website has no API, Kimono and Import.io extract structured data from webpages. When they can’t, BeautifulSoup and similar parsers can.
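To give a feel for what a parser does, here is a small sketch that pulls a table out of raw HTML. To keep it free of dependencies, it uses the standard library’s `html.parser` as a stand-in for BeautifulSoup, which does the same job with less code. The HTML snippet and the names `TableParser` and `parse_table` are made up for illustration.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str):
    """Return the table cells as a list of rows (lists of strings)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical page fragment standing in for a downloaded webpage.
HTML = """
<table>
  <tr><th>country</th><th>rate</th></tr>
  <tr><td>A</td><td>1.5</td></tr>
  <tr><td>B</td><td>2.0</td></tr>
</table>
"""
rows = parse_table(HTML)
print(rows)
```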

Other sources include industrial software, custom data collection systems (like surveys on Amazon Mechanical Turk), and physical media. Text recognition systems require little manual labor, so digitizing analog sources is easy now.

Socrata, data.gov, Quandl, and FRED2 maintain the most comprehensive collections of public datasets. But the universe is much bigger, and exotic data hides elsewhere.

1. Varian, “Big Data.”
2. Glaeser et al., “Big Data and Big Cities.”
3. Athey and Imbens, “Big Data and Economics, Big Data and Economies.”
4. National Academy of Sciences, Drawing Causal Inference from Big Data [videos]
5. StackExchange, Open Data. A website for data requests.

## One Programming Language

A general purpose programming language can manage data that comes in peculiar formats or requires cleaning.

Use Python by default. Its packages also replicate core functionality of Stata, Matlab, and Mathematica. Other packages handle GIS, NLP, visual, and audio data.

Python comes as a standalone installation or in special distributions like Anaconda. For easier troubleshooting, I recommend the standalone installation. Use pip for package management.

Python is slow compared to other popular languages, but certain tweaks make it fast enough to avoid learning other languages, like Julia or Java. Generally, execution time is not an issue. Computation keeps getting cheaper (Moore’s Law is usually stated as a doubling roughly every two years), while the coder’s time gets more expensive.

Command line interfaces make massive operations on files easier. For Macs and other *nix systems, learn bash. For Windows, see cmd.exe.

1. Kevin Sheppard, “Introduction to Python for Econometrics, Statistics and Data Analysis.”
2. McKinney, Python for Data Analysis. [free demo code from the book]
3. Sargent and Stachurski, “Quantitative Economics with Python.” The major project using Python and Julia in economics. Check their lectures, use cases, and open source library.
4. Gentzkow and Shapiro, “What Drives Media Slant?” Natural language processing in media economics.
5. Dell, “GIS Analysis for Applied Economists.” Use of Python for GIS data. Outdated in technical details, but demonstrates the approach.
6. Dell, “Trafficking Networks and the Mexican Drug War.” Also see other works in economic geography by Dell.
7. Repository awesome-python. Best practices.

## Version Control and Repository

Version control tracks changes in files. It includes:

• showing changes made in text files: for taking control over multiple revisions
• reverting and accepting changes: for reviewing contributions by coauthors
• support for multiple branches: for tracking versions for different seminars and data sources
• synchronizing changes across computers: for collaboration and remote processing
• forking: for other researchers to replicate and extend your work

Git is the de facto standard for version control. GitHub.com is the largest service that hosts Git repositories. It offers free storage for open projects and paid storage for private repositories.

## Sharing

### Storage

A GitHub repository is a one-click solution for both code and data. No problems with university servers, relocated personal pages, or sending large files via email.

When your project goes north of 1 GB, you can use GitHub’s Large File Storage or alternatives: AWS, Google Cloud, mega.nz, or torrents.

### Demonstration

Jupyter notebooks combine text, code, and output on the same page. See examples:

1. QuantEcon’s notebooks.
2. Repository of data-science-ipython-notebooks. Machine learning applications.

Beamer for LaTeX is a standard solution for slides. TikZ for LaTeX draws diagrams and graphics.

## Remote Server

Remote servers store large datasets in memory. They do numerical optimization and Monte Carlo simulations. GPU-based servers train artificial neural networks much faster and require less coding. These things save time.

If campus servers have peculiar limitations, third-party companies offer scalable solutions (AWS and Google Cloud). Users pay for storage and processor power, so exploratory analysis goes quickly.

A typical workflow with version control:

1. Creating a Git repository
2. Taking a small sample of data
3. Coding and debugging research on a local computer
4. Executing an instance on a remote server
5. Syncing the code between two locations via Git
6. Running the code on the full sample on the server

Some services allow writing code in a browser and running it right on their servers.

1. EC2 AMI for scientific computing in Python and R. Read the last paragraph first.
2. Amazon, Scientific Computing Using Spot Instances

## Real-time Applications

Real-time analysis requires optimization for performance. I exemplify with industrial applications:

1. Jordan, On Computational Thinking, Inferential Thinking and Big Data. A general talk about getting better results faster.
2. Google, Economics and Electronic Commerce research
3. Microsoft, Economics and Computation research

## The Map

A map for learning new data technologies by Swami Chandrasekaran:

## Research Is as Good as Its Reproducibility

Complex systems happen to have probabilistic, rather than deterministic, properties, and this fact made social sciences look deficient next to the real hard sciences (as if hard sciences predicted weather or earthquakes better than economics predicts financial crises).

What’s the difference? When today’s results differ from yesterday’s results, it’s not because authors get science wrong. In most cases, these authors just study slightly different contexts and may obtain seemingly contradictory results. Still, to benefit from generalization, it’s easier to take “slightly different” as “the same” and treat the result as a random variable.

In this case, “contradictions” get resolved surprisingly simply: by replicating the experiment and collecting more data. In the end, you have a distribution of the impact over studies, not simply of the impact within a single experiment.
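To make “a distribution of the impact over studies” concrete, here is a minimal sketch of one common way to combine estimates: a fixed-effect, inverse-variance weighted average. This is my choice of technique for illustration, and the three estimates below are made-up numbers, not results from any paper.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of study estimates.

    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical estimates of the same effect from three replications.
estimates = [0.30, -0.05, 0.10]
std_errors = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(round(pooled, 3), round(pooled_se, 3))
```

The individual estimates here disagree even in sign, yet the pooled estimate is more precise than any single one. A fixed-effect average assumes all studies measure the same underlying effect; when contexts differ a lot, a random-effects model may be the more appropriate choice.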

Schoenfeld and Ioannidis show the dispersion of results in cancer research (“Is everything we eat associated with cancer?”, 2012):

Each point indicates a single study that estimates how much a given ingredient may contribute to getting cancer. The bad news: onion is more useful than bacon. The good news: we can say that a single estimate is never enough. A single study is not systematic, even after a peer review.

The recent attempt to reproduce 100 major studies in psychology confirms the divergence: “A large portion of replications produced weaker evidence for the original findings.” In this case, they also found a bias in reporting.

Economics also has reported effects varying across papers. By Eva Vivalt (2014):

This chart reports how conditional cash transfers affect different outcomes, measured in standard deviations. Cash transfers exemplify the rule: The impact is often absent, otherwise it varies (sometimes for the worse). For more, check this:

• AidGrade: Programs by outcomes. A curated collection of popular public programs with their impact compared across programs and outcomes.
• Social Science Registry. Registering a randomized trial in advance reduces the positive effect bias in publications and saves data-mining efforts by economists when nothing interesting comes out of the economist’s Plan A.

The dispersion of the impact is not a unique feature of randomized trials. Different estimates from similar papers appear elsewhere in economics. It’s most evident in literature surveys, especially those with nice summary tables: Xu, “The Role Of Law In Economic Growth”; Olken and Pande, “Corruption in Developing Countries”; DellaVigna and Gentzkow, “Persuasion.”

The problem, of course, is that evidence is only as good as its reproducibility. And reproducibility requires data on demand. But how many authors can claim that their results can be replicated? A useful classification by Levitt and List (2009):

Naturally-occurring data occurs naturally, so we cannot replicate it at will. A lot of highly cited papers rely on the naturally occurring data from the right-hand side methods. That’s, in fact, the secret. When an author finds a nice natural experiment that escapes accusations of endogeneity, his paper becomes an authority on the subject. (Either because natural experiments happen rarely and competing papers aren’t appearing, or because the identification looks so elegant that the readers fall in love with the paper.) But this experiment is only one point on the scale. It doesn’t become reliable just because we don’t know where the other points would be.

The work based on controlled data gets less attention, but this work gives a systematic account of causal relationships. Moreover, these papers cover treatments of a practical sort: well-defined actions that NGOs and governments can implement. This seamless connection matters, since taping “naturally-occurring” evidence to policies adds another layer of distrust between policy makers and researchers. For example, try to connect this list of references in labor economics to government policies.

Though to many researchers “practical” is an obscene word (and I don’t emphasize this quality), reproducible results are a scientific issue. What do reproducible results need? More cooperation, simpler inquiries, and less reliance on chance. More on this is coming.

## Growth Diagnostics in Russia: Getting Started

I’m going to do a couple of case studies in growth diagnostics. The first country is Russia for reasons I’ll explain in the next post. The second country is likely to be China, but you’re still free to send your suggestions.

I’m using a constraint analysis framework by Hausmann–Rodrik–Velasco (HRV). HRV developed a comprehensive, yet structured, framework with a 10-year record of practical applications. It includes a formal model and handy heuristics. It’s also compatible with the literature on growth factors, such as physical and human capital.

### The Formal Model

The formal model comes from HRV (2004) — an early draft that still contains all the math of an augmented neoclassical model of economic growth. The equation of interest:

where $r$ is the return on capital defined as

The first equation describes accumulation of capital and consumption under distortions. The distortions are denoted with the Greeks and fall into five categories:

A very formal approach would require picking values for these parameters and simulating the model to compare it with actual values of consumption and capital. A well-calibrated model would predict responses to the changes in the parameters, which would immediately reveal the constraint. I won’t follow this approach because some parameters have no direct or estimable counterparts in the data.

Instead, I use this formal model for discipline and test candidate constraints with heuristics. The summary so far:

### Heuristics

The shortcut to growth constraints is a useful table from HRV (2008):

Compared to the formal model, this table includes human capital and specific tests for each constraint mentioned in the header.

Estimating the responses to constraints may be challenging. For example, if you have an indicator for expropriation, you can’t readily say by how much an increase in “expropriation” would reduce economic growth. There’s no universal solution to this problem. For this reason, I’ll focus on constraints we can estimate with reasonable confidence.

### The Helpers

A candidate for the binding variable is often a compromise among different priorities. The interest rate has to balance inflation and unemployment. Taxes raise some costs via taxation and reduce other costs via public goods. Macro stability after government spending cuts may be followed by political instability.

In this case, growth diagnostics would send contradictory signals. You must increase and decrease the same variable simultaneously! This seems possible in politics, but not in mathematics. To clarify such ambiguities, constraint testing requires a few more models.

Though the list of models is open, most of the job is done by a few conventional macro tools.

### The Next Post

In the next post, I’ll briefly review the Russian economy and the challenges it poses to growth diagnostics.

The entire case study will be accompanied by replication files, which I’ll try to make suitable for immediate replication for any other major economy.

## Growth Diagnostics: A Crash Course

In the mid-2000s, Ricardo Hausmann and Dani Rodrik developed a growth diagnostics framework for dealing with persistent economic growth failures:

With this decision tree:

The symptoms in the table indicate binding constraints. According to Hausmann and Rodrik, easing binding constraints would accelerate economic growth. Therefore, a government should address its country’s constraints first. But before that, it must find these constraints.

Such evidence-based prioritizing could balance politics and fashion as the major determinants of public policies. It’s worth remembering that the major cost of government is not public spending, but the time wasted on implementing wrong reforms. This cost grows exponentially when measured against the scenario in which evidence-based policies indeed change things for the better.

To be sure, governmental decisions are always accompanied by some sort of research. This research, however, often suffers from biases and politics. An economist needs a framework that keeps him disciplined. Hausmann-Rodrik growth diagnostics is such a framework.

A crash course in this framework would look like this:

1. Hausmann, Klinger, and Wagner, “Doing Growth Diagnostics in Practice.”
2. Hausmann, Rodrik, and Velasco, “Growth Diagnostics,” in Serra and Stiglitz, The Washington Consensus Reconsidered.
3. World Bank, Country case studies.

I’d add two things to these materials.

First, a formal growth model. Doing diagnostics without modeling may seem easier, but after a while you’ll lose the big picture. As you lose the big picture, you can no longer rank priorities, even with microeconomic estimates of returns to policies. The Solow model and its modifications would suffice.

Also, unlike the authors, I’m more cautious about focusing on a single constraint. First, economic evidence is often inconclusive (but better than non-economic non-evidence). Second, governments fail to implement at least some of the policies that they’ve planned. In the end, you may advocate a wrong policy that isn’t going to be implemented anyway! As a remedy, I recommend stylized diversification akin to the Kelly criterion, in which the government allocates effort in proportion to the expected payoff from each constraint-policy pair.
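A toy version of this diversification rule is sketched below. It is a simplification of my own, not the Kelly formula itself: each constraint-policy pair gets a share of effort proportional to its expected payoff, computed here as the chance of implementation times the growth gain if implemented. The pair names and all numbers are hypothetical.

```python
def allocate_effort(expected_payoffs):
    """Split effort across constraint-policy pairs in proportion to payoffs.

    Pairs with a non-positive expected payoff get no effort.
    """
    positive = {name: max(value, 0.0) for name, value in expected_payoffs.items()}
    total = sum(positive.values())
    if total == 0:
        raise ValueError("no constraint has a positive expected payoff")
    return {name: value / total for name, value in positive.items()}


# Hypothetical inputs: probability of implementation times growth gain.
payoffs = {
    "finance": 0.6 * 2.0,
    "infrastructure": 0.3 * 1.0,
    "taxation": 0.5 * 1.0,
}
shares = allocate_effort(payoffs)
print(shares)
```

The single-constraint approach would put all effort into “finance” here. The proportional rule still favors it but keeps some effort on the alternatives in case the top pick turns out wrong or never gets implemented.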

In the next posts, I’ll review a couple of major economies in the spirit of Hausmann and Rodrik. Stay tuned!

## How Big Data Informs Economics

In A Fistful of Dollars, Clint Eastwood challenges Gian Maria Volonte with the words, “When a man with .45 meets a man with a rifle, you said, the man with a pistol’s a dead man. Let’s see if that’s true. Go ahead, load up and shoot.”

Those are the right words to challenge big data, which recently reappeared in economics debates (Noah Smith, Chris House via Mark Thoma). Big data is a rifle, but not necessarily a winning one. Economists must have special reasons to abandon small datasets and start messing with more numbers.

Unlike business, which only recently discovered the sexiest job of the future, economists have been doing analytics for the last 150 years. They have dealt with “big data” for half of that period (I count from 1940, when the CPS started). So, how can the new big data be useful to them?

Let’s find out what big data offers. First of all, more information, of course. Notable cases include predicting the present with Google and Joshua Blumenstock’s use of mobile phones in development economics. Less notable cases encounter the same problem: a decline in the quality of data. Compare the long surveys that development economists collect when they do experiments with what Facebook dares to ask its most loyal users. Despite Facebook having 1.5 bn observations, economists end up with much better evidence. That’s not about depth alone. Social scientists ask clearer questions, find representative respondents, and take nonresponses seriously. If you do a responsible job, you have to construct smaller but better samples like this.

Second, big data comes with its own tools, which, like econometrics, are deeply rooted in statistics but ignorant about causation:

The slogan is: to predict and to classify. But economics does care about cause and effect relations. Data scientists dispense with these relations because the professional penalty for misidentification is lower than in economics. And, honestly, at this stage, they have more important problems to solve. For example, much time still goes into capacity building and data wrangling.

Hal Varian shows a few compelling technical examples in his 2014 paper. One example comes from Kaggle’s Titanic competition:

The task requires predicting whether a person survived the sinking or not. The chart says that children had a better chance of surviving than older passengers, while for the rest age didn’t matter. A regression tree captures this nonlinearity in age, while logit regression does not. Hence, the big data tool does better than the economics tool.
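The way a tree picks up such a break in age can be shown with a one-split tree, the simplest case. The sketch below scans candidate age thresholds and keeps the one that minimizes the squared error of predicting each side by its mean. The twelve passengers are invented for illustration and are not the Kaggle data; the function name `best_split` is also mine.

```python
def best_split(ages, survived):
    """Find the age threshold that minimizes squared error of a one-split tree."""

    def sse(values):
        # Sum of squared deviations from the group mean.
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = list(zip(ages, survived))
    candidates = sorted(set(ages))
    best = None
    # Try the midpoint between each pair of neighboring ages.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for a, y in pairs if a < threshold]
        right = [y for a, y in pairs if a >= threshold]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best[0]


# Made-up passengers: children mostly survive, adults mostly do not.
ages = [2, 4, 6, 8, 25, 30, 35, 40, 45, 50, 60, 70]
survived = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

threshold = best_split(ages, survived)
print(threshold)
```

On this toy sample the split lands between the oldest child and the youngest adult. A logit with age entered linearly has no such break unless you add it by hand, for example with an age-group dummy.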

But an economist who remembers to “always plot the data” is ready for this. As with other big data tools, it’s useful to know the trees, but something similar is already available on the econometrics workbench.

There’s nothing ideological in these comments on big data. More data potentially available for research is better than less data. And data scientists do things economists can’t. The objection is the following. Economists mostly deal with the problems of two types. Type One, figuring out how n big variables, like inflation and unemployment, interact with each other. Type Two, making practical policy recommendations for the people who typically read nothing more than executive summaries. While big data can inform top-notch economics research, these two problems are easier to solve with simple models and small data. So, a pistol turns out to be better than a rifle.

## Impact and Implementation of Evidence Based Policies

Chris Blattman noted that economists lack evidence on important policies. That’s true for foreign aid programs, which Chris mentioned. But defined broadly, policy making in poor countries can source evidence from elsewhere. NBER alone supplies 20 policy-relevant papers each week. And so does the World Bank, which recently studied its own economy:

About 49 percent of the World Bank’s policy reports … have the stated objective of informing the public debate or influencing the development community. … About 13 percent of policy reports were downloaded at least 250 times while more than 31 percent of policy reports are never downloaded. Almost 87 percent of policy reports were never cited.

In an ideal world, policy makers would read more and adjust their economies to the models we already know thanks to decades of thorough research. This is not happening because policy makers are managers, not researchers with well-defined problems. And, as Russell Ackoff said, managers do not solve problems; they manage messes.

Governments have their own limits of the messes they can deal with. Economists in research, on the contrary, simplify messes to tractable models. Let’s take one of the most powerful ideas in development: structural changes. Illustrated by Dani Rodrik:

The negative slope of the fitted values says that people moved from more productive to less productive industries over time. Which, of course, is a bad structural change. We can blame politics for this or whatever, but it’s hard to separate politics and, say, incompetence.

Emerging (and not so emerging) economies love the idea of employment growing in productive sectors. Even reports on sub-Saharan Africa regularly refer to knowledge economies and high-value-added industries. But in the end, many nations have something like that picture. (Oh, those messes.)

Did economists learn to manage messes better than public officials? Well, that’s what development economics is trying to accomplish. While it doesn’t include “general equilibrium effects” (the key takeaway from Daron Acemoglu), the baseline for judging the effectiveness of assessment programs is way below this and other criticisms. The baseline is eventually the intuition of a local public official—and policies that he would otherwise enact, if there were no evidence based programs.

Instead, these assessment programs provide simple tools for clear objectives. NGOs and local governments can expect something specific.

What about big evidence-based policies? They require capacity building. At the extreme, look at the healthcare reform in the United States. Before anything happened, the Affordable Care Act already contained 1,000 pages. Implementation was difficult. Could a government in Central Africa implement a comparable reform, even with abundant evidence on healthcare in the US or at home?

Economists start to ignore the problem of implementation as the potential impact of their insights increases. The connection is not direct, but if you simplify a complex problem, you get a solution for the simplified problem. Someone else must complete the solution, and that becomes the problem.

## On the Subjective Sense of Authority and Entitlement

A paper by Fourcade et al. tells all about us, sneaky economists:

In this essay, we investigate the dominant position of economics within the network of the social sciences in the United States. We begin by documenting the relative insularity of economics, using bibliometric data. Next we analyze the tight management of the field from the top down, which gives economics its characteristic hierarchical structure. Economists also distinguish themselves from other social scientists through their much better material situation (many teach in business schools, have external consulting activities), their more individualist worldviews, and in the confidence they have in their discipline’s ability to fix the world’s problems. Taken together, these traits constitute what we call the superiority of economists, where economists’ objective supremacy is intimately linked with their subjective sense of authority and entitlement. While this superiority has certainly fueled economists’ practical involvement and their considerable influence over the economy, it has also exposed them more to conflicts of interests, political critique, even derision.

Blogging economists commented on it furiously. I think it’s mostly because the paper takes the right tone to touch an economist’s nerve. It mentions some problems in the profession, but the conclusion is radically irrelevant to any social problems that the social sciences, including economics, are trying to solve. This does bug economists.

The paper takes virtues for sins. Like this:

The opposite [to being axiomatic] would be arguing by example. You’re not allowed to do that. … There is a word for it. People say “that’s anecdotal.” That’s the end of you if people have said you’re anecdotal … [Another thing is] what modern people say … the modern thing is: “it’s not identified.” God, when your causality is not identified, that’s the end of you.

Actually, anecdotes are popular in economics. However, economics differentiates between case-based proof (which is almost never a proof) and case-based example (which demonstrates what you’ve proved with statistics). Second, yes, identification is a great thing—it shows what strings to pull to achieve socially beneficial results. The paper should have mentioned how many people suffer due to policies invented with anecdote-based reasoning that lacks causal connections.

It’s difficult to discuss the rest of the paper. The best way to get after economists is to show stronger evidence than economists do. And for this, you have to look elsewhere.

## Economics of Poor Governance

Ideas about “structural reforms” get copy-pasted from one development report to another, and for a good reason. These recommendations (basically about improving the economy’s fundamentals) indeed matter. But guess who’s supposed to implement them? Governments! And the quality of reforms depends on the quality of the governments that implement them. What’s happening to government?

So, why is that?

The rule of law, which government is supposed to provide, is crucial for economic development:

But the rule of law implies incorruptible public officials:

Which is not the case when a country lacks specific institutions:

However, these institutions undermine narrow political power and are therefore unlikely to emerge:

In brief, we ask corrupt officials to stop being corrupt and to limit their own power. Actually, it does make sense because development economists also address honest officials, who are more numerous and try to change things. It may work. Ruling parties in non-democracies attempt to improve governance without giving much power away to the press or opposition. They initiate genuine openness reforms and let citizens request information, complain, and sue the government, unless it becomes political.

Still, the quality of government stagnates around the world. Partly, it happens because dark forces hiding in ruling parties defend their interests. Hey, that’s the problem we started from! Not surprisingly. The literature describes the benefits of good governance, but its recommendations follow from the relationships found in developed countries, which already have uncorrupted governments to enforce the rule of law and the rest. Demanding Switzerland-style governance from corrupt governments looks like a hopeless idea. Not only does the dark force resist, we also have few reliable solutions in mind, that is, solutions that would be feasible given all the peculiarities of political institutions in developing countries.

What sort of knowledge should we look for? Perhaps, two types:

1. How to improve governance when the dark force resists? Honest judiciary, transparent elections, and able police create conditions for economic development. Well done. Now we want to know more about paths to these conditions. Much work has been done before in law and political studies. But as an ignorant economist, I see much room for improvement. Economists got their invisible hands on these subjects just recently and noticed the scarcity of (a) formal models and (b) suitable data. These are nice things to have. Theories escaping these two pieces leave us in the Middle Ages, which is not nice.
2. How to run an economy with dark forces? Given its history, economic theory paid more attention to well-governed nations, not to those with problems. For one example, corruption creates information asymmetries of the type that economists haven’t paid much attention to. Take a firm that has political connections and can generate profits above the market average. Can it gain access to capital? No. For this, it must credibly disclose its superiority to banks, which is impossible because the advantage comes from informal political connections. One way or another, capital finds these opportunities, but imperfections remain, so the equilibrium level of investments is lower than the economy and technology allow. So, the problem has implications that have not been studied as deeply as the economics of good governance.

These issues arise in development economics; it’s impossible to ignore them. But the field is small compared to the rest of econ-worlds:

The economics of poor governance awaits intellectual reinforcement from other fields. The wellbeing of 6 bn people is at stake.