Better Models for Education

A cautionary tale

Four years ago, leading universities jumped on the bandwagon of massive open online courses. The courses haven't gotten much attention since then:

[Figure: Google Trends]

This is international data. In the US, interest in MOOCs declined as well, even though respectable institutions kept offering new courses on various topics. Is this a failure of marketing, something the best universities would hardly be proud of, or of the educational technology itself?

Let’s see. A typical MOOC consists of

  • lecture slides and exercises
  • a talking head that reads the slides
  • a discussion board, barely alive
  • an optional certificate

Despite many professors having good presentation skills, this technology is no different from a textbook. In fact, ten years before MOOCs, MIT offered a much better solution: OpenCourseWare, a guide to studying like an MIT student. It wasn't tied to particular enrollment dates, pace, or lecturers. Instead, it showed what a diligent student should complete in one semester.

MOOCs became popular after Sebastian Thrun and Peter Norvig released their open AI course. More than 100,000 students enrolled, and universities decided to supply more courses. But the AI course was backed by exciting new technologies like self-driving cars and text recognition, while a standard university course covered boring rudiments available in any textbook.

The quality of online courses didn't improve over time. Each professor valued his own brand and didn't collaborate with colleagues from other universities. So each had his own course, that is, his own slides and exercises. For example, a large MOOC provider offers 609 “data science” courses. Students enroll in just a dozen of them, typically when the lecturer already has a very good reputation, like Andrew Ng and his machine learning course, based on Stanford's CS229 and available online since 1999.

The history of MOOCs shows how a lot of smart people keep making things that don’t work. Interestingly, it has to do with their core competencies and not online education itself.

Because someone else did better.

Y Combinator: Engaging educators

University professors have little motivation to work with students. Richard Feynman described teaching as “something [to do] so that when I don’t have any ideas and I’m not getting anywhere I can say to myself, ‘At least I’m living; at least I’m doing something; I’m making some contribution’—it’s just psychological.” So when it comes to research vs teaching, many professors choose research.

Anyway, most universities teach future workers, not researchers or educators. Normally, you'd expect workers to teach workers. Workers raised by professors are like Tarzan raised by gorillas. It's an innocent problem in primary school, but the difference in interests grows as education progresses.

How to align the interests of educators and students? By involving the educator in the student’s real passion. That’s what startup accelerators do.

Y Combinator, the most prestigious of accelerators, invests in early-stage startups and puts their founders through a 3-month training program. The 5% stake that Y Combinator acquires for $120K ensures that the mentor’s wellbeing depends on the performance of his students.

Mentorship and apprenticeship are old business practices, of course. Startup accelerators add a social component by bringing many founders to one place. They also escape the research-lab hierarchy, in which a senior faculty member secures funding and employs graduate students as a cheap labor force.

The MIT Media Lab is perhaps the most famous academic lab that operates like a startup accelerator. Professors join the companies founded by their graduates. That's not a general practice at other universities, where offering a stake for better mentoring sounds like an insult.

Khan Academy: Engaging students

Engaging students is the second most important task of an educator after engaging himself. This task takes time, so schools and colleges prefer to get rid of the least motivated troublemakers instead. Many leave college because they see better options. How can educators decrease attrition?

Khan Academy was a one-man project done by a hedge fund analyst in his spare time. The founder taught math on YouTube years before universities started publishing videos of their own classes.

But arguably the best part of Khan Academy appeared later, when students started solving exercises online and getting immediate feedback. Online exercises had existed before, but Khan Academy polished the technology with data.

In brief, Khan Academy sets the sequence of exercises such that students are not discouraged by frequent failures. It’s part of Khan Academy’s gamification mechanism, which keeps learners motivated throughout K-12.
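For intuition, here is a toy sketch of that sequencing idea (hypothetical, not Khan Academy's actual algorithm): pick the exercise whose estimated success probability sits closest to an encouraging target.

```python
# Toy sketch of data-driven exercise sequencing (hypothetical model):
# choose the next exercise whose estimated success probability is
# closest to a motivating target, so failures stay infrequent.
TARGET_SUCCESS = 0.8  # assumed rate that keeps students encouraged

def next_exercise(skill, exercises):
    """skill in [0, 1]; exercises is a list of (name, difficulty) pairs."""
    def p_success(difficulty):
        # crude success model: higher skill and lower difficulty help
        return max(0.0, min(1.0, 1.0 - difficulty + skill))
    return min(exercises, key=lambda ex: abs(p_success(ex[1]) - TARGET_SUCCESS))

drills = [("intro", 0.2), ("medium", 0.5), ("hard", 0.9)]
print(next_exercise(skill=0.3, exercises=drills))  # -> ('medium', 0.5)
```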

Stack Exchange: Asking and answering questions

Good educators teach the Socratic way, by asking leading questions. This technique does not scale to a class of 100+ students. A good alternative is a Q&A website like Stack Exchange or Quora.

Stack Exchange covers many academic subjects up to the graduate level. Its community encourages good questions and punishes ill-prepared ones. Over time, a motivated person learns how to do preliminary research and ask the right questions.

Answering these questions makes more sense than standardized tests or oral exams. Other advantages? Real problems, clear rewards, faster feedback.

Wikipedia: Accumulating knowledge

Wikipedia is fifteen years old, but the education system has integrated only half of it: students copy-paste Wikipedia content into their essays. It should be the other way around! Instead of assigning essays that no one reads, university professors could assign edits to Wikipedia articles.

That's a real contribution. Wikipedia editors check changes and reject bad ones. It's easy to track these edits. The Wikimedia Foundation always looks for new editors and broader coverage. And the content goes straight onto the front page of Google Search.

Despite all the advantages, I've seen very few professors practice this. That's again about engaging educators rather than students.

GitHub: Offering creative assignments

GitHub became a Wikipedia for code. Anyone can contribute to a project of interest. The list of open issues suggests possible contributions.

Like Wikipedia and StackExchange, GitHub addresses genuine problems, not synthetic exercises. Software engineers dominate, but any STEM project suits this platform.

Kaggle: Encouraging competition

Though the idea of 3,500 statisticians competing for $50,000 may seem irrational, Kaggle attracted thousands of math-savvy folks to practical problem solving. “Practical” is Kaggle's key innovation. Competitive problem solving existed before, in international olympiads and on websites like HackerRank. Kaggle made such competitions useful, massive, and scalable.

Some CS departments encourage students to take part in Kaggle competitions. Why here and not on Wikipedia or GitHub? Kaggle challenges look much more like standardized testing, with clear-cut rankings. No need to evaluate whether the student made a useful contribution or just cheated.

Code4Startup: Learning for doing

Learning by doing is an old, popular, and effective technique. But task assignment is a trap. Stupid tasks kill motivation, and the rest dies by itself.

The simplest way to improve motivation is to increase the reward. Startup success stories turned out to be a very effective one. More importantly, they are free.

Code4Startup turned this idea into a service. They offer courses showing users how to make a clone of a successful startup. Unlike MOOCs, these courses show how to turn coding and marketing skills into a useful product.

Code School and Treehouse take a similar approach.

An honorable mention goes to McDonald's and Walmart. These companies employ and train people whom top universities would never admit (and whom other universities get rid of after admission). Those who complain about the students who pay them $50K a year should try teaching a person working for minimum wage.

A comment

The services I mentioned have nothing to do with the formal education system. Many of them are not even labeled as educational. But they do what colleges are supposed to do, and do it better.

Three more things. (1) These services never associated themselves with colleges. More importantly, none attempted to reform the formal educational system. That’d be an interesting waste of time, as it was for John Dewey and other reformers. (2) These services scale and depend less and less on the limited supply of really good professors. (3) These services specialize. They don’t teach everything; they make narrow tools to improve specific skills.

Comparing their popularity with that of top universities (MIT is much more popular outside the US; the other terms are insensitive to geography):

[Figure: Google Trends, the United States]

Selected services (the two plots have different vertical scales and only trends are comparable; for more, check the links):

[Figure: Google Trends]

So if education is changing, it's changing outside traditional institutions.

Can Learning Change Your Mind?

Adam Ozimek asks, “Can Economics Change Your Mind?”

In this skeptical view, economists and those who read economics are locked into ideologically motivated beliefs—liberals versus conservatives, for example—and just pick whatever empirical evidence supports those pre-conceived positions. I say this is wrong and solid empirical evidence, even of the complicated econometric sort, changes plenty of minds.

Just to make myself clear: only a human can change his own mind; economics can't. And since the question is basically about learning, not economics, I reformulate it accordingly: Can Learning Change Your Mind?

The rest turns out to be simple. If I want to change my mind big time, I take an issue I know nothing about and read some research. There will be surprises.

But if I happen to discover big surprises in the area of my competence, I become suspicious. Evidence doesn't drop down like Newtonian apples. It flows like a river. Then learning is a flow, too: a continuous process that brings no surprises if you learn constantly.

Where does continuity come from? First, from discounting new studies. New studies have standard limitations even when they are factually and methodologically correct. The most frequent limitations concern long-term relationships, external validity, and general equilibrium effects. Second, from the nature of the economy itself. Research in economics often speaks in yes-no terms, while economic processes are continuous. For marketing purposes, researchers formulate questions and answers like “Does X cause Y?”, a yes-no question tested with regressions. But causation is not about p-values in handpicked models. Causation is also the degree of impact, and this degree jumps wildly even across different specifications of a single model. That means I need a lot of similar studies to change my mind about X and Y.
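A synthetic illustration of that instability (all names and numbers made up): the estimated effect of X on Y changes materially once a correlated control enters the specification.

```python
# Synthetic illustration: the coefficient on X jumps across
# specifications when X correlates with an omitted control Z.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
z = rng.normal(size=500)                      # confounder
x = z + rng.normal(scale=0.5, size=500)       # X correlates with Z
y = 0.2 * x + 1.0 * z + rng.normal(size=500)  # true effect of X is 0.2

for name, regressors in [("Y ~ X", (x,)), ("Y ~ X + Z", (x, z))]:
    X = sm.add_constant(np.column_stack(regressors))
    beta_x = sm.OLS(y, X).fit().params[1]     # coefficient on X
    print(f"{name}: estimated effect of X = {beta_x:.2f}")
```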

Removing one letter from Bertrand Russell, “One of the symptoms of approaching nervous breakdown is the belief that one work is terribly important.”

Going back to Adam’s initial (yes-no) question, I’d say yes, some economists “are locked into ideologically motivated beliefs,” and yes, some economists produce knowledge that other people can learn from. These two groups overlap, but it’s no obstacle to good learning.

PS: In his post, Adam Ozimek also asked to submit studies that changed one’s mind. Since I see mind-changing potential as a function of novelty, I’d recommend a simple source of mind-changing studies: visit RePEc’s top cited studies list and read carefully the papers you haven’t read yet. There will be surprises.

Software for Researchers: New Data and Applications

The tools mentioned here help manage reproducible research and handle new types of data. Why should you go after new data? New data provides new insights. For example, the recent Clark Medal winners used unconventional data in their major works. This data came large and unstructured, so Excel, Word, and email wouldn’t do the job.

I write for economists, but other social scientists can also find these recommendations useful. These tools have a steep learning curve and pay off over time. Some improve small-data analysis as well, but most gains come from new sources and real-time analysis.

Each section ends with a recommended reading list.

Standard Tools

LaTeX and Dropbox streamline collaboration. The recommended LaTeX editor is LyX. Zotero and its browser plugin manage references. LyX supports Zotero via another plugin.

Stata and Matlab do numerical computations. Both are paid products with good support and documentation. Free alternatives: IPython and RStudio for Stata; Octave for Matlab.

Mathematica does symbolic computations. Sage is a free alternative.

  1. Frain, “Applied LaTeX for Economists, Social Scientists and Others.” Or a shorter intro to LaTeX by another author.
  2. UCLA, Stata Tutorial. This tutorial fits the economist’s goals. To make it shorter, study Stata’s very basic functionality and then google specific questions.
  3. Varian, “Mathematica for Economists.” Written 20 years ago; Mathematica has become more powerful since then. See their tutorials.

New Data Sources

The most general source is the Internet itself. Scraping info from websites sometimes requires permission (see the website's terms of use and robots.txt).

Some websites have APIs, which send data in structured formats but limit the number of requests. Site owners may alter the limit by agreement. When the website has no API, Kimono and Import.io extract structured data from webpages. When they can’t, BeautifulSoup and similar parsers can.
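For illustration, a minimal parser sketch with requests and BeautifulSoup; the URL is hypothetical, and the site's terms of use and robots.txt still apply.

```python
# Minimal scraping sketch (hypothetical URL; check terms of use first).
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/reports", timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# Collect every table row as a tuple of cell texts.
rows = [
    tuple(td.get_text(strip=True) for td in tr.find_all("td"))
    for tr in soup.find_all("tr")
]
print(rows[:5])
```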

Other sources include industrial software, custom data collection systems (like surveys in Amazon Turk), and physical media. Text recognition systems require little manual labor, so digitizing analog sources is easy now.

Socrata, data.gov, Quandl, and FRED maintain the most comprehensive collections of public datasets. But the universe is much bigger, and exotic data hides elsewhere.

  1. Varian, “Big Data.”
  2. Glaeser et al., “Big Data and Big Cities.”
  3. Athey and Imbens, “Big Data and Economics, Big Data and Economies.”
  4. National Academy of Sciences, Drawing Causal Inference from Big Data [videos]
  5. StackExchange, Open Data. A website for data requests.

One Programming Language

A general-purpose programming language can manage data that comes in peculiar formats or requires cleaning.

Use Python by default. Its packages also replicate core functionality of Stata, Matlab, and Mathematica. Other packages handle GIS, NLP, visual, and audio data.
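As a small example of that overlap, an OLS regression with statsmodels on synthetic data, roughly what Stata's regress command does:

```python
# OLS with statsmodels on synthetic data, analogous to Stata's `regress`.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=200)

X = sm.add_constant(x)            # add the intercept explicitly
print(sm.OLS(y, X).fit().summary())
```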

Python comes as a standalone installation or in special distributions like Anaconda. For easier troubleshooting, I recommend the standalone installation. Use pip for package management.

Python is slow compared to other popular languages, but certain tweaks make it fast enough to avoid learning others, like Julia or Java. Generally, execution time is not an issue: computation gets cheaper every year (Moore's Law at work), while the coder's time gets more expensive.
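One such tweak is vectorization: pushing loops into NumPy. A quick self-contained comparison:

```python
# Vectorization: the same sum of squares, looped vs. vectorized.
import time
import numpy as np

x = np.random.rand(5_000_000)

t0 = time.perf_counter()
total = sum(v * v for v in x)          # pure-Python loop
t1 = time.perf_counter()
total_np = float(np.dot(x, x))         # vectorized NumPy equivalent
t2 = time.perf_counter()

print(f"python loop: {t1 - t0:.2f}s, numpy: {t2 - t1:.4f}s")
print(f"results match: {np.isclose(total, total_np)}")
```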

Command line interfaces make massive operations on files easier. For Macs and other *nix systems, learn bash. For Windows, see cmd.exe.

  1. Kevin Sheppard, “Introduction to Python for Econometrics, Statistics and Data Analysis.”
  2. McKinney, Python for Data Analysis. [free demo code from the book]
  3. Sargent and Stachurski, “Quantitative Economics with Python.” The major project using Python and Julia in economics. Check their lectures, use cases, and open source library.
  4. Gentzkow and Shapiro, “What Drives Media Slant?” Natural language processing in media economics.
  5. Dell, “GIS Analysis for Applied Economists.” Use of Python for GIS data. Outdated in technical details, but demonstrates the approach.
  6. Dell, “Trafficking Networks and the Mexican Drug War.” Also see other works in economic geography by Dell.
  7. Repository awesome-python. Best practices.

Version Control and Repository

Version control tracks changes in files. It includes:

  • showing changes made in text files: for taking control over multiple revisions
  • reverting and accepting changes: for reviewing contributions by coauthors
  • support for multiple branches: for tracking versions for different seminars and data sources
  • synchronizing changes across computers: for collaboration and remote processing
  • forking: for other researchers to replicate and extend your work

Version control by Git is a de-facto standard. GitHub.com is the largest service that maintains Git repositories. It offers free storage for open projects and paid storage for private repositories.

Sharing

Storage

A GitHub repository is a one-click solution for both code and data. No problems with university servers, relocated personal pages, or sending large files via email.

When your project goes north of 1 GB, you can use GitHub’s Large File Storage or alternatives: AWS, Google Cloud, mega.nz, or torrents.

Demonstration

Jupyter notebooks combine text, code, and output on the same page. See examples:

  1. QuantEcon’s notebooks.
  2. Repository of data-science-ipython-notebooks. Machine learning applications.

Beamer for LaTeX is a standard solution for slides. TikZ for LaTeX draws diagrams and graphics.

Remote Server

Remote servers store large datasets in memory. They do numerical optimization and Monte Carlo simulations. GPU-based servers train artificial neural networks much faster and require less coding. These things save time.

If campus servers have peculiar limitations, third-party companies offer scalable solutions (AWS and Google Cloud). Users pay for storage and processor power, so exploratory analysis goes quickly.

A typical workflow with version control:

  1. Creating a Git repository
  2. Taking a small sample of the data (see the sketch after this list)
  3. Coding and debugging research on a local computer
  4. Executing an instance on a remote server
  5. Syncing the code between two locations via Git
  6. Running the code on the full sample on the server
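Step 2 might look like this in pandas, assuming a hypothetical full.csv:

```python
# Step 2 sketch: carve out a reproducible 1% sample for local debugging
# (full.csv is a hypothetical dataset).
import pandas as pd

full = pd.read_csv("full.csv")
full.sample(frac=0.01, random_state=42).to_csv("sample.csv", index=False)
```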

Some services allow writing code in a browser and running it right on their servers.

  1. EC2 AMI for scientific computing in Python and R. Read the last paragraph first.
  2. Amazon, Scientific Computing Using Spot Instances
  3. Google, Datalab

Real-time Applications

Real-time analysis requires optimization for performance. Some examples from industry:

  1. Jordan, On Computational Thinking, Inferential Thinking and Big Data. A general talk about getting better results faster.
  2. Google, Economics and Electronic Commerce research
  3. Microsoft, Economics and Computation research

The Map

A map for learning new data technologies by Swami Chandrasekaran:

[Figure: data science skill map]

 

Machine Learning for Economists: An Introduction

A crash course for economists who would like to learn machine learning.

Why should economists bother at all? Machine learning (ML) generally outperforms econometrics in predictions. That is why ML is becoming more popular in operations, where econometrics' advantage in tractability is less valuable. It's worth knowing both and choosing the approach that suits your goals best.

An Introduction

These articles were written by economists for economists. Other readers may not appreciate the constant references to economic analysis and should start with the next section.

  1. Athey, Susan, and Guido Imbens. “NBER Lectures on Machine Learning,” 2015. A shortcut from econometrics to machine learning. Key principles and algorithms. Comparative performance of ML.
  2. Varian, “Big Data: New Tricks for Econometrics.” Some ML algorithms and new sources of data.
  3. Einav and Levin, “The Data Revolution and Economic Analysis.” Mostly about new data.

Applications

Practical applications get little publicity, especially the successful ones. But these materials do give an impression of what the field is about.

Government

  1. Bloomberg and Flowers, “NYC Analytics.” The NYC Mayor's Office of Data Analytics describes its data management system and improvements in operations.
  2. UK Government, Tax Agent Segmentation.
  3. Data.gov, Applications. Some are ML-based.
  4. StackExchange, Applications.

Governments use ML sparingly. Developers emphasize open data more than algorithms.

Business

  1. Kaggle, Data Science Use cases. An outline of business applications. Few companies have the data to implement these things.
  2. Kaggle, Competitions. (Make sure you choose “All Competitions” and then “Completed.”) Each competition has a leaderboard. When users publish their solutions on GitHub, you can find links to those solutions on the leaderboard.

Industrial solutions are more powerful and complex than these examples, but they are not publicly available. Data-driven companies post some details about this work in their blogs.

Emerging applications

Various prediction and classification problems. For ML research, see the last section.

  1. Stanford’s CS229 Course, Student projects. See “Recent years’ projects.” Hundreds of short papers.
  2. CMU ML Department, Student projects. More advanced problems, compared to CS229.

Algorithms

A tree of ML algorithms:

[Figure: a tree of machine learning algorithms]

Econometricians may check the math behind the algorithms and find it familiar. Mathematical background:

  1. Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning. Standard reference. More formal approach. [free copy]
  2. James et al., An Introduction to Statistical Learning. Another standard reference by the same authors. More practical approach with coding. [free copy]
  3. Kaggle, Metrics. ML problems are all about minimizing prediction errors. These are the various definitions of errors (two are computed in the sketch after this list).
  4. (optional) Mitchell, Machine Learning. Close to Hastie, Tibshirani, and Friedman.
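To make the metrics in item 3 concrete, two common ones computed by hand on made-up predictions:

```python
# Two common evaluation metrics, computed directly on made-up data.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0], dtype=float)   # made-up labels
p_pred = np.array([0.9, 0.2, 0.6, 0.4, 0.1])      # made-up probabilities

rmse = np.sqrt(np.mean((y_true - p_pred) ** 2))
log_loss = -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))
print(f"RMSE: {rmse:.3f}, LogLoss: {log_loss:.3f}")
```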

For what makes ML different from econometrics, see chapters “Model Assessment and Selection” and “Model Inference and Averaging” in The Elements.

Handy cheat sheets by KDnuggets, Microsoft, and Emanuel Ferm. Also this guideline:

[Figure: algorithm selection guideline]

Software and Hardware

Stata does not support many ML algorithms. Its counterpart in the ML community is R. R is a language, so you’ll need more tools to make it work:

  1. RStudio. A standard coding environment. Similar to Stata.
  2. CRAN packages for ML.
  3. James et al., An Introduction to Statistical Learning. This text introduces readers to R. Again, it is available for free.

Python is the closest alternative to R. Packages “scikit-learn” and “statsmodels” do ML in Python.
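A minimal scikit-learn sketch on synthetic data, comparing a linear model with a random forest out of sample:

```python
# Linear regression vs. random forest on a synthetic nonlinear problem.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)   # out-of-sample R^2
    print(f"{type(model).__name__}: {r2:.3f}")
```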

If your datasets and computations get heavier, you can run code on virtual servers by Google and Amazon. They have ML-ready instances that execute code faster. It takes a few minutes to set up one.

Summary

I limited this survey to economic applications. Other applications of ML include computer vision, speech recognition, and artificial intelligence.

The advantage of ML approaches (like neural networks and random forest) over econometrics (linear and logistic regressions) is substantial in these non-economic applications.

Economic systems often have linear properties, so ML is less impressive here. Nonetheless, it does predict things better, and more practical solutions get done the ML way.

Research in Machine Learning

  1. arXiv, Machine Learning. Drafts of important papers appear here first; they get published in journals later.
  2. CS journals. Applied ML research also appears in engineering journals.
  3. CS departments. For example: CMU ML Department, PhD dissertations.

Consistency in Data Science

I took Kaggle competitions to measure internal validity in data science. Validity is an issue because it's easy to get good predictions with a long feature list. Researchers know about this problem; managers don't. But managers have the data, and researchers don't. Fortunately, managers sometimes release it into the wild, as they do for Kaggle competitions. So let's see whether predictions based on this data remain consistent.

Kaggle has a handy rule for detecting overfitting:

Kaggle competitions are decided by your model’s performance on a test data set. Kaggle has the answers for this data set, but withholds them to compare with your predictions. Your Public score is what you receive back upon each submission (that score is calculated using a statistical evaluation metric, which is always described on the Evaluation page). BUT: Your Public Score is being determined from only a fraction of the test data set — usually between 25-33%. This is the Public Leaderboard, and it shows some relative performance during the competition.

When the competition ends, we take your selected submissions (see below) and score your predictions against the REMAINING FRACTION of the test set, or the private portion. You never receive ongoing feedback about your score on this portion, so it is the Private leaderboard. Final competition results are based on the Private leaderboard, and the Winner is the person(s) at the top of the Private Leaderboard.

Teams can't win by submitting a lucky model that did well on the public set, say, one of a million models with different parameters chosen for best fit. Instead, consistent solutions must perform well on both the public and private sets. This is the validity that makes a model useful.

I scraped both public and private leaderboards from 165 competitions. Correlation between the public and private scores for popular competitions:

[Figure: public vs. private scores, popular competitions]

Perfectly consistent solutions would have similar scores on both the public (horizontal axis) and private (vertical axis) leaderboards. We would see a straight line. It's not very straight in some competitions. Points moving away from the diagonal indicate solutions that don't digest the new data well: their predictive power is declining.
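The consistency measure itself is simple. A sketch, assuming a hypothetical leaderboards.csv with one row per team and columns competition, public_score, and private_score:

```python
# Rank correlation between public and private scores, per competition.
import pandas as pd

lb = pd.read_csv("leaderboards.csv")  # hypothetical scraped data
consistency = (
    lb.groupby("competition")
      .apply(lambda g: g["public_score"].corr(g["private_score"],
                                              method="spearman"))
      .sort_values()
)
print(consistency.head(10))  # the least consistent competitions
```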

Correlation for places:

[Figure: public vs. private leaderboard places, popular competitions]

This plot illustrates individual skill. When a data scientist gets a high score by luck, he won't retain his position on the private leaderboard. Otherwise, he retains the position, and if others do as well, together they form a straight line. Again, in some cases we don't see a straight line.

What are those “some cases”? One is “restaurant revenue prediction”: predicting revenues for restaurants given geography and demographics. That's a typical business problem in the sense that the company has few observations and many determinants. Data analysis can't help here until the company gets data on thousands of other restaurants. McDonald's or Starbucks can get more; smaller chains can't.

The “Analytics Edge” competition, an MIT course assignment on predicting successful blog posts, also suffers from too many factors affecting the outcome.

Sometimes limitations exist by design. Kaggle is running a stock price prediction competition now, but the suggested data can’t do the job. Algorithmic trading relies on handpicked cases with unique data models, and the competition offers just the opposite.

How the same data scientists perform across different competitions:

[Figure: leaderboard places across competitions]

Again, we should find straight lines, but they are not here. Instead, there are dense spots around the bottom-left corners: teams that broke into the top 100 on many occasions. They did well, more or less, without domain knowledge. Where experts could be identified, though, they did very well, as in this competition sponsored by an Internet search company.

Many problems remain unfriendly to quants, so solutions may be valid but not powerful. More information could fix this, but other approaches often take over. For example, insiders remain the best investors in the restaurant business. A person who has run a local restaurant for ten years knows the competitors, prices, costs, margins, and clients. Of course, he is a better investor than a chain owner, even a chain owner with a formal model. Markets work well here, and centralized analysis doesn't.

Research Is as Good as Its Reproducibility

Complex systems happen to have probabilistic, rather than deterministic, properties, and this fact made social sciences look deficient next to the real hard sciences (as if hard sciences predicted weather or earthquakes better than economics predicts financial crises).

What’s the difference? When today’s results differ from yesterday’s results, it’s not because authors get science wrong. In most cases, these authors just study slightly different contexts and may obtain seemingly contradictory results. Still, to benefit from generalization, it’s easier to take “slightly different” as “the same” and treat the result as a random variable.

In this case, “contradictions” get resolved surprisingly simply: by replicating the experiment and collecting more data. In the end, you have a distribution of the impact over studies, not simply of the impact within a single experiment.
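A fixed-effect pooling sketch on made-up numbers shows the mechanics: weight each study's estimate by its precision.

```python
# Fixed-effect meta-analysis on made-up effect estimates.
import numpy as np

effects = np.array([0.30, -0.05, 0.12, 0.45, 0.08])  # made-up study estimates
ses = np.array([0.10, 0.08, 0.15, 0.20, 0.05])       # their standard errors

w = 1.0 / ses**2                                     # inverse-variance weights
pooled = np.sum(w * effects) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))
print(f"pooled effect: {pooled:.3f} +/- {1.96 * pooled_se:.3f}")
```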

Schoenfeld and Ioannidis show the dispersion of results in cancer research (“Is everything we eat associated with cancer?”, 2012):

[Figure: estimated effects of food ingredients on cancer risk, by study]

Each point indicates a single study that estimates how much a given ingredient may contribute to getting cancer. The bad news: onion is more useful than bacon. The good news: we can say that a single estimate is never enough. A single study is not systematic, even after a peer review.

The recent attempt to reproduce 100 major studies in psychology confirms the divergence: “A large portion of replications produced weaker evidence for the original findings.” In this case, they also found a bias in reporting.

Economics also has reported effects that vary across papers. From Eva Vivalt (2014):

[Figure: effects of conditional cash transfers on various outcomes]

This chart reports how conditional cash transfers affect different outcomes, measured in standard deviations. Cash transfers exemplify the rule: The impact is often absent, otherwise it varies (sometimes for the worse). For more, check this:

  • AidGrade: Programs by outcomes. A curated collection of popular public programs with their impact compared across programs and outcomes.
  • Social Science Registry. Registering a randomized trial in advance reduces the positive effect bias in publications and saves data-mining efforts by economists when nothing interesting comes out of the economist’s Plan A.

The dispersion of the impact is not a unique feature of randomized trials. Different estimates from similar papers appear elsewhere in economics. It’s most evident in literature surveys, especially those with nice summary tables: Xu, “The Role Of Law In Economic Growth”; Olken and Pande, “Corruption in Developing Countries”; DellaVigna and Gentzkow, “Persuasion.”

The problem, of course, is that the evidence is as good as its reproducibility. And reproducibility requires data on demand. But how many authors can claim that their results can be replicated? A useful classification from Levitt and List (2009):

[Figure: Levitt and List's classification of data sources]

Naturally occurring data occurs naturally, so we cannot replicate it at will. A lot of highly cited papers rely on naturally occurring data from the right-hand-side methods. That's, in fact, the secret. When an author finds a nice natural experiment that escapes accusations of endogeneity, his paper becomes an authority on the subject. (Either because natural experiments happen rarely and competing papers don't appear, or because the identification looks so elegant that readers fall in love with the paper.) But this experiment is only one point on the scale. It doesn't become reliable just because we don't know where the other points would fall.

The work based on controlled data gets less attention, but it gives a systematic account of causal relationships. Moreover, these papers cover treatments of a practical sort: well-defined actions that NGOs and governments can implement. This seamless connection matters, since tying “naturally occurring” evidence to policies adds another layer of distrust between policy makers and researchers. For example, try to connect this list of references in labor economics to government policies.

Though to many researchers “practical” is an obscene word (and I don’t emphasize this quality), reproducible results are a scientific issue. What do reproducible results need? More cooperation, simpler inquiries, and less reliance on chance. More on this is coming.

Working Hours and Productivity in the United States

When East Asian countries grew at record rates, some articles attributed this to factor accumulation (e.g., Krugman 1994). Indeed, Japan and South Korea reinvested a lot of their output and also benefited from growing working-age populations. The data showed that factor accumulation actually went along with productivity growth, so these economies did have “genuine” improvements in the end.

Now, twenty years later, the same can be said about the United States. But this time, labor input, not capital, drives economic growth. In 1950, the countries that would later be called the G7 looked like this (all data from PWT8 and the OECD):

[Figure: capital per worker vs. hours per worker, G7, 1950]

US workers had relatively short working hours and much more equipment than their colleagues in other countries. In 2010, the picture looks different:

[Figure: capital per worker vs. hours per worker, G7, 2010]

Hours declined rapidly in all countries but the United States. To feel the difference:

[Figure: working hours, Germany vs. the United States]

With the typical disclaimer about comparing hours across economies, I'd rather emphasize the dynamics of change than compare countries directly. The growth paths for the regional leaders:

[Figure: growth paths of the regional leaders, 1950–2011]

These lines smooth annual observations over 1950–2011. I also added GDP per worker under the markers.

Overall, while German firms have cut hours by 40% since 1950, US firms have cut them by only 10%. Working hours stopped declining in the US around 1980 (perhaps to offset stagnating real incomes). Whichever counterfactual you prefer, the US trend before 1980 or Germany's, it implies a substantial difference in output, fueled by labor input, just as capital input fueled East Asian economies decades ago.

Software as an Institution

The rules of the game, known to economists as institutions and to managers as corporate culture, usually entail inoperable ideas. That is, any country or business has some rules, but these rules coincide neither with the optimal rules nor with the leadership's vision. Maybe with the exception of the top decile of performers.

This inoperability isn't surprising, since the rules have obscure formulations. Douglass North and his devotees did best at narrowing down what “good institutions” are, but alongside North's bird's-eye view, you also need an ant's-eye view of how changes happen.

An insider perspective has been there all along, of course. Organizational psychology and operations management have catalogued many of the informal practices inside firms. In general, we do know something about what managers should and shouldn't do. Still, many findings aren't as robust as we'd like them to be. There's also a communication problem between researchers and practitioners, meaning neither of the two cares what the other is doing.

These three problems (formulation, coverage, and communication of effective rules) have an unexpected solution in software. How come? Software defines the rules.

Perhaps Excel doesn't create such an impression, but social networks illustrate the case best. After the 90s, software engineers and designers became more involved in the social aspects of their products. Twitter made public communications shorter and arguably more efficient. In contrast to the anonymous communities of the early 2000s, Facebook insisted on real identities and a secure environment. Instagram and Pinterest focused users on sharing images. All major social networks introduced upvotes and shares for content ranking.

Governance in online communities can explain the success of Stack Exchange and Quora in the Q&A space, where Google and Amazon failed. Like Wikipedia, these services combined successful incentive mechanisms with community-led monitoring. This monitoring helped deal with the low-quality content that would have dominated if the services had simply grown their user bases, as previous contenders tried.

Wikipedia has 120,000 active editors, about twice as many people as Google employs (or, alternatively, twelve Facebooks). The user bases under the jurisdiction of the major social networks are larger still.

So software defines rules that several billion people follow daily. But unlike soft institutions, the rules engraved in code are very precise, much more so than institutional ratings for countries or corporate-culture leaflets for employees. Code-based rules also imply enforcement (“fill in all fields marked with ‘*’”). That's one less big issue.

Software also captures data on the impact of rules on performance. For example, Khan Academy extensively uses performance tracking to design the exercises that students are more likely to complete, something that schools, with all their experienced teachers, achieve mostly through compulsion.

Finally, communication between researchers and practitioners becomes less relevant because critical decisions get made at the R&D stage. Researchers don't have to annoy managers in the trenches because the software already contains the best practices. Like at Amazon.com, which employed algorithms to grant employees access privileges based on past performance.

These advantages make effective reproducible institutions available to communities and businesses. That is, no more obscure books, reports, and blog posts about best practices and good institutions. Just a product that does specific things, backed by robust research.

What would that be? SaaI: software as an institution?

Russia Growth Diagnostics: Conclusions

After a week of writing about the Russian economy, I put together the pieces of its growth diagnostics. The conclusion follows.

Contents

  1. Getting Started
  2. Introduction to Russia
  3. Finance
  4. Infrastructure and Human Capital
  5. Uncertainty
  6. Taxes and Laws
  7. Market Structure and Competition

The Hausmann–Rodrik–Velasco framework (HRV) that I use throughout the series is explained in the first post. The replication files are available on GitHub.

Conclusion

I picked industrial policy as Russia's top issue. Economists rightfully reach for their revolvers when they hear this term. Local barons use “industrial policy” to justify protection for their industries. It usually ends badly, even when backed by good historical examples. The United States, Germany, and South Korea all protected their “infant industries” at some point. But the isolated impact of protectionism is unclear in these cases; it may have any sign. Besides, protection for 19th-century manufacturing sectors says little about what a country should promote now.

So what should a country promote with industrial policies? In Russia: less protection for losers and more benefits for winners. This rule is hard to follow when the government appoints the losers and winners and the market simply supports those who get government protection. It must be the other way around: the government should support companies that already do well in competitive, often international, markets. Usually this means not handpicking specific firms but setting simple rules for entire sectors. A good industrial policy for Russia would imply less discretion in industrial policy making.

Since this series concerns only diagnostics, I’m not commenting on actions that can make sense in this case. However, I hope this series shows well that for upper-middle income countries, like Russia, the HRV framework must go much deeper because constraints become less obvious as economic complexity increases.

Appendix: What deserves attention

Industrial policy is important, but overall I took four issues from the previous posts in the series. The table below ranks each of them on a scale from 1 to 5 according to:

Impact

  • How large is the possible impact of changes in this constraint on economic growth?
  • How far is the current situation from the optimal one?

Confidence

  • How much evidence is available (or potentially available) for managing changes in this direction?
  • Confidence in normative recommendations that economics can offer here
  • Clarity of recommendations and their quantitative substance
  • The “distortion potential”: the scope for stakeholders to cherry-pick or bend recommendations in their favor

Feasibility

  • Are there stakeholders who lose from changes in this direction?
  • Do these stakeholders have policy making power?
  • Are there stakeholders who win and can support changes in policies?

This is my arbitrary scale, and you may have your own. Now back to the list. The four constraints belong to “Taxes and Laws” (formal taxes, formal regulations) and “Market Structure and Competition” (industrial organization, industrial policy):

[Table: the four constraints ranked by impact, confidence, and feasibility]

(More here means more potential for growth. The scale is for ranking only and doesn’t reflect the size of the gap between constraints.)

Formal taxes. As mentioned in the post, Russia combines its own idiosyncratic taxes with the same taxes that disincentivize accumulation of human capital in developed countries. Changing this situation could be technically easy because taxes are specific and quantitatively transparent mechanisms. However, taxes are vulnerable to politics (partly because of their transparency) and changes here are likely to face much opposition.

Formal regulations. Regulations constrain growth through two channels. First, they impose direct and often unjustifiable costs that always end up in prices, so the citizen pays for them. Second, regulations grant power to corrupt officials. Corruption discredits both good and bad regulations, with extra damage to law enforcement and state capacity. That's quite a conventional simplification, and the key nontrivial question is how large the distortions are (see rough estimates in this post). As for confidence, the burden of proof for regulations should rest on regulators and lawmakers. This is an idealistic picture for any country, but the point here is that many of the proofs offered don't fit well with what economics already knows.

Industrial organization. I put de-facto competition above de-jure regulations because competitive firms actually find ways to optimize regulatory costs, minimize corruption, and somehow do this without breaking the law. I would exemplify this with multinationals, which are more productive than domestic firms even in bad environments. For the entire range of issues that arise here, see the previous post.

Industrial policy. Like regulations, the policy of keeping weak firms afloat is paid for by citizens, including the workers who suffer poor management and lower wages. These low-performing firms can't live without subsidies of some sort, but subsidies are a bad form of job security. The key policy problem is how to relocate capital and labor to the best firms without unemployment and loss of physical capital. That is technically difficult, and economic knowledge here is limited. Why did I mark it as the most feasible, then? Certain cases could have many winners, including powerful decision makers, and that matters more than the administrative complexities.

This doesn't say exactly how much the Russian economy could gain from solving problems in these areas; the impact depends on the changes in question. But the exercise in growth diagnostics organizes many issues within a single framework, which should ideally direct attention to more specific questions.