Pitfalls of rating systems

A few years ago, YouTube changed its rating mechanism from a five-star scale to an upvote–downvote system. That makes sense once you look at the typical distribution of ratings:

In most cases, users chose either 1 or 5. That’s not very efficient informationally, but the fact is that users were reluctant to use the entire scale.

This J-shaped distribution creates problems because the mean makes no sense here. When a website reports an average rating of 3.0, it means one of two things: either one person rated the video 1 and another rated it 5, or both rated it 3, which is almost never the case.
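A minimal sketch shows how the same mean can hide two very different audiences. The vote lists below are made up for illustration (they are not YouTube data), and `summarize` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean

# Hypothetical votes: half the raters give 1 star, half give 5.
polarized = [1] * 50 + [5] * 50
# Hypothetical votes: every rater gives 3 stars.
uniform = [3] * 100

def summarize(votes):
    """Return the mean rating and the share of votes at each star level."""
    counts = Counter(votes)
    shares = {star: counts.get(star, 0) / len(votes) for star in range(1, 6)}
    return mean(votes), shares

polarized_mean, polarized_shares = summarize(polarized)
uniform_mean, uniform_shares = summarize(uniform)

# Both means are identical, but the shares per star level are not.
print(polarized_mean, polarized_shares)
print(uniform_mean, uniform_shares)
```

Both lists average to 3, yet in the polarized one nobody actually voted 3. Reporting the shares alongside the mean is one way to expose that.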

In an economy guided by ratings, the difference between these two interpretations is large and unpleasant. Since no one rates things near the mean, a decision based on this mean is uninformed. In the end, you watch something that you’d later rate 1 or 5, not 3. It’s like ordering a steak and being served sushi.

The worst thing about this risk is that it’s implicit. Users look at ratings to reduce the risk of making a wrong choice, but instead they gamble between 1 and 5. Fortunately, the ratings aren’t entirely random. They’re conditioned on things we can observe, like gender, age, and interests. The means may then start working. Just check whether those 1s and 5s came from distinct demographic groups.

Of course, this now takes hundreds of 1s and 5s, because the degrees of freedom shrink with each factor we add to the equation. So how do we get more ratings?

The solution is exactly what YouTube did: replace the five-star scale with a binary choice. Users don’t like investing time in thinking about the proper rating, so a thumbs up or down reduces decision fatigue.

More ratings make it possible to compute means for subgroups of users. These subratings are more relevant to people who search for things by rating. Though YouTube hasn’t introduced customized ratings yet, that’s an option for many web services that rely on user feedback.
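As a rough sketch of what such subratings could look like, the snippet below computes the share of upvotes per group. The group labels, the votes, and the `approval_by_group` helper are all hypothetical and do not reflect any real service.

```python
from collections import defaultdict

# Hypothetical thumbs-up (1) / thumbs-down (0) votes, each tagged with
# an illustrative demographic group. None of this is real data.
votes = [
    ("under_25", 1), ("under_25", 1), ("under_25", 1), ("under_25", 0),
    ("over_25", 0), ("over_25", 0), ("over_25", 0), ("over_25", 1),
]

def approval_by_group(votes):
    """Return the share of upvotes within each group."""
    totals = defaultdict(int)
    ups = defaultdict(int)
    for group, vote in votes:
        totals[group] += 1
        ups[group] += vote
    return {group: ups[group] / totals[group] for group in totals}

# The overall share hides the split that the per-group shares reveal.
overall = sum(vote for _, vote in votes) / len(votes)
by_group = approval_by_group(votes)

print(overall)
print(by_group)
```

In this toy data the overall approval is an uninformative 50%, while the two groups sit at 75% and 25%. With only four votes per group these shares would be very noisy in practice, which is why the number of ratings matters.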

While Uber and Fiverr could improve their rating systems by reducing them to binary choices, a scale is still a good choice for, say, IMDb. When you’ve watched a movie for two hours, you put more care into rating it than you would for a typical three-minute YouTube video. And then multiple peaks emerge for controversial movies:

The mean and median are close to each other in something like a Poisson distribution. The other two peaks sit at the extremes, 1 and 10. So you need more than two grades on the scale.

Conventional hits have the YouTube pattern though:

This again looks like a Poisson distribution with a disproportionate number of 1s.

In the end, a good rating system has to balance the desired number of votes against the size of the scale.

The cost of being in the top, or when Zipf’s law breaks

Many things in the world have a Zipfian distribution. Xavier Gabaix recently attempted to explain how this pattern may emerge for the case of cities. Lada Adamic covered how Zipf’s law works online. The literature is vast because every field has its own examples.

But Zipf’s law is a market-type pattern: the outcome of many agents making decisions independently. When you have a planner, it may break.

Here’s an example:

(Data comes from generous Wikipedians.)

Zipf’s law implies that you should see a nice linear relation across the ranks here. Instead, you see three lines: the top 3 (I), the top 10 (II), and all 30 songs (III). The slopes of the lines differ in an interesting way.

Lines II and III have the same slope, with a discontinuity between ranks 10 and 11. (Actually, the discontinuity is between 9 and 10, but the 10th song is Katy Perry’s 2013 release, while the songs around it are older. Kind of an outlier.)

This discontinuity is the bonus for being in the top 10. When a magazine or website mentions YouTube’s most viewed videos, it’s usually the top 3, the top 10, or something like that. Other songs don’t get much attention, even if they’re equally good.
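To make the shape concrete, here is a minimal sketch on synthetic data, assuming the usual log-log reading of Zipf’s law (views proportional to a power of rank, so log views is linear in log rank). The view counts, the 50% top-10 bonus, and the `loglog_slope` helper are invented for illustration and are not the Wikipedia figures.

```python
import math

def loglog_slope(ranks, values):
    """Least-squares slope of log(value) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

ranks = list(range(1, 31))
# Synthetic views following an exact Zipf pattern: views = C / rank.
zipf_views = [1_000_000 / r for r in ranks]
# Same pattern with a made-up 50% bonus for ranks 1 to 10.
boosted_views = [v * 1.5 if r <= 10 else v for r, v in zip(ranks, zipf_views)]

pure_slope = loglog_slope(ranks, zipf_views)
top10_slope = loglog_slope(ranks[:10], boosted_views[:10])
rest_slope = loglog_slope(ranks[10:], boosted_views[10:])
pooled_slope = loglog_slope(ranks, boosted_views)

print(pure_slope, top10_slope, rest_slope, pooled_slope)
```

Within each segment the slope stays the same as in the pure Zipf case; only the intercept jumps at rank 10. Fitting one line through all 30 points gives a steeper slope, which is one sign that a single Zipf line no longer describes the data.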

But there’s another effect: the bonus for taking the first place. This is what Line I and Line II are about. Psy goes through the roof of Zipf’s law just as Bieber did before him.

So why is this about planning? Well, without the media, there would be no top lists. And top lists are coordinating devices that tell us what to watch, listen to, and do first.

That is planning.