Best Time to Post? It’s Irrelevant

While social media platforms invent various algorithms to show relevant information to users, companies like Buffer are trying to understand how to circumvent these algorithms to promote their clients’ content. This is not necessarily a zero-sum game, as it may seem. Optimizers add more structure to the content, pick relevant audiences, and distribute content to the media where information overload is less extreme.

The simplest problem around is to pick the best time for posting when you already have certain content. I looked into this once for StackExchange, and the optimal timing happened to depend a lot on the subsite in question. StackExchange is a network of Q&A websites built on a common technology but with somewhat segregated users and different rules. The subsites look alike and integrated, and you would normally expect the common features to prevail over everything else. But according to the data, performance patterns such as time-to-answer vary across the subsites. What makes them vary are the soft rules (those not engraved in the common software code) and the people.

Here’s another example: Y Combinator’s Hacker News, which has a solid community and a transparent ranking algorithm. The rules are simple: a user submits a link and a title, and the community upvotes the submission. Good submissions make the front page; bad ones go unread and forgotten. The service receives more than 300,000 submissions annually. The question is the same: given a submission, what’s the best time to post it? I took the expected number of upvotes as the criterion.

Many have studied the Hacker News dataset before. A good example is this one. There’s even a special app for picking the time (I didn’t get what it does exactly). They answered different questions, though.

Here’s my version of events. In this post, however, I’d like to make another point based on this data.

First, just looking at upvotes suggests that weekends are the best days for posting (0 is Monday, 6 is Sunday):


In particular:


However, this approach can’t say much. Time affects not only the users who read links submitted to Hacker News (demand), but also those who submit the links (supply). You have causation suspects right away: maybe users submit better links on weekends because they have more time to pick good ones. Then scheduling your own submission for a weekend would not increase the upvotes it gets.
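A weekday breakdown like the one behind the plots above can be sketched with the standard library. The records here are made up for illustration; the real analysis would run over the full Hacker News dump:

```python
from datetime import datetime
from statistics import median

# Hypothetical submission records: (UTC timestamp, upvotes).
submissions = [
    ("2015-03-02 14:00", 12),  # Monday
    ("2015-03-03 09:30", 7),   # Tuesday
    ("2015-03-07 11:15", 40),  # Saturday
    ("2015-03-08 16:45", 33),  # Sunday
    ("2015-03-08 20:10", 25),  # Sunday
]

by_weekday = {}
for ts, score in submissions:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M").weekday()  # 0 = Monday
    by_weekday.setdefault(day, []).append(score)

# Median upvotes per weekday; medians resist the dataset's heavy tail
# better than means do.
medians = {day: median(scores) for day, scores in sorted(by_weekday.items())}
print(medians)
```

Comparing medians rather than means is a judgment call: a handful of viral submissions would otherwise dominate any per-day average.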

For a bunch of typical reasons (few variables available, unstructured data, and no suitable natural experiments), the impact of time on upvotes is hard to separate from other factors. You have only indirect evidence. For example, less competition on weekends may increase expected upvotes:


It remains unclear how to sum up indirect evidence into conclusions. Statistical models would disappoint. Time-related variables explain less than 1% of the variation, meaning, unsurprisingly, that the other 99% depends on something else. This something includes the page you link to, the readers, and the nuances of Hacker News’ architecture.
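The sub-1% figure comes from the author’s models and can’t be reproduced here, but the share of variation a categorical time variable explains can be computed as the between-group sum of squares over the total sum of squares. A minimal sketch on made-up numbers:

```python
from statistics import mean

# Toy (weekday, upvotes) pairs; the real computation would run over
# hundreds of thousands of submissions.
data = [(0, 10), (0, 14), (5, 11), (5, 15), (6, 9), (6, 13)]

scores = [s for _, s in data]
grand = mean(scores)

groups = {}
for day, s in data:
    groups.setdefault(day, []).append(s)

# Share of variance explained by the grouping (eta squared):
# between-group sum of squares over total sum of squares.
ss_total = sum((s - grand) ** 2 for s in scores)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
explained = ss_between / ss_total
print(round(explained, 3))
```

On the toy numbers the weekday explains about 14% of the variance; on the real dataset, per the text, the analogous figure stays under 1%.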

My point is, even a simple algorithm can be efficient, meaning its outcome is independent of irrelevant factors like time. A complex algorithm may, in fact, perform worse. If content promotion depends on the author’s social capital (followers, friends, subscribers), ranking relies on the author’s past submissions rather than the current one. So Facebook’s or Quora’s algorithms for sorting content are not only harder to pass through; they may also distort important outcomes.

See also: Python notebook with Hacker News data analysis

Unrestricted evil

Online services before Facebook were mostly anonymous. At least, no one required real names and SMS confirmations. You signed up and wrote anything. I mean anything.

There were scary fairy tales about what happens when people write anything without putting their real names on it. Various regulations imposed on the Internet, especially in non-democracies, rely on these tales. But is the opposite (real names and full disclosure) really necessary for a community in good standing?

Let’s check. The anonymous culture is still alive on some resources. Today’s case is not 4chan but Stack Exchange, a major Q&A website. Stack Exchange has an open tool for querying its data. The tool is quite useful for testing various hypotheses about human communities. It would be a service to humanity if other web services offered similar openness, but so far we have few.

Stack Exchange (SE) doesn’t ask for real names and the like. Though real names are common and some employers ask candidates for links to their SE profiles, the service is basically anonymous. We expect dirty things to happen.

One dirty thing is excessive downvoting of questions and answers others post. A kind of vandalism. And here’s the graph:

The graph shows net votes (upvotes – downvotes) for each user with 10+ upvotes or downvotes from a 1% sample (about 31K users). That small tick on the left is users who mostly downvote.
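The filter behind this graph (users with 10+ votes, net votes per user) can be sketched as follows. The user names and vote counts are made up; the real numbers come from the 1% sample:

```python
# Hypothetical per-user vote counts: user -> (upvotes_cast, downvotes_cast).
votes = {
    "alice": (120, 4),
    "bob": (15, 2),
    "carol": (3, 48),   # a rare "mostly downvotes" user
    "dave": (5, 2),     # fewer than 10 votes total: excluded
}

# Keep users with 10+ votes and compute net votes (upvotes - downvotes).
net = {u: up - down for u, (up, down) in votes.items() if up + down >= 10}

# The small tick on the left of the graph corresponds to users like these.
downvoters = sorted(u for u, n in net.items() if n < 0)
print(net, downvoters)
```

The 10-vote cutoff matters: without it, single-vote accounts would swamp the histogram with uninformative +1 and -1 entries.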

A slightly different perspective:

(The axes were log-linearized for easy reading.)

User behavior happens to be extremely balanced. Few users go to upvoting or downvoting extremes. Most of them try to be honest.

So, showing your name is not necessary for good behavior. Online communities can manage themselves without references to the official world, regulations, and witch hunts. What matters is the environment people want to be in; given that, they’ll be able to recreate it online.

Startups across countries

A few plots in addition to yesterday’s post on startups.

Startups and economic development

Sources: dataset and Penn World Table 7.0.

That’s not a bad fit for the relationship between startups and GDP. The number of startups in the dataset seems to be a good indicator of entrepreneurial activity in general.
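A fit like this is essentially a correlation on the log scale. A sketch with hypothetical country-level numbers (the actual figures come from the CrunchBase dataset and the Penn World Table):

```python
from math import log

# Hypothetical (startup count, GDP per capita) pairs for a few countries.
pairs = [(9000, 48000), (700, 31000), (120, 14000), (30, 6000)]

xs = [log(s) for s, _ in pairs]
ys = [log(g) for _, g in pairs]

# Pearson correlation on the log scale, computed by hand.
n = len(pairs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = sum((x - mx) ** 2 for x in xs) ** 0.5
sy = sum((y - my) ** 2 for y in ys) ** 0.5
r = cov / (sx * sy)
print(round(r, 2))
```

Taking logs first is the standard move here: both startup counts and GDP span several orders of magnitude, so a linear fit on raw values would be dominated by the largest countries.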

Startup nation

Here’s an illustration for Dan Senor and Saul Singer’s thesis about the Startup Nation: Israel has relatively more startups than the US. Tel Aviv and Silicon Valley drive the numbers for their countries, so it’s not exactly a nationwide phenomenon. You could call the book Startup City instead, though the result would be no less impressive.

Web data and language barriers

Like other sources based on voluntary reporting, CrunchBase may have data biased in one way or another. For example, it may underrepresent countries in which English is not a major language. And we expect a bias in favor of bigger firms. And here’s the case:

China and Russia indeed either have bigger startups on average or just underreport to CrunchBase. The latter is likely, because these are exactly the two major countries that stand behind a language firewall. They have their own Facebooks, Twitters, and Amazons. So we expect them to be less active on CrunchBase. More than that:

The surprising break after the 90th percentile separates countries into two groups. What are the groups? Look here:

(The US and UK are excluded to make the graph readable. Countries with 100+ startups are included.)

Group 1 is countries with fewer than 0.02 startups per 1,000 inhabitants, and Group 2 is the rest. As a result, Group 2 contains countries where English plays an explicitly large role. So the break indeed looks like a language thing.
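The grouping reduces to a per-capita threshold. The countries and counts below are illustrative, not the dataset’s actual values:

```python
# Hypothetical (country, startups, population) rows.
rows = [
    ("Israel", 1100, 7_800_000),
    ("Canada", 900, 34_000_000),
    ("Germany", 600, 81_000_000),
    ("Brazil", 300, 196_000_000),
]

# Startups per 1,000 inhabitants, split at the 0.02 threshold from the text.
per_capita = {c: s / pop * 1000 for c, s, pop in rows}
group2 = sorted(c for c, v in per_capita.items() if v >= 0.02)
print(per_capita, group2)
```

Normalizing by population before splitting is what makes the two groups comparable at all: raw startup counts would mostly rank countries by size.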

Nevertheless, language per se is not a big factor in development, so it doesn’t bias the data on GDP in a systematic way. (You can also control the very first plot for the percentage of English-speaking population.)