No, "Big Data" Can’t Predict the Future

With Google’s dominance in the online search engine market we entered the Age of Free. Indeed, services offered online are nowadays expected to be offered at no cost. That does not mean, of course, that there is no cost, only that the consumer doesn’t pay it. Early attempts financed the services with ads, but we soon saw a move toward making the consumer the product. Today, free and unfree services alike compete for “users” and then make money off the data they collect.

Data has always been used, but what’s new in our time is the very low (or even zero) marginal cost of collecting and analyzing huge amounts of data. The concept of “Big Data” is taking over and is predicted to be “the future” of business.

There’s a problem here, and it is the over-reliance on the Law of Large Numbers in social forecasting. Observed averages may mathematically converge to their expected values as observations accumulate, but is this applicable in the real world? The answer is most definitely yes in the natural sciences. Repeated controlled experiments will weed out erroneous explanations or causes of phenomena, at least assuming we’re good enough at separating and controlling those causes.
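To make the Law of Large Numbers concrete, here is a minimal sketch in Python. It is an illustration only, and it assumes a fair six-sided die, which is exactly the kind of constant, repeatable process the law needs. The function name `average_of_die_rolls` and the fixed seed are choices made for this example.

```python
import random


def average_of_die_rolls(n_rolls, seed=0):
    """Return the average of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)  # randint includes both endpoints
    return total / n_rolls


if __name__ == "__main__":
    # The expected value of a fair die is 3.5; larger samples tend to land closer to it.
    for n in (10, 1_000, 100_000):
        print(n, average_of_die_rolls(n))
```

The average settles near 3.5 only because the die does not change from one roll to the next. The question for social forecasting is whether anything comparable stays fixed between one observation and the next.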

What about the social sciences? In this age of scientism, as Hayek called it, we’re told “Big Data” will completely transform production, logistics, and sales. The reason for this is that vendors can better target customers and even foresee what they might want next. Amazon.com does this on its web site in crude form, making suggestions based on your purchase history and on what others with similar purchase histories have searched for. Sometimes it works, and sometimes it doesn’t.

There is some regularity to our interests and behavior. All of us are, after all, human beings, and we’re formed in certain cultures. So one American with interests x, y, and z may have other interests similar to those of another American who also has an interest in x, y, and z.

Human Behavior Is Unpredictable

But similarity is not the same thing as prediction. Amazon.com’s suggestions or the highly annoying ads following you around web sites are useful methods for sellers because they can somewhat accurately identify what not to offer. Excluding very low-probability interests increases the probability of suggesting something that the person behind the eyeballs focusing on the computer screen may be interested in.

For use as prediction, however, exclusion of almost-zero probability events is far from sufficient. Indeed, prediction requires that we are able to accurately exclude all but one or a couple of highly probable outcomes. And we have to be able to rely on these predictions turning out to be true. Otherwise we’re just playing games and making guesses. Sure, they’re educated guesses (because we’ve excluded the impossible and almost-impossible), but they’re still games and guesses.

Where Big Data Fails

Speaking of guesses, Microsoft’s Bing search engine, which powers the Windows digital assistant Cortana among other things, has produced a prediction engine with the purpose of predicting sports and other results. They rely on very advanced algorithms and huge amounts of collected data.

Amazingly, they did very well initially and predicted the outcomes of the World Cup perfectly. So maybe we can use Big Data to get a glimpse of the future?

No, not so. The Bing teams are learning a lesson that only Austrians and, more specifically, Misesian praxeologists seem to grasp: that there are no constants in human action, and therefore that predictions of social phenomena are impossible. Pattern predictions, as Hayek called them, may not be impossible, but predictions of exact magnitudes are. For instance, we can rely on economic law (such as “demand curves slope downward”) to estimate an outcome such as “the price will be lower than it otherwise would have been,” but we can’t say exactly what that price will be.

When it comes to sports, reality shows, and other competitions between individuals or teams, the story is exactly the same. The team with the better track record doesn’t always win. Why? It has objectively performed better than the other team, perhaps without exception, but this doesn’t say anything about the future. We’re not referring here to philosophical doubt of the kind expressed in “will the sun shine tomorrow?” (maybe something completely changes the sun’s ability to shine during the night).

The Social Sciences Are Different

In the social sciences we’re dealing with complex phenomena. Action and, especially, its outcome are the result of a complex system of social interaction, psychology, and much more. Are the players on both teams as motivated and focused as they were before? Did anything in their personal lives affect their mindsets or psyches? How do the players within their teams and players on other teams react to each other before and during the game? A team with a poor track record can upset a team with an objectively better track record; this happens all the time. Sometimes it happens for the sole reason that the better team underestimates the worse team, or because the underdog feels no pressure to perform and therefore plays less defensively.

Bing’s prediction engine struggles with this, just as we would predict. As Windows Central reported recently, the prediction engine had its “worst week yet” picking only four of fourteen winners in the NFL. Overall, its track record was approximately two-thirds right and one-third wrong (95–53). It’s definitely better than tossing a coin, but pretty far from actually predicting the results.
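The arithmetic behind “better than tossing a coin” can be checked with a short Python sketch. It uses only the 95–53 record quoted above. The helper names are hypothetical, and treating every game as an independent 50/50 toss is a simplifying assumption made for illustration, not a claim about how games actually work.

```python
from math import comb


def hit_rate(wins, losses):
    """Share of picks that turned out right."""
    return wins / (wins + losses)


def chance_coin_does_at_least_as_well(wins, losses):
    """Probability that fair coin tosses get at least `wins` picks right.

    Assumes every pick is an independent 50/50 toss (a simplification).
    """
    n = wins + losses
    # Count the outcomes with `wins` or more correct picks out of 2**n equally likely ones.
    favorable = sum(comb(n, k) for k in range(wins, n + 1))
    return favorable / 2 ** n


if __name__ == "__main__":
    print("hit rate:", round(hit_rate(95, 53), 3))
    print("coin matches or beats it:", chance_coin_does_at_least_as_well(95, 53))
```

Under that assumption a coin would very rarely match a 95–53 record, so the engine does carry some information. The same hit rate also means it is wrong on more than one pick in three, which is the difference between an educated guess and a prediction.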

In other words, if you’re placing bets you may want to use the Bing prediction engine. That is, unless you have the type of tacit, implicit understanding of what’s going on that the engine is missing. Maybe you can beat it, or maybe not. In either case, you cannot count on coming out a victor each and every time.

The reason for this is that the outcome simply cannot be predicted perfectly — or even close to it. Even the players themselves cannot predict who’ll win a game, but they may have inside information about whether their own team seems motivated and focused. It is not a perfect method, however, and it certainly cannot be scientific.

Even with Big Data there’s no predicting of social events — there’s only guessing. Yes, guessing with access to huge amounts of data is easier, at least if the data is reliable and relevant. But a good guess is not the same thing as a prediction; it is still a guess, and it can be wrong. Winning every time requires luck.
