Big Data or Pig Data?

(A fable on huge amounts of data and why we don’t need models)

There was a pig who wanted to be a scientist. He was not interested in models. When asked how he planned on making sense of the world, the pig would say in a deep mysterious voice, “I don’t do models: the world is my model” and then, with a twinkle in his eyes, look smugly at his interlocutor.

By his phrase, “I don’t do models, the world is my model”, he meant that the world’s data was enough for him, the pig scientist. The more data he had, the pig declared, the more accurately he would be able to predict what might happen in the world.

Around that time, some dogs opened a pub called “Doogle”, which was visited by all the animals in the jungle. The wine was delicious and the traffic at the pub was unprecedented. The dogs became rich and famous; they also obtained a lot of data from the visiting animals. They bought even more pubs and collected even more data about their customers.

Now, they wanted to analyze this data to attract even more customers towards Doogle. The pig saw this as a big opportunity and gathered other like-minded pigs. The drove of pigs helped Doogle in applying pigstatistical methods (ham-correlation formulation, etc.) to predict various things, including: kinds of animals attracted to the kinds of beverages; drinking patterns of different animals; the kinds of tables liked by classes of animals; arrival times; number of glasses Doogle would need in the near future, etc., etc., etc. To an astonishing degree, the pigs made quite accurate predictions using their pigstatistics.

Pigs looking at pig data, and applying pigstatistics

The services of our pigs were acquired by other entities including FaceSlap, Barker, and Snorter, among others. Our heroic pigs helped their clients in outshining the competition. In fact, the pigs’ method of collecting huge amounts of data and then applying pigstatistics on it came to be known as “Pig Data” in their honor.

In the meantime, somewhere in the jungle, the group of owl scientists who had throughout history been making models and theories and performing experiments based on them were now being told that it was all meaningless; that their approach was worthless. The owls didn’t pay any attention, even though everyone else was euphoric. However, if the truth be told, some owls did lose heart and became so demoralized that they gradually transformed into pigs! And they immersed themselves deep in the world of pig-data.

From time to time, Doogle, FaceSlap and others would make some modifications, such as changing the color of the wine-glass and seeing how quickly people reached for the glass based on the color. Upon analysis of the customers’ reactions, the pigs could then determine which color resulted in the fastest response-time. So this was the era of pig-data. The pigs had won the battle. Pig data was everywhere.

But the fact is that our hero-pig, whom we met at the beginning of our fable, was still not happy. He felt that things were only getting started. He wanted to replace the owls completely. What’s more, he wanted to predict EVERYTHING. He wanted psychohistory, as the ‘good doctor’ of old had dreamt. Yes sir, predicting everything was his goal!

He decided to start his quest by studying falling bodies. As was his norm, he collected data about all instances of all objects falling down all over the place. He now had huge amounts of data, and he applied pigstatistics on it. He discovered that more things fell in the morning and during day-time, when animals were awake, and fewer things fell during the night when animals were sleeping!

He shared his findings in front of the whole jungle, looking directly at the owls, who were also present. The chief owl, called Owlileonewtein, countered that while such information could be useful, it did not explain much. Why did bodies fall? At what rate did they fall? What were the relevant factors, etc?

Our Hero ecstatic on discovering the law of falling bodies

On hearing this, the pig positively beamed with joy because he had come prepared. He announced proudly that he had found a correlation between the weight of the body and the speed of falling. His stats told him that while heavy things fell at a great speed, light things such as animal hair, bird feathers, etc fell much more slowly. “So therefore”, he thundered, “I have discovered the law of falling bodies. Heavy goes fast; light goes slow.” All the animals clapped in joy. The law of falling bodies had been discovered!

Upon hearing all this, Owlileonewtein, the chief owl, said forcefully, “But this is not correct. If we ignore friction and air resistance, I can tell you that all bodies, regardless of their heaviness, fall at the same rate. Indeed, consider a frictionless plane…”

But as soon as he said this, the pig snorted, “Frictionless plane? My dear animals, has anyone ever heard of such an oxymoron?” All animals laughed.

Owlileonewtein protested: “No, based on my model, we can do suitable experiments to test it…”.

On hearing this, the pig suddenly got very serious and menacing. He lifted his paw and pointed it at Owlileonewtein, “You, sir, are a relic of the past. Your way of doing things is over. Haven’t you heard what my fellow pig scientist, Peter Norpig, head of pig intelligence at Doogle, has said? ‘All models are wrong, and we can learn models from data.’ So enough of your models and enough of your model-based experiments. We need neither! All we need is pig-data!” And with this, the pig in his furious excitement stood up on his hind-legs, and shouted, stretching the word ‘pig’ with the full force of his pig personality: “Piiiiiiiiiiiiiiiiiiiiig!” And the animals responded: “DATA!”

“Piiiiiiiiiiiiiiiiiiiig” — “DATA”! “Heavy goes fast; light goes slow!”

Having demonstrated his power to the owls, as a last act of annihilation, he picked up a stone from the ground and tore away a strand of hair from his tail. Holding one object in each fore-leg, he dropped them at the same time. The stone reached the ground much earlier than the strand. With this, he dusted one fore-leg against the other, and then turned around to show his backside to the owls. He shouted triumphantly, one last time:

“Heavy goes fast; light goes slow!”, “Heavy goes fast; light goes slow!”

“Piiiiiiiiiiiiiiiiiiiig” — “DATA!”

(Endnote: There is nothing wrong with huge amounts of data, i.e. big data. But we need to think about the direction that we are taking. What are our goals? And what are the potential benefits and potential limitations of such an approach? Does analyzing all kinds of data make any sense at all? Where does it make sense? Where doesn’t it make sense? Should we be reflecting on the claims being made about big data?)
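For readers who would like to poke at the owl’s claim themselves, here is a minimal sketch in Python. It is only an illustration under stated assumptions: air resistance is modelled as a simple quadratic drag force, and the masses and drag coefficients for the “stone” and the “hair” are made-up numbers, not measurements. The function name `fall_time` and all parameters are hypothetical, chosen just for this example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def fall_time(mass, drag, height=2.0, dt=1e-4):
    """Approximate time (in seconds) for a body to fall `height` metres from rest.

    Assumed model (for illustration only):
        mass * dv/dt = mass * G - drag * v**2
    integrated with small Euler steps of size `dt`.
    Setting drag=0.0 corresponds to the owl's "ignore air resistance" case.
    """
    v = 0.0  # downward speed
    y = 0.0  # distance fallen so far
    t = 0.0  # elapsed time
    while y < height:
        a = G - (drag / mass) * v * v  # net downward acceleration
        v += a * dt
        y += v * dt
        t += dt
    return t


# Hypothetical bodies: (mass in kg, drag coefficient in kg/m).
STONE = (0.1, 1e-4)
HAIR = (1e-6, 1e-6)

if __name__ == "__main__":
    for name, (mass, drag) in (("stone", STONE), ("hair", HAIR)):
        with_air = fall_time(mass, drag)
        in_vacuum = fall_time(mass, 0.0)
        print(f"{name}: {with_air:.3f} s in air, {in_vacuum:.3f} s without air")
    print(f"textbook value without air: {math.sqrt(2 * 2.0 / G):.3f} s")
```

In this toy model the pig’s “heavy goes fast; light goes slow” shows up only when the drag term is switched on. With the drag set to zero, the mass drops out of the equation entirely and both bodies take the same time to fall, which is what the model predicts and what an experiment could then test.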

Update @ 6 April, 2013: I made the following animation that makes a similar point in a different way. 

20 responses to “Big Data or Pig Data?”

  1. Well, if the pig had truly done statistics on the weight, he should have noticed that a grain of sand falls much faster than a feather of the same weight. He therefore could not present his law of falling bodies, since he would have data showing that the same weight sometimes gives different speeds… Unless the pig is fabricating results too 😉

    1. Hehe. Indeed, the pig lies 🙂 I think he also didn’t see steam rising up 😉

  2. Old Major would approve of your post – but he is long gone. Below is something I was sent recently about “big data”:

    Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ

    1. Indeed, our friend Orwell again. I esp. took inspiration from the cadence of “Four legs good, two legs bad”

      1. “Four legs good, two legs bad (p<0.05). However, the authors recognise the limitations of making causal inferences from observational data.”

    2. Briggs’s criticism of the notion that probability is just a number suggests to me a prequel for your tale.

      Hero pig co-authors a book on Pigtastic Intelligence which has in it some useful tips and is very accessible. It becomes a staple of pig-schools, so that generations of pig-practitioners come to respect and rely on it. It advocates a simplistic notion of pignistic rationality: decision making by numbers. On second thoughts, this all seems a bit implausible: surely the wise Owls would have seen the danger and stepped in before the misunderstanding of probability led to broader misunderstandings?

      1. 🙂 Thanks for this interesting prequel. As for the owls, they could not stop it. But they are trying to fight back. Here’s what Sydney Brenner (the Nobel Prize-winning biologist) says: “Actually, the orgy of fact extraction in which everybody is currently engaged has, like most consumer economies, accumulated a vast debt. This is a debt of theory, and some of us are soon going to have an exciting time paying it back – with interest, I hope.”

  3. Slightly related paper (and topic): Information Physics
    http://www.mdpi.com/2078-2489/3/4/567

    1. Thanks for the link frienda!

  4. There is a clock in Exeter Cathedral based on Ptolemy’s calculations, which take the earth as the centre of the universe. It remains incredibly accurate.

  5. […] Big Data or Pig Data? […]

  6. In your story, to what extent have the Owls been careful to preserve the Owl ‘brand’, and to explain themselves to the other animals?

    1. I think within themselves they are pretty secure about their “brand”. But as for countering the pigs’ approach, that’s something else. Some of the owls come forward and criticize the pigs. The pigs’ approach is successful in some application/engineering domains, and there is, as Sydney said (in the quote above), an orgy going on. So it’s not exactly easy, but the times might well be a-changing!

  7. […] In closing, there’s a piece I’d like you all to read – the article that inspired me to write this; a fun little fable that cautions against too heavy a reliance on the crutch of unstructured data analysis. It’s called “Big Data, or Pig Data?“. […]

  8. This is awesome! As an owl, I feel like I always have to be defensive of the work I do.

  9. I agree with the overall message that we need to be thinking about the application of big data. It is often overlooked how difficult it can be to sell to clients and customers; this article covers it well: http://blog.insight.com/2013/05/sell-the-application-of-big-data-2

  10. […] of theory versus data, this week’s must-read is this very entertaining fable about “Pig Data” which, in the style of La ferme des […]

  11. […] See on scensci.wordpress.com […]

  12. […] If your model’s not explicit (and if you don’t care much for doing experiments), then your big data might as well be pig data. While I’m at it, I’ll raise you R. A. Fisher […]

  13. […] in my humorous disdain for nerds who haven’t read literature, I wrote an Orwellian fable titled Big Data or Pig Data, a humorous critique of the statistical approach to science. Not to toot my own horn, while […]
