# Correlation does not equal Causation*

Around the time that it became fashionable to bash on this study showing that high IQ is correlated with liberalism, atheism, and monogamy, I started working on a post entitled “Correlation does not equal Causation: The Last Refuge of Scoundrels.” (Yes, I was still working on the title too.)

My intent wasn’t to defend the study’s findings (apparently its faults were many), but rather to highlight the ubiquity of the phrase “Correlation is not Causation” in pieces condemning it. It seems every educated person in America knows the phrase “correlation is not causation” (CinC hereafter), and they can trot it out whenever statistical evidence threatens their favorite worldview.

I was going to rail against the overuse of the CinC attack by hitting on the following:

**Some Statements about Probability**\*\*

So obvious that it often gets lost in these discussions is that correlation actually *is* evidence for causation, i.e.

1. P(Causation | Correlation) > P(Causation | No Correlation)
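To put rough numbers on statement 1, here is a minimal sketch using Bayes' rule. Every number in it is hypothetical and chosen only for illustration, and the `posterior` helper is my own name, not something from the study. The only real assumption is that a correlation is more likely to show up when X causes Y than when it doesn't.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """P(hypothesis | evidence) by Bayes' rule."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

# Hypothetical inputs, for illustration only.
prior = 0.10             # P(Causation) before seeing any data
p_corr_if_causal = 0.90  # P(Correlation | Causation)
p_corr_if_not = 0.30     # P(Correlation | No Causation)

# Update on observing a correlation.
p_given_corr = posterior(prior, p_corr_if_causal, p_corr_if_not)

# Update on observing NO correlation (use the complementary likelihoods).
p_given_no_corr = posterior(prior, 1 - p_corr_if_causal, 1 - p_corr_if_not)

print(f"P(Causation | Correlation)    = {p_given_corr:.3f}")
print(f"P(Causation | No Correlation) = {p_given_no_corr:.3f}")
```

With these made-up inputs, seeing the correlation raises the probability of causation above the prior and not seeing it pushes the probability below the prior. The inequality holds for any inputs where the correlation is likelier under causation than without it.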

But few studies report a simple bivariate correlation anyway. Most use a linear model to estimate the “effect of X on Y, net of covariates,” or the *partial correlation*:

2. P(Causation | Partial Correlation) > P(Causation | Bivariate Correlation)
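One way to see why statement 2 is plausible is a small simulation, sketched here under assumptions of my own: a hypothetical confounder `z` drives both `x` and `y`, and `x` has no effect on `y` at all. The bivariate correlation is fooled by the confounder, while the partial correlation (computed with the standard first-order formula) is not. The helper names `pearson` and `partial_corr` are just illustrative.

```python
import math
import random

def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

def partial_corr(x, y, z):
    """Correlation of x and y, net of a single covariate z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Simulated data (an assumption for illustration): z causes both x and y,
# and x has NO causal effect on y.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)

print(f"bivariate correlation of x and y: {r_xy:.3f}")
print(f"partial correlation, net of z:    {r_xy_given_z:.3f}")
```

In this setup the bivariate correlation should come out near 0.5 even though x does nothing to y, and the partial correlation should sit near zero. A partial correlation that survives the controls has therefore ruled out at least the confounders you measured, which is why it counts as stronger evidence than the bivariate number.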

And of course, there are all sorts of sophisticated models one can use to determine the effect of X on Y even if you can’t observe some covariates. This is called identifying the effect of X on Y.

3. P(Causation | Cleanly identified correlation) > P(Causation | Weakly identified correlation)

Next, you need a plausible theory explaining why X causes Y (and not vice versa, for example), and why the partial correlation isn’t spurious (i.e. why you believe you have controlled for all the relevant covariates).

4. P(Causation | Correlation backed with a plausible theory of causation) > P(Causation | Correlation without a plausible theory of causation)

And finally, replication:

5. P(Causation | Cleanly identified partial correlation found in independent sources of data backed with a plausible theory of causation) > P(Causation | Without all that stuff)

**My Aborted Crusade**

I just wrote out a bunch of uncontroversial inequalities. But even the final and most-probable statement falls short of establishing causality. The unfortunate fact is that *nothing establishes causality.* It is impossible. We do not live in a deterministic world; we live in a probabilistic one. We cannot even attach a probability of 1 to a statement like “Smoking increases your risk of cancer.” All of that statistical evidence could just be a coincidence so unlikely that it will only happen once in the history of the universe!

Still, the probability that smoking increases your risk of cancer is incredibly high. And if you tell me, “We still don’t know whether smoking causes cancer, because not even identified partial correlations equal causation with probability = 1,” I will conclude that you are a nihilist (or a postmodernist).

When educated bloggers use “correlation is not causation” as a bludgeon to clobber any statistical analysis they don’t agree with, I see this as an assault on the simple notion that we humans can use reason to increase our certainty about the world around us. Inappropriate use of the CinC attack corrodes the influence that good social science should have on public discourse. In fact, I’d argue that “Correlation is not Causation” is second only to “There are lies, damned lies, and statistics” among the phrases that most damage the public discourse’s relation to quantitative social science.

I believe this so strongly that I intended to start a one-man crusade against the phrase. But then I read the aptly titled “Jenny McCarthy is objectively pro-dead children” at the United States of Jamerica. Cue Jenny:

> The idea that vaccines are a primary cause of autism is not as crackpot as some might wish. Autism’s 60-fold rise in 30 years matches a tripling of the US vaccine schedule.

Sigh.

So unless I can popularize the phrase “Bivariate correlation usually doesn’t imply causation,” I think I’m going to put my anti-CinC crusade to bed.

---

* This asterisk was for rhetorical effect

** P(X|Y) is read as “The probability of X occurring given that we know Y is true.”