Probability is Counterintuitive: Immigrant Solidarity Edition

May 3, 2010

I am going to let you all in on a simple statistical trick.

The Trick

First, randomly generate three vectors of 1,000,000 observations each.

A <- rnorm(10^6, 100, 15)
B <- rnorm(10^6, 25, 3)
C <- rnorm(10^6, 10^5, 10^4)

Because these vectors have all been randomly generated, their correlations are basically zero.

cor(A,B) = 0.0002
cor(B,C) = -0.0017
cor(A,C) = -0.0003

Interpret these correlations as saying, “Knowing the value of A tells us nothing about the value of B or C.”
But here’s the magic. Create ratios of A to C and B to C and then take their correlation:

cor(A/C, B/C)

I got a correlation of 0.37 – not bad in the social sciences. But remember, these correlated ratios were created by manipulating three random, independently generated vectors of observations.
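
Putting the steps above together, a minimal reproducible sketch in R might look like the following. The seed and the names `raw_cors`, `ratio_AC`, `ratio_BC`, and `ratio_cor` are assumptions added for illustration, so the exact digits will differ a little from the ones quoted above.

```r
set.seed(1)   # arbitrary seed (an assumption), only so the run is repeatable
n <- 10^6     # observations per vector

# Three independently generated vectors
A <- rnorm(n, 100, 15)
B <- rnorm(n, 25, 3)
C <- rnorm(n, 10^5, 10^4)

# Pairwise correlations of the raw vectors: all close to zero
raw_cors <- c(AB = cor(A, B), BC = cor(B, C), AC = cor(A, C))
print(round(raw_cors, 4))

# Divide A and B by the same denominator C, then correlate the ratios
ratio_AC <- A / C
ratio_BC <- B / C
ratio_cor <- cor(ratio_AC, ratio_BC)
print(round(ratio_cor, 2))
```

The raw correlations should hover around zero while the correlation of the two ratios lands well above it, even though nothing links A and B except the shared division by C.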

Fill in the Blanks

Okay, let’s rename our randomly generated variables.

  1. Let’s call “A” the number of Hispanic immigrants in a city.
  2. Call “B” the number of violent crimes in a city.
  3. Call “C” the population of a city.
So A/C is the share of Hispanic immigrants in a city’s population, and B/C is the violent crime rate for that city. And what do you know – they’re strongly and positively correlated.
What’s more: call “C – A” the number of non-immigrants in your city. Since (C – A)/C is just 1 – A/C, the correlation of the share of Real Americans with the violent crime rate is exactly as strong, and negative. Incredible!
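
The sign flip is pure algebra and can be checked directly. In this sketch the seed, the smaller sample size of 10^5, and the names `share_A`, `share_rest`, `rate_B`, `r_A`, and `r_rest` are all assumptions chosen for illustration.

```r
set.seed(2)   # arbitrary seed (an assumption)
n <- 10^5     # smaller sample than in the post, just to run quickly

A <- rnorm(n, 100, 15)       # "immigrants"
B <- rnorm(n, 25, 3)         # "violent crimes"
C <- rnorm(n, 10^5, 10^4)    # "population"

share_A    <- A / C          # immigrant share
share_rest <- (C - A) / C    # non-immigrant share, algebraically 1 - A/C
rate_B     <- B / C          # crime rate

r_A    <- cor(share_A, rate_B)
r_rest <- cor(share_rest, rate_B)

# The two correlations are mirror images of each other
print(c(immigrant_share = r_A, non_immigrant_share = r_rest))
```

Because `share_rest` is one minus `share_A`, the second correlation is the first with its sign reversed. It is the same artifact read from the other side, not a separate finding.
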
Arm Yourself

In the wake of Arizona’s immigration law, I’ve seen a fair bit of discussion of the relationship between immigrants and violent crime. And while I’ve yet to see any of these bloggers as much as link to a simple correlation with real data, I know that somewhere out there is an amateur social scientist with a single-user license to SPSS, and he’s going to show you a scatterplot just like this:
and claim that it means something. But if you’re correlating two ratios with the same denominator, the correlation on its own means nothing, because the shared denominator can produce it all by itself.
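
How large the artifact gets can be estimated without simulating anything. A standard first-order approximation says that when A, B, and C are mutually uncorrelated, the correlation of A/C with B/C is roughly v_C^2 / sqrt((v_A^2 + v_C^2)(v_B^2 + v_C^2)), where each v is a coefficient of variation (standard deviation divided by mean). The sketch below plugs in the parameters used in the simulation earlier in this post; the helper `cv` and the name `approx_cor` are assumptions made for illustration.

```r
# Coefficient of variation: standard deviation relative to the mean
cv <- function(mean, sd) sd / mean

v_A <- cv(100, 15)       # 0.15
v_B <- cv(25, 3)         # 0.12
v_C <- cv(10^5, 10^4)    # 0.10

# First-order approximation to cor(A/C, B/C) for uncorrelated A, B, C
approx_cor <- v_C^2 / sqrt((v_A^2 + v_C^2) * (v_B^2 + v_C^2))
print(round(approx_cor, 2))
```

This comes out at about 0.36, close to the simulated 0.37. The formula also shows what drives the effect: the more the denominator varies relative to the numerators, the closer the spurious correlation gets to 1.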
Studies in epidemiology and criminology are especially prone to this problem because they are often reliant on aggregate-level rates. I don’t know if this problem has a name, but I’d like to propose “The Nativist’s Fallacy.” Just use your imagination and think of all the ethnic groups we could put in “A” and all of the events we could stick in “B”. For some, the opportunity is irresistible.
This plot was generated by drawing only 1,000 observations each for A, B, and C. cor(A/C, B/C) was 0.99 here.
