# Probability is Counterintuitive: Immigrant Solidarity Edition

I am going to let you all in on a simple statistical trick.

**The Trick**

First, randomly generate three vectors of 1,000,000 observations each.

    A <- rnorm(10^6, 100, 15)
    B <- rnorm(10^6, 25, 3)
    C <- rnorm(10^6, 10^5, 10^4)

Because these vectors were generated independently of one another, their pairwise correlations are basically zero.

    cor(A,B) = 0.0002
    cor(B,C) = -0.0017
    cor(A,C) = -0.0003

Interpret these correlations as saying, “Knowing the value of A tells us nothing about the value of B or C.”
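As a rough check on "basically zero": the sample correlation between two independent vectors of length n has a standard error of about 1/sqrt(n). Here is a minimal sketch, assuming a fixed seed (my addition, not part of the original run, so the exact values will differ from those quoted above):

```r
set.seed(1)                      # assumed seed, for reproducibility only
n <- 10^6
A <- rnorm(n, 100, 15)
B <- rnorm(n, 25, 3)
C <- rnorm(n, 10^5, 10^4)

r <- cor(cbind(A, B, C))         # 3 x 3 matrix of pairwise correlations
se <- 1 / sqrt(n)                # approximate standard error under independence

print(round(r, 4))
print(se)                        # 0.001
```

Off-diagonal entries within a few multiples of `se` are what independence should produce.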

But here’s the magic. Create ratios of A to C and B to C and then take their correlation:

    cor(A/C, B/C)

I got a correlation of 0.37 – not bad in the social sciences. But remember, these correlated ratios were created by manipulating three random, independently generated vectors of observations.
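The size of that correlation is predictable. For mutually uncorrelated A, B, and C, a first-order approximation usually credited to Karl Pearson (1897) gives the correlation of the two ratios in terms of coefficients of variation (standard deviation divided by mean). The sketch below compares that approximation with a simulation. The seed and the helper name `ratio_cor_approx` are my own, and the formula is only an approximation that works when the denominator stays well away from zero.

```r
# First-order approximation to cor(A/C, B/C) for uncorrelated A, B, C,
# written in terms of coefficients of variation (sd / mean).
ratio_cor_approx <- function(cv_a, cv_b, cv_c) {
  cv_c^2 / sqrt((cv_a^2 + cv_c^2) * (cv_b^2 + cv_c^2))
}

# Coefficients of variation implied by the rnorm() parameters in The Trick.
approx <- ratio_cor_approx(15 / 100, 3 / 25, 10^4 / 10^5)

set.seed(1)                      # assumed seed, for reproducibility only
n <- 10^6
A <- rnorm(n, 100, 15)
B <- rnorm(n, 25, 3)
C <- rnorm(n, 10^5, 10^4)
simulated <- cor(A / C, B / C)

print(c(approx = approx, simulated = simulated))
```

Nothing in the formula refers to what A and B measure: the more the shared denominator varies relative to the numerators, the larger the built-in correlation.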

**Fill in the Blanks**

Okay, let’s rename our randomly generated variables.

- Let’s call “A” the number of Hispanic immigrants in a city.
- Call “B” the number of violent crimes in a city.
- Call “C” the population of a city.

So A/C is the share of Hispanic immigrants in a city’s population, and B/C is the violent crime rate for that city. And what do you know – they’re clearly and positively correlated.

What’s more: call “C – A” the number of non-immigrants in your city. Then (C – A)/C, the share of Real Americans, is negatively correlated with the violent crime rate, and just as strongly. Incredible!
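The sign flip is pure arithmetic: (C – A)/C equals 1 – A/C, and subtracting a variable from a constant reverses the sign of its correlations without changing their size. Here is a small sketch, with an assumed seed and illustrative variable names of my own:

```r
set.seed(1)                      # assumed seed, for reproducibility only
n <- 10^6
A <- rnorm(n, 100, 15)           # "immigrants"
B <- rnorm(n, 25, 3)             # "violent crimes"
C <- rnorm(n, 10^5, 10^4)        # "population"

share_immigrant <- A / C
share_other     <- (C - A) / C   # identical to 1 - A / C
crime_rate      <- B / C

r_pos <- cor(share_immigrant, crime_rate)
r_neg <- cor(share_other, crime_rate)

print(c(r_pos = r_pos, r_neg = r_neg))   # same magnitude, opposite signs
```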

**Arm Yourself**

In the wake of Arizona’s immigration law, I’ve seen a fair bit of discussion of the relationship between immigrants and violent crime. And while I’ve yet to see any of these bloggers as much as link to a simple correlation with real data, I know that somewhere out there is an amateur social scientist with a single-user license to SPSS, and he’s going to show you a scatterplot just like this:

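[Figure: scatterplot of A/C against B/C]

and claim that it means something. But if you’re correlating two ratios with the same denominator, the shared denominator alone can produce the correlation, so by itself it means nothing.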

Studies in epidemiology and criminology are especially prone to this problem because they are often reliant on aggregate-level rates. I don’t know if this problem has a name, but I’d like to propose “The Nativist’s Fallacy.” Just use your imagination and think of all the ethnic groups we could put in “A” and all of the events we could stick in “B”. For some, the opportunity is irresistible.

---

This plot was generated by drawing only 1,000 observations for A, B, and C. cor(A/C,B/C) was 0.99 here.
