Amateur Statisticians

Here’s a fun game for all you social scientists: what is wrong about Adam Serwer’s post on The GOP’s Amateur Statisticians?

Hint: If you take the time to read this groundbreaking article on Trends in Surveys on Surveys in Public Opinion Quarterly, you can find out what percentage of Americans make a variant of the mistake that Adam just made. The answer: about 86%.

Anyway, I left a hastily-composed comment on his blog in case you are really dying to know his very common mistake.

UPDATE: Adam has updated his post, which proves that he is a class act. Disappointingly few people respond when I leave methodological remarks in the comments, including (ahem) certain popular sociology blogs.

The natural state

I have an overly-literal mind, I’ve been told. I’ve always maintained that it’s impossible to believe in the supernatural, because anything that exists is natural by definition. An obvious point, but failure to note this gives conservatives one big leg up in “Markets are natural, States are not” rhetoric. Consider this self-imploding reader’s rant, which Andrew Sullivan gave an A-fucking-men:

Where does Rick Hertzberg think society’s ability to give people “enough to eat and a roof over their heads” comes from, if not from those economic liberties and rights he holds as secondary? It’s all from the surplus created by the division of labor and comparative advantage. The overflowing abundance that marks modern society – where people like Hertzberg can make a comfortable living writing for The New Yorker without ever cultivating his own food, weaving his own clothes, building his own home, and so on – would not exist if not for the continued protection of free enterprise and private property. (And he dares to quote Adam Smith in his follow-up post!?)

Free enterprise comes before voting.

If I can steal generously from Hayek for a second, society didn’t develop the complexity that it has today because everyone in a small village in 2,500 B.C., or 100 A.D., or 1640s New England got together and voted to divide their time and effort in order to provide goods and services for exchange; this happens organically. This happens because it has proven, over thousands of years, to be the most efficient and mutually-beneficial means of getting past subsistence and reaching a better life. Without this, there is no possibility for organized self-government and modern civil rights.

In what possible viable world view could the “right to vote” be valued more favorably than property rights and the freedom of enterprise?

First, I’ll point out the reader conflates the division of labor with free enterprise. These are very different things. The division of labor started the first time one caveman decided to hunt, and another to gather. It’s only accelerated since then. Meanwhile, free enterprise is a state of affairs that has never existed – never, not once, from the moment that the first priest on the Mesopotamian flood plain coordinated the activity of a village of farmers to the current regime of regulated nation states.

But notice something else going on here: the reader claims that the division of labor happened organically because the tests of time proved it to be the most efficient way of organizing human productive activity. But guess what else grew over the last couple of millennia and has withstood the tests of time? The state. And its growth was as natural as the growth of the market: one set of hierarchies folding into another, progressively larger state apparatuses emerging. It’s nothing but polemics to call something so fundamental to the concept of civilization “artificial.”

Here’s the deal: State power integrates markets (Evidence: The Babylonian Empire. The Roman Empire. The Arabian Empire. The British Empire. Etc.) Integrated markets further the division of labor. Productivity grows, leading state power to grow. And then state power integrates more markets. In short, States and markets love each other.

The idea is just laughable that there were peasants in the south of France who really wanted to inextricably link their fates with a bunch of unseen-others in Paris, but the mean feudal lords wouldn’t let them engage in free enterprise.

One key to future economic health is to move past the silly dichotomy of natural markets/artificial states. Business doesn’t like the state because the state taxes their profits. End of story. It would be awfully noble for the Chamber of Commerce to be spending all of this money because the state is interfering with some higher-order precondition for system-wide productive efficiency!

I guess reading comprehension isn’t necessary for success

Jamelle Bouie on Andrew Sullivan:

Simply put, wealth simply doesn’t enjoy a 1:1 relationship with success. Some people work hard for their wealth, but some are lucky, some have it handed to them, and others are ultimately capitalizing on advantages they gained at birth. Money is money, not moral worth, and Sullivan is smart enough to know the difference.

Andrew Sullivan responds to Jamelle:

Jamelle Bouie questions my choice of words. Why are so many on the left incapable of acknowledging that many people who are rich – but, of course by no means all of them – earned it the hard way? Until more liberals internalize this, they will fail to persuade America of the occasional need for government because people will rightly suspect that what they are really about is penalizing or diminishing hard work. By the way, I favor an inheritance tax. But I also favor allowing those who work hard to keep as much of their own money as possible.

Andrew’s underbloggers summarize the exchange in their Weekly Wrap:

Andrew bucked Jamelle Bouie’s jeers at the “successful.”

“…by neatly failing to respond to a single point raised by Jamelle,” they could have added.

In any case, it is important that liberals continue to sharpen the argument that “money is money, not moral worth,” and that we have no natural right to keeping every dollar we earn on the open market. We can’t have a natural claim to market income because – guess what? – the market isn’t natural. Most forms of market exchange need a capable state.

Here’s my best effort to dismiss the argument that taxation is theft.

The more poverty changes, the more reformers stay the same

Or maybe vice versa.

Ta-Nehisi Coates discusses NYC’s efforts to get food stamp recipients to stop drinking sugary beverages and concludes:

I’m willing to be swayed, but this feels like something that was cooked up in a lab without any consideration for ordinary human nature. This is not a math formula. You need to convince actual, living, breathing people.

The 19th century English reformer/statistician who helped create the concept of a poverty line calculated the income you’d need to live healthily on the cheapest, most nutritious food available: legumes. Some 30% of the population couldn’t buy enough legumes to thrive, and he called these the “primary poor.”

But an equally large group he called the “secondary poor.” These were the people who’d be fine if they didn’t insist on having one steak meal a week. The reformer felt nothing but opprobrium for these people.

Anyway, the early welfare state sure wasn’t going to give people more than the minimum income necessary to maintain their health. And so a large portion of the British citizenry chose malnourishment over forgoing meat.

Some morals:

1) Poor people haven’t changed much in a century and a half, but that doesn’t mean bien pensants are any closer to understanding them

2) Quantitative social science is the handmaiden of the modern state

3) Quantification necessitates – but obscures – moral judgement!


DISCLAIMER: I learnt all of this last night as I was falling asleep to James Vernon’s lectures on “The Peculiar Modernity of Britain.” (available here or on iTunes U). Point being: you should probably listen to it for yourself rather than trust me on this.

Maybe he was doing Yahweh’s Work

In the introduction to Winner-Take-All Politics, Pierson and Hacker write that Goldman CEO Blankfein

insisted that Goldman was doing ‘God’s work’ – apparently missing the passage in the Bible about how hard it is for a rich man to enter heaven.

Perhaps Blankfein missed that passage because it’s in the New Testament? Not cool, Pierson and Hacker.

Other than that, I’m enjoying the book.

Stuff Only White People Like

OKCupid’s The REAL ‘Stuff White People Like’ has made a fair-sized splash around the Internet.

What did OKCupid do?

OKCupid analyzed half a million of their users’ profiles in order to answer the following (unanswerable?) question:

What is it that makes a culture unique? How are whites, blacks, Asians, or whoever different from everybody else? What tastes, interests, and concepts define an ethnic group? And is there any way to make fun of other races in public and get away with it?

Their method? To use statistical methods (not described anywhere I could find them) to uncover “the words and phrases that made each racial group’s essays statistically distinct from the others'” – or as they label these tastes, “racial outliers.”

These “racial outliers” are then ranked by how unusual the preference is. For example, blacks are twenty times as likely to write “soul food” as non-blacks. No other ratios are given, but judging by the size of the font, this is near the upper-limit of “unusualness”.

To recap: We have lists of phrases ranked by how much more likely black men (for example) are than non-black men to use that phrase.

What did OKCupid imply that they did?

On the one hand, those quotes I used in the previous section came from the article itself, so OKCupid was honest about the metric their lists actually use. But could the reader easily misinterpret these lists? Of course: Ta-Nehisi Coates did, but still made the sharp point that “average black man” is not “average black man on OKCupid”. Shani-O did, but still made the sharp point that these lists more accurately represent what people “think they *should* like, in order to attract that special someone.”

Both of these misreadings are pretty understandable, considering that OKCupid titled the post The REAL ‘Stuff White People Like’ and showed charts named “Stuff Black People Like” and wrote stuff like “Double finally, how bold is it that I am cool is the second most typical phrase for black men?” and

In general, I won’t comment too much on these lists, because the whole point of this piece is to let the groups speak for themselves, but I have to say that the mind of the white man is the world’s greatest sausagefest.

I’ll explain next why it’s nonsense to assume that anything on these lists is “typical.” In any case, OKCupid knew that a blog post called “Stuff White People Like Disproportionately to Non-Whites” wasn’t going to get many views.

What do these lists actually tell us? The case of the Boston Red Sox

Matt Yglesias noticed that these were lists of racial outliers, but he titled his post on the topic “Stuff White People Really Like.” Is this a more accurate interpretation of the lists? Not really: does anyone actually believe that white girls really like the Boston Red Sox?

I am surprised that more people – especially considering how many millions of white New Yorkers exist – have not objected to OKCupid’s insinuation that liking the Red Sox is somehow the apogee of white femininity.

How did the Red Sox get to the top of the white women’s list? It’s easy enough, if you consider how white New England is and remember that these lists favor phrases that are really unpopular among other races.

Let’s go back to soul food, which we know blacks are twenty times more likely to list than non-blacks. This means that – at most! – 5% of non-blacks listed soul food. We know this is the maximum because 5% × 20 = 100%.

Of course, 100% of blacks did not list soul food. But think about how quickly the “black percentage” drops as we decrease the “non-black percentage”:

  • Non-black 4%, Black 80%
  • Non-black 3%, Black 60%
  • Non-black 2%, Black 40%
  • Non-black 1%, Black 20%
  • Non-black 0.5%, Black 10%

Here we see that it is easier to get on the list by being unpopular among most races than by being popular in one. A 1 percentage-point change in unpopularity among non-blacks does as much for soul food’s ranking as a 20 percentage-point change in popularity among blacks.
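The arithmetic behind that list can be checked in a few lines. This is only a sketch: the 20-to-1 ratio is the one figure OKCupid reports, and the function names are made up for illustration.

```python
# The only ratio OKCupid reports: blacks are 20x as likely as non-blacks
# to write "soul food".
RATIO = 20


def max_outside_pct(ratio=RATIO):
    """Largest possible share (in %) of non-blacks listing the phrase.

    The in-group share cannot exceed 100%, so the outside share is
    capped at 100 / ratio.
    """
    return 100 / ratio


def implied_group_pct(outside_pct, ratio=RATIO):
    """Share (in %) of the in-group implied by a given outside share."""
    return outside_pct * ratio


print(f"Ceiling for non-blacks: {max_outside_pct():g}%")
for outside in (4, 3, 2, 1, 0.5):
    print(f"Non-black {outside:g}%, Black {implied_group_pct(outside):g}%")
```

Every point shaved off the non-black share is multiplied by the ratio, which is why a phrase that almost nobody else uses can top the list even when only a small slice of the group uses it.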

Back to the Red Sox: keep in mind that only 5% of the population lives in New England, that not everyone in New England is a Red Sox fan, and not every Red Sox fan is going to list “Red Sox” in their profile. Now does anyone want to take a guess as to what percentage of white girls actually listed the Red Sox in their profiles? One percent?*

I bet more white people loathe the Red Sox than like them, and I bet the vast majority of white people are pretty much indifferent to them. And yet we are told that the Red Sox and Megadeth and the Ghostbusters are somehow the epitome of whiteness – the “tastes . . . that are specially important to [whites].”

Or perhaps black people and so on just don’t like the Dropkick Murphys.

Stuff Non-White People Don’t Give a Shit About (And Most Whites Probably Don’t Either)

… and that’s what these lists are really about.

Anyway, I apologize to all of those who saw their ethnicity reduced to a goofy list of phrases stamped with the veneer of Statistical Science. Advanced statistical techniques are unparalleled at giving us very specific answers to very specific questions. But asking a computer to crunch a bunch of numbers is the easy part. Knowing what question you asked the computer is where the subtleties lie.

Am I taking this too seriously? To some extent OKCupid wants us to take these lists seriously:

The information in this article is not our opinion. It’s data, aggregated from the essays of half a million real people.

They want the thousands of people who read that post to believe that these lists are TRUTH, and not the result of a statistical technique more or less guaranteed to reproduce racial stereotypes. Because of this article, hundreds of black girls are probably on dates right now with white guys furtively reading from crib notes on soul food and Luther Vandross.

In any case, I am just glad that OKCupid didn’t do this by religion too. Seeing a big-lettered JOE LIEBERMAN scrawled atop the “What Jews Like” would be more than I could take.


*Seriously OKCupid, I’d love to know. Your lack of a methodology section has led to a lot of guesswork here.

A Portrait of the Non-Voter as an Old Man

Yesterday I had the chance to overhear my union dues-paying, social-security dependent, Democratic neighbor explain why he wasn’t going to vote for the Democrats this year.

Politicians, he has decided, are a bunch of crooks who only care about themselves. He had four major complaints justifying his decision not to vote:

1. Somerville wastes money on fireworks when Boston has a perfectly good fireworks display

2. Somerville wastes too much money planting trees along Somerville Ave

3. Somerville wastes too much money on parking enforcement, even paying officers to distribute tickets in the middle of the night

4. Somerville no longer allows him to park his car against the direction of traffic, like they do in Medford

Government, he concluded, should be taking money from the rich people and giving it to the poor. Instead they waste their time on this bullshit.

I spend so much time thinking about national policy that I forget how much more salient local government is for some people, the difference between our national military budget and the Somerville firework budget notwithstanding.

I also tend to forget how many voters aren’t at home in either party. I had never even considered before that there might be a voting bloc of small-government welfare-statists (or is it socialist libertarians?) whose operative philosophy is pretty much share the wealth and then leave me the fuck alone.

The Overdetermination of Choice -> The Overestimation of Effects

I suppose I should clarify why I think heterogeneous treatment effects are dire enough in education research to cast doubt on a lot of estimations of the value of a high school degree.

In my last post I only hinted at the first type of selection bias, whereas arguably type II is more important. Both forms of selection bias are mechanisms that determine what kind of person appears in our data set with a high school degree.

Selection Bias Type I: You know that you never want to work/could never hold down the kind of job that requires a high school degree. You therefore decide not to obtain a high school degree. The marginal utility for you of a high school degree is 0.

Selection Bias Type II: You attend an underperforming public school. You learn few valuable skills while sitting in the classroom and you realize that a diploma from Public School 128 is not a valuable credential, and so you drop out at a young age.

Now, if you attend a St. Grottlesex school, not only do your demographic characteristics almost guarantee that you’ll graduate, but your degree is also worth a lot more than other degrees, incentivizing you even further to graduate.

So what does this all add up to?

Students with family characteristics most predisposed to finishing high school are incentivized by the opportunity to obtain the most valuable degrees.

Students with family characteristics least predisposed to finishing high school are incentivized by the opportunity to obtain the least valuable degrees.

Among degree holders in the population as a whole, then, valuable degrees are overrepresented and less valuable degrees are underrepresented. So when researchers try to estimate the value of a high school degree with an unbalanced data set, they will consistently overestimate the actual value of a degree for the least advantaged groups.
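To see how the overrepresentation works, here is a toy calculation. Every number in it is hypothetical (two equally sized schools, invented graduation rates, invented weekly dollar values for each degree); it is a sketch of the mechanism, not an estimate of anything.

```python
# HYPOTHETICAL numbers, chosen only to illustrate the mechanism.
# Each entry: (share of all students, graduation rate, weekly value of degree in $).
schools = {
    "well-resourced school": (0.5, 0.95, 80),
    "underperforming school": (0.5, 0.40, 10),
}


def average_value_among_graduates(schools):
    """Average degree value, weighted by how many graduates each school produces."""
    graduates = sum(share * rate for share, rate, _ in schools.values())
    total_value = sum(share * rate * value for share, rate, value in schools.values())
    return total_value / graduates


def average_value_among_all_students(schools):
    """Average degree value if every student counted equally, graduate or not."""
    return sum(share * value for share, _, value in schools.values())


print(f"Among graduates:    ${average_value_among_graduates(schools):.2f}/week")
print(f"Among all students: ${average_value_among_all_students(schools):.2f}/week")
print("At the underperforming school: $10.00/week")
```

With these made-up inputs, the degrees we actually observe are worth about $59 a week on average, while the degree on offer at the underperforming school is worth $10. A researcher who only sees the graduates sees mostly the valuable degrees.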

The unfortunate consequence of this is that a John McWhorter can casually glance at poverty statistics and then wildly overestimate the efficacy of a high school degree. Armed with his cross-tabulation, he then says, “Look how valuable a degree is! When you choose not to graduate, you have chosen a life of poverty.” It’s a pat observation, and one that makes the alleviation of poverty seem like you just need a few more guidance counselors to twist a student’s arm here, give a word of encouragement there.

But I’d argue that to some (possibly very large) extent, the failure of so many students to graduate is a reflection of what those students estimate the value of that degree to be. If their school’s performance is poor and if they see few opportunities for high school grads in their community, they may well wonder what the point of showing up to homeroom each morning is.

In a world like that, it’s better to think of low graduation rates as symptoms – not the causes – of deeper social ills.

Karl Polanyi in the News

Andrew Sullivan quotes approvingly from an article on San Francisco’s new parking policy, which is described as “the most aggressive free market parking policy in the nation”:

The goal is to ensure that there’s always a space available, so that people stop endlessly driving in circles looking for parking. People will be able to check online to find out the current parking cost in the place they intend to visit. Parking garages will have a better chance of undercutting on-street rates, so that those garages can fill. If you’ve ever driven in San Francisco, you know that it’s hard to decide to use a garage because, well, if you just drive around the block once more, you might get lucky. Under SF Park, if you just drive around the block once more, you’ll probably find a space, but it will cost more than a garage, especially if you’ll be there for a while. So drivers are more likely to fill up the garages.

If the program fails, which I hope it doesn’t, it will be as a result of being too timid. There will inevitably be pressure to set a maximum parking price, at which prices will stop rising, which means that space will fill up, which means that everyone will be driving around the block again. Andrew Price at Good asks: Could parking costs reach $10/hour? Conceivably yes, for a few high-demand hours, which are almost certainly also hours when transit is abundant. What’s wrong with that?

But for some reason Sullivan titled his post – which is celebrating massive governmental manipulation of the prices of parking spots – “San Francisco’s Libertarian Dream.”

In short, this is an illustration of the paradox at the heart of Karl Polanyi’s The Great Transformation: the free market was planned!

No government, no free market.

Poverty isn’t disarmingly simple

In a New Republic piece, John McWhorter wrote the following, which was later excerpted on the Daily Dish, thereby becoming a part of the Internet’s collective consciousness:

One of the most sobering observations made by Wax comes in the form of a disarmingly simple calculus presented first by Isabel Sawhill and Christopher Jencks. If you finish high school and keep a job without having children before marriage, you will almost certainly not be poor. Period. I have repeatedly felt the air go out of the room upon putting this to black audiences. No one of any political stripe can deny it. It is human truth on view. In 2004, the poverty rate among blacks who followed that formula was less than 6 percent, as opposed to the overall rate of 24.7 percent. Even after hearing the earnest musings about employers who are less interested in people with names like Tomika, no one can gainsay the simple truth of that advice. Crucially, neither bigotry nor even structural racism can explain why an individual does not live up to it.

If I could advise all of the teenagers in our country, I would advise them to stay in school and wait to have children until they were employed. But despite the “disarmingly simple calculus” above, I doubt my advice would be a panacea for national poverty rates. The true calculus is much less simple than the one McWhorter describes.

The technical part wherein I try to explain the importance of a balanced data set

The statistical evidence cited seems to be a simple cross-tabulation, and not a particularly interesting one: how surprised are any of us that blacks who complete high school and keep a job are much less likely to be poor? Perhaps Sawhill and Jencks ran a regression analysis that found that, controlling for other variables, completing high school and delaying parenthood are negatively associated with poverty. I have no reason to doubt this: I completely believe that married, diploma-holding African-American parents are much less likely to live in poverty.

Unfortunately, regression is unable to test for causal effects wherever there might be “heterogeneous treatment effects.” Basically, people who choose to complete high school in part choose to do so because they believe a high school degree will benefit them. People who choose not to finish high school may in part choose to do so because they believe a high school degree will not benefit them as much. In a lot of cases, these beliefs might even be accurate.

Imagine that we have a sample of 100 adults. Half completed high school and earn $100/week. Half dropped out of high school and now earn $50/week. Bivariate regression would tell us that a high school degree is associated with an increase in your earnings of $50/week, and so we might want to assume that if we could encourage all of the drop-outs to get their GEDs, then they too would earn $100/week. But this only makes sense if we make the – strong and often unwarranted – assumption that a high school degree has the same effect for everybody.
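A minimal sketch of that example, with one loudly labeled assumption: I have to invent what each person would have earned in the scenario we never observe. The $100 and $50 figures come from the example above; the counterfactual earnings ($50 for graduates without a diploma, $60 for dropouts with one) are made up.

```python
from statistics import mean

# From the example: 50 graduates earning $100/week, 50 dropouts earning $50/week.
# ASSUMPTION (not in the example): the earnings each person would have had in
# the other scenario. Graduates would earn $50 without the diploma; dropouts
# would earn only $60 with one.
people = (
    [{"graduated": True, "with_diploma": 100, "without_diploma": 50}] * 50
    + [{"graduated": False, "with_diploma": 60, "without_diploma": 50}] * 50
)


def observed(person):
    """Earnings we actually see: only one of the two potential outcomes."""
    key = "with_diploma" if person["graduated"] else "without_diploma"
    return person[key]


def naive_gap(people):
    """What the cross-tabulation reports: graduates' mean minus dropouts' mean."""
    grads = [observed(p) for p in people if p["graduated"]]
    drops = [observed(p) for p in people if not p["graduated"]]
    return mean(grads) - mean(drops)


def true_effect_for_dropouts(people):
    """What a diploma would actually add for the people who did not get one."""
    return mean(
        p["with_diploma"] - p["without_diploma"]
        for p in people
        if not p["graduated"]
    )


print("Observed gap:", naive_gap(people))
print("Effect of a diploma on dropouts:", true_effect_for_dropouts(people))
```

Under these invented counterfactuals the observed gap is $50 a week, but handing every dropout a diploma would raise their earnings by only $10. The data alone cannot tell the two stories apart.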

Here is a better idea. Find two hundred teenagers. Next, randomly administer some “intervention” or “treatment” by which we ensure that 100 kids graduated and 100 kids did not. 10 years later, we would check in on these 200 adults and compare the average earnings of the graduate group to the average earnings of the didn’t-graduate group. Because we know that teenagers didn’t choose their treatment based on their beliefs about the effects of the treatment, we could now safely say, “The effect of obtaining a high school diploma for such-and-such a group of teenagers is $X.”

Unfortunately, experiments like that described above are often prohibitively expensive, impossible, or just plain immoral. The next best alternative is to create a “balanced data set” in which we match pairs of individuals based on their likelihood of receiving the treatment (in this case, graduating high school). For example, we would want to find two adults whom we determine to have a 75% probability of having finished high school, but one of whom finished and one of whom didn’t. In other words, “75% of people that look like these two finish high school, but one did and one didn’t.” We would then compare the differences in income of the graduates and non-graduates within these matched pairs. Only by comparing these differences can we hope to estimate the causal effects of a treatment where “selection into treatment” bias may exist.
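The matching idea can be sketched with a handful of invented records. Everything here is hypothetical: the incomes are made up, and the probability of graduating is simply handed to us, whereas in real work it would have to be estimated from background characteristics first.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL records: (probability of graduating, graduated?, weekly income in $).
# The data set is unbalanced on purpose: most likely-graduates did graduate,
# most unlikely-graduates did not.
people = [
    (0.75, True, 100), (0.75, True, 100), (0.75, True, 100), (0.75, False, 80),
    (0.25, True, 60), (0.25, False, 50), (0.25, False, 50), (0.25, False, 50),
]


def naive_difference(rows):
    """Graduates' mean income minus non-graduates' mean income, ignoring balance."""
    grads = [income for _, graduated, income in rows if graduated]
    drops = [income for _, graduated, income in rows if not graduated]
    return mean(grads) - mean(drops)


def matched_pair_differences(rows):
    """Pair one graduate with one non-graduate who had the same probability
    of graduating, and return the income difference within each pair."""
    strata = defaultdict(lambda: {True: [], False: []})
    for probability, graduated, income in rows:
        strata[probability][graduated].append(income)
    diffs = []
    for groups in strata.values():
        # zip stops at the shorter list, so unmatched people are dropped.
        for grad_income, drop_income in zip(groups[True], groups[False]):
            diffs.append(grad_income - drop_income)
    return diffs


print("Naive difference:", naive_difference(people))
print("Matched-pair differences:", matched_pair_differences(people))
print("Average within matched pairs:", mean(matched_pair_differences(people)))
```

In this toy data the unbalanced comparison says a diploma is worth $32.50 a week, while the matched pairs say $20 for the likely graduates and $10 for the unlikely ones. Matching on exact probabilities is the simplest possible version of the technique; it is meant to show the logic, not to stand in for a real analysis.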

In short, without “balancing” our data whenever selection bias exists as described above, we can only speak about associations. If we do balance our data, we can then speak about causal effects.

Why this matters

McWhorter and Wax have identified several factors that are strongly and starkly correlated with living in poverty. The implication in McWhorter’s piece is that these behaviors cause one to live in poverty. Furthermore, the behaviors that McWhorter has identified occur during adolescence, when we can imagine teens choosing whether or not to do the right thing. In short, McWhorter has expounded a causal logic that heavily implies something like, If you had only made him wear a condom, then your chances right now of being poor would be 6% instead of 24.7%.

By arguing that the most crucial junctures are those that happen during adolescence, McWhorter makes poverty seem like such a simple bullet to dodge. Just do the right thing, teenagers! But what if poverty is overdetermined by the same background factors that push certain segments of the teen population toward these behaviors? McWhorter’s story ignores the fact that by the time you are 13 you already have a long history: teenagers at high risk of pregnancy are not otherwise identical to teenagers at low risk. High risk teenagers may know that a diploma would benefit them less than their peers.

None of this is to deny that we should do everything we can to encourage teenagers to delay pregnancy and graduate high school. These choices undoubtedly improve life chances on the margins. But the oversized effects that McWhorter celebrates are just not supported by the statistics he cites. Poverty isn’t so simple, and making it appear so only encourages us to preach from the gospel of personal responsibility and then call it a day.