Gelman’s Problems with P-Values

Andrew Gelman is a seemingly tireless crusader against the sloppy use of p-values. Today he posted a very short (four-page) new article that explains some of the problems with p-values and gives some quick examples of when they fall apart versus when they merely do no harm. I recommend reading the whole thing, especially if you’ve recently been through the standard two-semester statistics sequence in sociology or econometrics. If you’re totally unfamiliar with Bayesian analysis, some of the terms will be a bit confusing, but it’s a good opportunity to search around and get a feel for the language of Bayesianism. A couple of gems:

The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations).

This passage, from the opening, names the most common (but wrong) interpretation and identifies one source of that wrongness: what p-values actually mean is not very interesting, so we’d much rather they meant what they don’t.
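To make the gap between “probability conditional on the null” and “probability that the null is true” concrete, here is a small Bayes’-rule sketch. It is my illustration, not Gelman’s, and every number in it is a hypothetical assumption: suppose 90% of the hypotheses a field tests are truly null, tests are run at alpha = 0.05, and they have power 0.5 against the real effects. The function name `prob_null_given_significant` is made up for this example.

```python
# Hypothetical illustration: none of these numbers come from Gelman's article.
def prob_null_given_significant(prior_null: float, alpha: float, power: float) -> float:
    """Bayes' rule: P(null is true | p < alpha) in a simple two-hypothesis setup."""
    # P(significant | null true) equals alpha by construction of the test.
    false_positives = prior_null * alpha
    # P(significant | real effect) is the test's power against that effect.
    true_positives = (1 - prior_null) * power
    return false_positives / (false_positives + true_positives)


# Assumed inputs: 90% of tested hypotheses are null, alpha = 0.05, power = 0.5.
post = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.5)
print(f"P(null | p < 0.05) = {post:.3f}")
```

Under these assumptions, nearly half of the “significant” results come from true nulls, even though each one cleared the 0.05 bar. Change the assumed prior or power and the answer moves a lot, which is the point: the p-value alone doesn’t tell you the probability that the null is true.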

One big practical problem with P values is that they cannot easily be compared. … Consider a simple example of two independent experiments with estimates (standard error) of 25 (10) and 10 (10). The first experiment is highly statistically significant (two and a half standard errors away from zero, corresponding to a normal-theory P value of about 0.01) while the second is not significant at all. Most disturbingly here, the difference is 15 (14), which is not close to significant. The naive (and common) approach of summarizing an experiment by a P value and then contrasting results based on significance levels, fails here, in implicitly giving the imprimatur of statistical significance on a comparison that could easily be explained by chance alone.

Gelman has written about this example many times before under the heading “The difference between significant and not significant is not significant.” This is the quickest explanation I’ve seen.
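The arithmetic in Gelman’s example is easy to check. Here is a minimal sketch, assuming (as he does) normal-theory two-sided p-values and independent experiments, so the variances add and the standard error of the difference is the square root of the sum of the squared standard errors. The helper name `two_sided_p` is my own.

```python
import math


def two_sided_p(estimate: float, se: float) -> float:
    """Normal-theory two-sided p-value for H0: true value = 0."""
    z = abs(estimate) / se
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


est1, se1 = 25.0, 10.0  # experiment 1: estimate (standard error)
est2, se2 = 10.0, 10.0  # experiment 2: estimate (standard error)

# Independent experiments: SE of the difference is sqrt(se1^2 + se2^2).
diff = est1 - est2
se_diff = math.hypot(se1, se2)

for label, est, se in [
    ("experiment 1", est1, se1),
    ("experiment 2", est2, se2),
    ("difference", diff, se_diff),
]:
    print(f"{label}: {est:.0f} ({se:.1f}), z = {est / se:.2f}, p = {two_sided_p(est, se):.3f}")
```

The first experiment comes out at z = 2.5 (p of about 0.01) and the second at z = 1, but the difference of 15 has a standard error of about 14.1, putting it only about 1.06 standard errors from zero (p of roughly 0.29). Comparing the two p-values suggests a contrast that the direct comparison doesn’t support.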
