Gelman’s Problems with P-Values

Andrew Gelman is a seemingly tireless crusader against the sloppy use of p-values. Today he posted a very short (four-page) new article that explains some of the problems with p-values and gives quick examples of when they fall apart versus when they merely do no harm. I recommend reading the whole thing, especially if you’ve recently been exposed to the standard two-semester statistics sequence in sociology or econometrics. If you’re totally unfamiliar with Bayesian analysis, some of the terms will be a bit confusing, but it’s a good opportunity to search around and get a feel for the language of Bayesianism. A couple of gems:

The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations).

This passage, from the opening, names the most common wrong interpretation and identifies one source of its persistence: what p-values actually mean is not very interesting, so we’d much rather they meant what they don’t.
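A quick simulation makes the gap between the two interpretations concrete. In the sketch below (a minimal toy model, not anything from Gelman’s article), I assume a 50/50 prior on the null and a modest true effect of one standard error under the alternative; among experiments that reach p < 0.05, the fraction where the null is actually true comes out far above 0.05:

```python
import math
import random

random.seed(1)

def two_sided_p(z):
    """Two-sided normal-theory p-value for a z-statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

null_true_and_sig = 0
sig = 0
for _ in range(20000):
    null = random.random() < 0.5        # assumed prior: H0 true half the time
    effect = 0.0 if null else 1.0       # assumed effect of 1 SE under H1
    z = random.gauss(effect, 1.0)       # one estimate with standard error 1
    if two_sided_p(z) < 0.05:
        sig += 1
        null_true_and_sig += null

# Fraction of "significant" results where the null is nonetheless true:
print(null_true_and_sig / sig)          # well above 0.05
```

The exact number depends entirely on the assumed prior and effect size, which is precisely the point: the posterior probability of the null is a different quantity from the p-value, and no reinterpretation of the latter recovers the former.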

One big practical problem with P values is that they cannot easily be compared. … Consider a simple example of two independent experiments with estimates (standard error) of 25 (10) and 10 (10). The first experiment is highly statistically significant (two and a half standard errors away from zero, corresponding to a normal-theory P value of about 0.01) while the second is not significant at all. Most disturbingly here, the difference is 15 (14), which is not close to significant. The naive (and common) approach of summarizing an experiment by a P value and then contrasting results based on significance levels, fails here, in implicitly giving the imprimatur of statistical significance on a comparison that could easily be explained by chance alone.

Gelman has written about this example many times before under the heading “The difference between significant and not significant is not significant.” This is the quickest explanation I’ve seen.
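Gelman’s numerical example is easy to verify directly. The sketch below recomputes the three p-values using only the normal CDF; the standard error of the difference of two independent estimates is the root-sum-square of the individual standard errors:

```python
import math

def two_sided_p(estimate, se):
    """Two-sided normal-theory p-value for H0: true effect = 0."""
    z = estimate / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

p1 = two_sided_p(25, 10)   # z = 2.5  -> p ~ 0.012, "significant"
p2 = two_sided_p(10, 10)   # z = 1.0  -> p ~ 0.32, not significant
# Difference of independent estimates: SE = sqrt(10^2 + 10^2) ~ 14.1
p_diff = two_sided_p(25 - 10, math.sqrt(10**2 + 10**2))  # p ~ 0.29
```

One estimate clears the 0.05 threshold and the other does not, yet the difference between them is entirely consistent with chance, which is exactly the comparison the naive significant-vs-not-significant reading gets wrong.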


1 Comment

  1. I don’t know who actually holds “the casual view of the P value as posterior probability of the truth of the null hypothesis,” given that the P value is conditional on H_0 being true and hence cannot be used for that purpose. However, the fact that people misuse a tool does not mean we should abandon it.

    People sometimes misuse guns, but we all know that banning them and replacing them with sticks would not solve the problem of self-defense. Savages are totally unfamiliar with piloting a plane, but that doesn’t stop us from flying our planes nonetheless instead of resorting to horse carriages.

    Andrew himself admits to seeing many misuses of Bayesian statistics, and these will only get more prevalent if the Bayesian approach gains traction (a situation similar to the prevalence of viruses for Mac versus viruses for Windows).
