Andrew Gelman is a seemingly tireless crusader against the sloppy use of p-values. Today he posted a very short (4-page) new article that explains some of the problems with p-values and gives quick examples of when they fall apart versus when they merely do no harm. I recommend reading the whole thing, especially if you’ve recently been exposed to the standard two-semester statistics sequence in sociology or econometrics. If you’re totally unfamiliar with Bayesian analysis, some of the terms will be a bit confusing, but it’s a good opportunity to search around a bit and get a feel for the language of Bayesianism. A couple of gems:
The casual view of the P value as posterior probability of the truth of the null hypothesis is false and not even close to valid under any reasonable model, yet this misunderstanding persists even in high-stakes settings (as discussed, for example, by Greenland in 2011). The formal view of the P value as a probability conditional on the null is mathematically correct but typically irrelevant to research goals (hence, the popularity of alternative—if wrong—interpretations).
This passage, from the opening, both names the most common (but wrong) interpretation and identifies one source of that wrongness: what p-values actually mean is not very interesting, so we’d much rather they mean what they don’t.
One big practical problem with P values is that they cannot easily be compared. … Consider a simple example of two independent experiments with estimates (standard error) of 25 (10) and 10 (10). The first experiment is highly statistically significant (two and a half standard errors away from zero, corresponding to a normal-theory P value of about 0.01) while the second is not significant at all. Most disturbingly here, the difference is 15 (14), which is not close to significant. The naive (and common) approach of summarizing an experiment by a P value and then contrasting results based on significance levels, fails here, in implicitly giving the imprimatur of statistical significance on a comparison that could easily be explained by chance alone.
Gelman has written about this example many times before under the heading “The difference between significant and not significant is not significant.” This is the quickest explanation I’ve seen.
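The arithmetic in the quoted example is easy to check yourself. Here is a minimal sketch using the normal approximation (the same "normal-theory" p-values Gelman mentions); the function name `two_sided_p` is my own, not from the article:

```python
import math

def two_sided_p(estimate, se):
    """Two-sided normal-theory p-value for H0: true effect = 0."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))  # 2 * (1 - Phi(z))

# Gelman's two independent experiments: estimate (standard error)
p1 = two_sided_p(25, 10)  # ~0.012: "highly significant"
p2 = two_sided_p(10, 10)  # ~0.317: "not significant at all"

# Yet the difference between the two estimates is not significant either.
# For independent estimates, standard errors add in quadrature:
diff = 25 - 10
se_diff = math.sqrt(10**2 + 10**2)      # ~14.1, the "15 (14)" in the quote
p_diff = two_sided_p(diff, se_diff)     # ~0.29: easily explained by chance
```

So a "significant" result and a "non-significant" one can be statistically indistinguishable from each other, which is exactly why comparing results by their significance labels misleads.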