Marginal Revolution just linked to a paper with one of my favorite titles of all time: Star Wars: the Empirics Strike Back. The paper reports a big meta-analysis of all of the significance tests published in three top econ journals – AER, JPE, and QJE – and shows that the distribution of their p-values has a dip between .10 and .25 and a slight bump just under .05, giving it a two-humped “camel shape.” The authors argue that this distribution suggests that researchers play with specifications to coax nearly significant findings across the threshold. The paper’s title also points to a substantive finding: “Inflation [of significance] is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.”
I like this finding because it’s much narrower, and suggests a much more plausible mechanism, than McCloskey and Ziliak’s famous “Standard Error of Regressions.” (For a good critique of M&Z, especially their coding scheme, see Hoover and Siegler.) Rather than simply asserting that economists don’t understand significance and are part of a “cult,” Brodeur et al. (the Star Wars authors) show a small but predictable amount of massaging to push results past the far-too-important thresholds of .10 and .05. So they agree in some sense with M&Z that economists put too much emphasis on these thresholds, but without the excessive claim about cult-like behavior.
Economists, in fact, look like corporate earnings managers. I’m not super up on the earnings management literature, but various authors in that field argue that corporations have strong incentives to report positive rather than negative earnings, and to meet analysts’ expectations. The distribution of earnings shows just that: fewer companies report very slightly negative earnings than you would expect, and fewer just barely miss analysts’ expectations than just barely exceed them (see, e.g., Lee 2007, Payne and Robb 2000, Degeorge et al. 1999 – if anyone has a better cite for these findings, please leave a comment!). Like the economists* engaged in the Star Wars, businesses have incentives to coax earnings up to analysts’ expectations.
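Both patterns come down to the same simple check: within a narrow window around a threshold, are there noticeably more observations just above it than just below? Here is a minimal Python sketch of that kind of comparison, using an exact one-sided binomial test. To be clear, this is an illustration and not the method used in the Star Wars paper or the earnings literature; the function name `caliper_test`, the window width, and the numbers in the example are all hypothetical.

```python
from math import comb


def caliper_test(values, threshold, width):
    """Compare counts just below and just above a threshold.

    Illustrative sketch only (hypothetical helper, not from any cited paper).
    Returns (below, above, p_value), where p_value is the exact one-sided
    binomial probability of seeing at least `above` observations on the
    upper side if each observation in the window were equally likely to
    fall on either side of the threshold.
    """
    # Observations in [threshold - width, threshold)
    below = sum(1 for v in values if threshold - width <= v < threshold)
    # Observations in [threshold, threshold + width]
    above = sum(1 for v in values if threshold <= v <= threshold + width)
    n = below + above
    if n == 0:
        return below, above, 1.0  # nothing in the window, no evidence either way
    # P(X >= above) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return below, above, p_value


# Made-up z-statistics clustered around the 5% cutoff (|z| = 1.96).
z_stats = [1.80, 1.85, 1.90, 1.97, 1.98, 2.00,
           2.01, 2.03, 2.05, 2.08, 2.10, 2.12]
below, above, p = caliper_test(z_stats, threshold=1.96, width=0.2)
print(below, above, round(p, 3))
```

The same function would apply to earnings by setting the threshold to zero (or to the analyst forecast) and passing reported earnings instead of test statistics. The equal-odds null is a simplification: the underlying distribution is rarely flat around the threshold, so a narrow window and some caution in reading the result are both advisable.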
What’s the larger lesson? I think these examples are both cases of a kind of expanded Goodhart’s Law, or a parallel case to Espeland and Sauder’s “reactivity” of rankings. Another variant that perhaps gets closest is Campbell’s Law, first articulated in the context of high-stakes testing: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” It’s not clear just how “corrupt” economics findings or corporate earnings statements are, but Campbell’s Law and its close cousins remind us of the need to look for both subtle and overt forms of distortion whenever we turn a particular measure into a powerful threshold.
* Note that I am picking on economics here only because the article studied econ journals. I would bet that a similar finding could be obtained in sociology journals, and probably in other social science fields with a heavy statistical bent.