Statistical Optimism: Mortgage Finance and Depressions, Retro Edition

Let’s coin a new phrase: “statistical optimism.” Statistical optimism refers to the belief that if only we had better statistics about X, and everyone were made aware of those statistics, then we would make better decisions about X and some set of problems would go away without any major changes in the institutions actually making decisions. It’s a practical, quanty version of the classic Enlightenment-style idea that more knowledge always makes things better. Note that by statistics here I mean the production and distribution of quantitative data, the old sense of statistics (vital statistics, censuses, national income statistics, etc.), and not the inferential field we know and love today.

This phrase came to mind today as I was reading through a March 1932 interview with Senator La Follette* about the need for better economic statistics to improve economic planning in the midst of the depression. The interview is chock full of great quotes that give you a flavor of what it was like to live in a time before the CPS, NIPA, and all the other routine, standardized, official data we take for granted. For example:

It is a sad commentary on our statistical information that in the third winter of the depression we have absolutely no authoritative official figures on unemployment. The only data we have are those collected by the census in 1930 for the country as a whole and for certain cities in January 1931.

The authoritative bit here was to be important, too, as FDR and Hoover fought in the 1932 campaign over whose (partial, non-standardized) unemployment figures were better.

The belief that gets me, though, and that seems to be widely shared across the political spectrum at this point, is that just having good data will resolve all kinds of ideological disputes. It was this belief, in part, that motivated the founding of the NBER, and it was this belief that animated Hoover’s efforts to produce all kinds of economic reports in the 1920s and early 1930s in concert with economists and businessmen (e.g. Recent Economic Changes, Recent Social Trends, etc.). La Follette was also a Republican, though he later founded the Wisconsin Progressive Party and clearly favored less business-led solutions to economic problems than Hoover did; still, he shared the same attitude of statistical optimism. A quote from the end of the interview about the potential for authoritative statistics to prevent future depressions struck me as especially relevant and, from a post-2008 perspective, ironic:

Suppose late in 1928 some authoritative body in Washington had publicly emphasized the fact that there was an excess of private houses on the market. Suppose it had pointed out that construction figures showed an appreciable falling off in the building of new houses. Surely in the light of such warnings people would not have continued investing their hard-earned savings in first and second mortgage real estate bonds thus increasing the supply of new capital for speculative building which continued into 1929.

If only it were so.

[Figure: FRED, new housing starts, 2006–2011]

Though, I suppose, in fairness to La Follette, what he called for was not simply the creation of better data but also the creation of an institution – a national economic council, something of a precursor to what eventually became the Council of Economic Advisers – that would have the authority to interpret data, not just collect it. Still, the optimism is palpable, and from our vantage point, tragic.

* La Follette is important in my work because he introduced a resolution in 1932 which called for the creation of the first** official US national income estimates.
** Well, he thought they were the first, and so do most people. The FTC actually produced an estimate in 1926, but almost no one knows about it, and no one did much with it then either.


Undoing Publication Bias with “P-Curves”, Minimum Wage Edition

Following the blog rabbit hole today, I came across an interesting statistics and data analysis blog I hadn’t seen before: Simply Statistics. The blog authors are biostatisticians at Johns Hopkins, and at least one is creating a 9-month MOOC sequence on data analysis that looks quite interesting. So far, my favorite post (and the one that led me to the blog) is a counter-rant to all the recent p-value bashing (e.g. this Nature piece): On the scalability of statistical procedures: why the p-value bashers just don’t get it. The post’s argument boils down to something like, “P-values, there is no alternative!” But check out the full post for the interesting defense of the oft-maligned and even more oft-misinterpreted mainstay of conventional quantitative research.

Apart from that post, I also enjoyed a link to a recent working paper, which is what I wanted to highlight here. Even though the blog authors defend p-values as a simple way of controlling researcher degrees of freedom, they also seem to be part of a growing group of statisticians interested in finding ways of correcting for the “statistical significance filter,” as Andrew Gelman puts it. The method presented in “P-Curve Fixes Publication Bias: Obtaining Unbiased Effect Size Estimates from Published Studies Alone” seems quite intuitive. Basically, the authors show how to simulate a p-curve (distribution of p-values) that best matches the observed p-values in a collection of studies, given the assumption that only significant results are published (though it does not perfectly account for other forms of p-hacking, as discussed in the paper). Although the paper is short, it presents payoffs for the analysis of two vexing problems, including the relationship between the minimum wage and unemployment. Here’s the example reproduced in full:

Our first example involves the well-known economics prediction that increases in minimum wage raise unemployment. In a meta-analysis of the empirical evidence, Card and Krueger (1995) noted that effect size estimates are smaller in studies with larger samples and comment that “the studies in the literature have been affected by specification-searching and publication biases, induced by editors’ and authors’ tendencies to look for negative and statistically significant estimates of the employment effect of the minimum wage […] researchers may have to temper the inferences they draw […]” (p.242).

From Figure 1 in their article (Card & Krueger, 1995) we obtained the t-statistic and degrees of freedom from the fifteen studies they reviewed. As we show in our Figure 4, averaging the reported effect size estimates one obtains a notable effect size, but correcting for selective reporting via p-curve brings it to zero. This does not mean increases in minimum wage would never increase unemployment, it does mean that the evidence Card and Kruger collected suggesting it had done so in the past, can be fully accounted by selective reporting. P-curve provides a quantitative calibration to Card and Krueger’s qualitative concerns. The at the time controversial claim that the existing evidence pointed to an effect size smaller than believed was not controversial enough; the evidence actually pointed to a nonexisting effect.

So, Nelson et al. provide an intuitive way of formalizing Card & Krueger’s assertion that publication bias could account for some of the published findings of a negative effect of minimum wage increases on employment – and even further, that publication bias could actually reduce the best estimate of the effect to zero (which seems consistent with much, though certainly not all, of the recent literature).
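
For the curious, here is a minimal, hypothetical sketch in Python of the general logic as I read it – my own illustration, not the authors’ code. For a candidate true effect size, each significant t-statistic is converted into the probability of a result at least that extreme, conditional on having cleared the significance filter; we then pick the effect size that makes those conditional probabilities look most uniform. The two-cell equal-n design assumption, the .05 cutoff, the function names, and the toy numbers at the end are all mine.

```python
# Hypothetical illustration of the p-curve effect-size idea; not the authors' code.
import numpy as np
from scipy import stats, optimize

def conditional_pvals(t_obs, df, d, alpha=0.05):
    """For significant two-sample t-statistics, return the probability of a result
    at least as extreme, conditional on clearing the significance filter, under a
    true effect size d (Cohen's d). Uniform on [0, 1] when d is the true effect."""
    n_per_cell = (df + 2) / 2.0                   # assumes equal-n, two-cell designs
    ncp = d * np.sqrt(n_per_cell / 2.0)           # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)       # two-sided significance cutoff
    power = stats.nct.sf(t_crit, df, ncp)         # P(significant | d)
    return stats.nct.sf(t_obs, df, ncp) / power   # truncated survival probability

def pcurve_effect_size(t_obs, df):
    """Pick the effect size whose implied (selection-filtered) p-curve best matches
    the observed significant results, via a Kolmogorov-Smirnov loss against uniform."""
    t_obs = np.asarray(t_obs, dtype=float)        # pass in only significant t-statistics
    df = np.asarray(df, dtype=float)
    loss = lambda d: stats.kstest(conditional_pvals(t_obs, df, d), "uniform").statistic
    return optimize.minimize_scalar(loss, bounds=(0.0, 2.0), method="bounded").x

# Toy usage with made-up t-statistics and degrees of freedom:
print(pcurve_effect_size(t_obs=[2.1, 2.3, 2.6, 2.0, 2.4], df=[40, 60, 38, 80, 55]))
```

The actual procedure has more moving parts (directional tests, different test statistics, a different loss function), so treat this purely as an illustration of the selection-corrected estimation idea, not a reimplementation of the paper.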

These methods seem really neat, but I’m not entirely sure what problems in sociology we could generalize them to. In the subfields I follow most closely, most research is either not quantitative or is based on somewhat idiosyncratic data, so it’s hard to imagine a collection of studies with sufficiently comparable dependent variables and hypotheses from which one could draw a distribution of p-values. I’d bet demographers would have more luck. But in economic sociology, published replication seems sufficiently rare to prevent us from making much headway on the issue of publication bias using quantitative techniques like this – which perhaps points to a very different set of problems.