Bhutan’s new rival in public happiness measurement: Lithuania

As readers of this blog may know, Bhutan has long produced a Gross National Happiness indicator as an alternative to more traditional economic measures of welfare (e.g. GDP). Happiness measures have gained some traction elsewhere, and research on the economics of happiness (starting from the Easterlin paradox and its critics) has become quite mainstream. Yet official, public measures of happiness are still the exception rather than the norm. But not, it seems, in Lithuania:

Lithuanian capital to install public ‘happiness barometer’
The mayor of Vilnius plans to install a huge screen on the town hall to broadcast a real-time “happiness barometer” that will monitor the mood of the Lithuanian capital.

The giant display will monitor the level of happiness among the city’s 520,000 residents by showing a number on the scale of one to 10 that reflects tabulated votes sent in by locals from their mobile phones and computers.

“This barometer is a great tool for politicians. If we take a decision and see a sharp fall in the mood of the city, then we know we have done something horribly wrong,” mayor Arturas Zuokas said.

I’m not sure how I feel about this. On the one hand, most citizens aren’t that tapped into government activity, and so real-time reactions are likely to be more noise than signal. On the other hand, I’d rather CNN report on the noise-like fluctuations of happiness than hear another billion stories about why the Dow was up or down ten points today (or why various mostly clueless pundits think the Dow was up or down ten points, anyway).

What would such a story look like? “Happiness was up 5% today on strong sunshine, moderated by a predicted storm this weekend and reports of possible cost overruns in the new Defense Department IT overhaul…”? Try writing your own fictional happiness trend story in the comments!
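(For the technically inclined: the tabulation itself is trivial. Here’s a minimal sketch of how such a barometer might work – a rolling average of recent 1-to-10 votes. The article says nothing about how Vilnius actually tabulates the votes, so everything below is guesswork.)

```python
from collections import deque
from statistics import mean
import time

# Hypothetical "happiness barometer": average the 1-10 votes received in
# the last hour. Window length and clamping are invented assumptions.
WINDOW_SECONDS = 3600
votes = deque()  # (timestamp, vote) pairs, oldest first

def record_vote(vote: int) -> None:
    """Accept a vote from a phone or computer, clamped to the 1-10 scale."""
    votes.append((time.time(), max(1, min(10, vote))))

def barometer() -> float:
    """Drop votes older than the window and return the current city mood."""
    cutoff = time.time() - WINDOW_SECONDS
    while votes and votes[0][0] < cutoff:
        votes.popleft()
    return round(mean(v for _, v in votes), 1) if votes else 0.0

for v in (7, 8, 3):
    record_vote(v)
print(barometer())  # 6.0
```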

The Economy is Real and Everything Else is Fake, Even Diabetes

About a month ago, the New York Times ran an interesting article on evaluations of a federal program that gave subsidies to poor families to move into richer neighborhoods.* The article is titled Intangible Dividend of Antipoverty Effort: Happiness, which suggests that the only measurable benefit from the antipoverty program was increased happiness. And that’s how the article begins:

When thousands of poor families were given federal housing subsidies in the early 1990s to move out of impoverished neighborhoods, social scientists expected the experience of living in more prosperous communities would pay off in better jobs, higher incomes and more education.

That did not happen. But more than 10 years later, the families’ lives had improved in another way: They reported being much happier than a comparison group of poor families who were not offered subsidies to move, a finding that was published on Thursday in the journal Science.

That alone is an interesting finding, as it leads to all sorts of tough, reflective policy questions. Is the purpose of government to promote economic well-being and equality? Or more intangible gains like increased happiness? Of course, social science has a solution to this problem: commensurate the intangible with the real:

The improvement was equal to the level of life satisfaction of someone whose annual income was $13,000 more a year, said Jens Ludwig, a professor of public policy at the University of Chicago and the lead author of the study.
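For the curious, here’s a rough sketch of how such income-equivalence calculations are typically done: estimate the effect of log income on life satisfaction, then back out the income change that would produce the same satisfaction gain as the treatment. None of the numbers below come from the study (the article doesn’t report them) – they are placeholders to show the arithmetic.

```python
import math

# All three inputs are invented for illustration only.
beta = 0.30            # assumed effect of ln(income) on life satisfaction
delta_ls = 0.10        # assumed treatment effect on life satisfaction
base_income = 20_000   # assumed comparison-group annual income

# If life satisfaction rises with ln(income), the income gain x that buys
# the same happiness boost solves: delta_ls = beta * ln((base + x) / base)
x = base_income * (math.exp(delta_ls / beta) - 1)
print(f"income-equivalent of the happiness gain: ${x:,.0f} per year")
```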

So, even though the poor didn’t actually get $13,000 more per year in income, they’re as happy as someone who did. Hooray? This bit alone would merit a blog post in my “quantification of everything” series. But what really pushes this article over the top in terms of its reification of the economic comes next:

[T]here was little evidence that the new neighborhoods made much of a difference in either income or education, a disappointment for social scientists, who had hoped that the experiment would lead to new ways of combating poverty.

What researchers did find were substantial improvements in the physical and mental health of the people who moved. Researchers reported last year in The New England Journal of Medicine that the participants who moved to new neighborhoods had lower rates of obesity and diabetes than those not offered the chance to move.

So let’s walk through this one more time. Researchers were disappointed to learn that these intervention efforts had only “intangible” effects, like increased happiness, and not “tangible” effects like increased income. But, oh by the way, participants also had lower rates of diabetes.

You see my confusion, right? I, perhaps naively, would have thought that better physical and mental health outcomes would be one notch more important than higher income.** I mean, income itself is an intangible – it’s a stream of promises. What we care about, what economists have told us to care about since as far back as Adam Smith and before, are not the promises, but what we can get from them. Like, I don’t know, more health and happiness.*** And yet, somehow, finding out that the program increases both health and happiness leads to a headline emphasizing “intangible” dividends because there was no direct effect on income?

* Hat tip to Beth Berman for sending the article along and sharing my confusion over the definition of intangible used in the story.
** I want to emphasize here that my problem is with the framing of the NYT article, and not the study itself (which I have not read). We all know that media reportage of complex (or even relatively simple) social science findings is always through a scanner darkly, so to speak.
*** Also, naively, I would have guessed that most of the benefits would accrue to the next generation, who presumably grew up in neighborhoods with better schools, etc.

Emory Lied About Admissions Data (to Increase Rankings?)

Starting in 2000, Emory University misreported admissions data to U.S. News & World Report (USNWR) and the Department of Education. Specifically, Emory reported the standardized test scores of admitted students rather than enrolled students, thus inflating its average scores. It’s interesting that the university chose to report a strategically incorrect, but real, number rather than simply fabricating data out of whole cloth. I wonder if administrators assumed the practice would be more defensible – “Oops, we sent the wrong spreadsheet” instead of “We just faked it.”
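The mechanics of the inflation are simple: the strongest admits disproportionately enroll elsewhere, so averaging over admitted rather than enrolled students flatters the numbers. A toy illustration (all scores invented):

```python
# Hypothetical SAT scores. The top admits enroll at other schools, so the
# admitted-student average exceeds the enrolled-student average.
admitted = [1550, 1500, 1460, 1400, 1350, 1300]
enrolled = [1460, 1400, 1350, 1300]

print(sum(admitted) / len(admitted))  # ~1426.7 -- the number reported
print(sum(enrolled) / len(enrolled))  # 1377.5  -- the honest number
```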

As Marilyn Strathern, riffing on Goodhart’s law, puts it, “When a measure becomes a target, it ceases to be a good measure.”

Interestingly, USNWR claims the small increase in average SAT scores had no effect on Emory’s ranking:

U.S. News officials said the effect was small. “Our preliminary calculations show that the misreported data would not have changed the school’s ranking in the past two years (No. 20) and would likely have had a small to negligible effect in the several years prior,” said Brian Kelly, the magazine’s editor.

Perhaps it was just an error (well, an error combined with institutional inertia and a touch of cover-up), rather than a strategy. It’ll be interesting to see what the investigation reveals.

What do P-Values and Earnings Management Have in Common?

Marginal Revolution just linked to a paper with one of my favorite titles of all time: Star Wars: The Empirics Strike Back. The paper reports a large meta-analysis of the significance tests published in three top econ journals – the AER, JPE, and QJE – and shows that the distribution of those tests has a dip between .10 and .25 and a slight bump just under .05, a two-humped “camel shape.” The authors argue that this distribution suggests researchers play with specifications to coax findings near significance across the threshold. The paper’s title is actually a substantive finding as well: “Inflation [of significance] is larger in articles where stars are used in order to highlight statistical significance and lower in articles with theoretical models.”
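To see how specification searching can produce that kind of pile-up, here’s a toy simulation – my own illustration, to be clear, not the authors’ method. A “researcher” keeps trying specifications on pure noise until one crosses the p < .05 threshold or the budget runs out:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def reported_p(max_specs=10, n=100):
    """Try specifications on pure noise until a result crosses p < .05
    (or the budget runs out), then report the best p-value found."""
    best = 1.0
    for _ in range(max_specs):
        sample = rng.normal(0.0, 1.0, n)  # the true effect is exactly zero
        best = min(best, stats.ttest_1samp(sample, 0.0).pvalue)
        if best < 0.05:
            break
    return best

ps = np.array([reported_p() for _ in range(5_000)])
# Without searching, 5% of null results would land below .05; with it,
# "reported" significance piles up below the starred threshold.
print(f"share below .05: {(ps < 0.05).mean():.2f}")  # roughly 0.40, not 0.05
```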

I like this finding because it’s much narrower and suggests a much more plausible mechanism than McCloskey and Ziliak’s famous “Standard Error of Regressions.” (For a good critique of M&Z, especially their coding scheme, see Hoover and Siegler.) Rather than simply asserting that economists don’t understand significance, and are part of a “cult”, Brodeur et al. show a small but predictable amount of massaging to push results towards the far-too-important thresholds of .10 and .05. So, they agree in some sense with M&Z that economists are putting too much emphasis on these thresholds, but without an excessive claim about cult-like behavior.

Economists, in fact, look like corporate earnings managers. I’m not super up on the earnings management literature, but various authors in that field argue that corporations have strong incentives to report positive rather than negative earnings, and to meet analyst expectations. The distribution of reported earnings shows just that: fewer companies report very slightly negative earnings than you would expect, and fewer just barely miss analyst expectations than just barely exceed them (see, e.g., Lee 2007; Payne and Robb 2000; Degeorge et al. 1999 – if anyone has a better cite for these findings, please leave a comment!). Like the economists* engaged in the Star Wars, businesses have incentives to coax earnings towards analysts’ expectations.
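The earnings version of the same pattern is easy to mock up. In the toy model below (all parameters invented), firms that would report a small loss often manage the number up just past zero, hollowing out the histogram bin just left of zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented model: underlying scaled earnings are smooth around zero, but
# 60% of firms that would land slightly negative nudge the report positive.
true_earnings = rng.normal(0.0, 0.05, 50_000)
slightly_negative = (true_earnings > -0.01) & (true_earnings < 0.0)
nudged = slightly_negative & (rng.random(50_000) < 0.6)
reported = np.where(nudged, 0.001, true_earnings)

print("just below zero:", np.sum((reported >= -0.01) & (reported < 0.0)))
print("just above zero:", np.sum((reported >= 0.0) & (reported < 0.01)))
# The bin just left of zero comes out suspiciously empty -- the kind of
# discontinuity the earnings-management literature documents.
```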

What’s the larger lesson? I think these examples are both cases of a kind of expanded Goodhart’s Law, or a parallel case to Espeland and Sauder’s “reactivity” of rankings. Another variant that perhaps gets closest is Campbell’s Law, first articulated in the context of high-stakes testing: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” It’s not clear just how “corrupt” economics findings or corporate earnings statements are, but Campbell’s Law and its close cousins remind us to look for both subtle and overt forms of distortion whenever we turn a particular measure into a powerful threshold.

* Note that I am picking on economics here only because the article studied econ journals. I would bet that a similar finding could be obtained in Sociology journals, and probably other social science fields with a heavy statistical bent.

Quantified Status and Credit Scores: A Thought Experiment

Imagine that, at some point in the near future, a large information company like Google or Facebook comes up with a metric of how fashionable or popular an individual is – call it a “quantified status” measure. Klout is one existing attempt, but I don’t think it’s particularly successful. I have in mind something more like “whuffie” from Cory Doctorow’s Down and Out in the Magic Kingdom. Now, imagine that in addition to having a good metric of your reputation, this company also produces statistical evidence showing that influential purchasers provide value to the companies whose products they purchase. If you see Cory Doctorow using an iPad, your perception of iPads shifts – more consumers come to think they’re cool – and iPad sales increase. Think here of Podolny’s work on “status signals.” Similarly, if someone low in status uses your product, it actually costs you sales.* Maybe the relationship isn’t perfect, but it’s statistically significant and substantively important: a high-status person buying your product is worth substantially more than the sale price, while a low-status person either actually costs you money or at a minimum nets you much less than the sale price.

Should the company be allowed to use this quantified status measure to charge different prices to consumers? In other words, if an iPad normally costs $300, should Apple be allowed to sell it for $200 to someone with relatively high status, and $400 to someone with relatively low status? Companies already do a bit of this here and there: giving away free bags and gadgets to celebrities, for example. But here they would take it to the extreme – iPads no longer have a single price, their price is always a function of the purchaser’s status. No one would be outright prohibited from buying iPads, but some individuals would be charged a very large amount to make up for the bit of damage they do to the brand and the “cool factor” of the product.

Now suppose that this quantified status measure correlates very heavily with race, education, class origin, sexuality, gender, age, and/or having certain physical disabilities. Would your answer to the previous question change? Let’s stick with race. Suppose that quantified status had a pretty strong correlation with race, and thus by adopting quantified status as a pricing factor Apple would end up charging black customers $50 more on average, and white customers $10 less on average. But, keep in mind that quantified status never looked at race as a variable – instead, it looked at things like your purchase history, your number of Facebook friends, how much status your friends had, etc. And keep in mind that Apple has a business justification for adopting quantified status – it has statistical models showing that low status customers actually cost it money, and high status customers are worth more than they are paying. To not charge different prices would be to throw money away. Apple isn’t engaging in price discrimination in the classic sense of charging customers by their “willingness to pay,” but rather determining that selling an iPad to Cory Doctorow is actually an entirely different, but commensurable, transaction from selling one to me. What should Apple do? And should the state step in to stop it?
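To make the thought experiment concrete, here’s a toy version of such a pricing rule. Every number is invented – this is the hypothetical, not anyone’s actual pricing:

```python
def status_price(base_price: float, status: float, mean_status: float = 50.0,
                 value_per_point: float = 2.2) -> float:
    """Toy status-contingent price: discount buyers whose visible use helps
    the brand, surcharge those whose use hurts it. All parameters invented."""
    brand_externality = value_per_point * (status - mean_status)
    return base_price - brand_externality

print(status_price(300, status=95))  # a Cory Doctorow pays ~$201
print(status_price(300, status=5))   # a low-status buyer pays ~$399
```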

The example above sounds a little bit fanciful, although most of the elements to make it possible are in place already for some kinds of companies and purchases. But the example also maps very closely onto the actual history of credit scoring (see Martha Poon’s work, among others). Credit scoring came onto the scene in the 1960s and 1970s as a way for consumer credit providers to more accurately assess whether individuals would be likely to default. In the 1990s, mortgage lenders adopted credit scoring as one of the most important indicators of whether a borrower was a good risk. But, rather than simply excluding individuals who were deemed a poor risk, lenders started charging different prices – different interest rates – for loans to customers with lower credit scores. Mortgage lenders in the mid-20th century relied on control-by-screening (not lending to individuals with bad credit histories), but charged relatively similar prices to everyone who received a loan. In the 1990s, they transitioned to control-by-risk, charging a risk-based price to borrowers, with credit scores playing a big part in determining which risk category an individual fell into (see Poon 2009 for details). Credit scores did not explicitly use race, gender, age or other socially meaningful variables (due, in part, to federal legislation prohibiting them from so doing), but they did have strong correlations with some of those characteristics.
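The shift from control-by-screening to control-by-risk is easy to see in stylized form. The cutoffs and rates below are invented for illustration, not actual lending terms:

```python
def screening_rate(score: int):
    """Mid-century control-by-screening: one price for everyone, but you
    may simply be refused. Cutoff and rate are invented."""
    return 0.08 if score >= 620 else None  # None = no loan at any price

def risk_based_rate(score: int) -> float:
    """1990s-style control-by-risk: almost everyone can borrow, but the
    price tracks the score tier. Tiers and rates are likewise invented."""
    if score >= 740:
        return 0.065
    if score >= 680:
        return 0.075
    if score >= 620:
        return 0.090
    return 0.115  # subprime: the loan is priced, not denied

for s in (780, 650, 560):
    print(s, screening_rate(s), risk_based_rate(s))
```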

States and the federal government for the most part allowed lenders to rely on credit scores, properly cleansed of explicit reliance on categories like race and gender. Should they do the same for status? You could argue that credit relationships are long-term, so it makes sense to demand more information about a potential borrower than about the buyer of a homogeneous commodity (an iPad). But if we believe the Podolny story, even these spot purchases may create a lasting tie between a firm’s brand and the consumer’s status, especially for any product consumed publicly. Where do we draw the line and say: no, the transaction is over, and you must treat everyone you transact with equally? And so on.

* When we read Podolny in Mark Mizruchi’s economic sociology seminar, we also read a short article about the champagne maker Cristal being upset that rappers had adopted it as their champagne of choice. Here’s the original Economist story about Cristal and a story about Jay-Z’s response. From the Economist: “Asked if an association between Cristal and the bling lifestyle could actually hurt the brand, [Cristal’s managing director] replies: ‘That’s a good question, but what can we do? We can’t forbid people from buying it. I’m sure Dom Pérignon or Krug would be delighted to have their business.'”

The Quantification of Everything: Webcomic Edition

And surprisingly enough, the webcomic is not xkcd! Saturday Morning Breakfast Cereal has a new comic that mocks the trend towards the quantification of everything:

[Image: SMBC comic on the quantification of everything]

I’m guessing SMBC wasn’t thinking of Ted Porter’s Trust in Numbers, but the comic offers a surprisingly good gloss on it – specifically, on how numbers help us make decisions “without seeming to decide.”

The Quantification of Everything: End-of-the-World Edition

In further proof that everything can be, has been, or is being quantified, check out this quantification of the end of the world. No, I don’t mean MacKenzie’s fascinating work on “the end of the world trade” (insurance against massive economic collapse), but rather The Rapture Index. Billed as “the prophetic speedometer of end-time activity,” the index adds up measures of 45 variables, including “False Christs,” “Moral Standards,” “The Economy,” “Mark of the Beast,” and “Volcanoes.” In case you’re worried, the index currently stands at 182, up one point as of the most recent tabulation. Here’s the full description of the purpose of the index:

The Rapture Index has two functions: one is to factor together a number of related end time components into a cohesive indicator, and the other is to standardize those components to eliminate the wide variance that currently exists with prophecy reporting.

The Rapture Index is by no means meant to predict the rapture, however, the index is designed to measure the type of activity that could act as a precursor to the rapture.

You could say the Rapture index is a Dow Jones Industrial Average of end time activity, but I think it would be better if you viewed it as prophetic speedometer. The higher the number, the faster we’re moving towards the occurrence of pre-tribulation rapture.

Submitted by Carl, with this note: “My interpretation is that God plays dice, but His probability distribution is conditioned on Jimmy Carter’s latest public statement on the one-state solution.”
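Mechanically, by the way, the index works like any composite indicator: score each category, then sum. A minimal sketch (category names from the site; the scores below are invented, not the site’s actual values):

```python
# Toy composite indicator in the style of the Rapture Index.
components = {
    "False Christs": 3,
    "Moral Standards": 4,
    "The Economy": 5,
    "Mark of the Beast": 3,
    "Volcanoes": 2,
    # ...the real index tracks 45 such categories
}
print("index:", sum(components.values()))
```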

The Quantification of Everything: Link Round-up

  • Scoring Grants: Grant Panels as Prom Committees (H/T Pam Smock).

    The first challenge of grant review is the scoring algorithms. The NIH has changed this quite recently from a 1-5 scale with decimals to a 1-9 scale with no decimals. Not surprisingly, changing the scale has not done a great deal for how people map their evaluations to the metric. Some struggle mightily to develop their own algorithms to allocate points and retain measurement fidelity across all applications. Others lead with their gut and are quite idiosyncratic from application to application. The adjudication of these mapping processes happens in those airless hotel conference rooms when the applications are discussed and scored. People engage in brinkmanship, acquiescence, passionate articulate speechifying, and occasionally embarrassing backtracking. The consequence, however, is that the scoring mechanism for any given application is constructed through a rough form of consensus building. Obviously most of the debate occurs in slicing up those on the margins. Everyone in these rooms recognizes the Elvis on Velvet and the Picasso of applications. Everything else is much harder.

  • Teacher Quality: No Student Left Untested (via The Economist’s Democracy in America)

    The state is making a bet that threatening to fire and publicly humiliate teachers it deems are underperforming will be sufficient to produce higher test scores. Since most teachers in New York do not teach tested subjects (reading and mathematics in grades 3-8), the state will require districts to create measures for everything that is taught (called, in state bureaucratese, “student learning objectives”) for all the others. So, in the new system, there will be assessments in every subject, including the arts and physical education. No one knows what those assessments will look like. Everything will be measured, not to help students, but to evaluate their teachers. If the district’s own assessments are found to be not sufficiently rigorous by State Commissioner of Education John King (who has only three years of teaching experience, two in charter schools), he has the unilateral power to reject them.

  • Putting a value on state parks (via Mark Thoma).

    Using conventional economic approaches to estimate the value of recreation time, combined with relatively conservative assumptions, the estimated annual contribution of the state park system is around $14 billion. That value is considerably larger than the annual operation and management costs of state parks.

    I hope Leslie Knope is reading! (A toy version of this back-of-the-envelope arithmetic appears just after this list.)

  • The Quantification of Everything: NGO Rankings

    Readers of the blog likely know my fascination with quantification in all of its delightful and nefarious forms, including questionable rankings. A fellow UM Sociology student sent me a link to this excellent critique: Lies, damned lies, and ranking lists: The Top 100 Best NGOs. Interestingly, Algoso’s critique is not simply that the NGO rankings are arbitrary and their methodology a black box (though that is the case), but that NGOs, unlike (say) universities, are not sufficiently alike to be ranked at all:

    So could we apply that metric/formula approach to NGOs? I don’t think so. As GJ points out, there’s no easy way to compare impacts across social sectors. At least universities are all doing basically the same thing (they educate students, conduct research, run athletic programs, etc.) and are structured in basically the same ways. But Wikimedia Foundation, Ashoka, TED, Search for Common Ground, and MSF? I could not think of a more diverse group of organizations in terms of missions, methods, or structures. How would you ever craft a set of metrics that would apply to all of these, let alone a formula that spits out a number to fairly rank them?

    The problem is not simply that these particular rankings suck, but that the category “NGO” covers too heterogeneous a collection of objects to be worth ranking in the first place. Of course, as we know from the various branches of institutional theory, external forces like rankings often produce conformity where it is lacking to begin with. Whether or not NGOs will become more commensurable (is there an analog to earnings per share or return on equity?) remains to be seen, but it’s not impossible. For the moment, though, rather than use the rankings to increase conformity, Algoso argues for dequantification – getting rid of the rankings entirely – on the grounds that the category is not sufficiently uniform to be worth ranking at all.*

    If any other readers come across interesting examples of unusual or contested quantifications, please send them my way!

    * This strategy is somewhat similar to the one my co-authors (Ellen Berrey and Fiona Rose Greenland) and I have written about in a new project on the dequantification of affirmative action at the University of Michigan. There, “race” was the contested, unsettled category whose lack of homogeneity imperiled a quantified system of admissions. Paper to be presented (hopefully) at ASA this year.
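And, as promised in the state-parks item above, here’s the back-of-the-envelope structure of a recreation-value estimate. Every input below is invented purely to show the arithmetic; the actual study’s method and numbers are more involved:

```python
# Hypothetical inputs for a recreation-value calculation: visits per year,
# hours per visit, and an assumed dollar value of an hour of recreation.
annual_visits = 70_000_000
hours_per_visit = 4
value_per_hour = 50.0

recreation_value = annual_visits * hours_per_visit * value_per_hour
print(f"${recreation_value / 1e9:.0f} billion per year")  # ~$14 billion
```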

The Quantification of Everything: Weighing Faith

I wasn’t familiar with the website/magazine freq.uenci.es (“a collaborative genealogy of spirituality”) until a friend linked to a fascinating article by Lynne Gerber on the Christian weight-loss program First Place. The article documents the ritual weigh-ins that begin meetings of the groups, and how members make sense of their results:

When being weighed, the member steps on the scale and recites the week’s scripture memory verse, one of nine commitments participants make for the duration of the thirteen-week program. The leader writes down the member’s weight in her book—it is almost always a her—along with the member’s success at recalling the verse. The fusion between religiosity and weight loss that marks First Place is exemplified in that moment where the member is held accountable to two sacred symbols of God’s power and will: scripture and the scale.

The article goes into some detail about how members make sense of their successes and failures at meeting the various commitments – both spiritual and physical – required by the program. The end is particularly punchy, and connects the scale and faith again:

The question Christian weight loss programs often poses for scholars of both religion and of dieting culture is similar to the ambiguity in First Place’s purpose: is Christian weight loss essentially a secular venture, luring believers into its programs by adding a spiritual varnish to a worldly practice, or is it merely explicating, marking or making clear the religious concerns that are at the heart of weight loss projects both sacred and secular. … First Place members don’t really care. They are much more taken with tension that mounts as the weigh-in progresses and their faithfulness is about to be measured by number. By collectively divining the scale in the wake of that judgment, the tension between godly ideals and bodily realities are eased and the program maintains its plausibility for another week. [Emphasis Added]

Highly recommended for a quick and interesting Monday afternoon read.