David Brooks has a new op-ed about “the philosophy of data.”* The examples he uses are interesting in their own right – the lack of streaks or hot hands in professional sports, the relatively small impact of campaign spending on election outcomes, and pronoun usage vs. self-confidence – but they are also interesting in that I don’t think any of them really rise to the level of “big data” or at least, if they do, they are not especially new forms of “big data.” No one’s entirely sure what big data is – my favorite definition is something like “any dataset too big to fit in Excel” – but I tend to have in mind data produced as a by-product of regular activity that provides a means for unobtrusive analysis of individual behavior. Think of your Google search stream, the set of all public Tweets, your purchases at Amazon, etc. Brooks’ examples are all older and more banal, in some sense: people have been tracking election outcomes, sports statistics, and language usage for a long time. Still, I wonder if all the rumblings about big data in academia and the corporate world are going to spawn more reflections about data-driven decision-making, or “the philosophy of data” in general.
That said, I wish Brooks had talked a bit more about the parts of big data that I see as potentially most problematic or downright Orwellian. Think here of the story of Target realizing (algorithmically predicting, anyway) that a teenaged girl was pregnant before even her parents knew (reported here). Here privacy concerns collide with massive corporations that collect data on our every movement and analyze it with sophisticated tools.
On the other hand, and perhaps closer to Brooks’ story, think of the major narrative around the presidential campaign and big data: the Obama campaign’s “victory lab,” and its ability to micro-target voters and donors in a way far surpassing the Republican efforts. Read side by side** with the political science findings on the relative unimportance of campaign spending and get-out-the-vote (GOTV) efforts, the story has an ironic twist. Research by Sides and other political scientists suggests that this fundraising and GOTV effort was only marginally effective, if at all. Obama won for the same reasons other candidates in the past won: in this case, an incumbent presiding over a growing economy usually wins. A slowly growing economy predicts a small win. Voilà. So, in this case, small data successfully predicts the relatively small value of big data.
*H/T to FRG for sending it along.
**Or side by Sides?