David Brooks has a new op-ed about “the philosophy of data.”* The examples he uses are interesting in their own right – the lack of streaks or hot hands in professional sports, the relatively small impact of campaign spending on election outcomes, and pronoun usage vs. self-confidence – but they are also interesting in that I don’t think any of them really rise to the level of “big data” or at least, if they do, they are not especially new forms of “big data.” No one’s entirely sure what big data is – my favorite definition is something like “any dataset too big to fit in Excel” – but I tend to have in mind data produced as a by-product of regular activity that provides a means for unobtrusive analysis of individual behavior. Think of your Google search stream, the set of all public Tweets, your purchases at Amazon, etc. Brooks’ examples are all older and more banal, in some sense: people have been tracking election outcomes, sports statistics, and language usage for a long time. Still, I wonder if all the rumblings about big data in academia and the corporate world are going to spawn more reflections about data-driven decision-making, or “the philosophy of data” in general.
That said, I wish Brooks had talked a bit more about the parts of big data that I see as potentially most problematic or downright Orwellian. Think here of the story of Target realizing (algorithmically predicting, anyway) that a teenaged girl was pregnant before even her parents knew (reported here). Here, privacy concerns collide with massive corporations that collect data on our every movement and analyze it with sophisticated tools.
On the other hand, and perhaps closer to Brooks’ story, think of the major narrative around the presidential campaign and big data: the Obama campaign’s “victory lab”, and their ability to micro-target voters and donors in a way far surpassing the Republican efforts. Read side by side** with the political science findings on the relative unimportance of campaign spending and GOTV efforts, the story has an ironic twist. Research by Sides and other political scientists suggests that this fundraising and GOTV effort was only marginally effective, if at all. Obama won for the same reasons other candidates in the past won: in this case, an incumbent presiding over a growing economy usually wins. A slowly growing economy predicts a small win. Voilà. So, in this case, small data successfully predicts the relatively small value of big data.
*H/T to FRG for sending it along.
**Or side by Sides?
Carlos Ferreira
/ February 5, 2013
I sometimes think that “big data”, like “cyberspace”, reflects absolutely nothing. It is a vision of a future, sure, but one which is conditional. I have no doubt that data availability will contribute to reorganising society and institutions, but it will do so in ways we are yet to understand.
jlundy
/ February 5, 2013
As someone eagerly following the sudden interest in “big data,” I appreciate it coming up on your radar. (And ICOS’s too? Is there an event or something that should be mentioned?)
1) Totally agree on the lack of definition of its meaning. I think you’re right about it being large datasets gathered automatically. For instance, IBM’s Dave Bartlett refers to millions of devices out there collecting data autonomously that should be used to more intelligently manage buildings.
2) Even more strongly agree about the recent turn to data-driven philosophy. Corporations in particular love “data” and “metrics,” but I find that they are remarkably un-skeptical about what counts as data. As a semi-corporate researcher myself, I love the turn toward empiricism; but I also constantly struggle to make sure that data is actually academic-grade, properly grounded data.
Martin Barron
/ February 5, 2013
Your definition of “Big Data” is pretty close to what Bob Groves calls “Organic Data” (as opposed to “Designed Data”). You might find it interesting.
http://poq.oxfordjournals.org/content/75/5/861.abstract
Dan Hirschman
/ February 5, 2013
Thanks! I really like the term “organic data.”