We tend to think of information overload as a new problem. Kristin Luker’s excellent research-methods advice guide/textbook, Salsa Dancing into the Social Sciences, is subtitled Research in an Age of Info-glut. But what is info-glut? And when did it start?
The mathematician in me wants to define info-glut in a sort of peculiar way.* Think back to algebra. It won’t hurt, I promise! Suppose you’ve got a system of equations with a few unknowns. There are only three possibilities:

1. You have fewer equations than unknowns, and the equations alone can’t pin down an answer.
2. You have exactly as many (independent) equations as unknowns, and the system has exactly one solution.
3. You have more equations than unknowns, and in general no single answer satisfies them all.
The last answer – known as an “overdetermined system” – is how I think about info-glut. If you are in situation two, your path is straightforward: there is one right answer**. Situation one is desperate, and there’s no right path. In situation three, you have too much to work with, and not everything can be right.
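To see situation three concretely, here is a minimal numpy sketch of an overdetermined system (the numbers are invented for illustration): three equations in two unknowns, where no single pair of values satisfies all three exactly.

```python
import numpy as np

# Situation three: more equations than unknowns.
# Three equations in two unknowns (x, y), slightly inconsistent:
#   x + y  = 2.0
#   x - y  = 0.1
#   2x + y = 2.9
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])
b = np.array([2.0, 0.1, 2.9])

# Solving just the first two equations gives x = 1.05, y = 0.95 --
# but that answer misses the third equation by a residual of 0.15.
xy = np.linalg.solve(A[:2], b[:2])
residual = A[2] @ xy - b[2]
print(xy, residual)
```

Whichever two equations you pick to solve exactly, the leftover equation is off by a bit: too much to work with, and not everything can be right.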
But what if you already know that your information isn’t quite right? In other words, what if you think your observations have some error? The obvious answer to us moderns is to combine the data into a statistic, like an average.
In 1749, no one had yet mastered this trick.*** Euler, a justly famous and important mathematician, faced with too many observations of the orbits of Jupiter and Saturn for too few unknowns, wrote, “Now, from these equations we can only conclude nothing; and the reason, perhaps, is that I have tried to satisfy several observations exactly, whereas I should have only satisfied them approximately; and this error has then multiplied itself.” (Quoted in Stigler 1990: 27). Euler’s failure to deal with his info-glut led him so far as to question whether Newton’s inverse square law held over large distances (ibid: 30).
Writing just one year later about the libration of the moon, Tobias Mayer proposed a first solution to the problem of combining observations. Given 27 equations (observations) in three unknowns, Mayer strategically grouped the 27 into three groups of nine and added them up within those groups. He thus reduced his overdetermined system of 27 equations to an exactly determined system of three, while still capturing something from each observation. His info-glut was gone.
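Mayer’s trick is easy to sketch. Here is a toy version in numpy – the data are simulated, not his lunar observations, and the grouping here is just sequential, whereas Mayer grouped his equations strategically by the size of their coefficients:

```python
import numpy as np

# A toy sketch of Mayer's "method of averages" on simulated data.
rng = np.random.default_rng(0)
true_beta = np.array([1.5, -0.7, 3.0])              # three unknowns
A = rng.normal(size=(27, 3))                        # 27 observation equations
b = A @ true_beta + rng.normal(scale=0.05, size=27)  # noisy right-hand sides

# Sum the 27 equations within three groups of nine, reducing the
# overdetermined system to an exactly determined 3x3 one.
# (Mayer chose his groups strategically; we group sequentially here.)
A_grouped = A.reshape(3, 9, 3).sum(axis=1)
b_grouped = b.reshape(3, 9).sum(axis=1)

# Solve the exactly determined system.
beta_hat = np.linalg.solve(A_grouped, b_grouped)
print(beta_hat)
```

Because each grouped equation pools nine observations, the individual errors partly cancel rather than multiply – the thing Euler could not yet make happen.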
Laplace improved on Mayer’s method, and soon after, this “method of averages” was replaced by the modern least squares estimators. All of them solve the same fundamental problem: how to estimate a set of unknowns given too much information (none of it perfect). We’ve long since automated these solutions – check out your favorite statistics package or even just a spreadsheet program – and we no longer think of having too many observations as a “problem”. But Euler, working before this first solution, did.
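The automated modern version is a one-liner. Here is a least squares sketch in numpy, run on the same sort of small invented system as above:

```python
import numpy as np

# An overdetermined system: three slightly inconsistent equations
# in two unknowns.
A = np.array([[1.0, 1.0],
              [1.0, -1.0],
              [2.0, 1.0]])
b = np.array([2.0, 0.1, 2.9])

# lstsq finds the x minimizing ||Ax - b||^2 -- it satisfies the
# equations "approximately" rather than exactly, which is precisely
# the move Euler had not yet made.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)
```

No equation is satisfied exactly; instead the squared errors are made as small as possible, and the info-glut quietly disappears into the estimate.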
All of this is to say that new statistical techniques, especially ones that become settled conventions, solve problems of info-glut. These solutions encode particular choices about which features of the world to emphasize – for example, assuming normal distributions makes us think in terms of mean and variance, and not so much about “fat tails”, a problem for modern finance (see MacKenzie’s work). Our modern iteration of this centuries-old problem differs only in quantity****, not in kind. The proliferation of kinds of data awaits new methods for simplifying them. Info-glut, in other words, is relative to the advance of statistical methods that become taken-for-granted and institutionalized.
* To be fair, Luker is specifically talking about info-glut in the context of qualitative and historical (and “non-canonical”) research, and so this mathematically-inspired presentation doesn’t really do her work justice. On the other hand, the boundaries between qualitative and quantitative are never quite so rigid. See, for example, natural language processing methods.
** As Asimov noted, “The number two is ridiculous and can’t exist.”
*** The rest of this comes from Stigler’s excellent The History of Statistics: The Measurement of Uncertainty before 1900, chapter 1.