How can we best understand trends in postwar income inequality in the United States? What data are available for understanding these trends? What is the best way to represent these trends visually? In this post, I want to argue that the basic facts of income inequality over the last 65 years require a minimum of two graphs drawing on two data sources. First, I’m going to say a bit about the data, then a bit about the trends, and finally I’m going to show a few possible graphs which cover parts of the story (but none of which is perfect on its own).
Data on Income Distribution
The United States has surprisingly poor historical data about income distribution (and thus, income inequality). More recent years are covered by comprehensive survey datasets like the Panel Study of Income Dynamics. But the crucial period from the end of World War II to the 1960s is covered in only two big datasets: first, the now-famous Piketty and Saez data on top incomes, which go back to 1913, and second, Current Population Survey data, limited to measurements of family rather than household income, which go back to 1947. For whatever reason, the Census historical data on household incomes only start in 1967, presumably reflecting some change in the methodology of the CPS’s annual income supplement.
My favorite dataset for understanding income distribution, the CBO’s post-tax and transfer data, only goes back to 1979. These data combine survey and income tax data, along with estimates of government transfers, in a way that is very difficult for researchers outside the government to replicate. They also attempt to adjust for household size and the nonlinear relationship between expenses and number of people in the household. Thus, the data are probably the best available for looking at real economic outcomes from the bottom of the distribution to the top 1%. As such, these data are the base for Lane Kenworthy’s excellent “best inequality graph.” I recommend his extensive analysis and defense of the graph (the updated version of which is below). I agree that it (or something very similar) is the best graph to cover the post-1970s period, but I will argue that at least two graphs are needed to show what happened to the whole distribution from 1947 to present.
Stylized Facts of Inequality, 1947-2011
As suggested by the above graph, one of the most important (and recently discovered) facts about inequality in the 20th century is the dramatic growth of incomes at the very top combined with the stagnation of real income for most of the distribution. The stagnation in wages for the middle of the distribution starts in the late 1970s/early 1980s, and persists to the present. The top 20% or 10% do a bit better, and the very top (the top .01%) do incredibly well (as I will show in a moment). But what happened before, in the crucial postwar golden years of 1947-1978(ish)?
To me, the most salient feature of the 1940s-1970s income distribution is how every part of the distribution rose relatively equally. Specifically, between 1947 and 1978, the income thresholds for the 20th, 40th, 60th, 80th, and 95th percentiles all doubled. Fascinatingly, during this time, the incomes at the very top stagnated. These trends diverged in the 1980s – top incomes kept going up, and the very top skyrocketed, while most incomes stagnated.
Alright, so now we have a sense of the basic facts and the best available data. How can we best visualize them?
Two Possible Graphs
The brilliant thing about Kenworthy’s graph is that it manages to portray so viscerally the stagnation at the bottom alongside the growth at the top while using actual dollar magnitudes. When we switch to telling the whole postwar income distribution story, however, I’m not sure we can do it cleanly with actual dollar amounts. At least, the best things I’ve come up with so far involve normalizations instead. If you’d like to try your hand at it, I’m happy to provide the spreadsheet from which these graphs were generated. So, the first graph tries to show the equality of gains across the distribution followed by the rupture in the late 1970s.
This graph shows the thresholds for the real (inflation-adjusted) 20th, 50th (median), 80th, and 95th percentiles of family income from 1947 to 2011, with 1947 set to 100. It shows the unified growth of incomes up through the late 1970s, and then the divergence as the median and 20th percentile stagnate while the top quintile continues to increase. This increase levels off for both the 80th and 95th percentiles in the late 1990s, and over the last decade incomes have basically been flat at all levels. But this paints a distorted picture of the very top of the income distribution. While the 95th percentile has tripled since 1947, and increased by about 50% since 1980, the very top have done a lot better. So, here comes the Piketty and Saez data, mixed with a dash of not quite commensurable census data:
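For anyone who wants to redo the normalization outside a spreadsheet, here is a minimal sketch of the "1947 = 100" indexing in Python. The function name `index_to_base` and the dollar figures are made up for illustration; they are not the actual CPS thresholds.

```python
def index_to_base(series, base_year, base_value=100.0):
    """Rescale a {year: value} series so the base year equals base_value."""
    base = series[base_year]
    if base == 0:
        raise ValueError("base-year value must be nonzero")
    return {year: base_value * value / base for year, value in series.items()}


# Made-up thresholds in constant dollars (NOT the real CPS numbers),
# chosen so both series double by 1978 and then diverge.
p20 = {1947: 10_000, 1978: 20_000, 2011: 21_000}
p95 = {1947: 40_000, 1978: 80_000, 2011: 120_000}

p20_indexed = index_to_base(p20, 1947)
p95_indexed = index_to_base(p95, 1947)

for year in sorted(p20):
    print(year, p20_indexed[year], p95_indexed[year])
```

Indexing throws away the dollar levels, which is exactly the trade-off: two series that start far apart in dollars sit on top of each other for as long as they grow at the same rate, and any later gap between the lines is a gap in cumulative growth.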
We borrow the median income data from the previous graph and combine it with the top income thresholds from the Piketty and Saez dataset, all inflation-adjusted, all normalized to set 1947=100. Also, these are the Piketty and Saez data excluding capital gains (which would make the picture look even more extreme, but also less comparable as the Current Population Survey doesn’t capture capital gains well). What do we see? Median income still rises and then flattens out post-1980. The 90th percentile follows much the same trend, but flattens out a bit less. In contrast, the 99th and 99.99th percentiles behave quite differently, staying relatively flat in the 1950s-1960s, and skyrocketing in the 1980s. The trend at the very top (the 99.99th percentile) is particularly striking. These very elite, top incomes didn’t budge from 1947 to 1978. Then, they take off like gangbusters, increasing by a factor of 6 in just 30 years. The 99th percentile follows the same trend, but much less sharply.
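One way to put the two eras on the same footing is to convert cumulative growth factors into annualized rates. The sketch below does that for the "factor of 6 in 30 years" at the very top and for the earlier doubling of the broad thresholds. The helper name `annualized_growth` is hypothetical, and treating the doubling as spanning 1947 to 1978 is my assumption about the exact endpoints.

```python
def annualized_growth(factor, years):
    """Constant yearly growth rate that compounds to `factor` over `years` years."""
    if factor <= 0 or years <= 0:
        raise ValueError("factor and years must be positive")
    return factor ** (1.0 / years) - 1.0


# Very top (99.99th percentile): roughly sixfold over about 30 years.
top_rate = annualized_growth(6.0, 30)

# Broad thresholds: roughly doubled, assuming a 1947-1978 span.
broad_rate = annualized_growth(2.0, 1978 - 1947)

print(f"very top, post-1978: {top_rate:.1%} per year")
print(f"broad thresholds, 1947-1978: {broad_rate:.1%} per year")
```

Under these assumptions the very top grows at a bit over 6% a year in real terms after 1978, against a bit over 2% a year for the broad thresholds during the golden era.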
So, together, what do these graphs show? The postwar golden era was one of rising incomes for everyone but the superrich. The 1980s-2000s saw stagnating incomes in the middle of the distribution, small gains at the top, and massive gains at the very top.
There are lots of other ways you could graph these data. You can show actual income on regular or logged scales, you can look at simple ratios (90/50) that more directly capture our understanding of inequality, and so on. I like these because they show trends very nicely, and they highlight the stylized facts that I think most usefully characterize the income distribution in this period. What do you think? Suggest an alternative, or ping me for the data and plot it yourself!
Kevin Bryan, of A Fine Theorem, published a nice detailed paper on this topic in 2008. Bryan and co-author Martinez use data from the CPS, Piketty and Saez, and Social Security data which I had missed in my discussions. That paper also has some nice examples of what you can do with 90/50 and 50/10 ratios, and logged graphs. Here’s one example:
This figure shows and then decomposes the 90/10 gap: “Figure 2 presents the evolution of log income ratios. It shows that from 1961 to 2002, the CPS March log 90-10 ratio increased from 1.23 to 1.61. The ratios computed using the CPS ORG data set behave similarly. Figure 2 also shows that the vast majority of the increase in the log 90-10 ratio is due to an increase in the 90-50 ratio.”
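The decomposition works because logs turn ratios into sums: log(P90/P10) = log(P90/P50) + log(P50/P10), so any change in the 90-10 gap splits cleanly into an upper-half and a lower-half piece. Here is a small sketch of that identity. The thresholds are hypothetical, and reading the quoted 1.23 and 1.61 as natural logs is my assumption, not something the quote states.

```python
import math


def log_ratios(p10, p50, p90):
    """Return (log 90/10, log 90/50, log 50/10) for three positive thresholds."""
    return math.log(p90 / p10), math.log(p90 / p50), math.log(p50 / p10)


# Hypothetical thresholds, chosen only to illustrate the identity.
total, upper, lower = log_ratios(p10=12_000, p50=30_000, p90=60_000)
print(math.isclose(total, upper + lower))  # the two halves sum to the whole

# Assuming natural logs, convert the quoted log 90-10 ratios back to raw ratios.
ratio_1961 = math.exp(1.23)
ratio_2002 = math.exp(1.61)
print(round(ratio_1961, 2), round(ratio_2002, 2))
```

If the natural-log reading is right, the quoted figures mean the 90th percentile went from roughly 3.4 times the 10th percentile in 1961 to roughly 5 times in 2002.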
 That I know of! Experts on income data, please come forward and let me know of any that I’ve missed!
 When the US first started collecting income taxes, and thus generated good data on top income earners.
 What’s the difference between a household and a family, according to the Census? Glad you asked: “A family consists of two or more people (one of whom is the householder) related by birth, marriage, or adoption residing in the same housing unit. A household consists of all people who occupy a housing unit regardless of relationship. A household may consist of a person living alone or multiple unrelated individuals or families living together.”
 I am currently working on a paper / dissertation chapter on the history of income distribution data which tries to understand why it took so long for the growth in top incomes in the 1980s to become widely discussed (e.g. “the 1%” that became such a topic of academic and political interest in the 2000s). Send me an email if you’d like to read a (very) preliminary version, or attend my presentations at SASE or ASA this summer.
 This is the part where data vis folks can make fun of me for using Excel. I know, I’m sorry. One of these days I plan to do more than just tinker with R and actually get it to do what I want. Until then, we can all suffer through.
 It’s worth putting in a reminder that the thing being graphed here is the distribution of income, more specifically, the threshold needed to be in a certain part of the income distribution in different years. Individuals follow income trajectories, and don’t stay in exactly the same place over time. Questions around the stability of income within-individuals and across generations are exactly what panel studies like the PSID are designed to show. Unfortunately, as far as I know, they don’t go back much before the late 1960s. In response to criticisms along these lines, Statistics Canada has recently published some data on stability in the very top income earners (using confidential tax data) which suggests that “Four-fifths of Canadians in the top five income percentile have consistently been there in the past five years, the statistics show, and the proportion of people remaining in the upper echelons has been growing since the early 1980s.” So, the 1%, in Canada at least, is a consistent group of individuals, not simply a statistical artifact as individuals rotate in and out. The US does not publish similar official income data, nor similar data on mobility into and out of the 1%.
Kopczuk, Saez, and Song (2010) also have a nice paper on the US using Social Security data which tries to determine how much of the increase in inequality is due to transitory vs. permanent dynamics. They conclude: “the evolution of annual earnings inequality over time is very close to the evolution of inequality of longer term earnings.” (94-95) Kopczuk et al. also find that those who are in the top 1% of earners in one year are 80% likely to be in the top 1% the following year, and 60% likely five years later, again suggesting that the top 1% is a meaningful group. More broadly, it seems like Social Security data have real promise for producing income inequality measures and graphs going back to World War II, but have thus far been used by only a handful of scholars due to their lack of public availability. One nice feature of the Social Security data is the inclusion of (a few) demographic variables including gender and race. For example, this nice graph shows that women make up only about 14% of the top 1% of income earners, and only about 22% of the top 10% of income earners, even as they make up about 44% of all workers (all data through 2004).