So, I’m hoping to avoid too much election coverage in this blog, but when Survey USA released general election polling results for all 50 states, I decided I had to dig into the data a bit deeper. In particular, I wanted to understand the claim that winning a primary implies that a candidate will do better in the general election in that state. For example, Senator Clinton might claim that her strong showing in Ohio suggests she’s more likely to win the state in the general election. The question is: Does this argument hold any water? On one hand, a strong showing in a primary might suggest that a candidate is highly favored by the base but has little crossover appeal, in which case primary results would be negatively correlated with general election polls. On the other hand, strong performance (especially in a primary rather than a caucus, more on that later) might mean that a candidate has a lot of support among independents, and thus would be likely to do well in pre-election polls. So, we begin the analysis with no particular conclusion in mind, with perhaps a slight bias towards the second story, if for no other reason than that it is the argument most commonly made.
So far, my analysis suggests that the second story has more merit, but wait until the end for a few more details. The particular finding: For every 10% by which Sen. Clinton’s vote total exceeded Sen. Obama’s in a primary or caucus, she was predicted to improve about 4% against Sen. McCain in that state (as compared to Obama). The same holds true in reverse for Sen. Obama (perforce, by the logic of linear regression). This result holds up when the analysis is restricted to large states only, meaning those worth 15 or more electoral votes. Only 7 of these have had meaningful primaries, as MI and FL don’t count: first, Obama was not on the MI ballot; second, neither candidate campaigned in either state, making the vote totals a little meaningless. For those with a bit of statistical background, the correlation for all states is .714 (which is quite high).
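For anyone who wants to check this kind of number themselves, here is a minimal Python sketch of the calculation. It assumes one pair of values per state: Clinton's primary margin over Obama, and the gap between her poll margin and his against McCain. The function name and the sample numbers are placeholders of mine, not the actual data behind the post.

```python
from math import sqrt

def slope_and_correlation(xs, ys):
    """Return (OLS slope of y on x, Pearson r) for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Placeholder values only, NOT the Survey USA or primary results.
primary_margin = [10.0, -20.0, 5.0, -35.0, 15.0]  # Clinton minus Obama, in points
poll_gap = [3.0, -9.0, 4.0, -12.0, 5.0]           # Clinton's margin vs McCain minus Obama's

slope, r = slope_and_correlation(primary_margin, poll_gap)
print(f"slope = {slope:.2f}, r = {r:.3f}")
```

A slope of about 0.4 on the real data would correspond to the "4% for every 10%" finding, and r is the correlation quoted above.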
Here’s a quick, ugly scatterplot of the data (I apologize for not knowing how to make a better graph):
When restricted to only those states where either Sen. Clinton or Sen. Obama is within 8 percentage points of Sen. McCain in the Survey USA data (one way of approximating the idea of swing states), the result remains nearly identical. This list includes 22 of the 39 valid cases. The list of states that would be defined as swing states is as follows: AK, CO, DE, FL, HI, IA, ME, MA, MI, MN, MO, MT, NE, NV, NH, NJ, NM, NC, ND, OK, OR, PA, SC, SD, TN, TX, VA, WA, WV, WI.*
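The 8-point rule is easy to state in code. This is a rough Python sketch, assuming each state comes with two poll margins against McCain (one for Clinton, one for Obama) and that "within 8" means strictly less than 8, as in the footnote. The function name and labels are my own, and the Hawaii figure for Obama is a placeholder standing in for "landslide".

```python
THRESHOLD = 8.0  # points; the cutoff used for the poll-based swing-state list

def classify(clinton_margin, obama_margin, threshold=THRESHOLD):
    """Label a state by whose poll margin against McCain falls inside the threshold."""
    clinton_close = abs(clinton_margin) < threshold
    obama_close = abs(obama_margin) < threshold
    if clinton_close and obama_close:
        return "both"
    if clinton_close:
        return "clinton_only"
    if obama_close:
        return "obama_only"
    return "not_swing"

# Margins are (Clinton minus McCain, Obama minus McCain) in points.
polls = {
    "OH": (10.0, 10.0),  # both roughly 10 points ahead, per the post
    "HI": (4.0, 30.0),   # Clinton by 4; 30.0 is a placeholder for Obama's landslide
}

for state, (clinton, obama) in polls.items():
    print(state, classify(clinton, obama))
```

A state counts as a swing state under this definition whenever the label is anything other than `not_swing`.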
This list has some serious issues, however. In particular, Ohio, widely regarded by all parties as a swing state, does not make the cut: perhaps because of the candidates’ intense campaigning there in the last few weeks, both Obama and Clinton poll about 10 points ahead of McCain in Ohio right now. Additionally, for similar reasons, Texas was within 8 points for both candidates, even though the state will be very difficult for either of them to win. Hawaii is another strange outlier: Obama is favored to win by a landslide, but Clinton only by 4. So, to generate a list that more closely approximates the states that political pundits see as swing states, I asked a politically savvy friend to generate a list independent of the Survey USA data. Here’s the list:
Max’s Swing States: CO, FL, GA, IN, IA, MI, MN, MO, NV, NH, NM, NC, OH, OR, PA, VA, WV, WI.
This list maps very closely onto the list of states that were within 5% in the 2004 election, with a few additions (principally MO, IN, GA, NC, VA and WV, of which GA and IN are probably the most contentious). When the analysis is restricted to this list of swing states, which is tighter but perhaps more in line with the common wisdom, the effect completely drops out.
Let me reiterate that: In the states most commonly thought of as swing states, performance in the primary does not usefully predict general election polls against McCain. To say that a third way, the margin by which Sen. Clinton defeated or lost to Sen. Obama in one of those swing states was not significantly correlated with how well she performed (vis a vis Obama) in the general election polls. The correlation in those states was only .248 – not tiny, but much smaller than for the overall sample.
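For the statistically inclined, one common way to judge whether a correlation is "significant" is to convert r into a t statistic with n - 2 degrees of freedom. The sketch below is only an illustration of that standard formula, not necessarily the test I ran. The sample sizes in the loop are hypothetical, since the number of swing states with valid primaries isn't given here.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r from n pairs (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return r * sqrt((n - 2) / (1.0 - r * r))

r_swing = 0.248  # the swing-state correlation reported above

# Hypothetical sample sizes; 18 is the size of the full swing-state list.
for n in (10, 15, 18):
    print(n, round(t_statistic(r_swing, n), 2))
```

For samples of this size, the usual two-sided 5% cutoff for t is roughly 2.1 to 2.3, and a correlation of .248 comes out well short of that at any of these sample sizes.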
Here’s another ugly scatterplot showing the lack of relationship:
Another analysis we could run would exclude the results of caucus states. Obama’s victories in the caucus states, and in particular the size of those victories, have been seen as something of an aberration, since a very excited minority has a much bigger say in the smaller caucus process. So, using the categorization provided by cnn.com, I reran the same analysis (the same single independent variable and single dependent variable) for primaries only. For the 26 states with primaries (as opposed to caucuses), the story is much the same as in the first two runs above: a strong relationship (r=.737).
So what do we conclude? There is some evidence for the argument that a strong showing in a primary predicts a relatively stronger showing in the general election. However, there is some reason to doubt whether this general trend will apply in the states that matter most: important swing states like Ohio (where Obama and Clinton are polling almost identically in general election matchups, both ahead 10 points). So, long story short, do not take at face value a candidate’s assertion that a strong showing in a state’s primary implies they will win that state handily in the general election. The data may simply not support it.
As a kicker, here’s some pure descriptive stuff that might be much more important.
The average of (HC-JM)-(BO-JM) is -5%, which is to say that Obama does 5% better against McCain than Clinton does, averaged over all states. In the swing states in particular, Obama does quite well. Here’s a bar chart showing the dependent variable, Clinton vs Obama net McCain, across Max’s Swing States:
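To make the dependent variable concrete, here is a small Python sketch of Clinton's margin over McCain minus Obama's margin over McCain. It assumes the two margins come from separate head-to-head matchups, so McCain's share can differ between them and the two McCain terms do not simply cancel. The function names and the sample rows are placeholders, not the real poll numbers.

```python
def clinton_vs_obama_net_mccain(clinton, mccain_vs_clinton, obama, mccain_vs_obama):
    """Clinton's margin over McCain minus Obama's margin over McCain, in points."""
    return (clinton - mccain_vs_clinton) - (obama - mccain_vs_obama)

def average(values):
    """Plain arithmetic mean; raises on an empty input."""
    values = list(values)
    if not values:
        raise ValueError("no values to average")
    return sum(values) / len(values)

# Placeholder rows: (Clinton, McCain vs Clinton, Obama, McCain vs Obama).
rows = [
    (50.0, 40.0, 52.0, 38.0),
    (45.0, 47.0, 48.0, 44.0),
]

gaps = [clinton_vs_obama_net_mccain(*row) for row in rows]
print(gaps, average(gaps))
```

A negative average means Obama does better against McCain than Clinton does, which is the direction of the result reported above.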
If anyone wants my data file, I’m happy to provide it; just drop me a comment with your email address. If anyone with more statistical sophistication than I have wants to suggest better ways to run the numbers, I would love to try some other methods, as long as they keep the goal of being descriptive (not making strong claims of any sort based on so few cases and a single poll) and easy to present. For now though, I am off to bed. Tomorrow, a post on why seating the Michigan and Florida delegations would disenfranchise voters who trusted the national level Democratic party, and another on social construction and booze (assuming I get around to them).
* In case you were curious:
The following are only swing states for Clinton (i.e. |Clinton-McCain| < 8 ):
CO, DE, HI, IA, ME, OK, TN, WA, WV, WI
The following are only swing states for Obama:
AK, FL, MA, MT, NE, ND, SD, VA
The following are swing states for both:
MI, MN, MO, NV, NH, NJ, NM, NC, OR, PA, SC, TX