Statistical Analysis of Voting and Elections Data

Descriptive and Inferential Statistical Methods:
Analysis of Voting and Elections
Toni Menninger https://www.slideshare.net/amenning/presentations

Descriptive Statistics:
Voter Turnout by Income, Age, and Gender

Voting and Elections in Statistics
Source: http://www.demos.org/publication/why-voting-gap-matters

The graph displays voter turnout from 2008 to 2012 by income
group. Both longitudinal and cross-sectional data are
presented. This graph is not a frequency distribution. The
variable shown is the conditional probability that a
member of an income group cast a vote in a general
election. This probability is strongly and positively
correlated with income. The gap between presidential
and midterm election turnout is also greatest in the
lowest income group.

[Figure: Distribution of Household Income in the USA. Vertical axis: number of households in millions.]
Source: US Census Bureau 2012

Source: http://www.demos.org

The time series (longitudinal) chart shows voter
registration rates for three income groups since 1972.
The population has been divided into five income
groups (quintiles); it is not stated whether income refers
to household or individual income. Of the lowest quintile,
only about 50% are registered, whereas more than 80% of
the highest quintile are registered. As a result, the
low-income population is heavily underrepresented in the
population of registered voters.

This chart again combines longitudinal data (four consecutive presidential
elections) with cross-sectional data (age). The data represent
conditional probabilities that a member of a certain age group is
registered to vote (blue), that a registered member actually voted
(orange), and that a member voted (the product of the other two,
green). The data indicate a huge gap in voter turnout between the young
and older voters, especially those aged 55-75, who are almost twice as
likely to cast a vote. The gap has shrunk recently. Note that age and income are
correlated, so the income and age gaps are not independent
phenomena. Remember that the 18-30 years age group is quite large
(next chart) but underrepresented in elections.
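
The relationship between the three series on the chart can be checked with a short calculation. The following is a minimal sketch; the two rates are hypothetical placeholders, not values read from the chart, and the function name is only illustrative.

```python
def turnout_rate(p_registered, p_voted_given_registered):
    """Multiplication rule: P(voted) = P(registered) * P(voted | registered).

    This works because only registered citizens can vote.
    """
    return p_registered * p_voted_given_registered


# Hypothetical rates for a single age group (assumed for illustration only).
p_registered = 0.60              # share of the age group that is registered
p_voted_given_registered = 0.80  # share of registered members who voted

p_voted = turnout_rate(p_registered, p_voted_given_registered)
print(f"P(voted) = {p_voted:.2f}")
```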

[Figure: Population Profile of the USA according to the 2010 Census. Population in millions by five-year age group (under 5 years to 90 years and over), shown separately for females and males.]

Source: Census

Here again is a longitudinal view of the age gap.
Over the last 50 years, all age groups, but
especially the young and middle-aged groups,
have become less involved in elections.

The table compares the relative frequency distribution of the whole
population by age group with the distribution of the population of voters
by age group. The age group 18-29 represents 21% of the population but
only 15% of voters. Its voting power is diluted by the age gap in voter
turnout. The age groups 45 and older are overrepresented in the
population of voters relative to their share of the population. The
difference is displayed graphically on the next chart.
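
One way to quantify this over- or underrepresentation is the ratio of a group's share of voters to its share of the population. The sketch below uses only the two figures quoted above for the 18-29 age group; the function name is an illustrative choice, not something taken from the source table.

```python
def representation_ratio(share_of_voters, share_of_population):
    """Ratio above 1 means overrepresented among voters, below 1 underrepresented."""
    return share_of_voters / share_of_population


# Figures quoted in the text for the 18-29 age group:
# 21% of the population, 15% of voters.
ratio = representation_ratio(0.15, 0.21)
print(f"18-29 representation ratio: {ratio:.2f}")
```

A ratio below 1 expresses the dilution of voting power described in the text.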

Source: Census
There is also a gender gap in voting behavior: women are more likely to
vote, except among those 65 years and older.

Source: Census
Further reading
File, Thom. 2013. “Young-Adult Voting: An Analysis of Presidential
Elections, 1964–2012.” Current Population Survey Reports, P20-572.
U.S. Census Bureau, Washington, DC.
http://www.census.gov/prod/2014pubs/p20-573.pdf
File, Thom. 2013. “The Diversifying Electorate—Voting Rates by
Race and Hispanic Origin in 2012 (and Other Recent Elections).”
Current Population Survey Reports, P20-569. U.S. Census Bureau,
Washington, DC.
File, Thom, and Sarah Crissey. 2010. “Voting and Registration in the Election of
November 2008.” U.S. Census Bureau, Washington, DC, May 2010.

Inferential Statistics:
Polls and Election Forecasting

Polls and Election Forecasting
An election poll can be considered a binomial experiment if the
survey question is binary (“do you support candidate X, yes or no?”).
Let p be the “true” proportion of supporters of the candidate in the
population (which can, however, change over time), q = 1 − p, and n the
sample size. The number of supporters in the sample (the sample
statistic) is a binomial random variable with mean = np and
variance = npq ≈ n/4 when p is close to 1/2, therefore
standard deviation SD ≈ √n / 2.
For a large enough sample size, the binomial distribution approaches a
normal distribution. The chance is 95% that the sample statistic will be
within 2 (1.96 to be precise) standard deviations of the mean, that is,
within ± √n of np. Dividing everything by the sample size n shows that
the sample proportion lies within ± 1/√n of p; this range is the 95%
tolerance interval for the sample proportion.
For n = 100, the tolerance interval is p ± 10%; for n = 400, p ± 5%;
for n = 1,000, roughly p ± 3%.
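
The rule of thumb can be compared with the normal-approximation formula it is derived from. The sketch below is illustrative only: the function names are hypothetical, and the default p = 0.5 is the worst-case assumption behind the n/4 variance bound.

```python
import math


def rule_of_thumb_margin(n):
    """Approximate 95% margin for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


def normal_margin(n, p=0.5, z=1.96):
    """Normal-approximation margin: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


# Compare both margins for the sample sizes mentioned in the text.
for n in (100, 400, 1000):
    print(n, f"{rule_of_thumb_margin(n):.3f}", f"{normal_margin(n):.3f}")
```

Because 1.96 is slightly less than 2, the rule of thumb is always a little wider than the normal-approximation margin.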

Example: p = 0.45, n = 100: mean = 45, SD ≈ 5.
The 95% tolerance interval for the sample statistic is [35; 55],
or 45% ± 10%.
For n = 400 we have mean = 180, SD ≈ 10, tolerance interval [160; 200],
or 45% ± 5%. That is, even if the candidate’s level of support is only
45%, a sample showing a level of 49% or more still falls within the
tolerance interval, and from such a sample it would be impossible to
reliably infer the actual outcome of the election.
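
How often a sample would show 49% or more when the true support is 45% can be computed exactly from the binomial distribution. The following is a minimal sketch using only the standard library; the helper name prob_at_least is illustrative and not part of the original material.

```python
from math import comb


def prob_at_least(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.45
for n in (100, 400):
    # Smallest supporter count giving a sample share of at least 49%
    # (integer arithmetic avoids floating-point rounding in the ceiling).
    k = (49 * n + 99) // 100
    print(f"n={n}: P(sample share >= 49%) = {prob_at_least(k, n, p):.3f}")
```

The probability shrinks as n grows, which is the reason larger samples are needed in close races.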
These examples show that most polls have a fairly large uncertainty. It
takes a large sample size for the sample statistic to give a good
estimate of the population parameter (the actual level of support for the
candidate), especially in close races. The reliability of a statistical
estimate is expressed in terms of its confidence interval (which is not
the same as the tolerance interval). When polls are reported, it is
usually stated how reliable they are.
E.g., “plus or minus 3 percentage points, 19 times out of 20”
means that the 95% confidence interval is ± 3 percentage points.
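
A confidence interval is built around the observed sample proportion rather than around the unknown p. The sketch below uses the standard normal-approximation (Wald) interval, which is one common choice and not necessarily the method a given pollster uses; the poll figures are hypothetical.

```python
import math


def wald_interval(supporters, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = supporters / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical poll: 480 of 1,000 respondents support the candidate.
low, high = wald_interval(480, 1000)
print(f"95% confidence interval: [{low:.3f}, {high:.3f}]")
```

With n = 1,000 the half-width comes out at about 3 percentage points, matching the "plus or minus 3" wording typical of poll reports.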

A poll is only valid if it is based on a true random sample, which
is actually difficult to achieve. Refer to book chapters 7.4 and
7.5 for details. It is known that certain population groups are
more difficult to reach and tend to be underrepresented in
polls. A difficulty in recent years has been the increasing
number of people with only cell phone connections.
Two recent articles in the New York Times, one of them related
to polling, the other to the estimate of unemployment,
highlight these difficulties with obtaining random samples:
Why Polls Tend to Undercount Democrats, NYT 10/30/2014
A New Reason to Question the Official Unemployment Rate,
NYT 8/26/2014

More sophisticated election forecasts use data aggregated from
many polls to calculate the probability of each election
outcome. Examples are the New York Times election model
(methodology is explained here) and Nate Silver’s
‘FiveThirtyEight’ election forecast model (see next page).
These forecasters have developed models to predict the overall
outcome of a Senate or presidential election (which are held
state by state). They estimate the probability of each state-wide
outcome and then simulate the national election, using the
estimated probabilities, many times. The simulation outcomes
are used to estimate the probability distribution for the overall
outcome in terms of Senate seats held by each party.
On the NYT election web site, you can see this in action: when
you click a button, a wheel of fortune starts spinning for
each Senate race, displaying the estimated probabilities.
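
The simulation idea described above can be sketched in a few lines. This is a simplified illustration, not the NYT or FiveThirtyEight model: the win probabilities are hypothetical, the function name is invented for this example, and the races are assumed to be independent, which real forecasting models generally do not assume.

```python
import random


def simulate_seats(win_probs, n_sims=10_000, seed=0):
    """Estimate the distribution of seats won by one party.

    win_probs: per-race probabilities that the party wins (assumed independent).
    Returns a dict mapping seat count to estimated probability.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_sims):
        # Simulate every race once and count the seats won.
        seats = sum(rng.random() < p for p in win_probs)
        counts[seats] = counts.get(seats, 0) + 1
    return {k: v / n_sims for k, v in sorted(counts.items())}


# Hypothetical win probabilities for five Senate races.
win_probs = [0.9, 0.7, 0.5, 0.3, 0.1]
dist = simulate_seats(win_probs)
for seats, prob in dist.items():
    print(seats, f"{prob:.3f}")
```

Each simulated election corresponds to one spin of all the wheels; repeating it many times yields the estimated probability of each seat total.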

Voting and Elections in Statistics: Forecasting
Nate Silver’s famous “FiveThirtyEight” model
accurately predicted the 2008 and 2012
presidential elections. For details, see
http://fivethirtyeight.com/interactives/senate-forecast/
