Methodology and Ontology in Statistical Modeling:
Some error statistical reflections
Our presentation falls under the second of the bulleted
questions for the conference:
How do methods of data generation, statistical
modeling, and inference influence the
construction and appraisal of theories?
Statistical methodology can influence what we think
we’re finding out about the world, in the most
problematic ways, traceable to such facts as:
• All statistical models are false
• Statistical significance is not substantive
significance
• Statistical association is not causation
• Absence of evidence against a statistical null
hypothesis is not evidence the null is true
• If you torture the data enough they will confess.
(or just omit unfavorable data)
These points are ancient (lying with statistics; “lies,
damned lies, and statistics”)
People are discussing these problems more than ever
(big data), but what is rarely realized is how much
certain methodologies are at the root of the current
problems
All Statistical Models are False
Take the popular slogan in statistics and elsewhere:
“all statistical models are false!”
What the “all models are false” charge boils down to:
(1) the statistical model of the data is at most an
idealized and partial representation of the actual
data generating source.
(2) a statistical inference is at most an idealized and
partial answer to a substantive theory or question.
• But we already know our models are
idealizations: that’s what makes them models
• Reasserting these facts is not informative.
• Yet they are taken to have various (dire)
implications about the nature and limits of
statistical methodology
• Neither of these facts precludes using such models
to find out true things
• On the contrary, it would be impossible to learn
about the world if we did not deliberately falsify
and simplify.
• Notably, the “all models are false” slogan is followed
up by “But some are useful.”
• Their usefulness, we claim, is being capable of
adequately capturing an aspect of a phenomenon of
interest
• Then a hypothesis asserting its adequacy (or
inadequacy) is capable of being true!
Note: All methods of statistical inference rest on
statistical models.
What differentiates accounts is how well they
step up to the plate in checking adequacy, learning
despite violations of statistical assumptions
(robustness)
Statistical significance is not substantive significance
Statistical models (as they arise in the methodology of
statistical inference) live somewhere between
1. Substantive questions, hypotheses, theories H
2. Statistical models of phenomena, experiments,
data: M
3. Data x
What statistical inference has to do is afford
adequate link-ups (reporting precision, accuracy,
reliability)
Recent Higgs reports give evidence of a real (Higgs-like)
effect (July 2012, March 2013)
Researchers define a “global signal strength” parameter
H0: μ = 0 corresponds to the background (null hypothesis),
μ > 0 to background + Standard Model Higgs boson signal,
but μ is only indirectly related to parameters in substantive
models
As is typical of so much of actual inference (experimental
and non), testable predictions are statistical:
They deduced what would be expected statistically from
background alone (compared to the 5 sigma observed)
in particular, alluding to an overall test S:
Pr(Test S would yield d(X) > 5 standard
deviations; H0) ≤ .0000003.
This is an example of an error probability
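As a quick arithmetic check (a sketch, assuming for illustration that the test statistic is approximately standard normal under H0, a simplification of the actual Higgs analysis), that error probability can be computed directly:

```python
from math import erfc, sqrt

# Upper-tail probability of a standard normal beyond z sigma:
# Pr(Z > z; H0) = 0.5 * erfc(z / sqrt(2))
def upper_tail(z: float) -> float:
    return 0.5 * erfc(z / sqrt(2))

p_5sigma = upper_tail(5.0)
print(f"Pr(Z > 5; H0) = {p_5sigma:.7f}")  # about 0.0000003
```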
The move from statistical report to evidence
The inference actually detached from the evidence can be
put in any number of ways
There is strong evidence for H: a Higgs (or a Higgs-like)
particle.
An implicit principle of inference:
Why do data x0 from a test S provide evidence for
rejecting H0?
Because were H0 a reasonably adequate description of
the process generating the data, H0 would (very
probably) have survived the test (with respect to the
question).
Yet statistically significant departures are generated:
July 2012, March 2013 (from 5 to 7 sigma)
The inference that the observed difference is “real”
(not a fluke) has thereby been put to a severe test
Philosophers often call it an “argument from
coincidence”
(This is a highly stringent level; apparently, in this arena
of particle physics, smaller observed effects often disappear)
Even so, we cannot infer to any full theory.
That’s what’s wrong with the slogan “Inference to the
‘best’ Explanation”:
Some explanatory hypothesis T entails statistically
significant effect.
Statistical effect x is observed.
Therefore, data x are good evidence for T.
The problem: Pr(T “fits” data x; T is false ) = high
And in other less theoretical fields, the perils of “theory-
laden” interpretation of even genuine statistical effects
are great
[Babies look statistically significantly longer when red
balls are picked from a basket with few red balls:
Does this show they are running, at some intuitive level,
a statistical significance test, recognizing statistically
surprising results? It’s not clear]
The general worry reflects an implicit requirement for
evidence:
Minimal Requirement for Evidence. If data are in
accordance with a theory T, but the method would have
issued so good a fit even if T is false, then the data
provide poor or no evidence for T.
The basic principle isn’t new; we find it in Peirce,
Popper, Glymour…. What’s new is finding a way to use
error probabilities from frequentist statistics (error
statistics) to cash it out, to resolve controversies in
statistics, and even to give a foundation for rival
accounts
Dirty Hands: But these statistical assessments, some
object, depend on methodological choices in specifying
statistical methods; outputs are influenced by
discretionary judgments: the “dirty hands” argument
While it is obvious that human judgments and human
measurements are involved, this (like “all models are
false”) is too trivial an observation to distinguish how
different accounts handle threats of bias and unwarranted
inferences
Regardless of the values behind choices in collecting,
modeling, drawing inferences from data, I can critically
evaluate how good a job has been done.
(test too sensitive, not sensitive enough, violated
assumptions)
An even more extreme argument moves from “models
are false”, to models are objects of belief, to therefore
statistical inference is all about subjective probability.
By the time we get to the “confirmatory stage” we’ve
made so many judgments, why fuss over a few
subjective beliefs at the last part….
George Box (a well known statistician) “the
confirmatory stage of an investigation…will typically
occupy, perhaps, only the last 5 per cent of the
experimental effort. The other 95 per cent—the
wondering journey that has finally led to that
destination---involves many heroic subjective choices
(what variables? What levels? What scales? etc., etc.)….
Since there is no way to avoid these subjective
choices…why should we fuss over subjective
probability?” (70)
It is one thing to say our models are objects of
belief, and quite another to convert the entire task to
modeling beliefs.
We may call this a shift from phenomena to
epiphenomena (Glymour 2010)
Yes there are assumptions, but we can test them, or
at least discern how they may render our inferences less
precise, or completely wrong.
The choice isn’t full blown truth or degrees of
belief.
We may warrant models (and inferences) to various
degrees, such as by assessing how well corroborated
they are.
Some try to adopt this perspective of testing their
statistical models, but give us tools with very little
power to find violations
• Some of these same people, ironically, say that since
we know our model is false, the criterion of high
power to detect falsity is not of interest (Gelman).
• Knowing something is an approximation is not to
pinpoint where it is false, or how to get a better
model.
[Unless you have methods with power to probe this
approximation, you will have learned nothing about
where the model stands up and where it breaks down,
what flaws you can rule out, and which you cannot.]
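The point can be made concrete with a toy test (an illustration, not from the slides): a one-sided z-test of H0: μ = 0 with σ = 1 known, when in truth μ = 0.2. With little data the tool has almost no power to find the violation; with enough data it almost always does:

```python
from math import erf, sqrt

def phi(x: float) -> float:
    # Standard normal cumulative distribution function
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(mu: float, n: int, crit: float = 1.645) -> float:
    # One-sided z-test of H0: mu = 0 at level .05 (sigma = 1 known);
    # reject when the standardized sample mean exceeds crit.
    return 1 - phi(crit - mu * sqrt(n))

print(f"power at n=10:  {power(0.2, 10):.2f}")   # ~0.16: violation usually missed
print(f"power at n=400: {power(0.2, 400):.2f}")  # ~0.99: violation almost surely found
```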
Back to our question
How do methods of data generation, statistical
modeling, and analysis influence the construction and
appraisal of theories at multiple levels?
• All statistical models are false
• Statistical significance is not substantive
significance
• Statistical association is not causation
• Absence of evidence against a statistical null
hypothesis is not evidence the null is true
• If you torture the data enough they will confess.
(or just omit unfavorable data)
These facts open the door to a variety of antiquated
statistical fallacies, but the “all models are false”, “dirty
hands”, and “it’s all subjective” arguments encourage them.
From popularized to sophisticated research, in the social
sciences, medicine, social psychology:
“We’re more fooled by noise than ever before, and it’s
because of a nasty phenomenon called “big data”. With
big data, researchers have brought cherry-picking to an
industrial level”. (Taleb, Fooled by randomness 2013)
It’s not big data; it’s big mistakes about methodology
and modeling
This business of cherry picking falls under a more
general issue of “selection effects” that I have been
studying and writing about for many years.
Selection effects come in various forms and are given
different names: double counting, hunting with a shotgun
(for statistical significance), looking for the pony, look-
elsewhere effects, data dredging, multiple testing, p-
value hacking
One common example: A published result of a clinical
trial alleges a statistically significant benefit (of a given
drug for a given disease) at a small level, .01, but
ignores 19 other non-significant trials; this makes it
easy to find a positive result on one factor or another,
even if all are spurious.
The probability that the procedure yields erroneous
rejections differs from, and will be much greater than,
0.01
(nominal vs actual significance levels)
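The arithmetic behind the nominal-versus-actual gap is simple; a sketch, assuming the 20 trials are independent:

```python
# With 20 independent trials each tested at nominal level .01, reporting
# only a "significant" one means the chance of at least one spurious
# rejection is far larger than .01:
alpha, k = 0.01, 20
actual = 1 - (1 - alpha) ** k
print(f"nominal level: {alpha}, actual level: {actual:.3f}")  # ~0.182
```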
How to adjust for hunting and multiple testing is a
separate issue (e.g., false discovery rates).
If one reports results selectively, or stops when the
data look good, etc., it becomes easy to prejudge
hypotheses:
Your favored hypothesis H might be said to have
“passed” the test, but it is a test that lacks stringency or
severity.
(our minimal principle for evidence again)
• Selection effects alter the error probabilities of tests
and estimation methods, so at least methods that
compute them can pick up on the influences
• If, on the other hand, such results are reported in the
same way as predesignated ones, significance testing’s
basic principles are being twisted, distorted, invalidly
used
• Nor is it a problem about long runs: we cannot say of
the case at hand that a good job has been done of
avoiding this source of misinterpretation, since the
procedure makes it so easy to find a fit even if false.
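A minimal simulation (illustrative only; the stopping rule and sampling budget are assumptions, not from the slides) shows how “stop when the data look good” inflates the error probability of a nominally .025 test:

```python
import random

# Draw from a true null, N(0, 1), but keep sampling until the running
# z-statistic exceeds 1.96 (nominal one-sided .025 test) or the budget
# runs out -- "try and try again".
def stops_significant(max_n: int = 500, rng=random) -> bool:
    total = 0.0
    for n in range(1, max_n + 1):
        total += rng.gauss(0.0, 1.0)
        z = total / n ** 0.5
        if z > 1.96:
            return True
    return False

random.seed(1)
trials = 2000
rate = sum(stops_significant() for _ in range(trials)) / trials
print(f"rejection rate under a true null: {rate:.2f}")  # well above .025
```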
The growth of fallacious statistics is due to the acceptability
of methods that declare themselves free from such
error-probabilistic encumbrances (e.g., Bayesian accounts).
Popular methods of model selection (AIC, and others)
suffer from similar blind spots
Whole new fields have arisen for discerning spurious
statistics and non-replicable results; this statistical
forensics all uses error statistical methods to identify flaws
(Stan Young, John Simonsohn, Brad Efron, Baggerly and
Coombes)
• All statistical models are false
• Statistical significance is not substantive
significance
• Statistical association is not causation
• Absence of evidence against a statistical null
hypothesis is not evidence the null is true
• If you torture the data enough they will confess.
(or just omit unfavorable data)
To us, the list is not a list of embarrassments but
justifications for the account we favor.
Models are false
Does not prevent finding out true things with them
Discretionary choices in modeling
Do not entail we are only really learning about
beliefs
Do not prevent critically evaluating the properties
of the tools you choose.
A methodology that uses probability to assess and
control error probabilities has the basis for pinpointing
the fallacies (statistical forensics, meta statistical
analytics)
These models work because they need only capture
rather coarse properties of the phenomena being probed:
the error probabilities assessed are approximately related
to actual ones.
Problems are intertwined with testing assumptions of
statistical models
The person I’ve learned the most from about this is Aris
Spanos, who will now turn to that.

The role of background assumptions in severity appraisal (jemille6
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...jemille6
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...jemille6
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualtiesjemille6
 

More from jemille6 (20)

D. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdfD. Mayo JSM slides v2.pdf
D. Mayo JSM slides v2.pdf
 
reid-postJSM-DRC.pdf
reid-postJSM-DRC.pdfreid-postJSM-DRC.pdf
reid-postJSM-DRC.pdf
 
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
Errors of the Error Gatekeepers: The case of Statistical Significance 2016-2022
 
Causal inference is not statistical inference
Causal inference is not statistical inferenceCausal inference is not statistical inference
Causal inference is not statistical inference
 
What are questionable research practices?
What are questionable research practices?What are questionable research practices?
What are questionable research practices?
 
What's the question?
What's the question? What's the question?
What's the question?
 
The neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and MetascienceThe neglected importance of complexity in statistics and Metascience
The neglected importance of complexity in statistics and Metascience
 
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
Mathematically Elegant Answers to Research Questions No One is Asking (meta-a...
 
On Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the TwoOn Severity, the Weight of Evidence, and the Relationship Between the Two
On Severity, the Weight of Evidence, and the Relationship Between the Two
 
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
Revisiting the Two Cultures in Statistical Modeling and Inference as they rel...
 
Comparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple TestingComparing Frequentists and Bayesian Control of Multiple Testing
Comparing Frequentists and Bayesian Control of Multiple Testing
 
Good Data Dredging
Good Data DredgingGood Data Dredging
Good Data Dredging
 
The Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of ProbabilityThe Duality of Parameters and the Duality of Probability
The Duality of Parameters and the Duality of Probability
 
The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)The Statistics Wars and Their Causalities (refs)
The Statistics Wars and Their Causalities (refs)
 
The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)The Statistics Wars and Their Casualties (w/refs)
The Statistics Wars and Their Casualties (w/refs)
 
On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...On the interpretation of the mathematical characteristics of statistical test...
On the interpretation of the mathematical characteristics of statistical test...
 
The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (The role of background assumptions in severity appraisal (
The role of background assumptions in severity appraisal (
 
The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...The two statistical cornerstones of replicability: addressing selective infer...
The two statistical cornerstones of replicability: addressing selective infer...
 
The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...The replication crisis: are P-values the problem and are Bayes factors the so...
The replication crisis: are P-values the problem and are Bayes factors the so...
 
The Statistics Wars and Their Casualties
The Statistics Wars and Their CasualtiesThe Statistics Wars and Their Casualties
The Statistics Wars and Their Casualties
 

Mayo O&M slides (4-28-13)

• Their usefulness, we claim, lies in being capable of
adequately capturing an aspect of a phenomenon of interest
• Then a hypothesis asserting its adequacy (or inadequacy)
is capable of being true!
Note: All methods of statistical inference rest on
statistical models. What differentiates accounts is how
well they step up to the plate in checking adequacy, and in
learning despite violations of statistical assumptions
(robustness)
5-4 4
Statistical significance is not substantive significance
Statistical models (as they arise in the methodology of
statistical inference) live somewhere between:
1. Substantive questions, hypotheses, theories: H
2. Statistical models of phenomena, experiments, data: M
3. Data: x
What statistical inference has to do is afford adequate
link-ups (reporting precision, accuracy, reliability)
5-4 5
Recent Higgs reports offer evidence of a real (Higgs-like)
effect (July 2012, March 2013)
Researchers define a "global signal strength" parameter:
H0: μ = 0 corresponds to the background (null hypothesis),
μ > 0 to background + Standard Model Higgs boson signal,
but μ is only indirectly related to parameters in
substantive models
As is typical of so much of actual inference (experimental
and non-experimental), the testable predictions are
statistical:
They deduced what would be expected statistically from
background alone (compared to the 5 sigma observed), in
particular, alluding to an overall test S:
Pr(Test S would yield d(X) > 5 standard deviations; H0) ≤ .0000003
This is an example of an error probability
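The bound .0000003 can be checked with a normal-tail
computation. A minimal sketch, assuming (as the slide's
test S does) that d(X) is standard normal under H0:

```python
from math import erf, sqrt

# One-sided tail probability beyond 5 sigma under a
# standard normal null: Pr(d(X) > 5; H0)
p_5sigma = 0.5 * (1 - erf(5 / sqrt(2)))
print(p_5sigma)  # about 2.87e-7, i.e. <= .0000003
```

This is the error probability the slide reports: how often
test S would yield so large a d(X) were the background
alone operating.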
5-4 6
The move from statistical report to evidence
The inference actually detached from the evidence can be
put in any number of ways:
There is strong evidence for H: a Higgs (or a Higgs-like)
particle.
An implicit principle of inference is:
Why do data x0 from a test S provide evidence for
rejecting H0?
Because were H0 a reasonably adequate description of the
process generating the data, H0 would (very probably) have
survived (with respect to the question).
Yet statistically significant departures are generated:
July 2012, March 2013 (from 5 to 7 sigma)
Inferring that the observed difference is "real"
(non-fluke) has been put to a severe test
Philosophers often call it an "argument from coincidence"
(This is a highly stringent level; apparently in this arena
of particle physics smaller observed effects often
disappear)
5-4 7
Even so we cannot infer to any full theory
That's what's wrong with the slogan "Inference to the
'best' Explanation":
Some explanatory hypothesis T entails statistically
significant effect.
Statistical effect x is observed.
Data x are good evidence for T.
The problem: Pr(T "fits" data x; T is false) = high
And in other, less theoretical fields, the perils of
"theory-laden" interpretation of even genuine statistical
effects are great
[Babies look statistically significantly longer when red
balls are picked from a basket with few red balls: does
this show they are running, at some intuitive level, a
statistical significance test, recognizing statistically
surprising results? It's not clear]
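That Pr(T "fits" x; T is false) can be high is easy to
exhibit by simulation. A minimal sketch with made-up data:
every "candidate theory" below is a random predictor with
no connection whatever to y, yet the best-fitting one will
typically fit impressively.

```python
import random
random.seed(1)

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

n = 20
y = [random.gauss(0, 1) for _ in range(n)]  # pure noise: every "theory" is false
candidates = [[random.gauss(0, 1) for _ in range(n)] for _ in range(100)]

# Pick the "best explanation": the candidate that fits y most closely
best_r = max(abs(corr(x, y)) for x in candidates)
print(best_r)  # typically well above conventional significance cutoffs
```

The best of 100 false theories "fits" the data well with
near certainty; a good fit obtained this way is therefore
poor evidence, which is exactly the minimal requirement
for evidence on the next slide.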
5-4 8
The general worry reflects an implicit requirement for
evidence:
Minimal Requirement for Evidence: If data are in
accordance with a theory T, but the method would have
issued so good a fit even if T is false, then the data
provide poor or no evidence for T.
The basic principle isn't new; we find it in Peirce,
Popper, Glymour…. What's new is finding a way to use error
probabilities from frequentist statistics (error
statistics) to cash it out:
To resolve controversies in statistics and even give a
foundation for rival accounts
5-4 9
Dirty Hands: But these statistical assessments, some
object, depend on methodological choices in specifying
statistical methods; outputs are influenced by
discretionary judgments: the dirty hands argument
While it is obvious that human judgments and human
measurements are involved, (like "all models are false")
it is too trivial an observation to distinguish how
different accounts handle threats of bias and unwarranted
inferences
Regardless of the values behind choices in collecting,
modeling, and drawing inferences from data, I can
critically evaluate how good a job has been done (test too
sensitive, not sensitive enough, violated assumptions)
5-4 10
An even more extreme argument moves from "models are
false", to models are objects of belief, to therefore
statistical inference is all about subjective probability.
By the time we get to the "confirmatory stage" we've made
so many judgments, why fuss over a few subjective beliefs
at the last part….
George Box (a well-known statistician): "the confirmatory
stage of an investigation…will typically occupy, perhaps,
only the last 5 per cent of the experimental effort. The
other 95 per cent—the wondering journey that has finally
led to that destination—involves many heroic subjective
choices (what variables? What levels? What scales?, etc.
etc.…. Since there is no way to avoid these subjective
choices…why should we fuss over subjective probability?"
(70)
It is one thing to say our models are objects of belief,
and quite another to convert the entire task to modeling
beliefs. We may call this the shift from phenomena to
epiphenomena (Glymour 2010)
Yes there are assumptions, but we can test them, or at
least discern how they may render our inferences less
precise, or completely wrong.
5-4 11
The choice isn't full-blown truth or degrees of belief. We
may warrant models (and inferences) to various degrees,
such as by assessing how well corroborated they are.
Some try to adopt this perspective of testing their
statistical models, but give us tools with very little
power to find violations
• Some of these same people, ironically, say that since we
know our model is false, the criterion of high power to
detect falsity is not of interest (Gelman).
• Knowing something is an approximation is not to pinpoint
where it is false, or how to get a better model.
[Unless you have methods with power to probe this
approximation, you will have learned nothing about where
the model stands up and where it breaks down, what flaws
you can rule out, and which you cannot.]
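The complaint about low-power checks is quantitative. A
minimal sketch, assuming a normal test statistic with known
unit variance and a hypothetical mean-shift violation of
size delta: a check can come back "no violation found"
simply because it never had a real chance of detecting one.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def detection_power(delta, n, z_crit=1.96):
    # Approximate power of a z-test to detect a mean-shift
    # violation of size delta (sigma = 1) with n observations.
    return phi(delta * sqrt(n) - z_crit)

# With n = 10 the check mostly misses a delta = 0.5 violation;
# "no violation found" is then weak grounds for model adequacy.
print(detection_power(0.5, 10))   # low (roughly 1 in 3)
print(detection_power(0.5, 100))  # high (near 1)
```

This is why "we know the model is false anyway" misses the
point: without power, a passed check rules out nothing.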
5-4 12
Back to our question:
How do methods of data generation, statistical modeling,
and analysis influence the construction and appraisal of
theories at multiple levels?
• All statistical models are false
• Statistical significance is not substantive significance
• Statistical association is not causation
• No evidence against a statistical null hypothesis is not
evidence the null is true
• If you torture the data enough they will confess (or
just omit unfavorable data)
These facts open the door to a variety of antiquated
statistical fallacies, and the "all models are false",
"dirty hands", and "it's all subjective" arguments
encourage them.
From popularized to sophisticated research, in the social
sciences, medicine, social psychology:
"We're more fooled by noise than ever before, and it's
because of a nasty phenomenon called 'big data'. With big
data, researchers have brought cherry-picking to an
industrial level." (Taleb, Fooled by randomness, 2013)
It's not big data, it's big mistakes about methodology and
modeling
5-4 13
This business of cherry-picking falls under a more general
issue of "selection effects" that I have been studying and
writing about for many years.
Selection effects come in various forms and are given
different names: double counting, hunting with a shotgun
(for statistical significance), looking for the pony,
look-elsewhere effects, data dredging, multiple testing,
p-value hacking
One common example: A published result of a clinical trial
alleges a statistically significant benefit (of a given
drug for a given disease) at a small level, .01, but
ignores 19 other non-significant trials. Such practices
make it easy to find a positive result on one factor or
other, even if all are spurious.
The probability that the procedure yields erroneous
rejections differs from, and will be much greater than,
0.01 (nominal vs actual significance levels)
How to adjust for hunting and multiple testing is a
separate issue (e.g., false discovery rates).
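The gap between nominal and actual significance levels in
the clinical-trial example is easy to compute. A sketch,
assuming the 20 trials are independent and only "the
significant one" gets reported:

```python
alpha = 0.01      # nominal level of each individual trial
n_trials = 20     # trials actually run (19 go unreported)

# Probability the procedure "report the best of 20" yields
# at least one erroneous rejection when every null is true:
actual = 1 - (1 - alpha) ** n_trials
print(round(actual, 3))  # 0.182, not 0.01
```

The reported .01 describes a single prespecified trial; the
hunting procedure's actual error rate is roughly eighteen
times larger.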
5-4 14
If one reports results selectively, or stops when the data
look good, etc., it becomes easy to prejudge hypotheses:
Your favored hypothesis H might be said to have "passed"
the test, but it is a test that lacks stringency or
severity (our minimal principle for evidence again).
• Selection effects alter the error probabilities of tests
and estimation methods, so at least methods that compute
them can pick up on the influences
• If on the other hand the results are reported as if no
selection had occurred, significance testing's basic
principles are being twisted, distorted, invalidly used
• It is not a problem about long runs either: we cannot
say about the case at hand that it has done a good job of
avoiding the source of misinterpretation, since the
procedure makes it so easy to find a fit even if false.
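"Stopping when the data look good" can be simulated
directly. A minimal Monte Carlo sketch, assuming standard
normal data, a true null, and peeking at a nominal-.05
two-sided z-test after every observation from n = 10 to
n = 100:

```python
import random
random.seed(42)

def z_stat(xs):
    # z = mean * sqrt(n), with sigma = 1 known
    n = len(xs)
    return (sum(xs) / n) * n ** 0.5

reps, rejections = 2000, 0
for _ in range(reps):
    xs = []
    for n in range(1, 101):
        xs.append(random.gauss(0.0, 1.0))
        # Peek from n = 10 on; stop as soon as the data "look good"
        if n >= 10 and abs(z_stat(xs)) > 1.96:
            rejections += 1
            break

# Nominal level .05, but the actual rate of erroneous
# rejections under H0 is several times higher.
print(rejections / reps)
```

The stopping rule alters the error probability; an account
that computes error probabilities registers the distortion,
one that ignores them cannot.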
5-4 15
The growth of fallacious statistics is due to the
acceptability of methods that declare themselves free from
such error-probabilistic encumbrances (e.g., Bayesian
accounts).
Popular methods of model selection (AIC and others) suffer
from similar blind spots
Whole new fields have arisen for discerning spurious
statistics and non-replicable results; this statistical
forensics all uses error statistical methods to identify
flaws (Stan Young, Uri Simonsohn, Brad Efron, Baggerly and
Coombes)
• All statistical models are false
• Statistical significance is not substantive significance
• Statistical association is not causation
• No evidence against a statistical null hypothesis is not
evidence the null is true
• If you torture the data enough they will confess (or
just omit unfavorable data)
To us, the list is not a list of embarrassments but of
justifications for the account we favor.
5-4 16
Models are false:
Does not prevent finding out true things with them
Discretionary choices in modeling:
Do not entail we are only really learning about beliefs
Do not prevent critically evaluating the properties of the
tools you chose
A methodology that uses probability to assess and control
error probabilities has the basis for pinpointing the
fallacies (statistical forensics, meta-statistical
analytics)
These models work because they need only capture rather
coarse properties of the phenomena being probed: the error
probabilities assessed are approximately related to the
actual ones.
These problems are intertwined with testing the
assumptions of statistical models.
The person I've learned the most from about this is Aris
Spanos, who will now turn to that.