2. Diederik Stapel, Psychologist
• Research Professor, Consumer Science
• Director, Tilburg Institute for Behavioral Economics Research (TIBER)
• Faculty dean
• Ph.D. in Psychology, cum laude, University of Amsterdam, The Netherlands
• Winner, ASPO Best Dissertation Award (Dutch Association of Social Psychologists)
• Winner, Jos Jaspars Early Career Award (European Association for Experimental Social Psychology)
• Fulbright scholar
• Over 100 publications
3. Diederik Stapel, Gigantic Fraud
• At least 30 articles and several book chapters based on fabricated data
• Forfeited Ph.D.
• 12 students with dissertations under investigation
• University of Tilburg will press criminal charges for fraud and forgery
• Investigation ongoing, likely to take a year or more
4. THE UNIVERSITIES · THE PUBLIC · THE STUDENTS · THE FIELD · COLLEAGUES · THE JOURNALS
5. THE FIELD
Did Stapel fake his research? Did he and his students really make all those people fill out forms for an apple? Did Stapel really cross-tabulate the data? … Who cares? The experiments are preposterous. You'd have to be a highly trained social psychologist, or a journalist, to think otherwise.
-Andrew Ferguson
"The Chump Effect"
The Weekly Standard
6. THE STUDENTS
It is important for a PhD student or research Master's student to gain personal experience of the entire research process, including the collection and processing of the data, and certainly so where their own research is involved. A number of Mr Stapel's PhD students therefore never experienced this process for themselves. …
It was precisely because of the isolated approach that the young researchers were unaware that this was not a normal state of affairs in social psychology research.
-The Levelt Committee
"Interim Report Regarding the Breach of Scientific Integrity Committed by Prof. D.A. Stapel"
7. COLLEAGUES
Looking back, this is a mega-sized failure. Not only was the research not value-free, the results were completely fake!
…I regret very much that this has happened and I will do everything I can to recover the trust in scientific work in social psychology.
-Roos Vonk
"Bewildered: Research on 'Psychology of Meat' is based on fraud"
8. THE UNIVERSITIES
"Report finds massive fraud at Dutch universities"
-Headline, Nature
9. WHY
• Poor collaboration
• Isolation of researchers within the university
• Critical failure of peer reviewers
• Bias toward positive results ("Verification Factory")
• Uncritical view of data by reviewers and colleagues
• Data hoarding
• Lack of an independent officer to whom suspected fraud could be reported
• Lack of joint responsibility for training researchers
Levelt Committee Report
10. HOW
• Better "integrity" training for PhD students
• Appoint a "Confidential Counselor for Academic Integrity"
• Draft rules, specific to scientific matters, for protecting whistleblowers
• Dual supervisors for PhD candidates
• Doctoral boards must ascertain that data were collected and analyzed by the candidate
• Publications must specify where and how data were collected
• Research data must be held on file and made available on request for at least five years
• Publications must disclose where data are held and how to access them
Levelt Committee Report
11. AUTHORSHIP
From the ASA Code of Conduct:
15. Authorship Credit
(a) Sociologists take responsibility and credit, including authorship credit, only for work they have actually performed or to which they have contributed.
(b) Sociologists ensure that principal authorship and other publication credits are based on the relative scientific or professional contributions of the individuals involved, regardless of their status. In claiming or determining the ordering of authorship, sociologists seek to reflect accurately the contributions of main participants in the research and writing process.
(c) A student is usually listed as principal author on any multiple-authored publication that substantially derives from the student's dissertation or thesis.
12. Marc Hauser, Evolutionary Biologist
• Professor, Harvard College
• Co-director, Mind, Brain, and Behavior Program
• Director, Cognitive Evolution Lab
• NSF Young Investigator Award
• Science medal from the Collège de France
• Guggenheim Fellow
• ~200 articles published, as well as 6 books
13. Marc Hauser, Fraud?
• Found solely responsible for 8 counts of academic misconduct
• After a year's leave of absence, faculty voted overwhelmingly to bar him from teaching
• Resigned in August 2011
• Other studies were replicated by Hauser and co-authors
• Harvard has not specified the nature of his misconduct
• Internal documents suggest that he falsified and fabricated data
14. Leslie K. John, George Loewenstein, and Drazen Prelec (forthcoming). "Measuring the Prevalence of Questionable Research Practices with Incentives for Truth-telling." Psychological Science.
15. Admission rates and defensibility ratings, by item.
Note: Defensibility ratings were provided by respondents who admitted to having engaged in the given behavior.
• In a paper, failing to report all of a study's dependent measures.
• Deciding whether to collect more data after looking to see whether the results were significant.
• In a paper, failing to report all of a study's conditions.
• Stopping collecting data earlier than planned because one found the result that one had been looking for.*
• In a paper, 'rounding off' a p value (e.g., reporting that a p value of .054 is less than .05).
• In a paper, selectively reporting studies that 'worked.'
*Difference between experimental conditions significant at alpha ≤ 0.005
16. Admission rates and defensibility ratings, by item.
Note: Defensibility ratings were provided by respondents who admitted to having engaged in the given behavior.
• Deciding whether to exclude data after looking at the impact of doing so on the results.
• In a paper, reporting an unexpected finding as having been predicted from the start.*
• In a paper, claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do).
• Falsifying data.
*Difference between experimental conditions significant at alpha ≤ 0.005
17. Admission rates and defensibility ratings, by item.
Items are listed in decreasing order of judged defensibility.
Note: Defensibility ratings were provided by respondents who admitted to having engaged in the given behavior.
Columns: Control (%) | Bayesian Truth Serum (%) | Odds ratio (likelihood ratio) | Two-tailed p | Mean defensibility (SD), where 0 = Indefensible, 1 = Possibly defensible, 2 = Defensible.
• In a paper, failing to report all of a study's dependent measures: 63.4 | 66.5 | 1.14 | 0.23 | 1.84 (.39)
• Deciding whether to collect more data after looking to see whether the results were significant: 55.9 | 58.0 | 1.08 | 0.46 | 1.79 (.44)
• In a paper, failing to report all of a study's conditions: 27.7 | 27.4 | 0.98 | 0.90 | 1.77 (.49)
• Stopping collecting data earlier than planned because one found the result that one had been looking for:* 15.6 | 22.5 | 1.57 | 0.00 | 1.76 (.48)
• In a paper, 'rounding off' a p value (e.g., reporting that a p value of .054 is less than .05): 22.0 | 23.3 | 1.07 | 0.58 | 1.68 (.57)
• In a paper, selectively reporting studies that 'worked': 45.8 | 50.0 | 1.18 | 0.13 | 1.66 (.53)
*Difference between experimental conditions significant at alpha ≤ 0.005
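A minimal sketch, not taken from the slides or from John et al., of how the "Odds ratio" column relates to the two admission-rate columns, assuming it is the ordinary odds ratio of the Bayesian Truth Serum admission proportion versus the control admission proportion (the function names and the worked row are illustrative choices, not the authors' code):

# Hypothetical illustration: odds ratio of admitting a practice under the
# Bayesian Truth Serum (BTS) condition versus the control condition.

def odds(p):
    # Convert a proportion (0-1) to odds.
    return p / (1.0 - p)

def admission_odds_ratio(control_pct, bts_pct):
    # Odds of admission under BTS divided by odds of admission under control.
    return odds(bts_pct / 100.0) / odds(control_pct / 100.0)

# First row above: failing to report all dependent measures
# (control 63.4%, BTS 66.5%). With these rounded percentages the result is
# about 1.15, close to the tabulated 1.14; the small gap is presumably due to
# rounding in the published admission rates.
print(round(admission_odds_ratio(63.4, 66.5), 2))

The paper itself defines exactly how the ratios and the likelihood-ratio p values were computed; this is only a guess at the arithmetic behind the column.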
18. Admission rates and defensibility ratings, by item.
Items are listed in decreasing order of judged defensibility.
Note: Defensibility ratings were provided by respondents who admitted to having engaged in the given behavior.
Columns: Control (%) | Bayesian Truth Serum (%) | Odds ratio (likelihood ratio) | Two-tailed p | Mean defensibility (SD), where 0 = Indefensible, 1 = Possibly defensible, 2 = Defensible.
• Deciding whether to exclude data after looking at the impact of doing so on the results: 38.2 | 43.4 | 1.23 | 0.06 | 1.61 (.59)
• In a paper, reporting an unexpected finding as having been predicted from the start:* 27.0 | 35.0 | 1.45 | 0.00 | 1.5 (.60)
• In a paper, claiming that results are unaffected by demographic variables (e.g., gender) when one is actually unsure (or knows that they do): 3.0 | 4.5 | 1.52 | 0.16 | 1.32 (.60)
• Falsifying data: 0.6 | 1.7 | 2.75 | 0.07 | 0.16 (.37)
*Difference between experimental conditions significant at alpha ≤ 0.005
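Several items across these two tables describe "look, then decide" practices: collecting more data after checking significance, stopping early once the hoped-for result appears, and excluding data after seeing how exclusion changes the results. A small simulation, not part of the slides, illustrates why such practices are treated as questionable: even when there is no true effect, repeatedly peeking at a t test and stopping at the first p < .05 pushes the false-positive rate well above the nominal 5%. The sample sizes and peeking schedule below are arbitrary illustrative choices.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def stops_with_significance(n_start=20, n_max=100, step=10, alpha=0.05):
    # Simulate one two-group study with NO true effect, checking the p value
    # every `step` observations per group and stopping at the first p < alpha.
    a = rng.normal(size=n_max)
    b = rng.normal(size=n_max)
    for n in range(n_start, n_max + 1, step):
        if stats.ttest_ind(a[:n], b[:n]).pvalue < alpha:
            return True   # "found the result that one had been looking for"
    return False

trials = 2000
false_positive_rate = sum(stops_with_significance() for _ in range(trials)) / trials
print(false_positive_rate)   # typically well above the nominal 0.05

With this peeking schedule the rate usually lands somewhere around 0.10 to 0.15, which is one statistical reason practices like optional stopping and post hoc exclusion are considered questionable.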