1. The document discusses open-science approaches to big research questions: multiple analyst teams studying the same dataset and research question, and initiatives for sharing data and analyses.
2. It describes challenges in open science, including variation in analyses and interpretations and heterogeneity across datasets and studies.
3. Initiatives such as OHDSI are highlighted as bridging data sharing and analysis while keeping data local, but heterogeneity across data sources, and its impact on predictions, remains a challenge.
Open science LMU session contribution E Steyerberg 2jul20
1. July 2, 2020
Open Science for research questions, data, and analyses?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and Medical Decision Making
Thanks to many for assistance and inspiration, including the GAP3 consortium and the CENTER-TBI Study
3. Open Science: what is it in the Netherlands?
2-Jul-20
https://www.openscience.nl/
https://www.coalition-s.org/
4. Open vs closed science
Long ago
- Performed by few, elitist scientists
- Doing private experiments
- Discussion in small, closed communities
Recent
- Science as a profession
- Protect data + code as intellectual property
- Aim for shocking findings in high IF journals
https://www.sciencemag.org/news/2020/06/whos-blame-these-three-scientists-are-heart-surgisphere-covid-19-scandal
5. Overall claim
“Open Science will make research better”
Vote pro / con
Aims today:
- Highlight some strong points in Open Science
- Hint at some challenges in Open Science
Reflections based on 30 years of personal research experience,
with a specific focus on prediction / decision making
7. Open science research questions: case 1
Example 1: Red cards and dark skin soccer players
https://psyarxiv.com/qkwst/
8. Open science research questions: case 1
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark skin
toned players than light skin toned players
• Estimated odds ratios ranged from 0.89 to 2.93 (median 1.3)
• 20 teams found a statistically significant positive effect; 9 found a non-significant relation
11. Open science research questions: case 1
• 21 unique combinations of covariates
• “Variation in analysis of complex data may be difficult to
avoid, even by experts with honest intentions”
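The point about analytic variation can be made concrete. Below is a minimal sketch in Python with simulated data — the variable names (skin_tone, position, league) are hypothetical placeholders, not fields from the actual crowdsourced study. Three "teams" adjust for different covariate sets on the same dataset and obtain three different odds ratios for the same exposure:

```python
# Simulated illustration of analyst degrees of freedom: one dataset,
# three covariate choices, three different odds ratios for the exposure.
# All variable names and effect sizes here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
skin_tone = rng.binomial(1, 0.3, n)            # hypothetical exposure
position = rng.normal(size=n)                  # hypothetical covariate
league = rng.binomial(1, 0.5, n)               # hypothetical covariate
logit = -2 + 0.3 * skin_tone + 0.5 * position
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # outcome: red card yes/no

def odds_ratio(extra_covariates):
    """Odds ratio for the exposure after adjusting for the given covariates."""
    X = np.column_stack([skin_tone] + extra_covariates)
    model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # ~unpenalized
    return float(np.exp(model.coef_[0][0]))

# Three "teams", three adjustment sets, three estimates:
ors = [odds_ratio([]),
       odds_ratio([position]),
       odds_ratio([position, league])]
```

Even with honest intentions and the same data, the adjustment set alone moves the estimate — a miniature of the 0.89–2.93 spread reported across the 29 teams.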
14. Findings not convincing
Cox, #4, 30 vars: max c = 0.793
RF, #7, 600 vars: c = 0.797
Elastic net, #9, 600 vars: c = 0.801
15. Machine learning vs conventional modeling
1. Findings convincing?
“We found that random forests did not outperform Cox models despite their
inherent ability to accommodate nonlinearities and interactions. …
Elastic nets achieved the highest discrimination performance …, demonstrating
the ability of regularisation to select relevant variables and optimise model
coefficients in an EHR context.”
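The regularisation claim can be illustrated in outline. A minimal sketch with scikit-learn and simulated data — the dimensions and penalty settings are arbitrary choices for this sketch, not values from the cited study: an elastic-net logistic regression shrinks most coefficients of noise variables to exactly zero while retaining the signal variables.

```python
# Simulated illustration of elastic-net variable selection: 40 candidate
# variables, of which only the first 3 carry signal.  Penalty settings
# (C, l1_ratio) are arbitrary for this sketch, not tuned study values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 600, 40
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0                                   # 3 truly predictive variables
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta))))

enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.02, max_iter=5000).fit(X, y)
n_selected = int(np.sum(enet.coef_ != 0))        # most noise coefficients -> 0
```

The L1 component of the penalty does the variable selection; the L2 component stabilises the retained coefficients — the combination referred to in the quote.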
16. Machine learning vs conventional modeling
1. Findings convincing? Not in case-study
2. Systematic / "it depends"?
19. Open science research questions: case 2
• 243 real datasets from “the OpenML database”
• RF performed better than LR: the mean difference in Area Under the ROC Curve between RF and LR was 0.041 (95% CI: 0.031 to 0.053)
• Results were dependent on the inclusion criteria used to select the example datasets
• ES: results rely on 10 × 10-fold cross-validation
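The dependence on the evaluation scheme is easy to reproduce in outline. A minimal sketch with scikit-learn on simulated data — the sample size, feature counts, and number of trees are placeholders, not the benchmark's settings: the 10 × 10-fold cross-validated AUC of RF and LR, and their difference, the quantity the comparison reports.

```python
# Simulated sketch of a 10 x 10-fold cross-validated AUC comparison of a
# random forest vs logistic regression.  All sizes are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
# 10 repeats of stratified 10-fold CV = 100 train/test splits per model
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

auc_rf = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                         X, y, cv=cv, scoring="roc_auc").mean()
auc_lr = cross_val_score(LogisticRegression(max_iter=1000),
                         X, y, cv=cv, scoring="roc_auc").mean()
delta = float(auc_rf - auc_lr)   # the RF-vs-LR difference the study reports
```

Changing `cv` (fewer repeats, different split sizes) shifts `delta`, which is exactly the dependence the "ES" comment flags.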
20. Open science research questions: case 2
• More clarification is needed on when ML / RF works best; at the least, a large N is needed
23. Summary on examples of Open Science to better address big research questions
• 1 dataset
• Multiple modelers
• Multiple modeling options
• 1 neutral comparison: 243 OpenML datasets
• Review of 282 comparative studies: meta-research
37. Open Science challenge: dealing with heterogeneity
Heterogeneity
• Study design
• Selection of subjects
• Measurement of covariates
• Measurement of outcomes
• Associations of covariates with outcome
• Overall outcome rates
• Performance of prediction models
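One common way to expose heterogeneity in model performance is leave-one-source-out validation: train on all sources but one, test on the held-out source, and repeat. The sketch below uses simulated "centres" whose outcome rates and covariate effects differ — nothing here comes from GAP3, CENTER-TBI, or any real consortium.

```python
# Simulated sketch of leave-one-source-out validation: 4 hypothetical
# "centres" with differing intercepts (outcome rates) and covariate
# effects; a model is trained on 3 centres and tested on the 4th.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
datasets = []
for _ in range(4):
    X = rng.normal(size=(250, 3))
    beta = np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=3)  # varying effects
    intercept = rng.normal(scale=0.5)                                 # varying outcome rates
    y = rng.binomial(1, 1 / (1 + np.exp(-(intercept + X @ beta))))
    datasets.append((X, y))

aucs = []
for held_out in range(4):
    X_train = np.vstack([Xc for i, (Xc, _) in enumerate(datasets) if i != held_out])
    y_train = np.concatenate([yc for i, (_, yc) in enumerate(datasets) if i != held_out])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    X_test, y_test = datasets[held_out]
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
# The spread of aucs across held-out centres reflects the heterogeneity
# listed above: a single pooled performance figure would hide it.
```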
44. “Open Science will make research better”
1. Research questions in competitions
• Red cards
• Neutral comparisons / meta-analysis
2. Data sharing
• old-fashioned?
3. Analyses
• OHDSI: modern
• Heterogeneity
Open science research extends discussions from meta-analysis;
contrast Cochrane reviews vs Big Data