1. The document discusses open-science approaches to big research questions: multiple analyst teams studying the same dataset and research question, and initiatives for sharing data and analyses.
2. It describes challenges in open science, including variation in analyses and interpretations and heterogeneity across datasets and studies.
3. Initiatives such as OHDSI are highlighted as bridging data sharing and analysis while keeping data local, but heterogeneity across data sources, and its impact on predictions, remains a challenge.
Open science LMU session contribution E Steyerberg 2jul20
1. July 2, 2020
Open Science for research questions, data, and analyses?
Ewout W. Steyerberg, PhD
Professor of Clinical Biostatistics and Medical Decision Making
Thanks to many for assistance and inspiration, including the GAP3 consortium and the CENTER-TBI Study
3. Open Science: what is it in the Netherlands?
2-Jul-20
https://www.openscience.nl/
https://www.coalition-s.org/
4. Open vs closed science
Long ago
- Performed by few, elitist scientists
- Doing private experiments
- Discussion in small, closed communities
Recent
- Science as a profession
- Protect data + code as intellectual property
- Aim for shocking findings in high IF journals
https://www.sciencemag.org/news/2020/06/whos-blame-these-three-scientists-are-heart-surgisphere-covid-19-scandal
5. Overall claim
“Open Science will make research better”
Vote pro / con
Aims today:
- Highlight some strong points in Open Science
- Hint at some challenges in Open Science
Reflections based on 30 years of personal research experience,
with a specific focus on prediction / decision making
7. Open science research questions: case 1
Example 1: Red cards and dark skin soccer players
https://psyarxiv.com/qkwst/
8. Open science research questions: case 1
• 29 teams involving 61 analysts; same dataset; same research question:
whether soccer referees are more likely to give red cards to dark skin
toned players than light skin toned players
• Estimated odds ratios ranged from 0.89 to 2.93 (median 1.3)
• 20 teams found a statistically significant positive effect; 9 found a non-significant relation
11. Open science research questions: case 1
• 21 unique combinations of covariates
• “Variation in analysis of complex data may be difficult to
avoid, even by experts with honest intentions”
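The point about analytic variation can be made concrete. Below is a minimal sketch in Python with simulated data — the variable names (skin_tone, position, league) are hypothetical placeholders, not fields from the actual crowdsourced study. Three "teams" adjust for different covariate sets on the same dataset and obtain three different odds ratios for the same exposure:

```python
# Simulated illustration of analyst degrees of freedom: one dataset,
# three covariate choices, three different odds ratios for the exposure.
# All variable names and effect sizes here are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
skin_tone = rng.binomial(1, 0.3, n)            # hypothetical exposure
position = rng.normal(size=n)                  # hypothetical covariate
league = rng.binomial(1, 0.5, n)               # hypothetical covariate
logit = -2 + 0.3 * skin_tone + 0.5 * position
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # outcome: red card yes/no

def odds_ratio(extra_covariates):
    """Odds ratio for the exposure after adjusting for the given covariates."""
    X = np.column_stack([skin_tone] + extra_covariates)
    model = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)  # ~unpenalized
    return float(np.exp(model.coef_[0][0]))

# Three "teams", three adjustment sets, three estimates:
ors = [odds_ratio([]),
       odds_ratio([position]),
       odds_ratio([position, league])]
```

Even with honest intentions and the same data, the adjustment set alone moves the estimate — a miniature of the 0.89–2.93 spread reported across the 29 teams.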
14. Findings not convincing
Cox, #4, 30 vars: max c = 0.793
RF, #7, 600 vars: c = 0.797
Elastic net, #9, 600 vars: c = 0.801
15. Machine learning vs conventional modeling
1. Findings convincing?
“We found that random forests did not outperform Cox models despite their
inherent ability to accommodate nonlinearities and interactions. …
Elastic nets achieved the highest discrimination performance …, demonstrating
the ability of regularisation to select relevant variables and optimise model
coefficients in an EHR context.”
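The regularisation claim can be illustrated in outline. A minimal sketch with scikit-learn and simulated data — the dimensions and penalty settings are arbitrary choices for this sketch, not values from the cited study: an elastic-net logistic regression shrinks most coefficients of noise variables to exactly zero while retaining the signal variables.

```python
# Simulated illustration of elastic-net variable selection: 40 candidate
# variables, of which only the first 3 carry signal.  Penalty settings
# (C, l1_ratio) are arbitrary for this sketch, not tuned study values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 600, 40
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 1.0                                   # 3 truly predictive variables
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta))))

enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=0.02, max_iter=5000).fit(X, y)
n_selected = int(np.sum(enet.coef_ != 0))        # most noise coefficients -> 0
```

The L1 component of the penalty does the variable selection; the L2 component stabilises the retained coefficients — the combination referred to in the quote.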
16. Machine learning vs conventional modeling
1. Findings convincing? Not in case-study
2. Systematic / "it depends"?
19. Open science research questions: case 2
• 243 real datasets from “the OpenML database”
• RF performed better than LR: the mean difference in Area Under the ROC Curve between RF and LR was 0.041 (95% CI: 0.031 to 0.053)
• Results were dependent on the inclusion criteria used to select the example datasets
• ES: results rely on 10 × 10-fold cross-validation
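The dependence on the evaluation scheme is easy to reproduce in outline. A minimal sketch with scikit-learn on simulated data — the sample size, feature counts, and number of trees are placeholders, not the benchmark's settings: the 10 × 10-fold cross-validated AUC of RF and LR, and their difference, the quantity the comparison reports.

```python
# Simulated sketch of a 10 x 10-fold cross-validated AUC comparison of a
# random forest vs logistic regression.  All sizes are placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)
# 10 repeats of stratified 10-fold CV = 100 train/test splits per model
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

auc_rf = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=0),
                         X, y, cv=cv, scoring="roc_auc").mean()
auc_lr = cross_val_score(LogisticRegression(max_iter=1000),
                         X, y, cv=cv, scoring="roc_auc").mean()
delta = float(auc_rf - auc_lr)   # the RF-vs-LR difference the study reports
```

Changing `cv` (fewer repeats, different split sizes) shifts `delta`, which is exactly the dependence the "ES" comment flags.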
20. Open science research questions: case 2
• More clarification is needed on when ML / RF works best; at the least, a large N is needed
23. Summary on examples of Open Science to better address big research questions
• 1 dataset
• Multiple modelers
• Multiple modeling options
• 1 neutral comparison: 243 OpenML datasets
• Review of 282 comparative studies: meta-research
37. Open Science challenge: dealing with heterogeneity
Heterogeneity
• Study design
• Selection of subjects
• Measurement of covariates
• Measurement of outcomes
• Associations of covariates with outcome
• Overall outcome rates
• Performance of prediction models
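One common way to expose heterogeneity in model performance is leave-one-source-out validation: train on all sources but one, test on the held-out source, and repeat. The sketch below uses simulated "centres" whose outcome rates and covariate effects differ — nothing here comes from GAP3, CENTER-TBI, or any real consortium.

```python
# Simulated sketch of leave-one-source-out validation: 4 hypothetical
# "centres" with differing intercepts (outcome rates) and covariate
# effects; a model is trained on 3 centres and tested on the 4th.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
datasets = []
for _ in range(4):
    X = rng.normal(size=(250, 3))
    beta = np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.3, size=3)  # varying effects
    intercept = rng.normal(scale=0.5)                                 # varying outcome rates
    y = rng.binomial(1, 1 / (1 + np.exp(-(intercept + X @ beta))))
    datasets.append((X, y))

aucs = []
for held_out in range(4):
    X_train = np.vstack([Xc for i, (Xc, _) in enumerate(datasets) if i != held_out])
    y_train = np.concatenate([yc for i, (_, yc) in enumerate(datasets) if i != held_out])
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    X_test, y_test = datasets[held_out]
    aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
# The spread of aucs across held-out centres reflects the heterogeneity
# listed above: a single pooled performance figure would hide it.
```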
44. “Open Science will make research better”
1. Research questions in competitions
• Red cards
• Neutral comparisons / meta-analysis
2. Data sharing
• old-fashioned?
3. Analyses
• OHDSI: modern
• Heterogeneity
Open science research extends discussions from meta-analysis;
contrast Cochrane reviews vs Big Data