ML and AI: a blessing and curse for statisticians and medical doctors
1. ML and AI:
a blessing and curse for
statisticians and medical doctors
Maarten van Smeden
University Medical Center Utrecht
Julius Center for Health Sciences and Primary Care
The Netherlands
Twitter: @MvanSmeden
Email: M.vanSmeden@umcutrecht.nl
STRATOS member (TG6: diagnostic tests and prediction models)
9 March 2020
Freiburg, Germany, Institut für Medizinische Biometrie und Statistik
Biometrischen Kolloquium
Slides available at https://www.slideshare.net/MaartenvanSmeden
I have no conflicts of interest to declare
2. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
https://bit.ly/2CwW43A
3. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Terminology
In medical research, “artificial intelligence” usually
just means “machine learning” or “algorithm”
4. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
https://bit.ly/2v2aokk
5. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
https://bit.ly/2TOdd0F
6. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Tech company business model
https://bit.ly/2HSp8X5; https://bit.ly/2Z0Pfop; https://bit.ly/2KIcpHG; https://bit.ly/33IJhr9
8. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Other success stories
https://go.nature.com/2VG2hS7; https://bbc.in/2Z1drXQ; https://bit.ly/2TAfRIP
9. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
IBM Watson winning Jeopardy! (2011)
https://bbc.in/2TMvV8I
10. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
IBM Watson for oncology
https://bit.ly/2LxiWGj
15. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
https://bit.ly/38A1ng0
16. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
“Everything is an ML method”
https://bit.ly/2lEVn33
17. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
“ML methods come from computer science”
https://bit.ly/2zhbwPv; https://stanford.io/2TVp1xK; https://stanford.io/2ZfED0k
Name | Leo Breiman | Jerome H Friedman | Trevor Hastie
Known for | CART, random forest | Gradient boosting | Elements of statistical learning
Education | Physics/Math | Physics | Statistics
Job title | Professor of Statistics | Professor of Statistics | Professor of Statistics
18. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
“ML methods for prediction, statistics for explaining”
Damen, BMJ, 2016, DOI: 10.1136/bmj.i2416
Of the 363 developed models, how many used each method?
Decision trees: 0
Random forests: 0
Support vector machines: 0
Nearest neighbor algorithms: 0
Neural networks: 1
19. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
“ML methods for prediction, statistics for explaining”
ML and causal inference, a small selection [1]
[1] See further: Kreif and DiazOrdaz; https://bit.ly/2m1eYdK
• Superlearner (e.g. van der Laan)
• High dimensional propensity scores (e.g. Schneeweiss)
• The book of why (Pearl)
20. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Two cultures
Breiman, Stat Sci, 2001, DOI: 10.1214/ss/1009213726
21. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Language
Statistics | Machine learning
Covariates | Features
Outcome variable | Target
Model | Network, graphs
Parameters | Weights
Model for discrete var. | Classifier
Model for continuous var. | Regression
Log-likelihood | Loss
Multinomial regression | Softmax
Measurement error | Noise
Subject/observation | Sample/instance
Dummy coding | One-hot encoding
Measurement invariance | Concept drift
Prediction | Supervised learning
Latent variable modeling | Unsupervised learning
Fitting | Learning
Prediction error | Error
Sensitivity | Recall
Positive predictive value | Precision
Contingency table | Confusion matrix
Measurement error model | Noise-aware ML
Structural equation model | Gaussian Bayesian network
Gold standard | Ground truth
Derivation–validation | Training–test
Experiment | A/B test
Adapted from Daniel Oberski: https://bit.ly/2YN12Xf and Robert Tibshirani: https://stanford.io/2zqEGfr
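To make the vocabulary mapping concrete, the sketch below computes sensitivity and positive predictive value from a 2x2 contingency table and notes the ML name for each quantity in the comments. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Summarise a 2x2 contingency table (ML: confusion matrix)."""
    return {
        "sensitivity": tp / (tp + fn),   # ML name: recall
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),           # ML name: precision
    }

# Hypothetical counts: 100 patients with the disease, 900 without.
metrics = diagnostic_metrics(tp=90, fp=90, fn=10, tn=810)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, sensitivity and specificity are both 0.90 while the positive predictive value is only 0.50, because the disease is uncommon in the sample.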
22. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Robert Tibshirani: https://stanford.io/2zqEGfr
Machine learning: large grant = $1,000,000
Statistics: large grant = $50,000
23. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
ML refers to a culture, not to methods
Distinguishing between statistics and machine learning
• Substantial overlap in the methods used by both cultures
• Substantial overlap in analysis goals
• Attempts to separate the two frequently result in disagreement
Pragmatic approach:
I’ll use “ML” to refer to models roughly outside the traditional regression
types of analysis: decision trees (and descendants), SVMs, neural networks
(including deep learning), boosting, etc.
26. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Example: retinal disease
Gulshan et al, JAMA, 2016, DOI: 10.1001/jama.2016.17216; Picture retinopathy: https://bit.ly/2kB3X2w
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > .90
• Estimated from training data
27. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Example: lymph node metastases
Bejnordi et al, JAMA, 2018, doi: 10.1001/jama.2017.14585. See our letter to the editor for a critical discussion: https://bit.ly/2kcYS0e
Deep learning competition
But:
• 390 teams signed up, 23 submitted
• “Only” 270 images for training
• Test AUC range: 0.56 to 0.99
28. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Deep learning on images
Many similar studies and challenges in
• radiology
• pathology
• dermatology
• ophthalmology
• gastroenterology
• cardiology
• ….
Topol, Nature Medicine, 2019, DOI: 10.1038/s41591-018-0300-7
29. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
The future?
Topol, Nature Medicine, 2019, DOI: 10.1038/s41591-018-0300-7
30. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Other sources of “medical” data
• Large scale gene expression data
• e.g. diagnosis of acute myeloid leukemia
https://bit.ly/2k8Ao8e
• Prognostication by text mining electronic health records
• e.g. predicting life expectancy
https://bit.ly/2k8Ao8e
• Analyzing social media posts
• e.g. pharmacovigilance, adverse events monitoring via Twitter posts
https://bit.ly/2m0KKrg
• Speech signal processing
• e.g. Parkinson's disease
https://bit.ly/2v3ZdHR
34. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Skin cancer and rulers
Esteva et al., Nature, 2016, DOI: 10.1038/nature21056; https://bit.ly/2lE0vV0
35. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Predicting mortality – the conclusion
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344
36. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Predicting mortality – the results
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344
37. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Predicting mortality – the media
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344; https://bit.ly/2Q6H41R; https://bit.ly/2m3RLrn
41. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Comparison “ML” vs statistical models
• “ML” versus statistical models is a false dichotomy
• Advanced “ML” shows promise, especially in areas that are
not the traditional “tabular data” (e.g. images, sound)
• Tabular data settings where “ML” can be compared with
traditional regression model techniques show little added value
in medical applications
42. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Sources of prediction error
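$Y = f(x) + \varepsilon$ (with $E(\varepsilon) = 0$, $\mathrm{var}(\varepsilon) = \sigma^2$; values in $x$ are not random)
For a model $k$ the expected test prediction error is:
$\sigma^2 + \mathrm{bias}^2(\hat{f}_k(x)) + \mathrm{var}(\hat{f}_k(x))$
• $\sigma^2$: irreducible error (≈ what we don’t model)
• $\mathrm{bias}^2(\hat{f}_k(x)) + \mathrm{var}(\hat{f}_k(x))$: mean squared prediction error (≈ how we model)
See equation 2.46 in Hastie et al., The Elements of Statistical Learning, https://stanford.io/2voWjra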
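The decomposition on this slide (irreducible error plus squared bias plus variance) can be checked by simulation. The sketch below is a minimal illustration under assumptions that are not in the slides: a sine function as the true f, a fixed design of 50 points, Gaussian noise with standard deviation 0.5, and a k-nearest-neighbour average as the model. The names `f`, `knn_predict` and `decompose` are hypothetical.

```python
import math
import random

def f(x):
    """Assumed true regression function (illustrative choice)."""
    return math.sin(2 * math.pi * x)

def knn_predict(xs, ys, x0, k):
    """Average the outcomes of the k design points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def decompose(k, x0=0.25, sigma=0.5, n=50, reps=2000, seed=1):
    """Estimate irreducible error, squared bias and variance at x0 by simulation."""
    rng = random.Random(seed)
    xs = [i / (n - 1) for i in range(n)]  # fixed design: x values are not random
    preds = []
    for _ in range(reps):
        ys = [f(x) + rng.gauss(0, sigma) for x in xs]  # new noise in every replication
        preds.append(knn_predict(xs, ys, x0, k))
    mean_pred = sum(preds) / reps
    bias_sq = (mean_pred - f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / reps
    return sigma ** 2, bias_sq, variance

for k in (1, 25):
    irreducible, bias_sq, variance = decompose(k)
    print(f"k={k}: irreducible={irreducible:.3f}, bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

In this setup a small k gives a flexible model with little bias but high variance, a large k gives the reverse, and the irreducible term stays the same whatever model is chosen.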
43. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Irreducible error is often large
• Health and lack thereof are complex to measure (‘no gold standard’)
• Predictors of diseases are often imperfectly and partly
measured
• We often don’t know all the causal mechanisms at play
• much easier to predict if you know the causal mechanisms!
• Predicting the future is even more difficult
Understanding prediction uncertainty is key
Courtesy Cecile Janssens: https://bit.ly/2Jf5ft6
44. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Classification versus risk prediction
Most “ML” classifiers don’t come naturally with risk prediction, i.e.
a probability estimate of predicted outcome for individuals
• Models can be trained to be optimized for a certain predictive
performance measure (e.g. AUC, classification accuracy, calibration)
• Which performance measure should be used to compare models that are
optimized for different types of performance?
• Possibly a much larger sample size is needed to obtain reliable
(calibrated) risk predictions [1] than reliable classifications
[1] Van Smeden et al., Stat Meth Med Res, 2019, doi: 10.1177/0962280218784726
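The difference between discrimination and calibration can be illustrated with a small simulation. The sketch below is an assumption-laden toy example, not taken from the slides: it draws hypothetical true risks, simulates outcomes from them, and then distorts the risks with a monotone transformation. The ranking of individuals, and therefore the AUC, is unchanged, but the distorted risks are far too high on average. The names `auc`, `true_risk` and `inflated` are illustrative.

```python
import random

def auc(scores, labels):
    """Probability that a random event scores higher than a random non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mean(values):
    return sum(values) / len(values)

rng = random.Random(2020)
true_risk = [rng.uniform(0.05, 0.6) for _ in range(1000)]   # hypothetical individual risks
outcome = [1 if rng.random() < r else 0 for r in true_risk]  # outcomes drawn from those risks
inflated = [r ** 0.25 for r in true_risk]                    # monotone distortion: same ranking

print("AUC, true risks:     ", round(auc(true_risk, outcome), 3))
print("AUC, inflated risks: ", round(auc(inflated, outcome), 3))
print("Observed event rate: ", round(mean(outcome), 3))
print("Mean true risk:      ", round(mean(true_risk), 3))
print("Mean inflated risk:  ", round(mean(inflated), 3))
```

A model selected on AUC alone could therefore classify well and still give individuals badly miscalibrated risk estimates.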
46. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Flexible algorithms are data hungry
From slide deck Ben van Calster: https://bit.ly/38Aqmjs
47. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Where do we stand on “ML” vs doctors?
Radiology and pathology
• Article hits: 12,000
• After screening: 22
• Out-of-sample comparison “ML” vs doctors: 2
Faes et al., Lancet Digital Health, 2019, doi: 10.1016/S2589-7500(19)30123-2
48. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Some personal observations
• Doctors did not work under realistic time constraints and/or had no
access to all regular diagnostic information
• The output generated by algorithms and physicians was not
evaluated on the same scale
• Apparent (optimistic) model performance vs medical doctors
Van Smeden et al., JAMA, 2018, doi: 10.1001/jama.2018.1466
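Why apparent performance is optimistic can be shown with an extreme toy case. In the sketch below, which is an illustration under stated assumptions and not an analysis from the cited letter, the outcome is generated independently of the single predictor, so no model can truly predict it. A 1-nearest-neighbour classifier still reaches perfect accuracy on its own training data, while accuracy on new data is near chance. All names and sample sizes are hypothetical.

```python
import random

def one_nn_predict(train_x, train_y, x):
    """Predict the outcome of the closest training observation."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def accuracy(train_x, train_y, xs, ys):
    hits = sum(one_nn_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(ys)

rng = random.Random(7)

def simulate(n):
    """Pure noise: the binary outcome is unrelated to the predictor."""
    xs = [rng.random() for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys

train_x, train_y = simulate(200)
test_x, test_y = simulate(2000)

apparent = accuracy(train_x, train_y, train_x, train_y)      # evaluated on the training data
out_of_sample = accuracy(train_x, train_y, test_x, test_y)   # evaluated on new data
print("Apparent accuracy:     ", apparent)
print("Out-of-sample accuracy:", out_of_sample)
```

Comparing the first number with the performance of doctors would say nothing about how the algorithm behaves on new patients.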
50. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Algorithms, the environment and costs
The cost of running the Transformer algorithm (cloud computing)
is estimated at 1 to 3 million dollars
https://bit.ly/33Dj38X
51. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
Final remarks
• Algorithms are high maintenance
• Developed models need repeated testing and updating to
remain useful over time and place
• Many new barriers: black box proprietary algorithms, computing
costs
• Regulation and quality control of algorithms
• New data quality issues
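One simple form of model updating is to re-estimate only the intercept when a model is moved to a setting with a different event rate. The sketch below is a minimal illustration of that idea, assuming predicted risks strictly between 0 and 1 and an observed event rate strictly between 0 and 1. It searches by bisection for the shift on the logit scale at which the mean updated risk equals the observed event rate, which is the condition the maximum likelihood estimate of an intercept-only update satisfies. The function `intercept_update` and the data are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(z):
    return 1 / (1 + math.exp(-z))

def intercept_update(pred_risks, outcomes, tol=1e-10):
    """Find the logit-scale shift that makes mean predicted risk match the observed event rate."""
    target = sum(outcomes) / len(outcomes)
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        mean_pred = sum(expit(logit(p) + mid) for p in pred_risks) / len(pred_risks)
        if mean_pred < target:
            lo = mid   # updated risks still too low: shift further up
        else:
            hi = mid   # updated risks too high: shift down
    return (lo + hi) / 2

# Hypothetical example: the original model predicts 20% risk for everyone,
# but 4 of 10 patients in the new setting experience the outcome.
old_risks = [0.2] * 10
new_outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
delta = intercept_update(old_risks, new_outcomes)
updated_risks = [expit(logit(p) + delta) for p in old_risks]
print("Shift on logit scale:", round(delta, 4))
print("Updated risk:", round(updated_risks[0], 4))
```

This corrects only the average level of risk; changes in predictor effects over time or place would need more extensive updating and renewed validation.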
52. Freiburg, 9 March 2019 Twitter: @MaartenvSmeden
https://twitter.com/DrHughHarvey/status/1230218991026819077