1. Texas State University SHLC Presentation
Shaun Comfort, MD, MBA
Associate Director of Risk Management
Genentech, A Member of the Roche Group
This presentation represents the opinions of Dr. Comfort,
and not those of Genentech, A Member of the Roche Group.
2. Common Buzzwords
• Artificial Intelligence (AI) - The theory and development of computer
systems able to perform tasks that normally require human intelligence, such
as vision, speech recognition, decision-making, and translation. (Source:
Google Search)
• Machine Learning (ML) - A type of artificial intelligence that provides
computers with the ability to learn without being explicitly programmed.
(Source: WhatIs.com)
• Predictive Analytics (PA) - Predictive analytics uses statistical
algorithms and machine learning techniques to identify the likelihood of
future outcomes based on historical data. (Source:
https://www.sas.com/en_us/insights/analytics/predictive-analytics.html).
• For this presentation, we assume that ML and PA are synonyms.
3. Machine Learning
Source: Downloaded Google Images
Unlike traditional programming (aka “coding”), ML uses a
set of input data and the answers (aka “output”,
“response”, etc) to build a program
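The idea on this slide can be sketched in a few lines of code. This is a minimal illustration, not part of the original presentation: instead of hand-coding rules, we hand the algorithm inputs and known answers and let it build the "program" (here using scikit-learn's logistic regression on invented data).

```python
# Minimal sketch: the "program" is learned from inputs + answers,
# not written by hand. Data below is invented for illustration.
from sklearn.linear_model import LogisticRegression

# Inputs: [temperature_F]; answers: has_fever (1 = yes, 0 = no)
inputs = [[97.0], [98.2], [98.6], [100.8], [101.5], [103.0]]
answers = [0, 0, 0, 1, 1, 1]

model = LogisticRegression().fit(inputs, answers)  # the "learning" step
print(model.predict([[98.0], [102.0]]))            # apply the learned program
```

The `fit` call plays the role of the green box in typical ML diagrams: it consumes data plus answers and emits a reusable decision rule.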
4. Some Applications of ML
General “Supervised” Learning Flow and Examples:
Input(s)                  Fitting Function(s)   Output(s)
Annotated Emails          Naïve Bayes           Spam (Y/N?)
Annotated Financial Data  CART/Partition        Fraud (Y/N?)
Annotated Google Images   Deep ANN(s)           Image ID (Cat Y/N?)
Annotated Starfield Maps  Deep ANN(s)           Asteroid (Y/N?)
Source: Adapted from Andrew Ng Lecture: Artificial Intelligence is the New Electricity,
Stanford MSx Future Forum. January 25, 2017
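As a hedged sketch of the first table row (annotated emails → Naïve Bayes → spam yes/no), the snippet below trains a tiny spam filter. The emails and labels are invented for illustration only.

```python
# Naive Bayes spam filter sketch: annotated emails in, spam/ham out.
# All training data here is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "free money claim your prize",
    "meeting agenda for monday", "project status update attached",
]
labels = ["spam", "spam", "ham", "ham"]  # the human annotations

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)
print(clf.predict(["claim your free prize"]))
```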
5. What Can ML Do for Healthcare?
Some Potential Examples for “Supervised” ML:
Input(s)                            Fitting Function(s)                Output(s)
EHRs, Lab Data                      CART/Partition                     Predict high-risk re-admit patients
EHRs, Lab Data                      CART/Partition                     Medical diagnostic decision trees
Payer Data, EHRs                    Deep ANN(s), Log Regression, etc   ID adverse events
Hospital Operations, Pharmacy Data  Deep ANN(s)                        Improve efficiency
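The first row of the table (EHR and lab data → CART/partition → re-admission risk) can be sketched with a small decision tree. The features and patient records below are invented stand-ins, not drawn from any real EHR.

```python
# CART-style decision tree flagging high re-admission-risk patients.
# Features and data are illustrative, not from a real EHR.
from sklearn.tree import DecisionTreeClassifier

# Features: [age, prior_admissions, abnormal_lab_count]
X = [[45, 0, 1], [80, 4, 6], [62, 1, 2], [75, 3, 5], [30, 0, 0], [68, 2, 4]]
y = [0, 1, 0, 1, 0, 1]  # 1 = re-admitted within 30 days

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(tree.predict([[78, 5, 7]]))  # a high-risk-looking patient
```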
6. SML/PA Process
• What question do you want to answer (eg, identify high
utilizers for intervention, predict re-admissions)?
• What data do you have to train a model?
• What features (ie, predictors, factors, etc) in your data do
you want to use?
• What kind of model do you want to use (eg, K-NN, ANN,
Logistic Regression, Random Forests, etc)?
• What metrics must your model meet (eg, high precision, high
predictability)?
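The five questions above can be sketched as one minimal end-to-end workflow. The dataset, feature cut, and model choice below are illustrative stand-ins, not from the presentation.

```python
# The five SML/PA questions as a minimal supervised-learning workflow.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

# 1. Question: can we classify tumors as malignant vs benign?
X, y = load_breast_cancer(return_X_y=True)       # 2. Training data
X = X[:, :5]                                     # 3. Feature choice (first 5, for brevity)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=5000)        # 4. Model choice
model.fit(X_tr, y_tr)
precision = precision_score(y_va, model.predict(X_va))  # 5. Metric to meet
print(f"validation precision = {precision:.2f}")
```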
7. Performance Yardsticks
Some common ‘metrics’ used to gauge a model’s
performance include:
Classification Models (eg, High Re-Admission Risk Y/N?,
Adverse Event Y/N?, etc)
Inter-rater Agreement Scores (eg, Kappa), Sensitivity, Specificity,
Precision, Recall, False Positive/Negative Rate
Regression Models (eg, Forecasting Hospital Census,
Resource modeling, etc)
Root Mean Square Forecast Error, etc
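The classification yardsticks listed above fall straight out of confusion-matrix counts. The function below is a sketch; the counts passed in at the bottom are illustrative numbers, not from the presentation.

```python
# Classification metrics computed from confusion-matrix counts.
def classification_metrics(tp, fp, tn, fn):
    sensitivity = tp / (tp + fn)          # a.k.a. recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    fpr = fp / (fp + tn)                  # false positive rate
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "false_positive_rate": fpr,
            "f_score": f_score}

# Illustrative counts only
print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```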
8. IV Catheter Insertion Example
Mann et al. (2014) identified predictive variables for successful
intravenous (IV) catheter insertion based on data from 592
children in two hospitals.
The dataset, provided with JMP-SAS software, was used for this exercise.
Goal:
Predict - Prob(success) of starting an IV on the first try using 17 features:
Mean Difficulty, Mean Nurse Experience, Active Minutes, Nurse
Competency Scores, etc
Technique – Random Forest using Bootstrap Aggregation, 416/176
Training/Validation cases, Trees/Forest = 6, Terms/Split = 4
Source: J. Mann, P. Larsen, and J. Brinkley, "Exploring the use of negative binomial regression modeling for pediatric peripheral
intravenous catheterization", Journal of Medical Statistics and Informatics, Vol. 2, Article 6, 2014
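The setup above (Random Forest with bootstrap aggregation, 416/176 training/validation split, 6 trees, 4 terms per split) was run in JMP-SAS. A hedged scikit-learn re-creation is sketched below; the data is synthetic, not Mann's, so only the configuration mirrors the slide.

```python
# Random Forest with bootstrap aggregation, mirroring the slide's settings:
# Trees/Forest = 6 -> n_estimators=6; Terms/Split = 4 -> max_features=4.
# Synthetic stand-in data (592 cases, 17 features), not the Mann dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=592, n_features=17, random_state=0)
# 416 training / 176 validation cases, as in the presentation
X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=416, random_state=0)

rf = RandomForestClassifier(n_estimators=6, max_features=4,
                            bootstrap=True, random_state=0).fit(X_tr, y_tr)
print(f"validation accuracy = {rf.score(X_va, y_va):.2f}")
```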
10. Model Performance Results
Key Results are as follows:
Generalized r2 = 0.79, Misclassification Rate = 10%
Sensitivity (Pos % Agreement) = 79.7%*
Specificity (Neg % Agreement) = 97.1%*
Positive Predictive Value (Precision) = 95.2%*
Recall (% of actual positives correctly identified) = 79.7%*
Gwet AC1 Kappa = 80.6%*
F-Score = 86.8%*
ROC Area Under the Curve = 0.97*
Conclusion – Model shows high predictive agreement with
actual validation (hold out) data
*Results based on the validation (not training) data set. All analysis performed using JMP-SAS 13 Pro
11. Catheter Insertion Example, cont.
Resulting ROC curve and confusion matrix with validation data (Success = 1):
[Figure: ROC curve on validation data]

Confusion Matrix (actual counts vs. predicted counts):
                Predicted
Actual          No    Yes
No              99     3
Yes             15    59
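The slide-10 metrics can be reproduced directly from these confusion-matrix counts, a useful sanity check when reading any model report:

```python
# Reproduce the reported metrics from the validation confusion matrix
# (counts taken from the slide above: TN=99, FP=3, FN=15, TP=59).
tn, fp, fn, tp = 99, 3, 15, 59

misclassification = (fp + fn) / (tn + fp + fn + tp)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)   # precision / positive predictive value

print(f"misclassification = {misclassification:.1%}")
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}, "
      f"PPV = {ppv:.1%}")
```

These recover the values reported on slide 10 (misclassification ≈ 10%, sensitivity 79.7%, specificity 97.1%, PPV 95.2%).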
12. Catheter Insertion Example, cont.
Which features were most important?
The mean difficulty score was by far the most important, followed by mean
nurse experience, active minutes, and competency at IV placements.
Term                 Number of Splits   G^2          Portion
Mean Difficulty      22                 163.388829   0.6582
Mean Nurse Exp       13                 15.7981274   0.0636
Mean Active Minutes  16                 13.5956915   0.0548
Mean Nurse Comp      13                 12.2800927   0.0495
Mean Distress        12                 8.05889412   0.0325
Weight               4                  6.83369539   0.0275
Lost IV              9                  4.94248458   0.0199
Shift                7                  4.6603365    0.0188
Mean Cooperative     8                  4.18512134   0.0169
Age                  7                  3.934808     0.0159
Gender               7                  2.20229066   0.0089
Device Assisted      4                  2.15128853   0.0087
Dehydrated           7                  1.99162831   0.0080
Previous IV          6                  1.62364168   0.0065
Counselor Present    4                  1.17042892   0.0047
Family Present       7                  0.87367306   0.0035
Expert RN            1                  0.32490665   0.0013
Support Present      3                  0.23575945   0.0009
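The Portion column above is JMP's measure of how much each feature contributes to the tree splits; scikit-learn's rough analogue is the `feature_importances_` attribute, which likewise sums to 1. A sketch on synthetic data:

```python
# Feature-importance ranking with a Random Forest; synthetic data stands in
# for the catheter dataset. Importances sum to 1, like the Portion column.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Rank features by their share of the model's split quality
ranked = sorted(enumerate(rf.feature_importances_), key=lambda t: -t[1])
for idx, imp in ranked:
    print(f"feature {idx}: {imp:.3f}")
```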
13. Conclusions
Catheter Insertion Prediction Model –
Random Forest Model with high predictive ability to estimate
chance of successful pediatric IV insertion on 1st try
Key insight - the mean difficulty assessment score for the IV
placement is by far the most predictive feature determining the
outcome
Nurse experience, active time spent on IV insertions, and
competency scores on IV insertions are the next most important
predictors, so:
Use your most experienced IV RNs on your most difficult patients to
maximize ‘first time’ successful insertion
Train and ‘score’ your IV insertion nurses to assess competency, etc for
successful insertions
14. Some Final Thoughts
The Good News –
MLPA techniques have been used with great success in many industries
The rise of large datasets, hardware advancements, and investments in AI
are paying off with the rush towards Supervised Machine Learning
solutions
The Health Care Industry (ie, Medicine, HC Delivery, Pharma, Med Dev, etc)
can derive similar benefits with appropriate adoption of this technology
The Bad News: Garbage In “Still” = Garbage Out (GIGO)
Not even super-AI can develop meaningful insights from trash
Invest in collecting/cleaning your data appropriately.
Solid, clean data is “gold dust” for predictive modelling. Treat it as such!!
Compare your model results to human subject matter expert performance
whenever possible; this is your best ‘ground truth’