3. My solution
RiskPredictor
A tool that identifies people at risk¹ and their underlying conditions
¹ Risk is defined as an individual's predicted healthcare cost
4. The data: unique, rich and messy
Unique, de-identified data set from Zakipoint that combines three types of information: claims, biometric and behavioral data
The data sets contain about 250,000 rows covering about 2,000 people
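With about 250,000 rows for about 2,000 people, the raw files are presumably at claim-line level and must be collapsed to one record per person before modeling. A minimal sketch, assuming each row is a hypothetical `(person_id, icd9_code, cost)` tuple; the actual Zakipoint schema is not shown in the deck.

```python
from collections import defaultdict

# Hypothetical claim-line rows: (person_id, icd9_code, cost).
# Field names and values are illustrative assumptions, not the real schema.
claims = [
    ("p1", "250.00", 120.0),
    ("p1", "401.9", 80.0),
    ("p2", "278.00", 200.0),
    ("p1", "250.00", 50.0),
]

def aggregate_claims(rows):
    """Collapse claim-level rows into one record per person:
    total cost plus the set of distinct ICD-9 codes seen."""
    total_cost = defaultdict(float)
    codes = defaultdict(set)
    for person_id, icd9, cost in rows:
        total_cost[person_id] += cost
        codes[person_id].add(icd9)
    return {
        pid: {"total_cost": total_cost[pid], "icd9_codes": sorted(codes[pid])}
        for pid in total_cost
    }

people = aggregate_claims(claims)
print(people["p1"])  # → {'total_cost': 250.0, 'icd9_codes': ['250.00', '401.9']}
```

The per-person dictionaries are then the natural place to attach the biometric and behavioral fields before building features.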
6. Model performs very well
Model performance is in line with proprietary programs that cost $100,000+
There is room for improvement (more data, more features, etc.)
[Chart: R² by model for RiskPredictor - Random Forest, ACG¹ (Commercial), RiskPredictor - Linear (Ridge) and MARA¹ (Commercial); values shown: 20.5%, 29.7%, 34.4%, 57.9%]
¹ http://us.milliman.com/mara/
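The comparison above uses R², the share of the variance in actual cost that a model's predictions explain (1 minus residual sum of squares over total sum of squares). A minimal sketch of that formula on made-up numbers; the deck does not say whether its R² values were computed on held-out data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total variation
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot

# Toy annual costs (invented for illustration, not Zakipoint data)
actual = [1000.0, 2000.0, 3000.0, 4000.0]
predicted = [1500.0, 1800.0, 3200.0, 3500.0]
print(round(r_squared(actual, predicted), 3))  # → 0.884, i.e. 88.4%
```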
12. Model performs very well
There is room for improvement (more data, more features, etc.)…
… but model performance is in line with proprietary programs that cost $100,000+
[Chart: R² by model for RiskPredictor - Random Forest, ACG¹ (Commercial), RiskPredictor - Linear (Ridge) and MARA¹ (Commercial); values shown: 20.5%, 29.7%, 34.4%, 57.9%]
¹ http://us.milliman.com/mara/
18. Data
Unique data set from zph that combines:
1) Claim information (ICD-9 codes, medical costs, gender, age)
2) Biometric information (BMI, BP, cholesterol, A1C, etc.)
3) Behavioral information (HRA and wellness program participation)
The dataset covers about 2,000 lives and is provided as masked (de-identified) CSV files
[Chart: number of people by BMI category (Obese, Overweight, Normal, Underweight); counts shown: 5 (Underweight), 245, 430, 411]
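The BMI chart groups people into four categories. A minimal sketch of how such a grouping can be derived from the biometric fields, assuming the standard adult WHO/CDC cutoffs (18.5, 25 and 30 kg/m²); the deck does not state which cutoffs were actually used, and the sample values are invented.

```python
from collections import Counter

def bmi(weight_kg, height_m):
    """Body mass index in kg/m²."""
    return weight_kg / height_m ** 2

def bmi_category(value):
    # Assumed standard adult cutoffs; not confirmed by the source.
    if value < 18.5:
        return "Underweight"
    if value < 25.0:
        return "Normal"
    if value < 30.0:
        return "Overweight"
    return "Obese"

# Invented BMI values, for illustration only
sample = [17.9, 22.4, 27.0, 31.5, 29.99, 35.2]
counts = Counter(bmi_category(v) for v in sample)
print(counts)
```

Counting categories this way yields the kind of per-category totals the chart shows.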
19. Algorithm anatomy
Raw data (ICD-9 codes, BMI)
→ Thousands of diagnostic features (e.g. diabetes, obesity)
→ Regression (linear and random forest)
→ Predicted cost (risk scores)
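The pipeline above can be sketched end to end for its linear branch: map each person's ICD-9 codes to condition indicators, fit a ridge regression on cost, and use the predictions as risk scores. This is a minimal, dependency-free sketch under loud assumptions: `CONDITION_PREFIXES` is a hypothetical three-condition mapping (real groupers cover thousands of codes), the data are invented, and the random forest branch is not shown (in practice it would typically use a library such as scikit-learn).

```python
# Hypothetical, heavily simplified ICD-9-CM prefix → condition mapping.
CONDITION_PREFIXES = {
    "diabetes": ("250",),
    "obesity": ("278.00", "278.01"),
    "hypertension": ("401", "402", "403", "404", "405"),
}
FEATURES = sorted(CONDITION_PREFIXES)  # column order: diabetes, hypertension, obesity

def featurize(icd9_codes):
    """One 0/1 indicator per condition: does any code start with a listed prefix?"""
    return [
        1.0 if any(c.startswith(p) for c in icd9_codes for p in CONDITION_PREFIXES[name]) else 0.0
        for name in FEATURES
    ]

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x

def fit_ridge(X, y, alpha=1.0):
    """Ridge regression with an unpenalized intercept: solve (AᵀA + alpha·D) w = Aᵀy."""
    A = [[1.0] + row for row in X]  # prepend intercept column
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # start at 1 so the intercept is not shrunk
        ata[i][i] += alpha
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    return solve(ata, aty)

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

# Invented example: four people's codes and annual costs
people_codes = [["250.00", "401.9"], ["278.01"], ["V70.0"], ["250.02", "278.00", "401.1"]]
costs = [9000.0, 6000.0, 1500.0, 15000.0]
X = [featurize(c) for c in people_codes]
w = fit_ridge(X, costs, alpha=0.5)
risk_scores = predict(w, X)
print([round(s) for s in risk_scores])  # → [9425, 6325, 2825, 12925]
```

Ridge is a reasonable default here because diagnosis indicators are often strongly correlated (in the toy data, diabetes and hypertension always co-occur), and the penalty keeps the system solvable and the weights stable.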