FlorenceAI: Reinventing Data Science at Humana

David Mack, PhD
Cognitive/Machine Learning Principal
AI Engineering, Digital Health and Analytics
A more human way to healthcare™

David Mack, PhD – Cognitive/Machine Learning Principal
I have worked at Humana for 5½ years in clinical and enterprise
data science. For the past 2 years I have been one of the primary
architects and maintainers of Humana’s ML Platform, which now
serves hundreds of data scientists. I love to tinker with
homemade IoT devices, build cool stuff, and learn new things!
Humana’s bold goal is to address the needs of the whole person
Have focused on community partnerships and social determinants of health
Commitment to help our millions of members achieve their best health
Fortune 50 company with $77.2bn consolidated revenue in 2020
Humana has invested significant resources into fighting:
• COVID-19 Pandemic
• Food Insecurity
• Loneliness and Social Isolation
• Inequities in Healthcare
Formed Digital Health and Analytics Organization in 2018
Through advanced analytics, experiential design, data and technology we are
working to meet our associates, members and the communities we serve,
anytime, anywhere, anyhow

What exactly is FlorenceAI*?
A cloud platform for automating and accelerating the delivery
lifecycle of data science solutions at scale in Azure
Key Foundational Pillars
• Feature stores
• Starter code frameworks
• Notebook based workflow
• Prod deployment partnership
• Extensive training curriculum
End-to-end ecosystem benefits
• Empowers data scientists to solve complex problems
• Promotes access to open-source innovation
• Simplifies model consumption with single interface
• Transforms workflows to improve performance
[Figure: platform components on the Microsoft Azure Cloud, grouped into Foundational Components and Other Key Tools]
* Patent Pending

Feature Stores – Quality Ingredients for ML Algorithms
Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years
Extensive Metadata
• Standard descriptions
• Centralized ref tables
• Ratings to identify any quality impacts
• Enables discovery and exploration
Economies of Scale
• Pre-computed for entire population
• Refreshed regularly at different cadences
• Production ready and pre-validated
Flexible but Specific
• Designed to cover most use cases
• Domain expertise in feature design
• Self-service for custom situations

End-to-End Process
Cohort Design
Initial Feature Selection
Model Training Experiments
Score and Register Best Model
Record Training Artifacts
Scoring Code and Testing
Promote Model and Automate Scoring

Example Problem to Help Trace the Workflow
Predict the most severe stage of Chronic Kidney Disease in the next 6 months
Criteria to Define the Cohort
• Age ≥ 65, Medicare Advantage
• Evidence of CKD stage in Medical Claims or Lab Results
• Continuous enrollment: over 11 months of enrollment in the 12 months of history
• Fixed calendar date, with 12 months of history before it and 6 months looking forward after it
All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security
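The cohort criteria can be sketched as a simple eligibility check. This is a minimal illustration only: the `Member` fields, the `"MA"` plan code, the anchor date, and the reading of "over 11 months" as "at least 11 of the prior 12 months" are all assumptions made for the example, not Humana's actual schema or logic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    # All field names are hypothetical stand-ins for redacted columns.
    birth_date: date
    plan_type: str                  # "MA" is an assumed code for Medicare Advantage
    enrolled_months_prior_12: int   # months enrolled in the 12 months of history
    has_ckd_evidence: bool          # CKD stage seen in medical claims or lab results


ANCHOR_DATE = date(2021, 1, 1)      # assumed fixed calendar date


def age_on(birth_date: date, as_of: date) -> int:
    """Age in whole years on the given date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def in_cohort(m: Member, anchor: date = ANCHOR_DATE,
              min_enrolled_months: int = 11) -> bool:
    """Apply the four cohort criteria as of the anchor date."""
    return (
        age_on(m.birth_date, anchor) >= 65
        and m.plan_type == "MA"
        and m.enrolled_months_prior_12 >= min_enrolled_months
        and m.has_ckd_evidence
    )
```

On the platform itself these filters would more likely be expressed as Spark DataFrame operations over the whole population; the plain-Python form is used here only to make each criterion explicit.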

Initial Feature Selection and
Traditional Model Training

Walkthrough: Initial Feature Selection Notebook
Goal: Identify hundreds of important features among tens of thousands

First Round of Model Experimentation using SparkML
A helper function to execute the run is available in the shared “experiment utility”

Arrive at a “Best Model” using SparkML
A different helper function saves the best model and provides more details
Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early
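To illustrate why accuracy alone can mislead on an imbalanced multiclass target, here is a minimal sketch with made-up labels. The function names are hypothetical and are not the helper functions from the slides; the `(actual, predicted)` counts are the kind of data a confusion-matrix heatmap displays.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) pairs: the cells of a confusion matrix."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each actual class, the fraction predicted correctly."""
    counts = confusion_counts(y_true, y_pred)
    support = Counter(y_true)
    return {label: counts[(label, label)] / n for label, n in support.items()}


# Made-up data: 8 members in stage 3, 2 in stage 4.
y_true = [3] * 8 + [4] * 2
# A model that always predicts the majority class.
y_pred = [3] * 10

print(accuracy(y_true, y_pred))          # 0.8
print(recall_per_class(y_true, y_pred))  # {3: 1.0, 4: 0.0}
```

The 80% accuracy looks acceptable, but the per-class view shows that the minority stage is never predicted, which is the kind of mistake a heatmap makes visible early.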

Walkthrough: SparkML Helper Functions
Goals: Abstract complexity and standardize logging

Encouraging Reproducibility with Reusable Code
What items are automatically saved to the MLFlow run? (Workspace scoped)
• Hyperparameters
• Relevant Metrics
• MLFlow model object
• Evaluation Metric Figure (Downloadable)
What other artifacts are saved to ADLS? (Storage account scoped)
• Original Input Schemas before any indexing or feature prep
• Original Training and Test Datasets with just selected features
• String Indexes and Imputation Dictionaries (outside of pipeline models)
• Best Model Scores from both training and test data

Applying Deep Neural Networks
to Tabular Data at Scale

Key Distinctions of Deep Neural Networks
Multiclass example
Learns over repeated passes called “epochs”
What extra things can we do to help us decide which model is the best?
• Use early stopping to minimize training time and combat overfitting
• Use callbacks to log values at the end of each epoch
• Test on smaller chunks of data and scale up as we learn more
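The early-stopping and per-epoch callback ideas can be sketched without any deep learning framework. The function below is a hypothetical illustration that walks over a list of already-computed validation losses; in a real Keras training loop the same roles are typically played by `tf.keras.callbacks.EarlyStopping` and a logging callback.

```python
def train_with_early_stopping(val_losses, patience=2, on_epoch_end=None):
    """Stop once validation loss fails to improve for `patience` epochs in a row.

    `val_losses` stands in for the loss measured after each epoch.
    Returns (best_epoch, best_loss, epochs_run).
    """
    best_loss = float("inf")
    best_epoch = -1
    stale_epochs = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        epochs_run = epoch + 1
        if on_epoch_end is not None:
            on_epoch_end(epoch, loss)  # callback: log the value for this epoch
        if loss < best_loss:
            best_loss, best_epoch, stale_epochs = loss, epoch, 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break                  # stop early: no recent improvement
    return best_epoch, best_loss, epochs_run


history = []
result = train_with_early_stopping(
    [0.90, 0.70, 0.65, 0.66, 0.68, 0.60],
    patience=2,
    on_epoch_end=lambda epoch, loss: history.append((epoch, loss)),
)
print(result)  # (2, 0.65, 5)
```

Training stops after the fifth epoch, so the sixth epoch (loss 0.60) is never reached. That is the trade-off of a small patience value: less training time, at the risk of stopping before a later improvement.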

Bayesian Hyperparameter Searching with Hyperopt
Attempts to minimize our loss function
Can set our hyperparameter space and the number of trials we want to run
Used a sample of our training data to go quickly over the 20 trials we chose to run

MLFlow has a Handy Comparison Tool to Help us Focus
Quick insights:
• A complex Layer 1 combined with a complex Layer 2 doesn’t do well
• A complex Layer 1 combined with a simpler Layer 2 does much better
Can highlight ranges to focus our attention

Let’s use MORE Data with Distributed Training!
• Driver only: 1 MM members, 1 worker, 6 sec per epoch. Lots of trials to narrow down our choices
• Petastorm: 10 MM members, 1 worker, 63 sec per epoch. Using all the data, but takes forever
• Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch. Train on all the data much more quickly
We generally see a sqrt(n) speedup over a single worker when training on n workers
Using Petastorm and Horovod, we used all the data and trained 4.5x faster
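The arithmetic behind these two statements can be checked directly from the per-epoch timings on the slide. The sqrt(n) rule is the speaker's rule of thumb; the function name here is just an illustrative label for it.

```python
import math


def rule_of_thumb_speedup(n_workers):
    """Expected speedup over one worker under the sqrt(n) rule of thumb."""
    return math.sqrt(n_workers)


single_worker_sec_per_epoch = 63   # 10 MM members, 1 worker
distributed_sec_per_epoch = 14     # 10 MM members, 16 workers
n_workers = 16

observed_speedup = single_worker_sec_per_epoch / distributed_sec_per_epoch

print(rule_of_thumb_speedup(n_workers))  # 4.0
print(observed_speedup)                  # 4.5
```

With 16 workers the rule of thumb predicts about a 4x speedup, and the observed 63 / 14 = 4.5x is in line with it.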

Walkthrough:
Petastorm and Horovod Helper Functions
Goals: Save headaches and empower data scientists to train on all of the data quickly

We Improved the Precision of our Model!
We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes
• SparkML Logistic Regression: weighted F1 score = 0.615 (weighted precision = 0.633, weighted recall = 0.609)
• TensorFlow NN on all the data: weighted F1 score = 0.615 (weighted precision = 0.646, weighted recall = 0.602)
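For readers unfamiliar with weighted metrics, the sketch below shows one common definition, in which each class's precision, recall, and F1 are averaged using the class's share of the actual labels as the weight. The per-class numbers are made up for illustration, and whether the slide's figures were computed exactly this way is an assumption.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_scores(per_class):
    """per_class maps label -> (precision, recall, support).

    Returns (weighted precision, weighted recall, weighted F1), each a
    support-weighted average of the per-class values.
    """
    total = sum(support for _, _, support in per_class.values())
    prw = sum(p * s for p, _, s in per_class.values()) / total
    rcw = sum(r * s for _, r, s in per_class.values()) / total
    f1w = sum(f1(p, r) * s for p, r, s in per_class.values()) / total
    return prw, rcw, f1w


# Made-up per-class results for three CKD stages.
toy = {
    "stage 3": (0.8, 0.9, 70),
    "stage 4": (0.5, 0.4, 20),
    "stage 5": (0.6, 0.3, 10),
}
prw, rcw, f1w = weighted_scores(toy)
```

Under this definition the weighted F1 is an average of per-class F1 scores, not the harmonic mean of the weighted precision and weighted recall, so the three reported numbers need not satisfy the usual F1 formula with each other.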

Register, Score, and Preserve
the Model Before Deploying
it to Production

Scoring with a Spark UDF from MLFlow
• This allows us to easily get the scores into a Spark dataframe from any MLFlow model
• Can repeat for other types of targets or our training DF

Registering the Model
Model Metadata (screenshot from the Models tab in the DB workspace)
First registered in the data scientist’s dev DB workspace
The data scientist promotes it to “production” status in the dev workspace after review
The associated MLFlow run is also used to register it in our “production” workspace for automated jobs
This newly registered model is the official version used for automated scoring
The path within the ADLS storage account contains the version so we can support multiple versions at the same time
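One way to picture version-specific paths is a small path builder like the one below. The directory layout, the `v` prefix, and the names `models` and `ckd_stage` are purely hypothetical; the actual ADLS layout is not shown in the slides.

```python
import posixpath


def model_artifact_path(root, model_name, version):
    """Build a storage path that embeds the registered model version."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return posixpath.join(root, model_name, f"v{version}")


# Two versions of the same (hypothetical) model live side by side.
print(model_artifact_path("models", "ckd_stage", 1))  # models/ckd_stage/v1
print(model_artifact_path("models", "ckd_stage", 2))  # models/ckd_stage/v2
```

Because each version resolves to its own directory, artifacts for an older version are not overwritten when a newer version is registered.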

Production Deployment Pipeline – Notebook-based Workflow
Key Requirements
• Use Azure DevOps to deploy code to various environments for testing and execution
• Tie execution to specific package versions and LTS non-ML Databricks Runtimes
• Use ADF Parameters to provide flexibility to minimize YAML code duplication
Reusable framework of 3 notebooks: Feature Engineering, Scoring, Validation
Upstream dependency check to prevent the flow of bad data and errors from missing data
Logging via SQL Server to record both success and failure
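The dependency check and run logging can be sketched as follows. Everything here is an assumption made for illustration: the row-count and freshness thresholds, the `run_log` table, and the use of an in-memory SQLite database as a stand-in for SQL Server so the example runs anywhere.

```python
import sqlite3
from datetime import date, timedelta


def upstream_is_ready(row_count, last_refresh, run_date,
                      min_rows=1, max_age_days=1):
    """Return (ok, reason) for a hypothetical upstream table."""
    if row_count < min_rows:
        return False, "upstream table has too few rows"
    if run_date - last_refresh > timedelta(days=max_age_days):
        return False, "upstream table is stale"
    return True, "ok"


def log_run(conn, step, status, detail):
    """Record the outcome of a step, whether it succeeded or failed."""
    conn.execute(
        "INSERT INTO run_log (step, status, detail) VALUES (?, ?, ?)",
        (step, status, detail),
    )
    conn.commit()


def run_scoring(conn, row_count, last_refresh, run_date):
    ok, reason = upstream_is_ready(row_count, last_refresh, run_date)
    if not ok:
        log_run(conn, "dependency_check", "FAILURE", reason)
        return False
    # The Feature Engineering, Scoring, and Validation notebooks would run here.
    log_run(conn, "scoring", "SUCCESS", "scored after passing dependency check")
    return True


conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server log database
conn.execute("CREATE TABLE run_log (step TEXT, status TEXT, detail TEXT)")
today = date(2021, 5, 1)
run_scoring(conn, 1000, today, today)  # healthy upstream data
run_scoring(conn, 0, today, today)     # empty upstream table
rows = conn.execute(
    "SELECT step, status FROM run_log ORDER BY rowid"
).fetchall()
```

Checking the upstream data before any notebook runs means a bad or missing feed produces a logged failure instead of a set of silently wrong scores.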

Partnership Between Data Scientists and AI Engineers is Pivotal
Each of the files required for deployment is part of the starter repo, which helps the data scientist keep the end goal in view from the beginning
Each model is initially reviewed and subsequently monitored for AI bias in key areas
All models are peer reviewed for both domain and technical accuracy prior to production deployment

Key Early Wins – big steps forward
Scaling and automating clunky processes
• Scaled from fewer than 40 condition flags on-premises to more than 3x that number in the cloud
• Got contributions from multiple teams following templates
• Now updates over 1 bn rows daily in 1.5 hours for the entire member population
Faster prep, more iterations, better tuning and collaboration
• Reduced the feature engineering step on a very large source from hours to a few minutes
• Enabled the DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models
• Reduced the scoring step on prospective members from a week to 30 minutes
Shared resources accelerate everyone
• Hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches
• Flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format
