FlorenceAI
Reinventing Data Science at Humana
David Mack, PhD
Cognitive/Machine Learning Principal
AI Engineering, Digital Health and Analytics
A more human way to healthcare™
David Mack, PhD – Cognitive/Machine Learning Principal
I have worked at Humana for 5½ years in clinical and enterprise
data science. I have been one of the primary architects and
maintainers of Humana’s ML Platform for the past 2 years that
now serves hundreds of data scientists. I love to tinker with
homemade IoT devices, build cool stuff, and learn new things!
Humana’s bold goal is to address the needs of the whole person
Have focused on community partnerships and social determinants of health
Commitment to help our millions of members achieve their best health
Fortune 50 company with $77.2bn consolidated revenue in 2020
Humana has invested significant resources into fighting:
• COVID-19 Pandemic
• Food Insecurity
• Loneliness and Social Isolation
• Inequities in Healthcare
Formed Digital Health and Analytics Organization in 2018
Through advanced analytics, experiential design, data and technology we are
working to meet our associates, members and the communities we serve,
anytime, anywhere, anyhow
What exactly is FlorenceAI*?
A cloud platform for automating and accelerating the delivery
lifecycle of data science solutions at scale in Azure
Key Foundational Pillars
• Feature stores
• Starter code frameworks
• Notebook based workflow
• Prod deployment partnership
• Extensive training curriculum
End-to-end ecosystem benefits
• Empowers data scientists to solve complex problems
• Promotes access to open-source innovation
• Simplifies model consumption with single interface
• Transforms workflows to improve performance
[Diagram: Microsoft Azure Cloud, showing Foundational Components and Other Key Tools]
* Patent Pending
Feature Stores – Quality Ingredients for ML Algorithms
Extensive Metadata
• Standard descriptions
• Centralized ref tables
• Ratings to identify any quality impacts
• Enables discovery and exploration
Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years
Economies of Scale
• Pre-computed for entire population
• Refreshed regularly at different cadences
• Production ready and pre-validated
Flexible but Specific
• Designed to cover most use cases
• Domain expertise in feature design
• Self-service for custom situations
End-to-End Process
• Cohort Design
• Initial Feature Selection
• Model Training Experiments
• Score and Register Best Model
• Record Training Artifacts
• Scoring Code and Testing
• Promote Model and Automate Scoring
Example Problem to Help Trace the Workflow
Predict the most severe stage of Chronic Kidney Disease (CKD) in the next 6 months
Criteria to Define the Cohort
• Fixed calendar date
• 12 months of history
• Continuous enrollment: over 11 months of enrollment
• 6 months looking forward
• Age ≥ 65, Medicare Advantage
• Evidence of CKD stage in Medical Claims or Lab Results
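The cohort criteria can be sketched as a simple eligibility check. This is only an illustration: the deck does not show the cohort code, so the `Member` class, its field names, and the `"MA"` plan code are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # All field names are hypothetical, chosen only for this illustration.
    age: int
    plan_type: str                    # "MA" stands in for Medicare Advantage
    months_enrolled_in_history: int   # months enrolled during the 12-month history window
    has_ckd_stage_evidence: bool      # CKD stage found in medical claims or lab results


def in_cohort(m: Member) -> bool:
    """Apply the slide's criteria as of the fixed calendar date."""
    return (
        m.age >= 65
        and m.plan_type == "MA"
        and m.months_enrolled_in_history > 11   # "over 11 months of enrollment"
        and m.has_ckd_stage_evidence
    )


members = [
    Member(70, "MA", 12, True),
    Member(64, "MA", 12, True),    # too young
    Member(72, "MA", 10, True),    # not continuously enrolled
    Member(80, "MA", 12, False),   # no CKD evidence
]
cohort = [m for m in members if in_cohort(m)]
print(len(cohort))
```

Only the first member passes every criterion; each of the others fails exactly one, which makes the individual rules easy to test in isolation.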
All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security
Initial Feature Selection and Traditional Model Training
Walkthrough: Initial Feature Selection Notebook
Goal: Identify hundreds of important features among tens of thousands
First Round of Model Experimentation using SparkML
Helper function to execute the run, available in the shared “experiment utility”
Arrive at a “Best Model” using SparkML
Different helper function to save the best model and provide more details
Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early
Walkthrough: SparkML Helper Functions
Goals: Abstract complexity and standardize logging
Encouraging Reproducibility with Reusable Code
What items are automatically saved to the MLflow run? (Workspace scoped)
• Hyperparameters
• Relevant Metrics
• MLflow model object
• Evaluation Metric Figure (Downloadable)
What other artifacts are saved to ADLS? (Storage account scoped)
• Original Input Schemas before any indexing or feature prep
• Original Training and Test Datasets with just selected features
• String Indexes and Imputation Dictionaries (outside of pipeline models)
• Best Model Scores from both training and test data
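The idea behind a standardized logging helper can be sketched with the standard library alone. In this sketch a local directory stands in for both the tracking run and the ADLS location; the function name `record_run`, the file names, and the example values are hypothetical and are not the platform's actual helper.

```python
import json
import tempfile
from pathlib import Path


def record_run(root, run_name, params, metrics, input_schema):
    """Write hyperparameters, metrics, and the original input schema for one run.

    A local directory stands in for the tracking run and the artifact store;
    the file names are hypothetical.
    """
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    for name, payload in (
        ("params.json", params),
        ("metrics.json", metrics),
        ("input_schema.json", input_schema),
    ):
        (run_dir / name).write_text(json.dumps(payload, indent=2, sort_keys=True))
    return run_dir


root = tempfile.mkdtemp()
run_dir = record_run(
    root,
    "logreg_baseline",
    params={"regParam": 0.01, "maxIter": 50},           # made-up values
    metrics={"weighted_f1": 0.615},
    input_schema={"age": "int", "plan_type": "string"},  # made-up schema
)
saved = sorted(p.name for p in run_dir.iterdir())
print(saved)
```

Because every run goes through the same function, every run leaves the same set of files behind, which is what makes later comparison and reproduction practical.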
Applying Deep Neural Networks to Tabular Data at Scale
Key Distinctions of Deep Neural Networks
Multiclass example
Learns over repeated passes called “epochs”
What extra things can we do to help us decide which model is the best?
• Use early stopping to minimize training time and combat overfitting
• Use callbacks to log values at the end of each epoch
• Test on smaller chunks of data and scale up as we learn more
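Early stopping and per-epoch logging can be sketched without any deep learning framework. The class below is a minimal stand-in, assuming a `patience` of 2 epochs and made-up validation losses; it is not the callback used in the talk.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0
        self.history = []  # per-epoch log, filled by the callback below

    def on_epoch_end(self, epoch, val_loss):
        """Callback: log the value, then report whether training should stop."""
        self.history.append((epoch, val_loss))
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses for illustration only.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.60]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.on_epoch_end(epoch, loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)
```

Training halts at epoch 5, after two epochs without improvement, so the sixth epoch is never run. That is the trade-off of a small `patience`: less training time, at the risk of missing a later improvement.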
Bayesian Hyperparameter Searching with Hyperopt
Attempts to minimize our loss function
Can set our hyperparameter space and the number of trials we want to run
Used a sample of our training data to go quickly over the 20 trials we chose to run
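The shape of such a search (a hyperparameter space, a fixed number of trials, and a loss to minimize) can be sketched in plain Python. Note the substitution: Hyperopt's Bayesian search is not reproduced here; this sketch uses plain random search instead, and the loss function, hyperparameter names, and candidate values are all made up for illustration.

```python
import random


def toy_loss(params):
    """Made-up loss surface; a real objective would train and validate a model."""
    return (params["learning_rate"] - 0.01) ** 2 + 0.001 * abs(params["layer_1_units"] - 64)


def sample(space, rng):
    """Draw one configuration; every hyperparameter is a list of candidate values."""
    return {name: rng.choice(values) for name, values in space.items()}


def random_search(objective, space, max_trials, seed=0):
    """Evaluate `max_trials` random configurations and keep the lowest loss."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trials):
        params = sample(space, rng)
        trials.append((objective(params), params))
    best_loss, best_params = min(trials, key=lambda t: t[0])
    return best_loss, best_params, trials


space = {
    "learning_rate": [0.001, 0.003, 0.01, 0.03, 0.1],
    "layer_1_units": [16, 32, 64, 128, 256],
}
best_loss, best_params, trials = random_search(toy_loss, space, max_trials=20)
print(best_loss, best_params)
```

A Bayesian method differs in one respect: it uses the results of earlier trials to choose the next configuration, rather than sampling each one independently.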
MLflow Has a Handy Comparison Tool to Help Us Focus
Quick Insights: Complex Layer 1 with Complex Layer 2 doesn’t do well
Complex Layer 1 with Simpler Layer 2 does much better
Can highlight ranges to focus our attention
Let’s use MORE Data with Distributed Training!
• Driver Only: 1 MM members, 1 worker, 6 sec per epoch. Lots of trials to narrow down our choices
• Petastorm: 10 MM members, 1 worker, 63 sec per epoch. Using all the data, but takes forever
• Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch. Train on all the data much more quickly
We generally see a sqrt(n) speed up over a single worker
Using Petastorm and Horovod, we used all the data and trained 4.5x faster
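The 4.5x figure follows directly from the per-epoch timings on the slide, and it can be set beside the sqrt(n) rule of thumb for 16 workers. A short check, with variable names chosen only for this illustration:

```python
import math

sec_per_epoch_single = 63   # 10 MM members, 1 worker (from the slide)
sec_per_epoch_multi = 14    # 10 MM members, 16 workers (from the slide)
workers = 16

observed_speedup = sec_per_epoch_single / sec_per_epoch_multi
rule_of_thumb = math.sqrt(workers)

print(f"observed: {observed_speedup:.1f}x, sqrt(n) rule of thumb: {rule_of_thumb:.1f}x")
```

The observed speedup (4.5x) is slightly above the sqrt(16) = 4x rule of thumb.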
Walkthrough: Petastorm and Horovod Helper Functions
Goals: Save headaches and empower data scientists to train on all of the data quickly
We Improved the Precision of our Model!
We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes
• SparkML Logistic Regression: weighted F1 score = 0.615 (weighted precision, prw = 0.633; weighted recall, rcw = 0.609)
• TensorFlow NN on all the data: weighted F1 score = 0.615 (weighted precision, prw = 0.646; weighted recall, rcw = 0.602)
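Weighted scores of this kind can be computed from a confusion matrix. The sketch below assumes that `prw` and `rcw` stand for support-weighted precision and recall, and that the weighted F1 is the support-weighted mean of per-class F1 scores; the confusion matrix is made up and is not data from the talk.

```python
def weighted_scores(confusion):
    """Return support-weighted precision, recall, and F1.

    confusion[i][j] = number of rows whose true class is i and predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    prw = rcw = f1w = 0.0
    for k in range(n_classes):
        tp = confusion[k][k]
        support = sum(confusion[k])                    # true rows of class k
        predicted = sum(row[k] for row in confusion)   # rows predicted as class k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support / total
        prw += weight * precision
        rcw += weight * recall
        f1w += weight * f1
    return prw, rcw, f1w


# Made-up 3-class confusion matrix, not data from the talk.
confusion = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 2, 8],
]
prw, rcw, f1w = weighted_scores(confusion)
print(round(prw, 3), round(rcw, 3), round(f1w, 3))
```

Under this definition the weighted F1 is an average of per-class F1 scores, not the harmonic mean of `prw` and `rcw`, so two models can share a weighted F1 while differing in weighted precision and recall. Weighted recall also equals overall accuracy.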
Register, Score, and Preserve the Model Before Deploying it to Production
Scoring with a Spark UDF from MLflow
• This allows us to easily get the scores into a Spark DataFrame from any MLflow model
• Can repeat for other types of targets or our training DataFrame
Registering the Model
Model Metadata (screenshot from the Models tab in the Databricks workspace)
• First registered in the data scientist’s dev Databricks workspace
• The data scientist promotes it to “production” status in the dev workspace after review
• The associated MLflow run is also used to register it in our “production” workspace for automated jobs
• This newly registered model is the official version used for automated scoring
The path within the ADLS storage account contains the version so we can support multiple versions at the same time
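Putting the version in the path might look like the sketch below. The deck does not show the real layout, so the root, the model name, and the `v<version>` convention are all hypothetical.

```python
import posixpath


def model_artifact_path(root, model_name, version):
    """Build a versioned artifact path; the layout itself is hypothetical."""
    if version < 1:
        raise ValueError("version must be a positive integer")
    return posixpath.join(root, model_name, f"v{version}")


# Hypothetical root and model name, for illustration only.
paths = [model_artifact_path("models", "ckd_stage", v) for v in (1, 2)]
print(paths)
```

Because each version resolves to its own directory, scoring jobs pinned to different versions can run side by side without overwriting each other's artifacts.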
Production Deployment Pipeline – Notebook-based Workflow
Key Requirements
• Use Azure DevOps to deploy code to various environments for testing and execution
• Tie execution to specific package versions and LTS non-ML Databricks Runtimes
• Use ADF Parameters to provide flexibility to minimize YAML code duplication
Reusable Framework of 3 notebooks: Feature Engineering, Scoring, Validation
Upstream dependency check to prevent the flow of bad data and errors from missing data
Logging via SQL Server to record both success and failure
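An upstream dependency check with logging of both outcomes can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for SQL Server, and the table names, row counts, freshness threshold, and `job_log` schema are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def check_upstream(tables, today, max_age_days=1):
    """Return (ok, problems). `tables` maps table name -> (row_count, last_refresh_date)."""
    problems = []
    for name, (row_count, last_refresh) in tables.items():
        if row_count == 0:
            problems.append(f"{name}: no rows")
        elif (today - last_refresh).days > max_age_days:
            problems.append(f"{name}: stale since {last_refresh.isoformat()}")
    return (not problems, problems)


def log_result(conn, job, ok, detail):
    """Record both success and failure, so every run leaves a trace."""
    conn.execute(
        "INSERT INTO job_log (job, status, detail) VALUES (?, ?, ?)",
        (job, "success" if ok else "failure", detail),
    )
    conn.commit()


# SQLite stands in for SQL Server in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_log (job TEXT, status TEXT, detail TEXT)")

today = date(2021, 1, 15)  # arbitrary date for the example
upstream = {
    "claims_features": (1_000, today),
    "lab_features": (0, today - timedelta(days=3)),   # empty: should block scoring
}
ok, problems = check_upstream(upstream, today)
log_result(conn, "ckd_scoring", ok, "; ".join(problems))
rows = conn.execute("SELECT job, status, detail FROM job_log").fetchall()
print(rows)
```

The scoring step would run only when `ok` is true; in either case the log row is written first, so a blocked run is as visible as a successful one.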
Partnership Between Data Scientists and AI Engineers is Pivotal
Each file required for deployment is part of the starter repo, which helps the data scientist keep the end goal in view from the beginning
Each model is initially reviewed and subsequently monitored for AI bias in key areas
All models are peer reviewed for both domain and technical accuracy prior to production deployment
Early Wins for the Platform
Key Early Wins – big steps forward
Scaling and automating clunky processes
• Scaled from fewer than 40 condition flags on-premises to more than 3x that number in the cloud
• Got contributions from multiple teams following templates
• Now updates over 1 bn rows daily in 1.5 hours for the entire member population
Faster prep, more iterations, better tuning and collaboration
• Reduced the feature engineering step on a very large source from hours to a few minutes
• Enabled the DS team to iterate on models faster, going from 5+ hours of training to half an hour or less, even for complex GBT models
• Reduced the scoring step on prospective members from a week to 30 minutes
Shared resources accelerate everyone
• Hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches
• Flexibility to score at scale in an automated fashion, regardless of algorithm package, with a common output format

More Related Content

What's hot

The Emerging Integration Reference Architecture | MuleSoft
The Emerging Integration Reference Architecture | MuleSoftThe Emerging Integration Reference Architecture | MuleSoft
The Emerging Integration Reference Architecture | MuleSoftMuleSoft
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Center of Excellence Peer to Peer Forum
Center of Excellence Peer to Peer ForumCenter of Excellence Peer to Peer Forum
Center of Excellence Peer to Peer ForumPegasystems
 
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...Simplilearn
 
Build your first blockchain application with Amazon Managed Blockchain - SVC2...
Build your first blockchain application with Amazon Managed Blockchain - SVC2...Build your first blockchain application with Amazon Managed Blockchain - SVC2...
Build your first blockchain application with Amazon Managed Blockchain - SVC2...Amazon Web Services
 
CamundaCon 2022 Keynote: The Process Orchestration Journey
CamundaCon 2022 Keynote: The Process Orchestration JourneyCamundaCon 2022 Keynote: The Process Orchestration Journey
CamundaCon 2022 Keynote: The Process Orchestration JourneyBernd Ruecker
 
How to work with your salesforce contacts
How to work with your salesforce contactsHow to work with your salesforce contacts
How to work with your salesforce contactsSalesforce Partners
 
MuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft AutomationMuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft AutomationJitendra Bafna
 
Five Ways to Automate API Testing with Postman
Five Ways to Automate API Testing with PostmanFive Ways to Automate API Testing with Postman
Five Ways to Automate API Testing with PostmanPostman
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseCloudera, Inc.
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Databricks
 
How Netflix Is Solving Authorization Across Their Cloud
How Netflix Is Solving Authorization Across Their CloudHow Netflix Is Solving Authorization Across Their Cloud
How Netflix Is Solving Authorization Across Their CloudTorin Sandall
 
DevOps Engineer Day-to-Day Activities
DevOps Engineer Day-to-Day Activities DevOps Engineer Day-to-Day Activities
DevOps Engineer Day-to-Day Activities Intellipaat
 
UI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkUI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkGokul Anand E, PMP®
 
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...Sonatype
 

What's hot (20)

The Emerging Integration Reference Architecture | MuleSoft
The Emerging Integration Reference Architecture | MuleSoftThe Emerging Integration Reference Architecture | MuleSoft
The Emerging Integration Reference Architecture | MuleSoft
 
DEVOPS TOOLS SWOT ANALYSIS
DEVOPS TOOLS SWOT ANALYSISDEVOPS TOOLS SWOT ANALYSIS
DEVOPS TOOLS SWOT ANALYSIS
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Devops ppt
Devops pptDevops ppt
Devops ppt
 
Center of Excellence Peer to Peer Forum
Center of Excellence Peer to Peer ForumCenter of Excellence Peer to Peer Forum
Center of Excellence Peer to Peer Forum
 
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...
Introduction To DevOps | Devops Tutorial For Beginners | DevOps Training For ...
 
Build your first blockchain application with Amazon Managed Blockchain - SVC2...
Build your first blockchain application with Amazon Managed Blockchain - SVC2...Build your first blockchain application with Amazon Managed Blockchain - SVC2...
Build your first blockchain application with Amazon Managed Blockchain - SVC2...
 
CamundaCon 2022 Keynote: The Process Orchestration Journey
CamundaCon 2022 Keynote: The Process Orchestration JourneyCamundaCon 2022 Keynote: The Process Orchestration Journey
CamundaCon 2022 Keynote: The Process Orchestration Journey
 
How to work with your salesforce contacts
How to work with your salesforce contactsHow to work with your salesforce contacts
How to work with your salesforce contacts
 
MuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft AutomationMuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft Automation
 
Five Ways to Automate API Testing with Postman
Five Ways to Automate API Testing with PostmanFive Ways to Automate API Testing with Postman
Five Ways to Automate API Testing with Postman
 
HBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBaseHBaseCon 2013: Compaction Improvements in Apache HBase
HBaseCon 2013: Compaction Improvements in Apache HBase
 
How to Send IDOC to SAP using MuleSoft
How to Send IDOC to SAP using MuleSoftHow to Send IDOC to SAP using MuleSoft
How to Send IDOC to SAP using MuleSoft
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
Building Data Product Based on Apache Spark at Airbnb with Jingwei Lu and Liy...
 
How Netflix Is Solving Authorization Across Their Cloud
How Netflix Is Solving Authorization Across Their CloudHow Netflix Is Solving Authorization Across Their Cloud
How Netflix Is Solving Authorization Across Their Cloud
 
DevOps Engineer Day-to-Day Activities
DevOps Engineer Day-to-Day Activities DevOps Engineer Day-to-Day Activities
DevOps Engineer Day-to-Day Activities
 
UI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery NetworkUI5 with Akamai - Introduction to the Content Delivery Network
UI5 with Akamai - Introduction to the Content Delivery Network
 
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
DevOps and Continuous Delivery Reference Architectures (including Nexus and o...
 
B2B commerce breakout session Basecamp Copenhagen
B2B commerce breakout session Basecamp CopenhagenB2B commerce breakout session Basecamp Copenhagen
B2B commerce breakout session Basecamp Copenhagen
 

Similar to FlorenceAI: Reinventing Data Science at Humana

Experimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOpsExperimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOpsDatabricks
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowLviv Startup Club
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowEdunomica
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Lviv Startup Club
 
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video PlatformDeep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video PlatformDan Rinzel
 
Managing Data Science Projects
Managing Data Science ProjectsManaging Data Science Projects
Managing Data Science ProjectsDanielle Dean
 
Chapter 10
Chapter 10Chapter 10
Chapter 10bodo-con
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya
 
renita lobo-CV-Automation
renita lobo-CV-Automationrenita lobo-CV-Automation
renita lobo-CV-AutomationRenita Lobo
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionFlorian Wilhelm
 
Rsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AIRsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AISanjana Chowdhury
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Agile Development unleashed
Agile Development unleashedAgile Development unleashed
Agile Development unleashedlivgeni
 
An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...wweinmeyer79
 
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptxMinh Nguyen
 

Similar to FlorenceAI: Reinventing Data Science at Humana (20)

Experimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOpsExperimentation to Industrialization: Implementing MLOps
Experimentation to Industrialization: Implementing MLOps
 
Lect7
Lect7Lect7
Lect7
 
Lect7
Lect7Lect7
Lect7
 
Foutse_Khomh.pptx
Foutse_Khomh.pptxFoutse_Khomh.pptx
Foutse_Khomh.pptx
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
 
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video PlatformDeep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
Deep Caliper Event Integration, Blackboard Learn and Kaltura Video Platform
 
Managing Data Science Projects
Managing Data Science ProjectsManaging Data Science Projects
Managing Data Science Projects
 
Chapter 10
Chapter 10Chapter 10
Chapter 10
 
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
Aditya Bhattacharya - Enterprise DL - Accelerating Deep Learning Solutions to...
 
renita lobo-CV-Automation
renita lobo-CV-Automationrenita lobo-CV-Automation
renita lobo-CV-Automation
 
Bridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to ProductionBridging the Gap: from Data Science to Production
Bridging the Gap: from Data Science to Production
 
Rsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AIRsqrd AI: From R&D to ROI of AI
Rsqrd AI: From R&D to ROI of AI
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Agile Development unleashed
Agile Development unleashedAgile Development unleashed
Agile Development unleashed
 
DevOps 101
DevOps 101DevOps 101
DevOps 101
 
An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...An intro to building an architecture repository meta model and modeling frame...
An intro to building an architecture repository meta model and modeling frame...
 
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
2018-Sogeti-TestExpo-Intelligent_Predictive_Models.pptx
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
FlorenceAI: Reinventing Data Science at Humana

  • 1. FlorenceAI: Reinventing Data Science at Humana. David Mack, PhD, Cognitive/Machine Learning Principal, AI Engineering, Digital Health and Analytics. A more human way to healthcare™
  • 2. David Mack, PhD – Cognitive/Machine Learning Principal. I have worked at Humana for 5½ years in clinical and enterprise data science. For the past 2 years I have been one of the primary architects and maintainers of Humana’s ML Platform, which now serves hundreds of data scientists. I love to tinker with homemade IoT devices, build cool stuff, and learn new things! Humana’s bold goal is to address the needs of the whole person: a focus on community partnerships and social determinants of health; a commitment to help our millions of members achieve their best health; a Fortune 50 company with $77.2bn consolidated revenue in 2020. Humana has invested significant resources into fighting: the COVID-19 pandemic; food insecurity; loneliness and social isolation; inequities in healthcare. Formed the Digital Health and Analytics Organization in 2018. Through advanced analytics, experiential design, data and technology we are working to meet our associates, members and the communities we serve, anytime, anywhere, anyhow.
  • 3. What exactly is FlorenceAI*? A cloud platform for automating and accelerating the delivery lifecycle of data science solutions at scale in Azure. Key Foundational Pillars: feature stores; starter code frameworks; notebook-based workflow; prod deployment partnership; extensive training curriculum. End-to-end ecosystem benefits: empowers data scientists to solve complex problems; promotes access to open-source innovation; simplifies model consumption with a single interface; transforms workflows to improve performance. Diagram labels: Microsoft Azure Cloud; Foundational Components; Other Key Tools. (* Patent Pending)
  • 4. Feature Stores – Quality Ingredients for ML Algorithms. Tens of thousands of features available for training and scoring, with hundreds of instances available across multiple years. Extensive Metadata: standard descriptions; centralized ref tables; ratings to identify any quality impacts; enables discovery and exploration. Economies of Scale: pre-computed for entire population; refreshed regularly at different cadences; production ready and pre-validated. Flexible but Specific: designed to cover most use cases; domain expertise in feature design; self-service for custom situations.
  • 5. End-to-End Process: Cohort Design → Initial Feature Selection → Model Training Experiments → Score and Register Best Model → Record Training Artifacts → Scoring Code and Testing → Promote Model and Automate Scoring
  • 6. Example Problem to Help Trace the Workflow. Goal: predict the most severe stage of Chronic Kidney Disease (CKD) in the next 6 months. Criteria to define the cohort: age ≥ 65, Medicare Advantage; evidence of CKD stage in medical claims or lab results; continuous enrollment (over 11 months of enrollment). Timeline: 12 months of history, a fixed calendar date, and 6 months looking forward. All code snippets shown in subsequent slides are for illustrative purposes only and may have certain field names or variables redacted for security.
  • 7. Initial Feature Selection and Traditional Model Training
  • 8. Walkthrough: Initial Feature Selection Notebook Goal: Identify hundreds of important features among tens of thousands
  • 9. First Round of Model Experimentation using SparkML. A helper function to execute the run is available in the shared “experiment utility”.
  • 10. Arrive at a “Best Model” using SparkML. A different helper function saves the best model and provides more details. Accuracy alone isn’t always enough, so it’s important to have views like ROC curves or heatmaps to help catch potential mistakes early.
  • 11. Walkthrough: SparkML Helper Functions Goals: Abstract complexity and standardize logging
  • 12. Encouraging Reproducibility with Reusable Code. What items are automatically saved to the MLFlow run? Hyperparameters; relevant metrics; MLFlow model object; evaluation metric figure (downloadable). What other artifacts are saved to ADLS? Original input schemas before any indexing or feature prep; original training and test datasets with just the selected features; string indexes and imputation dictionaries (outside of pipeline models); best model scores from both training and test data. Scope labels: Storage Account Scoped; Workspace Scoped.
  • 13. Applying Deep Neural Networks to Tabular Data at Scale
  • 14. Key Distinctions of Deep Neural Networks. Multiclass example. Learns over repeated passes called “epochs”. What extra things can we do to help us decide which model is the best? Use early stopping to minimize training time and combat overfitting; use callbacks to log values at the end of each epoch; test on smaller chunks of data and scale up as we learn more.
  • 15. Bayesian Hyperparameter Searching with Hyperopt. Attempts to minimize our loss function. Can set our hyperparameter space and the number of trials we want to run. Used a sample of our training data to go quickly over the 20 trials we chose to run.
  • 16. MLFlow has a Handy Comparison Tool to Help us Focus. Quick insights: Complex Layer 1 and Complex Layer 2 don’t do well; Complex Layer 1 with a Simpler Layer 2 does much better. Can highlight ranges to focus our attention.
  • 17. Let’s use MORE Data with Distributed Training! Driver Only: 1 MM members, 1 worker, 6 sec per epoch (lots of trials to narrow down our choices). Petastorm: 10 MM members, 1 worker, 63 sec per epoch (using all the data, but takes forever). Petastorm & Horovod: 10 MM members, 16 workers, 14 sec per epoch (train on all the data much more quickly). We generally see a sqrt(n) speedup over a single worker. Using Petastorm and Horovod, we used all the data and trained 4.5x faster.
  • 18. Walkthrough: Petastorm and Horovod Helper Functions Goals: Save headaches and empower data scientists to train on all of the data quickly
  • 19. We Improved the Precision of our Model! We don’t see as much over-prediction of the majority class and see better precision in the mid-range classes. SparkML Logistic Regression: weighted F1 score = 0.615 (prw = 0.633, rcw = 0.609). TensorFlow NN on all the data: weighted F1 score = 0.615 (prw = 0.646, rcw = 0.602).
  • 20. Register, Score, and Preserve the Model Before Deploying it to Production
  • 21. Scoring with a Spark UDF from MLFlow. This allows us to easily get the scores into a Spark dataframe from any MLFlow model. Can repeat for other types of targets or our training DF.
  • 22. Registering the Model. Model metadata (screenshot from Models tab in DB workspace). First registered in the Data Scientist’s dev DB workspace. The Data Scientist promotes it to “production” status in the dev workspace after review. The associated MLFlow run is also used to register it in our “production” workspace for automated jobs. This newly registered model is the official version used for automated scoring. The path within the ADLS storage account contains the version so we can support multiple versions at the same time.
  • 23. Production Deployment Pipeline – Notebook-based Workflow. Key Requirements: use Azure DevOps to deploy code to various environments for testing and execution; tie execution to specific package versions and LTS non-ML Databricks Runtimes; use ADF parameters to provide flexibility and minimize YAML code duplication. Reusable framework of 3 notebooks: Feature Engineering, Scoring, Validation. Upstream dependency check to prevent flow of bad data and errors from missing data. Logging via SQL Server to record both success and failure.
  • 24. Partnership Between Data Scientists and AI Engineers is Pivotal. Each of the files required for deployment is part of the starter repo and helps the data scientist keep the end goal in view from the beginning. Each model is initially reviewed and subsequently monitored for AI bias in key areas. All models are peer reviewed for both domain and technical accuracy prior to production deployment.
  • 25. Early Wins for the Platform
  • 26. Key Early Wins – big steps forward. Scaling and automating clunky processes: scaled from fewer than 40 condition flags on-premise to over 3x this in the cloud; got contributions from multiple teams following templates; now updates over 1 bn rows daily in 1.5 hours for the entire member population. Faster prep, more iterations, better tuning and collaboration: reduced the feature engineering step on a very large source from hours to a few minutes; enabled the DS team to iterate on models faster, going from 5+ hours for training to a half hour or less, even for complex GBT models; reduced the scoring step on prospective members from a week to 30 minutes. Shared resources accelerate everyone: hundreds of feature stores mean less process/data duplication and more time to improve model design with a variety of approaches; flexibility to score at scale, regardless of algorithm package, in automated fashion with a common output format.
  • 27. A more human way to healthcare™
  • 28. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
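
Sketch for slide 14 (early stopping and per-epoch logging). The talk's actual helper code is not shown in the transcript, so the following is only a framework-free illustration of the two ideas: stop training once the validation loss has stopped improving, and record a value at the end of every epoch. The names `EarlyStopping` and `run_training`, the `patience` setting, and the loss values are all hypothetical.

```python
class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs."""

    def __init__(self, patience=2, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.wait = 0

    def should_stop(self, epoch, val_loss):
        # An improvement resets the counter and remembers the best epoch.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.wait = 0
            return False
        # Otherwise count one more epoch without improvement.
        self.wait += 1
        return self.wait >= self.patience


def run_training(val_losses, patience=2, log=None):
    """Replay per-epoch validation losses; return (best_epoch, epochs_run)."""
    log = log if log is not None else []
    stopper = EarlyStopping(patience=patience)
    epochs_run = 0
    for epoch, loss in enumerate(val_losses, start=1):
        epochs_run = epoch
        # Stand-in for an end-of-epoch logging callback.
        log.append({"epoch": epoch, "val_loss": loss})
        if stopper.should_stop(epoch, loss):
            break
    return stopper.best_epoch, epochs_run


# Hypothetical validation losses: improvement stalls after epoch 3.
history = []
result = run_training([0.9, 0.7, 0.6, 0.65, 0.62, 0.5], patience=2, log=history)
```

In a real Keras or PyTorch loop the same roles would be played by the framework's own callback mechanism; the point here is only the stopping rule and the per-epoch record.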
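
Sketch for slide 17 (distributed training speedup). The per-epoch timings quoted on the slide can be checked against the stated sqrt(n) rule of thumb with a few lines of arithmetic. The function names below are hypothetical; only the numbers (63 and 14 seconds per epoch, 16 workers) come from the slide.

```python
import math


def speedup(baseline_sec: float, new_sec: float) -> float:
    """Ratio of per-epoch times; values above 1 mean the new setup is faster."""
    return baseline_sec / new_sec


def sqrt_rule_epoch_time(baseline_sec: float, workers: int) -> float:
    """Per-epoch time predicted by the sqrt(n) rule of thumb."""
    return baseline_sec / math.sqrt(workers)


# Slide figures: 10 MM members, 1 worker vs. 16 workers.
observed = speedup(63.0, 14.0)              # measured speedup
expected_by_rule = math.sqrt(16)            # speedup the rule of thumb predicts
projected = sqrt_rule_epoch_time(63.0, 16)  # epoch time the rule predicts

print(f"observed {observed:.1f}x, rule of thumb {expected_by_rule:.1f}x, "
      f"projected {projected:.2f} s per epoch")
```

The observed speedup is slightly better than the rule of thumb predicts for 16 workers, which is consistent with the slide describing sqrt(n) as what is "generally" seen.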
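
Sketch for slide 19 (weighted F1, precision, and recall). Assuming that "prw" and "rcw" on the slide stand for support-weighted precision and recall, the following plain-Python function shows how such multiclass scores are commonly computed. It also shows why two models can share a weighted F1 while differing in weighted precision and recall: the weighted F1 is the support-weighted average of per-class F1 scores, not the harmonic mean of the two weighted figures. The function name and the toy labels are hypothetical, not data from the talk.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted (precision, recall, F1) for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # true instances per class
    predicted = Counter(y_pred)    # predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)

    prw = rcw = f1w = 0.0
    for label in labels:
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = support[label] / total  # class share of the true labels
        prw += weight * precision
        rcw += weight * recall
        f1w += weight * f1
    return prw, rcw, f1w


# Toy three-class example with class 0 as the majority class.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
prw, rcw, f1w = weighted_scores(y_true, y_pred)
harmonic = 2 * prw * rcw / (prw + rcw)  # differs from f1w in general
```

With support weighting, the weighted recall equals plain accuracy, which is one reason the slides recommend looking beyond a single headline metric.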