Operationalizing Machine Learning—
Managing Provenance from Raw Data to
Predictions
Nabeel Sarwar, Machine Learning Engineer
June 2nd, 2018
2
INTRODUCTION AND BACKGROUND
CUSTOMER EXPERIENCE TEAM
27 MILLION CUSTOMERS (HIGH SPEED DATA,
VIDEO, VOICE, HOME SECURITY, MOBILE)
INGESTING ABOUT 2 BILLION EVENTS / MONTH
HIGH VOLUME OF MACHINE-GENERATED EVENTS
DATA SCIENCE PIPELINE
GREW FROM A FEW DOZEN TO 150+ DATA
SOURCES / FEEDS IN ABOUT A YEAR
Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
3
COMCAST APPLIED AI
Media &
Video Analytics
Machine Learning
& Data Science
Content Discovery
Speech
& NLP
Video
High Speed Internet
Home Security /
Automation
Customer Service
Universal Parks
Media Properties
4
BUSINESS PROBLEM
INCREASE POSITIVE CUSTOMER EXPERIENCES
RESOLVE POTENTIAL ISSUES CORRECTLY,
QUICKLY AND, EVEN BETTER, PROACTIVELY
PREDICT AND DIAGNOSE SERVICE TROUBLE
ACROSS MULTIPLE KNOWLEDGE DOMAINS
REDUCE COSTS THROUGH EARLIER RESOLUTION
AND BY REDUCING AVOIDABLE TECHNICIAN
VISITS
5
AI FOR CUSTOMER SERVICE
Proactive
Predictive
Interactive
6
XFINITY VIRTUAL ASSISTANT
My Account Main Screen XFINITY Assistant Type a question Disambiguate
7
VIRTUAL ASSISTANT – STEP BY STEP
Devices, Applications, and Platforms
instrumented to provide telemetry
Natural language
input and feedback
Interactive (Conversational) Actions
Proactive (Automatic) Actions
Customer intents
Domain models
NLP
Action
Catalog
Schedule Truck Roll
Self-Heal
Notifications
Agent Contact
Choose Best
Explore
Decision Engine
Context
Root Cause Predictions
Predictive AI/ML
8
TECHNICAL PROBLEM
MULTIPLE PROGRAMMING AND DATA SCIENCE
ENVIRONMENTS
WIDESPREAD AND DISCORDANT DATA SOURCES
THE “DATA PLANE” PROBLEM: COMBINING DATA AT
REST AND DATA IN MOTION
CONSISTENT FEATURE ENGINEERING
ML VERSIONING: DATA, CODE, FEATURES, MODELS
9
EXAMPLE NEAR REAL TIME
PREDICTION USE CASE
CUSTOMER RUNS A “SPEED TEST”
EVENT TRIGGERS A PREDICTION FLOW
ENRICH WITH NETWORK HEALTH AND OTHER
INDICATORS
EXECUTE ML MODEL
PREDICT WHETHER IT IS A WIFI, MODEM, OR
NETWORK ISSUE
Detect
Enrich
Predict
Gather Data
Event
ML
Model
Engage Customer
Act / Notify
Network Diagnostic Services
Slow
Speed?
Additional Context Services
Run
Prediction
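The detect, enrich, predict flow above can be sketched in a few lines. This is only an illustration: the enrichment functions, the feature names, the 25 Mbps threshold, and the rule-based `toy_model` are all assumptions standing in for the real diagnostic services and trained model, which the slides do not describe.

```python
from typing import Dict, Optional

# Hypothetical enrichment sources. Real ones would call network diagnostic
# and additional context services; the values here are made up.
def network_health(account: str) -> Dict[str, float]:
    return {"node_error_rate": 0.01}

def wifi_context(account: str) -> Dict[str, float]:
    return {"wifi_rssi_dbm": -82.0}

def toy_model(features: Dict[str, float]) -> str:
    """Stand-in for a trained classifier; thresholds are invented."""
    if features["node_error_rate"] > 0.05:
        return "network"
    if features["wifi_rssi_dbm"] < -75.0:
        return "wifi"
    return "modem"

def handle_speed_test(event: Dict, slow_threshold_mbps: float = 25.0) -> Optional[str]:
    # Detect: only a slow speed test triggers the prediction flow.
    if event["download_mbps"] >= slow_threshold_mbps:
        return None
    # Enrich: gather network health and other indicators for the account.
    features = {"download_mbps": event["download_mbps"]}
    features.update(network_health(event["account"]))
    features.update(wifi_context(event["account"]))
    # Predict: classify the issue as wifi, modem, or network.
    return toy_model(features)
```

A fast speed test returns `None` and no model is executed, which mirrors the "Slow Speed?" decision in the diagram.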
10
SPACE CORRELATION EXAMPLE
ML ALGORITHM NEEDS TO LEARN THAT THERE IS NO NEED TO SEND 3 REPAIR TRUCKS
• LOGS FROM WHICH TRAINING DATASETS ARE SOURCED SHOW CORRELATION
BETWEEN UNSUCCESSFUL TRUCK DISPATCHES AND CONCENTRATED CABLE
FAILURES
• GEO-LOCATION IS AVAILABLE IN THE CUSTOMER CONTEXT
• ALGORITHM CAN CLUSTER CUSTOMERS BASED ON GEO-LOCATION
Cable
Green = Works
Yellow = Has Problems
Likely failure
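One way to picture the geo-location clustering is a coarse grid: failing customers that fall in the same cell are treated as one likely cable failure, so a single truck is planned instead of three. The sketch below is an assumption-laden stand-in, not the algorithm from the talk; the cell size, field names, and `min_cluster` threshold are hypothetical, and a real system would more likely learn the grouping from dispatch logs.

```python
from collections import defaultdict

def cluster_by_grid(customers, cell_deg=0.01):
    """Bucket customers into latitude/longitude grid cells (hypothetical cell size)."""
    cells = defaultdict(list)
    for c in customers:
        key = (round(c["lat"] / cell_deg), round(c["lon"] / cell_deg))
        cells[key].append(c["account"])
    return dict(cells)

def plan_dispatches(customers, min_cluster=2):
    """Plan one dispatch per cell that contains failing customers."""
    failing = [c for c in customers if c["has_problem"]]
    plans = []
    for accounts in cluster_by_grid(failing).values():
        # Several failures in one cell suggest a shared cable problem.
        kind = "shared-cable" if len(accounts) >= min_cluster else "single-home"
        plans.append((kind, sorted(accounts)))
    return plans
```

Grid bucketing can split neighbors that sit on either side of a cell boundary, so it is only a rough approximation of proper spatial clustering.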
11
CHALLENGE: STANDARDIZATION OF
FEATURES
TWO MAIN CHALLENGES
• FEATURE ASSEMBLY (ENRICHMENT) DURING
PREDICTION TIME
• DISCOVERING CORRELATIONS WHEN WE
HAVE 25 MILLION CUSTOMERS EACH USING 10
PRODUCTS
WE NEED A STANDARDIZATION OF FEATURES,
ACTIONS AND REWARDS
FEATURE STORE – CURATED DATA STORE TO
DRIVE MODEL TRAINING AND MODEL
PREDICTION
12
ML PIPELINE – ROLES & WORKFLOW
Define
Use
Case
Business User
Data Scientist
ML Operations
Explore
Features
Create and
publish new
features
Create &
Validate
Models
Model
Selection
Go Live with
Selected
Models
• Define Online Feature
Assembly
• Define pipeline to
collect outcomes
• Model Deployment
and Monitoring
Model
Review
Iterate
Evaluate
Live Model
Performance
Inception Exploration
Model
Development
Candidate Model
Selection
Model
Operationalization
Model
Evaluation
Go Live
Phase
Monitor Live
Models
Collect new data & retrain
Iterate
13
SOLUTION MOTIVATION
SELF-SERVICE
PLATFORM
ALIGN DATA
SCIENTISTS AND
PRODUCTION
MODELS TREATED
AS CODE
HIGH THROUGHPUT
STREAM PLATFORM
14
WHY METADATA DRIVEN?
INSPIRED BY GROUND CONTEXT
• Berkeley’s RISE Lab
• Application context
• Parameters, callbacks, “meaty” metadata
• Behavior Context
• Data sets and code
• Change Context
• Version history
• Track any change end-to-end -> entire pipeline is versioned
• Metadata drives what/how code is run
15
AN OVERVIEW OF SPARK FLOWS
RAW DATA
STREAM
Feature Creation
Pipeline
VERSION
Historical
RAW
Store
Feature Creation
Disk or
Memory
Model
ON DEMAND OR
CONTINUOUS
Historical
Feature Store
Online
Feature Store
Prediction
CUSTOMER
EXPERIENCE
ELEMENTS
Analysis &
Business
Value
16
FEATURE STORE
TWO TYPES OF FEATURE STORES:
• Online Feature Store – Current values by key (Key/Value
Store)
• History Feature Store – Append features as they are
collected (Ex. Hadoop File System, AWS S3)
ONLINE FEATURE STORE
• Used in the prediction phase for enrichment
• Needs to support fast ingest and query as it stores current
data for given account or account & device combination
HISTORY FEATURE STORE
• Used to build history of features
• Data Scientists use this store to create their training
datasets
MAINTAIN (VERSIONED) RAW DATA SEPARATELY
Feature Creation
Pipeline
History
Feature Store
Online
Feature Store
Prediction
Phase
Model Training
Phase
Append
Overwrite
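The difference between the two stores comes down to write semantics: the online store overwrites the current value per key, while the history store appends every value. A minimal in-memory sketch follows, assuming hypothetical class and method names; a real deployment would back these with a key/value store and with HDFS or S3 respectively.

```python
class OnlineFeatureStore:
    """Current value per (key, feature); each write overwrites the last."""
    def __init__(self):
        self._current = {}

    def put(self, key, feature, value):
        self._current[(key, feature)] = value

    def get(self, key, feature):
        return self._current.get((key, feature))


class HistoryFeatureStore:
    """Append-only log of feature values, used to build training datasets."""
    def __init__(self):
        self._rows = []

    def append(self, key, feature, value, ts):
        self._rows.append((key, feature, value, ts))

    def rows(self, feature):
        return [r for r in self._rows if r[1] == feature]


def write_feature(online, history, key, feature, value, ts):
    # The feature creation pipeline writes every value to both stores.
    online.put(key, feature, value)
    history.append(key, feature, value, ts)
```

After two writes for the same account, the online store holds only the latest value while the history store holds both, which is what lets prediction read current data and training read the full record.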
17
USING THE ONLINE FEATURE STORE
MODEL
EXECUTION
TRIGGER
1. Payload only
contains Model
Name & Account
Number
FEATURE
ASSEMBLY
Model
Metadata
Online
Feature
Store
2. Model Metadata
informs which
features are needed
for a model
3. Pull required
features by account
number
MODEL
EXECUTION
4. Pass full set of
assembled features
for model execution
5. Prediction
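Steps 1 to 5 above can be sketched as a lookup driven by model metadata. The model name, feature names, and dictionary layouts below are assumptions made for illustration; only the shape of the flow (thin payload, metadata lookup, pull by account number, execute) comes from the slide.

```python
# Hypothetical model metadata: which features each model needs.
MODEL_METADATA = {
    "speed_test_rootcause": {"features": ["calls_30d", "signal_errors_24h"]},
}

# Hypothetical online feature store contents, keyed by (account, feature).
ONLINE_STORE = {
    ("ACC-1", "calls_30d"): 3,
    ("ACC-1", "signal_errors_24h"): 12,
    ("ACC-1", "unrelated_feature"): 7,
}

def assemble(payload):
    """Steps 2 and 3: look up required features, then pull them by account."""
    needed = MODEL_METADATA[payload["model"]]["features"]
    return {name: ONLINE_STORE[(payload["account"], name)] for name in needed}

def execute(payload, model_fn):
    """Steps 4 and 5: pass the assembled features to the model and return its prediction."""
    return model_fn(assemble(payload))

# Step 1: the trigger payload carries only the model name and account number.
payload = {"model": "speed_test_rootcause", "account": "ACC-1"}
features = assemble(payload)
```

Because the caller never lists features itself, changing what a model consumes is a metadata change rather than a change to every requesting application.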
18
FEATURE CREATION PIPELINE
Aggregation
Pipeline
On Demand
Pipeline
Continuous
Stream
On Demand
Feature Request
External
Rest API
Feature Writer
Feature
Assembly
Feature
Metadata
Model
Metadata
TWO TYPES:
• Continuous aggregations on streaming data
• On Demand Features
AGGREGATION FEATURE EXAMPLES
• Number of customer calls in the past 30 days. Key =
Account Number
• Number of signal errors > 2000 in a 24 hour tumbling
window. Key= Account Number + Device Id
ON DEMAND FEATURE EXAMPLE
• Diagnostic telemetry information for each device for
a given customer
• Expensive to collect. Only requested on demand
• Model Metadata specifies a TTL for such a feature
Online
Feature Store
History Feature
Store
Online
Feature Store
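The second aggregation example can be sketched as a tumbling-window count. The sketch reads the feature as "number of readings whose signal error count exceeds 2000 within each 24 hour window", which is one interpretation of the slide; the event field names are hypothetical, and plain Python stands in for the streaming engine.

```python
from collections import Counter

WINDOW_SECONDS = 24 * 60 * 60  # 24 hour tumbling window

def tumbling_error_counts(events, threshold=2000):
    """Count readings above the threshold per (account, device, window start)."""
    counts = Counter()
    for e in events:
        if e["signal_errors"] > threshold:
            # Tumbling windows do not overlap: each timestamp maps to exactly
            # one window, identified by its start time in epoch seconds.
            window_start = e["ts"] - e["ts"] % WINDOW_SECONDS
            counts[(e["account"], e["device"], window_start)] += 1
    return dict(counts)
```

The key combines account number and device id, matching the slide, with the window start added so that each window yields its own feature value.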
19
FEATURE METADATA
KEY: NAMESPACE, NAME & VERSION
ONLINE FEATURE STORE KEY DEFINITIONS: JSON PATH & SCRIPT
REFERENCES (CODE & METHOD) IN GITHUB
HISTORICAL FEATURE STORE KEY DEFINITIONS: JSON PATH TO
EXTRACT IDENTIFIERS, CONNECTION PARAMETERS TO HISTORY
STORE, SCRIPT & JSON PATH TO EXTRACT PARTITIONS
UPDATE TS EXTRACTORS : COMBINATION OF JSON PATHS, SCRIPT
REFERENCES TO EXTRACT TIMESTAMP FROM FEATURE PAYLOADS
HOW DO I IDENTIFY A
FEATURE?
HOW DO I IDENTIFY A SPECIFIC
INSTANCE OF A FEATURE?
HOW DO I WRITE TO A
HISTORY STORE(S)?
WHAT IS THE UPDATE
TIMESTAMP FOR EACH
FEATURE VALUE? EVENT VS.
INGESTION TIME
20
EXAMPLE FEATURE VALUE
HEADER: TIMESTAMP, INTERNAL CUSTOMER IDENTIFIER
PAYLOAD: JSON PAYLOAD (EX. SPEED TEST DATA)
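To make the feature metadata questions concrete, the sketch below pairs a hypothetical metadata record with a feature value shaped like the header and payload above. The field names, the namespace, and the simple dotted-path `extract` helper are assumptions; the helper is a simplified stand-in for the JSON paths mentioned in the slides, not a JSONPath implementation.

```python
# Hypothetical feature metadata, keyed by namespace, name, and version.
FEATURE_METADATA = {
    "key": {"namespace": "network", "name": "speed_test", "version": "1"},
    "online_key_paths": ["header.customer_id"],  # identifies a specific instance
    "update_ts_path": "header.timestamp",        # event time of the value
}

def extract(doc, dotted_path):
    """Walk a nested dict along a dotted path such as 'header.timestamp'."""
    node = doc
    for part in dotted_path.split("."):
        node = node[part]
    return node

def feature_id(meta):
    return "{namespace}/{name}/{version}".format(**meta["key"])

def online_key(meta, value):
    return tuple(extract(value, p) for p in meta["online_key_paths"])

def update_ts(meta, value):
    return extract(value, meta["update_ts_path"])

# Example feature value: header plus JSON payload (values are made up).
feature_value = {
    "header": {"timestamp": 1527897600, "customer_id": "CUST-42"},
    "payload": {"download_mbps": 12.5, "upload_mbps": 2.0},
}
```

Keeping the extraction rules in metadata means a new feed can be onboarded by describing where its identifiers and timestamps live, without writing a new ingestion job.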
21
INGEST FEATURE VALUE
HEADER: TIMESTAMP, INTERNAL CUSTOMER IDENTIFIER
PAYLOAD: JSON PAYLOAD (EX. SPEED TEST DATA)
FEATURE INGESTION
PIPELINE
FEATURE METADATA
Scripts
Repository
Online
Feature Store
History
Feature Store
22
MODEL METADATA
KEY: USECASE, NAME & VERSION
PER FEATURE DEFINITION: PRE-FEATURE ENGINEERING HOOKS,
ATTRIBUTE LEVEL FEATURE ENGINEERING HOOKS, POST-FEATURE
ENGINEERING HOOKS, TTL
HOW DO I IDENTIFY A
MODEL?
DEFINE ENVIRONMENT
PARAMETERS FOR MODEL
EXECUTION
CONSISTENT FEATURE
ENGINEERING (SCRIPTS).
WHY?
HOW IS THE MODEL
DEPLOYED? AUTOSCALING
DEFINITIONS
ENVIRONMENT PARAMETERS
MODEL DEPLOYMENT DEFINITIONS
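A per-feature definition with pre, attribute-level, and post hooks plus a TTL might look like the sketch below. Everything named here is hypothetical: the use case, the hook names, and the in-process `HOOKS` registry, which stands in for script references that the slides say are kept in a repository.

```python
import math

# Hypothetical hook registry; in the real system these would be script references.
HOOKS = {
    "drop_nulls": lambda rec: {k: v for k, v in rec.items() if v is not None},
    "log1p": lambda x: math.log1p(x),
    "add_ratio": lambda rec: {**rec, "up_down_ratio": rec["upload_mbps"] / rec["download_mbps"]},
}

MODEL_METADATA = {
    "key": {"usecase": "speed_test", "name": "rootcause", "version": "3"},
    "features": {
        "speed_test": {
            "ttl_seconds": 900,
            "pre_hooks": ["drop_nulls"],
            "attribute_hooks": {"latency_ms": "log1p"},
            "post_hooks": ["add_ratio"],
        }
    },
}

def engineer(record, definition):
    """Apply pre hooks, then attribute-level hooks, then post hooks, in that order."""
    for name in definition["pre_hooks"]:
        record = HOOKS[name](record)
    for attr, name in definition["attribute_hooks"].items():
        if attr in record:
            record = {**record, attr: HOOKS[name](record[attr])}
    for name in definition["post_hooks"]:
        record = HOOKS[name](record)
    return record
```

Because the hooks are listed by name in metadata, the same definition can be replayed for any version of the model, which is part of what makes the pipeline traceable.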
23
CONSISTENT DATA
PLACE DATA ON SAME PLANE
• S3 (or form data plane via Alluxio)
• Storage parameters driven by metadata
• Consistent persistence and reads
• Metadata-driven operators
• Historical Store
• Raw data
• Engineered features
VERSION THE DATA
• Feature Creation keeps metadata paths
24
CONSISTENT FEATURE ENGINEERING -
MODEL METADATA
FEATURE ENGINEERING MUST BE CONSISTENT:
• Training
• Prediction Phase
METADATA DRIVEN
• Using configured scripts just like feature metadata
• Define features used
• Define TTL per feature (Prediction Phase)
SCRIPTS DEFINED IN MODEL METADATA ARE DEFINED BY
DATA SCIENTIST
• Used for creating training/testing/validation datasets from raw
features. Apply on a record of data used for training or during
prediction
• Also used at prediction time to perform real time feature engineering
Feature
Engineering
Model
Metadata
Online Feature
Store
Scripts
Repository
Reference
Feature
Engineering
Scripts
History Feature
Store
Record
for
training
Record
for
prediction
Training
Prediction
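The consistency requirement amounts to routing both phases through one record-level function. In the sketch below, `engineer` plays the role of a data scientist's script; the feature names and the 2000 threshold are assumptions chosen for illustration.

```python
def engineer(record):
    """Record-level feature engineering, shared by training and prediction."""
    return {
        "calls_30d": record["calls_30d"],
        "high_error": int(record["signal_errors_24h"] > 2000),
    }

def training_rows(history_records):
    # Training: apply the script to every record read from the history store.
    return [engineer(r) for r in history_records]

def prediction_row(online_record):
    # Prediction: apply the same script to the single record being scored.
    return engineer(online_record)
```

Since there is only one implementation, a record seen at prediction time is transformed exactly as it would have been during training, which avoids training/serving skew.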
25
CONSISTENT FEATURE ENGINEERING
SQL AS A UNIFYING LANGUAGE
• Replace as many operations as possible with their SQL equivalents
• No need to translate code
• No need for DSL
SPARK AS A UNIFYING LANGUAGE
• Many tools for deeper feature engineering
• Redeploy same code through streaming / web app
• Fewer frameworks
APPLICABLE AT EVERY PHASE
• Post-Ingest
• In-flight or at-rest
• Pre-model
• Standards to fit both stream and batch
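The appeal of SQL as a unifying language is that one expression can serve both a batch of training records and a single in-flight record. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for Spark SQL so that it runs anywhere; the table layout, column names, and threshold are assumptions.

```python
import sqlite3

# One feature engineering expression, written once in SQL.
FEATURE_SQL = """
    SELECT account,
           calls_30d,
           CASE WHEN signal_errors_24h > 2000 THEN 1 ELSE 0 END AS high_error
    FROM records
    ORDER BY account
"""

def run_feature_sql(rows):
    """Load rows into an in-memory table and apply the shared SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (account TEXT, calls_30d INTEGER, signal_errors_24h INTEGER)"
    )
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    result = conn.execute(FEATURE_SQL).fetchall()
    conn.close()
    return result

# Batch (at-rest) and single-record (in-flight) inputs use the same statement.
batch = run_feature_sql([("A", 3, 2500), ("B", 0, 10)])
single = run_feature_sql([("A", 3, 2500)])
```

No translation layer or DSL sits between the two uses, which is the point the slide makes.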
26
MODEL DEPLOYMENT
MODEL AS CODE
• H2O.ai POJO
• Spark ML Models
• Simple Python Scripts – Regression Models
• Specialized Python Scripts – use math libraries and need
specialized hardware such as GPUs
ONE MODEL, MULTIPLE DEPLOYMENT MODELS
• Deploy as Docker containers with REST Endpoints – easy to
test, and can be used directly if the request has all the features available
• Deploy as Map Operators within Streaming framework
• Deploy as Lambda/SageMaker Spark functions in AWS
• SparkLauncher
• Databricks Jobs API
27
PREDICTION PHASE
ASSEMBLE FEATURES FOR
A GIVEN MODEL
Online
Feature
Store
Model
/Feature
Metadata
Feature
Store
API
Feature
Assembly
Feature Creation
Pipeline
Are All
Features
Current?
No
History
Feature Store
Online
Feature Store
Feature
Assembly
Append store (Ex. S3, HDFS,
Redshift) for use by Data
Scientist for Model Training
Model
Execution
Prediction/Outcome Store
Customer
Context
REQUESTING
APPLICATION
Listens
Payload: Model
Name + Account
Number
Yes
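The "Are All Features Current?" decision can be sketched as a TTL check during feature assembly: a feature that is missing or older than its TTL is refreshed through the feature creation pipeline before the model runs. The store layout, the `refresh` callback, and the use of epoch seconds are assumptions for illustration.

```python
def is_current(entry, ttl_seconds, now):
    """A feature is current if it exists and is no older than its TTL."""
    return entry is not None and now - entry["updated_at"] <= ttl_seconds

def assemble_features(account, needed, online_store, refresh, now):
    """Assemble features for a model, refreshing any that are stale.

    needed: {feature_name: ttl_seconds}, as model metadata would supply.
    refresh: callback standing in for an on-demand feature request.
    """
    features = {}
    for name, ttl in needed.items():
        entry = online_store.get((account, name))
        if not is_current(entry, ttl, now):
            # "No" branch: request a fresh value and write it back.
            entry = {"value": refresh(account, name), "updated_at": now}
            online_store[(account, name)] = entry
        features[name] = entry["value"]
    return features
```

Only stale features trigger the expensive on-demand path, so cheap aggregates and costly diagnostics can each carry a TTL that suits them.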
28
FEATURES OF THE ML PIPELINE
AWS AGNOSTIC
• Integrates with the AWS Cloud but not
dependent on it
• Framework should be able to work in a non-
AWS distributed environment with
configuration (not code) changes
TRACEABILITY, REPEATABILITY &
AUDITABILITY
• Models can be traced back to business use
cases
• Full traceability from raw data to feature
engineering to predictions
• “Everything Versioned” enables repeatability
CI/CD SUPPORT
• Code, Metadata (Hyper-Parameters) and
Data (Training/Validation Data) are
versioned. Deployable artifacts to integrate
with CI/CD Pipeline
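One simple way to make "everything versioned" checkable is to derive a single identifier from the code version, the metadata, and the data version, and to store it with each prediction. The function below is a hypothetical sketch of that idea, not the mechanism used in the talk; the argument names and the 12-character truncation are arbitrary choices.

```python
import hashlib
import json

def provenance_id(code_version, metadata, data_version):
    """Fingerprint the code, metadata, and data versions behind a model run."""
    blob = json.dumps(
        {"code": code_version, "metadata": metadata, "data": data_version},
        sort_keys=True,  # key order must not change the fingerprint
    ).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:12]
```

Two runs with the same identifier used the same code, hyper-parameters, and data version, while any change to one of the three produces a different identifier that can be traced in an audit.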
29
NEXT STEPS AND FUTURE WORK
UI PORTAL FOR
• MODEL / FEATURE AND METADATA MANAGEMENT
• CONTAINERIZATION SUPPORT FOR MODEL EXECUTION
PHASE
• WORKBENCH FOR DATA SCIENTIST
• CONTINUOUS MODEL MONITORING
KNOWLEDGE SHARING
• Promote reusability: users search for features by model
• Search features by their importance in models
• Real time model evaluation by comparing predictions with
outcomes
• Determining first-class tools
AUTOMATING THE RETRAINING PROCESS
SUPPORT FOR MULTIPLE/PLUGGABLE FEATURE
STORES (SLA DRIVEN)
30
SUMMARY
Metadata Driven
Feature/Model
Definition,
Versioning, Feature
Assembly, Model
Deployment, and Model
Monitoring are
metadata driven
Automation
Orchestrated
Deployment for
new Features
and Models
Rapid
Onboarding
Portal for Model
and Feature
Management as
well as Model
Deployment
Data Consistency
Feature store
enforces a
consistent data
pipeline ensuring
that the data
used for training
is functionally
identical to the
data used for
predictions
Monitoring and
Metrics
Ability to execute
& monitor
multiple Models
in production to
enable real-time
metrics driven
model selection
Iterative/Consistent
Model
Development
Multiple versions of
the Models can be
developed
iteratively while
consuming from a
consistent dataset
(feature store),
which enables A/B &
Multivariate Testing
THANK YOU!
More Related Content

What's hot

Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsDatabricks
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Databricks
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleDatabricks
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningDatabricks
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!Databricks
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkDatabricks
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkDatabricks
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Databricks
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformDatabricks
 
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSABuilding the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSADatabricks
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowDatabricks
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingDatabricks
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastDatabricks
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerDatabricks
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionDatabricks
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 

What's hot (20)

Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
 
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at ScaleData Agility—A Journey to Advanced Analytics and Machine Learning at Scale
Data Agility—A Journey to Advanced Analytics and Machine Learning at Scale
 
Spark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas GeerdinkSpark Summit EU talk by Bas Geerdink
Spark Summit EU talk by Bas Geerdink
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!
 
Asynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache SparkAsynchronous Hyperparameter Optimization with Apache Spark
Asynchronous Hyperparameter Optimization with Apache Spark
 
Extending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySparkExtending Machine Learning Algorithms with PySpark
Extending Machine Learning Algorithms with PySpark
 
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
Using Spark Mllib Models in a Production Training and Serving Platform: Exper...
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSABuilding the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
Building the Foundations of an Intelligent, Event-Driven Data Platform at EFSA
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
 
SparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time BiddingSparkML: Easy ML Productization for Real-Time Bidding
SparkML: Easy ML Productization for Real-Time Bidding
 
Scaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and FeastScaling Data and ML with Apache Spark and Feast
Scaling Data and ML with Apache Spark and Feast
 
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn SchedulerCloud-Native Apache Spark Scheduling with YuniKorn Scheduler
Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
 
Spark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik SivashanmugamSpark Summit EU talk by Kaarthik Sivashanmugam
Spark Summit EU talk by Kaarthik Sivashanmugam
 
AI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat DetectionAI on Spark for Malware Analysis and Anomalous Threat Detection
AI on Spark for Malware Analysis and Anomalous Threat Detection
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 

Similar to Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions with Nabeel Sarwar

Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Dataconomy Media
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant confluent
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataDevOps for Enterprise Systems
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...Insight Technology, Inc.
 
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma
 
Managing an Experimentation Platform by LinkedIn Product Leader
Managing an Experimentation Platform by LinkedIn Product LeaderManaging an Experimentation Platform by LinkedIn Product Leader
Managing an Experimentation Platform by LinkedIn Product LeaderProduct School
 
Smarter Manufacturing through Equipment Data-Driven Application Design
Smarter Manufacturing through Equipment Data-Driven Application DesignSmarter Manufacturing through Equipment Data-Driven Application Design
Smarter Manufacturing through Equipment Data-Driven Application DesignKimberly Daich
 
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...Event Streaming Architecture for Industry 4.0 -  Abdelkrim Hadjidj & Jan Kuni...
Event Streaming Architecture for Industry 4.0 - Abdelkrim Hadjidj & Jan Kuni...Flink Forward
 
Modernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-ArchitectModernizing Testing as Apps Re-Architect
Modernizing Testing as Apps Re-ArchitectDevOps.com
 
SplunkLive! Utrecht 2016 - NXP
SplunkLive! Utrecht 2016 - NXPSplunkLive! Utrecht 2016 - NXP
SplunkLive! Utrecht 2016 - NXPSplunk
 
How ManageEngine NetFlow Analyzer helped Boston Properties Save Bandwidth Costs
How ManageEngine NetFlow Analyzer helped Boston Properties Save Bandwidth CostsHow ManageEngine NetFlow Analyzer helped Boston Properties Save Bandwidth Costs
How ManageEngine NetFlow Analyzer helped Boston Properties Save Bandwidth CostsNetFlow Analyzer
 
Disruptive Trends in Application Development
Disruptive Trends in Application DevelopmentDisruptive Trends in Application Development
Disruptive Trends in Application DevelopmentWaveMaker, Inc.
 
Architecting Design Development Test Request System in Aras
Architecting Design Development Test Request System in ArasArchitecting Design Development Test Request System in Aras
Architecting Design Development Test Request System in ArasAras
 
SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014 SCQAA-SF Meeting on May 21 2014
SCQAA-SF Meeting on May 21 2014 Sujit Ghosh
 
Incremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsIncremental Queries and Transformations for Engineering Critical Systems
Incremental Queries and Transformations for Engineering Critical SystemsÁkos Horváth
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!Richard Robinson
 
Unlock your core business assets for the hybrid cloud with addi webinar dec...
Unlock your core business assets for the hybrid cloud with addi   webinar dec...Unlock your core business assets for the hybrid cloud with addi   webinar dec...
Unlock your core business assets for the hybrid cloud with addi webinar dec...Sherri Hanna
 
IDEAS Global A.I. Conference 2022.pdf
IDEAS Global A.I. Conference 2022.pdfIDEAS Global A.I. Conference 2022.pdf
IDEAS Global A.I. Conference 2022.pdfManimuthu Ayyannan
 

Similar to Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions with Nabeel Sarwar (20)

Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...Flink Forward San Francisco 2018:  Dave Torok & Sameer Wadkar - "Embedding Fl...
Flink Forward San Francisco 2018: Dave Torok & Sameer Wadkar - "Embedding Fl...
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Mainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live DataMainframe Application Testing both With and Without Live Data
Mainframe Application Testing both With and Without Live Data
 
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
[db tech showcase Tokyo 2017] C37: MariaDB ColumnStore analytics engine : use...
 
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
Nesma autumn conference 2015 - Is FPA a valuable addition to predictable agil...
 
Managing an Experimentation Platform by LinkedIn Product Leader
Managing an Experimentation Platform by LinkedIn Product LeaderManaging an Experimentation Platform by LinkedIn Product Leader
Managing an Experimentation Platform by LinkedIn Product Leader
 
Smarter Manufacturing through Equipment Data-Driven Application Design
Smarter Manufacturing through Equipment Data-Driven Application DesignSmarter Manufacturing through Equipment Data-Driven Application Design
Smarter Manufacturing through Equipment Data-Driven Application Design
 





Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions with Nabeel Sarwar

  • 1. Operationalizing Machine Learning— Managing Provenance from Raw Data to Predictions Nabeel Sarwar, Machine Learning Engineer June 2nd, 2018
  • 2. INTRODUCTION AND BACKGROUND: CUSTOMER EXPERIENCE TEAM; 27 MILLION CUSTOMERS (HIGH SPEED DATA, VIDEO, VOICE, HOME SECURITY, MOBILE); INGESTING ABOUT 2 BILLION EVENTS / MONTH; HIGH VOLUME OF MACHINE-GENERATED EVENTS; DATA SCIENCE PIPELINE GREW FROM A FEW DOZEN TO 150+ DATA SOURCES / FEEDS IN ABOUT A YEAR. Comcast collects, stores, and uses all data in accordance with our privacy disclosures to users and applicable laws.
  • 3. COMCAST APPLIED AI: Media & Video Analytics; Machine Learning & Data Science; Content Discovery; Speech & NLP; Video; High Speed Internet; Home Security / Automation; Customer Service; Universal Parks; Media Properties
  • 4. BUSINESS PROBLEM: INCREASE POSITIVE CUSTOMER EXPERIENCES; RESOLVE POTENTIAL ISSUES CORRECTLY, QUICKLY AND, EVEN BETTER, PROACTIVELY; PREDICT AND DIAGNOSE SERVICE TROUBLE ACROSS MULTIPLE KNOWLEDGE DOMAINS; REDUCE COSTS THROUGH EARLIER RESOLUTION AND BY REDUCING AVOIDABLE TECHNICIAN VISITS
  • 5. AI FOR CUSTOMER SERVICE: Proactive, Predictive, Interactive
  • 6. XFINITY VIRTUAL ASSISTANT: (1) My Account Main Screen, (2) XFINITY Assistant, (3) Type a question, (4) Disambiguate
  • 7. VIRTUAL ASSISTANT – STEP BY STEP. Diagram labels: Devices, Applications, and Platforms instrumented to provide telemetry; Natural language input and feedback; Interactive (Conversational) Actions; Proactive (Automatic) Actions; Customer intents; Domain models; NLP; Action Catalog; Schedule Truck Roll; Self-Heal; Notifications; Agent Contact; Choose Best; Explore; Decision Engine; Context; Root Cause Predictions; Predictive AI/ML
  • 8. TECHNICAL PROBLEM: MULTIPLE PROGRAMMING AND DATA SCIENCE ENVIRONMENTS; WIDESPREAD AND DISCORDANT DATA SOURCES; THE “DATA PLANE” PROBLEM: COMBINING DATA AT REST AND DATA IN MOTION; CONSISTENT FEATURE ENGINEERING; ML VERSIONING: DATA, CODE, FEATURES, MODELS
  • 9. EXAMPLE NEAR REAL TIME PREDICTION USE CASE: CUSTOMER RUNS A “SPEED TEST”; EVENT TRIGGERS A PREDICTION FLOW; ENRICH WITH NETWORK HEALTH AND OTHER INDICATORS; EXECUTE ML MODEL; PREDICT WHETHER IT IS A WIFI, MODEM, OR NETWORK ISSUE. Diagram labels: Detect; Enrich; Predict; Gather Data; Event; ML Model; Engage Customer; Act / Notify; Network Diagnostic Services; Slow Speed?; Additional Context Services; Run Prediction
  • 10. SPACE CORRELATION EXAMPLE: ML ALGORITHM NEEDS TO LEARN THAT THERE IS NO NEED TO SEND 3 REPAIR TRUCKS • LOGS FROM WHICH TRAINING DATASETS ARE SOURCED SHOW CORRELATION BETWEEN UNSUCCESSFUL TRUCK DISPATCHES AND CONCENTRATED CABLE FAILURES • GEO-LOCATION IS AVAILABLE IN THE CUSTOMER CONTEXT • ALGORITHM CAN CLUSTER CUSTOMERS BASED ON GEO-LOCATION. Diagram legend: Cable; Green = Works; Yellow = Has Problems; Likely failure
  • 11. CHALLENGE: STANDARDIZATION OF FEATURES. TWO MAIN CHALLENGES • FEATURE ASSEMBLY (ENRICHMENT) DURING PREDICTION TIME • DISCOVERING CORRELATIONS. WHEN WE HAVE 25 MILLION CUSTOMERS EACH USING 10 PRODUCTS, WE NEED A STANDARDIZATION OF FEATURES, ACTIONS AND REWARDS. FEATURE STORE: CURATED DATA STORE TO DRIVE MODEL TRAINING AND MODEL PREDICTION
  • 12. ML PIPELINE – ROLES & WORKFLOW. Roles: Business User; Data Scientist; ML Operations. Diagram labels: Define Use Case; Explore Features; Create and publish new features; Create & Validate Models; Model Selection; Go Live with Selected Models • Define Online Feature Assembly • Define pipeline to collect outcomes • Model Deployment and Monitoring; Model Review; Iterate; Evaluate Live Model Performance. Phases: Inception; Exploration; Model Development; Candidate Model Selection; Model Operationalization; Model Evaluation; Go Live Phase. Monitor Live Models; Collect new data & retrain; Iterate
  • 13. SOLUTION MOTIVATION: SELF-SERVICE PLATFORM; ALIGN DATA SCIENTISTS AND PRODUCTION; MODELS TREATED AS CODE; HIGH THROUGHPUT STREAM PLATFORM
  • 14. WHY METADATA DRIVEN? INSPIRED BY GROUND CONTEXT • Berkeley’s RISE Lab • Application context • Parameters, callbacks, “meaty” metadata • Behavior Context • Data sets and code • Change Context • Version history • Track any change end-to-end -> entire pipeline is versioned • Metadata drives what code is run and how
  • 15. AN OVERVIEW OF SPARK FLOWS. Diagram labels: RAW DATA STREAM; Feature Creation Pipeline; VERSION; Historical RAW Store; Feature Creation; Disk or Memory; Model; ON DEMAND OR CONTINUOUS; Historical Feature Store; Online Feature Store; Prediction; CUSTOMER EXPERIENCE ELEMENTS; Analysis & Business Value
  • 16. FEATURE STORE. TWO TYPES OF FEATURE STORES: • Online Feature Store – Current values by key (Key/Value Store) • History Feature Store – Append features as they are collected (e.g. Hadoop File System, AWS S3). ONLINE FEATURE STORE • Used in the prediction phase for enrichment • Needs to support fast ingest and query, as it stores current data for a given account or account & device combination. HISTORY FEATURE STORE • Used to build history of features • Data Scientists use this store to create their training datasets. MAINTAIN (VERSIONED) RAW DATA SEPARATELY. Diagram labels: Feature Creation Pipeline; History Feature Store (Append); Online Feature Store (Overwrite); Prediction Phase; Model Training Phase
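The overwrite versus append distinction on this slide can be illustrated with a minimal in-memory sketch. The class names, the `write_feature` helper, and the sample feature name are hypothetical and not from the talk; a real deployment would presumably back these with a key/value store and an append store such as S3 or HDFS.

```python
from collections import defaultdict


class OnlineFeatureStore:
    """Key/value store: keeps only the current value per (key, feature)."""

    def __init__(self):
        self._current = {}

    def write(self, key, feature, value):
        self._current[(key, feature)] = value  # overwrite the previous value

    def read(self, key, feature):
        return self._current.get((key, feature))


class HistoryFeatureStore:
    """Append-only store: keeps every value ever written, in arrival order."""

    def __init__(self):
        self._rows = defaultdict(list)

    def write(self, key, feature, value):
        self._rows[(key, feature)].append(value)  # append, never overwrite

    def read_all(self, key, feature):
        return list(self._rows[(key, feature)])


def write_feature(online, history, key, feature, value):
    """Hypothetical feature writer: every value goes to both stores."""
    online.write(key, feature, value)
    history.write(key, feature, value)


online, history = OnlineFeatureStore(), HistoryFeatureStore()
for mbps in (42.0, 17.5):
    write_feature(online, history, "acct-1", "speed_test_mbps", mbps)
```

After both writes, the online store holds only the latest reading for the account, while the history store retains both readings for later training-set construction.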
  • 17. USING THE ONLINE FEATURE STORE. MODEL EXECUTION TRIGGER: 1. Payload only contains Model Name & Account Number. FEATURE ASSEMBLY (Model Metadata, Online Feature Store): 2. Model Metadata informs which features are needed for a model 3. Pull required features by account number. MODEL EXECUTION: 4. Pass full set of assembled features for model execution 5. Prediction
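The five numbered steps above can be sketched as follows. This is an illustration under stated assumptions, not the presenters' implementation: the metadata layout, the feature names, and the threshold rule standing in for a model are all hypothetical.

```python
# Hypothetical model metadata: which features each model needs (step 2).
MODEL_METADATA = {
    "wifi_vs_modem": {"features": ["speed_test_mbps", "calls_last_30d"]},
}

# Hypothetical online feature store: current value by (account, feature).
ONLINE_STORE = {
    ("acct-1", "speed_test_mbps"): 17.5,
    ("acct-1", "calls_last_30d"): 3,
    ("acct-1", "unrelated_feature"): 99,
}


def assemble_features(payload):
    """Steps 2-3: look up the needed features, then pull them by account."""
    needed = MODEL_METADATA[payload["model_name"]]["features"]
    account = payload["account_number"]
    return {name: ONLINE_STORE.get((account, name)) for name in needed}


def run_model(features):
    """Steps 4-5: placeholder rule standing in for a real trained model."""
    return "wifi" if features["speed_test_mbps"] < 25 else "ok"


# Step 1: the trigger payload carries only model name and account number.
payload = {"model_name": "wifi_vs_modem", "account_number": "acct-1"}
features = assemble_features(payload)
prediction = run_model(features)
```

Because the payload is this small, the requesting application does not need to know which features a model consumes; that knowledge lives in the metadata.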
  • 18. FEATURE CREATION PIPELINE. Diagram labels: Aggregation Pipeline; On Demand Pipeline; Continuous Stream; On Demand Feature Request; External REST API; Feature Writer; Feature Assembly; Feature Metadata; Model Metadata; Online Feature Store; History Feature Store. TWO TYPES: • Continuous aggregations on streaming data • On Demand Features. AGGREGATION FEATURE EXAMPLES • Number of customer calls in the past 30 days. Key = Account Number • Number of signal errors > 2000 in a 24 hour tumbling window. Key = Account Number + Device Id. ON DEMAND FEATURE EXAMPLE • Diagnostic telemetry information for each device for a given customer • Expensive to collect. Only requested on demand • Model Metadata specifies TTL for such a feature
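The 24 hour tumbling-window aggregation can be sketched in plain Python. The event field names are hypothetical, and the sketch assumes the feature counts events whose signal-error reading exceeds 2000, with windows aligned to UTC midnight. A production pipeline would more likely use a streaming engine's windowing operators.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 24 * 60 * 60  # 24 hour tumbling window


def window_start(ts):
    """Start of the non-overlapping window containing ts (epoch seconds)."""
    epoch = int(ts.timestamp())
    return epoch - epoch % WINDOW_SECONDS


def count_high_error_events(events, threshold=2000):
    """Count events above threshold per (account, device, window)."""
    counts = Counter()
    for event in events:
        if event["signal_errors"] > threshold:
            key = (event["account"], event["device"], window_start(event["ts"]))
            counts[key] += 1
    return counts


def utc(day, hour):
    return datetime(2018, 6, day, hour, tzinfo=timezone.utc)


events = [
    {"account": "acct-1", "device": "dev-a", "ts": utc(1, 3), "signal_errors": 2500},
    {"account": "acct-1", "device": "dev-a", "ts": utc(1, 22), "signal_errors": 3100},
    {"account": "acct-1", "device": "dev-a", "ts": utc(2, 1), "signal_errors": 2200},
    {"account": "acct-1", "device": "dev-a", "ts": utc(2, 5), "signal_errors": 100},
]
counts = count_high_error_events(events)
```

Each event falls into exactly one window, which is what distinguishes a tumbling window from a sliding one.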
  • 19. FEATURE METADATA • HOW DO I IDENTIFY A FEATURE? KEY: NAMESPACE, NAME & VERSION • HOW DO I IDENTIFY A SPECIFIC INSTANCE OF A FEATURE? ONLINE FEATURE STORE KEY DEFINITIONS: JSON PATH & SCRIPT REFERENCES (CODE & METHOD) IN GITHUB • HOW DO I WRITE TO A HISTORY STORE(S)? HISTORICAL FEATURE STORE KEY DEFINITIONS: JSON PATH TO EXTRACT IDENTIFIERS, CONNECTION PARAMETERS TO HISTORY STORE, SCRIPT & JSON PATH TO EXTRACT PARTITIONS • WHAT ARE THE UPDATE TIMESTAMPS FOR EACH FEATURE VALUE? EVENT VS. INGESTION TIME. UPDATE TS EXTRACTORS: COMBINATION OF JSON PATHS AND SCRIPT REFERENCES TO EXTRACT TIMESTAMP FROM FEATURE PAYLOADS
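A feature metadata record of this kind might look like the sketch below. The field names and the simplified dotted-path resolver are assumptions for illustration; the slide does not specify a schema, and the resolver is not a full JSONPath implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMetadata:
    """Hypothetical record; the slide names only the kinds of fields."""
    namespace: str
    name: str
    version: str
    online_key_path: str  # path to the identifier used as the online store key
    update_ts_path: str   # path to the event timestamp in the feature value

    @property
    def key(self):
        # A feature is identified by namespace, name and version.
        return (self.namespace, self.name, self.version)


def resolve(path, document):
    """Follow a dotted path such as 'header.customer_id' through nested dicts."""
    value = document
    for part in path.split("."):
        value = value[part]
    return value


meta = FeatureMetadata(
    namespace="network",
    name="speed_test",
    version="v1",
    online_key_path="header.customer_id",
    update_ts_path="header.timestamp",
)

feature_value = {
    "header": {"timestamp": 1527897600, "customer_id": "c-42"},
    "payload": {"download_mbps": 17.5},
}

store_key = (meta.key, resolve(meta.online_key_path, feature_value))
event_time = resolve(meta.update_ts_path, feature_value)
```

Keeping the extraction paths in metadata rather than in code means a new feed can be onboarded by registering a record instead of deploying a new writer.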
  • 20. EXAMPLE FEATURE VALUE. HEADER: TIMESTAMP, INTERNAL CUSTOMER IDENTIFIER. PAYLOAD: JSON PAYLOAD (E.G. SPEED TEST DATA)
  • 21. INGEST FEATURE VALUE. HEADER: TIMESTAMP, INTERNAL CUSTOMER IDENTIFIER. PAYLOAD: JSON PAYLOAD (E.G. SPEED TEST DATA). Diagram labels: FEATURE INGESTION PIPELINE; FEATURE METADATA; Scripts Repository; Online Feature Store; History Feature Store
  • 22. MODEL METADATA. KEY: USECASE, NAME & VERSION. PER FEATURE DEFINITION: PRE-FEATURE ENGINEERING HOOKS, ATTRIBUTE LEVEL FEATURE ENGINEERING HOOKS, POST-FEATURE ENGINEERING HOOKS, TTL. ENVIRONMENT PARAMETERS. MODEL DEPLOYMENT DEFINITIONS. Questions answered: HOW DO I IDENTIFY A MODEL?; DEFINE ENVIRONMENT PARAMETERS FOR MODEL EXECUTION; CONSISTENT FEATURE ENGINEERING (SCRIPTS). WHY?; HOW IS THE MODEL DEPLOYED? AUTOSCALING DEFINITIONS
  • 23. CONSISTENT DATA. PLACE DATA ON SAME PLANE • S3 (or form data plane via Alluxio) • Storage parameters driven by metadata • Consistent persistence and reads • Metadata-driven operators • Historical Store • Raw data • Engineered features. VERSION THE DATA • Feature Creation keeps metadata paths
  • 24. CONSISTENT FEATURE ENGINEERING - MODEL METADATA. FEATURE ENGINEERING MUST BE CONSISTENT: • Training • Prediction Phase. METADATA DRIVEN • Using configured scripts just like feature metadata • Define features used • Define TTL per feature (Prediction Phase). SCRIPTS DEFINED IN MODEL METADATA ARE DEFINED BY DATA SCIENTIST • Used for creating training/testing/validation datasets from raw features. Apply on a record of data used for training or during prediction • Also used at prediction time to perform real time feature engineering. Diagram labels: Feature Engineering; Model Metadata; Online Feature Store; Scripts Repository; Reference Feature Engineering Scripts; History Feature Store; Record for training; Record for prediction; Training; Prediction
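The consistency requirement amounts to routing both phases through one record-level script. A minimal sketch follows, with a hypothetical `engineer` function and made-up field names standing in for a script referenced from model metadata.

```python
import math


def engineer(record):
    """Hypothetical feature engineering script, applied one record at a time.

    The same function is used for training rows and for live prediction rows,
    so the two phases cannot drift apart.
    """
    return {
        "log_download": math.log1p(record["download_mbps"]),
        "many_calls": int(record["calls_last_30d"] >= 3),
    }


# Training: records read from the history feature store.
history_records = [
    {"download_mbps": 17.5, "calls_last_30d": 4},
    {"download_mbps": 90.0, "calls_last_30d": 0},
]
training_rows = [engineer(record) for record in history_records]

# Prediction: one record assembled from the online feature store.
online_record = {"download_mbps": 17.5, "calls_last_30d": 4}
prediction_row = engineer(online_record)
```

An identical raw record yields an identical engineered row in either phase, which is the property the slide asks for.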
  • 25. CONSISTENT FEATURE ENGINEERING. SQL AS A UNIFYING LANGUAGE • Replace as many operations as possible with their SQL equivalent • No need to translate code • No need for DSL. SPARK AS A UNIFYING LANGUAGE • Many tools for deeper feature engineering • Redeploy same code through streaming / web app • Fewer frameworks. APPLICABLE AT EVERY PHASE • Post-Ingest • In-flight or at-rest • Pre-model • Standards to fit both stream and batch
  • 26. MODEL DEPLOYMENT. MODEL AS CODE • H2O AI POJO • Spark ML Models • Simple Python Scripts – Regression Models • Specialized Python Scripts – Math libraries that need specialized hardware like GPU support. ONE MODEL, MULTIPLE DEPLOYMENT MODELS • Deploy as Docker containers with REST Endpoints – Easy to test and can be used directly if the request has all the features available • Deploy as Map Operators within Streaming framework • Deploy as Lambda/SageMaker Spark functions in AWS • SparkLauncher • Databricks Jobs API
  • 27. PREDICTION PHASE. ASSEMBLE FEATURES FOR A GIVEN MODEL. Diagram labels: REQUESTING APPLICATION; Payload: Model Name + Account Number; Listens; Feature Store API; Model / Feature Metadata; Online Feature Store; Feature Assembly; Are All Features Current? (Yes / No); Feature Creation Pipeline; History Feature Store: Append store (e.g. S3, HDFS, Redshift) for use by Data Scientist for Model Training; Model Execution; Prediction/Outcome Store; Customer Context
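The "Are All Features Current?" decision can be sketched as a per-feature TTL check that falls back to an on-demand fetch. All names here (`assemble`, `fake_fetch`, the store layout, the TTL values) are hypothetical; the slide shows only the decision itself.

```python
def assemble(account, needed, online_store, now, fetch_on_demand):
    """Return current feature values, refreshing any that are missing or stale.

    needed maps feature name -> TTL in seconds (taken from model metadata).
    """
    features = {}
    for name, ttl in needed.items():
        entry = online_store.get((account, name))
        if entry is None or now - entry["updated_at"] > ttl:
            # Not current: request the feature and overwrite the online value.
            entry = {"value": fetch_on_demand(account, name), "updated_at": now}
            online_store[(account, name)] = entry
        features[name] = entry["value"]
    return features


requested = []


def fake_fetch(account, name):
    """Stand-in for an expensive on-demand diagnostic call."""
    requested.append(name)
    return -1.0


store = {
    ("acct-1", "calls_last_30d"): {"value": 3, "updated_at": 900},
    ("acct-1", "device_telemetry"): {"value": 0.7, "updated_at": 100},
}
needed = {"calls_last_30d": 3600, "device_telemetry": 300}

features = assemble("acct-1", needed, store, now=1000, fetch_on_demand=fake_fetch)
```

Only the stale telemetry feature triggers a fetch, which matches the slide's point that expensive features are requested on demand rather than collected continuously.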
  • 28. FEATURES OF THE ML PIPELINE. AWS AGNOSTIC • Integrates with the AWS Cloud but is not dependent on it • Framework should be able to work in a non-AWS distributed environment with configuration (not code) changes. TRACEABILITY & REPEATABILITY & AUDITABILITY • Model to be traced back to business use cases • Full traceability from raw data to feature engineering to predictions • “Everything Versioned” enables repeatability. CI/CD SUPPORT • Code, Metadata (Hyper-Parameters) and Data (Training/Validation Data) are versioned. Deployable artifacts to integrate with CI/CD Pipeline
  • 29. NEXT STEPS AND FUTURE WORK. UI PORTAL FOR • MODEL / FEATURE AND METADATA MANAGEMENT • CONTAINERIZATION SUPPORT FOR MODEL EXECUTION PHASE • WORKBENCH FOR DATA SCIENTIST • CONTINUOUS MODEL MONITORING. KNOWLEDGE SHARING • Promote reusability: users search for features by model • Search features by their importance in models • Real time model evaluation by comparing predictions with outcomes • Determining first-class tools. AUTOMATING THE RETRAINING PROCESS. SUPPORT FOR MULTIPLE/PLUGGABLE FEATURE STORES (SLA DRIVEN)
  • 30. SUMMARY • Metadata Driven: Feature/Model Definition, Versioning, Feature Assembly, Model Deployment, and Model Monitoring are metadata driven • Automation: Orchestrated Deployment for new Features and Models • Rapid Onboarding: Portal for Model and Feature Management as well as Model Deployment • Data Consistency: Feature store enforces a consistent data pipeline, ensuring that the data used for training is functionally identical to the data used for predictions • Monitoring and Metrics: Ability to execute & monitor multiple Models in production to enable real-time, metrics-driven model selection • Iterative/Consistent Model Development: Multiple versions of the Models can be developed iteratively while consuming from a consistent dataset (feature store), which enables A/B & Multivariate Testing