SlideShare uma empresa Scribd logo
1 de 18
Baixar para ler offline
Operationalizing Edge Machine Learning
with Apache Spark
ParallelM
2
Growing AI Investments; Few Deployed at Scale
Source: “Artificial Intelligence: The Next Digital Frontier?”, McKinsey Global Institute, June 2017
Out of 160 reviewed
AI use cases:
88% did not
progress
beyond the
experimental
stage
But successful early
AI adopters report:
Profit margins
3–15%
higher than
industry
average
20%
AI in
Production
80%
Developing,
Experimenting,
Contemplating
Survey of 3073 AI-aware C-level Executives
3
Challenges of Deploying & Managing ML in Production
• Diverse focus and expertise of Data Science & Ops teams
• Increased risk from non-deterministic nature of ML
• Current Operations solutions do not address uniqueness of ML Apps
4
Challenges of Edge/Distributed Topologies
• Varied resources at each level
• Scale, heterogeneity, disconnected operation
IoT is Driving Explosive Growth in Data Volume
“Things” Edge and Network Data Center/Cloud
Data Lake
5
What We Need For Operational ML
• Accelerate deployment & facilitate collaboration between Data & Ops teams
• Monitor validity of ML predictions, diagnose data and ML performance issues
• Orchestrate training, update, and configuration of ML pipelines across
distributed, heterogeneous infrastructure with tracking
6
What We Need For Edge Operational ML
• Distribute analytics processing to the optimal point for each use case
• Flexible management framework enables:
• Secure centralized and/or local learning, prediction, or combined learning/prediction
• Granular monitoring and control of model update policies
• Support multi-layer topologies to achieve maximum scale while accommodating low
bandwidth or unreliable connectivity
Sources
Edge
Intelligence
Streams
Data Lake
Central/Cloud
Intelligence
Batches
Edge/Cloud
Orchestration
7
ML Orchestration
ML Health
Business
Impact
Model
Governance
Continuous
Integration/
Deployment
Database
Business Value
Machine Learning
Models
MLOps – Managing the full Production ML Lifecycle
8
Models, Retraining
Control, Statistics
Events, Alerts
Data
Data Science
Platforms
Data Streams Data Lakes
Our Approach
MCenter
MCenter Server
MCenter
Agent
MCenter
Agent
MCenter
Agent
MCenter
Agent
and more…
Analytic Engines
MCenter Developer Connectors
9
Operational Abstraction
• Link pipelines (training and inference) via
an ION (Intelligence Overlay Network)
• Basically a Directed Graph
representation with allowance for cycles
• Pipelines are DAGs within each engine
• Distributed execution over
heterogeneous engines, programming
languages and geographies
Policy
based
Update
Example – KMeans Batch Training
Plus Streaming Inference
Anomaly Detection
10
An Example ION to Resource Mapping
Human
Approved
Model
Update
Every 5 min - EdgeEvery Tuesday at 10AM - Cloud
Sources
Edge
Intelligence
Streams
Data Lake
Central/Cloud
Intelligence
Batches
Models
11
Pipeline Examples
Training Pipeline
(SparkML)
Inference Pipeline
(SparkML)
12
Instrument, Upload, Orchestrate, Monitor
12
13
Integrating with Analytics Engines (Spark)
Job Management
• Via SparkLauncher: A library to control launching, monitoring and terminating jobs
• PM Agent communicates with Spark through this library for job management (also uses Java
API to launch child processes)
Statistics
• Via SparkListener: A Spark-driver callback service
• SparkListener taps into all accumulators which, is one of the popular ways to expose statistics
• PM agent communicates with the Spark driver and exposes statistics via a REST endpoint
ML Health / Model collection and updates
• PM Agent delivers and receives health events, health objects and models via sockets from
custom PM components in the ML Pipeline
14
Demo Description
Training
Inference
Thank You!
Nisha Talagala
nisha.talagala@parallelm.com
Vinay Sridhar
vinay.sridhar@parallelm.com
16
Data Lake
What We Need For Edge Operational ML
• Distribute analytics processing to the optimal point for each use case
• Flexible management framework enables:
• Secure centralized and/or local learning, prediction, or combined learning/prediction
• Granular monitoring and control of model update policies
• Support multi-layer topologies to achieve maximum scale while accommodating
low bandwidth or unreliable connectivity
Central/Cloud
Intelligence
Sources
Edge
Intelligence
Batches
Streams
Edge/Cloud
Orchestration
17
Integrating with Analytics Engines (TensorFlow)
Job Management
• TensorFlow Python programs run as standalone applications
• Standard process control mechanisms based on the OS is used to monitor and control
TensorFlow programs
Statistics Collection
• PM Agent parses contents via TensorBoard log files to extract meaningful statistics and events
that data scientists added
ML Health / Model collection
• Generation of models and health objects is recorded on a shared medium
18
An Example ION
18
Node 1: Inference Pipeline
Node 2: (Re) Training Pipeline
Model
Node 3: Policy
Model
Every 5 min on Spark cluster 1
Every Tuesday at 10AM on Spark cluster 2
Human approves/rejects
When: anytime there is
a new model

Mais conteúdo relacionado

Mais procurados

STRIDE And DREAD
STRIDE And DREADSTRIDE And DREAD
STRIDE And DREADchuckbt
 
Security Information and Event Management (SIEM)
Security Information and Event Management (SIEM)Security Information and Event Management (SIEM)
Security Information and Event Management (SIEM)k33a
 
Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh clevernetsystemsgeneva
 
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Databricks
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingDatabricks
 
NIST cybersecurity framework
NIST cybersecurity frameworkNIST cybersecurity framework
NIST cybersecurity frameworkShriya Rai
 
Présentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo WazuhPrésentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo WazuhAurélie Henriot
 
BATbern48_How Zero Trust can help your organisation keep safe.pdf
BATbern48_How Zero Trust can help your organisation keep safe.pdfBATbern48_How Zero Trust can help your organisation keep safe.pdf
BATbern48_How Zero Trust can help your organisation keep safe.pdfBATbern
 
Présentation Méthode EBIOS Risk Manager
Présentation Méthode EBIOS Risk ManagerPrésentation Méthode EBIOS Risk Manager
Présentation Méthode EBIOS Risk ManagerComsoce
 
Cybersecurity roadmap : Global healthcare security architecture
Cybersecurity roadmap : Global healthcare security architectureCybersecurity roadmap : Global healthcare security architecture
Cybersecurity roadmap : Global healthcare security architecturePriyanka Aash
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®confluent
 
SOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOCSOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOCPriyanka Aash
 
ICION 2016 - Cyber Security Governance
ICION 2016 - Cyber Security GovernanceICION 2016 - Cyber Security Governance
ICION 2016 - Cyber Security GovernanceCharles Lim
 
Blockchain Security and Privacy
Blockchain Security and PrivacyBlockchain Security and Privacy
Blockchain Security and PrivacyAnil John
 
Everything Blockchain Presentation - Feb 2022
Everything Blockchain Presentation -  Feb 2022Everything Blockchain Presentation -  Feb 2022
Everything Blockchain Presentation - Feb 2022RedChip Companies, Inc.
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk M sharifi
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Kai Wähner
 
Cybersecurity Capability Maturity Model (C2M2)
Cybersecurity Capability Maturity Model (C2M2)Cybersecurity Capability Maturity Model (C2M2)
Cybersecurity Capability Maturity Model (C2M2)Maganathin Veeraragaloo
 

Mais procurados (20)

Probe.ly
Probe.lyProbe.ly
Probe.ly
 
STRIDE And DREAD
STRIDE And DREADSTRIDE And DREAD
STRIDE And DREAD
 
Security Information and Event Management (SIEM)
Security Information and Event Management (SIEM)Security Information and Event Management (SIEM)
Security Information and Event Management (SIEM)
 
Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh Présentation et démo ELK/SIEM/Wazuh
Présentation et démo ELK/SIEM/Wazuh
 
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
Hotels.com’s Journey to Becoming an Algorithmic Business… Exponential Growth ...
 
Build Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks StreamingBuild Real-Time Applications with Databricks Streaming
Build Real-Time Applications with Databricks Streaming
 
NIST cybersecurity framework
NIST cybersecurity frameworkNIST cybersecurity framework
NIST cybersecurity framework
 
A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
 
Présentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo WazuhPrésentation ELK/SIEM et démo Wazuh
Présentation ELK/SIEM et démo Wazuh
 
BATbern48_How Zero Trust can help your organisation keep safe.pdf
BATbern48_How Zero Trust can help your organisation keep safe.pdfBATbern48_How Zero Trust can help your organisation keep safe.pdf
BATbern48_How Zero Trust can help your organisation keep safe.pdf
 
Présentation Méthode EBIOS Risk Manager
Présentation Méthode EBIOS Risk ManagerPrésentation Méthode EBIOS Risk Manager
Présentation Méthode EBIOS Risk Manager
 
Cybersecurity roadmap : Global healthcare security architecture
Cybersecurity roadmap : Global healthcare security architectureCybersecurity roadmap : Global healthcare security architecture
Cybersecurity roadmap : Global healthcare security architecture
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
Best Practices for Streaming IoT Data with MQTT and Apache Kafka®
 
SOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOCSOC Architecture - Building the NextGen SOC
SOC Architecture - Building the NextGen SOC
 
ICION 2016 - Cyber Security Governance
ICION 2016 - Cyber Security GovernanceICION 2016 - Cyber Security Governance
ICION 2016 - Cyber Security Governance
 
Blockchain Security and Privacy
Blockchain Security and PrivacyBlockchain Security and Privacy
Blockchain Security and Privacy
 
Everything Blockchain Presentation - Feb 2022
Everything Blockchain Presentation -  Feb 2022Everything Blockchain Presentation -  Feb 2022
Everything Blockchain Presentation - Feb 2022
 
QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk QRadar, ArcSight and Splunk
QRadar, ArcSight and Splunk
 
Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?Can Apache Kafka Replace a Database?
Can Apache Kafka Replace a Database?
 
Cybersecurity Capability Maturity Model (C2M2)
Cybersecurity Capability Maturity Model (C2M2)Cybersecurity Capability Maturity Model (C2M2)
Cybersecurity Capability Maturity Model (C2M2)
 

Semelhante a Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala and Vinay Sridhar

MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Lviv Startup Club
 
Data Analytics in Digital Transformation
Data Analytics in Digital TransformationData Analytics in Digital Transformation
Data Analytics in Digital TransformationMukund Babbar
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPaige_Roberts
 
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...Vasu S
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Nisha Talagala
 
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & LogsSplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & LogsSplunk
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Catalysing research and enterprise collaboration in the data ecosystem
Catalysing research and enterprise collaboration in the data ecosystemCatalysing research and enterprise collaboration in the data ecosystem
Catalysing research and enterprise collaboration in the data ecosystemDublinked .
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationBig Data Value Association
 
Winning with data
Winning with dataWinning with data
Winning with dataNUS-ISS
 
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...Precisely
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devopsUlf Mattsson
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowLviv Startup Club
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowEdunomica
 

Semelhante a Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala and Vinay Sridhar (20)

MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
Mykola Mykytenko: MLOps: your way from nonsense to valuable effect (approache...
 
Data Analytics in Digital Transformation
Data Analytics in Digital TransformationData Analytics in Digital Transformation
Data Analytics in Digital Transformation
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production FasterPython + MPP Database = Large Scale AI/ML Projects in Production Faster
Python + MPP Database = Large Scale AI/ML Projects in Production Faster
 
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...
TDWI Checklist - The Automation and Optimization of Advanced Analytics Based ...
 
Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017Strata parallel m-ml-ops_sept_2017
Strata parallel m-ml-ops_sept_2017
 
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & LogsSplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
SplunkLive! Frankfurt 2018 - Integrating Metrics & Logs
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Catalysing research and enterprise collaboration in the data ecosystem
Catalysing research and enterprise collaboration in the data ecosystemCatalysing research and enterprise collaboration in the data ecosystem
Catalysing research and enterprise collaboration in the data ecosystem
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 
Graphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform OptimizationGraphical Data Analytic Workflows and Cross-Platform Optimization
Graphical Data Analytic Workflows and Cross-Platform Optimization
 
Winning with data
Winning with dataWinning with data
Winning with data
 
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...
Digital Transformation: How to Run Best-in-Class IT Operations in a World of ...
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
 
Machine Data Analytics
Machine Data AnalyticsMachine Data Analytics
Machine Data Analytics
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Mohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with KubeflowMohamed Sabri: Operationalize machine learning with Kubeflow
Mohamed Sabri: Operationalize machine learning with Kubeflow
 
Modernize NetOps with Business-Aware Network Monitoring
Modernize NetOps with Business-Aware Network MonitoringModernize NetOps with Business-Aware Network Monitoring
Modernize NetOps with Business-Aware Network Monitoring
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 

Último (20)

Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Operationalizing Edge Machine Learning with Apache Spark with Nisha Talagala and Vinay Sridhar

  • 1. Operationalizing Edge Machine Learning with Apache Spark ParallelM
  • 2. 2 Growing AI Investments; Few Deployed at Scale Source: “Artificial Intelligence: The Next Digital Frontier?”, McKinsey Global Institute, June 2017 Out of 160 reviewed AI use cases: 88% did not progress beyond the experimental stage But successful early AI adopters report: Profit margins 3–15% higher than industry average 20% AI in Production 80% Developing, Experimenting, Contemplating Survey of 3073 AI-aware C-level Executives
  • 3. 3 Challenges of Deploying & Managing ML in Production • Diverse focus and expertise of Data Science & Ops teams • Increased risk from non-deterministic nature of ML • Current Operations solutions do not address uniqueness of ML Apps
  • 4. 4 Challenges of Edge/Distributed Topologies • Varied resources at each level • Scale, heterogeneity, disconnected operation IoT is Driving Explosive Growth in Data Volume “Things” Edge and Network Data Center/Cloud Data Lake
  • 5. 5 What We Need For Operational ML • Accelerate deployment & facilitate collaboration between Data & Ops teams • Monitor validity of ML predictions, diagnose data and ML performance issues • Orchestrate training, update, and configuration of ML pipelines across distributed, heterogeneous infrastructure with tracking
  • 6. 6 What We Need For Edge Operational ML • Distribute analytics processing to the optimal point for each use case • Flexible management framework enables: • Secure centralized and/or local learning, prediction, or combined learning/prediction • Granular monitoring and control of model update policies • Support multi-layer topologies to achieve maximum scale while accommodating low bandwidth or unreliable connectivity Sources Edge Intelligence Streams Data Lake Central/Cloud Intelligence Batches Edge/Cloud Orchestration
  • 7. 7 ML Orchestration ML Health Business Impact Model Governance Continuous Integration/ Deployment Database Business Value Machine Learning Models MLOps – Managing the full Production ML Lifecycle
  • 8. 8 Models, Retraining Control, Statistics Events, Alerts Data Data Science Platforms Data Streams Data Lakes Our Approach MCenter MCenter Server MCenter Agent MCenter Agent MCenter Agent MCenter Agent and more… Analytic Engines MCenter Developer Connectors
  • 9. 9 Operational Abstraction • Link pipelines (training and inference) via an ION (Intelligence Overlay Network) • Basically a Directed Graph representation with allowance for cycles • Pipelines are DAGs within each engine • Distributed execution over heterogeneous engines, programming languages and geographies Policy based Update Example – KMeans Batch Training Plus Streaming Inference Anomaly Detection
  • 10. 10 An Example ION to Resource Mapping Human Approved Model Update Every 5 min - EdgeEvery Tuesday at 10AM - Cloud Sources Edge Intelligence Streams Data Lake Central/Cloud Intelligence Batches Models
  • 13. 13 Integrating with Analytics Engines (Spark) Job Management • Via SparkLauncher: A library to control launching, monitoring and terminating jobs • PM Agent communicates with Spark through this library for job management (also uses Java API to launch child processes) Statistics • Via SparkListener: A Spark-driver callback service • SparkListener taps into all accumulators which, is one of the popular ways to expose statistics • PM agent communicates with the Spark driver and exposes statistics via a REST endpoint ML Health / Model collection and updates • PM Agent delivers and receives health events, health objects and models via sockets from custom PM components in the ML Pipeline
  • 16. 16 Data Lake What We Need For Edge Operational ML • Distribute analytics processing to the optimal point for each use case • Flexible management framework enables: • Secure centralized and/or local learning, prediction, or combined learning/prediction • Granular monitoring and control of model update policies • Support multi-layer topologies to achieve maximum scale while accommodating low bandwidth or unreliable connectivity Central/Cloud Intelligence Sources Edge Intelligence Batches Streams Edge/Cloud Orchestration
  • 17. 17 Integrating with Analytics Engines (TensorFlow) Job Management • TensorFlow Python programs run as standalone applications • Standard process control mechanisms based on the OS is used to monitor and control TensorFlow programs Statistics Collection • PM Agent parses contents via TensorBoard log files to extract meaningful statistics and events that data scientists added ML Health / Model collection • Generation of models and health objects is recorded on a shared medium
  • 18. 18 An Example ION 18 Node 1: Inference Pipeline Node 2: (Re) Training Pipeline Model Node 3: Policy Model Every 5 min on Spark cluster 1 Every Tuesday at 10AM on Spark cluster 2 Human approves/rejects When: anytime there is a new model