End-to-End ML pipelines with Beam, Flink, TensorFlow, and Hopsworks
Theofilos Kakantousis
Software Engineer & COO
@theofiloskak
3rd Apache Beam meetup, Stockholm, July 2019
Agenda
1. End-to-end ML pipelines
2. What is Hopsworks
3. Beam Portable Runner with Flink in Hopsworks
4. ML Pipelines with Beam and TensorFlow Extended
5. Demo
ML Pipelines
End-to-end ML Pipeline
(Figure: Data Ingest → Data Prep → Train → Serve → Online Monitor, running on Distributed Storage (Raw Data, Data Lake) and a Resource Manager.)
Typical Feature Store pipeline
Hopsworks Timeline
“If you’re working with big data and Hadoop, this one paper could repay your
investment in the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
● World’s first Hadoop platform to support GPUs-as-a-Resource
● World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify
● World’s first open source Feature Store for Machine Learning
● World’s first distributed filesystem to store small files in metadata on NVMe disks
● Winner of the IEEE Scale Challenge 2017 with HopsFS - 1.2m ops/sec
● World’s most scalable filesystem with multi data center availability
● World’s first open source platform to support TensorFlow Extended (TFX) on Beam
(Timeline axis: 2017, 2018, 2019.)
What is Hopsworks
True Project-based multi-tenancy
(Figure: projects such as Project-42, Proj-X and Project-All, each owning its own resources, e.g. Kafka topics, datasets such as /Projs/My/Data, experiments and models.)
Hopsworks REST API
● Manage Hopsworks resources via the REST API
○ Projects
○ Datasets
○ Jobs
○ FeatureStore
○ Experiments
○ ModelServing
○ Kafka
○ ...
● Documented with Swagger and hosted on SwaggerHub
○ https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api/0.10.0
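As a rough illustration of calling the REST API from Python, the sketch below only builds an authenticated JSON request with the standard library; it does not send it. The base path `/hopsworks-api/api`, the example resource path and the exact `Authorization` value are assumptions to be checked against the Swagger documentation for your Hopsworks version, and `hopsworks_request` is a hypothetical helper, not part of hops-util-py.

```python
import json
import urllib.request


def hopsworks_request(host, path, auth_header, method="GET", body=None,
                      base="/hopsworks-api/api"):
    """Build (but do not send) a JSON request against a Hopsworks REST endpoint.

    Assumptions: `base` is the API prefix of your installation and
    `auth_header` is a valid Authorization value (e.g. a bearer token);
    both must be verified against the Swagger docs for your version.
    """
    url = "https://%s%s/%s" % (host, base.rstrip("/"), path.lstrip("/"))
    # Serialize the optional body as JSON; GET requests carry no body.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", auth_header)
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

Sending is then a matter of `urllib.request.urlopen(req)`; keeping construction separate from sending makes the request easy to inspect or reuse with another HTTP client.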
Beam on Hopsworks
Beam Portable Runner
(Figure: pipelines written with Beam Java, Beam Python and other languages are constructed against the Beam Model and executed via Fn Runners on Apache Flink, Apache Spark or Cloud Dataflow.)
1. End users: who want to write pipelines in a language that’s familiar.
2. SDK writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
https://s.apache.org/apache-beam-project-overview
Beam-as-a-Service in Hopsworks
● Develop Beam pipelines in Python from Jupyter notebooks
● Tooling to simplify deployment and execution
● Manage the lifecycle of the Beam Portability JobService (JobServer)
● Logging and monitoring of Beam jobs
● SDK Workers (harness) with conda environments
● Scalable execution on Flink/Spark clusters
Hopsworks API
● hops-util-py (Python) and HopsUtil (Java)
● Simplifies development:
○ Sets security configuration
○ Discovers cluster services
○ Provides helper methods for the Hopsworks REST API
○ ML Experiments
● Manage Beam Runners and Job Service
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
Beam Portability - Process vs Docker
● Docker:
○ Build an image with all your dependencies
○ Update or modify? Build new containers
○ Additional infrastructure components
● Process:
○ Install dependencies on all servers
○ Management of dependencies?
○ Easy to update and modify libraries
○ Challenge? Multi-tenancy and keeping servers in sync
● SDK Worker: SDK-provided program responsible for executing user code
● How to manage the user’s dependencies, libraries, …?
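The Docker/Process choice surfaces in a Beam Python pipeline as two portable-runner options. A minimal sketch, assuming Beam's `--environment_type` and `--environment_config` flags, where DOCKER takes an image name and PROCESS takes a JSON object whose `command` is the SDK worker boot executable; the helper name and the example paths/images are hypothetical, and flag details should be checked against your Beam version.

```python
import json


def sdk_environment_args(mode, docker_image=None, boot_command=None):
    """Return Beam portable-runner flags selecting how SDK workers are launched.

    mode: "docker" or "process". `docker_image` and `boot_command` are
    caller-supplied placeholders (hypothetical values in the examples).
    """
    if mode == "docker":
        if not docker_image:
            raise ValueError("docker mode needs a docker_image")
        # DOCKER: the config is simply the container image to run.
        return ["--environment_type=DOCKER",
                "--environment_config=" + docker_image]
    if mode == "process":
        if not boot_command:
            raise ValueError("process mode needs a boot_command")
        # PROCESS: the config is JSON naming the executable to start on each
        # worker host, so dependencies must already be installed there.
        return ["--environment_type=PROCESS",
                "--environment_config=" + json.dumps({"command": boot_command})]
    raise ValueError("unknown mode: %r" % (mode,))
```

With PROCESS, nothing has to be rebuilt when a library changes, which is why the dependency question shifts to keeping every worker host's Python environment in sync.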
First class Python: Conda in the Cluster
(Figure: Conda Repo and Hopsworks Cluster.)
● No need to write Dockerfiles
Jupyter dashboard in Hopsworks
● Manage notebook settings from the dashboard
Jupyter dashboard in Hopsworks
● Execute a Beam Python pipeline:
○ with the Python kernel, either in a Docker container managed by Kubernetes or as a local Python process, or
○ in a PySpark executor in the cluster.
Notebooks as Beam jobs in ML pipelines
Beam portability architecture in Hopsworks
https://www.slideshare.net/ThomasWeise/python-streaming-pipelines-on-flink-beam-meetup-at-lyft-2019
Beam portability architecture in Hopsworks
HopsFS
Local/YARN/K8s
Hopsworks
Session cluster on YARN
Beam portability architecture in Hopsworks
Local/YARN/K8s
Compiled and shipped with
HopsFS dependencies
Hopsworks
Session cluster on YARN
HopsFS
Local/YARN/K8s
hops-util.py
Beam portability architecture in Hopsworks
Local/YARN/K8s
Hopsworks
Session cluster on YARN
HopsFS
Local/YARN/K8s
hops-util.py
# Creates and starts the runner
# Localizes the Job Service jar file from HopsFS
# Provides arguments (ports, artifacts_dir, etc.)
# Starts the Job Service and returns host, port
# The Job Service automatically shuts down when the Python pipeline shuts down
host, port = start_runner()
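Once the Job Service's host and port are known, a Python pipeline is pointed at it through Beam's portable-runner options. A minimal sketch, assuming the standard Beam Python flags `--runner=PortableRunner` and `--job_endpoint`; `portable_runner_args` is a hypothetical helper, not part of hops-util-py.

```python
def portable_runner_args(host, port, extra_args=()):
    """Build Beam pipeline arguments that submit to a running Job Service.

    `extra_args` can carry further flags, e.g. SDK worker environment options.
    """
    return ["--runner=PortableRunner",
            # The Job Service endpoint as host:port.
            "--job_endpoint=%s:%d" % (host, int(port))] + list(extra_args)
```

Usage inside the notebook would look like `options = PipelineOptions(portable_runner_args(host, port))` (from `apache_beam.options.pipeline_options`) followed by `with beam.Pipeline(options=options) as p: ...`; keeping the flags as a plain list makes them easy to log or extend.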
● The Python conda environment and Hopsworks environment variables are set for the SDK Worker
Hopsworks API
def start_runner(runner="flink", runner_name="session", runner_config=config): ...
def start_jobservice(runner="Resources", artifacts_dir="Resources",
                     job_server_path="hdfs:///user/flink/",
                     job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar",
                     sdk_worker_parallelism=1): ...

hops.beam.start_runner()
hops.beam.start_jobservice()
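The Job Service is described as shutting down automatically when the Python pipeline ends. How hops-util-py does this is not shown on the slides; the following is a generic, hypothetical sketch of that lifecycle pattern, a context manager around `subprocess.Popen`. The helper name `managed_job_service` and the launch command are assumptions.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def managed_job_service(command, timeout=10):
    """Start a job-service process and guarantee it is stopped on exit.

    Hypothetical helper: `command` is whatever launches your job server,
    e.g. ["java", "-jar", "<job-server jar>", ...].
    """
    proc = subprocess.Popen(command)
    try:
        yield proc
    finally:
        # Runs on normal exit and when the pipeline code raises.
        if proc.poll() is None:  # still running
            proc.terminate()
            try:
                proc.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if it ignores SIGTERM
                proc.wait()
```

Tying the service to a `with` block means a failing pipeline cannot leave an orphaned Job Service holding its ports.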
Logging
● Flink JobManager and TaskManager
● Beam Job Service
○ Local mode: logs are in the project’s Jupyter staging directory
○ Cluster mode: logs are in the PySpark container where the process is running
● SDK Worker
○ Logs are in the Flink TaskManager container
● Collect and visualize with the ELK stack
○ Logs are accessible only by project members
Secure Beam with TLS certificates
TensorFlow Extended (TFX)
Hidden Technical Debt in Machine Learning Systems
(Figure: Data Collection, Data Validation, Feature Engineering, Hardware Management, Distributed Training, HyperParameter Tuning, Model Serving, A/B Testing, Monitoring and Pipeline Management surrounding the core Data → Model φ(x) → Prediction path.)
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
TensorFlow Extended (TFX)
https://www.tensorflow.org/tfx
TFX on a Flink Cluster with Portable Runner
Distributed Deep Learning in Hopsworks
(Figure: a Driver coordinating Executor 1 … Executor N, with HopsFS (HDFS), TensorBoard and Model Serving.)
Experiments - TensorBoard
● Repeatable experiments
● Manage experiment metadata
● Integration with TensorBoard
Orchestration
Apache Airflow-as-a-Service
● Airflow available as a multi-tenant service in Hopsworks
● Develop pipelines with Hopsworks operators and sensors
Apache Airflow-as-a-Service - TFX pipeline
Putting it all together
Horizontally Scalable ML Pipelines
(Figure: pipeline stages Ingest, Data Prep (Feature Store / TFX Transform), Experiment / Train, Model Analysis, Deploy, Serving and Monitor over HopsFS, fed by Raw Data and Event Data, with logs, a Metadata Store, the FeatureStore and External components.)
Compatibility...
● Hopsworks-1.0
● Beam 2.13.0
● Flink 1.8.0
● TensorFlow 1.14.0
● TFX 0.13
● TensorFlow Model Analysis 0.13.2
Demo
Conclusions & Future Work
● Summary
○ Hopsworks v1.0 is the first on-premises, open-source, horizontally scalable platform to support the Beam Portable Runner with the Flink runner
○ Develop and manage the lifecycle of horizontally scalable end-to-end ML pipelines with Beam and TFX
● Future Work
○ Add support for Spark Runner
○ Export metrics for Flink runner to InfluxDB and visualize with Grafana
Contributors
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan,
Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami,
Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer,
Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre
Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
How to get started with Hopsworks?
@hopsworks
Register for a free account at: www.hops.site
Images available for AWS, GCE, VirtualBox.
https://www.logicalclocks.com/
https://github.com/logicalclocks/hopsworks
https://www.meetup.com/HopsML-Stockholm
Reach us
@logicalclocks

HR Software Buyers Guide in 2024 - HRSoftware.com
 
Clustering techniques data mining book ....
Clustering techniques data mining book ....Clustering techniques data mining book ....
Clustering techniques data mining book ....
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.

  • 1. End-to-End ML pipelines with Beam, Flink, TensorFlow, and Hopsworks Theofilos Kakantousis Software Engineer & COO @theofiloskak 3rd Apache Beam meetup, Stockholm, July 2019
  • 2. Agenda 1. End-to-end ML pipelines 2. What is Hopsworks 3. Beam Portable Runner with Flink in Hopsworks 4. ML Pipelines with Beam and TensorFlow Extended 5. Demo
  • 4. End-to-end ML Pipeline (diagram labels: Raw Data; Data Ingest; Data Prep; Train; Serve; Online Monitor; Data Lake; Distributed Storage; Resource Manager)
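The stage sequence in the diagram can be illustrated with a toy, single-process Python sketch. Every function below is a hypothetical stand-in (in Hopsworks these stages would run on Beam/Flink, TFX and TensorFlow), and the linear model and sample data are invented for illustration.

```python
"""Toy end-to-end ML pipeline: ingest -> prep -> train -> serve -> monitor.

All functions are hypothetical stand-ins for the real pipeline stages.
"""
from statistics import mean


def ingest(raw_lines):
    # Parse raw "x,y" CSV lines into (float, float) records.
    return [tuple(float(v) for v in line.split(",")) for line in raw_lines]


def prep(records):
    # Drop records with negative inputs (a stand-in for validation/cleaning).
    return [(x, y) for x, y in records if x >= 0]


def train(records):
    # Fit y = w * x by least squares through the origin.
    w = sum(x * y for x, y in records) / sum(x * x for x, _ in records)
    return {"w": w}


def serve(model, x):
    # Online prediction with the trained model.
    return model["w"] * x


def monitor(model, records):
    # Mean absolute error on recent records, to watch for drift.
    return mean(abs(serve(model, x) - y) for x, y in records)


raw = ["1,2", "2,4", "-1,5", "3,6"]
data = prep(ingest(raw))
model = train(data)
print(model["w"], serve(model, 10), monitor(model, data))
```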
  • 6. Hopsworks Timeline (2017, 2018, 2019). “If you’re working with big data and Hadoop, this one paper could repay your investment in the Morning Paper many times over.... HopsFS is a huge win.” - Adrian Colyer, The Morning Paper. Milestones: World’s first Hadoop platform to support GPUs-as-a-Resource; World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify; World’s first open source Feature Store for Machine Learning; World’s first distributed filesystem to store small files in metadata on NVMe disks; Winner of the IEEE Scale Challenge 2017 with HopsFS (1.2M ops/sec); World’s most scalable filesystem with multi-data-center availability; World’s first open source platform to support TensorFlow Extended (TFX) on Beam.
  • 9. True project-based multi-tenancy (diagram labels: Project-42; Proj-X; Project-All; Kafka Topic; Resources; /Projs/My/Data; Experiments; Models)
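A toy model of the project-based multi-tenancy idea, assuming a simplified rule: each resource belongs to a project, and a user can reach it only through a project they are a member of. The `Project` class, the `can_access` helper, the users and the resource names are hypothetical, not Hopsworks' implementation.

```python
"""Toy model of project-based multi-tenancy (hypothetical, simplified)."""
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    members: set = field(default_factory=set)
    resources: set = field(default_factory=set)


def can_access(user, resource, projects):
    # A user reaches a resource only through a project that owns it
    # and that the user is a member of.
    return any(resource in p.resources and user in p.members for p in projects)


p42 = Project("Project-42", {"alice"}, {"kafka:topic-42", "/Projs/My/Data"})
px = Project("Proj-X", {"bob"}, {"models:mnist"})
projects = [p42, px]

print(can_access("alice", "/Projs/My/Data", projects))
print(can_access("bob", "/Projs/My/Data", projects))
```

Sharing then means adding a resource to another project (or a user to a project), rather than granting per-user permissions on individual files.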
  • 10. Hopsworks REST API ● Manage Hopsworks resources via the REST API ○ Projects ○ Datasets ○ Jobs ○ FeatureStore ○ Experiments ○ ModelServing ○ Kafka ○ ... ● Documented with Swagger and hosted on SwaggerHub ○ https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api/0.10.0
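As an illustration of driving these resources over HTTP, the sketch below only builds a request with Python's standard `urllib.request`; it does not send it. The host, the `/hopsworks-api/api` base path, the `project` resource and the `Bearer` token scheme are assumptions, so check the Swagger documentation for the paths and authentication your Hopsworks version actually uses.

```python
"""Building (not sending) a request to the Hopsworks REST API.

Host, base path, resource name and auth scheme are all assumptions.
"""
import urllib.request

HOST = "https://hopsworks.example.com"  # hypothetical host
BASE_PATH = "/hopsworks-api/api"         # assumed base path


def build_request(resource, token, method="GET"):
    # Compose the full URL and attach headers; urlopen(req) would send it.
    url = f"{HOST}{BASE_PATH}/{resource.lstrip('/')}"
    req = urllib.request.Request(url, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    return req


req = build_request("project", token="<jwt>")
print(req.get_method(), req.full_url)
```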
  • 12. Beam Portable Runner. Diagram: Beam Model: Pipeline Construction (Beam Java, Beam Python, other languages) and Beam Model: Fn Runners, with execution on Apache Flink, Apache Spark and Cloud Dataflow. Audiences: 1. End users: who want to write pipelines in a language that’s familiar. 2. SDK writers: who want to make Beam concepts available in new languages. 3. Runner writers: who have a distributed processing environment and want to support Beam pipelines. https://s.apache.org/apache-beam-project-overview
  • 13. Beam-as-a-Service in Hopsworks ● Develop Beam pipelines in Python from Jupyter notebooks ● Tooling to simplify deployment and execution ● Manage the lifecycle of the Beam Portability JobService (JobServer) ● Logging and monitoring of Beam jobs ● SDK Workers (harness) with conda env ● Scalable execution on Flink/Spark clusters
  • 14. Hopsworks API ● hops-util-py (Python) and HopsUtil (Java) ● Simplifies development: ○ Sets security config ○ Discovers cluster services ○ Helper methods for the Hopsworks REST API ○ ML Experiments ● Manage Beam Runners and Job Service https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
  • 15. Beam Portability - Process vs Docker ● Docker: ○ Build an image with all your dependencies ○ Update or modify? Build new containers ○ Additional infrastructure components ● Process: ○ Install dependencies on all servers ○ Management of dependencies? ○ Easy to update and modify libraries ○ Challenge? Multi-tenancy & keeping servers in sync ● SDK Worker: SDK-provided program responsible for executing user code ● How to manage the user’s dependencies, libraries, … ?
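The Docker versus Process trade-off surfaces in Beam Python as pipeline options. `--runner=PortableRunner`, `--job_endpoint`, `--environment_type` and `--environment_config` are Beam portable pipeline options; the image name, boot path and endpoint below are hypothetical placeholders, and the helper only assembles the flag list.

```python
"""Choosing how the Beam SDK Worker runs: Docker container vs. local process.

Image name, boot path and endpoint are hypothetical placeholders.
"""
import json


def sdk_worker_flags(mode, job_endpoint="localhost:8099"):
    flags = ["--runner=PortableRunner", f"--job_endpoint={job_endpoint}"]
    if mode == "docker":
        # Docker: dependencies are baked into an image; changing them
        # means rebuilding and redistributing the image.
        flags += ["--environment_type=DOCKER",
                  "--environment_config=my-registry/beam-python-sdk:2.13.0"]
    elif mode == "process":
        # Process: the worker is a plain executable on every server, so the
        # dependencies must be installed (and kept in sync) on each host.
        config = {"command": "/opt/beam/sdk_worker/boot"}
        flags += ["--environment_type=PROCESS",
                  f"--environment_config={json.dumps(config)}"]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return flags


for mode in ("docker", "process"):
    print(mode, sdk_worker_flags(mode))
```

Either list could then be passed as `PipelineOptions(flags)` from `apache_beam.options.pipeline_options`; that step is left out so the sketch runs without Beam installed.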
  • 16. First-class Python: Conda in the cluster (diagram labels: Conda Repo; Hopsworks Cluster). No need to write Dockerfiles.
  • 17. Jupyter dashboard in Hopsworks ● Manage notebook settings from dashboard
  • 18. Jupyter dashboard in Hopsworks ● Execute a Beam Python pipeline either: ○ with the Python kernel, in a Docker container managed by Kubernetes or as a local Python process; or ○ in a PySpark executor in the cluster.
  • 19. Notebooks as Beam jobs in ML pipelines
  • 20. Beam portability architecture in Hopsworks https://www.slideshare.net/ThomasWeise/python-streaming-pipelines-on-flink-beam-meetup-at-lyft-2019
  • 21. Beam portability architecture in Hopsworks (diagram labels: HopsFS; Local/YARN/K8s; Hopsworks; Session cluster on YARN)
  • 22. Beam portability architecture in Hopsworks (diagram labels: Hopsworks; Session cluster on YARN; HopsFS; Local/YARN/K8s; hops-util-py; annotation: “Compiled and shipped with HopsFS dependencies”)
  • 23. Beam portability architecture in Hopsworks (diagram labels: Local/YARN/K8s; Hopsworks; Session cluster on YARN; HopsFS; hops-util-py). The helper call on the slide:
      # creates and starts runner
      # localizes Job Service jar file from HopsFS
      # provides arguments (ports, artifacts_dir, etc.)
      # starts Job Service and returns host, port
      # Job Service automatically shuts down when Python pipeline shuts down
      host, port = start_runner()
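The last comment (the Job Service shuts down with the Python pipeline) can be implemented in several ways; the sketch below shows one plausible approach with `subprocess` and `atexit`, not hops-util-py's actual code. The child process is a sleeping Python interpreter standing in for the Job Service jar, and `start_job_service_stub` is a hypothetical name.

```python
"""Tying a Job Service's lifetime to the Python process (hypothetical sketch)."""
import atexit
import subprocess
import sys


def start_job_service_stub(host="localhost", port=8099):
    # Stand-in for launching the real Job Service jar; it just sleeps.
    proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

    def shutdown():
        # Stop the child if it is still running when the interpreter exits.
        if proc.poll() is None:
            proc.terminate()
            proc.wait(timeout=10)

    atexit.register(shutdown)
    return host, port, proc, shutdown


host, port, proc, shutdown = start_job_service_stub()
print(f"--job_endpoint={host}:{port}")
shutdown()  # normally left to atexit; called here to show the effect
print(proc.poll() is not None)
```

Registering the hook at start-up means the child is cleaned up on a normal interpreter exit even if the notebook or script never calls a stop function explicitly.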
  • 24. Beam portability architecture in Hopsworks (diagram labels: Local/YARN/K8s; Hopsworks; Session cluster on YARN; HopsFS; hops-util-py). The Python conda env and Hopsworks environment variables are set for the SDK Worker.
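Beam's PROCESS environment takes a JSON `environment_config` with a `command` and, as we understand the Python SDK of this era, an optional `env` map for the worker's environment variables. A minimal sketch of building such a config so the SDK Worker runs inside a project's conda env; the conda prefix, the `beam_boot` command and the `HOPSWORKS_PROJECT` variable are hypothetical.

```python
"""PROCESS-mode environment_config pointing the SDK Worker at a conda env.

Paths and variable names are hypothetical stand-ins.
"""
import json
import posixpath  # cluster hosts are Linux, so build POSIX paths explicitly


def process_environment_config(conda_prefix, extra_env):
    bin_dir = posixpath.join(conda_prefix, "bin")
    # Put the conda env's binaries first so its python is picked up.
    env = {"PATH": f"{bin_dir}:/usr/bin:/bin", **extra_env}
    return json.dumps({"command": posixpath.join(bin_dir, "beam_boot"), "env": env})


cfg = process_environment_config("/srv/conda/envs/demo_project",
                                 {"HOPSWORKS_PROJECT": "demo_project"})
print(f"--environment_config={cfg}")
```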
  • 25. Hopsworks API https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
      def start_runner(runner="flink", runner_name="session", runner_config=config)
      def start_jobservice(runner="Resources", artifacts_dir="Resources", job_server_path="hdfs:///user/flink/", job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar", sdk_worker_parallelism=1)
      hops.beam.start_runner()
      hops.beam.start_jobservice()
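A sketch of what the `job_server_path` and `job_server_jar` defaults above imply: the jar's full HopsFS location, and a launch command once it has been copied locally. The flag names (`--flink-master-url`, `--job-port`, `--artifacts-dir`) are our reading of the Beam 2.13-era Flink job server and should be checked against its `--help`; `job_server_command`, the Flink master address and the artifacts directory are hypothetical.

```python
"""Composing the Flink Job Server jar location and launch command (sketch)."""
import posixpath


def job_server_command(flink_master, artifacts_dir, job_port=8099,
                       job_server_path="hdfs:///user/flink/",
                       job_server_jar="beam-runners-flink-1.8-job-server-2.13.0.jar"):
    # Full HopsFS location of the jar, which a helper would first copy locally.
    remote_jar = posixpath.join(job_server_path, job_server_jar)
    # Once localized, the jar is referenced by its file name in the working dir.
    local_jar = posixpath.basename(remote_jar)
    cmd = ["java", "-jar", local_jar,
           f"--flink-master-url={flink_master}",  # Flink session cluster
           f"--job-port={job_port}",              # where the PortableRunner connects
           f"--artifacts-dir={artifacts_dir}"]    # staging dir for pipeline artifacts
    return remote_jar, cmd


remote_jar, cmd = job_server_command("flink-master.example.com:8081",
                                     "hdfs:///Projects/demo/Resources")
print(remote_jar)
print(" ".join(cmd))
```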
  • 26. Logging ● Flink JobManager and TaskManager ● Beam Job Service ○ Local mode - logs in the project’s Jupyter staging dir ○ Cluster mode - logs in the PySpark container where the process is running ● SDK Worker ○ Logs are in the Flink TaskManager container ● Collect and visualize with the ELK stack ○ Logs are accessible only by project members
  • 28. Secure Beam with TLS certificates
  • 30. Hidden Technical Debt in Machine Learning Systems (figure components: Data Collection, Data Validation, Feature Engineering, HyperParameter Tuning, Distributed Training, Hardware Management, Pipeline Management, Model Serving, A/B Testing, Monitoring; at the core: Data, Model, Prediction φ(x)) https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  • 32. TFX on a Flink Cluster with Portable Runner
  • 33. TFX on a Flink Cluster with Portable Runner
  • 34. Distributed Deep Learning in Hopsworks (diagram labels: Driver; Executor 1 ... Executor N; HopsFS (HDFS); TensorBoard; Model Serving)
  • 35. Experiments - TensorBoard ● Repeatable experiments ● Manage experiment metadata ● Integration with TensorBoard
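Repeatability comes from storing everything that determines a run alongside its result. A hypothetical sketch, not the Hopsworks experiments API: the hyperparameters and random seed are saved as JSON metadata, and rerunning from that metadata alone reproduces the metric.

```python
"""Repeatable experiments: record the inputs that determine a run (sketch)."""
import json
import random


def run_experiment(learning_rate, seed):
    rng = random.Random(seed)  # all randomness flows from the recorded seed
    # Stand-in for training: a noisy "loss" that depends on the hyperparameter.
    loss = (learning_rate - 0.01) ** 2 + rng.random() * 1e-3
    return {"params": {"learning_rate": learning_rate, "seed": seed},
            "metrics": {"loss": loss}}


record = run_experiment(learning_rate=0.02, seed=42)
saved = json.dumps(record)  # metadata a tracking service could store

# Re-run from the saved metadata alone.
params = json.loads(saved)["params"]
rerun = run_experiment(**params)
print(rerun["metrics"]["loss"] == record["metrics"]["loss"])
```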
  • 37. Apache Airflow-as-a-Service ● Airflow available as a multi-tenant service in Hopsworks ● Develop pipelines with Hopsworks operators and sensors
  • 39. Apache Airflow-as-a-Service - TFX pipeline
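Airflow orchestrates a TFX pipeline as a DAG of components. The sketch below uses Python's `graphlib` (3.9+) only to compute a valid run order; the component names are standard TFX components, while the dependency map is a simplified version of the usual TFX example pipeline rather than the pipeline shown in the demo.

```python
"""Computing a dependency-respecting run order for TFX components (sketch)."""
from graphlib import TopologicalSorter

# component -> components whose outputs it consumes (simplified)
TFX_DAG = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "ModelValidator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "ModelValidator"},
}

order = list(TopologicalSorter(TFX_DAG).static_order())
print(order)
```

An orchestrator additionally runs independent components (for example Evaluator and ModelValidator) in parallel and retries failures; the topological order only guarantees that no component starts before its inputs exist.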
  • 40. Putting it all together
  • 41. Horizontally Scalable ML Pipelines (diagram labels: Raw Data; Event Data; Ingest; Data Prep; Feature Store / TFX Transform; Experiment / Train; Deploy; Serving; Monitor; logs; HopsFS; Metadata Store; External Model Analysis; Feature Store)
  • 42. Compatibility... ● Hopsworks-1.0 ● Beam 2.13.0 ● Flink 1.8.0 ● TensorFlow 1.14.0 ● TFX 0.13 ● TensorFlow Model Analysis 0.13.2
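A small sketch for checking an environment against this list. The pinned versions are the slide's; the `installed` dict is hypothetical input (for the Python packages it could be filled from `importlib.metadata.version`), and the prefix-matching rule ("0.13" accepts any 0.13.x) is our own choice.

```python
"""Checking installed versions against the tested combination (sketch)."""
TESTED = {
    "hopsworks": "1.0",
    "apache-beam": "2.13.0",
    "flink": "1.8.0",
    "tensorflow": "1.14.0",
    "tfx": "0.13",
    "tensorflow-model-analysis": "0.13.2",
}


def matches(tested, installed):
    # "0.13" accepts any 0.13.x; "1.8.0" requires exactly 1.8.0.
    want = tested.split(".")
    return installed.split(".")[:len(want)] == want


def mismatches(installed):
    # Map each missing or mismatching component to (tested, installed).
    return {name: (want, installed.get(name))
            for name, want in TESTED.items()
            if name not in installed or not matches(want, installed[name])}


env = {"hopsworks": "1.0", "apache-beam": "2.13.0", "flink": "1.8.0",
       "tensorflow": "1.15.0", "tfx": "0.13.0"}
print(mismatches(env))
```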
  • 43. Demo
  • 44. Conclusions & Future Work ● Summary ○ Hopsworks v1.0 is the first on-prem, open source, horizontally scalable platform to support the Beam Portable Runner with the Flink runner ○ Develop and manage the lifecycle of horizontally scalable end-to-end ML pipelines with Beam and TFX ● Future Work ○ Add support for the Spark runner ○ Export metrics for the Flink runner to InfluxDB and visualize them with Grafana
  • 45. Contributors Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan, Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami, Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
  • 46. How to get started with Hopsworks? @hopsworks Register for a free account at: www.hops.site Images available for AWS, GCE, Virtualbox. https://www.logicalclocks.com/ https://github.com/logicalclocks/hopsworks https://www.meetup.com/HopsML-Stockholm Reach us @logicalclocks