End-to-End ML pipelines with Beam, Flink, TensorFlow and Hopsworks.
1. End-to-End ML pipelines with Beam, Flink, TensorFlow, and Hopsworks
Theofilos Kakantousis
Software Engineer & COO
@theofiloskak
3rd Apache Beam meetup, Stockholm, July 2019
2. Agenda
1. End-to-end ML pipelines
2. What is Hopsworks
3. Beam Portable Runner with Flink in Hopsworks
4. ML Pipelines with Beam and TensorFlow Extended
5. Demo
6. Hopsworks Timeline
“If you’re working with big data and Hadoop, this one paper could repay your
investment in the Morning Paper many times over.... HopsFS is a huge win.”
- Adrian Colyer, The Morning Paper
Timeline (2017, 2018, 2019):
● World’s first Hadoop platform to support GPUs-as-a-Resource
● World’s fastest Hadoop, published at USENIX FAST with Oracle and Spotify
● World’s first open source Feature Store for machine learning
● World’s first distributed filesystem to store small files in metadata on NVMe disks
● Winner of the IEEE Scale Challenge 2017 with HopsFS (1.2m ops/sec)
● World’s most scalable filesystem with multi data center availability
● World’s first open source platform to support TensorFlow Extended (TFX) on Beam
10. Hopsworks REST API
● Manage Hopsworks resources via the REST API
○ Projects
○ Datasets
○ Jobs
○ FeatureStore
○ Experiments
○ ModelServing
○ Kafka
○ ...
● Documented with Swagger and hosted on SwaggerHub
○ https://app.swaggerhub.com/apis-docs/logicalclocks/hopsworks-api/0.10.0
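As a rough illustration of managing a resource through the REST API, the sketch below builds (but does not send) a request that would list a project's jobs. The host name, the `/project/<id>/jobs` path layout, the project id, and the bearer-token header are assumptions made for illustration; the Swagger documentation linked above is the authority for the actual paths and authentication scheme.

```python
from urllib.parse import quote
from urllib.request import Request


def build_jobs_request(base_url, project_id, token):
    """Build (but do not send) a GET request for a project's jobs.

    The path layout and the Bearer header are assumptions for this sketch;
    check the Swagger documentation for the real endpoint and auth scheme.
    """
    url = f"{base_url.rstrip('/')}/project/{quote(str(project_id))}/jobs"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Hypothetical host, project id, and token placeholder.
req = build_jobs_request(
    "https://hopsworks.example.com/hopsworks-api/api", 119, "<token>"
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out on purpose, since it needs a reachable cluster and valid credentials.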
12. Beam Portable Runner
[Diagram: Beam portability layers. Beam Model: Pipeline Construction (Beam Java, Beam Python, other languages). Beam Model: Fn Runners (Apache Flink, Apache Spark, Cloud Dataflow), each with its own execution layer.]
1. End users: who want to write pipelines in a language that’s familiar.
2. SDK writers: who want to make Beam concepts available in new languages.
3. Runner writers: who have a distributed processing environment and want to support Beam pipelines.
https://s.apache.org/apache-beam-project-overview
13. Beam-as-a-Service in Hopsworks
● Develop Beam pipelines in Python from Jupyter notebooks
● Tooling to simplify deployment and execution
● Manage lifecycle of Beam Portability JobService (JobServer)
● Logging and monitoring of Beam jobs
● SDK Workers (harness) with conda env
● Scalable execution on Flink/Spark clusters
14. Hopsworks API
● hops-util-py (Python) and HopsUtil (Java)
● Simplifies development:
○ Sets security config
○ Discovers cluster services
○ Helper methods for the Hopsworks REST API
○ ML Experiments
● Manage Beam Runners and Job Service
https://github.com/logicalclocks/hops-util-py/, https://github.com/logicalclocks/hops-util
15. Beam Portability - Process vs Docker
● Docker:
○ Build image with all your dependencies
○ Update or modify? Build new containers
○ Additional infrastructure components
● Process:
○ Install dependencies on all servers
○ Management of dependencies?
○ Easy to update and modify libraries
○ Challenge? Multi-tenancy & keeping servers in sync
● SDK Worker: SDK-provided program responsible for executing user code
● How to manage the user’s dependencies, libraries, … ?
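In Beam's Python SDK, the choice between the two modes is expressed through the `--environment_type` and `--environment_config` pipeline options. The sketch below only assembles those flags as strings so it runs without Beam installed; the helper name `sdk_worker_flags`, the image name, and the boot path are hypothetical and not taken from the slides.

```python
import json


def sdk_worker_flags(mode, config):
    """Return Beam pipeline flags selecting how the SDK worker is started.

    mode:   "docker" or "process"
    config: container image name (docker) or path to the worker boot
            executable (process)
    """
    if mode == "docker":
        # The runner starts the SDK worker inside this container image.
        return ["--environment_type=DOCKER", f"--environment_config={config}"]
    if mode == "process":
        # The runner starts the SDK worker as a local process on each node.
        return [
            "--environment_type=PROCESS",
            "--environment_config=" + json.dumps({"command": config}),
        ]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical image name and boot path, for illustration only.
docker_flags = sdk_worker_flags("docker", "example/beam-python-sdk:latest")
process_flags = sdk_worker_flags("process", "/srv/beam/boot")
print(docker_flags)
print(process_flags)
```

In process mode the boot executable must exist on every worker node, which is the "keep servers in sync" challenge listed above.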
16. First class Python: Conda in the Cluster
[Diagram: Conda Repo and Hopsworks Cluster]
No need to write Dockerfiles
18. Jupyter dashboard in Hopsworks
● Execute a Beam Python pipeline
● With the Python kernel either in a Docker container managed by Kubernetes or as a local Python process.
● In a PySpark executor in the cluster.
22. Beam portability architecture in Hopsworks
[Diagram components: hops-util.py, Local/YARN/K8s, Hopsworks, Session cluster on YARN, HopsFS. Annotation: compiled and shipped with HopsFS dependencies.]
23. Beam portability architecture in Hopsworks
[Diagram components: hops-util.py, Local/YARN/K8s, Hopsworks, Session cluster on YARN, HopsFS]
# Creates and starts the runner
# Localizes the Job Service jar file from HopsFS
# Provides arguments (ports, artifacts_dir, etc.)
# Starts the Job Service and returns host,port
# Job Service automatically shuts down when the Python pipeline shuts down
host, port = start_runner()
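A minimal sketch of how the returned host and port might be passed to a Beam Python pipeline follows. `--runner=PortableRunner` and `--job_endpoint` are standard Beam pipeline options; the `start_runner` defined here is only a local stand-in for the hops-util-py helper so the snippet runs on its own, and its return values are placeholders.

```python
def start_runner():
    """Stand-in for the hops-util-py helper named on the slide.

    The real helper starts the Job Service; this stub only returns
    placeholder values so the sketch is self-contained.
    """
    return "localhost", 8099


def portable_pipeline_args(host, port):
    """Build the Beam flags that point a pipeline at a running Job Service."""
    return ["--runner=PortableRunner", f"--job_endpoint={host}:{port}"]


host, port = start_runner()
args = portable_pipeline_args(host, port)
print(args)

# With Apache Beam installed, the flags could then be used as:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   import apache_beam as beam
#   with beam.Pipeline(options=PipelineOptions(args)) as p:
#       ...
```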
24. Beam portability architecture in Hopsworks
[Diagram components: hops-util.py, Local/YARN/K8s, Hopsworks, Session cluster on YARN, HopsFS]
Python conda env and Hopsworks env variables are set for SDKWorker
26. Logging
● Flink JobManager and TaskManager
● Beam Job service
○ Local mode - logs in project’s Jupyter staging dir
○ Cluster - logs in the PySpark container where process is running.
● SDK Worker
○ Logs are in the Flink TaskManager container
● Collect and visualize with the ELK stack
○ Logs are accessible only by project members
30. Hidden Technical Debt in Machine Learning Systems
[Diagram: components of an ML system: Data Collection, Feature Engineering, Data Validation, Hardware Management, Distributed Training, HyperParameter Tuning, Pipeline Management, Model Serving, A/B Testing, Monitoring. Center labels: Data, Model, Prediction, φ(x).]
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
41. Horizontally Scalable ML Pipelines
[Diagram: pipeline stages Ingest, Data Prep, Experiment / Train, Deploy, Serving, Monitor. Inputs: Raw Data, Event Data. Other labels: HopsFS, Feature Store / TFX Transform, FeatureStore, Metadata Store, Model Analysis, External, logs.]
44. Conclusions & Future Work
● Summary
○ Hopsworks v1.0 is the first on-prem, open source, horizontally scalable platform to support the Beam Portable Runner with the Flink runner
○ Develop and manage the lifecycle of horizontally scalable End-to-End ML pipelines with Beam and TFX
● Future Work
○ Add support for Spark Runner
○ Export metrics for Flink runner to InfluxDB and visualize with Grafana
45. Contributors
Jim Dowling, Seif Haridi, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Ermias
Gebremeskel, Fabio Buso, Antonios Kouzoupis, Kim Hammar, Steffen Grohsschmiedt, Alex Ormenisan,
Robin Andersson, Moritz Meister, Kajetan Maliszewski, Netsanet Gebretsadkan Kidane, Sina Sheikholeslami,
Joel Stenkvist, August Bonds, Vasileios Giannokostas, Johan Svedlund Nordström, Rizvi Hasan, Paul Mälzer,
Bram Leenders, Juan Roca, Misganu Dessalegn, K “Sri” Srijeyanthan, Jude D’Souza, Alberto Lorente, Andre
Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Qi Qi, ...
46. How to get started with Hopsworks?
@hopsworks
Register for a free account at: www.hops.site
Images available for AWS, GCE, VirtualBox.
https://www.logicalclocks.com/
https://github.com/logicalclocks/hopsworks
https://www.meetup.com/HopsML-Stockholm
Reach us
@logicalclocks