Community adoption of Kubernetes (instead of YARN) as a scheduler for Apache Spark has been accelerating since the major improvements in the Spark 3.0 release. Companies choose to run Spark on Kubernetes to use a single cloud-agnostic technology across their entire stack, and to benefit from improved isolation and resource sharing for concurrent workloads. In this talk, the founders of Data Mechanics, a serverless Spark platform powered by Kubernetes, will show how to easily get started with Spark on Kubernetes.
2. Who We Are
Jean-Yves “JY” Stephan
Co-Founder & CEO @ Data Mechanics
jy@datamechanics.co
Previously:
Software Engineer and
Spark Infrastructure Lead @ Databricks
Julien Dumazert
Co-Founder & CTO @ Data Mechanics
julien@datamechanics.co
Previously:
Lead Data Scientist @ ContentSquare
Data Scientist @ BlaBlaCar
3. Who Are You? (Live Poll)
What is your experience with running Spark on Kubernetes?
● I’ve never used it, but I’m curious to learn more about it.
● I’ve prototyped using it, but I’m not using it in production.
● I’m using it in production.
6. Data Mechanics is a serverless Spark platform...
● Autopilot features
○ Fast autoscaling
○ Automated pod and disk sizing
○ Autotuning Spark configuration
● Fully Dockerized
● Priced on Spark task time (instead of wasted server uptime)
7. ... deployed on a k8s cluster in our customers’ cloud account
● Sensitive data does not leave this cloud account. Private clusters are supported.
● Data Mechanics manages the Kubernetes cluster (using EKS, GKE, AKS).
[Architecture diagram: a Kubernetes cluster in the customer’s AWS, GCP, or Azure cloud account. Data scientists (notebooks) and data engineers (scripts, Airflow, or other schedulers) go through the API to the Data Mechanics Gateway, which runs Spark on autoscaling node groups.]
8. How is Data Mechanics different from Spark-on-k8s open-source?
Check our blog post How Data Mechanics Improves On Spark on Kubernetes for more details
An intuitive UI
● Monitor your application logs, configs, and metrics
● Jupyter and Airflow integrations
● Track your costs and performance over time
Dynamic Optimizations
● Automated tuning of VMs, disks, and Spark configs
● Fast autoscaling
● I/O optimizations
● Spot nodes support
A Managed Service
● SSO & private clusters support
● Optimized Spark images for your use case
● No setup, no maintenance. Slack support.
11. Motivations for running Spark on Kubernetes
Full isolation in a shared cost-efficient cluster
● High resource sharing - k8s reallocates resources across concurrent apps in <10s
● Each Spark app has its own Spark version, Python version, and dependencies
A cloud-agnostic infra layer for your entire stack
● A rich ecosystem of tools for your entire stack (logging & monitoring, CI/CD, security)
● Reduce lock-in and deploy everywhere (cloud, on-prem)
● Run non-Spark workloads on the same cluster (Python ETL, ML model serving, etc)
Docker Development Workflow
● Reliable and fast way to package dependencies
● Same environment in local, dev, testing, and prod
● Simple workflow for data scientists and engineers
13. Checklist to get started with Spark-on-Kubernetes
Basic Setup
● Create the cluster, with proper networking, data access, and node pools
● Install the spark-operator and cluster-autoscaler (sketch below)
● Integrate your tools (Airflow, Jupyter, CI/CD, …)
Monitoring
● Save Spark logs to persistent storage
● Collect system metrics (memory, CPU, I/O, …)
● Host the Spark History Server
Optimizations
● 5-10x shuffle performance boost using local SSDs
● Configure spot nodes and handle spot interruptions
● Optimize Spark app configs (pod sizing, bin-packing)
Check our blog post Setting up, Managing & Monitoring Spark on Kubernetes for more details.
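For the spark-operator and cluster-autoscaler installs above, a minimal Helm sketch (release names, namespaces, cluster name, and region are illustrative; check each project's docs for current chart locations):

# spark-operator (Helm chart published by the GoogleCloudPlatform/spark-on-k8s-operator project)
helm repo add spark-operator https://googlecloudplatform.github.io/spark-on-k8s-operator
helm install spark-operator spark-operator/spark-operator \
  --namespace spark-operator --create-namespace

# cluster-autoscaler (example values for AWS EKS; cluster name and region are placeholders)
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1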
14. Set up the Spark History Server (Spark UI)
Do It Yourself (the hard way):
● Write Spark event logs to a persistent storage (using spark.eventLog.dir; sketch below)
● Follow these instructions to install the Spark History Server as a Helm Chart.
Use Our Free Hosted Spark History Server (the easy way):
● Install our open-sourced Spark agent http://github.com/datamechanics/delight
● View the Spark UI at https://datamechanics.co/delight
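A minimal sketch of the event-log configuration for the DIY route (the bucket path is a placeholder, and the matching cloud-storage connector must be on the classpath):

spark-submit \
  --conf spark.eventLog.enabled=true \
  --conf spark.eventLog.dir=s3a://my-bucket/spark-events/ \
  ...
# The Spark History Server then reads the same location via spark.history.fs.logDirectory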
15. Data Mechanics Delight: a free & cross-platform Spark UI
● With new system metrics (memory & CPU) and a better UX
● First milestone is available: Free Hosted Spark History Server
● Second milestone (early 2021): new metrics and data visualizations :)
● Get started at https://datamechanics.co/delight
16. For reliability & cost reduction, you should have different node pools:
● system pods on small on-demand nodes (m5.large)
● Spark driver pods on on-demand nodes (m5.xlarge)
● Spark executor pods on larger spot nodes (r5d.2xlarge)
Multiple node pools that scale down to zero
[Diagram: on-demand m5.large nodes run system pods (spark-operator, ingress controller); on-demand m5.xlarge nodes run Spark driver pods; spot r5d.2xlarge nodes run Spark executor pods.]
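As a rough illustration of the pod sizing / bin-packing from the checklist, sized for one executor per r5d.2xlarge (8 vCPUs, 64 GiB) in the pool above; the values are illustrative and leave headroom for the kubelet and daemonsets:

spark-submit \
  --conf spark.executor.cores=7 \
  --conf spark.kubernetes.executor.request.cores=6.5 \
  --conf spark.executor.memory=44g \
  --conf spark.executor.memoryOverhead=6g \
  ...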
17. Example setup on AWS EKS
● Install the cluster-autoscaler
● Define a labelling scheme for your nodes to select them
● Create auto-scaling groups (ASGs) manually (use the Terraform AWS EKS module)
● Add those labels as ASG tags to inform the cluster-autoscaler

Node label → ASG tag
acme-lifecycle: spot → k8s.io/cluster-autoscaler/node-template/label/acme-lifecycle: spot
acme-instance: r5d.2xlarge → k8s.io/cluster-autoscaler/node-template/label/acme-instance: r5d.2xlarge
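A sketch of attaching one of these tags with the AWS CLI (the ASG name is a placeholder; PropagateAtLaunch lets the cluster-autoscaler see the node label even when the group is scaled to zero):

aws autoscaling create-or-update-tags --tags \
  ResourceId=acme-spot-executors,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/node-template/label/acme-lifecycle,Value=spot,PropagateAtLaunch=true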
18. Using preemptible nodes
We’re all set to schedule pods on preemptible nodes!
● Using vanilla spark-submit (another option is pod templates):
--conf spark.kubernetes.node.selector.acme-lifecycle=spot
● Using the spark-operator:
spec:
  driver:
    nodeSelector:
      acme-lifecycle: preemptible
  executor:
    nodeSelector:
      acme-lifecycle: spot
20. Advantages of the Docker Dev Workflow for Spark
Build & run locally for dev/testing
Build, push & run with prod data on k8s
Control your environment
● Pick your Spark and Python version independently
● Package your complex dependencies in the image
Make Spark more reliable
● Same environment between dev, test, and prod
● No flaky runtime downloads/bootstrap actions
Speed up your iteration cycle
● Docker caches previous layers
● <30 seconds iteration cycle on prod data!
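A minimal sketch of this workflow; the image name, entrypoint paths, and API server address are placeholders, and the exact commands depend on your base image and cluster setup:

# 1) Build the image and iterate locally (e.g. against a data sample)
docker build -t myrepo/spark-app:dev .
docker run --rm myrepo/spark-app:dev \
  /opt/spark/bin/spark-submit --master "local[*]" local:///opt/app/main.py

# 2) Push the same image and run it on the Kubernetes cluster with prod data
docker push myrepo/spark-app:dev
spark-submit \
  --master k8s://https://<api-server> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=myrepo/spark-app:dev \
  local:///opt/app/main.py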
21. Spark & Docker Dev Workflow: Demo Time
What we’ll show
● Package your code and dependencies in a Docker image
● Iterate locally on the image
● Run the same image on Kubernetes
● Optimize performance at scale
The example
● Using the million song dataset (500G) from the Echo Nest
● Create harmonious playlists by comparing soundtracks
Credits to Kerem Turgutlu
25. Spark-on-Kubernetes improvements
February 2018 - Spark 2.3: Initial release
November 2018 - Spark 2.4: Client mode, Volume mounts, Simpler dependency mgt
June 2020 - Spark 3.0: Dynamic allocation, Local code upload
26. Dynamic allocation on Kubernetes
● Plays well with k8s autoscaling
○ Executors spin up in 5 seconds when there is capacity, 1-2 min when a new node must be provisioned
● Available since Spark 3.0 through shuffle tracking
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
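In practice you usually also bound the executor count and tune the scale-down timeouts; a sketch with illustrative values:

# Illustrative values; tune for your workload
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=50
# Scale idle executors down after 60s...
spark.dynamicAllocation.executorIdleTimeout=60s
# ...but keep executors holding shuffle data around for up to 5 minutes
spark.dynamicAllocation.shuffleTracking.timeout=5m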
27. Spark-on-Kubernetes improvements
February 2018 - Spark 2.3: Initial release
November 2018 - Spark 2.4: Client mode, Volume mounts, Simpler dependency mgt
June 2020 - Spark 3.0: Dynamic allocation, Local code upload
December 2020 - Spark 3.1: Spark-on-k8s is GA (“experimental” removed), Better handling of node shutdown
28. Better Handling for Node Shutdown
Copy shuffle and cache data during graceful decommissioning of a node:
1) k8s warns the node of shutdown
2) The driver stops scheduling tasks on it; failed tasks do not count against stage failure
3) Shuffle & cached data is copied to other executors
4) The Spark application continues unimpacted

This will occur:
● During dynamic allocation (downscale)
● Or when a node goes down (e.g. spot interruption)

To handle spot interruptions, you need a node termination handler (daemonset):
● AWS
● GCP
● Azure
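A sketch of the Spark 3.1 decommissioning flags behind this behavior (a young feature at the time of this talk, so treat these as a starting point):

spark.decommission.enabled=true
spark.storage.decommission.enabled=true
spark.storage.decommission.shuffleBlocks.enabled=true
spark.storage.decommission.rddBlocks.enabled=true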
31. Spark-on-Kubernetes improvements
February 2018 - Spark 2.3: Initial release
November 2018 - Spark 2.4: Client mode, Volume mounts, Simpler dependency mgt
June 2020 - Spark 3.0: Dynamic allocation, Local code upload
December 2020 - Spark 3.1: Spark-on-k8s is GA (“experimental” removed), Better handling of node shutdown
TBD: Use remote storage for persisting shuffle data
32. Thank you!
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
Get in touch!
@JYStephan Jean-Yves Stephan
@DumazertJulien Julien Dumazert
@DataMechanics_
www.datamechanics.co