https://www.datamechanics.co
Getting Started with Apache Spark on Kubernetes
Jean-Yves Stephan, Co-Founder & CEO @ Data Mechanics
Julien Dumazert, Co-Founder & CTO @ Data Mechanics
Who We Are
Jean-Yves “JY” Stephan
Co-Founder & CEO @ Data Mechanics
jy@datamechanics.co
Previously: Software Engineer and Spark Infrastructure Lead @ Databricks
Julien Dumazert
Co-Founder & CTO @ Data Mechanics
julien@datamechanics.co
Previously:
Lead Data Scientist @ ContentSquare
Data Scientist @ BlaBlaCar
Who Are You? (Live Poll)
What is your experience with running Spark on Kubernetes?
● I’ve never used it, but I’m curious to learn more about it.
● I’ve prototyped using it, but I’m not using it in production.
● I’m using it in production.
Agenda
What is Data Mechanics?
Why run Spark on Kubernetes?
How to get started?
End-to-end dev workflow (demo)
Future of Spark-on-Kubernetes
Data Mechanics is a serverless Spark platform...
● Autopilot features
○ Fast autoscaling
○ Automated pod and disk sizing
○ Autotuning Spark configuration
● Fully Dockerized
● Priced on Spark task time (instead of wasted server uptime)
... deployed on a k8s cluster in our customers’ cloud account
● Sensitive data does not leave this cloud account. Private clusters are supported.
● Data Mechanics manages the Kubernetes cluster (using EKS, GKE, AKS).
[Diagram: a Kubernetes cluster in the customer's AWS, GCP, or Azure cloud account, running the Data Mechanics Gateway and autoscaling node groups. Data scientists and data engineers reach it through notebooks or the API (from a script, Airflow, or another scheduler).]
How is Data Mechanics different from Spark-on-k8s open-source?
Check our blog post "How Data Mechanics Improves On Spark on Kubernetes" for more details.
An intuitive UI
● Monitor your application logs, configs, and metrics
● Jupyter and Airflow integrations
● Track your costs and performance over time
Dynamic Optimizations
● Automated tuning of VMs, disks, and Spark configs
● Fast autoscaling
● I/O optimizations
● Spot nodes support
A Managed Service
● SSO & private clusters support
● Optimized Spark images for your use case
● No setup, no maintenance. Slack support.
Architecture of Spark-on-Kubernetes
Motivations for running Spark on Kubernetes
Full isolation in a shared, cost-efficient cluster
● High resource sharing: k8s reallocates resources across concurrent apps in under 10 seconds
● Each Spark app has its own Spark version, Python version, and dependencies
A cloud-agnostic infra layer for your entire stack
● A rich ecosystem of tools for your entire stack (logging & monitoring, CI/CD, security)
● Reduce lock-in and deploy everywhere (cloud, on-prem)
● Run non-Spark workloads on the same cluster (Python ETL, ML model serving, etc.)
Docker Development Workflow
● Reliable and fast way to package dependencies
● Same environment in local, dev, testing, and prod
● Simple workflow for data scientists and engineers
Checklist to get started with Spark-on-Kubernetes
Basic Setup
● Create the cluster, with proper networking, data access, and node pools
● Install the spark-operator and cluster-autoscaler
● Integrate your tools (Airflow, Jupyter, CI/CD, …)
Monitoring
● Save Spark logs to persistent storage
● Collect system metrics (memory, CPU, I/O, …)
● Host the Spark History Server
Optimizations
● 5-10x shuffle performance boost using local SSDs
● Configure spot nodes and handle spot interruptions
● Optimize Spark app configs (pod sizing, bin-packing)
Check our blog post "Setting up, Managing & Monitoring Spark on Kubernetes" for more details.
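The local-SSD item in the checklist can be illustrated with a minimal Python sketch that generates spark-submit arguments. It relies on the Spark 3.x convention that Kubernetes volumes whose names start with `spark-local-dir-` are used as scratch space for shuffle and spill files. The helper name `local_ssd_confs`, the host path `/mnt/disks/ssd0`, and the in-pod mount paths are hypothetical, and the sketch assumes the SSD is already formatted and mounted at that path on every node of the pool.

```python
# Minimal sketch (hypothetical helper): use node-local SSDs as Spark scratch space on k8s.
# Volumes named "spark-local-dir-<n>" are used by Spark as local storage for
# shuffle and spill files. Host paths are assumptions: the SSD must already be
# formatted and mounted on every node in the pool.


def local_ssd_confs(host_paths: list[str], role: str = "executor") -> list[str]:
    """Return spark-submit --conf arguments that mount each host path as a
    hostPath volume named spark-local-dir-<n> in the driver or executor pods."""
    if role not in ("driver", "executor"):
        raise ValueError("role must be 'driver' or 'executor'")
    args: list[str] = []
    for i, host_path in enumerate(host_paths, start=1):
        prefix = f"spark.kubernetes.{role}.volumes.hostPath.spark-local-dir-{i}"
        args += [
            "--conf", f"{prefix}.mount.path=/tmp/spark-local-dir-{i}",  # path inside the pod
            "--conf", f"{prefix}.options.path={host_path}",             # path on the node
        ]
    return args


if __name__ == "__main__":
    print(" ".join(local_ssd_confs(["/mnt/disks/ssd0"])))
```

Shuffle-heavy executors benefit most, which is why the default role is `executor`; mounting several SSDs simply yields several `spark-local-dir-<n>` volumes.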
Set up the Spark History Server (Spark UI)
Do It Yourself (the hard way):
● Write Spark event logs to persistent storage (set spark.eventLog.enabled=true and point spark.eventLog.dir at that storage)
● Follow these instructions to install the Spark History Server as a Helm Chart.
Use Our Free Hosted Spark History Server (the easy way):
● Install our open-sourced Spark agent http://github.com/datamechanics/delight
● View the Spark UI at https://datamechanics.co/delight
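For the do-it-yourself route, here is a minimal Python sketch of the spark-submit arguments that turn on event logging. The bucket URI `s3a://my-bucket/spark-events` and the helper name are placeholders; writing to `s3a://` also assumes the Hadoop AWS connector is available in your image. A self-hosted History Server would read the same location through its `spark.history.fs.logDirectory` setting.

```python
# Minimal sketch (hypothetical helper): spark-submit arguments that persist Spark
# event logs, so a History Server can replay the Spark UI after the driver pod is gone.


def event_log_confs(log_dir: str) -> list[str]:
    confs = {
        "spark.eventLog.enabled": "true",  # event logging is disabled by default
        "spark.eventLog.dir": log_dir,     # persistent location, e.g. an object store bucket
    }
    args: list[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Placeholder bucket: replace with your own persistent storage location.
    print(" ".join(event_log_confs("s3a://my-bucket/spark-events")))
```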
Data Mechanics Delight: a free & cross-platform Spark UI
● With new system metrics (memory & CPU) and a better UX
● First milestone is available: Free Hosted Spark History Server
● Second milestone (early 2021): new metrics and data visualizations
● Get started at https://datamechanics.co/delight
Multiple node pools that scale down to zero
For reliability & cost reduction, you should have different node pools:
● System pods on small on-demand nodes (m5.large)
● Spark driver pods on on-demand nodes (m5.xlarge), since losing the driver fails the whole application while Spark can recover from losing an executor
● Spark executor pods on larger spot nodes (r5d.2xlarge)
[Diagram: an on-demand m5.large pool runs the Spark operator and the ingress controller; an on-demand m5.xlarge pool runs the Spark drivers; several spot r5d.2xlarge nodes run the executors.]
Example setup on AWS EKS
● Install the cluster-autoscaler
● Define a labelling scheme for your nodes so that pods can select them
● Create auto-scaling groups (ASGs) manually (use the Terraform AWS EKS module)
● Add those labels as ASG tags so the cluster-autoscaler knows which labels a new node will carry, even when its group is scaled down to zero
Node label                    ASG tag
acme-lifecycle: spot          k8s.io/cluster-autoscaler/node-template/label/acme-lifecycle: spot
acme-instance: r5d.2xlarge    k8s.io/cluster-autoscaler/node-template/label/acme-instance: r5d.2xlarge
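The label-to-tag mapping is mechanical, so it can be generated rather than typed by hand. A minimal Python sketch, assuming your infrastructure code (for example, the Terraform module) accepts a plain dictionary of ASG tags; the helper name is hypothetical, and the example labels reuse the `acme-` scheme from the table.

```python
# Minimal sketch (hypothetical helper): derive cluster-autoscaler node-template
# tags from the labels that nodes in an ASG will carry.

LABEL_TAG_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def asg_tags_for_labels(node_labels: dict[str, str]) -> dict[str, str]:
    """Prefix each node label key so the cluster-autoscaler can predict the labels
    of a node from this ASG, even while the group is scaled to zero."""
    return {LABEL_TAG_PREFIX + key: value for key, value in node_labels.items()}


if __name__ == "__main__":
    executor_pool = {"acme-lifecycle": "spot", "acme-instance": "r5d.2xlarge"}
    for tag, value in asg_tags_for_labels(executor_pool).items():
        print(f"{tag}: {value}")
```

Generating the tags from the same dictionary used for the node labels keeps the two from drifting apart when a pool is added or renamed.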
Using preemptible nodes
We're all set to schedule pods on preemptible (spot) nodes!
● Using vanilla spark-submit (another option is pod templates):
    --conf spark.kubernetes.node.selector.acme-lifecycle=spot
  This selector applies to the driver and the executors alike. To keep the driver on on-demand nodes, use pod templates instead (spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile).
● Using the spark-operator (assuming on-demand nodes are labelled acme-lifecycle: on-demand):
    spec:
      driver:
        nodeSelector:
          acme-lifecycle: on-demand
      executor:
        nodeSelector:
          acme-lifecycle: spot
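Both forms can be generated from the same label dictionaries. A minimal Python sketch with hypothetical helper names; the `acme-lifecycle: on-demand` label for driver nodes is an assumption, since the labelling scheme only defines `acme-lifecycle: spot`. The operator fragment is built as a plain dictionary that you would serialize to YAML inside a SparkApplication manifest.

```python
# Minimal sketch (hypothetical helpers): express node selectors for vanilla
# spark-submit and for the spark-operator's SparkApplication spec.


def node_selector_confs(selector: dict[str, str]) -> list[str]:
    """spark.kubernetes.node.selector.<label> applies to the driver AND executor pods."""
    args: list[str] = []
    for label, value in selector.items():
        args += ["--conf", f"spark.kubernetes.node.selector.{label}={value}"]
    return args


def operator_spec_fragment(driver: dict[str, str], executor: dict[str, str]) -> dict:
    """nodeSelector is a label -> value map, set separately for driver and executors."""
    return {
        "spec": {
            "driver": {"nodeSelector": dict(driver)},
            "executor": {"nodeSelector": dict(executor)},
        }
    }


if __name__ == "__main__":
    print(node_selector_confs({"acme-lifecycle": "spot"}))
    # Assumption: on-demand nodes are labelled acme-lifecycle=on-demand.
    print(operator_spec_fragment({"acme-lifecycle": "on-demand"}, {"acme-lifecycle": "spot"}))
```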
Advantages of the Docker Dev Workflow for Spark
Build & run locally for dev/testing, then build, push & run with prod data on k8s
Control your environment
● Pick your Spark and Python versions independently
● Package your complex dependencies in the image
Make Spark more reliable
● Same environment between dev, test, and prod
● No flaky runtime downloads/bootstrap actions
Speed up your iteration cycle
● Docker caches previous layers
● Iteration cycle under 30 seconds on prod data!
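Running the image on Kubernetes comes down to pointing spark-submit at the cluster and at the image. A minimal Python sketch that assembles the command; the API server URL, image name, application path, and helper name are hypothetical. The `local://` scheme tells Spark that the application file already lives inside the container image, which is what lets the same image move from dev to prod. Depending on your cluster's RBAC setup, you may also need a driver service account (`spark.kubernetes.authenticate.driver.serviceAccountName`).

```python
import shlex

# Minimal sketch (hypothetical helper): build a cluster-mode spark-submit command
# that runs an application baked into a Docker image on Kubernetes.


def k8s_submit_command(api_server: str, image: str, app_path: str,
                       app_name: str, namespace: str = "default") -> list[str]:
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",      # Kubernetes API server URL
        "--deploy-mode", "cluster",             # the driver itself runs in a pod
        "--name", app_name,
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        f"local://{app_path}",                  # file path inside the image
    ]


if __name__ == "__main__":
    cmd = k8s_submit_command(
        api_server="https://k8s.example.com:443",
        image="registry.example.com/playlists:v1",
        app_path="/opt/app/main.py",
        app_name="playlists",
    )
    print(shlex.join(cmd))  # paste into a shell, or run with subprocess.run(cmd, check=True)
```

Only the image tag changes between iterations, so Docker's layer cache keeps rebuild and push times short.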
Spark & Docker Dev Workflow: Demo Time
What we’ll show
● Package your code and dependencies in a Docker image
● Iterate locally on the image
● Run the same image on Kubernetes
● Optimize performance at scale
The example
● Using the Million Song Dataset (500 GB) from The Echo Nest
● Create harmonious playlists by comparing soundtracks
Credits to Kerem Turgutlu
Spark-on-Kubernetes improvements
● February 2018, Spark 2.3: initial release
● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management
● June 2020, Spark 3.0: dynamic allocation, local code upload
Dynamic allocation on Kubernetes
● Plays well with k8s autoscaling
  ○ Executors spin up in 5 seconds when there is capacity, and in 1-2 minutes when a new node must be provisioned
● Available since Spark 3.0 through shuffle tracking: without an external shuffle service, Spark keeps alive the executors that store shuffle data for active jobs
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.shuffleTracking.enabled=true
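These two settings are usually paired with bounds on the executor count, which cap how far the app can scale (and therefore how many nodes the cluster-autoscaler may add for it). A minimal Python sketch; the helper names and the 1-to-20 executor range are illustrative assumptions. With PySpark, the same key/value pairs could instead be passed to `SparkSession.builder.config(key, value)`.

```python
# Minimal sketch (hypothetical helpers): dynamic allocation settings for Spark 3.0+
# on Kubernetes, where no external shuffle service is available.


def dynamic_allocation_confs(min_executors: int, max_executors: int) -> dict[str, str]:
    if not 0 <= min_executors <= max_executors:
        raise ValueError("expected 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Track shuffle files so executors holding shuffle data for active jobs stay alive.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(confs: dict[str, str]) -> list[str]:
    """Flatten a conf dict into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(dynamic_allocation_confs(1, 20))))
```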
Better Handling for Node Shutdown
Copy shuffle and cache data during graceful decommissioning of a node
[Diagram: a driver and executors spread across several nodes, one of which is being shut down.]
1) k8s warns the node of the shutdown.
2) The driver stops scheduling tasks on it. Failed tasks do not count against stage failure.
3) Shuffle & cached data is copied to other executors.
4) The Spark application continues unimpacted.
This will occur:
● During dynamic allocation (downscale)
● Or when a node goes down (e.g. spot interruption)
To handle spot interruptions, you need a node termination handler (DaemonSet):
● AWS
● GCP
● Azure
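Graceful decommissioning is opt-in. A minimal Python sketch of the Spark 3.1 settings that enable it, along with migration of shuffle blocks and cached (RDD) blocks; the helper name is hypothetical, and the two migration flags can be toggled independently.

```python
# Minimal sketch (hypothetical helper): Spark 3.1+ settings for graceful executor
# decommissioning with migration of shuffle and cached blocks.


def decommission_confs(migrate_shuffle: bool = True, migrate_cache: bool = True) -> dict[str, str]:
    return {
        "spark.decommission.enabled": "true",          # try to shut executors down gracefully
        "spark.storage.decommission.enabled": "true",  # migrate blocks off decommissioning executors
        "spark.storage.decommission.shuffleBlocks.enabled": str(migrate_shuffle).lower(),
        "spark.storage.decommission.rddBlocks.enabled": str(migrate_cache).lower(),
    }


if __name__ == "__main__":
    for key, value in decommission_confs().items():
        print(f"--conf {key}={value}")
```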
Spark-on-Kubernetes improvements
● February 2018, Spark 2.3: initial release
● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management
● June 2020, Spark 3.0: dynamic allocation, local code upload
● December 2020, Spark 3.1: Spark-on-k8s is GA ("experimental" removed), better handling of node shutdown
● TBD: use remote storage for persisting shuffle data
Thank you!
Your feedback is important to us.
Don’t forget to rate
and review the sessions.
Get in touch!
@JYStephan Jean-Yves Stephan
@DumazertJulien Julien Dumazert
@DataMechanics_
www.datamechanics.co
Nirala Nagar / Cheap Call Girls In Lucknow Phone No 9548273370 Elite Escort S...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Getting Started with Apache Spark on Kubernetes

  • 1. https://www.datamechanics.co Getting Started with Apache Spark on Kubernetes Jean-Yves Stephan, Co-Founder & CEO @ Data Mechanics Julien Dumazert, Co-Founder & CTO @ Data Mechanics www.datamechanics.co
  • 2. Who We Are ● Jean-Yves “JY” Stephan, Co-Founder & CEO @ Data Mechanics (jy@datamechanics.co). Previously: Software Engineer and Spark Infrastructure Lead @ Databricks ● Julien Dumazert, Co-Founder & CTO @ Data Mechanics (julien@datamechanics.co). Previously: Lead Data Scientist @ ContentSquare; Data Scientist @ BlaBlaCar
  • 3. Who Are You? (Live Poll) What is your experience with running Spark on Kubernetes? ● I’ve never used it, but I’m curious to learn more about it. ● I’ve prototyped using it, but I’m not using it in production. ● I’m using it in production.
  • 4. Agenda ● What is Data Mechanics? ● Why run Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 5. Agenda ● What is Data Mechanics? ● Why run Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 6. Data Mechanics is a serverless Spark platform... ● Autopilot features ○ Fast autoscaling ○ Automated pod and disk sizing ○ Autotuning of Spark configuration ● Fully Dockerized ● Priced on Spark task time (instead of wasted server uptime)
  • 7. ... deployed on a k8s cluster in our customers’ cloud account ● Sensitive data does not leave this cloud account. Private clusters are supported. ● Data Mechanics manages the Kubernetes cluster (using EKS, GKE, or AKS). (Diagram: data scientists and data engineers reach the Data Mechanics Gateway through Notebooks or the API, from a script, Airflow, or another scheduler; the gateway and the autoscaling node groups run in a Kubernetes cluster in the customer’s AWS, GCP, or Azure cloud account.)
  • 8. How is Data Mechanics different from Spark-on-k8s open-source?
    An intuitive UI
    ● Monitor your application logs, configs, and metrics
    ● Jupyter and Airflow integrations
    ● Track your costs and performance over time
    Dynamic Optimizations
    ● Automated tuning of VMs, disks, and Spark configs
    ● Fast autoscaling
    ● I/O optimizations
    ● Spot nodes support
    A Managed Service
    ● SSO & private clusters support
    ● Optimized Spark images for your use case
    ● No setup, no maintenance. Slack support.
    Check our blog post "How Data Mechanics Improves On Spark on Kubernetes" for more details.
  • 9. Agenda ● What is Data Mechanics? ● Why run Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 11. Motivations for running Spark on Kubernetes
    Full isolation in a shared cost-efficient cluster
    ● High resource sharing: k8s reallocates resources across concurrent apps in <10s
    ● Each Spark app has its own Spark version, Python version, and dependencies
    A cloud-agnostic infra layer for your entire stack
    ● A rich ecosystem of tools for your entire stack (logging & monitoring, CI/CD, security)
    ● Reduce lock-in and deploy everywhere (cloud, on-prem)
    ● Run non-Spark workloads on the same cluster (Python ETL, ML model serving, etc.)
    Docker Development Workflow
    ● Reliable and fast way to package dependencies
    ● Same environment in local, dev, testing, and prod
    ● Simple workflow for data scientists and engineers
  • 12. Agenda ● What is Data Mechanics? ● Why choose Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 13. Checklist to get started with Spark-on-Kubernetes
    Basic Setup
    ● Create the cluster, with proper networking, data access, and node pools
    ● Install the spark-operator and cluster-autoscaler
    ● Integrate your tools (Airflow, Jupyter, CI/CD, …)
    Monitoring
    ● Save Spark logs to persistent storage
    ● Collect system metrics (memory, CPU, I/O, …)
    ● Host the Spark History Server
    Optimizations
    ● 5-10x shuffle performance boost using local SSDs
    ● Configure spot nodes and handle spot interruptions
    ● Optimize Spark app configs (pod sizing, bin-packing)
    Check our blog post "Setting up, Managing & Monitoring Spark on Kubernetes" for more details.
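The local-SSD item in this checklist can be expressed purely as Spark configuration. A minimal sketch, assuming Spark 3.x (which uses Kubernetes volumes whose name starts with `spark-local-dir-` as scratch space for shuffle files and spills) and assuming the node's NVMe SSD is already formatted and mounted on the host at `/mnt/disks/ssd0`; that host path and the in-container mount path are hypothetical and depend on how your nodes are bootstrapped.

```python
from typing import Dict, List


def local_ssd_confs(host_path: str,
                    mount_path: str = "/tmp/spark-local-dir-1",
                    volume_name: str = "spark-local-dir-1") -> Dict[str, str]:
    """Spark confs that mount a node's local SSD into every executor pod.

    Volumes whose name starts with "spark-local-dir-" are used by Spark on
    Kubernetes as scratch space for shuffle files and spills.
    """
    if not volume_name.startswith("spark-local-dir-"):
        raise ValueError("volume name must start with 'spark-local-dir-'")
    prefix = f"spark.kubernetes.executor.volumes.hostPath.{volume_name}"
    return {
        f"{prefix}.mount.path": mount_path,   # path inside the executor container
        f"{prefix}.options.path": host_path,  # path of the SSD mount on the node
    }


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Flatten a conf dict into repeated `--conf key=value` arguments."""
    args: List[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical host path: where your node bootstrap mounted the NVMe SSD.
confs = local_ssd_confs("/mnt/disks/ssd0")
print(" ".join(to_submit_args(confs)))
```

The name check is there because a volume without the `spark-local-dir-` prefix is still mounted, but Spark will not use it for shuffle data, so the performance gain silently disappears.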
  • 14. Set up the Spark History Server (Spark UI)
    Do it yourself (the hard way):
    ● Write Spark event logs to persistent storage (set spark.eventLog.enabled=true and point spark.eventLog.dir at that storage)
    ● Follow these instructions to install the Spark History Server as a Helm chart.
    Use our free hosted Spark History Server (the easy way):
    ● Install our open-sourced Spark agent http://github.com/datamechanics/delight
    ● View the Spark UI at https://datamechanics.co/delight
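For the do-it-yourself route, the applications and the History Server must point at the same directory. A minimal sketch, assuming an S3 bucket reached through the `s3a://` connector; the bucket name is hypothetical. `spark.eventLog.enabled`, `spark.eventLog.dir` and `spark.history.fs.logDirectory` are standard Spark properties.

```python
from typing import Dict, Tuple


def event_log_confs(log_dir: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (application confs, History Server confs) sharing one log dir."""
    if "://" not in log_dir:
        # Reject bare paths: a directory inside the pod disappears with the pod.
        raise ValueError("use a storage URI, e.g. s3a://... or gs://...")
    app_confs = {
        "spark.eventLog.enabled": "true",  # event logging is off by default
        "spark.eventLog.dir": log_dir,     # where each app writes its events
    }
    history_confs = {
        "spark.history.fs.logDirectory": log_dir,  # where the server reads them
    }
    return app_confs, history_confs


# Hypothetical bucket; replace with your own persistent storage location.
app, history = event_log_confs("s3a://my-spark-logs/spark-events")
for key, value in app.items():
    print(f"--conf {key}={value}")
```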
  • 15. Data Mechanics Delight: a free & cross-platform Spark UI ● With new system metrics (memory & CPU) and a better UX ● First milestone is available: a free hosted Spark History Server ● Second milestone (early 2021): new metrics and data visualizations ● Get started at https://datamechanics.co/delight
  • 16. Multiple node pools that scale down to zero
    For reliability & cost reduction, you should have different node pools:
    ● System pods on small on-demand nodes (m5.large)
    ● Spark driver pods on on-demand nodes (m5.xlarge)
    ● Spark executor pods on larger spot nodes (r5d.2xlarge)
    (Diagram: an on-demand m5.large node runs the spark-operator and the ingress controller; an on-demand m5.xlarge node runs two drivers; three spot r5d.2xlarge nodes each run an executor.)
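Choosing executor sizes that pack well onto these node types (the "pod sizing, bin-packing" item from the checklist) is mostly arithmetic. A minimal sketch, assuming a JVM job (Spark's default memory overhead of 10% of executor memory, with a 384 MiB floor) and assuming roughly 7.5 allocatable vCPUs and 57 GiB allocatable memory on an r5d.2xlarge (8 vCPUs, 64 GiB) once kubelet and DaemonSet reservations are subtracted. The allocatable figures are illustrative only; read the real ones from `kubectl describe node`.

```python
MIN_OVERHEAD_MIB = 384   # Spark's floor for executor memory overhead
OVERHEAD_FACTOR = 0.10   # default factor for JVM jobs (assumption: not PySpark)


def executor_pod_memory_mib(executor_memory_mib: int) -> int:
    """Memory Kubernetes must reserve for one executor pod (heap + overhead)."""
    overhead = max(int(OVERHEAD_FACTOR * executor_memory_mib), MIN_OVERHEAD_MIB)
    return executor_memory_mib + overhead


def executors_per_node(alloc_millicpu: int, alloc_memory_mib: int,
                       request_millicpu: int, executor_memory_mib: int) -> int:
    """How many executor pods fit on one node, limited by CPU or by memory."""
    by_cpu = alloc_millicpu // request_millicpu
    by_mem = alloc_memory_mib // executor_pod_memory_mib(executor_memory_mib)
    return min(by_cpu, by_mem)


# Illustrative allocatable capacity of an r5d.2xlarge (assumed, not measured).
fit = executors_per_node(alloc_millicpu=7500, alloc_memory_mib=57 * 1024,
                         request_millicpu=3500, executor_memory_mib=24 * 1024)
print(fit)  # 2
```

The 3.5-core request is what lets two executors share a node: Spark's `spark.kubernetes.executor.request.cores` (e.g. `3500m`) sets the pod's CPU request while `spark.executor.cores=4` still controls task parallelism. Requesting a full 4 cores against 7.5 allocatable would leave room for only one executor per node.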
  • 17. Example setup on AWS EKS
    ● Install the cluster-autoscaler
    ● Define a labelling scheme for your nodes to select them
    ● Create auto-scaling groups (ASGs) manually (use the Terraform AWS EKS module)
    ● Add those labels as ASG tags to inform the cluster-autoscaler
    Node label                  | ASG tag
    acme-lifecycle: spot        | k8s.io/cluster-autoscaler/node-template/label/acme-lifecycle: spot
    acme-instance: r5d.2xlarge  | k8s.io/cluster-autoscaler/node-template/label/acme-instance: r5d.2xlarge
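The label-to-tag convention on this slide is mechanical, so the tags can be generated from the labelling scheme instead of being written by hand. A minimal sketch; the `acme-*` label keys come from the slide, and the snippet only covers the node-template label tags, not any other tags your ASGs may need.

```python
from typing import Dict

TEMPLATE_LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def asg_tags_for_labels(node_labels: Dict[str, str]) -> Dict[str, str]:
    """Map node labels to the ASG tags that tell the cluster-autoscaler which
    labels a node from this (possibly empty) group would carry."""
    return {TEMPLATE_LABEL_PREFIX + key: value for key, value in node_labels.items()}


spot_executor_pool = {"acme-lifecycle": "spot", "acme-instance": "r5d.2xlarge"}
tags = asg_tags_for_labels(spot_executor_pool)
for tag, value in tags.items():
    print(f"{tag}: {value}")
```

The tags matter most for node groups scaled down to zero: with no live node to inspect, the cluster-autoscaler can only learn a group's labels from them, and therefore only knows from them that a pending pod selecting `acme-lifecycle: spot` can be satisfied by scaling that group up.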
  • 18. Using preemptible nodes
    We’re all set to schedule pods on preemptible (spot) nodes!
    ● Using vanilla spark-submit (another option is pod templates); this selector applies to the driver and the executors alike:
        --conf spark.kubernetes.node.selector.acme-lifecycle=spot
    ● Using the spark-operator, where the driver and the executors can target different node pools:
        spec:
          driver:
            nodeSelector:
              acme-lifecycle: on-demand  # assumes on-demand nodes carry this label value
          executor:
            nodeSelector:
              acme-lifecycle: spot
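Both selector styles from this slide can be generated from the same labelling scheme. A minimal sketch: it builds only the `driver`/`executor` `nodeSelector` fragment of a spark-operator `SparkApplication` spec (the other required fields such as image and main file are left out), plus the equivalent vanilla spark-submit properties. The label value `on-demand` for the driver pool is a hypothetical choice; the slides only define `acme-lifecycle: spot`.

```python
import json
from typing import Dict


def node_selector_spec(driver_labels: Dict[str, str],
                       executor_labels: Dict[str, str]) -> dict:
    """Build the driver/executor nodeSelector fragment of a SparkApplication."""
    return {
        "spec": {
            "driver": {"nodeSelector": dict(driver_labels)},
            "executor": {"nodeSelector": dict(executor_labels)},
        }
    }


def node_selector_confs(labels: Dict[str, str]) -> Dict[str, str]:
    """spark-submit equivalent; these selectors apply to driver AND executors."""
    return {f"spark.kubernetes.node.selector.{k}": v for k, v in labels.items()}


# "on-demand" is a hypothetical label value for the on-demand driver pool.
fragment = node_selector_spec({"acme-lifecycle": "on-demand"},
                              {"acme-lifecycle": "spot"})
# JSON is valid YAML, so this fragment can be merged into the manifest.
print(json.dumps(fragment, indent=2))
```

Note that `nodeSelector` is a map of label keys to values, not a list of `key=value` strings.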
  • 19. Agenda ● What is Data Mechanics? ● Why choose Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 20. Advantages of the Docker Dev Workflow for Spark
    (Diagram: build & run locally for dev/testing; build, push & run with prod data on k8s.)
    Control your environment
    ● Pick your Spark and Python versions independently
    ● Package your complex dependencies in the image
    Make Spark more reliable
    ● Same environment between dev, test, and prod
    ● No flaky runtime downloads/bootstrap actions
    Speed up your iteration cycle
    ● Docker caches previous layers
    ● <30 seconds iteration cycle on prod data!
  • 21. Spark & Docker Dev Workflow: Demo Time
    What we’ll show
    ● Package your code and dependencies in a Docker image
    ● Iterate locally on the image
    ● Run the same image on Kubernetes
    ● Optimize performance at scale
    The example
    ● Using the Million Song Dataset (500 GB) from the Echo Nest
    ● Create harmonious playlists by comparing soundtracks
    Credits to Kerem Turgutlu
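The "same image locally and on Kubernetes" loop boils down to two spark-submit invocations that differ only in where the driver and executors run. A minimal sketch; the image name, API server address, namespace, and the `/opt/app/main.py` path inside the image are all hypothetical, and the local variant is assumed to run inside the container (e.g. through `docker run`).

```python
from typing import List, Optional


def spark_submit_argv(app_path: str, image: str,
                      api_server: Optional[str] = None,
                      namespace: str = "default",
                      executors: int = 2) -> List[str]:
    """Build spark-submit argv: local mode if api_server is None, else k8s.

    `image` is only used for the Kubernetes run; locally we are already in it.
    """
    if api_server is None:
        # Inside the container: driver and executors share one local JVM.
        return ["spark-submit", "--master", "local[*]", app_path]
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.kubernetes.namespace={namespace}",
        "--conf", f"spark.executor.instances={executors}",
        # local:// means the file is already inside the image, not uploaded.
        f"local://{app_path}",
    ]


IMAGE = "registry.example.com/playlists:dev"  # hypothetical image tag
APP = "/opt/app/main.py"                      # hypothetical path in the image
print(" ".join(spark_submit_argv(APP, IMAGE)))
print(" ".join(spark_submit_argv(APP, IMAGE, api_server="https://k8s.example.com:443")))
```

Because the code and dependencies live in the image, the only thing that changes between a local test and a run on production data is the master URL and the image reference, which is what keeps the iteration cycle short.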
  • 22. Agenda ● What is Data Mechanics? ● Why choose Spark on Kubernetes? ● How to get started? ● End-to-end dev workflow (demo) ● Future of Spark-on-Kubernetes
  • 23. Spark-on-Kubernetes improvements ● February 2018, Spark 2.3: initial release
  • 24. Spark-on-Kubernetes improvements ● February 2018, Spark 2.3: initial release ● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management
  • 25. Spark-on-Kubernetes improvements ● February 2018, Spark 2.3: initial release ● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management ● June 2020, Spark 3.0: dynamic allocation, local code upload
  • 26. Dynamic allocation on Kubernetes
    ● Plays well with k8s autoscaling
      ○ Executors spin up in 5 seconds when there is capacity, 1-2 min when a new node must be provisioned
    ● Available since Spark 3.0 through shuffle tracking:
        spark.dynamicAllocation.enabled=true
        spark.dynamicAllocation.shuffleTracking.enabled=true
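The two properties on this slide are the required part; in practice you usually also bound the number of executors. A minimal sketch: `spark.dynamicAllocation.minExecutors` and `spark.dynamicAllocation.maxExecutors` are standard Spark properties, but the default values used here are illustrative, not recommendations.

```python
from typing import Dict


def dynamic_allocation_confs(min_executors: int = 0,
                             max_executors: int = 20) -> Dict[str, str]:
    """Spark 3.0+ dynamic allocation on Kubernetes (no external shuffle service)."""
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Keep executors that still hold shuffle files needed by active jobs.
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


for key, value in dynamic_allocation_confs(min_executors=1, max_executors=10).items():
    print(f"--conf {key}={value}")
```

The upper bound caps how many executor pods a single job can request, and therefore how many nodes the cluster-autoscaler may add on its behalf.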
  • 27. Spark-on-Kubernetes improvements ● February 2018, Spark 2.3: initial release ● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management ● June 2020, Spark 3.0: dynamic allocation, local code upload ● December 2020, Spark 3.1: Spark-on-k8s is GA (“experimental” label removed), better handling of node shutdown
  • 28. Better Handling for Node Shutdown: copy shuffle and cache data during graceful decommissioning of a node. (Diagram: a driver and executors spread across several nodes.) 1) k8s warns a node of its shutdown. This will occur: ● During dynamic allocation (downscale) ● Or when a node goes down (e.g. spot interruption). To handle spot interruptions, you need a node termination handler (a DaemonSet; handlers exist for AWS, GCP, and Azure).
  • 29. Better Handling for Node Shutdown: copy shuffle and cache data during graceful decommissioning of a node. 1) k8s warns a node of its shutdown. 2) The driver stops scheduling tasks on that node; tasks that fail there do not count toward stage failure. 3) Shuffle and cached data are copied to other executors. This will occur: ● During dynamic allocation (downscale) ● Or when a node goes down (e.g. spot interruption). To handle spot interruptions, you need a node termination handler (a DaemonSet; handlers exist for AWS, GCP, and Azure).
  • 30. Better Handling for Node Shutdown: copy shuffle and cache data during graceful decommissioning of a node. (Diagram: the decommissioned node is gone; the driver and the remaining executor run on the other nodes.) 4) The Spark application continues unimpacted. This will occur: ● During dynamic allocation (downscale) ● Or when a node goes down (e.g. spot interruption). To handle spot interruptions, you need a node termination handler (a DaemonSet; handlers exist for AWS, GCP, and Azure).
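In Spark 3.1 the behaviour described in these slides is opt-in. A minimal sketch of the relevant properties as we understand them from the Spark 3.1 configuration reference; verify them against the documentation for your exact Spark version, and keep in mind that the node termination handler is still a separate, cluster-side install.

```python
from typing import Dict

DECOMMISSION_CONFS = {
    # Let executors shut down gracefully when warned, instead of dying abruptly.
    "spark.decommission.enabled": "true",
    # Migrate blocks off the decommissioning executor...
    "spark.storage.decommission.enabled": "true",
    # ...including shuffle files...
    "spark.storage.decommission.shuffleBlocks.enabled": "true",
    # ...and cached RDD/DataFrame blocks.
    "spark.storage.decommission.rddBlocks.enabled": "true",
}


def with_decommissioning(base: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `base` with graceful decommissioning switched on."""
    merged = dict(base)
    merged.update(DECOMMISSION_CONFS)
    return merged


confs = with_decommissioning({"spark.executor.instances": "4"})
for key, value in confs.items():
    print(f"--conf {key}={value}")
```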
  • 31. Spark-on-Kubernetes improvements ● February 2018, Spark 2.3: initial release ● November 2018, Spark 2.4: client mode, volume mounts, simpler dependency management ● June 2020, Spark 3.0: dynamic allocation, local code upload ● December 2020, Spark 3.1: Spark-on-k8s is GA (“experimental” label removed), better handling of node shutdown ● TBD: use remote storage for persisting shuffle data
  • 32. Thank you! Your feedback is important to us. Don’t forget to rate and review the sessions. Get in touch! ● Jean-Yves Stephan (@JYStephan) ● Julien Dumazert (@DumazertJulien) ● Data Mechanics (@DataMechanics_), www.datamechanics.co