SlideShare uma empresa Scribd logo
1 de 21
Baixar para ler offline
Distributed Systems from
Scratch - Part 2
Handling third party libraries
https://github.com/phatak-dev/distributedsystems
● Madhukara Phatak
● Big data consultant and
trainer at datamantra.io
● Consult in Hadoop, Spark
and Scala
● www.madhukaraphatak.com
Agenda
● Idea
● Motivation
● Architecture of existing big data system
● Function abstraction
● Third party libraries
● Implementing third party libraries
● MySQL task
● Code example
Idea
“What it takes to build a
distributed processing system
like Spark?”
Motivation
● First version of Spark only had 1600 lines of Scala code
● Had all basic pieces of RDD and ability to run
distributed system using Mesos
● Recreating the same code with step by step
understanding
● Ample of time in hand
Distributed systems from 30000ft
Distributed Storage(HDFS/S3)
Distributed Cluster management
(YARN/Mesos)
Distributed Processing Systems
(Spark/MapReduce)
Data Applications
Our distributed system
Mesos
Scala function based abstraction
Scala functions to express logic
Function abstraction
● The whole spark API can be summarized a scala
function which can represented as follow
() => T
● This scala function can be parallelized and sent over
network to run on multiple systems using mesos
● The function is represented as a task inside the
framework
● FunctionTask.scala
Spark API as distributed function
● Initial API of the spark revolved around scala function
abstraction for processing as with RDD for data
abstraction
● Every API like map, flatMap represented as a function
task which takes one parameter and return one value
● The distribution of the functions are initially done by the
mesos which later ported to other cluster management
● This shows how the spark started with functional
programming
Till now
● Discussion about Mesos and its abstraction
● Hello world code on Mesos
● Defining Function interface
● Implementing
○ Scheduler to run scala code
○ Custom executor for scala
○ Serialize and Deserialize scala function
● https://www.youtube.com/watch?v=Oy9ToN4O63c
What a local function can do?
● Access to the local data. Even in spark, normally the
function access the hdfs local data
● Ability to access the classes provided by the framework
● Any logic which can be serialized
What it cannot do?
● Access classes outside from the framework
● Access the results of other functions (shuffle)
● Access to lookup data (broadcast)
Need of third party libraries
● Ability to add third party libraries in a distributed system
framework is important
● Third party libraries allow us to
○ Connect to third party sources
○ Use library to implement custom logic like matrix
manipulation inside function abstraction
○ Ability to extend base framework using set of
libraries ex: spark-sql
○ Ability to optimize for specific hardware
Approaches to third party libraries
● There are two different approaches to distribute third
party jars
● UberJar - Build all the dependencies with your
application code to single jar
● Second approach is to distribute the libraries separately
and adding them to the classpath of executors
● UberJar suffers from issues of jar size and versioning
● So we are going follow second approach which is
similar to one followed in Spark
Design for distributing jars
Executor 1
Executor 2
Jar serving http
server
Scheduler code
Scheduler/Driver
Download
jars over http
Download
jars over http
Distributing jars
● Third party jars are distributed over http protocol over
the cluster
● Whenever the scheduler/drives comes up it starts a http
server to serve the jars passed on to it by user
● Whenever executors are created, scheduler passes on
the uri of the http server to connect
● Executors connect to the jar server and download the
jars to respective machine. Then they add them to their
classpath.
Code for implementing
● We need multiple changes to our existing code base to
support third party jars
● The following are the different steps
○ Implementation of embedded http server
○ Change to scheduler to start http server
○ Change to executor to download jars and add it to
classpath
○ A function which uses third party library
Http Server
● We implement an embedded http server using jetty
● Jetty is a popular http server and J2EE servlet container
from eclipse organization
● One of the strength of jetty is it can be embedded inside
another program to provide http interfaces to certain
functionality
● Initial versions of Spark used jetty for jar distribution.
Newer version uses netty.
● https://eclipse.org/jetty/
● HttpServer.scala
Scheduler change
● Once we have http server, now we need to start when
we start our scheduler
● We will use registered callback for creating our jar
server.
● As part of starting the jar server, we will copy all the jars
provided by the user to a location which will beame
base director for the server.
● Once we have the server running, we pass on the
server uri to all the executors
● TaskScheduler.scala
Executor side
● In executor, we download the jars using calls to the jar
server running on master
● Once we downloaded the jars, we add it the classpath
using URLClassLoader
● We use above classloader to run our functions so that it
has access all the jars
● We plug this code in the registered callback of the
executor so it run only once
● TaskExecutor.scala
MySQL function
● This example is a function which access the mysql class
to run jdbc against a mysql instance
● We ship mysql jar using our jar distributed framework so
it will be not part of our application jar
● There is no change in our function api as it’s a normal
function as other examples
● MySQLTask.scala
References
● http://blog.madhukaraphatak.com/mesos-single-node-
setup-ubuntu/
● http://blog.madhukaraphatak.com/mesos-helloworld-
scala/
● http://blog.madhukaraphatak.com/custom-mesos-
executor-scala/
● http://blog.madhukaraphatak.com/distributing-third-
party-libraries-in-mesos/

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Interactive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark StreamingInteractive Data Analysis in Spark Streaming
Interactive Data Analysis in Spark Streaming
 
Building end to end streaming application on Spark
Building end to end streaming application on SparkBuilding end to end streaming application on Spark
Building end to end streaming application on Spark
 
Introduction to Structured streaming
Introduction to Structured streamingIntroduction to Structured streaming
Introduction to Structured streaming
 
Migrating to spark 2.0
Migrating to spark 2.0Migrating to spark 2.0
Migrating to spark 2.0
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2Migrating to Spark 2.0 - Part 2
Migrating to Spark 2.0 - Part 2
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
Introduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset APIIntroduction to Spark 2.0 Dataset API
Introduction to Spark 2.0 Dataset API
 
Introduction to dataset
Introduction to datasetIntroduction to dataset
Introduction to dataset
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
Multi Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and TelliusMulti Source Data Analysis using Spark and Tellius
Multi Source Data Analysis using Spark and Tellius
 
Real time ETL processing using Spark streaming
Real time ETL processing using Spark streamingReal time ETL processing using Spark streaming
Real time ETL processing using Spark streaming
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Understanding transactional writes in datasource v2
Understanding transactional writes in  datasource v2Understanding transactional writes in  datasource v2
Understanding transactional writes in datasource v2
 
Core Services behind Spark Job Execution
Core Services behind Spark Job ExecutionCore Services behind Spark Job Execution
Core Services behind Spark Job Execution
 
Understanding time in structured streaming
Understanding time in structured streamingUnderstanding time in structured streaming
Understanding time in structured streaming
 
Productionalizing Spark ML
Productionalizing Spark MLProductionalizing Spark ML
Productionalizing Spark ML
 

Destaque

Destaque (20)

Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
Predictive modeling healthcare
Predictive modeling healthcarePredictive modeling healthcare
Predictive modeling healthcare
 
Ranking the Web with Spark
Ranking the Web with SparkRanking the Web with Spark
Ranking the Web with Spark
 
Keyboard covert channels
Keyboard covert channelsKeyboard covert channels
Keyboard covert channels
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
AMP Camp 5 Intro
AMP Camp 5 IntroAMP Camp 5 Intro
AMP Camp 5 Intro
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2Anatomy of Spark SQL Catalyst - Part 2
Anatomy of Spark SQL Catalyst - Part 2
 
Spark on yarn
Spark on yarnSpark on yarn
Spark on yarn
 
Getting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache MesosGetting Started Running Apache Spark on Apache Mesos
Getting Started Running Apache Spark on Apache Mesos
 
Anatomy of in memory processing in Spark
Anatomy of in memory processing in SparkAnatomy of in memory processing in Spark
Anatomy of in memory processing in Spark
 
Building a modern Application with DataFrames
Building a modern Application with DataFramesBuilding a modern Application with DataFrames
Building a modern Application with DataFrames
 
Kafka and Spark Streaming
Kafka and Spark StreamingKafka and Spark Streaming
Kafka and Spark Streaming
 
Resilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARKResilient Distributed DataSets - Apache SPARK
Resilient Distributed DataSets - Apache SPARK
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Spark architecture
Spark architectureSpark architecture
Spark architecture
 
Anatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source APIAnatomy of Data Source API : A deep dive into Spark Data source API
Anatomy of Data Source API : A deep dive into Spark Data source API
 
Anatomy of spark catalyst
Anatomy of spark catalystAnatomy of spark catalyst
Anatomy of spark catalyst
 
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep diveApache Spark Introduction and Resilient Distributed Dataset basics and deep dive
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive
 
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of DatabricksDataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
DataFrame: Spark's new abstraction for data science by Reynold Xin of Databricks
 

Semelhante a Building distributed processing system from scratch - Part 2

Semelhante a Building distributed processing system from scratch - Part 2 (20)

Apache spark - Installation
Apache spark - InstallationApache spark - Installation
Apache spark - Installation
 
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
Spark on Kubernetes - Advanced Spark and Tensorflow Meetup - Jan 19 2017 - An...
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Introduction to Apache Airflow
Introduction to Apache AirflowIntroduction to Apache Airflow
Introduction to Apache Airflow
 
Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2 Introduction to Apache Spark :: Lagos Scala Meetup session 2
Introduction to Apache Spark :: Lagos Scala Meetup session 2
 
Distributed tracing 101
Distributed tracing 101Distributed tracing 101
Distributed tracing 101
 
Typesafe spark- Zalando meetup
Typesafe spark- Zalando meetupTypesafe spark- Zalando meetup
Typesafe spark- Zalando meetup
 
Distributed Tracing
Distributed TracingDistributed Tracing
Distributed Tracing
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource Managers
 
Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Scalable Spark deployment using Kubernetes
Scalable Spark deployment using KubernetesScalable Spark deployment using Kubernetes
Scalable Spark deployment using Kubernetes
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Balisage - EXPath Packaging
Balisage - EXPath PackagingBalisage - EXPath Packaging
Balisage - EXPath Packaging
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
A day in the life of a log message
A day in the life of a log messageA day in the life of a log message
A day in the life of a log message
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Let's Containerize New York with Docker!
Let's Containerize New York with Docker!Let's Containerize New York with Docker!
Let's Containerize New York with Docker!
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 

Mais de datamantra

Mais de datamantra (11)

State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Spark on Kubernetes
Spark on KubernetesSpark on Kubernetes
Spark on Kubernetes
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Spark stack for Model life-cycle management
Spark stack for Model life-cycle managementSpark stack for Model life-cycle management
Spark stack for Model life-cycle management
 
Testing Spark and Scala
Testing Spark and ScalaTesting Spark and Scala
Testing Spark and Scala
 
Understanding Implicits in Scala
Understanding Implicits in ScalaUnderstanding Implicits in Scala
Understanding Implicits in Scala
 
Introduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actorsIntroduction to concurrent programming with akka actors
Introduction to concurrent programming with akka actors
 
Functional programming in Scala
Functional programming in ScalaFunctional programming in Scala
Functional programming in Scala
 
Telco analytics at scale
Telco analytics at scaleTelco analytics at scale
Telco analytics at scale
 
Platform for Data Scientists
Platform for Data ScientistsPlatform for Data Scientists
Platform for Data Scientists
 
Building scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTPBuilding scalable rest service using Akka HTTP
Building scalable rest service using Akka HTTP
 

Último

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 

Último (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

Building distributed processing system from scratch - Part 2

  • 1. Distributed Systems from Scratch - Part 2 Handling third party libraries https://github.com/phatak-dev/distributedsystems
  • 2. ● Madhukara Phatak ● Big data consultant and trainer at datamantra.io ● Consult in Hadoop, Spark and Scala ● www.madhukaraphatak.com
  • 3. Agenda ● Idea ● Motivation ● Architecture of existing big data system ● Function abstraction ● Third party libraries ● Implementing third party libraries ● MySQL task ● Code example
  • 4. Idea “What it takes to build a distributed processing system like Spark?”
  • 5. Motivation ● First version of Spark only had 1600 lines of Scala code ● Had all basic pieces of RDD and ability to run distributed system using Mesos ● Recreating the same code with step by step understanding ● Ample of time in hand
  • 6. Distributed systems from 30000ft Distributed Storage(HDFS/S3) Distributed Cluster management (YARN/Mesos) Distributed Processing Systems (Spark/MapReduce) Data Applications
  • 7. Our distributed system Mesos Scala function based abstraction Scala functions to express logic
  • 8. Function abstraction ● The whole spark API can be summarized a scala function which can represented as follow () => T ● This scala function can be parallelized and sent over network to run on multiple systems using mesos ● The function is represented as a task inside the framework ● FunctionTask.scala
  • 9. Spark API as distributed function ● Initial API of the spark revolved around scala function abstraction for processing as with RDD for data abstraction ● Every API like map, flatMap represented as a function task which takes one parameter and return one value ● The distribution of the functions are initially done by the mesos which later ported to other cluster management ● This shows how the spark started with functional programming
  • 10. Till now ● Discussion about Mesos and its abstraction ● Hello world code on Mesos ● Defining Function interface ● Implementing ○ Scheduler to run scala code ○ Custom executor for scala ○ Serialize and Deserialize scala function ● https://www.youtube.com/watch?v=Oy9ToN4O63c
  • 11. What a local function can do? ● Access to the local data. Even in spark, normally the function access the hdfs local data ● Ability to access the classes provided by the framework ● Any logic which can be serialized What it cannot do? ● Access classes outside from the framework ● Access the results of other functions (shuffle) ● Access to lookup data (broadcast)
  • 12. Need of third party libraries ● Ability to add third party libraries in a distributed system framework is important ● Third party libraries allow us to ○ Connect to third party sources ○ Use library to implement custom logic like matrix manipulation inside function abstraction ○ Ability to extend base framework using set of libraries ex: spark-sql ○ Ability to optimize for specific hardware
  • 13. Approaches to third party libraries ● There are two different approaches to distribute third party jars ● UberJar - Build all the dependencies with your application code to single jar ● Second approach is to distribute the libraries separately and adding them to the classpath of executors ● UberJar suffers from issues of jar size and versioning ● So we are going follow second approach which is similar to one followed in Spark
  • 14. Design for distributing jars Executor 1 Executor 2 Jar serving http server Scheduler code Scheduler/Driver Download jars over http Download jars over http
  • 15. Distributing jars ● Third party jars are distributed over http protocol over the cluster ● Whenever the scheduler/drives comes up it starts a http server to serve the jars passed on to it by user ● Whenever executors are created, scheduler passes on the uri of the http server to connect ● Executors connect to the jar server and download the jars to respective machine. Then they add them to their classpath.
  • 16. Code for implementing ● We need multiple changes to our existing code base to support third party jars ● The following are the different steps ○ Implementation of embedded http server ○ Change to scheduler to start http server ○ Change to executor to download jars and add it to classpath ○ A function which uses third party library
  • 17. Http Server ● We implement an embedded http server using jetty ● Jetty is a popular http server and J2EE servlet container from eclipse organization ● One of the strength of jetty is it can be embedded inside another program to provide http interfaces to certain functionality ● Initial versions of Spark used jetty for jar distribution. Newer version uses netty. ● https://eclipse.org/jetty/ ● HttpServer.scala
  • 18. Scheduler change ● Once we have http server, now we need to start when we start our scheduler ● We will use registered callback for creating our jar server. ● As part of starting the jar server, we will copy all the jars provided by the user to a location which will beame base director for the server. ● Once we have the server running, we pass on the server uri to all the executors ● TaskScheduler.scala
  • 19. Executor side ● In executor, we download the jars using calls to the jar server running on master ● Once we downloaded the jars, we add it the classpath using URLClassLoader ● We use above classloader to run our functions so that it has access all the jars ● We plug this code in the registered callback of the executor so it run only once ● TaskExecutor.scala
  • 20. MySQL function ● This example is a function which access the mysql class to run jdbc against a mysql instance ● We ship mysql jar using our jar distributed framework so it will be not part of our application jar ● There is no change in our function api as it’s a normal function as other examples ● MySQLTask.scala
  • 21. References ● http://blog.madhukaraphatak.com/mesos-single-node- setup-ubuntu/ ● http://blog.madhukaraphatak.com/mesos-helloworld- scala/ ● http://blog.madhukaraphatak.com/custom-mesos- executor-scala/ ● http://blog.madhukaraphatak.com/distributing-third- party-libraries-in-mesos/