Apache Spark
Mate Gulyas
CTO & Co-Founder
GULYÁS MÁTÉ
@gulyasm
Getting Started
UNIFIED STACK
❏ Spark Core
❏ Spark SQL
❏ Spark Streaming
❏ MLlib
❏ GraphX
❏ Cluster Managers
APIS
❏ RDD API
❏ DataFrame API
❏ Dataset API
❏ Scala
❏ Java
❏ Python
❏ R
WHICH LANGUAGE TO SPARK ON?
SPARK INSTALL
DRIVER
SPARKCONTEXT
DRIVER PROGRAM
Your main function. This is what you write.
It launches parallel operations on the cluster. The
driver accesses Spark through SparkContext.
You access the computing cluster via SparkContext.
Via SparkContext you can create RDDs.
❏ INTERACTIVE
❏ STANDALONE
A “SPARK SOFTWARE”
Resilient Distributed
Dataset (RDD)
THE MAIN ATTRACTION
RDD
❏ TRANSFORMATION
❏ ACTION
OPERATIONS ON RDD
CREATES ANOTHER RDD
TRANSFORMATION
CALCULATES A VALUE AND RETURNS IT
TO THE DRIVER PROGRAM
ACTION
LAZY EVALUATION
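The difference between transformations and actions can be sketched in plain Python, without Spark. This is a toy model only: `LazyDataset` and `read_numbers` are hypothetical names made up for illustration, not part of the Spark API. Transformations just record what to do and return a new object; only the action reads the source.

```python
class LazyDataset:
    """Toy stand-in for an RDD (not Spark code): records steps, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source  # zero-argument callable producing an iterable
        self._steps = steps    # recorded (kind, function) pairs

    # Transformations: return a NEW dataset and compute nothing.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, fn):
        return LazyDataset(self._source, self._steps + (("filter", fn),))

    # Action: runs the recorded steps and returns a value to the caller.
    def collect(self):
        data = iter(self._source())
        for kind, fn in self._steps:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


reads = []  # one entry for every time the source is actually read


def read_numbers():
    reads.append("read")
    return [1, 2, 3, 4, 5, 6]


numbers = LazyDataset(read_numbers)
evens = numbers.filter(lambda x: x % 2 == 0)  # transformation
squares = evens.map(lambda x: x * x)          # transformation
reads_before_action = len(reads)              # nothing has run yet
collected = squares.collect()                 # action: triggers the work
reads_after_action = len(reads)
```

Here `reads_before_action` is 0 and `reads_after_action` is 1: building the chain costs nothing, and the source is touched only when `collect()` is called.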
INTERACTIVE
❏ The code: github.com/gulyasm/bigdata
❏ Apache Spark site: spark.apache.org
❏ User mailing list
❏ Spark books
MATERIALS
MATE GULYAS
gulyasm@enbrite.ly
@gulyasm
@enbritely
THANK YOU!
TRANSFORMATIONS
ACTIONS
LAZY EVALUATION
LIFECYCLE OF A SPARK PROGRAM
1. READ DATA FROM EXTERNAL SOURCE
2. CREATE LAZILY EVALUATED
TRANSFORMATIONS
3. CACHE ANY INTERMEDIATE RDD TO REUSE
4. KICK IT OFF BY CALLING SOME ACTION
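The four steps above can be walked through with a small plain-Python model. It is a sketch under stated assumptions: `ToyRDD` and `read_external_source` are hypothetical, and the caching shown is a simplification of what Spark does. The point is why step 3 matters: without the cache, every action would recompute everything from the source.

```python
class ToyRDD:
    """Toy model of a lazily evaluated, optionally cached dataset (not the Spark API)."""

    def __init__(self, compute):
        self._compute = compute       # zero-argument callable returning a list
        self._cache_enabled = False
        self._cached = None

    def map(self, fn):
        # Transformation: only describes the work.
        return ToyRDD(lambda: [fn(x) for x in self._materialize()])

    def cache(self):
        # Marks the dataset for reuse; nothing is computed yet.
        self._cache_enabled = True
        return self

    def _materialize(self):
        if self._cache_enabled:
            if self._cached is None:
                self._cached = self._compute()
            return self._cached
        return self._compute()

    # Actions
    def count(self):
        return len(self._materialize())

    def collect(self):
        return list(self._materialize())


source_reads = []  # one entry per read of the external source


def read_external_source():
    source_reads.append("read")
    return ["spark", "rdd", "dag"]


# 1. Read data from an external source (lazy: nothing is read yet).
lines = ToyRDD(read_external_source)
# 2. Create lazily evaluated transformations.
upper = lines.map(str.upper)
# 3. Cache the intermediate dataset that will be reused.
upper.cache()
# 4. Kick it off with actions; the second action reuses the cached data.
total = upper.count()
values = upper.collect()
```

Two actions ran, but `source_reads` has a single entry because the second action was served from the cache.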
PARTITIONS
RDD INTERNALS
RDD INTERFACE
➔ set of PARTITIONS
➔ list of DEPENDENCIES on PARENT RDDs
➔ functions to COMPUTE a partition given parents
➔ preferred LOCATIONS (optional)
➔ PARTITIONER for K/V pairs (optional)
MULTIPLE RDDs
/**
 * :: DeveloperApi ::
 * Implemented by subclasses to compute a given partition.
 */
@DeveloperApi
def compute(split: Partition, context: TaskContext): Iterator[T]

/** Implemented by subclasses to return the set of partitions in this RDD. */
protected def getPartitions: Array[Partition]

/** Implemented by subclasses to return how this RDD depends on parent RDDs. */
protected def getDependencies: Seq[Dependency[_]] = deps

/** Optionally overridden by subclasses to specify placement preferences. */
protected def getPreferredLocations(split: Partition): Seq[String] = Nil

/** Optionally overridden by subclasses to specify how they are partitioned. */
@transient val partitioner: Option[Partitioner] = None
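To make the five-part interface concrete, here is a hypothetical Python mirror of it. None of these classes exist in Spark; `ToyRDD`, `ListRDD` and `MappedRDD` are invented for illustration, and locations and partitioner are left at their "optional" defaults. It shows how a child RDD computes a partition by asking its parent for the same partition.

```python
class ToyRDD:
    """Hypothetical Python mirror of the RDD interface (not Spark code)."""

    partitioner = None  # optional: only meaningful for key/value data

    def __init__(self, parent=None):
        self.parent = parent

    def partitions(self):
        raise NotImplementedError

    def dependencies(self):
        return [self.parent] if self.parent is not None else []

    def compute(self, split):
        raise NotImplementedError

    def preferred_locations(self, split):
        return []  # optional: no placement preference


class ListRDD(ToyRDD):
    """Source RDD: slices an in-memory list into num_partitions chunks."""

    def __init__(self, data, num_partitions):
        super().__init__()
        self.data = list(data)
        self.num_partitions = num_partitions

    def partitions(self):
        return list(range(self.num_partitions))

    def compute(self, split):
        # Partition i holds every num_partitions-th element starting at i.
        return iter(self.data[split::self.num_partitions])


class MappedRDD(ToyRDD):
    """Child RDD: same partitions as the parent, fn applied to each element."""

    def __init__(self, parent, fn):
        super().__init__(parent)
        self.fn = fn

    def partitions(self):
        return self.parent.partitions()

    def compute(self, split):
        return (self.fn(x) for x in self.parent.compute(split))


base = ListRDD([1, 2, 3, 4, 5], num_partitions=2)
doubled = MappedRDD(base, lambda x: x * 2)
per_partition = [list(doubled.compute(p)) for p in doubled.partitions()]
```

Each partition of `doubled` can be computed independently of the others, which is what lets the work be spread over a cluster.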
INTERNALS
THE IMPORTANT PART
❏ HOW EXECUTION WORKS
❏ TERMINOLOGY
❏ WHAT SHOULD WE CARE ABOUT
PIPELINING
❏ Analogous to CPU pipelining
❏ Several transformations run in a single pass over the data
❏ Recap: because of lazy evaluation, computation kicks off
only when an action is called
PIPELINING
# sc is the SparkContext (predefined in the pyspark shell)
text = sc.textFile("twit1.txt")
words = text.flatMap(lambda x: x.split(" "))
fwords = words.filter(lambda x: len(x) > 0)
ones = fwords.map(lambda x: (x, 1))
result = ones.reduceByKey(lambda l, r: l + r)
result.collect()  # action: triggers the whole computation
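What "pipelined" means for the flatMap, filter and map steps can be imitated with plain Python generators. This is an analogy, not Spark's implementation, and the function names are made up. The `trace` list records the order in which work really happens: each line travels through all three steps before the next line is read, and no intermediate collection is built.

```python
trace = []  # order in which work actually happens


def read_lines(lines):
    for line in lines:
        trace.append("read " + repr(line))
        yield line


def flat_map_split(lines):
    for line in lines:
        for word in line.split(" "):
            yield word


def filter_nonempty(words):
    for word in words:
        if len(word) > 0:
            yield word


def map_to_pair(words):
    for word in words:
        trace.append("pair " + word)
        yield (word, 1)


source = ["a b", "c"]
pipeline = map_to_pair(filter_nonempty(flat_map_split(read_lines(source))))
pairs = list(pipeline)  # stands in for running one task over one partition
```

After this runs, `trace` is `["read 'a b'", "pair a", "pair b", "read 'c'", "pair c"]`: the second line is read only after the first one has gone all the way through the pipeline.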
PIPELINING
text = sc.textFile( )
words = text.flatMap( )
fwords = words.filter( )
ones = fwords.map( )
result = ones.reduceByKey( )
result.collect()
PIPELINING
sc.textFile( )
.flatMap( )
.filter( )
.map( )
.reduceByKey( )
PIPELINING
sc.textFile().flatMap().filter().map().reduceByKey()
[Diagram: textFile() ➔ text ➔ flatMap() ➔ words ➔ filter() ➔ fwords ➔ map() ➔ ones ➔ reduceByKey() ➔ result. Each of the five names is an RDD.]
PIPELINING
def runJob[T, U](
    rdd: RDD[T],
    partitions: Seq[Int],
    func: Iterator[T] => U
): Array[U]
[Diagram: the same chain of five RDDs (text, words, fwords, ones, result), with collect() called on result.]
PIPELINING
JOB
❏ Basically an action
❏ An action creates a job
❏ A whole computation with all
dependencies
[Diagram: the same chain of five RDDs; calling collect() on result creates one Job that spans all of them.]
STAGE
❏ Unit of execution
❏ Named after the last transformation
(the one runJob was called on)
❏ Transformations pipelined together into
stages
❏ Stage boundary usually means shuffling
[Diagram: the Job split into Stage 1 and Stage 2; the shuffle introduced by reduceByKey() is the boundary between them.]
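Why a shuffle forces a stage boundary can be shown with a toy word count in plain Python. This is a simplified sketch, not Spark's shuffle: `stage1_task`, `stage2_task` and `bucket_for` are hypothetical helpers, the hash is a deliberately simple stand-in, and there is no map-side combining. Stage 2 cannot start until every Stage 1 task has written its buckets, because each reducer needs data from all of them.

```python
from collections import defaultdict


def bucket_for(key, num_reducers):
    # Toy deterministic hash (sum of code points); an assumption for illustration.
    return sum(ord(c) for c in key) % num_reducers


def stage1_task(lines, num_reducers):
    """Map side: pipelined flatMap/filter/map, then bucket pairs by key hash."""
    buckets = [[] for _ in range(num_reducers)]
    for line in lines:
        for word in line.split(" "):
            if len(word) > 0:
                buckets[bucket_for(word, num_reducers)].append((word, 1))
    return buckets


def stage2_task(pairs):
    """Reduce side: sum the counts for every key that landed in this partition."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


input_partitions = [["a b a"], ["b c"]]  # two partitions of input lines
num_reducers = 2

# Stage 1: one task per input partition.
map_outputs = [stage1_task(p, num_reducers) for p in input_partitions]

# Shuffle: reducer r fetches bucket r from EVERY map output.
shuffled = [
    [pair for out in map_outputs for pair in out[r]]
    for r in range(num_reducers)
]

# Stage 2: one task per reduce partition.
reduced = [stage2_task(pairs) for pairs in shuffled]

word_counts = {}
for part in reduced:
    word_counts.update(part)
```

Every key ends up in exactly one reduce partition, so the per-partition results can simply be merged into `word_counts`.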
[Diagram: the same job with each RDD split into partitions (PT1, PT2); a Shuffle separates Stage 1 from Stage 2.]
Repartitioning
# sc is the SparkContext (predefined in the pyspark shell)
text = sc.textFile("twit1.txt")
words = text.flatMap(lambda x: x.split(" "))
fwords = words.filter(lambda x: len(x) > 1)
ones = fwords.map(lambda x: (x, 1))
rp = ones.repartition(6)  # redistributes the data into 6 partitions (a shuffle)
result = rp.reduceByKey(lambda l, r: l + r)
result.collect()
THE PROCESS
RDD Objects
- sc.textFile().map().groupBy().filter()
- Build the DAG of operators
- Hands the DAG to the DAG Scheduler
DAG Scheduler
- Splits the DAG into stages of tasks
- Submits each stage when it is ready, i.e. when ALL tasks it depends on have finished
- Hands each stage to the Task Scheduler as a TaskSet
Task Scheduler
- Launches tasks
- Retries failed tasks
- Hands individual tasks to the Executors
Executor (block manager, task threads)
- Stores and serves blocks
- Executes tasks
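The "split DAG into stages" step can be approximated for a simple linear chain of operators. This is a rough sketch, not the real DAG scheduler, which works on a graph of dependencies rather than a list: `SHUFFLE_OPS` and `split_into_stages` are hypothetical, and the set of shuffling operators is limited to the ones that appear in these slides.

```python
# Operators from these slides that need a shuffle (assumption: list is not exhaustive).
SHUFFLE_OPS = {"repartition", "reduceByKey", "groupBy"}


def split_into_stages(operators):
    """Cut a linear chain of operator names into stages at every shuffle."""
    stages = [[]]
    for op in operators:
        # A shuffling operator starts a new stage; everything else is pipelined
        # into the current one.
        if op in SHUFFLE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


word_count = ["textFile", "flatMap", "filter", "map", "reduceByKey"]
with_repartition = ["textFile", "flatMap", "filter", "map", "repartition", "reduceByKey"]
process_example = ["textFile", "map", "groupBy", "filter"]

word_count_stages = split_into_stages(word_count)
repartition_stages = split_into_stages(with_repartition)
process_stages = split_into_stages(process_example)
```

The word count yields two stages, matching the Stage 1 / Stage 2 picture, while adding `repartition` introduces a second shuffle and therefore a third stage. In `process_stages`, `filter` is pipelined into the same stage as `groupBy` because it needs no shuffle of its own.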

Budapest Spark Meetup - Basics of Spark coding