APACHE-SPARK	

LARGE-SCALE DATA PROCESSING ENGINE
Bartosz Bogacki <bbogacki@bidlab.pl>
CTO, CODER, ROCK CLIMBER
• current:
  • Chief Technology Officer at Bidlab
• previous:
  • IT Director at InternetowyKantor.pl SA
  • Software Architect / Project Manager at Wolters Kluwer Polska
• find out more (if you care):
  • linkedin.com/in/bartoszbogacki
Did I mention that…
WE PROCESS MORE THAN 200 GB OF LOGS DAILY?
WHY?
• To discover inventory and potential	

• To optimize traffic	

• To optimize campaigns	

• To learn about trends	

• To calculate conversions
APACHE SPARK !
HISTORY
• 2013-06-19 Project enters Apache incubation	

• 2014-02-19 Project established as an Apache Top-Level Project

• 2014-05-30 Spark 1.0.0 released
• "Apache Spark is a (lightning-) fast and
general-purpose cluster computing system"	

• Engine compatible with Apache Hadoop	

• Up to 100x faster than Hadoop MapReduce (for in-memory workloads)

• Less code to write, more elastic	

• Active community (117 developers
contributed to release 1.0.0)
KEY CONCEPTS
• Runs on Spark standalone / YARN / Mesos cluster managers

• HDFS / S3 support built-in

• RDD - Resilient Distributed Dataset

• Transformations & Actions

• Written in Scala, API for Java / Scala / Python
ECOSYSTEM
• Spark Streaming	

• Shark	

• MLlib (machine learning)	

• GraphX	

• Spark SQL
RDD
• Collections of objects	

• Stored in memory (or disk)	

• Spread across the cluster	

• Auto-rebuild on failure
TRANSFORMATIONS
• map / flatMap	

• filter	

• union / intersection / join / cogroup	

• distinct	

• many more…
ACTIONS
• reduce (reduceByKey, despite its name, is a transformation: it returns a new RDD)

• foreach	

• count / countByKey	

• first / take / takeOrdered	

• collect / saveAsTextFile / saveAsObjectFile
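
The split between transformations and actions matters because transformations are lazy: they only describe the computation, and nothing runs until an action asks for a result. The sketch below is a plain-Java analogy built on `java.util.stream`, not the Spark API: intermediate stream operations stand in for transformations and the terminal operation stands in for an action. The class `LazyPipelineSketch` and its members are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy (NOT Spark): intermediate stream ops ~ transformations,
// terminal ops ~ actions. All names here are hypothetical.
final class LazyPipelineSketch {
    // Counts how many times the mapping function has actually been executed.
    static final AtomicInteger mapCalls = new AtomicInteger();

    // "Transformations": only describe the pipeline, nothing is computed yet.
    static Stream<Integer> squaresOfEvens(List<Integer> data) {
        return data.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> {
                    mapCalls.incrementAndGet();
                    return n * n;
                });
    }

    // "Action": forces evaluation and returns a concrete result.
    static List<Integer> collect(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }
}
```

Building the pipeline leaves `mapCalls` at zero; only `collect` triggers the filter and map steps, which is the same reason a Spark job does no work until an action such as `count()` is called.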
EXAMPLES
val s1=sc.parallelize(Array(1,2,3,4,5))
val s2=sc.parallelize(Array(3,4,6,7,8))
val s3=sc.parallelize(Array(1,2,2,3,3,3))
s2.map(num => num * num)
// => 9, 16, 36, 49, 64
s1.reduce((a,b) => a + b)
// => 15
s1 union s2
// => 1, 2, 3, 4, 5, 3, 4, 6, 7, 8
s1 subtract s2
// => 1, 5, 2
s1 intersection s2
// => 4, 3
s3.distinct
// => 1, 2, 3
EXAMPLES
val set1 = sc.parallelize(Array[(Int,String)]((1,"bartek"),(2,"jacek"),(3,"tomek")))
val set2 = sc.parallelize(Array[(Int,String)]((2,"nowak"),(4,"kowalski"),(5,"iksiński")))

set1 join set2
// => (2,(jacek,nowak))
set1 leftOuterJoin set2
// => (1,(bartek,None)), (2,(jacek,Some(nowak))), (3,(tomek,None))
set1 rightOuterJoin set2
// => (4,(None,kowalski)), (5,(None,iksiński)), (2,(Some(jacek),nowak))
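
The `None` / `Some(...)` values in the outer joins mark keys that exist on only one side. As a rough illustration of that rule, here is a plain-Java sketch (not the Spark API) of `leftOuterJoin` for maps with unique keys; Spark itself also handles repeated keys, which this sketch ignores. The class name `LeftOuterJoinSketch` is an assumption made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java sketch (NOT Spark) of leftOuterJoin semantics, unique keys only.
final class LeftOuterJoinSketch {
    // Every left key is kept; the right value is Optional.empty() when the
    // right side has no entry for that key. Keys found only on the right
    // are dropped.
    static Map<Integer, SimpleEntry<String, Optional<String>>> leftOuterJoin(
            Map<Integer, String> left, Map<Integer, String> right) {
        Map<Integer, SimpleEntry<String, Optional<String>>> out = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> e : left.entrySet()) {
            Optional<String> match = Optional.ofNullable(right.get(e.getKey()));
            out.put(e.getKey(), new SimpleEntry<String, Optional<String>>(e.getValue(), match));
        }
        return out;
    }
}
```

Swapping the roles of the two maps gives the right outer join; keeping only keys where the match is present gives the inner join.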
EXAMPLES
set1.cogroup(set2).sortByKey()
// => (1,(ArrayBuffer(bartek),ArrayBuffer())), (2,(ArrayBuffer(jacek),ArrayBuffer(nowak))),
//    (3,(ArrayBuffer(tomek),ArrayBuffer())), (4,(ArrayBuffer(),ArrayBuffer(kowalski))),
//    (5,(ArrayBuffer(),ArrayBuffer(iksiński)))

set2.map((t) => (t._1, t._2.length))
// => (2,5), (4,8), (5,8)

val set3 = sc.parallelize(Array[(String,Long)](("onet.pl",1L), ("onet.pl",1L), ("wp.pl",1L)))

set3.reduceByKey((n1,n2) => n1 + n2)
// => (onet.pl,2), (wp.pl,1)
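
To make the behaviour of `reduceByKey` concrete, the following is a single-machine Java sketch of its semantics, not the Spark implementation (which combines values per partition and then shuffles). `ReduceByKeySketch` is a hypothetical name; the combining function plays the role of `(n1,n2) => n1 + n2` above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java sketch (NOT Spark) of what reduceByKey computes.
final class ReduceByKeySketch {
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> combine) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            // The first value seen for a key is stored as-is;
            // every later value is folded in with the combine function.
            out.merge(p.getKey(), p.getValue(), combine);
        }
        return out;
    }
}
```

Called with the pairs `(onet.pl,1)`, `(onet.pl,1)`, `(wp.pl,1)` and `Long::sum`, it returns a map with `onet.pl` mapped to 2 and `wp.pl` mapped to 1. Because values may be combined in any grouping, the function should be associative and commutative, as addition is.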
HANDS ON
RUNNING EC2 SPARK CLUSTER
./spark-ec2 -k spark-key -i spark-key.pem \
  -s 5 \
  -t m3.2xlarge \
  launch cluster-name \
  --region=eu-west-1
SPARK CONSOLE
LINKING WITH SPARK
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.0.0</version>
</dependency>
If you want to use HDFS:
groupId = org.apache.hadoop
artifactId = hadoop-client
version = <your-hdfs-version>
If you want to use Spark Streaming:
groupId = org.apache.spark
artifactId = spark-streaming_2.10
version = 1.0.0
INITIALIZING
• SparkConf conf = new SparkConf().setAppName("TEST").setMaster("local");

• JavaSparkContext sc = new JavaSparkContext(conf);
CREATING RDD
• List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);	

• JavaRDD<Integer> distData = sc.parallelize(data);
CREATING RDD
• JavaRDD<String> logLines = sc.textFile("data.txt");
CREATING RDD
• JavaRDD<String> logLines = sc.textFile("hdfs://<HOST>:<PORT>/daily/data-20-00.txt");

• JavaRDD<String> logLines = sc.textFile("s3n://my-bucket/daily/data-*.txt");
TRANSFORM
JavaRDD<Log> logs =
    logLines.map(new Function<String, Log>() {
        // parse each raw line into a Log object
        public Log call(String s) {
            return LogParser.parse(s);
        }
    }).filter(new Function<Log, Boolean>() {
        // keep only entries with level 1
        public Boolean call(Log log) {
            return log.getLevel() == 1;
        }
    });
ACTION :)
logs.count();
TRANSFORM-ACTION
List<Tuple2<String, Integer>> result =
    sc.textFile("/data/notifies-20-00.txt")
        .mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String line) throws Exception {
                NotifyRequest nr = LogParser.parseNotifyRequest(line);
                return new Tuple2<String, Integer>(nr.getFlightId(), 1);
            }
        })
        .reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        })
        .sortByKey()
        .collect();
FUNCTIONS, PAIRFUNCTIONS, ETC.
BROADCAST VARIABLES
• "allow the programmer to keep a read-only variable cached on each machine rather than shipping a copy of it with tasks"
Broadcast<int[]> broadcastVar = sc.broadcast(new int[] {1, 2, 3});

broadcastVar.value();
// returns [1, 2, 3]
ACCUMULATORS
• variables that are only “added” to through an associative
operation (add())	

• only the driver program can read the accumulator’s value
Accumulator<Integer> accum = sc.accumulator(0);

sc.parallelize(Arrays.asList(1, 2, 3, 4)).foreach(x -> accum.add(x));

accum.value();
// returns 10
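
The "add only on workers, read only on the driver" contract can be pictured with a single-JVM analogy. The sketch below uses `java.util.concurrent.atomic.LongAdder` and a parallel stream in place of Spark tasks; it is not the Spark `Accumulator` API, and `AccumulatorSketch` is a hypothetical name.

```java
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Single-JVM analogy (NOT Spark) of an accumulator.
final class AccumulatorSketch {
    static long sumInParallel(List<Integer> data) {
        LongAdder accum = new LongAdder();
        // "Worker" side: parallel tasks may only add to the shared counter.
        data.parallelStream().forEach(x -> accum.add(x));
        // "Driver" side: read the total once all tasks have finished.
        return accum.sum();
    }
}
```

The order in which the parallel tasks add their values is not defined, which is why the operation has to be associative (and in practice commutative) for the final value to be deterministic.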
SERIALIZATION
• All objects referenced by the functions you pass to Spark have to be serializable

• Otherwise:
org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException
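
One way to catch this early is to try Java serialization on an object before handing it to a job. The following is a minimal standalone sketch using only `java.io`; the classes `GoodLog` and `BadLog` and the helper `isJavaSerializable` are hypothetical and exist only to show the difference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Standalone sketch: checks whether an object graph survives Java serialization.
final class SerializationCheckSketch {
    // Hypothetical record that can be shipped between JVMs.
    static final class GoodLog implements Serializable {
        private static final long serialVersionUID = 1L;
        final int level;
        GoodLog(int level) { this.level = level; }
    }

    // Same shape, but missing "implements Serializable".
    static final class BadLog {
        final int level;
        BadLog(int level) { this.level = level; }
    }

    // Returns true if default Java serialization accepts the object.
    static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

`isJavaSerializable(new GoodLog(1))` returns true, while `isJavaSerializable(new BadLog(1))` returns false, for the same underlying reason Spark reports `NotSerializableException`.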
USE KRYO SERIALIZER
public class MyRegistrator implements KryoRegistrator {
    @Override
    public void registerClasses(Kryo kryo) {
        kryo.register(BidRequest.class);
        kryo.register(NotifyRequest.class);
        kryo.register(Event.class);
    }
}
sparkConfig.set(
    "spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConfig.set(
    "spark.kryo.registrator", "pl.instream.dsp.offline.MyRegistrator");
sparkConfig.set(
    "spark.kryoserializer.buffer.mb", "10");
CACHE !
JavaPairRDD<String, Integer> cachedSet =
    sc.textFile("/data/notifies-20-00.txt")
        .mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String line) throws Exception {
                NotifyRequest nr = LogParser.parseNotifyRequest(line);
                return new Tuple2<String, Integer>(nr.getFlightId(), 1);
            }
        }).cache();
RDD PERSISTENCE
• MEMORY_ONLY	

• MEMORY_AND_DISK	

• MEMORY_ONLY_SER	

• MEMORY_AND_DISK_SER	

• DISK_ONLY	

• MEMORY_ONLY_2, MEMORY_AND_DISK_2, …	

• OFF_HEAP (Tachyon, experimental)
PARTITIONS
• RDD is partitioned	

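• You may (and probably should) control the number and size of partitions, e.g. with coalesce() to reduce their number or repartition() to change it in either direction

• By default 1 input file = 1 partition (more precisely, one partition per input split, so a file larger than one HDFS block yields several)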
PARTITIONS
• If your partitions are too big, you’ll face:
[GC 5208198K(5208832K), 0,2403780 secs]
[Full GC 5208831K->5208212K(5208832K), 9,8765730 secs]
[Full GC 5208829K->5208238K(5208832K), 9,7567820 secs]
[Full GC 5208829K->5208295K(5208832K), 9,7629460 secs]
[GC 5208301K(5208832K), 0,2403480 secs]
[Full GC 5208831K->5208344K(5208832K), 9,7497710 secs]
[Full GC 5208829K->5208366K(5208832K), 9,7542880 secs]
[Full GC 5208831K->5208415K(5208832K), 9,7574860 secs]
WARN storage.BlockManagerMasterActor: Removing BlockManager BlockManagerId(0, ip-xx-xx-xxx-xxx.eu-west-1.compute.internal, 60048, 0) with no recent heart beats: 64828ms exceeds 45000ms
RESULTS
• result.saveAsTextFile("hdfs://<HOST>:<PORT>/out.txt")

• result.saveAsObjectFile("/result/out.obj")

• result.collect()
PROCESS RESULTS PARTITION BY PARTITION
for (Partition partition : result.rdd().partitions()) {
    List<String>[] subresult =
        result.collectPartitions(new int[] { partition.index() });

    for (String line : subresult[0]) {
        System.out.println(line);
    }
}
SPARK STREAMING
"SPARK STREAMING IS AN EXTENSION OF THE CORE SPARK API THAT ENABLES HIGH-THROUGHPUT, FAULT-TOLERANT STREAM PROCESSING OF LIVE DATA STREAMS."
HOW IT WORKS?
DSTREAMS
• continuous stream of data, either the input data
stream received from source, or the processed
data stream generated by transforming the input
stream	

• represented by a continuous sequence of RDDs
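
The "sequence of RDDs" idea is micro-batching: incoming records are cut into consecutive time intervals and each interval becomes one small batch. The sketch below shows that slicing in plain Java, not with the Spark Streaming API. `MicroBatchSketch`, `Event` and `toBatches` are hypothetical names, timestamps are assumed to be non-negative milliseconds, and unlike Spark the sketch emits nothing for intervals that received no data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (NOT Spark Streaming) of cutting a stream into batches.
final class MicroBatchSketch {
    // Hypothetical event: arrival time in milliseconds plus a payload line.
    static final class Event {
        final long timeMs;
        final String line;
        Event(long timeMs, String line) {
            this.timeMs = timeMs;
            this.line = line;
        }
    }

    // Groups events into consecutive batches of batchMs milliseconds.
    // Key = batch start time, value = the records of that batch (one "RDD").
    static Map<Long, List<String>> toBatches(List<Event> events, long batchMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = (e.timeMs / batchMs) * batchMs;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.line);
        }
        return batches;
    }
}
```

With a batch interval of 1000 ms, events arriving at 100, 900, 1000 and 2500 ms end up in three batches starting at 0, 1000 and 2000 ms.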
INITIALIZING
• SparkConf conf = new SparkConf().setAppName("Real-Time Analytics").setMaster("local[2]"); // at least 2 local threads: one for the receiver, one for processing

• JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(TIME_IN_MILIS));
CREATING DSTREAM
• JavaReceiverInputDStream<String> logLines = jssc.socketTextStream(sourceAddr, sourcePort, StorageLevels.MEMORY_AND_DISK_SER);
DATA SOURCES
• plain TCP sockets

• Apache Kafka	

• Apache Flume	

• ZeroMQ
TRANSFORMATIONS
• map, flatMap, filter, union, join, etc.	

• transform	

• updateStateByKey
WINDOW OPERATIONS
• window	

• countByWindow / countByValueAndWindow	

• reduceByWindow / reduceByKeyAndWindow
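
Window operations take two parameters, a window length and a slide interval, both multiples of the batch interval. As a rough plain-Java illustration (not the Spark Streaming API), the sketch below computes what a `countByWindow`-style operation would report when given per-batch record counts. `WindowSketch` is a hypothetical name, and both parameters are expressed in number of batches rather than in time.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (NOT Spark Streaming) of a sliding-window count.
final class WindowSketch {
    // batchCounts.get(i) = number of records in batch i.
    // windowLen and slide are measured in batches (both >= 1).
    // One total is emitted every `slide` batches, covering the last
    // `windowLen` batches (fewer at the very start of the stream).
    static List<Long> countByWindow(List<Long> batchCounts, int windowLen, int slide) {
        List<Long> out = new ArrayList<>();
        for (int end = slide - 1; end < batchCounts.size(); end += slide) {
            long total = 0;
            for (int i = Math.max(0, end - windowLen + 1); i <= end; i++) {
                total += batchCounts.get(i);
            }
            out.add(total);
        }
        return out;
    }
}
```

When the window is longer than the slide, consecutive windows overlap, so the same batch is counted in more than one result.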
OUTPUT OPERATIONS
• print	

• foreachRDD	

• saveAsObjectFiles	

• saveAsTextFiles	

THINGS TO REMEMBER
USE SPARK-SHELL TO LEARN
PROVIDE ENOUGH RAM TO WORKERS
PROVIDE ENOUGH RAM TO EXECUTOR
SET FRAME SIZE / BUFFERS ACCORDINGLY
USE KRYO SERIALIZER
SPLIT DATA INTO AN APPROPRIATE NUMBER OF PARTITIONS
PACKAGE YOUR APPLICATION IN UBER-JAR
DESIGN YOUR DATA FLOW
AND…
BUILD A FRAMEWORK TO PROCESS DATA EFFICIENTLY
IT’S EASIER WITH SCALA!
	 // word count example	
	 inputLine.flatMap(line => line.split(" "))	
	 	 .map(word => (word, 1))	
	 	 .reduceByKey(_ + _);
HOW WE USE SPARK?
THANKS!
we’re hiring !	

mail me: bbogacki@bidlab.pl

Mais conteúdo relacionado

Mais procurados

Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized ViewsCarl Yeksigian
 
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsNeo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsMark Needham
 
Advanced akka features
Advanced akka featuresAdvanced akka features
Advanced akka featuresGrzegorz Duda
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valleyPatrick McFadin
 
Profiling Oracle with GDB
Profiling Oracle with GDBProfiling Oracle with GDB
Profiling Oracle with GDBEnkitec
 
Appengine Java Night #2b
Appengine Java Night #2bAppengine Java Night #2b
Appengine Java Night #2bShinichi Ogawa
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Appengine Java Night #2a
Appengine Java Night #2aAppengine Java Night #2a
Appengine Java Night #2aShinichi Ogawa
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterprisePatrick McFadin
 
GDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesGDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesSergey Tarasevich
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on firePatrick McFadin
 
Openstack grizzley puppet_talk
Openstack grizzley puppet_talkOpenstack grizzley puppet_talk
Openstack grizzley puppet_talkbodepd
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and SparkPatrick McFadin
 
Developing your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareDeveloping your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareChristian Schwede
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team ApachePatrick McFadin
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 

Mais procurados (20)

Python database interfaces
Python database  interfacesPython database  interfaces
Python database interfaces
 
Cassandra Materialized Views
Cassandra Materialized ViewsCassandra Materialized Views
Cassandra Materialized Views
 
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and OperationsNeo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
Neo4j GraphTour: Utilizing Powerful Extensions for Analytics and Operations
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
Advanced akka features
Advanced akka featuresAdvanced akka features
Advanced akka features
 
Real data models of silicon valley
Real data models of silicon valleyReal data models of silicon valley
Real data models of silicon valley
 
Profiling Oracle with GDB
Profiling Oracle with GDBProfiling Oracle with GDB
Profiling Oracle with GDB
 
Appengine Java Night #2b
Appengine Java Night #2bAppengine Java Night #2b
Appengine Java Night #2b
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Appengine Java Night #2a
Appengine Java Night #2aAppengine Java Night #2a
Appengine Java Night #2a
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax EnterpriseA Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
A Cassandra + Solr + Spark Love Triangle Using DataStax Enterprise
 
GDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokesGDG DevFest 2015 - Reactive approach for slowpokes
GDG DevFest 2015 - Reactive approach for slowpokes
 
Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
Openstack grizzley puppet_talk
Openstack grizzley puppet_talkOpenstack grizzley puppet_talk
Openstack grizzley puppet_talk
 
Nike Tech Talk: Double Down on Apache Cassandra and Spark
Nike Tech Talk:  Double Down on Apache Cassandra and SparkNike Tech Talk:  Double Down on Apache Cassandra and Spark
Nike Tech Talk: Double Down on Apache Cassandra and Spark
 
Developing your own OpenStack Swift middleware
Developing your own OpenStack Swift middlewareDeveloping your own OpenStack Swift middleware
Developing your own OpenStack Swift middleware
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 

Destaque

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 
Large-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display AdvertisingLarge-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display Advertisingbbogacki
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosDiscover Pinterest
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platformbbogacki
 
Cubes – pluggable model explained
Cubes – pluggable model explainedCubes – pluggable model explained
Cubes – pluggable model explainedStefan Urbanek
 
Bubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsBubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsStefan Urbanek
 
Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Ian Thomas
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...Amazon Web Services
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningNik Spirin
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerIMC Institute
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 

Destaque (15)

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 
Large-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display AdvertisingLarge-Scale Distributed Systems in Display Advertising
Large-Scale Distributed Systems in Display Advertising
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 
Lessons learned from building Demand Side Platform
Lessons learned from building Demand Side PlatformLessons learned from building Demand Side Platform
Lessons learned from building Demand Side Platform
 
Cubes – pluggable model explained
Cubes – pluggable model explainedCubes – pluggable model explained
Cubes – pluggable model explained
 
Bubbles – Virtual Data Objects
Bubbles – Virtual Data ObjectsBubbles – Virtual Data Objects
Bubbles – Virtual Data Objects
 
Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)Designing the perfect display monetization dashboard (public)
Designing the perfect display monetization dashboard (public)
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
AWS Storage and Database Architecture Best Practices (DAT203) | AWS re:Invent...
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Introduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine LearningIntroduction to Data Science and Large-scale Machine Learning
Introduction to Data Science and Large-scale Machine Learning
 
Apache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainerApache Spark & Hadoop : Train-the-trainer
Apache Spark & Hadoop : Train-the-trainer
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 

Semelhante a Introduction to Apache Spark / PUT 06.2014

Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek PROIDEA
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackJakub Hajek
 
Scala introduction
Scala introductionScala introduction
Scala introductionvito jeng
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming JobsDatabricks
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Big Data Spain
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupGwen (Chen) Shapira
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourPeter Friese
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for TrainingBryan Yang
 
OpenStack API's and WSGI
OpenStack API's and WSGIOpenStack API's and WSGI
OpenStack API's and WSGIMike Pittaro
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak PROIDEA
 

Semelhante a Introduction to Apache Spark / PUT 06.2014 (20)

Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek Docker Logging and analysing with Elastic Stack - Jakub Hajek
Docker Logging and analysing with Elastic Stack - Jakub Hajek
 
Docker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic StackDocker Logging and analysing with Elastic Stack
Docker Logging and analysing with Elastic Stack
 
Scala introduction
Scala introductionScala introduction
Scala introduction
 
Productizing Structured Streaming Jobs
Productizing Structured Streaming JobsProductizing Structured Streaming Jobs
Productizing Structured Streaming Jobs
 
Czzawk
CzzawkCzzawk
Czzawk
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
 
Intro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data MeetupIntro to Spark - for Denver Big Data Meetup
Intro to Spark - for Denver Big Data Meetup
 
Pdxpugday2010 pg90
Pdxpugday2010 pg90Pdxpugday2010 pg90
Pdxpugday2010 pg90
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Apache Cassandra and Go
Apache Cassandra and GoApache Cassandra and Go
Apache Cassandra and Go
 
Master tuning
Master   tuningMaster   tuning
Master tuning
 
CouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 HourCouchDB Mobile - From Couch to 5K in 1 Hour
CouchDB Mobile - From Couch to 5K in 1 Hour
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
OpenStack API's and WSGI
OpenStack API's and WSGIOpenStack API's and WSGI
OpenStack API's and WSGI
 
Scala active record
Scala active recordScala active record
Scala active record
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.
 
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak   CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
CONFidence 2015: DTrace + OSX = Fun - Andrzej Dyjak
 

Último

Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Optimizing AI for immediate response in Smart CCTV
Introduction to Apache Spark / PUT 06.2014