SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
Lightning-fast analytics with Spark and Cassandra 
Nick Bailey 
@nickmbailey 
©2014 DataStax Confidential. Do not distribute without consent. 
1
What is Spark? 
*Apache Project since 2010" 
*Fast" 
* 10x-100x faster than Hadoop MapReduce" 
* In-memory storage" 
* Single JVM process per node" 
*Easy" 
*Rich Scala, Java and Python APIs" 
* 2x-5x less code" 
* Interactive shell 
Analytic 
Analytic 
Search
API 
map reduce
API 
map" 
filter" 
groupBy 
sort" 
union" 
join" 
leftOuterJoin 
rightOuterJoin 
reduce" 
count" 
fold" 
reduceByKey 
groupByKey 
cogroup 
cross" 
zip 
sample" 
take" 
first 
partitionBy 
mapWith 
pipe" 
save 
...
API 
*Resilient Distributed Datasets (RDD)" 
*Collections of objects spread across a cluster, stored in RAM or on Disk" 
* Built through parallel transformations" 
* Automatically rebuilt on failure" 
" 
*Operations" 
* Transformations (e.g. map, filter, groupBy)" 
* Actions (e.g. count, collect, save)
Operator Graph: Optimization and Fault Tolerance 
join 
groupBy 
filter 
Stage 3 
Stage 1 
Stage 2 
A: B: 
C: D: E: 
F: 
map 
= RDD = Cached partition
Fast 
Running Time (s) 
4000 
3000 
2000 
1000 
0 
1 5 10 20 30 
Number of Iterations 
110 sec / iteration 
Hadoop Spark 
first iteration 80 sec" 
further iterations 1 sec 
* Logistic Regression Performance"
Why Spark on Cassandra? 
*Data model independent queries" 
*Cross-table operations (JOIN, UNION, etc.)" 
*Complex analytics (e.g. machine learning)" 
*Data transformation, aggregation, etc." 
*Stream processing
How to Spark on Cassandra? 
*DataStax Cassandra Spark driver" 
*Open source: https://github.com/datastax/spark-cassandra-connector" 
*Compatible with" 
* Spark 0.9+" 
*Cassandra 2.0+" 
*DataStax Enterprise 4.5+
Cassandra Spark Driver 
*Cassandra tables exposed as Spark RDDs" 
*Read from and write to Cassandra" 
*Mapping of C* tables and rows to Scala objects" 
*All Cassandra types supported and converted to Scala types" 
*Server side data selection" 
*Spark Streaming support" 
*Scala and Java support
Connecting to Cassandra 
// Import Cassandra-specific functions on SparkContext and RDD objects" 
import com.datastax.driver.spark._" 
" 
" 
// Spark connection options" 
val conf = new SparkConf(true)" 
! ! .setMaster("spark://192.168.123.10:7077")" 
! ! .setAppName("cassandra-demo")" 
.set(“cassandra.connection.host", "192.168.123.10") // initial 
contact! 
.set("cassandra.username", "cassandra")! 
.set("cassandra.password", "cassandra") " 
" 
val sc = new SparkContext(conf)
Accessing Data 
CREATE TABLE test.words (word text PRIMARY KEY, count int);" 
" 
INSERT INTO test.words (word, count) VALUES ('bar', 30);" 
INSERT INTO test.words (word, count) VALUES ('foo', 20); 
*Accessing table above as RDD: 
// Use table as RDD" 
val rdd = sc.cassandraTable("test", "words")" 
// rdd: CassandraRDD[CassandraRow] = CassandraRDD[0]" 
" 
rdd.toArray.foreach(println)" 
// CassandraRow[word: bar, count: 30]! 
// CassandraRow[word: foo, count: 20]" 
" 
rdd.columnNames // Stream(word, count) " 
rdd.size // 2" 
" 
val firstRow = rdd.first // firstRow: CassandraRow = CassandraRow[word: bar, 
count: 30]! 
firstRow.getInt("count") // Int = 30
Saving Data 
val newRdd = sc.parallelize(Seq(("cat", 40), ("fox", 50)))" 
// newRdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[2]" 
" 
newRdd.saveToCassandra("test", "words", Seq("word", "count")) 
*RDD above saved to Cassandra: 
SELECT * FROM test.words;" 
" 
word | count" 
------+-------" 
bar | 30" 
foo | 20" 
cat | 40" 
fox | 50" 
" 
(4 rows)
Type Mapping 
CQL Type Scala Type 
ascii String 
bigint Long 
boolean Boolean 
counter Long 
decimal BigDecimal, java.math.BigDecimal 
double Double 
float Float 
inet java.net.InetAddress 
int Int 
list Vector, List, Iterable, Seq, IndexedSeq, java.util.List 
map Map, TreeMap, java.util.HashMap 
set Set, TreeSet, java.util.HashSet 
text, varchar String 
timestamp Long, java.util.Date, java.sql.Date, org.joda.time.DateTime 
timeuuid java.util.UUID 
uuid java.util.UUID 
varint BigInt, java.math.BigInteger 
*nullable values Option
Mapping Rows to Objects 
*Mapping rows to Scala Case Classes" 
*CQL underscore case column mapped to Scala camel case 
property" 
*Custom mapping functions (see docs) 
CREATE TABLE test.cars (" 
! id text PRIMARY KEY," 
! model text," 
! fuel_type text," 
! year int! 
); 
case class Vehicle(" 
! id: String," 
! model: String," 
! fuelType: String," 
! year: Int! 
)" 
" 
sc.cassandraTable[Vehicle]("test", "cars").toArray! 
//Array(Vehicle(KF334L, Ford Mondeo, Petrol, 2009)," 
// Vehicle(MT8787, Hyundai x35, Diesel, 2011) 
!
Server Side Data Selection 
*Reduce the amount of data transferred" 
*Selecting columns 
sc.cassandraTable("test", 
"users").select("username").toArray.foreach(println)" 
// CassandraRow{username: john} ! 
// CassandraRow{username: tom} 
" 
*Selecting rows (by clustering columns and/or secondary indexes) 
sc.cassandraTable("test", "cars").select("model").where("color = ?", 
"black").toArray.foreach(println)" 
// CassandraRow{model: Ford Mondeo}! 
// CassandraRow{model: Hyundai x35}
Spark SQL 
Compatible 
Spark SQL Streaming ML 
Spark (General execution engine) 
Graph 
Cassandra
Spark SQL 
*SQL query engine on top of Spark" 
*Hive compatible (JDBC, UDFs, types, metadata, etc.)" 
*Support for in-memory processing" 
*Pushdown of predicates to Cassandra when possible
Spark SQL Example 
" 
import com.datastax.spark.connector._" 
" 
// Connect to the Spark cluster" 
val conf = new SparkConf(true)...! 
val sc = new SparkContext(conf)" 
" 
// Create Cassandra SQL context! 
val cc = new CassandraSQLContext(sc)! 
" 
// Execute SQL query" 
val rdd = cc.sql("SELECT * FROM keyspace.table WHERE ...”)"
Analytics Workload Isolation 
Cassandra" 
+ Spark DC 
Cassandra" 
Only DC 
Online 
App 
Analytical 
App 
Mixed Load Cassandra Cluster
Analytics High Availability 
*Spark Workers run on all Cassandra nodes" 
*Workers are resilient by default" 
*First Spark node promoted as Spark Master" 
*Standby Master promoted on failure" 
*Master HA available in DataStax Enterprise 
Spark Master 
Spark Standby Master 
Spark Worker
Questions?

Mais conteúdo relacionado

Mais procurados

Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayAnalytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayMatthias Niehoff
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraRustam Aliyev
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkVictor Coustenoble
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & SparkMatthias Niehoff
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Spark Summit
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopPatricia Gorla
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...StampedeCon
 
Spark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousSpark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousRussell Spitzer
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureRussell Spitzer
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.Natalino Busa
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with CassandraJacek Lewandowski
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...DataStax
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REXdidmarin
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionPatrick McFadin
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Jon Haddad
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data prajods
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraJim Hatcher
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Russell Spitzer
 
Escape From Hadoop: Spark One Liners for C* Ops
Escape From Hadoop: Spark One Liners for C* OpsEscape From Hadoop: Spark One Liners for C* Ops
Escape From Hadoop: Spark One Liners for C* OpsRussell Spitzer
 

Mais procurados (20)

Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials DayAnalytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
Analytics with Cassandra, Spark & MLLib - Cassandra Essentials Day
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
Cassandra and Spark: Optimizing for Data Locality-(Russell Spitzer, DataStax)
 
Introduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and HadoopIntroduction to Real-Time Analytics with Cassandra and Hadoop
Introduction to Real-Time Analytics with Cassandra and Hadoop
 
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
Beyond the Query – Bringing Complex Access Patterns to NoSQL with DataStax - ...
 
Spark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 FuriousSpark and Cassandra 2 Fast 2 Furious
Spark and Cassandra 2 Fast 2 Furious
 
Spark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and FutureSpark Cassandra Connector: Past, Present, and Future
Spark Cassandra Connector: Past, Present, and Future
 
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.Hadoop + Cassandra: Fast queries on data lakes, and  wikipedia search tutorial.
Hadoop + Cassandra: Fast queries on data lakes, and wikipedia search tutorial.
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
 
Heuritech: Apache Spark REX
Heuritech: Apache Spark REXHeuritech: Apache Spark REX
Heuritech: Apache Spark REX
 
Time series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long versionTime series with Apache Cassandra - Long version
Time series with Apache Cassandra - Long version
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Intro to py spark (and cassandra)
Intro to py spark (and cassandra)Intro to py spark (and cassandra)
Intro to py spark (and cassandra)
 
Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data Apache Cassandra and Python for Analyzing Streaming Big Data
Apache Cassandra and Python for Analyzing Streaming Big Data
 
Using Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into CassandraUsing Spark to Load Oracle Data into Cassandra
Using Spark to Load Oracle Data into Cassandra
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
 
Escape From Hadoop: Spark One Liners for C* Ops
Escape From Hadoop: Spark One Liners for C* OpsEscape From Hadoop: Spark One Liners for C* Ops
Escape From Hadoop: Spark One Liners for C* Ops
 

Destaque

Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataPatrick McFadin
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache SparkJosef Adersberger
 
Transform & Analyze Time Series Data via Apache Spark @Windward
Transform & Analyze Time Series Data via Apache Spark @WindwardTransform & Analyze Time Series Data via Apache Spark @Windward
Transform & Analyze Time Series Data via Apache Spark @WindwardDemi Ben-Ari
 
Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)Pavel Hardak
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...DataStax
 
Data Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series DataData Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series DataDani Traphagen
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with SparkSandy Ryza
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache SparkQAware GmbH
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016Duyhai Doan
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldDean Wampler
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXKrishna Sankar
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Taejun Kim
 
Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaSpark Summit
 
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법NAVER D2
 
Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지Donam Kim
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraPatrick McFadin
 
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106SangHoon Lee
 

Destaque (20)

Apache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series dataApache cassandra & apache spark for time series data
Apache cassandra & apache spark for time series data
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Transform & Analyze Time Series Data via Apache Spark @Windward
Transform & Analyze Time Series Data via Apache Spark @WindwardTransform & Analyze Time Series Data via Apache Spark @Windward
Transform & Analyze Time Series Data via Apache Spark @Windward
 
Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)Building Scalable IoT Apps (QCon S-F)
Building Scalable IoT Apps (QCon S-F)
 
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
CQL performance with Apache Cassandra 3.0 (Aaron Morton, The Last Pickle) | C...
 
Data Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series DataData Modeling with Cassandra and Time Series Data
Data Modeling with Cassandra and Time Series Data
 
Spark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til PifflSpark Summit EU talk by Miha Pelko and Til Piffl
Spark Summit EU talk by Miha Pelko and Til Piffl
 
Time Series Analysis with Spark
Time Series Analysis with SparkTime Series Analysis with Spark
Time Series Analysis with Spark
 
Cassandra 3.0
Cassandra 3.0Cassandra 3.0
Cassandra 3.0
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Time Series Processing with Apache Spark
Time Series Processing with Apache SparkTime Series Processing with Apache Spark
Time Series Processing with Apache Spark
 
Cassandra 3 new features 2016
Cassandra 3 new features 2016Cassandra 3 new features 2016
Cassandra 3 new features 2016
 
Why Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data WorldWhy Scala Is Taking Over the Big Data World
Why Scala Is Taking Over the Big Data World
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기Spark & Zeppelin을 활용한 머신러닝 실전 적용기
Spark & Zeppelin을 활용한 머신러닝 실전 적용기
 
Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy Ryza
 
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
[D2 COMMUNITY] Spark User Group - 머신러닝 인공지능 기법
 
Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지Apache Spark 입문에서 머신러닝까지
Apache Spark 입문에서 머신러닝까지
 
Analyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and CassandraAnalyzing Time Series Data with Apache Spark and Cassandra
Analyzing Time Series Data with Apache Spark and Cassandra
 
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
Spark overview 이상훈(SK C&C)_스파크 사용자 모임_20141106
 

Semelhante a Lightning fast analytics with Spark and Cassandra

Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkTim Vincent
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionLightbend
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Summit
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra EnvironmentJim Hatcher
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDsDean Chen
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingDatabricks
 
Spark Programming
Spark ProgrammingSpark Programming
Spark ProgrammingTaewook Eom
 
Apache Spark for Library Developers with Erik Erlandson and William Benton
Apache Spark for Library Developers with Erik Erlandson and William BentonApache Spark for Library Developers with Erik Erlandson and William Benton
Apache Spark for Library Developers with Erik Erlandson and William BentonDatabricks
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Sparkjlacefie
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetupjlacefie
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developersChristopher Batey
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingGerger
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Аліна Шепшелей
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...Inhacking
 
Declarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformDeclarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformRadek Simko
 

Semelhante a Lightning fast analytics with Spark and Cassandra (20)

Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and Spark
 
A Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In ProductionA Tale of Two APIs: Using Spark Streaming In Production
A Tale of Two APIs: Using Spark Streaming In Production
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard Maas
 
5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment5 Ways to Use Spark to Enrich your Cassandra Environment
5 Ways to Use Spark to Enrich your Cassandra Environment
 
Apache Spark RDDs
Apache Spark RDDsApache Spark RDDs
Apache Spark RDDs
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Apache Spark for Library Developers with Erik Erlandson and William Benton
Apache Spark for Library Developers with Erik Erlandson and William BentonApache Spark for Library Developers with Erik Erlandson and William Benton
Apache Spark for Library Developers with Erik Erlandson and William Benton
 
Escape from Hadoop
Escape from HadoopEscape from Hadoop
Escape from Hadoop
 
An Introduction to Spark
An Introduction to SparkAn Introduction to Spark
An Introduction to Spark
 
An Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark MeetupAn Introduct to Spark - Atlanta Spark Meetup
An Introduct to Spark - Atlanta Spark Meetup
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
 
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
Vitalii Bondarenko HDinsight: spark. advanced in memory big-data analytics wi...
 
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
SE2016 BigData Vitalii Bondarenko "HD insight spark. Advanced in-memory Big D...
 
Declarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformDeclarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with Terraform
 

Mais de nickmbailey

Clojure at DataStax: The Long Road From Python to Clojure
Clojure at DataStax: The Long Road From Python to ClojureClojure at DataStax: The Long Road From Python to Clojure
Clojure at DataStax: The Long Road From Python to Clojurenickmbailey
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecturenickmbailey
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Sparknickmbailey
 
Cassandra and Clojure
Cassandra and ClojureCassandra and Clojure
Cassandra and Clojurenickmbailey
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basicsnickmbailey
 
An Introduction to Cassandra on Linux
An Introduction to Cassandra on LinuxAn Introduction to Cassandra on Linux
An Introduction to Cassandra on Linuxnickmbailey
 
Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modelingnickmbailey
 
CFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for HadoopCFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for Hadoopnickmbailey
 
Clojure and the Web
Clojure and the WebClojure and the Web
Clojure and the Webnickmbailey
 

Mais de nickmbailey (9)

Clojure at DataStax: The Long Road From Python to Clojure
Clojure at DataStax: The Long Road From Python to ClojureClojure at DataStax: The Long Road From Python to Clojure
Clojure at DataStax: The Long Road From Python to Clojure
 
Introduction to Cassandra Architecture
Introduction to Cassandra ArchitectureIntroduction to Cassandra Architecture
Introduction to Cassandra Architecture
 
Cassandra and Spark
Cassandra and SparkCassandra and Spark
Cassandra and Spark
 
Cassandra and Clojure
Cassandra and ClojureCassandra and Clojure
Cassandra and Clojure
 
Introduction to Cassandra Basics
Introduction to Cassandra BasicsIntroduction to Cassandra Basics
Introduction to Cassandra Basics
 
An Introduction to Cassandra on Linux
An Introduction to Cassandra on LinuxAn Introduction to Cassandra on Linux
An Introduction to Cassandra on Linux
 
Introduction to Cassandra and Data Modeling
Introduction to Cassandra and Data ModelingIntroduction to Cassandra and Data Modeling
Introduction to Cassandra and Data Modeling
 
CFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for HadoopCFS: Cassandra backed storage for Hadoop
CFS: Cassandra backed storage for Hadoop
 
Clojure and the Web
Clojure and the WebClojure and the Web
Clojure and the Web
 

Último

%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfproinshot.com
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsBert Jan Schrijver
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...Nitya salvi
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 

Último (20)

%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Generic or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisionsGeneric or specific? Making sensible software design decisions
Generic or specific? Making sensible software design decisions
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 

Lightning fast analytics with Spark and Cassandra

  • 1. Lightning-fast analytics with Spark and Cassandra Nick Bailey @nickmbailey ©2014 DataStax Confidential. Do not distribute without consent. 1
  • 2. What is Spark? *Apache Project since 2010" *Fast" * 10x-100x faster than Hadoop MapReduce" * In-memory storage" * Single JVM process per node" *Easy" *Rich Scala, Java and Python APIs" * 2x-5x less code" * Interactive shell Analytic Analytic Search
  • 4. API map" filter" groupBy sort" union" join" leftOuterJoin rightOuterJoin reduce" count" fold" reduceByKey groupByKey cogroup cross" zip sample" take" first partitionBy mapWith pipe" save ...
  • 5. API *Resilient Distributed Datasets (RDD)" *Collections of objects spread across a cluster, stored in RAM or on Disk" * Built through parallel transformations" * Automatically rebuilt on failure" " *Operations" * Transformations (e.g. map, filter, groupBy)" * Actions (e.g. count, collect, save)
  • 6. Operator Graph: Optimization and Fault Tolerance join groupBy filter Stage 3 Stage 1 Stage 2 A: B: C: D: E: F: map = RDD = Cached partition
  • 7. Fast Running Time (s) 4000 3000 2000 1000 0 1 5 10 20 30 Number of Iterations 110 sec / iteration Hadoop Spark first iteration 80 sec" further iterations 1 sec * Logistic Regression Performance"
  • 8. Why Spark on Cassandra? *Data model independent queries" *Cross-table operations (JOIN, UNION, etc.)" *Complex analytics (e.g. machine learning)" *Data transformation, aggregation, etc." *Stream processing
  • 9. How to Spark on Cassandra? *DataStax Cassandra Spark driver" *Open source: https://github.com/datastax/spark-cassandra-connector" *Compatible with" * Spark 0.9+" *Cassandra 2.0+" *DataStax Enterprise 4.5+
  • 10. Cassandra Spark Driver *Cassandra tables exposed as Spark RDDs" *Read from and write to Cassandra" *Mapping of C* tables and rows to Scala objects" *All Cassandra types supported and converted to Scala types" *Server side data selection" *Spark Streaming support" *Scala and Java support
  • 11. Connecting to Cassandra // Import Cassandra-specific functions on SparkContext and RDD objects" import com.datastax.driver.spark._" " " // Spark connection options" val conf = new SparkConf(true)" ! ! .setMaster("spark://192.168.123.10:7077")" ! ! .setAppName("cassandra-demo")" .set(“cassandra.connection.host", "192.168.123.10") // initial contact! .set("cassandra.username", "cassandra")! .set("cassandra.password", "cassandra") " " val sc = new SparkContext(conf)
  • 12. Accessing Data CREATE TABLE test.words (word text PRIMARY KEY, count int);" " INSERT INTO test.words (word, count) VALUES ('bar', 30);" INSERT INTO test.words (word, count) VALUES ('foo', 20); *Accessing table above as RDD: // Use table as RDD" val rdd = sc.cassandraTable("test", "words")" // rdd: CassandraRDD[CassandraRow] = CassandraRDD[0]" " rdd.toArray.foreach(println)" // CassandraRow[word: bar, count: 30]! // CassandraRow[word: foo, count: 20]" " rdd.columnNames // Stream(word, count) " rdd.size // 2" " val firstRow = rdd.first // firstRow: CassandraRow = CassandraRow[word: bar, count: 30]! firstRow.getInt("count") // Int = 30
  • 13. Saving Data val newRdd = sc.parallelize(Seq(("cat", 40), ("fox", 50)))" // newRdd: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[2]" " newRdd.saveToCassandra("test", "words", Seq("word", "count")) *RDD above saved to Cassandra: SELECT * FROM test.words;" " word | count" ------+-------" bar | 30" foo | 20" cat | 40" fox | 50" " (4 rows)
  • 14. Type Mapping CQL Type Scala Type ascii String bigint Long boolean Boolean counter Long decimal BigDecimal, java.math.BigDecimal double Double float Float inet java.net.InetAddress int Int list Vector, List, Iterable, Seq, IndexedSeq, java.util.List map Map, TreeMap, java.util.HashMap set Set, TreeSet, java.util.HashSet text, varchar String timestamp Long, java.util.Date, java.sql.Date, org.joda.time.DateTime timeuuid java.util.UUID uuid java.util.UUID varint BigInt, java.math.BigInteger *nullable values Option
  • 15. Mapping Rows to Objects *Mapping rows to Scala Case Classes" *CQL underscore case column mapped to Scala camel case property" *Custom mapping functions (see docs) CREATE TABLE test.cars (" ! id text PRIMARY KEY," ! model text," ! fuel_type text," ! year int! ); case class Vehicle(" ! id: String," ! model: String," ! fuelType: String," ! year: Int! )" " sc.cassandraTable[Vehicle]("test", "cars").toArray! //Array(Vehicle(KF334L, Ford Mondeo, Petrol, 2009)," // Vehicle(MT8787, Hyundai x35, Diesel, 2011) !
  • 16. Server Side Data Selection *Reduce the amount of data transferred" *Selecting columns sc.cassandraTable("test", "users").select("username").toArray.foreach(println)" // CassandraRow{username: john} ! // CassandraRow{username: tom} " *Selecting rows (by clustering columns and/or secondary indexes) sc.cassandraTable("test", "cars").select("model").where("color = ?", "black").toArray.foreach(println)" // CassandraRow{model: Ford Mondeo}! // CassandraRow{model: Hyundai x35}
  • 17. Spark SQL Compatible Spark SQL Streaming ML Spark (General execution engine) Graph Cassandra
  • 18. Spark SQL *SQL query engine on top of Spark" *Hive compatible (JDBC, UDFs, types, metadata, etc.)" *Support for in-memory processing" *Pushdown of predicates to Cassandra when possible
  • 19. Spark SQL Example " import com.datastax.spark.connector._" " // Connect to the Spark cluster" val conf = new SparkConf(true)...! val sc = new SparkContext(conf)" " // Create Cassandra SQL context! val cc = new CassandraSQLContext(sc)! " // Execute SQL query" val rdd = cc.sql("SELECT * FROM keyspace.table WHERE ...”)"
  • 20. Analytics Workload Isolation Cassandra" + Spark DC Cassandra" Only DC Online App Analytical App Mixed Load Cassandra Cluster
  • 21. Analytics High Availability *Spark Workers run on all Cassandra nodes" *Workers are resilient by default" *First Spark node promoted as Spark Master" *Standby Master promoted on failure" *Master HA available in DataStax Enterprise Spark Master Spark Standby Master Spark Worker