Escape From Hadoop: 
Spark One Liners for C* Ops 
Kurt Russell Spitzer 
DataStax
Who am I? 
• Bioinformatics Ph.D. from UCSF 
• Works on the integration of Cassandra (C*) with Hadoop, Solr, and SPARK!! 
• Spends a lot of time spinning up clusters on EC2, GCE, Azure, … 
  http://www.datastax.com/dev/blog/testing-cassandra-1000-nodes-at-a-time 
• Developing new ways to make sure that C* scales
Why escape from Hadoop? 
HADOOP 
Many Moving Pieces 
Map Reduce 
Single Points of Failure 
Lots of Overhead 
And there is a way out!
Spark Provides a Simple and Efficient 
Framework for Distributed Computations 

Node Roles: 2 
In-Memory Caching: Yes! 
Generic DAG Execution: Yes! 
Great Abstraction For Datasets? RDD! 

[Diagram: a Spark Master coordinating Spark Workers, each running Spark Executors over a Resilient Distributed Dataset]
Spark is Compatible with HDFS, 
Parquet, CSVs, … 
AND 
APACHE CASSANDRA
Apache Cassandra is a Linearly Scaling 
and Fault Tolerant NoSQL Database 
Linearly Scaling: 
The power of the database 
increases linearly with the 
number of machines 
2x machines = 2x throughput 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 
Fault Tolerant: 
Nodes down != Database Down 
Datacenter down != Database Down
Apache Cassandra 
Architecture is Very Simple 

Node Roles: 1 
Replication: Tunable 
Consistency: Tunable 

[Diagram: a client connected to a ring of identical C* nodes]
DataStax OSS Connector 
Spark to Cassandra 
https://github.com/datastax/spark-cassandra-connector 

Cassandra           Spark 
Keyspace, Table  →  RDD[CassandraRow] / RDD[Tuples] 

Bundled and Supported with DSE 4.5!
Spark Cassandra Connector uses the 
DataStax Java Driver to Read from and 
Write to C* 

Each Spark Executor maintains a 
connection to the C* cluster through 
the DataStax Java Driver. 

RDDs are read into different splits based on 
sets of tokens (Tokens 1-1000, Tokens 1001-2000, …), 
together covering the full token range.
Co-locate Spark and C* for 
Best Performance 

Running Spark Workers on 
the same nodes as your 
C* Cluster will save 
network hops when 
reading and writing. 

[Diagram: Spark Workers co-located with C* nodes, plus a separate Spark Master]
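Concretely, pointing the Spark shell or an application at a co-located cluster comes down to one connector property. A minimal configuration sketch (the application name and host value here are placeholders, not from the deck):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Point the connector at a local C* node; with co-located Spark
// Workers, partitions can be scheduled on the nodes that own the data.
val conf = new SparkConf()
  .setAppName("colocated-example")                      // placeholder name
  .set("spark.cassandra.connection.host", "127.0.0.1")  // placeholder host

val sc = new SparkContext(conf)
```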
Setting up C* and Spark 
DSE > 4.5.0 
Just start your nodes with 
dse cassandra -k 
Apache Cassandra 
Follow the excellent guide by Al Tobey 
http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html
We need a Distributed System 
For Analytics and Batch Jobs 
But it doesn’t have to be complicated!
Even count needs to be 
distributed 
Ask me to write a Map Reduce 
for word count, I dare you. 
You could make this easier by adding yet another 
technology to your Hadoop stack (Hive, Pig, Impala), or 
we could just do one-liners in the Spark shell.
Basics: Getting a Table and 
Counting 

CREATE KEYSPACE newyork WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
use newyork;
CREATE TABLE presidentlocations ( time int, location text, PRIMARY KEY (time) );
INSERT INTO presidentlocations (time, location) VALUES ( 1, 'White House' );
INSERT INTO presidentlocations (time, location) VALUES ( 2, 'White House' );
INSERT INTO presidentlocations (time, location) VALUES ( 3, 'White House' );
INSERT INTO presidentlocations (time, location) VALUES ( 4, 'White House' );
INSERT INTO presidentlocations (time, location) VALUES ( 5, 'Air Force 1' );
INSERT INTO presidentlocations (time, location) VALUES ( 6, 'Air Force 1' );
INSERT INTO presidentlocations (time, location) VALUES ( 7, 'Air Force 1' );
INSERT INTO presidentlocations (time, location) VALUES ( 8, 'NYC' );
INSERT INTO presidentlocations (time, location) VALUES ( 9, 'NYC' );
INSERT INTO presidentlocations (time, location) VALUES ( 10, 'NYC' );

scala> sc.cassandraTable("newyork","presidentlocations")
        .count
res3: Long = 10
Basics: take() and toArray 

scala> sc.cassandraTable("newyork","presidentlocations").take(1)

res2: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{time: 9, location: NYC})

scala> sc.cassandraTable("newyork","presidentlocations").toArray

res3: Array[com.datastax.spark.connector.CassandraRow] = Array(
	CassandraRow{time: 9, location: NYC}, 
	CassandraRow{time: 3, location: White House}, 
	…,
	CassandraRow{time: 6, location: Air Force 1})
Basics: Getting Row Values 
out of a CassandraRow 

scala> sc.cassandraTable("newyork","presidentlocations").take(1)(0).get[Int]("time")

res5: Int = 9

get[Int], get[String], …, get[Any] 
Got null? get[Option[Int]] 
http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkSupportedTypes.html
Copy A Table 
Say we want to restructure our table or add a new column? 

CREATE TABLE characterlocations (
	time int, 
	character text, 
	location text, 
	PRIMARY KEY (time,character)
);

sc.cassandraTable("newyork","presidentlocations")
	.map( row => (
			row.get[Int]("time"),
			"president", 
			row.get[String]("location")
	)).saveToCassandra("newyork","characterlocations")

cqlsh:newyork> SELECT * FROM characterlocations;

 time | character | location
------+-----------+-------------
    5 | president | Air Force 1
   10 | president |         NYC
…
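The reshaping step between the read and the save is ordinary Scala: each row becomes a (time, character, location) tuple. A minimal local sketch of the same transformation, using plain tuples in place of CassandraRows (no cluster needed):

```scala
// Local stand-ins for CassandraRow values: (time, location)
val presidentLocations = List((1, "White House"), (5, "Air Force 1"), (10, "NYC"))

// Same shape as the connector one-liner: inject the constant
// "president" column to produce (time, character, location) rows.
val characterLocations = presidentLocations.map { case (time, location) =>
  (time, "president", location)
}
```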
Filter a Table 
What if we want to filter based on a 
non-clustering key column? 

scala> sc.cassandraTable("newyork","presidentlocations")
	.filter( _.get[Int]("time") > 7 )
	.toArray

res9: Array[com.datastax.spark.connector.CassandraRow] = 
Array(
	CassandraRow{time: 9, location: NYC}, 
	CassandraRow{time: 10, location: NYC}, 
	CassandraRow{time: 8, location: NYC}
)
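The `_.get[Int]("time") > 7` predicate is plain Scala: `_` is an anonymous parameter standing in for each row, and the filter runs in Spark after the table is read. A local sketch of the same filter over (time, location) tuples (no cluster needed):

```scala
val rows = List((1, "White House"), (8, "NYC"), (9, "NYC"), (10, "NYC"))

// '_' is the anonymous row parameter, as in the connector one-liner;
// here _._1 plays the role of get[Int]("time").
val recent = rows.filter(_._1 > 7)
```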
Backfill a Table with a 
Different Key! 

If we actually want quick access to 
timelines, we need a C* table with 
a different structure. 

CREATE TABLE timelines (
  time int,
  character text,
  location text,
  PRIMARY KEY ((character), time)
);

sc.cassandraTable("newyork","characterlocations")
	.saveToCassandra("newyork","timelines")

cqlsh:newyork> select * from timelines;

 character | time | location
-----------+------+-------------
 president |    1 | White House
 president |    2 | White House
 president |    3 | White House
 president |    4 | White House
 president |    5 | Air Force 1
 president |    6 | Air Force 1
 president |    7 | Air Force 1
 president |    8 |         NYC
 president |    9 |         NYC
 president |   10 |         NYC
Import a CSV 
I have some data in another source which I 
could really use in my Cassandra table 

sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv")
	.map(_.split(","))
	.map( line => 
		(line(0),line(1),line(2)))
	.saveToCassandra("newyork","timelines")

cqlsh:newyork> select * from timelines where character = 'plissken';

 character | time | location
-----------+------+-----------------
  plissken |    1 | Federal Reserve
  plissken |    2 | Federal Reserve
  plissken |    3 | Federal Reserve
  plissken |    4 |           Court
  plissken |    5 |           Court
  plissken |    6 |           Court
  plissken |    7 |           Court
  plissken |    8 |  Stealth Glider
  plissken |    9 |             NYC
  plissken |   10 |             NYC
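The CSV pipeline is again ordinary Scala before the save: split each line on commas, then index into the resulting array. A minimal local sketch of the parsing stage, with hypothetical sample lines (no cluster needed):

```scala
// Hypothetical sample of PlisskenLocations.csv contents
val lines = List(
  "plissken,1,Federal Reserve",
  "plissken,8,Stealth Glider"
)

// Same two map steps as the one-liner: split, then pick out columns.
val parsed = lines
  .map(_.split(","))
  .map(a => (a(0), a(1), a(2)))
```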
Perform a Join with MySQL 
Maybe a little more than one line … 

MySQL table "quotes" in "escape_from_ny": 

import java.sql._
import org.apache.spark.rdd.JdbcRDD
Class.forName("com.mysql.jdbc.Driver").newInstance() // Connector/J added to Spark shell classpath
val quotes = new JdbcRDD(
	sc, 
	() => {
		DriverManager.getConnection("jdbc:mysql://localhost/escape_from_ny?user=root")}, 
	"SELECT * FROM quotes WHERE ? <= ID and ID <= ?",
	0,
	100,
	5, 
	(r: ResultSet) => {
		(r.getInt(2),r.getString(3))
	}
)

quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23
quotes.join(
	sc.cassandraTable("newyork","timelines")
	.filter( _.get[String]("character") == "plissken")
	.map( row => (row.get[Int]("time"),row.get[String]("location"))))
	.take(1)
	.foreach(println)

(5,
	(Bob Hauk: There was an accident. 
		About an hour ago, a small jet went down inside New York City. 
		The President was on board.
	 Snake Plissken: The president of what?,
	Court)
)

Both sides need to be in the form of RDD[K,V] for the join.
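The join keys on the first element of each pair, which is why both sides are mapped to RDD[K,V] first. A local sketch of the same inner-join semantics over plain Seqs, with abbreviated sample data (no cluster or MySQL needed):

```scala
// Stand-ins for the two pair RDDs, keyed by time.
val quotes    = Seq((5, "Bob Hauk: There was an accident..."))
val locations = Seq((5, "Court"), (9, "NYC"))

// Inner-join semantics of RDD.join: keep keys present on both
// sides, pairing up the values as (key, (left, right)).
val joined = for {
  (t, quote)     <- quotes
  (t2, location) <- locations
  if t == t2
} yield (t, (quote, location))
```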
Easy Objects with Case 
Classes 
We have the technology to make this even easier! 

case class timelineRow (character:String, time:Int, location:String)
sc.cassandraTable[timelineRow]("newyork","timelines")
	.filter( _.character == "plissken")
	.filter( _.time == 8)
	.toArray
res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider))
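Once rows are mapped into a case class, the filters become plain field accesses instead of `get[T]("column")` lookups. A local sketch with the same `timelineRow` class over a List (sample data assumed, no cluster needed):

```scala
case class timelineRow(character: String, time: Int, location: String)

val timelines = List(
  timelineRow("plissken", 7, "Court"),
  timelineRow("plissken", 8, "Stealth Glider"),
  timelineRow("president", 8, "NYC")
)

// Fields are accessed directly on the case class.
val result = timelines
  .filter(_.character == "plissken")
  .filter(_.time == 8)
```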
A Map Reduce for Word 
Count … 

scala> sc.cassandraTable("newyork","presidentlocations")
	.map( _.get[String]("location") )
	.flatMap( _.split(" "))
	.map( (_,1))
	.reduceByKey( _ + _ )
	.toArray
res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3))
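The same combinator chain runs on ordinary Scala collections, which makes the one-liner easy to sanity-check locally; `reduceByKey(_ + _)` becomes a `groupBy` plus a per-group size (local sketch, no cluster needed):

```scala
// Local stand-in for the location column of presidentlocations.
val locations = List(
  "White House", "White House", "White House", "White House",
  "Air Force 1", "Air Force 1", "Air Force 1",
  "NYC", "NYC", "NYC"
)

// flatMap/split as in the Spark one-liner; groupBy + size plays
// the role of map((_, 1)).reduceByKey(_ + _).
val wordCounts = locations
  .flatMap(_.split(" "))
  .groupBy(identity)
  .map { case (word, occurrences) => (word, occurrences.size) }
```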
Stand Alone App Example 
https://github.com/RussellSpitzer/spark-cassandra-csv 

Car, Model, Color 
Dodge, Caravan, Red 
Ford, F150, Black 
Toyota, Prius, Green 

[Diagram: a CSV read by Spark, mapped through the Spark Cassandra Connector's column mapping to an RDD[CassandraRow], and saved to a FavoriteCars table in Cassandra]
Thanks for listening! 
There is plenty more we can do with Spark but … 
Questions?
Getting started with Cassandra? 
DataStax Academy offers free online Cassandra training. 
Planet Cassandra has resources for learning the basics, from 'Try Cassandra' tutorials to in-depth 
language and migration pages. 
Find a way to contribute back to the community: talk at a meetup, or share your story on 
PlanetCassandra.org. 
Need help? Get questions answered with Planet Cassandra's free virtual office hours, running weekly. 
Email us: Community@DataStax.com 
In production? Tweet us: @PlanetCassandra 
Thanks for coming to the meetup!
Thanks for your Time and Come to C* Summit! 
SEPTEMBER 10 - 11, 2014 | SAN FRANCISCO, CALIF. | THE WESTIN ST. FRANCIS HOTEL 
Cassandra Summit Link

Spark Cassandra Connector Dataframes
 
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & CassandraEscape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
Escape from Hadoop: Ultra Fast Data Analysis with Spark & Cassandra
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
 
Analytics with Cassandra & Spark
Analytics with Cassandra & SparkAnalytics with Cassandra & Spark
Analytics with Cassandra & Spark
 
Apache Spark and DataStax Enablement
Apache Spark and DataStax EnablementApache Spark and DataStax Enablement
Apache Spark and DataStax Enablement
 
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
Lessons from Cassandra & Spark (Matthias Niehoff & Stephan Kepser, codecentri...
 
Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Spark Streaming with Cassandra
Spark Streaming with CassandraSpark Streaming with Cassandra
Spark Streaming with Cassandra
 
Real time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesosReal time data pipeline with spark streaming and cassandra with mesos
Real time data pipeline with spark streaming and cassandra with mesos
 
Apache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fireApache cassandra and spark. you got the the lighter, let's start the fire
Apache cassandra and spark. you got the the lighter, let's start the fire
 
OLAP with Cassandra and Spark
OLAP with Cassandra and SparkOLAP with Cassandra and Spark
OLAP with Cassandra and Spark
 
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 ParisReal time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
Real time data processing with spark & cassandra @ NoSQLMatters 2015 Paris
 
An Introduction to time series with Team Apache
An Introduction to time series with Team ApacheAn Introduction to time series with Team Apache
An Introduction to time series with Team Apache
 
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
Analyzing Time-Series Data with Apache Spark and Cassandra - StampedeCon 2016
 
PySpark in practice slides
PySpark in practice slidesPySpark in practice slides
PySpark in practice slides
 
Owning time series with team apache Strata San Jose 2015
Owning time series with team apache   Strata San Jose 2015Owning time series with team apache   Strata San Jose 2015
Owning time series with team apache Strata San Jose 2015
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 

Destaque

Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and CassandraNatalino Busa
 
SGS Tekniks - Best Electronic Contract Manufacturing Company in India
SGS Tekniks - Best Electronic Contract Manufacturing Company in IndiaSGS Tekniks - Best Electronic Contract Manufacturing Company in India
SGS Tekniks - Best Electronic Contract Manufacturing Company in IndiaSGS Tekniks
 
Earned value management lecture 2009e my31
Earned value management lecture 2009e my31Earned value management lecture 2009e my31
Earned value management lecture 2009e my31rongo620
 
Focusing on the Threats to the Detriment of the Vulnerabilities
Focusing on the Threats to the Detriment of the VulnerabilitiesFocusing on the Threats to the Detriment of the Vulnerabilities
Focusing on the Threats to the Detriment of the VulnerabilitiesRoger Johnston
 
Interoperability in a Highly Decentralised Country- Lessons Learned
Interoperability in a Highly Decentralised Country- Lessons LearnedInteroperability in a Highly Decentralised Country- Lessons Learned
Interoperability in a Highly Decentralised Country- Lessons LearnedPlan de Calidad para el SNS
 
A good horse runs even at the shadow of the whip
A good horse runs even at the shadow of the whipA good horse runs even at the shadow of the whip
A good horse runs even at the shadow of the whipRhea Myers
 
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIK
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIKRAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIK
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIKM Handoko
 
E book the-art_of_internet_dating
E book the-art_of_internet_datingE book the-art_of_internet_dating
E book the-art_of_internet_datingr_rahulsingh1988
 
Rapport Doing Business 2015
Rapport Doing Business 2015Rapport Doing Business 2015
Rapport Doing Business 2015Franck Dasilva
 
A Little Pumpkin Likes Reading Books
A Little Pumpkin Likes Reading BooksA Little Pumpkin Likes Reading Books
A Little Pumpkin Likes Reading BooksPEPY Empowering Youth
 
AmyandSusan
AmyandSusanAmyandSusan
AmyandSusansgrobins
 
Gutell 091.imb.2004.13.495
Gutell 091.imb.2004.13.495Gutell 091.imb.2004.13.495
Gutell 091.imb.2004.13.495Robin Gutell
 
المعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوالمعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوPalestinian Business Forum
 
Prasanth Kumar Nadh Dehydrogenase Subunit 1
Prasanth Kumar Nadh Dehydrogenase Subunit 1Prasanth Kumar Nadh Dehydrogenase Subunit 1
Prasanth Kumar Nadh Dehydrogenase Subunit 1Prasanthperceptron
 
P pt keys for good and happy life.
P pt keys for good and happy life.P pt keys for good and happy life.
P pt keys for good and happy life.Rajasekhar Dasari
 

Destaque (19)

Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
Real-Time Anomaly Detection  with Spark MLlib, Akka and  CassandraReal-Time Anomaly Detection  with Spark MLlib, Akka and  Cassandra
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra
 
Wais i
Wais   iWais   i
Wais i
 
SGS Tekniks - Best Electronic Contract Manufacturing Company in India
SGS Tekniks - Best Electronic Contract Manufacturing Company in IndiaSGS Tekniks - Best Electronic Contract Manufacturing Company in India
SGS Tekniks - Best Electronic Contract Manufacturing Company in India
 
No tlp Polisi
No tlp PolisiNo tlp Polisi
No tlp Polisi
 
Earned value management lecture 2009e my31
Earned value management lecture 2009e my31Earned value management lecture 2009e my31
Earned value management lecture 2009e my31
 
Focusing on the Threats to the Detriment of the Vulnerabilities
Focusing on the Threats to the Detriment of the VulnerabilitiesFocusing on the Threats to the Detriment of the Vulnerabilities
Focusing on the Threats to the Detriment of the Vulnerabilities
 
Interoperability in a Highly Decentralised Country- Lessons Learned
Interoperability in a Highly Decentralised Country- Lessons LearnedInteroperability in a Highly Decentralised Country- Lessons Learned
Interoperability in a Highly Decentralised Country- Lessons Learned
 
A good horse runs even at the shadow of the whip
A good horse runs even at the shadow of the whipA good horse runs even at the shadow of the whip
A good horse runs even at the shadow of the whip
 
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIK
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIKRAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIK
RAKOR PERSIAPAN MUSRENBANG TAHUN 2015 - BAPPEDA GRESIK
 
E book the-art_of_internet_dating
E book the-art_of_internet_datingE book the-art_of_internet_dating
E book the-art_of_internet_dating
 
Rapport Doing Business 2015
Rapport Doing Business 2015Rapport Doing Business 2015
Rapport Doing Business 2015
 
A Little Pumpkin Likes Reading Books
A Little Pumpkin Likes Reading BooksA Little Pumpkin Likes Reading Books
A Little Pumpkin Likes Reading Books
 
AmyandSusan
AmyandSusanAmyandSusan
AmyandSusan
 
Gutell 091.imb.2004.13.495
Gutell 091.imb.2004.13.495Gutell 091.imb.2004.13.495
Gutell 091.imb.2004.13.495
 
المعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيوالمعارض والمؤتمرات شهر يونيو
المعارض والمؤتمرات شهر يونيو
 
Sxsf
SxsfSxsf
Sxsf
 
Gscm1
Gscm1Gscm1
Gscm1
 
Prasanth Kumar Nadh Dehydrogenase Subunit 1
Prasanth Kumar Nadh Dehydrogenase Subunit 1Prasanth Kumar Nadh Dehydrogenase Subunit 1
Prasanth Kumar Nadh Dehydrogenase Subunit 1
 
P pt keys for good and happy life.
P pt keys for good and happy life.P pt keys for good and happy life.
P pt keys for good and happy life.
 

Semelhante a Escape from Hadoop

Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandranickmbailey
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...Duyhai Doan
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Data Con LA
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to SparkKyle Burke
 
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkCassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkDataStax Academy
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developersChristopher Batey
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkVictor Coustenoble
 
Apache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemApache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemAdarsh Pannu
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Helena Edelson
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupFrens Jan Rumph
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
Spark Programming
Spark ProgrammingSpark Programming
Spark ProgrammingTaewook Eom
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkTim Vincent
 
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...DataStax Academy
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Holden Karau
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyDataStax Academy
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 

Semelhante a Escape from Hadoop (20)

Lightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and CassandraLightning fast analytics with Spark and Cassandra
Lightning fast analytics with Spark and Cassandra
 
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
Cassandra and Spark, closing the gap between no sql and analytics   codemotio...Cassandra and Spark, closing the gap between no sql and analytics   codemotio...
Cassandra and Spark, closing the gap between no sql and analytics codemotio...
 
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
Big Data Day LA 2015 - Sparking up your Cassandra Cluster- Analytics made Awe...
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Intro to Spark
Intro to SparkIntro to Spark
Intro to Spark
 
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and SparkCassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
Cassandra Summit 2014: Interactive OLAP Queries using Apache Cassandra and Spark
 
3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers3 Dundee-Spark Overview for C* developers
3 Dundee-Spark Overview for C* developers
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Lightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and SparkLightning fast analytics with Cassandra and Spark
Lightning fast analytics with Cassandra and Spark
 
Apache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating SystemApache Spark: The Analytics Operating System
Apache Spark: The Analytics Operating System
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
PySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark MeetupPySpark Cassandra - Amsterdam Spark Meetup
PySpark Cassandra - Amsterdam Spark Meetup
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark Programming
Spark ProgrammingSpark Programming
Spark Programming
 
Lightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and SparkLightning Fast Analytics with Cassandra and Spark
Lightning Fast Analytics with Cassandra and Spark
 
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
Cassandra Day Denver 2014: Feelin' the Flow: Analyzing Data with Spark and Ca...
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 
Managing Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al TobeyManaging Cassandra at Scale by Al Tobey
Managing Cassandra at Scale by Al Tobey
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 

Mais de DataStax Academy

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftDataStax Academy
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 

Mais de DataStax Academy (20)

Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craftForrester CXNYC 2017 - Delivering great real-time cx is a true craft
Forrester CXNYC 2017 - Delivering great real-time cx is a true craft
 
Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 

Último

Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Handwritten Text Recognition for manuscripts and early printed texts
Escape from Hadoop

  • 1. Escape From Hadoop: Spark One Liners for C* Ops Kurt Russell Spitzer DataStax
  • 2. Who am I? • Bioinformatics Ph.D from UCSF • Works on the integration of Cassandra (C*) with Hadoop, Solr, and SPARK!! • Spends a lot of time spinning up clusters on EC2, GCE, Azure, … http://www.datastax.com/dev/ blog/testing-cassandra-1000- nodes-at-a-time • Developing new ways to make sure that C* Scales
  • 3. Why escape from Hadoop? HADOOP Many Moving Pieces Map Reduce Single Points of Failure Lots of Overhead And there is a way out!
  • 4. Spark Provides a Simple and Efficient framework for Distributed Computations Node Roles 2 In Memory Caching Yes! Generic DAG Execution Yes! Great Abstraction For Datasets? RDD! Spark Worker Spark Worker Spark Master Spark Worker Resilient Distributed Dataset Spark Executor
  • 5. Spark is Compatible with HDFS, Parquet, CSVs, ….
  • 6. Spark is Compatible with HDFS, Parquet, CSVs, …. AND APACHE CASSANDRA Apache Cassandra
  • 7. Apache Cassandra is a Linearly Scaling and Fault Tolerant noSQL Database Linearly Scaling: The power of the database increases linearly with the number of machines 2x machines = 2x throughput http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html Fault Tolerant: Nodes down != Database Down Datacenter down != Database Down
  • 8. Apache Cassandra Architecture is Very Simple Node Roles 1 Replication Tunable Replication Consistency Tunable C* C* C* C* Client
  • 9. DataStax OSS Connector Spark to Cassandra https://github.com/datastax/spark-cassandra-connector Cassandra Spark Keyspace Table RDD[CassandraRow] RDD[Tuples] Bundled and Supported with DSE 4.5!
  • 10. Spark Cassandra Connector uses the DataStax Java Driver to Read from and Write to C* Spark C* Full Token Range Each Executor Maintains a connection to the C* Cluster Spark Executor DataStax Java Driver Tokens 1001 -2000 Tokens 1-1000 Tokens … RDD’s read into different splits based on sets of tokens
  • 11. Co-locate Spark and C* for Best Performance C* C* C* Spark Worker C* Spark Worker Spark Master Spark Running Spark Workers Worker on the same nodes as your C* Cluster will save network hops when reading and writing
  • 12. Setting up C* and Spark DSE > 4.5.0 Just start your nodes with dse cassandra -k Apache Cassandra Follow the excellent guide by Al Tobey http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html
  • 13. We need a Distributed System For Analytics and Batch Jobs But it doesn’t have to be complicated!
  • 14. Even count needs to be distributed Ask me to write a MapReduce for word count, I dare you. You could make this easier by adding yet another technology to your Hadoop stack (Hive, Pig, Impala), or we could just do one-liners in the Spark shell.
  • 15. Basics: Getting a Table and Counting CREATE KEYSPACE newyork WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 }; use newyork; CREATE TABLE presidentlocations ( time int, location text, PRIMARY KEY (time) ); INSERT INTO presidentlocations (time, location) VALUES ( 1, 'White House' ); INSERT INTO presidentlocations (time, location) VALUES ( 2, 'White House' ); INSERT INTO presidentlocations (time, location) VALUES ( 3, 'White House' ); INSERT INTO presidentlocations (time, location) VALUES ( 4, 'White House' ); INSERT INTO presidentlocations (time, location) VALUES ( 5, 'Air Force 1' ); INSERT INTO presidentlocations (time, location) VALUES ( 6, 'Air Force 1' ); INSERT INTO presidentlocations (time, location) VALUES ( 7, 'Air Force 1' ); INSERT INTO presidentlocations (time, location) VALUES ( 8, 'NYC' ); INSERT INTO presidentlocations (time, location) VALUES ( 9, 'NYC' ); INSERT INTO presidentlocations (time, location) VALUES ( 10, 'NYC' );
  • 16. Basics: Getting a Table and Counting (same keyspace, table, and inserts as above) scala> sc.cassandraTable("newyork","presidentlocations") cassandraTable
  • 17. Basics: Getting a Table and Counting (same keyspace, table, and inserts as above) scala> sc.cassandraTable("newyork","presidentlocations").count res3: Long = 10 cassandraTable count 10
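The shell one-liners above run in the DSE Spark shell, where `sc` already has the connector loaded. As a hedged sketch (the config key and import reflect the 1.x-era connector; host address is illustrative), the same count as a standalone app looks roughly like this:

```scala
// Sketch of a standalone app equivalent to the shell count above.
// Assumes spark-cassandra-connector is on the classpath.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

object CountLocations {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("count-locations")
      .set("spark.cassandra.connection.host", "127.0.0.1") // your C* node
    val sc = new SparkContext(conf)
    // Same pipeline as the shell example: read the table, count the rows
    val n = sc.cassandraTable("newyork", "presidentlocations").count
    println(s"rows: $n")
    sc.stop()
  }
}
```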
  • 18. Basics: take() and toArray scala> sc.cassandraTable("newyork","presidentlocations") cassandraTable
  • 19. Basics: take() and toArray scala> sc.cassandraTable("newyork","presidentlocations").take(1) res2: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{time: 9, location: NYC}) cassandraTable take(1) Array of CassandraRows 9 NYC
  • 20. Basics: take() and toArray scala> sc.cassandraTable("newyork","presidentlocations").take(1) res2: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{time: 9, location: NYC}) cassandraTable take(1) Array of CassandraRows 9 NYC scala> sc.cassandraTable("newyork","presidentlocations") cassandraTable
  • 21. Basics: take() and toArray scala> sc.cassandraTable("newyork","presidentlocations").take(1) res2: Array[com.datastax.spark.connector.CassandraRow] = Array(CassandraRow{time: 9, location: NYC}) cassandraTable take(1) Array of CassandraRows 9 NYC scala> sc.cassandraTable("newyork","presidentlocations").toArray res3: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 3, location: White House}, …, CassandraRow{time: 6, location: Air Force 1}) cassandraTable toArray Array of CassandraRows 9 NYC 9 NYC 9 NYC
  • 22. Basics: Getting Row Values out of a CassandraRow scala> sc.cassandraTable("newyork","presidentlocations").take(1)(0).get[Int]("time") res5: Int = 9 cassandraTable http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkSupportedTypes.html
  • 23. Basics: Getting Row Values out of a CassandraRow scala> sc.cassandraTable("newyork","presidentlocations").take(1)(0).get[Int]("time") res5: Int = 9 cassandraTable take(1) Array of CassandraRows 9 NYC http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkSupportedTypes.html
  • 24. Basics: Getting Row Values out of a CassandraRow scala> sc.cassandraTable("newyork","presidentlocations").take(1)(0).get[Int]("time") res5: Int = 9 cassandraTable take(1) Array of CassandraRows 9 NYC 9 get[Int] get[Int] get[String] … get[Any] Got Null ? get[Option[Int]] http://www.datastax.com/documentation/datastax_enterprise/4.5/datastax_enterprise/spark/sparkSupportedTypes.html
  • 25. Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) );
  • 26. Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) ); sc.cassandraTable("newyork","presidentlocations") .map( row => ( row.get[Int]("time"), "president", row.get[String]("location") )).saveToCassandra("newyork","characterlocations") cassandraTable 1 white house
  • 27. Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) ); sc.cassandraTable("newyork","presidentlocations") .map( row => ( row.get[Int]("time"), "president", row.get[String]("location") )).saveToCassandra("newyork","characterlocations") cassandraTable 1 white house
  • 28. Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) ); sc.cassandraTable("newyork","presidentlocations") .map( row => ( row.get[Int]("time"), "president", row.get[String]("location") )).saveToCassandra("newyork","characterlocations") cassandraTable get[Int] get[String] 1 white house 1,president,white house
  • 29. get[Int] get[String] C* Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) ); sc.cassandraTable("newyork","presidentlocations") .map( row => ( row.get[Int]("time"), "president", row.get[String]("location") )).saveToCassandra("newyork","characterlocations") cassandraTable 1 white house 1,president,white house saveToCassandra
  • 30. get[Int] get[String] C* Copy A Table Say we want to restructure our table or add a new column? CREATE TABLE characterlocations ( time int, character text, location text, PRIMARY KEY (time,character) ); sc.cassandraTable("newyork","presidentlocations") .map( row => ( row.get[Int]("time"), "president", row.get[String]("location") )).saveToCassandra("newyork","characterlocations") cqlsh:newyork> SELECT * FROM characterlocations ; time | character | location ------+-----------+------------- 5 | president | Air Force 1 10 | president | NYC … … cassandraTable 1 white house 1,president,white house saveToCassandra
  • 31. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable
  • 32. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable Filter
  • 33. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable Filter _ (Anonymous Param) 1 white house
  • 34. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable Filter 1 white house get[Int] 1 _ (Anonymous Param)
  • 35. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable _ (Anonymous Param) >7 1 white house get[Int] 1 Filter
  • 36. Filter a Table What if we want to filter based on a non-clustering key column? scala> sc.cassandraTable("newyork","presidentlocations") .filter( _.get[Int]("time") > 7 ) .toArray res9: Array[com.datastax.spark.connector.CassandraRow] = Array( CassandraRow{time: 9, location: NYC}, CassandraRow{time: 10, location: NYC}, CassandraRow{time: 8, location: NYC} ) cassandraTable _ (Anonymous Param) >7 1 white house get[Int] 1 Filter
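The `filter` above pulls every row into Spark and filters there. The connector also offers `where`, which pushes a CQL predicate down so Cassandra only ships matching rows; note that CQL restricts which columns can be filtered server-side (clustering columns, for instance). A hedged sketch against the `timelines` table defined later, where `time` is a clustering column:

```scala
// Sketch: server-side filtering with the connector's .where, which appends
// a CQL WHERE clause. The predicate must be valid CQL for this table.
import com.datastax.spark.connector._

val recent = sc.cassandraTable("newyork", "timelines")
  .where("time > ?", 7)           // pushed down to Cassandra as CQL
recent.toArray.foreach(println)   // only rows with time > 7 leave C*
```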
  • 37. Backfill a Table with a Different Key! CREATE TABLE timelines ( time int, character text, location text, PRIMARY KEY ((character), time) ) If we actually want to have quick access to timelines we need a C* table with a different structure.
  • 38. Backfill a Table with a Different Key! CREATE TABLE timelines ( time int, character text, location text, PRIMARY KEY ((character), time) ) If we actually want to have quick access to timelines we need a C* table with a different structure. sc.cassandraTable("newyork","characterlocations") .saveToCassandra("newyork","timelines") 1 white house cassandraTable president
  • 39. Backfill a Table with a Different Key! CREATE TABLE timelines ( time int, character text, location text, PRIMARY KEY ((character), time) ) If we actually want to have quick access to timelines we need a C* table with a different structure. sc.cassandraTable("newyork","characterlocations") .saveToCassandra("newyork","timelines") 1 white house cassandraTable saveToCassandra president C*
  • 40. Backfill a Table with a Different Key! CREATE TABLE timelines ( time int, character text, location text, PRIMARY KEY ((character), time) ) If we actually want to have quick access to timelines we need a C* table with a different structure. sc.cassandraTable("newyork","characterlocations") .saveToCassandra("newyork","timelines") cqlsh:newyork> select * from timelines; character | time | location -----------+------+------------- president | 1 | White House president | 2 | White House president | 3 | White House president | 4 | White House president | 5 | Air Force 1 president | 6 | Air Force 1 president | 7 | Air Force 1 president | 8 | NYC president | 9 | NYC president | 10 | NYC 1 white house cassandraTable saveToCassandra president C*
  • 41. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile
  • 42. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile Map plissken,1,Federal Reserve split plissken 1 Federal Reserve
  • 43. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile Map plissken,1,Federal Reserve split plissken 1 Federal Reserve plissken,1,Federal Reserve
  • 44. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile Map plissken,1,Federal Reserve split plissken 1 Federal Reserve plissken,1,Federal Reserve saveToCassandra C*
  • 45. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile Map plissken,1,Federal Reserve split plissken 1 Federal Reserve plissken,1,Federal Reserve saveToCassandra C* cqlsh:newyork> select * from timelines where character = 'plissken'; character | time | location -----------+------+----------------- plissken | 1 | Federal Reserve plissken | 2 | Federal Reserve plissken | 3 | Federal Reserve plissken | 4 | Court plissken | 5 | Court plissken | 6 | Court plissken | 7 | Court plissken | 8 | Stealth Glider plissken | 9 | NYC plissken | 10 | NYC
  • 46. Import a CSV I have some data in another source which I could really use in my Cassandra table sc.textFile("file:///Users/russellspitzer/ReallyImportantDocuments/PlisskenLocations.csv") .map(_.split(",")) .map( line => (line(0),line(1),line(2))) .saveToCassandra("newyork","timelines") textFile Map plissken,1,Federal Reserve split plissken 1 Federal Reserve plissken,1,Federal Reserve saveToCassandra C* cqlsh:newyork> select * from timelines where character = 'plissken'; character | time | location -----------+------+----------------- plissken | 1 | Federal Reserve plissken | 2 | Federal Reserve plissken | 3 | Federal Reserve plissken | 4 | Court plissken | 5 | Court plissken | 6 | Court plissken | 7 | Court plissken | 8 | Stealth Glider plissken | 9 | NYC plissken | 10 | NYC
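In the snippet above every CSV field stays a String and the connector converts "1" to the `int` time column on write. A hedged sketch (the file path is illustrative, not a real dataset) that makes the conversion explicit and skips malformed lines:

```scala
// Sketch: the same CSV load with explicit types and a guard against
// lines that don't have exactly three fields.
import com.datastax.spark.connector._

sc.textFile("file:///tmp/PlisskenLocations.csv")
  .map(_.split(","))
  .collect { case Array(character, time, location) =>  // drops bad rows
    (character, time.trim.toInt, location)             // time as Int, per schema
  }
  .saveToCassandra("newyork", "timelines")
```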
  • 47. Perform a Join with MySQL Maybe a little more than one line … MySQL Table "quotes" in "escape_from_ny" import java.sql._ import org.apache.spark.rdd.JdbcRDD Class.forName("com.mysql.jdbc.Driver").newInstance(); // Connector/J added to Spark shell classpath val quotes = new JdbcRDD( sc, () => { DriverManager.getConnection("jdbc:mysql://localhost/escape_from_ny?user=root")}, "SELECT * FROM quotes WHERE ? <= ID and ID <= ?", 0, 100, 5, (r: ResultSet) => { (r.getInt(2),r.getString(3)) } ) quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23
  • 48. Perform a Join with MySQL Maybe a little more than one line … quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23 quotes.join( sc.cassandraTable("newyork","timelines") .filter( _.get[String]("character") == "plissken") .map( row => (row.get[Int]("time"),row.get[String]("location")))) .take(1) .foreach(println) (5, (Bob Hauk: There was an accident. About an hour ago, a small jet went down inside New York City. The President was on board. Snake Plissken: The president of what?, Court) ) cassandraTable JdbcRDD Needs to be in the form of RDD[K,V] 5, 'Bob Hauk: …'
  • 49. Perform a Join with MySQL Maybe a little more than one line … quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23 quotes.join( sc.cassandraTable("newyork","timelines") .filter( _.get[String]("character") == "plissken") .map( row => (row.get[Int]("time"),row.get[String]("location")))) .take(1) .foreach(println) (5, (Bob Hauk: There was an accident. About an hour ago, a small jet went down inside New York City. The President was on board. Snake Plissken: The president of what?, Court) ) cassandraTable JdbcRDD plissken,5,court 5,court 5, 'Bob Hauk: …'
  • 50. Perform a Join with MySQL Maybe a little more than one line … quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23 quotes.join( sc.cassandraTable("newyork","timelines") .filter( _.get[String]("character") == "plissken") .map( row => (row.get[Int]("time"),row.get[String]("location")))) .take(1) .foreach(println) (5, (Bob Hauk: There was an accident. About an hour ago, a small jet went down inside New York City. The President was on board. Snake Plissken: The president of what?, Court) ) cassandraTable JdbcRDD plissken,5,court 5,court 5,('Bob Hauk: …',court) 5, 'Bob Hauk: …'
  • 51. Perform a Join with MySQL Maybe a little more than one line … quotes: org.apache.spark.rdd.JdbcRDD[(Int, String)] = JdbcRDD[9] at JdbcRDD at <console>:23 quotes.join( sc.cassandraTable("newyork","timelines") .filter( _.get[String]("character") == "plissken") .map( row => (row.get[Int]("time"),row.get[String]("location")))) .take(1) .foreach(println) (5, (Bob Hauk: There was an accident. About an hour ago, a small jet went down inside New York City. The President was on board. Snake Plissken: The president of what?, Court) ) cassandraTable JdbcRDD plissken,5,court 5,court 5,('Bob Hauk: …',court) 5, 'Bob Hauk: …'
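The join works only because both sides are RDDs of (key, value) pairs keyed on `time`. A minimal sketch of the same semantics with toy, made-up data (no MySQL or Cassandra needed):

```scala
// Sketch: pair-RDD join semantics with illustrative data.
import org.apache.spark.SparkContext._ // pair-RDD ops on older Spark versions

val quotes    = sc.parallelize(Seq(5 -> "Bob Hauk: ...", 9 -> "Snake: ..."))
val locations = sc.parallelize(Seq(5 -> "Court", 8 -> "Stealth Glider", 9 -> "NYC"))

// join keeps only keys present on BOTH sides, pairing up their values:
// key 8 has no quote, so it drops out of the result.
quotes.join(locations).collect.foreach(println)
```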
  • 52. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) timelineRow character,time,location
  • 53. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) cassandraTable[timelineRow] timelineRow character,time,location
  • 54. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) cassandraTable[timelineRow] timelineRow character,time,location filter character == plissken
  • 55. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) cassandraTable[timelineRow] timelineRow character,time,location filter character == plissken time == 8
  • 56. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) cassandraTable[timelineRow] timelineRow character,time,location filter character == plissken time == 8 character:plissken,time:8,location: Stealth Glider
  • 57. Easy Objects with Case Classes We have the technology to make this even easier! case class timelineRow (character:String, time:Int, location:String) sc.cassandraTable[timelineRow]("newyork","timelines") .filter( _.character == "plissken") .filter( _.time == 8) .toArray res13: Array[timelineRow] = Array(timelineRow(plissken,8,Stealth Glider)) The Future cassandraTable[timelineRow] timelineRow character,time,location filter character == plissken time == 8 character:plissken,time:8,location: Stealth Glider
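Case classes work in the other direction too: when field names match column names, `saveToCassandra` can persist an RDD of them. A hedged sketch (the rows here are illustrative, not from the talk's dataset):

```scala
// Sketch: writing case-class instances back to C*. Field names
// (character, time, location) line up with the timelines columns.
import com.datastax.spark.connector._

case class timelineRow(character: String, time: Int, location: String)

sc.parallelize(Seq(
    timelineRow("cabbie", 5, "NYC"),   // hypothetical extra character
    timelineRow("cabbie", 6, "NYC")))
  .saveToCassandra("newyork", "timelines")
```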
  • 58. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) cassandraTable
  • 59. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) 1 white house cassandraTable get[String]
  • 60. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) 1 white house white house cassandraTable get[String] _.split()
  • 61. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) 1 white house white house white, 1 house, 1 cassandraTable get[String] _.split() (_,1)
  • 62. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) 1 white house white house white, 1 house, 1 house, 1 house, 1 house, 2 cassandraTable get[String] _.split() (_,1) _ + _
  • 63. A Map Reduce for Word Count … scala> sc.cassandraTable("newyork","presidentlocations") .map( _.get[String]("location") ) .flatMap( _.split(" ") ) .map( (_,1) ) .reduceByKey( _ + _ ) .toArray res17: Array[(String, Int)] = Array((1,3), (House,4), (NYC,3), (Force,3), (White,4), (Air,3)) 1 white house white house white, 1 house, 1 house, 1 house, 1 house, 2 cassandraTable get[String] _.split() (_,1) _ + _
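If the Spark shapes are unfamiliar, the same map/flatMap/reduceByKey pipeline can be traced on a plain Scala collection, where `groupBy` plus a sum plays the role of `reduceByKey` (a local sketch, no cluster needed):

```scala
// Sketch: the word-count pipeline on a local List of location strings.
val locations = List("White House", "White House", "Air Force 1", "NYC")

val counts = locations
  .flatMap(_.split(" "))   // flatMap( _.split(" ") )
  .map((_, 1))             // map( (_,1) )
  .groupBy(_._1)           // reduceByKey( _ + _ ) ≈ groupBy + sum per key
  .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

// counts contains White -> 2, House -> 2, Air -> 1, Force -> 1, 1 -> 1, NYC -> 1
```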
  • 64. Stand Alone App Example https://github.com/RussellSpitzer/spark-cassandra-csv Car, Model, Color Dodge, Caravan, Red Ford, F150, Black Toyota, Prius, Green Spark SCC RDD: [CassandraRow] FavoriteCars Table Cassandra Column Mapping CSV
  • 65. Thanks for listening! There is plenty more we can do with Spark but … Questions?
  • 66. Getting started with Cassandra?! DataStax Academy offers free online Cassandra training! Planet Cassandra has resources for learning the basics from ‘Try Cassandra’ tutorials to in depth language and migration pages! Find a way to contribute back to the community: talk at a meetup, or share your story on PlanetCassandra.org! Need help? Get questions answered with Planet Cassandra’s free virtual office hours running weekly! Email us: Community@DataStax.com! Thanks for coming to the meetup!! In production?! Tweet us: @PlanetCassandra!