Intro to Cassandra + Hadoop
1. Cassandra + Hadoop: An Introduction to Hadoop Analytics over Cassandra Data
2. Introductions
  What is Cassandra?
    - A highly scalable distributed data store
    - Born at Facebook, grew up in the community
  What is Hadoop?
    - A set of Apache projects
    - Deals with Big Data in a distributed way
    - Open source versions of MapReduce, GFS, and BigTable (Hadoop MapReduce, HDFS, and HBase), plus additions such as Pig and Hive
3. What makes them compatible?
  Cassandra is great at a lot of things
    - Fast, extremely scalable writes and fast random reads
    - Flexible semi-structured data model
    - Not as good at ad-hoc queries
  Enter Hadoop
    - MapReduce, Pig, and Hive are extensible
    - Output from Hadoop into Cassandra
4. MapReduce
  - Input from Cassandra as of 0.6.x
  - Baked-in output to Cassandra as of 0.7.0
  - Streaming support is coming in 0.7
  - Example: WordCount
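The WordCount example named on this slide is not reproduced in the transcript. As a rough illustration of the idea only, the following Python sketch simulates the map, shuffle, and reduce phases in memory. It does not use Hadoop or Cassandra APIs: `ROWS`, the `text` column name, and every function name here are hypothetical stand-ins for rows that a real job would read from a Cassandra column family.

```python
from collections import defaultdict

# Hypothetical stand-in for Cassandra rows: row key -> {column name -> value}.
# A real job would receive these from Cassandra's Hadoop input support.
ROWS = {
    "row1": {"text": "the quick brown fox"},
    "row2": {"text": "the lazy dog"},
    "row3": {"text": "the fox"},
}


def map_phase(row_key, columns, column_name="text"):
    """Emit a (word, 1) pair for every word in the chosen column of one row."""
    for word in columns.get(column_name, "").split():
        yield word.lower(), 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word, counts):
    """Sum the partial counts collected for one word."""
    return word, sum(counts)


def word_count(rows):
    """Run map, shuffle, and reduce over all rows and return {word: count}."""
    pairs = (pair for key, cols in rows.items() for pair in map_phase(key, cols))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(ROWS))
```

In an actual deployment the shuffle step is performed by Hadoop across machines, and the reduce output could be written back to Cassandra using the output support mentioned above.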
5. Pig
  What is Pig?
    - A platform for data analytics developed at Yahoo!
    - Includes Pig Latin, the Grunt shell, and an interpreter that compiles down to MapReduce
    - Simplifies data analysis
  Cassandra integration
    - Stu Hood added Pig integration in Cassandra 0.6
    - Example: WordCount with Pig
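The Pig WordCount script from the talk is not included in the transcript. To give a feel for how a Pig data flow reads, the Python sketch below mirrors the steps of a typical Pig Latin word count, with the corresponding Pig Latin statement shown as a comment above each step. The statements are an assumed, generic shape rather than the presenter's script, and `LINES` and `pig_style_word_count` are hypothetical names; nothing here loads data from Cassandra.

```python
from itertools import chain

# Hypothetical input; in Pig this relation would come from a LOAD statement.
LINES = ["hello cassandra", "hello hadoop", "hello pig"]


def pig_style_word_count(lines):
    """Mirror a typical Pig Latin word count, one relation per step."""
    # words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
    words = list(chain.from_iterable(line.split() for line in lines))

    # grouped = GROUP words BY word;
    grouped = {}
    for word in words:
        grouped.setdefault(word, []).append(word)

    # counts = FOREACH grouped GENERATE group, COUNT(words) AS total;
    counts = [(group, len(bag)) for group, bag in grouped.items()]

    # ordered = ORDER counts BY total DESC;  (ties broken by word in this sketch)
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for word, total in pig_style_word_count(LINES):
        print(word, total)
```

Each commented statement names a relation built from the previous one, which is the main contrast with hand-written MapReduce: the script describes the data flow, and Pig compiles it down to MapReduce jobs.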
6. Hive
  What is Hive?
    - A platform for data analytics developed at Facebook
    - Draws from the familiar: SQL -> HiveQL
    - Compiles down to MapReduce
  Cassandra integration
    - A Cassandra storage handler is coming soon (HIVE-1434)
7. Example Use Case
  Raptr.com
    - Gaming statistics and achievements across platforms
    - Home-grown -> Cassandra + Hadoop (Pig)
    - Idea to execution much faster
    - Query runtime from hours to 10-15 minutes
8. Questions
  Contact
    - Email: jeremy.hanna@rackspace.com
    - Twitter: @jeromatron
    - IRC: jeromatron on irc.freenode.net (#cassandra, #hadoop)
  Further information
    - http://wiki.apache.org/cassandra/HadoopSupport
    - Cassandra: The Definitive Guide