Speaker: Andy Cobley, Lecturer at University of Dundee
Video: http://www.youtube.com/watch?v=0U4iOSMnRdk&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=1
Abstract: The Raspberry Pi is a credit-card sized, $25, ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700MHz ARM processor and 512 MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on Earth would you want to do this!?
C* Summit EU 2013: Hardware Agnostic: Cassandra on Raspberry Pi
1. Hardware Agnostic: Cassandra on Raspberry Pi
Andy Cobley | Lecturer, University of Dundee, Scotland
#CASSANDRAEU
CASSANDRASUMMITEU
2. What we will discuss today…
* Cassandra is hardware agnostic
* So why not run it on a Raspberry Pi?
* How hard can it be?
* What can we do with it once it works?
3. Who Am I?
* Andy Cobley
* Program Director, MSc in Data Science and Business Intelligence
* School of Computing
* University of Dundee
* Twitter: @andycobley
4. What's a Raspberry Pi?
* Single chip Linux computer
* 512 MB RAM
* Boots off an SD card
* Ethernet port
* (graphics and all you need for a general purpose computer)
6. And, here’s one for real
* Also a 4-node cluster.
7. The Bad News
* Cassandra is designed to be fast: fast at writing, fast at reading.
* This laptop with one instance of Cassandra will do 12,000 write operations
* A Raspberry Pi will do 200!
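To put the two figures side by side, the arithmetic can be sketched as below. It assumes, purely for illustration, that write throughput scales linearly with the number of nodes, which the talk does not claim. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: how many Pi nodes would match one laptop,
# ASSUMING throughput scales linearly with node count (an idealisation).
nodes_to_match() {
    target_ops=$1    # laptop figure from the slide, e.g. 12000
    per_node_ops=$2  # Pi figure from the slide, e.g. 200
    # Round up so that a partial node counts as a whole one.
    echo $(( (target_ops + per_node_ops - 1) / per_node_ops ))
}

nodes_to_match 12000 200   # prints 60
```

In other words, under that idealised assumption the laptop is worth about 60 Pis for writes.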
8. More bad news!
* Running an external USB drive is actually worse!
* Probably a hardware feature
10. And then there’s Java!
* Oracle Java vs OpenJDK
11. And Raspbian
* Raspbian is Debian for the Pi
* Uses the hardware floating-point unit (hard float)
* Much faster than Debian
* Current official Oracle JDK won’t run on it!
13. Hard vs Soft Float
* And then it turns out: actually not much difference in performance
14. The Problem with compression
* Cassandra uses compression for performance
* Started in version 1.0
  * 2x-4x reduction in data size
  * 25-35% performance improvement on reads
  * 5-10% performance improvement on writes
15. Compression types
* Three types:
  * Google Snappy Compressor (faster reads/writes)
  * DeflateCompressor (Java zip, slower, better compression)
* Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
16. Compression
* Cassandra 1.2 (and 2) also has LZ4 compression
* Which is good news!
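Compression is chosen per table, so switching compressors on a Pi comes down to one statement per table. The sketch below only prints such a statement. It assumes the CQL3 syntax of Cassandra 1.2/2.0 (the `sstable_compression` option); the table name `demo.readings` and the helper name are made up for illustration.

```shell
#!/bin/sh
# Print (not execute) a CQL3 statement selecting an SSTable compressor.
# The table name is hypothetical; the 'sstable_compression' option name
# follows the Cassandra 1.2/2.0 CQL3 syntax.
compression_cql() {
    table=$1
    compressor=$2
    case "$compressor" in
        DeflateCompressor|LZ4Compressor)
            ;;
        SnappyCompressor)
            # The talk reports that Snappy's native code is unavailable on the Pi.
            echo "SnappyCompressor needs native code; not usable on the Pi" >&2
            return 1
            ;;
        *)
            echo "unknown compressor: $compressor" >&2
            return 1
            ;;
    esac
    echo "ALTER TABLE $table WITH compression = { 'sstable_compression' : '$compressor' };"
}

compression_cql demo.readings LZ4Compressor
```

The printed statement could then be run from cqlsh on any node.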
17. And the startup script
* Startup script allocates memory
* Calculates based on number of processors
* Pi reports zero processors!
* Boom!
* Now fixed
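The failure mode can be sketched as follows. This is a simplified illustration, not the actual startup script: it assumes the young-generation size is 100 MB per core, capped at a quarter of the heap, so a reported core count of zero would yield a zero-sized allocation. Treating zero as one core is one possible guard; all names here are hypothetical.

```shell
#!/bin/sh
# Simplified sketch of a young-generation sizing rule; NOT the real script.
# Assumption: new size = min(100 MB per core, heap / 4).
young_gen_mb() {
    cores=$1
    heap_mb=$2
    # Guard: some boards report zero cores; treat that as one core.
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    per_core=$(( cores * 100 ))
    quarter_heap=$(( heap_mb / 4 ))
    if [ "$per_core" -lt "$quarter_heap" ]; then
        echo "$per_core"
    else
        echo "$quarter_heap"
    fi
}

young_gen_mb 0 256   # prints 64
```

Without the guard, zero cores would give `per_core=0`, and the JVM would be asked for a zero-sized young generation.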
18. JMX Config
* In cassandra-env.sh
* JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
* Or else nodetool will not work between nodes
19. JVM OPT UseCondCardMark
* C* 1.2.2 added UseCondCardMark as a JVM option
* "for better lock handling especially on hotspot with multicore processor"
* In cassandra-env.sh (shown commented out):
  #if [ "$JVM_VERSION" \> "1.7" ] ; then
  #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
  #fi
21. The Good News !
* We’ve forgotten one thing
* The Pi costs £25
* You can power 4 from a USB hub (no need for a power supply on each one)
* So:
22. So, have a 64 node computer for £2000
University of Southampton
23. Or this
* 32 node Beowulf cluster:
* Joshua Kiepert, Boise State University
24. Or this Hadoop Cluster from LinkedIn
http://practicalcloudcomputing.com/post/53996976003/hadoop-running-on-a-14-chip-raspberry-pi-cluster
25. Adding nodes is good
* Adding nodes adds performance
* Adding nodes adds replicas of data
* BUT
* Make sure your ring is balanced
* Pis don’t like to be unbalanced.
26. Vnodes
* Vnodes (in 1.2) would be very nice
* However at this point I haven’t got 1.2 on Pi running on a cluster
* As for Cassandra 2, see later
29. Stress test commands
* ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
* Note the list of nodes to use (-d)
* You will get different performance if you insert to fewer nodes than you have in your ring
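Since the node list matters, it can help to build the command from an explicit list so every node in the ring is included. A minimal sketch, using only the `-d`, `-o` and `-I` flags shown on the slide; the helper name is hypothetical and the command is printed rather than run.

```shell
#!/bin/sh
# Build (and print) a stress command for a list of node addresses.
# The -d, -o and -I flags are the ones shown on the slide; the helper
# name is hypothetical.
stress_command() {
    compressor=$1
    shift
    nodes=""
    for node in "$@"; do
        # Join addresses with commas, without a leading comma.
        nodes="${nodes:+$nodes,}$node"
    done
    echo "./stress -d $nodes -o insert -I $compressor"
}

stress_command DeflateCompressor 192.168.1.10 192.168.1.11 192.168.1.12
```

Dropping an address from the argument list gives the "fewer nodes than the ring" case for comparison.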
30. Getting more memory
* On Debian, you can free memory from the graphics chip
  cd /boot
  sudo cp start.elf start.elf.old
  sudo cp arm224_start.elf start.elf
  sudo reboot
31. Getting more Memory
* Under Raspbian
* Run with a monitor plugged in for the first time
* Set options for screen memory
* Perhaps disable boot to GUI
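The screen-memory option can also be set without a monitor. The sketch below assumes the `gpu_mem` setting in `/boot/config.txt`, which is not mentioned in the talk; the helper name is hypothetical and the example runs against a scratch file rather than the live configuration.

```shell
#!/bin/sh
# Set gpu_mem in a Raspberry Pi style config file.
# The file path is passed in so the sketch can be tried on a copy first.
set_gpu_mem() {
    config_file=$1
    megabytes=$2
    if grep -q '^gpu_mem=' "$config_file"; then
        # Replace the existing setting, writing via a temporary file.
        sed "s/^gpu_mem=.*/gpu_mem=$megabytes/" "$config_file" > "$config_file.tmp" &&
            mv "$config_file.tmp" "$config_file"
    else
        echo "gpu_mem=$megabytes" >> "$config_file"
    fi
}

# Example on a scratch file rather than the live /boot/config.txt:
scratch=$(mktemp)
echo "gpu_mem=64" > "$scratch"
set_gpu_mem "$scratch" 16
cat "$scratch"
rm -f "$scratch"
```

On a real Pi the change would need root and a reboot to take effect.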
33. Multiple nodes
* Make a master SD card
* Copy it !
* Make sure the master version has no data on it.
* Consider “Puppet” (though I don’t use it)
34. Starting as a service
* See https://github.com/acobley/CassandraStartup
* Put the file in /etc/init.d
* update-rc.d cassandra defaults
35. Pi is for teaching
* So for £200 we get an 8 node C* cluster
* It can be reconfigured, blown away, stress tested and generally abused
* We can simulate data racks, data centers and, I hope, even long network delays.
* Hopefully our students will use these clusters
36. C* is network aware
* We know C* can be configured to be aware of:
  * Network racks
  * Data centers
* We know we can have replicas stored across these racks
* How can we play with this cheaply?
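One cheap way to play with this is to declare racks and data centers by hand. The sketch below prints lines in the `IP=DataCenter:Rack` format read by PropertyFileSnitch from `cassandra-topology.properties`. Using that snitch is an assumption (the talk does not say which one was used), and the addresses and DC/rack names are made up.

```shell
#!/bin/sh
# Emit cassandra-topology.properties style lines (IP=DataCenter:Rack),
# the format read by PropertyFileSnitch. All names below are made up.
topology_line() {
    echo "$1=$2:$3"
}

topology_line 192.168.1.10 DC1 RAC1
topology_line 192.168.1.11 DC1 RAC2
topology_line 192.168.1.12 DC2 RAC1
# Fallback for any node not listed above.
echo "default=DC1:RAC1"
```

With three Pis this already gives two pretend data centers, one of them with two racks.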
38. TC ?
* What about the Linux tc command?
* Let's look again at the diagram
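The talk does not show the tc invocation itself. As a sketch of the idea, the helper below prints commands that would add, and later remove, an artificial delay on one interface using the netem queueing discipline. The choice of netem, the interface name and the delay value are assumptions for illustration.

```shell
#!/bin/sh
# Print tc/netem commands that would add or remove an artificial delay
# on a network interface. Printing (rather than running) keeps the sketch
# safe to try; the commands need root when actually executed.
delay_commands() {
    interface=$1
    delay_ms=$2
    echo "tc qdisc add dev $interface root netem delay ${delay_ms}ms"
    echo "tc qdisc del dev $interface root"
}

delay_commands eth0 100
```

Run on one Pi of a cluster, the first command makes that node look like it sits in a distant data center.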
39. Network
* What we can’t do
* Recommended bandwidth is 1000 Mbit/s (Gigabit) or greater.
* Bind the Thrift interface (listen_address) to a specific NIC (Network Interface Card).
* Bind the RPC server interface (rpc_address) to another NIC.
http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning
40. What about Cassandra 2.0
* Internode compression currently uses Snappy
* So turn it off in conf file:
  internode_compression: none
41. How does C* 2 run on a Pi?
* Some bad news
* So we need to tune it:
* See John Berryman’s blog:
* http://www.opensourceconnections.com/2013/08/31/building-the-perfect-cassandra-test-environment/
42. Pi is discovery
* Cassandra wouldn’t run on a Pi
* It does now.
* Running it on a Pi shook out some Cassandra bugs
* You can run it in a secure lab
43. Pi is for fun
* Most important, this was pure Geeky Fun
45. What we discussed today…
* Raspberry Pi is cheap
* C* needs some work to run on it
* You can make clusters cheaply for experimentation
* It’s fun!