The Raspberry Pi is a credit-card-sized, $25, ARM-based Linux box designed to teach children the basics of programming. The machine comes with a 700MHz ARM processor and 512MB of memory and boots off an SD card: not much power for running the likes of a Cassandra cluster. This presentation will discuss the problems of getting Cassandra up and running on the Pi and will answer the all-important question: why on Earth would you want to do this!?
2. * Cassandra is hardware agnostic
* So why not run it on a Raspberry Pi ?
* How hard can it be ?
* What can we do with it once it works?
Cassandra on Raspberry Pi
3. * Andy Cobley
* School of Computing
* University of Dundee
* Twitter: @andycobley
Who Am I ?
4. * Single chip Linux computer
* 512 MB RAM
* Boots off an SD card
* Ethernet port
* (graphics and all you need for a general purpose computer)
What's a Raspberry Pi?
6. * And here’s the Cassandra cluster *
And, here’s one for real
* Power Permitting !
7. * Cassandra is designed to be fast, fast at writing, fast at reading.
* This laptop with one instance of Cassandra will do 12,000 write operations
* Raspberry Pi will do 200 !
The Bad News
8. * Running an external USB drive is actually worse !
* Probably a hardware feature
More bad news !
11. * Raspbian is Debian for the Pi
* Uses the hardware floating point accelerator
* Much faster than Debian
* Current Oracle JDK won’t run on it !
And Raspbian
13. * Actually not much difference in performance
Hard vs Soft Float
14. * Cassandra uses compression for performance
* Started in version 1.0
  * 2x-4x reduction in data size
  * 25-35% performance improvement on reads
  * 5-10% performance improvement on writes
The Problem with compression
15. * Two types:
  * Google Snappy Compressor (faster reads/writes)
  * DeflateCompressor (Java zip, slower, better compression)
* Snappy compression not available on Pi (requires native methods, so someone might get it to work!)
Compression types
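To get a feel for what the Deflate option buys you, here is a minimal sketch using Python's standard `zlib` module, which implements the same DEFLATE algorithm as Java zip. The sample row and the `deflate_ratio` helper are made up for illustration; this is not Cassandra's code path, and such repetitive data compresses far better than real rows would.

```python
import zlib


def deflate_ratio(data, level=6):
    """Return original size divided by DEFLATE-compressed size."""
    compressed = zlib.compress(data, level)
    # Round-trip check: decompressing must give back the original bytes.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical, highly repetitive sample data (an assumption, not from the talk).
sample = b"sensor=pi-01,temperature=21.5,status=OK;" * 1000

print("compression ratio: %.1fx" % deflate_ratio(sample))
```

Raising `level` towards 9 trades CPU time for a smaller result, which is the same slower-but-smaller trade-off the slide describes for DeflateCompressor against Snappy.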
16. * Startup script allocates memory
* Calculates based on number of processors
* Pi reports Zero processors !
* Boom !
* Now fixed
And the startup script
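The failure is easy to reproduce on paper. The sketch below assumes the sizing rule is roughly the one in the stock cassandra-env.sh of that era (heap from RAM, young generation capped at 100 MB per core); the function name and the guard at the end are illustrative, not the actual fix that went into Cassandra.

```python
def heap_sizes_mb(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, new_gen_mb) using a cassandra-env.sh-style rule.

    Assumed rule (a sketch, not copied from the script):
      max heap = max(min(RAM/2, 1024 MB), min(RAM/4, 8192 MB))
      new gen  = min(100 MB * cores, max heap / 4)
    """
    half_ram = min(system_memory_mb // 2, 1024)
    quarter_ram = min(system_memory_mb // 4, 8192)
    max_heap = max(half_ram, quarter_ram)
    new_gen = min(100 * cpu_cores, max_heap // 4)
    return max_heap, new_gen


pi_memory_mb = 512
reported_cores = 0  # what the Pi reported, per the slide

# Zero cores gives a zero-sized young generation: "Boom !"
print(heap_sizes_mb(pi_memory_mb, reported_cores))          # (256, 0)

# Hypothetical guard: treat "zero processors" as one.
print(heap_sizes_mb(pi_memory_mb, max(1, reported_cores)))  # (256, 64)
```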
17. * In cassandra-env.sh
* JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
* Or else nodetool will not work between nodes
JMX Config
18. * C* 1.2.2 added UseCondCardMark as a JVM opt
* "for better lock handling especially on hotspot with multicore processor"
* In cassandra-env.sh
    #if [ "$JVM_VERSION" \> "1.7" ] ; then
    #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
    #fi
JVM OPT UseCondCardMark
20. * We’ve forgotten one thing
* The Pi costs £25
* You can power 4 from a USB hub (no need for a power supply on each one)
* So:
The Good News !
21. So, have a 64 node computer for £2000
University of Southampton
22. * 32 node Beowulf cluster:
* Joshua Kiepert, Boise State University
Or this
23. * Adding nodes adds performance
* Adding nodes adds replicas of data
* BUT
* Make sure your ring is balanced,
* Pi’s don’t like to be unbalanced.
Adding nodes is good
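What "balanced" means can be checked with a few lines of Python. This is a sketch assuming a RandomPartitioner-style token space of 0 to 2**127, where each node owns the range from the previous token up to its own; the `ownership` helper and the example tokens are illustrative.

```python
RING_SIZE = 2 ** 127  # assumed token space


def ownership(tokens):
    """Map each token to the fraction of the ring that node owns."""
    ordered = sorted(tokens)
    shares = {}
    for idx, token in enumerate(ordered):
        previous = ordered[idx - 1]  # idx 0 wraps round to the last token
        span = (token - previous) % RING_SIZE
        if span == 0:
            span = RING_SIZE  # a single node owns the whole ring
        shares[token] = span / RING_SIZE
    return shares


# Four evenly spaced tokens: every node owns a quarter of the ring.
balanced = [i * RING_SIZE // 4 for i in range(4)]
print(sorted(ownership(balanced).values()))

# Three nodes left on four-node tokens: one node owns half the data.
unbalanced = [0, 2 ** 125, 2 ** 126]
print(ownership(unbalanced)[0])
```

On a machine with as little headroom as the Pi, the node that ends up with the oversized share is the one that falls over first.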
24. * Vnodes (in 1.2) would be very nice
* However at this point I haven’t got 1.2 on Pi running on a cluster
Vnodes
27. * ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
* Note: -d lists the nodes to use
* You will get different performance if you insert to fewer nodes than you have in your ring
Stress test commands
28. * Adding a node (in the absence of Vnodes)
  * Must seed from a known node
  * Use a program to calculate new keys
  * Bring up new node with the correct key in cassandra.yaml
  * Use nodetool to move other nodes
Adding Nodes Procedure
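29. * Python code
    import sys

    # Number of nodes comes from the command line, or from a prompt.
    if len(sys.argv) > 1:
        num = int(sys.argv[1])
    else:
        num = int(input("How many nodes? :"))

    # Spread the tokens evenly over the 0..2**127 ring (integer division keeps them exact).
    for i in range(num):
        print('node %d: %d' % (i, i * (2 ** 127) // num))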
Calculating keys
30. * Use nodetool
    sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432
* And cleanup
    ./nodetool -h 192.168.1.10 cleanup
Moving existing nodes
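With more than a couple of nodes, typing these commands by hand gets error-prone. Below is a minimal sketch that only prints the `move` and `cleanup` commands for a list of hosts; it runs nothing. The host addresses and the host-to-token ordering are assumptions for illustration, and a node that is already on its target token would not need a move.

```python
def balanced_tokens(num_nodes):
    """Evenly spaced tokens over the 0..2**127 ring."""
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]


def rebalance_commands(hosts):
    """Build nodetool command lines (as strings) to rebalance the ring."""
    commands = []
    # First move every node to its evenly spaced token...
    for host, token in zip(hosts, balanced_tokens(len(hosts))):
        commands.append("./nodetool -h %s move %d" % (host, token))
    # ...then clean up data each node no longer owns.
    for host in hosts:
        commands.append("./nodetool -h %s cleanup" % host)
    return commands


# Hypothetical four-node ring on the lab network.
hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"]
for command in rebalance_commands(hosts):
    print(command)
```

Printing rather than executing keeps the sketch safe to try: moves are expensive on a Pi, so it is worth reading the list before running anything.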
31. * On Debian, you can free memory from the graphics chip
    cd /boot
    sudo cp start.elf start.elf.old
    sudo cp arm224_start.elf start.elf
    reboot
Getting more memory
32. * Under Raspbian
* Run with a monitor plugged in for the first time
* Set options for screen memory
* Perhaps disable boot to GUI
Getting more Memory
36. * So for £200 we get an 8 node C* cluster
* It can be reconfigured, blown away, stress tested and generally abused
* We can simulate data racks, data centers and I hope even long network delays.
* Hopefully our upcoming MSc in Data Science will use these clusters
Pi is for teaching
37. * We know C* can be configured to be aware of:
  * Network racks
  * Data centers
* We know replicas can be stored across these racks
* How can we play with this cheaply ?
C* is network aware
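To show students what "replicas stored across racks" means before touching real hardware, a toy model helps. The sketch below is a deliberately simplified illustration, not Cassandra's actual placement strategy: it walks the ring clockwise, prefers nodes on racks not yet used, and only falls back to an already used rack when it runs out. Node and rack names are hypothetical.

```python
def place_replicas(ring, start_index, replication_factor):
    """Pick replica nodes from `ring`, a list of (node, rack) in ring order."""
    chosen, racks_used, skipped = [], set(), []
    for step in range(len(ring)):
        if len(chosen) == replication_factor:
            break
        node, rack = ring[(start_index + step) % len(ring)]
        if rack not in racks_used:
            chosen.append(node)
            racks_used.add(rack)
        else:
            skipped.append(node)  # same rack as an earlier replica
    # Not enough distinct racks: fall back to the skipped nodes, in ring order.
    for node in skipped:
        if len(chosen) == replication_factor:
            break
        chosen.append(node)
    return chosen


# Hypothetical four-Pi ring split over two "racks" (say, two USB hubs).
ring = [("pi0", "rack1"), ("pi1", "rack1"), ("pi2", "rack2"), ("pi3", "rack2")]

print(place_replicas(ring, 0, 2))  # ['pi0', 'pi2']
print(place_replicas(ring, 0, 3))  # ['pi0', 'pi2', 'pi1']
```

With two replicas the data survives losing either rack, which is exactly the kind of failure that is cheap to stage by unplugging a hub.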
39. * Cassandra wouldn’t run on a Pi
* It does now.
* Running it on a Pi shook out some Cassandra bugs
* You can run it in a secure lab
Pi is discovery