2. *Cassandra is hardware agnostic
*So why not run it on a Raspberry Pi?
*How hard can it be?
*What can we do with it once it works?
Cassandra on Raspberry Pi
4. *Single chip Linux computer
*512 MB of RAM
*Boots off an SD card
*Ethernet port
*(graphics and all you need for a general purpose computer)
What's a Raspberry Pi?
6. *And here’s the Cassandra cluster
And here’s one for real
*Power permitting!
7. *Cassandra is designed to be fast: fast at writing, fast at reading.
*This laptop with one instance of Cassandra will do 12,000 write operations
*A Raspberry Pi will do 200!
The Bad News
8. *Running from an external USB drive is actually worse!
*Probably a hardware limitation
More bad news!
11. *Raspbian is Debian for the Pi
*Uses the hardware floating-point unit (hard float)
*Much faster than Debian
*Current Oracle JDK won’t run on it!
And Raspbian
14. *Cassandra uses compression for performance
*Started in version 1.0
2x-4x reduction in data size
25-35% performance improvement on reads
5-10% performance improvement on writes
The Problem with compression
15. *Two types:
Google Snappy Compressor (faster reads/writes)
DeflateCompressor (Java zip; slower, better compression)
*Snappy compression is not available on the Pi
(requires native methods, so someone might get it to work!)
Compression types
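To get a feel for what Deflate does to repetitive row data, here is a minimal sketch using Python's `zlib` module, which implements the same Deflate algorithm as Java's zip classes. The sample rows and the `compression_ratio` helper are made up for illustration. Cassandra compresses SSTable chunks rather than whole files, so real ratios will differ.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by Deflate-compressed size."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Hypothetical, highly repetitive "rows" standing in for column data.
rows = b"".join(
    b"sensor-%04d,temperature,21.5,OK\n" % (i % 50) for i in range(2000)
)

if __name__ == "__main__":
    print("raw bytes:  %d" % len(rows))
    print("ratio:      %.1fx" % compression_ratio(rows))
```

A higher `level` trades CPU time for a smaller result, which is the same trade-off that makes Deflate slower than Snappy on a CPU as small as the Pi's.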
16. *Startup script allocates memory
*Calculates based on number of processors
*Pi reports zero processors!
*Boom!
*Now fixed
And the startup script
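The failure is easy to reproduce on paper. The sketch below is a Python model of the kind of sizing rule the startup script applies. The exact constants (half of RAM capped at 1024 MB, a quarter of RAM capped at 8192 MB, 100 MB of young generation per core) are assumptions based on the stock cassandra-env.sh of that era, and the function name is hypothetical. With zero cores the young generation works out to 0 MB unless the core count is clamped.

```python
def calculate_heap_sizes(system_memory_mb, cpu_cores):
    """Return (max_heap_mb, young_gen_mb) for the given machine."""
    # Guard: treat a reported core count of zero (or less) as one core.
    cores = max(1, cpu_cores)

    # Max heap (assumed rule): max(min(1/2 RAM, 1024), min(1/4 RAM, 8192)).
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap_mb = max(half, quarter)

    # Young generation (assumed rule): min(100 MB per core, 1/4 of heap).
    young_gen_mb = min(100 * cores, max_heap_mb // 4)
    return max_heap_mb, young_gen_mb


if __name__ == "__main__":
    # A Pi with about 512 MB of RAM that reports zero processors.
    print(calculate_heap_sizes(512, 0))
```

Without the `max(1, cpu_cores)` clamp, the per-core term is zero and the script would ask the JVM for an empty young generation.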
18. *C* 1.2.2 added UseCondCardMark as a JVM option
*"for better lock handling especially on hotspot with multicore processor"
*In cassandra-env.sh:
#if [ "$JVM_VERSION" \> "1.7" ] ; then
#    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
#fi
JVM OPT UseCondCardMark
20. *We’ve forgotten one thing
*The Pi costs £25
*You can power 4 from a USB hub (no need for a power supply on each one)
*So:
The Good News!
21. So, have a 64-node computer for £2000
University of Southampton
22. *32-node Beowulf cluster:
*Joshua Kiepert, Boise State University
Or this
23. *Adding nodes adds performance
*Adding nodes adds replicas of data
*BUT
*Make sure your ring is balanced
*Pis don’t like to be unbalanced
Adding nodes is good
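To see what "balanced" means in numbers, here is a small sketch that computes the share of the ring each node owns from its token. It assumes the 0 to 2**127 token range used by the key calculation later in the deck, and that each node owns the range from the previous token up to its own. The `ownership` helper is hypothetical, not a Cassandra API.

```python
RING_SIZE = 2 ** 127  # assumed token range (matches the key calculator)


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # index -1 wraps to the last token
        # A single node owns the whole ring, hence the "or RING_SIZE".
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = span / RING_SIZE
    return shares


if __name__ == "__main__":
    balanced = [i * RING_SIZE // 4 for i in range(4)]
    print(sorted(ownership(balanced).values()))

    # Fifth node dropped in without moving the others: unbalanced.
    lopsided = balanced + [RING_SIZE // 8]
    print(sorted(ownership(lopsided).values()))
```

In the second case two nodes hold an eighth of the ring each while the rest hold a quarter, so some Pis do twice the work of others.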
24. *Vnodes (in 1.2) would be very nice
*However, at this point I haven’t got 1.2 running on a Pi cluster
Vnodes
27. *./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
*Note: -d lists the nodes to use
*You will get different performance if you insert to fewer nodes than you have in your ring
Stress test commands
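When comparing runs against different subsets of the ring, it helps to build the command line from a node list instead of retyping it. The sketch below only assembles the arguments shown on the slide (`-d`, `-o`, `-I`). The `stress_command` helper is hypothetical, and it assumes the `stress` script is in the current directory.

```python
def stress_command(nodes, operation="insert", compression=None):
    """Build the argument list for the stress tool shown above."""
    args = ["./stress", "-d", ",".join(nodes), "-o", operation]
    if compression:
        args += ["-I", compression]
    return args


if __name__ == "__main__":
    ring = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
    # Whole ring versus a single node, for a side-by-side comparison.
    print(" ".join(stress_command(ring, compression="DeflateCompressor")))
    print(" ".join(stress_command(ring[:1], compression="DeflateCompressor")))
```

The returned list can be passed straight to `subprocess.run` if you want to script a series of runs.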
28. *Adding a node (in the absence of Vnodes)
Must seed from a known node
Use a program to calculate new keys
Bring up new node with the correct key in cassandra.yaml
Use nodetool to move other nodes
Adding Nodes Procedure
29. *Python code
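import sys

# Number of nodes: from the command line, or prompt for it.
if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(input("How many nodes? : "))

# Evenly spaced tokens over the 0..2**127 range (integer division).
for i in range(num):
    print('node %d: %d' % (i, i * (2**127) // num))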
Calculating keys
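Putting the procedure and the key calculation together, the sketch below plans the growth of a ring by one node. It returns the token for the new node's cassandra.yaml and the `nodetool move` commands for the existing nodes. It assumes the existing hosts are listed in ring order and are currently balanced, and that the new node takes the last slot. `plan_expansion` is a hypothetical helper.

```python
RING_SIZE = 2 ** 127  # same token range as the key calculation


def balanced_tokens(num_nodes):
    """Evenly spaced tokens for a ring of num_nodes nodes."""
    return [i * RING_SIZE // num_nodes for i in range(num_nodes)]


def plan_expansion(hosts):
    """Return (token for the new node, nodetool commands for old nodes)."""
    old_tokens = balanced_tokens(len(hosts))
    new_tokens = balanced_tokens(len(hosts) + 1)
    moves = [
        "nodetool -h %s move %d" % (host, new)
        for host, old, new in zip(hosts, old_tokens, new_tokens)
        if old != new  # the node at token 0 never has to move
    ]
    # zip stops at the last existing host, so the final token is spare.
    return new_tokens[-1], moves


if __name__ == "__main__":
    token, moves = plan_expansion(
        ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
    )
    print("initial_token for new node: %d" % token)
    for command in moves:
        print(command)
```

Each move streams data between nodes, which is slow on a Pi, so this is a good illustration of why Vnodes would be welcome.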
31. *On Debian, you can free memory from the graphics chip
cd /boot
sudo cp start.elf start.elf.old
sudo cp arm224_start.elf start.elf
sudo reboot
Getting more memory
32. *Under Raspbian
*Run with a monitor plugged in the first time
*Set the options for screen memory
*Perhaps disable boot to GUI
Getting more Memory
36. *So for £200 we get an 8-node C* cluster
*It can be reconfigured, blown away, stress tested and generally abused
*We can simulate data racks, data centers and, I hope, even long network delays
*Hopefully our upcoming MSc in Data Science will use these clusters
Pi is for teaching
37. *We know C* can be configured to be aware of:
Network racks
Data Centers
*We know replicas can be stored across these racks
*How can we play with this cheaply?
C* is network aware
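One cheap way to play with rack and data center awareness is to label the Pis by hand. The sketch below generates lines in the `address=DC:RACK` form used by the property-file snitch's cassandra-topology.properties. The choice of snitch, the IP addresses, and the DC and rack names are all assumptions for illustration, and `topology_lines` is a hypothetical helper.

```python
def topology_lines(hosts, datacenters=2, racks_per_dc=2):
    """Spread hosts round-robin over simulated data centers and racks."""
    lines = []
    for i, host in enumerate(hosts):
        dc = i % datacenters + 1
        rack = (i // datacenters) % racks_per_dc + 1
        lines.append("%s=DC%d:RAC%d" % (host, dc, rack))
    return lines


if __name__ == "__main__":
    # Eight hypothetical Pi addresses, as in the 8-node teaching cluster.
    pis = ["192.168.1.%d" % (10 + i) for i in range(8)]
    for line in topology_lines(pis):
        print(line)
```

With the defaults, eight Pis become two simulated data centers of two racks each, with two nodes per rack.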