Hardware Agnostic: Cassandra on Raspberry Pi
Andy Cobley | Lecturer, University of Dundee, Scotland
*  Cassandra is hardware agnostic
*  So why not run it on a Raspberry Pi ?
*  How hard can it be ?
*  What can we do with it once it works?
Cassandra on Raspberry Pi
*  Andy Cobley
*  School of Computing
*  University of Dundee
*  Twitter: @andycobley
Who Am I ?
*  Single chip Linux computer
*  500 MB RAM
*  Boots off an SD card
*  Ethernet port
*  (graphics and all you need for a general purpose computer)
What's a Raspberry Pi?
Pi with pound coin
*  And here’s the Cassandra cluster *
And, here’s one for real
* Power Permitting !
*  Cassandra is designed to be fast: fast at writing, fast at reading.
*  This laptop with one instance of Cassandra will do 12,000 write operations
*  A Raspberry Pi will do 200!
The Bad News
*  Running an external USB drive is actually worse!
*  Probably a hardware feature
More bad news !
Raspberry Pi Schematic
*  Oracle Java vs OpenJDK
And then there’s Java!
*  Raspbian is Debian for the Pi
*  Uses the hardware floating-point unit (hard float)
*  Much faster than Debian
*  Current Oracle JDK won’t run on it !
And Raspbian
*  http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html
*  Java SE Embedded version 6
*  Cassandra might prefer 6
*  But
*  https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux
*  Preview at:
*  https://jdk8.java.net/fxarmpreview/
Oracle Java
*  Actually not much difference in performance
Hard vs Soft Float
*  Cassandra uses compression for performance
*  Started in version 1.0
    *  2x-4x reduction in data size
    *  25-35% performance improvement on reads
    *  5-10% performance improvement on writes
The Problem with compression
*  Two types:
    *  Google Snappy Compressor (faster reads/writes)
    *  DeflateCompressor (Java zip; slower, better compression)
*  Snappy Compression not available on Pi (requires native methods, so someone might get it to work!)
Compression types
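The size side of the compression trade-off can be tried out without a cluster. The sketch below uses Python's standard `zlib` module, which implements the Deflate algorithm; it is only a stand-in for DeflateCompressor and says nothing about Cassandra's on-disk format. Snappy is left out because it needs a native library. The function name `deflate_ratio` and the sample data are made up for illustration.

```python
import zlib


def deflate_ratio(data: bytes, level: int = 6) -> float:
    """Return original size divided by Deflate-compressed size."""
    compressed = zlib.compress(data, level)
    # Round-trip check: decompressing must give back the input.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Hypothetical sample: repetitive, row-like text of the kind that compresses well.
sample = b"".join(b"sensor-%04d,temperature,21.5\n" % i for i in range(2000))

if __name__ == "__main__":
    # Higher levels trade CPU time for (usually) smaller output.
    for level in (1, 6, 9):
        print("level %d: %.1fx smaller" % (level, deflate_ratio(sample, level)))
```

On a Pi the CPU cost of the higher levels matters more than on a laptop, so timing the loop on the board itself would be the natural next step.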
*  Startup script allocates memory
*  Calculates based on number of processors
*  Pi reports Zero processors !
*  Boom !
*  Now fixed
And the startup script
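The slide does not say exactly how the script failed or how it was fixed. The sketch below shows one plausible failure mode in Python: if the young-generation size is derived from the core count, zero cores gives a zero-sized generation. The sizing rules and the name `heap_sizes_mb` are assumptions loosely modelled on this kind of startup logic, not a copy of the real script.

```python
def heap_sizes_mb(system_memory_mb: int, cpu_cores: int) -> tuple:
    """Return (max_heap_mb, new_gen_mb) from memory size and core count."""
    # Guard: a board that reports zero cores is treated as having one.
    cores = max(cpu_cores, 1)

    # Assumed rule: max(min(half of RAM, 1024), min(quarter of RAM, 8192)).
    half = min(system_memory_mb // 2, 1024)
    quarter = min(system_memory_mb // 4, 8192)
    max_heap = max(half, quarter)

    # Assumed rule: the young generation is the smaller of 100 MB per core
    # and a quarter of the heap. Without the guard, zero cores gives 0 MB.
    new_gen = min(100 * cores, max_heap // 4)
    return max_heap, new_gen


if __name__ == "__main__":
    print(heap_sizes_mb(512, 0))
```

Clamping the core count to at least one is the simplest guard; the actual fix in Cassandra may differ.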
*  In cassandra-env.sh
*  JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=192.168.1.15"
*  Or else nodetool will not work between nodes
JMX Config
*  C* 1.2.2 added UseCondCardMark as a JVM option
*  "for better lock handling especially on hotspot with multicore processor"
*  In cassandra-env.sh
    #if [ "$JVM_VERSION" > "1.7" ] ; then
    #    JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
    #fi
JVM OPT UseCondCardMark
*  We’ve forgotten one thing
*  The Pi costs £25
*  You can power 4 from a USB hub (no need for a power supply on each one)
*  So:
The Good News !
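The two figures on this slide (£25 per board, four boards per USB hub) are enough for a rough costing. The sketch below is simple arithmetic under those two numbers only; hub, SD card and cabling prices are not given in the talk and are left out. `cluster_budget` is an illustrative name.

```python
import math

PI_PRICE_GBP = 25   # price per board, from the slide
PIS_PER_HUB = 4     # boards powered per USB hub, from the slide


def cluster_budget(nodes: int) -> dict:
    """Return board cost and hub count for a cluster of the given size."""
    return {
        "nodes": nodes,
        "boards_gbp": nodes * PI_PRICE_GBP,
        # Round up: a partly used hub still has to be bought.
        "usb_hubs": math.ceil(nodes / PIS_PER_HUB),
    }


if __name__ == "__main__":
    for size in (8, 64):
        print(cluster_budget(size))
```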
So, have a 64 node computer for £2000
University of Southampton
*  32 node Beowulf cluster:
*  Joshua Kiepert, Boise State University
Or this
*  Adding nodes adds performance
*  Adding nodes adds replicas of data
*  BUT
*  Make sure your ring is balanced.
*  Pis don’t like to be unbalanced.
Adding nodes is good
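Whether a ring is balanced can be checked from the node tokens alone. The sketch below assumes a token space of 0 to 2**127, as in the key calculation later in the talk, and assumes each node owns the range from the previous token up to its own. The names `ownership` and `is_balanced` and the 1% tolerance are illustrative choices.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # token space assumed here


def ownership(tokens):
    """Map each token to the fraction of the ring its node owns."""
    ordered = sorted(tokens)
    shares = {}
    for index, token in enumerate(ordered):
        previous = ordered[index - 1]  # index 0 wraps round to the last token
        width = (token - previous) % RING_SIZE
        # A single node owns the whole ring, not a zero-width range.
        shares[token] = Fraction(width or RING_SIZE, RING_SIZE)
    return shares


def is_balanced(tokens, tolerance=Fraction(1, 100)):
    """True if the largest and smallest shares differ by at most tolerance."""
    shares = ownership(tokens).values()
    return max(shares) - min(shares) <= tolerance


if __name__ == "__main__":
    three_of_four = [0, RING_SIZE // 4, RING_SIZE // 2]
    for token, share in ownership(three_of_four).items():
        print(token, float(share))
```

With tokens laid out for four nodes but only three present, one node ends up with half the ring, which is the kind of imbalance the slide warns about.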
*  Vnodes (in 1.2) would be very nice
*  However, at this point I haven’t got 1.2 running on a Pi cluster
Vnodes
Performance with 3/4 nodes
Performance with 5/6 nodes
*  ./stress -d 192.168.1.10,192.168.1.11,192.168.1.12 -o insert -I DeflateCompressor
*  Note the list of nodes to use (-d)
*  You will get different performance if you insert to fewer nodes than you have in your ring
Stress test commands
*  Adding a node (in the absence of Vnodes)
    *  Must seed from a known node
    *  Use a program to calculate new keys
    *  Bring up the new node with the correct key in cassandra.yaml
    *  Use nodetool to move other nodes
Adding Nodes Procedure
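The procedure above can be sketched as a small planning step: work out the evenly spaced tokens for the larger ring, give the last one to the new node, and list which existing nodes have to move. This is a sketch under the assumption that the ring was balanced beforehand and that the new node takes the last position; other placements are possible. `balanced_tokens` and `plan_growth` are hypothetical helper names.

```python
RING_SIZE = 2 ** 127  # same token space as the key calculation in the talk


def balanced_tokens(count):
    """Evenly spaced tokens for a ring of `count` nodes."""
    return [i * RING_SIZE // count for i in range(count)]


def plan_growth(old_count):
    """Plan growing a balanced ring by one node.

    Returns (token_for_new_node, moves) where moves is a list of
    (node_index, old_token, new_token) for nodes whose token changes.
    """
    old = balanced_tokens(old_count)
    new = balanced_tokens(old_count + 1)
    # Assumption: the new node takes the last slot; existing nodes keep order.
    moves = [(i, old[i], new[i]) for i in range(old_count) if old[i] != new[i]]
    return new[-1], moves


if __name__ == "__main__":
    token, moves = plan_growth(3)
    print("new node token:", token)
    for index, old_token, new_token in moves:
        print("node %d: move %d -> %d" % (index, old_token, new_token))
```

Each entry in `moves` corresponds to one `nodetool move` on an existing node; node 0 stays at token 0 and needs no move.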
*  Python code
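    import sys

    # Number of nodes: first command-line argument, otherwise ask.
    if len(sys.argv) > 1:
        num = int(sys.argv[1])
    else:
        num = int(input("How many nodes? : "))

    # Evenly spaced tokens on the 0..2**127 ring (integer division).
    for i in range(num):
        print('node %d: %d' % (i, i * (2 ** 127) // num))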
Calculating keys
*  Use nodetool
    sudo ./nodetool -h 192.168.1.10 move 42535295865117307932921825928971026432
*  And cleanup
    ./nodetool -h 192.168.1.10 cleanup
Moving existing nodes
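The token in the move command is not arbitrary: it is 2**127 divided by 4, that is, the token of node 1 in a balanced four-node ring. The short check below searches small balanced rings for the value; `find_position` is an illustrative helper, and the 64-node search limit is an arbitrary choice.

```python
RING_SIZE = 2 ** 127
TOKEN_FROM_SLIDE = 42535295865117307932921825928971026432


def find_position(token, max_nodes=64):
    """Return (node_index, node_count) for the smallest balanced ring
    that contains `token`, or None if no ring up to max_nodes does."""
    for count in range(1, max_nodes + 1):
        for index in range(count):
            if index * RING_SIZE // count == token:
                return index, count
    return None


if __name__ == "__main__":
    print(find_position(TOKEN_FROM_SLIDE))
```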
*  On Debian, you can free memory from the graphics chip
    cd /boot
    sudo cp start.elf start.elf.old
    sudo cp arm224_start.elf start.elf
    reboot
  
Getting more memory
*  Under Raspbian
*  Run with a monitor plugged in the first time
*  Set options for screen memory
*  Perhaps disable boot to GUI
Getting more Memory
*  I prefer static network addresses
*  Edit /etc/network/interfaces
    iface eth0 inet static
        address 192.168.1.41
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.254
Network address
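The network and broadcast lines in the stanza follow from the address and netmask, so they can be derived rather than typed by hand. The sketch below uses Python's standard `ipaddress` module to do that and to confirm the gateway sits on the same network. The function names are illustrative.

```python
import ipaddress


def derive_stanza_values(address: str, netmask: str) -> dict:
    """Derive the network and broadcast values for a static stanza."""
    iface = ipaddress.ip_interface("%s/%s" % (address, netmask))
    net = iface.network
    return {
        "address": str(iface.ip),
        "netmask": str(net.netmask),
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
    }


def gateway_on_network(address: str, netmask: str, gateway: str) -> bool:
    """True if the gateway address belongs to the interface's network."""
    net = ipaddress.ip_interface("%s/%s" % (address, netmask)).network
    return ipaddress.ip_address(gateway) in net


if __name__ == "__main__":
    print(derive_stanza_values("192.168.1.41", "255.255.255.0"))
    print(gateway_on_network("192.168.1.41", "255.255.255.0", "192.168.1.254"))
```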
*  Make a master SD card
*  Copy it !
*  Make sure the master version has no data on it.
*  Consider "Puppet" (though I don’t use it)
Multiple nodes
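Cards copied from one master all carry the same settings, so each node still needs its own address in /etc/network/interfaces and in the JMX hostname line of cassandra-env.sh. The sketch below only lists what differs per node; it assumes consecutive addresses starting from a first node, which is an illustrative choice, and the helper names are hypothetical.

```python
import ipaddress


def node_addresses(first: str, count: int) -> list:
    """Return `count` consecutive IPv4 addresses starting at `first`."""
    start = ipaddress.ip_address(first)
    return [str(start + offset) for offset in range(count)]


def jmx_hostname_line(address: str) -> str:
    """Build the cassandra-env.sh line that sets the JMX hostname."""
    return 'JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=%s"' % address


def per_node_settings(first: str, count: int) -> dict:
    """Map each node address to the JMX line that node needs."""
    return {a: jmx_hostname_line(a) for a in node_addresses(first, count)}


if __name__ == "__main__":
    for address, line in per_node_settings("192.168.1.10", 3).items():
        print(address, "->", line)
```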
*  See https://github.com/acobley/CassandraStartup
*  Put the file in /etc/init.d
*  update-rc.d cassandra defaults
Starting as a service
*  So for £200 we get an 8 node C* cluster
*  It can be reconfigured, blown away, stress tested and generally abused
*  We can simulate data racks, data centers and, I hope, even long network delays.
*  Hopefully our upcoming MSc in Data Science will use these clusters
Pi is for teaching
*  We know C* can be configured to be aware of:
    *  Network racks
    *  Data centers
*  We know replicas can be stored across these racks
*  How can we play with this cheaply ?
C* is network aware
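One cheap way to play with racks is to describe them in a topology file. The sketch below generates lines in the `address=datacenter:rack` form used by Cassandra's PropertyFileSnitch in cassandra-topology.properties. The talk does not name a snitch, so that choice is an assumption, and the names DC1, RAC1 and RAC2 and the addresses are made up.

```python
def topology_lines(racks: dict, datacenter: str = "DC1") -> list:
    """Return 'address=datacenter:rack' lines for every node in every rack."""
    lines = []
    for rack, addresses in sorted(racks.items()):
        for address in addresses:
            lines.append("%s=%s:%s" % (address, datacenter, rack))
    return lines


# Hypothetical layout: six Pis split over two racks in one data center.
example_racks = {
    "RAC1": ["192.168.1.10", "192.168.1.11", "192.168.1.12"],
    "RAC2": ["192.168.1.13", "192.168.1.14", "192.168.1.15"],
}

if __name__ == "__main__":
    print("\n".join(topology_lines(example_racks)))
```

Moving an address from one rack list to the other and restarting is then enough to try a different layout.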
Proposed teaching tool
[Diagram: Switch 1 and Switch 2, each serving Pi 1, Pi 2 and Pi 3; a 10 Mb/s hub; noise injection]
  
*  Cassandra wouldn’t run on a Pi
*  It does now.
*  Running it on a Pi shook out some Cassandra bugs
*  You can run it in a secure lab
Pi is discovery
*  Most important, this was pure Geeky Fun
Pi is for fun
*  Data Science:
*  http://www.computing.dundee.ac.uk/study/postgrad/degreedetails.asp?17
Obligatory Plug
*  Raspberry Pi is cheap
*  C* needs some work to run on it
*  You can make clusters cheaply for experimentation
*  It’s fun !
C* is Hardware Agnostic
THANK YOU

C* Summit 2013: Hardware Agnostic - Cassandra on Raspberry Pi by Andy Cobley
