Running Cassandra on Amazon EC2
Dave Gardner
@cassandralondon
Thanks
Reminder
Next meetup Wednesday 8th December
Jake Luciani will be giving a talk on
"Lucandra" (a Cassandra backend for
Lucene open source search software)
Quick intro to Cassandra
• Decentralized
• Fault-tolerant
• Tunable consistency
• Elasticity
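Tunable consistency means the client chooses, per request, how many of the N replicas must acknowledge a write (W) or answer a read (R). Below is a minimal sketch of the usual overlap rule, assuming Dynamo-style replication as in Cassandra. If R + W > N, every set of replicas that answers a read shares at least one member with every set that acknowledged the last write. The function names are illustrative and are not part of any Cassandra client API.

```python
def quorum(n: int) -> int:
    """Majority of n replicas (what Cassandra calls QUORUM)."""
    return n // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when any R replicas must overlap any W replicas."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("W and R must be between 1 and N")
    return r + w > n


# Replication factor 3 with a few common combinations of levels
n = 3
for w, r in [(1, 1), (quorum(n), quorum(n)), (n, 1)]:
    print(f"N={n} W={w} R={r} -> overlap: {read_sees_latest_write(n, w, r)}")
```

Writing and reading at QUORUM (2 of 3) overlaps. ONE/ONE does not overlap, so it is faster but may return stale data.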
This talk
Why consider EC2?
What are the challenges of running
Cassandra on EC2?
Is it a good idea?
Cassandra design decisions
Cassandra is designed to run on many
commodity servers
It is designed to deal with unreliable
hardware and networks
Why consider EC2?
On-demand instances
“frees you from the costs and complexities
of planning, purchasing, and maintaining
hardware and transforms what are
commonly large fixed costs into much
smaller variable costs”
http://aws.amazon.com/ec2/pricing/
Why consider EC2?
Multiple “Availability Zones” in multiple
regions (US East, US West, Ireland and
Singapore)
http://aws.amazon.com/ec2/
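Spreading a cluster over availability zones only helps if each row's replicas land in different zones. The toy sketch below makes two assumptions. First, tokens are assigned so that ring order alternates between zones. Second, replicas go to the next N nodes clockwise, as in SimpleStrategy-style placement. The zone and node names are placeholders.

```python
def interleave_by_zone(nodes_per_zone: dict[str, list[str]]) -> list[str]:
    """Order nodes round-robin across zones: a1, b1, c1, a2, b2, ..."""
    queues = {zone: list(nodes) for zone, nodes in nodes_per_zone.items()}
    ring = []
    while any(queues.values()):
        for zone in sorted(queues):
            if queues[zone]:
                ring.append(queues[zone].pop(0))
    return ring


def replica_sets(ring: list[str], rf: int) -> list[list[str]]:
    """Each range is stored on its node plus the next rf-1 nodes clockwise."""
    return [[ring[(i + k) % len(ring)] for k in range(rf)] for i in range(len(ring))]


nodes = {
    "eu-west-1a": ["a1", "a2"],
    "eu-west-1b": ["b1", "b2"],
    "eu-west-1c": ["c1", "c2"],
}
zone_of = {node: zone for zone, ns in nodes.items() for node in ns}
ring = interleave_by_zone(nodes)
for replicas in replica_sets(ring, rf=3):
    zones = {zone_of[n] for n in replicas}
    print(replicas, "spans", len(zones), "zones")
```

With this interleaving, losing a whole zone still leaves two of the three replicas of every range available.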
Writing to Cassandra
1. Write appended to the commit log on the
target machine
2. Memtable updated
3. Memtable flushed to disk as data
files (SSTable plus SSTable Index)
4. Eventually data files are compacted
http://wiki.apache.org/cassandra/ArchitectureOverview#Write_path
Disk I/O at steps 1, 3 and 4 (the memtable in step 2 lives in memory)
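The four write steps can be modelled in a few lines. This is a toy sketch for illustration, not Cassandra's implementation. The `ToyNode` class and its `flush_threshold` are hypothetical, flushing is triggered by key count rather than memtable size, and the "on disk" structures are plain Python lists and dicts. The comments mark where a real node would do disk I/O.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node's write path (not Cassandra's real code)."""
    flush_threshold: int = 3                          # hypothetical: flush after 3 keys
    commit_log: list = field(default_factory=list)    # step 1: append-only, on disk
    memtable: dict = field(default_factory=dict)      # step 2: in memory
    sstables: list = field(default_factory=list)      # step 3: immutable, on disk

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. sequential disk I/O
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:
            self.memtable[key] = (value, timestamp)      # 2. no disk I/O
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # 3. write the memtable out as a sorted, immutable SSTable
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed entries no longer need replaying

    def compact(self):
        # 4. merge all SSTables into one, keeping the newest value per key
        merged = {}
        for table in self.sstables:
            for key, (value, ts) in table.items():
                if key not in merged or ts >= merged[key][1]:
                    merged[key] = (value, ts)
        self.sstables = [dict(sorted(merged.items()))]


node = ToyNode()
writes = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5), ("e", 6)]
for ts, (k, v) in enumerate(writes):
    node.write(k, v, ts)
print(len(node.sstables))     # 2
node.compact()
print(node.sstables[0]["a"])  # (4, 3)
```

Before compaction, key `a` lives in two SSTables. That is exactly the situation that makes reads more expensive.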
Reading from Cassandra
1. Read from any node (the coordinator)
2. Partitioner picks the replicas that hold the key
3. Wait for R responses
4. Wait for the remaining N - R responses in the
background and perform read repair
http://wiki.apache.org/cassandra/ArchitectureOverview#Read_path
Disk I/O on the replicas answering steps 3 and 4
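Below is a simplified coordinator for the read path. It assumes every replica replies and that responses arrive in list order. Real Cassandra requests full data from some replicas and digests from others, and performs read repair asynchronously. `Replica` and `coordinator_read` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    data: dict  # key -> (value, timestamp)

    def read(self, key):
        return self.data.get(key)  # a real replica would do disk I/O here


def coordinator_read(replicas, key, r):
    """Answer from the first r responses, then read-repair all replicas."""
    responses = [(rep, rep.read(key)) for rep in replicas]
    first_r = [resp for _, resp in responses[:r] if resp is not None]
    answer = max(first_r, key=lambda vt: vt[1], default=None)

    # "Background" step: compare all N responses and push the newest
    # version to any replica that is missing it or holds an older one.
    newest = max((resp for _, resp in responses if resp is not None),
                 key=lambda vt: vt[1], default=None)
    if newest is not None:
        for rep, resp in responses:
            if resp is None or resp[1] < newest[1]:
                rep.data[key] = newest
    return answer


replicas = [
    Replica("node1", {"k": ("old", 1)}),
    Replica("node2", {"k": ("new", 2)}),
    Replica("node3", {"k": ("old", 1)}),
]
result = coordinator_read(replicas, "k", r=2)
print(result)                               # ('new', 2)
print([rep.data["k"] for rep in replicas])  # every replica now holds ('new', 2)
```

With `r=1` the coordinator would have answered from `node1` alone and returned the stale `("old", 1)`. Read repair would still fix the replicas afterwards.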
Reading from Cassandra
A read may need to merge data from multiple SSTables
The application's use case affects performance
and determines where the bottleneck is
(totally random reads are the worst case)
Each SSTable holding the row may cost a disk seek
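Random reads hurt because a row written over time can end up in several SSTables, and a read has to merge them. The toy sketch below assumes each SSTable is a dict and uses `key in table` as a stand-in for the Bloom filter and index checks a real node performs. The returned count approximates how many data files had to be read.

```python
def read_row(key, sstables):
    """Merge a row's columns across SSTables and count the tables touched.

    Each SSTable is modelled as {row_key: {column: (value, timestamp)}}.
    """
    merged, tables_read = {}, 0
    for table in sstables:
        row = table.get(key)
        if row is None:
            continue            # skipped without reading the data file
        tables_read += 1        # roughly one disk seek per table read
        for column, (value, ts) in row.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return {col: vt[0] for col, vt in merged.items()}, tables_read


sstables = [
    {"user1": {"name": ("Ann", 1), "city": ("Leeds", 1)}},
    {"user2": {"name": ("Bob", 2)}},
    {"user1": {"city": ("London", 3)}},
]
row, seeks = read_row("user1", sstables)
print(row, seeks)  # {'name': 'Ann', 'city': 'London'} 2
```

If the keys being read are random, none of these seeks are served from cache, so the cost grows with the number of SSTables per row.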
The challenges
Getting good enough I/O performance
Few resources on the Internet yet
(it is new and shiny)
Some minor setup and monitoring
challenges (documentation is available)
EC2 I/O performance
Ephemeral (instance store) or EBS storage; instance types
are rated as low, moderate or high I/O performance
“other resources like the network and the disk
subsystem are shared among instances…
when a resource is under-utilized you will
often be able to consume a higher share of
that resource”
http://aws.amazon.com/ec2/instance-types/
EBS or ephemeral?
Jonathan Ellis, recently on the mailing list:
“we recommend using raid0 ephemeral disks
on EC2 with L or XL instance sizes for better
i/o performance.”
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cold-boot-performance-problems-tp5615829p5615889.html
http://www.coreyhulen.org/?p=326
EBS or ephemeral?
Amazon suggest EBS is better:
“Amazon EBS is particularly suited for
applications that require a database, file
system, or access to raw block level storage”
http://aws.amazon.com/ebs/
“The latency and throughput of Amazon EBS
volumes is designed to be significantly better
than the Amazon EC2 instance stores in nearly
all cases. You can also attach multiple volumes
to an instance and stripe across the volumes.
This is one way to improve I/O rates,
especially if your application performs a lot of
random access across your data set.”
http://aws.amazon.com/ebs/
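RAID 0 striping (for example with mdadm on Linux) deals fixed-size chunks round-robin across the disks. As a result, large sequential reads use every disk and independent random reads spread out. Below is a sketch of the address mapping, assuming a 64 KiB chunk size chosen only for the example.

```python
def raid0_locate(offset: int, chunk_size: int, n_disks: int) -> tuple[int, int]:
    """Map a byte offset on a RAID 0 array to (disk index, offset on that disk)."""
    chunk = offset // chunk_size   # which chunk of the logical device
    disk = chunk % n_disks         # chunks are dealt round-robin to disks
    stripe = chunk // n_disks      # full stripes that come before this chunk
    return disk, stripe * chunk_size + offset % chunk_size


CHUNK = 64 * 1024
print(raid0_locate(0, CHUNK, 4))          # (0, 0)
print(raid0_locate(CHUNK, CHUNK, 4))      # (1, 0)
print(raid0_locate(4 * CHUNK, CHUNK, 4))  # (0, 65536)
```

A 256 KiB sequential read starting at offset 0 touches all four disks once each. That is why striping raises throughput, although every disk (or EBS volume) must stay healthy for the array to survive.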
EC2 I/O benchmark
Throughput measured using dd
Seek measured using seeker.c
Software RAID uses mdadm
http://www.linuxinsight.com/how_fast_is_your_disk.html
http://en.wikipedia.org/wiki/Mdadm
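To take repeatable measurements of your own, you can use the rough random-read test below, written in the spirit of seeker.c (it is not a port of it). It assumes a Unix system, because it uses `os.pread`. The target should be a raw device such as `/dev/md0` (which needs root) or a file much larger than RAM. Otherwise the page cache answers the reads and the figure is meaningless.

```python
import os
import random
import time


def random_read_rate(path: str, reads: int = 1000, block: int = 512) -> float:
    """Read `block` bytes at `reads` random aligned offsets; return reads/second."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)  # works for files and block devices
        n_blocks = size // block
        if n_blocks == 0:
            raise ValueError("target is smaller than one block")
        start = time.perf_counter()
        for _ in range(reads):
            offset = random.randrange(n_blocks) * block
            os.pread(fd, block, offset)      # read without moving a shared file position
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed
```

For example, `random_read_rate("/dev/md0")` could be run on the striped array and again on a single volume to compare the two. Sequential throughput is still easiest to measure with `dd`, as on the slide.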
Which is better?
EBS has better throughput; ephemeral is
better for random seeks
Generic benchmarks aren’t great – it
depends on your use case
Warning: EC2 performance is not
consistent
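EC2 I/O varies from run to run and from instance to instance, so a single benchmark number can mislead. The small helper below reports the spread of results as well as the mean. It assumes you have already collected repeated results, for example reads/s from several runs, and the sample figures are made up.

```python
from statistics import mean, stdev


def summarize_runs(results: list[float]) -> dict[str, float]:
    """Summarize repeated benchmark runs; a high CV means noisy results."""
    if len(results) < 2:
        raise ValueError("need at least two runs to measure variation")
    avg = mean(results)
    return {
        "mean": avg,
        "min": min(results),
        "max": max(results),
        "cv": stdev(results) / avg,  # coefficient of variation (sample stdev / mean)
    }


# Hypothetical reads/s from three runs of the same benchmark
print(summarize_runs([200.0, 100.0, 300.0]))
```

If the coefficient of variation is large, differences between storage options smaller than that spread are probably noise.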
EC2 Cassandra benchmark
Read and write TPS
Benchmarks carried out by Corey Hulen
http://www.coreyhulen.org/?p=326
Which is better?
Corey suggests:
“Raid 0 EBS drives are the way to go”
“We didn’t notice a difference above the
normal EC2 fluctuations when testing
for 2 vs 4 drives”
Conclusions
Cassandra will run acceptably on EC2, but
real HW is better
It will depend on your use case –
particularly the types of read that you do
Real HW may work out cheaper
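Whether real hardware works out cheaper is simple arithmetic once you have real quotes. The sketch below compares one owned server with one on-demand instance running around the clock. Every figure is hypothetical, and the model ignores admin time, reserved pricing, and the fact that one physical box may replace several instances.

```python
import math


def breakeven_months(server_cost: float, server_monthly: float,
                     instance_hourly: float, hours_per_month: float = 730) -> float:
    """Months after which owning a server becomes cheaper than renting an instance.

    server_cost: up-front purchase price (fixed cost)
    server_monthly: hosting, power, etc. per month for the owned server
    instance_hourly: on-demand price per instance-hour (variable cost)
    """
    rent_per_month = instance_hourly * hours_per_month
    saving = rent_per_month - server_monthly
    if saving <= 0:
        return math.inf  # renting never costs more than owning
    return server_cost / saving


# Made-up numbers: 3000 up front, 100/month hosting vs 0.50/hour on demand
print(f"{breakeven_months(3000, 100, 0.5):.1f} months")  # 11.3 months
```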
Conclusions
Ephemeral I/O seems to be better than
EBS, although EBS has other advantages
(data doesn’t disappear if you stop the node)
Again, it will depend on your use case
Conclusions
Large nodes are the best bet
Small nodes have poor I/O
Extra large nodes are probably not
worth it (better to have more nodes)
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Nodes-dropping-out-of-cluster-due-to-GC-tp5128481p5131568.html
Questions?
Please leave feedback on meetup.com
Follow @cassandralondon on Twitter