SriSatish Ambati
Performance, Riptano, Cassandra
Azul Systems & OpenJDK
Twitter: @srisatish
srisatish.ambati@gmail.com
Cache & Concurrency considerations
for a high performance Cassandra
Trail ahead
Elements of Cache Performance
Metrics, Monitors
JVM goes to BigData Land!
Examples
Lucandra, Twissandra
Cassandra Performance with JVM
Commentary
Runtime Views
Non Blocking HashMap
Locking: concurrency
Garbage Collection
A feather in the CAP
• Eventual Consistency
– Levels
– Doesn’t mean data loss (journaled)
• SEDA
– Partitioning, Cluster & Failure detection, Storage engine mod
– Event driven & non-blocking io
– Pure Java
Count what is countable, measure what is measurable, and what is not
measurable, make measurable
-Galileo
Elements of Cache Performance
Metrics
• Operations:
– Ops/s: Puts/sec, Gets/sec, updates/sec
– Latencies, percentiles
– Indexing
• # of nodes – scale, elasticity
• Replication
– Synchronous, Asynchronous (fast writes)
• Tuneable Consistency
• Durability/Persistence
• Size & Number of Objects, Size of Cache
• # of user clients
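The operation metrics above (ops/s, latency percentiles) can be computed from raw samples in a few lines. This is a minimal sketch, not part of the original deck: the class name LatencyStats and the nearest-rank percentile definition are assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; names are not from Cassandra.
class LatencyStats {
    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * p percent of all samples are less than or equal to it.
     */
    static long percentile(long[] samplesMicros, double p) {
        if (samplesMicros.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMicros.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Operations per second over a measured wall-clock window. */
    static double opsPerSecond(long ops, long elapsedNanos) {
        return ops / (elapsedNanos / 1_000_000_000.0);
    }
}
```

A single slow outlier barely moves the median but dominates the 99th percentile, which is why the slide lists percentiles next to plain latencies.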
Elements of Cache Performance:
“Think Locality”
• Hot or Not: The 80/20 rule.
– A small set of objects are very popular!
– What is the most RT tweet?
• Hit or Miss: Hit Ratio
– How effective is your cache?
– LRU, LFU, FIFO.. Expiration
• Long-lived objects lead to better locality.
• Spikes happen
– Cascading events
– Cache Thrash: full table scans
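The hit-ratio and LRU ideas above can be sketched with the JDK's LinkedHashMap in access order. This is an illustrative sketch, not Cassandra's cache: the class name LruCache is hypothetical, and the coarse synchronized methods are a simplification that the later slides on locking argue against for hot paths.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache with a hit-ratio counter, for illustration only.
class LruCache<K, V> {
    private final Map<K, V> map;
    private long hits;
    private long misses;

    LruCache(final int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently accessed.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;   // evict the least recently used entry
            }
        };
    }

    synchronized V get(K key) {
        V value = map.get(key);
        if (value == null) {
            misses++;
        } else {
            hits++;
        }
        return value;
    }

    synchronized void put(K key, V value) {
        map.put(key, value);
    }

    /** Fraction of get() calls that found a value; 0.0 before any lookups. */
    synchronized double hitRatio() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A full scan over more keys than the capacity evicts every hot entry in turn, which is the cache-thrash case named in the last bullet.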
Real World Performance
• Facebook Inbox
– Writes:0.12ms, Reads:15ms @ 50GB data
• Twitter performance
– Twissandra (simulation)
• Cassandra for Search & Portals
– Lucandra, solandra (simulation)
• YCSB/PNUTS benchmarks
– 5ms read/writes @ 5k ops/s (50/50 Update heavy)
– 8ms reads/5ms writes @ 5k ops/s (95/5 read heavy)
• Lab environment
– ~5k writes per sec per node, <5ms latencies
– ~10k reads per sec per node, <5ms latencies
• Performance has improved in newer versions
Yahoo! Cloud Serving Benchmark (YCSB)
50/50 – Update Heavy
Yahoo! Cloud Serving Benchmark (YCSB)
95/5 – Read Heavy
JVM in BigData Land!
Limits for scale
• Locks : synchronized
– Can’t use all my multi-cores!
– java.util synchronized collections (Hashtable, Vector, Collections.synchronizedMap) also hold locks
– Use non-blocking collections!
• (de)Serialization is expensive
– Hampers object portability
– Use avro, thrift!
• Object overhead
– average enterprise collection has 3 elements!
– Use byte[ ], primitives where possible!
• Garbage Collection
– Can’t throw memory at the problem!
– Mitigate, Monitor, Measure foot print
Tools
• What is the JVM doing:
– dtrace, hprof, introscope, jconsole,
visualvm, yourkit, azul zvision
• Invasive JVM observation tools
– bci, jvmti, jvmdi/pi agents, jmx, logging
• What is the OS doing:
– dtrace, oprofile, vtune
• What are the network & disk doing:
– Ganglia, iostat, lsof, netstat, nagios
furiously fast writes
• Append only writes
– Sequential disk access
• No locks in critical path
• Key based atomicity
[Diagram: write path. Labels: client issues write; n1; partitioner; find node; n2; commit log; apply to memory]
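The two steps named on this slide, an append-only commit log followed by an in-memory apply, can be sketched as follows. This is a minimal sketch under stated assumptions, not Cassandra's code: the class WritePathSketch, the length-prefixed record layout, and the use of ConcurrentSkipListMap as the memtable are all hypothetical, and the sketch serializes log appends with a monitor for simplicity.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical write path: commit log append, then memtable update.
class WritePathSketch implements AutoCloseable {
    private final DataOutputStream commitLog;
    // Sorted in-memory map standing in for a memtable (an assumption of this sketch).
    private final ConcurrentSkipListMap<String, byte[]> memtable =
            new ConcurrentSkipListMap<>();

    WritePathSketch(File logFile) throws IOException {
        // append=true: every record lands at the end of the file (sequential disk access).
        this.commitLog = new DataOutputStream(new FileOutputStream(logFile, true));
    }

    void write(String key, byte[] value) throws IOException {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        // Simplification: one monitor keeps each record contiguous in the log.
        synchronized (commitLog) {
            commitLog.writeInt(k.length);
            commitLog.write(k);
            commitLog.writeInt(value.length);
            commitLog.write(value);
            commitLog.flush();
        }
        memtable.put(key, value);   // only after the record is journaled
    }

    byte[] read(String key) {
        return memtable.get(key);
    }

    @Override
    public void close() throws IOException {
        commitLog.close();
    }
}
```

Because the log is only ever appended to, the disk head does not seek between writes, which is the reason the next slide recommends a separate disk for the commit log.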
furiously fast writes
• Use separate disks for commitlog
– Don’t forget to size them well
– Isolation difficult in the cloud..
• Memtable/SSTable sizes
– Delicately balanced with GC
• memtable_throughput_in_mb
Cassandra on EC2 cloud
*Corey Hulen, EC2
Cassandra on EC2 cloud
Compactions
[Diagram: three sorted SSTables of serialized data, with keys (K1, K2, K3), (K2, K10, K30) and (K4, K5, K10), are merge-sorted into one sorted data file with keys K1, K2, K3, K4, K5, K10, K30. Other labels: "MERGE SORT", "Loaded in memory", "Index File" (K1, K5, K30 offsets), "Bloom Filter", "Data File", "DELETED"]
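The merge step in the diagram can be sketched as a k-way merge over sorted inputs. This is an illustrative sketch, not Cassandra's compaction code: the Row type, the integer keys, the generation number used to pick the newest version of a key, and the immediate dropping of tombstones are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical k-way merge of sorted SSTable-like lists.
class CompactionSketch {
    /** One row; value == null marks a tombstone (deleted row). */
    static final class Row {
        final int key;
        final String value;
        final int generation;   // higher means written more recently
        Row(int key, String value, int generation) {
            this.key = key;
            this.value = value;
            this.generation = generation;
        }
    }

    /** Cursor over one input; ordered by key, then newest generation first. */
    private static final class Cursor implements Comparable<Cursor> {
        Row head;
        final Iterator<Row> rest;
        Cursor(Iterator<Row> it) {
            this.rest = it;
            this.head = it.next();
        }
        @Override
        public int compareTo(Cursor o) {
            int c = Integer.compare(head.key, o.head.key);
            return c != 0 ? c : Integer.compare(o.head.generation, head.generation);
        }
    }

    /** Merges sorted inputs, keeping the newest version of each key and dropping tombstones. */
    static List<Row> compact(List<List<Row>> sstables) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<Row> table : sstables) {
            if (!table.isEmpty()) {
                heap.add(new Cursor(table.iterator()));
            }
        }
        List<Row> out = new ArrayList<>();
        Integer lastKey = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            Row r = c.head;
            if (lastKey == null || r.key != lastKey) {
                lastKey = r.key;            // first row seen for a key is its newest version
                if (r.value != null) {
                    out.add(r);             // tombstones are not copied to the output
                }
            }
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);                // re-insert with its next row as the new head
            }
        }
        return out;
    }
}
```

Each input is read once, front to back, so the merge itself is sequential I/O; the cost noted on the following slide comes from the volume of data rewritten and the garbage created along the way.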
Compactions
• Intense disk io & mem churn
• Triggers GC for tombstones
• Minor/Major Compactions
• Reduce priority for better reads
• Other Parameters -
– CompactionManager.
minimumCompactionThreshold=xxxx
Example: compaction in the real world, Cloudkick
reads design
reads performance
• BloomFilter used to identify the right file
• Maintain column indices to look up columns
– Which can span different SSTables
• Less io than typical b-tree
• Cold read: Two seeks
– One for Key lookup, another row lookup
• Key Cache
– Optimized in latest cassandra
• Row Cache
– Improves read performance
– GC sensitive for large rows.
• Most (google) applications require single row transactions*
*Sanjay G, BigTable Design, Google.
Client Performance
Marshal Arts:
Ser/Deserialization
• Clients dominated by Thrift, Avro
– Hector, Pelops
• Thrift: upgrade to latest: 0.5, 0.4
• No news: java.io.Serializable is S.L..O.…W
• Use “transient”
• avro, thrift, proto-buf
• Common Patterns of Doom:
– Death by a million gets
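The "Use transient" tip above can be shown with plain java.io serialization: a field marked transient is skipped on write and comes back as its default value on read. This is a minimal sketch; the Tweet class and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example of excluding a derived field from java.io serialization.
class TransientSketch {
    static class Tweet implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        // Derived data: cheap to recompute, so it is not worth writing to the stream.
        transient String renderedHtml;

        Tweet(String text) {
            this.text = text;
            this.renderedHtml = "<p>" + text + "</p>";
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

Even with transient fields trimmed, the stream still carries class metadata, which is one reason the slide points to avro, thrift and proto-buf instead.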
Serialization + Deserialization
uBench
• http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
Adding Nodes
• New nodes
– Add themselves to busiest node
– And then Split its Range
• Busy node starts transmitting to the new node
• Bootstrap logic initiated from any node, cli, web
• Each node capable of ~40MB/s
– Multiple replicas to parallelize bootstrap
• UDP for control messages
• TCP for request routing
inter-node comm
• Gossip Protocol
– It’s exponential
– (epidemic algorithm)
• Failure Detector
– Accrual rate phi
• Anti-Entropy
– Bringing replicas up to date
Bloom Filter: in full bloom
• “constant” time
• size:compact
• false positives
• Single lookup
for key in file
• Deletion
• Improve
– Counting BF
– Bloomier filters
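The properties listed above (compact, constant-time, false positives but no false negatives, no deletion) follow directly from the data structure. Here is a minimal sketch, not Cassandra's BloomFilter class: the name BloomFilterSketch, the double-hashing scheme and the second hash derived from String.hashCode() are assumptions for illustration.

```java
import java.util.BitSet;

// Hypothetical Bloom filter over String keys, for illustration only.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** i-th bit index for a key, derived from two base hashes (double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | (h1 << 16);   // cheap second hash; an assumption of this sketch
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present. */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Deletion is not supported because clearing a bit could also erase evidence of another key that shares it; the counting variant named above replaces each bit with a small counter to get around this.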
Birthdays, Collisions &
Hashing functions
• Birthday Paradox
For the N=21 people in this room
Probability that at least 2 of them share same birthday is
~0.44
• Collisions are real!
• An unbalanced HashMap behaves like a list O(n) retrieval
• Chaining & Linear probing
• Performance degrades with 80% table density
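The birthday probability is easy to compute exactly, and the same formula gives the chance of at least one collision when n keys hash uniformly into a table of a given size. A minimal sketch, assuming equally likely days (or buckets); the class and method names are hypothetical.

```java
// Hypothetical helper: exact birthday-collision probability.
class BirthdayParadox {
    /**
     * Probability that at least two of n people share a birthday,
     * assuming `days` equally likely birthdays.
     */
    static double collisionProbability(int n, int days) {
        double allDistinct = 1.0;
        for (int i = 0; i < n; i++) {
            // The (i+1)-th person must avoid the i birthdays already taken.
            allDistinct *= (double) (days - i) / days;
        }
        return 1.0 - allDistinct;
    }

    public static void main(String[] args) {
        for (int n = 20; n <= 24; n++) {
            System.out.printf("n=%d  p=%.3f%n", n, collisionProbability(n, 365));
        }
    }
}
```

The probability crosses one half at n = 23, far below the 365 slots available, which is why collisions show up in hash tables long before they are full.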
the devil’s in the details
CFS
• All in the family!
• denormalize
Memtable
• In-memory
• ColumnFamily specific
• throughput determines size before flush
• Larger memtables can improve reads
SSTable
• MemTable “flushes” to a SSTable
• Immutable after
• Read: Multiple SSTable lookups possible
• Chief Execs:
– SSTableWriter
– SSTableReader
Write: Runtime threads
Writes: runtime mem
Example: Java Overheads
writes: monitors
UUID
• java.util.UUID is slow
– static use leads to contention
SecureRandom
• Uses /dev/urandom for seed initialization
-Djava.security.egd=file:/dev/urandom
• PRNG without file is at least 20%-40% better.
• Use TimeUUIDs where possible – much faster
• JUG – java.uuid.generator
• http://github.com/cowtowncoder/java-uuid-generator
• http://jug.safehaus.org/
• http://johannburkard.de/blog/programming/java/Java-UUID-generators-compared.html
synchronized
• Coarse grained locks
• io under lock
• Stop signal on a highway
• java.util.concurrent does not mean no locks
• Non Blocking, Lock free, Wait free collections
Scalable Lock-Free Coding Style
• Big Array to hold Data
• Concurrent writes via: CAS & Finite State Machine
– No locks, no volatile
– Much faster than locking under heavy load
– Directly reach main data array in 1 step
• Resize as needed
– Copy Array to a larger Array on demand
– Use State Machine to help copy
– “Mark” old Array words to avoid missing late updates
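The core move in this style, claiming a slot in one big array with a single CAS, can be sketched with the JDK's atomic classes. This is a deliberately reduced illustration and not the high-scale-lib implementation: the class CasSetSketch is hypothetical, it stores keys only, it never resizes, and it relies on AtomicReferenceArray rather than the raw, non-volatile array access the slide describes.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical fixed-size, lock-free set using CAS and linear probing.
class CasSetSketch {
    private final AtomicReferenceArray<String> slots;

    CasSetSketch(int capacity) {
        this.slots = new AtomicReferenceArray<>(capacity);
    }

    /** Returns true if the key was newly added, false if it was already present. */
    boolean add(String key) {
        int n = slots.length();
        int start = Math.floorMod(key.hashCode(), n);
        for (int probe = 0; probe < n; probe++) {
            int i = (start + probe) % n;
            String current = slots.get(i);
            if (current == null) {
                // State transition empty -> key; only one thread can win this CAS.
                if (slots.compareAndSet(i, null, key)) {
                    return true;
                }
                current = slots.get(i);   // lost the race: a slot never becomes empty again
            }
            if (current.equals(key)) {
                return false;
            }
        }
        throw new IllegalStateException("table full (this sketch does not resize)");
    }

    boolean contains(String key) {
        int n = slots.length();
        int start = Math.floorMod(key.hashCode(), n);
        for (int probe = 0; probe < n; probe++) {
            String current = slots.get((start + probe) % n);
            if (current == null) {
                return false;   // an empty slot ends the probe sequence
            }
            if (current.equals(key)) {
                return true;
            }
        }
        return false;
    }
}
```

A thread that loses the CAS never blocks; it simply re-reads the slot and moves on, so no thread can stall the others by being descheduled while holding a lock.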
Non-Blocking HashMap
[Chart: Non-Blocking HashMap (NB) vs ConcurrentHashMap (CHM) throughput on Azul Vega2, 768 cpus. Two panels: 1K Table and 1M Table. X-axis: Threads (0 to 800); y-axis: M-ops/sec (0 to 1200). Series: NB-99, CHM-99, NB-75, CHM-75]
Cassandra uses High Scale
Non-Blocking Hashmap
public class BinaryMemtable implements IFlushable
{
    …
    private final Map<DecoratedKey, byte[]> columnFamilies =
        new NonBlockingHashMap<DecoratedKey, byte[]>();
    /* Lock and Condition for notifying new clients about Memtable switches */
    private final Lock lock = new ReentrantLock();
    Condition condition;
    …
}
public class Table
{
    …
    private static final Map<String, Table> instances =
        new NonBlockingHashMap<String, Table>();
    …
}
GC-sensitive elements within
Cassandra
• Compaction triggers System.gc()
– Tombstones from files
• “GCInspector”
• Memtable Threshold, sizes
• SSTable sizes
• Low overhead collection choices
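Collector activity of the kind a GC inspector watches is available in-process through the standard java.lang.management API. The sketch below is not Cassandra's GCInspector; the class GcSnapshot and its methods are hypothetical, and only the JDK's GarbageCollectorMXBean calls are real.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that sums GC counters exposed by the running JVM.
class GcSnapshot {
    /** Approximate total milliseconds spent in GC so far, over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();   // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Total number of collections so far, over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();   // -1 when undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** One line per collector: name, collection count, accumulated time. */
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }
}
```

Sampling these counters at a fixed interval and logging the deltas gives a cheap view of how much time memtable flushes and compactions are costing in collection work.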
Garbage Collection
• Pause Times
if stop_the_world_FullGC > ttl_of_node
=> failed requests; failure accrual & node repair.
• Allocation Rate
– New object creation, insertion rate
• Live Objects (residency)
– if residency in heap > 50%
– GC overheads dominate.
• Overhead
– space, cpu cycles spent GC
• 64-bit not addressing pause times
– Bigger is not better!
Memory Fragmentation
• Fragmentation
– Performance degrades over time
– Inducing “Full GC” makes problem go away
– Free memory that cannot be used
• Reduce occurrence
– Use a compacting collector
– Promote less often
– Use uniform sized objects
• Solution – unsolved
– Use latest CMS with CR:6631166
– Azul’s Zing JVM & Pauseless GC
CASSANDRA-1014
Best Practices:
Garbage Collection
• GC Logs are cheap even in
production
-Xloggc:/var/log/cassandra/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintHeapAtGC
• Slightly expensive ones:
-XX:PrintFLSStatistics=2 -XX:PrintCMSStatistics=1
-XX:+PrintCMSInitiationStatistics
Sizing: Young Generation
• Should we set -Xms == -Xmx ?
• Use -Xmn (fixed eden)
[Diagram: heap layout. Labels: allocations {new Object();}; eden; survivor spaces; survivor ratio; promotion; Tenuring Threshold; old generation; allocation by jvm]
Tuning CMS
• Don’t promote too often!
– Frequent promotion causes fragmentation
• Size the generations
– Min GC times are a function of Live Set
– Old Gen should host steady state comfortably
• Parallelize on multicores:
– -XX:ParallelCMSThreads=4
– -XX:ParallelGCThreads=4
• Avoid CMS Initiating heuristic
– -XX:+UseCMSInitiatingOccupancyOnly
• Use Concurrent for System.gc()
– -XX:+ExplicitGCInvokesConcurrent
Summary
Design & Implementation of Cassandra takes advantage
of JVM strengths while avoiding common JVM issues.
• Locks:
– Avoids locks in critical path
– Uses non-blocking collections, TimeUUIDs!
– Still Can’t use all my multi-cores..?
>> Other bottlenecks to find!
• De/Serialization:
– Uses avro, thrift!
• Object overhead
– Uses mostly byte[ ], primitives where possible!
• Garbage Collection
– Mitigate: Monitor, Measure foot print.
– Work in progress by all jvm vendors!
Cassandra starts from a great footing from a JVM standpoint
and will reap the benefits of the platform!
Q&A
References
• Werner Vogels, Eventually Consistent
http://www.allthingsdistributed.com/2008/12/eventually_consistent.htm
• Bloom, Burton H. (1970), "Space/time trade-offs in hash coding with allowable errors"
• Avinash Lakshman, http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf
• Eric Brewer, CAP
http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
• Tony Printezis, Charlie Hunt, JavaOne Talk
http://www.scribd.com/doc/36090475/GC-Tuning-in-the-Java
• http://github.com/digitalreasoning/PyStratus/wiki/Documentation
• http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf
• Cassandra on Cloud, http://www.coreyhulen.org/?p=326
• Cliff Click, Non-blocking HashMap
http://sourceforge.net/projects/high-scale-lib/
• Brian F. Cooper, Yahoo! Cloud Serving Benchmark,
http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

Mais conteúdo relacionado

Mais procurados

Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSDataStax Academy
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaPostgreSQL-Consulting
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsRomain Jacotin
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlLeon Chen
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Spark Summit
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldJignesh Shah
 
Divide and conquer in the cloud
Divide and conquer in the cloudDivide and conquer in the cloud
Divide and conquer in the cloudJustin Swanhart
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
How Shit Works: Storage
How Shit Works: StorageHow Shit Works: Storage
How Shit Works: StorageTomer Gabel
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practiceAlexey Lesovsky
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataJignesh Shah
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation FactoryNathan Milford
 
Diagnosing MySQL performance problems
Diagnosing  MySQL performance problemsDiagnosing  MySQL performance problems
Diagnosing MySQL performance problemsJustin Swanhart
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALEPostgreSQL Experts, Inc.
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low LatencyNick Dimiduk
 

Mais procurados (20)

Shootout at the AWS Corral
Shootout at the AWS CorralShootout at the AWS Corral
Shootout at the AWS Corral
 
PostgreSQL on Solaris
PostgreSQL on SolarisPostgreSQL on Solaris
PostgreSQL on Solaris
 
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWSCassandra Summit 2014: Performance Tuning Cassandra in AWS
Cassandra Summit 2014: Performance Tuning Cassandra in AWS
 
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 ViennaAutovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
Autovacuum, explained for engineers, new improved version PGConf.eu 2015 Vienna
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
The Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systemsThe Google Chubby lock service for loosely-coupled distributed systems
The Google Chubby lock service for loosely-coupled distributed systems
 
Introduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission ControlIntroduction of Java GC Tuning and Java Java Mission Control
Introduction of Java GC Tuning and Java Java Mission Control
 
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
Deconstructiong Recommendations on Spark-(Ilya Ganelin, Capital One)
 
Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
Divide and conquer in the cloud
Divide and conquer in the cloudDivide and conquer in the cloud
Divide and conquer in the cloud
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
How Shit Works: Storage
How Shit Works: StorageHow Shit Works: Storage
How Shit Works: Storage
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
 
7 Ways To Crash Postgres
7 Ways To Crash Postgres7 Ways To Crash Postgres
7 Ways To Crash Postgres
 
The Automation Factory
The Automation FactoryThe Automation Factory
The Automation Factory
 
Diagnosing MySQL performance problems
Diagnosing  MySQL performance problemsDiagnosing  MySQL performance problems
Diagnosing MySQL performance problems
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
Apache HBase Low Latency
Apache HBase Low LatencyApache HBase Low Latency
Apache HBase Low Latency
 

Destaque

【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門
【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門
【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門Brecht Huang
 
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...CocoaHeads France
 
Introduction to Bootstrap
Introduction to BootstrapIntroduction to Bootstrap
Introduction to BootstrapRon Reiter
 

Destaque (6)

【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門
【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門
【iT邦幫忙交流聚(1)】讓你的網站更潮-Twitter Boostrap入門
 
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...
Genius scan - Du boostrap à 20 millions d’utilisateurs, techniques et outils ...
 
Boostrap - Start Up
Boostrap - Start UpBoostrap - Start Up
Boostrap - Start Up
 
Bootstrap
BootstrapBootstrap
Bootstrap
 
Bootstrap latihan
Bootstrap latihanBootstrap latihan
Bootstrap latihan
 
Introduction to Bootstrap
Introduction to BootstrapIntroduction to Bootstrap
Introduction to Bootstrap
 

Semelhante a ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)

Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraJon Haddad
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionDataStax Academy
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionDataStax Academy
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Jon Haddad
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionDataStax Academy
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in JavaRuben Badaró
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchJoe Alex
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...srisatish ambati
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesHazelcast
 
Cassandra
CassandraCassandra
Cassandraexsuns
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future DesignPivotalOpenSourceHub
 
Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Charles Nutter
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Speedment, Inc.
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]Malin Weiss
 

Semelhante a ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM) (20)

Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in ProductionCassandra Day Atlanta 2015: Diagnosing Problems in Production
Cassandra Day Atlanta 2015: Diagnosing Problems in Production
 
Cassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in ProductionCassandra Day Chicago 2015: Diagnosing Problems in Production
Cassandra Day Chicago 2015: Diagnosing Problems in Production
 
Cassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in ProductionCassandra Day London 2015: Diagnosing Problems in Production
Cassandra Day London 2015: Diagnosing Problems in Production
 
Advanced Operations
Advanced OperationsAdvanced Operations
Advanced Operations
 
Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)Diagnosing Problems in Production (Nov 2015)
Diagnosing Problems in Production (Nov 2015)
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
Writing Scalable Software in Java
Writing Scalable Software in JavaWriting Scalable Software in Java
Writing Scalable Software in Java
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Managing Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using ElasticsearchManaging Security At 1M Events a Second using Elasticsearch
Managing Security At 1M Events a Second using Elasticsearch
 
Server Tips
Server TipsServer Tips
Server Tips
 
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
JavaOne 2010: Top 10 Causes for Java Issues in Production and What to Do When...
 
In-memory Data Management Trends & Techniques
In-memory Data Management Trends & TechniquesIn-memory Data Management Trends & Techniques
In-memory Data Management Trends & Techniques
 
Cassandra
CassandraCassandra
Cassandra
 
#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design#GeodeSummit - Off-Heap Storage Current and Future Design
#GeodeSummit - Off-Heap Storage Current and Future Design
 
Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014Bringing Concurrency to Ruby - RubyConf India 2014
Bringing Concurrency to Ruby - RubyConf India 2014
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
JavaOne2016 - Microservices: Terabytes in Microseconds [CON4516]
 

Mais de srisatish ambati

H2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business TransformationH2O Open Dallas 2016 keynote for Business Transformation
H2O Open Dallas 2016 keynote for Business Transformationsrisatish ambati
 
Digital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open SourceDigital Transformation with AI and Data - H2O.ai and Open Source
Digital Transformation with AI and Data - H2O.ai and Open Sourcesrisatish ambati
 
Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.Top 10 Performance Gotchas for scaling in-memory Algorithms.
Top 10 Performance Gotchas for scaling in-memory Algorithms.srisatish ambati
 
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoopJava one2011 brisk-and_high_order_bits_from_cassandra_and_hadoop
Java one2011 brisk-and_high_order_bits_from_cassandra_and_hadoopsrisatish ambati
 
High order bits from cassandra & hadoop
High order bits from cassandra & hadoopHigh order bits from cassandra & hadoop
High order bits from cassandra & hadoopsrisatish ambati
 
High order bits from cassandra & hadoop
High order bits from cassandra & hadoopHigh order bits from cassandra & hadoop
High order bits from cassandra & hadoopsrisatish ambati
 
Brisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavaBrisk hadoop june2011_sfjava
Brisk hadoop june2011_sfjavasrisatish ambati
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccsrisatish ambati
 
Svccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandraSvccg nosql 2011_sri-cassandra
Svccg nosql 2011_sri-cassandrasrisatish ambati
 
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...
Cache is King ( Or How To Stop Worrying And Start Caching in Java) at Chicago...srisatish ambati
 
How to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in JavaHow to Stop Worrying and Start Caching in Java
How to Stop Worrying and Start Caching in Javasrisatish ambati
 

Mais de srisatish ambati (15)

ApacheCon2010: Cache & Concurrency Considerations in Cassandra (& limits of JVM)

  • 1. SriSatish Ambati Performance, Riptano, Cassandra Azul Systems & OpenJDK Twitter: @srisatish srisatish.ambati@gmail.com Cache & Concurrency considerations for a high performance Cassandra
  • 2. Trail ahead Elements of Cache Performance Metrics, Monitors JVM goes to BigData Land! Examples Lucandra, Twissandra Cassandra Performance with JVM Commentary Runtime Views Non Blocking HashMap Locking: concurrency Garbage Collection
  • 3. A feather in the CAP • Eventual Consistency – Levels – Doesn’t mean data loss (journaled) • SEDA – Partitioning, Cluster & Failure detection, Storage engine mod – Event driven & non- blocking io – Pure Java
  • 4. Count what is countable, measure what is measurable, and what is not measurable, make measurable -Galileo
  • 5. Elements of Cache Performance Metrics • Operations: – Ops/s: Puts/sec, Gets/sec, updates/sec – Latencies, percentiles – Indexing • # of nodes – scale, elasticity • Replication – Synchronous, Asynchronous (fast writes) • Tuneable Consistency • Durability/Persistence • Size & Number of Objects, Size of Cache • # of user clients
  • 6. Elements of Cache Performance: “Think Locality” • Hot or Not: The 80/20 rule. – A small set of objects are very popular! – What is the most RT tweet? • Hit or Miss: Hit Ratio – How effective is your cache? – LRU, LFU, FIFO.. Expiration • Long-lived objects lead to better locality. • Spikes happen – Cascading events – Cache Thrash: full table scans
  • 7. Real World Performance • Facebook Inbox – Writes:0.12ms, Reads:15ms @ 50GB data • Twitter performance – Twissandra (simulation) • Cassandra for Search & Portals – Lucandra, Solandra (simulation) • YCSB/PNUTS benchmarks – 5ms read/writes @ 5k ops/s (50/50 Update heavy) – 8ms reads/5ms writes @ 5k ops/s (95/5 read heavy) • Lab environment – ~5k writes per sec per node, <5ms latencies – ~10k reads per sec per node, <5ms latencies • Performance has improved in newer versions
  • 8. yahoo cloud store benchmark 50/50 – Update Heavy
  • 9. yahoo cloud store benchmark 95/5 – read heavy
  • 10. JVM in BigData Land! Limits for scale • Locks : synchronized – Can’t use all my multi-cores! – java.util.collections also hold locks – Use non-blocking collections! • (de)Serialization is expensive – Hampers object portability – Use avro, thrift! • Object overhead – average enterprise collection has 3 elements! – Use byte[ ], primitives where possible! • Garbage Collection – Can’t throw memory at the problem! – Mitigate, Monitor, Measure foot print
  • 11. Tools • What is the JVM doing: – dtrace, hprof, introscope, jconsole, visualvm, yourkit, azul zvision • Invasive JVM observation tools – bci, jvmti, jvmdi/pi agents, jmx, logging • What is the OS doing: – dtrace, oprofile, vtune • What is the network disk doing: – Ganglia, iostat, lsof, netstat, nagios
  • 12. furiously fast writes • Append only writes – Sequential disk access • No locks in critical path • Key based atomicity client issues write n1 partitioner commit log apply to memory n2 find node
  • 13. furiously fast writes • Use separate disks for commitlog – Don’t forget to size them well – Isolation difficult in the cloud.. • Memtable/SSTable sizes – Delicately balanced with GC • memtable_throughput_in_mb
  • 14. Cassandra on EC2 cloud *Corey Hulen, EC2
  • 17. Compactions K1 < Serialized data > K2 < Serialized data > K3 < Serialized data > -- -- -- Sorted K2 < Serialized data > K10 < Serialized data > K30 < Serialized data > -- -- -- Sorted K4 < Serialized data > K5 < Serialized data > K10 < Serialized data > -- -- -- Sorted MERGE SORT Loaded in memory K1 < Serialized data > K2 < Serialized data > K3 < Serialized data > K4 < Serialized data > K5 < Serialized data > K10 < Serialized data > K30 < Serialized data > Sorted K1 Offset K5 Offset K30 Offset Bloom Filter Index File Data File D E L E T E D
  • 18. Compactions • Intense disk io & mem churn • Triggers GC for tombstones • Minor/Major Compactions • Reduce priority for better reads • Other Parameters - – CompactionManager. minimumCompactionThreshold=xxxx
  • 21. reads performance • BloomFilter used to identify the right file • Maintain column indices to look up columns – Which can span different SSTables • Less io than typical b-tree • Cold read: Two seeks – One for Key lookup, another row lookup • Key Cache – Optimized in latest cassandra • Row Cache – Improves read performance – GC sensitive for large rows. • Most (google) applications require single row transactions* *Sanjay G, BigTable Design, Google.
  • 22. Client Performance Marshal Arts: Ser/Deserialization • Clients dominated by Thrift, Avro – Hector, Pelops • Thrift: upgrade to latest: 0.5, 0.4 • No news: java.io.Serializable is S.L..O.…W • Use “transient” • avro, thrift, proto-buf • Common Patterns of Doom: – Death by a million gets
  • 23. Serialization + Deserialization uBench • http://code.google.com/p/thrift-protobuf-compare/wiki/BenchmarkingV2
  • 24. Adding Nodes • New nodes – Add themselves to busiest node – And then Split its Range • Busy Node starts transmit to new node • Bootstrap logic initiated from any node, cli, web • Each node capable of ~40MB/s – Multiple replicas to parallelize bootstrap • UDP for control messages • TCP for request routing
  • 25. inter-node comm • Gossip Protocol – It’s exponential – (epidemic algorithm) • Failure Detector – Accrual rate phi • Anti-Entropy – Bringing replicas up to date
  • 26. Bloom Filter: in full bloom • “constant” time • size:compact • false positives • Single lookup for key in file • Deletion • Improve – Counting BF – Bloomier filters
  • 27. Birthdays, Collisions & Hashing functions • Birthday Paradox For the N=21 people in this room, the probability that at least 2 of them share the same birthday is ~0.44 • Collisions are real! • An unbalanced HashMap behaves like a list: O(n) retrieval • Chaining & Linear probing • Performance degrades with 80% table density
  • 28. the devil’s in the details
  • 29. CFS • All in the family! • denormalize
  • 30. Memtable • In-memory • ColumnFamily specific • throughput determines size before flush • Larger memtables can improve reads
  • 31. SSTable • MemTable “flushes” to a SSTable • Immutable after • Read: Multiple SSTable lookups possible • Chief Execs: – SSTableWriter – SSTableReader
  • 36. UUID • java.util.UUID is slow – static use of SecureRandom leads to contention • Uses /dev/urandom for seed initialization -Djava.security.egd=file:/dev/urandom • PRNG without file is at least 20%-40% better. • Use TimeUUIDs where possible – much faster • JUG – java.uuid.generator • http://github.com/cowtowncoder/java-uuid-generator • http://jug.safehaus.org/ • http://johannburkard.de/blog/programming/java/Java-UUID-generators-compared.html
  • 37. synchronized • Coarse grained locks • io under lock • Stop signal on a highway • java.util.concurrent does not mean no locks • Non Blocking, Lock free, Wait free collections
  • 38. Scalable Lock-Free Coding Style • Big Array to hold Data • Concurrent writes via: CAS & Finite State Machine – No locks, no volatile – Much faster than locking under heavy load – Directly reach main data array in 1 step • Resize as needed – Copy Array to a larger Array on demand – Use State Machine to help copy – “ Mark” old Array words to avoid missing late updates
  • 39. Non-Blocking HashMap 0 100 200 300 400 500 600 700 800 0 200 400 600 800 1000 1200 Threads M-ops/sec 0 100 200 300 400 500 600 700 800 0 200 400 600 800 1000 1200 Threads M-ops/sec NB-99 CHM-99 NB-75 CHM-75 1K Table 1M Table NB CHM Azul Vega2 – 768 cpus
  • 40. Cassandra uses High Scale Non-Blocking Hashmap public class BinaryMemtable implements IFlushable { … private final Map<DecoratedKey,byte[]> columnFamilies = new NonBlockingHashMap<DecoratedKey, byte[]>(); /* Lock and Condition for notifying new clients about Memtable switches */ private final Lock lock = new ReentrantLock(); Condition condition; … } public class Table { … private static final Map<String, Table> instances = new NonBlockingHashMap<String, Table>(); … }
  • 41. GC-sensitive elements within Cassandra • Compaction triggers System.gc() – Tombstones from files • “GCInspector” • Memtable Threshold, sizes • SSTable sizes • Low overhead collection choices
  • 42. Garbage Collection • Pause Times: if stop_the_world_FullGC > ttl_of_node => failed requests; failure accrual & node repair. • Allocation Rate – New object creation, insertion rate • Live Objects (residency) – if residency in heap > 50% – GC overheads dominate. • Overhead – space, cpu cycles spent on GC • 64-bit not addressing pause times – Bigger is not better!
  • 43. Memory Fragmentation • Fragmentation – Performance degrades over time – Inducing “Full GC” makes problem go away – Free memory that cannot be used • Reduce occurrence – Use a compacting collector – Promote less often – Use uniform sized objects • Solution – unsolved – Use latest CMS with CR:6631166 – Azul’s Zing JVM & Pauseless GC
  • 45. Best Practices: Garbage Collection • GC Logs are cheap even in production -Xloggc:/var/log/cassandra/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintHeapAtGC • Slightly expensive ones: -XX:PrintFLSStatistics=2 -XX:PrintCMSStatistics=1 -XX:+PrintCMSInitiationStatistics
  • 46. Sizing: Young Generation • Should we set -Xms == -Xmx ? • Use -Xmn (fixed eden) survivor spaces allocations {new Object();} eden promotion old generation allocation by jvm survivor ratio Tenuring Threshold
  • 47. Tuning CMS • Don’t promote too often! – Frequent promotion causes fragmentation • Size the generations – Min GC times are a function of Live Set – Old Gen should host steady state comfortably • Parallelize on multicores: – -XX:ParallelCMSThreads=4 – -XX:ParallelGCThreads=4 • Avoid CMS Initiating heuristic – -XX:+UseCMSInitiatingOccupancyOnly • Use Concurrent for System.gc() – -XX:+ExplicitGCInvokesConcurrent
  • 48. Summary Design & Implementation of Cassandra takes advantages of strengths while avoiding common JVM issues. • Locks: – Avoids locks in critical path – Uses non-blocking collections, TimeUUIDs! – Still Can’t use all my multi-cores..? >> Other bottlenecks to find! • De/Serialization: – Uses avro, thrift! • Object overhead – Uses mostly byte[ ], primitives where possible! • Garbage Collection – Mitigate: Monitor, Measure foot print. – Work in progress by all jvm vendors! Cassandra starts from a great footing from a JVM standpoint and will reap the benefits of the platform!
  • 49. Q&A References • Werner Vogels, Eventually Consistent http://www.allthingsdistributed.com/2008/12/eventually_consistent.htm • Bloom, Burton H. (1970), "Space/time trade-offs in hash coding with allowable errors" • Avinash Lakshman, http://static.last.fm/johan/nosql-20090611/cassandra_nosql.pdf • Eric Brewer, CAP http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf • Tony Printezis, Charlie Hunt, JavaOne Talk http://www.scribd.com/doc/36090475/GC-Tuning-in-the-Java • http://github.com/digitalreasoning/PyStratus/wiki/Documentation • http://www.cs.cornell.edu/home/rvr/papers/flowgossip.pdf • Cassandra on Cloud, http://www.coreyhulen.org/?p=326 • Cliff Click, Non-blocking HashMap http://sourceforge.net/projects/high-scale-lib/ • Brian F. Cooper, Yahoo! Cloud Serving Benchmark (YCSB), http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
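
The Bloom filter properties listed on slide 26 (constant-time lookup, compact size, false positives, no deletion) can be illustrated with a small Java sketch. This is a minimal example under stated assumptions, not Cassandra's own filter: the class name `BloomFilterSketch`, the FNV-1a style hash, and the double-hashing scheme are all choices made here for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative Bloom filter; hypothetical names, not Cassandra's implementation. */
final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per key

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** FNV-1a style hash over the key bytes, started from the given seed. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Double hashing: the i-th position is (h1 + i * h2) mod size. */
    private int index(String key, int i) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets hashCount bits for the key. Bits are never cleared, so deletion is unsupported. */
    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(key, i));
        }
    }

    /** false means definitely absent; true means possibly present (false positives can occur). */
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A reader would consult `mightContain(key)` before touching a data file and skip the file when it returns false. Supporting deletion requires a variant such as the counting Bloom filter named on the slide.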

Editor's Notes

  1. Typical write operation involves a write into a commit log for durability and recoverability and an update into an in-memory data structure. The write into the in-memory data structure is performed only after a successful write into the commit log. We have a dedicated disk on each machine for the commit log since all writes into the commit log are sequential and so we can maximize disk throughput. When the in-memory data structure crosses a certain threshold, calculated based on data size and number of objects, it dumps itself to disk. This write is performed on one of many commodity disks that machines are equipped with. All writes are sequential to disk and also generate an index for efficient lookup based on row key. These indices are also persisted along with the data file. Over time many such files could exist on disk and a merge process runs in the background to collate the different files into one file. This process is very similar to the compaction process that happens in the Bigtable system.
  2. “A typical read operation first queries the in-memory data structure before looking into the files on disk. The files are looked at in the order of newest to oldest. When a disk lookup occurs we could be looking up a key in multiple files on disk. In order to prevent lookups into files that do not contain the key, a bloom filter, summarizing the keys in the file, is also stored in each data file and also kept in memory. This bloom filter is first consulted to check if the key being looked up does indeed exist in the given file. A key in a column family could have many columns. Some special indexing is required to retrieve columns which are further away from the key. In order to prevent scanning of every column on disk we maintain column indices which allow us to jump to the right chunk on disk for column retrieval. As the columns for a given key are being serialized and written out to disk we generate indices at every 256K chunk boundary. This boundary is configurable, but we have found 256K to work well for us in our production workloads.”
  3. Description of graph: Shows the average number of cache misses expected when inserting into a hash table with various collision resolution mechanisms; on modern machines, this is a good estimate of actual clock time required. This seems to confirm the common heuristic that performance begins to degrade at about 80% table density. It is based on a simulated model of a hash table where the hash function chooses indexes for each insertion uniformly at random. The parameters of the model were: You may be curious what happens in the case where no cache exists. In other words, how does the number of probes (number of reads, number of comparisons) rise as the table fills? The curve is similar in shape to the one above, but shifted left: it requires an average of 24 probes for an 80% full table, and you have to go down to a 50% full table for only 3 probes to be required on average. This suggests that in the absence of a cache, ideally your hash table should be about twice as large for probing as for chaining.
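
The write and read paths described in notes 1 and 2 can be sketched in a few lines of Java. This is a toy model under loud assumptions, not Cassandra code: `TinyStoreSketch` and its methods are hypothetical names, the commit log is an in-memory list standing in for a sequential on-disk file, the flush threshold counts entries only (the notes say the real threshold depends on data size and number of objects), and the per-file bloom filter and column indices are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy log-structured store; hypothetical names, not Cassandra's classes. */
final class TinyStoreSketch {
    // Stand-in for the sequential, append-only commit log on a dedicated disk.
    private final List<String> commitLog = new ArrayList<>();
    // In-memory sorted structure (the memtable).
    private NavigableMap<String, String> memtable = new TreeMap<>();
    // Immutable sorted tables produced by flushes; the head of the deque is the newest.
    private final Deque<NavigableMap<String, String>> sstables = new ArrayDeque<>();
    private final int flushThreshold;

    TinyStoreSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Write path: append to the log first, then update memory, then flush if needed. */
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the memtable as a new immutable sorted table and starts a fresh memtable. */
    private void flush() {
        sstables.addFirst(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Read path: memtable first, then flushed tables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (NavigableMap<String, String> table : sstables) {
            value = table.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Compaction: merges all flushed tables into one, letting newer values win. */
    void compact() {
        NavigableMap<String, String> merged = new TreeMap<>();
        Iterator<NavigableMap<String, String>> oldestFirst = sstables.descendingIterator();
        while (oldestFirst.hasNext()) {
            merged.putAll(oldestFirst.next()); // later (newer) tables overwrite older entries
        }
        sstables.clear();
        if (!merged.isEmpty()) {
            sstables.addFirst(Collections.unmodifiableNavigableMap(merged));
        }
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

Reading newest to oldest is what lets an overwritten key return its latest value without rewriting older tables, and compaction bounds how many tables a cold read may have to visit.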