Apache Cassandra in Bangalore - Cassandra Internals and Performance
1. BANGALORE CASSANDRA UG APRIL 2013
CASSANDRA INTERNALS &
PERFORMANCE
Aaron Morton
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
8. Thrift Transport
//Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTNonBlockingServer
o.a.c.thrift.CustomTHsHaServer
10. Native Binary Transport
Beta in Cassandra 1.2
Uses Netty 3.5
Enabled with
start_native_transport
(Disabled by default)
11. o.a.c.transport.Server.run()
//Setup the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()
13. o.a.c.transport.messages
CredentialsMessage()
EventMessage()
ExecuteMessage()
PrepareMessage()
QueryMessage()
ResultMessage()
(And more...)
14. Messages
Defined in the Native Binary
Protocol
$SRC/doc/native_protocol.spec
17. JMX Management Beans
Registered with names
such as
org.apache.cassandra.db:
type=StorageProxy
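To illustrate the "domain:type=Name" naming shown above, here is a minimal sketch that registers a bean with the JDK platform MBean server and reads one attribute back. DemoProxy, DemoProxyMBean, getWriteOperations and the org.example.demo domain are hypothetical names made up for this example; they are not Cassandra classes, and Cassandra's own beans expose different attributes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical demo bean, used only to show the naming convention.
class JmxNamingSketch {
    // Standard MBean rule: interface name = implementation name + "MBean".
    public interface DemoProxyMBean {
        long getWriteOperations();
    }

    public static class DemoProxy implements DemoProxyMBean {
        private final long writes;

        DemoProxy(long writes) {
            this.writes = writes;
        }

        @Override
        public long getWriteOperations() {
            return writes;
        }
    }

    // Registers the bean under a "<domain>:type=<Type>" name and reads it back.
    static long registerAndRead(String name, long writes) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName objectName = new ObjectName(name);
            if (server.isRegistered(objectName)) {
                server.unregisterMBean(objectName);
            }
            server.registerMBean(new DemoProxy(writes), objectName);
            // The attribute name comes from the getter: getWriteOperations -> "WriteOperations".
            return (Long) server.getAttribute(objectName, "WriteOperations");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead("org.example.demo:type=DemoProxy", 42L));
    }
}
```

Any JMX client (jconsole, for example) would then list the bean under the org.example.demo domain, which is how the Cassandra beans appear under org.apache.cassandra.db.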
19. o.a.c.cli.CliMain.main()
// Connect to server to read input
this.connect()
this.evaluateFileStatements()
this.processStatementInteractive()
20. CLI Grammar
ANTLR Grammar
$SRC/src/java/o/a/c/cli/CLI.g
30. o.a.c.cql3.QueryProcessor
// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage
38. Dynamo Layer
o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.locator
o.a.c.gms
o.a.c.stream
39. o.a.c.service.StorageProxy
// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints
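The "select endpoints & check CL available" step can be sketched roughly as follows. This is a simplified model, not the StorageProxy code: ConsistencySketch and its method names are hypothetical, endpoints are plain strings, and only QUORUM is modelled (floor(RF / 2) + 1 replicas).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the availability check done before any message is sent.
class ConsistencySketch {
    // Replicas that must respond for QUORUM: floor(RF / 2) + 1.
    static int quorumFor(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Keeps only the replicas that are currently believed to be alive.
    static List<String> liveEndpoints(List<String> replicas, List<String> down) {
        List<String> live = new ArrayList<String>(replicas);
        live.removeAll(down);
        return live;
    }

    // Fails fast when too few replicas are alive to satisfy the consistency level.
    static void checkAvailable(List<String> live, int blockFor) {
        if (live.size() < blockFor) {
            throw new IllegalStateException(
                "Unavailable: need " + blockFor + " replicas, " + live.size() + " alive");
        }
    }
}
```

With a replication factor of 3 and two replicas down, quorumFor(3) is 2 while only one endpoint is live, so checkAvailable throws before the coordinator sends anything or waits for a response.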
51. o.a.c.net.MessageDeliveryTask.run()
// If droppable and rpc_timeout has elapsed
MessagingService.incrementDroppedMessages(verb);
MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
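The control flow on this slide can be sketched as below. It is a simplified stand-in rather than the Cassandra source: DeliverySketch, its Verb enum, the choice of which verbs are droppable and the explicit timestamp parameters are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of "drop stale droppable messages, otherwise run the handler".
class DeliverySketch {
    enum Verb { MUTATION, READ, GOSSIP_DIGEST_SYN }

    // Assumption for this sketch: request verbs may be dropped, gossip may not.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.MUTATION, Verb.READ);
    static final Map<Verb, Integer> dropped = new EnumMap<Verb, Integer>(Verb.class);

    interface VerbHandler {
        void doVerb(String message);
    }

    // Returns true if the handler ran, false if the message was dropped as stale.
    static boolean deliver(Verb verb, String message, long createdAtMs, long nowMs,
                           long timeoutMs, VerbHandler handler) {
        if (DROPPABLE.contains(verb) && nowMs > createdAtMs + timeoutMs) {
            // The coordinator has already timed out, so doing the work helps nobody.
            Integer count = dropped.get(verb);
            dropped.put(verb, count == null ? 1 : count + 1);
            return false;
        }
        handler.doVerb(message);
        return true;
    }
}
```

The point of the check is load shedding: a message that waited in a queue longer than the timeout is counted and discarded instead of being processed.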
63. o.a.c.gms.Gossiper.GossipTask.run()
// SYN -> ACK -> ACK2
makeRandomGossipDigest()
new GossipDigestSyn()
// Use MessagingService.sendOneWay()
Gossiper.doGossipToLiveMember()
Gossiper.doGossipToUnreachableMember()
Gossiper.doGossipToSeed()
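A rough sketch of how one gossip round picks its targets is shown below. It models only the live and unreachable cases; the seed case is left out. GossipRoundSketch and its method names are hypothetical, endpoints are plain strings, and the probability unreachable / (live + 1) is how I recall the 1.2-era Gossiper choosing whether to contact an unreachable node, so treat it as an assumption to check against the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical model of target selection for a single gossip round.
class GossipRoundSketch {
    // Chance of also gossiping to an unreachable node: unreachable / (live + 1).
    static double unreachableProbability(int liveCount, int unreachableCount) {
        if (unreachableCount == 0) {
            return 0.0;
        }
        return (double) unreachableCount / (liveCount + 1);
    }

    // Picks one random endpoint from a non-empty list.
    static String pickRandom(List<String> endpoints, Random random) {
        return endpoints.get(random.nextInt(endpoints.size()));
    }

    // One round: always one live peer (if any), sometimes one unreachable peer.
    static List<String> targetsForRound(List<String> live, List<String> unreachable,
                                        Random random) {
        List<String> targets = new ArrayList<String>();
        if (!live.isEmpty()) {
            targets.add(pickRandom(live, random));
        }
        double chance = unreachableProbability(live.size(), unreachable.size());
        if (!unreachable.isEmpty() && random.nextDouble() < chance) {
            targets.add(pickRandom(unreachable, random));
        }
        return targets;
    }
}
```

Each selected target would receive the SYN digest as a one-way message; the ACK and ACK2 replies arrive later as separate messages rather than as a blocking response.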
68. Database Layer
o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace
72. o.a.c.db.Table
// Keyspace
open(String table)
getColumnFamilyStore(String cfName)
getRow(QueryFilter filter)
apply(RowMutation mutation,
boolean writeCommitLog)
73. o.a.c.db.ColumnFamilyStore
// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)
apply(DecoratedKey key,
ColumnFamily columnFamily,
SecondaryIndexManager.Updater
indexer)
76. o.a.c.db.Memtable
put(DecoratedKey key,
ColumnFamily columnFamily,
SecondaryIndexManager.Updater
indexer)
flushAndSignal(CountDownLatch latch,
Future<ReplayPosition>
context)
81. Today.
Write Path
Read Path
82. memtable_flush_queue_size test...
m1.xlarge Cassandra node
m1.xlarge client node
1 CF with 6 Secondary Indexes
1 Client Thread
10,000 Inserts, 100 Columns per Row
1100 bytes per Column
83. CF write latency and memtable_flush_queue_size...
[Chart: latency in microseconds (y-axis, 0 to 1,200) at the 85th, 95th, 99th and 100th percentiles, comparing memtable_flush_queue_size=7 with memtable_flush_queue_size=1]
84. Request latency and memtable_flush_queue_size...
[Chart: latency in microseconds (y-axis, 0 to 5,000,000) at the 85th, 95th, 99th and 100th percentiles, comparing memtable_flush_queue_size=7 with memtable_flush_queue_size=1]
86. Request latency and durable_writes (1 client)...
[Chart: latency in microseconds (y-axis, 0 to 7,000) at the 85th, 95th and 99th percentiles, comparing durable_writes enabled with disabled]
87. Request latency and durable_writes (10 clients)...
[Chart: latency in microseconds (y-axis, 0 to 30,000) at the 85th, 95th and 99th percentiles, comparing durable_writes enabled with disabled]
88. Request latency and durable_writes (20 clients)...
[Chart: latency in microseconds (y-axis, 0 to 90,000) at the 85th, 95th and 99th percentiles, comparing durable_writes enabled with disabled]
90. periodic commit log adds mutation to
queue then acknowledges.
Commit Log is appended to by a single
thread, sync is called every
commitlog_sync_period_in_ms.
91. Request latency and commitlog_sync_period_in_ms...
[Chart: latency in microseconds (y-axis, 170 to 220) at the 85th, 95th and 99th percentiles, comparing commitlog_sync_period_in_ms of 10,000 ms with 10 ms]
92. batch commit log adds mutation to queue
and waits before acknowledging.
Writer thread processes mutations for
commitlog_sync_batch_window_in_ms duration, then syncs, then signals.
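The difference between the periodic and batch modes described above comes down to when the write is acknowledged relative to the sync. The sketch below is a single-threaded model of that ordering only; CommitLogModeSketch and its methods are hypothetical, and the real commit log uses a queue and a dedicated writer thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: tracks which mutations are acknowledged and which are durable.
class CommitLogModeSketch {
    enum Mode { PERIODIC, BATCH }

    private final Mode mode;
    private final List<String> pending = new ArrayList<String>();      // appended, not synced
    private final List<String> acknowledged = new ArrayList<String>(); // client was told "ok"
    private final List<String> durable = new ArrayList<String>();      // survived a sync

    CommitLogModeSketch(Mode mode) {
        this.mode = mode;
    }

    // Called on the write path for each mutation.
    void add(String mutation) {
        pending.add(mutation);
        if (mode == Mode.PERIODIC) {
            acknowledged.add(mutation); // periodic: acknowledge before the sync
        }
    }

    // Called when the sync period (periodic) or batch window (batch) elapses.
    void sync() {
        durable.addAll(pending);
        if (mode == Mode.BATCH) {
            acknowledged.addAll(pending); // batch: acknowledge only after the sync
        }
        pending.clear();
    }

    boolean isAcknowledged(String mutation) {
        return acknowledged.contains(mutation);
    }

    // Writes the client believes succeeded but a crash right now would lose.
    List<String> acknowledgedButNotDurable() {
        List<String> atRisk = new ArrayList<String>(acknowledged);
        atRisk.removeAll(durable);
        return atRisk;
    }
}
```

In PERIODIC mode a mutation sits in acknowledgedButNotDurable() until the next sync, which is the durability window traded for lower latency. In BATCH mode that list stays empty, and the caller waits up to one batch window instead.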
94. Merge mutation...
Row level Isolation provided
via SnapTree.
(https://github.com/nbronson/snaptree)
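One way to picture row level isolation is a copy, modify and compare-and-swap loop over a sorted map. The sketch below uses java.util.TreeMap as a stand-in for SnapTree, which is an assumption made so the example needs no extra libraries: SnapTree provides a cheap copy-on-write clone, whereas copying a TreeMap is O(n). RowIsolationSketch and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of one row whose columns are swapped in atomically.
class RowIsolationSketch {
    private final AtomicReference<TreeMap<String, String>> columns =
        new AtomicReference<TreeMap<String, String>>(new TreeMap<String, String>());

    // Applies every column of a mutation at once: readers see none or all of them.
    void apply(Map<String, String> mutation) {
        while (true) {
            TreeMap<String, String> current = columns.get();
            TreeMap<String, String> modified = new TreeMap<String, String>(current);
            modified.putAll(mutation);
            if (columns.compareAndSet(current, modified)) {
                return;
            }
            // Another writer published first; retry against the newer snapshot.
        }
    }

    // A published map is never mutated again, so this view is a stable snapshot.
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(columns.get());
    }
}
```

The retry loop is also a plausible reason for contention when many clients write to the same row: every failed compare-and-swap repeats the copy and merge.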
96. CF Write Latency and row concurrency (10 clients)...
[Chart: latency in microseconds (y-axis, 0 to 2,000) at the 85th, 95th and 99th percentiles, comparing writes to different rows with writes to a single row]
99. Request latency and index concurrency (10 clients)...
[Chart: latency in microseconds (y-axis, 0 to 4,000) at the 85th, 95th and 99th percentiles, comparing writes to different rows with writes to a single row]
100. Index tests...
10,000 Inserts
50 Columns per Row
50 bytes per Column
101. Request latency and secondary indexes...
[Chart: latency in microseconds (y-axis, 0 to 3,000) at the 85th, 95th and 99th percentiles, comparing no indexes with six indexes]
102. Today
Write Path
Read Path
104. CF read latency and bloom_filter_fp_chance...
[Chart: latency in microseconds (y-axis, 0 to 7,000) at the 85th, 95th and 99th percentiles, comparing bloom_filter_fp_chance at the default (0.000744) with 0.1]
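To put the two bloom_filter_fp_chance values in context, the textbook sizing formulas for an optimally configured Bloom filter are bits per key = -ln(p) / (ln 2)^2 and hash count = bits per key * ln 2. The sketch below evaluates them; BloomSizingSketch is a hypothetical helper, and Cassandra's own sizing code may round or cap these values differently.

```java
// Hypothetical helper that evaluates the standard Bloom filter sizing formulas.
class BloomSizingSketch {
    // Bits needed per key for a target false positive chance p.
    static double bitsPerKey(double falsePositiveChance) {
        double ln2 = Math.log(2);
        return -Math.log(falsePositiveChance) / (ln2 * ln2);
    }

    // Number of hash functions that minimises false positives at that size.
    static long hashCount(double bitsPerKey) {
        return Math.round(bitsPerKey * Math.log(2));
    }

    public static void main(String[] args) {
        double[] chances = { 0.000744, 0.1 };
        for (double chance : chances) {
            double bits = bitsPerKey(chance);
            System.out.println(chance + " -> " + bits + " bits/key, "
                + hashCount(bits) + " hashes");
        }
    }
}
```

By these formulas the 0.000744 setting needs roughly three times as many bits per key as 0.1, so raising the false positive chance saves memory at the price of more reads that touch SSTables not containing the row.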
106. CF read latency and key_cache_size_in_mb...
[Chart: latency in microseconds (y-axis, 0 to 300) at the 85th, 95th and 99th percentiles, comparing key_cache_size_in_mb at the default (100MB, 100% hit rate) with the key cache disabled]
107. index_interval tests...
100,000 Rows
50 Columns per Row
50 bytes per Column
key_cache_size_in_mb: 0
Read 1 Column from random 10% of Rows
108. CF read latency and index_interval...
[Chart: latency in microseconds (y-axis, 0 to 20,000) at the 85th, 95th and 99th percentiles, comparing index_interval=128 (default) with index_interval=512]
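The effect of index_interval can be pictured as a sampled index: only every Nth row key is held in memory, and a lookup without a key cache hit finds the nearest sample and then scans forward through at most N index entries. The sketch below models that idea with in-memory lists; IndexSampleSketch and its methods are hypothetical, and the list of sorted keys stands in for the on-disk index file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of an index sample taken every "interval" keys.
class IndexSampleSketch {
    private final List<String> sortedKeys;                         // stands in for the on-disk index
    private final List<String> samples = new ArrayList<String>(); // held in memory
    private final int interval;

    IndexSampleSketch(List<String> sortedKeys, int interval) {
        this.sortedKeys = sortedKeys;
        this.interval = interval;
        for (int i = 0; i < sortedKeys.size(); i += interval) {
            samples.add(sortedKeys.get(i));
        }
    }

    int sampleCount() {
        return samples.size();
    }

    // Returns how many index entries are scanned to find the key, or -1 if absent.
    int entriesScanned(String key) {
        int pos = Collections.binarySearch(samples, key);
        int sampleIndex = pos >= 0 ? pos : -pos - 2; // greatest sample <= key
        if (sampleIndex < 0) {
            return -1; // key sorts before the first sample
        }
        int start = sampleIndex * interval;
        int end = Math.min(start + interval, sortedKeys.size());
        for (int i = start; i < end; i++) {
            if (sortedKeys.get(i).equals(key)) {
                return i - start + 1;
            }
        }
        return -1;
    }
}
```

A larger interval means fewer samples in memory but a longer worst-case scan per lookup, which is the trade-off the index_interval=128 versus 512 comparison measures.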
110. CF read latency and row_cache_size_in_mb...
[Chart: latency in microseconds (y-axis, 0 to 260) at the 85th, 95th and 99th percentiles, comparing row_cache_size_in_mb=0 and key_cache_size_in_mb=100mb with row_cache_size_in_mb=100mb and key_cache_size_in_mb=0]
111. Column Index tests...
Read first Column by name from 1,200
Columns.
Read first Column by name from 1,000,000
Columns.
112. CF read latency and Column Index...
[Chart: latency in microseconds (y-axis, 0 to 6,000) at the 85th, 95th and 99th percentiles, comparing the first Column from 1,200 with the first Column from 1,000,000]
113. Name Locality tests...
1,000,000 Columns
50 bytes per Column
Read 100 Columns from middle of row.
Read 100 Columns from spread across row.
114. CF read latency and name locality...
[Chart: latency in microseconds (y-axis, 0 to 200,000) at the 85th, 95th and 99th percentiles, comparing adjacent Columns with spread Columns]
115. Start position tests...
1,000,000 Columns
50 bytes per Column
Read first 100 Columns without start.
Read first 100 Columns with start.
116. CF read latency and start position...
[Chart: latency in microseconds (y-axis, 0 to 40,000) at the 85th, 95th and 99th percentiles, comparing reads without a start position with reads with a start position]
117. Start offset tests...
1,000,000 Columns
50 bytes per Column
Read first 100 Columns with start.
Read middle 100 Columns with start.
118. CF read latency and start offset...
[Chart: latency in microseconds (y-axis, 0 to 40,000) at the 85th, 95th and 99th percentiles, comparing the first 100 Columns with the middle 100 Columns]
119. Reversed tests...
1,000,000 Columns
50 bytes per Column
Read first 100 Columns without start.
Read last 100 Columns with reversed.
120. CF read latency and reversed...
[Chart: latency in microseconds (y-axis, 0 to 40,000) at the 85th, 95th and 99th percentiles, comparing forward reads with reversed reads]