I’ll give a general lay of the land for troubleshooting Cassandra. Then I’ll take you on a deep dive through nodetool and system.log and give you a guided tour of the useful information they provide for troubleshooting. I’ll devote special attention to monitoring the various processes that Cassandra uses to do its work and how to effectively search for information about specific error messages online.
This is the old version of this presentation for Cassandra 2.0 and earlier. Check out the updated slide deck for Cassandra 2.1.
Role at DataStax
Previous Experience
Something about Atlanta
Overall troubleshooting process from a support engineer’s perspective
I’ll focus on the tools Cassandra gives you to do steps 1 to 4
Steps 5 and 6 are the hard parts; but luckily that’s what DataStax support, or the mailing list, or StackOverflow can help you with
Doing the legwork on the first part will make the second part happen much faster
From a user’s perspective, the troubleshooting process could be revised like this
Definitely better to over-share than under-share information
Make sure you share all the facts, not just those that support your current theory
When troubleshooting, it’s helpful to keep in mind the various processes Cassandra runs to do its work
These can be roughly divided into:
Startup processes
Foreground processes (serving reads and writes); your application has control over these!
Background processes (happen periodically); Cassandra decides when to do them, but they can be tuned in many cases
It’s also helpful to keep in mind the various system resources that Cassandra consumes
CPU
keep in mind both the speed of a single core and of multiple cores
necessary because some processes are single-threaded and bottlenecked by a single core
Memory
Heap space, typically limited to a subset of your total physical memory
Cassandra stores many objects off heap to avoid Garbage Collection issues
The OS will use whatever memory is left over for page cache; make sure you leave enough free memory for this
Garbage collection
Driven by memory utilization
Drives CPU utilization
Disk
Available disk space
I/O bandwidth utilization
Network
Primarily concerned with bandwidth
Keep in mind firewalls; the path needs to be open
OS Resources/Limits
File handles, processes, etc.
Make sure you set high enough limits in limits.conf or ulimits
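As an illustration, a limits.conf fragment along these lines is commonly recommended (the exact values and the username are assumptions — check the DataStax documentation for your version):

```
# /etc/security/limits.conf — for the user Cassandra runs as
cassandra - memlock unlimited
cassandra - nofile 100000
cassandra - nproc 32768
cassandra - as unlimited
```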
Overview of some of the most useful nodetool commands and what they do
Nodetool status gives an overview of the entire cluster’s status
The cluster is divided into datacenters
The first column shows whether a node is up or down
The second column shows the node’s state, which is one of:
normal
leaving the cluster
joining the cluster
moving its token
The IP address of each node.
This will be the broadcast_address if specified; otherwise, the listen_address
The amount of data on each node
It is normal for this to differ slightly
Large discrepancies (in percentage terms) could indicate a problem
Wide rows/partitions
Uneven racks
Compaction issues
Number of tokens
Normally 256 for vnodes, 1 for non-vnodes
Percentage of ring each node owns
Note message at the top
Without a keyspace, assumes SimpleStrategy with RF=1
Can cause strange readout when using multiple DCs with a small offset between tokens (nothing to worry about)
With keyspace, shows ownership according to RF in that keyspace
With keyspace, should add up to RF times 100%
May differ slightly from node to node with vnodes due to random token distribution
UUID of the node. Used to uniquely identify the node when running removenode command.
The rack the node is on
Used to avoid single point of failure
Avoid if not using vnodes
If using vnodes, ensure the same number of nodes are in each rack
In this example, we have one node that is down, so that is where we’d focus our investigation
Nodetool ring is an old way of checking status
Shows the same information as nodetool status, except for the token
When using vnodes, it will show every token on each node and becomes difficult to read
Nodetool info shows information specific to the node where it is run
Some of the information is also shown by nodetool ring/status
Same information shown by nodetool status and ring
Not going to rehash this
Whether gossip, thrift, and native transport are enabled
Can be individually enabled/disabled with nodetool commands
Gossip generation; increases each time the node is restarted
How many seconds the node has been running
Heap and off heap memory usage
Heap memory shows amount currently in use and maximum
Amount currently in use may include garbage
The amount of garbage included varies depending on when GC was last run
Off heap memory is stored outside the heap but adds to the process’s overall resident size
If a process gets killed by the kernel OOM killer, make sure it isn’t using too much off heap memory
Number of exceptions that occurred since last restart
Not every error involves an exception, but it’s still a good indicator
Key Cache size and capacity
Size is amount actually in use
Capacity is amount specified by key_cache_size_in_mb in cassandra.yaml
Capacity defaults to 100MB or 5% of the heap, whichever is less
If size is consistently much less than capacity, you have this set too high
Cache hits, total requests, and hit rate
If hit rate is low, try increasing key cache capacity slowly until you see diminishing returns
Key cache is periodically saved to disk and reloaded when a node is restarted
This is to avoid a cold key cache after restart
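The hit rate itself is just hits divided by total requests. A minimal sketch (the 0.85 threshold is a rule of thumb for read-heavy workloads, not a Cassandra default):

```python
def hit_rate(hits, requests):
    """Key cache hit rate as reported by nodetool info: hits / total requests."""
    return hits / float(requests) if requests else 0.0

# e.g. 8,000 hits out of 10,000 requests
rate = hit_rate(8000, 10000)
print(rate)  # 0.8 — below ~0.85, so a larger key cache may be worth trying
```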
Same information shown for row cache
Row cache should be disabled except for extremely narrow set of use cases
Small, very hot data set that will fit in memory
Data should be read much more than written because writes invalidate row in cache
But really, don’t use row cache
tpstats shows thread pool statistics
Cassandra has various thread pools that handle important foreground and background tasks
This information is also logged to system.log by StatusLogger.java periodically or when a message is dropped
Name of the thread pool
Number of tasks being actively serviced by a thread
For several stages, this can be configured in cassandra.yaml
For others, it is equal to the number of CPU cores
For a few others, it is a hardcoded limit, usually 1
Number of tasks waiting to be serviced
For most stages, limited to ~2 billion
You’ll run out of memory long before you ever hit this limit
High number of pending tasks indicates the stage is overloaded
Number of tasks completed since last restart
Number of tasks currently blocked for I/O
Should almost always be 0
Total number of tasks blocked since last restart.
Usually zero except for FlushWriter
At the bottom is a list of message types and the number of messages dropped
Load shedding drops requests that have been pending beyond timeout specified in cassandra.yaml
Dropped messages usually indicate overloaded cluster; check for other causes, then add more nodes
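To make "overloaded stage" concrete, here is a hedged sketch that flags pools with a high Pending count from tpstats-style output (the sample text and column layout are illustrative — verify against your Cassandra version's actual format):

```python
# Illustrative parser for `nodetool tpstats`-style output.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         4       120        9231841         0                 0
MutationStage                     8         0       27194801         0                 0
FlushWriter                       1         3           2153         0                12
"""

def overloaded_stages(tpstats_text, pending_threshold=100):
    """Return names of thread pools whose Pending count exceeds the threshold."""
    stages = []
    for line in tpstats_text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) >= 6 and parts[1].isdigit():
            name, pending = parts[0], int(parts[2])
            if pending > pending_threshold:
                stages.append(name)
    return stages

print(overloaded_stages(SAMPLE))  # ['ReadStage']
```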
Handles local reads for which this node is a replica
Number of threads is controlled by concurrent_reads in cassandra.yaml
Various reads, all handled by ReadStage
READ - normal read on a single partition
RANGE_SLICE - A sequential or secondary index scan over multiple partitions
PAGED_RANGE - Used for automatic paging when result size exceeds row limit
Handles local writes for which this node is a replica
Number of threads is controlled by concurrent_writes in cassandra.yaml
Various writes, handled by MutationStage
READ_REPAIR - write due to a read repair
MUTATION - normal write
COUNTER_MUTATION - incrementing a counter
Coordinator uses this to process responses from other nodes
Roughly indicates how often this node has been a coordinator
Roughly, because unless you’re using CL ONE, the coordinator must handle responses from multiple nodes per request
Request completed but timed out before coordinator could respond to it
Timeout controlled by request_timeout_in_ms in cassandra.yaml
May indicate that coordinator is overloaded
Ensure that client load balancing is set up correctly
Make sure use of batches is appropriate
Logged batches only when atomicity is required
Unlogged batches only when updating multiple rows with the same partition key
Otherwise use asynchronous execution to pipeline requests without overloading a single coordinator
Writes memtables to disk
Number of threads controlled by memtable_flush_writers, should equal number of data drives
Maximum pending tasks controlled by memtable_flush_queue_size in cassandra.yaml
Once the queue is full, writes are blocked until a flush writer becomes available
Large number of all-time-blocked tasks indicates a disk bottleneck; add more/faster disks or more nodes
Handles compactions
Number of threads controlled by concurrent_compactors in cassandra.yaml
Constantly pending compactions means compactions can’t keep up with writes; mitigation strategies:
Switch from LeveledCompactionStrategy to SizeTieredCompactionStrategy for write-heavy tables
Get faster disks (SSD) if needed
Increase compaction throughput
Get faster CPU cores
Asynchronous read repairs
Occur for a certain percentage of reads, configurable per table
Pending tasks indicate that you may have read repair chance set too high
Handles hint delivery from the coordinator to a node that’s recently come back up
Large number of hints usually means an unhealthy cluster
Nodes gossip with each other once a second
Prior to 2.0, when using vnodes with a large cluster, gossip could become CPU-bound and get behind
Migrates schema changes to other nodes
If you see pending tasks here, you’re making too many schema changes
Repair related stages
AntiEntropyStage coordinates repairs
AntiEntropySessions are active repairs in progress
ValidationExecutor builds merkle trees for repair
cfstats shows statistics for individual tables
Tables are grouped by keyspace
Values apply only to the node where cfstats is run, not cluster-wide
Keyspace and table names
Multiple tables grouped under each keyspace
Read and write counts
Per table and at keyspace level
Read and write latency
Per table and averaged at keyspace level
Number of thread pool tasks pending against a table
Per table and summed at keyspace level
Number of sstables comprising a table
Broken down by level when using LeveledCompactionStrategy
Space used by the table on this node
Must sum across different nodes to get total space
May include deleted or updated data that hasn’t been compacted yet
Number of partition keys in the table on this node
Estimated to the nearest index_interval (128 by default)
Rows spread across multiple sstables will inflate this number
Space consumed by off-heap data structures
Total off heap memory
Broken down by data structure: bloom filter, index summary, compression metadata
Memtable information
number of entries in the memtable
bytes of data in the memtable
the number of times the memtable has been switched (flushed to disk)
Bloom filter statistics
False positives
If too high, performance will suffer
Reduce false positive chance for table
Space used
Lower false positive chance requires more space
Bloom filter grows linearly with number of partitions
Increase the false positive chance if the bloom filter is too large
Statistics on partition size, calculated during compaction
Maximum, minimum, and average size
Helps identify tables containing large partitions
Tombstone statistics
Number of live cells versus tombstones encountered when scanning a partition
Rolling average for the last 5 minutes
cfhistograms provides deeper insight into a specific table
Must specify keyspace and table name when calling nodetool cfhistograms
Information shown by cfhistograms is local to the node where it was run
Keyspace and table name
Number of sstables each read hit
More sstables means slower reads, higher I/O utilization
If reads are hitting too many sstables, consider LeveledCompactionStrategy
LCS incurs more I/O due to compaction on write-heavy tables
The size of each partition in bytes
Scanning through large partitions generates garbage and hinders performance
Large partitions cause uneven data distribution across nodes
Number of cells in each partition
Total number of columns belonging to all logical rows within the partition
Large number of cells may cause increased garbage generation
How to read cfhistograms
Left side shows list of buckets
Bucket ranges from previous line (exclusive) to current line (inclusive)
Buckets get progressively larger as numbers increase
Count of items that fall in this bucket
What’s being counted depends on the statistic
Heading tells you what’s being counted
For “SSTables per Read”, we’re counting number of reads
For the other two, we’re counting the number of partitions
Also pay attention to the units used for the buckets
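The way the bucket boundaries grow can be sketched as follows — each boundary is roughly 20% larger than the previous one. This is an approximation of Cassandra's EstimatedHistogram; the real offsets may differ slightly between versions:

```python
def bucket_offsets(count):
    """Generate histogram bucket boundaries that grow ~20% per bucket."""
    offsets = [1]
    while len(offsets) < count:
        last = offsets[-1]
        offsets.append(max(last + 1, int(last * 1.2)))
    return offsets

print(bucket_offsets(15))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 19, 22]
```

So a read that hit 11 sstables falls in the "12" bucket — the range above 10, up to and including 12.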
Example
126,087 reads hit 4 sstables
37 reads hit between 9 and 10 sstables
Example
73 partitions between approx 195K and 234K bytes
Example
14,532 partitions with between 771 and 924 cells in them
Read latency for local reads
Write latency for local writes
Unit is microseconds, not milliseconds
Example
1379 reads served locally by this node took between 925 and 1109 us
Example
1839 writes served locally by this node took between 216 and 258 us
Shows the latency for reads and writes that this node coordinated
Latency for the entire request, including network latency between coordinator and replicas
Not broken down by table
Status of current and pending compactions
Number of pending compactions
Compactions in progress
Compaction Type
Compaction - normal compaction
Validation - building merkle trees for repair
Keyspace and table
Bytes complete and total bytes for each compaction
Percent done
Estimated time remaining for the active tasks
Not useful, because:
estimates can be wrong
doesn’t account for pending tasks
History of recent compactions
Shows how much space a compaction reclaimed
Same information available in system.log
Unique ID
Keyspace and column family
Unix timestamp when compaction occurred
Seconds since Jan 1, 1970
Total bytes before compaction
Total bytes after compaction
Row merge counts
Actually the last column on each row
Moved to make output fit on slide
Count of sstables
Number of rows spread across that many sstables prior to compaction
Shows network activity
Mode, same as on status: NORMAL, JOINING, LEAVING, MOVING
Active streams
Repair session IDs
Nodes exchanging data
Data is streaming over listen address instead of broadcast address
Useful on EC2 because Amazon doesn’t charge for internal traffic
Total number of files and bytes of data to be received from specified node
Total number of files and bytes of data to be sent to specified node
Specific files currently being transferred
Progress for each file
Read repair statistics
Number of read repairs attempted
Number of mismatches resolved in the foreground
Number of mismatches resolved in the background
Commands sent and responses received while acting as coordinator
Number of pending and completed commands sent to other nodes
Number of pending and completed responses received from other nodes
Used to check for schema disagreements
This is not the information we’re interested in
We’re looking to see how many schema versions there are in the cluster.
No disagreement; only one version of the schema shared by all nodes
Schema disagreement; one node has a different version from all the others
Schema disagreements must be manually resolved
If only one node disagrees, run nodetool resetlocalschema on that node
If multiple nodes disagree
shut down nodes in the minority and delete system/schema_* sstables
start nodes back up one by one
system.log is the most important tool for troubleshooting Cassandra
Where it is
How to configure it
logging level
location
override logging level for a specific class
Basic format
Level: INFO, WARN, ERROR by default; DEBUG only if configured
Thread: use the ID to correlate messages from the same thread
Date & Time: use to correlate messages across multiple nodes, time duration of events
Source File/Line No: code that logged the message, not necessarily where an error occurred; talk about stack traces later
Exception - What kind of error occurred?
Stack trace – where did the error happen?
Most local at top to most global at bottom
Wall of text — we’ll dissect it in the next slides
Organization names – whose fault is it?
Sub-packages usually group major application subsystems
Class Name – specific object
Will usually, but not always match the filename
Nested Classes - $ separates outer class from inner class(es)
Method belongs to the inner class
Method name – what was the class doing?
<init> indicates that the error occurred in a constructor (or instance initializer); <clinit> indicates a static initialization block
File name – where the source code is, should you want to look at it
Also pay attention to the package name so you can find the file within the nested directory structure
When source-diving, start with the most local method and work your way out
Line number – where to look in the code (available on github.com/apache/cassandra)
Be careful! Line numbers change between versions, so make sure you select the right version in github
Pay attention to nested exceptions
Each exception has its own stack trace which may be completely different
The outer exception may be too general because it’s been rethrown from unrelated code
The innermost exception will be the actual root cause of the error
Best to use a combination of outer and inner exception as search terms
Exception will usually have an error message
Provides additional information about the circumstances of the exception
Usually good to search for the exception and message together
Look out for embedded numbers or strings; these may change from one message to the next, and including them will undesirably narrow your search
Some additional examples of organizations and subsystems
Use exception and several package+class+method names
Exception alone often isn’t sufficient because the same error can occur many different places
You’ll find the same exception in unrelated software if it’s a standard java exception.
Add several package/class/method combinations to narrow down the exception
Use at least the topmost method and the first org.apache.cassandra method
Use quotes around individual elements (especially if they contain spaces)
Line numbers shouldn’t be part of your search criteria because you may not find the same error in a different version
Likewise, exclude specific numbers and strings like names and counts from your search
Use Google’s site: feature to narrow search terms to apache JIRA, cassandra mailing list, or stackoverflow
Add or remove additional methods as needed to narrow or broaden search
These might be a good set of search terms for this exception
Include both exceptions and methods from both stack traces
Include exception and error message grouped together inside quotes
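The heuristics above can be sketched in a few lines. The stack trace here is hypothetical (class names and line numbers are made up for illustration); the idea is to quote the exception class plus the topmost frame and the first org.apache.cassandra frame, while stripping line numbers and message specifics:

```python
import re

# Hypothetical stack trace for illustration only.
TRACE = """\
java.lang.RuntimeException: java.util.concurrent.ExecutionException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:392)
"""

def search_terms(trace):
    """Build quoted search terms from a Java stack trace, dropping line
    numbers and the exception's message specifics."""
    exception = trace.splitlines()[0].split(':')[0]
    frames = re.findall(r'at\s+([\w$.]+)\(', trace)
    picked = [frames[0]]
    cassandra = next(f for f in frames if f.startswith('org.apache.cassandra'))
    if cassandra not in picked:
        picked.append(cassandra)
    return ['"%s"' % t for t in [exception] + picked]

print(search_terms(TRACE))
```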
Know how to recognize a restart
Check versions of major components and JVM
Confirm settings are what you think they are
Make sure you have JNA installed
Know when node is ready to serve requests
Cassandra writes updates to the commit log on disk and the memtables in memory
When memtables get full, they’re flushed to disk and the associated commit log is reclaimed
Flushes can also be triggered when the node is running low on memory
First the flush is enqueued
There are a limited number of FlushWriter threads
Flushes wait in the queue until a FlushWriter is available
When a FlushWriter becomes available, the flush begins
Eventually the flush completes.
This is not an instantaneous process because disks are slow.
This is the name of the table and the unique identifier of the memtable
You can use this to link the enqueuing and writing of the flush
Make note of the FlushWriter thread doing the flush
You can use this to link the beginning and end of the flush
This shows the number of serialized and live bytes
Serialized vs live is the size of the data on disk vs the size in heap
It also shows the number of individual write operations stored in the memtable
This shows the name of the sstable that the memtable was written to on disk
The size of the sstable
This is typically smaller than the size in memory because of compression
This shows the segment ID for the commit log and the position within the segment
The commit log is broken up into small files called segments
When all the commits in a segment are flushed, the segment is reclaimed
Note the times on the messages
Time between first and second messages is how long the flush waited in the queue
Time between second and third messages is how long the flush took to complete
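Computing those durations is just timestamp arithmetic on system.log's date format. The timestamps below are invented for illustration; in a real log they come from the three flush messages on the same FlushWriter thread:

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # system.log timestamp format

enqueued  = datetime.strptime("2014-06-01 12:00:01,100", FORMAT)
started   = datetime.strptime("2014-06-01 12:00:04,600", FORMAT)
completed = datetime.strptime("2014-06-01 12:00:09,850", FORMAT)

queue_wait = (started - enqueued).total_seconds()    # waiting for a FlushWriter
flush_time = (completed - started).total_seconds()   # writing the sstable
print(queue_wait, flush_time)  # 3.5 5.25
```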
Sstables are immutable; updates go into new sstables
Reads scan over multiple sstables to stitch together a row
SSTables must eventually be compacted together to keep reads fast
Size-tiered compactions occur when a sufficient number of similarly sized sstables exist
In leveled compaction, sstables are written to Level 0 and moved to higher levels as they are compacted
Leveled compactions occur continuously as long as sstables exist in Level 0
Note the CompactionExecutor thread doing the compaction
The thread ID can be used to link together the messages
The compaction is beginning
The sstables that are going to be compacted
The compaction is complete
How many tables were compacted
The name of the new sstable created by the compaction
The number of bytes in the original files
The number of bytes in the new file
The percentage of the original size after:
Updates were merged
Tombstones and expired TTLs were removed
The time the compaction took and the rate in MB/sec
The sum of the number of rows in each sstable
The number of unique rows across all compacted sstables
X:Y where Y rows were split across X sstables
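The merge-count notation can be decoded as follows — a sketch assuming a string like `{1:216, 2:35, 4:1}`, meaning 216 rows lived in a single sstable, 35 were spread across two, and 1 across four:

```python
import re

def merge_stats(merge_counts):
    """Parse '{X:Y, ...}' merge counts: Y rows were spread across X sstables.
    Returns (rows summed over all input sstables, unique rows after merge)."""
    pairs = [(int(x), int(y)) for x, y in re.findall(r'(\d+):(\d+)', merge_counts)]
    rows_before = sum(x * y for x, y in pairs)
    unique_rows = sum(y for _, y in pairs)
    return rows_before, unique_rows

print(merge_stats('{1:216, 2:35, 4:1}'))  # (290, 252)
```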
During compaction, you may see one or more messages about partitions being compacted incrementally
Logical CQL 3 rows sharing the same partition key form a physical row when stored in the cluster
Newer versions of Cassandra say partition instead of row
Large partitions can cause a number of problems for Cassandra
Uneven distribution of data between nodes
Large memory usage when large partitions are read all at once
Generating lots of garbage compacting a wide row
Slower compactions because they’re done incrementally on disk
These messages can help identify large rows
The keyspace, table, and partition key
The size of the partition
Garbage collections are a necessary evil
Some garbage collections run concurrently with Cassandra, but others stop the world
Cassandra logs any stop-the-world collections that last longer than 200ms
Stop the world collections cause nodes to stop responding to gossip and client requests
This shows the type of collection
Java allocates new objects into the young generation
Objects that survive a specified number of collections get promoted to the tenured generation
ParNew collections occur when the young gen is collected. These are stop-the-world.
The young gen is usually small so ParNew is usually fast
If there’s not enough contiguous space in the old gen to promote an object, the old gen must be compacted, which takes a long time
ConcurrentMarkSweep normally runs concurrently with the application and does not stop the world
If the concurrent collection can’t keep up with the rate at which garbage is generated, a stop-the-world collection occurs
Stop-the-world CMS can take a very long time because the old gen is usually big
This shows the number of ms elapsed for the collection
Long collections increase latency of read and write requests
Very long collections will prevent the node from gossiping and other nodes will think it’s down
Note the time on each GCInspector message.
Even if the individual collections are fast, too many collections within a short timespan can hurt throughput
This shows the number of individual collections reported included in the duration
This shows the amount of heap in use before the collection occurred.
This shows the maximum amount of heap available
After a collection completes, warning is logged if the heap is still greater than 75% full
If enough space can’t be reclaimed repeatedly, the node may enter a GC death spiral
When the threshold is exceeded, it triggers a memtable flush to free up memory
This threshold can be configured using flush_largest_memtables_at in cassandra.yaml
Usually, you shouldn’t change this setting
If you see this message, you should try to reduce heap usage or increase heap size
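A GCInspector line can be picked apart to check the duration, collection count, and heap pressure in one pass. The message format below is modeled on 2.0-era logs and is an assumption — verify it against your own system.log:

```python
import re

# Illustrative GCInspector message; numbers are invented.
LINE = ("GC for ConcurrentMarkSweep: 12413 ms for 2 collections, "
        "6451234816 used; max is 8506048512")

m = re.search(r'GC for (\w+): (\d+) ms for (\d+) collections, (\d+) used; max is (\d+)', LINE)
collector = m.group(1)
duration_ms, count, used, heap_max = (int(g) for g in m.groups()[1:])

pct = 100.0 * used / heap_max
print(collector, duration_ms, round(pct))  # flag anything above 75% — the warning threshold
```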
Increasing heap size above 8GB is not recommended because GC will take longer
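Heap size is set in cassandra-env.sh; a fragment like this caps it at 8 GB (the HEAP_NEWSIZE value is illustrative — the usual guidance is roughly 100 MB per CPU core):

```
# conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"
```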
Flapping is often caused by garbage collections
The nodes go up-down-up-down, repeatedly
That’s why it’s called flapping
Notice the nodes that are flapping
If a single node is flapping, check the logs on that node during the same timeframe and see if GC is occurring
If multiple nodes are reported up and down, the local node may be the problem
If a node is doing GC it won’t be able to receive gossip messages from another node and may think they’re down
Check for GC messages in the local log around the time that flapping occurs
Note the time that flapping occurs
If it happens infrequently, it may not be a problem
If it happens multiple times a minute, it is a problem
Check other node’s logs for GC events that occurred at the same time
If you don’t see anything in the other node’s log, it may be a network issue
You can reduce the failure detector’s sensitivity by increasing phi_convict_threshold in cassandra.yaml
Default value is 8; maximum recommended value is 12 (useful on high latency networks such as AWS)
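For example, in cassandra.yaml:

```yaml
# Default is 8; raise toward 12 only on high-latency networks such as AWS
phi_convict_threshold: 12
```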
If a write comes in for a down node, the coordinator will store hints for it
When the node comes back up, the nodes that have hints for it will send them
Hints are no longer stored after the node has been down for longer than the period of time specified in cassandra.yaml
This is to prevent a node that comes back up from being inundated with more hints than it can handle
Any node that has been down longer than this period of time needs to run nodetool repair
Flapping can cause excessive hint buildup, which adds extra burden for both the coordinator and the node that is flapping
This can lead to cascading failures
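The hint window is controlled in cassandra.yaml; the value shown is the usual default of 3 hours:

```yaml
# Hints stop accumulating for a node that has been down longer than this
max_hint_window_in_ms: 10800000
```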
Repairs are initiated by running nodetool repair
They do a full comparison of all the data for a particular token range with the other replicas for that range, then exchange any data that is out of sync
system.log shows the process from start to finish
Note the UUID for the repair session. This is your key to correlating the various messages.
When a repair session begins, you will see a “new session” message
It will report the nodes it’s going to sync with
The token ranges it’s going to sync
And the keyspace and column families it’s going to sync
The first step is for the repair leader to request merkle trees from all the other replicas
A message is logged reporting the receipt of each requested merkle tree
Make sure you see a message that the merkle tree was received from each node that it was requested from
After comparing the merkle trees, if the nodes are in sync, you’ll see a message like this
If not, you’ll see a message like this, reporting how many ranges are out of sync
The node will then begin a streaming repair with the out-of-sync replica
Another message reports when the streaming task has succeeded
A message will report when each table has been fully synced. This means either it was in sync to begin with, or all the streaming tasks necessary to sync it completed.
Once all tables are synced, a message will report that the overall repair session completed successfully.
If you see a “new session” message for a particular ID but not a “session completed successfully”, the repair is still running.
If a repair doesn’t complete successfully after some time, you should look more closely at the other messages for that session to see where it might be stuck.
Sometimes network issues can disrupt the streaming of data or a merkle tree, causing repair to hang
Other times, there is simply a lot of data, and building merkle trees can take a long time, as can streaming data
Increasing compaction throughput and streaming throughput will help speed the process, at the cost of using extra I/O and network bandwidth.
Check the other nodes involved in the repair for messages using the same session ID
Check for any errors that would have disrupted the repair
Before we end, I just want to go back to the troubleshooting process I discussed at the beginning
Next time you have a problem, think about the tools at your disposal and how they can help you with these steps