At Instagram, our mission is to capture and share the world's moments. Our app is used by over 400M people monthly, which creates challenging data needs. We use Cassandra heavily as a general key-value store. In this presentation, I will talk about how we use Cassandra to serve our critical use cases; the improvements and patches we made to meet our low-latency, high-scalability requirements; and some pain points we still have.
About the Speaker
Dikang Gu, Software Engineer, Facebook
I'm a software engineer on the Instagram core infra team, working on scaling Instagram's infrastructure, especially on building a generic key-value store on top of Cassandra. Prior to this, I worked on HDFS development at Facebook. I received my master's degree in Computer Science from Shanghai Jiao Tong University in China.
8. USE CASE 1
Feed
PUSH
When posting, we push the media information to each follower's feed store.
When reading, we fetch the feed ids from the viewer's feed store.
9. USE CASE 1
Feed
• Write QPS: 1M+
• Avg/P99 Read Latency: 20ms / 100ms
• Data Model:
user_id -> List(media_id)
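The push model above can be sketched as fan-out-on-write. This is a minimal in-memory sketch in plain Java, with maps standing in for the Cassandra feed store; all class and method names here are hypothetical, not Instagram's actual code:

```java
import java.util.*;

// Hypothetical sketch of the push (fan-out-on-write) feed model:
// each follower's feed store holds a list of media ids, newest first.
class FeedStore {
    private final Map<Long, Deque<Long>> feeds = new HashMap<>();    // user_id -> List(media_id)
    private final Map<Long, List<Long>> followers = new HashMap<>(); // author_id -> follower ids

    void follow(long followerId, long authorId) {
        followers.computeIfAbsent(authorId, k -> new ArrayList<>()).add(followerId);
    }

    // On post: push the media id into every follower's feed store.
    void post(long authorId, long mediaId) {
        for (long followerId : followers.getOrDefault(authorId, List.of())) {
            feeds.computeIfAbsent(followerId, k -> new ArrayDeque<>()).addFirst(mediaId);
        }
    }

    // On read: fetch feed ids straight from the viewer's own feed store.
    List<Long> fetchFeed(long viewerId, int limit) {
        return feeds.getOrDefault(viewerId, new ArrayDeque<>())
                    .stream().limit(limit).toList();
    }
}
```

The trade-off is the one the slides imply: writes fan out to every follower (hence the 1M+ write QPS), so reads stay a single cheap lookup in the viewer's own row.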
10. USE CASE 2
Metadata store
Applications use C* as a key-value store: they store a list of blobs associated with a key, and do point queries or range queries at read time.
11. USE CASE 2
Metadata store
• Read/Write QPS: 100K+
• Avg read size: 50KB
• Avg/P99 Read Latency: 7ms / 50ms
• Data Model:
user_id -> List(Blob)
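A minimal sketch of this access pattern, with an in-memory `TreeMap` standing in for C* (all names are hypothetical). Keeping blob ids sorted per key is what makes both point and range reads cheap; `subMap` plays the role of the range query:

```java
import java.util.*;

// Hypothetical sketch of the metadata-store access pattern: a key maps to an
// ordered list of blobs, read back by point query or by range query.
class MetadataStore {
    // user_id -> (blob_id -> blob bytes), blob ids kept sorted for range scans
    private final Map<Long, NavigableMap<Long, byte[]>> rows = new HashMap<>();

    void put(long userId, long blobId, byte[] blob) {
        rows.computeIfAbsent(userId, k -> new TreeMap<>()).put(blobId, blob);
    }

    // Point query: a single blob under the key.
    byte[] get(long userId, long blobId) {
        return rows.getOrDefault(userId, new TreeMap<>()).get(blobId);
    }

    // Range query: all blobs whose ids fall in [from, to).
    Collection<byte[]> range(long userId, long from, long to) {
        return rows.getOrDefault(userId, new TreeMap<>()).subMap(from, to).values();
    }
}
```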
19. PENDING RANGES
Problem
• CPU usage +30% when bootstrapping new nodes.
• Client request latency jumps and timeouts
• Multimap<Range<Token>, InetAddress> PendingRange
• Inefficient O(n) pendingRanges lookup per request
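The old lookup can be sketched like this, with a plain list of (range, endpoint) pairs standing in for Guava's `Multimap` and token ranges simplified to half-open long intervals (names and shapes are illustrative, not the actual Cassandra code):

```java
import java.util.*;

// Sketch of the pre-fix pendingRanges structure: every request walks the
// whole collection of pending ranges to find its endpoints -- O(n) per lookup.
class PendingRangesOld {
    record Range(long left, long right) {
        boolean contains(long token) { return token > left && token <= right; }
    }

    private final List<Map.Entry<Range, String>> entries = new ArrayList<>();

    void add(Range r, String endpoint) { entries.add(Map.entry(r, endpoint)); }

    // O(n): a linear scan over all pending ranges for each client request.
    List<String> pendingEndpointsFor(long token) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Range, String> e : entries)
            if (e.getKey().contains(token)) result.add(e.getValue());
        return result;
    }
}
```

With many pending ranges during a bootstrap, this scan runs on every request, which matches the +30% CPU and the latency jumps above.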
20. PENDING RANGES
Solution
• CASSANDRA-9258
• Use two NavigableMaps to implement the pending ranges
• We can expand or shrink the cluster without affecting requests
• Thanks to Branimir Lambov for the patch review and feedback.
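The idea behind the fix can be sketched with a sorted map keyed by range boundary, so a token's pending endpoint is found with a single `ceilingEntry` lookup instead of a linear scan. This is a simplified, hypothetical sketch assuming non-overlapping, non-wrapping ranges; the real CASSANDRA-9258 patch uses two NavigableMaps and handles the wraparound case:

```java
import java.util.*;

// Sketch of the NavigableMap-based pendingRanges lookup: O(log n) per request.
class PendingRangesNew {
    // range end token -> (range start token, endpoint)
    private final NavigableMap<Long, Map.Entry<Long, String>> byEnd = new TreeMap<>();

    void add(long left, long right, String endpoint) {
        byEnd.put(right, Map.entry(left, endpoint));
    }

    // O(log n): jump straight to the first range whose end is >= token,
    // then check that the token actually falls inside (left, right].
    String pendingEndpointFor(long token) {
        Map.Entry<Long, Map.Entry<Long, String>> e = byEnd.ceilingEntry(token);
        if (e == null) return null;
        long left = e.getValue().getKey();
        return token > left ? e.getValue().getValue() : null;
    }
}
```

Because lookups no longer scale with the number of pending ranges, the cluster can expand or shrink without the per-request cost spiking.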
24. COMPACTION IMPROVEMENTS
• Track the write amplification. (CASSANDRA-11420)
• Optimize the overlapping lookup. (CASSANDRA-11571)
• Optimize the isEOF() checking. (CASSANDRA-12013)
• Avoid searching for column index. (CASSANDRA-11450)
• Persist last compacted key per level. (CASSANDRA-6216)
• Compact tables before making available in L0. (CASSANDRA-10862)
26. BIG HEAP SIZE
• 64G max heap size
• 16G new gen size
• -XX:MaxTenuringThreshold=6
• Young GC every 10 seconds
• Avoid full GC
• 2X P99 latency drop
28. NODETOOL REBUILD
• rebuild may fail for nodes with TBs of data
• CASSANDRA-10406
• adds support for rebuilding only the failed token ranges
• Thanks to Yuki Morishita for the review.