ShareChat is a social media app with ~180M MAU and 50M DAU. We capture and aggregate various engagement metrics, such as likes, views, shares and comments, at the post level to curate better content for our users. Engagement-metric traffic runs at 55k-60k writes/sec and 290k-300k reads/sec. Because these metrics directly impact users, we need a datastore that offers low latency and is highly available, resilient and scalable, ideally at an optimal cost. This talk explains how we met these criteria using in-house Kafka Streams applications and ScyllaDB.
Aggregations at Scale for ShareChat: Using Kafka Streams and ScyllaDB
1. Aggregations at Scale for ShareChat: Using Kafka Streams and ScyllaDB
Charan Movva, Technical Lead
2. Agenda
■ About ShareChat
■ Why Streaming?
■ Requirements
■ Architecture and Deep Dive
■ How is ScyllaDB helping us?
3. ShareChat
ShareChat is India's largest home-grown regional social media platform.
■ We offer easy content consumption and sharing in 15 Indian languages
■ 125M MAU
■ 1.3B+ shares per month
■ 31 minutes per day
4. Scale and Criticality of Engagement Events
We capture a lot of client events around engagement with a post:
■ Multiple posts
■ Multiple levels of engagement
■ 370k-440k ops/sec
■ Engagement counters are shown back to our users
■ They help us curate better content
5. Why Stream Processing?
The paradigms we considered for this problem, and their trade-offs:
■ Request-response
  ■ Lowest latency
  ■ But 12,500 and 12,599 look the same to users (both display as 12.5K)
■ Batch processing
  ■ High latency, high throughput
■ Stream processing
  ■ Continuous and non-blocking
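To make the 12.5K point concrete, here is a minimal Python sketch of a compact counter label. The function name `compact_count` and the exact label format are assumptions for illustration, not ShareChat's code. The sketch truncates rather than rounds, because rounding would turn 12,599 into 12.6K while the slide shows it as 12.5K.

```python
def compact_count(n: int) -> str:
    """Hypothetical compact label for a non-negative counter, e.g. 12_599 -> '12.5K'.

    Truncates to one decimal place instead of rounding.
    """
    for threshold, suffix in ((1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")):
        if n >= threshold:
            # Scale to tenths of the unit, then split into whole and fractional digits.
            whole, tenth = divmod(n * 10 // threshold, 10)
            return f"{whole}.{tenth}{suffix}" if tenth else f"{whole}{suffix}"
    return str(n)


print(compact_count(12_500), compact_count(12_599))  # 12.5K 12.5K
```

At this magnitude the label only changes once every 100 events, so a counter that lags the true value slightly usually looks identical to users. That is one way to read why the lowest possible latency of request-response is not worth its cost for these counters.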
6. Requirements
■ Windowed aggregations
  ■ Support for multiple windows
■ Triggers
■ Easy onboarding of new counters in the future
■ Easy onboarding of new triggers and aggregation windows
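The windowing and trigger requirements can be sketched in plain Python. This is neither the Kafka Streams API nor ShareChat's implementation; `WindowedCounter`, `threshold_trigger`, the event tuple shape and the window sizes are all hypothetical. It keeps tumbling-window counts per (post, metric) for several window sizes at once and calls a pluggable trigger on every update, so a new counter, window or trigger is a configuration change rather than new code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical event shape: (post_id, metric, event_time_in_seconds).
Event = tuple[str, str, int]
WindowKey = tuple[str, str, int, int]  # (post_id, metric, window_size, window_start)


def threshold_trigger(n: int) -> Callable[[int], bool]:
    """Hypothetical trigger: fire once, when a window's count first reaches n."""
    return lambda count: count == n


@dataclass
class WindowedCounter:
    """Tumbling-window counts for every (post, metric) pair, across several window sizes."""
    window_sizes: list[int]              # e.g. [60, 3600] for 1-minute and 1-hour windows
    trigger: Callable[[int], bool]       # decides when an updated count is emitted downstream
    counts: dict[WindowKey, int] = field(default_factory=lambda: defaultdict(int))
    emitted: list[tuple[WindowKey, int]] = field(default_factory=list)

    def process(self, event: Event) -> None:
        post_id, metric, ts = event
        for size in self.window_sizes:
            start = ts - ts % size       # align the event to its tumbling-window boundary
            key = (post_id, metric, size, start)
            self.counts[key] += 1
            if self.trigger(self.counts[key]):
                self.emitted.append((key, self.counts[key]))


# Usage: one post, two window sizes, emit when a window reaches 3 likes.
agg = WindowedCounter(window_sizes=[60, 3600], trigger=threshold_trigger(3))
for ts in (5, 30, 59, 61):
    agg.process(("post-1", "likes", ts))
print(agg.emitted)
# [(('post-1', 'likes', 60, 0), 3), (('post-1', 'likes', 3600, 0), 3)]
```

A production topology would also need to evict closed windows, tolerate late or out-of-order events and persist state durably (Kafka Streams handles these with windowed aggregations backed by state stores); this sketch keeps everything in memory.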
12. Next Problem?
■ Heavy reads
■ We need a datastore that can handle the growing read load with the best possible latency.
13. Enter ScyllaDB
■ It is fast
  ■ Offers sub-millisecond latency
■ Better monitoring
  ■ Metrics visibility at the DC, cluster, instance and shard levels
■ At least 50% lower database costs
■ Well, it is the best
15. Battle Testing
■ Handled a recent festival peak of 500K ops/sec
■ The same setup can handle 5x-10x the current scale
■ The cluster stays stable even when load exceeds 90%
17. The Team
We could not have gotten here without the contributions of these bright minds:
■ Engineering: Shubham Dhal, Sanket Gawande, Prateek Bhargav
■ DevOps: Abhiroop Soni
■ Mentors/Leadership: Harshal Vora, Geetish Nayak, Chhaya Sharma
You can learn more about the operational challenges and problems we've encountered in our blog post: https://sharechat.com/blogs/engineering/streaming-aggregations-at-scale
18. Thank You
Stay in Touch
Charan Movva
charan@sharechat.co
https://twitter.com/iamCharanMovva
https://github.com/charanmovva
https://linkedin.com/in/charanmovva