Hear about how Coursera uses Cassandra as the core of its scalable online education platform. I'll discuss the strengths of Cassandra that we leverage, as well as some limitations you might run into in practice.
In the second part of this talk, we'll dive into how to use the DataStax Java driver effectively. We'll dig into how the driver is architected and use that understanding to develop best practices. I'll also share a couple of interesting bugs we've run into at Coursera.
17. Use case #3: Video Workflows
Diagram: Input.mp4 → Step 1: Audio, Step 2: Low Res Video, Step 3: High Res Video. Assemblies 1 and 3 crash; Assemblies 2, 4, and 5 finish Ok.
21. Cassandra - Initial Pain Points
• Can’t execute arbitrary queries
• Filtering, sorting, etc.
• Can’t be abused as an OLAP database
• Worries about ‘eventual’ consistency
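The worry about "eventual" consistency is usually eased by Cassandra's tunable consistency. The sketch below checks the standard overlap rule, assuming a single data center with replication factor RF: if reads wait for R replicas and writes for W replicas with R + W > RF, every read touches at least one replica that acknowledged the latest write. The class and method names are illustrative only, not part of Cassandra or the driver.

```java
// Illustrative sketch (single data center assumed): when do reads see the latest write?
// QuorumMath and its methods are hypothetical names, not a Cassandra/driver API.
final class QuorumMath {
    private QuorumMath() {}

    /** Replicas needed at QUORUM: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) throw new IllegalArgumentException("RF must be >= 1");
        return replicationFactor / 2 + 1;
    }

    /** True when every read replica set must intersect every write replica set (R + W > RF). */
    static boolean readsSeeLatestWrite(int replicationFactor, int readReplicas, int writeReplicas) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("QUORUM for RF=3: " + q);                                // 2
        System.out.println("QUORUM/QUORUM: " + readsSeeLatestWrite(rf, q, q));      // true  (2 + 2 > 3)
        System.out.println("ONE/ONE: " + readsSeeLatestWrite(rf, 1, 1));            // false (1 + 1 <= 3)
    }
}
```

With RF = 3, QUORUM reads and writes overlap while ONE/ONE does not, which is where stale reads can appear.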
22. Gotchas
• Running lots of truly ad-hoc queries is hard
• Don’t use C* directly to explore your data. (Spark?)
• Sorting and filtering can be hard
• Consider Solr / Elasticsearch
• Or even MySQL, depending on load / importance
23. Helpful Things
• Data modeling consulting
• Monitoring
• Data access layer for common use cases
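One way to picture a "data access layer for common use cases", given the sorting and filtering gotchas: a thin layer that writes each fact to one denormalized table per query, so common reads become single-partition lookups that come back already sorted by clustering column. This is a hypothetical sketch. EnrollmentDao, the table names, and the in-memory maps standing in for Cassandra tables are illustrative assumptions, not Coursera's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical DAL sketch. In-memory maps stand in for two denormalized tables:
//   enrollments_by_user   (PRIMARY KEY (user_id, course_id))
//   enrollments_by_course (PRIMARY KEY (course_id, user_id))
// Callers only see enroll() and the two lookups; the DAL keeps both tables in sync.
final class EnrollmentDao {
    // partition key -> clustering keys, kept sorted like a clustering column
    private final Map<String, NavigableSet<String>> byUser = new HashMap<>();
    private final Map<String, NavigableSet<String>> byCourse = new HashMap<>();

    /** Writes the same fact to both query tables (in Cassandra: two INSERTs, e.g. in a logged batch). */
    void enroll(String userId, String courseId) {
        byUser.computeIfAbsent(userId, k -> new TreeSet<>()).add(courseId);
        byCourse.computeIfAbsent(courseId, k -> new TreeSet<>()).add(userId);
    }

    /** Single-partition read; results are already sorted by course id. */
    List<String> coursesForUser(String userId) {
        return new ArrayList<>(byUser.getOrDefault(userId, new TreeSet<>()));
    }

    /** Single-partition read; results are already sorted by user id. */
    List<String> usersInCourse(String courseId) {
        return new ArrayList<>(byCourse.getOrDefault(courseId, new TreeSet<>()));
    }

    public static void main(String[] args) {
        EnrollmentDao dao = new EnrollmentDao();
        dao.enroll("u2", "ml");
        dao.enroll("u1", "ml");
        dao.enroll("u1", "algorithms");
        System.out.println(dao.coursesForUser("u1")); // [algorithms, ml]
        System.out.println(dao.usersInCourse("ml"));  // [u1, u2]
    }
}
```

The trade-off is paying at write time (one write per query table) so reads never need ad-hoc filtering or sorting; queries outside these patterns are the ones worth pushing to Solr, Elasticsearch, or Spark.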
33. Default Retry Policy
• Retries a read once if enough replicas responded but the data itself was not retrieved.
• Retries a write only when the batch log write (logged batches) timed out.
• On Unavailable, retries once on the next host. 2.0.11+ or 2.1.7+ (JAVA-709)
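The decision rules on this slide fit in a few lines. This is a standalone model of the DefaultRetryPolicy behavior described above, not the driver's code: the enums and method names are hypothetical, and the real RetryPolicy interface roughly takes a Statement and a ConsistencyLevel and returns a RetryDecision.

```java
// Standalone model of the DefaultRetryPolicy decisions (as described for 2.0.11+ / 2.1.7+).
// Decision, WriteType, and the method names are hypothetical stand-ins, not the driver API.
final class DefaultRetryModel {
    // RETRY_SAME_HOST ~ retry on the same coordinator; TRY_NEXT_HOST ~ move on in the query plan.
    enum Decision { RETRY_SAME_HOST, TRY_NEXT_HOST, RETHROW }

    // Subset of the write types a coordinator reports on a write timeout.
    enum WriteType { SIMPLE, BATCH, UNLOGGED_BATCH, COUNTER, BATCH_LOG }

    /** Read timeout: retry once only if enough replicas answered but the data was not returned. */
    static Decision onReadTimeout(int required, int received, boolean dataRetrieved, int nbRetry) {
        if (nbRetry != 0) return Decision.RETHROW; // never retry more than once
        return (received >= required && !dataRetrieved) ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    /** Write timeout: retry once only when writing to the batch log timed out. */
    static Decision onWriteTimeout(WriteType type, int nbRetry) {
        if (nbRetry != 0) return Decision.RETHROW;
        return type == WriteType.BATCH_LOG ? Decision.RETRY_SAME_HOST : Decision.RETHROW;
    }

    /** Unavailable: try the next host once, in case this coordinator is cut off from the replicas. */
    static Decision onUnavailable(int nbRetry) {
        return nbRetry == 0 ? Decision.TRY_NEXT_HOST : Decision.RETHROW;
    }

    public static void main(String[] args) {
        System.out.println(onReadTimeout(2, 2, false, 0));      // RETRY_SAME_HOST
        System.out.println(onWriteTimeout(WriteType.SIMPLE, 0)); // RETHROW
        System.out.println(onUnavailable(0));                    // TRY_NEXT_HOST
    }
}
```

Ordinary write timeouts are rethrown because the write may already have been applied on some replicas, so a blind retry is only safe for idempotent statements.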