Yahoo! JAPAN is one of the most successful internet service companies in Japan. Their NoSQL Team's Takahiro Iwase and Murukesh Mohanan have been testing out ScyllaDB, comparing it with Cassandra on multiple parameters: performance (both throughput and latency), reliability and ease of use. They will discuss the motivations behind their search for a successor to Cassandra that can handle exceedingly heavy traffic, and their evaluation of ScyllaDB in this regard.
Scylla Summit 2018: Cassandra and ScyllaDB at Yahoo! Japan
1. Cassandra and ScyllaDB
at Yahoo! JAPAN
Murukesh Mohanan
Takahiro Iwase
Yahoo Japan corp.
Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved.
2. Presenters bio
Murukesh Mohanan (Muru) has a Master's in CS from IIT Bombay and a
Bachelor's in Mechanical Engineering from IIT Guwahati. He has been working
at Yahoo! JAPAN since 2016, in the NoSQL team, striving to improve processes
whilst keeping an eye on the latest tech.
Takahiro Iwase joined Yahoo! JAPAN in 2018, working on day-to-day
operations and tuning of NoSQL. Before joining Yahoo! JAPAN, he developed
Okuyama, an open source NoSQL database, and has been a published
contributor to large-scale database research.
4. Yahoo! In JAPAN (YJ)?
Many Strong Services
Media Search Video Answer Mail
Membership C2C Payment C2C EC B2C EC Local
Search Knowledge search Mail News
YAHUOKU! Premium Wallet Loco
5. YJ’s advertisement platform
▪ Reaches 90% of smartphone users in Japan
Heavy traffic to YJ apps and sites
▪ 27 billion PVs/month from PC
▪ 43 billion PVs/month from smartphone
▪ 33 million unique PC browsers/month
▪ 56 million unique SP browsers/month
https://promotionalads.yahoo.co.jp/
6. Cassandra in YJ
Number of clusters and nodes in YJ
YJ has its own data centers and a huge number of servers;
this is quite rare among Japanese internet companies
8. Test Setup
Tools
▪ Docker on CentOS on OpenStack on ...
▪ Prometheus
• Node-exporter
• Prometheus JMX exporter
▪ Grafana (with ScyllaDB’s dashboards)
▪ cassandra-stress
9. Test Setup
System specs
▪ OpenStack nodes: 16 vCPUs / 16 GB RAM / 100 GB NVMe / 1 Gbps NIC
▪ cassandra-stress executed on a node with same specs
▪ Initially started out with spinning rust (HDDs)
• Neither database outshone the other
10. Test Setup
System specs
▪ Switched to 8 cores / 8 GB / 100 GB NVMe, but Cassandra would
occasionally exhaust heap memory
▪ So bumped to 16 cores / 16 GB
13. Test Setup
cassandra-stress config
▪ Initially tried a fixed total operation count (n=xyz)
▪ Switched to fixed duration (t=Xm) (3m at first, 10m later)
▪ User profiles instead of the read/write/mixed options
▪ CL=ONE
▪ Truncate once, no warmup
▪ Unlogged batches
14. Test Setup
cassandra-stress profiles
▪ Tried multiple workloads
1. select/insert (2:1)
2. select/insert/update/delete (4:1:1:1)
3. select/insert/update (4:1:1)
▪ This presentation will focus on the results for cases 1 and 2, but the
conclusions are applicable to the other cases as well.
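The configuration and workload ratios above can be sketched as a small script that assembles one cassandra-stress invocation per client thread count. This is only an illustration under stated assumptions, not the presenters' actual script: the profile file name, the node address, and the query names select/update/delete (which would have to be defined in the user profile; insert is built in) are hypothetical, and the thread counts are taken from the tick values on the charts later in this deck, assumed here to be client thread counts. Unlogged batches would be configured inside the profile itself, so they do not appear on the command line.

```python
import shlex

# Tick values from the charts in this deck; assumed to be client thread counts.
THREAD_COUNTS = [4, 8, 16, 24, 36, 54, 84, 120, 180, 270]

# Operation ratios for the three workloads listed on the slide.
# "select", "update" and "delete" are hypothetical query names that the
# user profile would need to define; "insert" is built into cassandra-stress.
WORKLOADS = {
    1: {"select": 2, "insert": 1},
    2: {"select": 4, "insert": 1, "update": 1, "delete": 1},
    3: {"select": 4, "insert": 1, "update": 1},
}


def ops_spec(ratios):
    """Render the ratios as a cassandra-stress ops(...) argument."""
    return "ops(" + ",".join(f"{name}={weight}" for name, weight in ratios.items()) + ")"


def stress_command(workload, threads, profile="profile.yaml",
                   node="127.0.0.1", duration="10m"):
    """Build the argument list for one fixed-duration run.

    profile and node are placeholder values, not taken from the slides.
    """
    return [
        "cassandra-stress", "user",
        f"profile={profile}",
        ops_spec(WORKLOADS[workload]),
        f"duration={duration}",   # fixed duration instead of n=<count>
        "cl=ONE",                 # consistency level ONE
        "no-warmup",
        "truncate=once",
        "-rate", f"threads={threads}",
        "-node", node,
    ]


if __name__ == "__main__":
    # Print the commands for workload 1 instead of executing them.
    for threads in THREAD_COUNTS:
        print(shlex.join(stress_command(1, threads)))
```

Printing the commands rather than running them keeps the sketch safe to try without a cluster; each line could then be pasted into a shell on the load-generating node.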
16. Case1 - OPS
▪ Up until about 16 threads, C* and Scylla were level
▪ After that, Cassandra plateaued at about 11k op/s, while Scylla kept
climbing to ~30k op/s
17. Case1 - 99th percentile
▪ Both were somewhat even until about 36 threads, with ~30ms latency
▪ Beyond that, Cassandra's latency started increasing much faster than Scylla's
18. Case1 - 99.9th percentile
▪ Cassandra was worse off pretty much from the start
▪ Scylla seems to handle ~100 threads with the same latency as
Cassandra at 8 threads
19. Case4 - OPS
▪ Cassandra stopped responding at 54 threads and above
▪ Scylla's throughput curve continued to rise all the way up to 270 threads.
▪ High performance was seen across all operation types.
21. CPU - Cassandra
▪ CPU is underutilized (<50% usage)
▪ As the client thread count rises, Cassandra's CPU usage levels off very
quickly.
22. CPU – ScyllaDB
▪ Scylla takes far more advantage of available CPU
▪ User-mode utilization is also high
23. Context Switches – Cassandra
24. Context Switches – ScyllaDB
▪ Far fewer context switches
▪ No extreme jumps in interrupts
25. Cassandra – GC Used heap size
▪ Assigned heap size of 8GB
26. Cassandra – GC Time
▪ Time and CPU consumed by GC increase with load
[Chart x-axis, presumably client thread count: 4, 8, 16, 24, 36, 54, 84, 120, 180, 270]
27. Takeaways
ScyllaDB
▪ Higher performance than C*
• 2-3x op/s, latency is lower
• More stable
• Better able to take advantage of concentrated power
▪ Less babysitting required!
28. Takeaways
Cassandra
▪ With some babysitting, might be able to catch up with Scylla running on
defaults
• More tunables ⇔ double-edged sword
▪ G1GC helped with handling higher thread counts
• Consistent crashes with CMS
• Even so, G1GC’s full potential not seen yet
29. Takeaways
Tools
▪ Docker
• Host networking helps: across all test cases, op/s went up 80-90%
and latency decreased.
▪ cassandra-stress
• HTML output is nice, but graphs should automatically distinguish between
results for various thread counts
• Documentation could be more complete
Hello everyone,
In this presentation, I will first introduce how Yahoo! JAPAN uses Cassandra.
・These are today's presenters.
・My name is Takahiro Iwase, and I work on operations and tuning of Cassandra.
YJ's services are quite popular.
One number makes this clear: 90% of smartphone users in Japan use at least one of YJ's services.
The numbers of PVs and unique browsers are also huge; for example, YJ's services receive about 70 billion PVs per month.
Such a huge volume of access means that YJ has to manage a large amount of data.
As you can see, the number of Cassandra clusters has increased rapidly over two years.
We believe that the number will keep growing in the future.
But we also have problems.
The first problem: Cassandra runs on the Java virtual machine, and Java has GC problems.
The second problem: it is very hard to maintain many nodes.
To resolve these problems, the YJ NoSQL team is now evaluating ScyllaDB.
My coworker, Murukesh, will talk about the details next.