2. Satoshi Konno
http://www.cybergarage.org
• Engineering Manager of NoSQL Team @ Yahoo! Japan
• Open Source Software Developer for
Virtual Reality, IoT and Cloud Computing
• Doctor's Course Student @ JAIST
Défago Lab : The φ accrual failure detector
About me
3. Agenda
• Company Profile
• Summary of C* Clusters
• Issues and Solutions of C*
• Next Generation Infrastructures for C*
5. Founded : January 31, 1996
Businesses : Internet Advertising
e-Commerce
Members Services, etc.
Web Services : 100+
Smartphone Apps: 50+ (iOS), 50+ (Android)
Employees : 5,800+ (as of June 30, 2016)
Head Office : Chiyoda-ku, Tokyo, Japan
Company Profile
6. Shareholder Composition
An independent and public company in the Japanese Market
U.S. Japan
35.5% 42.9%
Market Cap
$22 billion
Market Cap
$29 billion
Market Cap
$60 billion
7. 18th Largest Internet Company in market cap
[Bar chart: market value of the largest internet companies, in billion U.S. dollars]
http://www.statista.com/statistics/277483/market-value-of-the-largest-internet-companies-worldwide/
10. Extensive Reach to a Wide Range of Users
80% of all Japanese Internet users use Yahoo! JAPAN
Nielsen NetView June 2015 : Data by Brands. Access from home and work using PCs (excl. internet applications)
11. Many Strong Services
Media
US
Search Video Answer Mail
JP
US
JP
Membership C2C Payment C2C EC B2C EC Local
Search Knowledge search MailNews
YAHUOKU!Premium Wallet Loco
17. Our Use Case Summary on Cassandra
100 Systems
20 Database Caching
10 Advertising Services
40 User Databases
50 Service Databases
Browsing History
Impression Data
・・・・
Meta Data
Aggregated Data
・・・・
Generated Data
Session Data
Meta Data
Aggregated Data
・・・・
Generated Data
Recommendation
Demographic Data
Life Log
・・・・
Preference Data
Behavior History
25. ISSUE #2 : Bootstrap Problem - Driver
• Heavy Services : ↑3000qps/node
= C* cluster with real servers (SSD is recommended)
• Light Services : ↓1000qps/node and ↓3GB/node
= C* cluster with virtual servers on OpenStack
Heavy Service Light Service
CPU = Good
vCPU = Cheap
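The placement rule on this slide can be sketched as a small function. This is only an illustration: the function name and return strings are hypothetical, and the slide does not say what happens between 1000 and 3000 qps/node, so that range is returned as "unspecified".

```python
def choose_cluster_type(qps_per_node: float, data_gb_per_node: float) -> str:
    """Pick a cluster type from the per-node thresholds given on the slide."""
    # Heavy services: more than 3000 qps per node -> real servers.
    if qps_per_node > 3000:
        return "real servers"
    # Light services: under 1000 qps per node AND under 3 GB per node
    # -> virtual servers on OpenStack.
    if qps_per_node < 1000 and data_gb_per_node < 3:
        return "virtual servers on OpenStack"
    # The slide does not cover the remaining cases.
    return "unspecified"
```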
26. ISSUE #2 : Bootstrap Problem - Driver
• PROBLEM : All processes on every front-end server try
to connect, at the same time, to a new C* node that has
just been added to the cluster ...
vCPU = Cheap
PHOTO:AFLO
27. ISSUE #2 : Bootstrap Problem - Driver
• PROBLEM : C* authentication is based on BCrypt, which
is heavy processing for the vCPU nodes.
vCPU : Authentication (BCrypt) is heavy !
28. ISSUE #2 : Bootstrap Problem - Driver
• PROBLEM : Most processes cannot connect to the C*
clusters on OpenStack because of the authentication
processing. They time out and then retry endlessly,
without waiting between attempts …
All vCPU Usage = 100% !
vCPU : Authentication (BCrypt) is heavy !
Timeout ! Retry !
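A rough way to see why a connection burst ends in timeouts is the back-of-the-envelope model below. It is a sketch under stated assumptions, not a measurement: all clients connect at t=0, each vCPU runs one BCrypt check at a time, and clients are served in arrival order. The function name and every number in the usage note are hypothetical.

```python
import math


def timed_out_clients(clients: int, vcpus: int,
                      auth_seconds: float, timeout_seconds: float) -> int:
    """Count clients whose authentication cannot finish before the timeout."""
    # How many authentications one vCPU can complete before the timeout.
    rounds_in_time = math.floor(timeout_seconds / auth_seconds)
    # All vCPUs work in parallel, so this many clients are served in time.
    served_in_time = min(clients, rounds_in_time * vcpus)
    return clients - served_in_time
```

With 1000 simultaneous clients, 4 vCPUs, 0.25 s per authentication and a 5 s timeout, `timed_out_clients(1000, 4, 0.25, 5.0)` returns 920. If those clients retry immediately, the node faces the same burst again.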
29. ISSUE #2 : Bootstrap Problem - Driver
• SOLUTION : Improve the C* drivers so that they do not
all reconnect simultaneously when a connection fails.
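One common way to keep clients from reconnecting simultaneously is exponential backoff with random jitter. The sketch below shows that general technique; the slides do not say which approach was actually implemented in the drivers, and the names `reconnect_delay` and `connect_with_backoff`, the `connect` callable, and the default values are all hypothetical.

```python
import random
import time
from typing import Callable, Optional


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 60.0,
                    rng: Optional[random.Random] = None) -> float:
    """Return a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    source = rng if rng is not None else random
    upper = min(cap, base * (2 ** attempt))
    return source.uniform(0.0, upper)


def connect_with_backoff(connect: Callable[[], object], max_attempts: int = 8,
                         sleep: Callable[[float], None] = time.sleep,
                         rng: Optional[random.Random] = None) -> object:
    """Call connect() until it succeeds, sleeping a jittered delay between tries."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Give up after the last allowed attempt.
            if attempt == max_attempts - 1:
                raise
            # Randomised wait so that many clients do not retry in lockstep.
            sleep(reconnect_delay(attempt, rng=rng))
    raise ValueError("max_attempts must be at least 1")
```

Because each process draws its own random delay, retries spread out over time instead of hitting the new node in one wave.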
30. ISSUE #3 : Multi-tenancy – Slow Query
• Small Services : (↓500qps and ↓10GB) / keyspace
= Shared C* cluster with real servers
Shared Cluster : 50 Services
31. ISSUE #3 : Multi-tenancy – Slow Query
• PROBLEM : We could not identify which service was issuing
the high-load queries in the multi-tenant cluster.
Shared Cluster : Which services ?
33. ISSUE #4 : Multi-racking – Inbound Params
• PROBLEM : Our C* clusters are built alongside other services
in the same rack or under the same core switch.
34. ISSUE #4 : Multi-racking – Inbound Params
• PROBLEM : C* streaming occurs when a node is added
or removed, either by our operations or by failure
detection.
35. ISSUE #4 : Multi-racking – Inbound Params
• PROBLEM : C* streaming generates heavy traffic,
and it disrupts the other services.
Stop C* streaming !
stream_throughput_outbound
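The arithmetic behind this slide can be sketched as follows. It assumes that the `stream_throughput_outbound` label refers to a per-node cap on outbound streaming and that each sending node is limited independently, so the receiving side sees the sum of all senders. The function name and the example numbers are hypothetical.

```python
def inbound_stream_mbps(senders: int, outbound_cap_mbps: float,
                        link_mbps: float) -> float:
    """Worst-case inbound rate on a node receiving streams from several peers."""
    # Each sender is capped on its own, so the caps add up at the receiver,
    # limited only by the capacity of the receiving link.
    return min(senders * outbound_cap_mbps, link_mbps)
```

For example, 10 senders capped at 200 Mbps each can together push up to 2000 Mbps toward one node, traffic that shares the rack or core switch with the other services.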
38. OpenStack @ Yahoo! JAPAN
• PURPOSE : To abstract our data center resources using
OpenStack.
[Diagram: Apps, Platforms and Infrastructures layers, with APIs between them]
50,000+ instances
39. Trial #1 : Special Hypervisor for C*
• PROBLEM : Our OpenStack hypervisors host both C* VMs
and the VMs of other services.
Noisy Neighbours
40. Trial #1 : Special Hypervisor for C*
• SOLUTION : Trying to offer special hypervisors
that run only C* VMs.
Optimal Flavors for C*
vCPU : 8+, Mem : 16GiB+
SSD : 100GiB+
10Gbps x 2
41. TRIAL #2 : Bare Metal Clusters for C*
• PROBLEM : OpenStack vCPUs are too underpowered to run a C*
node in our particular service environment, for example
with its many connections.
vCPU : Authentication (BCrypt) is heavy !
42. TRIAL #2 : Bare Metal Clusters for C*
• SOLUTION : Trying to offer special bare metal
clusters that run only C*, using OpenStack Ironic.
Ironic
Xeon D-1541 2.1GHz (1CPU)
32GB MEM / SATA SSD 400GB
10Gbps x 2