Scale-Out ccNUMA:
Exploiting Skew with Strongly Consistent Caching
Antonios Katsarakis*, Vasilis Gavrielatos*, A. Joshi, N. Oswald, B. Grot, V. Nagarajan
The University of Edinburgh
This work was supported by EPSRC, ARM and Microsoft through their PhD Fellowship Programs
*The first two authors contributed equally to this work
Large-scale online services
Backed by Key-Value Stores (KVS)
Characteristics:
• Numerous users
• Read-mostly workloads (e.g. Facebook: 0.2% writes [ATC’13])
Distributed KVS
KVS Performance 101
In-memory storage:
Avoid slow disk access
Partitioning:
• Shard the dataset across multiple nodes
• Enables high capacity in-memory storage
Remote Direct Memory Access (RDMA):
Avoid costly TCP/IP processing via
• Kernel bypass
• H/w network stack processing
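The partitioning step above can be illustrated with a minimal in-process sketch. The hash-based placement, the `home_node` helper and the `ShardedKVS` class are assumptions made for illustration; the slides do not say how keys are mapped to nodes.

```python
import hashlib


def home_node(key: str, num_nodes: int) -> int:
    """Map a key to the node that stores it (hypothetical hash-based sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and reduce it modulo the node count.
    return int.from_bytes(digest[:8], "big") % num_nodes


class ShardedKVS:
    """Toy in-memory KVS: one Python dict per node stands in for that node's DRAM."""

    def __init__(self, num_nodes: int):
        self.shards = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value: str) -> None:
        self.shards[home_node(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[home_node(key, len(self.shards))].get(key)


kvs = ShardedKVS(num_nodes=4)
for i in range(1000):
    kvs.put(f"user:{i}", f"profile-{i}")
print([len(shard) for shard in kvs.shards])  # number of keys held by each node
```

Each key lives on exactly one home node, so total capacity grows with the number of nodes.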
Good start, but there is a problem…
Skewed Access Distribution
Real-world datasets → mixed popularity
• Popularity follows a power-law distribution
• Small number of objects hot; most are not
Mixed popularity → load imbalance
• Node(s) storing hottest objects get highly loaded
• Majority of nodes are under-utilized
[Figure: load across 128 servers under YCSB with skew exponent = 0.99; the node(s) holding the hottest objects are overloaded]
Skew-induced load imbalance limits system throughput
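The load imbalance described above can be reproduced with a small calculation. This is a sketch under stated assumptions: the number of keys (100,000) is arbitrary, and keys are placed round-robin by popularity rank as a stand-in for hash-based sharding; only the 128 servers and the 0.99 exponent come from the slide.

```python
def zipf_weights(num_keys: int, alpha: float) -> list:
    """Normalized access probabilities: the key of rank r gets weight 1 / r**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def node_load_shares(num_keys: int, num_nodes: int, alpha: float) -> list:
    """Fraction of all requests that lands on each node.

    Assumption: the key at index i (rank i + 1) is stored on node i % num_nodes.
    """
    shares = [0.0] * num_nodes
    for idx, prob in enumerate(zipf_weights(num_keys, alpha)):
        shares[idx % num_nodes] += prob
    return shares


shares = node_load_shares(num_keys=100_000, num_nodes=128, alpha=0.99)
fair = 1.0 / 128  # share each node would receive under a perfectly uniform load
print(f"hottest node: {max(shares) / fair:.1f}x its fair share")
print(f"coldest node: {min(shares) / fair:.2f}x its fair share")
```

The node that happens to hold the most popular key receives several times its fair share of requests, while most other nodes receive less than theirs.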
Existing Skew Mitigation Techniques
Centralized cache [SOCC’11, SOSP’17]
• Dedicated node resides in front of the KVS, caching hot objects
◦ Filters the skew with a small cache
◦ Throughput is limited by the single cache
NUMA abstraction [NSDI’14, SOCC’16]
• Uniformly distribute requests to all servers
• Remote objects RDMA’ed from home node
◦ Load balance the client requests
◦ No locality → excessive network b/w
Most requests require remote access
Can we get the best of both worlds?
Caching + NUMA
Scale-Out ccNUMA! (via distributed caching)
What are the challenges?
Scale-Out ccNUMA Challenges
Challenge 1: Distributed cache architecture design
• Which items to cache and where?
• How to steer traffic for maximum load balance & hit rate?
Challenge 2: Keeping the caches consistent (i.e. what happens on a write)
• How to locate replicas?
• How to execute writes efficiently?
Solving Challenge 1 with Symmetric Caching
Which items to cache and where?
• Insight: hottest objects see most hits
• Idea: all nodes cache hottest objects → Implication: all caches have same content
• Symmetric caching: small cache with hottest objects at each node
How to steer traffic for maximum load balance and hit rate?
• Insight: symmetric caching → all caches equal (highest) hit rate
• Idea: uniformly spread requests
• Requests for hottest objects → served locally on any node
• Cache misses served as in NUMA abstraction
Benefits:
• Load balances and filters the skew
• Throughput scales with number of servers
• Less network b/w: most requests are served locally
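The read path implied by symmetric caching can be sketched as a single-process simulation. The `Node` and `Cluster` classes, the integer keys and the modulo placement are hypothetical and chosen only to keep the example deterministic; RDMA and real networking are not modelled.

```python
class Node:
    def __init__(self, node_id: int, hot_keys):
        self.node_id = node_id
        self.store = {}                       # this node's shard of the KVS
        self.cache = dict.fromkeys(hot_keys)  # symmetric cache: same keys on every node


class Cluster:
    def __init__(self, num_nodes: int, hot_keys):
        self.nodes = [Node(i, hot_keys) for i in range(num_nodes)]
        self.stats = {"cache_hit": 0, "local": 0, "remote": 0}

    def home(self, key: int) -> int:
        # Assumption: integer keys placed by modulo, standing in for real sharding.
        return key % len(self.nodes)

    def load(self, key: int, value) -> None:
        """Populate the home shard and, for hot keys, every node's cache."""
        self.nodes[self.home(key)].store[key] = value
        for node in self.nodes:
            if key in node.cache:
                node.cache[key] = value

    def read(self, key: int, at_node: int):
        """Serve a read that a client sent to an arbitrary node."""
        node = self.nodes[at_node]
        if key in node.cache:  # hot object: served locally on any node
            self.stats["cache_hit"] += 1
            return node.cache[key]
        home = self.home(key)  # miss: fall back to the NUMA abstraction
        self.stats["local" if home == at_node else "remote"] += 1
        return self.nodes[home].store[key]


cluster = Cluster(num_nodes=3, hot_keys=[0, 1])
for k in range(10):
    cluster.load(k, f"v{k}")
for n in range(3):
    cluster.read(0, at_node=n)  # hot key, hits the local cache everywhere
cluster.read(5, at_node=0)      # cold key whose home is node 2, remote access
print(cluster.stats)
```

Because every cache holds the same keys, a node can tell from its own cache whether any other node caches an object.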
Challenge 2: How to keep the caches consistent?
Keeping the caches consistent
Requirement:
On a write, inform all replicas of the new value
How to locate replicas?
- Easy with Symmetric Caching! If object in local cache → all nodes cache it
How to execute writes efficiently?
• Typical protocols:
◦ Ensure global write ordering via a primary
◦ Primary executes all writes → hot-spot
• Fully distributed writes
◦ Can guarantee ordering via logical clocks
◦ Avoid hot-spots
◦ Evenly spread write propagation costs
[Figure: "Primary executes all writes" (every Write( ) goes through the Primary) vs. "Fully distributed writes" (each Write( ) is executed at the node that receives it)]
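How logical clocks can order writes without a primary is sketched below. It assumes a per-key timestamp made of a Lamport clock plus the writer's node id as a tie-breaker; the `Timestamp` and `Replica` names are hypothetical and message passing is reduced to direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    clock: int    # per-key logical (Lamport) clock, compared first
    node_id: int  # tie-breaker when two writers pick the same clock value


class Replica:
    """One node's cached copy of a single key."""

    def __init__(self, node_id: int):
        self.node_id = node_id
        self.value = None
        self.ts = Timestamp(0, node_id)

    def local_write(self, value):
        """Start a write at this node: bump the clock and tag it with the node id."""
        self.ts = Timestamp(self.ts.clock + 1, self.node_id)
        self.value = value
        return self.ts, value

    def apply_remote(self, ts: Timestamp, value) -> None:
        """Apply a write received from another node only if it is newer."""
        if ts > self.ts:
            self.ts, self.value = ts, value


a, b, c = Replica(0), Replica(1), Replica(2)
write_a = a.local_write("A")  # two concurrent writes, both with clock value 1
write_b = b.local_write("B")
a.apply_remote(*write_b)
b.apply_remote(*write_a)
c.apply_remote(*write_a)      # c sees the two writes in one order...
c.apply_remote(*write_b)
print(a.value, b.value, c.value)  # all three replicas agree on the winner
```

Since every replica compares the same (clock, node id) pairs, all of them pick the same winner regardless of the order in which the writes arrive.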
Protocols in Scale-out ccNUMA
Efficient RDMA implementation
Fully distributed writes via logical clocks
Two (per-key) strongly consistent flavours:
◦ Linearizability (Lin): 2 RTTs
- Broadcast Invalidations*
- Broadcast Updates*
◦ Sequential Consistency (SC): 1 RTT
- Broadcast Updates*
* along with logical (Lamport) clocks
[Figure: a Write( ) under Lin first invalidates all caches, then broadcasts updates; under SC it only broadcasts updates]
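The difference between the two flavours can be sketched as a count of broadcast rounds per write. This is a heavily simplified single-process model, not the protocol from the paper: the `Cache` class, the valid/invalid states and the `write` helper are assumptions, acknowledgements are instantaneous, and failures and concurrent reads are ignored.

```python
VALID, INVALID = "valid", "invalid"


class CacheEntry:
    def __init__(self):
        self.value, self.ts, self.state = None, (0, 0), VALID


class Cache:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.entries = {}

    def entry(self, key) -> CacheEntry:
        return self.entries.setdefault(key, CacheEntry())

    def on_invalidate(self, key, ts) -> str:
        e = self.entry(key)
        if ts > e.ts:  # only a newer write may invalidate the entry
            e.ts, e.state = ts, INVALID
        return "ack"

    def on_update(self, key, ts, value) -> str:
        e = self.entry(key)
        if ts >= e.ts:  # stale updates are ignored
            e.value, e.ts, e.state = value, ts, VALID
        return "ack"


def write(caches, writer: int, key, value, protocol: str) -> int:
    """Execute a write at node `writer`; return the number of broadcast rounds used."""
    me = caches[writer]
    e = me.entry(key)
    ts = (e.ts[0] + 1, writer)  # Lamport clock plus writer id as tie-breaker
    others = [c for c in caches if c is not me]
    rounds = 0
    if protocol == "lin":
        # Round 1 (Lin only): invalidate every cached copy before exposing the new value.
        e.ts, e.state = ts, INVALID
        for c in others:
            c.on_invalidate(key, ts)  # a real system would wait for all acks here
        rounds += 1
    # Final round (Lin and SC): broadcast the new value with its timestamp.
    me.on_update(key, ts, value)
    for c in others:
        c.on_update(key, ts, value)
    rounds += 1
    return rounds


caches = [Cache(i) for i in range(3)]
print(write(caches, 0, "k", "v1", "lin"))  # 2 rounds: invalidations, then updates
print(write(caches, 1, "k", "v2", "sc"))   # 1 round: updates only
```

The extra invalidation round is what the slide counts as the second RTT of Lin; SC skips it and pays a single round per write.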
Evaluation
Hardware setup: 9 nodes
• 56Gb/s FDR InfiniBand NIC
• 64GB DRAM
• 2x 10-core CPUs, 25MB L3
KVS Workload:
• Skew exponent: α = 0.99 (YCSB)
• 250M key-value pairs (Key = 8B, Value = 40B)
Evaluated systems:
• Baseline: NUMA abstraction (state-of-the-art)
• Scale-out ccNUMA
◦ Per-node symmetric cache size: 0.1% of dataset
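The sizes implied by this setup can be worked out directly. The calculation below counts key and value payload only; per-entry metadata, indexes and allocator overhead are not given in the slides and are ignored here as a simplifying assumption.

```python
NUM_PAIRS = 250_000_000          # key-value pairs in the dataset
KEY_BYTES, VALUE_BYTES = 8, 40   # sizes from the workload description
CACHE_DIVISOR = 1000             # 0.1% of the dataset = 1/1000
NUM_NODES = 9

pair_bytes = KEY_BYTES + VALUE_BYTES        # 48 bytes per pair
dataset_bytes = NUM_PAIRS * pair_bytes      # 12,000,000,000 bytes = 12 GB
cached_pairs = NUM_PAIRS // CACHE_DIVISOR   # 250,000 hottest pairs per node
cache_bytes = cached_pairs * pair_bytes     # 12,000,000 bytes = 12 MB per node
shard_bytes = dataset_bytes / NUM_NODES     # payload stored per node if sharded evenly

print(f"dataset payload:      {dataset_bytes / 1e9:.1f} GB")
print(f"symmetric cache/node: {cached_pairs:,} pairs, {cache_bytes / 1e6:.1f} MB")
print(f"shard payload/node:   {shard_bytes / 1e9:.2f} GB")
```

Under these assumptions the symmetric cache holds about 12 MB of payload per node, a small fraction of each node's 64GB of DRAM.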
Performance
Both systems are network bound
Lin: >3x the baseline's throughput at low write ratio, 1.6x at 5% writes
SC: higher throughput at higher write ratios: 2.2x the baseline at 5% writes
Conclusion
Scale-Out ccNUMA:
Distributed cache → best of Caching + NUMA
• Symmetric Caching:
◦ Load balances and filters skew
◦ Throughput scales with number of servers
◦ Less network b/w: most requests are local
• Fully distributed protocols:
◦ Efficient RDMA Implementation
◦ Fully distributed writes
◦ Two strong consistency guarantees
Up to 3x performance of state-of-the-art while guaranteeing per-key Linearizability
Questions?
Backup Slides
Effectiveness of caching
[Figure annotations: ~65%, ~60%]
Read-only (varying skew)
Request breakdown
Network traffic
Read-only performance + Coalescing
Object-size & writes
Object-size & coalescing
Latency vs throughput
Latency is roughly an order of magnitude lower than a typical 1ms QoS target (at maximum throughput)
Break even (+model)
Same performance as the ideal baseline (uniform workload)
Scalability (+model)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Scale-out ccNUMA - Eurosys'18

  • 1. Scale-Out ccNUMA: Exploiting Skew with Strongly Consistent Caching
 Antonios Katsarakis*, Vasilis Gavrielatos*, A. Joshi, N. Oswald, B. Grot, V. Nagarajan
 The University of Edinburgh
 This work was supported by EPSRC, ARM and Microsoft through their PhD Fellowship Programs
 *The first two authors contributed equally to this work
  • 2. Large-scale online services
 Backed by Key-Value Stores (KVS)
 Characteristics:
 • Numerous users
 • Read-mostly workloads (e.g. Facebook 0.2% writes [ATC’13])
 Distributed KVS
  • 7. KVS Performance 101
 In-memory storage: avoid slow disk access
 Partitioning:
 • Shard the dataset across multiple nodes
 • Enables high-capacity in-memory storage
 Remote Direct Memory Access (RDMA): avoid costly TCP/IP processing via
 • Kernel bypass
 • H/w network stack processing
 Good start, but there is a problem…
  • 9. Skewed Access Distribution
 Real-world datasets → mixed popularity
 • Popularity follows a power-law distribution
 • Small number of objects hot; most are not
 Mixed popularity → load imbalance
 • Node(s) storing hottest objects get highly loaded
 • Majority of nodes are under-utilized
 Figure: 128 servers, YCSB, skew exponent = 0.99; the servers holding the hottest objects are overloaded
 Skew-induced load imbalance limits system throughput
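The load imbalance described on this slide can be reproduced with a small model. The sketch below is illustrative only: the dataset size of 100,000 keys and the round-robin placement by popularity rank (a deterministic stand-in for hash partitioning) are assumptions, not details from the talk. Only the 128 servers and the skew exponent of 0.99 come from the slide.

```python
def zipf_weights(num_keys, alpha):
    """Normalised popularity: P(rank r) is proportional to 1 / r**alpha."""
    raw = [1.0 / rank ** alpha for rank in range(1, num_keys + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def per_server_load(weights, num_servers):
    """Place keys round-robin by popularity rank (assumed stand-in for hashing)."""
    load = [0.0] * num_servers
    for index, weight in enumerate(weights):
        load[index % num_servers] += weight
    return load


NUM_KEYS = 100_000    # assumption: toy dataset size
NUM_SERVERS = 128     # from the slide
ALPHA = 0.99          # YCSB skew exponent from the slide

load = per_server_load(zipf_weights(NUM_KEYS, ALPHA), NUM_SERVERS)
fair_share = 1.0 / NUM_SERVERS
imbalance = max(load) / fair_share

print(f"hottest server: {max(load):.2%} of requests")
print(f"fair share:     {fair_share:.2%}")
print(f"imbalance:      {imbalance:.1f}x")
```

In this model the server that stores the single hottest key receives several times its fair share of requests, while most other servers stay below it.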
  • 12. Existing Skew Mitigation Techniques
 Centralized cache [SOCC’11, SOSP’17]
 • Dedicated node resides in front of the KVS, caching hot objects
 ◦ Filters the skew with a small cache
 ◦ Throughput is limited by the single cache
 NUMA abstraction [NSDI’14, SOCC’16]
 • Uniformly distribute requests to all servers
 • Remote objects RDMA’ed from home node
 ◦ Load balances the client requests
 ◦ No locality → excessive network b/w: most requests require remote access
 Can we get the best of both worlds?
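To see why most requests need a remote access under the NUMA abstraction, one can count (receiving server, home server) pairs. This is a back-of-the-envelope sketch that assumes each request lands on a uniformly random server, independent of the key it asks for; `remote_fraction` is a name introduced here for illustration.

```python
from fractions import Fraction


def remote_fraction(num_servers):
    """Fraction of requests whose receiving server is not the key's home node.

    Assumes the receiving server is chosen uniformly at random,
    independently of where the requested key lives.
    """
    pairs = [(recv, home)
             for recv in range(num_servers)
             for home in range(num_servers)]
    remote = sum(1 for recv, home in pairs if recv != home)
    return Fraction(remote, len(pairs))


for n in (9, 128):
    frac = remote_fraction(n)
    print(f"{n} servers: {frac} of requests are remote ({float(frac):.1%})")
```

Under this assumption the remote share is (N - 1) / N, so it grows towards 100% as servers are added.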
  • 14. Caching + NUMA → Scale-Out ccNUMA, via distributed caching
 What are the challenges?
  • 16. Scale-Out ccNUMA Challenges
 Challenge 1: Distributed cache architecture design
 • Which items to cache and where?
 • How to steer traffic for maximum load balance & hit rate?
 Challenge 2: Keeping the caches consistent (i.e. what happens on a write)
 • How to locate replicas?
 • How to execute writes efficiently?
 Solving Challenge 1 with Symmetric Caching
  • 20. Symmetric Caching
 Which items to cache and where?
 • Insight: hottest objects see most hits
 • Idea: all nodes cache hottest objects → Implication: all caches have same content
 • Symmetric caching: small cache with hottest objects at each node
 How to steer traffic for maximum load balance and hit rate?
 • Insight: symmetric caching → all caches have equal (highest) hit rate
 • Idea: uniformly spread requests
 • Requests for hottest objects → served locally on any node
 • Cache misses served as in NUMA abstraction
 Benefits:
 • Load balances and filters the skew
 • Throughput scales with number of servers
 • Less network b/w: most requests are served locally
 Challenge 2: How to keep the caches consistent?
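The read path on this slide can be sketched as a toy model: every node holds an identical cache of the hottest keys, and a miss falls back to the key's home shard as in the NUMA abstraction. The class `ToyCluster`, the modulo-based home mapping, the integer keys and the 1,000,000-key dataset are all assumptions made for illustration; the real system uses RDMA for remote accesses, which is not modelled here.

```python
def zipf_hit_rate(num_keys, cached_keys, alpha):
    """Share of requests that hit a cache holding the `cached_keys` hottest keys."""
    weights = [1.0 / rank ** alpha for rank in range(1, num_keys + 1)]
    return sum(weights[:cached_keys]) / sum(weights)


class ToyCluster:
    """Every node holds its own shard plus the same (symmetric) hot-key cache."""

    def __init__(self, num_nodes, store, hot_keys):
        self.num_nodes = num_nodes
        # Home shards: each key lives on exactly one node.
        self.shards = [{} for _ in range(num_nodes)]
        for key, value in store.items():
            self.shards[self.home(key)][key] = value
        # Symmetric caches: identical content on every node.
        self.caches = [{key: store[key] for key in hot_keys}
                       for _ in range(num_nodes)]

    def home(self, key):
        # Assumption: integer keys, modulo placement instead of a real hash.
        return key % self.num_nodes

    def get(self, receiving_node, key):
        """Return (value, how the request was served)."""
        cache = self.caches[receiving_node]
        if key in cache:
            return cache[key], "cache"
        home = self.home(key)
        served = "local shard" if home == receiving_node else "remote"
        return self.shards[home][key], served


cluster = ToyCluster(num_nodes=3,
                     store={k: f"v{k}" for k in range(30)},
                     hot_keys=[0, 1, 2])
print([cluster.get(node, 0) for node in range(3)])  # hot key: a cache hit on any node
print(cluster.get(0, 4))                            # cold key, home is node 1
print(cluster.get(1, 4))                            # cold key served from its home shard
print(f"hit rate: {zipf_hit_rate(1_000_000, 1_000, 0.99):.2f}")
```

Because the hit rate depends only on the cached set, which is the same everywhere, any node is an equally good target for any request, which is what allows requests to be spread uniformly.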
  • 24. Keeping the caches consistent
 Requirement: on a write, inform all replicas of the new value
 How to locate replicas? Easy with Symmetric Caching! If an object is in the local cache → all nodes cache it
 How to execute writes efficiently?
 • Typical protocols:
 ◦ Ensure global write ordering via a primary
 ◦ Primary executes all writes → hot-spot
 • Fully distributed writes:
 ◦ Can guarantee ordering via logical clocks
 ◦ Avoid hot-spots
 ◦ Evenly spread write propagation costs
 Figure: primary executes all writes vs. fully distributed writes
  • 29. Protocols in Scale-out ccNUMA
 Efficient RDMA implementation
 Fully distributed writes via logical clocks
 Two (per-key) strongly consistent flavours:
 ◦ Linearizability (Lin): 2 RTTs: Broadcast Invalidations*, Broadcast Updates*
 ◦ Sequential Consistency (SC): 1 RTT: Broadcast Updates*
 * along with logical (Lamport) clocks
 Figure: on a write, Lin invalidates all caches and broadcasts updates; SC broadcasts updates
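The role of the logical clocks can be illustrated with a toy update-broadcast model. This is a sketch under stated assumptions, not the authors' protocol: it covers only timestamped updates (no invalidations, acknowledgements or RDMA), and the `Replica` class, the message tuple layout and the (version, writer id) timestamp are names and choices introduced here. The point it shows is that replicas agree on a single write order without a primary, because timestamps are compared the same way everywhere.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Entry:
    value: str
    ts: tuple = (0, 0)  # (version, writer_id), compared lexicographically


class Replica:
    def __init__(self, node_id):
        self.node_id = node_id
        self.cache = {}

    def local_write(self, key, value):
        """Apply a write locally and return the update message to broadcast."""
        current = self.cache.get(key, Entry(value=""))
        ts = (current.ts[0] + 1, self.node_id)  # bump version, tie-break by node id
        self.cache[key] = Entry(value, ts)
        return (key, value, ts)

    def apply_update(self, msg):
        """Apply a remote update only if its timestamp is newer than ours."""
        key, value, ts = msg
        current = self.cache.get(key)
        if current is None or ts > current.ts:
            self.cache[key] = Entry(value, ts)


def run(delivery_order):
    """Two concurrent writes to the same key, delivered in the given order."""
    replicas = [Replica(i) for i in range(3)]
    msgs = {0: replicas[0].local_write("k", "A"),
            1: replicas[1].local_write("k", "B")}
    for replica in replicas:
        for sender in delivery_order:
            if sender != replica.node_id:
                replica.apply_update(msgs[sender])
    return [replica.cache["k"].value for replica in replicas]


results = [run(order) for order in permutations([0, 1])]
print(results)
```

Both writes carry version 1, so the node id breaks the tie and every replica ends up with the value written by node 1, whichever message arrives first.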
  • 30. Evaluation
 Hardware setup: 9 nodes
 • 56Gb/s FDR InfiniBand NIC
 • 64GB DRAM
 • 2x 10-core CPUs, 25MB L3
 KVS workload:
 • Skew exponent: α = 0.99 (YCSB)
 • 250 M key-value pairs (Key = 8B, Value = 40B)
 Evaluated systems:
 • Baseline: NUMA abstraction (state-of-the-art)
 • Scale-out ccNUMA
 • Per-node symmetric cache size: 0.1% of dataset
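The workload figures above imply the following sizes. This is plain arithmetic on the numbers from the slide; it counts key and value bytes only and ignores any index, metadata or allocator overhead, which the slide does not specify.

```python
NUM_PAIRS = 250_000_000   # key-value pairs, from the slide
KEY_BYTES = 8
VALUE_BYTES = 40
NUM_NODES = 9
CACHE_DIVISOR = 1000      # 0.1% of the dataset is 1/1000 of the pairs

pair_bytes = KEY_BYTES + VALUE_BYTES
dataset_bytes = NUM_PAIRS * pair_bytes
cached_pairs = NUM_PAIRS // CACHE_DIVISOR
cache_bytes = cached_pairs * pair_bytes
shard_bytes = dataset_bytes / NUM_NODES

print(f"dataset payload:      {dataset_bytes / 1e9:.1f} GB")
print(f"per-node shard:       {shard_bytes / 1e9:.2f} GB")
print(f"cached pairs per node: {cached_pairs:,}")
print(f"cache payload:        {cache_bytes / 1e6:.1f} MB")
```

The per-node cache payload is small compared with both the per-node shard and the 64GB of DRAM listed in the hardware setup.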
  • 34. Performance
 Both systems are network bound
 Lin: >3x throughput at low write ratio, 1.6x at 5% writes
 SC: higher throughput at higher write ratios: 2.2x at 5% writes
  • 35. Conclusion
 Scale-Out ccNUMA: distributed cache → best of Caching + NUMA
 • Symmetric Caching:
 ◦ Load balances and filters skew
 ◦ Throughput scales with number of servers
 ◦ Less network b/w: most requests are local
 • Fully distributed protocols:
 ◦ Efficient RDMA implementation
 ◦ Fully distributed writes
 ◦ Two strong consistency guarantees
 Up to 3x performance of state-of-the-art while guaranteeing per-key Linearizability
  • 42. Read-only performance + Coalescing
  • 45. Latency vs. throughput (xPut): latency is roughly an order of magnitude lower than the typical 1ms QoS (at max xPut)
  • 46. Break even (+model): same performance as the ideal baseline (uniform workload)