Achieving strong consistency in distributed systems is well known to cost availability and latency. This talk asks whether distributed database systems that guarantee ACID transactions must also trade throughput for consistency.
2. Introduction
Distributed database systems:
■ CAP theorem: you must trade availability for consistency
■ PACELC theorem: you must trade latency for consistency
■ But do you have to trade throughput for consistency???
3. Background
Widget ID  Price  In_Stock
1          99     4
2          325    12
3          79     1
4          199    7
Customer id  Store_Credit
1            0
2            100
3            50
4            0
5            10
6            500
7            0
8            25
Customer 2 buys widget 3 with store credit:
  W = Widgets.READ(3)
  C = Customers.READ(2)
  IF (W.In_Stock < 1)
    ABORT
  IF (C.Store_Credit < W.price)
    ABORT
  W.In_Stock -= 1
  C.Store_Credit -= W.price
4. Background
Widget ID  Price  In_Stock
1          99     4
2          325    12
3          79     0
4          199    7
Customer id  Store_Credit
1            0
2            100
3            50
4            0
5            10
6            500
7            0
8            25
Customer 2 buys widget 3 with store credit:
  W = Widgets.READ(3)
  C = Customers.READ(2)
  IF (W.In_Stock < 1)
    ABORT
  IF (C.Store_Credit < W.price)
    ABORT
  W.In_Stock -= 1
  C.Store_Credit -= W.price
5. Background
Widget ID  Price  In_Stock
1          99     4
2          325    12
3          79     0
4          199    7
Customer id  Store_Credit
1            0
2            21
3            50
4            0
5            10
6            500
7            0
8            25
Customer 2 buys widget 3 with store credit:
  W = Widgets.READ(3)
  C = Customers.READ(2)
  IF (W.In_Stock < 1)
    ABORT
  IF (C.Store_Credit < W.price)
    ABORT
  W.In_Stock -= 1
  C.Store_Credit -= W.price
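The purchase transaction on slides 3 to 5 can be sketched as runnable Python. This is a minimal single-machine sketch: the dictionary layout, the `Abort` exception, and the function name `buy_with_store_credit` are assumptions made for illustration, while the table contents and the two checks come from the slides.

```python
# Table contents copied from slide 3; the dict layout is an assumption.
widgets = {
    1: {"price": 99, "in_stock": 4},
    2: {"price": 325, "in_stock": 12},
    3: {"price": 79, "in_stock": 1},
    4: {"price": 199, "in_stock": 7},
}
store_credit = {1: 0, 2: 100, 3: 50, 4: 0, 5: 10, 6: 500, 7: 0, 8: 25}


class Abort(Exception):
    """Raised when the transaction aborts; no state has been changed."""


def buy_with_store_credit(customer_id, widget_id):
    w = widgets[widget_id]                # W = Widgets.READ(...)
    credit = store_credit[customer_id]    # C = Customers.READ(...)
    if w["in_stock"] < 1:
        raise Abort("out of stock")
    if credit < w["price"]:
        raise Abort("insufficient store credit")
    # Both checks passed, so both writes are applied together.
    w["in_stock"] -= 1
    store_credit[customer_id] = credit - w["price"]


buy_with_store_credit(customer_id=2, widget_id=3)
print(widgets[3]["in_stock"], store_credit[2])  # 0 21
```

The final values match slide 5: widget 3 is out of stock and customer 2 has 21 in store credit. Both checks run before either write, so an aborted purchase leaves the tables unchanged.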
6. Distributed databases are hard
Widget ID  Price  In_Stock
1          99     4
Customer id  Store_Credit
1            0
5            10

Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            100
6            500

Widget ID  Price  In_Stock
3          79     1
Customer id  Store_Credit
3            50
7            0

Widget ID  Price  In_Stock
4          199    7
Customer id  Store_Credit
4            0
8            25
7. Distributed databases are hard
Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            100
6            500

Widget ID  Price  In_Stock
3          79     1
Customer id  Store_Credit
3            50
7            0

Customer 2 buys widget 3 with store credit:
  W = Widgets.READ(3)
  C = Customers.READ(2)
  IF (W.In_Stock < 1)
    ABORT
  IF (C.Store_Credit < W.price)
    ABORT
  W.In_Stock -= 1
  C.Store_Credit -= W.price
W.price = 79
8. Distributed databases are hard
Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            21
6            500

Widget ID  Price  In_Stock
3          79     0
Customer id  Store_Credit
3            50
7            0

prepare
ready
commit
done
No conflicting transactions can run!
W.price = 79
Customer 2 buys widget 3 with store credit:
  W = Widgets.READ(3)
  C = Customers.READ(2)
  IF (W.In_Stock < 1)
    ABORT
  IF (C.Store_Credit < W.price)
    ABORT
  W.In_Stock -= 1
  C.Store_Credit -= W.price
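The prepare, ready, commit, done exchange on this slide can be sketched as a toy two-phase commit. This is a rough illustration only: the `Partition` class, its methods, and the key names are hypothetical, locking is reduced to a set of keys, and read locks and failure handling are left out.

```python
class Partition:
    """Hypothetical in-memory partition; not the API of any real database."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.locked = set()   # keys locked by a prepared transaction
        self.staged = {}      # writes waiting for the commit message

    def prepare(self, writes):
        """Phase 1: lock the keys and vote. True means 'ready'."""
        if any(key in self.locked for key in writes):
            return False      # a conflicting transaction holds the lock
        self.locked.update(writes)
        self.staged = dict(writes)
        return True

    def commit(self):
        """Phase 2: apply the staged writes, release locks, reply 'done'."""
        self.rows.update(self.staged)
        self.locked.difference_update(self.staged)
        self.staged = {}
        return "done"


widget_part = Partition({"widget3.in_stock": 1})
customer_part = Partition({"customer2.credit": 100})

# prepare -> ready: both partitions lock their rows and vote.
votes = [
    widget_part.prepare({"widget3.in_stock": 0}),
    customer_part.prepare({"customer2.credit": 21}),
]
# Between 'ready' and 'commit', a conflicting purchase of widget 3 is refused.
conflict_vote = widget_part.prepare({"widget3.in_stock": 0})
# commit -> done: the writes become visible and the locks are released.
acks = []
if all(votes):
    acks = [widget_part.commit(), customer_part.commit()]
print(votes, conflict_vote, acks)
```

The refused `prepare` in the middle is the point of the slide: for the whole duration of the protocol, no conflicting transaction can make progress on the locked rows.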
9. Replication exacerbates the problem
Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            21
6            500

Widget ID  Price  In_Stock
3          79     0
Customer id  Store_Credit
3            50
7            0

Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            100
6            500

Widget ID  Price  In_Stock
3          79     1
Customer id  Store_Credit
3            50
7            0

replicate
replicate
11. Replication exacerbates the problem
Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            21
6            500

Widget ID  Price  In_Stock
3          79     0
Customer id  Store_Credit
3            50
7            0

Widget ID  Price  In_Stock
2          325    12
Customer id  Store_Credit
2            21
6            500

Widget ID  Price  In_Stock
3          79     0
Customer id  Store_Credit
3            50
7            0

replicate
replicate
done
done
No conflicting transactions can run!
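The replicate and done exchange can be sketched the same way. In this minimal sketch, the `Primary` and `Replica` classes and the event strings are hypothetical, and the network round trip is reduced to a method call. What it shows is that the lock on the written row is released only after every replica has acknowledged.

```python
class Replica:
    """Hypothetical replica that applies a write and acknowledges it."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def replicate(self, key, value):
        self.rows[key] = value
        return "done"


class Primary:
    """Hypothetical primary that replicates synchronously under a lock."""

    def __init__(self, rows, replicas):
        self.rows = dict(rows)
        self.replicas = replicas
        self.locked = set()
        self.events = []      # ordered trace of what happened

    def write(self, key, value):
        self.locked.add(key)
        self.events.append(f"lock {key}")
        self.rows[key] = value
        # Synchronous replication: wait for every replica before unlocking.
        for i, replica in enumerate(self.replicas):
            ack = replica.replicate(key, value)
            self.events.append(f"replica {i}: {ack}")
        self.locked.discard(key)
        self.events.append(f"unlock {key}")


replica = Replica({"widget3.in_stock": 1})
primary = Primary({"widget3.in_stock": 1}, [replica])
primary.write("widget3.in_stock", 0)
print(primary.events)
```

In a real deployment the call to `replicate` is a network round trip, possibly across regions, so the window in which conflicting transactions are blocked grows with the replication latency.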
12. Summary of performance issues
■ Commit protocol like 2PC
  ■ Needed for A, C, and D of ACID
  ■ BUT:
    ■ Latency of protocol
    ■ Throughput reduction
■ Synchronous replication
  ■ Needed for C of CAP
  ■ BUT:
    ■ Latency of replication
    ■ Throughput reduction
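A back-of-the-envelope calculation shows why holding locks longer reduces throughput. If conflicting transactions on one hot record must run one at a time, and each holds its lock for t milliseconds, then at most 1000 / t of them can commit per second. The lock hold times below are illustrative assumptions, not measurements from the talk.

```python
def max_conflicting_tps(lock_hold_ms):
    """Upper bound on serialized conflicting transactions per second."""
    return 1000.0 / lock_hold_ms


# All durations are assumed values chosen only to show the trend.
local_ms = 0.5             # execute on a single machine
two_pc_ms = 2.0            # extra lock time for a commit protocol
replication_ms = 50.0      # extra lock time for synchronous replication

print(max_conflicting_tps(local_ms))                                         # 2000.0
print(max_conflicting_tps(local_ms + two_pc_ms))                             # 400.0
print(round(max_conflicting_tps(local_ms + two_pc_ms + replication_ms), 1))  # 19.0
```

The bound applies only to transactions that conflict on the same data. Transactions that touch disjoint records can still run in parallel.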
13. PACETLC?
■ If we have to trade throughput for consistency, it should be PACETLC, not PACELC!
■ But is the throughput tradeoff fundamental?
14. Throughput tradeoff is not fundamental!
■ All throughput problems are caused by preventing conflicting transactions from running. Therefore:
  ■ If we don’t guarantee strong isolation, the problem goes away
  ■ If there are no conflicting transactions, the problem never exists in the first place
■ Deterministic systems eliminate the throughput tradeoff entirely
  ■ But that’s a whole other talk!
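The slides do not describe how deterministic systems work, so the following is only a rough illustration of one common reading of the idea, not the speaker's design. It assumes every replica receives the same ordered log of transaction requests and runs the same deterministic logic, so that replicas reach the same state without shipping writes to each other at commit time. The names `buy`, `fresh_state`, and `run_replica` are hypothetical.

```python
def fresh_state():
    """Starting state for one partition; values taken from the slides."""
    return {
        "widgets": {3: {"price": 79, "in_stock": 1}},
        "credit": {2: 100, 3: 50, 6: 500},
    }


def buy(state, customer_id, widget_id):
    """Deterministic logic: the same input state gives the same outcome."""
    widget = state["widgets"][widget_id]
    credit = state["credit"][customer_id]
    if widget["in_stock"] < 1 or credit < widget["price"]:
        return "abort"
    widget["in_stock"] -= 1
    state["credit"][customer_id] = credit - widget["price"]
    return "commit"


def run_replica(log):
    """Replay the agreed log in order on a private copy of the data."""
    state = fresh_state()
    outcomes = [buy(state, customer, widget) for customer, widget in log]
    return state, outcomes


# Assumed: the order of (customer, widget) requests is agreed up front.
log = [(2, 3), (3, 3), (6, 3)]
state_a, outcomes_a = run_replica(log)
state_b, outcomes_b = run_replica(log)
print(outcomes_a, state_a == state_b)
```

Both replicas commit the first purchase and abort the other two, and they end in identical states without exchanging any messages during execution.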
15. Thank You
Stay in Touch
Daniel Abadi
abadi@umd.edu
@daniel_abadi
www.linkedin.com/in/databaseprof