Shivji Kumar Jha
Navigating Transactions
ACID Complexity in Modern Databases
Data Platforms & OSS
• Databases, streams, app architecture
• Loves open source software (OSS)
• And communities (meetups)
• Regular speaker (talk #23)
• Staff Engineer at Nutanix
Shivji Kumar Jha
https://www.linkedin.com/in/shivjijha
https://youtube.com/@ShivjiKumarJha
https://t.me/theDbShots
Contents
• Transactions & ACID
• Implementing Transactions
• Distributed transactions
• Cloud scale databases
Transactions
Historical Perspective
Almost all relational and even some non-relational databases support transactions
Most of them follow System R, the first SQL database, built by IBM in 1975
The general ideas have remained the same for over 45 years
MySQL, Postgres, Oracle and SQL Server have similar transactions
NoSQL Camp
• Transactions are the antithesis of scalability
• Large-scale systems have to abandon transactions
• Go for good performance and high availability
SQL Camp
• Transactions are essential requirements for serious applications with valuable data
Both are exaggerated!
Trade-offs!
ACID: Atomicity, Consistency, Isolation, Durability
Coined in 1983 by Theo Härder and Andreas Reuter
The slippery slope!
In practice, one database’s implementation of ACID does not equal another’s
For example, there is a lot of ambiguity in the meaning of “isolation”
The devil is in the details!
When a system claims ACID, it is unclear what guarantees it actually provides
ACID has unfortunately become a marketing term
How about
ACID: Atomicity
• All or nothing!
• Ability to undo (abort)
• Easy to retry transactions
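A minimal sketch of all-or-nothing behaviour, using Python's built-in sqlite3 module as a stand-in for any transactional database. The accounts table, the transfer helper and the balances are assumptions made up for this illustration; they are not from the talk.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the rollback undoes both updates above.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


# Hypothetical demo data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts"))
```

A failed transfer leaves both rows untouched, which is what makes a later retry of the whole transaction straightforward to reason about.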
Are retries really safe with transactions?
• The network fails while the server is acknowledging the commit to the client. Retrying means executing twice. Idempotency or de-duplication is required in the app!
• What if the error is due to overload?
• Is the error transient or permanent?
• What if the transaction had side effects? Send the email again?
• What if the client fails while retrying? Is the data lost?
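One common way to make a retry safe is to attach a client-generated request id and let the database de-duplicate on it. The sketch below assumes a hypothetical payments table keyed by request_id and uses sqlite3 only for illustration; the talk does not prescribe this design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (request_id TEXT PRIMARY KEY, account TEXT, amount INTEGER)"
)


def apply_payment(conn, request_id, account, amount):
    """Apply a payment at most once per request_id.

    Returns True if this call applied the payment, False if it was a duplicate.
    """
    with conn:
        cursor = conn.execute(
            "INSERT OR IGNORE INTO payments (request_id, account, amount) "
            "VALUES (?, ?, ?)",
            (request_id, account, amount),
        )
        # rowcount is 1 when the row was inserted, 0 when the key already existed.
        return cursor.rowcount == 1
```

If the acknowledgement is lost and the client sends the same request_id again, the second call becomes a no-op instead of a double charge. Side effects outside the database, such as sending an email, still need their own de-duplication.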
ACID: Consistency
• Certain statements about the data (invariants) are always true
• The database alone can’t promise this
• Application-specific guarantees
• Most weakly defined property in ACID
ACID: Isolation
• Many clients access data at the same time
• When they access the same records, you can run into concurrency problems (race conditions)
• The database guarantees that concurrently executing transactions are isolated from each other
• Textbook definition: serializability, the same result as if the transactions had run serially
• In practice, serializability is rarely used because of its performance penalty
• What you often actually get is snapshot isolation. A much weaker guarantee!
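To see why snapshot isolation is weaker than serializability, here is a toy in-memory model of write skew. SnapshotStore, Transaction and the on-call doctors scenario are hypothetical names chosen for this sketch; real engines implement snapshots very differently.

```python
class SnapshotStore:
    """Toy multi-version store: each transaction reads from a private snapshot."""

    def __init__(self, data):
        self.data = dict(data)
        self.versions = {key: 0 for key in data}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.snapshot = dict(store.data)  # consistent view taken at start
        self.seen_versions = dict(store.versions)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Snapshot isolation only detects write-write conflicts on the same key.
        for key in self.writes:
            if self.store.versions[key] != self.seen_versions[key]:
                raise RuntimeError(f"write-write conflict on {key}")
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.versions[key] += 1


def go_off_call(txn, doctor):
    """Application rule: a doctor may leave only if another doctor stays on call."""
    on_call = [name for name in ("alice", "bob") if txn.read(name)]
    if len(on_call) >= 2:
        txn.write(doctor, False)


store = SnapshotStore({"alice": True, "bob": True})
t1, t2 = store.begin(), store.begin()
go_off_call(t1, "alice")
go_off_call(t2, "bob")
t1.commit()
t2.commit()  # also succeeds: the two transactions wrote different keys
```

Both transactions commit and nobody is left on call. Any serial order would have let only one doctor leave, so the invariant breaks under snapshot isolation but not under serializability.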
ACID: Durability
• Once committed, written data will not be forgotten
• Even if there is a hardware fault or the database crashes
• Single node: write to HDD or SSD
• Databases usually use a write-ahead log & dirty cached pages
• Replicated databases: data is written to multiple nodes, and the commit waits until that happens!
• No such thing as perfect durability
Picture: https://www.alphr.com/tell-regular-or-ssd-hard-drive/
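A rough sketch of the write-ahead log idea on a single node: append the change, force it to disk with fsync, and only then acknowledge the commit. The WriteAheadLog class, the JSON-lines record format and the replay function are assumptions for illustration, not how any particular database lays out its log.

```python
import json
import os


class WriteAheadLog:
    def __init__(self, path):
        self.file = open(path, "ab")

    def append(self, key, value):
        """Return only after the record has been handed to stable storage."""
        line = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        self.file.write(line)
        self.file.flush()             # push Python's buffer to the OS
        os.fsync(self.file.fileno())  # ask the OS to push it to the device

    def close(self):
        self.file.close()


def replay(path):
    """Rebuild in-memory state after a crash by re-reading the log."""
    state = {}
    with open(path, "rb") as log:
        for line in log:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # torn final record: it was never acknowledged
            state[record["key"]] = record["value"]
    return state
```

Even with fsync, durability depends on the disk and firmware honouring the request, which is one reason there is no such thing as perfect durability.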
Implementing Transactions
Balance between two problems
• Improve efficiency: allow transactions to execute concurrently
• Preserve correctness: ensure concurrently executing transactions preserve ACID properties
Concurrently executing transactions can cause read & write anomalies
• Isolation levels: presence or absence of read & write anomalies
• Concurrency control: how transactions are scheduled & executed
Single Node Transactions
Storage Engine’s responsibility
Distributed Transactions
What if multiple nodes are involved in a transaction?
Send a commit request to each node & independently commit?
• Some SUCCESS, some FAILURES!
• Inconsistency between nodes
• Abort if some FAILURES?
• But you can’t go back on a committed promise!
• How about a compensating transaction to offset changes?
• Where does this responsibility sit? DB or App?
• OR commit only if everyone promises to commit?
Atomic Commitment
• A transaction will not commit if even one of the participants votes against it.
• Failed processes have to reach the same conclusion as the rest of the cohort.
• Does not work in the case of Byzantine failures: processes are assumed not to lie!
• Cohorts cannot choose, influence or change the proposed transaction; they can only vote on whether or not they are willing to execute it.
• Executed by a transaction manager or coordinator
• Examples: MySQL, Postgres, DynamoDB, Spanner, and Kafka for producer and consumer interactions.
Two Phase Commit
• Coordinator (or transaction manager): a library within the database server
• When the database is ready to commit, the coordinator starts phase 1
• Coordinator sends a prepare request to each node: are you able to commit?
• Coordinator tracks the response from each participant
• If all participants vote YES, coordinator sends a COMMIT request in phase 2
• If any participant says NO, coordinator sends ABORT to all nodes in phase 2
• Coordinator must write its decision to its log on disk to handle crashes
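The steps above can be sketched as a small in-process simulation. Participant, Coordinator and the in-memory log list are hypothetical stand-ins; a real coordinator talks to nodes over the network and writes its decision to a durable log.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote YES (True) only if the commit can definitely be honoured."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []  # stands in for the coordinator's log on disk

    def run(self, txn_id):
        # Phase 1: collect a vote from every participant.
        votes = [p.prepare() for p in self.participants]
        decision = "commit" if all(votes) else "abort"
        # Record the decision before telling anyone about it.
        self.log.append((txn_id, decision))
        # Phase 2: broadcast the decision.
        for p in self.participants:
            if decision == "commit":
                p.commit()
            else:
                p.abort()
        return decision
```

A single NO vote is enough to abort the transaction everywhere, which is the all-or-nothing guarantee extended across nodes.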
Coordinator Failures (figure: coordinator crash during two phase commit)
• Two points of no return:
• 1. If a participant says YES, it has to commit if the coordinator asks it to.
• 2. Once the coordinator decides, the decision is irrevocable.
• If a participant said YES and didn’t hear back from the coordinator, it has to wait, potentially forever!
• Coordinator must persist its decision before sending it to participants
• If no COMMIT record was persisted by the coordinator, abort on recovery
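The recovery rule in the last bullet can be written down in a few lines. This is a sketch under the assumption that the coordinator's log is a list of (transaction id, decision) pairs; the function name and log shape are made up for illustration.

```python
def recover(coordinator_log, in_doubt_txns):
    """Decide the fate of in-doubt transactions after a coordinator restart.

    A transaction commits only if a commit record was persisted before the crash;
    anything else, including transactions with no record at all, is aborted.
    """
    committed = {txn for txn, decision in coordinator_log if decision == "commit"}
    return {
        txn: ("commit" if txn in committed else "abort") for txn in in_doubt_txns
    }
```

Until the coordinator comes back and runs this step, participants that voted YES cannot resolve the transaction on their own, which is why 2PC is described as blocking.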
Three Phase Commit?
• 2PC is a blocking protocol
• 3PC is non-blocking
• Difficult to implement in practice. Not well adopted!
• 2PC is quite well adopted in spite of its known problems.
Distributed Transactions in Practice
• Carries a heavy performance penalty
• Additional fsync required for crash recovery
• Additional network round trips
• Distributed transactions in MySQL are reported to be over 10 times slower than single-node transactions.
Other Choices?
• Single leader? Everyone else executes the same transactions in the same order!
• Manually selected leader?
• Automatic selection of leader?
• Offloaded the same problem to a different time. Well, less frequently!
• Use consensus algorithms: Zookeeper (ZAB), etcd (Raft), Paxos?
• Global transaction order by reaching consensus on a sequencer? Calvin (FaunaDB)
• 2PC over consensus groups per shard? Enter Google’s Spanner!
Transactions in Modern Databases
Amazon Aurora
Quick Overview
• Fully managed relational database (part of Amazon RDS)
• Service-oriented design
• Separation of compute and storage
• Multi-tenant scale-out storage
• Segmented redo log
• Claimed throughput: up to 5x MySQL, 3x Postgres
Aurora Functional Separation
DB instance
• Query processing
• Access methods
• Transactions
• Locking
• Page cache
• Undo management
Storage fleet
• Redo logging
• Materialisation of data blocks
• Garbage collection
• Backup / restore
Aurora: Quorum Style distributed coordination
Not 2PC, to avoid network chatter!
Read set & write set must overlap on at least one copy
Write set must overlap with previous write sets
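The two overlap rules above translate into simple arithmetic on quorum sizes. The function below is a sketch with assumed parameter names; the 6/3/4 configuration mentioned after it comes from the Aurora design paper listed in the references, not from this slide.

```python
def is_valid_quorum(copies, read_quorum, write_quorum):
    """Check the two overlap rules for a quorum system with `copies` replicas."""
    # Any read set must intersect any write set, so reads see the latest write.
    reads_overlap_writes = read_quorum + write_quorum > copies
    # Any two write sets must intersect, so conflicting writes are detected.
    writes_overlap_writes = 2 * write_quorum > copies
    return reads_overlap_writes and writes_overlap_writes
```

With six copies, a read quorum of three and a write quorum of four, both rules hold. Dropping the write quorum to three would let two disjoint write sets succeed at once.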
Amazon DynamoDB
Dynamo DB
Highly available, weaker consistency
• Always “on” key-value store, single-key operations
• Sacrifices consistency under failure scenarios. Eventually consistent!
• Extensive use of object versioning; branching is OK, resolved on read
• Example: shopping cart; divergent carts are merged. Can’t lose a write, but deleted items can reappear
• Consistency among replicas using a quorum-like technique (sloppy quorum)
• Gossip-based distributed failure detection; hinted handoff for temporary failures
Dynamo DB
Sloppy Quorum & hinted handoff
• Each data item is replicated at N hosts (N distinct physical nodes)
• The list of nodes storing a key is called its preference list
• R: minimum number of nodes in a successful read operation
• W: minimum number of nodes in a successful write operation
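A simplified sketch of how a sloppy quorum write might pick its targets. It assumes the ring is given as a list of node names in clockwise order starting at the key's position, and that a set of currently unreachable nodes is known; both are simplifications invented for this example.

```python
def sloppy_write(ring, n, w, down):
    """Choose write targets for a key and report whether the write can succeed.

    Returns (targets, ok) where each target is (node, hinted_for). hinted_for is
    None for a preferred node, or the name of the unreachable node that the
    stand-in should hand the data back to later.
    """
    preferred = ring[:n]  # the key's preference list
    targets = [(node, None) for node in preferred if node not in down]
    missing = [node for node in preferred if node in down]
    # Walk further around the ring to find healthy stand-ins.
    fallbacks = (node for node in ring[n:] if node not in down)
    for intended in missing:
        stand_in = next(fallbacks, None)
        if stand_in is None:
            break
        targets.append((stand_in, intended))  # hinted handoff
    return targets, len(targets) >= w
```

With ring A to E, N = 3, W = 2 and node A unreachable, the write goes to B, C and a stand-in D carrying a hint for A, so the write stays available instead of failing.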
DynamoDB: Transaction coordinator (figure labels: customers, inventory, orders)
Source: https://www.infoq.com/articles/amazon-dynamodb-transactions/
DynamoDB: Transaction coordinator failure/recovery (figure)
Source: https://www.infoq.com/articles/amazon-dynamodb-transactions/
Google Spanner
Spanner
Google’s globally distributed SQL* database
• Also the inspiration for CockroachDB and YugabyteDB
• Tables with rows, columns and versioned values
• Supports transactions and a SQL-based query language
• Replication configs are dynamically controlled at fine grain by apps: which data centres to use, how far data is from users (read latency), how far replicas are from each other (write latency), and how many replicas
• Clients automatically fail over between replicas
• Data is dynamically moved between data centres to balance resources
Span server stack & transactions
References
Books
• Designing Data-Intensive Applications (Chapters 7 & 9) by Martin Kleppmann
• Database Internals (Chapters 5 & 13) by Alex Petrov
Whitepapers
• Amazon Aurora: Design considerations for high throughput cloud native relational databases
• Amazon Aurora: On avoiding distributed consensus….
• Dynamo: Amazon’s highly available key value store
• Distributed Transactions at scale in Amazon DynamoDB
• Spanner: Google’s Globally Distributed Database
Blog
• https://www.infoq.com/articles/amazon-dynamodb-transactions/
Questions?
Staying in touch:
https://www.linkedin.com/in/shivjijha
https://youtube.com/@ShivjiKumarJha
https://www.slideshare.net/shiv4289/presentations
https://t.me/theDbShots
53

Mais conteúdo relacionado

Mais procurados

Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
Mydbops
 
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Amazon Web Services
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
confluent
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
Omid Vahdaty
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
Jean-François Gagné
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
SRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon AuroraSRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon Aurora
Amazon Web Services
 
Scylla core dump debugging tools
Scylla core dump debugging toolsScylla core dump debugging tools
Scylla core dump debugging tools
Tomasz Grabiec
 
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBHistogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Mydbops
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
Our answer to Uber
Our answer to UberOur answer to Uber
Our answer to Uber
Alexander Korotkov
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
Alexey Lesovsky
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
Dmitry Tolpeko
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando Patroni
Zalando Technology
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Viswanath J
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
MariaDB plc
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
jhao niu
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
Sage Weil
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1
NeoClova
 

Mais procurados (20)

Galera cluster for high availability
Galera cluster for high availability Galera cluster for high availability
Galera cluster for high availability
 
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
Deep Dive on Amazon Aurora PostgreSQL Performance Tuning (DAT428-R1) - AWS re...
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Introduction to apache zoo keeper
Introduction to apache zoo keeper Introduction to apache zoo keeper
Introduction to apache zoo keeper
 
The Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication TutorialThe Full MySQL and MariaDB Parallel Replication Tutorial
The Full MySQL and MariaDB Parallel Replication Tutorial
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
SRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon AuroraSRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon Aurora
 
Scylla core dump debugging tools
Scylla core dump debugging toolsScylla core dump debugging tools
Scylla core dump debugging tools
 
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBHistogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
Our answer to Uber
Our answer to UberOur answer to Uber
Our answer to Uber
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
Apache Kafka - Messaging System Overview
Apache Kafka - Messaging System OverviewApache Kafka - Messaging System Overview
Apache Kafka - Messaging System Overview
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
High Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando PatroniHigh Availability PostgreSQL with Zalando Patroni
High Availability PostgreSQL with Zalando Patroni
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
M|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScaleM|18 Architectural Overview: MariaDB MaxScale
M|18 Architectural Overview: MariaDB MaxScale
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Maxscale 소개 1.1.1
Maxscale 소개 1.1.1Maxscale 소개 1.1.1
Maxscale 소개 1.1.1
 

Semelhante a Navigating Transactions: ACID Complexity in Modern Databases

Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
Don Demcsak
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
LinkedIn
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
elliando dias
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
IDERA Software
 
Hbase hive pig
Hbase hive pigHbase hive pig
Hbase hive pig
Xuhong Zhang
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
Andriy Zabavskyy
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
Gwen (Chen) Shapira
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
Ramsay Key
 
Webinar: How MongoDB is Used to Manage Reference Data - May 2014
Webinar: How MongoDB is Used to Manage Reference Data - May 2014Webinar: How MongoDB is Used to Manage Reference Data - May 2014
Webinar: How MongoDB is Used to Manage Reference Data - May 2014
MongoDB
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
PivotalOpenSourceHub
 
Robust ha solutions with proxysql
Robust ha solutions with proxysqlRobust ha solutions with proxysql
Robust ha solutions with proxysql
Marco Tusa
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David Martínez Rego
 
Hbase hivepig
Hbase hivepigHbase hivepig
Hbase hivepig
Radha Krishna
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
Don Demcsak
 
Scaling tappsi
Scaling tappsiScaling tappsi
Scaling tappsi
Óscar Andrés López
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
GlobalLogic Ukraine
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Bob Pusateri
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Bob Pusateri
 
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Clustrix
 
Reduced instruction set computers
Reduced instruction set computersReduced instruction set computers
Reduced instruction set computers
Syed Zaid Irshad
 

Semelhante a Navigating Transactions: ACID Complexity in Modern Databases (20)

Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users HappyGeek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
Geek Sync | Top 5 Tips to Keep Always On Always Humming and Users Happy
 
Hbase hive pig
Hbase hive pigHbase hive pig
Hbase hive pig
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
Kafka at scale facebook israel
Kafka at scale   facebook israelKafka at scale   facebook israel
Kafka at scale facebook israel
 
How to Build a Compute Cluster
How to Build a Compute ClusterHow to Build a Compute Cluster
How to Build a Compute Cluster
 
Webinar: How MongoDB is Used to Manage Reference Data - May 2014
Webinar: How MongoDB is Used to Manage Reference Data - May 2014Webinar: How MongoDB is Used to Manage Reference Data - May 2014
Webinar: How MongoDB is Used to Manage Reference Data - May 2014
 
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures#GeodeSummit - Where Does Geode Fit in Modern System Architectures
#GeodeSummit - Where Does Geode Fit in Modern System Architectures
 
Robust ha solutions with proxysql
Robust ha solutions with proxysqlRobust ha solutions with proxysql
Robust ha solutions with proxysql
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Hbase hivepig
Hbase hivepigHbase hivepig
Hbase hivepig
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Scaling tappsi
Scaling tappsiScaling tappsi
Scaling tappsi
 
Patterns of Distributed Application Design
Patterns of Distributed Application DesignPatterns of Distributed Application Design
Patterns of Distributed Application Design
 
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
Select Stars: A DBA's Guide to Azure Cosmos DB (SQL Saturday Oslo 2018)
 
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
Select Stars: A DBA's Guide to Azure Cosmos DB (Chicago Suburban SQL Server U...
 
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
Tech Talk Series, Part 2: Why is sharding not smart to do in MySQL?
 
Reduced instruction set computers
Reduced instruction set computersReduced instruction set computers
Reduced instruction set computers
 

Mais de Shivji Kumar Jha

Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Shivji Kumar Jha
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
Shivji Kumar Jha
 
pulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxpulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptx
Shivji Kumar Jha
 
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Shivji Kumar Jha
 
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarPulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Shivji Kumar Jha
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for Isolation
Shivji Kumar Jha
 
Event sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreEvent sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event Store
Shivji Kumar Jha
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data Streaming
Shivji Kumar Jha
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Shivji Kumar Jha
 
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache PulsarPulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Shivji Kumar Jha
 
Pulsar Summit Asia - Running a secure pulsar cluster
Pulsar Summit Asia -  Running a secure pulsar clusterPulsar Summit Asia -  Running a secure pulsar cluster
Pulsar Summit Asia - Running a secure pulsar cluster
Shivji Kumar Jha
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
Shivji Kumar Jha
 
FOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationFOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group Replication
Shivji Kumar Jha
 
MySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesMySQL High Availability with Replication New Features
MySQL High Availability with Replication New Features
Shivji Kumar Jha
 
MySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityMySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and Scalability
Shivji Kumar Jha
 
MySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterMySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL Cluster
Shivji Kumar Jha
 
MySQL User Camp: GTIDs
MySQL User Camp: GTIDsMySQL User Camp: GTIDs
MySQL User Camp: GTIDs
Shivji Kumar Jha
 
Open source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationOpen source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source Replication
Shivji Kumar Jha
 

Mais de Shivji Kumar Jha (18)

Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutesDruid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
Druid Summit 2023 : Changing Druid Ingestion from 3 hours to 5 minutes
 
osi-oss-dbs.pptx
osi-oss-dbs.pptxosi-oss-dbs.pptx
osi-oss-dbs.pptx
 
pulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptxpulsar-platformatory-meetup-2.pptx
pulsar-platformatory-meetup-2.pptx
 
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
Pulsar Summit Asia 2022 - Streaming wars and How Apache Pulsar is acing the b...
 
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with PulsarPulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
Pulsar Summit Asia 2022 - Keeping on top of hybrid cloud usage with Pulsar
 
Pulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for IsolationPulsar summit asia 2021: Designing Pulsar for Isolation
Pulsar summit asia 2021: Designing Pulsar for Isolation
 
Event sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event StoreEvent sourcing Live 2021: Streaming App Changes to Event Store
Event sourcing Live 2021: Streaming App Changes to Event Store
 
Apache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data StreamingApache Con 2021 Structured Data Streaming
Apache Con 2021 Structured Data Streaming
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache PulsarPulsar Summit Asia - Structured Data Stream with Apache Pulsar
Pulsar Summit Asia - Structured Data Stream with Apache Pulsar
 
Pulsar Summit Asia - Running a secure pulsar cluster
Pulsar Summit Asia -  Running a secure pulsar clusterPulsar Summit Asia -  Running a secure pulsar cluster
Pulsar Summit Asia - Running a secure pulsar cluster
 
lessons from managing a pulsar cluster
 lessons from managing a pulsar cluster lessons from managing a pulsar cluster
lessons from managing a pulsar cluster
 
FOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group ReplicationFOSSASIA 2015: MySQL Group Replication
FOSSASIA 2015: MySQL Group Replication
 
MySQL High Availability with Replication New Features
MySQL High Availability with Replication New FeaturesMySQL High Availability with Replication New Features
MySQL High Availability with Replication New Features
 
MySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and ScalabilityMySQL Developer Day conference: MySQL Replication and Scalability
MySQL Developer Day conference: MySQL Replication and Scalability
 
MySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL ClusterMySQL User Camp: MySQL Cluster
MySQL User Camp: MySQL Cluster
 
MySQL User Camp: GTIDs
MySQL User Camp: GTIDsMySQL User Camp: GTIDs
MySQL User Camp: GTIDs
 
Open source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source ReplicationOpen source India - MySQL Labs: Multi-Source Replication
Open source India - MySQL Labs: Multi-Source Replication
 

Navigating Transactions: ACID Complexity in Modern Databases

  • 1. Navigating Transactions: ACID Complexity in Modern Databases. Shivji Kumar Jha
  • 2. Shivji Kumar Jha, Staff Engineer at Nutanix. Data Platforms & OSS • Databases, streams, app architecture • Loves open source software (OSS) • And communities (meetups) • Regular speaker (talk #23) • https://www.linkedin.com/in/shivjijha • https://youtube.com/@ShivjiKumarJha • https://t.me/theDbShots
  • 3. Contents • Transactions & ACID • Implementing transactions • Distributed transactions • Cloud-scale databases
  • 5. Transactions: Historical Perspective • Supported by almost all relational and even some non-relational databases • Most of them follow System R, the first SQL database, introduced by IBM in 1975 • The general ideas have remained the same for 45+ years • MySQL, Postgres, Oracle and SQL Server have similar transactions
  • 6. Transactions: Historical Perspective • NoSQL camp: transactions are the antithesis of scalability; large-scale systems have to abandon transactions to get good performance and high availability • SQL camp: transactions are an essential requirement for serious applications with valuable data
  • 7. Transactions: Historical Perspective • NoSQL camp: transactions are the antithesis of scalability; large-scale systems have to abandon transactions to get good performance and high availability • SQL camp: transactions are an essential requirement for serious applications with valuable data • Both are exaggerated! It comes down to trade-offs!
  • 8. Coined in 1983 by Theo Härder and Andreas Reuter
  • 9. Atomicity, Consistency, Isolation, Durability (ACID) • Coined in 1983 by Theo Härder and Andreas Reuter
  • 10. Transactions: The slippery slope! • In practice, one database’s implementation of ACID does not equal another’s • For example, there is a lot of ambiguity in the meaning of “isolation” • The devil is in the details! • When a system claims ACID, it is unclear what guarantees it actually provides • ACID has unfortunately become a marketing term
  • 12. ACID: Atomicity • ALL OR NOTHING! • Ability to undo (abort) • Easy to RETRY transactions
  • 13. Are retries really safe with transactions?
  • 14. Are retries really safe with transactions? • The network failed while the server tried to acknowledge the commit to the client: retrying means executing twice, so idempotency or de-duplication is required in the app! • What if the error is due to overload? • Is the error transient or permanent? • What if the transaction had side effects? Send the email again? • What if the client fails while retrying? Is the data lost?
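
The lost-acknowledgement case on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `Server`, `deposit_with_retry` and the `drop_first_ack` flag are hypothetical names invented here, and an in-memory dict stands in for a durable de-duplication table.

```python
import uuid


class Server:
    """Toy server that remembers which request IDs it has already applied."""

    def __init__(self):
        self.balance = 0
        self.applied = {}  # request_id -> result of the first execution

    def deposit(self, request_id, amount):
        # De-duplication: a repeated request_id returns the stored result
        # instead of applying the deposit a second time.
        if request_id in self.applied:
            return self.applied[request_id]
        self.balance += amount
        self.applied[request_id] = self.balance
        return self.balance


def deposit_with_retry(server, amount, drop_first_ack=True, max_attempts=3):
    """Retry under one request ID so a lost acknowledgement cannot double-apply."""
    request_id = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(max_attempts):
        result = server.deposit(request_id, amount)
        if drop_first_ack and attempt == 0:
            continue  # simulate: the commit succeeded but the ack was lost
        return result
    raise RuntimeError("gave up after retries")


server = Server()
deposit_with_retry(server, 100)
print(server.balance)  # 100, not 200
```

Generating the request ID once, outside the retry loop, is the design choice that matters: a fresh ID per attempt would defeat the de-duplication.
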
  • 15. ACID: Consistency • Certain statements about the data (invariants) are always true • The database alone can’t promise this • Application-specific guarantees • The most weakly defined property in ACID
  • 16. ACID: Isolation • Many clients access data at the same time • When accessing the same records, you can run into concurrency problems (races) • The database guarantees that concurrently executing transactions are isolated from each other • Textbook definition: serializability, i.e. the same result as running the transactions serially • In practice, serializability is rarely used because of its performance penalty • What you actually get is often snapshot isolation, a much weaker guarantee!
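
To make "much weaker" concrete, here is a toy model of snapshot isolation showing write skew, an anomaly that serializability would prevent. It is a sketch under assumptions, not how any real engine is built: `SnapshotDB` and `Txn` are hypothetical classes, snapshots are full dict copies, and conflicts are detected with per-key version counters (first committer wins).

```python
class SnapshotDB:
    def __init__(self, data):
        self.data = dict(data)
        self.version = {key: 0 for key in data}

    def begin(self):
        # Each transaction reads from a private copy taken at start time.
        return Txn(self, dict(self.data), dict(self.version))


class Txn:
    def __init__(self, db, snapshot, seen_version):
        self.db = db
        self.snapshot = snapshot
        self.seen_version = seen_version
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only write-write conflicts are checked: abort if a key we wrote
        # was changed by someone else after our snapshot was taken.
        for key in self.writes:
            if self.db.version[key] != self.seen_version[key]:
                raise RuntimeError("write-write conflict on " + key)
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.version[key] += 1


# Invariant the application wants: at least one doctor stays on call.
db = SnapshotDB({"alice_on_call": True, "bob_on_call": True})
t1, t2 = db.begin(), db.begin()
for txn, me in ((t1, "alice_on_call"), (t2, "bob_on_call")):
    on_call = sum(txn.read(k) for k in ("alice_on_call", "bob_on_call"))
    if on_call >= 2:  # checked against the snapshot, not the live data
        txn.write(me, False)
t1.commit()
t2.commit()  # different keys were written, so no conflict is detected
print(db.data)  # {'alice_on_call': False, 'bob_on_call': False}
```

Each transaction is individually correct, yet together they break the invariant, because neither one wrote a key the other wrote. Running them one after the other would have left one doctor on call.
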
  • 17. ACID: Durability • Once committed, written data will not be forgotten • Even if there is a hardware fault or the database crashes • Single node: write to HDD or SSD • Databases usually use a write-ahead log and dirty cached pages • Replicated databases: data is written to multiple nodes; wait until that happens! • There is no such thing as perfect durability • Picture: https://www.alphr.com/tell-regular-or-ssd-hard-drive/
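
The write-ahead-log idea for a single node can be sketched as follows. This is a simplified illustration, assuming a JSON-lines file as the log and a dict as the "database"; `append_record` and `recover` are hypothetical helper names, and real engines add checksums, log sequence numbers and checkpoints that are omitted here.

```python
import json
import os
import tempfile


def append_record(log_path, record):
    """Append one record to the log and force it to stable storage."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push its cache to the device


def recover(log_path):
    """Rebuild in-memory state by replaying the log from the beginning."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            state[record["key"]] = record["value"]
    return state


log_path = os.path.join(tempfile.mkdtemp(), "wal.log")
append_record(log_path, {"key": "x", "value": 1})  # log first, apply later
append_record(log_path, {"key": "x", "value": 2})
print(recover(log_path))  # {'x': 2} after a simulated restart
```

The `fsync` per commit is also where much of the durability cost comes from, which is why the slide notes that perfect durability does not exist: the guarantee is only as good as the device and the layers that honour the flush.
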
  • 19. Implementing Transactions: a balance between two problems • Improve efficiency: allow transactions to execute concurrently • Preserve correctness: ensure concurrently executing transactions preserve ACID properties
  • 20. Implementing Transactions: a balance between two problems • Improve efficiency: allow transactions to execute concurrently • Preserve correctness: ensure concurrently executing transactions preserve ACID properties • Concurrently executing transactions can cause read & write anomalies • Isolation levels: presence or absence of read & write anomalies • Concurrency control: how transactions are scheduled & executed
  • 21. Single Node Transactions • The storage engine’s responsibility
  • 23. What if multiple nodes are involved in a transaction?
  • 24. Send a commit request to each node & commit independently?
  • 25. Send a commit request to each node & commit independently? • Some SUCCESSES, some FAILURES! • Inconsistency between nodes • Abort if there are some FAILURES? • But you can’t go back on a committed promise! • How about a compensating transaction to offset the changes? • Where does this responsibility sit: DB or app? • OR commit only if everyone promises to commit?
  • 27. Atomic Commitment • A transaction will not commit if even one of the participants votes against it • Failed processes have to reach the same conclusion as the rest of the cohort • Does not work in the case of Byzantine failures: it assumes processes can’t lie! • Cohorts cannot choose, influence or change the proposed transaction; they can only vote on whether or not they are willing to execute it • Executed by a transaction manager or coordinator • Examples: MySQL, Postgres, DynamoDB, Spanner, and Kafka for producer and consumer interactions
  • 29. Two Phase Commit • The coordinator (or transaction manager) is a library within the database server • When the database is ready to commit, the coordinator starts phase 1 • The coordinator sends a prepare request to each node: are you able to commit? • The coordinator tracks the response from each participant • If all participants vote YES, the coordinator sends a COMMIT request in phase 2 • If any participant says NO, the coordinator sends ABORT to all nodes in phase 2 • The coordinator must write its decision to its log on disk to handle crashes
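
The two phases on this slide can be sketched as a small simulation. It is a minimal illustration under assumptions: in-process objects stand in for networked nodes, a Python list stands in for the coordinator's on-disk log, and the `Participant` and `Coordinator` classes are hypothetical names, not any database's API.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote. A YES vote is a promise to commit if asked later.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


class Coordinator:
    def __init__(self, participants):
        self.participants = participants
        self.log = []  # stands in for the durable decision log

    def run(self):
        # Phase 1: collect a vote from every participant.
        votes = [p.prepare() for p in self.participants]
        decision = "COMMIT" if all(votes) else "ABORT"
        # The decision is recorded before any participant is told about it.
        self.log.append(decision)
        # Phase 2: broadcast the decision.
        for p in self.participants:
            if decision == "COMMIT":
                p.commit()
            else:
                p.abort()
        return decision


ok = Coordinator([Participant("a"), Participant("b")]).run()
bad = Coordinator([Participant("a"), Participant("b", can_commit=False)]).run()
print(ok, bad)  # COMMIT ABORT
```

A single NO vote is enough to abort everyone, including participants that voted YES.
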
  • 31. Two Phase Commit: Coordinator Failures • Two points of no return • 1. If a participant says YES, it has to commit if the coordinator asks it to • 2. Once the coordinator decides, the decision is irrevocable • If a participant said YES and didn’t hear back from the coordinator, it must wait forever! • The coordinator must persist its decision before sending it to participants • If no COMMIT record was persisted by the coordinator, abort on recovery
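
The recovery rules on this slide reduce to two small decision functions. This is a sketch under assumptions: the log is modelled as a list of `(txn_id, decision)` tuples, and `recover_decision` and `participant_on_timeout` are hypothetical names chosen for illustration.

```python
def recover_decision(coordinator_log, txn_id):
    """Outcome of a transaction after the coordinator restarts."""
    for logged_txn, decision in coordinator_log:
        if logged_txn == txn_id and decision in ("COMMIT", "ABORT"):
            return decision
    # No persisted decision means no participant was ever told to commit,
    # so aborting is safe.
    return "ABORT"


def participant_on_timeout(state):
    """What a participant may do when it stops hearing from the coordinator."""
    if state == "INIT":
        return "ABORT"  # has not voted yet, so it can abort unilaterally
    if state == "PREPARED":
        return "WAIT"   # voted YES: it is in doubt and must block
    return state        # already COMMITTED or ABORTED


log = [("t1", "COMMIT")]
print(recover_decision(log, "t1"))         # COMMIT
print(recover_decision(log, "t2"))         # ABORT
print(participant_on_timeout("PREPARED"))  # WAIT
```

The `WAIT` branch is the blocking behaviour of 2PC: a prepared participant cannot decide on its own, because the coordinator may already have logged either outcome.
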
  • 32. Three Phase Commit? • 2PC is a blocking protocol • 3PC is non-blocking • Difficult to implement in practice. Not well adopted! • 2PC is quite well adopted in spite of its known problems.
  • 33. Distributed Transactions in Practice • Carry a heavy performance penalty • Additional fsync required for crash recovery • Additional network round trips • Distributed transactions in MySQL are reported to be over 10 times slower than single-node transactions.
  • 34. Other Choices? • Single leader? Everyone else executes the same transactions in the same order! • Manually selected leader? • Automatic selection of the leader? • That offloads the same problem to a different time. Well, it comes up less frequently! • Use consensus algorithms: ZooKeeper (ZAB), etcd (Raft), Paxos? • Global transaction order by reaching consensus on a sequencer? Calvin (FaunaDB) • 2PC over consensus groups per shard? Enter Google’s Spanner!
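
Why "same transactions in same order" matters can be shown with a tiny deterministic replay. This is a toy sketch: the `set`/`add` operations and the `apply_log` function are made up for illustration and do not correspond to any system named on the slide.

```python
def apply_log(log, initial=0):
    """Apply a sequence of deterministic operations to a single value."""
    value = initial
    for op, amount in log:
        if op == "set":
            value = amount
        elif op == "add":
            value += amount
        else:
            raise ValueError("unknown operation: " + op)
    return value


leader_order = [("set", 10), ("add", 5)]
other_order = [("add", 5), ("set", 10)]

# Two replicas that apply the leader's order end in the same state.
print(apply_log(leader_order), apply_log(leader_order))  # 15 15
# A replica that applies the same operations in another order diverges.
print(apply_log(other_order))  # 10
```

Agreeing on a single order is the hard part, which is where leader election and the consensus algorithms listed on the slide come in.
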
  • 35. Transactions in Modern Databases
  • 37. Amazon Aurora: Quick Overview • Fully managed relational database (part of RDS) • Service-oriented design • Separation of compute and storage • Multi-tenant, scale-out storage • Segmented redo log • Throughput: 5x MySQL, 3x Postgres
  • 38. Aurora Functional Separation • DB instance: query processing, access methods, transactions, locking, page cache, undo management • Storage fleet: redo logging, materialisation of data blocks, garbage collection, backup / restore
  • 39. Aurora: quorum-style distributed coordination • Not 2PC, to avoid network chatter! • The read set & write set must overlap on at least one copy • Each write set must overlap with previous write sets
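
The two overlap rules on this slide can be checked arithmetically and by brute force. The sketch below uses the configuration described in the Aurora paper (6 copies, write quorum 4, read quorum 3) as its example; the function names are hypothetical and the code says nothing about how Aurora itself is implemented.

```python
from itertools import combinations


def quorum_is_valid(copies, write_quorum, read_quorum):
    """Arithmetic form of the two overlap rules."""
    reads_overlap_writes = read_quorum + write_quorum > copies
    writes_overlap_writes = 2 * write_quorum > copies
    return reads_overlap_writes and writes_overlap_writes


def always_overlap(copies, size_a, size_b):
    """Brute force: does every set of size_a share a copy with every set of size_b?"""
    nodes = range(copies)
    return all(
        set(a) & set(b)
        for a in combinations(nodes, size_a)
        for b in combinations(nodes, size_b)
    )


print(quorum_is_valid(6, 4, 3))  # True
print(always_overlap(6, 4, 3))   # True: any read set meets any write set
print(always_overlap(6, 4, 4))   # True: any two write sets meet
print(quorum_is_valid(6, 3, 3))  # False: {0,1,2} and {3,4,5} never meet
```
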
  • 41. Dynamo DB: highly available, weaker consistency • Always-“on” key-value store, single-key operations • Sacrifices consistency under failure scenarios. Eventually consistent! • Extensive use of object versioning; branching is OK, resolve on read • Example: shopping cart; merges carts. Can’t lose a write, but deleted items can reappear • Consistency among replicas using a quorum-like technique (sloppy quorum) • Gossip-based distributed failure detection; hinted handoffs for temporarily failed nodes
  • 42. Dynamo DB: Sloppy Quorum & hinted handoff • Each data item is replicated at N hosts: N distinct physical nodes • The list of nodes storing a key is called its preference list • R: minimum number of nodes that must participate in a successful read operation • W: minimum number of nodes that must participate in a successful write operation
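
A rough sketch of how a sloppy quorum picks write targets: walk the ring from the key's position, use the first N healthy nodes, and record a hint on each stand-in so the data can be handed back later. The `sloppy_write` function, the list-based ring and the `healthy` set are simplifying assumptions made here for illustration; virtual nodes and the actual data transfer are omitted.

```python
def sloppy_write(ring, key_index, n, w, healthy):
    """Choose write targets for a key, substituting for unhealthy nodes.

    Returns (targets, hints), where hints maps a stand-in node to the
    preferred node whose replica it is temporarily holding.
    """
    preference = [ring[(key_index + i) % len(ring)] for i in range(n)]
    targets = [node for node in preference if node in healthy]
    missing = [node for node in preference if node not in healthy]
    hints = {}
    # Walk further along the ring to find healthy stand-ins.
    step = n
    while missing and step < len(ring):
        candidate = ring[(key_index + step) % len(ring)]
        if candidate in healthy:
            hints[candidate] = missing.pop(0)
            targets.append(candidate)
        step += 1
    if len(targets) < w:
        raise RuntimeError("fewer than W healthy nodes reachable")
    return targets, hints


ring = ["A", "B", "C", "D", "E"]
targets, hints = sloppy_write(ring, key_index=0, n=3, w=2,
                              healthy={"B", "C", "D", "E"})
print(targets)  # ['B', 'C', 'D']
print(hints)    # {'D': 'A'}
```

Here node D accepts the write on behalf of A, which is down; once A recovers, D would hand the replica back. This keeps writes available at the cost of the strict overlap that a classic quorum guarantees.
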
  • 46. DynamoDB: Transaction coordinator failure/recovery • https://www.infoq.com/articles/amazon-dynamodb-transactions/
  • 48. Spanner: Google’s globally distributed SQL* database • Also the inspiration for CockroachDB and YugabyteDB • Tables with rows, columns and versioned values • Supports transactions and a SQL-based query language • Replication configs are dynamically controlled at fine grain by apps: which data centres to use, how far data is from users (read latency), how far replicas are from each other (write latency), how many replicas • Clients automatically fail over between replicas • Data is dynamically moved between data centres to balance resources
  • 50. Spanserver stack & transactions
  • 52. References • Books: Designing Data-Intensive Applications (Chapters 7 & 9) by Martin Kleppmann; Database Internals (Chapters 5 & 13) by Alex Petrov • Whitepapers: Amazon Aurora: Design considerations for high throughput cloud native relational databases; Amazon Aurora: On avoiding distributed consensus….; Dynamo: Amazon’s highly available key value store; Distributed Transactions at scale in Amazon DynamoDB; Spanner: Google’s Globally Distributed Database • Blog: https://www.infoq.com/articles/amazon-dynamodb-transactions/