MySQL Cluster talk
DIY
No Best Practices
No Product Presentation
… you have been warned.
No marketing fluff
Foreword and disclaimer
Do it yourself, become a maker, get famous!
In this course you will learn how to create an eager update
anywhere cluster. You need:
● A soldering iron, solder
● Wires (multiple colors recommended)
● A collection of computers
By the end of the talk you can either challenge MySQL, or
get MySQL Cluster for free: it's Open Source, as it has always been.
Get armed with the distributed system theory you, as a
developer, need to master any distributed database.
DIY – Distributed Database
Cluster, or: MySQL Cluster
Ulf Wendel, MySQL/Oracle
Live on stage:
Making a Cluster
The speaker says...
Beautiful work, but unfortunately the DIY troubles begin
before the first message has been delivered in our cluster.
Long before we can talk about the latest fashion in hats, we
have to fix wiring and communication! Communication
should be:
• Fast
• Reliable (loss, retransmission, checksum, ordering)
• Secure
Network performance is a limiting factor for
distributed systems. Hmm, we had better go back to the
drawing board before we mess up more computers...
Availability
• Cluster as a whole unaffected by loss of nodes
Scalability
• Geographic distribution
• Scale size in terms of users and data
• Database specific: read and/or write load
Distribution Transparency
• Access, Location, Migration, Relocation (while in use)
• Replication
• Concurrency, Failure
Back to the beginning: goals
The speaker says...
A distributed database cluster strives for maximum
availability and scalability while maintaining distribution
transparency.
MySQL Cluster has a shared-nothing design good enough
for 99.999% availability (about five minutes of downtime per
year). It scales from a Raspberry Pi running in a briefcase to
1.2 billion write transactions per minute on a cluster of 30
data nodes (if using possibly unsupported bleeding-edge APIs).
It offers full distribution transparency, with the exception of
partition relocation, which has to be triggered manually but is
performed transparently by the cluster. That's the mark to beat.
Let's learn what kinds of clusters exist, how they tick and what
the best algorithms are.
Columns: Where are transactions run?
Rows: When does synchronization happen?

         Primary Copy                   Update Anywhere
Eager    Not available for MySQL        MySQL Cluster, 3rd party
Lazy     MySQL Replication, 3rd party   MySQL Cluster Replication
What kind of cluster?
The speaker says...
A wide range of clusters can be categorized by asking
where transactions are run and when replicas
synchronize their data. Any eager solution ensures that all
replicas are synchronized at all times: it offers strong
consistency. A transaction cannot commit before
synchronization is done. Please note what this means for
transaction rates:
• Single computer tx rate ~ disk/fsync rate
• Lazy cluster tx rate ~ disk/fsync rate
• Eager cluster tx rate ~ network round-trip time (RTT)
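The three proportionalities above can be turned into a back-of-the-envelope calculation. This is a minimal sketch: the latency figures are made-up assumptions, and the model ignores group commit and pipelining, so it only bounds a single serial stream of commits.

```python
def max_serial_commit_rate(latency_seconds: float) -> float:
    """Upper bound on commits per second when every commit waits once
    for the given latency (an fsync or a network round trip)."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_seconds


# Hypothetical latencies, chosen for illustration only.
FSYNC_LATENCY = 0.005  # 5 ms per fsync
LAN_RTT = 0.0002       # 0.2 ms round trip inside a data center
WAN_RTT = 0.080        # 80 ms round trip between continents

rates = {
    "single computer or lazy cluster (fsync bound)": max_serial_commit_rate(FSYNC_LATENCY),
    "eager cluster on a LAN (RTT bound)": max_serial_commit_rate(LAN_RTT),
    "eager cluster over a WAN (RTT bound)": max_serial_commit_rate(WAN_RTT),
}
for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} commits/s per serial stream")
```

Under these assumed numbers the eager cluster is fine on a fast LAN and painful over a WAN, which is the point of the EC2 question.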
Test: Would you deploy MySQL Cluster on Amazon EC2 :-) ?
Lazy Primary Copy we have...
[Diagram: writes go to the Master (Primary); three Slaves (Copies) replicate from it and serve reads.]
Lazy synchronization: eventual consistency
Primary Copy: where any transaction may run
The speaker says...
MySQL Replication falls into the category of lazy Primary
Copy clusters. It is a rather inflexible solution as all
updates must be sent to the primary. However, this
simplifies concurrency control of conflicting, concurrent
update transactions. Concurrency control is no different
from a single database.
Lazy replication can be fast. Transactions don't have to
wait for synchronization of replicas. The price of the fast
execution is the risk of stale reads and eventual
consistency. Transactions can be lost when the primary
crashes after commit and before any copy has been
updated. (This is something you can avoid by using MySQL
semi-sync replication, which delays the commit until delivery
to a copy.)
BTW, confusing: Multi-Master
[Diagram: two Primary Copy clusters, each with a Master (Primary) and a Slave (Copy). The left master runs SET A = 1, the right master runs SET B = 1; both sides hold A and B.]
The speaker says...
Be aware of the term Multi-Master. The MySQL community
sometimes uses it to describe a set of Primary Copy
clusters whose primaries (masters) replicate from each
other. This is one of the many possible topologies that you
can build with MySQL Replication. In the example, the PC
cluster on the left manages table A and the PC cluster on
the right manages table B. The primaries copy table A
and table B, respectively, from each other. There is no
concurrency control and conflicts can arise. There is no
distribution transparency. This is not a separate kind of
cluster with regard to our where and when criteria. And it is
rarely what you want...
Not a good goal for DIY – let's move on.
Let's do Eager Update Anywhere
[Diagram: four replicas; a write can be sent to any replica and a read can be served by any replica.]
Eager synchronization: strong consistency
Update Anywhere: any transaction can run on any replica
The speaker says...
An eager update anywhere cluster improves
distribution transparency and removes the risk of
reading stale data. Transparency and flexibility are improved
because any transaction can be directed to any
replica. Synchronization happens as part of the commit,
thus strong consistency is achieved. Remember:
transaction rate ~ network RTT. Failure tolerance is
better than with Primary Copy. There is no single point of
failure (the primary) that can cause a total outage of the
cluster. Nodes may fail without bringing the cluster down
immediately. Concurrency control (synchronization) is
complex as concurrent transactions from different replicas
may conflict.
Concurrency Control: 1SR
[Diagram: at time t0 one replica runs SET a = 1 while another replica runs SET a = 2. Depending on the order in which the updates are applied, replicas end up holding a = 1 or a = 2.]
One-Copy-Serializability (1SR) for correctness
• All replicas must decide on the same transaction order
The speaker says...
Concurrent ACID transactions must be isolated from each
other to ensure correctness. The database system needs a
mechanism to detect conflicts. If there are any, the
transactions need to be serialized. The challenge is to have
all replicas commit transactions in the same serial order.
One-Copy-Serializability (1SR) demands the concurrent
execution of transactions in a replicated database
to be equivalent to a serial execution of these
transactions over a single logical copy of the
database. 1SR is the highest level of consistency; lower
levels exist, for example snapshot isolation. Given that, the
questions are:
• How to detect conflicting transactions?
• How to enforce a global total order?
Certification: detect conflict
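[Diagram: three replicas. One executes an update transaction, one serves a read query. The update transaction carries a read set (a = 1) and a write set (b = 12) that are sent to the other replicas for certification.]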
Transactions get executed and certified before commit
• Conflict detection is based on read and write sets
• Multi-Primary deferred update
The speaker says...
(For brevity we discuss multi-primary deferred update only.)
In a multi-primary deferred update system a read
query can be served by a replica without consulting
any of the other replicas. A write transaction must be
certified by all other replicas before it can commit.
During the execution of the transaction, the replica records
all data items read and written. The read/write sets are then
forwarded by the replica to all other replicas to certify the
remote transaction. The other replicas check whether the
remote transaction includes data items modified by an
active local transaction. The outcome of the certification
decides on commit or abort. Either symmetric (statement
based) or asymmetric (row based) replication can be used.
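The certification check described above can be sketched in a few lines. This is an illustration only: the Transaction class and the certify function are hypothetical names, and data items are reduced to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def certify(remote: Transaction, active_local: list) -> bool:
    """Vote commit (True) if the remote transaction touches no data item
    modified by an active local transaction, otherwise vote abort (False)."""
    touched = remote.read_set | remote.write_set
    return all(touched.isdisjoint(local.write_set) for local in active_local)


# The slide's example: read set {a}, write set {b}.
remote = Transaction("T1", read_set={"a"}, write_set={"b"})
local_ok = Transaction("T2", read_set={"c"}, write_set={"d"})
local_conflict = Transaction("T3", read_set={"b"}, write_set={"a"})

print(certify(remote, [local_ok]))                  # no overlap: commit vote
print(certify(remote, [local_ok, local_conflict]))  # T3 modifies a: abort vote
```

Every replica runs the same check against its own active transactions; how the votes are combined is the topic of the next slides.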
Concurrency Control
[Diagram: at time t0 one replica runs SET a = 1 while another replica runs SET a = 2; the replicas are shown holding a = 1 or a = 2.]
Various synchronization mechanisms
• Atomic commit
• Atomic broadcast
• Strict two-phase locking (2PL)
• Optimistic, Physical clock, Lamport's clock, vector clock...
The speaker says...
One challenge remains: replicas must agree on a global
total order for committing transactions no matter in
which order they receive messages.
We will discuss atomic commit (two-phase commit) and
atomic broadcast. The other approaches are out of scope.
Atomic commit for CC
[State diagram with the states Execute, Committing, PreCommit, Committed and Aborted.]
Formula (background): serial execution, unnecessary aborts
The speaker says...
Atomic commit can be expressed as a state machine with
the final states abort and commit. Once a transaction has
been executed, it enters the committing state in which
certification/voting takes place. Given the absence of
conflicting concurrent transactions, a replica sets the
transaction's status to precommit. If all replicas precommit,
the transaction is committed, otherwise it is aborted.
Don't worry about the formula. It checks for concurrent
transactions, as we did before, and ensures, in case of
conflicts, that only one transaction can commit at a time.
Problem: it may also do unnecessary aborts
depending on message delivery order as it requires all
servers to precommit->commit in the same order.
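The state machine from the notes can be written down directly. This is a minimal sketch with hypothetical names: the votes stand for the result of each replica's certification, and message delivery order (the source of the unnecessary aborts) is not modelled.

```python
from enum import Enum


class State(Enum):
    EXECUTE = "execute"
    COMMITTING = "committing"
    PRECOMMIT = "precommit"
    COMMITTED = "committed"
    ABORTED = "aborted"


def atomic_commit(votes: dict) -> list:
    """votes maps a replica name to True if that replica found no
    conflicting concurrent transaction. Returns the states visited."""
    trace = [State.EXECUTE, State.COMMITTING]
    if votes and all(votes.values()):
        # Every replica precommits, so the transaction commits.
        trace += [State.PRECOMMIT, State.COMMITTED]
    else:
        # A single negative vote aborts the transaction everywhere.
        trace.append(State.ABORTED)
    return trace


print([s.value for s in atomic_commit({"r1": True, "r2": True, "r3": True})])
print([s.value for s in atomic_commit({"r1": True, "r2": False, "r3": True})])
```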
Atomic broadcast for CC
Atomic broadcast guarantees
• Agreement: if one server delivers a message, all will
• Total order: all servers deliver messages in the same order
Greatly simplified concurrency check
• Deterministic: no extra communication after local decision
The speaker says...
Atomic broadcast ensures that transactions are delivered in
the same order to all replicas. Thus, certification of
transactions is deterministic: all replicas will make the same
decision about commit or abort because they all base their
decision on the same facts. This in turn means that there is
no need to coordinate the decisions of all replicas, since all
replicas will make the same decision.
A transaction does not conflict, and thus will commit, if it is
executed after the commit of any other transaction, or if its
read set does not overlap with the write set of any other
transaction. The formula is greatly simplified! Great for DIY!
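The simplified rule lends itself to a short sketch. It assumes that each transaction records how many commits it had already seen when it executed (the start_position field is an invention for this example) and that atomic broadcast hands the same list to every replica.

```python
def certify_in_order(delivered: list) -> dict:
    """delivered: (name, start_position, read_set, write_set) tuples in
    atomic-broadcast delivery order. start_position is the number of
    committed transactions the transaction had seen when it executed."""
    committed_write_sets = []
    outcome = {}
    for name, start_position, read_set, write_set in delivered:
        # Commits this transaction did not see while executing.
        concurrent = committed_write_sets[start_position:]
        if any(read_set & other_writes for other_writes in concurrent):
            outcome[name] = "abort"
        else:
            outcome[name] = "commit"
            committed_write_sets.append(write_set)
    return outcome


stream = [
    ("T1", 0, {"a"}, {"a"}),
    ("T2", 0, {"a"}, {"b"}),  # read a concurrently with T1's write of a
    ("T3", 1, {"a"}, {"c"}),  # executed after T1 had committed
]

# Two replicas process the same ordered stream independently.
replica_1 = certify_in_order(stream)
replica_2 = certify_in_order(stream)
print(replica_1)
print(replica_1 == replica_2)
```

No votes are exchanged: identical input order plus a deterministic check is enough for all replicas to agree.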
Voting quorum: ROWA, or...?
Read-One Write-All is a special quorum
• Quorum constraints: N_R + N_W > N, N_W > N/2
[Diagram: twelve replicas.]
Example: N = 12, read quorum N_R = 3, write quorum N_W = 10
[Diagram: three replicas.]
Example: N = 3, read quorum N_R = 2, write quorum N_W = 2
The speaker says...
So far we have silently assumed a Read-One Write-All
(ROWA) quorum for voting. Reads could be served locally
because updates have been applied to all replicas.
Alternatively, we could make a rule that an update has to be
agreed on by, and applied to, half of the replicas plus one.
This may be faster than achieving agreement among all replicas.
However, for a correct read we now have to contact half of
the replicas plus one and check whether they all give the
same reply. If so, we must have read the latest version as
the remaining, unchecked replicas form a minority that
cannot be updated. The read quorum overlaps the write
quorum by at least one element.
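The two quorum constraints from the slide are easy to check mechanically. A minimal sketch, with a hypothetical function name; ROWA is simply the case N_R = 1, N_W = N.

```python
def is_valid_quorum(n: int, read_quorum: int, write_quorum: int) -> bool:
    """Check N_R + N_W > N (reads overlap writes) and
    N_W > N/2 (two writes cannot both succeed on disjoint replicas)."""
    return read_quorum + write_quorum > n and 2 * write_quorum > n


examples = [
    (12, 3, 10),  # first example on the slide
    (3, 2, 2),    # second example on the slide
    (12, 1, 12),  # ROWA as a special quorum
    (12, 3, 9),   # read and write quorum may miss each other
]
for n, n_r, n_w in examples:
    print(f"N={n}, N_R={n_r}, N_W={n_w}: valid={is_valid_quorum(n, n_r, n_w)}")
```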
Voting quorum: ROWA!
ROWA almost always performs better
• Are Quorums an Alternative for Data Replication?
(Jiménez-Peris et al.)
• "The obvious conclusion from these results is that ROWAA is
the best choice for a wide range of application scenarios. It offers
good scalability (within the limitations of replication protocols),
very good availability, and an acceptable communication
overhead. It also has the significant advantage of being very
simple to implement and very easy to adapt to configuration
changes. For very peculiar loads and configurations, it is possible
that some variation of quorum does better than ROWAA."
• Background: scale out results from study
The speaker says...
Judging from the paper, ROWA, or rather Read-One
Write-All-Available (ROWAA), is a promising
approach. For example, it offers linear scalability for read-
only workloads but still remains competitive for mixed
update and read loads. It requires a high write-to-read ratio
before the various quorum algorithms outperform ROWA on
scalability. In sum: ROWA beats quorums by an order of
magnitude for reads but does not fall behind by an order of
magnitude for writes, and the web is read dominated.
Scalability is one aspect. Quorums also help with
availability, but the study's finding is similar: ROWA is fine.
DIY decision on concurrency control: ROWA, atomic broadcast.
Quiz: name a system using quorums? Riak! Next:
Availability and Fault Tolerance.
Complex failure handling required
• Later evolution: Three-Phase Commit (3PC)
Fault Tolerance: 2PC
[Sequence diagram: the Coordinator sends a Vote Request to two Participants, each answers with PreCommit, the Coordinator sends Global Commit and the Participants commit.]
The speaker says...
When discussing atomic commit we have effectively shown
the Two-Phase Commit (2PC) protocol. 2PC starts with a
vote request multicast from a coordinator to all
participants. The participants either vote to commit
(precommit) or abort. Then, the coordinator checks the
voting result. If all voted to commit, it sends a global
commit message and the participants commit. Otherwise
the coordinator sends a global abort command. Various
issues may arise in case of network or process
failures. Some cannot be cured using timeouts. For
example, consider the situation when a participant
precommits but gets no global commit or global abort. The
participant cannot unilaterally leave the state. At best, it can
ask another participant what to do.
Two-Phase Commit is a blocking protocol
Fault Tolerance: 2PC
[Sequence diagram: the Coordinator sends a Vote Request to two Participants and each answers with PreCommit; no global decision follows.]
The speaker says...
The worst case scenario is a crash of the coordinator after
all participants have voted to precommit. The participants
cannot leave the precommit state before the coordinator has
recovered. They do not know whether all of them have
voted to commit or not. Thus, they do not know whether a
global commit or global abort has to be performed.
As none of them has received a message about the outcome
of the voting, the participants cannot contact one another
and ask for the outcome.
Two-Phase Commit is also known as a blocking
protocol.
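Why the participants are stuck can be shown with a small decision function. This is a sketch under assumptions: the state names are invented for the example, and the rule that a participant which has not voted yet may abort on its own is standard 2PC behaviour rather than something stated on the slide.

```python
def resolve_after_timeout(my_state: str, peer_states: list) -> str:
    """What a participant may do when the coordinator stays silent.
    States: 'init', 'precommit', 'committed', 'aborted'."""
    if my_state == "init":
        return "aborted"  # assumption: no vote cast yet, abort is safe
    if my_state in ("committed", "aborted"):
        return my_state   # outcome already known
    # my_state == 'precommit': ask the other participants for the outcome.
    for state in peer_states:
        if state in ("committed", "aborted"):
            return state  # a peer has seen the global decision
    return "blocked"      # nobody knows, wait for the coordinator


# Worst case from the notes: everybody precommitted, coordinator crashed.
print(resolve_after_timeout("precommit", ["precommit", "precommit"]))
# A peer that already received the global commit can help out.
print(resolve_after_timeout("precommit", ["committed", "precommit"]))
```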
Reliable multicast/broadcast
• Build on the idea of group views and view changes
Virtual Synchrony
[Timeline diagram: processes P1 to P4 and multicast messages M1 to M4. A view change (VC) turns group G1 = {P1, P2, P3} into G2 = {P1, P2, P3, P4}.]
The speaker says...
Virtual Synchrony is a mechanism that does not block. It is
built around the idea of associating multicast messages with
the notion of a group. A message is delivered to all
members of a group but no other processes. Either the
message is delivered to all members of a group or to none
of them. All members of the group agree that they are part
of the group before the message is multicast (group
view). In the example, M1...3 are associated with the group
G1 = {P1, P2, P3}. If a process wants to join or leave a
group, a view change message is multicast. In the
example, P4 wants to join the group and a VC message is
sent while M3 is still being delivered. Virtual Synchrony
requires that either M3 is delivered to all of G1 before the
view change takes place or to none.
View changes act as a message barrier
• Remember the issues with 2PC …?
Virtual Synchrony
[Timeline diagram: processes P1 to P4 and multicast messages M5 to M8. A view change (VC) turns group G2 = {P1, P2, P3, P4} into G3 = {P1, P2, P3}.]
The speaker says...
There is only one condition under which a multicast
message is allowed not to be delivered: if the sender
crashed. Assume the processes continue working and
multicast messages M5, M6, M7 to group G2 = {P1, P2, P3,
P4}. While P4 sends M7 it crashes. P4 has managed to
deliver its message to {P3}. The crash of P4 is noticed and a
view change is triggered. Because Virtual Synchrony
requires a message to be delivered to all members of the
group associated with it but the sender crashed, P3 is free
to drop M7 and the view change can take place.
A new group view G3 is established and messages can be
exchanged again.
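The barrier behaviour of a view change can be sketched as follows. This is a strong simplification with hypothetical names: real implementations flush and forward messages, while the sketch only decides which pending messages are delivered before the new view is installed and which may be dropped because their sender crashed.

```python
def install_view(old_view: set, crashed: set, pending: dict):
    """pending maps a message name to (sender, members that already
    received it). Returns (new_view, deliver_first, drop)."""
    new_view = old_view - crashed
    deliver_first, drop = [], []
    for message, (sender, received_by) in sorted(pending.items()):
        if sender in crashed and not new_view <= received_by:
            # Sender crashed before reaching all survivors: all-or-nothing
            # delivery allows the message to be dropped.
            drop.append(message)
        else:
            deliver_first.append(message)
    return new_view, deliver_first, drop


g2 = {"P1", "P2", "P3", "P4"}
pending = {
    "M6": ("P2", {"P1", "P2", "P3", "P4"}),
    "M7": ("P4", {"P3"}),  # P4 crashed after reaching P3 only
}
g3, deliver_first, drop = install_view(g2, {"P4"}, pending)
print(sorted(g3), deliver_first, drop)
```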
Wire: message ordering and fault tolerance
• Common choices: UDP or TCP over IP
Reliable, delivered vs. received
[Diagram: two replicas send Update 1 and Update 2. One receiver gets Update 1 at t1 and Update 2 at t2. The other receiver gets Update 2 at t1, and Update 1, expected at t2, is lost.]
The speaker says...
Virtual Synchrony offers reliable multicast. Reliability can be
best achieved using a protocol higher up in the OSI model.
Isis, an early framework implementing Virtual Synchrony,
used TCP point-to-point connections if reliable service
was requested. TCP is a connection-oriented protocol
(endpoint failures can be detected easily) with error handling
and message delivery in the order sent. However, with
TCP alone there are no ordering constraints between
messages from any two senders. Those ordering
constraints have to be implemented at the application layer.
We say a message can be received on the network layer
in a different order than it is delivered to the application
by the model discussed. Vector clocks can be used for
global total ordering.
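The difference between received and delivered can be illustrated with a hold-back queue. The sketch swaps in a simpler ordering technique than the clocks mentioned above: it assumes every message already carries a global sequence number (for example from a sequencer process), which is an assumption made for the example.

```python
class HoldBackQueue:
    """Buffers messages as they are received from the network and
    delivers them to the application strictly in sequence order."""

    def __init__(self):
        self.next_seq = 1
        self.pending = {}
        self.delivered = []

    def receive(self, seq: int, payload: str) -> None:
        self.pending[seq] = payload
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


queue = HoldBackQueue()
queue.receive(2, "Update 2")  # received first, but held back
held_back = list(queue.delivered)
queue.receive(1, "Update 1")  # fills the gap, both get delivered
print(held_back, queue.delivered)
```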
AB = Virtual Synchrony offering total-order delivery
• "Synchrony" does not refer to temporal aspects
Atomic broadcast definition
[Diagram: processes P1 to P4 and messages M1 and M2, shown once with unordered delivery and once with ordered delivery.]
The speaker says...
Atomic broadcast means Virtual Synchrony used with total-
order message ordering. When Virtual Synchrony was
introduced back in the mid-80s, it was explicitly designed to
allow other message orderings. For example, it should be
able to support distributed applications that have a notion of
finding messages that commute, and thus may be applied in
an order different from the order sent to improve
performance. If events are applied in a different order on
different processes, the system cannot be called
synchronous any more, so the inventors called it virtually
synchronous.
However, recall we are only after total-ordering for 1SR.
Wash the brain without marketing fluff, split brain, done!
• System dependent... E.g. Isis failure detector was very basic
How to cook brains
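[Diagram: processes P1 to P4 with messages M1 and M2 in group view n1({P1, P2, P3, P4}) = 4. The connection is lost (split brain) and a view change (VC) is attempted: n2({P1, P2}) = 2 < (n1/2) + 1 = 3.]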
The speaker says...
The failure of individual processes, or database replicas,
has been discussed. The model handles them following a
fail-stop approach.
To conclude the discussion of fault tolerance we look at a
situation called split brain: one half of the cluster has lost
its connection to the other half. Which shall survive? The
answer is often implementation dependent. For
example, the early Virtual Synchrony framework Isis has a
rule that a new group view can only be installed if it
contains n / 2 + 1 members with n being the number of
members in the current group. In the example both halves
would shut down. Brain splitting question: how many
replicas would you plan for a cluster if you don't know
the split brain implementation details?
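The n / 2 + 1 rule quoted for Isis is a one-liner. A minimal sketch with a hypothetical function name, using integer division for n / 2:

```python
def may_install_view(current_size: int, new_size: int) -> bool:
    """A new group view needs at least n / 2 + 1 members of the
    current view (n = current_size, integer division)."""
    return new_size >= current_size // 2 + 1


# The example from the slide: four processes split two and two.
print(may_install_view(4, 2))  # neither half may continue
# An odd cluster size always leaves one side with a majority.
print(may_install_view(5, 3), may_install_view(5, 2))
```

Under this rule an even split of an even-sized group stops both halves, which hints at the answer to the brain splitting question.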
In-core architecture
DIY: Hack MySQL (oh, oh), or...?
MySQL DBMS MySQL DBMS
Load Balancer
PECL/mysqlnd_ms MySQL Proxy
PHP PHP PHP
Reflector Reflector
Replicator Replicator
GCS
The speaker says...
Here's a generic architecture made of six components:
• Clients (PHP, Java, …) using well known interfaces
• Load Balancer (for example PECL/mysqlnd_ms)
• The actual database system
• The reflector allows inspection and modification of
ongoing transactions
• The (distributed) replicator handling concurrency
control
• The Group Communication System (GCS) provides
communication primitives such as multicast (GCS
examples: Appia, JGroups – Java, Spread – C/C++)
Middleware architecture
DIY: Hack MySQL (oh, oh), or...?
Virtual DBMS Virtual DBMS
Load Balancer
Clients
Reflector Reflector
Replicator Replicator
GCS
DBMS DBMS
The speaker says...
An in-core design requires support for a reflector by the
database. Strictly speaking, there is no such API inside MySQL
that one can use. The APIs used for MySQL Replication are not
sufficient. Nonetheless, MySQL Replication can be
classified as in-core in our model. Due to the lack of a
reflector API, the only third-party product following an
in-core design (Galera by Codership) has to patch the
MySQL core.
Tungsten Replicator by Continuent is a middleware
design. Clients contact a virtual database. Requests are
intercepted, parsed and replicated. The challenge is in the
interception: statements using non-deterministic calls such
as NOW() and CURTIME() must be taken care of.
Hybrid architecture
DIY: Hack MySQL (oh, oh), or...?
DBMS DBMS
Load Balancer
Clients
Reflector Plugin Reflector Plugin
Replicator Replicator
GCS
The speaker says...
In a hybrid architecture the reflector runs within the
database process but the replicator layer is using extra
processes.
It is not a perfect comparison as we will see later but for
the sake of our model, we can classify MySQL Cluster as a
hybrid architecture. The reflector is implemented as a
storage engine. The replicator layer is using extra processes.
This design has some neat MySQL NDB Cluster specific
benefits. If any MySQL product has NoSQL genes, it is
MySQL Cluster.
         Primary Copy                    Update Anywhere
Eager    Not available for MySQL         MySQL Cluster (Hybrid),
                                         Galera (In-core)
Lazy     MySQL Replication (In-core),    MySQL Cluster Replication (Hybrid)
         Tungsten (Middleware)
DIY: Summary
The speaker says...
Time for a summary before coding ants and compilers start
their work. From a DIY perspective we can skip Lazy
Primary Copy: it has simple concurrency control, it
does not depend on network speed, and it is great for flaky
and slow WAN connections, but it offers eventual
consistency only (hint: enjoy PECL/mysqlnd_ms!) and it has
no means to scale writes. And it exists, so no karma...
An eager update anywhere solution offering the highest
level of correctness (1SR) gives you strong consistency. It
scales writes to some degree because they can be
executed on any replica, which parallelizes execution load.
Commit performance is network bound.
[Chart: products placed by read scale-out and write scale-out capability, grouped into Full Replication and Partial Replication. Entries: MySQL Replication (Lazy Primary Copy, In-core); Tungsten (Primary Copy, Middleware); Galera (Eager Update Anywhere, In-core); MySQL Cluster (Eager Update Anywhere, Hybrid). Annotation: "If 1SR - hard limit".]
DIY: The Master Class
The speaker says...
The DIY Master Class for maximum karma is a partial
replication solution offering strong consistency. Partial
replication is the only way to ultimately scale write
requests. The explanation is simple: every write adds load
to the entire cluster. Remember that writes need to be
coordinated, and remember that concurrency control involves
all replicas (ROWA) or a good number of them (Quorum).
Thus, every additional replica adds load to all others. The
solution is to partition the data set and keep each partition
on a subset of all replicas only. NoSQL calls it sharding,
MySQL Cluster calls it partitioning. Partial replication:
that's the DIY masterpiece that will give you KARMA.
Availability
• Shared-nothing, High Availability (99.999%)
• WAN Replication to secondary data centers
Scalability
• Read and write through partial replication (partitioning)
• Distributed queries (parallelize work), real-time guarantees
• Focus on in-memory with disk storage extension
• Sophisticated thread model for multi-core CPUs
• Optimized for short transactions (hundreds of operations)
Distribution Transparency
• SQL level: 100%, low-level interfaces available
MySQL (NDB) Cluster goals
The speaker says...
I am not aware of textbooks discussing partial
replication theory in depth. Thus, we have to reverse
engineer an existing system. As this is a talk about
MySQL Cluster, how about finally talking about MySQL Cluster?
MySQL Cluster was originally developed to serve
telecommunication systems. It aims to parallelize work as
much as possible, hence it is a distributed database. It
started as an in-memory solution but can now store data on
disk as well. It runs best in environments offering low
network latency and high network throughput, with
applications issuing short transactions. Applications should
not demand complex joins. There is no chance that you throw
Drupal at it and Drupal runs super-fast out of the box!
Let's see why...
SQL view: Cluster is yet another table storage engine
MySQL Cluster is a hybrid
MySQL MySQL
Load Balancer
Clients
Reflector Plugin = NDB Storage Engine
Replicator = NDB Data Node
GCS
The speaker says...
MySQL Cluster has a hybrid architecture. It consists of the
green elements on the slide. The Reflector is
implemented as a MySQL storage engine. From a SQL
user's perspective, it is just another storage engine, similar
to MyISAM, InnoDB or others (Distribution Transparency).
From a SQL client perspective there is no change: all MySQL
APIs can be used. The Reflector (NDB Storage Engine) runs
as part of the MySQL process. The Replicator is a
separate process called NDB data node. Please note,
node means process, not machine. MySQL Cluster does not
fit perfectly in the model: an NDB data node combines
Replicator and storage tasks.
BTW, what happens to Cluster if a MySQL Server fails?
Fast low-level access: bypassing the SQL layer
MySQL Cluster is a beast
MySQL MySQL
Load Balancer
Clients
Reflector Plugin = NDB Storage Engine
Replicator = NDB Data Node
GCS
Clients
4.3b read tx/min
1.2b write tx/min
(in 2012)
The speaker says...
From the perspective of MySQL Cluster, a MySQL Server is
yet another application client. MySQL Server happens to be
an application that implements a SQL view on the relational
data stored inside the cluster.
MySQL Cluster users often bypass the SQL layer by
implementing application clients on their own. SQL is a rich
query language but parsing a SQL query can take 30...50%
of the total runtime of a query. Thus, bypassing is a good
idea. The top benchmark results we show for Cluster are
achieved using C/C++ clients directly accessing MySQL
Cluster. There are many extra APIs for this special case:
NDB API (C/C++, low level), ClusterJ (ORM style),
ClusterJPA (low level), … - even for node.js (ORM style)
Partitioning (auto-sharding)
Node Group 0
  NDB Data Node 1: Partition 0, Primary; Partition 2, Copy
  NDB Data Node 2: Partition 0, Copy; Partition 2, Primary
Node Group 1
  NDB Data Node 3: Partition 1, Primary; Partition 3, Copy
  NDB Data Node 4: Partition 1, Copy; Partition 3, Primary
The speaker says...
There is a lot to say about how MySQL Cluster partitions a
table and spreads it over nodes. The manual has all the
details, really all of them...
The key idea is to use an eager primary copy approach for
partitions combined with a mindful distribution of each
partition's primary and its copies. NDB supports zero or one
copies (replication factor). The failure of a partition's primary
does not cause a failure of the Cluster. In the example, the
failure of any one node has no impact. Also, nodes 1 and 4
may fail together without stopping the Cluster (fail-stop
model). But the cluster shuts down if all nodes of a node
group fail.
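The survival rule from the notes can be checked with a few lines. A minimal sketch with hypothetical names; the node numbers and node groups follow the example on the slide.

```python
def cluster_survives(node_groups: dict, failed: set) -> bool:
    """The cluster stays up as long as every node group keeps at
    least one running data node."""
    return all(
        any(node not in failed for node in members)
        for members in node_groups.values()
    )


groups = {0: [1, 2], 1: [3, 4]}  # node group -> NDB data nodes

print(cluster_survives(groups, {3}))     # any single node may fail
print(cluster_survives(groups, {1, 4}))  # one node per group may fail
print(cluster_survives(groups, {1, 2}))  # a whole node group is gone
```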
Concurrency Control: 2PL,“2PC“
[Diagram: the four NDB data nodes and their partitions as on the partitioning slide, with one write (W) and two reads (R) marked.]
The speaker says...
Buuuuh? Two-Phase-Locking (2PL) and Two-Phase-Commit
(2PC) are used for concurrency control. Cluster is using
traditional row locking to isolate transactions. Read and
write locks can be distributed throughout the cluster. The
locks are set on the primary partitions. Transactions are
serialized during execution. When a transaction commits, an
optimized Two-Phase-Commit is used to synchronize the
partition copies.
The SQL layer recognizes the commit as soon as the copies
are updated (and before logs have been written to disk).
The low-level NDB C/C++ application API is asynchronous.
Fire and forget is possible: your application can continue
before transaction processing has even begun!
Brain Masala
[Diagram: the four NDB data nodes and their partitions as on the partitioning slide, plus an Arbitrator.]
The speaker says...
The failure of a single node is detected using a heartbeat
protocol: details are documented, future improvements are
possible. Both MySQL Cluster and Virtual Synchrony
separate message delivery from node failure detection.
The worst case scenario of a brain split is cured by the
introduction of arbitrators. If the nodes split and each half
is able to keep the Cluster up, the nodes try to contact the
arbitrator. It is then up to the arbitrator to decide who stays
up and who shuts down. Arbitrators are extra processes,
ideally run on extra machines. Management nodes can act
as arbitrators too. You need at least one management node
for administration, thus you always have an arbitrator
readily available.
Drupal? Sysbench? Oh, oh...
[Diagram: the four NDB data nodes and their partitions as on the partitioning slide, plus a MySQL server.]
The speaker says...
Partial replication (here: partitioning, sharding) is the only
known solution to the write scale out problem. But it comes
at the high price of distributed queries.
A SQL query may require reading data from many partitions.
On the one hand, work is nicely parallelized over many nodes;
on the other hand, records found have to be transferred
within the cluster from one node to another. Although
Cluster tries to batch requests efficiently to
minimize communication delays, transferring data from node
to node to answer questions remains an expensive
operation.
Oh, oh... tune your partitions!
[Diagram: the four NDB data nodes and their partitions as on the partitioning slide, plus a MySQL server.]
CREATE TABLE cities (
  id INT NOT NULL,
  population INT UNSIGNED,
  city_name VARCHAR(100),
  PRIMARY KEY (city_name, id)
);
SELECT id FROM cities
WHERE city_name = 'Kiel';
The speaker says...
How much traffic and latency occurs depends on the actual
SQL query and the partitioning scheme. By default a table
is partitioned into 3840 virtual fragments (think
vBuckets) using its primary key. The partitioning can
and should be tuned.
Try to find partitioning keys that make your common,
expensive or time-critical queries run on a single node.
Assume you have a list of cities. City names are not unique,
thus you have introduced a numeric primary key. It is likely
that your most common query checks for the city name, not
only for the numeric primary key. Therefore, your
partitioning should be based on city name as well.
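Why the choice of partitioning key matters can be shown with a toy hash partitioner. This is a sketch, not how NDB computes partitions: the MD5-based hash, the four partitions and the sample rows are assumptions for illustration.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical, one per partition on the slide


def partition_for(*key_parts) -> int:
    """Map a partitioning key to a partition number with a stable hash."""
    raw = "|".join(str(part) for part in key_parts).encode("utf-8")
    digest = hashlib.md5(raw).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


rows = [(1, "Kiel"), (2, "Kiel"), (3, "Berlin"), (4, "Kiel")]

# Partitioning on city_name only: all rows named Kiel share a partition.
by_name = {partition_for(name) for row_id, name in rows if name == "Kiel"}

# Partitioning on (city_name, id): rows named Kiel may be spread out,
# so WHERE city_name = 'Kiel' may have to visit several partitions.
by_name_and_id = {partition_for(name, row_id) for row_id, name in rows if name == "Kiel"}

print(len(by_name), len(by_name_and_id))
```

With the key restricted to the city name the lookup touches exactly one partition, which is the single-node behaviour the notes ask for.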
The ultimate Key-Value-Store?
[Diagram: the four NDB data nodes and their partitions as on the partitioning slide, plus a MySQL server.]
CREATE TABLE cities (
  id INT NOT NULL,
  city_name VARCHAR(100),
  PRIMARY KEY (id)
);
SELECT * FROM cities
WHERE id = 1;
SELECT * FROM cities
WHERE id = 100;
The speaker says...
I may have stated it before: if there is any product at
MySQL that can compete with NoSQL (as in Key-Value-
Store) on the issue of distributed data stores, it is MySQL
Cluster.
An optimal query load for MySQL Cluster is one that
primarily performs lookups on partition keys. Each query will
execute on one node only. There is little traffic within the
cluster and thus little network overhead. The workload is
perfectly parallelized.
Will your unmodified PHP application perform on Cluster?
Joins: 24...70x faster
SELECT t1.a, t2.b FROM t1, t2
WHERE t1.pk = 1 AND t1.a = t2.pk

Then:
NDB_API> read a from table t1 where pk = 1
[round trip]
(a = 15)
NDB_API> read b from table t2 where pk = 15
[round trip]
(b = 30)
[return a = 15, b = 30]

Now:
NDB_API> read @a=a from table t1 where pk = 1;
         read b from table t2 where pk = @a
[round trip]
The speaker says...
In 7.2 we claim certain joins to execute 24...70x faster with
the help of AQL (Adaptive Query Localization, join push-down)! How come?
Partial replication does not go together well with joins. Take
this simple nested join as an example. There are two tables
to join. The join condition of the second table depends on
the values of the first table. Thus, t1 has to be searched
before t2 can be searched and the result can be returned to
the user. That makes two operations and two round trips.
As of 7.2, there is a new batched way of doing it that saves
round trips. In the extreme, avoiding round trips makes a
join 24...70x faster: the network is your enemy #1.
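The "Then" and "Now" traces can be mimicked with a toy model that counts round trips. This is only a sketch: `FakeDataNodes`, `read` and `read_linked` are hypothetical names and not part of the NDB API, and the tables hold just the two rows from the slide.

```python
class FakeDataNodes:
    """Stand-in for the data nodes; counts round trips made by the SQL node."""

    def __init__(self, t1, t2):
        self.tables = {"t1": t1, "t2": t2}
        self.round_trips = 0

    def read(self, table, pk):
        """One primary key lookup costs one round trip."""
        self.round_trips += 1
        return self.tables[table][pk]

    def read_linked(self, first_table, pk, second_table):
        """Both lookups are shipped in one request and chained inside the cluster."""
        self.round_trips += 1
        a = self.tables[first_table][pk]
        b = self.tables[second_table][a]
        return a, b


def join_then(nodes):
    """Old style: the result of the first read is needed to issue the second."""
    a = nodes.read("t1", 1)
    b = nodes.read("t2", a)
    return a, b


def join_now(nodes):
    """Batched style: the dependent read travels together with the first one."""
    return nodes.read_linked("t1", 1, "t2")


t1 = {1: 15}   # pk -> a
t2 = {15: 30}  # pk -> b

then_nodes = FakeDataNodes(t1, t2)
now_nodes = FakeDataNodes(t1, t2)
then_result = join_then(then_nodes)
now_result = join_now(now_nodes)

print(then_result, then_nodes.round_trips)
print(now_result, now_nodes.round_trips)
```

Both variants return the same row. The difference is the number of network round trips, and it grows with every additional row and table taking part in the join.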
Benchmark pitfall: connections
NDB Data Node 1 NDB Data Node 2
NDB Data Node 3 NDB Data Node 4
MySQL
Load Balancer
Many, many clients
MySQL
NDB Storage Engine NDB Storage Engine
The speaker says...
If you ever come to the point of evaluating MySQL Cluster,
make sure you configure the MySQL-to-Cluster connections
appropriately (ndb_cluster_connection_pool).
A MySQL Server with only one connection (default setting)
from itself to the cluster may not be able to serve many
concurrent clients at the rate the Cluster part itself might be
able to handle them. The connection may impose an
artificial limitation on the cluster throughput.
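The pitfall is a plain bottleneck, which a back-of-the-envelope model makes visible. The function name and all numbers below are made up for illustration; they are not measurements of MySQL Cluster.

```python
def achievable_throughput(connections, per_connection_tps, cluster_tps):
    """Throughput seen by clients: the smaller of what the MySQL-to-Cluster
    connections can carry and what the data nodes can process."""
    return min(connections * per_connection_tps, cluster_tps)


# Hypothetical figures, chosen only to show the effect.
PER_CONNECTION_TPS = 20_000
CLUSTER_TPS = 100_000

single = achievable_throughput(1, PER_CONNECTION_TPS, CLUSTER_TPS)
pooled = achievable_throughput(8, PER_CONNECTION_TPS, CLUSTER_TPS)

print(single, pooled)
```

With one connection the benchmark measures the connection, not the cluster. With a large enough pool the data nodes become the limit again.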
Adding nodes, rebalancing
NDB Data Node 1 NDB Data Node 2
NDB Data Node 3 NDB Data Node 4
Partitions Partitions
Partitions Partitions
NDB Data Node 5 NDB Data Node 6
The speaker says...
Adding nodes, growing the capacity of your cluster in terms
of size and computing power, is an online operation. At any
time you can add nodes to your cluster.
New nodes do not immediately participate in
operations. You have to tell the cluster what to do with
them: use for new tables, or use for growing the capacity
available to existing tables. When growing existing tables,
data needs to be redistributed to the new nodes.
Rebalancing is an online operation: it does not block
clients. The partitioning algorithm used by Cluster ensures
that data is copied to new nodes only. There is no
traffic between nodes currently holding fragments of
the table to be rebalanced.
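One way to get this property is a fixed map from hash buckets to partitions, where growing the cluster only reassigns some buckets to the new partitions. The sketch below assumes such a bucket map with the 3840 virtual fragments mentioned earlier. `initial_map` and `grow_map` are hypothetical helpers, and the reassignment rule is a simplification, not Cluster's actual algorithm.

```python
NUM_BUCKETS = 3840  # virtual fragments, as mentioned earlier in the talk


def initial_map(num_partitions):
    """Spread the buckets round-robin over the partitions."""
    return [bucket % num_partitions for bucket in range(NUM_BUCKETS)]


def grow_map(old_map, old_partitions, new_partitions):
    """Hand just enough buckets to the new partitions to even out the load.

    A bucket either stays where it is or moves to a new partition.
    It never moves between two old partitions.
    """
    target = NUM_BUCKETS // new_partitions
    new_map = list(old_map)
    counts = [old_map.count(p) for p in range(old_partitions)]
    counts += [0] * (new_partitions - old_partitions)
    receiver = old_partitions  # first new partition
    for bucket, owner in enumerate(old_map):
        if receiver >= new_partitions:
            break  # every new partition has its share
        if counts[owner] > target:
            new_map[bucket] = receiver
            counts[owner] -= 1
            counts[receiver] += 1
            if counts[receiver] >= target:
                receiver += 1
    return new_map


before = initial_map(4)
after = grow_map(before, 4, 6)

moves = [(old, new) for old, new in zip(before, after) if old != new]
per_partition = [after.count(p) for p in range(6)]

print(len(moves), per_partition)
```

Every moved bucket ends up on partition 4 or 5, so the old nodes only send data and never exchange it among themselves.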
We shall...
• Code an Eager Update-Anywhere Cluster
• Prefer a hybrid design to get not too deep into MySQL
• Do not fear the lack of text books on partial replication
• Read CPU vendor tuning guides like comics
• Like Sweden or Finland
Send your application to the MySQL Cluster team.
Cluster is different. MySQL Cluster is perfect for web
session storage. Whether your Drupal, WordPress, …
runs faster is hard to tell – possibly not faster.
PS (marketing fluff): ask Sales for a show!
DIY - Summary
The speaker says...
By the end of this talk you should remember at least this:
● There are four kinds of replication solutions based on a
matrix asking "where can all transactions run" and "when
are replicas synchronized"
● Clusters don't make everything faster – the network is
your enemy. For read scale out there are proven
solutions.
● Write scale out is only possible through partial replication
(Small write Quorum would impact read performance)
THE END
Contact: ulf.wendel@oracle.com
The speaker says...
Thank you for your attendance!
Upcoming shows:
Talk&Show! (ask... :-))
YourPlace, any time
PHP Summit
Munich, December 2013

Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

DIY MySQL Cluster talk

  • 1. MySQL Cluster talk DIY No Best Practices No Product Presentation … you have been warned. No marketing fluff
  • 2. Foreword and disclaimer Do it yourself, become a maker, get famous! In this course you will learn how to create an eager update anywhere cluster. You need: ● A soldering iron, solder ● Wires (multiple colors recommended) ● A collection of computers By the end of the talk you can either challenge MySQL, or get MySQL Cluster for free – it's Open Source, as it has always been. Get armed with the distributed system theory you, as a developer, need to master any distributed database.
  • 3. DIY – Distributed Database Cluster, or: MySQL Cluster Ulf Wendel, MySQL/Oracle No marketing fluff
  • 5. The speaker says... Beautiful work, but unfortunately the DIY troubles begin before the first message has been delivered in our cluster. Long before we can speak about the latest hat fashion, we have to fix wiring and communication! Communication should be: • Fast • Reliable (loss, retransmission, checksum, ordering) • Secure Network performance is a limiting factor for distributed systems. Hmm, we had better go back to the drawing board before we mess up more computers...
  • 6. Availability • Cluster as a whole unaffected by loss of nodes Scalability • Geographic distribution • Scale size in terms of users and data • Database specific: read and/or write load Distribution Transparency • Access, Location, Migration, Relocation (while in use) • Replication • Concurrency, Failure Back to the beginning: goals
  • 7. The speaker says... A distributed database cluster strives for maximum availability and scalability while maintaining distribution transparency. MySQL Cluster has a shared-nothing design good enough for 99.999% availability (five minutes downtime per year). It scales from a Raspberry Pi run in a briefcase to 1.2 billion write transactions per minute on a cluster of 30 data nodes (if using possibly unsupported bleeding edge APIs). It offers full distribution transparency with the exception of partition relocation, which must be triggered manually but is performed transparently by the cluster. That's the bar to beat. Let's learn what kind of clusters exist, how they tick and what the best algorithms are.
  • 8. What kind of cluster? Where are transactions run: Primary Copy or Update Anywhere? When does synchronization happen: Eager or Lazy? Eager + Primary Copy: not available for MySQL. Eager + Update Anywhere: MySQL Cluster, 3rd party. Lazy + Primary Copy: MySQL Replication, 3rd party. Lazy + Update Anywhere: MySQL Cluster Replication.
  • 9. The speaker says... A wide range of clusters can be categorized by asking where transactions are run and when replicas synchronize their data. Any eager solution ensures that all replicas are synchronized at any time: it offers strong consistency. A transaction cannot commit before synchronization is done. Please note what this means for transaction rates: • Single computer tx rate ~ disk/fsync rate • Lazy cluster tx rate ~ disk/fsync rate • Eager cluster tx rate ~ network round-trip time (RTT) Test: Would you deploy MySQL Cluster on Amazon EC2 :-) ?
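As a back-of-the-envelope illustration of the rates listed above, the following Python sketch assumes a single client that commits strictly one transaction after another, each commit waiting for exactly one fsync or one network round trip. The function name and all latency figures are illustrative assumptions, not measurements from the talk.

```python
def max_serial_tx_rate(latency_seconds: float) -> float:
    """Upper bound on commits per second when every commit waits for one
    operation (an fsync or a network round trip) of the given latency."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / latency_seconds


# Hypothetical latencies, chosen only to show the orders of magnitude.
ASSUMED_LATENCIES = {
    "fsync on a spinning disk": 0.005,    # 5 ms (assumed)
    "RTT inside one data center": 0.0002, # 0.2 ms (assumed)
    "RTT across a WAN": 0.05,             # 50 ms (assumed)
}

if __name__ == "__main__":
    for label, latency in ASSUMED_LATENCIES.items():
        rate = max_serial_tx_rate(latency)
        print(f"{label}: at most {rate:.0f} tx/s per serial client")
```

Under these assumptions an eager cluster stretched over a slow or shared network is limited by the round trip rather than by the disk, which is the point behind the EC2 quiz question.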
  • 10. Lazy Primary Copy we have... [Diagram: writes go to the Master (Primary); reads go to the Slaves (Copies).] Lazy synchronization: eventual consistency Primary Copy: where any transaction may run
  • 11. The speaker says... MySQL Replication falls into the category of lazy Primary Copy clusters. It is a rather inflexible solution as all updates must be sent to the primary. However, this simplifies concurrency control of conflicting, concurrent update transactions. Concurrency control is no different from a single database. Lazy replication can be fast. Transactions don't have to wait for synchronization of replicas. The price of the fast execution is the risk of stale reads and eventual consistency. Transactions can be lost when the primary crashes after commit and before any copy has been updated. (This is something you can avoid by using MySQL semi-sync replication, which delays the commit until delivery to a copy.)
  • 12. BTW, confusing: Multi-Master Master (Primary) Slave (Copy) Master (Primary) Slave (Copy) SET A = 1 SET B = 1 A, B A, B
  • 13. The speaker says... Be wary of the term Multi-Master. The MySQL community sometimes uses it to describe a set of Primary Copy clusters where the primaries (masters) replicate from each other. This is one of the many possible topologies that you can build with MySQL Replication. In the example, the PC cluster on the left manages table A and the PC cluster on the right manages table B. The primaries copy table A and table B, respectively, from each other. There is no concurrency control and conflicts can arise. There is no distribution transparency. This is not a distinct kind of cluster with regard to our where and when criteria. And, it is rarely what you want... Not a good goal for DIY – let's move on.
  • 14. Let's do Eager Update Anywhere [Diagram: four replicas; one takes a write, another serves a read.] Eager synchronization: strong consistency Update Anywhere: any transaction can run on any replica
  • 15. The speaker says... An eager update anywhere cluster improves distribution transparency and removes the risk of reading stale data. Transparency and flexibility are improved because any transaction can be directed to any replica. Synchronization happens as part of the commit, thus strong consistency is achieved. Remember: transaction rate ~ network RTT. Failure tolerance is better than with Primary Copy. There is no single point of failure (the primary) that can cause a total outage of the cluster. Nodes may fail without bringing the cluster down immediately. Concurrency control (synchronization) is complex as concurrent transactions from different replicas may conflict.
  • 16. Concurrency Control: 1SR [Diagram: at t0 one replica executes SET a = 1 while another executes SET a = 2; the replicas are shown holding a = 1 or a = 2.] One-Copy-Serializability (1SR) for correctness • All replicas must decide on the same transaction order
  • 17. The speaker says... Concurrent ACID transactions must be isolated from each other to ensure correctness. The database system needs a mechanism to detect conflicts. If there are any, transactions need to be serialized. The challenge is to have all replicas commit transactions in the same serial order. One-Copy-Serializability (1SR) demands the concurrent execution of transactions in a replicated database to be equivalent to a serial execution of these transactions over a single logical copy of the database. 1SR is the highest level of consistency; lower levels exist, for example snapshot isolation. Given that, the questions are: • How to detect conflicting transactions? • How to enforce a global total order?
  • 18. Certification: detect conflict Replica Update transaction Replica Read query Replica Read set: a = 1 Write set: b = 12 Transactions get executed and certified before commit • Conflict detection is based on read and write sets • Multi-Primary deferred update Certification Certification
  • 19. The speaker says... (For brevity we discuss multi-primary deferred update only.) In a multi-primary deferred update system a read query can be served by a replica without consulting any of the other replicas. A write transaction must be certified by all other replicas before it can commit. During the execution of the transaction, the replica records all data items read and written. The read/write sets are then forwarded by the replica to all other replicas to certify the remote transaction. The other replicas check whether the remote transaction includes data items modified by an active local transaction. The outcome of the certification decides on commit or abort. Either symmetric (statement based) or asymmetric (row based) replication can be used.
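The certification check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Transaction` class, the `certify` function and the use of plain item names for read and write sets are hypothetical and not taken from any MySQL product.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def certify(remote: Transaction, active_local: list) -> bool:
    """Return True (vote commit) if no active local transaction has
    modified a data item that the remote transaction read or wrote."""
    touched = remote.read_set | remote.write_set
    return all(not (touched & local.write_set) for local in active_local)


if __name__ == "__main__":
    # The remote transaction from the slide: it read a and wrote b.
    remote = Transaction("remote", read_set={"a"}, write_set={"b"})
    harmless = Transaction("local-1", write_set={"c"})
    conflicting = Transaction("local-2", write_set={"a"})
    print(certify(remote, [harmless]))               # no overlap
    print(certify(remote, [harmless, conflicting]))  # overlap on item a
```

Each replica runs this check against its own active transactions; the votes still have to be combined, which is what the synchronization mechanisms on the following slides are for.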
  • 20. Concurrency Control [Diagram: at t0 one replica executes SET a = 1 while another executes SET a = 2; the replicas are shown holding a = 1 or a = 2.] Various synchronization mechanisms • Atomic commit • Atomic broadcast • Strict two-phase locking (2PL) • Optimistic, Physical clock, Lamport's clock, vector clock...
  • 21. The speaker says... One challenge remains: replicas must agree on a global total order for committing transactions no matter in which order they receive messages. We will discuss atomic commit (two-phase-locking) and atomic broadcast. The other approaches are out of scope.
  • 22. Atomic commit for CC Execute Committing PreCommit Aborted Committed Formula (background): serial execution, unnecessary aborts
  • 23. The speaker says... Atomic commit can be expressed as a state machine with the final states abort and commit. Once a transaction has been executed, it enters the committing state in which certification/voting takes place. Given the absence of conflicting concurrent transactions, a replica sets the transaction's status to precommit. If all replicas precommit, the transaction is committed, otherwise it is aborted. Don't worry about the formula. It checks for concurrent transactions – as we did before – and ensures, in case of conflicts, that only one transaction can commit at a time. Problem: it may also do unnecessary aborts depending on message delivery order as it requires all servers to go from precommit to commit in the same order.
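The state machine described above can be written down as a small Python sketch. The transition table is an assumption read off the state names on the slide (Execute, Committing, PreCommit, Aborted, Committed); the names `State`, `ALLOWED`, `step` and `global_outcome` are hypothetical.

```python
from enum import Enum


class State(Enum):
    EXECUTE = "execute"
    COMMITTING = "committing"
    PRECOMMIT = "precommit"
    ABORTED = "aborted"
    COMMITTED = "committed"


# Assumed transitions; ABORTED and COMMITTED are final states.
ALLOWED = {
    State.EXECUTE: {State.COMMITTING},
    State.COMMITTING: {State.PRECOMMIT, State.ABORTED},
    State.PRECOMMIT: {State.COMMITTED, State.ABORTED},
    State.ABORTED: set(),
    State.COMMITTED: set(),
}


def step(current: State, target: State) -> State:
    """Move to the target state or fail if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


def global_outcome(replica_states: list) -> State:
    """The transaction commits only if every replica reached PRECOMMIT."""
    if not replica_states:
        raise ValueError("need at least one replica")
    if all(state is State.PRECOMMIT for state in replica_states):
        return State.COMMITTED
    return State.ABORTED


if __name__ == "__main__":
    print(global_outcome([State.PRECOMMIT, State.PRECOMMIT]).name)
    print(global_outcome([State.PRECOMMIT, State.ABORTED]).name)
```

The sketch leaves out the part that causes unnecessary aborts: the ordering of precommit and commit messages across servers.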
  • 24. Atomic broadcast for CC Atomic broadcast guarantees • Agreement: if one server delivers a message, all will • Total order: all servers deliver messages in the same order Greatly simplified concurrency check • Deterministic: no extra communication after local decision
  • 25. The speaker says... Atomic broadcast ensures that transactions are delivered in the same order to all replicas. Thus, certification of transactions is deterministic: all replicas will make the same decision about commit or abort because they all base their decision on the same facts. This in turn means that there is no need to coordinate the decisions of all replicas – all replicas will make the same decision. A transaction does not conflict and thus will commit if it is executed after the commit of any other transaction, or if its read set does not overlap with the write set of any other transaction. The formula is greatly simplified! Great for DIY!
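To see why total-order delivery makes certification deterministic, consider the following Python sketch. It assumes each transaction carries a hypothetical `snapshot` counter: the number of commits it had already seen when it executed. The log format and the function name `certify_log` are illustrative assumptions, not an existing API.

```python
def certify_log(log):
    """Certify transactions in delivery order.

    log: list of (name, snapshot, read_set, write_set) tuples, where
    snapshot is the number of commits the transaction had seen when it ran.
    Returns a dict mapping each transaction name to "commit" or "abort".
    """
    committed_write_sets = []  # in commit order
    decisions = {}
    for name, snapshot, read_set, write_set in log:
        # Only transactions committed after our snapshot are concurrent.
        concurrent = committed_write_sets[snapshot:]
        ok = all(not (read_set & ws) for ws in concurrent)
        decisions[name] = "commit" if ok else "abort"
        if ok:
            committed_write_sets.append(write_set)
    return decisions


if __name__ == "__main__":
    delivered = [
        ("T1", 0, {"a"}, {"a"}),
        ("T2", 0, {"a"}, {"a"}),  # ran concurrently with T1 and read a
        ("T3", 1, {"a"}, {"b"}),  # ran after T1 committed
    ]
    replica_1 = certify_log(delivered)
    replica_2 = certify_log(delivered)
    print(replica_1)
    print(replica_1 == replica_2)
```

Because every replica feeds the same ordered log into the same pure function, no extra voting round is needed after the local decision.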
  • 26. Voting quorum: ROWA, or...? Read-One Write-All is a special quorum • Quorum constraints: NR + NW > N, NW > N/2 Example: N = 12, read quorum NR = 3, write quorum NW = 10 Example: N = 3, read quorum NR = 2, write quorum NW = 2
  • 27. The speaker says... So far we have silently assumed a Read-One Write-All (ROWA) quorum for voting. Reads could be served locally because updates have been applied to all replicas. Alternatively, we could make a rule that an update has to be agreed by and applied to half of the replicas plus one. This may be faster than achieving agreement among all replicas. However, for a correct read we now have to contact half of the replicas plus one and check whether they all give the same reply. If so, we must have read the latest version as the remaining, unchecked replicas form a minority that cannot be updated. The read quorum overlaps the write quorum by at least one element.
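The quorum constraints from the slide (NR + NW > N and NW > N/2) are easy to check mechanically. The Python sketch below uses hypothetical helper names; the example values are the ones from the slide.

```python
def is_valid_quorum(n: int, read_quorum: int, write_quorum: int) -> bool:
    """Check NR + NW > N (reads overlap writes) and NW > N/2 (writes overlap writes)."""
    return (read_quorum + write_quorum > n) and (2 * write_quorum > n)


def rowa(n: int) -> tuple:
    """Read-One Write-All expressed as a (read, write) quorum pair."""
    return 1, n


def majority(n: int) -> tuple:
    """Half of the replicas plus one for both reads and writes."""
    quorum = n // 2 + 1
    return quorum, quorum


if __name__ == "__main__":
    print(is_valid_quorum(12, 3, 10))          # first example from the slide
    print(is_valid_quorum(3, 2, 2))            # second example from the slide
    print(is_valid_quorum(12, *rowa(12)))      # ROWA as a special quorum
    print(is_valid_quorum(12, 3, 9))           # 3 + 9 is not greater than 12
```

The first constraint is what guarantees that any read quorum shares at least one replica with the latest write quorum.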
  • 28. Voting quorum: ROWA! ROWA almost always performs better • Are Quorums an Alternative for Data Replication? (Jimenez-Peris et al.) • „The obvious conclusion from these results is that ROWAA is the best choice for a wide range of application scenarios. It offers good scalability (within the limitations of replication protocols), very good availability, and an acceptable communication overhead. It also has the significant advantage of being very simple to implement and very easy to adapt to configuration changes. For very peculiar loads and configurations, it is possible that some variation of quorum does better than ROWAA.“ • Background: scale out results from study
  • 29. The speaker says... Judging from the paper, ROWA, or rather Read-One Write-All-Available (ROWAA), is a promising approach. For example, it offers linear scalability for read only workloads but still remains competitive for mixed update and read loads. It requires a high write-to-read ratio before the various Quorum algorithms outperform ROWA on scalability. In sum: ROWA beats Quorums by a magnitude for read but does not drop by a magnitude for write, and the web is read dominated. Scalability is one aspect. Quorums also help with availability – the study's finding is similar: ROWA is fine. DIY decision on concurrency control: ROWA, atomic broadcast. Quiz: name a system using Quorums? Riak! Next: Availability and Fault Tolerance.
  • 30. Complex failure handling required • Later evolution: Three-Phase Commit (3PC) Fault Tolerance: 2PC Coordinator Participant Participant Vote Request PreCommit PreCommit Vote Request Global Commit Commit
  • 31. The speaker says... When discussing atomic commit we have effectively shown the Two-Phase Commit (2PC) protocol. 2PC starts with a vote request multicast from a coordinator to all participants. The participants either vote to commit (precommit) or abort. Then, the coordinator checks the voting result. If all voted to commit, it sends a global commit message and the participants commit. Otherwise the coordinator sends a global abort command. Various issues may arise in case of network or process failures. Some cannot be cured using timeouts. For example, consider the situation when a participant precommits but gets no global commit or global abort. The participant cannot unilaterally leave the state. At best, it can ask another participant what to do.
  • 32. Two-Phase Commit is a blocking protocol Fault Tolerance: 2PC Coordinator Participant Participant Vote Request PreCommit PreCommit Vote Request
  • 33. The speaker says... The worst case scenario is a crash of the coordinator after all participants have voted to precommit. The participants cannot leave the precommit state before the coordinator has recovered. They do not know whether all of them have voted to commit or not. Thus, they do not know whether a global commit or global abort has to be performed. As none of them has received a message about the outcome of the voting, the participants cannot contact one another and ask for the outcome. Two-Phase Commit is also known as a blocking protocol.
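The vote/decide exchange of 2PC and the blocking case just described can be played through with a toy Python simulation. Everything here is a simplifying assumption: there is no real network, votes are given up front, and the function name `two_phase_commit` and the message strings are hypothetical.

```python
def two_phase_commit(votes: dict, coordinator_crashes: bool = False):
    """Simulate one 2PC round.

    votes: maps participant name -> True (vote commit) or False (vote abort).
    Returns (decision, participant_states). The decision is None if the
    coordinator crashed after collecting votes and before announcing it.
    """
    if not votes:
        raise ValueError("need at least one participant")

    # Phase 1: every participant answers the vote request.
    states = {name: ("precommit" if vote else "abort")
              for name, vote in votes.items()}

    if coordinator_crashes:
        # No global message is ever sent. Participants in precommit
        # cannot leave that state on their own: they are blocked.
        return None, states

    # Phase 2: the coordinator announces the global decision.
    decision = "global-commit" if all(votes.values()) else "global-abort"
    final = "commit" if decision == "global-commit" else "abort"
    return decision, {name: final for name in votes}


if __name__ == "__main__":
    print(two_phase_commit({"p1": True, "p2": True}))
    print(two_phase_commit({"p1": True, "p2": False}))
    print(two_phase_commit({"p1": True, "p2": True}, coordinator_crashes=True))
```

In the last call both participants are left in precommit with no decision available, which is the worst case from the slide: asking each other does not help because neither has seen the outcome.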
  • 34. Reliable multicast/broadcast • Build on the idea of group views and view changes Virtual Synchrony P1 P2 P3 P4 M1 M2 VC M3 M4 G1 = {P1, P2, P3} G2 = {P1, P2, P3, P4}
  • 35. The speaker says... Virtual Synchrony is a mechanism that does not block. It is built around the idea of associating multicast messages with the notion of a group. A message is delivered to all members of a group but no other processes. Either the message is delivered to all members of a group or to none of them. All members of the group agree that they are part of the group before the message is multicast (group view). In the example, M1...3 are associated with the group G1 = {P1, P2, P3}. If a process wants to join or leave a group, a view change message is multicast. In the example, P4 wants to join the group and a VC message is sent while M3 is still being delivered. Virtual Synchrony requires that either M3 is delivered to all of G1 before the view change takes place or to none.
  • 36. View changes act as a message barrier • Remember the issues with 2PC …? Virtual Synchrony P1 P2 P3 P4 M5 VC M6 G2 = {P1, P2, P3, P4} G3 = {P1, P2, P3} M7 M8
  • 37. The speaker says... There is only one condition under which a multicast message is allowed not to be delivered: if the sender crashed. Assume the processes continue working and multicast messages M5, M6, M7 to group G2 = {P1, P2, P3, P4}. While P4 sends M7 it crashes. P4 has managed to deliver its message to {P3}. The crash of P4 is noticed and a view change is triggered. Because Virtual Synchrony requires a message to be delivered to all members of the group associated with it but the sender crashed, P3 is free to drop M7 and the view change can take place. A new group view G3 is established and messages can be exchanged again.
  • 38. Wire: message ordering and fault tolerance • Common choices: UDP or TCP over IP Reliable, delivered vs. received [Diagram: Update 1 and Update 2 are sent to two replicas. One replica sees t1: Update 1, t2: Update 2; the other sees t1: Update 2, t2: Update 1 (lost).]
  • 39. The speaker says... Virtual Synchrony offers reliable multicast. Reliability can be best achieved using a protocol higher up on the OSI model. Isis, an early framework implementing Virtual Synchrony, used TCP point to point connections if reliable service was requested. TCP is a connection oriented protocol (endpoint failures can be detected easily) with error handling and message delivery in the order sent. However, using TCP only there are no ordering constraints between messages from any two senders. Those ordering constraints have to be implemented at the application layer. We say a message can be received on the network layer in a different order than it is delivered to the application by the model discussed. Vector clocks can be used for global total ordering.
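The difference between receiving and delivering a message can be illustrated with a hold-back queue. Note the swapped-in technique: instead of the vector clocks mentioned above, this Python sketch assumes a hypothetical sequencer that stamps every message with a global sequence number. The class name `HoldBackQueue` and its methods are illustrative assumptions.

```python
class HoldBackQueue:
    """Buffers messages received from the network and delivers them to the
    application strictly in sequence-number order."""

    def __init__(self):
        self.next_seq = 0    # sequence number to be delivered next
        self.pending = {}    # received but not yet deliverable
        self.delivered = []  # what the application has seen, in order

    def receive(self, seq: int, message: str) -> None:
        """Called when the network hands over a message, in any order."""
        self.pending[seq] = message
        # Deliver as long as the next expected message is available.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


if __name__ == "__main__":
    replica_a, replica_b = HoldBackQueue(), HoldBackQueue()

    # Replica A receives the updates in the order sent.
    replica_a.receive(0, "Update 1")
    replica_a.receive(1, "Update 2")

    # Replica B receives them reversed; Update 2 is held back.
    replica_b.receive(1, "Update 2")
    print(replica_b.delivered)
    replica_b.receive(0, "Update 1")

    print(replica_a.delivered == replica_b.delivered)
```

Both replicas end up delivering the same sequence even though the network handed them the messages in different orders.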
  • 40. AB = Virtual Synchrony offering total-order delivery • „Synchrony“ does not refer to temporal aspects Atomic broadcast definition P1 P2 P3 P4 M1 M2 Unordered delivery Ordered delivery P1 P2 P3 P4 M1 M2
  • 41. The speaker says... Atomic broadcast means Virtual Synchrony used with total-order message ordering. When Virtual Synchrony was introduced back in the mid 80s, it was explicitly designed to allow other message orderings. For example, it should be able to support distributed applications that have a notion of finding messages that commute, and thus may be applied in an order different from the order sent to improve performance. If events are applied in different order on different processes, the system cannot be called synchronous any more – the inventors called it virtually synchronous. However, recall we are only after total-ordering for 1SR.
  • 42. Wash the brain without marketing fluff, split brain, done! • System dependent... E.g. Isis failure detector was very basic How to cook brains P1 P2 P3 P4 M1 M2 n1({P1, P2, P3, P4}) = 4 VC Split brain – Connection lost n2({P1, P2}) = 2 < (n1/2) + 1
  • 43. The speaker says... The failure of individual processes – or database replicas – has been discussed. The model has measures to handle them following a fail-stop approach. To conclude the discussion of fault tolerance we look at a situation called split brain: one half of the cluster has lost its connection to the other half. Which shall survive? The answer is often implementation dependent. For example, the early Virtual Synchrony framework Isis has a rule that a new group view can only be installed if it contains n / 2 + 1 members with n being the number of members in the current group. In the example both halves would shut down. Brain splitting question: how many replicas would you project for a cluster if you don't know split brain implementation details?
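The n / 2 + 1 rule quoted for Isis can be tried out with a tiny Python sketch. The function name is hypothetical, and reading n / 2 as integer division for odd group sizes is an assumption.

```python
def may_install_view(current_size: int, new_size: int) -> bool:
    """A new group view may be installed only if it keeps at least
    n / 2 + 1 members of the current view (integer division assumed)."""
    return new_size >= current_size // 2 + 1


if __name__ == "__main__":
    # Four replicas split into two halves of two: neither half survives.
    print(may_install_view(4, 2), may_install_view(4, 2))
    # Five replicas split three to two: the larger side survives.
    print(may_install_view(5, 3), may_install_view(5, 2))
```

Under this rule an even split always shuts down both halves, which hints at one answer to the quiz question: an odd number of replicas.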
  • 44. In-core architecture DIY: Hack MySQL (oh, oh), or...? MySQL DBMS MySQL DBMS Load Balancer PECL/mysqlnd_ms MySQL Proxy PHP PHP PHP Reflector Reflector Replicator Replicator GCS
  • 45. The speaker says... Here's a generic architecture made of five components: • Clients (PHP, Java, …) using well known interfaces • Load Balancer (for example PECL/mysqlnd_ms) • The actual database system • The reflector allows inspection and modification of on- going transactions • The (distributed) replicator handling concurrency control • The Group Communication System (GCS) provides communication primitives such as multicast (GCS examples: Appia, JGroups – Java, Spread – C/C++)
  • 46. Middleware architecture DIY: Hack MySQL (oh, oh), or...? Virtual DBMS Virtual DBMS Load Balancer Clients Reflector Reflector Replicator Replicator GCS DBMS DBMS
  • 47. The speaker says... An in-core design requires support for a reflector by the database. Strictly speaking there is no API inside MySQL one can use. The APIs used for MySQL Replication are not sufficient. Nonetheless, MySQL Replication can be classified as in-core in our model. Due to the lack of a reflector API, the only third party product following an in-core design (Galera by Codership) has to patch the MySQL core. Tungsten Replicator by Continuent is a Middleware design. Clients contact a virtual database. Requests are intercepted, parsed and replicated. The challenge is in the interception: statements using non-deterministic calls such as NOW() and TIME() must be taken care of.
  • 48. Hybrid architecture DIY: Hack MySQL (oh, oh), or...? DBMS DBMS Load Balancer Clients Reflector Plugin Reflector Plugin Replicator Replicator GCS
  • 49. The speaker says... In a hybrid architecture the reflector runs within the database process but the replicator layer is using extra processes. It is not a perfect comparison as we will see later but for the sake of our model, we can classify MySQL Cluster as a hybrid architecture. The reflector is implemented as a storage engine. The replicator layer is using extra processes. This design has some neat MySQL NDB Cluster specific benefits. If any MySQL product has NoSQL genes, it is MySQL Cluster.
  • 50. DIY: Summary Eager + Primary Copy: not available for MySQL. Eager + Update Anywhere: MySQL Cluster (Hybrid), Galera (In-core). Lazy + Primary Copy: MySQL Replication (In-core), Tungsten (Middleware). Lazy + Update Anywhere: MySQL Cluster Replication (Hybrid).
  • 51. The speaker says... Time for a summary before coding ants and compilers start their work. From a DIY perspective we can skip Lazy Primary Copy: it has simple concurrency control, it does not depend on network speed, it is great for flaky and slow WAN connections but it offers eventual consistency only (hint: enjoy PECL/mysqlnd_ms!), it has no means to scale writes. And, it exists – no karma... An eager update anywhere solution offering the highest level of correctness (1SR) gives you strong consistency. It scales writes to some degree because they can be executed on any replica, which parallelizes execution load. Commit performance is network bound.
  • 52. Full Replication Partial Replication Read Scale Out Write Scale Out Capability MySQL Replication (Lazy Primary Copy, In-core) MySQL Cluster (Eager Update Anywhere, Hybrid) Tungsten (Primary Copy, Middleware) Galera (Eager Update Anywhere, In-core) If 1SR - hard limit DIY: The Master Class
  • 53. The speaker says... The DIY Master Class for maximum karma is a partial replication solution offering strong consistency. Partial replication is the only way to ultimately scale write requests. The explanation is simple: every write adds load to the entire cluster. Remember that writes need to be coordinated, remember that concurrency control involves all replicas (ROWA) or a good number of them (Quorum). Thus, every additional replica adds load to all others. The solution is to partition the data set and keep each partition on a subset of all replicas only. NoSQL calls it sharding, MySQL Cluster calls it partitioning. Partial replication – that's the DIY masterpiece that will give you KARMA.
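The scaling argument above can be made concrete with a little arithmetic. The Python sketch below assumes writes are spread uniformly over partitions and ignores coordination overhead; the function name and the numbers are illustrative assumptions.

```python
def writes_applied_per_node(total_writes: int, nodes: int,
                            copies_per_partition=None) -> float:
    """Average number of writes each node has to apply.

    Full replication (copies_per_partition is None): every node applies
    every write. Partial replication: each write is applied only by the
    nodes holding a copy of the affected partition.
    """
    if nodes <= 0:
        raise ValueError("need at least one node")
    if copies_per_partition is None:
        return float(total_writes)
    return total_writes * copies_per_partition / nodes


if __name__ == "__main__":
    for n in (4, 8, 16):
        full = writes_applied_per_node(1000, n)
        partial = writes_applied_per_node(1000, n, copies_per_partition=2)
        print(f"{n} nodes: full={full:.0f}, partial={partial:.0f} writes per node")
```

With full replication the per-node write load stays constant no matter how many nodes are added; with partial replication it shrinks as the cluster grows.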
  • 54. Availability • Shared-nothing, High Availability (99.999%) • WAN Replication to secondary data centers Scalability • Read and write through partial replication (partitioning) • Distributed queries (parallelize work), real-time guarantees • Focus In-Memory with disk storage extension • Sophisticated thread model for multi-core CPU • Optimized for short transactions (hundreds of operations) Distribution Transparency • SQL level: 100%, low-level interfaces available MySQL (NDB) Cluster goals
  • 55. The speaker says... I am not aware of textbooks discussing partial replication theory in-depth. Thus, we have to reverse engineer an existing system. As this is a talk about MySQL Cluster, how about talking about MySQL Cluster finally? MySQL Cluster was originally developed to serve telecommunication systems. It aims to parallelize work as much as possible, hence it is a distributed database. It started as an in-memory solution but can now store data on disk as well. It runs best in environments offering low network latency, high network throughput and issuing short transactions. Applications should not demand complex joins. There is no chance you throw Drupal at it and Drupal runs super-fast out of the box! Let's see why...
  • 56. SQL view: Cluster is yet another table storage engine MySQL Cluster is a hybrid MySQL MySQL Load Balancer Clients Reflector Plugin = NDB Storage Engine Replicator = NDB Data Node GCS
  • 57. The speaker says... MySQL Cluster has a hybrid architecture. It consists of the green elements on the slide. The Reflector is implemented as a MySQL storage engine. From a SQL user's perspective, it is just another storage engine, similar to MyISAM, InnoDB or others (Distribution Transparency). From a SQL client perspective there is no change: all MySQL APIs can be used. The Reflector (NDB Storage Engine) runs as part of the MySQL process. The Replicator is a separate process called NDB data node. Please note, node means process not machine. MySQL Cluster does not fit perfectly in the model: an NDB data node combines Replicator and storage tasks. BTW, what happens to Cluster if a MySQL Server fails?
  • 58. Fast low-level access: bypassing the SQL layer MySQL Cluster is a beast MySQL MySQL Load Balancer Clients Reflector Plugin = NDB Storage Engine Replicator = NDB Data Node GCS Clients 4.3b read tx/min 1.2b write tx/min (in 2012)
  • 59. The speaker says... From the perspective of MySQL Cluster, a MySQL Server is yet another application client. MySQL Server happens to be an application that implements a SQL view on the relational data stored inside the cluster. MySQL Cluster users often bypass the SQL layer by writing their own application clients. SQL is a rich query language but parsing a SQL query can take 30...50% of the total runtime of a query. Thus, bypassing is a good idea. The top benchmark results we show for Cluster are achieved using C/C++ clients directly accessing MySQL Cluster. There are many extra APIs for this special case: NDB API (C/C++, low level), ClusterJ (ORM style), ClusterJPA (low level), … and even one for node.js (ORM style).
  • 60. Partitioning (auto-sharding) NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary Node Group 1 Node Group 0
  • 61. The speaker says... There is a lot to say about how MySQL Cluster partitions a table and spreads it over nodes. The manual has all details, just all... The key idea is to use an eager primary copy approach for partitions combined with a mindful distribution of each partition's primary and its copies. NDB supports zero or one copies (replication factor). The failure of a partition's primary does not cause a failure of the Cluster. In the example, the failure of any one node has no impact. Also, nodes 1 and 4 may both fail without a stop of the Cluster (fail stop model). But the cluster shuts down if all nodes of a node group fail.
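The survival rule from the slide can be sketched in a few lines of Python. This is a toy model, not NDB code: the assignment of nodes 1 and 2 to node group 0 and nodes 3 and 4 to node group 1 is read off the slide layout, and `NODE_GROUPS` and `cluster_survives` are hypothetical names chosen for illustration.

```python
# Toy model of the slide: two node groups, each holding a primary and a copy
# of its partitions. The node-to-group mapping is an assumption taken from
# the slide layout, not from a real cluster configuration.
NODE_GROUPS = {
    0: {1, 2},  # partitions 0 and 2
    1: {3, 4},  # partitions 1 and 3
}


def cluster_survives(failed_nodes):
    """Return True while every node group still has at least one live node."""
    failed = set(failed_nodes)
    # An empty set is falsy, so a fully failed node group makes all() False.
    return all(members - failed for members in NODE_GROUPS.values())


if __name__ == "__main__":
    for failed in ([1], [1, 4], [1, 2]):
        print(failed, "->", "up" if cluster_survives(failed) else "down")
```

Losing nodes 1 and 4 leaves one replica in each group, so the model stays up; losing nodes 1 and 2 wipes out node group 0, and with it the only copies of partitions 0 and 2.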
  • 62. Concurrency Control: 2PL, "2PC" NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary W R R
  • 63. The speaker says... Buuuuh? Two-Phase-Locking (2PL) and Two-Phase-Commit (2PC) are used for concurrency control. Cluster is using traditional row locking to isolate transactions. Read and write locks can be distributed throughout the cluster. The locks are set on the primary partitions. Transactions are serialized during execution. When a transaction commits, an optimized Two-Phase-Commit is used to synchronize the partition copies. The SQL layer recognizes the commit as soon as the copies are updated (and before logs have been written to disk). The low-level NDB C/C++ application API is asynchronous. Fire and forget is possible: your application can continue before transaction processing has even begun!
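To make the combination of row locks and a two-phase commit more concrete, here is a deliberately small Python sketch. It is not how NDB implements it: the classes `RowLocks` and `Replica` and the function `write_row` are hypothetical, everything runs in one process, and the "optimized" parts of the real protocol are left out. It only shows the order of events: lock the row on the primary, prepare on primary and copy, commit on both, release the lock.

```python
class RowLocks:
    """Shared/exclusive row locks, kept on the primary partition only."""

    def __init__(self):
        self.readers = {}  # row -> set of transactions holding a read lock
        self.writer = {}   # row -> transaction holding the write lock

    def lock_shared(self, tx, row):
        holder = self.writer.get(row)
        if holder is not None and holder != tx:
            return False  # another transaction is writing this row
        self.readers.setdefault(row, set()).add(tx)
        return True

    def lock_exclusive(self, tx, row):
        holder = self.writer.get(row)
        if holder is not None and holder != tx:
            return False
        if self.readers.get(row, set()) - {tx}:
            return False  # other readers still hold the row
        self.writer[row] = tx
        return True

    def release_all(self, tx):
        for holders in self.readers.values():
            holders.discard(tx)
        for row in [r for r, w in self.writer.items() if w == tx]:
            del self.writer[row]


class Replica:
    """One replica (primary or copy) of a partition."""

    def __init__(self):
        self.rows = {}
        self.pending = {}

    def prepare(self, tx, row, value):
        self.pending[tx] = (row, value)
        return True  # a real node could vote "no" here

    def commit(self, tx):
        row, value = self.pending.pop(tx)
        self.rows[row] = value

    def abort(self, tx):
        self.pending.pop(tx, None)


def write_row(tx, row, value, locks, primary, copy):
    """Lock on the primary, then prepare and commit on both replicas."""
    if not locks.lock_exclusive(tx, row):
        return False
    try:
        replicas = [primary, copy]
        if all(r.prepare(tx, row, value) for r in replicas):
            for r in replicas:
                r.commit(tx)
            return True
        for r in replicas:
            r.abort(tx)
        return False
    finally:
        locks.release_all(tx)  # 2PL: locks are released only at the end
```

A writer is refused while another transaction holds a read lock on the same row, and after a successful `write_row` both replicas hold the same value.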
  • 64. Brain Masala NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary Arbitrator
  • 65. The speaker says... The failure of a single node is detected using a heartbeat protocol: details are documented, future improvements are possible. Both MySQL Cluster and Virtual Synchrony separate message delivery from node failure detection. The worst case scenario of a split brain is cured by the introduction of arbitrators. If the nodes split and each half is able to keep the Cluster up, the nodes try to contact the arbitrator. It is then up to the arbitrator to decide who stays up and who shuts down. Arbitrators are extra processes, ideally run on extra machines. Management nodes can act as arbitrators too. You need at least one management node for administration, thus you always have an arbitrator readily available.
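The arbitration idea can be sketched as follows. This is a simplified model under stated assumptions, not the NDB protocol: a side counts as able to keep the Cluster up if it holds at least one node of every node group, the arbitrator simply grants the first request it sees, and `NODE_GROUPS`, `is_viable`, `Arbitrator` and `resolve_split` are hypothetical names.

```python
# Assumed layout from the slides: nodes 1 and 2 form node group 0,
# nodes 3 and 4 form node group 1.
NODE_GROUPS = {0: {1, 2}, 1: {3, 4}}


def is_viable(side):
    """A side can keep the Cluster up if it has a node from every node group."""
    return all(members & set(side) for members in NODE_GROUPS.values())


class Arbitrator:
    """Grants the right to stay up to the first side that asks (assumption)."""

    def __init__(self):
        self.granted_to = None

    def request(self, side):
        side = frozenset(side)
        if self.granted_to is None:
            self.granted_to = side
        return self.granted_to == side


def resolve_split(side_a, side_b, arbitrator):
    """Return the set of nodes that stays up after a network split."""
    viable = [frozenset(s) for s in (side_a, side_b) if is_viable(s)]
    if len(viable) == 1:
        return viable[0]  # no conflict, only one side can serve all data
    for side in viable:
        if arbitrator.request(side):
            return side  # the other viable side is refused and shuts down
    return frozenset()  # neither side has all partitions


if __name__ == "__main__":
    print(sorted(resolve_split({1, 3}, {2, 4}, Arbitrator())))
```

With the split {1, 3} versus {2, 4} both halves hold every partition, so only the arbitrator's decision prevents two independent clusters from accepting writes.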
  • 66. Drupal? Sysbench? Oh, oh... NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary MySQL
  • 67. The speaker says... Partial replication (here: partitioning, sharding) is the only known solution to the write scale out problem. But, it comes at the high price of distributed queries. A SQL query may require reading data from many partitions. On the one hand, work is nicely parallelized over many nodes. On the other hand, records found have to be transferred within the cluster from one node to another. Although Cluster tries to batch requests efficiently to minimize communication delays, transferring data from node to node to answer questions remains an expensive operation.
  • 68. Oh, oh... tune your partitions! NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary MySQL CREATE TABLE cities ( id INT NOT NULL, population INT UNSIGNED, city_name VARCHAR(100), PRIMARY KEY(city_name, id) ) SELECT id FROM cities WHERE city_name = 'Kiel'
  • 69. The speaker says... How much traffic and latency occurs depends on the actual SQL query and the partitioning scheme. By default a table is partitioned into 3840 virtual fragments (think vBuckets) using its primary key. The partitioning can and should be tuned. Try to find partitioning keys that make your common, expensive or time-critical queries run on a single node. Assume you have a list of cities. City names are not unique, thus you have introduced a numeric primary key. It is likely that your most common query filters on the city name rather than on the numeric primary key alone. Therefore, your partitioning should be based on city name as well.
  • 70. The ultimate Key-Value-Store? NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partition 0, Primary Partition 2, Copy Partition 0, Copy Partition 2, Primary Partition 1, Primary Partition 1, Copy Partition 3, Copy Partition 3, Primary MySQL CREATE TABLE cities ( id INT NOT NULL, city_name VARCHAR(100), PRIMARY KEY(id) ) SELECT * FROM cities WHERE id = 1 SELECT * FROM cities WHERE id = 100
  • 71. The speaker says... I may have stated it before: if there is any product at MySQL that can compete with NoSQL (as in Key-Value-Store) on the issue of distributed data stores, it is MySQL Cluster. An optimal query load for MySQL Cluster is one that primarily performs lookups on partition keys. Each query will execute on one node only. There is little traffic within the cluster – little network overhead. Work load is perfectly parallelized. Will your unmodified PHP application perform on Cluster?
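Why the choice of partitioning key matters can be shown with a small Python sketch. It is an illustration under assumptions, not Cluster's algorithm: CRC32 stands in for the real hash function, four partitions stand in for the virtual fragments, and `partition_for` and `partitions_for_city_lookup` are hypothetical helpers.

```python
import zlib

NUM_PARTITIONS = 4  # assumption: the four partitions drawn on the slide


def partition_for(*key_parts):
    """Map a partitioning key to a partition number (CRC32 is a stand-in hash)."""
    raw = "|".join(str(part) for part in key_parts).encode("utf-8")
    return zlib.crc32(raw) % NUM_PARTITIONS


def partitions_for_city_lookup(city_name, partitioned_by_city_only):
    """Partitions that a query filtering on city_name alone has to visit."""
    if partitioned_by_city_only:
        # The full partitioning key is known: exactly one partition.
        return {partition_for(city_name)}
    # The key is (city_name, id) but id is unknown: the hash cannot be
    # computed, so every partition has to be asked.
    return set(range(NUM_PARTITIONS))


if __name__ == "__main__":
    print(len(partitions_for_city_lookup("Kiel", True)), "partition(s) by city")
    print(len(partitions_for_city_lookup("Kiel", False)), "partition(s) by PK")
```

When the table is partitioned on the city name alone, all rows for 'Kiel' hash to the same partition and the lookup stays on one node. When the whole primary key is hashed, the same query fans out to every partition.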
  • 72. Joins: 24...70x faster Then Now NDB_API> read a from table t1 where pk = 1 [round trip] (a = 15) NDB_API> read b from table t2 where pk = 15 [round trip] (b = 30) [return a = 15, b = 30] SELECT t1.a, t2.b FROM t1, t2 WHERE t1.pk = 1 AND t1.a = t2.pk NDB_API> read @a=a from table t1 where pk = 1; read b from table t2 where pk = @a [round trip]
  • 73. The speaker says... In 7.2 we claim certain joins to execute 24...70x faster with the help of AQL (condition push-down)! How come? Partial replication does not go well with joins. Take this simple nested join as an example. There are two tables to join. The join condition of the second table depends on the values of the first table. Thus, t1 has to be searched before t2 can be searched and the result can be returned to the user. That makes two operations and two round trips. As of 7.2, there is a new batched way of doing it. It saves round trips. Avoiding some round trips means, at the extreme, 24...70x faster: the network is your enemy #1.
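The "Then" and "Now" columns of the slide can be replayed with a toy round-trip counter in Python. Nothing here is the NDB API: `FakeCluster`, `read` and `read_linked` are hypothetical stand-ins, and the two tables hold just the rows used on the slide (t1.pk = 1 gives a = 15, t2.pk = 15 gives b = 30).

```python
# Rows taken from the slide example.
T1 = {1: {"a": 15}}
T2 = {15: {"b": 30}}


class FakeCluster:
    """Stand-in for the data nodes that counts client round trips."""

    def __init__(self):
        self.round_trips = 0

    def read(self, table, pk):
        """One primary key lookup costs one round trip."""
        self.round_trips += 1
        return table[pk]

    def read_linked(self, parent, parent_pk, link_column, child):
        """Both lookups are resolved inside the cluster in one round trip."""
        self.round_trips += 1
        parent_row = parent[parent_pk]
        child_row = child[parent_row[link_column]]
        return parent_row, child_row


def join_then(cluster):
    """Old style: the client fetches a, then uses it to fetch b."""
    a = cluster.read(T1, 1)["a"]
    b = cluster.read(T2, a)["b"]
    return a, b


def join_now(cluster):
    """Pushed-down style: the dependent lookup travels with the first one."""
    row1, row2 = cluster.read_linked(T1, 1, "a", T2)
    return row1["a"], row2["b"]


if __name__ == "__main__":
    for join in (join_then, join_now):
        cluster = FakeCluster()
        print(join.__name__, join(cluster), "round trips:", cluster.round_trips)
```

Both variants return the same result; the difference is only how often the client has to wait for the network. For joins that touch many rows the saved round trips add up.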
  • 74. Benchmark pitfall: connections NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 MySQL Load Balancer Many, many clients MySQL NDB Storage Engine NDB Storage Engine
  • 75. The speaker says... If you ever come to the point of evaluating MySQL Cluster, make sure you configure the MySQL-to-Cluster connections appropriately (ndb_cluster_connection_pool). A MySQL Server with only one connection (default setting) from itself to the cluster may not be able to serve many concurrent clients at the rate the Cluster part itself might be able to handle them. The connection may impose an artificial limit on the cluster throughput.
  • 76. Adding nodes, rebalancing NDB Data Node 1 NDB Data Node 2 NDB Data Node 3 NDB Data Node 4 Partitions Partitions Partitions Partitions NDB Data Node 5 NDB Data Node 6
  • 77. The speaker says... Adding nodes, growing the capacity of your cluster in terms of size and computing power, is an online operation. At any time you can add nodes to your cluster. New nodes do not immediately participate in operations. You have to tell the cluster what to do with them: use for new tables, or use for growing the capacity available to existing tables. When growing existing tables, data needs to be redistributed to the new nodes. Rebalancing is an online operation: it does not block clients. The partitioning algorithm used by Cluster ensures that data is copied to new nodes only; there is no traffic between nodes currently holding fragments of the table to be rebalanced.
  • 78. We shall... • Code an Eager Update-Anywhere Cluster • Prefer a hybrid design to get not too deep into MySQL • Do not fear the lack of text books on partial replication • Read CPU vendor tuning guides like comics • Like Sweden or Finland Send your application to the MySQL Cluster team. Cluster is different. MySQL Cluster is perfect for web session storage. Whether your Drupal, WordPress, … runs faster is hard to tell – possibly not faster. PS (marketing fluff): ask Sales for a show! DIY - Summary
  • 79. The speaker says... By the end of this talk you should remember at least this: ● There are four kinds of replication solutions based on a matrix asking "where can all transactions run" and "when are replicas synchronized" ● Clusters don't make everything faster – the network is your enemy. For read scale out there are proven solutions. ● Write scale out is only possible through partial replication (a small write quorum would impact read performance)
  • 81. The speaker says... Thank you for your attendance! Upcoming shows: Talk&Show! (ask... :-)) YourPlace, any time PHP Summit Munich, December 2013