The document discusses running MariaDB across multiple data centers. It begins by outlining the need for multi-datacenter database architectures to provide high availability, disaster recovery, and continuous operation. It then describes topology choices for different use cases, including traditional disaster recovery, geo-synchronous distributed architectures, and how technologies like MariaDB Master/Slave and Galera Cluster work. The rest of the document discusses answering key questions when designing a multi-datacenter topology, trade-offs to consider, architecture technologies, and pros and cons of different approaches.
3. Agenda
Background on the need for Multiple Datacenter DBMS Architectures
High Availability, Active-Passive, Location/Application Affinity, Continuous Operation
Topology Choices for specific use cases
Traditional Disaster Recovery/Secondary Site – HA/FO
Geo-Synchronous Distributed – Multi-Master/Active-Active
How the Topologies work
MariaDB Master/Slave, Galera Cluster
4. Answering a few simple Questions!
What are we trying to solve?
Why do we need to solve it?
Where and When can we deploy it? (On-Prem /Cloud /Hybrid)
How do we choose the correct design?
Complex and challenging: the need to simplify and manage!
5. Answering the Simple Questions
Trade-Offs!
Scalability
Reliability
Performance
Growth
Hardware failures
Reconciliation
Parallelism
Load distribution
Closer to users
Business continuation
Consolidation
Agility
Data integrity
Outage protection
Resilience
Network partition
9. Master/Slave Replication with Multiple Data Centers
[Diagram: Data Center DC1 (Active) and Data Center DC2 (Passive), each fronted by a MariaDB MaxScale proxy; Nodes 1-3 each carry priority settings for the two proxies (P1: priority=3, P2: priority=1); the nodes form a multi-master cluster with synchronous replication]
10. Master/Slave Replication with Multiple Data Centers (Semi-Synchronous)
[Diagram: Data Center DC1 (Active) and Data Center DC2 (Passive), each fronted by a MariaDB MaxScale proxy; each data center holds a Master and two Slaves]
11. Master/Slave Replication with Read Scaling
[Diagram: a MariaDB MaxScale proxy routes writes (port 3307) to the Master in Cluster 1 (DC1) and reads (port 3308) to Slaves R1-R100 in Cluster 2 (DC2); Slaves M1 and M2 replicate directly from the Master, while a Binlog Server relays the Master's binlog to the read slaves]
12. Master/Slave Replication with a Dedicated Backup
[Diagram: a MariaDB MaxScale proxy fronts the Master, with Slave 2 and Slave 3 serving reads in DC1, while a dedicated backup slave in DC2 runs MariaDB Backup]
14. Replication Types
Asynchronous Replication: the Master does not wait for the Slave; the master writes events to its binary log and slaves request them when they are ready.
Semi-Synchronous Replication: the Master does not confirm transactions to the client application until at least one slave has copied the change to its relay log and flushed it to disk.
Synchronous Replication: all nodes are masters and applications can read and write from/to any node.
19. Topologies: HA with MariaDB TX
Master/Slave Replication with Multiple Data Centers
[Diagram: Data Center DC1 (Active) and Data Center DC2 (Passive or Active), each fronted by a MariaDB MaxScale proxy; Nodes 1-3 each carry priority settings for the two proxies (P1: priority=3, P2: priority=1); the nodes form a multi-master cluster with synchronous replication]
20. The master Galera cluster consists of 3 nodes (1 in DC1, 1 in DC2, 1 as Arbitrator). In case a Galera node in DC1 or DC2 fails, the other one is still active.
The Arbitrator is used only for the quorum in order to avoid split-brain.
The Galera arbitrator is a full node but does not store any data. All transactions are sent to this node, its network must be fast enough to support this.
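The arbitrator described above is typically run as the garbd daemon on a third host. A minimal configuration sketch (the cluster name and node addresses below are hypothetical, and the file location varies by distribution):

```ini
# /etc/default/garb sketch for the Galera Arbitrator (garbd)
# Joins the quorum and relays traffic, but stores no data
GALERA_GROUP="my_cluster"
GALERA_NODES="10.0.1.10:4567 10.0.2.10:4567"
```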
MariaDB TX Galera 3-node cluster with MaxScale
22. What is Galera?
Replicates the InnoDB storage engine
All accumulated knowledge about InnoDB is applicable: query tuning, server parameters, buffer sizes all apply
Synchronous replication
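Because Galera replicates InnoDB, enabling it is mostly server configuration. A minimal my.cnf sketch (the cluster name, library path and node addresses are assumptions, not taken from the deck):

```ini
[mariadb]
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so   # path varies by distribution
wsrep_cluster_name=my_cluster
wsrep_cluster_address=gcomm://10.0.1.10,10.0.1.11,10.0.2.10
binlog_format=ROW                  # Galera requires row-based replication
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2         # interleaved lock mode, needed for parallel applying
```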
23. What is Galera?
A MariaDB Galera cluster requires a minimum of 3 nodes
However, one of the members of the cluster can be an arbitrator (2 nodes + 1 arbitrator)
Despite not participating in data replication, the arbitrator still needs to be on a 3rd physical node
24. Administration and Monitoring
MaxScale 2.3 – MariaDB
A database proxy that forwards database statements to one or more database servers. MariaDB MaxScale is designed to provide, transparently to applications, load balancing and high availability functionality.
ClusterControl – SeveralNines
A robust, all-inclusive open source database management system. Allows users to easily monitor,
deploy, manage, and scale highly available databases (MariaDB) either in the cloud or on premises.
SqlDM – Idera
Provides an unprecedented level of diagnostic information on the health, performance, and status of
MariaDB instances across your environment. You can view, diagnose, and report on critical
performance statistics from a central point of control
26. Multi-Master or Not
If your application can handle deadlock errors, multi-master is good to use
However, if a database has hot-spots, i.e. multi-master conflicts happen frequently, write
performance will suffer
But read scalability is always guaranteed
Use Master-Slave, if deadlock errors are a problem or conflict rate hurts your performance
27. Galera Cluster
Good Performance
Optimistic concurrency control
Virtually synchronous replication
Parallel replication
Optimized group communication
99.99% transparent
InnoDB look & feel, automatic node joining
Works in LAN / WAN / Cloud
42. Galera Replication
[Diagram: three MariaDB nodes behind a load balancer; one node is cut off (x) and the remaining majority retains quorum]
Galera uses quorum-based failure handling:
When cluster partitioning is detected, the majority partition "has quorum" and can continue
A minority partition cannot commit transactions, but will attempt to re-connect to the primary partition
Note: 50% is not a majority!
=> Minimum 3 nodes recommended.
The load balancer will notice errors and remove the node from the pool
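Whether a node currently has quorum can be checked on any node with the standard wsrep status variables, for example:

```sql
-- Cluster membership and quorum state
SHOW STATUS LIKE 'wsrep_cluster_size';
SHOW STATUS LIKE 'wsrep_cluster_status';  -- 'Primary' when this node is in the quorate partition
```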
43. The master Galera cluster consists of 3 nodes (1 in DC1, 1 in DC2, 1 as Arbitrator). In case a Galera node in DC1 or DC2 fails, the other one is still active.
The Arbitrator is used only for the quorum in order to avoid split-brain.
The Galera arbitrator is a full node but does not store any data. All transactions are sent to this node, its network must be fast enough to support this.
MariaDB TX Galera 3-node cluster with MaxScale
44. Master/Slave Cluster
Connection based routing
Low overhead
Balances a set of connections over a set of servers
Uses monitoring feedback to identify master and slaves
Connection weighting if configured
Load balances queries in round robin across configured servers
Each application has a read connection and a write connection
Uses the readconnroute router to route the 2 services; the MariaDB monitor watches the replication cluster
Automatic failover: election and promotion of slave
[Diagram: MaxScale exposes a Write Connection Service routed to the MASTER and a Read Connection Service routed to the SLAVES]
45. MariaDB Cluster
Connection based routing
Low overhead
Balances a set of connections over a set of servers
Uses monitoring feedback to elect write master node
Connection weighting if configured
Load balances queries in round robin across configured servers
Each application has a read connection and a write connection
Uses the readconnroute router to route the 2 services; GaleraMon monitors the cluster and elects the master
No external failover required
[Diagram: MaxScale exposes a Write Connection Service routed to the elected write MASTER node and a Read Connection Service routed to the READ NODES]
46. Load Segregation Within an Application
Galera cluster or Master-Slave environment
Connection based or Statement based
Route queries to specific servers based on a regular expression match
One service for all workloads, configured to route queries that match "*from *users" to server3
All other queries follow the default routing configured (connection based or statement based)
Monitors the cluster and elects the master
[Diagram: queries from the client application matching /*from *users/ are routed by MaxScale to SERVER3; all other queries follow default routing to SERVER1 and SERVER2]
Editor's Notes
Specific slide for use on presentations used at OpenWorks 2019
Note: the DB of MariaDB is not bolded
Let’s start our discussion today by trying to answer a few simple questions on the need an enterprise would consider a DB architecture that supports multi Data Center operations.
As we’ll see today the reason can vary, the architectures can vary and most importantly the complexity and the ability to manage it will drive the choices one will make.
Discuss Briefly, spend about 60 seconds just a few of the sub topics in general. We’ll get to greater detail later.
Topology Choices for specific use cases
Traditional Disaster Recovery/Secondary Site – HA/FO
Asynchronous/Semi-Synchronous (Warm Site, Read/Write Split, Read Scale, Dispersed Backup Node)
Explain Master/Slave Basics
Geo-Synchronous Distributed – Multi-Master/Active-Active
Synchronous(Hot Site, No Switch over)
Application/Location Affinity(Application and DB partitioning)
Explain Galera Basics
Technical Architectures – How they work, How to Manage
MDB Standard Master/Slave
HA/FO/Read Scale/Read/Write Splitting
Binlog and Relay Log
MDB Cluster – Galera Cluster
Multi Master/Active-Active
WSREP, Certify, Apply flow
Application/Location Affinity
Split Brain(What it is and how to mitigate), Galera Arbitrator
MaxScale – Simplicity and Management (Should we include ClusterControl?)
Query Routing
Load Balancing
Automated FO
MaxScale Redundancy/SPOF (keepalived)
Monitoring
SqlDM or some network tools
ClusterControl ?
Command line
Today, companies are undergoing a digital transformation: offline operations are becoming online operations, enterprise applications are becoming customer-facing applications, and engagement is happening anywhere and everywhere via web, mobile and Internet of Things (IoT) applications – and when it comes to customer experience, availability is not a preference, it is a requirement.
Many reasons why we would run across multiple datacenters !
However, when it comes to any of the 4 high level reasons, the trade-offs between performance, durability and consistency have to be considered. There are times when durability and consistency are more important than performance. There are times when performance is more important. The right trade-offs depend on the business needs, the use case and the technical requirements.
So what we’ll look at is how I can take MDB as illustrated above, assemble the necessary components and deploy across multiple DC’s.
Basically – what are the capabilities of MDB that can be used in a multi-DC architecture; what can I use?
MariaDB TX uses local storage and replication (with or without clustering) to provide high availability via multiple database servers. There is no single point of failure (SPOF). In fact, when MariaDB TX is configured for high availability, downtime due to an unplanned infrastructure failure is all but removed.
MariaDB TX, with a history of proven enterprise reliability and community-led innovation, is a complete database solution for any and every enterprise. MariaDB TX, when deployed, is comprised of MariaDB connectors (e.g., JDBC/ODBC), MariaDB MaxScale (a database proxy and firewall), MariaDB Server, MariaDB Cluster (multi-master replication), MariaDB tools and access to MariaDB services – and is available via an enterprise open source subscription.
Clustering
In the example above, a three-node cluster is deployed across two data centers with two nodes in the active data center (DC1) and one node in the passive data center (DC2). The configuration for the database proxy in DC1, Proxy 1, assigns a priority value of 1 to Node 1, 2 to Node 2 and 3 to Node 3. It assigns the role of master to Node 1 because Node 1 has the lowest priority value. Proxy 1 uses a basic router to route all reads and writes to any node assigned the master role, Node 1 by default. If Node 1 fails, Proxy 1 will assign the role of master to Node 2 – in the same data center, DC1.
If DC1 fails, applications can connect to the database proxy in DC2, Proxy 2. The configuration for Proxy 2 assigns a priority value of 1 to Node 3, 2 to Node 2 and 3 to Node 1. It assigns the role of master to Node 3 because it has the lowest priority value. Proxy 2 uses a basic router to route all reads and writes to any node assigned the master role, Node 3 by default.
A cluster can be deployed across multiple data centers if a) there is enough network bandwidth between them to minimize latency and b) the write sets are small (i.e., transactions do not change a lot of rows). Partitioning the data can also help keep write sets small.
Multiple DC’s
Master/slave replication
In the example below, circular replication (i.e., bidirectional master/slave replication) can be used to synchronize data between an active data center (DC1) and a passive data center (DC2). The master in DC1 is configured as a slave to the master in DC2. The master in DC2 is configured as a slave to the master in DC1. If DC1 fails, applications can connect to the database proxy, MariaDB MaxScale, in DC2.
Read scalability
Master/slave replication
In the example above, a second database proxy is configured and deployed as a binlog server to relay transactions from the master to many slaves for read scaling. The binlog server reduces the replication overhead on the master – instead of many slaves replicating from the master, a single binlog server replicates from it.
The master is configured for semi-synchronous replication, for high availability and durability, with Slave M1 and Slave M2, and for asynchronous replication, for read scalability, with the binlog server. The database proxy is configured with two routers, one for each cluster, with each router having a different port. The first will route all writes to the master in Cluster 1. The second will route all reads to the slaves in Cluster 2 (Slave R2 to Slave R100).
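The two-router, two-port layout described above could be sketched in maxscale.cnf roughly as follows; the server names, credentials and section names are assumptions, not taken from the deck:

```ini
# maxscale.cnf sketch: writes on port 3307, reads on port 3308
[Write-Service]
type=service
router=readconnroute
router_options=master          # route only to the master
servers=master1
user=maxuser
password=maxpwd

[Write-Listener]
type=listener
service=Write-Service
protocol=MariaDBClient
port=3307

[Read-Service]
type=service
router=readconnroute
router_options=slave           # round-robin over the read slaves
servers=slaveR1,slaveR2,slaveR100
user=maxuser
password=maxpwd

[Read-Listener]
type=listener
service=Read-Service
protocol=MariaDBClient
port=3308
```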
The Binlog Server
MAXSCALE AS A REPLICATION PROXY, AKA THE BINLOG SERVER
In database setups with a large number of users reading data, the Binlog Server can be used to offload traffic to the Master, make Master failover easier to handle and in general simplify replication. In this blog I will describe the benefits of Binlog Server and how to set up MaxScale as a Binlog Server.
In a traditional MariaDB/MySQL replication setup a single master server is created and a set of MariaDB/MySQL slave servers are configured to pull the binlog files from the master, putting a lot of load on the master. Introducing a layer between the master server and the slave servers can reduce the load on the master by serving only MaxScale’s Binlog Server instead of all the slaves. The slaves will only need to be aware of the Binlog Server and not the real master server. Removing the requirement for the slaves to have knowledge of the master also simplifies the process of replacing a failed master within a replication environment.
MaxScale, with Binlog Server, can act as a slave to the real master and as a master to the slaves in the same way as an intermediate MySQL master does, however it does not implement any re-execution of the statements within the binary log. The latency that is introduced is mostly added network latency associated with adding the extra network hop. There is no appreciable processing performed at the MaxScale level, other than for managing the local cache of the binlog files.
In addition every MaxScale that is acting as a proxy of the master will have exactly the same binlog events as the master itself. This means that a slave can be moved between any of the MaxScale servers or to the real master without the need to perform any special processing. The result is much simpler behavior for failure recovery and the ability to have a very simple and redundant proxy layer for the slave servers.
THE BINLOG SERVER’S MAIN FEATURES ARE:
The Binlog Server requests and receives binlog records from the master server autonomously of any slave activity.
Stored binlogs are identical to the ones stored in the Master server.
Binlog records received from the master must be relayed to the slaves that are able to accept them.
The slave servers must be able to request historical binlog records without sending any additional traffic to the master server.
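In MaxScale 2.x the Binlog Server is configured as a service using the binlogrouter module. A sketch of such a service (the server_id, directory, port and credentials are illustrative assumptions):

```ini
# maxscale.cnf sketch: MaxScale acting as a Binlog Server
[Binlog-Service]
type=service
router=binlogrouter
user=maxuser
password=maxpwd
router_options=server_id=4000,binlogdir=/var/lib/maxscale,mariadb10-compatibility=1

[Binlog-Listener]
type=listener
service=Binlog-Service
protocol=MariaDBClient
port=8808                      # slaves point CHANGE MASTER TO at this port
```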
Master/slave replication
The master assigns a transaction a global transaction ID (GTID) and writes the transaction to its binary log. A slave requests the next transaction from the master (sending its current GTID), writes the transaction to its relay log and executes it.
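Setting up such a GTID-based slave in MariaDB might look like this; the host name, credentials and starting GTID below are hypothetical:

```sql
-- On the slave: set the starting GTID position (e.g. taken from a backup)
SET GLOBAL gtid_slave_pos = '0-1-100';

CHANGE MASTER TO
  MASTER_HOST='dc1-master',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_USE_GTID=slave_pos;   -- replicate from the recorded GTID position

START SLAVE;
```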
Automatic failover
The database proxy, MariaDB MaxScale, has built-in automatic failover. If it is enabled, and the master fails, it will promote the most up-to-date slave (based on GTID) to master and reconfigure the remaining slaves (if any) to replicate from it. In addition, if automatic rejoin is enabled and the failed master is recovered, it will be reconfigured as a slave.
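Automatic failover and rejoin are enabled on the MariaDB Monitor (mariadbmon); a minimal sketch, with server and credential names assumed:

```ini
# maxscale.cnf sketch: monitor with automatic failover and rejoin
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=master1,slave1,slave2
user=maxuser
password=maxpwd
auto_failover=true     # promote the most up-to-date slave if the master fails
auto_rejoin=true       # reconfigure a recovered master as a slave
```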
Asynchronous replication
With asynchronous replication, transactions are replicated after being committed. The master does not wait for any of the slaves to acknowledge the transaction before committing it. It does not affect write performance. However, if the master fails and automatic failover is enabled, there will be data loss if one or more transactions have not been replicated to a slave.
In the example above, the database proxy would promote Slave 1 to master because it is the slave with the highest GTID. However, the most recent transaction (GTID = 3) had not been replicated before the master failed. There would be data loss.
Asynchronous replication is recommended for read-intensive workloads or mixed/write-intensive workloads where the highest write performance is required.
Semi-synchronous replication
With semi-synchronous replication, a transaction is not committed until it has been replicated to a slave. It affects write performance, but the effect is minimized by waiting for transactions to replicate to one slave rather than every slave. However, if the master fails and automatic failover is enabled, there will be no data loss because every transaction has been replicated to a slave.
In the example below, the database proxy would promote Slave 2 to master because it is the slave with the highest GTID. With semi-synchronous replication, there would be no data loss because at least one of the slaves will have every transaction written to its relay log.
NOTE:
The master will wait for up to 10 seconds (default) for a transaction to be replicated to a slave before it reverts to asynchronous replication. If it does, and one of the slaves catches up, the master will restore semi-synchronous replication. If all of the slaves are slow, the timeout can be reduced to maintain write performance (but with less durability), or increased to maintain durability (but with less write performance).
Semi-synchronous replication is recommended for mixed/write-intensive workloads where high write performance and strong durability are required.
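The timeout trade-off above can be expressed directly in server configuration; a minimal my.cnf sketch, assuming MariaDB 10.3 or later where semi-synchronous replication is built in:

```ini
[mariadb]
# Master side
rpl_semi_sync_master_enabled = ON
rpl_semi_sync_master_timeout = 10000   # ms (default); lower favors write performance, higher favors durability

# Slave side
rpl_semi_sync_slave_enabled = ON
```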
Clustering
In the example above, a three-node cluster is deployed across two data centers with two nodes in the active data center (DC1) and one node in the passive data center (DC2). The configuration for the database proxy in DC1, Proxy 1, assigns a priority value of 1 to Node 1, 2 to Node 2 and 3 to Node 3. It assigns the role of master to Node 1 because Node 1 has the lowest priority value. Proxy 1 uses a basic router to route all reads and writes to any node assigned the master role, Node 1 by default. If Node 1 fails, Proxy 1 will assign the role of master to Node 2 – in the same data center, DC1.
If DC1 fails, applications can connect to the database proxy in DC2, Proxy 2. The configuration for Proxy 2 assigns a priority value of 1 to Node 3, 2 to Node 2 and 3 to Node 1. It assigns the role of master to Node 3 because it has the lowest priority value. Proxy 2 uses a basic router to route all reads and writes to any node assigned the master role, Node 3 by default.
Multi-master clustering
MariaDB TX supports multi-master clustering via MariaDB Cluster (i.e., Galera Cluster). The originating node assigns a transaction a GTID, and during the commit phase, sends all of the rows modified by it (i.e., writes) to every node within the cluster, including itself. If the writes are accepted by every node within the cluster, the originating node applies the writes and commits the transaction. The other nodes will apply the writes and commit the transaction asynchronously.
Automatic failover
If there is a node failure, the cluster will automatically remove it and the database proxy, MariaDB MaxScale, will stop routing queries to it. If the database proxy was routing reads and writes to the failed node, and because every node can accept reads and writes, the database proxy will select a different node and begin routing reads and writes to it.
Synchronous replication
With synchronous replication, a transaction is not committed until its changes (i.e., modified rows) have been replicated to every node within the cluster. The write performance is limited by the slowest node within the cluster. However, if the node a write was routed to fails, there will be no data loss because the changes for every transaction will have been replicated to every node within the cluster.
In the example below, the database proxy would be routing reads and writes to Node 2 because it has the lowest priority value of the remaining nodes. There would be no data loss because with synchronous replication, every node has the changes of every transaction.
The database proxy can be configured so automatic failover is deterministic (e.g., based on priority value) by setting the use_priority parameter to “true” in the Galera Cluster monitor configuration and the priority parameter in the database server configurations.
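The use_priority and priority parameters described above could be sketched in maxscale.cnf as follows; the node names and addresses are hypothetical:

```ini
# maxscale.cnf sketch: deterministic Galera failover via priorities
[Galera-Monitor]
type=monitor
module=galeramon
servers=node1,node2,node3
user=maxuser
password=maxpwd
use_priority=true       # pick the master by priority, lowest value wins

[node1]
type=server
address=10.0.1.10
port=3306
protocol=MariaDBBackend
priority=1

[node2]
type=server
address=10.0.1.11
port=3306
protocol=MariaDBBackend
priority=2

[node3]
type=server
address=10.0.2.10
port=3306
protocol=MariaDBBackend
priority=3
```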
Clustering
In the example below, a three-node cluster is deployed across two data centers with two nodes in the active data center (DC1) and one node in the passive data center (DC2). The configuration for the database proxy in DC1, Proxy 1, assigns a priority value of 1 to Node 1, 2 to Node 2 and 3 to Node 3. It assigns the role of master to Node 1 because Node 1 has the lowest priority value. Proxy 1 uses a basic router to route all reads and writes to any node assigned the master role, Node 1 by default. If Node 1 fails, Proxy 1 will assign the role of master to Node 2 – in the same data center, DC1.
If DC1 fails, applications can connect to the database proxy in DC2, Proxy 2. The configuration for Proxy 2 assigns a priority value of 1 to Node 3, 2 to Node 2 and 3 to Node 1. It assigns the role of master to Node 3 because it has the lowest priority value. Proxy 2 uses a basic router to route all reads and writes to any node assigned the master role, Node 3 by default.
In a Galera cluster, an uneven number of nodes, e.g. 5 or 7, is strongly advised in order to avoid split-brain situations.
MaxScale can be used for load balancing.
One service with one listener port (readwritesplit), or two services with a read and a write listener port (readconnroute)
Traffic separated by queries: each service configured with a NamedServerFilter with a pattern match on the queries to be separated to a specific server
[NamedServerFilter]
type=filter
module=namedserverfilter
match= *from *users
options=ignorecase
server=server3

[MyService]
type=service
router=readwritesplit
servers=server1,server2,server3
user=myuser
password=mypassword
filters=NamedServerFilter