MySQL HA reloaded - old tricks and cool new tools to guarantee high availability to your MySQL Servers

High Availability
Reloaded

IVAN ZORATTI
Chief Technology Officer

Oracle, MySQL and InnoDB are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. 1201.01.01
Tuesday, 24 January 12

Agenda

• SkySQL - 3 (+1) slides!
• A bit of theory
• High availability solutions
• ...and the famous last words!



SkySQL Ab
• Funded by:
–MySQL® AB founders Monty Widenius and David Axmark
–US Investment group OnCorps.org
• A team of 40, operating in 14 countries, 90% from MySQL® AB
• Backed by:
–Product Engineering MontyProgram Ab
–Top Community contributors, commercial partners and end users



SkySQL Offering
• SkySQL Enterprise Subscriptions
– Monitoring, Administration and End User tools
– Specialised modules for High Availability and performance improvements (1)
• SkySQL Enterprise Cluster and SkySQL Enterprise HA
– Up to L3 Technical and Consultative Support for the most used MySQL® distributions and branches
• SkySQL Consulting
– Top class team for MySQL® technology
– Extended service offering from Health Check to continuous administration
• SkySQL Training
– MySQL® Training and Certification

(1) Option


The SkySQL Reference Architecture
[Diagram: Components, Integration Tools, Migration Tools]



High Availability
...a bit of theory



High Availability

“High availability is a system design protocol and
associated implementation that ensures a
certain degree of operational continuity during a
given measurement period.”



Fault-tolerant?

“Fault-tolerant design enables a system
to continue operation,
possibly at a reduced level
(also known as graceful degradation),
rather than failing completely,
when some part of the system fails.”



Switchover / Failover

• Switchover
– “Switchover is the capability to manually switch over from
one system to a redundant or standby computer server,
system, or network upon the failure or abnormal termination
of the previously active server, system, or network.”

• Failover
– “Failover is the capability to switch over automatically to a
redundant or standby computer server, system, or network
upon the failure or abnormal termination of the previously
active application, server, system, or network.”

• Aided Switchover?
• Failback?
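
The difference between failover and an aided switchover can be sketched as a small failure detector. This is only an illustration: the class name, the `threshold` of failed checks and the returned action strings are assumptions, not part of any tool mentioned in these slides.

```python
from dataclasses import dataclass


@dataclass
class FailureDetector:
    """Counts consecutive failed health checks for the active server."""
    threshold: int = 3       # hypothetical: failed checks before acting
    automatic: bool = True   # True = failover, False = aided switchover
    failures: int = 0

    def observe(self, healthy: bool) -> str:
        """Record one health-check result and return the action to take."""
        if healthy:
            self.failures = 0
            return "none"
        self.failures += 1
        if self.failures < self.threshold:
            return "wait"
        # The active server is now considered down.
        # Failover acts on its own; an aided switchover asks a human first.
        return "failover" if self.automatic else "alert-operator"


detector = FailureDetector(threshold=3, automatic=False)
print([detector.observe(ok) for ok in (True, False, False, False)])
```

With `automatic=False` the detector only raises an alert, which leaves the decision to switch with an operator.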


Downtime

• Planned/Scheduled

• Unplanned/Unscheduled

• “Downtime or outage duration refers to a
period of time that a system fails to provide or
perform its primary function.”
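
Downtime is often budgeted from an availability target. A minimal sketch of that arithmetic, assuming a 365-day measurement period (the function name is made up for this example):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day measurement period


def downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% -> {minutes:.1f} minutes/year")
```

The budget covers planned and unplanned downtime alike unless the SLA says otherwise.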



Single Point Of Failure - SPOF

“A Single Point of
Failure, (SPOF),
is a part of a system
which, if it fails,
will stop the entire
system from working.”
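
A quick way to look for SPOFs is to list every layer of the stack and count its redundant instances. The sketch below assumes a simple layered model; the layer and host names are invented for illustration.

```python
from typing import Dict, List


def single_points_of_failure(layers: Dict[str, List[str]]) -> List[str]:
    """Return the layers that are served by at most one instance."""
    return [name for name, instances in layers.items() if len(instances) <= 1]


# Hypothetical stack: the database is redundant, but other layers are not.
stack = {
    "load balancer": ["lb1"],
    "application": ["app1", "app2"],
    "database": ["db1", "db2"],
    "storage": ["san1"],
}
print(single_points_of_failure(stack))
```

A redundant database does not help much when the load balancer or the storage in front of or behind it is a single instance.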



Disaster Recovery and Business Continuity

“Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.”

“Disaster recovery planning is a subset of a larger process known as business continuity planning and should include planning for resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure.”


Designing a Highly Available System

• Which level of High Availability do I need?

• Do I require no loss of data?

• Do I need failover or is switchover enough?

• Can I provide a reasonable service when a
component is down?



Something to clarify...

• Availability vs Scalability

• HA Costs

• HA for your systems, not only for your
database

• Review your SLAs
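
Why HA matters for the whole system and not only for the database can be shown with a little arithmetic. This sketch assumes components fail independently, which is rarely exactly true; the component names and figures are invented.

```python
from typing import Dict


def serial_availability(components: Dict[str, float]) -> float:
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components.values():
        result *= availability
    return result


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of N interchangeable copies where one is enough."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures: a very available database behind weaker components.
system = {"network": 0.999, "application": 0.999, "database": 0.9999}
print(f"whole system: {serial_availability(system):.4%}")
print(f"two 99% servers: {redundant_availability(0.99, 2):.4%}")
```

The chain is always less available than its weakest component, so the SLA should be checked against the whole chain.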



High Availability
Solutions



High Availability with MySQL
[Diagram: arrow labelled "Higher Availability" alongside the list]

• Combined solutions
• Shared nothing distributed cluster with MySQL Cluster
• Geographical Replication for disaster recovery
• Virtualised Environments
• Active/Passive Clusters through shared storage
• MySQL synchronous replication
• Generic synchronous replication
• MySQL Replication with agents and failover
• MySQL Replication


MySQL Replication
• Something you may have missed...
–Asynchronous or Semi-synchronous
–Pros and Cons of RBR vs SBR
–Mono-thread pull from the slaves
–sync_binlog = 0/1
–Antelope vs Barracuda
–Group Commit
–Multi-engines
–Rolling upgrades
[Diagram: a Read-Write server with its binlog and Read-Only servers with their relaylogs]

MySQL Replication with MMM
(Multi-Master replication Manager)
• Master-Master features:
–Monitoring
–Automatic failover
–Data backup
–Resynch
• http://code.openark.org/blog/mysql/problems-with-mmm-for-mysql
• http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/
[Diagram: mmm_mond, servers running mmm_agentd, Read-Write and Read-Only servers with binlog and relaylog]

http://mysql-mmm.org


MySQL Replication with MHA

• Something to consider...
–read-only=1 and log-bin on slaves
–Master IP failover
–Filtering rules
–multi-tier replication

http://code.google.com/p/mysql-master-ha/
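
When a master fails, some slave has to be promoted. The sketch below shows one plausible selection rule and is not MHA's actual code: it assumes each slave reports the master binlog coordinates it has read (the `Master_Log_File` and `Read_Master_Log_Pos` fields of `SHOW SLAVE STATUS`) and that only slaves with log-bin enabled are candidates. The host names and the `SlaveStatus` type are hypothetical.

```python
from typing import List, NamedTuple


class SlaveStatus(NamedTuple):
    host: str
    log_bin: bool              # the slave writes its own binary log
    master_log_file: str       # e.g. "mysql-bin.000012"
    read_master_log_pos: int


def binlog_sequence(file_name: str) -> int:
    """Extract the numeric suffix of a binlog file name."""
    return int(file_name.rsplit(".", 1)[1])


def most_advanced(slaves: List[SlaveStatus]) -> SlaveStatus:
    """Pick the log-bin enabled slave that has read the most from the master."""
    candidates = [s for s in slaves if s.log_bin]
    # Raises ValueError when no slave qualifies as a candidate.
    return max(
        candidates,
        key=lambda s: (binlog_sequence(s.master_log_file), s.read_master_log_pos),
    )


slaves = [
    SlaveStatus("slave1", True, "mysql-bin.000012", 4021),
    SlaveStatus("slave2", True, "mysql-bin.000013", 107),
    SlaveStatus("slave3", False, "mysql-bin.000013", 950),
]
print(most_advanced(slaves).host)
```

Filtering rules and multi-tier topologies complicate this choice, which is why they appear in the list of things to consider.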


Tungsten Replicator
• Open Source, heterogeneous replication
• Truly multi-master and fan-in with Global ID
• Per-schema multi-thread
[Diagram: a Read-Write server and other servers, each with a Replicator agent]

http://code.google.com/p/tungsten-replicator/


Tungsten Enterprise

• Tungsten Replicator +
–Client Connector with R/W split and load balancing
–Replication Monitoring
–Integrated backup
[Diagram: Connectors in front of a Read-Write server and other servers, each with Replicator + Monitor]

http://www.continuent.com/solutions/overview


Synchronous Replication with DRBD
• Typical Active/Standby
• Cross active/active servers implementations
• Possible issues:
–Dependencies
–Infrastructure SPOFs
–Write performance impact
–InnoDB only
• DRBD in a virtualized environment
[Diagram: Active/Hot Server and Passive/Std-by Server, each with its own Block Device]

Synchronous Replication through DRBD
Configuration
[Diagram: Active/Hot Server and Passive/Std-by Server (labelled 15 and 16), each with a Block Device /dev/sdb mounted on /mysqldata]
Gateway: 192.168.1.1
Network: 192.168.1.X
VIP: 192.168.1.2
HB1: 10.0.3.X
HB2: 10.0.4.X
DRBD: 10.0.5.X


Synchronous Replication with Galera

• Synchronous replication for InnoDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution
[Diagram: Read-Write nodes connected through wsrep]

http://www.codership.com


Percona XtraDB Cluster

• Alpha version of Galera + XtraDB
• Multi-master, no SPOF
• Application failover must be managed
• Conflict resolution with aborted COMMITs
• Auto Increment
• No XA TXN
• NoPK operations issues
[Diagram: Read-Write nodes connected through wsrep]

http://www.percona.com/doc/percona-xtradb-cluster
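
Because conflicts are resolved by aborting the COMMIT, the application has to be ready to run the transaction again. A minimal retry sketch follows; `TransactionAborted` is a hypothetical stand-in for whatever error the database driver raises, and the attempt and backoff values are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransactionAborted(Exception):
    """Hypothetical stand-in for the driver error raised on an aborted COMMIT."""


def run_with_retry(transaction: Callable[[], T], attempts: int = 3,
                   backoff_seconds: float = 0.05) -> T:
    """Run the whole transaction again when its COMMIT is aborted."""
    for attempt in range(1, attempts + 1):
        try:
            return transaction()
        except TransactionAborted:
            if attempt == attempts:
                raise  # give up and let the caller decide
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def transfer() -> str:
    """Simulated transaction whose COMMIT is aborted on the first two tries."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransactionAborted("conflict detected at commit")
    return "committed"


print(run_with_retry(transfer))
```

The callable must contain the complete transaction, from the first statement to the COMMIT, so that a retry starts from a clean state.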


SchoonerSQL
• Synchronous master-slave replication for InnoDB
• Retrieve/Inject in the transaction log and buffer pool
• Monitoring/Administration tool
• Closed source


Active/Passive Clusters using Shared Storage

• Points to consider:
–Redundancy and replication must be guaranteed by the shared storage (and this is not trivial)
–InnoDB only
–File Systems
[Diagram: Active/Hot Server and Passive/Std-by Server attached to Shared Storage]


Active/Passive Clusters using Shared Storage
Large Deployments

[Diagram: virtual IPs VIP01 to VIP08, instances in01 to in08 and data volumes 01 to 08 on Shared Storage]


Active/Passive Clusters using Shared Storage
Failover in Large Deployments

[Diagram: VIP01 and VIP05, with instances in01 and in05, shown moved apart from the other VIPs and instances; data volumes 01 to 08 remain on Shared Storage]


Virtualised Environments
• Data storage, high availability and load balancing are provided and managed by the virtualised software
• In case of fault, the virtualised software restarts on any other physical server
• MySQL Replication for disaster recovery
• InnoDB only
[Diagram: virtual machines 01 to 08 on physical servers, with volumes 01 to 08 on Shared Storage]

Geographical Replication for Disaster Recovery
• Master-Master Asynchronous Replication is used to update the backup data centre
• In case of fault, the network traffic is redirected to the backup data centre. Failback must be executed manually
• Cross-platform and cross-engine
[Diagram: Active Data Centre and Backup Data Centre]


Storage Snapshots for Disaster Recovery
• Snapshots are managed by the NAS and SAN firmware. There is usually a short read-only freeze
• Snapshots can be used as run-time backup
• InnoDB only, NetApp NASs and firmware are certified using Snapshot and Snapmirror
[Diagram: Active Data Centre and Backup Data Centre]


MySQL Cluster
• Shared-nothing, fully transactional and distributed architecture used for high volume and small transactions.
• MySQL Cluster is based on the NDB (Network DataBase) Storage Engine
• Data is distributed for scalability and performance, and it is replicated for redundancy on multiple data nodes.
• Nodes in a cluster:
– SQL Nodes: provide the SQL interface to NDB
– API Nodes: provide the native NDB API
– Data Nodes: store and retrieve data, manage transactions
– Management Nodes: manage the Cluster
• Load balanced
• Memory or disk-based
• Geographically replicated for disaster recovery with conflict resolution
• Full online operation for maintenance and administration
[Diagram: Application Nodes (NDB API, ClusterJ/JPA), SQL Nodes, Data Nodes and Management Nodes]

Client-based Failover and Proxies

• Connector/J
–jdbc:mysql://[host][,failoverhost...][:port]

• mysqlnd_ms for PHP
–connection pooling for mysqli, mysql and PDO_MYSQL

• ScaleBase
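
The idea behind a client-side failover host list can be sketched in a few lines. This is a generic illustration, not how Connector/J or mysqlnd_ms are implemented: `connect` is a hypothetical callable standing in for a real driver call, and the host names are invented.

```python
from typing import Callable, Sequence, TypeVar

Connection = TypeVar("Connection")


def connect_with_failover(hosts: Sequence[str],
                          connect: Callable[[str], Connection]) -> Connection:
    """Try each host in order and return the first successful connection."""
    last_error = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as error:
            last_error = error  # remember why this host failed, try the next
    raise ConnectionError(f"no host reachable among {list(hosts)}") from last_error


def fake_connect(host: str) -> str:
    """Simulated driver call: only the standby answers."""
    if host == "db-primary":
        raise ConnectionError("primary is down")
    return f"connected to {host}"


print(connect_with_failover(["db-primary", "db-standby"], fake_connect))
```

Reconnecting is only half of the job: the application still has to deal with the transactions that were in flight when the first host went away.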



The absolutely necessary comparison chart...

Solution                 100% Data  All Storage  Automatic  Performance  Easy admin/  Scalability
                         Safe       Engines      Failover   Overhead     config       (*** - Best)
                                                            (* - Best)   (* - Best)
MySQL Replication        ✘          ✔            ✘!         *            *            **
MHA                      ✘          ✔            ✔          *            *            **
Tungsten                 ✔          ✔            ✔          **           *            ***
DRBD                     ✔          ✘            ✔          ***          ***          *
Galera / XtraDB Cluster  ✘          ✘            ✘!         **           *            **
SchoonerSQL              ✔          ✘            ✔          **           *            **
Shared Cluster           ✔          ✘            ✘!         -            *            *
VM                       ✔          ✘            ✔          **           *            *
Geo Replication          ✘          ✔            ✘          *            **           *
Storage Snapshots        ✘          ✘            ✘          **           **           *
MySQL Cluster            ✔          ✘            ✔          *            ***          **


The famous last words...
• I need 5 nines
–Implement what you really need
• Everything must be automatic
–Aided switchover is sometimes more effective, inexpensive and easy to implement/administer
• I want to migrate to MySQL Cluster
–Is your application designed for Cluster?
• I can’t afford to lose any data
–People lose data every day. Is the drop in performance worth it?
• I need a sub-second failover
–Check the timeout periods and the caching warm-ups
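
The sub-second failover question can be checked with a rough time budget. The sketch below is a pessimistic estimate that assumes the phases happen one after another; the parameter names and the figures in the example are invented.

```python
def failover_seconds(check_interval: float, failed_checks: int,
                     promotion: float, client_timeout: float,
                     warm_up: float = 0.0) -> float:
    """Rough time from failure to normal service, phases added in sequence."""
    detection = check_interval * failed_checks  # time to declare the server down
    return detection + promotion + client_timeout + warm_up


# Hypothetical figures: checks every second, 3 failed checks before acting,
# 2 seconds to promote the standby, 5 seconds of client connection timeout.
print(failover_seconds(check_interval=1.0, failed_checks=3,
                       promotion=2.0, client_timeout=5.0))
```

Even before any cache warm-up is counted, detection and client timeouts alone usually add up to several seconds.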


SkySQL Enterprise HA

• Full HA solution, supported on
–Platforms: Linux, Windows, Solaris X86
–DB Servers: Oracle MySQL, MariaDB, Percona Server
–2 to 3 days implementation guaranteed with acceptance tests

• Technologies:
–MySQL Replication
–DRBD Active/Passive or Cross Active
–MHA Tool with/without Multi-tier Replication
–Linux or Windows Shared Storage
–MySQL Cluster
–Tungsten Enterprise


Thank You!

ivan@skysql.com
izoratti.blogspot.com
mysql4all.wordpress.com


