MySQL HA with Pacemaker

Kris Buytaert
#opendbcamp

Kris Buytaert
● I used to be a Dev, Then Became an Op,
● Today I feel like a dev again
● Senior Linux and Open Source Consultant @inuits.be
● „Infrastructure Architect“
● Building Clouds since before the Cloud
● Surviving the 10th floor test
● Co-Author of some books
● Guest Editor at some sites

In this presentation
● High Availability ?
● MySQL HA Solutions
● Linux HA / Pacemaker

What is HA Clustering ?

● One service goes down
=> others take over its work
● IP address takeover, service takeover,
● Not designed for high-performance
● Not designed for high throughput (load balancing)

Lies, Damn Lies, and
Statistics
Counting nines
(slide by Alan R)

Availability   Downtime per year
99.9999%       30 sec
99.999%        5 min
99.99%         52 min
99.9%          9 hr
99%            3.5 days
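
The downtime figures above are rounded results of simple arithmetic on one year. A minimal sketch of that calculation, assuming a 365-day year (the function name is ours, not from the slides):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for nines in (99.9999, 99.999, 99.99, 99.9, 99.0):
        seconds = downtime_seconds_per_year(nines)
        print(f"{nines}% -> {seconds / 60:.1f} minutes per year")
```

Each extra nine divides the allowed downtime by ten, which is why the last nines are the expensive ones.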

The Rules of HA

● Keep it Simple
● Keep it Simple
● Prepare for Failure
● Complexity is the enemy of reliability
● Test your HA setup

Eliminating the SPOF
● Find out what Will Fail
• Disks
• Fans
• Power (Supplies)
● Find out what Can Fail
• Network
• Going Out Of Memory

Data vs Connection
● DATA :
• Replication
• Shared storage
• DRBD
● Connection
• LVS
• Proxy
• Heartbeat / Pacemaker

Shared Storage
● 1 MySQL instance
● Monitor MySQL node
● Stonith
● $$$ 1+1 <> 2
● Storage = SPOF
● Split Brain :(

DRBD
● Distributed Replicated Block Device
● In the Linux Kernel
● Usually only 1 mount
• Multi mount as of 8.X
• Requires GFS / OCFS2
● Regular FS ext3 ...
● Only 1 MySQL instance Active accessing data
● Upon Failover MySQL needs to be started on
other node

DRBD(2)
● What happens when you pull the plug of a
Physical machine ?
• Minimal Timeout
• Why did the crash happen ?
• Is my data still correct ?
• Innodb Consistency Checks ?
• Lengthy ?
• Check your BinLog size

Other Solutions Today

● MySQL Cluster NDBD
● Multi Master Replication
● MySQL Proxy
● MMM
● Flipper
● BYO
● ....

Pulling Traffic
● Eg. for Cluster, MultiMaster setups
• DNS
• Advanced Routing
• LVS

• Or the upcoming slides

Linux-HA PaceMaker
● Plays well with others
● Manages more than MySQL
● ...v3 .. don't even think about the rest anymore
● http://clusterlabs.org/

Heartbeat v1
• Max 2 nodes
• No fine-grained resources
• Monitoring using “mon”

/etc/ha.d/ha.cf
/etc/ha.d/haresources
mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 IPaddr2::10.16.0.13/16/bond0.16 mon

/etc/ha.d/authkeys

Heartbeat v2
• Stability issues
• Forking ?

“A consulting Opportunity”
LMB

Clone Resource
Clones in v2 were buggy
Resources were started on 2 nodes
Stopped again on “1”

Heartbeat v3

• No more /etc/ha.d/haresources
• No more xml
• Better integrated monitoring
• /etc/ha.d/ha.cf has
• crm=yes

Pacemaker ?
● Not a fork
● Only CRM Code taken out of Heartbeat
● As of Heartbeat 2.1.3
• Support for both OpenAIS / HeartBeat
• Different release cycle than Heartbeat

Heartbeat, OpenAIS, Corosync ?
● All Messaging Layers
● Initially only Heartbeat
● OpenAIS
● Heartbeat got unmaintained
● OpenAIS had heisenbugs :(
● Corosync
● Heartbeat maintenance taken over by LinBit
● CRM Detects which layer

Pacemaker

Heartbeat or OpenAIS

Cluster Glue

Pacemaker Architecture
● stonithd: The Heartbeat fencing subsystem.
● lrmd: Local Resource Management Daemon. Interacts
directly with resource agents (scripts).
● pengine: Policy Engine. Computes the next state of the
cluster based on the current state and the configuration.
● cib: Cluster Information Base. Contains definitions of all
cluster options, nodes, resources, their relationships to
one another and current status. Synchronizes updates to
all cluster nodes.
● crmd: Cluster Resource Management Daemon. Largely
a message broker for the PEngine and LRM, it also elects
a leader to co-ordinate the activities of the cluster.
● openais: messaging and membership layer.
● heartbeat: messaging layer, an alternative to OpenAIS.
● ccm: Short for Consensus Cluster Membership. The
Heartbeat membership layer.

Configuring Heartbeat Correctly

heartbeat::hacf {"clustername":
  hosts   => ["host-a","host-b"],
  hb_nic  => ["bond0"],
  hostip1 => ["10.0.128.11"],
  hostip2 => ["10.0.128.12"],
  ping    => ["10.0.128.4"],
}

heartbeat::authkeys {"ClusterName":
  password => "ClusterName",
}

http://github.com/jtimberman/puppet/tree/master/heartbeat/

CRM
● Cluster Resource Manager
● Keeps Nodes in Sync
● XML Based
● cibadmin
● Cli manageable
● crm

configure
property $id="cib-bootstrap-options" \
        stonith-enabled="FALSE" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="FALSE"
rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="1" \
        failure-timeout="1"
primitive d_mysql ocf:local:mysql \
        op monitor interval="30s" \
        params test_user="sure" test_passwd="illtell" test_table="test.table"
primitive ip_db ocf:heartbeat:IPaddr2 \
        params ip="172.17.4.202" nic="bond0" \
        op monitor interval="10s"
group svc_db d_mysql ip_db
commit

Heartbeat Resources
● LSB
● Heartbeat resource (+status)
● OCF (Open Cluster Framework) (+monitor)
● Clones (don't use in HAv2)
● Multi State Resources

LSB Resource Agents
● LSB == Linux Standards Base
● LSB resource agents are standard System V-
style init scripts commonly used on Linux and
other UNIX-like OSes
● LSB init scripts are stored under /etc/init.d/
● This enables Linux-HA to immediately support
nearly every service that comes with your
system, and most packages which come with
their own init script
● It's straightforward to change an LSB script to
an OCF script

OCF
● OCF == Open Cluster Framework
● OCF Resource agents are the most powerful type of
resource agent we support
● OCF RAs are extended init scripts
• They have additional actions:
• monitor – for monitoring resource health
• meta-data – for providing information about the RA

● OCF RAs are located in
/usr/lib/ocf/resource.d/provider-name/

Monitoring
● Defined in the OCF Resource script
● Configured in the parameters
● You have to support multiple states
• Not running
• Running
• Failed
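
The three states above correspond to exit codes of the resource agent's monitor action. A minimal sketch of that mapping, using the OCF codes 0, 1 and 7; the enum and function names are ours, and the sketch deliberately ignores the extra codes used by multi state resources:

```python
from enum import Enum

# Exit codes from the OCF resource agent API.
OCF_SUCCESS = 0       # resource is running
OCF_ERR_GENERIC = 1   # generic failure
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


class ResourceState(Enum):
    RUNNING = "running"
    NOT_RUNNING = "not running"
    FAILED = "failed"


def classify_monitor_result(exit_code: int) -> ResourceState:
    """Map a monitor exit code onto the three states listed above."""
    if exit_code == OCF_SUCCESS:
        return ResourceState.RUNNING
    if exit_code == OCF_NOT_RUNNING:
        return ResourceState.NOT_RUNNING
    # Simplification: every other code is treated as a failure.
    return ResourceState.FAILED
```

The distinction between "not running" and "failed" matters: a cleanly stopped resource can simply be started, while a failed one has to be recovered first.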

Anatomy of a Cluster
config

• Cluster properties
• Resource Defaults
• Primitive Definitions
• Resource Groups and Constraints

Cluster Properties

property $id="cib-bootstrap-options" \
        stonith-enabled="FALSE" \
        no-quorum-policy="ignore" \
        start-failure-is-fatal="FALSE"

no-quorum-policy="ignore": we ignore the loss of quorum, because a 2 node cluster loses quorum as soon as one node is gone.

start-failure-is-fatal: when set to FALSE, a failed start does not rule out further start attempts on that node; the cluster instead uses the resource's failcount and the value of resource-failure-stickiness.

Resource Defaults

rsc_defaults $id="rsc_defaults-options" \
        migration-threshold="1" \
        failure-timeout="1" \
        resource-stickiness="INFINITY"

failure-timeout means that after a failure there will be a 60 second timeout before the resource can come back to the node on which it failed.

migration-threshold=1 means that after 1 failure the resource will try to start on the other node.

resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
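
How migration-threshold and failure-timeout interact can be sketched as a small decision function. This is a simplified model for illustration, not Pacemaker's actual policy engine; the function and parameter names are ours:

```python
from typing import Optional


def may_run_on_node(fail_count: int,
                    migration_threshold: int,
                    seconds_since_last_failure: Optional[float],
                    failure_timeout: float) -> bool:
    """Return True if the resource is still allowed on the node where it failed."""
    # Once the failure timeout has passed, the failure is forgotten
    # and no longer counts against the node.
    if (seconds_since_last_failure is not None
            and failure_timeout > 0
            and seconds_since_last_failure >= failure_timeout):
        fail_count = 0
    # The node is ruled out once the fail count reaches the threshold.
    return fail_count < migration_threshold
```

With a threshold of 1 and the 60 second timeout from the explanation above, `may_run_on_node(1, 1, 10, 60)` is False (the resource has to move), while `may_run_on_node(1, 1, 61, 60)` is True again.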

Primitive Definitions

primitive d_mine ocf:custom:tomcat \
        params instance_name="mine" \
        monitor_urls="health.html" \
        monitor_use_ssl="no" \
        on-fail="restart"

primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
        params ip="10.8.4.131" cidr_netmask="16" nic="bond0"

Parsing a config
● Isn't always done correctly
● Even a verify won't find all issues
● Unexpected behaviour might occur

Where a resource runs
● Multi state resources
• Master – Slave
• e.g. MySQL master-slave, DRBD
● Clones
• Resources that can run on multiple nodes, e.g.
• Multimaster MySQL servers
• MySQL slaves
• Stateless applications
● location
• Preferred location to run a resource, e.g. based on hostname
● colocation
• Resources that have to live together
• e.g. IP address + service
● order
• Defines which resource has to start first, or wait for another resource
● groups
• Colocation + order

eg. A Service on DRBD
● DRBD can only be active on 1 node
● The filesystem needs to be mounted on that
active DRBD node

group svc_mine d_mine ip_mine

ms ms_drbd_storage drbd_storage \
        meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" \
        notify="true"

colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master

order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start

location cli-prefer-svc_db svc_db \
        rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a

A MySQL Resource
● OCF
• Clone
• Where do you hook up the IP ?
• Multi State
• But we have Master Master replication
• Meta Resource
• Dummy resource that can monitor
• Connection
• Replication state

Simple 2 node example
primitive d_mysql ocf:ntc:mysql \
        params test_user="just" test_passwd="kidding" test_table="really"

primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
        params ip="10.8.0.30" cidr_netmask="24" nic="bond0"

group svc_mysql d_mysql ip_mysql_svc

Monitor your Setup
● Not just connectivity
● Also functional
• Query data
• Check resultset is correct
● Check replication
• MaatKit
• OpenARK
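
A functional check goes one step further than a connectivity check: it runs a query and compares the resultset with what is expected. A minimal sketch using Python's DB-API; the in-memory SQLite database, the table name and the function names are stand-ins of ours so the example runs without a MySQL server:

```python
import sqlite3


def functional_check(connection, query: str, expected_rows) -> bool:
    """Run a query through a DB-API connection and compare the full resultset."""
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return list(cursor.fetchall()) == list(expected_rows)
    except Exception:
        # Any error (connection lost, table missing) counts as a failed check.
        return False


def demo_connection():
    """Build an in-memory SQLite database as a stand-in for the monitored server."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE heartbeat_check (id INTEGER, note TEXT)")
    connection.execute("INSERT INTO heartbeat_check VALUES (1, 'alive')")
    connection.commit()
    return connection


if __name__ == "__main__":
    conn = demo_connection()
    ok = functional_check(conn, "SELECT id, note FROM heartbeat_check", [(1, "alive")])
    print("functional check passed" if ok else "functional check FAILED")
```

Against a real server the same function would be handed a connection from a MySQL driver instead of the SQLite stand-in.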

How to deal with replication state ?
● Multiple slaves
• Use the DRBD OCF resource
● 2 masters only: use your own script
• Replication is slow on the active node
=> Shouldn't happen, talk to HR / config management people
• Replication is slow on the passive node
=> Weight--
• Replication breaks on the active node
=> Send out a warning, don't modify weights and check the other node
• Replication breaks on the passive node
=> Fence off the passive node
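
The four cases above boil down to a lookup from (node role, replication state) to an action. A minimal sketch of such a decision table, as a custom script for the 2 master setup might use it; the names and action strings are ours:

```python
# Decision table for the four cases listed above.
ACTIONS = {
    ("active", "slow"): "escalate: should not happen",
    ("passive", "slow"): "lower weight",
    ("active", "broken"): "warn, keep weights, check other node",
    ("passive", "broken"): "fence passive node",
}


def replication_action(role: str, state: str) -> str:
    """Return the action for a node role ('active'/'passive') and replication state."""
    if state == "ok":
        return "none"
    try:
        return ACTIONS[(role, state)]
    except KeyError:
        raise ValueError(f"unknown role/state combination: {role!r}, {state!r}")


if __name__ == "__main__":
    for role in ("active", "passive"):
        for state in ("ok", "slow", "broken"):
            print(f"{role:8} {state:7} -> {replication_action(role, state)}")
```

Keeping the policy in one table makes it easy to review and to test, which fits the "Keep it Simple" rule.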

Adding MySQL to the stack

[Diagram: layered stack, top to bottom]
MySQL: Replication, Service IP
Resource: "MySQLd", "MySQLd"
Cluster Stack: Pacemaker, HeartBeat
Hardware: Node A, Node B

Pitfalls & Solutions
● Monitor,
• Replication state
• Replication Lag

● MaatKit
● OpenARK

Conclusion
● Plenty of Alternatives
● Think about your Data
● Think about getting Queries to that Data
● Complexity is the enemy of reliability
● Keep it Simple
● Monitor inside the DB

Contact
Kris Buytaert Kris.Buytaert@inuits.be

Further Reading
@KrisBuytaert
http://www.krisbuytaert.be/blog/
http://www.inuits.be/
http://www.virtualization.com/
http://www.oreillygmt.com/

Inuits
't Hemeltje
Gemeentepark 2
2930 Brasschaat
891.514.231
+32 473 441 636

Esquimaux
Kheops Business Center
Avenue Georges Lemaître 54
6041 Gosselies
889.780.406
+32 495 698 668
