2. Who am I?
o Santhinesh Kumar Nagendran
o Senior Database Administrator at Tesla Inc., with over 12 years of industry experience supporting environments such as healthcare and social networking applications like AOL, IBIBO and Sify. I primarily focus on database high availability and DB automation at large scale.
3. Agenda
1. Why HA?
2. HA Objectives
3. MySQL HA Solutions
4. Why MySQL GR?
5. Implementation
6. Conclusion
4. Why HA?
o Continuity of service with minimal or no interruption
o Improve operational standards by enabling:
  o Hardware upgrades (memory/CPU upgrades)
  o OS security patches
o Meet application, business, and customer SLAs
5. HA Objectives
o How reliable is your HA solution?
o Can we afford the complexity of fixing issues caused by an improper failover?
o What is the cost of no failover or manual failover versus fixing an unexpected improper failover?
o Do we have the skill set to support the HA solution we implement?
6. MySQL HA Solutions
o Master-Master replication with HAProxy
o MySQL MHA with Keepalived
o MySQL MHA
o InnoDB Cluster
8. M-M Replication with HAProxy
Good
o Failover happened seamlessly when the primary became inaccessible
o New connections went to the new master without any user interruption
o Read/write split using the respective TCP ports
o Repointing of replication to the new master
Bad
o Connections go back to the old primary if it comes back online in read-write mode
o Both master-master servers must be kept in read-write mode all the time
o Very high probability of accidental writes on both servers
o Fixing the data is a big mess
9. Existing Drawbacks and Future Requirements
o Too many HAProxy servers to handle when deployed at large scale
o Not cost-effective, as each 3-node cluster needed 2 HAProxy servers
o HAProxy is not designed specifically for MySQL or databases
o The old server must be removed from the config file immediately after a failover, to avoid failback when the failed server comes back online
o Customers have to go through non-DB components to reach the database
[Diagram: F5, HAProxy, DB Server]
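The failback problem on this slide (connections returning to the old primary once it is reachable again) can be illustrated with a small sketch. This is not HAProxy code or configuration: `pick_by_priority`, `pick_sticky`, and the host names `db1`/`db2` are hypothetical, and the model only assumes that the proxy prefers the first healthy server in its list.

```python
from typing import Dict, List, Optional


def pick_by_priority(servers: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy server in list order (fails back automatically)."""
    for name in servers:
        if healthy.get(name, False):
            return name
    return None


def pick_sticky(current: Optional[str], servers: List[str],
                healthy: Dict[str, bool]) -> Optional[str]:
    """Keep the current writer while it is healthy; otherwise pick the next one."""
    if current is not None and healthy.get(current, False):
        return current
    return pick_by_priority(servers, healthy)


servers = ["db1", "db2"]  # hypothetical hosts; db1 is the original primary

# db1 goes down: both strategies move writes to db2.
health = {"db1": False, "db2": True}
writer_priority = pick_by_priority(servers, health)
writer_sticky = pick_sticky("db1", servers, health)

# db1 comes back online in read-write mode.
health = {"db1": True, "db2": True}
failback_priority = pick_by_priority(servers, health)          # returns to db1
failback_sticky = pick_sticky(writer_sticky, servers, health)  # stays on db2
```

Removing the old server from the config after a failover has the same effect as the sticky choice here: the old primary cannot be picked again until someone adds it back.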
10. MySQL MHA with Keepalived
[Diagram: three servers (S1, S2, S3) shown before and after S1 crashed. Before: Master / RW, Slave1 / RO, Slave2 / RO. After: Master / RW, Slave2 / RO. Other labels: Users/App, Alias to MHA VIP, Keepalived VIP.]
o The Keepalived service should be running on the master and candidate masters
o MHA does the failover by stopping keepalived on the old master
11. MySQL MHA with Keepalived
Good
o Failover happened seamlessly when the primary became inaccessible
o New connections went to the new master without any user interruption
o A corrupt server goes out of the cluster by itself
o Keeps only one server in read-write mode; all other servers will be, or should be, in read-only mode
o Manual failover is possible with the existing master alive or dead
Bad
o The MHA manager daemon stops working to avoid another failover, so the DBA is asked to verify each failover
o Not a fully automatic solution; it requires manual intervention
o If the server goes unreachable due to a firewall issue
o Keepalived also fails over independently
12. Existing Drawbacks and Future Requirements
o Non-standard / custom monitoring required to monitor components and failures
o Needed a proper inventory and automation to support MHA clusters at large scale
o Too many false failovers caused by keepalived during network glitches
o Too many components for the customer to deal with in an HA setup
[Diagram: F5, Keepalived, MHA, DB Server]
13. MySQL MHA with F5
[Diagram: three servers (S1, S2, S3) behind F5 BigIP, shown before and after S1 crashed. Before: Master / RW, Slave1 / RO, Slave2 / RO. After: Master / RW, Slave2 / RO. Users/App connect through F5 BigIP.]
o F5 checks that the read_only parameter is off before sending traffic to prod
o MHA does the failover without any other VIP involved
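The read_only check that F5 performs can be sketched as follows. This is only an illustration of the idea, not an F5 monitor definition: `accepts_writes` and `FakeCursor` are hypothetical names, and the function assumes a Python DB-API style cursor connected to the MySQL server being probed.

```python
def accepts_writes(cursor) -> bool:
    """Return True when the server reports read_only = OFF (0)."""
    cursor.execute("SELECT @@global.read_only")
    row = cursor.fetchone()
    return row is not None and int(row[0]) == 0


class FakeCursor:
    """Stand-in for a DB-API cursor so the sketch runs without a server."""

    def __init__(self, read_only: int):
        self._read_only = read_only
        self.last_sql = None

    def execute(self, sql: str) -> None:
        self.last_sql = sql

    def fetchone(self):
        return (self._read_only,)


master_gets_traffic = accepts_writes(FakeCursor(0))  # read_only is OFF
slave_gets_traffic = accepts_writes(FakeCursor(1))   # read_only is ON
```

With a check like this, only the server that MHA has left in read-write mode receives production traffic, which is why no extra VIP is needed.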
14. MySQL MHA with F5
Good
o Failover happened seamlessly when the primary became inaccessible
o F5 checks for the server in read-write mode
o New connections went to the new master without any user interruption
o A corrupt server goes out of the cluster by itself
o Keeps only one server in read-write mode; all other servers will be in read-only mode
o Manual failover is possible with the existing master alive or dead
Bad
o The MHA manager daemon stops working to avoid another failover, so it is the DBA's job to verify each failover completely
o Not a fully automatic solution; it requires manual intervention
o Non-standard / custom monitoring of components like mha_manager etc.
o Complicated setup to support at large scale
15. Existing Drawbacks and Future Requirements
o Non-standard / custom monitoring required to alert on failures
o Needed a proper inventory and automation to support MHA clusters at large scale
o Too many false failovers caused by keepalived during network glitches
o Too many components for the customer to deal with in an HA setup
[Diagram: F5, MHA, DB Server]
17. InnoDB Cluster
Good
o Powered by MySQL Shell; mysqlsh makes setting up an InnoDB Cluster extremely easy
o MySQL Router servers can support multiple InnoDB Clusters, which is a great relief
o util.checkForServerUpgrade() makes a DBA's life easy and saves a lot of time
o Filtered multi-master replication
Notes
o The default user authentication plugin changed from mysql_native_password (5.7) to caching_sha2_password (8.0)
o Replication between multi-zonal clusters can be challenging when a failover happens
o Can have replication between multiple InnoDB Clusters
18. How do you Monitor?
o There are many ways to monitor
o Cluster status is available from MySQL Shell
o Cluster status can also be fetched from performance_schema.replication_group_members
o If we store the clusters in a proper inventory, we can monitor the respective clusters and set up alerts for events like:
  o If a node drops out of a 3-node cluster, the cluster status goes to OK_NO_TOLERANCE
  o Alert when the number of active group members is not equal to the number of servers in that cluster as per the inventory
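The inventory-based alerting described on this slide can be sketched roughly as below. The function names, the sample hosts, and the alert wording are hypothetical; the query uses the MEMBER_HOST and MEMBER_STATE columns of performance_schema.replication_group_members, and the rows are assumed to have been fetched already by whatever client the monitoring system uses.

```python
from typing import Iterable, List, Tuple

# Query against the table named on the slide (MySQL 8.0 column names).
MEMBERS_SQL = (
    "SELECT MEMBER_HOST, MEMBER_STATE "
    "FROM performance_schema.replication_group_members"
)


def online_members(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Hosts whose MEMBER_STATE is ONLINE."""
    return [host for host, state in rows if state == "ONLINE"]


def check_cluster(rows: Iterable[Tuple[str, str]], expected_size: int) -> List[str]:
    """Return alert messages; an empty list means nothing to report."""
    online = online_members(rows)
    alerts = []
    if len(online) != expected_size:
        alerts.append(f"{len(online)} of {expected_size} members ONLINE")
    # A group needs a majority to keep quorum, so a group of n members can
    # survive (n - 1) // 2 further failures. Assumes failed members have
    # already been removed from the group view.
    if online and (len(online) - 1) // 2 == 0:
        alerts.append("no fault tolerance left")
    return alerts


# Hypothetical 3-node cluster from the inventory.
healthy = [("db1", "ONLINE"), ("db2", "ONLINE"), ("db3", "ONLINE")]
degraded = [("db1", "ONLINE"), ("db2", "ONLINE")]  # db3 dropped out

healthy_alerts = check_cluster(healthy, expected_size=3)
degraded_alerts = check_cluster(degraded, expected_size=3)
```

The second alert corresponds to the OK_NO_TOLERANCE case: with two of three members left, the group still works but cannot lose another node.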
19. Conclusion
o Group Replication has always been one of the best in-house products for MySQL
o Empowered by MySQL Router and MySQL Shell utilities
o One of the best and most stable HA solutions I have worked on so far