Join Krzysztof Książek, Senior Support Engineer at Severalnines and expert Database Administrator (DBA), for this 60-minute crash course on managing open source databases!
If you’ve recently been tasked with taking care of an open source database and are not sure how to proceed, or have worked with open source databases before but never had to manage them (in other words, you’re not a DBA), this webinar is for you!
What should you focus on? What are the most important tasks of a DBA?
Krzysztof will share some tips to help you get started on your adventure in open source database management and will try to make it as database-agnostic as possible. In the end, no matter what database you end up managing, some of the tasks and skills required are the same.
So if you’re a sysadmin, DevOps engineer, developer, system architect, or a DBA in search of a refresher session, make sure to watch this webinar replay!
AGENDA
- What does working with open source databases involve?
  - Runbooks
  - Testing procedures
- Automation
  - Why automate?
  - What to automate?
- Tips and tricks for typical procedures in open source database management
  - Backups
  - High Availability
  - Disaster Recovery
  - Monitoring
  - Health Checks
  - Performance Tuning
SPEAKER
Krzysztof Książek, Senior Support Engineer at Severalnines, is a MySQL DBA with experience managing complex database environments for companies like Zendesk, Chegg, Pinterest and Flipboard.
2. Copyright 2018 Severalnines AB
Your host & some logistics
I'm Jean-Jérôme from the Severalnines Team and I'm your host for today's webinar!
Feel free to ask any questions in the Questions section of this application or via the Chat box.
You can also contact me directly via the chat box or via email: jj@severalnines.com during or after the webinar.
What Problems do we Address?
Deploy
Deploy MySQL, Postgres or MongoDB - single instances or entire clusters
Monitor
Get a unified view of all clusters across all your data centers
Scale
Add/remove nodes, resize instances & clone your production clusters
Manage
Automatically repair & recover broken nodes or clusters. Test & automate upgrades
Deployment Features in ClusterControl
• Each cluster can be deployed and existing clusters can be imported.
• Web UI
  - Deployment Wizard
• CLI
  - Allows easy integration with e.g. Ansible
    s9s cluster \
      --create \
      --cluster-type=galera \
      --nodes='10.10.10.26;10.10.10.27;10.10.10.28' \
      --vendor=percona \
      --cluster-name=PXC_CENTOS7 \
      --provider-version=5.7 \
      --os-user=vagrant --wait
• Supports multiple NICs and templated configurations.
Monitoring Features in ClusterControl
• Database-specific stats and health status
  - Graphs and dashboards
• Host statistics
  - E.g. predictive disk space usage monitoring
• Query monitoring
  - E.g. top queries, outlier detection
• Advisors
  - Developer Studio with JS-like syntax
• Notifications
  - Email, PagerDuty, VictorOps, etc.
• Operational reports
Management Features in ClusterControl
• Availability
  - Node/cluster recovery
• Backup and restore
  - MySQL: mysqldump, xtrabackup
  - Postgres: pg_dump, pg_basebackup
  - MongoDB: mongodump, MongoDB Consistent Backup
• Configuration
• Upgrades
• Load balancers
  - HAProxy, ProxySQL, MaxScale
  - Keepalived
Agenda
• What is it like to work with databases?
• Automation
• Tips and tricks for typical management procedures
  - Backup
  - High Availability
  - Disaster Recovery
  - Monitoring
  - Health checks
  - Performance tuning
Working with databases - what’s it like?
• What is the most important asset?
  - Data!!!
• Safety of the data should always have the highest priority in your work
• Humans tend to make mistakes
  - This may have dire consequences for the data
• Always think twice before you act
  - Can the action endanger data?
  - Is the command you wrote something you really want to run?
  - Check for typos, check your path, check your host
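The "check your host" advice can be turned into a small guard that a destructive script calls before doing anything else. This is only a minimal sketch: the function name and the host names in the usage note are hypothetical and not taken from the webinar.

```python
import socket


def assert_expected_host(expected, actual=None):
    """Raise unless the current hostname is the one the operator intended.

    `actual` exists only so the check can be exercised without changing
    machines; in normal use it is read from the operating system.
    """
    actual = socket.gethostname() if actual is None else actual
    if actual != expected:
        raise RuntimeError(
            f"refusing to run: on {actual!r}, expected {expected!r}"
        )
    return actual
```

A cleanup script could start with `assert_expected_host("db-staging-1")` (a made-up host name), so that pasting it into the wrong terminal fails immediately instead of touching production data.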
Working with databases - what’s it like?
• You will work under pressure
  - Recovery from a data loss event
  - Recovery from a partial failure
  - Recovery from a total failure
• Stress increases the chance of a mistake
• Make sure you are properly prepared
• Create runbooks for every action you may need to take
  - Day-to-day operations
  - Recovery scenarios
• Runbooks can guide you through the process even if most of your brain has switched off - make them copy-and-paste friendly
• Test your runbooks: your system may change over time, and runbooks should change along with it
Working with databases - what’s it like?
• Test the resilience of your environment
• Design it to be highly available and then try to break it
• If you have a small environment, manual tests may be enough
• For larger setups, think about implementing something like Chaos Monkey
• Use your SLA wisely
  - If you are well within limits, use the remaining time for something riskier
  - Do a test failover
  - Try to improve automation around it
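The "remaining time" in an SLA can be made concrete with a little arithmetic. The sketch below assumes an availability SLA expressed as a percentage over a fixed window; the function names, the 30-day window and the example numbers are illustrative assumptions, not figures from the webinar.

```python
def downtime_budget_minutes(sla_percent, period_days=30):
    """Total downtime an availability SLA allows over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def remaining_budget_minutes(sla_percent, downtime_so_far, period_days=30):
    """Budget left for riskier work such as a planned test failover."""
    return downtime_budget_minutes(sla_percent, period_days) - downtime_so_far


if __name__ == "__main__":
    budget = downtime_budget_minutes(99.9)
    left = remaining_budget_minutes(99.9, downtime_so_far=10)
    print(f"allowed: {budget:.1f} min, remaining: {left:.1f} min")
```

Under these assumptions, a 99.9% SLA over 30 days allows roughly 43 minutes of downtime, so a month with 10 minutes of incidents still leaves room for a short, planned failover test.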
Automate, automate, automate!
• Automation is one of the key factors in a High Availability implementation
  - Eliminate human error as a factor
  - Scripts, once properly tested and run within a stable environment, should work the majority of the time
  - Scripts can replace runbooks - just run a command to perform all actions instead of going through them one after another
• Enjoy a variety of tools that will help you automate database management:
  - Bash
  - Perl, Python, Ruby, Go
  - Ansible, Chef, Puppet, Salt
  - ClusterControl CLI
• No matter how you do it, make it easy to read, understand and maintain
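One way to read "scripts can replace runbooks" is a tiny runner that executes the runbook's steps in order and stops at the first failure. The following is a sketch under assumptions: `run_steps` and its `runner` parameter are hypothetical names, and the commands passed in are whatever your own runbook contains.

```python
import subprocess


def run_steps(steps, runner=None):
    """Run runbook steps in order and stop at the first failure.

    `steps` is a list of (description, argv) pairs. `runner` takes an argv
    list and returns an exit code; by default it executes the real command.
    Returns the descriptions of the steps that completed.
    """
    if runner is None:
        def runner(argv):
            return subprocess.run(argv).returncode

    completed = []
    for description, argv in steps:
        code = runner(argv)
        if code != 0:
            # Do not continue past a failed step: later steps usually
            # assume the earlier ones succeeded.
            raise RuntimeError(f"step failed ({code}): {description}")
        completed.append(description)
    return completed
```

Keeping the step descriptions next to the commands keeps the script readable as a runbook, and the injectable `runner` makes it possible to rehearse the flow without touching a real server.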
Tips and tricks for database management
• All datastores are different
  - Yet they share some common patterns and processes
  - They require a similar approach
  - Priorities of the tasks performed by the administrator are also similar
Tips and tricks - Backups
• Data is your priority, you should protect it
• Backups will help you in a number of cases:
  - Data loss (a.k.a. rm -rf)
  - Hacks, including ransom hacks
  - Disasters which require the environment to be rebuilt from scratch
• Backups will be the last line of defense in maintaining availability of your database
  - It may take a long time to restore from backup, but at least you’ll be able to do so
• To be trusted, backups have to be tested
  - Schrödinger’s backup - you never know if it works or not unless you restore it
• Test your backups frequently
  - Ideally, all of them will be tested
  - Always do tests after a change in your backup process
Tips and tricks - Backups
• In addition to knowing if your backups work or not, it’s good to know how long it will take to restore them
• Knowing the recovery time helps when things go south and someone has to do disaster management
  - It looks better if you can give estimates of when service will be up again
  - It looks even better if those are realistic estimates
• The good part is that you already know your recovery process and how long it takes
  - You are testing your backups, right?
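Testing a backup and learning its restore time can happen in the same step. The sketch below is database-agnostic and makes no claim about any specific tool: `restore` and `verify` are hypothetical callables you would supply (for example, a function that restores into a scratch instance and one that runs a sanity query).

```python
import time


def timed_restore_test(restore, verify, clock=time.monotonic):
    """Run a restore, check the result, and report how long the restore took.

    `restore` performs the restore into a scratch location and returns a
    handle to the restored data; `verify` returns True if that data looks
    sane. Only the restore itself is timed, not the verification.
    """
    start = clock()
    restored = restore()
    elapsed = clock() - start
    return {"ok": bool(verify(restored)), "restore_seconds": elapsed}
```

Recording `restore_seconds` from every test run gives a history to base recovery estimates on, rather than a guess made in the middle of an incident.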
Tips and tricks - Backups
• Download The DevOps Guide to Database Backups for MySQL and MariaDB
Tips and tricks - High Availability
• Keep the service up - this is the second most important aspect of working with databases
• High Availability is all about removing all existing single points of failure and not introducing new ones
• You need to define your requirements for High Availability
  - You will have to decide what kind of failures your database should be able to handle.
  - Is it a failure of one or more servers (how many?)
  - Is it a failure of a whole datacenter?
  - Or even failures of multiple datacenters?
Tips and tricks - High Availability
• Once expectations are defined, it’s time to design. Three shall be the number thou shalt count
  - Two nodes/datacenters are not enough: you need a third one to handle a network split
• It may be just a single node or an arbitrator; as long as the "observer" is there, you can handle a single datacenter failure
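The reason two sites are not enough shows up as soon as you count votes. The sketch below assumes a plain majority rule with equal weight per node, which is a common approach but not necessarily how any particular datastore decides; the function names are illustrative.

```python
def has_quorum(reachable, total):
    """A partition may keep serving only if it sees a strict majority."""
    return 2 * reachable > total


def surviving_partitions(partition_sizes):
    """Given the node count on each side of a split, list sides with quorum."""
    total = sum(partition_sizes)
    return [size for size in partition_sizes if has_quorum(size, total)]


if __name__ == "__main__":
    print(surviving_partitions([1, 1]))  # two sites, split down the middle
    print(surviving_partitions([2, 1]))  # three voters, one isolated
```

With two voters split one against one, neither side holds a majority, so nothing can safely continue. Adding a third voter, even a lightweight arbitrator, means one side of any two-way split always has two of three votes.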
Tips and tricks - High Availability
• The magic "three" is just the beginning; there are questions you’ll have to answer:
  - How will your application connect to the database?
  - Are you going to use a proxy layer?
  - How will your application determine where to connect?
  - How will your environment be interconnected?
  - Will you write to multiple nodes or will you use a single writer?
  - Can your datastore re-provision failed nodes or will you have to do that?
  - Can your datastore perform automated recovery of failed nodes?
Tips and tricks - Disaster Recovery
• Again, planning is the key
  - Define the requirements
• Identify failure scenarios
  - Network split
  - Data loss
  - Node failure
  - Datacenter failure
  - Marketing that works too well
• Test recovery scenarios
  - Identify the time and effort required
  - Build runbooks and automation to make it fast
• What can you do?
  - Have a standby environment to fail over to
  - Have tested backups
  - Have tested recovery scripts
  - Have an automated way to scale out your cluster
Tips and tricks - Monitoring and trending
• You have to be able to understand the state of your database
  - Use software which will connect to it and collect metrics
  - Use software which will generate graphs from those metrics
  - You cannot watch everything in real time - graphs will help you understand what happened in the past
  - Graphs will help you see patterns
• You have to monitor your database’s state
  - Use software or scripts which will alert you if something goes wrong
• Make sure you track what you need
  - Make sure you don’t track what you don’t need - all alerts should be actionable
  - There’s nothing worse than being woken up at 3am by an alert you can’t do anything about
  - On-call staff will start ignoring pages
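"All alerts should be actionable" can be enforced in the alerting code itself by refusing rules that do not say what to do. This is a minimal sketch: `AlertRule`, `evaluate` and the metric names are hypothetical and not tied to any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str       # name of the metric to watch (made-up names below)
    threshold: float  # alert when the value rises above this
    action: str       # what the on-call person is expected to do


def evaluate(rules, metrics):
    """Return one message per rule whose metric exceeds its threshold."""
    alerts = []
    for rule in rules:
        if not rule.action.strip():
            # An alert nobody can act on only trains people to ignore pages.
            raise ValueError(f"rule for {rule.metric!r} has no action")
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(
                f"{rule.metric}={value} > {rule.threshold}: {rule.action}"
            )
    return alerts


if __name__ == "__main__":
    rules = [AlertRule("disk_used_pct", 90, "follow the disk cleanup runbook")]
    print(evaluate(rules, {"disk_used_pct": 95}))
```

Putting the action into the alert text means the page that wakes someone at 3am already points to the runbook to follow.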
Tips and tricks - Maintenance
• No matter what database you use, there will always be maintenance operations to run
• The most important task is to make them as smooth and low-impact as possible
  - You want the process to be stable and predictable
• One of the ways to achieve that is to test and standardize the process
  - Come up with a flow which is stable and reduces the impact to the minimum
  - Make sure everybody is using the same flow for a given operation
• To achieve it you have two methods:
  - Create runbooks for everyone to follow
  - Write code which will automate the process - once tested, human errors are eliminated
Tips and tricks - Maintenance
• Test, test, test…
  - Be cautious, make sure you do not impact production
  - Test every step you take
• Always have a rollback plan for the maintenance
  - Even if it seems to be a walk in the park
• Design the process to be controllable
  - Implement an option to pause it
  - Implement an option to throttle it
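A controllable maintenance job usually means working in small batches with a pause check and a throttle between them. The sketch below assumes the work can be split into independent batches; `run_in_batches`, `should_pause` and the default values are illustrative choices, not something prescribed in the webinar.

```python
import time


def run_in_batches(items, apply_batch, batch_size=1000, pause_seconds=0.5,
                   should_pause=lambda: False, sleep=time.sleep):
    """Apply a change in small batches, with a pause check and a throttle.

    `should_pause` is polled before each batch (for example, it could check
    for a flag file an operator creates); while it returns True the job
    waits. After every batch the job sleeps `pause_seconds` to limit load.
    Returns the number of items processed.
    """
    done = 0
    for start in range(0, len(items), batch_size):
        while should_pause():
            sleep(pause_seconds)  # paused: wait until the operator resumes
        batch = items[start:start + batch_size]
        apply_batch(batch)
        done += len(batch)
        sleep(pause_seconds)      # throttle between batches
    return done
```

Raising `pause_seconds` or lowering `batch_size` slows the job down when production starts to feel the impact, without having to abort it and roll back.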
Tips and Tricks - Performance tuning
• Performance tuning is a never-ending story, and you cannot really avoid it
  - Default settings are, typically, not suitable for a large database
  - The workload you tune for changes over time; a year from now, the optimal configuration may look different
• To make tuning efficient, you need observability - without measuring the impact of a change, there’s no point in making config changes
• Second rule to follow - make one single change at a time
  - If you make multiple changes at once, there’s no way you can tell what impact each one of them had
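The two rules, measure every change and change one thing at a time, combine into a simple loop. This sketch assumes you have some repeatable `measure(config)` benchmark where a lower number is better; the function name and the setting names in the usage note are made up and do not correspond to real database parameters.

```python
def tune_one_at_a_time(baseline_config, candidate_changes, measure):
    """Try each change on its own and keep it only if the metric improves.

    `measure(config)` returns a number where lower is better (for example,
    a latency figure from a repeatable benchmark). `candidate_changes` is a
    list of (setting, value) pairs. Returns the final config and a log of
    (setting, value, score, kept) tuples.
    """
    config = dict(baseline_config)
    best = measure(config)
    log = []
    for key, value in candidate_changes:
        trial = {**config, key: value}  # exactly one setting differs
        score = measure(trial)
        kept = score < best
        log.append((key, value, score, kept))
        if kept:
            config, best = trial, score
    return config, log
```

Called with, say, `[("buffer_mb", 512), ("threads", 32)]`, the log records the measured effect of each setting separately, which is exactly the information that is lost when several changes are rolled out together.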
Tips and tricks - Health checks
• You need to stay on top of your database’s health
  - Regular health checks help here
• You should be looking into things like backups: did they complete OK? Did they pass the restore test?
• Were there any issues recently? Any downtime?
• What do the graphs look like? Anything out of the ordinary?
• Ideally, you want to automate some of the checks, especially if you have a large environment
  - It may take too long to go through tens (or hundreds) of databases manually
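Automating health checks across many databases can start as a loop that runs the same checks everywhere and reports only the failures. This is a sketch under assumptions: each database is represented here by a plain dictionary of already-collected facts, and the check names and keys are hypothetical.

```python
def run_health_checks(databases, checks):
    """Run every check against every database and collect only failures.

    `databases` is a list of dicts with at least a "name" key; `checks`
    maps a check name to a function(db) that returns True when healthy.
    Returns {database name: [names of failed checks]}.
    """
    failures = {}
    for db in databases:
        failed = []
        for name, check in checks.items():
            try:
                healthy = check(db)
            except Exception:
                # A check that cannot run is treated as a failed check.
                healthy = False
            if not healthy:
                failed.append(name)
        if failed:
            failures[db["name"]] = failed
    return failures
```

An empty result means nothing needs attention, so the report stays readable whether it covers ten databases or several hundred.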