Some best practices about tuning Linux for your database workloads. The focus is not just on MySQL or MariaDB Server but also on understanding the OS from hardware/cloud, I/O, filesystems, memory, CPU, network, and resources.
1. Tuning Linux for your
database
Colin Charles,Team MariaDB, MariaDB Corporation
colin@mariadb.com / byte@bytebot.net
http://bytebot.net/blog/ | @bytebot on Twitter
FLOSSUK 2016, London
16 March 2016
1
2. whoami
• Work on MariaDB at MariaDB Corporation
(SkySQL Ab)
• Merged with Monty Program Ab, makers of
MariaDB
• Formerly MySQL AB (exit: Sun Microsystems)
• Past lives include Fedora Project (FESCO),
OpenOffice.org
• MySQL Community Contributor of theYear Award
winner 2014
2
3. Agenda
• Hardware or cloud
• I/O
• Filesystems
• Memory
• CPU
• Network
• Resources
3
4. Operating System
• Focus of this talk & tools: Linux
• But distributions make a difference too
• like RHEL6 has tuned but Ubuntu doesn’t
• Distribution versions can make a difference
too
• RHEL5: ktune; RHEL6: tuned; RHEL7:
tuned throughput-performance
4
5. Servers
• When you buy servers... do you think in
terms of CPU, then memory, then I/O?
• Database servers clearly love I/O more than
memory more than CPU
• queries! scans of large tables? index scans
with random I/O? BI?
• Durability requires each transaction commit
to be flushed to disk - i.e. call fsync()
5
6. I/O
• What kind of storage do you use?
• SAS or SATA?
• Ethernet (NAS, DRBD)
• SSD
• Fusion IO (NVMFS)
6
7. Schedulers (I/O
Elevators)
• cfq: default, great for slower storage
(SATA)
• noop: low latency storage (SSD)
• deadline: multi-process apps
• Most database workloads benefit from the
deadline scheduler (goal is after all to
minimise seeks, prioritise process I/O)
7
10. File systems
• Choices: ext4, XFS
• mount options:
• ext4: rw,
nosuid,
noatime,
data=ordered,
nobarrier
• xfs: nobarrier
• Turn off I/O barriers
• http://lwn.net/Articles/
283161/
• Separate files (data/logs/
undo/etc.) or not?
• Don’t forget flash backed
write caches or battery
backed write cache on
RAID cards (way more
fsync/s)
• iostat -dmx <interval> is
your friend
10
11. SSD & Flash
• SSD
• much more IOPS than traditional disks
• lower latency
• make sure you have a new RAID controller
• Flash
• more IOPS than SSD
• very low latency
• InnoDB doublewrite buffer -- MariaDB 10.0, Percona
Server 5.6 (Fusion IO), MySQL 5.7
11
12. RAID
• RAID0 - fast writes, reads, no redundancy
• RAID1 - slower writes, fast reads
• RAID5 - slow for random writes, fast for
sequential writes, fast reads too but slow
recovery
• RAID10 - fast reads & writes
12
14. Memory
• The more RAM the better -> reduces I/O requirements
• Use ECC RAM
• NUMA - “swap insanity”
• http://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-
numa-architecture/
•numactl --interleave=all mysqld &
• Twitter patch/Percona Server 5.6
• numa_interleave option
• innodb_buffer_pool_populate - NUMA decisions
to be made when buffer cache is clean
14
15. Why does MySQL need
memory?
• Per-session buffers
• sort buffer, temp tables
• Metadata/locking
• index statistics, table definitions
• Caching data
• decrease reads: faster response time
• decrease random writes: queues write in cache;
perform more sequential writes
15
16. Where is memory
used?
• filesystem cache
• binary / relay logs
• MyISAM data relies on filesystem cache
• MySQL cache:
• innodb_buffer_pool_size (pages), MyISAM’s
key_buffer_size (indexes)
• Per session buffers
•sort_buffer_size, join_buffer_size,
read_buffer_size, tmp_table_size
•free -m / SHOW ENGINE INNODB STATUS
16
17. Huge pages
• 2M (4M even) pages vs 4K standard Linux
page
•vm.nr_hugepages=8192
• MySQL benefits from hugepages, MongoDB
doesn’t
17
18. Swappiness
• Controls how aggressively the system reclaims
“mapped” memory
• default: 60, vm.swappiness=0 is ok (but not for newer
kernels!)
• decrease: aggressive reclaiming of unmapped
pagecache memory
• increase: aggressive swapping of mapped memory
• 3.5 kernels change behaviour (and this was
backported to RHEL 2.6.32-303)
18
19. CPU
• default is ondemand, you should turn it to
performance generally
•echo "performance" > /sys/devices/
system/cpu/cpu0/cpufreq/
scaling_governor
• disable powersave mode
• cpufrequtils is where it’s at
• look at BIOS: turbo mode/C-state/OS power control
19
20. Network
• Tools: tc, dropwatch
• Use gigabit Ethernet (more bandwidth/
throughput, less latency)
• Can also trunk/bond for HA
20
21. Network II
• ARP filter - prevents ARP flux
•echo 1 > /proc/sys/net/ipv4/
conf/all/arp_filter
• MTU size at 9,000 - “jumbo frames”
• sar -n DEV
21
22. Network III
• allow more connections to queue up: echo 4096 > /
proc/sys/net/ipv4/tcp_max_syn_backlog
• /etc/sysctl.conf -
net.ipv4.tcp_max_syn_backlog = 4096
• increase kernel packet buffer:
net.core.netdev_max_backlog = 4096
• increase socket connection wait queue:
net.core.somaxconn = 4096
• Reduce TCP timeout that comes after closing socket (default
1m): net.ipv4.tcp_fin_timeout = 30
22
23. Network IV
• Fully synchronous replication, ala, Galera
Cluster? (NDBCluster too)
• Fastest as your slowest node
• evs.send_window + evs.user_send_window
(something large e.g. 512)
23
24. Resource Limits
• ulimits / /etc/security/limits.conf - they need to be
high/sensible over the defaults
• -f (file size): unlimited
• -t (cpu time): unlimited
• -v (virtmem): unlimited
• -n (open files): 64000
• -m (memsize): unlimited
• -u (processes/threads): 32000
24
25. Tune that database
• Focus on database design - find slow, crappy
queries and fix them (eg. mysql: pt-
query-digest)
• Reduce locking or time waiting - always
benchmark your application continually
• Don’t forget /etc/my.cnf (and appropriate
friends)
25
26. KVM/Xen/Virtualization
• Most of your raw database performance
comes from being hardware-centric
• KVM at device level you can turn off caching
• Native AIO used over threaded nowadays
• VMWare: http://kb.vmware.com/selfservice/
microsites/search.do?
language=en_US&cmd=displayKC&externalI
d=1008542
26
27. Containers
• Kubernetes - vitess.io rolls out w/o issue,
CoreOS, Docker
• Remember when benchmarking, you have
to look at local vs remote benchmarking
(networking can add overhead)
• 1 thread (up to 25% difference), but average
seems to be 15% loss for r/w workloads;
remotely? 32% is possible
27
28. EC2
• EC2 - instance limitations
• EBS - unpredictable I/O performance, use
RAID10 or RAID0
• RDS - similar performance between EBS
RAID10 MySQL & RDS
• choice of MySQL 5.7 (and older) or
MariaDB Server 10.0 or PostgreSQL
28
29. Why MariaDB/Percona
Server
• Threadpool
• NUMA interleaving
• Numerous performance fixes in XtraDB
like fast InnoDB restarts, buffer pool
warming, etc.
• Can run (supported) MariaDB Galera
Cluster/Percona XtraDB Cluster
29
30. Hardware overall
• Test everything
• Sometimes NIC’s are a bad batch or the
driver is buggy (RHEL5 + some Broadcom
NICs drop packets under high load - bnx2
before 2.6.33)
• Sometimes you get a bad batch of disk
30
31. Hadoop/Hbase
• running this in EC2 is problematic
• make sure name resolution works
• enable bonding
• networking: hadoop uses big buffers (64MB blocks by default)
• net.ipv4.tcp_rmem = 32768 436600 4194304
• net.ipv4.tcp_wmem = 32768 436600 4194304
• http://docs.hortonworks.com/HDPDocuments/HDP2/
HDP-2.1.3/bk_cluster-planning-guide/content/ch_hardware-
recommendations.html
31
33. SELinux, firewalls,
security
• The easy way is to “disable SELinux”
• The reality is you should learn it
• Its just like a firewall that you really should
configure
33
34. Tools
• ps
• vmstat 1
• iostat
• top
• free
• sar (sar -n DEV 1)
• gdb
• tcpdump
• strace
• oprofile
• htop
• MySQL only: Percona
Toolkit (pt-diskstats, pt-
summary, pt-query-digest,
etc.)
34
35. Benchmarking
• sysbench
• OLTP test, use
tables with 20M
rows and 20M
transactions, check
1-128 threads/run
• LinkBench
• Yahoo! Cloud Serving
Benchmark
• https://github.com/
brianfrankcooper/
YCSB
• Google’s PerfKit
Benchmarker
• https://github.com/
GoogleCloudPlatfor
m/
PerfKitBenchmarke
r
35
38. Thanks/Q&A
Colin Charles, colin@mariadb.com | byte@bytebot.net
http://bytebot.net/blog/ | @bytebot
slides: slideshare.net/bytebot/
Download MariaDB and give it a try: http://mariadb.org/
38