Optimizing MariaDB for Web Applications (European Drupal Days 2015)
- 1. © Ibuildings 2014/2015 - All rights reserved
#DrupalDaysEU
Optimizing MariaDB for Drupal and other Web Applications
- 4. Speaker Info
Federico Razzoli
Database Engineer
federico@ibuildings.it
- 6. What is MariaDB?
● MySQL fork
● Created by Monty Widenius
● Protected and promoted by the MariaDB Foundation
● Supported by the community
● "Community" includes Google, Facebook, Twitter...
● Compatible with MySQL
- 7. Genesis of a name
Monty Widenius has two daughters called:
● My
● Maria
Incidentally, he created two DBMSs with the same names :)
- 8. MariaDB clients
● mysql is the command-line client. To run it:
mysql -uroot -p
● Any MySQL GUI will work with MariaDB. Recommended ones are:
  ● SQLyog on Linux
  ● HeidiSQL on Windows
● phpMyAdmin is ok if you can't run a GUI locally
- 9. Configuring MariaDB
Configuration file:
● /etc/mysql/my.cnf (on Debian, Ubuntu, etc.)
● my.ini or my.cnf (on Windows)
Setting a variable at runtime:
SET GLOBAL variable_name = 'value';
Starting the server with a particular setting:
mysqld --variable-name=value
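As a minimal sketch of the runtime approach, the statements below read a variable, change it, and check the result. max_connections is only an example. SET GLOBAL needs the SUPER privilege, and the new value is lost at the next restart unless it is also written to the configuration file.

```sql
-- Read the current value of a server variable.
SHOW GLOBAL VARIABLES LIKE 'max_connections';

-- Change it at runtime (needs SUPER; not persisted across restarts).
SET GLOBAL max_connections = 200;

-- Same check, written as an expression.
SELECT @@global.max_connections;
```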
- 10. Basic Configuration
datadir = /var/mysql
max_connections = 200
lock_wait_timeout = 60
max_allowed_packet = 2M
For most variables, the default values work fine.
Configuring more than 10 variables is generally wrong.
- 11. MariaDB architecture
Clients connect to the MariaDB server and send queries.
A query is processed by these components, in order:
● Connection manager
● Query Cache
● Parser
● Optimizer
● Storage engines
- 13. Drivers (connectors)
Generic truths:
● MySQL drivers work with MariaDB
● MySQL drivers do not support some MariaDB-specific features:
  ● Non-blocking API
  ● Progress reporting
● ODBC drivers work with MariaDB
- 14. PHP drivers
● PHP doesn't have a specific MariaDB driver
● PHP has 3 modules for MySQL:
  ● mysql (no longer supported)
  ● mysqli (most performant)
  ● PDO (abstraction layer)
● Drupal uses PDO; it's ok
- 17. Thread per connection
● Traditional method, used by MySQL; still the default one
● When a connection is opened, a new thread is created
● When a connection is closed, the thread is destroyed
● On a simple web page, each click needs a new connection
● With AJAX, an open page can periodically create many connections
● With Drupal, more plugins = more connections
Constantly creating and destroying threads causes overhead.
Therefore this connection method is well suited to desktop applications, but not to web applications.
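A rough way to see how much thread churn a site generates is to compare a few status counters. This is only a sketch: the counters are cumulative since server startup, so they are best read twice, some minutes apart.

```sql
-- Connection attempts since the server started.
SHOW GLOBAL STATUS LIKE 'Connections';

-- Threads created to serve connections. If this grows almost as fast
-- as Connections, nearly every connection pays for a new thread.
SHOW GLOBAL STATUS LIKE 'Threads_created';

-- Connections open right now.
SHOW GLOBAL STATUS LIKE 'Threads_connected';
```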
- 18. Pool of threads
● Connections are handled by one or more pools
● Usually, one pool per core
● No need to create and destroy threads frequently
● Not suited to environments where several long-running queries may be executed at the same time
● root will still be able to connect with the traditional method to kill long-running queries
Perfect for web applications, as long as we don't "lock" the cores with long-running queries.
- 19. Choosing a connection method
To use Thread per connection... do nothing. It's the default method.
To use Pool of threads:
thread_handling = pool-of-threads
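In MariaDB the connection method is controlled by the thread_handling server variable. It is set in the configuration file and cannot be changed at runtime, so a restart is needed. A small sketch to check which method is active and how the pool behaves:

```sql
-- 'one-thread-per-connection' is the traditional method;
-- 'pool-of-threads' means the thread pool is active.
SHOW GLOBAL VARIABLES LIKE 'thread_handling';

-- Pool settings, for example thread_pool_size (number of thread groups).
SHOW GLOBAL VARIABLES LIKE 'thread_pool%';

-- Pool activity counters, such as Threadpool_threads and Threadpool_idle_threads.
SHOW GLOBAL STATUS LIKE 'Threadpool%';
```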
- 21. Query Cache
● Can be disabled
● When a query is issued, MariaDB looks in the cache for a query written in an identical way
● If the query is found, MariaDB returns the whole resultset
● Otherwise, it executes the query and caches the resultset
● While doing so, it probably has to evict an old query from the cache
- 23. Should I use Query Cache?
Long answer:
● Query Cache needs a lot of memory
  ● Whole resultsets are stored. Rows returned by 10 different queries are stored 10 times.
● Data invalidation
  ● Every time a table is modified, all cached queries that mention that table are invalidated.
● Overhead
  ● Looking for a query in the cache has a cost.
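Before deciding, it can help to look at how the cache behaves under the real workload. The sketch below reads the cache settings and counters, then shows one way to switch the cache off at runtime. How to interpret the counters is a judgment call, not a fixed rule.

```sql
-- Current settings: query_cache_type, query_cache_size, ...
SHOW GLOBAL VARIABLES LIKE 'query_cache%';

-- Counters. Few Qcache_hits compared with Qcache_inserts, or many
-- Qcache_lowmem_prunes (evictions), suggest the cache is not paying off.
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Disable the cache at runtime and release its memory.
SET GLOBAL query_cache_type = OFF;
SET GLOBAL query_cache_size = 0;
```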
- 26. Storage Engines: an overview
Storage Engines are a particular type of plugin.
They implement data access:
● data format (compression level, etc.)
● indexing
● caching
● special features, like transactions, foreign keys...
- 27. Why don't we have only one Storage Engine?
If you are uncertain, just use InnoDB. It's impressive for most workloads.
MySQL focuses only on InnoDB, ignoring the other Storage Engines.
But:
● Each storage engine is optimized for some types of workload.
● Storage Engines can do special things.
- 28. Which Storage Engines exist?
InnoDB - Fast, reliable for OLTP.
MEMORY - Data are written in memory, not on disk
MyISAM - Simple. Doesn't support transactions.
Aria - A crash-safe MyISAM
ARCHIVE - Compressed append-only tables
TokuDB - Reduces I/O; great compression level
BLACKHOLE - A Black Hole that makes data disappear
SPIDER - Connects to remote servers
CONNECT - Reads data files or remote DBMSs as if they were local tables
OQGRAPH - Handles trees and graphs
...and others.
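Which engines are actually available depends on how the server was built and which plugins are installed. A short sketch to list them, and to count the tables of the current database by engine:

```sql
-- Engines known to this server; the Support column says whether each is usable.
SHOW ENGINES;

-- How many tables of the current database use each engine.
SELECT engine, COUNT(*) AS tables_num
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
GROUP BY engine;
```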
- 29. InnoDB vs. MyISAM vs. Aria
InnoDB is faster for a typical workload.
MyISAM does not support transactions: rows are written immediately.
A MyISAM table can be compressed (it becomes read-only).
Aria is crash-safe: if tables are damaged, you can always recover data.
Drupal uses InnoDB for most tables, and MyISAM for:
● Logs
● Caches
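The engine is chosen per table. The sketch below uses two hypothetical tables (app_order and app_log are made-up names, not Drupal tables) to show how an engine is set at creation time and changed later.

```sql
-- Hypothetical transactional table.
CREATE TABLE app_order (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  total DECIMAL(10,2) NOT NULL
) ENGINE = InnoDB;

-- Hypothetical log table, created as MyISAM.
CREATE TABLE app_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  message VARCHAR(255) NOT NULL
) ENGINE = MyISAM;

-- Convert it to Aria. The table is rebuilt, which can take a while on big tables.
ALTER TABLE app_log ENGINE = Aria;

-- Verify the engine now in use.
SHOW CREATE TABLE app_log;
```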
- 30. InnoDB basic configuration
innodb_buffer_pool_size = <a_lot_of_memory>
innodb_log_buffer_size = 64M (at least)
innodb_lock_wait_timeout = 30
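One hedged way to judge whether the buffer pool is large enough is to compare logical reads with the reads that had to go to disk. The counters are cumulative, so they only mean something after the application has run for a while.

```sql
-- Configured buffer pool size, in megabytes.
SELECT @@global.innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Logical read requests served by InnoDB.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';

-- Reads that missed the buffer pool and went to disk.
-- A high share of these suggests the pool is too small for the working set.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
```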
- 32. MariaDB Optimizer
● Receives a request and finds (or tries to find) the best execution strategy
● For simple applications, it usually succeeds
● But if a query is slow, the optimizer probably failed to produce a good query plan
- 33. One step back... how do I know which queries are slow?
● Slow Query Log
slow_query_log = ON
long_query_time = 5
log_queries_not_using_indexes = ON
min_examined_row_limit = 100
Restart MariaDB and run your application.
Then, just open the slow query log file and see which queries are slow.
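If a restart is not convenient, the same settings can be applied at runtime, since these variables are dynamic. This is a sketch with the same values as above. A global long_query_time only affects connections opened after the change, and runtime settings are lost at restart.

```sql
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 5;                 -- seconds
SET GLOBAL log_queries_not_using_indexes = ON;
SET GLOBAL min_examined_row_limit = 100;        -- skip queries examining fewer rows

-- Where the log is being written.
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';
```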
- 34. What are those index things you mentioned?
Imagine you are searching a book for a specific piece of information.
Reading the whole book wouldn't be an efficient method.
Fortunately, books come with an index. The index allows us to read a small amount of text to find the information.
Table indexes are similar to book indexes: they allow MariaDB to quickly find the rows we are looking for.
- 35. Index essentials
● An index can consist of one or more columns
● An index can be built on whole columns or on a prefix (for text columns)
● You cannot choose between ascending and descending indexes (a MariaDB/MySQL limitation at the time of this talk)
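A minimal sketch of the first two points, on a hypothetical articles table (the table and its columns are invented for the example): one index on two columns, and one on the first 50 characters of a text column.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE articles (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  author_id INT UNSIGNED NOT NULL,
  title VARCHAR(100) NOT NULL,
  body TEXT NOT NULL
) ENGINE = InnoDB;

-- Index made of two columns.
CREATE INDEX idx_author_title ON articles (author_id, title);

-- Prefix index: only the first 50 characters of body are indexed.
CREATE INDEX idx_body_prefix ON articles (body(50));

-- List the indexes of the table, one row per indexed column.
SHOW INDEX FROM articles;
```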
- 36. Supported operations:
● Almost all operators in the WHERE clause:
  ● =, <, >, >=, <=, IN, BETWEEN, LIKE
● JOIN
● GROUP BY
● ORDER BY
- 37. When are indexes used?
● A sufficient number of rows will be examined
● The index has a sufficient number of unique values
● MariaDB can use the whole index or its leftmost part
(yes, column order matters)
● When a function is applied to an indexed column, the index cannot be used:
SELECT * FROM my_table WHERE CHAR_LENGTH(indexed_column) = 1;
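Often a condition that wraps the column in a function can be rewritten as a condition on the bare column. A sketch, assuming a hypothetical indexed DATETIME column named created_at in my_table:

```sql
-- The column is wrapped in a function: an index on created_at cannot be used.
SELECT * FROM my_table WHERE YEAR(created_at) = 2015;

-- Equivalent range condition on the bare column: the index can be used.
SELECT * FROM my_table
WHERE created_at >= '2015-01-01'
  AND created_at <  '2016-01-01';
```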
- 38. Leftmost part?
Indexes:
CREATE INDEX idx_user ON users (username);
CREATE INDEX idx_userpass ON users (username, password);
CREATE INDEX idx_birthdate_user ON users (birth_date, username);
Queries:
SELECT username, password FROM users WHERE username = 'batman';
SELECT username FROM users WHERE username LIKE 'batman%';
SELECT username FROM users WHERE username LIKE '%batman%';
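To check the answers, each query can be prefixed with EXPLAIN. The comments below describe what I would expect on a users table holding the three indexes (idx_user, idx_userpass, idx_birthdate_user). The optimizer's actual choice depends on the data, so treat them as a guide rather than guaranteed output.

```sql
-- Equality on username: idx_user and idx_userpass are both candidates.
-- idx_userpass also contains password, so it can answer the query by itself.
EXPLAIN SELECT username, password FROM users WHERE username = 'batman';

-- The beginning of the string is known: a range scan on an index
-- that starts with username is possible.
EXPLAIN SELECT username FROM users WHERE username LIKE 'batman%';

-- Leading wildcard: no index lookup is possible; at best a whole index is scanned.
EXPLAIN SELECT username FROM users WHERE username LIKE '%batman%';

-- idx_birthdate_user starts with birth_date, so it cannot be used
-- to look up a username in any of these queries.
```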
- 39. Indexes: AND, OR
CREATE INDEX idx_a ON my_table (a);
CREATE INDEX idx_b ON my_table (b);
CREATE INDEX idx_abc ON my_table (a, b, c);
SELECT * FROM my_table WHERE a = 1 AND c = 1;
SELECT * FROM my_table WHERE a = 1 AND b = 1;
SELECT * FROM my_table WHERE a = 1 OR b = 1 OR c = 1;
SELECT * FROM my_table WHERE a = 1 OR b = 1;
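The same four queries, annotated with what I would expect EXPLAIN to show when idx_a, idx_b and idx_abc exist on my_table. These are expectations under typical data, not guaranteed plans.

```sql
-- a AND c: only the a part can narrow the search (idx_a, or the first
-- column of idx_abc), because b is skipped in the condition.
EXPLAIN SELECT * FROM my_table WHERE a = 1 AND c = 1;

-- a AND b: the leftmost two columns of idx_abc can be used.
EXPLAIN SELECT * FROM my_table WHERE a = 1 AND b = 1;

-- a OR b OR c: no index starts with c, so a full table scan is likely.
EXPLAIN SELECT * FROM my_table WHERE a = 1 OR b = 1 OR c = 1;

-- a OR b: idx_a and idx_b can be combined (type = index_merge).
EXPLAIN SELECT * FROM my_table WHERE a = 1 OR b = 1;
```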
- 40. EXPLAIN statement
Sometimes we think that an index can be used to speed up a query, but we are wrong.
Sometimes an index can be used, but the optimizer chooses to discard it.
There could be a good reason for this, but the optimizer can be wrong.
If a query is slow, we need to check whether it uses a good index:
EXPLAIN EXTENDED SELECT …
The output of EXPLAIN can optionally be included in the Slow Query Log.
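A sketch of both steps. The users query is just an example. The log_slow_verbosity value assumes MariaDB 10.0.5 or later, where the 'explain' flag is available; on older versions that statement will fail.

```sql
-- Show the query plan.
EXPLAIN EXTENDED SELECT username FROM users WHERE username LIKE 'batman%';

-- After EXPLAIN EXTENDED, this shows the query as rewritten by the optimizer.
SHOW WARNINGS;

-- Include the plan of each logged query in the Slow Query Log
-- (assumes MariaDB 10.0.5 or later).
SET GLOBAL log_slow_verbosity = 'query_plan,explain';
```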
- 41. EXPLAIN statement
EXPLAIN returns a row for each table that is read by the query.
The rows are listed in the order in which the tables are read.
There are several columns. The most important are:
● table: Name of a table read by the query.
● possible_keys: List of indexes that could be used.
● key: The index chosen by the optimizer.
● Extra: Extra information; for example, whether an internal temporary table is used.
- 42. JOIN optimization
Everyone knows/remembers JOIN?
JOIN "links" two tables.
SELECT u.username, p.title
FROM users u
JOIN posts p
ON u.id = p.author;
SELECT u.username, p.title, c.text
FROM users u
JOIN posts p
ON u.id = p.author
LEFT JOIN comments c
ON p.id = c.post_id;
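For joins like these to be fast, the columns in the ON clauses should be indexed. A sketch, assuming users.id and posts.id are primary keys (already indexed) and that the other two join columns have no index yet; the index names are made up.

```sql
-- Index the join columns that are not primary keys.
CREATE INDEX idx_author ON posts (author);
CREATE INDEX idx_post_id ON comments (post_id);

-- Check that every table in the plan has a non-NULL key.
EXPLAIN
SELECT u.username, p.title, c.text
FROM users u
JOIN posts p
  ON u.id = p.author
LEFT JOIN comments c
  ON p.id = c.post_id;
```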
- 43. JOIN optimization
Suppose that we have the following tables:
users: 30 rows
posts: 2,000 rows
comments: 10,000 rows
● SELECT … FROM users JOIN posts JOIN comments
This query is not optimized at all.
● SELECT … FROM comments JOIN posts JOIN users
will read 10,000 rows; for each row: 2,000; for each match: 30
The optimizer will determine the best JOIN order... maybe.
- 44. JOIN optimization
The optimizer may choose a bad JOIN order if it doesn't know the number of rows in a table, or the number of unique values in a column.
Solutions:
● ANALYZE TABLE comments;
● use_stat_tables = complementary
● SELECT STRAIGHT_JOIN … FROM t1 JOIN t2
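The three solutions, written out as a sketch against the users, posts and comments tables from the earlier examples. The PERSISTENT FOR ALL clause and the mysql.table_stats table are MariaDB-specific (10.0 and later).

```sql
-- 1. Refresh the statistics kept by the storage engine.
ANALYZE TABLE users, posts, comments;

-- 2. Let the optimizer also use engine-independent statistics
--    (a global change applies to connections opened afterwards).
SET GLOBAL use_stat_tables = 'complementary';

-- Collect engine-independent statistics for all columns and indexes.
ANALYZE TABLE comments PERSISTENT FOR ALL;

-- Inspect what was collected.
SELECT * FROM mysql.table_stats WHERE table_name = 'comments';

-- 3. Last resort: force the tables to be read in the order they are written.
SELECT STRAIGHT_JOIN u.username, p.title
FROM users u
JOIN posts p
  ON u.id = p.author;
```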
- 45. Optimization of grouping and ordering
GROUP BY and ORDER BY can take advantage of indexes.
CREATE INDEX idx_a ON my_table (a, b);
SELECT * FROM my_table GROUP BY a;
SELECT * FROM my_table ORDER BY a;
SELECT * FROM my_table GROUP BY a ORDER BY a;
SELECT * FROM my_table WHERE a < 100 GROUP BY a ORDER BY a;
SELECT * FROM my_table WHERE a > b GROUP BY a ORDER BY a;
SELECT * FROM my_table WHERE b < 100 GROUP BY a ORDER BY a;
- 46. Optimization of grouping and ordering
If WHERE, GROUP BY and ORDER BY use different columns, MariaDB creates a temporary table, in memory or on disk:
CREATE INDEX idx_a ON my_table (a, b, c);
SELECT * FROM my_table WHERE a < 100 GROUP BY b ORDER BY b;
SELECT * FROM my_table WHERE a > b GROUP BY a ORDER BY b;
SELECT * FROM my_table WHERE c < 100 GROUP BY a ORDER BY a;
Try to avoid this when a large amount of data is examined!
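EXPLAIN shows when this happens: the Extra column contains "Using temporary" and often "Using filesort". A sketch comparing two queries on my_table, assuming the (a, b, c) index exists. The queries select an aggregate rather than * so that they stay valid under stricter SQL modes.

```sql
-- Filtering, grouping and ordering on the leftmost index column:
-- no temporary table should be needed.
EXPLAIN SELECT a, COUNT(*) FROM my_table WHERE a < 100 GROUP BY a ORDER BY a;

-- Filtering on a but grouping on b:
-- expect "Using temporary; Using filesort" in the Extra column.
EXPLAIN SELECT b, COUNT(*) FROM my_table WHERE a < 100 GROUP BY b ORDER BY b;
```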
- 47. Internal temporary tables
Some queries can only be executed by using temporary tables:
SELECT c.id, u.users_num
FROM (
SELECT city_id, COUNT(*) AS users_num
FROM users
GROUP BY city_id) u
JOIN city c ON u.city_id = c.id;
Often, these queries can be rewritten to avoid using a temp table:
SELECT c.id, COUNT(u.id) AS users_num
FROM city c
JOIN users u
ON c.id = u.city_id
GROUP BY c.id;
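To see whether a query creates internal temporary tables, and whether they spill to disk, the session counters can be read right after running it. A sketch, assuming the users and city tables from the example exist. FLUSH STATUS needs the RELOAD privilege.

```sql
-- Reset the session counters.
FLUSH STATUS;

-- Run the query under test.
SELECT c.id, u.users_num
FROM (
  SELECT city_id, COUNT(*) AS users_num
  FROM users
  GROUP BY city_id) u
JOIN city c ON u.city_id = c.id;

-- Created_tmp_tables: temporary tables created by this session.
-- Created_tmp_disk_tables: how many of them were written to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- In-memory temporary tables are limited by the smaller of these two values.
SHOW GLOBAL VARIABLES LIKE 'tmp_table_size';
SHOW GLOBAL VARIABLES LIKE 'max_heap_table_size';
```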
- 49. Error Log
log_error = <file_name>
If not specified, errors are written:
● to the console (on Linux)
● to a file with a default name (on Windows)
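The error log setting is held in the log_error server variable and can be checked from any client. A minimal sketch:

```sql
-- An empty value means no error log file was configured.
SHOW GLOBAL VARIABLES LIKE 'log_error';
```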
- 50. SQL Error Log
To start logging SQL errors:
INSTALL SONAME 'sql_errlog';
To rotate the log:
SET GLOBAL sql_error_log_rotate = 1;
To stop logging:
UNINSTALL SONAME 'sql_errlog';
FLUSH TABLES;
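Once the plugin is installed, its settings are visible as server variables. A quick sketch to find the log file and produce a test entry (the table name is deliberately nonexistent):

```sql
-- Plugin settings, including sql_error_log_filename.
SHOW GLOBAL VARIABLES LIKE 'sql_error_log%';

-- This statement fails on purpose; the error should then appear in the log file.
SELECT * FROM table_that_does_not_exist;
```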
- 52. General advice
● Don't assume that something will work in practice just because it works in theory
● Don't assume that theory is useless just because it doesn't work
● In MariaDB, things always happen for a reason
● Well, a complex set of reasons, in fact
● When you don't understand, investigate. Read the docs. Experiment.
● Dirty solutions won't save time. Especially if you're sure they will.
- 53. Good resources
Search the web for:
● MariaDB Knowledge Base (KB)
● MySQL documentation (for information missing in the KB)
● MySQL Performance Blog (Percona)
● planet.mysql.com
● MariaDB mailing lists on Launchpad