MySQL Query Optimisation 101

€ whoami
● Federico Razzoli
● Freelance consultant
● Writing SQL since MySQL 2.23
hello@federico-razzoli.com
● I love open source, sharing,
Collaboration, win-win, etc
● I love MariaDB, MySQL, Postgres, etc
○ Even Db2, somehow

Why is the database important?

Remember the Von Neumann machine?

It’s always about Data
● Since then, the purpose of hardware and software never changed:
○ Receive data
○ Process data
○ Output data

A rose by any other name...
● Feel free to use synonyms
○ Validate
○ Sanitise
○ Parse
○ Persist
○ Normalise / Denormalise
○ Cache
○ Map / Reduce
○ Print
○ Ping
○ ...

...would smell as sweet
● The database of known stars is not a ping package
● You use a DBMS to abstract data management as much as possible
○ Persistence
○ Consistence
○ Queries
○ Fast search
○ …
● That’s why “database is magic”

Busy
● But it’s just a very busy person, performing many tasks concurrently
● And each is:
○ Important (must be reliable)
○ Complex (must be fast)
○ Expected (if something goes wrong,
you will complain)
In practice, the DBMS is usually the bottleneck.

Terminology
Statement: An SQL command
Query: A statement that returns a resultset
...or any other statement :)
Resultset: output 0 or more rows
Optimiser / Query Planner: Component responsible of deciding a query’s
execution plan
Optimised query: A query whose execution plan is reasonably good
...this doesn’t imply in any way that the query is fast
Database: set of tables (schema)
Instance / Server: running MySQL daemon (cluster)

When should a query be optimised?
mysql> EXPLAIN SELECT * FROM t WHERE c < 10 G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t
partitions: NULL
type: ALL
possible_keys: idx_c
key: idx_c
key_len: 4
ref: NULL
rows: 213030
filtered: 50.00
Extra: Using where
1 row in set, 1 warning (0.01 sec)

When should a query be optimised?
mysql> SELECT * FROM performance_schema.events_statements_summary_by_digestWHERE DIGEST =
'254a65744e661e072103b7a7630dee1c3a3b8e906f19889f7c796aebe7cdd4f8' G
*************************** 1. row ***************************
SCHEMA_NAME: test
DIGEST: 254a65744e661e072103b7a7630dee1c3a3b8e906f19889f7c796aebe7cdd4f8
DIGEST_TEXT: SELECT * FROM `t` WHERE `c` < ?
COUNT_STAR: 1
...
SUM_ROWS_AFFECTED: 0
SUM_ROWS_SENT: 57344
SUM_ROWS_EXAMINED: 212992
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 0
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_FULL_RANGE_JOIN: 0
SUM_SELECT_RANGE: 0
SUM_SELECT_RANGE_CHECK: 0
SUM_SELECT_SCAN: 1
SUM_SORT_MERGE_PASSES: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 0
SUM_SORT_SCAN: 0
SUM_NO_INDEX_USED: 1
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 2019-05-14 00:31:24.078967
LAST_SEEN: 2019-05-14 00:31:24.078967
...
QUERY_SAMPLE_TEXT: SELECT * FROM t WHERE c < 10
QUERY_SAMPLE_SEEN: 2019-05-14 00:31:24.078967
QUERY_SAMPLE_TIMER_WAIT: 117493874000

But how do I find impacting queries?
● It depends what you mean by “impacting”
● There are several monitoring methods (USE, etc)
● But 3 philosophies:

○ Panicking when you hear that something is down or slow

○ System-centric monitoring

○ System-centric monitoring
○ User-centric

● Panicking when you hear that something is down or slow
● System-centric monitoring
● User-centric
You can use them all.

Panicking
● Simplest method
● Do nothing do prevent anything
○ Optionally, take a lot of actions to prevent imaginary problems in
imaginary ways
○ There is no evidence that your job is useless, so your boss will not fire
you

System-centric
● pt-query-digest, PMM, etc
● Merge queries into one, normalising its text and replacing parameters
○ SELECT * FROM t WHERE b= 111 AND a = 0 -- comment
○ Select * From t Where a = 24 and b=42;
○ SELECT * FROM t WHERE a = ? AND b = ?
● Sum execution time of each occurrence (Grand Total Time)
● Optimise the queries with highest GTT

User-Centric
● Calculate the cost of slowness (users don’t buy, maybe leave 4ever)
● Cost of slowness is different for different
○ URLs
○ Number of users
○ ...other variables that depend on your business
■ (day of month, country, etc)
● Set Service Level Objectives
● Monitor the HTTP calls latency, and the involved services
● Find out what’s slowing them down

What makes a query “important”?
● How many times it’s executed
● It’s locking

What makes a query slow?
● Number of rows read
○ Read != return
○ Indexes are there to lower the number of reads
● Number of rows written
○ In-memory temp tables are not good
○ On-disk temp tables are worse

How do I optimise a query?
● Use indexes properly
● Avoid creation of temp tables, if possible

Index Types
● BTREE - ordered data structure
● HASH - hash table
● PostgreSQL has much more
● Each storage engine can implement any of both
● InnoDB uses BTREE and internally uses HASH when it thinks it’s better
● The syntax CREATE INDEX USING [BTREE | HASH] is generally useless
We will focus on BTREE indexes in InnoDB

Index Properties
● Primary key: unique values, not null
● UNIQUE
● Multiple columns
○ The order matters
● Column prefix (only for strings)

InnoDB Indexes
● InnoDB tables should always have a Primary Key
● The table is stored ordered by primary key
● The table itself is the primary key
○ Columns “not part of the primary key” simply don’t affect the order of
rows

InnoDB Indexes
Table columns: {a, b, c}
Primary key: {a, b}
A B C
1 1 4
1 2 1
2 1 9
2 2 3
3 0 3
4 20 0

InnoDB Indexes
● Secondary indexes are stored separately
● They are ordered by the indexed column
● Each entry contain a reference to a primary key entry

InnoDB Indexes
Primary key: {a, b}
Index idx_c: {c}
c a b
0 4 20
1 1 2
3 2 2
3 3 0
4 1 1
9 2 1

Which queries will be faster?
Table columns: {a, b, c, d, e}
Primary key: {a, b}
Index idx_c: {c}
● SELECT * FROM t WHERE a = 1
● SELECT * FROM t WHERE a = 1 AND b = 2
● SELECT a, b FROM t WHERE c = 0
● SELECT d, e FROM t WHERE c = 0

More performance considerations?

● Big primary key = big indexes
● Primary key should be append-only
INTEGER UNSIGNED AUTO_INCREMENT
● These indexes are duplicates: {a} - {a, b}
● This index is wrong: {a, id}

https://github.com/jeremycole/innodb_diagrams

● Writing to an index is relatively slow
● Deleting many rows leaves fragmented indexes

Phone Book
● Indexes are ordered data structures
● Think to them as a phone book
Table: {first_name, last_name, phone, address}
Index: {last_name, first_name}

Phone Book
● I will show you some queries, and you will tell me which can be solved by
using the index
● You may not know, but your mind contains a pretty good SQL optimiser
Table: {first_name, last_name, phone, address}
Index: {last_name, first_name}

Queries
SELECT * FROM phone_book …
WHERE last_name = 'Baker'
WHERE last_name IN ('Hartnell','Baker', 'Whittaker')
WHERE last_name > 'Baker'
WHERE last_name >= 'Baker'
WHERE last_name < 'Baker'
WHERE last_name <= 'Baker'
WHERE last_name <> 'Baker'

Queries
SELECT * FROM phone_book …
WHERE last_name IS NULL
WHERE last_name IS NOT NULL

Rule #1
A BTREE can optimise
point searches and ranges

Queries
WHERE last_name >= 'B' AND last_name < 'C'
WHERE last_name BETWEEN 'B' AND 'C'
WHERE last_name LIKE 'B%'

Queries
WHERE last_name LIKE 'B%'
WHERE last_name LIKE '%B%'
WHERE last_name LIKE '%B'
WHERE last_name LIKE 'B_'
WHERE last_name LIKE '_B_'
WHERE last_name LIKE '_B'

Rule #2
A LIKE condition
whose second operand starts with a 'constant string'
is a range

Queries
WHERE first_name = 'Tom'
WHERE last_name = 'Baker'
WHERE first_name = 'Tom' AND last_name = 'Baker'
WHERE last_name = 'Baker' AND first_name = 'Tom'

Rule #3
We can use a whole index
or its leftmost part

Queries
WHERE LEFT(last_name, 2) = 'Ba'
WHERE last_name = CONCAT('Ba', 'ker')

Rule #4
Optimiser cannot make assumptions on functions/expression results.
However, wrapping a constant value into a function will produce another
constant value, which is mostly irrelevant for query optimisation.

Queries
WHERE last_name = first_name

Rule #5
Comparing a column with another results in a comparison
whose operands change at every row.
The optimiser cannot ﬁlter out any row in advance.

Queries
WHERE last_name = 'Baker' AND phone = '+44 7739 427279'

Rule #6
We can use an index to restrict the search to a set of rows
And search those rows in a non-optimised fashion
Depending on this set’s size, this could be a brilliant or a terrible strategy

Queries
WHERE last_name = 'Baker' AND first_name > 'Tom'
WHERE last_name > 'Baker' AND first_name = 'Tom'
WHERE last_name > 'Baker' AND first_name > 'Tom'

Queries
WHERE last_name = 'Baker' AND first_name > 'Tom'
WHERE first_name = 'Tom' AND last_name > 'Baker'
WHERE first_name > 'Tom' AND last_name = 'Baker'
WHERE last_name > 'Baker' AND first_name > 'Tom'
Baker, Colin
Baker, Tom
Baker, Walter
Capaldi, Ada
Capaldi, Peter
Whittaker, Jody
Whittaker, Vadim

Rule #7
If we have a range condition on an index column
The next index columns cannot be used
If you prefer:
Index usage stops at the ﬁrst >

Queries
ORDER BY last_name
ORDER BY first_name
ORDER BY last_name, first_name
ORDER BY first_name, last_name

Queries
GROUP BY last_name
GROUP BY first_name
GROUP BY last_name, first_name
GROUP BY first_name, last_name

Rule #8
ORDER BY and GROUP BY can take advantage of an index order
or create an internal temp table
Note: GROUP BY optimisation also depends on the function we’re using
(MAX, COUNT…).

Queries
WHERE last_name > 'Baker' ORDER BY last_name
WHERE last_name = 'Baker' ORDER BY first_name
WHERE last_name > 'Baker' ORDER BY first_name

Rule #9
If we have an ORDER BY / GROUP BY on an index column
The next index columns cannot be used

Queries
Table: {id, a, b, c, d}
idx_a: {a, d}
idx_b: {b}
WHERE a = 10 OR a = 20
WHERE a = 24 OR c = 42
WHERE a = 24 OR d = 42
WHERE a = 24 AND b = 42
WHERE a = 24 OR b = 42
WHERE a = 24 ORDER BY b
GROUP BY a ORDER BY b

Rule #10
Using multiple indexes for AND or OR (intersect) is possible,
but there is a beneﬁt only if we read MANY rows
Using different indexes in WHERE / GROUP BY / ORDER BY
is not possible

Thank you kindly!
https://federico-razzoli.com
info@federico-razzoli.com

MySQL Query Optimisation 101

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (16)

Semelhante a MySQL Query Optimisation 101

Semelhante a MySQL Query Optimisation 101 (20)

Mais de Federico Razzoli

Mais de Federico Razzoli (18)

Último

Último (20)

MySQL Query Optimisation 101