Advanced Query Optimizer Tuning and Analysis

Sergei Petrunia
Timour Katchaounov
Monty Program Ab
MySQL Conference And Expo 2013

● Introduction
– What is an optimizer problem
– How to catch it
● Old and new tools
● Single-table selects
– brief recap from 2012
● JOINs
– ref access
● index statistics
– join condition pushdown
– join plan efficiency
– query plan vs reality
● Big I/O bound JOINs
– Batched Key Access
● Aggregate functions
● ORDER BY ... LIMIT
● GROUP BY
● Subqueries

Is there a problem with query optimizer?
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?

Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server

Catching slow queries, the old ways
● Watch the Slow query log
– Percona Server/MariaDB:
--log_slow_verbosity=query_plan
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No
# Filesort: No Filesort_on_disk: No Merge_passes: 0
SET timestamp=1333385770;
select * from customer where c_acctbal < -1000;
– Run pt-query-digest on the log
• Run SHOW PROCESSLIST periodically

SHOW EXPLAIN usage
● Intended usage
– SHOW PROCESSLIST ...
– SHOW EXPLAIN FOR ...
● Why not just run EXPLAIN again?
– Difficult to replicate setups
● Temporary tables
● Optimizer settings
● Storage engine's index statistics
● ...
– No uncertainty about whether you're looking at
the same query plan or not.

Catching slow queries (NEW)
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
● use performance_schema
● Many ways to analyze via queries
– events_statements_summary_by_digest
● count_star, sum_timer_wait,
min_timer_wait, avg_timer_wait, max_timer_wait
● digest_text, digest
● sum_rows_examined, sum_created_tmp_disk_tables,
sum_select_full_join
– events_statements_history
● sql_text, digest_text, digest
● timer_start, timer_end, timer_wait
● rows_examined, created_tmp_disk_tables,
select_full_join

• Modified Q18 from DBT3
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > ?
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey,
o_orderdate, o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
• App executes Q18 many times with
? = 550000, 500000, 400000, ...

● Find candidate slow queries
● Simple tests: select_full_join > 0,
created_tmp_disk_tables > 0, etc
● Complex conditions:
max execution time > X sec OR
min/max time vary a lot:
select max_timer_wait/avg_timer_wait as max_ratio,
avg_timer_wait/min_timer_wait as min_ratio
from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2\G

*************************** 5. row ***************************
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` ,
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice`
DESC , `o_orderdate` LIMIT ?
COUNT_STAR: 3
SUM_TIMER_WAIT: 3251758347000
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec
AVG_TIMER_WAIT: 1083919449000
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec
SUM_LOCK_TIME: 555000000
SUM_ROWS_SENT: 25
SUM_ROWS_EXAMINED: 0
SUM_CREATED_TMP_DISK_TABLES: 0
SUM_CREATED_TMP_TABLES: 3
SUM_SELECT_FULL_JOIN: 0
SUM_SELECT_RANGE: 3
SUM_SELECT_SCAN: 0
SUM_SORT_RANGE: 0
SUM_SORT_ROWS: 25
SUM_SORT_SCAN: 3
SUM_NO_INDEX_USED: 0
SUM_NO_GOOD_INDEX_USED: 0
FIRST_SEEN: 1970-01-01 03:38:27
LAST_SEEN: 1970-01-01 03:38:43
max_ratio: 2.9560
min_ratio: 276.9192
High variance of execution time
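The two ratios in the digest row can be rechecked by hand. Performance Schema timers are reported in picoseconds, so 10^12 units equal one second. The following is a minimal Python sketch using the values from the row above; the function name `timer_ratios` is illustrative and not part of any MySQL or MariaDB API.

```python
PICOSECONDS_PER_SECOND = 10**12

def timer_ratios(min_wait, avg_wait, max_wait):
    """Return (max_ratio, min_ratio) as defined in the query above."""
    return max_wait / avg_wait, avg_wait / min_wait

# Values copied from the digest row for the modified Q18.
min_wait = 3914209000
avg_wait = 1083919449000
max_wait = 3204044053000

max_ratio, min_ratio = timer_ratios(min_wait, avg_wait, max_wait)
print(f"min: {min_wait / PICOSECONDS_PER_SECOND:.4f} sec")
print(f"max: {max_wait / PICOSECONDS_PER_SECOND:.1f} sec")
print(f"max_ratio: {max_ratio:.4f}, min_ratio: {min_ratio:.4f}")
```

A large min_ratio means the fastest run is far below the average, which is the pattern expected when the same statement runs with constants of very different selectivity.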

● Check the actual queries and constants
● The events_statements_history table
select timer_wait/1000000000000 as exec_time, sql_text
from events_statements_history
where digest in
(select digest from events_statements_summary_by_digest
where max_timer_wait > 1000000000000
or max_timer_wait / avg_timer_wait > 2
or avg_timer_wait / min_timer_wait > 2)
order by timer_wait;

+-----------+-----------------------------------------------------------------------------------+
| exec_time | sql_text |
+-----------+-----------------------------------------------------------------------------------+
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 |
+-----------+-----------------------------------------------------------------------------------+
Observation:
orders.o_totalprice > ? becomes less and less selective as ? decreases

Actions after finding the slow query
• Bad query plan
– Rewrite the query
– Force a good query plan
• Bad optimizer settings
– Do tuning
• Query is inherently complex
– Don't waste time with it
– Look for other solutions.


Finding out which indexes to add
● Check selectivity of conditions that will use the index
● index (o_orderdate)
select count(*) from orders
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
● index (o_clerk)
select count(*) from orders where o_clerk='Clerk#000009506'
1507 rows

Conditions must be in SARGable form
• Condition must represent a range
• It must have a form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30' -- sargable
year(o_orderDate)=1992 and month(o_orderdate)=6 -- not sargable: column wrapped in functions
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and
TO_DAYS('1992-07-06') -- not sargable: column wrapped in a function
o_clerk LIKE 'Clerk#000009506' -- sargable
o_clerk LIKE '%Clerk#000009506%' -- not sargable: leading wildcard
column IN (1,10,15,21, ...) -- sargable
(col1, col2) IN ( (1,1), (2,2), (3,3), …) -- not turned into ranges by MySQL 5.6

New in MySQL-5.6: optimizer_trace
● Lets you see the ranges
set optimizer_trace=1;
explain select * from orders
where o_orderDATE between '1992-06-01' and '1992-07-03' and
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct
● Search for range_scan_alternatives.

New in MySQL-5.6: optimizer_trace
...
"range_scan_alternatives": [
{
"index": "i_o_orderdate",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 319082,
"cost": 382900,
"chosen": true
},
{
"index": "i_o_date_clerk",
"ranges": [
"1992-06-01 <= o_orderDATE < 1992-06-12",
"1992-06-12 < o_orderDATE <= 1992-07-03"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 406336,
"cost": 487605,
"chosen": false,
"cause": "cost"
}
],
...
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful

Source of #rows estimates for range
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key          |key_len|ref |rows  |Extra      |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE     |orders|range|i_o_orderdate|i_o_orderdate|4      |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
• The rows value is a “records_in_range” estimate
• Done by diving into the index
• Usually fairly accurate
• Not affected by ANALYZE TABLE

Simple selects: conclusions
• Efficiency == “#rows_scanned is close to #rows_returned”
• Indexes and WHERE conditions reduce #rows scanned
• Index estimates are usually accurate
• Multi-column indexes
– “handle” conditions on multiple columns
– Order of columns in the index matters
• optimizer_trace lets you view the ranges
– But it misrepresents ranges over multi-column indexes

Now, we will skip some topics
One can also speed up simple selects with
● index_merge access method
● index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.


A simple join
• “Customers with their orders”
select * from customer, orders where c_custkey=o_custkey

Execution: Nested Loops join
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• Complexity:
– Scans table customer
– For each record in customer, scans table orders
• Is this ok?

Execution: Nested loops join (2)
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table   |type|possible_keys|key |key_len|ref |rows   |Extra      |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE     |customer|ALL |NULL         |NULL|NULL   |NULL|148749 |           |
|1 |SIMPLE     |orders  |ALL |NULL         |NULL|NULL   |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+

Reading the EXPLAIN output:
● rows for customer (148749): rows to read from customer
● rows for orders (1493631): rows to read from orders
● “Using where” on orders: checks c_custkey=o_custkey

• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has their own orders
– size(customer x orders) = size(orders)
– Lower bound is
1,493,631 + 148,749 + costs to match customer<->order

Using index for join: ref access
alter table orders add index i_o_custkey(o_custkey)
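To make the difference concrete, here is a minimal Python sketch that contrasts the plain nested-loop join with a join that looks orders up through a key on o_custkey. It is a toy model under stated assumptions: a dictionary stands in for the index i_o_custkey, the data is made up, and the function names are illustrative rather than anything the server exposes.

```python
from collections import defaultdict

def nested_loop_join(customers, orders):
    """Full nested-loop join: every customer is compared with every order."""
    comparisons = 0
    result = []
    for c in customers:
        for o in orders:
            comparisons += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, comparisons

def ref_access_join(customers, orders):
    """Join through a lookup structure on o_custkey (stands in for the index)."""
    index = defaultdict(list)  # plays the role of i_o_custkey
    for o in orders:
        index[o["o_custkey"]].append(o)
    lookups = 0
    result = []
    for c in customers:
        lookups += 1  # one index lookup per customer row
        for o in index.get(c["c_custkey"], []):
            result.append((c, o))
    return result, lookups

# Made-up data: 4 customers, 12 orders spread evenly across them.
customers = [{"c_custkey": k} for k in range(1, 5)]
orders = [{"o_orderkey": i, "o_custkey": 1 + i % 4} for i in range(12)]

full, comparisons = nested_loop_join(customers, orders)
fast, lookups = ref_access_join(customers, orders)
print(comparisons, lookups, len(full), len(fast))
```

Both functions return the same pairs; the nested loop performs customers x orders comparisons, while the keyed version performs one lookup per customer and touches only matching orders.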

Conditions that can be used for ref access
● Can use equalities
– tbl.key=other_table.col
– tbl.key=const
– tbl.key IS NULL
● For multipart keys, will use the largest prefix
– keypart1=... AND keypart2=... AND ... AND keypartK=...

Conditions that can't be used for ref access
● Doesn't work for non-equalities
t1.key BETWEEN t2.col1 AND t2.col2
● Doesn't work for OR-ed equalities
t1.key=t2.col1 OR t1.key=t2.col2
– Except for ref_or_null
t1.key=... OR t1.key IS NULL
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col

Is ref always efficient?
● Efficient, if column has many different values
– Best case – unique index (eq_ref)
● A few different values – not useful
● Skewed distribution: depends on which part the join touches

ref access estimates - index statistics
• How many rows will match
tbl.key_column = $value
for an arbitrary $value?
• Index statistics
show keys from orders where key_name='i_o_custkey'
*************************** 1. row ***************
Table: orders
Non_unique: 1
Key_name: i_o_custkey
Seq_in_index: 1
Column_name: o_custkey
Collation: A
Cardinality: 214462
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
show table status like 'orders'
*************************** 1. row ****
Name: orders
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 1495152
Avg_row_length: 133
Data_length: 199966720
Max_data_length: 0
Index_length: 122421248
Data_free: 6291456
...
average = Rows / Cardinality = 1495152 / 214462 = 6.97
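The same arithmetic as a small Python sketch, using the Rows and Cardinality values shown above. The helper name `avg_rows_per_key` is made up for illustration; the optimizer's internal computation may differ in detail.

```python
def avg_rows_per_key(table_rows, index_cardinality):
    """Table-wide average number of rows matching one key value."""
    return table_rows / index_cardinality

# Rows from SHOW TABLE STATUS, Cardinality from SHOW KEYS (values from above).
estimate = avg_rows_per_key(1495152, 214462)
print(f"{estimate:.2f} rows per o_custkey value")
```

Because both inputs are themselves estimates, the result is an average for an arbitrary key value and says nothing about any specific customer.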

ref access – conclusions
● Based on t.key=... equality conditions
● Can make joins very efficient
● Relies on index statistics for estimates.

Optimizer statistics
● MySQL/Percona Server
– Index statistics
– Persistent/transient InnoDB stats
● MariaDB
– Index statistics, persistent/transient
● Same as Percona Server (via XtraDB)
– Persistent, engine-independent, index-independent statistics

Index statistics
● Cardinality makes it possible to calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact)
● Exact collection procedure depends on the storage engine
– InnoDB – random sampling
– MyISAM – index scan
– Engine-independent – index scan

Index statistics in MySQL 5.6
● Sample [8] random index leaf pages
● Table statistics (stored)
– rows - estimated number of rows in a table
– Other stats not used by optimizer
● Index statistics (stored)
– fields - #fields in the index
– rows_per_key - rows per 1 key value, per prefix fields
([1 column value], [2 columns value], [3 columns value], …)
– Other stats not used by optimizer.

Index statistics updates
● Statistics updated when:
– ANALYZE TABLE tbl_name [, tbl_name] …
– SHOW TABLE STATUS, SHOW INDEX
– Access to INFORMATION_SCHEMA.[TABLES|STATISTICS]
– A table is opened for the first time (after server restart)
– A table has changed >10%
– When InnoDB Monitor is turned ON

Displaying optimizer statistics
● MySQL 5.5, MariaDB 5.3, and older
– Issue SQL statements to count rows/keys
– Indirectly, look at EXPLAIN for simple queries
● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
– information_schema.[innodb_index_stats, innodb_table_stats]
– Read-only, always visible
● MySQL 5.6
– mysql.[innodb_index_stats, innodb_table_stats]
– User updateable
– Only available if innodb_analyze_is_persistent=ON (replaced by innodb_stats_persistent in later 5.6 releases)
● MariaDB 10.0
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
– User updateable
– + current XtraDB mechanisms.

Plan [in]stability
● Statistics may vary a lot (orders)
MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
+------------+-----------------+--------------+ +---------------+
| table_name | index_name | rows_per_key | | rows_per_key | error (actual)
+------------+-----------------+--------------+ +---------------+
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25%
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4)
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80)
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234)
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15)
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477)
+------------+-----------------+--------------+ +---------------+
MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
+-----------------+----------+ +----------+
| table_name | rows | | rows |
+-----------------+----------+ +----------+
| partsupp | 6524766 | | 9101065 | 28% (8000000)
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000)
| lineitem | 60062904 | | 59992655 | 0.1% (59986052)
+-----------------+----------+ +----------+

Controlling statistics (MySQL 5.6)
● Persistent and user-updateable InnoDB statistics
– innodb_analyze_is_persistent = ON,
– updated manually by ANALYZE TABLE or
– automatically by innodb_stats_auto_recalc = ON
● Control the precision of sampling [default 8]
– innodb_stats_persistent_sample_pages,
– innodb_stats_transient_sample_pages
● No new statistics compared to older versions.

Controlling statistics (MariaDB 10.0)
Current XtraDB index statistics
+
● Engine-independent, persistent, user-updateable statistics
● Precise
● Additional statistics per column (even when there is no index):
– min_value, max_value: minimum/maximum value per column
– nulls_ratio: fraction of null values in a column
– avg_length: average size of values in a column
– avg_frequency: average number of rows with the same value

Join condition pushdown

select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
● Conjunctive (ANDed) conditions are split into parts
● Each part is attached as early as possible
– Either as “Using where”
– Or as table access method.

Observing join condition pushdown
EXPLAIN: {
"query_block": {
"select_id": 1,
"nested_loop": [
{
"table": {
"table_name": "orders",
"access_type": "ALL",
"possible_keys": [
"i_o_custkey"
],
"rows": 1499715,
"filtered": 100,
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` =
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
}
},
{
"table": {
"table_name": "customer",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"c_custkey"
],
"key_length": "4",
"ref": [
"dbt3sf1.orders.o_custkey"
],
"rows": 1,
"filtered": 100,
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` <
<cache>(-(500)))"
}
● Before mysql-5.6: EXPLAIN shows only “Using where”
– The condition itself is only visible in the debug trace
● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions

Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal).

+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
|1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.

Second table, “orders”
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
● ref access uses only c_custkey=o_custkey
● What about o_orderpriority='1-URGENT'?

Selectivity of o_orderpriority='1-URGENT'
● select count(*) from orders: 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows
● 300K / 1.5M = 0.2

● Selectivity of o_orderpriority='1-URGENT' is 0.2
– Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
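The estimate above can be written out as a short Python sketch. It only multiplies the numbers already on the slides (6802 outer rows, 7 rows per ref lookup, selectivity 0.2); the function name `rows_examined` is hypothetical, and the model assumes the extra condition is independent of the join key.

```python
def rows_examined(outer_rows, rows_per_lookup, selectivity=1.0):
    """Estimated rows read from the inner table across all lookups."""
    return outer_rows * rows_per_lookup * selectivity

customer_rows = 6802                      # customer rows after c_acctbal < -500
orders_per_customer = 7                   # ref estimate on i_o_custkey
urgent_selectivity = 300_000 / 1_500_000  # o_orderpriority='1-URGENT', 0.2

# Current plan: the index covers only o_custkey, the rest is "Using where".
current = rows_examined(customer_rows, orders_per_customer)

# With an index that also covers o_orderpriority, fewer rows are read.
with_index = rows_examined(customer_rows, orders_per_customer, urgent_selectivity)

print(f"current plan: {current:.0f} rows, with two-column index: {with_index:.0f} rows")
```

Under these assumptions the two-column index cuts the rows read from orders to about a fifth, from roughly 47.6K to roughly 9.5K.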

Reasoning about join plan efficiency - summary
Basic* approach to evaluation of join plan efficiency:
for each table $T in the join order {
  Look at conditions attached to table $T
    (a condition must use table $T, and may also use previous tables)
  Does the access method used with $T make good use
    of the attached conditions?
}
* some other details may also affect join performance

Attached conditions
● Ideally, should be used for table access
● Not all conditions can be used [at the same time]
– Unused ones are still useful
– They reduce number of scans for subsequent tables

Attached condition selectivity
● Unused indexes provide info about selectivity
– Works, but very expensive
● MariaDB 10.0 has engine-independent statistics
– Index statistics
– Non-indexed Column statistics
● Histograms
– Further info:
Tomorrow, 2:20 pm @ Ballroom D
Igor Babaev
Engine-independent persistent statistics with histograms in MariaDB

How to check if the query plan matches the reality

Check if query plan is realistic
● EXPLAIN shows what the optimizer expects. It may be wrong
– Out-of-date index statistics
– Non-uniform data distribution
● Other DBMS: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, have
– Handler counters
– “User statistics” (Percona, MariaDB)
– PERFORMANCE_SCHEMA

Join analysis: example query (Q18, DBT3)
<reset counters>
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem
where
o_totalprice > 500000
and c_custkey = o_custkey
and o_orderkey = l_orderkey
group by c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice
order by o_totalprice desc, o_orderdate
LIMIT 10;
<collect statistics>

Join analysis: handler counters (old)
FLUSH STATUS;
=> RUN QUERY
SHOW STATUS LIKE "Handler%";
+----------------------------+-------+
| Handler_mrr_key_refills | 0 |
| Handler_mrr_rowid_refills | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1646 |
| Handler_read_last | 0 |
| Handler_read_next | 1462 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 10 |
| Handler_read_rnd_deleted | 0 |
| Handler_read_rnd_next | 184 |
| Handler_tmp_update | 1096 |
| Handler_tmp_write | 183 |
| Handler_update | 0 |
| Handler_write | 0 |

Join analysis: USERSTAT by Facebook
MariaDB, Percona Server
SET GLOBAL USERSTAT=1;
FLUSH TABLE_STATISTICS;
FLUSH INDEX_STATISTICS;
=> RUN QUERY
SHOW TABLE_STATISTICS;
+--------------+------------+-----------+--------------+-------------------------+
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
+--------------+------------+-----------+--------------+-------------------------+
| dbt3 | orders | 183 | 0 | 0 |
| dbt3 | lineitem | 1279 | 0 | 0 |
| dbt3 | customer | 183 | 0 | 0 |
+--------------+------------+-----------+--------------+-------------------------+
SHOW INDEX_STATISTICS;
+--------------+------------+-----------------------+-----------+
| Table_schema | Table_name | Index_name | Rows_read |
+--------------+------------+-----------------------+-----------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 |
| dbt3 | orders | i_o_totalprice | 183 |
+--------------+------------+-----------------------+-----------+

Join analysis: PERFORMANCE SCHEMA
[MySQL 5.6, MariaDB 10.0]
● summary tables with read/write statistics
– table_io_waits_summary_by_table
– table_io_waits_summary_by_index_usage
● Superset of the userstat tables
● More overhead
● Not possible to associate statistics with a query
=> truncate stats tables before running a query
● Possible bug
– performance schema not ignored
– Disable by
UPDATE setup_consumers SET ENABLED = 'NO'
where name = 'global_instrumentation';

Analyze joins via PERFORMANCE SCHEMA:
SHOW TABLE_STATISTICS analogue
select object_schema, object_name, count_read, count_write,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_table
where object_schema = 'dbt3' and count_star > 0;
+---------------+-------------+------------+-------------+
| object_schema | object_name | count_read | count_write |
+---------------+-------------+------------+-------------+
| dbt3 | customer | 183 | 0 |
| dbt3 | lineitem | 1462 | 0 |
| dbt3 | orders | 184 | 0 |
+---------------+-------------+------------+-------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+

Analyze joins via PERFORMANCE SCHEMA:
SHOW INDEX_STATISTICS analogue
select object_schema, object_name, index_name, count_read,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_index_usage
where object_schema = 'dbt3' and count_star > 0
and index_name is not null;
+---------------+-------------+-----------------------+------------+
| object_schema | object_name | index_name | count_read |
+---------------+-------------+-----------------------+------------+
| dbt3 | customer | PRIMARY | 183 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 |
| dbt3 | orders | i_o_totalprice | 184 |
+---------------+-------------+-----------------------+------------+
+----------------+-----------------+
| sum_timer_read | sum_timer_write | ...
+----------------+-----------------+
| 8326528406 | 0 |
| 12117332778 | 0 |
| 7946312812 | 0 |
+----------------+-----------------+


Batched joins
● Optimization for analytical queries
● Analytic queries shovel through lots of data
– e.g. “average size of order in the last month”
– or “pairs of goods purchased together”
● Indexes, etc. won't help when you really need to look at all data
● More data means greater chance of being I/O-bound
● Solution: batched joins

Batched Key Access Idea

● Non-BKA join hits data at random
● Caches are not used efficiently
● Prefetching is not useful
● BKA implementation accesses data in order
● Takes advantage of caches and prefetching
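The idea can be sketched in a few lines of Python. This is a toy model, not the server's implementation: a dictionary stands in for the inner table, a key's value stands in for its position on disk, and the function names and the `backward_jumps` metric are made up for illustration.

```python
def plain_lookups(keys, table):
    """Look rows up in arrival order; returns rows and the positions touched."""
    touched, rows = [], []
    for key in keys:
        touched.append(key)
        rows.append(table[key])
    return rows, touched

def batched_lookups(keys, table, buffer_size):
    """Fill a buffer with keys, sort it, then look rows up in key order."""
    touched, rows = [], []
    for start in range(0, len(keys), buffer_size):
        batch = sorted(keys[start:start + buffer_size])
        for key in batch:
            touched.append(key)
            rows.append(table[key])
    return rows, touched

def backward_jumps(touched):
    """Count accesses that move to a smaller position than the previous one."""
    return sum(1 for a, b in zip(touched, touched[1:]) if b < a)

table = {k: f"row{k}" for k in range(10)}
keys = [7, 2, 9, 0, 5, 3, 8, 1]  # join keys in the order the outer table produces them

_, random_order = plain_lookups(keys, table)
_, small_buffer = batched_lookups(keys, table, buffer_size=4)
_, big_buffer = batched_lookups(keys, table, buffer_size=8)
print(backward_jumps(random_order), backward_jumps(small_buffer), backward_jumps(big_buffer))
```

In this toy model a larger buffer means fewer jumps back to earlier positions, which is the intuition behind the buffer-size tuning discussed for BKA.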

Batched Key access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2
The benchmark was run with
● Various BKA buffer sizes
● Various sizes of the $DATE1...$DATE2 range

Batched Key Access Performance
[Figure: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, regular; query_size=1, BKA; query_size=2, BKA; query_size=3, BKA. Callouts: performance without BKA; performance with BKA, given sufficient buffer size.]
● 4x-10x speedup
● The more the data, the bigger the speedup
● Buffer size setting is very important

Batched Key Access settings
● Needs to be turned on
set join_buffer_size= 32*1024*1024;
set join_cache_level=6; -- MariaDB
set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● Further join_buffer_size tuning: watch
– Query performance
– Handler_mrr_init counter
and increase join_buffer_size until either saturates

Batched Key Access - conclusions
● Targeted at big joins
● Needs to be enabled manually
● @@join_buffer_size is the most important setting
● MariaDB's implementation is a superset of MySQL's


ORDER BY
aggregates
GROUP BY

ORDER BY … LIMIT
Three algorithms
● Use an index to read in order
● Read one table, sort, join - “Using filesort”
● Execute join into temporary table and then sort - “Using temporary; Using filesort”

Using index to read data in order
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!

Using filesort strategy
● Have to read the entire first table
● For the remaining tables, can apply LIMIT n
● ORDER BY can only use columns of tbl1

Using temporary; Using filesort
● ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting
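The cost difference between reading in index order and sorting first can be sketched in Python. This is a simplified model with made-up prices: an already sorted iterator stands in for an index on the ORDER BY column, and both function names are illustrative.

```python
from itertools import islice

def limit_via_index(rows_in_index_order, n):
    """Rows arrive already sorted, so reading stops after n rows."""
    rows_read = 0
    result = []
    for row in islice(rows_in_index_order, n):
        rows_read += 1
        result.append(row)
    return result, rows_read

def limit_via_filesort(rows, n):
    """Every row is read and sorted before LIMIT can be applied."""
    rows_read = len(rows)
    return sorted(rows, reverse=True)[:n], rows_read

prices = [310, 120, 560, 90, 430, 275, 610, 50]
# Stands in for an index on the ORDER BY column, read in descending order.
index_order = iter(sorted(prices, reverse=True))

top_index, read_index = limit_via_index(index_order, 3)
top_sort, read_sort = limit_via_filesort(prices, 3)
print(top_index, read_index, read_sort)
```

Both paths return the same top three values, but the index path reads three rows while the sort path reads all eight.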

ORDER BY - conclusions
● Resolving ORDER BY with an index allows very efficient handling of LIMIT
– Optimization for
WHERE unused_condition ORDER BY … LIMIT n
is challenging
● Use sql_big_result, IGNORE INDEX FOR ORDER BY
● Using filesort
– Needs all ORDER BY columns in the first table
– Takes advantage of LIMIT when joining to non-first tables
● Using temporary; Using filesort is least efficient

GROUP BY strategies
There are three strategies
● Ordered index scan
● Loose Index Scan (LooseScan)
● Groups table
(Using temporary; [Using filesort]).

Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to the next group
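A minimal Python sketch of the contrast between aggregating over an ordered scan and using a groups table. The rows are made-up (group key, quantity) pairs, the sort stands in for reading through an index on the GROUP BY column, and both function names are illustrative.

```python
from itertools import groupby
from operator import itemgetter

def group_ordered(rows_sorted_by_key):
    """Ordered scan: one group is open at a time and is aggregated on the fly."""
    for key, group in groupby(rows_sorted_by_key, key=itemgetter(0)):
        yield key, sum(qty for _, qty in group)

def group_with_temp_table(rows):
    """Groups table: one accumulator per group, rows may arrive in any order."""
    totals = {}
    for key, qty in rows:
        totals[key] = totals.get(key, 0) + qty
    return totals

rows = [(3, 10), (1, 5), (3, 2), (2, 7), (1, 1)]

ordered = list(group_ordered(sorted(rows)))  # sorted() stands in for the index order
temp = group_with_temp_table(rows)
print(ordered, temp)
```

The ordered variant needs memory for only one group at a time, while the groups-table variant keeps an accumulator for every group until the input is exhausted.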

Execution of GROUP BY with temptable

Subquery optimizations
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
● Queries that caused most of the pain
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
– SELECT … FROM (SELECT …) - derived tables
● MariaDB 5.3 and MySQL 5.6
– Share a common ancestor, MySQL 6.0 alpha
– Huge (100x, 1000x) speedups for painful areas
– Other kinds of subqueries received a speedup, too
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
● 5.6 handles some un-handled edge cases, too

Tuning for subqueries
● “Before”: one execution strategy
– No tuning possible
● “After”: similar to joins
– Reasonable execution strategies supported
– Need indexes
– Need selective conditions
– Support batching in most important cases
● Should be better 9x% of the time.

What if it still picks a poor query plan?
For both MariaDB and MySQL:
● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/
● Find which optimization it was
● set optimizer_switch='$subquery_optimization=off'

Thanks!
Q & A
