Advanced query optimizer 
tuning and analysis 
Sergei Petrunia 
Timour Katchaounov 
Monty Program Ab 
MySQL Conference And Expo 2013
2 07:48:08 AM 
● Introduction 
– What is an optimizer problem 
– How to catch it 
● old and new tools
● Single-table selects 
– brief recap from 2012 
● JOINs 
– ref access 
● index statistics 
– join condition pushdown 
– join plan efficiency 
– query plan vs reality 
● Big I/O bound JOINs 
– Batched Key Access 
● Aggregate functions 
● ORDER BY ... LIMIT 
● GROUP BY 
● Subqueries
Is there a problem with query optimizer? 
• Database performance is affected by many factors
• One of them is the query optimizer
• Is my performance problem caused by the optimizer?
Signs that there is a query optimizer problem
• Some (not all) queries are slow
• A query seems to run longer than it ought to
– And examines more records than it ought to
• Usually, the query remains slow regardless of other activity on the server
Catching slow queries, the old ways 
● Watch the Slow query log 
– Percona Server/MariaDB: 
--log_slow_verbosity=query_plan 
# Thread_id: 1 Schema: dbt3sf10 QC_hit: No 
# Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000 
# Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No 
# Filesort: No Filesort_on_disk: No Merge_passes: 0 
SET timestamp=1333385770; 
select * from customer where c_acctbal < -1000; 
– Run pt-query-digest on the log 
• Run SHOW PROCESSLIST periodically
The new way: SHOW PROCESSLIST + SHOW EXPLAIN 
• Available in MariaDB 10.0+ 
• Displays EXPLAIN of a running statement 
MariaDB> show processlist; 
+--+----+---------+-------+-------+----+------------+-------------------------... 
|Id|User|Host |db |Command|Time|State |Info 
+--+----+---------+-------+-------+----+------------+-------------------------... 
| 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ... 
| 2|root|localhost|dbt3sf1|Query | 0|init |show processlist 
+--+----+---------+-------+-------+----+------------+-------------------------... 
MariaDB> show explain for 1; 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
|1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where| 
+--+-----------+------+----+-------------+----+-------+----+-------+-----------+ 
MariaDB [dbt3sf1]> show warnings; 
+-----+----+-----------------------------------------------------------------+ 
|Level|Code|Message | 
+-----+----+-----------------------------------------------------------------+ 
|Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995| 
+-----+----+-----------------------------------------------------------------+ 
SHOW EXPLAIN usage 
● Intended usage 
– SHOW PROCESSLIST ... 
– SHOW EXPLAIN FOR ... 
● Why not just run EXPLAIN again 
– Difficult to replicate setups 
● Temporary tables 
● Optimizer settings 
● Storage engine's index statistics 
● ... 
– No uncertainty about whether you're looking at the same query plan or not.
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● use performance_schema 
● Many ways to analyze via queries 
– events_statements_summary_by_digest 
● count_star, sum_timer_wait, 
min_timer_wait, avg_timer_wait, max_timer_wait 
● digest_text, digest 
● sum_rows_examined, sum_created_tmp_disk_tables, 
sum_select_full_join 
– events_statements_history 
● sql_text, digest_text, digest 
● timer_start, timer_end, timer_wait 
● rows_examined, created_tmp_disk_tables, 
select_full_join 
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
• Modified Q18 from DBT3 
select c_name, c_custkey, o_orderkey, o_orderdate,
o_totalprice, sum(l_quantity)
from customer, orders, lineitem 
where 
o_totalprice > ? 
and c_custkey = o_custkey 
and o_orderkey = l_orderkey 
group by c_name, c_custkey, o_orderkey, 
o_orderdate, o_totalprice 
order by o_totalprice desc, o_orderdate 
LIMIT 10; 
• App executes Q18 many times with 
? = 550000, 500000, 400000, ... 
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● Find candidate slow queries 
● Simple tests: select_full_join > 0, 
created_tmp_disk_tables > 0, etc 
● Complex conditions: 
max execution time > X sec OR 
min/max time vary a lot: 
select max_timer_wait/avg_timer_wait as max_ratio, 
avg_timer_wait/min_timer_wait as min_ratio 
from events_statements_summary_by_digest 
where max_timer_wait > 1000000000000 
or max_timer_wait / avg_timer_wait > 2 
or avg_timer_wait / min_timer_wait > 2\G
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
*************************** 5. row *************************** 
DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b 
DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , 
`o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE 
`o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY 
`c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` 
DESC , `o_orderdate` LIMIT ? 
COUNT_STAR: 3 
SUM_TIMER_WAIT: 3251758347000 
MIN_TIMER_WAIT: 3914209000 → 0.0039 sec 
AVG_TIMER_WAIT: 1083919449000 
MAX_TIMER_WAIT: 3204044053000 → 3.2 sec 
SUM_LOCK_TIME: 555000000 
SUM_ROWS_SENT: 25 
SUM_ROWS_EXAMINED: 0 
SUM_CREATED_TMP_DISK_TABLES: 0 
SUM_CREATED_TMP_TABLES: 3 
SUM_SELECT_FULL_JOIN: 0 
SUM_SELECT_RANGE: 3 
SUM_SELECT_SCAN: 0 
SUM_SORT_RANGE: 0 
SUM_SORT_ROWS: 25 
SUM_SORT_SCAN: 3 
SUM_NO_INDEX_USED: 0 
SUM_NO_GOOD_INDEX_USED: 0 
FIRST_SEEN: 1970-01-01 03:38:27 
LAST_SEEN: 1970-01-01 03:38:43 
max_ratio: 2.9560 
min_ratio: 276.9192 
High variance of 
execution time
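The two ratios in this row can be recomputed by hand. The sketch below assumes only that the Performance Schema timer columns are in picoseconds; the function name `digest_ratios` is made up for illustration and is not part of any MySQL or MariaDB API.

```python
# Performance Schema timer columns are reported in picoseconds.
PICOSECONDS_PER_SECOND = 10**12


def digest_ratios(sum_wait, count, min_wait, max_wait):
    """Recompute the avg time and the two variance ratios for one digest row."""
    avg_wait = sum_wait / count
    return {
        "min_sec": min_wait / PICOSECONDS_PER_SECOND,
        "avg_sec": avg_wait / PICOSECONDS_PER_SECOND,
        "max_sec": max_wait / PICOSECONDS_PER_SECOND,
        "max_ratio": max_wait / avg_wait,   # slowest run vs. average run
        "min_ratio": avg_wait / min_wait,   # average run vs. fastest run
    }


# Values copied from the digest row shown on the slide.
stats = digest_ratios(
    sum_wait=3251758347000,
    count=3,
    min_wait=3914209000,
    max_wait=3204044053000,
)

for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A large `min_ratio` means the fastest execution is far below the average, which is the "high variance" signal the slide points at.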
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
● Check the actual queries and constants 
● The events_statements_history table 
select timer_wait/1000000000000 as exec_time, sql_text 
from events_statements_history 
where digest in 
(select digest from events_statements_summary_by_digest 
where max_timer_wait > 1000000000000 
or max_timer_wait / avg_timer_wait > 2 
or avg_timer_wait / min_timer_wait > 2) 
order by timer_wait;
Catching slow queries (NEW) 
PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] 
+-----------+-----------------------------------------------------------------------------------+ 
| exec_time | sql_text | 
+-----------+-----------------------------------------------------------------------------------+ 
| 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | 
| 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | 
| 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | 
+-----------+-----------------------------------------------------------------------------------+ 
Observation: 
orders.o_totalprice > ? is less and less selective 
Actions after finding the slow query 
• Bad query plan
– Rewrite the query 
– Force a good query plan 
• Bad optimizer settings 
– Do tuning 
• Query is inherently complex 
– Don't waste time with it 
– Look for other solutions. 
● Introduction 
– What is an optimizer problem 
– How to catch it 
● old and new tools
● Single-table selects 
– brief recap from 2012 
● JOINs 
– ref access 
● index statistics 
– join condition pushdown 
– join plan efficiency 
– query plan vs reality 
● Big I/O bound JOINs 
– Batched Key Access 
● Aggregate functions 
● ORDER BY ... LIMIT 
● GROUP BY 
● Subqueries
Consider a simple select
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
● Run the query:
19 rows in set (7.65 sec)
● Check the query plan:
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• 15M rows were scanned, 19 rows in output
• Query plan seems inefficient
– (note: this logic doesn't directly apply to group/order by queries).
Query plan analysis
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+----------+-------------+
• Entire table is scanned
• WHERE condition is checked after records are read
– Not used to limit #examined rows.
Let's add an index
alter table orders add key i_o_orderdate (o_orderdate);
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where|
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+
● Query time:
19 rows in set (0.76 sec)
• Outcome
– Down to reading 300K rows
– Still, 300K >> 19 rows.
Finding out which indexes to add
select * from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and
o_clerk='Clerk#000009506'
Check selectivity of conditions that will use the index
● index (o_orderdate)
select count(*) from orders
where
o_orderDate BETWEEN '1992-06-06' and '1992-07-06';
306322 rows
● index (o_clerk)
select count(*) from orders where o_clerk='Clerk#000009506'
1507 rows.
Try adding composite indexes
● index (o_clerk, o_orderdate)
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
|1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+
Bingo! 100% efficiency
● index (o_orderdate, o_clerk)
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
|1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where|
+--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+
Much worse!
• If a condition uses multiple columns, a composite index will be most efficient
• Order of columns matters
– Explaining why is outside the scope of this tutorial; it was covered in last year's tutorial
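A rough intuition for why column order matters, as a sketch only: treat each composite index as a sorted list of tuples and count how many entries a range scan must touch. The data is synthetic, the clerk names and sizes are arbitrary, and a sorted Python list is just a stand-in for a B-tree, not how the server stores indexes.

```python
import bisect
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

# Synthetic stand-in for `orders`: (o_clerk, o_orderdate) pairs.
CLERKS = [f"Clerk#{i:09d}" for i in range(100)]
DATES = [f"1992-{m:02d}-{d:02d}" for m in range(1, 13) for d in range(1, 29)]
rows = [(random.choice(CLERKS), random.choice(DATES)) for _ in range(20000)]

clerk, lo, hi = "Clerk#000000042", "1992-06-06", "1992-07-06"
matching = sum(1 for c, d in rows if c == clerk and lo <= d <= hi)

# index (o_clerk, o_orderdate): equality on the first column, range on the
# second, so one contiguous slice holds exactly the matching entries.
idx_clerk_date = sorted(rows)
first = bisect.bisect_left(idx_clerk_date, (clerk, lo))
last = bisect.bisect_right(idx_clerk_date, (clerk, hi))
scanned_clerk_date = last - first

# index (o_orderdate, o_clerk): the range on the first column bounds the
# scan; every clerk inside the date range has to be read and filtered.
idx_date_clerk = sorted((d, c) for c, d in rows)
dates_only = [d for d, _ in idx_date_clerk]
first = bisect.bisect_left(dates_only, lo)
last = bisect.bisect_right(dates_only, hi)
scanned_date_clerk = last - first

print("matching rows:               ", matching)
print("scanned via (clerk, date):   ", scanned_clerk_date)
print("scanned via (date, clerk):   ", scanned_date_clerk)
```

With the equality column first, the scanned slice equals the result set; with the range column first, the scan covers the whole date range.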
Conditions must be in SARGable form
• Condition must represent a range
• It must have a form that is recognized by the optimizer
o_orderDate BETWEEN '1992-06-01' and '1992-06-30' (SARGable)
year(o_orderDate)=1992 and month(o_orderdate)=6 (not SARGable)
TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and
TO_DAYS('1992-07-06') (not SARGable)
o_clerk='Clerk#000009506' (SARGable)
o_clerk LIKE 'Clerk#000009506' (SARGable)
o_clerk LIKE '%Clerk#000009506%' (not SARGable)
column IN (1,10,15,21, ...) (SARGable)
(col1, col2) IN ( (1,1), (2,2), (3,3), …) (not SARGable)
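One way to get from a function-wrapped condition to a SARGable one is to compute the range in the application and send a plain BETWEEN. A minimal sketch, assuming a DATE column (a DATETIME column would need a different upper bound); the helper names are hypothetical.

```python
import calendar


def month_to_range(year, month):
    """Return the first and last day of a month as ISO date strings."""
    last_day = calendar.monthrange(year, month)[1]
    return (
        f"{year:04d}-{month:02d}-01",
        f"{year:04d}-{month:02d}-{last_day:02d}",
    )


def sargable_month_condition(column, year, month):
    """Build a range condition equivalent to year(col)=Y and month(col)=M."""
    start, end = month_to_range(year, month)
    return f"{column} BETWEEN '{start}' and '{end}'"


print(sargable_month_condition("o_orderDate", 1992, 6))
```

The generated text has the same form as the BETWEEN condition at the top of the slide, so the optimizer can build a range on an index over the column.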
New in MySQL-5.6: optimizer_trace 
● Lets you see the ranges 
set optimizer_trace=1; 
explain select * from orders 
where o_orderDATE between '1992-06-01' and '1992-07-03' and 
o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04');
select * from information_schema.optimizer_trace\G
● Will print a big JSON struct 
● Search for range_scan_alternatives.
New in MySQL-5.6: optimizer_trace 
... 
"range_scan_alternatives": [ 
{ 
"index": "i_o_orderdate", 
"ranges": [ 
"1992-06-01 <= o_orderDATE < 1992-06-12", 
"1992-06-12 < o_orderDATE <= 1992-07-03" 
], 
"index_dives_for_eq_ranges": true, 
"rowid_ordered": false, 
"using_mrr": false, 
"index_only": false, 
"rows": 319082, 
"cost": 382900, 
"chosen": true 
}, 
{ 
"index": "i_o_date_clerk", 
"ranges": [ 
"1992-06-01 <= o_orderDATE < 1992-06-12", 
"1992-06-12 < o_orderDATE <= 1992-07-03" 
], 
"index_dives_for_eq_ranges": true, 
"rowid_ordered": false, 
"using_mrr": false, 
"index_only": false, 
"rows": 406336, 
"cost": 487605, 
"chosen": false, 
"cause": "cost" 
} 
], 
... 
● Considered ranges are shown in the range_scan_alternatives section
● This is actually the original use case of optimizer_trace
● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug)
● Still, very useful.
Source of #rows estimates for range 
select * from orders 
where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' 
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 
|1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| 
+--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ 
• “records_in_range” estimate
• Done by diving into the index
• Usually fairly accurate
• Not affected by ANALYZE TABLE.
Simple selects: conclusions 
• Efficiency == “#rows_scanned is close to #rows_returned” 
• Indexes and WHERE conditions reduce #rows scanned 
• Index estimates are usually accurate 
• Multi-column indexes 
– “handle” conditions on multiple columns 
– Order of columns in the index matters 
• optimizer_trace lets you view the ranges
– But misrepresents ranges over multi-column indexes.
Now, we will skip some topics
One can also speed up simple selects with
● index_merge access method
● index access method
● Index Condition Pushdown
We don't have time for these now; check out last year's tutorial.
● Introduction 
– What is an optimizer problem 
– How to catch it 
● old and new tools
● Single-table selects 
– brief recap from 2012 
● JOINs 
– ref access 
● index statistics 
– join condition pushdown 
– join plan efficiency 
– query plan vs reality 
● Big I/O bound JOINs 
– Batched Key Access 
● Aggregate functions 
● ORDER BY ... LIMIT 
● GROUP BY 
● Subqueries
A simple join
select * from customer, orders where c_custkey=o_custkey
• “Customers with their orders”
Execution: Nested Loops join 
select * from customer, orders where c_custkey=o_custkey 
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• Complexity: 
– Scans table customer 
– For each record in customer, scans table orders 
• Is this ok?
Execution: Nested loops join (2) 
select * from customer, orders where c_custkey=o_custkey 
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• EXPLAIN: 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ 
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | 
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
Execution: Nested loops join (3)
select * from customer, orders where c_custkey=o_custkey
for each customer C {
  for each order O {
    if (C.c_custkey == O.o_custkey)
      produce record(C, O);
  }
}
• EXPLAIN:
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | |
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where|
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
• rows=148749: rows to read from customer
• rows=1493631: rows to read from orders
• Using where: c_custkey=o_custkey
Execution: Nested loops join (4) 
select * from customer, orders where c_custkey=o_custkey 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ 
|1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | 
|1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| 
+--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ 
• Scan a 1,493,631-row table 148,749 times
– Consider 1,493,631 * 148,749 row combinations
• Is this query inherently complex?
– We know each customer has his own orders
– size(customer JOIN orders) = size(orders)
– Lower bound is
1,493,631 + 148,749 + costs to match customer<->order.
Using index for join: ref access 
alter table orders add index i_o_custkey(o_custkey) 
select * from customer, orders where c_custkey=o_custkey 
ref access - analysis
select * from customer, orders where c_custkey=o_custkey
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ 
● One ref lookup scans 7 rows. 
● In total: 7 * 148,749=1,041,243 rows 
– `orders` has 1.4M rows 
– no redundant reads from `orders` 
● The whole query plan 
– Reads all customers 
– Reads 1M orders (of 1.4M) 
● Efficient!
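The difference between the plain nested-loops plan and the ref access plan can be sketched on a toy data set. This is an illustration under simplifying assumptions: the tables are tiny Python lists, and a dictionary stands in for the i_o_custkey index. The function names are made up.

```python
from collections import defaultdict

# Tiny synthetic tables; column names follow the slides.
customers = [{"c_custkey": k} for k in range(1, 6)]
orders = [{"o_orderkey": n, "o_custkey": 1 + n % 5} for n in range(20)]


def nested_loops_join(customers, orders):
    """Full scan of orders for every customer (type=ALL on both tables)."""
    examined, result = 0, []
    for c in customers:
        for o in orders:
            examined += 1
            if c["c_custkey"] == o["o_custkey"]:
                result.append((c, o))
    return result, examined


def ref_access_join(customers, orders):
    """Index lookup on o_custkey for every customer (type=ref on orders)."""
    index = defaultdict(list)  # stand-in for the i_o_custkey index
    for o in orders:
        index[o["o_custkey"]].append(o)
    examined, result = 0, []
    for c in customers:
        for o in index.get(c["c_custkey"], []):
            examined += 1
            result.append((c, o))
    return result, examined


nl_rows, nl_examined = nested_loops_join(customers, orders)
ref_rows, ref_examined = ref_access_join(customers, orders)
print("nested loops examined:", nl_examined)
print("ref access examined:  ", ref_examined)
```

Both functions produce the same join result; only the number of `orders` rows examined differs, which is the efficiency argument made on the slide.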
Conditions that can be used for ref access 
● Can use equalities 
– tbl.key=other_table.col 
– tbl.key=const 
– tbl.key IS NULL 
● For multipart keys, will use largest prefix 
– keypart1=... AND keypart2= … AND keypartK=... .
Conditions that can't be used for ref access 
● Doesn't work for non-equalities 
t1.key BETWEEN t2.col1 AND t2.col2 
● Doesn't work for OR-ed equalities 
t1.key=t2.col1 OR t1.key=t2.col2 
– Except for ref_or_null 
t1.key=... OR t1.key IS NULL 
● Doesn't “combine” ref and range access
– t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col
– t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
Is ref always efficient?
● Efficient, if column has many different values (good)
– Best case – unique index (eq_ref)
● A few different values – not useful (bad)
● Skewed distribution: depends on which part the join touches (depends)
ref access estimates - index statistics 
• How many rows will match 
tbl.key_column = $value 
for an arbitrary $value? 
• Index statistics 
show keys from orders where key_name='i_o_custkey' 
*************************** 1. row *************** 
Table: orders 
Non_unique: 1 
Key_name: i_o_custkey 
Seq_in_index: 1 
Column_name: o_custkey 
Collation: A 
Cardinality: 214462 
Sub_part: NULL 
Packed: NULL 
Null: YES 
Index_type: BTREE 
show table status like 'orders' 
*************************** 1. row **** 
Name: orders 
Engine: InnoDB 
Version: 10 
Row_format: Compact 
Rows: 1495152 
Avg_row_length: 133 
Data_length: 199966720 
Max_data_length: 0 
Index_length: 122421248 
Data_free: 6291456 
... 
average = Rows /Cardinality = 1495152 / 214462 = 6.97.
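The same arithmetic as a small sketch, using the numbers from the SHOW KEYS and SHOW TABLE STATUS output above and the customer row count from the earlier EXPLAIN. The function name is hypothetical; this only reproduces the calculation and is not server code.

```python
def rows_per_key(table_rows, cardinality):
    """Table-wide average number of rows per distinct key value."""
    return table_rows / cardinality


# Numbers from SHOW TABLE STATUS (Rows) and SHOW KEYS (Cardinality).
avg = rows_per_key(table_rows=1495152, cardinality=214462)
print(f"average rows per o_custkey value: {avg:.2f}")

# EXPLAIN shows the rounded value (7) as the per-lookup estimate for ref
# access; multiplying by the number of outer rows gives the total estimate.
customer_rows = 148749
estimated_order_reads = round(avg) * customer_rows
print("estimated rows read from orders:", estimated_order_reads)
```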
ref access – conclusions 
● Based on t.key=... equality conditions 
● Can make joins very efficient 
● Relies on index statistics for estimates.
Optimizer statistics 
● MySQL/Percona Server 
– Index statistics 
– Persistent/transient InnoDB stats 
● MariaDB 
– Index statistics, persistent/transient 
● Same as Percona Server (via XtraDB) 
– Persistent, 
engine-independent, 
index-independent statistics.
Index statistics 
● Cardinality lets you calculate a table-wide average #rows-per-key-prefix
● It is a statistical value (inexact) 
● Exact collection procedure depends on the 
storage engine 
– InnoDB – random sampling 
– MyISAM – index scan 
– Engine-independent – index scan.
Index statistics in MySQL 5.6 
● Sample [8] random index leaf pages 
● Table statistics (stored) 
– rows - estimated number of rows in a table 
– Other stats not used by optimizer 
● Index statistics (stored) 
– fields - #fields in the index 
– rows_per_key - rows per 1 key value, per prefix fields 
([1 column value], [2 columns value], [3 columns value], …) 
– Other stats not used by optimizer.
Index statistics updates
● Statistics updated when: 
– ANALYZE TABLE tbl_name [, tbl_name] … 
– SHOW TABLE STATUS, SHOW INDEX 
– Access to INFORMATION_SCHEMA.[TABLES| 
STATISTICS] 
– A table is opened for the first time 
(after server restart) 
– A table has changed >10% 
– When InnoDB Monitor is turned ON.
Displaying optimizer statistics 
● MySQL 5.5, MariaDB 5.3, and older 
– Issue SQL statements to count rows/keys 
– Indirectly, look at EXPLAIN for simple queries 
● MariaDB 5.5, Percona Server 5.5 (using XtraDB) 
– information_schema.[innodb_index_stats, innodb_table_stats] 
– Read-only, always visible 
● MySQL 5.6 
– mysql.[innodb_index_stats, innodb_table_stats] 
– User updatable
– Only available if innodb_analyze_is_persistent=ON 
● MariaDB 10.0 
– Persistent updateable tables mysql.[index_stats, column_stats, table_stats] 
– User updateable 
– + current XtraDB mechanisms.
Plan [in]stability 
● Statistics may vary a lot (orders) 
MariaDB [dbt3]> select * from information_schema.innodb_index_stats; 
+------------+-----------------+--------------+ +---------------+ 
| table_name | index_name | rows_per_key | | rows_per_key | error (actual) 
+------------+-----------------+--------------+ +---------------+ 
| partsupp | PRIMARY | 3, 1 | | 4, 1 | 25% 
| partsupp | i_ps_partkey | 3, 0 | => | 4, 1 | 25% (4) 
| partsupp | i_ps_suppkey | 64, 0 | | 91, 1 | 30% (80) 
| orders | i_o_orderdate | 9597, 1 | | 1660956, 0 | 99% (6234) 
| orders | i_o_custkey | 15, 1 | | 15, 0 | 0% (15) 
| lineitem | i_l_receiptdate | 7425, 1, 1 | | 6665850, 1, 1 | 99.9% (23477) 
+------------+-----------------+--------------+ +---------------+ 
MariaDB [dbt3]> select * from information_schema.innodb_table_stats; 
+-----------------+----------+ +----------+ 
| table_name | rows | | rows | 
+-----------------+----------+ +----------+ 
| partsupp | 6524766 | | 9101065 | 28% (8000000) 
| orders | 15039855 | ==> | 14948612 | 0.6% (15000000) 
| lineitem | 60062904 | | 59992655 | 0.1% (59986052) 
. 
+-----------------+----------+ +----------+
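Why a sampled estimate can swing this much is easier to see in a toy model. The sketch below is a deliberate simplification and not InnoDB's actual algorithm: it cuts a sorted key list into fixed-size "pages", picks a few at random, and divides sampled rows by the distinct keys seen per page. All names and sizes are assumptions chosen for illustration.

```python
import random


def make_index(n_rows, n_keys, seed):
    """Sorted key list with a skewed distribution (a few very common keys)."""
    rng = random.Random(seed)
    return sorted(int(rng.paretovariate(1.2)) % n_keys for _ in range(n_rows))


def split_pages(keys, page_size):
    """Cut the sorted key list into fixed-size 'leaf pages'."""
    return [keys[i:i + page_size] for i in range(0, len(keys), page_size)]


def exact_rows_per_key(keys):
    """Full index scan: total rows divided by distinct key values."""
    return len(keys) / len(set(keys))


def sampled_rows_per_key(keys, page_size, n_pages, rng):
    """Estimate rows-per-key from a random sample of pages (simplified)."""
    sample = rng.sample(split_pages(keys, page_size), n_pages)
    sampled_rows = sum(len(page) for page in sample)
    distinct = sum(len(set(page)) for page in sample)
    return sampled_rows / distinct


keys = make_index(n_rows=10000, n_keys=500, seed=1)
print(f"exact: {exact_rows_per_key(keys):.1f}")
for seed in range(5):
    estimate = sampled_rows_per_key(keys, 100, 8, random.Random(seed))
    print(f"sample with seed {seed}: {estimate:.1f}")
```

Each run samples different pages, so pages dominated by one common key and pages full of rare keys give very different answers.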
Controlling statistics (MySQL 5.6) 
● Persistent and user-updatable InnoDB statistics
– innodb_analyze_is_persistent = ON, 
– updated manually by ANALYZE TABLE or 
– automatically by innodb_stats_auto_recalc = ON 
● Control the precision of sampling [default 8] 
– innodb_stats_persistent_sample_pages, 
– innodb_stats_transient_sample_pages 
● No new statistics compared to older versions. 
Controlling statistics (MariaDB 10.0) 
Current XtraDB index statistics 
+ 
● Engine-independent, persistent, user-updateable statistics 
● Precise 
● Additional statistics per column (even when there is no 
index): 
– min_value, max_value: minimum/maximum value per column
– nulls_ratio: fraction of null values in a column 
– avg_length: average size of values in a column 
– avg_frequency: average number of rows with the same 
value.
Join condition pushdown
Join condition pushdown 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and 
o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
● Conjunctive (ANDed) conditions are split into parts 
● Each part is attached as early as possible 
– Either as “Using where” 
– Or as table access method.
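The attachment rule can be sketched in a few lines. This is a simplified model, assuming each ANDed part is described only by the set of tables it references; the function name is made up, and the real optimizer also turns some parts into access methods instead of "Using where".

```python
def attach_conditions(join_order, conditions):
    """Attach each ANDed part to the first table where it can be checked.

    join_order: list of table names in the order the join reads them.
    conditions: list of (condition_text, set_of_tables_it_references).
    """
    attached = {table: [] for table in join_order}
    for text, tables_used in conditions:
        # A part can be checked once every table it references has been
        # read, i.e. at the latest-positioned of those tables.
        position = max(join_order.index(t) for t in tables_used)
        attached[join_order[position]].append(text)
    return attached


conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]

plan = attach_conditions(["customer", "orders"], conditions)
for table, parts in plan.items():
    print(table, "->", " and ".join(parts))
```

With the join order customer, orders, the balance check is applied while reading customer, and the other two parts are applied at orders, where c_custkey=o_custkey is the part that ref access uses.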
Observing join condition pushdown 
EXPLAIN: { 
"query_block": { 
"select_id": 1, 
"nested_loop": [ 
{ 
"table": { 
"table_name": "orders", 
"access_type": "ALL", 
"possible_keys": [ 
"i_o_custkey" 
], 
"rows": 1499715, 
"filtered": 100, 
"attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = 
'1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))" 
} 
}, 
{ 
"table": { 
"table_name": "customer", 
"access_type": "eq_ref", 
"possible_keys": [ 
"PRIMARY" 
], 
"key": "PRIMARY", 
"used_key_parts": [ 
"c_custkey" 
], 
"key_length": "4", 
"ref": [ 
"dbt3sf1.orders.o_custkey" 
], 
"rows": 1, 
"filtered": 100, 
"attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < 
<cache>(-(500)))" 
} 
● Before mysql-5.6: 
EXPLAIN shows only 
“Using where” 
– The condition itself 
only visible in debug 
trace 
● Starting from 5.6: 
EXPLAIN FORMAT=JSON 
shows attached 
conditions.
Reasoning about join plan efficiency 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
First table, “customer” 
● type=ALL, 150 K rows 
● select count(*) from customer where c_acctbal < -500 gives 6804. 
● alter table customer add index (c_acctbal).
Reasoning about join plan efficiency
select *
from
customer, orders
where
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra |
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where|
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
First table, “customer”
● type=ALL, 150 K rows
● select count(*) from customer where c_acctbal < -500 gives 6804.
● alter table customer add index (c_acctbal)
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition|
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where |
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
Now, access to 'customer' is efficient.
Reasoning about join plan efficiency 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
Second table, “orders” 
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' 
● ref access uses only c_custkey=o_custkey 
● What about o_orderpriority='1-URGENT'?
Selectivity of o_orderpriority='1-URGENT'
● select count(*) from orders gives 1.5M rows
● select count(*) from orders where o_orderpriority='1-URGENT' gives 300K rows
● 300K / 1.5M = 0.2
Reasoning about join plan efficiency 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT'; 
Second table, “orders” 
● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT' 
● ref access uses only c_custkey=o_custkey 
● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
– Can examine 7*0.2=1.4 rows per lookup, 6802 times, if we add an index:
alter table orders add index (o_custkey, o_orderpriority)
or
alter table orders add index (o_orderpriority, o_custkey)
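The arithmetic behind this estimate, as a minimal sketch. It uses only numbers from the slides (6802 outer rows, 7 rows per ref lookup, selectivity 300K / 1.5M) and assumes that the selectivity of o_orderpriority is the same for every customer; the variable names are illustrative.

```python
from fractions import Fraction

outer_rows = 6802                 # rows expected from `customer` (range access)
rows_per_custkey = 7              # rows per ref lookup on i_o_custkey
selectivity = Fraction(300_000, 1_500_000)   # o_orderpriority='1-URGENT'

# Current plan: every lookup reads all orders of the customer,
# then filters on o_orderpriority ("Using where").
rows_read_now = outer_rows * rows_per_custkey

# With an index on both columns, the lookup itself applies both conditions.
rows_per_lookup = rows_per_custkey * selectivity
rows_read_with_index = outer_rows * rows_per_lookup

print(float(rows_per_lookup))        # 1.4
print(rows_read_now)                 # 47614
print(float(rows_read_with_index))   # 9522.8
```

Both column orders work for this query because both conditions on `orders` are equalities, so ref access can use both key parts either way.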
Reasoning about join plan efficiency - summary 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
|1 |SIMPLE |customer|range|PRIMARY,c_...|c_acctbal |9 |NULL |6802|Using index condition| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where | 
+--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+ 
Basic* approach to evaluation of join plan efficiency:
for each table $T in the join order {
  Look at conditions attached to table $T (a condition must
  use table $T, and may also use previous tables)
  Does the access method used with $T make good use
  of the attached conditions?
}
* some other details may also affect join performance
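The checklist above can be sketched as a few lines of Python. This is only an illustration: the plan is typed in by hand from the EXPLAIN output and the WHERE clause (nothing is parsed), and the dictionary layout and the function name `unused_conditions` are made up for this example.

```python
# Each plan step: table, access method, the conditions the access method
# uses, and all conditions attached to the table.
plan = [
    {
        "table": "customer",
        "access": "range on c_acctbal",
        "used": {"c_acctbal < -500"},
        "attached": {"c_acctbal < -500"},
    },
    {
        "table": "orders",
        "access": "ref on i_o_custkey",
        "used": {"c_custkey=o_custkey"},
        "attached": {"c_custkey=o_custkey", "o_orderpriority='1-URGENT'"},
    },
]

def unused_conditions(plan):
    """Return {table: conditions that are checked only after the row is read}."""
    report = {}
    for step in plan:                       # tables in join order
        leftover = step["attached"] - step["used"]
        if leftover:
            report[step["table"]] = sorted(leftover)
    return report

report = unused_conditions(plan)
print(report)
```

Every table that shows up in the report is a candidate for a better index or a different access method.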
Attached conditions 
● Ideally, should be used for table access 
● Not all conditions can be used [at the same time] 
– Unused ones are still useful 
– They reduce number of scans for subsequent tables 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal < -500 and 
o_orderpriority='1-URGENT'; 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |150081|Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 |Using where| 
+--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
Informing optimizer about attached conditions 
Currently: a range access that's too expensive to use 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
|id|select_type|table |type|possible_keys |key |key_len|ref |rows |filtered|Extra | 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
|1 |SIMPLE |customer|ALL |PRIMARY,c_acctbal|NULL |NULL |NULL |150081| 36.22 |Using where| 
|1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | 100.00 |Using where| 
+--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+ 
explain extended 
select * 
from 
customer, orders 
where 
c_custkey=o_custkey and c_acctbal > 8000 and 
o_orderpriority='1-URGENT'; 
● `orders` will be scanned 150081 * 36.22% = 54359 times
● This reduces the cost of the join
– Has an effect when comparing potential join plans
● => Index c_acctbal is not used for access, but it may still help the optimizer.
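A short worked version of this estimate, using the `rows` and `filtered` columns from the EXPLAIN EXTENDED output on this slide. The comparison with a plan that has no selectivity estimate (filtered = 100%) is added here for illustration.

```python
from fractions import Fraction

customer_rows = 150081                 # "rows" column for customer
filtered = Fraction(3622, 10000)       # "filtered" column for customer: 36.22%
rows_per_lookup = 7                    # "rows" column for orders

# Customer rows expected to pass c_acctbal > 8000,
# which is also the number of ref lookups into orders.
lookups = int(customer_rows * filtered)
# Rows of orders the join is expected to read in total.
orders_rows_read = lookups * rows_per_lookup
# The same figure if the optimizer had no estimate for the condition.
orders_rows_read_no_estimate = customer_rows * rows_per_lookup

print(lookups)                        # 54359
print(orders_rows_read)               # 380513
print(orders_rows_read_no_estimate)   # 1050567
```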
Attached condition selectivity 
● Unused indexes provide info about selectivity 
– Works, but very expensive 
● MariaDB 10.0 has engine-independent statistics 
– Index statistics 
– Non-indexed Column statistics 
● Histograms 
– Further info: 
Tomorrow, 2:20 pm @ Ballroom D 
Igor Babaev 
Engine-independent persistent statistics with histograms 
in MariaDB.
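To show how a histogram can supply selectivity for a column without an index, here is a deliberately coarse sketch of an equal-height histogram. It is a generic illustration and not MariaDB's implementation; the function names and the data are made up, and the estimate works at whole-bucket precision only.

```python
import bisect

def build_histogram(values, buckets):
    """Upper bound of each bucket; every bucket holds about the same number of rows.

    Assumes buckets <= len(values).
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets - 1] for i in range(1, buckets + 1)]

def selectivity_greater_than(bounds, x):
    """Estimated fraction of rows with value > x.

    Buckets whose upper bound is <= x contribute nothing; the bucket that
    contains x is counted in full, so the result can overestimate.
    """
    buckets_not_above = bisect.bisect_right(bounds, x)
    return (len(bounds) - buckets_not_above) / len(bounds)

column = list(range(1, 101))           # made-up data: values 1..100
bounds = build_histogram(column, 10)
estimate = selectivity_greater_than(bounds, 64)
actual = sum(v > 64 for v in column) / len(column)
print(bounds, estimate, actual)
```

More buckets give a finer estimate at the price of a larger histogram.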
How to check if the query plan matches reality
Check if query plan is realistic 
● EXPLAIN shows what the optimizer expects. It may be wrong
– Out-of-date index statistics
– Non-uniform data distribution
● Other DBMS: EXPLAIN ANALYZE
● MySQL: no equivalent. Instead, we have
– Handler counters 
– “User statistics” (Percona, MariaDB) 
– PERFORMANCE_SCHEMA
Join analysis: example query (Q18, DBT3) 
<reset counters> 
select c_name, c_custkey, o_orderkey, o_orderdate, 
o_totalprice, sum(l_quantity) 
from customer, orders, lineitem 
where 
o_totalprice > 500000 
and c_custkey = o_custkey 
and o_orderkey = l_orderkey 
group by c_name, c_custkey, o_orderkey, o_orderdate, 
o_totalprice 
order by o_totalprice desc, o_orderdate 
LIMIT 10; 
<collect statistics> 
Join analysis: handler counters (old) 
FLUSH STATUS; 
=> RUN QUERY 
SHOW STATUS LIKE "Handler%"; 
+----------------------------+-------+ 
| Handler_mrr_key_refills | 0 | 
| Handler_mrr_rowid_refills | 0 | 
| Handler_read_first | 0 | 
| Handler_read_key | 1646 | 
| Handler_read_last | 0 | 
| Handler_read_next | 1462 | 
| Handler_read_prev | 0 | 
| Handler_read_rnd | 10 | 
| Handler_read_rnd_deleted | 0 | 
| Handler_read_rnd_next | 184 | 
| Handler_tmp_update | 1096 | 
| Handler_tmp_write | 183 | 
| Handler_update | 0 | 
| Handler_write | 0 |
+----------------------------+-------+
Join analysis: USERSTAT by Facebook 
MariaDB, Percona Server 
SET GLOBAL USERSTAT=1; 
FLUSH TABLE_STATISTICS; 
FLUSH INDEX_STATISTICS; 
=> RUN QUERY 
SHOW TABLE_STATISTICS; 
+--------------+------------+-----------+--------------+-------------------------+ 
| Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes | 
+--------------+------------+-----------+--------------+-------------------------+ 
| dbt3 | orders | 183 | 0 | 0 | 
| dbt3 | lineitem | 1279 | 0 | 0 | 
| dbt3 | customer | 183 | 0 | 0 | 
+--------------+------------+-----------+--------------+-------------------------+ 
SHOW INDEX_STATISTICS; 
+--------------+------------+-----------------------+-----------+ 
| Table_schema | Table_name | Index_name | Rows_read | 
+--------------+------------+-----------------------+-----------+ 
| dbt3 | customer | PRIMARY | 183 | 
| dbt3 | lineitem | i_l_orderkey_quantity | 1279 | 
| dbt3 | orders | i_o_totalprice | 183 | 
+--------------+------------+-----------------------+-----------+ 
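The userstat figures can be cross-checked against the handler counters from the previous slide. The sketch below only does arithmetic on numbers shown in the two outputs; the interpretation given in the comments is an assumption, not something the slides state.

```python
# Numbers copied from SHOW STATUS LIKE "Handler%" and SHOW TABLE_STATISTICS.
handler = {
    "Handler_read_next": 1462,
    "Handler_tmp_write": 183,
    "Handler_tmp_update": 1096,
}
rows_read = {"orders": 183, "lineitem": 1279, "customer": 183}

# Observation 1: Handler_read_next equals the rows read from orders plus
# the rows read from lineitem.
next_reads_match = (
    handler["Handler_read_next"] == rows_read["orders"] + rows_read["lineitem"]
)

# Observation 2: temp-table writes plus updates equal the lineitem rows,
# as if each joined row either created a group or updated an existing one.
tmp_ops = handler["Handler_tmp_write"] + handler["Handler_tmp_update"]
tmp_ops_match = tmp_ops == rows_read["lineitem"]

# Average number of lineitem rows per order that passed o_totalprice > 500000.
lineitems_per_order = rows_read["lineitem"] / rows_read["orders"]

print(next_reads_match, tmp_ops_match, round(lineitems_per_order, 2))
```

Per-table figures like these are what you compare with the `rows` column of EXPLAIN to see whether the plan matched reality.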
Join analysis: PERFORMANCE SCHEMA 
[MySQL 5.6, MariaDB 10.0] 
● summary tables with read/write statistics 
– table_io_waits_summary_by_table 
– table_io_waits_summary_by_index_usage 
● Superset of the userstat tables 
● More overhead 
● Not possible to associate statistics with a query 
=> truncate stats tables before running a query 
● Possible bug 
– performance schema not ignored 
– Disable by 
UPDATE setup_consumers SET ENABLED = 'NO' 
where name = 'global_instrumentation';
Analyze joins via PERFORMANCE SCHEMA: 
SHOW TABLE_STATISTICS analogue 
select object_schema, object_name, count_read, count_write,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_table
where object_schema = 'dbt3' and count_star > 0;
+---------------+-------------+------------+-------------+----------------+-----------------+
| object_schema | object_name | count_read | count_write | sum_timer_read | sum_timer_write | ...
+---------------+-------------+------------+-------------+----------------+-----------------+
| dbt3 | customer | 183 | 0 | 8326528406 | 0 |
| dbt3 | lineitem | 1462 | 0 | 12117332778 | 0 |
| dbt3 | orders | 184 | 0 | 7946312812 | 0 |
+---------------+-------------+------------+-------------+----------------+-----------------+
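The timer columns make it possible to estimate the average cost of one read per table. This is a sketch under two assumptions: each sum_timer_read value belongs to the table listed in the same row position of the output above, and the timer values are in picoseconds, which is the unit Performance Schema uses for its timer columns.

```python
# (count_read, sum_timer_read) per table, copied from the output above.
stats = {
    "customer": (183, 8326528406),
    "lineitem": (1462, 12117332778),
    "orders": (184, 7946312812),
}
PICOSECONDS_PER_MICROSECOND = 1_000_000

# Average time per read operation, in microseconds.
avg_read_us = {
    table: timer / count / PICOSECONDS_PER_MICROSECOND
    for table, (count, timer) in stats.items()
}
for table, micros in avg_read_us.items():
    print(f"{table:9s} {micros:6.1f} us per read")
```

In this run a read from `lineitem` is several times cheaper on average than a read from `customer` or `orders`, even though `lineitem` has the most reads.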
Analyze joins via PERFORMANCE SCHEMA: 
SHOW INDEX_STATISTICS analogue 
select object_schema, object_name, index_name, count_read,
sum_timer_read, sum_timer_write, ...
from table_io_waits_summary_by_index_usage
where object_schema = 'dbt3' and count_star > 0
and index_name is not null;
+---------------+-------------+-----------------------+------------+----------------+-----------------+
| object_schema | object_name | index_name | count_read | sum_timer_read | sum_timer_write | ...
+---------------+-------------+-----------------------+------------+----------------+-----------------+
| dbt3 | customer | PRIMARY | 183 | 8326528406 | 0 |
| dbt3 | lineitem | i_l_orderkey_quantity | 1462 | 12117332778 | 0 |
| dbt3 | orders | i_o_totalprice | 184 | 7946312812 | 0 |
+---------------+-------------+-----------------------+------------+----------------+-----------------+
Batched joins 
● Optimization for analytical queries 
● Analytic queries shovel through lots of data 
– e.g. “average size of order in the last month” 
– or “pairs of goods purchased together” 
● Indexes etc. won't help when you really need to look at all data
● More data means a greater chance of being I/O-bound
● Solution: batched joins
Batched Key Access Idea 
● Non-BKA join hits data at random 
● Caches are not used efficiently 
● Prefetching is not useful
Batched Key Access Idea 
● BKA implementation accesses data in order
● Takes advantage of caches and prefetching
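The idea can be illustrated with a toy model. This is not how MariaDB or MySQL implements BKA; it only counts how often the accessed page changes when lookup keys are processed as they arrive versus in sorted batches. The page size, the key sequence, and the function names are made up for the example.

```python
def page_loads(keys, rows_per_page):
    """Count how often the accessed page changes (models a one-page cache)."""
    loads = 0
    current_page = None
    for key in keys:
        page = key // rows_per_page
        if page != current_page:
            loads += 1
            current_page = page
    return loads

def batched_order(keys, buffer_size):
    """Fill a buffer with lookup keys, sort it, and emit the keys in that order."""
    ordered = []
    for start in range(0, len(keys), buffer_size):
        ordered.extend(sorted(keys[start:start + buffer_size]))
    return ordered

ROWS_PER_PAGE = 10
# 1000 distinct lookup keys in a scattered but reproducible order.
lookup_keys = [(i * 7919) % 1000 for i in range(1000)]

unbatched = page_loads(lookup_keys, ROWS_PER_PAGE)
small_buffer = page_loads(batched_order(lookup_keys, 100), ROWS_PER_PAGE)
full_buffer = page_loads(batched_order(lookup_keys, 1000), ROWS_PER_PAGE)
print(unbatched, small_buffer, full_buffer)
```

In this model a larger key buffer means fewer page loads, which is why the buffer size matters so much for BKA.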
Batched Key Access effect
set join_cache_level=6;
select max(l_extendedprice)
from orders, lineitem
where
l_orderkey=o_orderkey and
o_orderdate between $DATE1 and $DATE2;
The benchmark was run with
● Various BKA buffer sizes
● Various sizes of the $DATE1...$DATE2 range
Batched Key Access Performance
[Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2 and 3, each run as a regular join and with BKA. Annotations: "Performance without BKA" and "Performance with BKA, given sufficient buffer size".]
● 4x-10x speedup
● The more the data, the bigger the speedup
● Buffer size setting is very important.
Batched Key Access settings 
● Needs to be turned on 
set join_buffer_size= 32*1024*1024; 
set join_cache_level=6; -- MariaDB 
set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
set optimizer_switch='mrr=on';
set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
● Further join_buffer_size tuning: watch
– Query performance
– Handler_mrr_init counter
and increase join_buffer_size until either saturates.
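The tuning procedure can be sketched as a simple loop. Everything here is hypothetical: `measure_seconds` stands for "set join_buffer_size, run the query, and time it", the 5% threshold is an arbitrary choice, and the timing curve is invented so that the example runs without a server.

```python
MIB = 1024 * 1024

def tune_join_buffer(measure_seconds, start_bytes, max_bytes, min_gain=0.05):
    """Double the buffer until the measured time improves by less than min_gain."""
    size = start_bytes
    best = measure_seconds(size)
    while size * 2 <= max_bytes:
        candidate = size * 2
        elapsed = measure_seconds(candidate)
        if elapsed > best * (1 - min_gain):
            break                      # saturated: a bigger buffer no longer pays off
        size, best = candidate, elapsed
    return size, best

# Stand-in for a real measurement. Made-up curve: the time halves as the
# buffer doubles, down to a floor of 300 seconds.
def fake_measure(size_bytes):
    return max(300.0, 2400.0 * MIB / size_bytes)

chosen_size, chosen_time = tune_join_buffer(fake_measure, 1 * MIB, 32 * MIB)
print(chosen_size // MIB, chosen_time)
```

The same loop could watch the Handler_mrr_init counter instead of the query time.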
Batched Key Access - conclusions 
● Targeted at big joins 
● Needs to be enabled manually 
● @@join_buffer_size is the most important 
setting 
● MariaDB's implementation is a superset of 
MySQL's.
ORDER BY 
aggregates 
GROUP BY
Aggregate functions, no GROUP BY 
● COUNT, SUM, AVG, etc. need to examine all rows
select SUM(column) from tbl needs to examine the whole tbl.
● MIN and MAX can use an index for lookup
With index (o_orderdate):
select max(o_orderdate) from orders
select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
With index (o_orderpriority, o_orderdate):
select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra |
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
|1 |SIMPLE |NULL |NULL|NULL |NULL|NULL |NULL|NULL|Select tables optimized away|
+--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
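Why MIN and MAX need only a lookup can be shown with a sorted list standing in for an index. This is a sketch with invented data; `bisect` plays the role of the index dive, and the sentinel date is an assumption that holds for this toy data only.

```python
import bisect

# Index on (o_orderdate): entries are kept in sorted order.
idx_orderdate = sorted(
    ["1995-01-10", "1995-05-01", "1995-05-02", "1996-03-15", "1998-08-02"]
)

# select max(o_orderdate) from orders  -> the last index entry
max_date = idx_orderdate[-1]

# select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
# -> the first index entry after the bound, found by one binary search
pos = bisect.bisect_right(idx_orderdate, "1995-05-01")
min_after = idx_orderdate[pos] if pos < len(idx_orderdate) else None

# Index on (o_orderpriority, o_orderdate).
idx_prio_date = sorted([
    ("1-URGENT", "1995-01-10"),
    ("1-URGENT", "1996-03-15"),
    ("2-HIGH", "1998-08-02"),
    ("3-MEDIUM", "1995-05-02"),
])

# select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
# -> the last entry with that prefix; "9999-12-31" is a sentinel above any date
pos = bisect.bisect_right(idx_prio_date, ("1-URGENT", "9999-12-31"))
last = idx_prio_date[pos - 1] if pos > 0 else None
max_urgent = last[1] if last and last[0] == "1-URGENT" else None

print(max_date, min_after, max_urgent)
```

Each answer comes from a single position in the index, which matches the "Select tables optimized away" note in EXPLAIN.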
ORDER BY … LIMIT 
Three algorithms 
● Use an index to read in order 
● Read one table, sort, join - “Using filesort” 
● Execute join into temporary table and then 
sort - “Using temporary; Using filesort” 
Using index to read data in order 
● No special indication in EXPLAIN output
● LIMIT n: as soon as we read n records, we can stop!
A problem with LIMIT N optimization 
`orders` has 1.5 M rows 
explain select * from orders order by o_orderdate desc limit 10; 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra| 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 | | 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----+ 
select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10; 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
|id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
|1 |SIMPLE |orders|index|NULL |i_o_orderdate|4 |NULL|10 |Using where| 
+--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+ 
● A problem: 
– 1.5M rows, 300K of them 'URGENT' 
– Scanning by date, when will we find 10 'URGENT' rows? 
– No good solution so far.
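How many rows the scan reads before it finds 10 'URGENT' rows depends on how those rows are spread over the dates. A small sketch of the two extremes, using the row counts from this slide; the uniform-distribution case is an assumption about the data, not something EXPLAIN can tell you.

```python
from fractions import Fraction

total_rows = 1_500_000
urgent_rows = 300_000
limit = 10
selectivity = Fraction(urgent_rows, total_rows)

# If 'URGENT' rows are spread evenly over the dates, about every fifth row
# matches, so the scan stops after roughly limit / selectivity rows.
expected_if_uniform = limit / selectivity

# If all 'URGENT' rows sit at the far end of the index, every other row
# is read first.
worst_case = (total_rows - urgent_rows) + limit

print(expected_if_uniform, worst_case)
```

The EXPLAIN estimate of 10 rows matches neither case, which is the problem this slide describes.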
Using filesort strategy
● Have to read the entire first table
● For remaining tables, can apply LIMIT n
● ORDER BY can only use columns of tbl1.
Using temporary; Using filesort
● ORDER BY clause can use columns of any table
● LIMIT is applied only after executing the entire join and sorting.
ORDER BY - conclusions 
● Resolving ORDER BY with index allows very 
efficient handling for LIMIT 
– Optimization for 
WHERE unused_condition ORDER BY … LIMIT n 
is challenging. 
● Use sql_big_result, IGNORE INDEX FOR ORDER BY 
● Using filesort 
– Needs all ORDER BY columns in the first table 
– Takes advantage of LIMIT when doing join to non-first tables
● Using temporary; Using filesort is least efficient.
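The three ORDER BY ... LIMIT strategies can be compared on toy data. This is a simplified model with invented tables and row counts, not the server's code; it only shows how much work each strategy does before LIMIT can cut it short.

```python
from collections import defaultdict

# Toy tables: customer(c_custkey, c_acctbal) and orders(o_custkey, o_orderkey).
customer = [(1, 50), (2, 10), (3, 40), (4, 20), (5, 30)]
orders_by_cust = defaultdict(list)
for custkey, orderkey in [(1, 101), (2, 201), (2, 202), (3, 301), (4, 401), (5, 501)]:
    orders_by_cust[custkey].append(orderkey)

def join_until_limit(customers_in_order, limit):
    """Join customers (already in ORDER BY order) to orders, stop at LIMIT."""
    out, customers_read = [], 0
    for custkey, acctbal in customers_in_order:
        customers_read += 1
        for orderkey in orders_by_cust[custkey]:
            out.append((acctbal, custkey, orderkey))
            if len(out) == limit:
                return out, customers_read
    return out, customers_read

LIMIT = 2

# 1. Index on c_acctbal: rows arrive in order, the scan stops early.
index_on_acctbal = sorted(customer, key=lambda row: row[1])   # stands in for the index
by_index, index_rows_read = join_until_limit(index_on_acctbal, LIMIT)

# 2. "Using filesort": read and sort the whole first table, then join with LIMIT.
filesort_rows_read = len(customer)
by_filesort, _ = join_until_limit(sorted(customer, key=lambda row: row[1]), LIMIT)

# 3. "Using temporary; Using filesort": full join first, then sort, then LIMIT.
temp_table = [(bal, ck, ok) for ck, bal in customer for ok in orders_by_cust[ck]]
by_temporary = sorted(temp_table, key=lambda row: row[0])[:LIMIT]

print(by_index, index_rows_read, filesort_rows_read, len(temp_table))
```

All three return the same rows, but they read one customer row, the whole customer table, and the whole join result respectively.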
GROUP BY strategies 
There are three strategies 
● Ordered index scan 
● Loose Index Scan (LooseScan) 
● Groups table 
(Using temporary; [Using filesort]).
Ordered index scan
● Groups are enumerated one after another
● Can compute aggregates on the fly
● Loose index scan is also able to jump to next group.
Execution of GROUP BY with temptable 
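A minimal sketch contrasting the ordered scan with the temporary-table strategy. The data and function names are invented; the write and update counters are meant to mirror what Handler_tmp_write and Handler_tmp_update count, which is an assumption of this example.

```python
from itertools import groupby

rows = [("A", 5), ("B", 1), ("A", 2), ("C", 4), ("B", 3)]   # (group key, quantity)

def group_by_ordered(rows_in_key_order):
    """Ordered index scan: groups arrive one after another, aggregate on the fly."""
    return [
        (key, sum(qty for _, qty in group))
        for key, group in groupby(rows_in_key_order, key=lambda row: row[0])
    ]

def group_by_temptable(rows_in_any_order):
    """Temporary table: look up each row's group, then insert or update it."""
    groups = {}
    writes = updates = 0
    for key, qty in rows_in_any_order:
        if key in groups:
            groups[key] += qty
            updates += 1
        else:
            groups[key] = qty
            writes += 1
    return groups, writes, updates

ordered_result = group_by_ordered(sorted(rows))   # sorted() stands in for the index
temp_result, tmp_writes, tmp_updates = group_by_temptable(rows)
print(ordered_result, temp_result, tmp_writes, tmp_updates)
```

The ordered variant keeps only one group in memory at a time, while the temporary table holds every group until the input is exhausted.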
Subqueries 
Subquery optimizations 
● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries” 
● Queries that caused most of the pain 
– SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins 
– SELECT … FROM (SELECT …) - derived tables 
● MariaDB 5.3 and MySQL 5.6 
– Share a common ancestor, MySQL 6.0 alpha
– Huge (100x, 1000x) speedups for painful areas 
– Other kinds of subqueries received a speedup, too 
– MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations 
● 5.6 handles some un-handled edge cases, too
Tuning for subqueries 
● “Before”: one execution strategy 
– No tuning possible 
● “After”: similar to joins 
– Reasonable execution strategies supported 
– Need indexes 
– Need selective conditions 
– Support batching in most important cases 
● Should be better 9x% of the time.
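To show why having more than one execution strategy matters, here is a toy comparison for `WHERE col IN (SELECT ...)`. It is a simplified model with invented data: the first function re-scans the subquery's rows for every outer row, the second builds a lookup set once, in the spirit of the materialization strategy. Neither is the server's actual code.

```python
outer_values = [1, 2, 3, 4, 5, 6]       # tbl.col
inner_values = [2, 4, 4, 6, 8]          # rows produced by the subquery

def in_subquery_per_row(outer, inner):
    """Re-run the subquery for each outer row, stopping at the first match."""
    matched, examined = [], 0
    for value in outer:
        for candidate in inner:
            examined += 1
            if candidate == value:
                matched.append(value)
                break
    return matched, examined

def in_subquery_materialized(outer, inner):
    """Run the subquery once, keep distinct values, then probe per outer row."""
    lookup = set(inner)
    examined = len(inner) + len(outer)   # one pass over each input
    matched = [value for value in outer if value in lookup]
    return matched, examined

per_row_result, per_row_examined = in_subquery_per_row(outer_values, inner_values)
mat_result, mat_examined = in_subquery_materialized(outer_values, inner_values)
print(per_row_result, per_row_examined, mat_result, mat_examined)
```

The gap grows with table size: the per-row variant is proportional to outer rows times inner rows, the materialized one to their sum.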
What if it still picks a poor query plan? 
For both MariaDB and MySQL: 
● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
● Google “site:kb.askmonty.org $subquery_keyword”
or https://kb.askmonty.org/en/subquery-optimizations-map/ 
● Find which optimization it was 
● set optimizer_switch='$subquery_optimization=off'
Thanks! 
Q & A

 
Explaining the MySQL Explain
Explaining the MySQL ExplainExplaining the MySQL Explain
Explaining the MySQL Explain
 
Covering indexes
Covering indexesCovering indexes
Covering indexes
 
MySQL Optimizer Overview
MySQL Optimizer OverviewMySQL Optimizer Overview
MySQL Optimizer Overview
 
Advanced query optimization
Advanced query optimizationAdvanced query optimization
Advanced query optimization
 

Último

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Communityprachaibot
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Erbil Polytechnic University
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewsandhya757531
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptxmohitesoham12
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONjhunlian
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书rnrncn29
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESkarthi keyan
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming languageSmritiSharma901052
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfNainaShrivastava14
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdfHafizMudaserAhmad
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating SystemRashmi Bhat
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsapna80328
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdfAkritiPradhan2
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptJohnWilliam111370
 

Último (20)

Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Prach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism CommunityPrach: A Feature-Rich Platform Empowering the Autism Community
Prach: A Feature-Rich Platform Empowering the Autism Community
 
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
Comparative study of High-rise Building Using ETABS,SAP200 and SAFE., SAFE an...
 
Artificial Intelligence in Power System overview
Artificial Intelligence in Power System overviewArtificial Intelligence in Power System overview
Artificial Intelligence in Power System overview
 
Python Programming for basic beginners.pptx
Python Programming for basic beginners.pptxPython Programming for basic beginners.pptx
Python Programming for basic beginners.pptx
 
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTIONTHE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
THE SENDAI FRAMEWORK FOR DISASTER RISK REDUCTION
 
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
『澳洲文凭』买麦考瑞大学毕业证书成绩单办理澳洲Macquarie文凭学位证书
 
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTESCME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
CME 397 - SURFACE ENGINEERING - UNIT 1 FULL NOTES
 
OOP concepts -in-Python programming language
OOP concepts -in-Python programming languageOOP concepts -in-Python programming language
OOP concepts -in-Python programming language
 
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdfPaper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
Paper Tube : Shigeru Ban projects and Case Study of Cardboard Cathedral .pdf
 
11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf11. Properties of Liquid Fuels in Energy Engineering.pdf
11. Properties of Liquid Fuels in Energy Engineering.pdf
 
Input Output Management in Operating System
Input Output Management in Operating SystemInput Output Management in Operating System
Input Output Management in Operating System
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
signals in triangulation .. ...Surveying
signals in triangulation .. ...Surveyingsignals in triangulation .. ...Surveying
signals in triangulation .. ...Surveying
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdfDEVICE DRIVERS AND INTERRUPTS  SERVICE MECHANISM.pdf
DEVICE DRIVERS AND INTERRUPTS SERVICE MECHANISM.pdf
 
Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 

Advanced Query Optimizer Tuning and Analysis

  • 1. Advanced query optimizer tuning and analysis Sergei Petrunia Timour Katchaounov Monty Program Ab MySQL Conference And Expo 2013
  • 2. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 3. Is there a problem with the query optimizer? • Database performance is affected by many factors • One of them is the query optimizer • Is my performance problem caused by the optimizer?
  • 4. Signs that there is a query optimizer problem • Some (not all) queries are slow • A query seems to run longer than it ought to – And examines more records than it ought to • Usually, the query remains slow regardless of other activity on the server
  • 5. Catching slow queries, the old ways ● Watch the slow query log – Percona Server/MariaDB: --log_slow_verbosity=query_plan # Thread_id: 1 Schema: dbt3sf10 QC_hit: No # Query_time: 2.452373 Lock_time: 0.000113 Rows_sent: 0 Rows_examined: 1500000 # Full_scan: Yes Full_join: No Tmp_table: No Tmp_table_on_disk: No # Filesort: No Filesort_on_disk: No Merge_passes: 0 SET timestamp=1333385770; select * from customer where c_acctbal < -1000; – Run pt-query-digest on the log ● Run SHOW PROCESSLIST periodically
  • 6. The new way: SHOW PROCESSLIST + SHOW EXPLAIN • Available in MariaDB 10.0+ • Displays EXPLAIN of a running statement MariaDB> show processlist; +--+----+---------+-------+-------+----+------------+-------------------------... |Id|User|Host |db |Command|Time|State |Info +--+----+---------+-------+-------+----+------------+-------------------------... | 1|root|localhost|dbt3sf1|Query | 10|Sending data|select max(o_totalprice) ... | 2|root|localhost|dbt3sf1|Query | 0|init |show processlist +--+----+---------+-------+-------+----+------------+-------------------------... MariaDB> show explain for 1; +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |orders|ALL |NULL |NULL|NULL |NULL|1498194|Using where| +--+-----------+------+----+-------------+----+-------+----+-------+-----------+ MariaDB [dbt3sf1]> show warnings; +-----+----+-----------------------------------------------------------------+ |Level|Code|Message | +-----+----+-----------------------------------------------------------------+ |Note |1003|select max(o_totalprice) from orders where year(o_orderDATE)=1995| +-----+----+-----------------------------------------------------------------+
  • 7. SHOW EXPLAIN usage ● Intended usage – SHOW PROCESSLIST ... – SHOW EXPLAIN FOR ... ● Why not just run EXPLAIN again – Difficult to replicate setups ● Temporary tables ● Optimizer settings ● Storage engine's index statistics ● ... – No uncertainty about whether you're looking at the same query plan or not.
  • 8. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● use performance_schema ● Many ways to analyze via queries – events_statements_summary_by_digest ● count_star, sum_timer_wait, min_timer_wait, avg_timer_wait, max_timer_wait ● digest_text, digest ● sum_rows_examined, sum_created_tmp_disk_tables, sum_select_full_join – events_statements_history ● sql_text, digest_text, digest ● timer_start, timer_end, timer_wait ● rows_examined, created_tmp_disk_tables, select_full_join
  • 9. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] • Modified Q18 from DBT3 select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > ? and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice order by o_totalprice desc, o_orderdate LIMIT 10; • App executes Q18 many times with ? = 550000, 500000, 400000, ...
  • 10. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Find candidate slow queries ● Simple tests: select_full_join > 0, created_tmp_disk_tables > 0, etc. ● Complex conditions: max execution time > X sec OR min/max time vary a lot: select max_timer_wait/avg_timer_wait as max_ratio, avg_timer_wait/min_timer_wait as min_ratio from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2\G
  • 11. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] *************************** 5. row *************************** DIGEST: 3cd7b881cbc0102f65fe8a290ec1bd6b DIGEST_TEXT: SELECT `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` , SUM ( `l_quantity` ) FROM `customer` , `orders` , `lineitem` WHERE `o_totalprice` > ? AND `c_custkey` = `o_custkey` AND `o_orderkey` = `l_orderkey` GROUP BY `c_name` , `c_custkey` , `o_orderkey` , `o_orderdate` , `o_totalprice` ORDER BY `o_totalprice` DESC , `o_orderdate` LIMIT ? COUNT_STAR: 3 SUM_TIMER_WAIT: 3251758347000 MIN_TIMER_WAIT: 3914209000 → 0.0039 sec AVG_TIMER_WAIT: 1083919449000 MAX_TIMER_WAIT: 3204044053000 → 3.2 sec SUM_LOCK_TIME: 555000000 SUM_ROWS_SENT: 25 SUM_ROWS_EXAMINED: 0 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 3 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_RANGE: 3 SUM_SELECT_SCAN: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 25 SUM_SORT_SCAN: 3 SUM_NO_INDEX_USED: 0 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 1970-01-01 03:38:27 LAST_SEEN: 1970-01-01 03:38:43 max_ratio: 2.9560 min_ratio: 276.9192 High variance of execution time
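The ratios in this digest row can be reproduced from the timer columns, which performance_schema reports in picoseconds. The following is a minimal Python sketch that uses only the numbers shown on the slide; the function and variable names are illustrative and not part of any MySQL or MariaDB API.

```python
PICOSECONDS_PER_SECOND = 10**12


def digest_ratios(min_wait, avg_wait, max_wait):
    """Return (max_ratio, min_ratio) using the same definitions as the slide-10 query."""
    return max_wait / avg_wait, avg_wait / min_wait


# Values copied from the digest row on the slide.
sum_wait = 3251758347000
count_star = 3
min_wait = 3914209000
max_wait = 3204044053000

# AVG_TIMER_WAIT is SUM_TIMER_WAIT divided by COUNT_STAR.
avg_wait = sum_wait // count_star

max_ratio, min_ratio = digest_ratios(min_wait, avg_wait, max_wait)

print(f"avg_timer_wait = {avg_wait}")
print(f"min = {min_wait / PICOSECONDS_PER_SECOND:.4f} sec")
print(f"max = {max_wait / PICOSECONDS_PER_SECOND:.4f} sec")
print(f"max_ratio = {max_ratio:.4f}, min_ratio = {min_ratio:.4f}")
```

A min_ratio in the hundreds means the fastest execution was hundreds of times quicker than the average one, which is the "high variance" signal the slide points at.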
  • 12. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] ● Check the actual queries and constants ● The events_statements_history table select timer_wait/1000000000000 as exec_time, sql_text from events_statements_history where digest in (select digest from events_statements_summary_by_digest where max_timer_wait > 1000000000000 or max_timer_wait / avg_timer_wait > 2 or avg_timer_wait / min_timer_wait > 2) order by timer_wait;
  • 13. Catching slow queries (NEW) PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0] +-----------+-----------------------------------------------------------------------------------+ | exec_time | sql_text | +-----------+-----------------------------------------------------------------------------------+ | 0.0039 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 550000 and c_custkey = o_custkey ... LIMIT 10 | | 0.0438 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 500000 and c_custkey = o_custkey ... LIMIT 10 | | 3.2040 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_totalprice > 400000 and c_custkey = o_custkey ... LIMIT 10 | +-----------+-----------------------------------------------------------------------------------+ Observation: orders.o_totalprice > ? is less and less selective
  • 14. Actions after finding the slow query • Bad query plan – Rewrite the query – Force a good query plan • Bad optimizer settings – Do tuning • Query is inherently complex – Don't waste time with it – Look for other solutions.
  • 15. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 16. Consider a simple select: select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' ● Run the query: 19 rows in set (7.65 sec) ● Check the query plan: +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ • 15M rows were scanned, 19 rows in output • Query plan seems inefficient – (note: this logic doesn't directly apply to group/order by queries).
  • 17. Query plan analysis: select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ | 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 15084733 | Using where | +----+-------------+--------+------+---------------+------+---------+------+----------+-------------+ • Entire table is scanned • WHERE condition checked after records are read – Not used to limit #examined rows.
  • 18. Let's add an index: alter table orders add key i_o_orderdate (o_orderdate); select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ ● Query time: 19 rows in set (0.76 sec) • Outcome – Down to reading 300K rows – Still, 300K >> 19 rows.
  • 19. Finding out which indexes to add: select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' and o_clerk='Clerk#000009506' Check selectivity of conditions that will use the index ● index (o_orderdate) select count(*) from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06'; 306322 rows ● index (o_clerk) select count(*) from orders where o_clerk='Clerk#000009506'; 1507 rows.
  • 20. Try adding composite indexes ● index (o_clerk, o_orderdate) +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows|Extra | +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ |1 |SIMPLE |orders|range|i_o_clerk_...|i_o_clerk_date|20 |NULL|19 |Using where| +--+-----------+------+-----+-------------+--------------+-------+----+----+-----------+ Bingo! 100% efficiency ● index (o_orderdate, o_clerk) +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_date_c...|i_o_date_clerk|20 |NULL|360354|Using where| +--+-----------+------+-----+-------------+--------------+-------+----+------+-----------+ Much worse! • If a condition uses multiple columns, a composite index will be most efficient • Order of columns matters – Explaining why is outside the scope of this tutorial; it was covered in last year's tutorial.
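One way to see why the column order matters is to model a composite index as a sorted list of key tuples and a range scan as one contiguous slice of that list. The Python sketch below does this on synthetic data. The table contents, the clerk names and the `range_scan_size` helper are assumptions made for illustration, and the model deliberately ignores real-engine features such as Index Condition Pushdown.

```python
import bisect
import random


def range_scan_size(index, low, high):
    """Count entries of the sorted list `index` that satisfy low <= entry <= high."""
    return bisect.bisect_right(index, high) - bisect.bisect_left(index, low)


# Synthetic (clerk, day) rows; not taken from the DBT3 data set.
random.seed(0)
clerks = [f"Clerk#{i:03d}" for i in range(100)]
rows = [(random.choice(clerks), random.randrange(365)) for _ in range(100_000)]

target_clerk, day_lo, day_hi = "Clerk#042", 150, 180
matching = sum(1 for c, d in rows if c == target_clerk and day_lo <= d <= day_hi)

# index (clerk, day): equality on the first key part, range on the second.
# The scanned interval contains exactly the matching rows.
idx_clerk_day = sorted(rows)
scanned_clerk_day = range_scan_size(
    idx_clerk_day, (target_clerk, day_lo), (target_clerk, day_hi)
)

# index (day, clerk): the range on the first key part fixes the interval.
# The clerk value only trims the two endpoints, so most days are read in full.
idx_day_clerk = sorted((d, c) for c, d in rows)
scanned_day_clerk = range_scan_size(
    idx_day_clerk, (day_lo, target_clerk), (day_hi, target_clerk)
)

print(f"matching rows:              {matching}")
print(f"scanned with (clerk, day):  {scanned_clerk_day}")
print(f"scanned with (day, clerk):  {scanned_day_clerk}")
```

In this model the equality column placed first narrows the interval to the rows that match, while the range column placed first forces the scan through every clerk on almost every day in the range.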
  • 21. Conditions must be in SARGable form • Condition must represent a range • It must have a form that is recognized by the optimizer – o_orderDate BETWEEN '1992-06-01' and '1992-06-30' – year(o_orderDate)=1992 and month(o_orderdate)=6 – TO_DAYS(o_orderDATE) between TO_DAYS('1992-06-06') and TO_DAYS('1992-07-06') – o_clerk='Clerk#000009506' – o_clerk LIKE 'Clerk#000009506' – o_clerk LIKE '%Clerk#000009506%' – column IN (1,10,15,21, ...) – (col1, col2) IN ( (1,1), (2,2), (3,3), …).
  • 22. New in MySQL-5.6: optimizer_trace ● Lets you see the ranges set optimizer_trace=1; explain select * from orders where o_orderDATE between '1992-06-01' and '1992-07-03' and o_orderdate not in ('1992-01-01', '1992-06-12','1992-07-04'); select * from information_schema.optimizer_trace\G ● Will print a big JSON struct ● Search for range_scan_alternatives.
  • 23. New in MySQL-5.6: optimizer_trace ... "range_scan_alternatives": [ { "index": "i_o_orderdate", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 319082, "cost": 382900, "chosen": true }, { "index": "i_o_date_clerk", "ranges": [ "1992-06-01 <= o_orderDATE < 1992-06-12", "1992-06-12 < o_orderDATE <= 1992-07-03" ], "index_dives_for_eq_ranges": true, "rowid_ordered": false, "using_mrr": false, "index_only": false, "rows": 406336, "cost": 487605, "chosen": false, "cause": "cost" } ], ... ● Considered ranges are shown in range_scan_alternatives section ● This is actually original use case of optimizer_trace ● Alas, recent mysql-5.6 displays misleading info about ranges on multi-component keys (will file a bug) ● Still, very useful.
  • 24. Source of #rows estimates for range: select * from orders where o_orderDate BETWEEN '1992-06-06' and '1992-07-06' +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |id|select_type|table |type |possible_keys|key |key_len|ref |rows |Extra | +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ |1 |SIMPLE |orders|range|i_o_orderdate|i_o_orderdate|4 |NULL|306322|Using where| +--+-----------+------+-----+-------------+-------------+-------+----+------+-----------+ • “records_in_range” estimate • Done by diving into index • Usually is fairly accurate • Not affected by ANALYZE TABLE.
  • 25. Simple selects: conclusions • Efficiency == “#rows_scanned is close to #rows_returned” • Indexes and WHERE conditions reduce #rows scanned • Index estimates are usually accurate • Multi-column indexes – “handle” conditions on multiple columns – Order of columns in the index matters • optimizer_trace lets you view the ranges – But misrepresents ranges over multi-column indexes.
  • 26. Now, we will skip some topics. One can also speed up simple selects with ● index_merge access method ● index access method ● Index Condition Pushdown We don't have time for these now; check out last year's tutorial.
  • 27. ● Introduction – What is an optimizer problem – How to catch it ● old and new tools ● Single-table selects – brief recap from 2012 ● JOINs – ref access ● index statistics – join condition pushdown – join plan efficiency – query plan vs reality ● Big I/O bound JOINs – Batched Key Access ● Aggregate functions ● ORDER BY ... LIMIT ● GROUP BY ● Subqueries
  • 28. A simple join • “Customers with their orders” select * from customer, orders where c_custkey=o_custkey
  • 29. Execution: Nested Loops join select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • Complexity: – Scans table customer – For each record in customer, scans table orders • Is this ok?
  • 30. Execution: Nested loops join (2) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+
  • 31. Execution: Nested loops join (3) select * from customer, orders where c_custkey=o_custkey for each customer C { for each order O { if (C.c_custkey == O.o_custkey) produce record(C, O); } } • EXPLAIN: +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • First row, rows column: rows to read from customer • Second row, rows column: rows to read from orders • “Using where” on orders: the check c_custkey=o_custkey
  • 32. Execution: Nested loops join (4) select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra | +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ |1 |SIMPLE |customer|ALL |NULL |NULL|NULL |NULL|148749 | | |1 |SIMPLE |orders |ALL |NULL |NULL|NULL |NULL|1493631|Using where| +--+-----------+--------+----+-------------+----+-------+----+-------+-----------+ • Scan a 1,493,631-row table 148,749 times – Consider 1,493,631 * 148,749 row combinations • Is this query inherently complex? – We know each customer has their own orders – size(customer x orders)= size(orders) – Lower bound is 1,493,631 + 148,749 + costs to match customer<->order.
  • 33. Using index for join: ref access alter table orders add index i_o_custkey(o_custkey); select * from customer, orders where c_custkey=o_custkey
  • 34. ref access - analysis select * from customer, orders where c_custkey=o_custkey +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |id|select_type|table |type|possible_keys|key |key_len|ref |rows |Extra| +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ |1 |SIMPLE |customer|ALL |PRIMARY |NULL |NULL |NULL |148749| | |1 |SIMPLE |orders |ref |i_o_custkey |i_o_custkey|5 |customer.c_custkey|7 | | +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----+ ● One ref lookup scans 7 rows. ● In total: 7 * 148,749=1,041,243 rows – `orders` has 1.4M rows – no redundant reads from `orders` ● The whole query plan – Reads all customers – Reads 1M orders (of 1.4M) ● Efficient!
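The difference between the plain nested loops join and the join using ref access can be sketched in Python by counting how many `orders` rows each strategy reads. This is only an illustration: the data is synthetic, the function names are made up, and a Python dict stands in for the i_o_custkey index (a real B-tree index behaves differently in detail, but the lookup pattern is the same).

```python
from collections import defaultdict


def nested_loop_join(customers, orders):
    """Join without an index: every order is read once per customer."""
    rows_read = 0
    result = []
    for c_custkey in customers:
        for o_orderkey, o_custkey in orders:
            rows_read += 1
            if c_custkey == o_custkey:
                result.append((c_custkey, o_orderkey))
    return result, rows_read


def ref_access_join(customers, orders):
    """Join through a lookup structure on o_custkey (stand-in for i_o_custkey)."""
    index = defaultdict(list)
    for o_orderkey, o_custkey in orders:
        index[o_custkey].append(o_orderkey)

    rows_read = 0
    result = []
    for c_custkey in customers:
        # One "ref lookup": only the orders of this customer are read.
        for o_orderkey in index.get(c_custkey, ()):
            rows_read += 1
            result.append((c_custkey, o_orderkey))
    return result, rows_read


# Synthetic data (an assumption for illustration): 100 customers, 7 orders each.
customers = list(range(100))
orders = [(order_id, order_id % 100) for order_id in range(700)]

nl_result, nl_reads = nested_loop_join(customers, orders)
ref_result, ref_reads = ref_access_join(customers, orders)

print(f"nested loops: {nl_reads} order rows read")
print(f"ref access:   {ref_reads} order rows read")
```

Both functions return the same joined rows; only the number of `orders` rows touched differs, which mirrors the move from size(customer) * size(orders) combinations to roughly one read per matching order.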
  • 35. Conditions that can be used for ref access ● Can use equalities – tbl.key=other_table.col – tbl.key=const – tbl.key IS NULL ● For multipart keys, will use largest prefix – keypart1=... AND keypart2= … AND keypartK=...
  • 36. Conditions that can't be used for ref access ● Doesn't work for non-equalities t1.key BETWEEN t2.col1 AND t2.col2 ● Doesn't work for OR-ed equalities t1.key=t2.col1 OR t1.key=t2.col2 – Except for ref_or_null t1.key=... OR t1.key IS NULL ● Doesn't “combine” ref and range access – t.keypart1 BETWEEN c1 AND c2 AND t.keypart2=t2.col – t.keypart2 BETWEEN c1 AND c2 AND t.keypart1=t2.col
  • 37. Is ref always efficient? ● Efficient, if column has many different values (good) – Best case – unique index (eq_ref) ● A few different values – not useful (bad) ● Skewed distribution: depends on which part the join touches (depends)
  • 38. ref access estimates - index statistics • How many rows will match tbl.key_column = $value for an arbitrary $value? • Index statistics show keys from orders where key_name='i_o_custkey' *************************** 1. row *************** Table: orders Non_unique: 1 Key_name: i_o_custkey Seq_in_index: 1 Column_name: o_custkey Collation: A Cardinality: 214462 Sub_part: NULL Packed: NULL Null: YES Index_type: BTREE show table status like 'orders' *************************** 1. row **** Name: orders Engine: InnoDB Version: 10 Row_format: Compact Rows: 1495152 Avg_row_length: 133 Data_length: 199966720 Max_data_length: 0 Index_length: 122421248 Data_free: 6291456 ... average = Rows / Cardinality = 1495152 / 214462 = 6.97.
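The rows-per-key arithmetic on this slide can be written out as a short Python sketch. The first part uses only the Rows and Cardinality values shown above. The second part uses a made-up key distribution (an assumption, not data from the slides) to show why a table-wide average can mislead on skewed data, as noted on slide 37.

```python
def avg_rows_per_key(table_rows, cardinality):
    """Table-wide average number of rows per distinct key value."""
    return table_rows / cardinality


# Values from SHOW TABLE STATUS (Rows) and SHOW KEYS (Cardinality) on the slide.
estimate = avg_rows_per_key(1495152, 214462)
print(f"estimated rows per ref lookup: {estimate:.2f}")

# Hypothetical skewed distribution: 10 distinct keys, 70 rows in total.
# Nine keys have one row each, one key has 61 rows.
key_counts = [1] * 9 + [61]
average = avg_rows_per_key(sum(key_counts), len(key_counts))
print(f"average rows per key: {average:.1f}")
print(f"rows for the most frequent key: {max(key_counts)}")
```

The average for the skewed example is the same 7 rows per key that EXPLAIN showed for the ref lookup on slide 34, yet a lookup can hit either 1 row or 61 rows depending on which key value the join touches.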
  • 39. ref access – conclusions
    ● Based on t.key=... equality conditions
    ● Can make joins very efficient
    ● Relies on index statistics for estimates
  • 40. Optimizer statistics
    ● MySQL/Percona Server
      – Index statistics
      – Persistent/transient InnoDB stats
    ● MariaDB
      – Index statistics, persistent/transient
        ● Same as Percona Server (via XtraDB)
      – Persistent, engine-independent, index-independent statistics
  • 41. Index statistics
    ● Cardinality allows calculating a table-wide average #rows-per-key-prefix
    ● It is a statistical value (inexact)
    ● Exact collection procedure depends on the storage engine
      – InnoDB – random sampling
      – MyISAM – index scan
      – Engine-independent – index scan
  • 42. Index statistics in MySQL 5.6
    ● Sample [8] random index leaf pages
    ● Table statistics (stored)
      – rows - estimated number of rows in a table
      – Other stats not used by optimizer
    ● Index statistics (stored)
      – fields - #fields in the index
      – rows_per_key - rows per 1 key value, per prefix fields
        ([1 column value], [2 columns value], [3 columns value], …)
      – Other stats not used by optimizer
  • 43. Index statistics updates
    ● Statistics are updated when:
      – ANALYZE TABLE tbl_name [, tbl_name] … is run
      – SHOW TABLE STATUS or SHOW INDEX is run
      – INFORMATION_SCHEMA.[TABLES|STATISTICS] is accessed
      – A table is opened for the first time (after server restart)
      – A table has changed >10%
      – InnoDB Monitor is turned ON
  • 44. Displaying optimizer statistics
    ● MySQL 5.5, MariaDB 5.3, and older
      – Issue SQL statements to count rows/keys
      – Indirectly, look at EXPLAIN for simple queries
    ● MariaDB 5.5, Percona Server 5.5 (using XtraDB)
      – information_schema.[innodb_index_stats, innodb_table_stats]
      – Read-only, always visible
    ● MySQL 5.6
      – mysql.[innodb_index_stats, innodb_table_stats]
      – User updateable
      – Only available if innodb_analyze_is_persistent=ON
    ● MariaDB 10.0
      – Persistent updateable tables mysql.[index_stats, column_stats, table_stats]
      – User updateable
      – + current XtraDB mechanisms
  • 45. Plan [in]stability
    ● Statistics may vary a lot (orders)
    MariaDB [dbt3]> select * from information_schema.innodb_index_stats;
    +------------+-----------------+--------------+    +---------------+
    | table_name | index_name      | rows_per_key |    | rows_per_key  | error (actual)
    +------------+-----------------+--------------+    +---------------+
    | partsupp   | PRIMARY         | 3, 1         |    | 4, 1          | 25%
    | partsupp   | i_ps_partkey    | 3, 0         | => | 4, 1          | 25% (4)
    | partsupp   | i_ps_suppkey    | 64, 0        |    | 91, 1         | 30% (80)
    | orders     | i_o_orderdate   | 9597, 1      |    | 1660956, 0    | 99% (6234)
    | orders     | i_o_custkey     | 15, 1        |    | 15, 0         | 0% (15)
    | lineitem   | i_l_receiptdate | 7425, 1, 1   |    | 6665850, 1, 1 | 99.9% (23477)
    +------------+-----------------+--------------+    +---------------+
    MariaDB [dbt3]> select * from information_schema.innodb_table_stats;
    +-----------------+----------+     +----------+
    | table_name      | rows     |     | rows     |
    +-----------------+----------+     +----------+
    | partsupp        | 6524766  |     | 9101065  | 28% (8000000)
    | orders          | 15039855 | ==> | 14948612 | 0.6% (15000000)
    | lineitem        | 60062904 |     | 59992655 | 0.1% (59986052)
    +-----------------+----------+     +----------+
  • 46. Controlling statistics (MySQL 5.6)
    ● Persistent and user-updateable InnoDB statistics
      – innodb_analyze_is_persistent = ON,
      – updated manually by ANALYZE TABLE or
      – automatically by innodb_stats_auto_recalc = ON
    ● Control the precision of sampling [default 8]
      – innodb_stats_persistent_sample_pages,
      – innodb_stats_transient_sample_pages
    ● No new statistics compared to older versions
  • 47. Controlling statistics (MariaDB 10.0)
    Current XtraDB index statistics +
    ● Engine-independent, persistent, user-updateable statistics
    ● Precise
    ● Additional statistics per column (even when there is no index):
      – min_value, max_value: minimum/maximum value per column
      – nulls_ratio: fraction of null values in a column
      – avg_length: average size of values in a column
      – avg_frequency: average number of rows with the same value
  • 48. Join condition pushdown
  • 52. Join condition pushdown
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    ● Conjunctive (ANDed) conditions are split into parts
    ● Each part is attached as early as possible
      – Either as “Using where”
      – Or as table access method
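The "attach as early as possible" rule can be illustrated with a toy model: each AND-ed part is attached to the first table in the join order at which every table it mentions has already been read. This is a simplified sketch, not the server's implementation; `attach_conditions` and the hand-written lists of referenced tables are hypothetical.

```python
def attach_conditions(join_order, conditions):
    """Attach each AND-ed condition to the earliest possible table.

    join_order: table names in the order the join reads them.
    conditions: list of (condition_text, set_of_tables_it_references).
    Returns {table: [condition_text, ...]}.
    """
    attached = {table: [] for table in join_order}
    for text, referenced in conditions:
        tables_read = set()
        for table in join_order:
            tables_read.add(table)
            if referenced <= tables_read:  # everything it needs is available
                attached[table].append(text)
                break
    return attached


conditions = [
    ("c_custkey=o_custkey", {"customer", "orders"}),
    ("c_acctbal < -500", {"customer"}),
    ("o_orderpriority='1-URGENT'", {"orders"}),
]
plan = attach_conditions(["customer", "orders"], conditions)
for table, parts in plan.items():
    print(table, parts)
# customer ['c_acctbal < -500']
# orders ['c_custkey=o_custkey', "o_orderpriority='1-URGENT'"]
```

With the join order reversed, the same function attaches the join equality to `customer` instead, which matches the plan where `customer` is read second via eq_ref.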
  • 53. Observing join condition pushdown
    ● Before mysql-5.6: EXPLAIN shows only “Using where”
      – The condition itself is only visible in the debug trace
    ● Starting from 5.6: EXPLAIN FORMAT=JSON shows attached conditions
    EXPLAIN: {
      "query_block": {
        "select_id": 1,
        "nested_loop": [
          {
            "table": {
              "table_name": "orders",
              "access_type": "ALL",
              "possible_keys": [ "i_o_custkey" ],
              "rows": 1499715,
              "filtered": 100,
              "attached_condition": "((`dbt3sf1`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf1`.`orders`.`o_custkey` is not null))"
            }
          },
          {
            "table": {
              "table_name": "customer",
              "access_type": "eq_ref",
              "possible_keys": [ "PRIMARY" ],
              "key": "PRIMARY",
              "used_key_parts": [ "c_custkey" ],
              "key_length": "4",
              "ref": [ "dbt3sf1.orders.o_custkey" ],
              "rows": 1,
              "filtered": 100,
              "attached_condition": "(`dbt3sf1`.`customer`.`c_acctbal` < <cache>(-(500)))"
            }
  • 54. Reasoning about join plan efficiency
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    First table, “customer”
    ● type=ALL, 150 K rows
    ● select count(*) from customer where c_acctbal < -500 gives 6804.
    ● alter table customer add index (c_acctbal)
  • 55. Reasoning about join plan efficiency
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    First table, “customer”
    ● type=ALL, 150 K rows
    ● select count(*) from customer where c_acctbal < -500 gives 6804.
    ● alter table customer add index (c_acctbal)
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Now, access to 'customer' is efficient.
  • 56. Reasoning about join plan efficiency
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'?
  • 57. o_orderpriority='1-URGENT'
    ● select count(*) from orders: 1.5M rows
    ● select count(*) from orders where o_orderpriority='1-URGENT': 300K rows
    ● Selectivity: 300K / 1.5M = 0.2
  • 58. Reasoning about join plan efficiency
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Second table, “orders”
    ● Attached condition: c_custkey=o_custkey and o_orderpriority='1-URGENT'
    ● ref access uses only c_custkey=o_custkey
    ● What about o_orderpriority='1-URGENT'? Selectivity = 0.2
      – Can examine 7*0.2=1.4 rows, 6802 times if we add an index:
        alter table orders add index (o_custkey, o_orderpriority)
        or
        alter table orders add index (o_orderpriority, o_custkey)
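The effect of the two-column index can be checked with the numbers from the slides. The sketch below assumes that order priority is independent of the customer key, so the 0.2 selectivity applies evenly to every customer's orders; that independence is an assumption of this example, not something the slides state.

```python
total_orders = 1_500_000     # select count(*) from orders
urgent_orders = 300_000      # ... where o_orderpriority='1-URGENT'
selectivity = urgent_orders / total_orders
print(selectivity)  # 0.2

rows_per_lookup = 7          # "rows" for the ref access on i_o_custkey
outer_rows = 6802            # customer rows after the c_acctbal range scan

# Index on (o_custkey) only: every order of the customer is read, then filtered.
print(outer_rows * rows_per_lookup)  # 47614

# Index on (o_custkey, o_orderpriority): only the matching orders are read.
print(round(rows_per_lookup * selectivity, 1))  # 1.4
print(round(outer_rows * rows_per_lookup * selectivity))  # 9523
```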
  • 59. Reasoning about join plan efficiency - summary
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |id|select_type|table   |type |possible_keys|key        |key_len|ref               |rows|Extra                |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    |1 |SIMPLE     |customer|range|PRIMARY,c_...|c_acctbal  |9      |NULL              |6802|Using index condition|
    |1 |SIMPLE     |orders  |ref  |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7   |Using where          |
    +--+-----------+--------+-----+-------------+-----------+-------+------------------+----+---------------------+
    Basic* approach to evaluation of join plan efficiency:
      for each table $T in the join order {
        Look at conditions attached to table $T
          (condition must use table $T, may also use previous tables)
        Does the access method used with $T make good use of the attached conditions?
      }
    * some other details may also affect join performance
  • 60. Attached conditions
  • 61. Attached conditions
    ● Ideally, should be used for table access
    ● Not all conditions can be used [at the same time]
      – Unused ones are still useful
      – They reduce number of scans for subsequent tables
    select * from customer, orders
    where c_custkey=o_custkey and c_acctbal < -500 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |id|select_type|table   |type|possible_keys|key        |key_len|ref               |rows  |Extra      |
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY      |NULL       |NULL   |NULL              |150081|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey  |i_o_custkey|5      |customer.c_custkey|7     |Using where|
    +--+-----------+--------+----+-------------+-----------+-------+------------------+------+-----------+
  • 62. Informing optimizer about attached conditions
    Currently: a range access that's too expensive to use
    explain extended select * from customer, orders
    where c_custkey=o_custkey and c_acctbal > 8000 and o_orderpriority='1-URGENT';
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |id|select_type|table   |type|possible_keys    |key        |key_len|ref               |rows  |filtered|Extra      |
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    |1 |SIMPLE     |customer|ALL |PRIMARY,c_acctbal|NULL       |NULL   |NULL              |150081|   36.22|Using where|
    |1 |SIMPLE     |orders  |ref |i_o_custkey      |i_o_custkey|5      |customer.c_custkey|7     |  100.00|Using where|
    +--+-----------+--------+----+-----------------+-----------+-------+------------------+------+--------+-----------+
    ● `orders` will be scanned 150081 * 36.22% = 54359 times
    ● This reduces the cost of the join
      – Has an effect when comparing potential join plans
    ● => The index on c_acctbal is not used for access, but it still helps the optimizer estimate selectivity
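The number of ref lookups follows directly from the `rows` and `filtered` columns of the first table. A small sketch of that arithmetic; the helper name `estimated_lookups` is hypothetical, and rounding to the nearest integer is an assumption made here to match the figure on the slide.

```python
def estimated_lookups(outer_rows, filtered_percent):
    """Rows of the outer table expected to survive its attached condition."""
    return round(outer_rows * filtered_percent / 100)


lookups = estimated_lookups(150081, 36.22)
print(lookups)  # 54359

# Each surviving customer row triggers one ref lookup reading about 7 rows.
rows_read_from_orders = lookups * 7
print(rows_read_from_orders)  # 380513
```

Without the `filtered` estimate the optimizer would have to assume 150081 lookups, almost three times as many.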
  • 63. Attached condition selectivity
    ● Unused indexes provide info about selectivity
      – Works, but very expensive
    ● MariaDB 10.0 has engine-independent statistics
      – Index statistics
      – Non-indexed column statistics
        ● Histograms
      – Further info: Tomorrow, 2:20 pm @ Ballroom D, Igor Babaev,
        “Engine-independent persistent statistics with histograms in MariaDB”
  • 64. How to check if the query plan matches the reality
  • 65. Check if query plan is realistic
    ● EXPLAIN shows what the optimizer expects. It may be wrong
      – Out-of-date index statistics
      – Non-uniform data distribution
    ● Other DBMS: EXPLAIN ANALYZE
    ● MySQL: no equivalent. Instead, have
      – Handler counters
      – “User statistics” (Percona, MariaDB)
      – PERFORMANCE_SCHEMA
  • 66. Join analysis: example query (Q18, DBT3)
    <reset counters>
    select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity)
    from customer, orders, lineitem
    where o_totalprice > 500000
      and c_custkey = o_custkey
      and o_orderkey = l_orderkey
    group by c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice
    order by o_totalprice desc, o_orderdate
    LIMIT 10;
    <collect statistics>
  • 67. Join analysis: handler counters (old)
    FLUSH STATUS;
    => RUN QUERY
    SHOW STATUS LIKE "Handler%";
    +----------------------------+-------+
    | Handler_mrr_key_refills    | 0     |
    | Handler_mrr_rowid_refills  | 0     |
    | Handler_read_first         | 0     |
    | Handler_read_key           | 1646  |
    | Handler_read_last          | 0     |
    | Handler_read_next          | 1462  |
    | Handler_read_prev          | 0     |
    | Handler_read_rnd           | 10    |
    | Handler_read_rnd_deleted   | 0     |
    | Handler_read_rnd_next      | 184   |
    | Handler_tmp_update         | 1096  |
    | Handler_tmp_write          | 183   |
    | Handler_update             | 0     |
    | Handler_write              | 0     |
  • 68. Join analysis: USERSTAT by Facebook
    MariaDB, Percona Server
    SET GLOBAL USERSTAT=1;
    FLUSH TABLE_STATISTICS;
    FLUSH INDEX_STATISTICS;
    => RUN QUERY
    SHOW TABLE_STATISTICS;
    +--------------+------------+-----------+--------------+-------------------------+
    | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
    +--------------+------------+-----------+--------------+-------------------------+
    | dbt3         | orders     |       183 |            0 |                       0 |
    | dbt3         | lineitem   |      1279 |            0 |                       0 |
    | dbt3         | customer   |       183 |            0 |                       0 |
    +--------------+------------+-----------+--------------+-------------------------+
    SHOW INDEX_STATISTICS;
    +--------------+------------+-----------------------+-----------+
    | Table_schema | Table_name | Index_name            | Rows_read |
    +--------------+------------+-----------------------+-----------+
    | dbt3         | customer   | PRIMARY               |       183 |
    | dbt3         | lineitem   | i_l_orderkey_quantity |      1279 |
    | dbt3         | orders     | i_o_totalprice        |       183 |
    +--------------+------------+-----------------------+-----------+
  • 69. Join analysis: PERFORMANCE SCHEMA [MySQL 5.6, MariaDB 10.0]
    ● Summary tables with read/write statistics
      – table_io_waits_summary_by_table
      – table_io_waits_summary_by_index_usage
    ● Superset of the userstat tables
    ● More overhead
    ● Not possible to associate statistics with a query
      => truncate stats tables before running a query
    ● Possible bug
      – performance schema not ignored
      – Disable by
        UPDATE setup_consumers SET ENABLED = 'NO' where name = 'global_instrumentation';
  • 70. Analyze joins via PERFORMANCE SCHEMA: SHOW TABLE_STATISTICS analogue
    select object_schema, object_name, count_read, count_write,
           sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_table
    where object_schema = 'dbt3' and count_star > 0;
    +---------------+-------------+------------+-------------+
    | object_schema | object_name | count_read | count_write |
    +---------------+-------------+------------+-------------+
    | dbt3          | customer    |        183 |           0 |
    | dbt3          | lineitem    |       1462 |           0 |
    | dbt3          | orders      |        184 |           0 |
    +---------------+-------------+------------+-------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 71. Analyze joins via PERFORMANCE SCHEMA: SHOW INDEX_STATISTICS analogue
    select object_schema, object_name, index_name, count_read,
           sum_timer_read, sum_timer_write, ...
    from table_io_waits_summary_by_index_usage
    where object_schema = 'dbt3' and count_star > 0 and index_name is not null;
    +---------------+-------------+-----------------------+------------+
    | object_schema | object_name | index_name            | count_read |
    +---------------+-------------+-----------------------+------------+
    | dbt3          | customer    | PRIMARY               |        183 |
    | dbt3          | lineitem    | i_l_orderkey_quantity |       1462 |
    | dbt3          | orders      | i_o_totalprice        |        184 |
    +---------------+-------------+-----------------------+------------+
    +----------------+-----------------+
    | sum_timer_read | sum_timer_write | ...
    +----------------+-----------------+
    |     8326528406 |               0 |
    |    12117332778 |               0 |
    |     7946312812 |               0 |
    +----------------+-----------------+
  • 73. Batched joins
    ● Optimization for analytical queries
    ● Analytic queries shovel through lots of data
      – e.g. “average size of order in the last month”
      – or “pairs of goods purchased together”
    ● Indexes, etc. won't help when you really need to look at all the data
    ● More data means a greater chance of being I/O-bound
    ● Solution: batched joins
  • 74. Batched Key Access Idea
  • 80. Batched Key Access Idea
    ● Non-BKA join hits data at random
    ● Caches are not used efficiently
    ● Prefetching is not useful
  • 81. Batched Key Access Idea
    ● BKA implementation accesses data in order
    ● Takes advantage of caches and prefetching
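The difference between the two access patterns can be shown with a toy model: collect lookup keys into a buffer, sort the buffer, and only then probe the second table. This is a rough sketch of the idea only, not how the server implements BKA; the function names and the "backward jump" metric (used here as a stand-in for random I/O) are assumptions of this example.

```python
def regular_lookup_order(outer_keys):
    """Regular nested-loop join: lookups happen in the order rows arrive."""
    return list(outer_keys)


def bka_lookup_order(outer_keys, buffer_size):
    """BKA-style join: fill a buffer with keys, sort it, look up in key order."""
    order = []
    for start in range(0, len(outer_keys), buffer_size):
        batch = outer_keys[start:start + buffer_size]
        order.extend(sorted(batch))
    return order


def backward_jumps(lookup_order):
    """Count lookups that move backwards in the table (toy proxy for random I/O)."""
    return sum(1 for prev, cur in zip(lookup_order, lookup_order[1:]) if cur < prev)


keys = [42, 7, 93, 15, 8, 61, 3, 77]
print(backward_jumps(regular_lookup_order(keys)))          # 4
print(backward_jumps(bka_lookup_order(keys, buffer_size=4)))  # 1
print(backward_jumps(bka_lookup_order(keys, buffer_size=8)))  # 0
```

A larger buffer means longer sorted runs and fewer jumps, which is the intuition behind the buffer size being the key setting.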
  • 82. Batched Key Access effect
    set join_cache_level=6;
    select max(l_extendedprice)
    from orders, lineitem
    where l_orderkey=o_orderkey and o_orderdate between $DATE1 and $DATE2
    The benchmark was run with
    ● Various BKA buffer sizes
    ● Various sizes of the $DATE1...$DATE2 range
  • 83. Batched Key Access Performance
    [Chart: BKA join performance depending on buffer size. X axis: buffer size, bytes. Y axis: query time, sec. Series: query_size=1, 2, 3, each as regular and BKA. Annotations: performance without BKA; performance with BKA, given sufficient buffer size.]
    ● 4x-10x speedup
    ● The more the data, the bigger the speedup
    ● Buffer size setting is very important
  • 84. Batched Key Access settings
    ● Needs to be turned on
      set join_buffer_size= 32*1024*1024;
      set join_cache_level=6; -- MariaDB
      set optimizer_switch='batched_key_access=on'; -- MySQL 5.6
      set optimizer_switch='mrr=on';
      set optimizer_switch='mrr_sort_keys=on'; -- MariaDB only
    ● Tune join_buffer_size further by watching
      – Query performance
      – Handler_mrr_init counter
      and increasing join_buffer_size until either saturates
  • 85. Batched Key Access - conclusions
    ● Targeted at big joins
    ● Needs to be enabled manually
    ● @@join_buffer_size is the most important setting
    ● MariaDB's implementation is a superset of MySQL's
  • 87. ORDER BY, aggregates, GROUP BY
  • 88. Aggregate functions, no GROUP BY
    ● COUNT, SUM, AVG, etc. need to examine all rows
        select SUM(column) from tbl
      needs to examine the whole tbl.
    ● MIN and MAX can use an index for lookup
      – index (o_orderdate)
          select max(o_orderdate) from orders
          select min(o_orderdate) from orders where o_orderdate > '1995-05-01'
      – index (o_orderpriority, o_orderdate)
          select max(o_orderdate) from orders where o_orderpriority='1-URGENT'
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |id|select_type|table|type|possible_keys|key |key_len|ref |rows|Extra                       |
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
    |1 |SIMPLE     |NULL |NULL|NULL         |NULL|NULL   |NULL|NULL|Select tables optimized away|
    +--+-----------+-----+----+-------------+----+-------+----+----+----------------------------+
  • 89. ORDER BY … LIMIT
    Three algorithms
    ● Use an index to read in order
    ● Read one table, sort, join - “Using filesort”
    ● Execute join into temporary table and then sort - “Using temporary; Using filesort”
  • 90. Using index to read data in order
    ● No special indication in EXPLAIN output
    ● LIMIT n: as soon as we read n records, we can stop!
  • 91. A problem with LIMIT N optimization
    `orders` has 1.5 M rows
    explain select * from orders order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |     |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----+
    select * from orders where o_orderpriority='1-URGENT' order by o_orderdate desc limit 10;
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |id|select_type|table |type |possible_keys|key          |key_len|ref |rows|Extra      |
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    |1 |SIMPLE     |orders|index|NULL         |i_o_orderdate|4      |NULL|10  |Using where|
    +--+-----------+------+-----+-------------+-------------+-------+----+----+-----------+
    ● A problem:
      – 1.5M rows, 300K of them 'URGENT'
      – Scanning by date, when will we find 10 'URGENT' rows?
      – No good solution so far.
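How far the index scan has to go depends entirely on where the 'URGENT' rows sit in date order, which EXPLAIN's `rows: 10` does not capture. A back-of-the-envelope sketch with the slide's numbers; the two helper functions are hypothetical, and the "expected" figure assumes matching rows are spread evenly across dates.

```python
def expected_rows_scanned(limit, total_rows, matching_rows):
    """Rows read if matching rows are spread evenly through the index order."""
    return limit * total_rows / matching_rows


def worst_case_rows_scanned(limit, total_rows, matching_rows):
    """Rows read if every non-matching row comes first in the index order."""
    return (total_rows - matching_rows) + limit


print(expected_rows_scanned(10, 1_500_000, 300_000))    # 50.0
print(worst_case_rows_scanned(10, 1_500_000, 300_000))  # 1200010
```

The gap between 50 and 1.2M rows is why the slide calls this case a problem without a good solution.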
  • 92. Using filesort strategy
    ● Have to read the entire first table
    ● For the remaining tables, can apply LIMIT n
    ● ORDER BY can only use columns of tbl1
  • 93. Using temporary; Using filesort
    ● ORDER BY clause can use columns of any table
    ● LIMIT is applied only after executing the entire join and sorting
  • 94. ORDER BY - conclusions
    ● Resolving ORDER BY with an index allows very efficient handling of LIMIT
      – Optimization for WHERE unused_condition ORDER BY … LIMIT n is challenging.
        ● Use sql_big_result, IGNORE INDEX FOR ORDER BY
    ● Using filesort
      – Needs all ORDER BY columns in the first table
      – Takes advantage of LIMIT when joining to non-first tables
    ● Using temporary; Using filesort is least efficient
  • 95. GROUP BY strategies
    There are three strategies
    ● Ordered index scan
    ● Loose Index Scan (LooseScan)
    ● Groups table (Using temporary; [Using filesort])
  • 96. Ordered index scan
    ● Groups are enumerated one after another
    ● Can compute aggregates on the fly
    ● Loose index scan is also able to jump to the next group
  • 97. Execution of GROUP BY with temptable
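The contrast between the ordered-scan and temporary-table strategies can be modelled in a few lines. This is an illustrative sketch, not server code: `sorted(rows)` stands in for reading rows through an index on the grouping column, and a Python dict stands in for the temporary groups table.

```python
from itertools import groupby


def group_by_ordered(rows_sorted_by_key):
    """Ordered index scan: rows arrive sorted, so a group is complete as soon
    as the key changes and only one running aggregate is kept at a time."""
    for key, group in groupby(rows_sorted_by_key, key=lambda row: row[0]):
        yield key, sum(value for _, value in group)


def group_by_temp_table(rows):
    """Temporary-table strategy: rows arrive in any order, so one aggregate
    per group must be kept until the whole input has been read."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


rows = [("b", 2), ("a", 1), ("b", 5), ("a", 4)]
print(list(group_by_ordered(sorted(rows))))  # [('a', 5), ('b', 7)]
print(group_by_temp_table(rows))             # {'b': 7, 'a': 5}
```

The ordered variant can emit each group immediately, while the temporary-table variant produces nothing until the input is exhausted and returns groups in insertion order, hence the optional extra sort.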
  • 99. Subquery optimizations
    ● Before MariaDB 5.3/MySQL 5.6 - “don't use subqueries”
    ● Queries that caused most of the pain
      – SELECT … FROM tbl WHERE col IN (SELECT …) - semi-joins
      – SELECT … FROM (SELECT …) - derived tables
    ● MariaDB 5.3 and MySQL 5.6
      – Share a common ancestor, MySQL 6.0 alpha
      – Huge (100x, 1000x) speedups for painful areas
      – Other kinds of subqueries received a speedup, too
      – MariaDB 5.3/5.5 has a superset of MySQL 5.6's optimizations
        ● 5.6 handles some un-handled edge cases, too
  • 100. Tuning for subqueries
    ● “Before”: one execution strategy
      – No tuning possible
    ● “After”: similar to joins
      – Reasonable execution strategies supported
      – Need indexes
      – Need selective conditions
      – Support batching in most important cases
    ● Should be better 9x% of the time
  • 101. What if it still picks a poor query plan?
    For both MariaDB and MySQL:
    ● Check EXPLAIN [EXTENDED], find a keyword around a subquery table
    ● Google “site:kb.askmonty.org $subquery_keyword”
      or https://kb.askmonty.org/en/subquery-optimizations-map/
    ● Find which optimization it was
    ● set optimizer_switch='$subquery_optimization=off'
  • 102. Thanks! Q & A