
Query Optimizer: further down the rabbit hole

OpenWorks 2019 Session
There are a number of improvements to the query optimizer in MariaDB Server 10.4. These features not only improve query performance, but provide valuable diagnostic information. Sergei Petrunia and Galina Shalygina from MariaDB begin with a technical introduction of the query optimizer, discuss the latest query optimizations, including condition pushdown into IN expressions and primary key filtering, then show how the optimizer trace can be used to troubleshoot the optimizer.

Published in: Software

  1. Query optimizer: further down the rabbit hole
     Sergei Petrunia, Sr. Software Engineer, MariaDB Corporation
     Galina Shalygina, Junior Engineer, MariaDB Corporation
  2. Query Optimizer in MariaDB 10.4
     ● New default optimizer settings
     ● Faster histogram collection
     ● Condition pushdown:
       ○ into materialized IN subqueries
       ○ from HAVING into WHERE
     ● In-memory PK filters built from range index scans
     ● Optimizer trace
  3. New optimizer defaults
  4. New default settings
     ● Condition selectivity computation takes more factors into account
         -optimizer_use_condition_selectivity=1
         +optimizer_use_condition_selectivity=4
       − Better query plans
       − Different rows / filtered in EXPLAIN output
     ● Optimizer uses EITS statistics (incl. histograms) if they are present
         -use_stat_tables=NEVER
         +use_stat_tables=PREFERABLY_FOR_QUERIES
       − Still need to use ANALYZE TABLE ... PERSISTENT to collect them
     ● ANALYZE PERSISTENT will build a good histogram
         -histogram_size=0
         +histogram_size=254
         -histogram_type=SINGLE_PREC_HB
         +histogram_type=DOUBLE_PREC_HB
  5. New default settings (2)
     ● Join buffer size will auto-size itself
         -optimize_join_buffer_size=OFF
         +optimize_join_buffer_size=ON
       − join_buffer_size setting is still relevant
     ● Large IN-lists use index statistics (cardinality) as estimate
         -eq_range_index_dive_limit=10
         +eq_range_index_dive_limit=200
       − Estimation of WHERE t.key IN (1,2,3,...,201) will not do 201 index dives
         ● will use AVG(records_per_key(t.key))
       − Just following MySQL here
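The effect of eq_range_index_dive_limit on large IN-lists can be sketched as a small model. This is a simplified illustration, not the server's code: the function name, the `index_dive` callback and the statistics value are hypothetical, and the threshold rule (dive only while the number of equality ranges is below the limit, with 0 meaning "always dive") is an assumption based on how MySQL documents the same variable.

```python
from typing import Callable, List, Sequence


def estimate_in_list_rows(
    values: Sequence[int],
    dive_limit: int,
    index_dive: Callable[[int], int],
    avg_records_per_key: float,
) -> float:
    """Estimate rows matched by `key IN (values)` (hypothetical model)."""
    # Assumption: 0 means "always dive"; otherwise dives are used only
    # while the number of equality ranges stays below the limit.
    if dive_limit == 0 or len(values) < dive_limit:
        return float(sum(index_dive(v) for v in values))
    # Too many ranges: fall back to index statistics (average rows per key).
    return len(values) * avg_records_per_key


dives: List[int] = []


def counting_dive(value: int) -> int:
    """Stand-in for an index dive; records that it was called."""
    dives.append(value)
    return 3  # pretend every key value matches exactly 3 rows


in_list = list(range(1, 202))  # 201 values, as in the slide
estimate = estimate_in_list_rows(in_list, 200, counting_dive, 2.5)
print(estimate, len(dives))  # statistics-based estimate, zero dives performed
```

With a short list such as `[1, 2, 3]` the same function performs one dive per value, which is the precise but more expensive path.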
  6. Histograms
  7. Histograms
     ● Available since MariaDB 10.0 (Yes)
     ● Used by advanced users
     ● Have shortcomings:
       − Expensive to collect
       − Usage is not enabled
     ● => Not used when they should be
  8. Histogram collection
     ● Analyzes the whole population (“census”, not “survey”)
       1. Reads all data
       2. Performs an expensive computation
     ● MariaDB 10.4 supports “Bernoulli sampling”
       − Still does #1, but fixes #2
     ● Configuration:
       − analyze_sample_percentage=100 (default): use all data, as before
       − analyze_sample_percentage=0: determine sample ratio automatically
  9. Histogram use by the optimizer
     ● Now enabled by default
     ● The workflow:
         set analyze_sample_percentage=0; -- Optional
         analyze table t1 persistent for columns (col1, ...) indexes (idx1, ...);
         analyze table t1 persistent for all;
         -- Your queries here
     ● More details: “How to use histograms to get better performance”
       − Today, 1:30 pm - 2:20 pm, Gallery C
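The idea behind Bernoulli sampling for histogram collection can be illustrated with a short sketch: every row is still read, but each one is kept for the expensive histogram computation only with a fixed probability. The function names and the equal-height bucketing below are illustrative assumptions, not MariaDB internals, and the automatic sample ratio chosen by analyze_sample_percentage=0 is not modeled.

```python
import random
from typing import List, Sequence


def bernoulli_sample(rows: Sequence[int], percentage: float, seed: int = 0) -> List[int]:
    """Read every row, keep each one independently with probability percentage/100."""
    rng = random.Random(seed)
    p = percentage / 100.0
    return [row for row in rows if rng.random() < p]


def equi_height_bounds(values: Sequence[int], buckets: int) -> List[int]:
    """Upper bound of each bucket when values are split into equal-sized buckets."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot build a histogram from an empty sample")
    return [ordered[(i * n + buckets - 1) // buckets - 1] for i in range(1, buckets + 1)]


column = list(range(10_000))
full = equi_height_bounds(column, 4)
sampled = equi_height_bounds(bernoulli_sample(column, 10, seed=42), 4)
print("census bounds:", full)
print("sample bounds:", sampled)
```

For an evenly distributed column the sampled bounds land close to the census bounds while the sort runs over roughly a tenth of the values.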
  10. MariaDB condition pushdown
      MariaDB 10.2:
        1. Pushdown of conditions into non-mergeable views/derived tables
      MariaDB 10.4:
        1. Condition pushdown from HAVING into WHERE
        2. Pushdown of conditions into materialized IN subqueries
  11. Condition pushdown from HAVING into WHERE
  12. When it can be used
      ● HAVING contains a condition that depends only on grouping fields
      ● There are no aggregate functions in this condition
      ● Special variable ‘condition_pushdown_from_having’ is set (set by default)
        SELECT c_name,MAX(o_totalprice)
        FROM customer, orders
        WHERE o_custkey = c_custkey
        GROUP BY c_name
        HAVING c_name = 'Customer#000000020';
  13-15. How it is made
      Original query:
        SELECT c_name,MAX(o_totalprice)
        FROM customer, orders
        WHERE o_custkey = c_custkey
        GROUP BY c_name
        HAVING c_name = 'Customer#000000020';
      After pushdown:
        SELECT c_name,MAX(o_totalprice)
        FROM customer, orders
        WHERE o_custkey = c_custkey AND c_name = 'Customer#000000020'
        GROUP BY c_name;
      ● No temporary table
      ● No sorting
  16-17. Pushing down using equalities
      Original query:
        SELECT l_shipdate,l_receiptdate,MAX(l_quantity)
        FROM lineitem
        GROUP BY l_shipdate
        HAVING l_receiptdate > '1996-11-01' AND l_shipdate = l_receiptdate;
      After pushdown (the equality lets the condition on l_receiptdate be rewritten on the grouping field l_shipdate):
        SELECT l_shipdate,l_receiptdate,MAX(l_quantity)
        FROM lineitem
        WHERE l_shipdate > '1996-11-01' AND l_shipdate = l_receiptdate
        GROUP BY l_shipdate;
  18-19. Where you can find it
      ● MariaDB 10.4
      ● MySQL 8.0
      ● PostgreSQL 11.2
      ● Oracle 12c
      GROUP BY t1.a HAVING (t1.a=t1.c) AND (t1.c>1);
      PostgreSQL will not allow it
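That the HAVING form and the pushed-down WHERE form return the same rows can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine; it says nothing about how SQLite or MariaDB plan the query, and the three-row dataset is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
    CREATE TABLE orders (o_orderkey INTEGER PRIMARY KEY, o_custkey INTEGER, o_totalprice REAL);
    INSERT INTO customer VALUES (20, 'Customer#000000020'), (21, 'Customer#000000021');
    INSERT INTO orders VALUES (1, 20, 100.0), (2, 20, 250.0), (3, 21, 999.0);
    """
)

# Condition on the grouping field written in HAVING (as the user wrote it).
having_query = """
    SELECT c_name, MAX(o_totalprice)
    FROM customer, orders
    WHERE o_custkey = c_custkey
    GROUP BY c_name
    HAVING c_name = 'Customer#000000020'
"""

# The same condition moved into WHERE (what the pushdown produces).
where_query = """
    SELECT c_name, MAX(o_totalprice)
    FROM customer, orders
    WHERE o_custkey = c_custkey AND c_name = 'Customer#000000020'
    GROUP BY c_name
"""

having_rows = conn.execute(having_query).fetchall()
where_rows = conn.execute(where_query).fetchall()
print(having_rows == where_rows, where_rows)
```

The WHERE form filters rows before grouping, so the groups for other customers are never built.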
  20. Condition pushdown into materialized IN subquery
  21-23. When it can be used
      ● Uncorrelated materialized semi-join IN-subquery with GROUP BY
      ● There is a condition whose fields all appear in the left part of the IN-subquery
      ● Special variable ‘condition_pushdown_for_subquery’ is set (set by default)
        SELECT c_name,c_phone
        FROM customer
        WHERE c_status = 1 AND
              c_regyear BETWEEN 1992 AND 1994 AND
              (c_custkey,c_status,c_regyear) IN
              (
                SELECT o_custkey,MIN(o_customerstatus),o_orderyear
                FROM orders
                GROUP BY o_custkey,o_orderyear
              );
  24-27. How condition pushdown is made
      Original query:
        SELECT c_name,c_phone
        FROM customer
        WHERE c_status = 1 AND
              c_regyear BETWEEN 1992 AND 1994 AND
              (c_custkey,c_status,c_regyear) IN
              (
                SELECT o_custkey,MIN(o_customerstatus),o_orderyear
                FROM orders
                GROUP BY o_custkey,o_orderyear
              );
      Step 1: c_status = 1 is pushed into the subquery's HAVING as MIN(o_customerstatus) = 1
      Step 2: c_regyear BETWEEN 1992 AND 1994 is pushed into the subquery's WHERE as o_orderyear BETWEEN 1992 AND 1994
      Result:
        SELECT c_name,c_phone
        FROM customer
        WHERE c_status = 1 AND
              c_regyear BETWEEN 1992 AND 1994 AND
              (c_custkey,c_status,c_regyear) IN
              (
                SELECT o_custkey,MIN(o_customerstatus),o_orderyear
                FROM orders
                WHERE o_orderyear BETWEEN 1992 AND 1994
                GROUP BY o_custkey,o_orderyear
                HAVING MIN(o_customerstatus) = 1
              );
  28. How condition pushdown is made
  29. Improvement
      DBT3, MyISAM   with optimization   without optimization
      1 GB           0.013 sec           0.017 sec
      5 GB           7.185 sec           2 min 51.705 sec
      15 GB          11.003 sec          12 min 47.846 sec
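Why the pushdown into a materialized IN subquery helps can be seen in a small in-memory model: the pushed conditions shrink the materialized result without changing the final answer. Everything below is a hypothetical illustration in plain Python (the `materialize` and `outer_query` helpers and the sample rows are invented for the sketch), not a description of the server's executor.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Order = Tuple[int, int, int]               # (o_custkey, o_customerstatus, o_orderyear)
Customer = Tuple[str, str, int, int, int]  # (c_name, c_phone, c_custkey, c_status, c_regyear)
GroupRow = Tuple[int, int, int]            # (o_custkey, MIN(o_customerstatus), o_orderyear)


def materialize(
    orders: List[Order],
    where: Optional[Callable[[Order], bool]] = None,
    having: Optional[Callable[[GroupRow], bool]] = None,
) -> Set[GroupRow]:
    """GROUP BY o_custkey, o_orderyear with optional pushed WHERE / HAVING."""
    groups: Dict[Tuple[int, int], int] = {}
    for custkey, status, year in orders:
        if where is not None and not where((custkey, status, year)):
            continue  # pushed WHERE: row dropped before grouping
        key = (custkey, year)
        groups[key] = min(status, groups.get(key, status))
    rows = {(custkey, min_status, year) for (custkey, year), min_status in groups.items()}
    if having is not None:
        rows = {row for row in rows if having(row)}  # pushed HAVING
    return rows


def outer_query(customers: List[Customer], materialized: Set[GroupRow]) -> List[Tuple[str, str]]:
    """The outer SELECT with its own WHERE and the IN lookup."""
    return [
        (name, phone)
        for name, phone, custkey, status, regyear in customers
        if status == 1 and 1992 <= regyear <= 1994
        and (custkey, status, regyear) in materialized
    ]


orders = [(1, 1, 1993), (1, 2, 1993), (1, 1, 1990), (2, 2, 1993),
          (3, 1, 1994), (3, 1, 1999), (4, 1, 1980)]
customers = [("Alice", "555-0001", 1, 1, 1993), ("Bob", "555-0002", 2, 1, 1993),
             ("Carol", "555-0003", 3, 1, 1994), ("Dave", "555-0004", 4, 1, 1980)]

plain = materialize(orders)
pushed = materialize(
    orders,
    where=lambda o: 1992 <= o[2] <= 1994,  # from c_regyear BETWEEN 1992 AND 1994
    having=lambda g: g[1] == 1,            # from c_status = 1
)
print(len(plain), len(pushed))
print(outer_query(customers, plain) == outer_query(customers, pushed))
```

The WHERE condition can be applied before grouping only because o_orderyear is a grouping column; the condition on the aggregate has to stay in HAVING.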
  30. In-memory PK filters built from range index scans
  31-34. What is PK-filter
        SELECT o_orderkey, l_linenumber, l_shipdate, o_totalprice
        FROM lineitem JOIN orders ON l_orderkey = o_orderkey
        WHERE l_shipdate BETWEEN '1997-01-01' AND '1997-06-30' AND
              o_totalprice between 200000 and 230000;   -- C1
      1. There is an index i_o_totalprice on orders(o_totalprice)
      2. C1 cardinality is small in comparison with the cardinality of orders
      Try to build a filter!
  35-38. What is PK-filter
      1. o_totalprice between 200000 and 230000 + i_o_totalprice = range scan using i_o_totalprice
      2. Collect Primary Keys for the rows in this range (PK for C1)
      3. Sort it
      4. Result: PK filter built from range index scan
  39. How it works
      [diagram labels: PK-filter, orders, lineitem, o_orderkey = l_orderkey]
  40. How it works (EXPLAIN output)
      Row 1:
        id: 1
        select_type: SIMPLE
        table: lineitem
        type: range
        possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity
        key: i_l_shipdate
        key_len: 4
        ref: NULL
        rows: 98
        Extra: Using index condition
      Row 2:
        id: 1
        select_type: SIMPLE
        table: orders
        type: eq_ref|filter
        possible_keys: PRIMARY,i_o_totalprice
        key: PRIMARY|i_o_totalprice
        key_len: 4|9
        ref: dbt3_small.lineitem.l_orderkey
        rows: 1 (5%)
        Extra: Using where; Using rowid filter
  41. How it works
      "rowid_filter": {
        "range": {
          "key": "i_o_totalprice",
          "used_key_parts": ["o_totalprice"]
        },
        "rows": 81,
        "selectivity_pct": 5.4,
        "r_rows": 71,
        "r_selectivity_pct": 10.417,
        "r_buffer_size": 53,
        "r_filling_time_ms": 0.0482
      },
  42. Limitations
      ● The ‘rowid_filter’ special variable must be set (set by default)
      ● PK-filter size must not exceed ‘max_rowid_filter_size’
        ○ 128 KB by default
      ● The index on which the filter is built must not be a clustered primary key
      ● Engines that support rowid filters
        ○ InnoDB
        ○ MyISAM
      Find more in MDEV-16188
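The build-and-probe steps of a PK filter can be sketched in a few lines: range-scan the secondary index, collect and sort the primary keys, then check each join candidate against the sorted list before fetching the row. This is a minimal model under stated assumptions (a Python list stands in for the index, a sorted list with binary search stands in for the in-memory filter, and all function names are hypothetical); the server's actual data structures may differ.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def build_pk_filter(index: List[Tuple[int, int]], low: int, high: int) -> List[int]:
    """Range-scan a sorted (o_totalprice, o_orderkey) index and return sorted PKs."""
    prices = [price for price, _ in index]
    start = bisect_left(prices, low)
    stop = bisect_right(prices, high)
    return sorted(pk for _, pk in index[start:stop])


def in_filter(pk_filter: List[int], pk: int) -> bool:
    """Binary-search probe of the sorted PK list."""
    pos = bisect_left(pk_filter, pk)
    return pos < len(pk_filter) and pk_filter[pos] == pk


def join_with_filter(
    lineitem_keys: List[int], orders: Dict[int, int], pk_filter: List[int]
) -> Tuple[List[Tuple[int, int]], int]:
    """Join lineitem to orders, fetching an orders row only if its PK passes the filter."""
    fetched = 0
    result = []
    for l_orderkey in lineitem_keys:
        if not in_filter(pk_filter, l_orderkey):
            continue  # rejected without touching the orders row
        fetched += 1
        result.append((l_orderkey, orders[l_orderkey]))
    return result, fetched


orders = {1: 150000, 2: 210000, 3: 229999, 4: 300000, 5: 200000}  # o_orderkey -> o_totalprice
i_o_totalprice = sorted((price, key) for key, price in orders.items())
pk_filter = build_pk_filter(i_o_totalprice, 200000, 230000)
rows, fetched = join_with_filter([1, 2, 2, 4, 5, 3, 1], orders, pk_filter)
print(pk_filter, fetched, rows)
```

Of the seven lineitem rows in the example, three are rejected by the filter alone, which is the saving the benchmark slides measure at scale.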
  43-44. Improvement: MyISAM
        SELECT l_quantity, l_shipdate
        FROM lineitem, orders
        WHERE l_orderkey=o_orderkey AND
              o_totalprice BETWEEN 300000 AND 330000 AND
              l_shipdate BETWEEN '1996-11-01' AND '1996-11-14' AND
              l_quantity=15;
      Indexes on:
        1. l_quantity
        2. l_shipdate
        3. o_totalprice
      max_rowid_filter_size = 24 MB
  45. Improvement: MyISAM (EXPLAIN output)
      Row 1:
        id: 1
        select_type: SIMPLE
        table: lineitem
        type: ref|filter
        possible_keys: PRIMARY,i_l_shipdate,i_l_orderkey,i_l_orderkey_quantity,i_l_quantity
        key: i_l_quantity|i_l_shipdate
        key_len: 9|4
        ref: const
        rows: 2012600 (1%)
        Extra: Using where; Using rowid filter
      Row 2:
        id: 1
        select_type: SIMPLE
        table: orders
        type: eq_ref|filter
        possible_keys: PRIMARY,i_o_totalprice
        key: PRIMARY|i_o_totalprice
        key_len: 4|9
        ref: dbt3_30.lineitem.l_orderkey
        rows: 1 (6%)
        Extra: Using where; Using rowid filter
  46. Improvement: MyISAM
      DBT3         with optimization   without optimization
      5 GB         1.147 sec           15.005 sec
      15 GB        3.281 sec           44.363 sec
      30 GB: SSD   6.552 sec           1 min 28.347 sec
      30 GB: HDD   37.234 sec          7 min 29.090 sec
  47. Improvement: InnoDB
        SELECT *
        FROM part,lineitem,partsupp
        WHERE p_partkey = ps_partkey AND
              l_suppkey=ps_suppkey AND
              p_retailprice BETWEEN 1080 AND 1100 AND
              l_shipdate BETWEEN '1996-10-01' AND '1997-02-01';
  48. Improvement: InnoDB (EXPLAIN output)
      Row 1:
        id: 1
        select_type: SIMPLE
        table: part
        type: range
        possible_keys: PRIMARY,i_p_retailprice
        key: i_p_retailprice
        key_len: 9
        ref: NULL
        rows: 115314
        Extra: Using index condition
      Row 2:
        id: 1
        select_type: SIMPLE
        table: partsupp
        type: ref
        possible_keys: PRIMARY,i_ps_partkey,i_ps_suppkey
        key: PRIMARY
        key_len: 4
        ref: dbt31.part.p_partkey
        rows: 2
        Extra:
      Row 3:
        id: 1
        select_type: SIMPLE
        table: lineitem
        type: ref|filter
        possible_keys: i_l_shipdate,i_l_suppkey
        key: i_l_suppkey|i_l_shipdate
        key_len: 5|4
        ref: dbt31.partsupp.ps_suppkey
        rows: 249 (10%)
        Extra: Using where; Using rowid filter
  49. Improvement: InnoDB
      DBT3    with optimization   without optimization
      5 GB    27.669 sec          5 min 41.049 sec
      15 GB   8 min 28.506 sec    > 50 min
  50. Optimizer trace
  51. Optimizer trace
      ● Available in MySQL since MySQL 5.6
          mysql> set optimizer_trace=1;
          mysql> <query>;
          mysql> select * from
              -> information_schema.optimizer_trace;
      ● Now, a similar feature in MariaDB
      ● Explains optimizer choices
      Trace excerpt:
          "steps": [
            {
              "join_preparation": {
                "select#": 1,
                "steps": [
                  {
                    "expanded_query": "/* select#1 */ select `t1`.`col1` AS `col1`,`t1`.`col2` AS `col2` from `t1` where (`t1`.`col1` < 4)"
                  }
                ]
              }
            },
            {
              "join_optimization": {
                "select#": 1,
                "steps": [
                  {
                    "condition_processing": {
                      "condition": "WHERE",
                      "original_condition": "(`t1`.`col1` < 4)",
                      "steps": [
                        {
                          "transformation": "equality_propagation",
                          "resulting_condition": "(`t1`.`col1` < 4)"
                        },
                        {
                          "transformation": "constant_propagation",
                          "resulting_condition": "(`t1`.`col1` < 4)"
                        },
                        {
                          "transformation": "trivial_condition_removal",
                          "resulting_condition": "(`t1`.`col1` < 4)"
                        }
                      ]
                    }
  52. The goal is to understand the optimizer
      ● “Why was query plan X not chosen?”
        − It had higher cost (due to incorrect statistics?)
        − Limitation in the optimizer?
      ● What rewrites happen
        − Does “X=10 AND FUNC(X)” -> “FUNC(10)” work?
        − Or any other suspicious rewrite of the day
      ● What changed between the two hosts/versions
        − diff /tmp/trace_from_host1.json /tmp/trace_from_host2.json
      ● ...
  53. A user case: range optimizer
      ● Complex WHERE clause and multi-component index make it unclear what ranges will be scanned
      ● A classic example:
          create table some_events (
            start_date DATE,
            end_date DATE,
            ...
            KEY (start_date, end_date)
          );
          select ... from some_events
          where start_date >= '2019-02-10' and end_date <= '2019-04-01'
      Trace excerpt:
          "rows_estimation": [
            {
              "table": "some_events",
              ...
              "analyzing_range_alternatives": {
                "range_scan_alternatives": [
                  {
                    "index": "start_date",
                    "ranges": ["0x4ac60f <= start_date"],
                    "rowid_ordered": false,
                    "using_mrr": false,
                    "index_only": false,
                    ..
  54. Customer Case: a VIEW that stopped merging
      ● A big join query with lots of nested views
      ● Performance drop after a minor change to a VIEW
        − EXPLAIN shows the view is no longer merged
      ● Initial idea: the change added a LEFT JOIN, so it must be it
      Trace before the change:
          "view": {
            "table": "view_name_8",
            "select_id": 9,
            "algorithm": "merged"
          }
      Trace after the change:
          "view": {
            "table": "view_name_8",
            "select_id": 9,
            "algorithm": "materialized",
            "cause": "Not enough table bits to merge subquery"
          }
      ● (Due to Table Elimination, EXPLAIN showed <64 tables both before and after)
  55. Customer Case 2: no materialization
      ● Subquery materialization was not used.
          select * from t1 where t1.col in (select t2.col from t2) or ...
          +----+--------------------+-------+------+---------------+------+---------+------+---------+-------------+
          | id | select_type        | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
          +----+--------------------+-------+------+---------------+------+---------+------+---------+-------------+
          |  1 | PRIMARY            | t1    | ALL  | NULL          | NULL | NULL    | NULL |      10 | Using where |
          |  2 | DEPENDENT SUBQUERY | t2    | ALL  | NULL          | NULL | NULL    | NULL | 1000000 | Using where |
          +----+--------------------+-------+------+---------------+------+---------+------+---------+-------------+
      Trace excerpt:
          "join_preparation": {
            "select_id": 2,
            "steps": [
              {
                "transformation": {
                  "select_id": 2,
                  "from": "IN (SELECT)",
                  "to": "materialization",
                  "possible": false,
                  "cause": "types mismatch"
                }
      ● Different datatypes disallow Materialization
      ● A non-obvious limitation
        − Required a server developer with a debugger to figure out
  56. Optimizer trace structure
      TRACE: steps: {
        join_preparation+,
        join_optimization+,
        (join_explain | join_execution)+
      }
      join_preparation : { expanded_query }
      join_optimization : steps {
        condition_processing,
        substitute_generated_columns,
        table_dependencies,
        ref_optimizer_key_uses,
        rows_estimation,
        considered_execution_plans,
        attaching_conditions_to_tables,
        refine_plan,
      }
      rows_estimation: {
        analyzing_range_alternatives : { ... }
        selectivity_for_indexes,
        selectivity_for_columns,
        cond_selectivity: 0.nnnn
      }
  57. Optimizer trace summary
      ● Lets you examine how the optimizer processes the query
      ● Mostly for manual troubleshooting
      ● Good for bug reporting too
      ● Currently prints the essentials
        − Will print more in the future.
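Since the trace is plain JSON, questions like "which transformation was rejected, and why?" can be answered with a small script instead of reading the whole document. The sketch below is one possible approach, not an official tool: the sample is the fragment from the "no materialization" customer case with closing brackets added so that it parses, and the `find_rejections` helper is hypothetical.

```python
import json
from typing import Any, Iterator, Tuple

# Fragment from the customer case above, closed off so that it is valid JSON.
SAMPLE_TRACE = """
{
  "join_preparation": {
    "select_id": 2,
    "steps": [
      {
        "transformation": {
          "select_id": 2,
          "from": "IN (SELECT)",
          "to": "materialization",
          "possible": false,
          "cause": "types mismatch"
        }
      }
    ]
  }
}
"""


def find_rejections(node: Any) -> Iterator[Tuple[str, str]]:
    """Walk the trace and yield (target, cause) for every node with "possible": false."""
    if isinstance(node, dict):
        if node.get("possible") is False:
            yield (str(node.get("to", "?")), str(node.get("cause", "unknown")))
        for value in node.values():
            yield from find_rejections(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_rejections(item)


for target, cause in find_rejections(json.loads(SAMPLE_TRACE)):
    print(f"{target}: rejected because {cause}")
```

The same recursive walk can be pointed at other keys, for example "algorithm" and "cause" under "view", to spot the merged-versus-materialized change from the first customer case.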
  58. Thanks for your attention!
      Come to “How to use histograms to get better performance”
      Today, 1:30 pm - 2:20 pm, Gallery C
