3. 3
Plan
● Earlier versions: derived table merging
● MariaDB 10.2: Condition pushdown
● MariaDB 10.3: Condition pushdown through window
functions
● MariaDB 10.3: GROUP BY splitting
4. 4
Background – derived table merge
● “Customers and their big orders from October”
select *
from
customers,
(select *
from orders
where order_date BETWEEN '2017-10-01' and '2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id = customer.customer_id
5. 5
Naive execution
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id =
customers.customer_id
orders
customers
1 – compute
oct_orders
2- do join OCT_ORDERS
amount > 1M
6. 6
Derived table merge
select *
from
customers,
(select *
from orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
) as OCT_ORDERS
where
OCT_ORDERS.amount > 1M and
OCT_ORDERS.customer_id =
customers.customer_id
select *
from
customers,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1M and
orders.customer_id =
customers.customer_id
7. 7
Execution after merge
customers
Join
orders
select *
from
customers,
orders
where
order_date BETWEEN '2017-10-01' and
'2017-10-31'
and
orders.amount > 1M and
orders.customer_id =
customers.customer_id
Made in October
amount > 1M
● Allows the optimizer to join customers→orders or orders→customers
● Good for optimization
8. 8
Another use case - grouping
● Can’t merge due to GROUP BY in the child.
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
9. 9
Execution is inefficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
select * from OCT_TOTALS where customer_id=1
orders
1 – compute all totals
2- get* customer=1
OCT_TOTALS
customer_id=1
Sum
( “derived_with_keys” will
build/use an index here)
10. 10
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
● Can push down conditions on GROUP
BY columns
● … to filter out rows that go into groups
we dont care about
11. 11
Condition pushdown
select *
from OCT_TOTALS
where customer_id=1
orders
1 – find customer_id=1
OCT_TOTALS,
customer_id=1
customer_id=1
Sum
● Looking only at rows you’re interested in is much more efficient
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
orders
12. 12
Condition Pushdown into HAVING
select *
from OCT_TOTALS
where TOTAL_AMT > 1M
● Conditions that cannot be pushed
through GROUP BY will be
pushed into the HAVING clause
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
having
TOTAL_AMT > 1M
13. 13
Condition pushdown will also push inferred conditions
select
custmer.customer_name,
TOTAL_AMT
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_id=1
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
OCT_TOTALS.customer_id=1
14. 14
“Split grouping for derived”
select *
from
customer, OCT_TOTALS
where
customer.customer_id=OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1', 'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
15. 15
Execution, the old way
Sum
orders
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
Customer 1
Customer 2
Customer 3
Customer 100
Customer 1
Customer 2
Customer 3
Customer 100
customer
Customer 1
Customer 2
OCT_TOTALS
● Inefficient, OCT_TOTALS
is computed for *all*
customers.
16. 16
Split grouping execution
Sum
customer
Customer 2
Customer 2
Customer 1
Customer 100
orders
Customer 1
Customer 1
Customer 2
Sum
SumSum
● Can be used when doing join from
customer to orders
● Must have equalities for GROUP BY
columns:
OCT_TOTALS.customer_id=customer.customer_id
– This allows to select one group
● The underlying table (orders) must
have an index on the GROUP BY
column (customer_id)
– This allows to use ref access
17. 17
Split grouping execution
● EXPLAIN shows “LATERAL DERIVED”
● @@optimizer_switch flag: split_materialization (ON by default)
● Cost-based choice whether use lateralization
select *
from
customer, OCT_TOTALS
where
customer.customer_id=
OCT_TOTALS.customer_id and
customer.customer_name IN ('Customer 1',
'Customer 2')
create view OCT_TOTALS as
select
customer_id,
SUM(amount) as TOTAL_AMT
from orders
where
order_date BETWEEN '2017-10-01' and '2017-10-31'
group by
customer_id
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
| 1 | PRIMARY | customer | ALL | PRIMARY | NULL | NULL | NULL | 1000 | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | 4 | customer.customer_id | 36 | |
| 2 | LATERAL DERIVED | orders | ref | customer_id | customer_id | 4 | customer.customer_id | 365 | Using where |
+------+-----------------+------------+------+---------------+-------------+---------+----------------------+------+-------------+
18. 18
Summary
● MariaDB 10.2: Condition pushdown for derived tables optimization
– Push a condition into derived table
– Used when derived table cannot be merged
– Biggest effect is for subqueries with GROUP BY
● MariaDB 10.3: Condition Pushdown through Window functions
● MariaDB 10.3: Lateral derived optimization
– When doing a join, can’t do condition pushdown
– So, lateral derived is used. It allows to only examine GROUP BY groups that
match other tables. Group By columns must be indexed
20. 20
Plan
● What are window functions
● Using window functions is an optimization by itself
● Condition pushdown through PARTITION BY
● Doing fewer sorts
21. 21
Window functions basics
● Window functions are like aggregate functions,
– Each with its own GROUP BY clause
● Except that
– The groups are ordered
– The groups are not collapsed
22. 22
Aggregate function example
select
country, sum(Population) as total
from Cities
group by country
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+---------+----------+
| country | total |
+---------+----------+
| DEU | 4030488 |
| RUS | 8389200 |
| USA | 11467668 |
+---------+----------+
23. 23
Window function example
select
name,
rank() over (partition by country,
order by population desc)
from cities
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+-----------+------+
| name | rank |
+-----------+------+
| Berlin | 1 |
| Frankfurt | 2 |
| Moscow | 1 |
| New York | 1 |
| Chicago | 2 |
| Seattle | 3 |
+-----------+------+
24. 24
Window function example
select
name,
rank() over (partition by country,
order by population desc)
from cities
+-----------+---------+------------+
| name | country | population |
+-----------+---------+------------+
| Berlin | DEU | 3386667 |
| Frankfurt | DEU | 643821 |
| Moscow | RUS | 8389200 |
| New York | USA | 8008278 |
| Chicago | USA | 2896016 |
| Seattle | USA | 563374 |
+-----------+---------+------------+
+-----------+------+
| name | rank |
+-----------+------+
| Berlin | 1 |
| Frankfurt | 2 |
| Moscow | 1 |
| New York | 1 |
| Chicago | 2 |
| Seattle | 3 |
+-----------+------+
25. 25
Window function computation
● Can look at
– Current row
– Rows in the partition, ordered
● Can compute the window function
● Computing values individually would
be expensive
– O(#rows_in_partition ^ 2)
27. 27
Window function computation
10 $total+10
$total
● Example● Example
SELECT
SUM(amount) OVER (ORDER BY date
ROWS BETWEEN
UNBOUNDED PRECEDING
AND CURRENT ROW)
AS cur_balance
FROM
transactions
28. 28
Compare to non-window function
$total
● Typically uses a correlated subquery
SELECT
(SELECT SUM(amount)
FROM
transactions t
WHERE
t.date <= date AND
account_id = 12345
) AS cur_balance
FROM
transactions
● N^2 complexity
30. 30
MariaDB 10.3: Pushdown through Window Functions
● “Customer’s biggest orders”
create view top_three_orders as
select *
from
(
select
customer_id,
amount,
rank() over (partition by customer_id
order by amount desc
) as order_rank
from orders
) as ordered_orders
where order_rank<3
select * from top_three_orders where customer_id=1
+-------------+--------+------------+
| customer_id | amount | order_rank |
+-------------+--------+------------+
| 1 | 10000 | 1 |
| 1 | 9500 | 2 |
| 1 | 400 | 3 |
| 2 | 3200 | 1 |
| 2 | 1000 | 2 |
| 2 | 400 | 3 |
...
31. 31
MariaDB 10.3: Pushdown through Window Functions
MariaDB 10.2, MySQL 8.0
● Compute
top_three_orders for all
customers
● select rows with
customer_id=1
select * from top_three_orders where customer_id=1
MariaDB 10.3 (and e.g. PostgreSQL)
● Only compute top_three_orders
for customer_id=1
– This can be much faster!
– Can make use of
index(customer_id)
32. 32
Doing fewer sorts
tbl
tbl
tbl
join
sort
select
rank() over (order by incidents),
ntile(4)over (order by incidents),
rank() over (order by ...),
from
support_staff
● Each window function requires a sort
● Could avoid sorting if using an index (not supported yet)
● Identical PARTITION/ORDER BY must share the sort step
– Compatible may share the sort step (supported)
33. 33
Window function optimzation conclusions
● Using window functions is an optimization by itself
● Condition pushdown through PARTITION BY
– This is the most important
● Fewer sorts are done.