Database Management Systems

Query Optimization
Dư Phương Hạnh
Department of Information Systems
Faculty of Information Technology, University of Engineering and Technology
Vietnam National University, Hanoi
hanhdp@vnu.edu.vn
Outline
• Optimization Overview
• Optimizing SQL Statements
• Optimizing Database Structure
• Query Execution Plan
• Measuring Performance
• Internal Details of MySQL Optimizations

Reading: Chapters 12, 13, and 14 of Ramakrishnan
http://dev.mysql.com/doc/refman/5.5/en/optimization.html

Database Management Systems @ Department of Information Systems
Query Execution Plan
• The set of operations that the optimizer chooses to perform
a query most efficiently is called the “query execution plan”.
• Depending on the details of your tables, columns, indexes,
and the conditions in your WHERE clause, the MySQL
optimizer considers many techniques to efficiently perform
the lookups involved in an SQL query.
– A query on a huge table can be performed without reading all the
rows.
– A join involving several tables can be performed without comparing
every combination of rows.

• Your goal is to recognize the aspects of the EXPLAIN plan
that indicate a query is optimized well.
Optimizing Queries with EXPLAIN
EXPLAIN SELECT select_options
• MySQL displays information from the optimizer about
how tables are joined and in which order.
– To give a hint to the optimizer to use a join order
corresponding to the order in which the tables are named
in a SELECT statement, begin the statement with
SELECT STRAIGHT_JOIN rather than just SELECT.

• You can see where you should add indexes to tables
so that the statement executes faster.
EXPLAIN output format
• EXPLAIN returns a row of information for each table
used in the SELECT statement.
– In the output, the tables are listed in the order that MySQL
would read them while processing the statement.

• MySQL resolves all joins using a nested-loop join method.
– This means that MySQL reads a row from the first table,
and then finds a matching row in the second table, the third
table, and so on.
– When all tables are processed, MySQL outputs the selected
columns and backtracks through the table list until a table is
found for which there are more matching rows.
– The next row is read from this table and the process
continues with the next table.
EXPLAIN output columns
Column          Meaning
id              The SELECT identifier
select_type     The SELECT type
table           The table for the output row
type            The join type
possible_keys   The possible indexes to choose
key             The index actually chosen
key_len         The length of the chosen key
ref             The columns compared to the index
rows            Estimate of rows to be examined
Extra           Additional information
EXPLAIN output column
• select_type:
– SIMPLE: Simple SELECT (not using UNION or subqueries).
– PRIMARY: Outermost SELECT.
– UNION: Second or later SELECT statement in a UNION.
– DEPENDENT UNION: Second or later SELECT statement in a UNION,
dependent on outer query.
– UNION RESULT: Result of a UNION.
– SUBQUERY: First SELECT in subquery.
– DEPENDENT SUBQUERY: First SELECT in subquery, dependent on
outer query.
– DERIVED: Derived table SELECT (subquery in FROM clause).
– UNCACHEABLE SUBQUERY: A subquery for which the result cannot
be cached and must be re-evaluated for each row of the outer query.
– UNCACHEABLE UNION: The second or later SELECT in a UNION that
belongs to an uncacheable subquery.
EXPLAIN output column
• type: The join type. The join types are described here
ordered from the best type to the worst. ALL, the worst
type, is shown at the end of this slide for contrast:
– system: The table has only one row (= system table). This
is a special case of the const join type.
– const: The table has at most one matching row, which is
read at the start of the query, so values from the columns in
this row can be regarded as constants by the rest of the
optimizer. const tables are very fast because they are
read only once. const is used when you compare all
parts of a PRIMARY KEY or UNIQUE index to constant
values.
– ALL: A full table scan is done for each combination of rows
from the previous tables. This is the worst join type.
EXPLAIN output column
– eq_ref: One row is read from this table for each
combination of rows from the previous tables. Other than
the system and const types, this is the best possible join
type. It is used when all parts of an index are used by the
join and the index is a PRIMARY KEY or UNIQUE NOT NULL index.

Examples:
SELECT * FROM ref_table,other_table WHERE
ref_table.key_column=other_table.column;
SELECT * FROM ref_table,other_table WHERE
ref_table.key_column_part1=other_table.column
AND ref_table.key_column_part2=1;
EXPLAIN output column
– ref: All rows with matching index values are read from this
table for each combination of rows from the previous
tables. Ref is used if the join uses only a leftmost prefix of
the key or if the key is not a PRIMARY KEY or UNIQUE
(index cannot select a single row based on the key value).

Examples:
SELECT * FROM ref_table WHERE key_column=expr;
SELECT * FROM ref_table,other_table WHERE
ref_table.key_column=other_table.column;
SELECT * FROM ref_table,other_table WHERE
ref_table.key_column_part1=other_table.column AND
ref_table.key_column_part2=1;
EXPLAIN output column
– range: Only rows that are in a given range are retrieved,
using an index to select the rows. The key column in the
output row indicates which index is used. The key_len
contains the longest key part that was used. The ref
column is NULL for this type.

Examples:
SELECT * FROM tbl_name WHERE key_column = 10;
SELECT * FROM tbl_name WHERE key_column BETWEEN
10 and 20;
SELECT * FROM tbl_name WHERE key_part1 = 10 AND
key_part2 IN (10,20,30);
EXPLAIN output column
– ...

(Read more at
http://dev.mysql.com/doc/refman/5.5/en/explainoutput.html)

Optimizing join example
EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
tt.ProjectReference, tt.EstimatedShipDate,
tt.ActualShipDate, tt.ClientID,
tt.ServiceCodes, tt.RepetitiveID,
tt.CurrentProcess, tt.CurrentDPPerson,
tt.RecordVolume, tt.DPPrinted,
et.COUNTRY, et_1.COUNTRY,
do.CUSTNAME
FROM tt, et, et AS et_1, do
WHERE tt.SubmitTime IS NULL
AND tt.ActualPC = et.EMPLOYID
AND tt.AssignedPC = et_1.EMPLOYID
AND tt.ClientID = do.CUSTNMBR;
Optimizing join example
Table   Column       Data Type
tt      ActualPC     CHAR(10)
tt      AssignedPC   CHAR(10)
tt      ClientID     CHAR(10)
et      EMPLOYID     CHAR(15)
do      CUSTNMBR     CHAR(15)

Table   Index
tt      ActualPC
tt      AssignedPC
tt      ClientID
et      EMPLOYID (primary key)
do      CUSTNMBR (primary key)

Optimizing join example
Initially, before any optimizations have been
performed, the EXPLAIN statement produces the
following information:
table  type  possible_keys                 key   key_len  ref   rows  Extra
et     ALL   PRIMARY                       NULL  NULL     NULL  74
do     ALL   PRIMARY                       NULL  NULL     NULL  2135
et_1   ALL   PRIMARY                       NULL  NULL     NULL  74
tt     ALL   AssignedPC,ClientID,ActualPC  NULL  NULL     NULL  3872

Optimizing join example
• Because type is ALL for every table, this output indicates
that MySQL is generating a Cartesian product of all the tables.
• This takes quite a long time, because the product of
the number of rows in each table must be
examined. For the case at hand, this product is 74 ×
2135 × 74 × 3872 = 45,268,558,720 rows. If the
tables were bigger, it would take even longer.

Optimizing join example
• One problem here is that MySQL can use indexes
on columns more efficiently if they are declared as
the same type and size.
• In this context, VARCHAR and CHAR are
considered the same if they are declared as the
same size. tt.ActualPC is declared as CHAR(10)
and et.EMPLOYID is CHAR(15), so there is a length
mismatch.
• Fix the mismatch with ALTER TABLE…

Optimizing join example
Executing the EXPLAIN statement again produces this result:
table  type    possible_keys                 key      key_len  ref          rows  Extra
tt     ALL     AssignedPC,ClientID,ActualPC  NULL     NULL     NULL         3872  Using where
do     ALL     PRIMARY                       NULL     NULL     NULL         2135
et_1   ALL     PRIMARY                       NULL     NULL     NULL         74
et     eq_ref  PRIMARY                       PRIMARY  15       tt.ActualPC  1
This is not perfect, but it is much better: the product of
the rows values is smaller by a factor of 74. This version
executes in a couple of seconds.

Optimizing join example
• A second alteration can be made to eliminate the column
length mismatches for the tt.AssignedPC = et_1.EMPLOYID
and tt.ClientID = do.CUSTNMBR comparisons. After it,
EXPLAIN produces this result:
table  type    possible_keys                 key       key_len  ref            rows  Extra
et     ALL     PRIMARY                       NULL      NULL     NULL           74
tt     ref     AssignedPC,ClientID,ActualPC  ActualPC  15       et.EMPLOYID    52    Using where
et_1   eq_ref  PRIMARY                       PRIMARY   15       tt.AssignedPC  1
do     eq_ref  PRIMARY                       PRIMARY   15       tt.ClientID    1

Optimizing join example
• At this point, the query is optimized almost as well
as possible.
• The remaining problem is that, by default, MySQL
assumes that values in the tt.ActualPC column are
evenly distributed, and that is not the case for
the tt table. It is easy to tell MySQL to analyze the
key distribution using the ANALYZE TABLE statement.
• With the additional index information, the join is
perfect:

Optimizing join example
table  type    possible_keys                 key      key_len  ref            rows  Extra
tt     ALL     AssignedPC,ClientID,ActualPC  NULL     NULL     NULL           3872  Using where
et     eq_ref  PRIMARY                       PRIMARY  15       tt.ActualPC    1
et_1   eq_ref  PRIMARY                       PRIMARY  15       tt.AssignedPC  1
do     eq_ref  PRIMARY                       PRIMARY  15       tt.ClientID    1
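
The effect of each step in this example can be checked by multiplying the rows estimates of each EXPLAIN output. The following is a minimal Python sketch that uses only the rows values shown on the slides; the dictionary labels and the helper name rows_product are illustrative and are not MySQL terminology.

```python
from math import prod

# Estimated rows per table, copied from the four EXPLAIN outputs in this example.
# The labels are informal descriptions of each step, not MySQL output.
plans = {
    "initial": [74, 2135, 74, 3872],
    "after first alteration": [3872, 2135, 74, 1],
    "after second alteration": [74, 52, 1, 1],
    "after analyzing key distribution": [3872, 1, 1, 1],
}

def rows_product(rows):
    """Rough estimate of the row combinations a nested-loop join examines."""
    return prod(rows)

for label, rows in plans.items():
    print(f"{label}: {rows_product(rows):,}")

# Ratio between the first two plans (the slides state a factor of 74).
factor = rows_product(plans["initial"]) // rows_product(plans["after first alteration"])
print("first step reduces the product by a factor of", factor)
```

The product is only a coarse indicator: it treats every rows value as exact, whereas EXPLAIN reports estimates.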

Estimating Query Performance
• You can estimate query performance by counting
disk seeks.
– For small tables, you can usually find a row in one disk
seek (because the index is probably cached).
– For bigger tables, you can estimate that, using B-tree
indexes, you need this many seeks to find a row:
log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1

Estimating Query Performance
• In MySQL, an index block is usually 1,024 bytes and the data
pointer is usually 4 bytes. For a 500,000-row table with a key
value length of 3 bytes (the size of MEDIUMINT), the formula
indicates: log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
• This index would require storage of about 500,000 * 7 * 3/2 =
5.2MB (assuming a typical index buffer fill ratio of 2/3), so
you probably have much of the index in memory and so need
only one or two calls to read data to find the row.
• For writes, however, you need four seek requests to find
where to place a new index value and normally two seeks to
update the index and write the row.
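
The seek estimate above is easy to reproduce numerically. The sketch below, in Python, evaluates the same formula for the 500,000-row example; the function names estimate_seeks and index_size_bytes are made up for illustration, and the 2/3 fill ratio is the one assumed on the slide.

```python
import math

def estimate_seeks(row_count, index_block_length, index_length, data_pointer_length):
    """Estimated disk seeks to find one row through a B-tree index."""
    # Keys that fit in one index block, assuming blocks are about 2/3 full.
    keys_per_block = index_block_length / 3 * 2 / (index_length + data_pointer_length)
    return math.log(row_count) / math.log(keys_per_block) + 1

def index_size_bytes(row_count, index_length, data_pointer_length):
    """Approximate index storage at a 2/3 fill ratio."""
    return row_count * (index_length + data_pointer_length) * 3 / 2

# Values from the slide: 1,024-byte blocks, 3-byte MEDIUMINT key, 4-byte pointer.
seeks = estimate_seeks(500_000, 1024, 3, 4)
size = index_size_bytes(500_000, 3, 4)
print(f"estimated seeks: {seeks:.2f}")
print(f"index size: {size / 1e6:.2f} million bytes")
```

The unrounded result is a little under 4, which the slide rounds to 4 seeks.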
Measuring Performance
• Performance depends on so many different factors
that a difference of a few percentage points might
not be a decisive victory.
– The results might shift the opposite way when you test in a
different environment.

• Certain MySQL features help or do not help
performance depending on the workload.
– For completeness, always test performance with those
features turned on and turned off.

Measuring Performance
• To measure the speed of a specific MySQL
expression or function, invoke the BENCHMARK()
function using the mysql client program as follows:
BENCHMARK(loop_count,expression).
Example:
SELECT BENCHMARK(1000000,1+1);
• On a Pentium II 400MHz system, the result
shows that MySQL can execute 1,000,000 simple
addition expressions in 0.32 seconds.
Internal Details of MySQL Optimizations
IS NULL Optimization
LEFT JOIN and RIGHT JOIN Optimization
Nested-Loop Join Algorithms
DISTINCT Optimization
Optimizing IN/=ANY Subqueries
…
Read more at
http://dev.mysql.com/doc/refman/5.5/en/optimizationinternals.html







IS NULL Optimization
• If a WHERE clause includes a col_name IS NULL
condition for a column that is declared as NOT
NULL, that expression is optimized away.
– This optimization does not occur in cases when the
column might produce NULL anyway; for example, if it
comes from a table on the right side of a LEFT JOIN.

• MySQL can also optimize the combination
(col_name = expr OR col_name IS NULL), a form
that is common in resolved subqueries.
– EXPLAIN shows ref_or_null when this optimization is
used.

IS NULL Optimization
• Examples of queries that are optimized, assuming
that there is an index on columns a and b of table t2:
– SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;
– SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL;
– SELECT * FROM t1, t2
  WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;
– SELECT * FROM t1, t2
  WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);
– SELECT * FROM t1, t2
  WHERE (t1.a=t2.a AND t2.a IS NULL AND ...)
  OR (t1.a=t2.a AND t2.a IS NULL AND ...);
IS NULL Optimization
• ref_or_null works by first doing a read on the
reference key, and then a separate search for rows
with a NULL key value.
• Note that the optimization can handle only one IS
NULL level. In the following query, MySQL uses key
lookups only on the expression (t1.a=t2.a AND t2.a
IS NULL) and is not able to use the key part on b:
SELECT * FROM t1, t2
WHERE (t1.a=t2.a AND t2.a IS NULL)
OR (t1.b=t2.b AND t2.b IS NULL);
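
The two-step read behind ref_or_null (one lookup on the reference key, then a separate search for NULL keys) can be pictured with a toy in-memory index. This Python sketch is only an analogy: the dictionary stands in for an index, and the names build_index and ref_or_null_lookup are hypothetical, not MySQL internals.

```python
from collections import defaultdict

def build_index(rows, column):
    """Map each key value (None stands for NULL) to the rows that hold it."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index

def ref_or_null_lookup(index, value):
    """Rows where column = value OR column IS NULL, found with two index reads."""
    matches = []
    if value is not None:
        matches += index.get(value, [])   # 1) read on the reference key
    matches += index.get(None, [])        # 2) separate search for NULL keys
    return matches

# Hypothetical contents of table t2.
t2 = [
    {"a": 1, "b": 10},
    {"a": None, "b": 20},
    {"a": 2, "b": 30},
    {"a": 1, "b": 40},
]
index_on_a = build_index(t2, "a")
found = ref_or_null_lookup(index_on_a, 1)
print(found)
```

Both reads go through the index, so no full scan of the table is needed to answer a condition of the form t2.a = expr OR t2.a IS NULL.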
LEFT JOIN and RIGHT JOIN Optimization

• The join optimizer calculates the order in which tables
should be joined.
– The table read order forced by LEFT JOIN or
STRAIGHT_JOIN helps the join optimizer do its work much
more quickly, because there are fewer table permutations
to check.

Example:
SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;
– MySQL will do a full scan on b because the LEFT JOIN
forces it to be read before d.
LEFT JOIN and RIGHT JOIN Optimization

• The fix in this example is to reverse the order in
which a and b are listed in the FROM clause.
Before:
SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;

After:
SELECT * FROM b JOIN a LEFT JOIN c ON (c.key=a.key)
LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;

LEFT JOIN and RIGHT JOIN Optimization
• For a LEFT JOIN, if the WHERE condition is always false for
the generated NULL row, the LEFT JOIN is changed to a
normal join. For example, the WHERE clause in the following
query is never true for a generated NULL row, because every
t2 column in that row, including t2.column2, is NULL:
SELECT * FROM t1 LEFT JOIN t2 ON (t1.column1=t2.column1)
WHERE t2.column2=5;
• Therefore, it is safe to convert the query to a normal join:
SELECT * FROM t1, t2 WHERE t2.column2=5 AND
t1.column1=t2.column1;
• This can be made faster because MySQL can use table t2
before table t1 if doing so would result in a better query plan.
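
The equivalence of the two queries can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module purely as a convenient stand-in for MySQL; the join condition t1.column1 = t2.column1 is taken from the rewritten query on the slide, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER, column2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 5), (2, 7), (4, 5);
""")

# Original form: the WHERE clause rejects every generated NULL row.
left_join = conn.execute(
    "SELECT * FROM t1 LEFT JOIN t2 ON t1.column1 = t2.column1 "
    "WHERE t2.column2 = 5"
).fetchall()

# Converted form: a normal (inner) join with the same conditions.
inner_join = conn.execute(
    "SELECT * FROM t1, t2 "
    "WHERE t2.column2 = 5 AND t1.column1 = t2.column1"
).fetchall()

print(sorted(left_join) == sorted(inner_join))
```

The row of t1 with column1 = 3 has no match in t2, so the LEFT JOIN generates a NULL row for it, and the condition t2.column2 = 5 then filters that row out again.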
Nested-Loop Join Algorithms (NLJ)
• MySQL executes joins between tables using a
nested-loop algorithm or variations on it.
• Assume that a join between three tables t1, t2,
and t3 is to be executed using the following join
types:
Table   Join Type
t1      range
t2      ref
t3      ALL

Nested-Loop Join Algorithms (NLJ)
• If a simple NLJ algorithm is used, the join is
processed like this:
for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions,
      send to client
    }
  }
}
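
The pseudocode above translates almost line for line into Python. In this sketch the tables are plain lists of dictionaries, the column names id, ref, and k are hypothetical, and the ref access on t2 is simulated with a linear filter rather than a real index lookup.

```python
def nested_loop_join(t1, t2, t3, in_range, join_condition):
    """Simple nested-loop join over three in-memory tables (lists of dicts)."""
    results = []
    for r1 in t1:
        if not in_range(r1):              # t1: only rows matching the range
            continue
        for r2 in t2:
            if r2["ref"] != r1["id"]:     # t2: only rows matching the reference key
                continue
            for r3 in t3:                 # t3: full scan (join type ALL)
                if join_condition(r1, r2, r3):
                    results.append((r1, r2, r3))   # "send to client"
    return results

# Hypothetical sample data.
t1 = [{"id": i} for i in range(1, 6)]
t2 = [{"ref": 2, "k": "x"}, {"ref": 3, "k": "y"}, {"ref": 9, "k": "x"}]
t3 = [{"k": "x"}, {"k": "y"}, {"k": "z"}]

rows = nested_loop_join(
    t1, t2, t3,
    in_range=lambda r1: 2 <= r1["id"] <= 4,
    join_condition=lambda r1, r2, r3: r2["k"] == r3["k"],
)
for r1, r2, r3 in rows:
    print(r1["id"], r2["k"], r3["k"])
```

Note that t3 is scanned once for every t1, t2 combination that reaches the innermost loop, which is the cost that the block nested-loop variant reduces.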
Nested-Loop Join Algorithms (NLJ)
• A Block Nested-Loop (BNL) join algorithm uses
buffering of rows read in outer loops to reduce the
number of times that tables in inner loops must be
read.
• For example, if 10 rows are read into a buffer and
the buffer is passed to the next inner loop, each row
read in the inner loop can be compared against all
10 rows in the buffer. This reduces the number of
times the inner table must be read by an order of
magnitude.

Nested-Loop Join Algorithms (NLJ)
for each row in t1 matching range {
  for each row in t2 matching reference key {
    store used columns from t1, t2 in join buffer
    if buffer is full {
      for each row in t3 {
        for each t1, t2 combination in join buffer {
          if row satisfies join conditions,
          send to client
        }
      }
      empty buffer
    }
  }
}

if buffer is not empty {
  for each row in t3 {
    for each t1, t2 combination in join buffer {
      if row satisfies join conditions,
      send to client
    }
  }
}
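
The block nested-loop pseudocode above can be sketched in Python as follows. This is an illustration under simplifying assumptions: the join buffer capacity is counted in rows (buffer_rows) rather than bytes, the column names id, ref, and k are hypothetical, and the function also returns how many times t3 was scanned so the effect of the buffer is visible.

```python
def block_nested_loop_join(t1, t2, t3, in_range, join_condition, buffer_rows):
    """Block nested-loop join; returns (results, number of scans of t3)."""
    results, join_buffer, t3_scans = [], [], 0

    def flush():
        """Scan t3 once and compare each row with every buffered combination."""
        nonlocal t3_scans
        t3_scans += 1
        for r3 in t3:
            for r1, r2 in join_buffer:
                if join_condition(r1, r2, r3):
                    results.append((r1, r2, r3))   # "send to client"
        join_buffer.clear()                        # "empty buffer"

    for r1 in t1:
        if not in_range(r1):
            continue
        for r2 in t2:
            if r2["ref"] != r1["id"]:
                continue
            join_buffer.append((r1, r2))           # store the t1, t2 combination
            if len(join_buffer) == buffer_rows:    # buffer is full
                flush()
    if join_buffer:                                # buffer is not empty at the end
        flush()
    return results, t3_scans

# Hypothetical sample data: 10 t1, t2 combinations and a 3-row t3.
t1 = [{"id": i} for i in range(1, 11)]
t2 = [{"ref": i, "k": i % 3} for i in range(1, 11)]
t3 = [{"k": 0}, {"k": 1}, {"k": 2}]

def in_range(r1):
    return 1 <= r1["id"] <= 10

def same_k(r1, r2, r3):
    return r2["k"] == r3["k"]

for buffer_rows in (1, 4, 10):
    rows, scans = block_nested_loop_join(t1, t2, t3, in_range, same_k, buffer_rows)
    print(f"buffer of {buffer_rows} row(s): {len(rows)} result rows, {scans} scan(s) of t3")
```

With a buffer of one row the function behaves like the simple NLJ algorithm; larger buffers return the same rows while scanning t3 fewer times.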

Nested-Loop Join Algorithms (NLJ)
• S: the size of each stored t1, t2 combination
• C: the number of combinations in the buffer
• The number of times table t3 is scanned is:
(S * C)/join_buffer_size + 1
• The number of t3 scans decreases as the value of
join_buffer_size increases, up to the point when
join_buffer_size is large enough to hold all previous
row combinations. At that point, there is no speed to
be gained by making it larger.
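
A quick calculation shows the diminishing return described above. This Python sketch makes two assumptions that the slide does not state: the division is treated as integer division, and C is taken as the total number of t1, t2 combinations to be buffered. The values of S and C are hypothetical.

```python
def t3_scans(s, c, join_buffer_size):
    """Evaluate (S * C) / join_buffer_size + 1 with integer division (an assumption)."""
    return (s * c) // join_buffer_size + 1

# Hypothetical workload: 10,000 combinations of 100 bytes each.
S, C = 100, 10_000

sizes = (128 * 1024, 256 * 1024, 1024 * 1024, 4 * 1024 * 1024)
scan_counts = [t3_scans(S, C, size) for size in sizes]
for size, scans in zip(sizes, scan_counts):
    print(f"join_buffer_size={size}: {scans} scan(s) of t3")
```

Once the buffer can hold all S * C bytes, the result stays at one scan, so a larger join_buffer_size no longer helps.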
Optimizing IN/=ANY Subqueries
• To help the query optimizer better execute your
queries, use these tips:
– Declare a column as NOT NULL if it really is.
(This also helps other aspects of the optimizer.)
– If you don't need to distinguish a NULL from a FALSE
subquery result, you can easily avoid the slow execution
path. Replace a comparison that looks like this:
outer_expr IN (SELECT inner_expr FROM ...)
with this expression:
(outer_expr IS NOT NULL) AND (outer_expr IN
(SELECT inner_expr…
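
The difference between a NULL and a FALSE result can be observed directly. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, since both follow SQL three-valued logic for IN; the table inner_t, its contents, and the helper evaluate are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inner_t (inner_expr INTEGER);
    INSERT INTO inner_t VALUES (1), (2);
""")

def evaluate(sql, outer_expr):
    """Evaluate a single-value SELECT with outer_expr bound as a parameter."""
    return conn.execute(sql, {"outer_expr": outer_expr}).fetchone()[0]

plain = "SELECT :outer_expr IN (SELECT inner_expr FROM inner_t)"
rewritten = (
    "SELECT (:outer_expr IS NOT NULL) AND "
    "(:outer_expr IN (SELECT inner_expr FROM inner_t))"
)

# None is how Python passes SQL NULL.
for value in (1, 3, None):
    print(value, evaluate(plain, value), evaluate(rewritten, value))
```

For a NULL outer value the plain form yields NULL while the rewritten form yields FALSE. A WHERE clause rejects the row in both cases, which is why the rewrite is safe when the distinction does not matter.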

Optimizing IN/=ANY Subqueries
• Consider the following subquery comparison:
outer_expr IN (SELECT inner_expr FROM ...
WHERE subquery_where)
• MySQL evaluates queries “from outside to inside.”
– It first obtains the value of the outer expression
outer_expr, and then runs the subquery and captures the
rows that it produces.

• A very useful optimization is to “inform” the subquery
that the only rows of interest are those where the inner
expression inner_expr is equal to outer_expr. This is
done by pushing down an appropriate equality into the
subquery's WHERE clause.
Optimizing IN/=ANY Subqueries
• The comparison is converted from this:
outer_expr IN (SELECT inner_expr FROM ...
WHERE subquery_where)
to this:
EXISTS (SELECT 1 FROM ...
WHERE subquery_where AND
outer_expr=inner_expr)
• After the conversion, MySQL can use the pushed-down
equality to limit the number of rows that it
must examine when evaluating the subquery.
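
For values that are never NULL, the IN form and the EXISTS form select the same rows. The following sketch checks this on a small data set with Python's built-in sqlite3 module as a stand-in for MySQL; the tables outer_t and inner_t, the column active (playing the role of subquery_where), and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outer_t (outer_expr INTEGER NOT NULL);
    CREATE TABLE inner_t (inner_expr INTEGER NOT NULL, active INTEGER NOT NULL);
    INSERT INTO outer_t VALUES (1), (2), (3), (4);
    INSERT INTO inner_t VALUES (1, 1), (2, 0), (3, 1), (5, 1);
""")

# Original form: IN with a subquery filtered by "active = 1".
in_form = conn.execute("""
    SELECT outer_expr FROM outer_t
    WHERE outer_expr IN (SELECT inner_expr FROM inner_t WHERE active = 1)
""").fetchall()

# Converted form: EXISTS with the equality pushed into the subquery's WHERE.
exists_form = conn.execute("""
    SELECT outer_expr FROM outer_t
    WHERE EXISTS (SELECT 1 FROM inner_t
                  WHERE active = 1 AND outer_t.outer_expr = inner_t.inner_expr)
""").fetchall()

print(sorted(in_form) == sorted(exists_form))
```

Both columns are declared NOT NULL here on purpose: the simple conversion is only straightforward when neither side can be NULL.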
Salesforce Community Group Quito, Salesforce 101
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

6.3 my sql queryoptimization_part2

  • 1. Database Management Systems: Query Optimization. Dư Phương Hạnh, Department of Information Systems, Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi. hanhdp@vnu.edu.vn
  • 2. Outline
       ▪ Optimization Overview
       ▪ Optimizing SQL Statements
       ▪ Optimizing Database Structure
       ▪ Query Execution Plan
       ▪ Measuring Performance
       ▪ Internal Details of MySQL Optimizations
       Reading: Chapters 12, 13, and 14 of Ramakrishnan; http://dev.mysql.com/doc/refman/5.5/en/optimization.html
       Database Management Systems @ Department of Information Systems
  • 3. Query Execution Plan
  • 4. Query Execution Plan
       ▪ The set of operations that the optimizer chooses in order to execute a query most efficiently is called the "query execution plan".
       ▪ Depending on the details of your tables, columns, indexes, and the conditions in your WHERE clause, the MySQL optimizer considers many techniques to efficiently perform the lookups involved in an SQL query.
         – A query on a huge table can be performed without reading all the rows.
         – A join involving several tables can be performed without comparing every combination of rows.
       ▪ Your goal is to recognize the aspects of the EXPLAIN plan that indicate a query is optimized well.
  • 5. Optimizing Queries with EXPLAIN
       EXPLAIN SELECT select_options
       ▪ MySQL displays information from the optimizer about how tables are joined and in which order.
         – To give a hint to the optimizer to use a join order corresponding to the order in which the tables are named in a SELECT statement, begin the statement with SELECT STRAIGHT_JOIN rather than just SELECT.
       ▪ You can see where you should add indexes to tables so that the statement executes faster.
  • 6. EXPLAIN output format
       ▪ EXPLAIN returns a row of information for each table used in the SELECT statement.
         – In the output, the tables are listed in the order that MySQL would read them while processing the statement.
       ▪ MySQL resolves all joins using a nested-loop join method.
         – This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on.
         – When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows.
         – The next row is read from this table and the process continues with the next table.
  • 7. EXPLAIN output columns
       Column         Meaning
       id             The SELECT identifier
       select_type    The SELECT type
       table          The table for the output row
       type           The join type
       possible_keys  The possible indexes to choose
       key            The index actually chosen
       key_len        The length of the chosen key
       ref            The columns compared to the index
       rows           Estimate of rows to be examined
       Extra          Additional information
  • 8. EXPLAIN output columns
       ▪ select_type:
         – SIMPLE: Simple SELECT (not using UNION or subqueries).
         – PRIMARY: Outermost SELECT.
         – UNION: Second or later SELECT statement in a UNION.
         – DEPENDENT UNION: Second or later SELECT statement in a UNION, dependent on the outer query.
         – UNION RESULT: Result of a UNION.
         – SUBQUERY: First SELECT in a subquery.
         – DEPENDENT SUBQUERY: First SELECT in a subquery, dependent on the outer query.
         – DERIVED: Derived table SELECT (subquery in the FROM clause).
         – UNCACHEABLE SUBQUERY: A subquery for which the result cannot be cached and must be re-evaluated for each row of the outer query.
         – UNCACHEABLE UNION: The second or later SELECT in a UNION that belongs to an uncacheable subquery.
  • 9. EXPLAIN output columns
       ▪ type: The following list describes the join types. Apart from ALL, which is the worst type and is shown first for contrast, they are ordered from the best type to the worst:
         – ALL: A full table scan is done for each combination of rows from the previous tables.
         – system: The table has only one row (a system table). This is a special case of the const join type.
         – const: The table has at most one matching row, which is read at the start of the query, so values from the columns in this row can be regarded as constants by the rest of the optimizer. const tables are very fast because they are read only once. const is used when you compare all parts of a PRIMARY KEY or UNIQUE index to constant values.
  • 10. EXPLAIN output columns
         – eq_ref: One row is read from this table for each combination of rows from the previous tables. Other than the system and const types, this is the best possible join type. It is used when all parts of an index are used by the join and the index is a PRIMARY KEY or UNIQUE NOT NULL index. Examples:
           SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;
           SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
  • 11. EXPLAIN output columns
         – ref: All rows with matching index values are read from this table for each combination of rows from the previous tables. ref is used if the join uses only a leftmost prefix of the key or if the key is not a PRIMARY KEY or UNIQUE index (that is, the join cannot select a single row based on the key value). Examples:
           SELECT * FROM ref_table WHERE key_column=expr;
           SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;
           SELECT * FROM ref_table,other_table WHERE ref_table.key_column_part1=other_table.column AND ref_table.key_column_part2=1;
  • 12. EXPLAIN output columns
         – range: Only rows that are in a given range are retrieved, using an index to select the rows. The key column in the output row indicates which index is used. The key_len column contains the longest key part that was used. The ref column is NULL for this type. Examples:
           SELECT * FROM tbl_name WHERE key_column = 10;
           SELECT * FROM tbl_name WHERE key_column BETWEEN 10 AND 20;
           SELECT * FROM tbl_name WHERE key_part1 = 10 AND key_part2 IN (10,20,30);
  • 13. EXPLAIN output columns
         – ... (Read more at http://dev.mysql.com/doc/refman/5.5/en/explain-output.html)
  • 14. Optimizing join example
       EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
                      tt.ProjectReference, tt.EstimatedShipDate,
                      tt.ActualShipDate, tt.ClientID,
                      tt.ServiceCodes, tt.RepetitiveID,
                      tt.CurrentProcess, tt.CurrentDPPerson,
                      tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
                      et_1.COUNTRY, do.CUSTNAME
               FROM tt, et, et AS et_1, do
               WHERE tt.SubmitTime IS NULL
                 AND tt.ActualPC = et.EMPLOYID
                 AND tt.AssignedPC = et_1.EMPLOYID
                 AND tt.ClientID = do.CUSTNMBR;
  • 15. Optimizing join example
       Table  Column      Data Type
       tt     ActualPC    CHAR(10)
       tt     AssignedPC  CHAR(10)
       tt     ClientID    CHAR(10)
       et     EMPLOYID    CHAR(15)
       do     CUSTNMBR    CHAR(15)

       Table  Index
       tt     ActualPC
       tt     AssignedPC
       tt     ClientID
       et     EMPLOYID (primary key)
       do     CUSTNMBR (primary key)
  • 16. Optimizing join example
       Initially, before any optimizations have been performed, the EXPLAIN statement produces the following information:
       table  type  possible_keys  key   key_len  ref   rows  Extra
       et     ALL   PRIMARY        NULL  NULL     NULL  74
       do     ALL   PRIMARY        NULL  NULL     NULL  2135
       et_1   ALL   PRIMARY        NULL  NULL     NULL  74
       tt     ALL   AssignedPC,    NULL  NULL     NULL  3872
                    ClientID,
                    ActualPC
  • 17. Optimizing join example
       ▪ Because type is ALL for every table, this output indicates that MySQL is generating a Cartesian product of all the tables.
       ▪ This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 × 2135 × 74 × 3872 = 45,268,558,720 rows. If the tables were bigger, the query would take even longer.
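The row count on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the dictionary name `rows_per_table` is an illustrative choice, and the values are the `rows` estimates from the initial EXPLAIN output.

```python
from math import prod

# Row estimates taken from the `rows` column of the initial EXPLAIN output.
rows_per_table = {"et": 74, "do": 2135, "et_1": 74, "tt": 3872}

# With every table read by a full scan (type ALL), the nested-loop join
# may have to examine every combination of rows.
combinations = prod(rows_per_table.values())
print(f"{combinations:,}")
```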
  • 18. Optimizing join example
       ▪ One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size.
       ▪ In this context, VARCHAR and CHAR are considered the same if they are declared as the same size. tt.ActualPC is declared as CHAR(10) and et.EMPLOYID is CHAR(15), so there is a length mismatch.
       ▪ ALTER TABLE…
  • 19. Optimizing join example
       Executing the EXPLAIN statement again produces this result:
       table  type    possible_keys  key      key_len  ref          rows  Extra
       tt     ALL     AssignedPC,    NULL     NULL     NULL         3872  Using where
                      ClientID,
                      ActualPC
       do     ALL     PRIMARY        NULL     NULL     NULL         2135
       et_1   ALL     PRIMARY        NULL     NULL     NULL         74
       et     eq_ref  PRIMARY        PRIMARY  15       tt.ActualPC  1
       This is not perfect, but it is much better: the product of the rows values is smaller by a factor of 74. This version executes in a couple of seconds.
  • 20. Optimizing join example
       ▪ A second alteration can be made to eliminate the column length mismatches for the tt.AssignedPC = et_1.EMPLOYID and tt.ClientID = do.CUSTNMBR comparisons:
       table  type    possible_keys  key       key_len  ref            rows  Extra
       et     ALL     PRIMARY        NULL      NULL     NULL           74
       tt     ref     AssignedPC,    ActualPC  15       et.EMPLOYID    52    Using where
                      ClientID,
                      ActualPC
       et_1   eq_ref  PRIMARY        PRIMARY   15       tt.AssignedPC  1
       do     eq_ref  PRIMARY        PRIMARY   15       tt.ClientID    1
  • 21. Optimizing join example
       ▪ At this point, the query is optimized almost as well as possible.
       ▪ The remaining problem is that, by default, MySQL assumes that values in the tt.ActualPC column are evenly distributed, and that is not the case for the tt table. It is easy to tell MySQL to analyze the key distribution using the ANALYZE TABLE statement.
       ▪ With the additional index information, the join is perfect:
  • 22. Optimizing join example
       table  type    possible_keys  key      key_len  ref            rows  Extra
       tt     ALL     AssignedPC,    NULL     NULL     NULL           3872  Using where
                      ClientID,
                      ActualPC
       et     eq_ref  PRIMARY        PRIMARY  15       tt.ActualPC    1
       et_1   eq_ref  PRIMARY        PRIMARY  15       tt.AssignedPC  1
       do     eq_ref  PRIMARY        PRIMARY  15       tt.ClientID    1
  • 23. Estimating Query Performance
       ▪ You can estimate query performance by counting disk seeks.
         – For small tables, you can usually find a row in one disk seek (because the index is probably cached).
         – For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row:
           log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1
  • 24. Estimating Query Performance
       ▪ In MySQL, an index block is usually 1,024 bytes and the data pointer is usually 4 bytes. For a 500,000-row table with a key value length of 3 bytes (the size of MEDIUMINT), the formula gives log(500,000) / log(1024 / 3 * 2 / (3 + 4)) + 1, which is about 4 seeks.
       ▪ This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and need only one or two calls to read data to find the row.
       ▪ For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.
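The seek and storage estimates above can be reproduced with a short Python sketch. The function names `estimate_seeks` and `estimate_index_bytes` are illustrative, not MySQL functions, and the default arguments are simply the values quoted on the slide.

```python
import math


def estimate_seeks(row_count, index_block_length=1024,
                   index_length=3, data_pointer_length=4):
    """Estimated seeks to find a row through a B-tree index (slide formula)."""
    # Index entries per block, assuming blocks are about 2/3 full.
    entries_per_block = (index_block_length / 3 * 2
                         / (index_length + data_pointer_length))
    return math.log(row_count) / math.log(entries_per_block) + 1


def estimate_index_bytes(row_count, index_length=3, data_pointer_length=4):
    """Approximate index size in bytes, again assuming a 2/3 fill ratio."""
    return row_count * (index_length + data_pointer_length) * 3 / 2


seeks = estimate_seeks(500_000)
index_bytes = estimate_index_bytes(500_000)
print(round(seeks, 2), index_bytes)
```

The formula yields a fractional value slightly below 4, which the slide rounds up to 4 seeks.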
  • 25. Measuring Performance
       ▪ Performance depends on so many different factors that a difference of a few percentage points might not be a decisive victory.
         – The results might shift the opposite way when you test in a different environment.
       ▪ Certain MySQL features help or do not help performance depending on the workload.
         – For completeness, always test performance with those features turned on and turned off.
  • 26. Measuring Performance
       ▪ To measure the speed of a specific MySQL expression or function, invoke the BENCHMARK() function using the mysql client program as follows: BENCHMARK(loop_count,expression). Example:
         SELECT BENCHMARK(1000000,1+1);
       ▪ On a Pentium II 400MHz system, the result shows that MySQL can execute 1,000,000 simple addition expressions in 0.32 seconds.
  • 27. Internal Details of MySQL Optimizations
  • 28. Internal Details of MySQL Optimizations
       ▪ IS NULL Optimization
       ▪ LEFT JOIN and RIGHT JOIN Optimization
       ▪ Nested-Loop Join Algorithms
       ▪ DISTINCT Optimization
       ▪ Optimizing IN/=ANY Subqueries
       ▪ … Read more at http://dev.mysql.com/doc/refman/5.5/en/optimizationinternals.html
  • 29. IS NULL Optimization
       ▪ If a WHERE clause includes a col_name IS NULL condition for a column that is declared as NOT NULL, that expression is optimized away.
         – This optimization does not occur in cases when the column might produce NULL anyway; for example, if it comes from a table on the right side of a LEFT JOIN.
       ▪ MySQL can also optimize the combination (col_name = expr OR col_name IS NULL), a form that is common in resolved subqueries.
         – EXPLAIN shows ref_or_null when this optimization is used.
  • 30. IS NULL Optimization
       ▪ Examples of queries that are optimized, assuming that there is an index on columns a and b of table t2:
         – SELECT * FROM t1 WHERE t1.a=expr OR t1.a IS NULL;
         – SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL;
         – SELECT * FROM t1, t2 WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;
         – SELECT * FROM t1, t2 WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);
         – SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL AND ...) OR (t1.a=t2.a AND t2.a IS NULL AND ...);
  • 31. IS NULL Optimization
       ▪ ref_or_null works by first doing a read on the reference key, and then a separate search for rows with a NULL key value.
       ▪ Note that the optimization can handle only one IS NULL level. In the following query, MySQL uses key lookups only on the expression (t1.a=t2.a AND t2.a IS NULL) and is not able to use the key part on b:
         SELECT * FROM t1, t2 WHERE (t1.a=t2.a AND t2.a IS NULL) OR (t1.b=t2.b AND t2.b IS NULL);
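The two-step behaviour of ref_or_null can be pictured with a toy in-memory index. This is only an illustration of the idea, not MySQL's implementation: the helpers `build_index` and `ref_or_null_lookup`, the sample rows, and the use of a Python dictionary as an index are all assumptions made for the sketch, and the lookup value is assumed to be non-NULL.

```python
from collections import defaultdict


def build_index(rows, column):
    """Toy index: maps each key value (None stands for NULL) to its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


def ref_or_null_lookup(index, key_value):
    """Read the rows for key_value, then separately the rows with a NULL key."""
    matches = list(index.get(key_value, []))  # step 1: read on the reference key
    matches += index.get(None, [])            # step 2: separate search for NULL keys
    return matches


# Hypothetical contents of table t2, with an index on column a.
t2 = [
    {"a": 1, "b": "x"},
    {"a": None, "b": "y"},
    {"a": 2, "b": "z"},
    {"a": 1, "b": "w"},
]
index_on_a = build_index(t2, "a")
found = ref_or_null_lookup(index_on_a, 1)
print([row["b"] for row in found])
```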
  • 32. LEFT JOIN and RIGHT JOIN Optimization
       ▪ The join optimizer calculates the order in which tables should be joined.
         – The table read order forced by LEFT JOIN or STRAIGHT_JOIN helps the join optimizer do its work much more quickly, because there are fewer table permutations to check. Example:
           SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;
         – MySQL will do a full scan on b because the LEFT JOIN forces it to be read before d.
  • 33. LEFT JOIN and RIGHT JOIN Optimization
       ▪ The fix in this example is to reverse the order in which a and b are listed in the FROM clause.
         Before:
           SELECT * FROM a JOIN b LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;
         After:
           SELECT * FROM b JOIN a LEFT JOIN c ON (c.key=a.key) LEFT JOIN d ON (d.key=a.key) WHERE b.key=d.key;
  • 34. LEFT JOIN and RIGHT JOIN Optimization
       ▪ For a LEFT JOIN, if the WHERE condition is always false for the generated NULL row, the LEFT JOIN is changed to a normal join. For example, the WHERE clause would be false in the following query if t2.column1 were NULL:
         SELECT * FROM t1 LEFT JOIN t2 ON (column1) WHERE t2.column2=5;
       ▪ Therefore, it is safe to convert the query to a normal join:
         SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1;
       ▪ This can be made faster because MySQL can use table t2 before table t1 if doing so would result in a better query plan.
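The equivalence claimed on this slide can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made purely for convenience; the sample rows are hypothetical, and the join condition is written out as t1.column1 = t2.column1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER);
    CREATE TABLE t2 (column1 INTEGER, column2 INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);      -- row 3 has no partner in t2
    INSERT INTO t2 VALUES (1, 5), (2, 7);
""")

# LEFT JOIN: the NULL-extended row for t1.column1 = 3 fails t2.column2 = 5.
left_join_rows = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1 LEFT JOIN t2 ON t1.column1 = t2.column1
    WHERE t2.column2 = 5
    ORDER BY t1.column1
""").fetchall()

# Normal (inner) join with the same conditions.
inner_join_rows = conn.execute("""
    SELECT t1.column1, t2.column2
    FROM t1, t2
    WHERE t2.column2 = 5 AND t1.column1 = t2.column1
    ORDER BY t1.column1
""").fetchall()

print(left_join_rows, inner_join_rows)
```

Both queries return the same rows because the WHERE condition rejects every NULL-extended row that the LEFT JOIN generates.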
  • 35. Nested-Loop Join Algorithms (NLJ)
       ▪ MySQL executes joins between tables using a nested-loop algorithm or variations on it.
       ▪ Assume that a join between three tables t1, t2, and t3 is to be executed using the following join types:
         Table  Join Type
         t1     range
         t2     ref
         t3     ALL
  • 36. Nested-Loop Join Algorithms (NLJ)
       ▪ If a simple NLJ algorithm is used, the join is processed like this:
         for each row in t1 matching range {
           for each row in t2 matching reference key {
             for each row in t3 {
               if row satisfies join conditions, send to client
             }
           }
         }
  • 37. Nested-Loop Join Algorithms (NLJ)
       ▪ A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read.
       ▪ For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. This reduces the number of times the inner table must be read by an order of magnitude.
  • 38. Nested-Loop Join Algorithms (NLJ)
         for each row in t1 matching range {
           for each row in t2 matching reference key {
             store used columns from t1, t2 in join buffer
             if buffer is full {
               for each row in t3 {
                 for each t1, t2 combination in join buffer {
                   if row satisfies join conditions, send to client
                 }
               }
               empty buffer
             }
           }
         }
         if buffer is not empty {
           for each row in t3 {
             for each t1, t2 combination in join buffer {
               if row satisfies join conditions, send to client
             }
           }
         }
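The simple and block nested-loop pseudocode can be compared with a small Python simulation. This is a sketch under simplifying assumptions: the outer rows stand for the already-joined t1, t2 combinations, the buffer is sized in rows rather than bytes, and the function names and sample data are hypothetical.

```python
def nested_loop_join(outer_rows, inner_rows, matches):
    """Simple NLJ: scan the inner table once per outer row."""
    result, inner_scans = [], 0
    for outer in outer_rows:
        inner_scans += 1
        for inner in inner_rows:
            if matches(outer, inner):
                result.append((outer, inner))
    return result, inner_scans


def block_nested_loop_join(outer_rows, inner_rows, matches, buffer_rows):
    """BNL: buffer outer rows, then scan the inner table once per buffer load."""
    result, inner_scans, buffer = [], 0, []

    def flush():
        nonlocal inner_scans
        inner_scans += 1
        for inner in inner_rows:
            for outer in buffer:
                if matches(outer, inner):
                    result.append((outer, inner))
        buffer.clear()

    for outer in outer_rows:
        buffer.append(outer)
        if len(buffer) == buffer_rows:  # buffer is full
            flush()
    if buffer:  # leftover rows that did not fill a whole buffer
        flush()
    return result, inner_scans


# Hypothetical data: 25 outer combinations and a small inner table t3.
outer_rows = list(range(25))
t3 = [0, 5, 10, 15, 20, 30]


def same_value(outer, inner):
    return outer == inner


nlj_result, nlj_scans = nested_loop_join(outer_rows, t3, same_value)
bnl_result, bnl_scans = block_nested_loop_join(outer_rows, t3, same_value, buffer_rows=10)
print(nlj_scans, bnl_scans)
```

Both functions return the same matching pairs (possibly in a different order), but the buffered version scans t3 once per buffer load instead of once per outer row.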
  • 39. Nested-Loop Join Algorithms (NLJ)
       ▪ S: the size of each stored t1, t2 combination
       ▪ C: the number of combinations in the buffer
       ▪ The number of times table t3 is scanned is:
         (S * C) / join_buffer_size + 1
       ▪ The number of t3 scans decreases as the value of join_buffer_size increases, up to the point when join_buffer_size is large enough to hold all previous row combinations. At that point, there is no speed to be gained by making it larger.
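The scan-count formula can be tried out with a few buffer sizes. This sketch reads the division as integer division, which is an assumption, and the values chosen for S, C, and the buffer sizes are hypothetical.

```python
def t3_scans(combination_size, combination_count, join_buffer_size):
    """Number of t3 scans per the slide: (S * C) / join_buffer_size + 1."""
    # Integer division is an assumption about how the formula is meant to be read.
    return (combination_size * combination_count) // join_buffer_size + 1


S = 64        # hypothetical bytes per stored t1, t2 combination
C = 10_000    # hypothetical number of combinations

scans_by_buffer = {size: t3_scans(S, C, size)
                   for size in (131_072, 262_144, 1_048_576)}
print(scans_by_buffer)
```

Once the buffer is large enough to hold all S * C bytes, the estimate stays at a single scan, which matches the remark that a larger buffer brings no further gain.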
  • 40. Optimizing IN/=ANY Subqueries
       ▪ To help the query optimizer better execute your queries, use these tips:
         – A column must be declared as NOT NULL if it really is. (This also helps other aspects of the optimizer.)
         – If you don't need to distinguish a NULL from a FALSE subquery result, you can easily avoid the slow execution path. Replace a comparison that looks like this:
           outer_expr IN (SELECT inner_expr FROM ...)
           with this expression:
           (outer_expr IS NOT NULL) AND (outer_expr IN (SELECT inner_expr…
  • 41. Optimizing IN/=ANY Subqueries
         outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
       ▪ MySQL evaluates queries "from outside to inside."
         – It first obtains the value of the outer expression outer_expr, and then runs the subquery and captures the rows that it produces.
       ▪ A very useful optimization is to "inform" the subquery that the only rows of interest are those where the inner expression inner_expr is equal to outer_expr. This is done by pushing down an appropriate equality into the subquery's WHERE clause.
  • 42. Optimizing IN/=ANY Subqueries
       ▪ The comparison
         outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
         is converted to this:
         EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)
       ▪ After the conversion, MySQL can use the pushed-down equality to limit the number of rows that it must examine when evaluating the subquery.
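The IN-to-EXISTS rewrite can be checked by hand on a small example. The sketch below again uses Python's sqlite3 module as a stand-in for MySQL, which is an assumption; the tables `orders` and `customers` and their contents are hypothetical, and the data contains no NULL values, so the NULL-versus-FALSE distinction does not arise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER);
    CREATE TABLE customers (id INTEGER, active INTEGER);
    INSERT INTO orders VALUES (1), (2), (3), (4);
    INSERT INTO customers VALUES (1, 1), (2, 0), (3, 1);
""")

# Original form: outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)
in_rows = conn.execute("""
    SELECT customer_id FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE active = 1)
    ORDER BY customer_id
""").fetchall()

# Rewritten form: the equality outer_expr = inner_expr is pushed into the subquery.
exists_rows = conn.execute("""
    SELECT customer_id FROM orders
    WHERE EXISTS (SELECT 1 FROM customers
                  WHERE active = 1 AND orders.customer_id = customers.id)
    ORDER BY customer_id
""").fetchall()

print(in_rows, exists_rows)
```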