3. Agenda
● Indexes
- What is an index
- B-Tree
- More about indexes
● Queries
- Temporary Tables and filesort in MySQL
- GROUP BY optimization
- Order By Optimization
4. Indexing in a Nutshell
● An index is a data structure created to speed up
access to the database so that your queries run
faster.
● Queries can run without any indexes, but they can take
a really long time.
6. Overhead of Indexing
Writes:
Updating data means updating the indexes on it
Reads:
Additional indexes waste disk space and memory, and
add overhead during query optimization.
Indexes are costly; don’t add more than you need.
7. Types of Indexes
● BTREE => Majority of indexes in MySQL
● RTREE => MyISAM only, for GIS (InnoDB supports spatial indexes from 5.7)
● HASH => MEMORY, NDB
● FULLTEXT => MyISAM, InnoDB starting from 5.6
BTREE is the default index type except for the MEMORY engine, which defaults to HASH.
RTREE is for queries like “show me all cities within 100 miles of Alex”.
NDB is the MySQL Cluster storage engine.
InnoDB FULLTEXT indexes have an inverted index design.
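As a rough illustration of how these index types are declared, here is a minimal sketch. The table and index names (session_cache, articles, idx_user, ft_title_body) are hypothetical and not part of the original deck.

```sql
-- MEMORY tables use HASH indexes by default; BTREE can be requested explicitly.
CREATE TABLE session_cache (
  session_id INT NOT NULL,
  user_id    INT NOT NULL,
  PRIMARY KEY (session_id),            -- HASH by default on MEMORY
  KEY idx_user (user_id) USING BTREE   -- explicit BTREE, usable for range scans
) ENGINE=MEMORY;

-- InnoDB FULLTEXT index (MySQL 5.6 and later).
CREATE TABLE articles (
  id    INT NOT NULL AUTO_INCREMENT,
  title VARCHAR(200) NOT NULL,
  body  TEXT NOT NULL,
  PRIMARY KEY (id),
  FULLTEXT KEY ft_title_body (title, body)
) ENGINE=InnoDB;

-- A FULLTEXT index is queried with MATCH ... AGAINST, not with = or LIKE.
SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('index optimization' IN NATURAL LANGUAGE MODE);
```

SHOW INDEX FROM session_cache reports the Index_type actually chosen for each key, which is a quick way to check the defaults on a given server.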
8. B-Tree
B-Trees were originally described as
generalizations of binary search trees (BSTs).
The generalization is that instead of one value,
each node holds a list of values,
and the list is of size n (n > 2).
[Figure: binary search tree (BST)]
9. B+Tree
[Figure: B+Tree. Labels: Branch/Root Node, "less than 3", Leaf Node, Data Pointers]
● Point lookup: SELECT * FROM table WHERE id=2
● Range scan: SELECT * FROM table WHERE id IN (2,4,6)
● In InnoDB, leaf pages have pointers in both directions, optimized for range scans
● Pages have free space, so data can be added later.
10. B+Tree
InnoDB uses a B+Tree structure for its indexes. A B+Tree is
particularly efficient when data doesn’t fit in memory and
must be read from the disk, as it ensures that a fixed
maximum number of reads would be required to access
any data requested, based only on the depth of the tree,
which scales nicely.
11. B-Tree VS B+Tree
● B+Trees don't store data pointers in interior nodes; they are stored
ONLY in leaf nodes, whereas in a B-Tree interior nodes can hold them too.
This means that interior nodes can fit more keys in a block of memory.
● The leaf nodes of a B+Tree are linked, so a linear scan of all
keys requires just one pass through the leaf nodes. A B-Tree,
on the other hand, would require a traversal of every level
in the tree. This property can also be used for efficient range
searches, since data is stored only in the leaves.
12. What Operations can B+Tree do?
● Find all rows with KEY=5 (point lookup)
● Find all rows with KEY>5 (open range)
● Find all rows with 5<KEY<10 (closed range)
● Cannot efficiently find rows where the last digit of KEY is zero
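The four cases above can be tried out with EXPLAIN. This is only a sketch: the items table, its columns, and the idx_k index are assumptions made up for illustration, and the plan the optimizer actually picks depends on the data and the MySQL version.

```sql
CREATE TABLE items (
  id      INT NOT NULL AUTO_INCREMENT,
  k       INT NOT NULL,
  payload VARCHAR(100) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_k (k)
) ENGINE=InnoDB;

EXPLAIN SELECT * FROM items WHERE k = 5;             -- point lookup on idx_k
EXPLAIN SELECT * FROM items WHERE k > 5;             -- open range on idx_k
EXPLAIN SELECT * FROM items WHERE k > 5 AND k < 10;  -- closed range on idx_k

-- The predicate is an expression on k, not a range of key values,
-- so the B+Tree cannot seek to it and every row or index entry is examined.
EXPLAIN SELECT * FROM items WHERE k % 10 = 0;
```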
13. Summary
● Linear search is very slow; its complexity is O(n)
● Indexes improve search performance
● There are many different types of indexes
● B-Tree indexes and derivatives are the most common (MyISAM, InnoDB)
● But indexes add extra cost to INSERT/UPDATE/DELETE
14. Indexes in MyISAM vs InnoDB
MyISAM:
Data pointers point to a physical offset in the data file.
InnoDB:
Primary key: stores the row data in the leaf pages of the
index, not a pointer.
Secondary keys: store the primary key value as the data pointer.
15. Indexing an InnoDB table
1. Every table has a primary key; if the CREATE TABLE does not specify one,
the first non-NULL unique key is used, and failing that, a 48-bit hidden
“Row ID” field is automatically added to the table structure and used as
the primary key. Always add a primary key yourself. The hidden one is
useless to you but still costs 6 bytes per row.
2. The “row data” (non-PRIMARY KEY fields) are stored in the PRIMARY KEY
index structure, which is also called the “clustered key”. This index
structure is keyed on the PRIMARY KEY fields, and the row data is the
value attached to that key.
3. Secondary keys are stored in an identical index structure, but they are
keyed on the KEY fields, and the primary key value (PKV) is attached to
that key.
16. Indexing an InnoDB table
Data is clustered by primary key
- For comments, (POST_ID, COMMENT_ID) can be a good
PRIMARY KEY, because it stores all comments for a single post
close together.
- The primary key is implicitly appended to all secondary indexes
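A minimal sketch of the comments example, assuming a hypothetical post_comments table (the column names and the idx_author key are illustrative, not from the deck):

```sql
CREATE TABLE post_comments (
  post_id    INT UNSIGNED NOT NULL,
  comment_id INT UNSIGNED NOT NULL,
  author_id  INT UNSIGNED NOT NULL,
  body       TEXT NOT NULL,
  PRIMARY KEY (post_id, comment_id),  -- clusters all comments of a post together
  KEY idx_author (author_id)          -- entries also carry (post_id, comment_id)
) ENGINE=InnoDB;

-- Reads one contiguous range of the clustered index.
SELECT comment_id, body
FROM post_comments
WHERE post_id = 42
ORDER BY comment_id;

-- Because the primary key is appended to idx_author, this query
-- can be answered from the secondary index without touching the rows.
SELECT post_id, comment_id
FROM post_comments
WHERE author_id = 7;
```

The trade-off is that a wide primary key makes every secondary index larger, since each secondary entry stores a copy of it.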
17. KEY(A, B, C) -- ORDER of columns matters
Index is USED
● A>5
● A=5 AND B>6
● A=5 AND B=6 AND C=7
● A=5 AND B IN (2,3) AND C>5
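Index is NOT USED
● B<5 -> leading column is not referenced
● B=6 AND C=7 -> leading column is not referenced
Index is PARTIALLY USED
● A>5 AND B=2 -> range on first column, only this key part is used
● A=5 AND B>6 AND C=2 -> range on second column, 2 parts are used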
Multiple Column Index
18. First Rule of MySQL Optimizer
MySQL stops using key parts of a multi-part index as
soon as it meets a real range (>, <, BETWEEN). It can, however,
continue using key parts further to the right if an IN (...)
list is used.
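One way to observe this rule is the key_len column of EXPLAIN, which reports how many bytes of the index are used for the lookup. The sketch below assumes a hypothetical table t with three INT NOT NULL columns, so each key part contributes 4 bytes; the optimizer may still prefer a different plan on small or empty tables.

```sql
CREATE TABLE t (
  id      INT NOT NULL AUTO_INCREMENT,
  a       INT NOT NULL,
  b       INT NOT NULL,
  c       INT NOT NULL,
  payload VARCHAR(100) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_abc (a, b, c)
) ENGINE=InnoDB;

-- When idx_abc is chosen, key_len indicates the number of parts used:
-- 4 = (a), 8 = (a, b), 12 = (a, b, c).
EXPLAIN SELECT * FROM t WHERE a = 5 AND b = 6 AND c = 7;       -- all 3 parts
EXPLAIN SELECT * FROM t WHERE a = 5 AND b > 6 AND c = 2;       -- stops after the range on b
EXPLAIN SELECT * FROM t WHERE a > 5 AND b = 2;                 -- stops after the range on a
EXPLAIN SELECT * FROM t WHERE a = 5 AND b IN (2, 3) AND c > 5; -- IN list keeps going, 3 parts
EXPLAIN SELECT * FROM t WHERE b = 6 AND c = 7;                 -- leading column a is missing
```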
19. Use Index for Sorting
SELECT * FROM players ORDER BY score DESC LIMIT 10
● Use an index on the score column
● Without the index, MySQL will do a “filesort” (very
expensive)
SELECT * FROM players WHERE country='US' ORDER BY
score DESC LIMIT 10
● Best served by an index on KEY(country, score)
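A sketch of both cases, assuming a hypothetical players schema (only the score and country columns come from the slide; the rest is made up for illustration):

```sql
CREATE TABLE players (
  id      INT NOT NULL AUTO_INCREMENT,
  name    VARCHAR(50) NOT NULL,
  country CHAR(2) NOT NULL,
  score   INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_score (score),
  KEY idx_country_score (country, score)
) ENGINE=InnoDB;

-- Can read idx_score backwards and stop after 10 entries.
EXPLAIN SELECT * FROM players ORDER BY score DESC LIMIT 10;

-- Equality on country, then rows already ordered by score within that country.
EXPLAIN SELECT * FROM players WHERE country = 'US' ORDER BY score DESC LIMIT 10;
```

If the index is used for ordering, the Extra column should not contain "Using filesort".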
20. Multi Column Indexes for efficient sorting
KEY(A, B)
Index is USED
● ORDER BY A
● A=5 ORDER BY B
● ORDER BY A DESC, B DESC
● A>5 ORDER BY A
Index is NOT USED
● ORDER BY B
● A>5 ORDER BY B
● A IN (1,2) ORDER BY B
● ORDER BY A ASC, B DESC
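21. MySQL Index for Sorting Rules
● You can’t sort two columns in different directions
(unless the index itself is declared with mixed directions,
which MySQL 8.0 descending indexes allow)
● You can only use equality comparisons on columns
which are not part of the ORDER BY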
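The sorting cases for KEY(A, B) can be checked with EXPLAIN. This is a sketch on a hypothetical table s; the LIMIT is there only to make an ordered index read attractive to the optimizer, and actual plans may differ.

```sql
CREATE TABLE s (
  id      INT NOT NULL AUTO_INCREMENT,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_ab (a, b)
) ENGINE=InnoDB;

EXPLAIN SELECT * FROM s WHERE a = 5 ORDER BY b LIMIT 10;        -- index order, no filesort expected
EXPLAIN SELECT * FROM s WHERE a > 5 ORDER BY b LIMIT 10;        -- range on a: filesort expected
EXPLAIN SELECT * FROM s WHERE a IN (1, 2) ORDER BY b LIMIT 10;  -- IN on a: filesort expected
EXPLAIN SELECT * FROM s ORDER BY a ASC, b DESC LIMIT 10;        -- mixed directions: filesort expected

-- MySQL 8.0 and later only: an index declared with mixed directions
-- can serve the last query. Earlier versions parse DESC but ignore it.
ALTER TABLE s ADD KEY idx_a_bdesc (a ASC, b DESC);
```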
22. Avoiding Reading the Data
Covering Index
● Reads the index only, without accessing the row data.
SELECT status FROM orders WHERE customer_id=123
KEY(customer_id, status)
● Helps MIN/MAX aggregate functions (only)
SELECT MAX(salary) FROM employees GROUP BY dept_id
KEY(dept_id, salary)
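Both covering-index cases can be sketched as follows. The table definitions are assumptions made for illustration; only the key definitions and the two queries come from the slide.

```sql
CREATE TABLE orders (
  id          INT NOT NULL AUTO_INCREMENT,
  customer_id INT NOT NULL,
  status      VARCHAR(20) NOT NULL,
  total       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_customer_status (customer_id, status)
) ENGINE=InnoDB;

-- Every referenced column is in the index: Extra should show "Using index".
EXPLAIN SELECT status FROM orders WHERE customer_id = 123;

-- total is not in the index, so each match needs a lookup in the row data.
EXPLAIN SELECT status, total FROM orders WHERE customer_id = 123;

CREATE TABLE employees (
  id      INT NOT NULL AUTO_INCREMENT,
  dept_id INT NOT NULL,
  salary  INT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_dept_salary (dept_id, salary)
) ENGINE=InnoDB;

-- MAX per group can be read from the last entry of each dept_id range;
-- Extra may show "Using index for group-by" when this optimization applies.
EXPLAIN SELECT dept_id, MAX(salary) FROM employees GROUP BY dept_id;
```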
23. Indexes and Joins
SELECT * FROM posts, comments WHERE author='Peter'
AND comments.post_id = posts.id
- Scan the posts table for posts with author='Peter'
- For each row, go to the comments table and fetch
related comments.
● Index comments.post_id
● Index posts.id is not used in this case.
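A sketch of this join with the index that matters, assuming hypothetical posts and comments schemas (column types and the idx_post_id name are illustrative):

```sql
CREATE TABLE posts (
  id     INT NOT NULL AUTO_INCREMENT,
  author VARCHAR(50) NOT NULL,
  title  VARCHAR(200) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE comments (
  id      INT NOT NULL AUTO_INCREMENT,
  post_id INT NOT NULL,
  body    TEXT NOT NULL,
  PRIMARY KEY (id),
  KEY idx_post_id (post_id)  -- lookup used once per matching post
) ENGINE=InnoDB;

-- Expected shape of the plan: posts is read first and filtered by author,
-- then comments is accessed through idx_post_id for each post found.
EXPLAIN
SELECT *
FROM posts
JOIN comments ON comments.post_id = posts.id
WHERE posts.author = 'Peter';
```

With this join order the lookup goes from posts into comments, which is why the index on posts.id plays no role here.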
24. Using Multiple Indexes for the Table
MySQL can use more than one index per table (index merge).
SELECT * FROM table WHERE A=5 AND B=6
Can use KEY(A) & KEY(B)
KEY(A, B) is much better.
SELECT * FROM table WHERE A=5 OR B=6
2 separate indexes are as good as it gets.
KEY(A, B) can’t be used.
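The two cases can be compared with EXPLAIN. The table m and its index names are hypothetical, and whether the optimizer actually chooses an index merge depends on statistics and version.

```sql
CREATE TABLE m (
  id      INT NOT NULL AUTO_INCREMENT,
  a       INT NOT NULL,
  b       INT NOT NULL,
  payload VARCHAR(100) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY idx_a (a),
  KEY idx_b (b)
) ENGINE=InnoDB;

-- AND: may show type=index_merge with "Using intersect(idx_a,idx_b)",
-- or simply use whichever single index looks more selective.
EXPLAIN SELECT * FROM m WHERE a = 5 AND b = 6;

-- OR: may show type=index_merge with "Using union(idx_a,idx_b)".
EXPLAIN SELECT * FROM m WHERE a = 5 OR b = 6;

-- For the AND query a combined index is the better choice.
ALTER TABLE m ADD KEY idx_a_b (a, b);
EXPLAIN SELECT * FROM m WHERE a = 5 AND b = 6;
```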
26. Indexes
CREATE TABLE City (
  ID int(11) NOT NULL AUTO_INCREMENT,
  Name char(35) NOT NULL DEFAULT '',
  CountryCode char(3) NOT NULL DEFAULT '',
  District char(20) NOT NULL DEFAULT '',
  Population int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (ID),
  KEY CountryCode (CountryCode)
) ENGINE=InnoDB;
32. Covered Index: Example
Covered index = covers all fields in the query
ALTER TABLE City ADD KEY cov1 (CountryCode, District, Population, Name);
Field order:
1. WHERE clause
2. GROUP BY/ORDER BY (not used now)
3. SELECT part (Name)
34. Covered Index: Example
Range & const condition.
SELECT Name FROM City WHERE District='California' AND Population >
30000
Index (District, Population, Name) in this order.
Rule of thumb:
Const first, range second (depends on the query).
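To see why the order matters, the recommended index can be compared against one with the range column first. This sketch assumes the City table defined earlier in the deck; the index names cov2 and cov3 are made up for illustration.

```sql
-- Const column first, range column second, selected column last.
ALTER TABLE City ADD KEY cov2 (District, Population, Name);

EXPLAIN
SELECT Name FROM City
WHERE District = 'California' AND Population > 30000;

-- For comparison only: range column first. The range on Population ends
-- the usable prefix, so District can only be checked entry by entry.
ALTER TABLE City ADD KEY cov3 (Population, District, Name);

EXPLAIN
SELECT Name FROM City FORCE INDEX (cov3)
WHERE District = 'California' AND Population > 30000;
```

Comparing the rows and key_len columns of the two plans shows how much of each index is used to narrow the search.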
37. What is Filesort?
The truth is, filesort is badly named. Anytime a sort can’t
be performed from an index, it’s a filesort. It has nothing
to do with files.
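A quick way to spot a filesort is the Extra column of EXPLAIN. The sketch below assumes the City table from this deck with only the indexes from its CREATE TABLE statement, so Population is not indexed.

```sql
-- No index provides Population order: Extra is expected to include "Using filesort".
EXPLAIN SELECT Name, Population FROM City ORDER BY Population DESC LIMIT 10;

-- The PRIMARY KEY already provides ID order: no filesort is expected.
EXPLAIN SELECT ID, Name FROM City ORDER BY ID DESC LIMIT 10;
```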
38. Temporary tables I
MySQL creates temporary tables when a query uses:
⚪ GROUP BY
⚪ Range + ORDER BY
⚪ Some other expressions
2 types of temporary tables:
⚪ MEMORY
⚪ On-disk
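Temporary tables also show up in EXPLAIN. This sketch assumes the City table from this deck with only its original indexes; the exact Extra text varies between MySQL versions.

```sql
-- District is not indexed, so the groups are built in a temporary table:
-- Extra is expected to include "Using temporary".
EXPLAIN SELECT District, COUNT(*) FROM City GROUP BY District;

-- CountryCode is indexed, so rows arrive already grouped
-- and a temporary table is typically not needed.
EXPLAIN SELECT CountryCode, COUNT(*) FROM City GROUP BY CountryCode;
```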
39. Temporary tables II
First, MySQL creates the temporary table in memory; if it grows past
the smaller of the two limits below, it is converted to an on-disk table.
MySQL configuration variables:
tmp_table_size
⚪ Maximum size for in-memory temporary tables
max_heap_table_size
⚪ Maximum size for MEMORY tables
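The two variables and their effect can be inspected from any client session. The 64 MB value below is an arbitrary number chosen for illustration, not a recommendation.

```sql
-- Current limits (in bytes).
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both for the current session only.
SET SESSION tmp_table_size      = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;

-- Counters: compare Created_tmp_tables with Created_tmp_disk_tables
-- before and after running a query to see whether it spilled to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```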
41. Indexes: Theory
● MySQL usually chooses one best index per table.
● Supports combined (multi-column) indexes.
● The order of fields in a combined index matters.
● MySQL can use the leftmost part of any index.
● MySQL can use an index to satisfy GROUP BY/ORDER BY