3. Why use indexes?
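Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees
Exceptions: spatial indexes use R-trees, MEMORY tables also support hash indexes, and InnoDB uses inverted lists for FULLTEXT indexes
A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time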
5. Selectivity
Selectivity is the ratio of distinct values in a column to the total number of rows
The more unique the values, the higher the selectivity
The query engine likes highly selective key columns
The higher the selectivity, the faster the query engine can reduce the size of the
result set
6. Selectivity and Cardinality
Cardinality is the number of unique values in the index.
In simple words:
Max cardinality: all values are unique
Min cardinality: all values are the same
Selectivity of index = cardinality/(number of records) * 100%
Perfect selectivity is 100%. It is reached by unique indexes on NOT NULL columns.
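The formula above can be evaluated directly on a table. The sketch below assumes a table `employees` with a column `lastname` (placeholder names taken from the lookup examples in this deck). `SHOW INDEX` reports only an estimated cardinality, so the `COUNT(DISTINCT ...)` query is the more reliable of the two.

```sql
-- Exact cardinality and selectivity of one column (this scans the table).
-- COUNT(DISTINCT ...) ignores NULL values.
SELECT COUNT(DISTINCT lastname)                  AS cardinality,
       COUNT(*)                                  AS total_rows,
       COUNT(DISTINCT lastname) / COUNT(*) * 100 AS selectivity_pct
FROM employees;

-- Estimated cardinality of each existing index (see the Cardinality column).
SHOW INDEX FROM employees;
```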
7. Query optimization
The main idea is not to tune your database, but to optimize
your queries based on the data you have
8. Selectivity by example
Example:
Table of 10,000 rows with column `gender` (number of males ~ number of females)
Let’s calculate the selectivity of the `gender` column
Selectivity = 2/10000 * 100% = 0.02%, which is very low
9. When selectivity can be neglected
Low selectivity matters less when values are distributed unevenly and the query targets the rare values
Example:
If only a small share of the rows have stat 0 or 1, a query that selects rows with stat IN (0,1) can still use an index on `stat`.
As a general rule, an index pays off when queries typically retrieve less than about 15% of the
table's rows
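Whether a low-cardinality column is skewed enough to be worth indexing can be checked by counting rows per value. This is a minimal sketch; it borrows the `wk2_campaign` table and `stat` column from the real case later in the deck, and the column aliases are made up for illustration.

```sql
-- Row count and share of the table for each value of a low-cardinality column.
-- Values near the top of the result (small share) are the ones an index helps with.
SELECT stat,
       COUNT(*) AS rows_per_value,
       COUNT(*) / (SELECT COUNT(*) FROM wk2_campaign) * 100 AS pct_of_table
FROM wk2_campaign
GROUP BY stat
ORDER BY rows_per_value;
```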
10. How MySQL uses indexes
• Data Lookups
• Sorting
• Avoiding reading “data”
• Special Optimizations
11. Data Lookups
SELECT * FROM employees WHERE lastname='Smith'
The classic use of an index on (lastname)
Multiple-column indexes can be used as well
SELECT * FROM employees WHERE lastname='Smith' AND dept='accounting'
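A sketch of how these lookups could be set up and checked. The index names are hypothetical, and in practice you would usually keep only one of the two indexes, since the two-column index also serves the single-column lookup through its leading key part.

```sql
-- Hypothetical index names; the composite index covers both queries below.
CREATE INDEX idx_lastname      ON employees (lastname);
CREATE INDEX idx_lastname_dept ON employees (lastname, dept);

-- The `key` column of the EXPLAIN output shows which index the optimizer picked.
EXPLAIN SELECT * FROM employees WHERE lastname = 'Smith';
EXPLAIN SELECT * FROM employees WHERE lastname = 'Smith' AND dept = 'accounting';
```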
12. Use cases
Index (a,b,c) - order of columns matters
Will use Index for lookup (all listed keyparts)
a>5
a=5 AND b>6
a=5 AND b=6 AND c=7
a=5 AND b IN (2,3) AND c>5
Will NOT use Index
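b>5 - Leading column is not referenced
b=6 AND c=7 - Leading column is not referenced
Will use Part of the index
a>5 AND b=2 - range on the first column, only the first key part is used
a=5 AND b>6 AND c=2 - range on the second column, the first two key parts are used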
13. The thing with ranges
MySQL stops using key parts in a multi-part index as soon as
it meets a real range (<, >, BETWEEN). It can, however,
continue using key parts further to the right if an IN(…) list is
used
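The difference between a real range and an IN list can be observed with EXPLAIN. The table `t` and index `idx_abc` below are hypothetical; the comments describe what the rule above predicts, and the plan on a populated table should be verified by comparing the `key_len` column, which grows with the number of key parts used.

```sql
-- Hypothetical table with a three-part index.
CREATE TABLE t (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  KEY idx_abc (a, b, c)
);

-- Real range on b: key parts a and b are expected to bound the scan,
-- c is only checked afterwards.
EXPLAIN SELECT * FROM t WHERE a = 5 AND b > 6 AND c = 7;

-- IN list on b: all three key parts are expected to be used,
-- so key_len should be larger than in the previous plan.
EXPLAIN SELECT * FROM t WHERE a = 5 AND b IN (2, 3) AND c > 5;
```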
14. Sorting
SELECT * FROM players ORDER BY score DESC LIMIT 10
Will use an index on the `score` column
Without an index MySQL will do a “filesort” (an extra sorting pass), which is expensive on large row sets
Often combined with using an index for lookup
SELECT * FROM players WHERE country='US' ORDER BY score DESC LIMIT 10
Best served by an index on (country, score)
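A minimal sketch of the combined lookup-and-sort case, assuming the `players` table from the slide; the index name is hypothetical. If the index is used for sorting, the Extra column of EXPLAIN should not contain "Using filesort" (MySQL 8.0 may show "Backward index scan" for the DESC order instead).

```sql
-- Equality column first, sort column second.
CREATE INDEX idx_country_score ON players (country, score);

-- Check the Extra column: "Using filesort" means the index was not used for ordering.
EXPLAIN SELECT * FROM players WHERE country = 'US' ORDER BY score DESC LIMIT 10;
```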
15. Use Cases
It becomes even more restricted!
KEY(a,b)
Will use Index for Sorting
ORDER BY a - sorting by leading column
a=5 ORDER BY b - EQ filtering by 1st and sorting by 2nd
ORDER BY a DESC, b DESC - Sorting by 2 columns in same order
a>5 ORDER BY a - Range on the column, sorting on the same
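Will NOT use Index for Sorting
ORDER BY b - sorting by the second column only
a>5 ORDER BY b - range on the first column, sorting by the second
a IN (1,2) ORDER BY b - IN on the first column
ORDER BY a ASC, b DESC - sorting in different orders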
16. Sorting rules
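You can’t sort in different orders by 2 columns (unless the index itself is declared with matching
directions, which MySQL 8.0 descending indexes allow)
You can only have an equality comparison (=) on columns which
are not part of ORDER BY
Not even IN() works in this case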
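These rules can be checked against a small test table. Everything named below (`t_sort`, `idx_a_b`, `idx_a_b_desc`) is hypothetical, and the comments state the expected outcome rather than guaranteed output. The second index relies on descending index support, which exists in MySQL 8.0 and later; older versions parse `DESC` in an index definition but ignore it.

```sql
-- Hypothetical table with a plain index and a mixed-direction index.
CREATE TABLE t_sort (
  a INT NOT NULL,
  b INT NOT NULL,
  KEY idx_a_b      (a, b),
  KEY idx_a_b_desc (a ASC, b DESC)
);

-- Equality on a, sort on b: expected to read rows in index order.
EXPLAIN SELECT * FROM t_sort WHERE a = 5 ORDER BY b;

-- IN on a, sort on b: expected to show "Using filesort" in Extra.
EXPLAIN SELECT * FROM t_sort WHERE a IN (1, 2) ORDER BY b;

-- Mixed directions: can be served by idx_a_b_desc on MySQL 8.0+.
EXPLAIN SELECT * FROM t_sort ORDER BY a ASC, b DESC;
```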
17. Avoid reading the data
“Covering Index”
Applies to index use for specific query, not type of index.
Reading Index ONLY and not accessing the “data”
SELECT status FROM orders WHERE customer_id=123
KEY(customer_id, status)
Index is typically smaller than data
Access is a lot more sequential
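A sketch of the covering-index case for the query above; the index name is hypothetical. When the index covers the query, the Extra column of EXPLAIN contains "Using index".

```sql
-- Both the filter column and the selected column are in the index.
ALTER TABLE orders ADD KEY idx_customer_status (customer_id, status);

-- Expected Extra: "Using index" (the row data is never read).
EXPLAIN SELECT status FROM orders WHERE customer_id = 123;

-- Selecting columns outside the index forces a row lookup,
-- so "Using index" is expected to disappear here.
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```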
18. Aggregation functions
Indexes help the MIN()/MAX() aggregate functions
But only these
SELECT MAX(id) FROM tbl;
SELECT MAX(salary) FROM employee GROUP BY dept_id
Will benefit from (dept_id, salary) index
“Using index for group-by”
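The GROUP BY case can be reproduced as follows, assuming the `employee` table from the slide; the index name is hypothetical. Whether the optimizer actually picks the loose index scan depends on table statistics, so the Extra column should be checked rather than assumed.

```sql
-- Grouping column first, aggregated column second.
CREATE INDEX idx_dept_salary ON employee (dept_id, salary);

-- Expected Extra: "Using index for group-by".
EXPLAIN SELECT MAX(salary) FROM employee GROUP BY dept_id;
```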
19. Joins
MySQL Performs Joins as “Nested Loops”
SELECT * FROM posts p, comments c WHERE p.author='Peter' AND c.post_id=p.id
Scan table `posts` finding all posts which have Peter as an author
For every such post go to `comments` table to fetch all comments
It is very important to have all JOINs indexed
An index is only needed on the table which is being looked up (here comments.post_id)
An index on posts.id does not affect the performance of this query
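A sketch of the indexes this join relies on; the index names are hypothetical. The index on `posts (author)` is an addition beyond the slide: it is not required for the join itself, but it would let MySQL find Peter's posts without scanning the whole `posts` table.

```sql
-- Index on the looked-up side of the join.
CREATE INDEX idx_comments_post_id ON comments (post_id);

-- Optional: avoids a full scan of posts when filtering by author.
CREATE INDEX idx_posts_author ON posts (author);

-- Same query written with an explicit JOIN; check the `key` column for both tables.
EXPLAIN
SELECT *
FROM posts p
JOIN comments c ON c.post_id = p.id
WHERE p.author = 'Peter';
```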
20. Multiple indexes
MySQL can use more than one index
“Index Merge”
SELECT * FROM tbl WHERE a=5 AND b=6
Can often use indexes on (a) and (b) separately
An index on (a,b) is much better
SELECT * FROM tbl WHERE a=5 OR b=6
2 separate indexes are as good as it gets
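Index Merge shows up in EXPLAIN as access type `index_merge`. The table and index names below are hypothetical, and the optimizer decides based on statistics, so on a small or empty table the plans may differ from what the comments suggest.

```sql
-- Hypothetical table with two single-column indexes.
CREATE TABLE t_merge (
  id INT NOT NULL PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_a (a),
  KEY idx_b (b)
);

-- OR: expected type index_merge with Extra like "Using union(idx_a,idx_b); Using where".
EXPLAIN SELECT * FROM t_merge WHERE a = 5 OR b = 6;

-- AND: may show "Using intersect(idx_a,idx_b)" or fall back to a single index.
EXPLAIN SELECT * FROM t_merge WHERE a = 5 AND b = 6;
```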
21. String indexes
There is no difference… really
Sort order is defined for strings (collation)
'AAAA' < 'AAAB'
Prefix LIKE is a special type of range
LIKE 'ABC%' means
'ABC[LOWEST]' < KEY < 'ABC[HIGHEST]'
LIKE '%ABC' can’t be optimized by use of the index
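The two LIKE cases can be compared with EXPLAIN. This sketch assumes the `employees` table from the lookup slides already has an index on `lastname`; the search patterns are made up for illustration.

```sql
-- Assumes an existing index on employees (lastname).

-- Prefix pattern: expected access type "range" on the lastname index.
EXPLAIN SELECT * FROM employees WHERE lastname LIKE 'Smi%';

-- Leading wildcard: the index cannot bound the search,
-- so a full scan (type "ALL") is expected.
EXPLAIN SELECT * FROM employees WHERE lastname LIKE '%ith';
```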
23. Real case: Timing
Initially the query took 1 min 20 s on the first run
Once MySQL had cached the result, it took about 20 s
24. Real case: Query
SELECT wk2_campaign.*,
       wk2_campaignGroup.category_id AS group_category_id,
       wk2_campaignGroup.subcategory_id AS group_subcategory_id,
       wk2_campaignGroup.summary AS group_summary,
       IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) category_id
FROM `wk2_campaign`
LEFT JOIN wk2_resource_status ON (wk2_resource_status.id = wk2_campaign.CaID)
LEFT JOIN campaign_has_group ON (wk2_campaign.CaID = campaign_has_group.campaign_id)
LEFT JOIN wk2_campaignGroup ON (campaign_has_group.campaign_group_id = wk2_campaignGroup.GrID)
LEFT JOIN si_private_campaigns pc ON (pc.campaign_id = wk2_campaign.CaID)
WHERE (wk2_campaign.tracking_active = '1')
  AND (
        (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) IS NOT NULL)
    AND (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) NOT IN (
          SELECT id FROM campaign_categories WHERE name IN ('Mobile Content Subscription')))
    AND (countries REGEXP 'US')
  )
  AND (
        (
              (wk2_campaign.stat IN ('0', '1'))
          AND (wk2_resource_status.resource_type = 'ca')
          AND (wk2_resource_status.status = '1')
          AND (wk2_campaign.access != '0')
          AND (wk2_campaign.external_id IS NULL)
          AND (wk2_campaign.name IS NOT NULL)
          AND (wk2_campaign.countries IS NOT NULL)
          AND (trim(wk2_campaign.countries) IS NOT NULL)
        )
     OR (pc.campaign_id IS NOT NULL)
  );
25. Steps to optimize
1. Add missing indexes for the joined tables
2. Check the selectivity of different columns of the main table wk2_campaign
The `tracking_active` and `stat` columns have a low number of possible values (low cardinality), but, as
noted under "When selectivity can be neglected", filtering on them still narrows the rows to read.
They can be indexed fast and boost query response time.
26. Steps to optimize
3. Add index on these columns:
ALTER TABLE wk2_campaign ADD INDEX(tracking_active, stat);
4. We only needed to rearrange some conditions so that they would fit the index
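Steps 1 and 3 can be checked with standard statements. This is a sketch only: the deck does not say which join indexes were missing, so the block inspects the joined tables instead of guessing, and the reduced EXPLAIN query is an illustration, not the original query.

```sql
-- Step 1: inspect existing indexes on the joined tables
-- and look for the join columns used in the ON clauses.
SHOW INDEX FROM wk2_resource_status;
SHOW INDEX FROM campaign_has_group;
SHOW INDEX FROM wk2_campaignGroup;
SHOW INDEX FROM si_private_campaigns;

-- Step 3: after adding the (tracking_active, stat) index,
-- a reduced query should show that index in the `key` column.
EXPLAIN
SELECT CaID
FROM wk2_campaign
WHERE tracking_active = '1'
  AND stat IN ('0', '1');
```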
27. Result of optimization
With these manipulations we made the query use only indexes
The explain select of this query:
Query run                     Before   After   Performance increase
First time                    1m 20s   0m 2s   4000%
Subsequent (cached by MySQL)  20s      0.26s   7692%
28. Another example with “or”
Before
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
   OR mobile_app_id LIKE '%buscape%'
   OR caid IN ('89630','89632');
130 rows in set (7.43 sec)
After
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE mobile_app_id LIKE '%buscape%'
UNION
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE caid IN ('89630','89632');
130 rows in set (4.12 sec)