Adapting and adopting SQL Plan Management (SPM) to achieve execution plan stability for sub-second queries on a high-rate OLTP mission-critical application
Carlos Sierra
7. Motivation
• Plan stability is more valuable than plan flexibility when:
  • SLAs are strict, in the order of milliseconds
  • Simple queries execute dozens of times per second
• The out-of-the-box “Automatic SPM Evolve Task” is great … but
  • It may accept sub-optimal execution plans on a non-typical application
  • e.g.: captured bind values could be outdated within a matter of hours
• Historical plan performance can be used to predict future SQL performance with some degree of confidence
• We just need to implement an autonomous custom algorithm …
8. About the application and environment
• Oracle 12c multi-tenant on Oracle Servers X5-2 with NVMe SSD
• OLTP application with copies in 30+ databases and 700+ PDBs
• Row-by-row, web-based custom application
• Transaction isolation implemented through application-enforced serialization
• A few critical queries encapsulated as a “critical serial-path transaction”
• A typical transaction executes in ~10ms and includes up to 10 queries
• A “plan flip” constantly risks breaching stringent millisecond SLAs
9. A typical query
SELECT …
  FROM SYSTEMS
 WHERE (id, TxnID, 1) IN (SELECT id, TxnID,
                                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY TxnID DESC) rn
                            FROM SYSTEMS
                           WHERE TxnID <= :1)
   AND Live = 'Y'
   AND ((compartmentId = :2))
 ORDER BY compartmentId ASC, id ASC
 FETCH FIRST :3 ROWS ONLY
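The query returns, for one compartment, the latest version of each id as of a given TxnID, provided that version is live. To make those "as of" semantics concrete, here is a minimal in-memory Python sketch. It is an illustration only: the row type and field names are assumptions mirroring the columns above, and it assumes (id, TxnID) is unique, so ROW_NUMBER() has no ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRow:
    id: int
    txn_id: int          # TxnID: version of the row
    live: str            # Live: 'Y' or 'N'
    compartment_id: str  # compartmentId


def typical_query(rows, max_txn_id, compartment_id, limit):
    """Emulate the query in memory.

    max_txn_id plays the role of :1, compartment_id of :2, limit of :3.
    Assumes (id, txn_id) is unique, so ROW_NUMBER() has no ties.
    """
    # Inner query: for each id, keep the version with the highest TxnID <= :1
    # (the row that ROW_NUMBER() ... ORDER BY TxnID DESC numbers as 1).
    latest = {}
    for row in rows:
        if row.txn_id <= max_txn_id:
            current = latest.get(row.id)
            if current is None or row.txn_id > current.txn_id:
                latest[row.id] = row

    # Outer query: that latest version must itself be live and in compartment :2.
    hits = [r for r in latest.values()
            if r.live == "Y" and r.compartment_id == compartment_id]

    # ORDER BY compartmentId, id ... FETCH FIRST :3 ROWS ONLY
    hits.sort(key=lambda r: (r.compartment_id, r.id))
    return hits[:limit]
```

Because the inner query filters only on TxnID, an id whose latest version has Live = 'N' drops out entirely instead of falling back to an older live version.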
12. Adaptive SQL Plan Management on 12c
• Refer to this link for details:
  https://oracle-base.com/articles/12c/adaptive-sql-plan-management-12cr1
• Automatic evolution of SPBs is “on” by default
  • View dba_advisor_parameters
  • Filter task_name = 'SYS_AUTO_SPM_EVOLVE_TASK'
  • Columns parameter_name and parameter_value
  • Look for the ACCEPT_PLANS parameter
  • Last evolution: DBMS_SPM.report_auto_evolve_task
• Automatic creation (capture) of SPBs is “off” by default (OPTIMIZER_CAPTURE_SQL_PLAN_BASELINES = FALSE)
13. Automatic SPM Evolve Task
The plan is evaluated using the bind variable values captured when the test plan was created.
The “evolve task” may then determine that the baseline plan performs poorly, when in fact it is being executed with outdated values.
This application has a fast-moving time window, so captured values become outdated quickly.
14. Custom SPM Implementation Objectives
• Reduce the number of incidents where a new execution plan causes a performance regression of an SLA-related SQL statement
• Create a SQL Plan Baseline (SPB), a.k.a. “pin a plan”, when a plan has a proven record of consistently good performance (i.e., learn from history, or from the lack of history)
• Ignore SQL statements that are too young
• If a SQL statement changes, re-learn from history and “pin a plan” once the statement becomes mature again
• Flag a plan as permanent once its SPB has also matured
• Clean up unwanted plans
15. FPZ Algorithm
• Pre-select SQL_ID/PHV candidates, mainly from the shared pool
• If there exists a valid SQL Plan Baseline (SPB) for the candidate
  • Demote the SPB if it underperforms (disable it)
  • Promote the SPB after proven performance (fix it)
• Else (no SPB exists for the candidate)
  • Further screen the candidate
  • Create an SPB if the candidate is accepted
• Log decision
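The control flow of one FPZ pass can be sketched in Python as below. This is a minimal sketch, not the author's PL/SQL package: the Candidate fields and the three rule callables are hypothetical stand-ins for the checks described on the following slides. Two choices are assumptions: demotion is checked before promotion, and a candidate whose existing SPB is not valid is skipped, since the slide only covers the "valid SPB" and "no SPB" cases.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    sql_id: str
    phv: int             # plan hash value
    has_spb: bool        # an SPB exists for this SQL_ID/PHV
    spb_valid: bool      # ...and it is accepted, enabled and reproduced


def fpz_pass(candidates, underperforms, proven, accepted_by_screen):
    """One pass of the FPZ flow; returns the logged decisions.

    underperforms, proven and accepted_by_screen are hypothetical callables
    standing in for the demotion, promotion and screening rules.
    """
    decisions = []
    for c in candidates:
        if c.has_spb and c.spb_valid:
            if underperforms(c):
                action = "DISABLE"      # demote
            elif proven(c):
                action = "FIX"          # promote
            else:
                action = "NO_CHANGE"
        elif not c.has_spb:
            action = "CREATE" if accepted_by_screen(c) else "REJECT"
        else:
            action = "SKIP"             # SPB exists but is not valid (assumption)
        decisions.append((c.sql_id, c.phv, action))  # log decision
    return decisions
```

Keeping the rules as injected callables keeps the skeleton independent of how each threshold is configured.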
17. Pre-select SQL_ID/PHV candidates
• For the PHV, at least one child cursor is valid, shareable and not obsolete
• Parent cursor’s first load time is more than 6 days ago
  • i.e., the SQL (parent cursor) is mature
• Cursor has been active within the last 24 hours
• Parsing user and schema are neither SYS nor Oracle-maintained
• PDB is neither CDB$ROOT nor PDB$SEED
• PHV and executions are > 0
• Some other filters
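The pre-selection filters above amount to a single predicate per SQL_ID/PHV. The Python sketch below assumes the cursor data has already been fetched. Its field names are hypothetical, loosely mirroring V$SQL columns such as OBJECT_STATUS, IS_SHAREABLE and IS_OBSOLETE, and the oracle_maintained flag could come, for instance, from DBA_USERS.ORACLE_MAINTAINED. The "some other filters" are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChildCursor:
    valid: bool       # cf. V$SQL.OBJECT_STATUS = 'VALID'
    shareable: bool   # cf. V$SQL.IS_SHAREABLE
    obsolete: bool    # cf. V$SQL.IS_OBSOLETE


@dataclass
class PlanCandidate:
    pdb_name: str
    parsing_user: str
    parsing_schema: str
    oracle_maintained: bool      # parsing user or schema is Oracle-maintained
    first_load_time: datetime    # of the parent cursor
    last_active_time: datetime
    plan_hash_value: int
    executions: int
    children: list               # ChildCursor entries for this SQL_ID/PHV


def preselect(c, now, min_age=timedelta(days=6), active_within=timedelta(hours=24)):
    """Return True if the SQL_ID/PHV passes the pre-selection filters."""
    return (
        any(ch.valid and ch.shareable and not ch.obsolete for ch in c.children)
        and now - c.first_load_time > min_age            # SQL is mature
        and now - c.last_active_time <= active_within    # recently active
        and "SYS" not in (c.parsing_user, c.parsing_schema)
        and not c.oracle_maintained
        and c.pdb_name not in ("CDB$ROOT", "PDB$SEED")
        and c.plan_hash_value > 0
        and c.executions > 0
    )
```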
18. Consider plan candidates from AWR only if
1. The plan is not in the shared pool
  • The plan was generated in the past (AWR) but is not currently in memory
2. There are other plans for the SQL in the shared pool (with no SPB)
  • The SQL is active and has no SPB
3. The focus is on one SQL, not on an entire PDB or CDB
• The algorithm skips AWR plans that are also candidates from the shared pool (because AWR does not store the SPB name in SQLSTAT)
Note: since AWR SQLSTAT does not record the SPB name, without this rule the algorithm would re-create the SPB on every execution of the algorithm
19. What is a valid SQL Plan Baseline (SPB)?
• Accepted
• Enabled
• Reproduced
• Not necessarily Fixed
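In code, the validity test is a check on three flags, with FIXED deliberately ignored. A minimal Python sketch, assuming each plan is a mapping of the YES/NO flag columns exposed by DBA_SQL_PLAN_BASELINES (ACCEPTED, ENABLED, REPRODUCED, FIXED):

```python
def is_valid_spb(plan):
    """plan: mapping of DBA_SQL_PLAN_BASELINES flag columns to 'YES'/'NO'.

    Valid means accepted, enabled and reproduced; FIXED may be either value.
    """
    return all(plan.get(flag) == "YES" for flag in ("ACCEPTED", "ENABLED", "REPRODUCED"))
```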
20. Disable SPB if it underperforms
• Cursor’s average elapsed time per execution > 10x the category’s threshold
• Cursor’s average elapsed time per execution > 100x the SPB’s average elapsed time per execution
• Evaluate only after N executions (as per the candidate threshold)
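The demotion rule can be sketched as a small Python function. The slide does not say whether the two elapsed-time conditions are combined with AND or OR; this sketch assumes AND (both must hold), reading the stacked filters of the demotion funnel as cumulative. The SPB's own average is a hypothetical input, possibly derived from the ELAPSED_TIME and EXECUTIONS columns of DBA_SQL_PLAN_BASELINES.

```python
def should_disable(cursor_avg_ms, cursor_executions, category_threshold_ms,
                   spb_avg_ms, min_executions):
    """Demotion check for a valid, non-fixed SPB.

    Assumption: both conditions must hold (AND). For an OR policy,
    replace `and` with `or` in the return expression.
    """
    if cursor_executions < min_executions:
        return False  # too few executions to judge yet
    return (cursor_avg_ms > 10 * category_threshold_ms
            and cursor_avg_ms > 100 * spb_avg_ms)
```

For example, with a 1.25ms category threshold, a cursor averaging 20ms over 5,000 executions is disabled when its SPB averaged 0.1ms, but not when the SPB averaged 0.5ms.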
21. SPB demotion to “DISABLE”
[Figure: Cursor Cache → Plans with SPB → SPBs that qualify for a “DISABLE” demotion. Filters: Enabled, Accepted, Reproduced, Not Fixed, Avg ET > 10x Max Category Threshold, Avg ET > 100x SPB snapshot]
22. SPB evaluation and conditional promotion
• If not “fixed” and created more than 14 days ago
  • Set the “FIX” flag to YES
  • The plan is mature, in use and has acceptable performance
• Note: once a plan is “fixed”, no new plans are added to the Plan History
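The promotion rule can be sketched the same way. In the database, fixing a plan would go through the DBMS_SPM.ALTER_SQL_PLAN_BASELINE function with attribute_name 'fixed' and attribute_value 'YES'. The Python below only decides whether to do so. The in_cursor_cache and performance_ok inputs are hypothetical stand-ins for "in use" and "acceptable performance" (for instance, the demotion check did not fire).

```python
from datetime import datetime, timedelta


def should_fix(spb, now, in_cursor_cache, performance_ok, min_age=timedelta(days=14)):
    """Promotion check: a valid, not-yet-fixed SPB older than min_age.

    spb: mapping with DBA_SQL_PLAN_BASELINES-style flags and a CREATED datetime.
    in_cursor_cache / performance_ok: hypothetical inputs for "in use" and
    "acceptable performance".
    """
    return (
        all(spb[flag] == "YES" for flag in ("ENABLED", "ACCEPTED", "REPRODUCED"))
        and spb["FIXED"] == "NO"
        and now - spb["CREATED"] > min_age
        and in_cursor_cache
        and performance_ok
    )
```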
23. SPB promotion to “FIX”
[Figure: Cursor Cache → Plans with SPB → SPBs that qualify for a “FIX” promotion. Filters: Enabled, Accepted, Reproduced, Not Fixed, Age > 14d]
24. Further screen SQL_ID/PHV candidates
• Plan has > “X” executions
  • > 10,000 for some categories
  • > 1,000 for other categories
  • Hint: start SPM automation with high-rate SQL only
• Plan’s average execution time is < “X”ms
  • < 0.5ms for some categories
  • < 10ms for other categories
• Proven acceptable “on average” performance based on cumulative metrics
  • Or a lack of historical metrics, which usually denotes a lightweight SQL
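A volume-and-latency screen on cumulative metrics could look like the Python sketch below. The category names CAT_A and CAT_B are hypothetical, and pairing 10,000 executions with 0.5ms (and 1,000 with 10ms) is an assumption, since the slide lists the two thresholds independently. Elapsed time is taken as a cumulative total in microseconds, the unit V$SQL.ELAPSED_TIME uses.

```python
# Hypothetical category names; thresholds are the example values from this slide,
# and pairing 10,000 executions with 0.5ms (and 1,000 with 10ms) is an assumption.
THRESHOLDS = {
    "CAT_A": {"min_executions": 10_000, "max_avg_ms": 0.5},
    "CAT_B": {"min_executions": 1_000, "max_avg_ms": 10.0},
}


def passes_volume_screen(category, executions, elapsed_time_us):
    """Screen on cumulative metrics; elapsed_time_us is the total elapsed
    time in microseconds across all executions."""
    limits = THRESHOLDS.get(category)
    if limits is None or executions <= 0:
        return False  # unknown category or no executions: reject
    avg_ms = elapsed_time_us / executions / 1000.0
    return executions > limits["min_executions"] and avg_ms < limits["max_avg_ms"]
```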
25. SPB creation
[Figure: Cursor Cache and AWR SQLSTAT → Plan candidates for SPB → Plans that qualify for an SPB. Example thresholds: Executions > 2,500, Elapsed Time per Execution < 10s, Age > 4d; Executions > 25,000, Elapsed Time per Execution < 1.25ms, Age > 6d]
26. Categorizing SQL statements
• Use Module and Action, and/or parse the SQL text
• Critical transaction (e.g.)
  • Commit path
  • Begin transaction
  • Garbage collection
• Non-critical transaction (e.g.)
  • Scan read
• Anything else
  • Categorize as non-application and possibly reject the candidate
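Categorization can be sketched as an ordered list of pattern rules over MODULE, ACTION and the SQL text (V$SQL exposes MODULE and ACTION columns). Every pattern and category name below is hypothetical: a real application would match its own instrumentation values. The first matching rule wins, and anything unmatched falls into the non-application bucket.

```python
import re

# Hypothetical rules: (category, is_critical, pattern). Real MODULE/ACTION
# values are application-specific; these patterns are only illustrative.
RULES = [
    ("COMMIT_PATH", True, re.compile(r"commit", re.IGNORECASE)),
    ("BEGIN_TRANSACTION", True, re.compile(r"begin[_ ]?t(ransaction|xn)", re.IGNORECASE)),
    ("GARBAGE_COLLECTION", True, re.compile(r"garbage|\bgc\b", re.IGNORECASE)),
    ("SCAN_READ", False, re.compile(r"scan", re.IGNORECASE)),
]


def categorize(module, action, sql_text=""):
    """Return (category, is_critical); the first matching rule wins."""
    haystack = " ".join(filter(None, (module, action, sql_text)))
    for name, critical, pattern in RULES:
        if pattern.search(haystack):
            return name, critical
    return "NON_APPLICATION", False  # candidate may then be rejected
```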
28. Further screen SQL_ID/PHV candidates (cont.)
• Plan has no AWR performance history (low database load); or
• Plan has recent AWR performance history (last 60 days) such that:
  • Execution time’s 90th percentile < 2x cursor’s category threshold and < 20x cursor’s avg
    • e.g. < 2.5ms and < 20x avg
  • Execution time’s 95th percentile < 3x cursor’s category threshold and < 30x cursor’s avg
    • e.g. < 3.75ms and < 30x avg
  • Execution time’s 97th percentile < 4x cursor’s category threshold and < 40x cursor’s avg
    • e.g. < 5ms and < 40x avg
  • Execution time’s 99th percentile < 5x cursor’s category threshold and < 50x cursor’s avg
    • e.g. < 6.25ms and < 50x avg
  • (The example values assume a category threshold of 1.25ms)
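The percentile screen can be sketched in Python as follows. It assumes the AWR history has been reduced to one sample per snapshot (average elapsed time per execution, e.g. from the ELAPSED_TIME_DELTA and EXECUTIONS_DELTA columns of DBA_HIST_SQLSTAT) and uses the nearest-rank percentile over those samples. Both the sample definition and the percentile method are assumptions: the deck does not say how percentiles are computed.

```python
# (percentile, factor over the category threshold, factor over the cursor's average)
PERCENTILE_RULES = [(90, 2, 20), (95, 3, 30), (97, 4, 40), (99, 5, 50)]


def nearest_rank(sorted_values, pct):
    """Nearest-rank percentile of a non-empty ascending list (integer arithmetic)."""
    rank = (pct * len(sorted_values) + 99) // 100   # ceil(pct/100 * n)
    return sorted_values[rank - 1]


def passes_percentile_screen(samples_ms, category_threshold_ms, cursor_avg_ms):
    """samples_ms: per-snapshot average elapsed time per execution, in ms."""
    if not samples_ms:
        return True  # no AWR history: treated as a lightweight SQL
    values = sorted(samples_ms)
    for pct, category_factor, avg_factor in PERCENTILE_RULES:
        p = nearest_rank(values, pct)
        if p >= category_factor * category_threshold_ms or p >= avg_factor * cursor_avg_ms:
            return False
    return True
```

With a 1.25ms category threshold, nine samples at 1ms and one at 7ms fail the 95th percentile rule, because 7ms is not below 3.75ms.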
31. Create SQL Plan Baseline (SPB)
• Enabled
• Accepted
• But not “Fixed”
• Sourced mostly from the Cursor Cache, and some from AWR
32. Log decision
• Update the SPB “description” with
  • Source SQL_ID
  • Source plan hash value (PHV)
  • Date when promoted to “Fixed” or demoted from “Fixed”
• Write into a log
  • Created SPBs, with selection metrics such as execution percentiles
  • Promoted and demoted SPBs, with the criteria used
  • Rejected candidates, and the reason
• Preserve logs for at least 1 month
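The two logging targets can be sketched as string builders: one for the SPB "description" attribute (settable through DBMS_SPM.ALTER_SQL_PLAN_BASELINE with attribute_name 'description') and one for a log line per decision. Both formats are hypothetical; the deck lists what to record, not how. The SQL_ID values in the examples are made up.

```python
import json
from datetime import datetime


def spb_description(sql_id, phv, fixed_on=None):
    """Hypothetical format for the SPB description attribute."""
    parts = [f"SQL_ID={sql_id}", f"PHV={phv}"]
    if fixed_on is not None:
        parts.append(f"FIXED={fixed_on:%Y-%m-%d}")  # date of the FIX change
    return " ".join(parts)


def log_record(decision, sql_id, phv, reason, metrics=None, when=None):
    """One JSON line per decision (CREATE / FIX / DISABLE / REJECT)."""
    return json.dumps({
        "ts": (when or datetime.now()).isoformat(timespec="seconds"),
        "decision": decision,
        "sql_id": sql_id,
        "phv": phv,
        "reason": reason,
        "metrics": metrics or {},   # e.g. execution percentiles
    }, sort_keys=True)
```

For example, `spb_description("abc123", 1234, datetime(2017, 10, 22))` yields `SQL_ID=abc123 PHV=1234 FIXED=2017-10-22`.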
33. FPZ Algorithm (recap)
• Pre-select SQL_ID/PHV candidates, mainly from the shared pool
• If there exists a valid SQL Plan Baseline (SPB) for the candidate
  • Demote the SPB if it underperforms (disable it)
  • Promote the SPB after proven performance (fix it)
• Else (no SPB exists for the candidate)
  • Further screen the candidate
  • Create an SPB if the candidate is accepted
• Log decision
34. AWR Configuration
• EXEC DBMS_SPM.CONFIGURE('plan_retention_weeks', 13);
• EXEC DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(topnsql=>300);
• ALTER SYSTEM SET "_awr_sql_child_limit" = 2000;
35. Additional considerations
• Set Autopurge to NO for plans on a black-list
  • Manually (out of scope for automation)
• What if there is no “proven consistent performance”?
• What if average performance is higher than the target threshold?
• What if predicate selectivity requires more than one execution plan?
• What if the SQL produces different plans across databases?
36. FPZ Algorithm Automation
• PL/SQL package
  • Can be executed from SQL*Plus, or from OEM calling a PL/SQL library
  • Executed while connected to CDB$ROOT
• Set of configuration constants
  • How many SPBs to create and how many to promote (or report only)?
  • Report rejected candidates and non-promoted SPBs?
  • Evaluate particular application categories
  • Number of executions to consider a candidate, or to qualify for an SPB
  • Time per execution to qualify a candidate for SPM
  • Factors over average elapsed time for the 90th, 95th, 97th and 99th percentiles
  • Days of AWR history to consider
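The configuration constants map naturally onto a small settings object. This Python dataclass is a hypothetical sketch. The field names are invented, and the defaults reuse example values from earlier slides where the deck gives one (60 days of AWR, factors 20/30/40/50). Mapping 1,000 and 10,000 executions to "candidate" and "SPB" is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FpzConfig:
    """Hypothetical configuration constants for an FPZ run."""
    report_only: bool = True                # dry run: create/promote nothing
    max_spbs_to_create: int = 0
    max_spbs_to_promote: int = 0
    report_rejected: bool = True            # list rejected candidates
    report_non_promoted: bool = True        # list SPBs not promoted
    categories: tuple = ()                  # empty tuple: evaluate all categories
    min_executions_candidate: int = 1_000   # assumption
    min_executions_spb: int = 10_000        # assumption
    max_avg_ms_per_execution: float = 0.5
    percentile_avg_factors: dict = field(
        default_factory=lambda: {90: 20, 95: 30, 97: 40, 99: 50})
    awr_history_days: int = 60

    def validate(self):
        """Basic sanity checks; raises ValueError on an inconsistent setup."""
        if self.report_only and (self.max_spbs_to_create or self.max_spbs_to_promote):
            raise ValueError("report_only excludes creating or promoting SPBs")
        if min(self.max_spbs_to_create, self.max_spbs_to_promote, self.awr_history_days) < 0:
            raise ValueError("limits must be non-negative")
        factors = [self.percentile_avg_factors[p] for p in sorted(self.percentile_avg_factors)]
        if factors != sorted(factors):
            raise ValueError("factors should not decrease for higher percentiles")
        return self
```

Leaving report_only at True gives a report-only run, in the spirit of the dry run below, where no SPBs were created or promoted.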
38. Dry run results and sample output
+------------------------------------------------------------
|
| Candidates : 2019
| SPBs Qualified for Creation : 977
| SPBs Qualified for Promotion : 4
| SPBs Created : 0
| SPBs Promoted : 0
| Date and Time (end) : 2017-10-22T14:33:42
| Duration (secs) : 102
|
+------------------------------------------------------------
47. Outliers
• SQL not considered by the PL/SQL library
• Candidates rejected for valid reasons (performance, executions, age, etc.)
• Bug in the algorithm or the PL/SQL library?
• Algorithm too restrictive?
• Short-lived small spikes
  • Execution bursts combined with frequent hard parses caused by CBO statistics gathering
• SQL has multiple optimal plans as per Adaptive Cursor Sharing (ACS)
  • Algorithm implements a subset
48. Closing remarks
• Past performance may not be indicative of future results
• Nevertheless, historical plan performance can be used to predict future SQL performance with some degree of confidence
• Not every SQL statement gets an SPB
  • Some queries are still at risk of spikes
  • Lower rate of executions, performance above thresholds, new SQL, etc.
• And not every plan becomes an SPB (think ACS)
• The method presented reduces the frequency of “plan flips”
• Consistent latency is more important than best performance