If your organization relies on data, optimizing the performance of your database can increase your earnings and savings. Many factors, large and small, can affect performance, so fine-tuning your database is essential. Chuck Ezell, performance tuning expert and Senior Applications Tuner for Datavail, sheds light on the right questions to ask to get the answers that will help you move forward, using a defined approach referred to as 5S.
This performance tuning white paper addresses each stage of this novel approach, as well as key performance issues: SQL, Space, Sessions, Statistics, and Scheduled Processes.
The 5S Approach to Performance Tuning by Chuck Ezell
1. The 5S Approach
To Database Performance Tuning
Presented by Chuck Ezell
chuck.ezell@datavail.com
478-714-1615
2. The Landscape
Bank of America online banking down for 6
days, affecting 29 million online customers.
Gmail down for 2 days, caused by software update
affecting 120,000 users.
Virgin Blue’s reservation desk down for 11 days
affecting 50k passengers and 400 flights, costing
millions in profit.
Netflix down 4-8 hours, affecting 20 million
customers, potentially due to software deployment
issues that were termed “internal technical issues.”
PayPal battled on-and-off service outages for about five
days in October 2004 after upgrading site. They blamed
the glitches on a software update.
11/19/2013
www.datavail.com
3. Why Performance Tune?
80% of unplanned outages are due to ill-planned changes
made by operations or developers.
60% of availability and performance errors are the result of
mis-configurations.
80% of incidents are caused by changes made to the IT
environment including application code.
Looking Ahead: Through 2015, 80% of outages impacting
mission-critical services will be caused by people and process
issues. More than 50% will be caused by
change/configuration/release integration and hand-off issues.
5. DBA’s Top Performance Issues?
“What are the Top 3 performance issues that you
encounter with your SQL servers?” – Stack Exchange User
6. What is Database Performance Tuning?
Pull an AWR and ASH report!
What’s your Buffer Cache Hit Ratio?
Look in the Workload History.
Are your statistics up to date?
Buffer busy waits
I/O Wait
SQL*Net message from client
Enq: CF Contention
db file sequential reads
Disk Reads
Enq:TX – row lock contention
Buffer Gets
Cursor: pin S wait on X
Concurrency Wait Time
LGWR wait for redo copy
Rollbacks & Transactions
Physical Reads
Log File Sync Waits
7. Ignore the Forest for the Trees
“Every defect is a treasure, if the company can uncover its
cause and work to prevent it across the corporation.”
- Kiichiro Toyoda, founder of Toyota
Reactive Approach vs. Proactive Approach
• Many approaches out there take a hardware/architecture perspective or
deal with peripheral issues and distractions.
• What works in reactive situations often applies when proactively planning
(in both cases we’re mitigating).
• Most often we’re facing Reactive situations.
• Build on what we know are real problems we’re fixing right now.
• Let’s avoid all the distractions of all the potential peripheral issues.
• We need a direct, quick way to address root cause and remediate.
8. The 5S Approach
SQL Code
Statistics
Space/Indexing
Sessions
Scheduled Process
9. Step 1 - SQL Code
Review the SQL Execution Plan
• What are the peaks and bottlenecks?
• What indexes are being used?
• Do you see Index Skip Scans, Index Range Scans or Full Table Scans?
• Why is the optimizer/execution engine generating this plan?
• Are there embedded HINTs forcing the poor execution?
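The plan review questions above can be sketched in a few lines. This is a minimal illustration, not the presenter's tooling: it uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for whatever plan viewer your own engine provides, and the orders table, the idx_orders_date index, and the helper names are hypothetical. It shows one common answer to "why is the optimizer generating this plan?": a function wrapped around an indexed column forces a full scan, while an equivalent range predicate can use the index.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail text of each plan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # fourth column holds the detail text

def has_full_scan(details):
    """True if any step scans a table without the help of an index."""
    return any(d.startswith("SCAN") and "INDEX" not in d for d in details)

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")

# Function applied to the indexed column: the index cannot be used.
wrapped = "SELECT id, amount FROM orders WHERE date(order_date) = ?"
# Same intent expressed as a range on the bare column: the index can be used.
ranged = "SELECT id, amount FROM orders WHERE order_date >= ? AND order_date < ?"

wrapped_plan = plan_details(conn, wrapped, ("2013-11-19",))
ranged_plan = plan_details(conn, ranged, ("2013-11-19", "2013-11-20"))

print("wrapped:", wrapped_plan, "full scan:", has_full_scan(wrapped_plan))
print("ranged: ", ranged_plan, "full scan:", has_full_scan(ranged_plan))
```

The exact plan wording differs between SQLite versions and looks nothing like an Oracle or Informix plan, so treat the string checks as an assumption that holds only for this sketch.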
Review the SQL Code
• Is the code making wise use of built-in, optimized core language functions?
• Are there ANSI JOINs instead of Core Product Friendly JOINs?
• Are you seeing date or integer calculations without the use of Core Product Friendly functions (e.g. implicit conversions)?
• Are there too many rows being selected (e.g. SELECT * is bad form)?
• Are bind variables being used?
• Are iterative calls generating multiple SQL statements? (Reduce them.)
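Two of the code review questions, bind variables and iterative calls, lend themselves to a short sketch. The example below is an assumption-laden illustration using Python's sqlite3 module; the invoices table and both function names are hypothetical. It contrasts one statement per vendor with a single set-based statement, and both versions bind their values with ? placeholders rather than concatenating them into the SQL text.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, vendor_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (vendor_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0)],
)

def totals_iterative(conn, vendor_ids):
    """One statement per vendor: N executions for N vendors."""
    totals, statements = {}, 0
    for vid in vendor_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM invoices WHERE vendor_id = ?",
            (vid,),
        ).fetchone()
        totals[vid] = row[0]
        statements += 1
    return totals, statements

def totals_batched(conn, vendor_ids):
    """One set-based statement; every value is still passed as a bind variable."""
    placeholders = ", ".join("?" for _ in vendor_ids)
    sql = (
        "SELECT vendor_id, SUM(amount) FROM invoices "
        f"WHERE vendor_id IN ({placeholders}) GROUP BY vendor_id"
    )
    return dict(conn.execute(sql, list(vendor_ids)).fetchall()), 1

iterative_totals, iterative_count = totals_iterative(conn, [1, 2, 3])
batched_totals, batched_count = totals_batched(conn, [1, 2, 3])
print(iterative_totals, iterative_count)
print(batched_totals, batched_count)
```

The two functions agree here because every requested vendor has invoices; a vendor with none would appear as 0 in the iterative result and be absent from the batched one, which is worth handling before swapping one for the other.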
10. Step 2 - Statistics
• Are the statistics up to date?
• Are you finding stats on temporary tables?
• Is there a great degree of data manipulation on the tables in question
that might have left a high water mark?
• Are there sufficient transaction slots for the table/index in question?
• What is the clustering factor on the indexes in question?
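The clustering factor question can be made concrete with a simplified model. The sketch below is an assumption, not the exact algorithm of any product: it walks hypothetical (index key, table block) pairs in key order and counts how often the block changes. A result close to the number of blocks suggests rows are stored in roughly index order; a result close to the number of rows suggests each index entry points at a different block than the one before it. In Oracle the real value is read from the data dictionary (for example the CLUSTERING_FACTOR column of DBA_INDEXES) rather than computed by hand.

```python
def clustering_factor(entries):
    """entries: iterable of (index_key, block_id) pairs, one per table row.

    Walk the entries in index-key order and count block changes.
    """
    factor = 0
    previous_block = None
    for _key, block in sorted(entries, key=lambda entry: entry[0]):
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor

# Eight rows in two blocks, stored in key order: few block changes.
well_clustered = [(key, (key - 1) // 4) for key in range(1, 9)]
# Same eight rows, but consecutive keys alternate between the two blocks.
scattered = [(key, key % 2) for key in range(1, 9)]

print(clustering_factor(well_clustered))  # 2
print(clustering_factor(scattered))       # 8
```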
Step 3 - Space/Indexing
• Are there better indexes that can be used?
• Are you missing indexes that are needed (or carrying too many)?
• Is there too much data in the table, and does it need purging?
• Are the tables fragmented and in need of rebuilding?
• Is there sufficient space available for temp data to be processed?
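The "missing index" question can be checked by comparing the plan before and after a candidate index exists. This is a minimal sketch using Python's sqlite3 module as a stand-in engine; the shipments table and idx_shipments_vendor index are hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (id INTEGER PRIMARY KEY, vendor_id INTEGER, status TEXT)"
)

QUERY = "SELECT id, status FROM shipments WHERE vendor_id = ?"

def plan_text(conn):
    """Join the detail text of every plan step into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    return "; ".join(row[3] for row in rows)

before = plan_text(conn)  # no usable index yet
conn.execute("CREATE INDEX idx_shipments_vendor ON shipments (vendor_id)")
after = plan_text(conn)   # candidate index now available

print("before:", before)
print("after: ", after)
```

The same comparison works in reverse for the "too many indexes" case: every extra index has to be maintained on each insert, update, and delete, so an index that no plan ever picks up is a candidate for removal.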
11. Step 4 - Sessions
• Is it possible a developer is testing SQL code against production?
• Are you seeing long running sessions causing blocks or waiting?
• Are there sessions locking objects and/or possibly invalidating objects?
• Are you seeing too many sessions open at once?
• Are you finding abandoned sessions consuming connections and CPU?
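When long-running sessions are blocking others, the useful question is usually which session sits at the head of the chain. The sketch below assumes you have already pulled session data from your engine's own session views into simple records; the Session fields, the sample values, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Session:
    sid: int
    seconds_active: int
    blocked_by: Optional[int] = None  # sid of the lock holder, if waiting

def root_blocker(sessions, sid):
    """Follow blocked_by links until reaching a session that is not waiting."""
    by_sid = {s.sid: s for s in sessions}
    seen = set()
    current = by_sid[sid]
    # The 'seen' set stops the walk if the links form a cycle (a deadlock).
    while current.blocked_by is not None and current.sid not in seen:
        seen.add(current.sid)
        current = by_sid[current.blocked_by]
    return current.sid

def blocking_report(sessions):
    """Map each root blocker to the sids waiting on it, directly or indirectly."""
    report = {}
    for s in sessions:
        if s.blocked_by is not None:
            report.setdefault(root_blocker(sessions, s.sid), []).append(s.sid)
    return report

def long_running_blockers(sessions, min_seconds):
    """Root blockers that have been active for at least min_seconds."""
    by_sid = {s.sid: s for s in sessions}
    return [
        sid for sid in blocking_report(sessions)
        if by_sid[sid].seconds_active >= min_seconds
    ]

sessions = [
    Session(10, 5400),               # active for 90 minutes, holds a lock
    Session(11, 120, blocked_by=10),
    Session(12, 90, blocked_by=11),  # waits on 11, which waits on 10
    Session(13, 3),                  # unrelated
]
print(blocking_report(sessions))
print(long_running_blockers(sessions, min_seconds=3600))
```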
Step 5 - Scheduled Processes
• Was that backup scheduled for 12 a.m. or 12 p.m., right in the middle
of the sales day?
• Are there processes competing for resources?
• Do you know how many child processes the scheduled process will
spawn?
• Are your update statistics jobs conflicting with other crucial processes
and causing extended run times?
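The 12 a.m. versus 12 p.m. question and the competing-processes question can both be checked mechanically. The sketch below is an illustration under assumptions: the job names, start times, and durations are invented, and a real check would read them from your scheduler. It parses 12-hour clock times (12 AM is midnight, 12 PM is noon) and reports every pair of jobs whose run windows overlap.

```python
from datetime import date, datetime, timedelta

def parse_12h(text):
    """Parse 'HH:MM AM/PM'; 12:00 AM is midnight and 12:00 PM is noon."""
    return datetime.strptime(text, "%I:%M %p").time()

def overlaps(start_a, end_a, start_b, end_b):
    """Two windows overlap when each one starts before the other ends."""
    return start_a < end_b and start_b < end_a

def conflicts(jobs, day):
    """jobs: list of (name, start_text, duration_minutes). Return overlapping pairs."""
    windows = []
    for name, start_text, minutes in jobs:
        start = datetime.combine(day, parse_12h(start_text))
        windows.append((name, start, start + timedelta(minutes=minutes)))
    pairs = []
    for i, (name_a, start_a, end_a) in enumerate(windows):
        for name_b, start_b, end_b in windows[i + 1:]:
            if overlaps(start_a, end_a, start_b, end_b):
                pairs.append((name_a, name_b))
    return pairs

day = date(2013, 11, 19)
other_jobs = [("sales_day", "09:00 AM", 480), ("gather_stats", "02:00 AM", 60)]

# Hypothetical 90-minute backup entered as noon versus midnight.
noon_backup = conflicts([("backup", "12:00 PM", 90)] + other_jobs, day)
midnight_backup = conflicts([("backup", "12:00 AM", 90)] + other_jobs, day)
print(noon_backup)
print(midnight_backup)
```

Treating the sales day itself as a "job" is a design shortcut: it lets the same overlap test flag anything scheduled inside business hours.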
12. Case Study #1
Customer Environment: Informix, Java, Silverlight
Complaint: Reports were running longer than 5 minutes and would often
time out when searching by city, state, zip code, vendor name, or vendor id.
Root Cause: SQL Code
• Too many ANSI joins to an external database for Vendor Information.
• SQL was also using OR statements (in the JOINs) against two different sets of tables.
• Normalization was too high, which forced multiple joins to get simple vendor information.
• Poor, non-standard SQL caused Informix to generate a bad execution plan.
Solution:
• Replicate the external dependency into de-normalized tables within the DB.
• Rewrite the SQL joins.
• Eliminate the OR clauses through de-normalization.
Result: 5+ minute response times decreased to a
maximum response time of 30 seconds.
13. Case Study #2
Customer Environment: Oracle EBS & iStore
Complaint: Vendor sales checkout form was performing slowly.
Root Cause: Statistics
• A poor execution plan was causing multiple blocking locks, waits, high I/O and high CPU.
• Statistics were found on two temporary tables that should not have been there.
• A DBA had improperly scheduled a statistics update job and it generated statistics on ALL objects in the database.
Solution:
• Dropped the statistics on the temporary tables, and the execution plan reverted to the previous plan.
Result: Response time went from several minutes
with blocking locks and waits to sub-second
response times hardly worth noting.
14. Case Study #3
Customer Environment: Oracle Financials
Complaint: Month-end reporting taking days to complete. Had to schedule
through weekend and eliminate any interaction with system until
complete.
Root Cause: SQL Code, Indexing
• A concurrent process was spawning multiple child processes.
• Each process was executing full table scans and index skip scans.
• The reporting process was taking 17 hours to complete.
Solution:
• Add an index on a specific table to eliminate the full table scans.
• Register new index with histogram and turn off logging.
• Force the use of a better index with statistics for the Skip Scans.
Result: Even with 5% more executions, the 17-hour
report began returning in less than 3 minutes.
15. Case Study #4
Customer Environment: Informix, Java, Silverlight
Complaint: Shipping change alert not working.
Root Cause: SQL Code
• The SQL execution was selecting too much data.
• The alert was looking back against 6 months of data to verify a shipment should have been received within 10 days.
• The SQL code had no way of limiting the selection against all the rows of shipment data.
Solution:
• Add a new condition to the SQL that limited the data selection to 10 days or less.
• It was finally decided to add a new feature to the UI to allow the user to define how far back the alert would look, with a maximum of 30 days.
Result: Alert began working because response time
went from timing out to sub-second response times.
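The fix in this case study, a bounded lookback window, can be sketched as follows. The original SQL is not shown in the slides, so everything here is an assumption made for illustration: the shipments schema, the late_shipments function, and the use of SQLite through Python's sqlite3 module in place of Informix. Only the 10-day default and the 30-day maximum come from the case study.

```python
import sqlite3
from datetime import date, timedelta

MAX_LOOKBACK_DAYS = 30  # upper bound taken from the case study

def late_shipments(conn, today, requested_days=10):
    """Return ids of unreceived shipments inside a capped lookback window."""
    lookback = max(1, min(requested_days, MAX_LOOKBACK_DAYS))
    cutoff = (today - timedelta(days=lookback)).isoformat()
    sql = (
        "SELECT id FROM shipments "
        "WHERE received = 0 AND shipped_on >= ? AND shipped_on <= ? "
        "ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, (cutoff, today.isoformat()))]

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shipments (id INTEGER PRIMARY KEY, shipped_on TEXT, received INTEGER)"
)
conn.executemany(
    "INSERT INTO shipments (id, shipped_on, received) VALUES (?, ?, ?)",
    [
        (1, "2013-11-15", 0),  # inside the 10-day window
        (2, "2013-10-25", 0),  # inside 30 days, outside 10
        (3, "2013-06-01", 0),  # months old: never scanned for the alert
        (4, "2013-11-16", 1),  # already received
    ],
)

today = date(2013, 11, 19)
print(late_shipments(conn, today))                      # default 10-day window
print(late_shipments(conn, today, requested_days=180))  # request capped at 30
```

Clamping the value in code, rather than trusting the UI alone, keeps a mistyped or malicious request from reopening the six-month scan.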
16. Case Study #5
Customer Environment: Oracle & Java concurrent program
Complaint: Billing invoices were backing up and many were failing.
Root Cause: SQL Code, Indexing
• Memory and hardware improvements made no improvement in throughput.
• The vendor support couldn’t provide patches or improvements.
• The SQL execution plan was sub-par, selecting the wrong indexes for optimal performance and performing index skip scans and index range scans.
Solution:
• Added 2 new indexes that provided a higher degree of selectivity than the indexes the optimizer was using.
Result: Doubled throughput of invoice billing, eliminating
failures due to timeouts and response times.
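Selectivity, the deciding factor in this fix, is simple to estimate before creating an index. The sketch below uses a common rule of thumb (distinct values divided by total rows) rather than any vendor's exact formula, and the invoices table, its columns, and the selectivity helper are hypothetical.

```python
import sqlite3

def selectivity(conn, table, column):
    """Distinct values divided by total rows; closer to 1.0 is more selective."""
    # table and column are interpolated into the SQL text, so they must come
    # from trusted code, never from user input.
    distinct, total = conn.execute(
        f"SELECT COUNT(DISTINCT {column}), COUNT(*) FROM {table}"
    ).fetchone()
    return distinct / total if total else 0.0

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, invoice_number TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO invoices (invoice_number, status) VALUES (?, ?)",
    [(f"INV-{n:04d}", "OPEN" if n % 2 else "PAID") for n in range(8)],
)

print(selectivity(conn, "invoices", "status"))          # 0.25
print(selectivity(conn, "invoices", "invoice_number"))  # 1.0
```

A column like status, with two values across eight rows, narrows a search very little; a near-unique column like invoice_number narrows it to a single row, which is why an index on it is far more attractive to an optimizer.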
17. Benefits of 5S Approach
Get to root cause faster
Greater degree of insight (no Voodoo!)
Eliminates distracting possibilities
Time and money saved
Better utilization of resources (people and hardware)
Works both in reactive and proactive situations
Outcome provides measured response
Root cause reporting is much easier