Why & how to optimize sql server for performance from design to query
1. Why & How to optimize SQL Server
for performance from design to query
Antonios Chatzipavlis
Software Architect, Development Evangelist, IT Consultant
MCT, MCITP, MCPD, MCSD, MCDBA, MCSA, MCTS, MCAD, MCP, OCA
2. Objectives
• Why is Performance Tuning Necessary?
• How to Optimize SQL Server for performance
  • Optimizing Database Design
  • Optimizing Queries for performance
  • Optimizing an Indexing Strategy
• How to troubleshoot SQL Server
  • Define and implement monitoring standards for database servers and instances.
  • Troubleshoot database server & database performance issues.
  • Troubleshoot SQL Server connectivity issues.
  • Troubleshoot SQL Server concurrency issues.
4. Why is Performance Tuning Necessary?
• Allowing your system to scale
• Adding more customers
• Adding more features
• Improve overall system performance
• Save money by not wasting resources
• The database is typically one of the most expensive resources in a datacenter
5. General Scaling Options (1)
• Scaling SQL Server with Bigger Hardware
  • Purchase a larger server, and replace the existing system.
  • Works well with smaller systems.
  • Cost prohibitive for larger systems.
  • Can be a temporary solution.
• Scaling SQL Server with More Hardware
  • Purchase more hardware and split or partition the database.
  • Partitioning can be either vertical or horizontal
    • Vertical: Split components out of one database into another.
    • Horizontal: Split the database based on a specific demographic such as time zone or zip code.
6. General Scaling Options (2)
• Scaling SQL Server without adding hardware
  • Adjusting and rewriting queries.
  • Adding indexes.
  • Removing indexes.
  • Re-architecting the database schema.
  • Moving things that shouldn’t be in the database.
  • Eliminating redundant work on the database.
  • Caching of data.
  • Other performance tuning techniques.
7. How to Optimize SQL Server for performance
Optimizing Database Design
10. Normalization
In this process you organize data to minimize redundancy, which
eliminates duplicated data and logical ambiguities in the database.
Normal Form   Description
First         Every attribute is atomic, and there are no repeating groups.
Second        Complies with First Normal Form, and all non-key columns depend on the whole key.
Third         Complies with Second Normal Form, and all non-key columns are non-transitively dependent upon the primary key.
11. Denormalization
In this process you re-introduce redundancy to the
database to optimize performance.
When to use denormalization:
• To pre-aggregate data
• To avoid multiple/complex joins
When not to use denormalization:
• To prevent simple joins
• To provide reporting data
• To prevent same row calculations
12. Generalization
In this process you group similar entities together into a
single entity to reduce the amount of required data access
code.
Use generalization when:
• A large number of entities appear to be of the same type
• Multiple entities contain the same attributes
Do not use generalization when:
• It results in an overly complex design that is difficult to manage
13. How to Optimize SQL Server for performance
Optimizing Queries for performance
14. Key Measures for Query Performance
Key factors for query performance:
• Resources used to execute the query
• Time required for query execution
SQL Server tools to measure query performance:
• Performance Monitor
• SQL Server Profiler
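Both key factors can also be observed for a single query directly from a query window. The sketch below is an addition to the slide, not one of the tools it lists: it uses the session options SET STATISTICS IO and SET STATISTICS TIME, and it assumes the dbo.Customers and dbo.Orders sample tables used later in this deck.

```sql
-- Report resource use (logical/physical reads) and elapsed/CPU time
-- for every statement that follows in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Any query under investigation; this one uses the deck's sample tables.
SELECT C.customerid, COUNT(O.orderid) AS numorders
FROM dbo.Customers AS C
LEFT OUTER JOIN dbo.Orders AS O
    ON C.customerid = O.customerid
GROUP BY C.customerid;

-- Switch the extra output off again.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear in the Messages output of the client rather than in the result set, so they are suited to ad hoc comparisons of query variations rather than continuous monitoring.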
15. Useful Performance Counters
SQLServer:Access Methods
Range Scans/sec.
Measures the number of qualified range scans through indexes in the last
second.
Full Scans/sec.
Measures the number of unrestricted full scans in the last second.
Index Searches/sec.
Measures the number of index searches in the last second.
Table Lock Escalations/sec.
Measures the number of lock escalations on a table.
Worktables Created/sec.
Measures the number of worktables created in the last second.
16. Useful Performance Counters
SQLServer:SQL Statistics
Batch Requests/sec.
Measures the number of Transact-SQL command batches received per
second. High batch requests mean good throughput.
SQL Compilations/sec.
Measures the number of SQL compilations per second. This value reaches
a steady state after SQL Server user activity is stable.
SQL Re-Compilations/sec.
Measures the number of SQL recompiles per second.
18. Useful Performance Counters
SQLServer:Transactions
Longest Transaction Running Time.
Measures the length of time in seconds since the start of the transaction
that has been active longer than any other current transaction. If this
counter shows a very long transaction, you can query the
sys.dm_tran_active_transactions view to identify the transaction.
Update conflict ratio.
Measures the percentage of those transactions using the snapshot
isolation level that have encountered update conflicts within the last
second.
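As a minimal sketch of that follow-up step, the query below lists the oldest active transactions first. The join to sys.dm_tran_session_transactions is an addition made here to show which session owns each transaction; the TOP (5) limit is an arbitrary choice.

```sql
-- Oldest active transactions first, with the owning session where one exists.
SELECT TOP (5)
    tat.transaction_id,
    tat.name,
    tat.transaction_begin_time,
    DATEDIFF(SECOND, tat.transaction_begin_time, GETDATE()) AS running_seconds,
    tst.session_id
FROM sys.dm_tran_active_transactions AS tat
LEFT JOIN sys.dm_tran_session_transactions AS tst
    ON tst.transaction_id = tat.transaction_id
ORDER BY tat.transaction_begin_time;
```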
19. Useful Performance Counters
SQLServer:Locks
Average Wait Time (ms).
Measures the average wait time for each lock request that resulted in a
wait.
Lock Requests/sec.
Measures the number of locks and lock conversions per second.
Lock Wait Time (ms).
Measures the total wait time for locks in the last second.
Lock Waits/sec.
Measures the number of lock requests per second that required the caller
to wait.
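The same counters can be read with Transact-SQL through the sys.dm_os_performance_counters view, which may be convenient for scripted checks. This is a sketch under two assumptions: the object name is matched with LIKE because its prefix differs between default and named instances, and only the _Total instance is of interest.

```sql
-- Read the lock counters for all resource types combined.
SELECT
    RTRIM(object_name)   AS object_name,
    RTRIM(counter_name)  AS counter_name,
    RTRIM(instance_name) AS instance_name,
    cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%:Locks%'
  AND counter_name IN ('Lock Requests/sec',
                       'Lock Waits/sec',
                       'Lock Wait Time (ms)',
                       'Average Wait Time (ms)')
  AND instance_name = '_Total';
```

In this view the per-second counters hold cumulative values, so a rate has to be derived from two samples taken a known interval apart.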
20. Useful SQL Profiler Events
• Stored Procedures category:
  • RPC:Completed occurs when a remote procedure call has completed.
  • SP:Completed occurs when a stored procedure has completed.
  • SP:StmtCompleted occurs when a T-SQL statement in a stored procedure has completed.
• TSQL category:
  • SQL:BatchCompleted occurs when a Transact-SQL batch has completed.
  • SQL:StmtCompleted occurs when a T-SQL statement has completed.
• Locks category:
  • Lock:Acquired occurs when a transaction acquires a lock on a resource.
  • Lock:Released occurs when a transaction releases a lock on a resource.
  • Lock:Timeout occurs when a lock request has timed out because another transaction holds a blocking lock on the required resource.
21. Guidelines for Identifying Locking and Blocking
• Use Activity Monitor
• Use SQL Server Profiler blocked process report
• Watch for situations in which the same procedure executes in different amounts of time
• Identify the transaction isolation level of the procedure
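Alongside these tools, currently blocked requests can be listed with a dynamic management view query. The sketch below is an addition to the slide's list; it relies on sys.dm_exec_requests and sys.dm_exec_sql_text, and shows the isolation level as the numeric code stored in the view.

```sql
-- Requests that are waiting on another session right now.
SELECT
    r.session_id,
    r.blocking_session_id,          -- session holding the blocking lock
    r.wait_type,
    r.wait_time,                    -- milliseconds spent waiting so far
    r.wait_resource,
    r.transaction_isolation_level,  -- 1 = read uncommitted ... 5 = snapshot
    t.text AS sql_text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;
```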
23. Sample
Customers Table Data
customerid  city
ANTON       Athens
CHRIS       Salonica
FANIS       Athens
NASOS       Athens
Orders Table Data
orderid  customerid
1        NASOS
2        NASOS
3        FANIS
4        FANIS
5        FANIS
6        CHRIS
7        NULL
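To follow the next slides hands-on, the sample data can be recreated with a short script. The column data types and key choices below are assumptions; the slides only show the column names and values.

```sql
-- Sample tables; data types are assumed, not taken from the slides.
CREATE TABLE dbo.Customers
(
    customerid CHAR(5)     NOT NULL PRIMARY KEY,
    city       VARCHAR(10) NOT NULL
);

CREATE TABLE dbo.Orders
(
    orderid    INT     NOT NULL PRIMARY KEY,
    customerid CHAR(5) NULL   -- order 7 has no customer
);

INSERT INTO dbo.Customers (customerid, city)
VALUES ('ANTON', 'Athens'),
       ('CHRIS', 'Salonica'),
       ('FANIS', 'Athens'),
       ('NASOS', 'Athens');

INSERT INTO dbo.Orders (orderid, customerid)
VALUES (1, 'NASOS'), (2, 'NASOS'),
       (3, 'FANIS'), (4, 'FANIS'), (5, 'FANIS'),
       (6, 'CHRIS'),
       (7, NULL);
```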
24. Sample
SELECT C.customerid,
COUNT(O.orderid) AS numorders
FROM dbo.Customers AS C
LEFT OUTER JOIN dbo.Orders AS O
ON C.customerid = O.customerid
WHERE C.city = 'Athens'
GROUP BY C.customerid
HAVING COUNT(O.orderid) < 3
ORDER BY numorders;
customerid  numorders
ANTON       0
NASOS       2
25. 1st Step - Cross Join
FROM dbo.Customers AS C ... JOIN dbo.Orders AS O
C.customerid  C.city    O.orderid  O.customerid
ANTON         Athens    1          NASOS
ANTON         Athens    2          NASOS
ANTON         Athens    3          FANIS
ANTON         Athens    4          FANIS
ANTON         Athens    5          FANIS
ANTON         Athens    6          CHRIS
ANTON         Athens    7          NULL
CHRIS         Salonica  1          NASOS
CHRIS         Salonica  2          NASOS
CHRIS         Salonica  3          FANIS
CHRIS         Salonica  4          FANIS
CHRIS         Salonica  5          FANIS
CHRIS         Salonica  6          CHRIS
CHRIS         Salonica  7          NULL
FANIS         Athens    1          NASOS
FANIS         Athens    2          NASOS
FANIS         Athens    3          FANIS
FANIS         Athens    4          FANIS
FANIS         Athens    5          FANIS
FANIS         Athens    6          CHRIS
FANIS         Athens    7          NULL
NASOS         Athens    1          NASOS
NASOS         Athens    2          NASOS
NASOS         Athens    3          FANIS
NASOS         Athens    4          FANIS
NASOS         Athens    5          FANIS
NASOS         Athens    6          CHRIS
NASOS         Athens    7          NULL
26. 2nd Step - Apply Join Condition (ON Filter)
ON C.customerid = O.customerid
C.customerid  C.city    O.orderid  O.customerid  ON Filter
ANTON         Athens    1          NASOS         FALSE
ANTON         Athens    2          NASOS         FALSE
ANTON         Athens    3          FANIS         FALSE
ANTON         Athens    4          FANIS         FALSE
ANTON         Athens    5          FANIS         FALSE
ANTON         Athens    6          CHRIS         FALSE
ANTON         Athens    7          NULL          UNKNOWN
CHRIS         Salonica  1          NASOS         FALSE
CHRIS         Salonica  2          NASOS         FALSE
CHRIS         Salonica  3          FANIS         FALSE
CHRIS         Salonica  4          FANIS         FALSE
CHRIS         Salonica  5          FANIS         FALSE
CHRIS         Salonica  6          CHRIS         TRUE
CHRIS         Salonica  7          NULL          UNKNOWN
FANIS         Athens    1          NASOS         FALSE
FANIS         Athens    2          NASOS         FALSE
FANIS         Athens    3          FANIS         TRUE
FANIS         Athens    4          FANIS         TRUE
FANIS         Athens    5          FANIS         TRUE
FANIS         Athens    6          CHRIS         FALSE
FANIS         Athens    7          NULL          UNKNOWN
NASOS         Athens    1          NASOS         TRUE
NASOS         Athens    2          NASOS         TRUE
NASOS         Athens    3          FANIS         FALSE
NASOS         Athens    4          FANIS         FALSE
NASOS         Athens    5          FANIS         FALSE
NASOS         Athens    6          CHRIS         FALSE
NASOS         Athens    7          NULL          UNKNOWN
Rows for which the ON filter is TRUE:
C.customerid  C.city    O.orderid  O.customerid
CHRIS         Salonica  6          CHRIS
FANIS         Athens    3          FANIS
FANIS         Athens    4          FANIS
FANIS         Athens    5          FANIS
NASOS         Athens    1          NASOS
NASOS         Athens    2          NASOS
27. 3rd Step - Apply OUTER Join
FROM dbo.Customers AS C LEFT OUTER JOIN dbo.Orders AS O
C.customerid  C.city    O.orderid  O.customerid
CHRIS         Salonica  6          CHRIS
FANIS         Athens    3          FANIS
FANIS         Athens    4          FANIS
FANIS         Athens    5          FANIS
NASOS         Athens    1          NASOS
NASOS         Athens    2          NASOS
ANTON         Athens    NULL       NULL
37. Considerations to Take When Using Subqueries
A subquery can produce one of three kinds of result:
• Expression: the subquery returns a single scalar value.
• Single-column table: the subquery returns a single column of values.
• Data set: the subquery returns multiple columns.
How the result is used depends on the SELECT statement element:
• Select list
  • Expression: The subquery’s result is used as an expression supplying the value for the column.
• FROM clause (derived table). This is the only location where a subquery can act as a table.
  • Expression, single-column table, or data set: The subquery’s data set is used as a virtual table within the outer query.
• WHERE clause, comparison predicates: x {=, >, <, >=, <=, <>} ().
  • Expression: The predicate is true if the test value compares with the subquery’s scalar value and returns true.
• WHERE clause, IN predicate: x IN ().
  • Expression: The predicate is true if the test value is equal to the value returned by the subquery.
  • Single-column table: The predicate is true if the test value is found within the values returned by the subquery.
• WHERE clause, EXISTS predicate: EXISTS (x).
  • Expression, single-column table, or data set: The predicate is true if the subquery returns at least one row.
Consider queries on a case-by-case basis.
40. Favor set-based logic over procedural
or cursor logic
• The most important factor to consider when tuning
queries is how to properly express logic in a set-based
manner.
• Cursors or other procedural constructs limit the query
optimizer’s ability to generate flexible query plans.
• Cursors can therefore reduce the possibility of
performance improvements in many situations.
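The contrast can be sketched with the dbo.Orders sample table from earlier in the deck. The task itself (assigning the customer 'ANTON' to orders that have no customer) is a hypothetical chosen only for illustration; the two variants are alternatives that produce the same end state.

```sql
-- Procedural version: one UPDATE per row, driven by a cursor.
DECLARE @orderid INT;

DECLARE order_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT orderid FROM dbo.Orders WHERE customerid IS NULL;

OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @orderid;

WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET customerid = 'ANTON' WHERE orderid = @orderid;
    FETCH NEXT FROM order_cursor INTO @orderid;
END;

CLOSE order_cursor;
DEALLOCATE order_cursor;

-- Set-based version: the same change expressed as a single statement,
-- which leaves the choice of access path to the optimizer.
UPDATE dbo.Orders
SET customerid = 'ANTON'
WHERE customerid IS NULL;
```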
42. Test query variations for performance
• The query optimizer can often produce widely different
plans for logically equivalent queries.
• Test different techniques, such as joins or subqueries,
to find out which perform better in various situations.
44. Avoid query hints.
• You must work with the SQL Server query optimizer,
rather than against it, to create efficient queries.
• Query hints tell the query optimizer how to behave and
therefore override the optimizer’s ability to do its job
properly.
• If you eliminate the optimizer’s choices, you might limit
yourself to a query plan that is less than ideal.
• Use query hints only when you are absolutely certain
that the query optimizer is incorrect.
46. Use correlated subqueries to improve performance.
• Since the query optimizer is able to integrate subqueries into the
main query flow in a variety of ways, subqueries might help in
various query tuning situations.
• Subqueries can be especially useful in situations in
which you create a join to a table only to verify the
existence of correlated rows. For better performance,
replace these kinds of joins with correlated subqueries
that make use of the EXISTS operator.
--Using a LEFT JOIN
SELECT a.parent_key FROM parent_table a
LEFT JOIN child_table b
ON a.parent_key = b.parent_key
WHERE b.parent_key IS NULL
--Using a NOT EXISTS
SELECT a.parent_key FROM parent_table a
WHERE NOT EXISTS
(SELECT * FROM child_table b WHERE a.parent_key = b.parent_key)
48. Avoid using a scalar user-defined
function in the WHERE clause.
• Scalar user-defined functions, unlike scalar subqueries,
are not optimized into the main query plan.
• Instead, SQL Server calls them row by row, as if by a
hidden cursor.
• This is especially troublesome in the WHERE clause
because the function is called for every input row.
• Using a scalar function in the SELECT list is much less
problematic because the rows have already been
filtered in the WHERE clause.
50. Use table-valued user-defined functions as derived tables.
• In contrast to scalar user-defined functions, table-valued
functions are often helpful from a performance
point of view when you use them as derived tables.
• The query processor evaluates a derived table only
once per query.
• If you embed the logic in a table-valued user-defined
function, you can encapsulate and reuse it for other
queries.
CREATE FUNCTION Sales.fn_SalesByStore (@storeid int)
RETURNS TABLE AS RETURN
(
SELECT P.ProductID, P.Name,
SUM(SD.LineTotal) AS 'YTD Total'
FROM Production.Product AS P
JOIN Sales.SalesOrderDetail AS SD
ON SD.ProductID = P.ProductID
JOIN Sales.SalesOrderHeader AS SH
ON SH.SalesOrderID = SD.SalesOrderID
WHERE SH.CustomerID = @storeid
GROUP BY P.ProductID, P.Name
)
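Once created, the function can be queried in the FROM clause like any other table source. A minimal usage sketch follows; the @storeid value is a placeholder, not a customer taken from the slides.

```sql
-- Use the inline table-valued function as a derived table.
DECLARE @storeid int = 1;  -- hypothetical customer/store ID

SELECT S.ProductID, S.Name, S.[YTD Total]
FROM Sales.fn_SalesByStore(@storeid) AS S
ORDER BY S.[YTD Total] DESC;
```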
52. Avoid unnecessary GROUP BY columns
• Use a subquery instead.
• The process of grouping rows becomes more expensive
as you add more columns to the GROUP BY list.
• If your query has few column aggregations but many
non-aggregated grouped columns, you might be able
to refactor it by using a correlated scalar subquery.
• This will result in less work for grouping in the query
and therefore possibly better overall query
performance.
SELECT p1.ProductSubcategoryID,
p1.Name
FROM Production.Product p1
WHERE p1.ListPrice >
( SELECT AVG (p2.ListPrice)
FROM Production.Product p2
WHERE p1.ProductSubcategoryID =
p2.ProductSubcategoryID)
54. Use CASE expressions to include
variable logic in a query
• The CASE expression is one of the most powerful logic
tools available to T-SQL programmers.
• Using CASE, you can dynamically change column
output on a row-by-row basis.
• This enables your query to return only the data that is
absolutely necessary and therefore reduces the I/O
operations and network overhead that are required to
assemble and send large result sets to clients.
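A small sketch of row-by-row output logic with CASE, using the Production.Product table that appears elsewhere in this deck. The price bands and labels are arbitrary values chosen for illustration.

```sql
-- Derive a label per row instead of returning raw prices for the
-- client to classify.
SELECT
    P.ProductID,
    P.Name,
    CASE
        WHEN P.ListPrice = 0    THEN 'Not for resale'
        WHEN P.ListPrice < 50   THEN 'Low'
        WHEN P.ListPrice < 1000 THEN 'Medium'
        ELSE 'High'
    END AS PriceBand
FROM Production.Product AS P;
```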
56. Divide joins into temporary tables
when you query very large tables.
• The query optimizer’s main strategy is to find query plans
that satisfy queries by using single operations.
• Although this strategy works for most cases, it can fail for
larger sets of data because the huge joins require so much
I/O overhead.
• In some cases, a better option is to reduce the working set
by using temporary tables to materialize key parts of the
query. You can then join the temporary tables to produce a
final result.
• This technique is not favorable in heavily transactional
systems because of the overhead of temporary table
creation, but it can be very useful in decision support
situations.
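The approach can be sketched as follows, assuming the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables referenced earlier in the deck. The date cutoff and the temporary table name are hypothetical.

```sql
-- Step 1: materialize only the order headers of interest.
SELECT SH.SalesOrderID, SH.CustomerID
INTO #RecentOrders
FROM Sales.SalesOrderHeader AS SH
WHERE SH.OrderDate >= '20080101';  -- hypothetical cutoff

-- Optional: index the temporary table on the join column.
CREATE CLUSTERED INDEX IX_RecentOrders
    ON #RecentOrders (SalesOrderID);

-- Step 2: join the reduced working set to the large detail table.
SELECT R.CustomerID, SUM(SD.LineTotal) AS TotalSales
FROM #RecentOrders AS R
JOIN Sales.SalesOrderDetail AS SD
    ON SD.SalesOrderID = R.SalesOrderID
GROUP BY R.CustomerID;

DROP TABLE #RecentOrders;
```

Whether this beats the single-statement form depends on data volumes, so it is worth comparing both variants on realistic data.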
58. Refactoring Cursors into Queries.
• Rebuild logic as multiple queries
• Rebuild logic as a user-defined function
• Rebuild logic as a complex query with a case expression
61. Stored Procedures Best Practices
• Avoid stored procedures that accept parameters for
table names
• Use the SET NOCOUNT ON option in stored procedures
• Limit the use of temporary tables and table variables in
stored procedures
• If a stored procedure does multiple data modification
operations, make sure to enlist them in a transaction.
• When working with dynamic T-SQL, use sp_executesql
instead of the EXEC statement
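Two of these practices can be sketched together: SET NOCOUNT ON, and sp_executesql with a typed parameter instead of string concatenation passed to EXEC. The procedure name is hypothetical, and dbo.Orders is the sample table from earlier in the deck.

```sql
-- Hypothetical procedure illustrating SET NOCOUNT ON and sp_executesql.
CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
    @customerid CHAR(5)
AS
BEGIN
    -- Suppress the "n rows affected" messages sent to the client.
    SET NOCOUNT ON;

    DECLARE @sql NVARCHAR(200) =
        N'SELECT orderid, customerid
          FROM dbo.Orders
          WHERE customerid = @cust;';

    -- The value is passed as a parameter, so the plan can be reused
    -- and the input is never concatenated into the statement text.
    EXEC sys.sp_executesql
        @sql,
        N'@cust CHAR(5)',
        @cust = @customerid;
END;
```

In a separate batch, the procedure could then be called with EXEC dbo.usp_GetOrdersByCustomer @customerid = 'NASOS'.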
62. Views Best Practices
• Use views to abstract complex data structures
• Use views to encapsulate aggregate queries
• Use views to provide more user-friendly column names
• Think of reusability when designing views
• Avoid using the ORDER BY clause in views that contain
a TOP 100 PERCENT clause.
• Utilize indexes on views that include aggregate data
63. How to Optimize SQL Server for performance
Optimizing an Indexing Strategy
66. Guidelines for designing indexes
• Examine the database characteristics.
For example, your indexing strategy will differ between an online transaction processing system with frequent data updates
and a data warehousing system that contains primarily read-only data.
• Understand the characteristics of the most frequently used queries
and the columns used in the queries.
For example, you might need to create an index to support a query that joins tables or that uses a unique column for its search
argument.
• Decide on the index options that might enhance the performance
of the index.
Options that can affect the efficiency of an index include FILLFACTOR and ONLINE.
• Determine the optimal storage location for the index.
You can choose to store a nonclustered index in the same filegroup as the table or on a different filegroup. If you store the
index in a filegroup that is on a different disk than the table filegroup, you might find that disk I/O performance improves
because multiple disks can be read at the same time.
• Balance read and write performance in the database.
You can create many nonclustered indexes on a single table, but it is important to remember that each new index has an
impact on the performance of insert and update operations. This is because nonclustered indexes maintain copies of the
indexed data. Each copy of the data requires I/O operations to maintain it, and you might cause a reduction in write
performance if the database has to write too many copies. You must ensure that you balance the needs of both select queries
and data updates when you design an indexing strategy.
• Consider the size of tables in the database.
The query processor might take longer to traverse the index of a small table than to perform a simple table scan. Therefore, if
you create an index on a small table, the processor might never use the index. However, the database engine must still
update the index when the data in the table changes.
• Consider the use of indexed views.
Indexes on views can provide significant performance gains when the view contains aggregations, table joins, or both.
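The index options and storage location mentioned above can be sketched in a single statement. The index name, the fill factor value, and the [INDEXES] filegroup are assumptions; the filegroup must already exist, and ONLINE = ON is only available in editions that support online index operations.

```sql
-- Nonclustered index on the join column of the sample Orders table.
CREATE NONCLUSTERED INDEX IX_Orders_customerid
ON dbo.Orders (customerid)
WITH (FILLFACTOR = 90,   -- leave 10 percent free space in leaf pages
      ONLINE = ON)       -- keep the table available during the build
ON [INDEXES];            -- hypothetical filegroup on a separate disk
```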
67. Nonclustered Index (do’s & don’ts)
• Create a nonclustered index for columns used for:
  • Predicates
  • Joins
  • Aggregation
• Avoid the following when designing nonclustered indexes:
  • Redundant indexes
  • Wide composite indexes
  • Indexes for one query
  • Nonclustered indexes that include the clustered index
68. Clustered Indexes (do’s & don’ts)
• Use clustered indexes for:
  • Range queries
  • Primary key queries
  • Queries that retrieve data from many columns
• Do not use clustered indexes for:
  • Columns that have frequent changes
  • Wide keys
70. How to troubleshoot SQL Server
Define and implement monitoring standards for database servers and instances.
71. Monitoring Stages
Stage 1: Monitoring the database environment
Stage 2: Narrowing down a performance issue to a particular database environment area
Stage 3: Narrowing down a performance issue to a particular database environment object
Stage 4: Troubleshooting individual problems
Stage 5: Implementing a solution
72. How to Optimize SQL Server for performance
Troubleshoot database server and database performance issues.
73. Monitoring the database
environment
• You must collect a broad range of performance data.
• The monitoring system must provide you with enough data to
solve the current performance issues.
• You must set up a monitoring solution that collects data from a
broad range of sources.
• For active data, use active collection tools:
  • System Monitor
  • Error Logs
  • SQL Server Profiler
• For inactive data, use sources such as:
  • Database configuration settings
  • Server configuration settings
  • Metadata from SQL Server installation and databases
74. Narrowing Down a Performance
Issue to a Particular Database
• Analyze the performance data that you collect
• Identify the performance issues.
• The combination of data that you have gathered helps
you identify database areas on which you need to
concentrate.
• Revisit the monitoring solution to gather additional
data. This often provides clues that you can use to
define the scope of the investigation and focus on a
particular database object or server configuration.
• After identifying the object, you can begin
troubleshooting performance issues and solve the
problem.
75. Guidelines for Auditing and
Comparing Test Results
• Scan the outputs gathered for any obvious
performance issues.
• Automate the analysis with the use of custom scripts
and tools.
• Analyze data soon after it is collected.
  • Performance data has a short life span, and if there is a delay, the quality of the analysis will suffer.
• Do not stop analyzing data when you discover the first set of issues.
  • Continue to analyze until all performance issues have been identified.
• Take into account the entire database environment when you analyze performance data.
77. SQL Server Profiler guidelines
• Schedule data tracing for peak and nonpeak hours
• Use Transact-SQL to create your own SQL Server
Profiler traces to minimize the performance impact of
SQL Server Profiler.
• Do not collect the SQL Server Profiler traces directly
into a SQL Server table.
  • After the trace has ended, use the fn_trace_gettable function to load the
  data into a table.
• Store collected data on a computer that is not the
instance that you are tracing.
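Loading a finished trace file into a table can be sketched as follows. The file path and the target table name are hypothetical, and the columns in the second query are only populated if the trace captured them.

```sql
-- Load the trace file (and any rollover files) into a new table.
SELECT *
INTO dbo.TraceResults
FROM sys.fn_trace_gettable(N'C:\Traces\workload.trc', DEFAULT);

-- Example follow-up: the longest-running captured events.
SELECT TOP (10) TextData, Duration, CPU, Reads, Writes
FROM dbo.TraceResults
ORDER BY Duration DESC;
```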
78. System Monitor guidelines
• Execute System Monitor traces at different times during
the week and month.
• Collect data every 36 seconds for a week.
• If the data collection period spans more than a week,
set the collection time interval in the range of 300 to
600 seconds.
• Collect the data in a comma-delimited text file. You can
load this text file into SQL Server Profiler for further
analysis.
• Execute System Monitor on one server to collect the
performance data of another server.
79. SQLDIAG
• Is a general purpose diagnostics collection utility
• Can be run as a console application or as a service.
• Is intended to expedite and simplify diagnostic
information gathering for Microsoft Customer Support
Services.
• Collects the following types of diagnostic information:
  • Windows performance logs
  • Windows event logs
  • SQL Server Profiler traces
  • SQL Server blocking information
  • SQL Server configuration information
81. DMVs for Monitoring
• sys.dm_os_threads
Returns a list of all SQL Server Operating System threads that are running under the
SQL Server process.
• sys.dm_os_memory_pools
Returns a row for each object store in the instance of SQL Server. You can use this
view to monitor cache memory use and to identify bad caching behavior.
• sys.dm_os_memory_cache_counters
Returns a snapshot of the health of a cache, provides run-time information about the
cache entries allocated, their use, and the source of memory for the cache entries.
• sys.dm_os_wait_stats
Returns information about all the waits encountered by threads that executed. You
can use this aggregated view to diagnose performance issues with SQL Server and
also with specific queries and batches.
• sys.dm_os_sys_info
Returns a miscellaneous set of useful information about the computer, and about the
resources available to and consumed by SQL Server.
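As a sketch of how such a view is typically queried, the statement below lists the wait types with the most accumulated wait time from sys.dm_os_wait_stats. The TOP (10) limit is arbitrary, and the values are cumulative since the statistics were last cleared or the instance was started.

```sql
-- Wait types with the highest accumulated wait time.
SELECT TOP (10)
    wait_type,
    waiting_tasks_count,
    wait_time_ms,
    signal_wait_time_ms,
    wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0
ORDER BY wait_time_ms DESC;
```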
82. Performance Data Collector
• Management Data Warehouse
• Performance Data Collection
  • Performance data collection components
  • System collection sets
  • User-defined collection sets
  • Reporting
• Centralized Administration: Bringing it all together
Performance Data Collection and Reporting
84. How to troubleshoot SQL Server
Troubleshoot SQL Server connectivity issues.
85. Areas to Troubleshoot for Common
Connectivity Issues
• Server
  • Service pack
  • Database configuration
  • Surface area configuration policies
  • Account status
• Client and server
  • Network protocols
  • Net library
• Other network devices
  • Firewall port configuration
  • DNS entries
86. SQL Server Endpoints
Server endpoints
Enable connection over network with client
Enable configuration based on TCP port numbers
Are managed by statements:
CREATE ENDPOINT
ALTER ENDPOINT
DROP ENDPOINT
Types of endpoint
SOAP
TSQL
Service Broker
Database Mirroring
87. How to troubleshoot SQL Server
Troubleshoot SQL Server concurrency issues.
89. Guidelines to Reduce Locking and
Blocking
• Keep logical transactions short
• Avoid cursors
• Use efficient and well-indexed queries
• Use the minimum transaction isolation level required
• Keep triggers to a minimum
90. Minimizing Deadlocks
• Access objects in the same order.
• Avoid user interaction in transactions.
• Keep transactions short and in one batch.
• Use a lower isolation level.
• Use a row versioning–based isolation level.
  • Set the READ_COMMITTED_SNAPSHOT database option ON to enable read-committed transactions to use row versioning.
  • Use snapshot isolation.
• Use bound connections.
  • Bound connections allow two or more connections to share the same transaction and locks. Bound connections can work on the same data without lock conflicts. Bound connections can be created from multiple connections within the same application, or from multiple applications with separate connections. Bound connections make coordinating actions across multiple connections easier. For more information see Books Online http://msdn.microsoft.com/en-us/library/aa213063(SQL.80).aspx
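The two row versioning options can be sketched as follows. SalesDb is a hypothetical database name assumed to contain the dbo.Orders sample table, and the session running the SELECT is assumed to be connected to that database.

```sql
-- Let read-committed transactions read row versions instead of taking
-- shared locks. This change typically needs the database to have no
-- other active connections while it runs.
ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;

-- Allow sessions to request snapshot isolation explicitly.
ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In a session connected to SalesDb: read a consistent snapshot
-- without blocking writers.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT orderid, customerid
    FROM dbo.Orders;
COMMIT TRANSACTION;
```

Row versioning moves work to tempdb, so its load is worth watching after either option is enabled.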
91. What Are SQL Server Latches?
• Latches are:
  • Objects used to synchronize data pages
  • Released immediately after the operation
• Latch waits:
  • Occur when a requested latch is held by another thread
  • Can be monitored with the counters:
    • Latch Waits/sec
    • Average Latch Wait Time (ms)
    • Total Latch Wait Time (ms)
  • Increase under memory or disk I/O pressure