6. Category / Metric
Largest single database: 70 TB
Largest table: 20 TB
Biggest total data, 1 application: 88 PB
Highest database transactions per second, 1 db (from Perfmon): 130,000
Fastest I/O subsystem in production (SQLIO 64k buffer): 18 GB/sec
Fastest “real time” cube: 5 sec latency
Data load for 1TB: 30 minutes
Largest cube: 12 TB
7. Company Profile
• World’s largest publicly listed online gaming platform
• 20 million registered customers in more than 25 core markets
• >14,000 bets offered simultaneously on more than 90 sports
• ~90 live events with video every day – bwin is the world’s largest broadcaster of live sports
• >70,000 payment transactions per day (PCI Level 1 and ISO 27001 certified)
Business Requirements
• Failure is not an option
• 100% transactional consistency, zero data loss
• 99.998% availability...even after loss of a data center
• Performance critical
• Must scale to handle every user and give them a great experience
• Protect users privacy and financial information
• Provide a secure PCI compliant environment for all customers
8. SQL Server Environment
• 100+ SQL Server Instances
• 120+ TB of data
• 1,400+ Databases
• 1,600+ TB storage
• 450,000+ SQL Statements per second on a single server
• 500+ Billion database transactions per day
Core component in solutions designated for:
• Financial transactions
• Gaming environments
• Tracking user state throughout the system
Solutions primarily scale-up using commodity hardware
10. Single high-transaction-throughput system: mission critical to the business in terms of performance and availability.
11. Project Description
Maintains US Equities and Options trading data
Processing tens of billions of transactions per day
Average over 1 million business transactions/sec into SQL Server
Peak: 10 million/sec
Requires last 7 years of online data
Data is used to comply with government regulations
Requirements for “real-time” query and analysis
Approximately 500 TB per year, totaling over 2PB of uncompressed data
Largest tables approaching 10TB (page compressed) in size
Early adopter; upgraded to SQL Server 2014 in order to:
Better manage data growth
Improve query performance
Reduce database maintenance time
12. Data at this scale requires breaking things down into manageable units:
Separate data into different logical areas:
• A database per subject area (17)
• A database per subject area per year (last 7 years)
Table and Index Partitioning:
• 255 partitions per database
• 25,000 filegroups
• Filegroup to partition alignment for easier management/less impact moving data
• Filegroup backups
Taking advantage of compression:
• Compression per partition
• Backup compression
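The two compression points above can be sketched as follows (object, partition, and path names are illustrative, not from the project described):

```sql
-- Page-compress a single partition of a partitioned table (names are hypothetical):
ALTER TABLE dbo.Trades
REBUILD PARTITION = 3
WITH (DATA_COMPRESSION = PAGE);

-- Compressed backup of one filegroup (name and path are hypothetical),
-- combining filegroup backups with backup compression:
BACKUP DATABASE TradesDB
FILEGROUP = 'FG_2013'
TO DISK = N'X:\backup\TradesDB_FG_2013.bak'
WITH COMPRESSION;
```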
20. Use Disk Alignment at 1024KB
Use GPT if MBR not large enough
Format partitions at 64KB allocation unit size
One partition per LUN
Only use Dynamic Disks when there is a need to
stripe LUNs using Windows striping (e.g. an Analysis
Services workload)
Tools:
Diskpar.exe, DiskPart.exe and DmDiag.exe
Format.exe, fsutil.exe
Disk Manager
21. Here is a graph of performance improvement from
Microsoft’s white paper:
24. RAID-1 is OK for log files and data files but you can do
better…
RAID-5 is a BIG NO! for anything except read-only or read-mostly data files
RAID-10 is your best bet (but most expensive)
NEVER put OLTP log files on RAID-5!
If you can afford it:
Stripe And Mirror Everything (SAME) – one HUGE
RAID-10
SSD is even better – consider for tempdb and/or log files
If adventurous, use RAW partitions (see BOL)
25. As much as you can get…
…and more!
64-bit is great for memory-intensive workloads
If still on 32-bit, use AWE
Are you sharing the box? How much memory
needs to be set aside? Set max/min server
memory as needed.
Observe where all this memory goes:
Data Cache vs. Procedure Cache vs. Lock Manager
vs. Other
Keep an eye out for the “A significant part of sql server
process memory has been paged out” message in the
errorlog.
26. Min/max server memory – when needed.
Locked pages:
32-bit – when using AWE
x64 Enterprise Edition – just grant the “Lock Pages in
Memory” privilege
x64 Standard Edition – must have a hotfix and enable
TF845 (see KB970070 for details)
Large Pages:
ONLY dedicated 64-bit servers with more than 8GB of
RAM!
Enabled with TF834 – see KB920093
Server sloooooooow to start – be warned!
27. CPU is rarely the real bottleneck – look for WHY
we are using so much CPU power!
Use affinity mask as needed:
Splitting the CPUs between applications (or SQL
instances)
Moving SQL Server OFF the CPU that serves NIC
IRQs
With a really busy server:
Increase max worker threads (but be careful – it’s not
for free!)
Consider lightweight pooling (be SUPER careful – no
SQLCLR and some other features – see KB319942
and BOL).
28. Parallelism is good:
Gives you query results faster
But at a cost of using a lot more CPU resources
MAXDOP setting is your friend:
On server level (sp_configure “max degree of
parallelism”)
On Resource Governor workload group
On a single query (OPTION (MAXDOP 1))
Often overlooked:
sp_configure “cost threshold for parallelism” (default 5)
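A minimal sketch of all three levels (values and object names are examples only, not recommendations):

```sql
-- Server level:
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 8;
EXEC sp_configure 'cost threshold for parallelism', 25;  -- default is 5
RECONFIGURE;

-- Resource Governor workload group (group name is hypothetical):
ALTER WORKLOAD GROUP myGroup WITH (MAX_DOP = 4);
ALTER RESOURCE GOVERNOR RECONFIGURE;

-- Single query (table name is hypothetical):
SELECT COUNT(*) FROM dbo.BigTable OPTION (MAXDOP 1);
```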
29. Data file layout matters…
Choose your Recovery Model carefully:
Full – highest recoverability but lowest performance
Bulk-logged – middle ground
Simple – no log backups, bulk operations minimally
logged
Always leave ON:
Auto create statistics
Auto update statistics
Always leave OFF:
Auto shrink
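The settings above map directly to ALTER DATABASE options; a sketch (database name is illustrative):

```sql
ALTER DATABASE MyDB SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE MyDB SET AUTO_UPDATE_STATISTICS ON;
ALTER DATABASE MyDB SET AUTO_SHRINK OFF;
ALTER DATABASE MyDB SET RECOVERY FULL;  -- or BULK_LOGGED / SIMPLE
```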
31. Rebuild optimizes processing times; uses more CPU cores:
ALTER INDEX ALL ON Person.Person
REBUILD WITH (MAXDOP = 4)
32. MaxDOP | CPU ms | Duration ms
        1 |   7344 |        7399
        2 |   9797 |        5997
        4 |  15845 |        5451
34. Use compression.
Use more than one backup device
Configure BufferCount, MaxTransferSize and BlockSize
BufferCount * MaxTransferSize = total buffer space
e.g. 49 buffers * 1024KB = 49MB total buffer space used.
BlockSize:
Specifies the physical block size, in bytes
The default value is 65536 for tape devices and 512 for other devices
BufferCount:
Specifies the total number of I/O buffers to be used for the backup operation
MaxTransferSize:
Specifies the largest unit of transfer (in bytes) to be used between SQL Server
and the backup media
35. BACKUP DATABASE [adventureworks] TO
DISK = N'C:\DSI3400\LUN00\backup\TPCH_1TB-Full',
DISK = N'C:\DSI3500\LUN00\backup\File2',
DISK = N'C:\DSI3500\LUN00\backup\File3',
DISK = N'C:\DSI3500\LUN00\backup\File4',
DISK = N'C:\DSI3500\LUN00\backup\File5',
DISK = N'C:\DSI3400\LUN00\backup\File6',
DISK = N'C:\DSI3500\LUN00\backup\File7',
DISK = N'C:\DSI3500\LUN00\backup\File8',
DISK = N'C:\DSI3500\LUN00\backup\File9'
WITH NOFORMAT, INIT, NAME = N'backup',
SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
,BUFFERCOUNT = 2200
,BLOCKSIZE = 65536
,MAXTRANSFERSIZE = 2097152
39. High rate of allocations to any data files can result in scaling
issues due to contention on allocation structures
Impacts decision for number of data files per file group
Especially a consideration on servers with many CPU cores
PFS/GAM/SGAM are structures within data file which manage
free space
Easily diagnosed by looking for contention on PAGELATCH_UP waits
Either in real time in sys.dm_exec_requests or tracked in
sys.dm_os_wait_stats
Resource description is in the form DBID:FILEID:PAGEID
Can be cross referenced with
sys.dm_os_buffer_descriptors to determine type of page
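The two diagnostics above can be sketched as (queries are a sketch of the approach, not taken from the deck):

```sql
-- Real time: requests currently waiting on allocation page latches.
-- wait_resource is in the form DBID:FILEID:PAGEID.
SELECT session_id, wait_type, wait_time, wait_resource
FROM sys.dm_exec_requests
WHERE wait_type LIKE 'PAGELATCH%';

-- Cumulative since instance startup:
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PAGELATCH%'
ORDER BY wait_time_ms DESC;
```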
40. More data files does not necessarily equal better
performance
Determined mainly by 1) hardware capacity & 2) access patterns
Number of data files may impact scalability of heavy write
workloads
Potential for contention on allocation structures (PFS/GAM/SGAM
– more on this later)
Mainly a concern for applications with high rate of page allocations
on servers with >= 8 CPU cores
Can be used to maximize # of spindles – Data files can be
used to “stripe” database across more physical spindles
41. A single data file provides less flexibility with respect to mapping data
files into differing storage configurations
Multiple files can be used as a mechanism to stripe
data across more physical spindles and/or service
processors (applies to many small/mid range arrays)
A single file prevents possible optimizations related to
file placement of certain objects (relatively uncommon)
Allocation-heavy workloads (PFS contention) may
incur waits on allocation structures, which are
maintained per file.
42. The primary filegroup contains all system
objects
These CANNOT be moved to another
filegroup
If using file group based backup, you must
backup PRIMARY as part of regular backups
If not, you cannot restore!
Primary must be restored before other filegroups
Best Practice:
Allocate at least one additional filegroup and set
it as the default.
Do not place objects in PRIMARY
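A sketch of that best practice (database, filegroup, path, and size are illustrative):

```sql
ALTER DATABASE MyDB ADD FILEGROUP UserData;

ALTER DATABASE MyDB ADD FILE (
    NAME = N'MyDB_Data1',
    FILENAME = N'D:\Data\MyDB_Data1.ndf',
    SIZE = 10GB
) TO FILEGROUP UserData;

-- New objects now land in UserData instead of PRIMARY:
ALTER DATABASE MyDB MODIFY FILEGROUP UserData DEFAULT;
```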
47. DBCC TRACEON
Use -1 to turn on trace flag globally
DBCC TRACEOFF
DBCC TRACESTATUS
-T startup flag
Use –T# separated by semi-colon (;)
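For example, using trace flag 610 (discussed next):

```sql
DBCC TRACEON (610, -1);   -- -1 = enable globally, for all sessions
DBCC TRACESTATUS (-1);    -- list all globally enabled trace flags
DBCC TRACEOFF (610, -1);
```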
49. Trace flag 610 controls minimally logged inserts into indexed tables
Allows for high volume data loading
Less information is written to the transaction log
Transaction log file size can be greatly reduced
Introduced in SQL Server 2008
“Very fussy”
Documented:
Data Loading Performance Guide white paper
http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx
50. Trace flag 1224 disables lock escalation based on the number of locks
Memory pressure can still trigger lock escalation
Database engine will escalate row or page locks to table locks
40% of memory available for locking
sp_configure ‘locks’
Non-AWE memory
Scope: Global | Session
Documented: BOL
52. Trace flag 1118 directs SQL Server to allocate full
extents to each tempdb object (instead of mixed
extents)
Less contention on internal structures such as
SGAM pages
Story has improved in subsequent releases of SQL
Server
55. Local and global temporary tables (and
indexes if created)
User-defined tables and indexes
Table variables
Tables returned in table-valued functions
Note: This list, and the following lists, are not designed to be all inclusive.
56. Work tables for DBCC CHECKDB and DBCC
CHECKTABLE.
Work tables for hash operations, such as joins and
aggregations.
Work tables for processing static or keyset cursors.
Work tables for processing Service Broker objects.
Work files needed for many GROUP BY, ORDER BY,
UNION, SORT, and SELECT DISTINCT operations.
Work files for sorts that result from creating or rebuilding
indexes (SORT_IN_TEMPDB).
57. The version store is a collection of pages used to store
row-level versioning of data.
There are two types of version stores:
1. Common Version Store: Examples include:
Triggers.
Snapshot isolation or read-committed snapshot
isolation (uses less TEMPDB than snapshot
isolation).
MARS (when multiple active result sets are
used).
2. Online-Index-Build Version Store:
Used for online index builds or rebuilds. EE
edition only.
58. TEMPDB is dropped and recreated every time the SQL
Server service is stopped and restarted.
When SQL Server is restarted, TEMPDB inherits many of
the characteristics of model, and creates an MDF file of
8MB and an LDF file of 1MB (default setting).
By default, autogrowth is set to grow by 10% with
unrestricted growth.
Each SQL Server instance may have only one TEMPDB,
although TEMPDB may have multiple physical files.
59. Many TEMPDB database options can’t be changed (e.g.
Database Read-Only, Auto Close, Auto Shrink).
TEMPDB only uses the simple recovery model.
TEMPDB may not be backed up, restored, be mirrored,
have database snapshots made of it, or have many
DBCC commands run against it.
TEMPDB may not be dropped, detached, or attached.
60. TEMPDB logging works differently from regular logging.
Operations are minimally logged, as redo information is not
included, which reduces TEMPDB transaction log activity.
The log is truncated constantly during the automatic
checkpoint process, and should not grow significantly,
although it can grow with long-running transactions, or if
disk I/O is bottlenecked.
If a TEMPDB log file grows wildly:
Check for long-running transactions (and kill them if necessary).
Check for I/O bottlenecks (and fix them if possible).
Manually running a checkpoint can often temporarily reduce a
wildly growing log file if bottlenecked disk I/O is the problem.
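A quick check of what is holding up log truncation, plus the manual checkpoint mentioned above:

```sql
-- Why can't the tempdb log be truncated right now?
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = 'tempdb';

-- Manual checkpoint in tempdb:
USE tempdb;
CHECKPOINT;
```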
61. Generally, there are three major problems you
run into with TEMPDB:
1. TEMPDB is experiencing an I/O bottleneck, hurting server
performance.
2. TEMPDB is experiencing contention on various global allocation
structures (metadata pages) as temporary objects are being created,
populated, and dropped. E.g., any space-changing operation
acquires a latch on PFS, GAM or SGAM pages to update space
allocation metadata. A large number of such operations can cause
excessive waits while latches are acquired, creating a bottleneck
(hotspot), and hurting performance.
3. TEMPDB has run out of space.
Ideally, you should be monitoring all these on a
proactive basis to identify potential problems.
62. Use Performance Monitor to determine how busy the disk is where
your TEMPDB MDF and LDF files are located.
LogicalDisk Object: Avg. Disk Sec/Read: The average time, in
seconds, of a read of data from disk. Numbers below are a general
guide only and may not apply to your hardware configuration.
Less than 10 milliseconds (ms) = very good
Between 10-20 ms = okay
Between 20-50 ms = slow, needs attention
Greater than 50 ms = serious IO bottleneck
LogicalDisk Object: Avg. Disk Sec/Write: The average time, in
seconds, of a write of data to the disk. See above guidelines.
LogicalDisk: %Disk Time: The percentage of elapsed time that the
selected disk drive is busy servicing read or write requests. A general
guideline is that if this value > 50%, there is a potential I/O bottleneck.
63. Use these performance counters to monitor allocation/deallocation
contention in SQL Server:
Access Methods:Worktables Created/sec: The number of work tables
created per second. Work tables are temporary objects and are used to
store results for query spool, LOB variables, and cursors. This number
should generally be less than 200, but can vary based on your hardware.
Access Methods:Workfiles Created/sec: Number of work files created
per second. Work files are similar to work tables but are created by
hashing operations. Used to store temporary results for hash and hash
aggregates. High values may indicate contention potential. Create a
baseline.
Temp Tables Creation Rate: The number of temporary tables
created/sec. High values may indicate contention potential. Create a
baseline.
Temp Tables For Destruction: The number of temporary tables or
variables waiting to be destroyed by the cleanup system thread. Should
be near zero, although spikes are common.
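These counters can also be sampled from T-SQL (note the /sec counters are cumulative raw values, so take two samples a known interval apart and diff them):

```sql
SELECT object_name, counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (
    'Worktables Created/sec',
    'Workfiles Created/sec',
    'Temp Tables Creation Rate',
    'Temp Tables For Destruction');
```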
64. Minimize the use of TEMPDB
Enhance temporary object reuse
Add more RAM to your server
Locate TEMPDB on its own array
Locate TEMPDB on a fast I/O subsystem
Leave Auto Create Statistics & Auto Update Statistics on
Pre-allocate TEMPDB space – everyone needs to do this
Don’t shrink TEMPDB if you don’t need to
Divide TEMPDB among multiple physical files
Avoid using Transparent Data Encryption (2008)
65. Generally, if you are building a new SQL Server instance, it
is a good idea to assume that TEMPDB performance will
become a problem, and to take proactive steps to deal with
this possibility.
It is easier to deal with TEMPDB performance issues
before they occur, than after they occur.
The following TEMPDB performance tips may or may not
apply to your particular situation.
It is important to evaluate each recommendation, and
determine which ones best fit your particular SQL Server’s
instance. Not a one size fits all approach.
66. If latches are waiting to be acquired on TEMPDB pages for
various connections, this may indicate allocation page
contention.
Use this code to find out:
SELECT session_id, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type like 'PAGE%LATCH_%' AND resource_description like
'2:%'
Allocation page contention:
2:1:1 = PFS page
2:1:2 = GAM page
2:1:3 = SGAM page
68. Installation & Configuration Best Practices for Performance
Server Role. Server should be a member server of a Microsoft
Active Directory network, and dedicated only to SQL Server.
Windows File, Print, and Domain Controller services should be
left for other machines.
System Architecture. Use 64-bit architecture server.
32-Bit Systems. Include the /PAE parameter inside the boot.ini
file on Windows Server 2003 on servers with more than 4GB
RAM.
SQL Server Edition. Use the DEVELOPER edition on
development and test servers. Use the ENTERPRISE edition on
QA and Production servers.
CPU Cache. Use servers with CPUs that have an L3 memory
cache.
Whitepapers. Look for Low-Latency best practices
configurations on server manufacturer’s websites.
BIOS. Disable CPU Hyper-Threading (or “Logical Processor”) at
the BIOS level. Use Intel’s Processor ID utility to verify it.
BIOS. Disable CPU Turbo Mode (or Turbo Boost Optimization).
BIOS. Disable CPU C-States (or C-3, C6, etc.).
BIOS. Disable CPU C1E.
BIOS. Change Power Management to Maximum Performance.
BIOS. Disable QPI Power Management.
BIOS. Change Power Profile to Maximum Performance.
BIOS. Change Power Regulator to High Performance Mode.
RAM Modules. Validate with the server’s manufacturer low-
latency recommendations on CPU and memory SIMMs
combinations, as well as memory SIMMs location on multiple
memory channels per processor.
RAM per CPU Core. For OLTP systems, use 2GB-4GB RAM
per CPU Core.
RAM per CPU Socket in Fast Track v3 (Data Warehousing).
For 2-CPU Socket use minimum of 96 GB RAM. For 4-CPU
Socket use minimum of 128 GB RAM. For 8-CPU Socket use
minimum of 256 GB RAM.
Processor Scheduling. Be sure that in Computer properties,
Performance Options, the Processor Scheduling parameter is
configured for “Background Services”.
Network Interface Cards. Have, at least, two network interface
cards connected to two different networks in order to divide
application load from administrative load.
69. Installation & Configuration Best Practices for Performance
Network Interface Cards. Configure each network interface
adapter for “Maximize data throughput for network applications”.
Network Interface Cards. For OLAP systems (Data
Warehouses and Cubes), Database Mirroring, Log Shipping,
and Replication… evaluate using Jumbo Frames (9000-byte
MTU) on all devices that interact with each other (switches,
routers, and NICs).
Disk Volumes. Use Solid-State (SSD) disks or 15K disks.
Disk Volumes. Use RAID-10 (or RAID-1) arrays when possible.
Use RAID-5 as a last option. Never use RAID-0. RAID-5 is
excellent for reading, but not best for writing (especially bad
at random writes). On direct-attached systems (DAS), if you need
to balance performance and space between solid-state disks
(SSD) and 15K disks (SAS), one strategy is to have the solid-state
disks at RAID-5 and the 15K disks at RAID-10.
RAID Controller. On virtual disks, set the cache Write Policy to
Write-Through (instead of Write-Back). The objective is that the
operating system is acknowledged a write's completion only
when it is written to the storage system, not just to the RAID
controller's cache. Otherwise, there is a consistency risk if the
controller's battery is not working and power goes down.
Fast Track v3 (DW) – Disks. For Windows operating system
and SQL Server binary files, use a 2-Disk Spindles RAID-1
local disks array.
Disk Volumes. Assign separate virtual disks (ex. SAN LUNs)
for SQL Server data, log, tempdb, backups.
Disk Host Bus Adapter (HBA). Insert the HBA adapter into the
fastest PCI-E slot.
PCIe x4 v2.0 delivers up to 2GB/sec.
PCIe x4 v1.0 delivers up 1GB/sec.
PCIe x1 v2.0 delivers up to 500MB/sec.
PCIe x1 v1.0 delivers up to 250MB/sec.
Disk Host Bus Adapter (HBA). Configure the HBA’s Queue
Depth parameter (in Windows Registry) with the value that
reports the best performance on SQLIO tests (x86 and x64
only) or SQLIOSIM (x86, x64, and IA64).
Fast Track v3 (DW) – Disks. For data files (*.MDF, *.NDF) use
multiple SAN/DAS storage enclosures that have multiple RAID-
10 groups each one with at least 4-spindles, but dedicate one
RAID-10 group on each storage enclosure for log files (*.LDF).
In Fast Track v3 tempdb is mixed with user databases.
Disk Volumes. Have each operating system disk partitioned as
one volume only. Don’t divide each disk into multiple logical
volumes.
70. Installation & Configuration Best Practices for Performance
Disk Volumes. Partition each disk volume with Starting Offset
of 1024K (1048576).
Disk Volumes. Do NOT use Windows NTFS File Compression.
Disk Volumes. Format disk volumes using NTFS. Do not use
FAT or FAT32.
Disk Volumes. Use Windows Mount Point Volumes (folders)
instead of drive letters in Failover Clusters.
Disk Volumes. Format each SQL Server disk volume (data,
log, tempdb, backups) with Allocation Unit of 64KB, and do a
quick format if volumes are SAN Logical Units (LUNs).
Disk Volumes. Ratio #1. Be sure that the division result of Disk
Partition Offset (ex. 1024KB) ÷ RAID Controller Stripe Unit Size
(ex. 64KB) = equals an integer value. NOTE: This specific ratio
is critical to minimize disk misalignment.
Disk Volumes. Ratio #2. Be sure that the division result of
RAID Controller Stripe Unit Size (ex. 64KB) ÷ Disk Partition
Allocation Unit Size (ex. 64KB) = equals an integer value.
Fast Track v3 (DW) – Multi-path I/O (MPIO) to SAN. Install
Multi-Path I/O (MPIO), configure each disk volume to have
multiple MPIO paths defined with, at least, one Active path, and
consult the SAN vendor's prescribed documentation.
Disk Volumes. Assign a unique disk volume to the MS DTC log file.
Also, before installing a SQL Server Failover Cluster, create a
separate resource dedicated to MS DTC.
Windows Internal Services. Disable any Windows service not
needed for SQL Server.
Windows Page File. Be sure that the Windows paging file is configured
to use the operating system disk only. Do not place the paging file on
any of the SQL Server disks.
Antivirus. The antivirus software should be configured to NOT scan
SQL Server database, log, tempdb, and backup folders (*.mdf, *.ldf,
*.ndf, *.bak).
SQL Server Engine Startup Flags for Fast Track v3 (Data
Warehousing). Start the SQL Server Engine with the -E and -T1117
startup flags.
SQL Server Service Accounts. Assign a different Active Directory
service account to each SQL Server service installed.
Service Account and Windows Special Rights. Assign the SQL
Server service account the following Windows user right policies: 1)
Lock pages in memory, and 2) Perform volume maintenance tasks.
Address Windowing Extensions (AWE). If the SQL Server service
account has the Lock pages in memory Windows user right, then
enable the SQL instance AWE memory option. (Note: AWE was
removed from SQL Server 2012; use 64-bit!)
71. Installation & Configuration Best Practices for Performance
Instance Maximum Server Memory. If only one SQL Server
database instance exists and no other SQL engines, configure
the instance's Maximum Server Memory option with a value of
85% of the physical memory available.
Tempdb Data Files. Be sure that the tempdb database has as
many data files as CPU cores, all of the same size.
Startup Parameter T1118. Evaluate the use of trace flag T1118
as a startup parameter for the RDBMS engine to minimize
allocation contention in tempdb.
Maximum Degree of Parallelism (MAXDOP). For OLTP
systems, configure the instance’s MAXDOP=1 or higher (up to
8) depending on the number of physical CPU chips. For OLAP
systems, configure MAXDOP=0 (zero).
Maximum Worker Threads. Configure the instance’s
Maximum Worker Threads = 0 (zero).
Boost SQL Server Priority. Configure the instance’s Boost
SQL Server Priority=0 (zero).
Database Data and Log Default Locations. Configure the
instance database default locations for data and log files.
Backup Files Default Location. Configure the instance backup
location.
Backup Compression. In SQL Server 2008, enable the
instance backup compression option.
Filegroups. Before creating any database object (tables,
indexes, etc.), create a new default filegroup (NOT PRIMARY)
for data.
Data and Log Files Initial Size. Pre-allocate data and log file
sizes. This helps minimize disk block fragmentation and avoids
the stalls caused by growing a file while processes wait for the
expansion to finish.
Fast Track v3 (DW) – Compression. For Fact Tables use
Page Compression. On the other hand, compression for
Dimension tables should be considered on a case-by-case
basis.
Fast Track v3 (DW) – Index Defragmentation. When
defragmenting indexes, use ALTER INDEX [index_name] ON
[schema_name].[table_name] REBUILD WITH (MAXDOP = 1,
SORT_IN_TEMPDB = ON) to improve performance and
avoid filegroup fragmentation. Do not use the ALTER INDEX
REORGANIZE statement. To defrag indexes, especially on FACT
TABLES from data warehouses, include
DATA_COMPRESSION = PAGE.
Tools. Use the Microsoft SQL Server 2008 R2 Best Practices
Analyzer (BPA) to determine if something was left or not
configured vs. best practices.
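The tempdb data-file guidance above can be sketched as follows (paths and sizes are illustrative; match the file count to the core count, commonly capped around 8 as a starting point):

```sql
-- Resize the existing file and add equally sized files; fixed sizes keep
-- the proportional-fill algorithm spreading allocations evenly:
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 8GB, FILEGROWTH = 0);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev2, FILENAME = N'T:\tempdb2.ndf', SIZE = 8GB, FILEGROWTH = 0);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev3, FILENAME = N'T:\tempdb3.ndf', SIZE = 8GB, FILEGROWTH = 0);
ALTER DATABASE tempdb ADD FILE
    (NAME = tempdev4, FILENAME = N'T:\tempdb4.ndf', SIZE = 8GB, FILEGROWTH = 0);
```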
72. Installation & Configuration Best Practices for Performance
Tools. Use Microsoft NT Testing TCP Tool (NTttcp) to
determine networking actual throughput.
Tools. Use Microsoft SQLIO and Microsoft SQLIOSim to stress
test storage and validate communication errors.
Tools. Use CPUID CPU-Z to determine processor information,
especially the speed at which it is currently running.
Tools. Use the Intel Processor Identification utility to determine
processor information, especially whether Hyper-Threading is running.
74. SQLIO.exe
Unsupported tool available through Microsoft
IOMeter
Open-source tool; allows combinations of I/O types to run concurrently
against a test file
Neither is meant to exactly simulate SQL Server engine I/O;
their purpose is to run a variety of I/O types to
“Shake out” configuration problems
Determine the capacity of the configuration
More details on benchmarking in the Pre-deployment Best
Practices whitepaper
75. HBA throughput , multi-pathing, etc…
Run sequential I/O against a file that is memory resident in
the controller cache
Can throughput “near” theoretical aggregate bandwidth be
achieved?
Example: practical throughput on a 4 Gb/s Fibre Channel port = ~360
MB/s
This could be the HBA, switch port, or front-end array ports
Test the HBA load-balance paths (see later)
Potential bottlenecks: connectivity (HBA, switch, etc.),
controller/service processor, suboptimal host
configuration
Recommended: Use vendor tools to diagnose
76. • Two 4Gb/s dual-port HBAs
• Theoretical throughput limit ~1.6 GB/s
• Two paths to each service processor
(~800 MB/s theoretical limit per SP)
• First attempt – only ~1.0 GB/s total for both SPs
• Second attempt – changed the load balancing
algorithm to round robin
77. To get a true representation of disk performance, use test files of
approximately the size of the planned data files – small test files (even if
they are larger than cache) may result in smaller seek times due to
“short-stroking” and skew results
Use a test file at least 10 x cache size
Fill all drives in the LUN to at least 50% of space
Test each LUN path individually and then combinations of the I/O
paths (scaling up)
Remember: IOPs matter most for random access workloads
(OLTP), aggregate throughput for scan-intensive ones (DW)
Random reads are good for this as they take cache out of the picture
(assuming a large test file)
May need to run longer tests with sustained writes; cache will
eventually be exhausted, giving a true representation of “disk speed” for
the writes.
78. Test a variety of I/O types and sizes
Run tests for a reasonable period of time
Caching may behave differently after long period of sustained I/O
Relatively short tests are okay for read tests with low read cache
For write-back caches, make sure you run test long enough to measure
the de-staging of the cache.
Allow time in between tests to allow the hardware to reset (cache flush)
Keep all of the benchmark data to refer to after the SQL
implementation has taken place
Maximum throughput (IOPS or MB/s) has been obtained when latency
continues to increase while throughput is near constant
79. Example patterns to run:
R/W%  | Type       | Block    | Threads / Queue | Simulates
80/20 | Random     | 8K       | # cores / Files | Typical OLTP data files
0/100 | Sequential | 60K      | 1 / 32          | Transaction log
100/0 | Sequential | 512K     | 1 / 16          | Table scans
0/100 | Sequential | 256K     | 1 / 16          | Bulk load
100/0 | Random     | 32K      | # cores / 1     | SSAS workload
100/0 | Sequential | 1MB      | 1 / 32          | Backup
0/100 | Random     | 64K-256K | # cores / Files | Checkpoints
These are minimum runs for a mixed OLTP/DW environment. Take special care monitoring cache
effects and transaction log latencies for OLTP environments.
81. Hardware Utilization Efficiency
Datacenter deployment efficiency
Power Utilization
Often hardware standardization coincides
Management Efficiency
Fewer servers to manage and maintain
Centralized management of multiple/many servers
Infrastructure Agility
Load Balancing
Lowered cost and complexity for High Availability
(Chart: higher utilization and lower hardware costs 25%, ease of
management 21%, power savings 18%, rack space savings 18%,
reduced licensing costs 18%.)
82. (Diagram: one physical server; the hypervisor runs on the hardware,
with a root partition owning devices, processors, and memory, and
multiple child partitions each running a server OS.)
Multiple operating system images supporting separate
independent applications running simultaneously on the same
computer system.
Strong hardware-enforced isolation between the VMs.
83. Configuration Considerations
Guest VM w/ passthrough disks: use physical disk counters within the
root partition to monitor I/O of passthrough disks
Guest VM w/ VHD: use logical or physical disk counters within the guest
VM to monitor I/O rates of a VHD; disk counters at the root partition
provide the aggregate I/O of all VHDs hosted on the underlying
partition/volume
Either configuration: very little difference between the values reported by
the counters from the root partition and those within the guest VM;
slightly higher latency values (Avg. Disk sec/Read and Write) observed
within the guest VM
Terminology
Passthrough disk: disk offline at the root partition
VHD (Virtual Hard Disk)
Fixed-size VHD: space allocated statically
Dynamic VHD: expands on demand
84. Disk configuration per VM/root
Dedicated per VM using passthrough disks:
SQL data – 2 LUNs: 150GB LUNs using RAID 1+0 (4+4) sets
SQL log – 1 LUN: 50GB LUN using RAID 1+0 (2+2) set
Single pool of disks for data files and a single pool for logs:
F: data files – two 150 GB VHDs per VM
G: log files – one 30GB VHD per VM
85. Disk performance measured from physical disk counters on the native OS
Slight overhead using VHD files for data
VHD files placed on an identical LUN configuration
(Charts: average disk latency in seconds, and reads per second for the
data volumes, under low/medium/high OLTP workloads, comparing
Root OS with Hyper-V disabled, Root OS with Hyper-V enabled, a single
VM with passthrough disks, and a single VM with a fixed-size VHD.)
86. Same throughput attainable,
however there is more CPU overhead with Hyper-V enabled or when running within a VM
Some overhead observed with Hyper-V just being enabled
Measures:
Throughput = Batch Requests / sec
Relative Throughput = Batch Requests / sec / %CPU
(Chart: relative throughput per unit of CPU for the low and medium
workloads across four configurations: Root OS with Hyper-V disabled,
Root OS with Hyper-V enabled, single VM with passthrough disks,
single VM with fixed-size VHD.)
87. Passthrough and fixed-size VHD for better I/O
performance
I/O performance impact is minimal
SQL I/O performance and sizing recommendations apply
Dynamic VHD not recommended for SQL Server deployments
In CPU overcommit scenarios:
We have observed more CPU overhead to manage the additional logical
CPUs.
Proper sizing of memory capacity:
Memory is allocated for VMs in a static fashion and can only be modified
when a guest is offline.
CPU affinity:
Not supported by Hyper-V
SQL CPU affinity has no practical effect on a virtual instance
Lower network latency with IPv6:
Private network and IPv6 between VMs
Jumbo frames
88.
89. Two Types of Compression
ROW
Fixed-length columns stored as variable length
Recommendation: DML-heavy workloads
PAGE
Column-prefix and page-dictionary compression
Recommendation: read-mostly workloads
Can be enabled on a table, index, or partition
Estimate data compression savings with
sp_estimate_data_compression_savings
Can be enabled/disabled ONLINE
No application changes required
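In practice, the estimation procedure and an online rebuild are often paired; a minimal sketch (schema and table names are placeholders):

```sql
-- Estimate PAGE-compression savings for a table before committing to it
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'MySchema',
    @object_name      = 'MyTable',
    @index_id         = NULL,   -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = 'PAGE';

-- If the savings justify it, enable PAGE compression online
ALTER TABLE [MySchema].[MyTable]
REBUILD WITH (DATA_COMPRESSION = PAGE, ONLINE = ON);
```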
90. Your mileage will vary.
Customer               Space Savings   Notes
Bank Itau              70%             PAGE. Data warehouse application.
BWIN.com               40%             PAGE. OLTP web application.
NASDAQ                 62%             PAGE. DW application.
GE Healthcare          38%, 21%        PAGE, ROW.
Manhattan Associates   80%, 50%        PAGE, ROW.
First American Title   52%             PAGE.
SAP ERP                50%, 15%        PAGE, ROW.
MS Dynamics AX         81%             PAGE. ERP application.
ServiceU               35%             PAGE.
91. Customer           Performance Impact   Notes
BWIN.com               5%                   PAGE compression. OLTP web application. Large volume of transactions.
NASDAQ                 40%-60%              PAGE compression. Large sequential range queries. DW application.
GE Healthcare          -1%                  PAGE compression. 500 users, 1,500 transactions/sec. OLTP with some reporting queries.
Manhattan Associates   -11%                 PAGE compression. A lot of insert, update and delete activity.
First American Title   2%-3%                PAGE compression. OLTP application.
MS Dynamics AX         3%                   PAGE compression. ERP application – small transactions.
92. Question: Why am I getting little or no compression?
ROW compression:
No fixed-length columns
Fixed-length columns, but all bytes are used
Compressed row > 4 KB
PAGE compression:
No column-prefix savings
No common values for the page dictionary
Large row size, implying one to a few rows per page
Mostly LOB data
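When diagnosing, it helps to first confirm what compression is actually in effect; a quick check (the table name is a placeholder):

```sql
-- Show the compression setting of every partition of a table
SELECT o.name AS table_name,
       i.name AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.objects AS o ON o.object_id = p.object_id
LEFT JOIN sys.indexes AS i ON i.object_id = p.object_id
                          AND i.index_id = p.index_id
WHERE o.name = 'MyTable';
```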
93.
94. Online index rebuild:
All operations are blocked only for metadata updates
The target index is built alongside the source index
The target index then replaces the source index
95. ALTER INDEX ALL
ON [MySchema].[MyTable]
REBUILD WITH (
ONLINE = ON
,MAXDOP = 4
,STATISTICS_NORECOMPUTE = OFF
);
96.
97. ALTER INDEX ALL ON [MySchema].[MyTable]
REBUILD PARTITION = ALL;
ALTER INDEX ALL ON [MySchema].[MyTable]
REBUILD PARTITION = 5;
ALTER INDEX [IX_SalesID_SalesDate] ON [MySchema].[MyTable]
REBUILD PARTITION = ALL;
ALTER INDEX [IX_SalesID_SalesDate] ON [MySchema].[MyTable]
REBUILD PARTITION = 5;
98. ALTER INDEX [IX_SalesID_SalesDate]
ON [MySchema].[MyTable]
REBUILD PARTITION = 5
WITH (
SORT_IN_TEMPDB = { ON | OFF }
,MAXDOP = max_degree_of_parallelism
,DATA_COMPRESSION = { NONE | ROW | PAGE }
);
99.
100. Object                            Counter                  Value                       Notes
Paging File                            % Usage                  < 70%                       Amount of the page file currently in use
Processor                              % Processor Time         <= 80%                      The higher it is, the more likely users are delayed
Processor                              % Privileged Time        < 30% of % Processor Time   Time spent executing kernel commands such as SQL Server I/O requests
Process(sqlservr), Process(msmdsrv)    % Processor Time         < 80%                       Percentage of elapsed time spent on SQL Server and Analysis Services process threads
System                                 Processor Queue Length   < 4                         < 12 per CPU is good/fair, < 8 is better, < 4 is best
101. Logical Disk Counter             Storage Guy's Term            Description
Disk Reads/sec,                       IOPS                          Measures the number of I/Os per second. Discuss spindle sizing
Disk Writes/sec                                                     of different types and rotational speeds with your vendor.
                                                                    Impacted by disk head movement (i.e., short-stroking the disk
                                                                    provides more I/O-per-second capacity).
Avg. Disk sec/Read,                   Latency                       Measures disk latency. Numbers will vary; optimal averages over
Avg. Disk sec/Write                                                 time: 1-5 ms for log (ideally 1 ms or better); 5-20 ms for data
                                                                    (OLTP) (ideally 10 ms or better); <= 25-30 ms for data (DSS).
Avg. Disk Bytes/Read,                 Block size                    Measures the size of I/Os being issued. Larger I/Os tend to
Avg. Disk Bytes/Write                                               have higher latency (example: BACKUP/RESTORE).
Avg./Current Disk Queue Length        Outstanding or waiting IOPS   Should not be used alone to diagnose good/bad performance;
                                                                    provides insight into the application's I/O pattern.
Disk Read Bytes/sec,                  Throughput or                 Measure of total disk throughput. Ideally, larger block scans
Disk Write Bytes/sec                  aggregate throughput          should be able to heavily utilize connection bandwidth.
102. Object                  Counter                 Value                  Notes
Physical Disk                Avg. Disk sec/Read      < 8 ms                 > 20 ms is poor, < 20 ms is good/fair, < 12 ms is better,
                                                                            < 8 ms is best
Physical Disk                Avg. Disk sec/Write     < 8 ms, or < 1 ms      Without cache: > 20 ms poor, < 20 ms fair, < 12 ms better,
                                                     with cache             < 8 ms best. With cache: > 4 ms poor, < 4 ms fair,
                                                                            < 2 ms better, < 1 ms best
Memory                       Available MBytes        > 100                  Amount of physical memory available to run processes on
                                                                            the machine
SQL Server:Memory Manager    Memory Grants Pending   ~0                     Current number of processes waiting for a workspace
                                                                            memory grant
SQL Server:Memory Manager    Page Life Expectancy    >= 300                 Time, in seconds, that a page stays in the memory pool
                                                                            without being referenced before it is flushed
SQL Server:Buffer Manager    Free List Stalls/sec    < 2                    Frequency that requests for database buffer pages are
                                                                            suspended because there are no free buffers
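The SQL Server-specific counters above can also be read from inside the engine; a minimal sketch using the sys.dm_os_performance_counters DMV (counter names may differ slightly in case across versions):

```sql
-- Read key memory counters directly from SQL Server
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Memory Grants Pending',
                       'Page life expectancy',
                       'Free list stalls/sec');
```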
103. Object                   Counter                    Value             Notes
SQLServer:Access Methods      Forwarded Records/sec      < 10*             Tables with records traversed by a pointer. Should be
                                                                           < 10 per 100 batch requests/sec.
SQLServer:Access Methods      Page Splits/sec            < 20*             Number of 8 KB pages that filled and split into two new
                                                                           pages. Should be < 20 per 100 batch requests/sec.
SQLServer:Databases           Log Growths/sec;           < 1 and < 80%,    Don't let transaction log growth happen randomly!
                              Percent Log Used           respectively
SQLServer:SQL Statistics      Batch Requests/sec         *                 No firm number without benchmarking, but > 1000 is a
                                                                           very busy system.
SQLServer:SQL Statistics      Compilations/sec;          *                 Compilations should be < 10% of batch requests/sec;
                              Recompilations/sec                           recompilations should be < 10% of compilations/sec.
SQLServer:Locks               Deadlocks/sec              < 1               Number of lock requests that caused a deadlock.
104.
105. • DON'T run SQL Profiler on the server.
• Then what?
• Run SQL Profiler on your computer.
• Connect to the server.
• Indicate the events and columns you want.
• Filter by the database to be evaluated.
• Run the trace for 1 second, then stop it.
• Export the trace as a script.
• Optimize the script.
• And then, and only then, run the SQL Trace script on the server.
• And to evaluate?
• Use the fn_trace_gettable() function to query the contents of the
SQL Trace file(s).
• You can use the SQL Trace file(s) with the SQL Server Database
Engine Tuning Advisor to evaluate the creation of new indexes.
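Reading the captured trace back is a single query; a minimal sketch (the file path is a placeholder):

```sql
-- Load a server-side trace file into a relational rowset;
-- DEFAULT reads all rollover files in the sequence
SELECT TextData, Duration, Reads, Writes, CPU, StartTime
FROM fn_trace_gettable(N'C:\Traces\MyTrace.trc', DEFAULT)
ORDER BY Duration DESC;
```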
106. Split tables and indexes into multiple
storage objects based on the value of a data
column
Based on ranges of a single column's value
Still treated as a single object by the relational
engine
Handled as multiple objects by the storage
engine
Up to 1000 partitions per object supported
107. Example: table ORDER HISTORY (Order ID, Customer ID, Order Date, Amount, …)
with a nonclustered index on CUSTOMER ID
Nonpartitioned: table data in filegroup DATA, index in filegroup IDX
108. Partitioned by ORDER DATE:
Order Date < '2003-01-01': data in filegroup DATA_2002, Customer ID index in filegroup IDX_2002
Order Date >= '2003-01-01' and < '2004-01-01': data in filegroup DATA_2003, index in filegroup IDX_2003
Order Date >= '2004-01-01': data in filegroup DATA_2004, index in filegroup IDX_2004
Each partition holds its own slice of the Order History table (Order ID, Customer ID, Order Date, Amount, …) and of the Customer ID index
109. Manageability
Fast data deletion and data load
Piecemeal backup/restore of historical data
Partition-wise index management
Minimize index fragmentation for historically partitioned
tables
Support alternative storage for historical data
Performance when querying large tables
Join efficiency
Smaller index tree or table scan when querying a
single partition
Simpler query plans compared to partitioned views
110. Partitioned table: a single object in query plans
Single set of statistics
Smaller plans, faster compilation than partitioned views
Auto-parameterization supported
Insert / Bulk Insert / BCP fully supported
Numerous fine-grained partitions work well
Queries may access partitions in parallel
The partition is the unit of parallelism
But…
Cannot span multiple databases or instances
Potentially use partitioned views (PVs) or distributed partitioned views (DPVs) atop partitioned tables
112. Maps ranges of a data type to integer
values
Defined by specifying boundary points
N boundary points define N+1 partitions
[Diagram: boundaries 1-4 delimit partitions 1-5]
113. CREATE PARTITION FUNCTION annual_range (DATETIME)
AS RANGE RIGHT
FOR VALUES
( -- Partition 1 -- 2001 and earlier
'2002-01-01', -- Partition 2 -- 2002
'2003-01-01', -- Partition 3 -- 2003
'2004-01-01', -- Partition 4 -- 2004
'2005-01-01' -- Partition 5 -- 2005 and later
)
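Which partition a given value lands in can be checked directly with the $PARTITION function against the function defined above:

```sql
-- With RANGE RIGHT, a boundary value belongs to the partition on its right
SELECT $PARTITION.annual_range('2001-06-15') AS p;  -- partition 1 (2001 and earlier)
SELECT $PARTITION.annual_range('2003-01-01') AS p;  -- partition 3 (boundary goes right)
SELECT $PARTITION.annual_range('2007-08-01') AS p;  -- partition 5 (2005 and later)
```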
115. Associates a storage location (Filegroup) with
each partition defined by a partition function
No requirement to use different filegroups for
different partitions
Useful for Manageability
Filegroup-based backup or storage location
Best Practice: Spread all of your Filegroups in a
Partition Scheme across as many disk spindles as
possible.
Rarely want to dedicate separate drives to separate
partitions
116. CREATE PARTITION SCHEME annual_scheme_1
AS PARTITION annual_range TO
(annual_min, -- filegroup for pre-2002
annual_2002, -- filegroup for 2002
annual_2003, -- filegroup for 2003
annual_2004, -- filegroup for 2004
annual_2005) -- filegroup for 2005 and later

CREATE PARTITION SCHEME annual_scheme_2
AS PARTITION annual_range
ALL TO ([PRIMARY])
117. [Diagram: the partition function's four boundaries define five partitions, and the partition scheme maps each to a filegroup:]
Partition 1 (2001 & earlier)          -> filegroup Annual_Min
Partition 2 (2002 data)               -> filegroup Annual_2002   (boundary 2002-01-01)
Partition 3 (2003 data)               -> filegroup Annual_2003   (boundary 2003-01-01)
Partition 4 (2004 data)               -> filegroup Annual_2004   (boundary 2004-01-01)
Partition 5 (2005 & later)            -> filegroup Annual_2005   (boundary 2005-01-01)
118. A single column must be selected as the
partitioning key
Partitioned tables and indexes are created on
partition schemes instead of filegroups
All query operations on tables or indexes are
transparent to partitioning
Different tables and indexes may share common
partition functions and schemes
[Diagram: Table or Index -> Partition Scheme -> Partition Function;
each table or index uses one scheme, many tables can share a scheme,
and many schemes can share a function]
119. CREATE TABLE Order_History (
Order_ID bigint,
Order_Date datetime,
Customer_ID bigint
…
) ON Annual_Scheme_1(Order_Date)

CREATE INDEX Order_Cust_Idx
ON Order_History(Customer_ID)
ON Annual_Scheme_1(Order_Date)
120. The partitioning key of an index need not be part of
the index key
SQL Server 2005 indexes can include columns outside of the
b-tree, at the leaf level only
Essential for partitioning, and also great for covering-index
scenarios
If an index uses an equivalent partition function and the
same partitioning key as the base table, the
index is "aligned"
One-to-one correspondence between partitions of the table and
of the index
All index entries in one partition map to data in a single
partition of the base table
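Included columns and an aligned index can be sketched together; a minimal example using the tables above (the index name and INCLUDE column choice are illustrative):

```sql
-- Nonclustered index keyed on Customer_ID, covering Amount via INCLUDE.
-- Creating it on the table's own partition scheme and key makes it aligned;
-- SQL Server adds the partitioning column (Order_Date) to the leaf level
-- automatically if it is not already part of the index.
CREATE NONCLUSTERED INDEX IX_Cust_Covering
ON Order_History (Customer_ID)
INCLUDE (Amount)
ON Annual_Scheme_1(Order_Date);
```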
121. A typical requirement is to insert or remove
entire partitions of data in bulk
Achieve this with a sequence of basic
operations on partitions:
Split
Merge
Switch
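A sliding-window load/archive cycle built from these three operations might look like the following sketch (the archive table and the 2006 filegroup are illustrative; the archive table must match the source schema and reside on the same filegroup as the switched partition):

```sql
-- SWITCH: move the oldest partition out to an empty archive table
-- (a metadata-only operation, so it completes almost instantly)
ALTER TABLE Order_History
SWITCH PARTITION 1 TO Order_History_Archive;

-- MERGE: remove the now-empty oldest boundary,
-- collapsing two partitions into one
ALTER PARTITION FUNCTION annual_range()
MERGE RANGE ('2002-01-01');

-- SPLIT: add a new boundary for the next year's data;
-- NEXT USED names the filegroup that will hold the new partition
ALTER PARTITION SCHEME annual_scheme_1
NEXT USED annual_2006;
ALTER PARTITION FUNCTION annual_range()
SPLIT RANGE ('2006-01-01');
```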
122.
123. General event handling
The goal is to make well-defined data in XML format
available from execution points in code
Baked into SQL Server code
Layers on top of Event Tracing for Windows
Used by
• SQL Trace, Performance Monitor and SQL
Server Audit
• Windows Event Log or SQL Server Error Log
• As desired by users in administration or development
Introduced in SQL Server 2008
124. Superset of Extended Events
Can be used in conjunction
with Extended Events
Can be a consumer or
target of Extended
Events
Kernel level facility
125.
126. Built-in set of objects in an EXE or DLL (aka a module)
SQL Server has three types of packages
• package0
• sqlserver
• sqlos
Packages contain one or more object types:
Events, Targets, Actions, Types, Predicates, Maps
127. A monitoring point of interest in a module's code
Event firing implies:
• The point of interest in code was reached
• State information was available when the event fired
Events are defined statically in the package
registration
A versioned schema defines the contents
Schema with well-defined data types
Event data always has columns in the same order
Targets can pick which columns to consume
128. Targets are event consumers
Targets can
• Write to a file
• Aggregate event data
• Start a task/action that is related to an
event
Process data synchronously or
asynchronously
Either file targets or In-memory targets
130. Executed on top of events before event information is
stored in buffers (which may later be sent to
storage)
Currently used to
Get additional data related to the event
T-SQL statement
User
T-SQL process info
Generate a mini-dump
Defined in the ADD/ALTER EVENT clause
131. Logical expression that gates whether an event fires
Pred_Compare – operator for a pair of values
Value Compare Value
Example: Severity < 16
Example: Error_Message = 'Hello World!'
Pred_Source – generic data not usually in the
event
Package.Pred_Source Compare Value
Example: sqlserver.user_name = 'Chuck'
Example: sqlos.cpu_id = 0
Defined in the ADD/ALTER EVENT clause
132.
133. Real-time data capture
No performance penalty
Based on Event Tracing for Windows (ETW)
Full programmability support
135. Name   Description
package0    Default package. Contains all standard types, maps, compare operators, actions and targets
sqlos       Extended events for the SQL Operating System
XeDkPkg     Extended events for the SQLDK binary (SQLDK.dll now loads package0 and sqlos)
sqlserver   Extended events for Microsoft SQL Server
SecAudit    Security Audit events
ucs         Extended events for the Unified Communications Stack
sqlclr      Extended events for SQL CLR
filestream  Extended events for SQL Server FILESTREAM and FileTable
138. SELECT CAST(xet.target_data AS xml)
FROM sys.dm_xe_session_targets xet
JOIN sys.dm_xe_sessions xe
ON (xe.address = xet.event_session_address)
WHERE xe.name = 'system_health'
139. CREATE EVENT SESSION [SampledQueries] ON SERVER
ADD EVENT sqlserver.error_reported(
ACTION(sqlserver.client_app_name,sqlserver.database_id,
sqlserver.query_hash,sqlserver.session_id)
WHERE
((([package0].[divides_by_uint64]([sqlserver].[session_id],(5))) AND
([package0].[greater_than_uint64]([sqlserver].[database_id],(4)))) AND
([package0].[equal_boolean]([sqlserver].[is_system],(0))))),
ADD EVENT sqlserver.sql_batch_completed(
ACTION(sqlserver.client_app_name,sqlserver.database_id,
sqlserver.query_hash,sqlserver.session_id)
WHERE
((([package0].[divides_by_uint64]([sqlserver].[session_id],(5))) AND
([package0].[greater_than_uint64]([sqlserver].[database_id],(4)))) AND
([package0].[equal_boolean]([sqlserver].[is_system],(0)))))
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,
MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,
MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=ON,STARTUP_STATE=OFF)
GO