Parallelism in SQL Server
Enrique Catala Bañuls
Mentor, SolidQ
ecatala@solidq.com
Twitter: @enriquecatala

Enrique Catala Bañuls

 Computer engineer
 Mentor at SolidQ in the relational engine team
 Microsoft Technical Ranger
 Microsoft Active Professional since 2010
 Microsoft Certified Trainer

Volunteers:
 They spend their FREE time to give you this event (2 months per person).
 Because they are crazy.
 Because they want YOU to learn from the BEST IN THE WORLD.
 If you see someone with “STAFF” on their back, buy them a beer; they deserve it.

Objectives of this session

 Basics on parallelism
 Settings to adjust parallelism
 Exchange operators
 Enemies of the parallelism
 Best practices

9 | 3/20/2013 |

Parallelism

 “Parallelism is the action of executing a single task across several CPUs”
 It improves performance by taking advantage of modern multi-CPU hardware configurations

Parallelism benefits
 By default, SQL Server can use all available CPUs
 Queries that qualify for parallelism are generally high-I/O queries

SMP

 Symmetric multiprocessing (SMP) system
 All the CPUs share the same main memory
 No hardware partitioning for memory access
 Typically used in smaller computers

SMP architecture
[Figure: CPUs connected to main memory through a shared system bus (FSB)]

NUMA

 Non-Uniform Memory Access
 Nodes connected by a shared bus, crossbar, or ring
 Typically used in high-end computers

[Figure: NUMA architecture. Four nodes, each with four CPUs, a memory controller with its own memory, and a node controller; the node controllers are connected by a shared bus]

NUMA

 SQL Server is NUMA aware
 Automatically detects NUMA configuration
 Minimizes the memory latency by using local
memory in each node
 SQL Server must be properly configured to
gain the best performance in NUMA systems

SQL Server Execution Model
SQLOS
 SQLOS creates a scheduler for each logical CPU
 A scheduler is like a logical CPU used by SQL Server workers
 Only one worker can be executed by a scheduler at the same time
 The unit of work for a worker is a task
[Figure: SQLOS hierarchy: Memory Node, CPU Node, Scheduler, Worker, Task]

Schedulers and concurrency

 Pre-emptive scheduler (Windows)
   Windows uses pre-emptive scheduling because it is a general-purpose operating system
   It uses a priority-driven architecture
   Each thread executes in a predetermined time slice
   A thread can be preempted by a higher-priority thread
 Cooperative scheduler (SQL Server)
   Each task puts itself on the waiting list every time it needs a resource
   The same scheduler executes the task until it ends
   This voluntary yielding by workers reduces context switching and improves performance
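
As a rough illustration of cooperative scheduling, the toy Python sketch below runs generator-based tasks on a single "scheduler". It is not how SQLOS is implemented; the names `task` and `run_scheduler` are hypothetical, and each `yield` stands in for a worker voluntarily giving up the scheduler.

```python
from collections import deque


def task(name, steps, log):
    """A toy task: performs one unit of work, then yields voluntarily."""
    for step in range(steps):
        log.append((name, step))  # the task "runs" on the scheduler
        yield                     # voluntary yield: go to the back of the queue


def run_scheduler(tasks):
    """Run tasks cooperatively: exactly one runs at a time, none is preempted."""
    runnable = deque(tasks)
    while runnable:
        current = runnable.popleft()
        try:
            next(current)             # runs until the task itself yields
            runnable.append(current)  # still has work: requeue it
        except StopIteration:
            pass                      # task finished: drop it


log = []
run_scheduler([task("A", 2, log), task("B", 3, log)])
print(log)
```

The scheduler never interrupts a running task; control changes hands only at the points where a task yields.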



Settings to adjust parallelism
 Hardware level
   NUMA
 Instance level
   Soft-NUMA (affinity mask)
   Max degree of parallelism
   Cost threshold for parallelism
   Max worker threads
   -P parameter
 Connection level
   Resource Governor, by configuring MAXDOP
 Query level
   MAXDOP hint
   T-SQL patterns
     CROSS APPLY
     Functions…

CPU Affinity Mask
• Used to set which processor(s) can be used by the SQL
Server instance.
• Setting a processor affinity will tie the threads to a particular
processor
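
The affinity mask is an integer bitmask in which bit n stands for CPU n (the basic option covers CPUs 0 to 31). The Python sketch below shows how such a value could be computed and decoded; the helper names `affinity_mask` and `cpus_in_mask` are hypothetical and only illustrate the arithmetic.

```python
def affinity_mask(cpu_ids):
    """Build a 32-bit affinity mask: bit n is set when CPU n may be used."""
    mask = 0
    for cpu in cpu_ids:
        if not 0 <= cpu < 32:
            raise ValueError(f"CPU id {cpu} is outside the 0-31 range")
        mask |= 1 << cpu
    return mask


def cpus_in_mask(mask):
    """Decode a mask back into the list of CPU ids it enables."""
    return [bit for bit in range(32) if mask & (1 << bit)]


print(affinity_mask([0, 1, 2, 3]))   # CPUs 0-3
print(affinity_mask([4, 5, 6, 7]))   # CPUs 4-7
print(cpus_in_mask(240))
```

For example, binding an instance to CPUs 0 to 3 corresponds to the value 15, and CPUs 4 to 7 to the value 240.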

Affinity I/O Mask

 Binds the CPU work of I/O operations to specific CPUs
 Each I/O operation needs to be finalized
   Byte checksum, number of transferred bytes, page number check, etc.
   This consumes CPU
 Can be used to assign the lazy writer to a new hidden scheduler
[Figure: two affinity configurations, labeled "Bad" and "Good"]

Network affinity

[Figure: network affinity diagram with TCP ports 8000, 8001, 8002, and 8003]

Threshold for parallelism

 Instance-level configuration
 Statistically shifts how many queries run in parallel
 Moves the boundary at which a serial plan is replaced by a parallel plan

if (best_plan_for_now.cost < 1)
    return best_plan_for_now
else if (MAXDOP != 1
         and best_plan_for_now.cost > cost_threshold_for_parallelism)
    return cheapest(create_parallel_plan(), best_plan_for_now)
else
    return best_plan_for_now
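
The decision above can be sketched as runnable Python. This is a simplified model, not the optimizer's actual code: `Plan`, `choose_plan`, and `build_parallel_plan` are hypothetical names, and the sketch assumes that a MAXDOP of 1 disables parallel plans while 0 means "no limit", which is how the max degree of parallelism setting behaves.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    cost: float
    parallel: bool = False


def choose_plan(serial_plan, build_parallel_plan, maxdop, cost_threshold):
    """Pick between the best serial plan and a parallel alternative."""
    # Very cheap plans are returned immediately.
    if serial_plan.cost < 1:
        return serial_plan
    # A parallel plan is only explored when parallelism is allowed
    # (MAXDOP != 1, assumption stated above) and the serial cost
    # exceeds the cost threshold for parallelism.
    if maxdop != 1 and serial_plan.cost > cost_threshold:
        parallel_plan = build_parallel_plan()
        # Keep whichever plan is cheaper; ties go to the serial plan.
        return min(serial_plan, parallel_plan, key=lambda p: p.cost)
    return serial_plan


# Example with the default cost threshold of 5.
chosen = choose_plan(Plan(20.0), lambda: Plan(8.0, parallel=True),
                     maxdop=0, cost_threshold=5)
print(chosen)
```

Raising the threshold in this model makes fewer serial plans eligible for the parallel comparison, which is the "statistical" effect the setting has on a workload.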

Demonstration 1

Affinity mask, cost threshold for
parallelism

Degree of parallelism (DOP)

 Max degree of parallelism
  o Instance setting that affects the whole instance
  o Can be configured at Resource Governor's workload group level
  o Enforces the maximum number of CPUs that a single query can use
 MAXDOP hint
  o Can be used at query level

Demonstration 2

MAXDOP



Exchange operators

 Operators dedicated to moving rows between
one or more workers, distributing individual
rows among them

Distribute streams operator
 Row distribution based on
   Hash
     A hash is computed for each row, and each thread works only with the rows that share the same hash
   Round-robin
     Each row is sent to the next thread in round-robin order
   Broadcast
     All rows are sent to all threads
   Range
     Each row is sent to a thread based on a range computation over a column
     Rare; used in some parallel index creation operations
   Demand
     Pull mode
     It sends the row to the operator that is calling
     It appears on partitioned tables
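
To make the first three distribution types concrete, here is a small Python sketch that splits a list of rows across a number of threads. It only models the routing rule, not SQL Server's exchange operator; the function names are hypothetical, and Python's built-in `hash` on an integer key stands in for whatever hash function the engine uses.

```python
def distribute_hash(rows, key, n_threads):
    """Hash: rows with the same key value always land on the same thread."""
    buckets = [[] for _ in range(n_threads)]
    for row in rows:
        buckets[hash(row[key]) % n_threads].append(row)
    return buckets


def distribute_round_robin(rows, n_threads):
    """Round-robin: each row goes to the next thread in turn."""
    buckets = [[] for _ in range(n_threads)]
    for i, row in enumerate(rows):
        buckets[i % n_threads].append(row)
    return buckets


def distribute_broadcast(rows, n_threads):
    """Broadcast: every thread receives a copy of every row."""
    return [list(rows) for _ in range(n_threads)]


rows = [{"customer_id": c, "amount": a}
        for c, a in [(1, 10), (2, 20), (1, 30), (3, 40), (2, 50), (1, 60)]]

by_hash = distribute_hash(rows, "customer_id", 2)
by_turn = distribute_round_robin(rows, 2)
by_copy = distribute_broadcast(rows, 2)
print([len(b) for b in by_hash], [len(b) for b in by_turn], [len(b) for b in by_copy])
```

Hash distribution keeps all rows for a given key together, which matters for joins and aggregates, but it can leave threads unevenly loaded when the key values are skewed. Round-robin balances the row counts and ignores the key.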

Repartition streams operator

 Takes rows from multiple sources and sends rows to multiple destinations
 Doesn't update any row

Gather streams operator

 It takes rows from multiple sources and sends them to a single destination (thread)
 Typically increases CXPACKET waits

Demonstration 3

OPERATORS



Enemies of the parallelism
 Makes the whole plan serial
   Modifying the contents of a table variable (reading is fine)
   Any T-SQL scalar function
   CLR scalar functions marked as performing data access (normal ones are fine)
   Certain intrinsic functions, including OBJECT_NAME, ENCRYPTBYCERT, and IDENT_CURRENT
   System table access (e.g. sys.tables)
 Serial zones
   TOP
   Sequence project (e.g. ROW_NUMBER, RANK)
   Multi-statement T-SQL table-valued functions
   Backward range scans (forward is fine)
   Global scalar aggregates
   Common sub-expression spools
   Recursive CTEs

Demonstration 4

ENEMIES OF THE PARALLELISM

CXPACKET
[Figure: an execution plan with a serial zone, a parallel zone, and a serial zone]

Demonstration 5

CXPACKET



Best practices
 Never trust the default configuration for the degree of parallelism
   By default, MAXDOP = 0
 As a general rule
   Pure OLTP should use MAXDOP = 1
   MAXDOP should not exceed the number of physical cores
   On a NUMA architecture, MAXDOP <= number of physical cores per NUMA node
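
The general rule above can be written down as a small Python helper that returns the upper bound for MAXDOP. The function name `max_recommended_maxdop` and its parameters are hypothetical; this is a sketch of the rule of thumb on this slide, not an official sizing formula.

```python
def max_recommended_maxdop(physical_cores, cores_per_numa_node=None, pure_oltp=False):
    """Upper bound for MAXDOP according to the rule of thumb above."""
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    if pure_oltp:
        return 1  # pure OLTP workloads: no parallel plans
    if cores_per_numa_node is not None:
        # NUMA: stay within a single node's physical cores.
        return min(physical_cores, cores_per_numa_node)
    return physical_cores  # never exceed the physical core count


print(max_recommended_maxdop(16))
print(max_recommended_maxdop(32, cores_per_numa_node=8))
print(max_recommended_maxdop(32, cores_per_numa_node=8, pure_oltp=True))
```

The result is a ceiling, not a target: a mixed workload may still perform better with a lower value.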

Wait type name         Wait time (ms)     Requests
CXPACKET                    786556034    128110444
LATCH_EX                    255701441    155553913
ASYNC_NETWORK_IO            129888217     19083082
PAGEIOLATCH_SH               83672746      2813207
WRITELOG                     70634742     48398646
SOS_SCHEDULER_YIELD          47697175    176871743
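
To put the wait statistics above in proportion, the following Python sketch computes each wait type's share of the listed wait time and its average wait per request. The figures are copied from the listing; the variable names are arbitrary, and the shares are relative to these six wait types only, not to every wait on the instance.

```python
# (wait time in ms, number of requests), copied from the listing above
waits = {
    "CXPACKET": (786556034, 128110444),
    "LATCH_EX": (255701441, 155553913),
    "ASYNC_NETWORK_IO": (129888217, 19083082),
    "PAGEIOLATCH_SH": (83672746, 2813207),
    "WRITELOG": (70634742, 48398646),
    "SOS_SCHEDULER_YIELD": (47697175, 176871743),
}

total_ms = sum(ms for ms, _ in waits.values())
share = {name: ms / total_ms for name, (ms, _) in waits.items()}
avg_ms = {name: ms / requests for name, (ms, requests) in waits.items()}

# Print the wait types from largest to smallest share of wait time.
for name in sorted(waits, key=share.get, reverse=True):
    print(f"{name:<20} {share[name]:6.1%} {avg_ms[name]:8.2f} ms/request")
```

In this sample CXPACKET accounts for more than half of the listed wait time, which is the kind of signal that prompts a review of the parallelism settings.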

Best practices

 When to apply MAXDOP?
   ALTER INDEX operations
   Typically set MAXDOP = number of physical cores
 When to set max degree of parallelism?
   When you see high CXPACKET waits
   Pure OLTP systems should set its value to 1
 When to set cost threshold for parallelism?
   When you want to statistically change the number of parallel operations


