1. Performance Tuning of Spatial
Queries in SQL Server
Deep Dive into Spatial Indexing
Michael Rys (@SQLServerMike)
Principal Program Manager
Microsoft Corp.
October 11-14, Seattle, WA
3. Q: Why is my Query so Slow?
A: Usually because the index isn’t being used.
Q: How do I tell?
A: SELECT * FROM T WHERE g.STIntersects(@x) = 1
AD404-M| Spatial Performance 3
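A minimal sketch of one way to check, assuming the deck's running example (a table T with a geometry column g and a spatial index on it). It asks SQL Server to return the actual execution plan together with the results; the exact operator names shown in the plan may differ between versions.

```sql
-- Return the actual execution plan (as XML) alongside the query results.
SET STATISTICS XML ON;
GO

-- Assumed example window; SRID 0 is an assumption for planar data.
DECLARE @x geometry = geometry::STGeomFromText('POINT (0 0)', 0);

SELECT *
FROM T
WHERE g.STIntersects(@x) = 1;
GO

SET STATISTICS XML OFF;
GO
```

If the plan contains a seek on the spatial index, the index is being used. A scan of T followed by a filter on STIntersects means the predicate is evaluated row by row.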
4. Hinting the Index
Spatial indexes can be forced if needed.
SELECT *
FROM T WITH(INDEX(T_g_idx))
WHERE g.STIntersects(@x) = 1
Use SQL Server 2008 SP1 or 2008 R2!
5. But Why Isn't My Index Used?
Plan choice is cost-based
• QO uses various information, including cardinality
Local variable:
DECLARE @x geometry = 'POINT (0 0)'
SELECT *
FROM T
WHERE T.g.STIntersects(@x) = 1

Literal:
SELECT *
FROM T
WHERE T.g.STIntersects('POINT (0 0)') = 1

Parameter:
EXEC sp_executesql
N'SELECT *
FROM T
WHERE T.g.STIntersects(@x) = 1',
N'@x geometry', N'POINT (0 0)'
When can we estimate cardinality?
• Variables: never
• Literals: not for spatial since they are not literals
under the covers
• Parameters: yes, but cached, so first call matters
6. Spatial Indexing Basics
[Figure: candidate objects A to E; Primary Filter (index lookup) followed by Secondary Filter (original predicate).]
In general, split predicates in two
• Primary filter finds all candidates, possibly
with false positives (but never false negatives)
• Secondary filter removes false positives
The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
• Sometimes possible to skip secondary filter
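The two-stage scheme can be observed from T-SQL. The sketch below assumes the deck's table T with a spatial index on the geometry column g, and uses the Filter() method, which performs an index-only intersection test when a spatial index is available. The window polygon and SRID 0 are assumptions for illustration.

```sql
-- Assumed example window.
DECLARE @x geometry = geometry::STGeomFromText(
    'POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

-- Primary filter only: fast, but may include false positives.
SELECT COUNT(*) AS candidates
FROM T
WHERE g.Filter(@x) = 1;

-- Primary filter plus secondary filter: exact result.
SELECT COUNT(*) AS matches
FROM T
WHERE g.STIntersects(@x) = 1;
```

The gap between the two counts gives a rough feel for how many false positives the secondary filter has to remove. If no spatial index is used, Filter() returns the same values as STIntersects().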
7. Using B+-Trees for Spatial Index
SQL Server has B+-Trees
Spatial indexing is usually done through other
structures
• Quad tree, R-Tree
Challenge: How do we repurpose the B+-Tree
to handle spatial queries?
• Add a level of indirection!
8. Mapping to the B+-Tree
B+-Trees handle linearly ordered sets well
We need to somehow linearly order 2D space
• Either the plane or the globe
We want a locality-preserving mapping from
the original space to the line
• i.e., close objects should be close in the index
• Can’t be done, but we can approximate it
9. SQL Server Spatial Indexing Story
Planar Index
• Requires bounding box
• Only one grid
Geographic Index
• No bounding box
• Two top-level projection grids
Grid cell numbering (example):
1 2 15 16
4 3 14 13
5 8 9 12
6 7 10 11
Indexing Phase
1. Overlay a grid on the spatial object
2. Identify grids for spatial object to store in index
Primary Filter
3. Identify grids for query object(s)
4. Intersecting grids identifies candidates
Secondary Filter
5. Apply actual CLR method on candidates to find matches
10. SQL Server Spatial Indexing Story
Multi-Level Grid
• Much more flexible than a simple grid
• Hilbert numbering
• Modified adaptable QuadTree
Grid index features
• 4 levels
• Customizable grid subdivisions
• Customizable maximum number of cells per object (default 16)
• NEW IN SQL Server Codename “DENALI”: New Default
tessellation with 8 levels of cell nesting
11. Multi-Level Grid
[Figure: nested grid cells; the example cell /4/2/3/1 sits four levels below the root cell / (“cell 0”).]
Deepest-cell Optimization: Only keep the lowest level cell in index
Covering Optimization: Only record higher level cells when all lower
cells are completely covered by the object
Cell-per-object Optimization: User restricts max number of cells per object
12. Implementation of the Index
Persist a table-valued function
• Internally rewrite queries to use the table

Base Table T
Prim_key  geography
1         g1
2         g2
3         g3

CREATE SPATIAL INDEX sixd
ON T(geography)

Internal Table for sixd
Prim_key  cell_id  srid  cell_attr
1         0x00007  42    0
3         0x00007  42    1
3         0x0000A  42    2
3         0x0000B  42    0
3         0x0000C  42    1
1         0x0000D  42    0
2         0x00014  42    1

Column notes:
• Prim_key: 15 columns and 895 byte limitation
• cell_id: Varbinary(5) encoding of grid cell id
• srid: Spatial Reference ID; have to be the same to produce match
• cell_attr:
  0 – cell at least touches the object (but not 1 or 2)
  1 – guarantee that object partially covers cell
  2 – object covers cell
13. New AUTO GRID Index
• NEW IN SQL Server Codename “DENALI”
• Has 8 levels of cell nesting
• No manual grid density selection:
• Fixed at HLLLLLLL
• default number of cells per object:
• 8 for geometry
• 12 for geography
• More stable performance
• for windows of different size
• for data with different spatial density
• For default values:
• Up to 2x faster for longer queries > 500 ms
• More efficient primary filter
• Fewer rows returned
• 10ms slower for very fast queries < 50 ms
• Increased tessellation time which is constant
14. Spatial Index Performance
New grid gives much more stable performance for query windows of different sizes
Better grid coverage gives fewer high peaks
15. Index Creation and Maintenance
Create index example GEOMETRY:
CREATE SPATIAL INDEX sixd ON spatial_table(geom_column)
USING GEOMETRY_GRID
WITH (
BOUNDING_BOX = (0, 0, 500, 500),
GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 20)
Create index example GEOGRAPHY:
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_GRID
WITH (
GRIDS = (LOW, LOW, MEDIUM, HIGH),
CELLS_PER_OBJECT = 20)
NEW IN SQL Server “DENALI” (equivalent to default creation):
CREATE SPATIAL INDEX sixd ON spatial_table(geogr_column)
USING GEOGRAPHY_AUTO_GRID
WITH (CELLS_PER_OBJECT = 20)
Use ALTER INDEX and DROP INDEX for maintenance.
18. How Costing is Done
• The stats on the index contain a trie constructed on
the string form of the packed binary(5) typed Cell ID.
• When a window query is compiled with a sniffable
window object, the tessellation function on the
window object is run at compile time. The results are
used to construct a trie for use during compilation.
• May lead to wrong compilation for later objects
• No costing on:
• Local variables, constants, results of expressions
• Use different indices and different stored procs to
account for different query characteristics
20. Seeking into a Spatial Index
Minimize I/O and random I/O
Intuition: small windows should touch small portions of the index
A cell 7.2.4 matches
• Itself
• Ancestors
• Descendants
[Figure: Spatial Index S, with the matching entries 7, 7.2 and 7.2.4 highlighted.]
21. Understanding the Index Query Plan
[Figure: query plan, read from the window to the index: T(@g) -> Ranges -> Remove dup ranges -> Optional Sort -> Spatial Index Seek]
22. Other Query Processing Support
• Index intersection
• Enables efficient mixing of spatial and non-spatial
predicates
• Matching
• New in SQL Server “Denali”: Nearest Neighbor query
• Distance queries: convert to STIntersects
• Commutativity: a.STIntersects(b) = b.STIntersects(a)
• Dual: a.STContains(b) = b.STWithin(a)
• Multiple spatial indexes on the same column
• Various bounding boxes, granularities
• Outer references as window objects
• Enables spatial join to use one index
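The matching rules above can be illustrated with a few query shapes. This is a sketch that assumes the deck's table T with an indexed geometry column g; the point, radius and SRID 0 are made-up values. The buffered form only approximates the distance predicate because STBuffer produces a polygon approximation of a circle.

```sql
-- Assumed example values.
DECLARE @p geometry = geometry::STGeomFromText('POINT (100 100)', 0);
DECLARE @r float = 25.0;

-- Distance predicate: a shape the optimizer can match to the spatial index.
SELECT *
FROM T
WHERE g.STDistance(@p) <= @r;

-- Roughly the same question phrased as an intersection with a buffered window.
SELECT *
FROM T
WHERE g.STIntersects(@p.STBuffer(@r)) = 1;

-- Commutativity: the indexed column may appear as the argument instead.
SELECT *
FROM T
WHERE @p.STBuffer(@r).STIntersects(g) = 1;
```

Results of the first and second query can differ slightly for objects right at the boundary of the radius.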
23. Other Spatial Performance Improvements
in SQL Server Codename “Denali”
• Spatial index build time for point data can be as
much as four to five times faster
• Optimized spatial query plan for STDistance and STIntersects-like queries
• Faster point data queries
• Optimized STBuffer, lower memory footprint
24. Spatial Nearest Neighbor (Denali)
Main scenario
• Give me the closest 5 Italian restaurants
Execution plan
• SQL Server 2008/2008 R2: table scan
• SQL Server Codename “Denali”: uses spatial index
Specific query pattern required
SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me)
26. Nearest Neighbor Performance
Find the closest 50 business points (22 million in total)
NN query vs best current workaround (sort all points in 10km radius)
*Average time for NN query is ~236ms
27. Limitations of Spatial Plan Selection
• Off whenever window object is not a
parameter:
• Spatial join (window is an outer reference)
• Local variable, string constant, or complex expression
• Has the classic SQL Server parameter-
sensitivity problem
• SQL compiles once for one parameter value and reuses the
plan for all parameter values
• Different plans for different sizes of window require
application logic to bucketize the windows
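A minimal sketch of such bucketizing, assuming the deck's table T with an indexed geometry column g. The procedure names, the use of the envelope area as the size measure, and the threshold of 100 are all hypothetical choices for illustration. Each procedure gets its own cached plan, compiled for the first window that lands in its bucket.

```sql
-- Plan compiled and cached for small windows (hypothetical procedure name).
CREATE PROCEDURE dbo.QuerySmallWindow @w geometry
AS
    SELECT * FROM T WHERE g.STIntersects(@w) = 1;
GO

-- Same query text, but a separate cached plan for large windows.
CREATE PROCEDURE dbo.QueryLargeWindow @w geometry
AS
    SELECT * FROM T WHERE g.STIntersects(@w) = 1;
GO

-- Dispatcher: picks a bucket from the window's bounding-box area.
CREATE PROCEDURE dbo.QueryWindow @w geometry
AS
BEGIN
    IF @w.STEnvelope().STArea() < 100.0   -- threshold is an assumption
        EXEC dbo.QuerySmallWindow @w;
    ELSE
        EXEC dbo.QueryLargeWindow @w;
END
GO
```

The right number of buckets and their thresholds depend on the data and would need to be found by experiment.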
28. Index Support
• Can be built in parallel
• Can be hinted
• File groups/Partitioning
• Aligned to base table or Separate file group
• Full rebuild only
• New catalog views, DDL Events
• DBCC Checks
• Supportability stored procedures
• New in SQL Server “Denali”: Index Page and Row Compression
• Ca. 50% smaller indices, 0-15% slower queries
• Not supported
• Online rebuild
• Database Engine Tuning Advisor
29. SET Options
Spatial indexes require:
• ANSI_NULLS: ON
• ANSI_PADDING: ON
• ANSI_WARNINGS: ON
• CONCAT_NULL_YIELDS_NULL: ON
• NUMERIC_ROUNDABORT: OFF
• QUOTED_IDENTIFIER: ON
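As a sketch, the session settings listed above can be applied explicitly before creating or querying a spatial index. Many client tools already set most of them by default, so this is mainly useful for connections that do not.

```sql
-- Session settings required for spatial indexes, as listed above.
SET ANSI_NULLS ON;
SET ANSI_PADDING ON;
SET ANSI_WARNINGS ON;
SET CONCAT_NULL_YIELDS_NULL ON;
SET NUMERIC_ROUNDABORT OFF;
SET QUOTED_IDENTIFIER ON;

-- Show the options currently in effect for this session.
DBCC USEROPTIONS;
```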
30. Index Hinting
FROM T WITH (INDEX (<Spatial_idxname>))
• Spatial index is treated the same way a
non-clustered index is
• the order of the hint is reflected in the order of the indexes
in the plan
• multiple index hints are concatenated
• no duplicates are allowed
• The following restrictions exist:
• The spatial index must be either first in the first index hint or
last in the last index hint for a given table.
• Only one spatial index can be specified in any index hint for
a given table.
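A sketch of a combined hint that follows these restrictions. It assumes the deck's table T with spatial index T_g_idx, plus a hypothetical non-spatial column category with a hypothetical non-clustered index T_category_idx. The spatial index is listed first in the hint.

```sql
-- Assumed example window.
DECLARE @x geometry = geometry::STGeomFromText(
    'POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

SELECT *
FROM T WITH (INDEX (T_g_idx, T_category_idx))   -- spatial index first
WHERE g.STIntersects(@x) = 1
  AND category = 'Italian';                     -- hypothetical column
```

If the optimizer cannot build a plan that honors the hint, the query fails with an error instead of silently ignoring it.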
31. Query Window Hinting (Denali)
SELECT * FROM spatial_table t
WITH (SPATIAL_WINDOW_MAX_CELLS = 1024)
WHERE t.geom.STIntersects(@window) = 1
• Used if an index is chosen (does not force an index)
• Overrides the default (512 for geometry, 768 for geography)
• Rule of thumb:
• Higher value makes primary filter phase longer but reduces
work in secondary filter phase
• Set higher for dense spatial data
• Set lower for sparse spatial data
33. Spatial Catalog Views
• sys.spatial_indexes catalog view
• sys.spatial_index_tessellations catalog view
• Entries in sys.indexes for a spatial index:
• A clustered index on the internal table of the spatial index
• A spatial index entry (type = 4)
• An entry in sys.internal_tables
• An entry in sys.index_columns
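A sketch of how the two catalog views can be joined to list the settings of each spatial index on a table. The table name dbo.spatial_table is an assumption taken from the deck's index creation examples.

```sql
-- List tessellation settings for every spatial index on one table.
SELECT i.name AS index_name,
       i.tessellation_scheme,
       t.bounding_box_xmin, t.bounding_box_ymin,
       t.bounding_box_xmax, t.bounding_box_ymax,
       t.level_1_grid_desc, t.level_2_grid_desc,
       t.level_3_grid_desc, t.level_4_grid_desc,
       t.cells_per_object
FROM sys.spatial_indexes AS i
JOIN sys.spatial_index_tessellations AS t
  ON t.object_id = i.object_id
 AND t.index_id  = i.index_id
WHERE i.object_id = OBJECT_ID(N'dbo.spatial_table');  -- assumed table name
```

The bounding box columns are only populated for geometry indexes, since geography indexes do not use a bounding box.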
34. New Spatial Histogram Helpers (Denali)
sp_help_spatial_geometry_histogram
sp_help_spatial_geography_histogram
Used for spatial data and index analysis
Histogram of 22 million business points over US
Left: SSMS view of a histogram
Right: Custom drawing on top of Bing Maps
36. sys.sp_help_spatial_geometry_index
Arguments
Parameter       Type           Description
@tabname        nvarchar(776)  Name of the table for which the index has been specified
@indexname      sysname        Name of the index to be investigated
@verboseoutput  tinyint        0: core set of properties is reported; 1: all properties are reported
@query_sample   geometry       A representative query sample that will be used to test the usefulness of the index. It may be a representative object or a query window.
Results are returned as a property name/value pair table of the format:
PropName: nvarchar(256), PropValue: sql_variant
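A sketch of a call using the four arguments described above. The table name spatial_table and index name sixd come from the deck's index creation examples; the query window and SRID 0 are made-up values.

```sql
-- Representative query window (assumed values).
DECLARE @window geometry = geometry::STGeomFromText(
    'POLYGON ((0 0, 100 0, 100 100, 0 100, 0 0))', 0);

EXEC sys.sp_help_spatial_geometry_index
    @tabname       = N'spatial_table',
    @indexname     = N'sixd',
    @verboseoutput = 0,          -- 0 = core properties only
    @query_sample  = @window;
```

Running it with several windows of different sizes shows how the same index behaves for small and large queries.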
37. Some of the returned Properties
Property (type, output level): description
• Number_Of_Rows_Selected_By_Primary_Filter (bigint, core): P = number of rows selected by the primary filter.
• Number_Of_Rows_Selected_By_Internal_Filter (bigint, core): S = number of rows selected by the internal filter. For these rows, the secondary filter is not called.
• Number_Of_Times_Secondary_Filter_Is_Called (bigint, core): Number of times the secondary filter is called.
• Percentage_Of_Rows_NotSelected_By_Primary_Filter (float, core): Suppose there are N rows in the base table and P are selected by the primary filter. This is (N-P)/N as a percentage.
• Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter (float, core): This is S/P as a percentage. The higher the percentage, the better the index is at avoiding the more expensive secondary filter.
• Number_Of_Rows_Output (bigint, core): O = number of rows output by the query.
• Internal_Filter_Efficiency (float, core): This is S/O as a percentage.
• Primary_Filter_Efficiency (float, core): This is O/P as a percentage. The higher the efficiency, the fewer false positives have to be processed by the secondary filter.
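A worked example of the ratios defined above, using made-up counts: N = 1000 rows in the base table, P = 100 selected by the primary filter, S = 40 selected by the internal filter, and O = 80 rows output.

```sql
-- Hypothetical counts, chosen only to illustrate the formulas.
DECLARE @N float = 1000, @P float = 100, @S float = 40, @O float = 80;

SELECT 100.0 * (@N - @P) / @N AS Percentage_Of_Rows_NotSelected_By_Primary_Filter,
       100.0 * @S / @P        AS Percentage_Of_Primary_Filter_Rows_Selected_By_Internal_Filter,
       100.0 * @S / @O        AS Internal_Filter_Efficiency,
       100.0 * @O / @P        AS Primary_Filter_Efficiency;
```

With these numbers the four values are 90, 40, 50 and 80. In words: the primary filter discards 90% of the table, 20 of its 100 candidates are false positives, and half of the output rows never needed the secondary filter.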
39. Spatial Tips on index settings
Some best practice recommendations (YMMV):
• Start out with the new default tessellation
• Point data: always use HIGH for all 4 levels.
CELLS_PER_OBJECT is not relevant in this case.
• Simple, relatively consistent polygons: set all levels to
LOW or MEDIUM, MEDIUM, LOW, LOW
• Very complex LineString or Polygon instances:
• High number of CELLS_PER_OBJECT (often 8192 is best)
• Setting all 4 levels to HIGH may be beneficial
• Polygons or line strings which have highly variable
sizes: experimentation is needed.
• Rule of thumb for GEOGRAPHY: if MMMM is not
working, try HHMM
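Two of these recommendations written out as index definitions, as a sketch. The table names (points_table, shapes_table), column names, index names and the bounding box are hypothetical. As with any spatial index, the base tables need a clustered primary key.

```sql
-- Point data: HIGH on all four levels; CELLS_PER_OBJECT left at its default.
CREATE SPATIAL INDEX six_points ON points_table(pt)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (0, 0, 500, 500),      -- assumed data extent
    GRIDS = (HIGH, HIGH, HIGH, HIGH));

-- Very complex polygons or line strings: many cells per object, dense grids.
CREATE SPATIAL INDEX six_shapes ON shapes_table(shape)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (0, 0, 500, 500),      -- assumed data extent
    GRIDS = (HIGH, HIGH, HIGH, HIGH),
    CELLS_PER_OBJECT = 8192);
```

These are starting points only; the index support procedure described earlier can be used to compare alternatives on real query windows.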
40. What to do if my Spatial Query is slow?
• Make sure you are running SQL Server 2008 SP1, 2008 R2 or
“Denali”
• Check query plan for use of index
• Make sure it is a supported operation
• Hint the index (and/or a different join type)
• Do not use a spatial index when there is a highly selective non-
spatial predicate
• Run above index support procedure:
• Assess effectiveness of primary filter (Primary_Filter_Efficiency)
• Assess effectiveness of internal filter (Internal_Filter_Efficiency)
• Redefine or define a new index with better characteristics
• More appropriate bounding box for GEOMETRY
• Better grid densities
41. Related Content
Weblog
• http://blogs.msdn.com/isaac
• http://blogs.msdn.com/edkatibah
• http://johanneskebeck.spaces.live.com/
• http://sqlblog.com/blogs/michael_rys/
Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
Whitepapers, Websites & Code
• Denali CTP3: http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/08/08/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-3.aspx
• Spatial Wiki: http://social.technet.microsoft.com/wiki/contents/articles/4136.aspx
• SQL Server 2008 Spatial Site: http://www.microsoft.com/sqlserver/2008/en/us/spatial-data.aspx
• SQL Spatial Codeplex: http://www.codeplex.com/sqlspatialtools
• http://www.sharpgis.net/page/SQL-Server-2008-Spatial-Tools.aspx
• http://www.codeplex.com/ProjNET
• http://www.geoquery2008.com/
• SIGMOD 2008 Paper: Spatial Indexing in Microsoft SQL Server 2008
• And of course Books Online!
43. Thank you
for attending this session and the
2011 PASS Summit in Seattle
Editor's Notes
ADD USING syntax to show new tessellation scheme
Procedure: Construct 4 points/ranges for each cell in T; remove duplicates; sort (optionally); seek
Clustering imposes ordering on index
ADD Tessellation
Experimentation: For instance, consider this dataset: US Highways. In this dataset some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent:
Geography Index: MEDIUM, MEDIUM, MEDIUM, MEDIUM 1024
Geometry Index: LOW, LOW, LOW, LOW 1024