SQL Server 2012 Beyond Relational Performance and Scale
Michael Rys
Structured and unstructured Search
Related/"Semantic" Search
Building and maintaining applications with relational and non-relational data is hard

Pain Points
    Complex integration
    Duplicated functionality
    Compensation for unavailable services

Goals
    Reduce the cost of managing all data
    Simplify the development of applications over all data
    Provide management and programming services for all data
Tables, XML, Spatial, Documents, Digital Media, Scientific
Records, Factoids…


Data formats and content natively understood for rich application and
user experience
Consistent Application Model and Data Constructs to ease application
development, migration and long-term retention


Provide rich services, e.g.,
[Diagram, built up over four slides: beyond-relational data platform layers]
Rich Data Programming Capabilities: Programmability (T-SQL/Data Types; Spatial, XML, HierarchyID; Win32)
Rich Query and Search Services over all Data: Query and Type Operations (XQuery, Spatial ops); Search; Semantic Platform
Efficient Storage for BR Data: Structured Data; Semi-structured Data/XML; Unstructured Data
Storage structures: B-trees; XML, FTS, Spatial Indices; Filestream
Manageability & Availability
Files
[Diagram: unstructured data storage and access]
Access: Transactional Access; Streaming Win32 Access
Clients: Database Applications; Windows Apps; SQL Apps
Interfaces: Blobs; SMB Share Files/Folders; FileStream API
Rich Services: Fulltext Search; Semantic Similarity Search
Database: FileTable; FileStreams
Integrated Administration; Integrated Backup/Replication/AlwaysOn
Scale-up Solutions: Multiple Containers (Disk 1, Disk 2, Disk 3)
Remote BLOB Storage: Customer Application; SQL RBS API; Azure lib, Centera lib, SQL FILESTREAM lib; Azure, Centera, SQL DB
Store BLOBs in DB + File System
[Diagram: Application, BLOB, DB]
FileTable Folder Hierarchy
Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents
FILESTREAM Share: MSSQLSERVER
Database Directories: Private Docs (Database1), Office Docs (Database2)
FileTable Directories: Media (FileTable), Documents (FileTable), LogFiles (FileTable)
User-Defined Directory Structure
[Chart: Throughput (Mbps), 0 to 900, at sizes 240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB. Series: Filestream Win32 (Filesystem) Access; Filestream T-SQL; Varbinary; Filesystem Win32 Access Gain (%)]
Insert
[Chart: Throughput (Mbps), -200 to 600, at sizes 240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB. Series: Filestream Win32 (Filesystem) Access; Filestream T-SQL; Varbinary]
Create/Alter Database
          max_size
DBCC Shrinkfile Emptyfile
Use of multiple spindles for achieving better I/O Scalability
Queries over a 350M-document database with random DML running in the background.
SQL Server 2012 beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput.
[Chart: 2005/8 vs 2012. Query avgExecTime (ms) under varying numbers of connections (50 to 2000 users) for a customer playback benchmark]
geography::Point(lat, lon, 4326)
[Diagram: objects A to E pass through the Primary Filter (index lookup), then the Secondary Filter (original predicate)]

In general, split predicates in two
      Primary filter finds all candidates, possibly
      with false positives (but never false negatives)
      Secondary filter removes false positives
The index provides our primary filter
Original predicate is our secondary filter
Some tweaks to this scheme
      Sometimes possible to skip secondary filter
Fully contained cells
Partially contained cells
Optimal value (theoretical) is somewhere between two extremes
[Chart label: Time needed to process false positives]

Default values:
    512 - Geometry AUTO grid
    768 - Geography AUTO grid
    1024 - MANUAL grids

SELECT * FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS=256)
WHERE t.geom.STIntersects(@window)=1;
CREATE SPATIAL INDEX idxGeog
   ON table(geography column)
   USING GEOGRAPHY_GRID
   WITH (
     DATA_COMPRESSION = page | row
   );

Based on internal tests, with compression:
- 40%-50% smaller
- Queries range from 20% faster to 15% slower
- Per-partition compression setting is not supported.
Give me the closest 5 Italian restaurants

      SQL Server 2008/2008 R2: table scan
      SQL Server 2012: uses spatial index


SELECT TOP(5) *
FROM Restaurants r
WHERE r.type = 'Italian'
  AND r.pos.STDistance(@me) IS NOT NULL
ORDER BY r.pos.STDistance(@me)
Find the closest 50 business points to a specific location (out of 22 million in total)
http://www.slideshare.net/MichaelRys/sql-bits-brruds
    http://www.slideshare.net/MichaelRys/filetable-and-semantic-search-in-sql-server-2012
    http://www.sqlserverlaunch.com/WW/theater?sid=634

    http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial
    http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial-indexing
Forum:
    http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1

More Related Content

What's hot

U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)Michael Rys
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Michael Rys
 
Apache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsApache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsKasper Sørensen
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLCloudera, Inc.
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.George Joseph
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsgagravarr
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLMichael Rys
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Michael Rys
 
Azure data factory
Azure data factoryAzure data factory
Azure data factoryBizTalk360
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLDataWorks Summit
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft PlatformAndrew Brust
 

What's hot (19)

NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
 
U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)U-SQL Federated Distributed Queries (SQLBits 2016)
U-SQL Federated Distributed Queries (SQLBits 2016)
 
HTAP Queries
HTAP QueriesHTAP Queries
HTAP Queries
 
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
Best Practices and Performance Tuning of U-SQL in Azure Data Lake (SQL Konfer...
 
Apache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data pointsApache MetaModel - unified access to all your data points
Apache MetaModel - unified access to all your data points
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETL
 
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
IN-MEMORY DATABASE SYSTEMS FOR BIG DATA MANAGEMENT.SAP HANA DATABASE.
 
The other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needsThe other Apache Technologies your Big Data solution needs
The other Apache Technologies your Big Data solution needs
 
Killer Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQLKiller Scenarios with Data Lake in Azure with U-SQL
Killer Scenarios with Data Lake in Azure with U-SQL
 
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
Tuning and Optimizing U-SQL Queries (SQLPASS 2016)
 
Azure data factory
Azure data factoryAzure data factory
Azure data factory
 
NoSql
NoSqlNoSql
NoSql
 
Mongo db
Mongo dbMongo db
Mongo db
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014Apache Drill at ApacheCon2014
Apache Drill at ApacheCon2014
 
C# + SQL = Big Data
C# + SQL = Big DataC# + SQL = Big Data
C# + SQL = Big Data
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
 
Big Data on the Microsoft Platform
Big Data on the Microsoft PlatformBig Data on the Microsoft Platform
Big Data on the Microsoft Platform
 

Similar to SQL Server 2012 Beyond Relational Performance and Scale

SQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond RelationalSQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond RelationalMichael Rys
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012Michael Rys
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataMichael Rys
 
The Efficient Use of Cyberinfrastructure to Enable Data Analysis Collaboration
The Efficient Use of Cyberinfrastructure  to Enable Data Analysis CollaborationThe Efficient Use of Cyberinfrastructure  to Enable Data Analysis Collaboration
The Efficient Use of Cyberinfrastructure to Enable Data Analysis CollaborationCybera Inc.
 
Relational
RelationalRelational
Relationaldieover
 
CSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementCSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementYhal Htet Aung
 
The SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software ApplicationsThe SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software ApplicationsHeiko Koziolek
 
1.2 active directory
1.2 active directory1.2 active directory
1.2 active directoryMuuluu
 
Towards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software ApplicationsTowards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software ApplicationsHeiko Koziolek
 
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger Yahoo Developer Network
 
Hw09 Terapot Email Archiving With Hadoop
Hw09   Terapot  Email Archiving With HadoopHw09   Terapot  Email Archiving With Hadoop
Hw09 Terapot Email Archiving With HadoopCloudera, Inc.
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a ServicePeter Haase
 
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...EMC
 
Enterprise linked data clouds
Enterprise linked data cloudsEnterprise linked data clouds
Enterprise linked data cloudsdamienjoyce
 

Similar to SQL Server 2012 Beyond Relational Performance and Scale (20)

SQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond RelationalSQLBits X SQL Server 2012 Beyond Relational
SQLBits X SQL Server 2012 Beyond Relational
 
FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012FileTable and Semantic Search in SQL Server 2012
FileTable and Semantic Search in SQL Server 2012
 
SQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured DataSQLBits X SQL Server 2012 Rich Unstructured Data
SQLBits X SQL Server 2012 Rich Unstructured Data
 
The Efficient Use of Cyberinfrastructure to Enable Data Analysis Collaboration
The Efficient Use of Cyberinfrastructure  to Enable Data Analysis CollaborationThe Efficient Use of Cyberinfrastructure  to Enable Data Analysis Collaboration
The Efficient Use of Cyberinfrastructure to Enable Data Analysis Collaboration
 
Vormetric - Gherkin Event
Vormetric - Gherkin EventVormetric - Gherkin Event
Vormetric - Gherkin Event
 
Citrix Day 2012: ShareFile
Citrix Day 2012: ShareFileCitrix Day 2012: ShareFile
Citrix Day 2012: ShareFile
 
Relational
RelationalRelational
Relational
 
CSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database ManagementCSC1100 - Chapter08 - Database Management
CSC1100 - Chapter08 - Database Management
 
354 ch1
354 ch1354 ch1
354 ch1
 
SBS-101 What is SharePoint
SBS-101 What is SharePointSBS-101 What is SharePoint
SBS-101 What is SharePoint
 
The SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software ApplicationsThe SPOSAD Architectural Style for Multi-tenant Software Applications
The SPOSAD Architectural Style for Multi-tenant Software Applications
 
1.2 active directory
1.2 active directory1.2 active directory
1.2 active directory
 
ISUG 113: File stream
ISUG 113: File streamISUG 113: File stream
ISUG 113: File stream
 
Towards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software ApplicationsTowards an Architectural Style for Multi-tenant Software Applications
Towards an Architectural Style for Multi-tenant Software Applications
 
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger
Sep 2012 HUG: Giraffa File System to Grow Hadoop Bigger
 
Databases
DatabasesDatabases
Databases
 
Hw09 Terapot Email Archiving With Hadoop
Hw09   Terapot  Email Archiving With HadoopHw09   Terapot  Email Archiving With Hadoop
Hw09 Terapot Email Archiving With Hadoop
 
Linked Data as a Service
Linked Data as a ServiceLinked Data as a Service
Linked Data as a Service
 
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...
The Object Evolution - EMC Object-Based Storage for Active Archiving and Appl...
 
Enterprise linked data clouds
Enterprise linked data cloudsEnterprise linked data clouds
Enterprise linked data clouds
 

More from Michael Rys

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Michael Rys
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Michael Rys
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Michael Rys
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Michael Rys
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Michael Rys
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...
Bring your code to explore the Azure Data Lake: Execute your .NET/Python/R co...Michael Rys
 
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...
Best practices on Building a Big Data Analytics Solution (SQLBits 2018 Traini...Michael Rys
 
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...
U-SQL Killer Scenarios: Custom Processing, Big Cognition, Image and JSON Proc...Michael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...
U-SQL Killer Scenarios: Taming the Data Science Monster with U-SQL and Big Co...Michael Rys
 
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)
The Road to U-SQL: Experiences in Language Design (SQL Konferenz 2017 Keynote)Michael Rys
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLMichael Rys
 
ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)ADL/U-SQL Introduction (SQLBits 2016)
ADL/U-SQL Introduction (SQLBits 2016)Michael Rys
 
U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)U-SQL Learning Resources (SQLBits 2016)
U-SQL Learning Resources (SQLBits 2016)Michael Rys
 
U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)U-SQL Partitioned Data and Tables (SQLBits 2016)
U-SQL Partitioned Data and Tables (SQLBits 2016)Michael Rys
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys
 
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)U-SQL User-Defined Operators (UDOs) (SQLBits 2016)
U-SQL User-Defined Operators (UDOs) (SQLBits 2016)Michael Rys
 
U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)U-SQL Does SQL (SQLBits 2016)
U-SQL Does SQL (SQLBits 2016)Michael Rys
 

More from Michael Rys (20)

Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
Big Data and Data Warehousing Together with Azure Synapse Analytics (SQLBits ...
 
Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)Big Data Processing with .NET and Spark (SQLBits 2020)
Big Data Processing with .NET and Spark (SQLBits 2020)
 
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
Running cost effective big data workloads with Azure Synapse and ADLS (MS Ign...
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019Big Data Processing with Spark and .NET - Microsoft Ignite 2019
Big Data Processing with Spark and .NET - Microsoft Ignite 2019
 
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...
Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...Bringing the Power and Familiarity of .NET, C# and F# to Big Data Processing ...



SQL Server 2012 Beyond Relational Performance and Scale

  • 1.
  • 3. Building and maintaining applications with relational and non-relational data is hard. Pain points: complex integration; duplicated functionality; compensation for unavailable services. Goals: reduce the cost of managing all data; simplify the development of applications over all data; provide management and programming services for all data.
  • 4. Tables, XML, Spatial, Documents, Digital Media, Scientific Records, Factoids… Data formats and content natively understood for rich application and user experience. Consistent Application Model and Data Constructs to ease application development, migration and long-term retention. Provide rich services, e.g.,
  • 5. [Layered diagram] Programmability: T-SQL. Query. Structured Data. B-trees. Manageability, Availability. Files.
  • 6. [Layered diagram] Programmability: T-SQL. Query; Search. Structured Data; Unstructured Data. B-trees. Manageability, Availability. Files.
  • 7. [Layered diagram] Programmability: T-SQL/Data Types (Spatial, XML, HierarchyID); Win 32. Query and Type Operations; XQuery, Spatial ops; Search. Structured Data; Semi-structured Data/XML; Unstructured Data. B-trees; XML, FTS, Spatial Indices; Filestream. Manageability, Availability. Files.
  • 8. [Layered diagram] Rich Data Programming Capabilities: Programmability: T-SQL/Data Types (Spatial, XML, HierarchyID); Win 32. Rich Query and Search Services over all Data: Query and Type Operations; XQuery, Spatial ops; Search; Semantic Platform. Efficient Storage for BR Data: Structured Data; Semi-structured Data/XML; Unstructured Data; B-trees; XML, FTS, Spatial Indices; Filestream; Manageability & Availability; Files.
  • 9.
  • 10.
  • 11. [Diagram: BLOB storage options side by side. Remote BLOB Storage: customer application, SQL RBS API, provider libraries (Centera, Azure, FILESTREAM), SQL DB; Streaming Win32 Access??; Integrated Administration?. FILESTREAM: database applications (SQL apps), blobs, FileStream API; transactional access, streaming Win32 access; scale-up with multiple containers (Disk 1, Disk 2, Disk 3). FileTable: Windows apps, SMB share, files/folders. Rich services: fulltext search, semantic similarity search. Integrated administration: backup/replication/AlwaysOn.]
  • 12. Store BLOBs in DB + File System Application BLOB DB
  • 13. FileTable Folder Hierarchy. [Diagram: machine (my_machine); FILESTREAM share (MSSQLSERVER); database directories: Private Docs (Database1), Office Docs (Database2); FileTable directories: Media, Documents, LogFiles (each a FileTable); user-defined directory structure below them.]
  • 14.
  • 15. [Chart: throughput (Mbps, 0 to 900) by file size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB). Series: Filestream Win32 (Filesystem) Access; Filestream T-SQL; Varbinary; Filesystem Win32 Access Gain (%).]
  • 16. Insert. [Chart: throughput (Mbps, -200 to 600) by file size (240 KB, 480 KB, 1 MB, 2 MB, 4 MB, 8 MB). Series: Filestream Win32 (Filesystem) Access; Filestream T-SQL; Varbinary.]
  • 17. Create/Alter Database max_size DBCC Shrinkfile Emptyfile
  • 18. Use of multiple spindles for achieving better I/O Scalability
  • 19. 2012 2012
  • 20.
  • 21.
  • 22. Queries over a 350M-document database with random DML running in the background. Beats SQL Server 2005 by a scale factor of more than 2x, with on average 60x better throughput.
  • 23. 2005/8 vs 2012. [Chart: query avgExecTime (ms) for 2005/8 and 2012 under varying numbers of connections (50 to 2000 users) for a customer playback benchmark.]
  • 24.
  • 26.
  • 27. [Diagram: objects A to E pass through the Primary Filter (index lookup), then the Secondary Filter (original predicate).] In general, split predicates in two. The primary filter finds all candidates, possibly with false positives (but never false negatives). The secondary filter removes false positives. The index provides our primary filter; the original predicate is our secondary filter. Some tweaks to this scheme: it is sometimes possible to skip the secondary filter.
  • 28. Fully contained cells Partially contained cells
  • 29.
  • 30.
  • 31.
  • 32. Optimal value (theoretical) is somewhere between two extremes. [Chart label: time needed to process false positives.] Default values: 512 - Geometry AUTO grid; 768 - Geography AUTO grid; 1024 - MANUAL grids. SELECT * FROM table t WITH (SPATIAL_WINDOW_MAX_CELLS=256) WHERE t.geom.STIntersects(@window)=1;
  • 33.
  • 34.
  • 35. CREATE SPATIAL INDEX idxGeog ON table(geography column) USING GEOGRAPHY_GRID WITH ( DATA_COMPRESSION = page | row ); On the basis of internal tests, with compression - 40%-50% smaller - 20% faster -15% slower queries - Per partition compression setting is not supported.
  • 36.
  • 37. Give me the closest 5 Italian restaurants. SQL Server 2008/2008 R2: table scan. SQL Server 2012: uses spatial index. SELECT TOP(5) * FROM Restaurants r WHERE r.type = 'Italian' AND r.pos.STDistance(@me) IS NOT NULL ORDER BY r.pos.STDistance(@me);
  • 38.
  • 39. Find the closest 50 business points to a specific location (out of 22 million in total)
  • 40.
  • 41.
  • 42. http://www.slideshare.net/MichaelRys/sql-bits-brruds http://www.slideshare.net/MichaelRys/filetable-and-semantic-search-in-sql-server-2012 http://www.sqlserverlaunch.com/WW/theater?sid=634 http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial http://www.slideshare.net/MichaelRys/sqlbits-x-sql-server-2012-spatial-indexing Forum: http://forums.microsoft.com/MSDN/ShowForum.aspx?ForumID=1629&SiteID=1
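The primary/secondary filter scheme from slide 27 can be illustrated with a small sketch. This is not how SQL Server implements its spatial index: the uniform single-level grid, the rectangle-only objects, and every name below (`cells_for`, `build_index`, `primary_filter`, `secondary_filter`) are assumptions made for illustration. It only shows the idea that a cell lookup yields candidates with possible false positives but no false negatives, and that the exact predicate then removes the false positives.

```python
from collections import defaultdict

# Rectangles are (xmin, ymin, xmax, ymax); a toy stand-in for real geometries.

def cells_for(rect, cell_size):
    """Return the set of grid cells (col, row) that a rectangle touches."""
    xmin, ymin, xmax, ymax = rect
    cols = range(int(xmin // cell_size), int(xmax // cell_size) + 1)
    rows = range(int(ymin // cell_size), int(ymax // cell_size) + 1)
    return {(c, r) for c in cols for r in rows}

def build_index(objects, cell_size):
    """Map each grid cell to the names of the objects that touch it."""
    index = defaultdict(set)
    for name, rect in objects.items():
        for cell in cells_for(rect, cell_size):
            index[cell].add(name)
    return index

def intersects(a, b):
    """Exact test: do two axis-aligned rectangles overlap or touch?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def primary_filter(index, window, cell_size):
    """Index lookup: every object sharing a cell with the window.

    May return false positives, but never misses a true match, because any
    object that intersects the window shares at least one cell with it.
    """
    candidates = set()
    for cell in cells_for(window, cell_size):
        candidates |= index.get(cell, set())
    return candidates

def secondary_filter(objects, candidates, window):
    """Apply the original predicate to drop the false positives."""
    return {name for name in candidates if intersects(objects[name], window)}

if __name__ == "__main__":
    cell_size = 10
    objects = {
        "A": (1, 1, 3, 3),      # same cell as the window corner, but no overlap
        "B": (12, 12, 14, 14),  # overlaps the window
        "C": (8, 8, 9, 9),      # inside the window
        "D": (25, 25, 27, 27),  # far away, never a candidate
    }
    window = (7, 7, 13, 13)
    index = build_index(objects, cell_size)
    candidates = primary_filter(index, window, cell_size)
    print(sorted(candidates))                                     # ['A', 'B', 'C']
    print(sorted(secondary_filter(objects, candidates, window)))  # ['B', 'C']
```

In this toy grid, a larger `cell_size` makes the lookup touch fewer cells but lets more false positives such as "A" through to the exact test. That is the same trade-off between two extremes that slide 32 describes for the number of cells.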

Editor's Notes

  1. Let’s take a look at a BR application. What services does it provide? What about having these services supported in the database instead of each application building its own?
  2. Examples: Managing an application that keeps images in the file system and additional information in the database. Building a spatial database application before SQL Server 2008. Example services: backup/restore, search over relational and non-relational data.
  3. Pure relational database system.
  4. SQL Server 7.0: Added FT Search over unstructured data
  5. SQL 2000: Starting to add XML support. SQL 2005: XML datatype, XQuery, XML indices. SQL 2008: Spatial datatype and ops, spatial indexing, Filestream with Win 32 (but requires a special library to open/create), integrated FTS. Filestream requires NTFS.
  6. As of SQL Server 2012: Exposing Win 32 natively through FileTable. Addition of Semantic Platform to enable semantic search (and eventually, post Denali, query). Efficient Storage: building on existing relational storage and indexing infrastructure and backup/restore/HA. Brings SQL Server’s superior TCO to BR data and assures efficient and safe storage of customers’ high-value data. Rich Capabilities: necessary (but not sufficient) programmability experience to move customers to entrust their high-value data to SQL with minimal migration pains and access it via their favorite programming model/API. Rich Services: provide high-value services to unlock information in all data in a highly scalable way. Entices customers to move their high-value data into SQL to discover information fast. Provides platform stickiness and differentiation.
  7. Focus in SQL Server 2012 in priority order: Capabilities and rich services for unstructured data; Spatial platform; Sustain existing BR support; Tooling; Performance & Scale; Orthogonality; Large new features.
  8. Focus in SQL Server 2012 in priority order: Capabilities and rich services for unstructured data; Spatial platform; Sustain existing BR support; Tooling; Performance & Scale; Orthogonality; Large new features.
  9. SQL 2008 provides Filestreams as a way to add large blobs/unstructured data streams into SQL and still be able to open a Win32 handle (using SQL API) and provide high streaming performance for the data. Win32 Namespace support in SQL Server 2012 has the following goals: Reduce the barrier to entry for customers who have data in file servers and have Win32 applications that currently work on it. By enabling the Win32 namespace, SQL will generate a Windows share that can be exposed to existing Win32 applications like any file server share. This allows Win32 applications/mid-tier servers (like IIS) to work with this data without having to understand the database/transaction semantics. Single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc. Scale up: add multiple disks on a machine for storing Filestream data. Use SQL services like full-text search for both FileStream and relational metadata. Property Promotion Infrastructure for extracting interesting properties from SQL blobs/filestream to surface as relational columns for query.
  10. Reading bigger buffers gives better performance. FS volume: Dedicated volumes means volumes not used for tempdb (non-OS, paging, SQL data & log volumes). If stored files are large, as we generally recommend, format with 64K clusters. Do compress filestream volumes or filestream containers, but ONLY if the data to be stored is compressible. Note that in this case the NTFS cluster size must be 4K. 1 vol per container => enables space management at volume level. AV should be configured not to delete infected files but to quarantine them; otherwise corruption will be reported. SMB: With 60KB, a read can happen in one single IO, ideally coming back in one single TCP/IP packet. It is not 64K because 64KB of data can't fit in one single TCP/IP buffer. Partitioning: FILESTREAM columns require the presence of the ROWGUID unique index for aligned partitioning or, in case this is not possible, explicitly specifying the data placement option for the unique or primary key constraint on the ROWGUID column.
  11. Customer lab testing with 220 MB video files. FS Win32 reads, video streaming performance. FILESTREAM best practices.
  12. Stats on inserts followed by reads. 8.3 etc…
  13. Optimized hot paths, removed unnecessary serialization, expensive FileSystem operations, etc.
  14. Focus in SQL Server 2012 in priority order: Capabilities and rich services for unstructured data; Spatial platform; Sustain existing BR support; Tooling; Performance & Scale; Orthogonality; Large new features.
  15. TB
  16. Experimentation: For instance, consider this dataset: US Highways. In this dataset some of the LineStrings are quite long (over 2000 miles) and others are quite short (400 meters or less). For optimal performance, the following two indexes were roughly equivalent: Geography Index: MEDIUM, MEDIUM, MEDIUM, MEDIUM 1024; Geometry Index: LOW, LOW, LOW, LOW 1024.