SQL Server Parallel DWH Architecture, aka “Madison”
Franck Sidi
Lead SQL Server & BI, Microsoft Israel
Trusted, Scalable Platform
Our scalability strategy




                           “Madison” in 2010 Q1
Agenda
 Concepts and Principles
 Reference Architectures “FastTrack”
 Madison functional overview
 Early adoption
Symmetric Multiprocessing

     SMP

     Single DB instance
     “Shared Everything” Architecture
     Servers/CPUs share
        memory
        disks

   Can lead to resource contention as you scale
Massively Parallel Processing
     MPP

     Servers/CPUs have their own dedicated resources
     “Shared Nothing” Architecture
     “Secret Sauce” is parallelizing operations
     Lightning-fast Queries, Data Loads and Updates
     Linear Scalability

   Problem needs to be partitionable
SMP vs MPP
  SMP                                          MPP
  HW advancements increasing ability           HW advancements increasing ability
  to scale up                                  to scale up & scale out
    Scaling is limited                           Scaling to 1 PB+
    High-end SMP very expensive                  Scale-out is relatively low cost
  Extremely high concurrency for               Relatively high concurrency for
  some workloads                               complex workloads
  For less than 1-2 TB of data, SMP            > 2 TB up to 1 PB
  will almost always be better
  Full SQL Server functionality                Limited SQL Server functionality
  HA must be architected in                    HA is built in
Agenda
 Concepts and Principles
 Reference Architectures “FastTrack”
 Madison functional overview
 Early adoption
How some solve the problem today
Big SAN
Biggest 64-core Server
Connected together!




What’s wrong with this picture?
System out of balance
 This server can consume 16 GB/Sec of IO, but
 the SAN can only deliver 2 GB/Sec
    Even when the SAN is dedicated to the SQL Data
    Warehouse, which it often isn’t
    Lots of disks for Random IOPS BUT
    Limited controllers mean limited IO bandwidth
 System is typically IO bound and queries are
 slow
    Despite significant investment in both Server and
    Storage
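The imbalance above can be put into numbers with a minimal sketch. The 16 GB/s and 2 GB/s figures come from the slide; the 1 TB table size is a hypothetical value chosen only for illustration.

```python
# Figures from the slide: the server can consume 16 GB/s, the SAN delivers 2 GB/s.
SERVER_CONSUME_GBPS = 16.0
SAN_DELIVER_GBPS = 2.0


def effective_scan_rate(consume_gbps: float, deliver_gbps: float) -> float:
    """The slower side of the pipe sets the achievable scan rate."""
    return min(consume_gbps, deliver_gbps)


def scan_seconds(table_gb: float, rate_gbps: float) -> float:
    """Time in seconds to scan table_gb gigabytes at rate_gbps GB/s."""
    return table_gb / rate_gbps


rate = effective_scan_rate(SERVER_CONSUME_GBPS, SAN_DELIVER_GBPS)
# Share of the CPUs' I/O appetite that the storage can actually satisfy.
cpu_fed_fraction = rate / SERVER_CONSUME_GBPS

# HYPOTHETICAL 1 TB (1000 GB) table scan, not a figure from the deck.
TABLE_GB = 1000.0
print(f"effective rate: {rate} GB/s, CPUs fed at {cpu_fed_fraction:.1%}")
print(f"scan at SAN rate:      {scan_seconds(TABLE_GB, rate):.1f} s")
print(f"scan if fully balanced: {scan_seconds(TABLE_GB, SERVER_CONSUME_GBPS):.1f} s")
```

With these inputs the CPUs receive one eighth of the data they could process, which is why the system stays IO bound regardless of the server investment.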
Where Does an I/O Go?
   Understand potential throughput of the hardware
     Each component in the path has associated
     speed/bandwidth
     Know where the potential bottlenecks exist
   [Diagram: Host, Switches, Front End Ports, Cache, Controllers/Processors]
 PCI Bus -> HBA -> Fiber Channel Ports -> Array Processors -> Disks
Potential Performance Bottlenecks


 [Diagram: SERVER (SQL Server, CPU cores, Windows, cache) -> two FC HBAs (ports A/B) -> FC SWITCH -> STORAGE CONTROLLER (ports A/B, cache) -> LUNs -> DISKs]
 Rates to check along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate
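One way to use the rates named in the diagram is to treat the I/O path as a chain whose throughput is set by its slowest stage. The sketch below assumes each stage has a single measured rate in MB/s; the stage names follow the diagram, but every number is hypothetical and `find_bottleneck` is an illustrative helper, not part of any Microsoft tool.

```python
from typing import Dict, Tuple


def find_bottleneck(stage_rates_mbps: Dict[str, float]) -> Tuple[str, float]:
    """Return the (stage, rate) pair with the lowest throughput in the I/O path."""
    stage = min(stage_rates_mbps, key=stage_rates_mbps.get)
    return stage, stage_rates_mbps[stage]


# Stage names follow the diagram; every MB/s figure below is HYPOTHETICAL.
example_path = {
    "CPU feed rate": 1600.0,
    "SQL Server read-ahead rate": 1400.0,
    "HBA port rate": 800.0,
    "Switch port rate": 1000.0,
    "SP port rate": 900.0,
    "LUN read rate": 1100.0,
    "Disk feed rate": 1200.0,
}

stage, rate = find_bottleneck(example_path)
print(f"bottleneck: {stage} at {rate} MB/s")
```

Raising any other stage's rate leaves the end-to-end throughput unchanged until the reported stage is addressed.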
The alternative: A balanced system
 Design a server + storage configuration that can
 deliver all the IO bandwidth that CPUs can
 consume when executing a SQL Relational DW
 workload
 Avoid sharing storage devices among servers
 Avoid overinvesting in disk drives
    Focus on scan performance, not IOPS
 Lay out and manage data to maximize range
 scan performance and minimize fragmentation
Sequential I/O
  Sequential I/O
    Ideal for data warehousing
    Large reads & writes
    Scans on large data stores usually use sequential read patterns, not random read patterns
    Scalable, predictable performance
    Requires 1/3 or fewer drives for same performance

  Random I/O
    Ideal for OLTP
    Small reads and writes
    OLTP is usually random-read centric
       Seek queries are a goal in OLTP query optimization
       Seeks usually cause random reads
    Not as predictable & scalable for data warehousing
    Requires large number of drives

 All databases contain both scans and seeks, along with other types of reads and writes. DW workloads
 indicate that the vast majority of reads, though not all, are sequential.
What is Fast Track Data Warehouse?
  A method for designing a cost-effective,
  balanced system for Data Warehouse
  workloads
  Reference hardware configurations developed
  in conjunction with hardware partners using
  this method
  Best practices for data layout, loading and
  management

    Relational Database Only – Not SSAS, IS, RS
Fast Track Scope
   [Diagram: Fast Track scope; the Reference Architecture Scope is marked with a dashed outline]
   Components shown:
      Dedicated SAN, Storage Array
      Data Staging, Bulk Loading
      Data Warehouse
      Presentation Data
      Presentation Layer Systems: Web Analytic Tools, Reporting Services
      Analysis Services Cubes
      SharePoint Services: Microsoft Office SharePoint, PerformancePoint, Excel Services
Benefits of Fast Track appliance model
  Lower TCO
     Minimizes risk of overspending on unbalanced hardware
     configurations
     Commodity Hardware
  Choice
     HW platform
     Implementation vendor
  Reduced Risk
     Validated by Microsoft
     Encapsulates best practices
     Known performance & scalability
Fast Track DW Reference Configurations
    Server                  CPU                                         CPU Cores  SAN               Data Drive Count        Initial Capacity*  Max Capacity**
    HP ProLiant DL 385 G6   (2) AMD Opteron Istanbul six core 2.6 GHz   12         (3) HP MSA2312fc  (24) 300GB 15k RPM SAS  6TB                12TB
    HP ProLiant DL 585 G6   (4) AMD Opteron Istanbul six core 2.6 GHz   24         (6) HP MSA2312fc  (48) 300GB 15k SAS      12TB               24TB
    HP ProLiant DL 785 G6   (8) AMD Opteron Istanbul six core 2.8 GHz   48         (12) HP MSA2312   (96) 300GB 15k SAS      24TB               48TB
    Dell PowerEdge R710     (2) Intel Xeon Nehalem quad core 2.66 GHz   8          (2) EMC AX4       (16) 300GB 15k FC       4TB                8TB
    Dell PowerEdge R900     (4) Intel Xeon Dunnington six core 2.67 GHz 24         (6) EMC AX4       (48) 300GB 15k FC       12TB               24TB
    IBM X3650 M2            (2) Intel Xeon Nehalem quad core 2.67 GHz   8          (2) IBM DS3400    (16) 200GB 15K FC       4TB                8TB
    IBM X3850 M2            (4) Intel Xeon Dunnington six core 2.67 GHz 24         (6) IBM DS3400    (24) 300GB 15k FC       12TB               24TB
    IBM X3950 M2            (8) Intel Xeon Nehalem four core 2.13 GHz   32         (8) IBM DS3400    (32) 300GB 15k SAS      16TB               32TB
    Bull Novascale R460 E2  (2) Intel Xeon Nehalem quad core 2.66 GHz   8          (2) EMC AX4       (16) 300GB 15k FC       4TB                8TB
    Bull Novascale R480 E1  (4) Intel Xeon Dunnington six core 2.67 GHz 24         (6) EMC AX4       (48) 300GB 15k FC       12TB               24TB

* Core-balanced compressed capacity based on 300GB 15k SAS, not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for TempDB.
** Represents storage array fully populated with 300GB 15k SAS and use of 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure.
   30% of this storage should be reserved for DBA operations.
Fast Track DW
Core-Balanced Architecture
 [Diagram: SMP server (per 4 cores) -> switch -> storage processors SP A and SP B -> RAID groups]
 Rated throughput
     Each HBA port rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 SP ports
     Each SP port rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 SP ports
     Each SP rated at 500MB/s, or 1000MB/s for both SPs
     Using 300GB 15k SAS drives, each LUN rated at 125MB/s
     Each SP controls 4 LUNs at 500MB/s, or 1000MB/s per MSA DAE
 Disk layout (ONLY 8 data disks!)
     SP A: RAID GP01 (disks 01, 02): LUN1, LUN2; RAID GP02 (disks 03, 04): LUN3, LUN4
     SP B: RAID GP03 (disks 05, 06): LUN5, LUN6; RAID GP04 (disks 07, 08): LUN7, LUN8
     RAID GP05 (disks 09, 10): LUN0 (Logs)
     HS (hot spare)
 Per MSA2312 Drive Details
     • Each MSA can hold 12 drives, this configuration requires 11
     • MSA is 2U in total (capacitor eliminates need for battery)
     • Each MSA SP port controls 4 LUNs
     • Each pair of LUNs consists of (2) 300GB 15k SAS drives RAID1
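The per-component ratings on this slide can be chained into a quick balance check. The sketch below uses the rated figures from the slide; the ratio of one MSA per 4 cores is an assumption inferred from the reference configurations (12 cores with 3 arrays, 24 with 6, 48 with 12), and the per-core figure is derived here rather than stated in the deck.

```python
import math

LUN_MBPS = 125        # each LUN rated at 125 MB/s (slide)
LUNS_PER_SP = 4       # each SP controls 4 LUNs (slide)
SPS_PER_MSA = 2       # SP A and SP B (slide)
SP_PORT_MBPS = 400    # each 4 Gb/s port rated at 400 MB/s (slide)
SP_PORTS = 4          # 4 SP ports (slide)
CORES_PER_MSA = 4     # ASSUMPTION: one MSA per 4 cores, inferred from the reference table

sp_mbps = LUN_MBPS * LUNS_PER_SP          # disk-side rate behind one SP
msa_mbps = sp_mbps * SPS_PER_MSA          # disk-side rate of one MSA
port_mbps = SP_PORT_MBPS * SP_PORTS       # port-side ceiling of one MSA
per_core_mbps = msa_mbps / CORES_PER_MSA  # derived figure, not stated on the slide


def arrays_for(cores: int) -> int:
    """Number of MSA building blocks for a server with the given core count."""
    return math.ceil(cores / CORES_PER_MSA)


def aggregate_scan_mbps(cores: int) -> int:
    """Rated aggregate scan bandwidth once every core has its share of storage."""
    return arrays_for(cores) * min(msa_mbps, port_mbps)


print(sp_mbps, msa_mbps, port_mbps, per_core_mbps)
print(arrays_for(12), aggregate_scan_mbps(12))
```

Because the port-side ceiling (1600 MB/s) exceeds the disk-side rate (1000 MB/s), the disks rather than the ports set the rated scan bandwidth of each building block in this sketch.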
Fast Track Data Warehouse Components


                          Software:
                             • SQL Server 2008
                               Enterprise
                             • Windows Server 2008

                          Configuration guidelines:
                                • Physical table structures
                                • Indexes
                                • Compression
                                • SQL Server settings
                                • Windows Server settings
                                • Loading

                          Hardware:
                             • Tight specifications for servers,
                               storage and networking
                             • ‘Per core’ building block
RA: Tightly Spec'd
 RAs include not only hardware but best
 practices in:
    Windows OS configuration
    SQL Server startup options
    Database physical layout
    Table types
    Indexing
    Statistics
    Managing fragmentation
    Loading procedures
Fast Track Case Study - Results

                             Teradata                 SQL Server Fast Track DW   Comparison
Loading – Subject Area 1     5:10:21 total time       51:31 total time           SQL Server 6x faster
Loading – Subject Area 2     4:36:08 total time       1:50:01 total time         SQL Server 2.5x faster
Query times – Subject Area 1 3:03 avg query time      0:15 avg query time        SQL Server 12x faster
                             (using 9 benchmark       (using 9 benchmark
                             queries)                 queries)
Query times – Subject Area 2 56:44 avg query time     8:09 avg query time        SQL Server 7x faster
                             (using 4 benchmark       (using 4 benchmark
                             queries)                 queries)
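The comparison column can be reproduced from the reported times. This is a small arithmetic sketch: `to_seconds` and `speedup` are illustrative helper names, and the Fast Track load time for Subject Area 2 is read as 1 h 50 min 01 s, which is an assumption about the slide's notation.

```python
def to_seconds(clock: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' into seconds."""
    seconds = 0
    for part in clock.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def speedup(teradata: str, fast_track: str) -> float:
    """How many times faster the Fast Track run was than the Teradata run."""
    return to_seconds(teradata) / to_seconds(fast_track)


# (Teradata time, SQL Server Fast Track time) as reported in the case study.
results = {
    "Loading, Subject Area 1": ("5:10:21", "51:31"),
    "Loading, Subject Area 2": ("4:36:08", "1:50:01"),  # ASSUMPTION: h:mm:ss
    "Query times, Subject Area 1": ("3:03", "0:15"),
    "Query times, Subject Area 2": ("56:44", "8:09"),
}

speedups = {name: round(speedup(t, f), 1) for name, (t, f) in results.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor}x faster")
```

The computed ratios (about 6.0, 2.5, 12.2 and 7.0) agree with the rounded factors in the comparison column.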
Agenda
 Concepts and Principles
 Reference Architectures “FastTrack”
 Madison functional overview
 Early adoption
About DATAllegro…
        Technology Partners
        Proprietary Appliance Management and MPP Database
        Open Source Database and OS
        Industry Standard Servers
        Industry Standard Networking
        Industry Standard Storage
Integration Plans
 Provide scale out through MPP on SQL Server and Windows
 Offer ‘Appliance like’ user experience to Data Warehouse customers
 Lower TCO for high-end Data Warehousing
 Offer integrated BI platform to small and very large Enterprises




                OPEN SOURCE DATABASE & OS
                Industry Standard Servers
                Industry Standard Networking
                Industry Standard Storage
MPP Additional Considerations
 Principles & approach of SMP carry forward
 Deeper level of complexity:
    High Availability
    Parallelization
    Inter node data movement
Modular building blocks
 Balanced CPU and storage
    Both SMP and MPP are based on building blocks that scale
    by the CPU core
    Adds network, storage processing and disk bandwidth for
    each core
    Based on maximizing & sustaining true sequential I/O while
    minimizing disks
 Generally changes balance of systems so more can be
 spent on CPU and SW than on storage to give better
 overall performance for a given budget
 Building blocks can be adjusted for multiple MPP
 configurations – high performance, archive and
 extreme performance
The future of SQL Server Data Warehousing: Project "Madison"



     Predictable Scale out through MPP
     Customers with over 400 TB data warehouses
Commodity Hardware
 Lower cost
 Frequent performance improvements
 Easier upgrade and maintenance
 Higher customer comfort
 Better compatibility
Ultra Shared Nothing
  An extension of traditional shared nothing design
      Push shared nothing architecture into SMP node
         IO and CPU affinity within SMP nodes
           Eliminate contention per user query
           Use full resources for each user query
     Multiple physical instances of tables
       Distribute large tables
       Replicate small tables
       Distribute AND Replicate medium tables
     Re-Distribute rows “on-the-fly” when necessary
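The three placement ideas on this slide (distribute, replicate, re-distribute) can be sketched in a few lines. This is a toy model under stated assumptions: CRC32 stands in for whatever hash Madison actually uses, 8 distributions is an arbitrary count, and `distribute`, `replicate`, `redistribute` and `local_join` are hypothetical helper names.

```python
import zlib
from collections import defaultdict

N_DISTRIBUTIONS = 8  # ASSUMPTION: arbitrary count for the sketch


def distribution_for(key, n=N_DISTRIBUTIONS) -> int:
    """Stable hash of the distribution key (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(key).encode("utf-8")) % n


def distribute(rows, key_column, n=N_DISTRIBUTIONS):
    """Hash-distribute a large table: each row lands on exactly one distribution."""
    placed = defaultdict(list)
    for row in rows:
        placed[distribution_for(row[key_column], n)].append(row)
    return placed


def replicate(rows, n=N_DISTRIBUTIONS):
    """Replicate a small table: every distribution gets a full copy."""
    return {d: list(rows) for d in range(n)}


def redistribute(placed, new_key_column, n=N_DISTRIBUTIONS):
    """Re-distribute 'on the fly': reshuffle placed rows on a different key."""
    all_rows = [row for rows in placed.values() for row in rows]
    return distribute(all_rows, new_key_column, n)


# Toy data; column names borrowed from the deck's store_sales and item tables.
sales = [{"ss_item_sk": i % 5, "ss_customer_sk": i % 3, "ss_quantity": i}
         for i in range(20)]
items = [{"i_item_sk": k, "i_item_desc": f"item {k}"} for k in range(5)]

sales_by_item = distribute(sales, "ss_item_sk")
items_everywhere = replicate(items)
sales_by_customer = redistribute(sales_by_item, "ss_customer_sk")


def local_join(dist_id):
    """Join sales to items inside one distribution, with no data movement."""
    lookup = {i["i_item_sk"]: i["i_item_desc"] for i in items_everywhere[dist_id]}
    return [(s["ss_quantity"], lookup[s["ss_item_sk"]])
            for s in sales_by_item.get(dist_id, [])]
```

Because the small table is present on every distribution, the join to it never crosses nodes; only a join or aggregation on a key other than the distribution key forces the reshuffle shown in `redistribute`.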
Madison Server Components
 [Diagram: Madison server components]
 Control Nodes (Active / Passive), running SQL
 Database Servers: 8 SQL nodes, plus a Spare Database Server
 Interconnects: Dual Infiniband, Dual Fiber Channel
 Legend (node roles): Control, Compute, Storage, Landing Zone, Backup, Management, Failover/Spare
System Architecture
 [Diagram: system architecture]
 External interfaces: Client Drivers, Data Center Monitoring, ETL Load Interface, Corporate Backup Solution
 Appliance: Control Nodes (Active / Passive, running SQL), Database Servers (8 SQL nodes), Spare Database Server
 Interconnects: 20Gb/s Infiniband DMS Backbone (Dual Infiniband); 8Gb/s Fiber Channel to Local SAN (Dual Fiber Channel); IPoIB Dedicated LAN
 Networks: Corporate Network, Private Network
Software Architecture
 [Diagram: software architecture]
 Clients: MS BI (AS, RS), Nexus Query Tool
 Client drivers: JDBC, OLE-DB, ODBC, ADO.NET
 Control side: IIS, Admin Console, Madison Service (Core Engine Services, DSQL, DMS Manager, DMS, SQL OS)
 Control-side SQL Server: DW Authentication, DW Configuration, DW Queue, DW Schema
 Compute Nodes: DMS, User Data, SQL Server
 Landing Zone: DMS, Loader Client, SQL SSIS
 Backup Node: DMS
 Management Node: HPC, AD
 Legend: Existing MS software, Built by DWPU, 3rd Party
Control Node & Client Drivers
 Client connections always go through the control node
     Clustered to a passive node
 Processes SQL requests
 Prepares execution plan
 Orchestrates distributed execution
 Local SQL Server to do final query plan processing / result
 aggregation
 Will use same set of drivers used by DATAllegro
     Provided by DataDirect
        ODBC, OLE-DB, JDBC and ADO.NET client drivers
        Wire protocol (SequeLink)
     Available drivers for 32 and 64 bits
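The control node's role (fan a request out, then aggregate the results locally) follows a scatter-gather pattern. The sketch below models that pattern with a simple sum per store; `compute_node_partial` and `control_node_query` are hypothetical names, and the real plan generation and DSQL steps are not represented.

```python
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, int]


def compute_node_partial(rows: Iterable[Row]) -> Counter:
    """Step run on each compute node: aggregate only that node's local rows."""
    partial = Counter()
    for row in rows:
        partial[row["ss_store_sk"]] += row["ss_quantity"]
    return partial


def control_node_query(node_data: List[List[Row]]) -> Dict[int, int]:
    """Control node: send the step to every node, then merge the partial results."""
    final = Counter()
    for rows in node_data:  # a real appliance would run these steps in parallel
        final.update(compute_node_partial(rows))  # Counter.update adds counts
    return dict(final)


# Two toy compute nodes, each holding its own slice of store_sales.
example_nodes = [
    [{"ss_store_sk": 1, "ss_quantity": 2}, {"ss_store_sk": 2, "ss_quantity": 3}],
    [{"ss_store_sk": 1, "ss_quantity": 5}],
]
print(control_node_query(example_nodes))
```

Only the small partial results travel to the control node, which is what keeps the final aggregation on a single local SQL Server practical.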
Compute Nodes
 A SQL Server 2008 instance
 DB engine nodes autonomous on local data
 SQL as primary interface
 Each MPP node is a highly tuned SMP node with
 standard interfaces
Landing Zone
 Provides high capacity storage for data files
 from ETL processes
 Integration Services available on the landing
 zone
 Connected to internal network
 Available as sandbox for other applications and
 scripts that run on internal network.


        Source -> Landing Zone Files -> Data Loader -> Compute Nodes
Backup Node
 Builds on SQL Server native backup/restore
 facility
    Use VDI interface to plug into backup pipeline
    Database-level backup
 Coordinated backup across the nodes
 Quiesce write activity to synchronize
 Can only restore to another appliance with
 exactly the same number of distributions
Data Distribution & Replication

  [Diagram: Control Node, Landing Zone Node (holding text files), Compute Nodes, Storage Nodes, Spare Node]
  Tables Are Hash Distributed Or Replicated
Database                                Distributed & Replicated Tables
 Tables in the example database
    Date Dim (D): D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
    Customer (C): C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
    Item (I): I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
    Store Sales (SS): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
    Promotion (P): P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
    Customer Demographics (CD): CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
    Store (S): S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
 [Diagram: every node shown holds D, C, I, SS, CD, P and S]
Physical Storage Configuration – Single Node
                      LUN 1            LUN 2            LUN 3            …   LUN 8
 User Database(s)
   Distribution FGs   FG Dist A        FG Dist B        FG Dist C        …   FG Dist H
                      DistData1.mdf    DistData3.ndf    DistData5.ndf        DistData7.ndf
                      DistData2.ndf    DistData4.ndf    DistData6.ndf        DistData8.ndf
   Replicated FG      ReplData1.mdf    ReplData3.ndf    ReplData5.ndf        ReplData7.ndf
                      ReplData2.ndf    ReplData4.ndf    ReplData6.ndf        ReplData8.ndf
 Staging Database
   Stage FGs          FG Stage A       FG Stage B       FG Stage C       …   FG Stage H
                      StageData1.mdf   StageData3.ndf   StageData5.ndf       StageData1.ndf
                      StageData2.ndf   StageData4.ndf   StageData6.ndf       StageData2.ndf
   Replicated FG      ReplData1.mdf    ReplData3.ndf    ReplData5.ndf        ReplData7.ndf
                      ReplData2.ndf    ReplData4.ndf    ReplData6.ndf        ReplData.ndf

 TempDB               Local Drive 1    Local Drive 2    Local Drive 3    Local Drive 4    Local Drive 5    Local Drive 6
                      TempDB1.mdf      TempDB2.ndf      TempDB3.ndf      TempDB4.ndf      TempDB5.ndf      TempDB6.ndf

 Log LUN              UserDB Log       StageDB Log      TempDB Log
Create Table – Behind the Scenes

         Create Table store_sales
         with
         distribute_on (ss_item_sk)
         partition_on(ss_sold_date_sk)
         cluster_on (ss_sold_date_sk)
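         Behind the scenes this becomes one physical table per distribution:
          Create Table mad_store_sales_a
           Create Table mad_store_sales_ …
            Create Table mad_store_sales_h
         8 Filegroups, 1 Table per FG (Distribution_a thru Distribution_h)
         Each table has 12 Partitions (ss_sold_date_sk)
         Each partition holds N-number of 8K Pages, and each page holds Tuples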
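The fan-out from one logical table to its physical tables can be sketched as a naming rule. The `mad_` prefix, the `_a` to `_h` suffixes, the `Distribution_a` to `Distribution_h` filegroups and the 12 partitions come from the slide; `physical_tables` is a hypothetical helper and does not reflect how Madison generates its DDL.

```python
import string

PARTITIONS_PER_TABLE = 12  # from the slide: 12 partitions on ss_sold_date_sk


def physical_tables(logical_name: str, n_distributions: int = 8, prefix: str = "mad_"):
    """Map one logical table to (physical table, filegroup) pairs, one per distribution."""
    suffixes = string.ascii_lowercase[:n_distributions]
    return [(f"{prefix}{logical_name}_{s}", f"Distribution_{s}") for s in suffixes]


tables = physical_tables("store_sales")
total_partitions = len(tables) * PARTITIONS_PER_TABLE

for table, filegroup in tables:
    print(f"{table} on {filegroup}")
print(f"{total_partitions} partitions per node for this table")
```

On a single node this yields 8 physical tables and 96 partitions for the one `store_sales` definition, which is the multiplication the slide is illustrating.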
Microsoft Confidential
High Availability
   Multiple levels of redundancy:

                 • Leveraging MSCS for node availability
                 • Cluster aware services:
                     • SQL Server, Madison, DMS

           8x1
                 • Leveraging MSCS for SQL Services, DMS
                 • 1 spare node for every 8* compute nodes
Security and Encryption
 Retain DA v3 design
    Authentication and authorization done by Madison server
    Users and Roles as first class principals
    Nested role capabilities
    Connection to SQL back-ends through high privilege account
    SQL nodes reside on private network
 No support for integrated auth
 Leverages TDE to expose DB-level encryption
    Supports key rotation
The Logical Data Model
 Multiple databases per appliance
    Each user database maps to one SQL Server db per node
 Tables
    Replicated, Distributed, Replicated + Distributed
    Leverage SQL Server compression
    Supports Partitioning
    Supports secondary indexes
 Views
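
The difference between replicated and distributed tables can be illustrated with a small Python sketch. It assumes a hypothetical `place_rows` helper, made-up node names, and CRC32 as a stand-in hash; none of these come from the deck, and the combined "Replicated + Distributed" case is not modeled.

```python
import zlib


def place_rows(rows, nodes, mode, key=None):
    """Return {node: [rows]} showing where each row would be stored.

    mode="replicated":  every node holds a full copy of the table.
    mode="distributed": each row is stored on exactly one node, chosen
                        by hashing the distribution key (CRC32 is an
                        assumed stand-in, not Madison's actual hash).
    """
    placement = {node: [] for node in nodes}
    if mode == "replicated":
        for node in nodes:
            placement[node] = list(rows)
    elif mode == "distributed":
        if key is None:
            raise ValueError("a distributed table needs a distribution key")
        for row in rows:
            digest = zlib.crc32(str(row[key]).encode("utf-8"))
            placement[nodes[digest % len(nodes)]].append(row)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return placement


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # hypothetical compute nodes
    sales = [{"ss_item_sk": i, "qty": 1} for i in range(10)]
    for node, stored in place_rows(sales, nodes, "distributed", "ss_item_sk").items():
        print(node, len(stored))
```

Small dimension tables are the usual candidates for replication because joins against them then need no data movement between nodes, while large fact tables are distributed so that each node scans only its own share.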
Data Types

Most scalar data types supported by SQL Server 2008 are supported by Madison
Main exceptions
    Character and binary strings limited to 8K (i.e. no BLOB support)
    XML
    Sql-Variant
    System and CLR UDTs
Latin1_General with binary comparison only

SQL Server Data Types              DAv3    Madison
bigint
binary
                                   ✓       ✓
bit                                        ✓
char / nchar                       ✓       ✓
date, time                                 ✓
datetime (was date in DA)          ✓       ✓
datetime2                                  ✓
datetimeoffset                             ✓
decimal                            ✓       ✓
float                              ✓       ✓
geometry / geography
hierarchyid
int (was integer in DA)            ✓       ✓
money                                      ✓
real                                       ✓
smalldatetime                              ✓
smallint                           ✓       ✓
smallmoney                                 ✓
sql_variant
text / ntext / image
timestamp
tinyint                            ✓       ✓
varchar / nvarchar / varbinary     ✓       ✓
v*(max)
uniqueidentifier
xml
Supported SQL Syntax
 Aligned with ANSI SQL 92
    Basic INSERT, UPDATE, DELETE, SELECT

 CREATE TABLE AS SELECT

 Limited analytical function support
 Teradata extensions
    Quantile, Sample,…
Configuration and Monitoring
  Challenge: Is it an appliance or a collection of nodes?
 Madison services instrumented
     Logs and Performance Counters
 Capture and forward SNMP alerts from devices within the appliance
 Small subset of DMVs to union underlying node DMVs
 Leverage HPC for monitoring
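
The idea of an appliance-level DMV that unions the underlying node DMVs can be sketched as follows. This is a minimal Python illustration with hypothetical node names and row shapes; it is not Madison's implementation and does not query a real SQL Server DMV.

```python
def union_node_dmv(per_node_rows):
    """Combine per-node DMV result sets into one appliance-wide view.

    per_node_rows maps a node name to the list of row dicts that node
    returned. Each output row is tagged with the node it came from so
    the appliance-level view stays traceable to a single node.
    """
    combined = []
    for node in sorted(per_node_rows):
        for row in per_node_rows[node]:
            combined.append({"node": node, **row})
    return combined


if __name__ == "__main__":
    # Hypothetical per-node results for a session-style DMV.
    results = {
        "compute02": [{"session_id": 51, "status": "running"}],
        "compute01": [{"session_id": 52, "status": "sleeping"},
                      {"session_id": 53, "status": "running"}],
    }
    for row in union_node_dmv(results):
        print(row)
```

Tagging each row with its node is one way to address the challenge raised on this slide: the administrator sees a single result set for the appliance but can still tell which node a row belongs to.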
Manageability
 Web-based main administrative user interface
   Based on DATAllegro manageability UI
   Monitoring system health and activity
 Leveraging HPC pack 2008
   Systems management
   Monitoring
   Cluster health
Query Tools
              GUI Tool:
                 Nexus (CoffingDW)
                 Table & view object explorer
                 Interactive query execution

              Command line tool:
                 Replacement for DA-SQL
                 Flavor of SqlCmd
MS BI Integration
Integration Services
     Madison enabled as a source
        Data movement, lookup operations, etc.
     Will add a new SSIS destination
        Ensure integrated high performance loads

Reporting Services
     Fully supported, including parameterized queries
     Will customize experience for Report Builder and Report Designer

Analysis Services
     Will get connectivity through the OLE DB provider
     Will enable both MOLAP and ROLAP storage
High Level Release Definitions

   “Madison” (aka v1), H1 2010
       Focus on time to market
       Compatibility with DATAllegro v3
       MS BI integration

   V2+
       Closer functional alignment with SQL Server
       Better integration with SQL and MS ecosystem, tools and technologies

   Will start running MTPs in the summer
Recap
  Data Warehousing Reference Architectures available today!
     SQL Server Fast Track
  SQL Server “Madison”
     Built for advanced, large scale data warehouses
     Shared-nothing MPP architecture
  Early evaluation programs starting soon


All feedback welcome:
   fransidi@microsoft.com

                             Thank you!

User Group Bi

  • 1. SQL Server Parallel DWH Architecture “Aka : Madison” Franck Sidi Lead SQL Server & Bi – Microsoft Israel
  • 2. Trusted, Scalable Platform Our scalability strategy “Madison” in 2010 Q1
  • 3. Agenda Concepts and Principles Reference Architectures “FastTrack” Madison functional overview Early adoption
  • 4. Symmetric Multiprocessing SMP Single DB instance “Shared Everything” Architecture Server/CPU’s share memory disks Can lead to resource contention as you scale
  • 5. Massively Parallel Processing MPP Server/CPU’s have their own dedicated resources “Shared Nothing” Architecture “Secret Sauce” is parallelizing operations Lightning-fast Queries, Data Loads and Updates Linear Scalability Problem needs to be partitionable
  • 6. SMP vs MPP SMP MPP HW advancements increasing HW advancements increasing ability to scale-up ability to scale-up & scale-out Scaling is limited Scaling to 1 PB+ High end SMP very expensive Scale out is relatively low cost Extremely high concurrency for Relatively high concurrency for some workloads complex workloads Less than 1-2 TB of data SMP > 2 TB up to 1 PB will almost always be better Limited SQL Server functionality Full SQL Server functionality HA is built in HA must be architected in
  • 7. Agenda Concepts and Principles Reference Architectures “FastTrack” Madison functional overview Early adoption
  • 8. How some solve the problem today Big SAN Biggest 64-core Server Connected together! What’s wrong with this picture???
  • 9. System out of balance This server can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t Lots of disks for Random IOPS BUT Limited controllers  Limited IO bandwidth System is typically IO bound and queries are slow Despite significant investment in both Server and Storage
  • 10. Where Does an I/O Go? Understand potential throughput of the hardware Each component in the path has associated speed/bandwidth Know where the potential bottlenecks exist Switch Controllers/Processors Front End Ports Cache Host Switch PCI Bus  HBA  Fiber Channel Ports  Array Processors  Disks
  • 11. Potential Performance Bottlenecks DISK DISK SQL SERVER CPU CORES A FC SWITCH FC SERVER WINDOWS A CACHE HBA B LUN CACHE A STORAGE A B CONTROLLER B DISK DISK FC A HBA B B LUN CPU Feed Rate SQL Server HBA Port Rate Switch Port Rate SP Port Rate LUN Read Rate Disk Feed Rate Read Ahead Rate
  • 12. The alternative: A balanced system Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload Avoid sharing storage devices among servers Avoid overinvesting in disk drives Focus on scan performance, not IOPS Layout and manage data to maximize range scan performance and minimize fragmentation
  • 13. Sequential I/O Sequential I/O Random I/O Ideal for data warehousing Ideal for OLTP Large reads & writes Small reads and writes Scans on large data stores are OLTP usually random-read centric usually read with sequential read Seek queries are a goal in OLTP query patterns and not random read optimization patterns Seeks usually cause random reads Not as predictable & scalable for data warehousing Scalable, predictable performance Requires large number of drives Requires 1/3 or fewer drives for same performance All databases contain both scans and seeks among with other types of reads and writes, DW workload indicate that the vast majority of reads are sequential – not all
  • 14. What is Fast Track Data Warehouse? A method for designing a cost-effective, balanced system for Data Warehouse workloads Reference hardware configurations developed in conjunction with hardware partners using this method Best practices for data layout, loading and management Relational Database Only – Not SSAS, IS, RS
  • 15. Fast Track Scope Presentation Layer Systems Reference Architecture Scope (dashed) Presentation Data Presentation Data Web Analytic Tools Reporting Services Dedicated SAN, Storage Array Data Staging, Bulk Loading Data Warehouse Analysis Services Cubes SharePoint Services Microsoft Office SharePoint PerformancePoint Excel Services
  • 16. Benefits of Fast Track appliance model Lower TCO Minimizes risk of overspending on un-balanced hardware configurations Commodity Hardware Choice HW platform Implementation vendor Reduced Risk Validated by Microsoft Encapsulates best practices Known performance & scalability
  • 17. Fast Track DW Reference Configurations CPU Initial Max Server CPU SAN Data Drive Count Cores Capacity* Capacity** HP Proliant (2) AMD Opteron Istanbul 12 (3) HP MSA2312fc (24) 300GB 15k 6TB 12TB DL 385 G6 six core 2.6 GHz RPM SAS HP Proliant (4) AMD Opteron Instanbul 24 (6) HP MSA2312fc (48) 300GB 15k SAS 12TB 24TB DL 585 G6 six core 2.6 GHz HP Proliant (8) AMD Opteron Istanbul 48 (12) HP MSA2312 (96) 300GB 15k SAS 24TB 48TB DL 785 G6 six core 2.8 GHz Dell PowerEdge R710 (2) Intel Xeon Nehalem 8 (2) EMC AX4 (16) 300GB 15k FC 4TB 8TB quad core 2.66 GHz Dell Power Edge R900 (4) Intel Xeon Dunnington 24 (6) EMC AX4 (48) 300GB 15k FC 12TB 24TB six core 2.67GHz IBM X3650 M2 (2) Intel Xeon Nehalem 8 (2) IBM DS3400 (16) 200GB 15K FC 4TB 8TB quad core 2.67 GHx IBM X3850 M2 (4) Intel Xeon Dunnington 24 (6) IBM DS3400 (24) 300GB 15k FC 12TB 24TB six core 2.67 GHz IBM X3950 M2 (8) Intel Xeon Nehalem four 32 (8) IBM DS3400 (32) 300GB 15k SAS 16TB 32TB core 2.13 GHz Bull Novascale R460 (2) Intel Xeon Nehalem 8 (2) EMC AX4 (16) 300GB 15k FC 4TB 8TB E2 quad core 2.66 GHz Bull Novascale R480 (4) Intel Xeon Dunnington 24 (6) EMC AX4 (48) 300GB 15k FC 12TB 24TB E1 six core 2.67GHz * Core-balanced compressed capacity based on 300GB 15k SAS not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for Temp DB. ** Represents storage array fully populated with 300GB15k SAS and use of 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure. 30% of this storage should be reserved for DBA operations
  • 18. Fast Track DW Core-Balanced Architecture Using 300GB 15k SAS drives Each HBA port rated at 4Gb/s each LUN rated at 125MB/s or 400MB/s and 1600MB/s for all Each SP rated at 500MB/s each SP controls 4 LUN’s at 500MB/s 4 SP ports. or 1000MB/s for both SP’s or 1000MB/s per MSA DAE RAID GP01 RAID GP02 RAID GP05 S P 01 02 03 04 09 10 LUN1 LUN3 LUN0 A LUN2 (Logs) SMP LUN4 HS ONLY 8 SWITCH Server per RAID GP03 RAID GP04 data 4-Cores disks !!! S P 05 06 07 08 LUN5 LUN7 B LUN6 LUN8 Per MSA2312 Drive Details Each SP port rated at 4Gb/s • Each MSA can hold 12 drives, this configuration requires 11 or 400MB/s and 1600MB/s for all • MSA is 2U in total (capacitor eliminates need for battery) 4 SP ports. • Each MSA SP port controls 4 LUNs • Each pair of LUNs consists of (2) 300GB 15k SAS drives RAID1
  • 19. Fast Track Data Warehouse Components Software: • SQL Server 2008 Enterprise • Windows Server 2008 Configuration guidelines: • Physical table structures • Indexes • Compression • SQL Server settings • Windows Server settings • Loading Hardware: • Tight specifications for servers, storage and networking • ‘Per core’ building block
  • 20. RA: Tightly Spec'd RAs include not only hardware but best practices in: Window OS configuration SQL Server startup options Database physical layout Table types Indexing Statistics Managing fragmentation Loading procedures
  • 21. Fast Track Case Study - Results
      Test | Teradata | SQL Server Fast Track DW | Comparison
      Loading, Subject Area 1 | 5:10:21 total time | 51:31 total time | SQL Server 6x faster
      Loading, Subject Area 2 | 4:36:08 total time | 1:50:01 total time | SQL Server 2.5x faster
      Query times, Subject Area 1 | 3:03 avg query time (using 9 benchmark queries) | 0:15 avg query time (using 9 benchmark queries) | SQL Server 12x faster
      Query times, Subject Area 2 | 56:44 avg query time (using 4 benchmark queries) | 8:09 avg query time (using 4 benchmark queries) | SQL Server 7x faster
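The "Nx faster" figures follow directly from the quoted durations. A minimal sketch that recomputes them, assuming the times are h:mm:ss or m:ss and that the second SQL Server loading time reads 1:50:01 (1 h 50 min 1 s); the function and dictionary names are illustrative.

```python
def to_seconds(text):
    """Convert 'h:mm:ss' or 'm:ss' into a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (Teradata, SQL Server Fast Track DW) durations as quoted on the slide.
RESULTS = {
    "Loading, Subject Area 1": ("5:10:21", "51:31"),
    "Loading, Subject Area 2": ("4:36:08", "1:50:01"),
    "Query times, Subject Area 1": ("3:03", "0:15"),
    "Query times, Subject Area 2": ("56:44", "8:09"),
}


def speedups(results):
    """Return the ratio Teradata time / SQL Server time for each test."""
    return {
        name: to_seconds(teradata) / to_seconds(sql_server)
        for name, (teradata, sql_server) in results.items()
    }


for name, ratio in speedups(RESULTS).items():
    print(f"{name}: {ratio:.1f}x")
```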
  • 22. Agenda Concepts and Principles Reference Architectures “FastTrack” Madison functional overview Early adoption
  • 23. About DATAllegro… Technology Partners Proprietary Appliance Management and MPP Database Open Source Database and OS Industry Standard Servers Industry Standard Networking Industry Standard Storage
  • 24. Integration Plans Provide scale out through MPP on SQL Server and Windows Offer ‘Appliance like’ user experience to Data Warehouse customers Lower TCO to high end Data Warehousing Offer integrated BI platform to small and very large Enterprises OPEN SOURCE DATABASE & OS Industry Standard Servers Industry Standard Networking Industry Standard Storage
  • 25. MPP Additional Considerations Principles & approach of SMP carry forward Deeper level of complexity – High Availability Parallelization Inter node data movement
  • 26. Modular building blocks Balanced CPU and storage Both SMP and MPP are based on building blocks that scale by the CPU core Adds network, storage processing and disk bandwidth for each core Based on maximizing & sustaining true sequential I/O while minimizing disks Generally changes balance of systems so more can be spent on CPU and SW than on storage to give better overall performance for a given budget Building blocks can be adjusted for multiple MPP configurations – high performance, archive and extreme performance
  • 27. The future of SQL Server Data Warehousing – Project "Madison" Predictable Scale out through MPP Customers with over 400 TB data warehouses
  • 28. Commodity Hardware Lower cost Frequent performance improvements Easier upgrade and maintenance Higher customer comfort Better compatibility
  • 29. Ultra Shared Nothing An extension of traditional shared nothing design Push shared nothing architecture into SMP node IO and CPU affinity within SMP nodes Eliminate contention per user query Use full resources for each user query Multiple physical instances of tables Distribute large tables Replicate small tables Distribute AND Replicate medium tables Re-Distribute rows “on-the-fly” when necessary
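The distribute / replicate idea can be illustrated with a toy model. This is a sketch under stated assumptions, not Madison's implementation: the node count, the CRC32 hash, and all function names are hypothetical. It shows why a hash-distributed large table can join to a replicated small table without moving rows between nodes.

```python
import zlib

NODES = 4  # hypothetical number of compute nodes


def node_for(key, nodes=NODES):
    """Map a distribution key to a node with a deterministic hash."""
    return zlib.crc32(str(key).encode()) % nodes


def distribute(rows, key, nodes=NODES):
    """Large table: each row is stored on exactly one node."""
    shards = [[] for _ in range(nodes)]
    for row in rows:
        shards[node_for(row[key], nodes)].append(row)
    return shards


def replicate(rows, nodes=NODES):
    """Small table: every node keeps a full copy."""
    return [list(rows) for _ in range(nodes)]


def local_join(fact_shards, dim_copies, fact_key, dim_key):
    """Each node joins its own shard to its own copy; no data movement."""
    joined = []
    for shard, dim in zip(fact_shards, dim_copies):
        lookup = {d[dim_key]: d for d in dim}
        for fact in shard:
            if fact[fact_key] in lookup:
                joined.append({**fact, **lookup[fact[fact_key]]})
    return joined


sales = [{"ss_item_sk": i % 5, "ss_quantity": i} for i in range(20)]
items = [{"i_item_sk": k, "i_item_desc": f"item {k}"} for k in range(5)]
rows = local_join(distribute(sales, "ss_item_sk"), replicate(items),
                  "ss_item_sk", "i_item_sk")
print(len(rows))
```

Joining two distributed tables on a column that is not the distribution key is the case where rows would have to be re-distributed "on-the-fly".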
  • 30. Madison Server Components [Diagram labels: Database Servers (SQL); Control Nodes (Active / Passive); Compute; Storage; Landing Zone; Backup; Management; Failover/Spare (Spare Database Server); interconnects: Dual Fiber Channel, Dual Infiniband]
  • 31. System Architecture [Diagram labels: Control Nodes (Active / Passive); Database Servers (SQL); Spare Database Server; 20 Gb/s Infiniband DMS Backbone (Dual Infiniband); 8 Gb/s Fiber Channel Local SAN (Dual Fiber Channel); Client Drivers; Data Center Monitoring; ETL Load Interface; Corporate Backup Solution; IPoIB; Dedicated LAN; Corporate Network; Private Network]
  • 32. Software Architecture [Diagram labels: clients: Nexus Query Tool, MS BI (AS, RS), SSIS, IIS Admin Console; client drivers: JDBC, OLE-DB, ODBC, Ado.Net; control side: Madison Service, Core Engine, DSQL, DMS Manager, SQL Server with DW Schema, DW Authentication, DW Configuration, DW Queue; Compute Nodes: DMS, SQL Server, User Data; Landing Zone: DMS Loader, DMS Client; Backup Node; Management Node: HPC, AD; OS Services on each node; legend: Existing MS software, Built by DWPU, 3rd Party]
  • 33. Control Node & Client Drivers Client connections always go through the control node; Clustered to a passive node; Processes SQL requests; Prepares execution plan; Orchestrates distributed execution; Local SQL Server to do final query plan processing / result aggregation. Will use same set of drivers used by DATAllegro, provided by DataDirect: ODBC, OLE-DB, JDBC and ADO.NET client drivers; Wire protocol (SequeLink); Drivers available for 32 and 64 bits
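The control node's role of orchestrating distributed execution and doing the final result aggregation can be pictured as a scatter/gather step. The sketch below is a toy model with hypothetical function names, not Madison's engine: each compute node returns partial sums and counts for its local rows, and the control node merges them into a global average.

```python
def compute_node_partial(rows):
    """Per-node step: partial SUM and COUNT of quantity per store."""
    partial = {}
    for store, quantity in rows:
        total, count = partial.get(store, (0, 0))
        partial[store] = (total + quantity, count + 1)
    return partial


def control_node_merge(partials):
    """Control-node step: combine partials, then finish AVG = SUM / COUNT."""
    merged = {}
    for partial in partials:
        for store, (total, count) in partial.items():
            t, c = merged.get(store, (0, 0))
            merged[store] = (t + total, c + count)
    return {store: t / c for store, (t, c) in merged.items()}


# Rows as (store, quantity) held locally by three hypothetical nodes.
node_data = [
    [("A", 10), ("B", 4)],
    [("A", 20)],
    [("B", 6), ("B", 2)],
]
averages = control_node_merge(compute_node_partial(rows) for rows in node_data)
print(averages)
```

Averages are shipped as sum and count pairs because averaging the per-node averages would weight small and large nodes equally and give the wrong answer.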
  • 34. Compute Nodes A SQL Server 2008 instance DB engine nodes autonomous on local data SQL as primary interface Each MPP node is a highly tuned SMP node with standard interfaces
  • 35. Landing Zone Provides high capacity storage for data files from ETL processes; Integration services available on the landing zone; Connected to internal network; Available as sandbox for other applications and scripts that run on internal network. [Diagram: Source → Landing Zone → Data Files → Loader → Compute Nodes]
  • 36. Backup Node Builds on SQL Server native backup/restore facility Use VDI interface to plug into backup pipeline Database-level backup Coordinated backup across the nodes Quiesce write activity to synchronize Can only restore to another appliance with exactly the same number of distributions
  • 37. Data Distribution & Replication Tables Are Hash Distributed Or Replicated [Diagram labels: Control Node; Compute Nodes; Storage Nodes; Landing Zone Node; Spare Node; four Text File inputs]
  • 38. Database Distributed & Replicated Tables
      Date Dim (D): D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
      Customer (C): C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
      Item (I): I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
      Store Sales (SS): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
      Promotion (P): P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
      Customer Demographics (CD): CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
      Store (S): S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
      [Diagram: each node is drawn holding a set of table copies labeled D, C, I, SS, CD, P, S]
  • 39. Physical Storage Configuration – Single Node
      Columns: LUN 1 | LUN 2 | LUN 3 | LUN 8
      User Database(s), distributed filegroups: FG Dist A | FG Dist B | FG Dist C | FG Dist H
        Files: DistData1.mdf, DistData2.ndf | DistData3.ndf, DistData4.ndf | DistData5.ndf, DistData6.ndf | DistData7.ndf, DistData8.ndf
      User Database(s), Replicated FG:
        Files: ReplData1.mdf, ReplData2.ndf | ReplData3.ndf, ReplData4.ndf | ReplData5.ndf, ReplData6.ndf | ReplData7.ndf, ReplData8.ndf
      Staging Database, stage filegroups: FG Stage A | FG Stage B | FG Stage C | FG Stage H
        Files: StageData1.mdf, StageData2.ndf | StageData3.ndf, StageData4.ndf | StageData5.ndf, StageData6.ndf | StageData1.ndf, StageData2.ndf
      Staging Database, Replicated FG:
        Files: ReplData1.mdf, ReplData2.ndf | ReplData3.ndf, ReplData4.ndf | ReplData5.ndf, ReplData6.ndf | ReplData7.ndf, ReplData.ndf
      TempDB on Local Drive 1 to Local Drive 6: TempDB1.mdf | TempDB2.ndf | TempDB3.ndf | TempDB4.ndf | TempDB5.ndf | TempDB6.ndf
      Log LUN: UserDB Log, StageDB Log, TempDB Log
  • 40. Create Table – Behind the Scenes
      Create Table store_sales with distribute_on (ss_item_sk) partition_on (ss_sold_date_sk) cluster_on (ss_sold_date_sk)
      8 Filegroups: Distribution_a thru Distribution_h
      1 Table per FG: Create Table mad_store_sales_a, Create Table mad_store_sales_…, Create Table mad_store_sales_h
      12 Partitions (ss_sold_date_sk)
      N-number of 8K Pages
      Tuple
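The fan-out from one logical table to one physical table per filegroup can be sketched as below. The `mad_` prefix, the `_a` to `_h` suffixes and the `Distribution_a` to `Distribution_h` filegroup names come from the slide; the function itself and its parameters are hypothetical and only illustrate the naming pattern.

```python
import string


def physical_tables(logical_name, distributions=8, prefix="mad_"):
    """Map each filegroup to the per-distribution table it would hold."""
    suffixes = string.ascii_lowercase[:distributions]  # 'a' .. 'h' for 8
    return {
        f"Distribution_{suffix}": f"{prefix}{logical_name}_{suffix}"
        for suffix in suffixes
    }


for filegroup, table in physical_tables("store_sales").items():
    print(filegroup, "->", table)
```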
  • 41. High Availability Multiple levels of redundancy: • Leveraging MSCS for node availability • Cluster aware services: SQL Server, Madison, DMS • Leveraging MSCS for SQL Services, DMS • 1 spare node for every 8* compute nodes (diagram label: 8x1)
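The spare-node ratio gives a quick sizing rule. A minimal sketch, assuming that a partially filled group of 8 compute nodes still gets its own spare; the slide's footnote on "8*" is not in the transcript, so the rounding rule and the function name are assumptions.

```python
import math

COMPUTE_NODES_PER_SPARE = 8  # "1 spare node for every 8 compute nodes"


def spare_nodes(compute_nodes):
    """Spare nodes needed, rounding partial groups up (assumed rule)."""
    if compute_nodes < 0:
        raise ValueError("compute_nodes must be non-negative")
    return math.ceil(compute_nodes / COMPUTE_NODES_PER_SPARE)


for n in (8, 16, 20):
    print(n, "compute nodes ->", spare_nodes(n), "spare(s)")
```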
  • 42. Security and Encryption Retain DA v3 design Authentication and authorization done by Madison server Users and Roles as first class principals Nested role capabilities Connection to SQL back-ends through high privilege account SQL nodes reside on private network No support for integrated auth Leverages TDE to expose DB-level encryption Supports key rotation
  • 43. The Logical Data Model Multiple databases per appliance Each user database maps to one SQL Server db per node Tables Replicated, Distributed, Replicated + Distributed Leverage SQL Server compression Supports Partitioning Supports secondary indexes Views
  • 44. SQL Server Data Types
      Most scalar data types supported by SQL Server 2008 are supported by Madison.
      Main exceptions: Character and binary strings limited to 8K (i.e. no BLOB support); XML; geometry / geography; hierarchyid; Sql-Variant; System and CLR UDTs
      Latin1_General with binary comparison only
      Data Types (columns: DAv3, Madison; ✓ marks as they appear in the transcript): bigint, binary ✓ ✓; bit ✓; char / nchar ✓ ✓; date, time ✓; datetime (was date in DA) ✓ ✓; datetime2 ✓; datetimeoffset ✓; decimal ✓ ✓; float ✓ ✓; Int (was integer in DA) ✓ ✓; money ✓; real ✓; smalldatetime ✓; smallint ✓ ✓; smallmoney ✓; sql_variant; text / ntext / image; timestamp; tinyint ✓ ✓; varchar / nvarchar / varbinary ✓ ✓; v*(max); uniqueidentifier; xml
  • 45. Supported SQL Syntax Aligned with ANSI SQL 92 Basic INSERT, UPDATE, DELETE, SELECT CREATE TABLE AS SELECT Limited analytical function support Teradata extensions Quantile, Sample,…
  • 46. Configuration and Monitoring Challenge: Is it an appliance or a collection of nodes? Madison services instrumented Logs and Performance Counters Capture and forward SNMP alerts from devices within the appliance Small subset of DMVs to union underlying node DMVs Leverage HPC for monitoring
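The "union underlying node DMVs" idea can be sketched as tagging each node's rows with the node name and concatenating them into one appliance-wide view. This is an illustration only: the node names, column names and function are hypothetical and do not reflect Madison's actual DMV surface.

```python
def union_node_dmvs(per_node_rows):
    """Combine per-node DMV rows into one list, tagging each with its node."""
    combined = []
    for node in sorted(per_node_rows):
        for row in per_node_rows[node]:
            combined.append({"node": node, **row})
    return combined


# Hypothetical per-node rows, as if each node had been queried separately.
per_node = {
    "compute02": [{"session_id": 51, "cpu_time_ms": 120}],
    "compute01": [
        {"session_id": 51, "cpu_time_ms": 300},
        {"session_id": 52, "cpu_time_ms": 40},
    ],
}
appliance_view = union_node_dmvs(per_node)
total_cpu = sum(row["cpu_time_ms"] for row in appliance_view)
print(len(appliance_view), total_cpu)
```

Keeping the node tag on each row lets the same view answer both questions raised on the slide: appliance-wide totals and per-node drill-down.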
  • 47. Manageability Web-based main administrative user interface Based on DATAllegro manageability UI Monitoring system health and activity Leveraging HPC pack 2008 Systems management Monitoring Cluster health
  • 48. Query Tools GUI Tool: Nexus (CoffingDW); Table & view object explorer; Interactive query execution. Command line tool: Replacement for DA-SQL; Flavor of SqlCmd
  • 49. MS BI Integration Integration Services Madison enabled as a source Data movement, lookup operations, etc. Will add a new SSIS destination Ensure integrated high performance loads Reporting Services Fully supported; including parameterized queries Will customize experience for report builder and report designer Analysis Services Will get connectivity through OLE-DB provider Will enable both MOLAP and ROLAP storage
  • 50. High Level Release Definitions Will start running MTPs in the summer. V2+: Closer functional alignment with SQL Server; Better integration with SQL and MS ecosystem, tools and technologies. “Madison” (aka v1): Focus on time to market; Compatibility with DATAllegro v3; MS BI integration; H1 2010
  • 51. Recap Data Warehousing Reference Architectures available today! SQL Server Fast Track SQL Server “Madison” Built for advanced, large scale data warehouses Shared-nothing MPP architecture Early evaluation programs starting soon All feedback welcome: fransidi@microsoft.com Thank you!