User Group Bi

SQL Server Parallel DWH Architecture (aka “Madison”)
Franck Sidi
Lead SQL Server & BI – Microsoft Israel

Trusted, Scalable Platform
Our scalability strategy

“Madison” in 2010 Q1

Agenda
Concepts and Principles
Reference Architectures “FastTrack”
Madison functional overview
Early adoption

Symmetric Multiprocessing

SMP

Single DB instance
“Shared Everything” Architecture
Servers/CPUs share memory and disks

Can lead to resource contention as you scale

Massively Parallel Processing
MPP

Servers/CPUs have their own dedicated resources
“Shared Nothing” Architecture
“Secret Sauce” is parallelizing operations
Lightning-fast Queries, Data Loads and Updates
Linear Scalability

Problem needs to be partitionable
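A minimal sketch of what "partitionable" means in practice, assuming a simple SUM ... GROUP BY workload: each node aggregates only its own slice of the rows, and a final step merges the small partial results. The function names and sample rows are illustrative, not part of any SQL Server or Madison API.

```python
from collections import defaultdict


def split(rows, n_nodes):
    """Give every node its own private slice of the rows (shared nothing)."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_sum(rows, group_key, value_key):
    """Work done independently on one node, using only that node's rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


def combine(partials):
    """Final step: merge the per-node partial sums."""
    totals = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


# Hypothetical sample data: (store, quantity) pairs.
sales = [{"store": s, "qty": q} for s, q in [(1, 2), (2, 1), (1, 5), (3, 4), (2, 3)]]

slices = split(sales, n_nodes=4)
partials = [local_sum(node_rows, "store", "qty") for node_rows in slices]
result = combine(partials)
print(result)
```

A sum splits cleanly because partial sums can be added together; an operation whose partial results cannot be merged this way would not benefit from the same treatment.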

SMP vs MPP
SMP:
HW advancements increasing ability to scale-up
Scaling is limited
High end SMP very expensive
Extremely high concurrency for some workloads
Less than 1-2 TB of data: SMP will almost always be better
Full SQL Server functionality
HA must be architected in
MPP:
HW advancements increasing ability to scale-up & scale-out
Scaling to 1 PB+
Scale out is relatively low cost
Relatively high concurrency for complex workloads
> 2 TB up to 1 PB
Limited SQL Server functionality
HA is built in

How some solve the problem today
Big SAN
Biggest 64-core Server
Connected together!

What’s wrong with
this picture???

System out of balance
This server can consume 16 GB/Sec of IO, but
the SAN can only deliver 2 GB/Sec
Even when the SAN is dedicated to the SQL Data
Warehouse, which it often isn’t
Lots of disks for Random IOPS BUT
Limited controllers → Limited IO bandwidth
System is typically IO bound and queries are
slow
Despite significant investment in both Server and
Storage

Where Does an I/O Go?
Understand potential throughput of the hardware
Each component in the path has associated
speed/bandwidth
Know where the potential bottlenecks exist
[Diagram labels: Host, Switch, Front End Ports, Controllers/Processors, Cache]
PCI Bus → HBA → Fiber Channel Ports → Array Processors → Disks

Potential Performance Bottlenecks

[Diagram: server (CPU cores, SQL Server, Windows, FC HBAs with ports A and B), FC switch, storage controllers A and B with cache, LUNs and disks]
Rates along the path: CPU Feed Rate, SQL Server Read Ahead Rate, HBA Port Rate, Switch Port Rate, SP Port Rate, LUN Read Rate, Disk Feed Rate
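One way to reason about these rates: the components sit in series, so the sustained throughput of the whole path can be no higher than its slowest stage. The sketch below finds that stage. The stage names follow the slide, but every MB/s figure is a made-up placeholder, not a measurement.

```python
def bottleneck(path):
    """Return the (stage, rate) pair with the lowest rate on a serial path."""
    return min(path, key=lambda stage: stage[1])


# Hypothetical ratings in MB/s, listed from the CPU down to the disks.
io_path = [
    ("CPU feed rate", 3200),
    ("SQL Server read-ahead rate", 3000),
    ("HBA port rate", 1600),
    ("Switch port rate", 1600),
    ("SP port rate", 1600),
    ("LUN read rate", 1000),
    ("Disk feed rate", 1200),
]

name, rate = bottleneck(io_path)
print(f"Bottleneck: {name} at {rate} MB/s")
```

Adding capacity to any stage other than the reported one would leave the end-to-end rate unchanged, which is the argument for balancing the configuration rather than oversizing one component.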

The alternative: A balanced system
Design a server + storage configuration that can
deliver all the IO bandwidth that CPUs can
consume when executing a SQL Relational DW
workload
Avoid sharing storage devices among servers
Avoid overinvesting in disk drives
Focus on scan performance, not IOPS
Layout and manage data to maximize range
scan performance and minimize fragmentation

Sequential I/O
Sequential I/O:
Ideal for data warehousing
Large reads & writes
Scans on large data stores are usually read with sequential read patterns and not random read patterns
Scalable, predictable performance
Requires 1/3 or fewer drives for same performance
Random I/O:
Ideal for OLTP
Small reads and writes
OLTP usually random-read centric
Seek queries are a goal in OLTP query optimization
Seeks usually cause random reads
Not as predictable & scalable for data warehousing
Requires large number of drives

All databases contain both scans and seeks, along with other types of reads and writes. DW workloads indicate that the vast majority of reads, though not all, are sequential.

What is Fast Track Data Warehouse?
A method for designing a cost-effective,
balanced system for Data Warehouse
workloads
Reference hardware configurations developed
in conjunction with hardware partners using
this method
Best practices for data layout, loading and
management

Relational Database Only – Not SSAS, IS, RS

Fast Track Scope
[Diagram: the Reference Architecture Scope (dashed box) within the wider BI stack]
Data Warehouse: dedicated SAN / storage array, data staging, bulk loading
Analysis Services cubes
Reporting Services
Web analytic tools
Presentation data
Presentation layer systems
SharePoint Services
Microsoft Office SharePoint
PerformancePoint
Excel Services

Benefits of Fast Track appliance model
Lower TCO
Minimizes risk of overspending on unbalanced hardware configurations
Commodity Hardware
Choice
HW platform
Implementation vendor
Reduced Risk
Validated by Microsoft
Encapsulates best practices
Known performance & scalability

Fast Track DW Reference Configurations
Server | CPU | CPU Cores | SAN | Data Drive Count | Initial Capacity* | Max Capacity**
HP Proliant DL 385 G6 | (2) AMD Opteron Istanbul six core 2.6 GHz | 12 | (3) HP MSA2312fc | (24) 300GB 15k RPM SAS | 6TB | 12TB
HP Proliant DL 585 G6 | (4) AMD Opteron Istanbul six core 2.6 GHz | 24 | (6) HP MSA2312fc | (48) 300GB 15k SAS | 12TB | 24TB
HP Proliant DL 785 G6 | (8) AMD Opteron Istanbul six core 2.8 GHz | 48 | (12) HP MSA2312 | (96) 300GB 15k SAS | 24TB | 48TB
Dell PowerEdge R710 | (2) Intel Xeon Nehalem quad core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Dell PowerEdge R900 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB
IBM X3650 M2 | (2) Intel Xeon Nehalem quad core 2.67 GHz | 8 | (2) IBM DS3400 | (16) 200GB 15k FC | 4TB | 8TB
IBM X3850 M2 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) IBM DS3400 | (24) 300GB 15k FC | 12TB | 24TB
IBM X3950 M2 | (8) Intel Xeon Nehalem four core 2.13 GHz | 32 | (8) IBM DS3400 | (32) 300GB 15k SAS | 16TB | 32TB
Bull Novascale R460 E2 | (2) Intel Xeon Nehalem quad core 2.66 GHz | 8 | (2) EMC AX4 | (16) 300GB 15k FC | 4TB | 8TB
Bull Novascale R480 E1 | (4) Intel Xeon Dunnington six core 2.67 GHz | 24 | (6) EMC AX4 | (48) 300GB 15k FC | 12TB | 24TB

* Core-balanced compressed capacity based on 300GB 15k SAS not including hot spares and log drives. Assumes 25% (of raw disk space) allocated for Temp DB.
** Represents storage array fully populated with 300GB 15k SAS drives and use of a 2.5:1 compression ratio. This includes the addition of one storage expansion tray per enclosure.
30% of this storage should be reserved for DBA operations

Fast Track DW
Core-Balanced Architecture Using 300GB 15k SAS drives
Each HBA port rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 SP ports.
Each SP rated at 500MB/s, or 1000MB/s for both SPs
Each LUN rated at 125MB/s; each SP controls 4 LUNs at 500MB/s, or 1000MB/s per MSA DAE
[Diagram: SMP Server → Switch → MSA storage processors]
SP A: RAID GP01 and RAID GP02, drives 01-04, LUN1-LUN4
SP B: RAID GP03 and RAID GP04, drives 05-08, LUN5-LUN8
RAID GP05: drives 09 and 10, LUN0 (Logs)
HS
ONLY 8 data disks per 4-Cores !!!
Per MSA2312
Each SP port rated at 4Gb/s or 400MB/s, and 1600MB/s for all 4 SP ports.
Drive Details
• Each MSA can hold 12 drives, this configuration requires 11
• MSA is 2U in total (capacitor eliminates need for battery)
• Each MSA SP port controls 4 LUNs
• Each pair of LUNs consists of (2) 300GB 15k SAS drives RAID1
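The ratings quoted on this slide can be cross-checked with a little arithmetic. The sketch below sums each layer of one MSA2312 and takes the smallest total as the enclosure's sustainable scan rate. It assumes one MSA per four cores, which is how "ONLY 8 data disks per 4-Cores" is read here; the constant names are illustrative.

```python
# Ratings taken from the slide, in MB/s.
SP_PORT_MBPS = 400   # each SP port: 4 Gb/s, about 400 MB/s
SP_PORTS = 4
SP_MBPS = 500        # each storage processor (SP)
SPS = 2
LUN_MBPS = 125       # each LUN
LUNS_PER_SP = 4

# Assumption: one MSA serves four CPU cores.
CORES_PER_MSA = 4

port_total = SP_PORT_MBPS * SP_PORTS      # all SP ports together
sp_total = SP_MBPS * SPS                  # both SPs together
lun_total = LUN_MBPS * LUNS_PER_SP * SPS  # all eight data LUNs together

# The enclosure can sustain no more than its most constrained layer.
msa_mbps = min(port_total, sp_total, lun_total)
per_core_mbps = msa_mbps / CORES_PER_MSA

print(f"ports={port_total} SPs={sp_total} LUNs={lun_total} MB/s")
print(f"per MSA: {msa_mbps} MB/s, per core: {per_core_mbps} MB/s")
```

Under these figures the SPs and the LUNs both top out at 1000 MB/s while the ports have headroom, so the layers are matched rather than one starving the other.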

Fast Track Data Warehouse Components

Software:
• SQL Server 2008
Enterprise
• Windows Server 2008

Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading

Hardware:
• Tight specifications for servers,
storage and networking
• ‘Per core’ building block

RA: Tightly Spec'd
RAs include not only hardware but best
practices in:
Windows OS configuration
SQL Server startup options
Database physical layout
Table types
Indexing
Statistics
Managing fragmentation
Loading procedures

Fast Track Case Study - Results

Measure | Teradata | SQL Server Fast Track DW | Comparison
Loading – Subject Area 1 | 5:10:21 total time | 51:31 total time | SQL Server 6x faster
Loading – Subject Area 2 | 4:36:08 total time | 1:50:01 total time | SQL Server 2.5x faster
Query times – Subject Area 1 | 3:03 avg query time (using 9 benchmark queries) | 0:15 avg query time (using 9 benchmark queries) | SQL Server 12x faster
Query times – Subject Area 2 | 56:44 avg query time (using 4 benchmark queries) | 8:09 avg query time (using 4 benchmark queries) | SQL Server 7x faster
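The "x faster" figures can be reproduced from the timings. The sketch below converts each time to seconds and divides, reading the second Fast Track load time as 1 h 50 min 01 s. The helper name and dictionary layout are illustrative.

```python
def to_seconds(text):
    """Convert 'h:mm:ss' or 'm:ss' to a number of seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


# (Teradata time, SQL Server Fast Track time) as shown on the slide.
results = {
    "Loading, Subject Area 1": ("5:10:21", "51:31"),
    "Loading, Subject Area 2": ("4:36:08", "1:50:01"),
    "Query times, Subject Area 1": ("3:03", "0:15"),
    "Query times, Subject Area 2": ("56:44", "8:09"),
}

speedups = {
    name: to_seconds(teradata) / to_seconds(fast_track)
    for name, (teradata, fast_track) in results.items()
}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster")
```

The computed ratios round to the 6x, 2.5x, 12x and 7x shown in the comparison column.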

About DATAllegro…
Technology Partners
Proprietary Appliance Management and MPP Database
Open Source Database and OS
Industry Standard Servers
Industry Standard Networking
Industry Standard Storage

Integration Plans
Provide scale out through MPP on SQL Server and Windows
Offer ‘Appliance like’ user experience to Data Warehouse customers
Lower TCO for high end Data Warehousing
Offer integrated BI platform to small and very large Enterprises

MPP Additional Considerations
Principles & approach of SMP carry forward
Deeper level of complexity –
High Availability
Parallelization
Inter node data movement

Modular building blocks
Balanced CPU and storage
Both SMP and MPP are based on building blocks that scale
by the CPU core
Adds network, storage processing and disk bandwidth for
each core
Based on maximizing & sustaining true sequential I/O while
minimizing disks
Generally changes balance of systems so more can be
spent on CPU and SW than on storage to give better
overall performance for a given budget
Building blocks can be adjusted for multiple MPP
configurations – high performance, archive and
extreme performance

The future of SQL Server Data Warehousing
– Project "Madison"

Predictable Scale out through MPP
Customers with over 400 TB data warehouses

Commodity Hardware
Lower cost
Frequent performance improvements
Easier upgrade and maintenance
Higher customer comfort
Better compatibility

Ultra Shared Nothing
An extension of traditional shared nothing design
Push shared nothing architecture into SMP node
IO and CPU affinity within SMP nodes
Eliminate contention per user query
Use full resources for each user query
Multiple physical instances of tables
Distribute large tables
Replicate small tables
Distribute AND Replicate medium tables
Re-Distribute rows “on-the-fly” when necessary

Madison Server Components
[Diagram labels]
Database Servers
Control Nodes (Active / Passive)
Compute Nodes (SQL)
Storage
Landing Zone
Backup
Management
Failover/Spare: Spare Database Server
Interconnects: Dual Fiber Channel, Dual Infiniband

System Architecture
[Diagram labels]
20 Gb/s Infiniband DMS Backbone
Database Servers
Control Nodes (Active / Passive)
Spare Database Server
Dual Fiber Channel
Dual Infiniband
8 Gb/s Fiber Channel Local SAN
Client Drivers
Data Center Monitoring
ETL Load Interface
Corporate Backup Solution
IPoIB
Dedicated LAN
Corporate Network
Private Network

Software Architecture
[Diagram labels]
Query Tool: Nexus
MS BI (AS, RS)
Client drivers: JDBC, OLE-DB, ODBC, ADO.NET
IIS, Admin Console
Madison Service: Core Engine, DSQL, Services Manager, SQL OS
DMS
Compute Nodes: DMS, SQL Server, User Data
Landing Zone: DMS Loader, DMS Client, SSIS, SQL
Backup Node: DMS, SQL OS
Management Node: HPC, AD
SQL Server: DW Schema, DW Authentication, DW Configuration, DW Queue
Legend: Existing MS software, Built by DWPU, 3rd Party

Control Node & Client Drivers
Client connections always go through the control node
Clustered to a passive node
Processes SQL requests
Prepares execution plan
Orchestrates distributed execution
Local SQL Server to do final query plan processing / result
aggregation
Will use same set of drivers used by DATAllegro
Provided by DataDirect
ODBC, OLE-DB, JDBC and ADO.NET client drivers
Wire protocol (SequeLink)
Available drivers for 32 and 64 bits

Compute Nodes
A SQL Server 2008 instance
DB engine nodes autonomous on local data
SQL as primary interface
Each MPP node is a highly tuned SMP node with
standard interfaces

Landing Zone
Provides high capacity storage for data files
from ETL processes
Integration services available on the landing
zone
Connected to internal network
Available as sandbox for other applications and
scripts that run on internal network.

Source → Landing Zone → Data Files → Loader → Compute Nodes

Backup Node
Builds on SQL Server native backup/restore
facility
Use VDI interface to plug into backup pipeline
Database-level backup
Coordinated backup across the nodes
Quiesce write activity to synchronize
Can only restore to another appliance with
exactly the same number of distributions

Data Distribution & Replication

[Diagram: Control Node, Compute Nodes, Storage Nodes, Landing Zone Node, Spare Node; Text Files]
Tables Are Hash Distributed Or Replicated
Database Distributed & Replicated Tables
Date Dim (D): D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
Customer (C): C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
Item (I): I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
Store Sales (SS): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
Promotion (P): P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
Customer Demographics (CD): CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
Store (S): S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
[Diagram: every node is drawn with D, C, I, CD, P and S alongside SS]

Physical Storage Configuration – Single Node
User Database(s)
LUN 1, FG Dist A: DistData1.mdf, DistData2.ndf
LUN 2, FG Dist B: DistData3.ndf, DistData4.ndf
LUN 3, FG Dist C: DistData5.ndf, DistData6.ndf
LUN 8, FG Dist H: DistData7.ndf, DistData8.ndf
Replicated FG, LUN 1: ReplData1.mdf, ReplData2.ndf
Replicated FG, LUN 2: ReplData3.ndf, ReplData4.ndf
Replicated FG, LUN 3: ReplData5.ndf, ReplData6.ndf
Replicated FG, LUN 8: ReplData7.ndf, ReplData8.ndf
Staging Database
LUN 1, FG Stage A: StageData1.mdf, StageData2.ndf
LUN 2, FG Stage B: StageData3.ndf, StageData4.ndf
LUN 3, FG Stage C: StageData5.ndf, StageData6.ndf
LUN 8, FG Stage H: StageData1.ndf, StageData2.ndf
Replicated FG, LUN 1: ReplData1.mdf, ReplData2.ndf
Replicated FG, LUN 2: ReplData3.ndf, ReplData4.ndf
Replicated FG, LUN 3: ReplData5.ndf, ReplData6.ndf
Replicated FG, LUN 8: ReplData7.ndf, ReplData.ndf
TempDB
Local Drive 1: TempDB1.mdf
Local Drive 2: TempDB2.ndf
Local Drive 3: TempDB3.ndf
Local Drive 4: TempDB4.ndf
Local Drive 5: TempDB5.ndf
Local Drive 6: TempDB6.ndf
Log LUN
UserDB Log, StageDB Log, TempDB Log

Create Table – Behind the Scenes

Create Table store_sales
with
distribute_on (ss_item_sk)
partition_on (ss_sold_date_sk)
cluster_on (ss_sold_date_sk)
8 Filegroups, Distribution_a thru Distribution_h, 1 Table per FG
Create Table mad_store_sales_a
Create Table mad_store_sales_ …
Create Table mad_store_sales_h
12 Partitions (ss_sold_date_sk)
N-number of 8K Pages
Tuple
Microsoft Confidential
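The fan-out from one logical table to its per-distribution physical tables can be sketched as a simple naming rule. The `mad_` prefix, the `_a` to `_h` suffixes and the `Distribution_a` to `Distribution_h` filegroup names come from the slide; the function is hypothetical and only prints table names, not the DDL that Madison actually generates.

```python
import string


def physical_tables(logical_name, n_distributions=8, prefix="mad_"):
    """List (physical table, filegroup) pairs, one per distribution."""
    letters = string.ascii_lowercase[:n_distributions]
    return [(f"{prefix}{logical_name}_{letter}", f"Distribution_{letter}")
            for letter in letters]


tables = physical_tables("store_sales")
for table, filegroup in tables:
    print(f"Create Table {table}  (filegroup {filegroup})")
```

With eight distributions this yields mad_store_sales_a through mad_store_sales_h, one table per filegroup.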

High Availability
Multiple levels of redundancy:

• Leveraging MSCS for node availability
• Cluster aware services:
• SQL Server, Madison, DMS

8x1
• Leveraging MSCS for SQL Services, DMS
• 1 spare node for every 8* compute nodes

Security and Encryption
Retain DA v3 design
Authentication and authorization done by Madison server
Users and Roles as first class principals
Nested role capabilities
Connection to SQL back-ends through high privilege account
SQL nodes reside on private network
No support for integrated auth
Leverages TDE to expose DB-level encryption
Supports key rotation

The Logical Data Model
Multiple databases per appliance
Each user database maps to one SQL Server db per
node
Tables
Replicated, Distributed, Replicated + Distributed
Leverage SQL Server compression
Supports Partitioning
Supports secondary indexes
Views

Data Types
Most scalar data types supported by SQL Server 2008 are supported by Madison
Main exceptions
Character and binary strings limited to 8K (i.e. no BLOB support)
XML
hierarchyid
Sql-Variant
System and CLR UDTs
Latin1_General with binary comparison only
SQL Server Data Types, each followed by its support marks (P) from the DAv3 and Madison columns
bigint
binary
P P
bit P
char / nchar P P
date, time P
datetime (was date in DA) P P
datetime2 P
datetimeoffset P
decimal P P
float P P
geometry / geography
hierarchyid
int (was integer in DA) P P
money P
real P
smalldatetime P
smallint P P
smallmoney P
sql_variant
text / ntext / image
timestamp
tinyint P P
varchar / nvarchar / varbinary P P
v*(max)
uniqueidentifier
xml

Supported SQL Syntax
Aligned with ANSI SQL 92
Basic INSERT, UPDATE, DELETE, SELECT

CREATE TABLE AS SELECT

Limited analytical function support
Teradata extensions
Quantile, Sample,…

Configuration and Monitoring
Challenge: Is it an appliance or a collection of nodes?
Madison services instrumented
Logs and Performance Counters
Capture and forward SNMP alerts from devices within the appliance
Small subset of DMVs to union underlying node DMVs
Leverage HPC for monitoring

Manageability
Web-based main administrative user interface
Based on DATAllegro manageability UI
Monitoring system health and activity
Leveraging HPC pack 2008
Systems management
Monitoring
Cluster health

Query Tools
GUI Tool:
Nexus (CoffingDW)
Table & view object explorer
Interactive query execution

Command line tool:
Replacement for DA-SQL
Flavor of SqlCmd

MS BI Integration
Integration Services
Madison enabled as a source
Data movement, lookup operations, etc.
Will add a new SSIS destination
Ensure integrated high performance loads

Reporting Services
Fully supported; including parameterized queries
Will customize experience for report builder and report
designer

Analysis Services
Will get connectivity through OLE-DB provider
Will enable both MOLAP and ROLAP storage

High Level Release Definitions

“Madison” (aka v1)
H1 2010
Focus on time to market
Compatibility with DATAllegro v3
MS BI integration
Will start running MTPs in the summer
V2+
Closer functional alignment with SQL Server
Better integration with SQL and MS ecosystem, tools and technologies

Recap
Data Warehousing Reference Architectures
available today!
SQL Server Fast Track
SQL Server “Madison”
Built for advanced, large scale data warehouses
Shared-nothing MPP architecture
Early evaluation programs starting soon

All feedback welcome:
fransidi@microsoft.com

Thank you!
