The document discusses Microsoft SQL Server data warehousing solutions. It provides an agenda for a presentation that includes an overview of Microsoft's data warehousing offerings, how to establish baseline metrics for Fast Track reference configurations, and how to design balanced server and storage configurations for data warehousing workloads. It also discusses software and hardware best practices, such as data striping and storage configuration recommendations. Overall, the document outlines topics and solutions to help customers accelerate their data warehouse deployments using Microsoft SQL Server.
4. Microsoft Data Warehousing Offerings
Tier 1 Offerings
Enterprise
• Scalable and reliable platform for Data Warehousing on any hardware
• Ideal for data marts or small to mid-sized enterprise data warehouses (EDWs)
• Software only
• Scale up data warehousing
• 10s of terabytes
Fast Track Data Warehouse
• Reference Architectures offering best price performance for Data Warehousing
• Ideal for data marts or small to mid-sized DWs with scan centric workloads
• Reference Architectures (Software and Hardware)
• Scale up data warehousing
• 4–80 terabytes
HP Business DW Appliance
• An affordable SMP solution for data warehousing on optimized hardware
• Ideal for small data marts or DWs with scan centric workloads
• Integrated Appliance (Software and Hardware)
• Scale up data warehousing
• Up to 5 terabytes
Parallel Data Warehouse
• Appliance for high end Data Warehousing requiring highest scalability, performance or complexity
• Offers flexibility in hardware and architecture
• DW Appliance (Fully integrated Software and Hardware)
• Scale out data warehousing with massively parallel processing (MPP)
• 10s–100s of terabytes
5. Some Data Warehouses today
Big SAN
Big SMP Server
Connected together
What’s wrong with this picture?
6. Answer: system out of balance
This server can consume 12 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec
Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t
Queries are slow
Despite significant investment in both Server and Storage
Result: significant investment, not delivering performance
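The imbalance on this slide can be put in numbers. The following is a minimal sketch, assuming the 12 GB/Sec and 2 GB/Sec figures are sustained rates; the function name and the returned fields are illustrative, not part of any Microsoft tool.

```python
def io_balance(server_consume_gb_s: float, storage_deliver_gb_s: float) -> dict:
    """Compare what the server can consume with what the storage can deliver."""
    if server_consume_gb_s <= 0 or storage_deliver_gb_s < 0:
        raise ValueError("rates must be positive")
    # Fraction of the server's IO appetite that the storage can actually feed.
    fed_fraction = min(1.0, storage_deliver_gb_s / server_consume_gb_s)
    return {
        "fed_fraction": fed_fraction,
        "shortfall_gb_s": max(0.0, server_consume_gb_s - storage_deliver_gb_s),
        "bottleneck": "storage" if storage_deliver_gb_s < server_consume_gb_s else "server",
    }


# Figures from the slide: the server consumes 12 GB/Sec, the SAN delivers 2 GB/Sec.
result = io_balance(12.0, 2.0)
print(result)
```

With these inputs the storage feeds only about one sixth of what the server could consume, which is why queries stay slow despite the investment on both sides.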
8. The Alternative: A Balanced System
Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload
Avoid sharing storage devices among servers
Avoid overinvesting in disk drives
10. SQL Server Fast Track Data Warehouse
Solution to help customers and partners accelerate their data warehouse deployments
A method for designing a cost-effective, balanced system for Data Warehouse workloads
Reference hardware configurations developed in conjunction with hardware partners using this method
Best practices for data layout, loading and management
11. Software:
• SQL Server 2008 R2 Enterprise
• Windows Server 2008 R2
Configuration guidelines:
• Physical table structures
• Indexes
• Compression
• SQL Server settings
• Windows Server settings
• Loading
Hardware:
• Tight specifications for servers, storage and networking
• ‘Per core’ building block
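The ‘per core’ building block can be read as a sizing rule: multiply the rate one core can consume by the number of cores, then provision enough storage to deliver that total. The sketch below assumes that reading; the per-core rate, the per-unit storage rate, and the function name are hypothetical placeholders, not figures from a reference configuration.

```python
import math


def size_balanced_system(cores: int, per_core_mb_s: float, per_storage_unit_mb_s: float):
    """Return (required bandwidth in MB/s, number of storage units needed).

    All three inputs are assumptions to be replaced with measured values.
    """
    if cores <= 0 or per_core_mb_s <= 0 or per_storage_unit_mb_s <= 0:
        raise ValueError("inputs must be positive")
    required_mb_s = cores * per_core_mb_s
    # Round up: a fractional storage unit cannot be bought.
    storage_units = math.ceil(required_mb_s / per_storage_unit_mb_s)
    return required_mb_s, storage_units


# Hypothetical example: 8 cores at 200 MB/s each, storage units delivering 500 MB/s.
required_mb_s, storage_units = size_balanced_system(8, 200, 500)
print(required_mb_s, storage_units)
```

Sizing from the CPU side first is what keeps the design from overinvesting in disk drives: storage is added only until it matches what the cores can consume.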
19. Data Warehouse Workload Characteristics
SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
       AVG(L_QUANTITY) AS AVG_QTY,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       AVG(L_DISCOUNT) AS AVG_DISC,
       COUNT(*) AS COUNT_ORDER
FROM LINEITEM
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS
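To see why this kind of query is scan centric, it helps to run it against a toy table: with no WHERE clause, every row of LINEITEM is read and then aggregated. The sketch below uses Python's built-in SQLite purely as a stand-in so that it is runnable anywhere; it is not SQL Server, and the three sample rows are made up for illustration.

```python
import sqlite3

QUERY = """
SELECT L_RETURNFLAG, L_LINESTATUS,
       SUM(L_QUANTITY) AS SUM_QTY,
       SUM(L_EXTENDEDPRICE) AS SUM_BASE_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)) AS SUM_DISC_PRICE,
       SUM(L_EXTENDEDPRICE*(1-L_DISCOUNT)*(1+L_TAX)) AS SUM_CHARGE,
       AVG(L_QUANTITY) AS AVG_QTY,
       AVG(L_EXTENDEDPRICE) AS AVG_PRICE,
       AVG(L_DISCOUNT) AS AVG_DISC,
       COUNT(*) AS COUNT_ORDER
FROM LINEITEM
GROUP BY L_RETURNFLAG, L_LINESTATUS
ORDER BY L_RETURNFLAG, L_LINESTATUS
"""

# Made-up sample rows: (flag, status, quantity, extended price, discount, tax).
SAMPLE_ROWS = [
    ("A", "F", 10, 100.0, 0.0, 0.0),
    ("A", "F", 20, 200.0, 0.5, 0.0),
    ("N", "O", 5, 50.0, 0.0, 0.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LINEITEM (L_RETURNFLAG TEXT, L_LINESTATUS TEXT, "
    "L_QUANTITY INTEGER, L_EXTENDEDPRICE REAL, L_DISCOUNT REAL, L_TAX REAL)"
)
conn.executemany("INSERT INTO LINEITEM VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)

# Every row is scanned once; the output has one row per (flag, status) group.
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
conn.close()
```

The result set is tiny compared with the input, so the cost of the query is dominated by how fast the table can be read, which is the property the balanced-system design targets.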
37. Software Architecture
MPP Engine Coordinator
• Provides single system image
• SQL compilation
• Global metadata and appliance configuration
• Global query optimization and plan generation
• Global query execution coordination
• Global transaction coordination
• Authentication and authorization
• Supportability (hardware and software status)
Data Movement Service
• Data movement across the appliance
• Distributed query execution operators
[Diagram. Components shown: client tools (Query Tool, MS BI (AS, RS), DWSQL, Third-Party Tools, Internet Explorer); Data Access (OLEDB, ODBC, ADO.NET, JDBC); IIS Admin Console; Control Node with the MPP Engine Coordinator Service (SQL Parser, Core Engine Services, DMS Manager), a Data Movement Service, and SQL Server (DW Authentication, DW Configuration, DW Schema, TempDB); Compute Nodes with a Data Movement Service and SQL Server (User Data); Landing Zone Node with a Data Movement Service; Backup Node with a Data Movement Service.]
39. Blazing-Fast Performance
“400 percent improvement in performance”
First American Title Insurance Company
Now, up to 10x Faster³
ColumnStore
¹Source: Microsoft customer evidence, Choice Hotels International
²Source: Microsoft customer evidence, KAS Bank
³Source: Microsoft customer testing; common data warehousing queries
41.
• Batch object
• Column vectors
• List of qualifying rows
43. In a standard scale-out server deployment, multiple report servers share a single
report server database. The report server database should be installed on a
remote SQL Server instance. The following diagram is an example of a standard
scale-out server deployment configuration with the report server database on a
remote SQL Server instance.
44. As another option, you might decide to host the report server database on a
SQL Server instance that is part of a failover cluster. The following diagram is
an example of a scale-out server deployment configuration where the report
server databases are on an instance that is part of a failover cluster.
45. In addition to the standard scale-out deployment, you might determine that your reporting environment
would benefit from a more advanced scale-out deployment configuration. For example, you might decide
to use the load-balanced report servers for interactive report processing and add a separate report server
computer to process only scheduled reports. The following diagram is an example of this advanced scale-out
server deployment configuration.
46. Log: Description
Report Server Execution Log: The report server execution log contains data about specific reports, including when a report was run, who ran it, where it was delivered, and which rendering format was used. The execution log is stored in the report server database.
Report Server Service Trace Log: The service trace log contains very detailed information that is useful if you are debugging an application or investigating an issue or event. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
Report Server HTTP Log: The HTTP log file contains a record of all HTTP requests and responses handled by the Report Server Web service and Report Manager. HTTP logging is not enabled by default. You must modify the ReportingServicesService.exe configuration file to use this feature in your installation. The file is located at \Microsoft SQL Server\<SQL Server Instance>\Reporting Services\LogFiles.
51. Under the properties of your data source, increasing the network packet size for SQL
Server minimizes the protocol overhead required to build many small packets. The
default value for SQL Server 2008 is 4096 bytes. With a data warehouse load, a packet size of
32K (in SQL Server, this means assigning the value 32767) can benefit processing. Don’t
change the value in SQL Server using sp_configure; instead, override it in your data source.
This can be set whether you are using TCP/IP or Shared Memory.
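One way to override the packet size per data source is through the connection string. The sketch below assumes an ADO.NET SqlClient-style connection string, where `Packet Size` is a recognized keyword; the server name, database name, and helper function are hypothetical, and the lower bound of 512 is an assumption taken from that provider's documented range.

```python
MAX_PACKET_SIZE = 32767  # the value the slide gives for a "32K" packet
MIN_PACKET_SIZE = 512    # assumed lower bound for the Packet Size keyword


def with_packet_size(base_connection_string: str, packet_size: int = MAX_PACKET_SIZE) -> str:
    """Return the connection string with a single Packet Size entry set."""
    if not MIN_PACKET_SIZE <= packet_size <= MAX_PACKET_SIZE:
        raise ValueError(f"packet size must be between {MIN_PACKET_SIZE} and {MAX_PACKET_SIZE}")
    parts = [p.strip() for p in base_connection_string.split(";") if p.strip()]
    # Drop any existing Packet Size entry so the new value is the only one.
    parts = [p for p in parts if p.split("=", 1)[0].strip().lower() != "packet size"]
    parts.append(f"Packet Size={packet_size}")
    return ";".join(parts) + ";"


# Hypothetical data source for a warehouse load.
conn = with_packet_size("Data Source=dwserver;Initial Catalog=SalesDW;Integrated Security=SSPI;")
print(conn)
```

Setting the value on the data source keeps the change local to the load process, so other clients of the same SQL Server instance keep the server-wide default.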
This slide shows what we are going to talk about today. We will start off discussing Microsoft’s vision for data warehousing solutions. Then we will discuss the different offerings. Next, we will discuss how you can get support and services to help you get started with your data warehouse and to help accelerate the completion of your solution. Finally, we will end with a discussion of the quick start services to enable you to begin your data warehouse solution quickly.
SQL Server 2008 R2 comes in several editions. In this presentation, we will look at 4 different SKUs, each of which has different features that are important for data warehousing. We will drill down to get more information about each edition and the features that are important.
Remind them
In order to ensure the query is cached, you need to do the following:
• Ensure the results of the query will fit in memory.
• Run the query once. The 2nd and subsequent times you execute the query, it should be served from memory. You can tell this because the 2nd execution should be much faster than the initial one.
Review:
TPC BENCHMARK™ H http://www.tpc.org/tpch/spec/tpch2.8.0.pdf
TPC-H Data Set http://www.tpc.org/tpch/spec/tpch_2_8_0.zip http://www.tpc.org/tpch/spec/reference2.8.0.zip
Remind them “Your mileage may vary”
-E is the primary way we help to ensure longer “runs” of contiguous, logically grouped pages.
• An extent is eight 8 KB pages, or 64 KB. With -E, 64 extents are allocated at a time: (64 × 64 KB) / 1024 = 4 MB.
• SQL Server will still fill the 4 MB allocation in groups of eight 8 KB pages at a time. This means that pages can still be interleaved (extent fragmentation) down to the extent level.
TF1117 is specific to TempDB, as Autogrow should be off for all other databases.
• A customer may have a database with a specific use case that requires autogrow. This is OK; it just needs to be managed.
• It should not be a major part of the overall workload. This file will become fragmented.
• Using Autogrow for TempDB is about practicality. It can be hard to pre-allocate TempDB. If they can pre-allocate it, go for it.
Review:
Using the SQL Server Service Startup Options http://msdn.microsoft.com/en-us/library/ms190737.aspx
SAP with Microsoft SQL Server 2005: Best Practices for High Availability, Maximum Performance, and Scalability http://download.microsoft.com/download/d/9/4/d948f981-926e-40fa-a026-5bfcf076d9b9/SAP_SQL2005_Best%20Practices.doc
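The allocation arithmetic in the -E note can be checked directly. This is a small sketch; the count of 64 extents per allocation is an assumption read off the note's own 4 MB figure, and the constant names are illustrative.

```python
PAGE_KB = 8                 # one SQL Server data page
PAGES_PER_EXTENT = 8        # one extent is eight pages
EXTENTS_PER_ALLOCATION = 64  # assumption: implied by the note's 4 MB result

extent_kb = PAGE_KB * PAGES_PER_EXTENT                     # size of one extent in KB
allocation_mb = EXTENTS_PER_ALLOCATION * extent_kb / 1024  # size of one -E allocation in MB

print(f"extent = {extent_kb} KB, allocation with -E = {allocation_mb} MB")
```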
Remember that additional space may be needed during initial migration of data if moving onto a Fast Track RA or during the initial load of a new Fast Track RA.
Review:
Working with tempdb in SQL Server 2005 http://technet.microsoft.com/en-us/library/cc966545.aspx
Capacity Planning for tempdb http://msdn.microsoft.com/en-us/library/ms345368.aspx
Workloads often need large amounts of data pages to be in cache; in this case, add additional memory as needed.
Hash Joins and Sorts can make use of additional memory to help prevent them from spilling to tempdb. Workloads with large numbers of queries and bulk loads performing hash joins and sorts will benefit from more memory.
Review:
Troubleshooting Performance Problems in SQL Server 2008 http://msdn.microsoft.com/en-us/library/dd672789.aspx
How to: Enable the Lock Pages in Memory Option http://msdn.microsoft.com/en-us/library/ms190730.aspx
Tuning options for SQL Server 2005 and SQL Server 2008 when running in high performance workloads
• 4 Racks in V1
• Orderable at the rack level
• Required software
• 13k Price per TB
• Pricing and licensing training in resources
Data layout options:
• Dimension tables are typically replicated. PDW maintains data integrity across all nodes.
• Fact tables are typically distributed.
• The data model, table sizes, and workloads must all be considered when choosing between replicated and distributed tables.
The following join types are used to achieve Distribution Compatibility:
• Shared Nothing join: Achieves Distribution Compatibility by using compatible Distribution Keys in the SQL join criteria.
• Ultra Shared Nothing join: Achieves Distribution Compatibility through a replicated table; no data movement between nodes is required.
• Redistribution join: Requires data to be dynamically distributed between Compute Nodes to achieve Distribution Compatibility.
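The three join types can be told apart from the table layouts and the join columns alone. The sketch below is a simplified illustration of that decision, not PDW's actual planner: it treats distribution keys as compatible only when each side's key is exactly its join column, and all table and column names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Table:
    name: str
    replicated: bool
    distribution_key: Optional[str] = None  # only meaningful for distributed tables


def join_type(left: Table, right: Table, join_columns: Tuple[str, str]) -> str:
    """Classify an equality join; join_columns is (left column, right column)."""
    # A replicated table has a full copy on every node, so no data needs to move.
    if left.replicated or right.replicated:
        return "ultra shared nothing"
    # Both tables are distributed on their join columns: matching rows are co-located.
    if (left.distribution_key, right.distribution_key) == join_columns:
        return "shared nothing"
    # Otherwise rows must be moved between Compute Nodes before joining.
    return "redistribution"


# Hypothetical schema: one replicated dimension and three distributed fact tables.
dim_date = Table("dim_date", replicated=True)
fact_sales = Table("fact_sales", replicated=False, distribution_key="customer_id")
fact_returns = Table("fact_returns", replicated=False, distribution_key="customer_id")
fact_shipments = Table("fact_shipments", replicated=False, distribution_key="order_id")

print(join_type(fact_sales, dim_date, ("date_id", "date_id")))
print(join_type(fact_sales, fact_returns, ("customer_id", "customer_id")))
print(join_type(fact_sales, fact_shipments, ("order_id", "order_id")))
```

The last case shows why the choice of distribution key matters: fact_sales is distributed on customer_id, so joining it on order_id forces data movement even though fact_shipments is already distributed on that column.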