DATA WAREHOUSE
BEST PRACTICES
Dr. Eduardo Castro, MSc
ecastro@simsasys.com

http://ecastrom.blogspot.com
http://comunidadwindows.org
http://tiny.cc/comwindows
Facebook: ecastrom
Twitter: edocastro
SOURCES



This presentation is based on the following sources

Datawarehouse
              Ravi RanJan
Top 10 Best Practices for Building a Large Scale Relational Data Warehouse
              SQL CAT
Complexities of Creating a Data Warehouse

                 • Incomplete errors
                          • Missing Fields
                          • Records or Fields That, by Design, are not
                                Being Recorded

                 • Incorrect errors
                          • Wrong Calculations, Aggregations
                          • Duplicate Records
                          • Wrong Information Entered into Source System



Source. Datawarehouse. Ravi RanJan
Data Warehouse Pitfalls
     • You are going to spend much time extracting, cleaning,
        and loading data
     • You are going to find problems with systems feeding the
        data warehouse
     • You will find the need to store/validate data not being
        captured/validated by any existing system
     • Large scale data warehousing can become an exercise
        in data homogenizing



Data Warehouse Pitfalls…

          • The time it takes to load the warehouse will expand
            to fill the time available in the load window...
            and then some
          • You are building a HIGH maintenance system
          • You will fail if you concentrate on resource
            optimization to the neglect of project, data, and
            customer management issues and an understanding
            of what adds value to the customer




Best Practices
  • Complete requirements and design

  • Prototyping is key to business understanding

  • Utilizing proper aggregations and detailed data

  • Training is an on-going process

  • Build data integrity checks into your system.




Top 10 Best Practices for Building a Large Scale Relational Data Warehouse
            • Building a large scale relational data warehouse is a
              complex task.
            • This section describes some design techniques that can
              help in architecting an efficient large scale relational data
              warehouse with SQL Server.
            • Most large scale data warehouses use table and index
              partitioning, and therefore, many of the recommendations
              here involve partitioning.
            • Most of these tips are based on experiences building
              large data warehouses on SQL Server


Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
Consider partitioning large fact tables
            • Consider partitioning fact tables that are 50 to 100GB or
              larger.
            • Partitioning can provide manageability and often
              performance benefits.
                    • Faster, more granular index maintenance.
                    • More flexible backup / restore options.
                    • Faster data loading and deleting
                    • Faster queries when restricted to a single partition.
            • Typically partition the fact table on the date key.
                    • Enables sliding window.
                    • Enables partition elimination.


Build clustered index on the date key of the fact table
            • This supports efficient queries to populate cubes or
                 retrieve a historical data slice.

            • If you load data in a batch window, use the following
                 options for the clustered index on the fact table:
                      ALLOW_ROW_LOCKS = OFF and
                      ALLOW_PAGE_LOCKS = OFF

            • This helps speed up table scan operations during query
                 time and helps avoid excessive locking activity during
                 large updates.


Build clustered index on the date key of the fact table
            • Build nonclustered indexes for each foreign key.
              • This helps ‘pinpoint queries’ to extract rows based on a selective
                dimension predicate.


            • Use filegroups for administration requirements such as
                 backup / restore, partial database availability, etc.




Choose partition grain carefully
            • Most customers use month, quarter, or year.
            • For efficient deletes, you must delete one full partition at a
              time.
            • It is faster to load a complete partition at a time.
                    • Daily partitions for daily loads may be an attractive option.
                    • However, keep in mind that a table can have a maximum of 1000
                         partitions in SQL Server 2005 and 2008 (SQL Server 2012 and
                         later allow up to 15,000).
            • Partition grain affects query parallelism.
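
To see how the partition cap interacts with the grain, here is a minimal Python sketch. It assumes the 1000-partition cap quoted on the slide and rough partitions-per-year counts; `years_of_history` is a hypothetical helper name, not part of any SQL Server tooling.

```python
# Cap quoted on the slide; treat it as an assumption for your SQL Server version.
MAX_PARTITIONS = 1000

# Approximate number of partitions each grain creates per year.
PARTITIONS_PER_YEAR = {"day": 365, "month": 12, "quarter": 4, "year": 1}


def years_of_history(grain: str, max_partitions: int = MAX_PARTITIONS) -> float:
    """Return roughly how many years of data fit under the partition cap."""
    return max_partitions / PARTITIONS_PER_YEAR[grain]


if __name__ == "__main__":
    for grain in PARTITIONS_PER_YEAR:
        print(f"{grain:>7}: about {years_of_history(grain):.1f} years")
```

Under these assumptions, daily partitions exhaust the cap in under three years, while monthly partitions leave decades of headroom.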




Choose partition grain carefully
            • For SQL Server 2005:


                    • Queries touching a single partition can parallelize up to MAXDOP
                         (maximum degree of parallelism).

                    • Queries touching multiple partitions use one thread per partition up
                         to MAXDOP.
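
The SQL Server 2005 rule above can be written as a small Python model. This is only an illustration of the two bullets as stated, giving the upper bound on threads; the function name `threads_sql2005` is hypothetical, and the real optimizer considers many other factors.

```python
def threads_sql2005(partitions_touched: int, maxdop: int) -> int:
    """Upper bound on parallel threads under the SQL Server 2005 rule above.

    One partition: up to MAXDOP threads.
    Several partitions: one thread per partition, capped at MAXDOP.
    """
    if partitions_touched < 1 or maxdop < 1:
        raise ValueError("both arguments must be positive")
    if partitions_touched == 1:
        return maxdop
    return min(partitions_touched, maxdop)


if __name__ == "__main__":
    for touched in (1, 2, 3, 8):
        print(touched, "partition(s) ->", threads_sql2005(touched, maxdop=4), "thread(s)")
```

With MAXDOP = 4, a query touching two partitions is limited to two threads in this model, which is why the grain affects parallelism.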




Choose partition grain carefully

            • For SQL Server 2008:

                    • Parallel threads up to MAXDOP are distributed proportionally to
                         scan partitions, and multiple threads per partition may be used
                         even when several partitions must be scanned.

                    • Avoid a partition design where only 2 or 3 partitions are touched by
                         frequent queries, if you need MAXDOP parallelism (assuming
                         MAXDOP =4 or larger).




Design dimension tables appropriately
            • Use integer surrogate keys for all dimensions, other than
                 the Date dimension.

            • Use the smallest possible integer for the dimension
                 surrogate keys. This helps to keep fact table narrow.

            • Use a meaningful date key of integer type derivable from
                 the DATETIME data type (for example: 20060215).

            • Don't use a surrogate key for the Date dimension.
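
A meaningful integer date key such as 20060215 can be derived directly from a date. The following Python sketch shows one way to do it; the helper names `to_date_key` and `from_date_key` are illustrative assumptions, and in practice the conversion would usually live in the ETL layer or in SQL.

```python
from datetime import date, datetime


def to_date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, for example 20060215."""
    return d.year * 10000 + d.month * 100 + d.day


def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD integer back into a date (four-digit years assumed)."""
    return datetime.strptime(str(key), "%Y%m%d").date()


if __name__ == "__main__":
    key = to_date_key(date(2006, 2, 15))
    print(key, from_date_key(key))
```

Because the key sorts in calendar order and is readable on its own, range predicates can be written against the fact table without joining to the Date dimension.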



Design dimension tables appropriately
            • Build a clustered index on the surrogate key for each
                 dimension table

            • Build a non-clustered index on the Business Key
                 (potentially combined with a row-effective-date) to support
                 surrogate key lookups during loads.

            • Build nonclustered indexes on other frequently searched
                 dimension columns.

            • Avoid partitioning dimension tables.

Design dimension tables appropriately
            • Avoid enforcing foreign key relationships between the fact
                 and the dimension tables, to allow faster data loads.

            • You can create foreign key constraints with NOCHECK to
                 document the relationships; but don’t enforce them.

            • Ensure data integrity through Transform Lookups, or
                 perform the data integrity checks at the source of the
                 data.




Write effective queries for partition elimination
            • Whenever possible, place a query predicate (WHERE
                 condition) directly on the partitioning key (Date dimension
                 key) of the fact table.
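
To illustrate why a predicate on the partitioning key matters, here is a toy Python model of partition elimination. It assumes hypothetical monthly boundaries on an integer date key and RANGE RIGHT semantics (each boundary value belongs to the partition on its right); it models the idea only and does not call SQL Server.

```python
from bisect import bisect_right

# Hypothetical monthly boundary values on an integer date key (RANGE RIGHT assumed).
BOUNDARIES = [20060101, 20060201, 20060301, 20060401]


def partition_number(date_key: int, boundaries=BOUNDARIES) -> int:
    """Return the 1-based partition holding date_key under RANGE RIGHT."""
    return bisect_right(boundaries, date_key) + 1


def partitions_to_scan(low_key: int, high_key: int, boundaries=BOUNDARIES) -> list:
    """Partitions that can contain rows with low_key <= date_key <= high_key."""
    first = partition_number(low_key, boundaries)
    last = partition_number(high_key, boundaries)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A predicate covering February 2006 needs only one of the five partitions.
    print(partitions_to_scan(20060201, 20060228))
```

Without a predicate on the date key, every one of the five partitions in this model would have to be read.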




Use Sliding Window technique to maintain data
            • Maintain a rolling time window for online access to the
                 fact tables. Load newest data, unload oldest data.
            • Always keep empty partitions at both ends of the partition
                 range to guarantee that the partition split (before loading
                 new data) and partition merge (after unloading old data)
                 do not incur any data movement.

            • Avoid split or merge of populated partitions. Splitting or
                 merging populated partitions can be extremely inefficient,
                 as this may cause as much as 4 times more log
                 generation, and also cause severe locking.
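
The boundary bookkeeping for a sliding window can be sketched in Python. This is a simplified model under stated assumptions: monthly RANGE RIGHT boundaries, with the partition below the first boundary and the partition at or above the last boundary both kept empty. The names `next_month` and `slide_window` are hypothetical.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def slide_window(boundaries: list) -> tuple:
    """Return (boundary to split in, boundary to merge out, new boundaries).

    The split lands in the empty rightmost partition, and the merge happens
    after the oldest month has been switched out, so neither moves data.
    """
    split_at = next_month(boundaries[-1])  # new empty partition at the leading edge
    merge_at = boundaries[0]               # removes the emptied oldest partition
    return split_at, merge_at, boundaries[1:] + [split_at]


if __name__ == "__main__":
    current = [date(2006, 1, 1), date(2006, 2, 1), date(2006, 3, 1), date(2006, 4, 1)]
    print(slide_window(current))
```

In this model the window always keeps the same number of boundaries, and both the split and the merge operate only on empty partitions.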
Use Sliding Window technique to maintain data
            • Create the load staging table in the same filegroup as the
                 partition you are loading.

            • Create the unload staging table in the same filegroup as
                 the partition you are deleting.

            • It is fastest to load the newest full partition at one time, but
                 only possible when partition size is equal to the data load
                 frequency (for example, you have one partition per day,
                 and you load data once per day).



Use Sliding Window technique to maintain data
            • If the partition size doesn't match the data load frequency,
                 incrementally load the latest partition.

            • Various options for loading bulk data into a partitioned
              table are discussed in the whitepaper at
              http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx

            • Always unload one partition at a time.




Efficiently load the initial data
            • Use the SIMPLE or BULK_LOGGED recovery model during
                 the initial data load.

            • Create the partitioned fact table with the Clustered index.


            • Create non-indexed staging tables for each partition, and
                 separate source data files for populating each partition.

            • Populate the staging tables in parallel.


            • Use multiple BULK INSERT, BCP or SSIS tasks.

Efficiently load the initial data
            • Create as many load scripts to run in parallel as there are
                 CPUs, if there is no IO bottleneck. If IO bandwidth is
                 limited, use fewer scripts in parallel.

            • Use 0 batch size in the load. Use 0 commit size in the
                 load.

            • Use TABLOCK.


            • Use BULK INSERT if the sources are flat files on the
                 same server. Use BCP or SSIS if data is being pushed
                 from remote machines.

Efficiently load the initial data
            • Build a clustered index on each staging table, then create
                 appropriate CHECK constraints.

            • SWITCH all partitions into the partitioned table.


            • Build nonclustered indexes on the partitioned table.


            • It is possible to load 1 TB in under an hour on a 64-CPU
                 server with a SAN capable of 14 GB/sec throughput
                 (non-indexed table).


Efficiently delete old data
            • Use partition switching whenever possible.


            • To delete millions of rows from nonpartitioned, indexed
                 tables
                    • Avoid DELETE FROM ...WHERE ...
                           • Huge locking and logging issues
                           • Long rollback if the delete is canceled


                    • Usually faster to
                           • INSERT the records to keep into a non-indexed table
                           • Create index(es) on the table
                           • Rename the new table to replace the original
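
The copy-and-rename steps above can be sketched as follows. The example uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; the table name `fact`, its columns, and the helper `purge_by_copy` are assumptions for illustration, and on SQL Server the same pattern would use T-SQL with its own rename mechanism.

```python
import sqlite3


def purge_by_copy(conn: sqlite3.Connection, cutoff_key: int) -> None:
    """Keep only rows with date_key >= cutoff_key by rebuilding the table."""
    cur = conn.cursor()
    # 1. Insert the records to keep into a new table that has no indexes yet.
    cur.execute("CREATE TABLE fact_keep (date_key INTEGER, amount REAL)")
    cur.execute(
        "INSERT INTO fact_keep SELECT date_key, amount FROM fact WHERE date_key >= ?",
        (cutoff_key,),
    )
    # 2. Create the index(es) once, after the data is in place.
    cur.execute("CREATE INDEX ix_fact_keep_date ON fact_keep (date_key)")
    # 3. Swap the new table in under the original name.
    cur.execute("DROP TABLE fact")
    cur.execute("ALTER TABLE fact_keep RENAME TO fact")
    conn.commit()


def demo() -> tuple:
    """Build a small table (31 January rows, 28 February rows) and purge January."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (date_key INTEGER, amount REAL)")
    rows = [(20060100 + d, 1.0) for d in range(1, 32)]
    rows += [(20060200 + d, 2.0) for d in range(1, 29)]
    conn.executemany("INSERT INTO fact VALUES (?, ?)", rows)
    conn.commit()
    purge_by_copy(conn, cutoff_key=20060201)
    return conn.execute("SELECT COUNT(*), MIN(date_key) FROM fact").fetchone()


if __name__ == "__main__":
    print(demo())
```

The rows to discard are never deleted one by one; they simply disappear when the original table is dropped.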



Efficiently delete old data
            • As an alternative, ‘trickle’ the deletes by running the
                 following repeatedly in a loop:


                                     DELETE TOP (1000) ... ;
                                     COMMIT

            • Another alternative is to update the rows to mark them as
                 deleted, then delete them later during a noncritical time.
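
The trickle-delete loop can be sketched as a runnable example. This version uses Python's built-in sqlite3 module as a stand-in; SQLite has no `DELETE TOP`, so the batch is selected with a `LIMIT` subquery instead. The table name `fact`, the column `date_key`, and the helper `trickle_delete` are assumptions for illustration.

```python
import sqlite3


def trickle_delete(conn: sqlite3.Connection, cutoff_key: int, batch_size: int = 1000) -> int:
    """Delete rows older than cutoff_key in small committed batches.

    Returns the number of batches that actually removed rows.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM fact WHERE rowid IN ("
            "  SELECT rowid FROM fact WHERE date_key < ? LIMIT ?)",
            (cutoff_key, batch_size),
        )
        conn.commit()  # commit each batch so every transaction stays small
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


def demo() -> tuple:
    """Load 2500 old rows and 10 current rows, then trickle-delete the old ones."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact (date_key INTEGER, amount REAL)")
    rows = [(20050101, 1.0)] * 2500 + [(20060101, 1.0)] * 10
    conn.executemany("INSERT INTO fact VALUES (?, ?)", rows)
    conn.commit()
    batches = trickle_delete(conn, cutoff_key=20060101, batch_size=1000)
    remaining = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
    return batches, remaining


if __name__ == "__main__":
    print(demo())
```

Committing after every batch is the point of the technique: a canceled run loses at most one batch of work instead of triggering a long rollback.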




Manage statistics manually
            • Statistics on partitioned tables are maintained for the table
                 as a whole.

            • Manually update statistics on large fact tables after
                 loading new data.

            • Manually update statistics after rebuilding index on a
                 partition.

            • If you regularly update statistics after periodic loads, you
                 may turn off autostats on that table.

Manage statistics manually
            • This is important for optimizing queries that may need to
                 read only the newest data.

            • Updating statistics on small dimension tables after
                 incremental loads may also help performance.

            • Use the FULLSCAN option with UPDATE STATISTICS on dimension
                 tables for more accurate query plans.




Consider efficient backup strategies
            • Backing up the entire database may take a significant
                 amount of time for a very large database.

                    • For example, backing up a 2 TB database to a 10-spindle RAID-5
                         disk on a SAN may take 2 hours (at a rate of 275 MB/sec).


            • Snapshot backup using SAN technology is a very good
                 option.

            • Reduce the volume of data that must be backed up regularly.
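
The 2 TB example above can be checked with simple arithmetic. This Python sketch assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained transfer rate; `backup_hours` is a hypothetical helper name.

```python
def backup_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to copy size_tb terabytes at a sustained rate.

    Assumes 1 TB = 1,000,000 MB and ignores compression and overhead.
    """
    seconds = size_tb * 1_000_000 / rate_mb_per_sec
    return seconds / 3600


if __name__ == "__main__":
    print(f"{backup_hours(2, 275):.2f} hours")
```

The same function shows how much backup time is saved once read-only historical data is excluded from the regular backup volume.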




Consider efficient backup strategies
            • The filegroups for the historical partitions can be marked
                 as READ ONLY.

            • Perform a filegroup backup once when a filegroup
                 becomes read-only.

            • Perform regular backups only on the read / write
                 filegroups.

            • Note that RESTOREs of the read-only filegroups cannot
                 be performed in parallel.


Que hay de nuevo en el Azure Data Lake Storage Gen2
 
Introduccion a Azure Synapse Analytics
Introduccion a Azure Synapse AnalyticsIntroduccion a Azure Synapse Analytics
Introduccion a Azure Synapse Analytics
 
Seguridad de SQL Database en Azure
Seguridad de SQL Database en AzureSeguridad de SQL Database en Azure
Seguridad de SQL Database en Azure
 
Python dentro de SQL Server
Python dentro de SQL ServerPython dentro de SQL Server
Python dentro de SQL Server
 
Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft
 
Script de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure EnclavesScript de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure Enclaves
 
Introducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure EnclavesIntroducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure Enclaves
 

Último

10052024_First India Newspaper Jaipur.pdf
10052024_First India Newspaper Jaipur.pdf10052024_First India Newspaper Jaipur.pdf
10052024_First India Newspaper Jaipur.pdfFIRST INDIA
 
China's soft power in 21st century .pptx
China's soft power in 21st century   .pptxChina's soft power in 21st century   .pptx
China's soft power in 21st century .pptxYasinAhmad20
 
11052024_First India Newspaper Jaipur.pdf
11052024_First India Newspaper Jaipur.pdf11052024_First India Newspaper Jaipur.pdf
11052024_First India Newspaper Jaipur.pdfFIRST INDIA
 
Politician uddhav thackeray biography- Full Details
Politician uddhav thackeray biography- Full DetailsPolitician uddhav thackeray biography- Full Details
Politician uddhav thackeray biography- Full DetailsVoterMood
 
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...IT Industry
 
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...tewhimanshu23
 
The political system of the united kingdom
The political system of the united kingdomThe political system of the united kingdom
The political system of the united kingdomlunadelior
 
Indegene Limited IPO Detail - Divadhvik
Indegene Limited IPO Detail  - DivadhvikIndegene Limited IPO Detail  - Divadhvik
Indegene Limited IPO Detail - Divadhvikdhvikdiva
 
06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdfFIRST INDIA
 
Dubai Call Girls Pinky O525547819 Call Girl's In Dubai
Dubai Call Girls Pinky O525547819 Call Girl's In DubaiDubai Call Girls Pinky O525547819 Call Girl's In Dubai
Dubai Call Girls Pinky O525547819 Call Girl's In Dubaikojalkojal131
 
Job-Oriеntеd Courses That Will Boost Your Career in 2024
Job-Oriеntеd Courses That Will Boost Your Career in 2024Job-Oriеntеd Courses That Will Boost Your Career in 2024
Job-Oriеntеd Courses That Will Boost Your Career in 2024Insiger
 
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...Andy (Avraham) Blumenthal
 
422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdf422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdflambardar420420
 
05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdf05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdfFIRST INDIA
 
declarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdfdeclarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdfssuser5750e1
 

Último (17)

10052024_First India Newspaper Jaipur.pdf
10052024_First India Newspaper Jaipur.pdf10052024_First India Newspaper Jaipur.pdf
10052024_First India Newspaper Jaipur.pdf
 
call girls inMahavir Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls inMahavir Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7call girls inMahavir Nagar  (delhi) call me [🔝9953056974🔝] escort service 24X7
call girls inMahavir Nagar (delhi) call me [🔝9953056974🔝] escort service 24X7
 
China's soft power in 21st century .pptx
China's soft power in 21st century   .pptxChina's soft power in 21st century   .pptx
China's soft power in 21st century .pptx
 
11052024_First India Newspaper Jaipur.pdf
11052024_First India Newspaper Jaipur.pdf11052024_First India Newspaper Jaipur.pdf
11052024_First India Newspaper Jaipur.pdf
 
Politician uddhav thackeray biography- Full Details
Politician uddhav thackeray biography- Full DetailsPolitician uddhav thackeray biography- Full Details
Politician uddhav thackeray biography- Full Details
 
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...
KING VISHNU BHAGWANON KA BHAGWAN PARAMATMONKA PARATOMIC PARAMANU KASARVAMANVA...
 
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...
Unveiling the Characteristics of Political Institutions_ A Comprehensive Anal...
 
The political system of the united kingdom
The political system of the united kingdomThe political system of the united kingdom
The political system of the united kingdom
 
Indegene Limited IPO Detail - Divadhvik
Indegene Limited IPO Detail  - DivadhvikIndegene Limited IPO Detail  - Divadhvik
Indegene Limited IPO Detail - Divadhvik
 
06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf06052024_First India Newspaper Jaipur.pdf
06052024_First India Newspaper Jaipur.pdf
 
Dubai Call Girls Pinky O525547819 Call Girl's In Dubai
Dubai Call Girls Pinky O525547819 Call Girl's In DubaiDubai Call Girls Pinky O525547819 Call Girl's In Dubai
Dubai Call Girls Pinky O525547819 Call Girl's In Dubai
 
Job-Oriеntеd Courses That Will Boost Your Career in 2024
Job-Oriеntеd Courses That Will Boost Your Career in 2024Job-Oriеntеd Courses That Will Boost Your Career in 2024
Job-Oriеntеd Courses That Will Boost Your Career in 2024
 
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
America Is the Target; Israel Is the Front Line _ Andy Blumenthal _ The Blogs...
 
422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdf422524114-Patriarchy-Kamla-Bhasin gg.pdf
422524114-Patriarchy-Kamla-Bhasin gg.pdf
 
05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdf05052024_First India Newspaper Jaipur.pdf
05052024_First India Newspaper Jaipur.pdf
 
9953056974 Call Girls In Pratap Nagar, Escorts (Delhi) NCR
9953056974 Call Girls In Pratap Nagar, Escorts (Delhi) NCR9953056974 Call Girls In Pratap Nagar, Escorts (Delhi) NCR
9953056974 Call Girls In Pratap Nagar, Escorts (Delhi) NCR
 
declarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdfdeclarationleaders_sd_re_greens_theleft_5.pdf
declarationleaders_sd_re_greens_theleft_5.pdf
 

Data Warehouse Best Practices

  • 1. DATAWAREHOUSE BEST PRACTICES
      Dr. Eduardo Castro, MSc
      ecastro@simsasys.com
      http://ecastrom.blogspot.com
      http://comunidadwindows.org
      http://tiny.cc/comwindows
      Facebook: ecastrom
      Twitter: edocastro
  • 2. SOURCES
      This presentation is based on the following sources:
      • Datawarehouse, Ravi RanJan
      • Top 10 Best Practices for Building a Large Scale Relational Data Warehouse, SQL CAT
  • 3. Complexities of Creating a Data Warehouse
      • Incomplete errors
          • Missing Fields
          • Records or Fields That, by Design, are not Being Recorded
      • Incorrect errors
          • Wrong Calculations, Aggregations
          • Duplicate Records
          • Wrong Information Entered into Source System
      Source. Datawarehouse. Ravi RanJan
  • 4. Data Warehouse Pitfalls
      • You are going to spend a lot of time extracting, cleaning, and loading data.
      • You are going to find problems with the systems feeding the data warehouse.
      • You will find the need to store/validate data not being captured/validated by any existing system.
      • Large scale data warehousing can become an exercise in data homogenizing.
      Source. Datawarehouse. Ravi RanJan
  • 5. Data Warehouse Pitfalls…
      • The time it takes to load the warehouse will expand to fill the available load window... and then some.
      • You are building a HIGH maintenance system.
      • You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer.
      Source. Datawarehouse. Ravi RanJan
  • 6. Best Practices
      • Complete requirements and design.
      • Prototyping is key to business understanding.
      • Use proper aggregations and detailed data.
      • Training is an on-going process.
      • Build data integrity checks into your system.
      Source. Datawarehouse. Ravi RanJan
  • 7. Top 10 Best Practices for Building a Large Scale Relational Data Warehouse
      • Building a large scale relational data warehouse is a complex task.
      • This section describes some design techniques that can help in architecting an efficient large scale relational data warehouse with SQL Server.
      • Most large scale data warehouses use table and index partitioning, and therefore, many of the recommendations here involve partitioning.
      • Most of these tips are based on experiences building large data warehouses on SQL Server.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 8. Consider partitioning large fact tables
      • Consider partitioning fact tables that are 50 to 100 GB or larger.
      • Partitioning can provide manageability and often performance benefits:
          • Faster, more granular index maintenance.
          • More flexible backup / restore options.
          • Faster data loading and deleting.
          • Faster queries when restricted to a single partition.
      • Typically partition the fact table on the date key.
          • Enables sliding window.
          • Enables partition elimination.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 9. Build clustered index on the date key of the fact table
      • This supports efficient queries to populate cubes or retrieve a historical data slice.
      • If you load data in a batch window, use the options ALLOW_ROW_LOCKS = OFF and ALLOW_PAGE_LOCKS = OFF for the clustered index on the fact table.
      • This helps speed up table scan operations during query time and helps avoid excessive locking activity during large updates.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 10. Build clustered index on the date key of the fact table
      • Build nonclustered indexes for each foreign key.
      • This helps 'pinpoint queries' to extract rows based on a selective dimension predicate.
      • Use filegroups for administration requirements such as backup / restore, partial database availability, etc.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 11. Choose partition grain carefully
      • Most customers use month, quarter, or year.
      • For efficient deletes, you must delete one full partition at a time.
      • It is faster to load a complete partition at a time.
      • Daily partitions for daily loads may be an attractive option.
      • However, keep in mind that a table can have a maximum of 1000 partitions (the limit in SQL Server 2005 and 2008; SQL Server 2012 and later allow up to 15,000).
      • Partition grain affects query parallelism.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 12. Choose partition grain carefully
      • For SQL Server 2005:
          • Queries touching a single partition can parallelize up to MAXDOP (maximum degree of parallelism).
          • Queries touching multiple partitions use one thread per partition up to MAXDOP.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 13. Choose partition grain carefully
      • For SQL Server 2008:
          • Parallel threads up to MAXDOP are distributed proportionally to scan partitions, and multiple threads per partition may be used even when several partitions must be scanned.
      • Avoid a partition design where only 2 or 3 partitions are touched by frequent queries, if you need MAXDOP parallelism (assuming MAXDOP = 4 or larger).
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
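The trade-off between partition grain, the partition limit, and parallelism can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original slides: the function names are hypothetical, the 1000-partition limit is the figure quoted above, and `max_threads_sql2005` only encodes the SQL Server 2005 rule as the slides state it.

```python
from datetime import date

MAX_PARTITIONS = 1000  # per-table limit quoted on the slide


def partitions_needed(first_day: date, last_day: date, grain: str) -> int:
    """Count the partitions needed to cover first_day..last_day at a given grain."""
    if grain == "day":
        return (last_day - first_day).days + 1
    if grain == "month":
        return (last_day.year - first_day.year) * 12 + (last_day.month - first_day.month) + 1
    if grain == "quarter":
        first_q = first_day.year * 4 + (first_day.month - 1) // 3
        last_q = last_day.year * 4 + (last_day.month - 1) // 3
        return last_q - first_q + 1
    if grain == "year":
        return last_day.year - first_day.year + 1
    raise ValueError(f"unknown grain: {grain}")


def max_threads_sql2005(partitions_touched: int, maxdop: int) -> int:
    """Upper bound on threads under the SQL Server 2005 rule described above."""
    if partitions_touched <= 1:
        return maxdop  # a single partition can parallelize up to MAXDOP
    return min(partitions_touched, maxdop)  # one thread per partition, capped


# Three years of history (2004 through 2006) at each grain.
for grain in ("day", "month", "quarter", "year"):
    count = partitions_needed(date(2004, 1, 1), date(2006, 12, 31), grain)
    print(grain, count, "over limit" if count > MAX_PARTITIONS else "ok")

# A query touching 2 partitions with MAXDOP = 8 gets at most 2 threads.
print(max_threads_sql2005(2, 8))
```

Under these assumptions, three years of daily partitions already exceed the quoted limit, while a query that touches only two partitions cannot use more than two threads on SQL Server 2005.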
  • 14. Design dimension tables appropriately
      • Use integer surrogate keys for all dimensions, other than the Date dimension.
      • Use the smallest possible integer for the dimension surrogate keys. This helps to keep the fact table narrow.
      • Use a meaningful date key of integer type derivable from the DATETIME data type (for example: 20060215).
      • Don't use a surrogate key for the Date dimension.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
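The date key and key-size advice above can be illustrated with a short Python sketch. The function names are hypothetical; the integer ranges are those of the SQL Server `tinyint`, `smallint`, `int`, and `bigint` types, counting only non-negative key values.

```python
from datetime import date


def date_key(d: date) -> int:
    """Derive a meaningful integer key of the form YYYYMMDD from a date."""
    return d.year * 10000 + d.month * 100 + d.day


def key_to_date(key: int) -> date:
    """Recover the calendar date from a YYYYMMDD integer key."""
    return date(key // 10000, key // 100 % 100, key % 100)


def smallest_int_type(max_key: int) -> str:
    """Pick the narrowest SQL Server integer type that can hold max_key."""
    if max_key <= 255:
        return "tinyint"
    if max_key <= 32_767:
        return "smallint"
    if max_key <= 2_147_483_647:
        return "int"
    return "bigint"


print(date_key(date(2006, 2, 15)))   # 20060215, the example from the slide
print(smallest_int_type(5_000))      # smallint is enough for 5,000 members
```

Because the key is derived directly from the date, a predicate on a date range can be written against the fact table without joining to the Date dimension.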
  • 15. Design dimension tables appropriately
      • Build a clustered index on the surrogate key for each dimension table.
      • Build a non-clustered index on the Business Key (potentially combined with a row-effective-date) to support surrogate key lookups during loads.
      • Build nonclustered indexes on other frequently searched dimension columns.
      • Avoid partitioning dimension tables.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 16. Design dimension tables appropriately
      • Avoid enforcing foreign key relationships between the fact and the dimension tables, to allow faster data loads.
      • You can create foreign key constraints with NOCHECK to document the relationships, but don't enforce them.
      • Ensure data integrity through Transform Lookups, or perform the data integrity checks at the source of the data.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
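When foreign keys are not enforced, the surrogate key lookup during the load is what guards integrity. The sketch below models that lookup in plain Python as an illustration only: the column names (`customer_code`, `customer_key`, `amount`) are hypothetical, and the dictionary stands in for the non-clustered index on the business key.

```python
def resolve_surrogate_keys(fact_rows, dimension_lookup):
    """Split incoming fact rows into loadable rows and rejects.

    dimension_lookup maps business key -> surrogate key, playing the role
    of the business-key index used for lookups during loads.
    """
    resolved, rejected = [], []
    for row in fact_rows:
        surrogate = dimension_lookup.get(row["customer_code"])
        if surrogate is None:
            rejected.append(row)  # no matching dimension member: do not load
        else:
            resolved.append({"customer_key": surrogate, "amount": row["amount"]})
    return resolved, rejected


lookup = {"C-001": 1, "C-002": 2}
incoming = [
    {"customer_code": "C-001", "amount": 10.0},
    {"customer_code": "C-999", "amount": 5.0},  # unknown business key
]
loadable, rejects = resolve_surrogate_keys(incoming, lookup)
print(len(loadable), len(rejects))
```

Rows that fail the lookup never reach the fact table, which is why the constraint itself can stay unenforced.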
  • 17. Write effective queries for partition elimination
      • Whenever possible, place a query predicate (WHERE condition) directly on the partitioning key (Date dimension key) of the fact table.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
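To see why a predicate on the partitioning key matters, the following Python sketch models which partitions a query has to read. It is a simplified model, not SQL Server's optimizer: it assumes a RANGE RIGHT partition function (each boundary value belongs to the partition on its right) and uses hypothetical function names.

```python
from bisect import bisect_right


def partition_number(boundaries, key):
    """1-based partition number for key, assuming RANGE RIGHT boundaries."""
    return bisect_right(boundaries, key) + 1


def partitions_scanned(boundaries, low=None, high=None):
    """Partitions a query must read for low <= key <= high.

    With no predicate on the partitioning key (low and high both None),
    every partition has to be scanned.
    """
    total = len(boundaries) + 1
    first = 1 if low is None else partition_number(boundaries, low)
    last = total if high is None else partition_number(boundaries, high)
    return list(range(first, last + 1))


monthly = [20060101, 20060201, 20060301]  # three boundaries, four partitions

print(partitions_scanned(monthly, 20060115, 20060120))  # predicate on the date key
print(partitions_scanned(monthly))                      # no predicate on the date key
```

In this model the query with the date-key predicate reads a single partition, while the query without it reads all four.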
  • 18. Use Sliding Window technique to maintain data
      • Maintain a rolling time window for online access to the fact tables. Load newest data, unload oldest data.
      • Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement.
      • Avoid split or merge of populated partitions. Splitting or merging populated partitions can be extremely inefficient, as this may cause as much as 4 times more log generation, and also cause severe locking.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 19. Use Sliding Window technique to maintain data
      • Create the load staging table in the same filegroup as the partition you are loading.
      • Create the unload staging table in the same filegroup as the partition you are deleting.
      • It is fastest to load the newest full partition at one time, but this is only possible when the partition size is equal to the data load frequency (for example, you have one partition per day, and you load data once per day).
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 20. Use Sliding Window technique to maintain data
      • If the partition size doesn't match the data load frequency, incrementally load the latest partition.
      • Various options for loading bulk data into a partitioned table are discussed in the whitepaper http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx.
      • Always unload one partition at a time.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
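The sliding window steps can be modelled in a few lines of Python to show why the empty partitions at both ends matter. This is a toy model under stated assumptions, not SQL Server code: boundaries follow RANGE RIGHT semantics, each partition is represented only by its row count, and the class and method names are hypothetical. In SQL Server the corresponding operations are the partition function SPLIT and MERGE and the table SWITCH.

```python
from bisect import bisect_right


class SlidingWindow:
    """Toy model of a partitioned fact table (RANGE RIGHT boundaries)."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)
        self.rows = [0] * (len(self.boundaries) + 1)  # row count per partition

    def split_at_end(self, new_boundary):
        """Add a boundary above the current range; the last partition must be empty."""
        if new_boundary <= self.boundaries[-1]:
            raise ValueError("new boundary must extend the range")
        if self.rows[-1]:
            raise RuntimeError("last partition is populated; a split would move data")
        self.boundaries.append(new_boundary)
        self.rows.append(0)

    def switch_in(self, key, row_count):
        """Switch a loaded staging table into the (empty) partition holding key."""
        index = bisect_right(self.boundaries, key)
        if self.rows[index]:
            raise RuntimeError("target partition must be empty")
        self.rows[index] = row_count

    def switch_out_oldest(self):
        """Switch the oldest populated partition out to an unload staging table."""
        unloaded, self.rows[1] = self.rows[1], 0
        return unloaded

    def merge_at_start(self):
        """Remove the lowest boundary; both partitions around it must be empty."""
        if self.rows[0] or self.rows[1]:
            raise RuntimeError("merging populated partitions would move data")
        del self.boundaries[0]
        del self.rows[0]


window = SlidingWindow([20060101, 20060201, 20060301])
window.switch_in(20060101, 1_000)  # January
window.switch_in(20060201, 1_200)  # February

window.split_at_end(20060401)      # make room before loading March
window.switch_in(20060301, 1_100)  # load March
unloaded = window.switch_out_oldest()  # unload January
window.merge_at_start()            # drop the now-empty January range

print(window.boundaries, window.rows, unloaded)
```

After one slide the window again has an empty partition at each end, so the next split and merge also touch only empty partitions.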
  • 21. Efficiently load the initial data
      • Use SIMPLE or BULK_LOGGED recovery model during the initial data load.
      • Create the partitioned fact table with the clustered index.
      • Create non-indexed staging tables for each partition, and separate source data files for populating each partition.
      • Populate the staging tables in parallel.
      • Use multiple BULK INSERT, BCP or SSIS tasks.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 22. Efficiently load the initial data
      • Create as many load scripts to run in parallel as there are CPUs, if there is no IO bottleneck. If IO bandwidth is limited, use fewer scripts in parallel.
      • Use 0 batch size in the load. Use 0 commit size in the load.
      • Use TABLOCK.
      • Use BULK INSERT if the sources are flat files on the same server. Use BCP or SSIS if data is being pushed from remote machines.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 23. Efficiently load the initial data
      • Build a clustered index on each staging table, then create appropriate CHECK constraints.
      • SWITCH all partitions into the partitioned table.
      • Build nonclustered indexes on the partitioned table.
      • It is possible to load 1 TB in under an hour on a 64-CPU server with a SAN capable of 14 GB/sec throughput (non-indexed table).
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
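The order of the initial-load steps above can be sketched as a small Python simulation. Everything here is a stand-in chosen for illustration: in-memory lists play the role of source files and staging tables, sorting stands in for building the clustered index, the range test stands in for the CHECK constraint, and the dictionary assignment stands in for SWITCH. The names `load_staging`, `initial_load`, and `max_parallel` are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def load_staging(source_rows):
    """Stand-in for a bulk load of one source file into a non-indexed staging table."""
    return list(source_rows)


def initial_load(sources, max_parallel=None):
    """sources maps a month key (YYYYMM) to the date keys (YYYYMMDD) in its file."""
    # One loader per CPU unless the caller lowers it because IO is the bottleneck.
    workers = max_parallel or os.cpu_count() or 1
    months = list(sources)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        staged = dict(zip(months, pool.map(load_staging, (sources[m] for m in months))))

    fact = {}
    for month, rows in staged.items():
        rows.sort()  # stand-in for building the clustered index after the load
        if any(key // 100 != month for key in rows):  # stand-in for the CHECK constraint
            raise ValueError(f"staging data for {month} is outside its partition range")
        fact[month] = rows  # stand-in for SWITCH into the partitioned table
    return fact


fact = initial_load({200601: [20060115, 20060102], 200602: [20060210]})
print(fact)
```

The point of the ordering is that the expensive work (loading and indexing) happens per partition and in parallel, while the final switch is a metadata-only step.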
  • 24. Efficiently delete old data
      • Use partition switching whenever possible.
      • To delete millions of rows from nonpartitioned, indexed tables:
          • Avoid DELETE FROM ... WHERE ...
              • Huge locking and logging issues
              • Long rollback if the delete is canceled
          • Usually faster to:
              • INSERT the records to keep into a non-indexed table
              • Create index(es) on the table
              • Rename the new table to replace the original
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
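The copy, index, and rename sequence above can be tried end to end with Python's built-in `sqlite3` module. SQLite is used here only as a runnable stand-in for SQL Server, so the SQL dialect differs, and the table, column, and index names (`fact`, `fact_keep`, `date_key`, `amount`) are hypothetical.

```python
import sqlite3


def purge_by_rebuild(conn, cutoff_key):
    """Keep rows with date_key >= cutoff_key by rebuilding the table."""
    cur = conn.cursor()
    # 1. Insert the records to keep into a new, non-indexed table.
    cur.execute("CREATE TABLE fact_keep (date_key INTEGER, amount REAL)")
    cur.execute(
        "INSERT INTO fact_keep SELECT date_key, amount FROM fact WHERE date_key >= ?",
        (cutoff_key,),
    )
    # 2. Create the index(es) only after the data is in place.
    cur.execute("CREATE INDEX ix_fact_keep_date ON fact_keep (date_key)")
    # 3. Replace the original table with the new one.
    cur.execute("DROP TABLE fact")
    cur.execute("ALTER TABLE fact_keep RENAME TO fact")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (date_key INTEGER, amount REAL)")
conn.execute("CREATE INDEX ix_fact_date ON fact (date_key)")
conn.executemany(
    "INSERT INTO fact VALUES (?, ?)",
    [(20050101 + i, 1.0) for i in range(20)] + [(20060101 + i, 2.0) for i in range(5)],
)
conn.commit()

purge_by_rebuild(conn, 20060101)
print(conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0])
```

The old rows are never deleted one by one; they simply disappear when the original table is dropped.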
  • 25. Efficiently delete old data
      • As an alternative, 'trickle' deletes using the following repeatedly in a loop: DELETE TOP (1000) ... ; COMMIT
      • Another alternative is to update the row to mark it as deleted, then delete it later during non-critical time.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
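The trickle-delete loop can be sketched with Python's `sqlite3` module. This is an illustration under assumptions, not the SQL Server statement itself: SQLite has no `DELETE TOP (n)`, so the batch is limited with a `rowid` subquery instead, and the table and column names are hypothetical.

```python
import sqlite3


def trickle_delete(conn, cutoff_key, batch_size=1000):
    """Delete rows older than cutoff_key in small batches, committing each one.

    Returns the number of batches that actually deleted rows.
    """
    batches = 0
    while True:
        cur = conn.execute(
            "DELETE FROM fact WHERE rowid IN "
            "(SELECT rowid FROM fact WHERE date_key < ? LIMIT ?)",
            (cutoff_key, batch_size),
        )
        conn.commit()  # short transactions keep locking and log growth small
        if cur.rowcount == 0:
            break  # nothing left to delete
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact (date_key INTEGER, amount REAL)")
conn.executemany("INSERT INTO fact VALUES (?, ?)", [(20050101, 1.0)] * 2500)
conn.executemany("INSERT INTO fact VALUES (?, ?)", [(20060101, 2.0)] * 500)
conn.commit()

batches = trickle_delete(conn, 20060101)
remaining = conn.execute("SELECT COUNT(*) FROM fact").fetchone()[0]
print(batches, remaining)
```

If the loop is interrupted, only the current small batch rolls back, which is the advantage over a single large DELETE.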
  • 26. Manage statistics manually
      • Statistics on partitioned tables are maintained for the table as a whole.
      • Manually update statistics on large fact tables after loading new data.
      • Manually update statistics after rebuilding an index on a partition.
      • If you regularly update statistics after periodic loads, you may turn off autostats on that table.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 27. Manage statistics manually
      • This is important for optimizing queries that may need to read only the newest data.
      • Updating statistics on small dimension tables after incremental loads may also help performance.
      • Use the FULLSCAN option when updating statistics on dimension tables for more accurate query plans.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 28. Consider efficient backup strategies
      • Backing up the entire database may take a significant amount of time for a very large database.
      • For example, backing up a 2 TB database to a 10-spindle RAID-5 disk on a SAN may take 2 hours (at a rate of 275 MB/sec).
      • Snapshot backup using SAN technology is a very good option.
      • Reduce the volume of data to back up regularly.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 29. Consider efficient backup strategies
      • The filegroups for the historical partitions can be marked as READ ONLY.
      • Perform a filegroup backup once when a filegroup becomes read-only.
      • Perform regular backups only on the read / write filegroups.
      • Note that RESTOREs of the read-only filegroups cannot be performed in parallel.
      Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
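The backup figures above follow from simple throughput arithmetic, sketched here in Python. The 2 TB size and the 275 MB/sec rate come from the slide; the 0.25 TB read / write portion is a hypothetical figure chosen only to show the effect of backing up the read-only filegroups once.

```python
def backup_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Estimated backup duration in hours at a sustained throughput."""
    size_mb = size_tb * 1024 * 1024
    return size_mb / rate_mb_per_sec / 3600


full = backup_hours(2.0, 275)               # entire 2 TB database
read_write_only = backup_hours(0.25, 275)   # hypothetical read / write filegroups only

print(f"full database:        {full:.2f} h")
print(f"read/write filegroups: {read_write_only:.2f} h")
```

At the quoted rate the full backup comes out at roughly two hours, consistent with the slide, and the regular backup shrinks in proportion to the data that is still read / write.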