Data Warehouse Design Considerations
Ram Kedem
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
  1. Data Warehouse Design Considerations (Ram Kedem)
  2. Slowly Changing Dimensions
     • Type 1 SCD
       • OLTP updates are moved into the DW
       • Any change overwrites the current DW data
       • The actual history of past data is lost
       • Historical data may be changed when it does not contain important business details (such as store location)
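
The Type 1 behaviour described above can be sketched in T-SQL as follows. This is a minimal illustration, not the author's implementation: the table `dbo.DimCustomer_T1`, its columns, and the sample values are hypothetical.

```sql
-- Hypothetical dimension table; names and values are illustrative only.
CREATE TABLE dbo.DimCustomer_T1 (
    CustomerID   INT           NOT NULL PRIMARY KEY,  -- business key from the OLTP system
    CustomerName NVARCHAR(100) NOT NULL,
    City         NVARCHAR(50)  NOT NULL
);

INSERT INTO dbo.DimCustomer_T1 (CustomerID, CustomerName, City)
VALUES (42, N'Dana Levi', N'Haifa');

-- Type 1 change: overwrite in place.
-- After this statement the previous city can no longer be recovered from the DW.
UPDATE dbo.DimCustomer_T1
SET    City = N'Tel Aviv'
WHERE  CustomerID = 42;
```

Because the row is updated in place, the business key can remain the primary key and no extra columns are needed.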
  3. Slowly Changing Dimensions
     • Type 2 SCD
       • Data is not overwritten in the DW
       • A new row for the customer must be inserted
       • This usually creates primary key issues
         • For example, if a customer's details change, this approach suggests inserting another row in the dimension for the same customer
       • You must add a surrogate key (DWH key)
         • An incrementing number for each update; the same idea as a primary key that consists of two columns
       • You must also add another column or two
         • To flag the current value
         • To provide a date/time perspective
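
One possible shape for a Type 2 dimension is sketched below, assuming a hypothetical `dbo.DimCustomer` table. The surrogate key, the current-row flag, and the validity dates correspond to the extra columns listed above; all names and values are assumptions made for illustration.

```sql
-- Hypothetical Type 2 dimension; column names are illustrative only.
CREATE TABLE dbo.DimCustomer (
    CustomerKey  INT IDENTITY(1,1) NOT NULL PRIMARY KEY,  -- surrogate (DWH) key
    CustomerID   INT           NOT NULL,                  -- business key, repeats once per version
    CustomerName NVARCHAR(100) NOT NULL,
    City         NVARCHAR(50)  NOT NULL,
    ValidFrom    DATETIME2(0)  NOT NULL,                  -- date/time perspective
    ValidTo      DATETIME2(0)  NULL,                      -- NULL while the row is current
    IsCurrent    BIT           NOT NULL                   -- flags the current version
);

INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES (42, N'Dana Levi', N'Haifa', '2014-01-01', NULL, 1);

-- The customer moves: expire the current row, then insert a new version.
DECLARE @Now DATETIME2(0) = SYSDATETIME();

BEGIN TRANSACTION;

UPDATE dbo.DimCustomer
SET    ValidTo = @Now, IsCurrent = 0
WHERE  CustomerID = 42 AND IsCurrent = 1;

INSERT INTO dbo.DimCustomer (CustomerID, CustomerName, City, ValidFrom, ValidTo, IsCurrent)
VALUES (42, N'Dana Levi', N'Tel Aviv', @Now, NULL, 1);

COMMIT TRANSACTION;
```

After the change, the table holds two rows for `CustomerID = 42`, each with its own `CustomerKey`; fact rows reference whichever version was current when they were loaded.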
  4. Slowly Changing Dimensions
     • Type 1 SCD
  5. Slowly Changing Dimensions
     • Type 2 SCD
  6. Understanding Indexing
     • Indexing affects how data is stored and managed in SQL Server
     • There are four main indexing options in SQL Server
       • Clustered index
       • Nonclustered index
       • Filtered nonclustered index
       • Columnstore index (include)
  7. Understanding Indexing
     • Clustered index
       • Determines the physical storage order of the data
       • There can be only one clustered index on a table
     • Nonclustered index
       • Sorts data in a column or columns and stores pointers to the actual data row
       • A table can have up to 999 nonclustered indexes
  8. Understanding Indexing
     • Filtered nonclustered index
       • Creates a nonclustered index on a subset of values in a column
     • Columnstore index
       • A nonclustered index placed on one or more columns
       • Each column is stored and searched separately from the data row
       • In SQL Server 2012, adding a nonclustered columnstore index makes the table read-only
       • https://www.simple-talk.com/sql/database-administration/columnstore-indexes-in-sql-server-2012/
  9. Columnstore Index

         CREATE NONCLUSTERED COLUMNSTORE INDEX csi_products
         ON dbo.products (productName, UnitPrice, unitsinstock);

         SELECT productName, UnitPrice, unitsinstock
         FROM products;
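
For comparison with the columnstore example, the other three index types described in the preceding slides might be created as in the sketch below. The table `dbo.Orders` and all index names are hypothetical.

```sql
-- Hypothetical table used only to illustrate the index types.
CREATE TABLE dbo.Orders (
    OrderID     INT  NOT NULL,
    CustomerID  INT  NOT NULL,
    OrderDate   DATE NOT NULL,
    ShippedDate DATE NULL
);

-- Clustered index: defines the storage order of the rows; only one per table.
CREATE UNIQUE CLUSTERED INDEX cix_Orders_OrderID
ON dbo.Orders (OrderID);

-- Nonclustered index: a separate sorted structure that points back to the rows.
CREATE NONCLUSTERED INDEX ix_Orders_CustomerID
ON dbo.Orders (CustomerID);

-- Filtered nonclustered index: covers only the rows that match the WHERE clause.
CREATE NONCLUSTERED INDEX ix_Orders_Unshipped
ON dbo.Orders (OrderDate)
WHERE ShippedDate IS NULL;
```

The filtered index stays small because shipped orders are excluded, which suits queries that only look at open orders.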
  10. Indexing the Data Warehouse
      • Indexing in the data warehouse can be tricky
      • Too few indexes allow data loads to be quick, but query response time will be slow
      • Too many indexes slow down loads and increase storage requirements, but query response is good
  11. Indexing the Data Warehouse
      • General rule of thumb
        • Dimension tables
          • Place a clustered index on the surrogate key
          • If the table has a lot of columns, create nonclustered indexes on the most popular columns
        • Fact tables
          • Place a nonclustered index on each single-column foreign key to the dimension tables
          • If the primary key is a composite of all the dimension foreign keys, make it a non-unique clustered index
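
The rule of thumb above could translate into T-SQL roughly as follows. This is a sketch under assumed names: `dbo.DimProduct`, `dbo.FactSales`, and their columns are hypothetical, and which dimension columns count as "popular" depends on the actual query workload.

```sql
-- Hypothetical dimension table.
CREATE TABLE dbo.DimProduct (
    ProductKey  INT IDENTITY(1,1) NOT NULL,   -- surrogate key
    ProductID   INT           NOT NULL,       -- business key
    ProductName NVARCHAR(100) NOT NULL,
    Category    NVARCHAR(50)  NOT NULL
);

-- Dimension: clustered index on the surrogate key.
CREATE UNIQUE CLUSTERED INDEX cix_DimProduct_ProductKey
ON dbo.DimProduct (ProductKey);

-- Dimension: nonclustered index on a frequently filtered column (assumed here to be Category).
CREATE NONCLUSTERED INDEX ix_DimProduct_Category
ON dbo.DimProduct (Category);

-- Hypothetical fact table keyed by its dimension foreign keys.
CREATE TABLE dbo.FactSales (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    StoreKey    INT   NOT NULL,
    SalesAmount MONEY NOT NULL
);

-- Fact: non-unique clustered index on the composite of the dimension keys.
CREATE CLUSTERED INDEX cix_FactSales_Keys
ON dbo.FactSales (DateKey, ProductKey, StoreKey);

-- Fact: one nonclustered index per single-column foreign key.
CREATE NONCLUSTERED INDEX ix_FactSales_DateKey    ON dbo.FactSales (DateKey);
CREATE NONCLUSTERED INDEX ix_FactSales_ProductKey ON dbo.FactSales (ProductKey);
CREATE NONCLUSTERED INDEX ix_FactSales_StoreKey   ON dbo.FactSales (StoreKey);
```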
  12. Understanding Indexed Views
      • What is a view
        • A result set of a query that is a virtual table
        • The virtual table is not stored permanently in the database
        • The view can be referenced like a table in T-SQL
      • Indexing a view
        • You can create a unique clustered index on a view
        • The view's result set gets stored in the database, just like a regular table with a clustered index
  13. Understanding Indexed Views
      • Advantages
        • Improve the performance of joins and aggregations that process many rows
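
A minimal sketch of an indexed view follows, assuming a hypothetical `dbo.SalesDetail` table. SQL Server requires the view to be created `WITH SCHEMABINDING` and, when it uses `GROUP BY`, to include `COUNT_BIG(*)`; `GO` is the batch separator understood by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Session settings that indexed views require.
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT,
    CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO

-- Hypothetical base table.
CREATE TABLE dbo.SalesDetail (
    ProductKey INT   NOT NULL,
    Quantity   INT   NOT NULL,
    Amount     MONEY NOT NULL
);
GO

-- The view must be schema-bound and reference tables with two-part names.
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductKey,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt        -- required when the view uses GROUP BY
FROM   dbo.SalesDetail
GROUP BY ProductKey;
GO

-- The unique clustered index materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX cix_vSalesByProduct
ON dbo.vSalesByProduct (ProductKey);
GO
```

Once the index exists, the aggregated rows are maintained as the base table changes, so queries on `dbo.vSalesByProduct` can read the stored totals instead of re-aggregating `dbo.SalesDetail`.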
  14. Understanding Data Compression
      • SQL Server 2012 supports data compression
      • Data compression reduces the size of the database
        • Packs more data onto fewer data pages
        • Fewer data page reads are required to satisfy queries
        • Lower I/O means faster response and a lower processing load on the server
      • Extra CPU resources are required for data compression and decompression
        • A DWH usually doesn't have many updates (other than bulk loading)
  15. Understanding Data Compression
      • SQL Server 2012 supports three compression types
        • Page compression
          • Focuses on duplicated values within the data page
          • Stores one value and places a pointer at all other locations
        • Row compression
          • Removes any unused bytes in a fixed-length data type, for example CHAR(25)
        • Unicode compression
          • Reduces storage space for Unicode data that doesn't require that space
  16. Understanding Data Compression
      • Which compression should you use?
        • Page compression
          • Row compression is applied automatically when page compression is used
          • Choosing row compression on its own does not apply page compression
      • Fact tables usually benefit the most from compression
      • Compression is only available in SQL Server Enterprise Edition
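
The compression choices above might be applied as in the following sketch. The table `dbo.FactSalesCompressed` is hypothetical; `sp_estimate_data_compression_savings` is the system procedure SQL Server provides for estimating the benefit before rebuilding.

```sql
-- Hypothetical fact table created with row compression.
CREATE TABLE dbo.FactSalesCompressed (
    DateKey     INT   NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
)
WITH (DATA_COMPRESSION = ROW);

-- Estimate how much space page compression would save for this table.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactSalesCompressed',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

-- Switch the table to page compression (row compression is included).
ALTER TABLE dbo.FactSalesCompressed
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The rebuild rewrites the whole table, so on a large fact table it is usually scheduled outside the load window.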
  17. Data Lineage
      • What is data lineage
        • Data origination and flow details
        • Where the data is from, where it is going, and how it is transformed in the process
        • Same concept as comments in programming
  18. Data Lineage
      • Why do we need data lineage
        • To provide metadata context in the DWH
        • Future business rules may change, affecting some data
          • Making it invalid
          • Making it suspect
          • Making it more important
        • Data lineage allows us to identify this data
  19. Data Lineage
      • Two main options for adding data lineage
        • SSIS system variables
        • T-SQL system functions
  20. Data Lineage using T-SQL

          SELECT APP_NAME(),               -- application that opened the session
                 DATABASE_PRINCIPAL_ID(),  -- ID of the current database user
                 USER_NAME(),              -- current database user name
                 SUSER_NAME(),             -- current login name
                 GETDATE(),                -- current date and time
                 CURRENT_TIMESTAMP,        -- ANSI equivalent of GETDATE(); takes no parentheses
                 CONNECTIONPROPERTY('client_net_address');  -- client network address
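
One way to persist these values is to add lineage columns to the target table and let default constraints fill them in at load time. The sketch below assumes a hypothetical `dbo.FactSalesAudit` table; the column names and the inserted value are illustrative only.

```sql
-- Hypothetical fact table with lineage columns.
CREATE TABLE dbo.FactSalesAudit (
    SalesKey      INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SalesAmount   MONEY         NOT NULL,
    LoadedAt      DATETIME      NOT NULL DEFAULT (GETDATE()),     -- when the row was loaded
    LoadedBy      SYSNAME       NOT NULL DEFAULT (SUSER_NAME()),  -- login that loaded it
    LoadedByApp   NVARCHAR(128) NULL     DEFAULT (APP_NAME()),    -- application that loaded it
    ClientAddress VARCHAR(48)   NULL                              -- filled in explicitly below
);

-- The defaults capture time, login, and application;
-- the client address is supplied by the load statement itself.
INSERT INTO dbo.FactSalesAudit (SalesAmount, ClientAddress)
VALUES (19.90, CONVERT(VARCHAR(48), CONNECTIONPROPERTY('client_net_address')));

SELECT SalesKey, SalesAmount, LoadedAt, LoadedBy, LoadedByApp, ClientAddress
FROM   dbo.FactSalesAudit;
```

If business rules change later, rows can be located by load time or by the process that loaded them.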
  21. Using Partitions
      • Fact tables become very large over time
      • Very large database tables present serious challenges
        • What if you need to delete a large portion of the data?
          • The TRUNCATE TABLE command deletes with minimal logging, but it removes every row in the table
        • Large data inserts become time-consuming
        • Index maintenance and storage can become problematic
      • Table partitions deal with all these issues
  22. Using Partitions
      • What is a table partition
        • A large table is stored in multiple files
        • Divided by rows (based on a condition)
          • Usually date/time
        • SQL Server 2012 allows up to 15,000 partitions on a single table
        • Partitions and data are managed in the background
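
A minimal partitioning sketch follows, assuming a hypothetical fact table partitioned by year on a `SaleDate` column. All object names and boundary dates are illustrative. For simplicity every partition is mapped to the `PRIMARY` filegroup; a production design would typically spread partitions across several filegroups.

```sql
-- Partition function: RANGE RIGHT puts each boundary value in the partition to its right.
-- Partition 1: before 2012; 2: 2012; 3: 2013; 4: 2014 and later.
CREATE PARTITION FUNCTION pfSalesYear (DATE)
AS RANGE RIGHT FOR VALUES ('2012-01-01', '2013-01-01', '2014-01-01');

-- Partition scheme: maps every partition to a filegroup.
CREATE PARTITION SCHEME psSalesYear
AS PARTITION pfSalesYear ALL TO ([PRIMARY]);

-- Hypothetical fact table stored on the partition scheme.
CREATE TABLE dbo.FactSalesPartitioned (
    SaleDate    DATE  NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
) ON psSalesYear (SaleDate);

-- Ask which partition a given date would fall into.
SELECT $PARTITION.pfSalesYear(CAST('2013-06-15' AS DATE)) AS PartitionNumber;

-- Remove the oldest partition's rows with minimal logging:
-- switch the partition into an empty table of identical structure, then truncate it.
CREATE TABLE dbo.FactSalesPartitioned_Old (
    SaleDate    DATE  NOT NULL,
    ProductKey  INT   NOT NULL,
    SalesAmount MONEY NOT NULL
) ON [PRIMARY];

ALTER TABLE dbo.FactSalesPartitioned
SWITCH PARTITION 1 TO dbo.FactSalesPartitioned_Old;

TRUNCATE TABLE dbo.FactSalesPartitioned_Old;
```

The switch is a metadata operation, so clearing a partition this way avoids both a row-by-row `DELETE` and truncating the whole fact table.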
  23. Using Partitions
  24. Identifying our Dimensions / Fact Tables