Data Warehouse Basics 
Ram Kedem
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com 
Data Warehouse Basics 
•Data Usage Challenges 
•OLAP vs. OLTP 
•Understanding Normalization 
•OLAP 
•Star Schema Basics 
•Snowflake Schema Basics 
•Understanding Granularity 
•Auditing
Data Usage Challenges 
•Databases are usually divided into two types: OLTP and OLAP
OLAP vs. OLTP 
OLTP system: Online Transaction Processing (operational system) 
OLAP system: Online Analytical Processing (data warehouse) 
Source of data 
•OLTP: Operational data; OLTPs are the original source of the data. 
•OLAP: Consolidation data; OLAP data comes from the various OLTP databases. 
Purpose of data 
•OLTP: To control and run fundamental business tasks. 
•OLAP: To help with planning, problem solving, and decision support. 
What the data reveals 
•OLTP: A snapshot of ongoing business processes. 
•OLAP: Multi-dimensional views of various kinds of business activities. 
Inserts and updates 
•OLTP: Short and fast inserts and updates initiated by end users. 
•OLAP: Periodic long-running batch jobs refresh the data. 
Queries 
•OLTP: Relatively standardized and simple queries returning relatively few records. 
•OLAP: Often complex queries involving aggregations. 
Processing speed 
•OLTP: Typically very fast. 
•OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes. 
Space requirements 
•OLTP: Can be relatively small if historical data is archived. 
•OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP. 
Database design 
•OLTP: Highly normalized with many tables. 
•OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas. 
Backup and recovery 
•OLTP: Backup religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability. 
•OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Data Usage Challenges 
•Databases start out as OLTP (99.99% of the time) 
•OLAP functionality becomes a need as data accumulates 
•At some point two databases are required 
•The OLTP captures and manages daily transactions 
•The OLAP is periodically loaded with data from OLTP
Understanding Normalization 
•What is normalization? 
•The process of organizing the tables in a relational Database 
•Eliminates data redundancy 
•Lowers record locking 
•Increases efficiency in concurrency 
•Accomplished by dividing large tables into smaller tables 
•Tables have relationships defined
Understanding Normalization 
•Form zero
Understanding Normalization 
•First Form 
•Break each field down to the smallest meaningful value 
•Remove repeating groups of data and create a separate table for each set of related data
Understanding Normalization 
•Second Form 
•Create new tables for data that applies to more than one record in a table 
•Add a related field (foreign key) to the table
Understanding Normalization 
•Third Form 
•Remove fields that do not relate to, or provide a fact about, the primary key. 
•Move the Manager, Dept, and Sector fields to another table, and add a field that establishes the relationship between the two tables.
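The third-form step above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the deck's own example: the table names (`employee`, `department`), the sample rows, and the assumption that Manager and Sector are determined by Dept are all hypothetical.

```python
import sqlite3

# Hypothetical denormalized rows: (employee, dept, manager, sector).
# Field names follow the slide; the values are invented for illustration.
flat_rows = [
    ("Alice", "Sales", "Dana", "Retail"),
    ("Bob", "Sales", "Dana", "Retail"),
    ("Carol", "Support", "Eli", "Services"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        dept    TEXT UNIQUE,
        manager TEXT,
        sector  TEXT
    );
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- the added relating field
    );
""")

for name, dept, manager, sector in flat_rows:
    # Store each department's details once; repeats of the same dept are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO department (dept, manager, sector) VALUES (?, ?, ?)",
        (dept, manager, sector),
    )
    dept_id = conn.execute(
        "SELECT dept_id FROM department WHERE dept = ?", (dept,)
    ).fetchone()[0]
    conn.execute("INSERT INTO employee (name, dept_id) VALUES (?, ?)", (name, dept_id))

dept_count = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]

# A join reassembles the original flat view from the two tables.
joined = conn.execute("""
    SELECT e.name, d.dept, d.manager, d.sector
    FROM employee e JOIN department d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
```

The manager and sector of the Sales department are now stored once instead of once per employee, at the cost of a join whenever the full picture is needed.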
Normalized Structure Challenges 
•It is usually very inefficient for data extraction 
•Usually requires multiple table joins to reach all the data 
•Join queries can be challenging to write 
•Join queries can be challenging for the database engine 
•It doesn’t store data in the form needed for data analysis 
•Data is stored in the most detailed form, without aggregation 
•Data may be stored in multiple, normalized Databases
Star Schema Basics 
•What is a star schema? 
•The simplest form of database structure used in a DWH 
•Answers the basic questions: 
•What happened, who did it, when did they do it, etc. 
•Focuses on a single business area 
•What advantages does a star schema offer? 
•Separates data into two main categories 
•Facts 
•Dimensions (descriptive information about the facts)
Star Schema Basics 
•Fact vs. Dimensions 
•Fact (what happened) 
•Product sold 
•Customer who bought 
•Etc. 
•Dimensions (Attributes that describe what happened) 
•When the product was sold 
•Day / Date / year / quarter 
•Where the product was sold
Understanding Fact Tables 
•A fact table is a collection of measurements 
•Note the word “measurements” 
•A measurement is usually a number, something we can measure about a specific business process. 
•A fact table contains one or more facts about a specific process (usually numeric) 
•Sales amount 
•Order quantity 
•Tax amount 
•Discount amount
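A small star schema built around the measures listed above might look like the following sketch, written with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_date`, `dim_product`), the key columns, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context for the facts.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, quarter INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

    -- Fact table: numeric measures plus one key per dimension.
    CREATE TABLE fact_sales (
        date_key        INTEGER REFERENCES dim_date(date_key),
        product_key     INTEGER REFERENCES dim_product(product_key),
        sales_amount    REAL,
        order_quantity  INTEGER,
        tax_amount      REAL,
        discount_amount REAL
    );
""")

conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20130915, "2013-09-15", 3), (20131002, "2013-10-02", 4)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Widget"), (2, "Gadget")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)", [
    (20130915, 1, 100.0, 2, 17.0, 0.0),
    (20130915, 2, 50.0, 1, 8.5, 5.0),
    (20131002, 1, 150.0, 3, 25.5, 0.0),
])

# A typical analytical query: aggregate the measures, grouped by a dimension attribute.
sales_by_quarter = conn.execute("""
    SELECT d.quarter, SUM(f.sales_amount), SUM(f.order_quantity)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.quarter
    ORDER BY d.quarter
""").fetchall()
```

Each analytical question then needs only one join per dimension it touches, which is the main practical difference from the multi-join queries of a fully normalized design.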
Understanding Fact Tables 
•A fact table may contain multiple measurements only if they are closely related. 
•A data warehouse will have many fact tables 
•Each table stores the data (measures) for one specific business area 
•Products sold fact table / shipment details fact table 
•Since fact table design depends both on science and on an understanding of the data, there are many ways to design a fact table.
Understanding Dimensions 
•Dimensions give context, or specific meaning, to measures (facts) 
•The term “Dimension” usually refers to a table of related dimensions.
Understanding Dimensions 
•Example : 
•A fact table contains the number of products sold 
•A date dimension table contains the following “dimensions” of the dates on which the products were sold 
•Date and time (15.09.2013 09:25:32) 
•Quarter (3) 
•Day of year (258) 
•Week (37, ISO numbering) 
•Weekday (Sunday)
Understanding Dimensions 
•Each individual column in a dimension table is an attribute. 
•Attributes usually compress or expand the level of data detail 
•Data can be “discretized” into smaller, summarized groups 
•Days (365 values) 
•Weeks (52 values) 
•Months (12 values) 
•Quarters (4 values) 
•Hour / minute / second, etc.
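Date attributes like these are normally derived from the timestamp when the dimension is loaded. The sketch below uses only Python's standard datetime module and the example timestamp 15.09.2013 09:25:32; the function name `date_attributes` and the dictionary keys are hypothetical, and the week number follows the ISO 8601 convention (other tools may number weeks differently).

```python
from datetime import datetime

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

def date_attributes(ts: datetime) -> dict:
    """Derive coarser-grained date attributes from a single timestamp."""
    _iso_year, iso_week, _iso_weekday = ts.isocalendar()
    return {
        "date_time": ts.strftime("%d.%m.%Y %H:%M:%S"),
        "quarter": (ts.month - 1) // 3 + 1,      # 4 possible values
        "month": ts.month,                       # 12 possible values
        "week": iso_week,                        # ISO 8601 week number
        "day_of_year": ts.timetuple().tm_yday,   # 365 or 366 possible values
        "weekday": WEEKDAY_NAMES[ts.weekday()],  # weekday() returns 0 for Monday
    }

attrs = date_attributes(datetime(2013, 9, 15, 9, 25, 32))
```

Grouping facts by `quarter` compresses a year of data into four values, while grouping by `day_of_year` expands it to daily detail.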
Snowflake Schema Basics 
•What is a snowflake schema? 
•A star schema with a little normalization added in 
•Dimension tables are normalized somewhat 
•Why use a snowflake schema? 
•To satisfy the data gathering functionality of more advanced data warehousing / mining tools 
•To logically separate large dimension tables 
•To more naturally separate dimensional data 
•Known customers vs. anonymous customers
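As a rough illustration of "a star schema with a little normalization added in", the sketch below splits a product dimension into a product table and a separate category table, using Python's built-in sqlite3 module. All table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The category attributes are moved out of the product dimension.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key  INTEGER REFERENCES dim_product(product_key),
        sales_amount REAL
    );
""")
conn.executemany("INSERT INTO dim_category VALUES (?, ?)",
                 [(1, "Hardware"), (2, "Software")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", 1), (2, "Gadget", 1), (3, "App", 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 30.0), (1, 20.0)])

# Reaching the category now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The extra join is the price of the normalized dimension, which is one reason to snowflake only when there is a concrete need.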
Snowflake Schema Basics 
•One main rule concerning snowflake schema 
•Don’t use it unless you want to or need to.
Understanding Granularity 
•What is meant by the term granularity in a DWH? 
•The level of detail available 
•What determines granularity? 
•The level of data loaded into the fact table 
•For example, per-order numbers vs. daily numbers vs. weekly numbers, etc. 
•The number and detail level of dimensions 
•If we want to look into customer details but have no customer dimension, this data won’t be available
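The effect of the grain loaded into the fact table can be shown with a short Python sketch. The order rows and the helper name `roll_up_daily` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows at per-order grain: (order_date, order_id, sales_amount).
orders = [
    ("2013-09-15", "order-1", 100.0),
    ("2013-09-15", "order-2", 50.0),
    ("2013-09-16", "order-3", 75.0),
]

def roll_up_daily(rows):
    """Aggregate per-order rows to one total per day (a coarser grain)."""
    totals = defaultdict(float)
    for day, _order_id, amount in rows:
        totals[day] += amount
    return dict(totals)

daily = roll_up_daily(orders)
```

Per-order rows can always be rolled up to daily totals, but daily totals cannot be broken back down into orders, so a fact table loaded at the daily grain can never answer per-order questions.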
Understanding Granularity 
•Granularity should be determined during database design. 
•It can also be changed after the database has been created, but that requires much more effort. 
•Such a change may involve 
•Changing the fact table structure 
•Possible changes in dimension tables 
•Changes in data loading
Auditing 
•Data warehouses do not store data as it is created. 
•OLTP databases are populated as business occurs 
•The source and purpose of the data are generally self-explanatory 
•Data is added when a transaction occurs 
•DWHs are populated from OLTP data 
•Based on various conditions 
•At various times 
•From various sources
Auditing 
•Data can be informative based on different aspects 
•The data itself 
•The source of the data 
•The volume of the data 
•These characteristics usually change over time 
•Auditing identifies these aspects 
•Audit data is usually stored in tables 
•Audit records describe the source, the duration of the load, who performed the load, etc.
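An audit table of the kind described above could be sketched as follows with Python's built-in sqlite3 module. The table `load_audit`, its columns, the function `audited_load`, and the source and user names are all hypothetical; a real ETL tool would normally provide its own logging.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (order_id TEXT, sales_amount REAL);
    -- One row per load: where the data came from, who loaded it, how long it took.
    CREATE TABLE load_audit (
        load_id     INTEGER PRIMARY KEY,
        source      TEXT,
        loaded_by   TEXT,
        started_at  TEXT,
        finished_at TEXT,
        row_count   INTEGER
    );
""")

def audited_load(conn, source, loaded_by, rows):
    """Load rows into the fact table and record an audit row describing the load."""
    started = datetime.now(timezone.utc).isoformat()
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    finished = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO load_audit (source, loaded_by, started_at, finished_at, row_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (source, loaded_by, started, finished, len(rows)),
    )
    conn.commit()
    return cur.lastrowid

load_id = audited_load(conn, "orders_oltp", "etl_user",
                       [("order-1", 100.0), ("order-2", 50.0)])
audit_row = conn.execute(
    "SELECT source, loaded_by, row_count FROM load_audit WHERE load_id = ?",
    (load_id,),
).fetchone()
```

Comparing `row_count` and load duration across successive audit rows is one simple way to notice when the volume or source of the data changes over time.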
Auditing 
•SQL Server Integration Services 
•Provides SSIS logging

More Related Content

What's hot

Data Staging Strategy
Data Staging StrategyData Staging Strategy
Data Staging StrategyMilind Zodge
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceDATAVERSITY
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)DATAVERSITY
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management DATAVERSITY
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksKnoldus Inc.
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaionsridhark1981
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLRam Kedem
 
Accelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesAccelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesPaul Van Siclen
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best PracticesBoris Otto
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 

What's hot (20)

Data Staging Strategy
Data Staging StrategyData Staging Strategy
Data Staging Strategy
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Data Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and GovernanceData Architecture - The Foundation for Enterprise Architecture and Governance
Data Architecture - The Foundation for Enterprise Architecture and Governance
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
From Data Warehouse to Lakehouse
From Data Warehouse to LakehouseFrom Data Warehouse to Lakehouse
From Data Warehouse to Lakehouse
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)Data Governance Takes a Village (So Why is Everyone Hiding?)
Data Governance Takes a Village (So Why is Everyone Hiding?)
 
Data Governance and Metadata Management
Data Governance and Metadata ManagementData Governance and Metadata Management
Data Governance and Metadata Management
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 
Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
Data warehouse presentaion
Data warehouse presentaionData warehouse presentaion
Data warehouse presentaion
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
Accelerate and modernize your data pipelines
Accelerate and modernize your data pipelinesAccelerate and modernize your data pipelines
Accelerate and modernize your data pipelines
 
Modern Data Architecture
Modern Data ArchitectureModern Data Architecture
Modern Data Architecture
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Governance Best Practices
Data Governance Best PracticesData Governance Best Practices
Data Governance Best Practices
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 

Viewers also liked

Data Warehouse Design Considerations
Data Warehouse Design ConsiderationsData Warehouse Design Considerations
Data Warehouse Design ConsiderationsRam Kedem
 
SSIS Basic Data Flow
SSIS Basic Data FlowSSIS Basic Data Flow
SSIS Basic Data FlowRam Kedem
 
SSIS Incremental ETL process
SSIS Incremental ETL processSSIS Incremental ETL process
SSIS Incremental ETL processRam Kedem
 
Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Syed Taimoor Hussain Shah
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesRam Kedem
 
SSIS Data Flow Tasks
SSIS Data Flow Tasks SSIS Data Flow Tasks
SSIS Data Flow Tasks Ram Kedem
 
Control Flow Using SSIS
Control Flow Using SSISControl Flow Using SSIS
Control Flow Using SSISRam Kedem
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 

Viewers also liked (8)

Data Warehouse Design Considerations
Data Warehouse Design ConsiderationsData Warehouse Design Considerations
Data Warehouse Design Considerations
 
SSIS Basic Data Flow
SSIS Basic Data FlowSSIS Basic Data Flow
SSIS Basic Data Flow
 
SSIS Incremental ETL process
SSIS Incremental ETL processSSIS Incremental ETL process
SSIS Incremental ETL process
 
Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)Big data (Data Size doesn't Matter, How and What is Data that's matter)
Big data (Data Size doesn't Matter, How and What is Data that's matter)
 
SSAS Cubes & Hierarchies
SSAS Cubes & HierarchiesSSAS Cubes & Hierarchies
SSAS Cubes & Hierarchies
 
SSIS Data Flow Tasks
SSIS Data Flow Tasks SSIS Data Flow Tasks
SSIS Data Flow Tasks
 
Control Flow Using SSIS
Control Flow Using SSISControl Flow Using SSIS
Control Flow Using SSIS
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 

Similar to Data Warehouse Basics

Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSASRam Kedem
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSASRam Kedem
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataHalo BI
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Mrunal Shridhar
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hivevshreepadma
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksAppDynamics
 
Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Derek Ashmore
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourceTerry Bunio
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
 
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Derek Ashmore
 
ETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingVibrant Event
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingVibrant Event
 
Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Derek Ashmore
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsBrian Anderson
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsEagle Technologies
 
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...DataWorks Summit
 

Similar to Data Warehouse Basics (20)

Data mining In SSAS
Data mining In SSASData mining In SSAS
Data mining In SSAS
 
Data Mining in SSAS
Data Mining in SSASData Mining in SSAS
Data Mining in SSAS
 
Conflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big DataConflict in the Cloud – Issues & Solutions for Big Data
Conflict in the Cloud – Issues & Solutions for Big Data
 
Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613Designing dashboards for performance shridhar wip 040613
Designing dashboards for performance shridhar wip 040613
 
Column Statistics in Hive
Column Statistics in HiveColumn Statistics in Hive
Column Statistics in Hive
 
Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing Datastage Introduction To Data Warehousing
Datastage Introduction To Data Warehousing
 
Top 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & TricksTop 5 Java Performance Metrics, Tips & Tricks
Top 5 Java Performance Metrics, Tips & Tricks
 
Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19Microservices for java architects schamburg-2015-05-19
Microservices for java architects schamburg-2015-05-19
 
Pr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open sourcePr dc 2015 sql server is cheaper than open source
Pr dc 2015 sql server is cheaper than open source
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
Microservices for Java Architects (Madison-Milwaukee, April 28-9, 2015)
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
ETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL TestingETL Testing - Introduction to ETL Testing
ETL Testing - Introduction to ETL Testing
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)Microservices for Java Architects (Chicago, April 21, 2015)
Microservices for Java Architects (Chicago, April 21, 2015)
 
dwproblems.pptx
dwproblems.pptxdwproblems.pptx
dwproblems.pptx
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business Needs
 
Designing Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business NeedsDesigning Effective Storage Strategies to Meet Business Needs
Designing Effective Storage Strategies to Meet Business Needs
 
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
Faster, Cheaper, Easier... and Successful Best Practices for Big Data Integra...
 

More from Ram Kedem

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edgeRam Kedem
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL WebinarRam Kedem
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database InstanceRam Kedem
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power ViewRam Kedem
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - OracleRam Kedem
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS AttributesRam Kedem
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)Ram Kedem
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)Ram Kedem
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Ram Kedem
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Ram Kedem
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML dataRam Kedem
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic ParametersRam Kedem
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional FormattingRam Kedem
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated FieldsRam Kedem
 
MSSQL Server - Automation
MSSQL Server - AutomationMSSQL Server - Automation
MSSQL Server - AutomationRam Kedem
 
Lesson 5 security
Lesson 5   securityLesson 5   security
Lesson 5 securityRam Kedem
 

More from Ram Kedem (20)

Impala use case @ edge
Impala use case @ edgeImpala use case @ edge
Impala use case @ edge
 
Advanced SQL Webinar
Advanced SQL WebinarAdvanced SQL Webinar
Advanced SQL Webinar
 
Managing oracle Database Instance
Managing oracle Database InstanceManaging oracle Database Instance
Managing oracle Database Instance
 
Power Pivot and Power View
Power Pivot and Power ViewPower Pivot and Power View
Power Pivot and Power View
 
SQL Injections - Oracle
SQL Injections - OracleSQL Injections - Oracle
SQL Injections - Oracle
 
SSAS Attributes
SSAS AttributesSSAS Attributes
SSAS Attributes
 
SSRS Matrix
SSRS MatrixSSRS Matrix
SSRS Matrix
 
DDL Practice (Hebrew)
DDL Practice (Hebrew)DDL Practice (Hebrew)
DDL Practice (Hebrew)
 
DML Practice (Hebrew)
DML Practice (Hebrew)DML Practice (Hebrew)
DML Practice (Hebrew)
 
Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)Exploring Oracle Database Architecture (Hebrew)
Exploring Oracle Database Architecture (Hebrew)
 
Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014Deploy SSRS Project - SQL Server 2014
Deploy SSRS Project - SQL Server 2014
 
Pig - Processing XML data
Pig - Processing XML dataPig - Processing XML data
Pig - Processing XML data
 
SSRS Basic Parameters
SSRS Basic ParametersSSRS Basic Parameters
SSRS Basic Parameters
 
SSRS Gauges
SSRS GaugesSSRS Gauges
SSRS Gauges
 
SSRS Conditional Formatting
SSRS Conditional FormattingSSRS Conditional Formatting
SSRS Conditional Formatting
 
SSRS Calculated Fields
SSRS Calculated FieldsSSRS Calculated Fields
SSRS Calculated Fields
 
SSRS Groups
SSRS GroupsSSRS Groups
SSRS Groups
 
Deploy SSIS
Deploy SSISDeploy SSIS
Deploy SSIS
 
MSSQL Server - Automation
MSSQL Server - AutomationMSSQL Server - Automation
MSSQL Server - Automation
 
Lesson 5 security
Lesson 5   securityLesson 5   security
Lesson 5 security
 

Data Warehouse Basics

  • 2. Data Warehouse Basics (Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com)
      • Data Usage Challenges
      • OLAP vs. OLTP
      • Understanding Normalization
      • OLAP
      • Star Schema Basics
      • Snowflake Schema Basics
      • Understanding Granularity
      • Auditing
  • 3. Data Usage Challenges
      • Databases are usually divided into two separate types: OLTP and OLAP
  • 4. OLAP vs. OLTP
      OLTP system: Online Transaction Processing (operational system)
      OLAP system: Online Analytical Processing (data warehouse)
      • Source of data
          • OLTP: Operational data; OLTP systems are the original source of the data
          • OLAP: Consolidated data; OLAP data comes from the various OLTP databases
      • Purpose of data
          • OLTP: To control and run fundamental business tasks
          • OLAP: To help with planning, problem solving, and decision support
      • What the data reveals
          • OLTP: A snapshot of ongoing business processes
          • OLAP: Multi-dimensional views of various kinds of business activities
      • Inserts and updates
          • OLTP: Short and fast inserts and updates initiated by end users
          • OLAP: Periodic long-running batch jobs refresh the data
      • Queries
          • OLTP: Relatively standardized and simple queries returning relatively few records
          • OLAP: Often complex queries involving aggregations
  • 5. OLAP vs. OLTP (continued)
      • Processing speed
          • OLTP: Typically very fast
          • OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
      • Space requirements
          • OLTP: Can be relatively small if historical data is archived
          • OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
      • Database design
          • OLTP: Highly normalized with many tables
          • OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas
      • Backup and recovery
          • OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability
          • OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
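The contrast between the two query styles in the comparison above can be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module; the `orders` table, its columns, and the sample rows are assumptions made up for the example and do not come from the slides.

```python
import sqlite3

# Hypothetical operational table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, order_date, amount) VALUES (?, ?, ?)",
    [
        ("alice", "2013-09-14", 20.0),
        ("bob", "2013-09-14", 35.5),
        ("alice", "2013-09-15", 10.0),
    ],
)

# OLTP-style query: simple, keyed lookup that returns a single record.
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# OLAP-style query: aggregation across many records.
olap_rows = conn.execute(
    "SELECT order_date, COUNT(*), SUM(amount) "
    "FROM orders GROUP BY order_date ORDER BY order_date"
).fetchall()

print(oltp_row)
print(olap_rows)
```

In a real environment the two queries would typically run against different databases; a single in-memory database is used here only to keep the sketch short.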
  • 6. Data Usage Challenges
      • Databases start out as OLTP (99.99% of the time)
      • OLAP functionality becomes a need as data accumulates
      • At some point two databases are required
          • The OLTP database captures and manages daily transactions
          • The OLAP database is periodically loaded with data from the OLTP database
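A periodic load from the OLTP database into the OLAP database might look something like the sketch below. It assumes two in-memory SQLite databases, a hypothetical `sales` source table, and a hypothetical `daily_sales` target table; it also uses a simple full refresh, which is only one of several possible loading strategies.

```python
import sqlite3

# Two separate databases: one operational, one analytical (both hypothetical).
oltp = sqlite3.connect(":memory:")
dwh = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL)"
)
dwh.execute(
    "CREATE TABLE daily_sales (sale_date TEXT PRIMARY KEY, total_amount REAL)"
)


def load_daily_sales(oltp, dwh):
    """Full refresh: re-read the OLTP data and rebuild the summary table."""
    rows = oltp.execute(
        "SELECT sale_date, SUM(amount) FROM sales GROUP BY sale_date"
    ).fetchall()
    with dwh:  # commits on success, rolls back on error
        dwh.execute("DELETE FROM daily_sales")
        dwh.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    return len(rows)


# Daily transactions arrive in the OLTP database as business occurs.
oltp.executemany(
    "INSERT INTO sales (sale_date, amount) VALUES (?, ?)",
    [("2013-09-14", 20.0), ("2013-09-14", 35.5)],
)
first_load = load_daily_sales(oltp, dwh)

# More transactions accumulate before the next scheduled load.
oltp.execute(
    "INSERT INTO sales (sale_date, amount) VALUES (?, ?)", ("2013-09-15", 10.0)
)
second_load = load_daily_sales(oltp, dwh)

summary = dwh.execute(
    "SELECT sale_date, total_amount FROM daily_sales ORDER BY sale_date"
).fetchall()
print(summary)
```

In practice the load would be triggered by a scheduler rather than called inline, and large warehouses usually prefer incremental loads over a full refresh.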
  • 7. Understanding Normalization
      • What is normalization?
          • The process of organizing the tables in a relational database
          • Eliminates data redundancy
          • Reduces record locking
          • Improves efficiency under concurrent access
      • Accomplished by dividing large tables into smaller tables
      • Relationships are defined between the tables
  • 8. Understanding Normalization
      • Form zero
  • 9. Understanding Normalization
      • First Form
          • Break each field down to the smallest meaningful value
          • Remove repeating groups of data and create a separate table for each set of related data
  • 10. Understanding Normalization
      • Second Form
          • Create new tables for data that applies to more than one record in a table
          • Add a related field (foreign key) to the table
  • 11. Understanding Normalization
      • Third Form
          • Remove fields that do not relate to, or provide a fact about, the primary key
          • The Manager, Dept, and Sector fields are moved to another table. In addition, a field that establishes a relationship between the tables should be added
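As a rough illustration of the third-form step, the sketch below moves the Manager, Dept, and Sector fields out of a flat employee table into their own table and links the two with a key. The table names, the `dept_id` key, and the sample rows are assumptions invented for the example; the slides do not show the actual table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flat table: dept, manager and sector repeat for every employee.
CREATE TABLE employees_flat (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept    TEXT,
    manager TEXT,
    sector  TEXT
);
INSERT INTO employees_flat VALUES
    (1, 'Dana', 'Sales', 'Avi', 'Retail'),
    (2, 'Noa',  'Sales', 'Avi', 'Retail'),
    (3, 'Omer', 'IT',    'Gil', 'Internal');

-- Fields that describe the department, not the employee, move to a new table.
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    dept    TEXT UNIQUE,
    manager TEXT,
    sector  TEXT
);
INSERT INTO departments (dept, manager, sector)
    SELECT DISTINCT dept, manager, sector FROM employees_flat ORDER BY dept;

-- The employee table keeps only a field that relates it to the new table.
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES departments(dept_id)
);
INSERT INTO employees
    SELECT f.emp_id, f.name, d.dept_id
    FROM employees_flat f JOIN departments d ON d.dept = f.dept;
""")

dept_count = conn.execute("SELECT COUNT(*) FROM departments").fetchone()[0]

# A join rebuilds the original rows, so no information was lost.
rebuilt = conn.execute("""
    SELECT e.emp_id, e.name, d.dept, d.manager, d.sector
    FROM employees e JOIN departments d USING (dept_id)
    ORDER BY e.emp_id
""").fetchall()
original = conn.execute(
    "SELECT * FROM employees_flat ORDER BY emp_id"
).fetchall()
print(dept_count, rebuilt == original)
```

The department details are now stored once per department instead of once per employee, at the cost of a join whenever the full picture is needed.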
  • 12. Normalized Structure Challenges
      • It is usually very inefficient for data extraction
          • Usually requires multiple table joins to reach all the data
          • Join queries can be challenging to write
          • Join queries can be challenging for the database engine
      • It doesn't store data in the form needed for data analysis
          • Data is stored in the most detailed form, without aggregation
          • Data may be stored in multiple normalized databases
  • 13. Star Schema Basics
      • What is a star schema?
          • The simplest form of database structure used in a DWH
          • Answers the basic questions: what happened, who did it, when did they do it, etc.
          • Focuses on a single business area
      • What advantages does a star schema offer?
          • Separates data into two main categories
              • Facts
              • Dimensions (descriptive information about the facts)
  • 14. Star Schema Basics
      • Fact vs. Dimensions
          • Fact (what happened)
              • Product sold
              • Customer who bought
              • Etc.
          • Dimensions (attributes that describe what happened)
              • When the product was sold
                  • Day / date / year / quarter
              • Where the product was sold
  • 15. Star Schema Basics
  • 16. Star Schema Basics
  • 17. Understanding Fact Tables
      • A fact table is a collection of measurements
          • Note the word "measurements"
          • A measurement is usually a number, something we can measure about a specific business process
      • A fact table contains one or more facts about a specific process (usually numeric)
          • Sales amount
          • Order quantity
          • Tax amount
          • Discount amount
  • 18. Understanding Fact Tables
      • Fact tables may contain multiple measurements only if they are closely related
      • A data warehouse will have many fact tables
          • Each table stores data (measures) for a specific business area
          • For example, a products-sold fact table and a shipment-details fact table
      • Because fact table design depends on both science and an understanding of the data, fact tables can be designed in many ways
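To make the fact and dimension split more concrete, here is a minimal star schema sketch using the measures named above (sales amount, order quantity, tax amount, discount amount). The table names `fact_sales`, `dim_date`, and `dim_product`, their key columns, and the sample rows are all assumptions for illustration, not the schema shown on the diagram slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context for the facts (hypothetical layout).
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    quarter   INTEGER,
    year      INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);

-- Fact table: numeric measurements plus a key into each dimension.
CREATE TABLE fact_sales (
    date_key        INTEGER REFERENCES dim_date(date_key),
    product_key     INTEGER REFERENCES dim_product(product_key),
    order_quantity  INTEGER,
    sales_amount    REAL,
    tax_amount      REAL,
    discount_amount REAL
);

INSERT INTO dim_date VALUES
    (20130915, '2013-09-15', 3, 2013),
    (20131031, '2013-10-31', 4, 2013);
INSERT INTO dim_product VALUES
    (1, 'Keyboard', 'Hardware'),
    (2, 'Mouse',    'Hardware');
INSERT INTO fact_sales VALUES
    (20130915, 1, 2, 100.0, 17.0,  0.0),
    (20130915, 2, 1,  25.0,  4.25, 5.0),
    (20131031, 1, 1,  50.0,  8.5,  0.0);
""")

# One join per dimension is enough to slice the measures by quarter.
sales_by_quarter = conn.execute("""
    SELECT d.year, d.quarter, SUM(f.order_quantity), SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
print(sales_by_quarter)
```

Every dimension sits one join away from the fact table, which is what gives the schema its star shape.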
  • 19. Understanding Dimensions
      • Dimensions give context, or specific meaning, to measures (facts)
      • The term "dimension" usually refers to a table of related dimensions
  • 20. Understanding Dimensions
      • Example:
          • A fact table contains the number of products sold
          • A date dimension table contains the following "dimensions" of dates pertaining to the number of products sold
              • Date and time (15.09.2013 09:25:32)
              • Quarter
              • Day of year (321)
              • Week (44)
              • Weekday (Thursday)
  • 21. Understanding Dimensions
      • Each individual column in a dimension table is an attribute
      • Attributes usually compress or expand the level of data detail
      • Data can be "discretized" into smaller, summarized groups
          • Days (365 values)
          • Weeks (52 values)
          • Months (12 values)
          • Quarters (4 values)
          • Hour / minute / second, etc.
  • 22. Understanding Dimensions
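The attributes of a date dimension can be derived from a single timestamp, roughly as sketched below. The function name `date_dimension_row` and the dictionary keys are hypothetical, and the week number follows the ISO 8601 convention, which is an assumption (the slides do not say which week-numbering scheme is used). Because every value is computed from the one date passed in, the results may differ from the illustrative figures listed on the slides.

```python
from datetime import datetime

# Fixed English names avoid any dependence on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def date_dimension_row(ts):
    """Derive coarser attributes (quarter, week, ...) from one timestamp."""
    _, iso_week, _ = ts.isocalendar()  # ISO 8601 week number (assumed scheme)
    return {
        "date_time": ts.strftime("%d.%m.%Y %H:%M:%S"),
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "day_of_year": ts.timetuple().tm_yday,
        "week": iso_week,
        "weekday": WEEKDAYS[ts.weekday()],
    }


row = date_dimension_row(datetime(2013, 9, 15, 9, 25, 32))
for attribute, value in row.items():
    print(attribute, value)
```

Each attribute groups the same moment at a different level of detail, which is what lets a query summarize facts by day, week, month, or quarter without recomputing anything.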
  • 23. Snowflake Schema Basics
      • What is a snowflake schema?
          • A star schema with a little normalization added in
          • Dimension tables are somewhat normalized
      • Why use a snowflake schema?
          • To satisfy the data gathering functionality of more advanced data warehousing / mining tools
          • To logically separate large dimension tables
          • To more naturally separate dimensional data
              • Known customers vs. anonymous customers
  • 24. Snowflake Schema Basics
      • One main rule concerning the snowflake schema
          • Don't use it unless you want to or need to
  • 25. Snowflake Schema Basics
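A snowflaked dimension could be sketched as follows: the product dimension is partly normalized by moving the category into its own table. The names `dim_category`, `dim_product`, and `fact_sales`, along with the sample data, are assumptions chosen for the example rather than anything taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The category is split out of the product dimension (the "snowflake" step).
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);

INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES
    (10, 'Keyboard', 1), (11, 'Mouse', 1), (12, 'Editor', 2);
INSERT INTO fact_sales VALUES
    (10, 100.0), (11, 25.0), (12, 40.0), (10, 50.0);
""")

# Reaching the category now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(sales_by_category)
```

The extra join is the price of the added normalization, which is one reason to apply it only where it is actually wanted or needed.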
  • 26. Understanding Granularity
      • What is meant by the term granularity in a DWH?
          • The level of detail available
      • What determines granularity?
          • The level of data loaded into the fact table
              • For example, per-order numbers vs. daily numbers vs. weekly numbers, etc.
          • The number and detail level of dimensions
              • If we want to look into customer details but we don't have a customer dimension, this data won't be available
  • 27. Understanding Granularity
      • Granularity should be determined during database design
      • Granularity can also be changed after the database has been created, but this requires much more effort
      • Such a change may involve
          • Changing the fact table structure
          • Possible changes in dimension tables
          • Changes in data loading
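The effect of the chosen grain can be sketched with a few in-memory rows. The order records and the helper `roll_up_to_daily` below are hypothetical and only meant to show that detailed data can be rolled up to a coarser grain, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical fact rows stored at per-order grain.
order_grain = [
    {"order_id": 1, "order_date": "2013-09-14", "amount": 20.0},
    {"order_id": 2, "order_date": "2013-09-14", "amount": 35.5},
    {"order_id": 3, "order_date": "2013-09-15", "amount": 10.0},
]


def roll_up_to_daily(rows):
    """Summarize per-order rows into one total per day (a coarser grain)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["order_date"]] += row["amount"]
    return dict(totals)


daily_grain = roll_up_to_daily(order_grain)

# Answerable only at per-order grain: which single order was the largest?
largest_order = max(order_grain, key=lambda row: row["amount"])

print(daily_grain)
print(largest_order["order_id"])
```

If only `daily_grain` had been loaded into the fact table, the question about the largest single order could no longer be answered, which is why the grain is best decided at design time.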
  • 28. Auditing
      • Data warehouses do not store data as it is created
      • OLTP databases are populated as business occurs
          • The source and purpose of the data are generally self-explanatory
          • Data is added when a transaction occurs
      • Data warehouses are populated from OLTP data
          • Based on various conditions
          • At various times
          • From various sources
  • 29. Auditing
      • Data can be informative based on different aspects
          • The data itself
          • The source of the data
          • The volume of the data
      • These characteristics usually change over time
      • Auditing identifies these aspects
          • Audit data is usually stored in tables
          • Audit records describe the source, the duration of the load, who performed the load, etc.
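An audit table of the kind described above might be sketched like this. The `load_audit` table, its columns, the `audited_load` helper, and the source and user names are all assumptions made for illustration; a real warehouse would normally rely on the logging facilities of its ETL tool.

```python
import sqlite3
import time
from datetime import datetime, timezone

dwh = sqlite3.connect(":memory:")
dwh.executescript("""
CREATE TABLE fact_sales (sale_id INTEGER, amount REAL);

-- Hypothetical audit table: one row per load.
CREATE TABLE load_audit (
    load_id          INTEGER PRIMARY KEY,
    source_name      TEXT,
    loaded_by        TEXT,
    started_at       TEXT,
    duration_seconds REAL,
    row_count        INTEGER
);
""")


def audited_load(dwh, source_name, rows, loaded_by):
    """Load rows into the fact table and record who, what, when, how long."""
    started_at = datetime.now(timezone.utc).isoformat()
    start = time.perf_counter()
    with dwh:  # the data and its audit record are committed together
        dwh.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
        duration = time.perf_counter() - start
        dwh.execute(
            "INSERT INTO load_audit "
            "(source_name, loaded_by, started_at, duration_seconds, row_count) "
            "VALUES (?, ?, ?, ?, ?)",
            (source_name, loaded_by, started_at, duration, len(rows)),
        )


audited_load(dwh, "oltp_orders", [(1, 20.0), (2, 35.5)], "etl_user")

audit_rows = dwh.execute(
    "SELECT source_name, loaded_by, row_count FROM load_audit"
).fetchall()
print(audit_rows)
```

Writing the audit record in the same transaction as the data means a failed load leaves neither partial data nor a misleading audit entry.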
  • 30. Auditing
      • SQL Server Integration Services
          • Provides SSIS logging