Data Warehouse Basics
Ram Kedem
1.
Data Warehouse Basics
Ram Kedem
2.
Copyright 2014 © Ram Kedem. All rights reserved. Not to be reproduced without written consent. ramkedem.com
Data Warehouse Basics
• Data Usage Challenges
• OLAP vs. OLTP
• Understanding Normalization
• OLAP
• Star Schema Basics
• Snowflake Schema Basics
• Understanding Granularity
• Auditing
3.
Data Usage Challenges
• Databases are usually divided into two separate types
  – OLTP / OLAP
4.
OLAP vs. OLTP
OLTP system: Online Transaction Processing (operational system)
OLAP system: Online Analytical Processing (data warehouse)
• Source of data
  – OLTP: Operational data; OLTPs are the original source of the data
  – OLAP: Consolidation data; OLAP data comes from the various OLTP databases
• Purpose of data
  – OLTP: To control and run fundamental business tasks
  – OLAP: To help with planning, problem solving, and decision support
• What the data reveals
  – OLTP: A snapshot of ongoing business processes
  – OLAP: Multi-dimensional views of various kinds of business activities
• Inserts and updates
  – OLTP: Short and fast inserts and updates initiated by end users
  – OLAP: Periodic long-running batch jobs refresh the data
• Queries
  – OLTP: Relatively standardized and simple queries returning relatively few records
  – OLAP: Often complex queries involving aggregations
5.
OLAP vs. OLTP
OLTP system: Online Transaction Processing (operational system)
OLAP system: Online Analytical Processing (data warehouse)
• Processing speed
  – OLTP: Typically very fast
  – OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
• Space requirements
  – OLTP: Can be relatively small if historical data is archived
  – OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
• Database design
  – OLTP: Highly normalized with many tables
  – OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas
• Backup and recovery
  – OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability
  – OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
6.
Data Usage Challenges
• Databases start out as OLTP (99.99% of the time…)
• OLAP functionality becomes a need as data accumulates
• At some point two databases are required
  – The OLTP database captures and manages daily transactions
  – The OLAP database is periodically loaded with data from the OLTP database
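The two-database split above can be sketched with a pair of in-memory SQLite databases. This is only an illustration under stated assumptions: the table names `orders` and `daily_sales`, their columns, and the sample rows are hypothetical and do not come from the slides, and a real warehouse load would normally be done by an ETL tool rather than a single script.

```python
import sqlite3


def build_oltp():
    """Create a toy operational (OLTP) database with one row per order."""
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    # Short, fast inserts, as issued by end users during the day.
    oltp.executemany(
        "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
        [("2014-01-01", 10.0), ("2014-01-01", 15.0), ("2014-01-02", 7.5)],
    )
    oltp.commit()
    return oltp


def load_warehouse(oltp, olap):
    """Periodic batch job: aggregate OLTP rows and refresh the OLAP table."""
    olap.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)"
    )
    rows = oltp.execute(
        "SELECT order_date, COUNT(*), SUM(amount) FROM orders GROUP BY order_date"
    ).fetchall()
    # Re-running the job replaces a day's totals instead of duplicating them.
    olap.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", rows)
    olap.commit()
    return len(rows)


oltp_db = build_oltp()
olap_db = sqlite3.connect(":memory:")
loaded = load_warehouse(oltp_db, olap_db)
daily = olap_db.execute("SELECT * FROM daily_sales ORDER BY order_date").fetchall()
```

The OLTP side only ever sees small single-row writes, while the OLAP side is refreshed by one aggregate query per load, which mirrors the division of labour listed above.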
7.
Understanding Normalization
• What is normalization?
  – The process of organizing the tables in a relational database
  – Eliminates data redundancy
  – Lowers record locking
  – Increases efficiency in concurrency
  – Accomplished by dividing large tables into smaller tables
  – Tables have relationships defined
8.
Understanding Normalization
• Form zero
9.
Understanding Normalization
• First Form
  – Break each field down to the smallest meaningful value
  – Remove repeating groups of data and create a separate table for each set of related data
10.
Understanding Normalization
• Second Form
  – Create new tables for data that applies to more than one record in a table
  – Add a related field (foreign key) to the table
11.
Understanding Normalization
• Third Form
  – Remove fields that do not relate to, or provide a fact about, the primary key
  – Move the Manager, Dept, and Sector fields to another table. In addition, a field should be added to establish a relationship between the tables
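As a rough illustration of the third-form step, the sketch below moves the Manager, Dept, and Sector fields into their own table and links the two tables with a key. The employee names, the sample rows, and the table and column names (`employees`, `departments`, `dept_id`) are assumptions made for the example; the slides' own tables are not reproduced here.

```python
import sqlite3

# Hypothetical un-normalized rows: department facts repeat for every employee.
flat_rows = [
    ("Dana", "Sales", "Avi", "Retail"),
    ("Noa", "Sales", "Avi", "Retail"),
    ("Omer", "IT", "Gil", "Services"),
]

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE departments (
    dept_id INTEGER PRIMARY KEY,
    dept    TEXT UNIQUE,
    manager TEXT,
    sector  TEXT
);
CREATE TABLE employees (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_id INTEGER REFERENCES departments(dept_id)
);
""")

for name, dept, manager, sector in flat_rows:
    # Store each department once; repeated departments are ignored.
    db.execute(
        "INSERT OR IGNORE INTO departments (dept, manager, sector) VALUES (?, ?, ?)",
        (dept, manager, sector),
    )
    dept_id = db.execute(
        "SELECT dept_id FROM departments WHERE dept = ?", (dept,)
    ).fetchone()[0]
    db.execute("INSERT INTO employees (name, dept_id) VALUES (?, ?)", (name, dept_id))

dept_count = db.execute("SELECT COUNT(*) FROM departments").fetchone()[0]

# A join is now needed to rebuild the original wide rows.
rebuilt = db.execute("""
    SELECT e.name, d.dept, d.manager, d.sector
    FROM employees e JOIN departments d ON d.dept_id = e.dept_id
    ORDER BY e.emp_id
""").fetchall()
```

Three employee rows collapse to two department rows, so a manager change touches one record instead of many; the price is the join needed to read the data back.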
12.
Normalized Structure Challenges
• It is usually very inefficient for data extraction
  – Usually requires multiple table joins to reach all the data
  – Join queries can be challenging to write
  – Join queries can be challenging for the database engine
• It doesn't store data in the form needed for data analysis
  – Data is stored in the most detailed form, without aggregation
  – Data may be stored in multiple, normalized databases
13.
Star Schema Basics
• What is a star schema?
  – The simplest form of database structure used in a DWH
  – Answers the basic questions: what happened, who did it, when did they do it, etc.
  – Focuses on a single business area
• What advantages does a star schema offer?
  – Separates data into two main categories
    – Facts
    – Dimensions (descriptive information about the facts)
14.
Star Schema Basics
• Fact vs. Dimensions
  – Fact (what happened)
    – Product sold
    – Customer who bought
    – Etc.
  – Dimensions (attributes that describe what happened)
    – When the product was sold
      – Day / date / year / quarter
    – Where the product was sold
15.
Star Schema Basics
16.
Star Schema Basics
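A star schema of the kind these slides describe might look like the following sketch: one fact table holding numeric measures, surrounded by dimension tables that describe them. All table names, column names, and sample rows (`fact_sales`, `dim_date`, `dim_product`, and so on) are hypothetical and are not taken from the deck's diagrams.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables: descriptive context for the facts.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    year      INTEGER,
    quarter   INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Fact table: numeric measurements plus one foreign key per dimension.
CREATE TABLE fact_sales (
    date_key       INTEGER REFERENCES dim_date(date_key),
    product_key    INTEGER REFERENCES dim_product(product_key),
    order_quantity INTEGER,
    sales_amount   REAL
);
INSERT INTO dim_date VALUES (20130915, '2013-09-15', 2013, 3),
                            (20131117, '2013-11-17', 2013, 4);
INSERT INTO dim_product VALUES (1, 'Keyboard', 'Hardware'),
                               (2, 'Mouse', 'Hardware');
INSERT INTO fact_sales VALUES (20130915, 1, 2, 60.0),
                              (20130915, 2, 1, 15.0),
                              (20131117, 1, 1, 30.0);
""")

# One join per dimension is enough to slice the measures.
sales_by_quarter = db.execute("""
    SELECT d.year, d.quarter, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.year, d.quarter
    ORDER BY d.year, d.quarter
""").fetchall()
```

Every dimension sits exactly one join away from the fact table, which is what keeps analytical queries short compared with a fully normalized design.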
17.
Understanding Fact Tables
• A fact table is a collection of measurements
  – Note the word "measurements"
  – This is usually a number, something we can measure about a specific business process
• A fact table contains one or more facts about a specific process (usually numeric)
  – Sales amount
  – Order quantity
  – Tax amount
  – Discount amount
18.
Understanding Fact Tables
• Fact tables may contain multiple measurements only if they are closely related
• A data warehouse will have many fact tables
  – Each table stores data (measures) for one specific business area
  – Products sold fact table / shipment details fact table
• Since fact table design depends on science and data understanding, there are many ways in which fact tables can be designed
19.
Understanding Dimensions
• Dimensions give context to measures (facts)
  – Dimensions give context, or specific meaning, to facts
• The term "dimension" usually refers to a table of related dimensions
20.
Understanding Dimensions
• Example:
  – A fact table contains the number of products sold
  – A date dimension table contains the following "dimensions" of dates pertaining to the number of products sold
    – Date and time (15.09.2013 09:25:32)
    – Quarter
    – Day of year (321)
    – Week (44)
    – Weekday (Thursday)
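The attributes of a date dimension can all be derived from one timestamp, as the minimal sketch below suggests. The function name `date_dimension_row`, the dictionary keys, and the choice of ISO 8601 week numbering are assumptions made for this example; warehouses differ on how they number weeks. The sample values on the slide (321, 44, Thursday) read as independent illustrations: for the slide's timestamp the sketch yields day 258, ISO week 37, and Sunday.

```python
from datetime import datetime

WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def date_dimension_row(ts: datetime) -> dict:
    """Derive date-dimension attributes from a single timestamp."""
    return {
        "date_time": ts.strftime("%d.%m.%Y %H:%M:%S"),
        "quarter": (ts.month - 1) // 3 + 1,
        "day_of_year": ts.timetuple().tm_yday,
        "week": ts.isocalendar()[1],            # ISO 8601 week number (assumption)
        "weekday": WEEKDAY_NAMES[ts.weekday()],  # locale-independent name lookup
    }


row = date_dimension_row(datetime(2013, 9, 15, 9, 25, 32))
```

Precomputing these columns once per date lets analysts group facts by quarter or weekday without repeating date arithmetic in every query.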
21.
Understanding Dimensions
• Each individual column in a dimension table is an attribute
• Attributes usually compress or expand the level of data detail
• Data can be "discretized" into smaller, summarized groups
  – Days (365 values)
  – Weeks (52 values)
  – Months (12 values)
  – Quarters (4 values)
  – Hour / minute / second, etc.
22.
Understanding Dimensions
23.
Snowflake Schema Basics
• What is a snowflake schema?
  – A star schema with a little normalization added in
  – Dimension tables are normalized somewhat
• Why use a snowflake schema?
  – To satisfy the data gathering functionality of more advanced data warehousing / mining tools
  – To logically separate large dimension tables
  – To more naturally separate dimensional data
    – Known customers vs. anonymous customers
24.
Snowflake Schema Basics
• One main rule concerning the snowflake schema
  – Don't use it unless you want to or need to
25.
Snowflake Schema Basics
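To make the "little normalization added in" concrete, the sketch below snowflakes a product dimension by moving its category into a separate table. The schema, names, and rows (`dim_category`, `dim_product`, `fact_sales`) are hypothetical and only meant to show the shape of the design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Snowflaked dimension: category attributes move to their own table.
CREATE TABLE dim_category (
    category_key  INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category_key INTEGER REFERENCES dim_category(category_key)
);
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL
);
INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_product VALUES (1, 'Keyboard', 1), (2, 'Mouse', 1), (3, 'Editor', 2);
INSERT INTO fact_sales VALUES (1, 60.0), (2, 15.0), (3, 40.0);
""")

# Compared with a star schema, reaching the category costs one extra join.
sales_by_category = db.execute("""
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

The extra join is the cost behind the rule of thumb above: the normalized dimension saves some redundancy but makes every category-level query longer.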
26.
Understanding Granularity
• What is meant by the term granularity in a DWH?
  – The level of detail available
• What determines granularity?
  – The level of data loaded into the fact table
    – For example, per-order numbers vs. daily numbers vs. weekly numbers, etc.
  – The number and detail level of dimensions
    – If we want to look into customer details but we don't have a customer dimension, this data won't be available
27.
Understanding Granularity
• Granularity should be determined during database design
• It can also be changed after the database has been created, but that requires much more effort
• This change may involve
  – Changing the fact table structure
  – Possible changes in dimension tables
  – Changes in data loading
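A small sketch can show why the grain is best fixed at design time. The rows, the tuple layout, and the function name `roll_up_to_daily` below are hypothetical; the point is only that detailed data can be rolled up to a coarser grain, while the reverse is not possible.

```python
from collections import defaultdict

# Hypothetical fact rows at order grain: (order_id, order_date, amount).
order_grain = [
    (1, "2014-01-01", 10.0),
    (2, "2014-01-01", 15.0),
    (3, "2014-01-02", 7.5),
]


def roll_up_to_daily(rows):
    """Aggregate order-grain rows to one total per day (a coarser grain)."""
    totals = defaultdict(float)
    for _order_id, order_date, amount in rows:
        totals[order_date] += amount
    return dict(totals)


daily_grain = roll_up_to_daily(order_grain)
```

If only `daily_grain` had been loaded into the warehouse, the individual orders could not be recovered from it, so moving to a finer grain later means reloading from the source and reworking the fact table.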
28.
Auditing
• Data warehouses do not store data as it is created
• OLTP databases are populated as business occurs
  – The source and purpose of the data are generally self-explanatory
  – Data is added when a transaction occurs
• DWHs are populated from OLTP data
  – Based on various conditions
  – At various times
  – From various sources
29.
Auditing
• Data can be informative based on different aspects
  – The data itself
  – The source of the data
  – The volume of the data
• These characteristics usually change over time
• Auditing identifies these aspects
  – Audit records are usually stored in tables
  – They describe the source, the duration of the load, who performed the load, etc.
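An audit table of the kind described above could be sketched as follows. The table name `load_audit`, its columns, the `audited_load` helper, and the sample source and user names are all assumptions for illustration; this is not SSIS logging, only a hand-rolled stand-in that records the source, duration, row count, and who ran the load.

```python
import sqlite3
import time
from datetime import datetime, timezone


def audited_load(db, source_name, rows, loaded_by):
    """Load rows into a fact table and record one audit row describing the load."""
    db.execute("CREATE TABLE IF NOT EXISTS fact_sales (order_date TEXT, amount REAL)")
    db.execute(
        """CREATE TABLE IF NOT EXISTS load_audit (
               load_id          INTEGER PRIMARY KEY,
               source_name      TEXT,
               loaded_by        TEXT,
               started_at       TEXT,
               duration_seconds REAL,
               row_count        INTEGER)"""
    )
    started_at = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    db.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    duration = time.perf_counter() - t0
    # One audit row per load: where the data came from, how much, how long, by whom.
    db.execute(
        "INSERT INTO load_audit "
        "(source_name, loaded_by, started_at, duration_seconds, row_count) "
        "VALUES (?, ?, ?, ?, ?)",
        (source_name, loaded_by, started_at, duration, len(rows)),
    )
    db.commit()


db = sqlite3.connect(":memory:")
audited_load(db, "orders_oltp", [("2014-01-01", 10.0), ("2014-01-02", 7.5)], "etl_user")
audit_row = db.execute(
    "SELECT source_name, loaded_by, row_count FROM load_audit"
).fetchone()
```

Comparing `row_count` and `duration_seconds` across loads is one simple way to notice when the volume or the source of the data has changed over time.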
30.
Auditing
• SQL Server Integration Services
  – Provides SSIS logging