Introduction to Data Warehousing

By MAHESH.AMPOLU
Necessity is the mother of invention

Why a Data Warehouse?
Scenario

Unilever is a company with branches in the UK,
India, America and Japan. The Sales Manager
wants a quarterly sales report. Each branch has a
separate operational system.
Scenario 1: Unilever company.

[Figure: the India, UK, America and Japan branches each hold their own data; the Sales Manager needs sales per item type per branch for the first quarter.]
Solution: Unilever company.

• Extract sales information from each database.
• Store the information in a common repository at a
  single site.
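
The two steps above can be sketched in a few lines of Python. This is only an illustration: the branch extracts, their field names and the quantities are invented here (the slides do not describe the branch systems), and a plain list stands in for the common repository.

```python
from collections import defaultdict

# Hypothetical extracts from each branch's operational system.
# Field names differ per branch on purpose; none of them come from the slides.
BRANCH_EXTRACTS = {
    "India":   [{"item": "soap", "qty": 3}, {"item": "paste", "qty": 5}],
    "UK":      [{"product": "soap", "units": 2}],
    "America": [{"item_type": "paste", "sold": 4}],
    "Japan":   [{"item": "soap", "qty": 1}],
}

# Per-branch mapping from local field names to the common (item, units) layout.
FIELD_MAPS = {
    "India":   ("item", "qty"),
    "UK":      ("product", "units"),
    "America": ("item_type", "sold"),
    "Japan":   ("item", "qty"),
}


def load_repository(extracts, field_maps):
    """Copy every branch row into one common list of (branch, item, units)."""
    repository = []
    for branch, rows in extracts.items():
        item_field, units_field = field_maps[branch]
        for row in rows:
            repository.append((branch, row[item_field], row[units_field]))
    return repository


def sales_per_item_per_branch(repository):
    """Total units per (branch, item), the report the Sales Manager asked for."""
    totals = defaultdict(int)
    for branch, item, units in repository:
        totals[(branch, item)] += units
    return dict(totals)


repository = load_repository(BRANCH_EXTRACTS, FIELD_MAPS)
report = sales_per_item_per_branch(repository)
```

Once the rows sit in one place with one layout, the report is a single aggregation instead of four separate queries against four systems.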
Solution: Unilever company.

[Figure: data from India, UK, America and Japan flows into the Data Warehouse; Query & Analysis tools produce the Report for the Sales Manager.]
Scenario:

Hindustan Unilever is a small, new company.
The President wants the company to grow. He
needs information so that he can make
correct decisions.
Solution:
• Improve the quality of data before loading it
  into the warehouse.
• Perform data cleaning and transformation
  before loading the data.
• Use query and analysis tools to support ad hoc
  queries.
Solution

[Figure: the Data Warehouse feeds a Query and Analysis tool, which shows the President a chart of sales against time to guide expansion and improvement.]
What is a Data Warehouse?
Inmon’s definition:
A data warehouse is a
       - subject-oriented,
       - integrated,
       - time-variant,
       - nonvolatile
collection of data in support of management’s
decision-making process.
Subject-oriented
• A data warehouse is organized around subjects such
  as sales, product and customer.
• It focuses on modeling and analysis of data for
  decision makers.
• It excludes data not useful in the decision support
  process.
Integration
• A data warehouse is constructed by integrating
  multiple heterogeneous sources.
• Data preprocessing is applied to ensure
  consistency.

[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]
Integration
• Integration is needed in terms of the data's:
  – encoding structures
  – measurement of attributes
  – physical attributes of data
  – naming conventions
  – data type formats
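
As a rough illustration of the points above, the sketch below maps two source systems onto one common record layout. Everything specific in it is an assumption made for the example: the field names, the gender codes and the choice of centimetres as the common unit do not come from the slides.

```python
# Hypothetical source records: each system names, encodes and measures
# the same attributes in its own way.
SOURCE_A = [{"cust_name": "Asha", "gender": "m", "height_cm": 170}]
SOURCE_B = [{"CustomerName": "Ben", "sex": 1, "height_in": 70.0}]

# Assumed encodings: source A uses letters, source B uses 1/0.
GENDER_CODES = {"m": "M", "f": "F", 1: "M", 0: "F"}
CM_PER_INCH = 2.54


def from_source_a(row):
    """Rename fields, unify the gender code, store height as a float in cm."""
    return {
        "customer_name": row["cust_name"],
        "gender": GENDER_CODES[row["gender"]],
        "height_cm": float(row["height_cm"]),
    }


def from_source_b(row):
    """Same target layout, but height must be converted from inches."""
    return {
        "customer_name": row["CustomerName"],
        "gender": GENDER_CODES[row["sex"]],
        "height_cm": round(row["height_in"] * CM_PER_INCH, 1),
    }


integrated = ([from_source_a(r) for r in SOURCE_A]
              + [from_source_b(r) for r in SOURCE_B])
```

After this step every record uses the same names (naming conventions), the same codes (encoding structures), the same unit (measurement of attributes) and the same types (data type formats).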
Time-variant
• Provides information from a historical perspective,
  e.g. the past 5-10 years.
• Every key structure contains, either implicitly or
  explicitly, an element of time.
Nonvolatile
• Data once recorded cannot be updated.
• A data warehouse requires only two operations for data
  access:
   – Initial loading of data
   – Access of data

[Figure: data is loaded into the warehouse and then accessed.]
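
One way to picture the nonvolatile property is a store that offers only those two operations. The class below is a toy sketch, not how any particular warehouse product works; the class and method names are made up for the example, and the rows reuse unit counts from the sales tables later in this deck.

```python
class AppendOnlyStore:
    """Toy warehouse table exposing only the two operations: load and access."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; rows already stored are never modified."""
        self._rows.extend(tuple(row) for row in rows)

    def access(self, predicate=lambda row: True):
        """Return a new list of matching rows; the stored data is left as-is."""
        return [row for row in self._rows if predicate(row)]


store = AppendOnlyStore()
store.load([("India", "Soaps", "Mar", 3), ("UK", "Soaps", "Mar", 3)])
store.load([("India", "Soaps", "Apr", 10)])  # later loads only add history
india_rows = store.access(lambda row: row[0] == "India")
```

There is deliberately no update or delete method: a correction would arrive as a new load rather than as a change to recorded data.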
Data Warehouse Architecture
• Data warehouse server
  – almost always a relational DBMS, rarely flat
     files
• OLAP servers
  – support and operate on multi-dimensional
     data structures
• Clients
  – Query and reporting tools
  – Analysis tools
  – Data mining tools
OLTP vs OLAP
Data Warehouse Schema
• Star Schema
• Fact Constellation Schema
• Snowflake Schema
Star Schema
• A star schema consists of at least one
  fact table and a number of dimension
  tables.
• The star schema is the highly recommended
  schema for SSAS (SQL Server Analysis Services) cubes.
Star Schema

Fact Table      Store Dimension   Time Dimension   Product Dimension
Store Key       Store Key         Period Key       Product Key
Product Key     Store Name        Year             Product Desc
Period Key      City              Quarter
Units           State             Month
Price           Region

Benefits: Easy to understand, easy to define hierarchies, reduces the
number of physical joins.
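
The star schema on this slide could be set up roughly as follows. This is a sketch under assumptions: SQLite (through Python's standard sqlite3 module) stands in for the warehouse DBMS, the snake_case table and column names are adapted from the slide, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
CREATE TABLE sales_fact  (store_key   INTEGER REFERENCES store_dim,
                          product_key INTEGER REFERENCES product_dim,
                          period_key  INTEGER REFERENCES time_dim,
                          units INTEGER, price REAL);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Pune", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
conn.execute("INSERT INTO product_dim VALUES (10, 'Soap')")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (100, 1998, 1, 1),
    (101, 1998, 1, 2),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 3, 2.5),
    (1, 10, 101, 4, 2.5),
    (2, 10, 100, 5, 2.5),
])

# Units per region and quarter: the fact table joins directly to each
# dimension it needs, with no dimension-to-dimension joins.
units_by_region = conn.execute("""
    SELECT s.region, t.year, t.quarter, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN time_dim  t ON t.period_key = f.period_key
    GROUP BY s.region, t.year, t.quarter
    ORDER BY s.region
""").fetchall()
```

Because Region lives directly in the store dimension, rolling sales up from store to region needs only one join on the store side.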
Snowflake Schema
• A variant of the star schema model.
• A single, large, central fact table and one
  or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension
  table data is split into additional tables.
Snowflake Schema

Fact Table      Store Dimension   City Dimension   Time Dimension   Product Dimension
Store Key       Store Key         City Key         Period Key       Product Key
Product Key     Store Name        City             Year             Product Desc
Period Key      City Key          State            Quarter
Units                             Region           Month
Price

Drawbacks: Time-consuming joins, slow report generation.
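
The extra join that normalization introduces can be seen in a small sketch. As before, SQLite via Python's sqlite3 module is only a stand-in, the table and column names are adapted from the slide, and the rows are invented; only the store and city side of the schema is modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city_dim   (city_key INTEGER PRIMARY KEY, city TEXT,
                         state TEXT, region TEXT);
CREATE TABLE store_dim  (store_key INTEGER PRIMARY KEY, store_name TEXT,
                         city_key INTEGER REFERENCES city_dim);
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim,
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
""")

# Made-up rows: two stores share one city row, so city, state and
# region are stored once instead of once per store.
conn.execute("INSERT INTO city_dim VALUES (1, 'Pune', 'Maharashtra', 'West')")
conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 1),
    (2, "Store B", 1),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 3, 2.5),
    (2, 10, 100, 4, 2.5),
])

# Region is now two joins away from the fact table (fact -> store -> city).
units_by_region = conn.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim  c ON c.city_key  = s.city_key
    GROUP BY c.region
""").fetchall()
```

The trade-off is the one the slide names: less redundancy in the dimension data, at the cost of longer join chains when reports need attributes from the outer tables.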
Fact Constellation
• Multiple fact tables share dimension tables.
• This schema is viewed as a collection of stars,
  hence it is called a galaxy schema or fact
  constellation.
• Sophisticated applications require such a
  schema.
Fact Constellation

Sales Fact Table   Shipping Fact Table   Product Dimension   Store Dimension
Store Key          Shipper Key           Product Key         Store Key
Product Key        Store Key             Product Desc        Store Name
Period Key         Product Key                               City
Units              Period Key                                State
Price              Units                                     Region
                   Price
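
A minimal sketch of two fact tables sharing one dimension is shown below, again assuming SQLite through Python's sqlite3 module and invented sample rows. Only the product dimension is modelled to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE sales_fact    (store_key INTEGER, product_key INTEGER,
                            period_key INTEGER, units INTEGER, price REAL);
CREATE TABLE shipping_fact (shipper_key INTEGER, store_key INTEGER,
                            product_key INTEGER, period_key INTEGER,
                            units INTEGER, price REAL);
""")

# Made-up rows, for illustration only.
conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                 [(10, "Soap"), (11, "Paste")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 100, 3, 2.5),
    (1, 10, 101, 4, 2.5),
    (1, 11, 100, 2, 3.0),
])
conn.executemany("INSERT INTO shipping_fact VALUES (?, ?, ?, ?, ?, ?)", [
    (7, 1, 10, 100, 10, 0.5),
    (7, 1, 11, 100, 5, 0.5),
])

# Each fact table is summed on its own and lined up through the shared
# product dimension. Joining the two fact tables to each other directly
# would multiply rows and inflate the totals.
sold_vs_shipped = conn.execute("""
    SELECT p.product_desc,
           (SELECT COALESCE(SUM(units), 0) FROM sales_fact
             WHERE product_key = p.product_key),
           (SELECT COALESCE(SUM(units), 0) FROM shipping_fact
             WHERE product_key = p.product_key)
    FROM product_dim p
    ORDER BY p.product_desc
""").fetchall()
```

The shared dimension is what makes the two business processes comparable: both sets of totals are keyed by the same product rows.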
Building a Data Warehouse
• Data Selection
• Data Preprocessing
  – Fill missing values
  – Remove inconsistency
• Data Transformation & Integration
• Data Loading
• Data in the warehouse is stored in the form of fact tables
  and dimension tables.
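
The build steps listed above can be strung together as a small pipeline. The sketch below is illustrative only: the raw rows, the "clerk" field, the rule of filling a missing unit count with 0 and the way surrogate keys are handed out are all assumptions chosen for the example.

```python
# Hypothetical raw rows from an operational system.
RAW_ROWS = [
    {"country": "India", "product": "soaps ", "month": "Mar", "units": 3, "clerk": "x"},
    {"country": "India", "product": "Soaps", "month": "Apr", "units": None, "clerk": "y"},
    {"country": "UK", "product": "PASTE", "month": "Jan", "units": 3, "clerk": "z"},
]
WANTED = ("country", "product", "month", "units")
DIMENSIONS = ("country", "product", "month")


def select(rows):
    """Data selection: keep only the fields useful for decision support."""
    return [{field: row[field] for field in WANTED} for row in rows]


def preprocess(rows):
    """Preprocessing: fill missing values and remove inconsistent spellings."""
    cleaned = []
    for row in rows:
        row = dict(row)
        if row["units"] is None:
            row["units"] = 0  # assumed fill policy for this sketch
        row["product"] = row["product"].strip().capitalize()
        cleaned.append(row)
    return cleaned


def transform_and_load(rows):
    """Replace text values with dimension keys and build the fact rows."""
    dims = {name: {} for name in DIMENSIONS}
    facts = []
    for row in rows:
        keys = []
        for name in DIMENSIONS:
            table = dims[name]
            # Reuse the existing key, or assign the next one for a new value.
            keys.append(table.setdefault(row[name], len(table) + 1))
        facts.append((*keys, row["units"]))
    return dims, facts


dimensions, facts = transform_and_load(preprocess(select(RAW_ROWS)))
```

The result has the shape the last bullet describes: small dimension tables mapping values to keys, and a fact table holding only keys and measures.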
Case Study
• Unilever is a new company which produces
  soaps, paste and beverage products with a
  production unit located at NA.
• Their products are sold in the North, North West and
  Western regions of India.
• They have sales units in India, America, UK and
  Japan.
• The President of the company wants sales
  information.
Sales Information

Report: The number of units sold.

113

Report: The number of units sold over time

January       February       March         April
14            41             33            25
Sales Information
Report: The number of items sold for each product over
time (dimensions: Product, Time)

             Jan   Feb   Mar   Apr
Soaps                    6     17
Paste        6     16    6     8
Bread        8     25    21
Sales Information
Report: The number of items sold in each country for each
product over time (dimensions: City, Product, Time)

                     Jan   Feb   Mar   Apr
India    Soaps                   3     10
         Paste       3     16    6
         Bread       4     16    6
UK       Soaps                   3     7
         Paste       3                 8
         Bread       4     9     15
Sales Information
Report: The number of items sold and income in each region for
each product over time (Rs = rupees, U = units).

                     Jan          Feb          Mar          Apr
                     Rs      U    Rs      U    Rs      U    Rs      U
India    Soaps                                 7.44    3    24.80   10
         Paste       7.95    3    42.40   16   15.90   6
         Bread       7.32    4    29.98   16   10.98   6
UK       Soaps                                 7.44    3    17.36   7
         Paste       7.95    3                              21.20   8
         Bread       7.32    4    16.47   9    27.45   15
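
The four reports are the same data summed at different levels of detail. The sketch below takes the unit counts from the country-by-product table and rolls them up; the `roll_up` helper and the tuple layout are choices made for this example, not something the slides define.

```python
from collections import defaultdict

# (country, product, month, units), copied from the report tables.
SALES = [
    ("India", "Soaps", "Mar", 3), ("India", "Soaps", "Apr", 10),
    ("India", "Paste", "Jan", 3), ("India", "Paste", "Feb", 16),
    ("India", "Paste", "Mar", 6),
    ("India", "Bread", "Jan", 4), ("India", "Bread", "Feb", 16),
    ("India", "Bread", "Mar", 6),
    ("UK", "Soaps", "Mar", 3), ("UK", "Soaps", "Apr", 7),
    ("UK", "Paste", "Jan", 3), ("UK", "Paste", "Apr", 8),
    ("UK", "Bread", "Jan", 4), ("UK", "Bread", "Feb", 9),
    ("UK", "Bread", "Mar", 15),
]
COLUMN = {"country": 0, "product": 1, "month": 2}


def roll_up(rows, *dims):
    """Sum units grouped by the named dimensions (none = grand total)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[COLUMN[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)


total_units = roll_up(SALES)[()]                       # first report
by_month = roll_up(SALES, "month")                     # second report
by_product_month = roll_up(SALES, "product", "month")  # third report
```

Dropping a dimension from the grouping moves one level up: the country-level rows sum to the product-by-month table, which sums to the monthly totals, which sum to the single figure of 113.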
Sales Measures & Dimensions

• Measures – Units sold, Amount.
• Dimensions – Product, Time, Region.
Sales Data Warehouse Model
Fact Table
Country      Product   Month      Units   Rupees
India        Soaps     January    3       7.95
India        Paste     January    4       7.32
UK           Soaps     January    3       7.95
UK           Paste     January    4       7.32
India        Bread     February   16      42.40
Sales Data Warehouse Model

  City_ID Prod_ID   Month      Units   Rupees
  1       589       1/1/1998   3       7.95
  1       1218      1/1/1998   4       7.32
  2       589       1/1/1998   3       7.95
  2       1218      1/1/1998   4       7.32
  1       589       2/1/1998   16      42.40
Sales Data Warehouse Model
Product Dimension Tables
  Prod_ID     Product_Name        Product_Category_ID
  589         Soaps               1
  590         Paste               1
  288         Bread               2


  Product_Category_Id Product_Category
  1                   domestic
  2                   food
Sales Data Warehouse Model
Region Dimension Table

City_ID       City       Region      Country

1             India      West        India
2             UK         NorthWest   India
Sales Data Warehouse Model

[Figure: the Sales Fact table at the centre, linked to the Time, Region and Product dimensions, with Product linked to Product Category.]
Data Warehousing includes
• Building the data warehouse
• Online analytical processing (OLAP)
• Presentation

[Figure: RDBMS and flat file sources go through cleaning, selection and integration into the warehouse and OLAP server, which serve the presentation layer and the client.]
Thank You

Data warehousing

  • 1. Introduction to Data Warehousing By MAHESH.AMPOLU
  • 2. Necessity is the mother of invention Why Data Warehouse?
  • 3. Scenario Unilever is a company with branches at UK, India, America and Japan. The Sales Manager wants quarterly sales report. Each branch has a separate operational system.
  • 4. Scenario 1 : Unilever company. India UK Sales per item type per branch Sales for first quarter. Manager America Japan
  • 5. Solution : Unilever company.  Extract sales information from each database.  Store the information in a common repository at a single site.
  • 6. Solution : Unilever company. India Report UK Query & Sales Data Analysis tools Manager Warehouse America Japan
  • 7. Scenario : Hindustan Unilever is a small,new company. President of the company wants his company should grow. He needs information so that he can make correct decisions.
  • 8. Solution :  Improve the quality of data before loading it into the warehouse.  Perform data cleaning and transformation before loading the data.  Use query analysis tools to support adhoc queries.
  • 9. Solution Expansio n sales Data Query and Analysis President Warehouse tool time Improvemen t
  • 10. What is Data Warehouse??
  • 11. Inmons’s definition : A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management’s decision making process.
  • 12. Subject-oriented: A data warehouse is organized around subjects such as sales, product and customer. It focuses on modeling and analysis of data for decision makers. It excludes data not useful in the decision support process.
  • 13. Integration: A data warehouse is constructed by integrating multiple heterogeneous sources. Data preprocessing is applied to ensure consistency. [Diagram: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the data warehouse.]
  • 14. Integration: Data is made consistent in terms of encoding structures, measurement of attributes, physical attributes of data, remarks, naming conventions and data type formats.
  • 15. Time-variant: Provides information from a historical perspective, e.g. the past 5-10 years. Every key structure contains, implicitly or explicitly, an element of time.
  • 16. Nonvolatile: Data once recorded cannot be updated. A data warehouse requires two data-access operations: initial loading of data and access of data.
  • 18. Data Warehouse Architecture: Data warehouse server (almost always a relational DBMS, rarely flat files); OLAP servers (to support and operate on multi-dimensional data structures); clients (query and reporting tools, analysis tools, data mining tools).
  • 20. Data Warehouse Schema: Star Schema, Fact Constellation Schema, Snowflake Schema.
  • 21. Star Schema: A star schema consists of at least one fact table and a number of dimension tables. The star schema is a highly recommended schema for SSAS cubes.
  • 22. Star Schema. Fact Table: Store Key, Product Key, Period Key, Units, Price. Store Dimension: Store Key, Store Name, City, State, Region. Time Dimension: Period Key, Year, Quarter, Month. Product Dimension: Product Key, Product Desc. Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
  • 24. Snowflake Schema: A variant of the star schema model. A single, large, central fact table and one or more tables for each dimension. Dimension tables are normalized, i.e. dimension table data is split into additional tables.
  • 25. Snowflake Schema. Fact Table: Store Key, Product Key, Period Key, Units, Price. Store Dimension: Store Key, Store Name, City Key. City Dimension: City Key, City, State, Region. Time Dimension: Period Key, Year, Quarter, Month. Product Dimension: Product Key, Product Desc. Drawbacks: time-consuming joins, slow report generation.
  • 26. Fact Constellation: Multiple fact tables share dimension tables. This schema is viewed as a collection of stars, hence it is called a galaxy schema or fact constellation. Sophisticated applications require such a schema.
  • 27. Fact Constellation. Sales Fact Table: Store Key, Product Key, Period Key, Units, Price. Shipping Fact Table: Shipper Key, Store Key, Product Key, Period Key, Units, Price. Product Dimension: Product Key, Product Desc. Store Dimension: Store Key, Store Name, City, State, Region.
  • 28. Fact Constellation. (Same diagram as slide 27.)
  • 29. Building a Data Warehouse: Data Selection; Data Preprocessing (fill missing values, remove inconsistency); Data Transformation & Integration; Data Loading. Data in the warehouse is stored in the form of fact tables and dimension tables.
  • 30. Case Study: Unilever is a new company which produces soaps, paste and beverage products, with its production unit located at NA. Their products are sold in the North, North West and Western regions of India. They have sales units in India, America, the UK and Japan. The President of the company wants sales information.
  • 31. Sales Information. Report: the number of units sold: 113. Report: the number of units sold over time: January 14, February 41, March 33, April 25.
  • 32. Sales Information. Report: the number of items sold for each product over time. Soaps: Mar 6, Apr 17. Paste: Jan 6, Feb 16, Mar 6, Apr 8. Bread: Jan 8, Feb 25, Mar 21. [Axes: Time, Product.]
  • 33. Sales Information. Report: the number of items sold in each country for each product over time. India: Soaps (Mar 3, Apr 10); Paste (Jan 3, Feb 16, Mar 6); Bread (Jan 4, Feb 16, Mar 6). UK: Soaps (Mar 3, Apr 7); Paste (Jan 3, Apr 8); Bread (Jan 4, Feb 9, Mar 15). [Axes: Time, Product, City.]
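  • 34. Sales Information. Report: the number of items sold (U) and income (Rs) in each region for each product over time. India: Soaps (Mar Rs 7.44 / 3 U, Apr Rs 24.80 / 10 U); Paste (Jan Rs 7.95 / 3 U, Feb Rs 42.40 / 16 U, Mar Rs 15.90 / 6 U); Bread (Jan Rs 7.32 / 4 U, Feb Rs 29.98 / 16 U, Mar Rs 10.98 / 6 U). UK: Soaps (Mar Rs 7.44 / 3 U, Apr Rs 17.36 / 7 U); Paste (Jan Rs 7.95 / 3 U, Apr Rs 21.20 / 8 U); Bread (Jan Rs 7.32 / 4 U, Feb Rs 16.47 / 9 U, Mar Rs 27.45 / 15 U).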
  • 35. Sales Measures & Dimensions. Measures: units sold, amount. Dimensions: product, time, region.
  • 36. Sales Data Warehouse Model. Fact Table (Country, Product, Month, Units, Rupees): (India, Soaps, January, 3, 7.95); (India, Paste, January, 4, 7.32); (UK, Soaps, January, 3, 7.95); (UK, Paste, January, 4, 7.32); (India, Bread, February, 16, 42.40).
  • 37. Sales Data Warehouse Model. Fact Table (City_ID, Prod_ID, Month, Units, Rupees): (1, 589, 1/1/1998, 3, 7.95); (1, 1218, 1/1/1998, 4, 7.32); (2, 589, 1/1/1998, 3, 7.95); (2, 1218, 1/1/1998, 4, 7.32); (1, 589, 2/1/1998, 16, 42.40).
  • 38. Sales Data Warehouse Model. Product Dimension Tables. Product (Prod_ID, Product_Name, Product_Category_ID): (589, Soaps, 1); (590, Paste, 1); (288, Bread, 2). Product Category (Product_Category_Id, Product_Category): (1, domestic); (2, food).
  • 39. Sales Data Warehouse Model. Region Dimension Table (City_ID, City, Region, Country): (1, India, West, India); (2, UK, NorthWest, India).
  • 40. Sales Data Warehouse Model. [Diagram: the Sales Fact table linked to the Time, Product and Region dimensions, with Product linked to Product Category.]
  • 41. Data Warehousing includes: building the data warehouse; online analytical processing (OLAP); presentation. [Diagram: RDBMS and flat file sources pass through cleaning, selection & integration into the warehouse & OLAP server, which serves the presentation client.]
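
The sales model on slides 37 to 40 can be sketched as a small star schema. What follows is only a minimal illustration, assuming an in-memory SQLite database; the table, column and function names (sales_fact, product_dim, region_dim, build_sales_star, units_by_city, units_by_product) are hypothetical, and the rows are copied from slides 37 to 39 as transcribed. Prod_ID 1218 appears in the fact rows but not in the product dimension of slide 38, so the product roll-up uses a LEFT JOIN rather than assuming a match.

```python
import sqlite3


def build_sales_star(conn):
    """Create one fact table plus its dimension tables and load the slide rows."""
    conn.executescript(
        """
        CREATE TABLE product_category_dim (
            product_category_id INTEGER PRIMARY KEY,
            product_category    TEXT
        );
        CREATE TABLE product_dim (
            prod_id             INTEGER PRIMARY KEY,
            product_name        TEXT,
            product_category_id INTEGER
        );
        CREATE TABLE region_dim (
            city_id INTEGER PRIMARY KEY,
            city    TEXT,
            region  TEXT,
            country TEXT
        );
        CREATE TABLE sales_fact (
            city_id INTEGER,
            prod_id INTEGER,
            month   TEXT,
            units   INTEGER,
            rupees  REAL
        );
        """
    )
    # Dimension rows as listed on slides 38 and 39.
    conn.executemany(
        "INSERT INTO product_category_dim VALUES (?, ?)",
        [(1, "domestic"), (2, "food")],
    )
    conn.executemany(
        "INSERT INTO product_dim VALUES (?, ?, ?)",
        [(589, "Soaps", 1), (590, "Paste", 1), (288, "Bread", 2)],
    )
    conn.executemany(
        "INSERT INTO region_dim VALUES (?, ?, ?, ?)",
        [(1, "India", "West", "India"), (2, "UK", "NorthWest", "India")],
    )
    # Fact rows as listed on slide 37 (dates kept as text).
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)",
        [
            (1, 589, "1/1/1998", 3, 7.95),
            (1, 1218, "1/1/1998", 4, 7.32),
            (2, 589, "1/1/1998", 3, 7.95),
            (2, 1218, "1/1/1998", 4, 7.32),
            (1, 589, "2/1/1998", 16, 42.40),
        ],
    )
    conn.commit()


def units_by_city(conn):
    """Roll the units measure up along the region dimension."""
    return conn.execute(
        """
        SELECT r.city, SUM(f.units)
        FROM sales_fact AS f
        JOIN region_dim AS r ON r.city_id = f.city_id
        GROUP BY r.city
        ORDER BY r.city
        """
    ).fetchall()


def units_by_product(conn):
    """Roll units up per product; unmatched product keys keep a NULL name."""
    return conn.execute(
        """
        SELECT f.prod_id, p.product_name, SUM(f.units)
        FROM sales_fact AS f
        LEFT JOIN product_dim AS p ON p.prod_id = f.prod_id
        GROUP BY f.prod_id, p.product_name
        ORDER BY f.prod_id
        """
    ).fetchall()


demo_conn = sqlite3.connect(":memory:")
build_sales_star(demo_conn)
print(units_by_city(demo_conn))
print(units_by_product(demo_conn))
```

Each report is a single join from the fact table to one dimension table, which is the "fewer physical joins" benefit listed for the star schema on slide 22. Normalizing region_dim further (for example, a separate country table) would turn this sketch into the snowflake variant.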

Editor's Notes

  1. We invent something only if there is a need for it. Today we are going to see what data warehousing is. The data warehouse evolved to satisfy certain needs, and we will look at some of those needs now.
  2. When we need to extract data from various sources, some may be maintained manually on paper and some on different legacy systems, so integrating the data is laborious work. Many systems provide DTS tools to convert data into the appropriate format and apply the necessary transformations.
  3. We need a subject-oriented, multidimensional data model for the data warehouse, which facilitates online analysis.
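
The integration work described in notes 2 and 3 can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's method: the two source layouts (INDIA_ROWS, UK_ROWS), their field names and encodings, and the helper names are all hypothetical, chosen to show differing naming conventions, encodings and data types being brought into one format. The unit counts reuse the January paste and bread figures from slide 33.

```python
from collections import defaultdict

# Hypothetical extract from one branch: month names, units stored as strings.
INDIA_ROWS = [
    {"product": "Paste", "month": "Jan", "units": "3"},
    {"product": "bread", "month": "Jan", "units": "4"},
]

# Hypothetical extract from another branch: different field names,
# upper-case product codes, and a year-month period encoding.
UK_ROWS = [
    {"item": "PASTE", "period": "1998-01", "qty": 3},
    {"item": "BREAD", "period": "1998-01", "qty": 4},
]

MONTH_NAMES = {"01": "Jan", "02": "Feb", "03": "Mar", "04": "Apr"}


def clean_india(row):
    """Map the first layout onto the common format."""
    return {
        "country": "India",
        "product": row["product"].strip().capitalize(),
        "month": row["month"],
        "units": int(row["units"]),
    }


def clean_uk(row):
    """Map the second layout onto the common format."""
    month_number = row["period"].split("-")[1]
    return {
        "country": "UK",
        "product": row["item"].strip().capitalize(),
        "month": MONTH_NAMES[month_number],
        "units": int(row["qty"]),
    }


def integrate(india_rows, uk_rows):
    """Clean and transform each source, then combine into one repository."""
    return [clean_india(r) for r in india_rows] + [clean_uk(r) for r in uk_rows]


def units_by_product_and_month(rows):
    """Aggregate the units measure over the product and time dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["product"], row["month"])] += row["units"]
    return dict(totals)


warehouse_rows = integrate(INDIA_ROWS, UK_ROWS)
print(units_by_product_and_month(warehouse_rows))
```

Once both sources share one naming convention and encoding, the same rows can be summed along any combination of dimensions, which is what makes the multidimensional reports on slides 31 to 34 possible.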