Optimizing the Design of Your Data Warehouse
Michael Wacey
CSC
mwacey@csc.com
Introduction

• Who am I?
  – Michael Wacey
  – Partner with CSC since 1986
  – Architected many large scale data warehouses
• What are we going to discuss today?
  – Motivation
  – Tools
  – Approach




Motivation

• Data Here, Data There, Data Everywhere
• Solutions
  – Architecture – the SAP approach – very hard to sustain and SAP cannot solve all
    problems
  – Data Integration – requires architecture on the boundaries and infrastructure, lots of
    infrastructure
  – Data Warehouse – Periodically collect the data and bring it all together for one or
    more purposes – the best bet for the foreseeable future
• Every solution is trying to answer the same question: how do we get this data to
  fit together?



Motivation

• Making data fit together is difficult
  – Local countries report numbers in their local (possibly multiple) currencies and there
    is no agreed-upon set of conversion rates
  – The Trust department would rather not share its data with finance
  – The current policy administration system has serious data quality issues; a new system
    is being built and is scheduled to go online in June 2011, but that date may be in
    jeopardy
• We need a way to collect and analyze all this knowledge about the
  data




Motivation

• A high level view:

    Accounting  ---+                          +--->  Customer Profitability
    Sales       ---+--->  Data Warehouse  ----+
    Marketing   ---+                          +--->  Sales Forecasts



• May help with scoping
• Each line could represent many files or feeds
• Each box could represent many applications
Motivation

• A detailed view:
    BEGIN
      SELECT ml.sequence, al.sequence, m.msgkey
        INTO mseq, aseq, mkey
        FROM mqseries.levelcodes ml, mqseries.messages m,
             mqseries.appctl a, mqseries.levelcodes al
       WHERE m.msglevel = ml.levelcodekey
         AND m.msgcode = inmsgcode
         AND a.msglevel = al.levelcodekey
         AND a.appctlkey = 1;
      IF sql%ROWCOUNT = 1 THEN
        IF aseq <= mseq THEN
          SELECT statuscodekey INTO sck
            FROM mqseries.statuscodes
           WHERE statuscode = 'n';
          INSERT INTO mqseries.msglog
            (msglogkey, msgkey, msgdata, msgstatus, msgsqlcode, msgsqlerrm)
          VALUES
            (mqseries.msgseq.nextval, mkey, inmsgdata, sck, inmsgsqlcode,
             SUBSTR(inmsgsqlerrm, 1, 4000));
          IF incommit = true THEN
            COMMIT;
          END IF;
        END IF;
      ELSE

• Too much detail to plan, analyze, and understand
• As usual, we have a forest and trees problem


Motivation

• What to do?
  – PowerPoint?
  – Visio?
  – ERwin?
• They all help, but none gives us the right picture
• We need a way to see the problem and the solution at the right
  level of detail




Motivation

• What is a data warehouse?
• It includes:
   – Sources of data
   – Processing of data
   – Storage of data – probably multiple times in different structures
   – Analytics
• Except for Analytics, these are either static views of data or
  dynamic processing of data
• ERwin DM is great for the static views of data; we just need a way to
  capture the dynamic processing


Motivation

• I have used many techniques to capture the dynamic processing
• Spreadsheets to capture data mapping (who hasn't?)
• Process flow diagrams in PowerPoint and Visio
• UML Diagrams in the IBM and Sparx tools
• They all worked to an extent but were hard to maintain and did not
  provide a leveling mechanism




Motivation

• Many years ago, I had used Data Flow Diagrams to describe
  systems under development
• They provided insight into the flow of data and leveling of those
  processes
• So, I tried that – first in Visio and later in ERwin PM
• The rest of this talk is an approach to using ERwin DM and ERwin
  PM together to model a Data Warehouse
• I have used this approach for the past five years and have found it very
  successful
• It provides information to both the user community and
  developers
The Tools

• ERwin Data Modeler
  – Used to model databases
  – Supports both Logical and Physical models
  – If needed, I create conceptual models in PowerPoint or Visio
  – Each model has to represent one type of database
  – But data warehouses use many: flat files, Oracle, SQL Server, cubes, etc.
  – I use a UDP (user-defined property) to record the actual type of an Entity/Table
  – For example, a table that represents a flat file would have that setting in a UDP
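
The UDP idea can be sketched outside ERwin as a small data structure. This is only an illustration under assumptions: the `Entity` class, the property name `storage_type`, and the entity names are hypothetical and are not ERwin's API or part of the original model.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical stand-in for an ERwin entity/table that carries
# user-defined properties (UDPs) as simple key/value pairs.
@dataclass
class Entity:
    name: str
    udps: dict = field(default_factory=dict)


def group_by_storage_type(entities, udp_name="storage_type", default="unspecified"):
    """Group entity names by the value of one UDP (assumed name: storage_type)."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.udps.get(udp_name, default)].append(entity.name)
    return dict(groups)


# Illustrative entities only; the names are not taken from a real model.
model = [
    Entity("COMMERCIAL_LOAN_FEED", {"storage_type": "Flat File"}),
    Entity("CUSTOMER_DIM", {"storage_type": "Oracle"}),
    Entity("PROFITABILITY_CUBE", {"storage_type": "Cube"}),
    Entity("LOAN_FACT", {"storage_type": "Oracle"}),
]

by_type = group_by_storage_type(model)
```

Grouping by the property makes it easy to see which parts of a single model are really files, relational tables, or cubes.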




The Tools

• ERwin Process Modeler (ERwin PM)
  – Previously called BPwin
  – Supports several diagram types
  – I have found only the Data Flow diagrams useful for the design of a data warehouse
  – The other diagrams could be used in analysis to understand how the data warehouse
    will be used




The Tools

• ERwin DM and ERwin PM
• There is a connection between the tools
• I have not used it extensively




The Tools

• Other Tools
  – These are minor but needed
  – PDF Viewer
  – Microsoft Excel
  – Microsoft Word




The Approach

• So, we have two tools to design a data warehouse
• ERwin DM will be used to design and document static data stores
• ERwin PM will be used to design the processing
• Let's take a look at an example and then discuss how it works




The Approach

• Start in ERwin PM
• Create a new model that is a data flow model
• First we will create a context model
• This will provide a view of the sources and uses of data
• On the left side, the sources of data are listed – using the external
  entity symbol
  – Sources can be Systems, Databases, People, etc.
• On the right hand side, the uses of data are listed – using the
  external entity symbol
  – Uses can be reports, cubes, analytics, data feeds, etc.
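
A context model of this kind can also be written down as plain data and checked for scope gaps. The sketch below is an assumption-laden illustration, not ERwin PM functionality: the classes `ExternalEntity` and `DataFlow` and the function `check_context` are hypothetical, and the three entities are borrowed from the deck's Customer Profitability example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalEntity:
    entity_id: str  # e.g. "E5"
    name: str
    role: str       # "source" (left side) or "use" (right side)


@dataclass(frozen=True)
class DataFlow:
    name: str
    origin: str
    target: str


CENTRAL = "A0"  # the single central process on a context diagram

entities = [
    ExternalEntity("E4", "Mortgages", "source"),
    ExternalEntity("E5", "Commercial Loans", "source"),
    ExternalEntity("E11", "Exception Report", "use"),
]
flows = [
    DataFlow("Mortgage Data", "E4", CENTRAL),
    DataFlow("Commercial Loan Data", "E5", CENTRAL),
    DataFlow("Exception Report Data", CENTRAL, "E11"),
]


def check_context(entities, flows, central=CENTRAL):
    """List entities that are not connected to the central process.

    A source must send at least one flow to the central process;
    a use must receive at least one flow from it.
    """
    problems = []
    for e in entities:
        if e.role == "source":
            ok = any(f.origin == e.entity_id and f.target == central for f in flows)
        else:
            ok = any(f.origin == central and f.target == e.entity_id for f in flows)
        if not ok:
            problems.append(f"{e.entity_id} {e.name}: no flow as {e.role}")
    return problems
```

An empty result from `check_context(entities, flows)` means every listed source and use is actually wired to the central process, which is the scoping question the context diagram is meant to answer.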


The Approach
Context diagram (node A-0, title: Customer Profitability)

Sources (external entities, left side) and the data each sends to A0 Customer Profitability:
  E1   Allocation Factors         Allocation Factors
  E2   Demand Deposit Accounts    Demand Deposit Data
  E3   Consumer Loans             Consumer Loan Data
  E4   Mortgages                  Mortgage Data
  E5   Commercial Loans           Commercial Loan Data
  E6   Treasury                   Treasury Data
  E7   Trust Accounts             Trust Data
  E8   Organization               Organization Data
  E9   General Ledger             General Ledger Data

Uses (external entities, right side) and the data each receives from A0:
  E11  Exception Report                Exception Report Data
  E12  Balancing Report                Balancing Report Data
  E13  Commercial Customer Analytics   Commercial Customer Data
  E14  Retail Customer Analytics       Retail Customer Data
The Approach

• The Context Diagram is a good start
• It sets the scope
• But does not provide any details about what is going to be done
• This comes in the next diagram – The details of the central process




The Approach

Level one diagram (node A0, title: Customer Profitability)

Activities:
  A1  Sourcing
  A2  Customer Profitability Calculation
  A3  Exception Output
  A4  Commercial BI
  A5  Retail BI
  A6  Balance Input and Output

Data stores:
  D1  Customer Profitability Staging
  D2  Customer Profitability Data Warehouse
  D3  Exceptions
  D4  Balancing Values

Data flows:
  Inputs from the context diagram: Allocation Factors, Commercial Loan Data, Mortgage Data,
    Consumer Loan Data, Demand Deposit Data, Organization Data, Trust Data, Treasury Data,
    General Ledger Data
  Internal: Validated Dimension Data, Validated Fact Data, Source Exceptions,
    Input Balance Values, Dimension Data for Calculation, Fact Data for Calculation,
    Calculation Exceptions, Calculation Balance Values, Customer Profitability Data,
    Commercial BI Data, Retail BI Data, Commercial Balancing Data, Retail Balancing Data
  Outputs to the context diagram: Exception Report Data, Commercial Customer Data,
    Retail Customer Data, Balancing Report Data
The Approach

• This level one diagram shows all the key components of the
  solution
• There is no magic formula for what should be included here
• At a minimum, there need to be sourcing, processing, and
  display/output activities
• In this case, there is one sourcing activity, one calculation, and four
  output activities
• Each can be broken down into more detail
• Let's look at the Commercial BI activity



The Approach




Decomposition of A4 (node A4, title: Commercial BI)

Activities:
  A4.1  Load Commercial Cube
  A4.3  Cube Provider
  A4.6  Commercial Profitability Reporting

Data store:
  D16  Commercial Profitability Cube

Data flows:
  Inputs: Commercial BI Data
  Internal: Data for Cube Out, Data for Cube In, Data for Reporting
  Outputs: Commercial Balancing Data, Commercial Customer Data
The Approach

• This decomposition can continue until you are comfortable
• I try to get to the point where one developer can implement it in
  one module
• At this point, we will have a series of diagrams that show the flow
  of data through the system
• The diagrams contain:
  – Activities
  – Data Stores (note that a single data store can be used on multiple diagrams)
  – Data Flows
  – External Entities
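
The leveling described above is a tree of activities, and the undecomposed leaves are the units a single developer would pick up. The following sketch shows that idea under assumptions: the `Activity` class and `leaf_activities` function are hypothetical, and the tree contains only the activities visible in this deck's diagrams.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    number: str  # e.g. "A4.1"
    name: str
    children: list = field(default_factory=list)


def leaf_activities(activity):
    """Return the activities that are not decomposed any further, depth first."""
    if not activity.children:
        return [activity]
    leaves = []
    for child in activity.children:
        leaves.extend(leaf_activities(child))
    return leaves


# Only the activities shown in the deck; a real model would have more levels.
a0 = Activity("A0", "Customer Profitability", [
    Activity("A1", "Sourcing"),
    Activity("A2", "Customer Profitability Calculation"),
    Activity("A3", "Exception Output"),
    Activity("A4", "Commercial BI", [
        Activity("A4.1", "Load Commercial Cube"),
        Activity("A4.3", "Cube Provider"),
        Activity("A4.6", "Commercial Profitability Reporting"),
    ]),
    Activity("A5", "Retail BI"),
    Activity("A6", "Balance Input and Output"),
])

work_items = [f"{a.number} {a.name}" for a in leaf_activities(a0)]
```

Once A4 is decomposed it drops out of `work_items` and its three children take its place, so the list always reflects the current lowest level of the model.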



The Approach

• Each of the diagram elements, except for the Data Flows, can be
  further modeled in ERwin DM
• This gives the developer a further level of detail of what is intended
• It also provides the physical names that will be used
• To maintain the mapping between the models, I use a naming
  convention for ERwin DM Subject Areas
• The convention is:
  – A01.01.01 – {Activity Name}
  – D01 – {Data Store Name}
  – E01 – {External Name}
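
The convention is mechanical enough to generate rather than type by hand. A minimal sketch, assuming element ids are written the way the process model shows them (`A4.1`, `D1`, `E5`); the function name `subject_area_name` is hypothetical and the two-digit padding is read off the `A01.01.01` pattern above.

```python
import re

# Prefix letter (Activity, Data store, External entity) followed by
# one or more dot-separated level numbers.
_ELEMENT_ID = re.compile(r"^([ADE])(\d+(?:\.\d+)*)$")


def subject_area_name(element_id, element_name):
    """Build a subject area name such as 'E05 – Commercial Loans'."""
    match = _ELEMENT_ID.match(element_id)
    if match is None:
        raise ValueError(f"unrecognized element id: {element_id!r}")
    prefix, levels = match.groups()
    # Zero-pad every level to two digits, as in A01.01.01.
    padded = ".".join(f"{int(level):02d}" for level in levels.split("."))
    return f"{prefix}{padded} – {element_name}"
```

For example, `subject_area_name("A4.1", "Load Commercial Cube")` yields `A04.01 – Load Commercial Cube`.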



The Approach

• Some examples for External Entities and Data Stores from the
  model above:
  – D01 – Customer Profitability Staging
  – E05 – Commercial Loans
• Each of these subject areas should have the portion of the data
  model relevant to it
• Note that these are just typical ER models
• They can represent more than just tables – for example, an external
  entity could be a flat file
• Below is an example – the E05 – Commercial Loans external entity


The Approach




[Figure: data model for the E05 – Commercial Loans subject area]
The Approach

• Next we need to look at the activities
• Because activities have a hierarchical numbering system, we need
  one for the subject areas
• We simply start with A and separate each level with a period
• For example, Combine Retail Loans is Activity 7 inside Activity 2, so the
  process model calls it A2.7 Combine Retail Loans
• The associated subject area will be:
  – A02.07 – Combine Retail Loans
• The data model will show the input and output entities and how
  they are processed
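
Going the other way, a subject area name can be mapped back to its place in the process model. This is a sketch under assumptions: the helper names are hypothetical, and the remark about sorting is one plausible benefit of the zero-padded form rather than a reason stated in the talk.

```python
def to_activity_number(subject_area):
    """Recover the process-model number and name from a subject area name."""
    code, _, name = subject_area.partition(" – ")
    levels = [str(int(level)) for level in code[1:].split(".")]
    return code[0] + ".".join(levels), name


def parent_number(activity_number):
    """Return the activity whose diagram contains this one ('A2.7' -> 'A2')."""
    if activity_number == "A0":
        return None  # A0 is the top of the hierarchy
    head, dot, _ = activity_number.rpartition(".")
    # First-level activities such as A2 appear on the A0 diagram.
    return head if dot else "A0"


# Zero-padded codes sort in hierarchy order with a plain string sort.
ordered = sorted(["A02.10", "A10", "A02.07", "A02"])
```

With unpadded numbers a plain string sort would put `A2.10` ahead of `A2.7`; the padded codes avoid that in any report that sorts subject areas alphabetically.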

The Approach




The Approach

• With the diagrams from ERwin DM and ERwin PM, plus the narrative in
  ERwin PM, the developer has all the information they need to
  implement a portion of the solution
• The diagrams and narratives are also accessible to technical users
• Twice, I have had the user community write papers to explain the
  details of specific areas of the ERwin PM model




The Approach

• Notes
  – Using ERwin DM we can quickly build detailed reports with diagrams and
    descriptions
  – The developers use these reports to track what they have to do
  – The Project Managers use these reports as an inventory for project planning
  – The ERwin PM reports are like a roadmap that ties everything together
  – It takes some effort to keep everything synchronized but it is well worth it
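
Part of the synchronization effort can be automated once both models are exported as lists. The sketch below assumes such exports exist in some form; the function `sync_report`, the dictionary keys, and the sample data are hypothetical, and data flows are left out because they are not modeled in ERwin DM in this approach.

```python
def sync_report(process_elements, subject_areas):
    """Compare process-model element codes with data-model subject area codes.

    process_elements: mapping of padded code (e.g. "D01") to element name.
    subject_areas: subject area names such as "D01 – Customer Profitability Staging".
    """
    pm_codes = set(process_elements)
    dm_codes = {name.split(" – ", 1)[0] for name in subject_areas}
    return {
        # In the process model but with no subject area yet.
        "missing_subject_area": sorted(pm_codes - dm_codes),
        # Subject areas that no longer match any process-model element.
        "orphan_subject_area": sorted(dm_codes - pm_codes),
    }


# Sample data for illustration only.
process_elements = {
    "A04.01": "Load Commercial Cube",
    "D01": "Customer Profitability Staging",
    "E05": "Commercial Loans",
}
subject_areas = [
    "D01 – Customer Profitability Staging",
    "E05 – Commercial Loans",
    "E07 – Trust Accounts",
]

report = sync_report(process_elements, subject_areas)
```

Here the report flags `A04.01` as still needing a subject area and `E07` as a subject area with no matching element, which is the kind of inventory a project manager could plan against.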




The Approach

• In Summary
  – A data warehouse is very much a store of data and a flow of data
  – ERwin DM and ERwin PM can model both of these areas
  – Use ERwin PM to decompose the solution
       • There is no right or best decomposition
       • Try it until it works
  – Use ERwin DM to model the internals of External Entities, Data Stores, and Activities
       • Tie the two models together through an appropriate naming convention
       • Do not worry if the entities model more than tables
  – The goal is to communicate with users and developers




Questions?




 PAGE 31

Mais conteúdo relacionado

Semelhante a Optimizing the design of your data warehouse 09222010

A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationKate Subramanian
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfAmmarAhmedSiddiqui2
 
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...Vincenzo Patruno
 
DB2 Web Query whats new
DB2 Web Query whats newDB2 Web Query whats new
DB2 Web Query whats newCOMMON Europe
 
New Business Intelligence
New Business IntelligenceNew Business Intelligence
New Business IntelligenceMAIA_1KEY
 
21MBA7078_Utkarshkumar_102.pptx
21MBA7078_Utkarshkumar_102.pptx21MBA7078_Utkarshkumar_102.pptx
21MBA7078_Utkarshkumar_102.pptxSanjay Meena
 
The Marriage of Finance and Operational Analytics
The Marriage of Finance and Operational Analytics The Marriage of Finance and Operational Analytics
The Marriage of Finance and Operational Analytics Emtec Inc.
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 
Infogix BCBS 239 Implementation Challenges
Infogix BCBS 239 Implementation ChallengesInfogix BCBS 239 Implementation Challenges
Infogix BCBS 239 Implementation ChallengesMichelle Genser
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceEdureka!
 
Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Vineet Chaturvedi
 
Sfdc user group good data012712(1)
Sfdc user group good data012712(1)Sfdc user group good data012712(1)
Sfdc user group good data012712(1)debm_madronasg
 

Semelhante a Optimizing the design of your data warehouse 09222010 (20)

Date Analysis .pdf
Date Analysis .pdfDate Analysis .pdf
Date Analysis .pdf
 
A Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence ApplicationA Data Warehouse And Business Intelligence Application
A Data Warehouse And Business Intelligence Application
 
Bi assignment
Bi assignmentBi assignment
Bi assignment
 
Exploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdfExploratory Data Analysis - Satyajit.pdf
Exploratory Data Analysis - Satyajit.pdf
 
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
Data Warehouse techniques on Intermediate Census and Demographic Statistics W...
 
CTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINALCTAM Making Analytics Actionable RJA FINAL
CTAM Making Analytics Actionable RJA FINAL
 
DB2 Web Query whats new
DB2 Web Query whats newDB2 Web Query whats new
DB2 Web Query whats new
 
VenkatSubbaReddy_Resume
VenkatSubbaReddy_ResumeVenkatSubbaReddy_Resume
VenkatSubbaReddy_Resume
 
New Business Intelligence
New Business IntelligenceNew Business Intelligence
New Business Intelligence
 
21MBA7078_Utkarshkumar_102.pptx
21MBA7078_Utkarshkumar_102.pptx21MBA7078_Utkarshkumar_102.pptx
21MBA7078_Utkarshkumar_102.pptx
 
The Marriage of Finance and Operational Analytics
The Marriage of Finance and Operational Analytics The Marriage of Finance and Operational Analytics
The Marriage of Finance and Operational Analytics
 
Resume
ResumeResume
Resume
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
Infogix BCBS 239 Implementation Challenges
Infogix BCBS 239 Implementation ChallengesInfogix BCBS 239 Implementation Challenges
Infogix BCBS 239 Implementation Challenges
 
Mastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligenceMastering in data warehousing & BusinessIintelligence
Mastering in data warehousing & BusinessIintelligence
 
big-data-anallytics.pptx
big-data-anallytics.pptxbig-data-anallytics.pptx
big-data-anallytics.pptx
 
Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand? Why do Data Warehousing & Business Intelligence go hand in hand?
Why do Data Warehousing & Business Intelligence go hand in hand?
 
Sfdc user group good data012712(1)
Sfdc user group good data012712(1)Sfdc user group good data012712(1)
Sfdc user group good data012712(1)
 
basis data 02.pptx
basis data 02.pptxbasis data 02.pptx
basis data 02.pptx
 
Complete unit ii notes
Complete unit ii notesComplete unit ii notes
Complete unit ii notes
 

Mais de ERwin Modeling

Zen of metadata 09212010
Zen of metadata 09212010Zen of metadata 09212010
Zen of metadata 09212010ERwin Modeling
 
Using ca e rwin modeling to asure data 09162010
Using ca e rwin modeling to asure data 09162010Using ca e rwin modeling to asure data 09162010
Using ca e rwin modeling to asure data 09162010ERwin Modeling
 
Staying relevant in todays changing dm environment 09282010
Staying relevant in todays changing dm environment 09282010Staying relevant in todays changing dm environment 09282010
Staying relevant in todays changing dm environment 09282010ERwin Modeling
 
Sneak peak ca e rwin data modeler r8 preview09222010
Sneak peak ca e rwin data modeler r8 preview09222010Sneak peak ca e rwin data modeler r8 preview09222010
Sneak peak ca e rwin data modeler r8 preview09222010ERwin Modeling
 
Monetizing data management 09162010
Monetizing data management 09162010Monetizing data management 09162010
Monetizing data management 09162010ERwin Modeling
 
Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010ERwin Modeling
 
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...Integrating data process a roundtrip modeling using e rwin data modeler_erwin...
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...ERwin Modeling
 
Effective capture of metadata using ca e rwin data modeler 09232010
Effective capture of metadata using ca e rwin data modeler 09232010Effective capture of metadata using ca e rwin data modeler 09232010
Effective capture of metadata using ca e rwin data modeler 09232010ERwin Modeling
 
Deciding to go cloud 09212010
Deciding to go cloud  09212010Deciding to go cloud  09212010
Deciding to go cloud 09212010ERwin Modeling
 
Data modeling for the business 09282010
Data modeling for the business  09282010Data modeling for the business  09282010
Data modeling for the business 09282010ERwin Modeling
 
Cust experience a practical guide 09152010
Cust experience a practical guide 09152010Cust experience a practical guide 09152010
Cust experience a practical guide 09152010ERwin Modeling
 
Creating enterprise standards 09302010
Creating enterprise standards 09302010Creating enterprise standards 09302010
Creating enterprise standards 09302010ERwin Modeling
 
Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010ERwin Modeling
 
Ca e rwin modeling global user communities_09232010 - webcast
Ca e rwin modeling global user communities_09232010 - webcastCa e rwin modeling global user communities_09232010 - webcast
Ca e rwin modeling global user communities_09232010 - webcastERwin Modeling
 
10 things to avoid in data model 09242010
10 things to avoid in data model 0924201010 things to avoid in data model 09242010
10 things to avoid in data model 09242010ERwin Modeling
 
5 physical data modeling blunders 09092010
5 physical data modeling blunders 090920105 physical data modeling blunders 09092010
5 physical data modeling blunders 09092010ERwin Modeling
 

Mais de ERwin Modeling (16)

Zen of metadata 09212010
Zen of metadata 09212010Zen of metadata 09212010
Zen of metadata 09212010
 
Using ca e rwin modeling to asure data 09162010
Using ca e rwin modeling to asure data 09162010Using ca e rwin modeling to asure data 09162010
Using ca e rwin modeling to asure data 09162010
 
Staying relevant in todays changing dm environment 09282010
Staying relevant in todays changing dm environment 09282010Staying relevant in todays changing dm environment 09282010
Staying relevant in todays changing dm environment 09282010
 
Sneak peak ca e rwin data modeler r8 preview09222010
Sneak peak ca e rwin data modeler r8 preview09222010Sneak peak ca e rwin data modeler r8 preview09222010
Sneak peak ca e rwin data modeler r8 preview09222010
 
Monetizing data management 09162010
Monetizing data management 09162010Monetizing data management 09162010
Monetizing data management 09162010
 
Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010Mastering your data with ca e rwin dm 09082010
Mastering your data with ca e rwin dm 09082010
 
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...Integrating data process a roundtrip modeling using e rwin data modeler_erwin...
Integrating data process a roundtrip modeling using e rwin data modeler_erwin...
 
Effective capture of metadata using ca e rwin data modeler 09232010
Effective capture of metadata using ca e rwin data modeler 09232010Effective capture of metadata using ca e rwin data modeler 09232010
Effective capture of metadata using ca e rwin data modeler 09232010
 
Deciding to go cloud 09212010
Deciding to go cloud  09212010Deciding to go cloud  09212010
Deciding to go cloud 09212010
 
Data modeling for the business 09282010
Data modeling for the business  09282010Data modeling for the business  09282010
Data modeling for the business 09282010
 
Cust experience a practical guide 09152010
Cust experience a practical guide 09152010Cust experience a practical guide 09152010
Cust experience a practical guide 09152010
 
Creating enterprise standards 09302010
Creating enterprise standards 09302010Creating enterprise standards 09302010
Creating enterprise standards 09302010
 
Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010Ca e rwin state of the union 09082010
Ca e rwin state of the union 09082010
 
Ca e rwin modeling global user communities_09232010 - webcast
Ca e rwin modeling global user communities_09232010 - webcastCa e rwin modeling global user communities_09232010 - webcast
Ca e rwin modeling global user communities_09232010 - webcast
 
10 things to avoid in data model 09242010
10 things to avoid in data model 0924201010 things to avoid in data model 09242010
10 things to avoid in data model 09242010
 
5 physical data modeling blunders 09092010
5 physical data modeling blunders 090920105 physical data modeling blunders 09092010
5 physical data modeling blunders 09092010
 

Optimizing the design of your data warehouse 09222010

  • 1. Optimizing the Design of your Data Warehouse Michael Wacey CSC mwacey@csc.com
  • 2. Introduction • Who am I? – Michael Wacey – Partner with CSC since 1986 – Architected many large scale data warehouses • What are we going to discuss today? – Motivation – Tools – Approach PAGE 2
  • 3. Motivation • Data Here, Data There, Data Everywhere • Solutions – Architecture – the SAP approach – very hard to sustain, and SAP cannot solve all problems – Data Integration – requires architecture on the boundaries and infrastructure, lots of infrastructure – Data Warehouse – periodically collect the data and bring it all together for one or more purposes – the best bet for the foreseeable future • Every solution is trying to answer the same question: how do we get this data to fit together? PAGE 3
  • 4. Motivation • Making data fit together is difficult – Countries report numbers in their local (possibly multiple) currencies, and there is no agreed-upon set of conversion rates – The Trust department would rather not share its data with finance – The current policy administration system has serious data quality issues; a new system is being built and is scheduled to go online in June 2011, but that date may be in jeopardy • We need a way to collect and analyze all this knowledge about the data PAGE 4
  • 5. Motivation • A high-level view: [Diagram: Accounting, Sales, and Marketing on the left; Data Warehouse in the center; Customer Profitability and Sales Forecasts on the right] • May help with scoping • Each line could represent many files or feeds • Each box could represent many applications PAGE 5
  • 6. Motivation • A detailed view: BEGIN SELECT ml.sequence, al.sequence, m.msgkey INTO mseq, aseq, mkey FROM mqseries.levelcodes ml, mqseries.messages m, mqseries.appctl a, mqseries.levelcodes al WHERE m.msglevel = ml.levelcodekey AND m.msgcode = inmsgcode AND a.msglevel = al.levelcodekey AND a.appctlkey = 1; IF sql%ROWCOUNT = 1 THEN IF aseq <= mseq THEN SELECT statuscodekey INTO sck FROM mqseries.statuscodes WHERE statuscode = 'n'; insert into mqseries.msglog (msglogkey, msgkey, msgdata, msgstatus, msgsqlcode, msgsqlerrm) values(mqseries.msgseq.nextval, mkey, inmsgdata, sck, inmsgsqlcode, SUBSTR(inmsgsqlerrm,1,4000)); IF incommit = true THEN commit; END IF; END IF; ELSE • Too much detail to plan and analyze and understand • As usual, we have a forest and trees problem PAGE 6
  • 7. Motivation • What to do? – PowerPoint? – Visio? – ERwin? • They all help, but none gives us the right picture • We need a way to see the problem and the solution at the right level of detail PAGE 7
  • 8. Motivation • What is a data warehouse? • It includes: – Sources of data – Processing of data – Storage of data – probably multiple times in different structures – Analytics • Except for Analytics, these are either static views of data or dynamic processing of data • ERwin DM is great for the static views of data; we just need a way to capture the dynamic processing PAGE 8
  • 9. Motivation • I have used many techniques to capture the dynamic processing • Spreadsheets to capture data mapping (who hasn’t) • Process flow diagrams in PowerPoint and Visio • UML Diagrams in the IBM and Sparx tools • They all worked to an extent but were hard to maintain and did not provide a leveling mechanism PAGE 9
  • 10. Motivation • Many years ago, I had used Data Flow Diagrams to describe systems under development • They provided insight into the flow of data and leveling of those processes • So, I tried that – first in Visio and later in ERwin PM • The rest of this talk is an approach to using ERwin DM and ERwin PM together to model a Data Warehouse • I have used this approach for the past five years and find it is very successful • It provides information to both the user community and developers PAGE 10
  • 11. The Tools • ERwin Data Modeler – Used to model databases – Supports both Logical and Physical models – If needed, I create conceptual models in PowerPoint or Visio – Each model has to represent one type of database – But data warehouses use many: flat files, Oracle, SQL Server, cubes, etc. – I use a UDP (User Defined Property) to record the actual type of an Entity/Table – For example, a table that represents a flat file would have that setting in a UDP PAGE 11
  • 12. The Tools • ERwin Process Modeler (ERwin PM) – Previously called BPwin – Supports several diagram types – I have only found the Data Flow diagrams useful for the design of a data warehouse – The other diagrams could be used in analysis to understand how the data warehouse will be used PAGE 12
  • 13. The Tools • ERwin DM and ERwin PM • There is a connection between the tools • I have not used it extensively PAGE 13
  • 14. The Tools • Other Tools – These are minor but needed – PDF Viewer – Microsoft Excel – Microsoft Word PAGE 14
  • 15. The Approach • So, we have two tools to design a data warehouse • ERwin DM will be used to design and document static data stores • ERwin PM will be used to design the processing • Let's take a look at an example and then discuss how it works PAGE 15
  • 16. The Approach • Start in ERwin PM • Create a new model that is a data flow model • First we will create a context model • This will provide a view of the sources and uses of data • On the left side, the sources of data are listed – using the external entity symbol – Sources can be Systems, Databases, People, etc. • On the right hand side, the uses of data are listed – using the external entity symbol – Uses can be reports, cubes, analytics, data feeds, etc. PAGE 16
  • 17. The Approach [Context diagram A-0, Customer Profitability. Sources (external entities): E1 Allocation Factors, E2 Demand Deposit Accounts, E3 Consumer Loans, E4 Mortgages, E5 Commercial Loans, E6 Treasury, E7 Trust Accounts, E8 Organization, E9 General Ledger. Central process: A0 Customer Profitability. Uses (external entities): E11 Exception Report, E12 Balancing Report, E13 Commercial Customer Analytics, E14 Retail Customer Analytics.] PAGE 17
  • 18. The Approach • The Context Diagram is a good start • It sets the scope • But does not provide any details about what is going to be done • This comes in the next diagram – The details of the central process PAGE 18
  • 19. The Approach [Level-one data flow diagram A0, Customer Profitability: activities A1 through A6 (including A4 Commercial BI) and data stores D1 through D4 (including D1 Customer Profitability Staging), connected by data flows that carry the source data through validation and calculation to the exception, balancing, commercial BI, and retail BI outputs.] PAGE 19
  • 20. The Approach • This level one diagram shows all the key components of the solution • There is no magic formula for what should be included here • There needs to be at least some sort of sourcing, processing, and display/output activity • In this case, there is one sourcing activity, one calculation, and four output activities • Each can be broken down into more detail • Let's look at the Commercial BI activity PAGE 20
  • 21. The Approach [Data flow diagram A4, Commercial BI: activities A4.1, A4.3, and A4.6 and data store D16, with Data for Commercial BI, Commercial Balancing Data, and Commercial Customer Data as inputs and commercial profitability reporting from the cube as the output.] PAGE 21
  • 22. The Approach • This decomposition can continue until you are comfortable • I try to get to the point where one developer can implement it in one module • At this point, we will have a series of diagrams that show the flow of data through the system • The diagrams contain: – Activities – Data Stores (note that a single data store can be used on multiple diagrams) – Data Flows – External Entities PAGE 22
  • 23. The Approach • Each of the diagram elements, except for the Data Flows, can be further modeled in ERwin DM • This gives the developer a further level of detail of what is intended • It also provides the physical names that will be used • To maintain the mapping between the models, I use a naming convention for ERwin DM Subject Areas • The convention is: – A01.01.01 – {Activity Name} – D01 – {Data Store Name} – E01 – {External Name} PAGE 23
  • 24. The Approach • Some examples for External Entities and Data Stores from the model above: – D01 – Customer Profitability Staging – E05 – Commercial Loans • Each of these subject areas should have the portion of the data model relevant to it • Note that these are just typical ER models • They can represent more than just tables – for example, an external entity could be a flat file • Below is an example – the E05 – Commercial Loans external entity PAGE 24
  • 26. The Approach • Next we need to look at the activities • Because activities have a hierarchical numbering system, we need a matching one for the subject areas • We simply start with A and separate each level with a period • Combine Retail Loans from the model above is Activity 7 inside Activity 2, so it is called A2.7 Combine Retail Loans in the model • The associated subject area will be: – A02.07 – Combine Retail Loans • The data model will show the input and output entities and how they are processed PAGE 26
  • 28. The Approach • With the Diagrams from ERwin DM, ERwin PM, and the narrative in ERwin PM, the developer has all the information they need to implement a portion of the solution • The diagrams and narratives are also accessible to technical users • Twice, I have had the user community write papers to explain the details of specific areas of the ERwin PM model PAGE 28
  • 29. The Approach • Notes – Using ERwin DM we can quickly build detailed reports with diagrams and descriptions – The developers use these reports to track what they have to do – The Project Managers use these reports as an inventory for project planning – The ERwin PM reports are like a roadmap that ties everything together – It takes some effort to keep everything synchronized but it is well worth it PAGE 29
  • 30. The Approach • In Summary – A data warehouse is very much a store of data and a flow of data – ERwin DM and ERwin PM can model both of these areas – Use ERwin PM to decompose the solution • There is no right or best decomposition • Try it until it works – Use ERwin DM to model the internals of External Entities, Data Stores, and Activities • Tie the two models together through an appropriate naming convention • Do not worry if the entities model more than tables – The goal is to communicate with users and developers PAGE 30
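
The naming convention that ties the two models together can be checked mechanically. The following is a minimal Python sketch, not part of the original talk. It assumes the element ids and names from the ERwin PM diagrams and the subject area names from ERwin DM have been copied by hand (or from reports) into plain Python collections; the function names `subject_area_name` and `missing_subject_areas` are hypothetical, and neither tool's API is used.

```python
import re

# Separator used in the slides' examples ("D01 – Customer Profitability Staging").
SEPARATOR = " – "


def subject_area_name(pm_id: str, element_name: str) -> str:
    """Build a subject area name from a process model element id.

    pm_id is the id shown on the data flow diagram, e.g. "A2.7", "D1" or "E5".
    Each numeric level is zero-padded to two digits, as in the convention
    A01.01.01 / D01 / E01 described in the slides.
    """
    match = re.fullmatch(r"([ADE])(\d+(?:\.\d+)*)", pm_id.strip())
    if match is None:
        raise ValueError(f"unrecognised element id: {pm_id!r}")
    prefix, levels = match.groups()
    padded = ".".join(f"{int(level):02d}" for level in levels.split("."))
    return f"{prefix}{padded}{SEPARATOR}{element_name}"


def missing_subject_areas(pm_elements: dict[str, str],
                          dm_subject_areas: set[str]) -> list[str]:
    """List subject areas the process model expects but the data model lacks."""
    expected = (subject_area_name(pm_id, name)
                for pm_id, name in pm_elements.items())
    return sorted(name for name in expected if name not in dm_subject_areas)


# Hand-entered sample data taken from the examples in the slides.
pm_elements = {
    "A2.7": "Combine Retail Loans",
    "D1": "Customer Profitability Staging",
    "E5": "Commercial Loans",
}
dm_subject_areas = {
    "D01 – Customer Profitability Staging",
    "E05 – Commercial Loans",
}

print(missing_subject_areas(pm_elements, dm_subject_areas))
# prints ['A02.07 – Combine Retail Loans']
```

Data flows are deliberately left out, since the approach does not model them in ERwin DM. A check like this only compares names; it says nothing about whether the contents of a subject area still match the diagram.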