Build :: Balance
AGENDA
   Data warehouse and BI overview
   Data warehouse Data Flow
   Staging Area
   Transformation
   Loading
   ETL tools
   Data Marts
   Business Intelligence (BI)
   OLAP
   BIG DATA
DATA WAREHOUSE AND BI OVERVIEW

•   A data warehouse is a database that is designed for query and analysis
    rather than for transaction processing. It usually contains historical data
    derived from transaction data, but it can include data from other sources. It
    separates analysis workload from transaction workload and enables an
    organization to consolidate data from several sources.
•   In addition to a relational database, a data warehouse environment includes
    an extraction, transformation, and loading (ETL) solution, an online
    analytical processing (OLAP) engine, client analysis tools, and other
    applications that manage the process of gathering data and delivering it to
    business users.
•   Business intelligence (BI) is defined as the ability for an organization to
    take all its capabilities and convert them into knowledge. This produces
    large amounts of information which can lead to the development of new
    opportunities for the organization.
A FEW KEY TERMS
•   Business Operation
•   Business Intelligence
•   Business Management
•   Operational System
•   Data Warehouse
•   Operational Data Store
•   Data Mart
•   Metadata Management
STEPS TO CREATE A DATA WAREHOUSE
•   Understand the business problem to be solved
•   Gather requirements
•   Determine appropriate end user technology to support the solution
•   Build a prototype
•   Develop data warehouse data model
•   Map the DW requirements based on the user’s requirement definitions
•   Generate ETL code
•   Test the DW
•   Once validated, move the data and code to production
SUBJECT

• Also referred to as a subject-oriented data warehouse.
• A subject is a data subject or major category of data relevant to the business.
• It is a subset of enterprise data and consists of related entities and relationships.
• Examples: Customers, Products, Sales, Geo
ENTITY
•   Defined as a person, place, thing, concept, or event in which an enterprise has both
    interest and capability to capture and store information
•   Primary entity – defined as an entity that does not depend on any other entity for its
    existence
•   Subtype entity – a logical division or category of a parent (supertype) entity.
    Example – Customers can be Wholesale customers and Retail customers. Both inherit
    the attributes of the parent entity.
•   Attributive entity - holds a group of data for an entity that can occur multiple times.
•   Associative entity - depends on two or more entities for its existence. For example, an
    Order depends on the Customer and the Items purchased.
•   Primary key – serves as the unique identifier for an entity and is used in the physical
    database to locate a record for storage or access
CHARACTERISTICS OF A PRIMARY KEY (PK)


• The key is never NULL
• The key is unique, and unique by design rather than by circumstance
• The key is persistent over time
• The key is manageable – it consists of integers and character strings,
  with no embedded symbols or odd characters
• The key should not contain any embedded intelligence
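Some of the characteristics above can be checked mechanically against a column of candidate key values. The following is a minimal sketch, not a definitive validator: the function name `key_problems` and the rule "letters and digits only" are assumptions made for illustration. Persistence over time and embedded intelligence cannot be judged from a snapshot of values, so they are not checked.

```python
import re

# Assumption: "no embedded symbols or odd characters" is read as
# "ASCII letters and digits only".
_ALLOWED = re.compile(r"[A-Za-z0-9]+")


def key_problems(values):
    """Return a list of problems found in a candidate primary key column."""
    problems = []
    # A primary key is never NULL (None in Python).
    if any(v is None for v in values):
        problems.append("contains NULL")
    non_null = [v for v in values if v is not None]
    # A primary key is unique.
    if len(set(non_null)) != len(non_null):
        problems.append("contains duplicates")
    # A primary key is manageable: integers and plain character strings.
    if any(not _ALLOWED.fullmatch(str(v)) for v in non_null):
        problems.append("contains symbols or odd characters")
    return problems


if __name__ == "__main__":
    print(key_problems([101, 125]))
    print(key_problems([101, 101, None]))
    print(key_problems(["A-1", "B2"]))
```

An empty list means none of the three checked rules was violated; it does not prove the column is a good key by design.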
RELATIONSHIP
•   A relationship documents the business rules associating two entities. It
    describes how the two entities are naturally linked to each other.
•   Example: Customers can place orders.
•   Cardinality - denotes the maximum number of occurrences of one entity that can relate to
    another entity. Usually expressed as “ONE” or “MANY”.
•   Identifying relationship – the child table cannot be uniquely
    identified without the parent
     •   Example...
         Account (AccountID, AccountNum, AccountTypeID)
         PersonAccount (AccountID, PersonID, Balance)
         Person(PersonID, Name)
     •   The Account to PersonAccount relationship and the Person to PersonAccount relationship are identifying because the child
         row (PersonAccount) cannot exist without having been defined in the parent (Account or Person). In other words, there is no
         PersonAccount when there is no Person or when there is no Account.

•   NON Identifying relationship - A non-identifying relationship is one where the child can be identified
    independently of the parent
     •   Example...
         Account( AccountID, AccountNum, AccountTypeID )
         AccountType( AccountTypeID, Code, Name, Description )
     •   The relationship between Account and AccountType is non-identifying because each Account row is identified by its own
         AccountID; AccountTypeID is an ordinary foreign key and is not part of the Account primary key.
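The two kinds of relationship can be sketched as table definitions. This is an illustrative sketch using Python's built-in `sqlite3` module; the table and column names come from the example above, while the column types, the helper names `build_db` and `can_insert`, and the sample values are assumptions. In the identifying case the parent keys are part of the child's primary key; in the non-identifying case the foreign key is an ordinary column.

```python
import sqlite3

# Column types are assumptions; the slides only list column names.
DDL = """
CREATE TABLE AccountType (
    AccountTypeID INTEGER PRIMARY KEY,
    Code TEXT, Name TEXT, Description TEXT
);
CREATE TABLE Account (
    AccountID INTEGER PRIMARY KEY,
    AccountNum TEXT,
    -- Non-identifying: the foreign key is not part of Account's primary key.
    AccountTypeID INTEGER REFERENCES AccountType(AccountTypeID)
);
CREATE TABLE Person (
    PersonID INTEGER PRIMARY KEY,
    Name TEXT
);
CREATE TABLE PersonAccount (
    AccountID INTEGER NOT NULL REFERENCES Account(AccountID),
    PersonID  INTEGER NOT NULL REFERENCES Person(PersonID),
    Balance   REAL,
    -- Identifying: both parent keys together form the child's primary key.
    PRIMARY KEY (AccountID, PersonID)
);
"""


def build_db():
    """Create an in-memory database with the four example tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DDL)
    return conn


def can_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = build_db()
    can_insert(db, "INSERT INTO Account VALUES (?, ?, ?)", (10, "A-10", None))
    # Rejected: there is no Person 99, so this PersonAccount cannot exist.
    print(can_insert(db, "INSERT INTO PersonAccount VALUES (?, ?, ?)", (10, 99, 5.0)))
```

Note that an Account row can be stored with no AccountTypeID at all and still be identified by AccountID, which is what makes that relationship non-identifying.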
NORMALIZATION
•   Normalization is the process of efficiently organizing data in a database. There are two
    goals of the normalization process: eliminating redundant data (for example, storing the
    same data in more than one table) and ensuring data dependencies make sense (only storing
    related data in a table). Both of these are worthy goals as they reduce the amount of
    space a database consumes and ensure that data is logically stored.


•   The database community has developed a series of guidelines for ensuring that
    databases are normalized. These are referred to as normal forms and are numbered from
    one (the lowest form of normalization, referred to as first normal form or 1NF) through
    three (third normal form or 3NF).
FIRST NORMAL FORM (1NF)

•   Eliminate duplicate columns from the same table.
•   Create separate tables for each group of related data and identify each row with a unique
    column or set of columns (the primary key).
•   The first rule dictates that we must not duplicate data within the same row of a table.
    Within the database community, this concept is referred to as the atomicity of a table.
    Tables that comply with this rule are said to be atomic.


•   Let’s explore this principle with a classic example – a table within a human resources
    database that stores the manager-subordinate relationship. For the purposes of our
    example, we’ll impose the business rule that each manager may have one or more
    subordinates while each subordinate may have only one manager.
Option 1: Make a determinant of the repeating group (or the multivalued
attribute) a part of the primary key.

STUDENT (composite primary key: Stud_ID + Course_ID)

  Stud_ID   Name      Course_ID   Units
  101       Lennon    MSI 250     3.00
  101       Lennon    MSI 415     3.00
  125       Johnson   MSI 331     3.00
Option 2: Remove the entire repeating group from the relation.
Create another relation which contains all the attributes of
the repeating group, plus the primary key from the first relation.
In this new relation, the primary key from the original relation
and the determinant of the repeating group together form the
primary key.

STUDENT (before)

  Stud_ID   Name      Course_ID   Units
  101       Lennon    MSI 250     3.00
  101       Lennon    MSI 415     3.00
  125       Johnson   MSI 331     3.00

STUDENT (after)

  Stud_ID   Name
  101       Lennon
  125       Johnson

STUDENT_COURSE (primary key: Stud_ID + Course)

  Stud_ID   Course    Units
  101       MSI 250   3.00
  101       MSI 415   3.00
  125       MSI 331   3.00
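Option 2 can be sketched in a few lines of Python. This is only an illustration: the nested input shape (one record per student with a `Courses` list as the repeating group) and the function name `split_repeating_group` are assumptions, since the slides show the data already flattened.

```python
# Assumed unnormalized input: each student carries a repeating group of courses.
UNNORMALIZED = [
    {"Stud_ID": 101, "Name": "Lennon",
     "Courses": [("MSI 250", 3.0), ("MSI 415", 3.0)]},
    {"Stud_ID": 125, "Name": "Johnson",
     "Courses": [("MSI 331", 3.0)]},
]


def split_repeating_group(students):
    """Split nested student records into STUDENT and STUDENT_COURSE rows."""
    student_rows = []         # (Stud_ID, Name), key: Stud_ID
    student_course_rows = []  # (Stud_ID, Course, Units), key: Stud_ID + Course
    for s in students:
        student_rows.append((s["Stud_ID"], s["Name"]))
        for course, units in s["Courses"]:
            # The original primary key travels with every repeating-group row.
            student_course_rows.append((s["Stud_ID"], course, units))
    return student_rows, student_course_rows


if __name__ == "__main__":
    student, student_course = split_repeating_group(UNNORMALIZED)
    print(student)
    print(student_course)
```

Each student now appears once in STUDENT, and every row of STUDENT_COURSE is identified by the student key combined with the course.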
SECOND NORMAL FORM (2NF)
•   Goal: Remove partial dependencies

STUDENT (composite primary key: Stud_ID + Course_ID)

  Stud_ID   Name      Course_ID   Units
  101       Lennon    MSI 250     3.00
  101       Lennon    MSI 415     3.00
  125       Johnson   MSI 331     3.00

Partial dependencies: Name depends only on Stud_ID, and Units depends only
on Course_ID.

The relation is decomposed into three relations:

STUDENT_COURSE

  Stud_ID   Course_ID
  101       MSI 250
  101       MSI 415
  125       MSI 331

STUDENT

  Stud_ID   Name
  101       Lennon
  125       Johnson

COURSE

  Course_ID   Units
  MSI 250     3.00
  MSI 415     3.00
  MSI 331     3.00
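The split into STUDENT, COURSE and STUDENT_COURSE is a projection of the original rows onto three column sets. The sketch below reproduces that projection and then joins the pieces back together to show that no information is lost. The helper names `project` and `rejoin` are assumptions introduced for illustration.

```python
# The flat STUDENT relation from the slide, keyed by (Stud_ID, Course_ID).
STUDENT_FLAT = [
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 250", "Units": 3.0},
    {"Stud_ID": 101, "Name": "Lennon", "Course_ID": "MSI 415", "Units": 3.0},
    {"Stud_ID": 125, "Name": "Johnson", "Course_ID": "MSI 331", "Units": 3.0},
]


def project(rows, columns):
    """Return the distinct tuples of the given columns, in first-seen order."""
    seen = []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.append(values)
    return seen


def rejoin(student, course, student_course):
    """Rebuild the flat rows by looking up Name and Units through the keys."""
    names = dict(student)  # Stud_ID -> Name
    units = dict(course)   # Course_ID -> Units
    return [
        {"Stud_ID": s, "Name": names[s], "Course_ID": c, "Units": units[c]}
        for s, c in student_course
    ]


if __name__ == "__main__":
    student = project(STUDENT_FLAT, ["Stud_ID", "Name"])
    course = project(STUDENT_FLAT, ["Course_ID", "Units"])
    student_course = project(STUDENT_FLAT, ["Stud_ID", "Course_ID"])
    print(student, course, student_course, sep="\n")
    print(rejoin(student, course, student_course) == STUDENT_FLAT)
```

Because Name is stored once per student and Units once per course, a change to either value touches a single row instead of every enrollment row.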
THIRD NORMAL FORM (3NF)

•   Goal: Get rid of transitive dependencies.

EMPLOYEE

  Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
  111      Mary     Jones    1         Acct
  122      Sarah    Smith    2         Mktg

Transitive dependency: Dept_Name depends on Dept_ID, a non-key attribute,
which in turn depends on the key Emp_ID.
THIRD NORMAL FORM (3NF)
• Remove the attributes, which are dependent on a non-key
  attribute, from the original relation. For each transitive
  dependency, create a new relation with the non-key attribute
  which is a determinant in the transitive dependency as a
  primary key, and the dependent non-key attribute as a
  dependent.

EMPLOYEE

  Emp_ID       F_Name       L_Name      Dept_ID Dept_Name
    111         Mary         Jones         1          Acct
    122         Sarah        Smith         2          Mktg
THIRD NORMAL FORM (3NF)
EMPLOYEE (before)

  Emp_ID   F_Name   L_Name   Dept_ID   Dept_Name
  111      Mary     Jones    1         Acct
  122      Sarah    Smith    2         Mktg

EMPLOYEE (after)

  Emp_ID   F_Name   L_Name   Dept_ID
  111      Mary     Jones    1
  122      Sarah    Smith    2

DEPARTMENT

  Dept_ID   Dept_Name
  1         Acct
  2         Mktg
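The EMPLOYEE and DEPARTMENT relations can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column types and the function names `build_3nf` and `employee_view` are illustrative, not part of the slides. A join rebuilds the original five-column view, and a department rename becomes a single-row update.

```python
import sqlite3


def build_3nf():
    """Create the two 3NF tables in memory and load the slide's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DEPARTMENT (
            Dept_ID   INTEGER PRIMARY KEY,
            Dept_Name TEXT
        );
        CREATE TABLE EMPLOYEE (
            Emp_ID  INTEGER PRIMARY KEY,
            F_Name  TEXT,
            L_Name  TEXT,
            Dept_ID INTEGER REFERENCES DEPARTMENT(Dept_ID)
        );
        INSERT INTO DEPARTMENT VALUES (1, 'Acct');
        INSERT INTO DEPARTMENT VALUES (2, 'Mktg');
        INSERT INTO EMPLOYEE VALUES (111, 'Mary', 'Jones', 1);
        INSERT INTO EMPLOYEE VALUES (122, 'Sarah', 'Smith', 2);
    """)
    return conn


def employee_view(conn):
    """Rebuild the original five-column EMPLOYEE rows with a join."""
    return conn.execute("""
        SELECT e.Emp_ID, e.F_Name, e.L_Name, e.Dept_ID, d.Dept_Name
        FROM EMPLOYEE AS e
        JOIN DEPARTMENT AS d ON d.Dept_ID = e.Dept_ID
        ORDER BY e.Emp_ID
    """).fetchall()


if __name__ == "__main__":
    db = build_3nf()
    print(employee_view(db))
    # The department name lives in one place, so one UPDATE renames it everywhere.
    db.execute("UPDATE DEPARTMENT SET Dept_Name = 'Accounting' WHERE Dept_ID = 1")
    print(employee_view(db))
```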
ZACHMAN FRAMEWORK FOR ENTERPRISE ARCHITECTURES
•   As you can see from Figure 4, there are 36 intersecting cells in a Zachman grid—one for each
    meeting point between a player's perspective (for example, business owner) and a descriptive
    focus (for example, data.). As we move horizontally (for example, left to right) in the grid, we
    see different descriptions of the system—all from the same player's perspective. As we move
    vertically in the grid (for example, top to bottom), we see a single focus, but change the player
    from whose perspective we are viewing that focus.
•   The first suggestion of the Zachman taxonomy is that every architectural artifact should live in
    one and only one cell. There should be no ambiguity about where a particular artifact lives. If it
    is not clear in which cell a particular artifact lives, there is most likely a problem with the artifact
    itself.
•   The second suggestion of the Zachman taxonomy is that an architecture can be considered
    a complete architecture only when every cell in that architecture is complete. A cell is complete
    when it contains sufficient artifacts to fully define the system for one specific player looking at
    one specific descriptive focus.
•   The third suggestion of the Zachman grid is that cells in columns should be related to each
    other. Consider, for example, the data column (the first column) of the Zachman grid. From the
    business owner's perspective, data is information about the business. From the
    database administrator's perspective, data is rows and columns in the database.
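The first two suggestions can be sketched as a small data structure. This is a rough illustration only. The six perspective labels and six focus labels below are assumptions based on the commonly cited Zachman layout (the text above names only the business owner, the database administrator and the data column), and the class `ZachmanGrid` is hypothetical. The sketch also treats a cell as complete once it holds at least one artifact, which is a simplification of "sufficient artifacts".

```python
# Assumed labels: 6 perspectives x 6 descriptive foci = 36 cells.
PERSPECTIVES = ["planner", "owner", "designer", "builder", "subcontractor", "enterprise"]
FOCI = ["data", "function", "network", "people", "time", "motivation"]


class ZachmanGrid:
    """Tracks which single cell each architectural artifact lives in."""

    def __init__(self):
        self._cell_of = {}  # artifact name -> (perspective, focus)

    def place(self, artifact, perspective, focus):
        """Put an artifact in a cell; refuse to let it live in two cells."""
        if perspective not in PERSPECTIVES or focus not in FOCI:
            raise ValueError("unknown cell: %s / %s" % (perspective, focus))
        current = self._cell_of.get(artifact)
        if current is not None and current != (perspective, focus):
            raise ValueError("%r already lives in cell %s" % (artifact, current))
        self._cell_of[artifact] = (perspective, focus)

    def empty_cells(self):
        """Return every cell that holds no artifact yet."""
        filled = set(self._cell_of.values())
        return [(p, f) for p in PERSPECTIVES for f in FOCI if (p, f) not in filled]

    def is_complete(self):
        """Simplified completeness: every one of the 36 cells has an artifact."""
        return not self.empty_cells()


if __name__ == "__main__":
    grid = ZachmanGrid()
    grid.place("logical data model", "designer", "data")
    print(len(grid.empty_cells()), grid.is_complete())
```

Listing the empty cells gives a simple view of which perspective and focus combinations still lack an artifact.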
ZACHMAN GRID
Five ways in which the Zachman grid can help in the development of an enterprise architecture:
•   Ensure that every stakeholder's perspective has been considered for every descriptive focal
    point.
•   Improve the client’s artifacts themselves by sharpening each of their focus points to one
    particular concern for one particular audience.
•   Ensure that all of the client’s business requirements can be traced down to some technical
    implementation.
•   Convince the client’s business side that the technical team isn't planning on building a bunch of
    useless functionality.
•   Convince the client’s technical side that the business folks are including the IT folks in their planning.
THE OPEN GROUP ARCHITECTURE FRAMEWORK (TOGAF)



•   The core of TOGAF is the Architecture Development Method (ADM)
•   TOGAF divides an enterprise architecture into four categories, as follows:


•   Business architecture—Describes the processes the business uses to meet its goals
•   Application architecture—Describes how specific applications are designed and how
    they interact with each other
•   Data architecture—Describes how the enterprise datastores are organized and accessed
•   Technical architecture—Describes the hardware and software infrastructure that
    supports applications and their interactions
•   Zachman tells you how to categorize your artifacts. TOGAF gives you a process for
    creating them.
DAY-TO-DAY EXPERIENCE OF CREATING AN ENTERPRISE ARCHITECTURE
WILL BE DRIVEN BY THE ADM

 A high-level view
PHASE A & PHASE B
•   The culmination of Phase A is a Statement of Architecture Work, which must be
    approved by the various stakeholders before the next phase of the ADM begins. The
    goal of this phase is to create an architectural vision for the first pass through the
    ADM cycle. The architect guides the client in choosing the project, validating the
    project against the architectural principles established in the Preliminary Phase, and
    ensuring that the appropriate stakeholders have been identified and their issues addressed.
•   The architectural vision created in Phase A is the main input into Phase B. The client’s
    goal in Phase B is to create a detailed baseline and target business architecture and
    perform a full analysis of the gaps between them.
•   Phase B is quite involved: it includes business modeling, highly detailed business
    analysis, and technical-requirements documentation. A successful Phase B requires input
    from many stakeholders. The major outputs are a detailed description of the baseline
    and target business objectives, and gap descriptions of the business architecture.
PHASE C
•   Develop baseline data-architecture description
•   Review and validate principles, reference models, viewpoints, and tools
•   Create architecture models, including logical data models, data-management process models, and
    relationship models that map business functions to CRUD (Create, Read, Update, Delete) data operations
•   Select data-architecture building blocks
•   Conduct formal checkpoint reviews of the architecture model and building blocks with stakeholders
•   Review qualitative criteria (for example, performance, reliability, security, integrity)
•   Complete data architecture
•   Conduct checkpoint/impact analysis
•   Perform gap analysis


• The most important deliverable from this phase will be the Target Information
  and Applications Architecture.
PHASE D & PHASE E
•   Phase D completes the technical architecture—the infrastructure necessary to support the
    proposed new architecture. This phase is completed mostly by engaging with Client’s
    infrastructure and technical team.


•   Phase E evaluates the various implementation possibilities, identifies the major
    implementation projects that might be undertaken, and evaluates the business opportunity
    associated with each. The TOGAF standard recommends that the client’s first pass at Phase
    E "focus on projects that will deliver short-term payoffs and so create an impetus for
    proceeding with longer-term projects."
•   A good starting place to look for such projects is the organizational pain points that
    initially convinced the client’s CEO to adopt an enterprise-architecture-based strategy.
PHASE F, PHASE G & PHASE H
•   Phase F is closely related to Phase E. In this phase, the architect works with the client’s
    governance body to sort the projects identified in Phase E into a priority order that reflects
    not only the costs and benefits (identified in Phase E), but also the risk factors.
•   In Phase G, the client takes the prioritized list of projects and creates architectural
    specifications for the implementation projects. These specifications include
    acceptance criteria and lists of risks and issues.
•   The final phase is H. In this phase, the client updates the architectural change-management
    process with any new artifacts created in this last iteration and with new information that
    becomes available.
INFORMATION ENGINEERING (IE) NOTATION
Data Warehousing and BI - Recruitment POV
Protection of Children in context of IHL and Counter Terrorism
 
Black and White Minimalist Co Letter.pdf
Black and White Minimalist Co Letter.pdfBlack and White Minimalist Co Letter.pdf
Black and White Minimalist Co Letter.pdf
 
Navigating the Data Economy: Transforming Recruitment and Hiring
Navigating the Data Economy: Transforming Recruitment and HiringNavigating the Data Economy: Transforming Recruitment and Hiring
Navigating the Data Economy: Transforming Recruitment and Hiring
 
美国SU学位证,雪城大学毕业证书1:1制作
美国SU学位证,雪城大学毕业证书1:1制作美国SU学位证,雪城大学毕业证书1:1制作
美国SU学位证,雪城大学毕业证书1:1制作
 
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一 定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
定制(UOIT学位证)加拿大安大略理工大学毕业证成绩单原版一比一
 
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort ServiceYoung Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
Young Call~Girl in Pragati Maidan New Delhi 8448380779 Full Enjoy Escort Service
 
Issues in the Philippines (Unemployment and Underemployment).pptx
Issues in the Philippines (Unemployment and Underemployment).pptxIssues in the Philippines (Unemployment and Underemployment).pptx
Issues in the Philippines (Unemployment and Underemployment).pptx
 

Data Warehousing and BI - Recruitment POV

  • 2. AGENDA  Data warehouse and BI overview  Data warehouse Data Flow  Staging Area  Transformation  Loading  ETL tools  Data Marts  Business Intelligence (BI)  OLAP  BIG DATA
  • 3. DATA WAREHOUSE AND BI OVERVIEW • A data warehouse is a database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. • In addition to a relational database, a data warehouse environment includes an extraction, transformation, and loading (ETL) solution, an online analytical processing (OLAP) engine, client analysis tools, and other applications that manage the process of gathering data and delivering it to business users. • Business intelligence (BI) is defined as the ability for an organization to take all its capabilities and convert them into knowledge. This produces large amounts of information which can lead to the development of new opportunities for the organization.
  • 4. FEW KEY IMPORTANT WORDS • Business Operation • Business Intelligence • Business Management • Operational System • Data Warehouse • Operational Data store • Data Mart • Meta Data Management
  • 5. STEPS TO CREATE A DATAWAREHOUSE • Understand the business problem to be solved • Gather requirements • Determine appropriate end user technology to support the solution • Build a prototype • Develop the data warehouse data model • Map the DW requirements based on the user’s requirement definitions • Generate ETL code • Test the DW • Once validated, move the data and code to Production
  • 6. SUBJECT • Referred to as a subject-oriented data warehouse • Subject refers to a data subject or major category of data relevant to the business. • A subset of enterprise data that consists of related entities and relationships. • Examples: Customers, Products, Sales, Geo
  • 7. ENTITY • Defined as a person, place, thing, concept, or event in which an enterprise has both the interest and the capability to capture and store information • Primary entity – an entity that does not depend on any other entity for its existence • Subtype entity – a logical division or category of a parent (supertype) entity. Example: Customers can be Wholesale customers or Retail customers. Both inherit the attributes of the parent entity. • Attribute – handles a group of data for an entity that can occur multiple times. • Associative entity – depends on two or more entities for its existence. For example, Orders consists of Customer and Items purchased. • Primary key – serves as the unique identifier for an entity and is used in the physical database to locate a record for storage or access
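The supertype/subtype idea on this slide can be sketched in Python. This is only an illustration, not part of the original deck: the class names follow the slide's Wholesale/Retail example, while the fields `credit_limit` and `loyalty_points` and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    """Parent (supertype) entity: attributes shared by every customer."""
    customer_id: int  # primary key
    name: str


@dataclass
class WholesaleCustomer(Customer):
    """Subtype: inherits customer_id and name, adds its own attribute."""
    credit_limit: float = 0.0  # hypothetical wholesale-only attribute


@dataclass
class RetailCustomer(Customer):
    """Subtype: inherits customer_id and name, adds its own attribute."""
    loyalty_points: int = 0  # hypothetical retail-only attribute


wholesale = WholesaleCustomer(customer_id=1, name="Acme Ltd", credit_limit=5000.0)
retail = RetailCustomer(customer_id=2, name="Jane Doe", loyalty_points=120)
```

Both subtypes carry the parent's attributes, which mirrors the statement that subtypes inherit the attributes of the parent entity.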
  • 8. CHARACTERISTICS OF A PRIMARY KEY (PK) • The key is never NULL • The key is unique, and unique by design rather than by circumstance • The key is persistent over time • The key is manageable – it consists of integers and character strings, with no embedded symbols or odd characters • The key should not contain any embedded intelligence
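Three of these characteristics (never NULL, unique, manageable) can be checked mechanically against a column of candidate key values. The sketch below is one possible check, assuming "manageable" means letters and digits only; the function name `primary_key_problems` is hypothetical. Persistence over time and the absence of embedded intelligence are design decisions and cannot be verified from the values alone.

```python
import re

# Assumed rule for "manageable": letters and digits only, no symbols.
_MANAGEABLE = re.compile(r"^[A-Za-z0-9]+$")


def primary_key_problems(values):
    """Return a list of rule violations for a column of candidate key values."""
    problems = []
    if any(v is None for v in values):
        problems.append("contains NULL")
    present = [v for v in values if v is not None]
    if len(set(present)) != len(present):
        problems.append("not unique")
    if any(not _MANAGEABLE.match(str(v)) for v in present):
        problems.append("contains symbols or odd characters")
    return problems


good = primary_key_problems([101, 102, 125])
bad = primary_key_problems([101, 101, None, "A#7"])
```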
  • 9. RELATIONSHIP • A relationship documents the business rules associating two entities. It describes how the two entities are naturally linked to each other. • Example: Customers can place orders. • Cardinality – denotes the maximum number of occurrences of one entity that can relate to another entity. Usually expressed as “ONE” or “MANY” • Identifying relationship – the child table cannot be uniquely identified without the parent • Example: Account (AccountID, AccountNum, AccountTypeID), PersonAccount (AccountID, PersonID, Balance), Person (PersonID, Name) • The Account to PersonAccount relationship and the Person to PersonAccount relationship are identifying because the child row (PersonAccount) cannot exist without having been defined in the parent (Account or Person). In other words, there is no PersonAccount when there is no Person or no Account. • Non-identifying relationship – the child can be identified independently of the parent • Example: Account (AccountID, AccountNum, AccountTypeID), AccountType (AccountTypeID, Code, Name, Description) • The relationship between Account and AccountType is non-identifying because each Account is identified by its own AccountID, independently of the AccountType it refers to.
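The two kinds of relationship can be sketched with Python's built-in `sqlite3` module. The table and column names come from the slide; the column types, sample rows, and the `orphan_rejected` flag are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE AccountType (
    AccountTypeID INTEGER PRIMARY KEY,
    Code TEXT, Name TEXT, Description TEXT
);
CREATE TABLE Account (
    AccountID INTEGER PRIMARY KEY,
    AccountNum TEXT,
    -- Non-identifying: the FK is an ordinary column, not part of Account's key.
    AccountTypeID INTEGER REFERENCES AccountType(AccountTypeID)
);
CREATE TABLE Person (
    PersonID INTEGER PRIMARY KEY,
    Name TEXT
);
CREATE TABLE PersonAccount (
    AccountID INTEGER NOT NULL REFERENCES Account(AccountID),
    PersonID INTEGER NOT NULL REFERENCES Person(PersonID),
    Balance REAL,
    -- Identifying: both parent keys together form the child's primary key.
    PRIMARY KEY (AccountID, PersonID)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO AccountType VALUES (1, 'CHK', 'Checking', 'Everyday account')")
conn.execute("INSERT INTO Account VALUES (10, 'ACC-10', 1)")
conn.execute("INSERT INTO Person VALUES (7, 'Mary Jones')")
conn.execute("INSERT INTO PersonAccount VALUES (10, 7, 250.0)")

# A PersonAccount row that points at a non-existent Person is rejected.
try:
    conn.execute("INSERT INTO PersonAccount VALUES (10, 99, 0.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

In the identifying case the parent keys are part of the child's primary key; in the non-identifying case the foreign key is just another column of the child.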
  • 10. NORMALIZATION • Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored. • The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through three (third normal form or 3NF).
  • 11. FIRST NORMAL FORM (1NF) • Eliminate duplicative columns from the same table. • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key). • The first rule dictates that we must not duplicate data within the same row of a table. Within the database community, this concept is referred to as the atomicity of a table. Tables that comply with this rule are said to be atomic. • Let’s explore this principle with a classic example – a table within a human resources database that stores the manager-subordinate relationship. For the purposes of our example, we’ll impose the business rule that each manager may have one or more subordinates while each subordinate may have only one manager.
  • 12. Option STUDENT 1: Make a determinant of the repeating group (or the multivalued attribute) a part of the Stud_ID Name Course_ID Units primary key. 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00 Composite Primary Key STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00
  • 13. Option 1: Make a of the repeating determinant group (or the multivalued attribute) a part of Composite the primary key. Primary Key STUDENT Stud_ID Name Course_ID Units 101 Lennon MSI 250 3.00 101 Lennon MSI 415 3.00 125 Johnson MSI 331 3.00
  • 14. Option 2: Remove the entire repeating group from the relation. Create another relation which would contain all the attributes of the repeating group, plus the primary key from the first relation. In this new relation, the primary key from the original relation and the determinant of the repeating group will comprise a primary key.
      STUDENT
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  • 15. STUDENT
      Stud_ID  Name
      101      Lennon
      125      Johnson
      STUDENT_COURSE
      Stud_ID  Course   Units
      101      MSI 250  3
      101      MSI 415  3
      125      MSI 331  3
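Option 2 can be sketched in Python. The starting structure below, where each student record holds a list of courses, is a hypothetical un-normalized form chosen for illustration; the slides only show the flattened tables. The function name `to_first_normal_form` is likewise an assumption.

```python
# Hypothetical un-normalized input: each student holds a repeating group of courses.
unnormalized = [
    {"Stud_ID": 101, "Name": "Lennon",
     "Courses": [("MSI 250", 3.00), ("MSI 415", 3.00)]},
    {"Stud_ID": 125, "Name": "Johnson",
     "Courses": [("MSI 331", 3.00)]},
]


def to_first_normal_form(records):
    """Split the repeating group into its own relation (Option 2)."""
    student = []         # STUDENT(Stud_ID, Name)
    student_course = []  # STUDENT_COURSE(Stud_ID, Course, Units)
    for rec in records:
        student.append((rec["Stud_ID"], rec["Name"]))
        for course, units in rec["Courses"]:
            # Primary key of the new relation: (Stud_ID, Course).
            student_course.append((rec["Stud_ID"], course, units))
    return student, student_course


student, student_course = to_first_normal_form(unnormalized)
```

The new relation carries the original primary key (Stud_ID) plus the determinant of the repeating group (the course), and the pair identifies each row.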
  • 16. SECOND NORMAL FORM (2NF) • Goal: Remove partial dependencies
      STUDENT (composite primary key: Stud_ID, Course_ID; partial dependencies: Name depends only on Stud_ID, Units only on Course_ID)
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
  • 17. STUDENT (before decomposition)
      Stud_ID  Name     Course_ID  Units
      101      Lennon   MSI 250    3.00
      101      Lennon   MSI 415    3.00
      125      Johnson  MSI 331    3.00
      STUDENT_COURSE
      Stud_ID  Course_ID
      101      MSI 250
      101      MSI 415
      125      MSI 331
      STUDENT
      Stud_ID  Name
      101      Lennon
      125      Johnson
      COURSE
      Course_ID  Units
      MSI 250    3.00
      MSI 415    3.00
      MSI 331    3.00
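The 2NF decomposition can be sketched in plain Python, using the rows from the slide. This is an illustrative sketch, assuming Name depends only on Stud_ID and Units only on Course_ID; the variable names are hypothetical.

```python
# STUDENT relation in 1NF with composite key (Stud_ID, Course_ID).
rows = [
    (101, "Lennon", "MSI 250", 3.00),
    (101, "Lennon", "MSI 415", 3.00),
    (125, "Johnson", "MSI 331", 3.00),
]

# Each partially dependent attribute moves to a relation keyed by the part
# of the composite key it actually depends on.
student = sorted({(sid, name) for sid, name, _, _ in rows})
course = sorted({(cid, units) for _, _, cid, units in rows})
student_course = sorted({(sid, cid) for sid, _, cid, _ in rows})

# Joining the three relations back together recovers the original rows,
# which shows the decomposition loses no information.
names = dict(student)
units_by_course = dict(course)
rejoined = [
    (sid, names[sid], cid, units_by_course[cid])
    for sid, cid in student_course
]
```

After the split, the student's name is stored once per student instead of once per enrolment.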
  • 18. THIRD NORMAL FORM (3NF) • Goal: Get rid of transitive dependencies.
      EMPLOYEE (transitive dependency: Emp_ID -> Dept_ID -> Dept_Name)
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
  • 19. THIRD NORMAL FORM (3NF) • Remove the attributes that depend on a non-key attribute from the original relation. For each transitive dependency, create a new relation with the non-key attribute which is a determinant in the transitive dependency as a primary key, and the dependent non-key attribute as a dependent.
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
  • 20. THIRD NORMAL FORM (3NF)
      EMPLOYEE (before decomposition)
      Emp_ID  F_Name  L_Name  Dept_ID  Dept_Name
      111     Mary    Jones   1        Acct
      122     Sarah   Smith   2        Mktg
      EMPLOYEE
      Emp_ID  F_Name  L_Name  Dept_ID
      111     Mary    Jones   1
      122     Sarah   Smith   2
      DEPARTMENT
      Dept_ID  Dept_Name
      1        Acct
      2        Mktg
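The 3NF result can be tried out with Python's built-in `sqlite3` module. Table names, column names, and rows are taken from the slide; the column types and the rename to 'Accounting' are assumptions added to show why the split helps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (
    Dept_ID INTEGER PRIMARY KEY,
    Dept_Name TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    Emp_ID INTEGER PRIMARY KEY,
    F_Name TEXT, L_Name TEXT,
    Dept_ID INTEGER REFERENCES DEPARTMENT(Dept_ID)
);
INSERT INTO DEPARTMENT VALUES (1, 'Acct'), (2, 'Mktg');
INSERT INTO EMPLOYEE VALUES (111, 'Mary', 'Jones', 1), (122, 'Sarah', 'Smith', 2);
""")

# Dept_Name now lives in one place, so renaming a department is a single update.
conn.execute("UPDATE DEPARTMENT SET Dept_Name = 'Accounting' WHERE Dept_ID = 1")

# The original five-column view is still available through a join.
employee_view = conn.execute("""
    SELECT e.Emp_ID, e.F_Name, e.L_Name, e.Dept_ID, d.Dept_Name
    FROM EMPLOYEE AS e JOIN DEPARTMENT AS d ON d.Dept_ID = e.Dept_ID
    ORDER BY e.Emp_ID
""").fetchall()
```

Because Dept_Name depends on Dept_ID rather than on Emp_ID, keeping it in DEPARTMENT avoids repeating (and possibly contradicting) the name on every employee row.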
  • 21. ZACHMAN FRAMEWORK FOR ENTERPRISE ARCHITECTURES
  • 22. ZACHMAN FRAMEWORK FOR ENTERPRISE ARCHITECTURES • As you can see from Figure 4, there are 36 intersecting cells in a Zachman grid, one for each meeting point between a player's perspective (for example, business owner) and a descriptive focus (for example, data). As we move horizontally (for example, left to right) in the grid, we see different descriptions of the system, all from the same player's perspective. As we move vertically in the grid (for example, top to bottom), we see a single focus, but change the player from whose perspective we are viewing that focus. • The first suggestion of the Zachman taxonomy is that every architectural artifact should live in one and only one cell. There should be no ambiguity about where a particular artifact lives. If it is not clear in which cell a particular artifact lives, there is most likely a problem with the artifact itself. • The second suggestion of the Zachman taxonomy is that an architecture can be considered a complete architecture only when every cell in that architecture is complete. A cell is complete when it contains sufficient artifacts to fully define the system for one specific player looking at one specific descriptive focus. • The third suggestion of the Zachman grid is that cells in columns should be related to each other. Consider, for example, the data column (the first column) of the Zachman grid. From the business owner's perspective, data is information about the business. From the database administrator's perspective, data is rows and columns in the database.
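The first two suggestions can be sketched as a small Python model of the grid. The row and column labels below are commonly cited Zachman labels and are an assumption here, since the slide names only "business owner" and "data". The artifact names are hypothetical, and "complete" is simplified to "holds at least one artifact".

```python
from itertools import product

# Assumed labels: the slide itself names only "business owner" and "data".
PERSPECTIVES = ["planner", "owner", "designer", "builder", "subcontractor", "enterprise"]
FOCUSES = ["data", "function", "network", "people", "time", "motivation"]


def empty_grid():
    """One cell per (perspective, focus) pair: 6 x 6 = 36 cells."""
    return {cell: [] for cell in product(PERSPECTIVES, FOCUSES)}


def place(grid, artifact, perspective, focus):
    """Suggestion 1: an artifact may live in one and only one cell."""
    if any(artifact in artifacts for artifacts in grid.values()):
        raise ValueError(f"{artifact!r} is already placed in a cell")
    grid[(perspective, focus)].append(artifact)


def incomplete_cells(grid):
    """Suggestion 2 (simplified): the architecture is complete when no cell is empty."""
    return [cell for cell, artifacts in grid.items() if not artifacts]


grid = empty_grid()
place(grid, "business glossary", "owner", "data")        # hypothetical artifact
place(grid, "physical table design", "builder", "data")  # hypothetical artifact
```

Listing the incomplete cells gives a simple view of how far an architecture is from the "every cell complete" condition.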
  • 23. ZACHMAN GRID 5 ways in which the Zachman grid can help in the development of an enterprise architecture • Ensure that every stakeholder's perspective has been considered for every descriptive focal point. • Improve the client’s artifacts themselves by sharpening each of their focus points to one particular concern for one particular audience. • Ensure that all of the client’s business requirements can be traced down to some technical implementation. • Convince the client that its technical team isn't planning on building a bunch of useless functionality. • Convince the client's IT team that the business folks are including them in their planning.
  • 24. THE OPEN GROUP ARCHITECTURE FRAMEWORK (TOGAF) • The core of TOGAF is the Architecture Development Method (ADM) • TOGAF divides an enterprise architecture into four categories, as follows • Business architecture: describes the processes the business uses to meet its goals • Application architecture: describes how specific applications are designed and how they interact with each other • Data architecture: describes how the enterprise datastores are organized and accessed • Technical architecture: describes the hardware and software infrastructure that supports applications and their interactions • Zachman tells you how to categorize your artifacts. TOGAF gives you a process for creating them.
  • 25. DAY-TO-DAY EXPERIENCE OF CREATING AN ENTERPRISE ARCHITECTURE WILL BE DRIVEN BY THE ADM A high-level view
  • 26. PHASE A & PHASE B • The culmination of Phase A will be a Statement of Architecture Work, which must be approved by the various stakeholders before the next phase of the ADM begins. The purpose of this phase is to create an architectural vision for the first pass through the ADM cycle. Architect will guide Client in choosing the project, validating the project against the architectural principles established in the Preliminary Phase, and ensuring that the appropriate stakeholders have been identified and their issues have been addressed. • The Architectural Vision created in Phase A will be the main input into Phase B. Client’s goal in Phase B is to create a detailed baseline and target business architecture and perform a full analysis of the gaps between them. • Phase B is quite involved: it includes business modeling, highly detailed business analysis, and technical-requirements documentation. A successful Phase B requires input from many stakeholders. The major outputs will be a detailed description of the baseline and target business objectives, and gap descriptions of the business architecture.
  • 27. PHASE C • Develop baseline data-architecture description • Review and validate principles, reference models, viewpoints, and tools • Create architecture models, including logical data models, data-management process models, and relationship models that map business functions to CRUD (Create, Read, Update, Delete) data operations • Select data-architecture building blocks • Conduct formal checkpoint reviews of the architecture model and building blocks with stakeholders • Review qualitative criteria (for example, performance, reliability, security, integrity) • Complete data architecture • Conduct checkpoint/impact analysis • Perform gap analysis • The most important deliverable from this phase will be the Target Information and Applications Architecture.
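The relationship model that maps business functions to CRUD operations is often drawn as a matrix. Below is a minimal Python sketch of such a matrix; the business functions, entities, and the helper `entities_missing_operations` are all hypothetical and only show how coverage gaps could be spotted.

```python
# Hypothetical business functions and data entities, for illustration only.
crud_matrix = {
    "Take order":      {"Customer": "R", "Order": "C"},
    "Amend order":     {"Order": "RU"},
    "Cancel order":    {"Order": "RD"},
    "Register client": {"Customer": "C"},
}


def entities_missing_operations(matrix, required="CRUD"):
    """Map each entity to the CRUD operations no business function performs."""
    covered = {}
    for operations_by_entity in matrix.values():
        for entity, ops in operations_by_entity.items():
            covered.setdefault(entity, set()).update(ops)
    return {
        entity: sorted(set(required) - ops)
        for entity, ops in covered.items()
        if set(required) - ops
    }


gaps = entities_missing_operations(crud_matrix)
```

In this made-up matrix no function updates or deletes a Customer, which is the kind of finding a gap analysis would record.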
  • 28. PHASE D & PHASE E • Phase D completes the technical architecture, that is, the infrastructure necessary to support the proposed new architecture. This phase is completed mostly by engaging with Client’s infrastructure and technical team. • Phase E evaluates the various implementation possibilities, identifies the major implementation projects that might be undertaken, and evaluates the business opportunity associated with each. The TOGAF standard recommends that Client’s first pass at Phase E "focus on projects that will deliver short-term payoffs and so create an impetus for proceeding with longer-term projects." • A good starting place to look for such projects is the organizational pain points that initially convinced Client’s CEO to adopt an enterprise architecture-based strategy
  • 29. PHASE F, PHASE G & PHASE H • Phase F is closely related to Phase E. In this phase, Architect works with Client’s governance body to sort the projects identified in Phase E into a priority order that reflects not only the costs and benefits (identified in Phase E), but also the risk factors • In Phase G, Client takes the prioritized list of projects and creates architectural specifications for the implementation projects. These specifications will include acceptance criteria and lists of risks and issues • The final phase is H. In this phase, Client modifies the architectural change-management process with any new artifacts created in this last iteration and with new information that becomes available
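The walk through Phases A to H can be summarized as an ordered cycle. The sketch below assumes the phase names commonly used for the TOGAF ADM (the slides mostly refer to phases by letter) and assumes that after Phase H a new iteration begins at Phase A; the function `next_phase` is hypothetical.

```python
# Phase names as commonly given for the TOGAF ADM (an assumption here;
# the slides mostly identify phases by letter only).
ADM_PHASES = [
    ("A", "Architecture Vision"),
    ("B", "Business Architecture"),
    ("C", "Information Systems Architectures"),
    ("D", "Technology Architecture"),
    ("E", "Opportunities and Solutions"),
    ("F", "Migration Planning"),
    ("G", "Implementation Governance"),
    ("H", "Architecture Change Management"),
]


def next_phase(letter):
    """Return the phase that follows `letter`; Phase H wraps around to A."""
    letters = [code for code, _ in ADM_PHASES]
    index = letters.index(letter)
    return ADM_PHASES[(index + 1) % len(ADM_PHASES)]
```

For example, `next_phase("E")` returns the Phase F entry, and `next_phase("H")` returns the Phase A entry, which starts the next pass through the cycle.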