ETL and OLAP Cube Reporting <br />Using the NetFlix OLTP Database<br />By: Rona Charlene Lao<br />
Introduction<br />This project builds a Data Warehouse database from the Netflix database used in the first week’s assignment.<br />Objectives: <br />To provide an end-to-end solution for uploading transactional data into the Data Warehouse. <br />To provide dynamic reports for NetFlix showing various representations of their aggregated data based on Rental, Shipment, Payment and DVD Inventory.<br />To demonstrate how OLAP is used to provide dynamic multidimensional reports.<br />
Scope<br />To create mock-up data to be uploaded into the Data Warehouse<br />To build a complete end-to-end ETL solution.<br />To use SQL*Loader, stored procedures and triggers to implement business transformation rules from the Staging Area to the Target Area.<br />To create canned reports and demonstrate how Data Warehouses can provide dynamic multidimensional reports<br />
Out of Scope<br />To build the OLTP database from scratch<br />To code all business and functional rules related to Netflix data storage and operational requirements<br />
Process Flow - Extract<br />SQL Queries <br />SQL queries were run against the NetFlix OLTP Database to extract the data for the dimension tables. <br />The extracts were saved as CSV files.<br />SQL*Loader – This tool was used to upload the CSV files into the Staging Area of the DW database.<br />Stored Procedures – Used to extract data for the Member and DVD dimension tables and for the fact tables.<br />The fact table stored procedures take two parameters, startdt and enddt.<br />
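The extract step can be sketched as a query whose result is written to CSV. This is only an illustration under stated assumptions: SQLite stands in for the OLTP database, the `rental` table, its columns and the helper `extract_to_csv` are hypothetical, and the date window is assumed to be inclusive. Only the parameter names `startdt` and `enddt` come from the slides.

```python
import csv
import io
import sqlite3


def extract_to_csv(conn, query, params=()):
    """Run an extract query and return the result as CSV text with a header row."""
    cur = conn.execute(query, params)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names as header
    writer.writerows(cur)
    return buf.getvalue()


# Hypothetical stand-in for the OLTP source; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER, rental_date TEXT);
    INSERT INTO rental VALUES (1, 10, '2009-01-05'), (2, 11, '2009-01-20'), (3, 10, '2009-02-02');
""")

# Fact extracts are bounded by a start and end date, mirroring startdt and enddt.
FACT_QUERY = """
    SELECT rental_id, member_id, rental_date
    FROM rental
    WHERE rental_date BETWEEN :startdt AND :enddt
    ORDER BY rental_id
"""
csv_text = extract_to_csv(conn, FACT_QUERY, {"startdt": "2009-01-01", "enddt": "2009-01-31"})
print(csv_text)
```

Bounding the fact extract by a date window keeps each run small, which is what makes the later incremental load practical.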
Process Flow - Extract<br />Control File<br />SQL*Loader<br />
Process Flow - Transform<br />After the stored procedure for the DVD extract executes, the V_DVD materialized view is refreshed (force).<br />T_STAR_DIM is also updated automatically through a trigger once the STG_MOVIEPERSONROLE_DIM table is populated. <br />The T_STAR_DIM table is a denormalized version of the MOVIEPERSONROLE table.<br />T_MEMBER_DIM is also a denormalized version of a source table.<br />
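The trigger-driven denormalization might look roughly like the sketch below. It uses SQLite rather than the project's own database, and every column name, the `person` and `role` lookup tables, and the trigger name are assumptions made for illustration. Only the table names STG_MOVIEPERSONROLE_DIM and T_STAR_DIM come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical lookup tables that the staging row points at.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (role_id INTEGER PRIMARY KEY, role_name TEXT);

    -- Staging table holds keys only; the target holds the resolved names.
    CREATE TABLE stg_moviepersonrole_dim (movie_id INTEGER, person_id INTEGER, role_id INTEGER);
    CREATE TABLE t_star_dim (movie_id INTEGER, person_name TEXT, role_name TEXT);

    -- Each staging insert writes one denormalized row into the target.
    CREATE TRIGGER stg_mpr_after_insert
    AFTER INSERT ON stg_moviepersonrole_dim
    BEGIN
        INSERT INTO t_star_dim (movie_id, person_name, role_name)
        SELECT NEW.movie_id, p.name, r.role_name
        FROM person p, role r
        WHERE p.person_id = NEW.person_id AND r.role_id = NEW.role_id;
    END;

    INSERT INTO person VALUES (1, 'Actor A'), (2, 'Director B');
    INSERT INTO role VALUES (1, 'Actor'), (2, 'Director');
""")

# Populating the staging table fires the trigger once per row.
conn.executemany(
    "INSERT INTO stg_moviepersonrole_dim VALUES (?, ?, ?)",
    [(100, 1, 1), (100, 2, 2)],
)
star_rows = conn.execute(
    "SELECT movie_id, person_name, role_name FROM t_star_dim ORDER BY person_name"
).fetchall()
print(star_rows)
```

Resolving the names at load time means reports can read T_STAR_DIM directly without joining back to the lookup tables.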
Process Flow – Load<br />The stored procedure, POP_TARGET_SP, moves the data from the Staging Area (STG_) to its corresponding table in the Target Area (T_) within the DW Database.<br />It takes only the records that are not already in the Target Area. <br />This ensures that each run processes only a subset of the data while preserving the historical data in the Target Fact Tables (T_*_F).<br />It uses NOT IN statements to ensure that there is no duplication. <br />The statements are listed in sequence to preserve and abide by the integrity constraints set up in the Target Area. <br />
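The NOT IN load pattern can be sketched as follows. This is not the actual body of POP_TARGET_SP: the function `pop_target`, the column names and the single dimension and fact table shown are assumptions, and SQLite stands in for the DW database. The STG_ and T_ prefixes and the T_MEMBER_DIM name follow the slides.

```python
import sqlite3


def pop_target(conn):
    """Copy staging rows whose key is absent from the target; dimensions load before facts."""
    with conn:  # one transaction for the whole load
        conn.execute("""
            INSERT INTO t_member_dim (member_id, member_name)
            SELECT member_id, member_name FROM stg_member_dim
            WHERE member_id NOT IN (SELECT member_id FROM t_member_dim)
        """)
        # The fact table references the dimension, so it is loaded second.
        conn.execute("""
            INSERT INTO t_rental_f (rental_id, member_id, rental_date)
            SELECT rental_id, member_id, rental_date FROM stg_rental_f
            WHERE rental_id NOT IN (SELECT rental_id FROM t_rental_f)
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    PRAGMA foreign_keys = ON;
    CREATE TABLE stg_member_dim (member_id INTEGER, member_name TEXT);
    CREATE TABLE stg_rental_f (rental_id INTEGER, member_id INTEGER, rental_date TEXT);
    CREATE TABLE t_member_dim (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE t_rental_f (
        rental_id INTEGER PRIMARY KEY,
        member_id INTEGER REFERENCES t_member_dim (member_id),
        rental_date TEXT
    );
    INSERT INTO stg_member_dim VALUES (10, 'Member A'), (11, 'Member B');
    INSERT INTO stg_rental_f VALUES (1, 10, '2009-01-05'), (2, 11, '2009-01-20');
""")

pop_target(conn)
pop_target(conn)  # a second run finds nothing new, so no duplicates appear
member_count = conn.execute("SELECT COUNT(*) FROM t_member_dim").fetchone()[0]
rental_count = conn.execute("SELECT COUNT(*) FROM t_rental_f").fetchone()[0]
print(member_count, rental_count)
```

One caveat with this pattern: NOT IN returns no rows if the subquery yields a NULL, so it is safest when the compared column is a non-null key, as it is in this sketch.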
Incremental Load<br />Created mock-up data<br />Performed CSV extracts<br />Ran SQL*Loader<br />Ran the stored procedures that populate the Staging Area<br />Ran the stored procedure that populates the Target Area<br />Refreshed Online Cubes<br />Recreated Offline Cubes<br />
Demo<br />Please see the demo.avi file in the ronalao_term.zip file<br />