Ronalao termpresent

Published in: Education

  1. ETL and OLAP Cube Reporting Using the NetFlix OLTP Database
     By: Rona Charlene Lao
  2. Introduction
     This project builds a Data Warehouse database from the Netflix database used in the first week's assignment.
     Objectives:
     • Provide an end-to-end solution for uploading transactional data into the Data Warehouse.
     • Provide dynamic reports for NetFlix showing various representations of its aggregated data based on Rental, Shipment, Payment and DVD Inventory.
     • Demonstrate how OLAP is used to provide dynamic multidimensional reports.
  3. Scope
     • Create mock-up data to be uploaded into the Data Warehouse.
     • Build a complete end-to-end ETL solution.
     • Use SQL*Loader, stored procedures and triggers to implement business transformation rules from the Staging Area to the Target Area.
     • Create canned reports and demonstrate how Data Warehouses can provide dynamic multidimensional reports.
  4. Out of Scope
     • Building the OLTP database from scratch.
     • Coding all business and functional rules related to Netflix data storage and operational requirements.
  5. Tools and Environment
  6. Process Flow
  7. Process Flow - Extract
     • SQL Queries: SQL queries were run against the NetFlix OLTP Database to extract the data for the dimension tables. The extracts were saved as CSV files.
     • SQL*Loader: This tool was used to upload the CSV files into the Staging Area of the DW database.
     • Stored Procedures: Used to extract data for the Member and DVD dimension tables and for the fact tables. The fact-table stored procedures take two parameters, startdt and enddt.
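
The extract step above can be sketched roughly as follows. This is a minimal illustration, not the project's Oracle code: an in-memory SQLite database stands in for the NetFlix OLTP database, the `rental` table and its columns are hypothetical, and treating the `startdt`/`enddt` window as inclusive is an assumption. Only the parameter names `startdt` and `enddt` and the CSV output come from the slides.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the NetFlix OLTP database (the project used Oracle).
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER,
                         dvd_id INTEGER, rental_date TEXT);
    INSERT INTO rental VALUES (1, 10, 100, '2010-01-05');
    INSERT INTO rental VALUES (2, 11, 101, '2010-02-14');
    INSERT INTO rental VALUES (3, 10, 102, '2010-03-01');
""")


def extract_rentals(conn, startdt, enddt):
    """Return (header, rows) for rentals dated within [startdt, enddt].

    The inclusive window is an assumption; the slides only name the parameters.
    """
    cur = conn.execute(
        "SELECT rental_id, member_id, dvd_id, rental_date FROM rental "
        "WHERE rental_date BETWEEN ? AND ? ORDER BY rental_id",
        (startdt, enddt),
    )
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def to_csv(header, rows):
    """Serialize an extract as CSV text, header line first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


header, rows = extract_rentals(oltp, "2010-01-01", "2010-02-28")
csv_text = to_csv(header, rows)
print(csv_text)
```

Passing a date window keeps each extract small, which fits the incremental loads described later in the deck.
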
  8. Process Flow - Extract
     • Control File
     • SQL*Loader
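
As a rough stand-in for the SQL*Loader step, the sketch below loads CSV text into a staging table. It does not reproduce SQL*Loader or its control-file syntax. SQLite replaces Oracle, and the table `STG_EXAMPLE_DIM`, its columns and the sample rows are hypothetical. The only detail taken from the slides is the `STG_` prefix for staging tables.

```python
import csv
import io
import sqlite3

# Hypothetical staging table in a stand-in DW database.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE STG_EXAMPLE_DIM (id INTEGER, name TEXT)")

# Sample extract; in the project this would be a CSV file on disk.
csv_text = "id,name\n1,Drama\n2,Comedy\n"


def load_csv_into_staging(conn, table, text):
    """Insert every CSV data row into `table`, mapping columns by the header line.

    The table and column names are interpolated into the SQL, so they must
    come from a trusted source such as a fixed load configuration.
    """
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    rows = list(reader)
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


loaded = load_csv_into_staging(dw, "STG_EXAMPLE_DIM", csv_text)
staged = dw.execute("SELECT id, name FROM STG_EXAMPLE_DIM ORDER BY id").fetchall()
print(loaded, staged)
```
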
  9. Process Flow - Transform
     • After the stored procedure for the DVD extract executes, the V_DVD materialized view is refreshed (force).
     • T_STAR_DIM is also updated automatically through a trigger once the STG_MOVIEPERSONROLE_DIM table is populated.
     • The T_STAR_DIM table is a denormalized version of the MOVIEPERSONROLE table.
     • T_MEMBER_DIM is also a denormalized version of a source table.
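
The trigger-driven denormalization described above might look roughly like this. It is a sketch under stated assumptions: SQLite replaces Oracle, and the lookup tables `person` and `role`, every column name and the trigger name are hypothetical. Only `STG_MOVIEPERSONROLE_DIM` and `T_STAR_DIM` are names from the slides. The materialized view refresh is not modeled.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    -- Hypothetical lookup tables and columns; the real schema is not shown in the slides.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, person_name TEXT);
    CREATE TABLE role (role_id INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE STG_MOVIEPERSONROLE_DIM (movie_id INTEGER, person_id INTEGER, role_id INTEGER);
    CREATE TABLE T_STAR_DIM (movie_id INTEGER, person_name TEXT, role_name TEXT);

    -- Each staged row is resolved to names and written to the denormalized table.
    CREATE TRIGGER trg_fill_star_dim
    AFTER INSERT ON STG_MOVIEPERSONROLE_DIM
    BEGIN
        INSERT INTO T_STAR_DIM (movie_id, person_name, role_name)
        SELECT NEW.movie_id, p.person_name, r.role_name
        FROM person p, role r
        WHERE p.person_id = NEW.person_id AND r.role_id = NEW.role_id;
    END;

    INSERT INTO person VALUES (1, 'Jane Doe');
    INSERT INTO role VALUES (1, 'Actor');
    INSERT INTO role VALUES (2, 'Director');
""")

# Populating the staging table fires the trigger once per inserted row.
dw.executemany(
    "INSERT INTO STG_MOVIEPERSONROLE_DIM VALUES (?, ?, ?)",
    [(100, 1, 1), (101, 1, 2)],
)
star_rows = dw.execute(
    "SELECT movie_id, person_name, role_name FROM T_STAR_DIM ORDER BY movie_id"
).fetchall()
print(star_rows)
```

Storing the resolved names in T_STAR_DIM lets the cube read one table instead of joining the normalized source tables at query time.
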
  10. Process Flow - Load
     • The stored procedure POP_TARGET_SP moves the data from each Staging Area table (STG_) to its corresponding table in the Target Area (T_) within the DW database.
     • It takes only the records that are not already in the Target Area.
     • Each run therefore processes only a subset of the data while preserving the historical data in the Target Fact Tables (T_*_F).
     • It uses NOT IN statements to ensure that there is no duplication.
     • The table loads are listed in sequence to abide by the integrity constraints set up in the Target Area.
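
A minimal sketch of the NOT IN load pattern described above, assuming SQLite in place of Oracle and a Python function in place of the POP_TARGET_SP stored procedure. `T_MEMBER_DIM` is named in the slides. `STG_MEMBER_DIM`, `STG_RENTAL_F` and `T_RENTAL_F` only follow the slides' naming pattern, and all columns are hypothetical.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("PRAGMA foreign_keys = ON")  # enforce the Target Area constraint
dw.executescript("""
    CREATE TABLE STG_MEMBER_DIM (member_id INTEGER, member_name TEXT);
    CREATE TABLE STG_RENTAL_F (rental_id INTEGER, member_id INTEGER);
    CREATE TABLE T_MEMBER_DIM (member_id INTEGER PRIMARY KEY, member_name TEXT);
    CREATE TABLE T_RENTAL_F (rental_id INTEGER PRIMARY KEY,
                             member_id INTEGER REFERENCES T_MEMBER_DIM(member_id));
""")

# Dimension before fact, so the fact rows always find their parent keys.
LOAD_ORDER = [
    ("STG_MEMBER_DIM", "T_MEMBER_DIM", "member_id", "member_id, member_name"),
    ("STG_RENTAL_F", "T_RENTAL_F", "rental_id", "rental_id, member_id"),
]


def pop_target(conn):
    """Copy staged rows whose key is not yet in the target; return rows inserted per table."""
    inserted = {}
    for stg, tgt, key, cols in LOAD_ORDER:
        cur = conn.execute(
            f"INSERT INTO {tgt} ({cols}) SELECT {cols} FROM {stg} "
            f"WHERE {key} NOT IN (SELECT {key} FROM {tgt})"
        )
        inserted[tgt] = cur.rowcount
    conn.commit()
    return inserted


dw.executemany("INSERT INTO STG_MEMBER_DIM VALUES (?, ?)", [(10, "A"), (11, "B")])
dw.executemany("INSERT INTO STG_RENTAL_F VALUES (?, ?)", [(1, 10), (2, 11)])
first = pop_target(dw)

# Stage one new rental and rerun: rows loaded earlier are skipped, not duplicated.
dw.execute("INSERT INTO STG_RENTAL_F VALUES (3, 10)")
second = pop_target(dw)
total = dw.execute("SELECT COUNT(*) FROM T_RENTAL_F").fetchone()[0]
print(first, second, total)
```

One caveat with this pattern: NOT IN matches nothing if the subquery returns a NULL, so the target key columns should be declared NOT NULL or be primary keys, as they are here.
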
  11. Database Diagram - NetFlix
  12. Database Diagram - DW
  13. OLAP Cubes and Reporting
     • 3 Cubes
       • Rental Cube
       • DVD Cube
       • Payment Cube
     • Reports
       • Dashboard
       • Microsoft Excel – Pivot Tables using Offline Cubes
  14. Rental-DVD Cube
     This cube is a virtual cube, a combination of the Rental cube and the DVD cube.
     • Rental Cube
     • DVD Cube
  15. Rental-DVD Cube
     Dimensions and Measures
  16. Rental-DVD Dashboard
  17. Payment Cube
     • Starflake schema
     • Outer join on T_MEMBER_DIM
     • Calculated Measure
     • Example of a Data Warehouse constraint
  18. Payment Cube
     Dimensions and Measures
  19. Payment Cube Dashboard and Report
  20. Incremental Load
     • Created mock-up data
     • Performed CSV extracts
     • Ran SQL*Loader
     • Ran the stored procedures that populate the Staging Area
     • Ran the stored procedure that populates the Target Area
     • Refreshed Online Cubes
     • Recreated Offline Cubes
  21. Demo
     Please see the demo.avi file in the ronalao_term.zip file
  22. Sources/References
     • CS779 NetFlix_Oracle_Inserts.sql
     • CS779 Netflix_Oracle_Create_Indexes.sql
     • CS779 NetFlix_Oracle_Create_Tables.sql
     • OLAP Cube 3.0: http://www.adersoft.com
     • http://msdn.microsoft.com/en-us/library/aa216377(SQL80).aspx
     • http://e-articles.info/e/a/title/Dashboard-Report/
     • http://camstudio.org
  23. Thank you
     Good luck in the final exams!