5. What It Is
• Production: routine work that keeps the company's activities running
• Business Intelligence: analysis of the data generated by production, with the goal of creating new work methods and routines that change how production is done
• Data Warehouse: the store of historical data used by BI
11. Data Organization
Data about the same customer, spread across separate systems:
• J. Jones: female, born July 2, 1945
• J. Jones: two fines, one serious accident
• J. Jones: Rua Bela, 123; married
• J. Jones: two children, high blood pressure
Source systems: life insurance, vehicle insurance, health insurance, home insurance
Integrated record:
• J. Jones: female, born July 2, 1945; two fines, one serious accident; Rua Bela, 123; married; two children, high blood pressure
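The consolidation shown on this slide can be sketched in Python. This is only an illustration: it assumes each source system hands over its record as a dictionary and that the customer name is a reliable matching key, which the slide does not state. All field names are hypothetical.

```python
# Hypothetical records for the same customer, one per source system.
source_records = [
    {"name": "J. Jones", "gender": "Female", "birth_date": "1945-07-02"},
    {"name": "J. Jones", "fines": 2, "serious_accidents": 1},
    {"name": "J. Jones", "address": "Rua Bela, 123", "marital_status": "Married"},
    {"name": "J. Jones", "children": 2, "condition": "High blood pressure"},
]


def consolidate(records, key="name"):
    """Merge records that share the same key into one profile per customer."""
    profiles = {}
    for record in records:
        profile = profiles.setdefault(record[key], {})
        profile.update(record)  # each system contributes its own attributes
    return profiles


customer_view = consolidate(source_records)
```

In practice a name is rarely a safe key on its own; matching usually relies on a shared identifier or on several attributes together.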
14. Steps
Extract to the DW
Create data models
Create visualizations
Make them available
15. Components of a Data Warehouse Solution
Data Sources
ETL
Data Cleansing
Master Data Management
Data Warehouse
Data Models
Reporting and Analysis
17. Data Sources
Data sources of many different types:
Application relational databases
Proprietary data stores
Documents
Real-time data streams
External data
18. Extract, Transform, and Load
Enterprise Information Management (EIM)
ETL:
Data extraction
Transformation
Data loading
Data Cleansing:
Data validation
Duplicate elimination
Master Data Management:
Business entity integrity
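A minimal Python sketch of the extract, transform, and load steps, with validation and duplicate elimination done during the transform. The row layout, the validation rule, and the in-memory list standing in for a warehouse table are all assumptions made for illustration.

```python
def extract():
    """Extract: return raw rows as they might arrive from a source system."""
    return [
        {"id": "1", "customer": " alice ", "amount": "10.50"},
        {"id": "2", "customer": "Bob", "amount": "not-a-number"},  # invalid
        {"id": "1", "customer": " alice ", "amount": "10.50"},     # duplicate
        {"id": "3", "customer": "carol", "amount": "7"},
    ]


def is_valid(row):
    """Data cleansing: check that the amount is numeric."""
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False


def transform(rows):
    """Transform: drop invalid and duplicate rows, then normalize types."""
    seen_ids = set()
    clean = []
    for row in rows:
        if not is_valid(row) or row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        clean.append({
            "id": int(row["id"]),
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
        })
    return clean


def load(rows, warehouse):
    """Load: append the cleaned rows to the target table (a list here)."""
    warehouse.extend(rows)
    return len(rows)


warehouse_table = []
loaded = load(transform(extract()), warehouse_table)
```

Here two of the four extracted rows reach the warehouse: the row with a non-numeric amount and the repeated id are filtered out.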
20. ETL Architectures
• Single-stage ETL (Source → DW)
  • Data is transferred directly to the DW
  • Transformations and validations occur during the transfer
• Two-stage ETL (Source → Staging → DW)
  • Data is first stored in a staging area
  • Transformations and validations occur during the transfer or in staging
• Three-stage ETL (Source → Landing Zone → Staging → DW)
  • Data is transferred to a landing zone and then to a staging area
  • Transformations and validations occur throughout the data flow
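The two-stage variant can be sketched with Python's built-in sqlite3 module. The table names, columns, and the rule that an empty amount is invalid are assumptions for illustration; a real solution would use separate databases or servers for the source, staging area, and DW.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_sales (sale_id INTEGER, product TEXT, amount TEXT);
    CREATE TABLE staging_sales (sale_id INTEGER, product TEXT, amount TEXT);
    CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, product TEXT, amount REAL);
""")
conn.executemany(
    "INSERT INTO source_sales VALUES (?, ?, ?)",
    [(1, "widget", "19.90"), (2, "gadget", ""), (3, "WIDGET", "5.00")],
)

# Stage 1: copy the source rows to staging unchanged.
conn.execute("INSERT INTO staging_sales SELECT * FROM source_sales")

# Stage 2: validate and transform the staged rows while loading the DW.
conn.execute("""
    INSERT INTO dw_sales (sale_id, product, amount)
    SELECT sale_id, UPPER(product), CAST(amount AS REAL)
    FROM staging_sales
    WHERE amount <> ''
""")
conn.commit()

dw_rows = conn.execute(
    "SELECT sale_id, product, amount FROM dw_sales ORDER BY sale_id"
).fetchall()
```

A single-stage flow would run the second statement directly against the source table; a three-stage flow would add one more raw copy (the landing zone) before staging.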
23. The Data Warehouse
Kimball Dimensional Data Marts
Inmon Corporate Information Factory
Central Dimensional Data Warehouse
Federated Hub-and-Spoke
24. Granularity
• 1 record per fact: high-performance disk
• 1 record per day: high-performance disk
• 1 record per month: lower performance
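The three grains can be illustrated by rolling the same facts up to coarser keys. This Python sketch uses hypothetical sales rows; the dates and amounts are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows at the finest grain: one record per sale.
sales = [
    (date(2024, 1, 5), 100.0),
    (date(2024, 1, 5), 50.0),
    (date(2024, 1, 20), 25.0),
    (date(2024, 2, 3), 80.0),
]


def roll_up(rows, key_func):
    """Aggregate amounts to a coarser grain defined by key_func."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[key_func(day)] += amount
    return dict(totals)


per_day = roll_up(sales, lambda d: d)                    # 1 record per day
per_month = roll_up(sales, lambda d: (d.year, d.month))  # 1 record per month
```

Each step up in grain stores fewer rows but loses detail: the monthly totals can no longer answer questions about individual days or sales.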
26. Analysis Services
Creates user-friendly models for the end analyst
Multidimensional vs. Tabular
Many optimizations
27. Analytical Data Models
Benefits of a data model:
Abstracts the structure of the DW
Simplifies analysis for the user
Adds business rules
Pre-aggregates the measures
Model types:
Multidimensional
Tabular
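The benefits listed on this slide can be illustrated with a small Python sketch. It is not how Analysis Services builds a model; it only shows the idea of hiding technical column names, pre-aggregating measures, and adding a business rule. The table contents, column names, and the margin rule are all hypothetical.

```python
from collections import defaultdict

# Hypothetical warehouse rows with technical column names.
fact_rows = [
    {"prod_key": 1, "sls_amt": 200.0, "cst_amt": 150.0},
    {"prod_key": 1, "sls_amt": 100.0, "cst_amt": 60.0},
    {"prod_key": 2, "sls_amt": 50.0, "cst_amt": 45.0},
]
dim_product = {1: "Bikes", 2: "Helmets"}


def build_model(facts, products):
    """Pre-aggregate measures per product under friendly names."""
    model = defaultdict(lambda: {"Sales": 0.0, "Cost": 0.0})
    for row in facts:
        entry = model[products[row["prod_key"]]]
        entry["Sales"] += row["sls_amt"]
        entry["Cost"] += row["cst_amt"]
    for entry in model.values():
        # Business rule (assumed for illustration): margin = sales - cost.
        entry["Margin"] = entry["Sales"] - entry["Cost"]
    return dict(model)


sales_model = build_model(fact_rows, dim_product)
```

An analyst querying `sales_model` sees product names and ready-made measures, and never needs to know the warehouse keys or column names.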
28. Multidimensional vs. Tabular
Multidimensional | Tabular
MDX | DAX
More complex | Simpler, closer to Excel
MOLAP/HOLAP/ROLAP | In-Memory or DirectQuery
Several other minor differences
31. Reporting and Analysis
IT-provided reports
Self-service reports
Interactive analysis
Dashboards and scorecards
Data mining
32. Excel
Powerful client tool
Many analysis features
Avoids the problem of endless maintenance
38. Results of BI
New business strategies
New procedures for production
New customer needs
Data to support production
39. Big Data
Uses parallel processing to analyze very large volumes of data
Widely used for web or IoT data
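The idea of splitting a large dataset into partitions, processing each partition independently, and merging the partial results can be sketched in Python. This is a toy illustration under stated assumptions: the log lines are invented, and a thread pool on one machine stands in for the cluster of machines that a platform such as Hadoop would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical web log lines, already split into partitions.
partitions = [
    ["GET /home", "GET /cart", "GET /home"],
    ["GET /home", "POST /checkout"],
    ["GET /cart", "GET /home"],
]


def count_paths(lines):
    """Map step: count requested paths within one partition."""
    return Counter(line.split()[1] for line in lines)


def parallel_count(parts, workers=3):
    """Run the map step on each partition concurrently, then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_paths, parts)
        total = Counter()
        for partial in partial_counts:
            total.update(partial)  # reduce step: merge partial results
    return total


path_counts = parallel_count(partitions)
```

Because each partition is counted without reference to the others, the same map and reduce functions would still work if the partitions lived on different machines.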
41. Components of a Data Warehouse Solution
Data Sources
ETL
Data Cleansing
Master Data Management
Data Warehouse
Data Models
Big Data
Machine Learning
42. Big Data Solutions
Hadoop
HDInsight
SQL Server PDW
SQL Data Warehouse
Azure Data Lake
If students want to better understand how the star join query optimizations in the SQL Server query optimizer work, review the articles referenced in their notes. However, emphasize that the optimizations are automatic and that students do not need to do anything other than use a star schema for the data warehouse tables to benefit from them.
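For students who have not seen one, the shape of a star schema and a star join query can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it is self-contained; it does not implement the SQL Server star join optimizations, and the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        amount REAL
    );
    INSERT INTO dim_date VALUES (1, 2013), (2, 2014);
    INSERT INTO dim_product VALUES (10, 'Bikes'), (20, 'Helmets');
    INSERT INTO fact_sales VALUES
        (1, 10, 500.0), (2, 10, 700.0), (2, 20, 40.0), (2, 10, 300.0);
""")

# A star join: the fact table joined to each dimension it references,
# filtered and grouped by dimension attributes.
star_join_rows = conn.execute("""
    SELECT d.year, p.category, SUM(f.amount) AS total
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    WHERE d.year = 2014
    GROUP BY d.year, p.category
    ORDER BY p.category
""").fetchall()
```

The point for students is the query pattern: one central fact table, small dimension tables around it, and filters expressed on the dimensions.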
Note that not all data warehousing solutions include every component shown on the slide. However, each component has an important part to play in the implementation of a data warehousing solution and will be considered in this course.
Ask students for examples of other types of data source they have encountered.
Ask students how they would begin auditing a data source. Typical approaches include:
Interactively querying tables in a relational database and examining the results.
Reviewing data extracted from systems in text files.
Using a tool such as the Data Profiling task in SQL Server Integration Services.
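A first audit pass of this kind can also be approximated in a few lines of Python. The sketch below is not the SSIS Data Profiling task; it is a hypothetical stand-in that reports, for each column of an extract, how many values are missing and how many distinct values appear. The sample rows are invented.

```python
def profile(rows):
    """Return per-column row count, null count, and distinct count."""
    columns = {name for row in rows for name in row}
    report = {}
    for name in sorted(columns):
        values = [row.get(name) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        report[name] = {
            "rows": len(values),
            "nulls": len(values) - len(non_null),
            "distinct": len(set(non_null)),
        }
    return report


# Hypothetical extract from a source system.
extract = [
    {"customer_id": 1, "city": "London", "email": "a@example.com"},
    {"customer_id": 2, "city": "", "email": "b@example.com"},
    {"customer_id": 3, "city": "London", "email": None},
]
profile_report = profile(extract)
```

Columns with many nulls or an unexpectedly low distinct count are natural starting points for a closer look at the source.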
Note that students who have previously attended course 20463C: Implementing a Data Warehouse with Microsoft® SQL Server® 2014 have used all of these techniques in the labs in that course.
Point out that, although ETL is a primary element of a data warehousing solution, it is a subset of EIM. A comprehensive enterprise BI solution may also include data cleansing (either directly in the data sources or as part of the ETL process that loads the data warehouse) and master data management.
Point out that the key difference between master data management and data cleansing is that the latter is based on knowledge about valid column values in a dataset. Master data management is based on knowledge about individual instances of business entities. For example, suppose a hypothetical customer has three records, one in each of three different systems. One indicates that the customer lives in “New Yrk”, another indicates that the customer lives in “Paris”, and the third indicates that the customer lives in “London”. A data quality solution might identify that the records all relate to the same customer (based on the fact that the records have the same name and email address) and that “New Yrk” should be corrected to “New York”, but it can’t determine in which of the three cities the customer lives. A master data management solution provides a definitive master record from which the correct data can be determined.
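The customer example above can be sketched in Python to make the distinction concrete. The email address, the system names, and the choice of "Paris" as the master city are all assumptions for illustration; the example does not reflect how DQS or MDS are implemented.

```python
# Knowledge base for data cleansing: corrections for known invalid values.
CITY_CORRECTIONS = {"New Yrk": "New York"}

# Hypothetical master record maintained for this customer.
MASTER_CUSTOMERS = {"pat@example.com": {"name": "Pat Lee", "city": "Paris"}}

records = [
    {"system": "CRM", "email": "pat@example.com", "city": "New Yrk"},
    {"system": "Billing", "email": "pat@example.com", "city": "Paris"},
    {"system": "Support", "email": "pat@example.com", "city": "London"},
]


def cleanse(rows):
    """Data cleansing: fix invalid column values; conflicts remain."""
    return [
        {**row, "city": CITY_CORRECTIONS.get(row["city"], row["city"])}
        for row in rows
    ]


def apply_master(rows):
    """Master data management: overwrite with the definitive record."""
    return [
        {**row, "city": MASTER_CUSTOMERS[row["email"]]["city"]}
        for row in rows
    ]


cleansed = cleanse(records)
cities_after_cleansing = {row["city"] for row in cleansed}
cities_after_mdm = {row["city"] for row in apply_master(cleansed)}
```

After cleansing, every city is a valid value but the three systems still disagree; only the master record resolves which one is correct.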
Students who have attended course 20463C: Implementing a Data Warehouse with Microsoft® SQL Server® 2014 have used SSIS, DQS, and MDS during the labs in that course.
Some students may want to debate the comparative merits of the Inmon and Kimball approaches. There are interesting aspects to both designs but, in practice, most solutions built on SQL Server favor a Kimball-style dimensional model in a centralized data warehouse or, in extremely large enterprises, a hub and spoke architecture. Try to avoid getting bogged down in a philosophical debate and steer students toward the pragmatic point that an understanding of dimensional modeling is important regardless of the methodology employed.
Do not dwell too long on the differences between tabular and multidimensional models, because this is covered in subsequent modules.
Explain that, although they can be considered as discrete elements of the BI solution, reporting and analysis are closely related and, in practice, are often indistinguishable from one another. For example, a user might employ an Excel PivotTable to analyze data in a cube, and then publish the resulting spreadsheet as a report. Another user might take an OData feed from a SQL Server Reporting Services report as a data source for a personal data model and analyze the report data together with data from other sources.
Avoid getting into too much detail about analysis and reporting technologies at this stage. The key point is that reporting and analysis are the end goal of most BI solutions, and the specific reporting and analysis requirements must be identified and documented during the initial planning.