Data Warehouse Ecosystem with Microsoft Tools
About Me
 Director of Búfalo Informática
 Leader of the PASS Chapter devSQL/RJ
 MCT – Trainer
 MCSE Data and BI
Links
 http://facebook.com/devSQL
 Devsql-subscribe@yahoogroups.com
 http://www.youtube.com/c/dennestorres
 http://www.bufaloinfo.com.br
 http://bufaloinfo.cloudapp.net
 dennes@bufaloinfo.com.br
 @Dennes
What Is
• Production: the routine work that keeps the company's activities running
• Business Intelligence: analysis of the data generated by production, with the goal of creating new working methods and routines that change how production works
• Data Warehouse: the store of historical data used by BI
WHY A DATA WAREHOUSE?
Challenges
 Processing type
 Modeling type
 Data distribution/aggregation
 Data organization
 Contextualization
Processing Type
Modeling
[Diagram: star schema and snowflake schema. A central Fact table holds the Measures; the surrounding Dimension tables hold the Attributes.]
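As a minimal sketch of the star schema idea, the following builds one fact table and two dimension tables in an in-memory SQLite database and aggregates a measure by a dimension attribute. The table names, column names, and rows (`fact_sales`, `dim_product`, `dim_date`, `amount`) are invented for illustration and are not from the presentation; SQLite only stands in for SQL Server here.

```python
import sqlite3


def build_star_schema() -> sqlite3.Connection:
    """Create a tiny, hypothetical star schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimensions hold descriptive attributes.
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER);
        -- The fact table holds measures plus one foreign key per dimension.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
        INSERT INTO dim_date VALUES (20150101, 2015), (20160101, 2016);
        INSERT INTO fact_sales VALUES
            (1, 20150101, 100.0), (1, 20160101, 150.0), (2, 20160101, 30.0);
        """
    )
    return conn


def sales_by_category(conn):
    """Aggregate the measure by a dimension attribute (a star join)."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    )
    return rows.fetchall()


conn = build_star_schema()
totals = sales_by_category(conn)
```

In a snowflake variant, `category` would move out of `dim_product` into its own table that `dim_product` references, at the cost of one more join per query.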
Data Distribution/Aggregation
[Diagram: DW]
Data Organization
[Diagram: the same customer held as separate records in four insurance systems, then as one consolidated record]
J.Jones: Female; 2 July 1945
J.Jones: Two fines; one serious accident
J.Jones: Rua Bela, 123; Married
J.Jones: Two children; high blood pressure
Life Insurance, Vehicle Insurance, Health Insurance, Home Insurance
J.Jones (consolidated): Female; born 2 July 1945; two fines; one serious accident; Rua Bela, 123; married; two children; high blood pressure
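The consolidation in the J.Jones example can be sketched as a merge of per-system records into one record per customer. This is only an illustration: the field names are hypothetical, and matching on the customer name alone is a simplifying assumption that a real solution would replace with a reliable key.

```python
from collections import defaultdict

# Hypothetical extracts from four separate systems, keyed by customer name.
source_records = [
    {"customer": "J.Jones", "gender": "Female", "birth_date": "1945-07-02"},
    {"customer": "J.Jones", "fines": 2, "serious_accidents": 1},
    {"customer": "J.Jones", "address": "Rua Bela, 123", "marital_status": "Married"},
    {"customer": "J.Jones", "children": 2, "condition": "High blood pressure"},
]


def consolidate(records, key="customer"):
    """Merge per-system records into one record per customer."""
    merged = defaultdict(dict)
    for record in records:
        merged[record[key]].update(record)
    return dict(merged)


customers = consolidate(source_records)
```

After the merge, `customers["J.Jones"]` holds every attribute in one place, which is the subject-oriented view the data warehouse is meant to provide.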
Contextualization
BUILD STAGES
Stages
 Extract to the DW
 Create data models
 Create visualizations
 Make available
Components of a Data Warehouse Solution
[Diagram: components of the solution: Data Sources, ETL, Data Cleansing, Master Data Management, Data Warehouse, Data Models, Reporting and Analysis]
STEPS IN A DATA WAREHOUSE
Demo
Data Sources
Data sources of many different types:
 Application relational databases
 Proprietary data stores
 Documents
 Real-time data streams
 External data
Extract, Transform, and Load
Enterprise Integration Management
 ETL:
   Data extraction
   Transformation
   Data loading
 Data Cleansing:
   Data validation
   Deduplication
 Master Data Management:
   Business entity integrity
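The two data cleansing tasks listed above, validation and deduplication, can be sketched in a few lines. This is a hedged illustration only: the `email` and `state` fields, the `VALID_STATES` reference list, and the rules themselves are assumptions made for the example, not part of the presentation or of any Microsoft tool.

```python
VALID_STATES = {"RJ", "SP", "MG"}  # hypothetical reference list


def cleanse(rows):
    """Validate rows and drop duplicates, keeping the first occurrence."""
    seen = set()
    clean, rejected = [], []
    for row in rows:
        email = row["email"].strip().lower()
        state = row["state"].strip().upper()
        if "@" not in email or state not in VALID_STATES:
            rejected.append(row)  # failed validation
            continue
        if email in seen:
            continue  # duplicate of an earlier row
        seen.add(email)
        clean.append({"email": email, "state": state})
    return clean, rejected


rows = [
    {"email": "ana@example.com", "state": "rj"},
    {"email": "ANA@example.com ", "state": "RJ"},
    {"email": "no-at-sign", "state": "SP"},
    {"email": "bob@example.com", "state": "XX"},
]
clean, rejected = cleanse(rows)
```

Rejected rows are returned rather than discarded so they can be reviewed, which is usually preferable to losing data silently.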
ETL Architectures
• Single-stage ETL
  • Data is transferred directly to the DW
  • Transformations and validations happen during the transfer
• Two-stage ETL
  • Data is first stored in a staging area
  • Transformations and validations happen during the transfer or in staging
• Three-stage ETL
  • Data is transferred to a landing zone and then to a staging area
  • Transformations and validations happen along the data flow
Source → DW
Source → Staging → DW
Source → Landing Zone → Staging → DW
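A two-stage flow can be sketched with an in-memory SQLite database: raw rows are copied into a staging table untouched, then validated and converted on the way into the DW table. The table names (`staging_sales`, `dw_sales`), the columns, and the validation rules are hypothetical, and SQLite stands in for SQL Server and SSIS purely for illustration.

```python
import sqlite3


def two_stage_etl(source_rows):
    """Load raw rows into staging, then transform them into the DW table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging_sales (sold_on TEXT, amount TEXT)")
    conn.execute("CREATE TABLE dw_sales (sold_on TEXT, amount REAL)")

    # Stage 1: copy the source data as-is, with no validation.
    conn.executemany("INSERT INTO staging_sales VALUES (?, ?)", source_rows)

    # Stage 2: validate and convert while moving from staging to the DW.
    # In SQLite, CAST of non-numeric text to REAL yields 0.0, so the
    # "> 0" filter also rejects unparseable amounts.
    conn.execute(
        """
        INSERT INTO dw_sales (sold_on, amount)
        SELECT sold_on, CAST(amount AS REAL)
        FROM staging_sales
        WHERE sold_on IS NOT NULL AND CAST(amount AS REAL) > 0
        """
    )
    conn.commit()
    return conn


conn = two_stage_etl(
    [("2016-01-05", "10.50"), (None, "3.00"), ("2016-01-06", "abc")]
)
loaded = conn.execute("SELECT sold_on, amount FROM dw_sales").fetchall()
```

Keeping the untouched rows in staging means a failed transformation can be rerun without going back to the source system.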
DATA WAREHOUSE
Data Warehouse
 Modeling
 Partitioning
 Granularity
 Distribution/Data Marts
The Data Warehouse
 Kimball Dimensional Data Marts
 Inmon Corporate Information Factory
 Central Dimensional Data Warehouse
 Federated Hub-and-Spoke
Granularity
1 record per fact: high-performance disk
1 record per day: high-performance disk
1 record per month: lower performance
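The three grains can be illustrated by rolling up the same facts to coarser levels. The sketch below assumes ISO-formatted date strings and invented sales rows; it shows only how the row count shrinks as the grain gets coarser, not how storage tiers are assigned.

```python
from collections import defaultdict

# Hypothetical fact rows at the finest grain: one row per sale.
sales = [
    ("2016-01-05", 10.0),
    ("2016-01-05", 5.0),
    ("2016-01-20", 7.5),
    ("2016-02-01", 2.5),
]


def roll_up(rows, key_length):
    """Sum amounts by a date prefix: 10 chars = day, 7 chars = month."""
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:key_length]] += amount
    return dict(totals)


per_day = roll_up(sales, 10)
per_month = roll_up(sales, 7)
```

Four fact rows become three daily rows and two monthly rows; the detail lost at each step is the price of the smaller table.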
ANALYSIS SERVICES
Analysis Services
 Creation of models that are friendly to the end analyst
 Multidimensional vs. Tabular
 Many optimizations
Analytical Data Models
Benefits of the data model:
 Abstracts the DW structure
 Simplifies analysis for the user
 Adds business rules
 Pre-aggregates the measures
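To make these benefits concrete, here is a toy model in plain Python. It is not how Analysis Services is implemented; the class name `SalesModel`, the rows, and the margin rule are all assumptions chosen to show the three ideas together: the caller never sees the underlying rows, a business rule lives in the model, and totals are computed once in advance.

```python
from collections import defaultdict

# Hypothetical rows already joined from the DW: (category, sales, cost).
dw_rows = [
    ("Bikes", 100.0, 60.0),
    ("Bikes", 150.0, 90.0),
    ("Helmets", 30.0, 12.0),
]


class SalesModel:
    """A toy data model: hides the DW rows and exposes named measures."""

    def __init__(self, rows):
        # Pre-aggregate the base measures once, when the model is processed.
        self._sales = defaultdict(float)
        self._cost = defaultdict(float)
        for category, sales, cost in rows:
            self._sales[category] += sales
            self._cost[category] += cost

    def sales(self, category):
        """Return the pre-aggregated sales total for a category."""
        return self._sales[category]

    def margin(self, category):
        """Business rule kept in the model: margin = sales - cost."""
        return self._sales[category] - self._cost[category]


model = SalesModel(dw_rows)
```

An analyst would ask for `model.margin("Bikes")` without knowing which tables or joins produce it.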
Model types:
 Multidimensional
 Tabular
Multidimensional        Tabular
MDX                     DAX
More complex            Simpler, closer to Excel
MOLAP/HOLAP/ROLAP       In-Memory or DirectQuery
Several small differences
Multidimensional Model
Tabular Model
Reporting and Analysis
 IT-provided reports
 Self-service reports
 Interactive analysis
 Dashboards and scorecards
 Data mining
Excel
 Powerful client tool
 Many analysis features
 Avoids the problem of never-ending maintenance requests
Self-Service BI
 Lets users build their own analyses
 Takes load off the technical teams
Self-Service BI Tools
 Excel
 Reporting Services
 PowerPivot
 PowerQuery - M
 PowerView
 PowerMap
 SharePoint
PowerBI
 Online tool
 Brings the self-service tools together
DEMO
BI Results
 New business strategies
 New procedures for production
 New customer needs
 Data to support production
Big Data
 Uses parallel processing to analyze very large volumes of data
 Widely used for web or IoT data
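The parallel-processing idea can be sketched as a small map/reduce word count. This is a single-machine illustration under loud assumptions: a thread pool stands in for the nodes of a cluster, the log lines are invented, and nothing here uses Hadoop or HDInsight APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(lines, workers=2):
    """Split the input, count each chunk in parallel, then merge (reduce)."""
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
        total = Counter()
        for partial in partials:
            total.update(partial)
    return total


log_lines = ["sensor on", "sensor off", "Sensor on", "door open"]
totals = parallel_word_count(log_lines)
```

Because each chunk is counted independently and the partial counts are simply added, the same shape of computation scales out when the chunks live on different machines.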
Hadoop Cluster
Components of a Data Warehouse Solution
[Diagram: the solution components extended with Big Data and Machine Learning: Data Sources, ETL, Data Cleansing, Master Data Management, Data Warehouse, Data Models, Big Data, Machine Learning]
Big Data Solutions
 Hadoop
 HDInsight
 SQL Server PDW
 SQL Data Warehouse
 Azure Data Lake
HDInsight Cluster
More
 Data Factory
 Data Mining vs. Machine Learning
 DMX
 R
 Spark and Storm
 StreamInsight
THANK YOU!


Editor's Notes

  • #10 If students want to better understand how the star join query optimizations in the SQL Server query optimizer work, review the articles referenced in their notes. However, emphasize that the optimizations are automatic and that students do not need to do anything other than use a star schema for the data warehouse tables to benefit from them.
  • #16 Note that not all data warehousing solutions include every component shown on the slide. However, each component has an important part to play in the implementation of a data warehousing solution and will be considered in this course.
  • #18 Ask students for examples of other types of data source they have encountered, and how they would begin auditing a data source. Typical approaches include interactively querying tables in a relational database and examining the results, reviewing data extracted from systems in text files, and using a tool such as the Data Profiling task in SQL Server Integration Services. Note that students who have previously attended course 20463C: Implementing a Data Warehouse with Microsoft® SQL Server® 2014 have used all of these techniques in the labs in that course.
  • #19 Point out that, although ETL is a primary element of a data warehousing solution, it is a subset of EIM. A comprehensive enterprise BI solution may also include data cleansing (either directly in the data sources or as part of the ETL process that loads the data warehouse) and master data management. Point out that the key difference between master data management and data cleansing is that the latter is based on knowledge about valid column values in a dataset. Master data management is based on knowledge about individual instances of business entities. For example, suppose a hypothetical customer has three records, one in each of three different systems. One indicates that the customer lives in “New Yrk”, another indicates that the customer lives in “Paris”, and the third indicates that the customer lives in “London”. A data quality solution might identify that the records all relate to the same customer (based on the fact that the records have the same name and email address) and that “New Yrk” should be corrected to “New York”, but it can’t determine in which of the three cities the customer lives. A master data management solution provides a definitive master record from which the correct data can be determined. Students who have attended course 20463C: Implementing a Data Warehouse with Microsoft® SQL Server® 2014 have used SSIS, DQS, and MDS during the labs in that course.
  • #24 Some students may want to debate the comparative merits of the Inmon and Kimball approaches. There are interesting aspects to both designs but, in practice, most solutions built on SQL Server favor a Kimball-style dimensional model in a centralized data warehouse or, in extremely large enterprises, a hub and spoke architecture. Try to avoid getting bogged down in a philosophical debate and steer students toward the pragmatic point that an understanding of dimensional modeling is important regardless of the methodology employed.
  • #28 Do not dwell too long on the differences between tabular and multidimensional models, because this is covered in subsequent modules.
  • #32 Explain that, although they can be considered as discrete elements of the BI solution, reporting and analysis are closely related and, in practice, are often indistinguishable from one another. For example, a user might employ an Excel PivotTable to analyze data in a cube, and then publish the resulting spreadsheet as a report. Another user might take an OData feed from a SQL Server Reporting Services report as a data source for a personal data model and analyze the report data together with data from other sources. Avoid getting into too much detail about analysis and reporting technologies at this stage. The key point is that reporting and analysis are the end goal of most BI solutions, and the specific reporting and analysis requirements must be identified and documented during the initial planning.
  • #42 Note that not all data warehousing solutions include every component shown on the slide. However, each component has an important part to play in the implementation of a data warehousing solution and will be considered in this course.