
Power BI for Big Data and the New Look of Big Data Solutions


New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregation tables, dataflows) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes should still be part of an enterprise solution, and how a data lake should be organized.


Power BI for Big Data and the New Look of Big Data Solutions

  1. Power BI for Big Data and the new look of Big Data solutions. James Serra, Big Data Evangelist, Microsoft. JamesSerra3@gmail.com
  2. About Me
     • Microsoft, Big Data Evangelist
     • In IT for 30 years, worked on many BI and DW projects
     • Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
     • Been perm employee, contractor, consultant, business owner
     • Presenter at PASS Business Analytics Conference, PASS Summit, Enterprise Data World conference
     • Certifications: MCSE: Data Platform, Business Intelligence; MS: Architecting Microsoft Azure Solutions, Design and Implement Big Data Analytics Solutions, Design and Implement Cloud Data Platform Solutions
     • Blog at JamesSerra.com
     • Former SQL Server MVP
     • Author of book “Reporting with Microsoft SQL Server 2012”
  3. Agenda
     • Azure Data Lake Store Gen2
     • Big data solution use cases
     • Power BI
     • Composite data models
     • Aggregation tables
     • Dataflows
     • XMLA Endpoints
     • RDL support
     • Application Lifecycle Management (ALM)
     • Incremental Refresh
     • Demo
     • Common architecture patterns
  4. Blob Storage, Data Lake Store, Azure Data Lake Storage Gen2
     • Large partner ecosystem
     • Global scale – All 50 regions
     • Durability options
     • Tiered - Hot/Cool/Archive
     • Cost Efficient
     • Built for Hadoop
     • Hierarchical namespace
     • ACLs, AAD and RBAC
     • Performance tuned for big data
     • Very high scale capacity and throughput
  5. Hadoop on a cluster of Azure virtual machines (IaaS), Azure HDInsight (PaaS), Azure Data Lake Analytics (SaaS), Azure Databricks (PaaS)
     • One end of the spectrum: higher level of complexity, control, & customization; greater integration with Apache projects; greater administrative effort
     • Other end of the spectrum: greater ease of use; less integration with Apache projects; less administrative effort
  6. Needs data governance so your data lake does not turn into a data swamp!
  7. Objectives
     • Plan the structure based on optimal data retrieval
     • Avoid a chaotic, unorganized data swamp
     Common ways to organize the data:
     • Data Retention Policy: Temporary data, Permanent data, Applicable period (ex: project lifetime), etc.
     • Business Impact / Criticality: High (HBI), Medium (MBI), Low (LBI), etc.
     • Confidential Classification: Public information, Internal use only, Supplier/partner confidential, Personally identifiable information (PII), Sensitive – financial, Sensitive – intellectual property, etc.
     • Probability of Data Access: Recent/current data, Historical data, etc.
     • Owner / Steward / SME
     • Subject Area
     • Security Boundaries: Department, Business unit, etc.
     • Time Partitioning: Year/Month/Day/Hour/Minute
     • Downstream App/Purpose
  8. Microsoft Confidential. Import vs. DirectQuery
  9. Import vs. DirectQuery
  10. [Model diagram] Tables: Sales, Date, Customer, Product, Employee, Geography, Reseller Sales
  11. [Model diagram] Tables: Sales, Product, Customer, Geography, Date, Employee, Reseller Sales
  12. [Model diagram] Tables: Sales Agg, Sales, Product, Customer, Geography, Date, Employee, Reseller Sales
  13. Azure Analysis Services, Power BI, Power BI Premium. Corporate BI, Self-service BI users, All BI users
  14. [Model diagram] Tables: Sales, Product, Sales Agg, Customer, Geography, Date, Employee, Reseller Sales
  15. [Model diagram] Tables: Sales, Product, Sales Agg, Customer, Geography, Date, Employee, Reseller Sales. Query: SummarizeColumns( Date[Year], Geography[City], "Sales", Sum(Sales[Amount]) )
  16. [Model diagram] Tables: Sales, Product, Sales Agg, Customer, Geography, Date, Employee, Reseller Sales. Query: SummarizeColumns( Date[Year], Customer[Name], "Sales", Sum(Sales[Amount]) )
  17. [Model diagram] Tables: Sales, Product, Sales Agg, Customer, Geography, Date, Employee, Reseller Sales. Storage modes allowed across a relationship:
      “Many side”    “One side”
      Dual           Dual
      Import         Import or Dual
      DQ             DQ or Dual
  18. Dataflows: Power BI introduces self-service data-prep capabilities
      • Self-service low code/no code
      • Integral part of Power BI stack
      • Cloud and on-premises connectors
      • Standard schema (Common Data Model)
      • Data reuse
      • In-lake transformations
  19. Power BI introduces dataflows. Layers: BI models, Visualizations, Data prep, Data (Azure Data Lake)
  20. Data + AI professionals can use the full power of the Azure Data Platform
      • Business analysts: low/no code
      • Data scientists, data engineers: low to high code
      • Azure Databricks, Azure ML, Azure SQL DW, Azure Data Factory
      • CDM folders
  21. Dataflow editor: Create a new dataflow using Power BI dataflow editor
  22. Dataflow editor: Create a new dataflow using Power BI dataflow editor
  23. Ingest data: Ingest data using on-prem and cloud connectors
  24. Connect to Dynamics via Common Data Service for Apps connector: Select Dynamics Common Data Model and custom entities from CDS for Apps data source to ingest into Power BI
  25. PQ online: Use Power Query Online to perform transformations and data cleansing. Map entities from any data source (e.g. SQL Azure) to the Common Data Model as part of PQ transformations
  26. Perform mapping to CDM: Choose a standard entity that exists in CDM to map your data
  27. Perform mapping to CDM: Choose a standard entity that exists in CDM to map your data
  28. Incremental refresh: Define incremental refresh based on time columns
  29. Connect from Power BI Desktop: Connect to Power BI dataflows to generate models and reports using dataflow data
  30. Azure Analysis Services Server: Business logic & metrics, Data modeling, Security, Lifecycle management, In-memory cache
  31. Business logic & metrics, Data modeling, Security, Lifecycle management, In-memory cache
  32. Object hierarchy: Database, Model, Table(s), Column(s), Measure(s)
      using Microsoft.AnalysisServices.Tabular;

      public void RefreshTable(...)
      {
          var server = new Server();
          server.Connect(cnnString);                // Connect to the server
          Database db = server.Databases[dbName];   // Connect to the database
          Model model = db.Model;                   // Get the database's model
          // Reprocess the table
          model.Tables[tableName].RequestRefresh(RefreshType.Full);
          model.SaveChanges();                      // Commit the changes
      }
  33. TMSL commands:
      {
        "refresh": {
          "type": "full",
          "objects": [
            { "database": "Sales Analysis", "table": "Reseller Sales" }
          ]
        }
      }
      {
        "createOrReplace": {
          "object": { "database": "AdventureWorks" },
          "database": { "name": "AdventureWorks", ... }
        }
      }
  34. IMPLEMENTING COMMON CUSTOMER PATTERNS
  35. Advanced Analytics
      • Sources: Social, LOB, Graph, IoT, Image, CRM
      • INGEST (Data orchestration and monitoring): Azure Data Factory, SSIS
      • STORE (Big data store): Azure Data Lake Storage Gen2
      • PREP (Transform & Clean): Azure Databricks, Azure Data Lake Analytics, Azure HDInsight
      • MODEL & SERVE (Data warehouse, AI, BI + Reporting): Azure SQL Data Warehouse, Azure Analysis Services
  36. CLOUD DATA WAREHOUSE
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
      • Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
  37. MODERN DATA WAREHOUSE
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Logs (unstructured), Media (unstructured), Files (unstructured), Business/custom apps (structured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
      • Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the above architecture to meet their unique needs.
  38. ADVANCED ANALYTICS ON BIG DATA
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Business/custom apps (structured), Files (unstructured), Media (unstructured), Logs (unstructured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, SparkR, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Cosmos DB, Real-time apps
      • Microsoft Azure also supports other Big Data services like Azure HDInsight, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
  39. REAL TIME ANALYTICS
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Sensors and IoT (unstructured), Files (unstructured), Media (unstructured), Logs (unstructured), Business/custom apps (structured)
      • Components: Apache Kafka for HDInsight, Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI, Cosmos DB, Real-time apps
      • Microsoft Azure also supports other Big Data services like Azure IoT Hub, Azure Event Hubs, Azure Machine Learning to allow customers to tailor the above architecture to meet their unique needs.
  40. DATA MART CONSOLIDATION
      • Stages: INGEST, STORE, MODEL & SERVE
      • Sources: RDBMS data marts, Hadoop
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
      • Microsoft Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
  41. HUB & SPOKE ARCHITECTURE FOR BI
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Multiple Azure SQL Database instances (Data Marts), Multiple Azure Analysis Services instances (Data Cubes), Power BI
      • Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
  42. AUTOSCALING DATA WAREHOUSE
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Business/custom apps (structured), Logs (unstructured), Media (unstructured), Files (unstructured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Functions (Auto-scaling), Azure Analysis Services, Power BI
      • Microsoft Azure supports other services like Azure HDInsight to allow customers a truly customized solution.
  43. DATA WAREHOUSE MIGRATION
      • Stages: INGEST, STORE, PREP & TRAIN, MODEL & SERVE
      • Sources: Business/custom apps (structured), Business/custom apps, Logs (unstructured), Media (unstructured), Files (unstructured)
      • Components: Azure Data Factory, Azure Data Lake Store Gen2, Azure Databricks, PolyBase, Azure SQL Data Warehouse, Azure Analysis Services, Power BI
      • Azure also supports other Big Data services like Azure HDInsight to allow customers to tailor the architecture to meet their unique needs.
  44. Resources
      • Why use a data lake? http://bit.ly/1WDy848
      • Big Data Architectures: http://bit.ly/1RBbAbS
      • The Modern Data Warehouse: http://bit.ly/1xuX4Py
      • Hadoop and Data Warehouses: http://bit.ly/1xuXfu9
  45. Q & A? James Serra, Big Data Evangelist
      • Email me at: JamesSerra3@gmail.com
      • Follow me at: @JamesSerra
      • Link to me at: www.linkedin.com/in/JamesSerra
      • Visit my blog at: JamesSerra.com (where this slide deck is posted under the “Presentations” tab)
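Slide 7 lists time partitioning, subject area, and security boundaries among the common ways to organize a data lake. The sketch below shows one way those three could combine into a folder path. The function name `lake_path`, the ordering of the path segments, and the sample values are assumptions made for illustration; the deck does not prescribe a specific layout.

```python
from datetime import datetime
from pathlib import PurePosixPath


def lake_path(subject_area: str, department: str, event_time: datetime, file_name: str) -> str:
    """Build a lake path: subject area / security boundary / year / month / day / hour / file."""
    return str(PurePosixPath(
        subject_area,            # Subject Area
        department,              # Security Boundary (hypothetical: one folder per department)
        f"{event_time:%Y}",      # Time Partitioning: year
        f"{event_time:%m}",      # month
        f"{event_time:%d}",      # day
        f"{event_time:%H}",      # hour
        file_name,
    ))


example = lake_path("sales", "emea", datetime(2019, 3, 7, 14, 5), "orders.csv")
print(example)  # sales/emea/2019/03/07/14/orders.csv
```

Putting the time segments last keeps each subject area and department in one subtree, which tends to make folder-level access control simpler. Other orderings may suit retrieval patterns better.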
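Slides 15 and 16 show two queries against a model that contains a Sales Agg table. As a rough illustration of why one query might be answered from an aggregation table while another falls through to the detail table, the sketch below checks whether a query's group-by columns and measures are covered by the aggregation. The grain assigned to Sales Agg here (Date[Year] and Geography[City]) is a hypothetical assumption, since the slides do not state it, and the real engine's matching logic is more involved than a set comparison.

```python
# Hypothetical grain of the Sales Agg table; the slides do not state it.
AGG_GROUP_BY = {"Date[Year]", "Geography[City]"}
AGG_MEASURES = {("Sum", "Sales[Amount]")}


def can_use_aggregation(group_by, measures):
    """Return True when every requested column and measure is present in the agg table."""
    return set(group_by) <= AGG_GROUP_BY and set(measures) <= AGG_MEASURES


# Query from slide 15: grouped by Date[Year] and Geography[City].
slide_15 = can_use_aggregation(["Date[Year]", "Geography[City]"], [("Sum", "Sales[Amount]")])

# Query from slide 16: grouped by Date[Year] and Customer[Name].
slide_16 = can_use_aggregation(["Date[Year]", "Customer[Name]"], [("Sum", "Sales[Amount]")])

print(slide_15, slide_16)  # True False (under the assumed grain)
```

Under this assumed grain, the second query asks for a column the aggregation does not carry, so it would have to be answered from the detail Sales table.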
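The storage-mode table on slide 17 can be read as a small validation rule: given the storage mode of the table on the "many" side of a relationship, only certain modes are allowed on the "one" side. A minimal sketch of that rule follows. The names `ALLOWED_ONE_SIDE` and `is_valid_pair` are made up for illustration, and "DirectQuery" is spelled out where the slide abbreviates it to DQ.

```python
# Allowed "one side" storage modes for each "many side" mode, as listed on slide 17.
ALLOWED_ONE_SIDE = {
    "Dual": {"Dual"},
    "Import": {"Import", "Dual"},
    "DirectQuery": {"DirectQuery", "Dual"},
}


def is_valid_pair(many_side: str, one_side: str) -> bool:
    """Check a relationship's storage-mode pairing against the slide's table."""
    return one_side in ALLOWED_ONE_SIDE[many_side]


print(is_valid_pair("DirectQuery", "Dual"))    # True
print(is_valid_pair("Import", "DirectQuery"))  # False
```

In every row of the slide's table, Dual is an allowed mode for the "one" side.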
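The refresh command on slide 33 is plain JSON, so it can be assembled programmatically before being sent to the server. The sketch below only builds and prints the command, using the same database and table names as the slide. The helper name `tmsl_refresh` is an assumption, and how the command is submitted (for example through an XMLA endpoint) is left out.

```python
import json


def tmsl_refresh(database: str, table: str, refresh_type: str = "full") -> dict:
    """Build a TMSL refresh command for a single table, mirroring slide 33."""
    return {
        "refresh": {
            "type": refresh_type,
            "objects": [{"database": database, "table": table}],
        }
    }


command = tmsl_refresh("Sales Analysis", "Reseller Sales")
print(json.dumps(command, indent=2))
```

Building the command as a dictionary and serializing it with `json.dumps` avoids hand-written brace mismatches in the command text.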

Editor's Notes

  • Fluff, but point is I bring real work experience to the session
  • You can use enterprise tools, but that does not mean you are building an enterprise solution
  • Talking point: IT/PowerUser uses ADF/U-SQL. User could also bypass ADLS and go right to source if no cleaning needed

    It takes the approach of ELT instead of ETL: data is loaded into Azure Data Lake Store and then converted using the power of Azure Data Lake Analytics, instead of being transformed during the move from the source system to the data lake, as you usually do when using SSIS
  • Sometimes has data marts (hub-and-spoke)
  • Crowd-sourced career service; smartphone app emits driver's location
  • https://www.sqlchick.com/entries/2017/12/30/zones-in-a-data-lake
    https://www.sqlchick.com/entries/2016/7/31/data-lake-use-cases-and-planning

    Question: Do you see many companies building data lakes?

    Raw: Raw events are stored for historical reference. Also called staging layer or landing area
    Cleansed: Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize the way files are stored in terms of encoding, format, data types and content (i.e. strings). Also called conformed layer
    Application: Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. DW application, advanced analysis process, etc.). This layer goes by many other names: workspace, trusted, gold, secure, production ready, governed, presentation
    Sandbox: Optional layer to be used to “play” in.  Also called exploration layer or data science workspace
  • Drill to individual driver via Drillthrough
  • How to get answers to business questions about your data?
  • How to get answers to business questions about your data?
  • Question: Should SQL Database be considered in the Model & Serve blade, using it as a data mart?
  • Microsoft Azure supports other services like Azure HDInsight, Azure Data Lake, Azure IoT Hub, Azure Event Hubs in various layers of the architecture above to allow customers a truly customized solution.
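The Raw, Cleansed, and Application zones described in the notes above can be pictured as successive transformations of the same events. The sketch below walks a few made-up driver events through the three zones in memory. The field names, sample values, and the "total miles per driver" rule are illustrative assumptions, not content from the talk, and a real lake would read and write files in each zone instead of Python lists.

```python
import json


def to_cleansed(raw_lines):
    """Raw -> Cleansed: parse events and unify data types and string content."""
    cleansed = []
    for line in raw_lines:
        event = json.loads(line)
        cleansed.append({
            "driver": str(event["driver"]).strip().lower(),
            "miles": float(event["miles"]),
        })
    return cleansed


def to_application(cleansed):
    """Cleansed -> Application: apply business logic (here, total miles per driver)."""
    totals = {}
    for row in cleansed:
        totals[row["driver"]] = totals.get(row["driver"], 0.0) + row["miles"]
    return totals


# Raw zone: events kept exactly as they arrived, inconsistencies included.
raw_zone = [
    '{"driver": " Ann ", "miles": "12.5"}',
    '{"driver": "ann", "miles": 7.5}',
    '{"driver": "Bob", "miles": "3"}',
]
cleansed_zone = to_cleansed(raw_zone)
application_zone = to_application(cleansed_zone)
print(application_zone)  # {'ann': 20.0, 'bob': 3.0}
```

Keeping the raw events untouched means the cleansing and business rules can be changed later and rerun from the original data.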
