Modern Data Warehouse Fundamentals Part 2

Explore new trends and use cases in data warehousing including exploration and discovery, self-service ad-hoc analysis, predictive analytics and more ways to get deeper business insight. Modern Data Warehousing Fundamentals will show how to modernize your data warehouse architecture and infrastructure for benefits to both traditional analytics practitioners and data scientists and engineers.

Published in: Technology

  1. © Cloudera, Inc. All rights reserved.
  2. MODERN DATA WAREHOUSE FUNDAMENTALS Part II: Exploring the Move to Cloud and Maintaining a Common Data Context. December 2018
  3. SPEAKERS: Greg Rahn, Director of Product Management (grahn@cloudera.com); Santosh Kumar, Senior Product Manager (skumar@cloudera.com)
  4. BEYOND DATA WAREHOUSING: The modern platform for machine learning and analytics optimized for the cloud. Storage Services: Amazon S3, Microsoft ADLS, HDFS, Kudu. Core Services: Security, Governance, Workload Management, Ingest & Replication, Data Catalog. Diagram labels: ANALYTICS, DATA SCIENCE, EXTENSIBLE SERVICES, OPERATIONAL DATABASE, DATA ENGINEERING
  5. WITH A CLOUD NATIVE OPTION - ALTUS DW ● Quick time to value - no software or clusters to manage ● Bring warehouse to the data with zero copy simplicity ● Use your security policies with your data - no proprietary stacks ● Apply enterprise governance to transient workloads ● Shared data experience with SDX ● Optimized for Azure & AWS. Diagram labels: DATA WAREHOUSE, GOVERNANCE, SECURITY, ALTUS CONTROL PLANE, LIFECYCLE MANAGEMENT, MULTI-CLOUD, Amazon S3, Microsoft ADLS. MULTI-CLOUD PAAS SOLUTION
  6. KOMATSU MINING: Optimize Machine Performance. CHALLENGES: Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and build better next-generation products. Current system couldn’t handle: • Scale of IoT data • Demand for new users and use cases • 30TB/month data growth. RESULTS: • 2X increase in production hours on key equipment • Design next-generation equipment: environmentally smarter, more productive, at lower cost • Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”. SOLUTION: Cloud-based IIoT analytics for a full view of mining operations • Quickly and easily analyze huge volume and variety (time-series, sensor, event, and more) of data • More use cases and users: “democratizing analytics for different user groups” • Scale quickly and easily in the cloud. https://www.cloudera.com/more/news-and-blogs/press-releases/2017-11-15-komatsu-helps-improve-mining-performance.html
  7. MOTIVATIONS FOR THE CLOUD
  8. BUSINESS DRIVERS FOR DATA WAREHOUSING IN THE CLOUD: Scalability (51%), Flexibility (41%), Business agility / reduce IT involvement (39%), Cost (37%). *BI, Analytics, and the Cloud: Strategies for Business Agility, TDWI, 2016
  9. TECHNOLOGY DRIVERS IN THE CLOUD 1. Cost-effective, scalable storage in a single, shared repository (Azure Data Lake Storage, Amazon S3) 2. Access to limitless utility-based compute (Amazon EC2, Azure Virtual Machine) 3. Open and modular architectures (Apache Impala)
  10. KEY STAKEHOLDERS. KNOWLEDGE WORKERS: Instant, self-service access to data and resources; Application performance; Job-oriented tools; Choice. INFRASTRUCTURE TEAM: Secure, controlled provisioning; Predictable costs; Systems-oriented tools; Standards and portability. DATA TEAM: Advance strategic initiatives; Link analytics to business; Reduce admin burden; Integrated solutions
  11. KEY BENEFITS. Modern Data Warehouse: High-Performance SQL; Self-Service Flexibility; Cost-Effective Scale; Open Architecture for SQL and Beyond
  12. ADVANTAGES OF A MODERN DATA WAREHOUSE. Data Flexibility • Iterative modeling and self-service accessibility • Portability: No proprietary formats or storage lock-in. Go Beyond SQL • Consolidate data silos with an open architecture • Shared data across SQL and non-SQL workloads. High-Performance SQL and … Cost-Effective Scalability • Elastic scale in any environment • Cloud-native integration for optimized pay-per-use costs • Proven at massive scale. Hybrid Decoupled Architecture • Runs across multi-cloud & on-prem for zero lock-in • Multi-storage over S3, ADLS, HDFS, Kudu, Isilon, etc. Shared Data
  13. COMMON CLOUD ANALYTIC PATTERNS. Only pay for what you need, when you need it • Transient workloads • Contention-free isolation. Self-service flexibility at any scale • Elastic scale on-demand • Multi-tenant isolation. Diagram labels: Shared Object Storage, Cloud, ETL/ELT, ETL/ELT, Ad Hoc / Exploratory, Sales Reporting, Marketing Dashboard, DATA ENGINEERING (x2), DATA WAREHOUSE (x3)
  14. COMMON CLOUD ANALYTIC PATTERNS. Only pay for what you need, when you need it • Transient workloads • Contention-free isolation. Self-service flexibility at any scale • Elastic scale on-demand • Multi-tenant isolation. Diagram labels: Shared Object Storage, Cloud, ETL/ELT, ETL/ELT, Ad Hoc / Exploratory, Sales Reporting, Marketing Dashboard, DATA ENGINEERING (x2), DATA WAREHOUSE (x3). Beware of data silos without shared metadata
  15. INTELLIGENT DATA CONTEXT - SHARED DATA EXPERIENCE
  16. INTELLIGENT DATA CONTEXT: Stateful Context, Shared Experience
  17. “With Cloudera Altus Data Warehouse and SDX running on Microsoft ADLS, we were able to establish our Telekom Data Intelligence Hub: a trusted, fully governed platform and ecosystem where our users are empowered to exchange and analyse data and develop multi-function, data-driven applications easier and securely.” - Sven Löffler, BizDev Executive
  18. CUSTOMER STORIES. Couldn’t solve predictive maintenance goals. EDH delivers: • Ingest telematics in real-time • Machine learning to predict failures • Analytics to minimize service downtime • Protect sensitive and regulated data • Consistent security and governance • “SDX is the key to making that happen” - CIO. Drug R&D too slow and expensive. EDH delivers: • Self-service analytics • Meet HIPAA regulations • >5 petabytes from 2100 silos • Using Spark, Impala, & Search side-by-side • With Anaconda, AtScale, Cloudwick, Kinetica, StreamSets, Tamr, Trifacta, & Zoomdata
  19. CHALLENGES WITH MULTIPLE DEPLOYMENT MODELS. How are you managing your data warehouse today? How do you share datasets? Do you copy things around? How do you audit access across copies? Have you lost track of the source of truth? How do you propagate access permissions to copies? Have you ended up with multiple silos in the process?
  20. BUSINESS IMPACT OF SILOED SYSTEMS. Lost Revenue: Inaccurate and duplicated data directly impacts the bottom line of 88% of all companies. Limits of Legacy: Legacy systems keep organizations from taking advantage of data-driven opportunities. Costly Compliance: By 2023, regulated organizations will spend over 5% of revenue on compliance.
  21. Cloudera Enterprise with SDX provides maximum cloud flexibility, enabling enterprise IT to control workloads anywhere, managed any way, and to deliver the shared data experience that business and data professionals demand.
  22. Charles: With increased focus on … business insights.. dashboard … FAST... Of course! We have our internal EDH cluster. That would be easy! Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more…. Adding more workloads to internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads. Maybe a separate cluster with “required” data? Why not!! (Characters: Charles, SVP, Emerging Businesses; Mulyadi, Data Scientist; Alan, Internal EDH Data Platform Manager)
  23. Data Migration Cost Grows Exponentially. Diagram nodes: Support, Internal EDH, Emerging Businesses Analytics, Sales Analytics, Product, Training, Finance. Diagram values: 37, 15, 47, 27, 27, 15. No single source of truth • Synchronization overhead • Stale data
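
The synchronization overhead on this slide can be illustrated with a small count. The following is only a sketch under a stated assumption: every system keeps its own copy of every other system's data, so each ordered pair needs a copy pipeline. The function name is hypothetical, and the count is not a model of the cost values shown in the diagram; it only shows how quickly the number of pipelines to maintain grows.

```python
def point_to_point_copies(n_systems: int) -> int:
    """Directed copy pipelines needed when each of n systems
    keeps its own copy of every other system's data (assumption)."""
    if n_systems < 0:
        raise ValueError("n_systems must be non-negative")
    return n_systems * (n_systems - 1)


# With a single shared store, systems read the data in place,
# so no system-to-system copy pipelines are needed in this model.
for n in (2, 4, 7):
    print(n, "systems ->", point_to_point_copies(n), "copy pipelines")
```

Each pipeline in this model is also a place where a copy can go stale or lose its access permissions, which is the problem the next slide addresses.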
  24. Embrace unification of data and data context via SDX. Diagram nodes: Support, Internal EDH, Emerging Businesses Analytics, Sales Analytics, Product, Training, Finance
  25. MODERN DATA WAREHOUSE REQUIREMENTS. Modeling: Transform to make it easy to combine datasets. Governance: Audit trail, lineage, etc. Authorization: Ensure the right people have the right permissions. Preparation: Cleanse, filter, and standardize to enable wider acceptance. Ingestion: Collect data from various sources in varied formats. Diagram labels: Schema, Permissions, Gov artifacts
  26. DATA WAREHOUSE IN CLOUD DEPLOYMENTS. Diagram labels: Data Sources, Cloud Store, Cloud Store, ETL Tool, BI Tools, Analytics DB, “Glueing Tools”
  27. THREE THINGS TO REMEMBER ABOUT SDX • SDX is a differentiated capability offered only by Cloudera • SDX enables a shared data experience across multiple deployment models • SDX provides the shared data context essential for a global enterprise, including schema, access permissions, and governance
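
The idea of a shared data context (schema, access permissions, governance) can be sketched in a few lines. This is a toy model for illustration only, not SDX's API: the class names `SharedContext` and `Cluster`, the table, and the users are all hypothetical. It shows two compute clusters that hold no metadata of their own and resolve schema, permissions, and audit trail from one shared place.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical shared context: one copy of schema,
    permissions, and audit trail used by every cluster."""
    schemas: dict = field(default_factory=dict)    # table -> column names
    grants: dict = field(default_factory=dict)     # table -> users allowed to read
    audit_log: list = field(default_factory=list)  # (cluster, user, table, allowed)

    def define_table(self, table, columns, readers):
        self.schemas[table] = list(columns)
        self.grants[table] = set(readers)

    def authorize(self, cluster, user, table):
        allowed = user in self.grants.get(table, set())
        # Every access attempt is recorded once, whichever cluster made it.
        self.audit_log.append((cluster, user, table, allowed))
        return allowed


@dataclass
class Cluster:
    """A transient compute cluster that keeps no metadata of its own."""
    name: str
    context: SharedContext

    def describe(self, user, table):
        if not self.context.authorize(self.name, user, table):
            raise PermissionError(f"{user} may not read {table}")
        return self.context.schemas[table]


ctx = SharedContext()
ctx.define_table("sales", ["order_id", "amount"], readers={"alice"})
warehouse = Cluster("data-warehouse", ctx)
engineering = Cluster("data-engineering", ctx)
```

Calling `warehouse.describe("alice", "sales")` and `engineering.describe("alice", "sales")` returns the same schema, and a denied request from either cluster lands in the same audit log, because nothing is copied per cluster.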
  28. CLOUDERA ALTUS FOR DATA WAREHOUSING
  29. ✓ No software to install or clusters to manage ✓ Get multiple workloads up and running within minutes ✓ Enable self service across your organization ✓ Fully secure, automated, with identity preserved across functions ✓ Optimized for both AWS and Azure ✓ Pay only for what you use. Diagram labels: DATA ENGINEERING, DATA WAREHOUSE, DATA SCIENCE*, MULTI FUNCTION, DATA CATALOG, GOVERNANCE, SECURITY, CONTROL PLANE, LIFECYCLE MANAGEMENT, MULTI CLOUD, Amazon S3, Microsoft ADLS. CLOUDERA ALTUS DATA WAREHOUSE: BRING THE WAREHOUSE TO YOUR DATA (* roadmap)
  30. ALTUS DATA WAREHOUSE: The first data warehouse cloud service to bring the warehouse to the data, delivering instant analytics to anyone. For business analysts: • Run reports and queries at any time, with fast, predictable performance • Get self-service analytic access on demand, using the same preferred tools and SQL skills • Power reports, BI, exploratory analytics, and ad hoc queries, all over the same shared data and schemas • Extend insights to data science teams, data engineers, production applications, and more. For IT: • Eliminate data movement across workloads with a lock-in-free open architecture • Provision isolated resources as they’re needed, with just a few clicks • Easily manage unlimited tenants, and maintain consistent security and governance with the Shared Data Experience • Support transient and long-running workloads with elastic scale, all with a single view into cloud costs and usage
  31. MODERN DATA WAREHOUSING WITH ALTUS: Elastic and decoupled by design. Shared data in object store (S3 or ADLS). Altus Data Warehouse: Sales & Marketing BI • Altus Data Engineering: Data Prep / ELT • Altus Data Warehouse: Exploratory Queries. Diagram labels: Metadata, Security, Governance, Workload Management, Ingest & Replication
  32. WHAT’S MISSING FROM YOUR CLOUD DATA WAREHOUSE? Does data need to be copied/loaded into the database? Is upfront modeling or a proprietary data format required? Can you scale compute and storage independently? What’s required to grow/shrink your cluster? Is data shared across workloads or do non-SQL workloads require different data silos? Are object stores a native storage layer? Can the database span on-prem and multiple cloud environments? Flexibility Hybrid Scale Beyond SQL Shared Data
  33. SUMMARY
  34. CLOUDERA ENTERPRISE: The modern platform for machine learning and analytics optimized for the cloud. Storage Services: Amazon S3, Microsoft ADLS, HDFS, Kudu. Core Services: Security, Governance, Workload Management, Ingest & Replication, Data Catalog. Diagram labels: DATA WAREHOUSE, DATA SCIENCE, EXTENSIBLE SERVICES, OPERATIONAL DATABASE, DATA ENGINEERING
  35. CLOUDERA ALTUS: Data warehousing in the cloud – multiple clusters over single shared data. DATA WAREHOUSE: Discovery (raw) • DATA WAREHOUSE: Exploration (curated) • DATA ENGINEERING: Prep - New Report • DATA WAREHOUSE: BI/New Reporting • DATA SCIENCE: Model Build/Test • DATA ENGINEERING: Prep – Known • DATA WAREHOUSE: Regular Reporting. Shared Object Storage (S3, ADLS). Shared Metadata, Security, Governance
  36. Q&A. ALTUS FREE TRIAL: https://cloudera.com/altus
  37. THANK YOU https://www.cloudera.com/products/data-warehouse.html