Building multi-disciplinary analytics applications on a shared data platform

Machine learning and analytics applications are proliferating in the enterprise, enabling use cases in areas such as predictive maintenance, delivering new, desirable product offers to customers at the right time, and combating insider threats to your business.

Published in: Business

  1. How to build multi-disciplinary analytics applications on a shared data platform
     Frank Hereygers | Systems engineer
     © Cloudera, Inc. All rights reserved.
  3. Challenges in data management
     • Many data silos, each requiring its own proprietary tools and infrastructure
     • Different vendors, products, and services on-premises versus in the cloud
     • A fragmented approach is difficult, expensive, and risky
     Silos shown: SQL analytic databases; NoSQL and real-time databases; data engineering and ETL environments; data warehouses and data marts
  4. Traditional applications
     • One data type
     • One analytic function
     • Hard to integrate
     Each application (Data Exploration, SQL & BI Analytics, Operational Real-Time DB, ETL & Data Processing, Custom Functions) carries its own separate stack: storage, security, governance, workload management, ingest & replication, and data catalog.
  5. Negative consequences for your business
     • Increased operational costs: many distinct environments to buy and build
     • Increased staff overhead: many distinct tools to learn and support
     • Increased security risks: many distinct frameworks to enforce
     • Decreased business insights: narrow data sets and analytics rigidity
     • Decreased business agility: outdated and limiting for applications
     • Decreased governance capability: no common visibility across stores
  6. Key design goals for today’s data management teams
     • Support multi-function analytics
     • Minimize time to add workloads
     • Support elastic workloads
     • Enable self-service
     • Provide a scalable model for sharing data
     • Reduce cost
     • Increase tenant isolation
     • Secure the environment
  7. Traditional on-premises deployments perform reasonably well … but not good enough going forward
     One physical cluster provides a shared data experience to multiple workloads and tenants.
     Diagram: Shared Data Experience (Metadata, Security, Governance); Shared Storage (HDFS, Kudu)
     • Strong: multi-function support, shared data experience, information security model
     • Moderate: cost management, tenant isolation, workload elasticity
     • Weak: self service, speed of deployment
  8. Traditional cloud deployments are strong where on-premises is weak, but at the expense of creating workload silos
     This is the experience of cloud house offerings … but not good enough going forward
     Diagram: Shared Storage; Cloud
     • Strong: tenant isolation, workload elasticity, self service, speed of deployment
     • Moderate: multi-function support, cost management
     • Weak: shared data experience, information security model
  9. In the beginning…
  10. In the beginning…
  11. Today: One platform. Multiple workloads
      Workloads: Data Engineering, Operational Database, Analytic Database, Data Science
      • Store and process unlimited data fast and cost-effectively: “Programmatic data processing and machine learning”
      • Explore, analyze, and understand all your data: “Fast, flexible, open source parallel database”
      • Build data-driven applications to deliver real-time insights: “Online applications, lambda/kappa architectures”
  12. What is a workload?
      Data + Data Context + Compute
      Data Context:
      • HMS: Schema definitions
      • Sentry: Security authorizations
      • Navigator: Audit logs
      • Navigator: Business glossary
      • Navigator: Business metadata
      • Navigator: Lineage
  13. What about multiple workloads?
      Diagram: a cluster with Hive/HMS, Sentry, Navigator, Spark, and Keys; storage in HDFS, Kudu, S3, Private Cloud Storage
  14. Data context with multiple workloads
      • Traditional Hadoop clusters contain compute, data, and data context
      • Transient Hadoop clusters contain compute and data context, but externalize data
      Storage: HDFS, Kudu, S3, Private Cloud Storage
      Why is data context stored in each cluster, and not alongside the data?
  15. The data context consistency problem
      Compute and data are becoming further separated
      • Compute is stateless: cloud-based or on-prem, either transient or long-running
      • Data is stateful: cloud-based or on-prem in HDFS, Kudu, S3, ADLS, Isilon, etc.
      What about data context?
      • Schema definitions (Hive Metastore)
      • Permissions (Apache Sentry)
      • Encryption keys (KMS)
      • Governance (Cloudera Navigator)
      Data context should be stateful, but currently is stateless
      • This creates synchronization and usability challenges for admins and end users alike
  16. Solution: Shared Data Experience
      Externalize data context services as a shared service
      Diagram: Data Engineering, Operational Database, Analytic Database, and Data Science workloads; Keys, HMS, Sentry, and Navigator; HDFS, Kudu, S3, Private Cloud Storage
      Benefits:
      • Common schemas, access permissions, classifications, and governance across all workloads
      • Reduced cost of ownership: less hardware and software to manage
      • Increased end-user productivity: data is presented consistently in every cluster
      • Faster expansion: admins don’t have to recreate data context services with each new cluster
  17. The modern platform for machine learning and analytics optimized for the cloud
      Cloudera Enterprise
      • Core services: Data Engineering, Operational Database, Analytic Database, Data Science
      • Extensible services: Data Catalog, Ingest & Replication, Security, Governance, Workload Management
      • Storage services: S3, ADLS, HDFS, Kudu
  18. Two deployment options
      • Cloudera SDX: Customer-managed
        RDS-backed Hive Metastore; RDS-backed Apache Sentry; customer-managed Cloudera Navigator
        Ideal for: Director-launched workloads; CM-managed workloads
      • Cloudera Altus SDX: Cloudera-managed
        Serverless Hive Metastore; serverless Apache Sentry; serverless Cloudera Navigator
        Ideal for: Altus SDX workloads; hybrid workloads
  19. Cloud deployments with SDX optimize for all design goals
      One logical cluster provides a shared data experience to multiple workloads and tenants.
      SDX makes it possible to transfer on-premises design wins to cloud.
      Diagram: Shared Data Experience (Metadata, Security, Governance); Shared Object Storage; Cloud
      • Strong: multi-function support, shared data experience, information security model, cost management, tenant isolation, workload elasticity, self service, speed of deployment
  20. Positive business outcomes
      • Increased business insights: diverse data together with analytics flexibility
      • Increased business agility: modern and nimble application innovation
      • Increased governance capability: one common viewpoint and store
      • Decreased operational costs: one environment for all needs
      • Decreased staff overhead: one set of controls for everything
      • Decreased security risks: comprehensive controls everywhere
  21. Using Predictive Maintenance to Improve Performance and Reduce Fleet Downtime
      • OnCommand Connection is collecting telematics and geolocation data across the fleet
      • Reduced maintenance costs to $.03 per mile from $.12-$.15 per mile
      • Centralizing data from 13 systems with varying frequency and semantic definitions
      • Real-time visibility of 300,000+ trucks in order to improve uptime and vehicle performance
  22. Thank you
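
Slides 12 to 16 describe a workload as data plus data context plus compute, and propose moving the data context out of individual clusters into a shared service. The following is a minimal Python sketch of that idea only. It does not use any Cloudera, Hive Metastore, or Sentry API; the class names DataContext and Cluster, the table "trucks", and the group "analysts" are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataContext:
    """Hypothetical stand-in for the shared data context services on the slides."""
    schemas: dict = field(default_factory=dict)      # plays the role of HMS: table -> columns
    permissions: dict = field(default_factory=dict)  # plays the role of Sentry: table -> groups

    def define_table(self, table, columns, groups):
        # Register the schema and the groups allowed to read the table.
        self.schemas[table] = list(columns)
        self.permissions[table] = set(groups)

    def can_read(self, group, table):
        return group in self.permissions.get(table, set())


@dataclass
class Cluster:
    """Compute only: the data context is passed in, not created per cluster."""
    name: str
    context: DataContext

    def describe(self, group, table):
        # Every cluster consults the same context for permissions and schema.
        if not self.context.can_read(group, table):
            raise PermissionError(f"{group} may not read {table}")
        return self.context.schemas[table]


# One shared context, two clusters (illustrative names).
shared = DataContext()
etl = Cluster("data-engineering", shared)
bi = Cluster("analytic-database", shared)

# A table defined once is visible, with the same permissions, from both clusters.
shared.define_table("trucks", ["vin", "odometer_miles"], groups={"analysts"})
```

Because both clusters hold a reference to the same context object, a new cluster needs no schema or permission setup of its own, which mirrors the "faster expansion" benefit listed on slide 16.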
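
Slide 21 gives maintenance cost per mile before ($.12-$.15) and after ($.03) but not the relative saving. A short sketch of that arithmetic, using only the figures on the slide (the function name percent_reduction is an illustrative choice):

```python
def percent_reduction(before, after):
    """Return the relative reduction from before to after, in percent."""
    return (before - after) / before * 100.0


after = 0.03                           # dollars per mile, from slide 21
low = percent_reduction(0.12, after)   # against the lower "before" figure
high = percent_reduction(0.15, after)  # against the higher "before" figure
print(f"{low:.0f}% to {high:.0f}% lower maintenance cost per mile")
```

On these figures the reduction works out to roughly 75 to 80 percent per mile.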