Self-service Big Data Analytics on Microsoft Azure

In this presentation, Microsoft joins Cloudera to introduce a new Platform-as-a-Service (PaaS) offering that helps data engineers use on-demand cloud infrastructure to speed up the creation and operation of the data pipelines that power sophisticated, data-driven applications, without onerous administration.

Published in: Technology

  1. CLOUDERA ALTUS ON AZURE (May 2018)
  2. AGENDA
     - Cloudera Altus Overview
     - Altus Architecture Deep Dive
     - ADLS Deep Dive
     - Data Engineering Demo
     - Roadmap
     © Cloudera, Inc. All rights reserved.
  3. CLOUDERA ALTUS OVERVIEW
  4. CLOUDERA ENTERPRISE DATA PLATFORM: The modern platform for machine learning & analytics, optimized for the cloud
     - WORKLOADS: DATA ENGINEERING, DATA SCIENCE, ANALYTIC DATABASE, OPERATIONAL DATABASE
     - 3RD PARTY SERVICES
     - COMMON SERVICES: DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT
     - CONTROL PLANE
     - STORAGE: HDFS, KUDU, Microsoft ADLS, Other cloud
  5. CLOUDERA ALTUS PAAS
     - Simple
     - Self-service
     - Auto-elastic
     - Role specific
     Diagram labels: DATA ENGINEERING, ANALYTIC DB, DATA SCIENCE, DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT, CONTROL PLANE. Status tags: beta, soon. Storage: Microsoft ADLS, Other Cloud.
  6. ALTUS DATA ENGINEERING: TRANSFORM DATA AT SCALE WITHOUT THE ADMINISTRATION
     What is it?
     - Short-lived
     - Single tenant
     - Spark, Hive, MapReduce, or YARN cluster
     Used for things like
     - ETL jobs
     - Batch processing
     - With data living in ADLS
     - Provides fast and easy job submission without cluster management
     Generally Available on Azure
  7. ALTUS ANALYTIC DATABASE: MULTITENANT ANALYTICS AND DW AT SCALE WITHOUT THE ADMINISTRATION
     What is it?
     - Long-lived
     - Multi tenant
     - Impala cluster
     Used for things like
     - Data warehousing
     - Analytics
     - With data living in ADLS
     - Provides fast and easy analytics without cluster management
     Available in Beta on Azure
  8. CLOUDERA SDX (SHARED DATA EXPERIENCE): EASIEST WAY TO COLLABORATE IN A SHARED ENVIRONMENT
     - Unified security: protects sensitive data with consistent controls, even for transient and recurring workloads
     - Consistent governance: enables secure self-service access to all relevant data and increases compliance
     - Easy workload management: increases user productivity and boosts job predictability
     - Flexible ingest and replication: aggregates a single copy of all data, provides disaster recovery, and eases migration
     - Shared data catalog: defines and preserves structure and business context of data for new applications and partner solutions
     Available in Beta on Azure
  9. ALTUS WORKLOAD ANALYTICS: HELPING USERS FOCUS ONLY ON THEIR WORKLOADS, NOT CLUSTERS
     - Troubleshoot jobs after cluster termination
     - Insight into causes of job failure
     - Identification and root cause analysis of slow jobs
     - Define an SLA and get sizing recommendations
  10. COMMON USE CASES: FLEXIBLE DEPLOYMENTS FOR OPTIMAL TOTAL COST OF OWNERSHIP
      - Customers who move from persistent analytic clusters to elastic clusters can save money and gain agility.
      - Save the labor and/or upfront expense required to accommodate new teams & environments.
      - Save hardware/software costs by using transient nodes for peak workloads instead of always-on nodes.
      Diagram labels: a Transient-to-Persistent spectrum covering ELASTIC WORKLOADS, DATA MART EXPANSION, and PEAK BURSTING; workloads shown: DATA ENGINEERING, ANALYTIC DATABASE, DATA SCIENCE, MACHINE LEARNING.
  11. ALTUS ON AZURE ARCHITECTURE
  12. ALTUS ARCHITECTURE
      Diagram labels:
      - Entry points: Web UI, CLI, SDK/API, Partners
      - Cloudera Altus: Job Metadata, Environment, Cluster Metadata, Telemetry Storage, Remote Management
      - User Cloud Account: Altus DE Cluster (Job Queue, Jobs, Workers), Object Store
      - Data flows: Input Data, Output Data, Telemetry Data (Optional), Job Logs
  13. ACCESS SECURITY
      Needed permissions
      - Provide consent for cross account access
        - Can be restricted to Resource Groups
        - Can leverage custom Azure RBAC roles
        - For POCs, recommended to have contributor access to subscription
      - Create a User Assigned MSI to:
        - Read/write to ADLS folders/files (governed by ACLs)
      - Network Security Group:
        - Allow SSH from Altus management plane to VMs (limited to Altus IPs)
      Clusters are created with a User Assigned MSI.
      Diagram labels: Altus, Cross Account Access, SSH, Customer Azure Subscription, Resource Group, Virtual Network/Subnet, VMs (Workers), Job, Data Access, ADLS.
  14. DATA SECURITY
      - Encrypted at rest by default
        - Can use custom keys (Azure Key Vault)
        - Data, Logs
      - Communications encrypted: TLS in-cluster, Kerberos enabled
      Diagram labels: Altus, Cross Account Access, SSH, Customer Azure Subscription, Resource Group, Virtual Network/Subnet, VMs (Workers) with Managed Disks, Job, Data Access, User Assigned MSI Permissions, ADLS.
  15. MICROSOFT ADLS DEEP DIVE
  16. Azure covers 73 compliance offerings. Azure has the deepest and most comprehensive compliance coverage in the industry.
      - Global: ISO 27001:2013, ISO 27017:2015, ISO 27018:2014, ISO 22301:2012, ISO 9001:2015, ISO 20000-1:2011, SOC 1 Type 2, SOC 2 Type 2, SOC 3, CSA STAR Certification, CSA STAR Attestation, CSA STAR Self-Assessment, WCAG 2.0
      - US Gov: FedRAMP High, FedRAMP Moderate, EAR, DoD DISA SRG Level 5, DoD DISA SRG Level 4, DoD DISA SRG Level 2, DFARS, DoE 10 CFR Part 810, NIST SP 800-171, NIST CSF, Section 508 VPATs, FIPS 140-2, ITAR, CJIS, IRS 1075
      - Industry: PCI DSS Level 1, GLBA, FFIEC, Shared Assessments, FISC (Japan), APRA (Australia), FCA (UK), MAS + ABS (Singapore), 23 NYCRR 500, HIPAA BAA, HITRUST, 21 CFR Part 11 (GxP), MARS-E, NHS IG Toolkit (UK), NEN 7510:2011 (Netherlands), FERPA, CDSA, MPAA, FACT (UK), DPP (UK), SOX
      - Regional: Argentina PDPA, Australia IRAP Unclassified, Australia IRAP Protected, Canada Privacy Laws, China GB 18030:2005, China DJCP (MLPS) Level 3, China TRUCS / CCCPPF, EN 301 549, EU ENISA IAF, EU Model Clauses, EU-US Privacy Shield, Germany C5, Germany IT-Grundschutz workbook, India MeitY, Japan CS Mark Gold, Japan My Number Act, Netherlands BIR 2012, New Zealand Gov CIO Fwk, Singapore MTCS Level 3, Spain ENS, Spain DPA, UK Cyber Essentials Plus, UK G-Cloud, UK PASF
      https://aka.ms/AzureCompliance
  17. Open source support: Applications, Infrastructure, Management, Databases & middleware, App frameworks & tools, DevOps
  18. Azure Data Lake Store (ADLS): A hyper scale repository for big data analytics workloads
      - Store ANY DATA in its native format
      - HADOOP FILE SYSTEM (HDFS) for the cloud
      - ENTERPRISE GRADE
      - No limits to SCALE
      - Optimized for analytic workload PERFORMANCE
      Diagram labels: Compute (Cloudera 5.1x: YARN, Hive | Spark | Impala), Azure PaaS Services, Data (ADL Store).
  19. HDFS Architectures: Traditional Hadoop Cluster
      Diagram labels: Hadoop Master, Hadoop Worker 1, Hadoop Worker 2, Local Disk (shown twice).
  20. Cloudera on ADLS
      Diagram labels: Hadoop Master, Hadoop Worker 1, Hadoop Worker 2, ADLS.
  21. ADLS: Under the hood
      Diagram labels (numbered callouts 1 to 6 on the slide): Data Lake Client, Data Lake Management Client, Data Lake Client SDK, REST API (Data Access), Management API, File System/HDFS API, Data Lake Store Frontend, SSD-backed Data Lake Ingestion layer, Data Lake Store Backend, Scale out Storage, Metadata Service, Naming Service, Azure ML, Microsoft R Server.
  22. Comparison between storage options
      Option            | VHDs on WASB           | Premium Storage    | WASB             | ADLS
      Type              | Block based            | Block based        | Filesystem based | Filesystem based
      Maximum volume    | 4 TB per disk          | 4 TB per disk      | 500 TB           | No limit (tested > exabytes)
      Maximum item size | N/A                    | N/A                | 4.75 TB          | No limit (tested > petabytes)
      Physical media    | HDD                    | Flash/SSD          | HDD              | SSD + HDD
      Replication       | LRS and GRS            | None               | LRS and GRS      | LRS
      Throughput        | 60 MBps per disk       | 250 MBps per disk  | 60 MBps per blob | Extremely high
      RBAC              | N/A                    | N/A                | N/A              | POSIX compliant (file & folder level)
      Encryption        | SSE or Azure Key Vault | N/A                | N/A              | Transparent (AES 256 + TLS 1.2)
      Workloads         | Any                    | Any                | Low TBs          | > 10 TBs
      Locations         | All                    | Most               | All              | 4 and growing
      Source: https://docs.microsoft.com/en-us/azure/storage/storage-scalability-targets
  23. Why Cloudera on Azure Data Lake Store?
      - Separation of Compute & Storage
      - Transient clusters for flexibility, lower TCO
      - Shared storage for many optimized clusters
      Diagram labels: compute time across the week (M T W R F S S), backed by Data Lake Store.
  24. ALTUS DEMO
  25. ALTUS ROADMAP
  26. ALTUS DATA ENGINEERING ROADMAP
      Production Workflows
      - Scheduling, orchestration
      - Success/Failure notifications
      - Enhanced debugging
      - Failure handling
      - Dependency management
      Developer Workflows
      - IDE + partner integration
      - CI/CD
      - Python SDK
      - Interactive experience
      - Altus DS Integration
      Operational Efficiency
      - Autoscaling, enhanced spot
      - SLA and cost management
      - Workload automation
  27. ALTUS ANALYTIC DATABASE ROADMAP
      Platform
      - Pause, resume, and resize for clusters
      - Shrink with graceful shutdown
      - Altus SQL editor
      - Autoscaling
      Integrations
      - SDX
      - Workload XM
      - Navigator
      - Navigator Optimizer
      Misc
      - UDF support
      - SQL CLI (impala-shell)
  28. ALTUS PLATFORM ROADMAP
      Altus Self Service
      - Self-service subscription
      Platform
      - Increased Scalability
      - Java SDK for ADB, SDX
      Security
      - Identity federation
      - Enhanced security
  29. THANK YOU
  30. EVERYTHING YOU DON’T HAVE TO DO: FOCUS ON YOUR WORKLOADS, NOT THE CLUSTERS
      - Install any software to start working
      - Install any hardware
      - Worry about cluster configuration
      - Upgrade/reconfigure clusters
      - OS upgrades/patching
      - Resource Management
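
Slide 10 claims that using transient nodes for peak workloads costs less than keeping peak capacity always on. The arithmetic behind that claim can be sketched in node-hours. This is a rough illustration only: the function names and every number in the example (a 168-hour week, 10 baseline nodes, 40 peak nodes, 20 peak hours) are hypothetical and do not come from the slides, and real pricing would also depend on VM type, storage, and licensing.

```python
def node_hours_always_on(peak_nodes: int, hours: int) -> int:
    """Node-hours consumed when the cluster is sized for peak all the time."""
    return peak_nodes * hours


def node_hours_bursting(baseline_nodes: int, peak_nodes: int,
                        hours: int, peak_hours: int) -> int:
    """Node-hours when only a baseline is always on and extra
    transient nodes run during the peak window only."""
    if peak_hours > hours:
        raise ValueError("peak_hours cannot exceed the total hours")
    if peak_nodes < baseline_nodes:
        raise ValueError("peak_nodes must be at least baseline_nodes")
    transient_nodes = peak_nodes - baseline_nodes
    return baseline_nodes * hours + transient_nodes * peak_hours


# Hypothetical week: 168 hours, 10 baseline nodes, 40 nodes at peak,
# and 20 hours of peak load.
always_on = node_hours_always_on(peak_nodes=40, hours=168)
bursting = node_hours_bursting(baseline_nodes=10, peak_nodes=40,
                               hours=168, peak_hours=20)
savings_fraction = 1 - bursting / always_on
print(always_on, bursting, round(savings_fraction, 2))
```

Under these assumed numbers the bursting layout uses about a third of the node-hours of the always-on layout. The shorter the peak window relative to the week, the larger the gap.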

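Slide 13 describes a Network Security Group rule that allows SSH to the cluster VMs only from the Altus management plane's IP addresses. The effect of such a rule can be modeled in a few lines with Python's standard `ipaddress` module. This is a model of the rule's logic, not the Azure NSG API, and the address range below is a placeholder taken from the RFC 5737 documentation block, not a real Altus range.

```python
import ipaddress

# Placeholder allowlist (RFC 5737 documentation range); the real Altus
# management-plane addresses are not given in the slides.
ALLOWED_SSH_SOURCES = [ipaddress.ip_network("203.0.113.0/28")]

SSH_PORT = 22


def ssh_allowed(source_ip: str, port: int,
                allowed=ALLOWED_SSH_SOURCES) -> bool:
    """Return True only for traffic to the SSH port from an allowed range."""
    if port != SSH_PORT:
        return False
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in allowed)


print(ssh_allowed("203.0.113.5", 22))   # inside the placeholder range
print(ssh_allowed("203.0.113.77", 22))  # outside the placeholder range
print(ssh_allowed("203.0.113.5", 443))  # right source, wrong port
```

Keeping the allowlist as data rather than hard-coding it in the check mirrors how the slide scopes the rule: the port is fixed, and only the set of source addresses changes.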
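
The per-disk throughput figures in the slide 22 comparison (60 MBps for VHDs on WASB, 250 MBps for Premium Storage) can be turned into a rough estimate of how long a full scan of a dataset would take. The sketch below assumes decimal units (1 TB = 1,000,000 MB), sustained throughput at the quoted rate, and linear scaling across parallel disks. All three are simplifying assumptions, and the function name is illustrative.

```python
def scan_seconds(dataset_tb: float, mbps_per_disk: float, disks: int = 1) -> float:
    """Estimated seconds to read dataset_tb once at the given per-disk rate.

    Assumes 1 TB = 1,000,000 MB and perfectly linear scaling across disks.
    """
    if mbps_per_disk <= 0 or disks <= 0:
        raise ValueError("throughput and disk count must be positive")
    dataset_mb = dataset_tb * 1_000_000
    return dataset_mb / (mbps_per_disk * disks)


# Reading 1 TB from a single disk at each quoted rate.
for label, rate in [("VHDs on WASB", 60), ("Premium Storage", 250)]:
    hours = scan_seconds(1, rate) / 3600
    print(f"{label}: {hours:.2f} hours")
```

The slide gives no number for ADLS ("Extremely high"), so it is left out of the estimate rather than guessed.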