
High-Performance Analytics in the Cloud with Apache Impala

With more and more data being generated and stored in the cloud, you need a modern data platform that can extend to any environment so you can derive value from all your data. Cloudera Enterprise is the leading enterprise Hadoop platform for cloud deployments. It’s the easiest way to manage and secure Hadoop data across any cloud environment and includes component-level support for cloud-native object stores. This makes the platform uniquely suited to handle transient jobs like ETL and BI analytics, as well as persistent workloads like stream processing and advanced analytics.
With the recent release of Cloudera 5.8, Apache Impala (incubating) has added support for Amazon S3, enabling business analysts to get instant insights from all data through high-performance exploratory analytics and BI.
Join David Tishgart, Director of Product Marketing, and James Curtis, Senior Analyst, Data Platforms & Analytics at 451 Research, as they discuss three things:
* Best practices for analytic workloads in the cloud

* A live demo and real-world use cases

* What’s next for Cloudera and the cloud


Published in: Software

  1. High-Performance Analytics in the Cloud with Apache Impala. James Curtis | Sr. Analyst, Data Platforms & Analytics | 451 Research; David Tishgart | Cloud Product Marketing | Cloudera; Tom Deane | Cloud Product Management | Cloudera. © Cloudera, Inc. All rights reserved.
  2. Today’s Speakers: David Tishgart, Director of Product Marketing - Cloud, Cloudera; Jim Curtis, Senior Analyst, Data Platforms & Analytics, 451 Research; Tom Deane, Senior Product Manager - Cloud, Cloudera
  3. Agenda: 1. Hadoop in the cloud 2. Why users migrate to the cloud for big data analytics 3. Cloud considerations 4. Cloudera Enterprise for cloud-based analytics 5. Product Demo 6. Q&A
  4. 451 Research is a leading IT research & advisory company • Founded in 2000 • 250+ employees, including over 100 analysts • 1,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers • 50,000+ IT professionals, business users and consumers in our research community • Over 52 million data points published each quarter and 4,500+ reports published each year • 2,000+ technology & service providers under coverage • 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group • Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia • Services: Research & Data; Advisory; Events; Go 2 Market
  5. Total Hadoop Revenue. [Chart: revenue in $M (axis $0 to $5,000) by year, 2015 to 2020.] 42% CAGR expected for Hadoop-aaS; 39% CAGR expected for total Hadoop market. Source: 451 Research Market Monitor Total Data Service. © 2016 451 Research. All rights reserved. Hadoop Vendors: Altiscale, Amazon Web Services, BlueData, Cazena, CenturyLink, Cloudera, Cray, Data Artisans, Databricks, Doopex, Dremio, Esgyn, Hortonworks, IBM, JethroData, MapR Technologies, MetaScale, Microsoft, Oracle, Pepperdata, Pivotal, Qubole, Rackspace, Splice Machine, Stratio, Teradata, Tonomi, Treasure Data, VMware, WANdisco
  6. Data Gravity: Play It as It Lies • In golf, the USGA Rule 13 states that ‘the ball must be played where it lies….’ • Data gravity: data is analyzed where it resides (instead of being moved to a data warehouse for analysis) • Our research shows 33% of IT storage professionals already use cloud storage services and 23.4% plan to purchase cloud services in the next 90 days. Source: 451 Research, Voice of the Enterprise Storage, Q1 2016.
  7. Public Cloud Infrastructure (IaaS, PaaS) Storage Growth. [Chart: percentage axis from 0% to 45%; series shown for today and in 2 years: Database/DW, Data Analytics/BI, Big Data.] Source: 451 Research, Voice of the Enterprise Storage, Q1 2016.
  8. Why Use Cloud for Analytics/BI and Big Data? [Charts: Analytics/BI Workloads; Big Data Workloads.] Source: 451 Research, Voice of the Enterprise Cloud, Workloads, and Key Projects 2016.
  9. Benefits: Moving Analytic Workloads to Cloud Storage • Financial: it costs significantly less to store data in S3 (AWS) than in HDFS running on EC2 (AWS). HDFS stores three copies of each block by default, whereas cloud storage uses automatic backups and compression • Scalability: HDFS inherently scales, but it requires manual configuration and management. Cloud storage scales automatically as data is added, with no user involvement. Storage and compute can also be scaled separately, based on use case and need • Durability/Persistence: with HDFS on EC2, data is persisted only for the life of the instance. Cloud storage is designed to deliver data durability to 99.9999%
  10. Considerations: Leveraging the Cloud for Analytics • Remember data gravity. It’s much easier to analyze the data where it resides, particularly if it’s in the cloud • If moving to the cloud, know why and what you’re getting into. If data is on premises and you move it to the cloud, then keep it in the cloud • Because cloud storage separates storage from compute (versus local storage), performance may be impacted • When moving objects to cloud storage, be aware of size limitations for objects • Cloud providers enable security, but organizations will want to implement platform-level security as well • Avoid evaluating cloud or cloud storage on a single benefit. Does it benefit the organization across multiple dimensions? • It’s not necessarily an either/or proposition. It’s acceptable for enterprises to run a mix of cloud, cloud storage, and on-premises
  11. Thank You! james.curtis@451research.com www.451research.com
  12. Cloud growth fueling Hadoop use cases. Requirements to move from POCs to enterprise use cases: Security; Portability; Flexibility; Ecosystem support
  13. Common workloads in the cloud • ETL/Modeling (Data Engineering): Only pay for what you need, when you need it ▪ Transient clusters ▪ Elastic workload ▪ Object storage centric ▪ Cloud-native deployment • BI/Analytics (Analytic Database): Explore and analyze all data, wherever it lives ▪ Transient or persistent clusters ▪ Sized to demand ▪ HDFS or object storage ▪ Lift-and-shift or cloud-native deployment • App Delivery (Operational Database): Enterprise-grade to protect your business, no matter what ▪ Fixed clusters ▪ Periodic sync ▪ All HDFS storage ▪ Lift-and-shift deployment • Outcomes: Reduce Operating Costs; New Insights, New Revenue; Run Without Risk
  14. Deployment Models in the Cloud: Lift and Shift for long-running workloads; Cloud-native for transient clusters (Object Store)
  15. Apache Impala: Self-Service BI & Exploratory Analytics • Self-Service Data Agility: No rigid data modeling encumbrances for agile acquisition; Iteratively analyze and flexibly model • Self-Service Exploratory Analytics: Interactive responses for iterative exploration; Confidently handle all BI and SQL users • Cost-Effective Scalability with Users/Data: Easily add nodes to handle more data and users; Leverage the full potential of available data • Productively Use Existing Tools and Skills: Integration with all leading BI tools & compatible analytic SQL language; Metadata and lineage for easy data discovery; Intelligent SQL editor for greater developer productivity
  16. Deploy on any cloud infrastructure. Cloudera Director: Management for IaaS-related and CDH cluster operations • Easy Administration: Dynamic cluster lifecycle management; Single pane of glass: multi-cluster view; Consumption-based billing and metering • Enterprise-grade: Integration across Cloudera Enterprise; Management of CDH deployments at scale • Flexible Deployments: No cloud vendor lock-in: open plugin framework for IaaS platforms; Scaling of provisioned clusters; Spot instance provisioning
  17. Cloudera’s Analytic Database Solution. Layers: OPERATIONS; DATA MANAGEMENT; UNIFIED SERVICES; PROCESS, ANALYZE, SERVE; STORE; INTEGRATE. Components: • Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop • Hue: Intelligent SQL editor • Navigator: Audit, lineage, encryption, key management, & policy lifecycles • BI Partners: Integration with the leading BI tools • Impala: Interactive query engine for BI & SQL analytics • Hive-on-Spark: Large-scale ETL & batch processing engine • Multi-Storage, Multi-Environment
  18. Impala Key Cloud Benefits vs Cloud Analytic Database • Cloud Elasticity: Ability to grow/shrink cluster sizes (native architecture); Elastic compute scale (via direct query from S3); Transient workloads (via direct query from S3) • Data Agility: Faster, more agile data acquisition; Data portability: open formats and open storage (even more so with S3) • Scalability: Proven through hundreds of nodes; Proven with high concurrency • Hybrid cloud: Runs across clouds and on-prem; Runs across S3, HDFS, Kudu, Isilon, DSSD, etc.
  19. Demo – Impala on S3: Real-time SQL analytics with Cloudera Enterprise deployed on Amazon Web Services
  20. Why Impala on S3? • Decouple storage from compute • Save on storage costs and replication costs • Pay-per-use (provision, grow, shrink, and terminate clusters) • No need to migrate data out of S3 to another filesystem
  21. Impala on S3 Scalability: 2.1x speedup for 4% lower cost for multi-user. For full details: http://blog.cloudera.com/blog/2016/08/analytics-and-bi-on-amazon-s3-with-apache-impala-incubating/
  22. End-to-End Demo
  23. Why Cloudera in the Cloud? Hadoop Expertise ◆ Most committers ◆ World-class innovation ◆ Enterprise-class stack ◆ Granular data security + governance ◆ Best support, services, training Flexible Deployments ◆ No vendor lock-in ◆ Multi-cloud and on-prem ◆ Transient and long-lived clusters Customer success – not cloud consumption ◆ Focus on infrastructure choice ◆ Security separation from infrastructure leads to reduced risk Flexible Pricing ◆ Pay-as-you-go cloud usage ◆ Traditional node-based licensing
  24. Get started with Cloudera Enterprise in the cloud • Cloudera Director: Deploy and manage Cloudera Enterprise on AWS, GCP, Azure or the cloud of your choice • AWS Quickstart: Deploy an enterprise data hub on AWS • Azure Marketplace: Provision and deploy Cloudera Enterprise on the Azure Marketplace
  25. Q&A
  26. Thank You
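
The financial argument on the cloud-storage benefits slide (HDFS keeping three copies of each block versus an object store billing for logical bytes) can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing tool: the function names are illustrative, and the per-GB prices are placeholder assumptions rather than AWS list prices, so substitute current figures before drawing conclusions.

```python
"""Rough storage-cost comparison: HDFS on block storage vs. an object store.

All prices below are hypothetical placeholders, not real AWS pricing.
"""

HDFS_REPLICATION = 3  # HDFS default replication factor


def hdfs_monthly_cost(logical_gb: float, price_per_gb: float,
                      replication: int = HDFS_REPLICATION) -> float:
    """Cost of the raw capacity HDFS needs to hold `logical_gb` of data."""
    return logical_gb * replication * price_per_gb


def object_store_monthly_cost(logical_gb: float, price_per_gb: float) -> float:
    """Object stores bill for logical bytes; redundancy is handled by the service."""
    return logical_gb * price_per_gb


if __name__ == "__main__":
    data_gb = 10_000      # 10 TB of logical data (example value)
    block_price = 0.10    # assumed $/GB-month for instance-attached volumes
    object_price = 0.03   # assumed $/GB-month for object storage
    hdfs = hdfs_monthly_cost(data_gb, block_price)
    s3 = object_store_monthly_cost(data_gb, object_price)
    print(f"HDFS: ${hdfs:,.2f}/month, object store: ${s3:,.2f}/month")
```

The replication factor is a parameter because clusters can be configured with a value other than three; the comparison also ignores compute, request, and data-transfer charges.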
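
The "Why Impala on S3?" slide says there is no need to migrate data out of S3 to another filesystem. In practice this usually means declaring an external table whose `LOCATION` points at an `s3a://` path. The sketch below only builds such a statement as a string; the table name, columns, and bucket are hypothetical, identifiers are not escaped, and it is meant as an illustration rather than the setup used in the demo.

```python
"""Build an Impala CREATE EXTERNAL TABLE statement for data that stays in S3.

Table, column, and bucket names used here are hypothetical examples.
"""


def external_table_ddl(table, columns, s3_location, file_format="PARQUET"):
    """Return DDL for an external table over files at `s3_location`.

    `columns` is a sequence of (name, sql_type) pairs. Inputs are inserted
    verbatim, so only pass trusted identifiers.
    """
    if not s3_location.startswith("s3a://"):
        raise ValueError("expected an s3a:// location")
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}'"
    )


if __name__ == "__main__":
    ddl = external_table_ddl(
        "sales",                                 # hypothetical table name
        [("id", "BIGINT"), ("amount", "DOUBLE")],
        "s3a://example-bucket/sales/",           # hypothetical bucket and prefix
    )
    print(ddl)
```

Because the table is external, dropping it removes only the metadata and leaves the files in the bucket, which fits the transient-cluster pattern described in the slides.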
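
The scalability slide's headline, a 2.1x speedup for 4% lower cost, is easier to interpret with a simple cost model. The sketch below assumes that cost is proportional to compute resources multiplied by runtime; that model is an assumption made here, not something the slides state, and the function names are illustrative. Under it, the two figures together would imply roughly twice the compute, finishing a little more than twice as fast.

```python
"""Relate speedup, compute resources, and cost under a simple linear model.

Assumption (not stated in the slides): cost is proportional to
compute resources multiplied by runtime.
"""


def relative_cost(resource_ratio: float, speedup: float) -> float:
    """Cost of the scaled run relative to the baseline run."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return resource_ratio / speedup


def implied_resource_ratio(speedup: float, cost_ratio: float) -> float:
    """Resource multiple consistent with a given speedup and cost ratio."""
    return speedup * cost_ratio


if __name__ == "__main__":
    speedup = 2.1          # from the slide
    cost_ratio = 1 - 0.04  # "4% lower cost" from the slide
    ratio = implied_resource_ratio(speedup, cost_ratio)
    print(f"Implied compute multiple: {ratio:.2f}x")
```

The linked Cloudera blog post is the place to check the actual cluster sizes and pricing behind the benchmark.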
