SlideShare uma empresa Scribd logo
1 de 20
Baixar para ler offline
Headline Goes Here
Speaker Name or Subhead Goes Here
DO NOT USE PUBLICLY
PRIOR TO 10/23/12
Challenges of running Hadoop on AWS
June 12, 2014 @ AdvancedAWS Meetup - Citizen Space
Andrei Savu - @andreisavu
Software Engineer, Cloud Automation Team
Overview
● Introduction
● Context
● Challenges
● Questions
Andrei Savu
Software Engineer
Cloud Automation Team @ Cloudera
Previously: founder of Axemblr, Apache Whirr PMC, contributor
to jclouds, Cloudsoft, Facebook etc. (see LinkedIn)
Cloud Automation Team @ Cloudera
Focused on:
● building tools to automate deployment and ongoing
management of Hadoop clusters on cloud infrastructure
● improving Hadoop cloud compatibility (e.g. s3 integration,
swift, managed databases, custom network topologies etc.)
We are hiring!
Context
● Hadoop
● Types of Deployments
● Cluster Topology
● AWS
Context: Hadoop
Hadoop is a broad, coherent stack of products for data storage
and processing.
“Hadoop” is more than HDFS & MapReduce. It can do: multiple
storage systems, different query engines, batch and real-time
etc.
Usually running on bare metal now moving towards cloud infra.
Types of Deployments
Long running:
- store data for analytics jobs with MapReduce, Impala, Spark
- online data serving with HBase
On-demand:
- analytical workloads, fetch data on-demand
- triggered by workflows (1:1)
- disconnected lifecycle (Netflix Genie)
Cluster Topology #1
Simple:
● EC2 classic (being phased-out)
● VPC: single subnet, security group with an optional VPN
Complex:
● VPC: multiple subnets & security groups
● DirectConnect
● highly available with disaster recovery
● multiple users & security
Cluster Topology #2
● Cloudera Reference Architecture for AWS Deployments:
http://www.cloudera.
com/content/cloudera/en/resources/library/whitepaper/cloudera-enterprise-
reference-architecture-for-aws-deployments.html
● Best Practices for Deploying Cloudera Enterprise on AWS:
http://blog.cloudera.com/blog/2014/02/best-practices-for-deploying-cloudera-
enterprise-on-amazon-web-services/
Amazon Web Services
Paradigm shift in how we work with infrastructure.
Key concept: software defined - controlled by APIs
Has most of the things we need for storage and high
performance data processing (placement groups, large
instances, high storage density, ssds, many vCPUs etc.)
Enterprise-ready: IAM, VPC, VPN / DirectConnect, Support etc.
Challenges
● Instance Provisioning & Health
● Ensuring Idempotency
● Networking & Performance
● AMIs & Bootstrap Speed
● Data durability
● S3 integration
What makes it more difficult?
… versus a typical stateless web application in an auto-scaling
group monitoring request latency or OS load averages
● statefulness (think databases)
● each cluster has multiple processes playing different roles
● topology & configuration changes require orchestration
● knowledge of service inter-dependencies is required
Instance Provisioning & Health
Questions:
● How do you define your cluster size to deal lack of capacity?
● How do you define health? Is that stable during setup?
● Is health a binary property? Or a threshold that needs to be
continuously evaluated?
Potential answers:
● match AWS semantics: define size as a range
● make simplifying assumptions (e.g. healthy during setup)
Ensuring Idempotency
Questions:
● How do you safely retry expensive calls?
● How do you build reliable workflows?
Potential answers:
● AWS User Guide via client token
● Discuss: Convergence vs. Single step retries
Networking & Performance
Questions:
● What’s the ideal setup that’s both usable and secure?
● How do you get consistent intra-cluster performance?
Potential answers:
● VPC with VPN or DirectConnect. Placement groups help.
● Security model: initial it was just perimeter security, now it
can do a lot more (disk encryption, SSL, kerberos)
Images & Bootstrap Speed
Questions:
● Do you allow custom AMIs or force your own choices?
● If using custom AMIs how can you reduce bootstrap time?
Potential answers:
● Custom AMIs are common - integrated with existing infra
● Fast bootstrap by baking on top with custom bits
Data durability
Questions:
● How do you place replicas? Datacenter topology?
● How are instances distributed in different failure domains?
Potential answers:
● ignore or go with large instances that map 1:1 to hosts
● would be nice to have: a way to influence host to instance
allocation or to get datacenter topology data
S3 integration
Questions:
● How do you reconcile differences in semantics with HDFS?
(strongly consistent vs. eventual consistency)
● How do you get most out of it in terms of performance?
Potential answers:
● we’ve done a fair amount of work improving S3 in the open
source (features, stability improvements, security etc.)
● performance is network bound
Thanks! Questions?
Andrei Savu - asavu@cloudera.com
Twitter: @andreisavu
Join us to take Hadoop to the clouds!
https://hire.jobvite.com/Jobvite/job.aspx?j=orafYfwy&b=nqlg3nwW
Challenges for running Hadoop on AWS - AdvancedAWS Meetup

Mais conteúdo relacionado

Mais procurados

Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioAlluxio, Inc.
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudAmazon Web Services
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSArun Sirimalla
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Cloudera, Inc.
 
Building Analytics Applications in the AWS Cloud
Building Analytics Applications in the AWS CloudBuilding Analytics Applications in the AWS Cloud
Building Analytics Applications in the AWS CloudAmazon Web Services
 
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAmazon Web Services
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Amazon Web Services
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAmazon Web Services
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014Amazon Web Services
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIsCisco DevNet
 
Oracle and SQL Server on the Cloud - Bill Baldwin
Oracle and SQL Server on the Cloud - Bill BaldwinOracle and SQL Server on the Cloud - Bill Baldwin
Oracle and SQL Server on the Cloud - Bill BaldwinAmazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Gartner evaluation criteria_for_clou_security_networking
Gartner evaluation criteria_for_clou_security_networkingGartner evaluation criteria_for_clou_security_networking
Gartner evaluation criteria_for_clou_security_networkingYerlin Sturdivant
 
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...Amazon Web Services
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2Paulraj Pappaiah
 

Mais procurados (20)

Bursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using AlluxioBursting on-premise analytic workloads to Amazon EMR using Alluxio
Bursting on-premise analytic workloads to Amazon EMR using Alluxio
 
Intro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS CloudIntro to High Performance Computing in the AWS Cloud
Intro to High Performance Computing in the AWS Cloud
 
Building a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWSBuilding a Bigdata Architecture on AWS
Building a Bigdata Architecture on AWS
 
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ...
 
Building Analytics Applications in the AWS Cloud
Building Analytics Applications in the AWS CloudBuilding Analytics Applications in the AWS Cloud
Building Analytics Applications in the AWS Cloud
 
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce CostsAWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
AWS Sydney Summit 2013 - Optimizing AWS Applications and Usage to Reduce Costs
 
Amazon EMR
Amazon EMRAmazon EMR
Amazon EMR
 
HPC in AWS - Technical Workshop
HPC in AWS - Technical WorkshopHPC in AWS - Technical Workshop
HPC in AWS - Technical Workshop
 
Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015Getting Started with Big Data and HPC in the Cloud - August 2015
Getting Started with Big Data and HPC in the Cloud - August 2015
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDSAWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
AWSome Day 2016 - Module 4: Databases: Amazon DynamoDB and Amazon RDS
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 
DEVNET-1166 Open SDN Controller APIs
DEVNET-1166	Open SDN Controller APIsDEVNET-1166	Open SDN Controller APIs
DEVNET-1166 Open SDN Controller APIs
 
Oracle and SQL Server on the Cloud - Bill Baldwin
Oracle and SQL Server on the Cloud - Bill BaldwinOracle and SQL Server on the Cloud - Bill Baldwin
Oracle and SQL Server on the Cloud - Bill Baldwin
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
HPC in the Cloud
HPC in the CloudHPC in the Cloud
HPC in the Cloud
 
Gartner evaluation criteria_for_clou_security_networking
Gartner evaluation criteria_for_clou_security_networkingGartner evaluation criteria_for_clou_security_networking
Gartner evaluation criteria_for_clou_security_networking
 
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
(BDT311) MegaRun: Behind the 156,000 Core HPC Run on AWS and Experience of On...
 
BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2BigData: AWS RedShift with S3, EC2
BigData: AWS RedShift with S3, EC2
 

Destaque

(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...
(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...
(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...Amazon Web Services
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호Amazon Web Services Korea
 
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera, Inc.
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014cdmaxime
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSCloudera, Inc.
 
[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh
[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh
[GE Innovation Forum 2015] The Industrial Internet by Bill RuhGE코리아
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationmattlieber
 
E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)Predix
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2IMC Institute
 
E3: Edge and Cloud Connectivity (Predix Transform 2016)
E3: Edge and Cloud Connectivity (Predix Transform 2016)E3: Edge and Cloud Connectivity (Predix Transform 2016)
E3: Edge and Cloud Connectivity (Predix Transform 2016)Predix
 
E1: Building the Digital Twin (Predix Transform 2016)
E1: Building the Digital Twin (Predix Transform 2016)E1: Building the Digital Twin (Predix Transform 2016)
E1: Building the Digital Twin (Predix Transform 2016)Predix
 
Predix Builder Roadshow
Predix Builder RoadshowPredix Builder Roadshow
Predix Builder RoadshowPredix
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)Apache Apex
 
AWS Cloud Kata | Bangkok - Getting to MVP
AWS Cloud Kata | Bangkok - Getting to MVPAWS Cloud Kata | Bangkok - Getting to MVP
AWS Cloud Kata | Bangkok - Getting to MVPAmazon Web Services
 
Digital strategy to transform GE into digital industrial leader
Digital strategy to transform GE into digital industrial leaderDigital strategy to transform GE into digital industrial leader
Digital strategy to transform GE into digital industrial leaderYouji Zhang
 
AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)Amazon Web Services
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web ServicesAmazon Web Services
 

Destaque (20)

(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...
(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...
(BDT305) Lessons Learned and Best Practices for Running Hadoop on AWS | AWS r...
 
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
AWS re:Invent re:Cap - 데이터 분석: Amazon EC2 C4 Instance + Amazon EBS - 김일호
 
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data HubCloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
Cloudera Federal Forum 2014: Cloud Deployment for the Enterprise Data Hub
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
Industrial Internet
Industrial InternetIndustrial Internet
Industrial Internet
 
Five Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWSFive Tips for Running Cloudera on AWS
Five Tips for Running Cloudera on AWS
 
[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh
[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh
[GE Innovation Forum 2015] The Industrial Internet by Bill Ruh
 
Hadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluationHadoop AWS infrastructure cost evaluation
Hadoop AWS infrastructure cost evaluation
 
E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)E4: Building Your First Predix App (Predix Transform 2016)
E4: Building Your First Predix App (Predix Transform 2016)
 
Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2Hadoop Workshop using Cloudera on Amazon EC2
Hadoop Workshop using Cloudera on Amazon EC2
 
E3: Edge and Cloud Connectivity (Predix Transform 2016)
E3: Edge and Cloud Connectivity (Predix Transform 2016)E3: Edge and Cloud Connectivity (Predix Transform 2016)
E3: Edge and Cloud Connectivity (Predix Transform 2016)
 
E1: Building the Digital Twin (Predix Transform 2016)
E1: Building the Digital Twin (Predix Transform 2016)E1: Building the Digital Twin (Predix Transform 2016)
E1: Building the Digital Twin (Predix Transform 2016)
 
Overview of Amazon Web Services
Overview of Amazon Web ServicesOverview of Amazon Web Services
Overview of Amazon Web Services
 
Predix Builder Roadshow
Predix Builder RoadshowPredix Builder Roadshow
Predix Builder Roadshow
 
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
GE IOT Predix Time Series & Data Ingestion Service using Apache Apex (Hadoop)
 
AWS Cloud Kata | Bangkok - Getting to MVP
AWS Cloud Kata | Bangkok - Getting to MVPAWS Cloud Kata | Bangkok - Getting to MVP
AWS Cloud Kata | Bangkok - Getting to MVP
 
Digital strategy to transform GE into digital industrial leader
Digital strategy to transform GE into digital industrial leaderDigital strategy to transform GE into digital industrial leader
Digital strategy to transform GE into digital industrial leader
 
GE Predix - The IIoT Platform
GE Predix - The IIoT PlatformGE Predix - The IIoT Platform
GE Predix - The IIoT Platform
 
AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)AWS 101: Cloud Computing Seminar (2012)
AWS 101: Cloud Computing Seminar (2012)
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 

Semelhante a Challenges for running Hadoop on AWS - AdvancedAWS Meetup

Cloud computing
Cloud computingCloud computing
Cloud computinggd1410
 
Migrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWSMigrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWSTom Laszewski
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformCloudera, Inc.
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Cloudera, Inc.
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with AzureMarco Parenzan
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAlluxio, Inc.
 
Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)★ Akshay Surve
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashAndrei Savu
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesDataWorks Summit
 
CLOUD COMPUTING.pptx
CLOUD COMPUTING.pptxCLOUD COMPUTING.pptx
CLOUD COMPUTING.pptxSurajThapa79
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GooglePatrick Pierson
 
Exploring Cloud Computing with Amazon Web Services (AWS)
Exploring Cloud Computing with Amazon Web Services (AWS)Exploring Cloud Computing with Amazon Web Services (AWS)
Exploring Cloud Computing with Amazon Web Services (AWS)Kalema Edgar
 
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Amazon Web Services
 
Cambridge Breakfast Seminar
Cambridge Breakfast SeminarCambridge Breakfast Seminar
Cambridge Breakfast SeminarNuoDB
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudCloudera, Inc.
 

Semelhante a Challenges for running Hadoop on AWS - AdvancedAWS Meetup (20)

Cloud computing
Cloud computingCloud computing
Cloud computing
 
Intro to cloud.pdf
Intro to cloud.pdfIntro to cloud.pdf
Intro to cloud.pdf
 
Migrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWSMigrating Enterprise Applications to AWS
Migrating Enterprise Applications to AWS
 
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data PlatformHow to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
How to Build Multi-disciplinary Analytics Applications on a Shared Data Platform
 
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
Multidisziplinäre Analyseanwendungen auf einer gemeinsamen Datenplattform ers...
 
2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure2014.11.14 Data Opportunities with Azure
2014.11.14 Data Opportunities with Azure
 
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
 
Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)Migrating a multi tenant app to Azure (war biopic)
Migrating a multi tenant app to Azure (war biopic)
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
 
Hadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual MachinesHadoop in the Clouds, Virtualization and Virtual Machines
Hadoop in the Clouds, Virtualization and Virtual Machines
 
CLOUD COMPUTING.pptx
CLOUD COMPUTING.pptxCLOUD COMPUTING.pptx
CLOUD COMPUTING.pptx
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
 
Exploring Cloud Computing with Amazon Web Services (AWS)
Exploring Cloud Computing with Amazon Web Services (AWS)Exploring Cloud Computing with Amazon Web Services (AWS)
Exploring Cloud Computing with Amazon Web Services (AWS)
 
Cloud1 Computing 01
Cloud1 Computing 01Cloud1 Computing 01
Cloud1 Computing 01
 
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
Migrating Enterprise Applications to AWS: Best Practices & Techniques (ENT303...
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Cambridge Breakfast Seminar
Cambridge Breakfast SeminarCambridge Breakfast Seminar
Cambridge Breakfast Seminar
 
A deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloudA deep dive into running data analytic workloads in the cloud
A deep dive into running data analytic workloads in the cloud
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Amazon cloud service
Amazon cloud serviceAmazon cloud service
Amazon cloud service
 

Mais de Andrei Savu

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringAndrei Savu
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringAndrei Savu
 
Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Andrei Savu
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAndrei Savu
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data PlatformAndrei Savu
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Andrei Savu
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Andrei Savu
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAndrei Savu
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUGAndrei Savu
 
Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Andrei Savu
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverAndrei Savu
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with DropwizardAndrei Savu
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Andrei Savu
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Andrei Savu
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudAndrei Savu
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Andrei Savu
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Andrei Savu
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayAndrei Savu
 

Mais de Andrei Savu (20)

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
 
Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
 
Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012Metrics for Web Applications - Netcamp 2012
Metrics for Web Applications - Netcamp 2012
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at Hackover
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
 
Apache Whirr
Apache WhirrApache Whirr
Apache Whirr
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesday
 

Último

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 

Último (20)

Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 

Challenges for running Hadoop on AWS - AdvancedAWS Meetup

  • 1. Headline Goes Here Speaker Name or Subhead Goes Here DO NOT USE PUBLICLY PRIOR TO 10/23/12 Challenges of running Hadoop on AWS June 12, 2014 @ AdvancedAWS Meetup - Citizen Space Andrei Savu - @andreisavu Software Engineer, Cloud Automation Team
  • 2. Overview ● Introduction ● Context ● Challenges ● Questions
  • 3. Andrei Savu Software Engineer Cloud Automation Team @ Cloudera Previously: founder of Axemblr, Apache Whirr PMC, contributor to jclouds, Cloudsoft, Facebook etc. (see LinkedIn)
  • 4. Cloud Automation Team @ Cloudera Focused on: ● building tools to automate deployment and ongoing management of Hadoop clusters on cloud infrastructure ● improving Hadoop cloud compatibility (e.g. s3 integration, swift, managed databases, custom network topologies etc.) We are hiring!
  • 5. Context ● Hadoop ● Types of Deployments ● Cluster Topology ● AWS
  • 6. Context: Hadoop Hadoop is a broad, coherent stack of products for data storage and processing. “Hadoop” is more than HDFS & MapReduce. It can do: multiple storage systems, different query engines, batch and real-time etc. Usually running on bare metal now moving towards cloud infra.
  • 7. Types of Deployments Long running: - store data for analytics jobs with MapReduce, Impala, Spark - online data serving with HBase On-demand: - analytical workloads, fetch data on-demand - triggered by workflows (1:1) - disconnected lifecycle (Netflix Genie)
  • 8. Cluster Topology #1 Simple: ● EC2 classic (being phased-out) ● VPC: single subnet, security group with an optional VPN Complex: ● VPC: multiple subnets & security groups ● DirectConnect ● highly available with disaster recovery ● multiple users & security
  • 9. Cluster Topology #2 ● Cloudera Reference Architecture for AWS Deployments: http://www.cloudera. com/content/cloudera/en/resources/library/whitepaper/cloudera-enterprise- reference-architecture-for-aws-deployments.html ● Best Practices for Deploying Cloudera Enterprise on AWS: http://blog.cloudera.com/blog/2014/02/best-practices-for-deploying-cloudera- enterprise-on-amazon-web-services/
  • 10. Amazon Web Services Paradigm shift in how we work with infrastructure. Key concept: software defined - controlled by APIs Has most of the things we need for storage and high performance data processing (placement groups, large instances, high storage density, ssds, many vCPUs etc.) Enterprise-ready: IAM, VPC, VPN / DirectConnect, Support etc.
  • 11. Challenges ● Instance Provisioning & Health ● Ensuring Idempotency ● Networking & Performance ● AMIs & Bootstrap Speed ● Data durability ● S3 integration
  • 12. What makes it more difficult? … versus a typical stateless web application in an auto-scaling group monitoring request latency or OS load averages ● statefulness (think databases) ● each cluster has multiple processes playing different roles ● topology & configuration changes require orchestration ● knowledge of service inter-dependencies is required
  • 13. Instance Provisioning & Health Questions: ● How do you define your cluster size to deal lack of capacity? ● How do you define health? Is that stable during setup? ● Is health a binary property? Or a threshold that needs to be continuously evaluated? Potential answers: ● match AWS semantics: define size as a range ● make simplifying assumptions (e.g. healthy during setup)
  • 14. Ensuring Idempotency Questions: ● How do you safely retry expensive calls? ● How do you build reliable workflows? Potential answers: ● AWS User Guide via client token ● Discuss: Convergence vs. Single step retries
  • 15. Networking & Performance Questions: ● What’s the ideal setup that’s both usable and secure? ● How do you get consistent intra-cluster performance? Potential answers: ● VPC with VPN or DirectConnect. Placement groups help. ● Security model: initial it was just perimeter security, now it can do a lot more (disk encryption, SSL, kerberos)
  • 16. Images & Bootstrap Speed Questions: ● Do you allow custom AMIs or force your own choices? ● If using custom AMIs how can you reduce bootstrap time? Potential answers: ● Custom AMIs are common - integrated with existing infra ● Fast bootstrap by baking on top with custom bits
  • 17. Data durability Questions: ● How do you place replicas? Datacenter topology? ● How are instances distributed in different failure domains? Potential answers: ● ignore or go with large instances that map 1:1 to hosts ● would be nice to have: a way to influence host to instance allocation or to get datacenter topology data
  • 18. S3 integration Questions: ● How do you reconcile differences in semantics with HDFS? (strongly consistent vs. eventual consistency) ● How do you get most out of it in terms of performance? Potential answers: ● we’ve done a fair amount of work improving S3 in the open source (features, stability improvements, security etc.) ● performance is network bound
  • 19. Thanks! Questions? Andrei Savu - asavu@cloudera.com Twitter: @andreisavu Join us to take Hadoop to the clouds! https://hire.jobvite.com/Jobvite/job.aspx?j=orafYfwy&b=nqlg3nwW