SlideShare uma empresa Scribd logo
1 de 41
Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce Simone Brunozzi Technology Evangelist, Amazon Web Services, APAC twitter: @simon Blog: www.brunozzi.com
What is Elastic MapReduce  Use Cases Service Features New Feature Announcements  Elastic MapReduce Ecosystem AGENDA
Enables customers to easily, securely and cost-effectively process vast amounts of data. Spin-up 10s or 100s or even 1000s of instances Process 10s or 100s of Terabytes of data Hosted Hadoop framework running on the web-scale infrastructure of Amazon. What is Amazon Elastic MapReduce
[object Object]
AWS Management Console
Command line interface
REST API ,[object Object]
Problems customers solve with Elastic MapReduce Data mining and BI Log processing, click stream analysis, similarities, advertizing Data warehousing applications Bio-informatics (Genome analysis)  Financial simulation (Monte Carlo simulation) File processing (resize jpegs) Web indexing
Web-Scale Data warehousing
Hadoop 0.20 Pig 0.6 Hive 0.5 Cascading 1.1 ELASTIC MAPREDUCE – SUPPORTED CONFIGURATIONS Hadoop 0.18 Pig 0.3 Hive 0.4 Cascading 1.1
Apache Hive  Batch and Interactive Mode Support Hive Steps Integration with Elastic MapReduce Client and Management Console Load table partitions automatically to/from Amazon S3 Optimized data writes to Amazon S3 Reference resources such as streaming scripts located on Amazon S3 Specify an off-instance metadata store  Support variables defined directly in Hive script  Supports JDBC and ODBC connections ELASTIC MAPREDUCE – HIVE FEATURES
Apache Pig  Batch and interactive mode Support Pig Steps Integration with Elastic MapReduce Client and Management Console Concurrent access to multiple file systems (HDFS, Amazon S3) Reference resources in Amazon S3 directly from Pig script Several User Defined Functions in Piggy Bank ELASTIC MAPREDUCE – PIG FEATURES
Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters  Enterprise customers need more tools  Application development  Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
Amazon Elastic MapReduce features	 Bootstrap actions Run arbitrary scripts before job flow begins  Run on all nodes before data processing begins  Used for  Hadoop configuration (site-conf, Hadoop-conf, etc.) Cluster configuration (memory, swap, etc.) Application/packages installation (app-get install r-base)  Several pre-defined bootstrap actions available
Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters  Enterprise customers need more tools  Application development  Data analytics Enterprise customers need support options Forum support is not enough  Amazon Elastic MapReduce For Enterprise
Amazon Elastic MapReduce - new features	 Preannounce: Expand running clusters Increase number of nodes in a running cluster Increase processing speed Increasing HDFS size
Use Case: Increase speed of running job flows Speed up job flow execution in response to changing requirements Dynamically balance cost versus performance without restarting a job PREANNOUNCE – EXPAND/SHRINK CLUSTERS Job Flow Job Flow Job Flow 3 Hours Allocate  4 instances Expand to  25 instances Expand to  9 instances Time remaining: Time remaining: 14 Hours 7 Hours Time remaining:
Amazon Elastic MapReduce - new features	 Shrink running clusters Decrease number of nodes in a running job flow Different capacity requirements from step to step Automatically regulate capacity between steps
Use Case: Agile Data Warehouse Cluster Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight) Leverage flexibility to reduce costs and increase cluster utilization EXPAND/SHRINK CLUSTERS Data Warehouse (Batch Processing) Data Warehouse (Steady State) Data Warehouse (Steady State) Allocate  9 instances Expand to  25 instances Shrink to  9 instances
Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters  Enterprise customers need more tools  Application development  Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
Amazon Elastic MapReduce Price
What is a Spot Instance? Way to purchase & consume EC2 instances based on compute value Reduce your computing costs Bid for unused EC2 capacity Control your costs Differences from On-Demand Instances: Request – maximum price bid Spot Price – what you pay Termination
M2.xlarge instance pricing history Amazon EC2 On-Demand price for the same instance is $0.50
Amazon Elastic MapReduce – new feature	 Spot pricing support for Elastic MapReduce job flows Specify the price you want to pay for instances Elastic MapReduce takes care of  Provisioning Node addition and removal to/from the cluster Can mix On-Demand and Spot instances in the same cluster
Use Case: Manage cost of running job flows Start with 4 On-Demand instances of type m2.xlarge Expand the cluster with 5 Spot Nodes Cost without Spot: 4 instances *14 hrs * $0.50 = $28 Cost with Spot: 4 instances *7 hrs * $0.50 = $13 + 5 instances * 7 hrs * $0.25 = $8.75 Total = $21.75 Savings: ~22% Integration with EC2 Spot Job Flow Job Flow Allocate  4 instances Expand to  9 instances Time remaining: Time remaining: 14 Hours 7 Hours
Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters  Enterprise customers need more tools  Application development  Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
Elastic MapReduce Ecosystem Ecosystem is growing Integrated development environments for Hadoop Tools designed for data analytics Broad support for Amazon Elastic MapReduce
Big Data Intelligence software  For developers and analysts to work faster and easier Purpose built for all popular Hadoop distros and versions Tightly integrated with Elastic MapReduce (since 2009) Built on Karmasphere Application Framework™ Native Hadoop client-side platform Karmasphere
Free version from www.karmasphere.com Karmasphere Studio Professional Edition Analyst Edition Rich graphical environment Develop, debug and deploy easily Visualize, manipulate & diagnose Jobs, clusters & file systems Broad and deep Elastic MapReduce support Rapid development Comprehensive profiling Rich debugging ,[object Object]
Robust Hive implementation
Syntax checking, diagnostics, schema browser, JDBC4 compliance, multi-threaded and concurrent
No cluster changes
Works over proxies and firewalls
Integrated Hadoop monitoring,[object Object]
Web Logs Social Media CRM Sales Excel Files Customer Data Datameer Analytics Solution  Amazon Elastic MapReduce
MicroStrategy is a Global Leader in Business Intelligence Corporate Overview Founded in 1989 Largest independent public BI vendor (NASDAQ: MSTR) Positioned in the Gartner “Leader Quadrant” for BI Platforms Over 1 million business users at over 3,000 organizations The MicroStrategy 9 business intelligence platform enables mobile apps, dashboards, reporting and analytics with your business data Build once, deliver instantly and securely any time, to any device

Mais conteúdo relacionado

Mais procurados

Blind spots in big data erez koren @ forter
Blind spots in big data erez koren @ forterBlind spots in big data erez koren @ forter
Blind spots in big data erez koren @ forterIdo Shilon
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksDatabricks
 
Machine learning on kubernetes
Machine learning on kubernetesMachine learning on kubernetes
Machine learning on kubernetesAnirudh Ramanathan
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with AirflowErin Shellman
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!Databricks
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Sri Ambati
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaCodeOps Technologies LLP
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeRoland Kuhn
 
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...HostedbyConfluent
 
Hydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowHydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowRustem Zakiev
 
Building a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingBuilding a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingDatabricks
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringDatabricks
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...confluent
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafkajexp
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkDatabricks
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksDatabricks
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...CodeOps Technologies LLP
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...HostedbyConfluent
 
Real time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and DockerReal time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and DockerAjeet Singh Raina
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...HostedbyConfluent
 

Mais procurados (20)

Blind spots in big data erez koren @ forter
Blind spots in big data erez koren @ forterBlind spots in big data erez koren @ forter
Blind spots in big data erez koren @ forter
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber Attacks
 
Machine learning on kubernetes
Machine learning on kubernetesMachine learning on kubernetes
Machine learning on kubernetes
 
Building Robust Pipelines with Airflow
Building Robust Pipelines with AirflowBuilding Robust Pipelines with Airflow
Building Robust Pipelines with Airflow
 
Anomaly Detection at Scale!
Anomaly Detection at Scale!Anomaly Detection at Scale!
Anomaly Detection at Scale!
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
 
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta JhaMonitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
Monitor Azure Kubernetes Cluster With Prometheus by Mamta Jha
 
Akka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in PracticeAkka and AngularJS – Reactive Applications in Practice
Akka and AngularJS – Reactive Applications in Practice
 
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...
Driving a Digital Thread Program in Manufacturing with Apache Kafka | Anu Mis...
 
Hydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on KubeflowHydrosphere.io for ODSC: Webinar on Kubeflow
Hydrosphere.io for ODSC: Webinar on Kubeflow
 
Building a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays ProcessingBuilding a Streaming Data Pipeline for Trains Delays Processing
Building a Streaming Data Pipeline for Trains Delays Processing
 
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy MonitoringApache Spark Listeners: A Crash Course in Fast, Easy Monitoring
Apache Spark Listeners: A Crash Course in Fast, Easy Monitoring
 
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
Flattening the Curve with Kafka (Rishi Tarar, Northrop Grumman Corp.) Kafka S...
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafka
 
Best Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache SparkBest Practices for Engineering Production-Ready Software with Apache Spark
Best Practices for Engineering Production-Ready Software with Apache Spark
 
AI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with DatabricksAI Modernization at AT&T and the Application to Fraud with Databricks
AI Modernization at AT&T and the Application to Fraud with Databricks
 
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
Creating Event Driven Serverless Applications - Sandeep - Adobe - Serverless ...
 
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
Building a Streaming Pipeline on Kubernetes Using Kafka Connect, KSQLDB & Apa...
 
Real time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and DockerReal time Object Detection and Analytics using RedisEdge and Docker
Real time Object Detection and Analytics using RedisEdge and Docker
 
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
Apache Kafka and API Management / API Gateway – Friends, Enemies or Frenemies...
 

Destaque

Introduction to Elastic MapReduce
Introduction to Elastic MapReduceIntroduction to Elastic MapReduce
Introduction to Elastic MapReduceAmazon Web Services
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Scaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMRScaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMRIsrael AWS User Group
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopYahoo Developer Network
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...Yahoo Developer Network
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Web Services
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSSmartNews, Inc.
 

Destaque (9)

Introduction to Elastic MapReduce
Introduction to Elastic MapReduceIntroduction to Elastic MapReduce
Introduction to Elastic MapReduce
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Scaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMRScaling your analytics with Amazon EMR
Scaling your analytics with Amazon EMR
 
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in HadoopOctober 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
October 2016 HUG: The Pillars of Effective Data Archiving and Tiering in Hadoop
 
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
Amazon Elastic MapReduce Deep Dive and Best Practices (BDT404) | AWS re:Inven...
 
Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)Deep Dive - Amazon Elastic MapReduce (EMR)
Deep Dive - Amazon Elastic MapReduce (EMR)
 
Building a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWSBuilding a Sustainable Data Platform on AWS
Building a Sustainable Data Platform on AWS
 

Semelhante a Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi

AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAmazon Web Services
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesJames Serra
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAmazon Web Services
 
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonBig Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonData Con LA
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 
Cloud Crowd GigaSpaces Presentation
Cloud Crowd GigaSpaces PresentationCloud Crowd GigaSpaces Presentation
Cloud Crowd GigaSpaces Presentationjimliddle
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...Amazon Web Services
 
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching OverviewGiga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overviewjimliddle
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11Amazon Web Services
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Amazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
AWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution ShowcaseAWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution ShowcaseAmazon Web Services
 
Architecting Solutions Leveraging The Cloud
Architecting Solutions Leveraging The CloudArchitecting Solutions Leveraging The Cloud
Architecting Solutions Leveraging The CloudDavid Chou
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRProvectus
 
Technology Overview
Technology OverviewTechnology Overview
Technology OverviewLiran Zelkha
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Riccardo Zamana
 
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...Amazon Web Services
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computingwebscale
 

Semelhante a Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi (20)

AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
 
Big Data: It’s all about the Use Cases
Big Data: It’s all about the Use CasesBig Data: It’s all about the Use Cases
Big Data: It’s all about the Use Cases
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWS
 
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of AmazonBig Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
Big Data Day LA 2015 - The AWS Big Data Platform by Michael Limcaco of Amazon
 
B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 
Cloud Crowd GigaSpaces Presentation
Cloud Crowd GigaSpaces PresentationCloud Crowd GigaSpaces Presentation
Cloud Crowd GigaSpaces Presentation
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
 
Giga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching OverviewGiga Spaces Data Grid / Data Caching Overview
Giga Spaces Data Grid / Data Caching Overview
 
AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11AWS webinar what is cloud computing 13 09 11
AWS webinar what is cloud computing 13 09 11
 
Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13Aws what is cloud computing deck 08 14 13
Aws what is cloud computing deck 08 14 13
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
AWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution ShowcaseAWS Webcast - Tableau Big Data Solution Showcase
AWS Webcast - Tableau Big Data Solution Showcase
 
Architecting Solutions Leveraging The Cloud
Architecting Solutions Leveraging The CloudArchitecting Solutions Leveraging The Cloud
Architecting Solutions Leveraging The Cloud
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
 
Technology Overview
Technology OverviewTechnology Overview
Technology Overview
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020
 
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...
AWS Summit 2013 | India - Running Enterprise Applications like SAP, Oracle an...
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Big Data in the Cloud
Big Data in the CloudBig Data in the Cloud
Big Data in the Cloud
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 

Mais de Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Mais de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Apache Hadoop India Summit 2011 talk "Making Hadoop Enterprise Ready with Amazon Elastic Map/Reduce" by Simone Brunozzi

  • 1. Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce Simone Brunozzi Technology Evangelist, Amazon Web Services, APAC twitter: @simon Blog: www.brunozzi.com
  • 2. What is Elastic MapReduce Use Cases Service Features New Feature Announcements Elastic MapReduce Ecosystem AGENDA
  • 3. Enables customers to easily, securely and cost-effectively process vast amounts of data. Spin-up 10s or 100s or even 1000s of instances Process 10s or 100s of Terabytes of data Hosted Hadoop framework running on the web-scale infrastructure of Amazon. What is Amazon Elastic MapReduce
  • 4.
  • 7.
  • 8. Problems customers solve with Elastic MapReduce Data mining and BI Log processing, click stream analysis, similarities, advertizing Data warehousing applications Bio-informatics (Genome analysis) Financial simulation (Monte Carlo simulation) File processing (resize jpegs) Web indexing
  • 10. Hadoop 0.20 Pig 0.6 Hive 0.5 Cascading 1.1 ELASTIC MAPREDUCE – SUPPORTED CONFIGURATIONS Hadoop 0.18 Pig 0.3 Hive 0.4 Cascading 1.1
  • 11. Apache Hive Batch and Interactive Mode Support Hive Steps Integration with Elastic MapReduce Client and Management Console Load table partitions automatically to/from Amazon S3 Optimized data writes to Amazon S3 Reference resources such as streaming scripts located on Amazon S3 Specify an off-instance metadata store Support variables defined directly in Hive script Supports JDBC and ODBC connections ELASTIC MAPREDUCE – HIVE FEATURES
  • 12. Apache Pig Batch and interactive mode Support Pig Steps Integration with Elastic MapReduce Client and Management Console Concurrent access to multiple file systems (HDFS, Amazon S3) Reference resources in Amazon S3 directly from Pig script Several User Defined Functions in Piggy Bank ELASTIC MAPREDUCE – PIG FEATURES
  • 13.
  • 14.
  • 15.
  • 16.
  • 17. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
  • 18. Amazon Elastic MapReduce features Bootstrap actions Run arbitrary scripts before job flow begins Run on all nodes before data processing begins Used for Hadoop configuration (site-conf, Hadoop-conf, etc.) Cluster configuration (memory, swap, etc.) Application/packages installation (app-get install r-base) Several pre-defined bootstrap actions available
  • 19.
  • 20. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need support options Forum support is not enough Amazon Elastic MapReduce For Enterprise
  • 21. Amazon Elastic MapReduce - new features Preannounce: Expand running clusters Increase number of nodes in a running cluster Increase processing speed Increasing HDFS size
  • 22. Use Case: Increase speed of running job flows Speed up job flow execution in response to changing requirements Dynamically balance cost versus performance without restarting a job PREANNOUNCE – EXPAND/SHRINK CLUSTERS Job Flow Job Flow Job Flow 3 Hours Allocate 4 instances Expand to 25 instances Expand to 9 instances Time remaining: Time remaining: 14 Hours 7 Hours Time remaining:
  • 23. Amazon Elastic MapReduce - new features Shrink running clusters Decrease number of nodes in a running job flow Different capacity requirements from step to step Automatically regulate capacity between steps
  • 24. Use Case: Agile Data Warehouse Cluster Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight) Leverage flexibility to reduce costs and increase cluster utilization EXPAND/SHRINK CLUSTERS Data Warehouse (Batch Processing) Data Warehouse (Steady State) Data Warehouse (Steady State) Allocate 9 instances Expand to 25 instances Shrink to 9 instances
  • 25. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
  • 27. What is a Spot Instance? Way to purchase & consume EC2 instances based on compute value Reduce your computing costs Bid for unused EC2 capacity Control your costs Differences from On-Demand Instances: Request – maximum price bid Spot Price – what you pay Termination
  • 28. M2.xlarge instance pricing history Amazon EC2 On-Demand price for the same instance is $0.50
  • 29. Amazon Elastic MapReduce – new feature Spot pricing support for Elastic MapReduce job flows Specify the price you want to pay for instances Elastic MapReduce takes care of Provisioning Node addition and removal to/from the cluster Can mix On-Demand and Spot instances in the same cluster
  • 30. Use Case: Manage cost of running job flows Start with 4 On-Demand instances of type m2.xlarge Expand the cluster with 5 Spot Nodes Cost without Spot: 4 instances *14 hrs * $0.50 = $28 Cost with Spot: 4 instances *7 hrs * $0.50 = $13 + 5 instances * 7 hrs * $0.25 = $8.75 Total = $21.75 Savings: ~22% Integration with EC2 Spot Job Flow Job Flow Allocate 4 instances Expand to 9 instances Time remaining: Time remaining: 14 Hours 7 Hours
  • 31. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
  • 32. Elastic MapReduce Ecosystem Ecosystem is growing Integrated development environments for Hadoop Tools designed for data analytics Broad support for Amazon Elastic MapReduce
  • 33. Big Data Intelligence software For developers and analysts to work faster and easier Purpose built for all popular Hadoop distros and versions Tightly integrated with Elastic MapReduce (since 2009) Built on Karmasphere Application Framework™ Native Hadoop client-side platform Karmasphere
  • 34.
  • 36. Syntax checking, diagnostics, schema browser, JDBC4 compliance, multi-threaded and concurrent
  • 38. Works over proxies and firewalls
  • 39.
  • 40. Web Logs Social Media CRM Sales Excel Files Customer Data Datameer Analytics Solution Amazon Elastic MapReduce
  • 41. MicroStrategy is a Global Leader in Business Intelligence Corporate Overview Founded in 1989 Largest independent public BI vendor (NASDAQ: MSTR) Positioned in the Gartner “Leader Quadrant” for BI Platforms Over 1 million business users at over 3,000 organizations The MicroStrategy 9 business intelligence platform enables mobile apps, dashboards, reporting and analytics with your business data Build once, deliver instantly and securely any time, to any device
  • 42. What can you do with MicroStrategy and Amazon Elastic MapReduce? Deliver insights to a broader range of users. End users interact with a point-and-click interface to query data without writing HiveQL or MapReduce jobs Use cases: Mobile Apps: Floor manager accesses order details stored in Amazon Elastic MapReduce through a custom iPhone App Dashboards: End user starts with a Dynamic Dashboard populated from data mart or data warehouse. The user then drills to a detail report that executes in Amazon Elastic MapReduce. Reporting: Application developer builds a parameterized HiveQL report, then schedules it to execute. Jobs execute against Amazon Elastic MapReduce and MicroStrategy sends out exception based alerts via email to end users. Analysis: Application developer populates a multidimensional cache in MicroStrategy with results of a HiveQL query. End user uses MicroStrategy’s web interface to slice-and-dice through results without going back to Hadoop.
  • 43. How can I learn more? Try it! Free MicroStrategy software is available at: http://www.microstrategy.com/freereportingsoftware Get More information about Microstrategy solutions for Amazon Elastic MapReduce http://aws.amazon.com/solutions/solution-providers/microstrategy
  • 44. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need more support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
  • 46. Enterprise customers need more flexibility Configuring Clusters Running Clusters Paying for clusters Enterprise customers need more tools Application development Data analytics Enterprise customers need more support options Forums support is not enough Amazon Elastic MapReduce For Enterprise
  • 47. Making Hadoop Enterprise ready with Amazon Elastic Map/Reduce Simone Brunozzi Technology Evangelist, Amazon Web Services, APAC twitter: @simon Blog: www.brunozzi.com