SlideShare uma empresa Scribd logo
1 de 26
Elastic MapReduce Office Hours April 13th, 2011
Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees.  Please let us know what you would like to see.
Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer  Please begin submitting questions now
What’s New?
S3 Multipart Upload ,[object Object]
Can begin upload process before a Hadoop task has finished
Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop  –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
Elastic IP Address Integration ,[object Object]
Reference the same IP address for a long-running job flow even if it has to be restarted
Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
EC2 Instance Tagging ,[object Object]
Useful if you are running multiple job flows concurrently or managing a large numbers of Amazon EC2 instances,[object Object]
Example 1 – Resize a running job flow ,[object Object]
Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate  4 instances Expand to  25 instances Expand to  9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to  9 instances Expand to  25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
Job Flow Architecture
Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow  j-ABABABABAB  // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
Example 2 – Launch a Hive-based Data Warehouse Apache Hive Data warehouse for Hadoop Open source project started at Facebook Turns data on Hadoop into a virtually limitless data warehouse Provides data summarization, ad hoc querying and analysis Enables SQL-like queries on structured and unstructured data ,[object Object],Inherits linear scalability of Hadoop
Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow  j-ABABABABAB  // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB  // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output  hive>
Define Impressions Table Show Tables hive> show tables;  OK Time taken: 3.51 seconds hive>  Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring )  PARTITIONED BY (dt string)  ROW FORMAT  serde'com.amazon.elasticmapreduce.JsonSerde'  with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId,  					referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring  )  PARTITIONED BY (dt string)  ROW FORMAT  SERDE 'com.amazon.elasticmapreduce.JsonSerde'  WITH SERDEPROPERTIES ( 'paths'='impressionId' )  LOCATION '${SAMPLE}/tables/clicks'  ;  ALTER TABLE clicks RECOVER PARTITIONS ;
Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows.  CREATE EXTERNAL TABLE joined_impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean  )  PARTITIONED BY (day string, hour string)  	STORED AS SEQUENCEFILE  	LOCATION '${OUTPUT}/joined_impressions'  ;

Mais conteúdo relacionado

Mais procurados

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAmazon Web Services
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsAmazon Web Services
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排Amazon Web Services
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksEmmanuel Quentin
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...Julien SIMON
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Amazon Web Services
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Web Services
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Laurent Bernaille
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Julien SIMON
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesAmazon Web Services
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Amazon Web Services
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics sessionSharad Gupta
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWSAmazon Web Services
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation Adam Book
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWSAmazon Web Services
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAmazon Web Services
 

Mais procurados (20)

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWS
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacks
 
AWS Partner Presentation - SAP
AWS Partner Presentation - SAP AWS Partner Presentation - SAP
AWS Partner Presentation - SAP
 
Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics session
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
 

Destaque

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Amazon Web Services
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReducehuguk
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikinAmazon Web Services
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAmazon Web Services
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVAmazon Web Services
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAmazon Web Services
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialAdrian Hornsby
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesVladimir Simek
 

Destaque (11)

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikin
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in Taipei
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
 
Running databases on AWS
Running databases on AWSRunning databases on AWS
Running databases on AWS
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potential
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
Technical Track
Technical TrackTechnical Track
Technical Track
 

Semelhante a AWS Office Hours: Amazon Elastic MapReduce

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampBigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR MasterclassIan Massingham
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS Chicago
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...Big Data Spain
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04Krishna Sankar
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...Amazon Web Services
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperVinay Kumar
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and DrupalPromet Source
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & ZingLong Dao
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...Altinity Ltd
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewDan Morrill
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseNag Arvind Gudiseva
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAmin Uddin
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)Amazon Web Services Korea
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zingzingopen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 

Semelhante a AWS Office Hours: Amazon Elastic MapReduce (20)

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paper
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 

Mais de Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

Mais de Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

AWS Office Hours: Amazon Elastic MapReduce

  • 1. Elastic MapReduce Office Hours April 13th, 2011
  • 2. Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
  • 3. Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees. Please let us know what you would like to see.
  • 4. Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
  • 5. Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer Please begin submitting questions now
  • 7.
  • 8. Can begin upload process before a Hadoop task has finished
  • 9. Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
  • 10.
  • 11. Reference the same IP address for a long-running job flow even if it has to be restarted
  • 12. Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
  • 13.
  • 14.
  • 15.
  • 16. Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate 4 instances Expand to 25 instances Expand to 9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to 9 instances Expand to 25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
  • 18. Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow j-ABABABABAB // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
  • 19. Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
  • 20. Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 21.
  • 22. Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow j-ABABABABAB // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output hive>
  • 23. Define Impressions Table Show Tables hive> show tables; OK Time taken: 3.51 seconds hive> Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring ) PARTITIONED BY (dt string) ROW FORMAT serde'com.amazon.elasticmapreduce.JsonSerde' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
  • 24. Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
  • 25. Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring ) PARTITIONED BY (dt string) ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde' WITH SERDEPROPERTIES ( 'paths'='impressionId' ) LOCATION '${SAMPLE}/tables/clicks' ; ALTER TABLE clicks RECOVER PARTITIONS ;
  • 26. Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows. CREATE EXTERNAL TABLE joined_impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean ) PARTITIONED BY (day string, hour string) STORED AS SEQUENCEFILE LOCATION '${OUTPUT}/joined_impressions' ;
  • 27. Define Local Impressions Table Next, we create some temporary tables in the job flow's local HDFS partition to store intermediate impression and click data. CREATE TABLE tmp_impressions( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string ) STORED AS SEQUENCEFILE ; We insert data from the impressions table for the time duration we're interested in. Note that because the impressions table is partitioned only the relevant partitions will be read. INSERT OVERWRITE TABLE tmp_impressions SELECT from_unixtime(cast((cast(i.requestBeginTime as bigint) / 1000) as int)) requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip FROM impressions i WHERE i.dt >= '${DAY}-${HOUR}-00' AND i.dt < '${NEXT_DAY}-${NEXT_HOUR}-00' ;
  • 28. Define Local Clicks Table For clicks, we extend the period of time over which we join by 20 minutes. Meaning we accept a click that occurred up to 20 minutes after the impression. CREATE TABLE tmp_clicks( impressionIdstring ) STORED AS SEQUENCEFILE; INSERT OVERWRITE TABLE tmp_clicks SELECT impressionId FROM clicks c WHERE c.dt>= '${DAY}-${HOUR}-00' AND c.dt < '${NEXT_DAY}-${NEXT_HOUR}-20' ;
  • 29. Join Tables Now we combine the impressions and clicks tables using a left outer join. This way any impressions that did not result in a click are preserved. INSERT OVERWRITE TABLE joined_impressions PARTITION (day='${DAY}', hour='${HOUR}') SELECT i.requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip, (c.impressionId is not null) clicked FROM tmp_impressionsi LEFT OUTER JOIN tmp_clicksc ON i.impressionId = c.impressionId ;
  • 30. Terminate Interactive Session // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 31. Question & Answer Visithttp://aws.amazon.com/officehours to watch recorded sessions and to sign up for upcoming sessions.