Amazon Elastic MapReduce
MY BACKGROUND

•   Based in Seattle, WA

•   Education:
     –   BS in Computer Science, The American University, 1985
     –   Graduate student in Digital Media, University of Washington, 2010

•   Background:
     –   Microsoft Visual Studio team
     –   Consulting to startups and VCs
     –   Amazon employee since 2002

•   Evangelist:
     –   Speak
     –   Write
     –   Tweet

•   Author, “Host Your Web Site in the Cloud”

•   Email: jbarr@amazon.com
•   Twitter: @jeffbarr
AGENDA

• What is Big Data
• Elastic MapReduce Overview
• Example Use Cases
• Ecosystem and Tools
• Upcoming Features
• Discussion
WHAT IS BIG DATA?

• Doesn’t refer just to volume
    – You can benefit from Big Data infrastructure
      without having a ton of data
    – Many existing technologies have little
      problem physically handling large volumes


• Challenges result from the
  combination of data volume, data
  structure, and usage demands from
  that data, usually tied to timeliness

• Big Data Tools are needed to provide
  a holistic view of enterprise data and
  systematically harness it for insights
  and trends
WHAT IS AMAZON ELASTIC MAPREDUCE

• Enables customers to easily, securely and
  cost-effectively process vast amounts of
  data:
  – Spin up hundreds of instances
  – Process hundreds of terabytes of data


• Hosted Hadoop framework running on
  Amazon’s web-scale infrastructure
• Launch and monitor job flows
   • AWS Management Console
   • Command line interface
   • REST API
WHY USE AMAZON ELASTIC MAPREDUCE

• Elastic MapReduce removes “MUCK”
  from Big Data processing
   – Hard to manage compute clusters
   – Hard to tune Hadoop
   – Hard to monitor running Job Flows
   – Hard to debug Hadoop jobs
   – Hadoop issues prevent smooth
     operation in the cloud
PROBLEMS CUSTOMERS SOLVE WITH
         ELASTIC MAPREDUCE
•   Targeted advertising / Clickstream analysis
•   Data warehousing applications
•   Bio-informatics (Genome analysis)
•   Financial simulation (Monte Carlo simulation)
•   File processing (resize JPEGs)
•   Web indexing
•   Data mining and BI
HARDWARE REQUIREMENTS FOR USE CASES

• Data or I/O Intensive (m1/m2 instances)
   – Data Warehouse
   – Data Mining
      • Click stream, logs, events, etc.
• Compute or I/O Intensive (c1, cc1/HPC instances)
   –   Credit Ratings
   –   Fraud Models
   –   Portfolio analysis
   –   VaR calculation
CLICKSTREAM ANALYSIS – RAZORFISH AND BEST BUY

• Best Buy came to Razorfish
   – 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads
     required per day

   [Figure: User recently purchased a home theater system and is searching for video games → Targeted Ad (1.7 million per day)]


• Leveraged AWS and Elastic MapReduce
  – 100 node cluster on demand
  – Processing time dropped from 2+ days to 8 hours
  – Increased ROAS (Return on Advertising Spend) by 500%
CLICKSTREAM ANALYSIS - ARCHITECTURE
WHAT IS MAPREDUCE?

                       •   Invented by Google
                       •   New processing model
                       •   Highly scalable
                       •   Easy to understand
                       •   Industry standard
                       •   Something worth knowing
ELASTIC MAPREDUCE MODEL – OVERVIEW

•   Take input data
•   Break into sub-problems
•   Distribute to worker nodes
•   Worker nodes process sub-problems in parallel
•   Take output of worker nodes and reduce to answer
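
The five steps above can be sketched in a few lines of Python. This is only a single-machine illustration, not how Elastic MapReduce or Hadoop is implemented: a thread pool stands in for the worker nodes, the log lines are made-up sample data, and the names `split`, `process_chunk`, and `run_job` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(records, num_chunks):
    """Break the input into roughly equal sub-problems."""
    return [records[i::num_chunks] for i in range(num_chunks)]


def process_chunk(chunk):
    """Work done by one stand-in 'worker node': count records containing 'ERROR'."""
    return sum(1 for line in chunk if "ERROR" in line)


def run_job(records, num_workers=3):
    chunks = split(records, num_workers)  # break into sub-problems
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # distribute the chunks and process them concurrently
        partials = list(pool.map(process_chunk, chunks))
    return sum(partials)  # reduce the partial results to one answer


# Made-up input data for illustration.
log_lines = [
    "GET / 200",
    "ERROR disk full",
    "GET /a 200",
    "ERROR timeout",
    "GET /b 404",
]
error_count = run_job(log_lines)
```

Each worker only sees its own chunk, so the final step is the only place where results are combined.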
MAPREDUCE EXAMPLE – WORD COUNT

Input → Map Phase → Sort → Reduce Phase → Output

• Map Phase (Mappers emit word/document pairs)
   – “This”, Doc1
   – “Word”, Doc1
   – “This”, Doc2
   – “This”, Doc3
   – “Word”, Doc3
• Sort (pairs grouped by word)
   – “This”, Doc1 / “This”, Doc2 / “This”, Doc3
   – “Word”, Doc1 / “Word”, Doc3
• Reduce Phase (one Reducer per word)
   – “This”, 3
   – “Word”, 2
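
The word count on this slide can be reproduced with a short Python sketch. It runs all three phases in one process purely for illustration. The document contents are assumptions chosen to produce the same pairs as the slide, and `mapper` and `reducer` are hypothetical names rather than Hadoop or Elastic MapReduce APIs.

```python
from itertools import groupby
from operator import itemgetter

# Assumed input documents, chosen to reproduce the pairs on the slide.
documents = {
    "Doc1": "This Word",
    "Doc2": "This",
    "Doc3": "This Word",
}


def mapper(doc_id, text):
    """Map phase: emit one (word, doc_id) pair per word."""
    return [(word, doc_id) for word in text.split()]


def reducer(word, doc_ids):
    """Reduce phase: count the pairs collected for one word."""
    return (word, len(doc_ids))


# Map: each document is handled independently of the others.
pairs = [pair for doc_id, text in documents.items() for pair in mapper(doc_id, text)]

# Sort: bring all pairs with the same word next to each other.
pairs.sort(key=itemgetter(0))

# Reduce: one reducer call per distinct word.
counts = [
    reducer(word, [doc_id for _, doc_id in group])
    for word, group in groupby(pairs, key=itemgetter(0))
]
# counts == [("This", 3), ("Word", 2)]
```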
ELASTIC MAPREDUCE MODEL – DETAILED
ELASTIC MAPREDUCE IN ACTION – S3 LOG FILE
ELASTIC MAPREDUCE IN ACTION – STEP 1
ELASTIC MAPREDUCE IN ACTION – STEP 2
ELASTIC MAPREDUCE IN ACTION – STEP 3
ELASTIC MAPREDUCE IN ACTION – STEP 4
ELASTIC MAPREDUCE IN ACTION – STEP 5
ELASTIC MAPREDUCE IN ACTION – STEP 6
ELASTIC MAPREDUCE IN ACTION – STEP 7
ELASTIC MAPREDUCE IN ACTION - RESULTS
NOTES / ATTRIBUTES

• Mapper and Reducer in Java JAR files
• Scale as large as needed
   – Data
   – Processing
   – Add nodes (even while running) to speed up
• No need to manage intermediate data
• Suitable for certain types of problems
   – Record-oriented input
   – No dependencies between records
• No more MUCK – focus on your problem
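
Why "no dependencies between records" matters can be shown with a small Python sketch: when each record is processed on its own, the answer does not depend on how the records are divided among nodes, which is what makes adding nodes safe. The sample records and the names `count_status_codes` and `run_partitioned` are hypothetical, and the loop over partitions only simulates separate nodes.

```python
from collections import Counter


def count_status_codes(records):
    """Process each record on its own: tally the last field (the status code)."""
    return Counter(record.split()[-1] for record in records)


def run_partitioned(records, num_nodes):
    """Split records across num_nodes partitions and merge the partial tallies."""
    partitions = [records[i::num_nodes] for i in range(num_nodes)]
    total = Counter()
    for partition in partitions:  # each partition could run on a different node
        total += count_status_codes(partition)
    return total


# Made-up record-oriented input: one request per line.
records = ["GET / 200", "GET /a 404", "GET /b 200", "POST /c 500", "GET /d 200"]

# The result is the same whether one node or four nodes share the work.
same_answer = run_partitioned(records, 1) == run_partitioned(records, 4)
```

A problem where one record's result depends on another (a running total, for example) would not split this cleanly.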
HADOOP + R
Thank You
