SlideShare uma empresa Scribd logo
1 de 27
Baixar para ler offline
Big Data: Hadoop 2.0
Map Reduce / HDFS 2.0

@diego_pacheco
Software Architect | Agile Coach
Big Data
Hadoop - Cases
Hadoop
Hadoop
Ddistributed
F ile
S ystem

4000 nodes: 14PB storage
HDFS – Assumptions and Goals
• Hardware Failure: Houndred or thousands machines, expect to fail.
• Streaming Data Access: Batch processing, high throughtput not low latency.
• Large Data Sets: Terrabytes, works on cluster, scale, milions of files single instance.
• Simple Coherency Model: Write-once-read-many(create, read, close, no
changes) maximize coherency and high throughtput, perfect for Map/Reduce.

• Moving Computation instead of Moving Data: Is way
more cheaper, huge data, minimize network. HDFS moves the computation close to the data.

• Sofware and hardware Portability: Easily Portable.
HDFS

• Very large distributed FS
• 10k nodes, 100M files, 10PB
• Works with comodity hardware
• File replication
• Detect and recover from failures
• Optimized for batch processing
• Files break by blocks 128mb
• blocks: replicated in N dataNodes
• Data Coherency
• Write Once, Read Many
• Only Append to existent files
HDFS - Architecture
HDFS 2.0 - Federation
Hadoop
Map Reduce
Today: Parallelism per file

Single LARGE File
Single Thread

No
Parallelism
Map/Reduce: Unit of data

Task 0
0..64 mb

Task 1
64..128mb

Task 2
128..192mb

Each task process a unit of data

Task 3
192..256mb
Today: Network issue
Map/Reduce: Local Read

Task 0

Task 1

Task 2

Task 3

0..64 mb

64..128mb

128..192mb

192..256mb

Node 0 Node 1 Node 2 Node 3
• Local Read, no need for network copy
• Data is read from many disks in parallel
Map/Reduce: The Magic!

Single Hard Drive: Reads 75mb/second

12 hard drive
Per machine
12 * 75mb/second * 4k =

3.4 TB/ second
Map
Reduce
Big Data: Hadoop 2.0
Map Reduce / HDFS 2.0

Obrigado!
Thank You!

@diego_pacheco
Software Architect | Agile Coach

Mais conteúdo relacionado

Mais procurados

Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Cred_hadoop_presenatation
Cred_hadoop_presenatationCred_hadoop_presenatation
Cred_hadoop_presenatationAshish Saraf
 
Basic Hadoop Architecture V1 vs V2
Basic  Hadoop Architecture  V1 vs V2Basic  Hadoop Architecture  V1 vs V2
Basic Hadoop Architecture V1 vs V2VIVEKVANAVAN
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction葵慶 李
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processingroyans
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to SchoolAdam Doyle
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPTAnand Pandey
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...AyeeshaParveen
 

Mais procurados (20)

Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Cred_hadoop_presenatation
Cred_hadoop_presenatationCred_hadoop_presenatation
Cred_hadoop_presenatation
 
Basic Hadoop Architecture V1 vs V2
Basic  Hadoop Architecture  V1 vs V2Basic  Hadoop Architecture  V1 vs V2
Basic Hadoop Architecture V1 vs V2
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
Hadoop: Distributed data processing
Hadoop: Distributed data processingHadoop: Distributed data processing
Hadoop: Distributed data processing
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop
HadoopHadoop
Hadoop
 
9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School9/2017 STL HUG - Back to School
9/2017 STL HUG - Back to School
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
 
Anju
AnjuAnju
Anju
 
Hadoop An Introduction
Hadoop An IntroductionHadoop An Introduction
Hadoop An Introduction
 
Hadoop Presentation - PPT
Hadoop Presentation - PPTHadoop Presentation - PPT
Hadoop Presentation - PPT
 
Hadoop – big deal
Hadoop – big dealHadoop – big deal
Hadoop – big deal
 
Redis
RedisRedis
Redis
 
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science  Bon Secours...
Hadoop ecosystem; J.Ayeesha parveen 2 nd M.sc., computer science Bon Secours...
 

Semelhante a Big Data Hadoop 2.0 MapReduce HDFS

Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceDerek Chen
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File SystemVaibhav Jain
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basicssaili mane
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete informationbhargavi804095
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoopyaevents
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?sudhakara st
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxDanishMahmood23
 

Semelhante a Big Data Hadoop 2.0 MapReduce HDFS (20)

2. hadoop fundamentals
2. hadoop fundamentals2. hadoop fundamentals
2. hadoop fundamentals
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Apache hadoop basics
Apache hadoop basicsApache hadoop basics
Apache hadoop basics
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Hadoop
HadoopHadoop
Hadoop
 
hadoop distributed file systems complete information
hadoop distributed file systems complete informationhadoop distributed file systems complete information
hadoop distributed file systems complete information
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
 
Hadoop introduction , Why and What is Hadoop ?
Hadoop introduction , Why and What is  Hadoop ?Hadoop introduction , Why and What is  Hadoop ?
Hadoop introduction , Why and What is Hadoop ?
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
 
Hadoop Technology
Hadoop TechnologyHadoop Technology
Hadoop Technology
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
getFamiliarWithHadoop
getFamiliarWithHadoopgetFamiliarWithHadoop
getFamiliarWithHadoop
 
Unit IV.pdf
Unit IV.pdfUnit IV.pdf
Unit IV.pdf
 
Seminar ppt
Seminar pptSeminar ppt
Seminar ppt
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
Topic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptxTopic 9a-Hadoop Storage- HDFS.pptx
Topic 9a-Hadoop Storage- HDFS.pptx
 

Mais de Diego Pacheco

Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Diego Pacheco
 
Continuous Discovery Habits Book Review.pdf
Continuous Discovery Habits  Book Review.pdfContinuous Discovery Habits  Book Review.pdf
Continuous Discovery Habits Book Review.pdfDiego Pacheco
 
Thoughts about Shape Up
Thoughts about Shape UpThoughts about Shape Up
Thoughts about Shape UpDiego Pacheco
 
Encryption Deep Dive
Encryption Deep DiveEncryption Deep Dive
Encryption Deep DiveDiego Pacheco
 
Management: Doing the non-obvious! III
Management: Doing the non-obvious! IIIManagement: Doing the non-obvious! III
Management: Doing the non-obvious! IIIDiego Pacheco
 
Design is not Subjective
Design is not SubjectiveDesign is not Subjective
Design is not SubjectiveDiego Pacheco
 
Architecture & Engineering : Doing the non-obvious!
Architecture & Engineering :  Doing the non-obvious!Architecture & Engineering :  Doing the non-obvious!
Architecture & Engineering : Doing the non-obvious!Diego Pacheco
 
Management doing the non-obvious II
Management doing the non-obvious II Management doing the non-obvious II
Management doing the non-obvious II Diego Pacheco
 
Testing in production
Testing in productionTesting in production
Testing in productionDiego Pacheco
 
Nine lies about work
Nine lies about workNine lies about work
Nine lies about workDiego Pacheco
 
Management: doing the nonobvious!
Management: doing the nonobvious!Management: doing the nonobvious!
Management: doing the nonobvious!Diego Pacheco
 
Dealing with dependencies
Dealing  with dependenciesDealing  with dependencies
Dealing with dependenciesDiego Pacheco
 
Dealing with dependencies in tests
Dealing  with dependencies in testsDealing  with dependencies in tests
Dealing with dependencies in testsDiego Pacheco
 

Mais de Diego Pacheco (20)

Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!Naming Things Book : Simple Book Review!
Naming Things Book : Simple Book Review!
 
Continuous Discovery Habits Book Review.pdf
Continuous Discovery Habits  Book Review.pdfContinuous Discovery Habits  Book Review.pdf
Continuous Discovery Habits Book Review.pdf
 
Thoughts about Shape Up
Thoughts about Shape UpThoughts about Shape Up
Thoughts about Shape Up
 
Holacracy
HolacracyHolacracy
Holacracy
 
AWS IAM
AWS IAMAWS IAM
AWS IAM
 
CDKs
CDKsCDKs
CDKs
 
Encryption Deep Dive
Encryption Deep DiveEncryption Deep Dive
Encryption Deep Dive
 
Sec 101
Sec 101Sec 101
Sec 101
 
Reflections on SCM
Reflections on SCMReflections on SCM
Reflections on SCM
 
Management: Doing the non-obvious! III
Management: Doing the non-obvious! IIIManagement: Doing the non-obvious! III
Management: Doing the non-obvious! III
 
Design is not Subjective
Design is not SubjectiveDesign is not Subjective
Design is not Subjective
 
Architecture & Engineering : Doing the non-obvious!
Architecture & Engineering :  Doing the non-obvious!Architecture & Engineering :  Doing the non-obvious!
Architecture & Engineering : Doing the non-obvious!
 
Management doing the non-obvious II
Management doing the non-obvious II Management doing the non-obvious II
Management doing the non-obvious II
 
Testing in production
Testing in productionTesting in production
Testing in production
 
Nine lies about work
Nine lies about workNine lies about work
Nine lies about work
 
Management: doing the nonobvious!
Management: doing the nonobvious!Management: doing the nonobvious!
Management: doing the nonobvious!
 
AI and the Future
AI and the FutureAI and the Future
AI and the Future
 
Dealing with dependencies
Dealing  with dependenciesDealing  with dependencies
Dealing with dependencies
 
Dealing with dependencies in tests
Dealing  with dependencies in testsDealing  with dependencies in tests
Dealing with dependencies in tests
 
Kanban 2020
Kanban 2020Kanban 2020
Kanban 2020
 

Último

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Último (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Big Data Hadoop 2.0 MapReduce HDFS

  • 1. Big Data: Hadoop 2.0 Map Reduce / HDFS 2.0 @diego_pacheco Software Architect | Agile Coach
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 14. HDFS – Assumptions and Goals • Hardware Failure: Houndred or thousands machines, expect to fail. • Streaming Data Access: Batch processing, high throughtput not low latency. • Large Data Sets: Terrabytes, works on cluster, scale, milions of files single instance. • Simple Coherency Model: Write-once-read-many(create, read, close, no changes) maximize coherency and high throughtput, perfect for Map/Reduce. • Moving Computation instead of Moving Data: Is way more cheaper, huge data, minimize network. HDFS moves the computation close to the data. • Sofware and hardware Portability: Easily Portable.
  • 15. HDFS • Very large distributed FS • 10k nodes, 100M files, 10PB • Works with comodity hardware • File replication • Detect and recover from failures • Optimized for batch processing • Files break by blocks 128mb • blocks: replicated in N dataNodes • Data Coherency • Write Once, Read Many • Only Append to existent files
  • 17. HDFS 2.0 - Federation
  • 20. Today: Parallelism per file Single LARGE File Single Thread No Parallelism
  • 21. Map/Reduce: Unit of data Task 0 0..64 mb Task 1 64..128mb Task 2 128..192mb Each task process a unit of data Task 3 192..256mb
  • 23. Map/Reduce: Local Read Task 0 Task 1 Task 2 Task 3 0..64 mb 64..128mb 128..192mb 192..256mb Node 0 Node 1 Node 2 Node 3 • Local Read, no need for network copy • Data is read from many disks in parallel
  • 24. Map/Reduce: The Magic! Single Hard Drive: Reads 75mb/second 12 hard drive Per machine 12 * 75mb/second * 4k = 3.4 TB/ second
  • 25. Map
  • 27. Big Data: Hadoop 2.0 Map Reduce / HDFS 2.0 Obrigado! Thank You! @diego_pacheco Software Architect | Agile Coach