Byzantine Fault-Tolerant MapReduce in Cloud-of-Clouds
Joint work with: Miguel Correia, Marcelo Pasin, Alysson Bessani, Fernando Ramos, Paulo Verissimo
Presenter: Pedro Costa
Navtalk

Motivation
• How can we count the number of words on the Internet?
• How can we do it with the help of a cloud-of-clouds
  (i.e., several clouds)?
• Guarantee integrity and availability of data




Outline
• Introduction
   – MapReduce programming model
   – Fault tolerance in cloud-of-clouds
   – 3 problems of the Basic scheme
• Our approach
   – Byzantine fault-tolerant MapReduce in cloud-of-clouds
• Evaluation




MAPREDUCE AND FAULTS


What is MapReduce?
• Programming model + execution environment
   • Introduced by Google in 2004
   • Used for processing large data sets using clusters of servers
   • A few implementations available, used by many companies
• Hadoop MapReduce, Apache's open-source MapReduce implementation
   • The most widely used implementation, and the one we have been using
   • Includes HDFS, a distributed file system for large files




MapReduce basic idea
[Figure: a file with all the words on the Internet goes through the Map phase, which emits <word,1> pairs, and then the Reduce phase, which produces <word,n> pairs. Both phases run on Tasktracker servers.]
• Job tracker detects and recovers crashed map/reduce tasks
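The word-count flow on this slide can be sketched in a few lines of plain Python. This is only an illustration of the programming model, not Hadoop code: the function names `map_phase`, `reduce_phase`, and `word_count` are made up for this sketch, and everything runs in a single process instead of on task trackers.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a <word, 1> pair for every word in one input split."""
    return [(word, 1) for word in split.split()]


def reduce_phase(pairs):
    """Sum the counts per word, producing <word, n> pairs."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


def word_count(splits):
    """Run the map phase on every split, then reduce all emitted pairs."""
    pairs = []
    for split in splits:
        pairs.extend(map_phase(split))
    return reduce_phase(pairs)
```

In a real deployment each call to `map_phase` would be a separate map task, and the pairs would be partitioned among several reduce tasks.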
MapReduce components
[Figure: a Wordcount job whose tasks are assigned to task trackers (TT): TT1, TT2, TT3, TT1, TT3.]
But there are more faults…
• Problem: Accidental faults may affect the correctness of the results
  of MapReduce
    • Task corruptions: memory errors, chipset errors, …
    • Cloud outages: MapReduce job interruptions
                     (as reported in popular clouds)

• Our goal:
    • Guarantee integrity and availability (despite task corruptions and
      cloud outages)
    • Develop a new model to compute MapReduce in cloud-of-clouds
    • Commercially feasible?
        Yes, but out of scope of this presentation. See:
        Tobias Kurze et al., Cloud federation. In Proceedings of the 2nd International
        Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING
        2011).

Byzantine fault-tolerant MapReduce
• Basic idea: replicate tasks in different clouds and vote on the
  results returned by the replicas
   • The set of clouds forms a cloud, hence cloud-of-clouds
   • Inputs are initially stored in all clouds (i.e., not our problem)
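A minimal sketch of the voting step, assuming 2t+1 replicas of a deterministic task, of which at most t run in faulty clouds. Under that assumption, any value reported by at least t+1 replicas must come from at least one correct replica. The function name `vote` and the representation of results as hashable values (for example strings or digests) are choices made for this sketch, not taken from the slides.

```python
from collections import Counter


def vote(results, t):
    """Return the value reported by at least t+1 replicas, or None if none qualifies."""
    if not results:
        return None
    # most_common(1) gives the (value, count) pair with the highest count.
    value, count = Counter(results).most_common(1)[0]
    return value if count >= t + 1 else None
```

With t=1 and three replicas, `vote(["x", "x", "y"], 1)` accepts `"x"`, while three different answers yield no decision.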


[Figure: Cloud 1, Cloud 2, Cloud 3]
System model
• Client is correct (not part of MapReduce)
• Clouds: up to t clouds can arbitrarily corrupt all tasks and
  other modules they execute
• Why use t and not f? t≤f

• Next:
   • Basic BFT MapReduce scheme
   • 3 problems of the Basic scheme
   • Our approach: Full BFT MapReduce scheme




MapReduce: Map perspective

[Figure: map tasks, Official vs. Cloud-of-Clouds; in the latter, replicas run in different clouds.]
MapReduce: Reduce perspective

[Figure: reduce tasks, Official vs. Cloud-of-Clouds; in the latter, replicas run in different clouds.]
But we can do better.
Improvements over basic version
• 3 problems have arisen
   • Computation problem
   • Communication problem
   • Job execution control problem


• 3 solutions: our BFT MapReduce can be thought of as the
  basic version plus the following mechanisms:
   • Deferred execution (computation problem)
   • Digest communication (communication problem)
   • Distributed job tracker (job execution control problem)


Problem 1: computation


[Figure: the task chain from split 0 to part 0 is shown three times, with replicas in different clouds.]
• Tasks are executed 2t+1 times
Solution 1: Deferred execution
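• Faults are uncommon, so running all 2t+1 replicas up front is usually unnecessary
• The job tracker first replicates each task across only t+1 clouds (the other t stay on standby)
• If the results differ or one cloud stops, request 1 more replica (up to t more)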
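The deferred execution rule can be sketched as a loop that starts with t+1 replicas and adds standby clouds one at a time. This is a simplified, sequential illustration under stated assumptions: `run_in_cloud` is a hypothetical callback that returns a task result, or `None` if the cloud stopped, and a result is accepted once t+1 replicas agree on it. The real system schedules replicas through the job tracker rather than through a function like this.

```python
from collections import Counter


def deferred_execution(run_in_cloud, clouds, t):
    """Run a task in t+1 clouds, then add one standby cloud at a time
    until t+1 results match. Returns (result, clouds_used); result is
    None if no value reaches t+1 matches within 2t+1 clouds."""
    assert len(clouds) >= 2 * t + 1, "need at least 2t+1 clouds"
    results = [run_in_cloud(cloud) for cloud in clouds[: t + 1]]
    used = t + 1
    while True:
        # Ignore clouds that stopped (None) when counting matching results.
        tally = Counter(r for r in results if r is not None)
        if tally:
            value, count = tally.most_common(1)[0]
            if count >= t + 1:
                return value, used
        if used >= 2 * t + 1:
            return None, used
        results.append(run_in_cloud(clouds[used]))
        used += 1
```

In the fault-free case with t=1 only two clouds execute the task; the third is contacted only when the first two disagree or one of them stops.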


[Figure: the task chain from split 0 to part 0 is shown twice.]
Problem 2: communication


[Figure: the task chain from split 0 to part 0 is shown three times, with replicas in different clouds.]
• All this communication goes through the Internet (delay, cost)!
Solution 2: Transferring Digests
• Reduce tasks must fetch the map task outputs
• Intra-cloud fetch: the output is fetched normally
• Inter-cloud fetch: only a hash (digest) of the output is fetched (the key idea)
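One way to picture the digest idea: a reduce task fetches the full map output inside its own cloud, receives only digests from the other clouds, and compares them. The sketch below is an assumption-laden illustration, not the authors' protocol: SHA-256 is chosen here only as an example hash, and the acceptance rule (local digest plus matching remote digests must reach t+1) and the names `digest` and `accept_map_output` are hypothetical.

```python
import hashlib


def digest(output: bytes) -> str:
    """Hash a map task output; only this short string crosses the Internet."""
    return hashlib.sha256(output).hexdigest()


def accept_map_output(local_output: bytes, remote_digests, t: int) -> bool:
    """Accept the locally fetched output if its digest, together with the
    matching digests from other clouds, is backed by at least t+1 replicas."""
    local = digest(local_output)
    matching = 1 + sum(1 for d in remote_digests if d == local)
    return matching >= t + 1
```

The saving comes from the size difference: a SHA-256 digest is 32 bytes regardless of how large the map output is.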


[Figure: three replicas of split 0 feeding part 0; the output is fetched from the same cloud, and digests from the other clouds.]
Problem 3: Job execution control
• The job tracker controls all task executions in the task trackers in
  all clouds
• If the job tracker is in one cloud, separated from many task
  trackers by the Internet:
   • Communication is slow
   • Large timeouts are needed to detect task tracker failures
   • …and it is a single point of failure (as is the case in MapReduce and Hadoop MapReduce)




Solution 3: Job execution control
[Figure: Client and VJT at the top; below them, several Job Trackers, each with its own Task Trackers.]
EVALUATION


Setup and Test
Platform configuration
• 3 clouds
• Each cloud has 3 nodes
• 1 job tracker (JT) and 3 task trackers (TT) in each cloud
• All JTs are interconnected

Job submitted (Wordcount)
• Input data: 26 chunks of 64 MB (total 1.5 GB)
• Map tasks: 26
• Reduce tasks: 120, 180, 360, 400

Number of reduce tasks executed (no faults, t=1)

Nr. reduce tasks   Job duration (Official)   Job duration (CoC)   Diff
120                00:15:35                  00:17:13             00:02:35
180                00:19:35                  00:21:36             00:02:01
360                00:31:12                  00:33:30             00:02:18
400                00:33:37                  00:36:24             00:02:47
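The Diff column can be recomputed from the two duration columns as a quick arithmetic check. The sketch below only re-derives values from the numbers on the slide; the helper names `parse_hms`, `format_hms`, and `overhead` are invented for this example.

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an HH:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_hms(delta):
    """Format a non-negative timedelta as HH:MM:SS."""
    total = int(delta.total_seconds())
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def overhead(official, coc):
    """Cloud-of-clouds job duration minus the official job duration."""
    return format_hms(parse_hms(coc) - parse_hms(official))


# Durations copied from the table: reduce tasks -> (Official, CoC).
durations = {
    120: ("00:15:35", "00:17:13"),
    180: ("00:19:35", "00:21:36"),
    360: ("00:31:12", "00:33:30"),
    400: ("00:33:37", "00:36:24"),
}
diffs = {tasks: overhead(*pair) for tasks, pair in durations.items()}
```

For 180, 360, and 400 reduce tasks the recomputed values match the slide (00:02:01, 00:02:18, 00:02:47). For 120 reduce tasks the subtraction gives 00:01:38, whereas the slide lists 00:02:35; the slide does not say which of the three figures in that row is off.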
Task details
                                 Map duration   Reduce duration
Official                         00:06:47       00:13:18
BFT Cloud-of-clouds (1 view)     00:07:08       00:14:46

[Figure: timelines of the map tasks and reduce tasks for each configuration.]
Conclusions
• Our method guarantees integrity and availability despite task
  corruptions and cloud outages
• BFT MapReduce in cloud-of-clouds is feasible!
   • No need to execute in all 2t+1 clouds
   • Only digests sent through the Internet (no “big data”)
   • Control job execution within each cloud




                          Thank you

Bft mr-clouds-of-clouds-discco2012 - navtalk
