Deadline Queries: Leveraging the Cloud to Produce On-Time Results
Authors: David Ribeiro Alves, Pedro Bizarro, Paulo Marques
In a nutshell
  • Cluster computing is widely used to solve "Big Data" problems.
  • Users express the computation with programming abstractions (e.g., MapReduce), but are left with some difficult questions: how many nodes? How long will it take?
  • Proposed solution: users define a deadline, and the cluster expands or contracts to meet it.
CLOUD '11
Introducing Deadline Queries
  • Cluster computing tasks that complete within a deadline…
  • … while minimizing cost/resource consumption
  • Independently of:
     ◦ processing capacity per machine
     ◦ faults or perturbations
     ◦ initial number of nodes
     ◦ data size, content, or skew
     ◦ computation complexity
Approaches in current systems
  • … make the task fit the cluster.
Our approach
  • … make the cluster fit the task.
Architecture and Runtime
  • Example: SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 SEC
  • [Figure: the query is submitted to the master node, which requests nodes from the IaaS provider and modifies the cluster; worker nodes process data partitions (Part. 1, Part. 2, Part. 3, … Part. n) and send metrics back to the master.]
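Deadline Queries extend an ordinary query with a trailing FINISH IN clause. As a minimal sketch, assuming the clause always comes last and is expressed in seconds, a master could separate the deadline from the query body before planning as shown below. The regular-expression grammar and the `split_deadline` helper are hypothetical illustrations, not the system's actual parser.

```python
import re

# Hypothetical grammar: the deadline clause is the last thing in the query,
# "FINISH IN <integer> SEC", matched case-insensitively.
_FINISH_IN = re.compile(r"\s+FINISH\s+IN\s+(\d+)\s+SEC\s*;?\s*$", re.IGNORECASE)


def split_deadline(query):
    """Return (query_without_clause, deadline_seconds)."""
    match = _FINISH_IN.search(query)
    if match is None:
        raise ValueError("query has no trailing FINISH IN <n> SEC clause")
    return query[: match.start()].strip(), int(match.group(1))
```

The remaining SQL can then be planned as usual, while the integer deadline feeds the progress estimator that decides when to resize the cluster.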
Stream Processing
  • Continuous processing lets phases start before previous phases complete.
  • Continuous processing makes it possible to gather progress metrics about the computation as a whole, continuously.
  • Stream processing (SP) provides continuous load balancing, which makes it possible to:
     ◦ take immediate advantage of arriving nodes;
     ◦ deal with temporary or permanent asymmetries;
     ◦ deal with data skew.
  • SP fault tolerance makes it possible to respond quickly to faults.
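The speaker notes describe nodes fetching partitions from a queue. A toy simulation, under stated assumptions, illustrates why this pull model gives continuous load balancing: a node that joins mid-run starts taking partitions as soon as it is free, and a faster node simply takes more of them. Per-partition costs, node speeds, join times, and the `simulate` helper are all hypothetical, and time is modelled as discrete events rather than real threads.

```python
import heapq
from collections import Counter


def simulate(partition_costs, workers):
    """Workers repeatedly pull the next partition from a shared queue.

    partition_costs: work units per partition, in queue order.
    workers: list of (name, speed, join_time); speed is work units per second.
    Returns (finish_time, Counter mapping worker name -> partitions processed).
    """
    # Min-heap of (time the worker is next free, name, speed).
    free_at = [(join_time, name, speed) for name, speed, join_time in workers]
    heapq.heapify(free_at)
    processed = Counter()
    finish = 0.0
    for cost in partition_costs:
        # Whichever worker frees up first takes the next partition.
        t, name, speed = heapq.heappop(free_at)
        end = t + cost / speed
        processed[name] += 1
        finish = max(finish, end)
        heapq.heappush(free_at, (end, name, speed))
    return finish, processed
```

For example, twelve partitions of 10 units take 120 s on a single worker of speed 1.0. If a second worker of speed 2.0 joins at t = 30 s, the run ends at 60 s, with no explicit rebalancing step: the newcomer just starts pulling.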
MapReduce
  • Query: SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 sec
  • MapReduce decomposition: Fetch & Transform → Map (Select/Project) → Group → Reduce (Aggregate) → Store Results
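To make the decomposition concrete, here is a minimal single-process sketch of the same five stages. The comma-separated record layout `symbol,value,volume` and the function names are assumptions for illustration; in the system, each stage runs as a streaming operator distributed across nodes. The map emits `(sum, sum, count)` triples rather than averages so that partial aggregates can be combined in any order, a property that per-partition partial reduces rely on.

```python
from collections import defaultdict


def fetch_and_transform(lines):
    # Fetch & Transform: parse "symbol,value,volume" lines (assumed format).
    for line in lines:
        symbol, value, volume = line.split(",")
        yield symbol, float(value), float(volume)


def map_select_project(records):
    # Map (Select/Project): emit key -> combinable partial aggregate.
    for symbol, value, volume in records:
        yield symbol, (value, volume, 1)


def group(pairs):
    # Group: collect all partial aggregates for the same symbol.
    groups = defaultdict(list)
    for key, partial in pairs:
        groups[key].append(partial)
    return groups


def reduce_aggregate(groups):
    # Reduce (Aggregate): combine sums and counts, then compute the averages.
    results = {}
    for key, partials in groups.items():
        sum_value = sum(p[0] for p in partials)
        sum_volume = sum(p[1] for p in partials)
        count = sum(p[2] for p in partials)
        results[key] = (sum_value / count, sum_volume / count)
    return results


def run_query(lines):
    # Store Results: this sketch simply returns the result dictionary.
    return reduce_aggregate(group(map_select_project(fetch_and_transform(lines))))
```

For instance, `run_query(["ACME,10,100", "ACME,20,300", "INIT,5,50"])` yields `{"ACME": (15.0, 200.0), "INIT": (5.0, 50.0)}`.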
Streaming MapReduce: Scaling
  • Stream processing ⇒ load balancing and fault tolerance in a changing cluster
  • MapReduce ⇒ simple, parallel, scalable programming and execution model
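The speaker notes distinguish content-insensitive routing (used where possible) from content-sensitive routing (used where needed). The sketch below illustrates that distinction under simplifying assumptions. Map-side records can go to any node, so round-robin spreads them evenly and uses a new node immediately. Reduce-side records must reach the node that owns their key, so keys are hashed into a fixed set of buckets and whole buckets are reassigned when a node joins. "Balanced" here means equal bucket counts rather than measured load. The classes, the 64-bucket default, and the rebalancing rule are hypothetical, not the system's implementation.

```python
import zlib
from collections import Counter


class RoundRobinRouter:
    """Content-insensitive routing: any record may go to any node."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def route(self, record):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node

    def add_node(self, node):
        self.nodes.append(node)  # used from the very next record onward


class KeyRouter:
    """Content-sensitive routing: records with the same key reach the same node.
    Keys hash into fixed buckets; buckets, not keys, are assigned to nodes."""

    def __init__(self, nodes, num_buckets=64):
        self.num_buckets = num_buckets
        self.assignment = {b: nodes[b % len(nodes)] for b in range(num_buckets)}

    def route(self, key):
        # crc32 is stable across processes, unlike Python's salted str hash().
        bucket = zlib.crc32(key.encode("utf-8")) % self.num_buckets
        return self.assignment[bucket]

    def add_node(self, node):
        """Move buckets from over-full nodes to `node`; returns the moved buckets,
        whose reducer state would have to be migrated."""
        counts = Counter(self.assignment.values())
        target = self.num_buckets // (len(counts) + 1)
        moved = []
        for b in range(self.num_buckets):
            if len(moved) >= target:
                break
            owner = self.assignment[b]
            if counts[owner] > target:
                counts[owner] -= 1
                self.assignment[b] = node
                moved.append(b)
        return moved
```

The list returned by `KeyRouter.add_node` reflects the design trade-off: every moved bucket implies migrating reducer state. This cost is one reason to prefer content-insensitive routing wherever the operator allows it.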
Progress estimation
  • Consumed vs. remaining data + linear regression to estimate the finish time.
  • React accordingly by either expanding or contracting the cluster.
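The slide states the idea only at a high level. Here is a minimal sketch under explicit assumptions. Progress is sampled as `(elapsed seconds, fraction of input consumed)` pairs, all taken with the current node count, and throughput is assumed to scale linearly with the number of nodes. A least-squares line is fitted to the samples and extrapolated to 100% consumption, and the node count that would meet the deadline is then computed. The function names and the `headroom` safety factor are hypothetical, and the paper's actual controller may differ.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need samples at two or more distinct times")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def estimate_finish(samples):
    """samples: (elapsed_s, fraction_consumed) pairs.
    Returns the elapsed time at which the fitted line reaches 100%."""
    a, b = fit_line([t for t, _ in samples], [f for _, f in samples])
    if b <= 0:
        return math.inf  # no measurable progress yet
    return (1.0 - a) / b


def nodes_needed(samples, current_nodes, deadline_s, headroom=1.1):
    """Node count expected to finish the remaining work by the deadline,
    ASSUMING linear scaling and that all samples used `current_nodes` nodes."""
    now, consumed = samples[-1]
    time_left = deadline_s - now
    if time_left <= 0:
        raise ValueError("deadline already passed")
    _, rate = fit_line([t for t, _ in samples], [f for _, f in samples])
    if rate <= 0:
        return current_nodes  # not enough signal to resize yet
    per_node_rate = rate / current_nodes
    required = (1.0 - consumed) / (per_node_rate * time_left)
    return max(1, math.ceil(required * headroom))
```

With one node consuming 5% of the input per minute, `estimate_finish` projects about 1200 s. Against the 900 s deadline, `nodes_needed(samples, 1, 900.0)` asks for 2 nodes. A looser deadline yields a smaller answer, which corresponds to contracting the cluster.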
Experimental Evaluation: Setup
  • Real-world experiments on top of Amazon EC2
  • Running query: SELECT symbol, avg(value), avg(volume) FROM Stocks GROUP BY symbol FINISH IN 900 sec
  • Used between 1 and 27 machines (m1.large): 2 × dual-core Xeon (2.66 GHz), 7.5 GB of RAM
  • Experiments show: predicted remaining time; number of nodes
Exp. 1 – Varying Initial Cluster Size
Exp. 2 – Varying Deadline
Exp. 3 – Introducing Perturbations
Conclusions
  • Cloud computing (e.g., IaaS) enables new approaches to cluster computing and new optimization goals.
  • Deadline Queries may help express computation provisioning requirements beyond the number of nodes.
  • Deadline Queries are a viable way to enforce hard time limits on query execution.
  • A real implementation and evaluation show that the approach is feasible and works as expected.
Questions?
Fault Tolerance

IEEE CLOUD '11


Editor's Notes

  1. ----- Meeting Notes (10/20/10 14:48) ----- General notes: make it sharper. Focus on the audience. I never say how I am going to evaluate the system. The Gantt chart is too small.
  2. In particular, I'd like to refer to two practical cases. The first is a Portuguese bank that must finish processing 10M transactions and produce the corresponding reports by the morning, but has no idea how much machine power it needs to do so. The second is a Portuguese telecom company that is currently building the largest private cloud in Portugal, but still has trouble allocating nodes to tasks so that they complete on time.
  3. Create an animation over a slide or two that shows how the problem was previously dealt with and how our solution works; introduce the running example here. Story of the slide: start processing documents (start moving the document arrow to the cluster). When the system predicts the deadline will be missed (clock turns red), it starts discarding data or reducing accuracy (put documents in the trash). Mention orally that other systems discard data, but don't dwell on it for long; mention that in many cases data cannot be thrown away (previous examples).
  4. Story of the slide: start processing documents (start moving the document arrow to the cluster). When we see the deadline will be missed (clock turns red), start expanding the resources in use.
  5. Mention that we adopted streaming MapReduce so that we can cope with changes to the cluster.
  6. Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send progress reports to the master, which decides whether more nodes are needed, and if so… **CLICK** New nodes are added to the computation. Using streaming MapReduce allows us to: deal with data skew by using streaming routing techniques; deal with faults relatively quickly.
  7. Transform the task into a dataflow and split the data into partitions. Request nodes and assign dataflow parts to them. Nodes fetch partitions from a queue and insert them into the dataflow. Nodes send progress reports to the master, which decides whether more nodes are needed, and if so… **CLICK** New nodes are added to the computation. Use load-balanced, content-insensitive routing where possible. Use load-balanced, content-sensitive routing where needed. ----- Meeting Notes (6/27/11 19:18) ----- Add the machine names. ----- Meeting Notes (6/29/11 14:33) ----- "Efficient" is too relative a word.
  8. Experiment 1 – Varying Initial Cluster Size. Click 1: the experiment starts with 1 node. Click 2: at first we
  9. Series 1. Let's look at the experiments. We begin by executing the query starting with one node. **CLICK** The system starts execution with 1 node. At first there are no progress statistics, so nothing can be said about whether the deadline will be met. **CLICK** As soon as the system detects the deadline will be missed ----- Meeting Notes (6/29/11 14:33) ----- Threshold on deadline fault detection. Threshold on the maximum number of machines. An interesting story to tell: how it would behave under different cost models. speculOS
  10. Explain orally how the perturbations were injected (Linux commands).
  11. Normal operation: maps process single partitions and tag results with part_id. Partial reduces maintain per-partition windows. Total reduces maintain a tentative set, in which results are kept separate per partition. Upon receiving a part_end punctuation. When faults occur: the master notifies the remaining nodes that the node has failed (so they know not to receive data from that node). Nodes discard all data from that partition (partial reduces discard the partition's window set, and total reduces discard the partition's group in the tentative set). ----- Meeting Notes (6/27/11 18:55) ----- Turn this into two slides.
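The fault-tolerance notes above can be sketched for a single total reduce computing avg(value) per symbol. Results stay tentative per partition, so a failure can drop exactly the affected partitions without touching anything already merged. The note breaks off at what happens on part_end. This sketch *assumes* that part_end marks a partition's results as final and merges them, and that lost partitions are handed back to be re-enqueued. The `TotalReduce` class and its methods are hypothetical.

```python
from collections import defaultdict


class TotalReduce:
    """Hypothetical total reduce for avg(value) grouped by key."""

    def __init__(self):
        self.tentative = defaultdict(dict)  # part_id -> {key: (sum, count)}
        self.committed = {}                 # key -> (sum, count)

    def on_result(self, part_id, key, partial_sum, partial_count):
        # Keep results separated partition-wise until the partition ends.
        group = self.tentative[part_id]
        s, n = group.get(key, (0.0, 0))
        group[key] = (s + partial_sum, n + partial_count)

    def on_part_end(self, part_id):
        # ASSUMPTION: part_end means the partition's results are final.
        for key, (s, n) in self.tentative.pop(part_id, {}).items():
            cs, cn = self.committed.get(key, (0.0, 0))
            self.committed[key] = (cs + s, cn + n)

    def on_failure(self, lost_parts):
        # Discard the tentative groups of partitions the failed node handled;
        # return them so they can be re-enqueued (ASSUMPTION).
        for part_id in lost_parts:
            self.tentative.pop(part_id, None)
        return list(lost_parts)

    def averages(self):
        return {key: s / n for key, (s, n) in self.committed.items()}
```

Because a lost partition is discarded before it is ever merged, reprocessing it after the failure cannot double-count its records. For example, after partition 2 fails and is reprocessed, ACME's average reflects each record exactly once.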