SlideShare uma empresa Scribd logo
1 de 20
Mesos A Resource Management Platform for Hadoop and Big Data Clusters Benjamin Hindman, Andy Konwinski,MateiZaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica UC Berkeley
Challenges in Hadoop Cluster Management Isolation (both fault and resource) E.g. if co-locating production and experimental jobs Testing and deploying upgrades JobTracker scalability (in larger clusters) Running non-Hadoop jobs E.g. MPI, streaming MapReduce, etc
Mesos Mesosis a cluster resource manager over which multiple instances of Hadoop and other distributed applications can run Hadoop (production) Hadoop (ad-hoc) MPI … Mesos Node Node Node Node
What’s Different About It? Other resource managers exist today, including Hadoop on Demand Batch schedulers (e.g. Torque) VM schedulers (e.g. Eucalyptus) However, have 2 problems with Hadoop-like workloads: Data locality compromised due to static partitioning of nodes Utilization hurt because jobs hold nodes for their full duration Mesos addresses these through two features: Fine-grained sharing at the level of tasks Two-level scheduling model where jobs control placement
Fine-Grained Sharing Coarse-Grained (HOD, etc) Fine-Grained (Mesos) Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 1 Hadoop 1 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 2 Hadoop 3 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2
Hadoop job Hadoopjob MPI job Hadoop 0.20 scheduler Hadoop 0.19 scheduler MPI scheduler Mesosmaster Mesosslave Mesosslave Mesosslave MPI executor MPI executor Hadoop 19 executor Hadoop 20 executor Hadoop 19 executor task task task task Mesos Architecture task
Design Goals Scalability (to 10,000’s of nodes) Robustness (even to master failure) Flexibility (support wide variety of frameworks) Resulting design: simple, minimal core that pushes resource selection logic to frameworks
Other Features Master fault tolerance using ZooKeeper Resource isolation using Linux Containers Isolate CPU and memory between tasks on each node In newest kernels, can isolate network & disk IO too Web UI for viewing cluster state Deploy scripts for private clusters and EC2
Mesos Status Prototype in 10000 lines of C++ Ported frameworks: Hadoop (0.20.2), MPI (MPICH2), Torque New frameworks: Spark, Scala framework for iterative & interactive jobs Test deployments at Twitter and Facebook
Results Dynamic Resource Sharing Data Locality Scalability
Mesos Demo
Spark A framework for iterative and interactive cluster computing Matei Zaharia, MosharafChowdhury, Michael Franklin, Scott Shenker, Ion Stoica
Spark Goals Support iterative applications Common in machine learning but problematic forMapReduce, Dryad, etc Retain MapReduce’s fault tolerance & scalability Experiment with programmability Integrate into Scala programming language Support interactive use from Scala interpreter
Key Abstraction Resilient Distributed Datasets (RDDs) Collections of elements distributed across cluster that can persist across parallel operations Can be stored in memory, on disk, etc Can be transformed with map, filter, etc Automatically rebuilt on failure
Example: Log Mining Load error messages from a log into memory, then interactively search for various patterns Cache 1 Base RDD Transformed RDD lines = spark.textFile(“hdfs://...”) errors = lines.filter(_.startsWith(“ERROR”)) messages = errors.map(_.split(‘’)(2)) cachedMsgs = messages.cache() Worker results tasks Block 1 Driver Cached RDD Parallel operation cachedMsgs.filter(_.contains(“foo”)).count Cache 2 cachedMsgs.filter(_.contains(“bar”)).count Worker . . . Cache 3 Block 2 Worker Block 3
Example: Logistic Regression Goal: find best line separating two sets of points random initial line + + + + + + – + + – – + – + – – – – – – target
Logistic Regression Code val data = spark.textFile(...).map(readPoint).cache() varw = Vector.random(D) for (i <- 1 to ITERATIONS) { val gradient = data.map(p => { val scale = (1/(1+exp(-p.y*(w dot p.x))) - 1) * p.y     scale * p.x   }).reduce(_ + _) w -= gradient } println("Finalw: " + w)
Logistic Regression Performance 127 s / iteration first iteration 174 s further iterations 6 s
Spark Demo
Conclusion Mesos provides a stable platform to multiplex resources among diverse cluster applications Spark is a new cluster programming framework for iterative & interactive jobs enabled by Mesos Both are open-source (but in very early alpha!) http://github.com/mesos http://mesos.berkeley.edu

Mais conteúdo relacionado

Destaque

1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
Hadoop User Group
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In Python
Hadoop User Group
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
yhadoop
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
Yahoo Developer Network
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Hadoop User Group
 

Destaque (19)

Hadoop Release Plan Feb17
Hadoop Release Plan Feb17Hadoop Release Plan Feb17
Hadoop Release Plan Feb17
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In Python
 
The Bixo Web Mining Toolkit
The Bixo Web Mining ToolkitThe Bixo Web Mining Toolkit
The Bixo Web Mining Toolkit
 
Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009Hadoop at Yahoo! -- Hadoop World NY 2009
Hadoop at Yahoo! -- Hadoop World NY 2009
 
Mumak
MumakMumak
Mumak
 
File Context
File ContextFile Context
File Context
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Resource Sharing Beyond Boundaries - Apache Myriad
Resource Sharing Beyond Boundaries - Apache MyriadResource Sharing Beyond Boundaries - Apache Myriad
Resource Sharing Beyond Boundaries - Apache Myriad
 
Nov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big DataNov 2010 HUG: Business Intelligence for Big Data
Nov 2010 HUG: Business Intelligence for Big Data
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
HUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - FacebookHUG Nov 2010: HDFS Raid - Facebook
HUG Nov 2010: HDFS Raid - Facebook
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
Scaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and MesosScaling Big Data with Hadoop and Mesos
Scaling Big Data with Hadoop and Mesos
 

Mais de Hadoop User Group

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
Hadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Hadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
Hadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
Hadoop User Group
 

Mais de Hadoop User Group (17)

Common crawlpresentation
Common crawlpresentationCommon crawlpresentation
Common crawlpresentation
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Pig at Linkedin
Pig at LinkedinPig at Linkedin
Pig at Linkedin
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

HUG August 2010: Mesos

  • 1. Mesos A Resource Management Platform for Hadoop and Big Data Clusters Benjamin Hindman, Andy Konwinski,MateiZaharia, Ali Ghodsi, Anthony D. Joseph, Randy Katz, Scott Shenker, Ion Stoica UC Berkeley
  • 2. Challenges in Hadoop Cluster Management Isolation (both fault and resource) E.g. if co-locating production and experimental jobs Testing and deploying upgrades JobTracker scalability (in larger clusters) Running non-Hadoop jobs E.g. MPI, streaming MapReduce, etc
  • 3. Mesos Mesosis a cluster resource manager over which multiple instances of Hadoop and other distributed applications can run Hadoop (production) Hadoop (ad-hoc) MPI … Mesos Node Node Node Node
  • 4. What’s Different About It? Other resource managers exist today, including Hadoop on Demand Batch schedulers (e.g. Torque) VM schedulers (e.g. Eucalyptus) However, have 2 problems with Hadoop-like workloads: Data locality compromised due to static partitioning of nodes Utilization hurt because jobs hold nodes for their full duration Mesos addresses these through two features: Fine-grained sharing at the level of tasks Two-level scheduling model where jobs control placement
  • 5. Fine-Grained Sharing Coarse-Grained (HOD, etc) Fine-Grained (Mesos) Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 1 Hadoop 1 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 3 Hadoop 2 Hadoop 2 Hadoop 3 Hadoop 2 Hadoop 3 Hadoop 1 Hadoop 3 Hadoop 1 Hadoop 2
  • 6. Hadoop job Hadoopjob MPI job Hadoop 0.20 scheduler Hadoop 0.19 scheduler MPI scheduler Mesosmaster Mesosslave Mesosslave Mesosslave MPI executor MPI executor Hadoop 19 executor Hadoop 20 executor Hadoop 19 executor task task task task Mesos Architecture task
  • 7. Design Goals Scalability (to 10,000’s of nodes) Robustness (even to master failure) Flexibility (support wide variety of frameworks) Resulting design: simple, minimal core that pushes resource selection logic to frameworks
  • 8. Other Features Master fault tolerance using ZooKeeper Resource isolation using Linux Containers Isolate CPU and memory between tasks on each node In newest kernels, can isolate network & disk IO too Web UI for viewing cluster state Deploy scripts for private clusters and EC2
  • 9. Mesos Status Prototype in 10000 lines of C++ Ported frameworks: Hadoop (0.20.2), MPI (MPICH2), Torque New frameworks: Spark, Scala framework for iterative & interactive jobs Test deployments at Twitter and Facebook
  • 10. Results Dynamic Resource Sharing Data Locality Scalability
  • 12. Spark A framework for iterative and interactive cluster computing Matei Zaharia, MosharafChowdhury, Michael Franklin, Scott Shenker, Ion Stoica
  • 13. Spark Goals Support iterative applications Common in machine learning but problematic forMapReduce, Dryad, etc Retain MapReduce’s fault tolerance & scalability Experiment with programmability Integrate into Scala programming language Support interactive use from Scala interpreter
  • 14. Key Abstraction Resilient Distributed Datasets (RDDs) Collections of elements distributed across cluster that can persist across parallel operations Can be stored in memory, on disk, etc Can be transformed with map, filter, etc Automatically rebuilt on failure
  • 15. Example: Log Mining Load error messages from a log into memory, then interactively search for various patterns Cache 1 Base RDD Transformed RDD lines = spark.textFile(“hdfs://...”) errors = lines.filter(_.startsWith(“ERROR”)) messages = errors.map(_.split(‘’)(2)) cachedMsgs = messages.cache() Worker results tasks Block 1 Driver Cached RDD Parallel operation cachedMsgs.filter(_.contains(“foo”)).count Cache 2 cachedMsgs.filter(_.contains(“bar”)).count Worker . . . Cache 3 Block 2 Worker Block 3
  • 16. Example: Logistic Regression Goal: find best line separating two sets of points random initial line + + + + + + – + + – – + – + – – – – – – target
  • 17. Logistic Regression Code val data = spark.textFile(...).map(readPoint).cache() varw = Vector.random(D) for (i <- 1 to ITERATIONS) { val gradient = data.map(p => { val scale = (1/(1+exp(-p.y*(w dot p.x))) - 1) * p.y scale * p.x }).reduce(_ + _) w -= gradient } println("Finalw: " + w)
  • 18. Logistic Regression Performance 127 s / iteration first iteration 174 s further iterations 6 s
  • 20. Conclusion Mesos provides a stable platform to multiplex resources among diverse cluster applications Spark is a new cluster programming framework for iterative & interactive jobs enabled by Mesos Both are open-source (but in very early alpha!) http://github.com/mesos http://mesos.berkeley.edu

Notas do Editor

  1. Mention it’s a research project but has recently gone open source
  2. Mention Hama, Pregel, Dremel, etc; also mention Spark***Mention how we want to enable NEW frameworks too!!***
  3. Mention how we want to enable NEW frameworks too!
  4. Note here that even users of Hadoop want Mesos for the isolation it provides
  5. Explain that Hadoop port uses pluggable scheduler, hence few changes to Hadoop and no changes to job clients
  6. Point out that Scala is a modern PL etcMention DryadLINQ (but we go beyond it with RDDs)Point out that interactive use and iterative use go hand in hand because both require small tasks and dataset reuse
  7. Note that dataset is reused on each gradient computation
  8. This is for a 29 GB dataset on 20 EC2 m1.xlarge machines (4 cores each)
  9. This slide has to say we’re committed to making Nexus “real”, and that it will be open sourced