SlideShare uma empresa Scribd logo
1 de 20
Scalable On-Demand Hadoop
Clusters with Docker and
Mesos
Andrew Nelson, Nutanix
@vmwnelson http://virtual-hiking.blogspot.com
Chris Mutchler, VMware
@chrismutchler http://virtualelephant.com
V
Agenda
 New Approach for Hadoop Ops
 Infrastructure Resource Considerations
 Docker as the new “Unit of Work”
 Future Work
2
Last Year’s State of the Art
 Self-service and multi-tenant Hadoop
 Elastic and decoupled infrastructure
 Extensible blueprinting
3
New Goals
 Operationalize multiple frameworks
 Decoupled service architecture
 Flexible and developer-friendly form factor
4
Apache Mesos Introduction
 Started at Berkeley
 Graduated to top level Apache project
2013
 Commercial entity is Mesosphere
 https://github.com/apache/mesos/
5
Mesos Architecture
6
Source: http://mesos.apache.org/assets/img/documentation/architecture3.jpg
Mesos as a Multi-Tenant
Resource Pool
7
Source: https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md
Tools to Build and Scale
 Serengeti, Vmware
 https://github.com/vmware-serengeti
 BOSH, Pivotal
 https://github.com/cloudfoundry/bosh
 Cloudify, Gigaspaces
 https://github.com/CloudifySource/cloudify
 Cloudbreak, SequenceIQ
 https://github.com/sequenceiq/cloudbreak
8
Advantages for Ops
 Mesos as a Resource Pool
 Multiple concurrent frameworks
 Decouple frameworks from resource pools
9
Compute Partitions on Mesos
10
Shared
Hadoop
Storm
Spark
Kafka
Hadoop Cassandra Storm Spark
Marathon
Cassandra
Siloed
HDFS as a Service
11
Namenode
Standby
Namenode
Secondary
Namenode
HDFS
MapReduce
Spark
Hive
Storm
…
Networking Services
 Service Discovery
 Handled per framework
 Port range resource managed by Mesos slave
 For example, Marathon uses HAProxy for request routing
 Per-container network monitoring
 Egress rate-limiting
12
Scheduling Options
 Mesos scheduling
 Capacity Scheduler
 Fair Scheduler
 Tenant scheduling examples
 Hadoop on Mesos
 Myriad (YARN) on Mesos
13
Dev Workflow
 Code Repo / Registry
 Pull / Push / Commit / Run
 Automated Builds
 Version tagging
 Marathon CI / CD
 Dependencies
 Rolling restarts
14
Registry Services
 Pluggable storage
 Webhooks
 Image control
 Security
 Logging
15
Registry
Repository Repository
Image Image Image
Advantages for Developers
 Interchangeable verbs for code<->containers
 Choice of framework to use as their PaaS
 Adopt microservices approach to app pipeline
16
Recommendations for Success
 Start small, scale fast
 Use most appropriate framework for the job
 Think ahead, decouple
 Plan for rolling restart capacity up front
17
Gap Analysis
 Be prepared to “look under the hood”
 Variable maturity and resiliency of the layers
 Networking
 Security
18
Where Are We Going Next
 Scale and learn
 Container-focused OS
 Software-defined networking services
 Discover key performance and availability metrics
19
Wrapping up
 Mesos allows for choice of framework
 Devs utilize Docker with familiar workflow
 Portable, flexible, and scalable architecture
20

Mais conteúdo relacionado

Mais procurados

Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and GreatestZach Lanksbury
 
Openshift Container Platform on Azure
Openshift Container Platform on Azure Openshift Container Platform on Azure
Openshift Container Platform on Azure Glenn West
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesYousun Jeong
 
Scaling drupal on amazon web services dr
Scaling drupal on amazon web services drScaling drupal on amazon web services dr
Scaling drupal on amazon web services drTristan Roddis
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Successfully deploy build manage your cloud with cloud stack2
Successfully deploy build manage your cloud with cloud stack2Successfully deploy build manage your cloud with cloud stack2
Successfully deploy build manage your cloud with cloud stack2ke4qqq
 
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Lucas Jellema
 
Understanding AWS with Terraform
Understanding AWS with TerraformUnderstanding AWS with Terraform
Understanding AWS with TerraformKnoldus Inc.
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Spark Summit
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
Scale your docker containers with Mesos
Scale your docker containers with MesosScale your docker containers with Mesos
Scale your docker containers with MesosTimothy Chen
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentBlueData, Inc.
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...CodeOps Technologies LLP
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreDropsolid
 
Dag Sonstebo - CloudStack usage service
Dag Sonstebo - CloudStack usage serviceDag Sonstebo - CloudStack usage service
Dag Sonstebo - CloudStack usage serviceShapeBlue
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL ServerLynn Langit
 

Mais procurados (19)

Stratoscale Latest and Greatest
Stratoscale Latest and GreatestStratoscale Latest and Greatest
Stratoscale Latest and Greatest
 
Openshift Container Platform on Azure
Openshift Container Platform on Azure Openshift Container Platform on Azure
Openshift Container Platform on Azure
 
Spark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on KubernetesSpark day 2017 - Spark on Kubernetes
Spark day 2017 - Spark on Kubernetes
 
Scaling drupal on amazon web services dr
Scaling drupal on amazon web services drScaling drupal on amazon web services dr
Scaling drupal on amazon web services dr
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
Successfully deploy build manage your cloud with cloud stack2
Successfully deploy build manage your cloud with cloud stack2Successfully deploy build manage your cloud with cloud stack2
Successfully deploy build manage your cloud with cloud stack2
 
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016 Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
Soaring through the Clouds - Oracle Fusion Middleware Partner Forum 2016
 
Understanding AWS with Terraform
Understanding AWS with TerraformUnderstanding AWS with Terraform
Understanding AWS with Terraform
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
Scale your docker containers with Mesos
Scale your docker containers with MesosScale your docker containers with Mesos
Scale your docker containers with Mesos
 
How to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized EnvironmentHow to Protect Big Data in a Containerized Environment
How to Protect Big Data in a Containerized Environment
 
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
Must Know Azure Kubernetes Best Practices And Features For Better Resiliency ...
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and moreScaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
Scaling Drupal in AWS Using AutoScaling, Cloudformation, RDS and more
 
Dag Sonstebo - CloudStack usage service
Dag Sonstebo - CloudStack usage serviceDag Sonstebo - CloudStack usage service
Dag Sonstebo - CloudStack usage service
 
Flexible compute
Flexible computeFlexible compute
Flexible compute
 
Scaling HDFS at Xiaomi
Scaling HDFS at XiaomiScaling HDFS at Xiaomi
Scaling HDFS at Xiaomi
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
 

Destaque

Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersBlueData, Inc.
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsCognizant
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopGERARDO BARBERENA
 
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRData Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRSeetharam Venkatesh
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataEdward Hsu
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed DatawarehousingJos van Dongen
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Cambridge Semantics
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI DutchJos van Dongen
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureRoman Nikitchenko
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Jos van Dongen
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraRobbie Strickland
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleCambridge Semantics
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraRobbie Strickland
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsCambridge Semantics
 

Destaque (20)

Lessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker ContainersLessons Learned Running Hadoop and Spark in Docker Containers
Lessons Learned Running Hadoop and Spark in Docker Containers
 
Infrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical WorkloadsInfrastructure Considerations for Analytical Workloads
Infrastructure Considerations for Analytical Workloads
 
Introduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to HadoopIntroduccion a Hadoop / Introduction to Hadoop
Introduccion a Hadoop / Introduction to Hadoop
 
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLRData Infrastructure on Hadoop - Hadoop Summit 2011 BLR
Data Infrastructure on Hadoop - Hadoop Summit 2011 BLR
 
Final White Paper_
Final White Paper_Final White Paper_
Final White Paper_
 
7+1 myths of the new os
7+1 myths of the new os7+1 myths of the new os
7+1 myths of the new os
 
A Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big DataA Journey to Modern Apps with Containers, Microservices and Big Data
A Journey to Modern Apps with Containers, Microservices and Big Data
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Hi Speed Datawarehousing
Hi Speed DatawarehousingHi Speed Datawarehousing
Hi Speed Datawarehousing
 
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
Transforming Data Management and Time to Insight with Anzo Smart Data Lake®
 
Data Scientist 101 BI Dutch
Data Scientist 101 BI DutchData Scientist 101 BI Dutch
Data Scientist 101 BI Dutch
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
Big data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructureBig data technologies and Hadoop infrastructure
Big data technologies and Hadoop infrastructure
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
 
Database Shootout: What's best for BI?
Database Shootout: What's best for BI?Database Shootout: What's best for BI?
Database Shootout: What's best for BI?
 
Always On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on CassandraAlways On: Building Highly Available Applications on Cassandra
Always On: Building Highly Available Applications on Cassandra
 
Graph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise ScaleGraph-based Discovery and Analytics at Enterprise Scale
Graph-based Discovery and Analytics at Enterprise Scale
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
Online Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and CassandraOnline Analytics with Hadoop and Cassandra
Online Analytics with Hadoop and Cassandra
 
How to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using SemanticsHow to Build a Smart Data Lake Using Semantics
How to Build a Smart Data Lake Using Semantics
 

Semelhante a Scalable On-Demand Hadoop Clusters with Docker and Mesos

Highly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - RedisHighly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - RedisKrishna-Kumar
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
Mesos vs kubernetes comparison
Mesos vs kubernetes comparisonMesos vs kubernetes comparison
Mesos vs kubernetes comparisonKrishna-Kumar
 
OpenSlava 2014 - CloudFoundry inside-out
OpenSlava 2014 - CloudFoundry inside-outOpenSlava 2014 - CloudFoundry inside-out
OpenSlava 2014 - CloudFoundry inside-outAntons Kranga
 
The New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack
 
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating SystemOSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating SystemNETWAYS
 
Open source based container solution in Azure - May Docker Meetup
Open source based container solution in Azure - May Docker MeetupOpen source based container solution in Azure - May Docker Meetup
Open source based container solution in Azure - May Docker MeetupWiredcraft
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewKrishna-Kumar
 
PaaS with Docker
PaaS with DockerPaaS with Docker
PaaS with DockerAditya Jain
 
Mesosphere quick overview
Mesosphere quick overviewMesosphere quick overview
Mesosphere quick overviewKrishna-Kumar
 
Net core microservice development made easy with azure dev spaces
Net core microservice development made easy with azure dev spacesNet core microservice development made easy with azure dev spaces
Net core microservice development made easy with azure dev spacesAlon Fliess
 
Moving Your Enterprise to the Cloud
Moving Your Enterprise to the CloudMoving Your Enterprise to the Cloud
Moving Your Enterprise to the CloudImesh Gunaratne
 
Platform as a Service
Platform as a ServicePlatform as a Service
Platform as a ServiceAshok Kumar
 
Mesos: Cluster Management System
Mesos: Cluster Management SystemMesos: Cluster Management System
Mesos: Cluster Management SystemErhan Bagdemir
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Amazon Web Services
 
Comparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing PlatformsComparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing Platformsijsrd.com
 
A clear strategy for moving your enterprise to the cloud
A clear strategy for moving your enterprise to the cloudA clear strategy for moving your enterprise to the cloud
A clear strategy for moving your enterprise to the cloudWSO2
 
Cloud Native Application @ VMUG.IT 20150529
Cloud Native Application @ VMUG.IT 20150529Cloud Native Application @ VMUG.IT 20150529
Cloud Native Application @ VMUG.IT 20150529VMUG IT
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and IntegrationAlex Baretto
 

Semelhante a Scalable On-Demand Hadoop Clusters with Docker and Mesos (20)

Highly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - RedisHighly scalable caching service on cloud - Redis
Highly scalable caching service on cloud - Redis
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Mesos vs kubernetes comparison
Mesos vs kubernetes comparisonMesos vs kubernetes comparison
Mesos vs kubernetes comparison
 
OpenSlava 2014 - CloudFoundry inside-out
OpenSlava 2014 - CloudFoundry inside-outOpenSlava 2014 - CloudFoundry inside-out
OpenSlava 2014 - CloudFoundry inside-out
 
The New Stack Container Summit Talk
The New Stack Container Summit TalkThe New Stack Container Summit Talk
The New Stack Container Summit Talk
 
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating SystemOSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
OSDC 2015: Bernd Mathiske | Why the Datacenter Needs an Operating System
 
Open source based container solution in Azure - May Docker Meetup
Open source based container solution in Azure - May Docker MeetupOpen source based container solution in Azure - May Docker Meetup
Open source based container solution in Azure - May Docker Meetup
 
Mesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overviewMesos and Kubernetes ecosystem overview
Mesos and Kubernetes ecosystem overview
 
PaaS with Docker
PaaS with DockerPaaS with Docker
PaaS with Docker
 
Mesosphere quick overview
Mesosphere quick overviewMesosphere quick overview
Mesosphere quick overview
 
PaaS Solutions Comparison
PaaS Solutions ComparisonPaaS Solutions Comparison
PaaS Solutions Comparison
 
Net core microservice development made easy with azure dev spaces
Net core microservice development made easy with azure dev spacesNet core microservice development made easy with azure dev spaces
Net core microservice development made easy with azure dev spaces
 
Moving Your Enterprise to the Cloud
Moving Your Enterprise to the CloudMoving Your Enterprise to the Cloud
Moving Your Enterprise to the Cloud
 
Platform as a Service
Platform as a ServicePlatform as a Service
Platform as a Service
 
Mesos: Cluster Management System
Mesos: Cluster Management SystemMesos: Cluster Management System
Mesos: Cluster Management System
 
Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS Cloud Has Become the New Normal: TCS
Cloud Has Become the New Normal: TCS
 
Comparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing PlatformsComparison of Several PaaS Cloud Computing Platforms
Comparison of Several PaaS Cloud Computing Platforms
 
A clear strategy for moving your enterprise to the cloud
A clear strategy for moving your enterprise to the cloudA clear strategy for moving your enterprise to the cloud
A clear strategy for moving your enterprise to the cloud
 
Cloud Native Application @ VMUG.IT 20150529
Cloud Native Application @ VMUG.IT 20150529Cloud Native Application @ VMUG.IT 20150529
Cloud Native Application @ VMUG.IT 20150529
 
Apache Mesos Overview and Integration
Apache Mesos Overview and IntegrationApache Mesos Overview and Integration
Apache Mesos Overview and Integration
 

Mais de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Mais de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Último (20)

Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Scalable On-Demand Hadoop Clusters with Docker and Mesos

  • 1. Scalable On-Demand Hadoop Clusters with Docker and Mesos Andrew Nelson, Nutanix @vmwnelson http://virtual-hiking.blogspot.com Chris Mutchler, VMware @chrismutchler http://virtualelephant.com V
  • 2. Agenda  New Approach for Hadoop Ops  Infrastructure Resource Considerations  Docker as the new “Unit of Work”  Future Work 2
  • 3. Last Year’s State of the Art  Self-service and multi-tenant Hadoop  Elastic and decoupled infrastructure  Extensible blueprinting 3
  • 4. New Goals  Operationalize multiple frameworks  Decoupled service architecture  Flexible and developer-friendly form factor 4
  • 5. Apache Mesos Introduction  Started at Berkeley  Graduated to top level Apache project 2013  Commercial entity is Mesosphere  https://github.com/apache/mesos/ 5
  • 7. Mesos as a Multi-Tenant Resource Pool 7 Source: https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md
  • 8. Tools to Build and Scale  Serengeti, Vmware  https://github.com/vmware-serengeti  BOSH, Pivotal  https://github.com/cloudfoundry/bosh  Cloudify, Gigaspaces  https://github.com/CloudifySource/cloudify  Cloudbreak, SequenceIQ  https://github.com/sequenceiq/cloudbreak 8
  • 9. Advantages for Ops  Mesos as a Resource Pool  Multiple concurrent frameworks  Decouple frameworks from resource pools 9
  • 10. Compute Partitions on Mesos 10 Shared Hadoop Storm Spark Kafka Hadoop Cassandra Storm Spark Marathon Cassandra Siloed
  • 11. HDFS as a Service 11 Namenode Standby Namenode Secondary Namenode HDFS MapReduce Spark Hive Storm …
  • 12. Networking Services  Service Discovery  Handled per framework  Port range resource managed by Mesos slave  For example, Marathon uses HAProxy for request routing  Per-container network monitoring  Egress rate-limiting 12
  • 13. Scheduling Options  Mesos scheduling  Capacity Scheduler  Fair Scheduler  Tenant scheduling examples  Hadoop on Mesos  Myriad (YARN) on Mesos 13
  • 14. Dev Workflow  Code Repo / Registry  Pull / Push / Commit / Run  Automated Builds  Version tagging  Marathon CI / CD  Dependencies  Rolling restarts 14
  • 15. Registry Services  Pluggable storage  Webhooks  Image control  Security  Logging 15 Registry Repository Repository Image Image Image
  • 16. Advantages for Developers  Interchangeable verbs for code<->containers  Choice of framework to use as their PaaS  Adopt microservices approach to app pipeline 16
  • 17. Recommendations for Success  Start small, scale fast  Use most appropriate framework for the job  Think ahead, decouple  Plan for rolling restart capacity up front 17
  • 18. Gap Analysis  Be prepared to “look under the hood”  Variable maturity and resiliency of the layers  Networking  Security 18
  • 19. Where Are We Going Next  Scale and learn  Container-focused OS  Software-defined networking services  Discover key performance and availability metrics 19
  • 20. Wrapping up  Mesos allows for choice of framework  Devs utilize Docker with familiar workflow  Portable, flexible, and scalable architecture 20

Notas do Editor

  1. I'm going to be discussing some new opportunities to change the operational model of Hadoop and how to accommodate new services as well as work on better integration and end to end testing of modern application pipelines. This has everything to do with how ops can provide devs with the most flexible building environment without stretching too far to try and support everything. Key takeaways: Hadoop+docker for lightweight self-service on your laptop, in your cloud For building modern app pipelines, need CI/CD, to iterate faster, need this self-service, customizable framework to build what the devs want to build Evaluate whether yarn fits your needs or mesos Just pick a physical form factor or pick a cloud and move on, with portability in mind, unique situation in so many software choices that will affect your ultimate product more than hardware will Test and iterate, scale and learn
  2. Last year, Chris and I talked about how Adobe was virtualizing their Hadoop clusters in order to emulate a public cloud environment. Developers wanted to be able to be more flexible in what kind of Hadoop cluster was deployed, sizing, which templates, and which distro they wanted to work with. All of these things could be customized and were enabled for self-service. Potentially each developer could utilize their own private, dedicated cluster for experimentation and not have to worry about dedicated hardware. The automation and blueprints necessary were shared via catalog and extended to accommodate more than just Hadoop to include other distributed systems such as Storm, Kafka, Mesos, etc.
  3. One key realization is that you can't get there with just one framework. There are a ton of different solutions out there for cluster management and for different frameworks, different building blocks that devs can use to build their app and its date pipeline. So we needed to be able to be more flexible in giving developers options for building their desired service. Should they be building realtime or batch workloads, how will they scale? What if parameters need to be changed as they scale? So many questions and new code to look at and devs need to be just as quick about evaluating what tools are helpful and worth including as what code they are adding in themselves With all of these different frameworks, and to retain the element of flexibility once they go down a road, the devs need to ensure they remain loosely coupled. Otherwise all this flexibility was kinda pointless. What's flexible about having to go back and start from scratch? You could do that before and it was in a lot simpler system right? Now we're all platform-building, even if we're using someone else's services to bootstrap basic functionality. We need to deliver reliability somewhere before we get to the top of the stack. That's what CI and CD are basically about, imo. So what we need that is telatively portable, easily resizable across these different frameworks and reasonably self-contained so that we can pick it up and move it around when we need to? Last year the currency was VMs. We could resize, repurpose, share hardware, and blueprint. I have worked with VMs in high performance and I don't think that's the issue. However, they are not developer-friendly. Dev-friendly to me is basically infrastructure as code, or even infra as text files. As an architect I want devs to feel free to customize, do it themselves, and be able to interact with the system in a form factor that is consistent with their processes. Key part of self-service is choice
  4. Users: Twitter, Airbnb, Apple, Ebay Aurora, Marathon, Chronos
  5. http://mesos.apache.org/documentation/latest/mesos-architecture/ http://mesos.apache.org/assets/img/documentation/architecture3.jpg So from an infara perspective, why not just work on YARN. Well, YARN is not a hierarchical scheduler frmawork. It’s a framework for writing scalable analytics jobs and it does that really well. But how to encapsulate infra for jobs that don't fit that model. Maybe next year, YARN will have a competely different set of capabilities but for now, we have devs with those diverse set of job characteristics. Allows for multiple executors Allows for multiple independent schedulers Allows for multiple frameworks / toolsets Highly available master The master enables fine-grained sharing of resources (cpu, ram, …) across applications by making them resource offers. Each resource offer contains a list of . The master decides how many resources to offer to each framework according to a given organizational policy, such as fair sharing, or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism. A framework running on top of Mesos consists of two components: a scheduler that registers with the master to be offered resources, and an executor process that is launched on slave nodes to run the framework’s tasks (/documentation/latest/see theApp/Framework development guide for more details about application schedulers and executors). While the master determines how many resources are offered to each framework, the frameworks' schedulers select which of the offered resources to use. When a frameworks accepts offered resources, it passes to Mesos a description of the tasks it wants to run on them. In turn, Mesos launches the tasks on the corresponding slaves.
  6. https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md https://github.com/mesos/myriad/raw/phase1/docs/images/how-it-works.png Each tenant has their own framework Each tenant can derive their own scheduling Each tenant can leverage services in a decoupled fashion
  7. This list will probably keep growing before it becomes consolidated. This is about blueprinting the distributed systems. There will typically be an infrastructure layer and a configuration management layer. Vmw is a solution based on vmware vcenter and chef obviously. There is the flexibility of creating your own roles and recipes but dependent on vmw licensing based on sockets. There is only a single template ever at any given time and calls are blocking meaning only one cluster can be in any stage of cration at any given time. Bosh is its own animal, originally conceived as a way to stand up cloud foundry because it is its own distributed system that can't instantiate itself. There is a director-based version or bosh-init as a quick and less heavyweight CLI. Bosh uses yaml as its conf format of choice. It can handle any cloud platform with a known CPI or cloud platform interface. Its templates are called stemcells. It has an async queue kv store with multiple workers that can build in parallel. Networking and dns are fully declared in the manifest but have to be much more explicit. Cloudbreak is relatively new cloud agnostic framework that uses cloud specific APIs for building out components, for example aws cloudformation. For hadoop blueprints, it uses ambari and at the guest-image level, everything is docker with swarm for clustering and consul for communication and service mgmt Clouidfy uses open source tosca blueprints which are yaml files that contain srvice definitions, tiers and dependencies. Cloudify determines the infra compatibility layer and config mgmt is chef or puppet
  8. Mesos is fundamentally a framework for accommodating different frameworks on the same hardware using cgroups, docker
  9. http://mesos.apache.org/documentation/latest/mesos-frameworks/ Compute is determined by resource offers. Instead of trying to fit a workload on whats left of a host, the host or worker advertises some resources, its up to the framework what it can accept and provision or wait.
  10. You have HA, checkpointing, and a common durable and resilient storage layer that can support the ecosystem of compute platforms. MapReduce (batch) Spark (In-memory) HIVE (SQL) Storm (streaming) Solr (Lucene Search) Flume Kafka (with Camus)
  11. Imo, the most immature portion of the tenant svcs of mesos but still headed in the right direction. Frameworks don’t want to manage ports or physical networking. Allow for per container granularity monitoring and logging which is good for debugging.
  12. These are the top-level scheduling algorithms that Mesos can use. Remember that it’s a hierarchy. When a job request comes into the YARN resource manager, YARN evaluates all the resources available, and it places the job. It’s the one making the decision where jobs should go… YARN is optimized for scheduling Hadoop jobs, which are historically (and still typically) batch jobs with long run times. This means that YARN was not designed for long-running services, nor for short-lived interactive queries…, and while it’s possible to have it schedule other kinds of workloads, this is not an ideal model. … uses a two-level scheduling mechanism where resource offers are made to frameworks (applications that run on top of Mesos). The Mesos master node decides how many resources to offer each framework, while each framework determines the resources it accepts and what application to execute on those resources. This method of resource allocation allows near-optimal data locality when sharing a cluster of nodes amongst diverse frameworks. This open source software project is both a Mesos framework and a YARN scheduler that enables Mesos to manage YARN resource requests. When a job comes into YARN, it will schedule it via the Myriad Scheduler, which will match the request to incoming Mesos resource offers. Mesos, in turn, will pass it on to the Mesos worker nodes. The Mesos nodes will then communicate the request to a Myriad executor which is running the YARN node manager. Myriad launches YARN node managers on Mesos resources, which then communicate to the YARN resource manager what resources are available to them. YARN can then consume the resources as it sees fit. Myriad provides a seamless bridge from the pool of resources available in Mesos to the YARN tasks that want those resources.
  13. Developers can push their code and Dockerfile to Git, as they usually do From there, Jenkins can build a container from the Dockerfile and then publish to a registry
  14. As typical, will there be template-creep? Container-creep? Image curation and testing necessary, but hopefully this fits into your CI/CD methodology.
  15. Working with Docker for developers should feel very familiar. Docker push, pull, commit Version dependency and tag-based search verbs Can choose from Marathon, YARN 2.7.0 CI/CD with cloudbees, shippable, drone, jenkins, on and on
  16. Logging is key, of course, best to test and iterate since stuff will break and pick a method that allows you to revert easily Decouple! Be ready to pull in network teams and security teams early and often The SDN decoupling is in progress but for now, infra should be ready to be explicit so devs don’t have to be Don’t just shift complexity, abstract Security, SDLC and infrastructure and ops and…
  17. Often need to change as we scale Remove the guest os as much as possible, options are multiplying, coreos, lxd, msft nano, rhat atomic, vmware photon Don’t know which will work better so need to test and iterate, ultimately we want decoupled so it doesn’t or shouldn’t matter A lot of maturation in the SDN space, controllers are just reaching scalability of thousands of VMs, what happens when I throw a million containers at them? Test and iterate
  18. YARN can be first class citizen, avoids siloeing datacenter Avoid siloing dev into specific frameworks Docker is the new currency for continuous test and deployment of code in infrastructure as text form factor for CI/CD