Scalable On-Demand Hadoop Clusters with Docker and Mesos
1. Scalable On-Demand Hadoop Clusters with Docker and Mesos
Andrew Nelson, Nutanix
@vmwnelson http://virtual-hiking.blogspot.com
Chris Mutchler, VMware
@chrismutchler http://virtualelephant.com
2. Agenda
New Approach for Hadoop Ops
Infrastructure Resource Considerations
Docker as the new “Unit of Work”
Future Work
3. Last Year’s State of the Art
Self-service and multi-tenant Hadoop
Elastic and decoupled infrastructure
Extensible blueprinting
4. New Goals
Operationalize multiple frameworks
Decoupled service architecture
Flexible and developer-friendly form factor
5. Apache Mesos Introduction
Started at Berkeley
Graduated to top-level Apache project in 2013
Commercial entity is Mesosphere
https://github.com/apache/mesos/
11. HDFS as a Service
[Diagram: an HDFS layer (Namenode, Standby Namenode, Secondary Namenode) shared as a service by multiple compute frameworks: MapReduce, Spark, Hive, Storm, …]
12. Networking Services
Service Discovery
Handled per framework
Port range resource managed by Mesos slave
For example, Marathon uses HAProxy for request routing
Per-container network monitoring
Egress rate-limiting
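To make the bullets above concrete, here is an illustrative Marathon app definition (the app id, image name, and ports are made up): the container's port is mapped to a host port drawn from the slave's managed port range (hostPort 0 means "pick one from the range"), and the servicePort is what HAProxy routes to wherever instances land.

```json
{
  "id": "/my-service",
  "cpus": 0.5,
  "mem": 512,
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry.example.com/team/my-service:1.0",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 10000 }
      ]
    }
  },
  "healthChecks": [
    { "protocol": "HTTP", "path": "/health", "portIndex": 0 }
  ]
}
```

Because the service port is stable while host ports are ephemeral, per-container network monitoring and egress limits can be applied at the container boundary without the framework managing physical networking.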
16. Advantages for Developers
Interchangeable verbs for code<->containers
Choice of framework to use as their PaaS
Adopt microservices approach to app pipeline
17. Recommendations for Success
Start small, scale fast
Use most appropriate framework for the job
Think ahead, decouple
Plan for rolling restart capacity up front
18. Gap Analysis
Be prepared to “look under the hood”
Variable maturity and resiliency of the layers
Networking
Security
19. Where Are We Going Next
Scale and learn
Container-focused OS
Software-defined networking services
Discover key performance and availability metrics
20. Wrapping up
Mesos allows for choice of framework
Devs utilize Docker with familiar workflow
Portable, flexible, and scalable architecture
Editor's Notes
I'm going to be discussing some new opportunities to change the operational model of Hadoop: how to accommodate new services, and how to improve integration and end-to-end testing of modern application pipelines. This has everything to do with how ops can provide devs with the most flexible building environment without stretching too far to try to support everything.
Key takeaways:
Hadoop+docker for lightweight self-service on your laptop, in your cloud
For building modern app pipelines you need CI/CD; to iterate faster, you need this self-service, customizable framework to build what the devs want to build
Evaluate whether YARN or Mesos fits your needs
Just pick a physical form factor or pick a cloud and move on, with portability in mind; it's a unique situation in that the many software choices will affect your ultimate product more than the hardware will
Test and iterate, scale and learn
Last year, Chris and I talked about how Adobe was virtualizing their Hadoop clusters in order to emulate a public cloud environment. Developers wanted more flexibility in what kind of Hadoop cluster was deployed: sizing, which templates, and which distro they wanted to work with. All of these things could be customized and were enabled for self-service. Potentially, each developer could utilize their own private, dedicated cluster for experimentation and not have to worry about dedicated hardware. The automation and blueprints necessary were shared via catalog and extended to accommodate more than just Hadoop, including other distributed systems such as Storm, Kafka, Mesos, etc.
One key realization is that you can't get there with just one framework. There are a ton of different solutions out there for cluster management and for different frameworks: different building blocks that devs can use to build their app and its data pipeline. So we needed to be more flexible in giving developers options for building their desired service. Should they be building realtime or batch workloads? How will they scale? What if parameters need to be changed as they scale? There are so many questions and so much new code to look at, and devs need to be just as quick about evaluating which tools are helpful and worth including as they are about the code they add themselves.
With all of these different frameworks, and to retain flexibility once they go down a road, the devs need to ensure the pieces remain loosely coupled. Otherwise, all this flexibility is kind of pointless. What's flexible about having to go back and start from scratch? You could do that before, and in a much simpler system, right?
Now we're all platform-building, even if we're using someone else's services to bootstrap basic functionality. We need to deliver reliability somewhere before we get to the top of the stack. That's what CI and CD are basically about, in my opinion. So what do we need? Something relatively portable, easily resizable across these different frameworks, and reasonably self-contained, so that we can pick it up and move it around when we need to. Last year the currency was VMs. We could resize, repurpose, share hardware, and blueprint. I have worked with VMs in high-performance settings and I don't think performance is the issue. However, they are not developer-friendly. Dev-friendly to me is basically infrastructure as code, or even infra as text files. As an architect, I want devs to feel free to customize, do it themselves, and interact with the system in a form factor that is consistent with their processes.
Key part of self-service is choice
http://mesos.apache.org/documentation/latest/mesos-architecture/
http://mesos.apache.org/assets/img/documentation/architecture3.jpg
So from an infra perspective, why not just work on YARN? Well, YARN is not a hierarchical scheduler framework. It's a framework for writing scalable analytics jobs, and it does that really well. But how do you encapsulate infra for jobs that don't fit that model? Maybe next year YARN will have a completely different set of capabilities, but for now we have devs with those diverse job characteristics.
Allows for multiple executors
Allows for multiple independent schedulers
Allows for multiple frameworks / toolsets
Highly available master
The master enables fine-grained sharing of resources (cpu, ram, …) across applications by making them resource offers. Each resource offer contains a list of <slave ID, resource1: amount1, resource2: amount2, …>. The master decides how many resources to offer to each framework according to a given organizational policy, such as fair sharing or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism.
A framework running on top of Mesos consists of two components: a scheduler that registers with the master to be offered resources, and an executor process that is launched on slave nodes to run the framework's tasks (see the App/Framework Development Guide for more details about application schedulers and executors). While the master determines how many resources are offered to each framework, the frameworks' schedulers select which of the offered resources to use. When a framework accepts offered resources, it passes to Mesos a description of the tasks it wants to run on them. In turn, Mesos launches the tasks on the corresponding slaves.
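The two-level flow described above can be sketched as a toy simulation (pure Python with hypothetical class names; this is not the real Mesos API): the master offers each agent's free resources to frameworks in fair-share order, and each framework's scheduler decides which parts of an offer to accept and which tasks to launch on them.

```python
# Toy model of Mesos-style two-level scheduling. Class and field names
# are invented for illustration; the real Mesos API differs.

class Framework:
    """Second level: the framework's scheduler decides what to accept."""
    def __init__(self, name, pending):
        self.name = name
        self.pending = pending        # tasks waiting to run
        self.allocated_cpus = 0.0     # used by the master for fair sharing

    def resource_offer(self, agent_id, free):
        """Accept as many pending tasks as fit in the offered resources.
        Returning an empty list amounts to declining the offer."""
        accepted = []
        for task in list(self.pending):
            if task["cpus"] <= free["cpus"] and task["mem"] <= free["mem"]:
                free["cpus"] -= task["cpus"]   # reason over a local copy
                free["mem"] -= task["mem"]
                accepted.append(task)
                self.pending.remove(task)
        return accepted

class Master:
    """First level: tracks agents and makes offers to frameworks."""
    def __init__(self, agents):
        self.agents = agents          # {agent_id: {"cpus": ..., "mem": ...}}
        self.frameworks = []

    def register(self, framework):
        self.frameworks.append(framework)

    def run_allocation_round(self):
        launched = []
        for agent_id, free in self.agents.items():
            # Fair sharing: offer to the least-allocated framework first.
            # A real allocator (e.g. DRF) is pluggable.
            for fw in sorted(self.frameworks, key=lambda f: f.allocated_cpus):
                tasks = fw.resource_offer(agent_id, dict(free))
                if tasks:
                    for t in tasks:   # commit the accepted resources
                        free["cpus"] -= t["cpus"]
                        free["mem"] -= t["mem"]
                        fw.allocated_cpus += t["cpus"]
                        launched.append((fw.name, agent_id, t["name"]))
                    break             # this offer was consumed
        return launched
```

Note the division of labor: the master never inspects task definitions, and the framework never sees the whole cluster; each side only reasons about offers.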
https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md
https://github.com/mesos/myriad/raw/phase1/docs/images/how-it-works.png
Each tenant has their own framework
Each tenant can derive their own scheduling
Each tenant can leverage services in a decoupled fashion
This list will probably keep growing before it becomes consolidated. This is about blueprinting the distributed systems. There will typically be an infrastructure layer and a configuration management layer.
The VMware solution is obviously based on vCenter and Chef. There is flexibility in creating your own roles and recipes, but it depends on VMware licensing based on sockets. There is only a single template at any given time, and calls are blocking, meaning only one cluster can be in any stage of creation at a time.
BOSH is its own animal, originally conceived as a way to stand up Cloud Foundry, which is itself a distributed system that can't instantiate itself. There is a director-based version, or bosh-init as a quick and less heavyweight CLI. BOSH uses YAML as its config format of choice. It can handle any cloud platform with a known CPI, or cloud platform interface. Its templates are called stemcells. It has an async queue KV store with multiple workers that can build in parallel. Networking and DNS are fully declared in the manifest but have to be much more explicit.
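As a rough illustration of that explicitness (the deployment name, release, pool, and IPs are all invented), a minimal BOSH-style manifest fragment might declare the deployment, a stemcell-backed resource pool, and fully explicit networking:

```yaml
# Hypothetical BOSH deployment manifest fragment (not a complete manifest).
name: hadoop-on-bosh
director_uuid: REPLACE-WITH-DIRECTOR-UUID   # bound to a single director

releases:
  - name: hadoop          # invented release name
    version: latest

jobs:
  - name: namenode
    instances: 1
    resource_pool: small  # ties the job to a stemcell + VM size
    networks:
      - name: private     # networking is declared explicitly in the manifest
        static_ips: [10.0.0.10]
```

The point of the sketch is the contrast with Mesos-style offers: here every placement and address is pinned down in the manifest up front.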
Cloudbreak is a relatively new cloud-agnostic framework that uses cloud-specific APIs for building out components, for example AWS CloudFormation. For Hadoop blueprints it uses Ambari, and at the guest-image level everything is Docker, with Swarm for clustering and Consul for communication and service management.
Cloudify uses open-source TOSCA blueprints, which are YAML files that contain service definitions, tiers, and dependencies. Cloudify determines the infra compatibility layer, and config management is Chef or Puppet.
Mesos is fundamentally a framework for accommodating different frameworks on the same hardware, using cgroups and Docker
http://mesos.apache.org/documentation/latest/mesos-frameworks/
Compute is determined by resource offers. Instead of trying to fit a workload on what's left of a host, the host (or worker) advertises some resources; it's up to the framework what it can accept and provision, or it can wait.
You have HA, checkpointing, and a common durable and resilient storage layer that can support the ecosystem of compute platforms.
MapReduce (batch)
Spark (In-memory)
HIVE (SQL)
Storm (streaming)
Solr (Lucene Search)
Flume
Kafka (with Camus)
In my opinion, this is the most immature portion of the tenant services of Mesos, but it's still headed in the right direction. Frameworks don't want to manage ports or physical networking. Allow for per-container granularity in monitoring and logging, which is good for debugging.
These are the top-level scheduling algorithms that Mesos can use. Remember that it’s a hierarchy.
When a job request comes into the YARN resource manager, YARN evaluates all the resources available, and it places the job. It’s the one making the decision where jobs should go… YARN is optimized for scheduling Hadoop jobs, which are historically (and still typically) batch jobs with long run times. This means that YARN was not designed for long-running services, nor for short-lived interactive queries…, and while it’s possible to have it schedule other kinds of workloads, this is not an ideal model.
… uses a two-level scheduling mechanism where resource offers are made to frameworks (applications that run on top of Mesos). The Mesos master node decides how many resources to offer each framework, while each framework determines the resources it accepts and what application to execute on those resources. This method of resource allocation allows near-optimal data locality when sharing a cluster of nodes amongst diverse frameworks.
This open source software project is both a Mesos framework and a YARN scheduler that enables Mesos to manage YARN resource requests. When a job comes into YARN, it will schedule it via the Myriad Scheduler, which will match the request to incoming Mesos resource offers. Mesos, in turn, will pass it on to the Mesos worker nodes. The Mesos nodes will then communicate the request to a Myriad executor which is running the YARN node manager. Myriad launches YARN node managers on Mesos resources, which then communicate to the YARN resource manager what resources are available to them. YARN can then consume the resources as it sees fit. Myriad provides a seamless bridge from the pool of resources available in Mesos to the YARN tasks that want those resources.
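The Myriad flow above can be modeled as a toy matcher (pure Python; the function and field names are hypothetical, not Myriad's API): a YARN resource request is matched against incoming Mesos offers, and an accepted offer yields a NodeManager whose capacity the YARN ResourceManager can then consume.

```python
# Toy model of the Myriad bridge: match a YARN request to a Mesos offer.
# All names here are illustrative, not the real Myriad interfaces.

def myriad_schedule(yarn_request, mesos_offers):
    """Return a 'launched NodeManager' for the first offer that can hold
    a NodeManager of the requested size, or None to wait for more offers."""
    for offer in mesos_offers:
        if (offer["cpus"] >= yarn_request["cpus"]
                and offer["mem"] >= yarn_request["mem"]):
            # Myriad's executor would start a YARN NodeManager on this
            # agent; the NodeManager then advertises this capacity to
            # the YARN ResourceManager, which consumes it as it sees fit.
            return {
                "agent": offer["agent"],
                "node_manager_capacity": {
                    "cpus": yarn_request["cpus"],
                    "mem": yarn_request["mem"],
                },
            }
    return None  # no offer fits; wait for the next allocation round
```

The key property is that YARN never talks to Mesos directly: it only ever sees NodeManager capacity, which Myriad grows or shrinks by accepting or releasing Mesos offers.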
Developers can push their code and Dockerfile to Git, as they usually do
From there, Jenkins can build a container from the Dockerfile and then publish to a registry
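That push-build-publish flow might look like the following Jenkins pipeline sketch (the registry URL, image name, and credentials id are assumptions, and the `docker.*` steps come from the CloudBees Docker Pipeline plugin):

```groovy
// Hypothetical Jenkins pipeline: build a container from the repo's
// Dockerfile and publish it to a private registry.
node {
    checkout scm  // the code and Dockerfile the developer pushed to Git
    def image = docker.build("registry.example.com/team/my-service:${env.BUILD_NUMBER}")
    docker.withRegistry("https://registry.example.com", "registry-creds") {
        image.push()          // publish the versioned image
        image.push("latest")  // also advance the 'latest' tag
    }
}
```

From the developer's side nothing changes: they push code as usual, and the pipeline turns each commit into a tagged, deployable container.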
As is typical, will there be template creep? Container creep? Image curation and testing are necessary, but hopefully this fits into your CI/CD methodology.
Working with Docker for developers should feel very familiar.
Docker push, pull, commit
Version dependency and tag-based search verbs
Can choose from Marathon, YARN 2.7.0
CI/CD with cloudbees, shippable, drone, jenkins, on and on
Logging is key, of course. Best to test and iterate, since stuff will break, and pick a method that allows you to revert easily
Decouple!
Be ready to pull in network teams and security teams early and often
The SDN decoupling is in progress but for now, infra should be ready to be explicit so devs don’t have to be
Don’t just shift complexity, abstract
Security, SDLC and infrastructure and ops and…
Often need to change as we scale
Remove the guest OS as much as possible; options are multiplying: CoreOS, LXD, Microsoft Nano Server, Red Hat Atomic, VMware Photon
We don't know which will work best, so we need to test and iterate; ultimately we want decoupling so it doesn't (or shouldn't) matter
A lot of maturation in the SDN space, controllers are just reaching scalability of thousands of VMs, what happens when I throw a million containers at them? Test and iterate
YARN can be a first-class citizen, which avoids siloing the datacenter
Avoid siloing dev into specific frameworks
Docker is the new currency for continuous test and deployment of code in infrastructure as text
form factor for CI/CD