Scalable On-Demand Hadoop Clusters with Docker and Mesos
1. Scalable On-Demand Hadoop Clusters with Docker and Mesos
Andrew Nelson, Nutanix
@vmwnelson http://virtual-hiking.blogspot.com
Chris Mutchler, VMware
@chrismutchler http://virtualelephant.com
2. Agenda
New Approach for Hadoop Ops
Infrastructure Resource Considerations
Docker as the new “Unit of Work”
Future Work
3. Last Year’s State of the Art
Self-service and multi-tenant Hadoop
Elastic and decoupled infrastructure
Extensible blueprinting
4. New Goals
Operationalize multiple frameworks
Decoupled service architecture
Flexible and developer-friendly form factor
5. Apache Mesos Introduction
Started at Berkeley
Graduated to top-level Apache project in 2013
Commercial entity is Mesosphere
https://github.com/apache/mesos/
11. HDFS as a Service
[Diagram: an HDFS layer (Namenode, Standby Namenode, Secondary Namenode) shared as a service by multiple compute frameworks: MapReduce, Spark, Hive, Storm, …]
12. Networking Services
Service Discovery
Handled per framework
Port range resource managed by Mesos slave
For example, Marathon uses HAProxy for request routing
Per-container network monitoring
Egress rate-limiting
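To make the bullets above concrete, here is an illustrative Marathon app definition (the app id, image name, and ports are made up): the container's port is mapped to a host port drawn from the slave's managed port range (hostPort 0 means "pick one from the range"), and the servicePort is what HAProxy routes to wherever instances land.

```json
{
  "id": "/my-service",
  "cpus": 0.5,
  "mem": 512,
  "instances": 3,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "registry.example.com/team/my-service:1.0",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 8080, "hostPort": 0, "servicePort": 10000 }
      ]
    }
  },
  "healthChecks": [
    { "protocol": "HTTP", "path": "/health", "portIndex": 0 }
  ]
}
```

Because the service port is stable while host ports are ephemeral, per-container network monitoring and egress limits can be applied at the container boundary without the framework managing physical networking.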
16. Advantages for Developers
Interchangeable verbs for code<->containers
Choice of framework to use as their PaaS
Adopt microservices approach to app pipeline
17. Recommendations for Success
Start small, scale fast
Use most appropriate framework for the job
Think ahead, decouple
Plan for rolling restart capacity up front
18. Gap Analysis
Be prepared to “look under the hood”
Variable maturity and resiliency of the layers
Networking
Security
19. Where Are We Going Next
Scale and learn
Container-focused OS
Software-defined networking services
Discover key performance and availability metrics
20. Wrapping up
Mesos allows for choice of framework
Devs utilize Docker with familiar workflow
Portable, flexible, and scalable architecture
Editor's Notes
I'm going to be discussing some new opportunities to change the operational model of Hadoop: how to accommodate new services, and how to improve integration and end-to-end testing of modern application pipelines. This has everything to do with how ops can provide devs with the most flexible building environment without stretching too far to try to support everything.
Key takeaways:
Hadoop+docker for lightweight self-service on your laptop, in your cloud
For building modern app pipelines you need CI/CD; to iterate faster, you need this self-service, customizable framework to build what the devs want to build
Evaluate whether YARN or Mesos fits your needs
Just pick a physical form factor or pick a cloud and move on, with portability in mind; it's a unique situation in that the many software choices will affect your ultimate product more than the hardware will
Test and iterate, scale and learn
Last year, Chris and I talked about how Adobe was virtualizing their Hadoop clusters in order to emulate a public cloud environment. Developers wanted more flexibility in what kind of Hadoop cluster was deployed: sizing, which templates, and which distro they wanted to work with. All of these things could be customized and were enabled for self-service. Potentially, each developer could utilize their own private, dedicated cluster for experimentation and not have to worry about dedicated hardware. The automation and blueprints necessary were shared via catalog and extended to accommodate more than just Hadoop, including other distributed systems such as Storm, Kafka, Mesos, etc.
One key realization is that you can't get there with just one framework. There are a ton of different solutions out there for cluster management and for different frameworks: different building blocks that devs can use to build their app and its data pipeline. So we needed to be more flexible in giving developers options for building their desired service. Should they be building realtime or batch workloads? How will they scale? What if parameters need to be changed as they scale? There are so many questions and so much new code to look at, and devs need to be just as quick about evaluating which tools are helpful and worth including as they are about the code they add themselves.
With all of these different frameworks, and to retain flexibility once they go down a road, the devs need to ensure the pieces remain loosely coupled. Otherwise, all this flexibility is kind of pointless. What's flexible about having to go back and start from scratch? You could do that before, and in a much simpler system, right?
Now we're all platform-building, even if we're using someone else's services to bootstrap basic functionality. We need to deliver reliability somewhere before we get to the top of the stack. That's what CI and CD are basically about, in my opinion. So what do we need? Something relatively portable, easily resizable across these different frameworks, and reasonably self-contained, so that we can pick it up and move it around when we need to. Last year the currency was VMs. We could resize, repurpose, share hardware, and blueprint. I have worked with VMs in high-performance settings and I don't think performance is the issue. However, they are not developer-friendly. Dev-friendly to me is basically infrastructure as code, or even infra as text files. As an architect, I want devs to feel free to customize, do it themselves, and interact with the system in a form factor that is consistent with their processes.
Key part of self-service is choice
http://mesos.apache.org/documentation/latest/mesos-architecture/
http://mesos.apache.org/assets/img/documentation/architecture3.jpg
So from an infra perspective, why not just work on YARN? Well, YARN is not a hierarchical scheduler framework. It's a framework for writing scalable analytics jobs, and it does that really well. But how do you encapsulate infra for jobs that don't fit that model? Maybe next year YARN will have a completely different set of capabilities, but for now we have devs with those diverse job characteristics.
Allows for multiple executors
Allows for multiple independent schedulers
Allows for multiple frameworks / toolsets
Highly available master
The master enables fine-grained sharing of resources (cpu, ram, …) across applications by making them resource offers. Each resource offer contains a list of <slave ID, resource1: amount1, resource2: amount2, …>. The master decides how many resources to offer to each framework according to a given organizational policy, such as fair sharing or strict priority. To support a diverse set of policies, the master employs a modular architecture that makes it easy to add new allocation modules via a plugin mechanism.
A framework running on top of Mesos consists of two components: a scheduler that registers with the master to be offered resources, and an executor process that is launched on slave nodes to run the framework's tasks (see the App/Framework Development Guide for more details about application schedulers and executors). While the master determines how many resources are offered to each framework, the frameworks' schedulers select which of the offered resources to use. When a framework accepts offered resources, it passes to Mesos a description of the tasks it wants to run on them. In turn, Mesos launches the tasks on the corresponding slaves.
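The two-level flow described above can be sketched as a toy simulation (pure Python with hypothetical class names; this is not the real Mesos API): the master offers each agent's free resources to frameworks in fair-share order, and each framework's scheduler decides which parts of an offer to accept and which tasks to launch on them.

```python
# Toy model of Mesos-style two-level scheduling. Class and field names
# are invented for illustration; the real Mesos API differs.

class Framework:
    """Second level: the framework's scheduler decides what to accept."""
    def __init__(self, name, pending):
        self.name = name
        self.pending = pending        # tasks waiting to run
        self.allocated_cpus = 0.0     # used by the master for fair sharing

    def resource_offer(self, agent_id, free):
        """Accept as many pending tasks as fit in the offered resources.
        Returning an empty list amounts to declining the offer."""
        accepted = []
        for task in list(self.pending):
            if task["cpus"] <= free["cpus"] and task["mem"] <= free["mem"]:
                free["cpus"] -= task["cpus"]   # reason over a local copy
                free["mem"] -= task["mem"]
                accepted.append(task)
                self.pending.remove(task)
        return accepted

class Master:
    """First level: tracks agents and makes offers to frameworks."""
    def __init__(self, agents):
        self.agents = agents          # {agent_id: {"cpus": ..., "mem": ...}}
        self.frameworks = []

    def register(self, framework):
        self.frameworks.append(framework)

    def run_allocation_round(self):
        launched = []
        for agent_id, free in self.agents.items():
            # Fair sharing: offer to the least-allocated framework first.
            # A real allocator (e.g. DRF) is pluggable.
            for fw in sorted(self.frameworks, key=lambda f: f.allocated_cpus):
                tasks = fw.resource_offer(agent_id, dict(free))
                if tasks:
                    for t in tasks:   # commit the accepted resources
                        free["cpus"] -= t["cpus"]
                        free["mem"] -= t["mem"]
                        fw.allocated_cpus += t["cpus"]
                        launched.append((fw.name, agent_id, t["name"]))
                    break             # this offer was consumed
        return launched
```

Note the division of labor: the master never inspects task definitions, and the framework never sees the whole cluster; each side only reasons about offers.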
https://github.com/mesos/myriad/blob/phase1/docs/how-it-works.md
https://github.com/mesos/myriad/raw/phase1/docs/images/how-it-works.png
Each tenant has their own framework
Each tenant can derive their own scheduling
Each tenant can leverage services in a decoupled fashion
This list will probably keep growing before it becomes consolidated. This is about blueprinting the distributed systems. There will typically be an infrastructure layer and a configuration management layer.
The VMware solution is obviously based on vCenter and Chef. There is flexibility in creating your own roles and recipes, but it depends on VMware licensing based on sockets. There is only a single template at any given time, and calls are blocking, meaning only one cluster can be in any stage of creation at a time.
BOSH is its own animal, originally conceived as a way to stand up Cloud Foundry, which is itself a distributed system that can't instantiate itself. There is a director-based version, or bosh-init as a quick and less heavyweight CLI. BOSH uses YAML as its config format of choice. It can handle any cloud platform with a known CPI, or cloud platform interface. Its templates are called stemcells. It has an async queue KV store with multiple workers that can build in parallel. Networking and DNS are fully declared in the manifest but have to be much more explicit.
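As a rough illustration of that explicitness (the deployment name, release, pool, and IPs are all invented), a minimal BOSH-style manifest fragment might declare the deployment, a stemcell-backed resource pool, and fully explicit networking:

```yaml
# Hypothetical BOSH deployment manifest fragment (not a complete manifest).
name: hadoop-on-bosh
director_uuid: REPLACE-WITH-DIRECTOR-UUID   # bound to a single director

releases:
  - name: hadoop          # invented release name
    version: latest

jobs:
  - name: namenode
    instances: 1
    resource_pool: small  # ties the job to a stemcell + VM size
    networks:
      - name: private     # networking is declared explicitly in the manifest
        static_ips: [10.0.0.10]
```

The point of the sketch is the contrast with Mesos-style offers: here every placement and address is pinned down in the manifest up front.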
Cloudbreak is a relatively new cloud-agnostic framework that uses cloud-specific APIs for building out components, for example AWS CloudFormation. For Hadoop blueprints it uses Ambari, and at the guest-image level everything is Docker, with Swarm for clustering and Consul for communication and service management.
Cloudify uses open-source TOSCA blueprints, which are YAML files that contain service definitions, tiers, and dependencies. Cloudify determines the infra compatibility layer, and config management is Chef or Puppet.
Mesos is fundamentally a framework for accommodating different frameworks on the same hardware, using cgroups and Docker
http://mesos.apache.org/documentation/latest/mesos-frameworks/
Compute is determined by resource offers. Instead of trying to fit a workload on what's left of a host, the host (or worker) advertises some resources; it's up to the framework what it can accept and provision, or it can wait.
You have HA, checkpointing, and a common durable and resilient storage layer that can support the ecosystem of compute platforms.
MapReduce (batch)
Spark (In-memory)
HIVE (SQL)
Storm (streaming)
Solr (Lucene Search)
Flume
Kafka (with Camus)
In my opinion, this is the most immature portion of the tenant services of Mesos, but it's still headed in the right direction. Frameworks don't want to manage ports or physical networking. Allow for per-container granularity in monitoring and logging, which is good for debugging.
These are the top-level scheduling algorithms that Mesos can use. Remember that it’s a hierarchy.
When a job request comes into the YARN resource manager, YARN evaluates all the resources available, and it places the job. It’s the one making the decision where jobs should go… YARN is optimized for scheduling Hadoop jobs, which are historically (and still typically) batch jobs with long run times. This means that YARN was not designed for long-running services, nor for short-lived interactive queries…, and while it’s possible to have it schedule other kinds of workloads, this is not an ideal model.
… uses a two-level scheduling mechanism where resource offers are made to frameworks (applications that run on top of Mesos). The Mesos master node decides how many resources to offer each framework, while each framework determines the resources it accepts and what application to execute on those resources. This method of resource allocation allows near-optimal data locality when sharing a cluster of nodes amongst diverse frameworks.
This open source software project is both a Mesos framework and a YARN scheduler that enables Mesos to manage YARN resource requests. When a job comes into YARN, it will schedule it via the Myriad Scheduler, which will match the request to incoming Mesos resource offers. Mesos, in turn, will pass it on to the Mesos worker nodes. The Mesos nodes will then communicate the request to a Myriad executor which is running the YARN node manager. Myriad launches YARN node managers on Mesos resources, which then communicate to the YARN resource manager what resources are available to them. YARN can then consume the resources as it sees fit. Myriad provides a seamless bridge from the pool of resources available in Mesos to the YARN tasks that want those resources.
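The Myriad flow above can be modeled as a toy matcher (pure Python; the function and field names are hypothetical, not Myriad's API): a YARN resource request is matched against incoming Mesos offers, and an accepted offer yields a NodeManager whose capacity the YARN ResourceManager can then consume.

```python
# Toy model of the Myriad bridge: match a YARN request to a Mesos offer.
# All names here are illustrative, not the real Myriad interfaces.

def myriad_schedule(yarn_request, mesos_offers):
    """Return a 'launched NodeManager' for the first offer that can hold
    a NodeManager of the requested size, or None to wait for more offers."""
    for offer in mesos_offers:
        if (offer["cpus"] >= yarn_request["cpus"]
                and offer["mem"] >= yarn_request["mem"]):
            # Myriad's executor would start a YARN NodeManager on this
            # agent; the NodeManager then advertises this capacity to
            # the YARN ResourceManager, which consumes it as it sees fit.
            return {
                "agent": offer["agent"],
                "node_manager_capacity": {
                    "cpus": yarn_request["cpus"],
                    "mem": yarn_request["mem"],
                },
            }
    return None  # no offer fits; wait for the next allocation round
```

The key property is that YARN never talks to Mesos directly: it only ever sees NodeManager capacity, which Myriad grows or shrinks by accepting or releasing Mesos offers.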
Developers can push their code and Dockerfile to Git, as they usually do
From there, Jenkins can build a container from the Dockerfile and then publish to a registry
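That push-build-publish flow might look like the following Jenkins pipeline sketch (the registry URL, image name, and credentials id are assumptions, and the `docker.*` steps come from the CloudBees Docker Pipeline plugin):

```groovy
// Hypothetical Jenkins pipeline: build a container from the repo's
// Dockerfile and publish it to a private registry.
node {
    checkout scm  // the code and Dockerfile the developer pushed to Git
    def image = docker.build("registry.example.com/team/my-service:${env.BUILD_NUMBER}")
    docker.withRegistry("https://registry.example.com", "registry-creds") {
        image.push()          // publish the versioned image
        image.push("latest")  // also advance the 'latest' tag
    }
}
```

From the developer's side nothing changes: they push code as usual, and the pipeline turns each commit into a tagged, deployable container.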
As is typical, will there be template creep? Container creep? Image curation and testing are necessary, but hopefully this fits into your CI/CD methodology.
Working with Docker for developers should feel very familiar.
Docker push, pull, commit
Version dependency and tag-based search verbs
Can choose from Marathon, YARN 2.7.0
CI/CD with cloudbees, shippable, drone, jenkins, on and on
Logging is key, of course. Best to test and iterate, since stuff will break, and pick a method that allows you to revert easily
Decouple!
Be ready to pull in network teams and security teams early and often
The SDN decoupling is in progress but for now, infra should be ready to be explicit so devs don’t have to be
Don’t just shift complexity, abstract
Security, SDLC and infrastructure and ops and…
Often need to change as we scale
Remove the guest OS as much as possible; options are multiplying: CoreOS, LXD, Microsoft Nano Server, Red Hat Atomic, VMware Photon
We don't know which will work best, so we need to test and iterate; ultimately we want decoupling so it doesn't (or shouldn't) matter
A lot of maturation in the SDN space, controllers are just reaching scalability of thousands of VMs, what happens when I throw a million containers at them? Test and iterate
YARN can be a first-class citizen, which avoids siloing the datacenter
Avoid siloing dev into specific frameworks
Docker is the new currency for continuous test and deployment of code in infrastructure as text
form factor for CI/CD