Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1exrtyR.
Brenden Matthews describes the infrastructure built at Airbnb using Mesos in order to support Hadoop and Storm. Filmed at qconsf.com.
Brenden Matthews is a software engineer at Airbnb on the data infrastructure team. He's the creator of Conky (a system monitor for X), an Apache committer, and a free software enthusiast & advocate.
2. Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/airbnb-data-infrastructure
3. Presented at QCon San Francisco
www.qconsf.com
4. Alternative Titles
● Datacentres of the future
● Building HA infrastructure
● Building automated HA infrastructure
● Data & Infrastructure
10. Apache Mesos
● Master/slave architecture
● One master is elected from among the masters
● Most of the state is contained in
the slaves themselves
● Master doesn’t do much:
○ Manages resources
○ Acts as a go-between for
slaves and frameworks
[Diagram: 2 masters, 4 slaves, and 3 ZooKeeper nodes]
11. Apache Mesos: Components
● libprocess
○ Components communicate using async messaging
○ Messages are immutable; internals easily parallelized
● Master
○ Offers slave resources to frameworks
○ Launches tasks on slaves for accepted offers
○ Forwards status messages between tasks and frameworks
○ Task reconciliation for frameworks
● Slave
○ Monitors individual tasks, reports status to master
○ Performs resource monitoring on tasks
○ Ensures tasks don’t exceed resource limits (cgroups)
● Framework (i.e., your application)
○ Receives resource offers from master
○ Launches tasks for acceptable offers
12. Apache Mesos: Slave Detail
● Slaves are configured with a resource
policy
● Slaves execute tasks, which are submitted
by frameworks
● Task resource limits are enforced with
cgroups
● Tasks that exceed memory limit will be
killed (OOM’d)
● Resources:
○ CPU, mem, ports (‘standard’)
○ network, and user defined parameters
● Recovery: slaves can be restarted without
killing tasks (cool!)
Framework   CPU   Memory   Share
Chronos       1        1      3%
Storm         5        5     15%
Marathon     16       30     50%
*            32       60    100%
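The Share column is consistent with each framework's dominant share: the larger of its CPU fraction and its memory fraction of the cluster total (the `*` row). Mesos's default allocator is based on Dominant Resource Fairness, but the slide does not say how its numbers were computed, so treat that reading as an assumption. A minimal Java sketch, using a hypothetical `DominantShare` class that is not part of Mesos:

```java
// Illustrative only: DominantShare is a hypothetical helper, not a Mesos API.
class DominantShare {

    /** Returns the larger of the CPU fraction and the memory fraction (0.0 to 1.0). */
    static double dominantShare(double cpus, double mem,
                                double totalCpus, double totalMem) {
        return Math.max(cpus / totalCpus, mem / totalMem);
    }

    public static void main(String[] args) {
        // Totals taken from the "*" row of the table on the slide.
        double totalCpus = 32;
        double totalMem = 60;

        String[] names = {"Chronos", "Storm", "Marathon"};
        double[][] used = {{1, 1}, {5, 5}, {16, 30}}; // {cpus, mem} per framework

        for (int i = 0; i < names.length; i++) {
            double share = dominantShare(used[i][0], used[i][1], totalCpus, totalMem);
            System.out.printf("%-9s %.1f%%%n", names[i], share * 100);
        }
    }
}
```

For the three frameworks the CPU fraction dominates or ties: 1/32 is about 3.1%, 5/32 is about 15.6%, and 16/32 is exactly 50%, which agrees with the slide's 3%, 15%, and 50% if the slide truncates to whole percents.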
13. Apache Mesos: Framework Detail
● Frameworks are applications that run on Mesos
● The framework runs as a separate process, either on its own or as
a Mesos task itself (more on this later)
● Frameworks must decide whether resource offers are sufficient
before launching a task
● Once tasks are launched, frameworks must wait for status updates
and monitor the state of tasks
● Task state can be reconciled with the Mesos master
● Framework state may be stored using the Mesos State API (a key-value store)
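The "wait for status updates and monitor the state of tasks" step can be sketched as a small state table that a framework updates on every status message. This is a minimal sketch, not the Mesos API: the `TaskStateTable` class and its methods are hypothetical, the enum only mirrors the names of common Mesos task states, and relaunching on failed, killed, or lost tasks is an assumed policy.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hypothetical stand-in for a framework's task bookkeeping.
class TaskStateTable {

    // Names mirror common Mesos task states; this enum is not the Mesos type.
    enum State {
        TASK_STAGING, TASK_STARTING, TASK_RUNNING,
        TASK_FINISHED, TASK_FAILED, TASK_KILLED, TASK_LOST
    }

    private final Map<String, State> tasks = new HashMap<>();

    /** Records a task the framework has just launched. */
    void launched(String taskId) {
        tasks.put(taskId, State.TASK_STAGING);
    }

    /**
     * Applies one status update.
     * Returns true when the task ended abnormally and, under the assumed
     * policy, should be relaunched on a future offer.
     */
    boolean statusUpdate(String taskId, State state) {
        switch (state) {
            case TASK_FINISHED:
                tasks.remove(taskId);   // completed normally, nothing to do
                return false;
            case TASK_FAILED:
            case TASK_KILLED:
            case TASK_LOST:
                tasks.remove(taskId);   // terminal and abnormal
                return true;
            default:
                tasks.put(taskId, state); // still active, remember latest state
                return false;
        }
    }

    /** Number of tasks that have not reached a terminal state. */
    int activeTasks() {
        return tasks.size();
    }

    public static void main(String[] args) {
        TaskStateTable table = new TaskStateTable();
        table.launched("task-1");
        table.statusUpdate("task-1", State.TASK_RUNNING);
        boolean relaunch = table.statusUpdate("task-1", State.TASK_LOST);
        System.out.println("relaunch=" + relaunch + " active=" + table.activeTasks());
    }
}
```

A table like this is also what a framework would compare against the master's view when reconciling task state.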
15. Apache Mesos: Framework Detail
Resource offer handling sample in Java

public void resourceOffers(SchedulerDriver schedulerDriver,
                           List<Offer> offers) {
  final boolean sufficient = computeSlots();

  // Launch TaskTrackers to satisfy the slot requirements.
  for (Offer offer : offers) {
    if (!sufficient) {
      schedulerDriver.declineOffer(offer.getId());
      continue;
    }

    // Pull out the cpus, memory, disk, and 2 ports from the offer.
    for (Resource resource : offer.getResourcesList()) {
      if (resource.getName().equals("cpus")
          && resource.getType() == Value.Type.SCALAR) {
        cpus = resource.getScalar().getValue();
        cpuRole = resource.getRole();
      } else if (resource.getName().equals("mem")
          && resource.getType() == Value.Type.SCALAR) {
        mem = resource.getScalar().getValue();
        memRole = resource.getRole();
      } else if (resource.getName().equals("disk")
          && resource.getType() == Value.Type.SCALAR) {
        //...
      }
    }

    // ... (building the TaskInfo `info` is not shown on the slide)
    schedulerDriver.launchTasks(offer.getId(),
        Arrays.asList(info));
  }
}
17. Apache Mesos: Framework Detail
● Writing frameworks is not for everyone! (it’s a bit tricky)
● Frameworks like Marathon and Apache Aurora make it possible to
write applications atop Mesos without having to worry about Mesos
● The Mesos framework ecosystem is alive and well!
● A quadfecta of frameworks covers most use cases:
○ Hadoop - batch processing
○ Storm - stream processing
○ Chronos - task scheduling
○ Marathon or Aurora - long running services
18. Frameworks: Hadoop
● Hadoop on Mesos behaves like any other
Hadoop (except, perhaps, YARN)
● Code lives at https://github.com/mesos/hadoop
19. Frameworks: Storm
● Storm is a distributed stream processing
framework
● ‘doing for realtime processing what Hadoop
did for batch processing’ — Nathan Marz
● Storm runs on Mesos at Twitter, but does
not ship with a Mesos scheduler
● Code lives at https://github.com/brndnmtthws/storm
20. Frameworks: Chronos
● Chronos is a task scheduler that runs on
Mesos
● Could be thought of as ‘distributed cron on
Mesos’
● Code lives at https://github.com/airbnb/chronos
21. Frameworks: Apache Aurora
● Aurora is a service framework developed at
Twitter - a significant portion of Twitter’s
infrastructure runs atop Aurora
● Aurora was announced as an Apache
Incubator project on Oct 1st, 2013
● Code lives at https://github.com/twitter/aurora
22. Frameworks: Marathon
● Marathon is a framework for running
services on Mesos, similar to Aurora
● Marathon can be thought of as a meta
framework (more on this later)
● Project was created by many of the folks
behind Chronos
● Code lives at https://github.com/mesosphere/marathon
24. Marathon as a Meta-Framework
● Marathon is designed to run tasks and
guarantee they stay running
● Why not run Marathon on top of itself in
addition to other frameworks?
● Frameworks like Hadoop and Chronos can
be run atop Marathon today
26. High Availability
● Slaves execute tasks, and the slaves
themselves are independent of each other
● You may run frameworks as tasks on slaves
● A high availability cluster might consist of
1 or more Mesos masters, plus frameworks
that themselves run as Mesos tasks
27. High Availability
Typical Mesos cluster
● 2 masters, 1 elected
● 2 instances of framework A, 1 elected
[Diagram: 2 masters, 2 instances of framework A, and 3 slaves, each slave running 3 tasks (T)]
29. High Availability
HA Mesos cluster w/ Marathon
● 3 masters, 1 elected
● 3 instances of framework A, 1 elected
[Diagram: 3 masters, 3 instances of framework A, and 3 slaves, each slave running 3 tasks (T)]
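One common reason to prefer 3 nodes over 2 is majority-based coordination, which is how ZooKeeper ensembles stay consistent. Whether the same arithmetic applies to the Mesos masters themselves depends on the Mesos version and configuration, which the slides do not state, so the following is a sketch under that assumption. The `MajorityQuorum` class is hypothetical.

```java
// Illustrative only: majority-quorum arithmetic, not a Mesos or ZooKeeper API.
class MajorityQuorum {

    /** Smallest number of members that forms a strict majority. */
    static int quorumSize(int members) {
        return members / 2 + 1;
    }

    /** How many members can fail while a majority is still reachable. */
    static int toleratedFailures(int members) {
        return members - quorumSize(members);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 5}) {
            System.out.println(n + " members: quorum=" + quorumSize(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Under a strict-majority rule, 2 members tolerate no failures while 3 members tolerate one, so a third node is the first step that actually buys availability.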
30. High Availability
● Split cluster across datacentres
○ us-east-1a
○ us-east-1b
○ us-east-1e
● Replication factor of 3 with rack awareness
reduces sleepless nights
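The idea of a replication factor of 3 with rack awareness can be sketched as "never put two replicas in the same zone". This is a minimal illustration and not how any particular system (HDFS, Cassandra, Kafka) implements placement; the `ZoneAwarePlacement` class and the node names are hypothetical, and the zone names are the ones listed on the slide.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: greedy zone-aware replica placement.
class ZoneAwarePlacement {

    /**
     * Picks up to `replicas` nodes, each from a different zone.
     * nodeZones maps node name to zone name; iteration order decides ties.
     * Returns fewer than `replicas` nodes if there are not enough zones.
     */
    static List<String> place(Map<String, String> nodeZones, int replicas) {
        List<String> chosen = new ArrayList<>();
        Set<String> usedZones = new HashSet<>();
        for (Map.Entry<String, String> entry : nodeZones.entrySet()) {
            if (chosen.size() == replicas) {
                break;
            }
            // Set.add returns false if the zone already holds a replica.
            if (usedZones.add(entry.getValue())) {
                chosen.add(entry.getKey());
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, String> nodeZones = new LinkedHashMap<>();
        nodeZones.put("node-1", "us-east-1a");
        nodeZones.put("node-2", "us-east-1a");
        nodeZones.put("node-3", "us-east-1b");
        nodeZones.put("node-4", "us-east-1e");
        System.out.println(place(nodeZones, 3));
    }
}
```

With one replica per zone, losing an entire zone still leaves two of the three copies available.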
31. Automated Infrastructure
● Every machine is exactly the same! (except
masters)
● Maintenance becomes as simple as
starting/stopping slaves
● Application experts have greater control over
deployment, without the need for worrying
about resources
33. Airpad
● A small Ruby library for deploying
applications (i.e., services) on Mesos with
Marathon
● Depends upon SmartStack, Airbnb’s service
discovery tool
34. Airpad
● Things we run (experimentally) with Airpad
○ Kafka
○ Cassandra
○ Presto
○ Chronos
○ Marathon
○ Hadoop JobTracker
○ Other internal tools
36. Other Lessons I’ve Learned
● Figure out how to manage state early on
○ Depend upon replicated services (Cassandra, Kafka,
HDFS)
○ Use replicated storage (S3, HDFS)
○ Create backups and restore processes
● Better to over-provision than under-provision
○ It’s easier to scale up than scale down