Downtime is not an option
How Fast Data and Microservices change the datacenter
@joerg_schad @dcos
© 2017 Mesosphere, Inc. All Rights Reserved. 2
Jörg Schad
Distributed Systems Engineer
@joerg_schad
In the beginning
there was a big
Monolith
COMPUTERS
[Diagram: Application on top of Operating System on top of Hardware]
OVERVIEW
Microservices, noun | ˈmīkrō/ /ˈsərvəs/ :
an approach to application development in which a large application is built as a suite of modular services. Each module supports a specific business goal and uses a simple, well-defined interface to communicate with other modules.*
Microservices are designed to be flexible, resilient, efficient, robust, and individually scalable.
*From whatis.com
MICROSERVICES
- Polyglot
- Single Responsibility
- Smaller Teams
- Utilization
- Machine types/groups
- Dependency hell
[Diagram: apps and services, each on its own operating system and machine, on top of the infrastructure]
© Gerard Julien/AFP
Run everything in containers!
CONTAINERS
- Rapid deployment
- Dependency vendoring
- Container image repositories
- Spreadsheet scheduling
[Diagram: apps and services on container runtimes, one runtime per OS and machine, on top of the infrastructure]
CONTAINER ORCHESTRATION
CONTAINER SCHEDULING
- Placement
- Replication/Scaling
- Resurrection
- Rescheduling
- Rolling Deployment
- Upgrades
- Downgrades
- Collocation
RESOURCE MANAGEMENT
- Memory
- CPU
- GPU
- Volumes
- Ports
- IPs
- Images/Artifacts
SERVICE MANAGEMENT
- Labels
- Groups/Namespaces
- Dependencies
- Load Balancing
- Readiness Checking
CONTAINER ORCHESTRATION
[Diagram: layered stack, bottom to top: Machine Infrastructure; Machine & OS; Container Runtime; Orchestration (Scheduling, Resource Management, Service Management); Web Apps & Services]
MapReduce is
crunching Data
Meanwhile...
But then business
demanded
FAST DATA
We need to turn faster!
Fast Data
Batch | Micro-Batch | Event Processing
Days | Hours | Minutes | Seconds | Microseconds
Descriptive analytics: reports what has happened
Predictive and prescriptive analytics: solves problems
Use cases: Billing, Chargeback; Product recommendations; Real-time Pricing and Routing; Real-time Advertising; Predictive User Interface
The SMACK Stack
EVENTS: Ubiquitous data streams from connected devices (Sensors, Devices, Clients)
INGEST: Apache Kafka. Ingest millions of events per second
STORE: Apache Cassandra. Distributed & highly scalable database
ANALYZE: Apache Spark. Real-time and batch process data
ACT: Akka. Visualize data and build data driven applications
All running on Mesos / DC/OS
Datacenter
NAIVE APPROACH
Typical Datacenter
siloed, over-provisioned servers,
low utilization
Industry Average
12-15% utilization
MySQL
microservice
Cassandra
Spark/Hadoop
Kafka
MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS
Typical Datacenter
siloed, over-provisioned servers,
low utilization
Mesos / DC/OS
automated schedulers, workload multiplexing onto the same machines
MySQL
microservice
Cassandra
Spark/Hadoop
Kafka
Apache Mesos
• A top-level Apache project
• A cluster resource negotiator
• Scalable to 10,000s of nodes
• Fault-tolerant, battle-tested
• An SDK for distributed apps
• Native Docker support
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
DC/OS ENABLES MODERN DISTRIBUTED APPS
Big Data + Analytics Engines
Microservices (in containers)
Streaming
Batch
Machine Learning
Analytics
Functions &
Logic
Search
Time Series
SQL / NoSQL
Databases
Modern App Components
Any Infrastructure (Physical, Virtual, Cloud)
THE
BASICS
DC/OS is …
● 100% open source (Apache License 2.0)
+ A big, diverse community
● An umbrella for ~30 OSS projects
+ Roadmap and designs
+ Docs and tutorials
● Not limited in any way
● Familiar, with more features
+ Networking, Security, CLI, UI,
Service Discovery, Load Balancing,
Packages, ...
Container Options
Universal containerizer
Enhancements to the Mesos containerizer to support launching specific container formats (Docker, AppC, OCI (future), etc.)
● Reduces the need to maintain and update multiple containerizers
● Supports multiple container formats with a single containerizer
An image provisioner component was added to the Mesos containerizer. It is responsible for pulling, caching, and preparing container root filesystems.
[Diagram: universal containerizer components: Launcher (process management), Isolators (container lifecycle hook), Provisioner (container image support)]
DEMO
GEO-ENABLED IoT
DATA FLOW
Keep it running!
Monitoring
- Collecting metrics
- Routing events
- Downstream processing
- Alerting
- Dashboards
- Storage (long-term retention)
Logging
- Scopes
- Local vs. Central
- Security considerations
DAY 2 OPERATIONS
Maintenance
- Cluster Upgrades
- Cluster Resizing
- Capacity Planning
- User & Package Management
- Networking Policies
- Auditing
- Backups & Disaster Recovery
Troubleshooting
- Debugging
- Services
- System
- Access?
- Tracing
- Chaos Engineering
DAY 2 OPERATIONS
Troubleshooting
● Services: typically specific to service, use logging (for
example, dcos task log) and dcos node ssh for
per-node investigations
● dcos task exec
○ Permissions?
● System:
○ Simple diagnostics via dcos node diagnostics
○ Comprehensive dump via clump
THANK YOU!
ANY QUESTIONS?
@dcos
users@dcos.io
/groups/8295652
/dcos
/dcos/examples
/dcos/demos
chat.dcos.io
Failures
[Diagram: a framework scheduler; three masters (LEADER, STANDBY, STANDBY); three ZK nodes; two agents, each running an executor with a task]
Distributed Systems could be so easy...
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.
*) https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
Questions?
Code: https://git.io/vXUoy
http://grnh.se/ie76ru
Monitoring
METRICS
Measurements captured to determine the health and performance of the cluster
- How utilized is the cluster?
- Are resources being optimally used?
- Is the system performing better or worse over time?
- Are there bottlenecks in the system?
- What is the response time of applications?
DC/OS METRIC SOURCES
● Mesos metrics
○ Resource, frameworks, masters, agents,
tasks, system, events
● Container Metrics
○ CPU, mem, disk, network
● Application Metrics
○ QPS, latency, response time, hits, active
users, errors
[Diagram: apps in containers, running on Mesos, running on the OS]
UPGRADE PROCEDURE
Before upgrading
1. Make sure cluster is healthy!
2. Perform backup
a. ZK
b. Replicated logs
c. other state
3. Review release notes
4. Generate install bundle
a. Validate versions
[Diagram: framework scheduler, masters (LEADER, STANDBY, STANDBY), three ZK nodes, and agents running executors and tasks]
MESOS MASTER METRICS
● Metrics for the master node are available at the following URL:
○ http://<mesos-master-ip>/mesos/master/metrics/snapshot
○ The response is a JSON object that contains metrics names and values as
key-value pairs.
● Metric Groups:
○ Resources
○ Master
○ System
○ Slaves
○ Frameworks
○ Tasks
○ Messages
○ Event Queue
○ Registrar
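A minimal sketch of reading that snapshot, assuming the endpoint returns a flat JSON object of metric names and numeric values as described above. The helper names (`snapshot_url`, `parse_snapshot`, `fetch_snapshot`) and the sample payload are illustrative assumptions, not part of Mesos or DC/OS.

```python
import json
from urllib.request import urlopen


def snapshot_url(master_ip):
    """Build the metrics URL from the slide for a given master address."""
    return f"http://{master_ip}/mesos/master/metrics/snapshot"


def parse_snapshot(body):
    """Decode the JSON body into a dict of metric name -> float value."""
    metrics = json.loads(body)
    return {name: float(value) for name, value in metrics.items()}


def fetch_snapshot(master_ip, timeout=5.0):
    """Fetch and parse the snapshot; needs network access to a real master."""
    with urlopen(snapshot_url(master_ip), timeout=timeout) as response:
        return parse_snapshot(response.read().decode("utf-8"))


# Hypothetical sample payload, used here instead of a live cluster.
sample_body = '{"master/uptime_secs": 4242.0, "master/elected": 1, "master/tasks_lost": 0}'
sample = parse_snapshot(sample_body)
print(sample["master/uptime_secs"])
```

Against a real cluster, `fetch_snapshot("<mesos-master-ip>")` would replace the sample payload; authentication and TLS are left out of this sketch.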
MESOS MASTER BASIC ALERTS
Metric | Value | Inference
master/uptime_secs | is low | The master has restarted
master/uptime_secs | < 60 for sustained periods of time | The cluster has a flapping master node
master/tasks_lost | is increasing rapidly | Tasks in the cluster are disappearing. Possible causes include hardware failures, bugs in one of the frameworks, or bugs in Mesos
master/slaves_active | is low | Slaves are having trouble connecting to the master
master/cpus_percent | > 0.9 for sustained periods of time | DC/OS cluster CPU utilization is close to capacity
master/mem_percent | > 0.9 for sustained periods of time | DC/OS cluster memory utilization is close to capacity
master/disk_used & master/disk_percent | | DC/OS disk space consumed by reservations
master/elected | is 0 for sustained periods of time | No master is currently elected
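The alert rules above can be expressed as simple threshold checks over a window of metric snapshots. This is a sketch under stated assumptions: "sustained" is approximated as "true in every snapshot of the window", the `low_slaves` threshold is a made-up parameter, and the function name `check_master` and the sample window are hypothetical.

```python
def check_master(samples, low_slaves=1):
    """Return alert messages for a time-ordered list of metric snapshots.

    Each snapshot is a dict of metric name -> value. "Sustained" is
    approximated as "true in every snapshot of the window" (an assumption).
    """
    if not samples:
        return []

    def sustained(predicate):
        return all(predicate(s) for s in samples)

    alerts = []
    latest = samples[-1]
    if sustained(lambda s: s["master/uptime_secs"] < 60):
        alerts.append("flapping master node")
    if sustained(lambda s: s["master/cpus_percent"] > 0.9):
        alerts.append("CPU utilization close to capacity")
    if sustained(lambda s: s["master/mem_percent"] > 0.9):
        alerts.append("memory utilization close to capacity")
    if sustained(lambda s: s["master/elected"] == 0):
        alerts.append("no master elected")
    if latest["master/slaves_active"] < low_slaves:
        alerts.append("agents having trouble connecting")
    return alerts


# Hypothetical two-sample window with high CPU utilization.
window = [
    {"master/uptime_secs": 7200, "master/cpus_percent": 0.95,
     "master/mem_percent": 0.50, "master/elected": 1, "master/slaves_active": 3},
    {"master/uptime_secs": 7260, "master/cpus_percent": 0.97,
     "master/mem_percent": 0.55, "master/elected": 1, "master/slaves_active": 3},
]
print(check_master(window))
```

In practice these checks would usually live in the alerting tool rather than in ad hoc code; the sketch only shows how the table maps to conditions.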
Operations
UPGRADES
UPGRADE PROCEDURE
1. Master rolling upgrade
a. Start with standby
b. Install new DC/OS
2. Agent rolling upgrade
a. Uninstall DC/OS
b. Install new DC/OS
3. Framework upgrades
a. Orthogonal to DC/OS
b. Ensure changes don’t affect existing apps
[Diagram: framework scheduler, masters (LEADER, STANDBY, STANDBY), three ZK nodes, and agents running executors and tasks]
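The ordering in the upgrade procedure above (standby masters before the leader, then agents, one node at a time) can be sketched as follows. The function names, the role strings, and the `upgrade_node` callback are hypothetical; this is not the DC/OS upgrade tooling, and framework upgrades are left out because the slides treat them as orthogonal.

```python
def master_upgrade_order(masters):
    """Order masters so that standbys are upgraded before the leader.

    `masters` maps a host name to its role, "leader" or "standby".
    """
    standbys = sorted(h for h, role in masters.items() if role == "standby")
    leaders = sorted(h for h, role in masters.items() if role == "leader")
    return standbys + leaders


def rolling_upgrade(masters, agents, upgrade_node):
    """Upgrade one node at a time: masters first, then agents.

    `upgrade_node` is a caller-supplied callable (hypothetical) that
    upgrades a single host and returns True once the host is healthy again.
    """
    done = []
    for host in master_upgrade_order(masters) + sorted(agents):
        if not upgrade_node(host):
            raise RuntimeError(f"upgrade of {host} failed; stopping rollout")
        done.append(host)
    return done


# Hypothetical cluster: three masters and two agents; every upgrade succeeds.
masters = {"m1": "leader", "m2": "standby", "m3": "standby"}
order = rolling_upgrade(masters, ["a1", "a2"], lambda host: True)
print(order)
```

Stopping on the first failed node keeps the rest of the cluster on the old version, which is the point of upgrading one node at a time.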
Failure Handling
MESOS TASK FAILURE
[Diagram sequence: ZK, Master, Marathon, Client, three Agents, Executor, Task]
1. A task runs in an executor on one of the agents.
2. The task dies (SEGFAULT :( ) and a Status Update is passed from the agent to the master and on to Marathon.
3. Marathon issues Launch Task, which is passed to the master, an agent, and an executor.
4. A Status Update for the new task is passed back from the executor through the agent and the master to Marathon.
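The scheduler side of this sequence (react to a failure status update by launching a replacement) can be sketched as a toy model. `ToyScheduler` and its methods are hypothetical and are not Marathon's implementation; `TASK_FAILED` and `TASK_LOST` are used as examples of failure states.

```python
FAILURE_STATES = {"TASK_FAILED", "TASK_LOST"}


class ToyScheduler:
    """Keeps a target number of instances running, Marathon-style."""

    def __init__(self, app, instances):
        self.app = app
        self.instances = instances
        self.running = set()
        self.next_id = 0
        self.launched = []  # log of launch requests sent to the master

    def launch_missing(self):
        """Launch tasks until the target instance count is reached."""
        while len(self.running) < self.instances:
            task_id = f"{self.app}.{self.next_id}"
            self.next_id += 1
            self.running.add(task_id)
            self.launched.append(task_id)

    def on_status_update(self, task_id, state):
        """Handle a status update relayed by the master."""
        if state in FAILURE_STATES:
            self.running.discard(task_id)
            self.launch_missing()


scheduler = ToyScheduler("web", instances=2)
scheduler.launch_missing()
scheduler.on_status_update("web.0", "TASK_FAILED")
print(sorted(scheduler.running))
```

A real scheduler would wait for a resource offer before launching and would track the new task until its own status update arrives; both are omitted here.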
Failure Handling
MESOS AGENT FAILURE
LOCAL AGENT FAILURE
[Diagram sequence: ZK, Master, Marathon, Client, three Agents, Executor, Task]
1. A task runs in an executor on one of the agents.
2. The agent on that host fails; the executor and task remain listed in every frame.
3. The agent sends Re-register to the master.
Failure Handling
MESOS HOST FAILURE
[Diagram sequence: ZK, Master, Marathon, Client, Agents, Executor, Task]
1. A task runs in an executor on one of three agents.
2. The host fails: its agent, executor, and task disappear, leaving two agents.
3. A Status Update reaches Marathon.
4. After a Resource Offer, Marathon issues Launch Task, which is passed to the master, a remaining agent, and an executor.
5. Status Updates for the new task are passed back to Marathon.
Failure Handling
MESOS MASTER FAILURE
MASTER FAILURE
[Diagram sequence: ZK, Master, Marathon, Client, three Agents, Executor, Task]
1. The leading master fails.
2. A new Leading Master is announced to the other components.
3. Marathon and the agents send Reregister to the new leading master.
4. The master answers each of them with Reregistered.
Failure Handling
SCHEDULER FAILURE
SCHEDULER FAILURE
[Diagram sequence: ZK, Master, Marathon, Client, three Agents, Executor, Task]
1. The scheduler (Marathon) fails.
2. The scheduler recovers its Framework ID and finds the Leading Master.
3. The scheduler sends Reregister to the master and receives Reregistered.
4. The scheduler sends Reconcile Tasks and receives Status Updates.
UPGRADE PROCEDURE
1. Master rolling upgrade
a. Start with standby
b. Uninstall DC/OS
c. Install new DC/OS
2. Agent rolling upgrade
a. Uninstall DC/OS
b. Install new DC/OS
3. Framework upgrades
a. Orthogonal to DC/OS
b. Ensure changes don’t affect existing apps
[Diagram: framework scheduler, masters (LEADER, STANDBY, STANDBY), three ZK nodes, and agents running executors and tasks]
FUTURES (TBD)
Leverage maintenance primitives in Mesos to
drain host
Upgrade management through DC/OS to
perform rolling upgrades
MONITORING
MONITORING CONCEPT
MONITORING
TOOLING
EXAMPLES
● local scraping:
a. collectd
b. cAdvisor*
● event router:
a. fluentd
b. Flume
c. Kafka*
d. logstash*
e. Riemann
*) available via Mesosphere Universe
MONITORING
TOOLING
EXAMPLES
● storage:
a. Elasticsearch*
b. Graphite
c. InfluxDB*
d. KairosDB/Cassandra*
e. OpenTSDB/HBase
f. others, such as a local filesystem, Ceph FS*, HDFS*, etc.
*) available via Mesosphere Universe
MONITORING
TOOLING
EXAMPLES
● dashboard:
a. D3
b. Grafana*
c. SignalFx
● alerting:
a. BigPanda
b. PagerDuty
c. SignalFx
d. VictorOps
*) available via Mesosphere Universe
MONITORING
TOOLING
EXAMPLES
(INTEGRATED)
● Amazon CloudWatch
● AppDynamics
● Azure Monitor
● Circonus
● DataDog*
● dcos/metrics
● Ganglia
● Google Stackdriver
● Hawkular
● Icinga
● Librato
● Nagios
● New Relic
● OpsGenie
● Pingdom
● Prometheus
● Ruxit Dynatrace*
● Sensu
● Sysdig*
● Zabbix
*) available via Mesosphere Universe
LOGGING
LOGGING SCOPES
LOGGING
TOOLING
EXAMPLES
(PRIMITIVES)
● DC/OS logging overview
● Docker logging drivers
● systemd's journalctl
LOGGING
TOOLING
EXAMPLES
(INTEGRATED)
● Centralized app logging with fluentd
● DC/OS
a. ELK stack log shipping
b. Splunk
● Graylog
● Loggly
● Papertrail
● Sumo Logic
MAINTENANCE
Overview
Upgrades
● How to install a new version of X?
Sizing
● When to scale what (service-level vs. nodes)?
User and package management
● Who gets to access/install which services in what way?
Networking
● What services can talk to each other and in which way?
Auditing
● Who accessed what, when and how?
Disaster Recovery
● How is the continuous operation of the cluster and the services accomplished? What happens when the cluster (or critical infra components like ZK) goes down?
OTHER
TROUBLESHOOTING
TECHNIQUES
● Tracing
○ Idea: identify latency issues and perform
root-cause analysis in a distributed setup
○ OpenTracing
● Chaos Engineering
○ Idea: proactively break (parts of) the system to
understand how it reacts
○ Chaos Monkey
○ DRAX
ARCHITECTURE
MESOS FUNDAMENTALS
● Agents advertise resources to Master
● Master offers resources to Framework
● Framework rejects/uses resources
● Agents report task status to Master
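The four steps above can be sketched as a toy simulation of one offer round. All class and method names (`Agent`, `TaskRequest`, `Framework`, `Master`, `offer_round`) are hypothetical and far simpler than the real Mesos API; in particular, the framework here takes the first offer that fits, which is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    cpus: float
    mem: int  # MB
    tasks: dict = field(default_factory=dict)  # task name -> status


@dataclass
class TaskRequest:
    name: str
    cpus: float
    mem: int


class Framework:
    """Toy scheduler: uses an offer if it fits the next pending task."""

    def __init__(self, pending):
        self.pending = list(pending)
        self.statuses = {}

    def on_offer(self, agent_name, cpus, mem):
        if self.pending and self.pending[0].cpus <= cpus and self.pending[0].mem <= mem:
            return self.pending.pop(0)  # use the offer
        return None  # reject the offer

    def on_status(self, task_name, status):
        self.statuses[task_name] = status


class Master:
    def __init__(self, agents):
        # Step 1: agents advertise their resources to the master.
        self.agents = {a.name: a for a in agents}

    def offer_round(self, framework):
        for agent in self.agents.values():
            # Step 2: the master offers an agent's free resources.
            task = framework.on_offer(agent.name, agent.cpus, agent.mem)
            if task is None:
                continue  # Step 3a: the framework rejected the offer.
            # Step 3b: the framework used the offer; launch on that agent.
            agent.cpus -= task.cpus
            agent.mem -= task.mem
            agent.tasks[task.name] = "TASK_RUNNING"
            # Step 4: the agent reports task status via the master.
            framework.on_status(task.name, agent.tasks[task.name])


agents = [Agent("agent-1", 1.0, 1024), Agent("agent-2", 4.0, 8192)]
framework = Framework([TaskRequest("web", 2.0, 2048)])
master = Master(agents)
master.offer_round(framework)
print(framework.statuses)
```

The offer from `agent-1` is too small and is rejected, so the task lands on `agent-2`. Leaving the accept-or-reject decision with the framework is what allows different schedulers to share one cluster.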
Questions?
Code: https://git.io/vXUoy
Psssssssst … we are hiring!
http://grnh.se/ie76ru
CONTAINER SCHEDULING
RESOURCE MANAGEMENT
SERVICE MANAGEMENT
SERVICE-ORIENTED ARCHITECTURE
- Separation of concerns
- Optimization of bottlenecks
- Smaller teams
- API Contracts
- Data replication
- Complicated provisioning
- Dependency management
[Diagram: web apps and services, each on its own operating system and hardware]
THE BIRTH OF MESOS
Spring 2009: CS262B. Ben Hindman, Andy Konwinski and Matei Zaharia create “Nexus” as their CS262B class project.
March 2010: TWITTER TECH TALK. The grad students working on Mesos give a tech talk at Twitter.
September 2010: MESOS PUBLISHED. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center is published as a technical report.
December 2010: APACHE INCUBATION. Mesos enters the Apache Incubator.
April 2016: DC/OS
Monitoring
- Collecting metrics
- Routing events
- Downstream processing
○ Alerting
○ Dashboards
○ Storage (long-term retention)
Logging
- Scopes
- Local vs. Central
- Security considerations
Maintenance
- Cluster Upgrades
- Cluster Resizing
- Capacity Planning
- User & Package Management
- Networking Policies
- Auditing
- Backups & Disaster Recovery
Troubleshooting
- Debugging
○ Services
○ System
- Tracing
- Chaos Engineering

Mais conteúdo relacionado

Mais procurados

AWS Webcast - Build Agile Applications in AWS Cloud for Government
AWS Webcast - Build Agile Applications in AWS Cloud for GovernmentAWS Webcast - Build Agile Applications in AWS Cloud for Government
AWS Webcast - Build Agile Applications in AWS Cloud for GovernmentAmazon Web Services
 
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019Icinga
 
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...Vadym Kazulkin
 
Current State of Icinga - Icinga Camp Zurich 2019
Current State of Icinga - Icinga Camp Zurich 2019Current State of Icinga - Icinga Camp Zurich 2019
Current State of Icinga - Icinga Camp Zurich 2019Icinga
 
Trash Talk! How to Reduce Downtime by Tuning Garbage Collection
Trash Talk! How to Reduce Downtime by Tuning Garbage CollectionTrash Talk! How to Reduce Downtime by Tuning Garbage Collection
Trash Talk! How to Reduce Downtime by Tuning Garbage CollectionAtlassian
 
Introduction and Overview of OpenStack for IaaS
Introduction and Overview of OpenStack for IaaSIntroduction and Overview of OpenStack for IaaS
Introduction and Overview of OpenStack for IaaSKeith Basil
 
AWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsAWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsMongoDB
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...NETWAYS
 
Deploy Microsoft Azure Data Solutions
Deploy Microsoft Azure Data SolutionsDeploy Microsoft Azure Data Solutions
Deploy Microsoft Azure Data SolutionsMarco Parenzan
 
Episode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OSEpisode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OSMesosphere Inc.
 
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...Docker, Inc.
 
On-demand Continuous Integration with Jenkins, jclouds, and CloudStack
On-demand Continuous Integration with Jenkins, jclouds, and CloudStackOn-demand Continuous Integration with Jenkins, jclouds, and CloudStack
On-demand Continuous Integration with Jenkins, jclouds, and CloudStackke4qqq
 
MongoDB Management & Ansible
MongoDB Management & AnsibleMongoDB Management & Ansible
MongoDB Management & AnsibleMongoDB
 
Cloud Foundry Summit 2015: Building a Robust Cloud Foundry (HA, Security and DR)
Cloud Foundry Summit 2015: Building a Robust Cloud Foundry (HA, Security and DR)Cloud Foundry Summit 2015: Building a Robust Cloud Foundry (HA, Security and DR)
Cloud Foundry Summit 2015: Building a Robust Cloud Foundry (HA, Security and DR)VMware Tanzu
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
Take an Analytics-driven Approach to Container Performance with Splunk for Co...
Take an Analytics-driven Approach to Container Performance with Splunk for Co...Take an Analytics-driven Approach to Container Performance with Splunk for Co...
Take an Analytics-driven Approach to Container Performance with Splunk for Co...Docker, Inc.
 
Securing an Azure full-PaaS architecture - Data saturday #0001 Pordenone
Securing an Azure full-PaaS architecture - Data saturday #0001 PordenoneSecuring an Azure full-PaaS architecture - Data saturday #0001 Pordenone
Securing an Azure full-PaaS architecture - Data saturday #0001 PordenoneMarco Obinu
 
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardUsing Containers and HPC to Solve the Mysteries of the Universe by Deborah Bard
Using Containers and HPC to Solve the Mysteries of the Universe by Deborah BardDocker, Inc.
 
Terraform for azure: the good, the bad and the ugly -
Terraform for azure: the good, the bad and the ugly - Terraform for azure: the good, the bad and the ugly -
Terraform for azure: the good, the bad and the ugly - Giulio Vian
 

Mais procurados (20)

AWS Webcast - Build Agile Applications in AWS Cloud for Government
AWS Webcast - Build Agile Applications in AWS Cloud for GovernmentAWS Webcast - Build Agile Applications in AWS Cloud for Government
AWS Webcast - Build Agile Applications in AWS Cloud for Government
 
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019
Icinga Director and vSphereDB - how they play together - Icinga Camp Zurich 2019
 
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
Measure and Increase Developer Productivity with Help of Serverless at JCON 2...
 
Current State of Icinga - Icinga Camp Zurich 2019
Current State of Icinga - Icinga Camp Zurich 2019Current State of Icinga - Icinga Camp Zurich 2019
Current State of Icinga - Icinga Camp Zurich 2019
 
Trash Talk! How to Reduce Downtime by Tuning Garbage Collection
Trash Talk! How to Reduce Downtime by Tuning Garbage CollectionTrash Talk! How to Reduce Downtime by Tuning Garbage Collection
Trash Talk! How to Reduce Downtime by Tuning Garbage Collection
 
Introduction and Overview of OpenStack for IaaS
Introduction and Overview of OpenStack for IaaSIntroduction and Overview of OpenStack for IaaS
Introduction and Overview of OpenStack for IaaS
 
AWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and ResultsAWS to Bare Metal: Motivation, Pitfalls, and Results
AWS to Bare Metal: Motivation, Pitfalls, and Results
 
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
OSDC 2018 | The operational brain: how new Paradigms like Machine Learning ar...
 
Deploy Microsoft Azure Data Solutions
Deploy Microsoft Azure Data SolutionsDeploy Microsoft Azure Data Solutions
Deploy Microsoft Azure Data Solutions
 
Episode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OSEpisode 4: Operating Kubernetes at Scale with DC/OS
Episode 4: Operating Kubernetes at Scale with DC/OS
 
Docker for Ops: Operationalize your Docker Built Apps in Production by Evan H...
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Último

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Último (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 

Downtime is not an option - day 2 operations - Jörg Schad

  • 1. Downtime is not an option: How Fast Data and Microservices change the datacenter. @joerg_schad @dcos
  • 2. © 2017 Mesosphere, Inc. All Rights Reserved. Jörg Schad, Distributed Systems Engineer, @joerg_schad
  • 3. In the beginning there was a big Monolith
  • 4. (image only)
  • 5. COMPUTERS [diagram: Hardware; Operating System; Application]
  • 6. OVERVIEW. Microservice (noun | ˈmīkrō/ /ˈsərvəs/): an approach to application development in which a large application is built as a suite of modular services. Each module supports a specific business goal and uses a simple, well-defined interface to communicate with other modules.* Microservices are designed to be flexible, resilient, efficient, robust, and individually scalable. *From whatis.com
  • 7. MICROSERVICES: Polyglot; Single Responsibility; Smaller Teams; Utilization; Machine types/groups; Dependency hell. [diagram: Apps and Services, each on its own Operating System and Machine, on top of the Machine Infrastructure]
  • 8. Run everything in containers! (photo © Gerard Julien/AFP)
  • 9. CONTAINERS: Rapid deployment; Dependency vendoring; Container image repositories; Spreadsheet scheduling. [diagram: Apps and Services on a Container Runtime, OS and Machine, on top of the Machine Infrastructure]
  • 10. CONTAINER ORCHESTRATION: CONTAINER SCHEDULING; RESOURCE MANAGEMENT; SERVICE MANAGEMENT (Load Balancing, Readiness Checking)
  • 11. CONTAINER ORCHESTRATION
      ● CONTAINER SCHEDULING: Placement; Replication/Scaling; Resurrection; Rescheduling; Rolling Deployment; Upgrades; Downgrades; Collocation
      ● RESOURCE MANAGEMENT: Memory; CPU; GPU; Volumes; Ports; IPs; Images/Artifacts
      ● SERVICE MANAGEMENT: Labels; Groups/Namespaces; Dependencies; Load Balancing; Readiness Checking
  • 12. CONTAINER ORCHESTRATION [diagram labels: Web Apps & Services; Orchestration (Scheduling, Resource Management, Service Management); Container Runtime; Machine & OS; Machine Infrastructure]
  • 13. Meanwhile... MapReduce is crunching data
  • 14. But then business demanded FAST DATA. We need to turn faster!
  • 15. Fast Data. Processing styles: Batch; Micro-Batch; Event Processing. Time scale: Days; Hours; Minutes; Seconds; Microseconds. Batch reports what has happened using descriptive analytics; fast data solves problems using predictive and prescriptive analytics. Examples: Billing, Chargeback; Product recommendations; Real-time Pricing and Routing; Real-time Advertising; Predictive User Interface
  • 16. The SMACK Stack
      ● EVENTS: ubiquitous data streams from connected devices (Sensors, Devices, Clients)
      ● INGEST: Apache Kafka. Ingest millions of events per second
      ● STORE: Apache Cassandra. Distributed & highly scalable database
      ● ANALYZE: Apache Spark. Real-time and batch process data
      ● ACT: Akka. Visualize data and build data-driven applications
      ● All running on Mesos / DC/OS
  • 17. Datacenter
  • 18. NAIVE APPROACH. Typical datacenter: siloed, over-provisioned servers, low utilization. Industry average: 12-15% utilization. [diagram: separate silos for MySQL, microservice, Cassandra, Spark/Hadoop, Kafka]
  • 19. (image only)
  • 20. MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS. Typical datacenter: siloed, over-provisioned servers, low utilization. Mesos / DC/OS: automated schedulers, workload multiplexing onto the same machines. [diagram: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka]
  • 21. Apache Mesos
      ● A top-level Apache project
      ● A cluster resource negotiator
      ● Scalable to 10,000s of nodes
      ● Fault-tolerant, battle-tested
      ● An SDK for distributed apps
      ● Native Docker support
  • 22. (image only)
  • 23. DC/OS ENABLES MODERN DISTRIBUTED APPS [diagram labels: Big Data + Analytics Engines; Microservices (in containers); Streaming; Batch; Machine Learning; Analytics; Functions & Logic; Search; Time Series; SQL / NoSQL; Databases; Modern App Components; Datacenter Operating System (DC/OS); Distributed Systems Kernel (Mesos); Any Infrastructure (Physical, Virtual, Cloud)]
  • 24. THE BASICS. DC/OS is …
      ● 100% open source (ASL2.0) + A big, diverse community
      ● An umbrella for ~30 OSS projects + Roadmap and designs + Docs and tutorials
      ● Not limited in any way
      ● Familiar, with more features + Networking, Security, CLI, UI, Service Discovery, Load Balancing, Packages, ...
  • 25. Container Options. Enhancements to the Mesos Containerizer to support launching specific container formats (Docker, AppC, OCI (future), etc.)
      ● Reduces the need to maintain and update multiple containerizers
      ● Supports multiple container formats with a single containerizer
      ● An image provisioner component was added to the Mesos containerizer; it is responsible for pulling, caching, and preparing container root filesystems
      [diagram: Universal containerizer = Launcher (process management) + Isolators (container lifecycle hook) + Provisioner (container image support)]
  • 26. DEMO
  • 27. GEO-ENABLED IoT
  • 28. DATA FLOW
  • 29. Keep it running!
  • 30. DAY 2 OPERATIONS
      ● Monitoring: Collecting metrics; Routing events; Downstream processing; Alerting; Dashboards; Storage (long-term retention)
      ● Logging: Scopes; Local vs. Central; Security considerations
  • 31. DAY 2 OPERATIONS
      ● Maintenance: Cluster Upgrades; Cluster Resizing; Capacity Planning; User & Package Management; Networking Policies; Auditing; Backups & Disaster Recovery
      ● Troubleshooting: Debugging; Services; System; Access?; Tracing; Chaos Engineering
  • 32. Troubleshooting
      ● Services: typically specific to the service; use logging (for example, dcos task log) and dcos node ssh for per-node investigations
      ● dcos task exec
          ○ Permissions?
      ● System:
          ○ Simple diagnostics via dcos node diagnostics
          ○ Comprehensive dump via clump
  • 33. THANK YOU! ANY QUESTIONS? @dcos; users@dcos.io; /groups/8295652; /dcos; /dcos/examples; /dcos/demos; chat.dcos.io
  • 34. Failures [diagram: Framework Scheduler; Masters (LEADER, STANDBY, STANDBY); ZK, ZK, ZK; two Agents, each with an Executor and a Task]
  • 35. Distributed Systems could be so easy... if only these assumptions held:*
      1. The network is reliable.
      2. Latency is zero.
      3. Bandwidth is infinite.
      4. The network is secure.
      5. Topology doesn't change.
      6. There is one administrator.
      7. Transport cost is zero.
      8. The network is homogeneous.
      *) https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
  • 36. Questions? Code: https://git.io/vXUoy http://grnh.se/ie76ru
  • 37. Monitoring
  • 38. METRICS. Measurements captured to determine the health and performance of the cluster:
      ● How utilized is the cluster?
      ● Are resources being optimally used?
      ● Is the system performing better or worse over time?
      ● Are there bottlenecks in the system?
      ● What is the response time of applications?
  • 39. DC/OS METRIC SOURCES
      ● Mesos metrics: resources, frameworks, masters, agents, tasks, system, events
      ● Container metrics: CPU, mem, disk, network
      ● Application metrics: QPS, latency, response time, hits, active users, errors
      [diagram: Apps inside Containers, on Mesos, on the OS]
  • 40. UPGRADE PROCEDURE. Before upgrading:
      1. Make sure the cluster is healthy!
      2. Perform a backup: (a) ZK; (b) replicated logs; (c) other state
      3. Review the release notes
      4. Generate the install bundle: (a) validate versions
      [diagram: Framework Scheduler; Masters (LEADER, STANDBY, STANDBY); ZK, ZK, ZK; two Agents, each with an Executor and a Task]
  • 41. MESOS MASTER METRICS
      ● Metrics for the master node are available at the following URL:
          ○ http://<mesos-master-ip>/mesos/master/metrics/snapshot
          ○ The response is a JSON object that contains metric names and values as key-value pairs.
      ● Metric groups: Resources; Master; System; Slaves; Frameworks; Tasks; Messages; Event Queue; Registrar
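Since the snapshot is described as a flat JSON object of metric names and values, a consumer might start by grouping keys on their prefix. The following is only a minimal sketch: the sample payload is made up for illustration (a real snapshot has many more keys), and `group_metrics` is a hypothetical helper, not part of Mesos or DC/OS.

```python
import json
from collections import defaultdict

# Hypothetical sample payload for illustration; a real snapshot is much larger.
SAMPLE_SNAPSHOT = json.loads("""
{
  "master/uptime_secs": 86400.0,
  "master/elected": 1.0,
  "master/cpus_percent": 0.42,
  "master/mem_percent": 0.61,
  "master/slaves_active": 5.0,
  "system/load_1min": 0.8,
  "registrar/queued_operations": 0.0
}
""")


def group_metrics(snapshot):
    """Group a flat {"group/name": value} snapshot by the prefix before the first '/'."""
    groups = defaultdict(dict)
    for key, value in snapshot.items():
        group, _, name = key.partition("/")
        groups[group][name] = value
    return dict(groups)


if __name__ == "__main__":
    for group, metrics in sorted(group_metrics(SAMPLE_SNAPSHOT).items()):
        print(group, metrics)
```

In practice the dictionary would come from an HTTP GET against the snapshot URL above rather than from an inline sample.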
  • 42. MESOS MASTER BASIC ALERTS (metric and value: inference)
      ● master/uptime_secs is low: the master has restarted
      ● master/uptime_secs < 60 for sustained periods of time: the cluster has a flapping master node
      ● master/tasks_lost is increasing rapidly: tasks in the cluster are disappearing. Possible causes include hardware failures, bugs in one of the frameworks, or bugs in Mesos
      ● master/slaves_active is low: slaves are having trouble connecting to the master
      ● master/cpus_percent > 0.9 for sustained periods of time: DC/OS cluster CPU utilization is close to capacity
      ● master/mem_percent > 0.9 for sustained periods of time: DC/OS cluster memory utilization is close to capacity
      ● master/disk_used and master/disk_percent: DC/OS disk space consumed by reservations
      ● master/elected is 0 for sustained periods of time: no master is currently elected
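The alert rules above can be sketched as a small check over one metrics snapshot. This is an illustration under stated assumptions: `evaluate_alerts` and the `min_active_agents` parameter are hypothetical, the thresholds are the ones from the slide, and the "sustained periods of time" and "increasing rapidly" conditions are not modelled because they need a series of snapshots rather than a single one.

```python
def evaluate_alerts(snapshot, min_active_agents=1):
    """Return alert messages for a single master metrics snapshot.

    Only instantaneous thresholds are checked; a real alerting setup would
    look at several consecutive snapshots before firing.
    """
    alerts = []
    if snapshot.get("master/elected", 0) == 0:
        alerts.append("no master is currently elected")
    if snapshot.get("master/uptime_secs", float("inf")) < 60:
        alerts.append("the master has restarted recently")
    if snapshot.get("master/cpus_percent", 0) > 0.9:
        alerts.append("cluster CPU utilization is close to capacity")
    if snapshot.get("master/mem_percent", 0) > 0.9:
        alerts.append("cluster memory utilization is close to capacity")
    if snapshot.get("master/slaves_active", min_active_agents) < min_active_agents:
        alerts.append("agents are having trouble connecting to the master")
    return alerts


if __name__ == "__main__":
    stressed = {
        "master/elected": 1,
        "master/uptime_secs": 12,
        "master/cpus_percent": 0.95,
        "master/mem_percent": 0.4,
        "master/slaves_active": 3,
    }
    for message in evaluate_alerts(stressed, min_active_agents=3):
        print(message)
```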
  • 43. Operations: UPGRADES
  • 44. UPGRADE PROCEDURE
      1. Master rolling upgrade: (a) start with a standby; (b) install new DC/OS
      2. Agent rolling upgrade
      3. Framework upgrades
      [diagram: Framework Scheduler; Masters (LEADER, STANDBY, STANDBY); ZK, ZK, ZK; two Agents, each with an Executor and a Task]
  • 45. UPGRADE PROCEDURE
      1. Master rolling upgrade
      2. Agent rolling upgrade: (a) uninstall DC/OS; (b) install new DC/OS
      3. Framework upgrades
  • 46. UPGRADE PROCEDURE
      1. Master rolling upgrade
      2. Agent rolling upgrade
      3. Framework upgrades: (a) orthogonal to DC/OS; (b) ensure changes don’t affect existing apps
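The ordering of the upgrade procedure can be sketched as a plan builder. This is a sketch only: the function names and node names are hypothetical, and upgrading the leading master last is an assumption derived from "start with a standby", not something the slides state explicitly.

```python
def master_upgrade_order(masters, leader):
    """Order masters so that standbys are upgraded before the current leader."""
    if leader not in masters:
        raise ValueError("leader must be one of the masters")
    standbys = [m for m in masters if m != leader]
    return standbys + [leader]


def upgrade_plan(masters, leader, agents, frameworks):
    """Build the full rolling-upgrade plan: masters, then agents, then frameworks."""
    plan = [("master", m) for m in master_upgrade_order(masters, leader)]
    plan += [("agent", a) for a in agents]
    plan += [("framework", f) for f in frameworks]
    return plan


if __name__ == "__main__":
    steps = upgrade_plan(
        masters=["m1", "m2", "m3"],
        leader="m2",
        agents=["a1", "a2"],
        frameworks=["marathon"],
    )
    for kind, name in steps:
        print(f"upgrade {kind} {name}")
```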
  • 47. Failure Handling
  • 48. Failure Handling: MESOS TASK FAILURE
  • 49. MESOS TASK FAILURE [diagram: Client; Marathon; Master; ZK; three Agents; Executor; Task]
  • 50. MESOS TASK FAILURE [diagram: the Task segfaults; messages: Status Update, Status Update]
  • 51. MESOS TASK FAILURE [diagram: messages: Launch Task, Launch Task, Launch Task; a new Executor and Task appear]
  • 52. MESOS TASK FAILURE [diagram: messages: Status Update, Status Update, Status Update]
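The task-failure sequence (status update, relaunch, status update) can be sketched from the scheduler's point of view. `ToyScheduler` is a hypothetical stand-in, not Marathon's or Mesos's API; only the task state names (TASK_RUNNING, TASK_FAILED, TASK_LOST, TASK_ERROR) are taken from Mesos.

```python
# Task states that this sketch treats as "the task needs to be launched again".
FAILURE_STATES = {"TASK_FAILED", "TASK_LOST", "TASK_ERROR"}


class ToyScheduler:
    """Minimal scheduler that relaunches a task when a failure status arrives."""

    def __init__(self, launch):
        self._launch = launch      # callable(task_name), stands in for "Launch Task"
        self.running = set()       # names of tasks believed to be running
        self.relaunches = 0

    def status_update(self, task_name, state):
        """React to a status update forwarded by the master."""
        if state == "TASK_RUNNING":
            self.running.add(task_name)
        elif state in FAILURE_STATES:
            self.running.discard(task_name)
            self.relaunches += 1
            self._launch(task_name)


if __name__ == "__main__":
    scheduler = ToyScheduler(launch=lambda name: print("launching", name))
    scheduler.status_update("web", "TASK_RUNNING")
    scheduler.status_update("web", "TASK_FAILED")   # e.g. after a segfault
    scheduler.status_update("web", "TASK_RUNNING")  # the replacement is up
    print("running:", sorted(scheduler.running))
```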
  • 53. Failure Handling: MESOS AGENT FAILURE
  • 54-57. LOCAL AGENT FAILURE [diagram frames: Client; Marathon; Master; ZK; three Agents; Executor; Task]
  • 58. LOCAL AGENT FAILURE [diagram: message: Re-register]
  • 59. Failure Handling: MESOS HOST FAILURE
  • 60. LOCAL AGENT FAILURE [diagram: Client; Marathon; Master; ZK; three Agents; Executor; Task]
  • 61. LOCAL AGENT FAILURE [diagram: Client; Marathon; Master; ZK; two Agents remain]
  • 62. LOCAL AGENT FAILURE [diagram: two Agents remain; message: Status Update]
  • 63. MESOS TASK FAILURE [diagram: messages: Resource Offer; Launch Task, Launch Task, Launch Task; a new Executor and Task appear]
  • 64. MESOS TASK FAILURE [diagram: messages: Status Update, Status Update, Status Update]
  • 65. Failure Handling: MESOS MASTER FAILURE
  • 66-68. MASTER FAILURE [diagram frames: Client; Marathon; Master; ZK; three Agents; Executor; Task]
  • 69. MASTER FAILURE [diagram: messages: Leading Master, Leading Master, Leading Master]
  • 70. MASTER FAILURE [diagram: messages: Reregister, Reregister, Reregister, Reregister]
  • 71. MASTER FAILURE [diagram: messages: Reregistered, Reregistered, Reregistered, Reregistered]
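The master-failure frames (a new leading master is announced, then agents and the scheduler re-register) can be sketched as follows. The election rule used here, lowest sequence number among live masters wins, is an assumption modelled on ZooKeeper-style sequential nodes; the slides only show that ZK is involved. All names are hypothetical.

```python
def elect_leader(candidates):
    """Pick the live master with the lowest sequence number, or None if none is left.

    candidates maps master name -> sequence number (ZooKeeper-style).
    """
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


def fail_over(candidates, failed, clients):
    """Remove the failed master, elect a new leader, and re-register all clients.

    clients are the agents and framework schedulers that must re-register.
    Returns (new_leader, {client: leader_it_reregistered_with}).
    """
    remaining = {m: seq for m, seq in candidates.items() if m != failed}
    leader = elect_leader(remaining)
    reregistered = {c: leader for c in clients} if leader is not None else {}
    return leader, reregistered


if __name__ == "__main__":
    masters = {"m1": 0, "m2": 1, "m3": 2}
    leader, registrations = fail_over(
        masters, failed="m1", clients=["agent-1", "agent-2", "agent-3", "marathon"]
    )
    print("new leader:", leader)
    print(registrations)
```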
  • 72. Failure Handling: SCHEDULER FAILURE
  • 73-75. SCHEDULER FAILURE [diagram frames: Client; Marathon; Master; ZK; three Agents; Executor; Task]
  • 76. SCHEDULER FAILURE [diagram: messages: Framework ID; Leading Master]
  • 77. SCHEDULER FAILURE [diagram: message: Reregister]
  • 78. SCHEDULER FAILURE [diagram: message: Reregistered]
  • 79. SCHEDULER FAILURE [diagram: messages: Reconcile Tasks; Status Update]
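The last frame, where the recovered scheduler reconciles tasks and receives status updates, can be sketched as a comparison between what the scheduler remembers and what the master reports. `reconcile` is a hypothetical helper; treating a task the master does not report as TASK_LOST is a simplifying assumption of this sketch.

```python
def reconcile(known, reported):
    """Reconcile the scheduler's remembered tasks against the master's view.

    known:    {task_id: last state the scheduler persisted}
    reported: {task_id: state returned by the master as a status update}
    Returns (updated_states, task_ids_to_relaunch).
    """
    updated = {}
    relaunch = []
    for task_id in sorted(known):
        # A task the master does not know about is treated as lost here.
        state = reported.get(task_id, "TASK_LOST")
        updated[task_id] = state
        if state in ("TASK_LOST", "TASK_FAILED"):
            relaunch.append(task_id)
    return updated, relaunch


if __name__ == "__main__":
    remembered = {"web-1": "TASK_RUNNING", "web-2": "TASK_RUNNING", "db-1": "TASK_RUNNING"}
    from_master = {"web-1": "TASK_RUNNING", "db-1": "TASK_FAILED"}
    states, to_relaunch = reconcile(remembered, from_master)
    print(states)
    print("relaunch:", to_relaunch)
```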
  • 80. UPGRADE PROCEDURE
      1. Master rolling upgrade: (a) start with a standby; (b) uninstall DC/OS; (c) install new DC/OS
      2. Agent rolling upgrade
      3. Framework upgrades
      [diagram: Framework Scheduler; Masters (LEADER, STANDBY, STANDBY); ZK, ZK, ZK; two Agents, each with an Executor and a Task]
  • 81. UPGRADE PROCEDURE
      1. Master rolling upgrade
      2. Agent rolling upgrade: (a) uninstall DC/OS; (b) install new DC/OS
      3. Framework upgrades
  • 82. UPGRADE PROCEDURE
      1. Master rolling upgrade
      2. Agent rolling upgrade
      3. Framework upgrades: (a) orthogonal to DC/OS; (b) ensure changes don’t affect existing apps
  • 83. FUTURES (TBD)
      ● Leverage maintenance primitives in Mesos to drain a host
      ● Upgrade management through DC/OS to perform rolling upgrades
  • 84. DAY 2 OPERATIONS
      ● Monitoring: Collecting metrics; Routing events; Downstream processing; Alerting; Dashboards; Storage (long-term retention)
      ● Logging: Scopes; Local vs. Central; Security considerations
  • 85. DAY 2 OPERATIONS
      ● Maintenance: Cluster Upgrades; Cluster Resizing; Capacity Planning; User & Package Management; Networking Policies; Auditing; Backups & Disaster Recovery
      ● Troubleshooting: Debugging; Services; System; Tracing; Chaos Engineering
  • 86. © 2016 Mesosphere, Inc. All Rights Reserved. 86 MONITORING
  • 87. © 2017 Mesosphere, Inc. All Rights Reserved. 87 MONITORING CONCEPT
  • 88. © 2017 Mesosphere, Inc. All Rights Reserved. 88 MONITORING TOOLING EXAMPLES ● local scraping: a. collectd b. cAdvisor* ● event router: a. fluentd b. Flume c. Kafka* d. logstash* e. Riemann *) available via Mesosphere Universe
  • 89. © 2017 Mesosphere, Inc. All Rights Reserved. 89 MONITORING TOOLING EXAMPLES ● storage: a. Elasticsearch* b. Graphite c. InfluxDB* d. KairosDB/Cassandra* e. OpenTSDB/HBase f. others such a local filesystem, Ceph FS*, HDFS*, etc. *) available via Mesosphere Universe
  • 90. © 2017 Mesosphere, Inc. All Rights Reserved. 90 MONITORING TOOLING EXAMPLES ● dashboard: a. D3 b. Grafana* c. signal fx ● alerting: a. BigPanda b. PagerDuty c. signal fx d. VictorOps *) available via Mesosphere Universe
  • 91. © 2017 Mesosphere, Inc. All Rights Reserved. 91 MONITORING TOOLING EXAMPLES (INTEGRATED) ● Amazon CloudWatch ● AppDynamics ● Azure Monitor ● Circonus ● DataDog* ● dcos/metrics ● Ganglia ● Google Stackdriver ● Hawkular ● Icinga ● Librato ● Nagios ● New Relic ● OpsGenie ● Pingdom ● Prometheus ● Ruxit Dynatrace* ● Sensu ● Sysdig* ● Zabbix *) available via Mesosphere Universe
  • 92. © 2016 Mesosphere, Inc. All Rights Reserved. 92 LOGGING
  • 93. © 2017 Mesosphere, Inc. All Rights Reserved. 93 LOGGING SCOPES
  • 94. © 2017 Mesosphere, Inc. All Rights Reserved. 94 LOGGING TOOLING EXAMPLES (PRIMITIVES) ● DC/OS logging overview ● Docker logging drivers ● systemd's journalctl
  • 95. © 2017 Mesosphere, Inc. All Rights Reserved. 95 LOGGING TOOLING EXAMPLES (INTEGRATED) ● Centralized app logging with fluentd ● DC/OS a. ELK stack log shipping b. Splunk ● Graylog ● Loggly ● Papertrail ● Sumo Logic
  • 96. © 2016 Mesosphere, Inc. All Rights Reserved. 96 MAINTENANCE
  • 97. © 2017 Mesosphere, Inc. All Rights Reserved. 97 Overview ● How to install a new version of X? ● When to scale what (service-level vs. nodes) ● Who gets to access/install which services in what way? Upgrades Sizing User and package management ● What services can talk to each other and in which way? ● Who accessed what, when and how? ● How is the continuous operation of the cluster and the services accomplished? What happens when cluster (or critical infra components like ZK) go down? Networking Auditing Disaster Recovery
  • 98. OTHER TROUBLESHOOTING TECHNIQUES ● Tracing ○ Idea: identify latency issues and perform root-cause analysis in a distributed setup ○ OpenTracing ● Chaos Engineering ○ Idea: proactively break (parts of) the system to understand how it reacts ○ Chaos Monkey ○ DRAX
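
The tracing idea on this slide can be illustrated with a small, self-contained sketch. It is not the OpenTracing API: the `Span` class, the `self_times` and `slowest_service` helpers, and the sample trace are all hypothetical names invented for illustration. The sketch assumes each request is recorded as spans with parent links, and that child spans of the same parent run one after another rather than in parallel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One timed operation in a trace (hypothetical structure, not OpenTracing)."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self):
        return self.end_ms - self.start_ms


def self_times(spans):
    """Time spent in each span, excluding time spent in its direct children.

    Assumes sibling spans do not overlap in time.
    """
    result = {s.span_id: s.duration_ms for s in spans}
    for s in spans:
        if s.parent_id in result:
            result[s.parent_id] -= s.duration_ms
    return result


def slowest_service(spans):
    """Return (service, self time in ms) for the span with the largest self time."""
    times = self_times(spans)
    by_id = {s.span_id: s for s in spans}
    worst = max(times, key=times.get)
    return by_id[worst].service, times[worst]


# Sample trace: frontend calls auth, then orders; orders calls the database.
TRACE = [
    Span("a", None, "frontend", 0, 120),
    Span("b", "a", "auth", 5, 25),
    Span("c", "a", "orders", 30, 115),
    Span("d", "c", "database", 35, 110),
]

if __name__ == "__main__":
    print(slowest_service(TRACE))
```

Although the frontend span is the longest end to end, subtracting child time shows that most of the latency sits in the database call, which is the kind of root-cause question tracing is meant to answer.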
  • 100. ARCHITECTURE MESOS FUNDAMENTALS ● Agents advertise resources to Master ● Master offers resources to Framework ● Framework rejects/uses resources ● Agents report task status to Master
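
The four steps on this slide can be sketched as a toy simulation. This is not the Mesos API: `Agent`, `Task`, `Framework`, `Master`, and `run_demo` are hypothetical names, only CPUs are modeled, and the status string `TASK_RUNNING` is borrowed from Mesos purely as a label. The sketch also assumes a single framework and a first-fit acceptance policy.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    cpus: float  # free CPUs this agent advertises


@dataclass
class Task:
    name: str
    cpus: float  # CPUs the task needs


@dataclass
class Framework:
    """Hypothetical scheduler that launches pending tasks from offers."""
    pending: list

    def on_offer(self, agent_name, cpus):
        """Return the tasks to launch on this offer; an empty list rejects it."""
        accepted = []
        for task in list(self.pending):
            if task.cpus <= cpus:
                cpus -= task.cpus
                self.pending.remove(task)
                accepted.append(task)
        return accepted


class Master:
    def __init__(self):
        self.agents = {}
        self.status = {}

    def register(self, agent):
        # Step 1: the agent advertises its resources to the master.
        self.agents[agent.name] = agent

    def offer_round(self, framework):
        for agent in self.agents.values():
            # Step 2: the master offers the agent's free resources.
            # Step 3: the framework uses (part of) the offer or rejects it.
            for task in framework.on_offer(agent.name, agent.cpus):
                agent.cpus -= task.cpus
                # Step 4: the agent reports the task status to the master.
                self.status_update(agent.name, task.name, "TASK_RUNNING")

    def status_update(self, agent_name, task_name, state):
        self.status[task_name] = (agent_name, state)


def run_demo():
    master = Master()
    master.register(Agent("agent-1", cpus=2.0))
    master.register(Agent("agent-2", cpus=4.0))
    framework = Framework(
        pending=[Task("web", 1.5), Task("db", 3.0), Task("batch", 8.0)]
    )
    master.offer_round(framework)
    return master, framework


if __name__ == "__main__":
    master, framework = run_demo()
    print(master.status)
    print([t.name for t in framework.pending])
```

In this run the `batch` task stays pending because no single offer is large enough. That mirrors the two-level design on the slide: the master only hands out offers, and the framework decides what to do with them.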
  • 105. Questions? Code: https://git.io/vXUoy Psssssssst … we are hiring! http://grnh.se/ie76ru
  • 106. CONTAINER SCHEDULING
  • 107. RESOURCE MANAGEMENT
  • 108. SERVICE MANAGEMENT
  • 109. SERVICE-ORIENTED ARCHITECTURE - Separation of concerns - Optimization of bottlenecks - Smaller teams - API Contracts - Data replication - Complicated provisioning - Dependency management [Diagram: three Hardware / Operating System stacks, each running a Web App and a Service]
  • 110. MICROSERVICES - Polyglot - Single Responsibility - Smaller Teams - Utilization - Machine types/groups - Dependency hell [Diagram: three Machines on shared Infrastructure, each with its own Operating System running Apps and Services]
  • 111. THE BIRTH OF MESOS ● Spring 2009: CS262B. Ben Hindman, Andy Konwinski and Matei Zaharia create “Nexus” as their CS262B class project. ● March 2010: TWITTER TECH TALK. The grad students working on Mesos give a tech talk at Twitter. ● September 2010: MESOS PUBLISHED. “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center” is published as a technical report. ● December 2010: APACHE INCUBATION. Mesos enters the Apache Incubator. ● April 2016: DC/OS
  • 112. Monitoring: - Collecting metrics - Routing events - Downstream processing ○ Alerting ○ Dashboards ○ Storage (long-term retention) Logging: - Scopes - Local vs. Central - Security considerations Maintenance: - Cluster Upgrades - Cluster Resizing - Capacity Planning - User & Package Management - Networking Policies - Auditing - Backups & Disaster Recovery Troubleshooting: - Debugging ○ Services ○ System - Tracing - Chaos Engineering
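
The monitoring flow summarized on this slide (collect metrics, route events, process downstream) can be sketched in a few lines. Everything here is a hypothetical illustration, not the API of any tool named in the deck: `MetricsRouter`, `Storage`, `ThresholdAlert`, the metric name `cpu.utilization`, and the 0.9 limit are all assumptions.

```python
from collections import defaultdict


class MetricsRouter:
    """Toy router that fans each collected metric out to downstream sinks."""

    def __init__(self):
        self.sinks = []

    def subscribe(self, sink):
        self.sinks.append(sink)

    def publish(self, name, value, timestamp):
        for sink in self.sinks:
            sink(name, value, timestamp)


class Storage:
    """Downstream sink that keeps every sample (long-term retention)."""

    def __init__(self):
        self.series = defaultdict(list)

    def __call__(self, name, value, timestamp):
        self.series[name].append((timestamp, value))


class ThresholdAlert:
    """Downstream sink that records an alert when a metric exceeds a limit."""

    def __init__(self, metric, limit):
        self.metric = metric
        self.limit = limit
        self.fired = []

    def __call__(self, name, value, timestamp):
        if name == self.metric and value > self.limit:
            self.fired.append((timestamp, value))


def run_pipeline():
    router = MetricsRouter()
    storage = Storage()
    alert = ThresholdAlert("cpu.utilization", limit=0.9)
    router.subscribe(storage)
    router.subscribe(alert)
    # Collected samples as (timestamp in seconds, value) pairs.
    for ts, value in [(0, 0.42), (10, 0.95), (20, 0.61)]:
        router.publish("cpu.utilization", value, ts)
    return storage, alert


if __name__ == "__main__":
    storage, alert = run_pipeline()
    print(alert.fired)
```

Keeping the router separate from the sinks means that a dashboard or another alerting rule could be added as one more subscriber without touching the collection side.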