SlideShare uma empresa Scribd logo
1 de 32
Lessons learned running large
real-world Docker environments
Oct 27th 2015
Alois Mayr
@mayralois
alois.mayr@ruxit.com
Dec 3rd 2015
Source: http://www.schoonoart.de/
What is a “large” environment?
Campfire stories
#1 – The Death Star of Service Dependencies
#1 – Death Star of Service Dependencies
Load-balanced service
System-wide service
dependencies
Reverse proxies are essential
#1 – The Death Star of Service Dependencies
App #1
App #2
App #1 depends on App #2
Where is this specified?
Unwanted dependencies break architecture
#1 – The Death Star of Service Dependencies
Use proper versioning for
services, APIs, and images
#1 – The Death Star of Service Dependencies
Campfire stories
#1 – The Death Star of Service Dependencies
#2 – The Network Retransmission Episode
#2 – The Network Retransmission Episode
Retransmissions
Retransmissions Retransmissions
Retransmissions Retransmissions
Retransmissions
Retransmissions
• Hardware defect in a single network interface card
• NIC worked well under low load
• Retransmissions only under heavy load
• Affected communications to other machines
in datacenter
• Still not sure about exact defect on NIC
What was the problem?
#2 – The Network Retransmission Episode
#2 – The Network Retransmission Episode
Co-locate related containers.
Check network infrastructure.
#2 – The Network Retransmission Episode
Campfire stories
#1 – The Death Star of Service Dependencies
#2 – The Network Retransmission Episode
#3 – The Hungry Container Breakdown
#3 – The Hungry Container Breakdown
Low disk space
Low disk space
• Shared /logs partition on host
• No log rotation, no archiving for app logs
• No proper log management used for Docker environment
• Shared /logs partition on a single host ran out of space
What was the problem?
#3 – The Hungry Container Breakdown
• Container health checks failed
• Marathon terminated task and rescheduled new one
• Still no free space on /logs
• Termination and rescheduling
• /var/lib/docker ran out of space
• Mesos slave unable to run Docker tasks
How the problem evolved over time
#3 – The Hungry Container Breakdown
• Log management tools for app logs, e.g. Fluentd and Logstash
--log-driver=none|syslog
• Remove container
--rm=true
• Run Mesos slave with
--docker_remove_delay=VALUE
How the problem could have been avoided
#3 – The Hungry Container Breakdown
Use log management tools
Empty /var/lib/docker
#3 – The Hungry Container Breakdown
Campfire stories
#1 – The Death Star of Service Dependencies
#2 – The Network Retransmission Episode
#3 – The Hungry Container Breakdown
#4 – The Day Orchestration Stood Still
#4 – The Day Orchestration Stood Still
Queue and deployment
methods are slow
• Marathon 0.8.x keeps all versions of applications for recovery (by default)
• High frequency of microservices deployments
• Slowdown through zk overload
What was the problem?
#4 – The Day Orchestration Stood Still
• Respective parameter (zk_max_versions) was not set to proper limit
--zk_max_versions=20
How the problem could have been avoided
#4 – The Day Orchestration Stood Still
Track orchestration layer performance
Separate Mesos clusters
#4 – The Day Orchestration Stood Still
Campfire stories
#1 – The Death Star of Service Dependencies
#2 – The Network Retransmission Episode
#3 – The Hungry Container Breakdown
#4 – The Day Orchestration Stood Still
#5 – The Mushroom Cloud Effect
#5 – The Mushroom Cloud Effect
Way too many
components involved
820 BILLION dependencies!
• Massive load testing in preparation for Black Friday
• Tests ran for 3 days
• No impact to real users, only backend services affected
• Many components to take into account
What was the problem?
174 / 3.4k
22 / 13.3k
Service
Container
Host
1
1..*
*
1
#5 – The Mushroom Cloud Effect
Automation needed for problem
analysis in large environments
#5 – The Mushroom Cloud Effect
Campfire stories
#1 – The Death Star of Service Dependencies
#2 – The Network Retransmission Episode
#3 – The Hungry Container Breakdown
#4 – The Day Orchestration Stood Still
#5 – The Mushroom Cloud Effect
Free trial - https://ruxit.com/docker-monitoring/
Blog - https://blog.ruxit.com/
@ruxit
What lessons have you learned?

Mais conteúdo relacionado

Mais procurados

Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafkaconfluent
 
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Jeff Holoman
 
Automated Deployment Using Jenkins Across Clusters
Automated Deployment Using Jenkins Across ClustersAutomated Deployment Using Jenkins Across Clusters
Automated Deployment Using Jenkins Across ClustersNaveen S.R
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesWill Hall
 
Windows container security
Windows container securityWindows container security
Windows container securityDocker, Inc.
 
BlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Kubernetes Practical Attack and DefenseBlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Kubernetes Practical Attack and DefenseBlueHat Security Conference
 
How to install and use Kubernetes
How to install and use KubernetesHow to install and use Kubernetes
How to install and use KubernetesLuke Marsden
 
Securing & Enforcing Network Policy and Encryption with Weave Net
Securing & Enforcing Network Policy and Encryption with Weave NetSecuring & Enforcing Network Policy and Encryption with Weave Net
Securing & Enforcing Network Policy and Encryption with Weave NetLuke Marsden
 
Accessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersAccessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersDocker, Inc.
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016aspyker
 
Lightning Fast Monitoring against Lightning Fast Outages
Lightning Fast Monitoring against Lightning Fast OutagesLightning Fast Monitoring against Lightning Fast Outages
Lightning Fast Monitoring against Lightning Fast OutagesMaxime Petazzoni
 
How and why we got Prometheus working with Docker Swarm
How and why we got Prometheus working with Docker SwarmHow and why we got Prometheus working with Docker Swarm
How and why we got Prometheus working with Docker SwarmLuke Marsden
 
WebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsWebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsMaarten Smeets
 
Build your own Service Bus V2
Build your own Service Bus V2Build your own Service Bus V2
Build your own Service Bus V2Kévin LOVATO
 
An empirical comparison of dependency issues in open source software packagin...
An empirical comparison of dependency issues in open source software packagin...An empirical comparison of dependency issues in open source software packagin...
An empirical comparison of dependency issues in open source software packagin...Tom Mens
 
Locking down your Kubernetes cluster with Linkerd
Locking down your Kubernetes cluster with LinkerdLocking down your Kubernetes cluster with Linkerd
Locking down your Kubernetes cluster with LinkerdBuoyant
 
KubeCon London 2016 Ronana Cloud Native SDN
KubeCon London 2016 Ronana Cloud Native SDNKubeCon London 2016 Ronana Cloud Native SDN
KubeCon London 2016 Ronana Cloud Native SDNRomana Project
 
How to build a Neutron Plugin (stadium edition)
How to build a Neutron Plugin (stadium edition)How to build a Neutron Plugin (stadium edition)
How to build a Neutron Plugin (stadium edition)Salvatore Orlando
 
Docker casual alpine with nim nimlang 박승환_2016_03
Docker casual alpine with nim nimlang 박승환_2016_03Docker casual alpine with nim nimlang 박승환_2016_03
Docker casual alpine with nim nimlang 박승환_2016_03Seunghwan Park
 

Mais procurados (20)

Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache KafkaKafka Summit NYC 2017 - Deep Dive Into Apache Kafka
Kafka Summit NYC 2017 - Deep Dive Into Apache Kafka
 
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015 Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
Apache Kafka Reliability Guarantees StrataHadoop NYC 2015
 
Automated Deployment Using Jenkins Across Clusters
Automated Deployment Using Jenkins Across ClustersAutomated Deployment Using Jenkins Across Clusters
Automated Deployment Using Jenkins Across Clusters
 
Container Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and KubernetesContainer Orchestration with Docker Swarm and Kubernetes
Container Orchestration with Docker Swarm and Kubernetes
 
Windows container security
Windows container securityWindows container security
Windows container security
 
BlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Kubernetes Practical Attack and DefenseBlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
BlueHat Seattle 2019 || Kubernetes Practical Attack and Defense
 
How to install and use Kubernetes
How to install and use KubernetesHow to install and use Kubernetes
How to install and use Kubernetes
 
Docker {at,with} SignalFx
Docker {at,with} SignalFxDocker {at,with} SignalFx
Docker {at,with} SignalFx
 
Securing & Enforcing Network Policy and Encryption with Weave Net
Securing & Enforcing Network Policy and Encryption with Weave NetSecuring & Enforcing Network Policy and Encryption with Weave Net
Securing & Enforcing Network Policy and Encryption with Weave Net
 
Accessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containersAccessible hpc for everyone with docker and containers
Accessible hpc for everyone with docker and containers
 
Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016Netflix Container Runtime - Titus - for Container Camp 2016
Netflix Container Runtime - Titus - for Container Camp 2016
 
Lightning Fast Monitoring against Lightning Fast Outages
Lightning Fast Monitoring against Lightning Fast OutagesLightning Fast Monitoring against Lightning Fast Outages
Lightning Fast Monitoring against Lightning Fast Outages
 
How and why we got Prometheus working with Docker Swarm
How and why we got Prometheus working with Docker SwarmHow and why we got Prometheus working with Docker Swarm
How and why we got Prometheus working with Docker Swarm
 
WebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck ThreadsWebLogic Stability; Detect and Analyse Stuck Threads
WebLogic Stability; Detect and Analyse Stuck Threads
 
Build your own Service Bus V2
Build your own Service Bus V2Build your own Service Bus V2
Build your own Service Bus V2
 
An empirical comparison of dependency issues in open source software packagin...
An empirical comparison of dependency issues in open source software packagin...An empirical comparison of dependency issues in open source software packagin...
An empirical comparison of dependency issues in open source software packagin...
 
Locking down your Kubernetes cluster with Linkerd
Locking down your Kubernetes cluster with LinkerdLocking down your Kubernetes cluster with Linkerd
Locking down your Kubernetes cluster with Linkerd
 
KubeCon London 2016 Ronana Cloud Native SDN
KubeCon London 2016 Ronana Cloud Native SDNKubeCon London 2016 Ronana Cloud Native SDN
KubeCon London 2016 Ronana Cloud Native SDN
 
How to build a Neutron Plugin (stadium edition)
How to build a Neutron Plugin (stadium edition)How to build a Neutron Plugin (stadium edition)
How to build a Neutron Plugin (stadium edition)
 
Docker casual alpine with nim nimlang 박승환_2016_03
Docker casual alpine with nim nimlang 박승환_2016_03Docker casual alpine with nim nimlang 박승환_2016_03
Docker casual alpine with nim nimlang 박승환_2016_03
 

Destaque

Blue Whale in an Enterprise Pond
Blue Whale in an Enterprise PondBlue Whale in an Enterprise Pond
Blue Whale in an Enterprise PondDigia Plc
 
Using Docker in the Real World
Using Docker in the Real WorldUsing Docker in the Real World
Using Docker in the Real WorldTim Haak
 
Solving Real World Production Problems with Docker
Solving Real World Production Problems with DockerSolving Real World Production Problems with Docker
Solving Real World Production Problems with DockerMarc Campbell
 
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy SystemA Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy Systemadrian_nye
 
Real World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionReal World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionBen Hall
 
Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned  Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned RightScale
 
Programming the world with Docker
Programming the world with DockerProgramming the world with Docker
Programming the world with DockerPatrick Chanezon
 

Destaque (7)

Blue Whale in an Enterprise Pond
Blue Whale in an Enterprise PondBlue Whale in an Enterprise Pond
Blue Whale in an Enterprise Pond
 
Using Docker in the Real World
Using Docker in the Real WorldUsing Docker in the Real World
Using Docker in the Real World
 
Solving Real World Production Problems with Docker
Solving Real World Production Problems with DockerSolving Real World Production Problems with Docker
Solving Real World Production Problems with Docker
 
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy SystemA Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy System
 
Real World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and ProductionReal World Experience of Running Docker in Development and Production
Real World Experience of Running Docker in Development and Production
 
Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned  Real-World Docker: 10 Things We've Learned
Real-World Docker: 10 Things We've Learned
 
Programming the world with Docker
Programming the world with DockerProgramming the world with Docker
Programming the world with Docker
 

Semelhante a Lessons learned running large real-world Docker environments

KubeCon EU 2016: Kubernetes meets Finagle for Resilient Microservices
KubeCon EU 2016: Kubernetes meets Finagle for Resilient MicroservicesKubeCon EU 2016: Kubernetes meets Finagle for Resilient Microservices
KubeCon EU 2016: Kubernetes meets Finagle for Resilient MicroservicesKubeAcademy
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internalsTokyo Azure Meetup
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...Masaaki Nakagawa
 
Fasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Project
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesJosef Adersberger
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesQAware GmbH
 
Orchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresOrchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresDocker, Inc.
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
Remote core locking-Andrea Lombardo
Remote core locking-Andrea LombardoRemote core locking-Andrea Lombardo
Remote core locking-Andrea LombardoAndrea Lombardo
 
Hands on kubernetes_container_orchestration
Hands on kubernetes_container_orchestrationHands on kubernetes_container_orchestration
Hands on kubernetes_container_orchestrationAmir Hossein Sorouri
 
Sample Solution Blueprint
Sample Solution BlueprintSample Solution Blueprint
Sample Solution BlueprintMike Alvarado
 
Tupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FBTupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FBDocker, Inc.
 
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...Docker, Inc.
 
The Mushroom Cloud Effect - What happens when containers fail?
The Mushroom Cloud Effect - What happens when containers fail?The Mushroom Cloud Effect - What happens when containers fail?
The Mushroom Cloud Effect - What happens when containers fail?Alois Mayr
 
Breaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersBreaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersAmazon Web Services
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...Docker, Inc.
 
Cloud orchestration risks
Cloud orchestration risksCloud orchestration risks
Cloud orchestration risksGlib Pakharenko
 

Semelhante a Lessons learned running large real-world Docker environments (20)

KubeCon EU 2016: Kubernetes meets Finagle for Resilient Microservices
KubeCon EU 2016: Kubernetes meets Finagle for Resilient MicroservicesKubeCon EU 2016: Kubernetes meets Finagle for Resilient Microservices
KubeCon EU 2016: Kubernetes meets Finagle for Resilient Microservices
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...
OpenStack Summit Tokyo - Know-how of Challlenging Deploy/Operation NTT DOCOMO...
 
Fasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy ManagementFasten Industry Meeting with GitHub about Dependancy Management
Fasten Industry Meeting with GitHub about Dependancy Management
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
Patterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to KubernetesPatterns and Pains of Migrating Legacy Applications to Kubernetes
Patterns and Pains of Migrating Legacy Applications to Kubernetes
 
Orchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failuresOrchestrating Linux Containers while tolerating failures
Orchestrating Linux Containers while tolerating failures
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Remote core locking-Andrea Lombardo
Remote core locking-Andrea LombardoRemote core locking-Andrea Lombardo
Remote core locking-Andrea Lombardo
 
Hands on kubernetes_container_orchestration
Hands on kubernetes_container_orchestrationHands on kubernetes_container_orchestration
Hands on kubernetes_container_orchestration
 
Sample Solution Blueprint
Sample Solution BlueprintSample Solution Blueprint
Sample Solution Blueprint
 
4. system models
4. system models4. system models
4. system models
 
Tupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FBTupperware: Containerized Deployment at FB
Tupperware: Containerized Deployment at FB
 
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
The Mushroom Cloud Effect or What Happens When Containers Fail? by Alois Mayr...
 
The Mushroom Cloud Effect - What happens when containers fail?
The Mushroom Cloud Effect - What happens when containers fail?The Mushroom Cloud Effect - What happens when containers fail?
The Mushroom Cloud Effect - What happens when containers fail?
 
Breaking the Monolith Road to Containers
Breaking the Monolith Road to ContainersBreaking the Monolith Road to Containers
Breaking the Monolith Road to Containers
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...
Docker Networking in Production at Visa - Sasi Kannappan, Visa and Mark Churc...
 
Cloud orchestration risks
Cloud orchestration risksCloud orchestration risks
Cloud orchestration risks
 

Mais de Alois Mayr

Automated distributed tracing - a first class citizen of monitoring
Automated distributed tracing - a first class citizen of monitoringAutomated distributed tracing - a first class citizen of monitoring
Automated distributed tracing - a first class citizen of monitoringAlois Mayr
 
Monitoring a cloud native platform feature
Monitoring a cloud native platform featureMonitoring a cloud native platform feature
Monitoring a cloud native platform featureAlois Mayr
 
When containers fail
When containers failWhen containers fail
When containers failAlois Mayr
 
Running microservice environments is no free lunch
Running microservice environments is no free lunchRunning microservice environments is no free lunch
Running microservice environments is no free lunchAlois Mayr
 
Managing and Scaling Microservices with Docker in the Wild
Managing and Scaling Microservices with Docker in the WildManaging and Scaling Microservices with Docker in the Wild
Managing and Scaling Microservices with Docker in the WildAlois Mayr
 
Scaling and Monitoring Docker environments
Scaling and Monitoring Docker environmentsScaling and Monitoring Docker environments
Scaling and Monitoring Docker environmentsAlois Mayr
 

Mais de Alois Mayr (6)

Automated distributed tracing - a first class citizen of monitoring
Automated distributed tracing - a first class citizen of monitoringAutomated distributed tracing - a first class citizen of monitoring
Automated distributed tracing - a first class citizen of monitoring
 
Monitoring a cloud native platform feature
Monitoring a cloud native platform featureMonitoring a cloud native platform feature
Monitoring a cloud native platform feature
 
When containers fail
When containers failWhen containers fail
When containers fail
 
Running microservice environments is no free lunch
Running microservice environments is no free lunchRunning microservice environments is no free lunch
Running microservice environments is no free lunch
 
Managing and Scaling Microservices with Docker in the Wild
Managing and Scaling Microservices with Docker in the WildManaging and Scaling Microservices with Docker in the Wild
Managing and Scaling Microservices with Docker in the Wild
 
Scaling and Monitoring Docker environments
Scaling and Monitoring Docker environmentsScaling and Monitoring Docker environments
Scaling and Monitoring Docker environments
 

Último

WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 

Último (20)

WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 

Lessons learned running large real-world Docker environments

  • 1. Lessons learned running large real-world Docker environments Oct 27th 2015 Alois Mayr @mayralois alois.mayr@ruxit.com Dec 3rd 2015
  • 3. What is a “large” environment?
  • 4.
  • 5. Campfire stories #1 – The Death Star of Service Dependencies
  • 6. #1 – Death Star of Service Dependencies Load-balanced service System-wide service dependencies
  • 7. Reverse proxies are essential #1 – The Death Star of Service Dependencies
  • 8. App #1 App #2 App #1 depends on App #2 Where is this specified? Unwanted dependencies break architecture #1 – The Death Star of Service Dependencies
  • 9. Use proper versioning for services, APIs, and images #1 – The Death Star of Service Dependencies
  • 10. Campfire stories #1 – The Death Star of Service Dependencies #2 – The Network Retransmission Episode
  • 11. #2 – The Network Retransmission Episode Retransmissions Retransmissions Retransmissions Retransmissions Retransmissions Retransmissions Retransmissions
  • 12. • Hardware defect in a single network interface card • NIC worked well under low load • Retransmissions only under heavy load • Affected communications to other machines in datacenter • Still not sure about exact defect on NIC What was the problem? #2 – The Network Retransmission Episode
  • 13. #2 – The Network Retransmission Episode
  • 14. Co-locate related containers. Check network infrastructure. #2 – The Network Retransmission Episode
  • 15. Campfire stories #1 – The Death Star of Service Dependencies #2 – The Network Retransmission Episode #3 – The Hungry Container Breakdown
  • 16. #3 – The Hungry Container Breakdown Low disk space Low disk space
  • 17. • Shared /logs partition on host • No log rotation, no archiving for app logs • No proper log management used for Docker environment • Shared /logs partition on a single host ran out of space What was the problem? #3 – The Hungry Container Breakdown
  • 18. • Container health checks failed • Marathon terminated task and rescheduled new one • Still no free space on /logs • Termination and rescheduling • /var/lib/docker ran out of space • Mesos slave unable to run Docker tasks How the problem evolved over time #3 – The Hungry Container Breakdown
  • 19. • Log management tools for app logs, e.g. Fluentd and Logstash --log-driver=none|syslog • Remove container --rm=true • Run Mesos slave with --docker_remove_delay=VALUE How the problem could have been avoided #3 – The Hungry Container Breakdown
  • 20. Use log management tools Empty /var/lib/docker #3 – The Hungry Container Breakdown
  • 21. Campfire stories #1 – The Death Star of Service Dependencies #2 – The Network Retransmission Episode #3 – The Hungry Container Breakdown #4 – The Day Orchestration Stood Still
  • 22. #4 – The Day Orchestration Stood Still Queue and deployment methods are slow
  • 23. • Marathon 0.8.x keeps all versions of applications for recovery (by default) • High frequency of microservices deployments • Slowdown through zk overload What was the problem? #4 – The Day Orchestration Stood Still
  • 24. • Respective parameter (zk_max_versions) was not set to proper limit --zk_max_versions=20 How the problem could have been avoided #4 – The Day Orchestration Stood Still
  • 25. Track orchestration layer performance Separate Mesos clusters #4 – The Day Orchestration Stood Still
  • 26. Campfire stories #1 – The Death Star of Service Dependencies #2 – The Network Retransmission Episode #3 – The Hungry Container Breakdown #4 – The Day Orchestration Stood Still #5 – The Mushroom Cloud Effect
  • 27. #5 – The Mushroom Cloud Effect Way too many components involved 820 BILLION dependencies!
  • 28. • Massive load testing in preparation for Black Friday • Tests ran for 3 days • No impact to real users, only backend services affected • Many components to take into account What was the problem? 174 / 3.4k 22 / 13.3k Service Container Host 1 1..* * 1 #5 – The Mushroom Cloud Effect
  • 29.
  • 30. Automation needed for problem analysis in large environments #5 – The Mushroom Cloud Effect
  • 31. Campfire stories #1 – The Death Star of Service Dependencies #2 – The Network Retransmission Episode #3 – The Hungry Container Breakdown #4 – The Day Orchestration Stood Still #5 – The Mushroom Cloud Effect
  • 32. Free trial - https://ruxit.com/docker-monitoring/ Blog - https://blog.ruxit.com/ @ruxit What lessons have you learned?