MongoDB World 2016: Scaling MongoDB with Docker and cGroups
1. Scaling MongoDB with Docker
and cgroups
Marco Bonezzi
Technical Services Engineer, MongoDB
marco@mongodb.com
@marcobonezzi
2. #MDBW16
About the speaker
I am Marco Bonezzi, TSE at MongoDB
TSE = helping customers with a variety of issues
Based in Dublin, Ireland
Experience in distributed systems, high availability:
3. #MDBW16
How many of you have ever...
1. … manually deployed a MongoDB replica set or sharded cluster?
2. ... had issues with resource allocation?
3. ... used Docker?
4. … used MongoDB running on Docker?
4. #MDBW16
We know how it feels…
Different architectures in development and production
Co-located MongoDB processes
Production != docker run mongodb
6. #MDBW16
How to solve this?
Deployment: using predefined cluster patterns, replicating environments (orchestration)
Resource Control: setting limits to key resources (resource management)
MongoDB & Docker: create once, deploy everywhere; deploy patterns, not processes (automate for scaling)
7. #MDBW16
About this talk
Patterns for successful deployments
Difference between success and failure
Orchestrating MongoDB with Docker
MongoDB cluster on AWS with containers
Patterns with Swarm and Compose
Managing container resources with cgroups
Benefits of cgroups in a MongoDB cluster
8. #MDBW16
Redundancy and fault tolerance
Deploy an odd number of voting members
Members ↔ Majority required ↔ Fault tolerance
High availability and resource colocation
Single member of a replica set / server
Shards as Replica Set
Ideally: primary / secondary / secondary
Deployment patterns: Replica Set and Sharded Clusters
[Diagram: Server 1, Server 2 and Server 3, each running a mongos and one config server (cfgsvr1, cfgsvr2, cfgsvr3); each server hosts one member of each replica set RS1, RS2 and RS3, with the three primaries spread across different servers]
9. #MDBW16
Docker
• Noun: a person employed in a port to load and unload ships (from “what
is docker” on Google)
Containers:
Isolated process in userspace
Application + dependencies
Shared kernel and libraries
Can run on any infrastructure (or
cloud)
www.docker.com
10. #MDBW16
Why use Docker?
44% of orgs are adopting microservices
41% want application portability
13x improvement in release frequency
62% improvement in MTTR on software issues
60% are using Docker to migrate to the cloud
Reasons to run containers: speed, microservices architectures, efficiency, cloud
(The Docker Survey, 2016)
12. #MDBW16
Orchestrating MongoDB with Docker
How can we use Docker for MongoDB deployments?
How can we deploy these patterns using Docker containers?
Why should we use Docker?
Our recipe:
13. #MDBW16
Docker ecosystem
Docker Machine: provisioning and managing your Dockerized hosts
Docker Swarm: native clustering; turns a pool of Docker hosts into a single, virtual Docker host
Docker Compose: define a multi-container application with all of its dependencies in a single file
14. #MDBW16
Why Docker Swarm?
5x faster than
Kubernetes to spin up
a new container
7x faster than
Kubernetes to list all
running containers
Evaluating Container Platforms at
Scale
1000 EC2 instances in a cluster
What is their performance at scale?
Can they operate at scale?
What does it take to support them at scale?
https://medium.com/on-docker/evaluating-container-platforms-at-scale-5e7b44d93f2c#.k2fxds8c2
https://www.docker.com/survey-2016
16. #MDBW16
Swarm filters to build our patterns
Constraint filters
Mark each mongod container with a label:
“role=mongod”
“replset=rs1”
17. #MDBW16
Affinity filters
Prevent multiple RS members on the same host:
"affinity:replset!=rs1"
swarm-node-1 swarm-node-2 swarm-node-3
Affinity filters for container distribution
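Put together, the constraint labels and the affinity filter above can be sketched in a Swarm-era compose file (v1 syntax; the service name rs1a is illustrative, not from the talk):

```yaml
rs1a:
  image: mongo:3.2
  command: mongod --replSet rs1
  labels:
    - "role=mongod"
    - "replset=rs1"
  environment:
    # Swarm affinity filter: never co-locate two rs1 members on one node
    - "affinity:replset!=rs1"
```

Scheduling three such services leaves Swarm no choice but to spread the rs1 members across three different nodes.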
18. #MDBW16
Road to container success
Deploying containers to the right node is not enough…
Next step: Resource control on each swarm cluster node using cgroups
Maritime New Zealand
19. #MDBW16
Resource control with cgroups and Docker
Simple parameters to add to docker run or compose:
--cpu-shares
--cpuset-cpus
--memory
--blkio-weight
--net
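As a sketch, these flags combine on a single docker run (the container name and the limit values are illustrative, not from the talk):

```shell
# illustrative limits: 4 GB memory cap, CPU weight 512,
# pinned to cores 0-1, relative block-I/O weight 500
docker run -d --name rs1a \
  --memory 4g --cpu-shares 512 --cpuset-cpus 0,1 --blkio-weight 500 \
  mongo:3.2 mongod --replSet rs1
```

The same limits can also be declared in the compose file so they travel with the deployment pattern.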
20. #MDBW16
MongoDB Memory usage in 3.2
with WiredTiger
MongoDB Memory:
mongod process: connections, aggregations, mapReduce, etc
WiredTiger cache: (0.6 x total memory) – 1 GB
Total = mem(mongod) + mem(WiredTiger cache)
WiredTiger cache
mongod
mongod memory
21. #MDBW16
Process memory with containers and cgroups
cgroup memory_limit
mongod memory = mongod + WiredTiger cache
total memory (as seen from the mongod process)
Inside the container:
• mongod sees the host's total memory, not the container's memory limit
• WiredTiger cache should be set to memory_limit × 0.6
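That sizing rule is trivial to script; a minimal sketch (plain Python, illustrative limit values):

```python
def wt_cache_gb(container_limit_gb):
    """Size the WiredTiger cache from the container's memory limit
    (0.6 x limit, per the rule above), since mongod only sees the
    host's total memory and cannot discover the cgroup limit itself."""
    return round(0.6 * container_limit_gb, 2)

# a container capped at 4 GB would start mongod with something like:
#   mongod --replSet rs1 --wiredTigerCacheSizeGB 2.4
print(wt_cache_gb(4))   # prints 2.4
```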
25. #MDBW16
Understanding resource usage:
• docker top rs1a
• docker stats rs1a
Container stats available via Docker remote API:
GET /containers/(id)/stats
Also available from docker-py:
http://docker-py.readthedocs.org/en/latest/api/#stats
Resource usage with Docker
26. #MDBW16
Resource usage with Docker
Multiple statistics for each container:
Memory limit and usage, CPU (per core level), Network, Disk
Useful to combine with MongoDB metrics (like db.serverStatus())
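As a rough illustration of what that stats payload looks like and how it can feed into your own dashboards, here is a sketch (the field names follow the Docker stats API; the numeric values are made up):

```python
# abridged sample of a GET /containers/(id)/stats response
sample_stats = {
    "memory_stats": {"usage": 1610612736, "limit": 4294967296},
}

def mem_percent(stats):
    """Memory usage as a percentage of the container's cgroup limit."""
    m = stats["memory_stats"]
    return round(100.0 * m["usage"] / m["limit"], 1)

print(mem_percent(sample_stats))   # 1.5 GiB of a 4 GiB limit -> prints 37.5
```

Joining this with db.serverStatus() output from the same mongod gives a per-container view of both the cgroup limit and MongoDB's own memory accounting.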
28. #MDBW16
Creating a Swarm cluster on AWS to deploy MongoDB
DEMO!
1. Configure docker-machine with the ec2 driver (AWS)
2. Deploy the discovery service for the Swarm master
3. Deploy AWS instances for the Swarm master and the Swarm worker nodes
4. Connect to the Swarm master
5. Define the compose file for the deployment
6. Define Swarm filters, constraints and cgroup limits
7. Deploy the environment with a single command using the compose file
8. Configure our MongoDB sharded cluster using the Cloud Manager API
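The steps above map to a handful of commands; an illustrative sequence (2016-era docker-machine/Swarm syntax; host names like mh-keystore and swarm-node-1 are made up for this sketch):

```shell
# 1. a host for service discovery, then consul on it
docker-machine create -d amazonec2 mh-keystore
docker $(docker-machine config mh-keystore) run -d -p 8500:8500 \
  progrium/consul -server -bootstrap

# 2. swarm master and worker nodes, registered against consul
docker-machine create -d amazonec2 --swarm --swarm-master \
  --swarm-discovery consul://$(docker-machine ip mh-keystore):8500 swarm-master
docker-machine create -d amazonec2 --swarm \
  --swarm-discovery consul://$(docker-machine ip mh-keystore):8500 swarm-node-1

# 3. point the docker client at the swarm master and deploy the compose file
eval $(docker-machine env --swarm swarm-master)
docker-compose up -d
```

After this, the Cloud Manager API call (or a manual rs.initiate()/sh.addShard() script) turns the running containers into an actual replica set or sharded cluster.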
31. #MDBW16
Advantages of using MongoDB with Docker
Speed: testing and deploying cluster patterns easily
Build once, deploy everywhere
Control: Resource control and utilization
Key to success with containers
Agility: Microservices architectures
Making change less expensive
Flexibility: Multi vendor cloud opportunities
AWS, Azure, Google, IBM, CloudFoundry
32. #MDBW16
How successful customers use MongoDB with Docker
• Case Studies @ https://www.mongodb.com/blog
• Whitepaper:
“Enabling Microservices – Containers & Orchestration Explained”
https://www.mongodb.com/collateral/microservices-containers-and-orchestration-explained
33. #MDBW16
Now it’s YOUR turn
Share with us your use case of MongoDB & Docker:
http://bit.do/DockerMongoDB
@marcobonezzi
You can actually try this at home:
https://github.com/sisteming/mongo-swarm
Thank you everybody for being here today. It is really exciting to be speaking about these two technologies, Docker and MongoDB.
We get many questions on how to use both of these technologies successfully, and in this talk I will show you how running MongoDB clusters on Docker containers can be really useful in some situations.
Let me introduce myself, my name is Marco Bonezzi, I’m a TSE (or Technical Services Engineer) at MongoDB.
What TSE really means is that we help customers be successful with MongoDB by assisting them with a variety of issues.
I am based in Dublin, in sunny Ireland.
My main experience is in databases, distributed systems and high availability solutions with different database technologies
Before we start, I’d like to ask whether you (or maybe a friend, or someone at a different company you may know about) have ever been in one of the situations we are about to see:
So let’s start with some questions:
For the first one, you are all here at MongoDB World, so I’m pretty sure that you have all manually deployed a MongoDB replica set or sharded cluster.
Who heard of issues with resource allocation when running multiple mongod processes on the same server?
In terms of your docker experience, how many of you are using Docker in production for the last year or two?
Ok, that’s great. So this one is for you: How many of you are running MongoDB on Docker?
Well, this is interesting and some of you might be familiar with the following situations and hopefully you will all learn some tricks to improve your MongoDB deployment running on Docker
If you have seen these issues closer than what you liked, it’s fine. Some of our customers have had these issues.
We know that sometimes our issues come from having different architectures in development and production.
Or also just by adding an extra MongoDB process on the same server that starts consuming resources dedicated to our main process.
And generally when we work with our customers, we check that they apply our production notes. In many cases, we hear that they will deploy their cluster, if not already deployed, on Docker containers.
This is really great, as it gives us the opportunity to help them understand the key concepts needed to successfully implement these technologies.
3 – THIS IS OUR STORY
Based on our interactions with different customers, we identified several pain points, and today I will show you how Docker can be really useful for our MongoDB deployments.
Deployment
Deploying the architectures for our clusters is time consuming.
Sometimes this also affects how we deploy our clusters, for example in terms of high availability.
Resource Contention
It’s great to add more and more processes to our servers, but what about resource contention between multiple instances?
Docker
How do we setup MongoDB on Docker? How do we setup a replica set between containers? Where is my data? Can I access the mongod log file as I currently do?
3 – THIS IS OUR STORY
So how can we solve these issues?
We will see how we can deploy our replica sets or sharded clusters faster, making sure that we use recommended highly available patterns, by using orchestration.
Using resource management is key to successful and reliable architectures: we avoid obscure issues and get almost predictable resource utilization.
Also, by automating our MongoDB deployments with Docker, we can make sure that we have a scalable configuration and that we deploy patterns, not just independent processes.
An important point to bear in mind is also the time saved with this approach compared to more traditional tools or development techniques.
In this talk today, I will show you briefly how to define highly available patterns for our MongoDB deployments
Once we define these patterns, we’ll see how we can orchestrate our MongoDB containers to build these patterns by using Docker Swarm and compose on AWS
Lastly, we will see how to use the cgroup implementation in Docker and the key points to define limits for memory or cpu for each MongoDB container of our cluster
These three ingredients will help you not only to implement your MongoDB deployment with docker containers, but to deploy successful patterns that can be configured to your needs also in terms of resource utilisation
When speaking about replica sets, and this is a general MongoDB best practice, we generally suggest a primary secondary secondary configuration as recommended pattern.
If we have two secondaries, the primary replicates all operations to them, so we have two copies of our data; if we lose one member, we still have a replicated copy.
We always recommend having an odd number of members, and this is strictly related to fault tolerance: the more members, the higher the majority required and the higher the resiliency of our environment.
It is also essential to understand why we want to have each replica set member on a different server or instances. Losing one of the servers won’t mean that our replica set will be unavailable.
5 – THIS IS WHAT WE REALIZED
So probably most of you already know and use Docker, but in case there is someone that does not, I’ll quickly introduce it.
If you ask Google – what is Docker – Google will tell you that Docker is a person employed in a port to load and unload ships. While this might seem unrelated, it is quite related as our ship will be our servers, and Docker will allow us to load and unload containers on them.
So these containers are basically isolated processes in userspace; they share the kernel, and inside these isolated processes we can deploy applications and their dependencies. The idea is that we can containerize a single application, including its dependencies, which allows us to run it everywhere: we just need a Docker daemon running, whether on our laptop, in a cloud infrastructure like AWS or Azure, or even on a Raspberry Pi.
4 – UNTIL WHEN SOME OF OUR SUCCESSFUL CUSTOMERS DISCOVER DOCKER
The good news is that Docker can help us with this! So for example we can easily coordinate our containers to deploy a recommended highly available pattern, and we can then size each container and therefore its instance and each cluster to avoid any resource allocation issue
So now we know how a highly available MongoDB pattern looks like and we will dive into how we can use Docker to implement these patterns.
As we will see shortly, in our recipe today we will run MongoDB on Docker containers, and we will use Docker compose, docker-machine, swarm and the Cloud Manager API to orchestrate and automate the deployment of a sharded cluster
Generally, what we mean by Docker refers to the docker daemon, but Docker has an ecosystem of tools to help us manage containers.
So for example, with docker-machine we can provision and manage our Docker hosts under different providers, like AWS, VirtualBox or others.
We then have Docker Swarm, which provides us with a clustering solution for multiple nodes running the docker daemon. We can then use filters and rules to orchestrate the deployment of containers onto each node of the cluster.
Docker Compose instead can be used to define our patterns and multi-container deployments with just a YAML description. This way we can define services for replica sets or sharded clusters and easily deploy them, e.g. deploy a replica set with a single command.
All these tools have in common the use of Docker API, so any tool that works with Docker can use Swarm to scale to multiple hosts
One of the interesting things about docker swarm is the possibility of having multi-host networking.
As we have containers on each of the swarm nodes, they need to be able to connect to each other. For this reason, Swarm automatically creates an overlay container-to-container network.
The underlying technology is based on the docker swarm master and its service discovery (in this example we are using Consul as the discovery service).
With this, by referring to the hostname associated with each container, we can reach containers located on a different swarm node. This is a key concept for deploying our MongoDB containers in a swarm cluster, but once understood it is really easy to use and to connect all the containers with each other.
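The container-to-container networking described here can be sketched with two commands (illustrative names; assumes the swarm and its consul discovery service are already up):

```shell
# create an overlay network spanning all swarm nodes
docker network create --driver overlay mongo-net

# containers attached to it reach each other by name across nodes
docker run -d --name rs1a --net mongo-net mongo:3.2 mongod --replSet rs1
```

Inside any container on mongo-net, the hostname rs1a then resolves to that container wherever Swarm scheduled it.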
As previously mentioned, Swarm also offers interesting features to orchestrate and schedule the deployment of containers to the swarm cluster nodes.
There are different types of filters that can be used; we can for instance establish constraints or affinity rules.
In this slide we can see how we mark each mongod container with a label, saying that this is a mongod process that is part of replica set rs1.
Once each container is marked, we will see how to use this label.
Based on the label previously defined, we define affinity filters to ensure that we have at most one container from replica set rs1 on any swarm node. The effect of this rule is that, with 3 containers for a replica set, each will be deployed to a different node.
Now we know how a recommended pattern looks like and how we can make sure that we follow these patterns and rules when deploying Docker containers to a swarm cluster.
So of course deploying containers to the right node is very important to ensure their availability, but in addition, for a successful deployment the next step is to use resource management on each swarm node via the Docker implementation of cgroups.
Resource control with Docker and cgroups is quite easy: it can be done from the docker run command or by defining the limits in the compose file.
The idea is that, thanks to the cgroups feature included in the Linux kernel, we can define limits for the key resources of a process, like memory, CPU, network, or disk.
As you probably know, MongoDB 3.2 with the WiredTiger storage engine uses a memory region called the WiredTiger cache. Besides this region, we also have memory outside of the WiredTiger cache that includes the kernel filesystem cache and is used for other elements required by the mongod process, like connections, aggregations or mapReduce operations.
Based on this, the total memory for a mongod process running WiredTiger is the sum of the WiredTiger cache, whether sized by default or manually, plus the memory required by the mongod process itself.
While this is a general overview of the memory usage with WiredTiger, there are some additional caveats to consider if running MongoDB on Docker containers.
As mentioned in the previous slide, with WiredTiger we have the WiredTiger cache which is used for compressed indexes and uncompressed data.
By default, the cache is set to 60% of the total memory on the server minus 1 GB. However, when running in a container, the mongod process (like any process) does not have an easy way to check the memory limit defined for its container.
As the process only reads the total memory on the server, it is really important to manually set the WiredTiger cache for each container according to the memory limit defined for it, so that it is always at most 60% of the container limit.
In terms of memory limits, with the --memory option (or the memory limit in compose) we can define a memory limit for each container.
This is really important mainly for two reasons:
Docker is often used in larger servers with large volumes of RAM, so we want to correctly size the environment to avoid any resource contention issue
If running MongoDB with WiredTiger in a docker container, we need to set the WiredTiger cache to 60% of the memory limit as previously mentioned
To see this in a more visual way, here we have a server or our swarm node, and we will have:
A memory region left available for the OS.
The rest of the available memory is split into smaller memory sub-regions, one per container, as the idea is that we define a memory limit for each container. Then, based on this, we define the WiredTiger cache as 60% of the memory limit previously defined for the container.
As you can see, this is a sizing exercise that may require some planning but in the long run this will ensure the stability and correct resource utilisation for our MongoDB cluster.
This is also a visual representation of how we can pin different containers to specific cores, so we could have each mongod associated with one core, and the same for mongos or config servers.
Lastly, as we speak of resource management and key metrics like CPU or memory, it is also important to understand how to monitor these resources when running on Docker.
Docker has some commands like docker top or stats which will show us the command line or other stats like the CPU % or the memory usage and memory limit.
Additionally, we can also use the Docker remote API to get these metrics or we can also use other clients or libraries as for instance docker-py, which allow us to retrieve these stats into python and work with them from there.
DEMO GRAPHS
To be able to easily deploy our MongoDB patterns with Docker, we will use a Swarm cluster.
In this slide we can see the architecture of the example we are using today in this talk.
This is a simplified version where we have:
A swarm manager node, which manages the swarm cluster and is responsible for scheduling where each container will be deployed, checking any rules or filters defined.
We then have a number of swarm nodes, which already have the docker daemon installed so that we can start deploying our MongoDB containers straight away.
These four elements are all deployed using docker-machine, which allows us to create these instances on AWS just by running the docker-machine command from our terminal. Docker-machine takes care of setting up SSH keys and deploying the instance type we define, and the instance will run Ubuntu (or another distribution) with the Docker daemon already installed.
Once we have this architecture deployed, we define the pattern and configuration of our MongoDB deployment using compose yml files, and through the docker command we run them against the swarm manager. The manager then schedules each container according to any filters defined, and in the end each swarm node will be running a number of MongoDB containers.
In summary, creating a swarm cluster on AWS to deploy our MongoDB cluster is not that complicated and can be done in a small series of steps:
Using docker-machine and the ec2 driver
Deploy service discovery (consul) and the docker swarm master instance
Deploy multiple swarm worker nodes
Connect to the swarm master with docker tools
Use compose and swarm to define affinity filters and constraints
Deploy our containers
Configure our cluster with Cloud Manager API
I will now show this in more detail with a small demo. The files I am using during this demo are publicly available and you can actually try this at home.
Firstly, we use docker-machine with the ec2 driver so that docker-machine can create AWS instances.
With this, we create a discovery service container that will then be used by the Swarm master. Once we deploy an instance for the swarm master, we also create 3 or more swarm worker nodes.
Once we have these nodes, we can then connect to the swarm master, which is the main point to interact with our Docker cluster. We can define filters and constraints in the docker compose description and once we have this, we can deploy all the containers across the swarm cluster.
At this point we only need to enable the sharding or replica set configuration. This can be done with manual scripts to automate the task or, as in this case, with the Cloud Manager API, enrolling our MongoDB cluster running on Docker in Cloud Manager with all the benefits of monitoring and automation.
What I will show you now is an example of a swarm cluster up and running. We have our worker nodes and we will deploy MongoDB containers to them, to build a sharded cluster. We will see how the compose files are defined, and how we can define labels and use them to build our cluster.
In summary, what are the advantages of using MongoDB on Docker for you? As we have seen today: having predefined deployment patterns, so that we can deploy reliable architectures and clusters easily, as a single unit.
Docker also allows us to deploy faster, fail faster, and test and reproduce complex architectures, for instance an issue that occurred in production. Some capabilities of Docker cgroups can be quite interesting if you are trying to simulate a specific configuration where one of the containers' disks is much slower.
Additionally, resource management with Docker and cgroups is easier to implement than a plain cgroup configuration, and this will also encourage more people to apply resource limits.
6 – THESE ARE THE ADVANTAGES FOR YOU
In summary, these are some recommendations from our side on why Docker can be really useful in some cases.
Through our partnership with different customers enabling these technologies, you can now also learn how successful customers are using Docker for MongoDB in their environments.