Cloud providers like Amazon or Google offer a great user experience for creating and managing PaaS and IaaS services. But is it possible to reproduce the same experience and flexibility locally, in an on-premises datacenter? This talk describes the success story of building a private cloud on top of a DC/OS cluster. It is used to host and share services like Hadoop or Kafka between development teams, and to dynamically manage services and resource pools with GKE integration.
6. How a resource request works
(Diagram: Dev Team, IT, Infra Team)
The Dev Team asks for a separate Kafka instance:
• request IT to provide a Linux VM
• configure access
• request the Infrastructure Team to set up Kafka
• add monitoring
• add a health check
10. Maintain infrastructure manually
Using Docker allowed us to standardize the deployment process behind a Docker-based interface.
A Docker image for each component had to be prepared and preconfigured for our needs.
11. Maintain infrastructure manually
Cloud engineers have plenty of good examples of what component deployment automation should look like.
The next step was to provide a user experience simple enough to keep this process stupidly simple.
15. It’s all about Jenkins
The short story of how we started to use Mesos
16. Jenkins for CI/CD
• ~300 repositories
• ~500 builds per day
• ~1 build per minute
• ~40 Windows slaves
17. Jenkins for CI/CD
It requires additional interfaces to provide and process information for developers and build engineers.
23. Jenkins for CI/CD
The common infrastructure includes:
• Elastic + Kibana
• Go APIs
• Angular SPAs
• MySQL and Redis
• Zabbix for Jenkins slave monitoring
• SonarQube
28. Jenkins for CI/CD
The most common issues were:
• Out of free disk space
• Service faults
• Network outages
• VM shutdowns
• CPU/MEM overload
• Broken deployed packages
29. Jenkins for CI/CD
And here is the moment you realize that having a lot of pets in your datacenter may not be a good idea.
31. Jenkins for CI/CD
Requirements for infrastructure management:
• Must be distributed
• Supports health checks
• Docker based
• Easy to deploy and maintain
• Can scale
• Supports persistent storage
• User friendly
• Self-recoverable
41. Resource Management in Mesos
Mesos is responsible for running tasks and managing resources on the nodes. When a new task starts, the Mesos master automatically assigns it to a node depending on resource usage.
(Diagram: Marathon + ZooKeeper managing several Mesos agents)
42. Resource Management in Mesos
When an Agent Node is initialized and the Mesos slave is started, it analyzes the available resources and connects to the cluster with a predefined resource pool.
(Diagram: workerA: 4 CPU, 8 GB RAM, 10 GB HDD; workerB: 2 CPU, 4 GB RAM, 10 GB HDD)
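By default the agent auto-detects its resources; they can also be pinned explicitly at startup. A minimal sketch of joining the cluster with a fixed pool (host names and values are assumptions):

# workerA from the diagram: 4 CPU, 8 GB RAM, 10 GB disk (mem/disk are in MB).
# Without --resources, the agent advertises whatever it auto-detects.
mesos-agent --master=zk://master1.company.local:2181/mesos \
            --containerizers=docker,mesos \
            --resources='cpus:4;mem:8192;disk:10240'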
43. Resource Management in Mesos
A new service can be started only with a predefined amount of requested resources. To change this amount, a full service redeployment is needed.
(Diagram: a task requesting 2 CPU, 6 GB RAM, 5 GB HDD next to workerA: 4 CPU, 8 GB RAM, 10 GB HDD and workerB: 2 CPU, 4 GB RAM, 10 GB HDD)
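In Marathon terms, the requested amount is part of the app definition itself (mem and disk are in MB), which is why changing it means redeploying. A sketch of creating such an app via the Marathon REST API; the master URL, image, and app id are assumptions:

# Request 2 CPU, 6 GB RAM and 5 GB disk for a single instance.
curl -X POST http://dcos-master.company.local/marathon/v2/apps \
     -H 'Content-Type: application/json' \
     -d '{
           "id": "/devteam1/my-service",
           "container": {"type": "DOCKER",
                         "docker": {"image": "company/my-service:latest"}},
           "cpus": 2, "mem": 6144, "disk": 5120,
           "instances": 1
         }'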
44. Resource Management in Mesos
When a new deployment starts, the task is placed on an arbitrary Agent Node that has enough free resources.
If resources are lacking, the deployment is put into the Pending state.
(Diagram: the 2 CPU / 6 GB RAM / 5 GB HDD task fits on workerA but not on workerB)
45. Resource Management in Mesos
During service scaling, the resource request and assignment operations are performed for each instance separately.
(Diagram: two instances of a 1 CPU / 1 GB RAM task, each placed independently across workerA and workerB)
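Scaling only changes the instance count on the same app; Marathon then runs the request/assignment cycle once per new instance. A sketch, with the same assumed URL as above:

# Scale to two instances; each one is placed independently.
curl -X PUT http://dcos-master.company.local/marathon/v2/apps/devteam1/my-service \
     -H 'Content-Type: application/json' \
     -d '{"instances": 2}'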
46. Resource Management in Mesos
The killer feature of DC/OS is better resource utilization: each machine can potentially host more than one component.
Health checking, deployment, distribution, and resource management are handled on the DC/OS (Mesos) side.
(Diagram: workerA and workerB sharing multiple deployed task instances)
47. Resource Management in Mesos
DC/OS takes care of health checking both the Agent Nodes and each running service instance.
If a service becomes unhealthy, that service (or all services on a broken node) is redeployed to other nodes.
(Diagram: a failed task instance being redeployed from one worker to another)
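Instance health checks are declared on the app, and Marathon restarts instances that fail them. A sketch of adding an HTTP check to the app above (path and thresholds are assumptions):

# The instance is killed and redeployed after 3 consecutive failed checks.
curl -X PUT http://dcos-master.company.local/marathon/v2/apps/devteam1/my-service \
     -H 'Content-Type: application/json' \
     -d '{
           "healthChecks": [{
             "protocol": "HTTP", "path": "/health", "portIndex": 0,
             "gracePeriodSeconds": 300, "intervalSeconds": 30,
             "timeoutSeconds": 10, "maxConsecutiveFailures": 3
           }]
         }'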
52. Summary
• DC/OS is all about resource usage
• Each Agent Node has a declared resource scope
• Each service has a declared resource request
• DC/OS deploys new tasks only when free resources are available
• An Agent Node is shared between services as long as it has free resources
• Each service and Agent Node has its own health check; on a failed state the task is redeployed
• Bonus: if a master fails, all Agent Nodes and deployed services keep working
54. Hiding Pets behind the Cattle
Managing hardware for a Docker orchestration private cloud
55. DC/OS Deployment – Bootstrap Node
The bootstrap node is used only during the installation and upgrade process, so there are no specific recommendations for high-performance storage or separate mount points.
(Diagram: Bootstrap Node next to three Master nodes)
56. DC/OS Deployment – Master Nodes
Master nodes should be joined into an HA cluster.
(Diagram: Bootstrap Node next to the three-master cluster)

            Minimum     Recommended
Nodes       1*          3 or 5
Processor   4 cores     4 cores
Memory      32 GB RAM   32 GB RAM
Hard disk   120 GB      120 GB
57. DC/OS Deployment – Agent Nodes
Agent Nodes are the worker nodes where tasks and services run. They support Docker or any other Mesos runtime.
(Diagram: Agent Nodes attached to the master cluster)
61. How we started
• Vagrant + VirtualBox
• Mini PCs
• 8 TVs with Mini PCs
• Ubuntu 16.04
• Daily usage: Scrum and monitoring dashboards
• 8 * 2 CPU = 16 CPU
• 8 * 4 GB RAM = 32 GB RAM
• PCs
• VMware
• Google Cloud
62. DC/OS Initial Setup
We started with a simple configuration: one master node and one slave node dedicated to service deployments.
Elastic was the first try: it aggregated a lot of logs and went into a broken state from time to time.
(Diagram: single-master cluster used by DevOps)
63. DC/OS Initial Setup – first customer
The scheme was very simple: when we needed some service running, we requested a VMware machine for it and added it as a DC/OS Agent.
Behind the scenes, all VMs became DC/OS nodes.
(Diagram: master cluster plus a 4 CPU / 8 GB agent VM, used by DevOps)
64. DC/OS Initial Setup – TV boxes
The best move was to use the TV boxes that run the scrum meetings in all the office rooms.
That gave us a lot of free resources just for fun:
10 * 2 CPU = 20 CPU
10 * 4 GB = 40 GB
(Diagram: cluster at 24 CPU / 48 GB)
65. DC/OS Initial Setup – internal services
This setup allowed us to run all the services required for the internal needs of the infrastructure teams... and a little more, like bots for Slack, Sonar, etc.
(Diagram: cluster at 24 CPU / 48 GB, now including a Slack bot)
66. DC/OS Initial Setup – issues
The main issues at this stage were:
• Master Node performance
Master nodes lacked resources, which often caused DC/OS UI or Marathon failures. The temporary solution was a master node restart.
• Agent Node failures
Out of free disk space, machine shutdown, high CPU load, out of memory: the most common reasons for failures.
68. DC/OS Initial Setup – Cluster
With more than one Agent, the single master became the weak point:
• In case of failure, the whole system goes down
• Lack of performance
(Diagram: master expanded into a three-node cluster, at 24 CPU / 48 GB)
70. DC/OS Initial Setup – more VMs
With this setup, DC/OS became ready to take external requests.
(Diagram: three-master cluster at 40 CPU / 60 GB, used by a Dev team)
71. DC/OS Initial Setup – Hardware PCs
A few dedicated PCs joined the cluster in the worker Agent role.
(Diagram: cluster at 60 CPU / 90 GB, used by a Dev team)
72. DC/OS Initial Setup – Self-Bootstrap
To allow new nodes to be set up automatically without logging in, Chef was added.
Running as part of DC/OS Services, Chef lets us bootstrap new nodes much more quickly.
(Diagram: cluster at 60 CPU / 90 GB)
73. DC/OS Initial Setup – Google Cloud
Creating a scaling group in the cloud makes the DC/OS cluster effectively unlimited in resources.
(Diagram: cluster at 100 CPU / 160 GB, including a cloud unit)
74. Summary
• The number of nodes grew gradually
• At first, the Infra Team used DC/OS only for its own needs
• Monitoring and bootstrapping the cluster required some extra resources: Zabbix and Chef
• Mixing different node types increased flexibility and accelerated growth
• Adding Google Cloud instances eliminated the cluster size limit; with a hybrid cloud, DC/OS can grow much more quickly
75. Docker as a Service
Available Docker images and use cases for a shared infrastructure cluster
76. Who are You – Mr. Microservice?
Meet Mr. Microservice. He is written in .NET Standard/C#, self-hostable, and HTTP/REST friendly.
(Diagram: Mr. Microservice)
77. Who are You – Mr. Microservice?
Mr. Microservice uses Consul for service discovery. To find his friends and call them in an HA way, he uses the Fabio load balancer, as sketched below.
(Diagram: Mr. Microservice + Consul + Fabio-lb: discovery)
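Fabio builds its routing table from Consul: a service registered with a urlprefix- tag is picked up automatically. A sketch of registering Mr. Microservice with the local Consul agent (service name, port, and path are assumptions):

# Register the service with a health check; Fabio routes /mr-microservice to it.
curl -X PUT http://localhost:8500/v1/agent/service/register \
     -H 'Content-Type: application/json' \
     -d '{
           "Name": "mr-microservice",
           "Port": 5000,
           "Tags": ["urlprefix-/mr-microservice"],
           "Check": {"HTTP": "http://localhost:5000/health", "Interval": "10s"}
         }'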
78. Who are You – Mr. Microservice?
Some of Mr. Microservice's old friends prefer to talk through RabbitMQ (using RPC).
(Diagram: Mr. Microservice + Consul + Fabio-lb: events, discovery)
79. Who are You – Mr. Microservice?
Shipping logs in a high-load system is a real art. Kafka helps make this contraption much simpler and more stable.
Unfortunately, users will not be happy searching log records with kafka-console-consumer.sh; Elastic and Kibana are the right tools for dealing with logs.
(Diagram: Mr. Microservice + Consul + Fabio-lb: logs, events, discovery)
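For context, searching logs straight from Kafka looks roughly like this, which is exactly what Kibana spares users from (broker address and topic name are assumptions):

# Grepping a whole topic from the beginning: workable once, painful daily.
./kafka-console-consumer.sh --bootstrap-server kafka.marathon.mesos:9092 \
                            --topic service-logs --from-beginning | grep ERROR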
80. Who are You – Mr. Microservice?
In a large distributed system it is hard to make proper scaling decisions without knowing some of the system internals.
Collecting and analyzing metrics is the best approach to gathering runtime data continuously. InfluxDB and Grafana are among the popular tools for this.
(Diagram: Mr. Microservice + Consul + Fabio-lb: logs, metrics, events, discovery)
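Pushing a metric point into InfluxDB is one HTTP call in its line protocol; a sketch against the 1.x write endpoint (host, database, and measurement names are assumptions):

# One point: measurement, a tag, a field value; the timestamp defaults to "now".
curl -X POST 'http://influxdb.marathon.mesos:8086/write?db=metrics' \
     --data-binary 'request_latency,service=mr-microservice value=0.042'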
81. Who are You – Mr. Microservice?
Zipkin is a great tool for collecting runtime traces, enabled through the microservice's configuration.
(Diagram: Mr. Microservice + Consul + Fabio-lb: logs, metrics, trace, events, discovery)
82. Who are You – Mr. Microservice?
A distributed cache is something you really need in a high-load system.
Aerospike is one of the possible solutions.
(Diagram: as above, with cache added)
83. Who are You – Mr. Microservice?
Oh, your Mr. Microservice has its own state? Greetings, here is your MongoDB / MySQL / PostgreSQL / Cassandra!
(Diagram: as above, with state added)
84. Who are You – Mr. Microservice?
HashiCorp Vault is good enough for securing configuration settings, but it also requires some initial infrastructure setup.
(Diagram: as above, with config added)
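Once Vault is initialized and a token is issued, reading and writing config secrets is plain HTTP; a sketch against a KV v1 mount (host, secret path, and token variable are assumptions):

# Store a setting, then read it back.
curl -H "X-Vault-Token: $VAULT_TOKEN" -X POST \
     http://vault.marathon.mesos:8200/v1/secret/mr-microservice \
     -d '{"db_password": "s3cr3t"}'
curl -H "X-Vault-Token: $VAULT_TOKEN" \
     http://vault.marathon.mesos:8200/v1/secret/mr-microservice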
85. Who are You – Mr. Microservice?
But sometimes all of this is a bit too specific, even for experienced DevOps engineers.
(Diagram: the full picture: discovery, events, logs, metrics, trace, cache, state, config)
88. Solution 1 – add services quickly
The service catalog lets you choose a service from a predefined list and deploy it in one click.
If needed, your own repository can be added.
89. Solution 1 – add services quickly
For all the remaining cases, manual service deployment is available (a sketch of option 1 follows the list):
1. Single container (Docker)
2. Bash runtime
3. Multi-container
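A single-container deployment is just a Marathon app with a Docker section; hostPort 0 asks DC/OS to pick the port automatically. A sketch (master URL, image, and app id are assumptions):

curl -X POST http://dcos-master.company.local/marathon/v2/apps \
     -H 'Content-Type: application/json' \
     -d '{
           "id": "/devteam1/nginx",
           "cpus": 0.5, "mem": 256, "instances": 1,
           "container": {
             "type": "DOCKER",
             "docker": {
               "image": "nginx:alpine", "network": "BRIDGE",
               "portMappings": [{"containerPort": 80, "hostPort": 0}]
             }
           }
         }'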
90. Solution 2 – Folder Structure
Services can be organized in a folder structure. This feature makes it possible to isolate environments for different dev teams.
91. Solution 3 – Mesos DNS vs Marathon-LB
Mesos DNS, integrated with the company DNS server, allows accessing each service directly by Agent IP/port.
Marathon-LB is an L4 load balancer that routes requests to the target Agent IP/port through HAProxy.
92. DNS and LB – Services access
The same service can be deployed under different names and at different nesting levels. DC/OS takes care of assigning ports automatically.
(Diagram: AgentA 192.168.101.21 and AgentB 192.168.101.26, with consul at 192.168.101.21:15632, consul under devteam1 at 192.168.101.21:15444, consul under devteam2 at 192.168.101.26:15327, and Marathon-LB at 192.168.101.26:15121)
93. DNS and LB – Services access
dig consul.marathon.mesos
It is easy to integrate Mesos DNS with your company DNS servers.
(Diagram: as above)
94. DNS and LB – Services access
dig consul.devteam2.marathon.mesos
All running services launched on DC/OS get an FQDN based upon the service that launched them, in the form <service-name>.<group-name>.<framework-name>.mesos.
(Diagram: as above)
95. DNS and LB – Services access
curl consul.marathon.mesos:15632
But to access a service over TCP or HTTP, the client needs to know the port of the running service instance. Ports can be assigned automatically (the default) or manually.
(Diagram: as above)
96. DNS and LB – Services access
curl consul.devteam2.marathon.mesos:15327
This approach has one significant limitation: you always have to take care of the ports yourself.
(Diagram: as above)
97. DNS and LB – Services access
dig consul.devteam1.company.local
Another approach is to use an internal Marathon-LB and manage each service manually on the DNS server.
(Diagram: DNS maps consul.devteam1.company.local to 192.168.101.26, where Marathon-LB runs)
98. DNS and LB – Services access
curl consul.devteam1.company.local:80
In this case, Marathon-LB takes care of the port redirection automatically (here, forwarding to the consul instance on port 15444); the labels that drive this are sketched below.
(Diagram: DNS, then Marathon-LB at 192.168.101.26, then consul on port 15444)
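Marathon-LB decides what to expose from app labels; a sketch of the labels that would put the devteam1 consul behind the virtual host above (the HAProxy group name is an assumption):

# HAPROXY_0_VHOST maps the app's first port to the given host name on port 80.
curl -X PUT http://dcos-master.company.local/marathon/v2/apps/devteam1/consul \
     -H 'Content-Type: application/json' \
     -d '{
           "labels": {
             "HAPROXY_GROUP": "internal",
             "HAPROXY_0_VHOST": "consul.devteam1.company.local"
           }
         }'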
99. Summary
• DC/OS allows building complex DEV/UAT environments based on Docker infrastructure
• The simplest deployment path is the Universe catalog, with well-known services deployed in one click
• Each service can be placed in a separate folder
• Mesos DNS includes the full folder structure in the service DNS name
• Marathon-LB can proxy any external call through HAProxy to the target service instance (translating IP and port)