The Expendables
Maxime Fouilleul
Lead Database Engineer
Scalability via Expendable
Resources: Containers at
BlaBlaCar
M|18, Feb 27, 2018
Today’s
agenda
BlaBlaCar - Facts & Figures
Infrastructure Ecosystem - 100% containers powered carpooling
Backend High Availability Pillars - MariaDB as an example
Database as a Service - Building a frictionless infrastructure
What’s next?
BlaBlaCar
Facts & Figures
60 million
members
Founded
in 2006
1 million tonnes
less CO2
In the past year
30 million mobile
app downloads
iPhone and Android
5 million
monthly travellers
Currently in
22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania,
Germany, Belgium, India, Mexico, The Netherlands, Luxembourg,
Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
Facts and Figures
Our prod data ecosystem
MariaDB (Transactional): 20 clusters, 55 nodes, 40K reads/s
Cassandra (Distributed): 6 clusters, 32 nodes, 3K reads/s
Redis (Volatile): 17 clusters, 51 nodes, 40K reads/s
ElasticSearch (Search): 11 clusters, 65 nodes, 1K searches/s
PostgreSQL (Spatial): 4 clusters, 14 nodes, 3K reads/s
Infrastructure Ecosystem
100% containers powered
carpooling
Infrastructure Ecosystem
Hardware: bare-metal servers, 1 type of hardware, 3 disk profiles
CoreOS: fleet cluster (fleet + etcd, a “Distributed init system”)
Service Codebase, dgr, ggn, Container Registry, rkt PODs: create, build, store, run, host
Backend pod (mysql-main1): mysqld, monitoring, nerve
Client pod (front1): php, nginx, nerve, monitoring, synapse
Service Discovery: zookeeper, with nerve and synapse
Service Discovery
go-nerve does health checks and reports to Zookeeper in service keys (/database/node1 for node1).
go-synapse watches Zookeeper service keys (/database) and reloads HAProxy if changes are detected.
Applications hit their local HAProxy to access backends.
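The report / watch / reload loop described above can be sketched as a toy, in-memory model. This is not go-nerve or go-synapse code: the `Registry` class stands in for Zookeeper, and `report`, `render_haproxy_servers`, the payload fields kept here and the choice to delete the key of an unhealthy node are all assumptions made for illustration.

```python
import json


class Registry:
    """In-memory stand-in for Zookeeper service keys (illustrative only)."""

    def __init__(self):
        self.keys = {}

    def set(self, path, payload):
        self.keys[path] = json.dumps(payload)

    def delete(self, path):
        self.keys.pop(path, None)

    def children(self, prefix):
        # Return the payloads stored under a service path, sorted by key.
        prefix = prefix.rstrip("/") + "/"
        return [json.loads(v) for k, v in sorted(self.keys.items()) if k.startswith(prefix)]


def report(registry, service_path, name, host, port, healthy):
    """Nerve side: publish the node while its check passes, remove it otherwise."""
    key = f"{service_path}/{name}"
    if healthy:
        registry.set(key, {"available": True, "host": host, "port": port, "name": name})
    else:
        registry.delete(key)


def render_haproxy_servers(registry, service_path):
    """Synapse side: turn the watched keys into HAProxy 'server' lines."""
    return [
        f"server {node['name']} {node['host']}:{node['port']} check"
        for node in registry.children(service_path)
    ]


registry = Registry()
report(registry, "/database", "node1", "192.168.1.2", 3306, healthy=True)
report(registry, "/database", "node2", "192.168.1.3", 3306, healthy=True)
before = render_haproxy_servers(registry, "/database")

# node2 fails its health check: its key disappears and the rendered
# server list changes, which is the point where HAProxy would be reloaded.
report(registry, "/database", "node2", "192.168.1.3", 3306, healthy=False)
after = render_haproxy_servers(registry, "/database")
print(before)
print(after)
```

Comparing `before` and `after` is the whole trick: the router only needs to act when the rendered list differs from the previous one.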
Backend High Availability Pillars
MariaDB as an example
Abolish Slavery
Everyone's the same
Asynchronous vs. Synchronous
Asynchronous: Master with Slaves
Synchronous: MariaDB Cluster (wsrep between all nodes)
MariaDB Cluster means
No Single Point of
Failure
No Replication Lag
Auto State Transfers
As fast as the slowest node
MySQL at BlaBlaCar?
MariaDB Cluster (wsrep)
Our prerequisites are Containers
Writes go to one node
Reads are balanced across the others
# zookeepercli -c lsr /services/mysql/main
mysql-main1_192.168.1.2_ba0f1f8b
mysql-main2_192.168.1.3_734d63da
mysql-main3_192.168.1.4_dde45787
# zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b3
{
"available":true,
"host":"192.168.1.2",
"port":3306,
"name":"mysql-main1",
"weight":255,
"labels":{
"host":"r10-srv4"
}
}
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
Nerve - Track and report service status
# cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
---
override:
  tripsearch:
    database:
      read:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_rd
        port: 3307
      write:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_wr
        port: 3308
Synapse - Service discovery router
# cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
---
override:
  synapse:
    services:
      - name: mysql-main_read
        path: /services/mysql/main
        port: 3307
        serverCorrelation:
          type: excludeServer
          otherServiceName: mysql-main_write
          scope: first
      - name: mysql-main_write
        path: /services/mysql/main
        port: 3308
        serverOptions: backup
        serverSort: date
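One way to read the two synapse services above: the write pool lists every node as an HAProxy `backup` server sorted by date, so only the first one takes traffic (HAProxy uses the first operational backup server unless `option allbackups` is set), and the read pool excludes that first server. The sketch below models that reading only. It is not go-synapse code, and `Server`, `registered_at` and the oldest-first sort order are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    registered_at: int  # hypothetical registration time backing "serverSort: date"


def by_date(servers):
    # Assumed ordering: oldest registration first.
    return sorted(servers, key=lambda s: s.registered_at)


def write_pool(servers):
    """Every server carries the 'backup' option, ordered by date."""
    return [(s.name, "backup") for s in by_date(servers)]


def active_writer(servers):
    """With only backup servers, HAProxy sends traffic to the first operational one."""
    return write_pool(servers)[0][0]


def read_pool(servers):
    """'excludeServer' with scope 'first': drop the writer, balance on the others."""
    writer = active_writer(servers)
    return [s.name for s in by_date(servers) if s.name != writer]


servers = [
    Server("mysql-main2", registered_at=20),
    Server("mysql-main1", registered_at=10),
    Server("mysql-main3", registered_at=30),
]
print(active_writer(servers))
print(read_pool(servers))
```

Sorting both pools with the same key keeps the writer stable across clients, which matters because each client pod computes its routing locally.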
Be Quiet!
Come gently into prod
Service Discovery weight system
Nerve’s checks are OK
The service is reported with a current weight of 1/255.
Warmup is triggered
The current weight is increased following a weighted Fibonacci sequence.
If enableCheckStableCommand is set
The command is run at each increase; if it returns a non-zero code, the current weight restarts from 1.
Weight value is reached
The service is fully in production.
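The warmup steps above can be sketched as a small loop. This is a simplified model, not go-nerve's algorithm: it uses a plain Fibonacci ramp capped at the maximum weight, so it will not reproduce the exact "weighted" values nerve reports, and `warmup`, `fibonacci_weights` and `is_stable` are hypothetical names. `is_stable` plays the role of enableCheckStableCommand, with `True` meaning exit code 0.

```python
def fibonacci_weights(max_weight=255):
    """Yield 1, 2, 3, 5, 8, ... and finish on max_weight."""
    a, b = 1, 2
    while a < max_weight:
        yield a
        a, b = b, a + b
    yield max_weight


def warmup(is_stable, max_weight=255, max_steps=100):
    """Return the weights reported during warmup, in order."""
    history = []
    steps = fibonacci_weights(max_weight)
    while len(history) < max_steps:
        weight = next(steps)
        history.append(weight)
        if weight == max_weight:
            break  # target weight reached: fully in production
        if not is_stable(weight):
            steps = fibonacci_weights(max_weight)  # restart the ramp from 1
    return history


# Simulate one unstable check the first time weight 13 is reached.
pending_failures = {13}


def flaky_check(weight):
    if weight in pending_failures:
        pending_failures.discard(weight)
        return False
    return True


history = warmup(flaky_check)
print(history)
```

The `max_steps` guard is there so that a check which never stabilises cannot keep the loop running forever in this sketch.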
go-nerve: call API on /enable or /weight/:weight
Zookeeper: store current weight
go-synapse: update weight on HAProxy via socket
HAProxy: set weight <backend>/<server> <weight>
# cat /report_slow_queries.sh
#!/dgr/bin/busybox sh
. /dgr/bin/functions.sh
isLevelEnabled "debug" && set -x
slwq=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' AND LOWER(command) <> 'sleep' AND time > 1" -BN)
if [ $? -eq 0 ] && [ $slwq -eq 0 ]; then
return 0
else
return 1
fi
MySQL’s warm up in nerve
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
        enableCheckStableCommand: ["/report_slow_queries.sh"]
MySQL’s warm up in nerve
bbc mysql prod-dc1 mysql-main mysql-main1 monitor
#1 Weight: 1/255 Processes: 0 Slow: 0
#2 Weight: 2/255 Processes: 0 Slow: 0
#3 Weight: 3/255 Processes: 3 Slow: 0
#4 Weight: 5/255 Processes: 7 Slow: 0
#5 Weight: 5/255 Processes: 10 Slow: 0
#6 Weight: 8/255 Processes: 12 Slow: 0
#7 Weight: 13/255 Processes: 20 Slow: 1 <- SLOW !
#8 Weight: 1/255 Processes: 20 Slow: 1
#9 Weight: 2/255 Processes: 12 Slow: 0
#10 Weight: 3/255 Processes: 4 Slow: 0
#11 Weight: 5/255 Processes: 7 Slow: 0
#12 Weight: 8/255 Processes: 10 Slow: 0
#13 Weight: 13/255 Processes: 12 Slow: 0
#14 Weight: 15/255 Processes: 20 Slow: 0
#15 Weight: 23/255 Processes: 35 Slow: 0
#16 Weight: 38/255 Processes: 40 Slow: 0
#17 Weight: 38/255 Processes: 35 Slow: 0
#18 Weight: 61/255 Processes: 36 Slow: 0
#19 Weight: 61/255 Processes: 47 Slow: 0
#20 Weight: 98/255 Processes: 44 Slow: 0
#21 Weight: 98/255 Processes: 41 Slow: 0
#22 Weight: 158/255 Processes: 38 Slow: 0
#23 Weight: 158/255 Processes: 50 Slow: 0
#24 Weight: 255/255 Processes: 46 Slow: 0 <- FULL POWER !
#25 Weight: 255/255 Processes: 46 Slow: 0
Die in Peace...
Get out when you are
ready
Gracefully Disabling Pipeline
Call /disable on Nerve’s API
Set weight to 0: no new sessions will go to the service.
If disableGracefullyDoneCommand is set
This command is run in a loop until it returns 0.
The /disable API call returns
The service can be shut down without risk.
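The disabling pipeline can be sketched as follows. This is a minimal model under stated assumptions, not go-nerve's implementation: `disable_gracefully`, `set_weight` and `done_command` are hypothetical names, and `done_command` stands in for disableGracefullyDoneCommand by returning an exit code.

```python
import time


def disable_gracefully(set_weight, done_command=None, poll_seconds=1.0, sleep=time.sleep):
    """Drain a service before shutdown and return how many times the command ran.

    set_weight(0) stops new sessions from arriving; done_command() is polled
    until it returns 0, meaning no session is left on the node.
    """
    set_weight(0)
    attempts = 0
    if done_command is not None:
        while True:
            attempts += 1
            if done_command() == 0:
                break
            sleep(poll_seconds)
    return attempts


# Simulated session counts seen by three successive runs of the command.
remaining = [3, 1, 0]
weights = []
attempts = disable_gracefully(
    set_weight=weights.append,
    done_command=lambda: 0 if remaining.pop(0) == 0 else 1,
    sleep=lambda _seconds: None,  # no real waiting in the example
)
print(weights, attempts)
```

Only once this function returns would the caller consider the service safe to stop.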
# cat /report_remaining_processes.sh
#!/dgr/bin/busybox sh
. /dgr/bin/functions.sh
isLevelEnabled "debug" && set -x
procs=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' OR user LIKE '%wr'" -BN)
if [ $? -eq 0 ] && [ $procs -eq 0 ]; then
return 0
else
return 1
fi
MySQL’s graceful shutdown in nerve
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
        enableCheckStableCommand: ["/report_slow_queries.sh"]
        disableGracefullyDoneCommand: ["/root/report_remaining_processes.sh"]
Backend High Availability Pillars
Abolish Slavery - Every node is the same: no master/slave, auto state transfers, service discovery (nerve/synapse)
Be Quiet! - Come gently into prod: graceful restart, service discovery (nerve/synapse), weight system, slow query tracking
Die in Peace... - Get out when you are ready: graceful restart, service discovery (nerve/synapse), weight system
Database as a Service
Building a frictionless infrastructure
Easy deployment
Pull Request on a services
repository
No technical parameters to
override
The services are auto initialized
Easy deployment with GGN
$ tree env/prod-dc1/services/mysql-main
env/prod-dc1/services/mysql-main
├── attributes
│ ├── galera.yml
│ ├── innodb.yml
│ └── nerve.yml
├── service-manifest.yml
└── unit.tmpl
1 directory, 5 files
$ cat env/prod-dc1/services/mysql-main/service-manifest.yml
containers:
  - aci.blbl.cr/pod-mysql:10.1-32
nodes:
  - hostname: "mysql-main1"
    ip: "192.168.1.1"
    fleet:
      - MachineMetadata=name=r11-srv1
  - hostname: "mysql-main2"
    ip: "192.168.1.2"
    fleet:
      - MachineMetadata=name=r12-srv2
  - hostname: "mysql-main3"
    ip: "192.168.1.3"
    fleet:
      - MachineMetadata=name=r13-srv3
$ cat env/prod-dc1/services/mysql-main/attributes/galera.yml
---
override:
  mariadb:
    galera:
      wsrep_cluster_name: "prod-dc1_main"
$ cat env/prod-dc1/services/mysql-main/attributes/innodb.yml
---
override:
  mariadb:
    innodb:
      innodb_log_file_size: "1G"
      innodb_buffer_pool_size: "4G"
Easy deployment
$ cat env/prod-dc1/services/mysql-main/unit.tmpl
[Unit]
Description=pod-mysql {{.hostname}}
[Service]
{{- template "env-fleet" .}}
{{ template "rkt-pre-start" . -}}
{{ template "rkt-post-stop" . }}
ExecStartPre=/usr/bin/mkdir -p /mnt/sdb1/{{.hostname}}/log
{{ template "rkt-run-options" . -}}
--volume=mysql-data,kind=host,source=/mnt/sdb1/{{.hostname}} 
--volume=mysql-log,kind=host,source=/mnt/sdb1/{{.hostname}}/log 
{{.acis}}
{{- template "x-fleet" . }}
# ggn prod-dc1 mysql-main update -y
Deploy the service with GGN (github.com/blablacar/ggn)
GGN generates systemd units from templates and sends them to the environment using fleet.
Easy Monitoring &
Alerting
Service Oriented Monitoring
The monitoring platform is
plugged into the service
discovery
Pager Duty
Incidents Manager
Grafana
Beautiful Visualizations
Prometheus
Smart Monitoring
Nerve
Service Discovery
Easy Monitoring & Alerting
Prometheus with Nerve integration
$ cat pod-mysql/pod-manifest.yml
name: aci.blbl.cr/pod-mysql:10.1-33
pod:
  apps:
    - dependencies:
        - aci.blbl.cr/aci-mariadb:10.1-29
      app:
        mountPoints:
          - {name: mysql-data, path: /var/lib/mysql}
          - {name: mysql-log, path: /var/log/mysql}
    - name: aci-nerve
      dependencies:
        - aci.blbl.cr/aci-go-nerve:21-23
        - aci.blbl.cr/aci-mariadb:10.1-29
    - dependencies:
        - aci.blbl.cr/aci-prometheus-mysql-exporter:0.10.0-1
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
---
override:
  nerve:
    services:
      - name: "{{.hostname}}"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
      - name: "{{.hostname}}_prometheus"
        port: 9104
        reporters:
          - {type: zookeeper, path: /monitoring/mysql/main}
# curl mysql-main1.prod.dc1.com:9104/metrics | head
# HELP mysql_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from MySQL.
# TYPE mysql_exporter_last_scrape_duration_seconds gauge
mysql_exporter_last_scrape_duration_seconds 0.056807316
# HELP mysql_exporter_last_scrape_error Whether the last scrape of metrics from MySQL resulted in an error (1 for error, 0 for success).
# TYPE mysql_exporter_last_scrape_error gauge
mysql_exporter_last_scrape_error 0
[...]
# cat env/prod-dc1/services/prometheus/attributes/prometheus.yml
[...]
ranged_targets:
- type: zk
job_name: discovery_prod-dc1
scrape_interval: 20s
metrics_path: /metrics
zk:
hosts: '{{ toJson .zk.hosts }}'
zkpaths:
- /monitoring
[...]
Prometheus relabeling
# [zk: localhost:2181(CONNECTED) 1] get /monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b
{"available":true,"host":"192.168.1.2","port":9104,"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}
We push services info with Nerve into Zookeeper
And Prometheus does the magic
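What "the magic" might look like can be sketched by turning the Nerve payload shown above into a scrape target. The output is shaped like a Prometheus file-based service discovery entry purely for illustration. How BlaBlaCar's Prometheus actually relabels Zookeeper nodes is not shown on the slides, so `to_target` and the way the `service` label is derived from the Zookeeper path are assumptions.

```python
import json

ZK_PATH = "/monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b"
PAYLOAD = (
    '{"available":true,"host":"192.168.1.2","port":9104,'
    '"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}'
)


def to_target(zk_path, raw):
    """Build a scrape target from one Zookeeper node, or None if unavailable."""
    node = json.loads(raw)
    if not node.get("available"):
        return None
    # "/monitoring/mysql/main/<leaf>" -> kind "mysql", service "main" (assumed layout)
    _, _root, kind, service, _leaf = zk_path.split("/", 4)
    labels = {"name": node["name"], "service": f"{kind}-{service}"}
    labels.update(node.get("labels", {}))
    return {"targets": [f"{node['host']}:{node['port']}"], "labels": labels}


target = to_target(ZK_PATH, PAYLOAD)
print(json.dumps(target, indent=2))
```

Carrying `name` and `service` as labels is what lets alert annotations and dashboard URLs refer to the node and cluster without any per-database Prometheus configuration.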
$ cat prometheus-rules/alert.mysql.rules
# Alert: Galera node state is not synced.
ALERT MySQLGaleraStateIsNotSynced
  IF (mysql_global_status_wsrep_local_state != 4 AND mysql_global_variables_wsrep_desync == 0)
  FOR 2m
  LABELS {
    severity = "warning", team="data_infrastructure"
  }
  ANNOTATIONS {
    summary = "Galera node {{ $labels.name }} state is not in “Synced” (state={{$value}}).",
    dashboard = "https://promgrafana.blabla.com/dashboard/db/mysql-cluster-view?var-cluster={{$labels.service}}&var-ds=prom-dc1&from=now-1h&to=now",
    runbook="https://ops-run-book.blabla.com/mysql/operational-tasks#MySQLGaleraOutOfSync",
  }
Alerting
PromQL to find out
unhealthy services
Labeling for routing to
Slack & Pager Duty
Annotations with
templating to have clear
descriptions, URL to
dashboards and ops
runbooks
Easy troubleshooting
Do the basic health checks
quickly
In real time
Avoiding human
mistakes/errors
A set of bash scripts
Do the basic health checks quickly
Easy troubleshooting with “bbc” command
Manage all backends
the same way
Can be used by non-
specialists
Plugged into the
service discovery
Designed for our
needs
# bbc mysql list
pp-dc2 mysql-main
pp-dc2 mysql-user
pp-dc2 mysql-trip
pp-dc2 mysql-payment
prod-dc1 mysql-main
prod-dc1 mysql-user
prod-dc1 mysql-trip
prod-dc1 mysql-payment
[...]
bbc command examples
# bbc mysql overview prod-dc1 mysql-main
=== Service Overview 'prod-dc1 mysql-main' ===
mysql-main1 (192.168.1.1) PING, PORT, Synced
---
mysql-main1 (3306) - enabled - weight = 255/255
mysql-main1_prometheus (9104) - enabled - weight = 255/255
mysql-main2 (192.168.1.2) PING, PORT, Synced
---
mysql-main2 (3306) - enabled - weight = 255/255
mysql-main2_prometheus (9104) - enabled - weight = 255/255
mysql-main3 (192.168.1.3) PING, PORT, Synced
---
mysql-main3 (3306) - enabled - weight = 255/255
mysql-main3_prometheus (9104) - enabled - weight = 255/255
# bbc mysql connect prod-dc1 mysql-main
env: prod-dc1
service: mysql-main
host: mysql-main1
ip: 192.168.1.1
Enter the username [ENTER]: team_data
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2887129
Server version: 10.1.28-MariaDB-1~jessie mariadb.org binary distribution
Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]>
# bbc mysql monitor prod-dc1 mysql-main mysql-main1
Weight: 255/255 Processes: 88 Slow: 0
Weight: 255/255 Processes: 75 Slow: 0
Weight: 255/255 Processes: 89 Slow: 0
Weight: 255/255 Processes: 99 Slow: 0
Weight: 255/255 Processes: 79 Slow: 0
Weight: 255/255 Processes: 65 Slow: 0
Weight: 255/255 Processes: 86 Slow: 0
Weight: 255/255 Processes: 93 Slow: 0
Weight: 255/255 Processes: 88 Slow: 0
Weight: 255/255 Processes: 96 Slow: 0
Weight: 255/255 Processes: 77 Slow: 0
Weight: 255/255 Processes: 73 Slow: 0
# bbc postgresql overview prod-dc1 postgresql-corridoring
Service Overview 'prod-dc1 postgresql-corridoring'
-- USING BDR --
postgresql-corridoring1 (192.168.1.10) PING, PORT
postgresql-corridoring2 (192.168.1.11) PING, PORT
postgresql-corridoring3 (192.168.1.12) PING, PORT
postgresql-corridoring4 (192.168.1.13) PING, PORT
postgresql-corridoring5 (192.168.1.14) PING, PORT
# bbc postgresql list
pp-dc2 postgresql-airflow
pp-dc2 postgresql-corridoring
pp-dc2 postgresql-redash
pp-dc2 postgresql-trip-pricing
prod-dc1 postgresql-corridoring
prod-dc1 postgresql-redash
bbc command examples
# bbc postgresql connect prod-dc1 postgresql-corridoring
env: prod-dc1
service: postgresql-corridoring - database : corridoring
host: postgresql-corridoring1
ip: 192.168.1.10
Enter the username [ENTER]: team_data
Password for user team_arch:
psql (9.6.6, server 9.4.12)
Type "help" for help.
corridoring=#
# bbc redis overview prod-dc1 redis-main
=== Service 'prod-dc1' 'redis-main' ===
Redis elector master: redis-main1.prod.dc-1.blabla.com
redis-main1 (192.168.1.20): PING, PORT, role:master, clients:255
redis-main2 (192.168.1.21): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
redis-main3 (192.168.1.22): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
# bbc redis list
pp-dc2 redis-main
pp-dc2 redis-quota
pp-dc2 redis-translation
pp-dc2 redis-user
prod-dc1 redis-main
prod-dc1 redis-quota
# bbc redis connect prod-dc1 redis-main
env: prod-dc1
service: redis-main
host: redis-main1
ip: 192.168.1.20
role: slave
192.168.1.20:6379>
# bbc cassandra ping prod-dc1 cassandra-user
cassandra-user1 (192.168.1.30) PING, CQL, JMX
---
cassandra-user2 (192.168.1.31) PING, CQL, JMX
---
cassandra-user3 (192.168.1.32) PING, CQL, JMX
---
bbc command examples
# bbc cassandra overview prod-dc1 cassandra-user
=== Service 'prod-dc1 cassandra-user' ===
Datacenter: prod-dc1
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.1.30 6.01 GB 256 33.3% bef39dd5-d4e5-4733-93e5-75904b6d556a r10
UN 192.168.1.31 5.89 GB 256 33.3% 23b77937-2177-4638-b860-e73e4bb913d2 r10
UN 192.168.1.32 5.12 GB 256 33.3% de0f4ed1-1241-499d-9485-e73e4bb913d2 r10
Datacenter: prod-dc2
====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 192.168.2.10 15.69 GB 256 100.0% 3ca1e862-f3e2-4fbf-a6c1-4d7d5a3e70ec r14
UN 192.168.2.11 14.99 GB 256 100.0% de0f4ed1-1241-499d-9485-2f8196aa7425 r13
UN 192.168.2.12 16.1 GB 256 100.0% 7e5fee00-052f-4546-973d-befaebbe604b r15
Today, 32 subcommands are available on bbc...
What’s next?
Moving to Kubernetes
From a simple
“Distributed init
system” to the
standard for container
orchestration.
Fleet is deprecated
Fleet is no longer
developed and
maintained by
CoreOS.
What does
the future
look like?
Ownership
Move backend
ownership to the
developer teams.
Moving to the cloud?
Extend this idea of
“expendable” services to
hardware resources.
Docker?
Kubernetes + rkt
(rktnetes, rktlet) has
poor adoption.
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar
M|18 Scalability via Expendable Resources: Containers at BlaBlaCar

Mais conteúdo relacionado

Mais procurados

Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterWebseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
MariaDB Corporation
 

Mais procurados (20)

Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise ClusterWebseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
Webseminar: MariaDB Enterprise und MariaDB Enterprise Cluster
 
What’s new in Galera 4
What’s new in Galera 4What’s new in Galera 4
What’s new in Galera 4
 
MariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloadsMariaDB Platform for hybrid transactional/analytical workloads
MariaDB Platform for hybrid transactional/analytical workloads
 
Configuring workload-based storage and topologies
Configuring workload-based storage and topologiesConfiguring workload-based storage and topologies
Configuring workload-based storage and topologies
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
How THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scaleHow THINQ runs both transactions and analytics at scale
How THINQ runs both transactions and analytics at scale
 
How we switched to columnar at SpendHQ
How we switched to columnar at SpendHQHow we switched to columnar at SpendHQ
How we switched to columnar at SpendHQ
 
M|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real WorldM|18 Writing Stored Procedures in the Real World
M|18 Writing Stored Procedures in the Real World
 
How QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it fasterHow QBerg scaled to store data longer, query it faster
How QBerg scaled to store data longer, query it faster
 
Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®Global Data Replication with Galera for Ansell Guardian®
Global Data Replication with Galera for Ansell Guardian®
 
M|18 Creating a Reference Architecture for High Availability at Nokia
M|18 Creating a Reference Architecture for High Availability at NokiaM|18 Creating a Reference Architecture for High Availability at Nokia
M|18 Creating a Reference Architecture for High Availability at Nokia
 
NewSQL overview, Feb 2015
NewSQL overview, Feb 2015NewSQL overview, Feb 2015
NewSQL overview, Feb 2015
 
Extending MariaDB with user-defined functions
Extending MariaDB with user-defined functionsExtending MariaDB with user-defined functions
Extending MariaDB with user-defined functions
 
MariaDB und mehr - MariaDB Roadshow Summer 2014 Hamburg Berlin Frankfurt
MariaDB und mehr - MariaDB Roadshow Summer 2014 Hamburg Berlin FrankfurtMariaDB und mehr - MariaDB Roadshow Summer 2014 Hamburg Berlin Frankfurt
MariaDB und mehr - MariaDB Roadshow Summer 2014 Hamburg Berlin Frankfurt
 
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in GoScylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
Scylla Summit 2016: Using ScyllaDB for a Microservice-based Pipeline in Go
 
MaxScale - The Pluggable Router
MaxScale - The Pluggable RouterMaxScale - The Pluggable Router
MaxScale - The Pluggable Router
 
M|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX PlatformM|18 Analyzing Data with the MariaDB AX Platform
M|18 Analyzing Data with the MariaDB AX Platform
 
Deploying MariaDB databases with containers at Nokia Networks
Deploying MariaDB databases with containers at Nokia NetworksDeploying MariaDB databases with containers at Nokia Networks
Deploying MariaDB databases with containers at Nokia Networks
 
When is Myrocks good? 2020 Webinar Series
When is Myrocks good? 2020 Webinar SeriesWhen is Myrocks good? 2020 Webinar Series
When is Myrocks good? 2020 Webinar Series
 
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
MariaDB Enterprise & MariaDB Enterprise Cluster - MariaDB Webinar July 2014 F...
 

Semelhante a M|18 Scalability via Expendable Resources: Containers at BlaBlaCar

How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
Jos Boumans
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
csching
 

Semelhante a M|18 Scalability via Expendable Resources: Containers at BlaBlaCar (20)

Saltstack - Orchestration & Application Deployment
Saltstack - Orchestration & Application DeploymentSaltstack - Orchestration & Application Deployment
Saltstack - Orchestration & Application Deployment
 
Using a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming AggregationsUsing a Fast Operational Database to Build Real-time Streaming Aggregations
Using a Fast Operational Database to Build Real-time Streaming Aggregations
 
Docker Monitoring Webinar
Docker Monitoring  WebinarDocker Monitoring  Webinar
Docker Monitoring Webinar
 
Applying profilers to my sql (fosdem 2017)
Applying profilers to my sql (fosdem 2017)Applying profilers to my sql (fosdem 2017)
Applying profilers to my sql (fosdem 2017)
 
Code4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch PortalCode4Lib 2007: MyResearch Portal
Code4Lib 2007: MyResearch Portal
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems "Done the simple way" - Demi Ben-Ari - Codemotion...
 
Troubleshooting Tips and Tricks for Database 19c - EMEA Tour Oct 2019
Troubleshooting Tips and Tricks for Database 19c - EMEA Tour  Oct 2019Troubleshooting Tips and Tricks for Database 19c - EMEA Tour  Oct 2019
Troubleshooting Tips and Tricks for Database 19c - EMEA Tour Oct 2019
 
Introduction to Industrial Control Systems : Pentesting PLCs 101 (BlackHat Eu...
Introduction to Industrial Control Systems : Pentesting PLCs 101 (BlackHat Eu...Introduction to Industrial Control Systems : Pentesting PLCs 101 (BlackHat Eu...
Introduction to Industrial Control Systems : Pentesting PLCs 101 (BlackHat Eu...
 
Percona Live UK 2014 Part III
Percona Live UK 2014  Part IIIPercona Live UK 2014  Part III
Percona Live UK 2014 Part III
 
How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...How to measure everything - a million metrics per second with minimal develop...
How to measure everything - a million metrics per second with minimal develop...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Why Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container TechnologyWhy Managed Service Providers Should Embrace Container Technology
Why Managed Service Providers Should Embrace Container Technology
 
Network Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient networkNetwork Automation with Salt and NAPALM: a self-resilient network
Network Automation with Salt and NAPALM: a self-resilient network
 
Finding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQLFinding an unusual cause of max_user_connections in MySQL
Finding an unusual cause of max_user_connections in MySQL
 
SolarWinds Scalability for the Enterprise
SolarWinds Scalability for the EnterpriseSolarWinds Scalability for the Enterprise
SolarWinds Scalability for the Enterprise
 
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
 

Mais de MariaDB plc

Mais de MariaDB plc (20)

MariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.xMariaDB Paris Workshop 2023 - MaxScale 23.02.x
MariaDB Paris Workshop 2023 - MaxScale 23.02.x
 
MariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - NewpharmaMariaDB Paris Workshop 2023 - Newpharma
MariaDB Paris Workshop 2023 - Newpharma
 
MariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - CloudMariaDB Paris Workshop 2023 - Cloud
MariaDB Paris Workshop 2023 - Cloud
 
MariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB EnterpriseMariaDB Paris Workshop 2023 - MariaDB Enterprise
MariaDB Paris Workshop 2023 - MariaDB Enterprise
 
MariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance OptimizationMariaDB Paris Workshop 2023 - Performance Optimization
MariaDB Paris Workshop 2023 - Performance Optimization
 
MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale MariaDB Paris Workshop 2023 - MaxScale
MariaDB Paris Workshop 2023 - MaxScale
 
MariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentationMariaDB Paris Workshop 2023 - novadys presentation
MariaDB Paris Workshop 2023 - novadys presentation
 
MariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentationMariaDB Paris Workshop 2023 - DARVA presentation
MariaDB Paris Workshop 2023 - DARVA presentation
 
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
MariaDB Tech und Business Update Hamburg 2023 - MariaDB Enterprise Server
 
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-BackupMariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
MariaDB SkySQL Autonome Skalierung, Observability, Cloud-Backup
 
Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023Einführung : MariaDB Tech und Business Update Hamburg 2023
Einführung : MariaDB Tech und Business Update Hamburg 2023
 
Hochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDBHochverfügbarkeitslösungen mit MariaDB
Hochverfügbarkeitslösungen mit MariaDB
 
Die Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise ServerDie Neuheiten in MariaDB Enterprise Server
Die Neuheiten in MariaDB Enterprise Server
 
Introducing workload analysis
Introducing workload analysisIntroducing workload analysis
Introducing workload analysis
 
Under the hood: SkySQL monitoring
Under the hood: SkySQL monitoringUnder the hood: SkySQL monitoring
Under the hood: SkySQL monitoring
 
Introducing the R2DBC async Java connector
Introducing the R2DBC async Java connectorIntroducing the R2DBC async Java connector
Introducing the R2DBC async Java connector
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
 
The architecture of SkySQL
The architecture of SkySQLThe architecture of SkySQL
The architecture of SkySQL
 
What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1What to expect from MariaDB Platform X5, part 1
What to expect from MariaDB Platform X5, part 1
 


M|18 Scalability via Expendable Resources: Containers at BlaBlaCar

  • 3. Scalability via Expendable Resources: Containers at BlaBlaCar M|18, Feb 27, 2018
  • 4. Today’s agenda: BlaBlaCar - Facts & Figures; Infrastructure Ecosystem - 100% containers powered carpooling; Backend High Availability Pillars - MariaDB as an example; Database as a Service - Building a frictionless infrastructure; What’s next?
  • 6. Facts and Figures: 60 million members; founded in 2006; 1 million tonnes less CO2 in the past year; 30 million mobile app downloads (iPhone and Android); 5 million monthly travellers; currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
  • 7. Our prod data ecosystem
      Engine         Role           Clusters  Nodes  Throughput
      MariaDB        Transactional  20        55     40K reads/s
      Cassandra      Distributed    6         32     3K reads/s
      Redis          Volatile       17        51     40K reads/s
      ElasticSearch  Search         11        65     1K searches/s
      PostgreSQL     Spatial        4         14     3K reads/s
  • 9. Infrastructure Ecosystem (diagram). Hardware: bare-metal servers, 1 type of hardware, 3 disk profiles. Host: CoreOS, with fleet and etcd forming a fleet cluster (“Distributed init system”). Tooling: dgr and ggn, working from the service codebase and a container registry (build, store, run, create). Runtime: rkt pods, e.g. mysql-main1 (mysqld, monitoring, nerve) and front1 (php, nginx, nerve, monitoring, synapse). Service Discovery: nerve, synapse, zookeeper.
  • 10. Service Discovery (backend pod, client pod). go-nerve does health checks and reports to zookeeper in service keys (e.g. /database/node1 under /database). go-synapse watches zookeeper service keys and reloads haproxy if changes are detected. Applications hit their local haproxy to access backends. Chain: go-nerve, Zookeeper, go-synapse, HAProxy.
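
The nerve/synapse loop described on slide 10 can be sketched in a few lines of Python. This is only an illustration, not the go-nerve or go-synapse code: the in-memory `registry` dict stands in for ZooKeeper, and the function names and the rendered HAProxy lines are assumptions made for the example.

```python
import json

# Hypothetical in-memory stand-in for ZooKeeper: key path -> JSON payload.
registry = {}

def nerve_report(service_path, node_name, host, port, healthy):
    """Mimic a reporter: publish the node under the service key while it is
    healthy, and remove the key as soon as the health check fails."""
    key = f"{service_path}/{node_name}"
    if healthy:
        registry[key] = json.dumps(
            {"available": True, "host": host, "port": port, "name": node_name}
        )
    else:
        registry.pop(key, None)

def synapse_render(service_path, backend_name):
    """Mimic a watcher: turn the current service keys into HAProxy-style
    server lines (the real tool would also reload HAProxy on change)."""
    lines = [f"backend {backend_name}"]
    for key in sorted(registry):
        if key.startswith(service_path + "/"):
            node = json.loads(registry[key])
            lines.append(f"    server {node['name']} {node['host']}:{node['port']} check")
    return "\n".join(lines)

nerve_report("/database", "node1", "192.168.1.2", 3306, healthy=True)
nerve_report("/database", "node2", "192.168.1.3", 3306, healthy=True)
print(synapse_render("/database", "database"))

# node2 fails its health check: it disappears from the rendered backend.
nerve_report("/database", "node2", "192.168.1.3", 3306, healthy=False)
print(synapse_render("/database", "database"))
```

The point of the design is that clients never hold a list of database hosts; they only talk to the local proxy, whose backend list follows the registry.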
  • 11. Backend High Availability Pillars MariaDB as an example
  • 13. Asynchronous vs. Synchronous. Asynchronous: one Master replicating to several Slaves. Synchronous: MariaDB Cluster, where every node runs wsrep. MariaDB Cluster means: No Single Point of Failure; No Replication Lag; Automatic State Transfers; only as fast as the slowest node.
  • 14. MySQL at BlaBlaCar? MariaDB Cluster (wsrep on every node). Our prerequisites are: Containers; Writes go on one node; Reads are balanced on the others.
  • 15. Nerve - Track and report service status
      # zookeepercli -c lsr /services/mysql/main
      mysql-main1_192.168.1.2_ba0f1f8b
      mysql-main2_192.168.1.3_734d63da
      mysql-main3_192.168.1.4_dde45787
      # zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b3
      {
        "available":true,
        "host":"192.168.1.2",
        "port":3306,
        "name":"mysql-main1",
        "weight":255,
        "labels":{
          "host":"r10-srv4"
        }
      }
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
  • 16. Synapse - Service discovery router
      # cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
      ---
      override:
        tripsearch:
          database:
            read:
              host: localhaproxy
              database: tripsearch
              user: tripsearch_rd
              port: 3307
            write:
              host: localhaproxy
              database: tripsearch
              user: tripsearch_wr
              port: 3308
      # cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
      ---
      override:
        synapse:
          services:
            - name: mysql-main_read
              path: /services/mysql/main
              port: 3307
              serverCorrelation:
                type: excludeServer
                otherServiceName: mysql-main_write
                scope: first
            - name: mysql-main_write
              path: /services/mysql/main
              port: 3308
              serverOptions: backup
              serverSort: date
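
The read/write split from slides 14 and 16 (writes on one node, reads balanced on the others) can be illustrated with a short Python sketch. The mapping of `serverSort: date`, `serverOptions: backup` and `excludeServer` onto the logic below is an interpretation of the option names, not something the slides confirm; `Node`, `registered_at`, `write_pool` and `read_pool` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    registered_at: int  # hypothetical registration timestamp used for sorting

def write_pool(nodes):
    """Assumed behaviour: sort by registration date; the first node takes the
    writes and the remaining nodes are kept as backups."""
    ordered = sorted(nodes, key=lambda n: n.registered_at)
    return [(n.name, "active" if i == 0 else "backup") for i, n in enumerate(ordered)]

def read_pool(nodes):
    """Assumed behaviour: balance reads over every node except the one that
    is first in the write pool."""
    pool = write_pool(nodes)
    writer = pool[0][0] if pool else None
    ordered = sorted(nodes, key=lambda n: n.registered_at)
    return [n.name for n in ordered if n.name != writer]

nodes = [Node("mysql-main2", 20), Node("mysql-main1", 10), Node("mysql-main3", 30)]
print(write_pool(nodes))
print(read_pool(nodes))
```

Because both pools are derived from the same service path, a writer that drops out of the registry is replaced by the next node in date order without any application change.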
  • 18. Service Discovery weight system. Nerve’s checks are OK: the service is reported with a current weight of 1/255. Warmup is triggered: the current weight is increased following a weighted Fibonacci sequence.
  • 19. If enableCheckStableCommand is set: the command is run at each increase and, if it returns != 0, the current weight restarts from 1. Weight value is reached: the service is fully in production. Flow: call the go-nerve API on /enable or /weight/:weight; the current weight is stored in Zookeeper; go-synapse updates the weight on HAProxy via its socket (set weight <backend>/<server> <weight>).
  • 20. MySQL’s warm up in nerve
      # cat /report_slow_queries.sh
      #!/dgr/bin/busybox sh
      . /dgr/bin/functions.sh
      isLevelEnabled "debug" && set -x
      slwq=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' AND LOWER(command) <> 'sleep' AND time > 1" -BN)
      if [ $? -eq 0 ] && [ $slwq -eq 0 ]; then
        return 0
      else
        return 1
      fi
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
              enableCheckStableCommand: ["/report_slow_queries.sh"]
  • 21. MySQL’s warm up in nerve
      bbc mysql prod-dc1 mysql-main mysql-main1 monitor
      #1  Weight: 1/255    Processes: 0   Slow: 0
      #2  Weight: 2/255    Processes: 0   Slow: 0
      #3  Weight: 3/255    Processes: 3   Slow: 0
      #4  Weight: 5/255    Processes: 7   Slow: 0
      #5  Weight: 5/255    Processes: 10  Slow: 0
      #6  Weight: 8/255    Processes: 12  Slow: 0
      #7  Weight: 13/255   Processes: 20  Slow: 1  <- SLOW !
      #8  Weight: 1/255    Processes: 20  Slow: 1
      #9  Weight: 2/255    Processes: 12  Slow: 0
      #10 Weight: 3/255    Processes: 4   Slow: 0
      #11 Weight: 5/255    Processes: 7   Slow: 0
      #12 Weight: 8/255    Processes: 10  Slow: 0
      #13 Weight: 13/255   Processes: 12  Slow: 0
      #14 Weight: 15/255   Processes: 20  Slow: 0
      #15 Weight: 23/255   Processes: 35  Slow: 0
      #16 Weight: 38/255   Processes: 40  Slow: 0
      #17 Weight: 38/255   Processes: 35  Slow: 0
      #18 Weight: 61/255   Processes: 36  Slow: 0
      #19 Weight: 61/255   Processes: 47  Slow: 0
      #20 Weight: 98/255   Processes: 44  Slow: 0
      #21 Weight: 98/255   Processes: 41  Slow: 0
      #22 Weight: 158/255  Processes: 38  Slow: 0
      #23 Weight: 158/255  Processes: 50  Slow: 0
      #24 Weight: 255/255  Processes: 46  Slow: 0  <- FULL POWER !
      #25 Weight: 255/255  Processes: 46  Slow: 0
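
The warm-up rule from slides 18 to 21 (start at 1, grow along a Fibonacci-like sequence, fall back to 1 when the stability command fails) can be sketched as follows. This is a simplification: it uses the plain Fibonacci sequence capped at the target, whereas the trace on slide 21 shows a weighted variant with repeated values whose exact formula the slides do not give. The function name `warmup` and its arguments are hypothetical.

```python
def warmup(target, stable_checks):
    """Return the weight after each step.

    Growth follows the plain Fibonacci sequence capped at `target` (an
    assumption; the deck only says "weighted Fibonacci"). A failed stability
    check sends the weight back to 1, as described on slide 19.
    """
    prev, cur = 1, 1
    history = [cur]
    for ok in stable_checks:
        if cur >= target:
            break  # target weight reached: the service is fully in production
        if ok:
            prev, cur = cur, min(prev + cur, target)
        else:
            prev, cur = 1, 1  # slow queries detected: restart from 1
        history.append(cur)
    return history

# Six clean checks, one failure, then clean checks until the target is reached.
print(warmup(255, [True] * 6 + [False] + [True] * 20))
```

Growing slowly at first keeps a cold buffer pool from receiving a full share of read traffic, and the reset gives the node a second chance instead of letting slow queries pile up.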
  • 22. Die in Peace... Get out when you are ready
  • 23. Gracefully Disabling Pipeline. (1) Call /disable on Nerve’s API. (2) The weight is set to 0: no new sessions will go to the service. (3) If disableGracefullyDoneCommand is set, this command is run in a loop until it returns 0. (4) The /disable API call returns: the service can be shut down without risk.
  • 24. MySQL’s graceful shutdown in nerve
      # cat /report_remaining_processes.sh
      #!/dgr/bin/busybox sh
      . /dgr/bin/functions.sh
      isLevelEnabled "debug" && set -x
      procs=$(/usr/bin/timeout 1 /usr/bin/mysql -h127.0.0.1 -ulocal_mon -plocal_mon information_schema -e "SELECT COUNT(1) FROM processlist WHERE user LIKE '%rd' OR user LIKE '%wr'" -BN)
      if [ $? -eq 0 ] && [ $procs -eq 0 ]; then
        return 0
      else
        return 1
      fi
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "mysql-main"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
              enableCheckStableCommand: ["/report_slow_queries.sh"]
              disableGracefullyDoneCommand: ["/root/report_remaining_processes.sh"]
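
The disabling pipeline from slides 23 and 24 boils down to "set the weight to 0, then poll until no sessions remain". A minimal Python sketch of that loop is shown below. The callables `set_weight` and `drained` are hypothetical stand-ins for the HAProxy weight update and the `disableGracefullyDoneCommand` script, and the `max_polls` limit is an added assumption; the slides only say the command runs in a loop until it returns 0.

```python
import time

def disable_gracefully(set_weight, drained, poll_seconds=1.0, max_polls=60):
    """Stop new sessions, then wait until existing ones are gone.

    Returns the number of polls needed. Raises TimeoutError if sessions are
    still open after `max_polls` polls (the timeout is an assumption, not
    something the deck describes).
    """
    set_weight(0)  # weight 0: the proxy sends no new sessions to this node
    for attempt in range(1, max_polls + 1):
        if drained():
            return attempt  # safe to shut the service down
        time.sleep(poll_seconds)
    raise TimeoutError("sessions still open; not safe to shut down")

# Fake session counts standing in for the processlist query.
remaining = [3, 1, 0]
weights = []
polls = disable_gracefully(weights.append, lambda: remaining.pop(0) == 0, poll_seconds=0)
print(weights, polls)
```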
  • 25. Backend High Availability Pillars. Be Quiet! Come gently into prod. Abolish Slavery: every node is the same. Die in Peace... Get out when you are ready. Building blocks listed on the slide: graceful restart, service discovery (nerve/synapse), weight system, slow query tracking; graceful restart, service discovery (nerve/synapse), weight system; no master/slave, automatic state transfers, service discovery (nerve/synapse).
  • 26. Database as a Service Building a frictionless infrastructure
  • 27. Easy deployment: Pull Request on a services repository; no technical parameters to override; the services are auto-initialized.
  • 28. Easy deployment with GGN
      $ tree env/prod-dc1/services/mysql-main
      env/prod-dc1/services/mysql-main
      ├── attributes
      │   ├── galera.yml
      │   ├── innodb.yml
      │   └── nerve.yml
      ├── service-manifest.yml
      └── unit.tmpl
      1 directory, 5 files
      $ cat env/prod-dc1/services/mysql-main/service-manifest.yml
      containers:
        - aci.blbl.cr/pod-mysql:10.1-32
      nodes:
        - hostname: "mysql-main1"
          ip: "192.168.1.1"
          fleet:
            - MachineMetadata=name=r11-srv1
        - hostname: "mysql-mysql-main2"
          ip: "192.168.1.2"
          fleet:
            - MachineMetadata=name=r12-srv2
        - hostname: "mysql-mysql-main3"
          ip: "192.168.1.3"
          fleet:
            - MachineMetadata=name=r13-srv3
      $ cat env/prod-dc1/services/mysql-main/attributes/galera.yml
      ---
      override:
        mariadb:
          galera:
            wsrep_cluster_name: "prod-dc1_main"
      $ cat env/prod-dc1/services/mysql-main/attributes/innodb.yml
      ---
      override:
        mariadb:
          innodb:
            innodb_log_file_size: "1G"
            innodb_buffer_pool_size: "4G"
  • 29. Easy deployment
      $ cat env/prod-dc1/services/mysql-main/unit.tmpl
      [Unit]
      Description=pod-mysql {{.hostname}}
      [Service]
      {{- template "env-fleet" .}}
      {{ template "rkt-pre-start" . -}}
      {{ template "rkt-post-stop" . }}
      ExecStartPre=/usr/bin/mkdir -p /mnt/sdb1/{{.hostname}}/log
      {{ template "rkt-run-options" . -}}
      --volume=mysql-data,kind=host,source=/mnt/sdb1/{{.hostname}}
      --volume=mysql-log,kind=host,source=/mnt/sdb1/{{.hostname}}/log
      {{.acis}}
      {{- template "x-fleet" . }}
      # ggn prod-dc1 mysql-main update -y
      Deploy the service with GGN (github.com/blablacar/ggn): it generates systemd units from templates and sends them to the environment using fleet.
  • 30. Easy Monitoring & Alerting: Service Oriented Monitoring. The monitoring platform is plugged into the service discovery.
  • 31. Easy Monitoring & Alerting: Nerve (Service Discovery), Prometheus (Smart Monitoring), Grafana (Beautiful Visualizations), PagerDuty (Incident Manager).
  • 32. Prometheus with Nerve integration
      $ cat pod-mysql/pod-manifest.yml
      name: aci.blbl.cr/pod-mysql:10.1-33
      pod:
        apps:
          - dependencies:
              - aci.blbl.cr/aci-mariadb:10.1-29
            app:
              mountPoints:
                - {name: mysql-data, path: /var/lib/mysql}
                - {name: mysql-log, path: /var/log/mysql}
          - name: aci-nerve
            dependencies:
              - aci.blbl.cr/aci-go-nerve:21-23
              - aci.blbl.cr/aci-mariadb:10.1-29
          - dependencies:
              - aci.blbl.cr/aci-prometheus-mysql-exporter:0.10.0-1
      # cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
      ---
      override:
        nerve:
          services:
            - name: "{{.hostname}}"
              port: 3306
              reporters:
                - {type: zookeeper, path: /services/mysql/main}
              checks:
                - type: sql
                  driver: mysql
                  datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
            - name: "{{.hostname}}_prometheus"
              port: 9104
              reporters:
                - {type: zookeeper, path: /monitoring/mysql/main}
      # curl mysql-main1.prod.dc1.com:9104/metrics | head
      # HELP mysql_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from MySQL.
      # TYPE mysql_exporter_last_scrape_duration_seconds gauge
      mysql_exporter_last_scrape_duration_seconds 0.056807316
      # HELP mysql_exporter_last_scrape_error Whether the last scrape of metrics from MySQL resulted in an error (1 for error, 0 for success).
      # TYPE mysql_exporter_last_scrape_error gauge
      mysql_exporter_last_scrape_error 0
      [...]
      # cat env/prod-dc1/services/prometheus/attributes/prometheus.yml
      [...]
      ranged_targets:
        - type: zk
          job_name: discovery_prod-dc1
          scrape_interval: 20s
          metrics_path: /metrics
          zk:
            hosts: '{{ toJson .zk.hosts }}'
            zkpaths:
              - /monitoring
      [...]
  • 33. Prometheus relabeling
      # [zk: localhost:2181(CONNECTED) 1] get /monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b
      {"available":true,"host":"192.168.1.2","port":9104,"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}
      We push services info with Nerve into Zookeeper, and Prometheus does the magic.
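
To make the "magic" on slide 33 a little more concrete, here is a hedged Python sketch of how a ZooKeeper entry like the one above could be turned into a scrape target with labels. It is not Prometheus relabeling configuration: the function `to_target`, the output shape, and the way the `service` label is derived from the path are assumptions made for illustration (the alert rule on slide 35 only shows that `name` and `service` labels exist).

```python
import json

def to_target(zk_path, payload):
    """Build a scrape target from one service-discovery entry.

    Assumes paths shaped like /monitoring/<kind>/<cluster>/<node key> and
    skips entries that are not marked available.
    """
    node = json.loads(payload)
    if not node.get("available"):
        return None
    _, _, kind, cluster, _ = zk_path.split("/", 4)
    labels = {"name": node["name"], "service": f"{kind}-{cluster}"}
    labels.update(node.get("labels", {}))
    return {"targets": [f"{node['host']}:{node['port']}"], "labels": labels}

path = "/monitoring/mysql/main/mysql-main1_prometheus_192.168.1.2_ba0f1f8b"
payload = (
    '{"available":true,"host":"192.168.1.2","port":9104,'
    '"name":"mysql-main1","weight":255,"labels":{"host":"r11-srv1"}}'
)
print(to_target(path, payload))
```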
  • 35. Alerting
      $ cat prometheus-rules/alert.mysql.rules
      # Alert: Galera node state is not synced.
      ALERT MySQLGaleraStateIsNotSynced
        IF (mysql_global_status_wsrep_local_state != 4 AND mysql_global_variables_wsrep_desync == 0)
        FOR 2m
        LABELS { severity = "warning", team="data_infrastructure" }
        ANNOTATIONS {
          summary = "Galera node {{ $labels.name }} state is not in “Synced” (state={{$value}}).",
          dashboard = "https://promgrafana.blabla.com/dashboard/db/mysql-cluster-view?var-cluster={{$labels.service}}&var-ds=prom-dc1&from=now-1h&to=now",
          runbook="https://ops-run-book.blabla.com/mysql/operational-tasks#MySQLGaleraOutOfSync",
        }
      PromQL to find out unhealthy services. Labeling for routing to Slack & PagerDuty. Annotations with templating to have clear descriptions, URLs to dashboards and ops runbooks.
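
The condition in the rule above (state is not 4, the node was not desynced on purpose, and this has held for 2 minutes) can be restated as a small Python function. This is only a sketch of the rule's logic, not how Prometheus evaluates alerts internally; the sample tuples and the function name `galera_alert_firing` are hypothetical.

```python
def galera_alert_firing(samples, for_seconds=120):
    """samples: list of (timestamp_s, wsrep_local_state, wsrep_desync),
    oldest first. Fires when the node has been out of state 4 ("Synced" in
    the rule's summary) without a deliberate desync for `for_seconds`."""
    pending_since = None
    for ts, state, desync in samples:
        if state != 4 and desync == 0:
            if pending_since is None:
                pending_since = ts  # condition just became true
        else:
            pending_since = None  # condition cleared: the timer restarts
    return pending_since is not None and samples[-1][0] - pending_since >= for_seconds

# One sample every 20 s, matching the scrape_interval shown on slide 32.
stuck = [(t, 2, 0) for t in range(0, 140, 20)]        # unsynced, not desynced on purpose
maintenance = [(t, 2, 1) for t in range(0, 140, 20)]  # desynced deliberately
print(galera_alert_firing(stuck), galera_alert_firing(maintenance))
```

Excluding deliberately desynced nodes keeps planned operations from paging anyone.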
  • 36. Easy troubleshooting: do the basic health checks quickly; in real time; avoiding human mistakes/errors.
  • 37. Easy troubleshooting with “bbc” command: a set of bash scripts; do the basic health checks quickly; manage all backends the same way; can be used by non-specialists; plugged into the service discovery; designed for our needs.
  • 38. bbc command examples
      # bbc mysql list
      pp-dc2 mysql-main
      pp-dc2 mysql-user
      pp-dc2 mysql-trip
      pp-dc2 mysql-payment
      prod-dc1 mysql-main
      prod-dc1 mysql-user
      prod-dc1 mysql-trip
      prod-dc1 mysql-payment
      [...]
      # bbc mysql overview prod-dc1 mysql-main
      === Service Overview 'prod-dc1 mysql-main' ===
      mysql-main1 (192.168.1.1) PING, PORT, Synced
      ---
      mysql-main1 (3306) - enabled - weight = 255/255
      mysql-main1_prometheus (9104) - enabled - weight = 255/255
      mysql-main2 (192.168.1.2) PING, PORT, Synced
      ---
      mysql-main2 (3306) - enabled - weight = 255/255
      mysql-main2_prometheus (9104) - enabled - weight = 255/255
      mysql-main3 (192.168.1.3) PING, PORT, Synced
      ---
      mysql-main3 (3306) - enabled - weight = 255/255
      mysql-main3_prometheus (9104) - enabled - weight = 255/255
      # bbc mysql connect prod-dc1 mysql-main
      env: prod-dc1
      service: mysql-main
      host: mysql-main1
      ip: 192.168.1.1
      Enter the username [ENTER]: team_data
      Enter password:
      Welcome to the MariaDB monitor. Commands end with ; or \g.
      Your MariaDB connection id is 2887129
      Server version: 10.1.28-MariaDB-1~jessie mariadb.org binary distribution
      Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      MariaDB [(none)]>
      # bbc mysql monitor prod-dc1 mysql-main mysql-main1
      Weight: 255/255 Processes: 88 Slow: 0
      Weight: 255/255 Processes: 75 Slow: 0
      Weight: 255/255 Processes: 89 Slow: 0
      Weight: 255/255 Processes: 99 Slow: 0
      Weight: 255/255 Processes: 79 Slow: 0
      Weight: 255/255 Processes: 65 Slow: 0
      Weight: 255/255 Processes: 86 Slow: 0
      Weight: 255/255 Processes: 93 Slow: 0
      Weight: 255/255 Processes: 88 Slow: 0
      Weight: 255/255 Processes: 96 Slow: 0
      Weight: 255/255 Processes: 77 Slow: 0
      Weight: 255/255 Processes: 73 Slow: 0
  • 39. bbc command examples
      # bbc postgresql list
      pp-dc2 postgresql-airflow
      pp-dc2 postgresql-corridoring
      pp-dc2 postgresql-redash
      pp-dc2 postgresql-trip-pricing
      prod-dc1 postgresql-corridoring
      prod-dc1 postgresql-redash
      # bbc postgresql overview prod-dc1 postgresql-corridoring
      Service Overview 'prod-dc1 postgresql-corridoring' -- USING BDR --
      postgresql-corridoring1 (192.168.1.10) PING, PORT
      postgresql-corridoring2 (192.168.1.11) PING, PORT
      postgresql-corridoring3 (192.168.1.12) PING, PORT
      postgresql-corridoring4 (192.168.1.13) PING, PORT
      postgresql-corridoring5 (192.168.1.14) PING, PORT
      # bbc postgresql connect prod-dc1 postgresql-corridoring
      env: prod-dc1
      service: postgresql-corridoring - database : corridoring
      host: postgresql-corridoring1
      ip: 192.168.1.10
      Enter the username [ENTER]: team_data
      Password for user team_arch:
      psql (9.6.6, server 9.4.12)
      Type "help" for help.
      corridoring=#
      # bbc redis list
      pp-dc2 redis-main
      pp-dc2 redis-quota
      pp-dc2 redis-translation
      pp-dc2 redis-user
      prod-dc1 redis-main
      prod-dc1 redis-quota
      # bbc redis overview prod-dc1 redis-main
      === Service 'prod-dc1' 'redis-main' ===
      Redis elector master: redis-main1.prod.dc-1.blabla.com
      redis-main1 (192.168.1.20): PING, PORT, role:master, clients:255
      redis-main2 (192.168.1.21): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
      redis-main3 (192.168.1.22): PING, PORT, role:slave, clients:2, slaveof:192.168.1.20
      # bbc redis connect prod-dc1 redis-main
      env: prod-dc1
      service: redis-main
      host: redis-main1
      ip: 192.168.1.20
      role: slave
      192.168.1.20:6379>
  • 40. bbc command examples
      # bbc cassandra ping prod-dc1 cassandra-user
      cassandra-user1 (192.168.1.30) PING, CQL, JMX
      ---
      cassandra-user2 (192.168.1.31) PING, CQL, JMX
      ---
      cassandra-user3 (192.168.1.32) PING, CQL, JMX
      ---
      # bbc cassandra overview prod-dc1 cassandra-user
      === Service 'prod-dc1 cassandra-user' ===
      Datacenter: prod-dc1
      ====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address       Load      Tokens  Owns    Host ID                               Rack
      UN  192.168.1.30  6.01 GB   256     33.3%   bef39dd5-d4e5-4733-93e5-75904b6d556a  r10
      UN  192.168.1.31  5.89 GB   256     33.3%   23b77937-2177-4638-b860-e73e4bb913d2  r10
      UN  192.168.1.32  5.12 GB   256     33.3%   de0f4ed1-1241-499d-9485-e73e4bb913d2  r10
      Datacenter: prod-dc2
      ====================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address       Load      Tokens  Owns    Host ID                               Rack
      UN  192.168.2.10  15.69 GB  256     100.0%  3ca1e862-f3e2-4fbf-a6c1-4d7d5a3e70ec  r14
      UN  192.168.2.11  14.99 GB  256     100.0%  de0f4ed1-1241-499d-9485-2f8196aa7425  r13
      UN  192.168.2.12  16.1 GB   256     100.0%  7e5fee00-052f-4546-973d-befaebbe604b  r15
      Today, 32 subcommands are available on bbc...
  • 42. What does the future look like? Moving to Kubernetes: from a simple “Distributed init system” to the standard for container orchestration. Fleet is deprecated: fleet is no longer developed or maintained by CoreOS.
  • 43. Ownership: move backend ownership to the developer teams. Moving to the cloud? Extend this idea of “expendable” services to hardware resources. Docker? Kubernetes + rkt (rktnetes, rktlet) has poor adoption.