How we monitor 1 billion km
of monthly ride sharing
Jean Baptiste Favre
Ops Lead
@jbfavre
How are we ?
5 million
members in december 2013
20 million members
monitoring
7 million
members in april 2014
50 million members
monitoring
20 million
members in april 2015
2015
100 million members
monitoring
How we monitor 1 billion km
of monthly ride sharing
KEEP CALM
AND
MONITOR
ALL
THE THINGS
Zabbix
How many items ?
How many new VPS ?
Load ? What load ?:)
How ?
Standardization
Standardization
Server triggers probe execution
via zabbix-agent active item
Probes collects, format and send
informations...
Standard :
0 => OK
1 => fail during init
2 => fail while getting informations
3 => fail during Container update
4 => fail ...
Python or Java
LLD wherever possible
trappers always
Only 2 zabbix-agent (active) items per
template
Client side probes
python-protobix
KEEP CALM
AND USE
TRAPPERS
& LLD
EVERYWHERE
Almost
python-protobix
Actually no,
but could
have been
https://github.com/jbfavre/python­protobix
(also on pypi.python.org)
#!/usr/bin/env python
import protobix
''' create DataContainer, providing data_type, zabbix server and port '''
zbx_contai...
PUT YOUR OWN LOGIC HERE :)
#!/usr/bin/env python
import protobix
''' create DataContainer, providing data_type, zabbix ser...
Low Level Discovery
vhosts & queues
thresholds
Update values
message number
in/out ratio
Who is master of this queue
Rabbi...
Low Level Discovery
Galera
storage engines
Multi-replication
Update values
Pretty much everything:)
MariaDB example
Protobix probes
16 probes available
And more to come
redis/dynomite
zookeeper
…
https://github.com/jbfavre/python­zabbix
jmx-zabbix
KEEP CALM
AND
MONITOR
ALL THE
JAVA
THINGS
Because python is not (always) enough :)Because python is not (always) enough :)
jmx-zabbix
https://github.com/n0rad/jmx­z...
Embedded inside a Java process
– Internal Java daemons
Aside any Java process (separate service)
– Cassandra
– Elasticsear...
serverName: <hostname in Zabbix>
pushIntervalSecond: 60
inMemoryMaxQueueSize: 10
zabbix:
  host: <Zabbix server hostname o...
[...]
metrics:
  cassandra.status.failure:org.apache.cassandra.net:type=FailureDetector
  cassandra.status.timeouts:org.ap...
Zabbix visualization
KEEP CALM
AND
LOOK
AT THE
GRAPHS
Grafana
Grafana
+
Zabbix datasource
=
10 dashboards
in 2 days
Grafana
https://github.com/grafana/grafana
https://github.com/alexan...
Dashing
https://gist.github.com/chojayr/7401426
https://github.com/tolleiv/dashing­zabbix
Caveats
KEEP CALM
AND
FIX THINGS
BEFORE
CTO
NOTICES
Plugins & templates synchronization
Zabbix configuration automatization
Use same hostname everywhere
Beware of
What next ?
KEEP CALM
AND
WAIT
FOR
ZABBIX 3.0
Announced
– Trends predictions
– More scalable backend
– SSL communications
Not announced
(As far as I know)
– Trends from...
3 Take aways
Now you can wake up :)
1. Define & use standards
2. Use LLD & Trappers
3. Visualization is critical
Let's dis...
Jean-Baptiste Favre - How to Monitor Bilions of Miles Shared by 20 Million Users with Zabbix: A Case Study at BlaBlaCar
Jean-Baptiste Favre - How to Monitor Bilions of Miles Shared by 20 Million Users with Zabbix: A Case Study at BlaBlaCar
Próximos SlideShares
Carregando em…5
×

Jean-Baptiste Favre - How to Monitor Bilions of Miles Shared by 20 Million Users with Zabbix: A Case Study at BlaBlaCar

1.626 visualizações

Publicada em

This talk will cover a case study at BlaBlaCar, a transport network community with more than 20 milion members in 19 countires. The presentation will overview using a low-powered virtual machine to monitor the whole environment, concentrating on LLD and trapper items. Also projects like python-protobix, jmx-zabbix and python-zabbix will be covered.

Zabbix Conference 2015

Publicada em: Tecnologia
0 comentários
1 gostou
Estatísticas
Notas
  • Seja o primeiro a comentar

Sem downloads
Visualizações
Visualizações totais
1.626
No SlideShare
0
A partir de incorporações
0
Número de incorporações
49
Ações
Compartilhamentos
0
Downloads
51
Comentários
0
Gostaram
1
Incorporações 0
Nenhuma incorporação

Nenhuma nota no slide

Jean-Baptiste Favre - How to Monitor Bilions of Miles Shared by 20 Million Users with Zabbix: A Case Study at BlaBlaCar

  1. 1. How we monitor 1 billion km of monthly ride sharing Jean Baptiste Favre Ops Lead @jbfavre
  2. 2. How are we ?
  3. 3. 5 million members in december 2013
  4. 4. 20 million members monitoring
  5. 5. 7 million members in april 2014
  6. 6. 50 million members monitoring
  7. 7. 20 million members in april 2015
  8. 8. 2015
  9. 9. 100 million members monitoring
  10. 10. How we monitor 1 billion km of monthly ride sharing
  11. 11. KEEP CALM AND MONITOR ALL THE THINGS Zabbix
  12. 12. How many items ?
  13. 13. How many new VPS ?
  14. 14. Load ? What load ?:)
  15. 15. How ?
  16. 16. Standardization
  17. 17. Standardization Server triggers probe execution via zabbix-agent active item Probes collects, format and send informations using zabbix sender protocol Probe's exit code is send back to the server for feedback loop
  18. 18. Standard : 0 => OK 1 => fail during init 2 => fail while getting informations 3 => fail during Container update 4 => fail during Send phase Exit codes
  19. 19. Python or Java LLD wherever possible trappers always Only 2 zabbix-agent (active) items per template Client side probes
  20. 20. python-protobix KEEP CALM AND USE TRAPPERS & LLD EVERYWHERE Almost
  21. 21. python-protobix Actually no, but could have been https://github.com/jbfavre/python­protobix (also on pypi.python.org)
  22. 22. #!/usr/bin/env python import protobix ''' create DataContainer, providing data_type, zabbix server and port ''' zbx_container = protobix.DataContainer('lld', 'localhost', 10051) hostname='myhost' item='hardware.power_supply' value=[     { '{#SLOT}': 0, '{#PLUGGED}' : 1 },     { '{#SLOT}': 1, '{#PLUGGED}' : 0 }, ] zbx_container.add_item( hostname, item, value) try:     zbx_response = zbx_container.send() except protobix.SenderException:     print 'Oups...' LLD example PUT YOUR OWN LOGIC HERE :)
  23. 23. PUT YOUR OWN LOGIC HERE :) #!/usr/bin/env python import protobix ''' create DataContainer, providing data_type, zabbix server and port ''' zbx_container = protobix.DataContainer('items', 'localhost', 10051) hostname='myhost' item='hardware.power_supply[0,status]' value=1 zbx_container.add_item( hostname, item, value) try:     zbx_response = zbx_container.send() except protobix.SenderException:     print 'Oups...' item example
  24. 24. Low Level Discovery vhosts & queues thresholds Update values message number in/out ratio Who is master of this queue RabbitMQ example
  25. 25. Low Level Discovery Galera storage engines Multi-replication Update values Pretty much everything:) MariaDB example
  26. 26. Protobix probes 16 probes available And more to come redis/dynomite zookeeper … https://github.com/jbfavre/python­zabbix
  27. 27. jmx-zabbix KEEP CALM AND MONITOR ALL THE JAVA THINGS
  28. 28. Because python is not (always) enough :)Because python is not (always) enough :) jmx-zabbix https://github.com/n0rad/jmx­zabbix
  29. 29. Embedded inside a Java process – Internal Java daemons Aside any Java process (separate service) – Cassandra – Elasticsearch – … jmx-zabbix
  30. 30. serverName: <hostname in Zabbix> pushIntervalSecond: 60 inMemoryMaxQueueSize: 10 zabbix:   host: <Zabbix server hostname or IP>   port: 10051 jmx:   url: service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi   username: zabbix   password: zabbix   timeoutSecond: 30 [...] configuration
  31. 31. [...] metrics:   cassandra.status.failure:org.apache.cassandra.net:type=FailureDetector   cassandra.status.timeouts:org.apache.cassandra.net:type=MessagingService   cassandra.db.storage: org.apache.cassandra.db:type=StorageProxy valuesCaptured:   org.apache.cassandra.gms.FailureDetector: ["DownEndpointCount"]   org.apache.cassandra.net.MessagingService: ["RecentTotalTimouts"]   org.apache.cassandra.service.StorageProxy: ["RecentRangeLatencyMicros",                                                "RecentReadLatencyMicros",                                                "RecentWriteLatencyMicros"] JMX to ZBX mapping
  32. 32. Zabbix visualization KEEP CALM AND LOOK AT THE GRAPHS
  33. 33. Grafana
  34. 34. Grafana + Zabbix datasource = 10 dashboards in 2 days Grafana https://github.com/grafana/grafana https://github.com/alexanderzobnin/grafana­zabbix
  35. 35. Dashing https://gist.github.com/chojayr/7401426 https://github.com/tolleiv/dashing­zabbix
  36. 36. Caveats KEEP CALM AND FIX THINGS BEFORE CTO NOTICES
  37. 37. Plugins & templates synchronization Zabbix configuration automatization Use same hostname everywhere Beware of
  38. 38. What next ? KEEP CALM AND WAIT FOR ZABBIX 3.0
  39. 39. Announced – Trends predictions – More scalable backend – SSL communications Not announced (As far as I know) – Trends from – Implicit dependency against proxy – Detailled web scenario – Per item maintenance – Anomaly detection What I miss in Zabbix
  40. 40. 3 Take aways Now you can wake up :) 1. Define & use standards 2. Use LLD & Trappers 3. Visualization is critical Let's discuss all that !

×