Discover how Sysbee brings DevOps culture to small and medium enterprises. Their team helps customers improve stability, security, and scalability by providing cost-effective IT infrastructure. Learn how monitoring everything can improve your processes and simplify debugging!
Sysbee’s retrospective on monitoring tools over the years
How TSDBs, and specifically InfluxDB, fit into improving observability
Their approach to using the TICK Stack to improve the web hosting industry
2. Today’s agenda
● About us
● What do we require from monitoring?
● Brief history of the monitoring tools we've used
● How do we use the InfluxData stack?
3. About us
Sasa Teković
● Senior Linux systems engineer
● Experienced in designing and maintaining shared, VPS, dedicated server, and private cloud hosting platforms
● Enjoys simplifying things for developers
● Big InfluxDB and Telegraf fan
4. About us
Branko Toić
● Senior Linux systems engineer
● Performance and monitoring enthusiast
● Python “developer”
5. About Sysbee - how we got here
● 2001 - humble beginnings as one of the first web hosting providers in Croatia - Plus Hosting
● 2006 - building a brand new infrastructure in Zagreb, Croatia
● 2008 - introduction of VPS and dedicated server hosting
● 2010 - adding managed services to portfolio
● 2015 - joining DHH (Dominion Hosting Holding) group
● 2018 - Sysbee is founded due to increasing demand for managed services
6. About Sysbee - what we do
● Platform agnostic
● Services
○ Infrastructure assessment
○ Managed infrastructure
○ Managed AWS
● Products
○ Magento optimized hosting
○ Managed GitLab
7. About Sysbee - our typical clients
● Small projects
○ 1-3 servers
○ Mostly standalone virtual/physical servers
○ No high-availability
● Medium projects
○ 3+ servers
○ Load balancing
○ Partially fault-tolerant
● Large projects
○ 5+ servers
○ Highly available
○ Fault tolerant
○ Auto-scaling
● Small to medium businesses
○ E-commerce sites
○ News portals
○ API servers
○ Software-as-a-service
8. Monitoring requirements
● Specific
○ Servers
○ Web applications
○ Periodic remote systems
● General
○ Does it work?
○ How does it work?
○ What's the trend?
9. Brief history (pt 1)
● Not so many options
● MRTG
● Custom scripts
● Small server count
2001
10. Brief history (pt 2)
● Introduction of GIGRIB
● Nagios starts to gain popularity
2005
We started using Nagios :)
11. Brief history (pt 3)
● Metrics?
2007
Munin (https://en.wikipedia.org/wiki/Munin_(software))
12. Brief history (pt 4)
● Centralized
● Easy to use web interface
● Scalable and redundant cluster
2012
Ganglia (https://en.wikipedia.org/wiki/Ganglia_(software))
13. Brief history (pt 5)
● Grafana released in 2014
● Reconfigured Ganglia to be usable with Grafana
● Started evaluating InfluxDB with version 0.9.1 (2015)
2014
16. InfluxDB's early days?
● Started with v0.9.1
● Out-of-the-box support for multiple write protocols
● Pleasantly surprised by the backend storage engine's performance
● A bit disappointed by disk usage
● Telegraf was an absolute breeze to set up
17. Strengths
● Business value:
○ Open core
○ HA support (paid version)
○ Multiple databases per server
● Technical value:
○ Push
○ Single binary collector
○ Interoperability
○ Numeric + String
○ Rollup
○ Kapacitor
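The rollup point above can be made concrete with an InfluxQL continuous query, which is how downsampling is typically expressed in InfluxDB 1.x. The database, retention policy, and measurement names below are illustrative, not from the slides:

```sql
-- Illustrative rollup: downsample raw CPU metrics into 5-minute means.
-- Assumes a "telegraf" database with a "rollup" retention policy already created.
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telegraf"
BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "telegraf"."rollup"."cpu_5m"
  FROM "cpu"
  GROUP BY time(5m), *
END
```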
21.
# Read metrics from one or many redis servers
[[inputs.redis]]
## specify servers via a url matching:
## [protocol://][:password]@address[:port]
## e.g.
## tcp://localhost:6379
## tcp://:password@192.168.99.100
## unix:///var/run/redis.sock
##
## If no servers are specified, then localhost is used as the host.
## If no port is specified, 6379 is used
servers = ["tcp://localhost:6379"]
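With this input enabled, the collected fields land in the `redis` measurement of the Telegraf database and can be charted from Grafana; a hedged example query (the field name follows the stock Telegraf redis plugin):

```sql
-- Average redis memory usage over the last hour, in 5-minute buckets.
SELECT mean("used_memory") FROM "redis" WHERE time > now() - 1h GROUP BY time(5m)
```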
22.
[[inputs.net]]
# interfaces = ["eth0"]
# Collects conntrack stats from the configured directories and files.
[[inputs.conntrack]]
  files = ["ip_conntrack_count", "ip_conntrack_max",
           "nf_conntrack_count", "nf_conntrack_max"]
  dirs = ["/proc/sys/net/ipv4/netfilter", "/proc/sys/net/netfilter"]
28. Clients and databases
● Each client has its own database
● Configurable data sources in Grafana
● Possibility to open organizations and special Grafana instances per client
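A minimal sketch of the per-client database setup, assuming a hypothetical client name and InfluxDB endpoint (not from the slides): each client's Telegraf agents point their output at that client's database.

```toml
# Hypothetical per-client Telegraf output: agents for this client
# write to a dedicated database on the shared InfluxDB server.
[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]
  database = "client_acme"        # one database per client
  skip_database_creation = false  # let Telegraf create it if missing
```

In Grafana, each client's database then becomes its own data source, which also maps cleanly onto per-client organizations.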
30. Retention policies
● Autoscaling hosts
○ Write directly to 7 day retention policy
● Less used metrics
○ Specify shorter retention policy
● Data downsampling?
○ Waiting for intelligent roll-ups (https://github.com/influxdata/influxdb/issues/7198)
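The 7-day policy for autoscaling hosts mentioned above could be declared like this in InfluxQL (database and policy names are illustrative):

```sql
-- Short retention for ephemeral autoscaling hosts.
CREATE RETENTION POLICY "7d" ON "client_acme" DURATION 7d REPLICATION 1
```

Telegraf can then write directly into it via the `retention_policy` option of its `[[outputs.influxdb]]` section.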
31. Future?
● Looking forward to upgrading to InfluxDB 2.0
● Flux and Kapacitor for alerting
● Anomaly detection
○ Kapacitor or LoudML (https://loudml.io/)
● Collecting even more metrics :)