Discover how Sysbee brings DevOps culture to small and medium enterprises. Their team helps customers improve stability, security, and scalability by providing cost-effective IT infrastructure. Learn how monitoring everything can improve your processes and simplify debugging!
Sysbee’s retrospective on monitoring tools over the years
How TSDBs, and specifically InfluxDB, fit into improving observability
Their approach to using the TICK Stack to improve the web hosting industry
2. Today’s agenda
● About us
● What do we require from monitoring?
● Brief history of the monitoring tools we've used
● How do we use the InfluxData stack?
3. About us
Sasa Teković
● Senior Linux systems engineer
● Experienced in designing and maintaining shared, VPS, dedicated server, and private cloud hosting platforms
● Enjoys simplifying things for developers
● Big InfluxDB and Telegraf fan
4. About us
Branko Toić
● Senior Linux systems engineer
● Performance and monitoring enthusiast
● Python “developer”
5. About Sysbee - how we got here
● 2001 - humble beginnings as one of the first web hosting providers in Croatia - Plus Hosting
● 2006 - building a brand new infrastructure in Zagreb, Croatia
● 2008 - introduction of VPS and dedicated server hosting
● 2010 - adding managed services to portfolio
● 2015 - joining DHH (Dominion Hosting Holding) group
● 2018 - Sysbee is founded due to increasing demand for managed services
6. About Sysbee - what we do
● Platform agnostic
● Services
○ Infrastructure assessment
○ Managed infrastructure
○ Managed AWS
● Products
○ Magento optimized hosting
○ Managed GitLab
7. About Sysbee - our typical clients
● Small projects
○ 1-3 servers
○ Mostly standalone virtual/physical servers
○ No high-availability
● Medium projects
○ 3+ servers
○ Load balancing
○ Partially fault-tolerant
● Large projects
○ 5+ servers
○ Highly available
○ Fault tolerant
○ Auto-scaling
● Small to medium businesses
○ E-commerce sites
○ News portals
○ API servers
○ Software-as-a-service
8. Monitoring requirements
● Specific
○ Servers
○ Web applications
○ Periodic remote systems
● General
○ Does it work?
○ How does it work?
○ What's the trend?
9. Brief history (pt 1)
● Not so many options
● MRTG
● Custom scripts
● Small server count
2001
10. Brief history (pt 2)
● Introduction of GIGRIB
● Nagios starts to gain popularity
2005
We started using Nagios :)
11. Brief history (pt 3)
● Metrics?
2007
Munin (https://en.wikipedia.org/wiki/Munin_(software))
12. Brief history (pt 4)
● Centralized
● Easy to use web interface
● Scalable and redundant cluster
2012
Ganglia (https://en.wikipedia.org/wiki/Ganglia_(software))
13. Brief history (pt 5)
● Grafana released in 2014
● Reconfigured Ganglia to be usable with Grafana
● Started evaluating InfluxDB with version 0.9.1 (2015)
2014
16. InfluxDB's early days?
● Started with v0.9.1
● Out-of-the-box support for multiple write protocols
● Pleasantly surprised by the backend storage engine's performance
● A bit disappointed by disk usage
● Telegraf was an absolute breeze to set up
17. Strengths
● Business value:
○ Open core
○ HA support (paid version)
○ Multiple databases per server
● Technical value:
○ Push
○ Single binary collector
○ Interoperability
○ Numeric + String
○ Rollup
○ Kapacitor
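The rollup point above can be made concrete with an InfluxQL continuous query, which is how downsampling is typically expressed in InfluxDB 1.x. The database, retention policy, and measurement names below are illustrative, not from the slides:

```sql
-- Illustrative rollup: downsample raw CPU metrics into 5-minute means.
-- Assumes a "telegraf" database with a "rollup" retention policy already created.
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telegraf"
BEGIN
  SELECT mean("usage_user") AS "usage_user"
  INTO "telegraf"."rollup"."cpu_5m"
  FROM "cpu"
  GROUP BY time(5m), *
END
```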
21.
# Read metrics from one or many redis servers
[[inputs.redis]]
## specify servers via a url matching:
## [protocol://][:password]@address[:port]
## e.g.
## tcp://localhost:6379
## tcp://:password@192.168.99.100
## unix:///var/run/redis.sock
##
## If no servers are specified, then localhost is used as the host.
## If no port is specified, 6379 is used
servers = ["tcp://localhost:6379"]
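With this input enabled, the collected fields land in the `redis` measurement of the Telegraf database and can be charted from Grafana; a hedged example query (the field name follows the stock Telegraf redis plugin):

```sql
-- Average redis memory usage over the last hour, in 5-minute buckets.
SELECT mean("used_memory") FROM "redis" WHERE time > now() - 1h GROUP BY time(5m)
```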
22.
[[inputs.net]]
# interfaces = ["eth0"]
# Collects conntrack stats from the configured directories and files.
[[inputs.conntrack]]
  files = ["ip_conntrack_count", "ip_conntrack_max",
           "nf_conntrack_count", "nf_conntrack_max"]
  dirs = ["/proc/sys/net/ipv4/netfilter", "/proc/sys/net/netfilter"]
28. Clients and databases
● Each client has its own database
● Configurable data sources in Grafana
● Possibility to open organizations and special Grafana instances per client
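A minimal sketch of the per-client database setup, assuming a hypothetical client name and InfluxDB endpoint (not from the slides): each client's Telegraf agents point their output at that client's database.

```toml
# Hypothetical per-client Telegraf output: agents for this client
# write to a dedicated database on the shared InfluxDB server.
[[outputs.influxdb]]
  urls = ["http://influxdb.example.com:8086"]
  database = "client_acme"        # one database per client
  skip_database_creation = false  # let Telegraf create it if missing
```

In Grafana, each client's database then becomes its own data source, which also maps cleanly onto per-client organizations.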
30. Retention policies
● Autoscaling hosts
○ Write directly to 7 day retention policy
● Less used metrics
○ Specify shorter retention policy
● Data downsampling?
○ Waiting for intelligent roll-ups (https://github.com/influxdata/influxdb/issues/7198)
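The 7-day policy for autoscaling hosts mentioned above could be declared like this in InfluxQL (database and policy names are illustrative):

```sql
-- Short retention for ephemeral autoscaling hosts.
CREATE RETENTION POLICY "7d" ON "client_acme" DURATION 7d REPLICATION 1
```

Telegraf can then write directly into it via the `retention_policy` option of its `[[outputs.influxdb]]` section.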
31. Future?
● Looking forward to upgrading to InfluxDB 2.0
● Flux and Kapacitor for alerting
● Anomaly detection
○ Kapacitor or LoudML (https://loudml.io/)
● Collecting even more metrics :)