Monitoring IT assets such as servers, applications, networking devices, databases, etc. with different open source tools, from scripts to frameworks. This presentation was given as part of August Penguin 2013, the Israeli Open Source Movement's annual convention.
2. Your presenter
● Married +1
● CEO @ Forthscale systems
● Scalable infrastructure Architect
● Linux since 2.0, first distro was Biltmore
● Linux migration activist (from Unix and M$)
● nagios & forks since 2001
● monitoring expert with patents
3. Why monitor
● downtime sucks
● losing money
● understanding your stack
● planning ahead
● analyzing what went wrong
● keeping inventory
● knowledge is control
4. What to monitor
● availability (ping, url, content ...)
● use of resources (CPU, RAM, disk … )
● benchmarking (throughput, request count …)
● events (logs, exceptions, dumps …)
● material (files, DBs, outputs … )
● anything else you might need
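The first two items can be sketched with nothing but the Python standard library. This is only an illustration: the function names, the default timeout and the choice of a plain TCP connect as the availability test are assumptions, not something taken from the talk.

```python
import shutil
import socket


def tcp_available(host, port, timeout=3.0):
    """Availability: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def disk_used_percent(path="/"):
    """Use of resources: percentage of the filesystem at `path` in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, `tcp_available("www.example.org", 80)` checks that a web server accepts connections, and `disk_used_percent("/var")` gives a number to compare against a threshold.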
5. How to monitor
With some help from open source
● Active
● Passive
● SNMP
● Scripts
● Frameworks
● Systems
6. What to do with data
based on time / object / state
● log (locally, externally )
● alert (someone, group, list)
● handle (or at least try to)
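A minimal sketch of the log / alert / handle decision, keyed on object and state. The state names, the `dispatch` function and the `alert` / `handle` callbacks are all hypothetical; a real setup would plug in mail, SMS or a restart script.

```python
import logging

log = logging.getLogger("monitor")


def dispatch(obj, state, previous_state, alert, handle):
    """Log every result; alert and try to handle only on a change to a bad state.

    Returns the list of actions taken, which makes the policy easy to inspect.
    """
    log.info("%s is %s", obj, state)
    actions = ["log"]
    if state != "ok" and state != previous_state:
        alert(f"{obj} changed from {previous_state} to {state}")
        actions.append("alert")
        # The handler reports whether its attempted fix succeeded.
        if handle(obj):
            actions.append("handle")
    return actions
```

Alerting only on a state change keeps a long outage from flooding the list with repeated messages.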
7. SNMP
● active and passive
● easy, out of the box support for basics
● widely deployed by default
● structured
● complicated to customize queries
● complicated to show trends
Net-SNMP package @ http://www.net-snmp.org/
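An active SNMP query can be sketched by calling the Net-SNMP `snmpget` tool from Python. It assumes `snmpget` is installed and the device answers SNMP v2c; the community string `public`, the host name used below and the helper names are illustrative only.

```python
import subprocess

# sysUpTime.0 from the standard MIB-2 system group.
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"


def snmpget_argv(host, oid, community="public"):
    """Build an snmpget command line; -Oqv prints only the value."""
    return ["snmpget", "-v2c", "-c", community, "-Oqv", host, oid]


def snmp_get(host, oid, community="public", timeout=10):
    """Run snmpget and return the value as text; raises if the query fails."""
    done = subprocess.run(
        snmpget_argv(host, oid, community),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return done.stdout.strip()
```

Calling `snmp_get("router1", SYS_UPTIME_OID)` would return the uptime of a hypothetical device named `router1`.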
8. Scripts
● custom
● can do everything you want
● can be executed over ssh
● require programming skills
● need maintenance
● complicated to show trends
● do not show tactical overview
shell / perl / python
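As a rough illustration of "can be executed over ssh", the sketch below wraps a remote command in an `ssh` call and collects its exit code and output. It assumes key-based login is already set up; the helper names, the host and the remote command are hypothetical.

```python
import subprocess


def ssh_argv(host, remote_command, connect_timeout=5):
    """Build the ssh command line that runs `remote_command` on `host`."""
    return [
        "ssh",
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
        "-o", f"ConnectTimeout={connect_timeout}",
        host,
        remote_command,
    ]


def run_check(argv, timeout=30):
    """Run a check command; return (exit code, stripped stdout)."""
    done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return done.returncode, done.stdout.strip()
```

For example, `run_check(ssh_argv("web01", "df -P /"))` runs `df` on `web01` and hands the raw output back for parsing. That parsing is the part that needs programming skills and maintenance.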
9. Frameworks
● script-based frameworks such as watchdog
● full frameworks such as sensu
● structured
● easier in deployment
● supported
http://www.sensuapp.org
https://github.com/sebastien/monitoring
10. Systems
● Structured
● Extendable (plugins)
● Supported
● Complex
● Lock-in
● Not as scalable as you imagine
nagios / zenoss / zabbix
11. Monitoring (previously known as watchdog)
● monitoring and data-collection daemon
● lightweight
● written in python
good for:
● being notified when incidents happen
● taking automatic actions
● collecting statistics for further processing
12. Sensu
● lightweight
● written in Ruby
describes itself as a “monitoring router”
basically it is a framework that connects “check” scripts run across many nodes
with “handler” scripts run on one or more Sensu servers
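The handler side can be sketched as a small script. Sensu pipe handlers receive the event as JSON on standard input, and a handler can be any executable. The event fields used here (`client.name`, `check.name`, `check.status`, `check.output`) are assumed from the Sensu 0.x event layout and should be checked against the version in use; the function names are illustrative.

```python
import json
import sys

# Check scripts report their result through the exit status.
STATUS_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def format_event(event):
    """Turn one event (a dict parsed from JSON) into a one-line message."""
    client = event["client"]["name"]
    check = event["check"]
    status = STATUS_NAMES.get(check["status"], "UNKNOWN")
    return f"{status}: {check['name']} on {client}: {check['output'].strip()}"


def main(stream=sys.stdin):
    """Entry point for a pipe handler: read the event, emit the message."""
    print(format_event(json.load(stream)))
```

A real handler would call `main()` at the bottom of the script and send the message by mail or chat instead of printing it.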
13. Systems
Everything packaged in one spot
● tactical
nagios / zenoss / zabbix
● cumulative
munin / cacti
● hybrids
nagios+cacti / icinga + munin
14. nagios / icinga / shinken
● de facto industry standard
● plugins for almost everything
● huge community
● text files for configuration
● clientless / clients
● alerting / handlers
● bad scalability
● trending via 3rd party
http://www.nagios.org/
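A plugin is just a program that prints one line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); text after the `|` is performance data for 3rd party trending. Below is a minimal sketch of a disk check along those lines. The thresholds and function names are assumptions made for the example.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to (exit code, plugin output line)."""
    if used_percent >= crit:
        code = CRITICAL
    elif used_percent >= warn:
        code = WARNING
    else:
        code = OK
    text = (
        f"DISK {NAMES[code]} - {used_percent:.1f}% used"
        f" | used={used_percent:.1f}%;{warn:g};{crit:g}"
    )
    return code, text


def check_disk(path="/"):
    """Measure `path` and evaluate it; any OS error becomes UNKNOWN."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    return evaluate(100.0 * usage.used / usage.total)
```

To use it as a plugin, print the text returned by `check_disk()` and pass the code to `sys.exit()`.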
15. zenoss
● written in python
● modern
● a very good Ajax gui
● split architecture (portal / process / data layer)
● only gui configuration
● scalable
● needs snmp
only the basic core is open source
http://zenoss.com/
16. zabbix
● aimed to beat nagios
● some autodiscovery
● gui customization
● can show trends
● lightweight
● can correlate graphs
● cumbersome in configuration
● weak escalation policy
http://www.zabbix.com/
17. munin
Munin is an open source client/server network
or system monitoring application that presents
output in graphs through a web interface
● stored to rrd
● makes graphs
● shows trends
● can alert
● very very very simple to deploy
● text based configuration
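Part of what makes deployment simple is the plugin protocol: a plugin called with `config` describes its graph, and called with no argument it prints the current values. Below is a minimal sketch of a load-average plugin. The field name `load` and the function names are arbitrary choices for the example, and `os.getloadavg()` is Unix-only.

```python
import os
import sys


def config_lines():
    """Graph description printed when munin calls the plugin with 'config'."""
    return [
        "graph_title Load average",
        "graph_vlabel load",
        "graph_category system",
        "load.label load",
    ]


def value_lines(load):
    """Current values printed on a normal (fetch) run."""
    return [f"load.value {load:.2f}"]


def main(argv):
    """Pick the output based on the first command-line argument."""
    if len(argv) > 1 and argv[1] == "config":
        lines = config_lines()
    else:
        lines = value_lines(os.getloadavg()[0])
    print("\n".join(lines))
```

A real plugin would end with `main(sys.argv)` and be linked into the node's plugin directory.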
18. Summary
● nagios is still a rock-solid solution, while
frameworks are still at 0.x versions
● the different frameworks look promising but require
a high level of customization
● hybrids integrate nicely and can provide a good
solution
● scalability is an issue for systems designed for
LAN use