5. Would you start building your own scales
if you operated a real zoo?
- What’s your mechanical engineering expertise?
- How long does it take to get tools and raw material?
- Who feeds the animals while you are in the workshop?
- When do we need it, and could it be ready 'in time'?
8. • Many VMs & apps: each one generates ~5 to 130 metrics at short intervals
• Aggregation, compromises on resolution, etc.
• Transactions: each one creates N log entries
• Limit recording; use time-based indices + aliases
• High throughput: a high rate of logs & metrics
• Build a monitoring infrastructure (remember this)
9. Example: number of metrics per application
Metric source          Metrics to collect
OS (CPU, Mem, Disk)                    21
Hadoop                                133
HBase                                  68
Elasticsearch                          62
Apache Storm                           25
Total                                 309
~3.1 million data points per week × N machines
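The weekly figure follows from simple arithmetic if every metric is sampled once per minute. The slide does not state the sampling interval; the one-minute default below is an assumption that happens to reproduce the ~3.1 million figure. A minimal Python sketch:

```python
# Sketch: reproduce the "~3.1 million data points per week" estimate.
# ASSUMPTION: each metric is sampled once per minute on every machine;
# the slide does not state the interval, but 60 s matches its figure.

METRICS_PER_SOURCE = {
    "OS (CPU, Mem, Disk)": 21,
    "Hadoop": 133,
    "HBase": 68,
    "Elasticsearch": 62,
    "Apache Storm": 25,
}

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800


def weekly_data_points(machines: int, interval_seconds: int = 60) -> int:
    """Data points per week for `machines` hosts running the full stack."""
    total_metrics = sum(METRICS_PER_SOURCE.values())  # 309
    samples_per_metric = SECONDS_PER_WEEK // interval_seconds
    return total_metrics * samples_per_metric * machines


if __name__ == "__main__":
    print(sum(METRICS_PER_SOURCE.values()))  # 309
    print(weekly_data_points(1))             # 3114720
```

Shortening the interval to 10 seconds multiplies the volume by six, which is why aggregation and compromises on resolution come up so early.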
11. • Find out and define the metrics to collect
• Install and configure collectd, StatsD, Graphite, …
• Build, install, and configure the available agents
• Define reports or arrange all collected metrics into
dashboards, e.g. Grafana, …
• These are just the basics!
• Automate the deployment of agents
#monitoringsucks
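Most of the collection tools named above accept simple text protocols. As a sketch, assuming a Graphite (carbon) listener on its default plaintext port 2003 and a StatsD daemon on its default UDP port 8125, metrics could be pushed from Python as shown below. The metric names are hypothetical, and there is no batching or retry:

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Graphite plaintext protocol: '<path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def statsd_line(bucket: str, value: float, metric_type: str = "c") -> str:
    """StatsD line: '<bucket>:<value>|<type>' (c=counter, g=gauge, ms=timer)."""
    if metric_type not in ("c", "g", "ms"):
        raise ValueError(f"unsupported StatsD type: {metric_type}")
    return f"{bucket}:{value}|{metric_type}"


def send_graphite(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Send one plaintext line to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


def send_statsd(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Fire-and-forget UDP datagram to a StatsD daemon."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

For example, `graphite_line("servers.web01.cpu.load", 0.42, 1420070400)` yields `servers.web01.cpu.load 0.42 1420070400` plus a newline. StatsD uses UDP so that a missing daemon never blocks the application; Graphite's plaintext port uses TCP.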
12. #monitoringlove
• Integrate with the organization
  • Alerting workflows + multi-user + security
• Scale out:
  • Distributed event processing (e.g. Kafka)
  • Scalable data stores (e.g. Elasticsearch, HBase)
• Add intelligence:
  • Machine learning for metrics & events
  • Alerting & reporting based on it
23. Log files as a source of metrics
• Simplest: the log rate of an application
• Generate counts for operations
  • Search for and count related events
  • E.g. count slow operations
• Extract values from logs
  • Apply a regex or field search to extract numbers
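A minimal sketch of the three approaches (log rate, counting events, extracting values), assuming a hypothetical log format in which request lines end with `took <N>ms`. The regular expressions, sample lines, and the 1000 ms "slow" threshold are illustrative only:

```python
import re
from collections import Counter
from typing import Iterable, List

# ASSUMPTION: hypothetical format "2015-01-20T10:00:01 INFO GET /api/users took 120ms".
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")  # timestamp cut to the minute
DURATION_RE = re.compile(r"took (\d+)ms")                      # numeric value to extract
SLOW_MS = 1000  # hypothetical threshold for a "slow operation"


def log_rate_per_minute(lines: Iterable[str]) -> Counter:
    """Simplest metric: number of log lines per minute."""
    rate: Counter = Counter()
    for line in lines:
        m = MINUTE_RE.match(line)
        if m:
            rate[m.group(1)] += 1
    return rate


def extract_durations(lines: Iterable[str]) -> List[int]:
    """Extract numeric values (durations in ms) from matching lines."""
    return [int(m.group(1)) for line in lines if (m := DURATION_RE.search(line))]


def count_slow(lines: Iterable[str], threshold_ms: int = SLOW_MS) -> int:
    """Count related events: operations slower than the threshold."""
    return sum(1 for ms in extract_durations(lines) if ms > threshold_ms)


SAMPLE = [
    "2015-01-20T10:00:01 INFO GET /api/users took 120ms",
    "2015-01-20T10:00:17 INFO GET /api/orders took 1450ms",
    "2015-01-20T10:00:42 WARN cache miss for key user:42",
    "2015-01-20T10:01:05 INFO GET /api/users took 2300ms",
]

if __name__ == "__main__":
    print(dict(log_rate_per_minute(SAMPLE)))  # {'2015-01-20T10:00': 3, '2015-01-20T10:01': 1}
    print(extract_durations(SAMPLE))          # [120, 1450, 2300]
    print(count_slow(SAMPLE))                 # 2
```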
27. Map your landscape
• Quantity of servers & applications to monitor
• What are the components of your app stack?
  • E.g. Linux on AWS, NGINX, Node.js, Redis, Elasticsearch
• Which programming languages are used?
• Can you find agents/monitors for all your 'apps'?
• List the missing parts -> find other agents or build a monitor
28. Customizing – custom metrics/plugins
• Which metrics are relevant for each 'app'?
• What is covered by existing agents?
• How should each of these metrics be aggregated?
  • min, max, sum, avg
• Pre-aggregation vs. query-time aggregation
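The aggregation choice matters most with pre-aggregation. The sketch below (all names hypothetical) keeps count, sum, min, and max per fixed time bucket at write time; storing sum and count rather than a finished average is what allows buckets to be merged correctly later, for example when a dashboard zooms out:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Bucket:
    """Pre-aggregated summary of all samples that fall into one time bucket."""
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self) -> float:
        return self.total / self.count

    def merge(self, other: "Bucket") -> "Bucket":
        """Combine two buckets exactly; possible because sum and count are kept."""
        return Bucket(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
        )


def pre_aggregate(samples: Iterable[Tuple[int, float]], width_s: int = 60) -> Dict[int, Bucket]:
    """Group (unix_ts, value) samples into fixed-width buckets at write time."""
    buckets: Dict[int, Bucket] = {}
    for ts, value in samples:
        start = ts - ts % width_s  # bucket start aligned to width_s
        buckets.setdefault(start, Bucket()).add(value)
    return buckets


if __name__ == "__main__":
    b = pre_aggregate([(0, 10.0), (20, 30.0), (59, 20.0), (60, 100.0)])
    print(b[0].avg, b[60].avg)         # 20.0 100.0
    print(b[0].merge(b[60]).avg)       # 40.0, the true two-minute average
    print((b[0].avg + b[60].avg) / 2)  # 60.0, averaging averages is wrong
```

Query-time aggregation avoids this pitfall by keeping raw samples, at the cost of storage and query load.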
29. Dashboards
• Graphs
  • Which metrics belong together?
  • Display options …
  • Query language
• Dashboards
  • What combination of graphs provides the best insight?
  • Can you share and re-use arranged dashboards for similar setups or situations?
  • Or do you need to configure them again for other servers?
  • Is sharing secured, or is it just a link to your UI?
30. Alerts
• Threshold-based alerts
• Status changes
• Heartbeat alerts
• Anomaly detection
• Challenges: the number of alert rules and queries, and tuning the ‘noise level’
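The four alert types can be sketched as simple predicates. The z-score check is only one possible anomaly detector, chosen here for brevity; the slide does not prescribe a method. Every limit below is a hypothetical tuning knob, and tuning those knobs is exactly the ‘noise level’ challenge:

```python
import statistics
from typing import List


def threshold_alert(value: float, limit: float) -> bool:
    """Threshold-based: fire when the latest value exceeds a fixed limit."""
    return value > limit


def status_change_alert(previous: str, current: str) -> bool:
    """Status change: fire whenever a check moves between states (e.g. OK -> CRITICAL)."""
    return previous != current


def heartbeat_alert(last_seen_ts: float, now_ts: float, max_silence_s: float) -> bool:
    """Heartbeat: fire when a source has been silent for too long."""
    return now_ts - last_seen_ts > max_silence_s


def zscore_anomaly(history: List[float], value: float, z_limit: float = 3.0) -> bool:
    """Anomaly detection (z-score): flag values far from the recent mean."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # flat history: any deviation is unusual
    return abs(value - mean) / stdev > z_limit
```

Raising `z_limit` or `max_silence_s` reduces noise but delays or hides real incidents; each rule needs that trade-off made explicitly.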
32. Integrate metrics & logs
• Metrics show "something is happening"
• Logs provide evidence of "what happened"
• Faster insights by reporting them together
• Correlate logs and metrics
• Metrics can be created from logs
36. Centralizing Logs with ELK
Raw logs (files, syslog) -> Log shipper -> Parser (Logstash) -> Storage (Elasticsearch) -> Visualization (Kibana)
Where is the work?
• Format adaptation & transport
• Tuning
• Maintenance
• Queries
• Security
38. Set up Elasticsearch
• Schema: define the right mapping
• Insert rate:
  • Use bulk indexing
  • Increase the refresh interval for a higher insert rate
• Volume:
  • Aliases and time-based indices
• Memory usage: configure caching limits
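The bulk, refresh, and time-based index points translate into request bodies. The sketch below only builds JSON bodies (no HTTP client) for the `_bulk`, `_aliases`, and `_settings` endpoints. The index prefix, alias name, and 30 s refresh interval are hypothetical, and it assumes a recent Elasticsearch without mapping types (older releases also expected a `_type` in each bulk action line):

```python
import json
from datetime import datetime
from typing import Dict, Iterable, List

# ASSUMPTIONS: hypothetical index prefix and alias; recent Elasticsearch (no _type).
INDEX_PREFIX = "logs-"
ALIAS = "logs-current"


def daily_index(ts: datetime) -> str:
    """Time-based index name, one index per day, e.g. logs-2015.01.20."""
    return f"{INDEX_PREFIX}{ts:%Y.%m.%d}"


def bulk_body(docs: Iterable[Dict]) -> str:
    """NDJSON body for POST /_bulk: one action line plus one source line per doc."""
    lines: List[str] = []
    for doc in docs:
        ts = datetime.fromisoformat(doc["@timestamp"])
        lines.append(json.dumps({"index": {"_index": daily_index(ts)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def move_alias_actions(old_index: str, new_index: str) -> Dict:
    """Body for POST /_aliases: both actions are applied atomically."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": ALIAS}},
            {"add": {"index": new_index, "alias": ALIAS}},
        ]
    }


# Body for PUT /<index>/_settings: refresh less often during heavy ingest.
BULK_LOAD_SETTINGS = {"index": {"refresh_interval": "30s"}}
```

Daily indices keep deletion of old data cheap (drop a whole index instead of deleting documents), while the alias gives readers and writers a stable name.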
39. • How to secure it?
  • Proxies, security plugins, hosted solutions
• Queries and dashboard creation
  • Generators/templates for specific setups
  • Learn the Lucene query language
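A query generator can be as small as one string template per setup. A sketch assuming hypothetical field names (`host`, `level`, `duration_ms`) in the log mapping; values are backslash-escaped because characters such as `-`, `:`, and `/` have special meaning in Lucene query syntax:

```python
from typing import List

# Characters with special meaning in Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

# Hypothetical template: the field names depend on your own log mapping.
SLOW_REQUESTS_TEMPLATE = "host:{host} AND level:{level} AND duration_ms:[{min_ms} TO *]"


def escape_lucene(term: str) -> str:
    """Backslash-escape Lucene special characters in a literal term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def render_query(host: str, level: str = "ERROR", min_ms: int = 1000) -> str:
    """Fill the template for one host; the range [min_ms TO *] is open-ended."""
    return SLOW_REQUESTS_TEMPLATE.format(
        host=escape_lucene(host), level=escape_lucene(level), min_ms=int(min_ms)
    )


def queries_for_setup(hosts: List[str]) -> List[str]:
    """Generate the same query for every host of a similar setup."""
    return [render_query(h) for h in hosts]
```

For example, `render_query("web-01")` produces `host:web\-01 AND level:ERROR AND duration_ms:[1000 TO *]`.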