Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
1. Building a Unified Logging Layer with Fluentd, Elasticsearch and Kibana
Mushfekur Rahman
2. What This Session is About
● What Is a Unified Logging Layer?
● Why Do We Need One?
● Basics of Fluentd
● Storage and Querying Using Elasticsearch
● Real Time Visualization with Kibana
5. Unified Logging Layer (cont.)
We need something that can:
● Collect logs from multiple sources
● Format and/or filter them according to our needs
● Forward them to appropriate destinations for storage or analysis
8. Basics of Fluentd
● Fluentd is a stream data collector, but it can also work as a:
○ Filter
○ Buffer
○ Router
○ Converter
○ Aggregator, etc.
● It doesn't come with these features built in, though
● Instead, it has a flexible plugin-based architecture consisting of three modules: input, buffer and output. So everything is a plugin in Fluentd-land (like LEGO blocks)
9. Why Fluentd?
● Open source
○ Github: https://github.com/fluent/fluentd
○ Developed and maintained by Treasure Data, Inc.
● Reliable
○ Buffering
■ Supports memory- and file-based buffering to prevent data loss when routing between nodes
○ Retrying
■ Includes a retry mechanism for when capturing an event fails
■ Uses buffer IDs to maintain idempotency
○ Error handling
■ The forward plugin supports load balancing, transactions, failover, etc. (see the sketch after this list)
■ Supports a secondary node for backup and/or load balancing
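A minimal out_forward sketch of this failover setup (the hostnames, tag and fallback file path are assumptions, not from the deck):
<match app.**>
  type forward
  # primary aggregator
  <server>
    host aggregator1.example.com
    port 24224
  </server>
  # standby aggregator, used when the primary is unreachable
  <server>
    host aggregator2.example.com
    port 24224
    standby
  </server>
  # last resort: dump undeliverable events to a local file
  <secondary>
    type file
    path /var/log/td-agent/forward-failed
  </secondary>
</match>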
10. Why Fluentd? (cont.)
● Scalable and easy to extend
○ Completely decentralized architecture (no central/master node), hence easy to scale out
○ Plugins for communicating between nodes
● Rich plugin ecosystem based on RubyGems
○ 300+ community plugins
○ Easy Apache-like configuration syntax with Ruby DSL support
● Uses minimal resources
○ Developed in CRuby (performance-critical sections are written in C, while plugins are developed in Ruby)
○ A vanilla instance uses only 30-40 MB of memory
○ Can process up to 1300 events/sec/core
13. Fluentd Events
● Time
○ Uses timestamps to order events
○ The network should have NTP (Network Time Protocol) configured
● Tag
○ For routing
● Record
○ The actual log message
○ Usually JSON
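As an illustration (all values made up), a single Fluentd event carrying an Apache access log line might look like:
tag:    apache.access
time:   2015-06-01 12:00:00 +0600
record: {"host": "192.168.0.1", "method": "GET",
         "path": "/index.html", "code": 200}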
14. Capturing Events
● File tailing
○ Uses the in_tail plugin
○ Allows reading events from the tail of text files
○ Keeps track of the inode number, so it handles log rotation
● Directly from the application (see the sketch after this list)
○ Use a Fluentd logger
■ For Java: https://github.com/fluent/fluent-logger-java
○ Use the Logback Fluentd appender
■ https://github.com/sndyuk/logback-more-appenders
● From other interfaces like
○ TCP/UDP, Socket, Stdout, syslog, etc.
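A minimal sketch of the "directly from application" path using fluent-logger-java, assuming a local Fluentd instance listening on the default forward port 24224 (the tag and field names are made up):
import org.fluentd.logger.FluentLogger;
import java.util.HashMap;
import java.util.Map;

public class AccessLogger {
    // connects to localhost:24224 by default; events get the tag prefix "app"
    private static final FluentLogger LOG = FluentLogger.getLogger("app");

    public static void main(String[] args) {
        Map<String, Object> record = new HashMap<String, Object>();
        record.put("path", "/index.html");
        record.put("code", 200);
        // emitted with tag "app.access"
        LOG.log("access", record);
    }
}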
15. Fluentd Output
● Capable of forwarding events to almost all mainstream interfaces, like Stdout, Socket, /dev/null, etc.
● There are plugins for most popular storage and analytics engines, like Amazon S3, Elasticsearch, TD Analytics Cloud, MongoDB, Hadoop HDFS, etc.
● Can send alerts using the SMTP plugin
● Outputs can be either buffered or non-buffered
● Can even route an event to itself
○ For filtering, conversion and/or formatting
17. Plugin Management
● Six types of plugins in total: Input, Parser, Formatter, Filter, Output and Buffer
● Using td-agent
○ To install a plugin
/usr/sbin/td-agent-gem install fluent-plugin-<name>
● Fluentd will pick up any plugin in its LOAD_PATH
● To add a plugin that's not in the default load path
fluentd -p path/to/plugins
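For example, the Elasticsearch output plugin used in the sample config later in this deck can be installed with:
/usr/sbin/td-agent-gem install fluent-plugin-elasticsearch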
18. Running Fluentd
● Can be installed via the td-agent package or Ruby gems
● Configuration file lives at
/etc/td-agent/td-agent.conf
● OS X
$ sudo launchctl unload /Library/LaunchDaemons/td-agent.plist
$ sudo launchctl load /Library/LaunchDaemons/td-agent.plist
● Linux
$ sudo /etc/init.d/td-agent restart
19. Sample Config
Input
<source>
  type tail
  format apache
  path path/to/log/file
  pos_file path/to/pos
  tag apache.access
</source>
Output
<match apache.**>
  type copy
  <store>
    type elasticsearch
    host <host>
    port <port>
    buffer_type file
    buffer_path /path/to/file/buffer
  </store>
</match>
20. High Availability Config
Two types of nodes:
1. Forwarders
○ Typically installed on every node to capture local events
○ They then forward those events to aggregators over the network (TCP)
2. Aggregators
○ Daemons that receive events from forwarders (see the sketch below)
○ They later upload these events to an analytics engine or storage for further processing
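A minimal sketch of the aggregator side, listening on Fluentd's conventional forward port (the forwarder side pairs it with an out_forward match like the failover sketch on slide 9):
# aggregator: accept events from forwarders over TCP
<source>
  type forward
  port 24224
</source>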
22. Email Alerting
● Grep for patterns in logs and send an email alert when certain conditions are met
● Saves your inbox from getting flooded with alert mails
● Like Splunk's Grep-and-Alert-Email
● Requires installing two plugins
○ fluent-plugin-grepcounter
○ fluent-plugin-mail
● Sample configuration: Gist
23. Fluentd Failure Scenarios
● We cannot get past Murphy's Law
● Upon receiving an event, a Fluentd instance (forwarder or aggregator) writes it to the buffer (specified by buffer_path for disk buffers), and after a flush_interval the buffered events are flushed to the next destination
● If an instance dies, buffered data is transferred on restart
○ Instances are restarted automatically
● If the connection from an instance breaks, a retry event is triggered
24. Fluentd Failure Scenarios (cont.)
● Disk buffering is inherently robust against data loss; however, data will still be lost if:
○ An instance dies right after receiving an event but before writing it to disk
○ The disk breaks and the data buffer is lost
25. Performance Tuning
● If there's a CPU bottleneck in the system, we can utilize multiple cores using the in_multiprocess plugin
● Set up ntpd to prevent invalid log timestamps
● Increase the maximum number of file descriptors to 65535
● If you're pushing logs directly from the application to Fluentd via TCP, enable tcp_tw_recycle and tcp_tw_reuse to cope with TCP TIME-WAIT (see the sketch below)
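A sketch of those OS-level settings, assuming a td-agent install (the user name and exact values depend on your environment):
# /etc/security/limits.conf -- raise the per-process fd limit
td-agent soft nofile 65535
td-agent hard nofile 65535

# /etc/sysctl.conf -- cope with TCP TIME-WAIT on busy log receivers
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1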
28. Elasticsearch
● An open source, flexible, powerful, distributed, real-time text search engine
● Built on top of Apache Lucene
● Runs on the JVM
● Second most popular enterprise search engine on the planet
○ New kid on the block
○ First official release in 2010
● Developed by Shay Banon and currently maintained by Elasticsearch
○ Github: https://github.com/elastic/elasticsearch
29. Why Elasticsearch?
● Real-time writes and analytics
○ Blazing-fast indexing
● Long-term persistence
○ Writes data to the filesystem
● NoSQL document storage (like MongoDB)
● Schema-free and denormalized (JSON in, JSON out; see the sketch below)
● Full-text search
○ Search across large volumes of unstructured text data
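A quick "JSON in, JSON out" sketch via curl (the index, type and document are made up):
# index a document
$ curl -XPUT 'http://localhost:9200/logs/access/1' -d '{"host": "192.168.0.1", "path": "/index.html", "code": 200}'

# fetch it back
$ curl -XGET 'http://localhost:9200/logs/access/1?pretty=true'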
30. Why Elasticsearch? (cont.)
● Distributed
○ Supports horizontal scaling out of the box
○ Automatic node discovery and sharding
● Multi-tenant architecture
○ A single node can contain multiple indices, which can be queried individually or as a group
● High availability
○ Resilient cluster management that automatically detects node failures and rebalances data
31. Why Elasticsearch? (cont.)
● Robust conflict management to prevent data loss
○ Uses optimistic concurrency control and versioning
● RESTful API, hence developer-friendly
○ Easy to perform operations from an application or any CLI tool (e.g. curl)
33. ES Terminology (cont.)
● Shard
○ A Lucene index
○ Replicated to ensure data availability
○ Each ES index contains one or more primary shards
● Node
○ A running ES instance
● Cluster
○ A collection of cooperating ES nodes
○ Manages nodes for better availability and performance
● So the hierarchy is:
Cluster > Node > Index > Shard > Document > Field
34. ES Up and Running
● Settings
○ Config file:
<ES_HOME>/config/elasticsearch.yml
○ The default node name is a Marvel Comics character
○ Configuration can be changed via the config file or passed as JVM args during startup
● To start an ES instance:
$ <ES_HOME>/bin/elasticsearch --cluster.name=<name>
● Can be accessed at
http://localhost:9200
35. ES Query DSL
● Endpoint
○ Refers to an ES index and type
○ Can contain multiple indices and types (remember multi-tenancy?)
● Query types (concrete examples below)
○ Basic query using query parameters
{endpoint}/_search?q=code:304&size=10&pretty=true
○ Using a query object
curl -XPOST {endpoint} -d '{json_query_doc}'
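Both styles against a hypothetical logs/access endpoint:
# basic query using query parameters
$ curl 'http://localhost:9200/logs/access/_search?q=code:304&size=10&pretty=true'

# the same search using a query object
$ curl -XPOST 'http://localhost:9200/logs/access/_search' -d '{"query": {"term": {"code": 304}}}'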
36. ES Query DSL (cont.)
● Basic ES JSON query format
{
  size:   # number of results (default 10),
  from:   # offset into results (default 0),
  sort:   # sort order,
  query:  # query object,
  facets: # meta information about a field,
  filter: # filter results on a specific value or range
}
37. ES Query DSL (cont.)
● Basic ES response format
{
  hits: {            # the results
    total:     # total number of matched documents
    max_score: # highest relevance score in the result set
    hits: [
      {
        _id:     # id of the document
        _score:  # relevance score with respect to the search query
        _source: # the actual JSON document that was indexed
      }
    ]
  },
  facets: {}
}
38. ES Query DSL (cont.)
● Examples
○ Match all
{"query": {
  "match_all": {}
}}
○ Classic full text query
{"query": {
  "query_string": {
    "query": "shuttle/countdown/liftoff.html"
  }
}}
39. ES Query DSL (cont.)
○ Filter on a field
{"query": {
  "term": {
    {field-name}: {value}
  }
}}
40. ES Query DSL (cont.)
○ Range query (numbers, dates, etc.)
{"query": {
  "constant_score": {
    "filter": {
      "range": {
        {field-name}: {
          "from": {lower-value},
          "to": {upper-value}
        }
      }
    }
  }
}}
42. ES Performance Tuning
● Increase the number of open file descriptors from 32K to even 64K
● ES is designed to run well across multiple commodity machines rather than on a single machine
○ Build a cluster with several nodes if you have a staggering amount of text
○ Find the sweet spot for your shard size that suits your system best
43. ES Performance Tuning (cont.)
● Don't let ES consume all the memory (see the heap sizing sketch below)
○ Frequently accessed parts of the index are assumed to be in the OS page cache
○ Search filters can also be cached in the filter cache to provide fast responses
○ Field values are used for faceting and cached in the field cache
○ Keep track of memory stats so that OutOfMemory never happens
○ -Xms and -Xmx should be equal
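A heap sizing sketch for an ES 1.x install (the 16g figure is an assumption for a 32 GB box; ES_HEAP_SIZE sets both -Xms and -Xmx, keeping them equal):
# leave roughly half the RAM to the OS page cache
$ export ES_HEAP_SIZE=16g
$ <ES_HOME>/bin/elasticsearch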
44. ES Performance Tuning (cont.)
● Client considerations
○ Idempotent requests
■ Requests that may change server state should be idempotent
○ Connection pooling
■ Avoid creating a new connection for every request
○ Bulk update requests (see the sketch after this list)
■ Updates can be batched into a single bulk request when needed
■ Catch: no rollback if an operation fails mid-update
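A bulk indexing sketch using the ES _bulk endpoint (index, type and documents are made up). With bulk.json holding newline-delimited action/document pairs:
{"index": {"_index": "logs", "_type": "access", "_id": "1"}}
{"path": "/index.html", "code": 200}
{"index": {"_index": "logs", "_type": "access", "_id": "2"}}
{"path": "/images/logo.gif", "code": 304}
send it with:
$ curl -XPOST 'http://localhost:9200/_bulk' --data-binary @bulk.json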
47. Kibana
● Highly customizable dashboarding
● Kibana 3
○ Just a front-end (Angular.js + jQuery + Elastic.js)
● Kibana 4
○ Standalone web application
○ Node.js
○ Flot (jQuery plotting library)
● Developed and maintained by Elasticsearch as a part of the ELK Stack
● Github: https://github.com/elastic/kibana
48. Why Kibana?
● Built to work with Elasticsearch
● Gives shape to data
● Very flexible interface
● Easy to export data
● Easy setup procedure
50. Kibana Navigation
● Settings
○ Tell Kibana where to pull data from
○ Configure an index pattern
● Discover
○ Displays all available documents from the search result for a specific timespan
○ Visualizes the available data using a histogram
○ Search using Lucene query syntax (example below)
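An example of Lucene query syntax in the Discover search bar (the field names depend on your index mapping):
code:304 AND method:GET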
51. Kibana Navigation (cont.)
● Visualize
○ Creates visualizations like area charts, pie charts, data tables, tile maps, etc. from the appropriate data
● Dashboard
○ Several visualizations can be placed on a dashboard
○ A dashboard can show data from multiple indices
○ Saves you from recreating frequently needed visualizations over and over again
52. Kibana Access Control
● No authentication or role-based dashboard access support out of the box
● Basic authentication can be added with a proxy server (e.g. nginx; see the sketch after this list)
● Use Shield by Elasticsearch
○ Authentication services for both ES and Kibana
○ Role-based access control for both ES and Kibana
○ Easy to integrate with OAuth and LDAP
○ Audit logging with a complete record of user and system activities
○ Easy SSL/TLS configuration
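A minimal nginx basic-auth sketch for the proxy approach above (server name, port and htpasswd path are assumptions; Kibana 4 listens on port 5601 by default):
server {
    listen 80;
    server_name kibana.example.com;

    # prompt for credentials before anything reaches Kibana
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
    }
}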
54. How About A Demo?
● Two months' worth of HTTP access logs from the NASA Kennedy Space Center in Florida (openly accessible from here)
● More than 3 million requests from various sources
● Around 400 MB of server log dump
55. To Explore More...
● Fluentd Documentation
http://docs.fluentd.org/articles/quickstart
● Elasticsearch Reference
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
● Elasticsearch from the bottom up series
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
● Kibana User Guide
https://www.elastic.co/guide/en/kibana/current/index.html