2. Who are you?
• CPAN Developer
• Catalyst core team
• Moose hacker
• AnyEvent::RabbitMQ user
• Ruby/Python/C as needed
• Dayjob - dev/everything/ops - state51
• 3/4 PB of MogileFS - online music
• Thousands of streams a second
• Lots of perl.
• Lots of servers
• Lots of services
5. Naïve solution
• “Let's log to the database”
• NO NO NO NO
• 120 lines/s (7,200 rpm disk)
• 167 lines/s (10k rpm disk)
• 250 lines/s (15k rpm disk)
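Those rates are just the disks' rotation speeds: the naive approach pays (at least) one fsync, i.e. one full disk rotation, per log line, so the ceiling is rpm / 60 lines per second. A quick sanity check of the numbers above, sketched in Ruby:

```ruby
# One log line = one fsync = (at least) one disk rotation,
# so the ceiling is simply rotations per second: rpm / 60.
def max_lines_per_sec(rpm)
  (rpm / 60.0).round
end

puts max_lines_per_sec(7_200)   # => 120
puts max_lines_per_sec(10_000)  # => 167
puts max_lines_per_sec(15_000)  # => 250
```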
6. Less naïve solution
• Queue before we log
• Bulk insert
• No good for unstructured data
• No good for many different structures
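The queue-then-bulk-insert idea fits in a few lines of Ruby (class and method names are mine, not from the talk): buffer lines in memory and pay one write per batch instead of one fsync per line.

```ruby
# Hypothetical sketch: buffer log lines, flush them in bulk.
# The flush block stands in for a single multi-row INSERT.
class BulkLogger
  def initialize(batch_size, &flush)
    @batch_size = batch_size
    @buffer = []
    @flush = flush
  end

  def log(line)
    @buffer << line
    flush! if @buffer.size >= @batch_size
  end

  def flush!
    return if @buffer.empty?
    @flush.call(@buffer)  # one bulk write for the whole batch
    @buffer = []
  end
end

batches = []
logger = BulkLogger.new(3) { |rows| batches << rows }
5.times { |i| logger.log("line #{i}") }
logger.flush!
# batches now holds [["line 0", "line 1", "line 2"], ["line 3", "line 4"]]
```

As the slide says, this fixes throughput but not the schema problem: every row still has to fit one structure.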
7. Still a stupid solution
• Lots of UNION queries
• OR epic multi-way JOIN
• Adding new data types HARD
8. Splunk
• Splunk is enterprise software used to
monitor, report and analyze the data
produced by the applications, systems and
infrastructure to run a business.
- Wikipedia
11. Splunk
• Small agent program on each host: you tell it
about your log files and it ships them to the server
• Server component analyzes / indexes your
logs. Also acts as a syslog server.
• Builds structure from your data - in a GUI.
12. Splunk
• Splunk is amazing.
• You just tip logs into it, structure later.
• If you can afford the license, use
it, be happy!
17. Diversion -
ElasticSearch
• Just tip JSON documents into it (with a
type)
• Figures out structure for each type, indexes
appropriately.
• Free sharding and replication
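"Tip JSON documents into it (with a type)" amounts to an HTTP POST of the document to an index/type path. A sketch (the daily index naming and the use of Net::HTTP are my assumptions, not from the talk):

```ruby
require 'json'
require 'net/http'

# Build the request for indexing one JSON document under a type.
# Daily indices ("logs-YYYY.MM.DD") are an assumed convention.
def es_index_request(doc, type, day)
  path = "/logs-#{day}/#{type}"
  [path, JSON.generate(doc)]
end

path, body = es_index_request({ "message" => "hello", "duration" => 0.2 },
                              "app", "2012.08.17")
# To actually ship it:
#   Net::HTTP.new("localhost", 9200).post(path, body)
```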
18. So
• We post-process logs to be somewhat
structured.
• We can then search over them (fast!)
• Free text (for text fields)
• Numeric
• Dates + ranges
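Concretely, "search over them (fast!)" means structured queries instead of grep. A sketch of the ElasticSearch query DSL (field names invented; the exact DSL keywords varied across early versions, so treat this as illustrative):

```ruby
require 'json'

# Free text on a text field plus a numeric range, in one query -
# something grep can't express.
query = {
  "query" => {
    "bool" => {
      "must" => [
        { "match" => { "message" => "timeout" } },
        { "range" => { "duration" => { "gte" => 0.5 } } }
      ]
    }
  }
}
puts JSON.generate(query)  # POST to /<index>/_search
```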
19. New types
• Trivial!
• Just emit it, it’s indexed and queryable.
• Can hint elasticsearch for better queries (if
needed)
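"Hinting" here means supplying an explicit mapping for a type rather than relying on dynamic detection. A sketch (the "app" type and its field names are made up):

```ruby
require 'json'

# Explicit mapping for a hypothetical "app" type: tells
# ElasticSearch that duration is a float and @timestamp a date,
# instead of letting dynamic mapping guess from the first document.
mapping = {
  "app" => {
    "properties" => {
      "duration"   => { "type" => "float" },
      "@timestamp" => { "type" => "date" }
    }
  }
}
puts JSON.generate(mapping)  # PUT to /<index>/app/_mapping
```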
20. Logstash
• In JRuby, by Jordan Sissel
• Simple: Input → Filter → Output
• Flexible
• Extensible
• Plays well with others
• Pre-built web interface
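The Input → Filter → Output pipeline maps directly onto a logstash config file. A sketch of the apache-logs case discussed later (plugin option names are from memory of the 1.x era, so check the plugin docs before relying on them):

```
input {
  file { type => "apache" path => "/var/log/apache2/access.log" }
}
filter {
  grok { type => "apache" pattern => "%{COMBINEDAPACHELOG}" }
}
output {
  elasticsearch { host => "localhost" }
}
```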
31. Java (JRuby) decoding
• Decoding AMQP in Java (JRuby) is, however,
much much faster than perl doing that...
• JVM: +/-
32. Logstash on each host
is totally out...
• Running it on the ElasticSearch servers, which
are already dedicated to this, is fine.
• I'd still like to reuse all of its parsing
33. Lots of my data is
already JSON
• Log::Message::Structured
• AnyEvent::RabbitMQ
• App logging relies on Rabbit being up
• Can get slow if rabbit is sick and blocks
35. But not in the right
format..
• So I can write a munger in ruby...
• Or I can write one in perl.
• I’m already (going to be) running a
collection / aggregation daemon on each
host (for apache logs).
40. Subset of logstash
• In perl
• ZeroMQ receiver
• Per host aggregation
• Push AMQP to RabbitMQ
• Run logstash on a central server
41. Subset of logstash
• Small async process. ZMQ receive socket.
• Pull JSON from ZMQ, decode, munge, emit
back to AMQP
• Slowness no longer blocks app servers
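The decode/munge/emit step is the heart of that daemon. A sketch of the munge alone, in Ruby (the @-prefixed field names follow the logstash 1.x event convention as I understand it; treat them as an assumption):

```ruby
require 'json'
require 'time'

# Take an application JSON event, wrap it into a logstash-style
# event, and re-serialise it ready to publish to AMQP.
def munge(raw_json, source_host)
  app_event = JSON.parse(raw_json)
  JSON.generate({
    "@timestamp"   => Time.now.utc.iso8601,
    "@source_host" => source_host,
    "@type"        => app_event.delete("type") || "app",
    "@message"     => app_event.delete("message").to_s,
    "@fields"      => app_event  # remaining keys become structured fields
  })
end

event = JSON.parse(munge('{"message":"slow request","duration":1.2}', "web01"))
# event["@fields"] => {"duration"=>1.2}
```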
42. Subset of logstash
• Use logstash at the other end to pop
AMQP and insert into ElasticSearch
• Keep per-host cost small
• Same process can tail logfiles and cast into
AMQP
• Reuse all the logstash parsing (at server
side) for apache logs etc
44. 100% drop-in
compatible subset of
logstash
In perl - making it easy for you to emit
structured app events as JSON.
If everything else is down, your app is still up (you
just lose some logs)
50. Yes, yes - I know
• The web app is fugly
• Other people already have alternate
implementations
• Keeping interoperable opens lots of choices
• E.g. graylog2 as the event sink
51. rfc3164
• This document describes the observed behavior
of the syslog protocol
• This is not a good place to be.
• Working with Jordan to document message
format.
• End to end tests of both implementations
to follow.
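As a rough illustration of the convention (my guess at the shape, based on the logstash 1.x event format; pinning this down is exactly the documentation work described above):

```
# "raw" event: just the line plus minimal metadata
{ "@type": "apache", "@source_host": "web01",
  "@message": "GET /feed HTTP/1.1 200 0.042" }

# "processed" event: fields already extracted
{ "@type": "apache", "@source_host": "web01",
  "@timestamp": "2012-08-17T10:00:00Z",
  "@message": "GET /feed HTTP/1.1 200 0.042",
  "@fields": { "status": 200, "duration": 0.042 } }
```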
53. Thanks!
• <bobtfish@bobtfish.net>
• t0m on irc.perl.org
• And Freenode (idle in #logstash)
• We are hiring!!!
• Developers (learn ruby, or perl, or both!)
• Front end people (play with websockets!)
54. This is all now pointless
• The latest logstash .jar will do all the
munging for you.
• And it (mostly) runs in MRI (C Ruby), so my
RAM thing is less bad.
• N implementations still a good thing!
Mention JFDI, and I really don't care what language it's in
The former has amazing documentation.
The latter, well, bad luck. (Great reference material, but docs not so great. Good mailing list though)
grep is great, I love grep
Not very good for 100 servers at once
Solution needs to be just as good as grep for the simple case
This is always the first thing suggested / thought of.
It's great for an audit trail, as your DB is (should be!) durable
Doing the simple thing costs (at least) one disk rotation (aka fsync) per log line
This solves the performance problems, but gives you a load more moving parts
A table with id, date, message is likely to perform less well than grep
One table with lots of NULL cols, or lots of tables (one per data type)
And how do we get data back from this pile?
Not as easy as grep!
You're stuffed as soon as you add more data types
We played with this. We liked it, a lot.
Enterprise means... spenny.
So, what does it do?
You can just tip logs into it, and it'll do the right thing... Even after the fact.
Searching is fast fast fast!
Really, it's a great product.
Shame about the pricing.
I'm also a little wary of using splunk as more than 'turbo grep'
So, open source - someone else must have thought about this, right?
Isn't he cute? And woody!
Sorry - just to go off at a tangent...
Let's make all our log messages JSON messages, as JSON is fast, easy to parse (and you can search it with grep!)
Let's throw it in elasticsearch. Ponies and unicorns for everyone.
N.B. ElasticSearch storage will be MUCH larger than the byte size (as it's indexed 90 ways)
We post-process our logs before insertion, to pull out structured fields (e.g. dates & durations)
We can add new log message types (or start parsing things we currently add as simple text), make schema changes any time we want.
We just pour data into ElasticSearch, and then get better searching than grep!
The more it's split into fields, the more we win, but just writing log lines still gives us as good as grep.
Anyway - back to the story...
Very simple model - input (pluggable), filtering (pluggable by type), output (pluggable)
Lots of backends - AMQP and ElasticSearch + syslog and many others
Pre-built parser library for various line based log formats
Comes with web app for searches... Everything I need!
Let's take a simple case here - I'll shove my apache logs from N servers into ElasticSearch
I run a logstash on each host (writer), and one on each ElasticSearch server (reader)...
So, that has 2 logstashes - one reading files and writing AMQP
One reading AMQP and writing to ElasticSearch
However, my raw apache log lines need parsing (in the filter stage) - to be able to do things like 'all apache requests with 500 status', rather than 'all apache requests containing the string 500'
Red indicates the filtering
There we go, everyone got that?
Except I could instead do the filtering here, if I wanted to.
Doesn't really matter - depends what's best for me...
Right, so... Let's try that then?
First problem...
Well then, I'm not going to be running this on the end nodes.
And it's not tiny, even on machines dedicated to log parsing / filtering / indexing
But sure, I spun it up on a couple of spare machines...
It works fairly well as advertised.
The JVM giveth (lots of awesome software), the JVM taketh away (any RAM you had).
ruby is generally slower than perl. jruby is generally faster than perl.
I'm not actually knocking the technology here - just saying it won't work in this situation for me.
So, anyway, I'm totally stuffed... The previous plan is a non-starter.
So I need something to collect logs from each host and ship them to AMQP
Ok, cool, I can write that in plain ruby or plain perl and it's gotta be slimmer, right?
But wait a second... I just want to get something 'real' running here...
So, I'm already tipping stuff into AMQP...
So I can just use my existing structured data, right... Well - no, sorry...
And I got distracted at this point. For about 6 months.
So I come back to this, still needing something to munge my JSON into other JSON.
But, right now, the easiest thing to try is:
30 line perl script, it works.
I have data in ElasticSearch.
I can view it in the logstash webapp
Going back a few slides - if RabbitMQ gets sick, everything goes bad.
I ended up with a load of code to deal with this.
It still didn't work very well.
In fact, the entire idea of using TCP/IP for this is probably bad.
Syslog is hateful.
MOST of my log messages are under 1024 bytes, but I don't want to throw them away (or throw an exception) if they aren't.
ZeroMQ looked like the right answer.
I played with it. It works REALLY well.
I'd recommend you try it.
So let's write this per host collection daemon
Take our previous munging code, and run it per host in the aggregation process
Tada! I have fixed all my woes with rabbitmq and at the same time I've got my app logs in logstash format for free.
I can reuse all the heavy-lifting parts of logstash.
I can reuse my per host ZMQ daemon as a log file tailer.
Overhead on hosts is very small. Heavy lifting occurs entirely in the search cluster.
So, to recap... I've got...
I've got a solution to logging lots of stuff without blocking or falling over.
I've got a solution that has a minimal impact on my servers.
So, this is what it actually looks like.
Raw app logs go to the agent via ZMQ. It munges them to processed logstash logs, and emits them.
Agent also tails files and emits raw logstash
Logstash does parsing of apache logs
It's taken me over 6 months to get any of it running, I don't have time to re-write the web app
Someone else is already doing that.
I love open source.
So, I've talked about these 'raw' and 'processed' log formats - they're just conventions as to what fields can be found in the JSON.
This still needs to be better documented!