2. Who are you?
• CPAN Developer
• Catalyst core team
• Moose hacker
• AnyEvent::RabbitMQ user
• Ruby/Python/C as needed
• Dayjob - dev/everything/ops - state51
• 3/4 PB of MogileFS - online music
• Thousands of streams a second
• Lots of perl.
• Lots of servers
• Lots of services
5. Naïve solution
• “Let's log to the database”
• NO NO NO NO
• 120 lines/s (7,200 rpm disk)
• 167 lines/s (10k rpm disk)
• 250 lines/s (15k rpm disk)
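Those rates are just the disks' rotation speeds: the naive approach pays (at least) one fsync, i.e. one full disk rotation, per log line, so the ceiling is rpm / 60 lines per second. A quick sanity check of the numbers above, sketched in Ruby:

```ruby
# One log line = one fsync = (at least) one disk rotation,
# so the ceiling is simply rotations per second: rpm / 60.
def max_lines_per_sec(rpm)
  (rpm / 60.0).round
end

puts max_lines_per_sec(7_200)   # => 120
puts max_lines_per_sec(10_000)  # => 167
puts max_lines_per_sec(15_000)  # => 250
```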
6. Less naïve solution
• Queue before we log
• Bulk insert
• No good for unstructured data
• No good for many different structures
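The queue-then-bulk-insert idea fits in a few lines of Ruby (class and method names are mine, not from the talk): buffer lines in memory and pay one write per batch instead of one fsync per line.

```ruby
# Hypothetical sketch: buffer log lines, flush them in bulk.
# The flush block stands in for a single multi-row INSERT.
class BulkLogger
  def initialize(batch_size, &flush)
    @batch_size = batch_size
    @buffer = []
    @flush = flush
  end

  def log(line)
    @buffer << line
    flush! if @buffer.size >= @batch_size
  end

  def flush!
    return if @buffer.empty?
    @flush.call(@buffer)  # one bulk write for the whole batch
    @buffer = []
  end
end

batches = []
logger = BulkLogger.new(3) { |rows| batches << rows }
5.times { |i| logger.log("line #{i}") }
logger.flush!
# batches now holds [["line 0", "line 1", "line 2"], ["line 3", "line 4"]]
```

As the slide says, this fixes throughput but not the schema problem: every row still has to fit one structure.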
7. Still a stupid solution
• Lots of UNION queries
• OR epic multi-way JOIN
• Adding new data types HARD
8. Splunk
• Splunk is enterprise software used to
monitor, report and analyze the data
produced by the applications, systems and
infrastructure to run a business.
- Wikipedia
11. Splunk
• Small agent program on each host: you tell it
about your log files and it ships them to the server
• Server component analyzes / indexes your
logs. Also acts as a syslog server.
• Builds structure from your data - in a GUI.
12. Splunk
• Splunk is amazing.
• You just tip logs into it, structure later.
• If you can afford the license, use
it, be happy!
17. Diversion -
ElasticSearch
• Just tip JSON documents into it (with a
type)
• Figures out structure for each type, indexes
appropriately.
• Free sharding and replication
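"Tip JSON documents into it (with a type)" amounts to an HTTP POST of the document to an index/type path. A sketch (the daily index naming and the use of Net::HTTP are my assumptions, not from the talk):

```ruby
require 'json'
require 'net/http'

# Build the request for indexing one JSON document under a type.
# Daily indices ("logs-YYYY.MM.DD") are an assumed convention.
def es_index_request(doc, type, day)
  path = "/logs-#{day}/#{type}"
  [path, JSON.generate(doc)]
end

path, body = es_index_request({ "message" => "hello", "duration" => 0.2 },
                              "app", "2012.08.17")
# To actually ship it:
#   Net::HTTP.new("localhost", 9200).post(path, body)
```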
18. So
• We post-process logs to be somewhat
structured.
• We can then search over them (fast!)
• Free text (for text fields)
• Numeric
• Dates + ranges
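Concretely, "search over them (fast!)" means structured queries instead of grep. A sketch of the ElasticSearch query DSL (field names invented; the exact DSL keywords varied across early versions, so treat this as illustrative):

```ruby
require 'json'

# Free text on a text field plus a numeric range, in one query -
# something grep can't express.
query = {
  "query" => {
    "bool" => {
      "must" => [
        { "match" => { "message" => "timeout" } },
        { "range" => { "duration" => { "gte" => 0.5 } } }
      ]
    }
  }
}
puts JSON.generate(query)  # POST to /<index>/_search
```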
19. New types
• Trivial!
• Just emit it, it’s indexed and queryable.
• Can hint elasticsearch for better queries (if
needed)
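"Hinting" here means supplying an explicit mapping for a type rather than relying on dynamic detection. A sketch (the "app" type and its field names are made up):

```ruby
require 'json'

# Explicit mapping for a hypothetical "app" type: tells
# ElasticSearch that duration is a float and @timestamp a date,
# instead of letting dynamic mapping guess from the first document.
mapping = {
  "app" => {
    "properties" => {
      "duration"   => { "type" => "float" },
      "@timestamp" => { "type" => "date" }
    }
  }
}
puts JSON.generate(mapping)  # PUT to /<index>/app/_mapping
```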
20. Logstash
• In JRuby, by Jordan Sissel
• Simple: Input → Filter → Output
• Flexible
• Extensible
• Plays well with others
• Pre-built web interface
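The Input → Filter → Output pipeline maps directly onto a logstash config file. A sketch of the apache-logs case discussed later (plugin option names are from memory of the 1.x era, so check the plugin docs before relying on them):

```
input {
  file { type => "apache" path => "/var/log/apache2/access.log" }
}
filter {
  grok { type => "apache" pattern => "%{COMBINEDAPACHELOG}" }
}
output {
  elasticsearch { host => "localhost" }
}
```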
31. Java (JRuby) decoding
• Decoding AMQP in Java (JRuby) is, however,
much much faster than perl doing that...
• JVM: +/-
32. Logstash on each host
is totally out...
• Running it on the ElasticSearch servers, which
are already dedicated to this, is fine.
• I'd still like to reuse all of its parsing
33. Lots of my data is
already JSON
• Log::Message::Structured
• AnyEvent::RabbitMQ
• App logging relies on Rabbit being up
• Can get slow if rabbit is sick and blocks
35. But not in the right
format..
• So I can write a munger in ruby...
• Or I can write one in perl.
• I’m already (going to be) running a
collection / aggregation daemon on each
host (for apache logs).
40. Subset of logstash
• In perl
• ZeroMQ receiver
• Per host aggregation
• Push AMQP to RabbitMQ
• Run logstash on a central server
41. Subset of logstash
• Small async process. ZMQ receive socket.
• Pull JSON from ZMQ, decode, munge, emit
back to AMQP
• Slowness no longer blocks app servers
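The decode/munge/emit step is the heart of that daemon. A sketch of the munge alone, in Ruby (the @-prefixed field names follow the logstash 1.x event convention as I understand it; treat them as an assumption):

```ruby
require 'json'
require 'time'

# Take an application JSON event, wrap it into a logstash-style
# event, and re-serialise it ready to publish to AMQP.
def munge(raw_json, source_host)
  app_event = JSON.parse(raw_json)
  JSON.generate({
    "@timestamp"   => Time.now.utc.iso8601,
    "@source_host" => source_host,
    "@type"        => app_event.delete("type") || "app",
    "@message"     => app_event.delete("message").to_s,
    "@fields"      => app_event  # remaining keys become structured fields
  })
end

event = JSON.parse(munge('{"message":"slow request","duration":1.2}', "web01"))
# event["@fields"] => {"duration"=>1.2}
```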
42. Subset of logstash
• Use logstash at the other end to pop
AMQP and insert into ElasticSearch
• Keep per-host cost small
• Same process can tail logfiles and cast into
AMQP
• Reuse all the logstash parsing (at server
side) for apache logs etc
44. 100% drop-in
compatible subset of
logstash
In perl - making it easy for you to emit
structured app events as JSON.
If everything else is down, your app is still up (you
just lose some logs)
50. Yes, yes - I know
• The web app is fugly
• Other people already have alternate
implementations
• Keeping interoperable opens lots of choices
• E.g. graylog2 as the event sink
51. rfc3164
• This document describes the observed behavior
of the syslog protocol
• This is not a good place to be.
• Working with Jordan to document message
format.
• End to end tests of both implementations
to follow.
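As a rough illustration of the convention (my guess at the shape, based on the logstash 1.x event format; pinning this down is exactly the documentation work described above):

```
# "raw" event: just the line plus minimal metadata
{ "@type": "apache", "@source_host": "web01",
  "@message": "GET /feed HTTP/1.1 200 0.042" }

# "processed" event: fields already extracted
{ "@type": "apache", "@source_host": "web01",
  "@timestamp": "2012-08-17T10:00:00Z",
  "@message": "GET /feed HTTP/1.1 200 0.042",
  "@fields": { "status": 200, "duration": 0.042 } }
```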
53. Thanks!
• <bobtfish@bobtfish.net>
• t0m on irc.perl.org
• And Freenode (idle in #logstash)
• We are hiring!!!
• Developers (learn ruby, or perl, or both!)
• Front end people (play with websockets!)
54. This is all now pointless
• The latest logstash .jar will do all the
munging for you.
• And it (mostly) runs in MRI (C Ruby), so my
RAM thing is less bad.
• N implementations still a good thing!
Mention JFDI, and I really don't care what language it's in
The former has amazing documentation.
The latter, well, bad luck. (Great reference material, but docs not so great. Good mailing list though)
grep is great, I love grep
Not very good for 100 servers at once
Solution needs to be just as good as grep for the simple case
This is always the first thing suggested / thought of.
It's great for an audit trail, as your DB is (should be!) durable
Doing the simple thing costs (at least) one disk rotation (aka fsync) per log line
This solves the performance problems, but gives you a load more moving parts
A table with id, date, message is likely to perform less well than grep
One table with lots of NULL cols, or lots of tables (one per data type)
And how do we get data back from this pile?
Not as easy as grep!
You're stuffed as soon as you add more data types
We played with this. We liked it, a lot.
Enterprise means... spenny.
So, what does it do?
You can just tip logs into it, and it'll do the right thing... Even after the fact.
Searching is fast fast fast!
Really, it's a great product.
Shame about the pricing.
I'm also a little wary of using splunk as more than 'turbo grep'
So, open source - someone else must have thought about this, right?
Isn't he cute? And woody!
Sorry - just to go off at a tangent...
Let's make all our log messages JSON messages, as JSON is fast, easy to parse (and you can search it with grep!)
Let's throw it in elasticsearch. Ponies and unicorns for everyone.
N.B. ElasticSearch storage will be MUCH larger than the byte size (as it's indexed 90 ways)
We post-process our logs before insertion, to pull out structured fields (e.g. dates & durations)
We can add new log message types (or start parsing things we currently add as simple text), make schema changes any time we want.
We just pour data into ElasticSearch, and then get better searching than grep!
The more it's split into fields, the more we win, but just writing log lines still gives us as good as grep.
Anyway - back to the story...
Very simple model - input (pluggable), filtering (pluggable by type), output (pluggable)
Lots of backends - AMQP and ElasticSearch + syslog and many others
Pre-built parser library for various line based log formats
Comes with web app for searches... Everything I need!
Let's take a simple case here - I'll shove my apache logs from N servers into ElasticSearch
I run a logstash on each host (writer), and one on each ElasticSearch server (reader)...
So, that has 2 logstashes - one reading files and writing AMQP
One reading AMQP and writing to ElasticSearch
However, my raw apache log lines need parsing (in the filter stage) - to be able to do things like 'all apache requests with 500 status', rather than 'all apache requests containing the string 500'
Red indicates the filtering
There we go, everyone got that?
Except I could instead do the filtering here, if I wanted to.
Doesn't really matter - depends what's best for me...
Right, so... Let's try that then?
First problem...
Well then, I'm not going to be running this on the end nodes.
And it's not tiny, even on machines dedicated to log parsing / filtering / indexing
But sure, I spun it up on a couple of spare machines...
It works fairly well as advertised.
The JVM giveth (lots of awesome software), the JVM taketh away (any RAM you had).
ruby is generally slower than perl. jruby is generally faster than perl.
I'm not actually knocking the technology here - just saying it won't work in this situation for me.
So, anyway, I'm totally stuffed... The previous plan is a non-starter.
So I need something to collect logs from each host and ship them to AMQP
Ok, cool, I can write that in plain ruby or plain perl and it's gotta be slimmer, right?
But wait a second... I just want to get something 'real' running here...
So, I'm already tipping stuff into AMQP...
So I can just use my existing structured data, right... Well - no, sorry...
And I got distracted at this point. For about 6 months.
So I come back to this, still needing something to munge my JSON into other JSON.
But, right now, the easiest thing to try is:
30 line perl script, it works.
I have data in ElasticSearch.
I can view it in the logstash webapp
Going back a few slides - if RabbitMQ gets sick, everything goes bad.
I ended up with a load of code to deal with this.
It still didn't work very well.
In fact, the entire idea of using TCP/IP for this is probably bad.
Syslog is hateful.
MOST of my log messages are under 1024 bytes, but I don't want to throw them away (or throw an exception) if they aren't.
ZeroMQ looked like the right answer.
I played with it. It works REALLY well.
I'd recommend you try it.
So let's write this per host collection daemon
Take our previous munging code, and run it per host in the aggregation process
Tada! I have fixed all my woes with rabbitmq and at the same time I've got my app logs in logstash format for free.
I can reuse all the heavy-lifting parts of logstash.
I can reuse my per host ZMQ daemon as a log file tailer.
Overhead on hosts is very small. Heavy lifting occurs entirely in the search cluster.
So, to recap... I've got...
I've got a solution to logging lots of stuff without blocking or falling over.
I've got a solution that has a minimal impact on my servers.
So, this is what it actually looks like.
Raw app logs go to the agent via ZMQ. It munges them to processed logstash logs, and emits them.
Agent also tails files and emits raw logstash
Logstash does parsing of apache logs
It's taken me over 6 months to get any of it running, I don't have time to re-write the web app
Someone else is already doing that.
I love open source.
So, I've talked about these 'raw' and 'processed' log formats - they're just conventions as to what fields can be found in the JSON.
This still needs to be better documented!