At Booking.com, we have a constant flow of events coming from various applications and internal subsystems. This critical data needs to be stored for real-time, medium-, and long-term analysis. Events are schema-less, which makes it difficult to use standard analysis tools. This presentation explains how we built a storage and analysis solution based on Riak. The talk covers data aggregation and serialization, Riak configuration, solutions for lowering network usage, and finally how Riak's advanced features are used to perform real-time data crunching on the cluster nodes.
19. EVENTS FLOW PROPERTIES
• Read-only
• Schema-less
• Continuous, ordered, timed
• 15K events per second
• 1.25 billion events per day
• Peak at 70 MB/s, minimum 25 MB/s
• 100 GB per hour
20. SERIALIZATION
• JSON didn’t work for us (slow, big, lacks features)
• Created Sereal in 2012
• "Sereal, a new, binary data serialization format that provides high-performance, schema-less serialization"
• Added a Sereal encoder & decoder in Erlang in 2014 (example below)
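A minimal round-trip with the Erlang Sereal bindings might look like the sketch below; the module name and the encode/decode signatures are assumptions for illustration, not taken from the talk:

%% Round-trip an event through Sereal (sketch).
%% NOTE: sereal:encode/1 and sereal:decode/1 are assumed names;
%% check the actual Erlang bindings for the real API.
Event = #{<<"type">> => <<"app_event">>, <<"epoch">> => 1418823000},
Encoded = sereal:encode(Event),          %% compact schema-less binary
{ok, Event} = sereal:decode(Encoded).    %% assumed return shape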
47. WHAT WE WANT
• Storage safety (durability)
• Mass write performance
• Mass read performance
• Easy administration
• Very scalable
48. WE CHOSE RIAK
• Data safety: clustered, distributed, very robust
• Good and predictable read / write performance
• The easiest to set up and administer
• Advanced features (MapReduce, triggers, 2i, CRDTs …)
• Riak Search
• Multi Datacenter Replication
50. CLUSTER
• Commodity hardware
• All nodes serve data
• Data replication
• Gossip between nodes
• No master
[Diagram: a ring of servers]
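To make the ring concrete, here is a simplified sketch of how a key can be mapped to one of the partitions by consistent hashing. Riak itself (riak_core) hashes the bucket/key pair with SHA-1 onto a 2^160 ring; the code below is an illustration, not Riak's actual implementation:

-module(ring_demo).
-export([partition/2]).

%% Map a bucket/key pair onto one of 256 partitions (simplified).
partition(Bucket, Key) ->
    RingSize = 256,
    %% SHA-1 yields 160 bits, matching the size of Riak's hash ring.
    <<HashInt:160/integer>> = crypto:hash(sha, <<Bucket/binary, Key/binary>>),
    %% Each partition owns an equal, contiguous slice of the key space.
    HashInt div ((1 bsl 160) div RingSize).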
59. RIAK CONFIGURATION
• Vnodes: 256
• Replication: n_val = 3
• Expiration: 8 days
• 4 GB data files
• Compact a file only when it is full
• Run compaction at most once a day (config sketch below)
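These settings map onto Riak's configuration roughly as in the sketch below, assuming the Bitcask backend (which is what per-file expiration and compaction imply); the merge-trigger values are assumptions, since the slides only state the intent:

%% app.config sketch (values from the slides; triggers are assumptions)
{riak_core, [
    {ring_creation_size, 256}       %% 256 vnodes
]},
{bitcask, [
    {expiry_secs,   691200},        %% 8 days = 8 * 86400 s
    {max_file_size, 4294967296},    %% 4 GB data files
    {frag_merge_trigger, 100},      %% assumed: merge only "full" files
    {merge_window, {3, 4}}          %% assumed: compact once a day, 3-4 AM
]}
%% n_val = 3 is a bucket property, set per bucket, not in app.config.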
66. PUSH DATA IN
• In each DC, in each cell, Loggers push to Riak
• 2 protocols: REST or ProtoBuf
• Every second:
• Push data values to Riak, asynchronously
• Wait for all writes to succeed
• Then push the metadata (sketch below)
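A sketch of that per-second cycle with the official Erlang client (riak-erlang-client); the bucket names, key/value layout, and "|"-separated metadata format are illustrative:

-module(push_demo).
-export([push_second/3]).

%% Push one second's worth of data values in parallel, wait for every
%% ack, then store the metadata entry listing the data keys.
push_second(Pid, MetaKey, DataValues) ->
    Caller = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                    Obj = riakc_obj:new(<<"events">>, Key, Blob),
                    ok = riakc_pb_socket:put(Pid, Obj),
                    Caller ! {done, Ref}
                end),
                Ref
            end || {Key, Blob} <- DataValues],
    %% Wait for success of every data write before touching metadata.
    [receive {done, R} -> ok end || R <- Refs],
    Keys = [Key || {Key, _} <- DataValues],
    Meta = riakc_obj:new(<<"metadata">>, MetaKey,
                         iolist_to_binary(lists:join(<<"|">>, Keys))),
    ok = riakc_pb_socket:put(Pid, Meta).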
78. THE IDEA
• Instead of:
• Fetching the data, crunching it, and returning a small result
• Do:
• Bring the code to the data
79. WHAT TAKES TIME
• Takes a lot of time:
• Fetching the data out of the cluster
• Decompressing it
• Takes almost no time:
• Crunching the data
80. MAPREDUCE
• Send code to be executed
• Works fine for 1 job
• Takes < 1s to process 1s of data
• Doesn’t scale to multiple concurrent jobs
• Custom code has to be written in Erlang (example below)
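For reference, a minimal MapReduce job from the Erlang client looks like the sketch below; the bucket, keys, and host are illustrative, while riak_kv_mapreduce:map_object_value is one of Riak's built-in map functions:

%% Run a map phase on the nodes that own the data; returns the values.
{ok, Pid} = riakc_pb_socket:start_link("riak-node.example.com", 8087),
Inputs = [{<<"events">>, <<"1418823000-dc1">>},
          {<<"events">>, <<"1418823001-dc1">>}],
Query  = [{map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}],
{ok, Results} = riakc_pb_socket:mapred(Pid, Inputs, Query).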
81. HOOKS
• Every time metadata is written…
• …a post-commit hook is triggered
• Data is crunched directly on the nodes (registration sketch below)
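Registering the hook is a matter of setting a bucket property; here is a sketch from an attached node console (riak attach), where the module name metadata_hooks is an assumption chosen to match the hook code shown later:

%% Attach the post-commit hook to the metadata bucket (sketch).
%% {struct, ...} is the mochijson2-style form Riak stores hooks in.
riak_core_bucket:set_bucket(<<"metadata">>,
    [{postcommit, [{struct, [{<<"mod">>, <<"metadata_hooks">>},
                             {<<"fun">>, <<"metadata_stored_hook">>}]}]}]).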
83. RIAK POST-COMMIT HOOK
[Diagram: on each node host, the Riak service sends the key of newly stored data over a socket to a local REST service, which fetches, decompresses, and processes all the tasks.]
84. HOOK CODE
metadata_stored_hook(RiakObject) ->
    %% Metadata keys look like <<"Epoch-DC">>.
    Key = riak_object:key(RiakObject),
    [Epoch, DC] = binary:split(Key, <<"-">>),
    %% The value is a "|"-separated list of the data keys stored
    %% during that second.
    Data = riak_object:get_value(RiakObject),
    DataKeys = binary:split(Data, <<"|">>, [global]),
    %% Hand the keys to the local companion REST service for crunching.
    send_to_REST(Epoch, DC, DataKeys),
    ok.
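The send_to_REST/3 helper is not shown in the talk; here is a sketch of what it could look like with OTP's httpc, where the URL, port, and payload layout are assumptions (inets must be started for httpc to work):

%% Forward the epoch, DC and data keys to the local companion service.
send_to_REST(Epoch, DC, DataKeys) ->
    Url  = "http://localhost:5000/process",     %% assumed endpoint
    Body = iolist_to_binary([Epoch, $-, DC, $\n,
                             lists:join(<<"\n">>, DataKeys)]),
    {ok, _} = httpc:request(post, {Url, [], "text/plain", Body}, [], []),
    ok.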
86. REST SERVICE
• Written in Perl, using PSGI and Starman (preforked workers)
• Lets us write data crunchers in Perl
• Also supports loading code on demand
87. ADVANTAGES
• CPU usage and execution time can be capped
• Data is local to processing
• The two systems are decoupled
• The REST service can be written in any language
• Data processing done all at once
• Data is decompressed only once
88. DISADVANTAGES
• Only for incoming data (streaming), not old data
• Can’t easily use cross-second data
• What if the companion service goes down?
89. FUTURE
• Use this companion service to generate small, optional derived values
• Use Riak Search to index and search those
111. CONCLUSION
• We use only the open-source version of Riak
• No training: self-taught, small team
• Riak is a great solution
• Robust, fast, scalable, easy
• Very flexible and hackable
• It helps us keep scaling