Delivered by Bhasker V Kode at foss.in/2009
Official talk page at http://foss.in/2009/schedules/talkdetailspub.php?talkid=17
Erlang's support for handling binaries and pattern matching makes it a great choice for parsing everything from IPv4 packets to payloads from the Memcached protocol, SWF files, or databases like Tokyo Cabinet. From a functional programming perspective, there are various ways of building these parsers that take advantage of the concurrent and recursive nature inherent to the language. The talk also covers other challenges gathered while validating the storage & retrieval options for our distributed crawler, and while submitting patches to projects like Medici & Tora (Erlang-based Tokyo Cabinet clients). It will also touch upon Tokyo Cabinet's support for mapreduce with Lua, and notes from building your own custom formats & our internal mapreduce-esque and caching frameworks, used in building a multi-million impression platform that uses under a gig of RAM per node.
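To make the first claim of the abstract concrete, here is a minimal sketch of parsing an IPv4 header with binary pattern matching. It is not code from the talk: the module name ipv4_header and the map it returns are hypothetical, and it assumes the header layout from RFC 791 (options are skipped, the checksum is not verified).

```erlang
-module(ipv4_header).
-export([parse/1]).

%% Hypothetical helper: the header layout is written directly as the
%% function-head pattern. Options (IHL > 5) are skipped, not decoded.
parse(<<4:4, IHL:4, _ToS:8, TotalLen:16,
        _Id:16, _Flags:3, _FragOffset:13,
        TTL:8, Protocol:8, _Checksum:16,
        S1:8, S2:8, S3:8, S4:8,
        D1:8, D2:8, D3:8, D4:8,
        Rest/binary>>)
  when IHL >= 5, TotalLen >= IHL * 4 ->
    OptionsLen = (IHL - 5) * 4,          % IHL counts 32-bit words
    PayloadLen = TotalLen - IHL * 4,
    case Rest of
        %% Trailing bytes (e.g. link-layer padding) are ignored.
        <<_Options:OptionsLen/binary, Payload:PayloadLen/binary, _/binary>> ->
            {ok, #{ttl => TTL, protocol => Protocol,
                   src => {S1, S2, S3, S4}, dst => {D1, D2, D3, D4},
                   payload => Payload}};
        _ ->
            {error, truncated}
    end;
parse(_) ->
    {error, not_ipv4}.
```

Anything that is not IPv4, or is too short to hold a header, simply falls through to the last clause.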
Notes on:
- trends in disk/memory/bandwidth
- why erlang, RAM, binaries
- garbage collection in the erlang VM
- message passing
- use-cases
Parsing binaries and protocols with erlang
1. “Parsing binaries and protocols
with erlang ?!”
Bhasker V Kode
cofounder & CTO at hover.in
at foss.in
December 4th, 2009
http://developers.hover.in
6. “ha! of course i knew that...
err.... but people scale...
that's what they do .....
that's our way out !!!
scaling out ...
scaling up ...
auto scaling even...!!!
: O ”
foss.in/2009 http://developers.hover.in
7. “scale UP ...!
more RAM seems to stop, or at least stall,
those silly CPU-unit warnings
my hosting provider gives...
bring on those infinite loops &
polling crons. RealTimeWeb FTW!”
8. “scaling OUT, maybe with a
distributed filesystem
and figure out a way for nodes to
talk, and... Replication... and
location transparency during
weekends... and commodity
hardware which i can't pay for ”
9. More data is becoming archival,
NOT by choice, but because we're forced to.
Most tools don't handle streams of
data well (not even hadoop!) #bigdata
If you're not compromising, you're
not pushing enough. Disk's loss
must be someone else's gain.
Fixed-length examples at Facebook, Twitter, Google
10. Erlang for RAM
on the web is the new
Embedded C
11. “THE NEWS TODAY. Once popular
retro format 'binary' continues to
go unnoticed after brief sightings
on wallpapers during The Matrix
trilogy ....”
pssst! in files of any MIME/content type
in DBs that accept binary
in RAM, via caching engines
compact for network transfer & storage
the answer to Unicode
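The slide's claim that binaries are "the answer to Unicode" can be illustrated with Hindi text stored as UTF-8 in a binary. A small sketch: the module utf8_demo and its functions are hypothetical, while unicode:characters_to_binary/1 and the /utf8 segment type are standard Erlang.

```erlang
-module(utf8_demo).
-export([encode/1, char_count/1]).

%% Encode a list of Unicode code points as a UTF-8 binary.
encode(CodePoints) ->
    unicode:characters_to_binary(CodePoints).

%% Count code points (not bytes) by matching one /utf8 segment at a time.
%% Invalid UTF-8 makes this crash with function_clause.
char_count(Bin) -> char_count(Bin, 0).

char_count(<<_/utf8, Rest/binary>>, N) -> char_count(Rest, N + 1);
char_count(<<>>, N) -> N.
```

For the five code points of "हिंदी" (U+0939, U+093F, U+0902, U+0926, U+0940), byte_size/1 reports 15 bytes (3 per Devanagari code point in UTF-8) while char_count/1 reports 5.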
12. “fine! Binaries are everywhere,
disks are not keeping up, and i've
got more cores on my nodes every
year.”
13. “But i'm still not going near a
strict, dynamically typed functional
programming language with
support for concurrency,
communication, and distribution,
automatic memory management &
support for multiple platforms!!!”
16. “ahh... so processes are lightweight
pseudo-threads in the erlang VM and
the building blocks of erlang
programs, each with its own heap and
message inbox, and meant for
message passing via erlang
primitives. Also the developer can
configure how many cores are
used via the number of schedulers,
which run processes.”
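A minimal sketch of the two ideas on this slide: a spawned process with its own mailbox, and asking the VM how many schedulers it is running. The module proc_demo is hypothetical; spawn/1, erlang:system_info(schedulers) and the erl +S flag are standard.

```erlang
-module(proc_demo).
-export([start/0, schedulers/0]).

%% Spawn a lightweight process (own heap, own mailbox) that echoes one
%% message back to the sender and then exits.
start() ->
    Pid = spawn(fun() ->
                    receive
                        {From, Msg} -> From ! {self(), {echo, Msg}}
                    end
                end),
    Pid ! {self(), hello},
    receive
        {Pid, Reply} -> Reply
    after 1000 ->
        timeout
    end.

%% Scheduler threads the VM started with: one per logical processor by
%% default, adjustable at startup with e.g. `erl +S 4`.
schedulers() ->
    erlang:system_info(schedulers).
```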
18. Let M = msgs to random users
Let N = 100,000 users
Route each of the M msgs to the right users among the N!
typical one-node approach:
for i in 1..M
  for j in 1..N
    if match, add_update
actor approach:
N concurrent processes listening to all msgs
As each new msg arrives, msg pass it to all N pids
In each concurrent process: if match, add_update
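The actor approach can be sketched as follows, under assumptions the slide does not state: a message is a {UserId, Text} tuple, "match" means the ids are equal, and "add_update" just prepends the text to that process's inbox. The module router and its functions are hypothetical.

```erlang
-module(router).
-export([run/2]).

%% One process per user; every message is broadcast to all user pids and
%% each process keeps only the messages addressed to it.
run(NumUsers, Msgs) ->
    Pids = [spawn(fun() -> user_loop(Id, []) end) || Id <- lists:seq(1, NumUsers)],
    %% As each new msg arrives, pass it to all N pids.
    lists:foreach(fun(Msg) ->
                          lists:foreach(fun(Pid) -> Pid ! {msg, Msg} end, Pids)
                  end, Msgs),
    %% Ask every process for its inbox; messages between two processes
    %% arrive in send order, so the dump request comes after all msgs.
    Self = self(),
    lists:foreach(fun(Pid) -> Pid ! {dump, Self} end, Pids),
    [receive {inbox, Pid, Inbox} -> Inbox end || Pid <- Pids].

user_loop(Id, Inbox) ->
    receive
        {msg, {Id, Text}} ->            % Id is already bound: a match, add_update
            user_loop(Id, [Text | Inbox]);
        {msg, _Other} ->                % addressed to someone else: drop it
            user_loop(Id, Inbox);
        {dump, From} ->
            From ! {inbox, self(), lists:reverse(Inbox)}
    end.
```

The total work is still M × N match checks, as in the nested loop; the difference is that the checks run in N independent processes, which the VM can spread across all of its schedulers.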
23. “ahh... so this is what no
shared memory in erlang means:
lightweight processes get garbage
collected easily since they don't
hold references to data in each
other's process heaps, & messages
are copied or shared based on their
size and likelihood of reuse, with
special optimizations for binaries. tell me more!!”
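The binary special case can be observed directly. A sketch, assuming a reasonably modern OTP release: binaries larger than 64 bytes live outside the process heap and are reference-counted, so sending one to another process, or taking a large slice of it, does not copy the bytes. binary:copy/1,2 and binary:referenced_byte_size/1 are from the standard binary module; the module bin_share is hypothetical.

```erlang
-module(bin_share).
-export([demo/0]).

demo() ->
    Big = binary:copy(<<"x">>, 1000),       % 1000-byte reference-counted binary
    <<Head:100/binary, _/binary>> = Big,     % 100-byte sub-binary pointing into Big
    Copy = binary:copy(Head),                % fresh, independent 100-byte binary
    Pid = spawn(fun() ->
                    receive {From, B} -> From ! {self(), byte_size(B)} end
                end),
    Pid ! {self(), Big},                     % only a reference to Big is sent
    Size = receive {Pid, S} -> S end,
    {Size,
     binary:referenced_byte_size(Head),      % Head keeps all 1000 bytes alive
     binary:referenced_byte_size(Copy)}.     % Copy references only its own 100
```

This is also why holding on to a slice of a large binary can keep the whole large binary in memory; binary:copy/1 breaks that link.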
27. “Can a spawned process listen as
long as i want it to?”
“Can a spawned process stop
listening when I want it to?”
“Can a spawned process spawn
more processes?”
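All three questions have the answer "yes". A sketch of one process that does all three; the module listener, its message formats and the 60-second idle timeout are hypothetical choices for illustration.

```erlang
-module(listener).
-export([start/0, loop/1]).

start() ->
    spawn(?MODULE, loop, [0]).

%% 1. It listens for as long as loop/1 keeps calling itself (a tail call,
%%    so the stack does not grow).
%% 2. It stops when told to, or after 60 s without any message.
%% 3. It can spawn more processes, here one short-lived child per request.
loop(Count) ->
    receive
        {ping, From} ->
            From ! {pong, Count},
            loop(Count + 1);
        {spawn_child, From} ->
            Child = spawn(fun() -> From ! {child_says, hello} end),
            From ! {spawned, Child},
            loop(Count + 1);
        stop ->
            stopped
    after 60000 ->
            timed_out
    end.
```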
28. “So although erlang gives you a library
called OTP & a db called mnesia for
making life easier, even without them you can
parse or create binaries easily, make
client-server programs, distributed
RPC calls, tail-recursive servers,
message/priority queues for
flow control, talk to ports and other
languages, or create any data structure
explicitly (a) in-memory or (b) on-disk
on any connected node!”
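As an example of the tail-recursive servers and client-server programs listed above, here is a plain-Erlang key-value server written without OTP's gen_server. It is a sketch: the module kv_server, the {call, From, Ref, Request} message format and the 5-second default timeout are assumptions, not the talk's demo code.

```erlang
-module(kv_server).
-export([start/0, call/2, call/3]).

%% The server is a tail-recursive loop; its state is a map passed along
%% on every iteration.
start() ->
    spawn(fun() -> loop(#{}) end).

loop(State) ->
    receive
        {call, From, Ref, {put, K, V}} ->
            From ! {Ref, ok},
            loop(State#{K => V});
        {call, From, Ref, {get, K}} ->
            From ! {Ref, maps:find(K, State)},   % {ok, V} or error
            loop(State)
    end.

call(Server, Request) ->
    call(Server, Request, 5000).

%% Synchronous call with a timeout. The unique Ref tags the reply so it
%% cannot be confused with any other message in the caller's mailbox.
call(Server, Request, Timeout) ->
    Ref = make_ref(),
    Server ! {call, self(), Ref, Request},
    receive
        {Ref, Reply} -> Reply
    after Timeout ->
        {error, timeout}
    end.
```

An unknown request is never matched by the server, so the caller gets {error, timeout}. gen_server packages this same pattern (tagged request, reply, timeout) together with supervision and code-upgrade hooks, which is what OTP adds on top.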
29. “show me the demos”
● Process related
– Message queues, Client – server
– RPC, Timeouts
● Binary
– Binary pattern matching, Parse SWF/MP3 files for metadata
– Networking, communication with C, Tokyo Cabinet client example
● Process + Binary!
– Building a production-ready in-memory CDN
consistently faster than Am4z0n cl0udfr0nt, in stages:
open & gzip < concatenated JS files < in-memory < streaming?
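The transcript does not include the SWF demo itself. As a hedged sketch of the kind of metadata parsing listed above, the following reads only the fixed 8-byte SWF header (3-byte signature, 1-byte version, 32-bit little-endian file length) as described in the published SWF file format; the module swf_header is hypothetical and the rest of the file is ignored.

```erlang
-module(swf_header).
-export([parse/1]).

%% "FWS" marks an uncompressed SWF, "CWS" a zlib-compressed one; the
%% length field is the size of the whole file once uncompressed.
%% Any other signature is reported as not_swf.
parse(<<"FWS", Version:8, Length:32/little, _/binary>>) ->
    header(none, Version, Length);
parse(<<"CWS", Version:8, Length:32/little, _/binary>>) ->
    header(zlib, Version, Length);
parse(_) ->
    {error, not_swf}.

header(Compression, Version, Length) ->
    {ok, #{compression => Compression, version => Version,
           file_length => Length}}.
```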
30. “Binary pattern matching ?”
<<Value:Size/Type-Signedness-Endianness-unit:Unit>>
<<1:32>> = <<0,0,0,1>>.
<<1:32/unsigned-little>> = <<1,0,0,0>>.
<<_:8,"mnesia">> = <<"Amnesia">>.
So a binary (Bin) could hold Unicode characters
(English, Hindi, Tamil), JPGs, HTTP headers,
or basically segments of other binaries:
NewBinary = <<Segment1/binary, Segment2/binary>>.
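Pulling the pieces of this slide together, a small runnable sketch of segment types, signedness, endianness, units and concatenation (the module bin_syntax is hypothetical):

```erlang
-module(bin_syntax).
-export([demo/0]).

demo() ->
    <<0, 0, 0, 1>> = <<1:32>>,                  % default: unsigned, big-endian integer
    <<1, 0, 0, 0>> = <<1:32/unsigned-little>>,   % same value, little-endian
    <<Signed:8/signed>> = <<255>>,               % byte 255 read as a signed integer
    -1 = Signed,
    <<_:8, Rest/binary>> = <<"Amnesia">>,        % skip one byte, keep the rest
    <<"mnesia">> = Rest,
    %% Length-prefixed field: Size 2 * unit 8 = a 16-bit big-endian length.
    <<Len:2/integer-unit:8, Payload:Len/binary>> = <<0, 3, "abc">>,
    %% Concatenation needs /binary on each binary segment, because the
    %% default segment type is integer.
    <<Rest/binary, "-", Payload/binary>>.
```

demo/0 returns <<"mnesia-abc">>.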
31. summary of tech at hover.in
● LYME stack since ~Dec '07, 4 (1) nodes (64-bit, 4 GB)
● Python crawler + associated NLP parsers; indexes now
in Tokyo Cabinet, inverted indexes in Erlang's Mnesia DB
with binaries in 5 different Indian languages + multiple
content types; CPU time-slicing algorithms, priority queues
for the heat-seeking algorithm, flow control, caching engines,
cyclic queues, map-reduces with non-blocking gathers,
headless Firefox for thumbnails, patches to the
Tokyo Cabinet client 'medici'
● Beta in Jan '09, 1 million hovers/month in May '09
● 24 developers + several interns across ~2 years
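"Map-reduces with non-blocking gathers" is listed without detail. One generic reading, sketched here as an assumption rather than hover.in's implementation: a parallel map that spawns one process per element, then gathers tagged replies against a single deadline, so a slow or crashed worker yields timeout instead of blocking the caller forever. The module pmap is hypothetical.

```erlang
-module(pmap).
-export([pmap/3]).

%% Map phase: one worker per element, each replying with a unique ref.
%% Gather phase: results come back in input order; the whole gather waits
%% at most TimeoutMs in total.
pmap(F, List, TimeoutMs) ->
    Parent = self(),
    Deadline = erlang:monotonic_time(millisecond) + TimeoutMs,
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end || X <- List],
    [gather(Ref, Deadline) || Ref <- Refs].

gather(Ref, Deadline) ->
    Left = max(0, Deadline - erlang:monotonic_time(millisecond)),
    receive
        {Ref, Result} -> {ok, Result}
    after Left ->
        timeout                 % worker too slow, or it crashed
    end.
```

Replies that arrive after the deadline stay in the caller's mailbox; a production version would flush them or run the gather in a throwaway process.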