At Yahoo!, Hadoop plays a central role in providing personalized experiences for our users and creating value for our advertisers. In this talk, we will discuss the convergence of low-latency processing and the Hadoop platform. Through a collection of use cases, we will explain how Yahoo! delivers personalized user experiences through Hadoop and Storm. We have developed Storm-on-YARN to enable Storm streaming/micro-batch applications and Hadoop batch applications to be hosted on a single cluster. Storm applications can leverage YARN for resource management and apply Hadoop-style security to Hadoop datasets on HDFS and HBase. Yahoo! has recently released our Storm enhancements as open source.
Presenter(s):
Andy Feng, Distinguished Architect, Cloud Engineering Group, Yahoo!
Bobby Evans, Tech Yahoo!, Apache Hadoop PMC and Committer
April 2013 HUG: Storm and Hadoop - Convergence of Big-Data and Low-Latency Processing
1. Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
Andy Feng (afeng@yahoo-inc.com)
Robert Evans (evans@yahoo-inc.com)
Yahoo! Inc.
6. Storm: Distributed Stream Processing
https://github.com/nathanmarz/storm
Example of Streams
• User activities
• Ad beacons
• Content feeds
• Social feeds
• …
[Diagram: spout, bolt, bolt, bolt]
7. Storm API: Illustrated
    public class DoubleAndTripleBolt extends BaseRichBolt {
        private OutputCollectorBase _collector;
        …
        public void execute(Tuple input) {
            int val = input.getInteger(0);
            _collector.emit(input, new Values(val*2, val*3));
            _collector.ack(input);
        }
    }
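To try the bolt's per-tuple logic without a Storm cluster, here is a minimal stand-alone sketch. It deliberately does not use the Storm API: `DoubleAndTripleSketch` and `process` are hypothetical names introduced for illustration, and anchoring and acking appear only as a comment.

```java
import java.util.Arrays;
import java.util.List;

// Stand-alone model of the bolt's per-tuple logic; this is NOT the Storm API.
public class DoubleAndTripleSketch {

    // Mirrors execute(): read the first field of the input, produce (val*2, val*3).
    public static List<Integer> process(List<Integer> inputTuple) {
        int val = inputTuple.get(0);
        // In the real bolt, the emit is anchored to the input tuple and the
        // input is acked afterwards; neither is modeled here.
        return Arrays.asList(val * 2, val * 3);
    }

    public static void main(String[] args) {
        // Feed one single-field tuple through the logic and print the result.
        System.out.println(process(Arrays.asList(7)));
    }
}
```

Running `main` with the input `7` should print the two-field output tuple `[14, 21]`.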
10. Hadoop YARN: MapReduce & Beyond
http://hadoop.apache.org/docs/r0.23.6/hadoop-yarn/hadoop-yarn-site/YARN.html
Yahoo! has deployed Hadoop YARN into over 40k machines in production.
11. Storm Enhancement by Yahoo!
• YARN Integration
  – Enable Storm topologies to leverage Hadoop resources
    • Coming soon at github.com/yahoo/storm-yarn
• Storm enhancement
• Contributed to Storm via pull requests
  – Security
    • Authentication, Authorization, Audit (Pull #469, #511, #528)
    • Serialization (Pull #461, #472, #473)
    • UI (Pull #488)
  – Message Transport
    • 0MQ replacement (Pull #518)
  – Reliability
    • Zookeeper client exponential back-off (Pull #471)
    • Bug fix (Pull #476)
    • Many test cases
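The "exponential back-off" item in the Reliability list can be illustrated with a short sketch of the general technique: each retry waits twice as long as the previous one, up to a cap. This is not the code from the pull request; `BackoffSketch`, `delayMillis`, and the 100 ms / 2000 ms values are assumptions chosen for illustration.

```java
// Generic capped exponential back-off; hypothetical names, not Storm's or ZooKeeper's API.
public class BackoffSketch {

    // Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
    public static long delayMillis(long baseMs, int attempt, long maxMs) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        // Cap the exponent; this assumes baseMs is modest (a few seconds at most)
        // so that the product still fits in a long.
        int shift = Math.min(attempt, 30);
        return Math.min(maxMs, baseMs * (1L << shift));
    }

    public static void main(String[] args) {
        // Illustrative values: start at 100 ms, never wait longer than 2000 ms.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("retry " + attempt + " after "
                    + delayMillis(100, attempt, 2000) + " ms");
        }
    }
}
```

A production client would usually add random jitter to these delays so that many workers do not reconnect at the same instant; it is left out here to keep the sketch deterministic.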
15. Authentication/Authorization/Audit
• Authentication plugins
  – Kerberos (soon)
  – Digest
  – None
  – Bring your own
• Authorization plugins
  – Accept all
  – Limited operations only
  – User whitelist
  – Bring your own
• Audit
  – Access log on Nimbus/DRPC servers
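The authorization options listed above (accept all, limited operations only, user whitelist) can be pictured as interchangeable implementations of one small interface. The sketch below is only a model of that plugin idea: the `Authorizer` interface, its `permit` signature, the class names, and the operation strings are all hypothetical and are not taken from the Storm code base.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of pluggable authorization; not Storm's actual plugin API.
public class AuthorizerSketch {

    // Assumed plugin contract: may `user` perform `operation`?
    public interface Authorizer {
        boolean permit(String user, String operation);
    }

    // "Accept all": every request is allowed.
    public static class AcceptAll implements Authorizer {
        public boolean permit(String user, String operation) {
            return true;
        }
    }

    // "Limited operations only": allow a fixed set of operations for any user.
    public static class LimitedOperations implements Authorizer {
        private final Set<String> operations;

        public LimitedOperations(String... operations) {
            this.operations = new HashSet<>(Arrays.asList(operations));
        }

        public boolean permit(String user, String operation) {
            return operations.contains(operation);
        }
    }

    // "User whitelist": allow any operation, but only for listed users.
    public static class UserWhitelist implements Authorizer {
        private final Set<String> users;

        public UserWhitelist(String... users) {
            this.users = new HashSet<>(Arrays.asList(users));
        }

        public boolean permit(String user, String operation) {
            return users.contains(user);
        }
    }

    public static void main(String[] args) {
        // Illustrative user and operation names.
        Authorizer auth = new UserWhitelist("alice");
        System.out.println(auth.permit("alice", "submitTopology"));
        System.out.println(auth.permit("mallory", "submitTopology"));
    }
}
```

"Bring your own" then amounts to supplying another implementation of the same interface and selecting it through configuration.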
16. Tuple Serialization & Transport
• Tuple serialization plugins
  – Default serializer
  – Blowfish serializer (encryption)
  – Bring your own
• Transport Plugins
  – 0MQ
  – Java NIO.2
  – Bring your own
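As a rough picture of what an encrypting serializer does, the sketch below encrypts already-serialized tuple bytes with Blowfish using the JDK's standard `javax.crypto` classes and decrypts them on the receiving side. It is not the Yahoo! serializer: the class name, the demo key, the payload, and the cipher mode are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: Blowfish round trip over serialized bytes using the JDK only.
public class BlowfishBytesSketch {

    // ECB mode is used only to keep the sketch short; a real serializer
    // would more likely use a mode with an initialization vector.
    private static final String TRANSFORMATION = "Blowfish/ECB/PKCS5Padding";

    private static byte[] run(int mode, byte[] key, byte[] data) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(mode, new SecretKeySpec(key, "Blowfish"));
            return cipher.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Blowfish operation failed", e);
        }
    }

    // Sender side: encrypt the serialized tuple before it goes on the wire.
    public static byte[] encrypt(byte[] key, byte[] serialized) {
        return run(Cipher.ENCRYPT_MODE, key, serialized);
    }

    // Receiver side: recover the serialized tuple bytes.
    public static byte[] decrypt(byte[] key, byte[] wireBytes) {
        return run(Cipher.DECRYPT_MODE, key, wireBytes);
    }

    public static void main(String[] args) {
        // Demo key and payload are placeholders, not real configuration.
        byte[] key = "demo-key-16bytes".getBytes(StandardCharsets.UTF_8);
        byte[] payload = "tuple:[14, 21]".getBytes(StandardCharsets.UTF_8);
        byte[] wire = encrypt(key, payload);
        System.out.println(Arrays.equals(payload, decrypt(key, wire)));
    }
}
```

Both ends must share the same key, so in practice the key would come from topology configuration rather than a literal in the code.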
17. Conclusion
• Yahoo! is leading the emergence of big-data & low-latency processing via open source collaboration.
  – Join us at Hadoop Summit 2013 for update
18. We Are Hiring!
Please reach out to Michael Grossmann <grossman@yahoo-inc.com>