Dunning strata-2012-27-02

Expect More from Hadoop!

©MapR Technologies - Confidential

My Background

 University, Startups
– Aptex, MusicMatch, ID Analytics, Veoh
– big data since before it was big

 Open source
– even before the internet
– Apache Hadoop, Mahout, Zookeeper, Drill
– bought the beer at first HUG

 MapR
 Founding member of Apache Drill


MapR Technologies

 Enterprise quality distribution for Hadoop
– Many extensions beyond basic Hadoop
 Super strong team
– Long history of successful startups
 Strong supporter of Apache Drill
– and open source in general


meta-Hadoop?


meta
Meta- (from Greek: μετά = "after", "beyond",
"with", "adjacent", "self"), is a…


Beyond ≠ Answering
yesterday’s
problems


Philosophy First

What is History?


The study of the past

(what came before now)


What is the future?

(it comes after now)


But the future also
has a past!


the future of the past
is not
the past of the future


Do you remember the
future?


Those are
yesterday’s
answers


and also
the seeds
of tomorrow


Guys wearing
Fedoras


Hadoop has
a history


Hadoop also
has a
future


The Old Future of Hadoop

 Implementing yet another Google paper
– Map-reduce and HDFS, and Yarn and Tez
– more and more, but not really different

 Eco-system additions (more Google papers)
– simpler programming (Hive and Pig and Crunch) (Sawzall, FlumeJava, etc)
– key-value store (Bigtable)
– ad hoc query (Dremel)
– also not really different

 Stands apart from other computing
– required by HDFS and other limitations


The New Future of Hadoop

 Real-time processing
– Combines real-time and long-time

 Integration with traditional IT
– No need to stand apart

 Integration with new technologies
– Solr, Node.js, Twisted all should work directly on Hadoop

 Fast and flexible computation
– Drill logical plan language


Example #1
Search Abuse


History matrix

One row per user

One column per thing


Recommendation based on
cooccurrence

Cooccurrence gives item-item
mapping

One row and column per thing


Cooccurrence matrix can also be
implemented as a search index


[Diagram: boxes labeled Complete history, Cooccurrence (Mahout), Solr indexing (three Solr Indexer boxes), Item meta-data, Index shards]


[Diagram: boxes labeled User history, Web tier, Solr search (three Solr Indexer boxes), Item meta-data, Index shards]


Objective Results

 At a very large credit card company

 History is all transactions, all web interaction

 Processing time cut from 20 hours per day to 3 hours

 Recommendation engine load time decreased from 8 hours to 3 minutes


Scaling Estimates – Twitter Fire hose

 Old School – 8+ separate clusters, 20-25 nodes
– >3 Kafka nodes
– >2 TwitterLogger
– 5-10 Hadoop
– >3 Storm
– 3 zookeepers (or not?)
– NAS for web storage
– >2 web servers

 MapR – one platform
– 5-10 nodes total, any node does any job
– Full HA included, backups included, disaster recovery included


Example #2
Web Technology


[Diagram: Real-time data, Fast analysis (Storm), Raw logs, Analytic output]


[Diagram: Large analysis (map-reduce), Raw logs, Analytic output]


[Diagram: Browser query, Presentation tier (d3 + node.js), Raw logs, Analytic output]


Old School Storm: Complex architecture
[Diagram components: Twitter, Twitter API, TwitterLogger, Kafka cluster (several Kafka nodes), Storm, Flume, HDFS, Hadoop, NAS, Web Data, Web-server, http]

MapR: One Platform with Streaming Writes
[Diagram components: Twitter, Twitter API, TwitterLogger, Catcher, Storm, Topic Queue, Web Data, Web-server, http, NFS access, optional HDFS API and MapReduce, MapR]

Users can also run extended analytics/MapReduce on the stored data


Objective Results

 Real-time + long-time analysis is seamless

 Web tier can be rooted directly on Hadoop cluster

 No need to move data
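A minimal sketch of what "no need to move data" could look like when the cluster is reachable as an ordinary file system: a catcher appends events to a file, and an analysis process reads whatever is new from the same file with plain file I/O. The function names and the file name `tweets.log` are hypothetical, and a temporary directory stands in for the cluster's NFS mount point, which is an assumption of this sketch. How quickly appended data become visible to readers depends on the underlying file system.

```python
import os
import tempfile


def append_event(path, line):
    """Append one event as a newline-terminated record."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()


def read_new_events(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partial trailing record is left for later.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Stand-in for a hypothetical NFS mount of the cluster.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "tweets.log")

append_event(log_path, "tweet one")
first, offset = read_new_events(log_path, 0)
append_event(log_path, "tweet two")
second, offset = read_new_events(log_path, offset)
```

The same file can later be the input to a long-running batch job, so the real-time and long-time paths share one copy of the data.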


The future is
not what we
thought it
would be


It is better!


Get Involved!

Tweet:
#strataconf
#mapr
@ted_dunning


Get Involved!

 Join Apache Drill!
– drill-dev-subscribe@incubator.apache.org
– Follow @apachedrill

 Join MapR!
– jobs@mapr.com

 Download these slides
– http://www.mapr.com/company/events/strata-conference-2-2-27-13

 Contact me:
– tdunning@maprtech.com
– tdunning@apache.org
– @ted_dunning

