1. LinkedIn Segmentation & Targeting Platform: A Big Data Application
Hadoop Summit, June 2013
Hien Luu, Sid Anand
©2013 LinkedIn Corporation. All Rights Reserved.
3. Our mission
Connect the world’s professionals to make them more productive and successful
4. Over 200M members and counting
[Chart: LinkedIn Members (Millions) by year, 2004 to 2012: 2, 4, 8, 17, 32, 55, 90, 145, 200+]
The world’s largest professional network
Growing at more than 2 members/sec
Source: http://press.linkedin.com/about
5. The world’s largest professional network
• >88% of Fortune 100 companies use LinkedIn Talent Solutions to hire
• >2.9M Company Pages
• >5.7B professional searches in 2012
• 19 languages
• >30M students and new college graduates (NCGs), the fastest-growing demographic
• Over 64% of members are now international
Source: http://press.linkedin.com/about
6. Other Company Facts
• Headquartered in Mountain View, Calif., with offices around the world!
• As of June 1, 2013, LinkedIn has ~3,700 full-time employees located around the world
Source: http://press.linkedin.com/about
7. Agenda
✓ Company Overview
• Big Data @ LinkedIn
• The Segmentation & Targeting Problem
• Solution: LinkedIn Segmentation & Targeting Platform
• Q & A
8. Big Data @ LinkedIn
9. LinkedIn: Big Data Story
Our Big Data Story depends on Infrastructure!
• On-line Data Infrastructure
• Near-line Data Infrastructure
• Off-line Data Infrastructure
[Diagram labels: Web Serving, Updates, Oracle or Espresso, Data Streams, Teradata; tiers: On-line, Near-line, Off-line]
10. Big Data Story: On-line Data
On-line Data Infrastructure
• Supports typical OLTP requirements
  – Highly concurrent R/W access
  – Transactional guarantees
  – Back-up & Recovery
• Supports a central LinkedIn Data Principle!
  – “All data everywhere”
  – All OLTP databases need to provide a timeline-consistent change stream
  – For this, we developed and open-sourced Databus!
[Diagram labels: Web Serving, Updates, Oracle or Espresso (On-line)]
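What a "timeline-consistent change stream" buys can be shown with a minimal sketch. It assumes each committed write carries a monotonically increasing sequence number (called `scn` here) and that every consumer keeps a checkpoint of the last number it applied. `ChangeEvent`, `ChangeStream`, and `Consumer` are hypothetical names for illustration, not Databus's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChangeEvent:
    scn: int        # hypothetical commit sequence number that fixes timeline order
    source: str     # e.g. "profile" or "connection"
    member_id: int
    payload: dict


@dataclass
class ChangeStream:
    """Append-only log of committed changes, kept in commit order."""
    events: List[ChangeEvent] = field(default_factory=list)

    def append(self, event: ChangeEvent) -> None:
        # Timeline consistency: reject any event that would reorder history.
        if self.events and event.scn <= self.events[-1].scn:
            raise ValueError(f"out-of-order SCN {event.scn}")
        self.events.append(event)

    def read_since(self, checkpoint: int) -> List[ChangeEvent]:
        # Everything committed after the consumer's checkpoint, oldest first.
        return [e for e in self.events if e.scn > checkpoint]


class Consumer:
    """A downstream system (search index, graph index, ...) that replays the stream."""

    def __init__(self, handler: Callable[[ChangeEvent], None]) -> None:
        self.handler = handler
        self.checkpoint = 0  # SCN of the last event this consumer applied

    def poll(self, stream: ChangeStream) -> int:
        applied = 0
        for event in stream.read_since(self.checkpoint):
            self.handler(event)
            self.checkpoint = event.scn  # advance only after a successful apply
            applied += 1
        return applied


stream = ChangeStream()
stream.append(ChangeEvent(101, "profile", 42, {"company": "IBM"}))
stream.append(ChangeEvent(102, "connection", 42, {"other_id": 7}))

applied_scns: List[int] = []
search_consumer = Consumer(lambda e: applied_scns.append(e.scn))
print(search_consumer.poll(stream))  # 2
print(search_consumer.poll(stream))  # 0: nothing new since checkpoint 102
```

Because every consumer sees the same events in the same order and resumes from its own checkpoint, independently running indexes converge to the same state as the primary database.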
11. Big Data Story: On-line Data
[Diagram: Updates written to Oracle or Espresso emit Data Change Events to Standardization, the Search Index, the Graph Index, and Read Replicas]
A user updates the company, title, and school on his profile. He also accepts a connection.
The write is made to an Oracle or Espresso master, and Databus replicates it:
• the profile change is applied to the Standardization service
  – e.g. the many forms of IBM are canonicalized for search-friendliness
• … and to the Search Index
  – recruiters can find you immediately by your new keywords
• the connection change is applied to the Graph Index service
  – the user can now start receiving feed updates from his new connections
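The fan-out described above (profile changes go through standardization into search; connection changes go into the graph) can be sketched as a small router. The alias table, event fields, and class names below are hypothetical stand-ins chosen for illustration; LinkedIn's actual services are far richer.

```python
from typing import Dict, Set

# Hypothetical alias table; a production standardization service would be far richer.
COMPANY_ALIASES: Dict[str, str] = {
    "ibm": "IBM",
    "i.b.m.": "IBM",
    "ibm corp": "IBM",
    "international business machines": "IBM",
}


def standardize_company(raw: str) -> str:
    """Return the canonical company name, or the trimmed input if it is unknown."""
    return COMPANY_ALIASES.get(raw.strip().lower(), raw.strip())


class SearchIndex:
    def __init__(self) -> None:
        self.members_by_company: Dict[str, Set[int]] = {}
        self.company_of: Dict[int, str] = {}

    def apply_profile_change(self, member_id: int, company: str) -> None:
        canonical = standardize_company(company)
        old = self.company_of.get(member_id)
        if old is not None:
            self.members_by_company[old].discard(member_id)  # drop the stale posting
        self.company_of[member_id] = canonical
        self.members_by_company.setdefault(canonical, set()).add(member_id)


class GraphIndex:
    def __init__(self) -> None:
        self.connections: Dict[int, Set[int]] = {}

    def apply_connection_change(self, a: int, b: int) -> None:
        # Connections are mutual, so record the edge in both directions.
        self.connections.setdefault(a, set()).add(b)
        self.connections.setdefault(b, set()).add(a)


def route(event: dict, search: SearchIndex, graph: GraphIndex) -> None:
    """Send each change event only to the consumers interested in its type."""
    if event["type"] == "profile":
        search.apply_profile_change(event["member_id"], event["company"])
    elif event["type"] == "connection":
        graph.apply_connection_change(event["member_id"], event["other_id"])


search, graph = SearchIndex(), GraphIndex()
route({"type": "profile", "member_id": 42, "company": "Acme"}, search, graph)
route({"type": "profile", "member_id": 42, "company": " I.B.M. "}, search, graph)
route({"type": "connection", "member_id": 42, "other_id": 7}, search, graph)
print(search.members_by_company)  # {'Acme': set(), 'IBM': {42}}
print(graph.connections[7])       # {42}
```

Canonicalizing before indexing is what lets a recruiter's search for "IBM" find a member who typed "I.B.M.".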
12. Big Data Story: On-line Data
Databus streams also update Hadoop!
[Diagram labels: Oracle or Espresso, Updates, Data Change Events, Standardization, Search Index, Graph Index, Read Replica]
13. Big Data Story: Near-line & Off-line Data
2 Main Sources of Data @ LinkedIn
• User-provided data
  – e.g. Member Profile data (employment, education history, endorsements)
• Tracking data via web site instrumentation
  – e.g. pages viewed, emails opened/sent, social gestures: posts/likes/shares
[Diagram labels: Oracle or Espresso, Updates, Databus, Web Servers, Teradata]
14. The Segmentation & Targeting Problem
17. Segmentation & Targeting
Step 1: Take some information about users
Member ID | Join Date  | Country | Responded to Promotion X1
1         | 01/01/2013 | FR      | F
2         | 01/02/2013 | BE      | F
3         | 01/03/2013 | FR      | F
4         | 02/01/2013 | FR      | T
Step 2: Provide some targeting criteria for a new promotion
Pick members where
• Join Date between('01/01/2013', '01/31/2013') and
• Country = 'FR' and
• Responded to Promotion X1 = 'F'
→ Members 1 & 3
Step 3: Target them for a different email campaign (promotion_X2)
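Steps 1 and 2 amount to a filter over an attribute table. This sketch expresses the Step 2 segment definition as SQL over an in-memory SQLite copy of the four example rows. The table and column names are hypothetical, and the dates are read from the slide as MM/DD/YYYY and stored as ISO-8601 text so that `BETWEEN` compares them correctly.

```python
import sqlite3

# In-memory copy of the Step 1 attributes (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE members (member_id INTEGER PRIMARY KEY, join_date TEXT,"
    " country TEXT, responded_x1 TEXT)"
)
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?, ?)",
    [
        (1, "2013-01-01", "FR", "F"),
        (2, "2013-01-02", "BE", "F"),
        (3, "2013-01-03", "FR", "F"),
        (4, "2013-02-01", "FR", "T"),
    ],
)

# Step 2: the segment definition for promotion_X2, written as SQL.
SEGMENT_DEFINITION = """
    SELECT member_id FROM members
    WHERE join_date BETWEEN '2013-01-01' AND '2013-01-31'
      AND country = 'FR'
      AND responded_x1 = 'F'
    ORDER BY member_id
"""
segment = [row[0] for row in conn.execute(SEGMENT_DEFINITION)]
print(segment)  # [1, 3]
```

Member 2 is excluded by country and member 4 by join date (it also responded to X1), leaving members 1 and 3 as Step 3's target list.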
18. Segmentation & Targeting
(The example from slide 17, annotated: the user information in Step 1 = Attributes; the targeting criteria in Step 2 = Segment Definition; the selected members 1 & 3 = Segment)
19. Segmentation & Targeting
Problem Definition
• The business wants to launch new campaigns often
• The business wants to specify targeting criteria (segment definitions) using an arbitrary set of attributes
• The attributes often need to be computed to fulfill the targeting criteria
  – This data resides on Hadoop or Teradata (TD)
• The business is most comfortable with SQL-like languages
22. Segmentation & Targeting
Attribute Computation Engine
• Self-service
• Support various data sources
• Attribute consolidation
• Attribute availability
24. LinkedIn Segmentation & Targeting Platform
Attribute Portal Web Application
Attribute & Definition Metadata
25. LinkedIn Segmentation & Targeting Platform
[Diagram: Attribute & Definition Metadata; TD Executor, Hive Executor, and Pig Executor, each behind a REST interface]
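One way to read this diagram is that each attribute's definition records which engine computes it, and a dispatcher routes it to the matching executor. In the sketch below, the `engine` field, the `Executor` stand-in (which only records submissions instead of calling a REST endpoint), and the job-handle format are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    engine: str   # "td", "hive", or "pig" (hypothetical metadata field)
    script: str   # query or script that computes the attribute


class Executor:
    """Stand-in for one REST-fronted executor; here it only records submissions."""

    def __init__(self, engine: str) -> None:
        self.engine = engine
        self.submitted: List[str] = []

    def submit(self, definition: AttributeDefinition) -> str:
        self.submitted.append(definition.name)
        return f"{self.engine}:{definition.name}"  # hypothetical job handle


EXECUTORS: Dict[str, Executor] = {e: Executor(e) for e in ("td", "hive", "pig")}


def dispatch(definitions: List[AttributeDefinition]) -> Dict[str, List[str]]:
    """Route each attribute definition to the executor for its data source."""
    handles: Dict[str, List[str]] = defaultdict(list)
    for d in definitions:
        executor = EXECUTORS.get(d.engine)
        if executor is None:
            raise ValueError(f"no executor for engine {d.engine!r}")
        handles[d.engine].append(executor.submit(d))
    return dict(handles)


jobs = dispatch([
    AttributeDefinition("is_recruiter", "hive", "-- HiveQL that derives the attribute"),
    AttributeDefinition("responded_x1", "td", "-- Teradata SQL that derives the attribute"),
    AttributeDefinition("job_seeker_score", "pig", "-- Pig Latin script"),
])
print(jobs)
# {'hive': ['hive:is_recruiter'], 'td': ['td:responded_x1'], 'pig': ['pig:job_seeker_score']}
```

Keeping executors behind a common interface means a new data source needs a new executor, not changes to the portal.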
26. LinkedIn Segmentation & Targeting Platform
Attribute consolidation & availability
[Diagram: Data Loader; M/R Stitcher combining /path/dataset1, /path/dataset2, /path/dataset3, and /path/dataset4 into /path/lnkd_big_table]
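The stitching step is essentially a reduce-side join on member ID. The following single-process sketch simulates the map, shuffle, and reduce phases to show how per-dataset attributes become one wide row per member. It assumes every record has a `member_id` field and that attribute names are unique across datasets; both are assumptions, not details from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Record = Dict[str, object]


def map_phase(records: List[Record]) -> Iterator[Tuple[int, Record]]:
    # Key every record by member ID so all of a member's attributes meet in one reducer.
    for rec in records:
        yield rec["member_id"], {k: v for k, v in rec.items() if k != "member_id"}


def shuffle(pairs: List[Tuple[int, Record]]) -> Dict[int, List[Record]]:
    # Group mapped values by key, as the M/R framework does between map and reduce.
    groups: Dict[int, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(member_id: int, partials: List[Record]) -> Record:
    # Merge the partial rows into one wide row; attribute names are assumed unique.
    row: Record = {"member_id": member_id}
    for partial in partials:
        for attr, value in partial.items():
            if attr in row:
                raise ValueError(f"attribute {attr!r} supplied by more than one dataset")
            row[attr] = value
    return row


def stitch(datasets: Dict[str, List[Record]]) -> List[Record]:
    pairs = [p for records in datasets.values() for p in map_phase(records)]
    groups = shuffle(pairs)
    return [reduce_phase(mid, groups[mid]) for mid in sorted(groups)]


big_table = stitch({
    "/path/dataset1": [{"member_id": 1, "country": "FR"}, {"member_id": 2, "country": "BE"}],
    "/path/dataset2": [{"member_id": 1, "responded_x1": "F"}],
})
print(big_table)
# [{'member_id': 1, 'country': 'FR', 'responded_x1': 'F'}, {'member_id': 2, 'country': 'BE'}]
```

Member 2 has no `responded_x1` value, so its row is sparse; a wide table with ~240 attributes will naturally have many such gaps.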
27. LinkedIn Segmentation & Targeting Platform
LinkedIn big table, the most sought-after data, used for:
• Segmentation
• Propensity Model
• Ad hoc analysis
28. Segmentation & Targeting
Attribute Serving Engine
• Self-service
• Attribute predicate expression
• Build segments
• Build lists
29. Segmentation & Targeting
Serving Engine
[Diagram: count, filter, sum, and complex expressions evaluated over the LinkedIn big table (~225M × ~240)]
30. LinkedIn Segmentation & Targeting Platform
[Diagram: LinkedIn big table → M/R Indexer → Inverted Index (×3); Attribute & Definition Metadata]
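An inverted index turns every (attribute, value) pair into a posting list of member IDs, so filters become set lookups rather than table scans. This sketch builds such an index from big-table rows. It assumes, purely as an illustration of the three indexes in the diagram, that members are partitioned across shards by `member_id % num_shards`; the actual partitioning scheme is not stated in the slides.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, object]          # (attribute, value)
Shard = Dict[Key, Set[int]]       # posting lists: key -> member IDs


def build_sharded_index(rows: List[dict], num_shards: int = 3) -> List[Shard]:
    """Index every (attribute, value) pair; members are spread across shards by ID."""
    shards: List[Shard] = [defaultdict(set) for _ in range(num_shards)]
    for row in rows:
        member_id = row["member_id"]
        shard = shards[member_id % num_shards]  # hypothetical partitioning scheme
        for attr, value in row.items():
            if attr != "member_id":
                shard[(attr, value)].add(member_id)
    return shards


def lookup(shards: List[Shard], attr: str, value: object) -> Set[int]:
    """Union the posting lists for one (attribute, value) across all shards."""
    result: Set[int] = set()
    for shard in shards:
        result |= shard.get((attr, value), set())
    return result


ROWS = [
    {"member_id": 1, "country": "FR", "responded_x1": "F"},
    {"member_id": 2, "country": "BE", "responded_x1": "F"},
    {"member_id": 3, "country": "FR", "responded_x1": "F"},
    {"member_id": 4, "country": "FR", "responded_x1": "T"},
]
shards = build_sharded_index(ROWS)
print(sorted(lookup(shards, "country", "FR") & lookup(shards, "responded_x1", "F")))  # [1, 3]
```

With posting lists in hand, "filter" is a set intersection and "count" is simply `len(lookup(...))`, which is why such queries can return in seconds even on a very large table.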
31. LinkedIn Segmentation & Targeting Platform
Who are North American recruiters that don’t work for a competitor?
Who are the LinkedIn Talent Solutions prospects in Europe?
Who are the job seekers?
32. LinkedIn Segmentation & Targeting Platform
[Diagram: JSON Predicate Expression → JSON Lucene Query Parser → Inverted Index (×3) → Segment & List]
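A JSON-to-Lucene step can be pictured as a recursive translation from a predicate tree into Lucene's classic query syntax (`field:"value"`, `AND`/`OR`/`NOT`, inclusive ranges `[a TO b]`). The JSON shape below (`op`, `args`, `arg`, `attr`, `value`, `range`) is a hypothetical schema, and the range form assumes dates are indexed as sortable tokens such as `yyyymmdd`; this is a sketch, not the platform's actual parser.

```python
import json
from typing import Any, Dict


def _escape(value: str) -> str:
    # Escape the characters that would end or break a quoted Lucene term.
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_lucene(node: Dict[str, Any]) -> str:
    """Translate a (hypothetical) JSON predicate tree into Lucene classic query syntax."""
    op = node["op"]
    if op in ("and", "or"):
        joiner = f" {op.upper()} "
        return "(" + joiner.join(to_lucene(child) for child in node["args"]) + ")"
    if op == "not":
        # A purely negative clause matches nothing by itself, so subtract from all docs.
        return f"(*:* AND NOT {to_lucene(node['arg'])})"
    if op == "eq":
        return f'{node["attr"]}:"{_escape(str(node["value"]))}"'
    if op == "between":
        low, high = node["range"]  # inclusive bounds; assumed to be plain tokens such as yyyymmdd ints
        return f"{node['attr']}:[{low} TO {high}]"
    raise ValueError(f"unknown operator {op!r}")


predicate = json.loads("""
{"op": "and", "args": [
  {"op": "between", "attr": "join_date", "range": [20130101, 20130131]},
  {"op": "eq", "attr": "country", "value": "FR"},
  {"op": "eq", "attr": "responded_x1", "value": "F"}
]}
""")
print(to_lucene(predicate))
# (join_date:[20130101 TO 20130131] AND country:"FR" AND responded_x1:"F")
```

Generating a query string keeps the business-facing JSON decoupled from the search engine; a production system might instead build Lucene query objects directly.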
33. LinkedIn Segmentation & Targeting Platform
Complex tree-like attribute predicate expressions
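A tree-like predicate can be evaluated directly against posting lists with set algebra: `and` intersects, `or` unions, and `not` subtracts from the set of all members. The sketch below answers a question like "North American recruiters that don't work for a competitor" over a toy index. The index contents, attribute names, and JSON shape are all hypothetical.

```python
from typing import Any, Dict, Set, Tuple

Index = Dict[Tuple[str, str], Set[int]]  # (attribute, value) -> member IDs


def evaluate(node: Dict[str, Any], index: Index, universe: Set[int]) -> Set[int]:
    """Recursively evaluate a predicate tree with set algebra over posting lists."""
    op = node["op"]
    if op == "eq":
        return set(index.get((node["attr"], node["value"]), set()))
    if op == "and":
        result = set(universe)
        for child in node["args"]:
            result &= evaluate(child, index, universe)
        return result
    if op == "or":
        result = set()
        for child in node["args"]:
            result |= evaluate(child, index, universe)
        return result
    if op == "not":
        return universe - evaluate(node["arg"], index, universe)
    raise ValueError(f"unknown operator {op!r}")


# Hypothetical toy index: six members, a few attributes.
INDEX: Index = {
    ("region", "north_america"): {1, 2, 3, 5},
    ("region", "europe"): {4, 6},
    ("function", "recruiting"): {1, 2, 4, 5},
    ("company", "competitor_a"): {2},
    ("company", "competitor_b"): {5},
}
UNIVERSE = {1, 2, 3, 4, 5, 6}

# "North American recruiters that don't work for a competitor"
query = {"op": "and", "args": [
    {"op": "eq", "attr": "region", "value": "north_america"},
    {"op": "eq", "attr": "function", "value": "recruiting"},
    {"op": "not", "arg": {"op": "or", "args": [
        {"op": "eq", "attr": "company", "value": "competitor_a"},
        {"op": "eq", "attr": "company", "value": "competitor_b"},
    ]}},
]}
print(sorted(evaluate(query, INDEX, UNIVERSE)))  # [1]
```

Members 2 and 5 match region and function but are removed by the negated competitor clause, leaving member 1. The resulting ID set is exactly the kind of list that represents a marketing campaign.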
34. LinkedIn Segmentation & Targeting Platform
A marketing campaign is represented by a list
35. Conclusion
Move at business speed and scale at LinkedIn scale
• Segmentation & Targeting Platform
  – Self-service
  – Multiple data sources & massive data volume
  – Complex expression evaluation in seconds
  – Attribute availability at business speed
36. Engineering Team
• Jessica Ho
• Swetha Karthik
• Raj Rangaswamy
• Tony Tong
• Ajinkya Harkare
• Hien Luu
• Sid Anand