- 1. LinkedIn’s STREAM EXPERIMENTATION FRAMEWORK
Joseph Adler, Bee-Chung Chen, and Xin Fu
O’Reilly Strata Conference
February 12 2014
©2014 LinkedIn Corporation. All Rights Reserved.
- 3. The LinkedIn Stream
Like many social networks, the
centerpiece of LinkedIn’s home
page is a news stream.
It contains
• Updates about users’ networks
• News stories and shares
• Recommendations
- 4. The LinkedIn Stream
We operate at a large scale:
• 277+ million members
• 75+ million monthly unique users
• 5000+ employees
- 5. The LinkedIn Stream
Today, we’ll tell you how we
experiment with new content in
the stream:
• Creating new content
• Maximizing relevance
• Managing tests
- 6. History of the LinkedIn Stream
Network updates were
introduced in 2006
Back then, LinkedIn had
• 5mm members
• 875k monthly uniques
• 70 employees
- 7. History of the LinkedIn Stream
In practice this meant:
• Slow-changing content, a small number of updates, a weekly visit rate
  ‣ No ranking/optimization
• A small number of active tests, limited analytics resources
  ‣ Primitive resources for A/B tests
• Limited engineering resources
  ‣ A hacky solution for testing new content...
- 8. History of the LinkedIn Stream
We experimented with new
content using a system called
the Analytics Prototype Engine,
or APE. It was implemented as
an ad slot on the home page.
Big wins included:
• People You May Know
• Groups You Might Like
• Jobs You Might Be Interested In
- 9. History of the LinkedIn Stream
We added more content over
the next couple of years:
•Status updates
•Twitter content
•Group discussions
•OpenSocial content (TripIt,
GitHub, and more...)
- 10. History of the LinkedIn Stream
By 2009, the stream looked
very similar to the stream
today.
LinkedIn was much bigger than
when we first added a news
stream...
• 55mm members
• 36mm monthly uniques
• 500 employees (end of year)
- 11. History of the LinkedIn Stream
… but the infrastructure hadn’t changed much, and we were experiencing growing pains:
• No system for ranking and optimization
  ‣ Users were overwhelmed with low-relevance updates
• No system for A/B testing
  ‣ Overlapping A/B tests, poor experiment design, difficult analysis
• No system for rapid prototyping/testing
  ‣ APE was making the site slow and unstable, and was shut down
- 12. History of the Stream
In the rest of this talk, we’ll tell
you how we’ve addressed
these challenges (and used a
lot of data science to make this
happen).
- 13. Content Insertion
In the beginning (2006),
experiments happened outside
the stream through APE:
• Easy data uploads
• Management UI
• Templates
- 14. Content Insertion
Most new content experiments
boil down to one thing: creating
experimental data.
We wanted the data experts to
be able to create experiments
easily by focusing on data, not
on writing production code (and
wrestling with build systems,
deployment processes, etc).
We created a system that lets
data scientists push new
content into the stream by
writing scripts (in Pig, Hive, etc).
- 15. Content Insertion
Project Gorilla brought the spirit
of APE back to the home page,
inside the stream.
[Architecture diagram: nhome → USCP federator → Gorilla first-pass ranker, backed by the Gorilla Voldemort store, which is populated offline by the Gorilla batch jobs]
- 16. Content Insertion
What does this consist of?
•An Apache Pig UDF for
pushing content
•A batch process that filters,
consolidates, and ranks
updates
•A process that pushes data
from Hadoop into Voldemort
(our NoSQL key/value store)
•An online system that fetches
updates from the store and
mixes them into the stream
- 17. Content Insertion
Our implementation is very simple:
•LinkedIn production systems use
rest.li as an API (JSON data +
schema)
•We create data offline on Hadoop,
put it in Voldemort, and surface it
through an API
This means that we can experiment
easily using existing templates,
tracking, etc; we just have to change
the data that’s rendered.
(We’re also experimenting with a
similar real time system based on
Apache Samza.)
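The slides above describe the flow end to end: batch jobs compute experimental updates offline, the results land in a key/value store, and an online system fetches and mixes them into the stream. A minimal sketch of that flow, with a plain dict standing in for Voldemort; all names (`kv_store`, `push_batch`, `mix_into_stream`) are illustrative, not LinkedIn’s actual API:

```python
# Offline/online split of the Gorilla-style pipeline. Updates computed
# offline (by Pig/Hive jobs) are stored per member in a key/value store
# (Voldemort in the talk; a dict stands in here), then fetched online
# and mixed into the organic stream.

kv_store = {}  # member_id -> list of precomputed experimental updates

def push_batch(batch):
    """Offline step: load the output of a batch job into the store."""
    for member_id, updates in batch.items():
        kv_store[member_id] = updates

def mix_into_stream(member_id, organic_items, max_experimental=2):
    """Online step: fetch stored updates and interleave a few of them."""
    experimental = kv_store.get(member_id, [])[:max_experimental]
    # A real mixer would rank everything jointly; here we simply prepend.
    return experimental + organic_items

push_batch({"m1": ["jobs_you_may_like", "groups_you_might_like"]})
stream = mix_into_stream("m1", ["connection_update", "news_share"])
```

Because the online side only reads precomputed values, a data scientist can change what appears in the stream by rewriting a batch script, without touching serving code.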
- 18. Relevance Optimization
Bring each individual user the most relevant items from different sources, optimizing for one or more measurable objectives.
- 19. Relevance Optimization
• Maximize users’ clicks on items in the stream
• Rank items according to their click rates
  ‣ Click rate: the probability that a user would click an item
• Predict the click rate based on:
  ‣ User features: profile, visit pattern, interests, …
  ‣ Item features: type, topics, keywords, …
  ‣ User–item interaction features
  ‣ Context: device, time of day, previous page, …
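The prediction-and-rank step above can be sketched with a toy logistic model. Every feature name and weight here is made up for illustration; the talk only states the feature families, not the actual model:

```python
import math

# Score stream items by predicted click probability with a logistic
# model over (hypothetical) user, item, interaction, and context
# features, then rank by that score.

WEIGHTS = {
    "item_type=JobChange": 0.8,   # item feature
    "user_interest_match": 1.5,   # user-item interaction feature
    "device=mobile": 0.3,         # context feature
}
BIAS = -2.0

def predict_click_rate(features):
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value
                   for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> probability in (0, 1)

def rank_items(items):
    """items: list of (item_id, feature dict); highest predicted CTR first."""
    return sorted(items, key=lambda it: predict_click_rate(it[1]), reverse=True)

items = [
    ("article_123", {"device=mobile": 1.0}),
    ("job_change_456", {"item_type=JobChange": 1.0, "user_interest_match": 1.0}),
]
ranked = rank_items(items)
```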
- 20. Relevance Optimization
Large scale logistic regression
•Input: A set of past users’ responses to items

  Response | Feature Vector
  1        | (Gender=M, JobTitle=CEO, ItemType=JobChange, ...)
  0        | (Gender=F, JobTitle=Engineer, ItemType=Article, ...)
  …        | …
•Output: Model parameters
•Challenge: Data too large to fit in a single machine
•Solution: Train a model using MapReduce on Hadoop
- 21. Relevance Optimization
Large scale Logistic Regression with ADMM
[Diagram: a large input data set is split into partitions 1…K; a logistic regression is fit on each partition in parallel, and a consensus computation combines the K local models]
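The partition/consensus loop pictured above can be sketched as a toy consensus ADMM for logistic regression. The local solver here is a few plain gradient steps (a real system would solve each subproblem to tolerance on Hadoop); rho, the step counts, and the tiny data set are all illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def local_fit(data, z, u, rho, steps=50, lr=0.1):
    """x-update: minimize local logistic loss + (rho/2)||x - z + u||^2."""
    x = list(z)
    for _ in range(steps):
        # Gradient of the proximal (consensus) term...
        grad = [rho * (x[j] - z[j] + u[j]) for j in range(len(x))]
        # ...plus the gradient of the local logistic loss.
        for features, label in data:
            p = sigmoid(sum(w * f for w, f in zip(x, features)))
            for j in range(len(x)):
                grad[j] += (p - label) * features[j]
        x = [x[j] - lr * grad[j] for j in range(len(x))]
    return x

def admm_logreg(partitions, dim, rho=1.0, iters=20):
    z = [0.0] * dim                         # consensus weights
    us = [[0.0] * dim for _ in partitions]  # scaled dual variables
    for _ in range(iters):
        # Each partition fits its local model in parallel (a map step).
        xs = [local_fit(part, z, us[k], rho)
              for k, part in enumerate(partitions)]
        # z-update: average of local models plus duals (the reduce step).
        z = [sum(x[j] + u[j] for x, u in zip(xs, us)) / len(xs)
             for j in range(dim)]
        # u-update: accumulate each partition's disagreement with z.
        for x, u in zip(xs, us):
            for j in range(dim):
                u[j] += x[j] - z[j]
    return z

# Two partitions of (features, label) rows; feature 1 is a constant bias.
partitions = [
    [([1.0, 1.0], 1), ([-1.0, 1.0], 0)],
    [([2.0, 1.0], 1), ([-2.0, 1.0], 0)],
]
weights = admm_logreg(partitions, dim=2)
```

On Hadoop the per-partition fits become mappers and the consensus computation a reducer, which is how the loss of a single-machine solver is avoided.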
- 29. Relevance Optimization
Diversity
Users get tired when seeing items of the same type many times in the
stream.
Example: Group discussions

  Consecutive discussions | Drop in click rate
  2                       | 21%
  3                       | 48%
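One simple way to act on this effect is a greedy re-rank that caps runs of a single item type. This is an illustrative sketch, not LinkedIn’s actual diversity logic; the scores, types, and `max_run` value are made up:

```python
# Greedy diversity re-ranker: never place more than max_run items of
# the same type in a row when an alternative is available.

def trailing_run(out, item_type):
    """Length of the run of item_type at the end of the current list."""
    n = 0
    for _, t in reversed(out):
        if t != item_type:
            break
        n += 1
    return n

def diversify(items, max_run=1):
    """items: list of (score, item_type) pairs."""
    remaining = sorted(items, reverse=True)  # best score first
    out = []
    while remaining:
        # Take the best item that does not extend a run past max_run;
        # fall back to the best remaining item if every candidate would.
        pick = next((it for it in remaining
                     if trailing_run(out, it[1]) < max_run),
                    remaining[0])
        out.append(pick)
        remaining.remove(pick)
    return out

stream = diversify([(0.9, "discussion"), (0.8, "discussion"),
                    (0.7, "article"), (0.6, "discussion")])
```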
- 30. Relevance Optimization
Multi-Objective Optimization
• Different items in the stream generate different kinds of value
• Click
• Social actions: Like, share, comment, …
• Revenue from sponsored items
• One approach:
Maximize revenue s.t. clicks and social actions are
still within ε% of optimal
• This approach requires extensive experimentation!
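The ε-constraint idea above can be sketched as a selection over candidate ranking configurations, each with measured clicks and revenue (e.g. from A/B tests). The configuration names and numbers are purely illustrative:

```python
# Pick the most revenue-positive configuration whose clicks stay
# within eps of the click-optimal configuration.

def pick_config(candidates, eps=0.05):
    """candidates: dict name -> (clicks, revenue)."""
    best_clicks = max(c for c, _ in candidates.values())
    feasible = {n: cr for n, cr in candidates.items()
                if cr[0] >= (1 - eps) * best_clicks}
    # Among configurations meeting the click constraint, maximize revenue.
    return max(feasible, key=lambda n: feasible[n][1])

configs = {"clicks_only":   (100.0, 1.0),
           "balanced":      (97.0, 4.0),
           "revenue_heavy": (88.0, 9.0)}
choice = pick_config(configs, eps=0.05)
```

The "extensive experiments" the slide mentions are what produce the measured (clicks, revenue) pairs in the first place.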
- 31. Experimentation Framework
Stream experiments are carried
out on LinkedIn’s central
experimentation platform:
• A one-stop solution for feature A/B testing, ramping, and advanced targeting needs
• Built-in power calculation to aid experiment design
• Automated reporting and analysis capabilities
(Mockup of UI)
- 32. Experimentation Framework
• History: assign members to test groups based on the modulo of member IDs
  ‣ A very high likelihood of range overlaps between tests
  ‣ Just one experiment could negatively affect the results of other tests executed on the same page
• Now: a deterministic pseudo-random algorithm for treatment assignment
  ‣ Improved logging of treatment assignment
  ‣ Automated scorecards
  ‣ Record of historical experiments
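A common way to implement deterministic pseudo-random assignment, and a sketch of why it avoids the modulo problem: hash the member ID together with a per-experiment salt, so buckets are stable within an experiment but statistically independent across experiments. The talk does not specify LinkedIn’s exact scheme; this is an assumed, typical construction:

```python
import hashlib

def bucket(member_id, experiment_id, num_buckets=1000):
    """Deterministic bucket in [0, num_buckets): same member + experiment
    always hashes to the same bucket; different experiments are salted
    differently, so their bucket assignments don't overlap systematically."""
    key = f"{experiment_id}:{member_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % num_buckets

def in_treatment(member_id, experiment_id, ramp_pct):
    # With 1000 buckets, each bucket covers 0.1% of members.
    return bucket(member_id, experiment_id) < ramp_pct * 10
```

Under plain `member_id % N` assignment, two experiments with overlapping ID ranges test the same members; salting the hash per experiment removes that correlation.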
- 33. Experimentation Framework
• History: focus on product-specific metrics
  ‣ Stream relevance change ⇒ CTR
  ‣ Profile redesign ⇒ # of profile views
• Now: standardized, tiered metric system
  ‣ Sitewide Tier 1 metrics
  ‣ Product-specific Tier 2 / Tier 3 metrics
  ‣ Comprehensive understanding of feature impact
(Mockup of UI)
- 34. Conclusions
LinkedIn has always experimented with site content. As we’ve
grown, we’ve had to rethink how we experiment.
Key lessons:
• Managing experimentation at scale is hard
• Scale means users, content volume, and employees
• Invest in platforms when they save time, money, and labor