Recommendation and personalization systems are an important part of many modern websites. Graphs provide a natural way to represent the behavioral data that is the core input to many recommendation algorithms. Thomas Pinckney and his colleagues at Hunch (recently acquired by eBay) built a large-scale recommendation system and then ported the technology to eBay. Thomas will discuss how his team uses Cassandra to provide high-I/O storage for their fifty-billion-edge graphs and how they generate new recommendations in real time as users click around the site.
1. Modeling taste with Cassandra
Affinity is based on user tastes, preferences, and interests
2. What is a taste profile?
Operational definition: the set of things you like and dislike
Stuff I like
Stuff I don’t like
Challenge: how do you build the taste profile for someone?
4. Inferring correlations
1) User A:
• Democrat
• Likes Arugula
2) User B:
• Republican
• Dislikes Arugula
3) User C indicates:
• Democrat
What would we infer is User C’s affinity for Arugula?
Answer: User C would like Arugula
[Figure: graph with nodes A, B, C, D, and E and a “?” marker]
5. Inferring correlations
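[Figure: scatter plot. Horizontal axis: Dislike Obama to Like Obama. Vertical axis: Dislike arugula to Like arugula. Points near the User A label: <3, 2.5>, <1, 1>. Points near the User B label: <-2, -1.5>, <-3, -3>. User C: <2, ?>]
If someone’s affinity for Obama is 2.0, what is their affinity for arugula?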
6. Discovering latent factors
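[Figure: the Obama/arugula scatter plot with a latent-factor line running from Conservative to Liberal. Horizontal axis: Dislike Obama to Like Obama. Vertical axis: Dislike arugula to Like arugula. Labels along the line: Obama, Arugula, Iceberg, GOP. Coordinates shown: <5, 5>, <4, 4>, <3, 2>, <1, 1>, <-2, -1.5>, <-3, -3>, <-4, -4>, <-5, -5>. User A and User B are marked; User C is at <2, 1.5>.]
Predict 1.5 for how much this person will like arugula.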
7. Taste space = many latent factors
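[Figure: taste space with three latent-factor axes: Liberal to Conservative, Extroverted to Introverted, Masculine to Feminine. Coordinates shown: <0.7, 4.4, -0.1>, <0.5, 2.4, -0.4>, <-0.5, -3.1, 0.1>. Points A and B are marked.]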
8. What is a taste profile?
Operational definition: a coordinate in taste space
Stuff I like (close to me in taste space)
Stuff I don’t like (far away in taste space)
Challenge: how do you calculate taste coordinates?
9. Calculating taste coordinates
Edge weight = dot product of nodes, to constrain similar items to be close to each other.
Assume edge weights of:
+2 = “love”
-2 = “hate”
Democratic node must solve:
1*x - 2*y = 2 (edge from A)
1*x - 1*y = 2 (edge from C)
Solution = <2, 0>
[Figure: taste graph with nodes A <1, -2>, B <-1, 2>, C, D, and E. The unknown node is marked “? <x, y>”. Other coordinates shown: <1, -1>, <1, -0.5>. Edge weights shown: 2, 2, 2, -2.]
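The two equations on this slide can be checked with a short sketch. It assumes the neighbor coordinates implied by the equations (A = <1, -2>, C = <1, -1>) and uses a least-squares solve so that it would also accept more than two edges. The function name `solve_node` is hypothetical, not taken from the talk.

```python
def solve_node(neighbors, weights):
    """Find the 2-D coordinate whose dot product with each neighbor
    best matches that edge's weight (least squares via normal equations)."""
    # Build the 2x2 normal equations (N^T N) p = N^T w.
    a11 = sum(nx * nx for nx, ny in neighbors)
    a12 = sum(nx * ny for nx, ny in neighbors)
    a22 = sum(ny * ny for nx, ny in neighbors)
    b1 = sum(nx * w for (nx, ny), w in zip(neighbors, weights))
    b2 = sum(ny * w for (nx, ny), w in zip(neighbors, weights))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("neighbors do not determine a unique coordinate")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a12 * b1) / det
    return (x, y)


# Edges from A and C, both with weight +2 ("love").
democratic = solve_node([(1, -2), (1, -1)], [2, 2])
print(democratic)
```

With exactly two independent edges the least-squares answer coincides with the exact solution of the two equations.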
10. Updating taste coordinates
User A purchases a camera...
[Figure: before and after views of the taste graph, with edge weights 2 and -2. Coordinates shown in both views: <1, -1>, <1, -0.5>, <-1, 0.5>, <-1, 2>. The coordinate <1, -2> in the first view becomes <0.75, -2.5> in the second.]
Resulting in blue coordinates changing.
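The slides do not state the rule used to move coordinates after an event. One common option, shown here purely as an assumption, is a single gradient step that nudges the user and item so that their dot product moves toward the new edge weight. The camera coordinate, the learning rate, and the name `sgd_step` are all hypothetical, and this sketch is not expected to reproduce the exact numbers in the figure.

```python
def dot(a, b):
    """Dot product of two 2-D coordinates."""
    return a[0] * b[0] + a[1] * b[1]


def sgd_step(user, item, target, lr=0.1):
    """One gradient step on 0.5 * (dot(user, item) - target) ** 2.

    Returns updated (user, item) coordinates; both move, as in the slide
    where several coordinates change after one purchase.
    """
    err = dot(user, item) - target
    new_user = (user[0] - lr * err * item[0], user[1] - lr * err * item[1])
    new_item = (item[0] - lr * err * user[0], item[1] - lr * err * user[1])
    return new_user, new_item


user_a = (1.0, -2.0)
camera = (1.0, -1.0)  # hypothetical starting coordinate for the camera
before = dot(user_a, camera)
user_a_new, camera_new = sgd_step(user_a, camera, target=2.0)
after = dot(user_a_new, camera_new)
print(before, after)  # the dot product moves closer to the edge weight 2
```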
11. v1 System overview - Model updates
Components: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord), Taste graph
1) Receive event (eg, Purchase)
2a) Write Purchase edge
2b) Read other edges for this user and item
3) Write user and item coordinates
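The update flow above can be sketched with in-memory dictionaries standing in for the two stores. This is only an illustration: it assumes edges go to the taste graph and coordinates go to the Reco. DB, and the `Updater` class and the `weighted_mean` recompute rule are hypothetical placeholders, not the production code or the real coordinate solver.

```python
from collections import defaultdict


def weighted_mean(node, edges, coords):
    """Placeholder recompute rule: edge-weighted mean of neighbor coordinates."""
    known = [(coords[n], w) for n, w in edges.items() if n in coords]
    total = sum(abs(w) for _, w in known)
    if not known or total == 0:
        return coords.get(node, (0.0, 0.0))
    x = sum(c[0] * w for c, w in known) / total
    y = sum(c[1] * w for c, w in known) / total
    return (x, y)


class Updater:
    def __init__(self, recompute):
        self.taste_graph = defaultdict(dict)  # node -> {neighbor: edge weight}
        self.reco_db = {}                     # node -> coordinate
        self.recompute = recompute

    def handle_event(self, user, item, weight):
        # 1) Receive event (eg, a purchase with weight +2).
        # 2a) Write the new edge to the taste graph.
        self.taste_graph[user][item] = weight
        self.taste_graph[item][user] = weight
        # 2b) Read the other edges for this user and item.
        user_edges = dict(self.taste_graph[user])
        item_edges = dict(self.taste_graph[item])
        # 3) Write the recomputed user and item coordinates.
        self.reco_db[user] = self.recompute(user, user_edges, self.reco_db)
        self.reco_db[item] = self.recompute(item, item_edges, self.reco_db)


updater = Updater(weighted_mean)
updater.reco_db.update({"user_a": (1.0, -2.0), "camera": (1.0, -0.5)})
updater.handle_event("user_a", "camera", 2)
print(updater.taste_graph["user_a"], updater.reco_db["user_a"])
```

Passing the recompute rule in as a function keeps the event-handling steps separate from whichever coordinate calculation is actually used.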
12. v1 System overview - Rec serving
Components: Rec. Engine, Updater, Reco. DB (User -> coord, Item -> coord), Taste graph
1) Page load requests recommendations
2) Rec. engine finds other cameras close to user’s coordinates
3) Recommendations shown to user
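Step 2 amounts to a nearest-neighbor lookup in taste space. A minimal sketch, assuming Euclidean distance as the measure of "close" and a small in-memory dictionary of candidate items; the item names, their coordinates, and the `recommend` function are hypothetical.

```python
import math


def recommend(user_coord, item_coords, k=3):
    """Return the names of the k items closest to the user's coordinate."""
    ranked = sorted(
        item_coords.items(),
        key=lambda kv: math.dist(user_coord, kv[1]),
    )
    return [name for name, _ in ranked[:k]]


cameras = {
    "camera_a": (0.75, -2.5),
    "camera_b": (-1.0, 2.0),
    "camera_c": (1.0, -0.5),
}
top = recommend((1.0, -2.0), cameras, k=2)
print(top)
```

A linear scan like this only works for a small candidate set; at the scale described in the talk some form of indexing or pre-filtering by category would presumably be needed.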
13. v1 Taste Graph data size
40 billion edges
2 billion item nodes
200 million user nodes
5TB of data, takes up 10TB with Replication Factor of 2
We expect this to quadruple next year as we get more events and add
new types of edges
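The sizing figures on this slide can be worked through as plain arithmetic. The sketch assumes decimal terabytes and, for the per-edge estimate only, that all 5TB is attributable to edges, which the slide does not state.

```python
EDGES = 40_000_000_000      # 40 billion edges
RAW_TB = 5                  # data size before replication
REPLICATION_FACTOR = 2
GROWTH = 4                  # "quadruple next year"

replicated_tb = RAW_TB * REPLICATION_FACTOR
projected_raw_tb = RAW_TB * GROWTH
projected_replicated_tb = projected_raw_tb * REPLICATION_FACTOR
# Rough average, assuming every byte belongs to an edge.
bytes_per_edge = RAW_TB * 10**12 / EDGES

print(replicated_tb, projected_raw_tb, projected_replicated_tb, bytes_per_edge)
```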
14. v1 Taste Graph DB configuration
32 Linux machines
128GB RAM
1TB iSCSI SSD
10 GigE NIC
Cassandra version 1.0.8
8GB JVM heap space
Size-tiered compaction strategy
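As a rough capacity check, the configuration above can be compared with the data sizes from the previous slide. The sketch assumes the 1TB SSD figure is per machine and ignores the free space that compaction needs, so the real headroom is smaller than shown.

```python
import math

MACHINES = 32
SSD_TB_PER_MACHINE = 1      # assumption: the 1TB SSD is per machine
REPLICATED_TB = 10          # 5TB of data at replication factor 2
GROWTH = 4                  # expected growth over the next year

cluster_tb = MACHINES * SSD_TB_PER_MACHINE
utilization_now = REPLICATED_TB / cluster_tb
projected_tb = REPLICATED_TB * GROWTH
fits_after_growth = projected_tb <= cluster_tb
# Machines needed if every disk were filled completely (an optimistic bound).
machines_needed = math.ceil(projected_tb / SSD_TB_PER_MACHINE)

print(cluster_tb, utilization_now, projected_tb, fits_after_growth, machines_needed)
```

Under these assumptions the cluster is about 31% full today, and the projected 40TB would not fit on the current 32TB of SSD.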