1. Processing Data Streams in Real Time
Tobias Heintz¹  Benjamin Kille²
¹ plista GmbH
² Technische Universität Berlin
September 26, 2014
2. Table of Contents
Introduction
Recommender Systems
Unpersonalised Recommendation
Collaborative Filtering
Content-based Filtering
Evaluation
News Recommendation
Big Data Issues
3. Who are we?
I Tobias Heintz, plista GmbH
I Benjamin Kille, Technische Universität Berlin
plista GmbH
Pioneers in targeted advertising and content distribution.
I founded 31 July, 2008
I incorporated in the WPP Group as of 1 January, 2014
I headquarters in Berlin, Germany
I 120 employees (30 % R&D)
Technische Universität Berlin
I >30 000 enrolled students
I 331 professors
I >2600 researchers
4. What problems do we address?
Recommender Systems
We will introduce recommender systems; we will discuss a variety
of algorithms; we will explore how to evaluate recommender
systems.
News
We will talk about specific challenges when recommending news; we
will illustrate issues arising as systems fail to build
comprehensive user profiles; we will depict how news evolving over
time affect recommender systems.
Big Data
We will exemplify in what way news represent a source of big data;
we will introduce a system which grants researchers access to big
data; we will show how you can compete with your own approaches.
7. Why are these problems important?
Users increasingly face information overload as they interact with
item collections. For instance:
I 43 000 000 songs on Apple's iTunes
I 100 h of video are uploaded to YouTube every minute
I 3 000 000 movies on IMDb
I ...
Collections continue to grow, causing ever more severe information
overload. The same holds for news articles.
12. …filter. More formally, a general-purpose
recommender system is a triple (U, I, φ).
U → set of users {u1, u2, ..., uM}
I → set of items {i1, i2, ..., iN}
φ → a filter function
13. The performance of different recommendation algorithms typically
depends on φ.
14. Filter Functions
Filter functions take a user u, the entire item collection I, and a
model M. They return the subset of items to be recommended, Î:
φ(u, I, M) = Î
Recommender systems' success or failure strongly depends on the
model M, in particular on how accurately the model reflects actual
user preferences. M may take various kinds of input, as we will
discuss for a selection of recommendation algorithms.
17. Most-Popular Recommendation
M orders the item collection according to the number of
interactions, K ≥ L ≥ M ≥ N.
(Figure: items ordered by their interaction counts K, L, M, N;
the most popular item comes first.)
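Stated as code, a most-popular recommender reduces to counting. A minimal sketch in Python, assuming the interaction log is simply a list of (user, item) pairs (the data layout is an assumption for illustration):

```python
from collections import Counter

def most_popular(interactions, n=3):
    """Rank items by interaction count and return the top n."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(n)]

# Toy interaction log: (user, item) pairs.
log = [("anna", "district9"), ("bob", "district9"),
       ("clara", "district9"), ("anna", "elektra"),
       ("bob", "elektra"), ("dan", "aviator")]

print(most_popular(log, n=2))  # ['district9', 'elektra']
```

Updating M is a single counter increment per interaction, which is why this baseline is so cheap to maintain.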
18. Summary: Unpersonalised Recommenders
Advantages
I low computational complexity
I easy to update M
I domain independent
Disadvantages
I disregard personal taste
I disregard context
I high chance of recommending already known or unpopular items
19. Collaborative Filtering
Basic Assumptions
I systems have access to users' preferences
I users with similar tastes in the past will continue to like
similar items
I systems have means to compare users' tastes
Distinctions
I model-based vs memory-based
I item-based vs user-based
22. Example
User profile for Anna: [Bad Boys, District 9, Elektra]
23. Example
Anna → [Bad Boys, District 9, Elektra]
Bob → [Aviator, Bad Boys, District 9, Elektra]
Clara → [Cars, District 9, Elektra]
Dan → [Aviator]
(Figure: users connected to the movies they interacted with.)
28. Preference Elicitation
Explicit Preferences
I Likes
I Thumbs Up/Down
I Ratings
I Comments
I Purchase
Implicit Preferences
I Click
I Dwell Time
I Returns
How can we measure whether, and how much, users like items?
29. Collaborative Filtering Algorithms with Ratings
Memory-based
Algorithm uses the complete set of data in the recommendation
process. M contains the full rating matrix.
I user-based k-nearest neighbour
I item-based k-nearest neighbour
Model-based
Algorithm learns a model M and uses it to recommend items.
I matrix factorisation with ALS
I matrix factorisation with SGD
30.–32. User-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(u, v)
(Figure: the binary rating matrix built from the example profiles:)
         Aviator  Bad Boys  Cars  District 9  Elektra
Anna        0        1       0        1          1
Bob         1        1       0        1          1
Clara       0        0       1        1          1
Dan         1        0       0        0          0
33. Similarity Measures
Number of items in common:
σ(u, v) = Σ_{i∈I} I(i),  where I(i) = 1 if both u and v liked i, 0 otherwise
Cosine similarity:
σ(u, v) = (u · v) / (||u|| ||v||)
Pearson's correlation coefficient:
σ(u, v) = cov(u, v) / (std(u) std(v))
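The cosine and Pearson measures are easy to compute directly. A sketch with NumPy, using binary profile vectors in the spirit of the example (assumed encoding: 1 = liked, 0 = no interaction):

```python
import numpy as np

def cosine_sim(u, v):
    """Cosine similarity: (u . v) / (||u|| ||v||)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pearson_sim(u, v):
    """Pearson's correlation: cov(u, v) / (std(u) std(v)),
    computed by centring the vectors first."""
    uc, vc = u - u.mean(), v - v.mean()
    return float(uc @ vc / (np.linalg.norm(uc) * np.linalg.norm(vc)))

# Binary preferences over Aviator, Bad Boys, Cars, District 9, Elektra.
anna = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
bob = np.array([1.0, 1.0, 0.0, 1.0, 1.0])
print(cosine_sim(anna, bob))   # ~0.866
print(pearson_sim(anna, bob))  # ~0.612
```

Pearson corrects for users' individual rating levels, which matters more for numerical ratings than for the binary case shown here.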
34.–35. User-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(u, v)
(Figure: the M × M user-user similarity matrix; diagonal entries
equal 1, and sim(Anna, Bob) = sim(Bob, Anna). Anna's similarity
vector reads [1, sBob, sClara, sDan].)
36. User-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(u, v)
(Figure: the rating matrix with a missing entry, e.g. Anna's
preference for Aviator, marked "?".)
38. User-based k-nearest Neighbour
user profile:
u = (r(i1), r(i2), ..., r(iN))
similarity vector:
σ(u, ·) = (σ(u, v1), σ(u, v2), ..., σ(u, u), ..., σ(u, vM))
preference prediction:
r̂(j) = σ(u, ·) · R_{·,j}  (the similarity-weighted ratings all users gave item j)
Result
We obtain a prediction for each item's preference and can rank
them accordingly. The algorithm returns as many items as
requested starting from the top rank.
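Putting the pieces together, the user-based prediction can be sketched in a few lines. For brevity this sketch weights by cosine similarity over all users rather than only the k nearest (an assumption, not the slide's exact algorithm); the toy matrix encodes the example profiles:

```python
import numpy as np

# Binary rating matrix (rows: Anna, Bob, Clara, Dan;
# columns: Aviator, Bad Boys, Cars, District 9, Elektra).
R = np.array([[0.0, 1.0, 0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 0.0, 0.0]])

def user_based_scores(R, u):
    """Score every item for user u as the cosine-similarity-weighted
    sum of the other users' ratings."""
    norms = np.linalg.norm(R, axis=1)
    sims = R @ R[u] / (norms * norms[u])  # similarity vector sigma(u, .)
    sims[u] = 0.0                          # exclude the user themself
    return sims @ R                        # one prediction per item

scores = user_based_scores(R, 0)           # predictions for Anna
unseen = np.where(R[0] == 0)[0]            # Aviator and Cars are unrated
recommendation = unseen[np.argmax(scores[unseen])]
print(recommendation)  # 0 -> Aviator scores higher than Cars
```

Anna's nearest neighbour Bob liked Aviator, so Aviator outranks Cars, which only Clara liked.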
39.–40. Item-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(i, j)
(Figure: the same rating matrix, now read column by column to
compare items.)
41. Item-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(i, j)
(Figure: item column vectors, e.g. Bad Boys = (1, 1, 0, 0) over
Anna, Bob, Clara, Dan.)
42. Similarity Measures
Number of users in common:
σ(i, j) = Σ_{u∈U} I(u),  where I(u) = 1 if both i and j are liked by u, 0 otherwise
Cosine similarity:
σ(i, j) = (i · j) / (||i|| ||j||)
Pearson's correlation coefficient:
σ(i, j) = cov(i, j) / (std(i) std(j))
43. Item-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(i, j)
(Figure: the N × N item-item similarity matrix over Aviator, Bad
Boys, Cars, District 9, Elektra; diagonal entries equal 1, and
sim(Aviator, Bad Boys) = sim(Bad Boys, Aviator).)
44. Item-based k-nearest Neighbour
Input: M × N rating matrix R, similarity measure σ(i, j)
(Figure: the rating matrix with a missing entry marked "?".)
46. Item-based k-nearest Neighbour
item profile:
i = (r(u1), r(u2), ..., r(uM))
similarity vector:
σ(i, ·) = (σ(i, j1), σ(i, j2), ..., σ(i, i), ..., σ(i, jN))
preference prediction:
r̂(u) = σ(i, ·) · R_{u,·}  (the similarity-weighted ratings user u gave the items)
Result
We obtain a prediction for each item's preference and can rank
them accordingly. The algorithm returns as many items as
requested starting from the top rank.
47. Matrix Factorisation
Input: M N rating matrix R
R =
[ ·  1  ·  1  1 ]
[ 1  1  ·  1  1 ]
[ ·  ·  1  1  1 ]
[ 1  ·  ·  ·  · ]
(· marks a missing preference)
Goal
Fill the gaps of missing preferences.
48. Matrix Factorisation
Idea
Project preferences into low dimensional space to detect latent
structures.
R_{M×N} ≈ P_{M×K} Q_{K×N},  with K ≪ M, N
Problem
How to determine P and Q?
49. Matrix Factorisation
Learning P and Q
Input: Error metric
E(P, Q, R) = Σ_{(u,i)∈R} (r(u, i) - P_u Q_i)²   (quadratic error)
E(P, Q, R) = Σ_{(u,i)∈R} |r(u, i) - P_u Q_i|    (absolute error)
50. Matrix Factorisation
Stochastic Gradient Descent
Optimise error metric by selecting data points at random.
I initialise P;Q with small random values
I pick a preference (u; i) at random
I determine the gradient at that point
I adjust P;Q accordingly
I continue
Alternating Least Squares
Optimise either P or Q, keeping the other fixed.
I initialise P, Q with small random values
I optimise error metric with respect to P
I optimise error metric with respect to Q
I continue
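The SGD loop above can be sketched in a few lines. A minimal illustration using the quadratic error, with missing entries stored as NaN (an assumed encoding; no regularisation, so this is a sketch rather than a production implementation):

```python
import numpy as np

def mf_sgd(R, K=2, steps=8000, lr=0.05, seed=0):
    """Factorise R into P (M x K) and Q (N x K) by stochastic gradient
    descent on the quadratic error over the observed (non-NaN) entries."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = rng.normal(scale=0.1, size=(M, K))  # small random initialisation
    Q = rng.normal(scale=0.1, size=(N, K))
    observed = np.argwhere(~np.isnan(R))
    for _ in range(steps):
        u, i = observed[rng.integers(len(observed))]  # random preference
        err = R[u, i] - P[u] @ Q[i]                   # residual at (u, i)
        pu = P[u].copy()
        P[u] += lr * err * Q[i]                       # gradient steps on P
        Q[i] += lr * err * pu                         # ... and on Q
    return P, Q

# Toy matrix with gaps (NaN) to fill, matching the earlier example.
R = np.array([[np.nan, 1.0, np.nan, 1.0, 1.0],
              [1.0, 1.0, np.nan, 1.0, 1.0],
              [np.nan, np.nan, 1.0, 1.0, 1.0],
              [1.0, np.nan, np.nan, np.nan, np.nan]])
P, Q = mf_sgd(R)
print(np.round(P @ Q.T, 2))  # predictions, including the former gaps
```

The reconstructed matrix P Qᵀ is dense, so the gaps of missing preferences get predicted values for free.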
52. Summary: Collaborative Filtering
Advantages
I takes personal taste into account
I successful in the Netflix Prize competition
I domain-independent
Disadvantages
I cold-start problem
I sparsity
I grey sheep
53. Cold-Start Problem
I users without known preferences
I items without preferences
I similarity measures fail
I inconclusive latent factors
54. Grey Sheep
I users rate all their items around the average
I user profile: [3, 3, 3, 3, ..., 3]
I collaborative systems cannot distinguish good from bad items
56. Content-based Filtering
Idea
Suggest items which are similar to items users have liked.
Similarity
I based on content → features
I depending on the domain
62. Content-based Filtering
Input: user profile, item collection, item features, and similarity
measure
Features
▪ Name/ID
▪ Meta data
▪ Content
▪ audio stream → songs
▪ video stream → movies
▪ text → books, news articles
65. Content-based Filtering
Similarity: Example
I keyword overlap → text
I average colour match → images/video
I maximum amplitude → audio/sound
I common actors → movies
I common interests → friends/partnership
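For text, keyword overlap can be made concrete as a Jaccard coefficient (one common normalisation; the slide does not fix a particular formula):

```python
def keyword_overlap(a, b):
    """Jaccard overlap of two keyword sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical keyword sets for two news articles.
article_a = {"election", "berlin", "politics"}
article_b = {"berlin", "politics", "economy"}
print(keyword_overlap(article_a, article_b))  # 0.5
```

Normalising by the union keeps long, keyword-rich items from dominating the similarity scores.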
66. Summary: Content-based Filtering
Advantages
I considers personal taste
I high explainability
Disadvantages
I cost-intensive for high-volume content, e.g., video
I low serendipity
I user cold-start
67. Evaluation
Important aspects
I how well does the system predict preferences?
I how often do users receive useful suggestions?
I how long does it take for the system to provide suggestions?
I how many requests cannot be answered?
I how often do users return to the site?
I how often do users purchase/rent/consume items which the
system had recommended?
I how well did users perceive the system?
68. Evaluation: Rating Prediction
Goal
The evaluation ought to show how well the system estimates
preferences.
Assumptions
I system can access recorded explicit numerical preferences
I tastes remain stable over time
I the more accurate the system estimates preferences, the more
suited the suggestions
Metrics
I root mean squared error: sqrt( (1/|R|) Σ_{(u,i)∈R} (r(u, i) - r̂(u, i))² )
I mean absolute error: (1/|R|) Σ_{(u,i)∈R} |r(u, i) - r̂(u, i)|
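Both metrics are one-liners over (rating, prediction) pairs; a sketch with invented toy numbers:

```python
import math

def rmse(pairs):
    """Root mean squared error over (rating, prediction) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (rating, prediction) pairs."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

# Recorded ratings r(u, i) next to the system's estimates.
pairs = [(5, 4), (3, 3), (1, 2), (4, 2)]
print(rmse(pairs))  # sqrt(6/4) ~ 1.22
print(mae(pairs))   # 1.0
```

RMSE penalises the one error of size 2 more heavily than MAE does, which is the practical difference between the two.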
69. Evaluation: Ranking
Goal
The evaluation ought to show how well the system ranks items
according to users' preferences.
Assumptions
I system can access preference relations between items
I tastes remain stable over time
I the better the system ranks items, the more suited the
suggestions
Metrics
I normalised discounted cumulative gain: nDCG = DCG / IDCG
I mean reciprocal rank: MRR = (1/|U|) Σ_{u∈U} 1/rank_u
70. Evaluation: Top-N
Goal
The evaluation ought to show how well the system selects the top
suggestions.
Assumptions
I system can access preference relations between items
I tastes remain stable over time
I the better the system selects the top suggestions, the more
suited they are
Metrics
I precision@N = TP / (TP + FP)
I recall@N = TP / (TP + FN)
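With a ranked suggestion list, TP is simply the number of relevant items among the top N; a sketch with invented toy data:

```python
def precision_recall_at_n(ranked, relevant, n):
    """precision@N = TP / (TP + FP); recall@N = TP / (TP + FN)."""
    tp = len(set(ranked[:n]) & relevant)  # relevant items in the top N
    return tp / n, tp / len(relevant)

ranked = ["a", "b", "c", "d", "e"]   # system's ranked suggestions
relevant = {"b", "e", "f"}           # items the user actually liked
p, r = precision_recall_at_n(ranked, relevant, n=3)
print(p, r)  # 1/3 each: only "b" appears in the top 3
```

Note the trade-off: raising N can only increase recall, while precision usually drops.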
71. Evaluation: Problems
I explicit preferences may not be available
I tastes change over time
I recorded data does not fully reflect the current situation
Solution
Access real systems with current user interactions to see whether a
method performs better than the existing one → second part of the
tutorial
72. Summary: Recommender Systems
I support users by suggesting interesting items
I counteract information overload
I unpersonalised recommender
I collaborative filtering
I user-based k-nearest neighbour
I item-based k-nearest neighbour
I matrix factorisation
I content-based
76. News Recommendation: Special Characteristics
Collection Dynamics
I thousands of new articles published daily
I older articles' relevancy decays
Contextual Differences
I users perceive recommendations differently
I devices render recommendations differently
I dependence on time of day and day of week
Popularity Bias
I few items receive a lot of attention
I most items receive hardly any attention
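Decaying relevancy and popularity bias are often combined via a time-decayed popularity score. A sketch under assumed parameters (the six-hour half-life is illustrative, not plista's actual setting):

```python
import math

def decayed_popularity(interactions, now, half_life_hours=6.0):
    """Rank articles by click counts that decay exponentially with
    click age, so fresh articles can overtake older ones."""
    decay = math.log(2) / half_life_hours
    scores = {}
    for article, t in interactions:  # t: click time in hours
        scores[article] = scores.get(article, 0.0) + math.exp(-decay * (now - t))
    return sorted(scores, key=scores.get, reverse=True)

# An old article with many clicks vs a fresh one with fewer clicks.
log = [("old_story", 0.0)] * 10 + [("fresh_story", 23.0)] * 3
print(decayed_popularity(log, now=24.0))  # fresh_story ranks first
```

With a plain most-popular baseline the old article would win 10 to 3; the decay term reverses that, which matches how quickly news loses relevancy.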
81. Big Data
Goal
Intelligent real-time processing of huge amounts of data.
Recommender Systems → personalisation
I volume → amount of data to be stored increases
I variety → heterogeneous data
I velocity → data streams in (near) real-time
I veracity → noisy data
83. Do news fulfil the requirements of big data?
Volume
hundreds of GB every day ✓
Variety
news entail textual data and images, inducing some variety
Velocity
news arise continuously → second part of the tutorial ✓
Veracity
news have some consistent attributes (headline, text), but also
comprise features which may be missing or wrong (date, location,
image)