The Beauty of Computing with People
1. If you like this...
The Beauty of Computing
… with People
Xavier Amatriain
Telefonica Research
November 2010
10 Years of Computer Science
@ Free University of Bolzano
2. Outline
1. Introduction (to the talk, me, and Telefonica)
2. Computing with people: Information overload
and Recommender Systems
3. Some of our latest research
4. Conclusions
15. Telefonica is a fast-growing Telecom
              1989                  2000                          2009
Clients       About 12 million      About 68 million              About 265 million
              subscribers           customers                     customers
Services      Basic telephone and   Wireline and mobile voice,    Integrated ICT solutions
              data services         data and Internet services    for all customers
Geographies   Spain                 16 countries                  25 countries
(operations in)
Staff         About 71,000          About 149,000                 About 257,000
              professionals         professionals                 professionals
Finances      Rev: 4,273 M€         Rev: 28,485 M€                Rev: 56.7 b€
(1) EPS: Earnings per share
16. Currently among the largest in the world
Source: Bloomberg, 06/12/09
Telco sector worldwide ranking by market cap (US$ bn)
17. Telefónica is the sixth worldwide operator in R&D effort and the first company in Spain

Telco operator        R&D investment 2008 (M€)
NTT                   2.151,28
BT                    1.157,49
France Telecom        900,00
Telstra               756,41
Telecom Italia        704,00
Telefonica            668,00
Deutsche Telekom      614,00
AT&T                  598,57
Vodafone              289,63
KT                    218,92
KDDI                  155,30
SK Telecom            138,84
Telenor               103,16
TeliaSonera           102,53

Company (Spain)                  R&D investment 2008 (M€)
Telefonica                       668,00
Indra Sistemas                   166,34
Almirall                         98,20
Repsol YPF                       83,00
Iberdrola                        73,10
Acciona                          71,30
Zeltia                           58,09
Fagor Electrodomesticos          56,00
Industria de Turbo Propulsores   50,00
Abengoa                          33,54
Gamesa                           32,06
Ebro Puleva                      11,58
Cie Automotive                   11,51
Amper                            11,11
18. Scientific Research
● Multimedia Core
● Mobile and Ubicomp
● User Modelling & Data Mining
● HCIR
● Wireless Systems
● Content Distribution & P2P
● Social Networks
27. The Age of Search has come to an end
● ... long live the Age of Recommendation!
● Chris Anderson in “The Long Tail”:
  – “We are leaving the age of information and entering the age of
    recommendation”
● CNN Money, “The race to create a 'smart' Google”:
  – “The Web, they say, is leaving the era of search and entering
    one of discovery. What's the difference? Search is what you do
    when you're looking for something. Discovery is when
    something wonderful that you didn't know existed, or didn't
    know how to ask for, finds you.”
28. But, what are Recommender Systems?
Read This!
Ask Prof. Francesco Ricci
29. The value of recommendations
Netflix: 2/3 of the movies rented are
recommended
Google News: recommendations generate
38% more clickthrough
Amazon: 35% sales from recommendations
Choicestream: 28% of the people would buy
more music if they found what they liked.
30. The “Recommender problem”
● Estimate a utility function that automatically predicts how much
a user will like an item that is unknown to her. Based on:
● Past behavior
● Relations to other users
● Item similarity
● Context
● ...
31. Data mining +
all those other things
● User Interface
● System requirements (efficiency, scalability,
privacy....)
● Business Logic
● Serendipity
● ....
32. The Netflix Prize
● 500K users x 17K movie
titles = 100M ratings = $1M
(if you “only” improve
existing system by 10%!
From 0.95 to 0.85 RMSE)
● 49K contestants on 40K teams from
184 countries.
● 41K valid submissions from 5K
teams; 64 submissions per day
● Winning approach uses hundreds of
predictors from several teams
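To make the prize metric concrete, here is a minimal sketch of RMSE and of the relative improvement between two RMSE values. The toy ratings and the function names are invented for illustration; only the 0.95 and 0.85 figures come from the slide.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which new_rmse improves on baseline_rmse."""
    return (baseline_rmse - new_rmse) / baseline_rmse

# Toy data, invented for illustration only.
actual = [4, 3, 5, 1]
predicted = [3.5, 3.0, 4.0, 2.0]
toy_error = rmse(predicted, actual)

# Rounded figures quoted on the slide.
gain = relative_improvement(0.95, 0.85)
```

With the rounded figures on the slide, the gain works out to roughly 10%, which is the prize threshold mentioned above.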
33. Approaches to Recommendation
● Collaborative Filtering
  – Recommend items based only on the user's past behavior
  – User-based
    ● Find users similar to me and recommend what they liked
  – Item-based
    ● Find items similar to those that I have previously liked
● Content-based
  – Recommend based on features inherent to the items
● Social recommendations (trust-based)
34. What works
● It depends on the domain and particular problem
● As a general rule, it is usually a good idea to combine approaches:
Hybrid Recommender Systems
● However, in the general case it has been
demonstrated that (currently) the best isolated
approach is CF.
  – Item-based is in general more efficient and better, but
    mixing CF approaches can improve results
  – Other approaches can be hybridized to improve
    results in specific cases (cold-start problem...)
35. The CF Ingredients
● List of m Users and a list of n Items
● Each user has a list of items with associated opinion
● Explicit opinion - a rating score (numerical scale)
● Implicit feedback – purchase records or listening
history
● Active user for whom the prediction task is performed
● A metric for measuring similarity between users
● A method for selecting a subset of neighbors
● A method for predicting a rating for items not rated by
the active user.
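The ingredients above can be sketched as a tiny user-based CF predictor. This is an illustrative sketch, not the system discussed in the talk: the toy ratings, the choice of cosine similarity, top-k neighbor selection, and the similarity-weighted average are all assumptions.

```python
import math

# Toy explicit ratings (user -> {item: score}); invented for illustration.
ratings = {
    "ana":   {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carla": {"A": 1, "B": 5, "D": 2},
}

def cosine_similarity(u, v):
    """Similarity metric: cosine over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(active, item, k=2):
    """Predict the active user's rating for an item they have not rated."""
    # Candidate neighbors: other users who rated the item.
    candidates = [
        (cosine_similarity(ratings[active], ratings[other]), ratings[other][item])
        for other in ratings
        if other != active and item in ratings[other]
    ]
    # Neighbor selection: keep the k most similar users.
    neighbors = sorted(candidates, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in neighbors)
    if total_sim == 0:
        return None  # no usable neighbors
    # Prediction: similarity-weighted average of the neighbors' ratings.
    return sum(sim * r for sim, r in neighbors) / total_sim

prediction = predict("ana", "D")
```

Here "ana" agrees more with "bob" than with "carla", so the predicted rating for item "D" lands closer to bob's 4 than to carla's 2.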
37. User Feedback is Noisy
DID YOU HEAR WHAT
I LIKE??!!
...and limits Our Prediction
Accuracy
38. Experimental Setup
● 100 Movies selected from Netflix dataset doing
a stratified random sampling on popularity
● Ratings on a 1 to 5 star scale
● Special “not seen” symbol.
● Trials 1 and 3 = random order; trial 2 = ordered
by popularity
● 118 participants
39. Results
● Users are inconsistent
● Inconsistencies are not random and depend on
many factors
● More inconsistencies for mild opinions
● More inconsistencies for negative opinions
● How the items are presented affects
inconsistencies
● Inconsistencies produce natural noise
● Natural noise limits our prediction accuracy
independently of the algorithm: Magic Barrier
40. Rate it again
● By asking users to rate items again we can
remove noise in the dataset
● Improvements of up to 14% in accuracy!
● Because we don't want all users to re-rate all
items we design ways to do partial denoising
● Data-dependent: only denoise extreme ratings
● User-dependent: detect “noisy” users
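As a rough sketch of what data-dependent partial denoising could look like: only extreme ratings are checked against a re-rating, and an extreme rating is dropped when the two disagree. The tolerance of one star, the drop-on-disagreement rule, and all names below are assumptions made for illustration; the slides do not spell out the actual procedure.

```python
def denoise_extremes(first, second, low=1, high=5, tolerance=1):
    """Return a cleaned copy of `first` (item -> rating).

    Only extreme ratings (equal to `low` or `high`) are checked against the
    re-rating in `second`; an extreme rating is dropped when the re-rating
    differs by more than `tolerance` stars. Everything else is kept as-is.
    """
    cleaned = {}
    for item, score in first.items():
        is_extreme = score in (low, high)
        rerated = second.get(item)
        if is_extreme and rerated is not None and abs(score - rerated) > tolerance:
            continue  # inconsistent extreme rating: treat it as noise
        cleaned[item] = score
    return cleaned

# Toy data, invented for illustration only.
first_trial = {"m1": 5, "m2": 1, "m3": 3, "m4": 5}
second_trial = {"m1": 5, "m2": 3, "m3": 1}
cleaned = denoise_extremes(first_trial, second_trial)
```

In this toy run only "m2" is removed: it is an extreme rating whose re-rating moved by two stars, while "m3" is not extreme and "m4" was never re-rated.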
41. Who can we trust?
42. The Wisdom of the Few
X. Amatriain et al. "The wisdom of the few: a collaborative filtering
approach based on expert opinions from the web", SIGIR '09
43. Expert-based CF
● expert = individual that we can trust to have produced
thoughtful, consistent and reliable evaluations (ratings) of
items in a given domain
● Expert-based Collaborative Filtering
● Find neighbors from a reduced set of experts instead of
regular users.
1. Identify domain experts with reliable ratings
2. For each user, compute “expert neighbors”
3. Compute recommendations similar to standard kNN CF
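Steps 2 and 3 can be sketched as follows, assuming the experts from step 1 are already given. The similarity measure (one minus the normalized mean absolute rating gap), the `min_similarity` threshold, and the toy data are assumptions for illustration, not the formulation of the SIGIR '09 paper.

```python
def similarity(user, expert, scale=4.0):
    """1 minus the mean absolute rating gap on co-rated items, in [0, 1].

    `scale` is the largest possible gap (4 on a 1-to-5 star scale).
    """
    common = set(user) & set(expert)
    if not common:
        return 0.0
    gap = sum(abs(user[i] - expert[i]) for i in common) / len(common)
    return 1.0 - gap / scale

def expert_prediction(user, experts, item, min_similarity=0.5):
    """Similarity-weighted average over expert neighbors who rated `item`."""
    rated = [
        (similarity(user, expert_ratings), expert_ratings[item])
        for expert_ratings in experts.values()
        if item in expert_ratings
    ]
    # Expert neighbors: experts similar enough to this user.
    neighbors = [(s, r) for s, r in rated if s >= min_similarity]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # no expert neighbor rated the item
    return sum(s * r for s, r in neighbors) / total

# Toy data, invented for illustration only.
experts = {
    "critic_1": {"A": 5, "B": 2, "C": 4},
    "critic_2": {"A": 1, "B": 5, "C": 1},
    "critic_3": {"A": 4, "B": 3},
}
user = {"A": 5, "B": 1}
predicted_c = expert_prediction(user, experts, "C")
```

Only "critic_1" qualifies as an expert neighbor for item "C" here, so the prediction simply follows that expert's rating.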
45. User Study
● 57 participants, only 14.5 ratings/participant
● 50% of the users consider Expert-based CF to be
good or very good
● Expert-based CF: only algorithm with an average
rating over 3 (on a 0-4 scale)
53. Context-aware Recommendations
● A clear area of research and interest for
companies: recommend me something that I
like and is relevant in my current context.
● Context = any variable that adds a new dimension
to the 2D user-item problem (e.g. time, geolocation,
weather...)
54. User micro-profiles
● Our proposal is to represent a user by a
hierarchy of micro-profiles where each micro-
profile represents a class in the context variable
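A minimal sketch of the micro-profile idea: a user's ratings are split into one profile per class of the context variable. The time-of-day classes and every name below are hypothetical, and the sketch is flat rather than hierarchical.

```python
from collections import defaultdict

def build_micro_profiles(events, context_of):
    """Group (item, rating, raw_context) events into one profile per context class."""
    profiles = defaultdict(dict)
    for item, rating, raw_context in events:
        profiles[context_of(raw_context)][item] = rating
    return dict(profiles)

def time_of_day(hour):
    """Hypothetical context classes; the slide does not specify any."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 20:
        return "afternoon"
    return "night"

# Toy listening history: (item, rating, hour of day); invented for illustration.
events = [
    ("news_podcast", 5, 8),
    ("jazz", 4, 22),
    ("rock", 3, 15),
    ("ambient", 5, 23),
]
profiles = build_micro_profiles(events, time_of_day)
```

A recommender could then pick the micro-profile that matches the current context instead of using the user's full profile.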
55. Multiverse Recommendation
● A different approach: represent the contextual
recommendation problem by n-dimensional
matrices (aka Tensors)
57. Tourism 2.0
● Tourism is not the same
since the web
appeared:
– People search for
information on where to
go online (reading blogs,
in their social networks...)
– People buy tickets and
hotel packages online
– People post pictures and
discuss tips online
58. Tourism 3.0 – Going Mobile
● The mobile web and smartphones are introducing yet
another revolution
– Tourists can now access information on the go:
● Looking for information on a sight
● Tips on where to go next
● Information about the weather
59. Master Planner
● I am in Bolzano, it's
November and sunny, I have
6 hours to visit things and I
am interested in music, art,
literature, and sports
● I need: An automatic tourist
route recommender system
60. Master Planner
● Completely automatic
personalized/contextualized
tourist recommender system
● Generates automatic city
models using web resources
● Generates automatic user
models from regular user
profiles
● Personalizes/contextualizes
generic city models
● Recommends optimized
personalized routes taking
into account constraints
using AI techniques
61. Friending 3.0
Recommending contacts in
Social Networks
62. The importance of finding contacts
● The ability to attract people to a social network
is the key to its success
● The main reason people get hooked on a
particular SN is that they find relevant
“friends”
63. The concept of “friend”
● The idea of “friend” is different for each SN
– People do not connect on Facebook for the
same reasons as on Twitter or LinkedIn
● Even in a particular SN, different people
connect for different reasons:
– Social proximity (friend of friend)
– Geoproximity (person who lives nearby)
– Content (person that talks about
interesting stuff)
– Popularity (to connect to influential people)
– ....
64. Friending 3.0
● Automatic Personalized Friend
Recommending System
● Basic rationale
– Combines different factors
and personalizes the
combination for each user:
● Social proximity
● Geoproximity
● Popularity
● Content similarity
● ...
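The basic rationale can be sketched as a per-user weighted combination of factor scores. The weighted sum, the factor scores in [0, 1], and the weights are assumptions for illustration; the slide does not say how the factors are actually combined or how the weights are learned.

```python
FACTORS = ("social", "geo", "popularity", "content")

def friend_score(factor_scores, weights):
    """Weighted sum of per-factor scores, normalised by the total weight."""
    total_weight = sum(weights[f] for f in FACTORS)
    if total_weight == 0:
        return 0.0
    return sum(weights[f] * factor_scores[f] for f in FACTORS) / total_weight

def recommend(candidates, weights, top_n=2):
    """Rank candidate contacts for one user, given that user's factor weights."""
    ranked = sorted(
        candidates,
        key=lambda name: friend_score(candidates[name], weights),
        reverse=True,
    )
    return ranked[:top_n]

# Toy candidates with factor scores in [0, 1]; invented for illustration.
candidates = {
    "dana": {"social": 0.9, "geo": 0.8, "popularity": 0.1, "content": 0.2},
    "eli":  {"social": 0.1, "geo": 0.0, "popularity": 0.9, "content": 0.8},
    "fran": {"social": 0.5, "geo": 0.5, "popularity": 0.5, "content": 0.5},
}

# Two hypothetical users who connect for different reasons.
social_user = {"social": 3, "geo": 2, "popularity": 0, "content": 1}
content_user = {"social": 0, "geo": 0, "popularity": 1, "content": 3}

for_social_user = recommend(candidates, social_user)
for_content_user = recommend(candidates, content_user)
```

The same candidates are ranked differently for the two users, which is the point of personalizing the combination.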
67. Can we improve the search and
discovery experience by providing
a readily available connection to
their social network?
68. WHAT IS PORQPINE?
Distributed social web search engine
● Locally caches the page & records user
interactions (e.g., bookmarking).
● Searches by querying caches of friends
● Pages that friends have “interacted with” are
ranked higher
Socially aware · Personalized · Context aware · Distributed · Lazy collaboration
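To illustrate how pages that friends have interacted with could end up ranked higher, here is a minimal re-ranking sketch. The additive boost per friend, the base scores, and the cache layout are assumptions; this is not a description of Porqpine's actual ranking function.

```python
def rerank(results, friend_caches, boost=0.5):
    """Re-rank (url, base_score) pairs, boosting URLs found in friends' caches."""
    def score(entry):
        url, base_score = entry
        # Count how many friends have this page in their local cache.
        friends_with_page = sum(1 for cache in friend_caches.values() if url in cache)
        return base_score + boost * friends_with_page
    return [url for url, _ in sorted(results, key=score, reverse=True)]

# Toy search results and friend caches; invented for illustration.
results = [("a.example", 0.9), ("b.example", 0.7), ("c.example", 0.6)]
friend_caches = {
    "friend_1": {"c.example", "b.example"},
    "friend_2": {"c.example"},
}
ranking = rerank(results, friend_caches)
```

In this toy run the page cached by both friends moves from last to first place.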
70. SSB
iPhone optimized web-
application + Facebook app
When launched, it centers on the
user's current physical location
Displays all queries/questions
posted by other users in that
location
As users pan/zoom the set of
queries is updated
Users can post new queries or
interact with queries of others
71. Live Field Study in-the-wild
● Apr 2009, 16 users, 1 week, Ireland
● Sept 2009, 34 users, 1 month, Ireland
73. Conclusions
● Computer Science is not only a good choice from a
career perspective, it's also fun, creative, and
engaging (Hope I have convinced you by now)
● One of the amazing things is that you can now apply
CS research to any domain (I am meeting the world's
best chef next week to brainstorm)
● An important current trend is to use CS to better
understand people and improve their lives
● The goal of Recommender Systems is precisely that:
understand you in your context and help you make
better decisions