SOCIAL RECOMMENDATION
Patrice	Bellot

Aix-Marseille	Université	-	CNRS	(LSIS	UMR	7296)	—	OpenEdition	
patrice.bellot@univ-amu.fr
LSIS	-	DIMAG	team	http://www.lsis.org/dimag	
OpenEdition	Lab	:	http://lab.hypotheses.org
OpenEdition	home	page
>	4	million	unique	visitors	/	month
Our partners: libraries and institutions
all	over	the	world
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Some open questions…
— Is it useful to exploit metadata, contents, comments?
— How can contents be linked to one another?
— How can contents of different kinds be exploited?
— How can we "understand" readers' needs? Long queries? Profiles?
— What are the usages? What are the needs?
— How can we go beyond informational relevance? (genre, level of expertise, recent document or not…)
3
— OpenEdition Lab: a digital-humanities research programme
— Detect trends, emerging topics, the books "to read"…

P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Outline
— A few examples: posing the problems and the stakes
— Which resources?
— Some methodological generalities
— Some strategies for evaluating a recommendation
— Around collaborative filtering (= "social" recommendation?)
— Around content analysis and content suggestion
. focus on book search with long natural-language queries
4
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Introduction
Goals of recommendation:
— Recommend "objects" (films, books, Web pages…)
— Predict the ratings that individuals would give
Different types of recommendation:
— Based on knowledge: characteristics of the target individuals (age, income…)
— Based on the preferences of individuals
— explicitly expressed by the individuals themselves
— inferred by analysing their behaviour, a link with classification
— By crossing the behaviours of individuals: collaborative filtering
— By building profiles and comparing them with contents
A large number of information sources:
— Information explicitly given by the individuals
— The contents and their metadata
— The Web and social networks (contents, graphs…)
5
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 6
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
ACM conferences and workshops
— Conferences:
— Recommender Systems RecSys (since 2007)
— "Recommendation Systems" sessions at SIGIR, CIKM…
— Workshops:
— Context-aware Movie Recommendation (2010+2011)
— Information Heterogeneity and Fusion in Recommender Systems (2010+2011)
— Large-Scale Recommender Systems and the Netflix Prize Competition (2008)
— Recommendation Systems for Software Engineering (2008-14)
— Recommender Systems and the Social Web (2012)
7
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
"Recommender systems" papers
ACM RecSys conference (https://recsys.acm.org)
8
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 9
Overview of the approaches
10
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
EXAMPLES
11
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 12
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 13
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 14
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 15
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 16
(2015)
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/20-Spotify_in_NumbersStarted_in_2006
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 17
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 18
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 19
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 20
Amazon navigation

Graph: YASIV
21
http://www.yasiv.com/#/Search?q=orwell&category=Books&lang=US
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 22
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 23
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 24
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 25
Numerous considerations
26
Bobadilla	J,	Ortega	F,	Hernando	A,	Gutiérrez	A.	Recommender	systems	survey.	Knowledge-Based	Systems.	2013;46(C):109-132.	doi:10.1016/j.knosys.2013.03.012.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
RESOURCES
27
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 28
Some data collections
29
Table 1. Main parameters of the databases used in the experiments.
                     MovieLens    FilmAffinity      NetFlix
Number of users          4,382          26,447      480,189
Number of movies         3,952          21,128       17,770
Number of ratings    1,000,209      19,126,278  100,480,507
Min and max values         1-5            1-10          1-5
30
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
The MovieLens
Datasets
31
Harper, F. M., & Konstan, J. A. (2016). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), 19.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 32
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 33
https://labrosa.ee.columbia.edu/millionsong/lastfm
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 34
https://labrosa.ee.columbia.edu/millionsong/lastfm
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 35
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 36
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 37
http://webscope.sandbox.yahoo.com/catalog.php?datatype=r
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 38
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 39
http://files.grouplens.org/datasets/hetrec2011/hetrec2011-delicious-readme.txt
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 40
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
METHODS:
GENERALITIES
41
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
"State of the art" papers
42
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 43
https://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
"Individuals" and "data"
44
Let T be a table crossing n individuals I (rows) and K quantitative variables X (columns); x_{i,k} is the value of variable k for individual i:

             X_1      X_2     ...    X_K     (variables)
I_1        x_{1,1}  x_{1,2}   ...  x_{1,K}
I_2        x_{2,1}  x_{2,2}   ...  x_{2,K}
...                  x_{i,k}
I_n        x_{n,1}  x_{n,2}   ...  x_{n,K}
(individuals)

One of the goals of data analysis is to determine profiles of individuals, in other words classes of individuals that resemble each other. This resemblance is determined from the values of the variables associated with the individuals.
Another goal concerns the variables themselves: computing the correlations between them (to what extent an evolution of the values of one entails an evolution of the values of the other, and in what way), regression between variables (formulating the links between variables)… Principal Component Analysis (PCA) concerns the linear relations between variables, as opposed to quadratic, logarithmic or exponential relations, for example. PCA belongs to the factor-analysis methods, which determine factors from the values of the variables associated with the individuals.
P. Bellot
• Data analysis can be conducted according to

• the individuals: looking for resemblance between individuals (as a function of the values of the variables) = automatic classification (clustering) of the individuals

• the variables: which variables best explain the data (the differences between individuals)? What are the principal components? Where is the greatest variability?
Study of the individuals / study of the variables
45
 temp <- data.frame(temperature[1:12])
 cl <- kmeans(temp, 3, iter.max = 2, nstart = 15)
e) visualise the classes:
 summary(cl)
 cl$cluster
 summary(cl$cluster)
 cl$center
f) Add the result of the classification to the data
- use the cluster package to access the clusplot function:  library(cluster)
- then:
 aggregate(temperature, by = list(cl$cluster), FUN = mean)
 cl2 <- data.frame(temperature, cl$cluster)
 clusplot(temperature, cl$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
5- "Bonus" question: working with the APCluster package
Install the APCluster package
[Figure: PCA of monthly temperatures of European cities. Individuals factor map (Dim 1: 86.87%, Dim 2: 11.42%): Amsterdam, Athens, Berlin, Brussels, … grouped by region (East, North, South, West). Variables factor map: the months January…December, Annual mean, Amplitude, Latitude, Longitude.]
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 46
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 47
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 48
P. Bellot
PCA and dimensionality reduction
• A way to represent clouds of individuals in a few dimensions

— preserving the distances between individuals as well as possible

— favouring the dimensions of greatest variability (iterative selection of the factors that maximise the variance), as in the sketch below

= application of a projection function
49
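To make the projection function concrete, here is a minimal PCA sketch in Python (standardised variables, SVD of the centred matrix); the function name and the toy data are illustrative, not taken from the deck.

import numpy as np

def pca_project(X, n_components=2):
    """Project individuals onto the principal components.
    X: (n_individuals, n_variables) numeric matrix.
    Returns projected coordinates and the share of variance per component."""
    # Centre and scale each variable, as in a standardised PCA
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the covariance matrix via SVD
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    # Projection = linear map of each individual onto the top components
    coords = Xc @ Vt[:n_components].T
    return coords, explained[:n_components]

# Toy usage: 5 individuals, 3 variables
X = np.array([[1., 2., 3.], [2., 1., 4.], [3., 3., 2.],
              [4., 2., 5.], [5., 4., 1.]])
coords, var = pca_project(X)
print(coords.shape, var)  # (5, 2) and the Dim 1 / Dim 2 variance shares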
P. Bellot 50
Learning methods
• Different forms of learning

• A "student" agent copies the "teacher" agent -- provide examples

• Reasoning by induction (from examples)

• Learning important characteristics

• Detection of recurring patterns

• Adjustment of important parameters

• Transformation of information into knowledge
Examples -- Model -- Test -- Correction / Enrichment of the examples
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Statistical and probabilistic approaches
Machine learning
51
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001)
[Figure: the chain-structured case of CRFs for sequences: labels Y_{i-1}, Y_i, Y_{i+1} linked in a chain, each conditioned on the observations X_{i-1}, X_i, X_{i+1}.]
Both training algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997). The model scores a label sequence by

p(y|x) ∝ exp( Σ_j λ_j t_j(y_{i-1}, y_i, x, i) + Σ_k μ_k s_k(y_i, x, i) )    (2)

where t_j(y_{i-1}, y_i, x, i) is a transition feature function of the entire observation sequence and the labels at positions i and i-1 in the label sequence; s_k(y_i, x, i) is a state feature function of the label at position i and the observation sequence; λ_j and μ_k are parameters to be estimated from training data.
When defining feature functions, we construct a set of real-valued features b(x, i) of the observation to express some characteristic of the empirical distribution of the training data that should also hold of the model distribution. An example of such a feature is

b(x, i) = 1 if the observation at position i is the word "September", 0 otherwise.

Each feature function takes on the value of one of these real-valued observation features b(x, i) if the current state (in the case of a state function) or previous and current states (in the case of a transition function) take on particular values. For example, consider the following transition function:

t_j(y_{i-1}, y_i, x, i) = b(x, i) if y_{i-1} = IN and y_i = NNP, 0 otherwise.

Notation is simplified by writing s(y_i, x, i) = s(y_{i-1}, y_i, x, i) and F_j(y, x) = Σ_{i=1}^{n} f_j(y_{i-1}, y_i, x, i), where each f_j(y_{i-1}, y_i, x, i) is either a state function or a transition function. This allows the probability of a label sequence y given an observation sequence x to be written as

p(y|x, λ) = (1/Z(x)) exp( Σ_j λ_j F_j(y, x) )    (3)

where Z(x) is a normalization factor.
for classification because tweets are too short: many tweets do not have any word with a salient Z_score. Figures 1, 2 and 3 show the distribution of Z_score over each class; the majority of terms have a Z_score between -1.5 and 2.5 in each class, and the rest are either very frequent (> 2.5) or very rare (< -1.5). A negative value means that the term is infrequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_score in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.
positive   Z_score  |  negative  Z_score  |  neutral    Z_score
Love        14.31   |  Not        13.99   |  Httpbit     6.44
Good        14.01   |  Fuck       12.97   |  Httpfb      4.56
Happy       12.30   |  Don't      10.97   |  Httpbnd     3.78
Great       11.10   |  Shit        8.99   |  Intern      3.58
Excite      10.35   |  Bad         8.40   |  Nov         3.45
Best         9.24   |  Hate        8.29   |  Httpdlvr    3.40
Thank        9.21   |  Sad         8.28   |  Open        3.30
Hope         8.24   |  Sorry       8.11   |  Live        3.28
Cant         8.10   |  Cancel      7.53   |  Cloud       3.28
Wait         8.05   |  stupid      6.83   |  begin       3.17
Table 1. The first ten terms having the highest Z_score in each class
- Sentiment Lexicon Features (POL)
We used two sentiment lexicons, the MPQA Subjectivity Lexicon (Wilson, Wiebe et al. 2005) and Bing Liu's Opinion Lexicon (Hu and Liu 2004), and we extract the number of positive, negative and neutral words in tweets according to these lexicons.
- Part Of Speech (POS)
We annotate each word with its POS tag, and then we count the adjectives, verbs, nouns and adverbs in each tweet.
The Z_score of each term t_i in a class C_j is computed from its term relative frequency tfr_ij in that class, the mean (mean_i), which is the term probability over the whole corpus multiplied by n_j, the number of terms in the class C_j, and the standard deviation (sd_i) of term t_i according to the underlying corpus:

Z_score(t_ij) = (tfr_ij - mean_i) / sd_i    Eq. (1)
with mean_i = n_j × P(t_i) and sd_i = sqrt( n_j × P(t_i) × (1 - P(t_i)) )    Eq. (2)

A term which has a salient frequency in a class in comparison to the others will have a salient Z_score. Z_score was exploited for sentiment analysis by (Zubaryeva and Savoy 2010): they choose a threshold (2) for selecting the terms whose Z_score exceeds the threshold, then use a logistic regression to combine these scores. We use Z_scores as added features for classification because tweets are too short, so many tweets have no word with a salient Z_score.
4 Evaluation. Data collection: we used the data sets provided by SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Nakov, Kozareva et al. 2013). The participants were provided with training tweets annotated as positive, negative or neutral; we downloaded them with the provided script (some could not be retrieved because of protected profiles) and used the development sets for evaluating our models. (The official SemEval results of this system were affected by a software bug discovered after the submission deadline; corrected results were therefore reported as non-official.)
Which words are characteristic of a group of documents?
Which significant relations can be extracted from the observed forms alone?
Analogies, correlations
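As an illustration of Eqs. (1)-(2), here is a small Python sketch that computes the Z_score of each term in each class; it assumes whitespace-tokenised documents grouped by class, and all names are mine.

import math
from collections import Counter

def z_scores(docs_by_class):
    """Z_score of each term in each class (Eqs. 1-2): a term that is
    markedly more frequent in a class than expected from the whole
    corpus gets a high (salient) Z_score in that class."""
    class_counts = {c: Counter(t for d in docs for t in d)
                    for c, docs in docs_by_class.items()}
    corpus = Counter()
    for counts in class_counts.values():
        corpus.update(counts)
    total = sum(corpus.values())
    scores = {}
    for c, counts in class_counts.items():
        n_j = sum(counts.values())          # number of terms in class C_j
        for term, tf in counts.items():
            p = corpus[term] / total        # term probability over the corpus
            mean = n_j * p                  # expected frequency in C_j
            sd = math.sqrt(n_j * p * (1 - p))
            scores[(term, c)] = (tf - mean) / sd if sd > 0 else 0.0
    return scores

docs = {"positive": [["love", "good"], ["good", "happy"]],
        "negative": [["bad", "hate"], ["sad", "bad"]]}
print(sorted(z_scores(docs).items(), key=lambda kv: -kv[1])[:4])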
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 52
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 53
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommendation and time series
54
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
EVALUATION
55
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation grid
56
A. Constructs and Questions of ResQue
The following contains the questionnaire statements that can be used in a survey. They are developed based on the ResQue model described in this paper. Users should be asked to indicate their answers to each of the questions using 1-5 Likert scales, where 1 indicates "strongly disagree" and 5 is "strongly agree."
A1. Quality of Recommended Items
A.1.1 Accuracy
The items recommended to me matched my interests.*
The recommender gave me good suggestions.
I am not interested in the items recommended to me (reverse scale).
A.1.2 Relative Accuracy
The recommendation I received better fits my interests than what I may receive from a friend.
A recommendation from my friends better suits my interests than the recommendation from this system (reverse scale).
A.1.3 Familiarity
Some of the recommended items are familiar to me.
I am not familiar with the items that were recommended to me (reverse scale).
A.1.4 Attractiveness
The items recommended to me are attractive.
A.1.5 Enjoyability
I enjoyed the items recommended to me.
A.1.6 Novelty
The items recommended to me are novel and interesting.*
The recommender system is educational.
The recommender system helps me discover new products.
I could not find new items through the recommender (reverse scale).
A.1.6 Diversity
The items recommended to me are diverse.*
The items recommended to me are similar to each other (reverse scale).*
A.1.7 Context Compatibility
I was only provided with general recommendations.
The items recommended to me took my personal context requirements into consideration.
The recommendations are timely.
A2. Interaction Adequacy
The recommender provides an adequate way for me to express my preferences.
The recommender provides an adequate way for me to revise my preferences.
The recommender explains why the products are recommended to me.*
A3. Interface Adequacy
The recommender's interface provides sufficient information.
The information provided for the recommended items is sufficient for me.
The labels of the recommender interface are clear and adequate.
The layout of the recommender interface is attractive and adequate.*
On average, between 12 and 15 questions were used in recent studies. Based on this previous work, we have synthesized and organized a total of 15 questions as a simplified model for the purpose of performing a quick and easy usability and adoption evaluation of a recommender (see questions with * sign).
5. CONCLUSION AND FUTURE WORK
User evaluation of recommender systems is a crucial subject of study that requires a deep understanding, development and testing of the right dimensions (or constructs) and the standardization of the questions used. The framework described in this paper presents the first attempt to develop a complete and balanced evaluation framework that measures users' subjective attitudes based on their experience of a recommender.
ResQue consists of a set of 13 constructs and 60 questions for a high-quality recommender system from the user point of view and can be used as a standard guideline for a user evaluation. It can also be adapted to a custom-made user evaluation by tailoring it to an individual research context. Researchers and practitioners can use these questionnaires with ease to measure users' general satisfaction with recommenders, their readiness to adopt the technology, and their intention to purchase recommended items and return to the site in the future.
After ResQue was finalized, we asked several expert researchers in the community of recommender systems to review the model. Their feedback and comments were then incorporated into the final version of the model. This method, known as the Delphi method, is one of the first validation attempts on the model. Since the work was submitted, we have started conducting a survey to further validate the model's reliability, validity and sensitivity using factor analysis, structural equation modeling (SEM), and other techniques described in [21]. Initial results based on 150 participants indicate how the model can be interpreted and show factors that correspond to the original model. At the same time, the analysis also gives some indications on how to refine the model. More users are expected to participate in the survey and the final outcome will soon be reported.
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
57
A4. Perceived Ease of Use
A.4.1 Ease of Initial Learning
I became familiar with the recommender system very quickly.
I easily found the recommended items.
Looking for a recommended item required too much effort (reverse scale).
A.4.2 Ease of Preference Elicitation
I found it easy to tell the system about my preferences.
It is easy to learn to tell the system what I like.
It required too much effort to tell the system what I like (reverse scale).
A.4.3 Ease of Preference Revision
I found it easy to make the system recommend different things to me.
It is easy to train the system to update my preferences.
I found it easy to alter the outcome of the recommended items due to my preference changes.
It is easy for me to inform the system if I dislike/like the recommended item.
It is easy for me to get a new set of recommendations.
A.4.4 Ease of Decision Making
Using the recommender to find what I like is easy.
I was able to take advantage of the recommender very quickly.
I quickly became productive with the recommender.
Finding an item to buy with the help of the recommender is easy.*
Finding an item to buy, even with the help of the recommender, consumes too much time.
The recommender made me more confident about my selection/decision.
The recommended items made me confused about my choice (reverse scale).
The recommender can be trusted.
A5. Perceived Usefulness
The recommended items effectively helped me find the ideal product.*
[6]Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
58
The recommended items influence my selection of products.
I feel supported to find what I like with the help of the recommender.*
I feel supported in selecting the items to buy with the help of the recommender.
A6. Control/Transparency
I feel in control of telling the recommender what I want.
I don't feel in control of telling the system what I want.
I don't feel in control of specifying and changing my preferences (reverse scale).
I understood why the items were recommended to me.
The system helps me understand why the items were recommended to me.
The system seems to control my decision process rather than me (reverse scale).
A7. Attitudes
Overall, I am satisfied with the recommender.*
I am convinced of the products recommended to me.*
I am confident I will like the items recommended to me.*
A8. Behavioral Intentions
A.8.1 Intention to Use the System
If a recommender such as this exists, I will use it to find products to buy.
A.8.2 Continuance and Frequency
I will use this recommender again.*
I will use this type of recommender frequently.
I prefer to use this type of recommender in the future.
A.8.3 Recommendation to Friends
I will tell my friends about this recommender.*
A.8.4 Purchase Intention
I would buy the items recommended, given the opportunity.*
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 59
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 60
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures
— Prediction quality: Mean Absolute Error, Root Mean Squared Error, Coverage
— Recommendation quality: Precision, Recall, F1-measure
61
MAE = (1/#U) Σ_{u∈U} [ (1/#O_u) Σ_{i∈O_u} |p_{u,i} - r_{u,i}| ]    (1)

RMSE = (1/#U) Σ_{u∈U} sqrt( (1/#O_u) Σ_{i∈O_u} (p_{u,i} - r_{u,i})² )    (2)

The coverage can be defined as the capacity of predicting from a metric applied to a specific RS. In short, it calculates the percentage of situations in which at least one k-neighbor of each active user can rate an item that has not been rated yet by that active user. We define K_{u,i} as the set of neighbors of u which have rated the item i, and the coverage of the system as the average of the users' coverage. Let

C_u = { i ∈ I | r_{u,i} = • ∧ K_{u,i} ≠ ∅ },  D_u = { i ∈ I | r_{u,i} = • }

coverage = (1/#U) Σ_{u∈U} 100 × (#C_u / #D_u)    (3)

Goldberg et al. [87] focus on the aspects not related to the evaluation; Breese et al. [43] compare the predictive accuracy of various methods in a set of representative problem domains. The majority of articles discuss attempted improvements to the accuracy of RS results (RMSE, MAE, etc.). It is also common to attempt an improvement in recommendations (precision, recall, etc.). However, additional objectives should be considered for generating greater user satisfaction [253], such as topic diversification and coverage serendipity. Currently, the field has a growing interest in generating algorithms with diverse and innovative recommendations, even at the expense of accuracy and precision. To evaluate these aspects, various metrics have been proposed to measure recommendation novelty and diversity [105,220].
Some frameworks aid in defining and standardizing the methods and algorithms employed by RS as well as the mechanisms to evaluate the quality of the results. Among the most significant papers that propose CF frameworks are Herlocker et al. [92], which evaluates the following: similarity weight, significance weighting, variance weighting, selecting neighborhood and rating normalization; Hernández and Gaudioso [95], which proposes a framework in which the RS is formed by two different subsystems, one of them to model the user and the other to provide useful/interesting items; the framework of [125], which introduces levels of abstraction in the CF process, making modifications in the RS more flexible; and Antunes et al. [12], which presents an evaluation framework assuming that evaluation is an evolving process during the system life cycle. The majority of RS evaluation frameworks proposed until now present two deficiencies, the first being the lack of formalization: although the evaluation metrics are well defined, an implementation of the methods that is not fully specified can lead to differences between similar experiments.
4.2. Quality of the set of recommendations: precision, recall and F1
The confidence of users in a certain RS does not depend directly on the accuracy of the set of possible predictions. A user gains confidence in the RS when this user agrees with a reduced set of recommendations made by the RS. In this section, we define the three most widely used recommendation quality measures: (1) precision, which indicates the proportion of relevant recommended items out of the total number of recommended items, (2) recall, which indicates the proportion of relevant recommended items out of the number of relevant items, and (3) F1, which is a combination of precision and recall.
Let X_u be the set of recommendations to user u, and Z_u the set of n recommendations to user u. We represent the precision, recall and F1 measures for recommendations obtained by making n test recommendations to the user u, taking a relevancy threshold θ. Assuming that all users accept n test recommendations:

precision = (1/#U) Σ_{u∈U} #{ i ∈ Z_u | r_{u,i} ≥ θ } / n    (4)

recall = (1/#U) Σ_{u∈U} #{ i ∈ Z_u | r_{u,i} ≥ θ } / ( #{ i ∈ Z_u | r_{u,i} ≥ θ } + #{ i ∈ Z_u^c | r_{u,i} ≥ θ } )    (5)

F1 = 2 × precision × recall / (precision + recall)    (6)
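A compact Python sketch of Eqs. (1), (2), (4) and (5), assuming per-user dictionaries of predicted and true ratings and top-n recommendation lists; the data structures and names are illustrative.

import math

def mae_rmse(preds, truths):
    """Eqs. (1)-(2): average over users of the per-user error."""
    mae = rmse = 0.0
    for u in preds:
        errs = [abs(preds[u][i] - truths[u][i]) for i in preds[u]]
        mae += sum(errs) / len(errs)
        rmse += math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae / len(preds), rmse / len(preds)

def precision_recall(topn, truths, theta):
    """Eqs. (4)-(5): Z_u = top-n recommendations, theta = relevancy threshold."""
    P = R = 0.0
    for u, recs in topn.items():
        relevant_recs = [i for i in recs if truths[u].get(i, 0) >= theta]
        all_relevant = [i for i in truths[u] if truths[u][i] >= theta]
        P += len(relevant_recs) / len(recs)
        R += len(relevant_recs) / max(len(all_relevant), 1)
    return P / len(topn), R / len(topn)

preds = {"u1": {"i1": 4.2, "i2": 2.9}}
truths = {"u1": {"i1": 5, "i2": 3}}
print(mae_rmse(preds, truths))                                  # (0.45, ~0.57)
print(precision_recall({"u1": ["i1", "i2"]}, truths, theta=4))  # (0.5, 1.0)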
4.3. Quality of the list of recommendations: rank measures
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures (2)
— Quality of a list of recommendations (taking ranks into account): DCG at rank k:

. the gain brought by an item is inversely related to its position in the list
. computed for each user u, then averaged over all users
nDCG is the version normalised by the "ideal DCG" (the ideal list)
— Novelty and diversity
62
The standard information-retrieval measures used are the following: (a) half-life (7) [43], which assumes an exponential decrease in the interest of users as they move away from the recommendations at the top, and (b) discounted cumulative gain (8) [17], wherein the decay is logarithmic:

HL = (1/#U) Σ_{u∈U} Σ_{i=1}^{N} max(r_{u,p_i} - d, 0) / 2^{(i-1)/(α-1)}    (7)

DCG_k = (1/#U) Σ_{u∈U} ( r_{u,p_1} + Σ_{i=2}^{k} r_{u,p_i} / log₂(i) )    (8)

p_1, …, p_n represents the recommendation list, r_{u,p_i} represents the true rating of the user u for the item p_i, k is the rank of the evaluated item, d is the default rating, and α is the number of the item on the list such that there is a 50% chance the user will review that item.
4.4. Novelty and diversity
The novelty evaluation measure indicates the degree of difference between the items recommended to and known by the user. The diversity quality measure indicates the degree of differentiation among recommended items. Currently, novelty and diversity measures do not have a standard; therefore, different authors propose different metrics [163,220]. Certain authors [105] have used the following:

diversity_{Z_u} = 1/(#Z_u (#Z_u - 1)) Σ_{i∈Z_u} Σ_{j∈Z_u, j≠i} [1 - sim(i, j)]    (9)

novelty_i = 1/(#Z_u - 1) Σ_{j∈Z_u} [1 - sim(i, j)],  i ∈ Z_u    (10)

Here, sim(i, j) indicates an item-to-item memory-based CF similarity measure, and Z_u the set of n recommendations to user u.
4.5. Stability
The stability of the predictions and recommendations influences the users' trust towards the RS. A RS is stable if the predictions it provides do not change strongly over a short period of time. Adomavicius and Zhang [4] propose a quality measure of stability, MAS (Mean Absolute Shift).
[Fig. 7 (ibid.): overview of recommender systems evaluation measures.]
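A short Python sketch of DCG_k (8), its normalised variant nDCG, and the diversity measure (9); the similarity function sim is assumed to be supplied by the caller.

import math

def dcg_at_k(ratings, k):
    """Eq. (8) for one user: true ratings of the recommended list, by rank."""
    gain = ratings[0] if ratings else 0.0
    for i, r in enumerate(ratings[1:k], start=2):
        gain += r / math.log2(i)
    return gain

def ndcg_at_k(ratings, k):
    """DCG normalised by the 'ideal DCG' of the same ratings sorted best-first."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0

def diversity(items, sim):
    """Eq. (9): average pairwise dissimilarity of a recommendation list Z_u."""
    pairs = [(i, j) for i in items for j in items if i != j]
    return sum(1 - sim(i, j) for i, j in pairs) / len(pairs)

print(dcg_at_k([3, 2, 3, 0, 1], k=5))   # 3 + 2/1 + 3/log2(3) + 0 + 1/log2(5)
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))  # < 1: the list is not ideally ordered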
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures (3)
— Other user-oriented measures
— Relevance (accuracy) as perceived by the user
— Familiarity: the items (their existence) are known to the users
— Novelty: discovery of new items
— Attractiveness: the items attract the users (not always the case for relevant items…)
— Usefulness: the items were appreciated (after use / reading)
— Compatibility with the user's context
— Level of interaction
— Control over the parameters
— Explanations of the recommendation
— Transparency of the method
63
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
COLLABORATIVE FILTERING
64
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 65
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Collaborative filtering
We are "social" beings
— "Others" dictate / influence our choices
— Our relations are typed (friends / enemies, family, professional relations…)
— "Tell me who your friends are and I will tell you who you are": homophily
66
Notation (Bobadilla et al.):
C: pairs (user, item) that have not been voted for and whose k-neighbors allow a prediction
D: pairs (user, item) that have not been voted for
E_xy: items that have recently been voted for by both user x and user y
S_u: user u's recent votes
[Table 4 (ibid.): running example RS database: users u1…u5 × items i1…i14, ratings in 1-5, • = not rated.]
2.3. Obtaining a user's K-neighbors
Explicit ratings vs. implicit ratings (number of accesses or citations, time spent…)
Ratings to be predicted
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 67
Collaborative filtering: similarities and neighborhoods
68
Variant: item-to-item
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Which similarity functions?
- Pearson correlation
- Spearman rank correlation
- Cosine
- Euclidean distance
- More complex metrics:
- JMSD, to integrate non-numerical information (a combination of Pearson and Jaccard)
- "Pareto optimality", to filter out the least representative individuals
- Integration of the scores of the other individuals / other items
(a code sketch follows the formulas below)
69
The prediction of the rating of user u for item i aggregates the ratings of the neighbors G_{u,i} of u who have rated i:

p_{u,i} = r̄_u + l_{u,i} Σ_{n∈G_{u,i}} sim(u, n) (r_{n,i} - r̄_n),  G_{u,i} ≠ ∅

where l serves as a normalizing factor, usually l_{u,i} = 1 / Σ_{n∈G_{u,i}} sim(u, n).
The most popular similarity metrics are Pearson correlation (6), cosine (7), constrained Pearson's correlation (8) and Spearman rank correlation (9):

sim(x, y) = Σ_i (r_{x,i} - r̄_x)(r_{y,i} - r̄_y) / sqrt( Σ_i (r_{x,i} - r̄_x)² Σ_i (r_{y,i} - r̄_y)² )    (6)

sim(x, y) = Σ_i r_{x,i} r_{y,i} / ( sqrt(Σ_i r²_{x,i}) sqrt(Σ_i r²_{y,i}) )    (7)
Publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures (mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout); Goldberg et al. [13] focus on the aspects not related to the evaluation, and Breese et al. [6] compare the predictive accuracy of various methods in a set of representative problem domains. Candillier et al. [7] and Schafer et al. [36] review the main collaborative filtering methods proposed in the literature.
2. Approach and design of the new similarity metric
Collaborative filtering methods work on a table of U users who can rate I items. The prediction of a non-rated item i for a user u is computed by aggregating the ratings of similar users, as above.
sim(x, y) = Σ_i (r_{x,i} - r_med)(r_{y,i} - r_med) / ( sqrt(Σ_i (r_{x,i} - r_med)²) sqrt(Σ_i (r_{y,i} - r_med)²) ),  r_med: median value in the rating scale    (8)

sim(x, y) = Σ_i (rank_{x,i} - r̄ank_x)(rank_{y,i} - r̄ank_y) / sqrt( Σ_i (rank_{x,i} - r̄ank_x)² Σ_i (rank_{y,i} - r̄ank_y)² )    (9)

Although Pearson correlation is the most commonly used metric in the process of memory-based CF (user to user), this choice is not always backed by the nature and distribution of the data in the RS. Formally, in order to be able to apply this metric with guarantees, the following assumptions must be met:
- Linear relationship between x and y.
- Continuous random variables.
- Both variables must be normally distributed.
These conditions are not normally met in real RS, and Pearson correlation presents some significant cases of erroneous operation that should not be ignored in RS. Despite these deficiencies, this similarity measure presents the best prediction and recommendation results in CF-based RS [15,16,31,7,35]; furthermore, it is the most commonly used, and therefore any alternative metric proposed must improve on its results. On accepting that Pearson correlation is the metric whose results must be improved, but not necessarily the most appropriate to be taken as a base, it is advisable to focus on the information that is obtained in the different research processes and which can sometimes be overlooked.
Given the lists of ratings of two generic users x and y, using standardized values in [0..1] (• = not rated):

r_x = (0.75, 1, •, 0.5, 0.25, •, 0, 0)
r_y = (0.75, 0.5, 0, 0.25, •, 0.5, 0.75, •)

We define the cardinality #l of a list l as the number of its elements different from •.
(1) We obtain the list d_{x,y}, where d^i_{x,y} = (r^i_x - r^i_y)² for all i such that r^i_x ≠ • and r^i_y ≠ •, and d^i_{x,y} = • otherwise (10); in our example: d_{x,y} = (0, 0.25, •, 0.0625, •, •, 0.5625, •).
(2) We obtain the MSD(x, y) measure by computing the arithmetic average of the values in the list d_{x,y}:

MSD(x, y) = ( Σ_{i=1..I, d^i_{x,y} ≠ •} d^i_{x,y} ) / #d_{x,y}    (11)

in our example: (0 + 0.25 + 0.0625 + 0.5625)/4 = 0.218. MSD(x, y) tends towards zero as the ratings of users x and y become more similar and towards 1 as they become more different (the votes are normalized to the interval [0..1]).
(3) We obtain the Jaccard(x, y) measure by computing the proportion between the number of positions in [1..I] in which there are elements different from • in both r_x and r_y and the number of positions in which there are elements different from • in r_x or in r_y:

Jaccard(x, y) = |r_x ∩ r_y| / |r_x ∪ r_y| = #d_{x,y} / (#r_x + #r_y - #d_{x,y})    (12)

in our example: 4/(6 + 6 - 4) = 0.5.
(4) We combine the above elements in the final equation:

newmetric(x, y) = Jaccard(x, y) × (1 - MSD(x, y))    (13)

in the running example: 0.5 × (1 - 0.218) ≈ 0.39.
(The main parameters of the MovieLens, FilmAffinity and NetFlix databases used in the experiments are those of Table 1 above.)
Ortega, F., Sánchez, J. L., Bobadilla, J., & Gutiérrez, A. (2013). Improving collaborative filtering-based recommender systems results using Pareto dominance. Information Sciences, 239, 50-61.
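The sketch announced above: Pearson correlation (6) and the JMSD metric (13) in Python, replayed on the paper's running example; None stands for the "•" (not rated) symbol.

import math

NOT_RATED = None  # the "•" symbol used above

def pearson(rx, ry):
    """Pearson correlation (6) over commonly rated items."""
    common = [i for i in range(len(rx))
              if rx[i] is not NOT_RATED and ry[i] is not NOT_RATED]
    mx = sum(rx[i] for i in common) / len(common)
    my = sum(ry[i] for i in common) / len(common)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in common)
    den = math.sqrt(sum((rx[i] - mx) ** 2 for i in common) *
                    sum((ry[i] - my) ** 2 for i in common))
    return num / den if den else 0.0

def jmsd(rx, ry):
    """New metric (13): Jaccard(x, y) * (1 - MSD(x, y)), ratings in [0..1]."""
    common = [i for i in range(len(rx))
              if rx[i] is not NOT_RATED and ry[i] is not NOT_RATED]
    msd = sum((rx[i] - ry[i]) ** 2 for i in common) / len(common)
    nx = sum(r is not NOT_RATED for r in rx)
    ny = sum(r is not NOT_RATED for r in ry)
    jaccard = len(common) / (nx + ny - len(common))
    return jaccard * (1 - msd)

# Running example of the paper (normalized ratings, None = not rated)
rx = [0.75, 1, None, 0.5, 0.25, None, 0, 0]
ry = [0.75, 0.5, 0, 0.25, None, 0.5, 0.75, None]
print(round(jmsd(rx, ry), 3))     # 0.5 * (1 - 0.219) = 0.391
print(round(pearson(rx, ry), 3))  # Pearson on the same example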
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 70
Coverage / Recall
CORR: Pearson; COS: cosine; EUC: Euclidean; MSD: mean squared differences
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 71
commonly used due to its low capacity to produce new recommendations.
MSD offers both a great advantage and a great disadvantage at the same time. The advantage is that it generates very good general results: low average error, high percentage of correct predictions and low percentage of incorrect predictions. The disadvantage is that it has an intrinsic tendency to choose, as users similar to a given user, those users who have rated a very small number of items [35]. E.g. if we have 7 items that can be rated from 1 to 5 and three users u1, u2, u3 with the following ratings: u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• means a not-rated item), the MSD metric will indicate that (u1, u3) have a total similarity (0), (u1, u2) have a similarity 0.5 and (u2, u3) have a lower similarity (0.6). This situation is not convincing: intuitively we realize u2 and u3 are very similar, whilst u1 is only similar to u2 and u3 on 2 ratings, and therefore it is not logical to choose it as the most similar to them; what is worse, if it is chosen it will not provide us with possibilities to recommend new items.
The strategy followed in designing the new metric is to considerably raise the capacity of MSD to generate predictions, without losing along the way its good behavior as regards accuracy and quality of the results. The metric designed is based on two factors:
- The similarity between two users calculated as the mean of the squared differences (MSD): the smaller these differences, the greater the similarity between the 2 users. This part of the metric enables very good accuracy results to be obtained.
- The number of items rated by both users relative to the total number of items rated by either of the two users. E.g. given users u1: (3, 2, 4, •, •, •) and u2: (•, 4, 4, 3, •, 1), a common rating has been made on two items out of a joint total of five rated items. This factor enables us to greatly improve the metric's capacity to make predictions.
An important design aspect is the decision not to use a parameter whose value would have to be set arbitrarily, i.e. the result provided by the metric should be obtained by taking only the values of the ratings provided by the users of the RS. By working on the 2 factors with standardized values [0..1], the metric obtained is the one given above, for the rating lists (r_x, r_y) of 2 generic users x and y, where I is the number of items of the RS.
Fig. 4 (Bobadilla et al. 2010). Measurements related to the Jaccard metric on MovieLens: (A) number of pairs of users displaying each Jaccard value; (B) averaged MAE obtained for the pairs of users with each Jaccard value; (C) averaged coverages obtained for the pairs of users with each Jaccard value.
Fig. 5 (ibid.). MAE and coverage obtained with Pearson correlation and by combining Jaccard with Pearson correlation, cosine, constrained Pearson's correlation, Spearman rank correlation and mean squared differences: (A) MAE, (B) coverage. MovieLens 1M, 20% of test users, 20% of test items, k ∈ [2..1500] step 25.
Bobadilla, J., Serradilla, F., & Bernal, J. (2010). A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6), 520-528.
The comparative results in Graph 6B show improvements of up to 9% when applying the new metric as regards the correlation, and even 15% in some cases.
Fig. 6 (ibid.). Pearson correlation and new metric comparative results using MovieLens: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 20% of test users, 20% of test items, k ∈ [2..1500] step 50, N ∈ [2..20], θ = 5.
Fig. 7 (ibid.). Correlation and new metric comparative results using NetFlix: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 5% of test users, 20% of test items, k ∈ [2..10000] step 100, N ∈ [2..20], θ = 9.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 72
Michael	D.	Ekstrand,	Michael	Ludwig,	Joseph	A.	Konstan,	and	John	T.	Riedl.	2011.	Rethinking	The	Recommender	Research	Ecosystem:	
Reproducibility,	Openness,	and	LensKit.	In	Proceedings	of	the	Fifth	ACM	Conference	on	Recommender	Systems	(RecSys	’11).	ACM,	New	York,	NY,	
USA,	133-140.	DOI=10.1145/2043932.2043958.
Evaluation by cross-validation
73
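A minimal sketch of the hold-out protocol used in the experiments cited above (e.g. 20% of test users / test items): split each user's ratings into train and test sets, then score predictions on the held-out part; names and shares are illustrative.

import random

def split_ratings(ratings, test_share=0.2, seed=0):
    """Hold out test_share of each user's ratings for evaluation."""
    rng = random.Random(seed)
    train, test = {}, {}
    for u, items in ratings.items():
        keys = list(items)
        rng.shuffle(keys)
        cut = max(1, int(len(keys) * test_share))
        test[u] = {i: items[i] for i in keys[:cut]}
        train[u] = {i: items[i] for i in keys[cut:]}
    return train, test

ratings = {"u1": {"i1": 5, "i2": 3, "i3": 4, "i4": 2, "i5": 4}}
train, test = split_ratings(ratings)
print(len(train["u1"]), len(test["u1"]))  # 4 1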
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
The cold-start problem
— New application
— Editorial recommendation
— Encourage users to give ratings and reviews
— New user
— Exploit as much other information about the user as possible
— forms,
— friends on social networks (= ask for access)
— preferences expressed as tags…
— New item
— Exploit the metadata (for a film: year, director, actors…)
— Exploit the reviews that can be found elsewhere on the Web
74
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 75
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 76
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 77
Amazon: organization of items (categories)
78
Product Advertising API https://aws.amazon.com/
cf.	http://www.codediesel.com/libraries/amazon-advertising-api-browsenodes/
Similarities and latent spaces
79
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Projection de la matrice individus / items
— Chaque item I est représenté par un vecteur q de dimension f
— Chaque utilisateur U est représenté par un vecteur p de dimension f
— Chaque facteur représente une propriété latente qui caractérise les items et qui souligne l'intérêt des utilisateurs pour celle-ci
— Le produit scalaire entre q et p est une estimation de l'intérêt de U pour I
— Méthode :
— Décomposition en valeurs singulières
— Approximation par descente de gradient (sur des données d'apprentissage)
80
Fonction objectif (d'après Koren et al., 2009), avec $r_{ui}$ la note réelle, $q_i^{\top}p_u$ la note prédite et $\lambda\left(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\right)$ le facteur de régularisation ($\lambda$ : constante de régularisation, apprise par validation croisée) :

$$\min_{q,p}\ \sum_{(u,i)} \left(r_{ui} - q_i^{\top}p_u\right)^2 + \lambda\left(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\right)$$
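Esquisse minimale en Python de l'approximation par descente de gradient stochastique (hypothèses : notes en mémoire, pas de biais ni de critère d'arrêt) :

    import numpy as np

    def mf_sgd(ratings, n_users, n_items, f=20, lr=0.01, lam=0.1, epochs=20):
        """ratings : triplets (u, i, r). Minimise l'erreur quadratique
        régularisée (cf. la fonction objectif ci-dessus) par descente de
        gradient stochastique."""
        rng = np.random.default_rng(0)
        P = rng.normal(scale=0.1, size=(n_users, f))   # vecteurs p des utilisateurs
        Q = rng.normal(scale=0.1, size=(n_items, f))   # vecteurs q des items
        for _ in range(epochs):
            for u, i, r in ratings:
                e = r - P[u] @ Q[i]                    # erreur : note réelle - note prédite
                p_u = P[u].copy()
                P[u] += lr * (e * Q[i] - lam * P[u])   # pas de gradient régularisé
                Q[i] += lr * (e * p_u - lam * Q[i])
        return P, Q

    P, Q = mf_sgd([(0, 0, 4.0), (0, 1, 1.0), (1, 0, 5.0)], n_users=2, n_items=2)
    print(P[0] @ Q[1])   # estimation de l'intérêt de l'utilisateur 0 pour l'item 1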
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Espaces latents (suite)
— Problème non convexe : risque de solution éloignée de l'optimum global
— Approche par Moindres Carrés Alternés (Alternating Least Squares)
. Fixe q, cherche p ; fixe p, cherche q, etc.
. Utile lorsque les données (notes d'apprentissage) sont implicites (matrice non creuse)
— Tenir compte des biais = modifier les valeurs prédites (voir la formule ci-dessous)
— Des utilisateurs ont tendance à toujours donner de bonnes notes
— Certains items ont toujours tendance à avoir de bonnes notes
— Le score final doit dépendre de la moyenne de tous les scores (base de départ)
— Intégrer les préférences a priori des utilisateurs (x : items préférés de u ; y : attributs (âge…))
— Tenir compte de la dynamique
81
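Avec les biais, la note prédite s'écrit (formule de Koren et al., 2009, cités ci-après), où $\mu$ est la moyenne globale des notes, $b_u$ le biais de l'utilisateur et $b_i$ celui de l'item :

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}p_u$$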
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 82
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/48-Diversity_Scorenote
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 83
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 84
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 85
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 86
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Filtrage collaboratif à destination de « groupes »
87
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Intégration du contexte
— Très nombreuses définitions du contexte
— Plusieurs stratégies d’intégration
88
Adomavicius G, Mobasher B, Ricci F, Tuzhilin A. Context-Aware Recommender Systems. AI Magazine. 2011;32(3):67-80.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Intégration du contexte (suite)
— Cube Individus x Items x Contextes remplace la matrice Individus x Items
— Factorisation de tenseurs
89
Karatzoglou, A.; Amatriain, X.; Baltrunas, L.; and Oliver, N. 2010. Multiverse Recommendation: N-Dimensional Tensor Factorization
for Context-Aware Collaborative Filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, 79–86.
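Sous forme de décomposition de type Tucker, telle qu'utilisée dans l'article « Multiverse Recommendation » cité ci-dessus (notations simplifiées), la note prédite pour le triplet (utilisateur u, item i, contexte c) s'écrit :

$$\hat{r}_{u,i,c} = \sum_{a=1}^{f_U}\sum_{b=1}^{f_I}\sum_{d=1}^{f_C} s_{a,b,d}\; U_{u,a}\; I_{i,b}\; C_{c,d}$$

où $U$, $I$ et $C$ sont les matrices de facteurs latents des utilisateurs, des items et des contextes, et $S$ le tenseur central appris.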
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Exploitation des liens (réseaux sociaux)
— Le réseau social comme entrée supplémentaire
90
Yang	X,	Guo	Y,	Liu	Y,	Steck	H.	A	survey	of	collaborative	filtering	based	social	recommender	systems.	Computer	
Communications.	2014;41(C):1-10.	doi:10.1016/j.comcom.2013.06.009.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Exploitation des liens (réseaux sociaux) (2)
— Prédiction selon les liens entre individus (inférence Bayésienne)
91
individu	qui	cherche	une	note
individus	qui	ont	noté	l’item
individus	intermédiaires	
qui	réunissent	les	notes
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
FILTRAGE SELON LE
CONTENU
92
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommandation basée sur le contenu
— Lien fort avec la Recherche d’Information
— La notion de « Profil utilisateur » est à rapprocher de la notion de « Requête »
93
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 94
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/81-81NLP_models_also_work_on
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Un mot, une chose ? pas si simple…
95
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Contenu audio
96
Wang, X., & Wang, Y. (2014, November). Improving content-based and hybrid music recommendation using deep learning. In Proceedings of the 22nd ACM International Conference on Multimedia (pp. 627-636). ACM.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
LA RECOMMANDATION DE
LECTURES (LIVRES)
97
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 98
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommending Books vs Searching for Books ?
Very diverse needs:
— Topicality
— With a precise context, e.g. arts in China during the 20th century
— With named entities: locations (the book is about a specific location OR the action takes place at this location), proper names…
— Style / Expertise / Language
— fiction, novel, essay, proceedings, position papers…
— for experts / for dummies / for children…
— in English, in French, in old French, in (very) local languages…
— Looking for citations / references
— in what book does a given citation appear
— what are the books that refer to a given one
— Authority:
— What are the most important books about… (what does "most important" mean?)
— What are the most popular books about…
99
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 100
http://social-book-search.humanities.uva.nl/#/overview
2 The Amazon collection
The document collection used for this year's Book Track is composed of Amazon pages of existing books. These pages consist of editorial information such as ISBN number, title, number of pages, etc. However, in this collection the most important content resides in the social data. Indeed, Amazon is social-oriented, and users can comment on and rate products they purchased or they own. Reviews are identified by the review fields and are unique for a single user: Amazon does not allow a forum-like discussion. Users can also assign tags of their own creation to a product. These tags are useful for refining the search of other users in that they are not fixed: they reflect the trends for a specific product. In the XML documents, they can be found in the tag fields. Apart from this user classification, Amazon provides its own category labels, which are contained in the browseNode fields.
Table 1. Some facts about the Amazon collection.
Number of pages (i.e. books): 2,781,400
Number of reviews: 15,785,133
Number of pages that contain at least one review: 1,915,336
3 Retrieval model
3.1 Sequential Dependence Model
Like the previous year, we used a language modeling approach to retrieval [4]. We use Metzler and Croft's Markov Random Field (MRF) model [5] to integrate multiword phrases in the query. Specifically, we use the Sequential Dependence Model (SDM).
Organizers
Marijn Koolen (University of Amsterdam)
Toine Bogers (Aalborg University Copenhagen)
Antal van den Bosch (Radboud University Nijmegen)
Antoine Doucet (University of Caen)
Maria Gäde (Humboldt University Berlin)
Preben Hansen (Stockholm University)
Mark Hall (Edge Hill University)
Iris Hendrickx (Radboud University Nijmegen)
Hugo Huurdeman (University of Amsterdam)
Jaap Kamps (University of Amsterdam)
Vivien Petras (Humboldt University Berlin)
Michael Preminger (Oslo and Akershus University College of Applied Sciences)
Mette Skov (Aalborg University Copenhagen)
Suzan Verberne (Radboud University Nijmegen)
David Walsh (Edge Hill University)
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 101
http://social-book-search.humanities.uva.nl
SBS Collection : des requêtes réelles issues du forum LibraryThing
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 102
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 103
Le	catalogue	de		
la	personne	qui	
pose	la	question
Social Tagging
104
They complement the categories, but there are a lot of tags!
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 105
Des profils « utilisateur » (catalog, reviews, ratings)
Idée : utiliser les critiques et commentaires plutôt que les contenus
106
Les commentaires contiennent :
- keywords
- topics
- sentiment
- abstracts
- other books
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 107
6
La Recommandation de Livres / RI
SBS 2016 – Dataset : Amazon collection of 2.8M records
Index Fields
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 108
7
La Recommandation de Livres / RI
SBS 2016 – Dataset : LibraryThing Collection of 113,490 users profiles
userid workid author booktitle publication-year catalogue-date rating tags
u3266995 660947 Rosina Lippi Homestead 1999 2006-06 10.0 fiction
u1885143 2729214 Ellen Hopkins Glass 2009 2009-05 6.0 drugs
u1885143 133315 Tite Kubo Bleach, Vol. 1 2004 2009-06 6.0 manga
Index Fields
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 109
8
La Recommandation de Livres / RI
SBS 2016 - Topics Query : Traitement de la requête par
les Informations des Livres en Exemples
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 110
9
La Recommandation de Livres / RI
SBS 2016 - Retrieval Model : Méthode - SDM
Weighting query terms [Metzler2005]
● Unigram matches
● Bigram exact matches
● Bigram matches within an unordered window of 8 terms
Université Aix-Marseille Amal Htait
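Le score SDM combine linéairement ces trois types de correspondances (formule de Metzler & Croft ; les poids $\lambda_T = 0{,}85$, $\lambda_O = 0{,}1$, $\lambda_U = 0{,}05$ donnés ici à titre indicatif sont ceux couramment utilisés dans la littérature) :

$$score_{SDM}(Q, D) = \lambda_T \sum_{q \in Q} f_T(q, D) + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D) + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)$$

où $f_T$, $f_O$ et $f_U$ sont les fonctions de pondération (log-vraisemblance lissée) des unigrammes, des bigrammes exacts et des bigrammes en fenêtre non ordonnée.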
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 111
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 112
Koolen, M., Bogers, T., Gäde, M., Hall, M., Hendrickx, I., Huurdeman, H., ... & Walsh, D. (2016, September). Overview of the CLEF 2016 Social Book Search Lab. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 351-370). Springer International Publishing.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 113
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 114
http://ceur-ws.org/Vol-1609/
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Building a Graph of Books
— Nodes = books + properties (metadata, #reviews and ranking, page ranks, ratings…)
— Edges = links between books
— Book A refers to Book B according to:
— Bibliographic references and citations (in the book / in the reviews)
— Amazon recommendations (People who bought A bought B, People who liked A liked B…)
— A is similar to B:
— They share bibliographic references
— Full-text similarity + similarity between the metadata
115
The graph makes it possible to estimate:
— « Book Ranks » (cf. Google's PageRank)
— Neighborhood
— Shortest paths
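À titre d'illustration, une esquisse minimale (Python + networkx ; identifiants de livres hypothétiques) de la construction d'un tel graphe et du calcul d'un « Book Rank » de type PageRank :

    import networkx as nx

    g = nx.DiGraph()
    # arêtes « A renvoie vers B » (références bibliographiques, recommandations Amazon…)
    g.add_edges_from([("bookA", "bookB"), ("bookA", "bookC"),
                      ("bookB", "bookC"), ("bookC", "bookA")])

    book_rank = nx.pagerank(g, alpha=0.85)        # importance des nœuds dans le graphe
    neighbors = list(g.successors("bookA"))       # voisinage direct d'un livre
    path = nx.shortest_path(g, "bookA", "bookC")  # plus court chemin entre deux livres

    print(book_rank, neighbors, path)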
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 116
Jeh, G., & Widom, J. (2002, July). SimRank: a measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 538-543). ACM.
Recommending books : IR + graph mining
117
IR : Sequential Dependence Model (SDM) - Markov Random Field (Metzler & Croft, 2005) and/or Divergence From Randomness (InL2) model + Query Expansion with Dependence Analysis
Ratings : the more reviews a book has and the better its ratings, the more relevant it is considered.
Graph : expanding the retrieved books with Similar Books, then reranking with PageRank
13
● We tested many reranking methods, combining the retrieval model scores with other scores based on social information.
● For each document we compute:
– PageRank: an algorithm that exploits the link structure to score the importance of nodes in the graph.
– Likeliness: computed from information generated by users (reviews and ratings). The more reviews and good ratings a book has, the more interesting it is.
Graph Modeling – Reranking Schemes
12
Graph Modeling - Recommendation
(Schéma : retrieving sur la collection pour la requête ti, sélection des nœuds de départ, extension aux voisins et aux nœuds des plus courts chemins, suppression des doublons, puis reranking vers l'ensemble final Dfinal.)
Page Rank + Similar Products
- Very good results in 2011 (judgements obtained by crowdsourcing) (IR and ratings) : P@10 ≈ 0.58
- Good results in 2014 (IR, ratings, expansion) : P@10 ≈ 0.23 ; MAP ≈ 0.44
- In 2015 : rank 25/47 (IR + graph, but the graph improved IR) : P@10 ≈ 0.2 (best 0.39, which included the price of books)
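Une façon simple de combiner ces signaux pour le reranking (esquisse indicative : les poids et les noms de variables sont hypothétiques, la combinaison exacte n'étant pas détaillée ici) :

    def rerank_score(ir_score, pagerank, likeliness, w=(0.7, 0.2, 0.1)):
        """Combinaison linéaire de scores normalisés dans [0..1] :
        modèle de recherche (SDM/InL2), PageRank et « likeliness »."""
        return w[0] * ir_score + w[1] * pagerank + w[2] * likeliness

    docs = {"bookA": (0.8, 0.3, 0.9), "bookB": (0.6, 0.7, 0.4)}
    ranked = sorted(docs, key=lambda d: rerank_score(*docs[d]), reverse=True)
    print(ranked)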
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Une perspective: fouille de graphes multicouches
— Thèse de Mohamed Ettaleb (co-dirigée par Pr. C. Latiri, B. Douhar, P. Bellot)
118
(Graphe multicouche de livres : couche « similaire à », couche « achetés ensemble », couche auteurs, couche tags.)
Question : quels sous-graphes fréquents ? comment les interpréter ?
119
Et dans la vraie vie ? (pour nous : OpenEdition)
(Schéma OpenEdition Lab : BILBO, ÉCHO, classification automatique et métadonnées, recommandation, graphe de contenus ; revues concernées : questions de communication, vertigo, edc, echogeo, quaderni.)
BILBO - MISE EN RELATION DES COMPTES-RENDUS AVEC LES LIVRES
ÉCHO - ANALYSE DES SENTIMENTS
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30.
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30.
DOI : 10.3406/rfp.1986.1499
18 Voir Permanent Mandates Commission, Minutes of the Fifteenth Session (Geneva: League of Nations, 1929), pp. 100-1. Pour plus de détails, voir Paul Ghali, Les nationalités détachées de l'Empire ottoman à la suite de la guerre (Paris: Les Éditions Domat-Montchrestien, 1934), pp. 221-6.
ils ont déjà édité trois recueils et auxquelles ils ont consacré de nombreux travaux critiques. Leur nouvel ouvrage, intitulé Le Roman véritable. Stratégies préfacielles au XVIIIe siècle et rédigé à six mains par Jan Herman, Mladen Kozul et Nathalie Kremer – chaque auteur se chargeant de certains chapitres au sein
BILBO
NIVEAU 1
NIVEAU 2
NIVEAU 3
<bibl><author><surname>Langouet</surname>, <forename>G.</forename></author>, (<date>1986</date>), <title level="a">« Innovations pédagogiques et technologies éducatives »</title>, <title level="j">Revue française de pédagogie</title>, <abbr>n°</abbr> <biblScope type="issue">76</biblScope>, <abbr>pp.</abbr> <biblScope type="page">25-30</biblScope>. <idno type="DOI">DOI : 10.3406/rfp.1986.1499</idno></bibl>
RI sociale
Extraction d'information par Programmation Logique Inductive
Modèles de langue temporels et apprentissage de méta-caractéristiques
OpenEdition
OpenEdition
Univ. Recife (Brésil)
Extraction	d’information
Chercher	des	critiques	
Les relier aux livres
Analyse	de	sentiments Recommandation	de	livres
SVM	-	Z-score	-	CRF Graph	scoring
NOTES	
POLARITE
GRAPHE
RECOMMANDATION
Analyse	de	citations
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Identifier des critiques de livres dans des blogs
• Classification supervisée « en genre »
• Caractéristiques : unigrammes, localisation des entités nommées, dates
• Sélection de caractéristiques : Seuil du Z-score + random forest
120
6 Experiments

In this section we describe results from experiments using a collection of documents from Revues.org and the Web. We use supervised learning methods to build our classifiers, and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Naive Bayes, Support Vector Machines with RBF and Linear kernels) in terms of classification accuracy. We also explored alternative document representations (bag-of-words, feature selection using z-score, Named Entity repartition in the text).

6.1 Naive Bayes (NB)

In order to evaluate different classification models, we have adopted as a baseline the naive Bayes approach (Zubaryeva and Savoy, 2010). The classification system has to choose between two possible hypotheses, h0 = "It is a Review" and h1 = "It is not a Review", taking the class that has the maximum value according to Equation (5), where |w| indicates the number of words included in the current document and wj denotes the j-th word appearing in the document:

$$\arg\max_{h_i}\ P(h_i)\,\prod_{j=1}^{|w|} P(w_j \mid h_i) \qquad (5)$$

with $P(w_j \mid h_i) = \dfrac{tf_{j,h_i}}{n_{h_i}}$, i.e. the relation between the lexical frequency of the word wj in the collection of class hi (denoted $tf_{j,h_i}$) and the size $n_{h_i}$ of the corresponding corpus.

6.2 Support Vector Machines (SVM)

SVM designates a learning approach introduced by Vapnik in 1995 for solving two-class pattern recognition problems (Vapnik, 1995). The SVM method is based on the Structural Risk Minimization principle (Vapnik, 1995) from computational learning theory. In their basic form, SVMs learn linear threshold functions. Nevertheless, by a simple plug-in of an appropriate kernel function, they can be used to learn linear classifiers, radial basis function (RBF) networks, and three-layer sigmoid neural nets (Joachims, 1998). The key in such classifiers is to determine the optimal boundaries between the different classes and to use them for the purposes of classification (Aggarwal and Zhai, 2012). Using the vectors from the different representations presented below, we used the Weka toolkit to learn the model. With the linear kernel and the Radial Basis Function (RBF), this model sometimes allows reaching a good level of performance at the cost of a fast growth of the processing time during the learning stage (Kummer, 2012).

6.3 Results

We have used different strategies to represent each textual unit. First, the unigram model (Bag-of-Words) where all words are considered as features. We also used feature selection based on the normalized z-score, keeping the first 1000 words according to this score (after removing all words that appear less than 5 times). As the third approach, we suggested that the common features of the Review collection can be located in the Named Entity distribution in the text.

In our training corpus, we have 106,911 words obtained from the Bag-of-Words approach. We selected all tokens (features) that appear more than 5 times in each class. The goal is therefore to design a method capable of selecting terms that clearly belong to one genre of documents. We obtained a vector space that contains 5,957 words (features). After calculating the normalized z-score of all features, we selected the first 1,000 features according to this score.

We also explore the distribution of 3 named entities ("authors' names", "locations" and "dates") in the text (Poibeau, 2003), after removing all XML-HTML tags. We divided the texts into 10 parts (the size of each part = total number of words / 10). The distribution ratio of each named entity in each part is used as a feature to build the new document representation, and we obtained a set of 30 features.

Figure 3: "Person" named entity distribution. Figure 4: "Location" named entity distribution. Figure 5: "Date" named entity distribution.

Table 4: Performances of the classification models using the three indexing schemes on the test set (R = recall, P = precision, F-M = F-measure).

Scheme 1 (bag-of-words):
NB: Review R 65.5%, P 81.5%, F-M 72.6% ; non-Review R 81.6%, P 65.7%, F-M 72.8%
SVM (Linear): Review R 99.6%, P 98.3%, F-M 98.9% ; non-Review R 97.9%, P 99.5%, F-M 98.7%
SVM (RBF, C = 5.0, γ = 0.00185): Review R 89.8%, P 97.2%, F-M 93.4% ; non-Review R 96.8%, P 88.5%, F-M 92.5%

Scheme 2 (z-score feature selection):
NB: Review R 90.6%, P 64.2%, F-M 75.1% ; non-Review R 37.4%, P 76.3%, F-M 50.2%
SVM (Linear): Review R 87.2%, P 81.3%, F-M 84.2% ; non-Review R 75.3%, P 82.7%, F-M 78.8%
SVM (RBF, C = 32.0, γ = 0.00781): Review R 87.2%, P 86.5%, F-M 86.8% ; non-Review R 83.1%, P 84.0%, F-M 83.6%

Scheme 3 (named-entity distribution):
NB: Review R 80.0%, P 68.4%, F-M 73.7% ; non-Review R 54.2%, P 68.7%, F-M 60.6%
SVM (Linear): Review R 77.0%, P 81.9%, F-M 79.4% ; non-Review R 78.9%, P 73.5%, F-M 76.1%
SVM (RBF, C = 8.0, γ = 0.03125): Review R 81.2%, P 48.6%, F-M 79.9% ; non-Review R 72.6%, P 75.8%, F-M 74.1%

Z-scores across the corpus (top 30 features): abandonne 30.14 ; seront 30.00 ; biographie 21.84 ; entranent 21.20 ; prise 21.20 ; sacre 21.20 ; toute 20.70 ; quitte 19.55 ; dimension 15.65 ; les 14.43 ; commandement 11.01 ; lie 10.61 ; construisent 10.16 ; lieux 10.14 ; garde 9.75 ; winter 9.23 ; cleo 8.88 ; visible 8.75 ; fondamentale 8.67 ; david 8.54 ; pratiques 8.52 ; signification 8.47 ; 01 8.38 ; institutionnels 8.38 ; 1930 8.16 ; attaques 8.14 ; courrier 8.08 ; moyennes 7.99 ; petite 7.85 ; adapted 7.84.
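À titre d'illustration, une esquisse minimale en Python (scikit-learn en remplacement de l'outillage Weka décrit ci-dessus ; documents et étiquettes hypothétiques) de la classification « critique / non critique » avec sac de mots et SVM linéaire :

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    docs = ["Ce livre propose une analyse...", "Appel à communications..."]
    labels = ["review", "not_review"]          # étiquettes hypothétiques

    # Sac de mots (unigrammes ; sur un vrai corpus, min_df=5 comme ci-dessus)
    clf = make_pipeline(CountVectorizer(min_df=1), LinearSVC())
    clf.fit(docs, labels)
    print(clf.predict(["Leur nouvel ouvrage, intitulé..."]))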
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
L’analyse de sentiments sur les critiques
• Statistical Metrics (PMI, Z-score, odd ratio…)
• Combined with Linguistic Ressources
121
We compute the Z_score of each term ti in a class Cj (tij) by calculating its term relative frequency tfrij in the class Cj, as well as the mean (meani), which is the term probability over the whole corpus multiplied by nj, the number of terms in the class Cj, and the standard deviation (sdi) of the term ti according to the underlying corpus (see Eq. (1, 2)):

$$Z_{score}(t_{ij}) = \frac{tfr_{ij} - mean_i}{sd_i} \qquad (1)$$

$$Z_{score}(t_{ij}) = \frac{tfr_{ij} - n_j\,P(t_i)}{\sqrt{n_j\,P(t_i)\,\bigl(1 - P(t_i)\bigr)}} \qquad (2)$$

A term which has a salient frequency in a class in comparison to the others will have a salient Z_score. The Z_score was exploited for SA by (Zubaryeva and Savoy 2010): they chose a threshold (2) for selecting the number of terms having a Z_score above the threshold, then used a logistic regression for combining these scores. We use Z_scores as added features for classification because the tweet is too short, and therefore many tweets do not have any words with a salient Z_score. Figures 1, 2 and 3 show the distribution of the Z_score over each class; we remark that the majority of terms have a Z_score between -1.5 and 2.5 in each class and that the rest are either very frequent (> 2.5) or very rare (< -1.5). A negative value indicates that the term is not frequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_scores in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.

Table 1. The first ten terms having the highest Z_score in each class.
positive: Love 14.31, Good 14.01, Happy 12.30, Great 11.10, Excite 10.35, Best 9.24, Thank 9.21, Hope 8.24, Cant 8.10, Wait 8.05
negative: Not 13.99, Fuck 12.97, Don't 10.97, Shit 8.99, Bad 8.40, Hate 8.29, Sad 8.28, Sorry 8.11, Cancel 7.53, stupid 6.83
neutral: Httpbit 6.44, Httpfb 4.56, Httpbnd 3.78, Intern 3.58, Nov 3.45, Httpdlvr 3.40, Open 3.30, Live 3.28, Cloud 3.28, begin 3.17

- Sentiment lexicons: Bing Liu's Opinion Lexicon, created by (Hu and Liu 2004) and augmented in many later works. We extract the number of positive, negative and neutral words in tweets according to these lexicons. Bing Liu's lexicon only contains negative and positive annotations, but the Subjectivity lexicon contains negative, positive and neutral ones.

- Part Of Speech (POS): we annotate each word in the tweet by its POS tag, and then compute the number of adjectives, verbs, nouns, adverbs and connectors in each tweet.

4 Evaluation

4.1 Data collection

We used the data sets provided in SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Wilson, Kozareva et al. 2013). The participants were provided with training tweets annotated as positive, negative or neutral. We downloaded these tweets using a given script. Among the 9,646 tweets, we could only download 8,498 of them because of protected profiles and deleted tweets. Then we used the development set, containing 1,654 tweets, for evaluating our methods. We combined the development set with the training set and built a new model which predicted the labels of the 2013 and 2014 test sets.

4.2 Experiments

Official results. The results of our system submitted for the SemEval evaluation gave 46.38% and 52.02% for the 2013 and 2014 test sets respectively. It should be mentioned that these results are not correct because of a software bug discovered after the submission deadline; the correct results are therefore presented as non-official results. In fact, the previous results are the output of our classifier trained with all the features of Section 3, but, because of an index shifting error, the test set was represented by all the features except the terms.

Non-official results. We carried out various experiments using the features presented in Section 3 with a Multinomial Naïve-Bayes model. We first constructed the feature vector of tweet terms, which gave 49.42% and 46.31%; extending it with Z_score features improves the performance by 6.5% and 10.9%, and pre-polarity features also improve the f-measure by 4% and 6%, but extending with POS tags decreases the f-measure. We also tested all the combinations of these features; Table 2 shows the results of each combination. We remark that POS tags are not useful over all the experiments; the best result is obtained by combining Z_score and pre-polarity features. We find that Z_score features improve the f-measure significantly and that they are better than pre-polarity features.

Figure 1: Z_score distribution in the positive class. Figure 2: Z_score distribution in the neutral class.

Table 2. Average f-measures for the positive and negative classes of the SemEval 2013 and 2014 test sets.
Features : 2013 / 2014
Terms : 49.42 / 46.31
Terms+Z : 55.90 / 57.28
Terms+POS : 43.45 / 41.14
Terms+POL : 53.53 / 52.73
Terms+Z+POS : 52.59 / 54.43
Terms+Z+POL : 58.34 / 59.38
Terms+POS+POL : 48.42 / 50.03
Terms+Z+POS+POL : 55.35 / 58.58

We repeated all the previous experiments after using a twitter dictionary with which we extend each tweet by the expressions related to its emoticons and abbreviations. The results in Table 3 show that using this dictionary improves the f-measure over all the experiments; the best results are again obtained by combining Z_score and pre-polarity features.

Table 3. Average f-measures for the positive and negative classes of the SemEval 2013 and 2014 test sets after using a twitter dictionary.
Features : 2013 / 2014
Terms : 50.15 / 48.56
Terms+Z : 57.17 / 58.37
Terms+POS : 44.07 / 42.64
Terms+POL : 54.72 / 54.53
Terms+Z+POS : 53.20 / 56.47
Terms+Z+POL : 59.66 / 61.07
Terms+POS+POL : 48.97 / 51.90
Terms+Z+POS+POL : 55.83 / 60.22

5 Conclusion

In this paper we tested the impact of using a Twitter dictionary, sentiment lexicons, Z_score features and POS tags for the sentiment classification of tweets. We extended the feature vector of tweets with all these features; we proposed the Z_score as a new type of feature and demonstrated that it can improve the performance.

[Hamdan, Béchet & Bellot, SemEval 2014]
http://sentiwordnet.isti.cnr.it
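Esquisse minimale en Python du calcul du Z_score de l'Eq. (2) (noms de fonctions hypothétiques ; les termes sont supposés déjà tokenisés) :

    import math
    from collections import Counter

    def z_scores(class_tokens, corpus_tokens):
        """Z_score normalisé de chaque terme d'une classe (Eq. 2) : écart entre
        la fréquence observée dans la classe et la fréquence attendue sous la
        distribution binomiale estimée sur le corpus entier."""
        n_j = len(class_tokens)                 # nombre de termes de la classe Cj
        tf_class = Counter(class_tokens)
        tf_corpus = Counter(corpus_tokens)
        total = len(corpus_tokens)
        scores = {}
        for term, tf in tf_class.items():
            p = tf_corpus[term] / total         # P(ti) sur tout le corpus
            mean = n_j * p                      # effectif attendu dans Cj
            sd = math.sqrt(n_j * p * (1 - p))   # écart-type binomial
            if sd > 0:
                scores[term] = (tf - mean) / sd
        return scores

    pos = "love good happy love".split()
    neg = "bad sad bad hate".split()
    print(z_scores(pos, pos + neg))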
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 122
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 123
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 124
http://reviewofbooks.openeditionlab.org
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Linking Contents by Analyzing the References
In Books : no common stylesheet (or a lot of stylesheets poorly respected…)
Our proposal :
1) Searching for references in the document / footnotes (Support Vector Machines)
2) Annotating the references (Conditional Random Fields)
BILBO : Our (open-source) software for Reference Analysis
125
Google Digital Humanities Research Awards (2012)
Annotation
DOI	search	
(Crossref)
OpenEdition	Journals	:	more	than	1.5	million	references	analyzed
Test	:	http://bilbo.openeditionlab.org	
Sources	:	http://github.com/OpenEdition/bilbo
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 126
Test	:	http://bilbo.openeditionlab.org	
Sources	:	http://github.com/OpenEdition/bilbo
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 127
Ollagnier, A., Fournier, S., & Bellot, P. (2016). A Supervised Approach for Detecting Allusive Bibliographical References in Scholarly Publications. In WIMS (p. 36).
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 128
thèse de Doctorat d'Anaïs Ollagnier (dir. P. Bellot / S. Fournier)
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 129
http://sentiment-analyser.openeditionlab.org/aboutsemeval
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
SYSTÈMES HYBRIDES
130
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 131
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 132
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
CONCLUSION
133
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Conclusion
— De très nombreuses approches (hybrides)
— Filtrage collaboratif et exploitation de l’historique
— Analyse des contenus
— Exploitation de données comportementales et d’informations
explicites
— Exploitation des réseaux sociaux
— Tout combiner dans un seul modèle d'apprentissage ? Quelle fonction à optimiser ?
— Des liens forts avec d’autres domaines
— Méthodes statistiques, fouille de données et de graphes,
apprentissage…
— Recherche d’information (n’est-ce pas aussi de la recommandation ?),
traitement automatique des langues, analyse d’image/signal, ergonomie
et interaction…
— Il faut choisir les approches mais aussi les données
— Usages et contextes
— Préservation de la vie privée
134
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 135
https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 136
http://lenskit.org
Michael	D.	Ekstrand,	Michael	Ludwig,	Joseph	A.	Konstan,	and	John	T.	Riedl.	2011.	Rethinking	The	Recommender	Research	Ecosystem:	
Reproducibility,	Openness,	and	LensKit.	In	Proceedings	of	the	Fifth	ACM	Conference	on	Recommender	Systems	(RecSys	’11).	ACM,	New	York,	NY,	
USA,	133-140.	DOI=10.1145/2043932.2043958.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 137
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 138
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 139
Çoba, L., & Zanker, M. rrecsys: an R-package for prototyping recommendation algorithms, RecSys 2016.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Challenges
140
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 141
http://lab.hypotheses.org
Merci	de	votre	attention	:-)

Mais conteúdo relacionado

Mais procurados

Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
Alexey Voropaev
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
Vsevolod Dyomkin
 

Mais procurados (20)

Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Language models
Language modelsLanguage models
Language models
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
 
Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
 
OUTDATED Text Mining 5/5: Information Extraction
OUTDATED Text Mining 5/5: Information ExtractionOUTDATED Text Mining 5/5: Information Extraction
OUTDATED Text Mining 5/5: Information Extraction
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 

Semelhante a Recommandation sociale : filtrage collaboratif et par le contenu

Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
Richard Littauer
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Carole Goble
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Aaron Sloman
 
Science 2.0 and language technology
Science 2.0 and language technologyScience 2.0 and language technology
Science 2.0 and language technology
fridolin.wild
 
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and CommunicationSetting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
vbrant
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
Bertram Ludäscher
 

Semelhante a Recommandation sociale : filtrage collaboratif et par le contenu (20)

Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Striving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Model-Driven Research in Social Computing
Model-Driven Research in Social ComputingModel-Driven Research in Social Computing
Model-Driven Research in Social Computing
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
NG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMsNG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMs
 
Science 2.0 and language technology
Science 2.0 and language technologyScience 2.0 and language technology
Science 2.0 and language technology
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday Learning
 
AH-XLDBEurope-position-09 jun2011
AH-XLDBEurope-position-09 jun2011AH-XLDBEurope-position-09 jun2011
AH-XLDBEurope-position-09 jun2011
 
20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal
 
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and CommunicationSetting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
 
20110122 vibrant final
20110122 vibrant final20110122 vibrant final
20110122 vibrant final
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
Science 2.0
Science 2.0Science 2.0
Science 2.0
 

Mais de Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)

Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 

Mais de Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I) (7)

Introduction à la fouille de textes et positionnement de l'offre logicielle
Introduction à la fouille de textes et positionnement de l'offre logicielleIntroduction à la fouille de textes et positionnement de l'offre logicielle
Introduction à la fouille de textes et positionnement de l'offre logicielle
 
A combination of reduction and expansion approaches to handle with long natur...
A combination of reduction and expansion approaches to handle with long natur...A combination of reduction and expansion approaches to handle with long natur...
A combination of reduction and expansion approaches to handle with long natur...
 
Introduction générale sur les enjeux du Text and Data Mining TDM
Introduction générale sur les enjeux du Text and Data Mining TDMIntroduction générale sur les enjeux du Text and Data Mining TDM
Introduction générale sur les enjeux du Text and Data Mining TDM
 
Scholarly Book Recommendation
Scholarly Book RecommendationScholarly Book Recommendation
Scholarly Book Recommendation
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
 
Huma-Num une Infrastructure pour les SHS
Huma-Num une Infrastructure pour les SHSHuma-Num une Infrastructure pour les SHS
Huma-Num une Infrastructure pour les SHS
 
OpenEdition Lab projects in Text Mining
OpenEdition Lab projects in Text MiningOpenEdition Lab projects in Text Mining
OpenEdition Lab projects in Text Mining
 

Último

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
LeenakshiTyagi
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Último (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Recommandation sociale : filtrage collaboratif et par le contenu

  • 3. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Quelques questions ouvertes… — Est-il utile d’exploiter les méta-données, les contenus, les commentaires ? — Comment relier les contenus les uns aux autres ? — Comment exploiter des contenus de nature différente ? — Comment « comprendre » les besoins des lecteurs ? des requêtes longues ? des profils ? — Quels sont les usages ? Quels sont les besoins ? — Comment aller au-delà de la pertinence informationnelle ? (genre, niveau d’expertise, document récent ou non…) 3 — OpenEdition Lab : un programme de recherche HN — Détecter des tendances, des sujets émergents, les livres « à lire »…

  • 4. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Plan — Quelques exemples : poser les problèmes et les enjeux — Quelles ressources ? — Quelques généralités méthodologiques — Quelques stratégies d’évaluation d’une recommandation — Autour du filtrage collaboratif ( = recommandation « sociale » ?) — Autour de l’analyse de contenu et de la suggestion de contenus . focus sur la recherche de livres par requêtes longues en langue naturelle 4
  • 5. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Introduction Objectifs de la recommandation : — Recommander des « objets » (films, livres, pages Web…) — Prédire les notes que individus donneraient Différents types de recommandation : — Selon des connaissances : caractéristiques sur les individus cibles (âge, salaire…) — Selon les préférences des individus — exprimées par les individus eux-mêmes explicitement — devinées en analysant leur comportement (%) — lien avec classification — En croisant les comportements des individus : filtrage collaboratif — En construisant des profils et en les comparant aux contenus Un grand nombre de sources d’information : — Informations explicitement données par les individus — Les contenus et leurs méta-données — Le Web et les réseaux sociaux (contenus, graphes…) 5
  • 7. P. Bellot (AMU-CNRS, LSIS-OpenEdition) ACM Conférences et ateliers — Conférences : — Recommender Systems RecSys (depuis 2007) — Sessions « Recommendation Systems » à SIGIR, CIKM, — Ateliers : — Context-aware Movie Recommendation (2010+2011) — Information Heterogeneity and Fusion in Recommender Systems (2010+2011) — Large-Scale Recommender Systems and the Netflix Prize Competition (2008) — Recommendation Systems for Software Engineering (2008-14) — Recommender Systems and the Social Web (2012) 7
  • 8. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Articles « systèmes de recommandation » Conférence ACM RecSys (https://recsys.acm.org) 8
  • 21. Amazon Navigation
 Graph : YASIV 21 http://www.yasiv.com/#/Search?q=orwell&category=Books&lang=US
  • 29. Quelques collections de données 29 MSD(x,y) (11) tends towards zero as the ratings of users x and y become more similar and tends towards 1 as they became more different (we assume that the votes are normalized in the interval [0..1]). (3) We obtain the Jaccard(x,y) measure computing the propor- tion between the number of positions [1..I] in which there are elements different to in both rx and ry regarding the number of positions [1..I] in which there are elements differ- ent to in rx or in ry: Jaccardðx; yÞ ¼ rx ry rx [ ry ¼ #dx;y #rx þ #ry À #dx;y ; ð12Þ in our example: 4/(6 + 6À4) = 0.5. (4) We combine the above elements in the final equation: newmetric x; yð Þ ¼ Jaccard x; yð Þ Â 1 À MSD x; yð Þð Þ; ð13Þ in the running example: users taken at rand the remaining 80% w given the huge num its users as test user Table 2 shows th 5. Results In this section w abases specified in T MovieLens, Fig. 7 sho responds to FilmAffi Graph 6A shows t ing Pearson correlat uous). The new m practically all the ex of k-neighborhoods around 0.2 stars in t 150, 200). Graph 6B shows small percentages i improbable that the film that this user h increases, the proba the film also increas Table 1 Main parameters of the databases used in the experiments. MovieLens FilmAffinity NetFlix Number of users 4382 26447 480189 Number of movies 3952 21128 17770 Number of ratings 1000209 19126278 100480507 Min and max values 1–5 1–10 1–5 Table 2 Main parameters used in the experiments. K (MAE, coverage, perfect predictions) Precision/recall
  • 30. 30
  • 44. Des « individus » et des « données » 44 Soient T un tableau croisant n individus I (en lignes) et K variable quantitatives X (en colonnes). xi,k est la valeur de la variable k pour l’indi vidu i : X1 X2 · · · XK variables individus I1 x1,1 x1,2 · · · x1,K I2 x2,1 x2,2 ... x2,K ... ... ... xi,k ... In xn,1 xn,2 ... xn,K Un des objectifs de l’analyse de donn´ees est de d´eterminer des profil d’individus ou, dit autrement, des classes d’individus se ressemblant. Cett ressemblance est d´etermin´ee `a partir des valeurs des variables associ´ees au individus. Un autre objectif concerne les variables elles-mˆemes : calcul des corr´elatio entre elles (`a quel point une ´evolution des valeurs de l’une entraˆıne un ´evolution des valeurs de l’autre et de quelle mani`ere), r´egression entre va riables (formulation des liens entre variables)... L’Analyse en Composante Principales (ACP) concerne les liaisons lin´eaires entre variables, par op position aux liaisons quadratiques, logarithmiques ou exponentielles pa exemple. L’ACP fait partie des analyses factorielles qui vont d´etermine
  • 45. P. Bellot — Studying the individuals / studying the variables 45 • Data analysis can be conducted along • the individuals: looking for resemblances between individuals (according to the values of the variables) = automatic classification of the individuals • the variables: which variables best explain the data (the differences between individuals)? what are the principal components? where is the greatest variability?
Lab exercise (R): k-means clustering of monthly city temperatures, then visualization with the cluster package:
    temp <- data.frame(temperature[1:12])
    cl <- kmeans(temp, 3, iter.max = 2, nstart = 15)
    # e) visualize the clusters
    summary(cl)
    cl$cluster
    summary(cl$cluster)
    cl$center
    # f) add the classification result to the data; the cluster package provides clusplot
    library(cluster)
    aggregate(temperature, by = list(cl$cluster), FUN = mean)
    cl2 <- data.frame(temperature, cl$cluster)
    clusplot(temperature, cl$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
(Figures: PCA individuals factor map and variables factor map of European city temperatures; Dim 1 (86.87%), Dim 2 (11.42%).)
  • 49. P. Bellot ACP et réduction de la dimension • Une façon de représenter en quelques dimensions des nuages d’individus
 — en conservant au mieux les distances entre les individus
 — en privilégiant les dimensions de plus grande variabilité (sélection itérative des facteurs qui maximisent la variance)
 = application d’une fonction de projection 49
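Illustrative aside (not from the slides): a minimal PCA sketch in Python with scikit-learn; the data matrix and its dimensions are hypothetical.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # hypothetical table: 100 individuals x 12 quantitative variables
    X = np.random.default_rng(0).normal(size=(100, 12))
    X_std = StandardScaler().fit_transform(X)   # PCA operates on centered (here also scaled) data

    pca = PCA(n_components=2)                   # keep the 2 directions of greatest variance
    coords = pca.fit_transform(X_std)           # projection of the individuals onto the factors
    print(pca.explained_variance_ratio_)        # share of the variability kept by each axis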
  • 50. P. Bellot 50 Méthodes d’apprentissage • Différentes formes d’apprentissage • Agent « élève » recopie l’agent « maître » -- fournir des exemples • Raisonnement par induction (à partir d’exemples) • Apprentissage de caractéristiques importantes • Détection de patterns récurrents • Ajustement des paramètres importants • Transformation d’informations en connaissances Exemples -- Modèle -- Test -- Correction / Enrichissement des exemples
  • 51. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Statistical, probabilistic approaches; machine learning 51. Which words are characteristic of a group of documents? Which significant relations can be drawn from the observed forms alone? Analogies, correlations.
— Conditional Random Fields ("Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", Proceedings of the 18th International Conference on Machine Learning, ICML 2001): state and transition feature functions f_j(y_{i−1}, y_i, x, i) are summed over the positions of the sequence, F_j(y, x) = Σ_{i=1}^{n} f_j(y_{i−1}, y_i, x, i), and the probability of a label sequence y given an observation sequence x is
p(y|x, λ) = (1/Z(x)) exp( Σ_j λ_j F_j(y, x) ),
where Z(x) is a normalization factor and the parameters λ_j are estimated from training data.
— Z-scores for characteristic terms (Zubaryeva and Savoy, 2010): the Z-score of a term t_i in class C_j compares its frequency tf_ij in the class to its expected frequency, with mean_i = n_j · P(t_i) (P(t_i) the term probability over the whole corpus, n_j the number of terms in class C_j) and standard deviation sd_i = √(n_j · P(t_i) · (1 − P(t_i))):
Z_score(t_ij) = (tf_ij − n_j · P(t_i)) / √(n_j · P(t_i) · (1 − P(t_i)))
A term with a salient frequency in one class compared to the others gets a salient Z-score; terms above a threshold are kept as features (a threshold of 3 gave the best results in the cited experiments). Example, first terms by Z-score per sentiment class: positive — love (14.31), good (14.01), happy (12.30), great (11.10)…; negative — not (13.99), fuck (12.97), don't (10.97), shit (8.99)…; neutral — httpbit (6.44), httpfb (4.56), httpbnd (3.78)…
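A minimal Python sketch of this Z-score selection (my own notation; `counts`, a per-class term-frequency table, is a hypothetical input):

    import numpy as np

    def z_scores(counts):
        """counts: dict class -> dict term -> frequency in that class.
        Returns dict class -> dict term -> Z-score, following
        Z = (tf_ij - n_j * P(t_i)) / sqrt(n_j * P(t_i) * (1 - P(t_i)))."""
        total = sum(sum(c.values()) for c in counts.values())
        corpus_tf = {}
        for c in counts.values():
            for t, tf in c.items():
                corpus_tf[t] = corpus_tf.get(t, 0) + tf
        scores = {}
        for cls, c in counts.items():
            n_j = sum(c.values())                 # number of terms in class C_j
            scores[cls] = {}
            for t, tf in c.items():
                p = corpus_tf[t] / total          # term probability over the whole corpus
                scores[cls][t] = (tf - n_j * p) / np.sqrt(n_j * p * (1 - p))
        return scores

Terms whose Z-score exceeds a threshold (3 in the cited experiments) are kept as class-characteristic features.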
  • 56. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation questionnaire 56 — ResQue: 13 constructs and 60 questions (5-point Likert scales, 1 = "strongly disagree", 5 = "strongly agree") measuring users' subjective attitudes, based on their experience, towards a recommender. A1. Quality of recommended items: accuracy ("The items recommended to me matched my interests"), relative accuracy (compared with a friend's recommendation), familiarity, attractiveness, enjoyability, novelty ("The recommender helps me discover new products"), diversity ("The items recommended to me are diverse" / "are similar to each other", reverse scale), context compatibility (personal context requirements, timeliness).
  • 57. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 57 (continued) A2. Interaction adequacy: adequate ways to express and revise preferences, explanations of why products are recommended. A3. Interface adequacy: sufficient information, clear labels, attractive layout. A4. Perceived ease of use: ease of initial learning, of preference elicitation, of preference revision, of decision making. A5. Perceived usefulness: the recommended items effectively help find the ideal product. A6. Control/transparency ("I understood why the items were recommended to me"). A7. Attitudes: overall satisfaction, confidence, trust ("The recommender can be trusted").
  • 58. 58 (continued) A8. Behavioral intentions: intention to use the system, continuance and frequency ("I will use this recommender again"), recommendation to friends, purchase intention ("I would buy the items recommended, given the opportunity"). A simplified subset of 15 starred questions supports quick usability and adoption evaluations. Pu P, Chen L. A User-Centric Evaluation Framework of Recommender Systems. In: ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces; 2010:14-22.
  • 61. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation measures 61
— Prediction quality: Mean Absolute Error, Root Mean Squared Error, Coverage
MAE = (1/#U) Σ_{u∈U} (1/#O_u) Σ_{i∈O_u} |p_{u,i} − r_{u,i}|
RMSE = (1/#U) Σ_{u∈U} √( (1/#O_u) Σ_{i∈O_u} (p_{u,i} − r_{u,i})² )
Coverage measures the capacity to predict: the percentage of situations in which at least one k-neighbor of the active user can rate an item not yet rated by that user. With K_{u,i} the set of neighbors of u who rated item i, C_u = {i ∈ I | r_{u,i} = • ∧ K_{u,i} ≠ ∅} and D_u = {i ∈ I | r_{u,i} = •}:
coverage = (1/#U) Σ_{u∈U} 100 · #C_u / #D_u
— Recommendation quality: Precision, Recall, F1-Measure. Users' confidence in a recommender does not depend directly on accuracy over all possible predictions: a user gains confidence when agreeing with the reduced set of recommendations actually shown. With Z_u the set of n recommendations to user u and θ a relevancy threshold:
precision = (1/#U) Σ_u #{i ∈ Z_u | r_{u,i} ≥ θ} / n
recall = (1/#U) Σ_u #{i ∈ Z_u | r_{u,i} ≥ θ} / ( #{i ∈ Z_u | r_{u,i} ≥ θ} + #{i ∈ Z_u^c | r_{u,i} ≥ θ} )
F1 = 2 · precision · recall / (precision + recall)
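An illustrative Python sketch of these measures (the data structures are my own assumptions, not the slides'):

    import numpy as np

    def mae_rmse(per_user):
        """per_user: list of (predicted, true) rating arrays, one pair per test user."""
        maes = [np.mean(np.abs(p - t)) for p, t in per_user]
        rmses = [np.sqrt(np.mean((p - t) ** 2)) for p, t in per_user]
        return float(np.mean(maes)), float(np.mean(rmses))

    def precision_recall_f1(topn, relevant, n):
        """topn: dict user -> ranked list of recommended items;
        relevant: dict user -> set of items whose true rating >= theta."""
        precisions, recalls = [], []
        for u, recs in topn.items():
            hits = len(set(recs[:n]) & relevant.get(u, set()))
            precisions.append(hits / n)
            rel = len(relevant.get(u, set()))
            recalls.append(hits / rel if rel else 0.0)
        p, r = float(np.mean(precisions)), float(np.mean(recalls))
        return p, r, (2 * p * r / (p + r) if (p + r) else 0.0)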
  • 62. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation measures (2) 62
— Quality of a list of recommendations (rank-based): DCG at rank k
. the gain brought by an item decreases with its position in the list
. computed for each user u, then averaged over all users
DCG_k = (1/#U) Σ_{u∈U} ( r_{u,p_1} + Σ_{i=2}^{k} r_{u,p_i} / log_2(i) )
where p_1, …, p_n is the recommendation list and r_{u,p_i} the true rating of user u for item p_i. nDCG is the version normalized by the "ideal DCG" (the DCG of the ideal list). The half-life measure (HL) instead assumes an exponential decrease of user interest down the list:
HL = (1/#U) Σ_{u∈U} Σ_{i=1}^{N} max(r_{u,p_i} − d, 0) / 2^{(i−1)/(α−1)}
with d a default rating and α the rank at which there is a 50% chance the user reviews the item.
— Novelty and diversity: novelty indicates the degree of difference between the recommended items and those the user already knows; diversity indicates the degree of differentiation among the recommended items. With sim(i, j) an item-to-item memory-based CF similarity and Z_u the set of n recommendations to user u:
diversity_{Z_u} = ( 1 / (#Z_u (#Z_u − 1)) ) Σ_{i∈Z_u} Σ_{j∈Z_u, j≠i} [1 − sim(i, j)]
novelty_i = ( 1 / (#Z_u − 1) ) Σ_{j∈Z_u} [1 − sim(i, j)], i ∈ Z_u
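A short per-user sketch of DCG/nDCG (illustrative; `ratings` is the list of true ratings of the recommended items, in recommended order):

    import numpy as np

    def dcg_at_k(ratings, k):
        r = np.asarray(ratings, dtype=float)[:k]
        if r.size == 0:
            return 0.0
        # gain at rank 1 kept as is, then discounted by log2 of the rank
        discounts = np.concatenate(([1.0], 1.0 / np.log2(np.arange(2, r.size + 1))))
        return float(np.sum(r * discounts))

    def ndcg_at_k(ratings, k):
        ideal = dcg_at_k(sorted(ratings, reverse=True), k)   # DCG of the ideal list
        return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0

Averaging dcg_at_k over all users gives the DCG_k of the slide.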
  • 63. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Mesures d’évaluation (3) — D’autres mesures orientées utilisateurs — Pertinence (accuracy) perçue par l’utilisateur — Familiarité : les items sont connus (leur existence) des utilisateurs — Nouveauté : découverte d’items nouveaux — Attractivité : les items attirent les utilisateurs (pas toujours le cas d’items pertinents…) — Utilité : les items ont été appréciés (après usage / lecture) — Compatibilité avec le contexte de l’utilisateur — Niveau de l’interaction — Contrôle des paramètres — Explications de la recommandation — Transparence de la méthode 63 Pu P, Chen L. A User-Centric Evaluation Framework of Recommender Systems. In : ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces ; 2010:14-22.
  • 66. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Collaborative filtering — We are "social" beings — The "others" dictate / influence our choices — Our relationships are typed (friends / enemies, family, professional relations…) — "Tell me who your friends are, and I will tell you who you are" — homophily 66. Explicit ratings vs. implicit ratings (number of accesses or citations, time spent…); the missing ratings are the ones to predict. (Extract: Table 4, a running-example RS database of ratings r_{u,i} for users u1…u5 over items i1…i14, where • marks a not-yet-rated item; notation: E_{x,y} = items recently rated by both users x and y, S_u = user u's recent votes.)
  • 68. Filtrage collaboratif : similarités et voisinages 68 Variante : item to item
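To illustrate the user-based scheme, a sketch under my own conventions (not the slides' exact system); `sim` can be any of the similarity functions of the next slide. The prediction is the user's mean rating plus the similarity-weighted, mean-centered ratings of the k most similar users who rated the item:

    import numpy as np

    def predict(u, i, ratings, sim, k=30):
        """ratings: dict user -> dict item -> rating."""
        mean = {v: np.mean(list(r.values())) for v, r in ratings.items()}
        # candidate neighbors: users (other than u) who have rated item i
        cands = [(sim(ratings[u], ratings[v]), v)
                 for v in ratings if v != u and i in ratings[v]]
        neighbors = sorted(cands, reverse=True)[:k]
        den = sum(s for s, _ in neighbors)
        if not den:
            return mean[u]
        num = sum(s * (ratings[v][i] - mean[v]) for s, v in neighbors)
        return mean[u] + num / den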
  • 69. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Which similarity functions? 69
- Pearson correlation
- Spearman correlation on the ranks
- Cosine
- Euclidean distance
- More complex metrics:
- JMSD, to integrate non-numerical information (a combination of Jaccard and of the mean squared differences, MSD)
- "Pareto optimality", to filter out the least representative individuals
- integration of the scores of the other individuals / other items
Collaborative filtering works on a table of U users who can rate I items; the prediction of a non-rated item i for user u aggregates the ratings of the neighbors G_{u,i} of u who rated i:
p_{u,i} = r̄_u + l_{u,i} Σ_{n∈G_{u,i}} sim(u, n) (r_{n,i} − r̄_n), with normalizing factor l_{u,i} = 1 / Σ_{n∈G_{u,i}} sim(u, n) when G_{u,i} ≠ ∅.
The most popular similarity metrics:
Pearson: sim(x, y) = Σ_i (r_{x,i} − r̄_x)(r_{y,i} − r̄_y) / √( Σ_i (r_{x,i} − r̄_x)² · Σ_i (r_{y,i} − r̄_y)² )
Cosine: sim(x, y) = Σ_i r_{x,i} r_{y,i} / ( √(Σ_i r²_{x,i}) · √(Σ_i r²_{y,i}) )
Constrained Pearson: the Pearson formula with the median r_med of the rating scale in place of the user means.
Spearman rank correlation: the Pearson formula applied to the ranks rank_{x,i}.
Formally, applying Pearson correlation with guarantees assumes a linear relationship between x and y, continuous random variables, and normally distributed variables; these conditions are not normally met in real recommender systems, and Pearson correlation presents some significant cases of erroneous operation. Despite these deficiencies, it yields the best prediction and recommendation results in CF-based systems and is the most commonly used metric, so any alternative metric is proposed to improve on its results.
Ortega, F., Sánchez, J. L., Bobadilla, J., Gutiérrez, A. (2013). Improving collaborative filtering-based recommender systems results using Pareto dominance. Information Sciences, 239, 50-61.
  • 71. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 71 — MSD generates very good general results (low average error, high percentage of correct predictions, low percentage of incorrect ones) but is rarely used because of its low capacity to produce new recommendations: it has an intrinsic tendency to choose as neighbors users who have rated very few items. E.g., with 7 items rated from 1 to 5 and u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• = not rated), MSD declares (u1, u3) totally similar, although u1 shares only 2 ratings with u2 and u3; choosing u1 as the most similar neighbor provides no possibilities to recommend new items. The new metric therefore combines two factors, working on values standardized to [0..1] and without any arbitrarily set parameter:
— the mean of the squared differences over co-rated items, MSD(x, y): the smaller these differences, the greater the similarity (this part yields very good accuracy);
— the Jaccard overlap, the number of items rated by both users over the number of items rated by either: Jaccard(x, y) = #(r_x ∩ r_y) / #(r_x ∪ r_y) (this factor greatly improves the capacity to make predictions);
combined as newmetric(x, y) = Jaccard(x, y) × (1 − MSD(x, y)).
On MovieLens and NetFlix, the new metric improves accuracy, coverage, percentage of perfect predictions and precision/recall over Pearson correlation: up to 9%, and up to 15% in some cases. Bobadilla, J., Serradilla, F., Bernal, J. (2010). A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6), 520-528.
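A direct Python transcription of the metric (a sketch: ratings are assumed normalized to [0..1], and the dict-based representation is my choice):

    def jmsd(rx, ry):
        """rx, ry: dicts item -> rating in [0..1] for two users."""
        common = set(rx) & set(ry)              # items rated by both users
        if not common:
            return 0.0
        msd = sum((rx[i] - ry[i]) ** 2 for i in common) / len(common)
        jaccard = len(common) / len(set(rx) | set(ry))
        return jaccard * (1.0 - msd)

    # running example of the paper: Jaccard = 0.5, MSD = 0.218...
    rx = {1: 0.75, 2: 1.0, 4: 0.5, 5: 0.25, 7: 0.0, 8: 0.0}
    ry = {1: 0.75, 2: 0.5, 3: 0.0, 4: 0.25, 6: 0.5, 7: 0.75}
    print(jmsd(rx, ry))                         # 0.5 * (1 - 0.21875) = 0.390625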
  • 74. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Le problème du démarrage à froid — Nouvelle application — Recommandation éditoriale — Encourager les utilisateurs à donner des avis — Nouvel utilisateur — Exploiter autant que possible d’autres informations sur l’utilisateur — formulaires, — amis sur les réseaux sociaux (= demander l’accès) — préférences sous forme de tags… — Nouvel item — Exploiter les méta-données (pour un film : année, réalisateur, acteurs…) — Exploiter les critiques que l’on peut trouver par ailleurs sur le Web 74
  • 78. Amazon : Organisation des objets (catégories) 78 Product Advertising API https://aws.amazon.com/ cf. http://www.codediesel.com/libraries/amazon-advertising-api-browsenodes/
  • 79. Similarités et espaces latents 79 Koren Y, Bell R, Volinsky C. Matrix Factorization Techniques for Recommender Systems. IEEE Computer. July 2009:42-50.
  • 80. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Projecting the users / items matrix 80
— Each item i is represented by a vector q_i of dimension f
— Each user u is represented by a vector p_u of dimension f
— Each factor represents a latent property that characterizes the items and captures the users' interest in that property
— The dot product between q_i and p_u is an estimate of the interest of user u for item i
— Methods:
— Singular value decomposition
— Approximation by gradient descent (on training data), minimizing over the known ratings the regularized squared error between the true rating r_{u,i} and the predicted rating q_i^T p_u:
min_{q,p} Σ_{(u,i)∈K} ( r_{u,i} − q_i^T p_u )² + λ ( ‖q_i‖² + ‖p_u‖² )
where the last term is the regularization factor and the regularization constant λ is learned by cross-validation.
  • 81. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Espaces latents (suite) — Espace non convexe : risque de solution éloignée de l’optimum global — Approche par Moindres Carrés Alternés (Alternating Least Squares) . Fixe q, cherche p ; fixe p, cherche q etc. . Utile lorsque les données (notes d’apprentissage) sont implicites (matrice non creuse) — Tenir compte des biais = modifier les valeurs prédites — Des utilisateurs ont tendance à toujours donner de bonnes notes — Certains items ont toujours tendance à avoir de bonnes notes — Le score final doit dépendre de la moyenne de tous les scores (base de départ) — Intégrer les préférences a priori des utilisateurs (x : items préférés de u ; y: attributs (âge…)) — Tenir compte de la dynamique 81
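A hedged numpy sketch of such a factorization learned by stochastic gradient descent, including the user/item biases mentioned above (a minimal version of my own, not the speaker's code; mu is the global mean rating):

    import numpy as np

    def sgd_mf(ratings, n_users, n_items, f=20, lr=0.005, reg=0.02, epochs=20, seed=0):
        """ratings: list of (u, i, r) triples; returns (mu, bu, bi, P, Q)."""
        rng = np.random.default_rng(seed)
        P = rng.normal(0, 0.1, (n_users, f))    # user factors p_u
        Q = rng.normal(0, 0.1, (n_items, f))    # item factors q_i
        bu, bi = np.zeros(n_users), np.zeros(n_items)
        mu = np.mean([r for _, _, r in ratings])
        for _ in range(epochs):
            for u, i, r in ratings:
                err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # prediction error
                bu[u] += lr * (err - reg * bu[u])
                bi[i] += lr * (err - reg * bi[i])
                P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                              Q[i] + lr * (err * P[u] - reg * Q[i]))
        return mu, bu, bi, P, Q

For implicit feedback, the ALS variant of the slide alternates closed-form least-squares solves of P given Q and of Q given P instead of these gradient steps.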
  • 88. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Integrating context — Many different definitions of context — Several integration strategies (in the cited survey: contextual pre-filtering, contextual post-filtering, contextual modeling) 88. Adomavicius G, Mobasher B, Ricci F, Tuzhilin A. Context-Aware Recommender Systems. AI Magazine. 2011;32(3):67-80.
  • 89. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Intégration du contexte (suite) — Cube Individus x Items x Contextes remplace la matrice Individus x Items — Factorisation de tenseurs 89 Karatzoglou, A.; Amatriain, X.; Baltrunas, L.; and Oliver, N. 2010. Multiverse Recommendation: N-Dimensional Tensor Factorization for Context-Aware Collaborative Filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, 79–86.
  • 90. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Exploitation des liens (réseaux sociaux) — Le réseau social comme entrée supplémentaire 90 Yang X, Guo Y, Liu Y, Steck H. A survey of collaborative filtering based social recommender systems. Computer Communications. 2014;41(C):1-10. doi:10.1016/j.comcom.2013.06.009.
  • 91. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Exploiting links (social networks) (2) — Prediction according to the links between individuals (Bayesian inference) 91. (Figure: the individual who needs a rating is connected, through intermediate individuals who gather the ratings, to the individuals who rated the item.)
  • 93. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Recommandation basée sur le contenu — Lien fort avec la Recherche d’Information — La notion de « Profil utilisateur » est à rapprocher de la notion de « Requête » 93
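To make the profile-as-query analogy concrete, a sketch with scikit-learn (all names are illustrative assumptions): the user profile is built from the texts of the items the user liked, then matched against every item description by cosine similarity, exactly like a query in IR.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def content_based_scores(item_texts, liked_ids):
        """item_texts: list of item descriptions; liked_ids: indices of liked items."""
        X = TfidfVectorizer(stop_words="english").fit_transform(item_texts)
        profile = np.asarray(X[liked_ids].mean(axis=0))   # profile = centroid of liked items
        return cosine_similarity(profile, X).ravel()      # score all items against the profile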
  • 96. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Contenu audio 96 Wang, X., Wang, Y. (2014, November). Improving content-based and hybrid music recommendation using deep learning. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 627-636). ACM.
  • 99. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Recommending Books vs Searching for Books? Very diverse needs: — Topicality — with a precise context, e.g. arts in China during the 20th century — with named entities: locations (the book is about a specific location OR the action takes place at this location), proper names… — Style / Expertise / Language — fiction, novel, essay, proceedings, position papers… — for experts / for dummies / for children… — in English, in French, in old French, in (very) local languages… — Looking for citations / references — in which book a given citation appears — which books refer to a given one — Authority: — What are the most important books about… (and what does "most important" mean?) — What are the most popular books about… 99
  • 100. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 100 http://social-book-search.humanities.uva.nl/#/overview — The Amazon collection: the documents are Amazon pages of existing books, combining editorial information (ISBN, title, number of pages…) with social data: user reviews (review fields, one per user, no forum-like discussion), user-created tags (tag fields) that reflect the trends for a product, and Amazon's own category labels (browseNode fields). Some facts about the collection: 2,781,400 pages (i.e. books), 15,785,133 reviews, 1,915,336 pages containing at least one review. Retrieval model: a language-modeling approach using Metzler and Croft's Markov Random Field model to integrate multiword phrases in the query (Sequential Dependence Model). Organizers: Marijn Koolen (University of Amsterdam), Toine Bogers (Aalborg University Copenhagen), Antal van den Bosch (Radboud University Nijmegen), Antoine Doucet (University of Caen), Maria Gäde (Humboldt University Berlin), Preben Hansen (Stockholm University), Mark Hall (Edge Hill University), Iris Hendrickx (Radboud University Nijmegen), Hugo Huurdeman (University of Amsterdam), Jaap Kamps (University of Amsterdam), Vivien Petras (Humboldt University Berlin), Michael Preminger (Oslo and Akershus University College of Applied Sciences), Mette Skov (Aalborg University Copenhagen), Suzan Verberne (Radboud University Nijmegen), David Walsh (Edge Hill University)
  • 105. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 105 Des profils « utilisateur » (catalog, reviews, ratings)
  • 106. Idea: use the reviews and comments rather than the contents 106 — Reviews contain: keywords, topics, sentiment, abstracts, other books
  • 107. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 107 — Book Recommendation / IR, SBS 2016 – Dataset: the Amazon collection of 2.8M records and its index fields (Amal Htait, Aix-Marseille Université)
  • 108. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 108 — Book Recommendation / IR, SBS 2016 – Dataset: LibraryThing collection of 113,490 user profiles. Index fields:
userid | workid | author | booktitle | publication-year | catalogue-date | rating | tags
u3266995 | 660947 | Rosina Lippi | Homestead | 1999 | 2006-06 | 10.0 | fiction
u1885143 | 2729214 | Ellen Hopkins | Glass | 2009 | 2009-05 | 6.0 | drugs
u1885143 | 133315 | Tite Kubo | Bleach, Vol. 1 | 2004 | 2009-06 | 6.0 | manga
(Amal Htait, Aix-Marseille Université)
  • 109. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 109 — Book Recommendation / IR, SBS 2016 – Topics — Query: the query is processed together with the information of the example books given in the topic (Amal Htait, Aix-Marseille Université)
  • 110. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 110 — Book Recommendation / IR, SBS 2016 – Retrieval model: SDM, weighting query terms [Metzler 2005]: ● unigram matches ● exact bigram matches ● bigram matches within an unordered window of 8 terms (Amal Htait, Aix-Marseille Université)
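For reference, the standard SDM ranking function of Metzler and Croft combines these three match types (the interpolation weights below are the commonly reported defaults, not necessarily the values used in this system):

score(Q, D) = λ_T Σ_{q∈Q} f_T(q, D) + λ_O Σ_{i=1}^{|Q|−1} f_O(q_i, q_{i+1}, D) + λ_U Σ_{i=1}^{|Q|−1} f_U(q_i, q_{i+1}, D)

where f_T, f_O and f_U are smoothed log-probabilities of single terms, of exact ordered bigrams, and of bigrams within an unordered window of 8 terms, typically with (λ_T, λ_O, λ_U) = (0.85, 0.1, 0.05).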
  • 112. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 112 Koolen, M., Bogers, T., Gäde, M., Hall, M., Hendrickx, I., Huurdeman, H., ... Walsh, D. (2016, September). Overview of the CLEF 2016 Social Book Search Lab. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 351-370). Springer International Publishing.
  • 115. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Building a Graph of Books — Nodes = books + properties (metadata, #reviews and ranking, page ranks, ratings…) — Edges = links between books — Book A refers to Book B according to:
 — Bibliographic references and citations (in the book / in the reviews)
 — Amazon recommendation (People who bought A bought B, People who liked A liked B…) — A is similar to B
 — They share bibliographic references
— Full-text similarity + similarity between the metadata 115 The graph makes it possible to estimate — « Book Ranks » (cf. Google's PageRank) — Neighborhood — Shortest paths
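A hedged sketch of that graph side with networkx (the edges below are invented placeholders for the real reference/recommendation links):

    import networkx as nx

    # directed graph: an edge A -> B means "book A refers to book B"
    # (bibliographic reference, citation in a review, Amazon link, ...)
    G = nx.DiGraph()
    G.add_edges_from([("bookA", "bookB"), ("bookA", "bookC"), ("bookC", "bookB")])

    book_rank = nx.pagerank(G, alpha=0.85)           # "Book Ranks"
    neighbors = list(G.successors("bookA"))          # neighborhood of a book
    path = nx.shortest_path(G, "bookA", "bookB")     # shortest path between two books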
  • 116. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 116 Jeh, G., Widom, J. (2002, July). SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 538-543). ACM.
 • 117. Recommending books: IR + graph mining 117 IR: Sequential Dependence Model (SDM) — Markov Random Field (Metzler & Croft, 2005) — and/or Divergence From Randomness (InL2) model + query expansion with dependence analysis. Ratings: the more reviews a book has, and the better its ratings, the more relevant it is considered. Graph: expanding the retrieved books with similar books, then reranking with PageRank.
● We tested many reranking methods, combining the retrieval model scores with other scores based on social information.
● For each document we compute:
– PageRank: an algorithm that exploits the link structure to score the importance of nodes in the graph.
– Likeliness: computed from information generated by users (reviews and ratings); the more reviews and good ratings a book has, the more interesting it is.
[Figure: graph-modeling pipeline — retrieve starting nodes from the collection, expand with neighbors and shortest-path nodes, merge and deduplicate into the final graph, then rerank.]
Graph modeling, recommendation: PageRank + Similar Products. Results, with a reranking combining IR and social evidence (see the sketch after this slide):
- Very good results in 2011 (judgements obtained by crowdsourcing) (IR and ratings): P@10 ≈ 0.58
- Good results in 2014 (IR, ratings, expansion): P@10 ≈ 0.23; MAP ≈ 0.44
- In 2015: rank 25/47 (IR + graph, but the graph improved IR): P@10 ≈ 0.2 (best system: 0.39, which included the price of books)
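To make the combination concrete, a minimal sketch of such a reranking scheme; the linear form, the weights and the exact shape of the likeliness score are assumptions, since the slides do not give the formulas used in the actual runs:

```python
import math

# Hypothetical reranking: combine the IR score (SDM / InL2) with PageRank and
# a "likeliness" score derived from reviews and ratings. In practice the three
# scores live on different scales and should be normalized first.
def likeliness(n_reviews, avg_rating):
    # More reviews and better ratings -> more interesting (one possible shape).
    return math.log1p(n_reviews) * avg_rating

def rerank_score(doc, w_ir=0.7, w_pr=0.2, w_like=0.1):
    return (w_ir * doc["ir_score"]
            + w_pr * doc["pagerank"]
            + w_like * likeliness(doc["n_reviews"], doc["avg_rating"]))

docs = [
    {"id": "bookA", "ir_score": 0.82, "pagerank": 0.004, "n_reviews": 120, "avg_rating": 4.2},
    {"id": "bookB", "ir_score": 0.90, "pagerank": 0.001, "n_reviews": 3, "avg_rating": 3.0},
]
ranking = sorted(docs, key=rerank_score, reverse=True)
```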
 • 118. P. Bellot (AMU-CNRS, LSIS-OpenEdition) A perspective: multilayer graph mining — PhD thesis of Mohamed Ettaleb (co-supervised by Pr. C. Latiri, B. Douhar and P. Bellot) 118 Books connected through several layers: a « similar to » layer, a « bought together » layer, an authors layer, a tags layer. Question: which subgraphs are frequent? how should they be interpreted?
 • 119. 119 And in real life? (for us: OpenEdition)
[Figure: the OpenEdition Lab processing chain — BILBO (citation analysis; linking book reviews to the books reviewed), ÉCHO (sentiment analysis), automatic classification and metadata, book recommendation, and a content graph connecting journals such as questions de communication, vertigo, edc, echogeo, quaderni. Methods: social IR, information extraction by Inductive Logic Programming, temporal language models and meta-feature learning, SVM, Z-score, CRF, graph scoring; outputs: ratings, polarity, graph, recommendation. Partners: OpenEdition, Univ. Recife (Brazil).]
Example of a bibliographic reference before and after BILBO annotation (TEI markup reconstructed here; the angle brackets were lost in the slide extraction):
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30. DOI : 10.3406/rfp.1986.1499
<bibl><author><surname>Langouet</surname>, <forename>G.</forename></author>, (<date>1986</date>), <title level="a">« Innovations pédagogiques et technologies éducatives »</title>, <title level="j">Revue française de pédagogie</title>, <abbr>n°</abbr> <biblScope type="issue">76</biblScope>, <abbr>pp.</abbr> <biblScope type="page">25-30</biblScope>. <idno type="DOI">DOI : 10.3406/rfp.1986.1499</idno></bibl>
BILBO addresses three levels of references: level 1, structured bibliographies; level 2, references in footnotes (e.g. « Voir Permanent Mandates Commission, Minutes of the Fifteenth Session (Geneva: League of Nations, 1929), pp. 100-1. » or « Pour plus de détails, voir Paul Ghali, Les nationalités détachées de l'Empire ottoman à la suite de la guerre (Paris: Les Éditions Domat-Montchrestien, 1934), pp. 221-6. »); level 3, allusive references embedded in running prose (e.g. a review discussing Le Roman véritable. Stratégies préfacielles au XVIIIe siècle by Jan Herman, Mladen Kozul and Nathalie Kremer).
 • 120. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Identifying book reviews in blogs • Supervised classification by « genre » • Features: unigrams, location of named entities in the text, dates • Feature selection: Z-score threshold + random forest 120
Excerpt from the underlying paper (cleaned; the extraction had duplicated several passages):
6.1 Naive Bayes (NB). As a baseline we adopted the naive Bayes approach (Zubaryeva and Savoy, 2010). The classifier chooses between two hypotheses — h0 = « it is a review » and h1 = « it is not a review » — selecting the class that maximizes Eq. (5), where |w| is the number of words in the current document:
$\arg\max_{h_i} P(h_i) \prod_{j=1}^{|w|} P(w_j \mid h_i)$ (5), with $P(w_j \mid h_i) = tf_{j,h_i} / n_{h_i}$
i.e. the probability of word $w_j$ in class $h_i$ is its lexical frequency $tf_{j,h_i}$ in the corpus of that class divided by the size $n_{h_i}$ of that corpus.
Document representations. First, the unigram model (bag-of-words), where all words are features: the training corpus contains 106,911 words; keeping the tokens that appear more than 5 times in each class leaves a vector space of 5,957 features. Second, feature selection by normalized Z-score: after removing words appearing fewer than 5 times, the first 1,000 words by Z-score are kept; the goal is to select terms that clearly belong to one genre of documents. Third, we suggest that common features of the review collection lie in the named-entity distribution in the text (Poibeau, 2003): after removing all XML/HTML tags, each text is divided into 10 parts of equal size, and the distribution ratio of each of 3 named-entity types (« authors' names », « locations », « dates ») in each part is used as a feature, giving 30 features (see the sketch after this slide).
6.2 Support Vector Machines (SVM). Introduced by Vapnik in 1995 for two-class pattern recognition, SVMs are based on the Structural Risk Minimization principle; in their basic form they learn linear threshold functions, but with an appropriate kernel they can learn RBF networks or three-layer sigmoid nets (Joachims, 1998). We used the Weka toolkit with linear and RBF kernels; the RBF kernel sometimes reaches good performance at the cost of fast growth of training time (Kummer, 2012).
6.3 Results. Experiments use a collection of documents from Revues.org and the Web, comparing Naive Bayes and SVMs (linear and RBF kernels) in terms of classification accuracy over the three representations above.
Table 4: performance on the test set (Recall / Precision / F-measure for the Review class, then for the non-Review class); representation 1 = bag-of-words, 2 = Z-score selection, 3 = named-entity distribution:
| Repr. | Model | R | P | F-M | R | P | F-M |
| 1 | NB | 65.5% | 81.5% | 72.6% | 81.6% | 65.7% | 72.8% |
| 1 | SVM (linear) | 99.6% | 98.3% | 98.9% | 97.9% | 99.5% | 98.7% |
| 1 | SVM (RBF, C=5.0, γ=0.00185) | 89.8% | 97.2% | 93.4% | 96.8% | 88.5% | 92.5% |
| 2 | NB | 90.6% | 64.2% | 75.1% | 37.4% | 76.3% | 50.2% |
| 2 | SVM (linear) | 87.2% | 81.3% | 84.2% | 75.3% | 82.7% | 78.8% |
| 2 | SVM (RBF, C=32.0, γ=0.00781) | 87.2% | 86.5% | 86.8% | 83.1% | 84.0% | 83.6% |
| 3 | NB | 80.0% | 68.4% | 73.7% | 54.2% | 68.7% | 60.6% |
| 3 | SVM (linear) | 77.0% | 81.9% | 79.4% | 78.9% | 73.5% | 76.1% |
| 3 | SVM (RBF, C=8.0, γ=0.03125) | 81.2% | 48.6% | 79.9% | 72.6% | 75.8% | 74.1% |
Highest normalized Z-scores across the corpus (first 30 features): abandonne 30.14, seront 30.00, biographie 21.84, entranent 21.20, prise 21.20, sacre 21.20, toute 20.70, quitte 19.55, dimension 15.65, les 14.43, commandement 11.01, lie 10.61, construisent 10.16, lieux 10.14, garde 9.75, winter 9.23, cleo 8.88, visible 8.75, fondamentale 8.67, david 8.54, pratiques 8.52, signification 8.47, 01 8.38, institutionnels 8.38, 1930 8.16, attaques 8.14, courrier 8.08, moyennes 7.99, petite 7.85, adapted 7.84.
[Figures 3–5: distribution of the « Person », « Location » and « Date » named entities across the 10 text segments.]
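A minimal sketch of the named-entity distribution representation described above (10 segments × 3 entity types = 30 features); the input format — a list of (token, entity-tag) pairs with tags in {PER, LOC, DATE, O} — is an assumption for illustration:

```python
# NE-distribution features: split the text into 10 parts and use the ratio of
# each entity type falling in each part. Tag names are illustrative; the
# original used "authors' names", "locations" and "dates".
def ne_distribution_features(tagged_tokens, n_parts=10, types=("PER", "LOC", "DATE")):
    n = max(len(tagged_tokens), 1)
    size = max(n // n_parts, 1)
    counts = {t: [0] * n_parts for t in types}
    totals = {t: 0 for t in types}
    for i, (_token, tag) in enumerate(tagged_tokens):
        if tag in counts:
            part = min(i // size, n_parts - 1)   # trailing tokens go to the last part
            counts[tag][part] += 1
            totals[tag] += 1
    # Ratio per (type, part); 0.0 when the entity type is absent from the text.
    return [counts[t][p] / totals[t] if totals[t] else 0.0
            for t in types for p in range(n_parts)]
```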
 • 121. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Sentiment analysis on reviews • Statistical metrics (PMI, Z-score, odds ratio…) • Combined with linguistic resources 121 [Hamdan, Béchet & Bellot, SemEval 2014] http://sentiwordnet.isti.cnr.it
Excerpt from the underlying paper (cleaned; the extraction had duplicated several passages and garbled the equations):
The Z-score of a term $t_i$ in a class $C_j$ is computed from its term relative frequency $tfr_{ij}$ in $C_j$, the mean $mean_i$ (the term's probability over the whole corpus multiplied by $n_j$, the number of terms in $C_j$) and the standard deviation $sd_i$ of $t_i$ over the underlying corpus:
$Z(t_{ij}) = \frac{tfr_{ij} - mean_i}{sd_i}$ (1)
$Z(t_{ij}) = \frac{tf_{ij} - n_j\,P(t_i)}{\sqrt{n_j\,P(t_i)\,(1 - P(t_i))}}$ (2)
A term whose frequency in one class is salient compared to the others gets a salient Z-score. Z-score was exploited for sentiment analysis by (Zubaryeva and Savoy, 2010): they chose a threshold (2) to select the terms whose Z-score exceeds it, then combined these scores with logistic regression. We use Z-scores as additional classification features because tweets are short, so many tweets contain no word with a salient Z-score (a sketch of the computation follows this slide). Figures 1–3 show the distribution of Z-scores per class: most terms score between -1.5 and 2.5 in each class, the rest being either very frequent (> 2.5) or very rare (< -1.5); a negative value indicates that the term is less frequent in this class than in the others. We tested different threshold values; the best results were obtained with threshold 3.
Table 1: the ten terms with the highest Z-score per class:
positive: Love 14.31, Good 14.01, Happy 12.30, Great 11.10, Excite 10.35, Best 9.24, Thank 9.21, Hope 8.24, Cant 8.10, Wait 8.05
negative: Not 13.99, Fuck 12.97, Don't 10.97, Shit 8.99, Bad 8.40, Hate 8.29, Sad 8.28, Sorry 8.11, Cancel 7.53, stupid 6.83
neutral: Httpbit 6.44, Httpfb 4.56, Httpbnd 3.78, Intern 3.58, Nov 3.45, Httpdlvr 3.40, Open 3.30, Live 3.28, Cloud 3.28, begin 3.17
Lexicons: Bing Liu's Opinion Lexicon, created by (Hu and Liu, 2004) and augmented in many later works; we count the positive, negative and neutral words per tweet according to these lexicons (Bing Liu's lexicon contains only positive/negative annotations; Subjectivity also contains neutral). Part of Speech (POS): each word in the tweet is POS-tagged, and we count the adjectives, verbs, nouns, adverbs and connectors per tweet.
4. Evaluation. 4.1 Data: the SemEval 2013 and 2014 data sets for subtask B, sentiment analysis in Twitter (Rosenthal et al., 2014; Wilson et al., 2013), with tweets annotated as positive, negative or neutral. Of the 9,646 training tweets, only 8,498 could be downloaded (protected profiles, deleted tweets); the 1,654-tweet development set was used to evaluate our methods, then merged with the training set to build the final model for the 2013 and 2014 test sets.
4.2 Experiments. Official results: 46.38% and 52.02% on the 2013 and 2014 test sets respectively. These results are invalid because of a software bug discovered after the submission deadline — an index-shifting error meant the test set was represented by all features except the terms — so the corrected results are reported as non-official. Non-official results, with a Multinomial Naive Bayes model and the features of Section 3: the term feature vector alone gives 49% and 46%; Z-score features improve performance by 6.5 and 10.9 points, pre-polarity features by 4 and 6 points, while extending with POS tags decreases the F-measure. Across all feature combinations, POS tags are never useful; the best result combines Z-score and pre-polarity features, and Z-score features improve the F-measure significantly more than pre-polarity features.
Table 2: average F-measures for the positive and negative classes on the SemEval 2013 / 2014 test sets:
Terms 49.42 / 46.31; Terms+Z 55.90 / 57.28; Terms+POS 43.45 / 41.14; Terms+POL 53.53 / 52.73; Terms+Z+POS 52.59 / 54.43; Terms+Z+POL 58.34 / 59.38; Terms+POS+POL 48.42 / 50.03; Terms+Z+POS+POL 55.35 / 58.58
Table 3: the same experiments after extending each tweet with a Twitter dictionary (expressions for emoticons and abbreviations); the dictionary improves the F-measure in every setting, the best result again combining Z-scores and pre-polarity features:
Terms 50.15 / 48.56; Terms+Z 57.17 / 58.37; Terms+POS 44.07 / 42.64; Terms+POL 54.72 / 54.53; Terms+Z+POS 53.20 / 56.47; Terms+Z+POL 59.66 / 61.07; Terms+POS+POL 48.97 / 51.90; Terms+Z+POS+POL 55.83 / 60.22
5. Conclusion: we tested the impact of the Twitter dictionary, sentiment lexicons, Z-score features and POS tags on tweet sentiment classification; the proposed Z-score features improve performance.
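A minimal sketch of Eq. (2) in Python; the `counts` data structure (per-class term frequencies) is an illustrative assumption:

```python
# Z-score of term t in class C under a binomial model, Eq. (2):
#   Z(t, C) = (tf_tC - n_C * P(t)) / sqrt(n_C * P(t) * (1 - P(t)))
# counts[c][t] = frequency of term t in class c (assumed input format).
import math
from collections import defaultdict

def zscores(counts):
    n = {c: sum(tf.values()) for c, tf in counts.items()}   # terms per class
    total = sum(n.values())                                  # corpus size
    tf_corpus = defaultdict(int)
    for tf in counts.values():
        for t, f in tf.items():
            tf_corpus[t] += f
    z = {}
    for c, tf in counts.items():
        for t, f in tf.items():
            p = tf_corpus[t] / total                         # P(t) over the corpus
            mean = n[c] * p
            sd = math.sqrt(n[c] * p * (1 - p))
            z[(t, c)] = (f - mean) / sd if sd else 0.0
    return z

# Usage: keep terms with z[(t, c)] above a threshold (3 worked best above).
```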
 • 125. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Linking Contents by Analyzing the References. In books: no common stylesheet (or many stylesheets, poorly respected…). Our proposal: 1) Searching for references in the document / footnotes (Support Vector Machines) 2) Annotating the references (Conditional Random Fields). BILBO: our (open-source) software for reference analysis 125 Google Digital Humanities Research Award (2012). Annotation, DOI search (Crossref). OpenEdition Journals: more than 1.5 million references analyzed. Test: http://bilbo.openeditionlab.org — Sources: http://github.com/OpenEdition/bilbo
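A minimal sketch of step 2 (CRF labeling of reference tokens), here with sklearn-crfsuite as an assumed stand-in — BILBO has its own implementation and a much richer feature set; the features and labels below are illustrative only:

```python
# Illustrative CRF tagger for bibliographic reference fields.
import sklearn_crfsuite

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "has_period": "." in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# One tokenized training reference with TEI-like labels ("c" = punctuation).
tokens = ["Langouet", ",", "G.", ",", "(", "1986", ")", ",", "Revue",
          "française", "de", "pédagogie", ",", "76", ",", "25-30", "."]
labels = ["surname", "c", "forename", "c", "c", "date", "c", "c", "title",
          "title", "title", "title", "c", "biblScope", "c", "biblScope", "c"]

X_train = [[token_features(tokens, i) for i in range(len(tokens))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
predicted = crf.predict(X_train)  # in practice: unseen references
```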
  • 127. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 127 Ollagnier, A., Fournier, S., Bellot, P. (2016). A Supervised Approach for Detecting Allusive Bibliographical References in Scholarly Publications. In WIMS (p. 36).
 • 134. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Conclusion — A great many (hybrid) approaches — Collaborative filtering and exploitation of user history — Content analysis — Exploitation of behavioral data and explicit information — Exploitation of social networks — Combine everything in a single learning model?
Which objective function should be optimized? — Strong links with other fields — Statistical methods, data and graph mining, machine learning… — Information retrieval (isn't it recommendation too?), natural language processing, image/signal analysis, ergonomics and interaction… — One must choose the approaches, but also the data — Usages and contexts — Privacy preservation 134
 • 139. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 139 Çoba, L., Zanker, M. (2016). rrecsys: an R-package for prototyping recommendation algorithms. RecSys 2016.