SOCIAL RECOMMENDATION
Patrice	Bellot

Aix-Marseille	Université	-	CNRS	(LSIS	UMR	7296)	—	OpenEdition	
patrice.bellot@univ-amu.fr
LSIS	-	DIMAG	team	http://www.lsis.org/dimag	
OpenEdition	Lab	:	http://lab.hypotheses.org
OpenEdition	home	page
>	4	million	unique	visitors	/	month
Our partners: libraries and institutions
all	over	the	world
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Some open questions…
— Is it useful to exploit metadata, contents, comments?
— How can contents be linked to one another?
— How can contents of different kinds be exploited?
— How can we "understand" readers' needs? Long queries? Profiles?
— What are the usages? What are the needs?
— How can we go beyond informational relevance? (genre, level of expertise, recent document or not…)
3
— OpenEdition Lab: a digital-humanities research programme
— Detect trends, emerging topics, the books "to read"…

P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Outline
— A few examples: posing the problems and the stakes
— Which resources?
— Some methodological generalities
— Some strategies for evaluating a recommendation
— Around collaborative filtering (= "social" recommendation?)
— Around content analysis and content suggestion
. focus on book search with long natural-language queries
4
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Introduction
Goals of recommendation:
— Recommend "objects" (films, books, Web pages…)
— Predict the ratings that individuals would give
Different types of recommendation:
— Based on knowledge: characteristics of the target individuals (age, income…)
— Based on the preferences of individuals
— explicitly expressed by the individuals themselves
— inferred by analysing their behaviour, a link with classification
— By crossing the behaviours of individuals: collaborative filtering
— By building profiles and comparing them with contents
A large number of information sources:
— Information explicitly given by the individuals
— The contents and their metadata
— The Web and social networks (contents, graphs…)
5
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 6
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
ACM conferences and workshops
— Conferences:
— Recommender Systems RecSys (since 2007)
— "Recommendation Systems" sessions at SIGIR, CIKM…
— Workshops:
— Context-aware Movie Recommendation (2010+2011)
— Information Heterogeneity and Fusion in Recommender Systems (2010+2011)
— Large-Scale Recommender Systems and the Netflix Prize Competition (2008)
— Recommendation Systems for Software Engineering (2008-14)
— Recommender Systems and the Social Web (2012)
7
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
"Recommender systems" papers
ACM RecSys conference (https://recsys.acm.org)
8
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 9
Overview of the approaches
10
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
EXAMPLES
11
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 12
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 13
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 14
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 15
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 16
(2015)
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/20-Spotify_in_NumbersStarted_in_2006
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 17
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 18
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 19
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 20
Amazon navigation

Graph: YASIV
21
http://www.yasiv.com/#/Search?q=orwell&category=Books&lang=US
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 22
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 23
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 24
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 25
Numerous considerations
26
Bobadilla	J,	Ortega	F,	Hernando	A,	Gutiérrez	A.	Recommender	systems	survey.	Knowledge-Based	Systems.	2013;46(C):109-132.	doi:10.1016/j.knosys.2013.03.012.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
RESOURCES
27
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 28
Some data collections
29
Table 1. Main parameters of the databases used in the experiments.
                     MovieLens    FilmAffinity      NetFlix
Number of users          4,382          26,447      480,189
Number of movies         3,952          21,128       17,770
Number of ratings    1,000,209      19,126,278  100,480,507
Min and max values         1-5            1-10          1-5
30
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
The MovieLens
Datasets
31
Harper, F. M., & Konstan, J. A. (2016). The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4), 19.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 32
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 33
https://labrosa.ee.columbia.edu/millionsong/lastfm
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 34
https://labrosa.ee.columbia.edu/millionsong/lastfm
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 35
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 36
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 37
http://webscope.sandbox.yahoo.com/catalog.php?datatype=r
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 38
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 39
http://files.grouplens.org/datasets/hetrec2011/hetrec2011-delicious-readme.txt
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 40
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
METHODS:
GENERALITIES
41
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
"State of the art" papers
42
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 43
https://www.slideshare.net/xamat/recommender-systems-machine-learning-summer-school-2014-cmu
"Individuals" and "data"
44
Let T be a table crossing n individuals I (rows) and K quantitative variables X (columns); x_{i,k} is the value of variable k for individual i:

             X_1      X_2     ...    X_K     (variables)
I_1        x_{1,1}  x_{1,2}   ...  x_{1,K}
I_2        x_{2,1}  x_{2,2}   ...  x_{2,K}
...                  x_{i,k}
I_n        x_{n,1}  x_{n,2}   ...  x_{n,K}
(individuals)

One of the goals of data analysis is to determine profiles of individuals, in other words classes of individuals that resemble each other. This resemblance is determined from the values of the variables associated with the individuals.
Another goal concerns the variables themselves: computing the correlations between them (to what extent an evolution of the values of one entails an evolution of the values of the other, and in what way), regression between variables (formulating the links between variables)… Principal Component Analysis (PCA) concerns the linear relations between variables, as opposed to quadratic, logarithmic or exponential relations, for example. PCA belongs to the factor-analysis methods, which determine factors from the values of the variables associated with the individuals.
P. Bellot
• Data analysis can be conducted according to

• the individuals: looking for resemblance between individuals (as a function of the values of the variables) = automatic classification (clustering) of the individuals

• the variables: which variables best explain the data (the differences between individuals)? What are the principal components? Where is the greatest variability?
Study of the individuals / study of the variables
45
 temp <- data.frame(temperature[1:12])
 cl <- kmeans(temp, 3, iter.max = 2, nstart = 15)
e) visualise the classes:
 summary(cl)
 cl$cluster
 summary(cl$cluster)
 cl$center
f) Add the result of the classification to the data
- use the cluster package to access the clusplot function:  library(cluster)
- then:
 aggregate(temperature, by = list(cl$cluster), FUN = mean)
 cl2 <- data.frame(temperature, cl$cluster)
 clusplot(temperature, cl$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
5- "Bonus" question: working with the APCluster package
Install the APCluster package
[Figure: PCA of monthly temperatures of European cities. Individuals factor map (Dim 1: 86.87%, Dim 2: 11.42%): Amsterdam, Athens, Berlin, Brussels, … grouped by region (East, North, South, West). Variables factor map: the months January…December, Annual mean, Amplitude, Latitude, Longitude.]
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 46
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 47
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 48
P. Bellot
PCA and dimensionality reduction
• A way to represent clouds of individuals in a few dimensions

— preserving the distances between individuals as well as possible

— favouring the dimensions of greatest variability (iterative selection of the factors that maximise the variance), as in the sketch below

= application of a projection function
49
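To make the projection function concrete, here is a minimal PCA sketch in Python (standardised variables, SVD of the centred matrix); the function name and the toy data are illustrative, not taken from the deck.

import numpy as np

def pca_project(X, n_components=2):
    """Project individuals onto the principal components.
    X: (n_individuals, n_variables) numeric matrix.
    Returns projected coordinates and the share of variance per component."""
    # Centre and scale each variable, as in a standardised PCA
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    # Eigen-decomposition of the covariance matrix via SVD
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    # Projection = linear map of each individual onto the top components
    coords = Xc @ Vt[:n_components].T
    return coords, explained[:n_components]

# Toy usage: 5 individuals, 3 variables
X = np.array([[1., 2., 3.], [2., 1., 4.], [3., 3., 2.],
              [4., 2., 5.], [5., 4., 1.]])
coords, var = pca_project(X)
print(coords.shape, var)  # (5, 2) and the Dim 1 / Dim 2 variance shares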
P. Bellot 50
Learning methods
• Different forms of learning

• A "student" agent copies the "teacher" agent -- provide examples

• Reasoning by induction (from examples)

• Learning important characteristics

• Detection of recurring patterns

• Adjustment of important parameters

• Transformation of information into knowledge
Examples -- Model -- Test -- Correction / Enrichment of the examples
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Statistical and probabilistic approaches
Machine learning
51
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001)
[Figure: the chain-structured case of CRFs for sequences: labels Y_{i-1}, Y_i, Y_{i+1} linked in a chain, each conditioned on the observations X_{i-1}, X_i, X_{i+1}.]
Both training algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997). The model scores a label sequence by

p(y|x) ∝ exp( Σ_j λ_j t_j(y_{i-1}, y_i, x, i) + Σ_k μ_k s_k(y_i, x, i) )    (2)

where t_j(y_{i-1}, y_i, x, i) is a transition feature function of the entire observation sequence and the labels at positions i and i-1 in the label sequence; s_k(y_i, x, i) is a state feature function of the label at position i and the observation sequence; λ_j and μ_k are parameters to be estimated from training data.
When defining feature functions, we construct a set of real-valued features b(x, i) of the observation to express some characteristic of the empirical distribution of the training data that should also hold of the model distribution. An example of such a feature is

b(x, i) = 1 if the observation at position i is the word "September", 0 otherwise.

Each feature function takes on the value of one of these real-valued observation features b(x, i) if the current state (in the case of a state function) or previous and current states (in the case of a transition function) take on particular values. For example, consider the following transition function:

t_j(y_{i-1}, y_i, x, i) = b(x, i) if y_{i-1} = IN and y_i = NNP, 0 otherwise.

Notation is simplified by writing s(y_i, x, i) = s(y_{i-1}, y_i, x, i) and F_j(y, x) = Σ_{i=1}^{n} f_j(y_{i-1}, y_i, x, i), where each f_j(y_{i-1}, y_i, x, i) is either a state function or a transition function. This allows the probability of a label sequence y given an observation sequence x to be written as

p(y|x, λ) = (1/Z(x)) exp( Σ_j λ_j F_j(y, x) )    (3)

where Z(x) is a normalization factor.
for classification because tweets are too short: many tweets do not have any word with a salient Z_score. Figures 1, 2 and 3 show the distribution of Z_score over each class; the majority of terms have a Z_score between -1.5 and 2.5 in each class, and the rest are either very frequent (> 2.5) or very rare (< -1.5). A negative value means that the term is infrequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_score in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.
positive   Z_score  |  negative  Z_score  |  neutral    Z_score
Love        14.31   |  Not        13.99   |  Httpbit     6.44
Good        14.01   |  Fuck       12.97   |  Httpfb      4.56
Happy       12.30   |  Don't      10.97   |  Httpbnd     3.78
Great       11.10   |  Shit        8.99   |  Intern      3.58
Excite      10.35   |  Bad         8.40   |  Nov         3.45
Best         9.24   |  Hate        8.29   |  Httpdlvr    3.40
Thank        9.21   |  Sad         8.28   |  Open        3.30
Hope         8.24   |  Sorry       8.11   |  Live        3.28
Cant         8.10   |  Cancel      7.53   |  Cloud       3.28
Wait         8.05   |  stupid      6.83   |  begin       3.17
Table 1. The first ten terms having the highest Z_score in each class
- Sentiment Lexicon Features (POL)
We used two sentiment lexicons, the MPQA Subjectivity Lexicon (Wilson, Wiebe et al. 2005) and Bing Liu's Opinion Lexicon (Hu and Liu 2004), and we extract the number of positive, negative and neutral words in tweets according to these lexicons.
- Part Of Speech (POS)
We annotate each word with its POS tag, and then we count the adjectives, verbs, nouns and adverbs in each tweet.
The Z_score of each term t_i in a class C_j is computed from its term relative frequency tfr_ij in that class, the mean (mean_i), which is the term probability over the whole corpus multiplied by n_j, the number of terms in the class C_j, and the standard deviation (sd_i) of term t_i according to the underlying corpus:

Z_score(t_ij) = (tfr_ij - mean_i) / sd_i    Eq. (1)
with mean_i = n_j × P(t_i) and sd_i = sqrt( n_j × P(t_i) × (1 - P(t_i)) )    Eq. (2)

A term which has a salient frequency in a class in comparison to the others will have a salient Z_score. Z_score was exploited for sentiment analysis by (Zubaryeva and Savoy 2010): they choose a threshold (2) for selecting the terms whose Z_score exceeds the threshold, then use a logistic regression to combine these scores. We use Z_scores as added features for classification because tweets are too short, so many tweets have no word with a salient Z_score.
4 Evaluation. Data collection: we used the data sets provided by SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Nakov, Kozareva et al. 2013). The participants were provided with training tweets annotated as positive, negative or neutral; we downloaded them with the provided script (some could not be retrieved because of protected profiles) and used the development sets for evaluating our models. (The official SemEval results of this system were affected by a software bug discovered after the submission deadline; corrected results were therefore reported as non-official.)
Which words are characteristic of a group of documents?
Which significant relations can be extracted from the observed forms alone?
Analogies, correlations
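As an illustration of Eqs. (1)-(2), here is a small Python sketch that computes the Z_score of each term in each class; it assumes whitespace-tokenised documents grouped by class, and all names are mine.

import math
from collections import Counter

def z_scores(docs_by_class):
    """Z_score of each term in each class (Eqs. 1-2): a term that is
    markedly more frequent in a class than expected from the whole
    corpus gets a high (salient) Z_score in that class."""
    class_counts = {c: Counter(t for d in docs for t in d)
                    for c, docs in docs_by_class.items()}
    corpus = Counter()
    for counts in class_counts.values():
        corpus.update(counts)
    total = sum(corpus.values())
    scores = {}
    for c, counts in class_counts.items():
        n_j = sum(counts.values())          # number of terms in class C_j
        for term, tf in counts.items():
            p = corpus[term] / total        # term probability over the corpus
            mean = n_j * p                  # expected frequency in C_j
            sd = math.sqrt(n_j * p * (1 - p))
            scores[(term, c)] = (tf - mean) / sd if sd > 0 else 0.0
    return scores

docs = {"positive": [["love", "good"], ["good", "happy"]],
        "negative": [["bad", "hate"], ["sad", "bad"]]}
print(sorted(z_scores(docs).items(), key=lambda kv: -kv[1])[:4])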
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 52
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 53
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommendation and time series
54
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
EVALUATION
55
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation grid
56
A. Constructs and Questions of ResQue
The following contains the questionnaire statements that can be used in a survey. They are developed based on the ResQue model described in this paper. Users should be asked to indicate their answers to each of the questions using 1-5 Likert scales, where 1 indicates "strongly disagree" and 5 is "strongly agree."
A1. Quality of Recommended Items
A.1.1 Accuracy
The items recommended to me matched my interests.*
The recommender gave me good suggestions.
I am not interested in the items recommended to me (reverse scale).
A.1.2 Relative Accuracy
The recommendation I received better fits my interests than what I may receive from a friend.
A recommendation from my friends better suits my interests than the recommendation from this system (reverse scale).
A.1.3 Familiarity
Some of the recommended items are familiar to me.
I am not familiar with the items that were recommended to me (reverse scale).
A.1.4 Attractiveness
The items recommended to me are attractive.
A.1.5 Enjoyability
I enjoyed the items recommended to me.
A.1.6 Novelty
The items recommended to me are novel and interesting.*
The recommender system is educational.
The recommender system helps me discover new products.
I could not find new items through the recommender (reverse scale).
A.1.6 Diversity
The items recommended to me are diverse.*
The items recommended to me are similar to each other (reverse scale).*
A.1.7 Context Compatibility
I was only provided with general recommendations.
The items recommended to me took my personal context requirements into consideration.
The recommendations are timely.
A2. Interaction Adequacy
The recommender provides an adequate way for me to express my preferences.
The recommender provides an adequate way for me to revise my preferences.
The recommender explains why the products are recommended to me.*
A3. Interface Adequacy
The recommender's interface provides sufficient information.
The information provided for the recommended items is sufficient for me.
The labels of the recommender interface are clear and adequate.
The layout of the recommender interface is attractive and adequate.*
On average, between 12 and 15 questions were used in recent studies. Based on this previous work, we have synthesized and organized a total of 15 questions as a simplified model for the purpose of performing a quick and easy usability and adoption evaluation of a recommender (see questions with * sign).
5. CONCLUSION AND FUTURE WORK
User evaluation of recommender systems is a crucial subject of study that requires a deep understanding, development and testing of the right dimensions (or constructs) and the standardization of the questions used. The framework described in this paper presents the first attempt to develop a complete and balanced evaluation framework that measures users' subjective attitudes based on their experience of a recommender.
ResQue consists of a set of 13 constructs and 60 questions for a high-quality recommender system from the user point of view and can be used as a standard guideline for a user evaluation. It can also be adapted to a custom-made user evaluation by tailoring it to an individual research context. Researchers and practitioners can use these questionnaires with ease to measure users' general satisfaction with recommenders, their readiness to adopt the technology, and their intention to purchase recommended items and return to the site in the future.
After ResQue was finalized, we asked several expert researchers in the community of recommender systems to review the model. Their feedback and comments were then incorporated into the final version of the model. This method, known as the Delphi method, is one of the first validation attempts on the model. Since the work was submitted, we have started conducting a survey to further validate the model's reliability, validity and sensitivity using factor analysis, structural equation modeling (SEM), and other techniques described in [21]. Initial results based on 150 participants indicate how the model can be interpreted and show factors that correspond to the original model. At the same time, the analysis also gives some indications on how to refine the model. More users are expected to participate in the survey and the final outcome will soon be reported.
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
57
A4. Perceived Ease of Use
A.4.1 Ease of Initial Learning
I became familiar with the recommender system very quickly.
I easily found the recommended items.
Looking for a recommended item required too much effort (reverse scale).
A.4.2 Ease of Preference Elicitation
I found it easy to tell the system about my preferences.
It is easy to learn to tell the system what I like.
It required too much effort to tell the system what I like (reverse scale).
A.4.3 Ease of Preference Revision
I found it easy to make the system recommend different things to me.
It is easy to train the system to update my preferences.
I found it easy to alter the outcome of the recommended items due to my preference changes.
It is easy for me to inform the system if I dislike/like the recommended item.
It is easy for me to get a new set of recommendations.
A.4.4 Ease of Decision Making
Using the recommender to find what I like is easy.
I was able to take advantage of the recommender very quickly.
I quickly became productive with the recommender.
Finding an item to buy with the help of the recommender is easy.*
Finding an item to buy, even with the help of the recommender, consumes too much time.
The recommender made me more confident about my selection/decision.
The recommended items made me confused about my choice (reverse scale).
The recommender can be trusted.
A5. Perceived Usefulness
The recommended items effectively helped me find the ideal product.*
[6]Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
58
The recommended items influence my selection of products.
I feel supported to find what I like with the help of the recommender.*
I feel supported in selecting the items to buy with the help of the recommender.
A6. Control/Transparency
I feel in control of telling the recommender what I want.
I don't feel in control of telling the system what I want.
I don't feel in control of specifying and changing my preferences (reverse scale).
I understood why the items were recommended to me.
The system helps me understand why the items were recommended to me.
The system seems to control my decision process rather than me (reverse scale).
A7. Attitudes
Overall, I am satisfied with the recommender.*
I am convinced of the products recommended to me.*
I am confident I will like the items recommended to me.*
A8. Behavioral Intentions
A.8.1 Intention to Use the System
If a recommender such as this exists, I will use it to find products to buy.
A.8.2 Continuance and Frequency
I will use this recommender again.*
I will use this type of recommender frequently.
I prefer to use this type of recommender in the future.
A.8.3 Recommendation to Friends
I will tell my friends about this recommender.*
A.8.4 Purchase Intention
I would buy the items recommended, given the opportunity.*
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 59
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 60
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures
— Prediction quality: Mean Absolute Error, Root Mean Squared Error, Coverage
— Recommendation quality: Precision, Recall, F1-measure
61
MAE = (1/#U) Σ_{u∈U} [ (1/#O_u) Σ_{i∈O_u} |p_{u,i} - r_{u,i}| ]    (1)

RMSE = (1/#U) Σ_{u∈U} sqrt( (1/#O_u) Σ_{i∈O_u} (p_{u,i} - r_{u,i})² )    (2)

The coverage can be defined as the capacity of predicting from a metric applied to a specific RS. In short, it calculates the percentage of situations in which at least one k-neighbor of each active user can rate an item that has not been rated yet by that active user. We define K_{u,i} as the set of neighbors of u which have rated the item i, and the coverage of the system as the average of the users' coverage. Let

C_u = { i ∈ I | r_{u,i} = • ∧ K_{u,i} ≠ ∅ },  D_u = { i ∈ I | r_{u,i} = • }

coverage = (1/#U) Σ_{u∈U} 100 × (#C_u / #D_u)    (3)

Goldberg et al. [87] focus on the aspects not related to the evaluation; Breese et al. [43] compare the predictive accuracy of various methods in a set of representative problem domains. The majority of articles discuss attempted improvements to the accuracy of RS results (RMSE, MAE, etc.). It is also common to attempt an improvement in recommendations (precision, recall, etc.). However, additional objectives should be considered for generating greater user satisfaction [253], such as topic diversification and coverage serendipity. Currently, the field has a growing interest in generating algorithms with diverse and innovative recommendations, even at the expense of accuracy and precision. To evaluate these aspects, various metrics have been proposed to measure recommendation novelty and diversity [105,220].
Some frameworks aid in defining and standardizing the methods and algorithms employed by RS as well as the mechanisms to evaluate the quality of the results. Among the most significant papers that propose CF frameworks are Herlocker et al. [92], which evaluates the following: similarity weight, significance weighting, variance weighting, selecting neighborhood and rating normalization; Hernández and Gaudioso [95], which proposes a framework in which the RS is formed by two different subsystems, one of them to model the user and the other to provide useful/interesting items; the framework of [125], which introduces levels of abstraction in the CF process, making modifications in the RS more flexible; and Antunes et al. [12], which presents an evaluation framework assuming that evaluation is an evolving process during the system life cycle. The majority of RS evaluation frameworks proposed until now present two deficiencies, the first being the lack of formalization: although the evaluation metrics are well defined, an implementation of the methods that is not fully specified can lead to differences between similar experiments.
4.2. Quality of the set of recommendations: precision, recall and F1
The confidence of users in a certain RS does not depend directly on the accuracy of the set of possible predictions. A user gains confidence in the RS when this user agrees with a reduced set of recommendations made by the RS. In this section, we define the three most widely used recommendation quality measures: (1) precision, which indicates the proportion of relevant recommended items out of the total number of recommended items, (2) recall, which indicates the proportion of relevant recommended items out of the number of relevant items, and (3) F1, which is a combination of precision and recall.
Let X_u be the set of recommendations to user u, and Z_u the set of n recommendations to user u. We represent the precision, recall and F1 measures for recommendations obtained by making n test recommendations to the user u, taking a relevancy threshold θ. Assuming that all users accept n test recommendations:

precision = (1/#U) Σ_{u∈U} #{ i ∈ Z_u | r_{u,i} ≥ θ } / n    (4)

recall = (1/#U) Σ_{u∈U} #{ i ∈ Z_u | r_{u,i} ≥ θ } / ( #{ i ∈ Z_u | r_{u,i} ≥ θ } + #{ i ∈ Z_u^c | r_{u,i} ≥ θ } )    (5)

F1 = 2 × precision × recall / (precision + recall)    (6)
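A compact Python sketch of Eqs. (1), (2), (4) and (5), assuming per-user dictionaries of predicted and true ratings and top-n recommendation lists; the data structures and names are illustrative.

import math

def mae_rmse(preds, truths):
    """Eqs. (1)-(2): average over users of the per-user error."""
    mae = rmse = 0.0
    for u in preds:
        errs = [abs(preds[u][i] - truths[u][i]) for i in preds[u]]
        mae += sum(errs) / len(errs)
        rmse += math.sqrt(sum(e * e for e in errs) / len(errs))
    return mae / len(preds), rmse / len(preds)

def precision_recall(topn, truths, theta):
    """Eqs. (4)-(5): Z_u = top-n recommendations, theta = relevancy threshold."""
    P = R = 0.0
    for u, recs in topn.items():
        relevant_recs = [i for i in recs if truths[u].get(i, 0) >= theta]
        all_relevant = [i for i in truths[u] if truths[u][i] >= theta]
        P += len(relevant_recs) / len(recs)
        R += len(relevant_recs) / max(len(all_relevant), 1)
    return P / len(topn), R / len(topn)

preds = {"u1": {"i1": 4.2, "i2": 2.9}}
truths = {"u1": {"i1": 5, "i2": 3}}
print(mae_rmse(preds, truths))                                  # (0.45, ~0.57)
print(precision_recall({"u1": ["i1", "i2"]}, truths, theta=4))  # (0.5, 1.0)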
4.3. Quality of the list of recommendations: rank measures
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures (2)
— Quality of a list of recommendations (taking ranks into account): DCG at rank k:

. the gain brought by an item is inversely related to its position in the list
. computed for each user u, then averaged over all users
nDCG is the version normalised by the "ideal DCG" (the ideal list)
— Novelty and diversity
62
The standard information-retrieval measures used are the following: (a) half-life (7) [43], which assumes an exponential decrease in the interest of users as they move away from the recommendations at the top, and (b) discounted cumulative gain (8) [17], wherein the decay is logarithmic:

HL = (1/#U) Σ_{u∈U} Σ_{i=1}^{N} max(r_{u,p_i} - d, 0) / 2^{(i-1)/(α-1)}    (7)

DCG_k = (1/#U) Σ_{u∈U} ( r_{u,p_1} + Σ_{i=2}^{k} r_{u,p_i} / log₂(i) )    (8)

p_1, …, p_n represents the recommendation list, r_{u,p_i} represents the true rating of the user u for the item p_i, k is the rank of the evaluated item, d is the default rating, and α is the number of the item on the list such that there is a 50% chance the user will review that item.
4.4. Novelty and diversity
The novelty evaluation measure indicates the degree of difference between the items recommended to and known by the user. The diversity quality measure indicates the degree of differentiation among recommended items. Currently, novelty and diversity measures do not have a standard; therefore, different authors propose different metrics [163,220]. Certain authors [105] have used the following:

diversity_{Z_u} = 1/(#Z_u (#Z_u - 1)) Σ_{i∈Z_u} Σ_{j∈Z_u, j≠i} [1 - sim(i, j)]    (9)

novelty_i = 1/(#Z_u - 1) Σ_{j∈Z_u} [1 - sim(i, j)],  i ∈ Z_u    (10)

Here, sim(i, j) indicates an item-to-item memory-based CF similarity measure, and Z_u the set of n recommendations to user u.
4.5. Stability
The stability of the predictions and recommendations influences the users' trust towards the RS. A RS is stable if the predictions it provides do not change strongly over a short period of time. Adomavicius and Zhang [4] propose a quality measure of stability, MAS (Mean Absolute Shift).
[Fig. 7 (ibid.): overview of recommender systems evaluation measures.]
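A short Python sketch of DCG_k (8), its normalised variant nDCG, and the diversity measure (9); the similarity function sim is assumed to be supplied by the caller.

import math

def dcg_at_k(ratings, k):
    """Eq. (8) for one user: true ratings of the recommended list, by rank."""
    gain = ratings[0] if ratings else 0.0
    for i, r in enumerate(ratings[1:k], start=2):
        gain += r / math.log2(i)
    return gain

def ndcg_at_k(ratings, k):
    """DCG normalised by the 'ideal DCG' of the same ratings sorted best-first."""
    ideal = dcg_at_k(sorted(ratings, reverse=True), k)
    return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0

def diversity(items, sim):
    """Eq. (9): average pairwise dissimilarity of a recommendation list Z_u."""
    pairs = [(i, j) for i in items for j in items if i != j]
    return sum(1 - sim(i, j) for i, j in pairs) / len(pairs)

print(dcg_at_k([3, 2, 3, 0, 1], k=5))   # 3 + 2/1 + 3/log2(3) + 0 + 1/log2(5)
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))  # < 1: the list is not ideally ordered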
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Evaluation measures (3)
— Other user-oriented measures
— Relevance (accuracy) as perceived by the user
— Familiarity: the items (their existence) are known to the users
— Novelty: discovery of new items
— Attractiveness: the items attract the users (not always the case for relevant items…)
— Usefulness: the items were appreciated (after use / reading)
— Compatibility with the user's context
— Level of interaction
— Control over the parameters
— Explanations of the recommendation
— Transparency of the method
63
Pu	P,	Chen	L.	A	User-Centric	Evaluation	Framework	of	Recommender	Systems.	In	:	ACM	RecSys	2010	Workshop	on	User-Centric	Evaluation	of	
Recommender	Systems	and	Their	Interfaces	;	2010:14-22.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
COLLABORATIVE FILTERING
64
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 65
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Collaborative filtering
We are "social" beings
— "Others" dictate / influence our choices
— Our relations are typed (friends / enemies, family, professional relations…)
— "Tell me who your friends are and I will tell you who you are": homophily
66
Notation (Bobadilla et al.):
C: pairs (user, item) that have not been voted for and whose k-neighbors allow a prediction
D: pairs (user, item) that have not been voted for
E_xy: items that have recently been voted for by both user x and user y
S_u: user u's recent votes
[Table 4 (ibid.): running example RS database: users u1…u5 × items i1…i14, ratings in 1-5, • = not rated.]
2.3. Obtaining a user's K-neighbors
Explicit ratings vs. implicit ratings (number of accesses or citations, time spent…)
Ratings to be predicted
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 67
Collaborative filtering: similarities and neighborhoods
68
Variant: item-to-item
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Which similarity functions?
- Pearson correlation
- Spearman rank correlation
- Cosine
- Euclidean distance
- More complex metrics:
- JMSD, to integrate non-numerical information (a combination of Pearson and Jaccard)
- "Pareto optimality", to filter out the least representative individuals
- Integration of the scores of the other individuals / other items
(a code sketch follows the formulas below)
69
The prediction of the rating of user u for item i aggregates the ratings of the neighbors G_{u,i} of u who have rated i:

p_{u,i} = r̄_u + l_{u,i} Σ_{n∈G_{u,i}} sim(u, n) (r_{n,i} - r̄_n),  G_{u,i} ≠ ∅

where l serves as a normalizing factor, usually l_{u,i} = 1 / Σ_{n∈G_{u,i}} sim(u, n).
The most popular similarity metrics are Pearson correlation (6), cosine (7), constrained Pearson's correlation (8) and Spearman rank correlation (9):

sim(x, y) = Σ_i (r_{x,i} - r̄_x)(r_{y,i} - r̄_y) / sqrt( Σ_i (r_{x,i} - r̄_x)² Σ_i (r_{y,i} - r̄_y)² )    (6)

sim(x, y) = Σ_i r_{x,i} r_{y,i} / ( sqrt(Σ_i r²_{x,i}) sqrt(Σ_i r²_{y,i}) )    (7)
Publications and reviews also exist which include the most commonly accepted metrics, aggregation approaches and evaluation measures (mean absolute error, coverage, precision, recall and derivatives of these: mean squared error, normalized mean absolute error, ROC and fallout); Goldberg et al. [13] focus on the aspects not related to the evaluation, and Breese et al. [6] compare the predictive accuracy of various methods in a set of representative problem domains. Candillier et al. [7] and Schafer et al. [36] review the main collaborative filtering methods proposed in the literature.
2. Approach and design of the new similarity metric
Collaborative filtering methods work on a table of U users who can rate I items. The prediction of a non-rated item i for a user u is computed by aggregating the ratings of similar users, as above.
sim(x, y) = Σ_i (r_{x,i} - r_med)(r_{y,i} - r_med) / ( sqrt(Σ_i (r_{x,i} - r_med)²) sqrt(Σ_i (r_{y,i} - r_med)²) ),  r_med: median value in the rating scale    (8)

sim(x, y) = Σ_i (rank_{x,i} - r̄ank_x)(rank_{y,i} - r̄ank_y) / sqrt( Σ_i (rank_{x,i} - r̄ank_x)² Σ_i (rank_{y,i} - r̄ank_y)² )    (9)

Although Pearson correlation is the most commonly used metric in the process of memory-based CF (user to user), this choice is not always backed by the nature and distribution of the data in the RS. Formally, in order to be able to apply this metric with guarantees, the following assumptions must be met:
- Linear relationship between x and y.
- Continuous random variables.
- Both variables must be normally distributed.
These conditions are not normally met in real RS, and Pearson correlation presents some significant cases of erroneous operation that should not be ignored in RS. Despite these deficiencies, this similarity measure presents the best prediction and recommendation results in CF-based RS [15,16,31,7,35]; furthermore, it is the most commonly used, and therefore any alternative metric proposed must improve on its results. On accepting that Pearson correlation is the metric whose results must be improved, but not necessarily the most appropriate to be taken as a base, it is advisable to focus on the information that is obtained in the different research processes and which can sometimes be overlooked.
Given the lists of ratings of two generic users x and y, using standardized values in [0..1] (• = not rated):

r_x = (0.75, 1, •, 0.5, 0.25, •, 0, 0)
r_y = (0.75, 0.5, 0, 0.25, •, 0.5, 0.75, •)

We define the cardinality #l of a list l as the number of its elements different from •.
(1) We obtain the list d_{x,y}, where d^i_{x,y} = (r^i_x - r^i_y)² for all i such that r^i_x ≠ • and r^i_y ≠ •, and d^i_{x,y} = • otherwise (10); in our example: d_{x,y} = (0, 0.25, •, 0.0625, •, •, 0.5625, •).
(2) We obtain the MSD(x, y) measure by computing the arithmetic average of the values in the list d_{x,y}:

MSD(x, y) = ( Σ_{i=1..I, d^i_{x,y} ≠ •} d^i_{x,y} ) / #d_{x,y}    (11)

in our example: (0 + 0.25 + 0.0625 + 0.5625)/4 = 0.218. MSD(x, y) tends towards zero as the ratings of users x and y become more similar and towards 1 as they become more different (the votes are normalized to the interval [0..1]).
(3) We obtain the Jaccard(x, y) measure by computing the proportion between the number of positions in [1..I] in which there are elements different from • in both r_x and r_y and the number of positions in which there are elements different from • in r_x or in r_y:

Jaccard(x, y) = |r_x ∩ r_y| / |r_x ∪ r_y| = #d_{x,y} / (#r_x + #r_y - #d_{x,y})    (12)

in our example: 4/(6 + 6 - 4) = 0.5.
(4) We combine the above elements in the final equation:

newmetric(x, y) = Jaccard(x, y) × (1 - MSD(x, y))    (13)

in the running example: 0.5 × (1 - 0.218) ≈ 0.39.
(The main parameters of the MovieLens, FilmAffinity and NetFlix databases used in the experiments are those of Table 1 above.)
Ortega, F., Sánchez, J. L., Bobadilla, J., & Gutiérrez, A. (2013). Improving collaborative filtering-based recommender systems results using Pareto dominance. Information Sciences, 239, 50-61.
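The sketch announced above: Pearson correlation (6) and the JMSD metric (13) in Python, replayed on the paper's running example; None stands for the "•" (not rated) symbol.

import math

NOT_RATED = None  # the "•" symbol used above

def pearson(rx, ry):
    """Pearson correlation (6) over commonly rated items."""
    common = [i for i in range(len(rx))
              if rx[i] is not NOT_RATED and ry[i] is not NOT_RATED]
    mx = sum(rx[i] for i in common) / len(common)
    my = sum(ry[i] for i in common) / len(common)
    num = sum((rx[i] - mx) * (ry[i] - my) for i in common)
    den = math.sqrt(sum((rx[i] - mx) ** 2 for i in common) *
                    sum((ry[i] - my) ** 2 for i in common))
    return num / den if den else 0.0

def jmsd(rx, ry):
    """New metric (13): Jaccard(x, y) * (1 - MSD(x, y)), ratings in [0..1]."""
    common = [i for i in range(len(rx))
              if rx[i] is not NOT_RATED and ry[i] is not NOT_RATED]
    msd = sum((rx[i] - ry[i]) ** 2 for i in common) / len(common)
    nx = sum(r is not NOT_RATED for r in rx)
    ny = sum(r is not NOT_RATED for r in ry)
    jaccard = len(common) / (nx + ny - len(common))
    return jaccard * (1 - msd)

# Running example of the paper (normalized ratings, None = not rated)
rx = [0.75, 1, None, 0.5, 0.25, None, 0, 0]
ry = [0.75, 0.5, 0, 0.25, None, 0.5, 0.75, None]
print(round(jmsd(rx, ry), 3))     # 0.5 * (1 - 0.219) = 0.391
print(round(pearson(rx, ry), 3))  # Pearson on the same example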
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 70
Coverage / Recall
CORR: Pearson; COS: cosine; EUC: Euclidean; MSD: mean squared differences
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 71
commonly used due to its low capacity to produce new recommendations.
MSD offers both a great advantage and a great disadvantage at the same time. The advantage is that it generates very good general results: low average error, high percentage of correct predictions and low percentage of incorrect predictions. The disadvantage is that it has an intrinsic tendency to choose, as users similar to a given user, those users who have rated a very small number of items [35]. E.g. if we have 7 items that can be rated from 1 to 5 and three users u1, u2, u3 with the following ratings: u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• means a not-rated item), the MSD metric will indicate that (u1, u3) have a total similarity (0), (u1, u2) have a similarity 0.5 and (u2, u3) have a lower similarity (0.6). This situation is not convincing: intuitively we realize u2 and u3 are very similar, whilst u1 is only similar to u2 and u3 on 2 ratings, and therefore it is not logical to choose it as the most similar to them; what is worse, if it is chosen it will not provide us with possibilities to recommend new items.
The strategy followed in designing the new metric is to considerably raise the capacity of MSD to generate predictions, without losing along the way its good behavior as regards accuracy and quality of the results. The metric designed is based on two factors:
- The similarity between two users calculated as the mean of the squared differences (MSD): the smaller these differences, the greater the similarity between the 2 users. This part of the metric enables very good accuracy results to be obtained.
- The number of items rated by both users relative to the total number of items rated by either of the two users. E.g. given users u1: (3, 2, 4, •, •, •) and u2: (•, 4, 4, 3, •, 1), a common rating has been made on two items out of a joint total of five rated items. This factor enables us to greatly improve the metric's capacity to make predictions.
An important design aspect is the decision not to use a parameter whose value would have to be set arbitrarily, i.e. the result provided by the metric should be obtained by taking only the values of the ratings provided by the users of the RS. By working on the 2 factors with standardized values [0..1], the metric obtained is the one given above, for the rating lists (r_x, r_y) of 2 generic users x and y, where I is the number of items of the RS.
Fig. 4 (Bobadilla et al. 2010). Measurements related to the Jaccard metric on MovieLens: (A) number of pairs of users displaying each Jaccard value; (B) averaged MAE obtained for the pairs of users with each Jaccard value; (C) averaged coverages obtained for the pairs of users with each Jaccard value.
Fig. 5 (ibid.). MAE and coverage obtained with Pearson correlation and by combining Jaccard with Pearson correlation, cosine, constrained Pearson's correlation, Spearman rank correlation and mean squared differences: (A) MAE, (B) coverage. MovieLens 1M, 20% of test users, 20% of test items, k ∈ [2..1500] step 25.
Bobadilla, J., Serradilla, F., & Bernal, J. (2010). A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6), 520-528.
The comparative results in Graph 6B show improvements of up to 9% when applying the new metric as regards the correlation, and even 15% in some cases.
Fig. 6 (ibid.). Pearson correlation and new metric comparative results using MovieLens: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 20% of test users, 20% of test items, k ∈ [2..1500] step 50, N ∈ [2..20], θ = 5.
Fig. 7 (ibid.). Correlation and new metric comparative results using NetFlix: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 5% of test users, 20% of test items, k ∈ [2..10000] step 100, N ∈ [2..20], θ = 9.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 72
Michael	D.	Ekstrand,	Michael	Ludwig,	Joseph	A.	Konstan,	and	John	T.	Riedl.	2011.	Rethinking	The	Recommender	Research	Ecosystem:	
Reproducibility,	Openness,	and	LensKit.	In	Proceedings	of	the	Fifth	ACM	Conference	on	Recommender	Systems	(RecSys	’11).	ACM,	New	York,	NY,	
USA,	133-140.	DOI=10.1145/2043932.2043958.
Evaluation by cross-validation
73
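A minimal sketch of the hold-out protocol used in the experiments cited above (e.g. 20% of test users / test items): split each user's ratings into train and test sets, then score predictions on the held-out part; names and shares are illustrative.

import random

def split_ratings(ratings, test_share=0.2, seed=0):
    """Hold out test_share of each user's ratings for evaluation."""
    rng = random.Random(seed)
    train, test = {}, {}
    for u, items in ratings.items():
        keys = list(items)
        rng.shuffle(keys)
        cut = max(1, int(len(keys) * test_share))
        test[u] = {i: items[i] for i in keys[:cut]}
        train[u] = {i: items[i] for i in keys[cut:]}
    return train, test

ratings = {"u1": {"i1": 5, "i2": 3, "i3": 4, "i4": 2, "i5": 4}}
train, test = split_ratings(ratings)
print(len(train["u1"]), len(test["u1"]))  # 4 1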
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
The cold-start problem
— New application
— Editorial recommendation
— Encourage users to give ratings and reviews
— New user
— Exploit as much other information about the user as possible
— forms,
— friends on social networks (= ask for access)
— preferences expressed as tags…
— New item
— Exploit the metadata (for a film: year, director, actors…)
— Exploit the reviews that can be found elsewhere on the Web
74
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 75
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 76
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 77
Amazon: organization of items (categories)
78
Product Advertising API https://aws.amazon.com/
cf.	http://www.codediesel.com/libraries/amazon-advertising-api-browsenodes/
Similarities and latent spaces
79
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Projection de la matrice individus / items
— Chaque item I est représenté par un vecteur q de dimension f
— Chaque utilisateur U est représenté par un vecteur p de dimension f
— Chaque facteur représente une propriété latente qui caractérise les items et qui souligne l'intérêt des utilisateurs pour celle-ci
— Le produit scalaire entre q et p est une estimation de l'intérêt de U pour I
— Méthode :
— Décomposition en valeurs singulières
— Approximation par descente de gradient (sur des données d'apprentissage)
80
Fonction objectif (d'après Koren et al., 2009), avec $r_{ui}$ la note réelle, $q_i^{\top}p_u$ la note prédite et $\lambda\left(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\right)$ le facteur de régularisation ($\lambda$ : constante de régularisation, apprise par validation croisée) :

$$\min_{q,p}\ \sum_{(u,i)} \left(r_{ui} - q_i^{\top}p_u\right)^2 + \lambda\left(\lVert q_i\rVert^2 + \lVert p_u\rVert^2\right)$$
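Esquisse minimale en Python de l'approximation par descente de gradient stochastique (hypothèses : notes en mémoire, pas de biais ni de critère d'arrêt) :

    import numpy as np

    def mf_sgd(ratings, n_users, n_items, f=20, lr=0.01, lam=0.1, epochs=20):
        """ratings : triplets (u, i, r). Minimise l'erreur quadratique
        régularisée (cf. la fonction objectif ci-dessus) par descente de
        gradient stochastique."""
        rng = np.random.default_rng(0)
        P = rng.normal(scale=0.1, size=(n_users, f))   # vecteurs p des utilisateurs
        Q = rng.normal(scale=0.1, size=(n_items, f))   # vecteurs q des items
        for _ in range(epochs):
            for u, i, r in ratings:
                e = r - P[u] @ Q[i]                    # erreur : note réelle - note prédite
                p_u = P[u].copy()
                P[u] += lr * (e * Q[i] - lam * P[u])   # pas de gradient régularisé
                Q[i] += lr * (e * p_u - lam * Q[i])
        return P, Q

    P, Q = mf_sgd([(0, 0, 4.0), (0, 1, 1.0), (1, 0, 5.0)], n_users=2, n_items=2)
    print(P[0] @ Q[1])   # estimation de l'intérêt de l'utilisateur 0 pour l'item 1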
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Espaces latents (suite)
— Problème non convexe : risque de solution éloignée de l'optimum global
— Approche par Moindres Carrés Alternés (Alternating Least Squares)
. Fixe q, cherche p ; fixe p, cherche q, etc.
. Utile lorsque les données (notes d'apprentissage) sont implicites (matrice non creuse)
— Tenir compte des biais = modifier les valeurs prédites (voir la formule ci-dessous)
— Des utilisateurs ont tendance à toujours donner de bonnes notes
— Certains items ont toujours tendance à avoir de bonnes notes
— Le score final doit dépendre de la moyenne de tous les scores (base de départ)
— Intégrer les préférences a priori des utilisateurs (x : items préférés de u ; y : attributs (âge…))
— Tenir compte de la dynamique
81
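Avec les biais, la note prédite s'écrit (formule de Koren et al., 2009, cités ci-après), où $\mu$ est la moyenne globale des notes, $b_u$ le biais de l'utilisateur et $b_i$ celui de l'item :

$$\hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top}p_u$$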
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 82
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/48-Diversity_Scorenote
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 83
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 84
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 85
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 86
Koren	Y,	Bell	R,	Volinsky	C.	Matrix	Factorization	Techniques	for	Recommender	Systems.	IEEE	Computer.	July	2009:42-50.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Filtrage collaboratif à destination de « groupes »
87
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Intégration du contexte
— Très nombreuses définitions du contexte
— Plusieurs stratégies d’intégration
88
Adomavicius G, Mobasher B, Ricci F, Tuzhilin A. Context-Aware Recommender Systems. AI Magazine. 2011;32(3):67-80.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Intégration du contexte (suite)
— Cube Individus x Items x Contextes remplace la matrice Individus x Items
— Factorisation de tenseurs
89
Karatzoglou, A.; Amatriain, X.; Baltrunas, L.; and Oliver, N. 2010. Multiverse Recommendation: N-Dimensional Tensor Factorization
for Context-Aware Collaborative Filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, 79–86.
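Sous forme de décomposition de type Tucker, telle qu'utilisée dans l'article « Multiverse Recommendation » cité ci-dessus (notations simplifiées), la note prédite pour le triplet (utilisateur u, item i, contexte c) s'écrit :

$$\hat{r}_{u,i,c} = \sum_{a=1}^{f_U}\sum_{b=1}^{f_I}\sum_{d=1}^{f_C} s_{a,b,d}\; U_{u,a}\; I_{i,b}\; C_{c,d}$$

où $U$, $I$ et $C$ sont les matrices de facteurs latents des utilisateurs, des items et des contextes, et $S$ le tenseur central appris.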
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Exploitation des liens (réseaux sociaux)
— Le réseau social comme entrée supplémentaire
90
Yang	X,	Guo	Y,	Liu	Y,	Steck	H.	A	survey	of	collaborative	filtering	based	social	recommender	systems.	Computer	
Communications.	2014;41(C):1-10.	doi:10.1016/j.comcom.2013.06.009.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Exploitation des liens (réseaux sociaux) (2)
— Prédiction selon les liens entre individus (inférence Bayésienne)
91
individu	qui	cherche	une	note
individus	qui	ont	noté	l’item
individus	intermédiaires	
qui	réunissent	les	notes
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
FILTRAGE SELON LE
CONTENU
92
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommandation basée sur le contenu
— Lien fort avec la Recherche d’Information
— La notion de « Profil utilisateur » est à rapprocher de la notion de « Requête »
93
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 94
https://www.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify/81-81NLP_models_also_work_on
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Un mot, une chose ? pas si simple…
95
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Contenu audio
96
Wang, X., & Wang, Y. (2014, November). Improving content-based and hybrid music recommendation using deep learning. In Proceedings of the 22nd ACM International Conference on Multimedia (pp. 627-636). ACM.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
LA RECOMMANDATION DE
LECTURES (LIVRES)
97
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 98
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Recommending Books vs Searching for Books ?
Very diverse needs:
— Topicality
— With a precise context, e.g. arts in China during the 20th century
— With named entities: locations (the book is about a specific location OR the action takes place at this location), proper names…
— Style / Expertise / Language
— fiction, novel, essay, proceedings, position papers…
— for experts / for dummies / for children…
— in English, in French, in old French, in (very) local languages…
— Looking for citations / references
— in what book does a given citation appear
— what are the books that refer to a given one
— Authority:
— What are the most important books about… (what does "most important" mean?)
— What are the most popular books about…
99
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 100
http://social-book-search.humanities.uva.nl/#/overview
2 The Amazon collection
The document collection used for this year's Book Track is composed of Amazon pages of existing books. These pages consist of editorial information such as ISBN number, title, number of pages, etc. However, in this collection the most important content resides in the social data. Indeed, Amazon is social-oriented, and users can comment on and rate products they purchased or they own. Reviews are identified by the review fields and are unique for a single user: Amazon does not allow a forum-like discussion. Users can also assign tags of their own creation to a product. These tags are useful for refining the search of other users in that they are not fixed: they reflect the trends for a specific product. In the XML documents, they can be found in the tag fields. Apart from this user classification, Amazon provides its own category labels, which are contained in the browseNode fields.
Table 1. Some facts about the Amazon collection.
Number of pages (i.e. books): 2,781,400
Number of reviews: 15,785,133
Number of pages that contain at least one review: 1,915,336
3 Retrieval model
3.1 Sequential Dependence Model
Like the previous year, we used a language modeling approach to retrieval [4]. We use Metzler and Croft's Markov Random Field (MRF) model [5] to integrate multiword phrases in the query. Specifically, we use the Sequential Dependence Model (SDM).
Organizers
Marijn Koolen (University of Amsterdam)
Toine Bogers (Aalborg University Copenhagen)
Antal van den Bosch (Radboud University Nijmegen)
Antoine Doucet (University of Caen)
Maria Gäde (Humboldt University Berlin)
Preben Hansen (Stockholm University)
Mark Hall (Edge Hill University)
Iris Hendrickx (Radboud University Nijmegen)
Hugo Huurdeman (University of Amsterdam)
Jaap Kamps (University of Amsterdam)
Vivien Petras (Humboldt University Berlin)
Michael Preminger (Oslo and Akershus University College of Applied Sciences)
Mette Skov (Aalborg University Copenhagen)
Suzan Verberne (Radboud University Nijmegen)
David Walsh (Edge Hill University)
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 101
http://social-book-search.humanities.uva.nl
SBS Collection : des requêtes réelles issues du forum LibraryThing
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 102
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 103
Le	catalogue	de		
la	personne	qui	
pose	la	question
Social Tagging
104
They complement the categories, but there are a lot of tags!
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 105
Des profils « utilisateur » (catalog, reviews, ratings)
Idée : utiliser les critiques et commentaires plutôt que les contenus
106
Les commentaires contiennent :
- keywords
- topics
- sentiment
- abstracts
- other books
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 107
6
La Recommandation de Livres / RI
SBS 2016 – Dataset : Amazon collection of 2.8M records
Index Fields
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 108
7
La Recommandation de Livres / RI
SBS 2016 – Dataset : LibraryThing Collection of 113,490 users profiles
userid workid author booktitle publication-year catalogue-date rating tags
u3266995 660947 Rosina Lippi Homestead 1999 2006-06 10.0 fiction
u1885143 2729214 Ellen Hopkins Glass 2009 2009-05 6.0 drugs
u1885143 133315 Tite Kubo Bleach, Vol. 1 2004 2009-06 6.0 manga
Index Fields
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 109
8
La Recommandation de Livres / RI
SBS 2016 - Topics Query : Traitement de la requête par
les Informations des Livres en Exemples
Université Aix-Marseille Amal Htait
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 110
9
La Recommandation de Livres / RI
SBS 2016 - Retrieval Model : Méthode - SDM
Weighting query terms [Metzler2005]
● Unigram matches
● Bigram exact matches
● Bigram matches within an unordered window of 8 terms
Université Aix-Marseille Amal Htait
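Le score SDM combine linéairement ces trois types de correspondances (formule de Metzler & Croft ; les poids $\lambda_T = 0{,}85$, $\lambda_O = 0{,}1$, $\lambda_U = 0{,}05$ donnés ici à titre indicatif sont ceux couramment utilisés dans la littérature) :

$$score_{SDM}(Q, D) = \lambda_T \sum_{q \in Q} f_T(q, D) + \lambda_O \sum_{i=1}^{|Q|-1} f_O(q_i, q_{i+1}, D) + \lambda_U \sum_{i=1}^{|Q|-1} f_U(q_i, q_{i+1}, D)$$

où $f_T$, $f_O$ et $f_U$ sont les fonctions de pondération (log-vraisemblance lissée) des unigrammes, des bigrammes exacts et des bigrammes en fenêtre non ordonnée.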
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 111
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 112
Koolen, M., Bogers, T., Gäde, M., Hall, M., Hendrickx, I., Huurdeman, H., ... & Walsh, D. (2016, September). Overview of the CLEF 2016 Social Book Search Lab. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 351-370). Springer International Publishing.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 113
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 114
http://ceur-ws.org/Vol-1609/
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Building a Graph of Books
— Nodes = books + properties (metadata, #reviews and ranking, page ranks, ratings…)
— Edges = links between books
— Book A refers to Book B according to:
— Bibliographic references and citations (in the book / in the reviews)
— Amazon recommendations (People who bought A bought B, People who liked A liked B…)
— A is similar to B:
— They share bibliographic references
— Full-text similarity + similarity between the metadata
115
The graph makes it possible to estimate:
— « Book Ranks » (cf. Google's PageRank)
— Neighborhood
— Shortest paths
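À titre d'illustration, une esquisse minimale (Python + networkx ; identifiants de livres hypothétiques) de la construction d'un tel graphe et du calcul d'un « Book Rank » de type PageRank :

    import networkx as nx

    g = nx.DiGraph()
    # arêtes « A renvoie vers B » (références bibliographiques, recommandations Amazon…)
    g.add_edges_from([("bookA", "bookB"), ("bookA", "bookC"),
                      ("bookB", "bookC"), ("bookC", "bookA")])

    book_rank = nx.pagerank(g, alpha=0.85)        # importance des nœuds dans le graphe
    neighbors = list(g.successors("bookA"))       # voisinage direct d'un livre
    path = nx.shortest_path(g, "bookA", "bookC")  # plus court chemin entre deux livres

    print(book_rank, neighbors, path)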
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 116
Jeh, G., & Widom, J. (2002, July). SimRank: a measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 538-543). ACM.
Recommending books : IR + graph mining
117
IR : Sequential Dependence Model (SDM) - Markov Random Field (Metzler & Croft, 2005) and/or Divergence From Randomness (InL2) model + Query Expansion with Dependence Analysis
Ratings : the more reviews a book has and the better its ratings, the more relevant it is considered.
Graph : expanding the retrieved books with Similar Books, then reranking with PageRank
13
● We tested many reranking methods, combining the retrieval model scores with other scores based on social information.
● For each document we compute:
– PageRank: an algorithm that exploits the link structure to score the importance of nodes in the graph.
– Likeliness: computed from information generated by users (reviews and ratings). The more reviews and good ratings a book has, the more interesting it is.
Graph Modeling – Reranking Schemes
12
Graph Modeling - Recommendation
(Schéma : retrieving sur la collection pour la requête ti, sélection des nœuds de départ, extension aux voisins et aux nœuds des plus courts chemins, suppression des doublons, puis reranking vers l'ensemble final Dfinal.)
Page Rank + Similar Products
- Very good results in 2011 (judgements obtained by crowdsourcing) (IR and ratings) : P@10 ≈ 0.58
- Good results in 2014 (IR, ratings, expansion) : P@10 ≈ 0.23 ; MAP ≈ 0.44
- In 2015 : rank 25/47 (IR + graph, but the graph improved IR) : P@10 ≈ 0.2 (best 0.39, which included the price of books)
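Une façon simple de combiner ces signaux pour le reranking (esquisse indicative : les poids et les noms de variables sont hypothétiques, la combinaison exacte n'étant pas détaillée ici) :

    def rerank_score(ir_score, pagerank, likeliness, w=(0.7, 0.2, 0.1)):
        """Combinaison linéaire de scores normalisés dans [0..1] :
        modèle de recherche (SDM/InL2), PageRank et « likeliness »."""
        return w[0] * ir_score + w[1] * pagerank + w[2] * likeliness

    docs = {"bookA": (0.8, 0.3, 0.9), "bookB": (0.6, 0.7, 0.4)}
    ranked = sorted(docs, key=lambda d: rerank_score(*docs[d]), reverse=True)
    print(ranked)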
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Une perspective: fouille de graphes multicouches
— Thèse de Mohamed Ettaleb (co-dirigée par Pr. C. Latiri, B. Douhar, P. Bellot)
118
(Graphe multicouche de livres : couche « similaire à », couche « achetés ensemble », couche auteurs, couche tags.)
Question : quels sous-graphes fréquents ? comment les interpréter ?
119
Et dans la vraie vie ? (pour nous : OpenEdition)
(Schéma OpenEdition Lab : BILBO, ÉCHO, classification automatique et métadonnées, recommandation, graphe de contenus ; revues concernées : questions de communication, vertigo, edc, echogeo, quaderni.)
BILBO - MISE EN RELATION DES COMPTES-RENDUS AVEC LES LIVRES
ÉCHO - ANALYSE DES SENTIMENTS
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30.
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30.
DOI : 10.3406/rfp.1986.1499
18 Voir Permanent Mandates Commission, Minutes of the Fifteenth Session (Geneva: League of Nations, 1929), pp. 100-1. Pour plus de détails, voir Paul Ghali, Les nationalités détachées de l'Empire ottoman à la suite de la guerre (Paris: Les Éditions Domat-Montchrestien, 1934), pp. 221-6.
ils ont déjà édité trois recueils et auxquelles ils ont consacré de nombreux travaux critiques. Leur nouvel ouvrage, intitulé Le Roman véritable. Stratégies préfacielles au XVIIIe siècle et rédigé à six mains par Jan Herman, Mladen Kozul et Nathalie Kremer – chaque auteur se chargeant de certains chapitres au sein
BILBO
NIVEAU 1
NIVEAU 2
NIVEAU 3
<bibl><author><surname>Langouet</surname>, <forename>G.</forename></author>, (<date>1986</date>), <title level="a">« Innovations pédagogiques et technologies éducatives »</title>, <title level="j">Revue française de pédagogie</title>, <abbr>n°</abbr> <biblScope type="issue">76</biblScope>, <abbr>pp.</abbr> <biblScope type="page">25-30</biblScope>. <idno type="DOI">DOI : 10.3406/rfp.1986.1499</idno></bibl>
RI sociale
Extraction d'information par Programmation Logique Inductive
Modèles de langue temporels et apprentissage de méta-caractéristiques
OpenEdition
OpenEdition
Univ. Recife (Brésil)
Extraction	d’information
Chercher	des	critiques	
Les relier aux livres
Analyse	de	sentiments Recommandation	de	livres
SVM	-	Z-score	-	CRF Graph	scoring
NOTES	
POLARITE
GRAPHE
RECOMMANDATION
Analyse	de	citations
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Identifier des critiques de livres dans des blogs
• Classification supervisée « en genre »
• Caractéristiques : unigrammes, localisation des entités nommées, dates
• Sélection de caractéristiques : Seuil du Z-score + random forest
120
6 Experiments

In this section we describe results from experiments using a collection of documents from Revues.org and the Web. We use supervised learning methods to build our classifiers, and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Naive Bayes, Support Vector Machines with RBF and Linear kernels) in terms of classification accuracy. We also explored alternative document representations (bag-of-words, feature selection using z-score, Named Entity repartition in the text).

6.1 Naive Bayes (NB)

In order to evaluate different classification models, we have adopted as a baseline the naive Bayes approach (Zubaryeva and Savoy, 2010). The classification system has to choose between two possible hypotheses, h0 = "It is a Review" and h1 = "It is not a Review", taking the class that has the maximum value according to Equation (5), where |w| indicates the number of words included in the current document and wj denotes the j-th word appearing in the document:

$$\arg\max_{h_i}\ P(h_i)\,\prod_{j=1}^{|w|} P(w_j \mid h_i) \qquad (5)$$

with $P(w_j \mid h_i) = \dfrac{tf_{j,h_i}}{n_{h_i}}$, i.e. the relation between the lexical frequency of the word wj in the collection of class hi (denoted $tf_{j,h_i}$) and the size $n_{h_i}$ of the corresponding corpus.

6.2 Support Vector Machines (SVM)

SVM designates a learning approach introduced by Vapnik in 1995 for solving two-class pattern recognition problems (Vapnik, 1995). The SVM method is based on the Structural Risk Minimization principle (Vapnik, 1995) from computational learning theory. In their basic form, SVMs learn linear threshold functions. Nevertheless, by a simple plug-in of an appropriate kernel function, they can be used to learn linear classifiers, radial basis function (RBF) networks, and three-layer sigmoid neural nets (Joachims, 1998). The key in such classifiers is to determine the optimal boundaries between the different classes and to use them for the purposes of classification (Aggarwal and Zhai, 2012). Using the vectors from the different representations presented below, we used the Weka toolkit to learn the model. With the linear kernel and the Radial Basis Function (RBF), this model sometimes allows reaching a good level of performance at the cost of a fast growth of the processing time during the learning stage (Kummer, 2012).

6.3 Results

We have used different strategies to represent each textual unit. First, the unigram model (Bag-of-Words) where all words are considered as features. We also used feature selection based on the normalized z-score, keeping the first 1000 words according to this score (after removing all words that appear less than 5 times). As the third approach, we suggested that the common features of the Review collection can be located in the Named Entity distribution in the text.

In our training corpus, we have 106,911 words obtained from the Bag-of-Words approach. We selected all tokens (features) that appear more than 5 times in each class. The goal is therefore to design a method capable of selecting terms that clearly belong to one genre of documents. We obtained a vector space that contains 5,957 words (features). After calculating the normalized z-score of all features, we selected the first 1,000 features according to this score.

We also explore the distribution of 3 named entities ("authors' names", "locations" and "dates") in the text (Poibeau, 2003), after removing all XML-HTML tags. We divided the texts into 10 parts (the size of each part = total number of words / 10). The distribution ratio of each named entity in each part is used as a feature to build the new document representation, and we obtained a set of 30 features.

Figure 3: "Person" named entity distribution. Figure 4: "Location" named entity distribution. Figure 5: "Date" named entity distribution.

Table 4: Performances of the classification models using the three indexing schemes on the test set (R = recall, P = precision, F-M = F-measure).

Scheme 1 (bag-of-words):
NB: Review R 65.5%, P 81.5%, F-M 72.6% ; non-Review R 81.6%, P 65.7%, F-M 72.8%
SVM (Linear): Review R 99.6%, P 98.3%, F-M 98.9% ; non-Review R 97.9%, P 99.5%, F-M 98.7%
SVM (RBF, C = 5.0, γ = 0.00185): Review R 89.8%, P 97.2%, F-M 93.4% ; non-Review R 96.8%, P 88.5%, F-M 92.5%

Scheme 2 (z-score feature selection):
NB: Review R 90.6%, P 64.2%, F-M 75.1% ; non-Review R 37.4%, P 76.3%, F-M 50.2%
SVM (Linear): Review R 87.2%, P 81.3%, F-M 84.2% ; non-Review R 75.3%, P 82.7%, F-M 78.8%
SVM (RBF, C = 32.0, γ = 0.00781): Review R 87.2%, P 86.5%, F-M 86.8% ; non-Review R 83.1%, P 84.0%, F-M 83.6%

Scheme 3 (named-entity distribution):
NB: Review R 80.0%, P 68.4%, F-M 73.7% ; non-Review R 54.2%, P 68.7%, F-M 60.6%
SVM (Linear): Review R 77.0%, P 81.9%, F-M 79.4% ; non-Review R 78.9%, P 73.5%, F-M 76.1%
SVM (RBF, C = 8.0, γ = 0.03125): Review R 81.2%, P 48.6%, F-M 79.9% ; non-Review R 72.6%, P 75.8%, F-M 74.1%

Z-scores across the corpus (top 30 features): abandonne 30.14 ; seront 30.00 ; biographie 21.84 ; entranent 21.20 ; prise 21.20 ; sacre 21.20 ; toute 20.70 ; quitte 19.55 ; dimension 15.65 ; les 14.43 ; commandement 11.01 ; lie 10.61 ; construisent 10.16 ; lieux 10.14 ; garde 9.75 ; winter 9.23 ; cleo 8.88 ; visible 8.75 ; fondamentale 8.67 ; david 8.54 ; pratiques 8.52 ; signification 8.47 ; 01 8.38 ; institutionnels 8.38 ; 1930 8.16 ; attaques 8.14 ; courrier 8.08 ; moyennes 7.99 ; petite 7.85 ; adapted 7.84.
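À titre d'illustration, une esquisse minimale en Python (scikit-learn en remplacement de l'outillage Weka décrit ci-dessus ; documents et étiquettes hypothétiques) de la classification « critique / non critique » avec sac de mots et SVM linéaire :

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    docs = ["Ce livre propose une analyse...", "Appel à communications..."]
    labels = ["review", "not_review"]          # étiquettes hypothétiques

    # Sac de mots (unigrammes ; sur un vrai corpus, min_df=5 comme ci-dessus)
    clf = make_pipeline(CountVectorizer(min_df=1), LinearSVC())
    clf.fit(docs, labels)
    print(clf.predict(["Leur nouvel ouvrage, intitulé..."]))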
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
L’analyse de sentiments sur les critiques
• Statistical Metrics (PMI, Z-score, odd ratio…)
• Combined with Linguistic Ressources
121
We compute the Z_score of each term ti in a class Cj (tij) by calculating its term relative frequency tfrij in the class Cj, as well as the mean (meani), which is the term probability over the whole corpus multiplied by nj, the number of terms in the class Cj, and the standard deviation (sdi) of the term ti according to the underlying corpus (see Eq. (1, 2)):

$$Z_{score}(t_{ij}) = \frac{tfr_{ij} - mean_i}{sd_i} \qquad (1)$$

$$Z_{score}(t_{ij}) = \frac{tfr_{ij} - n_j\,P(t_i)}{\sqrt{n_j\,P(t_i)\,\bigl(1 - P(t_i)\bigr)}} \qquad (2)$$

A term which has a salient frequency in a class in comparison to the others will have a salient Z_score. The Z_score was exploited for SA by (Zubaryeva and Savoy 2010): they chose a threshold (2) for selecting the number of terms having a Z_score above the threshold, then used a logistic regression for combining these scores. We use Z_scores as added features for classification because the tweet is too short, and therefore many tweets do not have any words with a salient Z_score. Figures 1, 2 and 3 show the distribution of the Z_score over each class; we remark that the majority of terms have a Z_score between -1.5 and 2.5 in each class and that the rest are either very frequent (> 2.5) or very rare (< -1.5). A negative value indicates that the term is not frequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_scores in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.

Table 1. The first ten terms having the highest Z_score in each class.
positive: Love 14.31, Good 14.01, Happy 12.30, Great 11.10, Excite 10.35, Best 9.24, Thank 9.21, Hope 8.24, Cant 8.10, Wait 8.05
negative: Not 13.99, Fuck 12.97, Don't 10.97, Shit 8.99, Bad 8.40, Hate 8.29, Sad 8.28, Sorry 8.11, Cancel 7.53, stupid 6.83
neutral: Httpbit 6.44, Httpfb 4.56, Httpbnd 3.78, Intern 3.58, Nov 3.45, Httpdlvr 3.40, Open 3.30, Live 3.28, Cloud 3.28, begin 3.17

- Sentiment lexicons: Bing Liu's Opinion Lexicon, created by (Hu and Liu 2004) and augmented in many later works. We extract the number of positive, negative and neutral words in tweets according to these lexicons. Bing Liu's lexicon only contains negative and positive annotations, but the Subjectivity lexicon contains negative, positive and neutral ones.

- Part Of Speech (POS): we annotate each word in the tweet by its POS tag, and then compute the number of adjectives, verbs, nouns, adverbs and connectors in each tweet.

4 Evaluation

4.1 Data collection

We used the data sets provided in SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Wilson, Kozareva et al. 2013). The participants were provided with training tweets annotated as positive, negative or neutral. We downloaded these tweets using a given script. Among the 9,646 tweets, we could only download 8,498 of them because of protected profiles and deleted tweets. Then we used the development set, containing 1,654 tweets, for evaluating our methods. We combined the development set with the training set and built a new model which predicted the labels of the 2013 and 2014 test sets.

4.2 Experiments

Official results. The results of our system submitted for the SemEval evaluation gave 46.38% and 52.02% for the 2013 and 2014 test sets respectively. It should be mentioned that these results are not correct because of a software bug discovered after the submission deadline; the correct results are therefore presented as non-official results. In fact, the previous results are the output of our classifier trained with all the features of Section 3, but, because of an index shifting error, the test set was represented by all the features except the terms.

Non-official results. We carried out various experiments using the features presented in Section 3 with a Multinomial Naïve-Bayes model. We first constructed the feature vector of tweet terms, which gave 49.42% and 46.31%; extending it with Z_score features improves the performance by 6.5% and 10.9%, and pre-polarity features also improve the f-measure by 4% and 6%, but extending with POS tags decreases the f-measure. We also tested all the combinations of these features; Table 2 shows the results of each combination. We remark that POS tags are not useful over all the experiments; the best result is obtained by combining Z_score and pre-polarity features. We find that Z_score features improve the f-measure significantly and that they are better than pre-polarity features.

Figure 1: Z_score distribution in the positive class. Figure 2: Z_score distribution in the neutral class.

Table 2. Average f-measures for the positive and negative classes of the SemEval 2013 and 2014 test sets.
Features : 2013 / 2014
Terms : 49.42 / 46.31
Terms+Z : 55.90 / 57.28
Terms+POS : 43.45 / 41.14
Terms+POL : 53.53 / 52.73
Terms+Z+POS : 52.59 / 54.43
Terms+Z+POL : 58.34 / 59.38
Terms+POS+POL : 48.42 / 50.03
Terms+Z+POS+POL : 55.35 / 58.58

We repeated all the previous experiments after using a twitter dictionary with which we extend each tweet by the expressions related to its emoticons and abbreviations. The results in Table 3 show that using this dictionary improves the f-measure over all the experiments; the best results are again obtained by combining Z_score and pre-polarity features.

Table 3. Average f-measures for the positive and negative classes of the SemEval 2013 and 2014 test sets after using a twitter dictionary.
Features : 2013 / 2014
Terms : 50.15 / 48.56
Terms+Z : 57.17 / 58.37
Terms+POS : 44.07 / 42.64
Terms+POL : 54.72 / 54.53
Terms+Z+POS : 53.20 / 56.47
Terms+Z+POL : 59.66 / 61.07
Terms+POS+POL : 48.97 / 51.90
Terms+Z+POS+POL : 55.83 / 60.22

5 Conclusion

In this paper we tested the impact of using a Twitter dictionary, sentiment lexicons, Z_score features and POS tags for the sentiment classification of tweets. We extended the feature vector of tweets with all these features; we proposed the Z_score as a new type of feature and demonstrated that it can improve the performance.

[Hamdan, Béchet & Bellot, SemEval 2014]
http://sentiwordnet.isti.cnr.it
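Esquisse minimale en Python du calcul du Z_score de l'Eq. (2) (noms de fonctions hypothétiques ; les termes sont supposés déjà tokenisés) :

    import math
    from collections import Counter

    def z_scores(class_tokens, corpus_tokens):
        """Z_score normalisé de chaque terme d'une classe (Eq. 2) : écart entre
        la fréquence observée dans la classe et la fréquence attendue sous la
        distribution binomiale estimée sur le corpus entier."""
        n_j = len(class_tokens)                 # nombre de termes de la classe Cj
        tf_class = Counter(class_tokens)
        tf_corpus = Counter(corpus_tokens)
        total = len(corpus_tokens)
        scores = {}
        for term, tf in tf_class.items():
            p = tf_corpus[term] / total         # P(ti) sur tout le corpus
            mean = n_j * p                      # effectif attendu dans Cj
            sd = math.sqrt(n_j * p * (1 - p))   # écart-type binomial
            if sd > 0:
                scores[term] = (tf - mean) / sd
        return scores

    pos = "love good happy love".split()
    neg = "bad sad bad hate".split()
    print(z_scores(pos, pos + neg))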
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 122
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 123
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 124
http://reviewofbooks.openeditionlab.org
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Linking Contents by Analyzing the References
In Books : no common stylesheet (or a lot of stylesheets poorly respected…)
Our proposal :
1) Searching for references in the document / footnotes (Support Vector Machines)
2) Annotating the references (Conditional Random Fields)
BILBO : Our (open-source) software for Reference Analysis
125
Google Digital Humanities Research Awards (2012)
Annotation
DOI	search	
(Crossref)
OpenEdition	Journals	:	more	than	1.5	million	references	analyzed
Test	:	http://bilbo.openeditionlab.org	
Sources	:	http://github.com/OpenEdition/bilbo
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 126
Test	:	http://bilbo.openeditionlab.org	
Sources	:	http://github.com/OpenEdition/bilbo
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 127
Ollagnier, A., Fournier, S., & Bellot, P. (2016). A Supervised Approach for Detecting Allusive Bibliographical References in Scholarly Publications. In WIMS (p. 36).
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 128
thèse de Doctorat d'Anaïs Ollagnier (dir. P. Bellot / S. Fournier)
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 129
http://sentiment-analyser.openeditionlab.org/aboutsemeval
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
SYSTÈMES HYBRIDES
130
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 131
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 132
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
CONCLUSION
133
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Conclusion
— De très nombreuses approches (hybrides)
— Filtrage collaboratif et exploitation de l’historique
— Analyse des contenus
— Exploitation de données comportementales et d’informations
explicites
— Exploitation des réseaux sociaux
— Tout combiner dans un seul modèle d'apprentissage ? Quelle fonction à optimiser ?
— Des liens forts avec d’autres domaines
— Méthodes statistiques, fouille de données et de graphes,
apprentissage…
— Recherche d’information (n’est-ce pas aussi de la recommandation ?),
traitement automatique des langues, analyse d’image/signal, ergonomie
et interaction…
— Il faut choisir les approches mais aussi les données
— Usages et contextes
— Préservation de la vie privée
134
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 135
https://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 136
http://lenskit.org
Michael	D.	Ekstrand,	Michael	Ludwig,	Joseph	A.	Konstan,	and	John	T.	Riedl.	2011.	Rethinking	The	Recommender	Research	Ecosystem:	
Reproducibility,	Openness,	and	LensKit.	In	Proceedings	of	the	Fifth	ACM	Conference	on	Recommender	Systems	(RecSys	’11).	ACM,	New	York,	NY,	
USA,	133-140.	DOI=10.1145/2043932.2043958.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 137
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 138
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 139
Çoba, L., & Zanker, M. rrecsys: an R-package for prototyping recommendation algorithms, RecSys 2016.
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition)
Challenges
140
P.	Bellot	(AMU-CNRS,	LSIS-OpenEdition) 141
http://lab.hypotheses.org
Merci	de	votre	attention	:-)

Mais conteúdo relacionado

Mais procurados

Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
Alexey Voropaev
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
Vsevolod Dyomkin
 

Mais procurados (20)

Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow Thai Word Embedding with Tensorflow
Thai Word Embedding with Tensorflow
 
Aspects of NLP Practice
Aspects of NLP PracticeAspects of NLP Practice
Aspects of NLP Practice
 
Language models
Language modelsLanguage models
Language models
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Text Mining for Lexicography
Text Mining for LexicographyText Mining for Lexicography
Text Mining for Lexicography
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Semantics and Computational Semantics
Semantics and Computational SemanticsSemantics and Computational Semantics
Semantics and Computational Semantics
 
IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)IE: Named Entity Recognition (NER)
IE: Named Entity Recognition (NER)
 
Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4Xenia miscouridou wi mlds 4
Xenia miscouridou wi mlds 4
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
 
Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?Can functional programming be liberated from static typing?
Can functional programming be liberated from static typing?
 
Topicmodels
TopicmodelsTopicmodels
Topicmodels
 
Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)Overview of text mining and NLP (+software)
Overview of text mining and NLP (+software)
 
Natural Language Processing in Practice
Natural Language Processing in PracticeNatural Language Processing in Practice
Natural Language Processing in Practice
 
OUTDATED Text Mining 5/5: Information Extraction
OUTDATED Text Mining 5/5: Information ExtractionOUTDATED Text Mining 5/5: Information Extraction
OUTDATED Text Mining 5/5: Information Extraction
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 

Semelhante a Recommandation sociale : filtrage collaboratif et par le contenu

Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
Richard Littauer
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Carole Goble
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Aaron Sloman
 
Science 2.0 and language technology
Science 2.0 and language technologyScience 2.0 and language technology
Science 2.0 and language technology
fridolin.wild
 
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and CommunicationSetting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
vbrant
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
Bertram Ludäscher
 

Semelhante a Recommandation sociale : filtrage collaboratif et par le contenu (20)

Some Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBASome Information Retrieval Models and Our Experiments for TREC KBA
Some Information Retrieval Models and Our Experiments for TREC KBA
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Striving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
 
Towards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in LinguisticsTowards Open Methods: Using Scientific Workflows in Linguistics
Towards Open Methods: Using Scientific Workflows in Linguistics
 
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
Scott Edmunds: GigaScience - a journal or a database? Lessons learned from th...
 
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
Trust and Accountability: experiences from the FAIRDOM Commons Initiative.
 
Model-Driven Research in Social Computing
Model-Driven Research in Social ComputingModel-Driven Research in Social Computing
Model-Driven Research in Social Computing
 
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...Ontologies for baby animals and robots From "baby stuff" to the world of adul...
Ontologies for baby animals and robots From "baby stuff" to the world of adul...
 
Rapid biomedical search
Rapid biomedical search Rapid biomedical search
Rapid biomedical search
 
NG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMsNG2S: A Study of Pro-Environmental Tipping Point via ABMs
NG2S: A Study of Pro-Environmental Tipping Point via ABMs
 
Science 2.0 and language technology
Science 2.0 and language technologyScience 2.0 and language technology
Science 2.0 and language technology
 
Big Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday LearningBig Data in Learning Analytics - Analytics for Everyday Learning
Big Data in Learning Analytics - Analytics for Everyday Learning
 
AH-XLDBEurope-position-09 jun2011
AH-XLDBEurope-position-09 jun2011AH-XLDBEurope-position-09 jun2011
AH-XLDBEurope-position-09 jun2011
 
20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal20111022 ontologiescomeofageocas germanymcguinnessfinal
20111022 ontologiescomeofageocas germanymcguinnessfinal
 
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and CommunicationSetting the Scene for ViBRANT – Strategy, Philosophy and Communication
Setting the Scene for ViBRANT – Strategy, Philosophy and Communication
 
20110122 vibrant final
20110122 vibrant final20110122 vibrant final
20110122 vibrant final
 
The culture of researchData
The culture of researchDataThe culture of researchData
The culture of researchData
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
From Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science TalesFrom Research Objects to Reproducible Science Tales
From Research Objects to Reproducible Science Tales
 
Science 2.0
Science 2.0Science 2.0
Science 2.0
 

Mais de Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)

Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I)
 

Mais de Patrice Bellot - Aix-Marseille Université / CNRS (LIS, INS2I) (7)

Introduction à la fouille de textes et positionnement de l'offre logicielle
Introduction à la fouille de textes et positionnement de l'offre logicielleIntroduction à la fouille de textes et positionnement de l'offre logicielle
Introduction à la fouille de textes et positionnement de l'offre logicielle
 
A combination of reduction and expansion approaches to handle with long natur...
A combination of reduction and expansion approaches to handle with long natur...A combination of reduction and expansion approaches to handle with long natur...
A combination of reduction and expansion approaches to handle with long natur...
 
Introduction générale sur les enjeux du Text and Data Mining TDM
Introduction générale sur les enjeux du Text and Data Mining TDMIntroduction générale sur les enjeux du Text and Data Mining TDM
Introduction générale sur les enjeux du Text and Data Mining TDM
 
Scholarly Book Recommendation
Scholarly Book RecommendationScholarly Book Recommendation
Scholarly Book Recommendation
 
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
Infrastructures et recommandations pour les Humanités Numériques - Big Data e...
 
Huma-Num une Infrastructure pour les SHS
Huma-Num une Infrastructure pour les SHSHuma-Num une Infrastructure pour les SHS
Huma-Num une Infrastructure pour les SHS
 
OpenEdition Lab projects in Text Mining
OpenEdition Lab projects in Text MiningOpenEdition Lab projects in Text Mining
OpenEdition Lab projects in Text Mining
 

Último

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
anilsa9823
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
LeenakshiTyagi
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Último (20)

Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Recommandation sociale : filtrage collaboratif et par le contenu

  • 3. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Quelques questions ouvertes… — Est-il utile d’exploiter les méta-données, les contenus, les commentaires ? — Comment relier les contenus les uns aux autres ? — Comment exploiter des contenus de nature différente ? — Comment « comprendre » les besoins des lecteurs ? des requêtes longues ? des profils ? — Quels sont les usages ? Quels sont les besoins ? — Comment aller au-delà de la pertinence informationnelle ? (genre, niveau d’expertise, document récent ou non…) 3 — OpenEdition Lab : un programme de recherche HN — Détecter des tendances, des sujets émergents, les livres « à lire »…

  • 4. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Plan — Quelques exemples : poser les problèmes et les enjeux — Quelles ressources ? — Quelques généralités méthodologiques — Quelques stratégies d’évaluation d’une recommandation — Autour du filtrage collaboratif ( = recommandation « sociale » ?) — Autour de l’analyse de contenu et de la suggestion de contenus . focus sur la recherche de livres par requêtes longues en langue naturelle 4
  • 5. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Introduction Objectifs de la recommandation : — Recommander des « objets » (films, livres, pages Web…) — Prédire les notes que individus donneraient Différents types de recommandation : — Selon des connaissances : caractéristiques sur les individus cibles (âge, salaire…) — Selon les préférences des individus — exprimées par les individus eux-mêmes explicitement — devinées en analysant leur comportement (%) — lien avec classification — En croisant les comportements des individus : filtrage collaboratif — En construisant des profils et en les comparant aux contenus Un grand nombre de sources d’information : — Informations explicitement données par les individus — Les contenus et leurs méta-données — Le Web et les réseaux sociaux (contenus, graphes…) 5
  • 7. P. Bellot (AMU-CNRS, LSIS-OpenEdition) ACM Conférences et ateliers — Conférences : — Recommender Systems RecSys (depuis 2007) — Sessions « Recommendation Systems » à SIGIR, CIKM, — Ateliers : — Context-aware Movie Recommendation (2010+2011) — Information Heterogeneity and Fusion in Recommender Systems (2010+2011) — Large-Scale Recommender Systems and the Netflix Prize Competition (2008) — Recommendation Systems for Software Engineering (2008-14) — Recommender Systems and the Social Web (2012) 7
  • 8. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Articles « systèmes de recommandation » Conférence ACM RecSys (https://recsys.acm.org) 8
  • 21. Amazon Navigation
 Graph : YASIV 21 http://www.yasiv.com/#/Search?q=orwell&category=Books&lang=US
  • 29. Quelques collections de données 29 MSD(x,y) (11) tends towards zero as the ratings of users x and y become more similar and tends towards 1 as they became more different (we assume that the votes are normalized in the interval [0..1]). (3) We obtain the Jaccard(x,y) measure computing the propor- tion between the number of positions [1..I] in which there are elements different to in both rx and ry regarding the number of positions [1..I] in which there are elements differ- ent to in rx or in ry: Jaccardðx; yÞ ¼ rx ry rx [ ry ¼ #dx;y #rx þ #ry À #dx;y ; ð12Þ in our example: 4/(6 + 6À4) = 0.5. (4) We combine the above elements in the final equation: newmetric x; yð Þ ¼ Jaccard x; yð Þ Â 1 À MSD x; yð Þð Þ; ð13Þ in the running example: users taken at rand the remaining 80% w given the huge num its users as test user Table 2 shows th 5. Results In this section w abases specified in T MovieLens, Fig. 7 sho responds to FilmAffi Graph 6A shows t ing Pearson correlat uous). The new m practically all the ex of k-neighborhoods around 0.2 stars in t 150, 200). Graph 6B shows small percentages i improbable that the film that this user h increases, the proba the film also increas Table 1 Main parameters of the databases used in the experiments. MovieLens FilmAffinity NetFlix Number of users 4382 26447 480189 Number of movies 3952 21128 17770 Number of ratings 1000209 19126278 100480507 Min and max values 1–5 1–10 1–5 Table 2 Main parameters used in the experiments. K (MAE, coverage, perfect predictions) Precision/recall
  • 30. 30
  • 44. Des « individus » et des « données » 44 Soient T un tableau croisant n individus I (en lignes) et K variable quantitatives X (en colonnes). xi,k est la valeur de la variable k pour l’indi vidu i : X1 X2 · · · XK variables individus I1 x1,1 x1,2 · · · x1,K I2 x2,1 x2,2 ... x2,K ... ... ... xi,k ... In xn,1 xn,2 ... xn,K Un des objectifs de l’analyse de donn´ees est de d´eterminer des profil d’individus ou, dit autrement, des classes d’individus se ressemblant. Cett ressemblance est d´etermin´ee `a partir des valeurs des variables associ´ees au individus. Un autre objectif concerne les variables elles-mˆemes : calcul des corr´elatio entre elles (`a quel point une ´evolution des valeurs de l’une entraˆıne un ´evolution des valeurs de l’autre et de quelle mani`ere), r´egression entre va riables (formulation des liens entre variables)... L’Analyse en Composante Principales (ACP) concerne les liaisons lin´eaires entre variables, par op position aux liaisons quadratiques, logarithmiques ou exponentielles pa exemple. L’ACP fait partie des analyses factorielles qui vont d´etermine
  • 45. P. Bellot — Studying the individuals / studying the variables 45 • Data analysis can be conducted along • the individuals: looking for resemblances between individuals (according to the values of the variables) = automatic classification of the individuals • the variables: which variables best explain the data (the differences between individuals)? what are the principal components? where is the greatest variability?
Lab exercise (R): k-means clustering of monthly city temperatures, then visualization with the cluster package:
    temp <- data.frame(temperature[1:12])
    cl <- kmeans(temp, 3, iter.max = 2, nstart = 15)
    # e) visualize the clusters
    summary(cl)
    cl$cluster
    summary(cl$cluster)
    cl$center
    # f) add the classification result to the data; the cluster package provides clusplot
    library(cluster)
    aggregate(temperature, by = list(cl$cluster), FUN = mean)
    cl2 <- data.frame(temperature, cl$cluster)
    clusplot(temperature, cl$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
(Figures: PCA individuals factor map and variables factor map of European city temperatures; Dim 1 (86.87%), Dim 2 (11.42%).)
  • 49. P. Bellot ACP et réduction de la dimension • Une façon de représenter en quelques dimensions des nuages d’individus
 — en conservant au mieux les distances entre les individus
 — en privilégiant les dimensions de plus grande variabilité (sélection itérative des facteurs qui maximisent la variance)
 = application d’une fonction de projection 49
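Illustrative aside (not from the slides): a minimal PCA sketch in Python with scikit-learn; the data matrix and its dimensions are hypothetical.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # hypothetical table: 100 individuals x 12 quantitative variables
    X = np.random.default_rng(0).normal(size=(100, 12))
    X_std = StandardScaler().fit_transform(X)   # PCA operates on centered (here also scaled) data

    pca = PCA(n_components=2)                   # keep the 2 directions of greatest variance
    coords = pca.fit_transform(X_std)           # projection of the individuals onto the factors
    print(pca.explained_variance_ratio_)        # share of the variability kept by each axis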
  • 50. P. Bellot 50 Méthodes d’apprentissage • Différentes formes d’apprentissage • Agent « élève » recopie l’agent « maître » -- fournir des exemples • Raisonnement par induction (à partir d’exemples) • Apprentissage de caractéristiques importantes • Détection de patterns récurrents • Ajustement des paramètres importants • Transformation d’informations en connaissances Exemples -- Modèle -- Test -- Correction / Enrichissement des exemples
  • 51. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Statistical, probabilistic approaches; machine learning 51. Which words are characteristic of a group of documents? Which significant relations can be drawn from the observed forms alone? Analogies, correlations.
— Conditional Random Fields ("Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", Proceedings of the 18th International Conference on Machine Learning, ICML 2001): state and transition feature functions f_j(y_{i−1}, y_i, x, i) are summed over the positions of the sequence, F_j(y, x) = Σ_{i=1}^{n} f_j(y_{i−1}, y_i, x, i), and the probability of a label sequence y given an observation sequence x is
p(y|x, λ) = (1/Z(x)) exp( Σ_j λ_j F_j(y, x) ),
where Z(x) is a normalization factor and the parameters λ_j are estimated from training data.
— Z-scores for characteristic terms (Zubaryeva and Savoy, 2010): the Z-score of a term t_i in class C_j compares its frequency tf_ij in the class to its expected frequency, with mean_i = n_j · P(t_i) (P(t_i) the term probability over the whole corpus, n_j the number of terms in class C_j) and standard deviation sd_i = √(n_j · P(t_i) · (1 − P(t_i))):
Z_score(t_ij) = (tf_ij − n_j · P(t_i)) / √(n_j · P(t_i) · (1 − P(t_i)))
A term with a salient frequency in one class compared to the others gets a salient Z-score; terms above a threshold are kept as features (a threshold of 3 gave the best results in the cited experiments). Example, first terms by Z-score per sentiment class: positive — love (14.31), good (14.01), happy (12.30), great (11.10)…; negative — not (13.99), fuck (12.97), don't (10.97), shit (8.99)…; neutral — httpbit (6.44), httpfb (4.56), httpbnd (3.78)…
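A minimal Python sketch of this Z-score selection (my own notation; `counts`, a per-class term-frequency table, is a hypothetical input):

    import numpy as np

    def z_scores(counts):
        """counts: dict class -> dict term -> frequency in that class.
        Returns dict class -> dict term -> Z-score, following
        Z = (tf_ij - n_j * P(t_i)) / sqrt(n_j * P(t_i) * (1 - P(t_i)))."""
        total = sum(sum(c.values()) for c in counts.values())
        corpus_tf = {}
        for c in counts.values():
            for t, tf in c.items():
                corpus_tf[t] = corpus_tf.get(t, 0) + tf
        scores = {}
        for cls, c in counts.items():
            n_j = sum(c.values())                 # number of terms in class C_j
            scores[cls] = {}
            for t, tf in c.items():
                p = corpus_tf[t] / total          # term probability over the whole corpus
                scores[cls][t] = (tf - n_j * p) / np.sqrt(n_j * p * (1 - p))
        return scores

Terms whose Z-score exceeds a threshold (3 in the cited experiments) are kept as class-characteristic features.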
  • 56. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation questionnaire 56 — ResQue: 13 constructs and 60 questions (5-point Likert scales, 1 = "strongly disagree", 5 = "strongly agree") measuring users' subjective attitudes, based on their experience, towards a recommender. A1. Quality of recommended items: accuracy ("The items recommended to me matched my interests"), relative accuracy (compared with a friend's recommendation), familiarity, attractiveness, enjoyability, novelty ("The recommender helps me discover new products"), diversity ("The items recommended to me are diverse" / "are similar to each other", reverse scale), context compatibility (personal context requirements, timeliness).
  • 57. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 57 (continued) A2. Interaction adequacy: adequate ways to express and revise preferences, explanations of why products are recommended. A3. Interface adequacy: sufficient information, clear labels, attractive layout. A4. Perceived ease of use: ease of initial learning, of preference elicitation, of preference revision, of decision making. A5. Perceived usefulness: the recommended items effectively help find the ideal product. A6. Control/transparency ("I understood why the items were recommended to me"). A7. Attitudes: overall satisfaction, confidence, trust ("The recommender can be trusted").
  • 58. 58 (continued) A8. Behavioral intentions: intention to use the system, continuance and frequency ("I will use this recommender again"), recommendation to friends, purchase intention ("I would buy the items recommended, given the opportunity"). A simplified subset of 15 starred questions supports quick usability and adoption evaluations. Pu P, Chen L. A User-Centric Evaluation Framework of Recommender Systems. In: ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces; 2010:14-22.
  • 61. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation measures 61
— Prediction quality: Mean Absolute Error, Root Mean Squared Error, Coverage
MAE = (1/#U) Σ_{u∈U} (1/#O_u) Σ_{i∈O_u} |p_{u,i} − r_{u,i}|
RMSE = (1/#U) Σ_{u∈U} √( (1/#O_u) Σ_{i∈O_u} (p_{u,i} − r_{u,i})² )
Coverage measures the capacity to predict: the percentage of situations in which at least one k-neighbor of the active user can rate an item not yet rated by that user. With K_{u,i} the set of neighbors of u who rated item i, C_u = {i ∈ I | r_{u,i} = • ∧ K_{u,i} ≠ ∅} and D_u = {i ∈ I | r_{u,i} = •}:
coverage = (1/#U) Σ_{u∈U} 100 · #C_u / #D_u
— Recommendation quality: Precision, Recall, F1-Measure. Users' confidence in a recommender does not depend directly on accuracy over all possible predictions: a user gains confidence when agreeing with the reduced set of recommendations actually shown. With Z_u the set of n recommendations to user u and θ a relevancy threshold:
precision = (1/#U) Σ_u #{i ∈ Z_u | r_{u,i} ≥ θ} / n
recall = (1/#U) Σ_u #{i ∈ Z_u | r_{u,i} ≥ θ} / ( #{i ∈ Z_u | r_{u,i} ≥ θ} + #{i ∈ Z_u^c | r_{u,i} ≥ θ} )
F1 = 2 · precision · recall / (precision + recall)
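An illustrative Python sketch of these measures (the data structures are my own assumptions, not the slides'):

    import numpy as np

    def mae_rmse(per_user):
        """per_user: list of (predicted, true) rating arrays, one pair per test user."""
        maes = [np.mean(np.abs(p - t)) for p, t in per_user]
        rmses = [np.sqrt(np.mean((p - t) ** 2)) for p, t in per_user]
        return float(np.mean(maes)), float(np.mean(rmses))

    def precision_recall_f1(topn, relevant, n):
        """topn: dict user -> ranked list of recommended items;
        relevant: dict user -> set of items whose true rating >= theta."""
        precisions, recalls = [], []
        for u, recs in topn.items():
            hits = len(set(recs[:n]) & relevant.get(u, set()))
            precisions.append(hits / n)
            rel = len(relevant.get(u, set()))
            recalls.append(hits / rel if rel else 0.0)
        p, r = float(np.mean(precisions)), float(np.mean(recalls))
        return p, r, (2 * p * r / (p + r) if (p + r) else 0.0)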
  • 62. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Evaluation measures (2) 62
— Quality of a list of recommendations (rank-based): DCG at rank k
. the gain brought by an item decreases with its position in the list
. computed for each user u, then averaged over all users
DCG_k = (1/#U) Σ_{u∈U} ( r_{u,p_1} + Σ_{i=2}^{k} r_{u,p_i} / log_2(i) )
where p_1, …, p_n is the recommendation list and r_{u,p_i} the true rating of user u for item p_i. nDCG is the version normalized by the "ideal DCG" (the DCG of the ideal list). The half-life measure (HL) instead assumes an exponential decrease of user interest down the list:
HL = (1/#U) Σ_{u∈U} Σ_{i=1}^{N} max(r_{u,p_i} − d, 0) / 2^{(i−1)/(α−1)}
with d a default rating and α the rank at which there is a 50% chance the user reviews the item.
— Novelty and diversity: novelty indicates the degree of difference between the recommended items and those the user already knows; diversity indicates the degree of differentiation among the recommended items. With sim(i, j) an item-to-item memory-based CF similarity and Z_u the set of n recommendations to user u:
diversity_{Z_u} = ( 1 / (#Z_u (#Z_u − 1)) ) Σ_{i∈Z_u} Σ_{j∈Z_u, j≠i} [1 − sim(i, j)]
novelty_i = ( 1 / (#Z_u − 1) ) Σ_{j∈Z_u} [1 − sim(i, j)], i ∈ Z_u
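A short per-user sketch of DCG/nDCG (illustrative; `ratings` is the list of true ratings of the recommended items, in recommended order):

    import numpy as np

    def dcg_at_k(ratings, k):
        r = np.asarray(ratings, dtype=float)[:k]
        if r.size == 0:
            return 0.0
        # gain at rank 1 kept as is, then discounted by log2 of the rank
        discounts = np.concatenate(([1.0], 1.0 / np.log2(np.arange(2, r.size + 1))))
        return float(np.sum(r * discounts))

    def ndcg_at_k(ratings, k):
        ideal = dcg_at_k(sorted(ratings, reverse=True), k)   # DCG of the ideal list
        return dcg_at_k(ratings, k) / ideal if ideal > 0 else 0.0

Averaging dcg_at_k over all users gives the DCG_k of the slide.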
  • 63. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Mesures d’évaluation (3) — D’autres mesures orientées utilisateurs — Pertinence (accuracy) perçue par l’utilisateur — Familiarité : les items sont connus (leur existence) des utilisateurs — Nouveauté : découverte d’items nouveaux — Attractivité : les items attirent les utilisateurs (pas toujours le cas d’items pertinents…) — Utilité : les items ont été appréciés (après usage / lecture) — Compatibilité avec le contexte de l’utilisateur — Niveau de l’interaction — Contrôle des paramètres — Explications de la recommandation — Transparence de la méthode 63 Pu P, Chen L. A User-Centric Evaluation Framework of Recommender Systems. In : ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces ; 2010:14-22.
  • 66. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Collaborative filtering — We are "social" beings — The "others" dictate / influence our choices — Our relationships are typed (friends / enemies, family, professional relations…) — "Tell me who your friends are, and I will tell you who you are" — homophily 66. Explicit ratings vs. implicit ratings (number of accesses or citations, time spent…); the missing ratings are the ones to predict. (Extract: Table 4, a running-example RS database of ratings r_{u,i} for users u1…u5 over items i1…i14, where • marks a not-yet-rated item; notation: E_{x,y} = items recently rated by both users x and y, S_u = user u's recent votes.)
  • 68. Filtrage collaboratif : similarités et voisinages 68 Variante : item to item
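To illustrate the user-based scheme, a sketch under my own conventions (not the slides' exact system); `sim` can be any of the similarity functions of the next slide. The prediction is the user's mean rating plus the similarity-weighted, mean-centered ratings of the k most similar users who rated the item:

    import numpy as np

    def predict(u, i, ratings, sim, k=30):
        """ratings: dict user -> dict item -> rating."""
        mean = {v: np.mean(list(r.values())) for v, r in ratings.items()}
        # candidate neighbors: users (other than u) who have rated item i
        cands = [(sim(ratings[u], ratings[v]), v)
                 for v in ratings if v != u and i in ratings[v]]
        neighbors = sorted(cands, reverse=True)[:k]
        den = sum(s for s, _ in neighbors)
        if not den:
            return mean[u]
        num = sum(s * (ratings[v][i] - mean[v]) for s, v in neighbors)
        return mean[u] + num / den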
  • 69. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Which similarity functions? 69
- Pearson correlation
- Spearman correlation on the ranks
- Cosine
- Euclidean distance
- More complex metrics:
- JMSD, to integrate non-numerical information (a combination of Jaccard and of the mean squared differences, MSD)
- "Pareto optimality", to filter out the least representative individuals
- integration of the scores of the other individuals / other items
Collaborative filtering works on a table of U users who can rate I items; the prediction of a non-rated item i for user u aggregates the ratings of the neighbors G_{u,i} of u who rated i:
p_{u,i} = r̄_u + l_{u,i} Σ_{n∈G_{u,i}} sim(u, n) (r_{n,i} − r̄_n), with normalizing factor l_{u,i} = 1 / Σ_{n∈G_{u,i}} sim(u, n) when G_{u,i} ≠ ∅.
The most popular similarity metrics:
Pearson: sim(x, y) = Σ_i (r_{x,i} − r̄_x)(r_{y,i} − r̄_y) / √( Σ_i (r_{x,i} − r̄_x)² · Σ_i (r_{y,i} − r̄_y)² )
Cosine: sim(x, y) = Σ_i r_{x,i} r_{y,i} / ( √(Σ_i r²_{x,i}) · √(Σ_i r²_{y,i}) )
Constrained Pearson: the Pearson formula with the median r_med of the rating scale in place of the user means.
Spearman rank correlation: the Pearson formula applied to the ranks rank_{x,i}.
Formally, applying Pearson correlation with guarantees assumes a linear relationship between x and y, continuous random variables, and normally distributed variables; these conditions are not normally met in real recommender systems, and Pearson correlation presents some significant cases of erroneous operation. Despite these deficiencies, it yields the best prediction and recommendation results in CF-based systems and is the most commonly used metric, so any alternative metric is proposed to improve on its results.
Ortega, F., Sánchez, J. L., Bobadilla, J., Gutiérrez, A. (2013). Improving collaborative filtering-based recommender systems results using Pareto dominance. Information Sciences, 239, 50-61.
  • 71. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 71 — MSD generates very good general results (low average error, high percentage of correct predictions, low percentage of incorrect ones) but is rarely used because of its low capacity to produce new recommendations: it has an intrinsic tendency to choose as neighbors users who have rated very few items. E.g., with 7 items rated from 1 to 5 and u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• = not rated), MSD declares (u1, u3) totally similar, although u1 shares only 2 ratings with u2 and u3; choosing u1 as the most similar neighbor provides no possibilities to recommend new items. The new metric therefore combines two factors, working on values standardized to [0..1] and without any arbitrarily set parameter:
— the mean of the squared differences over co-rated items, MSD(x, y): the smaller these differences, the greater the similarity (this part yields very good accuracy);
— the Jaccard overlap, the number of items rated by both users over the number of items rated by either: Jaccard(x, y) = #(r_x ∩ r_y) / #(r_x ∪ r_y) (this factor greatly improves the capacity to make predictions);
combined as newmetric(x, y) = Jaccard(x, y) × (1 − MSD(x, y)).
On MovieLens and NetFlix, the new metric improves accuracy, coverage, percentage of perfect predictions and precision/recall over Pearson correlation: up to 9%, and up to 15% in some cases. Bobadilla, J., Serradilla, F., Bernal, J. (2010). A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6), 520-528.
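A direct Python transcription of the metric (a sketch: ratings are assumed normalized to [0..1], and the dict-based representation is my choice):

    def jmsd(rx, ry):
        """rx, ry: dicts item -> rating in [0..1] for two users."""
        common = set(rx) & set(ry)              # items rated by both users
        if not common:
            return 0.0
        msd = sum((rx[i] - ry[i]) ** 2 for i in common) / len(common)
        jaccard = len(common) / len(set(rx) | set(ry))
        return jaccard * (1.0 - msd)

    # running example of the paper: Jaccard = 0.5, MSD = 0.218...
    rx = {1: 0.75, 2: 1.0, 4: 0.5, 5: 0.25, 7: 0.0, 8: 0.0}
    ry = {1: 0.75, 2: 0.5, 3: 0.0, 4: 0.25, 6: 0.5, 7: 0.75}
    print(jmsd(rx, ry))                         # 0.5 * (1 - 0.21875) = 0.390625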
  • 74. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Le problème du démarrage à froid — Nouvelle application — Recommandation éditoriale — Encourager les utilisateurs à donner des avis — Nouvel utilisateur — Exploiter autant que possible d’autres informations sur l’utilisateur — formulaires, — amis sur les réseaux sociaux (= demander l’accès) — préférences sous forme de tags… — Nouvel item — Exploiter les méta-données (pour un film : année, réalisateur, acteurs…) — Exploiter les critiques que l’on peut trouver par ailleurs sur le Web 74
  • 78. Amazon : Organisation des objets (catégories) 78 Product Advertising API https://aws.amazon.com/ cf. http://www.codediesel.com/libraries/amazon-advertising-api-browsenodes/
  • 79. Similarités et espaces latents 79 Koren Y, Bell R, Volinsky C. Matrix Factorization Techniques for Recommender Systems. IEEE Computer. July 2009:42-50.
  • 80. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Projecting the users / items matrix 80
— Each item i is represented by a vector q_i of dimension f
— Each user u is represented by a vector p_u of dimension f
— Each factor represents a latent property that characterizes the items and captures the users' interest in that property
— The dot product between q_i and p_u is an estimate of the interest of user u for item i
— Methods:
— Singular value decomposition
— Approximation by gradient descent (on training data), minimizing over the known ratings the regularized squared error between the true rating r_{u,i} and the predicted rating q_i^T p_u:
min_{q,p} Σ_{(u,i)∈K} ( r_{u,i} − q_i^T p_u )² + λ ( ‖q_i‖² + ‖p_u‖² )
where the last term is the regularization factor and the regularization constant λ is learned by cross-validation.
  • 81. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Espaces latents (suite) — Espace non convexe : risque de solution éloignée de l’optimum global — Approche par Moindres Carrés Alternés (Alternating Least Squares) . Fixe q, cherche p ; fixe p, cherche q etc. . Utile lorsque les données (notes d’apprentissage) sont implicites (matrice non creuse) — Tenir compte des biais = modifier les valeurs prédites — Des utilisateurs ont tendance à toujours donner de bonnes notes — Certains items ont toujours tendance à avoir de bonnes notes — Le score final doit dépendre de la moyenne de tous les scores (base de départ) — Intégrer les préférences a priori des utilisateurs (x : items préférés de u ; y: attributs (âge…)) — Tenir compte de la dynamique 81
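A hedged numpy sketch of such a factorization learned by stochastic gradient descent, including the user/item biases mentioned above (a minimal version of my own, not the speaker's code; mu is the global mean rating):

    import numpy as np

    def sgd_mf(ratings, n_users, n_items, f=20, lr=0.005, reg=0.02, epochs=20, seed=0):
        """ratings: list of (u, i, r) triples; returns (mu, bu, bi, P, Q)."""
        rng = np.random.default_rng(seed)
        P = rng.normal(0, 0.1, (n_users, f))    # user factors p_u
        Q = rng.normal(0, 0.1, (n_items, f))    # item factors q_i
        bu, bi = np.zeros(n_users), np.zeros(n_items)
        mu = np.mean([r for _, _, r in ratings])
        for _ in range(epochs):
            for u, i, r in ratings:
                err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])   # prediction error
                bu[u] += lr * (err - reg * bu[u])
                bi[i] += lr * (err - reg * bi[i])
                P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                              Q[i] + lr * (err * P[u] - reg * Q[i]))
        return mu, bu, bi, P, Q

For implicit feedback, the ALS variant of the slide alternates closed-form least-squares solves of P given Q and of Q given P instead of these gradient steps.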
  • 88. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Integrating context — Many different definitions of context — Several integration strategies (in the cited survey: contextual pre-filtering, contextual post-filtering, contextual modeling) 88. Adomavicius G, Mobasher B, Ricci F, Tuzhilin A. Context-Aware Recommender Systems. AI Magazine. 2011;32(3):67-80.
  • 89. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Intégration du contexte (suite) — Cube Individus x Items x Contextes remplace la matrice Individus x Items — Factorisation de tenseurs 89 Karatzoglou, A.; Amatriain, X.; Baltrunas, L.; and Oliver, N. 2010. Multiverse Recommendation: N-Dimensional Tensor Factorization for Context-Aware Collaborative Filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, 79–86.
  • 90. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Exploitation des liens (réseaux sociaux) — Le réseau social comme entrée supplémentaire 90 Yang X, Guo Y, Liu Y, Steck H. A survey of collaborative filtering based social recommender systems. Computer Communications. 2014;41(C):1-10. doi:10.1016/j.comcom.2013.06.009.
  • 91. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Exploiting links (social networks) (2) — Prediction according to the links between individuals (Bayesian inference) 91. (Figure: the individual who needs a rating is connected, through intermediate individuals who gather the ratings, to the individuals who rated the item.)
  • 93. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Recommandation basée sur le contenu — Lien fort avec la Recherche d’Information — La notion de « Profil utilisateur » est à rapprocher de la notion de « Requête » 93
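To make the profile-as-query analogy concrete, a sketch with scikit-learn (all names are illustrative assumptions): the user profile is built from the texts of the items the user liked, then matched against every item description by cosine similarity, exactly like a query in IR.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def content_based_scores(item_texts, liked_ids):
        """item_texts: list of item descriptions; liked_ids: indices of liked items."""
        X = TfidfVectorizer(stop_words="english").fit_transform(item_texts)
        profile = np.asarray(X[liked_ids].mean(axis=0))   # profile = centroid of liked items
        return cosine_similarity(profile, X).ravel()      # score all items against the profile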
  • 96. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Contenu audio 96 Wang, X., Wang, Y. (2014, November). Improving content-based and hybrid music recommendation using deep learning. In Proceedings of the 22nd ACM international conference on Multimedia (pp. 627-636). ACM.
  • 99. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Recommending Books vs Searching for Books? Very diverse needs: — Topicality — with a precise context, e.g. arts in China during the 20th century — with named entities: locations (the book is about a specific location OR the action takes place at this location), proper names… — Style / Expertise / Language — fiction, novel, essay, proceedings, position papers… — for experts / for dummies / for children… — in English, in French, in old French, in (very) local languages… — Looking for citations / references — in which book a given citation appears — which books refer to a given one — Authority: — What are the most important books about… (and what does "most important" mean?) — What are the most popular books about… 99
  • 100. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 100 http://social-book-search.humanities.uva.nl/#/overview — The Amazon collection: the documents are Amazon pages of existing books, combining editorial information (ISBN, title, number of pages…) with social data: user reviews (review fields, one per user, no forum-like discussion), user-created tags (tag fields) that reflect the trends for a product, and Amazon's own category labels (browseNode fields). Some facts about the collection: 2,781,400 pages (i.e. books), 15,785,133 reviews, 1,915,336 pages containing at least one review. Retrieval model: a language-modeling approach using Metzler and Croft's Markov Random Field model to integrate multiword phrases in the query (Sequential Dependence Model). Organizers: Marijn Koolen (University of Amsterdam), Toine Bogers (Aalborg University Copenhagen), Antal van den Bosch (Radboud University Nijmegen), Antoine Doucet (University of Caen), Maria Gäde (Humboldt University Berlin), Preben Hansen (Stockholm University), Mark Hall (Edge Hill University), Iris Hendrickx (Radboud University Nijmegen), Hugo Huurdeman (University of Amsterdam), Jaap Kamps (University of Amsterdam), Vivien Petras (Humboldt University Berlin), Michael Preminger (Oslo and Akershus University College of Applied Sciences), Mette Skov (Aalborg University Copenhagen), Suzan Verberne (Radboud University Nijmegen), David Walsh (Edge Hill University)
  • 105. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 105 Des profils « utilisateur » (catalog, reviews, ratings)
  • 106. Idea: use the reviews and comments rather than the contents 106 — Reviews contain: keywords, topics, sentiment, abstracts, other books
  • 107. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 107 — Book Recommendation / IR, SBS 2016 – Dataset: the Amazon collection of 2.8M records and its index fields (Amal Htait, Aix-Marseille Université)
  • 108. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 108 — Book Recommendation / IR, SBS 2016 – Dataset: LibraryThing collection of 113,490 user profiles. Index fields:
userid | workid | author | booktitle | publication-year | catalogue-date | rating | tags
u3266995 | 660947 | Rosina Lippi | Homestead | 1999 | 2006-06 | 10.0 | fiction
u1885143 | 2729214 | Ellen Hopkins | Glass | 2009 | 2009-05 | 6.0 | drugs
u1885143 | 133315 | Tite Kubo | Bleach, Vol. 1 | 2004 | 2009-06 | 6.0 | manga
(Amal Htait, Aix-Marseille Université)
  • 109. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 109 — Book Recommendation / IR, SBS 2016 – Topics — Query: the query is processed together with the information of the example books given in the topic (Amal Htait, Aix-Marseille Université)
  • 110. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 110 — Book Recommendation / IR, SBS 2016 – Retrieval model: SDM, weighting query terms [Metzler 2005]: ● unigram matches ● exact bigram matches ● bigram matches within an unordered window of 8 terms (Amal Htait, Aix-Marseille Université)
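For reference, the standard SDM ranking function of Metzler and Croft combines these three match types (the interpolation weights below are the commonly reported defaults, not necessarily the values used in this system):

score(Q, D) = λ_T Σ_{q∈Q} f_T(q, D) + λ_O Σ_{i=1}^{|Q|−1} f_O(q_i, q_{i+1}, D) + λ_U Σ_{i=1}^{|Q|−1} f_U(q_i, q_{i+1}, D)

where f_T, f_O and f_U are smoothed log-probabilities of single terms, of exact ordered bigrams, and of bigrams within an unordered window of 8 terms, typically with (λ_T, λ_O, λ_U) = (0.85, 0.1, 0.05).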
  • 112. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 112 Koolen, M., Bogers, T., Gäde, M., Hall, M., Hendrickx, I., Huurdeman, H., ... Walsh, D. (2016, September). Overview of the CLEF 2016 Social Book Search Lab. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 351-370). Springer International Publishing.
  • 115. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Building a Graph of Books — Nodes = books + properties (metadata, #reviews and ranking, page ranks, ratings…) — Edges = links between books — Book A refers to Book B according to:
 — Bibliographic references and citations (in the book / in the reviews)
 — Amazon recommendation (People who bought A bought B, People who liked A liked B…) — A is similar to B
 — They share bibliographic references
— Full-text similarity + similarity between the metadata 115 The graph makes it possible to estimate — « Book Ranks » (cf. Google's PageRank) — Neighborhood — Shortest paths
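A hedged sketch of that graph side with networkx (the edges below are invented placeholders for the real reference/recommendation links):

    import networkx as nx

    # directed graph: an edge A -> B means "book A refers to book B"
    # (bibliographic reference, citation in a review, Amazon link, ...)
    G = nx.DiGraph()
    G.add_edges_from([("bookA", "bookB"), ("bookA", "bookC"), ("bookC", "bookB")])

    book_rank = nx.pagerank(G, alpha=0.85)           # "Book Ranks"
    neighbors = list(G.successors("bookA"))          # neighborhood of a book
    path = nx.shortest_path(G, "bookA", "bookB")     # shortest path between two books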
  • 116. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 116 Jeh, G., Widom, J. (2002, July). SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 538-543). ACM.
 • 117. Recommending books: IR + graph mining 117 IR: Sequential Dependence Model (SDM) — Markov Random Field (Metzler & Croft, 2005) — and/or Divergence From Randomness (InL2) model + query expansion with dependence analysis. Ratings: the more reviews a book has, and the better its ratings, the more relevant it is considered. Graph: expanding the retrieved books with similar books, then reranking with PageRank.
● We tested many reranking methods, combining the retrieval model scores with other scores based on social information.
● For each document we compute:
– PageRank: an algorithm that exploits the link structure to score the importance of nodes in the graph.
– Likeliness: computed from information generated by users (reviews and ratings); the more reviews and good ratings a book has, the more interesting it is.
[Figure: graph-modeling pipeline — retrieve starting nodes from the collection, expand with neighbors and shortest-path nodes, merge and deduplicate into the final graph, then rerank.]
Graph modeling, recommendation: PageRank + Similar Products. Results, with a reranking combining IR and social evidence (see the sketch after this slide):
- Very good results in 2011 (judgements obtained by crowdsourcing) (IR and ratings): P@10 ≈ 0.58
- Good results in 2014 (IR, ratings, expansion): P@10 ≈ 0.23; MAP ≈ 0.44
- In 2015: rank 25/47 (IR + graph, but the graph improved IR): P@10 ≈ 0.2 (best system: 0.39, which included the price of books)
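To make the combination concrete, a minimal sketch of such a reranking scheme; the linear form, the weights and the exact shape of the likeliness score are assumptions, since the slides do not give the formulas used in the actual runs:

```python
import math

# Hypothetical reranking: combine the IR score (SDM / InL2) with PageRank and
# a "likeliness" score derived from reviews and ratings. In practice the three
# scores live on different scales and should be normalized first.
def likeliness(n_reviews, avg_rating):
    # More reviews and better ratings -> more interesting (one possible shape).
    return math.log1p(n_reviews) * avg_rating

def rerank_score(doc, w_ir=0.7, w_pr=0.2, w_like=0.1):
    return (w_ir * doc["ir_score"]
            + w_pr * doc["pagerank"]
            + w_like * likeliness(doc["n_reviews"], doc["avg_rating"]))

docs = [
    {"id": "bookA", "ir_score": 0.82, "pagerank": 0.004, "n_reviews": 120, "avg_rating": 4.2},
    {"id": "bookB", "ir_score": 0.90, "pagerank": 0.001, "n_reviews": 3, "avg_rating": 3.0},
]
ranking = sorted(docs, key=rerank_score, reverse=True)
```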
 • 118. P. Bellot (AMU-CNRS, LSIS-OpenEdition) A perspective: multilayer graph mining — PhD thesis of Mohamed Ettaleb (co-supervised by Pr. C. Latiri, B. Douhar and P. Bellot) 118 Books connected through several layers: a « similar to » layer, a « bought together » layer, an authors layer, a tags layer. Question: which subgraphs are frequent? how should they be interpreted?
 • 119. 119 And in real life? (for us: OpenEdition)
[Figure: the OpenEdition Lab processing chain — BILBO (citation analysis; linking book reviews to the books reviewed), ÉCHO (sentiment analysis), automatic classification and metadata, book recommendation, and a content graph connecting journals such as questions de communication, vertigo, edc, echogeo, quaderni. Methods: social IR, information extraction by Inductive Logic Programming, temporal language models and meta-feature learning, SVM, Z-score, CRF, graph scoring; outputs: ratings, polarity, graph, recommendation. Partners: OpenEdition, Univ. Recife (Brazil).]
Example of a bibliographic reference before and after BILBO annotation (TEI markup reconstructed here; the angle brackets were lost in the slide extraction):
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30. DOI : 10.3406/rfp.1986.1499
<bibl><author><surname>Langouet</surname>, <forename>G.</forename></author>, (<date>1986</date>), <title level="a">« Innovations pédagogiques et technologies éducatives »</title>, <title level="j">Revue française de pédagogie</title>, <abbr>n°</abbr> <biblScope type="issue">76</biblScope>, <abbr>pp.</abbr> <biblScope type="page">25-30</biblScope>. <idno type="DOI">DOI : 10.3406/rfp.1986.1499</idno></bibl>
BILBO addresses three levels of references: level 1, structured bibliographies; level 2, references in footnotes (e.g. « Voir Permanent Mandates Commission, Minutes of the Fifteenth Session (Geneva: League of Nations, 1929), pp. 100-1. » or « Pour plus de détails, voir Paul Ghali, Les nationalités détachées de l'Empire ottoman à la suite de la guerre (Paris: Les Éditions Domat-Montchrestien, 1934), pp. 221-6. »); level 3, allusive references embedded in running prose (e.g. a review discussing Le Roman véritable. Stratégies préfacielles au XVIIIe siècle by Jan Herman, Mladen Kozul and Nathalie Kremer).
 • 120. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Identifying book reviews in blogs • Supervised classification by « genre » • Features: unigrams, location of named entities in the text, dates • Feature selection: Z-score threshold + random forest 120
Excerpt from the underlying paper (cleaned; the extraction had duplicated several passages):
6.1 Naive Bayes (NB). As a baseline we adopted the naive Bayes approach (Zubaryeva and Savoy, 2010). The classifier chooses between two hypotheses — h0 = « it is a review » and h1 = « it is not a review » — selecting the class that maximizes Eq. (5), where |w| is the number of words in the current document:
$\arg\max_{h_i} P(h_i) \prod_{j=1}^{|w|} P(w_j \mid h_i)$ (5), with $P(w_j \mid h_i) = tf_{j,h_i} / n_{h_i}$
i.e. the probability of word $w_j$ in class $h_i$ is its lexical frequency $tf_{j,h_i}$ in the corpus of that class divided by the size $n_{h_i}$ of that corpus.
Document representations. First, the unigram model (bag-of-words), where all words are features: the training corpus contains 106,911 words; keeping the tokens that appear more than 5 times in each class leaves a vector space of 5,957 features. Second, feature selection by normalized Z-score: after removing words appearing fewer than 5 times, the first 1,000 words by Z-score are kept; the goal is to select terms that clearly belong to one genre of documents. Third, we suggest that common features of the review collection lie in the named-entity distribution in the text (Poibeau, 2003): after removing all XML/HTML tags, each text is divided into 10 parts of equal size, and the distribution ratio of each of 3 named-entity types (« authors' names », « locations », « dates ») in each part is used as a feature, giving 30 features (see the sketch after this slide).
6.2 Support Vector Machines (SVM). Introduced by Vapnik in 1995 for two-class pattern recognition, SVMs are based on the Structural Risk Minimization principle; in their basic form they learn linear threshold functions, but with an appropriate kernel they can learn RBF networks or three-layer sigmoid nets (Joachims, 1998). We used the Weka toolkit with linear and RBF kernels; the RBF kernel sometimes reaches good performance at the cost of fast growth of training time (Kummer, 2012).
6.3 Results. Experiments use a collection of documents from Revues.org and the Web, comparing Naive Bayes and SVMs (linear and RBF kernels) in terms of classification accuracy over the three representations above.
Table 4: performance on the test set (Recall / Precision / F-measure for the Review class, then for the non-Review class); representation 1 = bag-of-words, 2 = Z-score selection, 3 = named-entity distribution:
| Repr. | Model | R | P | F-M | R | P | F-M |
| 1 | NB | 65.5% | 81.5% | 72.6% | 81.6% | 65.7% | 72.8% |
| 1 | SVM (linear) | 99.6% | 98.3% | 98.9% | 97.9% | 99.5% | 98.7% |
| 1 | SVM (RBF, C=5.0, γ=0.00185) | 89.8% | 97.2% | 93.4% | 96.8% | 88.5% | 92.5% |
| 2 | NB | 90.6% | 64.2% | 75.1% | 37.4% | 76.3% | 50.2% |
| 2 | SVM (linear) | 87.2% | 81.3% | 84.2% | 75.3% | 82.7% | 78.8% |
| 2 | SVM (RBF, C=32.0, γ=0.00781) | 87.2% | 86.5% | 86.8% | 83.1% | 84.0% | 83.6% |
| 3 | NB | 80.0% | 68.4% | 73.7% | 54.2% | 68.7% | 60.6% |
| 3 | SVM (linear) | 77.0% | 81.9% | 79.4% | 78.9% | 73.5% | 76.1% |
| 3 | SVM (RBF, C=8.0, γ=0.03125) | 81.2% | 48.6% | 79.9% | 72.6% | 75.8% | 74.1% |
Highest normalized Z-scores across the corpus (first 30 features): abandonne 30.14, seront 30.00, biographie 21.84, entranent 21.20, prise 21.20, sacre 21.20, toute 20.70, quitte 19.55, dimension 15.65, les 14.43, commandement 11.01, lie 10.61, construisent 10.16, lieux 10.14, garde 9.75, winter 9.23, cleo 8.88, visible 8.75, fondamentale 8.67, david 8.54, pratiques 8.52, signification 8.47, 01 8.38, institutionnels 8.38, 1930 8.16, attaques 8.14, courrier 8.08, moyennes 7.99, petite 7.85, adapted 7.84.
[Figures 3–5: distribution of the « Person », « Location » and « Date » named entities across the 10 text segments.]
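A minimal sketch of the named-entity distribution representation described above (10 segments × 3 entity types = 30 features); the input format — a list of (token, entity-tag) pairs with tags in {PER, LOC, DATE, O} — is an assumption for illustration:

```python
# NE-distribution features: split the text into 10 parts and use the ratio of
# each entity type falling in each part. Tag names are illustrative; the
# original used "authors' names", "locations" and "dates".
def ne_distribution_features(tagged_tokens, n_parts=10, types=("PER", "LOC", "DATE")):
    n = max(len(tagged_tokens), 1)
    size = max(n // n_parts, 1)
    counts = {t: [0] * n_parts for t in types}
    totals = {t: 0 for t in types}
    for i, (_token, tag) in enumerate(tagged_tokens):
        if tag in counts:
            part = min(i // size, n_parts - 1)   # trailing tokens go to the last part
            counts[tag][part] += 1
            totals[tag] += 1
    # Ratio per (type, part); 0.0 when the entity type is absent from the text.
    return [counts[t][p] / totals[t] if totals[t] else 0.0
            for t in types for p in range(n_parts)]
```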
 • 121. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Sentiment analysis on reviews • Statistical metrics (PMI, Z-score, odds ratio…) • Combined with linguistic resources 121 [Hamdan, Béchet & Bellot, SemEval 2014] http://sentiwordnet.isti.cnr.it
Excerpt from the underlying paper (cleaned; the extraction had duplicated several passages and garbled the equations):
The Z-score of a term $t_i$ in a class $C_j$ is computed from its term relative frequency $tfr_{ij}$ in $C_j$, the mean $mean_i$ (the term's probability over the whole corpus multiplied by $n_j$, the number of terms in $C_j$) and the standard deviation $sd_i$ of $t_i$ over the underlying corpus:
$Z(t_{ij}) = \frac{tfr_{ij} - mean_i}{sd_i}$ (1)
$Z(t_{ij}) = \frac{tf_{ij} - n_j\,P(t_i)}{\sqrt{n_j\,P(t_i)\,(1 - P(t_i))}}$ (2)
A term whose frequency in one class is salient compared to the others gets a salient Z-score. Z-score was exploited for sentiment analysis by (Zubaryeva and Savoy, 2010): they chose a threshold (2) to select the terms whose Z-score exceeds it, then combined these scores with logistic regression. We use Z-scores as additional classification features because tweets are short, so many tweets contain no word with a salient Z-score (a sketch of the computation follows this slide). Figures 1–3 show the distribution of Z-scores per class: most terms score between -1.5 and 2.5 in each class, the rest being either very frequent (> 2.5) or very rare (< -1.5); a negative value indicates that the term is less frequent in this class than in the others. We tested different threshold values; the best results were obtained with threshold 3.
Table 1: the ten terms with the highest Z-score per class:
positive: Love 14.31, Good 14.01, Happy 12.30, Great 11.10, Excite 10.35, Best 9.24, Thank 9.21, Hope 8.24, Cant 8.10, Wait 8.05
negative: Not 13.99, Fuck 12.97, Don't 10.97, Shit 8.99, Bad 8.40, Hate 8.29, Sad 8.28, Sorry 8.11, Cancel 7.53, stupid 6.83
neutral: Httpbit 6.44, Httpfb 4.56, Httpbnd 3.78, Intern 3.58, Nov 3.45, Httpdlvr 3.40, Open 3.30, Live 3.28, Cloud 3.28, begin 3.17
Lexicons: Bing Liu's Opinion Lexicon, created by (Hu and Liu, 2004) and augmented in many later works; we count the positive, negative and neutral words per tweet according to these lexicons (Bing Liu's lexicon contains only positive/negative annotations; Subjectivity also contains neutral). Part of Speech (POS): each word in the tweet is POS-tagged, and we count the adjectives, verbs, nouns, adverbs and connectors per tweet.
4. Evaluation. 4.1 Data: the SemEval 2013 and 2014 data sets for subtask B, sentiment analysis in Twitter (Rosenthal et al., 2014; Wilson et al., 2013), with tweets annotated as positive, negative or neutral. Of the 9,646 training tweets, only 8,498 could be downloaded (protected profiles, deleted tweets); the 1,654-tweet development set was used to evaluate our methods, then merged with the training set to build the final model for the 2013 and 2014 test sets.
4.2 Experiments. Official results: 46.38% and 52.02% on the 2013 and 2014 test sets respectively. These results are invalid because of a software bug discovered after the submission deadline — an index-shifting error meant the test set was represented by all features except the terms — so the corrected results are reported as non-official. Non-official results, with a Multinomial Naive Bayes model and the features of Section 3: the term feature vector alone gives 49% and 46%; Z-score features improve performance by 6.5 and 10.9 points, pre-polarity features by 4 and 6 points, while extending with POS tags decreases the F-measure. Across all feature combinations, POS tags are never useful; the best result combines Z-score and pre-polarity features, and Z-score features improve the F-measure significantly more than pre-polarity features.
Table 2: average F-measures for the positive and negative classes on the SemEval 2013 / 2014 test sets:
Terms 49.42 / 46.31; Terms+Z 55.90 / 57.28; Terms+POS 43.45 / 41.14; Terms+POL 53.53 / 52.73; Terms+Z+POS 52.59 / 54.43; Terms+Z+POL 58.34 / 59.38; Terms+POS+POL 48.42 / 50.03; Terms+Z+POS+POL 55.35 / 58.58
Table 3: the same experiments after extending each tweet with a Twitter dictionary (expressions for emoticons and abbreviations); the dictionary improves the F-measure in every setting, the best result again combining Z-scores and pre-polarity features:
Terms 50.15 / 48.56; Terms+Z 57.17 / 58.37; Terms+POS 44.07 / 42.64; Terms+POL 54.72 / 54.53; Terms+Z+POS 53.20 / 56.47; Terms+Z+POL 59.66 / 61.07; Terms+POS+POL 48.97 / 51.90; Terms+Z+POS+POL 55.83 / 60.22
5. Conclusion: we tested the impact of the Twitter dictionary, sentiment lexicons, Z-score features and POS tags on tweet sentiment classification; the proposed Z-score features improve performance.
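A minimal sketch of Eq. (2) in Python; the `counts` data structure (per-class term frequencies) is an illustrative assumption:

```python
# Z-score of term t in class C under a binomial model, Eq. (2):
#   Z(t, C) = (tf_tC - n_C * P(t)) / sqrt(n_C * P(t) * (1 - P(t)))
# counts[c][t] = frequency of term t in class c (assumed input format).
import math
from collections import defaultdict

def zscores(counts):
    n = {c: sum(tf.values()) for c, tf in counts.items()}   # terms per class
    total = sum(n.values())                                  # corpus size
    tf_corpus = defaultdict(int)
    for tf in counts.values():
        for t, f in tf.items():
            tf_corpus[t] += f
    z = {}
    for c, tf in counts.items():
        for t, f in tf.items():
            p = tf_corpus[t] / total                         # P(t) over the corpus
            mean = n[c] * p
            sd = math.sqrt(n[c] * p * (1 - p))
            z[(t, c)] = (f - mean) / sd if sd else 0.0
    return z

# Usage: keep terms with z[(t, c)] above a threshold (3 worked best above).
```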
 • 125. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Linking Contents by Analyzing the References. In books: no common stylesheet (or many stylesheets, poorly respected…). Our proposal: 1) Searching for references in the document / footnotes (Support Vector Machines) 2) Annotating the references (Conditional Random Fields). BILBO: our (open-source) software for reference analysis 125 Google Digital Humanities Research Award (2012). Annotation, DOI search (Crossref). OpenEdition Journals: more than 1.5 million references analyzed. Test: http://bilbo.openeditionlab.org — Sources: http://github.com/OpenEdition/bilbo
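A minimal sketch of step 2 (CRF labeling of reference tokens), here with sklearn-crfsuite as an assumed stand-in — BILBO has its own implementation and a much richer feature set; the features and labels below are illustrative only:

```python
# Illustrative CRF tagger for bibliographic reference fields.
import sklearn_crfsuite

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "has_period": "." in tok,
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# One tokenized training reference with TEI-like labels ("c" = punctuation).
tokens = ["Langouet", ",", "G.", ",", "(", "1986", ")", ",", "Revue",
          "française", "de", "pédagogie", ",", "76", ",", "25-30", "."]
labels = ["surname", "c", "forename", "c", "c", "date", "c", "c", "title",
          "title", "title", "title", "c", "biblScope", "c", "biblScope", "c"]

X_train = [[token_features(tokens, i) for i in range(len(tokens))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
predicted = crf.predict(X_train)  # in practice: unseen references
```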
  • 127. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 127 Ollagnier, A., Fournier, S., Bellot, P. (2016). A Supervised Approach for Detecting Allusive Bibliographical References in Scholarly Publications. In WIMS (p. 36).
 • 134. P. Bellot (AMU-CNRS, LSIS-OpenEdition) Conclusion — A great many (hybrid) approaches — Collaborative filtering and exploitation of user history — Content analysis — Exploitation of behavioral data and explicit information — Exploitation of social networks — Combine everything in a single learning model?
Which objective function should be optimized? — Strong links with other fields — Statistical methods, data and graph mining, machine learning… — Information retrieval (isn't it recommendation too?), natural language processing, image/signal analysis, ergonomics and interaction… — One must choose the approaches, but also the data — Usages and contexts — Privacy preservation 134
 • 139. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 139 Çoba, L., Zanker, M. (2016). rrecsys: an R-package for prototyping recommendation algorithms. RecSys 2016.