This document summarizes a slide deck on recommender systems:
1) It introduces recommender systems and the different types including knowledge-based, collaborative filtering, and content-based recommendations.
2) It outlines some of the key resources for recommender systems including datasets, conferences, and articles.
3) It provides a high-level overview of common recommender system approaches like collaborative filtering, content-based analysis, and knowledge-based recommendations.
3. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Some open questions…
— Is it useful to exploit metadata, contents, comments?
— How can contents be linked to one another?
— How can contents of different natures be exploited?
— How can we "understand" readers' needs? Long queries? Profiles?
— What are the uses? What are the needs?
— How can we go beyond informational relevance? (genre, level of expertise, recency of the document…)
— OpenEdition Lab: a Digital Humanities research program
— Detecting trends, emerging topics, the books "to read"…
4. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Outline
— A few examples: stating the problems and the stakes
— Which resources?
— Some methodological generalities
— Some strategies for evaluating a recommendation
— Around collaborative filtering (= "social" recommendation?)
— Around content analysis and content suggestion
. focus on book search with long natural-language queries
5. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Introduction
Goals of recommendation:
— Recommend "objects" (films, books, Web pages…)
— Predict the ratings individuals would give
Different types of recommendation:
— Knowledge-based: characteristics of the target individuals (age, salary…)
— Based on the preferences of individuals
— expressed explicitly by the individuals themselves
— inferred by analyzing their behavior — link with classification
— By crossing the behaviors of individuals: collaborative filtering
— By building profiles and comparing them to contents
A large number of information sources:
— Information explicitly given by individuals
— The contents and their metadata
— The Web and social networks (contents, graphs…)
7. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
ACM conferences and workshops
— Conferences:
— Recommender Systems: RecSys (since 2007)
— "Recommendation Systems" sessions at SIGIR, CIKM…
— Workshops:
— Context-aware Movie Recommendation (2010, 2011)
— Information Heterogeneity and Fusion in Recommender Systems (2010, 2011)
— Large-Scale Recommender Systems and the Netflix Prize Competition (2008)
— Recommendation Systems for Software Engineering (2008-14)
— Recommender Systems and the Social Web (2012)
29. Some data collections
[Truncated two-column excerpt of Section 5 (Results) of Bobadilla et al. (2010): comparison of Pearson correlation and the new metric on the MovieLens, FilmAffinity and NetFlix databases of Table 1, with improvements of around 0.2 stars in MAE for most k-neighborhood sizes.]
Table 1. Main parameters of the databases used in the experiments.
                     MovieLens    FilmAffinity    NetFlix
Number of users      4,382        26,447          480,189
Number of movies     3,952        21,128          17,770
Number of ratings    1,000,209    19,126,278      100,480,507
Min and max values   1–5          1–10            1–5

Table 2. Main parameters used in the experiments (truncated in the source): K for MAE, coverage and perfect predictions; precision/recall.
44. "Individuals" and "data"
Let T be a table crossing n individuals I (rows) and K quantitative variables X (columns); x_{i,k} is the value of variable k for individual i:

            X_1      X_2      …    X_K      (variables)
I_1         x_{1,1}  x_{1,2}  …    x_{1,K}
I_2         x_{2,1}  x_{2,2}  …    x_{2,K}
…                         x_{i,k}
I_n         x_{n,1}  x_{n,2}  …    x_{n,K}  (individuals)

One of the goals of data analysis is to determine profiles of individuals or, put differently, classes of individuals that resemble each other. This resemblance is determined from the values of the variables associated with the individuals.
Another goal concerns the variables themselves: computing the correlations between them (to what extent, and in what way, a change in the values of one leads to a change in the values of the other), regression between variables (formulating the links between variables)… Principal Component Analysis (PCA) deals with the linear relationships between variables, as opposed to quadratic, logarithmic or exponential relationships, for example. PCA is one of the factor-analysis methods, which determine factors from the values of the variables associated with the individuals.
45. P. Bellot
Studying the individuals / studying the variables
• Data analysis can be conducted according to:
• the individuals: looking for resemblance between the individuals (according to the values of the variables) = automatic classification (clustering) of the individuals
• the variables: which variables best explain the data (the differences between individuals)? What are the principal components? Where is the greatest variability?
temp <- data.frame(temperature[1:12])
cl <- kmeans(temp, 3, iter.max = 2, nstart = 15)
e) visualize the classes:
summary(cl)
cl$cluster
summary(cl$cluster)
cl$centers
f) Add the classification result to the data:
- use the cluster package to access the clusplot function: library(cluster)
- then:
aggregate(temperature, by = list(cl$cluster), FUN = mean)
cl2 <- data.frame(temperature, cl$cluster)
clusplot(temperature, cl$cluster, color = TRUE, shade = TRUE, labels = 2, lines = 0)
5- "Bonus" question: working with the APCluster package
Install the APCluster package
[Figure: Individuals factor map (PCA), Dim 1 (86.87%) vs. Dim 2 (11.42%): European cities (Amsterdam, Athens, Berlin, …, Zurich) grouped into East / North / South / West.]
[Figure: Variables factor map (PCA), Dim 1 (86.87%) vs. Dim 2 (11.42%): the twelve monthly temperature variables plus Annual, Amplitude, Latitude and Longitude.]
49. P. Bellot
PCA and dimensionality reduction
• A way to represent clouds of individuals in a few dimensions
— preserving as well as possible the distances between the individuals
— favoring the dimensions of greatest variability (iterative selection of the factors that maximize variance)
= application of a projection function (see the R sketch below)
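As a minimal R sketch (in the spirit of the clustering exercise above, and assuming the same temperature data frame with twelve monthly columns), the projection can be obtained with prcomp:

# PCA on the 12 monthly temperature variables
pca <- prcomp(temperature[, 1:12], center = TRUE, scale. = TRUE)
summary(pca)        # variance carried by each component (cf. Dim 1 ≈ 87%)
head(pca$x[, 1:2])  # coordinates of the individuals (cities) on Dim 1 / Dim 2
biplot(pca)         # individuals and variables in the same projection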
50. P. Bellot
Learning methods
• Different forms of learning
• A "student" agent copies the "teacher" agent -- providing examples
• Inductive reasoning (from examples)
• Learning important characteristics
• Detecting recurring patterns
• Adjusting important parameters
• Turning information into knowledge
Examples -- Model -- Test -- Correction / Enrichment of the examples
51. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Statistical and probabilistic approaches
Machine learning
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
Proceedings of the 18th International Conference on Machine Learning 2001 (ICML 2001)
[Figure: graphical structures of sequence labeling models and the chain-structured case of CRFs (right), with label nodes Y_{i−1}, Y_i, Y_{i+1} and observation nodes X_{i−1}, X_i, X_{i+1}.]
… the training data. Both algorithms are based on the improved iterative scaling (IIS) algorithm of Della Pietra et al. (1997); the proof technique based on auxiliary functions can be extended to show convergence of the algorithms …
exp( Σ_j λ_j t_j(y_{i−1}, y_i, x, i) + Σ_k µ_k s_k(y_i, x, i) )   (2)

t_j(y_{i−1}, y_i, x, i) is a transition feature function of the entire observation sequence and the labels at positions i and i−1 in the label sequence; s_k(y_i, x, i) is a state feature function of the label at position i and the observation sequence; λ_j and µ_k are parameters to be estimated from training data.
When defining feature functions, we construct a set of real-valued features b(x, i) of the observation to express some characteristic of the empirical distribution of the training data that should also hold of the model distribution. An example of such a feature is
b(x, i) = 1 if the observation at position i is the word "September"; 0 otherwise.
Each feature function takes on the value of one of these real-valued observation features b(x, i) if the current state (in the case of a state function) or previous and current states (in the case of a transition function) take on particular values. All feature functions are therefore real-valued. For example, consider the following transition function:
t_j(y_{i−1}, y_i, x, i) = b(x, i) if y_{i−1} = IN and y_i = NNP; 0 otherwise.
In the remainder of this report, notation is simplified by writing s(y_i, x, i) = s(y_{i−1}, y_i, x, i) and
F_j(y, x) = Σ_{i=1}^{n} f_j(y_{i−1}, y_i, x, i),
in which each f_j(y_{i−1}, y_i, x, i) is either a state function s(y_{i−1}, y_i, x, i) or a transition function t(y_{i−1}, y_i, x, i). This allows the probability of a label sequence y given an observation sequence x to be written as
p(y|x, λ) = (1/Z(x)) exp( Σ_j λ_j F_j(y, x) ).   (3)
Z(x) is a normalization factor.
We compute the Z_score of each term t_i in a class C_j by calculating its term relative frequency tfr_ij in the class C_j, as well as the mean (mean_i), which is the term probability over the whole corpus multiplied by n_j, the number of terms in the class C_j, and the standard deviation (sd_i) of term t_i according to the underlying corpus (see Eqs. (1) and (2)):

Z_score(t_ij) = (tfr_ij − mean_i) / sd_i   (1)
Z_score(t_ij) = (tfr_ij − n_j · P(t_i)) / sqrt( n_j · P(t_i) · (1 − P(t_i)) )   (2)

A term whose frequency in one class is salient in comparison to the others will have a salient Z_score. Z_score was exploited for sentiment analysis by (Zubaryeva and Savoy 2010): they chose a threshold (2) for selecting the terms having a Z_score above the threshold, then used logistic regression to combine these scores. We use Z_scores as added features for classification because tweets are too short: many tweets do not have any words with a salient Z_score. Figures 1, 2 and 3 show the distribution of Z_score over each class; the majority of terms have a Z_score between -1.5 and 2.5 in each class, and the rest are either very frequent (> 2.5) or very rare (< -1.5). Note that a negative value means that the term is infrequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_score in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.

Table 1. The first ten terms having the highest Z_score in each class.
positive  Z_score | negative  Z_score | neutral    Z_score
Love      14.31   | Not       13.99   | Httpbit    6.44
Good      14.01   | Fuck      12.97   | Httpfb     4.56
Happy     12.30   | Don't     10.97   | Httpbnd    3.78
Great     11.10   | Shit       8.99   | Intern     3.58
Excite    10.35   | Bad        8.40   | Nov        3.45
Best       9.24   | Hate       8.29   | Httpdlvr   3.40
Thank      9.21   | Sad        8.28   | Open       3.30
Hope       8.24   | Sorry      8.11   | Live       3.28
Cant       8.10   | Cancel     7.53   | Cloud      3.28
Wait       8.05   | stupid     6.83   | begin      3.17

- Sentiment Lexicon Features (POL)
We used two sentiment lexicons, the MPQA Subjectivity Lexicon (Wilson, Wiebe et al. 2005) and Bing Liu's Opinion Lexicon (Hu and Liu 2004). We extract the number of positive, negative and neutral words in tweets using these lexicons; Bing Liu's lexicon provides negative and positive annotations, and MPQA also contains neutral ones.

- Part Of Speech (POS)
We annotate each word in the tweet with its POS tag, and then count the adjectives, verbs, nouns and adverbs in each tweet.

4 Evaluation
4.1 Data collection
We used the data sets provided in SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Nakov, Kozareva et al. 2013). The training tweets are annotated as positive, negative or neutral; we downloaded them using a given script, but could only retrieve part of them because of protected profiles. We used the development set for evaluating our models and trained a new model for predicting the test sets 2013 and 2014.

4.2 Experiments
[Truncated excerpt: the official SemEval results on the 2013 and 2014 test sets were affected by a software bug discovered after the submission deadline; various non-official experiments combine the features above with a Naïve-Bayes model over the feature vector of tweet terms.]
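A small R sketch of Eqs. (1)-(2) as reconstructed above (the binomial mean and standard deviation; the counts below are toy values, not the paper's data):

# z-score of term t in class Cj: observed class frequency vs. the frequency
# expected from the whole corpus (normal approximation of a binomial)
zscore <- function(tf_class, n_class, tf_corpus, n_corpus) {
  p <- tf_corpus / n_corpus          # P(t): term probability over the corpus
  (tf_class - n_class * p) / sqrt(n_class * p * (1 - p))
}
# a term seen 120 times among 10,000 tokens of one class,
# and 200 times among the 50,000 tokens of the whole corpus:
zscore(120, 10000, 200, 50000)   # salient if above the chosen threshold (3)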
What are the characteristic words of a group of documents?
Which significant relations can be drawn from the observed forms alone?
Analogies, correlations
57. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
… requirements into consideration.
The recommendations are timely.
A2. Interaction Adequacy
The recommender provides an adequate way for me to express
my preferences.
The recommender provides an adequate way for me to revise
my preferences.
The recommender explains why the products are
recommended to me.*
A3. Interface Adequacy
The recommender’s interface provides sufficient information.
The information provided for the recommended items is
sufficient for me.
The labels of the recommender interface are clear and
adequate.
The layout of the recommender interface is attractive and
adequate.*
A4. Perceived Ease of Use
A.4.1 Ease of Initial Learning
I became familiar with the recommender system very quickly.
I easily found the recommended items.
Looking for a recommended item required too much effort
(reverse scale).
A.4.2 Ease of Preference Elicitation
I found it easy to tell the system about my preferences.
It is easy to learn to tell the system what I like.
It required too much effort to tell the system what I like
(reversed scale).
The recommender made me more confident about my
selection/decision.
The recommended items made me confused about my choice
(reverse scale).
The recommender can be trusted.
A8. Behavioral Intentions
A.8.1 Intention to Use the System
If a recommender such as this exists, I will use it to find
products to buy.
A.8.2 Continuance and Frequency
I will use this recommender again.*
A.4.3 Ease of Preference Revision
I found it easy to make the system recommend different things
to me.
It is easy to train the system to update my preferences.
I found it easy to alter the outcome of the recommended items
due to my preference changes.
It is easy for me to inform the system if I dislike/like the
recommended item.
It is easy for me to get a new set of recommendations.
A.4.4 Ease of Decision Making
Using the recommender to find what I like is easy.
I was able to take advantage of the recommender very quickly.
I quickly became productive with the recommender.
Finding an item to buy with the help of the recommender is
easy.*
Finding an item to buy, even with the help of the
recommender, consumes too much time.
A5. Perceived Usefulness
The recommended items effectively helped me find the ideal
product.*
The recommended items influence my selection of products.
I feel supported to find what I like with the help of the
recommender.*
I feel supported in selecting the items to buy with the help of
the recommender.
A6. Control/Transparency
I feel in control of telling the recommender what I want.
Pu, P., Chen, L. A User-Centric Evaluation Framework of Recommender Systems. In: ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces; 2010: 14-22.
61. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Evaluation measures
— Quality of the prediction: Mean Absolute Error, Root Mean Squared Error, Coverage
— Quality of the recommendation: Precision, Recall, F1-Measure
MAE = (1/#U) Σ_{u∈U} [ (1/#O_u) Σ_{i∈O_u} |p_{u,i} − r_{u,i}| ]   (1)

RMSE = (1/#U) Σ_{u∈U} sqrt( (1/#O_u) Σ_{i∈O_u} (p_{u,i} − r_{u,i})² )   (2)

The coverage could be defined as the capacity of predicting from a metric applied to a specific RS. In short, it calculates the percentage of situations in which at least one k-neighbor of each active user can rate an item that has not been rated yet by that active user. We define K_{u,i} as the set of neighbors of u which have rated the item i, and the coverage of the system as the average of the users' coverage. Let
C_u = {i ∈ I | r_{u,i} = • ∧ K_{u,i} ≠ ∅},  D_u = {i ∈ I | r_{u,i} = •}
coverage = (1/#U) Σ_{u∈U} 100 × (#C_u / #D_u)   (3)
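A direct R transcription of Eqs. (1) and (2) (a sketch: predictions and true ratings are given per user as parallel numeric vectors):

# MAE and RMSE averaged per user, following Eqs. (1) and (2)
mae  <- function(pred, real) mean(mapply(function(p, r) mean(abs(p - r)), pred, real))
rmse <- function(pred, real) mean(mapply(function(p, r) sqrt(mean((p - r)^2)), pred, real))

pred <- list(u1 = c(4.2, 3.1), u2 = c(2.0, 4.8, 3.5))
real <- list(u1 = c(4, 3),     u2 = c(1, 5, 4))
mae(pred, real); rmse(pred, real)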
… mean squared error, normalized mean absolute error, ROC and fallout; Goldberg et al. [87] focuses on the aspects not related to the evaluation, Breese et al. [43] compare the predictive accuracy of various methods in a set of representative problem domains.
The majority of articles discuss attempted improvements to the accuracy of RS results (RMSE, MAE, etc.). It is also common to attempt an improvement in recommendations (precision, recall, etc.). However, additional objectives should be considered for generating greater user satisfaction [253], such as topic diversification, coverage and serendipity.
Currently, the field has a growing interest in generating algorithms with diverse and innovative recommendations, even at the expense of accuracy and precision. To evaluate these aspects, various metrics have been proposed to measure recommendation novelty and diversity [105,220].
Some frameworks aid in defining and standardizing the methods and algorithms employed by RS as well as the mechanisms to evaluate the quality of the results. Among the most significant papers that propose CF frameworks are Herlocker et al. [92], which evaluates the following: similarity weight, significance weighting, variance weighting, selecting neighborhood and rating normalization; Hernández and Gaudioso [95] proposes a framework in which the RS is formed by two different subsystems, one of them to model the user and the other to provide useful/interesting items. …ika et al. [125] is a framework which introduces levels of abstraction in the CF process, making the modifications in the RS more flexible. Antunes et al. [12] presents an evaluation framework assuming that evaluation is an evolving process during the system life cycle.
The majority of RS evaluation frameworks proposed until now present two deficiencies: the first of these is the lack of formalization. Although the evaluation metrics are well defined, there …
4.2. Quality of the set of recommendations: precision, recall and F1
The confidence of users in a certain RS does not depend directly on the accuracy of the set of possible predictions. A user gains confidence in the RS when this user agrees with a reduced set of recommendations made by the RS.
In this section, we define the three most widely used recommendation quality measures: (1) precision, which indicates the proportion of relevant recommended items out of the total number of recommended items, (2) recall, which indicates the proportion of relevant recommended items out of the number of relevant items, and (3) F1, which is a combination of precision and recall.
Let X_u be the set of recommendations to user u, and Z_u the set of n recommendations to user u. We represent the evaluation precision, recall and F1 measures for recommendations obtained by making n test recommendations to the user u, taking a relevancy threshold θ. Assuming that all users accept n test recommendations:
precision = (1/#U) Σ_{u∈U} #{i ∈ Z_u | r_{u,i} ≥ θ} / n   (4)

recall = (1/#U) Σ_{u∈U} #{i ∈ Z_u | r_{u,i} ≥ θ} / ( #{i ∈ Z_u | r_{u,i} ≥ θ} + #{i ∈ Z_u^c | r_{u,i} ≥ θ} )   (5)

F1 = 2 × precision × recall / (precision + recall)   (6)
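In R, Eqs. (4)-(6) for a single user read as follows (a sketch: r_Z are the ratings of the n recommended items Z_u, r_Zc those of the relevant candidates left out of the list, theta the relevancy threshold):

# Precision, recall and F1 of one user's top-n recommendation list
pr_at_n <- function(r_Z, r_Zc, n, theta) {
  hits   <- sum(r_Z >= theta)      # relevant items inside the list
  missed <- sum(r_Zc >= theta)     # relevant items outside the list
  precision <- hits / n
  recall    <- hits / (hits + missed)
  c(precision = precision, recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}
pr_at_n(r_Z = c(5, 4, 2, 5), r_Zc = c(4, 1, 3), n = 4, theta = 4)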
4.3. Quality of the list of recommendations: rank measures
62. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Evaluation measures (2)
— Quality of a list of recommendations (according to ranks): DCG at rank k:
. the gain contributed by an item is inversely related to its position in the list
. computed for each user u, then averaged over all users
nDCG is the version normalized by the "ideal DCG" (the ideal list)
— Novelty and diversity
The measures used are the following standard information retrieval measures: (a) half-life (7) [43], which assumes an exponential decrease in the interest of users as they move away from the recommendations at the top, and (b) discounted cumulative gain (8) [17], wherein decay is logarithmic:

HL = (1/#U) Σ_{u∈U} Σ_{i=1}^{N} max(r_{u,p_i} − d, 0) / 2^{(i−1)/(α−1)}   (7)

DCG^k = (1/#U) Σ_{u∈U} ( r_{u,p_1} + Σ_{i=2}^{k} r_{u,p_i} / log₂(i) )   (8)

p_1, …, p_n represents the recommendation list, r_{u,p_i} represents the true rating of the user u for the item p_i, k is the rank of the evaluated item, d is the default rating, and α is the number of the item on the list such that there is a 50% chance the user will review that item.
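Eq. (8) and its normalized variant nDCG (mentioned on the slide above) in R, for one user (a sketch; ratings follow the order p_1..p_n of the recommendation list):

# DCG at rank k (Eq. (8), single user) and nDCG against the "ideal" ordering
dcg_at_k <- function(ratings, k) {
  r <- ratings[1:k]
  r[1] + sum(r[-1] / log2(2:k))
}
ndcg_at_k <- function(ratings, k)
  dcg_at_k(ratings, k) / dcg_at_k(sort(ratings, decreasing = TRUE), k)

ndcg_at_k(c(3, 5, 2, 4, 1), k = 5)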
4.4. Novelty and diversity
The novelty evaluation measure indicates the degree of difference between the items recommended to and known by the user. The diversity quality measure indicates the degree of differentiation among recommended items.
Currently, novelty and diversity measures do not have a standard; therefore, different authors propose different metrics [163,220]. Certain authors [105] have used the following:

diversity_{Z_u} = 1/(#Z_u (#Z_u − 1)) Σ_{i∈Z_u} Σ_{j∈Z_u, j≠i} [1 − sim(i, j)]   (9)

novelty_i = 1/(#Z_u − 1) Σ_{j∈Z_u} [1 − sim(i, j)], i ∈ Z_u   (10)

Here, sim(i, j) indicates item-to-item memory-based CF similarity measures; Z_u indicates the set of n recommendations to user u.

4.5. Stability
The stability of the predictions and recommendations influences the users' trust towards the RS. An RS is stable if the predictions it provides do not change strongly over a short period of time. Adomavicius and Zhang [4] propose a quality measure of stability, MAS (Mean Absolute Shift).
63. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Evaluation measures (3)
— Other user-oriented measures
— Accuracy as perceived by the user
— Familiarity: the items (their existence) are known to the users
— Novelty: discovery of new items
— Attractiveness: the items attract the users (not always the case for relevant items…)
— Usefulness: the items were appreciated (after use / reading)
— Compatibility with the user's context
— Level of interaction
— Control of the parameters
— Explanations of the recommendation
— Transparency of the method
Pu, P., Chen, L. A User-Centric Evaluation Framework of Recommender Systems. In: ACM RecSys 2010 Workshop on User-Centric Evaluation of Recommender Systems and Their Interfaces; 2010: 14-22.
66. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Collaborative filtering
We are "social" beings
— "Others" dictate / influence our choices
— Our relationships are typed (friends / enemies, family, professional relationships…)
— "Tell me who your friends are and I will tell you who you are" — homophily
C: pairs (user, item) that have not been voted for and for which the k neighbors allow a prediction
D: pairs (user, item) that have not been voted for
E_{x,y}: items that have recently been voted for by both user x and user y
S_u: user u's recent votes

[Table 4. Running example: RS database — ratings r_{u,i} of users u1…u5 on items i1…i14, with • marking unrated items.]

2.3. Obtaining a user's K-neighbors
2.3.1. Formalization
[Truncated two-column excerpt: the aggregation formulas for the prediction p_{u,i}.]
Explicit ratings vs. implicit ratings (number of accesses or citations, time spent…)
Ratings to be predicted
69. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Which similarity functions?
- Pearson correlation
- Spearman correlation on ranks
- Cosine
- Euclidean distance
- More complex metrics:
- JMSD, to integrate non-numerical information (combination of Pearson and Jaccard)
- "Pareto optimum", to filter out the least representative individuals
- Integration of the scores of other individuals / other items
(a sketch of the two most common functions follows below)
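A small R sketch, on rating vectors where NA stands for "not rated" (only co-rated items enter the computation):

# Pearson and cosine similarity between two users' rating vectors
pearson_sim <- function(rx, ry) {
  both <- !is.na(rx) & !is.na(ry)      # co-rated items only
  cor(rx[both], ry[both])              # method = "spearman" for the rank version
}
cosine_sim <- function(rx, ry) {
  both <- !is.na(rx) & !is.na(ry)
  sum(rx[both] * ry[both]) /
    (sqrt(sum(rx[both]^2)) * sqrt(sum(ry[both]^2)))
}

rx <- c(5, NA, 3, 4, NA, 1)
ry <- c(4,  2, NA, 5, NA, 2)
pearson_sim(rx, ry); cosine_sim(rx, ry)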
p_{u,i} = r̄_u + l_{u,i} Σ_{n∈G_{u,i}} sim(u, n) (r_{n,i} − r̄_n),
where l serves as a normalizing factor, usually
l_{u,i} = 1 / Σ_{n∈G_{u,i}} sim(u, n), when G_{u,i} ≠ ∅.

The most popular similarity metrics are Pearson correlation (6), cosine (7), constrained Pearson's correlation (8) and Spearman rank correlation (9):

sim(x, y) = Σ_i (r_{x,i} − r̄_x)(r_{y,i} − r̄_y) / sqrt( Σ_i (r_{x,i} − r̄_x)² · Σ_i (r_{y,i} − r̄_y)² )   (6)

sim(x, y) = Σ_i r_{x,i} r_{y,i} / ( sqrt(Σ_i r²_{x,i}) · sqrt(Σ_i r²_{y,i}) )   (7)
publications and reviews also exist which include the most com-
monly accepted metrics, aggregation approaches and evaluation
measures: mean absolute error, coverage, precision, recall and
derivatives of these: mean squared error, normalized mean absolute
error, ROC and fallout; Goldberg et al. [13] focuses on the aspects not
related to the evaluation, Breese et al. [6] compare the predictive
accuracy of various methods in a set of representative problem
domains. Candillier et al. [7] and Schafer et al. [36] review the main
collaborative filtering methods proposed in the literature.
The rest of the paper is structured as follows:
In Section 2 we provide the basis for the principles on which the
design of the new metric will be based, we present graphs
which show the way in which the users vote, we carry out
experiments which support the decisions made, we establish
the best way of selecting numerical and non-numerical infor-
mation from the votes and, finally, we establish the hypothesis
on which the paper and its proposed metric are based.
In Section 3 we establish the mathematical formulation of the
metric.
In Sections 4 and 5, respectively, we list the experiments that
will be carried out and we present and discuss the results
obtained.
Section 6 presents the most relevant conclusions of the
publication.
2. Approach and design of the new similarity metric
2.1. Introduction
Collaborative filtering methods work on a table of U users who can rate I items. The prediction of a non-rated item i for a user u is computed from the ratings given to i by the users most similar to u (its K-neighbors). The two remaining similarity metrics are the constrained Pearson's correlation (8), which uses the median of the rating scale, and the Spearman rank correlation (9):

sim(x, y) = Σ_i (r_{x,i} − r_med)(r_{y,i} − r_med) / ( sqrt(Σ_i (r_{x,i} − r_med)²) · sqrt(Σ_i (r_{y,i} − r_med)²) )   (8)
r_med: median value in the rating scale;

sim(x, y) = Σ_i (rank_{x,i} − rank̄_x)(rank_{y,i} − rank̄_y) / sqrt( Σ_i (rank_{x,i} − rank̄_x)² · Σ_i (rank_{y,i} − rank̄_y)² )   (9)
Although Pearson correlation is the most commonly used metric in the process of memory-based CF (user to user), this choice is not always backed by the nature and distribution of the data in the RS. Formally, in order to be able to apply this metric with guarantees, the following assumptions must be met:
Linear relationship between x and y.
Continuous random variables.
Both variables must be normally distributed.
These conditions are not normally met in real RS, and Pearson correlation presents some significant cases of erroneous operation that should not be ignored in RS.
Despite the deficiencies of Pearson correlation, this similarity measure presents the best prediction and recommendation results in CF-based RS [15,16,31,7,35]; furthermore, it is the most commonly used, and therefore, any alternative metric proposed must improve on its results.
On accepting that Pearson correlation is the metric whose results must be improved, but not necessarily the most appropriate to be taken as a base, it is advisable to focus on the information that is obtained in the different research processes and which can sometimes be overlooked when searching for other …
Given the rating lists of two users x and y, using standardized values [0..1]:
r_x : (0.75, 1, •, 0.5, 0.25, •, 0, 0),
r_y : (0.75, 0.5, 0, 0.25, •, 0.5, 0.75, •).
We define the cardinality of a list, #l, as the number of elements in the list l different to •.
(1) We obtain the list d_{x,y} : (d¹_{x,y}, d²_{x,y}, …, d^I_{x,y}), where d^i_{x,y} = (r^i_x − r^i_y)² for all i with r^i_x ≠ • and r^i_y ≠ •, and d^i_{x,y} = • for all i with r^i_x = • or r^i_y = •;   (10)
in our example: d_{x,y} = (0, 0.25, •, 0.0625, •, •, 0.5625, •).
(2) We obtain the MSD(x, y) measure computing the arithmetic average of the values in the list d_{x,y}:
MSD(x, y) = d̄_{x,y} = ( Σ_{i=1..I, d^i_{x,y}≠•} d^i_{x,y} ) / #d_{x,y};   (11)
in our example: d̄_{x,y} = (0 + 0.25 + 0.0625 + 0.5625) / 4 = 0.218.
MSD(x, y) (11) tends towards zero as the ratings of users x and y become more similar and tends towards 1 as they become more different (we assume that the votes are normalized in the interval [0..1]).
(3) We obtain the Jaccard(x, y) measure computing the proportion between the number of positions [1..I] in which there are elements different to • in both r_x and r_y and the number of positions [1..I] in which there are elements different to • in r_x or in r_y:
Jaccard(x, y) = |r_x ∩ r_y| / |r_x ∪ r_y| = #d_{x,y} / (#r_x + #r_y − #d_{x,y});   (12)
in our example: 4 / (6 + 6 − 4) = 0.5.
(4) We combine the above elements in the final equation:
newmetric(x, y) = Jaccard(x, y) × (1 − MSD(x, y));   (13)
in the running example: 0.5 × (1 − 0.218) ≈ 0.39.
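The derivation above translates almost line for line into R (a sketch: NA plays the role of •, ratings assumed normalized to [0..1]):

# newmetric(x, y) = Jaccard(x, y) * (1 - MSD(x, y)), Eqs. (10)-(13)
newmetric <- function(rx, ry) {
  d <- (rx - ry)^2                  # NA wherever x or y did not rate (Eq. 10)
  msd <- mean(d, na.rm = TRUE)      # Eq. (11)
  inter <- sum(!is.na(d))           # #d_{x,y}
  jaccard <- inter / (sum(!is.na(rx)) + sum(!is.na(ry)) - inter)   # Eq. (12)
  jaccard * (1 - msd)               # Eq. (13)
}

rx <- c(0.75, 1,   NA, 0.5,  0.25, NA,  0,    0)
ry <- c(0.75, 0.5, 0,  0.25, NA,   0.5, 0.75, NA)
newmetric(rx, ry)   # 0.5 * (1 - 0.218) ≈ 0.39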
Ortega, F., Sánchez, J.-L., Bobadilla, J., Gutiérrez, A. (2013). Improving collaborative filtering-based recommender systems results using Pareto dominance. Information Sciences, 239, 50-61.
71. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
commonly used due to its low capacity to produce new recommendations.
MSD offers both a great advantage and a great disadvantage at the same time. The advantage is that it generates very good general results: low average error, high percentage of correct predictions and low percentage of incorrect predictions. The disadvantage is that it has an intrinsic tendency to choose, as users similar to a given user, those users who have rated a very small number of items [35]. E.g., if we have 7 items that can be rated from 1 to 5 and three users u1, u2, u3 with the following ratings: u1: (•, •, 4, 5, •, •, •), u2: (3, 4, 5, 5, 1, 4, •), u3: (3, 5, 4, 5, •, 3, •) (• means a non-rated item), the MSD metric will indicate that (u1, u3) have total similarity (0), (u1, u2) have a similarity of 0.5 and (u2, u3) a lower similarity (0.6). This situation is not convincing: intuitively we realize that u2 and u3 are very similar, whilst u1 is only similar to u2 and u3 in 2 ratings; it is therefore not logical to choose it as the most similar to them, and, what is worse, if it is chosen it will not provide us with possibilities to recommend new items.
The strategy followed in designing the new metric is to considerably raise the capacity of MSD to generate predictions, without losing along the way its good behavior as regards accuracy and quality of the results.
The metric designed is based on two factors:
The similarity between two users calculated as the mean of the squared differences (MSD): the smaller these differences, the greater the similarity between the two users. This part of the metric enables very good accuracy results to be obtained.
The number of items rated by both users relative to the total number of items rated by either of the two users. E.g., given users u1: (3, 2, 4, •, •, •) and u2: (•, 4, 4, 3, •, 1), a common rating has been made in two items out of a joint total of five rated items. This factor enables us to greatly improve the metric's capacity to make predictions.
An important design aspect is the decision not to use a parameter whose value would have to be set arbitrarily, i.e. the result provided by the metric should be obtained by only taking the values of the ratings provided by the users of the RS.
By working on the two factors with standardized values [0..1], the metric obtained is as follows. Given the lists of ratings of two generic users x and y, (r_x, r_y) : (r¹_x, r²_x, r³_x, …, r^I_x), (r¹_y, r²_y, r³_y, …, r^I_y), where I is the number of items of our RS and one of the possible values of each rating is • (not rated).
Fig. 5. MAE and coverage obtained with Pearson correlation and by combining Jaccard with Pearson correlation, cosine, constrained Pearson's correlation, Spearman rank correlation and mean squared differences. (A) MAE, (B) Coverage. MovieLens 1M, 20% of test users, 20% of test items, k ∈ [2..1500], step 25.
Fig. 4. Measurements related to the Jaccard metric on MovieLens. (A) Number of pairs of users that display the Jaccard values represented on the x axis. (B) Averaged MAE obtained in the pairs of users with the Jaccard values represented on the x axis. (C) Averaged coverages obtained in the pairs of users with the Jaccard values represented on the x axis.
Bobadilla, J., Serradilla, F., Bernal, J. (2010). A new collaborative filtering metric that improves the behavior of recommender systems. Knowledge-Based Systems, 23(6), 520-528.
The comparative results in Graph 6B show improvements of up to 9% when applying the new metric as regards the correlation; in some cases the improvement in the results of the new metric regarding correlation reaches even 15%.
Fig. 6. Pearson correlation and new metric comparative results using MovieLens: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 20% of test users, 20% of test items, k ∈ [2..1500] step 50, N ∈ [2..20], θ = 5.
Fig. 7. Correlation and new metric comparative results using NetFlix: (A) accuracy, (B) coverage, (C) percentage of perfect predictions, (D) precision/recall. 5% of test users, 20% of test items, k ∈ [2..10000] step 100, N ∈ [2..20], θ = 9.
74. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
The cold-start problem
— New application
— Editorial recommendation
— Encourage users to give opinions
— New user
— Exploit as much other information about the user as possible
— forms,
— friends on social networks (= ask for access),
— preferences expressed as tags…
— New item
— Exploit the metadata (for a film: year, director, actors…)
— Exploit the reviews that can be found elsewhere on the Web
78. Amazon: organization of objects (categories)
Product Advertising API https://aws.amazon.com/
cf. http://www.codediesel.com/libraries/amazon-advertising-api-browsenodes/
79. Similarities and latent spaces
Koren, Y., Bell, R., Volinsky, C. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, July 2009: 42-50.
80. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Projection of the individuals / items matrix
— Each item I is represented by a vector q of dimension f
— Each user U is represented by a vector p of dimension f
— Each factor represents a latent property that characterizes the items and captures the users' interest in it
— The dot product between q and p is an estimate of the interest of U for I
— Method:
— Singular value decomposition
— Approximation by gradient descent (on training data)
[Equation from the slide: minimize Σ (r_{u,i} − q_i^T p_u)² + λ(‖q_i‖² + ‖p_u‖²), annotated: r_{u,i} the real rating, q_i^T p_u the predicted rating, λ(…) the regularization factor, λ the regularization constant (learned by cross-validation).]
(a toy R sketch follows below)
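A toy R sketch of the gradient-descent approximation (illustrative learning rate gamma and regularization constant lambda, not values tuned by cross-validation as the slide prescribes):

# Regularized matrix factorization by stochastic gradient descent:
# minimize, over known r_ui, (r_ui - q_i . p_u)^2 + lambda(||q_i||^2 + ||p_u||^2)
R <- matrix(c(5, 3, NA, 4,
              4, NA, 1, 5,
              NA, 2, 4, NA), nrow = 3, byrow = TRUE)
f <- 2; gamma <- 0.01; lambda <- 0.1
p <- matrix(rnorm(nrow(R) * f, sd = 0.1), nrow(R), f)   # user factors
q <- matrix(rnorm(ncol(R) * f, sd = 0.1), ncol(R), f)   # item factors
for (step in 1:2000)
  for (u in 1:nrow(R)) for (i in 1:ncol(R)) if (!is.na(R[u, i])) {
    e <- R[u, i] - sum(q[i, ] * p[u, ])                 # prediction error
    pu <- p[u, ]                                        # keep old p for q's update
    p[u, ] <- pu + gamma * (e * q[i, ] - lambda * pu)
    q[i, ] <- q[i, ] + gamma * (e * pu - lambda * q[i, ])
  }
round(p %*% t(q), 1)   # predicted ratings, including the missing cells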
81. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Latent spaces (continued)
— Non-convex space: risk of a solution far from the global optimum
— Alternating Least Squares approach
. Fix q, solve for p; fix p, solve for q, etc.
. Useful when the (training) ratings are implicit (non-sparse matrix)
— Taking biases into account = adjusting the predicted values
— Some users tend to always give good ratings
— Some items always tend to receive good ratings
— The final score must depend on the average of all scores (starting baseline)
— Integrating users' a priori preferences (x: preferred items of u; y: attributes (age…))
— Taking dynamics into account
89. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Integrating context (continued)
— A cube Individuals × Items × Contexts replaces the Individuals × Items matrix
— Tensor factorization
Karatzoglou, A., Amatriain, X., Baltrunas, L., Oliver, N. 2010. Multiverse Recommendation: N-Dimensional Tensor Factorization for Context-Aware Collaborative Filtering. In Proceedings of the 2010 ACM Conference on Recommender Systems, 79-86.
90. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Exploiting links (social networks)
— The social network as an additional input
Yang, X., Guo, Y., Liu, Y., Steck, H. A survey of collaborative filtering based social recommender systems. Computer Communications. 2014;41(C):1-10. doi:10.1016/j.comcom.2013.06.009
91. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Exploiting links (social networks) (2)
— Prediction according to the links between individuals (Bayesian inference)
— the individual looking for a rating
— the individuals who rated the item
— the intermediate individuals who relay the ratings
99. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Recommending Books vs Searching for Books?
Very diverse needs:
— Topicality
— With a precise context, e.g. arts in China during the 20th century
— With named entities: locations (the book is about a specific location OR the action takes place at this location), proper names…
— Style / Expertise / Language
— fiction, novel, essay, proceedings, position papers…
— for experts / for dummies / for children…
— in English, in French, in old French, in (very) local languages…
— Looking for citations / references
— in which book does a given citation appear
— which books refer to a given one
— Authority:
— What are the most important books about… (and what does "most important" mean?)
— What are the most popular books about…
100. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
http://social-book-search.humanities.uva.nl/#/overview
2 The Amazon collection
The collection used for this year's Book Track is composed of Amazon pages of existing books. These pages consist of editorial information such as ISBN number, title, number of pages, etc. However, in this collection the most important content resides in social data. Indeed, Amazon is social-oriented: users can comment on and rate products they purchased or own. Reviews are identified by the review fields and are unique for a single user (Amazon does not allow a forum-like discussion). Users can also assign tags of their own creation to a product. These tags are useful for refining the search of other users because they are not fixed: they reflect the trends for a specific product. In the XML documents, they can be found in the tag fields. Apart from this user classification, Amazon provides its own category labels, contained in the browseNode fields.
Table 1. Some facts about the Amazon collection.
Number of pages (i.e. books): 2,781,400
Number of reviews: 15,785,133
Number of pages that contain at least one review: 1,915,336
3 Retrieval model
3.1 Sequential Dependence Model
Like the previous year, we used a language modeling approach to retrieval [4]. We use Metzler and Croft's Markov Random Field (MRF) model [5] to integrate multiword phrases in the query. Specifically, we use the Sequential Dependence Model (SDM).
Organizers
Marijn Koolen (University of Amsterdam)
Toine Bogers (Aalborg University Copenhagen)
Antal van den Bosch (Radboud University Nijmegen)
Antoine Doucet (University of Caen)
Maria Gaede (Humboldt University Berlin)
Preben Hansen (Stockholm University)
Mark Hall (Edge Hill University)
Iris Hendrickx (Radboud University Nijmegen)
Hugo Huurdeman (University of Amsterdam)
Jaap Kamps (University of Amsterdam)
Vivien Petras (Humboldt University Berlin)
Michael Preminger (Oslo and Akershus University College of Applied Sciences)
Mette Skov (Aalborg University Copenhagen)
Suzan Verberne (Radboud University Nijmegen)
David Walsh (Edge Hill University)
110. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Book Recommendation / IR
SBS 2016 - Retrieval Model: Method - SDM
Weighting query terms [Metzler2005]
● Unigram matches
● Bigram exact matches
● Bigram matches within an unordered window of 8 terms
Université Aix-Marseille — Amal Htait
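Written out (the standard form of Metzler and Croft's model; λ_T, λ_O, λ_U are the usual interpolation weights), the SDM ranking function is:

score_SDM(Q, D) = λ_T Σ_{q∈Q} f_T(q, D)
                + λ_O Σ_{i=1}^{|Q|−1} f_O(q_i, q_{i+1}, D)
                + λ_U Σ_{i=1}^{|Q|−1} f_U(q_i, q_{i+1}, D)

with f_T the unigram log-likelihood, f_O the exact (ordered) bigram one, f_U the bigram one within an unordered window of 8 terms, and λ_T + λ_O + λ_U = 1.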
112. P. Bellot (AMU-CNRS, LSIS-OpenEdition) 112
Koolen, M., Bogers, T., Gäde, M., Hall, M., Hendrickx, I., Huurdeman, H., ... Walsh, D. (2016, September). Overview of the CLEF 2016 Social
Book Search Lab. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 351-370). Springer
International Publishing.
115. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Building a Graph of Books
— Nodes = books + properties (metadata, #reviews and ranking, page ranks, ratings…)
— Edges = links between books
— Book A refers to Book B according to:
— Bibliographic references and citations (in the book / in the reviews)
— Amazon recommendations (People who bought A bought B, People who liked A liked B…)
— A is similar to B:
— They share bibliographic references
— Full-text similarity + similarity between the metadata
The graph makes it possible to estimate (see the toy sketch below):
— "Book Ranks" (cf. Google's PageRank)
— Neighborhoods
— Shortest paths
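A minimal sketch with the igraph R package on a toy graph (not the Amazon data): book ranks, neighborhoods and shortest paths all come directly from the graph structure.

library(igraph)
# a directed edge A --+ B means "book A refers to book B"
g <- graph_from_literal(A --+ B, A --+ C, B --+ C, C --+ A, D --+ C)
page_rank(g)$vector             # "Book Ranks", cf. Google's PageRank
neighbors(g, "C", mode = "in")  # the books pointing to C
distances(g, "D", "A")          # shortest-path length from D to A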
116. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Jeh, G., Widom, J. (2002, July). SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 538-543). ACM.
117. Recommending books: IR + graph mining
IR: Sequential Dependence Model (SDM) - Markov Random Field (Metzler & Croft, 2005) and/or Divergence From Randomness (InL2) model + query expansion with dependence analysis
Ratings: the more reviews a book has and the better its ratings, the more relevant it is.
Graph: expanding the retrieved books with similar books, then reranking with PageRank
Graph Modeling - Reranking Schemes
● We tested many reranking methods, combining the retrieval model scores with other scores based on social information.
● For each document we compute:
– PageRank: an algorithm that exploits the link structure to score the importance of nodes in the graph.
– Likeliness: computed from information generated by users (reviews and ratings). The more reviews and good ratings a book has, the more interesting it is. A sketch of this kind of combination follows below.
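As a hedged sketch of such a combination (field names and weights are illustrative, not those of the actual runs):

# rerank retrieved books by mixing IR score, PageRank and "likeliness"
rerank <- function(docs, alpha = 0.6, beta = 0.2) {
  norm01 <- function(x) (x - min(x)) / (max(x) - min(x))
  likeliness <- norm01(log1p(docs$n_reviews) * docs$avg_rating)
  score <- alpha * norm01(docs$ir_score) + beta * norm01(docs$pagerank) +
           (1 - alpha - beta) * likeliness
  docs[order(score, decreasing = TRUE), ]
}
docs <- data.frame(book = c("b1", "b2", "b3"),
                   ir_score = c(12.1, 10.4, 9.8),
                   pagerank = c(0.02, 0.07, 0.01),
                   n_reviews = c(15, 230, 4),
                   avg_rating = c(4.1, 4.6, 3.2))
rerank(docs)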
[Diagram: graph-modeling recommendation pipeline — (1-2) retrieve D_ti from the collection, (3) select starting nodes D_StartingNodes, (5-7) expand with Neighbors and shortest-path nodes (SPnodes) into D_graph, (8-10) delete duplications to obtain D_final, (11) rerank.]
Graph Modeling - Recommendation
Page Rank + Similar Products:
- Very good results in 2011 (judgements obtained by crowdsourcing) (IR and ratings): P@10 ≈ 0.58
- Good results in 2014 (IR, ratings, expansion): P@10 ≈ 0.23; MAP ≈ 0.44
- In 2015: rank 25/47 (IR + graph, but the graph improved IR): P@10 ≈ 0.2 (best 0.39, which included the price of books)
118. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
A perspective: multilayer graph mining
— PhD thesis of Mohamed Ettaleb (co-supervised by Pr. C. Latiri, B. Douhar, P. Bellot)
Books "similar to"
"bought together" layer
authors layer
tags layer
Question: which frequent subgraphs? How should they be interpreted?
119. And in real life? (for us: OpenEdition)
[Diagram: OpenEdition Lab components — BILBO; ÉCHO; automatic classification and metadata; recommendation; content graph; journals: questions de communication, vertigo, edc, echogeo, quaderni.]
BILBO - linking book reviews with the books
ÉCHO - sentiment analysis
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30.
Langouet, G., (1986), « Innovations pédagogiques et technologies éducatives », Revue française de pédagogie, n° 76, pp. 25-30. DOI : 10.3406/rfp.1986.1499
18 Voir Permanent Mandates Commission, Minutes of the Fifteenth Session (Geneva: League of Nations, 1929), pp. 100-1. Pour plus de détails, voir Paul Ghali, Les nationalités détachées de l'Empire ottoman à la suite de la guerre (Paris: Les Éditions Domat-Montchrestien, 1934), pp. 221-6.
ils ont déjà édité trois recueils et auxquelles ils ont consacré de nombreux travaux critiques. Leur nouvel ouvrage, intitulé Le Roman véritable. Stratégies préfacielles au XVIIIe siècle et rédigé à six mains par Jan Herman, Mladen Kozul et Nathalie Kremer – chaque auteur se chargeant de certains chapitres au sein…
BILBO - LEVEL 1 / LEVEL 2 / LEVEL 3
Annotated reference (TEI):
<bibl><author><surname>Langouet</surname>, <forename>G.</forename></author>, (<date>1986</date>), <title level="a">« Innovations pédagogiques et technologies éducatives »</title>, <title level="j">Revue française de pédagogie</title>, <abbr>n°</abbr> <biblScope type="issue">76</biblScope>, <abbr>pp.</abbr> <biblScope type="page">25-30</biblScope>. <idno type="DOI">DOI : 10.3406/rfp.1986.1499</idno></bibl>
[Diagram: OpenEdition Lab projects — social IR; information extraction by Inductive Logic Programming, temporal language models and meta-feature learning (OpenEdition; Univ. Recife, Brazil); finding reviews and linking them to the books; sentiment analysis and book recommendation (SVM, Z-score, CRF, graph scoring); ratings, polarity, graph, recommendation; citation analysis.]
120. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Identifying book reviews in blogs
• Supervised classification "by genre"
• Features: unigrams, location of named entities, dates
• Feature selection: Z-score threshold + random forest
6.1 Naive Bayes (NB)
In order to evaluate different classification models, we adopted the naive Bayes approach as a baseline (Zubaryeva and Savoy, 2010). The classification system has to choose, between the two possible hypotheses h0 = "It is a Review" and h1 = "It is not a Review", the class that has the maximum value according to Equation (5), where |w| indicates the number of words included in the current document and w_j is the j-th of those words:

arg max_{h_i} P(h_i) · Π_{j=1}^{|w|} P(w_j | h_i),   where P(w_j | h_i) = tf_{j,h_i} / n_{h_i}   (5)

We estimate the probabilities with Equation (5) as the relation between the lexical frequency of the word w_j in the whole collection T_{h_i} (denoted tf_{j,h_i}) and the size n_{h_i} of the corresponding corpus.
We used different strategies to represent each textual unit. First, the unigram model (Bag-of-Words), where all words are considered as features. We also used feature selection based on the normalized z-score, keeping the first 1,000 words according to this score (after removing all words that appear less than 5 times). As the third approach, we suggested that the common features of the Review collection can be located in the named entity distribution in the text.
Table 4: Performances of the classification models using different indexing schemes on the test set (in the original, the best values for the Review class are in bold and those for the non-Review class are underlined).

                    Review                    non-Review
# Model             R      P      F-M        R      P      F-M
1 NB                65.5%  81.5%  72.6%      81.6%  65.7%  72.8%
  SVM (Linear)      99.6%  98.3%  98.9%      97.9%  99.5%  98.7%
  SVM (RBF)         89.8%  97.2%  93.4%      96.8%  88.5%  92.5%
  (C = 5.0, γ = 0.00185)
2 NB                90.6%  64.2%  75.1%      37.4%  76.3%  50.2%
  SVM (Linear)      87.2%  81.3%  84.2%      75.3%  82.7%  78.8%
  SVM (RBF)         87.2%  86.5%  86.8%      83.1%  84.0%  83.6%
  (C = 32.0, γ = 0.00781)
3 NB                80.0%  68.4%  73.7%      54.2%  68.7%  60.6%
  SVM (Linear)      77.0%  81.9%  79.4%      78.9%  73.5%  76.1%
  SVM (RBF)         81.2%  48.6%  79.9%      72.6%  75.8%  74.1%
  (C = 8.0, γ = 0.03125)
Table: the 30 features with the highest normalized z-scores across the corpus.
 #  Feature         Z-score    #  Feature           Z-score
 1  abandonne       30.14     16  winter             9.23
 2  seront          30.00     17  cleo               8.88
 3  biographie      21.84     18  visible            8.75
 4  entranent       21.20     19  fondamentale       8.67
 5  prise           21.20     20  david              8.54
 6  sacre           21.20     21  pratiques          8.52
 7  toute           20.70     22  signification      8.47
 8  quitte          19.55     23  01                 8.38
 9  dimension       15.65     24  institutionnels    8.38
10  les             14.43     25  1930               8.16
11  commandement    11.01     26  attaques           8.14
12  lie             10.61     27  courrier           8.08
13  construisent    10.16     28  moyennes           7.99
14  lieux           10.14     29  petite             7.85
15  garde            9.75     30  adapted            7.84
In our training corpus, we have 106 911 words obtained from the bag-of-words approach. We selected all tokens (features) that appear more than 5 times in each class. The goal is therefore to design a method capable of selecting terms that clearly belong to one genre of documents. We obtained a vector space containing 5 957 words (features). After calculating the normalized z-score of all features, we selected the first 1 000 features according to this score.
Following (Poibeau, 2003), we explore the distribution of 3 named entities ("authors' names", "locations" and "dates") in the text after removing all XML-HTML tags. We then divided each text into 10 parts (the size of each part = total number of words / 10). The distribution ratio of each named entity in each part is used as a feature to build the new document representation, yielding a set of 30 features.
Figure 3: "Person" named entity distribution
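A minimal sketch of this 30-feature construction, assuming token-level named-entity labels are already available (the NE tagger itself is not specified in the excerpt):

```python
def ne_distribution_features(tokens, ne_labels,
                             types=("PERSON", "LOCATION", "DATE"),
                             n_parts=10):
    """Split the text into n_parts equal slices and, for each entity
    type, compute the share of its occurrences falling in each slice.
    ne_labels[i] is the entity type of tokens[i], or None."""
    part_size = max(1, len(tokens) // n_parts)
    features = []
    for etype in types:
        counts = [0] * n_parts
        total = 0
        for i, label in enumerate(ne_labels):
            if label == etype:
                part = min(i // part_size, n_parts - 1)
                counts[part] += 1
                total += 1
        # Distribution ratio of this entity type per part (0 if absent).
        features.extend(c / total if total else 0.0 for c in counts)
    return features  # 3 types x 10 parts = 30 features
```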
6 Experiments
In this section we describe results from experiments using a collection of documents from Revues.org and the Web. We use supervised learning methods to build our classifiers and evaluate the resulting models on new test cases. The focus of our work has been on comparing the effectiveness of different inductive learning algorithms (Naive Bayes, Support Vector Machines with RBF and linear kernels) in terms of classification accuracy. We also explored alternative document representations (bag-of-words, feature selection using z-score, named-entity repartition in the text).
Figure 4: "Location" named entity distribution
Figure 5: "Date" named entity distribution
6.2 Support Vector Machines (SVM)
SVM designates a learning approach introduced by Vapnik in 1995 for solving two-class pattern recognition problems (Vapnik, 1995). The SVM method is based on the Structural Risk Minimization principle (Vapnik, 1995) from computational learning theory. In their basic form, SVMs learn linear threshold functions. Nevertheless, by a simple plug-in of an appropriate kernel function, they can be used to learn linear classifiers, radial basis function (RBF) networks, and three-layer sigmoid neural nets (Joachims, 1998). The key in such classifiers is to determine the optimal boundaries between the different classes and to use them for the purposes of classification (Aggarwal and Zhai, 2012). Having built the vectors for the different representations presented below, we used the Weka toolkit to learn the models. With the linear kernel and the radial basis function (RBF) kernel, this approach can reach a good level of performance, at the cost of fast growth of the processing time during the learning stage (Kummer, 2012).
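The authors used Weka; as a rough, hedged equivalent, the same kernel comparison can be sketched in Python with scikit-learn. The toy corpus is a placeholder, and the C and γ values only echo the magnitudes reported under Table 4, not a tuned grid.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy stand-in corpus; the real experiment used Revues.org documents.
train_texts = ["this book review praises the biography",
               "conference schedule and call for papers",
               "a critical review of the new novel",
               "university job posting for a lecturer"]
train_labels = ["review", "not-review", "review", "not-review"]
test_texts = ["an enthusiastic review of this biography"]

# Bag-of-words vectors (the paper kept words occurring >= 5 times;
# min_df=1 here only because the toy corpus is tiny).
vectorizer = CountVectorizer(min_df=1)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Linear vs. RBF kernels, as in the paper's comparison.
for clf in (SVC(kernel="linear", C=1.0),
            SVC(kernel="rbf", C=5.0, gamma=0.00185)):
    clf.fit(X_train, train_labels)
    print(clf.kernel, clf.predict(X_test))
```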
6.3 Results
We used different strategies to represent each textual unit. First, the unigram model (bag-of-words), where all words are considered as features. Second, feature selection based on the normalized z-score, keeping the first 1 000 words according to this score (after removing all words that appear fewer than 5 times). As the third approach, we hypothesized that the features common to the Review collection can be located in the named-entity distribution in the text.
121. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Sentiment analysis on reviews
• Statistical metrics (PMI, Z-score, odds ratio…)
• Combined with linguistic resources
121
We compute the Z_score for each term ti in a class Cj, noted Z_score(tij), from its term relative frequency tfrij in the class Cj, its mean (meani), which is the term probability over the whole corpus multiplied by nj, the number of terms in the class Cj, and the standard deviation (sdi) of term ti according to the underlying corpus (see Eq. (1, 2)):

$$Z\_score(t_{ij}) = \frac{tfr_{ij} - mean_i}{sd_i} \qquad (1)$$

$$Z\_score(t_{ij}) = \frac{tfr_{ij} - n_j\,P(t_i)}{\sqrt{n_j\,P(t_i)\,(1 - P(t_i))}} \qquad (2)$$

A term whose frequency in a class is salient in comparison to the other classes will have a salient Z_score. Z_score was exploited for sentiment analysis by (Zubaryeva and Savoy 2010): they chose a threshold (2) for selecting the terms having a Z_score above the threshold, then used a logistic regression to combine these scores. We use Z_scores as added features for classification because tweets are too short: many tweets do not contain any word with a salient Z_score. Figures 1, 2 and 3 show the distribution of Z_score over each class; the majority of terms have a Z_score between -1.5 and 2.5 in each class, while the rest are either very frequent (> 2.5) or very rare (< -1.5). A negative value means that the term is not frequent in this class in comparison with its frequencies in the other classes. Table 1 shows the first ten terms having the highest Z_score in each class. We tested different values for the threshold; the best results were obtained with a threshold of 3.
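A short sketch of Eq. (2) in Python, assuming the class corpora are already tokenized; the data structure is an assumption, not the paper's implementation.

```python
import math
from collections import Counter

def z_scores(class_tokens):
    """Z_score(t, Cj) per Eq. (2): compare the observed frequency of a
    term in class Cj with its expected frequency under the corpus-wide
    term distribution (binomial model).
    class_tokens: {class_name: list of all tokens of that class}."""
    corpus = Counter()
    for toks in class_tokens.values():
        corpus.update(toks)
    total = sum(corpus.values())
    scores = {}
    for cj, toks in class_tokens.items():
        nj = len(toks)                  # number of terms in class Cj
        tf = Counter(toks)
        for t, freq in tf.items():
            p = corpus[t] / total       # P(t_i) over the whole corpus
            mean = nj * p               # mean_i = n_j * P(t_i)
            sd = math.sqrt(nj * p * (1 - p))
            scores[(t, cj)] = (freq - mean) / sd if sd > 0 else 0.0
    return scores
```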
- Sentiment Lexicons
We use Bing Liu's Opinion Lexicon, which was created by (Hu and Liu 2004) and augmented in many later works. We extract the number of positive, negative and neutral words in tweets according to these lexicons. Bing Liu's lexicon only contains negative and positive annotations, but Subjectivity contains negative, positive and neutral ones.
- Part Of Speech (POS)
We annotate each word in the tweet with its POS tag, and then compute the number of adjectives, verbs, nouns, adverbs and connectors in each tweet.
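A hedged sketch of these two feature groups; the word lists below are tiny placeholders, not Bing Liu's actual lexicon, and the POS tagger is left out (any tagger producing (token, tag) pairs would do).

```python
# Placeholder polarity word lists (stand-ins for the real lexicons).
POSITIVE = {"love", "good", "happy", "great"}
NEGATIVE = {"bad", "hate", "sad", "sorry"}

def polarity_features(tokens):
    """Counts of positive/negative lexicon words in one tweet."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return [pos, neg]

def pos_tag_features(tagged_tokens):
    """Counts of selected POS categories in one tweet;
    tagged_tokens is a list of (token, tag) pairs."""
    counts = {"ADJ": 0, "VERB": 0, "NOUN": 0, "ADV": 0, "CONJ": 0}
    for _, tag in tagged_tokens:
        if tag in counts:
            counts[tag] += 1
    return list(counts.values())
```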
positive    Z_score    negative    Z_score    neutral     Z_score
Love         14.31     Not          13.99     Httpbit       6.44
Good         14.01     Fuck         12.97     Httpfb        4.56
Happy        12.30     Don’t        10.97     Httpbnd       3.78
Great        11.10     Shit          8.99     Intern        3.58
Excite       10.35     Bad           8.40     Nov           3.45
Best          9.24     Hate          8.29     Httpdlvr      3.40
Thank         9.21     Sad           8.28     Open          3.30
Hope          8.24     Sorry         8.11     Live          3.28
Cant          8.10     Cancel        7.53     Cloud         3.28
Wait          8.05     stupid        6.83     begin         3.17
Table 1. The first ten terms having the highest Z_score in each class.
4 Evaluation
4.1 Data collection
We used the data set provided in SemEval 2013 and 2014 for subtask B of sentiment analysis in Twitter (Rosenthal, Ritter et al. 2014; Wilson, Kozareva et al. 2013). The participants were provided with training tweets annotated as positive, negative or neutral. We downloaded these tweets using a given script. Among 9 646 tweets, we could only download 8 498 of them because of protected profiles and deleted tweets. We then used the development set, containing 1 654 tweets, for evaluating our methods. We combined the development set with the training set and built a new model which predicted the labels of the 2013 and 2014 test sets.
4.2 Experiments
Official Results
The results of our system submitted for the SemEval evaluation were 46.38% and 52.02% for the 2013 and 2014 test sets respectively. It should be mentioned that these results are not correct because of a software bug discovered after the submission deadline; the corrected results are therefore reported as non-official results. In fact, the previous results are the output of our classifier trained with all the features of Section 3, but because of an index-shifting error the test set was represented by all the features except the terms.
Non-official Results
We ran various experiments using the features presented in Section 3 with a Multinomial Naive Bayes model. We first constructed the feature vector of tweet terms, which gave 49% and 46% for the 2013 and 2014 test sets; we then extended it with Z_score features, which improve the performance by 6.5% and 10.9%, and with pre-polarity features, which also improve the f-measure by 4% and 6%, but extending with POS tags decreases the f-measure. We also tested all combinations of these features; Table 2 shows the results of each combination. We note that POS tags are not useful in any of the experiments, and the best result is obtained by combining Z_score and pre-polarity features. We find that Z_score features significantly improve the f-measure and are better than pre-polarity features.
Figure 1: Z_score distribution in the positive class
Figure 2: Z_score distribution in the neutral class
Features              F-measure
                     2013     2014
Terms                49.42    46.31
Terms+Z              55.90    57.28
Terms+POS            43.45    41.14
Terms+POL            53.53    52.73
Terms+Z+POS          52.59    54.43
Terms+Z+POL          58.34    59.38
Terms+POS+POL        48.42    50.03
Terms+Z+POS+POL      55.35    58.58
Table 2. Average f-measures for positive and negative classes of the SemEval 2013 and 2014 test sets.
We repeated all previous experiments after using a twitter dictionary, in which each emoticon or abbreviation occurring in a tweet is expanded into the expression it stands for. The results in Table 3 demonstrate that using this dictionary improves the f-measure over all the experiments; the best results are again obtained by combining Z_score and pre-polarity features.
Features              F-measure
                     2013     2014
Terms                50.15    48.56
Terms+Z              57.17    58.37
Terms+POS            44.07    42.64
Terms+POL            54.72    54.53
Terms+Z+POS          53.20    56.47
Terms+Z+POL          59.66    61.07
Terms+POS+POL        48.97    51.90
Terms+Z+POS+POL      55.83    60.22
Table 3. Average f-measures for positive and negative classes of the SemEval 2013 and 2014 test sets after using a twitter dictionary.
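A minimal sketch of such a dictionary expansion step, with a placeholder mapping (the actual dictionary used in the experiments is not reproduced in the excerpt):

```python
# Placeholder emoticon/abbreviation dictionary; the real resource is
# much larger.
TWITTER_DICT = {
    ":)": "happy",
    ":(": "sad",
    "lol": "laughing out loud",
    "omg": "oh my god",
}

def expand_tweet(tweet):
    """Replace each emoticon or abbreviation by the expression it
    stands for, so downstream term features can match plain words."""
    return " ".join(TWITTER_DICT.get(tok, tok) for tok in tweet.split())

print(expand_tweet("omg I missed the bus :("))
# -> "oh my god I missed the bus sad"
```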
5 Conclusion
In this paper we tested the impact of using a Twitter dictionary, sentiment lexicons, Z_score features and POS tags on the sentiment classification of tweets. We extended the feature vector of tweets with all these features; we proposed Z_score as a new type of feature and demonstrated that it can improve the performance.
[Hamdan, Béchet, Bellot, SemEval 2014]
http://sentiwordnet.isti.cnr.it
125. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Linking Contents by Analyzing the References
In books: no common stylesheet (or many stylesheets, poorly respected…)
Our proposal:
1) Searching for references in the document / footnotes (Support Vector Machines)
2) Annotating the references (Conditional Random Fields)
BILBO: our (open-source) software for reference analysis
125
Google Digital Humanities Research Awards (2012)
Annotation
DOI search (Crossref)
OpenEdition Journals: more than 1.5 million references analyzed
Test: http://bilbo.openeditionlab.org
Sources: http://github.com/OpenEdition/bilbo
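As an illustration of step 2, here is a hedged sketch of token-level reference annotation with a CRF. BILBO itself is implemented differently (see the sources above); the features, the toy reference and the label set below are simplified assumptions.

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, i):
    """Simplified surface features for one token of a reference string."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),
        "is_title": tok.istitle(),
        "has_punct": any(not c.isalnum() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# One toy training reference, tokenized and labeled (placeholder labels
# in the spirit of BILBO's annotation, not its exact tag set).
tokens = ["Vapnik", ",", "The", "Nature", "of", "Statistical",
          "Learning", "Theory", ",", "1995", "."]
labels = ["author", "c", "title", "title", "title", "title",
          "title", "title", "c", "date", "c"]

X = [[token_features(tokens, i) for i in range(len(tokens))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X)[0])
```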
134. P. Bellot (AMU-CNRS, LSIS-OpenEdition)
Conclusion
— Many (hybrid) approaches
— Collaborative filtering and exploitation of user history
— Content analysis
— Exploitation of behavioral data and explicit information
— Exploitation of social networks
— Combine everything in a single learning model? Which objective
function to optimize?
— Strong links with other fields
— Statistical methods, data and graph mining, machine learning…
— Information retrieval (isn't that also recommendation?),
natural language processing, image/signal analysis, ergonomics
and interaction…
— One must choose the approaches, but also the data
— Usages and contexts
— Privacy preservation
134