Technical Report, March 2010




           Social Recommender Systems
                 on IEML Semantic Space

   Heung-Nam Kim, Andrew Roczniak, Pierre Lévy, Abdulmotaleb El-Saddik


                    Collective Intelligence Lab, University of Ottawa


                                          4 March 2010




H. N. Kim, A. Roczniak, P. Lévy, A. El-Saddik, “Social recommender systems on IEML semantic space,”
Collective Intelligence Lab, University of Ottawa, Technical Report, March 2010




                   Social Recommender Systems
                      on IEML Semantic Space
    Heung-Nam Kim1,2, Andrew Roczniak1, Pierre Lévy1, Abdulmotaleb El Saddik2
                        1
                            Collective Intelligence Lab, University of Ottawa
              2
                  Multimedia Communication Research Lab, University of Ottawa



Abstract

In this report, we present two social recommendation methods that incorporate the
semantics of tags: a user-based semantic collaborative filtering and an item-based
semantic collaborative filtering. Social tagging is employed as an approach to grasp
and filter users’ preferences for items. In addition, we analyze the potential
benefits of IEML models for social recommender systems in order to solve polysemy,
synonymy, and semantic interoperability problems, which are notable challenges in
information filtering. Experimental results show that our methods offer significant
advantages both in terms of improving the recommendation quality and in dealing with
polysemy, synonymy, and interoperability issues.




1 Introduction

The prevalence of social media sites brings considerable changes not only to people’s
life patterns but also to the generation and distribution of information. This social
phenomenon has transformed the masses, who were merely information consumers via
mass media, into producers of information. However, as rich information is shared
through social media sites, an amount of information that was not available before is
growing exponentially with daily additions. Besides finding the most attractive and
relevant content, users struggle with the challenge of information overload.
Recommender systems, which have emerged in response to these challenges, provide
users with recommendations of items that are likely to fit their needs [17].



     With the popularity of social tagging (also known as collaborative tagging or
folksonomy), a number of researchers have recently concentrated on recommender
systems with social tagging [10, 13, 20, 22, 25]. Because modern social media sites,
such as Flickr1, YouTube2, Twitter3, and Delicious4, allow users to freely annotate their
contents with any kind of descriptive words, also known as tags [6], users tend to
use descriptive tags to annotate the contents that they are interested in [13].
Recommender systems incorporating tags can alleviate limitations of traditional
recommender systems, such as the sparsity problem and the cold start problem [1], and
thus eventually offer promising possibilities for better personalized
recommendations. Although these studies show reasonable promise of improving
performance, they do not take into consideration the semantics of the tags
themselves. Consequently, owing to the lack of semantic information, they suffer from
fundamental problems: polysemy and synonymy of tags, as clearly discussed in [6].
Without the semantics of the tags used by users, the systems cannot differentiate the
various social interests of users who use the same tags. Furthermore, they cannot
provide semantic interoperability, which is a notable challenge in cyberspace [11].
     To address the discussed issues, we introduce a new concept to capture semantics of
user-generated tags. We then propose two social recommendation methods that are
incorporated with the semantics of the tags: a user-based semantic collaborative filtering
and an item-based semantic collaborative filtering. First, in the user-based method, we
determine similarities between users by utilizing users’ semantic-oriented tags,
collectively called Uniform Semantic Locator (USL) and subsequently identify
semantically like-minded users for each user. Finally, we recommend social items (e.g.,
text, picture, video) based on the social ranking of the items that are semantically
associated with tags that like-minded users annotate. Second, in the item-based method,
we determine similarities between items by utilizing USLs and identify semantically
similar items for each item. Finally, we recommend social items based on the
semantically similar items.


1
    http://www.flickr.com
2
    http://www.youtube.com
3
    http://twitter.com
4
    http://delicious.com



   The main contributions of this study toward social recommender systems can be
summarized as follows: 1) We present and formalize models for semantic-oriented
social tagging in dealing with the issues of polysemy, synonymy, and semantic
interoperability. We also illustrate how the models can be adapted and applied to
existing social tagging systems. 2) We propose the methods of social recommendations
in semantic space that aim to find semantically similar users/items and discover social
items semantically relevant to users’ needs.
   The rest of this report is organized as follows: in the next section, we review concepts
related to collaborative filtering and survey recent studies applying social tagging to
recommender systems. In Section 3, we provide some models used in our study. We
then describe our semantic models for social recommender systems and provide a
detailed description of how the models are applied to item recommendations in Section
4. In Section 5, we present the effectiveness of our methods through experimental
evaluations. Finally, we summarize our work.



2 Related Work

In this section, we summarize previous studies and position our study with respect to
other related works in the area.


2.1 Collaborative Filtering

Following the proposal of GroupLens [16], automated recommendations based on
Collaborative Filtering (CF) have seen the widest use. CF is based on the fact that
“word of mouth” opinions of other people have considerable influence on buyers’
decision making [10]. If advisors have similar preferences to the buyer, he/she is much
more likely to be affected by their opinions. In CF-based recommendation schemes,
two approaches have mainly been developed: user-based approaches [4, 16, 18] and
item-based approaches [5, 17]. User-based and item-based CF systems usually involve
two steps. First, the neighborhood, either the group of users who have preferences
similar to a target user, called the k nearest neighbors (for user-based CF), or the set of
items similar to a target item, called the k most similar items (for item-based CF), is



determined by using a variety of similarity measures, such as Pearson
correlation-based similarity, cosine-based similarity, and so on. This step is an
important task in CF-based recommendation because different neighbor users or items
lead to different recommendations [18]. Once the neighborhood is generated, in the
second step, the prediction values of particular items, estimating how much the target
user is likely to prefer those items, are computed based on the group of neighbors. The
more similar a neighbor is to the target user or the target item, the more influence
he/she or it has on the prediction value. After predicting how much the target user will
like particular items not previously rated by him/her, the top-N item set, the set of
ordered items with the highest predicted values, is identified and recommended. The
target user can give feedback on whether he/she actually likes the recommended top-N
items or how much he/she prefers those items as scaled ratings.
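The two-step process can be sketched as follows; this is a minimal user-based CF illustration with invented users, items, and ratings, using cosine similarity for the neighborhood step and a similarity-weighted average for the prediction step:

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def predict(ratings, target_user, item, k=2):
    """Step 1: find the k nearest neighbors of target_user.
    Step 2: predict a rating for `item` as a similarity-weighted average
    over those neighbors that rated it."""
    sims = sorted(
        ((cosine_sim(ratings[target_user], r), u)
         for u, r in ratings.items() if u != target_user),
        reverse=True)[:k]
    num = sum(s * ratings[u][item] for s, u in sims if item in ratings[u])
    den = sum(s for s, u in sims if item in ratings[u])
    return num / den if den else 0.0

# Invented toy data for illustration only.
ratings = {
    "alice": {"i1": 5, "i2": 3},
    "bob":   {"i1": 4, "i2": 3, "i3": 4},
    "carol": {"i1": 1, "i3": 2},
}
print(predict(ratings, "alice", "i3", k=2))
```

Swapping cosine similarity for Pearson correlation, or users for items, yields the item-based variant of the same two-step scheme.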


2.2 Social Tagging in Recommender Systems

Social tagging is the practice of allowing any user to freely annotate the content with
any kind of arbitrary keywords (i.e., tags) [6]. Social media sites with social tagging
have become tremendously popular in recent years. Therefore, the area of recommender
systems with social tagging (folksonomy) has become an active and growing topic of
study. These studies can be broadly divided into three topics: tag suggestions, social
searches, and social recommendations.
   With the popularity of the usage of tags, many researchers have proposed new
applications for recommender systems supporting the suggestion of suitable tags during
folksonomy development. In [21], a tag recommender system with Flickr’s dataset is
presented based on an analysis of how users annotate photos and what information is
contained in the tagging. In [9], three classes of algorithms for tag recommendations,
such as an adaptation of user-based CF, a graph-based FolkRank [8] algorithm, and
simple methods based on tag counts, are presented and evaluated. Xu et al. [23] propose
an algorithm for collaborative tag suggestions that employs a reputation score for each
user based on the quality of the tags contributed by the user. In [14], a new semantic
tagging system, SemKey, is proposed in order to combine semantic technologies with
the collaborative tagging paradigm in a way that can be highly beneficial to both areas.



Unlike our aims, the purpose of these studies using social tagging is basically to
recommend appropriate tags that assist the user in annotation-related tasks. Our
approach takes a different stance: rather than offering tag recommendations, we aim
to find like-minded users through the semantics of tags and to identify personal
resources semantically relevant to user needs.
   Research has also been very active in information retrieval using social
tagging. In [8], the authors presented a formal model and a new search algorithm for
folksonomies called FolkRank. FolkRank is applied not only to find communities
within the folksonomy but also to recommend tags and resources. In [24], a social
ranking mechanism is proposed to answer a user’s query, aiming to transparently
improve content searches based on emergent tag semantics. It exploits users’ similarity
and tags’ similarity based on their past tagging activity. In [19], the ContextMerge
algorithm is introduced to efficiently support user-centric searches in social networks,
dynamically including related users in the execution. The algorithm adopts two-
dimensional expansions: social expansion considers the strength of relations among
users, and semantic expansion considers the relatedness of different tags. In [2], two
algorithms are proposed, SocialSimRank and SocialPageRank. The former
calculates the similarity between tags and user queries, whereas the latter captures
page popularity based on its annotations. All these works attempt to improve users’
searches by incorporating social annotations into query expansion. Differing from these
works, our goal is to automatically identify resources that are likely to fit users’
needs, without requiring user queries.
   Other researchers have studied the same area as our study. In [10], the authors
proposed a collaborative filtering method via collaborative tagging. They first
determine similarities between users with social tags and subsequently identify the
latent tags for each user to recommend items via a naïve Bayes approach. Tso-Sutter et
al. [22] proposed a generic method that allows tags to be incorporated into CF
algorithms by reducing the three-dimensional correlations to three two-dimensional
correlations and then applying a fusion method to re-associate them. A similar
approach is presented in [25] in order to provide improved recommendations to users.
Although these studies give reasonable promise of improving the performance, they do
not take the semantics of tags into consideration. Consequently, the lack of semantic
information has


limitations, such as polysemy and synonymy of tags, for identifying similar users with
user-generated tags. We believe the semantic information of tags can be helpful not
only to better grasp users’ interests but also to enhance the quality of
recommendations. Recent literature has also focused on semantic recommender
systems with goals similar to ours. Unlike our approach, however, existing work on
semantic recommender systems relies on a prefixed ontology and uses technologies
from the Semantic Web. The state of the art for semantic recommender systems has
been well analyzed in [15].



3 IEML Models

For understanding our semantic approach, this section briefly explains preliminary
concepts of Information Economy MetaLanguage (IEML5) that will be exploited in the
next sections of this report. The IEML research program promotes a radical innovation
in the notation and processing of semantics. IEML is a regular language and a symbolic
system for the notation of meaning. It is “semantic content oriented” rather than
“instruction oriented” like programming languages or “format oriented” like data
standards. IEML provides new methods for semantic interoperability, semantic
navigation, collective categorization and self-referential collective intelligence [11].
The IEML research program is compatible with the major standards of the Web of data
and is in tune with current trends in social computing.


3.1 IEML Overview

IEML expressions are built from a syntactically regular combination of six symbols,
called primitives. In IEML, a sequence is a succession of 3^l single primitives, where
l ∈ {0, 1, 2, 3, 4, 5, 6}. l is called the layer of a sequence. For each layer, the sequences have
respectively a length of 1, 3, 9, 27, 81, 243, and 729 primitives [12]. From a syntactic
point of view, any IEML expression is nothing else than a set of sequences. As there is a
distinct semantic for each distinct sequence, there is also a distinct semantic for each
distinct set of sequences. In general, the meaning of a set of sequences corresponds to

5
    http://ieml.org



the union of the meaning of the sequences of this set. The main result is that any
algebraic operation that can be made on sets in general can also be made on semantics
(significations) once they are expressed in IEML. An IEML dictionary provides the
correspondence between IEML sequences and a natural language descriptor of an IEML
expression. The terms of the dictionary belong to layers 0-3. There are rules to create
inflected words from these terms, to create sentences from inflected words and to create
relations between sentences by using some terms as conjunctions. Given these rules, it
is possible to express any network of relations between sentences by using sequences up
to layer 6. Various notation, syntax, semantics, and examples of IEML have been
presented in [12]. Due to a lack of space, we refer the reader to [12] for more details.


3.2 IEML Language Model

We present the model of the IEML language, along with the model of semantic
variables.
   Let Σ be a nonempty and finite set of symbols, Σ = {S, B, T, U, A, E}. Let string s be
a finite sequence of symbols chosen from Σ. The length of this string is denoted by |s|.
An empty string ε is a string with zero occurrences of symbols, and its length is |ε| = 0.
The set of all strings of length k composed of symbols from Σ is defined as Σ^k = {s
where |s| = k}. Note that Σ^0 = {ε} and Σ^1 = {S, B, T, U, A, E}. Although Σ and Σ^1 are
sets containing exactly the same members, the former contains symbols, and the latter
strings. The set of all strings over Σ is defined as Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ …
   A useful operation on strings is concatenation, defined as follows. For all s_i =
a_1a_2a_3a_4…a_i ∈ Σ* and s_j = b_1b_2b_3b_4…b_j ∈ Σ*, s_i s_j denotes string concatenation such
that s_i s_j = a_1a_2a_3a_4…a_i b_1b_2b_3b_4…b_j and |s_i s_j| = i + j. The IEML language over Σ is a
subset of Σ*, L_IEML ⊆ Σ*:

                       L_IEML = {s ∈ Σ* : |s| = 3^l, 0 ≤ l ≤ 6}              (1)
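Equation 1 can be checked mechanically. The following is a minimal sketch (the helper name is our own, not part of any IEML tooling) that tests whether a string belongs to L_IEML and, if so, returns its layer:

```python
import math

PRIMITIVES = set("SBTUAE")                   # the six symbols of Σ
VALID_LENGTHS = {3 ** l for l in range(7)}   # |s| = 3^l for l = 0..6

def layer(s):
    """Return the layer l of a semantic sequence s, or None if s ∉ L_IEML."""
    if s and set(s) <= PRIMITIVES and len(s) in VALID_LENGTHS:
        return round(math.log(len(s), 3))
    return None
```

For example, layer("S") is 0 and layer("SBT") is 1, while layer("SB") is None because 2 is not a power of three.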




3.3 Model of Semantic Sequences

Definition 1 (Semantic sequence) A string s is called a semantic sequence if and only if
s ∈ L_IEML.

To denote the n-th primitive of a sequence s, we use a superscript n, where 1 ≤ n ≤ 3^l, and
write s^n. Note that for any sequence s of layer l, s^n is undefined for any n > 3^l. Two
semantic sequences are distinct if and only if at least one of the following holds: i) their
layers are different, ii) they are composed from different primitives, iii) their primitives
do not follow the same order. For any s_a and s_b,

                             s_a ≠ s_b ⟺ ∃n: s_a^n ≠ s_b^n ∨ |s_a| ≠ |s_b|                    (2)

     Let’s now consider binary relations between semantic sequences in general. These
are obtained by performing a Cartesian product of two sets6. For any sets of semantic
sequences X, Y, where s_a ∈ X, s_b ∈ Y, and using Equation 2, we define four binary relations
whole ⊆ X × Y, substance ⊆ X × Y, attribute ⊆ X × Y, and mode ⊆ X × Y as follows:

                                   whole = {(s_a, s_b) | s_a = s_b}

               substance = {(s_a, s_b) | s_a^n = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}

               attribute = {(s_a, s_b) | s_a^(n+|s_b|) = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}          (3)

               mode = {(s_a, s_b) | s_a^(n+2|s_b|) = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}



     Any two semantic sequences that are equal are in a whole relationship. In addition,
any two semantic sequences that share specific subsequences may be in a substance,
attribute, or mode relationship. For any two semantic sequences s_a and s_b, if they are in
one of the above relations, then we say that s_b plays a role w.r.t. s_a and we call s_b a seme
of the sequence.

Definition 2 (Seme of a sequence) For any semantic sequences s_a and s_b, if (s_a, s_b) ∈
whole ∪ substance ∪ attribute ∪ mode, then s_b plays a role w.r.t. s_a and s_b is called a seme.
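Per Equation 3, the substance, attribute, and mode relations compare s_b against the first, second, and third thirds of a sequence s_a one layer above it. A small sketch (the function name is ours) that makes the roles of Definition 2 computable:

```python
def seme_roles(sa, sb):
    """Return the roles sb plays w.r.t. sa under Equation 3.
    whole: sa equals sb; substance/attribute/mode: sb equals the first,
    second, or third third of sa, which must be one layer above sb."""
    roles = []
    if sa == sb:
        roles.append("whole")
    if len(sa) == 3 * len(sb):
        k = len(sb)
        if sa[:k] == sb:
            roles.append("substance")
        if sa[k:2 * k] == sb:
            roles.append("attribute")
        if sa[2 * k:] == sb:
            roles.append("mode")
    return roles
```

For instance, for sa = "SBT", the sequence "S" is a substance, "B" an attribute, and "T" a mode of sa.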

     We can now group distinct semantic sequences together into sets. A useful grouping
is based on the layer of those semantic sequences.

6
    A Cartesian product of two sets X and Y is written as follows: X × Y = {(x, y) | xX, yY}.



3.4 Model of Semantic Categories and Catsets

A category of L_IEML is a subset such that all strings of that subset have the same length:

                           c = {s_i, s_j ∈ L_IEML : |s_i| = |s_j|}                       (4)

Definition 3 (Semantic category) A semantic category c is a set containing semantic
sequences of the same layer.

     The layer of any category c is exactly the same as the layer of the semantic
sequences included in that category. The set of all categories of layer l is given as the
powerset7 of the set of all strings of layer l of L_IEML:

                          C_l = Powerset({s ∈ L_IEML where |s| = 3^l})                        (5)

     Two categories are distinct if and only if they differ by at least one element. For any
c_a and c_b:

                                 c_a ≠ c_b ⟺ c_a ⊈ c_b ∨ c_b ⊈ c_a                            (6)

     A weaker condition can be applied to categories of distinct layers (since two
categories are different if their layers are different) and is written as:

                                    l(c_a) ≠ l(c_b) ⟹ c_a ≠ c_b                           (7)

where l(c_a) and l(c_b) denote the layers of categories c_a and c_b, respectively. Analogously
to sequences, we consider binary relations between any categories c_i and c_j where l(c_i),
l(c_j) ≥ 1. For any sets of categories X, Y, where c_a ∈ X, c_b ∈ Y, we define four binary
relations whole_C ⊆ X × Y, substance_C ⊆ X × Y, attribute_C ⊆ X × Y, and mode_C ⊆ X × Y as
follows:

                                   whole_C = {(c_a, c_b) | c_a = c_b}

               substance_C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b, (s_a, s_b) ∈ substance}

               attribute_C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b, (s_a, s_b) ∈ attribute}        (8)

               mode_C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b, (s_a, s_b) ∈ mode}


7
    A powerset of S is the set of all subsets of S, including the empty set .



   For any two categories c_a and c_b, if they are in one of the above relations, (c_a, c_b) ∈
whole_C ∪ substance_C ∪ attribute_C ∪ mode_C, then we say that c_b plays a role with
respect to c_a and c_b is called a seme of the category.
   A catset is a set of distinct categories of the same layer, as defined in Definition 4.

Definition 4 (Catset) A catset Γ is a set containing categories such that Γ = {c_n | ∀i, j: c_i
≠ c_j, l(c_i) = l(c_j)}.

   The layer of a catset is given by the layer of any of its members: if some c ∈ Γ, then
l(Γ) = l(c). Note that a category c can be written as c ∈ C_l, while a catset Γ can be
written as Γ ⊆ C_l. All standard set operations, such as union and intersection (i.e., ∪
and ∩), can be performed on catsets of the same layer.


3.5 Model of Uniform Semantic Locator

A USL is composed of up to seven catsets of different layers, as follows:

Definition 5 (Uniform Semantic Locator, USL) A USL Ψ is a set containing catsets of
different layers such that Ψ = {Γ_n | ∀i, j: l(Γ_i) ≠ l(Γ_j)}.

   Note that since there are seven distinct layers, a USL can have at most seven
members. All standard set operations, such as union and intersection (i.e., ∪ and ∩), on
USLs are always performed on sets of categories (and therefore on sets of sequences),
layer by layer. Since at each layer l there are |C_l| distinct catsets, the whole semantic
space is defined by the tuple Ł = C_0 × C_1 × C_2 × C_3 × C_4 × C_5 × C_6.
   In the IEML notation of USLs, the categories are separated by “/”. Table 1 shows an
example of USLs used as *tags for “Wikipedia” and “XML” and a layer-by-layer
English translation of those USLs. A *tag holds the place of an IEML expression by
suggesting its meaning rather than uttering the IEML expression. The meaning of a *tag
has to be understood from the singular place that its corresponding IEML expression
occupies in the network of IEML semantic relations [12].




Table 1 An example of USLs in semantic space [12].

Tag: *Wikipedia
USL:
  L0: (U: + S:)
  L1: (d.) / (t.)
  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)
  L3: (a.u.-we.h.-’) / (n.o.-y.y.-s.y.-’)
  L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,) / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’ ,)
Semantics in English:
  L0: knowledge networks
  L1: truth / memory
  L2: get one’s bearings in knowledge / act for the sake of society / synthesize /
      organized knowledge / collective creation
  L3: opening public space / encyclopedia
  L4: collective intelligence encyclopedia in cyberspace / volunteers producing
      didactic material on any subject

Tag: *XML
USL:
  L0: (A: + S:)
  L1: (b.)
  L2: (we.g.-) / (we.b.-)
  L3: (e.o.-we.h.-’) / (b.i.-b.i.-’) / (t.e.-d.u.-’)
  L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)
Semantics in English:
  L0: document networks
  L1: language
  L2: unify the documentation / cultivate information system
  L3: establishing norms and standards / cyberspace / meeting information needs
  L5: guarantee compatibility of data through same formal structure



4 Social Recommender Systems on IEML Semantic Space




  Fig. 1 System overview of social recommender systems based on IEML semantic tagging: a
  user-based semantic collaborative filtering and an item-based semantic collaborative filtering

In this section, we present two social recommendation methods that are incorporated
with USLs. We explore two well-known approaches of collaborative filtering to our
study: a user-based approach and an item-based approach. Fig. 1 illustrates the overall
process of each method with two phases: a neighborhood formation phase and an item


recommendation phase. In the neighborhood formation phase, we first compute
similarities between users (for a user-based method) and between items (for an item-
based method) by utilizing USLs. Thereafter, we generate a set of semantically like-
minded users for each user and a set of semantically similar items for each item. Based
on the neighborhood, in the recommendation phase, we predict a social ranking of items
to decide which items to recommend.


4.1 Semantic Models in Bipartite Space




             Fig. 2 Bipartite space of social tagging space and USL semantic space

The social tagging in our study is free-for-all, allowing any user to annotate any item with
any tag [6]. Therefore, if there is a list of r users Ũ = {u1, u2, …, ur}, a list of m tags Ť =
{t1, t2, …, tm}, and a list of n items Ĩ = {i1, i2, …, in}, the social tagging, or folksonomy, F
is a tuple F = ⟨Ũ, Ť, Ĩ, Y⟩ where Y ⊆ Ũ × Ť × Ĩ is a ternary relationship called tag
assignments [9]. More conceptually, the triplets of the tagging space can be represented
as a three-dimensional data cube. Beyond the tagging space, in our study there is another
space where the tags are connected to USLs according to their semantics. We call this
space the IEML semantic space, as illustrated in Fig. 2. Note that a USL Ψ is composed
of catsets (Definition 5), where each catset Γ consists of semantic categories c
(Definition 4). Therefore, an extended formal definition of the folksonomy, called a
semantic folksonomy, is defined as follows:



Definition 6 (Semantic Folksonomy) Let Ł = C_0 × C_1 × C_2 × C_3 × C_4 × C_5 × C_6 be the
whole semantic space. A semantic folksonomy is a tuple SF = ⟨Ũ, Ť, Ĩ, Y, Ň⟩ where Ň is
a ternary relationship such that Ň ⊆ Ũ × Ť × Ł.

   From semantic folksonomies, we present a formal description of two models, a
semantic user model and a semantic item model, which are used in our social
recommender system.


4.1.1 Semantic Model for the User

From semantic folksonomies, a semantic user model is defined as follows:

Definition 7 (Semantic User Model) Given a user u ∈ Ũ, a formal description of a user
model for user u, M_u, follows: M_u = ⟨Ť_u, Ň_u⟩, where Ť_u = {(t, i) ∈ Ť × Ĩ | (u, t, i) ∈ Y} and
Ň_u = {(t, Ψ) ∈ Ť × Ł | (u, t, Ψ) ∈ Ň}.

   For clarity, some definitions of the sets used in this work are introduced. We define,
for a given user u, the set of all tags that user u has used, T_u^* := {t ∈ Ť | ∃i ∈ Ĩ: (t, i) ∈
Ť_u}. Therefore, the set of all USLs of user u can be defined as N_u^* := {Ψ ∈ Ł | ∃t ∈
T_u^*: (t, Ψ) ∈ Ň_u}. For a certain item h, we define the set of tags with which user u
annotated item h, T_u^h := {t ∈ Ť | (t, h) ∈ Ť_u}. Accordingly, the set of all USLs of user
u for item h can be defined as N_u^h := {Ψ ∈ Ł | ∃t ∈ T_u^h: (t, Ψ) ∈ Ň_u}. As stated
previously, all standard set operations, such as union and intersection on USLs, can
always be performed on sets of categories, layer by layer. Therefore, for a given user
model M_u for user u, with respect to the semantic space, the tags that user u has used to
annotate a certain item h can be represented as the union of the USLs Ψ ∈ N_u^h:

      USL_u^h = ⋃_{l=0}^{6} USL_u^h(l),  where  USL_u^h(l) = ⋃_{l(c_n)=l, c_n∈Γ, Γ∈Ψ, Ψ∈N_u^h} c_n            (9)

   Likewise, all tags that user u has used are represented as the union of the USLs Ψ ∈ N_u^*:

      USL_u^* = ⋃_{i=1}^{n} USL_u^i = ⋃_{l=0}^{6} USL_u^*(l),  where  USL_u^*(l) = ⋃_{l(c_n)=l, c_n∈Γ, Γ∈Ψ, Ψ∈N_u^*} c_n            (10)




               Fig. 3 Conceptual user models with respect to IEML semantic space


4.1.2 Semantic Model for the Item

From semantic folksonomies, a semantic item model is defined as follows:

Definition 8 (Semantic Item Model) Given an item i ∈ Ĩ, a formal description of a
semantic item model for item i, M(i), follows: M(i) = ⟨Ť(i), Ň(i)⟩, where Ť(i) = {(u, t) ∈
Ũ × Ť | (u, t, i) ∈ Y} and Ň(i) = {(t, ℓ) ∈ Ť × Ł | ∃u ∈ Ũ : (u, t, ℓ) ∈ Ň}.

   For clarity, we introduce some definitions of the sets from an item perspective. For
a given item i, we define the set of all tags with which any user has annotated item i,
T_*^i := {t ∈ Ť | ∃u ∈ Ũ : (u, t) ∈ Ť(i)}. Accordingly, the set of all USLs of item i can be
defined as N_*^i := {ℓ ∈ Ł | ∃t ∈ T_*^i : (t, ℓ) ∈ Ň(i)}. For a certain user v, we define the
set of tags with which user v annotated item i, T_v^i := {t ∈ Ť | (v, t) ∈ Ť(i)}, and thus the
set of all USLs of user v for item i can be defined as N_v^i := {ℓ ∈ Ł | ∃t ∈ T_v^i : (t, ℓ) ∈ Ň(i)}.
   As stated previously, all standard set operations, such as union and intersection on
USLs, can always be performed on sets of categories, layer by layer. Therefore, for a
given item model M(i) of item i, the tags that a certain user v has used to annotate item i
can be represented, with respect to the semantic space, as the union of USLs, ℓ ∈ N_v^i:


                USL_v^i = ⋃_{l=0}^{6} USL_v^i(l),  where USL_v^i(l) = { c_n | l(c_n) = l, c_n ∈ N_v^i }        (11)








   Likewise, all tags that have been assigned to item i are represented as the union of
USLs, ℓ ∈ N_*^i:

                USL_*^i = ⋃_{u=1}^{r} USL_u^i = ⋃_{l=0}^{6} USL_*^i(l),  where USL_*^i(l) = { c_n | l(c_n) = l, c_n ∈ N_*^i }        (12)




   [Figure omitted: the item-side counterpart of Fig. 3; the union USL_*^i over all users and the per-user union USL_v^i are each decomposed layer by layer into USL(0), …, USL(6).]

             Fig. 4 Conceptual item models with respect to IEML semantic space


4.2 User-based Semantic Collaborative Filtering

In this section, we describe a social recommendation method based on the semantic user
model (Definition 7). The basic idea of user-based semantic collaborative filtering
starts from the assumption that a certain user is likely to prefer items that like-minded
users have annotated with tags similar to the tags he/she has used. Therefore, we first
look into the set of like-minded users who have tagged a target item and then compute
how semantically similar they are to the target user, called a user-user semantic
similarity. Based on the semantically similar users, the semantic social ranking of the
item is computed to decide whether or not to recommend it.


4.2.1 Generating Semantically Nearest Neighbors

One of the most important tasks in CF recommender systems is the neighborhood
formation to identify a set of users who have similar taste, often called k nearest
neighbors. Those users can be directly defined as a group of connected friends in social
networks such as followings in Twitter, connections in Twine, people in Delicious,
friends in Facebook, and so on. However, most users have insufficient connections with







their friends. In addition, finding like-minded users in current social network services
still relies on manually browsing networks of friends or on keyword searching. Thus, this
form of establishing neighbors becomes a time-consuming and ineffective process if we
take into consideration the huge number of people available in the network [3].
   As mentioned in Section 2.1, typical collaborative filtering methods adopt a variety
of similarity measurements to determine similar users automatically. However, they also
encounter a serious limitation, namely the sparsity problem [1, 10]. It is often the case that
there is no intersection between two users, and hence the similarity is not computable at
all. Even when the computation of similarity is possible, it may not be very reliable
because insufficient information is processed. To deal with this problem, recent studies
have determined similarities between users by using user-generated tags in terms of
users' characteristics [10, 13, 20, 22, 25]. However, there still remain limitations that
should be treated, such as noisy, polysemous, and synonymous tags. To this end, our study
identifies the nearest neighbors by using the Uniform Semantic Locators (USLs) of each
user for more valuable and personalized analyses.
   We define semantically similar users as a group of users whose IEML interest
categories are close to those of the target user. The semantic similarity between two
users, u and v, can be computed as the weighted sum of layer similarities from layer 0 to
layer 6. Formally, the semantic user similarity measure is defined as:

                        semUSim(u, v) = λ ∑_{l=0}^{6} simULayer_l(u, v)        (13)


where λ is a normalizing factor such that the layer weights sum to unity. simULayer_l(u,
v) denotes the layer similarity of two users at layer l. The layer similarity between
two USL sets is defined as the size of the intersection divided by the size of the
union of the USL sets. In other words, it is determined by computing the weighted
Jaccard coefficient of the two USL sets. Formally, the layer similarity is given by:

                        simULayer_l(u, v) = ((l + 1) / 7) · ( |USL_u^*(l) ∩ USL_v^*(l)| / |USL_u^*(l) ∪ USL_v^*(l)| )        (14)




where USL_u^*(l) and USL_v^*(l) refer to the unions of USLs for users u and v at layer l,
0 ≤ l ≤ 6, respectively. Here we give more weight to higher layers when computing the







semantic user similarity. That is, the intersections of higher layers contribute more
than the intersections of lower layers. When λ is set to 0.25 for normalization, the
similarity value between two users is in the range of 0 to 1. The higher the score a user
has, the more similar he/she is to the target user. Finally, for a given user u ∈ Ũ, the
k users with the highest semantic similarity are identified as the semantically k nearest
neighbors such that:

                        SSN_k(u) = argmax^(k)_{v ∈ Ũ\{u}} semUSim(u, v)        (15)



    To illustrate the computation of the semantic user similarity with a simple example,
consider the following three users: Alice, Bob, and Nami. Alice annotated Media1 with the
tags "Community" and "XML", Bob annotated Media2 with the tags "Wikipedia" and "OWL",
and Nami annotated Media3 with the tags "Web of data" and "Folksonomy". In addition,
consider the USLs of each tag as shown in Fig. 5.

 *Community                                                  *XML
 USL                                                         USL
 L0: (A: + B:)                                               L0: (A: + S:)
 L1: (k.) / (m.)                                             L1: (b.)
 L2: (k.o.-) / (p.a.-)                                       L2: (we.g.-) / (we.b.-)
 L3: (s.o.-a.a.-’)                                           L3: (e.o.-we.h.-‘) / (b.i.-b.i.-‘) / (t.e.-d.u.-‘)
                                                             L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)

 *Wikipedia                                                  *OWL
 USL                                                         USL
 L0: (U: + S:)                                               L0: (A: + S:)
 L1: (d.) / (t.)                                             L1: (b.)
 L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)       L2: (we.g.-) / (we.h.-)
 L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-')                     L3: (e.o.-we.h.-‘) / (b.i.-b.i.-‘) / (l.o.-u.u.-‘)
 L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,)   L4: (t.e.-t.u.-wa.e.-’ k.i.-b.i.-t.u.-’ b.e.-E:.-A:.x.-’,)
     / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’ ,)       L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)

 *Web of data                                                *Folksonomy
 USL                                                         USL
 L0: (U: + S:)                                               L0: (A: + B:)
 L1: (k.)                                                    L1: (k.)
 L2: (s.x.-) / (x.j.-)                                       L2: (s.y.-) / (k.h.-) / (we.b.-)
 L3: (e.o.-we.h.-’) / (b.i.-b.i.-’)                          L3: (s.o.-a.a.-’) / (b.o.-o.o.-’) / (a.u.-we.h.-’)
 L4: (t.e.-t.u.-wa.e.-’ k.i.-b.i.-t.u.-’ b.e.-E:.-A:.x.-’,)      / (k.o.-a.a.-')
     / (s.a.-t.a.-’ b.i.-l.i.-t.u.-’,)



   Fig. 5 An example of tag assignments and USLs for computing the semantic user similarity

    Now, compute the semantic similarity between Alice and Bob. Table 2 shows the
results of the union of the two USLs for “Community” and “XML” tags and for
“Wikipedia” and “OWL” tags, respectively.








Table 2 The union of USLs for Alice and Bob.

USL_Alice^* : the union of Alice's USLs                      USL_Bob^* : the union of Bob's USLs

L0: (A:+B:) / (A:+S:)                                        L0: (U:+S:) / (A:+S:)
L1: (k.) / (m.) / (b.)                                       L1: (d.) / (t.) / (b.)
L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-)                  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)
L3: (s.o.-a.a.-’) / (e.o.-we.h.-’) / (b.i.-b.i.-’)               / (we.g.-) / (we.h.-)
    / (t.e.-d.u.-’)                                          L3: (a.u.-we.h.-’) / (n.o.-y.y.-s.y.-’) / (e.o.-we.h.-’)
L4:                                                              / (b.i.-b.i.-’) / (l.o.-u.u.-’)
L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-       L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,)
    A:.g.-’,_)                                                   / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’,)
                                                                 / (t.e.-t.u.-wa.e.-’ k.i.-b.i.-t.u.-’ b.e.-E:.-A:.x.-’,)
                                                             L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-
                                                                 A:.g.-’,_)


    From these two USL sets, we can compute the USL layer similarity, layer by layer.
For example, in the case of layer 0, the size of the intersection of the two USL sets is 1
(i.e., (A:+S:), which means documentary networks), and the size of the union of the USL
sets is 3 (i.e., (A:+B:), (A:+S:), and (U:+S:)). The layer weight of layer 0 is
approximately 0.143 (i.e., 1/7). Consequently, the layer similarity at layer 0 is
calculated as follows: simULayer_0(Alice, Bob) = 1/7 × 1/3 = 0.048. In the case of
layer 1, the size of the intersection of the two USL sets is 1 whereas the size of the
union is 5, and thus simULayer_1(Alice, Bob) = 2/7 × 1/5 = 0.057. In a similar
fashion, it is possible to compute simULayer_2(Alice, Bob) = 0.043, simULayer_3(Alice,
Bob) = 0.163, simULayer_4(Alice, Bob) = 0, simULayer_5(Alice, Bob) = 0.857, and
simULayer_6(Alice, Bob) = 0. Finally, from layer 0 to layer 6, the semantic similarity
between Alice and Bob, semUSim(Alice, Bob), is computed as the sum of the layer
similarities: semUSim(Alice, Bob) = λ × (0.048 + 0.057 + 0.043 + 0.163 + 0.857) =
0.25 × 1.168 = 0.292.
    In the same way, the semantic similarities between Alice and Nami and between Bob
and Nami are calculated as semUSim(Alice, Nami) = 0.11 and semUSim(Bob, Nami) =
0.132, respectively. This means that Alice is semantically more similar to Bob than to Nami.
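The semantic user similarity can be sketched in a few lines of Python. The sets below mirror the category labels of Table 2 (the long L4/L5 strings are abbreviated to placeholders of the same cardinality), so the Alice-Bob value above is reproduced:

```python
def sem_u_sim(usl_u, usl_v, lam=0.25):
    """Semantic user similarity: weighted Jaccard coefficient per layer,
    summed over layers 0..6 and normalized by lam."""
    total = 0.0
    for l in range(7):
        inter = len(usl_u.get(l, set()) & usl_v.get(l, set()))
        union = len(usl_u.get(l, set()) | usl_v.get(l, set()))
        if union:                          # empty layers contribute 0
            total += (l + 1) / 7 * inter / union
    return lam * total

alice = {0: {"A:+B:", "A:+S:"}, 1: {"k.", "m.", "b."},
         2: {"k.o.-", "p.a.-", "we.g.-", "we.b.-"},
         3: {"s.o.-a.a.-", "e.o.-we.h.-", "b.i.-b.i.-", "t.e.-d.u.-"},
         5: {"i.i.-we.h.-..."}}            # L5 string abbreviated
bob = {0: {"U:+S:", "A:+S:"}, 1: {"d.", "t.", "b."},
       2: {"wo.y.-", "wa.k.-", "e.y.-", "s.y.-", "k.h.-", "we.g.-", "we.h.-"},
       3: {"a.u.-we.h.-", "n.o.-y.y.-s.y.-", "e.o.-we.h.-", "b.i.-b.i.-", "l.o.-u.u.-"},
       4: {"L4-a", "L4-b", "L4-c"},        # three L4 USLs, labels abbreviated
       5: {"i.i.-we.h.-..."}}

print(round(sem_u_sim(alice, bob), 3))     # 0.292
```

The k nearest neighbors of a user are then simply the k users with the largest `sem_u_sim` values.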


4.2.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically nearest neighbors, the final step is the
prediction, that is, attempting to speculate upon how a certain user would prefer unseen
items. In our study, the basic idea of uncovering relevant items starts from the assumption





that a target user is likely to prefer items that have been tagged by semantically similar
users. Items tagged by users more similar to the target user should be ranked higher. We
label this prediction strategy social ranking based on the semantic user model. Formally,
the semantic social ranking score of the target user u for the target item h, denoted
SUR(u, h), is obtained as follows:

                        SUR(u, h) = ∑_{v ∈ SSN_k(u)} ( |USL_u^* ∩ USL_v^h| / |USL_v^h| ) · semUSim(u, v)        (16)



where SSN_k(u) is the set of k nearest neighbors of user u grouped by the semantic user
similarity and USL_v^h is the union of USLs connected to the tags that user v has assigned to
item h. semUSim(u, v) denotes the semantic similarity between user u and user v.
   Once the ranking scores of the target user for items which he/she has not previously
tagged are computed, the items are sorted in descending order of the score SUR(u, h).
Two strategies can then be used to select the items relevant to user u. First, if the
ranking score is greater than a reasonable threshold, the item is recommended to user u.
Second, a set of top-N ranked items that have obtained the highest scores is identified
for user u, and those items are recommended to him/her.


   [Figure omitted: the USLs of Bob and Nami for Media2 and Media3 together with the
union of Alice's USLs, giving |USL_Alice^* ∩ USL_Bob^M2| = 6, |USL_Bob^M2| = 9,
|USL_Alice^* ∩ USL_Nami^M2| = 3, |USL_Nami^M2| = 8, |USL_Alice^* ∩ USL_Bob^M3| = 0,
|USL_Bob^M3| = 12, |USL_Alice^* ∩ USL_Nami^M3| = 4, and |USL_Nami^M3| = 9.]

  Fig. 6 An example of computing the semantic social ranking of Alice for Media2 and Media3








    Consider the same situation as the example described in the previous section, and
assume that Alice is the target user to whom the system should recommend items and
that the neighbors of Alice are Bob and Nami. We want to answer the question: which
items should the system recommend to Alice? To provide the answer, we should
compute the semantic social ranking scores for the items that Alice has not previously
tagged (e.g., Media2 and Media3). First, let us calculate the social ranking of Alice for
Media2. To this end, we should first compute the number of intersection categories
between Alice and each of her neighbors for Media2. In the case of Bob, who annotated
Media2 with the tag "OWL", the number of intersection categories, as can be seen in
Fig. 6, is 6, i.e., |USL_Alice^* ∩ USL_Bob^M2| = 6, and the size of the union of Bob's
USLs for Media2 is 9, i.e., |USL_Bob^M2| = 9. In the case of Nami, who annotated Media2
with the tag "Web of data", the number of intersection categories is 3, i.e.,
|USL_Alice^* ∩ USL_Nami^M2| = 3, and the size of the union of Nami's USLs for Media2 is 8,
i.e., |USL_Nami^M2| = 8. Finally, the semantic ranking score of Alice for Media2 can be
computed as the weighted sum of the numbers of intersection categories, using the
semantic similarities as the weights:

                SUR(Alice, M2) = (6 × 0.292)/9 + (3 × 0.11)/8 ≈ 0.236
    Second, let us calculate the semantic ranking score of Alice for Media3. In this case,
there is no intersection between Alice and Bob at any layer, i.e., |USL_Alice^* ∩ USL_Bob^M3| = 0.
In the case of Nami, the number of intersection categories is 4, i.e.,
|USL_Alice^* ∩ USL_Nami^M3| = 4. Consequently, the ranking score for Media3 becomes
SUR(Alice, M3) ≈ 0.049. Considering the two social media items, Media2 and Media3, the
system predicts that Media2 is more likely to fit Alice's needs than Media3.
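The ranking computation above can be sketched directly, plugging in the intersection counts, set sizes, and similarities from Figs. 5-6 (variable names are illustrative assumptions):

```python
def sur(overlaps, sims):
    """Semantic social ranking: sum over neighbours v of
    (|USL_u^* ∩ USL_v^h| / |USL_v^h|) * semUSim(u, v)."""
    return sum(inter / size * sims[v] for v, (inter, size) in overlaps.items())

sims = {"Bob": 0.292, "Nami": 0.11}          # semUSim(Alice, v) from the example
media2 = {"Bob": (6, 9), "Nami": (3, 8)}     # (intersection size, |USL_v^h|)
media3 = {"Bob": (0, 12), "Nami": (4, 9)}

print(round(sur(media2, sims), 3))   # 0.236
print(round(sur(media3, sims), 3))   # 0.049
```

Sorting candidate items by this score yields the top-N recommendation list for Alice.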


4.3 Item-based Semantic Collaborative Filtering

In this section, we describe a social recommendation method from an item perspective,
using the semantic item model (Definition 8). With respect to item-based
collaborative filtering, we first look into the set of similar items that the target user has
tagged and then compute how semantically similar they are to the target item, called a
semantic item-item similarity. Based on the semantically similar items, we recommend








relevant items to the target user through capturing how he/she annotated the similar
items.


4.3.1 Generating Semantically Similar Items

We define semantically similar items as a group of items whose tagged IEML categories
are close to those of the target item. The semantic similarity between two items, i and j,
can be computed as the weighted sum of layer similarities from layer 0 to layer 6.
Formally, the semantic item similarity measure is defined as:

                        semISim(i, j) = λ ∑_{l=0}^{6} simILayer_l(i, j)        (17)


where λ is a normalizing factor such that the layer weights sum to unity. simILayer_l(i, j)
denotes the layer similarity of two items at layer l. The layer similarity between two
USL sets is defined as the weighted Jaccard coefficient of the two USL sets. Formally,
the layer similarity is given by:

                        simILayer_l(i, j) = ((l + 1) / 7) · ( |USL_*^i(l) ∩ USL_*^j(l)| / |USL_*^i(l) ∪ USL_*^j(l)| )        (18)

where USL_*^i(l) and USL_*^j(l) refer to the unions of USLs for items i and j at layer l,
0 ≤ l ≤ 6, respectively. Here we give more weight to higher layers when computing the
semantic item similarity. That is, the intersections of higher layers contribute more
than the intersections of lower layers. When λ is set to 0.25 for normalization, the
similarity value between two items is in the range of 0 to 1. The higher the score an
item has, the more similar it is to the target item. Finally, for a given item i ∈ Ĩ, the
k items with the highest semantic similarity are identified as the semantically k most
similar items such that:

                        SSI_k(i) = argmax^(k)_{j ∈ Ĩ\{i}} semISim(i, j)        (19)








4.3.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically similar items, the final step is the
prediction, that is, attempting to speculate upon how a certain user would prefer unseen
items. In our study, the basic idea of discovering relevant items starts from the assumption
that a target user is likely to prefer items which are semantically similar to the items
he/she has tagged before. We call this prediction strategy social ranking based on the
semantic item model. Formally, the prediction value of the target user u for the target
item i, denoted SIR(u, i), is obtained as follows:

                        SIR(u, i) = ∑_{j ∈ SSI_k(i)} ( |USL_*^i ∩ USL_u^j| / |USL_u^j| ) · semISim(i, j)        (20)


where SSI_k(i) is the set of k most similar items to item i grouped by the semantic item
similarity and USL_u^j is the union of USLs connected to the tags that user u has assigned
to item j. semISim(i, j) denotes the semantic similarity between item i and item j.
   Once the ranking scores of the target user for items which he/she has not previously
tagged are computed, the items are sorted in descending order of the value SIR(u, i).
Finally, a set of top-N ranked items that have obtained the highest scores is identified
for user u, and those items are recommended to him/her.
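The item-based ranking and top-N selection can be sketched analogously to the user-based case; the overlap counts and item-item similarities below are made-up stand-ins for values the semantic item model would supply:

```python
def sir(neighbours):
    """Item-based ranking: sum over the k most similar items j of
    (|USL_*^i ∩ USL_u^j| / |USL_u^j|) * semISim(i, j)."""
    return sum(inter / size * sim for inter, size, sim in neighbours if size)

# assumed (intersection size, |USL_u^j|, semISim) triples per candidate item
candidates = {
    "item_a": [(4, 8, 0.30), (2, 5, 0.12)],
    "item_b": [(1, 6, 0.25)],
}
top_n = sorted(candidates, key=lambda i: sir(candidates[i]), reverse=True)[:10]
print(top_n)   # ['item_a', 'item_b']
```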



5 Experimental Evaluation

In this section, we present experimental evaluations of the proposed approach and
compare its performance against that of the benchmark algorithms.


5.1 Evaluation Design and Metrics

The experimental data comes from BibSonomy8, a collaborative tagging application
that allows users to organize and share scholarly references. The dataset used in this
study is the p-core at level 5 of BibSonomy [26]. A p-core at level 5 means that each
user, tag, and item has/occurs in at least 5 posts [9]. The original dataset


8
    http://bibsonomy.org







contains several useless tags and system tags, such as "r", "!", and "system:imported",
and thus we removed those tags in the experiments. Table 3 briefly describes our dataset.

Table 3 Characteristic of Bibsonomy dataset (p-core at level 5)

         |Ũ|          |Ĩ|              |Ť|           |Ł|            |Y|           |Ň|             # of posts

         116          361             400            325          9,996          3,783              2,494

To evaluate the performance of the recommendations, we randomly divided the dataset
into a training set and a test set. For each user u, we randomly selected 20% of the items
that he had previously posted and used those as the test set; the remaining 80% of the
items was used as the training set. To ensure that our results are not sensitive to the
particular training/test partitioning for each user, we used a 5-fold cross-validation
scheme [7]. Therefore, the result values reported in the experiment section are the
averages over all five runs.
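The per-user 80/20 split can be sketched as follows (the seed value and helper name are assumptions, not from the report):

```python
import random

def split_user_items(items, test_ratio=0.2, seed=0):
    """Randomly hold out `test_ratio` of a user's posted items as the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = max(1, int(len(items) * test_ratio))
    return items[cut:], items[:cut]   # (train, test)

train, test = split_user_items([f"i{k}" for k in range(10)])
print(len(train), len(test))   # 8 2
```

Repeating this with five different random partitions and averaging the results gives the 5-fold scheme described above.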
   To measure the performance of the recommendations, we adopted precision and
recall, which judge how relevant a set of ranked recommendations is for the user [7].
Precision measures the ratio of the number of items in the list of recommendations that
are also contained in the test set to the number of items recommended, and recall
measures the ratio of the number of items in the list of recommendations that are also
contained in the test set to the overall number of items in the test set. Precision and
recall for a given user u in the test set are given by:

        precision(u) = |Test(u) ∩ TopN(u)| / |TopN(u)| ,    recall(u) = |Test(u) ∩ TopN(u)| / |Test(u)|        (21)

where Test(u) is the item list of user u in the test set and TopN(u) is the top-N
recommended item list for user u. Finally, the overall precision and recall for all users
in the test set are computed by averaging the individual precision(u) and recall(u) values.
However, precision and recall are often in conflict with each other. Generally, increasing
the number of items recommended tends to decrease precision but increase recall [18].
Therefore, to consider both of them in judging the quality of recommendations, we use
the standard F1 metric, which combines precision and recall into a single number:




                                                                                                               24
Technical Report, March 2010




                        F1 = (2 × precision × recall) / (precision + recall)        (22)
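For a single user, the metrics above can be computed as in the following sketch (the item identifiers are invented for illustration):

```python
def precision_recall_f1(test_items, top_n_items):
    """Precision, recall, and F1 of a top-N recommendation list for one user."""
    hits = len(set(test_items) & set(top_n_items))
    precision = hits / len(top_n_items)
    recall = hits / len(test_items)
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(
    test_items=["i1", "i2", "i3", "i4", "i5"],
    top_n_items=["i1", "i3", "i9", "i8", "i2", "i7", "i6", "i0", "ix", "iy"])
print(p, r, round(f1, 3))   # 0.3 0.6 0.4
```

The overall scores are then the averages of these per-user values over all users in the test set.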

   In order to compare the performance of our algorithms, we implemented a user-based
CF algorithm in which the similarity is computed by cosine-based similarity (denoted
UCF) [18], an item-based CF algorithm [5] that employs cosine-based similarity
(denoted ICF), and a most-popular-tags approach [9], which is based on the tags a user
has already used (denoted MPT). The top-N recommendation via the semantic social
ranking of the user-based method (denoted SUR) and of the item-based method (denoted
SIR) was evaluated in comparison with these benchmark algorithms.


5.2 Experiment Results

In this section, we present detailed experimental results. The performance evaluation is
divided into two dimensions. The influence of the number of neighbors on the
performance is first investigated, and then the quality of item recommendations is
evaluated in comparison with the benchmark methods.


5.2.1 Experiments with Neighborhood Size

As noted in a number of previous studies, the neighborhood size has a significant impact
on the recommendation quality of neighborhood-based algorithms [5, 10, 17, 18].
Therefore, we varied the neighborhood size k from 5 to 80. In the case of UCF and SUR,
the parameter k denotes the number of nearest users, whereas for ICF and SIR it is the
number of most similar items. Even though MPT is not affected by the neighborhood size
at all, we also report its results for the purpose of comparison. In this experiment, we set
the number of recommended items N to 10 (i.e., top-10) for each user in the test set.
   Fig. 7 shows how the precision and recall of the four neighborhood-based methods
change as the neighborhood size grows. The precision of UCF rises as the size
increases from 5 to 20 and then decreases slightly; its recall curve remains nearly
flat. For ICF, precision and recall improve up to a neighborhood size of 40; beyond
this point, further increases of the size give negative








influence on performance. For SIR, we observe that precision and recall improve
slightly as k increases from 5 to 30 and to 40, respectively; beyond these points,
further increases lead to worse results. SUR behaves differently. Unlike UCF, ICF,
and SIR, the curves show that SUR is strongly affected by the neighborhood size:
increasing it has detrimental effects on both metrics. In other words, SUR provides
better precision and recall when the neighborhood size is relatively small. For
example, increasing the neighborhood size much beyond 20 yielded worse precision and
recall than a size of 5. This result indicates that superfluous users can negatively
impact the recommendation quality of our method.




          Fig. 7 Precision and recall with respect to increasing the neighborhood size

   We next examined F1 values in order to compare the methods and to select the best
neighborhood size for each. Fig. 8 depicts the F1 variation as the neighborhood size
increases. SUR outperforms MPT and ICF at all neighborhood sizes, and it outperforms
UCF for sizes from 5 to 40; beyond this point, F1 was poorer for SUR than for UCF.
Examining the best value of each method, the F1 of UCF is 0.107 (k=20), of ICF 0.099
(k=40), of MPT 0.0825, of SUR 0.119 (k=10), and of SIR 0.114 (k=30). This result
implies that SUR can provide better performance than the other methods even when data
is sparse or the available data for users is relatively insufficient. In practice,
CF recommender systems trade off recommendation quality against real-time efficiency
by pre-selecting a number of neighbors. In






consideration of both quality and computation cost, we selected 20, 40, 10, and 30 as
the neighborhood size of UCF, ICF, SUR, and SIR, respectively, in subsequent
experiments.
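The pre-selection of neighborhood size described above amounts to a simple validation sweep. A hedged sketch follows; the helper name, the candidate grid, and the evaluation callback are illustrative rather than taken from the report.

```python
def best_neighborhood_size(f1_at_k, candidate_ks=(5, 10, 20, 30, 40, 60, 80)):
    """Return the k in `candidate_ks` that maximizes F1.

    `f1_at_k` is assumed to be a callable that runs the recommender with k
    neighbors on a held-out split and returns the averaged F1 over test users."""
    return max(candidate_ks, key=f1_at_k)
```

In production the grid would also be weighted by computation cost, since each additional neighbor increases the online scoring work.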




               Fig. 8 F1 value with respect to increasing the neighborhood size


5.2.2 Comparisons with Other Methods

To evaluate top-N recommendation performance experimentally, we calculated the
precision, recall, and F1 obtained by UCF, ICF, MPT, SUR, and SIR as the number of
recommended items N varies from 2 to 10 in increments of 2. Since in practice users
tend to click on highly ranked items, we only examined a small number of recommended
items.
   Fig. 9 depicts the precision-recall plot, showing how the precision and recall of
the five methods change as N increases. Data points on the curves correspond to the
number of recommended items: the first point of each curve represents N=2 (i.e.,
top-2) and the last point N=10 (i.e., top-10). As can be observed from the graph,
the curves of all methods tend to descend, confirming that increasing the number of
recommended items tends to decrease precision while increasing recall. With respect
to precision, for N=2 and N=4, SIR demonstrates the best performance, whereas SUR
demonstrates the best performance








as N is increased. With respect to recall, SUR outperforms the other four methods on all
occasions.




   Fig. 9 Precision and recall as the value of the number of recommended items N increases




     Fig. 10 Comparisons of F1 values as the number of recommended items N increases

   Let us now focus on the F1 results. Fig. 10 shows that SUR and SIR outperform the
other methods: both show considerably improved performance compared to UCF, ICF, and
MPT. For example, for top-2 (N=2), SUR achieves 0.1%, 1.6%, and 1.5% improvement and
SIR achieves 0.3%, 1.9%, and 1.7% improvement over UCF, ICF, and MPT, respectively.
When comparing the results achieved by SUR and SIR, the






recommendation quality of the former is superior as N increases. Averaged over the
five cases, SUR achieves improvements of 0.6%, 1.8%, 2.6%, and 0.3% over UCF, ICF,
MPT, and SIR, respectively.




     Fig. 11 Comparisons of precision, recall, and F1 for cold start users and active users

   We further examined recommendation performance for users with few posts in the
training set, namely cold start users, and for users with many posts, namely active
users. For cold start users, a CF-based recommender system is generally unable to
make high-quality recommendations, compared to the case of active users; this is one
of its well-known limitations. We selected two subsets of users: those with fewer
than 6 posts (21 users) and those with more than 25 posts (21 users). For the two
groups we calculated precision, recall, and F1 within the top-10 of the ranked result
set obtained by UCF, ICF, MPT, and SUR. Fig. 11 shows the results for the cold start
users (left) and active users (right). As we can see from the graphs, the F1 values
of the cold start group were considerably lower than those of the active group. This
is because the users' propensities were hard to analyze from their postings: they
simply did not have enough information (items or tags). Nevertheless, comparing SUR
with the benchmark algorithms on the cold start dataset, the precision, recall, and
F1 values of SUR were found to be superior to those of the other methods. For
example, SUR obtains 11.9%, 14.3%, and 2.4% improvement in recall over UCF, ICF, and
MPT, respectively. In terms of F1, SUR outperforms UCF, ICF, and MPT by 2.2%, 2.6%,
and 0.4%, respectively; only MPT achieves comparable results. This result indicates
that utilizing tagging information can help alleviate the cold start problem and thus
improve the quality







of item recommendations. With respect to the active dataset, SUR performs better than
ICF and MPT on all occasions. Comparing the F1 obtained by UCF and SUR, the
difference appears insignificant; although the precision of SUR is slightly worse
than that of UCF on the active dataset, the proposed method still provides better
overall quality than the benchmark methods. That is, SUR can provide more suitable
items not only to cold start users but also to active users. Comparing the cold start
and active results achieved by MPT reveals an interesting pattern: a simple tag-based
approach works well enough for cold start users, compared to UCF and ICF, but for
active users superfluous tags can instead introduce noise.
   We conclude from these comparison experiments that our approaches can provide
consistently better quality of recommendations than the other methods. Furthermore, we
believe that the results of the proposed approach will become more practically
significant on large-scale Web 2.0 frameworks.



6 Concluding Remarks

Looking toward the future of the social Web, in this report we have presented
semantic models for the interoperability challenges that face semantic technology. We
also proposed two collaborative filtering methods that apply these semantic models,
and analyzed the potential benefits of IEML for social recommender systems. As our
experimental results show, our methods can successfully enhance the performance of
item recommendations. Moreover, we observed that our methods can provide items better
suited to user interests even when the number of recommended items is small. The main
contributions of this study can be summarized as follows: 1) Our methods address
traditional stumbling blocks such as polysemy, synonymy, data sparseness, the cold
start problem, and semantic interoperability. 2) They can also offer trustworthy
items semantically relevant to a user's needs, because capturing the semantics of
user-generated tags makes it easier both to grasp a user's preferences and to make
recommendations accordingly.








Acknowledgment

This work has been funded mainly, since 2009, by the Canada Research Chair in
Collective Intelligence at the University of Ottawa.



References
1.   Adomavicius, G., Tuzhilin, A. (2005) Toward the Next Generation of Recommender Systems: A
     survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data
     Engineering 17(6): 734-749
2.   Bao, S., Wu, X., Fei, B., Xue, G., Su, Z., Yu, Y. (2007) Optimizing web search using social
     annotations. In: Proceedings of the 16th International Conference on World Wide Web, pp. 501-510
3.   Bonhard, P., Sasse, A. (2006) ‘Knowing me, knowing you’ - using profiles and social networking to
     improve recommender systems. BT Technology Journal 24(3): 84-98
4.   Breese, J. S., Heckerman, D., Kadie, C. (1998) Empirical analysis of predictive algorithms for
     collaborative filtering. In: Proceedings of the Fourteenth Annual Conference on Uncertainty in
     Artificial Intelligence, pp. 43–52
5.   Deshpande, M., Karypis, G. (2004) Item-based top-N recommendation algorithms. ACM
     Transactions on Information Systems 22(1): 143-177
6.   Golder, S. A., Huberman, B. A. (2006) Usage patterns of collaborative tagging systems. Journal of
     Information Science 32(2): 198-208
7.   Herlocker, J. L., Konstan, J. A., Terveen, L. G., Riedl, J. T. (2004) Evaluating collaborative filtering
     recommender systems. ACM Transactions on Information Systems 22(1): 5-53
8.   Hotho, A., Jäschke, R., Schmitz, C., Stumme, G. (2006) Information Retrieval in Folksonomies:
     Search and ranking. In: Proceedings of the 3rd European Semantic Web Conference, pp. 411-426
9.   Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., Stumme, G. (2008) Tag recommendations
     in social bookmarking systems. AI Communications 21(4): 231-247
10. Kim, H.-N., Ji, A.-T., Ha, I., Jo, G.-S. (2009) Collaborative filtering based on collaborative tagging
     for enhancing the quality of recommendation. Electronic Commerce Research and Applications, Doi:
     10.1016/j.elerap.2009.08.004
11. Lévy, P. (2009) Toward a self-referential collective intelligence some philosophical background of
     the IEML research program. In: Proceedings of 1st International Conference on Computational
     Collective Intelligence - Semantic Web, Social Networks & Multiagent Systems, pp. 22-35
12. Lévy, P. (2010) From social computing to reflexive collective intelligence: The IEML research
     program. Information Sciences 180(1): 71-94
13. Li, X., Guo, L., Zhao, Y. (2008) Tag-based social interest discovery. In: Proceedings of the 17th
     International Conference on World Wide Web, pp. 675-684







14. Marchetti, A., Tesconi, M., Ronzano, F. (2007) SemKey: A semantic collaborative tagging system.
    In: Proceedings of Tagging and Metadata for Social Information Organization Workshop in the 16th
    International Conference on World Wide Web
15. Peis, E., Morales-del-Castillo, J. M., Delgado-López, J. A. (2008) Semantic recommender systems.
    Analysis of the state of the topic. Hipertext.net number 6.
     http://www.hipertext.net/english/pag1031.htm. Accessed 15 Dec 2009
16. Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J. (1994) GroupLens: An open
    architecture for collaborative filtering of netnews. In: Proceedings of the ACM 1994 Conference on
    Computer Supported Cooperative Work, pp. 175–186
17. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2001) Item-based collaborative filtering
    recommendation algorithms. In: Proceedings of the Tenth International World Wide Web
    Conference, pp. 285-295
18. Sarwar, B., Karypis, G., Konstan, J., Riedl, J. (2000) Analysis of recommendation algorithms for E-
    commerce. In: Proceedings of ACM Conference on Electronic Commerce, pp. 158–167
19. Schenkel, R., Crecelius, T., Kacimi, M., Michel, S., Neumann, T., Parreira, J. X., Weikum, G. (2008)
    Efficient top-k querying over social-tagging networks. In: Proceedings of the 31st Annual
    International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.
    523-530
20. Siersdorfer, S., Sizov, S. (2009) Social recommender systems for web 2.0 folksonomies. In:
    Proceedings of the 20th ACM conference on Hypertext and hypermedia, pp. 261-270
21. Sigurbjörnsson, B., van Zwol, R. (2008) Flickr tag recommendation based on collective knowledge.
    In: Proceedings of the 17th International Conference on World Wide Web, pp. 327-336
22. Tso-Sutter, K. H. L., Marinho, L. B., Schmidt-Thieme, L. (2008) Tag-aware recommender systems by
    fusion of collaborative filtering algorithms. In: Proceedings of the 2008 ACM symposium on Applied
    computing, pp. 1995-1999
23. Xu, Z., Fu, Y., Mao, J., Su, D. (2006) Towards the Semantic Web: collaborative tag suggestions. In:
    Proceedings of the Collaborative Web Tagging Workshop in the 15th International Conference on
    the World Wide Web
24. Zanardi, V., Capra, L. (2008) Social Ranking: Uncovering relevant content using tag-based
    recommender systems. In: Proceedings of the 2008 ACM conference on Recommender Systems, pp.
    51-58
25. Zhang, Z.-K., Zhou, T., Zhang, Y.-C. (2010) Personalized recommendation via integrated diffusion
    on user-item-tag tripartite graphs. Physica A: Statistical Mechanics and its Applications 389(1): 179-
    186
26. Knowledge and Data Engineering Group (2007) University of Kassel: Benchmark Folksonomy Data
    from BibSonomy, version of April 30th, 2007. http://www.kde.cs.uni-kassel.de/bibsonomy/dumps/.
    Accessed 15 Dec 2009




                                                                                                      32

More Related Content

What's hot

Benchmarking the Privacy-­Preserving People Search
Benchmarking the Privacy-­Preserving People SearchBenchmarking the Privacy-­Preserving People Search
Benchmarking the Privacy-­Preserving People SearchDaqing He
 
2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spacesMarc Smith
 
Published Paper
Published PaperPublished Paper
Published PaperFaeza Noor
 
2007-JOSS-Visualizing the signatures of social roles in online discussion groups
2007-JOSS-Visualizing the signatures of social roles in online discussion groups2007-JOSS-Visualizing the signatures of social roles in online discussion groups
2007-JOSS-Visualizing the signatures of social roles in online discussion groupsMarc Smith
 
A hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsA hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsAravindharamanan S
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSijcsit
 
Characterizing microblogs
Characterizing microblogsCharacterizing microblogs
Characterizing microblogsEtico Capital
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network AnalysisScott Gomer
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...BAINIDA
 
2009-JCMC-Discussion catalysts-Himelboim and Smith
2009-JCMC-Discussion catalysts-Himelboim and Smith2009-JCMC-Discussion catalysts-Himelboim and Smith
2009-JCMC-Discussion catalysts-Himelboim and SmithMarc Smith
 
Making the invisible visible through SNA
Making the invisible visible through SNAMaking the invisible visible through SNA
Making the invisible visible through SNAMYRA School of Business
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Eleftherios Spyromitros-Xioufis
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...ACMBangalore
 
2000 - CSCW - Conversation Trees and Threaded Chats
2000  - CSCW - Conversation Trees and Threaded Chats2000  - CSCW - Conversation Trees and Threaded Chats
2000 - CSCW - Conversation Trees and Threaded ChatsMarc Smith
 
Tag based image retrieval (tbir) using automatic image annotation
Tag based image retrieval (tbir) using automatic image annotationTag based image retrieval (tbir) using automatic image annotation
Tag based image retrieval (tbir) using automatic image annotationeSAT Publishing House
 
Identification of inference attacks on private Information from Social Networks
Identification of inference attacks on private Information from Social NetworksIdentification of inference attacks on private Information from Social Networks
Identification of inference attacks on private Information from Social Networkseditorjournal
 
Social Network Analysis power point presentation
Social Network Analysis power point presentation Social Network Analysis power point presentation
Social Network Analysis power point presentation Ratnesh Shah
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewDr. Amarjeet Singh
 

What's hot (20)

Benchmarking the Privacy-­Preserving People Search
Benchmarking the Privacy-­Preserving People SearchBenchmarking the Privacy-­Preserving People Search
Benchmarking the Privacy-­Preserving People Search
 
2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces2000-ACM SIGCHI-The social life of small graphical chat spaces
2000-ACM SIGCHI-The social life of small graphical chat spaces
 
Published Paper
Published PaperPublished Paper
Published Paper
 
2007-JOSS-Visualizing the signatures of social roles in online discussion groups
2007-JOSS-Visualizing the signatures of social roles in online discussion groups2007-JOSS-Visualizing the signatures of social roles in online discussion groups
2007-JOSS-Visualizing the signatures of social roles in online discussion groups
 
A hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratingsA hybrid recommender system user profiling from keywords and ratings
A hybrid recommender system user profiling from keywords and ratings
 
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONSSTRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
STRUCTURAL COUPLING IN WEB 2.0 APPLICATIONS
 
Characterizing microblogs
Characterizing microblogsCharacterizing microblogs
Characterizing microblogs
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
Understanding User-Community Engagement by Multi-faceted Features: A Case ...
Understanding User-Community Engagement by Multi-faceted Features: A Case ...Understanding User-Community Engagement by Multi-faceted Features: A Case ...
Understanding User-Community Engagement by Multi-faceted Features: A Case ...
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
2009-JCMC-Discussion catalysts-Himelboim and Smith
2009-JCMC-Discussion catalysts-Himelboim and Smith2009-JCMC-Discussion catalysts-Himelboim and Smith
2009-JCMC-Discussion catalysts-Himelboim and Smith
 
Making the invisible visible through SNA
Making the invisible visible through SNAMaking the invisible visible through SNA
Making the invisible visible through SNA
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...
 
2000 - CSCW - Conversation Trees and Threaded Chats
2000  - CSCW - Conversation Trees and Threaded Chats2000  - CSCW - Conversation Trees and Threaded Chats
2000 - CSCW - Conversation Trees and Threaded Chats
 
Tag based image retrieval (tbir) using automatic image annotation
Tag based image retrieval (tbir) using automatic image annotationTag based image retrieval (tbir) using automatic image annotation
Tag based image retrieval (tbir) using automatic image annotation
 
Identification of inference attacks on private Information from Social Networks
Identification of inference attacks on private Information from Social NetworksIdentification of inference attacks on private Information from Social Networks
Identification of inference attacks on private Information from Social Networks
 
Social Network Analysis power point presentation
Social Network Analysis power point presentation Social Network Analysis power point presentation
Social Network Analysis power point presentation
 
Automatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature ReviewAutomatic Hate Speech Detection: A Literature Review
Automatic Hate Speech Detection: A Literature Review
 

Viewers also liked

Reliable Resources
Reliable ResourcesReliable Resources
Reliable Resourcesguest5c2a5
 
Keyword Landing Pages
Keyword Landing PagesKeyword Landing Pages
Keyword Landing PagesMax Zalevski
 
Int Math 2 Section 2-6 1011
Int Math 2 Section 2-6 1011Int Math 2 Section 2-6 1011
Int Math 2 Section 2-6 1011Jimbo Lamb
 
Integrated Math 2 Section 5-8
Integrated Math 2 Section 5-8Integrated Math 2 Section 5-8
Integrated Math 2 Section 5-8Jimbo Lamb
 
Integrated Math 2 Section 5-5
Integrated Math 2 Section 5-5Integrated Math 2 Section 5-5
Integrated Math 2 Section 5-5Jimbo Lamb
 
Whats the big idea with social media media140-2012
Whats the big idea with social media media140-2012Whats the big idea with social media media140-2012
Whats the big idea with social media media140-2012Kate Carruthers
 
HCD Process
HCD ProcessHCD Process
HCD ProcessNTUST
 

Viewers also liked (8)

Reliable Resources
Reliable ResourcesReliable Resources
Reliable Resources
 
Keyword Landing Pages
Keyword Landing PagesKeyword Landing Pages
Keyword Landing Pages
 
情報発信のススメ
情報発信のススメ情報発信のススメ
情報発信のススメ
 
Int Math 2 Section 2-6 1011
Int Math 2 Section 2-6 1011Int Math 2 Section 2-6 1011
Int Math 2 Section 2-6 1011
 
Integrated Math 2 Section 5-8
Integrated Math 2 Section 5-8Integrated Math 2 Section 5-8
Integrated Math 2 Section 5-8
 
Integrated Math 2 Section 5-5
Integrated Math 2 Section 5-5Integrated Math 2 Section 5-5
Integrated Math 2 Section 5-5
 
Whats the big idea with social media media140-2012
Whats the big idea with social media media140-2012Whats the big idea with social media media140-2012
Whats the big idea with social media media140-2012
 
HCD Process
HCD ProcessHCD Process
HCD Process
 

Similar to Ieml social recommendersystems

A Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdfA Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdfAmber Ford
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networkseSAT Publishing House
 
Towards a hybrid recommendation approach using a community detection and eval...
Towards a hybrid recommendation approach using a community detection and eval...Towards a hybrid recommendation approach using a community detection and eval...
Towards a hybrid recommendation approach using a community detection and eval...IJECEIAES
 
Integrated expert recommendation model for online communitiesst02
Integrated expert recommendation model for online communitiesst02Integrated expert recommendation model for online communitiesst02
Integrated expert recommendation model for online communitiesst02IJwest
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...csandit
 
A literature survey on recommendation
A literature survey on recommendationA literature survey on recommendation
A literature survey on recommendationaciijournal
 
A Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental AnalysisA Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental Analysisaciijournal
 
A Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataA Review: Text Classification on Social Media Data
A Review: Text Classification on Social Media DataIOSR Journals
 
Sentiment analysis in SemEval: a review of sentiment identification approaches
Sentiment analysis in SemEval: a review of sentiment identification approachesSentiment analysis in SemEval: a review of sentiment identification approaches
Sentiment analysis in SemEval: a review of sentiment identification approachesIJECEIAES
 
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...BO TRUE ACTIVITIES SL
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Literature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social MediaLiterature Review of Information Behaviour on Social Media
Literature Review of Information Behaviour on Social MediaDavid Thompson
 
a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...a modified weight balanced algorithm for influential users community detectio...
a modified weight balanced algorithm for influential users community detectio...INFOGAIN PUBLICATION
 
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...cscpconf
 
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIATHE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIAIJCSES Journal
 
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTALCONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTALcscpconf
 

Similar to Ieml social recommendersystems (20)

A Literature Survey on Recommendation Systems for Scientific Articles.pdf
Current trends of opinion mining and sentiment analysis in social networks
Towards a hybrid recommendation approach using a community detection and eval...
Social search
Integrated expert recommendation model for online communitiesst02
International Journal of Engineering Research and Development
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
A literature survey on recommendation
A Literature Survey on Recommendation System Based on Sentimental Analysis
A Review: Text Classification on Social Media Data
O017148084
Sentiment analysis in SemEval: a review of sentiment identification approaches
Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with ...
Welcome to International Journal of Engineering Research and Development (IJERD)
Literature Review of Information Behaviour on Social Media
Sub1557
a modified weight balanced algorithm for influential users community detectio...
AN EXTENDED HYBRID RECOMMENDER SYSTEM BASED ON ASSOCIATION RULES MINING IN DI...
THE SURVEY OF SENTIMENT AND OPINION MINING FOR BEHAVIOR ANALYSIS OF SOCIAL MEDIA
CONTEXTUAL MODEL OF RECOMMENDING RESOURCES ON AN ACADEMIC NETWORKING PORTAL

More from Antonio Medina (10)

A metodologia da pesquisa e a importância das imagens
Propagaga marketing contemporâneo na indústria cultural musical enviado
Quem tem medo das imagens inter
Ieml semantic topology
La nouvelle sphere_publique
Vers une science de l’intelligence collective
La sphere publique_du_21e_siecle
Mudança
Políticas públicas alm
Todo e parte

Ieml social recommendersystems

  • 1. Technical Report, March 2010
Social Recommender Systems on IEML Semantic Space
Heung-Nam Kim, Andrew Roczniak, Pierre Lévy, Abdulmotaleb El-Saddik
Collective Intelligence Lab, University of Ottawa
4 March 2010
H. N. Kim, A. Roczniak, P. Lévy, A. El-Saddik, “Social recommender systems on IEML semantic space,” Collective Intelligence Lab, University of Ottawa, Technical Report, March 2010
  • 2. Social Recommender Systems on IEML Semantic Space
Heung-Nam Kim (1,2), Andrew Roczniak (1), Pierre Lévy (1), Abdulmotaleb El Saddik (2)
(1) Collective Intelligence Lab, University of Ottawa
(2) Multimedia Communication Research Lab, University of Ottawa

Abstract
In this report, we present two social recommendation methods that incorporate the semantics of tags: a user-based semantic collaborative filtering and an item-based semantic collaborative filtering. Social tagging is employed as an approach to grasp and filter users’ preferences for items. In addition, we analyze the potential benefits of IEML models for social recommender systems in solving the polysemy, synonymy, and semantic-interoperability problems, which are notable challenges in information filtering. Experimental results show that our methods offer significant advantages both in improving recommendation quality and in dealing with polysemy, synonymy, and interoperability issues.

1 Introduction
The prevalence of social media sites brings considerable changes not only to people’s life patterns but also to the generation and distribution of information. This social phenomenon has transformed the masses, once mere information consumers via mass media, into producers of information. However, as rich information is shared through social media sites, an amount of information unavailable before is growing exponentially with daily additions. Beyond finding the most attractive and relevant content, users face the great challenge of information overload. Recommender systems, which have emerged in response to these challenges, provide users with recommendations of items that are more likely to fit their needs [17].
  • 3. With the popularity of social tagging (also known as collaborative tagging or folksonomy), a number of researchers have recently concentrated on recommender systems that exploit social tagging [10, 13, 20, 22, 25]. Because modern social media sites, such as Flickr (1), YouTube (2), Twitter (3), and Delicious (4), allow users to freely annotate their content with any kind of descriptive words, also known as tags [6], users tend to use descriptive tags to annotate the content they are interested in [13]. Recommender systems incorporating tags can alleviate limitations of traditional recommender systems, such as the sparsity problem and the cold-start problem [1], and thus eventually offer promising possibilities for better personalized recommendations. Although these studies show reasonable promise of improving performance, they do not take into consideration the semantics of the tags themselves. Consequently, the lack of semantic information leads to fundamental problems: polysemy and synonymy of tags, as clearly discussed in [6]. Without the semantics of the tags used by users, the systems cannot differentiate the various social interests of users who use the same tags. Furthermore, they cannot provide semantic interoperability, a notable challenge in cyberspace [11].
To address these issues, we introduce a new concept to capture the semantics of user-generated tags. We then propose two social recommendation methods that incorporate the semantics of tags: a user-based semantic collaborative filtering and an item-based semantic collaborative filtering. First, in the user-based method, we determine similarities between users by utilizing users’ semantic-oriented tags, collectively called Uniform Semantic Locators (USLs), and subsequently identify semantically like-minded users for each user. Finally, we recommend social items (e.g., text, pictures, videos) based on the social ranking of the items that are semantically associated with tags annotated by like-minded users. Second, in the item-based method, we determine similarities between items by utilizing USLs and identify semantically similar items for each item. Finally, we recommend social items based on the semantically similar items.
(1) http://www.flickr.com (2) http://www.youtube.com (3) http://twitter.com (4) http://delicious.com
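As a rough illustration of the user-based flow just described, the sketch below flattens each user’s USL to a set of semantic-category identifiers, measures user similarity as Jaccard overlap of those sets, and ranks unseen items by the summed similarity of the neighbours who tagged them. All names (`usl_similarity`, `recommend`) and the sample data are hypothetical; the report’s actual similarity and ranking computations (Section 4) are richer than this.

```python
# Illustrative sketch, not the report's exact algorithm.

def usl_similarity(usl_a, usl_b):
    """Jaccard overlap between two users' flattened category sets."""
    if not usl_a or not usl_b:
        return 0.0
    return len(usl_a & usl_b) / len(usl_a | usl_b)

def recommend(target, user_usls, user_items, k=2, top_n=3):
    """Rank items unseen by `target` via semantically like-minded neighbours."""
    neighbours = sorted(
        ((usl_similarity(user_usls[target], usl), u)
         for u, usl in user_usls.items() if u != target),
        reverse=True)[:k]
    scores = {}
    for sim, u in neighbours:
        if sim <= 0.0:
            continue  # skip users with no semantic overlap at all
        for item in user_items[u] - user_items[target]:
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

user_usls = {
    "u1": {"knowledge networks", "truth", "encyclopedia"},
    "u2": {"knowledge networks", "truth", "language"},
    "u3": {"music", "art"},
}
user_items = {"u1": {"i1"}, "u2": {"i1", "i2"}, "u3": {"i3"}}
print(recommend("u1", user_usls, user_items))  # u2's unseen item i2 is suggested
```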
  • 4. The main contributions of this study toward social recommender systems can be summarized as follows: 1) We present and formalize models for semantic-oriented social tagging that deal with the issues of polysemy, synonymy, and semantic interoperability. We also illustrate how the models can be adapted and applied to existing social tagging systems. 2) We propose methods of social recommendation in semantic space that aim to find semantically similar users/items and discover social items semantically relevant to users’ needs.
The rest of this report is organized as follows: in the next section, we review concepts related to collaborative filtering and survey recent studies applying social tagging to recommender systems. In Section 3, we present the models used in our study. We then describe our semantic models for social recommender systems and provide a detailed description of how the models are applied to item recommendations in Section 4. In Section 5, we demonstrate the effectiveness of our methods through experimental evaluations. Finally, we summarize our work.

2 Related Work
In this section, we summarize previous studies and position our study with respect to other related work in the area.

2.1 Collaborative Filtering
Following the proposal of GroupLens [16], automated recommendations based on Collaborative Filtering (CF) have seen the widest use. CF is based on the fact that “word of mouth” opinions of other people have considerable influence on buyers’ decision making [10]. If advisors have preferences similar to the buyer’s, he/she is much more likely to be affected by their opinions. In CF-based recommendation schemes, two approaches have mainly been developed: user-based approaches [4, 16, 18] and item-based approaches [5, 17]. Usually, user-based and item-based CF systems involve two steps. First, the neighbor group, i.e., the k nearest neighbors (users with preferences similar to those of a target user, for user-based CF) or the k most similar items (items similar to a target item, for item-based CF), is
  • 5. determined by using a variety of similarity-computing methods, such as Pearson correlation-based similarity, cosine-based similarity, and so on. This step is an important task in CF-based recommendation because different neighbor users or items lead to different recommendations [18]. Once the neighborhood is generated, in the second step, prediction values for particular items, estimating how much the target user is likely to prefer them, are computed based on the group of neighbors. The more similar a neighbor is to the target user or the target item, the more influence he/she or it has on a prediction value. After predicting how much the target user will like particular items not previously rated by him/her, the top-N item set, the ordered set of items with the highest predicted values, is identified and recommended. The target user can give feedback on whether he/she actually likes the recommended top-N items, or on how much he/she prefers those items, as scaled ratings.

2.2 Social Tagging in Recommender Systems
Social tagging is the practice of allowing any user to freely annotate content with any kind of arbitrary keywords (i.e., tags) [6]. Social media sites with social tagging have become tremendously popular in recent years. Accordingly, recommender systems with social tagging (folksonomy) have become an active and growing topic of study. These studies can be broadly divided into three topics: tag suggestion, social search, and social recommendation. With the popularity of tagging, many researchers have proposed new applications for recommender systems that support the suggestion of suitable tags during folksonomy development. In [21], a tag recommender system for Flickr’s dataset is presented, based on an analysis of how users annotate photos and what information is contained in the tagging. In [9], three classes of algorithms for tag recommendation, namely an adaptation of user-based CF, the graph-based FolkRank [8] algorithm, and simple methods based on tag counts, are presented and evaluated. Xu et al. [23] propose an algorithm for collaborative tag suggestion that employs a reputation score for each user based on the quality of the tags he/she contributes. In [14], a new semantic tagging system, SemKey, is proposed in order to combine semantic technologies with the collaborative tagging paradigm in a way that can be highly beneficial to both areas.
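The two-step CF process described in Section 2.1 can be sketched with a toy user-based example: step one picks the k nearest neighbours by cosine similarity over co-rated items, step two predicts a rating as the similarity-weighted average of those neighbours’ ratings. The data and the names `cosine` and `predict` are illustrative, not from the report.

```python
# Minimal two-step CF sketch: neighborhood formation, then prediction.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    num = sum(a[i] * b[i] for i in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def predict(target, item, ratings, k=2):
    """Step 1: pick k nearest raters of `item`; step 2: weighted average."""
    neighbours = sorted(
        ((cosine(ratings[target], r), u)
         for u, r in ratings.items() if u != target and item in r),
        reverse=True)[:k]
    num = sum(sim * ratings[u][item] for sim, u in neighbours)
    den = sum(sim for sim, _ in neighbours)
    return num / den if den else 0.0

ratings = {
    "alice": {"a": 5, "b": 3},
    "bob":   {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
# bob rates like alice, so his opinion of "c" weighs more than carol's
print(predict("alice", "c", ratings))
```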
  • 6. Differently from our aims, the purpose of these studies using social tagging is basically to recommend appropriate tags to assist the user in annotation-related tasks. Our approach takes a different stance: rather than offering tag recommendations, our aim is to find like-minded users via tags’ semantics and to identify personal resources semantically relevant to user needs. Research has also been very active in information retrieval using social tagging. In [8], the authors present a formal model and a new search algorithm for folksonomies, called FolkRank. FolkRank is applied not only to find communities within the folksonomy but also to recommend tags and resources. In [24], a social ranking mechanism is proposed to answer a user’s query, aiming to transparently improve content search based on emergent tag semantics. It exploits user similarity and tag similarity based on past tagging activity. In [19], the ContextMerge algorithm is introduced to efficiently support user-centric search in social networks, dynamically including related users in the execution. The algorithm adopts two-dimensional expansion: social expansion considers the strength of relations among users, and semantic expansion considers the relatedness of different tags. In [2], two algorithms are proposed, SocialSimRank and SocialPageRank. The former calculates the similarity between tags and user queries, whereas the latter captures page popularity based on its annotations. All these works attempt to improve users’ searches by incorporating social annotations into query expansion. Differing from these works, our goal is to automatically identify resources likely to fit users’ needs without users’ queries. Other researchers have studied the same area as our study. In [10], the authors propose a collaborative filtering method via collaborative tagging. First, they determine similarities between users with social tags and subsequently identify the latent tags for each user to recommend items via a naïve Bayes approach. Tso-Sutter et al. [22] propose a generic method that allows tags to be incorporated into CF algorithms by reducing the three-dimensional correlations to three two-dimensional correlations and then applying a fusion method to re-associate these correlations. A similar approach is presented in [25] as well, in order to provide improved recommendations to users. Although these studies give reasonable promise of improving performance, they do not take the semantics of tags into consideration. Consequently, the lack of semantic information has
  • 7. limitations, such as polysemy and synonymy of tags, for identifying similar users with user-generated tags. We believe that the semantic information of tags can be helpful not only to better grasp users’ interests but also to enhance the quality of recommendations. Recent literature also focuses on semantic recommender systems, whose goal is similar to ours. Unlike our approach, however, existing work on semantic recommender systems relies on a prefixed ontology and uses technologies from the Semantic Web. The state of the art in semantic recommender systems has been well analyzed in [15].

3 IEML Models
For an understanding of our semantic approach, this section briefly explains preliminary concepts of the Information Economy MetaLanguage (IEML (5)) that will be exploited in the next sections of this report. The IEML research program promotes a radical innovation in the notation and processing of semantics. IEML is a regular language and a symbolic system for the notation of meaning. It is “semantic content oriented” rather than “instruction oriented” like programming languages or “format oriented” like data standards. IEML provides new methods for semantic interoperability, semantic navigation, collective categorization, and self-referential collective intelligence [11]. The IEML research program is compatible with the major standards of the Web of data and is in tune with current trends in social computing.

3.1 IEML Overview
IEML expressions are built from a syntactically regular combination of six symbols, called primitives. In IEML, a sequence is a succession of 3^l single primitives, where l ∈ {0, 1, 2, 3, 4, 5, 6} is called the layer of the sequence. For each layer, sequences have, respectively, lengths of 1, 3, 9, 27, 81, 243, and 729 primitives [12]. From a syntactic point of view, any IEML expression is nothing else than a set of sequences. As there is a distinct semantic for each distinct sequence, there is also a distinct semantic for each distinct set of sequences. In general, the meaning of a set of sequences corresponds to
(5) http://ieml.org
  • 8. the union of the meanings of the sequences of this set. The main result is that any algebraic operation that can be made on sets in general can also be made on semantics (significations) once they are expressed in IEML. An IEML dictionary provides the correspondence between IEML sequences and a natural-language descriptor of an IEML expression. The terms of the dictionary belong to layers 0–3. There are rules to create inflected words from these terms, to create sentences from inflected words, and to create relations between sentences by using some terms as conjunctions. Given these rules, it is possible to express any network of relations between sentences by using sequences up to layer 6. Various notation, syntax, semantics, and examples of IEML have been presented in [12]. Due to a lack of space, we refer the reader to [12] for more details.

3.2 IEML Language Model
We present the model of the IEML language, along with the model of semantic variables. Let Σ be a nonempty and finite set of symbols, Σ = {S, B, T, U, A, E}. Let a string s be a finite sequence of symbols chosen from Σ. The length of this string is denoted by |s|. An empty string ε is a string with zero occurrences of symbols, and its length is |ε| = 0. The set of all strings of length k composed of symbols from Σ is defined as Σ^k = {s : |s| = k}. Note that Σ^0 = {ε} and Σ^1 = {S, B, T, U, A, E}. Although Σ and Σ^1 are sets containing exactly the same members, the former contains symbols and the latter strings. The set of all strings over Σ is defined as Σ* = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ Σ^3 ∪ … A useful operation on strings is concatenation, defined as follows: for all s_i = a_1 a_2 a_3 a_4 … a_i ∈ Σ* and s_j = b_1 b_2 b_3 b_4 … b_j ∈ Σ*, s_i s_j denotes string concatenation such that s_i s_j = a_1 a_2 a_3 a_4 … a_i b_1 b_2 b_3 b_4 … b_j and |s_i s_j| = i + j. The IEML language over Σ is a subset of Σ*, L_IEML ⊆ Σ*:

L_IEML = { s ∈ Σ* | |s| = 3^l, 0 ≤ l ≤ 6 }  (1)
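As a quick check of Equation (1), membership in L_IEML reduces to an alphabet test plus a length test against the seven admissible powers of three. The helper name `layer` is illustrative.

```python
# Membership test for L_IEML: symbols drawn from the six primitives and a
# length of exactly 3**l for some layer l in 0..6.
PRIMITIVES = set("SBTUAE")

def layer(s):
    """Return the layer of s if s is in L_IEML, otherwise None."""
    if set(s) <= PRIMITIVES:
        for l in range(7):
            if len(s) == 3 ** l:
                return l
    return None

print(layer("S"))    # 0: a single primitive is a layer-0 sequence
print(layer("SBT"))  # 1: length 3 = 3**1
print(layer("SB"))   # None: length 2 is not a power of 3
```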
  • 9. 3.3 Model of Semantic Sequences
Definition 1 (Semantic sequence). A string s is called a semantic sequence if and only if s ∈ L_IEML. To denote the n-th primitive of a sequence s, we use a superscript n, where 1 ≤ n ≤ 3^l, and write s^n. Note that for any sequence s of layer l, s^n is undefined for any n > 3^l. Two semantic sequences are distinct if and only if either of the following holds: i) their layers are different, ii) they are composed of different primitives, iii) their primitives do not follow the same order. For any s_a and s_b:

s_a ≠ s_b ⟺ (∃n : s_a^n ≠ s_b^n) ∨ |s_a| ≠ |s_b|  (2)

Let us now consider binary relations between semantic sequences in general. These are obtained by performing a Cartesian product of two sets (6). For any sets of semantic sequences X, Y, where s_a ∈ X, s_b ∈ Y, and using Equation 2, we define four binary relations whole ⊆ X × Y, substance ⊆ X × Y, attribute ⊆ X × Y, and mode ⊆ X × Y as follows:

whole = {(s_a, s_b) | s_a = s_b}
substance = {(s_a, s_b) | s_a^n = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}  (3)
attribute = {(s_a, s_b) | s_a^(n+|s_b|) = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}
mode = {(s_a, s_b) | s_a^(n+2|s_b|) = s_b^n ∧ |s_a| = 3|s_b|, 1 ≤ n ≤ |s_b|}

Any two semantic sequences that are equal are in a whole relationship. In addition, any two semantic sequences that share specific subsequences may be in a substance, attribute, or mode relationship. For any two semantic sequences s_a and s_b, if they are in one of the above relations, then we say that s_b plays a role w.r.t. s_a and we call s_b a seme of the sequence.

Definition 2 (Seme of a sequence). For any semantic sequences s_a and s_b, if (s_a, s_b) ∈ whole ∪ substance ∪ attribute ∪ mode, then s_b plays a role w.r.t. s_a and s_b is called a seme.

We can now group distinct semantic sequences together into sets. A useful grouping is based on the layer of those semantic sequences.

(6) A Cartesian product of two sets X and Y is written as follows: X × Y = {(x, y) | x ∈ X, y ∈ Y}.
  • 10. 3.4 Model of Semantic Categories and Catsets
A category of L_IEML is a subset such that all strings of that subset have the same length:

c ⊆ { s_i, s_j ∈ L_IEML where |s_i| = |s_j| }  (4)

Definition 3 (Semantic category). A semantic category c is a set containing semantic sequences at the same layer. The layer of any category c is exactly the same as the layer of the semantic sequences included in that category. The set of all categories of layer l is given as the powerset (7) of the set of all strings of layer l of L_IEML:

C^l = Powerset({ s ∈ L_IEML where |s| = 3^l })  (5)

Two categories are distinct if and only if they differ by at least one element. For any c_a and c_b:

c_a ≠ c_b ⟺ c_a ⊄ c_b ∨ c_b ⊄ c_a  (6)

A weaker condition can be applied to categories of distinct layers (since two categories are different if their layers are different) and is written as:

l(c_a) ≠ l(c_b) ⟹ c_a ≠ c_b  (7)

where l(c_a) and l(c_b) denote the layers of categories c_a and c_b, respectively. Analogously to sequences, we consider binary relations between any categories c_i and c_j where l(c_i), l(c_j) ≥ 1. For any sets of categories X, Y where c_a ∈ X, c_b ∈ Y, we define four binary relations whole^C ⊆ X × Y, substance^C ⊆ X × Y, attribute^C ⊆ X × Y, and mode^C ⊆ X × Y as follows:

whole^C = {(c_a, c_b) | c_a = c_b}
substance^C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b : (s_a, s_b) ∈ substance}  (8)
attribute^C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b : (s_a, s_b) ∈ attribute}
mode^C = {(c_a, c_b) | ∃s_a ∈ c_a, ∃s_b ∈ c_b : (s_a, s_b) ∈ mode}

(7) A powerset of S is the set of all subsets of S, including the empty set ∅.
  • 11. For any two categories c_a and c_b, if they are in one of the above relations, (c_a, c_b) ∈ whole^C ∪ substance^C ∪ attribute^C ∪ mode^C, then we say that c_b plays a role with respect to c_a, and c_b is called a seme of the category. A catset is a set of distinct categories of the same layer, as defined in Definition 4.

Definition 4 (Catset). A catset γ is a set containing categories such that γ = {c_n | ∀i, j : c_i ≠ c_j, l(c_i) = l(c_j)}.

The layer of a catset is given by the layer of any of its members: if some c ∈ γ, then l(γ) = l(c). Note that a category c can be written as c ∈ C^l, while a catset γ can be written as γ ⊆ C^l. All standard set operations, such as union and intersection (∪ and ∩), can be performed on catsets of the same layer.

3.5 Model of Uniform Semantic Locator
A USL is composed of up to seven catsets of different layers, as follows:

Definition 5 (Uniform Semantic Locator, USL). A USL ψ is a set containing catsets of different layers such that ψ = {γ_n | ∀i, j : l(γ_i) ≠ l(γ_j)}.

Note that since there are seven distinct layers, a USL can have at most seven members. All standard set operations, such as union and intersection (∪ and ∩), on USLs are always performed on sets of categories (and therefore on sets of sequences), layer by layer. Since at each layer l there are |C^l| distinct catsets, the whole semantic space is defined by the tuple Ł = C^0 × C^1 × C^2 × C^3 × C^4 × C^5 × C^6. In the IEML notation of USLs, the categories are separated by “/”. Table 1 shows an example of USLs used as *tags for “Wikipedia” and “XML” and a layer-by-layer English translation of those USLs. A *tag holds the place of an IEML expression by suggesting its meaning rather than uttering the IEML expression. The meaning of a *tag has to be understood from the singular place that its corresponding IEML expression occupies in the network of IEML semantic relations [12].
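Under one assumed in-memory encoding, a USL is a dict mapping a layer (0..6) to a catset, the catset a frozenset of categories, and each category a frozenset of sequences. The layer-by-layer union and intersection described above then look like this; the encoding and all names are hypothetical.

```python
# Layer-by-layer set operations on USLs (assumed dict-based encoding).
def usl_union(psi_a, psi_b):
    """Union of two USLs, taken separately at every layer."""
    return {l: psi_a.get(l, frozenset()) | psi_b.get(l, frozenset())
            for l in set(psi_a) | set(psi_b)}

def usl_intersection(psi_a, psi_b):
    """Intersection of two USLs, dropping layers that become empty."""
    shared = {l: psi_a[l] & psi_b[l] for l in set(psi_a) & set(psi_b)}
    return {l: cs for l, cs in shared.items() if cs}

c1, c2, c3 = frozenset({"S"}), frozenset({"B"}), frozenset({"SBT"})
psi_a = {0: frozenset({c1, c2}), 1: frozenset({c3})}
psi_b = {0: frozenset({c2})}
print(usl_union(psi_a, psi_b))         # both layers survive
print(usl_intersection(psi_a, psi_b))  # only c2 at layer 0 is shared
```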
  • 12. Table 1 An example of USLs in semantic space [12].

Tag: *Wikipedia
USL:
  L0: (U: + S:)
  L1: (d.) / (t.)
  L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-)
  L3: (a.u.-we.h.-’) / (n.o.-y.y.-s.y.-’)
  L4: (n.o.-y.y.-s.y.-’ s.o.-k.o.-S:.-’ b.i.-b.i.-T:.l.-’,) / (u.e.-we.h.-’ m.a.-n.a.-f.o.-’ U:.-E:.-T:.-’ ,)
Semantics in English:
  L0: knowledge networks
  L1: truth / memory
  L2: get one’s bearings in knowledge / act for the sake of society / synthesize / organized knowledge / collective creation
  L3: opening public space / encyclopedia
  L4: collective intelligence encyclopedia in cyberspace / volunteers producing didactic material on any subject

Tag: *XML
USL:
  L0: (A: + S:)
  L1: (b.)
  L2: (we.g.-) / (we.b.-)
  L3: (e.o.-we.h.-’) / (b.i.-b.i.-’) / (t.e.-d.u.-’)
  L5: (i.i.-we.h.-’ E:.-’ s.a.-t.a.-’, E:.-’, b.o.-j.j.-A:.g.-’,_)
Semantics in English:
  L0: document networks
  L1: language
  L2: unify the documentation / cultivate information system
  L3: establishing norms and standards / cyberspace / meeting information needs
  L5: guarantee compatibility of data through same formal structure

4 Social Recommender Systems on IEML Semantic Space
Fig. 1 System overview of social recommender systems based on IEML semantic tagging: a user-based semantic collaborative filtering and an item-based semantic collaborative filtering.
In this section, we present two social recommendation methods that are incorporated with USLs. We adapt two well-known collaborative filtering approaches in our study: a user-based approach and an item-based approach. Fig. 1 illustrates the overall process of each method with two phases: a neighborhood formation phase and an item
  • 13. recommendation phase. In the neighborhood formation phase, we first compute similarities between users (for the user-based method) and between items (for the item-based method) by utilizing USLs. Thereafter, we generate a set of semantically like-minded users for each user and a set of semantically similar items for each item. Based on the neighborhood, in the recommendation phase, we predict a social ranking of items to decide which items to recommend.

4.1 Semantic Models in Bipartite Space
Fig. 2 Bipartite space of social tagging space and USL semantic space.
The social tagging in our study is free-for-all, allowing any user to annotate any item with any tag [6]. Therefore, if there is a list of r users Ũ = {u1, u2, …, ur}, a list of m tags Ť = {t1, t2, …, tm}, and a list of n items Ĩ = {i1, i2, …, in}, the social tagging, or folksonomy, F is a tuple F = ⟨Ũ, Ť, Ĩ, Y⟩, where Y ⊆ Ũ × Ť × Ĩ is a ternary relationship called tag assignments [9]. More conceptually, the triplet of the tagging space can be represented as a three-dimensional data cube. Beyond the tagging space, in our study there is another space where the tags are connected to USLs according to their semantics. We call this space the IEML semantic space, as illustrated in Fig. 2. Note that a USL ψ is composed of catsets (Definition 5), where each catset γ consists of semantic categories c (Definition 4). Therefore, an extended formal definition of the folksonomy, called a semantic folksonomy, is defined as follows:
Definition 6 (Semantic Folksonomy) Let Ł = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4 ∪ C5 ∪ C6 be the whole semantic space. A semantic folksonomy is a tuple SF = ⟨Ũ, Ť, Ĩ, Y, Ň⟩ where Ň is a ternary relationship such that Ň ⊆ Ũ × Ť × Ł.

From semantic folksonomies, we present a formal description of two models, a semantic user model and a semantic item model, which are used in our social recommender system.

4.1.1 Semantic Model for the User

From semantic folksonomies, a semantic user model is defined as follows:

Definition 7 (Semantic User Model) Given a user u ∈ Ũ, a formal description of a user model for user u, Mu, follows: Mu = ⟨Ťu, Ňu⟩, where Ťu = {(t, i) ∈ Ť × Ĩ | (u, t, i) ∈ Y} and Ňu = {(t, σ) ∈ Ť × Ł | (u, t, σ) ∈ Ň}.

For clarity, some definitions of the sets used in this work are introduced. We define, for a given user u, the set of all tags that user u has used, Tu* := {t ∈ Ť | ∃i ∈ Ĩ: (t, i) ∈ Ťu}. Therefore, the set of all USLs of user u can be defined as Nu* := {σ ∈ Ł | ∃t ∈ Tu*: (t, σ) ∈ Ňu}. For a certain item h, we define the set of tags with which user u annotated item h, Tuh := {t ∈ Ť | (t, h) ∈ Ťu}. Accordingly, the set of all USLs of user u for item h can be defined as Nuh := {σ ∈ Ł | ∃t ∈ Tuh: (t, σ) ∈ Ňu}.

As stated previously, all standard set operations, such as union and intersection on USLs, can always be performed on sets of categories, layer by layer. Therefore, for a given user model Mu for user u, the tags that user u has used to annotate a certain item h can be represented, with respect to the semantic space, as the union of the USLs σ ∈ Nuh:

USL_u^h = ∪_{l=0}^{6} USL_u^h(l), where USL_u^h(l) = { c_n | ℓ(c_n) = l, c_n ∈ σ, σ ∈ N_u^h }    (9)

Likewise, all tags that user u has used are represented as the union of the USLs σ ∈ Nu*:

USL_u^* = ∪_{i=1}^{n} USL_u^i = ∪_{l=0}^{6} USL_u^*(l), where USL_u^*(l) = { c_n | ℓ(c_n) = l, c_n ∈ σ, σ ∈ N_u^* }    (10)
Fig. 3 Conceptual user models with respect to IEML semantic space

4.1.2 Semantic Model for the Item

From semantic folksonomies, a semantic item model is defined as follows:

Definition 8 (Semantic Item Model) Given an item i ∈ Ĩ, a formal description of a semantic item model for item i, M(i), follows: M(i) = ⟨Ť(i), Ň(i)⟩, where Ť(i) = {(u, t) ∈ Ũ × Ť | (u, t, i) ∈ Y} and Ň(i) = {(t, σ) ∈ Ť × Ł | (u, t, σ) ∈ Ň}.

For clarity, we introduce some definitions of the sets from an item perspective. We define, for a given item i, the set of all tags with which users have annotated item i, T*i := {t ∈ Ť | ∃u ∈ Ũ: (u, t) ∈ Ť(i)}. Therefore, the set of all USLs of item i can be defined as N*i := {σ ∈ Ł | ∃t ∈ T*i: (t, σ) ∈ Ň(i)}. For a certain user v, we define the set of tags with which user v annotated item i, Tvi := {t ∈ Ť | (v, t) ∈ Ť(i)}. Accordingly, the set of all USLs of user v for item i can be defined as Nvi := {σ ∈ Ł | ∃t ∈ Tvi: (t, σ) ∈ Ň(i)}.

As stated previously, all standard set operations, such as union and intersection on USLs, can always be performed on sets of categories, layer by layer. Therefore, for a given item model M(i) for item i, the tags that a certain user v has used to annotate item i can be represented, with respect to the semantic space, as the union of the USLs σ ∈ Nvi:

USL_v^i = ∪_{l=0}^{6} USL_v^i(l), where USL_v^i(l) = { c_n | ℓ(c_n) = l, c_n ∈ σ, σ ∈ N_v^i }    (11)
Likewise, all tags with which item i has been annotated are represented as the union of the USLs σ ∈ N*i:

USL_*^i = ∪_{u=1}^{r} USL_u^i = ∪_{l=0}^{6} USL_*^i(l), where USL_*^i(l) = { c_n | ℓ(c_n) = l, c_n ∈ σ, σ ∈ N_*^i }    (12)

Fig. 4 Conceptual item models with respect to IEML semantic space

4.2 User-based Semantic Collaborative Filtering

In this section, we describe a social recommendation method based on the semantic user model (Definition 7). The basic idea of user-based semantic collaborative filtering starts from the assumption that a certain user is likely to prefer items that like-minded users have annotated with tags similar to the tags he/she used. Therefore, we first look into the set of like-minded users who have tagged a target item and then compute how semantically similar they are to the target user, called a user-user semantic similarity. Based on the semantically similar users, the semantic social ranking of the item is computed to decide whether or not to recommend it.

4.2.1 Generating Semantically Nearest Neighbors

One of the most important tasks in CF recommender systems is neighborhood formation, identifying a set of users who have similar taste, often called the k nearest neighbors. Those users can be directly defined as a group of connected friends in social networks, such as followings in Twitter, connections in Twine, people in Delicious, friends in Facebook, and so on. However, most users have insufficient connections with
their friends. In addition, finding like-minded users in current social network services still relies on manually browsing networks of friends or on keyword searching. Thus, this form of establishing neighbors becomes a time-consuming and ineffective process if we take into consideration the huge number of people available in the network [3]. As mentioned in Section 2.1, typical collaborative filtering methods adopt a variety of similarity measurements to determine similar users automatically. However, this also encounters a serious limitation, namely the sparsity problem [1, 10]. It is often the case that there is no intersection between two users, and hence the similarity is not computable at all. Even when the computation of similarity is possible, it may not be very reliable because insufficient information is processed. To deal with this problem, recent studies have determined similarities between users by using user-generated tags that reflect users' characteristics [10, 13, 20, 22, 25]. However, there still remain limitations that should be treated, such as noise, polysemy, and synonymy of tags. To this end, our study identifies the nearest neighbors by using the Uniform Semantic Locators (USLs) of each user for a more valuable and personalized analysis. We define semantically similar users as a group of users whose IEML interest categories are close to those of the target user. Semantic similarity between two users, u and v, can be computed as the sum of layer similarities from layer 0 to layer 6. Formally, the semantic user similarity measure is defined as:

semUSim(u, v) = α · Σ_{l=0}^{6} simULayer_l(u, v)    (11)

where α is a normalizing factor such that the layer weights sum to unity, and simULayer_l(u, v) denotes the layer similarity of the two users at layer l. The layer similarity between two USL sets is defined as the size of the intersection divided by the size of the union of the USL sets.
In other words, it is determined by computing the weighted Jaccard coefficient of the two USL sets. Formally, the layer similarity is given by:

simULayer_l(u, v) = ((l + 1) / 7) · |USL_u^*(l) ∩ USL_v^*(l)| / |USL_u^*(l) ∪ USL_v^*(l)|    (12)

where USL_u^*(l) and USL_v^*(l) refer to the union of USLs for users u and v at layer l, 0 ≤ l ≤ 6, respectively. Here we give more layer weight at higher layers when computing the
semantic user similarity. That is, the intersections of higher layers contribute more than the intersections of lower layers. When α is set to 0.25 for normalization, the similarity value between two users is in the range of 0 to 1. The higher the score, the more similar a user is to the target user. Finally, for a given user u ∈ Ũ, the k users with the highest semantic similarity are identified as the semantically k nearest neighbors such that:

SSN_k(u) = argmax^k_{v ∈ Ũ\{u}} semUSim(u, v)    (13)

To illustrate a simple example of computing the semantic user similarity, consider the following three users, Alice, Bob, and Nami. Alice annotated Media1 with the tags "Community" and "XML", Bob annotated Media2 and Media3 with the tags "OWL" and "Wikipedia", respectively, and Nami annotated Media2 and Media3 with the tags "Web of data" and "Folksonomy", respectively. In addition, consider the USLs of each tag as shown in Fig. 5.

*Community: L0: (A: + B:); L1: (k.) / (m.); L2: (k.o.-) / (p.a.-); L3: (s.o.-a.a.-')
*XML: L0: (A: + S:); L1: (b.); L2: (we.g.-) / (we.b.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-'); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Wikipedia: L0: (U: + S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-' ,)
*OWL: L0: (A: + S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
*Web of data: L0: (U: + S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
*Folksonomy: L0: (A: + B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')

Fig. 5 An example of tag assignments and USLs for computing the semantic user similarity

Now we compute the semantic similarity between Alice and Bob. Table 2 shows the union of the USLs of the "Community" and "XML" tags (Alice) and of the "Wikipedia" and "OWL" tags (Bob), respectively.
Table 2 The union of USLs for Alice and Bob.

USL_Alice^* (the union of Alice's USLs):
L0: (A:+B:) / (A:+S:)
L1: (k.) / (m.) / (b.)
L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-)
L3: (s.o.-a.a.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-')
L4: ∅
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

USL_Bob^* (the union of Bob's USLs):
L0: (U:+S:) / (A:+S:)
L1: (d.) / (t.) / (b.)
L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-) / (we.g.-) / (we.h.-)
L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-')
L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',) / (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',)
L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)

From these two USL sets, we can compute the USL layer similarity, layer by layer. For example, at layer 0, the size of the intersection of the two USL sets is 1 (i.e., (A:+S:), which means document networks), and the size of the union is 3 (i.e., (A:+B:), (A:+S:), and (U:+S:)). The layer weight of layer 0 is approximately 0.143 (i.e., 1/7). Consequently, the layer similarity at layer 0 is calculated as simULayer0(Alice, Bob) = 1/7 × 1/3 = 0.048. At layer 1, the size of the intersection of the two USL sets is 1 whereas the size of the union is 5, and thus simULayer1(Alice, Bob) = 2/7 × 1/5 = 0.057. In a similar fashion, it is possible to compute that simULayer2(Alice, Bob) = 0.043, simULayer3(Alice, Bob) = 0.163, simULayer4(Alice, Bob) = 0, simULayer5(Alice, Bob) = 0.857, and simULayer6(Alice, Bob) = 0. Finally, from layer 0 to layer 6, the semantic similarity between Alice and Bob, semUSim(Alice, Bob), is computed as the sum of the layer similarities: semUSim(Alice, Bob) = α × (0.048 + 0.057 + 0.043 + 0.163 + 0.857) = 0.25 × 1.168 = 0.292.
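The worked computation above can be reproduced with a short script. This is an illustrative sketch, not the authors' implementation; the category labels are abbreviated transcriptions of the USLs in Fig. 5 (the long layer-4 and layer-5 strings are shortened to their first word), and `sem_u_sim` implements the layer-weighted Jaccard coefficient of the similarity measure with α = 0.25.

```python
ALPHA = 0.25  # normalizing factor so that the similarity stays in [0, 1]

def sem_u_sim(usl_u, usl_v):
    """Sum over layers 0..6 of ((l+1)/7) * Jaccard of the two layer sets."""
    total = 0.0
    for l in range(7):
        a, b = usl_u.get(l, set()), usl_v.get(l, set())
        if a | b:  # a layer empty on both sides contributes nothing
            total += ((l + 1) / 7) * len(a & b) / len(a | b)
    return ALPHA * total

# Per-layer unions of each user's USL categories (layer -> set of categories).
alice = {0: {"A:+B:", "A:+S:"}, 1: {"k.", "m.", "b."},
         2: {"k.o.-", "p.a.-", "we.g.-", "we.b.-"},
         3: {"s.o.-a.a.-'", "e.o.-we.h.-'", "b.i.-b.i.-'", "t.e.-d.u.-'"},
         5: {"i.i.-we.h.-'"}}
bob = {0: {"U:+S:", "A:+S:"}, 1: {"d.", "t.", "b."},
       2: {"wo.y.-", "wa.k.-", "e.y.-", "s.y.-", "k.h.-", "we.g.-", "we.h.-"},
       3: {"a.u.-we.h.-'", "n.o.-y.y.-s.y.-'", "e.o.-we.h.-'",
           "b.i.-b.i.-'", "l.o.-u.u.-'"},
       4: {"n.o.-y.y.-s.y.-'", "u.e.-we.h.-'", "t.e.-t.u.-wa.e.-'"},
       5: {"i.i.-we.h.-'"}}
nami = {0: {"U:+S:", "A:+B:"}, 1: {"k."},
        2: {"s.x.-", "x.j.-", "s.y.-", "k.h.-", "we.b.-"},
        3: {"e.o.-we.h.-'", "b.i.-b.i.-'", "s.o.-a.a.-'", "b.o.-o.o.-'",
            "a.u.-we.h.-'", "k.o.-a.a.-'"},
        4: {"t.e.-t.u.-wa.e.-'", "s.a.-t.a.-'"}}

print(round(sem_u_sim(alice, bob), 3))   # 0.292, as in the worked example
print(round(sem_u_sim(alice, nami), 3))  # 0.11
print(round(sem_u_sim(bob, nami), 3))    # 0.132
```

The three printed values match the similarities reported in the text, which also confirms the layer assignments transcribed from Fig. 5.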
In the same way, the semantic similarity between Alice and Nami and between Bob and Nami is calculated as semUSim(Alice, Nami) = 0.11 and semUSim(Bob, Nami) = 0.132, respectively. This means that Alice is semantically more similar to Bob than to Nami.

4.2.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically nearest neighbors, the final step is prediction, that is, attempting to estimate how much a certain user would prefer unseen items. In our study, the basic idea of uncovering relevant items starts from assuming
that a target user is likely to prefer items that have been tagged by semantically similar users. Items tagged by users similar to the target user should be ranked higher. We label this prediction strategy social ranking based on the semantic user model. Formally, the semantic social ranking score of the target user u for the target item h, denoted as SUR(u, h), is obtained as follows:

SUR(u, h) = Σ_{v ∈ SSN_k(u)} ( |USL_u^* ∩ USL_v^h| / |USL_v^h| ) · semUSim(u, v)    (14)

where SSN_k(u) is the set of k nearest neighbors of user u grouped by the semantic user similarity and USL_v^h is the union of the USLs connected to the tags that user v has assigned to item h. semUSim(u, v) denotes the semantic similarity between user u and user v. Once the ranking scores of the target user for the items he/she has not previously tagged are computed, the items are sorted in descending order of the score SUR(u, h). Two strategies can then be used to select the items relevant to user u. First, if a ranking score is greater than a reasonable threshold η, i.e., SUR(u, h) > η, the item is recommended to user u. Second, the set of top-N ranked items with the highest scores is identified for user u, and those items are recommended.

USL_Bob^M2 (Bob's USLs for Media2): L0: (A:+S:); L1: (b.); L2: (we.g.-) / (we.h.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-') / (l.o.-u.u.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
USL_Bob^M3 (Bob's USLs for Media3): L0: (U:+S:); L1: (d.) / (t.); L2: (wo.y.-) / (wa.k.-) / (e.y.-) / (s.y.-) / (k.h.-); L3: (a.u.-we.h.-') / (n.o.-y.y.-s.y.-'); L4: (n.o.-y.y.-s.y.-' s.o.-k.o.-S:.-' b.i.-b.i.-T:.l.-',) / (u.e.-we.h.-' m.a.-n.a.-f.o.-' U:.-E:.-T:.-',)
USL_Alice^* (the union of Alice's USLs): L0: (A:+B:) / (A:+S:); L1: (k.) / (m.) / (b.); L2: (k.o.-) / (p.a.-) / (we.g.-) / (we.b.-); L3: (s.o.-a.a.-') / (e.o.-we.h.-') / (b.i.-b.i.-') / (t.e.-d.u.-'); L5: (i.i.-we.h.-' E:.-' s.a.-t.a.-', E:.-', b.o.-j.j.-A:.g.-',_)
USL_Nami^M2 (Nami's USLs for Media2): L0: (U:+S:); L1: (k.); L2: (s.x.-) / (x.j.-); L3: (e.o.-we.h.-') / (b.i.-b.i.-'); L4: (t.e.-t.u.-wa.e.-' k.i.-b.i.-t.u.-' b.e.-E:.-A:.x.-',) / (s.a.-t.a.-' b.i.-l.i.-t.u.-',)
USL_Nami^M3 (Nami's USLs for Media3): L0: (A:+B:); L1: (k.); L2: (s.y.-) / (k.h.-) / (we.b.-); L3: (s.o.-a.a.-') / (b.o.-o.o.-') / (a.u.-we.h.-') / (k.o.-a.a.-')

|USL_Alice^* ∩ USL_Bob^M2| = 6, |USL_Bob^M2| = 9; |USL_Alice^* ∩ USL_Bob^M3| = 0, |USL_Bob^M3| = 12; |USL_Alice^* ∩ USL_Nami^M2| = 3, |USL_Nami^M2| = 8; |USL_Alice^* ∩ USL_Nami^M3| = 4, |USL_Nami^M3| = 9

Fig. 6 An example of computing the semantic social ranking of Alice for Media2 and Media3
Consider the same situation as in the example described in the previous section, and assume that Alice is the target user to whom the system should recommend items and that the neighbors of Alice are Bob and Nami. We want to answer the question: which items should the system recommend to Alice? To provide the answer, we compute the semantic social ranking scores for the items that Alice has not previously tagged (e.g., Media2 and Media3). First, let us calculate the social ranking of Alice for Media2. To this end, we first count the intersection categories between Alice and each of her neighbors for Media2. In the case of Bob, who annotated Media2 with the tag "OWL", the number of intersection categories is 6, i.e., |USL_Alice^* ∩ USL_Bob^M2| = 6, and the size of the union of Bob's USLs for Media2 is 9, i.e., |USL_Bob^M2| = 9, as can be seen in Fig. 6. In the case of Nami, who annotated Media2 with the tag "Web of data", the number of intersection categories is 3, i.e., |USL_Alice^* ∩ USL_Nami^M2| = 3, and the size of the union of Nami's USLs for Media2 is 8, i.e., |USL_Nami^M2| = 8. Finally, the semantic ranking score of Alice for Media2 is computed as the weighted sum of the numbers of intersection categories, using the semantic similarities as the weights:

SUR(Alice, M2) = (6/9) × 0.292 + (3/8) × 0.11 = 0.236

Second, let us calculate the semantic ranking score of Alice for Media3. In this case, there is no intersection between Alice and Bob at any layer, i.e., |USL_Alice^* ∩ USL_Bob^M3| = 0. In the case of Nami, the number of intersection categories is 4, i.e., |USL_Alice^* ∩ USL_Nami^M3| = 4. Consequently, the ranking score for Media3 becomes SUR(Alice, M3) = (4/9) × 0.11 = 0.049. Considering the two social media items, Media2 and Media3, the system predicts that Media2 is more likely to fit Alice's needs than Media3.
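The ranking computation above can be sketched in a few lines. This is an illustrative script, not the authors' implementation; the USL sets are abbreviated transcriptions of Fig. 6, flattened across layers with an "l:" prefix so that the intersection and union sizes match the counts given in the figure.

```python
def sur(usl_user, neighbours):
    """Eq. (14): neighbours is a list of (USL set of v for item h, semUSim(u, v))."""
    return sum(len(usl_user & usl_v) / len(usl_v) * sim
               for usl_v, sim in neighbours)

alice = {"0:A:+B:", "0:A:+S:", "1:k.", "1:m.", "1:b.",
         "2:k.o.-", "2:p.a.-", "2:we.g.-", "2:we.b.-",
         "3:s.o.-a.a.-'", "3:e.o.-we.h.-'", "3:b.i.-b.i.-'", "3:t.e.-d.u.-'",
         "5:i.i.-we.h.-'"}
bob_m2 = {"0:A:+S:", "1:b.", "2:we.g.-", "2:we.h.-", "3:e.o.-we.h.-'",
          "3:b.i.-b.i.-'", "3:l.o.-u.u.-'", "4:t.e.-t.u.-wa.e.-'",
          "5:i.i.-we.h.-'"}                        # tag "OWL", 9 categories
bob_m3 = {"0:U:+S:", "1:d.", "1:t.", "2:wo.y.-", "2:wa.k.-", "2:e.y.-",
          "2:s.y.-", "2:k.h.-", "3:a.u.-we.h.-'", "3:n.o.-y.y.-s.y.-'",
          "4:n.o.-y.y.-s.y.-'", "4:u.e.-we.h.-'"}  # tag "Wikipedia", 12 categories
nami_m2 = {"0:U:+S:", "1:k.", "2:s.x.-", "2:x.j.-", "3:e.o.-we.h.-'",
           "3:b.i.-b.i.-'", "4:t.e.-t.u.-wa.e.-'", "4:s.a.-t.a.-'"}  # "Web of data", 8
nami_m3 = {"0:A:+B:", "1:k.", "2:s.y.-", "2:k.h.-", "2:we.b.-",
           "3:s.o.-a.a.-'", "3:b.o.-o.o.-'", "3:a.u.-we.h.-'",
           "3:k.o.-a.a.-'"}                        # "Folksonomy", 9 categories

print(round(sur(alice, [(bob_m2, 0.292), (nami_m2, 0.11)]), 3))  # 0.236 for Media2
print(round(sur(alice, [(bob_m3, 0.292), (nami_m3, 0.11)]), 3))  # 0.049 for Media3
```

Both printed scores agree with the worked example, so the system would rank Media2 above Media3 for Alice.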
4.3 Item-based Semantic Collaborative Filtering

In this section, we describe the social recommendation method from an item perspective, using the semantic item model (Definition 8). With respect to item-based collaborative filtering, we first look into the set of similar items that the target user has tagged and then compute how semantically similar they are to the target item, called a semantic item-item similarity. Based on the semantically similar items, we recommend
relevant items to the target user by capturing how he/she annotated the similar items.

4.3.1 Generating Semantically Similar Items

We define semantically similar items as a group of items whose tagged IEML categories are close to those of the target item. Semantic similarity between two items, i and j, can be computed as the weighted sum of layer similarities from layer 0 to layer 6. Formally, the semantic item similarity measure is defined as:

semISim(i, j) = α · Σ_{l=0}^{6} simILayer_l(i, j)    (15)

where α is a normalizing factor such that the layer weights sum to unity, and simILayer_l(i, j) denotes the layer similarity of the two items at layer l. The layer similarity between two USL sets is defined as the weighted Jaccard coefficient of the two sets. Formally, the layer similarity is given by:

simILayer_l(i, j) = ((l + 1) / 7) · |USL_*^i(l) ∩ USL_*^j(l)| / |USL_*^i(l) ∪ USL_*^j(l)|    (16)

where USL_*^i(l) and USL_*^j(l) refer to the union of USLs for items i and j at layer l, 0 ≤ l ≤ 6, respectively. Here we give more layer weight at higher layers when computing the semantic item similarity. That is, the intersections of higher layers contribute more than the intersections of lower layers. When α is set to 0.25 for normalization, the similarity value between two items is in the range of 0 to 1. The higher the score, the more similar an item is to the target item. Finally, for a given item i ∈ Ĩ, the k items with the highest semantic similarity are identified as the semantically k most similar items such that:

SSI_k(i) = argmax^k_{j ∈ Ĩ\{i}} semISim(i, j)    (17)
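Selecting the k most similar items amounts to a partial arg-max over the item similarities. The sketch below is illustrative only: `sem_i_sim` is the same layer-weighted Jaccard as on the user side, and the item models `i1`-`i3` use hypothetical category labels that do not come from the report.

```python
import heapq

ALPHA = 0.25  # normalizing factor, as in the user-based method

def sem_i_sim(usl_i, usl_j):
    """Layer-weighted Jaccard of the two item USL models (layer -> set)."""
    total = 0.0
    for l in range(7):
        a, b = usl_i.get(l, set()), usl_j.get(l, set())
        if a | b:
            total += ((l + 1) / 7) * len(a & b) / len(a | b)
    return ALPHA * total

def ssi_k(target, usls, k):
    """The k items most semantically similar to `target` (excluding itself)."""
    others = [j for j in usls if j != target]
    return heapq.nlargest(k, others,
                          key=lambda j: sem_i_sim(usls[target], usls[j]))

# Toy item models with hypothetical categories, for illustration only.
usls = {"i1": {0: {"x"}, 3: {"p", "q"}},
        "i2": {0: {"x"}, 3: {"p"}},
        "i3": {0: {"y"}, 3: {"r"}}}
print(ssi_k("i1", usls, 1))  # ['i2']: i2 shares categories with i1, i3 shares none
```

Using `heapq.nlargest` avoids sorting all candidate items when only the top k are needed.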
4.3.2 Item Recommendation via Semantic Social Ranking

Once we have identified a group of semantically similar items, the final step is prediction, that is, attempting to estimate how much a certain user would prefer unseen items. In our study, the basic idea of discovering relevant items starts from assuming that a target user is likely to prefer items that are semantically similar to the items he/she has tagged before. We call this prediction strategy social ranking based on the semantic item model. Formally, the prediction value of the target user u for the target item i, denoted as SIR(u, i), is obtained as follows:

SIR(u, i) = Σ_{j ∈ SSI_k(i)} ( |USL_*^i ∩ USL_u^j| / |USL_u^j| ) · semISim(i, j)    (17)

where SSI_k(i) is the set of k most similar items of item i grouped by the semantic item similarity and USL_u^j is the union of the USLs connected to the tags with which user u has annotated item j. semISim(i, j) denotes the semantic similarity between item i and item j. Once the ranking scores of the target user for the items he/she has not previously tagged are computed, the items are sorted in descending order of the value SIR(u, i). Finally, the set of top-N ranked items with the highest scores is identified for user u, and those items are recommended.

5 Experimental Evaluation

In this section, we present experimental evaluations of the proposed approach and compare its performance against that of the benchmark algorithms.

5.1 Evaluation Design and Metrics

The experimental data comes from BibSonomy (http://bibsonomy.org), a collaborative tagging application that allows users to organize and share scholarly references. The dataset used in this study is the p-core at level 5 from BibSonomy [26]. The p-core at level 5 means that each user, tag, and item has/occurs in at least 5 posts [9]. The original dataset
contains several useless tags and system tags, such as "r", "!", and "system:imported", and thus we cleaned those tags for the experiments. Table 3 briefly describes our dataset.

Table 3 Characteristics of the BibSonomy dataset (p-core at level 5)
|Ũ| = 116   |Ĩ| = 361   |Ť| = 400   |Ł| = 325   |Y| = 9,996   |Ň| = 3,783   # of posts = 2,494

To evaluate the performance of the recommendations, we randomly divided the dataset into a training set and a test set. For each user u, we randomly selected 20% of the items he had previously posted and used them as the test set; the remaining 80% of the items were used as the training set. To ensure that our results are not sensitive to a particular training/test partitioning for each user, we used a 5-fold cross-validation scheme [7]. Therefore, the values reported in the experiment section are the averages over all five runs. To measure the performance of the recommendations, we adopted precision and recall, which judge how relevant a set of ranked recommendations is for the user [7]. Precision measures the ratio of the items in a list of recommendations that are also contained in the test set to the number of items recommended, and recall measures the ratio of the items in a list of recommendations that are also contained in the test set to the overall number of items in the test set. Precision and recall for a given user u in the test set are given by:

precision(u) = |Test(u) ∩ TopN(u)| / |TopN(u)|,   recall(u) = |Test(u) ∩ TopN(u)| / |Test(u)|    (15)

where Test(u) is the item list of user u in the test set and TopN(u) is the top-N recommended item list for user u. Finally, the overall precision and recall for all users in the test set are computed by averaging the per-user precision(u) and recall(u). However, precision and recall are often in conflict with each other. Generally, increasing the number of items recommended tends to decrease precision but increase recall [18].
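The per-user metrics can be sketched as follows. This is an illustrative helper, not the authors' evaluation code; the toy numbers in the example are hypothetical, and `f1` is the standard harmonic mean of precision and recall used later in the evaluation.

```python
def precision_recall(test_items, top_n):
    """test_items: held-out items of user u; top_n: ranked recommendation list."""
    hits = len(set(test_items) & set(top_n))
    return hits / len(top_n), hits / len(test_items)

def f1(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical user: 3 held-out items, top-2 list hits one of them.
p, r = precision_recall({"a", "b", "c"}, ["a", "d"])
print(p, round(r, 3))      # 0.5 0.333
print(round(f1(p, r), 3))  # 0.4
```

Averaging these per-user values over all users in the test set, and over the five cross-validation runs, gives the overall figures reported below.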
Therefore, to consider both of them in a single quality judgment of the recommendations, we use the standard F1 metric, which combines precision and recall into a single number:
  • 25. Technical Report, March 2010 2  precision  recall F1  (16) precision  recall In order to compare the performance of our algorithm, a user-based CF algorithm, where the similarity is computed by cosine-based similarity (denoted as UCF) [18], an item-based CF algorithm [5], which employs cosine-based similarity (denoted as ICF), and a most popular tags approach [9], which are based on tags that a user already used (denoted as MPT), were implemented. The top N recommendation via the semantic social ranking of the user-based method (denoted as SUR) and the item-based method (denoted as SIR) were evaluated in comparison with the benchmark algorithms. 5.2 Experiment Results In this section, we present detailed experimental results. The performance evaluation is divided into two dimensions. The influence of the number of neighbors on the performance is first investigated, and then the quality of item recommendations is evaluated in comparison with the benchmark methods. 5.2.1 Experiments with Neighborhood Size As noted in a number of previous studies, the size has significant impact on the recommendation quality of neighborhood-based algorithms [5, 10, 17, 18]. Therefore, we varied the neighborhood size k from 5 to 80. In the case of UCF and SUR, the parameter k denotes the number of the nearest users whereas it is the number of the most similar items for ICF and SIR. Even though MPT is not affected by the neighborhood size at all, we also reported its result for the purpose of comparisons. In this experiment, we set the number of recommended items N to 10 (i.e., top-10) for each user in the test set. Fig. 7 shows a graph of how precisions and recalls of four methods changes as the neighborhood size grows. UCF elevate precision as the neighborhood size increases from 5 to 20, after this value, it decreased slightly. With respect to recall, the curve of the graph for UCF tends to be flat. 
In the case of ICF, precision and recall improved until a neighbor size of 40; beyond this point, further increases of the size give negative 25
influence on performance. With respect to SIR, we observe that precision and recall tend to improve slightly as the k value increases from 5 to 30 and to 40, respectively; after these points, any further increase leads to worse results. For SUR, the results look different. Unlike UCF, ICF, and SIR, it can be observed from the curves of the graphs that SUR is quite affected by the size of the neighborhood. The two charts for SUR reveal that increasing the neighborhood size has detrimental effects on both metrics. In other words, we found that SUR provides better precision and recall when the neighborhood size is relatively small. For example, in terms of both precision and recall, increasing the neighborhood size much beyond 20 yielded worse results than when the size was 5. This result indicates that superfluous users can negatively impact the recommendation quality of our method.

Fig. 7 Precision and recall with respect to increasing the neighborhood size

We continued by examining F1 values in order to compare the methods with each other and to select the best neighborhood size for each method. Fig. 8 depicts the F1 variation of the methods as the neighborhood size increases. SUR outperforms MPT and ICF at all neighborhood sizes, whereas it provides improved performance over UCF as the neighborhood size increases from 5 to 40; beyond this point, F1 is poorer for SUR than for UCF. Examining the best value of each method, the F1 of UCF is 0.107 (k=20), the F1 of ICF is 0.099 (k=40), the F1 of MPT is 0.0825, the F1 of SUR is 0.119 (k=10), and the F1 of SIR is 0.114 (k=30). This result implies that SUR can provide better performance than the other methods even when data is sparse or the available data for users is relatively insufficient. In practice, CF recommender systems make a trade-off between recommendation quality and real-time performance efficiency by pre-selecting a number of neighbors. In
consideration of both quality and computation cost, we selected 20, 40, 10, and 30 as the neighborhood sizes of UCF, ICF, SUR, and SIR, respectively, in the subsequent experiments.

Fig. 8 F1 value with respect to increasing the neighborhood size

5.2.2 Comparisons with Other Methods

To experimentally evaluate the performance of top-N recommendation, we calculated the precision, recall, and F1 obtained by UCF, ICF, MPT, SUR, and SIR while varying the number of recommended items N from 2 to 10 in increments of 2. Since in practice users tend to click on items with higher ranks, we only examined a small number of recommended items. Fig. 9 depicts the precision-recall plot, showing how the precision and recall of the methods change as the value of N increases. The data points on the curves refer to the number of recommended items; that is, the first point of each curve represents the case of N=2 (i.e., top-2) whereas the last point is the case of N=10 (i.e., top-10). As can be observed from the graph, the curves of all methods tend to descend. This phenomenon implies that increasing the number of items recommended tends to decrease precision but increase recall. With respect to precision, in the cases of N=2 and N=4, SIR demonstrates the best performance. However, SUR demonstrates the best performance
as N increases. With respect to recall, SUR outperforms the other methods on all occasions.

Fig. 9 Precision and recall as the number of recommended items N increases

Fig. 10 Comparisons of F1 values as the number of recommended items N increases

Let us now focus on the F1 results. Fig. 10 depicts the results for F1, showing how SUR and SIR outperform the other methods. As shown, both methods show considerably improved performance compared to UCF, ICF, and MPT. For example, SUR achieves 0.1%, 1.6%, and 1.5% improvement in the case of top-2 (N=2), whereas SIR achieves 0.3%, 1.9%, and 1.7% improvement, compared to UCF, ICF, and MPT, respectively. When comparing the results achieved by SUR and SIR, the
recommendation quality of the former is superior to that of the latter as N increases. Averaged over the five cases, SUR obtains 0.6%, 1.8%, 2.6%, and 0.3% improvement compared to UCF, ICF, MPT, and SIR, respectively.

Fig. 11 Comparisons of precision, recall, and F1 for cold start users and active users

We further examined the recommendation performance for users who had few posts, namely cold start users, and for users who had many posts, namely active users, in the training set. For cold start users, a CF-based recommender system is generally unable to make high-quality recommendations, compared to the case of active users, which is pointed out as one of its limitations. We selectively considered two subsets of users: those with fewer than 6 posts (21 users) and those with more than 25 posts (21 users). For the two groups, we calculated precision, recall, and F1 within the top-10 of the ranked result sets obtained by UCF, ICF, MPT, and SUR. Fig. 11 shows these results for the cold start users (left) and active users (right). As we can see from the graphs, the F1 values of the cold start group were considerably low compared to those of the active group. These results were caused by the fact that it was hard to analyze the users' posting propensity because there was not enough information (items or tags) about them. Nevertheless, comparing the results achieved by SUR and the benchmark algorithms on the cold start dataset, the precision, recall, and F1 values of the former were found to be superior to those of the other methods. For example, SUR obtains 11.9%, 14.3%, and 2.4% improvement in recall compared to UCF, ICF, and MPT, respectively. In terms of F1, SUR outperforms UCF, ICF, and MPT by 2.2%, 2.6%, and 0.4%, respectively. Only MPT achieves comparable results. This result indicates that utilizing tagging information can help to alleviate the problem of cold start users and thus to improve the quality
of item recommendations. With respect to the active dataset, it can be observed that SUR provides better performance than ICF and MPT on all occasions. Comparing the F1 obtained by UCF and SUR, the difference appears insignificant. Although the precision of SUR is slightly worse than that of UCF on the active dataset, the proposed method notably provides better quality than the benchmark methods. That is, SUR can provide more suitable items not only to cold start users but also to active users. Comparing the results achieved by MPT on the cold start and active datasets, interesting observations emerge. A simple tag-based approach to recommendation works well enough for cold start users, compared to UCF and ICF. In the other scenario, however, superfluous user tags can instead introduce noise. We conclude from these comparison experiments that our approaches can provide consistently better recommendation quality than the other methods. Furthermore, we believe that the results of the proposed approach will become even more practically significant on large-scale Web 2.0 frameworks.

6 Concluding Remarks

Looking toward the future of the social Web, in this report we have presented semantic models that address the interoperability challenges facing semantic technology. We also proposed two collaborative filtering methods that apply the semantic models and analyzed the potential benefits of IEML for social recommender systems. As noted in our experimental results, our methods can successfully enhance the performance of item recommendations. Moreover, we also observed that our methods can provide items better suited to user interests, even when the number of recommended items is small. The main contributions of this study can be summarized as follows: 1) Our methods can solve traditional stumbling blocks such as polysemy, synonymy, data sparseness, the cold start problem, and semantic interoperability.
2) Our methods can also offer trustworthy items semantically relevant to a user's needs, because capturing the semantics of user-generated tags makes it easier both to identify a user's preferences and to recommend suitable items to that user.
Acknowledgment

This work has been mainly funded since 2009 by the Canada Research Chair in Collective Intelligence at the University of Ottawa.