A Contextual-bandit Algorithm for Mobile Context-Aware Recommender System

            Djallel Bouneffouf, Amel Bouzeghoub & Alda Lopes Gançarski

    Department of Computer Science, Télécom SudParis, UMR CNRS Samovar, 91011 Evry
                                      Cedex, France

{Djallel.Bouneffouf, Amel.Bouzeghoub, Alda.Gancarski}@it-sudparis.eu



        Abstract. Most existing approaches in Mobile Context-Aware Recommender
        Systems focus on recommending relevant items to users taking into account
        contextual information, such as time, location, or social aspects. However, none
        of them has considered the problem of user’s content evolution. We introduce
        in this paper an algorithm that tackles this dynamicity. It is based on dynamic
        exploration/exploitation and can adaptively balance the two aspects by deciding
        which user’s situation is most relevant for exploration or exploitation. Within a
        deliberately designed offline simulation framework we conduct evaluations
        with real online event log data. The experimental results demonstrate that our
        algorithm outperforms surveyed algorithms.

        Keywords: recommender system; machine learning; exploration/exploitation
        dilemma; artificial intelligence.


1       Introduction

Mobile technologies have made a huge collection of information accessible anywhere
and anytime. In particular, most professional mobile users acquire and maintain a
large amount of content in their repository. Moreover, the content of such a repository
changes dynamically and undergoes frequent insertions and deletions. In this sense,
recommender systems must promptly identify the importance of new documents, while
adapting to the fading value of old documents. In such a setting, it is crucial to
identify interesting content for users. This problem has been addressed in recent research in
the Mobile Context-Aware Recommender Systems (MCRS) area [2, 4, 5, 14, 19, 20].
Most of these approaches are based on the user's computational behavior and his/her
surrounding environment. Nevertheless, they do not tackle the dynamicity of the user's
content. The bandit algorithm is a well-known solution that addresses this
problem as a need for balancing the exploration/exploitation (exr/exp) tradeoff. A bandit
algorithm B exploits its past experience to select documents that appear more
frequently. However, these seemingly optimal documents may in fact be suboptimal,
because of the imprecision in B's knowledge. In order to avoid this undesired case, B


adfa, p. 1, 2011.
© Springer-Verlag Berlin Heidelberg 2011
has to explore documents by choosing seemingly suboptimal documents so as to
gather more information about them. Exploration can decrease short-term user
satisfaction since some suboptimal documents may be chosen. However, obtaining
information about the documents' average rewards (i.e., exploration) can refine B's
estimate of the documents' rewards and in turn increase long-term user satisfaction.
Clearly, neither a purely exploring nor a purely exploiting algorithm works well,
and a good tradeoff is needed. One classical solution to the multi-armed bandit
problem is the ε-greedy strategy [12]. With probability 1-ε, this algorithm chooses the
best documents based on current knowledge; with probability ε, it chooses any
other document uniformly at random. The ε parameter essentially controls the
tradeoff between exploitation and exploration. One drawback of this algorithm
is that it is difficult to decide the optimal value of ε in advance. Instead, we
introduce an algorithm named contextual-ε-greedy that achieves this goal by adaptively
balancing the exr/exp tradeoff according to the user's situation. This algorithm
extends the ε-greedy strategy with an update of the exr/exp tradeoff by selecting suitable
user's situations for either exploration or exploitation.
    The rest of the paper is organized as follows. Section 2 gives the key notions used
throughout this paper. Section 3 reviews some related work. Section 4 presents our
MCRS model and describes the algorithms involved in the proposed approach. The
experimental evaluation is presented in Section 5. The last section concludes the
paper and points out possible directions for future work.


2        Key Notions

   In this section, we briefly sketch the key notions that will be of use in this paper.
The user's model: The user's model is structured as a case base, which is composed
of a set of situations with their corresponding user's preferences, denoted
U = {(Si; UPi)}, where Si is a user's situation and UPi its corresponding user's preferences.
The user's preferences: The user's preferences are deduced during the user's navigation
activities, for example from the number of clicks on the visited documents or the time
spent on a document. Let UP be the preferences submitted by a specific user in the
system at a given situation. Each document in UP is represented as a single vector
d=(c1,...,cn), where ci (i=1, .., n) is the value of a component characterizing the
preferences of d. We consider the following components: the total number of clicks on d,
the total time spent reading d and the number of times d was recommended.
Context: A user's context C is a multi-ontology representation where each ontology
corresponds to a context dimension, C=(OLocation, OTime, OSocial). Each dimension
models and manages a context information type. We focus on these three dimensions since
they cover all needed information. These ontologies are described in [1, 16].
Situation: A situation is an instantiation of the user's context. We consider a situation
as a triple S = (OLocation.xi, OTime.xj, OSocial.xk) where xi, xj and xk are ontology concepts
or instances. Suppose the following data are sensed from the user's mobile phone: the
GPS shows the latitude and longitude of a point "48.89, 2.23"; the local time is
"Oct_3_12:10_2012" and the calendar states "meeting with Paul Gerard". The
corresponding situation is: S=("48.89,2.23","Oct_3_12:10_2012","Paul_Gerard"). To build
a more abstract situation, we interpret the user's behavior from this low-level
multimodal sensor data using ontology reasoning. For example, from S, we obtain
the following situation: Meeting=(Restaurant, Work_day, Financial_client).
Among the set of captured situations, some are characterized as High-Level
Critical Situations.
High-Level Critical Situations (HLCS): An HLCS is a class of situations where the
user needs the best information that can be recommended by the system, for instance,
during a professional meeting. In such a situation, the system must exclusively perform
exploitation rather than exploration-oriented learning. In the other case, where the user
is for instance using his/her information system at home or on vacation with friends, the
system can perform some exploration by recommending information regardless of
his/her interests. The HLCS are predefined by the domain expert. In our case, we
conduct the study with professional mobile users, as described in detail in Section 5.
Examples of HLCS are S1 = (restaurant, midday, client) and S2 = (company,
morning, manager).


3      Related Work

In the following, we review recent recommendation techniques that tackle the problem
of making the exr/exp tradeoff dynamic (bandit algorithms). Existing works that consider
the user's situation in recommendation are not covered in this section; refer to [1] for
further information.
Very frequently used in reinforcement learning to study the exr/exp tradeoff, the
multi-armed bandit problem was originally described by Robbins [11]. The ε-greedy strategy is
one of the most widely used strategies to solve the bandit problem and was first described in
[10]. It chooses a random document with frequency ε,
and chooses the document with the highest estimated mean otherwise. The estimation
is based on the rewards observed thus far. ε must be in the interval [0, 1] and its
choice is left to the user. The first variant of the ε-greedy strategy is what [6, 10] refer
to as the ε-beginning strategy. This strategy performs all of its exploration at the
beginning. For a given number I of iterations, documents are randomly pulled during the
first εI iterations; during the remaining (1−ε)I iterations, the document with the highest
estimated mean is pulled. Another variant of the ε-greedy strategy is what [10] calls
ε-decreasing. In this strategy, the document with the highest estimated mean is
always pulled except when a random document is pulled instead with frequency εi,
where εi = ε0/i, ε0 ∈ ]0,1] and i is the index of the current round. Besides
ε-decreasing, four other strategies are presented in [3]. Those strategies are not described here
because the experiments done by [3] seem to show that ε-decreasing is always as
good as the other strategies. Compared to the standard multi-armed bandit problem
with a fixed set of possible actions, in MCRS, old documents may expire and new
documents may frequently emerge. Therefore, it may not be desirable to perform the
exploration all at once at the beginning as in [6] or to monotonically decrease the
effort on exploration as the ε-decreasing strategy in [10] does.
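
The two schedules described above can be illustrated with a short Python sketch. It is only an illustration under our reading of the text: the function names and the example parameters (I = 10, ε = 0.5, ε0 = 1.0) are assumptions and do not come from [6] or [10].

```python
def epsilon_beginning(i, total_iterations, epsilon):
    """Exploration probability at round i (1-based) for epsilon-beginning.

    All exploration happens during the first epsilon * I rounds.
    """
    return 1.0 if i <= epsilon * total_iterations else 0.0


def epsilon_decreasing(i, epsilon0):
    """Exploration frequency epsilon_i = epsilon0 / i for round i >= 1."""
    if i < 1:
        raise ValueError("round index starts at 1")
    return epsilon0 / i


# Hypothetical parameters, chosen only to make the schedules easy to read.
schedule_beginning = [epsilon_beginning(i, 10, 0.5) for i in range(1, 11)]
schedule_decreasing = [epsilon_decreasing(i, 1.0) for i in range(1, 5)]
```

Both schedules depend only on the round index, which is why neither of them can react to documents that appear or expire later in the run.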
As far as we know, no existing works address the problem of the exr/exp tradeoff in
MCRS. However, a few research works are dedicated to studying the contextual bandit
problem in recommender systems, where they consider the user's behavior as the
context of the bandit problem. In [13], the authors extend the ε-greedy strategy by
dynamically updating the ε exploration value. At each iteration, they run a sampling
procedure to select a new ε from a finite set of candidates. The probabilities
associated with the candidates are uniformly initialized and updated with the Exponentiated
Gradient (EG) [7]. This updating rule increases the probability of a candidate ε if it
leads to a user's click. Compared to both ε-beginning and ε-decreasing, this technique
gives better results. In [9], the authors model the recommendation as a contextual bandit
problem. They propose an approach in which a learning algorithm sequentially selects
documents to serve users based on their behavior. To maximize the total number of
user's clicks, this work proposes the LINUCB algorithm, which is computationally efficient.
   As shown above, none of the mentioned works tackles both problems of exr/exp
dynamicity and user's situation consideration in the exr/exp strategy. This is precisely
what we intend to do with our approach. Our intuition is that considering the
criticality of the situation when managing the exr/exp tradeoff improves the results of the
MCRS. This strategy achieves high exploration when the current user's situation is
not critical and achieves high exploitation in the opposite case.


4      MCRS Model

In our recommender system, the recommendation of documents is modeled as a
contextual bandit problem including user's situation information [8]. Formally, a bandit
algorithm proceeds in discrete trials t = 1…T. For each trial t, the algorithm performs
the following tasks:
Task 1: Let St be the current user's situation, and PS the set of past situations.
The system compares St with the situations in PS in order to choose the most
similar one, Sp:

\[ S^p = \arg\max_{S^c \in PS} \mathrm{sim}(S^t, S^c) \tag{1} \]

The semantic similarity metric is computed by:

\[ \mathrm{sim}(S^t, S^c) = \sum_j \alpha_j \cdot \mathrm{sim}_j(x_j^t, x_j^c) \tag{2} \]

In Eq. 2, simj is the similarity metric related to dimension j between two concepts
xjt and xjc; αj is the weight associated with dimension j (during the experimental
phase, αj has a value of 1 for all dimensions). This similarity depends on how
closely xjt and xjc are related in the corresponding ontology. We use the same
similarity measure as [15, 17, 18], defined by:

\[ \mathrm{sim}_j(x_j^t, x_j^c) = 2 \cdot \frac{\mathrm{depth}(LCS)}{\mathrm{depth}(x_j^c) + \mathrm{depth}(x_j^t)} \tag{3} \]

  In Eq. 3, LCS is the Least Common Subsumer of xjt and xjc, and depth is the
number of nodes in the path from the node to the ontology root.
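
As an illustration only, the similarity of Eqs. 2 and 3 might be sketched in Python as follows. The toy ontologies LOCATION and TIME, their concept names, and all function names are hypothetical; the actual ontologies are those described in [1, 16]. Each ontology is assumed to be a tree stored as a child-to-parent mapping.

```python
# Hypothetical toy ontologies: child -> parent (the root maps to None).
LOCATION = {
    "Place": None,
    "Eating_place": "Place",
    "Restaurant": "Eating_place",
    "Cafe": "Eating_place",
    "Company": "Place",
}
TIME = {"Time": None, "Work_day": "Time", "Holiday": "Time"}


def path_to_root(concept, parent):
    """Return the nodes from `concept` up to the root, both included."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = parent[concept]
    return path


def depth(concept, parent):
    """Number of nodes on the path from `concept` to the ontology root."""
    return len(path_to_root(concept, parent))


def concept_similarity(x_t, x_c, parent):
    """Eq. 3: 2 * depth(LCS) / (depth(x_c) + depth(x_t))."""
    ancestors_c = set(path_to_root(x_c, parent))
    # The first shared node met while climbing from x_t is the LCS.
    lcs = next(n for n in path_to_root(x_t, parent) if n in ancestors_c)
    return 2 * depth(lcs, parent) / (depth(x_c, parent) + depth(x_t, parent))


def situation_similarity(s_t, s_c, ontologies, weights=None):
    """Eq. 2: weighted sum of per-dimension similarities (weights default to 1)."""
    weights = weights or [1.0] * len(ontologies)
    return sum(
        w * concept_similarity(a, b, onto)
        for w, a, b, onto in zip(weights, s_t, s_c, ontologies)
    )


sim_example = situation_similarity(
    ("Restaurant", "Work_day"), ("Company", "Work_day"), [LOCATION, TIME]
)
```

With all weights equal to 1, the similarity of two situations over three dimensions lies in the interval [0, 3].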
Task 2: Let D be the document collection and Dp ⊆ D the set of documents
recommended in situation Sp. After retrieving Sp, the system observes the user's
behavior when reading each document dp ∈ Dp. Based on observed rewards, the
algorithm chooses the document dp with the greatest reward rp.
Task 3: After receiving the user's reward, the algorithm improves its
document-selection strategy with the new observation: in situation St, document dp obtains
a reward rt.
   When a document is presented to the user and the user selects it with a click, a
reward of 1 is incurred; otherwise, the reward is 0. The reward of a document is
precisely its Click Through Rate (CTR). The CTR is the average number of
clicks on a document per recommendation.
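
The reward bookkeeping described above could be sketched as follows. This is a minimal Python illustration, not the system's implementation: the class name CtrTracker and its methods are hypothetical, and returning 0.0 for a document that was never shown is our assumption.

```python
from collections import defaultdict


class CtrTracker:
    """Keeps per-document click and display counts and reports the CTR."""

    def __init__(self):
        self.clicks = defaultdict(int)
        self.displays = defaultdict(int)

    def record(self, doc_id, clicked):
        """Register one recommendation; the reward is 1 on a click, else 0."""
        self.displays[doc_id] += 1
        self.clicks[doc_id] += 1 if clicked else 0

    def get_ctr(self, doc_id):
        """Average reward per recommendation (assumed 0.0 if never shown)."""
        shown = self.displays[doc_id]
        return self.clicks[doc_id] / shown if shown else 0.0


tracker = CtrTracker()
for clicked in (True, False, False, False):
    tracker.record("d1", clicked)
```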


4.1    The ε-greedy algorithm
The ε-greedy algorithm recommends a predefined number of documents N selected
using the following equation:

\[ d_i = \begin{cases} \arg\max_{d \in UC} \mathrm{getCTR}(d) & \text{if } q > \varepsilon \\ \mathrm{Random}(UC) & \text{otherwise} \end{cases} \tag{4} \]

   In Eq. 4, i ∈ {1,…,N}, UC = {d1,…,dP} is the set of documents corresponding to the
user's preferences; getCTR() computes the CTR of a given document; Random()
returns a random element from a given set, which allows the system to perform exploration; q is a
random value uniformly distributed over [0, 1] which defines the exr/exp tradeoff; ε is
the probability of recommending a random exploratory document.
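
A minimal Python sketch of this selection rule is given below, assuming that exploitation is chosen when q > ε, which is consistent with ε being the probability of a random exploratory recommendation. The function name, its parameters and the example CTR values are hypothetical. As in Eq. 4 taken literally, the same document may be selected more than once; a deployed system would probably filter duplicates.

```python
import random


def epsilon_greedy(candidates, get_ctr, epsilon, n, rng=random):
    """Select n documents: exploit if q > epsilon, otherwise explore."""
    chosen = []
    for _ in range(n):
        q = rng.random()  # uniform draw in [0, 1)
        if q > epsilon:
            # Exploitation: document with the highest estimated CTR.
            chosen.append(max(candidates, key=get_ctr))
        else:
            # Exploration: any document, uniformly at random.
            chosen.append(rng.choice(candidates))
    return chosen


# Hypothetical CTR estimates for three documents.
ctr = {"d1": 0.10, "d2": 0.40, "d3": 0.25}
docs = list(ctr)
exploit_only = epsilon_greedy(docs, ctr.get, 0.0, 3, rng=random.Random(0))
explore_only = epsilon_greedy(docs, ctr.get, 1.0, 3, rng=random.Random(0))
```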


4.2    The contextual-ε-greedy algorithm
To improve the adaptation of the ε-greedy algorithm to HLCS situations, the
contextual-ε-greedy algorithm compares the current user's situation St with the HLCS
class of situations. Depending on the similarity between St and its most similar
situation Sm ∈ HLCS, and with B the similarity threshold (this metric is discussed below),
two scenarios are possible:
(1) If sim(St, Sm) ≥ B, the current situation is critical; the ε-greedy algorithm is used
with ε=0 (exploitation) and St is inserted in the HLCS class of situations.
(2) If sim(St, Sm) < B, the current situation is not critical; the ε-greedy algorithm is
used with ε>0 (exploration) computed as indicated in Eq.5.


                            
\[ \varepsilon = \begin{cases} 1 - \dfrac{\mathrm{sim}(S^t, S^m)}{B} & \text{if } \mathrm{sim}(S^t, S^m) < B \\ 0 & \text{otherwise} \end{cases} \tag{5} \]

To summarize, the system does not explore when the current user's situation is
critical; otherwise, the system performs exploration. In this case, the degree of
exploration decreases as the similarity between St and Sm increases.
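
The rule for ε could be sketched in Python as follows. The function names are hypothetical, and treating an empty HLCS class as similarity 0 (full exploration) is our assumption, since this case is not specified above.

```python
def contextual_epsilon(similarity_to_hlcs, threshold):
    """Eq. 5: epsilon = 1 - sim/B when sim < B, and 0 otherwise."""
    if threshold <= 0:
        raise ValueError("threshold B must be positive")
    if similarity_to_hlcs < threshold:
        return 1.0 - similarity_to_hlcs / threshold
    return 0.0


def choose_epsilon(similarities_to_hlcs, threshold):
    """Set epsilon from the most similar critical situation Sm.

    `similarities_to_hlcs` holds sim(St, S) for every S in the HLCS class;
    an empty class is treated as similarity 0 (an assumption).
    """
    best = max(similarities_to_hlcs, default=0.0)
    return contextual_epsilon(best, threshold)
```

For example, with B = 2.4, a situation whose closest critical situation has similarity 1.2 would explore with ε = 0.5, while any similarity of 2.4 or more gives ε = 0.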


5      Experimental Evaluation

In order to empirically evaluate the performance of our approach, and in the absence
of a standard evaluation framework, we propose an evaluation framework based on a
set of diary study entries. The main objectives of the experimental evaluation are: (1)
to find the optimal threshold B value described in Section 4.2 and (2) to evaluate the
performance of the proposed algorithm (contextual-ε-greedy). In the following, we
describe our experimental datasets and then present and discuss the obtained results.
   We have conducted a diary study with the collaboration of the French software
company Nomalys¹. This company provides a history application, which records the
time, current location, social and navigation information of its users during their
application use. The diary study lasted 18 months and generated 178 369 diary
situation entries. Each diary situation entry represents the capture of contextual time,
location and social information. For each entry, the captured data are replaced with
more abstract information using time, spatial and social ontologies [1]. From the
diary study, we have obtained a total of 2 759 283 entries concerning the user's
navigation, expressed with an average of 15.47 entries per situation.
   In order to set the similarity threshold value, we use a manual classification as
a baseline and compare it with the results obtained by our technique. We take a
random sample of 10% of the situation entries and manually group similar
situations; then we compare the constructed groups with the results obtained by our
similarity algorithm, with different threshold values.




                  Fig. 1.   Effect of B threshold value on the similarity precision

Fig. 1 shows the effect of varying the situation similarity threshold B in the
interval [0, 3] on the overall precision. The results show that the best performance is
obtained when B has the value 2.4, achieving a precision of 0.849. Consequently, we use
the optimal threshold value B = 2.4 for testing our MCRS.
   To test the proposed contextual-ε-greedy algorithm, we have first collected 3000
situations with an occurrence greater than 100 to be statistically meaningful. Then, we

1
    Nomalys is a company that provides a graphical application on Smartphones allowing users to
     access their company’s data.
have sampled 10000 documents that have been shown in any of these situations. The
testing step consists of evaluating the algorithms for each testing situation using the
average CTR. The average CTR for a particular iteration is the ratio between the total
number of clicks and the total number of displays. Then, we calculate the average
CTR over every 1000 iterations. The number of documents (N) returned by the
recommender system for each situation is 10, and we have run the simulation until the
number of iterations reaches 10000, which is the number of iterations at which all
algorithms have converged. In the first experiment, in addition to a pure exploitation
baseline, we have compared our algorithm to the algorithms described in the related work
(Section 3): ε-greedy, ε-beginning, ε-decreasing and EG. In Fig. 2, the horizontal axis
is the number of iterations and the vertical axis is the performance metric.
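
The performance metric might be computed along the following lines. This Python sketch only illustrates our reading of the description: the function names are hypothetical, and the small window and click counts in the example are invented for readability, not taken from the experiment.

```python
def iteration_ctr(clicks, displays):
    """Average CTR of one iteration: total clicks over total displays."""
    return clicks / displays if displays else 0.0


def windowed_average_ctr(ctr_per_iteration, window=1000):
    """Mean of the per-iteration CTR over consecutive windows of iterations."""
    averages = []
    for start in range(0, len(ctr_per_iteration), window):
        chunk = ctr_per_iteration[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Invented example: 4 iterations, N = 10 displays each, window of 2.
example_ctrs = [iteration_ctr(c, 10) for c in (1, 3, 2, 4)]
example_windows = windowed_average_ctr(example_ctrs, window=2)
```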




                        Fig. 2. Average CTR for exr/exp algorithms

We have parameterized the different algorithms as follows: ε-greedy was tested with
two parameter values, 0.5 and 0.9; ε-decreasing and EG use the same set {εi = 1 - 0.01
* i, i = 1,...,100}; ε-decreasing starts with the highest value and reduces it by 0.01
every 100 iterations, until it reaches the smallest value. Overall, the tested algorithms
perform better than the baseline. However, for the first 2000 iterations, the pure
exploitation baseline achieves a faster increase in average CTR.
In the long run, though, all exr/exp algorithms improve the average CTR at convergence.
We have several observations regarding the different exr/exp algorithms. For the
ε-decreasing algorithm, the converged average CTR increases as ε decreases
(exploitation increases). For ε-greedy(0.9) and ε-greedy(0.5), even after
convergence, the algorithms still give respectively 90% and 50% of the opportunities to
documents having low average CTR, which significantly decreases their results.
While the EG algorithm converges to a higher average CTR, its overall performance
is not as good as ε-decreasing. Its average CTR is low in the early steps because of
more exploration, but it does not converge faster. The contextual-ε-greedy algorithm
effectively learns the optimal ε; it has the best convergence rate, increases the average
CTR by a factor of 2 over the baseline and outperforms all other exr/exp algorithms.
The improvement comes from a dynamic tradeoff between exr/exp, controlled by the
critical situation (HLCS) estimation. At the early stage, this algorithm takes full
advantage of exploration without wasting opportunities to establish good results.


6      Conclusion

In this paper, we study the problem of exploitation and exploration in mobile
context-aware recommender systems and propose a novel approach that adaptively balances
exr/exp according to the user's situation. In order to evaluate the performance of the
proposed algorithm, we compare it with other standard exr/exp strategies. The
experimental results demonstrate that our algorithm performs better on average CTR in
various configurations. In the future, we plan to evaluate the scalability of the
algorithm on board a mobile device and investigate other public benchmarks.

References
 1. Bouneffouf, D., Bouzeghoub, A., Gançarski, A. L.: Following the User's Interests in
    Mobile Context-Aware Recommender Systems. AINA Workshops, pp. 657-662. Fukuoka,
    Japan (2012).
 2. Adomavicius, G., Mobasher, B., Ricci, F., Tuzhilin, A.: Context-Aware Recommender
    Systems. AI Magazine, vol. 32(3), pp. 67-80 (2011).
 3. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite Time Analysis of the Multiarmed
    Bandit Problem. Machine Learning, vol. 2, pp. 235–256. Kluwer Academic Publishers,
    Hingham (2002).
 4. Baltrunas, L., Ludwig, B., Peer, S., Ricci, F.: Context relevance assessment and
    exploitation in mobile recommender systems. Personal and Ubiquitous Computing,
    pp. 1-20. Springer, London (2011).
 5. Bellotti, V., Begole, B., Chi, E. H., Ducheneaut, N., Fang, J., Isaacs, E.:
    Activity-Based Serendipitous Recommendations with the Magitti Mobile Leisure Guide.
    Proceedings on the Move, pp. 1157-1166. ACM, New York (2008).
 6. Even-Dar, E., Mannor, S., Mansour, Y.: PAC Bounds for Multi-Armed Bandit
    and Markov Decision Processes. In Fifteenth Annual Conference on Computational
    Learning Theory, pp. 255–270 (2002).
 7. Kivinen, J., Warmuth, M. K.: Exponentiated gradient versus gradient
    descent for linear predictors. Information and Computation, vol. 132 (1995).
 8. Langford, J., Zhang, T.: The epoch-greedy algorithm for contextual multi-armed
    bandits. In Advances in Neural Information Processing Systems (2008).
 9. Li, L., Chu, W., Langford, J., Schapire, R. E.: A Contextual-Bandit Approach to
    Personalized News Article Recommendation. Proceedings of the 19th International
    Conference on World Wide Web, pp. 661-670. ACM, New York (2010).
10. Mannor, S., Tsitsiklis, J. N.: The Sample Complexity of Exploration in the
    Multi-Armed Bandit Problem. Computational Learning Theory, pp. 255-270 (2003).
11. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the
    American Mathematical Society, vol. 58, pp. 527-535 (1952).
12. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, Cambridge University
    (1989).
13. Li, W., Wang, X., Zhang, R., Cui, Y., Mao, J., Jin, R.: Exploitation and Exploration
    in a Performance based Contextual Advertising System. Proceedings of the
    International Conference on Knowledge Discovery and Data Mining, pp. 27-36. ACM, New
    York (2010).
14. Sohn, T., Li, K. A., Griswold, W. G., Hollan, J. D.: A Diary Study of Mobile Information
    Needs. Proceedings of the twenty-sixth annual SIGCHI Conference on Human Factors
    in Computing, pp. 433-442. ACM, Florence (2008).
15. Wu, Z., Palmer, M.: Verb Semantics and Lexical Selection. In Proceedings of the
    32nd Meeting of the Association for Computational Linguistics, pp. 133-138 (1994).
16. Bouneffouf, D., Bouzeghoub, A., Gançarski, A. L.: Hybrid-ε-greedy for Mobile
    Context-Aware Recommender System. PAKDD (1), pp. 468-479 (2012).
17. Bouneffouf, D., Bouzeghoub, A., Gançarski, A. L.: Exploration / Exploitation Trade-Off
    in Mobile Context-Aware Recommender Systems. Australasian Conference on Artificial
    Intelligence, pp. 591-601 (2012).
18. Bouneffouf, D., Bouzeghoub, A., Gançarski, A. L.: Considering the High Level Critical
    Situations in Context-Aware Recommender Systems. IMMoA, pp. 26-32 (2012).
19. Bouneffouf, D.: L'apprentissage automatique, une étape importante dans l'adaptation des
    systèmes d'information à l'utilisateur. INFORSID, pp. 427-428 (2011).
20. Bouneffouf, D.: Applying machine learning techniques to improve user acceptance on
    ubiquitous environment. CAISE Doctoral Consortium (2011).

Temporal Reasoning Graph for Activity RecognitionIRJET Journal
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Miningbutest
 
Inspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsInspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsIRJET Journal
 
Army Study: Ontology-based Adaptive Systems of Cyber Defense
Army Study: Ontology-based Adaptive Systems of Cyber DefenseArmy Study: Ontology-based Adaptive Systems of Cyber Defense
Army Study: Ontology-based Adaptive Systems of Cyber DefenseRDECOM
 
A comprehensive review of the firefly algorithms
A comprehensive review of the firefly algorithmsA comprehensive review of the firefly algorithms
A comprehensive review of the firefly algorithmsXin-She Yang
 
factorization methods
factorization methodsfactorization methods
factorization methodsShaina Raza
 
C H I 9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docx
C H I  9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docxC H I  9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docx
C H I 9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docxjasoninnes20
 
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...ijdpsjournal
 
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...WTHS
 
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsA Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsLoc Nguyen
 
smartwatch-user-identification
smartwatch-user-identificationsmartwatch-user-identification
smartwatch-user-identificationSebastian W. Cheah
 
The social dynamics of software development
The social dynamics of software developmentThe social dynamics of software development
The social dynamics of software developmentaliaalistartup
 
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESSA HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESSIJNSA Journal
 

Semelhante a A contextual bandit algorithm for mobile context-aware recommender system (20)

A Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.docA Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.doc
 
A Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.docA Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.doc
 
A Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.docA Research Platform for Coevolving Agents.doc
A Research Platform for Coevolving Agents.doc
 
MODELING THE ADAPTION RULE IN CONTEXTAWARE SYSTEMS
MODELING THE ADAPTION RULE IN CONTEXTAWARE SYSTEMSMODELING THE ADAPTION RULE IN CONTEXTAWARE SYSTEMS
MODELING THE ADAPTION RULE IN CONTEXTAWARE SYSTEMS
 
Modeling the Adaption Rule in Contextaware Systems
Modeling the Adaption Rule in Contextaware SystemsModeling the Adaption Rule in Contextaware Systems
Modeling the Adaption Rule in Contextaware Systems
 
Temporal Reasoning Graph for Activity Recognition
Temporal Reasoning Graph for Activity RecognitionTemporal Reasoning Graph for Activity Recognition
Temporal Reasoning Graph for Activity Recognition
 
On Machine Learning and Data Mining
On Machine Learning and Data MiningOn Machine Learning and Data Mining
On Machine Learning and Data Mining
 
Inspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal ApplicationsInspection of Certain RNN-ELM Algorithms for Societal Applications
Inspection of Certain RNN-ELM Algorithms for Societal Applications
 
Army Study: Ontology-based Adaptive Systems of Cyber Defense
Army Study: Ontology-based Adaptive Systems of Cyber DefenseArmy Study: Ontology-based Adaptive Systems of Cyber Defense
Army Study: Ontology-based Adaptive Systems of Cyber Defense
 
A comprehensive review of the firefly algorithms
A comprehensive review of the firefly algorithmsA comprehensive review of the firefly algorithms
A comprehensive review of the firefly algorithms
 
Poster
PosterPoster
Poster
 
factorization methods
factorization methodsfactorization methods
factorization methods
 
C H I 9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docx
C H I  9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docxC H I  9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docx
C H I 9 5 M O S A I C OF CREATIVITY - May 7 1 1 1995 P a p e.docx
 
Smart-X: an Adaptive Multi-Agent Platform for Smart-Topics
Smart-X: an Adaptive Multi-Agent Platform for Smart-TopicsSmart-X: an Adaptive Multi-Agent Platform for Smart-Topics
Smart-X: an Adaptive Multi-Agent Platform for Smart-Topics
 
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...
A LIGHT-WEIGHT DISTRIBUTED SYSTEM FOR THE PROCESSING OF REPLICATED COUNTER-LI...
 
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...
Paper Gloria Cea - Goal-Oriented Design Methodology Applied to User Interface...
 
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent ItemsetsA Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
A Novel Collaborative Filtering Algorithm by Bit Mining Frequent Itemsets
 
smartwatch-user-identification
smartwatch-user-identificationsmartwatch-user-identification
smartwatch-user-identification
 
The social dynamics of software development
The social dynamics of software developmentThe social dynamics of software development
The social dynamics of software development
 
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESSA HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS
A HUMAN-CENTRIC APPROACH TO GROUP-BASED CONTEXT-AWARENESS
 

A contextual bandit algorithm for mobile context-aware recommender system

  • 1. A Contextual-bandit Algorithm for Mobile Context-Aware Recommender System
Djallel Bouneffouf, Amel Bouzeghoub & Alda Lopes Gançarski
Department of Computer Science, Télécom SudParis, UMR CNRS Samovar, 91011 Evry Cedex, France
{Djallel.Bouneffouf, Amel.Bouzeghoub, Alda.Gancarski}@it-sudparis.eu
Abstract. Most existing approaches in Mobile Context-Aware Recommender Systems focus on recommending relevant items to users taking into account contextual information, such as time, location, or social aspects. However, none of them has considered the problem of the evolution of the user’s content. We introduce in this paper an algorithm that tackles this dynamicity. It is based on dynamic exploration/exploitation and can adaptively balance the two aspects by deciding which user’s situation is most relevant for exploration or exploitation. Within a deliberately designed offline simulation framework we conduct evaluations with real online event log data. The experimental results demonstrate that our algorithm outperforms surveyed algorithms.
Keywords: recommender system; machine learning; exploration/exploitation dilemma; artificial intelligence.
1 Introduction
Mobile technologies have made a huge collection of information accessible anywhere and anytime. In particular, most professional mobile users acquire and maintain a large amount of content in their repository. Moreover, the content of such a repository changes dynamically, undergoing frequent insertions and deletions. In this sense, recommender systems must promptly identify the importance of new documents, while adapting to the fading value of old documents. In such a setting, it is crucial to identify interesting content for users. This problem has been addressed in recent research in the Mobile Context-Aware Recommender Systems (MCRS) area [2, 4, 5, 14, 19, 20]. Most of these approaches are based on the user’s computational behavior and surrounding environment.
Nevertheless, they do not tackle the problem of the dynamicity of the user’s content. Bandit algorithms are a well-known solution that frames this problem as a need to balance the exploration/exploitation (exr/exp) tradeoff. A bandit algorithm B exploits its past experience to select the documents that appear more frequently. However, these seemingly optimal documents may in fact be suboptimal because of the imprecision in B’s knowledge. To avoid this undesired case, B has to explore by choosing seemingly suboptimal documents so as to gather more information about them.
adfa, p. 1, 2011. © Springer-Verlag Berlin Heidelberg 2011
  • 2. Exploration can decrease the user’s short-term satisfaction since some suboptimal documents may be chosen. However, obtaining information about the documents’ average rewards (i.e., exploration) can refine B’s estimate of the documents’ rewards and in turn increase the user’s long-term satisfaction. Clearly, neither a purely exploring nor a purely exploiting algorithm works well, and a good tradeoff is needed. One classical solution to the multi-armed bandit problem is the ε-greedy strategy [12]. With probability 1−ε, this algorithm chooses the best documents based on current knowledge; with probability ε, it chooses uniformly among the other documents. The ε parameter essentially controls the exr/exp tradeoff. One drawback of this algorithm is that it is difficult to decide the optimal value of ε in advance. Instead, we introduce an algorithm named Contextual-ε-greedy that achieves this goal by adaptively balancing the exr/exp tradeoff according to the user’s situation. This algorithm extends the ε-greedy strategy with an update of the exr/exp tradeoff by selecting suitable user’s situations for either exploration or exploitation.
The rest of the paper is organized as follows. Section 2 gives the key notions used throughout this paper. Section 3 reviews some related works. Section 4 presents our MCRS model and describes the algorithms involved in the proposed approach. The experimental evaluation is illustrated in Section 5. The last section concludes the paper and points out possible directions for future work.
2 Key Notions
In this section, we briefly sketch the key notions that will be of use in this paper.
The user’s model: The user’s model is structured as a case base, which is composed of a set of situations with their corresponding user’s preferences, denoted U = {(S^i; UP^i)}, where S^i is the user’s situation and UP^i its corresponding user’s preferences.
The user’s preferences: The user’s preferences are deduced during the user’s navigation activities, for example from the number of clicks on the visited documents or the time spent on a document. Let UP be the preferences submitted by a specific user in the system at a given situation. Each document in UP is represented as a single vector d = (c_1, ..., c_n), where c_i (i = 1, ..., n) is the value of a component characterizing the preferences of d. We consider the following components: the total number of clicks on d, the total time spent reading d and the number of times d was recommended.
Context: A user’s context C is a multi-ontology representation where each ontology corresponds to a context dimension, C = (O_Location, O_Time, O_Social). Each dimension models and manages a context information type. We focus on these three dimensions since they cover all needed information. These ontologies are described in [1, 16].
Situation: A situation is an instantiation of the user’s context. We consider a situation as a triple S = (O_Location.x_i, O_Time.x_j, O_Social.x_k) where x_i, x_j and x_k are ontology concepts or instances. Suppose the following data are sensed from the user’s mobile phone: the GPS shows the latitude and longitude of a point "48.89, 2.23"; the local time is "Oct_3_12:10_2012" and the calendar states "meeting with Paul Gerard". The corresponding situation is: S = ("48.89,2.23", "Oct_3_12:10_2012", "Paul_Gerard"). To build a more abstracted situation, we interpret the user’s behavior from this low-level multimodal sensor data using ontology-based reasoning.
  • 3. For example, from S, we obtain the following situation: Meeting = (Restaurant, Work_day, Financial_client). Among the set of captured situations, some are characterized as High-Level Critical Situations.
High-Level Critical Situations (HLCS): A HLCS is a class of situations where the user needs the best information that can be recommended by the system, for instance, during a professional meeting. In such a situation, the system must exclusively perform exploitation rather than exploration-oriented learning. Otherwise, for instance when the user is using his/her information system at home or on vacation with friends, the system can explore by recommending some information regardless of his/her interests. The HLCS are predefined by the domain expert. In our case we conduct the study with professional mobile users, which is described in detail in Section 5. Examples of HLCS are S1 = (restaurant, midday, client) and S2 = (company, morning, manager).
3 Related Work
In the following, we review recent recommendation techniques that tackle the problem of making the exr/exp tradeoff dynamic (bandit algorithms). Existing works considering the user’s situation in recommendation are not covered in this section; see [1] for further information.
Very frequently used in reinforcement learning to study the exr/exp tradeoff, the multi-armed bandit problem was originally described by Robbins [11]. The ε-greedy strategy is one of the most widely used strategies to solve the bandit problem and was first described in [10]. It chooses a random document with frequency ε, and chooses the document with the highest estimated mean otherwise. The estimation is based on the rewards observed thus far. ε must be in the interval [0, 1] and its choice is left to the user.
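As an illustration of the ε-greedy rule just described, here is a minimal Python sketch. The function names `epsilon_greedy` and `update_mean`, and the choice of a running mean of 0/1 rewards as the estimate, are assumptions made for this example and are not taken from the cited works.

```python
import random


def epsilon_greedy(estimated_means, epsilon, rng=random):
    """Return the index of the document to recommend.

    With probability epsilon a document is drawn uniformly at random
    (exploration); otherwise the document with the highest estimated
    mean reward is returned (exploitation).
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_means))
    return max(range(len(estimated_means)), key=estimated_means.__getitem__)


def update_mean(mean, count, reward):
    """Fold one observed reward into a running mean; return (mean, count)."""
    count += 1
    return mean + (reward - mean) / count, count


# Tiny usage example with three documents and made-up estimates.
means = [0.10, 0.35, 0.20]
choice = epsilon_greedy(means, epsilon=0.1)
means[choice], _ = update_mean(means[choice], count=10, reward=1)
```

With ε = 0 the rule is pure exploitation and with ε = 1 it is pure random exploration, which is why the choice of ε matters so much for the strategies compared in this section.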
The first variant of the ε-greedy strategy is what [6, 10] refer to as the ε-beginning strategy. This strategy performs all the exploration at once at the beginning. For a given number I of iterations, documents are randomly pulled during the first εI iterations; during the remaining (1−ε)I iterations, the document with the highest estimated mean is pulled. Another variant of the ε-greedy strategy is what [10] calls ε-decreasing. In this strategy, the document with the highest estimated mean is always pulled except when a random document is pulled instead with frequency ε_i, where ε_i = ε_0 / i, ε_0 ∈ ]0, 1] and i is the index of the current round. Besides ε-decreasing, four other strategies are presented in [3]. Those strategies are not described here because the experiments done by [3] seem to show that ε-decreasing is always as good as the other strategies.
Compared to the standard multi-armed bandit problem with a fixed set of possible actions, in MCRS, old documents may expire and new documents may frequently emerge. Therefore it may not be desirable to perform the exploration all at once at the beginning as in [6] or to decrease monotonically the effort on exploration as the decreasing strategy in [10] does.
As far as we know, no existing work addresses the problem of the exr/exp tradeoff in MCRS. However, a few research works study the contextual bandit problem in recommender systems, where the user’s behavior is considered as the context of the bandit problem.
  • 4. In [13], the authors extend the ε-greedy strategy by dynamically updating the ε exploration value. At each iteration, they run a sampling procedure to select a new ε from a finite set of candidates. The probabilities associated with the candidates are uniformly initialized and updated with the Exponentiated Gradient (EG) [7]. This updating rule increases the probability of a candidate ε if it leads to a user’s click. Compared to both ε-beginning and ε-decreasing, this technique gives better results. In [9], the authors model the recommendation as a contextual bandit problem. They propose an approach in which a learning algorithm sequentially selects documents to serve users based on their behavior. To maximize the total number of user’s clicks, this work proposes the LinUCB algorithm, which is computationally efficient.
As shown above, none of the mentioned works tackles both problems of exr/exp dynamicity and user’s situation consideration in the exr/exp strategy. This is precisely what we intend to do with our approach. Our intuition is that considering the criticality of the situation when managing the exr/exp tradeoff improves the results of the MCRS. This strategy achieves high exploration when the current user’s situation is not critical and high exploitation in the inverse case.
4 MCRS Model
In our recommender system, the recommendation of documents is modeled as a contextual bandit problem including user’s situation information [8]. Formally, a bandit algorithm proceeds in discrete trials t = 1…T. For each trial t, the algorithm performs the following tasks:
Task 1: Let S^t be the current user’s situation, and PS the set of past situations.
The system compares S^t with the situations in PS in order to choose the most similar one, S^p:
$$S^p = \arg\max_{S^c \in PS} \mathrm{sim}(S^t, S^c) \qquad (1)$$
The semantic similarity metric is computed by:
$$\mathrm{sim}(S^t, S^c) = \sum_j \alpha_j \cdot \mathrm{sim}_j(x_j^t, x_j^c) \qquad (2)$$
In Eq. 2, sim_j is the similarity metric related to dimension j between two concepts x_j^t and x_j^c; α_j is the weight associated with dimension j (during the experimental phase, α_j has a value of 1 for all dimensions). This similarity depends on how closely x_j^t and x_j^c are related in the corresponding ontology. We use the same similarity measure as [15, 17, 18], defined by:
$$\mathrm{sim}_j(x_j^t, x_j^c) = \frac{2 \cdot \mathrm{depth}(LCS)}{\mathrm{depth}(x_j^c) + \mathrm{depth}(x_j^t)} \qquad (3)$$
In Eq. 3, LCS is the Least Common Subsumer of x_j^t and x_j^c, and depth is the number of nodes in the path from the node to the ontology root.
Task 2: Let D be the document collection and D^p ⊆ D the set of documents recommended in situation S^p.
  • 5. After retrieving S^p, the system observes the user’s behavior when reading each document d^p ∈ D^p. Based on the observed rewards, the algorithm chooses the document d^p with the greatest reward r^p.
Task 3: After receiving the user’s reward, the algorithm improves its document-selection strategy with the new observation: in situation S^t, document d^p obtains a reward r^t. When a document is presented to the user and the user selects it by a click, a reward of 1 is incurred; otherwise, the reward is 0. The expected reward of a document is precisely its Click Through Rate (CTR), i.e. the average number of clicks on the document per recommendation.
4.1 The ε-greedy algorithm
The ε-greedy algorithm recommends a predefined number of documents N selected using the following equation:
$$d_i = \begin{cases} \arg\max_{d \in UC} \mathrm{getCTR}(d) & \text{if } q > \varepsilon \\ \mathrm{Random}(UC) & \text{otherwise} \end{cases} \qquad (4)$$
In Eq. 4, i ∈ {1, …, N}; UC = {d_1, …, d_P} is the set of documents corresponding to the user’s preferences; getCTR() computes the CTR of a given document; Random() returns a random element from a given set, allowing exploration; q is a random value uniformly distributed over [0, 1] which defines the exr/exp tradeoff; ε is the probability of recommending a random exploratory document.
4.2 The contextual-ε-greedy algorithm
To improve the adaptation of the ε-greedy algorithm to HLCS situations, the contextual-ε-greedy algorithm compares the current user’s situation S^t with the HLCS class of situations. Let S^m ∈ HLCS be the situation most similar to S^t and B the similarity threshold (this metric is discussed below). Two scenarios are possible: (1) If sim(S^t, S^m) ≥ B, the current situation is critical; the ε-greedy algorithm is used with ε = 0 (exploitation) and S^t is inserted in the HLCS class of situations. (2) If sim(S^t, S^m) < B, the current situation is not critical; the ε-greedy algorithm is used with ε > 0 (exploration) computed as indicated in Eq. 5.
$$\varepsilon = \begin{cases} 1 - \dfrac{\mathrm{sim}(S^t, S^m)}{B} & \text{if } \mathrm{sim}(S^t, S^m) < B \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
To summarize, the system does not explore when the current user’s situation is critical; otherwise, the system performs exploration. In this case, the degree of exploration decreases when the similarity between S^t and S^m increases.
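The similarity of Eqs. 2 and 3 and the exploration rate of Eq. 5 can be sketched in Python as follows. This is only an illustration under stated assumptions: the three toy ontologies (child-to-parent maps), their concept names, the example HLCS list and all function names are hypothetical, whereas the ontologies actually used are those described in [1, 16]. The sketch also leaves out the insertion of S^t into the HLCS class in scenario (1).

```python
# Hypothetical toy ontologies, one per context dimension:
# each maps a concept to its parent (None marks the root).
LOCATION = {"Place": None, "Restaurant": "Place", "Company": "Place", "Home": "Place"}
TIME = {"Time": None, "Work_day": "Time", "Weekend": "Time",
        "Midday": "Work_day", "Morning": "Work_day"}
SOCIAL = {"Person": None, "Professional": "Person", "Friend": "Person",
          "Client": "Professional", "Manager": "Professional"}
ONTOLOGIES = (LOCATION, TIME, SOCIAL)


def path_to_root(concept, parent):
    """Concepts from `concept` up to the root, both included."""
    path = [concept]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def depth(concept, parent):
    """Number of nodes on the path from `concept` to the ontology root."""
    return len(path_to_root(concept, parent))


def concept_similarity(x_t, x_c, parent):
    """Eq. 3: 2 * depth(LCS) / (depth(x_c) + depth(x_t))."""
    ancestors_c = set(path_to_root(x_c, parent))
    # The first shared concept met while climbing from x_t is the
    # least common subsumer (LCS).
    lcs = next(c for c in path_to_root(x_t, parent) if c in ancestors_c)
    return 2 * depth(lcs, parent) / (depth(x_c, parent) + depth(x_t, parent))


def situation_similarity(s_t, s_c, ontologies=ONTOLOGIES, weights=(1.0, 1.0, 1.0)):
    """Eq. 2: weighted sum of the per-dimension similarities."""
    return sum(w * concept_similarity(x_t, x_c, onto)
               for w, x_t, x_c, onto in zip(weights, s_t, s_c, ontologies))


def contextual_epsilon(s_t, hlcs, threshold):
    """Eq. 5, where S^m is the HLCS situation most similar to S^t."""
    best = max(situation_similarity(s_t, s_m) for s_m in hlcs)
    return 1.0 - best / threshold if best < threshold else 0.0


HLCS = [("Restaurant", "Midday", "Client"), ("Company", "Morning", "Manager")]
print(contextual_epsilon(("Restaurant", "Midday", "Client"), HLCS, threshold=2.4))
print(contextual_epsilon(("Home", "Weekend", "Friend"), HLCS, threshold=2.4))
```

In this toy setting the first situation matches a critical one exactly, so ε is 0 and the system only exploits. The second is far from every HLCS (similarity 1.3 out of a maximum of 3), so ε is about 0.46 and random exploratory documents are recommended more often.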
  • 6. 5 Experimental Evaluation
In order to empirically evaluate the performance of our approach, and in the absence of a standard evaluation framework, we propose an evaluation framework based on a set of diary study entries. The main objectives of the experimental evaluation are: (1) to find the optimal value of the threshold B described in Section 4.2 and (2) to evaluate the performance of the proposed algorithm (contextual-ε-greedy). In the following, we describe our experimental datasets and then present and discuss the obtained results.
We have conducted a diary study with the collaboration of the French software company Nomalys¹. This company provides a history application, which records the time, current location, social and navigation information of its users during their application use. The diary study lasted 18 months and generated 178 369 diary situation entries. Each diary situation entry captures contextual time, location and social information. For each entry, the captured data are replaced with more abstracted information using time, spatial and social ontologies [1]. From the diary study, we have obtained a total of 2 759 283 entries concerning the user’s navigation, i.e. an average of 15.47 entries per situation.
In order to set the similarity threshold value, we use a manual classification as a baseline and compare it with the results obtained by our technique. We take a random sample of 10% of the situation entries and manually group similar situations; then we compare the constructed groups with the results obtained by our similarity algorithm, with different threshold values.
Fig. 1. Effect of B threshold value on the similarity precision
Fig. 1 shows the effect of varying the situation similarity threshold B in the interval [0, 3] on the overall precision. Results show that the best performance is obtained when B has the value 2.4, achieving a precision of 0.849.
Consequently, we use the optimal threshold value B = 2.4 for testing our MCRS.
To test the proposed contextual-ε-greedy algorithm, we first collected 3000 situations, each with more than 100 occurrences so as to be statistically meaningful. Then, we sampled 10000 documents that have been shown in any of these situations.
¹ Nomalys is a company that provides a graphical application on Smartphones allowing users to access their company’s data.
  • 7. The testing step consists of evaluating the algorithms for each testing situation using the average CTR. The average CTR for a particular iteration is the ratio between the total number of clicks and the total number of displays. Then, we calculate the average CTR over every 1000 iterations. The number of documents (N) returned by the recommender system for each situation is 10 and we have run the simulation until the number of iterations reaches 10000, which is the number of iterations where all algorithms have converged. In the first experiment, in addition to a pure exploitation baseline, we have compared our algorithm to the algorithms described in the related work (Section 3): ε-greedy, ε-beginning, ε-decreasing and EG. In Fig. 2, the horizontal axis is the number of iterations and the vertical axis is the performance metric.
Fig. 2. Average CTR for exr/exp algorithms
We have parameterized the different algorithms as follows: ε-greedy was tested with two parameter values: 0.5 and 0.9; ε-decreasing and EG use the same set {ε_i = 1 − 0.01 * i, i = 1, ..., 100}; ε-decreasing starts with the highest value and reduces it by 0.01 every 100 iterations, until it reaches the smallest value.
Overall, the tested algorithms perform better than the baseline. However, for the first 2000 iterations, the pure exploitation baseline converges faster. But in the long run, all exr/exp algorithms improve the average CTR at convergence. We have several observations regarding the different exr/exp algorithms. For the ε-decreasing algorithm, the converged average CTR increases as ε decreases (exploitation increases). For ε-greedy(0.9) and ε-greedy(0.5), even after convergence, the algorithms still give respectively 90% and 50% of the opportunities to documents having low average CTR, which significantly decreases their results.
While the EG algorithm converges to a higher average CTR, its overall performance is not as good as ε-decreasing. Its average CTR is low at the early step because of more exploration, but does not converge faster. The contextual-ε-greedy algorithm effectively learns the optimal ε; it has the best convergence rate, increases the average CTR by a factor of 2 over the baseline and outperforms all other exr/exp algorithms. The improvement comes from a dynamic tradeoff between exr/exp, controlled by the critical situation (HLCS) estimation. At the early stage, this algorithm takes full advantage of exploration without wasting opportunities to establish good results.
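Two ingredients of this evaluation can be sketched in Python: the ε-decreasing schedule over the set {ε_i = 1 − 0.01 * i} and the average CTR computed per window of iterations. The function names, the 0-based iteration index and the toy click counts are assumptions made for this sketch; they are not taken from the original simulation code.

```python
def epsilon_decreasing(iteration, step=100, n_values=100):
    """epsilon used at a 0-based iteration.

    Starts at the highest value of {1 - 0.01 * i, i = 1..100} (0.99),
    drops by 0.01 every `step` iterations and stays at the smallest
    value (0.0) once it is reached.
    """
    i = min(iteration // step + 1, n_values)
    return (n_values - i) / n_values


def windowed_ctr(clicks, displays, window=1000):
    """Average CTR per window: total clicks / total displays in the window."""
    ctrs = []
    for start in range(0, len(clicks), window):
        n_clicks = sum(clicks[start:start + window])
        n_displays = sum(displays[start:start + window])
        ctrs.append(n_clicks / n_displays if n_displays else 0.0)
    return ctrs


# Toy data: four iterations, 10 documents displayed at each one (N = 10).
print(windowed_ctr([1, 0, 2, 1], [10, 10, 10, 10], window=2))
print(epsilon_decreasing(0), epsilon_decreasing(9999))
```

With 10000 iterations and a step of 100, this schedule reaches ε = 0 at iteration 9900, so the last window of the simulation is pure exploitation.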
  • 8. 6 Conclusion
In this paper, we study the problem of exploitation and exploration in mobile context-aware recommender systems and propose a novel approach that adaptively balances exr/exp according to the user’s situation. In order to evaluate the performance of the proposed algorithm, we compare it with other standard exr/exp strategies. The experimental results demonstrate that our algorithm performs better on average CTR in various configurations. In the future, we plan to evaluate the scalability of the algorithm on-board a mobile device and investigate other public benchmarks.
References
1. Bouneffouf, D., Bouzeghoub A. & Gançarski A. L.: Following the User’s Interests in Mobile Context-Aware recommender systems. AINA Workshops, pp. 657-662. Fukuoka, Japan (2012).
2. Adomavicius, G., Mobasher, B., Ricci, F. Alexander T.: Context-Aware Recommender Systems. AI Magazine. vol. 32(3), pp. 67-80 (2011).
3. Auer, P., Cesa-Bianchi, N., and Fischer, P.: Finite Time Analysis of the Multiarmed Bandit Problem. Machine Learning, vol 2, pp. 235–256. Kluwer Academic Publishers, Hingham (2002).
4. Baltrunas, L., Ludwig, B., Peer, S., and Ricci, F.: Context relevance assessment and exploitation in mobile recommender systems. Personal and Ubiquitous Computing, pp. 1-20, Springer, London (2011).
5. Bellotti, V., Begole, B., Chi, E.H., Ducheneaut, N., Fang, J., Isaacs, E.: Activity-Based Serendipitous Recommendations with the Magitti Mobile Leisure Guide. Proceedings on the Move. pp. 1157-1166. ACM, New York (2008).
6. Even-Dar, E., Mannor, S., and Mansour, Y.: PAC Bounds for Multi-Armed Bandit and Markov Decision Processes. In Fifteenth Annual Conference on Computational Learning Theory, pp. 255–270 (2002).
7. Kivinen, J., and Warmuth, M. K.: Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, vol 132. (1995).
8. Langford, J., and Zhang, T.: The epoch-greedy algorithm for contextual multi-armed bandits. In Advances in Neural Information Processing Systems (2008).
9. Lihong, Li., Wei, C., Langford, J.: A Contextual-Bandit Approach to Personalized News Document Recommendation. Proceedings of the 19th international conference on World wide web, pp. 661-670, ACM, New York (2010).
10. Mannor, S., and Tsitsiklis, J. N.: The Sample Complexity of Exploration in the Multi-Armed Bandit Problem. Computational Learning Theory, pp. 255-270 (2003).
11. Robbins, H.: Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society. Vol 58, pp. 527-535 (1952).
12. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis. Cambridge University (1989).
13. Wei, Li., Wang, X., Zhang, R., Cui, Y., Mao, J., Jin, R.: Exploitation and Exploration in a Performance based Contextual Advertising System. Proceedings of the International Conference on Knowledge discovery and data mining, pp. 27-36. ACM, New York (2010).
14. Sohn, T., Li, K. A., Griswold, W. G., Hollan, J. D.: A Diary Study of Mobile Information Needs. Proceedings of the twenty-sixth annual SIGCHI conference on Human factors in computing, pp. 433-442, ACM, Florence (2008).
  • 9. 15. Wu, Z., and Palmer, M.: Verb Semantics and Lexical Selection. In Proceedings of the 32nd Meeting of the Association for Computational Linguistics, pp. 133-138 (1994).
16. Bouneffouf, D., Bouzeghoub A. & Gançarski A. L.: Hybrid-ε-greedy for Mobile Context-Aware Recommender System. PAKDD (1) 2012: 468-479
17. Bouneffouf, D., Bouzeghoub A. & Gançarski A. L.: Exploration / Exploitation Trade-Off in Mobile Context-Aware Recommender Systems. Australasian Conference on Artificial Intelligence 2012: 591-601
18. Bouneffouf, D., Bouzeghoub A. & Gançarski A. L.: Considering the High Level Critical Situations in Context-Aware Recommender Systems. IMMoA 2012: 26-32
19. Bouneffouf, D.: L'apprentissage automatique, une étape importante dans l'adaptation des systèmes d'information à l'utilisateur. inforsid 2011: 427-428
20. Bouneffouf, D.: Applying machine learning techniques to improve user acceptance on ubiquitous environment. CAISE Doctoral Consortium 2011