KDD for Personalization
                          PKDD 2001 Tutorial
                                September 6, 2001




Bamshad Mobasher - DePaul University, Chicago

Bettina Berendt - Humboldt University Berlin

Myra Spiliopoulou - Leipzig Graduate School of Management




    Web Personalization
    •   The Problem
         – dynamically serve customized content (pages, products,
           recommendations, etc.) to users based on their profiles,
           preferences, or expected interests

    •   Personalization v. Customization
         – In customization, the user controls and customizes the site
           or the product based on his/her preferences

         – usually manual, but sometimes semi-automatic based on
           a given user profile

         – Personalization is done automatically based on the
           user’s actions, the user’s profile, and (possibly) the
           profiles of others with “similar” profiles


  PKDD 2001 Tutorial: “KDD for Personalization”                      [I-2]
Customization Example




    my.yahoo.com








  Personalization Example




      amazon.com




A simplified scheme for personalization
    •   The user selects information object(s)
         – what kind? document etc., query
         – how? request, specification, rating
    •   The system recommends other information object(s) related to the
        selected one(s)
         – why? similarity (syntactic/semantic), co-occurrence in other
           users' navigation histories, co-occurrence in the user's other
           navigation histories
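The "co-occurrence in other users' navigation histories" path of this scheme can be sketched in a few lines. This is a minimal illustration, not the tutorial's own method: it assumes each history is a set of page identifiers and scores each candidate page by how many of the user's selected pages it co-occurs with. The function name `recommend_by_cooccurrence` and the page ids are hypothetical.

```python
from collections import Counter

def recommend_by_cooccurrence(selected, histories, k=3):
    """Rank pages that co-occur with the user's selected pages.

    selected:  set of page ids the user has selected in this visit
    histories: iterable of sets of page ids (other users' navigation histories)
    Returns up to k page ids not already selected, highest score first.
    """
    scores = Counter()
    for history in histories:
        overlap = len(selected & history)
        if overlap == 0:
            continue  # this history shares nothing with the current user
        for page in history - selected:
            scores[page] += overlap  # weight by the number of shared selected pages
    # sort by descending score, then by page id so ties are deterministic
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [page for page, _ in ranked[:k]]

histories = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "E"},
    {"A", "C"},
]
print(recommend_by_cooccurrence({"A"}, histories))
```

Content-based similarity (the first "why?" item) would replace the co-occurrence count with a similarity score between objects; the ranking step stays the same.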








     Know Thy Customer                                       Knowledge is Power

     Relationships based on customer insight propel an organization from
     simply treating customers [...] to treating them relative to their
     needs, preferences, and value potential....

     Knowing the customer is paramount in today's marketplace where the
     customer has more options, greater flexibility and higher expectations.

     ...


                                                John [...]. N[...] (Accenture)




Customer knowledge implies

     1.) Acquisition of customer data

     2.) Analysis of customer data

     3.) Action in accordance with data and insights








       Acquisition of customer data
      Customer data are recordings of
      •    preferences
      •    transactions
      •    pre-sales contacts
      •    after-sales support
      •    demographic information
     Some of these data
      –    may be purchased from third parties
      –    may be held in multiple separate databases that serve completely
           different purposes
      –    are of varying quality
           with respect to error rates, reliability, coverage, representativeness
                                                                     Data Preparation
Analysis of customer data
     Data analysis should provide [...] on questions like
     •     Which users will become customers?
     •     Which customers will return again?
     •     Who is more likely to respond to promotion action?
     •     Who would be interested in cross-sell/up-sell suggestions?
     Closely related to questions like
     •     Is the Web-site appropriately designed to serve the organisation's
           goals?
     •     Are the customers satisfied?
     •     Are the customers satisfied enough to come again?
     •     Are the customers satisfied enough to become promoters of the site?
                                                                          Data Mining




       Action in accordance with data and insights
     •   Alignment of the marketing policy

     •   Alignment of the supply chain, including after-sales support

     •   Adjustment of the Web site

               –  Static site re-design

               –  Browsing/Navigation suggestions

               –  Recommendations on the page

               –  Intelligent assistance

               –  Personalized layout and content

     The time lag between insight and action should be minimized.




The action should create value
         •   for the customer
         •   for the organisation








         A short excursion on value creation
     In B2[...] e-commerce, it is not sufficient to
         •  offer an existing product through the Internet
         •  digitize part/all of the merchandizing chain
         •  introduce a brilliant new product in the market
     The product must bring value to
         •  win the customer                            Customer Conversion
         •  retain the customer                         Customer Retention



The model of Kuhlen considers the following types of value [32]
     (1) comparative
     (2) improving efficiency
     (3) improving effectiveness
     (4) interactive
     (5) organisational
     (6) strategic
     (7) innovative







      From acquisition to action
         •  There is no lack of data.
              –  Clickstream data accumulate at a tremendous pace.
              –  Demographic data can be acquired.
              –  Customer profiles are available or can be acquired.
         •  There is no lack of methodologies for data analysis.


         •  The ability to exploit the data increases at a much slower pace,
            and the number of personalized Web sites is not really large.


         •  The tolerable elapsed time between acquisition and action is
            below 1 [...].

Personalization: An HCI perspective
   = does personalization increase usability?
   A Web site’s usability is high if users
   - achieve their goals / perform their tasks in little time,
   - do so with a low error rate,
   - experience high subjective satisfaction.

   Usability testing:
   - qualitative and quantitative methods
   - experts and "normal" users
   - questionnaires and experiments

   Usability is a special concern on the Web because
   unlike with other products / software, "users experience
   usability first and pay later" (Nielsen [49], [B12]).




        Data Preparation for Personalization




Web Usage Mining

      • Discovery of meaningful patterns from data
        generated by client-server transactions on one or
        more Web servers
      • Typical Sources of Data
           – automatically generated data stored in server access
             logs, referrer logs, agent logs, and client-side cookies
           – e-commerce and product-oriented user events (e.g.,
             shopping cart changes, ad or product click-throughs,
             etc.)
           – user profiles and/or user ratings
           – meta-data, page attributes, page content, site structure








What’s in a Typical Server Log?
<ip_addr> <base_url> - <date> <method> <file> <protocol> <code> <bytes> <referrer> <user_agent>

203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:25 -0600] "GET /Calls/Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
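Entries like these can be split into fields with a regular expression. The sketch below assumes exactly the field layout shown above, including the extra <base_url> field after the IP address; `LOG_PATTERN` and `parse_entry` are hypothetical names, and real servers may be configured with a different layout.

```python
import re
from datetime import datetime

# <ip_addr> <base_url> - [<date>] "<method> <file> <protocol>" <code> <bytes> "<referrer>" "<user_agent>"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_entry(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 01/Jun/1999:03:09:21 -0600
    entry["time"] = datetime.strptime(entry.pop("date"), "%d/%b/%Y:%H:%M:%S %z")
    entry["code"] = int(entry["code"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    return entry

line = ('203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] '
        '"GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" '
        '"Mozilla/4.06 [en] (Win95; I)"')
entry = parse_entry(line)
print(entry["ip"], entry["file"], entry["code"], entry["referrer"])
```

Lines that do not match return None, so malformed entries can be counted and inspected instead of being silently misparsed.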
The Web Usage Mining Process




     Raw Usage Data
       → Preprocessing → Preprocessed Clickstream Data
       → Pattern Discovery → Rules, Patterns, and Statistics
       → Pattern Analysis → "Interesting" Rules, Patterns, and Statistics
     Content and Structure Data: additional input to the process








   Usage Data Preprocessing

     Raw Usage Data
       → Data Cleaning → User/Session Identification → Page View Identification
       → Path Completion → Server Session File
       → Episode Identification → Episode File
     Auxiliary data: Usage Statistics; Site Structure and Content




Data Preprocessing for Web Usage Mining

   • Data cleaning
        – remove irrelevant references and fields in server logs
        – remove references due to spider navigation
        – remove erroneous references
        – add missing references due to caching (done after
          sessionization)
   • Data integration
        – synchronize data from multiple server logs
        – integrate e-commerce and application server data
        – integrate meta-data (e.g., content labels)
        – integrate demographic / registration data
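A few of the cleaning steps can be expressed as a filter over parsed entries. The suffix list, the spider keywords, and the rule that status codes of 400 and above count as erroneous are illustrative assumptions, not rules from the tutorial; each entry is assumed to be a dict with `file`, `code`, and `agent` keys, and `is_relevant` and `clean` are hypothetical names.

```python
# Hypothetical suffixes and agent keywords; real sites need their own lists.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
SPIDER_KEYWORDS = ("bot", "crawler", "spider", "slurp")

def is_relevant(entry):
    """Keep an entry only if it looks like a successful, human page request."""
    path = entry["file"].split("?", 1)[0].lower()
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False  # embedded object (image, style sheet, script)
    agent = entry["agent"].lower()
    if any(word in agent for word in SPIDER_KEYWORDS):
        return False  # likely spider/robot navigation
    if entry["code"] >= 400:
        return False  # erroneous reference (client or server error)
    return True

def clean(entries):
    return [e for e in entries if is_relevant(e)]

entries = [
    {"file": "/Calls/OWOM.html", "code": 200, "agent": "Mozilla/4.5 [en] (Win98; I)"},
    {"file": "/Calls/Images/earthani.gif", "code": 200, "agent": "Mozilla/4.5 [en] (Win98; I)"},
    {"file": "/robots.txt", "code": 200, "agent": "ExampleBot/1.0"},
    {"file": "/missing.html", "code": 404, "agent": "Mozilla/4.06 [en] (Win95; I)"},
]
print([e["file"] for e in clean(entries)])  # ['/Calls/OWOM.html']
```

Keyword matching only catches spiders that identify themselves in the agent field; adding back references missing due to caching needs the session structure and is left to the later path-completion step.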







   Data Preparation for Web Usage Mining
   (Cooley, Mobasher, Srivastava, 1999 [15])

   • Data Transformation
        – user identification
        – sessionization / episode identification
        – pageview identification
             • a pageview is a set of page files and associated objects
               that contribute to a single display in a Web Browser
   • Data Reduction
        – sampling and dimensionality reduction (ignoring
          certain pageviews / items)
   • Identifying User Transactions (i.e., sets or sequences
     of pageviews possibly with associated weights)



User and Session Identification: Need for
   Reliable Usage Data

   • Validity of results in Web usage mining is affected by
     the ability to:
        – distinguish among different users of a site
        – reconstruct the activities of the users within the site
   • Obtaining reliable usage data is difficult because of
        – proxy servers and anonymizers
        – rotating IP addresses on connections through ISPs
        – missing references due to caching
        – inability of servers to distinguish among different visits








   Identifying Users and Sessions

   • Server log L is a list of log entries each containing
     timestamp, host identifier, URL request (including
     URL stem and query), referrer, agent, cookie, etc.
   • User identification and sessionization
        – user activity log is a sequence of log entries in L
          belonging to the same user
        – user identification is the process of partitioning L into
          a set of user activity logs
        – the goal of sessionization is to further partition each
          user activity log into sequences of entries
          corresponding to each user visit
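User identification as defined above is a partition of L. A minimal sketch, assuming each entry is a dict with `ip`, `agent`, and `time` keys and using the IP address + agent pair as the user key (a heuristic: it cannot separate users behind a shared proxy). The function name and sample values are hypothetical.

```python
from collections import defaultdict

def identify_users(log):
    """Partition a server log into user activity logs.

    The (IP address, agent) pair stands in for a user identifier.
    Each activity log keeps the original order of the log entries.
    """
    activity_logs = defaultdict(list)
    for entry in log:
        activity_logs[(entry["ip"], entry["agent"])].append(entry)
    return dict(activity_logs)

# Times in seconds from the first entry (hypothetical).
log = [
    {"ip": "203.30.5.145", "agent": "Mozilla/4.5", "file": "/Calls/OWOM.html", "time": 0},
    {"ip": "203.252.234.33", "agent": "Mozilla/4.06", "file": "/", "time": 1390},
    {"ip": "203.252.234.33", "agent": "Mozilla/4.06", "file": "/CP.html", "time": 1430},
]
for user, entries in identify_users(log).items():
    print(user, [e["file"] for e in entries])
```

Sessionization then runs separately on each activity log returned here.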




Sessionization Heuristics

   • Real v. Constructed Sessions
        – Conceptually, the log L is partitioned into an ordered
          collection of “real” sessions R
        – Each heuristic h partitions L into an ordered collection
          of “constructed sessions” C_h
        – The ideal heuristic h*: C_h* = R
   • Two Basic Types of Sessionization Heuristics
        – Time-oriented heuristics
        – Navigation-oriented heuristics








   Time-Oriented Heuristics
    • Consider boundaries on time spent on individual
      pages or in the entire site during a single visit
        –    Boundaries can be based on a maximum session
             length or maximum time allowable for each pageview
        –    Additional granularity can be obtained by treating
             different boundaries on different (types of) pageviews

        h1: Given t_0, the timestamp of the first request in a
            constructed session S, and a threshold θ, a request
            with timestamp t is assigned to S iff t - t_0 ≤ θ.

        h2: Given t_1, the timestamp of a request in constructed
            session S, and a threshold δ, the next request, with
            timestamp t_2, is assigned to S iff t_2 - t_1 ≤ δ.
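Both time-oriented heuristics can be sketched for a single user activity log. The sketch assumes the timestamps are already sorted and given in minutes; the thresholds 30 and 10 mirror the h1-30 and h2-10 settings in the comparison of sessionization heuristics, and the sample timestamps are made up.

```python
def sessionize_h1(times, theta):
    """h1: t joins session S iff t - t_0 <= theta, with t_0 the first timestamp of S."""
    sessions = []
    for t in times:
        if sessions and t - sessions[-1][0] <= theta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # start a new constructed session
    return sessions

def sessionize_h2(times, delta):
    """h2: t_2 joins S iff t_2 - t_1 <= delta, with t_1 the previous timestamp in S."""
    sessions = []
    for t in times:
        if sessions and t - sessions[-1][-1] <= delta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # gap too long: start a new constructed session
    return sessions

# Timestamps in minutes for one user's activity log (hypothetical values).
times = [0, 5, 12, 20, 28, 35, 50]
print(sessionize_h1(times, theta=30))  # [[0, 5, 12, 20, 28], [35, 50]]
print(sessionize_h2(times, delta=10))  # [[0, 5, 12, 20, 28, 35], [50]]
```

The two heuristics disagree on the request at minute 35: h1 cuts it off because the session is already 35 minutes long, while h2 keeps it because it follows the previous request after only 7 minutes.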



Navigation-Oriented Heuristics
   • Take the linkage between pages into account
        – “linkage” can be based on site topology (e.g., split a
          session at a request that could not have been reached
          from previous requests in the session)
        – or can be usage-based (using referrers in log entries)
             • usually more restrictive than topology-based heuristics
               and more difficult to implement in frame-based sites

       href: Given two consecutive requests p and q, with p
       belonging to constructed session S, q is assigned to S
       if the referrer for q was previously invoked in S, or if
       the referrer for q is “undefined” and t_q - t_p ≤ ∆ (the time
       delay ∆ allows for proper loading of frameset pages).
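The href rule can be sketched in the same style. This minimal version assumes each request is a `(url, referrer, time)` tuple whose referrer has already been normalized to the same form as the requested URLs (the raw log stores full referrer URLs), treats an empty or missing referrer as “undefined”, and only checks the most recent constructed session. The function name, the pages /Calls/Info.html and /frame2.html, and the times are hypothetical.

```python
def sessionize_href(requests, max_delay):
    """Referrer-based sessionization (href) for one user's requests.

    requests:  list of (url, referrer, time) tuples in time order; an empty
               string or None marks an undefined referrer.
    max_delay: tolerated delay (∆) for requests with an undefined referrer.
    """
    sessions = []
    for url, referrer, t in requests:
        if sessions:
            current = sessions[-1]
            visited = {u for u, _, _ in current}
            t_prev = current[-1][2]
            if referrer in visited or (not referrer and t - t_prev <= max_delay):
                current.append((url, referrer, t))
                continue
        sessions.append([(url, referrer, t)])  # q starts a new constructed session
    return sessions

# Hypothetical requests with times in seconds.
requests = [
    ("/", "", 0),
    ("/CP.html", "/", 40),
    ("/Calls/OWOM.html", "http://www.lycos.com/", 50),
    ("/Calls/Info.html", "/Calls/OWOM.html", 70),
    ("/frame2.html", "", 72),
]
for session in sessionize_href(requests, max_delay=10):
    print([url for url, _, _ in session])
```

The external referrer on /Calls/OWOM.html opens a new session, and /frame2.html joins that session only because it arrives within ∆ of the previous request.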








   Measures for Sessionization Accuracy
   (Berendt, Mobasher, Spiliopoulou, 2001 [7])

   • A heuristic h maps entries in the log L into
     elements of constructed sessions, such that:
        – (a) each entry in L is mapped to exactly one element
          of a constructed session
        – (b) the mapping is order-preserving
   • Measures quantify the successful mappings of real
     sessions to constructed sessions
        – a measure M evaluates a heuristic h based on the
          differences between C_h and R
        – each measure assigns to h a value M(h) ∈ [0,1] so
          that M(h*) = 1




Measures for Sessionization Accuracy

    • Categorical and Gradual Measures
         – categorical measures: based on the number of real
           sessions that are reconstructed by the heuristics
         – gradual measures: based on the degree to which the
           real sessions are reconstructed by the heuristics








   Categorical Measures

    • Based on the notion of “complete reconstruction”
         – a real session is completely reconstructed if all its
           elements are contained in the same constructed
           session
         – the measure M_cr(h) is the ratio of the number of
           completely reconstructed real sessions in C_h to the
           total number of real sessions |R|
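M_cr can be computed directly from its definition. The sketch assumes sessions are lists of log-entry identifiers, so that (by the mapping property above) every entry appears in exactly one constructed session; the function name and sample sessions are hypothetical.

```python
def m_cr(real, constructed):
    """Ratio of completely reconstructed real sessions to |R|.

    A real session r is completely reconstructed if all of its elements
    lie in one constructed session. Sessions are lists of log-entry ids.
    """
    constructed_sets = [set(c) for c in constructed]
    complete = sum(1 for r in real if any(set(r) <= c for c in constructed_sets))
    return complete / len(real)

real = [[1, 2, 3], [4, 5], [6, 7]]          # R
constructed = [[1, 2, 3, 4], [5], [6, 7]]   # C_h: the real session [4, 5] is split
print(m_cr(real, constructed))  # 2 of 3 real sessions are complete
```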




Categorical Measures

   • Derived categorical measures:
        – M_crs considers only completely reconstructed real
          sessions whose first element is also the first element of
          a constructed session
        – M_cre considers only completely reconstructed real
          sessions whose last element is also the last element of
          a constructed session
        – M_crse considers only completely reconstructed real
          sessions with correct starts and ends
             • in the absence of overlapping real sessions for individual
               users, this gives the number of constructed sessions
               that are identical to corresponding real sessions
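The derived measures add start and end checks on top of complete reconstruction. In this sketch, sessions are assumed to be ordered lists of log-entry ids, and the constructed session c that contains all of a real session r is compared with r at its first and last positions. The function names are hypothetical, and the sample data show a heuristic that merges two real sessions.

```python
def containing_session(r, constructed):
    """Return the constructed session that contains every element of r, or None."""
    r_set = set(r)
    for c in constructed:
        if r_set <= set(c):
            return c
    return None

def m_cr_variant(real, constructed, check_start=False, check_end=False):
    """M_cr with optional checks: check_start gives M_crs, check_end gives M_cre,
    both give M_crse. Returns a ratio over |R|."""
    count = 0
    for r in real:
        c = containing_session(r, constructed)
        if c is None:
            continue  # r is not completely reconstructed
        if check_start and r[0] != c[0]:
            continue  # r does not start where its constructed session starts
        if check_end and r[-1] != c[-1]:
            continue  # r does not end where its constructed session ends
        count += 1
    return count / len(real)

# Hypothetical heuristic output that merges the first two real sessions.
real = [[1, 2, 3], [4, 5], [6, 7]]
constructed = [[1, 2, 3, 4, 5], [6, 7]]
for name, start, end in [("M_cr", False, False), ("M_crs", True, False),
                         ("M_cre", False, True), ("M_crse", True, True)]:
    print(name, round(m_cr_variant(real, constructed, start, end), 3))
```

On this data M_cr is 1 while M_crse is only 1/3: complete reconstruction alone does not penalize the merge, whereas the start and end checks do.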








   Gradual Measures

    • Allow for measuring partial overlaps between real
      and constructed sessions
         – the degree of overlap between a real session r and a
           constructed session c, deg_o(r,c), is the number of
           elements they have in common divided by the total
           number of elements in r.
         – the degree of overlap for a real session r is the maximum
           deg_o(r,c) over all constructed sessions c.
         – the measure M_o(h) is the average degree of overlap
           over all real sessions
         – if a real session is completely reconstructed, its
           overlap degree is 1
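A direct transcription of M_o, assuming sessions are lists of log-entry ids; the function name and the sample data are hypothetical.

```python
def m_o(real, constructed):
    """M_o: average over real sessions r of max over c of deg_o(r, c) = |r ∩ c| / |r|."""
    total = 0.0
    for r in real:
        r_set = set(r)
        total += max(len(r_set & set(c)) / len(r_set) for c in constructed)
    return total / len(real)

real = [[1, 2, 3, 4], [5, 6]]
constructed = [[1, 2], [3, 4, 5, 6]]
print(m_o(real, constructed))  # (0.5 + 1.0) / 2 = 0.75
```

The second real session gets degree 1 even though its constructed session also contains entries 3 and 4, because deg_o divides only by |r|.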




Gradual Measures

   • To take the size of the constructed session into account,
     we define the degree of similarity
        – deg_s(r,c) = | r ∩ c | / | r ∪ c |
        – M_s(h) is the average degree of similarity over all real
          sessions
        – if a real session is reconstructed exactly (c = r), its
          similarity degree is 1; extra elements in c lower it
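The similarity-based measure differs from M_o only in the denominator. The sketch assumes the same maximum-over-c step as for deg_o (the slide does not spell this out for deg_s) and sessions given as lists of log-entry ids; the function name and sample data are hypothetical. On the sample data, the real session [5, 6] lies entirely inside the larger constructed session [3, 4, 5, 6], so its overlap degree is 1 but its similarity degree is only 0.5.

```python
def m_s(real, constructed):
    """M_s: average over real sessions r of max over c of
    deg_s(r, c) = |r ∩ c| / |r ∪ c| (maximum over c is an assumption)."""
    total = 0.0
    for r in real:
        r_set = set(r)
        total += max(len(r_set & set(c)) / len(r_set | set(c)) for c in constructed)
    return total / len(real)

real = [[1, 2, 3, 4], [5, 6]]
constructed = [[1, 2], [3, 4, 5, 6]]
print(m_s(real, constructed))  # (0.5 + 0.5) / 2 = 0.5
```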








   Which Measures?

   • The choice of the measures depends on the goals of
     usage analysis, for example:
        – “complete reconstruction” may be appropriate for
          clustering and association-based analyses (it correctly
          shows the set of pages accessed together)
             • it also preserves sequential order of accesses, so it can
               be used for the analysis of users’ navigational behavior
        – M_crs: useful for analyzing access to entry points
        – M_cre: useful for analyzing access to exit points
        – overlap-based measures can be useful for comparing
          overall effectiveness of sessionization heuristics in
          grouping pages or objects




Which Sessionization Heuristics?

   • The choice of sessionization heuristic depends on
     the characteristics of the data
         – if individual users visit the site in short but temporally
           dense sessions, h2 may perform better than h1
         – in cases when timestamps are not reliable (e.g., using
           integrated data across many log files), href may be a
           better choice for sessionization
         – referrer-based heuristics tend to perform worse in
           highly dynamic, frame-based sites








   Comparison of Sessionization
   Heuristics
   [Figure: bar chart of M_cr, M_crs, M_cre, M_crse, M_o, and M_s for the
   heuristics h1-30, h2-10, and h-ref; y-axis from 0.50 to 1.00]

   • cookies used to identify unique users
   • server-generated session variable used to identify “real” sessions
   • site was frame-based and highly dynamic
   • thresholds of 30 and 10 minutes were used for h1 and h2, respectively
   • href performed poorly, due to propagated errors in misclassified
     frameset references
   • 30% of users had multiple IP addresses (coming from behind proxy servers)




Mechanisms for User Identification
   • IP Address + Agent
        – Description: assume each unique IP address/Agent pair is a unique user
        – Privacy concerns: low
        – Advantages: always available; no additional technology required
        – Disadvantages: not guaranteed to be unique; defeated by rotating IPs
   • Embedded Session IDs
        – Description: use dynamically generated pages to associate an ID with
          every hyperlink
        – Privacy concerns: low to medium
        – Advantages: always available; independent of IP addresses
        – Disadvantages: cannot capture repeat visitors; additional overhead
          for dynamic pages
   • Registration
        – Description: user explicitly logs in to the site
        – Privacy concerns: medium
        – Advantages: can track individuals, not just browsers
        – Disadvantages: many users won't register; not available before
          registration
   • Cookie
        – Description: save ID on the client machine
        – Privacy concerns: medium to high
        – Advantages: can track repeat visits from the same browser
        – Disadvantages: can be turned off by users
   • Software Agents
        – Description: program loaded into the browser that sends back usage data
        – Privacy concerns: high
        – Advantages: accurate usage data for a single site
        – Disadvantages: likely to be rejected by users






   Impact of User Identification Heuristics
      These experiments show the impact of using the IP+Agent heuristic for user
      identification on sessionization heuristics (as compared to cookies)

   [Figure: two bar charts of M_cr, M_crs, M_cre, M_crse, M_o, and M_s
   (y-axes from 0.30 to 1.00); left: h1-30-real vs. h1-30-ipa; right:
   h-ref-real vs. h-ref-ipa]




PKDD 2001 Tutorial: “KDD for Personalization”                                                                                 [DP-23]
Inferring User Transactions from Sessions

   •   Observation: reference lengths
       follow an exponential
       distribution
   •   Page types correlate with
       reference lengths
   •   Page types: navigational,
       content, or hybrid
   •   Can automatically classify
       pages as navigational or content
       using statistical modeling
   •   A transaction can be defined as
       an intra-session path ending in a
       content page, or as a set of
       content pages in a session

       (Figure: histogram of page reference lengths (secs), with regions
       labelled "navigational pages" and "content pages")
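The reference-length classification and the first transaction definition can be sketched as follows. This is a minimal illustration, not the tutorial's exact procedure. It assumes that the exponential rate is estimated as one over the mean reference length, and that the analyst supplies the expected fraction of navigational references as `nav_fraction`, a hypothetical parameter.

```python
import math


def reference_length_cutoff(lengths, nav_fraction):
    """Cutoff (secs) separating navigational from content references.

    Assumes reference lengths are exponentially distributed with rate
    lam = 1 / mean(lengths) and that `nav_fraction` of all references are
    navigational, so the cutoff is that quantile: -ln(1 - nav_fraction) / lam.
    """
    lam = 1.0 / (sum(lengths) / len(lengths))
    return -math.log(1.0 - nav_fraction) / lam


def split_transactions(session, cutoff):
    """Split one session into transactions, each a path ending in a content page.

    session: list of (page, reference_length_secs) in visiting order.
    A page whose reference length exceeds `cutoff` is treated as a content page.
    """
    transactions, current = [], []
    for page, length in session:
        current.append(page)
        if length > cutoff:          # a content page closes the current path
            transactions.append(current)
            current = []
    return transactions              # trailing navigational pages are dropped
```

For example, with lengths [10, 20, 30] and nav_fraction = 0.5 the cutoff is 20 ln 2, about 13.9 s. The session A (5 s), B (40 s), C (3 s), D (2 s), E (60 s), F (4 s) then splits into [A, B] and [C, D, E]. Under the alternative definition, a transaction is simply the set of pages whose length exceeds the cutoff.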







   Path Completion

   • Refers to the problem of inferring missing user
     references due to caching.
   • Effective path completion requires extensive
     knowledge of the link structure within the site
   • Referrer information in server logs can also be used
     in disambiguating the inferred paths.
   • Problem gets much more complicated in frame-
     based sites.




Path Completion - An Example

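      (Figure: site graph with page A at the top level, B and C below it,
      and D, E, and F at the bottom level)

   User's navigation path:
      A => B => D => E => D => B => C

   Server log entries (the revisits to D and B are served from the cache
   and therefore do not appear):
      URL   Referrer
      A     --
      B     A
      D     B
      E     D
      C     B

   •   There may be multiple candidates for completing the path.
       For example, consider the two paths E => D => B => C and
       E => D => B => A => C.
   •   In this case, the referrer field allows us to partially
       disambiguate: C was requested with referrer B, not A. But what
       about E => D => B => A => B => C?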
   •   One heuristic: always take the path that requires the fewest
       "back" references.
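The backtracking idea can be sketched as follows. When a request's referrer is not the page visited just before it, the code assumes the user went back through cached pages until reaching the referrer. It chooses the most recent occurrence of the referrer, so that the fewest back steps are inferred. The function name and the (url, referrer) log layout are assumptions. The sketch also ignores frames and does not model how earlier "back" steps change the browser history.

```python
from typing import Optional


def complete_path(log: list[tuple[str, Optional[str]]]) -> list[str]:
    """Infer a full navigation path from (url, referrer) server-log entries.

    If a request's referrer is not the page visited just before, assume the
    user pressed "back" (pages served from the browser cache, hence absent
    from the log) until reaching the most recent visit to the referrer; the
    most recent visit is the one that needs the fewest back steps.
    """
    path: list[str] = []
    for url, referrer in log:
        if path and referrer is not None and referrer != path[-1]:
            for i in range(len(path) - 1, -1, -1):
                if path[i] == referrer:
                    # Re-insert the pages passed while going back:
                    # path[i:-1] reversed ends with the referrer itself.
                    path.extend(reversed(path[i:-1]))
                    break
        path.append(url)
    return path
```

Applied to the log entries of the example (A, B, D, E, then C with referrer B), this reconstructs A => B => D => E => D => B => C. A referrer that never appears in the path, such as an external site, is left alone.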






   Integrating E-Commerce Events

   • Either product-oriented or visit-oriented
   • Not necessarily a one-to-one correspondence with
     user actions
   • Used to track and analyze the conversion of browsers to
     buyers
   • The major difficulty with e-commerce events is defining
     and implementing the events for a site
        – however, in contrast to clickstream data, getting
          reliable preprocessed data is not a problem
   • Another major challenge is the successful
     integration with clickstream data




Product-Oriented Events

   • Product View
        – Occurs every time a product is displayed on a
          pageview
        – Typical Types: Image, Link, Text
   • Product Click-through
        – Occurs every time a user “clicks” on a product to get
          more information
             • Category click-through
             • Product detail or extra detail (e.g. large image) click-
               through
             • Advertisement click-through








   Product-Oriented Events

   • Shopping Cart Changes
        – Shopping Cart Add or Remove
        – Shopping Cart Change - quantity or other feature (e.g.
          size) is changed
   • Product Buy or Bid
        – Separate buy event occurs for each product in the
          shopping cart
        – Auction sites can track bid events in addition to the
          product purchases




Content and Structure Preprocessing

   • Processing the content and structure of the site is
     often essential for successful usage analysis
   • Two primary tasks:
        – determine what constitutes a unique page file (i.e.,
          pageview)
        – represent content and structure of the pages in a
          quantifiable form








   Content and Structure Preprocessing

   • Basic elements in content and structure processing
        – creation of a site map
             • captures linkage and frame structure of the site
             • also needs to identify script templates for dynamically
               generated pages
        – extracting important content elements in pages
             • meta-information, keywords, internal and external links,
               etc.
        – identifying and classifying pages based on their
          content and structural characteristics




Quantifying Content and Structure

   • Static Pages
       – All of the information is contained within the HTML files for
         a site
       – Each file can be parsed to get a list of links, frames,
         images, and text
       – Files can be obtained through the file system, or HTTP
         requests from an automated agent (site spider)
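For static pages, the parsing step can be sketched with Python's standard `html.parser` module. The class name `PageParser` and the choice of tags (`a`, `frame`/`iframe`, `img`) are assumptions made for illustration. A real site spider would also resolve relative URLs and fetch each file.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collect links, frames, images, and visible text from one static page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []
        self.frames: list[str] = []
        self.images: list[str] = []
        self.text: list[str] = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("frame", "iframe") and attrs.get("src"):
            self.frames.append(attrs["src"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not script or style content.
        if not self._skip and data.strip():
            self.text.append(data.strip())
```

After `parser.feed(html)`, the four lists give a simple quantified view of one page. They can be merged across all files to build the site map.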








   Quantifying Content and Structure

   • Dynamic Pages
        – Pages do not exist until they are created due to a
          specific request
        – Relevant information can come from a variety of
          sources: Templates, databases, scripts, HTML, etc.
        – Three methods of obtaining content and structure
          information:
             • Series of HTTP requests from a site mapping tool
             • Compile information from internal sources
             • Content server tools




Integrating content and structure I
  Domain knowledge: content
     - purpose: group pages by their content
     - method: analyze text, meta-tags, and/or URL (query string)
     - grouping by classification or clustering

     Concept hierarchies (example of a content-based concept hierarchy):
        Entertainment
          - Performing Arts
          - Music
              - Artists
              - Genres
                  - Blues
                  - Jazz
                  - New Age
                  - ...
              - New Releases
              - ...
          - ...





 Integrating content and structure II

  Content profiles from feature clusters
  1. vector space model: each unique word in the corpus = one dimension;
     each page(view) is a vector with a non-zero weight for each word
     in that page(view) and zero weight for all other words

  2. feature-pageview matrix                    (note: "feature" = word,
                                                       "pageview" because of frames)
                 music   jazz   artist   ...
       pv1       1.00    0.80   0.05
       pv2       1.00    0.00   0.70
       ...
  3. features as weighted vectors of pageviews
       jazz = [ <pv1,0.80>, <pv2,0.00>, ... ]

  4. group features -> feature clusters -> content profiles

Integrating content and structure III
  Structure
     - purpose: group pages by their hyperlink structure
     - ex. page types in Pirolli et al. [54; B24] and Cooley et al. [15; B20]:

          head, navigation, content, look-up, personal
     - ex. path distance to a reference page (here A.html)
            A.html    ->    B.html    ->    C.html
                            dA = 1          dA = 2

     - structure as weighted vector of page(view)s
       S = [ <A.html,0>, <B.html,1>, <C.html,0>, ... ]  (only B is a content page)
       S = [ <A.html,0>, <B.html,1>, <C.html,2>, ... ]  (path distances)
     - grouping by classification or clustering
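The path-distance vector can be computed with a breadth-first search over the site graph. The link structure below is a hypothetical three-page site that mirrors the A.html/B.html/C.html example. Pages that cannot be reached from the reference page are simply left out.

```python
from collections import deque


def path_distances(links, reference):
    """Breadth-first search: fewest link hops from `reference` to each page.

    links: dict page -> list of pages it links to. Pages not reachable from
    `reference` do not appear in the result.
    """
    dist = {reference: 0}
    queue = deque([reference])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


# Hypothetical site matching the A.html -> B.html -> C.html example.
SITE = {"A.html": ["B.html"], "B.html": ["C.html"], "C.html": []}
S = sorted(path_distances(SITE, "A.html").items())
# S == [("A.html", 0), ("B.html", 1), ("C.html", 2)]
```

BFS is used because every link has the same cost, so the first time a page is reached gives its shortest distance.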





 Relating content and structure to mined usage I :
 Content/structure mining as pre-/post-processing steps
 Ex. online catalog search (Berendt & Spiliopoulou [8, 6; B18, B17]):

     1. service-based concept hierarchy: which query options?
          level 1:  Info on schools
          level 2:  indiv. school  |  list of schools  |  ...
          level 3:  1 parameter    |  2 par.s          |  3 parameters
          level 4:  Location, Name, ...  |  Location+Name, ...  |  ...




Relating content and structure to mined usage I
  2. discovering and comparing navigation patterns in classified pages
       part of a resulting WUM navigation pattern:








 Relating content and structure to mined usage I
  Ex. WebSIFT Information Filter (from Cooley [14; B19]):

  Mined knowledge            Domain knowledge   Interesting belief example
                             source
  general usage statistics   site structure     The head page is not the most
                                                common entry point
  general usage statistics   site content       A page designed to provide
                                                content is being used as a
                                                navigation page
  frequent itemsets          site structure     A set of pages is frequently
                                                accessed together, but not
                                                directly linked
  usage clusters             site content       A usage cluster contains
                                                pages from multiple content
                                                categories

  => discover patterns at different levels of abstraction, discover
     deviations from intended usage
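Two of the belief checks in the table can be sketched directly. The function names and data layout are assumptions: sessions are lists of pages, and links map each page to the set of pages it links to. "Not directly linked" is interpreted here as "contains at least one pair of pages with no hyperlink in either direction".

```python
from collections import Counter
from itertools import combinations


def head_page_not_top_entry(sessions, head_page):
    """Belief check: the head page is not the most common entry point."""
    entries = Counter(s[0] for s in sessions if s)
    if not entries:
        return False
    top_page, _ = entries.most_common(1)[0]
    return top_page != head_page


def frequent_but_unlinked(itemsets, links):
    """Belief check: frequent itemsets whose pages are not directly linked.

    itemsets: iterable of frozensets of pages; links: dict page -> set of
    pages it links to. An itemset is reported if some pair of its pages has
    no hyperlink in either direction.
    """
    def linked(a, b):
        return b in links.get(a, set()) or a in links.get(b, set())

    return [s for s in itemsets
            if any(not linked(a, b) for a, b in combinations(sorted(s), 2))]
```

The usage-cluster check in the last row would follow the same pattern, comparing each cluster's pages against a page-to-category map.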

Relating content and structure to mined usage II :
 Usage, content, and structure mining as 3 ways
 of deriving a common kind of representation
      Mobasher, Dai, Luo, Sun, & Zhu [44; B22]
      - a vector of tuples <pageview,weight>:
          usage: sessions / visits, or parts of them (past + current)
          content: features
          structure: pages and their characteristics
      - unordered or ordered collections

      => identify clusters that are similar,
         where similarity is by usage, content, or structure






     Pattern Discovery for Personalization




PKDD 2001 Tutorial: “KDD for Personalization”, Myra Spiliopoulou, HHL      [PD-1]
We identify the following aspects of the personalization services, which
can be envisaged as the result of pattern discovery:

     Visibility                                  Service element
      • personal recommendation                   • (link to) page
      • silent dynamic adjustment                 • application object
      • static page/site adjustment
     Matching based on                           Acquisition & installation
      • user profiles                             • all steps on-line
      • user ratings                              • off-line pattern discovery
      • user behaviour                              & on-line matching
      • content of objects




     Pattern Discovery: Adaptive Web Sites
                                          The approach of Perkowitz & Etzioni
     The IndexFinder consists of three phases:
     1.   Log processing: establishment of sessions as sets of page requests
     2.   Cluster mining: grouping of co-occurring non-linked pages with the
          help of the site graph
     3.   Conceptual clustering:
            – The representative concept of each cluster is identified.
            – Cluster members not adhering to this concept are removed from
              the cluster.
            – Pages adhering to this concept and not appearing in the cluster
              are attached to the cluster.
For each cluster, the IndexFinder presents to the Web designer
         • an index page with links to all pages of the cluster
     The Web designer decides
      • whether the new page should be established
      • what its label should be
      • where it should be located in the site

             According to our categorization
       Visibility: static page/site adjustment     Service element: page containing
                                                   a single application object
       Matching based on user behaviour            Off-line pattern discovery
       and page content




     Pattern Discovery for Recommendations

      The collaborative filtering approach
     Main idea: The objects suggested to a user are those preferred by users
     similar to her.
           1. The user's transaction is matched against logged transactions.
           2. The matches are ranked.
           3. The best (set of) match(es) are selected.
           4. The objects that were shown in the selected transactions are
              ranked, excluding objects already seen.
           5. The objects with the highest rank are shown to the user.
       All steps on-line
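The five on-line steps can be sketched as a small user-based collaborative filter. Transactions are modelled as sets of objects. The match score (Jaccard similarity), the neighbourhood size `k`, and the ranking by summed similarity are assumptions of this sketch, not details given on the slide.

```python
def jaccard(a, b):
    """Similarity of two transactions (sets of objects)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_cf(active, logged, k=2, n=3):
    """User-based collaborative filtering following steps 1-5 above.

    active: set of objects in the user's current transaction.
    logged: list of sets, the logged transactions of other users.
    """
    # 1. Match the active transaction against every logged transaction.
    matches = [(jaccard(active, t), t) for t in logged]
    # 2. Rank the matches (highest similarity first).
    matches.sort(key=lambda m: m[0], reverse=True)
    # 3. Select the k best matches with non-zero similarity.
    best = [(sim, t) for sim, t in matches[:k] if sim > 0]
    # 4. Rank objects from the selected transactions, excluding those already seen.
    scores = {}
    for sim, t in best:
        for obj in t - active:
            scores[obj] = scores.get(obj, 0.0) + sim
    ranked = sorted(scores, key=lambda o: (-scores[o], o))
    # 5. Return the n highest-ranked objects.
    return ranked[:n]
```

Because every step runs per request against the full log, the cost grows with the number of logged transactions. This is the main contrast with the data mining approach that follows.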
Pattern Discovery for Recommendations
     The data mining approach
     Main idea: User similarity is defined in terms of behaviour,
      interests, and preferences that can be modelled off-line.
          1. Pattern discovery over the log data
          2. The contents of the user's transaction are matched against
             the discovered patterns.
          3. The matches are ranked.
          4. The objects associated with the best matches are ranked,
             excluding objects already seen.
          5. The objects with the highest rank are shown to the user.
     ⇒ Pattern discovery over the voluminous log data is performed off-line.
     ⇒ On-line matching against the discovered patterns.




     Pattern Discovery: Recommendations based on correlated items
                              The approach of Vucetic & Obradovic

     The recommendation problem is defined as:
              Given the ratings of the active user on a set of items, which will
                 be her ratings on the remaining items?

     Main idea:   The ratings of an item can be predicted from the ratings
                  on correlated items.

      Visibility: personal recommendation       Service element: application object
         Matching based on ratings              Off-line discovery of predictors for the
         of correlated items                    impact of item correlation on ratings
Methodology
      • The rating of an item given another item is approximated using
          a linear function (named expert).
      • The average correlation among pairs of items is approximated using
          random sampling over the user ratings.
      • A weighting scheme is proposed to deal with the fact that users with
          similar preferences may provide different ratings for the same set of
          items.

     In this scheme
      • The linear experts for all pairs of items can be computed off-line.
      • The ratings for an active user are predicted from the set of pairs of
          items rather than the set of user ratings.
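The linear-expert idea can be sketched as follows. One least-squares line per ordered item pair is fitted off-line, and on-line the experts for the target item are combined. The weighting scheme of Vucetic & Obradovic is not reproduced here. This sketch simply averages the experts' predictions, which is an assumption, and `min_common` is a hypothetical parameter.

```python
def fit_expert(pairs):
    """Least-squares line r_j ~ a + b * r_i from (r_i, r_j) rating pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def train_experts(ratings, min_common=2):
    """Off-line step: one linear expert per ordered item pair (i -> j).

    ratings: dict user -> dict item -> rating. A pair gets an expert only if
    at least `min_common` users rated both items.
    """
    items = {item for user in ratings.values() for item in user}
    experts = {}
    for i in items:
        for j in items:
            if i == j:
                continue
            pairs = [(u[i], u[j]) for u in ratings.values() if i in u and j in u]
            if len(pairs) >= min_common:
                experts[(i, j)] = fit_expert(pairs)
    return experts


def predict(experts, active, target):
    """On-line step: average the predictions of all experts whose input item
    the active user has rated (uniform weights, an assumption of this sketch)."""
    preds = []
    for item, rating in active.items():
        if (item, target) in experts:
            a, b = experts[(item, target)]
            preds.append(a + b * rating)
    return sum(preds) / len(preds) if preds else None
```

The on-line step only touches experts indexed by the items the active user rated. This is why prediction works from the set of item pairs rather than scanning the ratings of all users.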






     Pattern Discovery: Repeat-buying theory for personalization
                                   The approach of Geyer-Schulz et al.

     Main idea:
      ⇒ Recommendations are based on correlated products.
      ⇒ Correlations can be identified with Ehrenberg's repeat-buying theory,
        after adjusting it to the particularities of anonymous user sessions.

          According to our categorization
         Visibility: recommendation of              Service element: application
         information products                       object or URL
         Matching based on user preferences         Off-line discovery of correlated
         for application objects                    application objects
Ehrenberg's repeat-buying theory

        – predicts buyer behaviour from (a) penetration and (b) average
          purchase frequency of an item
        – by providing a reference model that characterizes repeated
          co-occurring purchases of items as random or not random

     where
        penetration refers to the preference of a customer for a brand, and
        average purchase frequency refers to repeated purchases of the
          item, ignoring characteristics of the item, amount of the item and
          size of the purchase as a whole.




      Assumptions of Geyer-Schulz et al.

      • The probability of r co-occurrences of two products in subsequent
          purchases follows a logarithmic series distribution.
      • Subsequent purchases of the same customer(s) can be observed as
          equivalent to a set of purchase sessions during the log period.

     Methodology

         • Computation of the frequency distributions of all co-occurrences of
           product pairs, counting one co-occurrence per session only
         • Elimination of distributions with a small number of observations
         • Elimination of the … percentile of the repeat-buy pairs
         • Computation of the co-occurrence predictor for each pair

     ⇒ The outliers of each predictor can be observed as correlated items.
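The role of the logarithmic series distribution (LSD) can be sketched as follows. The LSD is P(R = r) = -q^r / (r ln(1 - q)) for r ≥ 1 and 0 < q < 1. For a fixed product X, the sketch fits q to the mean number of sessions in which each partner product co-occurred with X. It then flags partners whose counts lie in a tail where fewer than one pair would be expected by chance. This tail rule is an illustrative assumption, not the exact test used by Geyer-Schulz et al.

```python
import math


def lsd_pmf(r, q):
    """Logarithmic series distribution: P(R = r) for r >= 1 and 0 < q < 1."""
    return -q ** r / (r * math.log1p(-q))


def lsd_mean(q):
    """Mean of the LSD, -q / ((1 - q) ln(1 - q)); it grows from 1 towards infinity."""
    return -q / ((1 - q) * math.log1p(-q))


def fit_q(mean, tol=1e-12):
    """Find q whose LSD mean equals `mean` (> 1) by bisection."""
    lo, hi = 1e-12, 1 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lsd_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def outliers(cooc):
    """cooc: partner product -> number of sessions in which it co-occurred
    with a fixed product X (counted once per session).

    Returns partners whose counts fall in the tail P(R >= r_star) where fewer
    than one of the observed pairs would be expected under the fitted LSD.
    """
    counts = list(cooc.values())
    n = len(counts)
    q = fit_q(sum(counts) / n)
    r_star, tail = 1, 1.0            # tail holds P(R >= r_star)
    while n * tail >= 1:
        tail -= lsd_pmf(r_star, q)
        r_star += 1
    return sorted(p for p, r in cooc.items() if r >= r_star)
```

For ten partners with counts 1, 1, 1, 1, 1, 1, 1, 2, 2, and 12 (mean 2.3), the fitted q is about 0.77. Only the partner with 12 co-occurrences falls in the tail and is reported as a correlated item.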



Pattern Discovery: Association mining for personalization


   Basic Idea: match the left-hand side of rules against the active user
   session and recommend the items in the rules' consequents
   Essential to store patterns in efficient data structures
       • searching all rules in real time is computationally
         inefficient
   The ordering of accessed pages is not taken into account

   Good recommendation accuracy, but the main problem is
   “coverage”
       • high support thresholds lead to low coverage and may
         eliminate important but infrequent items from consideration
       • low support thresholds result in very large model sizes and
         a computationally expensive pattern discovery phase
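The basic idea can be sketched as a naive matcher. It scans every rule on each request, which is exactly the real-time cost the slide warns about. Each recommended item is scored by the highest confidence among the rules that fire. That aggregation is an assumption, and other choices are possible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset   # left-hand side X
    consequent: frozenset   # right-hand side Y
    confidence: float


def recommend(rules, session, top_n=3):
    """Fire every rule whose left-hand side is contained in the active session
    and recommend unseen consequent items, each scored by the highest
    confidence among the rules that recommend it."""
    session = frozenset(session)
    scores = {}
    for rule in rules:                      # linear scan over all rules
        if rule.antecedent <= session:
            for item in rule.consequent - session:
                scores[item] = max(scores.get(item, 0.0), rule.confidence)
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

The coverage problem also shows up here. If no rule's left-hand side is contained in the session, the result is empty, and high support thresholds make that more likely.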





   Association Mining - Basic Concepts
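   We start with a set I of items and a set D of transactions.
   A transaction T is a set of items (a subset of I):

                      $I = \{ i_1, i_2, \ldots, i_m \}, \qquad T \subseteq I$
   An association rule is an implication on itemsets X and Y,
   denoted by X ==> Y, where
                        $X \subseteq I, \quad Y \subseteq I, \quad X \cap Y = \emptyset$
   The rule meets a minimum confidence c if at least a fraction c of
   the transactions in D that contain X also contain Y. In addition,
   the itemset X ∪ Y must satisfy a minimum support s:
                    $s \le |X \cup Y| \,/\, |D|, \qquad c \le |X \cup Y| \,/\, |X|$
   where |X ∪ Y| and |X| denote the numbers of transactions in D that
   contain X ∪ Y and X, respectively, and |D| is the number of transactions.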
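Both thresholds can be checked with a brute-force sketch that enumerates small itemsets. This is for illustration only. It is not Apriori or any other efficient algorithm, and `max_size` is a hypothetical cap on itemset length.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions in D that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules(transactions, min_support, min_confidence, max_size=3):
    """Enumerate rules X ==> Y (X, Y disjoint, non-empty) meeting both thresholds.

    Returns a list of (X, Y, support, confidence) with support = sup(X u Y)
    and confidence = sup(X u Y) / sup(X).
    """
    items = sorted(set().union(*transactions))
    found = []
    for k in range(2, max_size + 1):
        for combo in combinations(items, k):
            xy = frozenset(combo)
            s = support(xy, transactions)
            if s < min_support:
                continue
            for j in range(1, k):
                for lhs in combinations(combo, j):
                    x = frozenset(lhs)
                    c = s / support(x, transactions)
                    if c >= min_confidence:
                        found.append((x, xy - x, s, c))
    return found
```

For D = {A, B, C}, {A, B}, {A, C}, {B, C}, the rule A ==> B has support 2/4 = 0.5 and confidence 0.5 / 0.75, about 0.67. With s = 0.5 and c = 0.6, all six two-item rules qualify, and {A, B, C} (support 0.25) is discarded.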


Pattern Discovery: Associating items and users
                                      The approach of Lin, Alvarez & Ruiz
     Main idea:
      ⇒ Users are associated to each other in terms of how they rate items.
      ⇒ Items are associated to each other with respect to user preferences.
      Associations among items can be found off-line.
      Associations to the active user can be found on-line.
          According to our categorization
      Visibility: personal recommendation       Service element: application object
         Matching based on associations         On-line discovery of assoc.
         among items and among users            rules with a given RHS




     Methodology
      • Recommendations are subject to minimum confidence and minimum
           number of rules constraints.
      • The miner discovers association rules iteratively, until the desired
           number of rules is extracted.
           The support is adjusted automatically in each iteration.
      • Rules concern both items and users:
                     User1: like AND User2: dislike ⇒ TargetUser: like
                     Item1: like AND Item2: like ⇒ TargetItem: like
      • Candidate items are computed from associations involving users
           similar to the active user.                             (on-line)
      • Scores of items are computed from associations reflecting user
           preferences.                                            (off-line)
      • The candidate items with the best scores are suggested to the active
           user.                                                   (on-line)
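The adaptive-support loop can be sketched for rules with a single fixed consequent (the target item). The sketch has several limits, all of them assumptions rather than the parameters of Lin, Alvarez & Ruiz:

- Only item-side rules are shown.
- Left-hand sides are limited to two items.
- The halving schedule, the starting value, and `floor` are chosen for illustration.

```python
from itertools import combinations


def rules_for_target(transactions, target, min_support, min_conf, max_lhs=2):
    """All rules X ==> target with 1 <= |X| <= max_lhs meeting both thresholds.

    transactions: list of sets (e.g., the items a user liked).
    Returns a list of (X, support, confidence).
    """
    n = len(transactions)
    items = sorted(set().union(*transactions) - {target})
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(items, size):
            x = set(lhs)
            cover_x = sum(1 for t in transactions if x <= t)
            cover_xy = sum(1 for t in transactions if x <= t and target in t)
            if cover_x and cover_xy / n >= min_support and cover_xy / cover_x >= min_conf:
                found.append((frozenset(x), cover_xy / n, cover_xy / cover_x))
    return found


def adaptive_rules(transactions, target, wanted, min_conf, start=0.5, floor=0.01):
    """Halve the minimum support until at least `wanted` rules are found
    (or the support would drop below `floor`)."""
    support = start
    while True:
        found = rules_for_target(transactions, target, support, min_conf)
        if len(found) >= wanted or support / 2 < floor:
            return support, found
        support /= 2
```

Fixing the right-hand side keeps each mining pass small. This is what makes repeated passes with a lower support affordable.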

Pattern Discovery: Association mining for personalization
                            The approach of Mobasher et al., 2001 [45]


   Main Idea: avoid the off-line generation of all association rules;
   generate recommendations directly from itemsets
      • discovered frequent itemsets are stored in a “frequent itemset
        graph” (an extension of the lexicographic tree structure of
        Agrawal et al., 1999 [2])
      • recommendation generation can be done in constant time
        by doing a directed search to a limited depth

  According to our categorization
   Visibility: Personal recommendations         Service element: pageview
   or silent dynamic adjustment

   Matching based on: user behaviour





   Methodology:

   • Construct a frequent itemset graph
         – each node at depth d in the graph corresponds to an itemset
           I of size d and is linked to the itemsets of size d+1 (at
           level d+1) that contain I
         – the single root node at level 0 corresponds to the empty
           itemset

   • frequent itemsets are matched against a user's active
     session S by performing a search of the graph to depth |S|
   • a recommendation r is an item at level |S|+1 whose
     recommendation score is the confidence of the rule S ==> r
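The itemset-graph lookup can be sketched with itemsets stored as frozensets together with their support counts. Every itemset is linked to each of its frequent supersets of size d+1. The session is matched by descending one level per session item. The children of the matched node then yield recommendations scored by conf(S ==> r) = sup(S ∪ {r}) / sup(S). The data layout is an assumption, and the sketch matches only the whole session, with no subsets or sliding window.

```python
def build_itemset_graph(frequent):
    """Link every frequent itemset of size d to each frequent superset of size d+1.

    frequent: dict frozenset -> support count; it must contain frozenset()
    (the root, whose count is the number of transactions).
    Returns: node -> {added item: child itemset}.
    """
    graph = {itemset: {} for itemset in frequent}
    for itemset in frequent:
        for item in itemset:
            parent = itemset - {item}
            if parent in graph:
                graph[parent][item] = itemset
    return graph


def recommend(frequent, graph, session):
    """Descend to depth |S|, then score each child r by conf(S ==> r)."""
    node = frozenset()
    for item in sorted(session):        # one level per session item
        node = graph[node].get(item)
        if node is None:
            return {}                   # the session itemset is not frequent
    return {r: frequent[child] / frequent[node] for r, child in graph[node].items()}
```

The lookup cost depends on |S| and on the number of children of the matched node, not on the total number of itemsets. That property is what the "limited depth" search relies on.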



Pattern Discovery: Sequence mining for personalization


   Main Idea: take the ordering of accessed items into account
   Two basic approaches
        • use contiguous sequences (e.g., Web navigational patterns)
        • use general sequential patterns
      Contiguous sequential patterns are often modeled as
      Markov chains and used for prefetching (i.e., predicting
      the next user access based on previously accessed pages).
      In the context of recommendations, they can achieve higher
      accuracy than other methods, but it may be difficult to obtain
      reasonable coverage.
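A contiguous-sequence predictor in its simplest form is a first-order Markov chain. The sketch estimates transition probabilities by counting page-to-next-page transitions in sessions. It returns `None` for a page that was never followed by another page, which illustrates the coverage problem. Function names are illustrative.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """First-order Markov chain: counts of page -> next page transitions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current):
    """Most probable next page and its estimated probability (None if unseen)."""
    counts = transitions.get(current)
    if not counts:
        return None
    page, n = counts.most_common(1)[0]
    return page, n / sum(counts.values())
```

For the sessions A B C, A B D, A B C, and B C, the model predicts C after B with probability 0.75.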







   Pattern Discovery Sequence mining for personalization

  Markov chain representation often leads to high space
  complexity due to model sizes
  Some Solutions
    • selective Markov models (Deshpande & Karypis, 2000 [17])
         use various pruning strategies to reduce the number of states
         (e.g., support or confidence pruning, error pruning)
    • longest repeating subsequences (Pitkow & Pirolli, 1999 [])
         similar to support pruning, used to focus only on significant
         navigational paths
    • increased coverage can be achieved by using all-Kth-order
      models (i.e., using all possible sizes for user histories)
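Support pruning and all-Kth-order back-off can be combined in one sketch. Counts are kept for every history length up to `max_order`. The longest history whose state was seen at least `min_support` times is used, and shorter histories are tried otherwise. The parameter names are hypothetical, and support pruning is only one of the pruning strategies listed above.

```python
from collections import Counter, defaultdict


def train_all_kth(sessions, max_order):
    """Count next-page frequencies for every history length 1..max_order."""
    model = defaultdict(Counter)  # history tuple -> Counter of next pages
    for s in sessions:
        for i in range(1, len(s)):
            for k in range(1, max_order + 1):
                if i - k < 0:
                    break
                model[tuple(s[i - k:i])][s[i]] += 1
    return model


def predict(model, history, max_order, min_support):
    """Try the longest usable history first and back off to shorter ones.

    A state is used only if it was observed at least `min_support` times
    (support pruning). Returns (predicted page, order used) or None.
    """
    for k in range(min(max_order, len(history)), 0, -1):
        counts = model.get(tuple(history[-k:]))
        if counts and sum(counts.values()) >= min_support:
            return counts.most_common(1)[0][0], k
    return None
```

The back-off is what recovers coverage. A rare second-order state such as (Y, B) is pruned, and the prediction falls back to the well-supported first-order state (B).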

Pattern Discovery: Sequence mining for personalization
                      The approach of Gaul & Schmidt-Thieme
     Main idea:
      ⇒ Recommendations are based on frequent patterns of past behaviour.
      ⇒ A recommender is a predictor for a class of events.
      ⇒ The constellation of the recommenders for all classes returns the
        best recommendations for a given user history.

             According to our categorization
     Visibility: recommendation                       Service element: URLs, sets of objects
         Matching based on navigation                 Off-line training of classifiers
         histories and URL proximity                  (local recommender systems)




             A generic framework

         • with measures for the quality of a recommendation, taking the
           distance between URLs into account
         • distinguishing between dynamic and static recommenders that
           do/do not take the user history into account
         • combining local recommender systems, each of which predicts a
           class of events

     where a class can be one user history, a group of histories or the
         whole dataset.

     Thereby, a navigation history is
      • a set of events
      • a sequence of events
      • a more complex structure of co-occurring events
Pattern Discovery: Usage profiles for personalization
                                          The approach of Mobasher et al. [43, 42]
     Two types of usage profiles:
          clusters of similar user transactions    |    clusters of pages accessed together
     Aggregating the members of a cluster into one representative profile,
     defined by a weighting scheme that removes pages with support less than
     a minimum value

          According to our categorization
       Visibility: personal recommendation            Service element: pageview
       or silent dynamic adjustment
       Matching based on user behaviour               Off-line discovery of
       (also page content)                            aggregate profiles




                   ⇒ achieves similar performance to on-line collaborative filtering
                   ⇒ using a minimal number of pageviews for the active user

     Methodology
      • Preprocessing phase
           – Assignment of weights to the pageviews
           – Significance testing, based on page stay time
           – Normalization of pageview weights
      • PACT: Profile Aggregation based on Clustering Techniques
           1. Clustering of usage data to establish the aggregate profiles
           2. Materialization of the profiles as vectors of (page, weight) pairs
           3. Scan of the user's history by means of a sliding window that
              allows only a set of page accesses to be considered in the profile
           4. Matching the user session with each profile
           5. Match ranking
A Framework for Personalization Based on
   Aggregate Profiles




                             [Figure: Offline Phase]





   A Framework for Personalization Based on
   Aggregate Profiles

        [Figure: Online Phase, with input from the batch process: Usage Profiles, Content Profiles]




   •   Match current user’s activity against the discovered profiles
   •   Each recommended item is assigned a score based on
        – matching criteria and quality of aggregate profiles
        – “information value” of the item based on domain knowledge

Aggregate Profiles Based on Clustering
   Transactions (PACT) (Mobasher et al. [42, 43])

   •   Input
        – set of relevant pageviews in preprocessed log

                           $P = \{p_1, p_2, \ldots, p_n\}$
        – set of user transactions

                               $T = \{t_1, t_2, \ldots, t_m\}$

        – each transaction is a pageview vector

                     $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$
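Below is a minimal Python sketch of this input representation. It assumes binary weights: w(p, t) = 1 if pageview p occurs in transaction t, and 0 otherwise. The slide does not fix the weighting, so weights based on time spent would also fit. The function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence


def transaction_vectors(
    transactions: Sequence[Sequence[str]],
    pageviews: Sequence[str],
) -> List[List[float]]:
    """Map each transaction t to <w(p1, t), ..., w(pn, t)>.

    Assumption: binary weights, w(p, t) = 1.0 if p occurs in t, else 0.0.
    """
    index: Dict[str, int] = {p: i for i, p in enumerate(pageviews)}
    vectors: List[List[float]] = []
    for t in transactions:
        vec = [0.0] * len(pageviews)
        for page in t:
            if page in index:  # pageviews outside P are ignored
                vec[index[page]] = 1.0
        vectors.append(vec)
    return vectors


P = ["A.html", "B.html", "C.html", "D.html"]
T = [["A.html", "C.html"], ["B.html", "C.html", "D.html"]]
print(transaction_vectors(T, P))  # [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
```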








   Aggregate Profiles Based on Clustering
   Transactions (PACT)

   •   Transaction Clusters
        – each cluster contains a set of transaction vectors

        – for each cluster compute centroid as cluster
          representative
                              $\vec{c} = \langle u_1^c, u_2^c, \ldots, u_n^c \rangle$



   •   Aggregate Usage Profiles
        – a set of pageview-weight pairs: for transaction cluster
          C, select each pageview $p_i$ such that $u_i^c$ (in the cluster
          centroid) is greater than a pre-specified threshold
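The centroid-and-threshold step can be sketched as follows. The transaction clusters are taken as given: the slide does not fix the clustering algorithm, and this code does not run one. The helper `aggregate_profile` and the threshold value 0.5 are illustrative assumptions.

```python
from typing import Dict, Sequence


def aggregate_profile(
    cluster: Sequence[Sequence[float]],
    pageviews: Sequence[str],
    threshold: float,
) -> Dict[str, float]:
    """Return the pageview-weight pairs of one cluster's centroid above a threshold."""
    m = len(cluster)
    # u_i^c: mean weight of pageview p_i over the transactions in cluster C
    centroid = [sum(t[i] for t in cluster) / m for i in range(len(pageviews))]
    profile = {p: u for p, u in zip(pageviews, centroid) if u > threshold}
    # list the most significant pageviews first, as in the example profiles
    return dict(sorted(profile.items(), key=lambda kv: kv[1], reverse=True))


P = ["A.html", "B.html", "C.html", "D.html"]
cluster = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 0.0]]
for page, weight in aggregate_profile(cluster, P, threshold=0.5).items():
    print(f"{weight:.2f}  {page}")
```

Pageviews whose centroid weight does not exceed the threshold (here D.html, with weight 0) are left out of the profile.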



Example Aggregate Profiles

   •     Example Profiles based on the PACT method
          – Based on data from the Association for Consumer
            Research Site:

        1.00   Call for Papers
        0.67   ACR News Special Topics
        0.67   CFP: Journal of Psychology and Marketing I
        0.67   CFP: Journal of Psychology and Marketing II
        0.67   CFP: Journal of Consumer Psychology II
        0.67   CFP: Journal of Consumer Psychology I

                       1.00   CFP: Winter 2000 SCP Conference
                       1.00   Call for Papers
                       0.36   CFP: ACR 1999 Asia-Pacific Conference
                       0.30   ACR 1999 Annual Conference
                       0.25   ACR News Updates
                       0.24   Conference Update







       Hypergraph-Based Clustering
       (Han, Karypis, Kumar, Mobasher, 1998 [26])



   •     Construct a hypergraph from
         sets of related items
          – Each hyperedge represents a
            frequent itemset

          – Weight of each hyperedge can
            be based on the characteristics
            of frequent itemsets or
            association rules (e.g.,
            support, confidence, interest,
            etc.)




Hypergraph-Based Clustering

  •     Recursively partition hypergraph so that each partition
        contains only highly connected items
         – Given a hypergraph we find a k-way partitioning such
           that the weight of the hyperedges that are cut is
           minimized
         – The fitness of partitions is measured in terms of the ratio
           of the weights of cut edges to the weights of uncut edges
           within the partitions
         – The connectivity measures the percentage of edges
           within the partition with which the vertex is associated --
           used for filtering partitions
         – Vertices from partial edges can be added back to
           clusters based on a user-specified overlap factor







       Profiles Based on Hypergraph Clusters
       (Mobasher, Cooley, Srivastava, 1999 [41])


   •     Input
          – input for clustering is the set of large itemsets from
            association rule module

          – each itemset is a hyperedge (weights are a function of
            the interest of the itemset)

                        $\mathrm{Interest}(I) = \dfrac{\mathrm{support}(I)}{\prod_{i \in I} \mathrm{support}(i)}$
          – In practice, the log of interest can be used to keep a few
            highly frequent patterns from dominating entirely
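Below is a small sketch of this hyperedge weighting. It assumes supports are given as fractions of transactions, in a dict keyed by frozenset; all names are hypothetical. It uses `math.prod`, which requires Python 3.8 or later. Note that the log of an interest below 1 is negative. A real implementation may therefore need to keep only itemsets with interest above 1; this is an assumption, not something the slide states.

```python
import math
from typing import Dict, FrozenSet

Itemset = FrozenSet[str]


def interest(itemset: Itemset, support: Dict[Itemset, float]) -> float:
    """Interest(I) = support(I) / product of support(i) over the items i in I."""
    expected = math.prod(support[frozenset([i])] for i in itemset)
    return support[itemset] / expected


def hyperedge_weights(support: Dict[Itemset, float], use_log: bool = True) -> Dict[Itemset, float]:
    """Weight every frequent itemset with at least two items as a hyperedge."""
    weights: Dict[Itemset, float] = {}
    for itemset in support:
        if len(itemset) < 2:
            continue  # single items are vertices, not hyperedges
        value = interest(itemset, support)
        weights[itemset] = math.log(value) if use_log else value
    return weights


support = {
    frozenset(["A.html"]): 0.4,
    frozenset(["B.html"]): 0.5,
    frozenset(["A.html", "B.html"]): 0.3,
}
print(hyperedge_weights(support, use_log=False))  # interest of {A, B} is about 1.5
```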




Profiles Based on Hypergraph Clusters


   • Aggregate Profiles (Item/Pageview Clusters)
        – clustering program directly outputs a set of
          overlapping pageview clusters

        – the weight associated with pageview p in a cluster
          C is based on the connectivity value of p in
          hypergraph partition:

                     $\mathrm{conn}(p, C) = \dfrac{|\{e \mid e \subseteq C,\ p \in e\}|}{|\{e \mid e \subseteq C\}|}$
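The connectivity weight can be computed directly from this definition. The sketch below assumes hyperedges are given as frozensets of pageview IDs and clusters as sets. `connectivity` is a hypothetical helper name.

```python
from typing import FrozenSet, Iterable, Set


def connectivity(p: str, cluster: Set[str], hyperedges: Iterable[FrozenSet[str]]) -> float:
    """conn(p, C): share of the hyperedges lying inside C that contain p."""
    inside = [e for e in hyperedges if e <= cluster]  # {e | e ⊆ C}
    if not inside:
        return 0.0
    containing_p = sum(1 for e in inside if p in e)  # {e | e ⊆ C, p ∈ e}
    return containing_p / len(inside)


C = {"A.html", "B.html", "C.html"}
E = [frozenset(s) for s in (["A.html", "B.html"], ["B.html", "C.html"],
                            ["A.html", "C.html"], ["C.html", "D.html"])]
for page in sorted(C):
    print(page, round(connectivity(page, C, E), 2))
```

The edge {C.html, D.html} is not contained in C, so it counts toward neither the numerator nor the denominator. As a result, each page in C gets weight 2/3.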








    Recommendation Engine for Using
    Aggregate Profiles
   •   Match user’s activity against discovered profiles
        – a sliding window over the active session to capture the
          current user’s “short-term” history depth
        – profiles and the active session are treated as vectors
        – matching score is computed based on the similarity
          between vectors (e.g., normalized cosine similarity)
   •   Recommendation scores are based on
             • matching score to aggregate profiles
             • “information value” of the recommended item (e.g., link
               distance of the recommendation to the active session)
        – recommendations are contributed by multiple profiles


Active Session Window

    •   Example: Session window of size 5

    A.html => B.html => C.html => D.html => E.html => D.html => F.html




                   (active user session: all seven pageviews; session window:
                   the last five, C.html => D.html => E.html => D.html => F.html)

    •   Associating weight with items in the active session:
         – assigned by site owner based on perceived importance
         – based on recency (recent pages weighted higher) or
           time spent on pages
         – based on page types (e.g., content v. navigational)
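Below is a sketch of extracting the session window and weighting its items. Unit weights and a linear recency scheme (position divided by window length) are illustrative choices: the slide lists recency weighting but does not fix a formula. `session_window` is a hypothetical name.

```python
from typing import Dict, Sequence


def session_window(session: Sequence[str], size: int, recency: bool = False) -> Dict[str, float]:
    """Weights for the last `size` pageviews of the active session."""
    window = list(session)[-size:]
    weights: Dict[str, float] = {}
    for pos, page in enumerate(window, start=1):
        # assumption: linear recency weight; later positions weigh more
        w = pos / len(window) if recency else 1.0
        # a page seen twice keeps its highest (i.e., most recent) weight
        weights[page] = max(weights.get(page, 0.0), w)
    return weights


s = ["A.html", "B.html", "C.html", "D.html", "E.html", "D.html", "F.html"]
print(session_window(s, 5))
print(session_window(s, 5, recency=True))
```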








  Example: Recommendations Based on PACT
  Current user session U: A.html => B.html => C.html => E.html

   PROFILE 0
   -------------
   1.00   D.html
   0.50   A.html
   0.50   C.html
   0.50   E.html

   PROFILE 1
   -------------
   1.00   A.html
   0.50   B.html
   0.50   C.html
   0.50   D.html
   0.50   E.html
   0.50   F.html

   PROFILE 2
   -------------
   0.75   B.html
   0.75   F.html
   0.50   A.html
   0.50   C.html
   0.25   D.html

   Assume a session window of size 3 (B.html, C.html, E.html) and unit
   weights, and use cosine similarity between the active session and
   each profile:

   Sim(U, P0) = (0.5+0.5) / SQRT(1.75*3) = 0.44
   Sim(U, P1) = (0.5+0.5+0.5) / SQRT(2.25*3) = 0.58
   Sim(U, P2) = (0.75+0.5) / SQRT(1.69*3) = 0.56

   Candidate recommendations, scored as SQRT(similarity * item weight);
   pages already in the window are not candidates:

   P0: D.html (SQRT(0.44*1.00) = 0.66)
       A.html (SQRT(0.44*0.50) = 0.47)

   P1: A.html (SQRT(0.58*1.00) = 0.76)
       D.html (SQRT(0.58*0.50) = 0.54)
       F.html (SQRT(0.58*0.50) = 0.54)

   P2: F.html (SQRT(0.56*0.75) = 0.65)
       A.html (SQRT(0.56*0.50) = 0.53)
       D.html (SQRT(0.56*0.25) = 0.37)
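The example can be reproduced with a short Python script that computes the cosine similarities and candidate scores for the three profiles. Two details are assumptions read off the example rather than stated rules. First, pages already in the window are not recommended. Second, when several profiles suggest the same page, its best score is kept. The function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(page, 0.0) for page, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


def recommend(session: List[str], profiles: List[Vector], window: int = 3) -> Vector:
    active = {page: 1.0 for page in session[-window:]}  # unit weights
    scores: Vector = {}
    for profile in profiles:
        sim = cosine(active, profile)
        for page, weight in profile.items():
            if page in active:
                continue  # already visited within the window
            score = math.sqrt(sim * weight)
            scores[page] = max(scores.get(page, 0.0), score)  # best score across profiles
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


P0 = {"D.html": 1.0, "A.html": 0.5, "C.html": 0.5, "E.html": 0.5}
P1 = {"A.html": 1.0, "B.html": 0.5, "C.html": 0.5, "D.html": 0.5, "E.html": 0.5, "F.html": 0.5}
P2 = {"B.html": 0.75, "F.html": 0.75, "A.html": 0.5, "C.html": 0.5, "D.html": 0.25}
U = ["A.html", "B.html", "C.html", "E.html"]

for page, score in recommend(U, [P0, P1, P2]).items():
    print(f"{page}: {score:.2f}")
```

This prints A.html: 0.76, D.html: 0.66 and F.html: 0.65. A recommendation threshold, if one is used, would be applied to these merged scores.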



Integration of Content Profiles
   (Mobasher, et al., 2000 [44])


   •   Cluster features over the n-dimensional space of pageviews

   •   For each feature cluster derive a content profile by
       collecting pageviews in which these features appear as
       significant (represented as overlapping collections of
       pageview-weight pairs)

  Weight                       Pageview ID                           Significant Features (stems)
   1.00    CFP: One World One Market                          world challeng busi co manag global
   0.63    CFP: Int'l Conf. on Marketing & Development        challeng co contact develop intern
   0.35    CFP: Journal of Global Marketing                   busi global
   0.32    CFP: Journal of Consumer Psychology                busi manag global
  Weight                       Pageview ID                           Significant Features (stems)
   1.00    CFP: Journal of Psych. & Marketing                 psychologi consum special market
   1.00    CFP: Journal of Consumer Psychology I              psychologi journal consum special market
   0.72    CFP: Journal of Global Marketing                   journal special market
   0.61    CFP: Journal of Consumer Psychology II             psychologi journal consum special
   0.50    CFP: Society for Consumer Psychology               psychologi consum special
   0.50    CFP: Conf. on Gender, Market., Consumer Behavior   journal consum market






   Integration of Content Profiles

   •   Integration with Recommendation Engine
        – Usage and content profiles have similar representation,
          so they can be used by the recommendation engine in
          the same way
              • Item weights in profiles must be normalized, so content
                and usage profiles can be compared on the same scale
        – One approach: match active user session with all
          profiles (both content and usage); then use the maximal
          recommendation score for candidate recommendations
        – Another approach: use content profiles for generating
          recommendations only if no matching usage profile
          (with sufficient confidence) is found
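Both integration strategies are easy to state in code. This sketch assumes recommendation scores are already normalized to a common [0, 1] scale, as the slide requires. `min_match` is a hypothetical confidence threshold, and the function names are illustrative.

```python
from typing import Dict

Scores = Dict[str, float]  # page -> recommendation score, assumed normalized to [0, 1]


def combine_max(usage: Scores, content: Scores) -> Scores:
    """Approach 1: match against all profiles and keep each page's maximal score."""
    merged = dict(usage)
    for page, score in content.items():
        merged[page] = max(merged.get(page, 0.0), score)
    return merged


def combine_fallback(usage: Scores, content: Scores,
                     best_usage_match: float, min_match: float) -> Scores:
    """Approach 2: use content profiles only if no usage profile matches well enough."""
    return dict(usage) if best_usage_match >= min_match else dict(content)


usage = {"A.html": 0.7, "B.html": 0.2}
content = {"B.html": 0.5, "C.html": 0.4}
print(combine_max(usage, content))                  # {'A.html': 0.7, 'B.html': 0.5, 'C.html': 0.4}
print(combine_fallback(usage, content, 0.1, 0.3))   # {'B.html': 0.5, 'C.html': 0.4}
```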




Evaluating Personalization








  Evaluating usability: goals / tasks?

  Recall operational definition:
     A Web site’s usability is high if users
     - achieve their goals / perform their tasks in little time,
     - do so with a low error rate,
     - experience high subjective satisfaction.

  Depending on the site, relevant goals / tasks may be to:
  - stay in the site, return to the site, buy... => E-metrics
  - locate content (search),
  - learn,
  - ...


Evaluating usability: methodological caveats
  Questionnaire data:
     self-reports are often biased;
     observation of behavior in experiments advisable

  Comparisons of sites with/without personalization,
  or before/after personalization introduced,
  with respect to "normal user behavior" (server logs):
     usually a quasi-experiment
     - many uncontrolled variables (e.g., user intentions)
     - poss. several differences between sites/site versions
     => causal attribution of success to personalization
         becomes difficult





  Evaluating usability: results I

  CyberBehavior Research Center 1999 survey

     - 81% of 694 respondents have visited a personalized site
     - 64% of those found it useful: helpful, time saving
     - perceived usefulness changes with product
        (books > music > inf.technol. > news/articles > other)
     - main problems: privacy, ineffectiveness when behav.
        did not reflect user "personally" (e.g., buying a gift)
     - concern that possible choices may be limited
     - little difference in opinion between personalization based on
        observed behavior and personalization based on solicited input


Evaluating usability: results II

  Belkin [3], reviewing studies of recommendations
  in IR systems carried out at Rutgers Univ. since 1995:

     - measures of performance and subj. satisfaction

     - relevance feedback worked well, but better with both
        increased knowledge of how it worked and
        increased control by the user over its suggestions:
     - relevance feedback + term suggestion performed better
        than, and was preferred to, pure relevance feedback
     - users preferred to save effort:
        were willing to hand over the subsidiary task of term
        selection to a system they trusted






  Evaluating usability: results III

  Nielsen Net Ratings 1999

     registered visitors of portal sites,
     i.e., those who can customize,

     - spend > 3 times longer at home portal than others
     - view 3-4 times more pages




Why are results scarce? Possible reasons

  "In essence, web design is a problem in user interface design.
  However, ... few web designers can afford to subject their
  web sites to formal usability testing in special labs."
    Perkowitz & Etzioni [52]: Adaptive web sites: an AI challenge.

  "Web personalization is much over-rated and mainly used as
  a poor excuse for not designing a navigable website."
                             Nielsen [47]: Personalization is over-rated.

  "Personalization costs. ... You’re more likely to get a good
  return on your efforts ... by fixing other problems, such as
  difficulty in locating content."

                             Lighthouse on the Web [36], quoting from
                            Mainspring and User Interface Engineering




  Can other results be transferred?

  Research on adaptive educational software since ~ 1970
     - usually, user control helpful for learning;
        adaptive interfaces particularly helpful for novices
     - interfaces changing over time: difficult to learn
     - adaptive presentation (more info depending on user
        knowledge) improves comprehension and reduces
        reading time
     - adaptive link annotation
        - can reduce no. of visited pages + learning time
        - encourages novices to navigate non-sequentially
        - enables users to rate the difficulty of a page better


Can other results be transferred? (contd.)


     - adaptive link ordering improves user performance
        in information search tasks

     - but unstable order of options is confusing for novices
        so hiding is better for novices
     - for novices, direct guidance is useful
        ("next" link is most popular choice)

     - the more users agree with the system’s suggestions,
        the better their test results

                                                (surveys in [11,12])




 Further factors affecting subjective satisfaction
- user control (general guideline for software development)
- must match user’s interests at the moment
- users don’t want extra work: "paradox of the active user"
- users don’t like to be recognized too soon
- users want to be anonymous, at least at certain times
- users want openness / disclosure
- people don’t want relationships with corporations,
   but with other people
- be specific without being exclusive
- consider information structure on Web
   (non-monetary rewards better than differential pricing)
                    respect the user !
Pattern Evaluation from the Business Perspective




PKDD 2001 Tutorial: “KDD for Personalization”            Myra Spiliopoulou, HHL           [E-11]




     User Satisfaction & Business Success
         A company operating a Web site should create value for its
     (prospective) customers.

     => If there is no added value for the users, they will not buy and they
          will not come again.

     => If the users/customers are not satisfied, they will not buy and/or
          they will not come again.

     => User/customer satisfaction is a prerequisite for winning them to the
          company.

     Winning means:
          • Conversion: the user becomes a customer.
          • Retention: the customer stays loyal.




User Satisfaction Modelling
     Indicators that require interaction with the user
          • Interactivity
          • Ease of use
          • Pleasing environment, entertaining environment
          • Multiple navigation metaphors
          • ...
          • Value creation, as perceived by the user

     Indicators that can be measured/approximated without user interaction
          • Pages per visitor
          • Duration of stay
          • Visitors per page [?0]
          • Response time [?0]







     User Satisfaction Computation
          • Identification of a set of satisfaction indicators
          • Design of an appropriate questionnaire
          • Presentation of the questionnaire to a representative user sample
          • Analysis of the responses
          • Conclusions on the impact of the correlations among the satisfaction
            indicators




User Satisfaction: an Experiment. The study of […] [21]

         • Factors creating user satisfaction
                – Ease of use
                – Information utility of the presented content
                – Attractiveness of the presentation metaphor
                – ...
         • Experimental settings for the evaluation of […] to commercial sites
                – Mapping of the factors onto a questionnaire
                – Establishment of a group of representative users
                – Experimentation on a local computer pool (in vitro)
         • Statistical analysis of the user responses
         • Ranking of the factors by importance





     The findings of [21] are:

         • Quality of the presentation metaphor

               Entertainment, when assessing the site, plays the most important
               role.

         • Information utility

               The amount of information made available is the second most
               important factor.

     Further findings

     The web sites tested did not make a strong and useful connection
     with the interests of the study participants, and did not succeed
     in creating the context and sense of community needed to build
     continuing relationships with web site users.



M. Spendolini captures five years of anti-customer-satisfaction reports

     in the question

                              Is Customer Satisfaction Irrelevant?

      Answer: customer measurement systems should be revisited.








     User Satisfaction & Business Success
      • User/customer satisfaction is a prerequisite for a web-site's success.

      • User/customer satisfaction does not imply a web-site's success.

          Because:

      – The goal of a web-site is not to make users happy.

      – The goal of a web-site is to contribute to business success.




User Satisfaction & Business Success
      • Awareness
      • Contact
      • Conversion
             – Abandonment instead of conversion
      • Retention
             – Attrition instead of retention

     How can these concepts be translated into indicators computable from
     customer data?








      Business Success from the Viewpoint of the Site

      • Stickiness [20]
             – Number of page requests
             – Duration of site visits

      • Site quality [?0]
             – Response time
             – Supported navigation modes
             – Discoverability
             – Accessibility
             – Pages per visitor
             – Visitors per page
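Several of these site-side indicators can be approximated directly from a server log. The sketch below assumes a hypothetical, already-sessionized, non-empty log of (visitor, page, timestamp) hits, with one visit per visitor. Visit durations ignore the time spent on each visit's last page, which a log does not record. `Hit` and `site_indicators` are illustrative names.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Set


class Hit(NamedTuple):
    visitor: str
    page: str
    time: float  # seconds


def site_indicators(hits: List[Hit]) -> Dict[str, object]:
    """Stickiness and site-quality proxies; assumes at least one hit."""
    pages_per_visitor: Dict[str, int] = defaultdict(int)
    times: Dict[str, List[float]] = defaultdict(list)
    visitors_per_page: Dict[str, Set[str]] = defaultdict(set)
    for h in hits:
        pages_per_visitor[h.visitor] += 1
        times[h.visitor].append(h.time)
        visitors_per_page[h.page].add(h.visitor)
    return {
        "page_requests": len(hits),
        "pages_per_visitor": mean(pages_per_visitor.values()),
        # visit duration = last hit - first hit (time on the last page is unknown)
        "mean_visit_duration": mean(max(t) - min(t) for t in times.values()),
        "visitors_per_page": {p: len(v) for p, v in visitors_per_page.items()},
    }


log = [Hit("v1", "A.html", 0.0), Hit("v1", "B.html", 30.0), Hit("v2", "A.html", 10.0)]
print(site_indicators(log))
```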





More Related Content

What's hot

บทความวิจัย"การพัฒนาบทเรียนออนไลน์เรื่องหลักการเขียนโปรแกรมคอมพิวเตอร์" (Research article: "Development of an online lesson on the principles of computer programming") by Kanokrat Jirasatjanukul
Kemete ρατσισμός και μμε (Kemete: racism and the mass media) by 4Gym Glyfadas
Taluka Population by priteeg
From Virtual Worlds To The 3 D Web
UGC Paper i set-z (1)
Khutbah Jumuah, Eid and Nikah (خطبہ جمعہ،عید و نکاح )
 

Viewers also liked

P7130102 by pjgross
Caesar Javier Goddard by guest79cceb
Poets and hackers by Ajay Ohri
Dhs cybersecurity-roadmap by Ajay Ohri
security and privacy for medical implantable devices by Ajay Ohri
8rules sigrec by Ajay Ohri
security product list
 

Similar to Kdd for personalization

122404969 கல்விச்-சுடர்-2 (Kalvi Sudar 2, "Flame of Education 2") by Kogila Nadasan
राष्ट्रीय उत्पन्नाचे मोजमाप व राष्ट्रीय उत्पन्नाच्या संकल्पना (Measurement of national income and concepts of national income) by Dr. Annaji Madavi
UGC NET June 2014 Paper I by Bonala Kondal
ENC Times-December 10,2017 by ENC
Defesa de Lula volta a pedir suspeição de Sergio Moro (Lula's defense again asks for Sergio Moro to be declared biased) by Aquiles Lins
Paisewari
UGC Paper i set-x (1)
UGC Paper i set-x
UGC Paper i set-z
UGC Paper i set-y
UGC Paper i set-y (1)
D2006p3
UGC Paper i set-w
net exam
Paper i set-w
Paper i set-w (1)
Fthorio
S 08-social-science
 

More from Ajay Ohri

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay OhriAjay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RAjay Ohri
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionAjay Ohri
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for freeAjay Ohri
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10Ajay Ohri
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri ResumeAjay Ohri
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientistsAjay Ohri
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...Ajay Ohri
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data scienceAjay Ohri
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessAjay Ohri
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data ScienceAjay Ohri
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data ScientistsAjay Ohri
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in PythonAjay Ohri
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen OomsAjay Ohri
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsAjay Ohri
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha Ajay Ohri
 
Analyze this
Analyze thisAnalyze this
Analyze thisAjay Ohri
 

More from Ajay Ohri (20)

Introduction to R ajay Ohri
Introduction to R ajay OhriIntroduction to R ajay Ohri
Introduction to R ajay Ohri
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Social Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 ElectionSocial Media and Fake News in the 2016 Election
Social Media and Fake News in the 2016 Election
 
Pyspark
PysparkPyspark
Pyspark
 
Download Python for R Users pdf for free
Download Python for R Users pdf for freeDownload Python for R Users pdf for free
Download Python for R Users pdf for free
 
Install spark on_windows10
Install spark on_windows10Install spark on_windows10
Install spark on_windows10
 
Ajay ohri Resume
Ajay ohri ResumeAjay ohri Resume
Ajay ohri Resume
 
Statistics for data scientists
Statistics for  data scientistsStatistics for  data scientists
Statistics for data scientists
 
National seminar on emergence of internet of things (io t) trends and challe...
National seminar on emergence of internet of things (io t)  trends and challe...National seminar on emergence of internet of things (io t)  trends and challe...
National seminar on emergence of internet of things (io t) trends and challe...
 
Tools and techniques for data science
Tools and techniques for data scienceTools and techniques for data science
Tools and techniques for data science
 
How Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help businessHow Big Data ,Cloud Computing ,Data Science can help business
How Big Data ,Cloud Computing ,Data Science can help business
 
Training in Analytics and Data Science
Training in Analytics and Data ScienceTraining in Analytics and Data Science
Training in Analytics and Data Science
 
Tradecraft
Tradecraft   Tradecraft
Tradecraft
 
Software Testing for Data Scientists
Software Testing for Data ScientistsSoftware Testing for Data Scientists
Software Testing for Data Scientists
 
Craps
CrapsCraps
Craps
 
A Data Science Tutorial in Python
A Data Science Tutorial in PythonA Data Science Tutorial in Python
A Data Science Tutorial in Python
 
How does cryptography work? by Jeroen Ooms
How does cryptography work?  by Jeroen OomsHow does cryptography work?  by Jeroen Ooms
How does cryptography work? by Jeroen Ooms
 
Using R for Social Media and Sports Analytics
Using R for Social Media and Sports AnalyticsUsing R for Social Media and Sports Analytics
Using R for Social Media and Sports Analytics
 
Kush stats alpha
Kush stats alpha Kush stats alpha
Kush stats alpha
 
Analyze this
Analyze thisAnalyze this
Analyze this
 

Recently uploaded

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jisc
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 

Recently uploaded (20)

Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

Kdd for personalization

  • 1. KDD for Personalization
    PKDD 2001 Tutorial, September 6, 2001
    Bamshad Mobasher - DePaul University, Chicago
    Bettina Berendt - Humboldt University Berlin
    Myra Spiliopoulou - Leipzig Graduate School of Management

    Web Personalization
    • The Problem
      – dynamically serve customized content (pages, products, recommendations, etc.) to users based on their profiles, preferences, or expected interests
    • Personalization v. Customization
      – In customization, the user controls and customizes the site or the product based on his/her preferences
      – usually manual, but sometimes semi-automatic based on a given user profile
      – Personalization is done automatically based on the user’s actions, the user’s profile, and (possibly) the profiles of others with “similar” profiles
    PKDD 2001 Tutorial: “KDD for Personalization” [I-2]
  • 2. Customization Example
    my.yahoo.com
    [I-3]

    Personalization Example
    amazon.com
    [I-4]
  • 3. A simplified scheme for personalization
    • The user selects information object(s)
      – what kind? document etc.; query
      – how? request, specification; rating
    • The system recommends other information object(s) related to them
      – why related? similarity (syntactic/semantic); co-occurrence in other users’ navigation histories; co-occurrence in the user’s other navigation histories
    [I-5]

    Know Thy Customer
    Knowledge is Power
    "Relationships based on customer insight propel an organization from simply treating customers […] to treating them relative to their needs, preferences, and value potential. … Knowing the customer is paramount in today's marketplace, where the customer has more options, greater flexibility and higher expectations. …" (John […], Accenture)
    [I-6]
  • 4. Customer Knowledge Implies
    1.) Acquisition of customer data
    2.) Analysis of customer data
    3.) Action in accordance with the gained insights
    [I-7]

    Acquisition of Customer Data (Data Preparation)
    Customer data are recordings of
    • preferences
    • transactions
    • pre-sales contacts
    • after-sales support
    • demographic information
    Some of these data
    – may be purchased from third parties
    – may lie in multiple separate databases that serve completely different purposes
    – are of varying quality with respect to error rates, reliability, coverage, representativeness
    [I-8]
  • 5. Analysis of Customer Data (Data Mining)
    Data analysis should provide feedback on questions like:
    • Which users will become customers?
    • Which customers will return?
    • Who is more likely to respond to a promotion action?
    • Who would be interested in cross-sale/up-sale suggestions?
    Closely related to questions like:
    • Is the Web site appropriately designed to serve the organisation's goals?
    • Are the customers satisfied?
    • Are the customers satisfied enough to come again?
    • Are the customers satisfied enough to become promoters of the site?
    [I-9]

    Action in Accordance with the Gained Insights
    • Alignment of the marketing policy
    • Alignment of the supply chain, including after-sales support
    • Adjustment of the Web site
      – static site re-design
      – browsing/navigation suggestions
      – recommendations on the page
      – intelligent assistance
      – personalized layout and content
    The time lag between insight and action should be minimized.
    [I-10]
  • 6. The Action Should Create Value
    • for the customer
    • for the organisation
    [I-11]

    A Short Excursion on Value Creation
    In B2C commerce, it is not sufficient to
    • offer an existing product through the Internet
    • digitize part/all of the merchandizing chain
    • introduce a brilliant new product in the market
    The product must bring value to
    • win the customer (Customer Conversion)
    • retain the customer (Customer Retention)
    [I-12]
  • 7. The model of Kuhlen considers the following types of value [32]:
    (1) comparative
    (2) improving efficiency
    (3) improving effectiveness
    (4) interactive
    (5) organisational
    (6) strategic
    (7) innovative
    [I-13]

    From Acquisition to Action
    • There is no lack of data.
      – Clickstream data accumulate at a tremendous pace.
      – Demographic data can be acquired.
      – Customer profiles are available or can be acquired.
    • There is no lack of methodologies for data analysis.
    • The ability to exploit the data increases at a much slower pace, and the number of personalized Web sites is not really large.
    • The tolerable lapse time between acquisition and action is below 1 […].
    [I-14]
  • 8. Personalization: An HCI perspective
    = does personalization increase usability?
    A Web site’s usability is high if users
    – achieve their goals / perform their tasks in little time,
    – do so with a low error rate,
    – experience high subjective satisfaction.
    Usability testing:
    – qualitative and quantitative methods
    – experts and "normal" users
    – questionnaires and experiments
    Usability is a special concern on the Web because, unlike with other products / software, "users experience usability first and pay later" (Nielsen [49] [B12]).
    [I-15]

    Data Preparation for Personalization
    [DP-1]
  • 9. Web Usage Mining
    • Discovery of meaningful patterns from data generated by client-server transactions on one or more Web servers
    • Typical Sources of Data
      – automatically generated data stored in server access logs, referrer logs, agent logs, and client-side cookies
      – e-commerce and product-oriented user events (e.g., shopping cart changes, ad or product click-throughs, etc.)
      – user profiles and/or user ratings
      – meta-data, page attributes, page content, site structure
    [DP-2]

    What’s in a Typical Server Log?
    <ip_addr><base_url> - <date><method><file><protocol><code><bytes><referrer><user_agent>

    203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:21 -0600] "GET /Calls/OWOM.html HTTP/1.0" 200 3942 "http://www.lycos.com/cgi-bin/pursuit?query=advertising+psychology&maxhits=20&cat=dir" "Mozilla/4.5 [en] (Win98; I)"
    203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:23 -0600] "GET /Calls/Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
    203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:24 -0600] "GET /Calls/Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
    203.30.5.145 www.acr-news.org - [01/Jun/1999:03:09:25 -0600] "GET /Calls/Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/Calls/OWOM.html" "Mozilla/4.5 [en] (Win98; I)"
    203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:31 -0600] "GET / HTTP/1.0" 200 4980 "" "Mozilla/4.06 [en] (Win95; I)"
    203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/line.gif HTTP/1.0" 200 190 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
    203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/red.gif HTTP/1.0" 200 104 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
    203.252.234.33 www.acr-news.org - [01/Jun/1999:03:32:35 -0600] "GET /Images/earthani.gif HTTP/1.0" 200 10689 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
    203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] "GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" "Mozilla/4.06 [en] (Win95; I)"
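To make the field layout above concrete, here is a minimal Python sketch that splits one entry into its fields with a regular expression. It assumes every line follows exactly the field order on the slide, with the referrer and agent always quoted; `LOG_PATTERN` and `parse_log_line` are hypothetical names, and real logs would need more tolerant handling of malformed lines.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical regex for the field order shown on the slide:
# <ip_addr> <base_url> - [<date>] "<method> <file> <protocol>" <code> <bytes> "<referrer>" "<user_agent>"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<host>\S+) - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<file>\S+) (?P<protocol>[^"]+)" '
    r'(?P<code>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[dict]:
    """Split one log entry into named fields; return None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    # Timestamps look like 01/Jun/1999:03:09:21 -0600 (month names assume an English/C locale)
    entry["time"] = datetime.strptime(entry.pop("date"), "%d/%b/%Y:%H:%M:%S %z")
    entry["code"] = int(entry["code"])
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # An empty referrer ("") means no referring page was reported
    entry["referrer"] = entry["referrer"] or None
    return entry


line = ('203.252.234.33 www.acr-news.org - [01/Jun/1999:03:33:11 -0600] '
        '"GET /CP.html HTTP/1.0" 200 3218 "http://www.acr-news.org/" '
        '"Mozilla/4.06 [en] (Win95; I)"')
entry = parse_log_line(line)
print(entry["ip"], entry["file"], entry["code"], entry["referrer"])
# 203.252.234.33 /CP.html 200 http://www.acr-news.org/
```

Mapping the empty referrer to `None` keeps "no referrer" distinct from any real URL, which matters for the referrer-based heuristics discussed later.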
  • 10. The Web Usage Mining Process
    [Diagram: Raw Usage Data → Preprocessing (using Content and Structure Data) → Preprocessed Clickstream Data → Pattern Discovery → Rules, Patterns, and Statistics → Pattern Analysis → "Interesting" Rules, Patterns, and Statistics]
    [DP-4]

    Usage Data Preprocessing
    [Diagram: Raw Usage Data → Data Cleaning → User/Session Identification → Page View Identification → Path Completion → Server Session File → Episode Identification → Episode File; also shown: Site Structure and Content, Usage Statistics]
    [DP-5]
  • 11. Data Preprocessing for Web Usage Mining
    • Data cleaning
      – remove irrelevant references and fields in server logs
      – remove references due to spider navigation
      – remove erroneous references
      – add missing references due to caching (done after sessionization)
    • Data integration
      – synchronize data from multiple server logs
      – integrate e-commerce and application server data
      – integrate meta-data (e.g., content labels)
      – integrate demographic / registration data
    [DP-6]

    Data Preparation for Web Usage Mining (Cooley, Mobasher, Srivastava, 1999 [15])
    • Data Transformation
      – user identification
      – sessionization / episode identification
      – pageview identification
        • a pageview is a set of page files and associated objects that contribute to a single display in a Web browser
    • Data Reduction
      – sampling and dimensionality reduction (ignoring certain pageviews / items)
    • Identifying User Transactions (i.e., sets or sequences of pageviews, possibly with associated weights)
    [DP-7]
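The data-cleaning step can be sketched as a filter over parsed log entries. The entries are assumed to be dicts with at least `file`, `agent` and `code` keys, and the suffix and spider-marker lists below are illustrative assumptions rather than a complete rule set; detecting robots reliably usually takes more than user-agent matching.

```python
from typing import Iterable, List

# Hypothetical filter lists; a real site would tune these to its own content.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
SPIDER_MARKERS = ("bot", "crawler", "spider", "slurp")


def clean_log(entries: Iterable[dict]) -> List[dict]:
    """Drop embedded objects, spider requests and failed requests from parsed log entries."""
    cleaned = []
    for e in entries:
        path = e["file"].split("?", 1)[0].lower()  # ignore the query string
        if path.endswith(IRRELEVANT_SUFFIXES) or path == "/robots.txt":
            continue  # images, style sheets, scripts, or a spider's robots.txt check
        agent = e["agent"].lower()
        if any(marker in agent for marker in SPIDER_MARKERS):
            continue  # references due to spider navigation
        if not 200 <= e["code"] < 400:
            continue  # erroneous references (4xx/5xx responses)
        cleaned.append(e)
    return cleaned


log = [
    {"file": "/Calls/OWOM.html", "agent": "Mozilla/4.5 [en] (Win98; I)", "code": 200},
    {"file": "/Calls/Images/earthani.gif", "agent": "Mozilla/4.5 [en] (Win98; I)", "code": 200},
    {"file": "/CP.html", "agent": "ExampleBot/1.0", "code": 200},
    {"file": "/missing.html", "agent": "Mozilla/4.06 [en] (Win95; I)", "code": 404},
]
print([e["file"] for e in clean_log(log)])  # ['/Calls/OWOM.html']
```

Image requests are dropped rather than kept because they belong to the same pageview as the HTML page that embeds them, not to a separate navigation step.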
  • 12. User and Session Identification: Need for Reliable Usage Data
    • Validity of results in Web usage mining is affected by the ability to:
      – distinguish among different users of a site
      – reconstruct the activities of the users within the site
    • Reliable usage data are difficult to obtain because of
      – proxy servers and anonymizers
      – rotating IP addresses for connections through ISPs
      – missing references due to caching
      – the inability of servers to distinguish among different visits
    [DP-8]

    Identifying Users and Sessions
    • Server log L is a list of log entries, each containing timestamp, host identifier, URL request (including URL stem and query), referrer, agent, cookie, etc.
    • User identification and sessionization
      – a user activity log is a sequence of log entries in L belonging to the same user
      – user identification is the process of partitioning L into a set of user activity logs
      – the goal of sessionization is to further partition each user activity log into sequences of entries corresponding to each user visit
    [DP-9]
  • 13. Sessionization Heuristics
    • Real v. Constructed Sessions
      – Conceptually, the log L is partitioned into an ordered collection of “real” sessions R
      – Each heuristic h partitions L into an ordered collection of “constructed sessions” C_h
      – The ideal heuristic h*: C_h* = R
    • Two Basic Types of Sessionization Heuristics
      – Time-oriented heuristics
      – Navigation-oriented heuristics
    [DP-10]

    Time-Oriented Heuristics
    • Consider boundaries on time spent on individual pages or on the entire site during a single visit
      – Boundaries can be based on a maximum session length or a maximum time allowed for each pageview
      – Additional granularity can be obtained by applying different boundaries to different (types of) pageviews
    h1: Given t0, the timestamp of the first request in a constructed session S, and a threshold θ, the request with timestamp t is assigned to S iff t - t0 ≤ θ.
    h2: Given t1, the timestamp of a request in constructed session S, and a threshold δ, the next request with timestamp t2 is assigned to S iff t2 - t1 ≤ δ.
    [DP-11]
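The two time-oriented heuristics can be sketched over a single user's activity log, assuming user identification has already been done and that timestamps are sorted and given in minutes. The function names are hypothetical; a new constructed session is opened whenever the heuristic's condition fails.

```python
from typing import List, Sequence


def sessionize_h1(times: Sequence[float], theta: float) -> List[List[float]]:
    """h1: a request joins session S iff its time minus S's first timestamp t0 is <= theta."""
    sessions: List[List[float]] = []
    for t in times:  # timestamps of one user's activity log, in ascending order
        if sessions and t - sessions[-1][0] <= theta:
            sessions[-1].append(t)
        else:
            sessions.append([t])  # t becomes t0 of a new constructed session
    return sessions


def sessionize_h2(times: Sequence[float], delta: float) -> List[List[float]]:
    """h2: a request joins S iff it follows the previous request in S by at most delta."""
    sessions: List[List[float]] = []
    for t in times:
        if sessions and t - sessions[-1][-1] <= delta:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


# Timestamps in minutes for a single user
times = [0, 5, 12, 20, 28, 35, 70]
print(sessionize_h1(times, theta=30))  # [[0, 5, 12, 20, 28], [35], [70]]
print(sessionize_h2(times, delta=10))  # [[0, 5, 12, 20, 28, 35], [70]]
```

With these timestamps, the 30-minute session-length bound of h1 splits the visit at minute 35, while the 10-minute gap bound of h2 keeps the dense run together and splits only at the 35-minute pause.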
  • 14. Navigation-Oriented Heuristics
    • Take the linkage between pages into account
      – “linkage” can be based on site topology (e.g., split a session at a request that could not have been reached from previous requests in the session)
      – or can be usage-based (using referrers in log entries)
        • usually more restrictive than topology-based heuristics, and more difficult to implement in frame-based sites
    href: Given two consecutive requests p and q, with p belonging to constructed session S, q is assigned to S if the referrer for q was previously invoked in S, or if the referrer for q is “undefined” and t_q - t_p ≤ Δ (the time delay Δ allows for proper loading of frameset pages).
    [DP-12]

    Measures for Sessionization Accuracy (Berendt, Mobasher, Spiliopoulou, 2001 [7])
    • A heuristic h maps entries in the log L into elements of constructed sessions, such that:
      – (a) each entry in L is mapped to exactly one element of a constructed session
      – (b) the mapping is order-preserving
    • Measures quantify the successful mappings of real sessions to constructed sessions
      – a measure M evaluates a heuristic h based on the differences between C_h and R
      – each measure assigns to h a value M(h) ∈ [0,1] so that M(h*) = 1
    [DP-13]
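A sketch of href under the same per-user assumption. It further assumes that referrers have been normalized to the same form as the requested URLs (the sample log mixes absolute referrer URLs with relative request paths), and the value of Δ used in the example is arbitrary; `sessionize_href` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple

# (timestamp in seconds, requested URL, referrer URL or None if undefined)
Request = Tuple[float, str, Optional[str]]


def sessionize_href(requests: Sequence[Request], delta: float) -> List[List[Request]]:
    """href over one user's requests (sorted by time): q joins the session S of the
    preceding request p if q's referrer was already requested in S, or if q has no
    referrer and t_q - t_p <= delta; otherwise q starts a new constructed session."""
    sessions: List[List[Request]] = []
    for q in requests:
        t_q, _, ref_q = q
        if sessions:
            current = sessions[-1]
            t_p = current[-1][0]
            invoked = {url for _, url, _ in current}
            if (ref_q is not None and ref_q in invoked) or (ref_q is None and t_q - t_p <= delta):
                current.append(q)
                continue
        sessions.append([q])
    return sessions


requests = [
    (0, "/", None),
    (4, "/CP.html", "/"),
    (5, "/frame_left.html", None),  # frameset part loaded right after its parent page
    (900, "/Calls/OWOM.html", "http://search.example/"),  # arrives from an external page
]
print([len(s) for s in sessionize_href(requests, delta=2)])  # [3, 1]
```

Note that the time condition only applies to requests without a referrer; a request whose referrer was seen earlier in the session is kept in it no matter how much time has passed.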
  • 15. Measures for Sessionization Accuracy
    • Categorical and Gradual Measures
      – categorical measures: based on the number of real sessions that are reconstructed by the heuristics
      – gradual measures: based on the degree to which the real sessions are reconstructed by the heuristics
    [DP-14]

    Categorical Measures
    • Based on the notion of “complete reconstruction”
      – a real session is completely reconstructed if all its elements are contained in the same constructed session
      – the measure M_cr(h) is the ratio of the number of completely reconstructed real sessions in C_h to the total number of real sessions |R|
    [DP-15]
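A minimal sketch of M_cr, assuming each session is represented as a list of log-entry indices, so that "element" means one log entry; the function names and the example sessions are made up for illustration.

```python
from typing import Sequence

Session = Sequence[int]  # log-entry indices belonging to one session


def is_completely_reconstructed(real: Session, constructed: Sequence[Session]) -> bool:
    """True if every element of the real session lies in one and the same constructed session."""
    elements = set(real)
    return any(elements <= set(c) for c in constructed)


def m_cr(real_sessions: Sequence[Session], constructed: Sequence[Session]) -> float:
    """M_cr(h): fraction of real sessions that the heuristic reconstructs completely."""
    hits = sum(is_completely_reconstructed(r, constructed) for r in real_sessions)
    return hits / len(real_sessions)


R = [[1, 2, 3], [4, 5], [6, 7]]   # real sessions
C = [[1, 2, 3, 4], [5], [6, 7]]   # sessions constructed by some heuristic h
print(m_cr(R, C))  # 2 of 3 real sessions are completely reconstructed
```

The second real session is split across two constructed sessions, so it does not count; the first one counts even though its constructed session contains an extra entry.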
  • 16. Categorical Measures
    • Derived categorical measures:
      – M_crs considers only completely reconstructed real sessions whose first element is also the first element of a constructed session
      – M_cre considers only completely reconstructed real sessions whose last element is also the last element of a constructed session
      – M_crse considers only completely reconstructed real sessions with correct starts and ends
        • in the absence of overlapping real sessions for individual users, this gives the number of constructed sessions that are identical to the corresponding real sessions
    [DP-16]

    Gradual Measures
    • Allow for measuring partial overlaps between real and constructed sessions
      – the degree of overlap between a real session r and a constructed session c, deg_o(r,c), is the number of elements they have in common divided by the total number of elements in r
      – the degree of overlap for a real session r is the maximum deg_o(r,c) over all constructed sessions c
      – the measure M_o(h) is the average degree of overlap over all real sessions
      – if a real session is completely reconstructed, its overlap degree is 1
    [DP-17]
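The derived categorical measures and M_o can be sketched in the same assumed representation (sessions as ordered lists of log-entry indices). Because a heuristic maps each log entry to exactly one constructed session, at most one constructed session can contain a whole real session, which is why the helper can return the first match. All names are hypothetical.

```python
from typing import Optional, Sequence

Session = Sequence[int]  # log-entry indices in the order the entries occur


def containing_session(real: Session, constructed: Sequence[Session]) -> Optional[Session]:
    """The constructed session holding every element of `real`, if there is one."""
    elements = set(real)
    for c in constructed:
        if elements <= set(c):
            return c
    return None


def m_cr_variant(real_sessions: Sequence[Session], constructed: Sequence[Session],
                 start: bool = False, end: bool = False) -> float:
    """M_cr with optional start/end conditions: M_crs (start), M_cre (end), M_crse (both)."""
    hits = 0
    for r in real_sessions:
        c = containing_session(r, constructed)
        if c is None:
            continue  # not completely reconstructed
        if start and c[0] != r[0]:
            continue  # constructed session starts elsewhere
        if end and c[-1] != r[-1]:
            continue  # constructed session ends elsewhere
        hits += 1
    return hits / len(real_sessions)


def m_o(real_sessions: Sequence[Session], constructed: Sequence[Session]) -> float:
    """M_o: average over real sessions r of max over c of |r ∩ c| / |r|."""
    total = 0.0
    for r in real_sessions:
        total += max(len(set(r) & set(c)) / len(set(r)) for c in constructed)
    return total / len(real_sessions)


R = [[1, 2, 3], [4, 5], [6, 7]]   # real sessions
C = [[1, 2, 3, 4], [5], [6, 7]]   # sessions constructed by some heuristic h
print(m_cr_variant(R, C, start=True), m_cr_variant(R, C, end=True), m_cr_variant(R, C, True, True))
print(m_o(R, C))
```

On this made-up data, M_crs = 2/3, M_cre = 1/3, M_crse = 1/3 and M_o = 2.5/3 ≈ 0.83: the second real session is split across two constructed sessions, so it counts for no categorical measure but still contributes an overlap degree of 0.5.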
  • 17. Gradual Measures
    • To take the size of constructed sessions into account, we define the degree of similarity
      – deg_s(r,c) = |r ∩ c| / |r ∪ c|
      – M_s(h) is the average degree of similarity over all real sessions
      – a real session has similarity degree 1 only if it is reconstructed exactly, i.e. completely reconstructed in a constructed session that contains no additional elements
    [DP-18]

    Which Measures?
    • The choice of measures depends on the goals of usage analysis, for example:
      – “complete reconstruction” may be appropriate for clustering and association-based analyses (it correctly shows the set of pages accessed together)
        • it also preserves the sequential order of accesses, so it can be used for the analysis of users’ navigational behavior
      – M_crs: useful for analyzing access to entry points
      – M_cre: useful for analyzing access to exit points
      – overlap-based measures can be useful for comparing the overall effectiveness of sessionization heuristics in grouping pages or objects
    [DP-19]
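A matching sketch for M_s, again with sessions as lists of log-entry indices (an assumed representation). The slide does not say how a real session's similarity degree is aggregated over constructed sessions; taking the maximum, by analogy with M_o, is an assumption here.

```python
from typing import Sequence

Session = Sequence[int]  # log-entry indices belonging to one session


def deg_s(r: Session, c: Session) -> float:
    """Degree of similarity |r ∩ c| / |r ∪ c| between a real and a constructed session."""
    rs, cs = set(r), set(c)
    return len(rs & cs) / len(rs | cs)


def m_s(real_sessions: Sequence[Session], constructed: Sequence[Session]) -> float:
    """M_s: average over real sessions of the best deg_s against any constructed session
    (taking the maximum is an assumption made by analogy with M_o)."""
    return sum(max(deg_s(r, c) for c in constructed) for r in real_sessions) / len(real_sessions)


R = [[1, 2, 3], [4, 5], [6, 7]]   # real sessions
C = [[1, 2, 3, 4], [5], [6, 7]]   # sessions constructed by some heuristic h
print(m_s(R, C))  # 0.75
```

The first real session scores 0.75 here because its constructed session also contains entry 4; unlike the overlap degree, the similarity degree penalizes such extra elements.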
  • 18. Which Sessionization Heuristics?
    • The choice of sessionization heuristic depends on the characteristics of the data
      – if individual users visit the site in short but temporally dense sessions, h2 may perform better than h1
      – when timestamps are not reliable (e.g., when using integrated data across many log files), href may be a better choice for sessionization
      – referrer-based heuristics tend to perform worse in highly dynamic, frame-based sites
    [DP-20]

    Comparison of Sessionization Heuristics
    [Chart: values of M_o, M_crse, M_cr, M_crs, M_cre and M_s (y-axis from 0.50 to 1.00) for the heuristics h1-30, h2-10 and h-ref]
    • cookies used to identify unique users
    • server-generated session variable used to identify “real” sessions
    • site was frame-based and highly dynamic
    • thresholds of 30 and 10 minutes were used for h1 and h2, respectively
    • href performed poorly, due to propagated errors in misclassified frameset references
    • 30% of users had multiple IP addresses (coming from behind proxy servers)
    [DP-21]
  • 19. Mechanisms for User Identification
    • IP Address + Agent
      – Description: assume each unique IP address/agent pair is a unique user
      – Privacy concerns: low
      – Advantages: always available; no additional technology required
      – Disadvantages: not guaranteed to be unique; defeated by rotating IPs
    • Embedded Session Ids
      – Description: use dynamically generated pages to associate an ID with every hyperlink
      – Privacy concerns: low to medium
      – Advantages: always available; independent of IP addresses
      – Disadvantages: cannot capture repeat visitors; additional overhead for dynamic pages
    • Registration
      – Description: the user explicitly logs in to the site
      – Privacy concerns: medium
      – Advantages: can track individuals, not just browsers
      – Disadvantages: many users won't register; not available before registration
    • Cookie
      – Description: save an ID on the client machine
      – Privacy concerns: medium to high
      – Advantages: can track repeat visits from the same browser
      – Disadvantages: can be turned off by users
    • Software Agents
      – Description: a program loaded into the browser sends back usage data
      – Privacy concerns: high
      – Advantages: accurate usage data for a single site
      – Disadvantages: likely to be rejected by users
    [DP-22]

    Impact of User Identification Heuristics
    These experiments show the impact of using the IP+Agent heuristic for user identification on sessionization heuristics (as compared to cookies).
    [Charts: M_cr, M_crs, M_cre, M_crse, M_o and M_s (y-axis from 0.30 to 1.00) for h1-30-real vs. h1-30-ipa, and for h-ref-real vs. h-ref-ipa]
    [DP-23]
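The IP + Agent mechanism can be sketched as a grouping step that yields the user activity logs defined earlier. The entry dicts with `ip` and `agent` keys are an assumed representation and `identify_users_ip_agent` is a hypothetical name; as the mechanism's listed disadvantages suggest, this merges different people sharing a proxy and browser, and splits one person whose IP address rotates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def identify_users_ip_agent(entries: Iterable[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Partition log entries into user activity logs, treating every distinct
    (IP address, user agent) pair as one user; entries keep their log order."""
    users: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for e in entries:
        users[(e["ip"], e["agent"])].append(e)
    return dict(users)


log = [
    {"ip": "203.30.5.145", "agent": "Mozilla/4.5 [en] (Win98; I)", "file": "/Calls/OWOM.html"},
    {"ip": "203.252.234.33", "agent": "Mozilla/4.06 [en] (Win95; I)", "file": "/"},
    {"ip": "203.252.234.33", "agent": "Mozilla/4.06 [en] (Win95; I)", "file": "/CP.html"},
]
users = identify_users_ip_agent(log)
print({key[0]: [e["file"] for e in v] for key, v in users.items()})
# {'203.30.5.145': ['/Calls/OWOM.html'], '203.252.234.33': ['/', '/CP.html']}
```

Each resulting list is a user activity log that a sessionization heuristic can then split further into visits.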
  • 20. Inferring User Transactions from Sessions
    • Observation: reference lengths follow an exponential distribution
    • Page types correlate with page reference lengths
    • Page types: navigational, content, or hybrid
    • Pages can be classified automatically as navigational or content using statistical modeling
    • A transaction can be defined as an intra-session path ending in a content page, or as a set of content pages in a session
    [Figure: histogram of page reference lengths (secs), with regions labelled "navigational pages" and "content pages"]
  Path Completion
    • Refers to the problem of inferring missing user references due to caching
    • Effective path completion requires extensive knowledge of the link structure within the site
    • Referrer information in server logs can also be used to disambiguate the inferred paths
    • The problem gets much more complicated in frame-based sites
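One way to make the statistical classification concrete is along the lines of the reference-length method of Cooley et al. First fit an exponential distribution to the observed reference lengths. Then choose the cutoff below which an assumed fraction of references count as navigational. The sketch assumes the exponential rate is estimated as 1/mean and that the analyst supplies the navigational fraction (0.5 here). The function names and the numbers are illustrative.

```python
import math


def reference_length_cutoff(ref_lengths, nav_fraction):
    """Cutoff c with P(L <= c) = nav_fraction for an exponential distribution whose
    rate is estimated as 1 / mean(ref_lengths): c = -ln(1 - nav_fraction) / rate."""
    rate = len(ref_lengths) / sum(ref_lengths)
    return -math.log(1.0 - nav_fraction) / rate


def classify_references(refs, cutoff):
    """Label each (page, reference length in seconds); short stays count as navigational."""
    return [(page, "navigational" if length <= cutoff else "content") for page, length in refs]


refs = [("index", 10), ("menu", 20), ("news", 30), ("article1", 100), ("article2", 240)]
cutoff = reference_length_cutoff([length for _, length in refs], nav_fraction=0.5)
print(round(cutoff, 1))  # 55.5
print(classify_references(refs, cutoff))
```

With a mean reference length of 80 s, the cutoff is 80 · ln 2 ≈ 55.5 s. The three short references are labelled navigational and the two long ones content.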
  • 21. Path Completion - An Example
    A user's navigation path: A => B => D => E => D => B => C
    [Figure: site link graph over pages A–F]
    Server log (the cached revisits of D and B are not recorded):
      URL   Referrer
      A     --
      B     A
      D     B
      E     D
      C     B
    • There may be multiple candidates for completing the path. For example, consider the two paths E => D => B => C and E => D => B => A => C.
    • In this case, the referrer field allows us to partially disambiguate. But what about E => D => B => A => B => C?
    • One heuristic: always take the path that requires the fewest …
  Integrating E-Commerce Events
    • Either product-oriented or visit-oriented
    • Not necessarily a one-to-one correspondence with user actions
    • Used to track and analyze the conversion of browsers to buyers
    • The major difficulty for e-commerce events is defining and implementing the events for a site
      – however, in contrast to clickstream data, getting reliable preprocessed data is not a problem
    • Another major challenge is the successful integration with clickstream data
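A referrer-driven backtracking heuristic reproduces the example path. This is a hedged sketch, not the tutorial's algorithm. It assumes that whenever a request's referrer is not the page just viewed, the user went "back" through cached pages to the most recent visit of that referrer. Input is a hypothetical list of `(url, referrer)` pairs in log order.

```python
def complete_path(requests):
    """Infer cached page views from (url, referrer) pairs in log order.
    Assumption: if a request's referrer is not the page just viewed, the user went
    'back' through cached pages to the most recent visit of the referrer."""
    path = []
    for url, ref in requests:
        if path and ref not in (None, "--") and path[-1] != ref and ref in path:
            last = len(path) - 1 - path[::-1].index(ref)  # latest occurrence of referrer
            path.extend(reversed(path[last:-1]))          # pages revisited via 'back'
        path.append(url)
    return path


log = [("A", "--"), ("B", "A"), ("D", "B"), ("E", "D"), ("C", "B")]
print(" => ".join(complete_path(log)))  # A => B => D => E => D => B => C
```

This picks the shortest backtrack to the referrer. It never proposes the longer alternative E => D => B => A => B => C, which is exactly the kind of ambiguity the referrer field alone cannot rule out.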
  • 22. Product-Oriented Events
    • Product View
      – occurs every time a product is displayed on a pageview
      – typical types: image, link, text
    • Product Click-through
      – occurs every time a user "clicks" on a product to get more information
    • Category click-through
    • Product detail or extra detail (e.g., large image) click-through
    • Advertisement click-through
  Product-Oriented Events (contd.)
    • Shopping Cart Changes
      – shopping cart add or remove
      – shopping cart change: quantity or another feature (e.g., size) is changed
    • Product Buy or Bid
      – a separate buy event occurs for each product in the shopping cart
      – auction sites can track bid events in addition to product purchases
  • 23. Content and Structure Preprocessing
    • Processing the content and structure of the site is often essential for successful usage analysis
    • Two primary tasks:
      – determine what constitutes a unique page file (i.e., pageview)
      – represent the content and structure of the pages in a quantifiable form
  Content and Structure Preprocessing (contd.)
    • Basic elements in content and structure processing
      – creation of a site map
        • captures the linkage and frame structure of the site
        • also needs to identify script templates for dynamically generated pages
      – extracting important content elements in pages
        • meta-information, keywords, internal and external links, etc.
      – identifying and classifying pages based on their content and structural characteristics
  • 24. Quantifying Content and Structure
    • Static Pages
      – all of the information is contained within the HTML files for a site
      – each file can be parsed to get a list of links, frames, images, and text
      – files can be obtained through the file system, or via HTTP requests from an automated agent (site spider)
  Quantifying Content and Structure (contd.)
    • Dynamic Pages
      – pages do not exist until they are created in response to a specific request
      – relevant information can come from a variety of sources: templates, databases, scripts, HTML, etc.
      – three methods of obtaining content and structure information:
        • a series of HTTP requests from a site mapping tool
        • compiling information from internal sources
        • content server tools
  • 25. Integrating content and structure I
    Domain knowledge: content
    - purpose: group pages by their content
    - method: analyze text, meta-tags, and/or URL (query string)
    - grouping by classification or clustering
    Concept hierarchies. Example of a content-based concept hierarchy:
      Entertainment
        Performing Arts
        Music
          Artists
          Genres
            Blues, Jazz, New Age, ...
          New Releases
          ...
        ...
  Integrating content and structure II
    Content profiles from feature clusters
    1. vector space model: each unique word in the corpus is one dimension; each page(view) is a vector with a non-zero weight for each word in that page(view) and zero weight for all other words
    2. feature-pageview matrix (note: "feature" = word; "pageview" because of frames):
             music  jazz  artist  ...
       pv1   1.00   0.80  0.05
       pv2   1.00   0.00  0.70
       ...
    3. features as weighted vectors of pageviews: jazz = [ <pv1,0.80>, <pv2,0.00>, ... ]
    4. group features -> feature clusters -> content profiles
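Steps 2 and 3 amount to transposing the pageview-by-feature matrix, so that each feature becomes a weighted vector over pageviews. The sketch uses the example weights. It adds cosine similarity between feature vectors as one plausible measure for the grouping in step 4. The slides do not name the similarity or the clustering algorithm, so that choice is an assumption.

```python
import math

# Pageview x feature weights from the example matrix above.
matrix = {
    "pv1": {"music": 1.00, "jazz": 0.80, "artist": 0.05},
    "pv2": {"music": 1.00, "jazz": 0.00, "artist": 0.70},
}


def feature_vectors(matrix):
    """Transpose the matrix: each feature becomes a weighted vector over pageviews."""
    features = {}
    for pv, weights in matrix.items():
        for feature, w in weights.items():
            features.setdefault(feature, {})[pv] = w
    return features


def cosine(u, v):
    """Cosine similarity of two sparse vectors (dicts)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


fv = feature_vectors(matrix)
print(fv["jazz"])                                 # {'pv1': 0.8, 'pv2': 0.0}
print(round(cosine(fv["music"], fv["jazz"]), 3))  # 0.707
```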
  • 26. Integrating content and structure III
    Structure
    - purpose: group pages by their hyperlink structure
    - ex. page types in Pirolli et al. [54] and Cooley et al. [B20] [B24] [15]: head, navigation, content, look-up, personal
    - ex. path distance to a reference page
      [Figure: pages A.html, B.html, C.html with path distances dA = 1 and dA = 2 from A.html]
    - structure as a weighted vector of page(view)s:
        S = [ <A.html,0>, <B.html,1>, <C.html,0>, ... ] (only B is a content page)
        S = [ <A.html,0>, <B.html,1>, <C.html,3>, ... ] (path distances)
    - grouping by classification or clustering
  Relating content and structure to mined usage I: content/structure mining as pre-/post-processing steps
    Ex. online catalog search (Berendt & Spiliopoulou [8, 6] [B18, B17]):
    1. service-based concept hierarchy: which query options?
       Info on schools
         indiv. school
         list of schools
           1 parameter: Location, Name, ...
           2 parameters: Location+Name, ...
           3 parameters: ...
         ...
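Path distances to a reference page, as used in the second structure vector, can be computed by breadth-first search over the site's link graph. The sketch assumes a hypothetical chain of links A.html -> B.html -> C.html. The actual link structure in the slide's figure is not recoverable.

```python
from collections import deque


def path_distances(links, start):
    """Breadth-first search over a hyperlink graph (dict: page -> linked pages).
    Returns the shortest link distance from `start` to every reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for nxt in links.get(page, []):
            if nxt not in dist:
                dist[nxt] = dist[page] + 1
                queue.append(nxt)
    return dist


links = {"A.html": ["B.html"], "B.html": ["C.html"], "C.html": []}  # hypothetical
print(path_distances(links, "A.html"))  # {'A.html': 0, 'B.html': 1, 'C.html': 2}
```

The resulting dictionary is directly a structure vector of `<page, distance>` tuples in the slide's sense.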
  • 27. Relating content and structure to mined usage I (contd.)
    2. discovering and comparing navigation patterns in classified pages
    [Figure: part of a resulting WUM navigation pattern]
  Relating content and structure to mined usage I (contd.)
    Ex. WebSIFT Information Filter (from Cooley [14] [B19]), listed as mined knowledge source + domain knowledge source: interesting belief example
    • general usage statistics + site structure: "The head page is not the most common entry point"
    • general usage statistics + site content: "A page designed to provide content is being used as a navigation page"
    • frequent itemsets + site structure: "A set of pages is frequently accessed together, but not directly linked"
    • usage clusters + site content: "A usage cluster contains pages from multiple content categories"
    => discover patterns at different levels of abstraction, discover deviations from intended usage
  • 28. Relating content and structure to mined usage II: usage, content, and structure mining as three ways of deriving a common kind of representation
    Mobasher, Dai, Luo, Sun, & Zhu [44] [B22]
    - a vector of tuples <pageview, weight>:
      usage: sessions/visits, or parts of them (past + current)
      content: features
      structure: pages and their characteristics
    - unordered or ordered collections
    => identify clusters that are similar, where similarity is by usage, content, or structure
  Pattern Discovery for Personalization
    Myra Spiliopoulou, HHL
  • 29. We identify the following aspects of personalization services, which can be envisaged as the result of pattern discovery:
    • Visibility: personal recommendation; silent dynamic adjustment; static page/site adjustment
    • Service element: (link to) page; application object
    • Matching based on: user profiles; user ratings; user behaviour; content of objects
    • Acquisition […]: all steps on-line; off-line pattern discovery & on-line matching
  Pattern Discovery: Adaptive web sites
    The approach of Perkowitz & Etzioni [52, 53]
    The IndexFinder consists of three phases:
    1. Log preprocessing: establishment of sessions as sets of page requests
    2. Cluster mining: grouping of co-occurring, non-linked pages with the help of the […] graph
    3. Conceptual clustering:
       – the representative concept of each cluster is identified
       – cluster members not referring to this concept are removed from the cluster
       – pages referring to this concept and not appearing in the cluster are attached to the cluster
  • 30. For each cluster, the IndexFinder presents to the Web designer
    • an index page with links to all pages of the cluster
    The Web designer decides
      – whether the new page should be established
      – what its label should be
      – where it should be located
    According to our categorization:
    • Visibility: static page/site adjustment
    • Service element: page containing […] application object
    • Matching based on: user behaviour and page content
    • Off-line pattern discovery
  Pattern Discovery for Recommendations: The collaborative filtering approach
    Main idea: the objects suggested to a user are those preferred by users similar to her.
    1. The user's transaction is matched against logged transactions.
    2. The matches are ranked.
    3. The best (set of) match(es) is selected.
    4. The objects that were shown in the selected transactions are ranked, excluding objects already seen.
    5. The objects with the highest rank are shown to the user.
    All steps on-line.
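The five collaborative filtering steps can be sketched with transactions represented as sets of objects. Several choices here are assumptions, since the slide does not prescribe them: Jaccard overlap as the matching measure, keeping the best k matches, and ranking unseen objects by how many selected transactions contain them. All names are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two transactions (sets of objects); an assumed similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_cf(active, logged, k=2, n=3):
    """Steps 1-5: match the active transaction against logged ones, rank the matches,
    keep the best k, rank the objects they contain that the user has not seen yet
    (by how many selected transactions contain them), and return the top n."""
    ranked = sorted(logged, key=lambda t: jaccard(active, t), reverse=True)
    best = ranked[:k]
    votes = {}
    for t in best:
        for obj in t - active:
            votes[obj] = votes.get(obj, 0) + 1
    return sorted(votes, key=lambda o: (-votes[o], o))[:n]


logged = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"E", "F"}, {"B", "D"}]
print(recommend_cf({"A", "B"}, logged))  # ['C', 'D']
```

Every step runs at request time over the whole log. This is the "all steps on-line" cost that the data mining approach on the next slide tries to avoid.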
  • 31. Pattern Discovery for Recommendations: The data mining approach
    Main idea: user similarity can be defined in terms of behaviour, interests, and preferences that can be modelled off-line.
    1. Pattern discovery over the log data
    2. The contents of the user's transaction are matched against the discovered patterns.
    3. The matches are ranked.
    4. The objects associated with the best matches are ranked, excluding objects already seen.
    5. The objects with the highest rank are shown to the user.
    So that: => the voluminous log […] only […]; => on-line […]
  Pattern Discovery: Recommendations on correlated items
    The approach of Vucetic and Obradovic […]
    The recommendation problem is defined as: given the ratings of the active user on a set of items, derive ratings on the remaining items.
    Main idea: the rating of an item can be predicted from the ratings on correlated items.
    • Visibility: personal recommendation
    • Service element: application object
    • Matching based on: ratings
    • Off-line discovery of predictors for the ratings of correlated items; impact of item correlation on ratings
  • 32. Methodology
    • The rating of an item given another item is approximated using a linear function (named expert).
    • The […] correlation among pairs of items is approximated using random sampling over the user ratings.
    • A weighting scheme is proposed to deal with the fact that users with similar preferences may provide different ratings for the same set of items.
    In this scheme:
      – the linear experts for all pairs of items can be computed off-line
      – the ratings for an active user are predicted from the set of pairs of items rather than from the set of user ratings
  Pattern Discovery: Repeat-buying theory for personalization
    The approach of Geyer-Schulz et al. […]
    Main idea:
    => Recommendations are based on correlated products.
    => Correlations can be identified with Ehrenberg's repeat-buying theory,
    => after adjusting it to the particularities of anonymous user sessions.
    According to our categorization:
    • Visibility: recommendation of information products
    • Service element: application object or URL
    • Matching based on: user preferences
    • Off-line discovery of correlations for application objects
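A single "linear expert" for an item pair can be fit by ordinary least squares over the users who rated both items. The sketch does that and then predicts a target rating by plainly averaging the applicable experts. The averaging is a simplification: it does not reproduce the weighting scheme described above. Names and data are hypothetical.

```python
def fit_linear_expert(ratings, item_i, item_j):
    """Least-squares fit of r_j ≈ a * r_i + b over users who rated both items.
    ratings: dict user -> dict item -> rating."""
    pairs = [(r[item_i], r[item_j]) for r in ratings.values()
             if item_i in r and item_j in r]
    if not pairs:
        raise ValueError("no user rated both items")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    a = sxy / sxx if sxx else 0.0
    return a, mean_y - a * mean_x


def predict(active_ratings, experts, target):
    """Average the predictions of all experts (i -> target) whose input item i the
    active user has rated (plain averaging is a simplification)."""
    preds = [a * active_ratings[i] + b
             for (i, j), (a, b) in experts.items()
             if j == target and i in active_ratings]
    return sum(preds) / len(preds) if preds else None


ratings = {"u1": {"x": 1, "y": 2}, "u2": {"x": 2, "y": 3}, "u3": {"x": 3, "y": 4}}
experts = {("x", "y"): fit_linear_expert(ratings, "x", "y")}  # computed off-line
print(experts[("x", "y")])              # (1.0, 1.0)
print(predict({"x": 4}, experts, "y"))  # 5.0
```

The experts are fit off-line, so the on-line step reduces to evaluating a few linear functions.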
  • 33. Ehrenberg's repeat-buying theory
    – predicts buyer behaviour from (a) the penetration and (b) the average purchase frequency of an item
    – by providing a reference model that characterizes repeated co-occurring purchases of items as random or not random,
    where penetration refers to the customers' preference for the item, and average purchase frequency refers to repeat purchases of the item, ignoring the characteristics of the item, the amount of the item and the size of the purchase as a whole.
  Assumptions of […]
    – The probability of r co-occurrences of two products in subsequent purchases follows a logarithmic series distribution.
    – Subsequent purchases of the same customer(s) can be regarded as equivalent to a set of purchase sessions during the log period.
  Methodology
    • Computation of the frequency distributions of all co-occurrences of product pairs, counting only one co-occurrence per session
    • Elimination of distributions with a small number of observations
    • Elimination of the […] percentile of the repeat-buy pairs
    • Computation of the co-occurrence predictor for each pair, so that outliers for the predictor can be regarded as correlated items
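The logarithmic series distribution (LSD) named in the first assumption has the probability mass function P(r) = -q^r / (r · ln(1 - q)) for r = 1, 2, … and 0 < q < 1. The sketch computes it, along with the expected number of product pairs with exactly r repeat co-occurrences. The parameter q is taken as given; estimating it from the observed distribution is not shown. Pairs observed far more often than the model predicts would be the outliers treated as correlated. The helper names are hypothetical.

```python
import math


def lsd_pmf(r, q):
    """Logarithmic series distribution: P(r) = -q**r / (r * ln(1 - q)), r = 1, 2, ...;
    requires 0 < q < 1."""
    return -(q ** r) / (r * math.log(1.0 - q))


def expected_counts(n_pairs, q, max_r):
    """Expected number of product pairs (out of n_pairs) with exactly r repeat
    co-occurrences under the LSD model, for r = 1..max_r."""
    return {r: n_pairs * lsd_pmf(r, q) for r in range(1, max_r + 1)}


q = 0.5
print(round(lsd_pmf(1, q), 4))  # 0.7213
print({r: round(c, 1) for r, c in expected_counts(100, q, 4).items()})
```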
  • 34. Pattern Discovery: Association mining for personalization
    Basic idea: match the left-hand side of rules against the active user session and recommend the items in the rules' consequents
    • Patterns must be stored in efficient data structures
      – searching all rules in real time is computationally inefficient
    • The ordering of accessed pages is not taken into account
    • Good recommendation accuracy, but the main problem is "coverage"
      – high support thresholds lead to low coverage and may eliminate important but infrequent items from consideration
      – low support thresholds result in very large model sizes and a computationally expensive pattern discovery phase
  Association Mining - Basic Concepts
    We start with a set $I$ of items and a set $D$ of transactions. A transaction $T$ is a set of items (a subset of $I$):
      $I = \{i_1, i_2, \ldots, i_m\}, \qquad T \subseteq I$
    An association rule is an implication on itemsets $X$ and $Y$, denoted by $X \Rightarrow Y$, where
      $X \subseteq I, \qquad Y \subseteq I, \qquad X \cap Y = \emptyset$
    The rule meets a minimum confidence $c$, meaning that a fraction of at least $c$ of the transactions in $D$ that contain $X$ also contain $Y$. In addition, the itemset $X \cup Y$ must satisfy a minimum support $s$:
      $s \le \dfrac{|X \cup Y|}{|D|}, \qquad c \le \dfrac{|X \cup Y|}{|X|}$
    where $|X \cup Y|$ and $|X|$ denote the number of transactions in $D$ that contain $X \cup Y$ and $X$, respectively.
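The two thresholds can be checked directly from their definitions. The sketch computes support as the fraction of transactions containing an itemset, and confidence as support(X ∪ Y) / support(X), on a small hypothetical transaction set.

```python
def support(itemset, transactions):
    """Fraction of transactions in D that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Confidence of X ==> Y: support(X ∪ Y) / support(X)."""
    s_x = support(lhs, transactions)
    return support(set(lhs) | set(rhs), transactions) / s_x if s_x else 0.0


D = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
print(support({"A", "B"}, D))                 # 0.5
print(round(confidence({"A"}, {"B"}, D), 3))  # 0.667
```

The rule A ==> B thus has support 0.5 and confidence 2/3. It passes, for example, thresholds s = 0.4 and c = 0.6.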
  • 35. Pattern Discovery: Associated items and users
    The approach of Lin, Alvarez & Ruiz […]
    Main idea:
    => Users are associated with each other in terms of how they rate items.
    => Items are associated with each other with respect to user preferences.
    Associations among items can be found off-line. Associations to the active user can be found on-line.
    According to our categorization:
    • Visibility: personal recommendation
    • Service element: application object
    • Matching based on: associations (rules with a given RHS)
    • On-line discovery of associations among items and among users
  Methodology
    • Recommendations are subject to minimum-confidence and minimum-number-of-rules constraints.
    • The miner discovers association rules iteratively, until the desired number of rules is extracted. The support cutoff is adjusted in each iteration.
    • Rules concern both items and users:
      User1: like AND User2: dislike => TargetUser: like
      Item1: like AND Item2: like => TargetItem: like
    • Candidate items are computed from associations involving users similar to the active user. (on-line)
    • Scores of items are computed from associations […] user preferences. (off-line)
    • The candidate items with the best scores are suggested to the active user. (on-line)
  • 36. Pattern Discovery: Association mining for personalization
    The approach of Mobasher et al., 2001 [45]
    Main idea: avoid off-line generation of all association rules; generate recommendations directly from itemsets
    • discovered frequent itemsets are stored in an "itemset graph" (an extension of the lexicographic tree structure of Agrawal et al., 1999 [2])
    • recommendation generation can be done in constant time by a directed search to a limited depth
    According to our categorization:
    • Visibility: personal recommendations or silent dynamic adjustment
    • Service element: pageview
    • Matching based on: user behaviour
  Methodology
    • Construct a frequent itemset graph
      – each node at depth d in the graph corresponds to an itemset I of size d and is linked to the itemsets of size d+1 that contain I (at level d+1)
      – the single root node at level 0 corresponds to the empty itemset
    • frequent itemsets are matched against a user's active session S by searching the graph to depth |S|
    • a recommendation r is an item at level |S|+1 whose recommendation score is the confidence of the rule S ==> r
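A simplified version of the frequent itemset graph can be built from a dictionary of frequent itemsets and their support counts. Each node is linked to the itemsets one item larger that contain it. Recommendations for a session S are read off the children of S's node, scored by the confidence of S ==> r. This sketch assumes the active session (or its window) is itself a frequent itemset. It does not reproduce the original method's handling of sessions that are not, and all names are hypothetical.

```python
def build_itemset_graph(frequent):
    """frequent: dict frozenset(itemset) -> support count, including frozenset() as root.
    Link every itemset to the frequent itemsets exactly one item larger that contain it."""
    children = {s: [] for s in frequent}
    for s in frequent:
        for item in s:
            parent = s - {item}
            if parent in children:
                children[parent].append(s)
    return children


def recommend(session, frequent, children, min_conf=0.0):
    """Locate the node for the active session S (depth |S|) and score every item r
    found one level below (depth |S|+1) by the confidence of S ==> r."""
    node = frozenset(session)
    if node not in frequent:
        return {}
    scores = {}
    for child in children[node]:
        (r,) = child - node
        conf = frequent[child] / frequent[node]
        if conf >= min_conf:
            scores[r] = conf
    return scores


def fs(*items):
    return frozenset(items)


frequent = {fs(): 10, fs("A"): 6, fs("B"): 5, fs("C"): 4,
            fs("A", "B"): 4, fs("A", "C"): 3, fs("B", "C"): 2, fs("A", "B", "C"): 2}
graph = build_itemset_graph(frequent)
print({r: round(c, 2) for r, c in recommend({"A"}, frequent, graph).items()})
# {'B': 0.67, 'C': 0.5}
```

Finding the session's node is a single dictionary lookup. This illustrates why recommendation generation need not scan all rules.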
  • 37. Pattern Discovery: Sequence mining for personalization
    Main idea: take the ordering of accessed items into account
    Two basic approaches:
    • use contiguous sequences (e.g., Web navigational patterns)
    • use general sequential patterns
    Contiguous sequential patterns are often modeled as Markov chains and used for prefetching (i.e., predicting the next user access based on previously accessed pages).
    In the context of recommendations, they can achieve higher accuracy than other methods, but it may be difficult to obtain reasonable coverage.
  Pattern Discovery: Sequence mining for personalization (contd.)
    The Markov chain representation often leads to high space complexity due to model sizes.
    Some solutions:
    • selective Markov models (Deshpande, Karypis, 2000 [17]) use various pruning strategies to reduce the number of states (e.g., support or confidence pruning, error pruning)
    • longest repeating subsequences (Pitkow, Pirolli, 1999 []), similar to support pruning, are used to focus only on significant navigational paths
    • increased coverage can be achieved by using all-Kth-order models (i.e., using all possible sizes for user histories)
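The simplest contiguous-sequence model is a first-order Markov chain, which predicts the next page from transition counts. The sketch below is only that baseline. It includes none of the pruning strategies or all-Kth-order extensions listed above, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_markov(sessions):
    """First-order Markov model: count transitions page -> next page within sessions."""
    transitions = defaultdict(Counter)
    for s in sessions:
        for current, nxt in zip(s, s[1:]):
            transitions[current][nxt] += 1
    return transitions


def predict_next(transitions, current, k=1):
    """Return the k most likely next pages with their estimated probabilities."""
    counts = transitions.get(current, Counter())
    total = sum(counts.values())
    return [(page, c / total) for page, c in counts.most_common(k)] if total else []


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
model = train_markov(sessions)
print(predict_next(model, "B"))  # [('C', 0.75)]
```

A page never seen as a predecessor yields no prediction at all. In the first-order case this is the coverage problem in its plainest form.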
  • 38. Pattern Discovery: Sequence mining for personalization
    The approach of Gaul & Schmidt-Thieme […]
    Main idea:
    => Recommendations are based on frequent patterns of past behaviour.
    => A recommender is a predictor for a class of events.
    => The constellation of the recommenders for all classes returns the best recommendations for a given user history.
    According to our categorization:
    • Visibility: recommendation
    • Service element: URLs, sets of objects
    • Matching based on: navigation histories and URL proximity
    • Off-line training of classifiers (local recommender systems)
  A generic framework
    • with measures for the quality of a recommendation, taking the distance between URLs into account
    • distinguishing between dynamic and static recommenders that do/do not take the user histories into account
    • combining local recommender systems, each of which predicts a class of events, where a class can be one user history, a group of histories or the whole data set
    Thereby, a navigation history is
      – a set of events
      – a sequence of events
      – a more complex structure of co-occurring events
  • 39. Pattern Discovery: Usage profiles for personalization
    The approach of Mobasher et al. [43, 42]
    Two types of usage profiles:
    • clusters of similar user transactions
    • clusters of pages accessed together
    A weighting scheme removes pages with support less than a minimum value, aggregating the members of a cluster into one representative profile.
    According to our categorization:
    • Visibility: personal recommendations or silent dynamic adjustment
    • Service element: pageview
    • Matching based on: user behaviour (also page content)
    • Off-line discovery of aggregate profiles
    => achieves similar performance to on-line collaborative filtering
    => using a minimal number of pageviews for the active user
  Methodology
    • Preprocessing phase:
      – assignment of weights to the pageviews
      – significance […], based on page stay time
      – normalization of pageview weights
    • PACT: Profile Aggregation based on Clustering Techniques
      1. clustering of usage data to establish the aggregate profiles
      2. materialization of the profiles as vectors of (page, weight) pairs
      3. scan of the user's history by means of a sliding window that allows only a set of page accesses to be considered
      4. matching of the user session with a profile; match ranking
  • 40. A Framework for Personalization Based on Aggregate Profiles
    [Figure: offline phase of the framework]
  A Framework for Personalization Based on Aggregate Profiles (contd.)
    Online phase, with usage profiles and content profiles as input from the batch process:
    • Match the current user's activity against the discovered profiles
    • Each recommended item is assigned a score based on
      – the matching criteria and the quality of the aggregate profiles
      – the "information value" of the item based on domain knowledge
  • 41. Aggregate Profiles Based on Clustering Transactions (PACT) (Mobasher et al. [42, 43])
    • Input
      – set of relevant pageviews in the preprocessed log: $P = \{p_1, p_2, \ldots, p_n\}$
      – set of user transactions: $T = \{t_1, t_2, \ldots, t_m\}$
      – each transaction is a pageview vector: $t = \langle w(p_1, t), w(p_2, t), \ldots, w(p_n, t) \rangle$
  Aggregate Profiles Based on Clustering Transactions (PACT) (contd.)
    • Transaction clusters
      – each cluster contains a set of transaction vectors
      – for each cluster, compute the centroid as the cluster representative: $\vec{c} = \langle u_1^c, u_2^c, \ldots, u_n^c \rangle$
    • Aggregate usage profiles
      – a set of pageview-weight pairs: for a transaction cluster $c \in C$, select each pageview $p_i$ such that $u_i^c$ (in the cluster centroid) is greater than a pre-specified threshold
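Once the transactions have been clustered, the profile-building step is short. Compute the cluster centroid, then keep the pageviews whose centroid weight exceeds the threshold. The sketch assumes transactions are sparse page -> weight dicts. The clustering algorithm itself, which PACT leaves open, is not shown.

```python
def centroid(cluster):
    """Mean pageview weight over the transaction vectors (dicts) in a cluster."""
    pages = set().union(*cluster)
    return {p: sum(t.get(p, 0.0) for t in cluster) / len(cluster) for p in pages}


def aggregate_profile(cluster, threshold):
    """PACT-style profile: pageviews whose centroid weight exceeds the threshold."""
    return {p: w for p, w in centroid(cluster).items() if w > threshold}


cluster = [{"A": 1.0, "B": 1.0}, {"A": 1.0, "C": 1.0}, {"A": 1.0, "B": 1.0, "D": 1.0}]
profile = aggregate_profile(cluster, threshold=0.5)
print(sorted((p, round(w, 2)) for p, w in profile.items()))  # [('A', 1.0), ('B', 0.67)]
```

With binary transaction weights, the centroid weight of a page is simply the fraction of the cluster's transactions that contain it. This is how weights such as 1.00 and 0.67 in the example profiles arise.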
  • 42. Example Aggregate Profiles
    • Example profiles based on the PACT method
      – based on data from the Association for Consumer Research site
    Profile 1:
      1.00  Call for Papers
      0.67  ACR News Special Topics
      0.67  CFP: Journal of Psychology and Marketing I
      0.67  CFP: Journal of Psychology and Marketing II
      0.67  CFP: Journal of Consumer Psychology II
      0.67  CFP: Journal of Consumer Psychology I
    Profile 2:
      1.00  CFP: Winter 2000 SCP Conference
      1.00  Call for Papers
      0.36  CFP: ACR 1999 Asia-Pacific Conference
      0.30  ACR 1999 Annual Conference
      0.25  ACR News Updates
      0.24  Conference Update
  Hypergraph-Based Clustering (Han, Karypis, Kumar, Mobasher, 1998 [26])
    • Construct a hypergraph from sets of related items
      – each hyperedge represents a frequent itemset
      – the weight of each hyperedge can be based on the characteristics of frequent itemsets or association rules (e.g., support, confidence, interest, etc.)
  • 43. Hypergraph-Based Clustering (contd.)
    • Recursively partition the hypergraph so that each partition contains only highly connected items
      – given a hypergraph, we find a k-way partitioning such that the weight of the hyperedges that are cut is minimized
      – the fitness of partitions is measured in terms of the ratio of the weights of cut edges to the weights of uncut edges within the partitions
      – the connectivity measures the percentage of edges within the partition with which the vertex is associated; it is used for filtering partitions
      – vertices from partial edges can be added back to clusters based on a user-specified overlap factor
  Profiles Based on Hypergraph Clusters (Mobasher, Cooley, Srivastava, 1999 [41])
    • Input
      – the input for clustering is the set of large itemsets from the association rule module
      – each itemset is a hyperedge (weights are a function of the interest of the itemset):
          $\text{Interest}(I) = \dfrac{\text{support}(I)}{\prod_{i \in I} \text{support}(i)}$
      – in practice, the log of interest can be used to keep a few highly frequent patterns from totally dominating
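The interest formula compares an itemset's observed support with the support expected if its items occurred independently. The sketch computes it, plus its logarithm as a dampened hyperedge weight. Supports are assumed to be given as fractions of transactions, keyed by frozensets; the numbers are hypothetical. `math.prod` requires Python 3.8 or later.

```python
import math


def interest(itemset, support):
    """Interest(I) = support(I) / prod over i in I of support({i}).
    `support` maps frozensets to relative supports (fractions of transactions)."""
    independent = math.prod(support[frozenset([i])] for i in itemset)
    return support[frozenset(itemset)] / independent


support = {frozenset(["A"]): 0.5, frozenset(["B"]): 0.4, frozenset(["A", "B"]): 0.3}
w = interest({"A", "B"}, support)
print(round(w, 2), round(math.log(w), 3))  # 1.5 0.405
```

An interest above 1 means that A and B co-occur more often than independence would predict. Taking the log compresses very large values, so a few dominant itemsets do not swamp the partitioning.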
  • 44. Profiles Based on Hypergraph Clusters (contd.)
    • Aggregate profiles (item/pageview clusters)
      – the clustering program directly outputs a set of overlapping pageview clusters
      – the weight associated with pageview p in a cluster C is based on the connectivity value of p in the hypergraph partition:
          $\text{conn}(p, C) = \dfrac{|\{e \mid e \subseteq C,\ p \in e\}|}{|\{e \mid e \subseteq C\}|}$
  Recommendation Engine Using Aggregate Profiles
    • Match the user's activity against the discovered profiles
      – a sliding window over the active session captures the current user's "short-term" history depth
      – profiles and the active session are treated as vectors
      – the matching score is computed based on the similarity between vectors (e.g., normalized cosine similarity)
    • Recommendation scores are based on
      – the matching score to aggregate profiles
      – the "information value" of the recommended item (e.g., link distance of the recommendation to the active session)
    • Recommendations are contributed by multiple profiles
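The connectivity formula counts, among the hyperedges lying entirely inside a cluster, the share that contain a given page. The sketch computes it for a small hypothetical cluster. Hyperedges are represented as sets of pages.

```python
def connectivity(page, cluster, hyperedges):
    """conn(p, C): share of the hyperedges lying entirely inside cluster C that contain p."""
    cluster = set(cluster)
    inside = [e for e in hyperedges if set(e) <= cluster]
    if not inside:
        return 0.0
    return sum(1 for e in inside if page in e) / len(inside)


edges = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"C", "D"}]
C = {"A", "B", "C"}
print({p: round(connectivity(p, C, edges), 2) for p in sorted(C)})
# {'A': 1.0, 'B': 0.67, 'C': 0.67}
```

The hyperedge {C, D} is ignored because it crosses the cluster boundary. Page A, which appears in every internal hyperedge, receives the highest weight.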
  • 45. Active Session Window
    • Example: session window of size 5
        A.html -> B.html -> C.html -> D.html -> E.html -> D.html -> F.html
      (active user session; the session window covers its last 5 pages, C.html through F.html)
    • Associating weights with items in the active session:
      – assigned by the site owner based on perceived importance
      – based on recency (recent pages weighted higher) or time spent on pages
      – based on page types (e.g., content v. navigational)
  Example: Recommendations Based on PACT
    Current user session U: A.html => B.html => C.html => E.html
    Assume a session window of size 3 (B.html, C.html, E.html) and unit weights, using the (cosine) similarity between the active session and each profile.
    Example profiles:
      PROFILE 0: 1.00 D.html, 0.50 A.html, 0.50 C.html, 0.50 E.html
      PROFILE 1: 1.00 A.html, 0.50 B.html, 0.50 C.html, 0.50 D.html, 0.50 E.html, 0.50 F.html
      PROFILE 2: 0.75 B.html, 0.75 F.html, 0.50 A.html, 0.50 C.html, 0.25 D.html
    Similarities:
      Sim(U, P0) = (0.5+0.5) / SQRT(1.75*3) = 0.44
      Sim(U, P1) = (0.5+0.5+0.5) / SQRT(2.25*3) = 0.58
      Sim(U, P2) = (0.75+0.5) / SQRT(1.69*3) = 0.56
    Candidate recommendations (score = SQRT(similarity * item weight)):
      P0: D.html (SQRT(0.44*1.00) = 0.66), A.html (SQRT(0.44*0.50) = 0.47)
      P1: A.html (SQRT(0.58*1.00) = 0.76), D.html (SQRT(0.58*0.50) = 0.54), F.html (SQRT(0.58*0.50) = 0.54)
      P2: F.html (SQRT(0.56*0.75) = 0.65), A.html (SQRT(0.56*0.50) = 0.53), D.html (SQRT(0.56*0.25) = 0.37)
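The scoring scheme above can be computed mechanically for the three example profiles, which also checks the hand arithmetic. The sketch takes the last `window` pages of the session with unit weights and computes the cosine similarity with each profile. It then scores every page outside the window as sqrt(similarity × weight). Keeping each page's best score across profiles is an assumption about how contributions from multiple profiles are combined.

```python
import math


def cosine(session, profile):
    """Cosine similarity of two sparse page -> weight vectors."""
    dot = sum(w * profile.get(p, 0.0) for p, w in session.items())
    norm_s = math.sqrt(sum(w * w for w in session.values()))
    norm_p = math.sqrt(sum(w * w for w in profile.values()))
    return dot / (norm_s * norm_p) if norm_s and norm_p else 0.0


def recommend(session, profiles, window=3):
    """Score pages outside the window as sqrt(similarity * weight in profile);
    keep each page's best score over all profiles (an assumed combination rule)."""
    active = {p: 1.0 for p in session[-window:]}  # unit weights
    scores = {}
    for profile in profiles:
        sim = cosine(active, profile)
        for page, weight in profile.items():
            if page not in active:
                scores[page] = max(scores.get(page, 0.0), math.sqrt(sim * weight))
    return scores


profiles = [
    {"D.html": 1.00, "A.html": 0.50, "C.html": 0.50, "E.html": 0.50},
    {"A.html": 1.00, "B.html": 0.50, "C.html": 0.50, "D.html": 0.50,
     "E.html": 0.50, "F.html": 0.50},
    {"B.html": 0.75, "F.html": 0.75, "A.html": 0.50, "C.html": 0.50, "D.html": 0.25},
]
session = ["A.html", "B.html", "C.html", "E.html"]
for page, score in sorted(recommend(session, profiles).items(), key=lambda x: -x[1]):
    print(page, round(score, 2))
# A.html 0.76
# D.html 0.66
# F.html 0.65
```

A.html is recommended even though the user visited it, because it has fallen outside the size-3 window. This is the effect of the window capturing only "short-term" history.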
  • 46. Integration of Content Profiles (Mobasher et al., 2000 [44])
    • Cluster features over the n-dimensional space of pageviews
    • For each feature cluster, derive a content profile by collecting the pageviews in which these features appear as significant (represented as overlapping collections of pageview-weight pairs)
    Content profile 1 (weight, pageview ID, significant features (stems)):
      1.00  CFP: One World One Market                          world challeng busi co manag global
      0.63  CFP: Int'l Conf. on Marketing & Development          challeng co contact develop intern
      0.35  CFP: Journal of Global Marketing                    busi global
      0.32  CFP: Journal of Consumer Psychology                 busi manag global
    Content profile 2 (weight, pageview ID, significant features (stems)):
      1.00  CFP: Journal of Psych. & Marketing                  psychologi consum special market
      1.00  CFP: Journal of Consumer Psychology I               psychologi journal consum special market
      0.72  CFP: Journal of Global Marketing                    journal special market
      0.61  CFP: Journal of Consumer Psychology II              psychologi journal consum special
      0.50  CFP: Society for Consumer Psychology                psychologi consum special
      0.50  CFP: Conf. on Gender, Market., Consumer Behavior    journal consum market
  Integration of Content Profiles (contd.)
    • Integration with the recommendation engine
      – usage and content profiles have a similar representation, so the recommendation engine can use them in the same way
    • Item weights in profiles must be normalized, so that content and usage profiles can be compared on the same scale
      – one approach: match the active user session with all profiles (both content and usage), then use the maximal recommendation score for candidate recommendations
      – another approach: use content profiles for generating recommendations only if no matching usage profile (with sufficient confidence) is found
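The normalization step and the two integration strategies can be sketched as small functions over page -> score dictionaries. The normalization divides by the largest weight, which is one plausible reading of "normalized"; the slides do not fix the method. The `min_match` threshold of the fallback strategy is a hypothetical parameter.

```python
def normalize(profile):
    """Scale a profile so that its largest weight is 1.0 (an assumed normalization)."""
    top = max(profile.values())
    return {p: w / top for p, w in profile.items()}


def merge_max(usage_scores, content_scores):
    """First approach: keep the maximal recommendation score per candidate."""
    pages = set(usage_scores) | set(content_scores)
    return {p: max(usage_scores.get(p, 0.0), content_scores.get(p, 0.0)) for p in pages}


def merge_fallback(usage_scores, content_scores, best_usage_match, min_match=0.3):
    """Second approach: use content-based scores only when no usage profile
    matched with at least min_match (a hypothetical threshold)."""
    return usage_scores if best_usage_match >= min_match else content_scores


print(normalize({"p1": 0.5, "p2": 0.25}))                    # {'p1': 1.0, 'p2': 0.5}
usage = {"p1": 0.7, "p2": 0.4}
content = {"p2": 0.6, "p3": 0.5}
print(sorted(merge_max(usage, content).items()))             # [('p1', 0.7), ('p2', 0.6), ('p3', 0.5)]
print(merge_fallback(usage, content, best_usage_match=0.1))  # {'p2': 0.6, 'p3': 0.5}
```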
  • 47. Evaluating Personalization
  Evaluating usability: goals / tasks?
    Recall the operational definition: a Web site's usability is high if users
    - achieve their goals / perform their tasks in little time,
    - do so with a low error rate,
    - experience high subjective satisfaction.
    Depending on the site, relevant goals / tasks may be to:
    - stay in the site, return to the site, buy... => E-metrics
    - locate content (search),
    - learn,
    - ...
  • 48. Evaluating usability: methodological caveats
    Questionnaire data: self-reports are often biased; observing behavior in experiments is advisable.
    Comparisons of sites with/without personalization, or before/after personalization was introduced, with respect to "normal user behavior" (server logs) are usually quasi-experiments:
    - many uncontrolled variables (e.g., user intentions)
    - possibly several differences between sites/site versions
    => causal attribution of success to personalization becomes difficult
  Evaluating usability: results I
    CyberBehavior Research Center 1999 survey:
    - 81% of 694 respondents have visited a personalized site
    - 64% of those found it useful: helpful, time-saving
    - perceived usefulness varies with the product (books > music > information technology > news/articles > other)
    - main problems: privacy; ineffectiveness when behavior did not reflect the user "personally" (e.g., buying a gift)
    - concern that possible choices may be limited
    - little difference of opinion between personalization in response to behavior and personalization in response to solicited input
  • 49. Evaluating usability: results II
    Belkin [3], reviewing studies of recommendations in IR systems carried out at Rutgers Univ. since 1995:
    - measures of performance and subjective satisfaction
    - relevance feedback worked well, but better both with increased knowledge of how it worked and with increased user control over its suggestions:
      - relevance feedback + term suggestion performed better than, and was preferred to, pure relevance feedback
    - users preferred to save effort: they were willing to hand over the subsidiary task of term selection to a system they trusted
  Evaluating usability: results III
    Nielsen Net Ratings 1999: registered visitors of portal sites, i.e., those who can customize,
    - spend more than 3 times longer at their home portal than others
    - view 3-4 times more pages
  • 50. Why are results scarce? Possible reasons
    "In essence, web design is a problem in user interface design. However, ... few web designers can afford to subject their web sites to formal usability testing in special labs."
      Perkowitz & Etzioni [52]: Adaptive web sites: an AI challenge.
    "Web personalization is much over-rated and mainly used as a poor excuse for not designing a navigable website."
      Nielsen [47]: Personalization is over-rated.
    "Personalization costs. ... You’re more likely to get a good return on your efforts ... by fixing other problems, such as difficulty in locating content."
      Lighthouse on the Web [36], quoting from Mainspring and User Interface Engineering
  Can other results be transferred?
    Research on adaptive educational software since ~1970:
    - usually, user control is helpful for learning; adaptive interfaces are particularly helpful for novices
    - interfaces changing over time are difficult to learn
    - adaptive presentation (more info depending on user knowledge) improves comprehension and reduces reading time
    - adaptive link annotation
      - can reduce the number of visited pages and learning time
      - encourages novices to navigate non-sequentially
      - enables users to rate the difficulty of a page better
  • 51. Can other results be transferred? (contd.)
    - adaptive link ordering improves user performance in information search tasks
    - but an unstable order of options is confusing for novices, so hiding is better for novices
    - for novices, direct guidance is useful (a "next" link is the most popular choice)
    - the more users agree with the system's suggestions, the better their test results
    (surveys in [11, 12])
  Further factors affecting subjective satisfaction
    - user control (general guideline for software development)
    - must match the user's interests at the moment
    - users don't want extra work: "paradox of the active user"
    - users don't like to be recognized too soon
    - users want to be anonymous, at least at certain times
    - users want openness / disclosure
    - people don't want relationships with corporations, but with other people
    - be specific without being exclusive
    - consider the information structure on the Web (non-monetary rewards are better than differential pricing)
    respect the user!
  • 52. Pattern Evaluation from the Business Perspective
    Myra Spiliopoulou, HHL

    User Satisfaction & Business Success
    A company operating a Web site should create value for its (prospective) customers.
    ⇒ If there is no value for the users, they will not buy and they will not come again.
    ⇒ If the users/customers are not satisfied, they will not buy and/or they will not come again.
    ⇒ User/customer satisfaction is a prerequisite for winning them to the company.
    Winning means:
    • Conversion: The user becomes a customer.
    • Retention: The customer stays loyal.
  • 53. User Satisfaction Modelling
    Indicators that require interaction with the user:
    • Interactivity
    • Ease of use
    • Pleasing environment, entertaining environment
    • Multiple navigation metaphors
    • ...
    • Value creation, as perceived by the user
    Indicators that can be measured/approximated without user interaction:
    • Pages per visitor
    • Duration of stay
    • Visitors per page
    • Response time

    User Satisfaction Computation
    • Identification of a set of satisfaction indicators
    • Design of an appropriate questionnaire
    • Presentation of the questionnaire to a representative user sample
    • Analysis of the responses
    • Conclusions on the impact of the correlations among the satisfaction indicators
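The log-based indicators listed above (pages per visitor, duration of stay, visitors per page) can be approximated from server logs once the requests have been grouped into sessions. The sketch below assumes that sessionization has already been done and uses a hypothetical `Session` record holding a visitor ID and time-stamped page requests; the record layout and the function names are illustrative assumptions, not taken from the tutorial. Response time is left out because it needs server-side timing data that a plain access log may not contain.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One reconstructed visit (hypothetical format)."""
    visitor_id: str
    requests: list[tuple[float, str]]  # (timestamp in seconds, page URL), in request order


def pages_per_visitor(sessions: list[Session]) -> float:
    """Total number of page requests divided by the number of distinct visitors."""
    visitors = {s.visitor_id for s in sessions}
    if not visitors:
        return 0.0
    return sum(len(s.requests) for s in sessions) / len(visitors)


def mean_duration_of_stay(sessions: list[Session]) -> float:
    """Mean time in seconds between the first and the last request of a session.

    The time spent on the last page is not visible in a log, so this
    underestimates the real stay; one-page sessions count as zero.
    """
    durations = [s.requests[-1][0] - s.requests[0][0] for s in sessions if s.requests]
    return mean(durations) if durations else 0.0


def visitors_per_page(sessions: list[Session]) -> dict[str, int]:
    """Number of distinct visitors who requested each page."""
    seen: dict[str, set[str]] = defaultdict(set)
    for s in sessions:
        for _, page in s.requests:
            seen[page].add(s.visitor_id)
    return {page: len(ids) for page, ids in seen.items()}


if __name__ == "__main__":
    # Invented example: visitor u1 came twice, u2 once.
    sessions = [
        Session("u1", [(0.0, "/home"), (30.0, "/products"), (90.0, "/cart")]),
        Session("u2", [(0.0, "/home")]),
        Session("u1", [(1000.0, "/home"), (1060.0, "/products")]),
    ]
    print(pages_per_visitor(sessions))      # 3.0
    print(mean_duration_of_stay(sessions))  # 50.0
    print(visitors_per_page(sessions))      # {'/home': 2, '/products': 1, '/cart': 1}
```

Counting distinct visitors per page, rather than raw hits, keeps a single visitor who reloads a page many times from inflating its popularity.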
  • 54. User Satisfaction: an Experiment
    The study of Eighmey [21]
    • Factors affecting user satisfaction
      – Ease of use
      – Information utility of the presented content
      – Attractiveness of the presentation metaphor
      – ...
    • Experimental settings for the evaluation of a set of commercial sites
      – Mapping of the factors onto a questionnaire
      – Establishment of a group of representative users
      – Experimentation on a local computer pool (in vitro)
    • Statistical analysis of the user responses
    • Ranking of the factors by importance

    The findings of [21]
    • Quality of the presentation metaphor (entertainment, wackiness and taste) plays the most important role.
    • Information utility: the amount of information made available is the second most important factor.
    Further findings: The Web sites tested did not make a strong and useful connection with the interests of the study participants, and did not succeed in creating a context and sense of community needed to build a continuing relationship with Web site users.
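The last steps of such an experiment, statistical analysis of the user responses and ranking of the factors by importance, can be sketched as follows. This is a deliberately simple stand-in, not the analysis used in [21]: it assumes that every respondent rated every factor on a numeric scale (e.g. 1 to 5), ranks the factors by mean rating, and measures how strongly two factors' ratings move together with a Pearson correlation. The data layout and the function names `rank_factors` and `pearson` are hypothetical.

```python
from statistics import mean


def rank_factors(responses: list[dict[str, float]]) -> list[tuple[str, float]]:
    """Rank factors by mean rating across respondents, most highly rated first.

    Each response maps a factor name to that respondent's rating (e.g. 1-5);
    every respondent is assumed to have rated every factor.
    """
    factors = responses[0].keys()
    averages = {f: mean(r[f] for r in responses) for f in factors}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two factors' ratings over the same respondents."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two ratings")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a factor's ratings are constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented ratings from three respondents, for illustration only.
    responses = [
        {"ease_of_use": 4, "information_utility": 5},
        {"ease_of_use": 3, "information_utility": 4},
        {"ease_of_use": 5, "information_utility": 4},
    ]
    print(rank_factors(responses))
    ease = [r["ease_of_use"] for r in responses]
    info = [r["information_utility"] for r in responses]
    print(pearson(ease, info))
```

Mean ratings only order the factors; a real study would also check whether the differences between factors are statistically significant before calling one "more important" than another.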
  • 55. M. Spendolini captures five years of anti-customer-satisfaction reports in the question:
    "Is Customer Satisfaction Irrelevant?"
    Answer: Customer measurement systems should be revisited.

    User Satisfaction & Business Success
    • User/customer satisfaction is a prerequisite for a Web site's success.
    • User/customer satisfaction does not imply a Web site's success.
    Because:
    ⇒ The goal of a Web site is not to make users happy.
    ⇒ The goal of a Web site is to contribute to business success.
  • 56. User Satisfaction & Business Success
    • Awareness
    • Contact
    • Conversion ⇒ abandonment instead of conversion
    • Retention ⇒ attrition instead of retention
    How can these concepts translate into indicators computable upon customer data?

    Business Success from the Viewpoint of the Site
    • St… [20]
      – Number of page requests
      – Duration of site visits
      – Response time
      – Supported navigation mode
      – Discoverability
    • Site quality
      – Accessibility
      – Pages per visitor
      – Visitors per page
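One possible way to make conversion, abandonment, retention and attrition computable upon customer data is to express each as a ratio over sets of user or customer IDs. The definitions below are an illustrative assumption rather than the tutorial's own: conversion is the share of visitors who bought something, abandonment is the share of users who started a purchase (e.g. filled a shopping cart) but did not complete it, and retention/attrition compare the customers of two consecutive periods. All function names are hypothetical.

```python
def conversion_rate(visitors: set[str], buyers: set[str]) -> float:
    """Share of visitors who became customers during the observation period."""
    if not visitors:
        return 0.0
    return len(visitors & buyers) / len(visitors)


def abandonment_rate(started_purchase: set[str], buyers: set[str]) -> float:
    """Share of users who started a purchase (e.g. filled a cart) but did not complete it."""
    if not started_purchase:
        return 0.0
    return len(started_purchase - buyers) / len(started_purchase)


def retention_rate(customers_before: set[str], customers_after: set[str]) -> float:
    """Share of last period's customers who bought again in the current period."""
    if not customers_before:
        return 0.0
    return len(customers_before & customers_after) / len(customers_before)


def attrition_rate(customers_before: set[str], customers_after: set[str]) -> float:
    """Share of last period's customers who did not buy again (complement of retention)."""
    if not customers_before:
        return 0.0
    return len(customers_before - customers_after) / len(customers_before)


if __name__ == "__main__":
    # Toy IDs, invented for illustration only.
    visitors = {"u1", "u2", "u3", "u4"}
    carts = {"u1", "u2", "u4"}
    buyers = {"u2", "u4"}
    print(conversion_rate(visitors, buyers))  # 0.5
    print(abandonment_rate(carts, buyers))    # one of three cart users did not buy
```

Working on sets of IDs rather than raw request counts means each user or customer is counted once per period, which matches the per-person reading of "the user becomes a customer" and "the customer stays loyal".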