Pharos: Social Map Based Recommendation for Content-Centric Social Websites

IBM China Research Laboratory

IBM Research - China
Presenter: Shiwan Zhao (zhaosw@cn.ibm.com)

Pharos Team:

Shiwan Zhao (赵石顽), Quan Yuan (袁泉), Xiatian Zhang (张夏天), Wentao Zheng (郑文涛)

Advisors: Michelle Zhou, Rongyao Fu, Changyan Chi


About me

1993~1998
– B.S. Computer Science, Tsinghua University
1998~2000
– M.S. Computer Science, Tsinghua University
2000~now
– IBM Research - China

2007~now
– Focus on recommendation technologies



Agenda

Part 1:
– Problem & challenges
– Pharos solution overview
– Demo
Part 2:
– Some technology details



Problem

Content-centric social websites (e.g., forums,
wikis, and blogs) have flourished with the
exponential growth of user-generated information
– Overwhelming amount
– Evolving over time
– Not well organized
It is hard for users, especially new users, to grasp
what’s out there and then find the information
that interests them


Example

A blog website contains a huge amount of dynamically evolving content (blog
entries) but does not provide effective ways to navigate it
– Search
• Useful only when users have well-defined goals
– Recent entries
– Top entries by
• most comments
• most ratings
• most visits
– Featured blog entries
– Tag cloud
– …
Without guidance, it is like looking for a needle in a haystack: novice users
cannot find anything interesting, leave BlogCentral quickly, and do not come
back (low stickiness)



Existing solutions & challenges

Researchers have developed recommender
systems to solve this information overload
problem
– E.g. Blog/News/Webpage recommender

However, current recommenders still face
two challenges:
– It is difficult to make effective recommendations for new users
(the cold start problem) because little is known about them
– It is difficult to explain recommendation rationales to end
users, which is needed to make recommendations more trustworthy


Pharos Solution
Dynamically create a social map that helps users find out who is talking
about what on an online site.

Social map creation
– Model and summarize time-sensitive user behaviors on content-centric online sites as a set of “latent communities”
Social map based recommendations
– Provide social landmarks for new users to jump-start their exploration
– Provide a personalized social map for experienced users to navigate the community effectively


Demo screenshot

[Screenshot: social map with people labels John, Steve, Michael, Alice, Tom]


Agenda

Part 1:
– Problem & challenges
– Pharos solution overview
– Demo
Part 2:
– Some technology details



Pharos Overview

[Figure: Pharos architecture layers]
– User behavior on content
– Content Modeling and Behavior Mining
– Social Map: time-sensitive social map as recommendation context, with the target user marked along a Time axis
– Recommendation Algorithms, with explicit and implicit triggers
– Visual Recommendation Explanations
* Multi-faceted recommendation
– Info item (page, fragment)
– People (reference to Bluepages, URL)
– Community (latent, dynamic community)


Pharos Technical Focus

[Figure: Pharos architecture layers (Content Modeling, Behavior Mining, Social Map, Recommendation Algorithms, Visual Recommendation Explanations) with three focus areas marked]
1. Latent community extraction
2. Community/item/people ranking
3. Community summary


Latent community extraction

Three approaches
– Model user-content relationships directly with co-clustering methods
– Group people first, then find the associated content
– Group content first, then find the associated people



Approach 1: time-elastic co-clustering

What time window size should we use to mine the communities?
How long is right?

[Figure: user activity along a Time axis; time-elastic ad hoc community detection produces a Community Map (example: April 2009)]

GraphScope: Parameter-free Mining of Large Time-evolving
Graphs, Jimeng Sun, et al. KDD’07

Input Data – Graph Stream
User actions arrive as a stream
[Figure: user actions plotted along a Time axis]

Split the click stream into many small atomic time frames
[Figure: the same stream divided into frames along the Time axis]

The click stream data in one frame can be
represented by a user-item
matrix (graph).
– In the matrix, a 1 marks an
interaction between a user
and an item.
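
The framing step above can be sketched as follows. This is only an illustration under assumptions: the `(timestamp, user, item)` tuple layout, the function name `frames_to_matrices`, and the fixed frame length in seconds are hypothetical and not taken from the Pharos implementation.

```python
from collections import defaultdict

def frames_to_matrices(clicks, frame_seconds, users, items):
    """Split (timestamp, user, item) clicks into frames of 0/1 user-item matrices."""
    row = {u: i for i, u in enumerate(users)}
    col = {it: j for j, it in enumerate(items)}
    # One all-zero matrix per frame index, created on first use.
    frames = defaultdict(lambda: [[0] * len(items) for _ in users])
    for ts, user, item in clicks:
        frames[ts // frame_seconds][row[user]][col[item]] = 1
    # Frames without any click are skipped in this sketch.
    return [frames[k] for k in sorted(frames)]

clicks = [(0, "u1", "a"), (30, "u2", "b"), (70, "u1", "b")]
matrices = frames_to_matrices(clicks, 60, ["u1", "u2"], ["a", "b"])
```

Every matrix has the same rows and columns, which matches the fixed graph size that the co-clustering approach requires.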

Approach
Two steps
– Co-cluster the graphs
– Decide whether a newly arrived graph should be merged into the
current segment or start a new segment
Based on the MDL (Minimum Description Length) of the
graphs
– The MDL is the limit to which the graphs can be compressed
– Decide between merging and splitting a segment
• If compressing the graphs together costs fewer bits
than compressing them separately, merge the new graph
into the current segment.
• Otherwise, start a new segment with the new graph.
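
A much simplified sketch of the merge-or-split test is given below. It is not GraphScope's actual encoding. It assumes the co-clusterings are already given as label lists, charges each block the entropy of its 0/1 cells, and charges a new segment the cost of writing down its own labels. All function names are hypothetical.

```python
import math

def block_bits(cells):
    """Bits needed to entropy-code a list of 0/1 cells at their own density."""
    n, ones = len(cells), sum(cells)
    if n == 0 or ones == 0 or ones == n:
        return 0.0
    p = ones / n
    return -n * (p * math.log2(p) + (1 - p) * math.log2(1 - p))

def graph_bits(graph, row_labels, col_labels):
    """Encoding cost of one 0/1 matrix under a given co-clustering."""
    blocks = {}
    for i, row in enumerate(graph):
        for j, cell in enumerate(row):
            blocks.setdefault((row_labels[i], col_labels[j]), []).append(cell)
    return sum(block_bits(cells) for cells in blocks.values())

def partition_bits(row_labels, col_labels):
    """Cost of writing down the cluster label of every row and column."""
    k, l = len(set(row_labels)), len(set(col_labels))
    return len(row_labels) * math.log2(k) + len(col_labels) * math.log2(l)

def should_merge(new_graph, seg_rows, seg_cols, own_rows, own_cols):
    """Merge if reusing the segment's partition is no more expensive than
    paying for a fresh partition that fits the new graph."""
    merged = graph_bits(new_graph, seg_rows, seg_cols)
    separate = partition_bits(own_rows, own_cols) + graph_bits(new_graph, own_rows, own_cols)
    return merged <= separate

seg = [0, 0, 1, 1]        # partition of the current segment (rows and columns)
same = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
shifted = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
alt = [0, 1, 0, 1]        # partition that fits the shifted graph
```

Here `same` fits the segment's blocks and is merged, while `shifted` is cheaper to encode under its own partition and starts a new segment.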



Pros and cons
Pros
– Clusters users and items at the same time
– Parameter-free
• No need to specify the number of clusters
– Automatically decides the size of the time window
Cons
– Fixed graph size
• All graphs must have the same size (rows and columns)
• Cannot handle new users and items
– Cannot handle large-scale graphs
– Cannot guarantee an optimal result
– Results on very sparse graphs are poor
• The communities do not make sense
• Our data is extremely sparse (density < 0.1%)


Approach 2: evolutionary spectral clustering for user clustering

Discover communities within a time window
– Get high quality clustering in each time window
Model community evolution for a sequence of time windows
– Make the evolution between time windows smooth

[Figure: Community Map in the BlogCentral domain, one snapshot per month along a Time axis: Jan 2009, Feb 2009, Mar 2009, Apr 2009]

Evolutionary framework

Basic Idea
– Cost Function: Cost = α*CS +β*CT
• Snapshot cost (CS), measures the snapshot quality of the current
clustering result with respect to the current data features,
• Temporal cost (CT), measures the temporal smoothness in terms of the
goodness-of-fit of the current clustering result with respect to either
historic data features or historic clustering results
Two evolutionary frameworks
– PCQ (preserving cluster quality): the current partition is applied to
the historic data, and the resulting cluster quality determines the temporal
cost.
– PCM (preserving cluster membership): the current partition is compared
directly with the historic partition, and the resulting difference
determines the temporal cost.
– PCQ is the framework we currently implement
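
The PCQ cost can be illustrated with a small sketch. It assumes average association (within-cluster similarity divided by cluster size) as the quality measure and uses its negative as the cost. That measure, the function names, and the toy similarity matrices are illustrative assumptions, not the Pharos implementation.

```python
def average_association(sim, labels):
    """Sum over clusters of within-cluster similarity divided by cluster size."""
    total = 0.0
    for cluster in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == cluster]
        within = sum(sim[i][j] for i in members for j in members)
        total += within / len(members)
    return total

def pcq_cost(sim_now, sim_prev, labels, alpha, beta):
    """Cost = alpha*CS + beta*CT, with the same partition scored on both windows."""
    cs = -average_association(sim_now, labels)   # snapshot cost on current data
    ct = -average_association(sim_prev, labels)  # temporal cost on historic data
    return alpha * cs + beta * ct

# Toy user-user similarities for the previous and the current time window.
sim_prev = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
sim_now = [[1, 0.4, 0.5, 0], [0.4, 1, 0, 0.5], [0.5, 0, 1, 0.4], [0, 0.5, 0.4, 1]]
stable = [0, 0, 1, 1]   # partition that matches the history
drifted = [0, 1, 0, 1]  # partition that fits the current window slightly better
```

With `beta = 0` the `drifted` partition has the lower cost. With equal weights the historic data pulls the choice back to `stable`, which is the smoothing effect the temporal cost is meant to provide.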

Evolutionary Spectral Clustering by Incorporating Temporal
Smoothness, Yun Chi, et al. KDD’07


Approach 3: LDA for content clustering

Latent Dirichlet Allocation (LDA), a probabilistic latent
semantic model for topic analysis
$$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$$
[Blei et al. 03]

LDA is a generative probabilistic model of a corpus. The basic
idea is that the documents are represented as random mixtures
over latent topics, where a topic is characterized by a
distribution over words.
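
The generative story can be sketched in a few lines. This is only an illustration of how one document is sampled. The vocabulary, the two topics in `beta`, and the function names are made up for the example, and no inference is performed.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, vocab, n_words, rng):
    """Sample one document: theta ~ Dir(alpha), then a topic z_n and a word w_n per position."""
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        z = rng.choices(range(len(alpha)), weights=theta)[0]  # z_n ~ p(z | theta)
        words.append(rng.choices(vocab, weights=beta[z])[0])  # w_n ~ p(w | z_n, beta)
    return theta, words

vocab = ["java", "jvm", "cloud", "vm"]
beta = [[0.6, 0.4, 0.0, 0.0],   # hypothetical topic 0: word distribution
        [0.0, 0.0, 0.7, 0.3]]   # hypothetical topic 1: word distribution
theta, words = generate_document([0.5, 0.5], beta, vocab, 8, random.Random(0))
```

Each document gets its own topic mixture `theta`, and each topic is a distribution over the vocabulary, as described above.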



Graphical Model of LDA



Latent community extraction - comparison

Co-clustering
– Does not work well for extremely sparse data (density < 0.1%)
Spectral clustering for users
– Most behaviors come from anonymous users, so it is difficult to
distinguish users
– Topics are not concentrated within each community
* LDA for content clustering
– Users are more likely to be interested in content




[Figure: Pharos technical focus diagram (1. Latent community extraction; 2. Item/people ranking; 3. Community summary)]

Item/People Ranking
Authority-based ranking by context-sensitive PageRank, considering
– Time factor
– Context information, e.g., item attributes, report chain of people

$$PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
– cv_i: context vector entry for p_i (e.g., item attributes)
– d: damping factor; M(p_i): nodes that pass authority to p_i; L(p_j): number of outgoing links of p_j

[Figure: graph linking people (A, B, C, D) and blog entries (1, 2, 3, 4)]
– Influential people: active authors with high-quality entries
– Influential entry: written by influential authors, highly visited/commented
Edge types
– Authority from author to entry
– Authority from entry to author
– Authority from commenter/rater to entry
– Authority from visitor to entry
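
A minimal sketch of context-sensitive PageRank by power iteration follows. It assumes the graph is given as a dictionary of outgoing links and that context weights are supplied per node. The function name, the damping value 0.85, the handling of nodes without outgoing links, and the toy graph are assumptions made for illustration. The time factor is not modeled here.

```python
def context_pagerank(out_links, context, d=0.85, iterations=50):
    """Iterate PR(p_i) = (1 - d) * cv_i + d * sum(PR(p_j) / L(p_j))."""
    nodes = list(out_links)
    total = sum(context[n] for n in nodes)
    cv = {n: context[n] / total for n in nodes}  # normalized context vector
    pr = dict(cv)
    for _ in range(iterations):
        nxt = {n: (1 - d) * cv[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = d * pr[src] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Assumption: a node without outgoing links spreads its score by context.
                for n in nodes:
                    nxt[n] += d * pr[src] * cv[n]
        pr = nxt
    return pr

# Hypothetical graph: authors A and B, entries e1 and e2, visitor V who reads e1.
out_links = {"A": ["e1"], "B": ["e2"], "e1": ["A"], "e2": ["B"], "V": ["e1"]}
scores = context_pagerank(out_links, {n: 1.0 for n in out_links})
```

In this toy graph the visited entry `e1` outranks `e2`, and its author `A` inherits the extra authority over `B`.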



[Figure: Pharos technical focus diagram (1. Latent community extraction; 2. Item/people ranking; 3. Community summary)]


Community Summary & visualization

Extraction of representative keywords for each community
– Modified TF/IDF
– Content topic modeling by LDA (Latent Dirichlet Allocation)

Visualization
– A bubble chart layout (used by ManyEyes) packs the top-N
communities tightly on the social map
• A bubble’s size is determined by the community’s ‘hotness’
– Inside each community, a Wordle layout packs the labels
tightly
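
Keyword extraction can be sketched with plain TF-IDF, treating each community as one document. The slide's modification to TF/IDF is not specified, so this is the textbook variant. The function name and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def community_keywords(community_tokens, top_n=3):
    """Rank each community's words by term frequency times inverse community frequency."""
    n = len(community_tokens)
    df = Counter()
    for tokens in community_tokens.values():
        df.update(set(tokens))  # number of communities in which each word occurs
    result = {}
    for name, tokens in community_tokens.items():
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
        result[name] = ranked[:top_n]
    return result

communities = {
    "c1": ["java", "jvm", "java", "blog"],
    "c2": ["cloud", "vm", "cloud", "blog"],
}
keywords = community_keywords(communities, top_n=2)
```

Words shared by every community, such as "blog" here, score zero and drop out of the labels.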


Summary
Model, detect, and use a social map that summarizes user behavior on
online sites to make accurate and trustworthy recommendations

Increase recommendation accuracy
– Eases the “cold start” problem by giving new users “social landmarks” of
a social site to jump-start their engagement
– Gives users overall social awareness to compensate for
recommendation inaccuracy
Enhance recommendation trustworthiness
– Explains recommendation results in the context of a social map
Interactive recommendation
– Users can navigate the social map to find what they need



Thanks!
