This presentation on Diversification is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.
Arcomem training diversification
1. Athena Research and Innovation Center
Yahoo! Research
Diversifying User Comments
on News Articles
2. Problem
Problem description:
Given a news article and the respective set of user comments,
return a subset of the most diverse comments
Perception of a diverse set of comments:
A set of comments that represents different opinions and
sentiments,
…expressed by users with different demographic
characteristics,
…covering different aspects of the news article.
Motivation
Article’s content itself is not always enough to form a complete
view over a topic
The public opinion complements the article and represents
the “wisdom of the crowds”
3. Example
Given a political article:
Find all the subtopics handled
Persons related
Events (election, bill voting)
Find all opinions and
sentiments expressed
Positive/negative/neutral
On the whole article/on specific
subtopics
Find different kinds of users
commenting
Different demographics
Different commenting history on
previous articles
Present a set of comments that
better represents the diversity
of the above dimensions
4. Motivation
Several articles are very
popular (>10000 comments)
Articles get aggregated, yielding
even more comments
Impossible for the reader to
review
Current comment sorting
options are based on
simpler criteria
Date
Votes
Replies
5. Method outline
Define diversification criteria
Dimensions
Content, Sentiment, Named Entities, User co-commenting
behavior
Define a (dis)similarity function that produces a diversity
score based on the criteria
Quantify the dissimilarity of comments
Weighted sum of cosine similarities on diversity feature
vectors
Apply an iterative heuristic algorithm that, at each step,
selects the candidate comment that maximizes a diversity
objective
6. Method description - Criteria
Content
Baseline diversity criterion
Used in the rest of the literature to diversify search results.
Objective: obtain comments with diverse content.
Processing
Comments’ text term vectors
Document length-normalized tf values
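A minimal sketch of the content vectors described above, assuming a simple lowercase regex tokenizer (the slides do not say which tokenizer or stop-word handling was used). The function name `tf_vector` is hypothetical.

```python
import re
from collections import Counter


def tf_vector(text):
    """Map each term to its frequency divided by the comment length in tokens."""
    # Assumption: lowercase alphanumeric tokens; the original tokenizer is unknown.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return {}
    counts = Counter(tokens)
    length = len(tokens)
    return {term: count / length for term, count in counts.items()}


vec = tf_vector("Taxes, taxes and more taxes")
```

Because every value is divided by the token count, long and short comments become comparable before any similarity is computed.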
7. Method description - Criteria
Named Entities (NEs)
Person, Organizations, Locations
News articles often revolve around NEs
Even when an article talks about events or situations, usually one
or more Persons or Locations are involved
Objective: obtain comments that cover (uniformly) as many
different NEs as possible
Processing
Extraction of NEs in comments (Stanford NER)
Comments’ NE term vectors
Document length-normalized tf values
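A sketch of the NE vectors, assuming the entity mentions have already been extracted (for example by Stanford NER) into (name, type) pairs. The input format, the function name `ne_vector`, and the choice to normalize by the number of entity mentions rather than by the full comment length are all assumptions.

```python
from collections import Counter


def ne_vector(mentions):
    """Map each entity name to its share of the comment's entity mentions."""
    # Assumption: mentions is a list of (name, type) pairs from an NER tool.
    counts = Counter(name.lower() for name, _type in mentions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


mentions = [("Obama", "PERSON"), ("Congress", "ORGANIZATION"), ("Obama", "PERSON")]
ne_vec = ne_vector(mentions)
```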
8. Method description - Criteria
Sentiment
9 classes of sentiment within the interval [-4, 4]
-4 very negative
4 very positive
0 neutral
Expresses users’ opinions on the news articles’ topics.
Objective: obtain comments that cover (uniformly) different
classes of sentiment
Processing
Sentiment analysis of the comment’s text (SentiStrength)
Construct sentiment vectors
Each vector value represents a sentiment class
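One possible reading of the sentiment vectors, assuming each comment has already been reduced to a single integer score in [-4, 4] and is encoded with one slot per class. How the SentiStrength output is mapped to that score is not stated in the slides, so the function below (a hypothetical `sentiment_vector`) starts from the integer score.

```python
def sentiment_vector(score):
    """Return a 9-slot vector with a 1.0 in the slot of the given sentiment class."""
    # Assumption: score is an integer in [-4, 4]; slot 0 is -4, slot 8 is +4.
    if not isinstance(score, int) or not -4 <= score <= 4:
        raise ValueError("score must be an integer in [-4, 4]")
    vector = [0.0] * 9
    vector[score + 4] = 1.0
    return vector


very_negative = sentiment_vector(-4)
neutral = sentiment_vector(0)
```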
9. Comment scoring
Cosine similarity function between
A pair of comments
A comment and a set of comments
Apply the similarity function for each criterion separately
Produce a final diversity score as a weighted sum of all
criterion scores
Produce a final score that incorporates comment-to-article
similarity
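The scoring steps above can be sketched as follows, assuming sparse vectors stored as dicts, a set of comments represented by its centroid, and a diversity score defined as the weighted sum of (1 - cosine similarity) per criterion. The helper names and the weights are illustrative, and the comment-to-article similarity term is left out because the slides do not state how it is combined.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise average of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def diversity_score(comment, selected, weights):
    """Weighted sum over criteria of the dissimilarity to the selected set.

    comment: {criterion: vector}; selected: list of such dicts.
    Assumption: dissimilarity is 1 - cosine similarity to the set centroid.
    """
    score = 0.0
    for criterion, weight in weights.items():
        center = centroid([s[criterion] for s in selected])
        score += weight * (1.0 - cosine(comment[criterion], center))
    return score


a = {"content": {"tax": 1.0}, "sentiment": {"-2": 1.0}}
b = {"content": {"tax": 1.0}, "sentiment": {"3": 1.0}}
weights = {"content": 0.5, "sentiment": 0.5}  # illustrative weights only
score_b = diversity_score(b, [a], weights)
```

In this toy case the two comments share content but differ in sentiment, so only the sentiment criterion contributes to the score.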
10. Algorithm (MAXSUM)
Initially
Empty diverse result set; all comments belong to the
candidate set
Arbitrary insertion of a candidate comment into the result set
Greedy construction heuristic
Compare each candidate comment with the centroid
(average) of the current result set
Finish after (k-1) iterations, when k comments have been inserted
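A sketch of the greedy construction under simplifying assumptions: a single feature vector per comment, cosine similarity to the centroid of the current result set as the selection criterion, and the first comment as the arbitrary seed. It illustrates the loop structure only and is not claimed to reproduce the exact objective used in the project; all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise average of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def greedy_diversify(vectors, k, seed_index=0):
    """Return the indices of up to k comments chosen greedily."""
    if k <= 0 or not vectors:
        return []
    selected = [seed_index]  # arbitrary first insertion
    candidates = [i for i in range(len(vectors)) if i != seed_index]
    while len(selected) < k and candidates:
        center = centroid([vectors[i] for i in selected])
        # Pick the candidate least similar to the current result set.
        best = min(candidates, key=lambda i: cosine(vectors[i], center))
        selected.append(best)
        candidates.remove(best)
    return selected


comments = [
    {"tax": 1.0},
    {"tax": 0.9, "vote": 0.1},
    {"vote": 1.0},
    {"senate": 1.0},
]
chosen = greedy_diversify(comments, 3)
```

With the toy vectors above, the near-duplicate of the seed comment is skipped in favour of the two comments on other topics.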
11. Evaluation
Comparison of the methods’ coverage of the different information
nuggets contained in the selected comments
Baseline: diversification based only on content
Proposed method (combination of multiple criteria)
Proposed methods outperform the baseline
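One way nugget coverage could be computed, assuming each selected comment is annotated with the set of nugget identifiers it contains and coverage is the fraction of all known nuggets that appear in the selection. The slides do not give the exact metric, so this definition, the function name, and the toy data below are assumptions for illustration only.

```python
def nugget_coverage(selected_nuggets, all_nuggets):
    """Fraction of all_nuggets present in at least one selected comment."""
    # Assumption: selected_nuggets is a list of sets, one per selected comment.
    if not all_nuggets:
        return 0.0
    covered = set().union(*selected_nuggets) & set(all_nuggets)
    return len(covered) / len(all_nuggets)


all_nuggets = {"n1", "n2", "n3", "n4"}  # toy identifiers, not real data
selection_a = [{"n1"}, {"n1", "n2"}]
selection_b = [{"n1"}, {"n3"}, {"n2", "n4"}]
coverage_a = nugget_coverage(selection_a, all_nuggets)
coverage_b = nugget_coverage(selection_b, all_nuggets)
```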
12. Framework - Implementations
A desktop Java application retrieving news articles and
comments; comments stored in a MySQL database
News and comments obtained by the NY Times API
Arcomem offline module for calculating diverse
WebObjects of WebResources