A brief description of the Opinion-Based Entity Ranking paper published in the Information Retrieval Journal, Volume 15, Number 2, 2012.
Slides By Kavita Ganesan.
Opinion-Based Entity Ranking
1. Ganesan & Zhai 2012, Information Retrieval, Vol 15, Number 2
Kavita Ganesan (www.kavita-ganesan.com)
University of Illinois at Urbana-Champaign
Journal
Project Page
2. Currently: No easy or direct way of finding
entities (e.g. products, people, businesses)
based on online opinions
You need to read opinions about different
entities to find entities that fulfill personal
criteria
e.g. finding mp3 players with ‘good sound quality’
3. Reading through opinions about many different
entities to find a match is a time-consuming
process that impairs user productivity!
4. Use existing opinions to rank entities based on
a set of unstructured user preferences
Examples of user preferences:
Finding a hotel: “clean rooms, heated pools”
Finding a restaurant: “authentic food, good ambience”
5. Most obvious way: use results of existing
opinion mining methods
Find sentiment ratings on various aspects
▪ For example, for an mp3 player: find ratings for screen, sound,
battery life aspects
▪ Then, rank entities based on these discovered aspect ratings
Problem: this is not practical!
▪ Costly – It is costly to mine large amounts of textual content
▪ Prior knowledge – You need to know the set of queryable
aspects in advance, so you may have to define aspects for
each domain either manually or through text mining
▪ Supervision – Most existing methods rely on some form
of supervision, such as the presence of overall user ratings.
Such information may not always be available.
6. Leverage Existing Text Retrieval Models
Why?
Retrieval models can scale up to large amounts of
textual content
The models themselves can be tweaked or
redefined
This does not require costly information extraction
or text mining
7. Leveraging robust text retrieval models
[Diagram: each entity’s reviews are indexed as one pseudo-document.
The user preferences form the query. Retrieval models (BM25, LM, PL2)
perform a keyword match between the user preferences and the textual
reviews, producing a ranked list of entities.]
8. [Same diagram, now showing the output: the entities reordered
(Entity 3, Entity 2, Entity 1) by how well their reviews match the
user preferences.]
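A minimal Python sketch of this pipeline, assuming each entity's reviews are concatenated into a single pseudo-document and scored with BM25 via the rank_bm25 package. Entity names and review text are made up for illustration.

# Rank entities by matching user preferences against their reviews.
from rank_bm25 import BM25Okapi

# One pseudo-document per entity: all of its reviews joined together.
entity_reviews = {
    "hotel_a": "spotless clean rooms friendly staff heated pool",
    "hotel_b": "rooms were a bit dirty but the pool was nice and heated",
    "hotel_c": "clean rooms great service no pool though",
}

entities = list(entity_reviews)
corpus = [entity_reviews[e].split() for e in entities]
bm25 = BM25Okapi(corpus)

# The user's preferences act as the query.
query = "clean rooms heated pool".split()
scores = bm25.get_scores(query)

# Rank entities by how well their reviews match the preferences.
for entity, score in sorted(zip(entities, scores), key=lambda x: -x[1]):
    print(f"{entity}: {score:.3f}")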
9. Based on the basic setup, this ranking problem seems
similar to a regular document retrieval problem
However, there are important differences:
1. The query is meant to express a user's preferences in keywords
Query is expected to be longer than regular keyword queries
Query may contain sub-queries expressing preferences for different
aspects
It may actually be beneficial to model these semantic aspects
2. Ranking should capture how well an entity satisfies a user's
preferences
Not the relevance of a document to a query (as in regular retrieval)
The matching of opinion/sentiment words would be important in
this case
10. Investigate use of text retrieval models for the
task of Opinion-Based Entity Ranking
Explore some extensions over IR models
Propose evaluation method for the ranking task
User Study
To determine if results make sense to users
Validate effectiveness of evaluation method
11. In standard text retrieval we cannot distinguish
the multiple preferences in a query.
For example: “clean rooms, cheap, good service”
Would be treated as a long keyword query even
though there are 3 preferences in the query
The problem: an entity may score highly just by
matching one aspect extremely well, while ignoring the others
To improve this:
We score each preference separately and then
combine the results
12. Aspect Queries
[Diagram: the query “clean rooms, cheap, good service” is split into
three aspect queries: “clean rooms”, “cheap”, and “good service”. Each
aspect query is scored separately by the retrieval model, producing
result sets 1, 2, and 3, which are then combined into the final ranking.]
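A sketch of query aspect modeling, continuing the BM25 example above: each comma-separated preference is scored as its own aspect query, and the per-aspect scores are combined. Averaging is just one illustrative combination strategy; the paper explores combination methods in more detail.

def qam_scores(bm25, query, n_entities):
    # Split the long preference query into comma-separated aspect queries.
    aspect_queries = [part.strip().split() for part in query.split(",")]
    combined = [0.0] * n_entities
    for aq in aspect_queries:
        for i, s in enumerate(bm25.get_scores(aq)):  # score each aspect alone
            combined[i] += s / len(aspect_queries)   # then average across aspects
    return combined

# e.g. qam_scores(bm25, "clean rooms, cheap, good service", len(entities))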
13. In standard retrieval models the matching of
an opinion word & a standard topic word is
not distinguished
However, with Opinion-Based Entity Ranking:
It is important to match opinion words in the
query, but opinion words tend to have more
variation than topic words
Solution: Expand a query with similar opinion
words to help emphasize the matching of opinions
14. [Diagram: the query “Fantastic battery life” should match review
documents with similar meaning: “Good battery life”, “Great battery
life”, “Excellent battery life”.]
15. [Diagram: the same query after expansion. Synonyms of the word
“fantastic” are added, giving the expanded query “fantastic, good,
great, excellent… battery life”, which now matches all of the similar
review documents.]
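A sketch of opinion expansion using a small hand-built lexicon of similar opinion words. The lexicon below is purely illustrative; the paper's actual expansion resource may differ.

# Map opinion words to similar opinion words (illustrative entries only).
OPINION_SYNONYMS = {
    "fantastic": ["good", "great", "excellent"],
    "terrible": ["bad", "poor", "awful"],
}

def expand_query(query_tokens):
    # Keep each original token and append any known similar opinion words.
    expanded = []
    for tok in query_tokens:
        expanded.append(tok)
        expanded.extend(OPINION_SYNONYMS.get(tok, []))
    return expanded

print(expand_query("fantastic battery life".split()))
# ['fantastic', 'good', 'great', 'excellent', 'battery', 'life']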
17. Document Collection:
Reviews of Hotels – Tripadvisor
Reviews of Cars – Edmunds
[Each review consists of free-text content plus numerical aspect
ratings; the ratings serve as the gold standard.]
18. Gold Standard:
Needed to assess performance of the ranking task
For each entity & for each aspect (in dataset):
Average the numerical ratings across reviews. This
gives the judgment score for each aspect
Assumption:
Since the numerical ratings were given by users,
this would be a good approximation to actual
human judgment
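A sketch of building these judgment scores, assuming each review record carries per-aspect numerical ratings (the field names are illustrative).

from collections import defaultdict

reviews = [
    {"entity": "car_a", "ratings": {"performance": 5, "comfort": 4}},
    {"entity": "car_a", "ratings": {"performance": 4, "comfort": 5}},
    {"entity": "car_b", "ratings": {"performance": 2, "comfort": 3}},
]

# Accumulate rating sums and counts per entity and aspect.
sums = defaultdict(lambda: defaultdict(float))
counts = defaultdict(lambda: defaultdict(int))
for r in reviews:
    for aspect, rating in r["ratings"].items():
        sums[r["entity"]][aspect] += rating
        counts[r["entity"]][aspect] += 1

# The averaged rating is the gold-standard judgment score per aspect.
gold = {e: {a: sums[e][a] / counts[e][a] for a in sums[e]} for e in sums}
print(gold)  # {'car_a': {'performance': 4.5, 'comfort': 4.5}, 'car_b': ...}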
19. Gold Standard:
Ex. A user looking for cars with “good performance”
Ideally, the system should return cars with
▪ High numerical ratings on the performance aspect
▪ Otherwise, we can say that the system is not doing well at
ranking
[Figure callout: returned cars should have high ratings on performance]
20. User Queries
Semi-synthetic queries
Not able to obtain a natural sample of queries
Ask users to specify preferences on different aspects
of cars & hotels, based on aspects available in the dataset
▪ Seed queries
▪ Ex. Fuel: “good gas mileage”, “great mpg”
Randomly combining seed queries from different
aspects forms the synthetic queries
▪ Ex. Query 1: “great mpg, reliable car”
▪ Ex. Query 2: “comfortable, good performance”
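A sketch of this query generation step: randomly combine user-provided seed queries from different aspects. The seed values are the examples from this slide; the aspect names are illustrative.

import random

seed_queries = {
    "fuel": ["good gas mileage", "great mpg"],
    "reliability": ["reliable car"],
    "comfort": ["comfortable"],
    "performance": ["good performance"],
}

def make_query(n_aspects=2):
    # Pick distinct aspects, then one seed query from each.
    aspects = random.sample(list(seed_queries), n_aspects)
    return ", ".join(random.choice(seed_queries[a]) for a in aspects)

print(make_query())  # e.g. "great mpg, reliable car"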
21. Evaluation Measure: nDCG
This measure is ideal because it supports
multiple levels of relevance
The numerical ratings used as judgment scores have
a range of values, and nDCG handles such graded
judgments directly.
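A sketch of nDCG over these graded judgment scores, using a standard linear-gain DCG formulation; the paper's exact variant may differ.

import math

def dcg(gains):
    # Position i (0-based) contributes gain / log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(ranked_gains):
    # ranked_gains: judgment scores in the order the system ranked them.
    ideal = sorted(ranked_gains, reverse=True)
    return dcg(ranked_gains) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# The system ranked three cars whose gold-standard ratings were 4.5, 3.0, 5.0:
print(ndcg([4.5, 3.0, 5.0]))  # < 1.0 because the 5.0 car is ranked last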
22. Users were asked to manually determine the relevance
of system-generated rankings to a set of queries
Two reasons for user study:
Validate that results made sense to real users
On average, users thought that the entities retrieved by the
system were a reasonable match to the queries
Validate effectiveness of gold standard rankings
Gold standard ranking has relatively strong agreement
with user rankings. This means the gold standard based on
numerical ratings is a good approximation to human
judgment
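One way to quantify the agreement described here is a rank correlation such as Kendall's tau (scipy.stats.kendalltau); whether the paper uses this exact measure is not stated on the slide, so treat this as illustrative.

from scipy.stats import kendalltau

gold_rank = [1, 2, 3, 4, 5]  # positions assigned by the gold standard
user_rank = [1, 3, 2, 4, 5]  # positions assigned by a study participant

tau, p_value = kendalltau(gold_rank, user_rank)
print(f"tau={tau:.2f}, p={p_value:.3f}")  # tau near 1 => strong agreement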
23. [Bar charts: percentage improvement in ranking from QAM and from
QAM + OpinExp across the PL2, LM, and BM25 retrieval models, for
Hotels (y-axis up to 8.0%) and Cars (y-axis up to 2.5%). Both
enhancements were most effective on BM25.]
24. Lightweight approach to ranking entities based
on opinions
Use existing text retrieval models
Explored some enhancements over retrieval
models
Namely opinion expansion & query aspect modeling
Both showed some improvement in ranking
Proposed evaluation method using user ratings
User study shows that the evaluation method is sound
This method can be used for future evaluation tasks
Editor's Notes
So this long keyword query will be split into 3 separate queries, each called an aspect query. These aspect queries are scored separately and the results are then combined.
For each entity, average the numerical ratings of each aspect. Assumption: this would be a good approximation to human judgment.
Otherwise, this tells you that the system is not really doing well in ranking.
We could not obtain natural queries, so we used semi-synthetic queries: users provided seed queries on different aspects, and we then randomly combined these seed queries to form a set of queries.
Then finally we conducted a user study where users were asked to manually determine the relevance of the system-generated results to each query. This was to validate that the results made sense to real users, and also to validate the effectiveness of the gold standard rankings, which are based on the numerical ratings. Based on this we found relatively strong agreement between the gold standard and user rankings, which means that this evaluation method can be safely used for similar ranking tasks.