Search Engine Technology.
Project – Feature-based Opinion Extraction from Amazon reviews.
Ravi Kiran Holur Vijay – rh2424
Contents
• Abstract
• Motivation
• System Description
• Evaluation
• Tools & Data
• Important Files
• Walkthrough – Using the System
• Walkthrough – Evaluating the System
Abstract.
The goal of this project is to develop a software tool that generates ratings for individual features of a
product from its opinionated reviews; i.e., given a set of reviews about a product, we obtain a set of
features and their ratings.
Motivation.
The large number of online review sites puts a lot of useful and relevant information within a consumer’s
reach. These reviews can be used to compare offerings from different competitors and so to make an
informed decision about buying a particular offering. For a typical consumer, however, making this
decision can be difficult for the following reasons:
• The consumer might not be familiar with the various metrics used to compare the offerings in
that particular domain.
• The consumer might have to read a lot of reviews to get an overview of the product and its
features as reading just a few reviews might not help if they are all biased similarly.
Therefore, it would be helpful if we could somehow:
• Pick out the right metrics that could be useful indicators of the product’s performance, specific
to its domain.
• Summarize the opinions about these important metrics which can be obtained from the large
number of reviews into a couple of positive and negative points.
These observations in turn led to my decision to develop a software tool that could do precisely what
was stated above.
System Description.
At the highest level, the system accomplishes the following tasks:
• Gather reviews about the product from Amazon.com.
• Select a set of product features to rate on.
• Determine the rating for each selected feature based on the sentiment of the sentences in which
it appears.
• Summarize the ratings as the total number of positive and negative points for each of the
features.
The techniques implemented were adapted from the paper: Minqing Hu and Bing Liu, "Mining and
summarizing customer reviews", Proceedings of the ACM SIGKDD International Conference on
Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25,
2004. Figure 1 shows the general system architecture proposed by Minqing Hu and Bing Liu.
Figure 1 - Architecture of the system.
Here’s a breakdown of each step and the implementation details for that step:
• Review collection: Many sources on the internet provide product reviews. I chose to pull
reviews from Amazon because of the large range of products it covers, the large number of
choices it offers the consumer, and the considerable number of reviews available for each
item. The reviews are obtained using Amazon’s Web Services API, which returns an XML
response. This XML is then parsed to obtain the reviews. The system currently fetches up to 20
pages of reviews, with 5 reviews per page. This limit can be changed to any integer.
• Sentence segmentation and POS tagging: I used the NLProcessor program for this step. It is
available for both Windows and Unix. Once we have the product reviews, we run them through
NLProcessor to obtain output in the format defined by NLProcessor.
• Frequent feature identification: All the nouns and noun phrases occurring in each sentence are
chosen as candidate features and are aggregated into a transaction file. A variant of the Apriori
algorithm is then run on this file to identify the features that are frequently commented upon, in
the hope that these are the features that really matter for the product. For the Apriori step,
the CPAN package “Data::Mining::AssociationRules” is used. From this, we get a set of frequent
patterns that are candidate features for the product.
• Feature Pruning: Once we have a set of candidate features, we can use a couple of heuristics to
remove items that are unlikely to be relevant features. I have implemented the Compactness
and Redundancy pruning heuristics, as described in the paper by Minqing Hu and Bing Liu.
• Opinion Words Extraction: Now, we have a set of product features and we need to identify the
opinion words that describe them. For this, we extract the adjectives that are within some fixed
distance from each of the feature words. Thus, we get a list of adjectives describing each of the
features.
• Opinion Orientation Identification: Once we have a set of opinion words, we need to calculate
their orientation, i.e. whether each opinion word expresses a positive or a negative opinion. For
this, I have used the data from SentiWordNet, as described by Andrea Esuli and Fabrizio
Sebastiani, "SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining", in
Proceedings of LREC-06, 5th Conference on Language Resources and Evaluation, Genova, IT,
2006, pp. 417-422. I have written two modules that collect this data either from the locally
available database or from the web by parsing the HTML output generated by SentiWordNet. By
default, the system uses the locally available copy of SentiWordNet. Given a word, it gives us a
score for positivity, negativity and neutrality.
• Opinion sentence orientation identification: Now that we have the orientations of individual
opinion words, we can try to estimate the orientation of the sentence containing them. For this,
I have implemented the algorithm described in the paper by “Minqing Hu and Bing Liu”. Only
the sentences that contain at least one feature word are considered.
• Opinion Summarization: We calculate the total number of positive and negative sentences
that describe each of the features. The features are ranked first by the number of terms they
contain and then by the number of times they appear in the reviews (frequency). The result
for each feature is a tuple of <Feature, Positive scores, Negative scores>.
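The steps above can be illustrated with a minimal sketch. It is written in Python for brevity (the project itself is implemented in Perl) and is not the project's code. It assumes sentences are already POS-tagged with Penn Treebank-style tags, replaces Apriori with a simple count of single nouns, omits the pruning heuristics, and uses a tiny hypothetical lexicon instead of SentiWordNet. All function names and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: +1 for positive adjectives, -1 for negative ones.
LEXICON = {"great": 1, "sharp": 1, "poor": -1, "short": -1}

# Hypothetical pre-tagged sentences: lists of (word, POS tag) pairs.
SENTENCES = [
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("short", "JJ")],
    [("Great", "JJ"), ("battery", "NN")],
    [("The", "DT"), ("lens", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
]


def nouns(sentence):
    """Candidate features: the set of nouns (tags starting with NN) in one sentence."""
    return {word.lower() for word, tag in sentence if tag.startswith("NN")}


def frequent_features(sentences, min_support):
    """Keep nouns that occur in at least min_support sentences (single items only)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(nouns(sentence))
    return {noun for noun, count in counts.items() if count >= min_support}


def nearby_adjectives(sentence, index, window):
    """Opinion words: adjectives (tags starting with JJ) within `window` tokens of `index`."""
    start = max(0, index - window)
    end = min(len(sentence), index + window + 1)
    return [word.lower() for word, tag in sentence[start:end] if tag.startswith("JJ")]


def summarize(sentences, min_support=2, window=3):
    """Return {feature: (positive count, negative count)} for the frequent features."""
    features = frequent_features(sentences, min_support)
    summary = defaultdict(lambda: [0, 0])
    for sentence in sentences:
        for index, (word, tag) in enumerate(sentence):
            if not tag.startswith("NN") or word.lower() not in features:
                continue
            adjectives = nearby_adjectives(sentence, index, window)
            score = sum(LEXICON.get(adj, 0) for adj in adjectives)
            if score > 0:
                summary[word.lower()][0] += 1
            elif score < 0:
                summary[word.lower()][1] += 1
    return {feature: tuple(counts) for feature, counts in summary.items()}


print(summarize(SENTENCES))
```

With the sample data, only "battery" reaches the support threshold of two sentences, and it receives one positive and one negative point. The orientation rule here (sum of nearby adjective scores) is a simplification of the sentence-level algorithm in the paper by Hu and Liu.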
Evaluation.
I carried out a basic evaluation of the system as follows:
• Obtained the hand-annotated dataset by “Minqing Hu and Bing Liu” from
http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.
• Extracted the manually identified features.
• Extracted the features from the reviews automatically using the software tool.
• Extracted the manually identified opinion sentences.
• Extracted the opinion sentences automatically using the software tool.
• For more details, please take a look at the paper by “Minqing Hu and Bing Liu”.
• Now that we have the set of actual features and sentences, along with the automatically
retrieved features and sentences, we can calculate the Precision and Recall measures.
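As a minimal sketch of the Precision and Recall calculation, assuming the annotated and extracted items are compared by exact match as sets (the matching rule used by the project's SystemEval module is not documented here), the computation could look like this in Python. The function name and sample features are hypothetical.

```python
def precision_recall(annotated, extracted):
    """Return (precision, recall) of the extracted items against the annotated ones."""
    annotated, extracted = set(annotated), set(extracted)
    correct = len(annotated & extracted)  # items found both manually and automatically
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical example: 2 of the 3 extracted features are among the 4 annotated ones.
annotated = {"battery", "lens", "size", "price"}
extracted = {"battery", "lens", "strap"}
precision, recall = precision_recall(annotated, extracted)
print(f"Precision = {precision:.3f} ... Recall = {recall:.3f}")
```

Precision is the fraction of extracted items that are correct, and recall is the fraction of annotated items that were extracted.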
Here are the results of running the evaluation program as described above:
Product     Annotated  Extracted  Precision   Recall      Annotated  Extracted  Precision    Recall       Accuracy
            features   features   (features)  (features)  sentences  sentences  (sentences)  (sentences)  (sentences)
Camera1     106        78         0.295       0.217       239        400        0.42         0.703        0.60
Camera2     75         93         0.162       0.2         160        266        0.451        0.75         0.67
DVD Player  116        61         0.345       0.181       344        463        0.523        0.70         0.60
Cell Phone  111        83         0.35        0.26        265        352        0.59         0.78         0.70
Mp3 Player  190        78         0.372       0.153       720        1100       0.46         0.70         0.57
Figure 2 - System Evaluation
• An important point is that these results appear to be lower than those obtained by Minqing Hu
and Bing Liu. The reason is that they considered only a subset of the manually annotated
features for each of the products, as can be seen from their feature counts. The evaluation
documented here, in contrast, includes all of the annotated features, including the implicit
features (like “size” in “the phone fits in my pocket”) and those requiring pronoun resolution
(like size and mobile in “it fits in my pocket”). Also, they have not documented which subset of
features they considered during their evaluation in order to reduce the feature set to the
numbers they have tabulated.
• Another point worth noting is the difference in the techniques used to calculate the orientation
of each feature. Minqing Hu and Bing Liu use an algorithm based on WordNet and an initial
set of seed adjectives, whereas I use the SentiWordNet database for the same task.
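To make the SentiWordNet-based approach concrete, here is a minimal Python sketch of turning a word's positivity and negativity scores into an orientation. The decision rule (compare the two scores) is an assumption for illustration and is not confirmed as the rule used by the project. The scores below are hypothetical, not actual SentiWordNet values. The neutrality score follows SentiWordNet's convention that the three scores of an entry sum to 1.

```python
# Hypothetical (positivity, negativity) scores; NOT actual SentiWordNet values.
SCORES = {
    "excellent": (0.75, 0.0),
    "blurry": (0.0, 0.5),
    "digital": (0.0, 0.0),
}


def neutrality(word, scores=SCORES):
    """Neutrality (objectivity) score: 1 - (positivity + negativity)."""
    pos, neg = scores.get(word.lower(), (0.0, 0.0))
    return 1.0 - (pos + neg)


def orientation(word, scores=SCORES):
    """Assumed rule: the larger of the two scores decides; ties and unknown words are neutral."""
    pos, neg = scores.get(word.lower(), (0.0, 0.0))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


for word in ("excellent", "blurry", "digital"):
    print(word, orientation(word), neutrality(word))
```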
Tools and Data.
I have used the following third-party tools and libraries:
• Data::Mining::AssociationRules for mining association rules
(http://search.cpan.org/~dfrankow/Data-Mining-AssociationRules-0.10/lib/Data/Mining/AssociationRules.pm).
• NLProcessor for POS tagging and sentence segmenting
(http://www.infogistics.com/textanalysis.html).
• SentiWordNet for calculating orientation of individual words. Andrea Esuli and Fabrizio
Sebastiani. SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In
Proceedings of LREC-06, 5th Conference on Language Resources and Evaluation, Genova, IT,
2006, pp. 417-422. (http://sentiwordnet.isti.cnr.it)
• Amazon web services API for extracting reviews from Amazon
(http://docs.amazonwebservices.com/AWSECommerceService/latest/DG/).
• “Minqing Hu and Bing Liu. "Mining and summarizing customer reviews". Proceedings of the
ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004, full
paper), Seattle, Washington, USA, Aug 22-25, 2004”.
Important Files.
• FeatureExtraction.pm: The module that contains all the methods required to process reviews
and give out summary for each of the features.
• SystemEval.pm: The module that contains the methods for evaluating the system using
Precision and Recall measures.
• ExtractReviews.pl: A command line client that uses the APIs provided by the FeatureExtraction
module to generate features and their ratings.
• Evaluate.pl: A command line client that uses the APIs provided by the SystemEval module to
evaluate the system.
• SentiWordNet_1.0.1.txt: The SentiWordNet database containing positive and negative scores
for words.
• eval_reviews and eval_results: Contains some reviews in the annotated format and also the
results of running the evaluation program on those files.
• FeatureExtraction.html: POD2HTML format documentation for the FeatureExtraction module.
• SystemEval.html: POD2HTML format documentation for the SystemEval module.
A demo walkthrough using the system.
• Verify the prerequisites: The following libraries should be available either in the program’s
directory or in Perl’s lib directory.
o Data\Mining\AssociationRules.pm.
o Sentement\FeatureExtraction.pm, Sentement\SystemEval.pm, Sentement\Data
directory.
o SentiWordNet_1.0.1.txt in the program’s directory.
o LWP::Simple Perl library.
o POSIX Perl library.
• The following external programs must be installed.
o NLProcessor from http://www.infogistics.com/demos/
o NLProcessor must be working; otherwise the program fails with errors that are hard to
diagnose.
o I have included the archive as well as the installation instructions.
• Obtain the ASIN: We need a product to mine opinions for. Visit Amazon.com in any web
browser and browse to the product you are interested in. For this demo, I use the product
“Canon Digital Rebel XSi 12.2 MP Digital SLR Camera with EF-S 18-55mm f/3.5-5.6 IS Lens
(Black)”. Once on the item’s page, search the page for the string “asin:” to find the item’s ASIN.
For the product mentioned above, the ASIN is “B0012YA85A”.
• Run the extraction and rating script: ExtractReviews.pl <ASIN> <Output file> <NLProcessor>
o ASIN of the product from Amazon.
o Output file to write the results to.
o Full path to the NLProcessor executable program.
o In our case, I used the following command: perl ExtractReviews.pl "B0012YA85A"
"features_canonrebel.txt" "c:\nlp\bin\nlp.cmd"
o Now, we have the output in the file “features_canonrebel.txt” in the format: feature,
number of positive ratings, number of negative ratings.
• Since the format is CSV, we can easily import the data into Matlab and get some fancy plots.
Here’s what we can do:
o Copy the features output file (features_canonrebel.txt) and the Matlab visualization
script (createfigure.m) into Matlab’s work directory or any other directory of your
choice.
o Start Matlab and run the visualization script on the output features file.
createfigure(<features file>, <top 'n' features to include>)
e.g. createfigure('features_canonrebel.txt', 10)
If everything goes fine, we see a graphical display of the feature ratings. As
indicated by the legend, the green bars indicate the number of positive reviews and
the red bars indicate the number of negative reviews. The numbers 1 … 10
correspond to the features in the features file (specifically, the line numbers in the
features file).
Figure 3 - Top 50 features
Figure 4 - Top 10 Features
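For readers without Matlab, the features file can also be inspected with a few lines of Python. This is a minimal sketch, not part of the project: it assumes each line has exactly the three comma-separated fields described above (feature, number of positive ratings, number of negative ratings) and that the file is already in ranked order, so the top n features are simply the first n lines. The function name and the sample contents are hypothetical.

```python
import csv
import io

# Hypothetical file contents in the "feature, positive, negative" format.
SAMPLE = "lens,12,3\nbattery life,8,5\nstrap,1,4\n"


def top_features(csv_text, n):
    """Return the first n (feature, positive, negative) rows of the features file."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text)):
        if len(record) != 3:
            continue  # skip blank or malformed lines
        feature, positive, negative = (field.strip() for field in record)
        rows.append((feature, int(positive), int(negative)))
    return rows[:n]


for rank, (feature, positive, negative) in enumerate(top_features(SAMPLE, 2), start=1):
    print(f"{rank}. {feature}: +{positive} / -{negative}")
```

To use it on real output, read the file contents first, for example with open("features_canonrebel.txt").read(), and pass the text to top_features.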
A demo walkthrough for evaluating the system.
• Identify the annotated review file: I have included some sample reviews in the “eval_reviews”
folder. The reviews should be in the format described in Minqing Hu and Bing Liu, "Mining
and summarizing customer reviews", Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug
22-25, 2004. I obtained these files from
http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip. For this demo, let’s select
“mp3player.txt”, the file used in the command below.
• Run the evaluation script: Evaluate.pl <annotated reviews> <NLProcessor command>.
o perl Evaluate.pl "eval_reviews\mp3player.txt" "c:\nlp\bin\nlp.cmd" > mp3player.txt
• The system will be automatically evaluated and we get the values for precision and recall at
both the feature and the sentence levels.
• Here’s a sample output from the command.
o For features ...
o Precision = 0.371794871794872 ... Recall = 0.152631578947368
o For Sentences ...
o Precision = 0.457194899817851 ... Recall = 0.698191933240612 ... Accuracy =
0.567729083665339
We now know how to process reviews as well as how to evaluate the system through practical
walkthroughs.