Product Review Summarization and Sentiment Analysis
1. Summarization and Opinion
Detection In Product Reviews
Team:
Suman Papanaboina (p.suman@students.iiit.ac.in)
Swapnil Patil (swapnil.patil@students.iiit.ac.in)
Shubham Srivastava (shubham.srivastava@students.iiit.ac.in)
Spandana Otra (otra.spandana@students.iiit.ac.in)
Project Mentor:
Aditya Joshi (aditya.joshi@research.iiit.ac.in)
2. Project Motivation
• As e-commerce is becoming more and more
popular, the number of customer reviews that
a product receives grows rapidly.
• For a popular product, the number of reviews
can be in hundreds or even
3. Project Motivation
This makes it difficult for a
potential customer to read them
to make an informed decision
on whether to purchase the
product.
It also makes it difficult for the
manufacturer of the product to
keep track of and manage
customer opinions.
5. How it is different from Traditional
Summarization?
• We only mine the features of the product on
which the customers have expressed their
opinions and whether the opinions are positive
or negative.
• We do not summarize the reviews by selecting a
subset of the original sentences, or by rewriting some
of them, to capture the main points as in
classic text summarization.
10. Feature/Opinion Extractor Module
• Used the Stanford dependency parser.
• Extract only nsubj, amod, and nn pairs. These pairs
turn out to be the required feature/opinion
pairs.
• Identify any negations expressed and adjust
the opinion accordingly.
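
The extraction step above can be sketched as follows. This is a minimal illustration, not the team's implementation: it assumes the parser output is already available as (relation, governor, dependent) triples, and the function name extract_pairs, the use of nn to join compound nouns, and the use of the neg relation to detect negation are all assumptions. It also skips part-of-speech filtering, which a real system would likely need.

```python
from typing import List, Tuple

# One typed dependency: (relation, governor, dependent). Assumed input format.
Dep = Tuple[str, str, str]

def extract_pairs(deps: List[Dep]) -> List[Tuple[str, str, bool]]:
    """Return (feature, opinion, negated) triples for one sentence."""
    # nn(life, battery): join noun compounds so "battery life" is one feature.
    compound = {gov: f"{dep} {gov}" for rel, gov, dep in deps if rel == "nn"}
    # neg(good, not): remember which words carry a negation.
    negated = {gov for rel, gov, dep in deps if rel == "neg"}

    pairs = []
    for rel, gov, dep in deps:
        if rel == "nsubj":    # nsubj(good, life): the opinion governs the feature
            feature, opinion = dep, gov
        elif rel == "amod":   # amod(screen, bright): the feature governs the opinion
            feature, opinion = gov, dep
        else:
            continue
        pairs.append((compound.get(feature, feature), opinion, opinion in negated))
    return pairs

# Hand-written dependencies for "The battery life is not good".
sentence = [
    ("det", "life", "The"),
    ("nn", "life", "battery"),
    ("nsubj", "good", "life"),
    ("cop", "good", "is"),
    ("neg", "good", "not"),
]
print(extract_pairs(sentence))  # [('battery life', 'good', True)]
```

The negated flag lets a later stage flip the polarity of the opinion instead of rewriting the opinion word itself.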
11. Frequent Feature Identification
• We defined a frequent feature as a feature
that appears in more than 3 sentences (this
parameter can be configured).
• We used the Apache Mahout library to find
frequent patterns.
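
The threshold rule can be illustrated with a plain per-sentence count. This sketch does not use Apache Mahout or frequent pattern mining; it only shows the "more than 3 sentences" cut-off, and the function name frequent_features and its input format (one set of features per sentence) are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Set

def frequent_features(sentence_features: Iterable[Set[str]],
                      min_sentences: int = 3) -> List[str]:
    """Keep features that appear in more than `min_sentences` sentences."""
    counts: Counter = Counter()
    for features in sentence_features:
        counts.update(set(features))  # count each feature once per sentence
    return sorted(f for f, n in counts.items() if n > min_sentences)

# "battery" occurs in 4 sentences, "screen" in exactly 3.
sentences = [{"battery", "screen"}, {"battery", "screen"},
             {"battery", "screen"}, {"battery"}]
print(frequent_features(sentences))  # ['battery']
```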
13. Redundancy Pruning
• We defined a feature X as a redundant feature if
– X is a part of another feature, and
– X does not appear on its own in at least
3 sentences (the threshold is configurable; our
system currently uses 3).
• After implementing this technique, we are able
to eliminate redundant features among sets such as
battery, life, battery life.
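
A minimal sketch of the pruning rule, under stated assumptions: features are space-separated phrases, "part of another feature" is approximated by word containment, and a feature "appears on its own" in a sentence when that exact phrase is in the sentence's extracted feature set. The function name prune_redundant is illustrative.

```python
from typing import List, Sequence, Set

def prune_redundant(features: Sequence[str],
                    sentence_features: Sequence[Set[str]],
                    min_standalone: int = 3) -> List[str]:
    """Drop a feature that is part of a longer feature and rarely stands alone."""
    kept = []
    for x in features:
        x_words = set(x.split())
        # Proper-subset test: "battery" is part of "battery life".
        part_of_other = any(x_words < set(other.split()) for other in features)
        # Sentences in which the exact phrase x was extracted on its own.
        standalone = sum(1 for s in sentence_features if x in s)
        if part_of_other and standalone < min_standalone:
            continue  # redundant under the rule above
        kept.append(x)
    return kept

features = ["battery", "life", "battery life", "screen"]
sentences = ([{"battery life"}] * 4 + [{"battery"}] * 3
             + [{"life"}] + [{"screen"}] * 4)
print(prune_redundant(features, sentences))
# ['battery', 'battery life', 'screen']
```

In this sample, life is pruned because it stands alone in only one sentence, while battery survives because it stands alone in three.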
15. Junk Features
• Some reviews contain sentences like "Flipkart
services are awesome". In this case our system
extracts service as the feature and awesome as the
opinion, even though the sentence is not about the product.
[Diagram: Frequent Features -> Junk Feature Pruner (reads the Junk Feature File) -> Output Features]
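
The pruner in the diagram can be sketched as a simple filter. The slides do not describe the junk feature file format, so this sketch assumes one feature per line and case-insensitive matching; load_junk_features and prune_junk are hypothetical names.

```python
from typing import Iterable, List, Set

def load_junk_features(lines: Iterable[str]) -> Set[str]:
    """Parse the junk feature file, assumed to hold one feature per line."""
    return {line.strip().lower() for line in lines if line.strip()}

def prune_junk(features: Iterable[str], junk: Set[str]) -> List[str]:
    """Keep only the features that are not listed as junk."""
    return [f for f in features if f.lower() not in junk]

# Lines as they might be read from the junk feature file.
junk = load_junk_features(["service\n", "\n", "Delivery\n"])
print(prune_junk(["battery life", "service", "delivery"], junk))
# ['battery life']
```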
18. Feature Summary Rest Service
• We implemented a REST service to provide
the following functionality to the UI.
– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product
• We used the Grizzly embedded container to implement
the REST service.
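
The three operations can be sketched as a small routing function. This is not the Grizzly-based service: it is a Python illustration, and the URL paths, the response shapes, and the sample data are all hypothetical, since the slides do not show the actual API.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical in-memory sample data; the real schema is not shown in the slides.
CATALOG: Dict[str, List[str]] = {"phones": ["phone-a"]}
SUMMARIES: Dict[str, Dict[str, Dict[str, int]]] = {
    "phone-a": {"battery life": {"positive": 12, "negative": 3}},
}

def handle_get(path: str) -> Tuple[int, str]:
    """Map a GET path to (HTTP status, JSON body)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["categories"]:
        # List of categories in the system.
        return 200, json.dumps(sorted(CATALOG))
    if len(parts) == 3 and parts[0] == "categories" and parts[2] == "products":
        # List of products for a given category.
        if parts[1] in CATALOG:
            return 200, json.dumps(CATALOG[parts[1]])
    if len(parts) == 3 and parts[0] == "products" and parts[2] == "summary":
        # Feature-based summary for a given product.
        if parts[1] in SUMMARIES:
            return 200, json.dumps(SUMMARIES[parts[1]])
    return 404, json.dumps({"error": "not found"})

print(handle_get("/categories"))
print(handle_get("/categories/phones/products"))
print(handle_get("/products/phone-a/summary"))
```

Keeping the routing in a plain function makes the three lookups easy to test without starting a server; any HTTP container could call it.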
24. Evaluation
No. of feature-opinion pairs manually extracted: 20
No. of initial feature-opinion pairs extracted by our system: 40
After frequent pattern mining: 25
After pruning (final stage): 18
No. of correct feature-opinion pairs: 15
No. of incorrect feature-opinion pairs: 3
Precision: 15/20 (75%)
Recall: 18/20 (90%)
F1-Measure, (2 * precision * recall) / (precision + recall): 0.81
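
The slide reports precision as 15/20 and recall as 18/20. Under the usual definitions (precision = correct pairs / pairs returned by the system, recall = correct pairs / manually extracted pairs), the same counts would give 15/18 and 15/20 instead. The sketch below computes both versions so they can be compared; the function and parameter names are illustrative only.

```python
from typing import Tuple

def precision_recall_f1(correct: int, extracted: int,
                        relevant: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 from raw counts."""
    precision = correct / extracted   # correct pairs among those returned
    recall = correct / relevant       # correct pairs among the manual ones
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Counts from the evaluation slide: 15 correct, 18 after pruning, 20 manual.
p, r, f1 = precision_recall_f1(correct=15, extracted=18, relevant=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# precision=0.833 recall=0.750 f1=0.789

# F1 recomputed from the precision and recall values reported on the slide.
reported_f1 = 2 * 0.75 * 0.90 / (0.75 + 0.90)
print(f"reported f1={reported_f1:.3f}")
# reported f1=0.818
```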
25. Conclusion
• It was a great learning experience for all of us. We
were really excited to apply data mining and
natural language processing techniques to implement the
system.
• We believe that this system can help users
quickly identify what is good or bad in a product
based on other users' comments. It also gives
manufacturers a better perspective on users' comments,
which can aid in providing business
intelligence.
26. Future Enhancements
• Add more rules to improve the overall accuracy of
feature/opinion identification.
• Migrate the entire system to run on Hadoop YARN, using HBase
instead of MySQL.
• Try unsupervised/supervised machine learning approaches
for feature/opinion identification.
• Replace our home-grown crawler with the more robust,
open-source crawler Apache Nutch
(https://nutch.apache.org/)