Product Review Summarization and Sentiment Analysis

Summarization and Opinion Detection in Product Reviews
Team:
Suman Papanaboina (p.suman@students.iiit.ac.in)
Swapnil Patil (swapnil.patil@students.iiit.ac.in)
Shubham Srivastava (shubham.srivastava@students.iiit.ac.in)
Spandana Otra (otra.spandana@students.iiit.ac.in)
Project Mentor:
Aditya Joshi (aditya.joshi@research.iiit.ac.in)

Project Motivation
• As e-commerce is becoming more and more
popular, the number of customer reviews that
a product receives grows rapidly.
• For a popular product, the number of reviews
can be in hundreds or even

Project Motivation
This makes it difficult for a
potential customer to read them all
and make an informed decision
on whether to purchase the
product.
It also makes it difficult for the
manufacturer of the product to
keep track of and manage
customer opinions.

Project Objective
• Provide a structured, feature-based summary
for new customers by mining reviews.

How it is different from Traditional
Summarization?
• We only mine the features of the product on
which the customers have expressed their
opinions and whether the opinions are positive
or negative.
• We do not summarize the reviews by selecting a
subset of the original sentences, or by rewriting some
of them, to capture the main points as in
classic text summarization.

End-to-End Architecture
Components:
• Crawler
• UI
• REST Service
• Sentence Splitter/Preprocessor
• Feature/Opinion Extractor
• Frequent Feature Identifier
• Feature Pruner
• Sentiment Analyzer
• Persistence (MySQL)
• Summarizer

Crawler Module
Flipkart -> Jsoup Scraping Tool -> Persister -> MySQL
The crawler collects the following information:
• Product Name
• Rating
• Review Comment
• Commented User
• Commented Date/Time

Sentence Splitter/Preprocessor
• Input: Review
• Sentence Splitter (OpenNLP)
• Sentence Preprocessor: stop-word filter, stemming
• Persister (MySQL)
Feature/Opinion Extractor Module
Sentence -> Stanford Dependency Parser -> Extract nsubj, amod, nn -> Find any negations -> Persister -> MySQL

Feature/Opinion Extractor Module
• Used the Stanford dependency parser.
• Extract only nsubj, amod, and nn pairs. These pairs
turn out to be the required feature/opinion
pairs.
• Identify any negations expressed and adjust
the opinion accordingly.
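
The rules above can be sketched on already-parsed input. The sketch assumes each sentence arrives as a list of (relation, governor, dependent) triples in Stanford dependency style; the function name `extract_pairs` and the hand-written triples are hypothetical, and the parser itself is not called here.

```python
def extract_pairs(deps):
    """Return (feature, opinion, negated) tuples from one sentence's dependency triples.

    deps: list of (relation, governor, dependent), e.g. ("amod", "camera", "great").
    """
    # nn(life, battery): "battery" modifies the head noun "life" -> "battery life".
    compounds = {gov: f"{dep} {gov}" for rel, gov, dep in deps if rel == "nn"}
    # neg(good, not): the governor is the word being negated.
    negated = {gov for rel, gov, dep in deps if rel == "neg"}

    pairs = []
    for rel, gov, dep in deps:
        if rel == "nsubj":      # nsubj(good, life): subject noun is the feature
            feature, opinion = dep, gov
        elif rel == "amod":     # amod(camera, great): modified noun is the feature
            feature, opinion = gov, dep
        else:
            continue
        feature = compounds.get(feature, feature)
        pairs.append((feature, opinion, opinion in negated))
    return pairs

if __name__ == "__main__":
    # Hand-written triples for "The battery life is not good".
    deps = [("det", "life", "The"), ("nn", "life", "battery"),
            ("nsubj", "good", "life"), ("cop", "good", "is"), ("neg", "good", "not")]
    print(extract_pairs(deps))
```

The negation flag is kept separate from the opinion word so that a later sentiment step can flip the polarity.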

Frequent Feature Identification
• We defined a frequent feature as a feature
that appears in more than 3 sentences (this
parameter can be configured).
• We used the Apache Mahout library to find
frequent patterns.
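
The frequency threshold can be illustrated with a plain counter. This is a simplified stand-in that only counts single features per sentence; it is not Mahout's FP-Growth, and the name `frequent_features` and its input format are assumptions.

```python
from collections import Counter

def frequent_features(sentence_features, min_sentences=3):
    """Return features that appear in MORE than `min_sentences` sentences.

    sentence_features: one list of extracted features per sentence.
    """
    counts = Counter()
    for feats in sentence_features:
        counts.update(set(feats))   # count each feature once per sentence
    return {f for f, n in counts.items() if n > min_sentences}

if __name__ == "__main__":
    sentences = [["battery", "screen"], ["battery"], ["battery", "battery"],
                 ["battery"], ["screen"]]
    print(frequent_features(sentences))
```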

Frequent Feature Identification
Features, Sentences -> Mahout Frequent Pattern Miner (FP-Growth/FP-tree) -> Frequent Features -> Persister -> MySQL

Redundancy Pruning
• We defined a feature X as redundant if
• X is part of another feature, and
• X does not appear on its own in at least
3 sentences (the threshold is configurable; our
system currently uses 3).
• With this technique we can eliminate redundant
features: for example, from battery, life, and
battery life, only battery life is kept.
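
The redundancy rule can be sketched as below. The sketch assumes that "part of another feature" means the words of X are a proper subset of the words of another feature, and that the number of sentences in which each feature appears on its own has already been counted; `prune_redundant` and `standalone_counts` are hypothetical names.

```python
def prune_redundant(features, standalone_counts, min_standalone=3):
    """Drop a feature X if it is part of another feature and appears on its
    own in fewer than `min_standalone` sentences."""
    kept = []
    for x in features:
        x_words = set(x.split())
        part_of_other = any(x_words < set(y.split()) for y in features)
        if part_of_other and standalone_counts.get(x, 0) < min_standalone:
            continue  # redundant: covered by a longer feature
        kept.append(x)
    return kept

if __name__ == "__main__":
    features = ["battery", "life", "battery life", "screen"]
    standalone = {"battery": 1, "life": 0, "battery life": 6, "screen": 5}
    print(prune_redundant(features, standalone))
```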

Redundancy Pruning
Battery, life, battery life -> Redundancy Pruner -> Battery life

Junk Features
• Some reviews contain sentences like "Flipkart
services are awesome". In this case our system
extracts "service" as the feature and "awesome" as the
opinion, although the sentence is about the retailer
rather than the product.
Frequent Features -> Junk Feature Pruner (Junk Feature File) -> Output Features
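
A junk-feature filter of this kind can be sketched as a lookup against a list loaded from a file. The file format (one feature per line) and the entry "delivery" are assumptions for illustration; only "service" comes from the example above.

```python
def load_junk_features(lines):
    """Build the junk set from an iterable of lines (e.g. an open file)."""
    return {line.strip().lower() for line in lines if line.strip()}

def prune_junk(features, junk):
    """Keep only features that are not listed as junk."""
    return [f for f in features if f.lower() not in junk]

if __name__ == "__main__":
    junk = load_junk_features(["service\n", "delivery\n", "\n"])
    print(prune_junk(["battery life", "service", "screen"], junk))
```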

Sentiment Analysis
Opinion Words -> Sentiment Analyzer
Resources used by the Sentiment Analyzer:
• SentiWordNet
• Positive Seed List
• Negative Seed List
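
One way the seed lists and a lexicon could be combined is sketched below. How the project actually combines them is not stated, so the lookup order is an assumption; the seed words are illustrative, and `lexicon_scores` is a hypothetical dictionary standing in for SentiWordNet scores.

```python
POSITIVE_SEEDS = {"good", "awesome", "great"}   # illustrative seed words
NEGATIVE_SEEDS = {"bad", "poor", "terrible"}    # illustrative seed words

def polarity(opinion, negated=False, lexicon_scores=None):
    """Classify an opinion word as positive, negative, or neutral."""
    word = opinion.lower()
    if word in POSITIVE_SEEDS:
        score = 1.0
    elif word in NEGATIVE_SEEDS:
        score = -1.0
    else:
        # Fall back to a lexicon score in [-1, 1]; unknown words count as neutral.
        score = (lexicon_scores or {}).get(word, 0.0)
    if negated:
        score = -score   # "not good" flips to negative
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

if __name__ == "__main__":
    print(polarity("good"), polarity("good", negated=True),
          polarity("sluggish", lexicon_scores={"sluggish": -0.4}))
```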

Summarizer
• The summarizer generates a structured,
feature-based summary.
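
A feature-based summary can be sketched as grouping review sentences by feature and polarity. The output layout here is an assumption, since the slide's own example is not reproduced in the text; `summarize` and `format_summary` are hypothetical names.

```python
from collections import defaultdict

def summarize(labeled_pairs):
    """Group sentences by feature and polarity.

    labeled_pairs: iterable of (feature, polarity, sentence) with polarity
    being "positive" or "negative"; anything else is ignored.
    """
    summary = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, pol, sentence in labeled_pairs:
        if pol in ("positive", "negative"):
            summary[feature][pol].append(sentence)
    return dict(summary)

def format_summary(summary):
    """Render one line per feature with positive and negative counts."""
    lines = []
    for feature in sorted(summary):
        pos = len(summary[feature]["positive"])
        neg = len(summary[feature]["negative"])
        lines.append(f"{feature}: positive={pos}, negative={neg}")
    return "\n".join(lines)

if __name__ == "__main__":
    pairs = [("battery life", "positive", "Battery life is great."),
             ("battery life", "negative", "Battery life is not good."),
             ("screen", "positive", "Awesome screen.")]
    print(format_summary(summarize(pairs)))
```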

Feature Summary REST Service
• We implemented a REST service that provides the
following functionality to the UI:
– Find the list of categories in the system
– Find the list of products for a given category
– Find the feature-based summary for a given product
• We used the Grizzly embedded container to implement
the REST service.
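
The three operations can be sketched as plain functions over an in-memory catalog. This only illustrates the lookups; it is not the Grizzly-based service, and the catalog layout, function names, and sample data are all hypothetical.

```python
# Hypothetical layout: category -> product -> feature -> polarity counts.
CATALOG = {
    "mobiles": {
        "Phone A": {"battery life": {"positive": 12, "negative": 3}},
        "Phone B": {"screen": {"positive": 7, "negative": 1}},
    },
    "laptops": {"Laptop X": {"keyboard": {"positive": 4, "negative": 2}}},
}

def list_categories(catalog):
    """Find the list of categories in the system."""
    return sorted(catalog)

def list_products(catalog, category):
    """Find the list of products for a given category."""
    return sorted(catalog.get(category, {}))

def feature_summary(catalog, product):
    """Find the feature-based summary for a given product (empty if unknown)."""
    for products in catalog.values():
        if product in products:
            return products[product]
    return {}

if __name__ == "__main__":
    print(list_categories(CATALOG))
    print(list_products(CATALOG, "mobiles"))
    print(feature_summary(CATALOG, "Phone A"))
```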

Screenshots/Feature-based summary

Screenshots/Individual sentences

Evaluation
No. of feature-opinion pairs manually extracted: 20
No. of initial feature-opinion pairs extracted by our system: 40
After frequent pattern mining: 25
After pruning (final stage): 18
No. of correct feature-opinion pairs: 15
No. of incorrect feature-opinion pairs: 3
Precision: 15/20 (75%)
Recall: 18/20 (90%)
F1-Measure, (2*precision*recall)/(precision+recall): 0.81
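
The F1 arithmetic can be checked with a short calculation. The first computation uses the precision and recall exactly as reported above. The second is an alternative reading, not a reported result: it assumes the more common definitions, with precision as correct pairs over pairs returned by the system (15/18) and recall as correct pairs over manually extracted pairs (15/20).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

manual, final, correct = 20, 18, 15   # counts from the evaluation table

# As reported on the slide: precision = 15/20, recall = 18/20.
reported_f1 = f1(correct / manual, final / manual)

# Alternative (assumed) definitions: precision = 15/18, recall = 15/20.
alternative_f1 = f1(correct / final, correct / manual)

if __name__ == "__main__":
    print(f"F1 from reported values:    {reported_f1:.4f}")
    print(f"F1 under alternative defs:  {alternative_f1:.4f}")
```

With the reported values the formula gives about 0.818; under the alternative definitions the same counts give about 0.789.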

Conclusion
• It was a great learning experience for all of us. We
were excited to apply data mining and
natural language processing techniques to implement the
system.
• We believe that this system can help users
quickly identify what is good or bad in a product
based on other users' comments. It also gives
manufacturers a better perspective on users'
comments, which can aid in providing business
intelligence.

Future Enhancements
• Add more rules to improve the overall accuracy of
feature/opinion identification.
• Migrate the entire system to run on Hadoop YARN, using HBase
instead of MySQL.
• Try unsupervised/supervised machine learning approaches
for feature/opinion identification.
• Replace our home-grown crawler with a more robust,
open-source crawler, Apache Nutch
(https://nutch.apache.org/).
