Most of us use e-shopping (Any product) these days and refer its rating or reviews before we download or buy that product. Amazon/Play store provide a great number of products but unfortunately few of those product reviews are fraud. Hence such products must be marked, so that they will be recognizable for rest of the users. Here we are comparing reviews from two sites so that we can get more clear idea. We can get higher probability of getting real reviews if we take data from multiple sites. We are proposing a system to develop an android application that will take reviews from two different websites for single product, and analyze them with NLP for positive or negative rating. In this, user will give two different URLs of two different sites for same product to the system as input. For every URL reviews and comments will be fetched separately and analyzed with NLP for positive negative rating. Then their rating will be combined together with average to give final rating for the product. As we are handling the big data here, we are using Hadoops map reduce. So it will be easier to decide which product reviews are fraud or not.
1. IJSRD - International Journal for Scientific Research & Development| Vol. 3, Issue 10, 2015 | ISSN (online): 2321-0613
All rights reserved by www.ijsrd.com 408
Detection of Fraud Reviews for Products
Dr. S.K. Pathan1
Janhavi Sankpal2
Pooja Sankpal3
Kanchan Haral4
1
Professor 2,3,4
Student
1,2,3,4
Department of Computer Science
1,2,3,4
Smt. Kashibai Navale College of Engineering, Pune
Abstract— Most of us use e-shopping (Any product) these
days and refer its rating or reviews before we download or
buy that product. Amazon/Play store provide a great number
of products but unfortunately few of those product reviews
are fraud. Hence such products must be marked, so that they
will be recognizable for rest of the users. Here we are
comparing reviews from two sites so that we can get more
clear idea. We can get higher probability of getting real
reviews if we take data from multiple sites. We are
proposing a system to develop an android application that
will take reviews from two different websites for single
product, and analyze them with NLP for positive or negative
rating. In this, user will give two different URLs of two
different sites for same product to the system as input. For
every URL reviews and comments will be fetched
separately and analyzed with NLP for positive negative
rating. Then their rating will be combined together with
average to give final rating for the product. As we are
handling the big data here, we are using Hadoops map
reduce. So it will be easier to decide which product reviews
are fraud or not.
Key words: Primal Text Mining and Pre-processing, NLP,
Mapping Text Data
I. INTRODUCTION
This article is provided for the people who download or buy
products online. Before they proceed with their purchase,
review of that product is a key aspect. There is no means by
which they can be sure that the given reviews are authentic.
A database is provided through which we can map the
comments by which we can conclude which of the
comments are positive and which are negative. User can
check the reviews of the same product from two different
sites. User will provide the URL of the one product from
two different sites. After analysing the reviews and mapping
with the database result will be generated. The generated
result will be in the form of positive or negative and in terms
of percentage and a graph. Using NLP we can process the
reviews and generate results. To map the data with the
database Hadoop’s MapReduce method can be used.
II. EXISTING WORK
Many of the sites just provide ranking, rating and reviews.
They do not assure its authenticity. There is a need to cross
check if the commented reviews are original or they are just
for bumping up the product to gain popularity.
On this domain there are exist some related work,
such as web ranking spam detection [1], online review spam
detection [2] and mobile App recommendation systems [3],
the issue in detecting ranking of the fraud for mobile
applications or any online product is still unfathomed.
Currently, the most popular and widely used
Online e-shopping sites provide facility for buyers to view
review of the product they are interested to buy and for the
people already bought products to share their shopping
experience and rate or review the product. This helps future
buyers to have the base of reference to which they can refer
before buying products online. Even on platforms like Play
store, one gets an opportunity to study reviews before
actually downloading the application. However, as the
positive side enables us to these facilities of studying the
experiences of users on the other hand there also raises
questions about authenticity of these reviews. In this digital
world it has become easier to rate any products online.
If we consider real-world observations, we find that
each review is always associated with a respective untapped
topic. For instance, some reviews might be related to the
untapped topic “worth a try” while some might be related to
the untapped topic “not so good”. At the same time,
different buyers have different personal preferences of
mobile applications or different site preferences for buying
products. A product or an application may have different
topic distributions in their historical review records.
Plausibly, the topic distribution of reviews, be it of a product
or an application, in a normal leading session of application,
should be consistent with the topic distribution in all
historical review records of that application. And so applies
to the product reviews. The possible cause to this are the
reviews of the topics that are based on the user’s personal
usage experiences and choices but not the popularity of
mobile applications. On the other hand, if the reviews of
leading sessions have been manipulated, the two topic
distributions will be prominently different. For example,
there may contain more positive topics, such as “worth a
try”, “popular” and “good” in the leading session.
III. DRAWBACKS OF THE EXISTING SYSTEM
The existing system cannot determine the authenticity of the
reviews which are posted for any product.
Buyers or the users are not able to recognize the
authenticity of the reviews given to a product and hence
they face dilemma whether the product is really good or bad.
The existing system does not detect fraudulent or deceptive
activities in reviews which might be just for gaining
popularity of that product.
For example, the developer or the brand which are
selling their products can themselves rate the products to
manipulate consumer’s perception. Also the rivals or the
opponents can post negative reviews for inflating the
product’s reputation.
Table 1
After using these search words both, positive
keywords classification and negative keywords
classification are used which leads to the generation of result
from the keywords in the database [10].
2. Detection of Fraud Reviews for Products
(IJSRD/Vol. 3/Issue 10/2015/082)
All rights reserved by www.ijsrd.com 409
A. Primal Text Mining and Pre-processing:
Many software’s / algorithms like Rapid-miner, WorldNet,
and data collection and processing trees have been
developed to test common positive and negative words.
Fig. 1: A sample architecture [10]
The URLs from two different sites for the same
product are taken as input in the system. The reviews from
those sites are scanned and mapped into the database to get
the result in terms of positive and negative percentage. A
database is maintained to map the words which are used in
the comments to check the positivity or the negativity of the
comments or reviews. Based on these results, a user can
roughly get an idea of whether the product (application or
any product) is worth buying or downloading.
B. Mapping Text Data:
The final word set will be pruned using the above mentioned
methods, with the results displayed in Table 1,2 which have
separate column of both positive and negative words.
Positive Negative
Agree Bad
Appreciate Cannot
Beneficial Damage
Comfort Dangerous
Ease Depression
Easier Died
Enjoy Difficult
Good Error
Great Hard
Greatest Failure
Help Impossible
Hope Lack
Important No
Thanks Not
Superb Poor
Yes Weak
Table 2: Positive and Negative Classification
of Words from Posts
IV. PROPOSED SYSTEM
Our main aim is to provide users with the authenticity of
product reviews and authenticity of application reviews.
This will enable user to have a clearer view towards the
quality and reliability of the product or application.
We therefore propose a system which will perform
the following tasks: - 1) System that will collect all the
information about particular application/sites such as any
comment or reviews about it. 2) Use NLP for processing
those comments and reviews. 3) Use Map reduce Algorithm
for querying. Hence we can detect whether the reviews
fraud or not.
The system uses the following basic concepts: -
A. Natural Language Processing:
It is a field of computer science, artificial intelligence, and
computational linguistics concern with interactions between
computers and human languages.
B. Tokenization:
It transforms a stream of characters into a stream of
processing units called tokens.
C. Stop Words Filtering:
It consists in eliminating stop words. Stop words or words
are filtered out before or after natural language processing.
For example, “is”,”the”,”at”,”who”,”which”.
D. Stemming:
It is the process of reducing derived words into their root
form. The related words should match to the same stem.
For example, the words “drove”, “drives”, “driven” would
fall under there root word “drive”.
V. CONCLUSION
Thus, if the gathered data is processed and studied precisely
then it will lead to better judgment of the products, the
developers will no longer bring their applications to market
just for the sake of money, people will get idea about its
quality and reliability easily. Reviews will easily be
detected and would not be taken into consideration unless
they are truly genuine.
ACKNOWLEDGEMENTS
We take this chance to convey our sincere gratitude, respect
and deep regards to our Project guide, Dr.S.K.Pathan, for his
encouraging guidance, monitoring and constant motivation
throughout the process of writing this research paper.
The blessing, help and guidance given by him shall
from time to time, carry us a long way on the journey of life
which we are about to embark.
We are also competed to Prof.Pramod Mali, our
Project Coordinator, for the valuable information and
guidance provided by him.
REFERENCES
[1] Online:
https://en.wikipedia.org/wiki/Information_retrieval
[2] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W.
Lauw, “Detecting product review spammers using
rating behaviours,” in Proc. 19th ACM Int. Conf.
Inform. Knowl. Manage., 2010, pp. 939–948
[3] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly,
“Detecting spam web pages through content analysis,”
in Proc. 15th Int. Conf. World Wide Web, 2006, pp.
83–92.
[4] N. Spirin and J. Han, “Survey on web spam detection:
Principles and algorithms,” SIGKDD Explor.
Newslett., vol. 13, no. 2, pp. 50– 64, May 2012.
3. Detection of Fraud Reviews for Products
(IJSRD/Vol. 3/Issue 10/2015/082)
All rights reserved by www.ijsrd.com 410
[5] G. Heinrich, Parameter estimation for text analysis,
“Univ. Leipzig, Leipzig, Germany, Tech.Rep.,
http://faculty.cs.byu.edu/~ringger/CS601R/papers/Hein
rich-GibbsLDA.pdf, 2008.
[6] N. Spirin and J. Han, “Survey on web spam detection:
Principles and algorithms,” SIGKDD Explor.
Newslett., vol. 13, no. 2, pp. 50– 64, May 2012.
[7] B. Zhou, J. Pei, and Z. Tang, “A spamicity approach to
web spam detection,” in Proc. SIAM Int. Conf. Data
Mining, 2008, pp. 277–288.
[8] Z. Wu, J. Wu, J. Cao, and D. Tao, “HySAD: A semi-
supervised hybrid shilling attack detector for
trustworthy product recommendation,” in Proc. 18th
ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2012, pp. 985–993.
[9] S. Xie, G. Wang, S. Lin, and P. S. Yu, “Review spam
detection via temporal pattern discovery,” in Proc. 18th
ACM SIGKDD Int. Conf. Knowl. Discovery Data
Mining, 2012, pp. 823–831.
[10]Altug Akay, Andrei Dragomir and Bj orn Erik
Erlandsson, Senior Member, IEEE. “Network Based
Modeling and Intelligent Data Mining of Social Media
for Improving Care”, IEEE Journal of Biomedical and
Health Informatics, Vol. 19, No. 1, January 2015.
[11]H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian,
“Mining personal context-aware preferences for
mobile users,” in Proc. IEEE 12th Int. Conf. Data
Mining, 2012, pp. 1212–1217