CA652A




    Semantic Web
    Based Sentiment
    Engine
    A system to determine online sentiment
    on current affairs for the purpose of
    analysis and prediction




                            11210889
                            52595354
                                 CA652A
ABSTRACT
Sentiment analysis involves classifying opinions in text as "positive", "negative" or
"neutral". Its purpose is to help extract valuable information and insight from large
volumes of unstructured data. The proposed system will be able to determine online
sentiment on current affairs for the purpose of analysis and prediction. For the
sentiment analysis, a clustering-based approach, a recent advancement in this area, is
recommended. Various APIs will assist in extracting other data such as location and
time. Evaluation with the Pang et al. movie review data sets is recommended to validate
basic functionality, followed by real-life data from the 2008 US presidential race to
evaluate the full functionality of the system. Multiple industries, from marketing
companies to hotels, are identified as potential users, which adds to the
commercialisation potential of the system.




A report submitted to Dublin City University, School of Computing for module

CA652: Information Access, 2011/2012.

We hereby certify that the work presented and the material contained herein is
my/our own except where explicitly stated references to other material are made


Student Numbers

52595354

11210889




TABLE OF CONTENTS
Abstract
Introduction
Concept Overview
   Constraints and Limitations
Functional Description
   Sentiment Search Functions
   Techniques
       Time parameter Based Search
       Geographical Extraction Based
       Social Sentiment Extraction Based data
       Graphical Data Generation Tools
   Pros & Cons of proposed system
Evaluation Plan
   Stage One Testing - Validation
   Stage Two Testing – Functionality Testing
   Stage Three Testing – Real Life Data
Commercialisation Potential
Conclusion and Further Research Opportunities
References




Table of Figures

Figure 1 - Sentiment Analysis framework
Figure 2 - Cluster Method Accuracy/Efficiency
Figure 3 - Graphical Representation of content
Figure 4 - Basic Validation Testing Results
Figure 5 - Two Topic Validation Testing
Figure 6 - Sample Test Output (Obama)
Figure 7 - Sample Test Data (McCain)




INTRODUCTION
The ‘media’ as we now conceptualise it has changed dramatically. With the internet,
people have an opportunity to ‘weigh in’ on events by providing their opinions and
feedback in real time through blogs, forums, social networks and commenting
systems on news websites. There is a growing interest in measuring sentiment that
can be attributed to the dramatic increase in the volume of digitized information.

“An increasing number of studies in political communication focus on the “sentiment” or
“tone” of news content, political speeches, or advertisements” (Young, L, & Soroka, S 2012)

This report discusses the concept of a Semantic Web based sentiment engine that
can analyse public sentiment on current issues, from politics to reality TV shows.
By tracking popular opinion through social media channels and leveraging research
in the area of sentiment analysis, such a system could make accurate predictions
possible for events ranging from presidential elections to the X-Factor competition.


CONCEPT OVERVIEW
This proposed system is not a standard sentiment engine that returns static data; it
offers increased functionality to assist with data interpretation. End users can
customise their search, filter the returned data by multiple parameters and view
graphical representations of the results to facilitate interpretation.


CONSTRAINTS AND LIMITATIONS
The limitations of this concept are not due to technological constraints but are
simply down to the volatility of public opinion, which is something that cannot be
remedied or corrected by technology.

Another limitation is the scope of the opinion being captured. Users of social
media and participants in online forums are statistically of a younger age group. The
lack of inclusion of the opinion of older age groups could greatly affect the accuracy
of the data, as it would not be entirely representative. This imbalance would
particularly affect politics, with older groups statistically more likely to vote.


FUNCTIONAL DESCRIPTION
SENTIMENT SEARCH FUNCTIONS

   •   Users can enter multiple search terms for the purpose of data comparison.
       Other features would be utilised to improve the analysis returns.
   •   Multiple Search Parameters
          o Time Frame Defined Search - Data retrieved can be limited to a specific
              time frame.
          o Geographical Location Based Search – Search data retrieved can be
              filtered by location of users
          o Narrow Search Scope – Select websites to exclude or restrict search to
              small number of websites.
   •   Graphical representations of the data are generated.
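
The search parameters listed above can be pictured as a simple filter over collected
posts. The following is a minimal sketch only: the Post and SearchQuery classes, their
field names and the sample posts are hypothetical and are not part of the proposed
system's design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class Post:
    """One collected item of online text (hypothetical structure)."""
    text: str
    site: str
    posted: date
    location: Optional[str] = None


@dataclass
class SearchQuery:
    """Search terms plus the optional time, location and site filters."""
    terms: List[str]
    start: Optional[date] = None
    end: Optional[date] = None
    location: Optional[str] = None
    include_sites: Set[str] = field(default_factory=set)  # empty means all sites
    exclude_sites: Set[str] = field(default_factory=set)

    def matches(self, post: Post) -> bool:
        text = post.text.lower()
        # At least one search term must appear in the post.
        if not any(term.lower() in text for term in self.terms):
            return False
        # Time frame defined search.
        if self.start and post.posted < self.start:
            return False
        if self.end and post.posted > self.end:
            return False
        # Geographical location based search.
        if self.location and post.location != self.location:
            return False
        # Narrow search scope: restrict to, or exclude, selected websites.
        if self.include_sites and post.site not in self.include_sites:
            return False
        return post.site not in self.exclude_sites


posts = [
    Post("Obama leads in new poll", "twitter.com", date(2008, 10, 1), "Ohio"),
    Post("McCain rally draws crowd", "example-news.com", date(2008, 10, 2), "Texas"),
    Post("Obama speech tonight", "twitter.com", date(2008, 11, 20), "Ohio"),
]
query = SearchQuery(terms=["obama"], start=date(2008, 9, 1),
                    end=date(2008, 11, 4), location="Ohio")
hits = [p for p in posts if query.matches(p)]
print(len(hits))
```

Only the first sample post satisfies the term, time frame and location filters at once;
running one such query per search term would give the data sets to be compared.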

TECHNIQUES

Sentiment Analysis Techniques

There is much research in the area of sentiment analysis, the primary objective being
to find a technique where there is no trade-off between speed and accuracy. Several
new and emerging techniques have been researched as part of identifying the best fit
for this system.

   •   Proximity-Based Approach (Hasan, S, & Adjeroh, D 2011)
          o This proposed method uses proximity-based features to determine
              sentiment: proximity distribution, mutual information between
              proximity types, and proximity patterns.




   •   Based on Annotation (Shukla, A 2011)
          o This proposed method counts all the annotations present and calculates
             sentiment scores for every annotation, including comments, to determine
             sentiment.


   •   Sentence-level Lexical Based Semantic Orientation (Khan, A et al, 2011)
          o This proposed method uses SentiWordNet to calculate the semantic
             ‘score’ of sentences it has classified as subjective from reviews and blog
             comments.


   •   Machine Learning approach to contextual information (YANG, C et al, 2008)
          o This proposed method differentiates itself from others by taking
             context into account when determining the sentiment category. Its
             primary focus and test data sets have been blog posts. Figure 1 below
             shows the framework employed.




                        FIGURE 1 - SENTIMENT ANALYSIS FRAMEWORK


   •   Clustering-Based Sentiment Analysis Approach (Li, G, & Liu, F 2012)

The method deemed most appropriate for this proposed system is based on an
article published in the Journal of Information Science in April 2012, which outlines
the Clustering-Based Sentiment Analysis approach. It proposes that by applying a
“TF-IDF weighting method, a voting mechanism and importing term scores, an acceptable
and stable clustering result can be obtained” (Li, G, & Liu, F 2012). The evaluation
results were the most impressive of all the techniques reviewed as part of this
research. The approach appears to perform well in terms of both accuracy and
efficiency, with no need for human participation, as can be seen from Figure 2.




                         FIGURE 2 - CLUSTER METHOD ACCURACY/EFFICIENCY


Apart from its accuracy and efficiency, this technique was deemed the most suitable
as it can be applied universally to any data set. The other techniques researched were
developed for particular data types, such as customer reviews or blogs, and their
evaluations suggest they do not perform as well outside of these data types.
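
To make the clustering idea more concrete, the following is a minimal sketch and not
Li and Liu's implementation. It assumes whitespace tokenisation, a plain k-means with
cosine similarity, fixed seed documents, and a tiny hypothetical term-score lexicon
(TERM_SCORES) used to name the resulting clusters. The voting mechanism described in
the article is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical term scores used only to name the clusters once they are formed.
TERM_SCORES = {"great": 1, "love": 1, "brilliant": 1,
               "awful": -1, "boring": -1, "hate": -1}


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using TF-IDF weighting."""
    tokenised = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / doc_freq[t])
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds the weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, seeds, iterations=10):
    """Assign each vector to the most similar centroid, then recompute centroids."""
    centroids = [vectors[i] for i in seeds]
    assignment = []
    for _ in range(iterations):
        assignment = [max(range(len(centroids)),
                          key=lambda k: cosine(v, centroids[k]))
                      for v in vectors]
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = centroid(members)
    return assignment


def label_clusters(docs, assignment):
    """Name each cluster from the summed term scores of its documents."""
    totals = Counter()
    for doc, k in zip(docs, assignment):
        totals[k] += sum(TERM_SCORES.get(t, 0) for t in doc.lower().split())
    return {k: "positive" if s > 0 else "negative" if s < 0 else "neutral"
            for k, s in totals.items()}


docs = [
    "great film love it",
    "brilliant great acting",
    "love this brilliant story",
    "awful boring film",
    "boring plot hate it",
    "hate this awful mess",
]
assignment = kmeans(tfidf_vectors(docs), seeds=(0, 3))
labels = label_clusters(docs, assignment)
print(assignment, labels)
```

No labelled training data is needed: the documents are grouped purely by their TF-IDF
similarity, and the term scores are consulted only afterwards to decide which cluster
is the positive one.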

TIME PARAMETER BASED SEARCH

This sentiment engine would make use of the adaptable Librato API libraries to
allow sentiment returns to be time sensitive. This would allow a user to
evaluate how sentiment is changing over time or what sentiment was during
specific time periods.
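
Independently of the API chosen, tracking sentiment over time amounts to bucketing
classified items by time period. The sketch below is illustrative only: it makes no
Librato calls, and the (timestamp, label) record format and sample values are
assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime


def sentiment_by_day(records):
    """Group (timestamp, label) pairs by calendar day and count each label."""
    buckets = defaultdict(Counter)
    for timestamp, label in records:
        buckets[timestamp.date()][label] += 1
    # Return the days in chronological order.
    return dict(sorted(buckets.items()))


# Made-up sample records, already classified by the sentiment engine.
records = [
    (datetime(2008, 11, 3, 9, 15), "positive"),
    (datetime(2008, 11, 3, 18, 40), "negative"),
    (datetime(2008, 11, 4, 8, 5), "positive"),
    (datetime(2008, 11, 4, 21, 30), "positive"),
]
daily = sentiment_by_day(records)
for day, counts in daily.items():
    print(day, dict(counts))
```

Weekly or monthly views follow the same pattern with a different bucket key.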

GEOGRAPHICAL EXTRACTION BASED

Adding a geographical element would be a unique feature, allowing sentiment results
to be mapped. Location data would preferably be pulled from the Twitter API, as
it gives access to the location on a user's Twitter profile. Comment systems used by
news websites, such as that of the Irish Times, request a location before a comment
is posted. The Facebook API allows access to a user's location if the relevant privacy
setting permits it. OAuth would be used to allow users of the sentiment engine to
explore the opinions of their friends and networked associates and see where they fit
on the sentiment scales. Other free-to-use location APIs may also be needed.




SOCIAL SENTIMENT EXTRACTION BASED DATA

The content used to create a matrix of information within which sentiment is
evaluated via FLP would likely include, but not be limited to: Twitter; Disqus;
Livefyre; IntenseDebate; Drupal comments; WordPress comments; other blog posts;
scraped open Facebook and fan page comments; the Facebook comment system; text
comments; G+ posts; Slideshare.net; Pinterest pins; Google News articles; comments on
various bookmarking sites such as fark.com and reddit; and other language-relevant
wire news services.

GRAPHICAL DATA GENERATION TOOLS

Graphical representations of the data are generated. The results could be rendered as
web-based Flash objects or, so that they are usable on mobile devices and tablets, in
a way that is compliant with the evolving HTML5 standards and with iOS 5, given the
animosity between Apple and Adobe over Flash. These reports would be exportable to
Crystal Reports.

[Bar chart: Positive, Neutral and Negative counts for Candidate A and Candidate B;
vertical axis from 0 to 1600]


                     FIGURE 3 - GRAPHICAL REPRESENTATION OF CONTENT


PROS & CONS OF PROPOSED SYSTEM

The primary argument for why sentiment engines built on the Semantic Web and linked
data are useful rests on the new information and insight that can be gleaned from them.
The ability to know relative and positional sentiment can be useful in many analytical
or informational arbitrage situations.


In terms of the cons, the primary concern would be data quality. Problems with data
quality are a huge issue and can skew any resulting analysis. The extent of the data
quality problem has often been discovered by information activists working in the
open data movement.

Secondly, privacy concerns, and staying within the spirit and letter of the relevant
data privacy laws of the regulatory regime one operates under, may at times be an
issue. This can be tricky given the interconnected nature of the web.

Lastly, inaccuracies in the data, and its organisation in “short sets” rather than
deeper data, may create false sentiments. Is there enough data being looked at to
create a realistic positive or negative sentiment? Some analyses may need additional
parsing to tease out, for example, initial heated emotional responses from the rational
morning-after response.


EVALUATION PLAN

STAGE ONE TESTING - VALIDATION
The evaluation plan would begin with simple software validation. The first test case
would validate the fundamental functionality of the system: its ability to
differentiate between sentiments. The data set to be used is the movie review
data from the Pang et al. experiments.¹ Movie review data is widely regarded as the
most challenging data for sentiment engines to analyse. This can be attributed to the
fact that a positive review may contain descriptions of gory or violent scenes and,
equally, a negative review could contain descriptions of light-hearted, pleasant
scenes. For additional testing, other data sets could be used for each iteration of
this dynamic testing stage.



1   Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using
machine learning techniques. In: Conference on empirical methods in natural
language processing (EMNLP). Philadelphia, Pennsylvania, USA, 2002, p. 79.



[Pie chart: shares of Neutral, Positive and Negative sentiment; slice labels 20%, 39%
and 41%]

                        FIGURE 4 - BASIC VALIDATION TESTING RESULTS
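
The stage one validation can be expressed as a comparison between the engine's output
and the known labels of the reviews. The following is a minimal sketch under stated
assumptions: the evaluate function and the five sample labels are invented for
illustration and are not results from the Pang et al. data.

```python
from collections import Counter


def evaluate(gold, predicted):
    """Return overall accuracy and a (gold, predicted) confusion count."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    confusion = Counter(zip(gold, predicted))
    correct = sum(n for (g, p), n in confusion.items() if g == p)
    return correct / len(gold), confusion


# Made-up labels: the known review labels and what the engine returned for them.
gold = ["positive", "positive", "negative", "negative", "negative"]
predicted = ["positive", "negative", "negative", "negative", "positive"]

accuracy, confusion = evaluate(gold, predicted)
print(f"accuracy: {accuracy:.2f}")
for (g, p), n in sorted(confusion.items()):
    print(f"gold={g:<8} predicted={p:<8} count={n}")
```

The confusion counts show which sentiment the engine tends to mistake for which, and
this is more informative than the single accuracy figure when comparing iterations.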


STAGE TWO TESTING – FUNCTIONALITY TESTING
The second stage of testing would validate the multiple input functionality, to
ensure that data can be retrieved for two or more search terms and that the results
can be accurately differentiated. The test case for this would build on the first
stage of testing, with added content regarding a second movie.


[Two pie charts of Neutral, Positive and Negative sentiment shares. Schindler's List:
slice labels 39%, 20% and 41%. The Usual Suspects: slice labels 20%, 21% and 59%.]




                          FIGURE 5 - TWO TOPIC VALIDATION TESTING


STAGE THREE TESTING – REAL LIFE DATA
The final stage of the evaluation plan would be to perform testing using previous
high profile events as the test cases, such as the US Presidential Election of 2008 and
the X-Factor competition from previous years. This validation is more complex, as it
will span the entire internet and not just the staging website.

The testing would be performed over different time intervals: days, weeks, months
and the entire duration of the event. In the case of political elections, these time
periods could be chosen to coincide with official opinion polls, for example Gallup
and Rasmussen in the US or Red C for Irish events.

The geographical sentiment analysis function would be tested to gauge the accuracy
of the location results. In the case of the US Presidential Election, the final voting
percentages for each candidate per state would give an accurate basis for comparison.

SAMPLE EVALUATION TEST CASE

The sample test case takes the ten states where each candidate won by the largest
percentage majority and graphs the percentage of votes each candidate received
alongside the percentage of positive, negative and neutral data regarding that
candidate. In a fully evaluated system, one would expect a close correlation between
the positive data and the candidate's percentage of votes, and also a correlation
between the negative or neutral data and the other candidate’s percentage of votes, as
per the sample charts below for Obama and McCain respectively.

[Chart "Obama’s Data": Obama's Percentage of Votes, McCain's Percentage of Votes,
Positive %, Negative % and Neutral %; vertical axis from 0 to 90]
                           FIGURE 6 - SAMPLE TEST OUTPUT (OBAMA)

[Chart "McCain’s Data": McCain's Percentage of Votes, Obama's Percentage of Votes,
Positive %, Negative % and Neutral %; vertical axis from 0 to 70]
                           FIGURE 7 - SAMPLE TEST DATA (MCCAIN)
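
The expected relationship in the sample evaluation test case can be checked with a
correlation coefficient. The sketch below computes Pearson's r between per-state vote
share and per-state positive sentiment share. The figures are made-up placeholders and
not election or test data, and the pearson helper is written out for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder values for five states (percentages), for illustration only.
vote_share = [50.0, 55.0, 60.0, 65.0, 70.0]
positive_share = [40.0, 44.0, 50.0, 53.0, 58.0]

r = pearson(vote_share, positive_share)
print(f"r = {r:.3f}")
```

A value of r close to 1 would support the expectation that positive sentiment tracks
vote share. The same calculation can be repeated for the negative and neutral
percentages against the other candidate's vote share.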



COMMERCIALISATION POTENTIAL
In an era where both businesses and individuals are moving further and further
towards data-driven decisions, sentiment engine products have a range of
commercial potential.

Some companies have already begun commercialising Semantic Web applications. For
example, IBM licensed its WebFountain Internet analytical engine to Factiva and
Thomson Reuters in 2003 for those interested in corporate reputational data.

There is also a market among those carrying out market research who cannot afford
Enterprise Resource Planning (ERP) add-ons such as SAP Business Objects, SAS or
LexisNexis Analytics, and for whom the currently available crop of free semantic
sentiment tools is insufficient, too niche or unscalable (Basu, 2010). Semantic
Web products are becoming important in internal and external Business Informatics.

However, information arbitrage is not merely for professional market traders. This
system would likely be offered as software as a service (SaaS) on the web. It could be
sold on a freemium basis, by monthly subscription or by yearly licence, depending on
the implementation.


Primary clients would depend on the sentiments needing to be parsed and the
proprietary and public data sets being used within the sentiment engine.

Examples include: corporate media; the content publishing industry; PR firms;
polling and market research firms; trading platforms; political parties and election
campaigns; government agencies; security services; and bookmakers deciding odds on
novelty bets such as reality TV shows and politics.


CONCLUSION AND FURTHER RESEARCH OPPORTUNITIES
Where does the Semantic Web lead to exactly? We don’t really know, but opening
up the segregated data silos and making sense of deeper, dark ‘big data’ in pursuit
of the benefits of a deeper rooted “hyperdata” would be a nice path. The road will
be long, but it may improve our day-to-day lives immensely.

       "Many applications and services claim to be "semantic" in one manner or another,
       but that does not mean they are "Semantic Web." Semantic applications include any
       applications that can make sense of meaning, particularly in language such as
       unstructured text, or structured data in some cases. By this definition, all search
       engines today are somewhat "semantic" but few would qualify as "Semantic Web"
       apps." (Spivack, 2007)

Getting from the early steps of Web 3.0 to this deeper data web will be a long
process. It will provide countless benefits, many of which we may not even perceive
today. However, sentiment engines are merely one way to get the public and the
developer community interested in and excited about all the other benefits that this
open data future could hold. For that reason, sentiment engines will remain an
important component in the near-term future, as “big data” holds much of the future
promise of bringing about the “web of things” and making sense and use of it.




REFERENCES
Abbasi, A, Hsinchun, C, & Salem, A 2008, 'Sentiment Analysis in Multiple
Languages: Feature Selection for Opinion Classification in Web Forums', ACM
Transactions On Information Systems, 26, 3, pp. 1-34, Computers & Applied Sciences
Complete, viewed 4 May 2012.

Basu, Saikat 2010. 10 Web Tools To Try Out Sentiment Search & Feel the Pulse.
MakeUseOf [Online] 30 April.
http://www.makeuseof.com/tag/10-web-tools-sentiment-search-feel-pulse/
[Accessed 1 May 2012]

Bergman, Mike 2010. I Have Yet to Metadata I Didn’t Like. AI3 [Online] 16 August.
http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/ [Accessed
1 May 2012]

Bollen, J. Mao, Huina. Zeng, Xiao-Jun March 2011. Twitter mood predicts the stock
market. Journal of Computational Science, 2(1), Pages 1-8 Available from:
http://arxiv.org/abs/1010.3003

Cai, K, Spangler, S, Ying, C, & Li, Z 2010, 'Leveraging sentiment analysis for topic
detection', Web Intelligence & Agent Systems, 8, 3, pp. 291-302, Academic Search
Complete, viewed 20 April 2012.

Dalton, Jeff 2007. Caffè Java Open Source NLP and Text Mining tools. Jeff's Search
Engine Caffé [Online] 16 March.
http://www.searchenginecaffe.com/2007/03/java-open-source-text-mining-and.html
[Accessed 1 May 2012]

Hamouda, A, Marei, M, & Rohaim, M 2011, 'Building Machine Learning Based Senti-word
Lexicon for Sentiment Analysis', Journal Of Advances In Information Technology,
2, 4, pp. 199-203, Library, Information Science & Technology Abstracts with Full
Text, viewed 1 May 2012.

Hasan, S, & Adjeroh, D 2011, 'Detecting Human Sentiment from Text using a
Proximity-Based Approach', Journal Of Digital Information Management, 9, 5, pp.
206-212, Library, Information Science & Technology Abstracts with Full Text,
viewed 7 May 2012.

Kang, H, Yoo, S, & Han, D 2012, 'Senti-lexicon and improved Naïve Bayes
algorithms for sentiment analysis of restaurant reviews', Expert Systems With
Applications, 39, 5, pp. 6000-6010, Academic Search Complete, viewed 10 April
2012.

Lévy, Pierre CRC, FRSC 2007. Elements of Semantic Engineering. I3 workshop / WWW
Consortium Conference / Banff 2007. Available from:
http://www.ieml.org/text/semantic_space.pdf

Li, G, & Liu, F 2012, 'Application of a clustering method on sentiment analysis',
Journal Of Information Science, 38, 2, pp. 127-139, Business Source Complete,
viewed 21 April 2012.

Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine
learning techniques. In: Conference on empirical methods in natural language
processing (EMNLP). Philadelphia, Pennsylvania, USA, 2002, p. 79.

Shukla, A 2011, 'SENTIMENT ANALYSIS OF DOCUMENT BASED ON
ANNOTATION', International Journal Of Web & Semantic Technology, 2, 4, pp. 91-103,
Computers & Applied Sciences Complete, viewed 6 May 2012.

Spivack, Nova 2007. The Semantic Web, Collective Intelligence and Hyperdata.
novaspivack.typepad.com [Online] 18 September.
http://novaspivack.typepad.com/nova_spivacks_weblog/2007/09/hyperdata.html
[Accessed 1 May 2012]

Vishwanath, J, & Aishwarya, S 2011, 'User Suggestions Extraction from customer
Reviews: A Sentiment Analysis approach', International Journal On Computer Science
& Engineering, 3, 3, pp. 1203-1206, Academic Search Complete, viewed 1 May 2012.

YANG, C, LIN, K, & CHEN, H 2008, 'Sentiment Analysis in Weblog Using
Contextual Information: A Machine Learning Approach', International Journal Of
Computer Processing Of Languages, 21, 4, pp. 331-345, Academic Search Complete,
viewed 27 April 2012.

Young, L, & Soroka, S 2012, 'Affective News: The Automated Coding of Sentiment in
Political Texts', Political Communication, 29, 2, pp. 205-231, Academic Search
Complete, viewed 10 May 2012.




                                                                     17 | P a g e

Mais conteúdo relacionado

Mais procurados

A Clustering Method for Weak Signals to Support Anticipative Intelligence
A Clustering Method for Weak Signals to Support Anticipative IntelligenceA Clustering Method for Weak Signals to Support Anticipative Intelligence
A Clustering Method for Weak Signals to Support Anticipative IntelligenceCSCJournals
 
Statistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingStatistical Modeling in 3D: Describing, Explaining and Predicting
Statistical Modeling in 3D: Describing, Explaining and PredictingGalit Shmueli
 
IRJET- Prediction of Stock Market using Machine Learning Algorithms
IRJET- Prediction of Stock Market using Machine Learning AlgorithmsIRJET- Prediction of Stock Market using Machine Learning Algorithms
IRJET- Prediction of Stock Market using Machine Learning AlgorithmsIRJET Journal
 
Accenture multi-speed-it-po v
Accenture multi-speed-it-po vAccenture multi-speed-it-po v
Accenture multi-speed-it-po vMarco Ciobo
 
Empirical Model of Supervised Learning Approach for Opinion Mining
Empirical Model of Supervised Learning Approach for Opinion MiningEmpirical Model of Supervised Learning Approach for Opinion Mining
Empirical Model of Supervised Learning Approach for Opinion MiningIRJET Journal
 
Regression and correlation
Regression and correlationRegression and correlation
Regression and correlationVrushaliSolanke
 
Sentiment Analysis of Feedback Data
Sentiment Analysis of Feedback DataSentiment Analysis of Feedback Data
Sentiment Analysis of Feedback Dataijtsrd
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slidesQuantUniversity
 
Sentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online ReviewsSentiment Features based Analysis of Online Reviews
Sentiment Features based Analysis of Online Reviewsiosrjce
 
To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?To Explain, To Predict, or To Describe?
To Explain, To Predict, or To Describe?Galit Shmueli
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predictGalit Shmueli
 
Application of AI in customer relationship management
Application of AI in customer relationship managementApplication of AI in customer relationship management
Application of AI in customer relationship managementShashwat Shankar
 
Approach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsApproach to BSA/AML Rule Thresholds
Approach to BSA/AML Rule ThresholdsMayank Johri
 
IRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET- Credit Card Fraud Detection using Isolation Forest
IRJET- Credit Card Fraud Detection using Isolation ForestIRJET Journal
 
Demand forecasting
Demand forecastingDemand forecasting
Demand forecastingdkamalim92
 

  • 6. INTRODUCTION The ‘media’ as we now conceptualise it has changed dramatically. With the internet, people have an opportunity to ‘weigh in’ on events, by providing their opinions, and feedback and in real time through blogs, forum, social networks and commenting systems on news websites. There is a growing interest in measuring sentiment that can be contributed to the dramatic increase in the volume of digitized information. “An increasing number of studies in political communication focus on the “sentiment” or “tone” of news content, political speeches, or advertisements” (Young, L, & Soroka, S 2012) This report discusses the concept of developing a Semantic Web based sentiment engine that will be able to analyse public sentiment on current issues, from politics to reality TV shows. Based on the analysis, tracking of popular opinion through social media channels and leveraging research in the area of sentiment analysis, accurate predictions could be made possible on events from presidential elections to the X-Factor competition. CONCEPT OVERVIEW This proposed system is not a standard sentiment engine that returns static data; it offers increased functionality to assist with data interpretation. By allowing end users to customise their search, filter the returned data under multiple parameters and have graphical representation of results to facilitate interpretation. CONSTRAINTS AND LIMITATIONS The limitations of this concept are not due to the technological constraints but are simply down to the volatility of public opinion and that is something that cannot be remedied or correcting by technology. Another limitation is the scope of the opinion being captured. User groups of social media and participants in online forums are statistical of a younger age group. The lack of inclusion of the opinion of older age groups could greatly affect the accuracy 5|Page
  • 7. of the data as it would not be entirely representative – the impact of this imbalance would particularly impact politics with older groups statistical more likely to vote. FUNCTIONAL DESCRIPTION SENTIMENT SEARCH FUNCTIONS • Users can enter multiple search terms for the purpose of data comparison. Other features would be utilised to improve the analysis returns. • Multiple Search Parameters o Time Frame Defined Search - Data retrieved can be limited to a specific time frame. o Geographical Location Based Search – Search data retrieved can be filtered by location of users o Narrow Search Scope – Select websites to exclude or restrict search to small number of websites. • Graphical representations of the data are generated. TECHNIQUES Sentiment Analysis Techniques There is much research in the area of sentiment analysis, the primary objective being to find a technique where there is no trade-off between speed and accuracy. Several new and emerging techniques have been researched as part of identifying the best fit for this system. • Proximity-Based Approach (Hasan, S, & Adjeroh, D 2011) o This proposed method uses proximity-based features to determine sentiment; proximity distribution, mutual information between proximity types, and proximity patterns. 6|Page
  • 8. Based on Annotation (Shukla, A 2011) o This proposed method counts all the annotation present, calculates sentiment scores of all annotation including comments to determine sentiments. • Sentence-level Lexical Based Semantic Orientation (Khan, A et al, 2011) o This proposed method uses SentiWordNet to calculate the semantic ‘score’ of sentences it has classified as subjective from reviews and blog comments. • Machine Learning approach to contextual information (YANG, C et al, 2008) o This proposed method differentiates itself from others by taking context into account when determining the sentiment category. Its primary focus and test data sets have been blog posts. Figure 1 below, shows the framework employed. FIGURE 1 - SENTIMENT ANALYSIS FRAMEWORK • Clustering-Based Sentiment Analysis Approach (Li, G, & Liu, F 2012) The method deemed most appropriate for this proposed system was based on a article from the Journal Of Information Science in April this year, which outlined the Clustering-Based Sentiment Analysis approach. It proposed that by applying a “TF- IDF weighting method, a voting mechanism and importing term scores, an acceptable and stable clustering result can be obtained” (Li, G, & Liu, F 2012) The evaluation results 7|Page
  • 9. were the most impressive of all techniques reviewed as part of this research. It appears to have performed well in terms of both accuracy and efficiency with no need for human participation, as can be seen from figure 1. FIGURE 2 - CLUSTER METHOD ACCURACY/EFFICIENCY Apart from its accuracy and efficiency, this technique was deemed the most suitable as it can be applied universally to any data set. Other techniques researched, have been developed for particular data types, customer reviews or blogs and their evaluation appraisals appear to suggest they do not perform as well outside of these data types. TIME PARAMETER BASED SEARCH This sentiment engine would make use of the adaptible Librato API libraries to allow sentiment returns to be time sensative. This would be in order for a user to evaluate how sentiment is changing over time or what sentiment was during specific time periods. GEOGRAPHICAL EXTRACTION BASED Adding a geographical element would be a unique feature allowing for mapping of sentiment results. Preferred location content will be pulled from the Twitter API as it gives access to Twitter profile location. Comment systems used by news websites etc. request a location prior to posting the comment like on the Irish Times website. Facebook API allows access to location of user if the privacy setting is turned on. OAUTH setting would be used to allow the users of the sentiment engine to explore the opinions of their friends and networked associates and how it would fit on the sentiment scales. Other free use location APIs may also be needed. 8|Page
  • 10. SOCIAL SENTIMENT EXTRACTION BASED DATA
    The content used to create a matrix of information within which sentiment is evaluated via FLP would likely include, but not be limited to: Twitter; Disqus; Livefyre; IntenseDebate; Drupal comments; WordPress comments; other blog posts; scraped open Facebook and fan page comments; the Facebook comment system; text comments; G+ posts; Slideshare.net; Pinterest pins; Google News articles; comments on bookmarking sites such as fark.com and reddit; and other language-relevant wire news services.
    GRAPHICAL DATA GENERATION TOOLS
    Graphical representations of the data would be generated. The results could be rendered as web-based Flash objects, or in a way that complies with the evolving HTML5 standards and with iOS 5, so that results remain useful on mobile devices and tablets given the animosity between Apple and Adobe over Flash. These reports would be exportable to Crystal Reports.
    FIGURE 3 - GRAPHICAL REPRESENTATION OF CONTENT [Bar chart: number of Positive, Neutral and Negative items for Candidate A and Candidate B; vertical axis 0 to 1600]
    PROS & CONS OF PROPOSED SYSTEM
    The primary argument for why sentiment engines built on the Semantic Web and linked data are useful rests on the new information and insight that can be gleaned from them. The ability to know relative and positional sentiment can be useful in many analytical or informational arbitrage situations.
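    The counts behind a chart such as Figure 3 could be produced by tallying classified items per topic. The sketch below is one possible way to do this, assuming the engine emits (topic, label) pairs; the record format, label names and function names are hypothetical and not taken from the report.

```python
from collections import Counter, defaultdict

# Assumed label set; the report names positive, neutral and negative.
LABELS = ("positive", "neutral", "negative")


def tally_sentiment(records):
    """Count sentiment labels per topic from (topic, label) pairs."""
    counts = defaultdict(Counter)
    for topic, label in records:
        if label not in LABELS:
            raise ValueError(f"unknown sentiment label: {label!r}")
        counts[topic][label] += 1
    return counts


def as_percentages(counter):
    """Convert one topic's label counts into percentage shares."""
    total = sum(counter.values())
    if total == 0:
        return {label: 0.0 for label in LABELS}
    return {label: 100.0 * counter[label] / total for label in LABELS}


# Invented records for illustration only.
records = [
    ("Candidate A", "positive"),
    ("Candidate A", "positive"),
    ("Candidate A", "negative"),
    ("Candidate B", "neutral"),
]
counts = tally_sentiment(records)
for topic, counter in counts.items():
    print(topic, dict(counter), as_percentages(counter))
```

    Keeping raw counts and percentages separate lets the same tally feed both a bar chart of volumes and a pie chart of shares.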
  • 11. In terms of the cons, the primary concern would be data quality. Problems with data quality are a huge issue and can skew any resulting analysis. The extent of the data quality problem has often been discovered by information activists working in the open data movement. Secondly, privacy concerns, and staying within the spirit and letter of the relevant data privacy laws of the regulatory regime one operates under, may at times be an issue. This can be tricky given the interconnected nature of the web. Lastly, inaccuracies in the data, and its being organised in “short sets” rather than deeper data, may create false sentiments. Is there enough data being looked at to create a realistic positive or negative sentiment? Some analysis may need additional parsing to tease out, for example, initial heated emotional responses from the rational morning-after response.
    EVALUATION PLAN
    STAGE ONE TESTING - VALIDATION
    The evaluation plan would begin with simple software validation. The first test case would validate the fundamental functionality of the system: its ability to differentiate between sentiments. The data set to be used is the movie review data from the Pang et al experiments¹. Movie review data is widely regarded as the most challenging data for sentiment engines to analyse. This can be attributed to the fact that a positive review may contain descriptions of gory or violent scenes, and equally a negative review could contain descriptions of light-hearted, pleasant scenes. For additional testing, other data sets could be used for each iteration of this dynamic testing stage.
    ¹ Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In: Conference on empirical methods in natural language processing (EMNLP). Philadelphia, Pennsylvania, USA, 2002, p. 79.
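    Stage one validation amounts to comparing the engine's predicted labels against the known labels of a labelled data set. A minimal sketch of that comparison follows; the function names and the toy labels are assumptions for illustration, and loading of the actual movie review data is not shown.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty data set")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def confusion(gold, predicted):
    """Count (gold label, predicted label) pairs to show where errors occur."""
    return Counter(zip(gold, predicted))


# Invented labels for illustration only.
gold = ["positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "negative"]

print(accuracy(gold, predicted))   # 0.75
print(confusion(gold, predicted))
```

    The confusion counts show which direction the errors run (here, one positive review labelled negative), which is the kind of mistake the gory-scene example above would be expected to produce.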
  • 12. FIGURE 4 - BASIC VALIDATION TESTING RESULTS [Pie chart of Neutral, Positive and Negative shares; segment values 20%, 39% and 41%]
    STAGE TWO TESTING – FUNCTIONALITY TESTING
    The second stage of testing would validate the multiple-input functionality, ensuring that data can be retrieved for two or more search terms and that the results can be accurately differentiated. The test case would build on the first stage of testing, with added content regarding a second movie.
    FIGURE 5 - TWO TOPIC VALIDATION TESTING [Two pie charts, one for Schindler's List and one for The Usual Suspects, each showing Neutral, Positive and Negative shares]
    STAGE THREE TESTING – REAL LIFE DATA
    The final stage of the evaluation plan would be to perform testing using previous high-profile events as the test cases, such as the US Presidential Election of 2008 and
  • 13. the X-Factor competition from previous years. This validation is more complex as it will span the entire internet, not just the staging website. The testing would be performed over different time intervals: days, weeks, months, and the entire duration of the event. In the case of political elections, these time periods could be chosen to coincide with official opinion polls, for example Gallup and Rasmussen in the US or RedC for Irish-based events. The geographical sentiment analysis function would also be tested to gauge the accuracy of the location results. In the case of the US Presidential Election, the final voting percentages for each candidate per state would give an accurate basis for comparison.
    SAMPLE EVALUATION TEST CASE
    The sample test case takes the ten states where each candidate won by the largest percentage majority and graphs the percentage of votes each candidate received alongside the percentage of positive, negative and neutral data regarding that candidate. In a fully evaluated system, one would expect a close correlation between a candidate's positive data and their percentage of votes, and also a correlation between the negative or neutral data and the other candidate’s percentage of votes, as per the sample charts below for Obama and McCain respectively.
    FIGURE 6 - SAMPLE TEST OUTPUT (OBAMA) [Chart titled “Obama’s Data”: Obama's Percentage of Votes, McCain's Percentage of Votes, Positive %, Negative % and Neutral %; vertical axis 0 to 90]
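    The "close correlation" expected in the sample test case could be quantified with a Pearson correlation coefficient between per-state positive sentiment and per-state vote share. The sketch below is one way to compute it; the per-state figures are invented placeholders, not election or sentiment results, and the choice of Pearson's coefficient is an assumption rather than something the report specifies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# Hypothetical per-state percentages, for illustration only.
positive_share = [62.0, 58.0, 55.0, 49.0]
vote_share = [67.0, 61.0, 57.0, 52.0]

print(round(pearson(positive_share, vote_share), 3))
```

    A value near 1 would support the expectation described above, while a value near 0 would suggest that the location-based sentiment results do not track the actual vote.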
  • 14. FIGURE 7 - SAMPLE TEST DATA (MCCAIN) [Chart titled “McCain’s Data”: McCain's Percentage of Votes, Obama's Percentage of Votes, Positive %, Negative % and Neutral %; vertical axis 0 to 70]
    COMMERCIALISATION POTENTIAL
    In an era where both businesses and individuals are moving further and further towards data-driven decisions, sentiment engine products have a range of commercial potential. Some companies have already begun commercialising Semantic Web applications; IBM, for example, licensed its WebFountain Internet analytical engine to Factiva and Thomson Reuters in 2003 for those interested in corporate reputational data. There is also a market among those who carry out market research but cannot afford Enterprise Resource Planning (ERP) add-ons such as SAP Business Objects, SAS or LexisNexis Analytics, and for whom the currently available crop of free semantic sentiment engine tools (name a few from those ten) is insufficient, too niche or unscalable (Basu, 2010). Semantic Web products are becoming important in internal and external Business Informatics. However, information arbitrage is not merely for professional market traders. This system would likely be offered as software as a service (SaaS) on the web; it could be sold on a freemium basis, by monthly subscription or by yearly licence, depending on the implementation.
  • 15. Primary clients would depend on the sentiments needing to be parsed and the proprietary and public data sets being used within the sentiment engine. Examples include: corporate media; the content publishing industry; PR firms; polling; market research firms; trading platforms; political parties; elections; government agencies; security services; and bookmakers deciding odds on novelty bets (reality TV shows, politics etc.).
    CONCLUSION AND FURTHER RESEARCH OPPORTUNITIES
    Where does the Semantic Web lead to exactly? We don’t really know, but opening up the segregated data silos and making sense of deeper, dark ‘big data’ in pursuit of the benefits of a deeper-rooted “hyperdata” would be a nice path. The road will be long, but it may improve our day-to-day lives immensely.
    "Many applications and services claim to be "semantic" in one manner or another, but that does not mean they are "Semantic Web." Semantic applications include any applications that can make sense of meaning, particularly in language such as unstructured text, or structured data in some cases. By this definition, all search engines today are somewhat "semantic" but few would qualify as "Semantic Web" apps." (Spivack, 2007)
    Getting from the early steps of Web 3.0 to this deeper data web will be a long process. It will provide countless benefits, many of which we may not even perceive today. However, sentiment engines are merely one way to get the public and the developer community interested in, and excited about, all the other benefits that this open data future could hold. For that reason sentiment engines will remain an important component in the near-term future, as “big data” holds much of the future promise of the “web of things” and of making sense and use of it.
  • 16. REFERENCES
    Abbasi, A, Hsinchun, C, & Salem, A 2008, 'Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums', ACM Transactions On Information Systems, 26, 3, pp. 1-34, Computers & Applied Sciences Complete, viewed 4 May 2012.
    Basu, Saikat 2010. 10 Web Tools To Try Out Sentiment Search & Feel the Pulse. Make Use Of [Online] 30 April. http://www.makeuseof.com/tag/10-web-tools-sentiment-search-feel-pulse/ [Accessed 1 May 2012]
    Bergman, Mike 2010. I Have Yet to Metadata I Didn’t Like. AI3 [Online] 16 August. http://www.mkbergman.com/902/i-have-yet-to-metadata-i-didnt-like/ [Accessed 1 May 2012]
    Bollen, J, Mao, Huina, Zeng, Xiao-Jun March 2011. Twitter mood predicts the stock market. Journal of Computational Science, 2(1), pp. 1-8. Available from: http://arxiv.org/abs/1010.3003
    Cai, K, Spangler, S, Ying, C, & Li, Z 2010, 'Leveraging sentiment analysis for topic detection', Web Intelligence & Agent Systems, 8, 3, pp. 291-302, Academic Search Complete, viewed 20 April 2012.
    Dalton, Jeff 2007. Caffè Java Open Source NLP and Text Mining tools. Jeff's Search Engine Caffé [Online] 16 March. http://www.searchenginecaffe.com/2007/03/java-open-source-text-mining-and.html [Accessed 1 May 2012]
    Hamouda, A, Marei, M, & Rohaim, M 2011, 'Building Machine Learning Based Senti-word Lexicon for Sentiment Analysis', Journal Of Advances In Information Technology, 2, 4, pp. 199-203, Library, Information Science & Technology Abstracts with Full Text, viewed 1 May 2012.
    Hasan, S, & Adjeroh, D 2011, 'Detecting Human Sentiment from Text using a Proximity-Based Approach', Journal Of Digital Information Management, 9, 5, pp.
  • 17. 206-212, Library, Information Science & Technology Abstracts with Full Text, viewed 7 May 2012.
    Kang, H, Yoo, S, & Han, D 2012, 'Senti-lexicon and improved Naïve Bayes algorithms for sentiment analysis of restaurant reviews', Expert Systems With Applications, 39, 5, pp. 6000-6010, Academic Search Complete, viewed 10 April 2012.
    Lévy, Pierre CRC, FRSC 2007. Elements of Semantic Engineering. I3 workshop / WWW Consortium Conference / Banff 2007. Available from: http://www.ieml.org/text/semantic_space.pdf
    Li, G, & Liu, F 2012, 'Application of a clustering method on sentiment analysis', Journal Of Information Science, 38, 2, pp. 127-139, Business Source Complete, viewed 21 April 2012.
    Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In: Conference on empirical methods in natural language processing (EMNLP). Philadelphia, Pennsylvania, USA, 2002, p. 79.
    Shukla, A 2011, 'Sentiment Analysis of Document Based on Annotation', International Journal Of Web & Semantic Technology, 2, 4, pp. 91-103, Computers & Applied Sciences Complete, viewed 6 May 2012.
    Spivack, Nova 2007. The Semantic Web, Collective Intelligence and Hyperdata. novaspivack.typepad.com [Online] 18 September. http://novaspivack.typepad.com/nova_spivacks_weblog/2007/09/hyperdata.html [Accessed 1 May 2012]
    Vishwanath, J, & Aishwarya, S 2011, 'User Suggestions Extraction from customer Reviews: A Sentiment Analysis approach', International Journal On Computer Science & Engineering, 3, 3, pp. 1203-1206, Academic Search Complete, viewed 1 May 2012.
    Yang, C, Lin, K, & Chen, H 2008, 'Sentiment Analysis in Weblog Using Contextual Information: A Machine Learning Approach', International Journal Of
  • 18. Computer Processing Of Languages, 21, 4, pp. 331-345, Academic Search Complete, viewed 27 April 2012.
    Young, L, & Soroka, S 2012, 'Affective News: The Automated Coding of Sentiment in Political Texts', Political Communication, 29, 2, pp. 205-231, Academic Search Complete, viewed 10 May 2012.