The document describes the technical details of an API for sentiment analysis of English text, which can be accessed for free on the SemanticAnalyzer Group website and via the Mashape API platform. The sentiment analyzer uses either a high precision or high recall classifier and can identify sentiment in text as positive, negative, or neutral. It also provides customizable sentiment analysis that can orient sentiment towards a specified target entity mentioned in the text.
Starget: Sentiment Analyzer for English
Technical description
SemanticAnalyzer Group, 2013-11-19
www.semanticanalyzer.info
This document describes the technical details of the sentiment analyzer API for the English language. The
component can be served as an API or as a library for your applications. You can gain access to the API by
registering with mashape.com. A free demo is available on the SemanticAnalyzer Group web site:
http://semanticanalyzer.info/blog/starget-english-sentiment-analysis/
The API has two types of analysis:
● Finding accurate sentiment hits, but missing some difficult cases, optimizing for precision
● Finding more, but less accurate, sentiment hits, optimizing for recall
The precision is 75-85% at 20% recall. The high precision classifier may especially suit your project if
human analysts annotate the sentiment of a mass media message stream and you would like to have
an automatic tool as a pre-filter. Whatever the tool misses can be annotated by your human analysts, which
speeds up the data release to your end users and / or reporting to your boss. On the other hand, if you would like
to get a quick sense of the emotions in a large corpus of texts and are comfortable missing some difficult cases
with non-obvious sentiment polarity, the high recall classifier is your best bet.
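The pre-filter workflow described above can be sketched roughly as follows. This is a minimal illustration and not the Starget API: `classify` is a hypothetical stand-in for a call to the high precision classifier, `toy_classifier` is a placeholder used only to make the sketch runnable, and the label strings are assumptions. Messages that receive a positive or negative label are accepted automatically; everything else goes to the human annotation queue.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical label strings; the real API may spell them differently.
POSITIVE, NEGATIVE, NEUTRAL = "POSITIVE", "NEGATIVE", "NEUTRAL"


def prefilter(
    messages: Iterable[str],
    classify: Callable[[str], str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split messages into auto-labelled hits and a queue for human analysts."""
    auto_labelled: List[Tuple[str, str]] = []
    for_humans: List[str] = []
    for text in messages:
        label = classify(text)
        if label in (POSITIVE, NEGATIVE):
            # A sentiment hit from the classifier: keep the automatic label.
            auto_labelled.append((text, label))
        else:
            # No hit: leave the message for manual annotation.
            for_humans.append(text)
    return auto_labelled, for_humans


def toy_classifier(text: str) -> str:
    """Placeholder for the high precision classifier (illustration only)."""
    lowered = text.lower()
    if "love" in lowered:
        return POSITIVE
    if "hate" in lowered:
        return NEGATIVE
    return NEUTRAL


if __name__ == "__main__":
    hits, queue = prefilter(["I love it", "I hate it", "It exists"], toy_classifier)
    print(len(hits), "auto-labelled;", len(queue), "left for analysts")
```

With a high precision classifier the automatic labels can mostly be trusted, so the analysts' effort concentrates on the messages in the second list.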
The sentiment analyzer implements a rule-based approach that is highly customizable and adaptable to your
domain. It analyzes the sentiment flow oriented towards a user-specified target (brand
name, person name, abstract entity or any other object). The full version of the algorithm (not online, a
separate package) can resolve anaphora links (when "he" or "she" refers to an object mentioned
earlier in a text) with high accuracy.
Currently the API assigns a text to one of three classes {NEGATIVE, NEUTRAL, POSITIVE}.
Speed of processing
Server: Intel(R) Core(TM) i7-3820QM CPU @ 2.70GHz, 16 GB
Operating system: Ubuntu 12.10, Java 1.7.0_21 64-bit server
1820 characters/ms (recall optimized)
38 characters/ms (precision optimized)
Tests were conducted in a single thread on 498 tweet messages with 13915 words and 69121 characters. Total
time of execution: 1823 ms for the precision optimized classifier and 38 ms for the recall optimized.
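The throughput figures follow directly from the test totals. A short sketch of the arithmetic, using only the numbers reported above (the function and constant names are illustrative):

```python
def chars_per_ms(total_chars: int, elapsed_ms: float) -> float:
    """Throughput in characters per millisecond."""
    return total_chars / elapsed_ms


TOTAL_CHARS = 69121   # characters in the 498 test tweets
RECALL_MS = 38        # total time, recall optimized classifier
PRECISION_MS = 1823   # total time, precision optimized classifier

recall_rate = chars_per_ms(TOTAL_CHARS, RECALL_MS)
precision_rate = chars_per_ms(TOTAL_CHARS, PRECISION_MS)

print(f"recall optimized:    {recall_rate:.0f} characters/ms")
print(f"precision optimized: {precision_rate:.0f} characters/ms")
```

This gives about 1819 and 38 characters/ms respectively, in line with the rounded figures listed above.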
Precision / recall
Rule-based classifier with parsing
positive : P = 0.96; R = 0.13186813186813187; F = 0.2318840579710145
neutral : P = 0.3048245614035088; R = 1.0; F = 0.4672268907563025
negative : P = 0.9411764705882353; R = 0.0903954802259887; F = 0.1649484536082474
Total time taken: 19673 ms; 39 ms per item
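The F values above agree with the balanced F-measure (F1), the harmonic mean of precision and recall. Assuming that is the definition used, the reported numbers can be re-derived as follows (names are illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, F) exactly as reported in the table above.
REPORTED = {
    "positive": (0.96, 0.13186813186813187, 0.2318840579710145),
    "neutral": (0.3048245614035088, 1.0, 0.4672268907563025),
    "negative": (0.9411764705882353, 0.0903954802259887, 0.1649484536082474),
}

for label, (p, r, f) in REPORTED.items():
    computed = f_measure(p, r)
    # The recomputed value should match the reported one up to rounding.
    print(f"{label:8s} reported F = {f:.4f}, recomputed F = {computed:.4f}")
```

The low F values for the positive and negative classes reflect the trade-off described earlier: precision is high while recall is low.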
Output:
Sentiment: positive
Input:
Context: Sentiment analysis has never been perfect.
Target: Sentiment analysis
Output:
Sentiment: negative
Input:
Context: Sentiment analysis has never been so perfect.
Target: Sentiment analysis
Output:
Sentiment: positive
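The two complete input/output pairs above can double as a small regression check for any target-oriented classifier. The sketch below is an assumption-laden illustration: `classify(context, target)` is a hypothetical callable standing in for however the analyzer is invoked (the actual API request format is not described in this document), and `stub_classifier` exists only to make the example runnable.

```python
from typing import Callable, List, NamedTuple, Sequence


class Example(NamedTuple):
    context: str   # text to analyze
    target: str    # entity the sentiment is oriented towards
    expected: str  # expected sentiment label


# The two complete examples from this document.
EXAMPLES = (
    Example("Sentiment analysis has never been perfect.",
            "Sentiment analysis", "negative"),
    Example("Sentiment analysis has never been so perfect.",
            "Sentiment analysis", "positive"),
)


def failures(
    classify: Callable[[str, str], str],
    examples: Sequence[Example] = EXAMPLES,
) -> List[Example]:
    """Return the examples for which the classifier disagrees with the expected label."""
    return [ex for ex in examples
            if classify(ex.context, ex.target).lower() != ex.expected]


def stub_classifier(context: str, target: str) -> str:
    """Placeholder only; NOT the Starget algorithm."""
    return "positive" if " so " in context else "negative"


if __name__ == "__main__":
    print("failed examples:", failures(stub_classifier))
```

The pair is a useful check because the two contexts differ by a single word yet carry opposite sentiment towards the same target.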