One of the big trends we see happening in customer feedback at the moment is the explosion in the amount of free-form text feedback – both solicited and spontaneous comments – coming in through various channels. Companies nowadays have access to a multitude of options for analyzing these large numbers of text comments.
In this whitepaper, we aim to give an overview of all analysis options available. Furthermore, we identify the optimal method of analyzing free-form text feedback in order to yield actionable insight that can be easily shared with key stakeholders throughout the organization.
Etuma whitepaper On Extracting Meaning And Feeling - Enterprise Feedback Analysis Today
ON EXTRACTING MEANING AND FEELING:
ENTERPRISE FEEDBACK ANALYSIS TODAY
What are the technologies and solutions currently
available for analyzing free-form customer feedback?
Contents
Introduction
Requirements of an effective free-form customer feedback analysis system
Manual analysis is slow, expensive and mostly ineffective
Extracting keywords through tag clouds can be visually impressive, but...
You can detect the whole comment sentiment, but...
Statistical analysis: takes time to sift through historic data
Analyzing feedback the natural way: rule-based semantic analysis
Introduction
Since merchants began producing goods and services to sell, their customers have had
opinions about those goods and services. The advent of the telephone and, subsequently,
personal computers and the Internet made it easy for large numbers of customers to share their
opinions. Companies soon realized the value in collecting this feedback and, today, are actively
pursuing feedback from their customers and potential customers. Some companies have even
gone a step further and begun to systematically analyze the feedback from the numerous
channels at their disposal.
Up to now, companies have focused on gathering structured feedback, typically in the form of
rating various aspects of a product or service on a numerical scale. This has enabled them to
affordably and systematically analyze the data, recognizing trends and patterns over periods of
time. But there are some serious shortcomings with this method of gathering feedback: chiefly,
that real wisdom and insight are often found in unstructured, free-form feedback. The use of
unstructured feedback minimizes the leading of customers’ thoughts, allowing them to openly
express what it is they actually want from or think of a product.
The volume of this type of free-form feedback is growing because of
• Social media
• Transaction-based queries:
How was your flight?
How was your user experience in this purchase?
How was your car maintenance?
• Net Promoter Score-type surveys:
How willing are you to recommend our product on a scale of 0-10?
Why? What would you improve?
The analysis of free-form customer feedback is an emerging market with high-growth potential,
and a vast amount of money will be spent on these services in the next three years. In 2011, the
market was valued at over one billion US dollars with an estimated twenty-three to twenty-seven
percent Compound Annual Growth Rate (Gartner 2012). The companies that provide the most
accurate, efficient, and low-cost solutions for analyzing free-form customer feedback will
dominate the market and earn a substantial portion of this revenue.
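As a rough illustration of what the quoted growth range implies, the sketch below compounds a starting value over a few years. The base of exactly 1.0 billion US dollars and the three-year horizon are assumptions chosen for the arithmetic (the source only says "over one billion"), and `project` is a hypothetical helper name, not a figure or method from the cited report.

```python
def project(value, cagr, years):
    """Compound `value` at annual rate `cagr` for `years` years."""
    return value * (1 + cagr) ** years

base = 1.0  # billion USD; assumed lower bound for the 2011 market value

# Low and high ends of the quoted growth range
for rate in (0.23, 0.27):
    projected = project(base, rate, 3)
    print(f"{rate:.0%} CAGR: {projected:.2f} billion USD after 3 years")
```

Under these assumptions the market would roughly double within three years.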
This paper will look at the technologies and solutions currently available for analyzing free-form
customer feedback.
Requirements of an effective free-form customer feedback analysis system
First, let’s look at the components needed to make a customer feedback analysis system useful
in an organization:
• Prioritization of feedback to make it actionable: requires alerts and trend analysis
• Speed and real-time analysis
• Automation
• Flexibility and ability to learn: categories need to be dynamically updated as feedback
topics change
• Consistency: minimizing the impact of human interpretation and subjectivity
• Sentiment detection
• Capacity to analyze multiple languages simultaneously
• Grouping of keywords/terms/phrases which have similar meaning: “not expensive”
detected correctly as “cheap”
• Detection of various discussion topics and sentiments inside the same feedback item
• Affordability: avoiding the need for long initial projects
• Name detection: names vs brand names
• Readiness to start the analysis on Day 1: bypassing the need to first execute a
time-consuming analysis of historical data
The analysis of free-form text is not a simple task, and the requirements for an effective
free-form customer feedback analysis system are diverse.
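The grouping requirement listed above (“not expensive” detected as “cheap”) can be sketched in a few lines. This is only a minimal illustration: the word lists, the `normalize_terms` function and the one-word negation rule are all hypothetical, and a production system would need far richer language resources.

```python
# Hypothetical synonym groups mapping surface words to one canonical concept
CANONICAL = {
    "cheap": "cheap",
    "inexpensive": "cheap",
    "affordable": "cheap",
    "expensive": "expensive",
    "pricey": "expensive",
    "costly": "expensive",
}
OPPOSITE = {"cheap": "expensive", "expensive": "cheap"}
NEGATIONS = {"not", "never"}

def normalize_terms(text):
    """Return canonical price concepts found in `text`, flipping negated ones."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    found = []
    for i, tok in enumerate(tokens):
        concept = CANONICAL.get(tok)
        if concept is None:
            continue
        # A negation word directly before the term flips its meaning
        if i > 0 and tokens[i - 1] in NEGATIONS:
            concept = OPPOSITE[concept]
        found.append(concept)
    return found

print(normalize_terms("The service was not expensive."))
print(normalize_terms("Pricey, but worth it"))
```

With grouping of this kind, “not expensive”, “inexpensive” and “affordable” are all counted under one concept rather than as three unrelated keywords.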
Manual analysis is slow, expensive and mostly ineffective
Traditionally, enterprise feedback analysis has relied on inefficient and expensive manual labor,
which is prone to errors and inconsistencies. This, in fact, is still the predominant method of
feedback analysis today. Large numbers of people are employed to manually read feedback
messages gathered from various sources and to classify them according to some preset
categories, which are typically related to the company’s organizational or product structure.
They then set a sentiment score for the entire response, even in the case of long text responses
spanning many paragraphs.
There are a number of issues with this method of feedback analysis. Aside from the major cost
associated with employing the necessary number of people to systematically keep up with the
continuous flow of increasing feedback volumes in large companies, there are various problems
with the accuracy and efficiency of such organizations. Humans tend not to be very good at
discrete and consistent categorization. In companies employing large staffs due to high feedback
volumes, this problem is compounded by the fact that individuals with different backgrounds
create very different analysis results.
Because the categories are rigidly preset by the company, when a new topic emerges
spontaneously, it can take valuable time to create a new category. Comments coming in the
interim may be lost completely. Feedback in multiple languages further complicates the issue.
And doing this work manually is slow. If a crisis arises with a massive amount of claims or social
media discussions, the crucial time for action may have passed before the company is even
aware of the problem.
Finally, because of the inefficiencies of such a manual process, analysts are often forced to
concentrate on negative comments where corrective action may be needed. Typically, if there
are both positive and negative comments in a single response, the analyst will focus on the
negative ones, categorizing and determining the sentiment accordingly. Positive comments and
other seemingly minor subjects in the feedback may not be taken into account at all.
In summary, manual processing is inconsistent, slow and expensive. Let’s look at some of the
other alternatives available.
Extracting keywords through tag clouds can be visually impressive, but...
Technological developments have made possible great advances in analyzing feedback, and
today there are dozens of companies offering services like statistical reporting and primitive
language analysis technologies.
There are quite a few free or Open Source solutions for keyword extraction. But while keywords
can be useful in marketing, an unstructured list of keywords is impossible to turn into reliable,
consistent and actionable trend analysis. Simply knowing how many times a specific word is used
is not enough. There is no grouping of keywords or keyword-based sentiment. Without this deeper
insight, data cannot be turned into statistically manageable information.
Another problem with keyword extraction is the method used for returning words to their base
form: ‘run’ and ‘running’ should both be returned as ‘run’. The typical method of returning
keywords to their base form is stemming. This means, in English, taking the inflected form of
the word and dropping off the “ed”, “s”, or “ing” as well as mapping irregular verbs. But, in
fact, English is one of the few languages where stemming yields relatively accurate results. In
most other languages, keyword extraction requires morphological analysis. Without this, each
inflected form of a word is presented as a completely different word.
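The stemming approach described above can be sketched as follows. This is a deliberately naive illustration, not a real stemming algorithm: the suffix list, the tiny irregular-verb table and the `naive_stem` function are assumptions made for the example. The Finnish forms of “talo” (house) show what happens in a language where suffix stripping written for English does not apply.

```python
# Hypothetical, tiny irregular-verb table; real systems need far more entries
IRREGULAR = {"ran": "run", "went": "go"}
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip a common English suffix and collapse a doubled final consonant."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Only strip when at least three letters remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # 'running' -> 'runn' -> 'run'
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word

# English inflections collapse to one keyword
print({naive_stem(w) for w in ["run", "runs", "running", "ran"]})

# Finnish case forms of 'talo' stay separate keywords without morphological analysis
print({naive_stem(w) for w in ["talo", "talon", "talossa"]})
```

The second set still contains three entries, which is the effect the text describes: without morphological analysis, each inflected form is counted as a different word.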
You can detect the whole comment sentiment, but...
Some companies offer sentiment analysis of social media feedback. This type of analysis
detects whether a comment is positive or negative about a specific term by searching comments
for specific pre-determined terms (brands, models, etc) and checking for positive and negative
words near that term.
Detecting sentiment in this manner results in the following type of information:
• Last month “OUR BRAND” got mentioned 300,000 times. Out of those, 180,000 were
positive and the rest negative.
• In the previous month “OUR BRAND” got mentioned 320,000 times. Out of those, 195,000
were positive and the rest negative.
What can be done with this type of information? Of course, long-term trends can be analyzed
and historical crises can be pinpointed, but there is no real insight into what people are saying.
This type of analysis provides some information, but it is neither detailed nor accurate enough to
be used in undertaking any specific action by the people responsible for the specific area of the
business for which the feedback was actually intended.
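To make this limitation concrete, here is a minimal sketch of the term-proximity approach described in this section. The word lists, the window size, the brand name “Acme” and the `mention_sentiment` function are all hypothetical; commercial tools are more elaborate, but the basic idea of counting polarity words near a pre-determined term is the same.

```python
import re

# Hypothetical polarity word lists
POSITIVE = {"great", "love", "good", "excellent"}
NEGATIVE = {"bad", "terrible", "slow", "broken"}

def mention_sentiment(comment, term, window=3):
    """Label a comment by counting polarity words within `window` tokens of `term`."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    term = term.lower()
    score = 0
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        nearby = tokens[max(0, i - window): i + window + 1]
        score += sum(w in POSITIVE for w in nearby)
        score -= sum(w in NEGATIVE for w in nearby)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# The battery complaint falls outside the window and is lost
print(mention_sentiment("I love my Acme phone but the battery is terrible", "Acme"))

# Negation is ignored: 'not good' still counts as a positive word
print(mention_sentiment("Acme is not good", "Acme"))
```

Both comments come out as “positive”, even though the first contains a specific complaint and the second is negative. The output is a count of mentions, not an account of what customers are actually saying.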
The problem with most sentiment detection technologies is that they are based on statistical
approaches and, consequently, are highly inaccurate. To reach acceptable levels of accuracy,
they require expensive months-long projects conducted by experienced experts, and, so, are
out of reach for all but the largest enterprises.
Statistical analysis: takes time to sift through historic data
Semantic statistical analysis involves creating an immense corpus of text strings, which are
mapped one-by-one to predetermined topics and sentiments. It is often regarded as an
adequate technology, but the cost of deploying such a system again prohibits all but the largest
companies from even considering it. This type of analysis requires a massive amount of
historical data to build a corpus large enough for the results to be consistent and comparable
over time. Building up such a dataset takes a lot of time and comes at a high cost.
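The corpus-mapping idea can be pictured as a lookup table from text strings to topic and sentiment labels. The sketch below is a toy under stated assumptions: the three corpus entries, the labels and the `classify` function are invented for illustration, and real systems use statistical models over far larger corpora rather than literal substring matching.

```python
# Hypothetical corpus: each known text string is mapped to (topic, sentiment)
CORPUS = {
    "delivery was late": ("delivery", "negative"),
    "fast delivery": ("delivery", "positive"),
    "rude staff": ("customer service", "negative"),
}

def classify(comment):
    """Return the labels of every corpus string that occurs in the comment."""
    text = comment.lower()
    return [label for phrase, label in CORPUS.items() if phrase in text]

print(classify("Fast delivery, but rude staff at the desk"))

# A phrasing that is not in the corpus yields nothing at all
print(classify("My parcel arrived two days late"))
```

The second comment is clearly about late delivery, yet it is missed because that wording was never mapped. Coverage depends entirely on how much historical text has been collected and labeled, which is where the time and cost come from.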
A handful of companies have gone a step further, trying to automatically derive meaning and
topic-specific sentiment. In order to set up a database that enables pattern matching, they run a
language mapping project before the launch of the service. Such projects are usually costly and
time-consuming, since they involve lots of manual work and low-level linguistic technologies.
Moreover, the resulting analysis can only be done in the language originally configured, with
new languages requiring separate projects.
These systems have very limited capacity to learn,
which means that each new set of terms or
concepts–new products, new sales channels, new memes–requires a separate project to
commence analysis. This is especially difficult when it comes to social media where the
language is continuously changing and evolving.
Semantic statistical analysis is always customer-specific, so it lacks a fundamentally important
component: universal system learning. In the statistical approach, a system’s analysis must be
limited to customer-specific or industry-specific data, because this is all the information available
in that system.
Some statistical semantic analysis systems use a low level of corpus sharing and advanced
language tools, which reduce the amount of customer-specific work up front. And these systems
are being continuously improved by taking advantage of lessons from ongoing customer
projects.
Analyzing feedback the natural way: rule-based semantic analysis
A rule-based semantic analysis system, on the other hand, understands language structures
and the relationships between words. It has a massive set of pre-configured rules that enable
automatic and accurate real-time analysis. Developing a rule-based language analysis system
takes a considerable amount of work and time, sometimes stretching even up to twenty years.
Very few companies have the skills or resources to develop such a system.
Once the system has been developed, however, it is ready to be applied to any free-format text,
guaranteeing high levels of accuracy. Long configuration and implementation projects are not
necessary in rule-based semantic analysis, because the system combines decades of projects
in which all the necessary rules have been gathered and a world view has been created:
intelligence has been built into the system.
A rule-based system also makes accurate sentiment detection much more likely. Systems such as
these are especially suited for running customer feedback analyses. Positive or negative
opinions can be expressed in so many different ways that it would be impossible to list all
possible ways in a matching system. And a typical rule-based system can also handle
idiosyncrasies like bad grammar, misspellings, and jargon. So, the likelihood of detecting the
right sentiment “out of the box” is much higher for a rule-based semantic analysis system than
for a statistical system.
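As a toy illustration of what “rules about language structure” can mean, the sketch below applies two hand-written rules: split a comment into clauses, and flip the polarity of a word that directly follows a negation. It is in no way a description of Etuma’s engine. The vocabularies, topic names and the `analyze` function are assumptions made for the example, and a real rule-based system would contain a very large number of such rules.

```python
import re

# Hypothetical vocabularies
POLARITY = {"good": 1, "great": 1, "friendly": 1, "bad": -1, "slow": -1, "rude": -1}
TOPICS = {"staff": "customer service", "delivery": "delivery", "price": "price"}
NEGATIONS = {"not", "never", "no"}

def analyze(comment):
    """Return a sentiment score per topic found in the comment."""
    results = {}
    # Rule 1: treat each clause separately, splitting at 'but'/'however' and punctuation
    for clause in re.split(r"\bbut\b|\bhowever\b|[.;,!?]", comment.lower()):
        tokens = re.findall(r"[a-z']+", clause)
        topic = next((TOPICS[t] for t in tokens if t in TOPICS), None)
        if topic is None:
            continue
        score = 0
        for i, tok in enumerate(tokens):
            if tok in POLARITY:
                value = POLARITY[tok]
                # Rule 2: a negation directly before the word flips its polarity
                if i > 0 and tokens[i - 1] in NEGATIONS:
                    value = -value
                score += value
        results[topic] = results.get(topic, 0) + score
    return results

print(analyze("The staff was friendly but the delivery was not good."))
```

Even with only two rules, a single comment yields separate sentiments for separate topics, something a whole-comment score or a keyword count cannot express.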
In a rule-based system, all customers’ data go
through the same analysis engine. Knowledge is recycled, so everybody learns from everybody
else. Only in this way can all incoming new messages teach the system and all customers get
the benefit of this learning. This is especially important when analyzing input from environments
such as social media, in which the language and discussion topics change frequently over time.
Buying and running a rule-based semantic analysis system is significantly cheaper than a
statistical semantic analysis system. The price level and ease of entry of a rule-based system
open the market to small- and medium-size firms for whom the price of a statistical semantic
analysis system is absolutely prohibitive. Rule-based systems enable companies of all sizes to
get started on their analysis immediately, allowing them to focus on what is important: quick,
real-time actionable insights on customer feedback topics, sentiments and trends.
WOULD YOU LIKE TO LEARN MORE ABOUT ETUMA’S CUSTOMER
FEEDBACK ANALYSIS SERVICE?
To find out how our rule-based semantic analysis service can be of benefit to your
company, visit www.etuma.com or call Matti at +358.40.822.2010.