White Paper




Search vs. Text Classification
Increasing the signal, decreasing the noise




   1 West Street New York NY 10004 | 646-545-3900 | info@networkedinsights.com | networkedinsights.com

Since the advent of the World Wide Web, businesses and consumers have used a
variety of ways to find information. These various methods of discovery have
trained us to think and behave in ways that make understanding analytics
challenging. In fact, what makes retrieving information easy for individuals
is not the manner in which we should examine social data. Confused?

In the infancy of the commercial public Web, navigation was nearly impossible
without directories and then information portals. With the explosion of the
Web in the late 1990s, keyword searching and search engines became as
ubiquitous as the Internet itself. While the underlying methods of search have
evolved over the years, its primary use has stayed constant since the early
days of companies like Yahoo!, AltaVista, Lycos, Excite and Google. Because it
is so popular and so widely understood, search is often the first tool applied
to a wide variety of data challenges.

But is search always the right solution? There are many things you can do
with a hammer, but it’s not so great if you need to turn a screw.

To learn what customers think about your products and services, you may
need to apply sentiment analysis across millions of social media posts.
Or, to guide your media buying, you might use topic discovery to uncover
market trends in the social conversation.

In either case, using search to identify the set of posts you’ll submit to
scrutiny could send your social media analysis down the wrong path from
the start. Your approach to conducting sentiment analysis or topic
discovery could be spot on. But if it’s based on a set of posts that aren’t
actually about what you think they are, which typically happens with search,
the resulting noise can undermine the inferences and conclusions you
ultimately draw.

Text classification is an alternative to search that may be more appropriate
for social media data analysis. Text classification is the task of assigning
predefined categories to free-text documents. It can provide conceptual
views of document collections and has important applications in the real
world. Using text classification as the foundation for analysis – i.e.,
teaching a machine to categorize posts the way humans do – can dramatically
improve your ability to gather the right data and, ultimately, increase the
chances that you’ll uncover what you need to know.

Topic discovery: letting data speak for itself
Topic discovery is a valuable type of semantic analysis based on text
classification. Whereas sentiment analysis simply reveals people’s likes and
dislikes, semantic analysis refers to a group of methods that allow machines
to discover the fundamental patterns of words or phrases that act as building
blocks in a large set of text. Topics, themes, sentiment and similar elements
of meaning appear as intricate weavings of those fundamental patterns. So
semantic analysis is the summarization of large amounts of text by
automatically discovering the topics and themes within.

By grouping social media posts based on semantic similarity, rather than
preset sentiment categories such as positive, negative and neutral, topic
discovery can help companies uncover important information – for example,
what exactly people are saying about a product or service; where and how they
use it; the features they use most; and the enhancements or new offerings
they’re interested in. All of this information can ultimately drive product
development, new revenue streams and strategies for marketing, advertising
and media planning.







The impact of bad data
A look at several related but distinct topics illustrates how seriously the
problems of search can impact analysis.

A Networked Insights analyst designed search queries for five topics that
moms typically discuss – pregnancy and newborns; school-aged children;
food, nutrition and health; shopping and money; and illness and injury.
Searches were run on the five topics, then another analyst reviewed
the results under two test scenarios to determine how well the search
delivered posts fitting the intended criteria as defined by the query.

In the first test, the analyst reviewed only the top 20 results returned
by each search as ordered by the search engine. In the second test, the
analyst reviewed a random sample of 200 results returned by the search.
In each case, the analyst was asked to judge whether each resulting post
was appropriate for the intended category or if it fit better in a different
one. The percentage of appropriate posts is a measure of the “precision” of
the search.

The test results (Table 1) reveal search’s severe limitations. Precision was
high when only the top 20 results were examined (90 percent or higher), but
fell precipitously when a larger number of randomly sampled posts was
examined. In only one search, pregnancy and newborns, did the results yield a
somewhat reliable level of precision (86.5 percent). In three of the five
searches, precision rates were under 50 percent.

In practical terms, these results mean there’s a greater chance that a
randomly selected search result will not meet the intended criteria than that
it will. Said another way, search might be used to support other analyses
by returning a large number of posts assumed to cover the same basic
topic. The problem: the majority of the data isn’t relevant to the topic you
want to understand.

Table 1. Keyword Search Precision

 Desired Topic                Top 20 Results Only   Random Sample of 200
 Pregnancy and newborns       95%                   86.5%
 School-aged children         95%                   19.5%
 Food, nutrition and health   90%                   39.5%
 Shopping and money           100%                  57.5%
 Illness and injury           100%                  41%
 Overall                      96%                   48.8%
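
As a quick check on Table 1, the sketch below computes precision as the share of reviewed posts judged appropriate for the intended topic, then averages the per-topic figures. It assumes every topic contributed the same number of reviewed posts, in which case the Overall row is simply the mean of the five topic rows. The function and variable names, and the 173-of-200 example, are illustrative and not taken from the study.

```python
from statistics import mean

def precision(judgments):
    """Share of reviewed posts judged appropriate for the intended topic."""
    return sum(judgments) / len(judgments)

# Per-topic precision (percent) as reported in Table 1.
top_20 = {
    "Pregnancy and newborns": 95,
    "School-aged children": 95,
    "Food, nutrition and health": 90,
    "Shopping and money": 100,
    "Illness and injury": 100,
}
random_sample = {
    "Pregnancy and newborns": 86.5,
    "School-aged children": 19.5,
    "Food, nutrition and health": 39.5,
    "Shopping and money": 57.5,
    "Illness and injury": 41,
}

# Assumption: equal sample sizes per topic, so the overall figure is a plain mean.
overall_top_20 = mean(list(top_20.values()))
overall_random = mean(list(random_sample.values()))

# Hypothetical review of one topic: 173 of 200 sampled posts judged on-topic.
example = precision([True] * 173 + [False] * 27)

print(f"Overall, top 20 only:  {overall_top_20:.1f}%")
print(f"Overall, random sample: {overall_random:.1f}%")
print(f"Example precision:      {example:.1%}")
```

Under that equal-weight assumption, the two means reproduce the Overall row of Table 1.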







The shortcomings of search
By definition, the intent of search is to uncover the best responses to a
query. A search engine goes out and grabs hundreds of thousands of posts
that match the word or phrase programmed into the query and attempts
to rank them in order of relevance. Its goal is to put the post most likely to
be the one you’re looking for at the top of the list. The search engine does
this effectively, as seen in the first column of results in Table 1.

Significant problems arise with search when you’re after a broad
collection of similar posts, not a handful of the best ones. This is often the
case in social media analysis, when the goal is to analyze millions of posts
to identify trends that can inform marketing decisions or uncover insights
that can reveal business opportunities. Simply stated, more data points are
sometimes much better than a few. In these cases, search will undermine
your efforts. The first 20, or even 200, posts might be great matches. But
the last 20 or 200 might not match at all, as seen in the second results
column of Table 1.

Search methodology has other significant shortcomings, which are
more apparent when it’s applied to social media data than when used
with other, more structured forms of text. For example, search struggles
when you’re looking for something more complicated than whether
or not a document contains a particular word or phrase. Search
cannot take into account the context of how words and phrases are used
in relationship to one another; it can only identify whether or not
that word or phrase is present.

Search also suffers from a bias problem. If the searcher uses words that are
not a direct reflection of the words that millions of other people use for
a given topic, search can’t accommodate the differences.
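
Both problems can be illustrated with a toy keyword matcher. The sketch below assumes a deliberately simple search that returns any post containing at least one query term as a whole word. The posts and the query are invented for illustration and are not drawn from the study.

```python
import re

def keyword_search(posts, terms):
    """Return posts containing any query term as a whole word (case-insensitive)."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    return [post for post in posts if pattern.search(post)]

# Invented posts: three are about pregnancy and newborns, two are not.
posts = [
    "Our baby is due in March and we just painted the nursery",
    "My library book is due tomorrow, better return it",
    "Expecting rain all weekend, so the picnic is off",
    "We're expecting our first little one this spring!",
    "Welcomed our bundle of joy at the hospital last night",
]

# A query a searcher might plausibly write for the topic.
hits = keyword_search(posts, ["due", "expecting", "baby"])

for post in hits:
    print(post)
```

The query matches four posts, but two of them (the library book and the rain forecast) share only a word with the topic, not its meaning. It also misses the birth announcement entirely, because that author used none of the searcher's words.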

To sum up the problems, search does not inherently provide a mechanism
for determining which results should belong to the desired group and
which should not. The norm is to simply say that all posts that match a
query belong to the desired topic and use all of them in further analyses.

A better way: the power of classification
In contrast to search, text classification uses machine-learning algorithms
to learn from a set of examples how to separate posts into topics. If an
algorithm, or program, is presented with examples of how a human would
separate posts based on topic, it can learn to mimic that person’s process
on new, previously unseen posts. One major advantage of this approach is
that the program can scale up to perform its process on millions of
documents. People do not scale up so easily.

Classification offers the potential to produce a dataset in which all of the
posts are relevant to the topics being analyzed. The last 20 are as valuable
to the analysis as the first 20.

© 2011 Networked Insights, Inc. All rights reserved.


The classification process begins with a human analyst selecting a sampling
of posts that relate to a specific topic, such as pregnancy and newborns.
The analyst also selects posts that are irrelevant, so the algorithm being
used can detect the difference. These posts serve as the training examples
from which the machine will learn.

A variety of algorithms can be used for classification, including artificial
neural networks, support vector machines and Naive Bayes algorithms.
Selecting the right algorithm and tuning it are critical, as some do well at
certain problems and not so well at others.
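
To make the idea of learning from labeled examples concrete, here is a minimal sketch of one of the algorithms named above: a multinomial Naive Bayes classifier with add-one smoothing, written with only the Python standard library. The training posts, labels, tokenizer and class name are all invented for illustration. This is not a description of Networked Insights' implementation, and a real system would need far more training data and tuning.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a post and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, posts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter(labels)      # label -> number of posts
        for post, label in zip(posts, labels):
            self.word_counts[label].update(tokenize(post))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, post):
        total_posts = sum(self.label_counts.values())
        scores = {}
        for label, n_posts in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of each known word.
            score = math.log(n_posts / total_posts)
            for word in tokenize(post):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)

# Hypothetical training examples chosen by an analyst: posts that relate to
# pregnancy and newborns, and posts that do not.
train_posts = [
    "brought our little one home from the hospital today",
    "the baby kicked all night, third trimester is exhausting",
    "newborn finally slept four hours, feeling human again",
    "hospital bag is packed and the nursery is ready",
    "waited three hours at the hospital for an x-ray on my ankle",
    "great sale on sneakers at the mall this weekend",
    "my team lost again, what a terrible game",
    "traffic was awful on the way to work today",
]
train_labels = ["relevant"] * 4 + ["irrelevant"] * 4

model = NaiveBayes().fit(train_posts, train_labels)
prediction = model.predict("our little one arrived at the hospital this morning")
print(prediction)
```

Note that "hospital" appears in both a relevant and an irrelevant training post. The classifier does not treat any single word as decisive; it weighs all the words in a post together, which is what lets it separate a birth announcement from a sprained ankle.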
In the next step, the algorithm learns how to categorize new posts by
reading the example posts and identifying general rules that differentiate
the relevant and irrelevant posts. For example, when the program sees the
phrases “little one” and “hospital” together in a post, it might notice that
the probability the post belongs to the pregnancy and newborns category
increases significantly. It then uses this knowledge in categorizing other
posts. The goal is not to memorize the training examples, but to find general
characteristics that help the algorithm categorize new posts.

Table 2 adds a third column to Table 1 that shows the result of using
classification instead of search to identify posts presumably related to the
five mom topics. The analysis approach for classification was the same as that
applied to the search precision test. An independent analyst reviewed 200
randomly sampled results from classification and determined whether or not
they matched the intended topic. The improvement over the search precision
test is dramatic. The overall precision of using classification was 86 percent
vs. 49 percent for the random sample of search results. For one topic – food,
nutrition and health – precision rose from 39.5 percent with search to
100 percent through classification.

Table 2. Precision of Using Classification to Identify Posts in Comparison to Search

 Desired Topic                Top 20 Results Only   Random Sample   Classification
 Pregnancy and newborns       95%                   86.5%           88%
 School-aged children         95%                   19.5%           72%
 Food, nutrition and health   90%                   39.5%           100%
 Shopping and money           100%                  57.5%           87%
 Illness and injury           100%                  41%             83%
 Overall                      96%                   48.8%           86%

Creating a stronger signal
Millions of people use search every day to find what they’re looking for
online. But search can send you off into the social media wilderness if you’re
using traditional monitoring tools to discover conversations and trends. So
stop searching. Instead, start asking how real-time data can support your
existing decision-making processes and then use classification techniques to
cut through the noise and sharpen your social analysis.


Classification clearly provides greater precision in social data analysis.
It offers deeper insights – both on a broad scale and when drilling into
specific topics – than can be gleaned from standard search techniques.

Questions about this report? Want a free consultation on how social data
can improve your media planning and other marketing? Contact us.

646-545-3900 | info@networkedinsights.com | networkedinsights.com

Mais conteúdo relacionado

Mais procurados

Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic SearchPaul Wlodarczyk
 
SemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialSemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialPeter Mika
 
Smoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersSmoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersBill Slawski
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...inventionjournals
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialBarbara Starr
 
Henry stewart dam2010_taxonomicsearch_markohurst
Henry stewart dam2010_taxonomicsearch_markohurstHenry stewart dam2010_taxonomicsearch_markohurst
Henry stewart dam2010_taxonomicsearch_markohurstWIKOLO
 
Wk5 contextualized onlinesearchandresearchskills
Wk5 contextualized onlinesearchandresearchskillsWk5 contextualized onlinesearchandresearchskills
Wk5 contextualized onlinesearchandresearchskillsResty Aldana
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorialThengo Kim
 
Search Analytics For Content Strategists @CSofNYC
Search Analytics For Content Strategists @CSofNYCSearch Analytics For Content Strategists @CSofNYC
Search Analytics For Content Strategists @CSofNYCWIKOLO
 
Lesson Six Researching And The Internet
Lesson Six   Researching And The InternetLesson Six   Researching And The Internet
Lesson Six Researching And The Internetbsimoneaux
 
Sa discover text webinar
Sa discover text webinarSa discover text webinar
Sa discover text webinarQuestionPro
 
Improving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityImproving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityConference Papers
 
Semantic Search on the Rise
Semantic Search on the RiseSemantic Search on the Rise
Semantic Search on the RisePeter Mika
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web SearcheXascale Infolab
 
How to be successful with search in your organisation
How to be successful with search in your organisationHow to be successful with search in your organisation
How to be successful with search in your organisationvoginip
 

Mais procurados (17)

Implementing Semantic Search
Implementing Semantic SearchImplementing Semantic Search
Implementing Semantic Search
 
SemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorialSemTech 2011 Semantic Search tutorial
SemTech 2011 Semantic Search tutorial
 
Smoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papersSmoke Signals and Social Signals: A look at the patents and papers
Smoke Signals and Social Signals: A look at the patents and papers
 
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
Enhanced Web Usage Mining Using Fuzzy Clustering and Collaborative Filtering ...
 
Semtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorialSemtech bizsemanticsearchtutorial
Semtech bizsemanticsearchtutorial
 
Henry stewart dam2010_taxonomicsearch_markohurst
Henry stewart dam2010_taxonomicsearch_markohurstHenry stewart dam2010_taxonomicsearch_markohurst
Henry stewart dam2010_taxonomicsearch_markohurst
 
Wk5 contextualized onlinesearchandresearchskills
Wk5 contextualized onlinesearchandresearchskillsWk5 contextualized onlinesearchandresearchskills
Wk5 contextualized onlinesearchandresearchskills
 
Sem tech2013 tutorial
Sem tech2013 tutorialSem tech2013 tutorial
Sem tech2013 tutorial
 
Search Analytics For Content Strategists @CSofNYC
Search Analytics For Content Strategists @CSofNYCSearch Analytics For Content Strategists @CSofNYC
Search Analytics For Content Strategists @CSofNYC
 
Research4C4U
Research4C4UResearch4C4U
Research4C4U
 
Apresentação UCA
Apresentação UCAApresentação UCA
Apresentação UCA
 
Lesson Six Researching And The Internet
Lesson Six   Researching And The InternetLesson Six   Researching And The Internet
Lesson Six Researching And The Internet
 
Sa discover text webinar
Sa discover text webinarSa discover text webinar
Sa discover text webinar
 
Improving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarityImproving search result via search keywords and data classification similarity
Improving search result via search keywords and data classification similarity
 
Semantic Search on the Rise
Semantic Search on the RiseSemantic Search on the Rise
Semantic Search on the Rise
 
Structured Data in Web Search
Structured Data in Web SearchStructured Data in Web Search
Structured Data in Web Search
 
How to be successful with search in your organisation
How to be successful with search in your organisationHow to be successful with search in your organisation
How to be successful with search in your organisation
 

Semelhante a Search vs Text Classification

The Case for Social Consumer Insights
The Case for Social Consumer InsightsThe Case for Social Consumer Insights
The Case for Social Consumer InsightsBrandwatch
 
Responses to Other Students Respond to at least 2 of your fellow .docx
Responses to Other Students Respond to at least 2 of your fellow .docxResponses to Other Students Respond to at least 2 of your fellow .docx
Responses to Other Students Respond to at least 2 of your fellow .docxronak56
 
Content tagging and recommender systems
Content tagging and recommender systemsContent tagging and recommender systems
Content tagging and recommender systemsmettadata
 
Let’s talk about you
Let’s talk about youLet’s talk about you
Let’s talk about youTNS
 
The Role of Families and the Community Proposal Template (N.docx
The Role of Families and the Community Proposal Template  (N.docxThe Role of Families and the Community Proposal Template  (N.docx
The Role of Families and the Community Proposal Template (N.docxssusera34210
 
This assignment is in three parts and asks you to take a critica.docx
This assignment is in three parts and asks you to take a critica.docxThis assignment is in three parts and asks you to take a critica.docx
This assignment is in three parts and asks you to take a critica.docxadkinspaige22
 
Industry and branding information
Industry and branding informationIndustry and branding information
Industry and branding informationMegan Heuer
 
Data Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsData Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsChristy Hunt
 
Marketers: the future is ready for you now
Marketers: the future is ready for you nowMarketers: the future is ready for you now
Marketers: the future is ready for you nowTNS
 
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...ijaia
 
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...gerogepatton
 
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...gerogepatton
 
How to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inHow to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inMary Chitty
 

Semelhante a Search vs Text Classification (20)

The Case for Social Consumer Insights
The Case for Social Consumer InsightsThe Case for Social Consumer Insights
The Case for Social Consumer Insights
 
Responses to Other Students Respond to at least 2 of your fellow .docx
Responses to Other Students Respond to at least 2 of your fellow .docxResponses to Other Students Respond to at least 2 of your fellow .docx
Responses to Other Students Respond to at least 2 of your fellow .docx
 
Why Quertle?
Why Quertle?Why Quertle?
Why Quertle?
 
Campus Session 2
Campus Session 2Campus Session 2
Campus Session 2
 
Content tagging and recommender systems
Content tagging and recommender systemsContent tagging and recommender systems
Content tagging and recommender systems
 
Task 1 nature
Task 1 natureTask 1 nature
Task 1 nature
 
Let’s talk about you
Let’s talk about youLet’s talk about you
Let’s talk about you
 
The Role of Families and the Community Proposal Template (N.docx
The Role of Families and the Community Proposal Template  (N.docxThe Role of Families and the Community Proposal Template  (N.docx
The Role of Families and the Community Proposal Template (N.docx
 
This assignment is in three parts and asks you to take a critica.docx
This assignment is in three parts and asks you to take a critica.docxThis assignment is in three parts and asks you to take a critica.docx
This assignment is in three parts and asks you to take a critica.docx
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Lecture 5
Lecture 5Lecture 5
Lecture 5
 
Industry and branding information
Industry and branding informationIndustry and branding information
Industry and branding information
 
Data Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About IndividualsData Collection Tool Used For Information About Individuals
Data Collection Tool Used For Information About Individuals
 
Paper 1
Paper 1Paper 1
Paper 1
 
Marketers: the future is ready for you now
Marketers: the future is ready for you nowMarketers: the future is ready for you now
Marketers: the future is ready for you now
 
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
 
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
Novel Machine Learning Algorithms for Centrality and Cliques Detection in You...
 
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
NOVEL MACHINE LEARNING ALGORITHMS FOR CENTRALITY AND CLIQUES DETECTION IN YOU...
 
SAL2016_paper_15
SAL2016_paper_15SAL2016_paper_15
SAL2016_paper_15
 
How to create a taxonomy for management buy-in
How to create a taxonomy for management buy-inHow to create a taxonomy for management buy-in
How to create a taxonomy for management buy-in
 

Mais de Networked Insights

What Can Big Grocery Learn from Farmers' Markets Using Social Data?
What Can Big Grocery Learn from Farmers' Markets Using Social Data?What Can Big Grocery Learn from Farmers' Markets Using Social Data?
What Can Big Grocery Learn from Farmers' Markets Using Social Data?Networked Insights
 
Networked Insights Super Bowl XLVIII Brand and Advertising Analysis
Networked Insights Super Bowl XLVIII Brand and Advertising AnalysisNetworked Insights Super Bowl XLVIII Brand and Advertising Analysis
Networked Insights Super Bowl XLVIII Brand and Advertising AnalysisNetworked Insights
 
Outsmart The Upfronts by Networked Insights
Outsmart The Upfronts by Networked InsightsOutsmart The Upfronts by Networked Insights
Outsmart The Upfronts by Networked InsightsNetworked Insights
 
Big Data - A Revolution in Marketing & Media
Big Data - A Revolution in Marketing & MediaBig Data - A Revolution in Marketing & Media
Big Data - A Revolution in Marketing & MediaNetworked Insights
 
Networked Insights Press Highlights
Networked Insights Press HighlightsNetworked Insights Press Highlights
Networked Insights Press HighlightsNetworked Insights
 
Academy awards analysis networked insights
Search vs Text Classification

  • 1. White Paper: Search vs. Text Classification. Increasing the signal, decreasing the noise. 1 West Street, New York, NY 10004 | 646-545-3900 | info@networkedinsights.com | networkedinsights.com
  • 2. White Paper, Networked Insights. Search vs. Text Classification: Increasing the signal, decreasing the noise

Since the advent of the World Wide Web, businesses and consumers have used a variety of ways to find information. These various methods of discovery have trained us to think and behave in ways that make understanding analytics challenging. In fact, what makes retrieving information easy for individuals is not the manner in which we should examine social data. Confused?

In the infancy of the commercial public Web, navigation was nearly impossible without directories and then information portals. With the explosion of the Web in the late 1990s, keyword searching and using search engines have become as ubiquitous as the Internet itself. While the underlying methods of search have evolved over the years, its primary use has stayed constant since the early days of companies like Yahoo!, AltaVista, Lycos, Excite and Google. Reflecting its mass popularity and understanding, search is often the first tool applied to a wide variety of data challenges.

But is search always the right solution? There are many things you can do with a hammer, but it’s not so great if you need to turn a screw.

To learn what customers think about your products and services, you may need to apply sentiment analysis across millions of social media posts. Or, to guide your media buying, you might use topic discovery to uncover market trends in the social conversation.

In either case, using search to identify the set of posts you’ll submit to scrutiny could send your social media analysis down the wrong path from the start. Your approach to conducting sentiment analysis or topic discovery could be spot on. But if it’s based on a number of posts that aren’t actually about what you think they are, which typically happens with search, the noise created can flaw the inferences and conclusions you ultimately draw.

Text classification is an alternative to search that may be more appropriate for social media data analysis. Text classification is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world. Using text classification as the foundation for analysis (i.e., teaching a machine to categorize posts the way humans do) can dramatically improve your ability to gather the right data and, ultimately, increase the chances that you’ll uncover what you need to know.

Sidebar: Topic discovery, letting data speak for itself

Topic discovery is a valuable type of semantic analysis based on text classification. Whereas sentiment analysis simply reveals people’s likes and dislikes, semantic analysis refers to a group of methods that allow machines to discover the fundamental patterns of words or phrases that act as building blocks in a large set of text. Topics, themes, sentiment and similar elements of meaning appear as intricate weavings of those fundamental patterns. So semantic analysis is the summarization of large amounts of text by automatically discovering the topics and themes within.

By grouping social media posts based on semantic similarity, rather than preset sentiment categories such as positive, negative and neutral, topic discovery can help companies uncover important information: for example, what exactly people are saying about a product or service; where and how they use it; the features they use most; and the enhancements or new offerings they’re interested in. All of this information can ultimately drive product development, new revenue streams and strategies for marketing, advertising and media planning.
  • 3. The impact of bad data

A look at several related but distinct topics illustrates how seriously the problems of search can impact analysis. A Networked Insights analyst designed search queries for five topics that moms typically discuss: pregnancy and newborns; school-aged children; food, nutrition and health; shopping and money; and illness and injury. Searches were run on the five topics, then another analyst reviewed the results under two test scenarios to determine how well the search delivered posts fitting the intended criteria as defined by the query.

In the first test, the analyst reviewed only the top 20 results returned by each search as ordered by the search engine. In the second test, the analyst reviewed a random sample of 200 results returned by the search. In each case, the analyst was asked to judge whether each resulting post was appropriate for the intended category or if it fit better in a different one. The percent of appropriate posts is a measure of the “precision” of the search.

The test results (Table 1) reveal search’s severe limitations. Precision was high when only the top 20 results were examined (90 percent or higher), but fell precipitously when a larger number of randomly sampled posts was examined. In only one search, pregnancy and newborns, did the results yield a somewhat reliable level of precision (86.5 percent). In three of the five searches, precision rates were under 50 percent.

In practical terms, these results mean there’s a greater chance that a randomly selected search result will not meet the intended criteria than that it will. Said another way, search might be used to support other analyses by returning a large number of posts assumed to cover the same basic topic. The problem: the majority of the data isn’t relevant to the topic you want to understand.

Table 1. Keyword Search Precision

Desired topic              Top 20 results only   Random sample
Pregnancy and newborns     95%                   86.5%
School-aged children       95%                   19.5%
Food, nutrition, health    90%                   39.5%
Shopping and money         100%                  57.5%
Illness and injury         100%                  41%
Overall                    96%                   48.8%
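The precision measure described above can be sketched in a few lines of Python. This is only an illustration: the `precision` helper and the list of analyst judgments are hypothetical, the 39-of-200 count is back-calculated from the reported 19.5 percent, and the overall figure is assumed to be the unweighted mean of the five topic rates (which does match the reported 48.8 percent).

```python
from statistics import mean


def precision(judgments):
    """Fraction of reviewed posts the analyst judged appropriate for the topic."""
    if not judgments:
        raise ValueError("no judgments to score")
    return sum(judgments) / len(judgments)


# Hypothetical review: 39 of 200 randomly sampled posts judged on-topic,
# the count implied by the 19.5% reported for school-aged children.
sample = [True] * 39 + [False] * 161
school_aged = precision(sample)

# Random-sample precision per topic, copied from Table 1 (percent).
table1_random = {
    "Pregnancy and newborns": 86.5,
    "School-aged children": 19.5,
    "Food, nutrition, health": 39.5,
    "Shopping and money": 57.5,
    "Illness and injury": 41.0,
}

# Assumption: the "Overall" row is the plain average of the five topics.
overall = mean(table1_random.values())

print(f"School-aged children: {school_aged:.1%}")
print(f"Overall random-sample precision: {overall:.1f}%")
```

Because every topic used the same sample size (200 posts), the unweighted mean and the pooled precision across all 1,000 reviewed posts would coincide.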
  • 4. The shortcomings of search

By definition, the intent of search is to uncover the best responses to a query. A search engine goes out and grabs hundreds of thousands of posts that match the word or phrase programmed into the query and attempts to rank them in order of relevance. Its goal is to put the post most likely to be the one you’re looking for at the top of the list. The search engine does this effectively, as seen in the first column of results in Table 1.

Significant problems arise with search when you’re after a broad collection of similar posts, not a handful of the best ones. This is often the case in social media analysis, when the goal is to analyze millions of posts to identify trends that can inform marketing decisions or uncover insights that can reveal business opportunities. Simply stated, more data points are sometimes much better than a few. In these cases, search will undermine your efforts. The first 20, or even 200, posts might be great matches. But the last 20 or 200 might not match at all, as seen in the second results column of Table 1.

Search methodology has other significant shortcomings, which are more apparent when it’s applied to social media data than when used with other, more structured forms of text. For example, search struggles when you’re looking for something more complicated than whether or not a document contains a particular word or phrase. Search cannot contemplate the context of how words and phrases are used in relationship to one another; it can only identify whether or not that word or phrase is present.

Search also suffers from a bias problem. If the searcher uses words that are not a direct reflection of the words that millions of other people use for a given topic, search can’t accommodate the differences.

To sum up the problems, search does not inherently provide a mechanism for determining which results should belong to the desired group and which should not. The norm is to simply say that all posts that match a query belong to the desired topic and use all of them in further analyses.

A better way: the power of classification

In contrast to search, text classification uses machine-learning algorithms to learn from a set of examples how to separate posts into topics. If an algorithm, or program, is presented with examples of how a human would separate posts based on topic, it can learn to mimic that person’s process on new, previously unseen posts. One major advantage of this approach is that the program can scale up to perform its process on millions of documents. People do not scale up so easily.

Classification offers the potential to produce a dataset in which all of the posts are relevant to the topics being analyzed. The last 20 are as valuable to the analysis as the first 20.

© 2011 Networked Insights, Inc. All rights reserved.
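The context and bias problems described under "The shortcomings of search" can be seen in even the simplest keyword matcher. The sketch below is a toy, not how any particular search engine works: `keyword_search` is a hypothetical helper that does plain substring matching, and the four posts are invented for illustration.

```python
def keyword_search(posts, phrase):
    """Return every post containing the phrase, ignoring case and context."""
    needle = phrase.lower()
    return [post for post in posts if needle in post.lower()]


# Hypothetical posts, invented for illustration only.
posts = [
    "Packing lunch for my kids before school every morning",
    "Old school hip hop playlist for the gym",
    "Back in grad school and loving it",
    "My second grader has a spelling test tomorrow",
]

# A query meant to collect posts about school-aged children.
hits = keyword_search(posts, "school")
for post in hits:
    print(post)
```

The matcher returns the first three posts because each contains the word "school", although only the first is about school-aged children. It misses the fourth post, which is on topic but uses different words than the searcher chose. Treating all matches as belonging to the topic would carry both kinds of error into any later analysis.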
  • 5. The classification process begins with a human analyst selecting a sampling of posts that relate to a specific topic, such as pregnancy and newborns. The analyst also selects posts that are irrelevant, so the algorithm being used can detect the difference. These posts serve as the training examples from which the machine will learn. A variety of algorithms can be used for classification, including artificial neural networks, support vector machines and Naive Bayes algorithms. Selecting the right algorithm and tuning it are critical, as some do well at certain problems and not so well at others.

In the next step, the algorithm learns how to categorize new posts by reading the example posts and identifying general rules that differentiate the relevant and irrelevant posts. For example, when the program sees the phrases “little one” and “hospital” together in a post, it might notice that the probability the post belongs to the pregnancy and newborns category increases significantly. It then uses this knowledge in categorizing other posts. The goal is not to memorize the training examples, but to find general characteristics that help the algorithm categorize new posts.

Table 2 adds a third column to Table 1 that shows the result of using classification instead of search to identify posts presumably related to the five mom topics. The analysis approach for classification was the same as that applied to the search precision test. An independent analyst reviewed 200 randomly sampled results from classification and determined whether or not they matched the intended topic. The improvement over the search precision test is dramatic. The overall precision of using classification was 86 percent vs. 49 percent using search across all posts. For one topic (food, nutrition and health) precision rose from 39.5 percent with search to 100 percent through classification.

Table 2. Precision of Using Classification to Identify Posts in Comparison to Search

Desired topic              Top 20 results only   Random sample   Classification
Pregnancy and newborns     95%                   86.5%           88%
School-aged children       95%                   19.5%           72%
Food, nutrition, health    90%                   39.5%           100%
Shopping and money         100%                  57.5%           87%
Illness and injury         100%                  41%             83%
Overall                    96%                   48.8%           86%

Classification clearly provides greater precision in social data analysis. It offers deeper insights, both on a broad scale and when drilling into specific topics, than can be gleaned from standard search techniques.

Sidebar: Creating a stronger signal

Millions of people use search every day to find what they’re looking for online. But search can send you off into the social media wilderness if you’re using traditional monitoring tools to discover conversations and trends. So stop searching. Instead, start asking how real-time data can support your existing decision-making processes and then use classification techniques to cut through the noise and sharpen your social analysis.

Questions about this report? Want a free consultation on how social data can improve your media planning and other marketing? Contact us: 646-545-3900 | info@networkedinsights.com | networkedinsights.com