TEXT MINING - TAPPING HIDDEN KERNELS OF WISDOM
A thought paper on ‘business benefits of text mining.’
Executive Summary 
Text mining has evolved and attracted a lot of interest in recent times, and with good reason. Organizations collect huge amounts of text data, and they can use various text mining tools and techniques to analyse that data for meaningful, actionable insights. This paper discusses how automatic document classification, information retrieval, word frequency calculation, sentiment analysis, topic modelling and trend analysis can be used for root cause analysis, devising competitive strategies, enhancing customer experience and so on.
The Approach 
Text data can be in different forms - Facebook posts/comments, tweets, customer feedback, blog posts, sales rep. notes, patient health records, complaint logs, third-party surveys (free text formats), newswires, newsfeeds, documents, etc. Conceptually, text mining is a three-step process (Figure 1).

Comments/feedback can be stored in simple tables in a spreadsheet/database and documents can be stored in folders. Text available on social media and other websites can be collected using scraping tools. Known attributes of the text, like date, location, customer ID, business area, should also be captured and recorded, as that would help in filtering and selective analysis. Incoming text can be cleaned up by removing irrelevant data, if required. This text data can then be mined with the help of commercial programs (SAS, SPSS, SAP Infinite Insight), niche commercial or open source programs (Attensity, OpenText, SmartLogic, openNLP, KNIME) or analytics tools (R, Python) (Figure 2).
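The capture, clean-up and filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Comment` record layout, the `clean` and `select` helpers and the sample rows are hypothetical and not taken from the paper, and "irrelevant data" is assumed here to mean links, markup remnants and stray whitespace.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class Comment:
    # Hypothetical record: the free text plus the known attributes
    # (date, location, customer ID, business area) the paper suggests keeping.
    text: str
    posted_on: date
    location: str
    customer_id: str
    business_area: str


def clean(text: str) -> str:
    """Remove links and HTML tags, then collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"<[^>]+>", " ", text)       # drop markup remnants
    return re.sub(r"\s+", " ", text).strip()   # normalise whitespace


def select(comments: Iterable[Comment],
           business_area: Optional[str] = None,
           since: Optional[date] = None) -> List[Comment]:
    """Keep comments matching the given attributes, with cleaned text."""
    picked = []
    for c in comments:
        if business_area is not None and c.business_area != business_area:
            continue
        if since is not None and c.posted_on < since:
            continue
        picked.append(Comment(clean(c.text), c.posted_on, c.location,
                              c.customer_id, c.business_area))
    return picked


# Made-up sample rows, for illustration only.
comments = [
    Comment("Loved the <b>breakfast</b> http://example.com/r/1   spread",
            date(2015, 3, 2), "Pune", "C001", "hotel"),
    Comment("Baggage arrived late", date(2015, 1, 15), "Delhi", "C002", "airline"),
]
hotel_only = select(comments, business_area="hotel", since=date(2015, 2, 1))
print([c.text for c in hotel_only])  # ['Loved the breakfast spread']
```

Keeping the attributes next to the text is what makes selective analysis cheap later: the same records can be sliced by period, geography or business area before any mining is done.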
Figure 1 (the three-step text mining process)
EXTRACTION: web sensors, open and paid APIs, web scraping, internal data, customer feedback, value chain partner interaction.
LOADING: data into flat files or spreadsheets, data load in databases.
ANALYTICS: advanced text analytics, word frequency, word associations, sentiment analysis, trend analysis.
Figure 2
Inputs: surveys, emails, logs, web scraping tools.
Tools: SAS Text Analytics, SPSS Text Analytics, SAP InfiniteInsight, Smart Logic, Rapid Miner, Open Text, R Text Mining.
Outputs: word cloud, sentiment analysis & trends, topic analysis.
BI Strategy & Framework BI Strategy & Framework
Techniques

Document Classification and Grouping
The immediate benefit organisations get from text mining is better classification of documents (mails, feedback, comments). It helps them classify documents based on functions, departments, business areas, etc., improving efficiency. Queries or news wires can be classified and routed to the concerned people, so they can take appropriate action. Techniques like K Nearest Neighbor (KNN), Decision Trees, Bayesian Classifier, etc. can all be used to classify documents.

Usually, documents are matched against already known categories, but there could be cases where several incoming documents need to be grouped based on their structures, without classifying them. This is where the concept of high homogeneity (within a group) and high heterogeneity (between groups) comes into the picture. In this regard, document classification and grouping are very similar to supervised and unsupervised learning respectively.

Information Retrieval
Businesses need information to base their decisions on, and hence must have such insightful information at their disposal. Information retrieval refers to the technique of fetching a set of documents containing desired information. This can be achieved by passing a ‘clue’ to the program, which can be in the form of a keyword or a combination of words. The program returns documents containing text with the closest match to the ‘clue’. From a text mining perspective, the single or multiple words passed as the ‘clue’ can be considered as a document. Based on ‘similarity’ measures, a set of documents with the best match is returned. The simplest and most obvious similarity measure is the count of common words. More sophisticated measures extend the idea of word count, like count with weightage, cosine similarity, etc.

Word Frequency
One of the most widely used techniques for text mining based on word frequency is the wordcloud, a visually appealing representation of the most frequent words in the text data, which makes the font size proportional to the frequency of the word. Most text mining tools provide multiple options to control the number of words, remove redundant words, reduce the frequency of similar words, etc. Wordclouds offer a simple albeit insightful way to highlight hitherto unknown aspects of the business. They can be made more insightful by building drill-down capabilities that can search for comments with a specified ‘keyword’. Organisations can either opt for specialized software to generate wordclouds, or choose general purpose text mining programs that have such functionalities built in.

Sentiment Analysis
The most exciting and challenging aspect of text mining is analyzing the sentiments in the text data. Imagine a scenario where businesses can understand the sentiments (positive, negative or neutral) latent in comments/posts/feedback without going through copious amounts of text data. Such information would be extremely valuable to organisations operating in customer centric B2C markets, like hospitality, travel, banking, retail, etc. Sentiments can be analyzed in two ways:
- Machine Learning: Here, a training data set is made available to the program with predefined sentiments. The algorithm trains based on the occurrence of words or patterns, and new incoming texts are classified based on the occurrence of such patterns or words.
- Sentiment Lexicons: Here, matches of positive or negative words are found and a sentiment measure is calculated.

However, it is easier said than done. The complexity of languages, sarcasm, colloquialism, and poor spelling and grammar can mar the accuracy of results in this area. In addition, sentiments need to be analyzed holistically and in context. For example, a comment that finds two products or services similar can be construed as positive or negative only by knowing about the two products or services being discussed. The comment could be a compliment or a criticism, depending upon what it is being compared to. Sentiment analysis engines are generally 60-70% accurate (humans are 80% accurate), but results of sentiment analysis are improving with continuous research.

Topic & Content Analysis
Advanced text mining can be used to analyze topics and context. Just like sentiment analysis, topic analysis can be done either through machine learning or based on predefined dictionaries. Advanced techniques, like Support Vector Machines, Latent Dirichlet Allocation, etc., can also be used. In addition, word clustering (Hierarchical or K Means Clustering techniques), network diagrams and word associations can be used to look for topics or context based on natural word groups. Co-occurrence and proximity are two of the most useful ways to group words in similar topics.
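The retrieval idea described under Information Retrieval, where the ‘clue’ is treated as a small document and compared with every stored document, can be sketched with plain word counts and cosine similarity. This is a minimal illustration, not the method of any particular tool: the `tokenize`, `cosine` and `retrieve` helpers and the sample documents are hypothetical, and no weighting (such as TF-IDF) or stop-word removal is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(clue, documents, top_n=2):
    """Return up to top_n (score, document) pairs, best match first."""
    query = Counter(tokenize(clue))
    scored = [(cosine(query, Counter(tokenize(d))), d) for d in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matches
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Made-up feedback snippets, for illustration only.
docs = [
    "The baggage handling at the airport was slow",
    "In-flight entertainment system kept crashing",
    "Lost baggage and no apology from the airline",
]
results = retrieve("lost baggage", docs)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

The third snippet shares both clue words and scores 0.50, the first shares one and scores lower, and the second is dropped because it has no word in common. Replacing the raw counts with weighted counts is the natural next step mentioned in the text.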
Business Use Cases 
Text mining finds application across industries and functions: travel, retail, banking, hospitality, healthcare, etc. Market intelligence teams can segregate news feeds and web articles based on document classification or grouping. Similarly, incoming emails can be separated and auto-forwarded to respective teams. An airline company may find that its in-flight entertainment system or baggage handling process is a sore point for its customers, or a hotel may find that a seemingly innocuous construction near its premises could irritate its otherwise satisfied customers. HR teams can analyze employee feedback and find potential areas of improvement for organizational development. Reviews can be made even more valuable by crawling the top web searches for user queries and providing text mining results (frequent words, sentiments, broad areas of discussions, trends etc.) in real time.

Text mining can be very valuable for both intra as well as inter-organizational benchmarking. One can view word clouds, sentiments, etc. in two different time frames, for two different geographies, departments, functions, etc. Text mining can be used for comparison against industry rivals too. If time stamped data is available, it can be used for a trend analysis of sentiments or social outreach (no. of comments, posts, likes etc.). For example, an F&B organization can see how its flagship product pitches against its rival’s product. A simple wordcloud and drilldown can reveal that it is not considered as healthy a breakfast companion as the other beverage sold by its competitor. Text mining can also be used to find associations between diseases, and interactions between and adverse effects of drugs.

Text mining can also be used in tandem with voice-to-text technology for analyzing transcripts of the cockpit voice recorders of airlines to gain insights. This can help understand the reasons behind anomalous and risky flight manoeuvres or flight incidents. Voice-based feedback in hotels or banks can also serve as important input for text mining.
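The trend analysis of sentiments on time stamped data mentioned above can be sketched by scoring each comment against a sentiment lexicon and averaging the scores per month. This is a rough illustration under loud assumptions: the `POSITIVE` and `NEGATIVE` word lists, the `sentiment_score` and `monthly_trend` helpers and the dated comments are all made up for the example, and a simple word match like this does not handle negation, sarcasm or context.

```python
import re
from collections import defaultdict
from datetime import date

# Tiny hypothetical lexicons; a real lexicon would be far larger.
POSITIVE = {"good", "great", "tasty", "healthy", "love"}
NEGATIVE = {"bad", "slow", "rude", "unhealthy", "sugary"}


def sentiment_score(text):
    """Count of positive lexicon words minus count of negative ones."""
    words = re.findall(r"[a-z]+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives


def monthly_trend(comments):
    """Map (year, month) to the mean sentiment score of that month.

    `comments` is an iterable of (date, text) pairs.
    """
    buckets = defaultdict(list)
    for posted_on, text in comments:
        buckets[(posted_on.year, posted_on.month)].append(sentiment_score(text))
    return {month: sum(scores) / len(scores)
            for month, scores in sorted(buckets.items())}


# Made-up time stamped comments, for illustration only.
comments = [
    (date(2015, 1, 5), "Great taste, love it"),
    (date(2015, 1, 20), "Too sugary for breakfast"),
    (date(2015, 2, 3), "Unhealthy and bad value"),
]
trend = monthly_trend(comments)
print(trend)  # {(2015, 1): 0.5, (2015, 2): -2.0}
```

Running the same function on comments about a rival product, or on two geographies, gives the side-by-side comparison used for benchmarking.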
Author 
Anand Nath Jha, 
Analytics Architect – DWBI & Analytics, 
ITC Infotech 
Mr. Anand has 17 years of diverse 
experience in strategic Marketing, 
Analytics, Project Management and 
Aerospace Engineering. He has worked 
for Honeywell, General Electric, 
Hindustan Aeronautics Ltd. and LM 
Windpower. He graduated in Aerospace 
Engineering from IIT Kanpur, and pursued 
MBA and Advanced Certification in 
Analytics from University of Phoenix and 
IIM Lucknow respectively.
Co-Author 
Viros Sharma, 
Vice President & Global Practice Head- 
DWBI & Analytics, ITC Infotech 
Mr. Viros has more than 20 years of 
experience in DW/BI Consulting and Practice- 
Building space. He has worked for 
multinational IT companies like BearingPoint 
and iGATE in India and USA. He did his AMP 
from IIMB and holds double masters in 
Mathematics and Computer Applications. 
About ITC Infotech 
ITC Infotech, a fully owned subsidiary of USD 7 billion ITC Ltd, provides IT services and solutions to leading global 
customers. The company has carved a niche for itself by addressing customer challenges through innovative IT 
solutions. 
ITC Infotech is focused on servicing the BFSI (Banking, Financial Services & Insurance), CPG&R (Consumer Packaged 
Goods & Retail), Life Sciences, Manufacturing & Engineering Services, THT (Travel, Hospitality and Transportation) and 
Media & Entertainment industries. 
For more information, please visit http://www.itcinfotech.com | Or write to: contact.us@itcinfotech.com

 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

TAPPING HIDDEN KERNELS OF WISDOM

  • 1. TEXT MINING-TAPPING HIDDEN KERNELS OF WISDOM A thought paper on ‘business benefits of text mining.’
  • 2. Executive Summary
       Text Mining has evolved and garnered a lot of interest in recent times, and not without reason. Organizations collect huge amounts of text data. They can use various text mining tools and techniques to analyse that data for meaningful, actionable insights. This paper discusses how automatic document classification, information retrieval, word frequency calculation, sentiment analysis, topic modelling and trend analysis can be utilized for root cause analysis, devising competitive strategies, enhancing customer experience and so on.

       The Approach
       Text data can be in different forms - Facebook posts/comments, tweets, customer feedback, blog posts, sales rep. notes, patient health records, complaint logs, third-party surveys (free text formats), newswires, newsfeeds, documents, etc. Conceptually, text mining is a three-step process (Figure 1).

       Comments/feedback can be stored in simple tables in a spreadsheet/database and documents can be stored in folders. Text available on social media and other websites can be collected using scraping tools. Known attributes of the text, like date, location, customer ID, business area, should also be captured and recorded, as that would help in filtering and selective analysis. Incoming text can be cleaned up by removing irrelevant data, if required. This text data can then be mined with the help of commercial programs (SAS, SPSS, SAP Infinite Insight), niche commercial or open source programs (Attensity, OpenText, SmartLogic, openNLP, KNIME) or analytics tools (R, Python) (Figure 2).

       Figure 1 (the three-step text mining process)
       Extraction: Web Sensors; Open and Paid API; Web Scraping; Internal Data; Customer Feedbacks; Value Chain Partner Interaction
       Loading: Data into flat files or spreadsheets; Data load in Databases
       Analytics: Advanced Text Analytics; Word Frequency; Word Associations; Sentiment Analysis; Trend Analysis

       Figure 2 (sources, tools and outputs)
       Sources: Survey; Emails; Logs; Web Scraping tools
       Tools: SAS Text Analytics; SPSS Text Analytics; SAP InfiniteInsight; Smart Logic; Rapid Miner; Open Text; R Text Mining
       Outputs: Word Cloud; Sentiment Analysis & Trends; Topic Analysis

       BI Strategy & Framework
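The approach described above (record each comment with its known attributes, clean it up, then filter for selective analysis) could be sketched in Python roughly as follows. This is only an illustration: the `Comment` record, its field names, the cleaning rules and the sample comments are assumptions made for this sketch, not something the paper prescribes.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Comment:
    """One piece of feedback plus the known attributes worth recording."""
    text: str
    posted_on: date
    location: str
    business_area: str


def clean(text: str) -> str:
    """Drop URLs, digits and punctuation, collapse whitespace, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^A-Za-z' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def select(comments, business_area):
    """Return the cleaned text of comments for one business area."""
    return [clean(c.text) for c in comments if c.business_area == business_area]


# Hypothetical sample data, for illustration only.
comments = [
    Comment("Loved the lounge!! http://example.com/pic", date(2024, 1, 5), "Delhi", "airport"),
    Comment("Baggage took 50 minutes...", date(2024, 2, 9), "Mumbai", "baggage"),
]

baggage_comments = select(comments, "baggage")
print(baggage_comments)
```

In practice the same filtering could be done on a spreadsheet or database table; the point is only that attributes stored alongside the text make selective analysis straightforward.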
  • 3. Techniques

       Document Classification and Grouping
       The immediate benefit organisations get from text mining is better classification of documents (mails, feedback, comments). It helps them classify documents based on functions, departments, business areas, etc., improving efficiency. Queries or news wires can be classified and routed to the concerned people, so they can take appropriate action. Techniques like K Nearest Neighbor (KNN), Decision Trees, Bayesian Classifier, etc. can all be used to classify documents.

       Usually, documents are matched against already known categories, but there could be cases where several incoming documents need to be grouped, without classifying them based on their structures. This is where the concept of high homogeneity (within a group) and high heterogeneity (between groups) comes into the picture. In this regard, document classification and grouping are very similar to supervised and unsupervised learning respectively.

       Information Retrieval
       Businesses need information to base their decisions on, and hence must have such insightful information at their disposal. Information retrieval refers to the technique of fetching a set of documents containing desired information. This can be achieved by passing a ‘clue’ to the program, which can be in the form of a keyword or combination of words. The program returns documents containing text with the closest match to the ‘clue’. From a text mining perspective, single or multiple words passed as a ‘clue’ can be considered as a document. Based on ‘similarity’ measures, a set of documents with the best match is returned. The simplest and most obvious similarity measure is the count of common words. More sophisticated measures extend the idea of word count, like count with weightage, cosine similarity, etc.

       Word Frequency
       One of the most widely used techniques for text mining based on word frequency is the wordcloud: a visually appealing representation of the most frequent words in the text data, which makes the font size proportional to the frequency of the word. Most text mining tools provide multiple options to control the number of words, remove redundant words, reduce the frequency of similar words, etc. Wordclouds offer a simple albeit insightful way to highlight hitherto unknown aspects of the business. They can be made more insightful by building drill-down capabilities that can search for comments with a specified ‘keyword’. Organisations can either opt for specialized software to generate wordclouds, or choose general purpose text mining programs that have such functionality built in.

       Sentiment Analysis
       The most exciting and challenging aspect of text mining is analyzing the sentiments in the text data. Imagine a scenario where businesses can understand the sentiments (positive, negative or neutral) latent in comments/posts/feedback without going through copious amounts of text data. Such information would be extremely valuable to organisations operating in customer centric B2C markets, like hospitality, travel, banking, retail, etc. Sentiments can be analyzed in two ways:
       - Machine Learning: Here, a training data set is made available to the program with predefined sentiments. The algorithm trains based on occurrence of words or patterns, and new incoming texts are classified based on the occurrence of such patterns or words.
       - Sentiment Lexicons: Here, matches of positive or negative words are found and a sentiment measure is calculated.

       However, it is easier said than done. The complexity of languages, sarcasm, colloquialism, poor spelling and grammar can mar the accuracy of results in this area. In addition, sentiments need to be analyzed holistically and in context. For example, a comment that finds two products or services similar can only be construed as positive or negative by knowing about the two products or services being discussed. The comment could be a compliment or criticism, depending upon what it is being compared to. Sentiment analysis engines are generally 60-70% accurate (humans are 80% accurate), but results of sentiment analysis are improving with continuous research.

       Topic & Content Analysis
       Advanced text mining can be used to analyze topics and context. Just like sentiment analysis, topic analysis can be done either through machine learning or based on predefined dictionaries. Advanced techniques, like Support Vector Machines, Latent Dirichlet Allocation, etc., can also be used. In addition, word clustering (Hierarchical or K Means Clustering techniques), network diagrams and word associations can be used to look for topics or context based on natural word groups. Co-occurrence and proximity are two of the most useful ways to group words in similar topics.
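To make the Information Retrieval idea concrete, here is a minimal Python sketch in which the ‘clue’ is treated as a small document and compared with every stored document using cosine similarity over raw word counts. The function names, the tokenisation rule and the sample documents are assumptions for illustration; a production system would more likely use weighted counts and a dedicated search library.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences in a piece of text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(clue, documents, top_n=2):
    """Return up to top_n (score, document) pairs that share words with the clue."""
    query = bag_of_words(clue)
    scored = [(cosine_similarity(query, bag_of_words(d)), d) for d in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)[:top_n]


# Hypothetical documents, for illustration only.
documents = [
    "The baggage arrived late and the baggage belt was crowded",
    "In-flight entertainment system kept freezing",
    "Friendly crew and a smooth flight",
]

results = retrieve("late baggage", documents)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

Replacing `cosine_similarity` with a plain count of shared words would give the simplest measure mentioned in the text; cosine similarity additionally normalises for document length.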
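The sentiment-lexicon approach can likewise be sketched in a few lines of Python. The word lists below are tiny, hypothetical lexicons and the score formula (positive minus negative matches, divided by total matches) is just one simple choice of "sentiment measure"; the paper does not specify either.

```python
import re

# Tiny hypothetical lexicons, for illustration only; real lexicons are far larger.
POSITIVE = {"good", "great", "friendly", "clean", "love", "loved", "excellent"}
NEGATIVE = {"bad", "late", "rude", "dirty", "slow", "poor", "terrible"}


def sentiment_score(text):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def label(score):
    """Map a numeric score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in [
    "Great room, friendly staff, but slow check-in",
    "Rude staff and dirty room",
    "The room faces the garden",
]:
    print(label(sentiment_score(comment)), "-", comment)
```

A sketch like this also shows why accuracy is limited: it only counts matching words, so sarcasm, misspellings and comparisons between products pass through it unnoticed.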
  • 4. Business Use Cases
       Text mining finds application across industries and functions: travel, retail, banking, hospitality, healthcare, etc. Market intelligence teams can segregate news feeds and web articles based on document classification or grouping. Similarly, incoming emails can be separated and auto-forwarded to respective teams. An airline company may find that its in-flight entertainment system or baggage handling process is a sore point for its customers, or a hotel may find that a seemingly innocuous construction near its premises could irritate its otherwise satisfied customers. HR teams can analyze employee feedback and find potential areas of improvement for organizational development. Reviews can be made even more valuable by crawling the top web searches for user queries and providing text mining results (frequent words, sentiments, broad areas of discussions, trends etc.) in real time.

       Text mining can be very valuable for both intra- and inter-organizational benchmarking. One can view word clouds, sentiments, etc. in two different time frames, for two different geographies, departments, functions, etc. Text mining can be used for comparison against industry rivals too. If time-stamped data is available, it can be used for a trend analysis of sentiments or social outreach (no. of comments, posts, likes etc.). For example, an F&B organization can see how its flagship product fares against its rival’s product. A simple wordcloud and drill-down can reveal that it is not considered as healthy a breakfast companion as the other beverage sold by its competitor. Text mining can also be used to find associations between diseases, and interactions between and adverse effects of drugs.

       Text mining can also be used in tandem with voice-to-text technology for analyzing transcripts of the cockpit voice recorders of airlines to gain insights. This can help understand the reasons behind anomalous and risky flight manoeuvres or flight incidents. Voice-based feedback in hotels or banks can also serve as an important input for text mining.
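The benchmarking use case (comparing frequent words for one's own product and a rival's, then drilling down on a keyword) might look roughly like this in Python. The stop-word list and the sample comments are invented for illustration; the word counts are the data a wordcloud would visualise.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; text mining tools ship much fuller ones.
STOPWORDS = {"the", "a", "an", "is", "and", "was", "it", "of", "to", "but", "too", "for"}


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(comments):
    """Count non-stop-words across all comments (the data behind a wordcloud)."""
    counts = Counter()
    for comment in comments:
        counts.update(w for w in tokenize(comment) if w not in STOPWORDS)
    return counts


def drill_down(comments, keyword):
    """Return the comments that contain the given keyword as a whole word."""
    return [c for c in comments if keyword in tokenize(c)]


# Hypothetical comments about two competing breakfast beverages.
ours = ["Tasty but too sugary for breakfast", "Sugary and sweet, kids love it"]
rival = ["Healthy breakfast drink", "Feels healthy and light"]

ours_freq = word_frequencies(ours)
rival_freq = word_frequencies(rival)

print("Ours: ", ours_freq.most_common(3))
print("Rival:", rival_freq.most_common(3))
print("Mentions of 'healthy' (ours vs rival):", ours_freq["healthy"], rival_freq["healthy"])
print("Drill-down on 'sugary':", drill_down(ours, "sugary"))
```

The same two functions can be applied to any pair of slices of the data, such as two time frames or two geographies, provided those attributes were recorded with the text.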
  • 5. Author
       Anand Nath Jha, Analytics Architect, DWBI & Analytics, ITC Infotech
       Mr. Anand has 17 years of diverse experience in strategic Marketing, Analytics, Project Management and Aerospace Engineering. He has worked for Honeywell, General Electric, Hindustan Aeronautics Ltd. and LM Windpower. He graduated in Aerospace Engineering from IIT Kanpur, and pursued an MBA and an Advanced Certification in Analytics from the University of Phoenix and IIM Lucknow respectively.

       Co-Author
       Viros Sharma, Vice President & Global Practice Head, DWBI & Analytics, ITC Infotech
       Mr. Viros has more than 20 years of experience in the DW/BI Consulting and Practice-Building space. He has worked for multinational IT companies like BearingPoint and iGATE in India and the USA. He completed his AMP at IIMB and holds master's degrees in both Mathematics and Computer Applications.

       About ITC Infotech
       ITC Infotech, a fully owned subsidiary of USD 7 billion ITC Ltd, provides IT services and solutions to leading global customers. The company has carved a niche for itself by addressing customer challenges through innovative IT solutions. ITC Infotech is focused on servicing the BFSI (Banking, Financial Services & Insurance), CPG&R (Consumer Packaged Goods & Retail), Life Sciences, Manufacturing & Engineering Services, THT (Travel, Hospitality and Transportation) and Media & Entertainment industries.

       For more information, please visit http://www.itcinfotech.com | Or write to: contact.us@itcinfotech.com