International Journal of Research in Advent Technology, Vol.2, No.6, June 2014 
E-ISSN: 2321-9637 
Big Data: Text Analytics 
Mrs. Balshetwar S.V.1, Prof. (Dr.) Tugnayat R.M.2 
1HOD, Department of Information Technology, Satara College of Engineering and Management, Limb, Satara 
Shivaji University, Kolhapur (Maharashtra) Email: balshetwar.satara@gmail.com 
2Principal, Shri Shankarprasad Agnihotri College of Engineering, Wardha 
Sant Gadge Baba Amravati University, Amravati (Maharashtra) 
Abstract-Every part of today's technological world is flooded with big data. Almost 80% of this big data is 
unstructured, because it comes from a variety of new sources such as device logs, server logs, Twitter feeds, chat 
data, blogs, web pages, emails, and social media content. Much of it is text written by people to express 
themselves to others, which makes it an important source of data that may contain valuable information. 
Text analytics techniques can extract information from these collected sources of data for use in customer 
management, sentiment analysis, and collaborative analysis. This paper discusses some basic techniques 
that identify useful patterns in the text within big data. 
Index Terms - Big data, Text analytics, LDA. 
1. INTRODUCTION 
Text is a piece of information through which 
humans communicate with each other. A broad range of 
applications and devices is available for text 
communication and for sharing data intentionally, so 
text is being collected at an unprecedented scale. 
Decisions that were previously made on guesswork or 
on models of reality are now made on the basis of 
this data. Big data analysis now drives every aspect 
of modern applications, gadgets, industry, and society. 
Text within big data, such as data from 
newspapers, magazines, web pages, emails, blogs, and 
tweets, is particularly important because these 
sources hold information that is valuable to humans. 
Utilizing this large amount of text data requires 
techniques for processing it. 
Those techniques must have the following 
characteristics: 
(1) They must be fast and accurate in processing. 
(2) They must be able to find relationships with other 
information. 
(3) They must remove all ambiguity from the data. 
(4) They must handle heterogeneous data efficiently. 
Steps in big data analysis 
The flow chart in Fig. 1 shows the steps for 
retrieving or extracting important and valuable 
information from big data. 
The analysis steps shown in Fig. 1 for querying and 
mining big data are very different from traditional 
analysis methods, which were worked out on small 
amounts of data. Big data is often noisy, dynamic, 
interrelated, and untrustworthy. Nevertheless, even 
noisy big data can be more valuable than small 
samples, because statistics obtained from frequent 
patterns and correlation analysis usually disclose 
more reliable hidden patterns and knowledge. 
The analyzing step can extract meaningful and 
related information by processing text written in 
natural language, but such text is not easy to 
analyze using simple regression models or decision 
trees. However, the group of techniques known as text 
analytics can help to obtain deep information from 
these sources by translating complex textual 
information into useful signals that support deeper 
analysis. 
Fig. 1. Steps in big data analysis: various sources of data → recording data → cleaning data → integrating → analyzing → interpreting. 
2. LITERATURE REVIEW 
Based on a survey of over 4,000 information 
technology (IT) professionals from 93 countries and 
25 industries, the IBM Tech Trends Report (2011) 
identified business analytics as one of the four major 
technology trends in the 2010s. In a survey of the 
state of business analytics by Bloomberg
Businessweek (2011), 97 percent of companies with 
revenues exceeding $100 million were found to use 
some form of business analytics. A report by the 
McKinsey Global Institute (Manyika et al. 2011) 
predicted that by 2018, the United States alone will 
face a shortage of 140,000 to 190,000 people with 
deep analytical skills, as well as a shortfall of 1.5 
million data-savvy managers with the know-how to 
analyze big data to make effective decisions [1]. 
Emerging analytics research opportunities can be 
classified into five critical technical areas: (big) 
data analytics, text analytics, web analytics, network 
analytics, and mobile analytics. All of these can 
contribute to business intelligence and analytics 
(BI&A) [1]. 
In its “Hype Cycle for Big Data” report (July 2013), 
Gartner positions text analytics as delivering great 
business benefits, with adoption projected in the 
next two to five years [2]. 
Text analytics has its academic roots in 
information retrieval and computational linguistics. In 
information retrieval, document representation and 
query processing are the foundations for developing 
the vector-space model, the Boolean retrieval model, 
and the probabilistic retrieval model, which in turn 
became the basis for modern digital libraries, search 
engines, and enterprise search systems [3]. In 
computational linguistics, statistical natural language 
processing (NLP) techniques for lexical acquisition, 
word sense disambiguation, part-of-speech tagging 
(POST), and probabilistic context-free grammars (CFG) 
have also become important for representing text [4]. 
[Figure: Academic roots of text analytics. Information retrieval (document representation, query processing), with applications in digital libraries and search engines; computational linguistics (NLP, CFG, POST), with application in representing text.] 
3. ALGORITHMS FOR TEXT ANALYTICS 
Text analytics refers to the process of deriving 
high-quality information from text. Information is 
derived by finding or learning patterns and trends in 
the text using statistical methods. 
Many algorithms exist for extracting a specific 
type of information from text data; which one to 
apply depends on the type of data analysis project at 
hand. Some projects have clear objectives, while 
others simply inspect the data in depth to draw 
valuable data from a mass of information where the 
outcome is known, which can then be used for further 
analysis. 
Depending on the project at hand, statistical 
methods based on a frequency matrix (which counts the 
words appearing in various text sources) or a 
term-document matrix (which lists all the unique 
terms in the text being examined) often yield a 
useful new feature once a suitable statistical 
technique is applied. However, this technique gives 
intermediate results, which can be used as a 
foundation for further analysis. In this type of 
method, the documents are examined and a scorecard is 
prepared showing a score for every term according to 
the number of times it appears in the document. A 
threshold is then applied, and only the terms that 
score above it are collected and used to construct 
a larger concept. 
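The scorecard-and-threshold procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sample documents, the small stop-word list, the function names, and the threshold value are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; real projects use larger ones.
STOP_WORDS = {"the", "and", "was", "but"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(documents):
    """Return (terms, matrix), where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[t] for t in terms] for c in counts]
    return terms, matrix

def terms_above_threshold(terms, matrix, threshold):
    """Build the scorecard (total count per term) and keep scores above threshold."""
    scorecard = {t: sum(row[j] for row in matrix) for j, t in enumerate(terms)}
    return {t: n for t, n in scorecard.items() if n > threshold}

# Hypothetical sample documents, invented for this sketch.
documents = [
    "The delivery was late and the support was slow.",
    "Late delivery again, but support was helpful.",
    "Helpful support and fast delivery.",
]
terms, matrix = term_document_matrix(documents)
kept = terms_above_threshold(terms, matrix, threshold=2)
print(kept)  # {'delivery': 3, 'support': 3}
```

The surviving terms are the intermediate result; a later step could group them into a larger concept such as "service quality".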
Another text analytics technique is Named Entity 
Extraction (NEE), a method that identifies the atomic 
elements in text and classifies them into predefined 
entities such as person, place, product, and date. 
Using NEE, a probability can be assigned that a 
particular document refers to a named entity. 
NEE is based on natural language processing 
(NLP). After analyzing the structure of the text, NEE 
generates a score for every entity identified in that 
text. By considering the score for every entity and 
applying a threshold, the retained entities can be 
used to create structured features, which can in turn 
be used in prediction models. 
NEE has been successfully developed for news 
analysis and biomedical applications [1]. 
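The score-then-threshold use of NEE can be illustrated with a small Python sketch. It is a rule-based stand-in, not a real NLP-based extractor: the gazetteer, the ISO date pattern, the fixed confidence scores, and the function names are all assumptions introduced for this example.

```python
import re
from collections import Counter

# Assumption: a hypothetical gazetteer mapping known names to entity types.
GAZETTEER = {"Raj": "PERSON", "Jyoti": "PERSON", "Satara": "PLACE", "Wardha": "PLACE"}
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
CAPITALIZED = re.compile(r"\b[A-Z][a-z]+\b")

def extract_entities(text):
    """Return (surface form, entity type, score) triples found in the text.

    The scores are fixed, made-up confidences: high for pattern or
    gazetteer matches, low for unknown capitalized words.
    """
    entities = [(m.group(), "DATE", 0.95) for m in ISO_DATE.finditer(text)]
    for m in CAPITALIZED.finditer(text):
        word = m.group()
        if word in GAZETTEER:
            entities.append((word, GAZETTEER[word], 0.90))
        else:
            entities.append((word, "UNKNOWN", 0.40))
    return entities

def entity_features(text, threshold=0.5):
    """Keep entities scoring at or above the threshold; count them per type."""
    kept = [e for e in extract_entities(text) if e[2] >= threshold]
    return Counter(entity_type for _, entity_type, _ in kept)

text = "Raj visited Satara on 2014-06-02 and met Jyoti. The order arrived late."
features = entity_features(text)
print(dict(features))
```

The per-type counts form a structured feature row that could be appended to other variables before building a prediction model.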
Another text analytics technique, widely used in 
emerging areas such as topic modeling, is LDA 
(Latent Dirichlet Allocation). LDA is mainly used for 
finding the main topics or themes that run through a 
large unstructured collection of documents, and it is 
also useful for detecting changes in customer 
behavior. 
LDA is an unsupervised method that is applied to 
unstructured data. Unlike NEE, it is not NLP based; 
rather, it looks for patterns in text. 
LDA can be applied to any type of structured, 
unstructured, or semi-structured data from any number 
of sources to identify patterns in text. 
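To make the idea of unsupervised topic discovery concrete, the following Python sketch fits a very small LDA model. It assumes collapsed Gibbs sampling as the inference method, which is one common choice and is not prescribed by this paper; the hyperparameters alpha and beta, the iteration count, the function names, and the toy documents are likewise assumptions.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; docs is a list of token lists.

    Returns (vocab, doc_topic, topic_word) where doc_topic[d][k] counts the
    tokens of document d assigned to topic k, and topic_word[k][v] counts the
    assignments of vocabulary word v to topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            assign[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequently assigned words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Hypothetical toy corpus: two pricing documents and two delivery documents.
docs = [
    "price discount offer price sale".split(),
    "sale discount price offer".split(),
    "delivery courier late delivery parcel".split(),
    "parcel courier delivery late".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(vocab, topic_word)):
    print(k, words)
```

No labels are supplied anywhere; the topics emerge only from which words co-occur in the same documents. Because the sampler is random, the topics found on such a tiny corpus may vary with the seed.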
Text analytics is widely used in emerging areas 
such as information extraction, topic models, and 
opinion/sentiment analysis. 
When working with text in big data, where exactly 
are text analytics techniques applied? Text data are 
typically held as notes, documents, and various forms 
of electronic correspondence (emails, for example). 
Structured data on the other hand are usually 
contained in databases with fixed structures. Many 
data mining techniques have been developed to 
extract useful patterns from structured data and this 
process is often enhanced by the addition of variables 
(called features) which add new ‘dimensions’, 
providing information that is not implicitly contained
in existing features. The appropriate processing of text 
data can allow such new features to be added, 
improving the effectiveness of predictive models or 
providing new insights [5]. 
Consider an example involving customer 
information. Structured data about a customer is 
stored in a database using fields such as name, 
salary, and order value, and there are also 
unstructured texts containing customer information. 
Neither the structured data nor the unstructured text 
includes a customer ID. By applying a text analytics 
technique such as NEE or LDA, one can infer the 
nature, strength, or absence of relationships among 
individuals and derive a new feature such as 
sentiment. Mining the data then produces predictive 
patterns that can be used in applications to create 
valuable information. 

Structured data: 
Name    Salary    Order 
Raj     45,000    3400 
Jyoti   65,000    4500 

Fig. 2. Where to apply text analytics? (structured data and unstructured data → text analytics → new feature → data mining) 
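The customer example can be sketched in Python as below. The structured rows are the ones shown in Fig. 2; everything else is an assumption made for illustration: the free-text notes are invented, the word-list sentiment scorer is a deliberately simple stand-in for NEE or LDA, and records are linked by name because no customer ID is available.

```python
import re

# Assumption: tiny hypothetical sentiment lexicons.
POSITIVE = {"great", "happy", "fast", "helpful"}
NEGATIVE = {"late", "broken", "slow", "unhappy"}

def sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"

# Structured data as shown in Fig. 2.
customers = [
    {"name": "Raj", "salary": 45000, "order": 3400},
    {"name": "Jyoti", "salary": 65000, "order": 4500},
]

# Hypothetical unstructured notes, invented for this sketch and keyed by name.
notes = {
    "Raj": "Order arrived late and the box was broken.",
    "Jyoti": "Fast delivery, very happy with the helpful staff.",
}

# Add the text-derived feature as a new column on each structured row.
enriched = [
    {**row, "sentiment": sentiment(notes.get(row["name"], ""))}
    for row in customers
]
for row in enriched:
    print(row)
```

The enriched rows now carry a sentiment column alongside salary and order value, so an ordinary data mining step can use it like any other feature.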
4. OBSTACLES IN TEXT ANALYTICS 
Although the field is new, text analytics is 
reaching a level of development that makes 
widespread use possible. 
The following are obstacles to the adoption of 
text analytics: 
1. How to deploy the results? 
2. How to handle heterogeneous data? 
3. Lack of methods to determine what 
exactly is in text. 
Recent technological developments have overcome 
these problems. 
5. CONCLUSION 
The real strength of big data lies in its utilization. 
Big data contains a huge amount of text data, which 
can be used for extracting sensible information. Using 
big data to train a classifier and applying NLP along 
with text analytics techniques can yield valuable 
information from raw text data. This paper discusses 
how NEE, LDA, and the term matrix can be used to 
extract information from large unstructured text. 
REFERENCES 
[1] Hsinchun Chen; Roger H. L. Chiang; Veda C. 
Storey (2012). Business intelligence and 
analytics: From big data to big impact. MIS 
Quarterly, Vol. 36, No. 4, pp. 1165-1188, 
December. 
[2] https://www.gartner.com/doc/2574616 
[3] Salton, G. 1989. Automatic Text Processing, 
Reading, MA: Addison Wesley 
[4] Manning, C. D., and Schütze, H. 1999. 
Foundations of Statistical Natural Language 
Processing, Cambridge, MA: The MIT Press. 
March, S. T., and Storey, V. C. 2008. “Design 
Science in the Information Systems Discipline,” 
MIS Quarterly (32:4), pp. 725-730. 
[5] http://butleranalytics.com/unstructured-meets-structured-data/ 
[6] http://www.informs.org/ 
[7] http://www.twitter.com. Twitter, Inc. 
[8] http://www.google.com. Google, Inc. 
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent 
Dirichlet allocation. J. Mach. Learn. Res., 3:993– 
1022, March 2003. 
[10] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and 
E. Chi. Short and Tweet: Experiments on 
Recommending Content from Information 
Streams. CHI 2010. 
[11] W. Dou, X. Wang, R. Chang, and W. Ribarsky. 
ParallelTopics: A Probabilistic Approach to 
Exploring Document Collections. In Visual 
Analytics Science and Technology (VAST), 2011 
IEEE Conference on, 2011. 
[Fig. 2, continued: the customer table with the new Sentiment feature, from which patterns are mined.] 
Name    Salary    Order    Sentiment 
Raj     45,000    3400     Negative 
Jyoti   65,000    4500     Positive 
[12] Salton, G., and Buckley, C. 1988. Term-weighting 
approaches in automatic text retrieval. 
Information Processing & Management 24 
(5):513-523. 
[13] X. Wei and W. B. Croft. LDA-based document 
models for ad-hoc retrieval. In Proceedings of the 
29th annual international ACM SIGIR conference 
on Research and development in information 
retrieval, SIGIR ’06, pages 178–185, New York, 
NY, USA, 2006. ACM.

Mais conteúdo relacionado

Mais procurados

Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data scienceShilpaKrishna6
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Toolsijsrd.com
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challengesfazail amin
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining AreaMahamudHasanCSE
 
Research Topics on Data Mining
Research Topics on Data MiningResearch Topics on Data Mining
Research Topics on Data MiningPhdtopiccom
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisManuel Martín
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersMelinda Thielbar
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like YouSalford Systems
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal databaseTPO TPO
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in pythonUmmeSalmaM1
 
Top (10) challenging problems in data mining
Top (10) challenging problems  in data miningTop (10) challenging problems  in data mining
Top (10) challenging problems in data miningAhmedasbasb
 
Project Topics in Data Mining
Project Topics in Data MiningProject Topics in Data Mining
Project Topics in Data MiningPhdtopiccom
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientistsAjay Ohri
 
Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...IRJET Journal
 

Mais procurados (20)

Data science | What is Data science
Data science | What is Data scienceData science | What is Data science
Data science | What is Data science
 
Real World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining ToolsReal World Application of Big Data In Data Mining Tools
Real World Application of Big Data In Data Mining Tools
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
Elementary Concepts of data minig
Elementary Concepts of data minigElementary Concepts of data minig
Elementary Concepts of data minig
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining Area
 
Research Topics on Data Mining
Research Topics on Data MiningResearch Topics on Data Mining
Research Topics on Data Mining
 
Artificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data AnalysisArtificial Intelligence for Automating Data Analysis
Artificial Intelligence for Automating Data Analysis
 
Tools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl WintersTools and Methods for Big Data Analytics by Dahl Winters
Tools and Methods for Big Data Analytics by Dahl Winters
 
9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You9 Data Mining Challenges From Data Scientists Like You
9 Data Mining Challenges From Data Scientists Like You
 
Big data road map
Big data road mapBig data road map
Big data road map
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
Data mining
Data miningData mining
Data mining
 
Programming for data science in python
Programming for data science in pythonProgramming for data science in python
Programming for data science in python
 
Data mining
Data miningData mining
Data mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Top (10) challenging problems in data mining
Top (10) challenging problems  in data miningTop (10) challenging problems  in data mining
Top (10) challenging problems in data mining
 
Project Topics in Data Mining
Project Topics in Data MiningProject Topics in Data Mining
Project Topics in Data Mining
 
Cheat sheets for data scientists
Cheat sheets for data scientistsCheat sheets for data scientists
Cheat sheets for data scientists
 
Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...Algorithm for calculating relevance of documents in information retrieval sys...
Algorithm for calculating relevance of documents in information retrieval sys...
 

Destaque

Paper id 36201537
Paper id 36201537Paper id 36201537
Paper id 36201537IJRAT
 
Paper id 21201419
Paper id 21201419Paper id 21201419
Paper id 21201419IJRAT
 
Paper id 24201441
Paper id 24201441Paper id 24201441
Paper id 24201441IJRAT
 
Paper id 21201465
Paper id 21201465Paper id 21201465
Paper id 21201465IJRAT
 
Paper id 252014156
Paper id 252014156Paper id 252014156
Paper id 252014156IJRAT
 
Paper id 41201624
Paper id 41201624Paper id 41201624
Paper id 41201624IJRAT
 
Paper id 2820149
Paper id 2820149Paper id 2820149
Paper id 2820149IJRAT
 
Paper id 212014121
Paper id 212014121Paper id 212014121
Paper id 212014121IJRAT
 
Paper id 41201622
Paper id 41201622Paper id 41201622
Paper id 41201622IJRAT
 
Paper id 312201522
Paper id 312201522Paper id 312201522
Paper id 312201522IJRAT
 
Paper id 24201433
Paper id 24201433Paper id 24201433
Paper id 24201433IJRAT
 
Paper id 21201488
Paper id 21201488Paper id 21201488
Paper id 21201488IJRAT
 
Paper id 21201486
Paper id 21201486Paper id 21201486
Paper id 21201486IJRAT
 
Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536IJRAT
 
Paper id 212014116
Paper id 212014116Paper id 212014116
Paper id 212014116IJRAT
 
Paper id 36201509
Paper id 36201509Paper id 36201509
Paper id 36201509IJRAT
 
Paper id 41201605
Paper id 41201605Paper id 41201605
Paper id 41201605IJRAT
 
Paper id 27201444
Paper id 27201444Paper id 27201444
Paper id 27201444IJRAT
 
Paper id 36201529
Paper id 36201529Paper id 36201529
Paper id 36201529IJRAT
 
Paper id 36201510
Paper id 36201510Paper id 36201510
Paper id 36201510IJRAT
 

Destaque (20)

Paper id 36201537
Paper id 36201537Paper id 36201537
Paper id 36201537
 
Paper id 21201419
Paper id 21201419Paper id 21201419
Paper id 21201419
 
Paper id 24201441
Paper id 24201441Paper id 24201441
Paper id 24201441
 
Paper id 21201465
Paper id 21201465Paper id 21201465
Paper id 21201465
 
Paper id 252014156
Paper id 252014156Paper id 252014156
Paper id 252014156
 
Paper id 41201624
Paper id 41201624Paper id 41201624
Paper id 41201624
 
Paper id 2820149
Paper id 2820149Paper id 2820149
Paper id 2820149
 
Paper id 212014121
Paper id 212014121Paper id 212014121
Paper id 212014121
 
Paper id 41201622
Paper id 41201622Paper id 41201622
Paper id 41201622
 
Paper id 312201522
Paper id 312201522Paper id 312201522
Paper id 312201522
 
Paper id 24201433
Paper id 24201433Paper id 24201433
Paper id 24201433
 
Paper id 21201488
Paper id 21201488Paper id 21201488
Paper id 21201488
 
Paper id 21201486
Paper id 21201486Paper id 21201486
Paper id 21201486
 
Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536
 
Paper id 212014116
Paper id 212014116Paper id 212014116
Paper id 212014116
 
Paper id 36201509
Paper id 36201509Paper id 36201509
Paper id 36201509
 
Paper id 41201605
Paper id 41201605Paper id 41201605
Paper id 41201605
 
Paper id 27201444
Paper id 27201444Paper id 27201444
Paper id 27201444
 
Paper id 36201529
Paper id 36201529Paper id 36201529
Paper id 36201529
 
Paper id 36201510
Paper id 36201510Paper id 36201510
Paper id 36201510
 

Semelhante a Big Data Text Analytics Techniques for Information Extraction

Decision Support for E-Governance: A Text Mining Approach
Decision Support for E-Governance: A Text Mining ApproachDecision Support for E-Governance: A Text Mining Approach
Decision Support for E-Governance: A Text Mining ApproachIJMIT JOURNAL
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSijistjournal
 
Applying Clustering Techniques for Efficient Text Mining in Twitter Data
Applying Clustering Techniques for Efficient Text Mining in Twitter DataApplying Clustering Techniques for Efficient Text Mining in Twitter Data
Applying Clustering Techniques for Efficient Text Mining in Twitter Dataijbuiiir1
 
Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...rahulmonikasharma
 
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET Journal
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
A novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueA novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueeSAT Journals
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...IJECEIAES
 
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...ijtsrd
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 
Full Paper: Analytics: Key to go from generating big data to deriving busines...
Full Paper: Analytics: Key to go from generating big data to deriving busines...Full Paper: Analytics: Key to go from generating big data to deriving busines...
Full Paper: Analytics: Key to go from generating big data to deriving busines...Piyush Malik
 
1. What are the business costs or risks of poor data quality Sup.docx
1.  What are the business costs or risks of poor data quality Sup.docx1.  What are the business costs or risks of poor data quality Sup.docx
1. What are the business costs or risks of poor data quality Sup.docxSONU61709
 
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docxmeghanivkwserie
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfDr. Radhey Shyam
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleDr. Radhey Shyam
 
A Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And AnalysisA Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And AnalysisMichele Thomas
 

Semelhante a Big Data Text Analytics Techniques for Information Extraction (20)

Decision Support for E-Governance: A Text Mining Approach
Decision Support for E-Governance: A Text Mining ApproachDecision Support for E-Governance: A Text Mining Approach
Decision Support for E-Governance: A Text Mining Approach
 
A SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICSA SURVEY OF BIG DATA ANALYTICS
A SURVEY OF BIG DATA ANALYTICS
 
Applying Clustering Techniques for Efficient Text Mining in Twitter Data
Applying Clustering Techniques for Efficient Text Mining in Twitter DataApplying Clustering Techniques for Efficient Text Mining in Twitter Data
Applying Clustering Techniques for Efficient Text Mining in Twitter Data
 
Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
IRJET- Design and Implementation of Sentiment Analyzer for Top Engineering Co...
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
A novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueA novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching technique
 
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
Insights to Problems, Research Trend and Progress in Techniques of Sentiment ...
 
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...Detailed Investigation of Text Classification and Clustering of Twitter Data ...
Detailed Investigation of Text Classification and Clustering of Twitter Data ...
 
Ijetcas14 409
Ijetcas14 409Ijetcas14 409
Ijetcas14 409
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
[IJET-V1I3P10] Authors : Kalaignanam.K, Aishwarya.M, Vasantharaj.K, Kumaresan...
 
Full Paper: Analytics: Key to go from generating big data to deriving busines...
Full Paper: Analytics: Key to go from generating big data to deriving busines...Full Paper: Analytics: Key to go from generating big data to deriving busines...
Full Paper: Analytics: Key to go from generating big data to deriving busines...
 
Text mining
Text miningText mining
Text mining
 
1. What are the business costs or risks of poor data quality Sup.docx
1.  What are the business costs or risks of poor data quality Sup.docx1.  What are the business costs or risks of poor data quality Sup.docx
1. What are the business costs or risks of poor data quality Sup.docx
 
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx
4 postsRe Topic 2 DQ 1Qualitative research produces a v.docx
 
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdfKIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
KIT-601-L-UNIT-1 (Revised) Introduction to Data Analytcs.pdf
 
Introduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycleIntroduction to Data Analytics and data analytics life cycle
Introduction to Data Analytics and data analytics life cycle
 
A Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And AnalysisA Systems Approach To Qualitative Data Management And Analysis
A Systems Approach To Qualitative Data Management And Analysis
 

Mais de IJRAT

96202108
9620210896202108
96202108IJRAT
 
97202107
9720210797202107
97202107IJRAT
 
93202101
9320210193202101
93202101IJRAT
 
92202102
9220210292202102
92202102IJRAT
 
91202104
9120210491202104
91202104IJRAT
 
87202003
8720200387202003
87202003IJRAT
 
87202001
8720200187202001
87202001IJRAT
 
86202013
8620201386202013
86202013IJRAT
 
86202008
8620200886202008
86202008IJRAT
 
86202005
8620200586202005
86202005IJRAT
 
86202004
8620200486202004
86202004IJRAT
 
85202026
8520202685202026
85202026IJRAT
 
711201940
711201940711201940
711201940IJRAT
 
711201939
711201939711201939
711201939IJRAT
 
711201935
711201935711201935
711201935IJRAT
 
711201927
711201927711201927
711201927IJRAT
 
711201905
711201905711201905
711201905IJRAT
 
710201947
710201947710201947
710201947IJRAT
 
712201907
712201907712201907
712201907IJRAT
 
712201903
712201903712201903
712201903IJRAT
 

Big Data Text Analytics Techniques for Information Extraction

International Journal of Research in Advent Technology, Vol.2, No.6, June 2014, E-ISSN: 2321-9637

Big Data: Text Analytics
Mrs. Balshetwar S.V.1, Prof. (Dr.) Tugnayat R.M.2
1HOD, Department of Information Technology, Satara College of Engineering and Management, Limb, Satara, Shivaji University, Kolhapur (Maharashtra). Email: balshetwar.satara@gmail.com
2Principal, Shri Shankarprasad Agnihotri College of Engineering, Wardha, Sant Gadge Baba Amravati University, Amravati (Maharashtra)

Abstract - Every part of this technological world is flooded with big data today. Almost 80% of this big data is unstructured, because the data comes from various new sources like device logs, server logs, Twitter feeds, chat data, blogs, web pages, emails, and social media content. This makes a huge collection of text data, which is created by humans to express themselves to others, so it has become an important source of data that may contain valuable information. Through text analytics techniques we can extract information from these collected sources of data and use it in customer management, sentiment analysis, and collaborative analysis. This paper discusses some basic techniques that identify useful patterns in the text within big data.

Index Terms - Big data, Text analytics, LDA.

1. INTRODUCTION
A text is a piece of information through which humans communicate with each other. A broad range of applications and devices is available for text communication and for sharing intentional data, so text is collected at an unexpected scale. Decisions that were previously made on guesswork or on models of reality are today made on the basis of this data. Big data analysis now drives every aspect of modern applications, gadgets, and industry, as well as society.

Text within big data, such as data from newspapers, magazines, web pages, emails, blogs, and tweets, is particularly important because these are sources that hold valuable information for humans.
Using this large amount of text data requires techniques for processing it. Those techniques must have the following characteristics:
(1) They must be fast and accurate in processing.
(2) They must be able to find relationships with other information.
(3) They must remove 100% of the ambiguity from the data.
(4) They must handle heterogeneous data efficiently.

Steps in big data analysis
The flow chart in Fig. 1 shows the big data steps for retrieving or extracting important and valuable information. The analysis steps shown in Fig. 1 for big data querying and mining are very different from traditional analysis methods that are worked out on small amounts of data. Big data is often noisy, dynamic, interrelated and untrustworthy; nevertheless, even noisy big data can be more valuable than small samples, because statistics obtained from frequent-pattern and correlation analysis usually disclose more reliable hidden patterns and knowledge. The analyzing step can extract meaningful and related information by processing text written in natural language, but such text is not easy to analyze using simple regression models or decision trees. However, the group of techniques called text analytics can help to get deep information from these sources by translating complex textual information into useful signals that support deeper analysis.

Fig. 1. Steps in big data analysis: various sources of data, recording data, cleaning data, integrating, analyzing, interpreting.

2. LITERATURE REVIEW
Based on a survey of over 4,000 information technology (IT) professionals from 93 countries and 25 industries, the IBM Tech Trends Report (2011) identified business analytics as one of the four major technology trends in the 2010s. In a survey of the state of business analytics by Bloomberg Businessweek (2011), 97 percent of companies with revenues exceeding $100 million were found to use some form of business analytics. A report by the McKinsey Global Institute (Manyika et al. 2011) predicted that by 2018 the United States alone will face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as a shortfall of 1.5 million data-savvy managers with the know-how to analyze big data to make effective decisions [1].

Emerging analytics research opportunities can be classified into five critical technical areas: (big) data analytics, text analytics, web analytics, network analytics, and mobile analytics, all of which can contribute to BI&A [1]. In its "Hype Cycle for Big Data" report (July 2013), Gartner positions text analytics as delivering great business benefits, with project adoption in the next two to five years [2].

Text analytics has its academic roots in information retrieval and computational linguistics. In information retrieval, document representation and query processing are the foundations for developing the vector-space model, the Boolean retrieval model, and the probabilistic retrieval model, which in turn became the basis for modern digital libraries, search engines, and enterprise search systems [3]. In computational linguistics, statistical natural language processing (NLP) techniques for lexical acquisition, word sense disambiguation, part-of-speech tagging (POST), and probabilistic context-free grammars have also become important for representing text [4].

[Figure: academic roots of text analytics. Information retrieval (document representation, query processing), with applications in digital libraries and search engines; computational linguistics (NLP, CFG, POST), with applications for representing text.]

3. ALGORITHMS FOR TEXT ANALYTICS
Text analytics refers to the process of deriving high-quality information from text.
Information is derived by finding or learning patterns and trends in the text using statistical methods. There are many algorithms for extracting a specific type of information from text data, and which one to apply depends on the type of data analysis project at hand. Some projects have clear objectives; others simply inspect the data in depth to pull valuable items out of a mass of information where the outcome is known, and those items can then be used for further analysis.

Depending on the project, statistical methods based on a frequency matrix (which counts the words appearing in various text sources) or a term-document matrix (which lists all the unique terms in the text being examined) often yield a useful new feature once a proper statistical technique is applied. This technique gives intermediate results that can serve as a foundation for further analysis. In this type of method, after the documents are examined, a scorecard is prepared showing the score of every term with respect to the number of times it appears in the document. A threshold is then applied, and only the terms that score above the threshold are collected and used to construct a larger concept.

Another text analytics technique is Named Entity Extraction (NEE), a method that identifies the smallest elements in text and classifies them into predefined entities like person, place, product, date, etc. Using NEE, probabilities can be assigned that a particular document refers to a named entity. NEE is based on natural language processing (NLP). After analyzing the structure of the text, NEE generates a score for every entity identified in that text. By considering the score for every entity and applying a threshold, those entities can be used to create structured features, which can then be used in prediction models. NEE has been successfully developed for news analysis and biomedical applications [1].
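The term-scoring step described above (count each term, then keep only the terms above a threshold) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the sample documents, the simple lowercase tokenizer, and the threshold value are all assumptions made for the example.

```python
import re
from collections import Counter


def term_document_matrix(docs):
    """Return (sorted vocabulary, one Counter of term counts per document)."""
    # Assumption: a term is a run of ASCII letters, compared case-insensitively.
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, counts


def terms_above_threshold(docs, threshold):
    """Score each term by its total count and keep terms scoring above threshold."""
    _, counts = term_document_matrix(docs)
    totals = Counter()
    for c in counts:
        totals.update(c)
    return {term: n for term, n in totals.items() if n > threshold}


# Hypothetical customer notes, invented for illustration only.
docs = [
    "The delivery was late and the support was slow",
    "Late delivery again, support did not reply",
    "Great product, fast delivery",
]

vocab, counts = term_document_matrix(docs)
# Rows are documents, columns follow the order of `vocab`.
matrix = [[c[t] for t in vocab] for c in counts]
selected = terms_above_threshold(docs, threshold=1)
print(selected)
```

With these sample notes the surviving terms include common words such as "the" and "was", which suggests why a stop-word filter would usually be applied before the threshold; that filter is left out here to keep the sketch short.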
Another text analytics technique, widely used in emerging areas like topic modeling, is LDA (Latent Dirichlet Allocation). LDA is mainly used for finding the main topics or themes that run through a large unstructured collection of documents, and it is also useful for detecting changes in customer behavior. LDA is an unsupervised method that is applied to unstructured data. In contrast to NEE, it is not NLP based; rather, it looks for patterns in text. LDA can be applied to any type of structured, unstructured, or semi-structured data from any number of sources to identify patterns in text.

Text analytics is widely used in emerging areas like information extraction, topic models, and opinion/sentiment analysis.

When working with text in big data, where exactly is a text analytics technique applied? Text data are typically held as notes, documents, and various forms of electronic correspondence (emails, for example). Structured data, on the other hand, are usually contained in databases with fixed structures. Many data mining techniques have been developed to extract useful patterns from structured data, and this process is often enhanced by the addition of variables (called features) which add new 'dimensions', providing information that is not implicitly contained in existing features. The appropriate processing of text data can allow such new features to be added, improving the effectiveness of predictive models or providing new insights [5].

Let us consider an example of customer information. Structured data about a customer is stored in a database using fields like name, salary, and order value, and there may also be unstructured text about the customer. Both the structured data and the unstructured text lack a customer ID. By applying a text analytics technique like NEE or LDA, one can infer the nature, strength, or absence of relationships among individuals and yield a new feature like sentiment. Then, by mining the data, predictive patterns are created that can be used in applications to produce valuable information.

Structured data (alongside unstructured data), before text analytics:

Name    Salary    Order
Raj     45,000    3400
Jyoti   65,000    4500

After text analytics adds a new feature, ready for data mining to find patterns:

Name    Salary    Order    Sentiment
Raj     45,000    3400     Negative
Jyoti   65,000    4500     Positive

Fig. 2. Where to apply text analytics?

4. OBSTACLES IN TEXT ANALYTICS
Although the field is new, text analytics is reaching a level of development that makes widespread use possible. The following are the obstacles to the adoption of text analytics:
1. How to deploy the results?
2. How to handle heterogeneous data?
3. Lack of methods to determine what exactly is in the text.
Recent technological development has overcome these problems.

5. CONCLUSION
The real strength of big data lies in its utilization. Big data contains a huge amount of text data that can be used for extracting sensible information. Using big data to train a classifier and applying NLP along with text analytics techniques can yield valuable information from raw text data. This paper discusses how NEE, LDA, and the term matrix can be used to extract information from large unstructured text.

REFERENCES
[1] Hsinchun Chen; Roger H. L. Chiang; Veda C. Storey (2012). Business intelligence and analytics: From big data to big impact. MIS Quarterly, Vol. 36, No. 4, pp. 1165-1188, December.
[2] https://www.gartner.com/doc/2574616
[3] Salton, G. 1989. Automatic Text Processing, Reading, MA: Addison Wesley.
[4] Manning, C. D., and Schütze, H. 1999. Foundations of Statistical Natural Language Processing, Cambridge, MA: The MIT Press. March, S. T., and Storey, V. C. 2008. "Design Science in the Information Systems Discipline," MIS Quarterly (32:4), pp. 725-730.
[5] http://butleranalytics.com/unstructured-meets-structured-data/
[6] http://www.informs.org/
[7] http://www.twitter.com. Twitter, Inc.
[8] http://www.google.com. Google, Inc.
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 3:993-1022, March 2003.
[10] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi. Short and Tweet: Experiments on Recommending Content from Information Streams. CHI 2010.
[11] W. Dou, X. Wang, R. Chang, and W. Ribarsky. ParallelTopics: A Probabilistic Approach to Exploring Document Collections. In Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on, 2011.
[12] Salton, G., and Buckley, C. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24 (5):513-523.
[13] X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '06, pages 178-185, New York, NY, USA, 2006. ACM.