These slides explain the basic meaning of text mining, its comparison with other data retrieval methods, its subtasks, applications and limitations, and the present and future of text mining. Also included is the topic of data mining, with its goals and applications.
2. OUTLINE
What is Text Mining?
What is unstructured text?
Need for Text Mining?
Text Mining sub tasks
Applications of text mining
Barriers
Today of text mining
Tomorrow of text mining
Data Mining
Goals of Data Mining
What can Data mining do?
6. Text retrieval
Information is retrieved to fulfill the needs of customers.
It does not discover anything new about the query.
An information retrieval system (IRS) finds results in a large database by matching the query.
E.g.: search engines, which identify the relevant documents on the WWW according to a given set of words.
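The query matching described on this slide can be sketched as a tiny inverted index. This is only an illustration under simple assumptions: the sample documents, the whitespace tokenisation and the names `build_index` and `search` are made up for the example, and real search engines add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the documents matching the first word, then intersect.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample collection, not taken from the slides.
docs = {
    1: "text mining finds patterns in text",
    2: "search engines retrieve documents",
    3: "data mining finds patterns in databases",
}
index = build_index(docs)
print(sorted(search(index, "mining patterns")))  # [1, 3]
```

Note that the result contains only documents that were already stored: retrieval matches the query, it does not derive new knowledge from it.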
7. Information extraction
Information extraction (IE) is the process of automatically extracting structured data from unstructured machine-readable documents.
It relies heavily on Natural Language Processing (NLP) systems.
Natural Language Processing
It converts samples of human language into a formal representation that can be understood by the computer.
Its types are:
Natural Language Generation System
Natural Language Understanding System
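As a rough illustration of turning unstructured text into structured data, the sketch below pulls e-mail addresses and ISO-style dates out of a sentence with regular expressions. The patterns, the field names and the sample sentence are assumptions chosen for the example; practical IE systems rely on NLP components rather than two simple patterns.

```python
import re

# Simplified patterns, assumed for this example only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def extract_record(text):
    """Return a structured record with the e-mails and dates found in text."""
    return {
        "emails": EMAIL.findall(text),
        "dates": DATE.findall(text),
    }


note = "Contact alice@example.com before 2024-03-15 or bob@example.org after."
print(extract_record(note))
```

The free-form sentence goes in, and a dictionary with named fields comes out, which is the kind of structured result a database can store and query.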
8. Spam filtering
• A spam filter is a program that detects unwanted email and prevents those messages from reaching a user's inbox.
• Bayesian spam filtering: sophisticated programs, such as Bayesian filters, attempt to identify spam through suspicious word patterns or word frequency.
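The word-frequency idea behind a Bayesian filter can be sketched with a small naive Bayes classifier. This is a minimal sketch, not a production filter: the four training messages are invented, the names `train` and `classify` are hypothetical, and the code assumes at least one training message per label.

```python
import math
from collections import Counter


def train(messages):
    """Count words per label from (text, label) pairs; label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the higher log-probability under naive Bayes."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Prior: share of training messages carrying this label.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Laplace smoothing so unseen words do not zero out the score.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented training data for illustration.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
word_counts, label_counts = train(training)
print(classify("free money prize", word_counts, label_counts))     # spam
print(classify("monday team meeting", word_counts, label_counts))  # ham
```

Words that appear often in spam and rarely in legitimate mail push the score towards "spam", which is what the slide calls suspicious word frequency.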
Applications of Text Mining
9. Creating suggestions and recommendations
• Text mining helps online stores such as Amazon make suggestions to customers based on their interests. The prediction algorithms are of huge importance to online stores: the more accurate they are, the more the online store will sell.
• A large online store like Amazon may have millions of customers and millions of items in stock. The store has limited information about the preferences of new customers, while for more established customers it may have too much.
• The data on which these algorithms work is constantly updated and changed. Customers are browsing the site, and the prediction algorithm should take recently browsed items into consideration.
• Traditionally, these recommendation algorithms have worked by finding similar customers in the database.
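The "find similar customers" approach from the last bullet can be sketched as follows. The ratings, the customer names and the functions `cosine` and `recommend` are hypothetical, and cosine similarity is just one common choice of similarity measure; the slides do not say which one any particular store uses.

```python
import math


def cosine(a, b):
    """Cosine similarity between two {item: rating} dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, ratings, top_n=2):
    """Suggest items liked by the customer most similar to target."""
    others = [
        (cosine(ratings[target], other), name)
        for name, other in ratings.items()
        if name != target
    ]
    _, nearest = max(others)
    # Keep only items the target customer has not rated yet.
    unseen = {
        item: score
        for item, score in ratings[nearest].items()
        if item not in ratings[target]
    }
    return sorted(unseen, key=lambda item: (-unseen[item], item))[:top_n]


# Invented ratings for illustration.
ratings = {
    "ann": {"book": 5, "lamp": 3},
    "ben": {"book": 4, "lamp": 3, "desk": 5, "pen": 2},
    "cat": {"phone": 5, "lamp": 1, "case": 4},
}
print(recommend("ann", ratings))  # ['desk', 'pen']
```

With millions of customers, comparing every pair as done here becomes expensive, which is one reason the scale mentioned on the slide matters.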
10. Barriers that we need to overcome to make the best use of text mining tools in the future:
1) Text mining is a complex technical process that requires skilled staff.
2) It requires unrestricted access to information sources.
3) Copyright can be a barrier.
11. Today of Text Mining
• Text mining is already producing efficiencies and new knowledge in areas as diverse as biological science, particle physics, media and communications. It has been used to hypothesise the causes of rare diseases and how pre-existing drugs could be used to target different diseases.
• The technique was also used recently to analyse the vast amount of text produced on websites, blogs and social media such as Twitter (where copyright holders allowed), and showed that the messages exchanged on Twitter during the English riots of 2011 were not to blame for inciting the riots.
• The business benefit of text mining lies in identifying emerging trends and exploring consumer preferences and competitor developments. Text mining is used particularly in larger companies as part of their customer relationship management strategy and in the pharmaceutical industry as part of its research and development strategy.
12. Text mining has gained significant importance in recent years and has had a strong industrial impact. On this basis, the future of text mining companies looks promising in the coming years. The age of innovation in this field is not over.
In the years to come, many new doors and exciting opportunities are therefore likely to open up through the advanced text mining services offered by professional text mining companies.
13. DATA MINING
It is the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and significant structures, from large amounts of data stored in databases, data warehouses or other information repositories.
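As one small illustration of discovering associations in stored data, the sketch below computes support and confidence for item pairs in a handful of shopping baskets. The baskets and the function names `pair_support` and `confidence` are assumptions made for the example; association mining is only one of the kinds of knowledge listed above.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Return the fraction of transactions that contain each item pair."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: count / total for pair, count in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Estimate how often consequent appears in baskets containing antecedent."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


# Invented baskets for illustration.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
support = pair_support(transactions)
print(support[("bread", "milk")])                 # 0.5
print(confidence(transactions, "bread", "milk"))  # 2 of 3 bread baskets
```

Pairs with high support and high confidence are candidates for the "interesting associations" mentioned in the definition.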
Why Data mining?
Data mining has attracted a great deal of attention in the information industry in recent years, due to the wide availability of huge amounts of data in electronic form and the imminent need to turn such data into useful information and knowledge for broad applications, including business management, market analysis and decision support.
14. Goals of Data Mining
Prediction: how certain attributes within the data will behave in the future.
Identification: identify the existence of an item, an event, an activity.
Classification: partition the data into categories.
Optimization: optimize the use of limited resources.
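To make the prediction goal concrete, here is a minimal sketch that fits a straight line to past values by least squares and extrapolates one step ahead. The monthly sales figures and the function name `fit_line` are invented for the example, and a linear trend is only the simplest possible model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented monthly sales for illustration.
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predicted sales for month 5
print(forecast)
```

The same attribute (sales) observed in the past is used to estimate how it will behave in the future, which is the sense of "prediction" on this slide.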
15. Applications of Data Mining
Marketing:
analysis of human behavior.
advertising campaigns.
targeted mailings.
segmentation of customers, stores or products.
Finance:
creditworthiness of clients.
performance analysis of financial investments.
fraud detection.
16. Applications of Data Mining (continued)
Manufacturing:
optimization of resources.
optimization of manufacturing processes.
product design based on customer requirements.
Healthcare:
discovering patterns in X-ray images.
analyzing the side effects of drugs.
analyzing the effectiveness of treatments.