1. Introduction to Data Mining
By: Birju Tank (141060753017)
GTU PG School, BISAG, GANDHINAGAR
2. • Data mining is also called knowledge discovery in databases (KDD)
• Data mining is
– extraction of useful patterns from data sources, e.g., databases, texts, the web, images.
• Other Definitions
– Non-trivial extraction of implicit, previously unknown and potentially useful information from data
– Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
What is Data Mining?
4. • 80% of customers who buy cheese and milk also buy bread, and 5% of customers buy all of them together
• Cheese, Milk → Bread [support = 5%, confidence = 80%]
Example
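The support and confidence figures in this example can be computed directly from a list of transactions. The following is a minimal sketch, not a full association rule miner: the function name `rule_stats` and the 80 toy baskets are hypothetical, chosen only so that the numbers match the rule above.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Baskets that contain every item of the left-hand side X.
    n_x = sum(1 for t in transactions if antecedent <= set(t))
    # Baskets that contain every item of both X and Y.
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_xy / n_total if n_total else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Hypothetical toy data: 80 baskets in total.
transactions = (
    [{"cheese", "milk", "bread"}] * 4   # buy all three items
    + [{"cheese", "milk"}] * 1          # cheese and milk, but no bread
    + [{"eggs"}] * 75                   # unrelated baskets
)

support, confidence = rule_stats(transactions, {"cheese", "milk"}, {"bread"})
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Support is measured against all baskets (4 of 80), while confidence is measured only against baskets that already contain cheese and milk (4 of 5).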
5. What is (not) Data Mining?
• What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
• What is Data Mining?
– Certain names are more prevalent in certain locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)
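Grouping search results by context can be illustrated with a very small sketch. It assumes word overlap (Jaccard similarity) as the similarity measure and a greedy single-pass grouping; the snippets, the stop-word list, the threshold and the function names are all hypothetical, and a real system would more likely use weighted term vectors and a proper clustering algorithm.

```python
def tokens(text):
    """Lowercase word set; the query term itself is dropped since every result contains it."""
    stop = {"the", "in", "on", "a", "of", "and", "from", "amazon"}
    return {w for w in text.lower().split() if w not in stop}


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_documents(docs, threshold=0.2):
    """Put each document into the first group whose first member is similar enough."""
    groups = []  # each group is a list of document indices
    for i, doc in enumerate(docs):
        for group in groups:
            if jaccard(tokens(doc), tokens(docs[group[0]])) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])  # no similar group found: start a new one
    return groups


# Hypothetical search result snippets for the query "Amazon".
docs = [
    "Amazon rainforest trees and river wildlife",
    "Amazon online shopping books and delivery",
    "wildlife of the Amazon rainforest river basin",
    "Amazon shopping deals and fast delivery",
]
print(group_documents(docs))
```

With these snippets the rainforest results (indices 0 and 2) end up in one group and the shopping results (indices 1 and 3) in another.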
6.
| Area | DBMS | OLAP | Data Mining |
| Task | Extraction of detailed and summary data | Summaries, trends and forecasts | Knowledge discovery of hidden patterns and insights |
| Type of result | Information | Analysis | Insight and Prediction |
| Method | Deduction (Ask the question, verify with data) | Multidimensional data modeling, Aggregation, Statistics | Induction (Build the model, apply it to new data, get the result) |
| Example question | Who purchased mutual funds in the last 3 years? | What is the average income of mutual fund buyers by region by year? | Who will buy a mutual fund in the next 6 months and |
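The three methods in the table can be contrasted on the same toy data. This is only a sketch under stated assumptions: the customer records, the names, and the one-rule "model" (a single income cut-off, fitted by the hypothetical helper `fit_income_threshold`) are invented for illustration, and the aggregation is by region only rather than by region and year.

```python
from collections import defaultdict

# Hypothetical customer records: (name, region, income in thousands, bought a fund?)
customers = [
    ("Asha", "North", 40, False),
    ("Ben", "North", 85, True),
    ("Chen", "South", 95, True),
    ("Dev", "South", 50, False),
    ("Eva", "North", 70, True),
    ("Finn", "South", 30, False),
]

# DBMS-style deduction: ask a precise question, read the answer from stored data.
buyers = [name for name, _, _, bought in customers if bought]

# OLAP-style aggregation: summarize a measure (income) along a dimension (region).
incomes_by_region = defaultdict(list)
for _, region, income, bought in customers:
    if bought:
        incomes_by_region[region].append(income)
avg_income = {region: sum(v) / len(v) for region, v in incomes_by_region.items()}


# Data-mining-style induction: build a model from the data, then apply it to new data.
def fit_income_threshold(rows):
    """Pick the income cut-off that classifies the most training rows correctly."""
    best_cut, best_correct = None, -1
    for cut in sorted({income for _, _, income, _ in rows}):
        correct = sum((income >= cut) == bought for _, _, income, bought in rows)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


cut = fit_income_threshold(customers)
new_customers = [("Gita", 90), ("Hugo", 35)]  # not in the stored data
predictions = {name: income >= cut for name, income in new_customers}

print(buyers)
print(avg_income)
print(cut, predictions)
```

The first two results only restate what is already stored, while the last one generalizes from the stored data to customers who have no purchase record yet.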
7. • Classification:
– mining patterns that can classify future data into known classes.
• Clustering:
– identifying a set of similarity groups in the data
• Prediction Methods:
– Use some variables to predict unknown or future values of other variables.
Data Mining Tasks
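As one simple illustration of classification, the sketch below assigns a new record to a known class by copying the label of its closest training example (a 1-nearest-neighbour classifier). This is just one of many classification techniques; the features, class labels and data points are hypothetical.

```python
import math


def nearest_neighbor_classify(train, point):
    """Return the label of the training example closest to `point` (1-NN)."""
    _, label = min(train, key=lambda example: math.dist(example[0], point))
    return label


# Hypothetical labeled data: (age, yearly spend in thousands) -> known class
train = [
    ((25, 2), "occasional"),
    ((30, 3), "occasional"),
    ((45, 20), "frequent"),
    ((50, 25), "frequent"),
]

print(nearest_neighbor_classify(train, (48, 22)))
print(nearest_neighbor_classify(train, (27, 1)))
```

Here the known classes come from the training data; clustering differs in that no class labels are given and the groups themselves have to be discovered.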
8. • Association rule mining:
– mining any rule of the form X → Y, where X and Y are sets of data items.
• Deviation detection:
– discovering the most significant changes in data.
• Data visualization:
– using graphical methods to show patterns in data.
Data Mining Tasks (Cont.)
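Deviation detection can be sketched with a basic z-score test, which flags values that lie unusually far from the mean. This is an assumption about how one might implement it, not a method given in the slides; the function name `find_deviations`, the threshold of 2 standard deviations and the sales figures are hypothetical.

```python
from statistics import mean, pstdev


def find_deviations(values, z_threshold=2.0):
    """Return (index, value) pairs lying more than z_threshold std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - mu) / sigma > z_threshold
    ]


# Hypothetical daily sales with one unusual day.
daily_sales = [100, 102, 98, 101, 99, 100, 180, 97, 103, 100]
print(find_deviations(daily_sales))
```

On this data only the value 180 at index 6 is flagged. Note that a single extreme value also inflates the mean and standard deviation, so more robust measures (for example the median) are often preferred in practice.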
9. • Rapid computerization of businesses produces huge amounts of data
• Businesses need to make the best use of this data
• Knowledge discovered from data can be used for competitive advantage
• There is a big gap between stored data and knowledge, and the transition won’t occur automatically.
• Many interesting things you want to find cannot be found using database queries
– “Find people likely to buy my products”
– “Who is likely to respond to my promotion?”
Why is Data Mining Necessary?
10. • Marketing, customer profiling and retention, identifying potential customers,
market segmentation.
• Fraud detection
– Ex. identifying credit card fraud, intrusion detection
• Scientific data analysis
• Text and web mining
• Any application that involves a large amount of data
Applications
11. • Your data is full of undiscovered gems; start digging!
Conclusion