DATA MINING
SUBMITTED BY:
SHUBHAM GUPTA, SUMAN CHATTERJEE, SIDDHARTH TIU
SUBMITTED TO:
Dr. A.C.S. Rao
What is Data Mining?
Data mining is the process of discovering interesting patterns (or knowledge)
from large amounts of data.
The data sources can include databases, data warehouses, the Web, other
information repositories, or data that are streamed into the system dynamically.
Why Data Mining?
• Credit ratings/targeted marketing:
  - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  - Identify likely responders to sales promotions
• Fraud detection:
  - Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
• Customer relationship management:
  - Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining
• Process of semi-automatically analyzing large databases to find patterns that are:
  - valid: hold on new data with some certainty
  - novel: non-obvious to the system
  - useful: should be possible to act on the item
  - understandable: humans should be able to interpret the pattern
• Also known as Knowledge Discovery in Databases (KDD)
Applications
• Banking: loan/credit card approval
  - predict good customers based on old customers
• Customer relationship management:
  - identify those who are likely to leave for a competitor
• Targeted marketing:
  - identify likely responders to promotions
• Fraud detection: telecommunications, financial transactions
  - identify fraudulent events in an online stream of events
• Manufacturing and production:
  - automatically adjust knobs when process parameters change
Applications (continued)
• Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history: find relationships between diseases
• Molecular/Pharmaceutical: identify new drugs
• Scientific data analysis:
  - identify new galaxies by searching for subclusters
• Web site/store design and promotion:
  - find affinity of visitors to pages and modify layout
Data Mining Techniques
• Classification
• Clustering
• Regression
• Association Rules
Classification Models
• Neural networks
• Statistical models: linear/quadratic discriminants
• Decision trees
• Genetic models
Decision Trees
Technique for Classification
• Decision-Tree Classifiers
[Figure: decision tree. The root splits on Job (Carpenter, Engineer, Doctor); each branch then splits on Income, with thresholds ranging from 30K to 100K, and ends in leaves labeled Bad or Good.]
Predicting credit risk of a person with the jobs specified.
Decision trees
• Tree where internal nodes are simple decision rules on one or more attributes and leaf nodes are predicted class labels.
[Figure: example tree with internal nodes "Salary < 1 M", "Prof = teacher", and "Age < 30", and leaves labeled Good or Bad.]
Decision Trees
• A decision tree T encodes d (a classifier or regression function) in the form of a tree.
• A node t in T without children is called a leaf node. Otherwise t is called an internal node.
Internal Nodes
• Each internal node has an associated splitting predicate. Binary predicates are the most common.
Example predicates:
  - Age <= 20
  - Profession in {student, teacher}
  - 5000*Age + 3*Salary - 10000 > 0
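The example predicates above can be sketched as small Python functions that each map a record to True or False. This is only an illustration: the record layout (a dict with "age", "profession", and "salary" keys) and the function names are assumptions, not part of the slides.

```python
# Each splitting predicate maps a record to True or False.
# The dict-based record layout and the function names are assumed for illustration.

def age_at_most_20(record):
    # Single-attribute numeric predicate: Age <= 20
    return record["age"] <= 20

def is_student_or_teacher(record):
    # Categorical subset predicate: Profession in {student, teacher}
    return record["profession"] in {"student", "teacher"}

def linear_combination(record):
    # Multi-attribute predicate: 5000*Age + 3*Salary - 10000 > 0
    return 5000 * record["age"] + 3 * record["salary"] - 10000 > 0

# A hypothetical record used only to exercise the predicates.
person = {"age": 18, "profession": "student", "salary": 0}
print(age_at_most_20(person), is_student_or_teacher(person), linear_combination(person))
```

Because every predicate is binary, an internal node only needs to send a record to one of two children depending on the result.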
Leaf Nodes
Consider leaf node t:
• Classification problem: node t is labeled with one class label c in dom(C)
• Regression problem: two choices
  - Piecewise constant model: t is labeled with a constant y in dom(Y).
  - Piecewise linear model: t is labeled with a linear model $Y = y_t + \sum_i a_i X_i$
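The two regression leaf models can be sketched as follows. This is a minimal illustration, assuming a record is a list of numeric attribute values; the helper names and the numbers are hypothetical.

```python
# Two kinds of regression leaves, each returned as a function of the record x.
# Helper names and example values are assumed for illustration.

def constant_leaf(y):
    # Piecewise constant model: every record reaching the leaf gets the same y.
    return lambda x: y

def linear_leaf(y_t, coefficients):
    # Piecewise linear model: Y = y_t + sum_i a_i * X_i
    return lambda x: y_t + sum(a * x_i for a, x_i in zip(coefficients, x))

leaf_a = constant_leaf(12.5)
leaf_b = linear_leaf(1.0, [2.0, 0.5])

record = [3.0, 4.0]
print(leaf_a(record))  # 12.5
print(leaf_b(record))  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```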
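Example
Encoded classifier:
If (age < 30 and carType = Minivan) Then YES
If (age < 30 and (carType = Sports or carType = Truck)) Then NO
If (age >= 30) Then YES
[Figure: decision tree. The root splits on Age; the >= 30 branch is a YES leaf, and the < 30 branch splits on Car Type, with Minivan leading to YES and Sports, Truck leading to NO.]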
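One possible way to encode this example tree in Python is sketched below. The Node class, the classify function, and the dict-based record are assumptions made for illustration. The sketch also sends every car type other than Minivan to the NO leaf, whereas the slide lists only Sports and Truck.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    # Hypothetical node type: leaves carry a label, internal nodes a predicate.
    label: Optional[str] = None
    predicate: Optional[Callable] = None
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None

def classify(node, record):
    # Walk from the root to a leaf, following the predicate outcome at each step.
    while node.label is None:
        node = node.if_true if node.predicate(record) else node.if_false
    return node.label

tree = Node(
    predicate=lambda r: r["age"] < 30,
    if_true=Node(
        predicate=lambda r: r["car_type"] == "Minivan",
        if_true=Node(label="YES"),
        if_false=Node(label="NO"),  # Sports or Truck (assumed: any non-Minivan)
    ),
    if_false=Node(label="YES"),     # age >= 30
)

print(classify(tree, {"age": 25, "car_type": "Minivan"}))  # YES
print(classify(tree, {"age": 25, "car_type": "Truck"}))    # NO
print(classify(tree, {"age": 45, "car_type": "Sports"}))   # YES
```

Each root-to-leaf path corresponds to one of the three rules in the encoded classifier.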
Why Decision Tree Model?
• Relatively fast compared to other classification models
• Obtains similar and sometimes better accuracy compared to other models
• Simple and easy to understand
• Can be converted into simple and easy-to-understand classification rules
Pros and Cons of decision trees
· Cons
- Cannot handle complicated relationships between features
- Simple decision boundaries
- Problems with lots of missing data
· Pros
+ Reasonable training time
+ Fast application
+ Easy to interpret
+ Easy to implement
+ Can handle a large number of features
Consumer Profiling
Businesses need to effectively leverage
available data to improve customer
acquisition and retention. We will explore
how analytics tools such as decision
trees can help with customer
acquisition.
EXAMPLE
A manufacturer of home improvement
equipment wants to identify which
existing customers are best candidates
for a new product they are developing.
A decision tree such as the one shown below can help identify them.
[Figure: decision tree for the home improvement example]
Clustering
• Group data into clusters
  - Similar data is grouped in the same cluster
  - Dissimilar data is grouped in different clusters
• How is this achieved?
  - K-Nearest Neighbor
    - A classification method that classifies a point by calculating the distances between the point and points in the training data set. Then it assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (2)
  - Hierarchical
    - Group data into t-trees
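The k-nearest-neighbor procedure described above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and a training set given as (features, label) pairs; the function name and the data points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(point, training_data, k):
    # training_data: list of (features, label) pairs (assumed layout).
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(training_data, key=lambda item: math.dist(point, item[0]))
    # Majority vote among the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class training set.
training = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"),
    ((7.0, 5.5), "B"),
    ((6.5, 7.0), "B"),
]

print(knn_classify((1.8, 1.4), training, k=3))
print(knn_classify((6.2, 6.1), training, k=3))
```

With two classes, an odd k avoids tied votes. Sorting the whole training set is the simplest approach but not the fastest for large data.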
Regression
• “Regression deals with the prediction of a value, rather than a class.” (1, P747)
  - Example: Find out if there is a relationship between smoking patients and cancer-related illness.
• Given values: $X_1, X_2, \dots, X_n$
• Objective: predict variable Y
  - One way is to estimate coefficients $a_0, a_1, \dots, a_n$
  - $Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n$
  - Linear Regression
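For the single-predictor case (n = 1), the coefficients a0 and a1 can be estimated by ordinary least squares, as in the sketch below. The function name and the data are hypothetical. The slides do not say which fitting method is intended, so least squares is an assumption here.

```python
def fit_simple_linear_regression(xs, ys):
    # Ordinary least squares for Y = a0 + a1 * X with one predictor.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a1 = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1

# Hypothetical data lying exactly on Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a0, a1 = fit_simple_linear_regression(xs, ys)
print(a0, a1)         # 1.0 2.0
print(a0 + a1 * 5.0)  # predicted Y for X = 5
```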
Association Rules
• “An association algorithm creates rules that describe how often events have occurred together.” (2)
  - Example: When a customer buys a hammer, then 90% of the time they will buy nails.
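The "90% of the time" figure in the example is what is usually called the confidence of the rule hammer -> nails. The sketch below shows one way to compute support and confidence from a list of transactions. The transactions are made up so that the rule comes out at 90%, and the function names are assumptions.

```python
def support(transactions, items):
    # Fraction of transactions that contain every item in `items`.
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    # Of the transactions containing the antecedent, the fraction that
    # also contain the consequent.
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets: 10 contain a hammer, and 9 of those also contain nails.
transactions = (
    [{"hammer", "nails"}] * 9
    + [{"hammer"}]
    + [{"nails"}] * 5
    + [{"paint"}] * 5
)

print(support(transactions, {"hammer"}))
print(confidence(transactions, {"hammer"}, {"nails"}))
```

Support measures how often the items occur together overall, while confidence measures how reliable the rule is once the antecedent is present.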
Advantages of Data Mining
• Provides new knowledge from existing data
  - Public databases
  - Government sources
  - Company databases
• Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to:
  - Bigger profits
  - More efficient service
Uses of Data Mining
• Sales/Marketing
  - Diversify target market
  - Identify clients' needs to increase response rates
• Risk Assessment
  - Identify customers that pose high credit risk
• Fraud Detection
  - Identify people misusing the system, e.g. people who have two Social Security Numbers
• Customer Care
  - Identify customers likely to change providers
  - Identify customer needs
Relationship with other fields
• Overlaps with machine learning, statistics, artificial intelligence, databases, and visualization, but with more stress on:
  - scalability in the number of features and instances
  - algorithms and architectures, whereas the foundations of methods and formulations are provided by statistics and machine learning
  - automation for handling large, heterogeneous data
THANK YOU

Customer Profiling using Data Mining
