4. Evolution of Data Mining
Data Collection (1960s)
• Business question: "What was my total revenue in the last five years?"
• Enabling technologies: computers, tapes, disks
• Product providers: IBM, CDC
• Characteristics: retrospective, static data delivery
Data Access (1980s)
• Business question: "What were unit sales last March?"
• Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
• Product providers: Oracle, Sybase, Informix, IBM, Microsoft
• Characteristics: retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s)
• Business question: "What were unit sales last March? Drill down to other levels."
• Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
• Product providers: SPSS, Comshare, Arbor, Cognos, MicroStrategy, NCR
• Characteristics: retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today)
• Business question: "What's likely to happen to unit sales next month? Why?"
• Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
• Product providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
• Characteristics: prospective, proactive information delivery
- RDBMS: Relational database management system.
- ODBC: Open Database Connectivity, a standard software API for accessing database management systems (DBMS).
- OLAP: Online analytical processing, an approach to quickly answering multidimensional analytical queries.
- SPSS: Statistical Package for the Social Sciences, a computer program used for statistical analysis. It was called SPSS until 2009, when it was rebranded as PASW.
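To make the difference between the Data Access and Data Warehousing rows concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and all values are hypothetical. `units_sold` answers a single record-level aggregate question; `drill_down` (a hypothetical helper) re-answers the same question grouped by a finer level, which is roughly what an OLAP drill-down does. A real OLAP tool would typically work over a multidimensional model with precomputed aggregates rather than ad hoc GROUP BY queries.

```python
import sqlite3

# Hypothetical sales table; the schema and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, city TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2024-03-02", "East", "Springfield", 10),
        ("2024-03-15", "East", "Riverton", 4),
        ("2024-03-20", "West", "Lakeside", 7),
        ("2024-04-01", "East", "Springfield", 5),
    ],
)

def units_sold(conn, month):
    """Data Access style: one aggregate answer, e.g. units sold in a month ('YYYY-MM')."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales WHERE strftime('%Y-%m', sale_date) = ?",
        (month,),
    ).fetchone()
    return total

def drill_down(conn, month, level):
    """OLAP-style drill-down: split the same monthly total by a finer level."""
    # Column names cannot be bound as SQL parameters, so only whitelisted names are allowed.
    if level not in ("region", "city"):
        raise ValueError("level must be 'region' or 'city'")
    query = (
        f"SELECT {level}, SUM(units) FROM sales "
        "WHERE strftime('%Y-%m', sale_date) = ? "
        f"GROUP BY {level} ORDER BY {level}"
    )
    return conn.execute(query, (month,)).fetchall()

print(units_sold(conn, "2024-03"))            # 21
print(drill_down(conn, "2024-03", "region"))  # [('East', 14), ('West', 7)]
print(drill_down(conn, "2024-03", "city"))    # [('Lakeside', 7), ('Riverton', 4), ('Springfield', 10)]
```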
6. Results of Data Mining Include
• Forecasting what may happen in the future.
• Classifying people or things into groups by recognizing patterns.
• Clustering people or things into groups based on their attributes.
• Associating events that are likely to occur together.
• Sequencing events to find which are likely to lead to later events.
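The "Associating" result is commonly measured with support (the share of transactions containing all the items in a rule) and confidence (how often the consequent appears given the antecedent). The sketch below computes both for item pairs only; it is a simplified stand-in for full association-rule algorithms such as Apriori, and the baskets and the 0.4 support cutoff are invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"bread", "milk", "chips"},
]

def pair_rules(baskets, min_support=0.4):
    """Return sorted (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (x, y), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Emit the rule in both directions; confidence = P(consequent | antecedent).
        for a, b in ((x, y), (y, x)):
            rules.append((a, b, support, count / item_counts[a]))
    return sorted(rules)

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that the same pair can yield rules of very different strength: here milk -> bread holds in every basket containing milk, while bread -> milk holds in only three of the four baskets containing bread.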
11. Data Mining Is Not
• Crunching of bulk data
• “Blind” application of algorithms
• Going to find relationships where none exist
• Presenting data in different ways
• A database-intensive task
• A difficult-to-understand technology requiring an advanced degree in computer science
12. Data Mining Is
• A class of techniques that find patterns in data.
• A user-centric, interactive process that leverages analysis technologies and computing power.
• A group of techniques that find relationships that have not previously been discovered.
• Not reliant on an existing database.
• A relatively easy task that requires knowledge of the business problem and subject-matter expertise.
17. Examples of What People Are Doing with Data Mining:
• Fraud/non-compliance anomaly detection
  • Isolate the factors that lead to fraud, waste, and abuse
  • Target auditing and investigative efforts more effectively
• Credit/risk scoring
• Intrusion detection
• Parts failure prediction
• Recruiting/attracting customers
• Maximizing profitability (cross-selling, identifying profitable customers)
• Service delivery and customer retention
  • Build profiles of customers likely to use which services
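For anomaly detection, one of the simplest techniques is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below applies it to a hypothetical list of claim amounts with an assumed threshold of 2.0; production fraud systems generally combine many features and more sophisticated models.

```python
from statistics import mean, stdev

def flag_anomalies(amounts, threshold=2.0):
    """Return indices of values more than `threshold` sample standard deviations from the mean."""
    if len(amounts) < 2:
        return []  # the sample standard deviation needs at least two values
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]

# Hypothetical claim amounts; the sixth one is unusually large.
claims = [120, 95, 130, 110, 105, 980, 115, 100]
print(flag_anomalies(claims))  # [5]
```

A single extreme value also inflates the mean and standard deviation it is measured against, so with small samples robust alternatives such as the median and median absolute deviation are often preferred.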
25. Examples of What People Are Doing with Data Mining:
The right offer for the right customer, through the right channel, at the right time.
26. How Can We Do Data Mining?
• A standard process: the data mining process must be reliable and repeatable by people with little data mining background.
• Existing data
• Software technologies
• Situational expertise
28. Phases and Tasks
Business Understanding → Data Understanding → Data Preparation → Modeling → Evaluation → Deployment
Each phase consists of generic tasks, and each task produces one or more outputs.
31. Phases and Tasks
A) Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
33. Phases and Tasks
B) Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report
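As a rough illustration of the Verify Data Quality task, the sketch below produces a tiny data quality report: for each column, how many values are missing and how many distinct values remain. The customer records, field names, and the rule that `None` or an empty string counts as missing are all assumptions.

```python
def data_quality_report(records, columns):
    """Summarize each column: count of missing values and count of distinct non-missing values."""
    report = {}
    for col in columns:
        values = [r.get(col) for r in records]  # an absent key counts as missing
        present = [v for v in values if v not in (None, "")]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report

# Hypothetical customer records with some gaps.
customers = [
    {"id": 1, "age": 34, "region": "North"},
    {"id": 2, "age": None, "region": "South"},
    {"id": 3, "age": 51, "region": ""},
    {"id": 4, "age": 34},
]
report = data_quality_report(customers, ["id", "age", "region"])
for column, stats in report.items():
    print(column, stats)
```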
35. Phases and Tasks
C) Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion/Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data
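The Data Preparation tasks can be sketched as a small pipeline, one step per task. The customer and order records, the field names, and the particular cleaning rule (dropping customers with a missing age) are hypothetical; in practice the Data Cleaning Report would document whichever rule is actually chosen.

```python
def prepare(customers, orders):
    """Hypothetical pipeline mirroring the Select, Clean, Construct, Integrate, and Format tasks."""
    # Select Data: keep only the fields needed for the analysis.
    selected = [{"id": c["id"], "age": c.get("age")} for c in customers]
    # Clean Data: drop records with a missing age (one possible cleaning rule).
    cleaned = [c for c in selected if c["age"] is not None]
    # Construct Data: derive a total-spend attribute from the order records.
    spend = {}
    for o in orders:
        spend[o["customer_id"]] = spend.get(o["customer_id"], 0.0) + o["amount"]
    # Integrate Data: merge the derived attribute into the customer records.
    merged = [{**c, "total_spend": spend.get(c["id"], 0.0)} for c in cleaned]
    # Format Data: emit (id, age, spend) tuples sorted by id, spend rounded to cents.
    ordered = sorted(merged, key=lambda c: c["id"])
    return [(c["id"], c["age"], round(c["total_spend"], 2)) for c in ordered]

customers = [
    {"id": 2, "age": 45, "region": "South"},
    {"id": 1, "age": 30, "region": "North"},
    {"id": 3, "age": None, "region": "East"},
]
orders = [
    {"customer_id": 1, "amount": 19.99},
    {"customer_id": 1, "amount": 5.01},
    {"customer_id": 3, "amount": 12.50},
]
print(prepare(customers, orders))  # [(1, 30, 25.0), (2, 45, 0.0)]
```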
37. Phases and Tasks
D) Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models; Model Description
• Assess Model: Model Assessment; Revised Parameter Settings
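The Modeling tasks map naturally onto code: Generate Test Design becomes a reproducible holdout split, Build Model fits a model on the training rows only, and Assess Model compares training and test accuracy. The sketch below uses a one-feature decision stump purely for illustration; the data (support calls versus churn), the 70/30 split, and the random seed are all assumptions, not part of the methodology.

```python
import random

# Hypothetical data: (number of support calls, churned? 1 = yes, 0 = no).
ROWS = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0),
        (11, 1), (12, 1), (13, 1), (14, 1), (15, 1)]

def holdout_split(rows, test_fraction=0.3, seed=7):
    """Generate Test Design: shuffle reproducibly, then hold out a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - test_fraction)))
    return rows[:cut], rows[cut:]

def accuracy(threshold, data):
    """Share of rows where 'predict 1 if x > threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in data) / len(data)

def build_stump(train):
    """Build Model: pick the midpoint threshold with the best training accuracy."""
    xs = sorted({x for x, _ in train})  # needs at least two distinct values
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(t, train))

train, test = holdout_split(ROWS)
threshold = build_stump(train)
# Assess Model: a large gap between these two numbers would suggest overfitting.
print(threshold, accuracy(threshold, train), accuracy(threshold, test))
```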
39. Phases and Tasks
E) Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision
41. Phases and Tasks
F) Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation
42. Data mining success story
The US Internal Revenue Service needed to improve customer service and... scheduled its workforce to provide faster, more accurate answers to questions.
43. Data mining success story
The US Drug Enforcement Administration needed to be more effective in its drug “busts” and analyzed suspects’ cell phone usage to focus investigations.
44. Data mining success story
HSBC needed to cross-sell more effectively by identifying customer profiles likely to be interested in higher-yielding investments and... reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
46. Data mining can be utilized in any organization that needs to find patterns or relationships in its data.
By using this data mining methodology, analysts can have a reasonable level of assurance that their data mining efforts will produce useful, repeatable, and valid results.