DATA MINING PROCESS
Lecture 2
1.11.2012
Barbro Back
DATA MINING PROCESSES: STANDARD PROCESSES
  CRISP-DM
      Cross-Industry Standard Process for Data Mining
  SEMMA
      Specific to SAS
The Cross-Industry Standard Process for Data Mining (CRISP-DM) provides an overview of the life cycle of a data mining project.

Six phases:
  Business understanding
  Data understanding
  Data preparation
  Modeling
  Evaluation
  Deployment

[Figure: Phases of the CRISP-DM Process Model]
CRISP-DM
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modeling
5. Evaluation
6. Deployment
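To make the ordering concrete, the six phases can be written down as a small state model. This is a minimal sketch, not part of CRISP-DM itself: the names `Phase`, `TRANSITIONS` and `is_valid_path` are hypothetical, and the backward moves included (data understanding back to business understanding, modeling back to data preparation, evaluation back to business understanding) are an assumption based on the usual CRISP-DM reference diagram, which stresses that the process is iterative rather than strictly linear.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Allowed moves between phases: each phase can move forward, and a few
# typical backward loops are included (assumption: only these loops).
TRANSITIONS = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def is_valid_path(path):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(nxt in TRANSITIONS[cur] for cur, nxt in zip(path, path[1:]))
```

Walking the phases straight through (`is_valid_path(list(Phase))`) is valid, and so is a path that loops from modeling back to data preparation before evaluation; jumping from business understanding directly to modeling is not.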
1 BUSINESS UNDERSTANDING
  Includes:
      Determining business objectives
        A managerial need for new knowledge, e.g.:
           What types of customers are interested in each of our products?
           What are the typical profiles of our customers, and how much value does each of them provide to us?
      Assessing the current situation
      Establishing data mining goals
      Developing a project plan, including a budget
2 DATA UNDERSTANDING
  Select the data
  Three important issues:
      Set up a concise and clear description of the problem
      Identify the relevant data for the problem description
      Depending on the method, the selected variables may need to be independent of each other
  Types of data
      Demographic data – income, education, gender, etc.
      Sociographic data – hobbies, club memberships, etc.
      Transactional data – sales records, credit card spending, etc.
      Quantitative data – numerical values
      Qualitative data – nominal and ordinal data
SCALES
  Nominal – no order between data points (e.g. gender)
  Ordinal – order between data points (e.g. ranking results)
  Interval – order between data points and equal distances between measurements, but no true zero point
  Ratio – an interval scale with a true zero point (e.g. "sales have doubled": sales were 1 million last month and 2 million this month)

  Question: Is the Likert scale an ordinal or an interval scale?
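The scales can be read as a hierarchy in which each level permits one more kind of operation. The sketch below (hypothetical names `Scale`, `MIN_SCALE` and `allowed`) encodes that hierarchy and reproduces the sales example; the temperature comparison is an added illustration of an interval scale, assuming degrees Celsius as the classic case without a true zero. It deliberately leaves the Likert question open.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement scales, ordered so that each level adds one property."""
    NOMINAL = 1   # categories only (e.g. gender)
    ORDINAL = 2   # + order (e.g. ranking results)
    INTERVAL = 3  # + equal distances, but no true zero point
    RATIO = 4     # + true zero point (e.g. sales)


# Lowest scale at which each operation is meaningful.
MIN_SCALE = {
    "equality": Scale.NOMINAL,
    "order": Scale.ORDINAL,
    "difference": Scale.INTERVAL,
    "ratio": Scale.RATIO,
}


def allowed(operation, scale):
    """True if the operation is meaningful for data on the given scale."""
    return scale >= MIN_SCALE[operation]


# Ratio scale: "sales have doubled" is a meaningful statement.
sales_previous_month = 1_000_000
sales_this_month = 2_000_000
growth = sales_this_month / sales_previous_month  # 2.0

# Interval scale: 20 °C is not "twice as warm" as 10 °C. Converting to
# kelvin, which has a true zero, shows the actual ratio is about 1.035.
ratio_in_kelvin = (20 + 273.15) / (10 + 273.15)
```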
3 DATA PREPARATION
  Clean the data for better quality
  Convert the data to be consistent
  Treat missing values
  Handle redundant data
  Determine the data types:
      In SPSS Modeler the following data types are used:
      RANGE – numeric values (integer, real)
      FLAG – binary values (yes/no, 0/1)
      SET – data with multiple distinct values (e.g. strings)
      TYPELESS – other types of data
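The preparation steps can be sketched on plain Python lists. The helpers below are hypothetical and show only simple choices: missing values are replaced by the column mean (one of several possible treatments), redundancy is handled by dropping exact duplicate records, and the type labels follow the four SPSS Modeler categories using heuristic rules that are an assumption, not Modeler's own type-detection logic.

```python
def impute_mean(values):
    """Replace missing (None) numeric entries with the mean of the others."""
    present = [v for v in values if v is not None]
    if not present:  # nothing to average; leave the column unchanged
        return list(values)
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def drop_duplicate_rows(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def infer_type(values):
    """Assign an SPSS-Modeler-style type label using simple heuristics.

    Hypothetical rules (not Modeler's own algorithm):
      FLAG     - exactly two distinct non-missing values
      RANGE    - all non-missing values are numbers (int or float)
      SET      - all non-missing values are strings
      TYPELESS - anything else, including an all-missing column
    """
    present = [v for v in values if v is not None]
    if len(set(present)) == 2:
        return "FLAG"
    if present and all(isinstance(v, (int, float)) and not isinstance(v, bool)
                       for v in present):
        return "RANGE"
    if present and all(isinstance(v, str) for v in present):
        return "SET"
    return "TYPELESS"
```

For example, `infer_type(["yes", "no"])` gives `"FLAG"` and `infer_type([1.5, 2, 3.25])` gives `"RANGE"`. Note that any numeric column with only two distinct values is labelled FLAG under these rules, which matches 0/1 codes but may not suit every variable.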
4 MODELING
  Data treatment
      Training set, validation set, test set
  Data mining techniques
      Association
      Classification
      Clustering (segmentation)
      Predictions
      Sequential Patterns
      Similar Time Sequences
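A minimal sketch of splitting the data into training, validation and test sets. The 60/20/20 proportions, the fixed seed and the name `split_dataset` are illustrative assumptions; the records are shuffled first so that any ordering in the source data (for example, records sorted by customer number) does not end up concentrated in one of the sets.

```python
import random


def split_dataset(records, train=0.6, validation=0.2, seed=42):
    """Shuffle records and split them into training, validation and test sets.

    The proportions and seed are illustrative defaults. Whatever remains
    after the training and validation shares becomes the test set.
    """
    if train <= 0 or validation < 0 or train + validation >= 1:
        raise ValueError("need train > 0, validation >= 0 and train + validation < 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local generator: reproducible split
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

The model is fitted on the training set, tuned (e.g. choosing between competing models) on the validation set, and its final performance is reported on the test set, which stays untouched until the end.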
5 EVALUATION
  How to recognize the business value of the knowledge discovered
      A puzzle to be solved jointly by data analysts, business analysts and decision makers
  Which visualization tool to use
      Pie charts, histograms, box plots, scatter plots, self-organizing maps
6 DEPLOYMENT
  Results need to be reported to the project sponsors
  Monitoring for change is important
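One way to monitor for change after deployment is to compare the distribution of an input variable in new data with its distribution in the data the model was built on. The sketch below uses the population stability index (PSI), a technique not mentioned in the lecture and chosen here only as an example; the function name and the `eps` guard for empty bins are assumptions.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two distributions given as per-bin counts.

    expected: counts per bin in the data the model was built on
    actual:   counts per bin in newly arriving data (same bins)
    eps:      floor on bin proportions, so empty bins do not cause log(0)
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total_e, total_a = sum(expected), sum(actual)
    psi = 0.0
    for e, a in zip(expected, actual):
        pe = max(e / total_e, eps)
        pa = max(a / total_a, eps)
        psi += (pa - pe) * math.log(pa / pe)  # each term is >= 0
    return psi
```

Identical distributions give a PSI of 0, and the value grows as the new data drifts away from the original. A commonly quoted rule of thumb treats values above roughly 0.25 as a shift large enough to warrant re-examining the model.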
SEMMA (BY THE SAS INSTITUTE)
  Sample
  Explore
  Modify
  Model
  Assess
  See http://www.sas.com/offices/europe/uk/technologies/analytics/datamining/miner/semma.html
AN APPLICATION EXAMPLE (CRISP-DM)
  Goal: to predict which customers would become insolvent early enough for the firm to take preventive action
  The billing period was 2 months
  Customers used their phone for 4 weeks
  They received the bill about 1 week later
  Payment was due 30 days after receiving the bill
  Action was taken if the bill was not paid within 14 days after the due date
  The phone was disconnected if the bill exceeded a certain amount
  Hypothesis: customers change their calling behaviour before becoming insolvent
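The billing rules imply a long gap between a customer's calls and the first collection action, which is why an early prediction is valuable. The sketch below adds up the stated durations, assuming the bill arrives exactly 7 days after a 4-week usage period (the slide says "about 1 week"); the function name and the start date used in the example are hypothetical.

```python
from datetime import date, timedelta

# Durations from the slide; BILL_DELAY assumes "about 1 week" means exactly 7 days.
USAGE = timedelta(weeks=4)
BILL_DELAY = timedelta(days=7)
PAYMENT_TERM = timedelta(days=30)
GRACE = timedelta(days=14)


def collection_timeline(usage_start):
    """Key dates from the start of usage to the first collection action."""
    usage_end = usage_start + USAGE
    bill_received = usage_end + BILL_DELAY
    due_date = bill_received + PAYMENT_TERM
    action_date = due_date + GRACE
    return {"usage_end": usage_end, "bill_received": bill_received,
            "due_date": due_date, "action_date": action_date}


# Hypothetical start date: usage begins on 1 January 2012.
timeline = collection_timeline(date(2012, 1, 1))
days_after_usage = (timeline["action_date"] - timeline["usage_end"]).days
```

Under these assumptions, 51 days (7 + 30 + 14) pass between the end of the usage period and the first action, and 79 days from the start of usage.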
EXAMPLE CONT.
  Data: 100 000 customers
  17-month period
  Discriminant analysis, decision trees and neural networks were used
  2066 cases
  46 initial variables
  Costs were assigned to misclassification errors
  Final result:
  89.8 % correctly classified on the test data, with a cost function value of 360 € compared to 14 580 € in the first run
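The final result combines two measures: the share of correctly classified test cases and the total cost of the misclassifications. The sketch below computes both from a confusion matrix and a cost matrix. The function name and the matrices in the example are hypothetical, since the lecture gives neither the actual counts nor the cost of each error type; only the last line uses the reported figures.

```python
def evaluate(confusion, costs):
    """Accuracy and total misclassification cost.

    confusion[i][j]: number of cases of true class i predicted as class j
    costs[i][j]:     cost of one such case (0 on the diagonal)
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    cost = sum(confusion[i][j] * costs[i][j] for i in range(n) for j in range(n))
    return correct / total, cost


# Hypothetical example: class 0 = solvent, class 1 = insolvent.
confusion = [[80, 5],   # solvent customers: 80 correct, 5 flagged as insolvent
             [3, 12]]   # insolvent customers: 3 missed, 12 caught
costs = [[0, 10],       # hypothetical cost (€) of a false alarm
         [100, 0]]      # hypothetical cost (€) of missing an insolvent customer
accuracy, total_cost = evaluate(confusion, costs)  # 0.92, 350

# Reported figures: the first run cost 40.5 times as much as the final run.
improvement = 14_580 / 360
```

Weighting the errors by cost lets a model with slightly lower accuracy win if it avoids the expensive mistakes, which is the point of allocating costs to misclassification errors.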
