Ahmed Moussa
 30-Feb-2010
What is Data Mining?
The Evolution of Data Analysis
Evolutionary Step: Data Collection (1960s)
  Business Question: "What was my total revenue in the last five years?"
  Enabling Technologies: Computers, tapes, disks
  Product Providers: IBM, CDC
  Characteristics: Retrospective, static data delivery

Evolutionary Step: Data Access (1980s)
  Business Question: "What were unit sales last March?"
  Enabling Technologies: Relational databases (RDBMS), Structured Query Language (SQL), ODBC
  Product Providers: Oracle, Sybase, Informix, IBM, Microsoft
  Characteristics: Retrospective, dynamic data delivery at record level

Evolutionary Step: Data Warehousing & Decision Support (1990s)
  Business Question: "What were unit sales last March? Drill down to Other."
  Enabling Technologies: On-line analytic processing (OLAP), multidimensional databases, data warehouses
  Product Providers: SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR
  Characteristics: Retrospective, dynamic data delivery at multiple levels

Evolutionary Step: Data Mining (Emerging Today)
  Business Question: "What’s likely to happen to unit sales next month? Why?"
  Enabling Technologies: Advanced algorithms, multiprocessor computers, massive databases
  Product Providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
  Characteristics: Prospective, proactive information delivery

- RDBMS: Relational database management system.
- ODBC: Open Database Connectivity, a standard software API for accessing database management systems (DBMS).
- OLAP: Online analytical processing, an approach to quickly answering multi-dimensional analytical queries.
- SPSS: Statistical Package for the Social Sciences, a computer program used for statistical analysis. In 2009 it was rebranded as PASW.
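To make the "Data Access" step concrete, here is a minimal sketch of answering a retrospective, record-level question such as "What were unit sales last March?" with SQL. It uses Python's built-in sqlite3 module; the `sales` table, its columns, and the rows are made up for illustration and do not come from the slides.

```python
import sqlite3

# Hypothetical in-memory sales table; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2009-02-27", 5), ("2009-03-03", 12), ("2009-03-21", 8), ("2009-04-02", 7)],
)


def units_sold_in_month(connection, year_month):
    """Return total units sold in a month given as 'YYYY-MM' (0 if no sales)."""
    row = connection.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales "
        "WHERE strftime('%Y-%m', sale_date) = ?",
        (year_month,),
    ).fetchone()
    return row[0]


march_units = units_sold_in_month(conn, "2009-03")
print(march_units)
```

A query like this only reports what already happened. The data mining step in the table asks what is likely to happen next, which a lookup of stored records cannot answer on its own.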
Results of Data Mining Include
• Forecasting what may happen in the future

• Classifying people or things into groups by
  recognizing patterns.

• Clustering people or things into groups based on
  their attributes.

• Associating what events are likely to occur
  together.

• Sequencing what events are likely to lead to later
  events.
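As one illustration of the "associating" result above, the sketch below counts how often pairs of items occur together and derives two common measures: support (the share of transactions containing both items) and confidence (the share of transactions with the first item that also contain the second). The baskets and the function name `pair_rules` are hypothetical, and this is a simple counting sketch rather than any particular product's algorithm.

```python
from collections import Counter
from itertools import combinations

# Made-up market baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_rules(transactions):
    """Map each ordered item pair (a, b) to (support, confidence) for a -> b."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        rules[(a, b)] = (support, together / item_counts[a])
        rules[(b, a)] = (support, together / item_counts[b])
    return rules


rules = pair_rules(baskets)
support, confidence = rules[("butter", "bread")]
print(support, confidence)
```

In this toy data every basket that contains butter also contains bread, so the rule "butter -> bread" has a confidence of 1.0 with a support of 0.5.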
Data mining is not
• Crunching of bulk data

• “Blind” application of algorithms
• Going to find relationships where none exist

• Presenting data in different ways

• A database intensive task

• A difficult-to-understand technology requiring
  an advanced degree in computer science
Data Mining Is
• A class of techniques that find patterns in data.

• A user-centric, interactive process which leverages
  analysis technologies and computing power.

• A group of techniques that find relationships that
  have not previously been discovered.

• Not reliant on an existing database.

• A relatively easy task that requires knowledge of the
  business problem/subject matter expertise.
Examples of What People Are Doing with Data Mining:
• Fraud/Non-Compliance Anomaly detection
    • Isolate the factors that lead to fraud, waste and abuse
    • Target auditing and investigative efforts more effectively
• Credit/Risk Scoring
• Intrusion detection
• Parts failure prediction
• Recruiting/Attracting customers
• Maximizing profitability (cross selling, identifying profitable
  customers)
• Service Delivery and Customer Retention
    • Build profiles of customers likely to use which services
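The anomaly detection use case can be sketched very simply: flag values that sit far from the rest. The example below uses a z-score rule with Python's statistics module. The claim amounts, the threshold of 2.0, and the function name `flag_anomalies` are assumptions chosen for illustration; real fraud work would use richer features and models.

```python
from statistics import mean, pstdev


def flag_anomalies(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up claim amounts; one is far larger than the others.
claim_amounts = [100, 102, 98, 101, 99, 500]
suspicious = flag_anomalies(claim_amounts)
print(suspicious)
```

A flagged value is only a candidate for review. As the list above suggests, the point is to target auditing and investigative effort, not to replace it.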
Examples of What People Are Doing with Data Mining:
The right offer for the right customer through the right channel at the right time
How Can We Do Data Mining?
• A standard process
  The data mining process must be reliable and repeatable by people
  with little data mining background.
• Existing data
• Software technologies
• Situational expertise
Phases and Tasks
Each phase is listed with its tasks, and each task with its outputs.

Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques

Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report

Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion/Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data

Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models; Model Description
• Assess Model: Model Assessment; Revised Parameter Settings

Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision

Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation
A) Business Understanding
• Determine Business Objectives: Background; Business Objectives; Business Success Criteria
• Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints
• Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
B) Data Understanding
• Collect Initial Data: Initial Data Collection Report
• Describe Data: Data Description Report
• Explore Data: Data Exploration Report
• Verify Data Quality: Data Quality Report
C) Data Preparation
• Data Set: Data Set Description
• Select Data: Rationale for Inclusion/Exclusion
• Clean Data: Data Cleaning Report
• Construct Data: Derived Attributes; Generated Records
• Integrate Data: Merged Data
• Format Data: Reformatted Data
D) Modeling
• Select Modeling Technique: Modeling Technique; Modeling Assumptions
• Generate Test Design: Test Design
• Build Model: Parameter Settings; Models and Model Description
• Assess Model: Model Assessment; Revised Parameter Settings
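The modeling tasks can be illustrated with a toy example: hold back part of the data as the test design, build a model on the rest, then assess it on the held-back part. Everything below is a sketch under stated assumptions: the data is synthetic, the "model" is a single learned threshold, and the helper names (`holdout_split`, `build_threshold_model`, `accuracy`) are hypothetical.

```python
import random


def holdout_split(records, test_fraction=0.25, seed=0):
    """Test design: shuffle reproducibly and hold back a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, data):
    """Assess model: share of records where (x >= threshold) matches the label."""
    correct = sum((x >= threshold) == label for x, label in data)
    return correct / len(data)


def build_threshold_model(train):
    """Build model: pick the observed x value that best separates the labels."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic records: the label is True when x is at least 50.
records = [(x, x >= 50) for x in range(0, 100, 5)]
train, test = holdout_split(records)
model = build_threshold_model(train)
test_accuracy = accuracy(model, test)
print(model, test_accuracy)
```

Assessing on records the model never saw is what gives the assessment meaning. If the result is unsatisfactory, the parameter settings are revised and the model is rebuilt.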
E) Evaluation
• Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
• Review Process: Review of Process
• Determine Next Steps: List of Possible Actions; Decision
F) Deployment
• Plan Deployment: Deployment Plan
• Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
• Produce Final Report: Final Report; Final Presentation
• Review Project: Experience Documentation
Data mining success story
The US Internal Revenue Service needed to improve customer service and...
scheduled its workforce to provide faster, more accurate answers to questions.

Data mining success story
The US Drug Enforcement Administration needed to be more effective in its drug “busts” and...
analyzed suspects’ cell phone usage to focus investigations.

Data mining success story
HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher-yielding investments and...
reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
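A figure like "30% lower mail cost for 95% of the revenue" is the kind of result a gains calculation produces: rank customers by a model score, mail only the top-ranked share, and measure how much revenue that share accounts for. The sketch below shows the arithmetic. The scores and revenues are invented, and chosen so the toy result mirrors the figures quoted above. They are not HSBC data, and the function name `revenue_captured` is hypothetical.

```python
def revenue_captured(customers, mail_fraction):
    """Share of total revenue obtained by mailing only the top-scored fraction.

    customers is a list of (model_score, revenue_if_mailed) pairs.
    """
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    n_mailed = round(len(ranked) * mail_fraction)
    total = sum(revenue for _, revenue in ranked)
    return sum(revenue for _, revenue in ranked[:n_mailed]) / total


# Invented (score, revenue) pairs; not real campaign data.
customers = [
    (0.90, 400), (0.80, 300), (0.70, 150), (0.60, 50), (0.50, 30),
    (0.40, 10), (0.30, 10), (0.20, 50), (0.10, 0), (0.05, 0),
]

# Mailing the top 70% means 30% fewer pieces, assuming cost scales with volume.
share = revenue_captured(customers, 0.7)
print(share)
```

The customer scored 0.20 still produces revenue in this toy data, which is why the captured share falls short of 100%: no scoring model ranks everyone perfectly.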
Final Comments
Data Mining can be utilized in any organization that needs to find
patterns or relationships in its data.

By using the DM methodology, analysts can have a reasonable level of
assurance that their Data Mining efforts will render useful,
repeatable, and valid results.
Bibalex-SACI presentation
 
INFORMATION SECURITY
INFORMATION SECURITYINFORMATION SECURITY
INFORMATION SECURITY
 
Staff appraisal
Staff appraisalStaff appraisal
Staff appraisal
 
The effect of CRM- short
The effect of CRM-  shortThe effect of CRM-  short
The effect of CRM- short
 
Green egypt
Green egyptGreen egypt
Green egypt
 
Efficiency enhancers
Efficiency enhancersEfficiency enhancers
Efficiency enhancers
 
Data mining
Data miningData mining
Data mining
 
Cisco systems
Cisco systemsCisco systems
Cisco systems
 
Conflict management
Conflict managementConflict management
Conflict management
 
Tqm chaos and complexity
Tqm chaos and complexityTqm chaos and complexity
Tqm chaos and complexity
 

Data mining

  • 2. What is Data Mining?
  • 3. The Evolution of Data Analysis
  • 4. Evolutionary steps in data analysis, each with its business question, enabling technologies, product providers, and characteristics:
      • Data Collection (1960s)
          Business question: "What was my total revenue in the last five years?"
          Enabling technologies: computers, tapes, disks
          Product providers: IBM, CDC
          Characteristics: retrospective, static data delivery
      • Data Access (1980s)
          Business question: "What were unit sales last March?"
          Enabling technologies: relational databases (RDBMS), Structured Query Language (SQL), ODBC
          Product providers: Oracle, Sybase, Informix, IBM, Microsoft
          Characteristics: retrospective, dynamic data delivery at record level
      • Data Warehousing & Decision Support (1990s)
          Business question: "What were unit sales last March? Drill down to Other."
          Enabling technologies: on-line analytic processing (OLAP), multidimensional databases, data warehouses
          Product providers: SPSS, Comshare, Arbor, Cognos, Microstrategy, NCR
          Characteristics: retrospective, dynamic data delivery at multiple levels
      • Data Mining (Emerging Today)
          Business question: "What's likely to happen to unit sales next month? Why?"
          Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
          Product providers: SPSS/Clementine, Lockheed, IBM, SGI, SAS, NCR, Oracle, numerous startups
          Characteristics: prospective, proactive information delivery
      Glossary:
      - RDBMS: relational database management system.
      - ODBC: Open Database Connectivity, a standard software API for using database management systems (DBMS).
      - OLAP: online analytical processing, an approach to quickly answering multidimensional analytical queries.
      - SPSS: Statistical Package for the Social Sciences, a computer program used for statistical analysis. In 2009 it was re-branded as PASW.
  • 5. Results of Data Mining Include
  • 6. Results of Data Mining Include • Forecasting what may happen in the future • Classifying people or things into groups by recognizing patterns. • Clustering people or things into groups based on their attributes. • Associating what events are likely to occur together. • Sequencing what events are likely to lead to later events.
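The "associating" result above can be made concrete with a small sketch. It is not taken from the slides: the function name `pair_rules`, the `min_support` threshold, and the shopping baskets are hypothetical, and real projects usually rely on a dedicated association-rule algorithm rather than this brute-force pair count. It reports, for each pair of items, how often they occur together (support) and how often the second item appears given the first (confidence).

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper: counts single items and item pairs across baskets,
    keeps pairs whose support reaches min_support, and emits both rule
    directions for each kept pair.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore repeats within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Made-up baskets, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]
rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket containing butter also contains bread, so the rule "butter -> bread" has confidence 1.0 even though its support is only 0.5.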
  • 11. Data Mining Is Not • Crunching of bulk data • “Blind” application of algorithms • Going to find relationships where none exist • Presenting data in different ways • A database-intensive task • A difficult-to-understand technology requiring an advanced degree in computer science
  • 12. Data Mining Is • A class of techniques that find patterns in data. • A user-centric, interactive process which leverages analysis technologies and computing power. • A group of techniques that find relationships that have not previously been discovered. • Not reliant on an existing database. • A relatively easy task that requires knowledge of the business problem/subject matter expertise.
  • 17. Examples of What People are Doing with Data Mining: • Fraud/Non-Compliance Anomaly detection • Isolate the factors that lead to fraud, waste and abuse • Target auditing and investigative efforts more effectively • Credit/Risk Scoring • Intrusion detection • Parts failure prediction • Recruiting/Attracting customers • Maximizing profitability (cross selling, identifying profitable customers) • Service Delivery and Customer Retention • Build profiles of customers likely to use which services
  • 25. Examples of What People are Doing with Data Mining: The right offer for the right customer through the right channel at the right time
  • 26. How Can We Do Data Mining? • A standard process • Existing data • Software technologies • Situational expertise
  • 27. How Can We Do Data Mining? • A standard process: the data mining process must be reliable and repeatable by people with little data mining background. • Existing data • Software technologies • Situational expertise
  • 28. Phases and Tasks (each task is followed by its outputs)
      • Business Understanding
          Determine Business Objectives: Background; Business Objectives; Business Success Criteria
          Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
          Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
          Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
      • Data Understanding
          Collect Initial Data: Initial Data Collection Report
          Describe Data: Data Description Report
          Explore Data: Data Exploration Report
          Verify Data Quality: Data Quality Report
      • Data Preparation
          Data Set: Data Set Description
          Select Data: Rationale for Inclusion / Exclusion
          Clean Data: Data Cleaning Report
          Construct Data: Derived Attributes; Generated Records
          Integrate Data: Merged Data
          Format Data: Reformatted Data
      • Modeling
          Select Modeling Technique: Modeling Technique; Modeling Assumptions
          Generate Test Design: Test Design
          Build Model: Parameter Settings; Models; Model Description
          Assess Model: Model Assessment; Revised Parameter Settings
      • Evaluation
          Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
          Review Process: Review of Process
          Determine Next Steps: List of Possible Actions; Decision
      • Deployment
          Plan Deployment: Deployment Plan
          Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
          Produce Final Report: Final Report; Final Presentation
          Review Project: Experience Documentation
  • 31. Phases and Tasks A) Business Understanding
      • Determine Business Objectives: Background; Business Objectives; Business Success Criteria
      • Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints
      • Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
      • Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
  • 33. Phases and Tasks B) Data Understanding
      • Collect Initial Data: Initial Data Collection Report
      • Describe Data: Data Description Report
      • Explore Data: Data Exploration Report
      • Verify Data Quality: Data Quality Report
  • 35. Phases and Tasks C) Data Preparation
      • Data Set: Data Set Description
      • Select Data: Rationale for Inclusion/Exclusion
      • Clean Data: Data Cleaning Report
      • Construct Data: Derived Attributes; Generated Records
      • Integrate Data: Merged Data
      • Format Data: Reformatted Data
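As a rough illustration of the Data Preparation tasks listed above, the sketch below cleans, integrates, constructs, and formats a tiny data set in plain Python. Everything in it is an assumption made for the example: the `prepare` function, the record fields (`id`, `age`, `customer_id`, `amount`), and the age-60 cutoff for the derived `is_senior` attribute do not come from the slides.

```python
def prepare(customers, orders):
    """Hypothetical data-preparation pass over two small record lists."""
    # Clean Data: drop customer records with a missing age.
    clean = [c for c in customers if c.get("age") is not None]

    # Integrate Data: total order amount per customer id.
    totals = {}
    for order in orders:
        key = order["customer_id"]
        totals[key] = totals.get(key, 0.0) + order["amount"]

    prepared = []
    for c in clean:
        prepared.append({
            "id": c["id"],
            "age": c["age"],
            "total_spend": totals.get(c["id"], 0.0),  # merged data
            "is_senior": c["age"] >= 60,              # derived attribute (assumed cutoff)
        })

    # Format Data: fixed field set, rows sorted by id.
    return sorted(prepared, key=lambda row: row["id"])


# Made-up input records, for illustration only.
customers = [{"id": 2, "age": 64}, {"id": 1, "age": 35}, {"id": 3, "age": None}]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 5.0},
    {"customer_id": 2, "amount": 12.5},
]
prepared = prepare(customers, orders)
print(prepared)
```

Customer 3 is dropped during cleaning because its age is missing; in a real project that decision would be recorded in the Data Cleaning Report.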
  • 37. Phases and Tasks D) Modeling
      • Select Modeling Technique: Modeling Technique; Modeling Assumptions
      • Generate Test Design: Test Design
      • Build Model: Parameter Settings; Models; Model Description
      • Assess Model: Model Assessment; Revised Parameter Settings
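The Modeling tasks above (generate a test design, build a model, assess it) can be sketched in a few lines. This is only an illustration under stated assumptions: the holdout split, the majority-class baseline, and the synthetic "fraud"/"ok" records are all hypothetical stand-ins, not the technique used in the deck. A baseline like this is mainly useful as the number any real model has to beat.

```python
import random
from collections import Counter


def holdout_split(records, test_fraction=0.3, seed=0):
    """Test design: shuffle with a fixed seed and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def build_majority_model(train):
    """Build model: always predict the most common label seen in training."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: majority


def assess(model, test):
    """Assess model: fraction of held-out records predicted correctly."""
    correct = sum(1 for features, label in test if model(features) == label)
    return correct / len(test)


# Synthetic records: (features, label). The labeling rule is made up.
records = [({"amount": a}, "fraud" if a > 900 else "ok") for a in range(0, 1000, 10)]
train, test = holdout_split(records)
model = build_majority_model(train)
accuracy = assess(model, test)
print(f"train={len(train)} test={len(test)} baseline accuracy={accuracy:.2f}")
```

Because "fraud" is rare in this data, the baseline scores well on accuracy while never flagging a single fraud case, which is why the assessment step should be tied back to the business success criteria rather than to one metric.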
  • 39. Phases and Tasks E) Evaluation
      • Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
      • Review Process: Review of Process
      • Determine Next Steps: List of Possible Actions; Decision
  • 41. Phases and Tasks F) Deployment
      • Plan Deployment: Deployment Plan
      • Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
      • Produce Final Report: Final Report; Final Presentation
      • Review Project: Experience Documentation
  • 42. Data mining success story The US Internal Revenue Service needed to improve customer service and... scheduled its workforce to provide faster, more accurate answers to questions.
  • 43. Data mining success story The US Drug Enforcement Administration needed to be more effective in its drug “busts” and analyzed suspects’ cell phone usage to focus investigations.
  • 44. Data mining success story HSBC needed to cross-sell more effectively by identifying profiles that would be interested in higher yielding investments and... reduced direct mail costs by 30% while garnering 95% of the campaign’s revenue.
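The arithmetic behind a result like the HSBC one can be sketched as follows: rank customers by a model score, mail only the top fraction, and measure what share of total revenue that fraction carries. The function `revenue_captured`, the scores, and the revenue figures below are invented for illustration and are not HSBC's data or method.

```python
def revenue_captured(customers, mail_fraction):
    """Share of total revenue captured by mailing the top-scored fraction.

    customers: list of (score, expected_revenue) pairs (hypothetical inputs).
    """
    ranked = sorted(customers, key=lambda c: c[0], reverse=True)
    n_mailed = round(len(ranked) * mail_fraction)
    total = sum(revenue for _, revenue in ranked)
    captured = sum(revenue for _, revenue in ranked[:n_mailed])
    return captured / total


# Ten made-up customers: (model score, revenue if mailed).
customers = [
    (0.9, 300), (0.8, 250), (0.7, 200), (0.6, 100), (0.5, 50),
    (0.4, 40), (0.3, 30), (0.2, 20), (0.1, 10), (0.0, 0),
]
share = revenue_captured(customers, mail_fraction=0.7)
print(f"Mailing the top 70% captures {share:.0%} of revenue")
```

With these toy numbers, skipping the lowest-scored 30% of customers gives up only a small slice of revenue, which is the pattern the success story describes.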
  • 46. Data Mining can be utilized in any organization that needs to find patterns or relationships in their data.
  • 47. Data Mining can be utilized in any organization that needs to find patterns or relationships in their data. By using the DM methodology, analysts can have a reasonable level of assurance that their Data Mining efforts will render useful, repeatable, and valid results.