Data Mining Overview
27/Sep/2008




Data Mining   July 16, 2009        1
Evolution of Database Technology
YEAR       PURPOSE
1960s      Network model, batch reports
1970s      Relational data model, executive information systems
1980s      Application-specific DBMS (spatial data, scientific data, image data, …)
1990s      Terabyte data warehouses, object-oriented, middleware and web technology
2000s      Business process
2010s      Sensor DB systems, DBs on embedded systems, large-scale pub/sub systems

Motivation: Necessity Is the Mother of Invention
• Data explosion problem
    ◦ Automated data collection tools and mature database technology
      lead to tremendous amounts of data stored in databases, data
      warehouses and other information repositories
• We are drowning in data, but starving for knowledge!
• Solution: data warehousing and data mining
    ◦ Extraction of interesting knowledge (rules, regularities, patterns,
      constraints) from data in large databases

Why Data Mining?
Data, data, data everywhere …
• I can’t find the data I need: it is scattered over the network
• I can’t get the data I need
• I can’t understand the data I need
• I can’t use the data I found

• An abundance of data
    ◦ Supermarket scanners, POS data
    ◦ Credit card transactions
    ◦ Call center records
    ◦ ATMs
    ◦ Demographic data
    ◦ Sensor networks
    ◦ Cameras
    ◦ Web server logs
    ◦ Customer web site trails
    ◦ Geographic information systems
    ◦ National medical records
    ◦ Weather images
• This data occupies
    ◦ Terabytes: 10^12 bytes
    ◦ Petabytes: 10^15 bytes
    ◦ Exabytes: 10^18 bytes
    ◦ Zettabytes: 10^21 bytes
    ◦ Yottabytes: 10^24 bytes
    ◦ Walmart: 24 terabytes

• The process of sorting through large amounts of data and picking
  out relevant information
• The process of analyzing data from different perspectives and
  summarizing it into useful information
• Discovering hidden value in a database
• A non-trivial process of identifying valid, novel, useful and
  understandable patterns in data
• Extracting or mining knowledge from large amounts of data

History Notes – Many Names of Data Mining
YEAR    NAMES                               USED BY
1960    Data Fishing, Data Dredging         Statisticians
1989    Knowledge Discovery in Databases    AI, machine learning community
1990    Data Mining                         DB community, business
Other names: Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction

Data warehousing provides the enterprise with a memory.
Data mining provides the enterprise with intelligence.

Why Data Mining? (Cont.)
• A data warehouse is a single, complete and consistent store of data from a
  variety of different sources, available to end users
• For example, AT&T handles billions of calls per day. Europe's Very
  Long Baseline Interferometer (VLBI) has 16 telescopes, each of which
  produces 1 Gigabit/second of astronomical data over a 25-day
  observation session
• We need data mining for
    ◦ Transforming data into information that is useful to users
    ◦ Presenting data in a useful format
    ◦ Providing data access to business analysts and information technology
      professionals

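To get a feel for the VLBI figures quoted above, the total volume of one observation session can be estimated with a few lines of arithmetic. This is only a rough sketch: it assumes decimal units (1 Gigabit = 10^9 bits) and that all 16 telescopes record continuously for the full 25 days, neither of which the slide states.

```python
# Back-of-the-envelope data volume for the VLBI figures quoted above.
TELESCOPES = 16
BITS_PER_SECOND = 1e9            # 1 Gigabit/second per telescope (decimal giga assumed)
SESSION_DAYS = 25
SECONDS_PER_DAY = 24 * 60 * 60


def session_volume_bytes(telescopes=TELESCOPES,
                         bits_per_second=BITS_PER_SECOND,
                         days=SESSION_DAYS):
    """Total bytes produced, assuming every telescope records continuously."""
    total_bits = telescopes * bits_per_second * days * SECONDS_PER_DAY
    return total_bits / 8


volume = session_volume_bytes()
print(f"{volume / 1e15:.2f} petabytes")
```

Under these assumptions a single session lands in the petabyte range, which is the scale the earlier units slide is pointing at.
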
Data Mining Process
• Data mining is the technique used to carry out knowledge discovery in databases (KDD).
• Data mining turns data into information and then into knowledge
[Figure: diagram with the labels Data, Information and Knowledge]

Steps in Data Mining
1. Data cleaning
   Noise and inconsistent data are removed
2. Data integration
   Multiple data sources are integrated (compiled)
3. Data selection
   Data relevant to the analysis is selected
4. Data transformation
   Summary, normalization or aggregation operations are performed
   (converting the data into two-dimensional form) to consolidate the data

Steps in Data Mining (Cont.)
5. Data mining
   Intelligent methods are applied to the data to discover
   knowledge or patterns
6. Pattern evaluation
   Patterns are evaluated for interestingness by thresholding
7. Knowledge presentation
   Visualization and presentation methods are used to present
   the mined knowledge to the user.

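The preparation steps (1 to 4) listed above can be sketched in a few small functions. This is a minimal illustration, not a prescribed implementation: the record layout, the field names `customer` and `amount`, the `min_amount` cut-off and the choice of min-max scaling as the normalization are all assumptions made for the example.

```python
# Hypothetical records from two sources; None marks a missing value.
store_sales = [
    {"customer": "c1", "amount": 120.0},
    {"customer": "c2", "amount": None},   # incomplete record
    {"customer": "c3", "amount": 80.0},
]
web_sales = [
    {"customer": "c4", "amount": 200.0},
    {"customer": "c5", "amount": 40.0},
]


def clean(records):
    """Step 1: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]


def integrate(*sources):
    """Step 2: combine several sources into one list."""
    return [r for source in sources for r in source]


def select(records, min_amount):
    """Step 3: keep only the records relevant to the analysis."""
    return [r for r in records if r["amount"] >= min_amount]


def transform(records):
    """Step 4: min-max normalize amounts into the range [0, 1]."""
    amounts = [r["amount"] for r in records]
    low, high = min(amounts), max(amounts)
    span = (high - low) or 1.0  # avoid division by zero when all values are equal
    return [{**r, "amount_scaled": (r["amount"] - low) / span} for r in records]


prepared = transform(
    select(integrate(clean(store_sales), clean(web_sales)), min_amount=50.0)
)
for row in prepared:
    print(row)
```

Each step is a separate function so that the order shown on the slides stays visible in the final call chain.
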
Pattern Evaluation
• Data mining: the core of the knowledge discovery process.
[Figure: knowledge discovery pipeline with the stage labels Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining and Pattern Evaluation]

Data Mining Tasks
1. Classification
• Classification maps data into predefined groups or classes.
• It can be carried out with methods such as decision trees.

Decision tree
• A flow-chart-like tree structure
• Each node denotes a test of an attribute value
• Each branch represents an outcome of the test
• Leaves represent classes or class distributions.

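The decision tree structure described above can be sketched as nested dictionaries: internal nodes test one attribute, branches are keyed by the outcome of the test, and leaves carry a class label. The tree below is written by hand rather than learned from data, and the attribute names (`income`, `has_guarantor`) and class labels are hypothetical.

```python
# A hand-built tree: internal nodes test one attribute, leaves hold a class label.
# Attribute names, values and labels are hypothetical.
tree = {
    "attribute": "income",
    "branches": {
        "high": {"label": "approve"},
        "low": {
            "attribute": "has_guarantor",
            "branches": {
                True: {"label": "approve"},
                False: {"label": "reject"},
            },
        },
    },
}


def classify(node, record):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["label"]


print(classify(tree, {"income": "low", "has_guarantor": False}))
```

Classifying a record is a walk from the root to a leaf, so the cost is bounded by the depth of the tree.
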
2. Regression
Regression maps a data item to a real-valued prediction variable.
Example: a manager wants to reach a certain level of savings before his
  retirement. Periodically he predicts his retirement savings from the current value
  and several past values. He uses a simple linear regression formula to
  predict the future values of his savings.
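
The savings example above can be sketched with ordinary least squares on a single variable. The yearly savings figures are invented for illustration, and fitting a straight line through them is an assumption about what the slide means by a "simple linear" formula.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical savings (in thousands) observed at the end of years 1 to 5.
years = [1, 2, 3, 4, 5]
savings = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(years, savings)
predicted_year_10 = slope * 10 + intercept
print(predicted_year_10)
```

Refitting the line each time a new observation arrives corresponds to the periodic prediction described in the example.
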


3. Prediction
Many real-world applications can be seen as
predicting future data states based on
past and current data.
Example: predicting flooding is a difficult problem.

4. Clustering
Clustering is similar to classification
except that the groups are not predefined.
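
As a rough illustration of grouping without predefined classes, here is a small k-means sketch on one-dimensional data. k-means is only one of many clustering methods and is not named on the slide; the ages and the starting centroids are made-up values.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on numbers: assign each value to its nearest centroid, then re-average."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical customer ages; the starting centroids are arbitrary guesses.
ages = [18, 20, 22, 45, 47, 50]
centroids, clusters = k_means_1d(ages, centroids=[10, 60])
print(centroids, clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the values.
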
5. Association Rule
Association refers to uncovering relationships among data.
It is used in the retail sales community to identify the items
(products) that are frequently purchased together.
[Figure: cartoon labeled "1998" with the caption "Bread and Jam sell together!"]

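The bread-and-jam idea can be made concrete with the two usual measures for an association rule, support and confidence, followed by a threshold check of the kind mentioned under pattern evaluation. The baskets and the threshold values below are invented for illustration.

```python
# Hypothetical market baskets; each set is one customer transaction.
transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "jam", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "jam"}, transactions)
rule_confidence = confidence({"bread"}, {"jam"}, transactions)

# Thresholds are arbitrary illustrative values.
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.7
is_interesting = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, rule_confidence, is_interesting)
```

A rule such as "bread implies jam" is reported only if both measures clear their thresholds, which keeps rare or weak co-occurrences out of the result.
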
6. Summarization
Summarization of the general characteristics or features of a target class of
  data.
Data characterization can be presented in various forms: pie charts, bar
  charts, curves.
Data discrimination is the comparison of the general features of a target class of
  data objects with the general features of objects from one or a set of
  contrasting classes.
7. Outlier Analysis
A database may contain data objects that do not comply with the general
  behavior or model of the data. These data objects are called outliers.
Many data mining methods discard outliers as noise or exceptions.
In applications such as fraud detection, however, rare events may be more
  interesting than regularly occurring events.

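One simple way to flag objects that deviate from the general behavior of the data is a z-score test. This is a sketch of one common heuristic, not a method named on the slide; the transaction amounts and the threshold of 2 standard deviations are assumptions.

```python
from statistics import mean, pstdev


def z_score_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical card transaction amounts; one is far from the rest.
amounts = [20, 22, 19, 21, 20, 23, 18, 500]
print(z_score_outliers(amounts))
```

In a fraud detection setting the flagged values are the ones to inspect rather than discard.
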
Data Mining: Types of Data
• Relational data and transactional data
• Text
• Images, video
• Mixtures of data

Data Mining Products
• DataMind: neurOagent
• Information Discovery: IDIS
• SAS Institute: SAS/Neuronets

Data Mining Software
• RapidMiner and Weka: defining the data mining process
• Top 8 data mining software packages in 2008
    ◦ Angoss software
    ◦ Infor CRM Epiphany
    ◦ Portrait Software
    ◦ SAS
    ◦ SPSS
    ◦ ThinkAnalytics
    ◦ Unica
    ◦ Viscovery

Application Areas
INDUSTRY            APPLICATION
Finance             Credit card analysis
Insurance           Fraud analysis
Telecommunication   Call record analysis

Applications
• Financial industry, banks, businesses, e-commerce
    ◦ Stock and investment analysis
    ◦ Identifying loyal customers and risky customers
    ◦ Predicting customer spending
• Database analysis and decision support
    ◦ Market analysis and management: target marketing, customer
      relationship management, market basket analysis
    ◦ Risk analysis and management: forecasting, quality control,
      competitive analysis
    ◦ Fraud detection and management

Data Mining in Usage
1.   Intelligent Miner
    ◦ An IBM data mining product
    ◦ Distinctive features include the scalability of its mining algorithms and tight
      integration with IBM's DB2 relational database system.
5.   DBMiner
    ◦ Developed by DBMiner Technologies Inc.
    ◦ Its distinctive feature is data-cube-based online analytical mining

The Telecomm Slice
[Figure: data cube with three dimensions. Product: Household, Telecomm, Video, Audio. Regions: Europe, Far East, India. Sales Channel: Retail, Direct, Special.]

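Taking a slice of a data cube means fixing one dimension and keeping the remaining ones. The sketch below stores a tiny cube as a dictionary keyed by (product, region, sales channel) and slices it on the product dimension. The dimension values come from the figure labels, while the sales figures and the dictionary layout are assumptions made for the example.

```python
# Each fact is keyed by (product, region, sales_channel); the sales figures are made up.
sales_cube = {
    ("Telecomm", "Europe", "Retail"): 120,
    ("Telecomm", "India", "Direct"): 80,
    ("Telecomm", "Far East", "Special"): 45,
    ("Video", "Europe", "Retail"): 60,
    ("Audio", "India", "Direct"): 30,
}


def slice_cube(cube, product):
    """Fix the product dimension, leaving a 2-D table over (region, sales_channel)."""
    return {
        (region, channel): value
        for (prod, region, channel), value in cube.items()
        if prod == product
    }


telecomm_slice = slice_cube(sales_cube, "Telecomm")
print(telecomm_slice)
```

The result has one dimension fewer than the cube, which is what makes a slice easy to show as an ordinary table.
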
Conclusion
• Data mining: discovering interesting patterns from large amounts of
  data
• A KDD process includes data cleaning, data integration, data
  selection, transformation, data mining, pattern evaluation, and
  knowledge presentation
• Mining can be performed in a variety of information repositories
• Data mining functionalities: characterization, discrimination,
  association, classification, clustering, outlier analysis, etc.

Thank you !!!