An efficient data preprocessing method for mining

Small presentation on Data mining

Published in: Data & Analytics

  1. An Efficient Data Preprocessing Method for Mining Customer Survey Data. Presented by Kameshwaran S and Vishnu J.
  2. INTRODUCTION: Data preprocessing primarily consists of attribute selection, data cleaning, and missing-value resolution. It is often reported that over 80% of the time required to carry out a real-world data mining project is spent on data preprocessing. Data preprocessing lays the groundwork for data mining: before useful information or knowledge can be discovered, the target data set must be properly prepared.
  3. INTRODUCTION: Without adequate preparation of the data, the return on the resources invested in mining is likely to be disappointing. The success of any data mining algorithm depends strongly on the quality of data preprocessing. Data preprocessing can therefore be a very complicated task; it sometimes takes more than half of the total time spent solving a data mining problem. Many different tools and methods are used for preprocessing. This presentation discusses an efficient approach to preprocessing Web-based customer survey data for mining, with the aim of speeding up data preparation.
  4. Web Based Customer Survey Data: The survey designer designs the survey and distributes it on the Web. Customers answer the survey, and their answers reflect their intentions regarding the products or items. The results of the survey are called customer survey data. The proposed approach is based on a unified data model derived from an analysis of the characteristics of customer survey data. The unified data model is used as a standard representation for incoming data so that it can be mined.
  5. Analysis of the survey process shows that data inconsistency between data sets is the main difficulty in data preprocessing. The proposed solution for mining Web-based customer survey data is a unified data model, which speeds up data preparation by exploiting the characteristics of the data and of the survey process. A unified data model is a standard data set whose elements are well defined and consistent across all survey data sets. Based on the unified data model, the data mining process is seamlessly integrated with the survey process.
  6. Market needs are mainly defined by customer needs and desires. It has been demonstrated that 60 to 80 percent of successful technology-based products originated in the recognition of customer needs and demands, and that the financial return from market-based products tends to be higher. Customers' ideas for a new product can be acquired through a survey. The survey data is collected from customers through survey channels such as the Web and then stored in the customer survey database. The data in this database is the raw input to the data mining tools.
  7. The characteristics of customer survey data are summarized as follows: 1) The data sources are a set of survey data collected iteratively; 2) The same survey result may be stored in the database in different ways; 3) Some data is empty or missing because respondents may leave questions unanswered; 4) Survey data includes both numerical and categorical data; 5) The representations of categorical data are ambiguous.
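The characteristics listed in this slide suggest a cleaning pass that handles missing answers, mixed numerical and categorical fields, and ambiguous category spellings. The following is a minimal Python sketch of such a pass, not the presenters' method. The field names (`age`, `satisfaction`), the synonym table, and the list of missing-value markers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Hypothetical synonym table: maps ambiguous spellings of a categorical
# answer to one canonical label (characteristics 2 and 5).
SATISFACTION_LABELS: Dict[str, str] = {
    "v. good": "very good",
    "vg": "very good",
    "very good": "very good",
    "good": "good",
    "ok": "fair",
    "fair": "fair",
    "bad": "poor",
    "poor": "poor",
}

# Assumed markers that respondents or survey tools use for "no answer".
MISSING_MARKERS = {"", "n/a", "na", "none", "-"}


def clean_text(raw: Optional[str]) -> Optional[str]:
    """Trim and lowercase an answer; return None if it is empty or missing."""
    if raw is None:
        return None
    text = raw.strip().lower()
    return None if text in MISSING_MARKERS else text


def normalize_category(raw: Optional[str], labels: Dict[str, str]) -> Optional[str]:
    """Map a categorical answer to its canonical label; keep unknown labels as-is."""
    text = clean_text(raw)
    if text is None:
        return None
    return labels.get(text, text)


def normalize_number(raw: Optional[str]) -> Optional[float]:
    """Parse a numerical answer; unparseable or missing values become None."""
    text = clean_text(raw)
    if text is None:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def normalize_response(response: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Turn one raw survey response into a record with consistent fields."""
    return {
        "age": normalize_number(response.get("age")),
        "satisfaction": normalize_category(
            response.get("satisfaction"), SATISFACTION_LABELS
        ),
    }
```

Missing values are kept as `None` here rather than imputed; how to resolve them (dropping, default values, imputation) depends on the mining algorithm and is left open in this sketch.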
  8. Data preparation is a significant stage of data mining. It involves identifying data features, extracting the data, and converting it into formats that the KDD (knowledge discovery in databases) tools can analyze.
  9. Traditional Data Preprocessing: In general, the data preparation process must be applied to each data set separately for each KDD tool. Raw survey data collected from customers usually cannot be used directly as input to most data mining tools; it must be preprocessed to generate a meaningful data set. Each data mining algorithm may have different requirements for its input data set, so the method of data preprocessing also differs. Typically, for m different data mining algorithms and n different raw data sets, there are m×n possible data preparations.
  10. Unified Data Model: Instead of preprocessing a raw survey data set separately for each data mining algorithm in the traditional way, we propose a unified data set model. Using the unified data set as a standard, the number of data transformations (preprocessing steps) can be reduced from m×n in the conventional way to m+n: n transformations from the raw data sets into the unified data set, plus m transformations from the unified data set into each algorithm's input format. This saves a lot of time. It also provides flexibility and adaptability in preprocessing data for different data mining tools.
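The reduction from m×n to m+n can be illustrated with a small Python sketch. It assumes a hypothetical unified record with two fields (`respondent`, `rating`), two hypothetical raw survey layouts, and two hypothetical algorithm input formats; none of these names come from the slides. Each raw data set needs one loader into the unified form, and each algorithm needs one exporter out of it.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]
UnifiedSet = List[Record]


def preparation_counts(m: int, n: int) -> Tuple[int, int]:
    """Return (traditional, unified) counts for m algorithms and n raw data sets."""
    return m * n, m + n


# n loaders: each raw layout is converted once into the unified form.
def load_survey_a(raw: List[Dict[str, object]]) -> UnifiedSet:
    """Hypothetical layout A uses the keys 'id' and 'score'."""
    return [{"respondent": r["id"], "rating": int(r["score"])} for r in raw]


def load_survey_b(raw: List[Dict[str, object]]) -> UnifiedSet:
    """Hypothetical layout B uses the keys 'user' and 'stars'."""
    return [{"respondent": r["user"], "rating": int(r["stars"])} for r in raw]


# m exporters: each algorithm reads only the unified form.
def to_rating_vector(unified: UnifiedSet) -> List[int]:
    """Input for a hypothetical numerical algorithm (for example, clustering)."""
    return [rec["rating"] for rec in unified]


def to_transactions(unified: UnifiedSet) -> List[Set[str]]:
    """Input for a hypothetical association-rule miner: one item set per respondent."""
    return [{f"rating={rec['rating']}"} for rec in unified]
```

With 5 algorithms and 10 raw data sets, `preparation_counts(5, 10)` gives 50 pairwise preparations against 15 with the unified model. Adding an eleventh survey layout then costs one new loader rather than five new preparations.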
  11. Sample Survey Data:
  12. Thank You!