An Efficient Data Preprocessing
Method for Mining
Customer Survey Data
Data preprocessing primarily consists of data
attribute selection, data cleaning and missing value
It is well known that over 80% of the time required to
carry out any real-world data mining project is usually
spent on data preprocessing.
Data preprocessing lays the groundwork for data
mining. Before the discovery of useful
information/knowledge, the target data set must be
Without adequate preparation of your data, the return on the
resources invested in mining is certain to be
disappointing. It is well known that the success of every data
mining algorithm depends strongly on the quality of data
In this context, it is natural that data preprocessing can be a
very complicated task. Sometimes data preprocessing takes
more than half of the total time spent on solving the data
mining problem. A number of different tools and
methods are used for preprocessing.
Let's discuss an efficient approach to data preprocessing
for mining Web-based customer survey data, in order to
speed up the data preparation process.
Web Based Customer Survey
The Survey Designer designs and distributes the survey on
The customers are responsible for answering the survey
which will reflect their intention about the products or
The results of the survey are called customer survey data.
The proposed approach is based on a unified data model
derived from analysis of the characteristics of the customer
survey data. The unified data model is used as a standard
representation for the incoming data so that it can be mined.
Analysis of the survey process shows that data inconsistency
between data sets is the main difficulty in data
preprocessing. The proposed solution for mining
Web-based customer survey data is a unified
data model that speeds up data preparation,
based on the characteristics of the data and the process
A unified data model is a standard data set whose
elements are well defined and consistent across all survey
data sets. Based on the unified data model, the data
mining process is seamlessly integrated with the
Market needs are mainly defined by the customer
needs and desires. It has been demonstrated that 60 to
80 percent of the successful technology-based products
have their idea source in the recognition of customer
needs and demands and that the financial return from
market based products tends to be higher.
Customers' ideas for a new product can be acquired through
a survey. The survey data is collected from
customers through survey channels such as the Web
and then stored in the customer survey database. The
data in the database is the raw input to the
data mining tools.
The characteristics of the customer survey data are
summarized as follows:
1) The data sources are a set of survey data
2) The same survey result may be stored in the
database differently;
3) There are some empty and missing data as
some questions may not be answered by the
4) Survey data includes both numerical and
5) The representations of categorical data are
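The characteristics listed above (the same answer stored in different ways, missing answers, and a mix of numerical and categorical fields) can be illustrated with a minimal Python sketch. The field names `satisfied` and `rating`, the spelling table, and the list of missing-value markers are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from typing import Optional

# Hypothetical table mapping raw spellings of an answer to one canonical label.
CANONICAL_ANSWERS = {
    "y": "yes", "yes": "yes", "true": "yes",
    "n": "no", "no": "no", "false": "no",
}

# Hypothetical markers that a survey export might use for an unanswered question.
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def normalize_categorical(raw: Optional[str]) -> Optional[str]:
    """Return one canonical label per answer, or None when it is missing."""
    if raw is None:
        return None
    key = raw.strip().lower()
    if key in MISSING_MARKERS:
        return None
    # Unknown spellings are kept (lower-cased) rather than silently dropped.
    return CANONICAL_ANSWERS.get(key, key)


def normalize_numeric(raw: Optional[str]) -> Optional[float]:
    """Parse a numeric answer, or return None when missing or unparsable."""
    if raw is None:
        return None
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def normalize_response(response: dict) -> dict:
    """Map one raw survey response onto a fixed set of well-defined fields."""
    return {
        "satisfied": normalize_categorical(response.get("satisfied")),
        "rating": normalize_numeric(response.get("rating")),
    }


print(normalize_response({"satisfied": " Yes ", "rating": "4"}))
print(normalize_response({"satisfied": "N/A"}))
```

Missing answers are represented explicitly as `None` here, so a later step can decide how to treat them; how the original method handles missing values is not stated in the slides.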
Data preparation is a significant stage of data mining. It involves
identifying data features, extracting the data, and converting it into
formats that the KDD tools can analyze.
Traditional Data Preprocessing
In general, for any data set, the data preparation process
must be applied separately for each KDD tool.
The raw survey data collected from customers usually can’t
be directly used as input for most data mining tools.
It is required to preprocess such data to generate a
meaningful data set. For each data mining algorithm, the
requirement for the input data set may be different and
therefore the method of data preprocessing is also different.
Typically, for m different data mining algorithms and n
different raw data sets, there are m×n possible data
Unified Data Model:
Instead of preprocessing a raw survey data set
for each data mining algorithm in a traditional
way, we propose a unified data set model.
Using the unified data set as a standard,
the number of data transformations (or
preprocessing steps) can be reduced from m×n in the
conventional approach to m+n.
It saves a lot of time.
It also provides flexibility and adaptability for
data preprocessing for different data mining
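The reduction from m×n to m+n can be sketched in Python as follows. Each raw data set needs one importer into the unified form, and each mining algorithm needs one exporter out of it. The record fields (`age`, `rating`), the two raw formats, and the two target formats are hypothetical examples, not formats described in the slides.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical unified record: every survey response uses the same fields.
UnifiedRecord = Dict[str, float]


# n importers: one per raw data set format, each producing unified records.
def from_pairs(raw: List[tuple]) -> List[UnifiedRecord]:
    return [{"age": float(a), "rating": float(r)} for a, r in raw]


def from_dicts(raw: List[dict]) -> List[UnifiedRecord]:
    return [{"age": float(d["Age"]), "rating": float(d["Score"])} for d in raw]


# m exporters: one per mining algorithm input format, each reading unified records.
def to_matrix(records: List[UnifiedRecord]) -> List[List[float]]:
    return [[r["age"], r["rating"]] for r in records]


def to_transactions(records: List[UnifiedRecord]) -> List[List[str]]:
    return [[f"age={r['age']:.0f}", f"rating={r['rating']:.0f}"] for r in records]


IMPORTERS: Dict[str, Callable] = {"pairs": from_pairs, "dicts": from_dicts}
EXPORTERS: Dict[str, Callable] = {"matrix": to_matrix, "transactions": to_transactions}


def prepare(raw: list, source: str, target: str) -> list:
    """Route any raw format to any target format through the unified form."""
    return EXPORTERS[target](IMPORTERS[source](raw))


def converter_counts(m: int, n: int) -> Tuple[int, int]:
    """Return (direct converters, converters via the unified model)."""
    return m * n, m + n


print(prepare([(30, 4)], "pairs", "matrix"))
print(prepare([{"Age": "30", "Score": "4"}], "dicts", "transactions"))
print(converter_counts(5, 10))
```

With 5 algorithms and 10 raw data sets, this gives 15 converters instead of 50. The saving only appears once both m and n exceed 2; for m = n = 2 the two counts are equal.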