Integrated Intelligent Research (IIR) International Journal of Business Intelligent
Volume: 04 Issue: 01 June 2015, Pages No. 26-29
ISSN: 2278-2400
Machine Learning Approaches and its Challenges
Raj Mohan Kumaravel1, Ilango Paramasivam2
1 Research Scholar, School of Information Technology & Engineering, VIT University, Vellore
2 Professor, School of Computing Science & Engineering, VIT University, Vellore
E-Mail: k.rajmohan90@gmail.com, pilango@vit.ac.in
Abstract – Real-world data sets are often not well formed: they
frequently contain incomplete or missing values. Identifying missing
attributes is a challenging task. Before missing data can be imputed,
the data must be preprocessed. Data preprocessing is the data mining
step that cleanses the data, and handling missing data is a crucial
part of any data mining technique. Major industries and many real-time
applications are seriously concerned about their data, because loss of
data can hamper a company's growth. For example, the health care
industry holds a large amount of data about patients, and diagnosing a
particular patient requires accurate data. If attribute values are
missing, it is very difficult to recover that data. To address the
drawbacks of missing values in the data mining process, many
techniques and algorithms have been implemented, but many of them are
not very efficient. This paper describes the various techniques and
machine learning approaches for handling missing attribute values and
presents a comparative analysis to identify an efficient method.
Keywords— Data mining, data set, impute, missing attributes,
preprocessing
I. INTRODUCTION
Incomplete data is very common in large databases. Technically, when
some attribute values are missing, the database is left in an
inconsistent state. Data preprocessing is an essential process for
addressing missing attribute values. Typically, the missing values can
be replaced using one of many possible approaches. Some knowledge of
the data is needed to determine whether a value is missing or not. [1]
Many real-world applications must take complicated decisions to handle
missing data. For example, in the health care industry, a doctor who
examines a patient has to check the patient's history to predict the
outcome. Beyond health care, many corporate concerns are also worried
about their missing data. There are many approaches and techniques for
handling incomplete data. Missing attribute values bring several
drawbacks, including loss of efficiency, complications in managing and
analyzing the data, and bias resulting from differences between
missing and complete data. [2] To avoid these negative effects on the
analysis performed by data mining algorithms when missing values are
present, different approaches are employed to prepare and cleanse the
data. This is critical because many existing industrial and research
data sets contain missing values. Missing data makes a data set
imperfect, so a preprocessing stage is needed in which the data can be
cleaned. This step improves the extraction process and, in
consequence, the results obtained by any data mining algorithm. The
simplest way of dealing with the problem is to discard the examples
with missing values. Analyzing only the complete examples does not
lead to serious problems during inference when few examples are
affected and the values are missing completely at random. In this
paper, we compare various machine learning approaches for handling
incomplete data.
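The discard strategy mentioned above (often called listwise deletion or complete-case analysis) can be sketched in a few lines of Python. The record layout, the field names age and bp, and the use of None to mark a missing value are assumptions made purely for illustration.

```python
def drop_incomplete(rows):
    """Keep only the rows in which no value is None (complete-case analysis)."""
    return [r for r in rows if all(v is not None for v in r.values())]


# Hypothetical patient records; None marks a missing attribute value.
patients = [
    {"age": 54, "bp": 130},
    {"age": 61, "bp": None},   # blood pressure missing
    {"age": None, "bp": 118},  # age missing
    {"age": 47, "bp": 125},
]

complete = drop_incomplete(patients)
print(len(complete), "of", len(patients), "rows kept")
```

Half of the records are lost in this small example, which is why the imputation techniques discussed later try to fill in the missing values instead of discarding rows.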
Types of missing data
MCAR - (Missing completely at random) Values are said to be MCAR if
the probability that a data item is missing depends on neither the
observed nor the unobserved parameters. The missing values occur
purely at random, which is rare in practice.
MAR – (Missing at random) Values are MAR when the missingness is
related to another, observed variable, but not to the value of the
variable that has missing data. Under MAR, the observed attributes of
the data set determine which values are likely to be missing.
NMAR – (Not missing at random) Data that are missing for a specific
reason: the missingness depends on the missing value itself. This type
is common in practice.
ACTION       MAR                  NMAR
ASSUMPTION   Weaker               Violated
PARAMETER    Partial / distinct   Good
DATA         Information          No information
TEST         Not fit              Not fit
RESULT       Plausible            Sensitive
Table 1.1 Comparisons of MAR and NMAR
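To make the three mechanisms concrete, the following Python sketch deletes values from a toy data set under each of them. The column names age and income, the median cut-off and the deletion probabilities are all hypothetical choices for illustration and are not taken from any of the cited works.

```python
import random


def make_missing(rows, mechanism, rate=0.3, seed=0):
    """Return a copy of rows in which some 'income' values are set to None.

    rows is a list of dicts with numeric 'age' and 'income' entries
    (hypothetical columns used only for this illustration).
    """
    rng = random.Random(seed)
    ages = sorted(r["age"] for r in rows)
    incomes = sorted(r["income"] for r in rows)
    age_cut = ages[len(ages) // 2]
    income_cut = incomes[len(incomes) // 2]
    result = []
    for r in rows:
        r = dict(r)  # copy, so the input rows are left untouched
        if mechanism == "MCAR":
            # Missingness ignores every variable, observed or not.
            p = rate
        elif mechanism == "MAR":
            # Missingness depends only on the observed variable 'age'.
            p = 2 * rate if r["age"] >= age_cut else 0.0
        elif mechanism == "NMAR":
            # Missingness depends on the very value that goes missing.
            p = 2 * rate if r["income"] >= income_cut else 0.0
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        if rng.random() < p:
            r["income"] = None
        result.append(r)
    return result
```

Under MAR the missing incomes can still be predicted from age, whereas under NMAR the observed incomes are systematically too low, which is why results obtained under NMAR are sensitive to the assumptions made.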
II. MACHINE LEARNING TECHNIQUES
Missing data is a major problem in many real-time applications.
Data can go missing for many reasons, including non-response to a
questionnaire. Many new approaches have been proposed and developed
for handling incomplete data. [3] The most basic are ignoring
techniques, which simply omit the cases that contain missing data.
Rather than removing those cases, we can eliminate the missing data by
imputation, that is, by replacing each missing value with an estimated
one. Many imputation methods have been proposed, including regression
and multiple imputation.
A. Regression Imputation: This is a useful imputation technique,
especially as a single imputation method for regression-based
analysis. Here, each missing value is replaced with a predicted value.
The method rests on prediction and on the assumption of a linear
relationship between the attributes. [4] In practice we often cannot
expect the relationship to be exactly linear. We use the completers
(the complete cases) to calculate the regression of an incomplete
variable on the other, complete variables. Regression imputation is
thus a good technique for handling incomplete data. Prediction of this
kind is broadly divided into classification and regression. If the
missing attribute is nominal and is predicted from the complete
attributes, the task is called classification. If the incomplete
attribute is continuous, it is predicted from the complete attributes
by regression. Either way, the algorithm uses the complete attributes
to fill in the missing ones and produce a complete data set.
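A minimal sketch of regression imputation for one continuous attribute is given below. It assumes a single complete predictor x, an incomplete attribute y with None marking the missing entries, and an ordinary least-squares line fitted on the completers. The function names are our own and do not come from a particular library.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_impute(xs, ys):
    """Replace each None in ys with the value predicted from xs."""
    # Fit only on the completers, i.e. the cases where y is observed.
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [y if y is not None else slope * x + intercept
            for x, y in zip(xs, ys)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, None, 8.0, 10.0]
completed = regression_impute(xs, ys)
print(completed)
```

Because every imputed value lies exactly on the fitted line, this single imputation understates the variability of the data, which is one motivation for multiple imputation.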
B. Multiple Imputation: As the name implies, this technique produces
multiple imputed data sets. [5] Each missing value is replaced with a
set of n plausible values drawn from its predictive distribution. Each
of the n completed data sets is analyzed with complete-data methods
and the results are combined into an overall estimate, which avoids
the problems faced in single imputation. The technique relieves the
distortion of the sample variance and produces unbiased estimates, but
the data must meet normal distribution assumptions and storage
requirements increase. Many other techniques have been proposed in
machine learning studies, and many of these algorithms are also
efficient.
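The idea can be sketched as follows, under simplifying assumptions: one complete predictor x, one incomplete attribute y, a linear model with normally distributed residuals, and the mean of y as the quantity of interest. Each of the m imputations adds random noise to the regression prediction, and the m results are pooled with the usual combining rules (average of the estimates, and within-imputation variance plus (1 + 1/m) times the between-imputation variance). A full implementation would also draw the regression parameters afresh for each imputation, which this sketch omits. All function names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def multiple_imputation_mean(xs, ys, m=5, seed=0):
    """Pooled estimate of mean(y) and its total variance over m imputations."""
    rng = random.Random(seed)
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in obs], [y for _, y in obs])
    # Spread of the observed residuals, used as the noise level.
    sigma = statistics.pstdev([y - (slope * x + intercept) for x, y in obs])
    estimates, variances = [], []
    for _ in range(m):
        completed = [
            y if y is not None
            else slope * x + intercept + rng.gauss(0.0, sigma)
            for x, y in zip(xs, ys)
        ]
        estimates.append(statistics.mean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    return pooled, within + (1 + 1 / m) * between
```

The between-imputation term is what single imputation leaves out: it reflects the uncertainty about the imputed values themselves.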
C. K-Nearest Neighbor Imputation (KNNI): This is an instance-based
algorithm: every time a missing value is encountered in an instance,
the value is computed from the instances that are most similar to it.
A proximity measure is therefore required. For nominal attributes,
KNNI fills in the most common value among the k nearest neighbors; for
numeric attributes, it uses their average. In other words, KNNI
imputes a missing attribute from the complete instances in the data
set that are nearest to the incomplete one, and replaces the missing
value with the most probable value among those neighbors. The
technique is not always efficient because it can produce replicated
data: since it substitutes the most likely value present in the data
set, the imputed attribute may be filled with values that are highly
plausible but misleading.
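A small sketch of KNNI is shown below. It assumes that only one attribute (the target) has missing values, that all other attributes are numeric and complete, and that Euclidean distance is an acceptable proximity measure. These are assumptions for illustration, and the function name knn_impute is our own.

```python
import math
from collections import Counter


def knn_impute(rows, target, k=3):
    """Fill None values of `target` from the k nearest complete rows.

    rows is a list of dicts; every attribute other than `target` is
    assumed to be numeric and fully observed.
    """
    features = [c for c in rows[0] if c != target]
    donors = [r for r in rows if r[target] is not None]

    def dist(a, b):
        return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))

    result = []
    for r in rows:
        if r[target] is not None:
            result.append(dict(r))
            continue
        nearest = sorted(donors, key=lambda d: dist(r, d))[:k]
        values = [d[target] for d in nearest]
        if all(isinstance(v, (int, float)) for v in values):
            fill = sum(values) / len(values)             # numeric: average
        else:
            fill = Counter(values).most_common(1)[0][0]  # nominal: mode
        result.append({**r, target: fill})
    return result


rows = [
    {"x": 1, "y": 10}, {"x": 2, "y": 12}, {"x": 3, "y": None},
    {"x": 10, "y": 100}, {"x": 11, "y": 102},
]
print(knn_impute(rows, "y", k=2)[2])
```

With k=2 the missing value is taken from the two closest rows only, so the distant rows with y around 100 do not influence it.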
D. Fuzzy K-Means Clustering: Clustering is the technique of grouping
data into clusters. In fuzzy clustering, each data object has a
membership function that describes the degree to which the object
belongs to a certain cluster. [6] Fuzzy K-Means Clustering is used to
update the membership functions. In this process, a data object is not
assigned to a single concrete cluster represented by a cluster
centroid. Instead, the non-reference (missing) attributes of each
incomplete data object are replaced based on the information about its
membership degrees and the values of the cluster centroids.
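The imputation step can be sketched as follows, assuming the cluster centroids have already been computed. The membership degrees follow the standard fuzzy c-means formula with fuzzifier m, distances are measured on the observed attributes only, and each missing attribute is filled with the membership-weighted average of the centroid values. The function names and the example centroids are hypothetical.

```python
def memberships(point, centroids, m=2.0):
    """Membership degree of `point` in each cluster (degrees sum to 1).

    `point` may contain None; distances use the observed coordinates only.
    """
    observed = [i for i, v in enumerate(point) if v is not None]
    d = [sum((point[i] - c[i]) ** 2 for i in observed) ** 0.5
         for c in centroids]
    if any(x == 0 for x in d):
        # The point coincides with one or more centroids: share the
        # membership equally among those clusters.
        hits = [x == 0 for x in d]
        return [1.0 / sum(hits) if h else 0.0 for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]


def fuzzy_impute(point, centroids, m=2.0):
    """Fill None entries with the membership-weighted centroid values."""
    u = memberships(point, centroids, m)
    return [
        v if v is not None
        else sum(uj * c[i] for uj, c in zip(u, centroids))
        for i, v in enumerate(point)
    ]


centroids = [[0.0, 10.0], [4.0, 20.0]]  # assumed to come from a prior clustering run
print(fuzzy_impute([1.0, None], centroids))
```

The object lies closer to the first centroid, so its membership there is higher and the imputed value is pulled towards that centroid without being forced into a single cluster.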
E. Expectation-Maximization Technique: By estimating the mean and
covariance matrix, we can impute missing data with the technique
called Expectation-Maximization (EM). [7] The steps of EM are as
follows. First, the mean and covariance matrix are computed, and from
them the regression parameters of each variable with missing values on
the other variables. Second, the missing values in each record are
replaced with their expected values, obtained as the product of the
available values and the estimated regression coefficients. Third, the
mean and covariance matrix are re-estimated from the completed data
set, and the change is used to estimate the imputation error. The
second and third steps are repeated until the estimates converge.
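The three steps can be sketched for the simplest case: two variables, where x is fully observed and y has missing entries marked by None. The sketch assumes a bivariate normal model and starts from a mean fill; the names em_impute, max_iter and tol are our own. The variance of y includes the conditional variance of the imputed values, as required by EM.

```python
def em_impute(xs, ys, max_iter=500, tol=1e-10):
    """EM-style imputation of missing ys given fully observed xs."""
    n = len(xs)
    missing = [i for i, y in enumerate(ys) if y is None]
    observed = [y for y in ys if y is not None]
    start = sum(observed) / len(observed)
    filled = [y if y is not None else start for y in ys]
    extra = 0.0  # conditional variance carried by each imputed value
    for _ in range(max_iter):
        # Steps 1 and 3: (re-)estimate the mean and the covariance matrix.
        mx, my = sum(xs) / n, sum(filled) / n
        sxx = sum((x - mx) ** 2 for x in xs) / n
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, filled)) / n
        syy = (sum((y - my) ** 2 for y in filled) + len(missing) * extra) / n
        slope = sxy / sxx
        extra = max(syy - slope * sxy, 0.0)  # residual variance of y given x
        # Step 2: replace the missing values by their expected values.
        change = 0.0
        for i in missing:
            new = my + slope * (xs[i] - mx)
            change = max(change, abs(new - filled[i]))
            filled[i] = new
        if change < tol:  # largest update serves as the error estimate
            break
    return filled, (mx, my), (sxx, sxy, syy)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, None]
filled, mean, cov = em_impute(xs, ys)
print(round(filled[4], 6))
```

In this example the mean fill of 5.0 is refined step by step towards the value that lies on the line through the observed points.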
F. Support Vector Machine: SVMs analyze data and recognize patterns
and are used for classification and regression analysis. Given a set
of training examples, each marked as belonging to one of two
categories, an SVM training algorithm builds a model that assigns new
examples to one category or the other, making it a non-probabilistic
binary linear classifier. Formally, an SVM constructs a hyperplane or
set of hyperplanes in a high- or infinite-dimensional space, which can
be used for various tasks. [8] To keep the computational load
reasonable, the mappings used by SVM schemes are designed to ensure
that dot products may be computed easily in terms of the variables in
the original space, by defining them in terms of a kernel function
K(x, y).
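The role of the kernel function can be illustrated with a small numerical check. For two-dimensional inputs, the degree-2 polynomial kernel K(x, y) = (x . y)^2 gives the same number as an ordinary dot product taken after mapping each input into a three-dimensional feature space. The kernel and the feature map below are a standard textbook pair chosen for illustration; the paper itself does not prescribe a particular kernel.

```python
import math


def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, computed in the original 2-D space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def feature_map(x):
    """Explicit mapping into the 3-D space induced by poly_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, y = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, y), dot(feature_map(x), feature_map(y)))
```

Both expressions agree up to rounding, but the kernel never builds the higher-dimensional vectors, which is what keeps the computational load reasonable.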
G. Outlier detection: Data analysis often involves a large number of
variables that are recorded or sampled. One of the steps towards
obtaining a coherent analysis is the detection of outlying
observations. An outlier is an observation that appears to deviate
markedly from other observations in the sample. An outlier may
indicate wrong data. For example, the data may have been coded
incorrectly or an experiment may not have been run correctly. If it
can be determined that an outlying point is in fact erroneous, then
the outlying value should be deleted from the analysis. In some cases,
it may not be possible to determine whether an outlying point is bad
data. [9] Outliers may be due to random variation or may indicate
something scientifically interesting. Labeling, accommodation and
identification are the three issues in outlier detection. Outlier
detection checks whether the values of the variables fit the data set,
so that erroneous values are not carried into the analysis alongside
the missing attributes.
Labeling flags potential outliers for further investigation, that is,
it marks the unusual values of a variable in the data set.
Accommodation uses robust statistical techniques that will not be
unduly affected by outliers. It applies when we cannot determine
whether the potential outliers are erroneous observations.
Identification formally tests whether observations are outliers. It
can also reveal attributes whose values are effectively missing.
Outlier detection then ignores or eliminates the values and attributes
that do not fit the data set.
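Labeling and accommodation can be sketched with robust statistics from the Python standard library. The labeling rule used here is the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cut-off of 3.5. This rule is one possible choice made for illustration, and the function name and sample readings are hypothetical.

```python
import statistics


def label_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median: nothing can be flagged this way.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


readings = [10, 11, 12, 11, 10, 12, 11, 95]
flags = label_outliers(readings)              # labeling
robust_center = statistics.median(readings)   # accommodation: robust estimate
plain_mean = statistics.mean(readings)        # distorted by the outlier
print(flags, robust_center, plain_mean)
```

Only the last reading is flagged, and the median stays close to the bulk of the data while the mean is pulled far away from it. A flagged value that turns out to be erroneous can then be treated as missing and imputed with one of the techniques above.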
H. Iterative Linear Fitting Method (ILF): This method belongs to the
category of regression-based methods, which substitute the missing
data based on a maximum likelihood function under specific modeling
assumptions. [10] For simplicity, a linear regression model is assumed
for the data set. The method predicts the missing values of each
attribute in turn. An iterative method is a mathematical procedure
that, starting from an initial approximation, generates a sequence of
improving approximate values for a class of problems; if the sequence
approaches a solution, the method is called convergent. Iterative
linear fitting can be a very efficient algorithm for predicting the
values to assign to missing data. In this way, the machine learning
approaches above are used to fill in missing attributes and cleanse
the data.
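The following sketch shows one way such an iteration might look for two attributes that both contain missing values. It is our own simplified reading of the idea, not the exact algorithm of [10]: missing entries start at the attribute mean, each attribute in turn is regressed on the other using the rows where it was originally observed, and the sweeps stop when the largest update falls below a tolerance. The names iterative_linear_fit, max_iter and tol are hypothetical.

```python
def _fit(xs, ys):
    """Least-squares slope and intercept of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def iterative_linear_fit(a, b, max_iter=100, tol=1e-9):
    """Impute None entries of two attributes by alternating linear fits."""
    cols = [list(a), list(b)]
    missing = [[i for i, v in enumerate(c) if v is None] for c in cols]
    # Initial approximation: fill each gap with the attribute mean.
    for col, miss in zip(cols, missing):
        observed = [v for v in col if v is not None]
        mean = sum(observed) / len(observed)
        for i in miss:
            col[i] = mean
    for _ in range(max_iter):
        change = 0.0
        for t in (0, 1):  # predict each attribute in turn
            target, predictor = cols[t], cols[1 - t]
            skip = set(missing[t])
            train = [i for i in range(len(target)) if i not in skip]
            slope, intercept = _fit([predictor[i] for i in train],
                                    [target[i] for i in train])
            for i in missing[t]:
                new = slope * predictor[i] + intercept
                change = max(change, abs(new - target[i]))
                target[i] = new
        if change < tol:
            break
    return cols[0], cols[1]


a = [1.0, 2.0, None, 4.0, 5.0, 6.0]
b = [3.0, 5.0, 7.0, 9.0, None, 13.0]
fa, fb = iterative_linear_fit(a, b)
print(round(fa[2], 4), round(fb[4], 4))
```

In this example the observed values satisfy b = 2a + 1, and the alternating fits move the two imputed values from their mean-fill starting points towards values consistent with that relationship.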
Technique          Variable type   Substitution     Possibility
Regression         Incomplete      Prediction       Yes
Multiple           Incomplete      Estimation       Yes
KNNI               Incomplete      Prediction       Yes
Fuzzy k-means      Probable        Initialization   Yes
E-M                Iterative       Initialization   No
SVM                Incomplete      Prediction       Yes
Outlier detection  Incomplete      Distribution     Yes
ILF                Numerical       Iteration        Yes
Table 2.1 Comparison of machine learning techniques (MLT)
III. REAL WORLD APPLICATIONS
A. Bioinformatics: Bioinformatics has become an important part of
many areas of biology. Techniques such as image and signal processing
allow the extraction of useful results from large amounts of raw data.
Bioinformatics is the interpretation of biological data using
computational tools rather than manual analysis. In medicine-related
industries it generally covers databases, analysis and statistical
algorithms. Biological data include DNA, RNA, proteins, 3D structures,
genomic DNA, metabolic data and so on. Bioinformatics has many
applications, such as molecular medicine, personalized medicine, gene
therapy, drug development, waste cleanup, biotechnology and the study
of antibiotic resistance. Maintaining databases such as protein
sequence databases, secondary databases, protein pattern databases and
structural classification databases is a major challenge. If any of
these databases has missing attributes, data mining algorithms should
be applied to recover the missing data efficiently. Many new machine
learning approaches have been proposed to make the data easier to
maintain. Bioinformatics is a major application area that uses machine
learning approaches for various processes. Many real-world data sets
are available in various repositories, and many sample data sets have
been tested with machine learning techniques in order to recover the
actual attribute values.
B. Database marketing: Database marketing (DBM) is a major trend and
an improved form of direct marketing. DBM is an interactive approach
to marketing that uses individually addressable marketing media and
channels to extend help to a company's target audience, to estimate
their demand and to maintain a database electronically. In marketing,
there are many sources of data, including consumer data, business
data, analytics and modeling. As the name implies, DBM can be used by
any organization that holds as much data about its customers as
possible. Database marketing is an important real-world application
that serves a variety of marketing needs. Companies can use many
techniques to maintain their marketing strategy, and these differ
considerably from one business to another. Many users build elaborate
databases for maintaining customers' information. These may include a
variety of data, including names and addresses. For
business-to-business (B2B) marketers, the customers are themselves
companies, whose records must likewise be maintained in the database.
C. Pattern recognition: Pattern recognition methods are generally
categorized according to the type of learning procedure used to
generate the output value. In supervised learning, a set of training
data is provided, consisting of instances that have been properly
labeled by hand with the correct output. Within medical science,
pattern recognition is the basis for computer-aided diagnosis systems.
Many machine learning algorithms have been proposed, including
clustering, neural networks, regression-based methods and sequence
labeling algorithms, to produce high-quality data without missing
values. In the health care industry, many parameters and data sets are
available. Missing attributes can be identified by finding the
patterns that fit the data set, in the form of cluster groups. These
cluster groups can then be used to identify the various missing
attributes in the data set. Pattern recognition is an important
real-world application, mainly in the medical industry.
D. Robot locomotion: Robots are meant to take the place of human
intervention in a task. Machine learning is applied here especially to
develop the capability of robots to decide autonomously how to act.
Many types of robots have been developed for different human needs,
each of which has to predict how a task is to be completed. How can
machine learning help? These techniques are fed with data, often in
very large amounts. If any of the directions or other relevant values
are missing, this causes serious problems. Machine learning approaches
can therefore be used to make robots work well. Locomotion is simply
the movement the robot has to make to carry out a task. A robot can
perform various movements and actions, which involve many dimensions
and approximated values. Machine learning approaches make this process
somewhat easier. If any of the dimensional values are missing, they
can be predicted using regression and classification. Various
supervised and unsupervised learning methods have been developed and
applied in many real-world applications.
IV. CHALLENGES
As data grows larger, and even though there are many machine learning
approaches and techniques, there may still be some loss of quality.
Missing attributes pose many challenges, which are mainly reflected in
the quality of the data. Many real-world applications work with huge
amounts of data, and any missing data becomes a major concern.
Filling the missing values with an equivalent probable value, simply
eliminating the group of records with missing values, or ignoring the
missing data altogether may each lead to a loss of efficiency. The
missing data should therefore be identified before data preprocessing
begins. Although many new techniques have impressed companies, which
are adopting some of them, these techniques still have drawbacks. The
main challenge in addressing missing attribute values is the loss of
quality, which degrades the data. To make the data more complete, we
predict the missing values and replace them as exactly as possible.
Replacement, however, is not always satisfactory, and other techniques
may be preferable. We have identified the major challenges faced by
many real-time applications and some drawbacks of present machine
learning approaches.
V. CONCLUSION
In this paper we have briefly discussed various techniques for
handling missing attributes. We have discussed various applications
that commonly face missing attribute values, and many machine learning
approaches, since there are many mechanisms for handling missing
attribute values. Many techniques bring added advantages, but many
machine learning approaches also have drawbacks. With this in mind,
researchers are likely to move on to evaluating missing data by manual
calculation using mathematical formulae and by using statistical
software that can recover the data that is missing from the data set.
However, many of the algorithms do not measure up in terms of
efficiency. We can therefore implement new algorithms and techniques
that could eliminate missing attributes completely, and propose new
efficient methods to achieve quality data, because the presence of
missing attributes may leave the database in an inconsistent state. To
avoid this we need to process and clean our data. Cleansing the data
is the most efficient way to eliminate missing attribute values. Each
of the approaches discussed has been proposed so that we can achieve
quality data with no missing attributes.
REFERENCES
[1] Y. S. Su, A. Gelman, J. Hill and M. Yajima, “Multiple Imputations with
Diagnostics (mi) in R: Opening Windows into the Black Box”, Journal of
Statistical Software, 2014.
[2] R. J. A. Little and D. B. Rubin, “Statistical analysis with missing data”,
Wiley, New Jersey, 2013.
[3] A. Misrli, A. Benes and R. Kale, “Artificial based software defect
predictors: Applications and benefits in a case study”, AI Magazine, 2013.
[4] S. Y. Wang and C. C. Lin, “NCTUns 5.0: A Network Simulator for IEEE
802.11(p) and 1609 Wireless Vehicular Network Researches”, Second IEEE
Int. Symp. Wireless Vehicular Communications, Calgary, Canada, 2013.
[5] E. Acar and B. Yener, “Unsupervised multiway data analysis: A
literature survey”, 2012.
[6] E. Acuna and C. Rodriguez, “Classification, clustering and data mining
applications”, Springer, Berlin, pp. 639–648, 2011.
[7] J. Alcalá-Fdez, L. Sánchez, S. Garcia, M. J. D. Jesus, S. Ventura,
J. M. Garrell, J. Otero, J. Bacardit, V. M. Rivas, J. C. Fernandez and
F. Herrera, “KEEL: a software tool to assess evolutionary algorithms for
data mining problems”, Soft Computing 13(3):307–318, 2011.
[8] J. Luengo, S. Garcia and F. Herrera, “A study on the use of imputation
methods for experimentation with Radial Basis Function Network classifiers
handling missing attribute values: the good synergy between RBFNs and
Event Covering method”, Neural Networks 23(3):406–418, 2010.
[9] B. Qin, Y. Xia and S. Prabhakar, “Rule induction for uncertain data”,
Knowledge and Information Systems, doi:10.1007/s10115-010-0335-7,
pp. 1–2, 2010.
[10] H. Wang and S. Wang, “Mining incomplete survey data through
classification”, Knowledge and Information Systems 24(2):221–233, 2010.
[11] C. Peng and J. Zhu, “Comparison of two approaches for handling
missing covariates in logistic regression”, 68(1):58–77, 2008.
[12] A. Farhangfar et al., “A novel framework for imputation of missing
values in databases”, IEEE.

 
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor NetworksAn Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networksijcnes
 
Secured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic VehiclesSecured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic Vehiclesijcnes
 
Virtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor NetworksVirtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor Networksijcnes
 
Mitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 FactorizationMitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 Factorizationijcnes
 
An analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining TechniquesAn analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining Techniquesijcnes
 
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...ijcnes
 
Priority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSNPriority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSNijcnes
 
Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based Systemijcnes
 

Mais de ijcnes (20)

A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...A Survey of Ontology-based Information Extraction for Social Media Content An...
A Survey of Ontology-based Information Extraction for Social Media Content An...
 
Economic Growth of Information Technology (It) Industry on the Indian Economy
Economic Growth of Information Technology (It) Industry on the Indian EconomyEconomic Growth of Information Technology (It) Industry on the Indian Economy
Economic Growth of Information Technology (It) Industry on the Indian Economy
 
An analysis of Mobile Learning Implementation in Shinas College of Technology...
An analysis of Mobile Learning Implementation in Shinas College of Technology...An analysis of Mobile Learning Implementation in Shinas College of Technology...
An analysis of Mobile Learning Implementation in Shinas College of Technology...
 
A Survey on the Security Issues of Software Defined Networking Tool in Cloud ...
A Survey on the Security Issues of Software Defined Networking Tool in Cloud ...A Survey on the Security Issues of Software Defined Networking Tool in Cloud ...
A Survey on the Security Issues of Software Defined Networking Tool in Cloud ...
 
Challenges of E-government in Oman
Challenges of E-government in OmanChallenges of E-government in Oman
Challenges of E-government in Oman
 
Power Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage SystemPower Management in Micro grid Using Hybrid Energy Storage System
Power Management in Micro grid Using Hybrid Energy Storage System
 
Holistic Forecasting of Onset of Diabetes through Data Mining Techniques
Holistic Forecasting of Onset of Diabetes through Data Mining TechniquesHolistic Forecasting of Onset of Diabetes through Data Mining Techniques
Holistic Forecasting of Onset of Diabetes through Data Mining Techniques
 
A Survey on Disease Prediction from Retinal Colour Fundus Images using Image ...
A Survey on Disease Prediction from Retinal Colour Fundus Images using Image ...A Survey on Disease Prediction from Retinal Colour Fundus Images using Image ...
A Survey on Disease Prediction from Retinal Colour Fundus Images using Image ...
 
Feature Extraction in Content based Image Retrieval
Feature Extraction in Content based Image RetrievalFeature Extraction in Content based Image Retrieval
Feature Extraction in Content based Image Retrieval
 
Challenges and Mechanisms for Securing Data in Mobile Cloud Computing
Challenges and Mechanisms for Securing Data in Mobile Cloud ComputingChallenges and Mechanisms for Securing Data in Mobile Cloud Computing
Challenges and Mechanisms for Securing Data in Mobile Cloud Computing
 
Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...
Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...
Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...
 
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
 
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor NetworksAn Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
 
Secured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic VehiclesSecured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic Vehicles
 
Virtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor NetworksVirtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor Networks
 
Mitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 FactorizationMitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 Factorization
 
An analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining TechniquesAn analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining Techniques
 
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
 
Priority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSNPriority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSN
 
Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based System
 

Último

Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...soginsider
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxSCMS School of Architecture
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...Health
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdfKamal Acharya
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdfKamal Acharya
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksMagic Marks
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxSCMS School of Architecture
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 

Último (20)

Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Learn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic MarksLearn the concepts of Thermodynamics on Magic Marks
Learn the concepts of Thermodynamics on Magic Marks
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 

Machine Learning Approaches and its Challenges

Integrated Intelligent Research (IIR), International Journal of Business Intelligent
Volume: 04 Issue: 01 June 2015, Pages No. 26-29
ISSN: 2278-2400

Raj Mohan Kumaravel1, Ilango Paramasivam2
1 Research Scholar, School of Information Technology & Engineering, VIT University, Vellore
2 Professor, School of Computing Science & Engineering, VIT University, Vellore
E-Mail: k.rajmohan90@gmail.com, pilango@vit.ac.in

Abstract – Real-world data sets are rarely well formed; they often contain incomplete or missing values. Identifying missing attributes is a challenging task. To impute the missing data, data preprocessing has to be done. Data preprocessing is a data mining process that cleanses the data. Handling missing data is a crucial part of any data mining technique. Major industries and many real-time applications are greatly concerned about their data, because loss of data slows a company's growth. For example, the health care industry holds a great deal of data about patients. To diagnose a particular patient, we need exact data. If attribute values are missing, it is very difficult to recover the data. Considering the drawbacks of missing values in the data mining process, many techniques and algorithms have been implemented, and many of them are not very efficient. This paper elaborates the various techniques and machine learning approaches for handling missing attribute values and makes a comparative analysis to identify the most efficient method.

Keywords – Data mining, data set, impute, missing attributes, preprocessing

I. INTRODUCTION
Incomplete data are very common in large databases. Technically, missing attribute values leave the database in an inconsistent state. Data preprocessing is an essential process for addressing missing attribute values. Typically, the missing values can be replaced using one of many possible approaches.
Some knowledge is needed to determine whether data are missing. [1] Many real-world applications must make complicated decisions about how to handle missing data. For example, in the health care industry, a doctor examining a patient has to check the patient's history to predict the outcome. Beyond health care, many corporations are also concerned about their missing data, and many approaches and techniques exist for handling incomplete data. Missing attribute values bring several drawbacks, including loss of efficiency, complications in managing and analyzing data, and bias resulting from differences between missing and complete data. [2] To avoid these negative effects on the analysis performed by data mining algorithms, different approaches are employed to prepare and cleanse the data when missing values are present. This is critical because many existing industrial and research data sets contain missing values. Missing data make a data set imperfect, so a preprocessing stage is needed to clean the data completely. This step improves the extraction process and, in consequence, the results obtained by any data mining algorithm. The simplest way of dealing with the problem is to discard the examples with missing values and analyze only the complete examples, provided that this does not lead to serious problems during inference. In this paper, we compare various machine learning approaches for handling incomplete data.

Types of missing data

MCAR (missing completely at random): Values are said to be MCAR if the probability that a data item is missing depends on neither the observable nor the non-observable parameters. Such missingness occurs purely at random and is rare in practice.

MAR (missing at random): This is another type of missing data.
It occurs when the missingness is related to a particular observed variable but not to the value of the variable that has the missing data. Identifying MAR involves deciding which attributes or variables in the data set the missingness depends on.

NMAR (not missing at random): Data are missing for a specific reason, typically one related to the missing value itself. This type is common in practice.

ACTION      MAR                 NMAR
ASSUMPTION  Weaker              Violated
PARAMETER   Partial / distinct  Good
DATA        Information         No information
TEST        Not fit             Not fit
RESULT      Plausible           Sensitive
Table 1.1 Comparisons of MAR and NMAR

II. MACHINE LEARNING TECHNIQUES
Missing data is a major problem in many real-time applications. Missing data can arise in many ways, including non-response to a questionnaire. Many new approaches have been proposed and developed for handling incomplete data. [3] Generally, the simplest treatment is an ignoring technique that omits the cases that contain missing data. Rather than removing those cases, we can impute the missing data by replacing them with accurate values. Many imputation methods have been proposed, including regression imputation and multiple imputation.

A. Regression Imputation: This is a useful imputation technique, especially as a single imputation for regression-based analysis. Here, predicted values replace the missing data wherever they occur. The method rests on prediction and on an assumed linear relationship among the attributes. [4] In most cases, we cannot expect the relationship to be exactly linear. We use the completers (the complete cases) to calculate the regression of any incomplete variable on the other, complete variables. Thus regression imputation is a good technique for handling incomplete data. The approach is broadly divided into classification and regression.
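The regression imputation idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method evaluated in the paper: it assumes a single complete predictor `x`, a single incomplete variable `y` whose missing entries are marked `None`, and a simple least-squares line. The function name `regression_impute` and the toy data are hypothetical.

```python
from statistics import mean

def regression_impute(x, y):
    """Fill None entries of y from a least-squares line fitted on the complete cases."""
    # Complete cases ("completers"): pairs where y is observed.
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(complete) < 2:
        raise ValueError("need at least two complete cases to fit a line")
    x_bar = mean(xi for xi, _ in complete)
    y_bar = mean(yi for _, yi in complete)
    sxx = sum((xi - x_bar) ** 2 for xi, _ in complete)
    if sxx == 0:
        raise ValueError("x has no variation among the complete cases")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in complete)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Keep observed values; replace missing ones with the regression prediction.
    return [yi if yi is not None else intercept + slope * xi
            for xi, yi in zip(x, y)]

# Hypothetical toy data: y follows 2x + 1, with two values missing.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, None, 9.0, None]
filled = regression_impute(x, y)
```

Because every imputed value lies exactly on the fitted line, a single regression imputation of this kind understates the natural scatter of the data, which is one motivation for the multiple imputation approach.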
Addressing the missing attributes by using the complete attributes is called classification. For continuous incomplete data, we have to use all of the attributes to complete the data set. The chosen algorithm therefore mainly helps us fill in the attributes so that the output is based on complete, usable data.

B. Multiple Imputation: As the name implies, this method produces multiple imputed data sets. [5] Each missing value is replaced with a set of n plausible values drawn from its predictive distribution. Each completed data set is analyzed with complete-data methods and the results are combined into an overall estimate, which avoids the problems faced in single imputation. This technique relieves the distortion of the sample variance and produces unbiased estimates, but the data must meet normal distribution assumptions, and the method has additional storage requirements. Many other techniques have been proposed in machine learning studies, and many of these algorithms are efficient.

C. K-Nearest Neighbor Imputation (KNNI): This is an instance-based algorithm: every record in which missing data occur is treated as an instance. The imputation computes a replacement value from the k nearest neighbors of that instance. For nominal attributes, KNNI fills in the most common value among the nearest neighbors. Therefore, a proximity measure has to be chosen. KNNI fills the missing attribute of a record using the complete, nearest records in the data set. After the neighbors are selected, the missing attribute is replaced with the most probable value among them. This technique is not very efficient, because it can produce replicated data: it substitutes the most likely value present in the data set, so the imputed attribute may be misleading even though the value is highly plausible.

D. Fuzzy K-Means Clustering: Clustering is the technique of grouping data into clusters. In fuzzy clustering, each data object has a membership function that describes the degree to which the object belongs to a certain cluster. [6] Fuzzy K-means clustering updates these membership functions. In this process, a data object cannot be assigned to a single concrete cluster represented by a cluster centroid. Instead, the non-reference attributes of each incomplete data object are replaced based on the information about its membership degrees.

E. Expectation-Maximization Technique: The Expectation-Maximization (EM) technique imputes missing data by estimating the mean and covariance matrix. [7] The steps of EM are as follows. First, the mean and covariance matrix are computed, and for each record with missing values the regression parameters of the variables with missing values on the variables with available values are derived from them. Second, each missing value in a record is replaced with its expected value, which is the product of the available values and the estimated regression coefficients. Third, the mean and covariance matrix are re-estimated: the mean as the sample mean of the completed data set, and the covariance with an estimate of the imputation error taken into account.

F. Support Vector Machine: Support vector machines (SVMs) analyze data and recognize patterns, and are used for classification and regression analysis. Given a set of training examples, each labeled with one of two categories, a training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. Formally, an SVM constructs a hyperplane, or a set of hyperplanes, in a high- or infinite-dimensional space, which can be used for various tasks such as classification and regression.
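As a minimal illustration of the separating hyperplane mentioned above, the sketch below classifies points by the sign of w·x + b and reports their distance to the hyperplane. It assumes the hyperplane is already given: the weight vector `w`, the offset `b`, and all function names are hypothetical values chosen by hand for illustration, not the result of SVM training.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Assign x to class +1 or -1 according to its side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane x1 - x2 = 0 in two dimensions.
w, b = [1.0, -1.0], 0.0
label_a = classify(w, b, [2.0, 1.0])   # lies on the positive side
label_b = classify(w, b, [1.0, 3.0])   # lies on the negative side
gap = distance_to_hyperplane(w, b, [3.0, 0.0])
```

Training an SVM amounts to choosing `w` and `b` so that the smallest such distance over the training examples (the margin) is as large as possible.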
[8] To keep the computational load reasonable, the mappings used by SVM schemes are designed so that dot products can be computed easily in terms of the variables in the original space, by defining them in terms of a kernel function K(x, y).

G. Outlier detection: Data analysis often involves a large number of variables being recorded or sampled. One of the first steps toward a coherent analysis is the detection of outlying observations. An outlier is an observation that appears to deviate markedly from the other observations in the sample. An outlier may indicate bad data. For example, the data may have been coded incorrectly or an experiment may not have been run correctly. If it can be determined that an outlying point is in fact erroneous, then the outlying value should be deleted from the analysis. In some cases, it may not be possible to determine whether an outlying point is bad data. [9] Outliers may be due to random variation or may indicate something scientifically interesting. Labeling, accommodation and identification are the three issues in outlier detection. Outlier detection checks whether the variables fit the data set, which helps to avoid missing attributes. Labeling flags potential outliers for further investigation; that is, it identifies unusual values in the data set. Accommodation uses robust statistical techniques that will not be unduly affected by outliers, for cases where we cannot determine whether the potential outliers are erroneous observations. Identification formally tests whether observations are outliers. This can identify the attribute whose value is actually missing. Outlier detection thus ignores or eliminates the values and attributes that do not fit the data set.

H. Iterative Linear Fitting Method (ILF): This method belongs to the category of regression-based methods, which substitute the missing data based on a maximum likelihood function under specific modeling assumptions. [10] A linear regression model is assumed for the data set for simplicity. The method predicts the missing values of each attribute in turn. An iterative method is a mathematical procedure that generates a sequence of approximate solutions for a class of problems, starting from an initial approximation; the method is called convergent if this sequence converges. The iterative linear fitting method is an efficient algorithm for predicting the values to assign to missing data. These are the machine learning approaches used to avoid missing attributes and to cleanse the data.

Technique          Variable type  Substitution    Possibility
Regression         Incomplete     Prediction      Yes
Multiple           Incomplete     Estimation      Yes
KNNI               Incomplete     Prediction      Yes
Fuzzy K-means      Probable       Initialization  Yes
E-M                Iterative      Initialization  No
SVM                Incomplete     Prediction      Yes
Outlier detection  Incomplete     Distribution    Yes
ILF                Numerical      Iteration       Yes
Table 2.1 Comparisons of MLT

III. REAL WORLD APPLICATIONS
A. Bioinformatics: Bioinformatics has become an important part of many areas of biology. Techniques such as image and signal processing allow the extraction of useful results from large amounts of raw data. The interpretation of biological data using computational tools is called bioinformatics. In the medical industry, it generally includes databases, analysis and statistical algorithms. Biological data include DNA, RNA, protein, 3D structure, genomic DNA, metabolic data, etc. Bioinformatics has many applications, such as molecular medicine, personalized medicine, gene therapy, drug development, waste cleanup, biotechnology and antibiotic resistance. Maintaining databases such as protein sequence databases, secondary databases, protein pattern databases and structural classification databases is a major challenge. If any of these databases has missing attributes, data mining algorithms should be applied to recover the missing data efficiently. Many new machine learning approaches have been proposed to make the data easier to maintain. Bioinformatics is a major application of machine learning approaches. Many real data sets are available in various repositories, and many sample data sets have been tested with machine learning techniques in order to recover the actual attribute values in the data sets.

B. Database marketing: Database marketing (DBM) is a major trend and an improved form of direct marketing. DBM is an interactive approach to marketing that uses individually addressable marketing media and channels.
It is used to extend help to a company's target audience, to estimate their demand, and to maintain the database electronically. Marketing draws on many sources of data, including consumer data, business data, analytics, and modeling. As the name implies, DBM can be used by any organization that has data available about its customers. Database marketing is an important real world application that serves various marketing needs. Companies use many techniques to maintain their marketing strategy, and these differ considerably from one corporate setting to another. Many users build elaborate databases to maintain customer information, which may include a variety of data such as name and address. Business-to-business (B2B) marketers, whose customers are other companies, can maintain such databases as well.

C. Pattern recognition: Pattern recognition methods are generally categorized according to the type of learning procedure used to generate the output value. In the supervised case, a set of training data is provided, consisting of instances that have been properly labeled by hand with the correct output. Within medical science, pattern recognition is the basis for computer-aided diagnosis systems. Many machine learning algorithms have been proposed, including clustering, neural networks, regression-based methods, and sequence labeling algorithms, to produce high-quality data without missing values. In the health care industry, various parameters and data sets are available. Missing attributes can be identified by considering the patterns that fit the data set, in the form of cluster groups; these cluster groups can then be used to identify the missing attribute values. Pattern recognition is an important real world application, mainly in the medical industry.

D. Robot locomotion: The word robot suggests replacing human intervention in a task.
Robots were developed especially so that they can decide autonomously how to proceed. Many types of robots have been developed for human needs, and each depends on predicting how a task is to be completed. How can machine learning help? These techniques are fed with data, often in very large amounts. If any direction or other relevant value is missing, it causes a serious problem, so new machine learning approaches can be used to make robots work well. Locomotion is simply the movement a robot must make to perform a task. Robots can carry out various movements and actions, which involve many dimensions and approximated values. Machine learning makes this process somewhat easier: if any dimensional value is missing, it can be predicted using regression or classification. Various supervised and unsupervised learning techniques have been developed for many real world applications.

IV. CHALLENGES
As data grows larger, some loss of quality remains even though many machine learning approaches and techniques exist. Many challenges arise from missing attributes, and they are mainly reflected in the quality of the data. Many real world applications work with huge amounts of data, and any missing data becomes a major concern. Filling missing values with an equivalent probable value, simply eliminating the records with missing values, or ignoring the missing data may all lead to a loss of efficiency. The data should therefore be identified as missing before data preprocessing begins. Although many new techniques have impressed companies, which are adopting some of them, these techniques still have drawbacks.
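The trade-off between eliminating incomplete records and filling them with a probable value can be illustrated with a minimal Python sketch. The toy records, their column meanings, and the helper names listwise_delete and mean_impute are hypothetical and are not taken from any data set or method evaluated in this paper; mean imputation is used here only as one simple stand-in for a "probable value".

```python
from statistics import mean, pvariance

# Hypothetical toy records: (age, systolic blood pressure).
# None marks a missing attribute value.
records = [
    (25, 118.0),
    (32, None),
    (41, 127.0),
    (50, None),
    (58, 139.0),
    (63, 142.0),
]


def listwise_delete(rows):
    """Drop every record that has at least one missing value."""
    return [r for r in rows if None not in r]


def mean_impute(rows, col):
    """Fill missing entries of column `col` with the mean of its observed entries."""
    observed = [r[col] for r in rows if r[col] is not None]
    fill = mean(observed)
    return [
        tuple(fill if (i == col and v is None) else v for i, v in enumerate(r))
        for r in rows
    ]


complete = listwise_delete(records)
filled = mean_impute(records, col=1)

# Deletion keeps only the fully observed records.
print("records kept after deletion:", len(complete), "of", len(records))

# Mean filling keeps every record but pulls the column toward its mean.
print("variance, observed values only:", pvariance([r[1] for r in complete]))
print("variance, after mean filling:  ", pvariance([r[1] for r in filled]))
```

In this sketch deletion discards a third of the records, while mean filling keeps all of them but reduces the spread of the filled column. Both effects are instances of the loss of quality discussed in this section.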
The main challenge in addressing missing attribute values is the loss of quality, which degrades the data. To make the data more reliable, we predict the missing values and replace them as accurately as possible. Replacement itself is not always satisfactory, and other techniques may be preferable. We have identified major challenges faced by many real world applications, as well as some drawbacks of present machine learning approaches.

V. CONCLUSION
In this paper we have briefly discussed various techniques for handling missing attributes, applications that broadly face missing attribute values, and many machine learning approaches, because there are many mechanisms to handle missing attribute values. Many techniques bring added advantages, yet many machine learning approaches still have some drawbacks. Considering this, researchers tend to move on to evaluating the missing data by manual calculation with mathematical formulae and with statistical software that can recover the values actually missing in the data set. However, many of the algorithms do not achieve the required efficiency on the data. New algorithms and techniques could therefore be implemented to eradicate missing attributes completely, and new efficient methods could be proposed to achieve quality data, because the presence of missing attributes may lead the database into an inconsistent state. To avoid this, the data must be processed and cleaned accordingly. Cleansing the data is the most efficient way to eradicate missing attribute values. Consequently, every approach has been proposed so that quality data with no missing attributes can be achieved.

REFERENCES
[1] Y. S. Su, A. Gelman, J. Hill and M. Yajima, "Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box," Journal of Statistical Software, 2014.
[2] R. J. A. Little and D. B. Rubin, "Statistical Analysis with Missing Data," Wiley, New Jersey, 2013.
[3] A. Misrli, A. Benes and R. Kale, "Artificial based software defect predictors: Applications and benefits in a case study," AI Magazine, 2013.
[4] S. Y. Wang and C. C. Lin, "NCTUns 5.0: A Network Simulator for IEEE 802.11(p) and 1609 Wireless Vehicular Network Researches," Second IEEE Int. Symp. Wireless Vehicular Communications, Calgary, Canada, 2013.
[5] E. Acar and B. Yener, "Unsupervised multiway data analysis: A literature survey," 2012.
[6] E. Acuna and C. Rodriguez, "Classification, clustering and data mining applications," Springer, Berlin, pp. 639–648, 2011.
[7] J. Alcalá-Fdez, L. Sánchez, S. Garcia, M. J. D. Jesus, S. Ventura, J. M. Garrell, J. Otero, J. Bacardit, V. M. Rivas, J. C. Fernandez and F. Herrera, "KEEL: a software tool to assess evolutionary algorithms for data mining problems," Soft Computing 13(3):307–318, 2011.
[8] J. Luengo, S. Garcia and F. Herrera, "A study on the use of imputation methods for experimentation with Radial Basis Function Network classifiers handling missing attribute values: the good synergy between RBFNs and Event Covering method," Neural Networks 23(3):406–418, 2010.
[9] B. Qin, Y. Xia and S. Prabhakar, "Rule induction for uncertain data," Knowledge and Information Systems, doi:10.1007/s10115-010-0335-7, pp. 1–2, 2010.
[10] H. Wang and S. Wang, "Mining incomplete survey data through classification," Knowledge and Information Systems 24(2):221–233, 2010.
[11] C. Peng and J. Zhu, "Comparison of two approaches for handling missing covariates in logistic regression," 68(1):58–77, 2008.
[12] A. Farhangfar, et al., "A novel framework for imputation of missing values in databases," IEEE