2. PROBLEM STATEMENT
• The most commonly used method by which physicians diagnose coronary
artery disease (CAD) is angiography.
• However, it has major side effects and a high associated cost.
• Moreover, having to analyze many factors to diagnose a patient makes
the physician’s job difficult.
• Conventional methods for diagnosing heart disease are mainly based on
analysis of the patient’s medical history, review of relevant symptoms
by a medical practitioner, and physical examination reports.
3. • Hence, these methods often lead to imprecise diagnoses due to
human error.
• Thus, there is a need to develop an automated, machine-learning-based
diagnostic system for heart disease that can resolve these problems.
• A dataset was formed from selected information about 779 individuals.
The problem is: based on the given information about each individual,
predict whether that individual will suffer from heart disease.
• For the implementation, we wrote code that applies several machine
learning algorithms to this problem and gives a general comparison of
the performance of the different algorithms in this case.
5. • Healthcare is an enormous domain, and data science is necessary for it to
achieve meaningful transformations. The objective of this study is to use data
science in the most efficient and powerful way to discover hidden correlations
among risk factors.
• The aim is to analyze coronary artery disease datasets and predict the
likelihood that a given patient has heart disease. This study also analyzes the
effect of each attribute on the heart disease outcome.
• The machine learning algorithms used for analysis were Logistic Regression,
Support Vector Machines (SVM), and Random Forest. The models' features
were selected using an ensemble of feature selection methods: Stepwise
Regression, Variable Importance, Boruta, and Recursive Feature Elimination.
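Of the feature selection methods listed, Recursive Feature Elimination is the simplest to illustrate. The sketch below is a minimal example using scikit-learn's RFE with logistic regression; the synthetic data (make_classification with 13 attributes) and the choice of keeping 5 features are assumptions standing in for the real dataset, not the study's actual pipeline. Boruta is distributed as a separate package and is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the heart disease data: 13 attributes, binary label.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)

# Repeatedly fit the model and drop the feature with the smallest
# coefficient magnitude until only 5 features remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
selector.fit(X, y)

selected = np.flatnonzero(selector.support_)
print("selected feature indices:", selected)
print("ranking (1 = kept):", selector.ranking_)
```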
6. • These models were evaluated using cross-validation to find the best models for
predicting heart disease. The features in the dataset were also evaluated
using statistical tests: chi-square tests and ANOVA.
• This study's goal is to find the most significant patient features and the
most accurate machine learning algorithm, with optimized and tuned settings,
for predicting heart disease.
• This report includes all the necessary visualizations, descriptions,
comments, and results. It concludes with the significance of this study
in helping to combat heart disease.
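A minimal sketch of the evaluation steps, assuming SciPy and scikit-learn: a chi-square test of independence for a categorical attribute, a one-way ANOVA for a continuous attribute, and 5-fold stratified cross-validation of a classifier. The data is synthetic and hypothetical; the attribute names sex and age are only illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency, f_oneway
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
n = 300
# Hypothetical synthetic attributes: one categorical (sex), one continuous (age).
sex = rng.integers(0, 2, size=n)
age = rng.normal(55, 9, size=n)
# The label depends on both attributes so the tests have something to detect.
logit = -6.0 + 0.1 * age + 0.8 * sex
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Chi-square test of independence: 2x2 table of sex vs. label counts.
table = np.array([[np.sum((sex == s) & (y == c)) for c in (0, 1)] for s in (0, 1)])
chi2_stat, chi2_p, dof, _ = chi2_contingency(table)

# One-way ANOVA: does mean age differ between the two classes?
f_stat, anova_p = f_oneway(age[y == 0], age[y == 1])

# 5-fold stratified cross-validation of a classifier on both attributes.
X = np.column_stack([sex, age])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(f"chi-square p = {chi2_p:.3g}, ANOVA p = {anova_p:.3g}")
print(f"CV accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")
```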
8. Efficient heart disease prediction system using decision tree -
Purushottam, Kanak Saxena, Richa Sharma.
• In this paper, effective mechanisms are used for chronic disease
prediction by mining data containing historical health records.
• The authors used Naïve Bayes, Decision Tree, Support Vector
Machine (SVM) and Artificial Neural Network (ANN) classifiers
for the diagnosis of diabetes and heart disease.
9. An Automated Diagnostic System for Heart Disease Prediction Based on χ²
Statistical Model and Optimally Configured Deep Neural Network -
Liaqat Ali, Atiqur Rahman, Aurangzeb Khan, et al.
• To eliminate irrelevant features, the authors use a χ² statistical
model, while the optimally configured deep neural network (DNN)
is found using an exhaustive search strategy.
• The proposed model achieves a prediction accuracy of 93.33%.
The obtained results are promising compared to previously
reported methods. The findings of the study suggest that the
proposed diagnostic system can be used by physicians to
accurately predict heart disease.
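The sketch below approximates this idea with scikit-learn under stated assumptions: χ² scores (scikit-learn's chi2, which expects non-negative inputs, so features are min-max scaled first) pick the top k features, and an exhaustive grid search tries every combination of k and network shape. The synthetic data and the grid values are hypothetical; this is not the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 13 attributes, binary label.
X, y = make_classification(n_samples=200, n_features=13, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    ("scale", MinMaxScaler()),         # chi2 requires non-negative features
    ("select", SelectKBest(chi2)),     # keep the k highest-scoring features
    ("dnn", MLPClassifier(solver="lbfgs", max_iter=2000, random_state=0)),
])

# Exhaustive search: every (k, network shape) pair is cross-validated.
param_grid = {
    "select__k": [4, 8, 13],
    "dnn__hidden_layer_sizes": [(8,), (16,), (8, 8), (16, 8)],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```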
10. An Optimized Stacked Support Vector Machines Based Expert System for
the Effective Prediction of Heart Failure -
Liaqat Ali, Awais Niamat, Javed Ali Khan, et al.
• In this paper, the authors introduce an expert system that stacks two
support vector machine (SVM) models for the effective prediction of
heart failure (HF). The first SVM model is linear and L1-regularized;
it can eliminate irrelevant features by shrinking their coefficients
to zero. The second SVM model is L2-regularized and is used as the
predictive model. To optimize the two models, the authors propose a
hybrid grid search algorithm (HGSA) that optimizes both models
simultaneously.
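A rough scikit-learn sketch of the stacking idea, assuming synthetic data: an L1-regularized LinearSVC inside SelectFromModel drops features whose weights shrink to zero, and an RBF SVC (whose objective uses an L2 penalty on the weights) makes the prediction. A plain GridSearchCV over both stages' parameters stands in for HGSA, which is the authors' own algorithm and is not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)

stacked = Pipeline([
    ("scale", StandardScaler()),
    # Stage 1: linear L1-regularized SVM; zero-weight features are dropped.
    ("l1_svm", SelectFromModel(LinearSVC(penalty="l1", dual=False, max_iter=5000))),
    # Stage 2: L2-regularized SVM used as the predictive model.
    ("l2_svm", SVC(kernel="rbf")),
])

# Both stages are searched jointly, so each candidate feature subset is
# scored by the classifier that will actually use it.
grid = {
    "l1_svm__estimator__C": [0.1, 1.0],
    "l2_svm__C": [0.1, 1.0, 10.0],
    "l2_svm__gamma": ["scale", 0.01],
}
search = GridSearchCV(stacked, grid, cv=5).fit(X, y)
kept = search.best_estimator_.named_steps["l1_svm"].get_support().sum()
print(search.best_params_, f"features kept: {kept}")
```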
11. An Intelligent Learning System Based on Random Search Algorithm and
Optimized Random Forest Model for Improved Heart Disease Detection -
Ashir Javeed, Shijie Zhou, Liao Yongjian
• The system uses a random search algorithm (RSA) for feature
selection and a random forest model for heart failure prediction.
The proposed diagnostic system is optimized using a grid search
algorithm. Two types of experiments are performed to evaluate
the precision of the proposed method.
• In the first experiment, only a random forest model is developed,
while in the second experiment the proposed RSA-based random
forest model is developed.
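A minimal sketch of RSA-style feature selection, assuming it samples random feature subsets and keeps the one with the best cross-validated random forest score, followed by a grid search that tunes the forest on the chosen features. The number of iterations, subset sizes, grid values and synthetic data are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
rng = np.random.default_rng(0)

# Random search over feature subsets: score each sampled subset with a
# cross-validated random forest and keep the best-scoring one.
best_subset, best_score = None, -np.inf
for _ in range(15):
    size = rng.integers(3, X.shape[1] + 1)               # subset size in [3, 13]
    subset = np.sort(rng.choice(X.shape[1], size=size, replace=False))
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    score = cross_val_score(rf, X[:, subset], y, cv=3).mean()
    if score > best_score:
        best_subset, best_score = subset, score

# Grid search tunes the random forest on the selected features.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=3,
).fit(X[:, best_subset], y)
print("features:", best_subset, "tuned params:", grid.best_params_)
```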
12. Machine Learning and End-to-End Deep Learning for the Detection of
Chronic Heart Failure From Heart Sounds -
Martin Gjoreski, Anton Gradišek, et al.
• The method was evaluated on recordings from 947 subjects from six
publicly available datasets and one CHF dataset that was collected
for this study. Using the same evaluation method as a recent PhysioNet
challenge, the proposed method achieved a score of 89.3, which is 9.1
higher than the challenge's baseline method. The method's aggregated
accuracy is 92.9%.
• Finally, the authors identified 15 expert features that are useful for
building ML models to differentiate between CHF phases with an accuracy
of 93.2%.
14. Many disease prediction systems do not use some of the key risk factors,
such as age, sex, blood pressure, cholesterol and diabetes. Without these
vital risk factors, the results will not be very accurate. In this paper,
12 important risk factors are used to predict heart disease accurately.
The dataset is imported from the UCI Machine Learning Repository.
The technique described in this paper uses a genetic algorithm to optimize
the weights of the neural network. It operates on a population of
individuals, each represented as an input string. First, each string is
selected and assigned a fitness value. Based on these fitness values, new
offspring are generated. The crossover process then produces strings that
are likely to be fitter, yielding optimized weights. The string generated
at each stage is likely to be better than the previous one. This is how
the weights are optimized at each stage of the genetic process.
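A minimal NumPy sketch of a genetic algorithm that evolves the weights of a small one-hidden-layer network. The assumptions (not from the paper) are a toy dataset, fitness = 1 / (1 + MSE), fitness-proportional (roulette-wheel) selection, single-point crossover, Gaussian mutation, and elitism so the best string is never lost.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 4 attributes, binary label.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

N_IN, N_HID = X.shape[1], 5
SPLITS = [N_IN * N_HID, N_IN * N_HID + N_HID, N_IN * N_HID + 2 * N_HID]
N_GENES = SPLITS[-1] + 1  # all weights and biases flattened into one string

def forward(w, X):
    """Decode a weight string into a 1-hidden-layer network and run it."""
    W1, b1, W2, b2 = np.split(w, SPLITS)
    h = np.tanh(X @ W1.reshape(N_IN, N_HID) + b1)
    z = np.clip(h @ W2 + b2[0], -50, 50)
    return 1.0 / (1.0 + np.exp(-z))

def fitness(w):
    """Higher is fitter: 1 / (1 + mean squared error)."""
    return 1.0 / (1.0 + np.mean((forward(w, X) - y) ** 2))

POP, GENERATIONS = 40, 100
pop = rng.normal(size=(POP, N_GENES))
for _ in range(GENERATIONS):
    scores = np.array([fitness(w) for w in pop])
    # Selection: parents drawn with probability proportional to fitness.
    parents = pop[rng.choice(POP, size=(POP, 2), p=scores / scores.sum())]
    # Crossover: genes before a random cut come from parent 1, the rest from parent 2.
    cut = rng.integers(1, N_GENES, size=POP)
    children = np.where(np.arange(N_GENES) < cut[:, None], parents[:, 0], parents[:, 1])
    # Mutation: Gaussian noise on roughly 10% of the genes.
    children += (rng.random(children.shape) < 0.1) * rng.normal(scale=0.3, size=children.shape)
    # Elitism: the best string of this generation survives unchanged.
    children[0] = pop[np.argmax(scores)]
    pop = children

best = max(pop, key=fitness)
accuracy = np.mean((forward(best, X) > 0.5) == y)
print(f"best fitness {fitness(best):.3f}, training accuracy {accuracy:.2f}")
```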
15. After the weights are optimized, they are fed into a neural network that
uses the backpropagation technique to train the network. The neural network
computes an activation function at the hidden layer and at the output layer.
The output obtained at the output layer is compared with the desired (target)
output to calculate the error. Based on this error, new weights are computed
and fed back into the neural network. This process continues until the
error function reaches its minimum.
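The backpropagation loop described above can be sketched in NumPy as follows, assuming a single hidden layer of sigmoid units, mean squared error, full-batch gradient descent and a hypothetical toy dataset. Here the weights start random; in the described system they would come from the genetic algorithm instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data: 200 samples, 4 attributes, binary target as a column.
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.5, size=(4, 6))
b1 = np.zeros(6)
W2 = rng.normal(scale=0.5, size=(6, 1))
b2 = np.zeros(1)
learning_rate, target_error = 0.5, 0.01
errors = []

for epoch in range(5000):
    # Forward pass: activations at the hidden and output layers.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    diff = out - y                           # network output vs. desired output
    errors.append(float(np.mean(diff ** 2)))
    if errors[-1] < target_error:            # stop once the error is small enough
        break
    # Backward pass: the chain rule gives the MSE gradient for every weight.
    d_out = 2 * diff * out * (1 - out) / len(X)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= learning_rate * h.T @ d_out
    b2 -= learning_rate * d_out.sum(axis=0)
    W1 -= learning_rate * X.T @ d_h
    b1 -= learning_rate * d_h.sum(axis=0)

print(f"epochs: {len(errors)}, MSE: {errors[0]:.3f} -> {errors[-1]:.3f}")
```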
17. • We train our prediction model on existing data in which we already know
whether each patient has heart disease. This process is known as
supervised learning.
• The trained model is then used to predict whether users suffer from heart
disease. The training and prediction process is as follows:
• First, the data is divided into two parts using a data-splitting component.
In this experiment, the data is split in an 80:20 ratio into the training set
and the prediction set.
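An 80:20 split can be sketched with scikit-learn's train_test_split. The data here is random and hypothetical, and stratify=y (an addition, not stated in the original) keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))        # hypothetical: 100 patients, 13 attributes
y = rng.integers(0, 2, size=100)      # hypothetical labels: 1 = heart disease

# test_size=0.2 gives the 80:20 training/prediction split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)    # (80, 13) (20, 13)
```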
18. • The training set is used for model training (for example, in the logistic
regression component), while the prediction set is used in the prediction
component.
• The following classification models are used: Logistic Regression,
Random Forest Classifier, SVM, Naive Bayes Classifier, Decision Tree
Classifier, LightGBM and XGBoost.
• The two inputs of the prediction component are the trained model and the
prediction set. The prediction result shows the predicted label, the actual
label, and the probability of each outcome for every record.
• The confusion matrix, also known as the error matrix, is used to
evaluate the accuracy of the model.
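A sketch of this training and prediction workflow for the models available in scikit-learn, on synthetic data (an assumption): each model is fitted on the training set, scored on both sets, and its confusion matrix on the prediction set is printed. LightGBM and XGBoost are separate packages whose LGBMClassifier and XGBClassifier follow the same fit/predict interface, so they could be added to the dictionary; they are omitted to keep the sketch self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart disease data: 13 attributes, binary label.
X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

accuracy, confusion = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # training component
    y_pred = model.predict(X_test)              # prediction component
    accuracy[name] = (accuracy_score(y_train, model.predict(X_train)),
                      accuracy_score(y_test, y_pred))
    # Rows are actual classes, columns predicted classes: [[TN, FP], [FN, TP]].
    confusion[name] = confusion_matrix(y_test, y_pred)
    print(f"{name}: train={accuracy[name][0]:.3f} test={accuracy[name][1]:.3f}")
    print(confusion[name])
```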
21. Preprocessing (Input data)
• Preprocessing is a significant stage in the
knowledge discovery process. Real-world data
tends to be noisy and inconsistent, and data
preprocessing techniques such as data cleaning
help overcome these drawbacks. Normalizing the
dataset puts all attributes on a comparable
scale, which prepares the data for classification
and lets the algorithms run smoothly and
efficiently. Normalization is carried out with a
normalize function. Then a variable, ‘num’, is
created to hold the predicted attribute.
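A minimal NumPy sketch of min-max normalization and of deriving a yes/no target from num. In the UCI heart disease data, num ranges from 0 (no disease) to 4, and treating any value above 0 as "disease present" is a common convention; whether the original system binarizes it this way is an assumption. The helper names min_max and normalize and the three data rows are hypothetical.

```python
import numpy as np

def min_max(X):
    """Return the per-attribute [min, max] matrix (one row per attribute)."""
    return np.column_stack([X.min(axis=0), X.max(axis=0)])

def normalize(X):
    """Rescale every column of X to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (X - lo) / span

# Hypothetical rows: age, resting blood pressure, cholesterol.
X = np.array([[63, 145, 233],
              [37, 130, 250],
              [41, 130, 204]], dtype=float)
num = np.array([0, 2, 0])        # UCI-style 'num' column (0 = no disease)
y = (num > 0).astype(int)        # binarized target for yes/no prediction

print(min_max(X))
print(normalize(X))
print(y)                         # [0 1 0]
```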
22. Training the model
• In the training part, the backpropagation algorithm described above is implemented. Backpropagation helps find a better set of weights in a short amount of time. Training is based on
the dataset input to the system. Here, a ‘min max’ function is used to obtain a matrix of the minimum and maximum values of each input attribute, and this matrix is used when training the
network. The system's performance can be improved by adjusting how many times the model is trained, the number of iterations, and similar settings. The whole dataset, which consists of 13 attributes and 872
rows, is used to train the model. Alternatively, training can be carried out by splitting the data into the required number of equal-sized partitions. In the interactive GUI, when the user selects the ‘train
network’ option after entering their data, the backend reads the UCI dataset's .csv file and normalizes it so that the data can be fed into the neural network more easily.
The neural network created here has three kinds of layers: an input layer, a hidden layer and an output layer. The number of hidden layers can be set to 2 or 3 according to the user's
requirements. The network is generated by passing the inputs to a train() function and is stored in a .mat file. After the network is generated, its mean squared error is checked.
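The original system is built around MATLAB-style train() and .mat files. A rough Python analogue, assuming scikit-learn and SciPy: MLPClassifier trained by backpropagation with stochastic gradient descent, hidden_layer_sizes controlling the number of hidden layers, the mean squared error checked on the training data, and the weights written to a .mat file with scipy.io.savemat. Note that MLPClassifier minimizes log-loss rather than MSE; the MSE is only reported here. The helper build_and_train, the data and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
from scipy.io import savemat
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler

def build_and_train(X, y, hidden_layers=(10,)):
    """Train a feed-forward network; len(hidden_layers) sets the number of hidden layers."""
    net = MLPClassifier(hidden_layer_sizes=hidden_layers, solver="sgd",
                        learning_rate_init=0.1, max_iter=500, random_state=0)
    return net.fit(X, y)

X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
X = MinMaxScaler().fit_transform(X)     # per-attribute min/max scaling to [0, 1]

net = build_and_train(X, y, hidden_layers=(10, 10))   # two hidden layers
mse = np.mean((net.predict_proba(X)[:, 1] - y) ** 2)
print(f"training mean squared error: {mse:.4f}")

# Store the learned weight matrices in a MATLAB-compatible .mat file.
path = os.path.join(tempfile.mkdtemp(), "heart_net.mat")
savemat(path, {f"W{i + 1}": w for i, w in enumerate(net.coefs_)})
```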
23. Testing the model
• Testing is conducted to determine
whether the trained model provides
the desired output. When data is
entered for testing, the .csv file is
retrieved for cross-checking and
comparison, and results for the newly
entered data are generated. Based on
how the model was trained on the
dataset, the user can enter values of
their choice for the specified
attributes, and the system reports
whether or not there is a risk of
heart disease.
24. Classification of predicting model
• The genetic algorithm is applied to initialize the neural network weights.
It is also used to evaluate and calculate the number of layers in the
neural network, along with the total number of weights and biases used.
The initial population is generated at random. A bias is used so that the
generated output value will not be 0 or negative. The fitness of each
chromosome is calculated from the mean squared error obtained during
testing. After selection and mutation are carried out in the genetic
algorithm, chromosomes with lower fitness are replaced with better, fitter
chromosomes. If the best fit has not yet been selected (that is, a worse
fit was selected), the process continues until the best fit is selected.
This genetic algorithm, together with a multilayer feed-forward network,
is used to predict the presence or absence of cardiovascular disease in
the patient.
25. Prediction of heart disease
• This component helps predict the severity of
cardiovascular disease. When the user inputs
data, the inputs are evaluated against the
trained weights. The prediction neural network
has 13 nodes in its input layer, since 13
attribute values are input to the system,
followed by the hidden layer and one node in
the output layer, which provides the result.
The prediction is given in a ‘yes’ or ‘no’
format, based on whether the risk factors fall
within the criteria learned when the model was
trained.
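A minimal sketch of this prediction step, assuming a trained network with 13 inputs, one sigmoid hidden layer and a single sigmoid output, and a 0.5 threshold for "yes". The threshold, the hidden-layer size and the function name predict_yes_no are hypothetical, and the weights below are random placeholders standing in for trained ones.

```python
import numpy as np

N_INPUTS = 13   # one input node per attribute

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_yes_no(x, W1, b1, w2, b2, threshold=0.5):
    """Forward pass: 13 inputs -> hidden layer -> single output node."""
    x = np.asarray(x, dtype=float)
    if x.shape != (N_INPUTS,):
        raise ValueError(f"expected {N_INPUTS} attribute values, got shape {x.shape}")
    hidden = sigmoid(x @ W1 + b1)
    risk = sigmoid(hidden @ w2 + b2)           # output in (0, 1)
    return ("yes" if risk >= threshold else "no"), float(risk)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(N_INPUTS, 8)), np.zeros(8)   # placeholder weights
w2, b2 = rng.normal(size=8), 0.0
patient = rng.random(N_INPUTS)   # hypothetical normalized attribute values
print(predict_yes_no(patient, W1, b1, w2, b2))
```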
27. Support Vector Machine (SVM)
• A support vector machine is a
supervised machine learning
technique. Given labeled training
data, an SVM produces a classifier
that separates the data into
different classes, choosing the
boundary with the largest margin
between them.
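A minimal scikit-learn example of this behavior on hypothetical two-dimensional points: a linear SVC learns a separating boundary from labeled data and then classifies new points.

```python
from sklearn.svm import SVC

# Hypothetical labeled training data: two groups of 2-D points.
X = [[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1000.0).fit(X, y)   # large C approximates a hard margin
print(clf.predict([[0.5, 0.5], [3.5, 3.5]]))     # [0 1]
print("support vectors:", clf.support_vectors_)
```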
28. Decision Tree (DT)
• A decision tree is a supervised
machine learning technique used
for both classification and
regression. In this algorithm,
data is split according to the
values of its attributes. A decision
tree contains internal nodes and
leaves: data is split at the nodes,
and the outcomes or decisions are
obtained at the leaves.
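To make "data is split at the nodes" concrete, the sketch below shows one common splitting criterion, Gini impurity (an assumption; other criteria such as entropy are also used), choosing the best threshold for a single hypothetical attribute in plain Python.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the classes present."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best 'value <= threshold' split."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:   # splitting at the maximum would leave one side empty
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical single attribute (e.g. age) and label (1 = heart disease).
ages = [35, 42, 47, 54, 58, 61, 66]
disease = [0, 0, 0, 1, 1, 1, 1]
print(best_split(ages, disease))   # (47, 0.0)
```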
29. Random Forest (RF)
• Random forest is a supervised machine learning
algorithm that can be used for both classification
and regression, although it is mainly used for
classification. As the name suggests, it is a forest:
just as a forest is a group of trees, a random
forest is a group of decision trees. Using more
decision trees generally makes the predictions
more stable and accurate, up to a point. The
algorithm works as follows: first, it draws random
samples (with replacement) from the dataset; then
it builds a decision tree on each sample; finally,
it combines the predictions of all the trees, by
majority vote for classification or by averaging
for regression.
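A minimal sketch of this procedure, assuming scikit-learn's DecisionTreeClassifier for the individual trees and synthetic data: each tree is trained on a bootstrap sample with a random subset of features considered at each split, and predictions are combined by majority vote. The helpers fit_forest and predict_forest are hypothetical; in practice RandomForestClassifier does all of this in one call.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees decision trees, each on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))    # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    """Majority vote across trees (labels assumed to be 0/1, odd tree count)."""
    votes = np.stack([t.predict(X) for t in trees])   # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)

X, y = make_classification(n_samples=300, n_features=13, n_informative=5,
                           random_state=0)
trees = fit_forest(X[:240], y[:240])
pred = predict_forest(trees, X[240:])
print("test accuracy:", np.mean(pred == y[240:]))
```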
30. Naïve Bayes (NB)
• Naïve Bayes is a supervised
machine learning classification
algorithm. It was originally popular
for text classification and works
well with high-dimensional datasets;
examples include sentiment analysis
and spam filtering. The algorithm is
based on Bayes' theorem with the
assumption that attributes are
independent of each other: within a
given class, each attribute is assumed
to be independent of every other
attribute in the same class.
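A from-scratch Gaussian Naive Bayes sketch in NumPy, assuming continuous attributes modeled with one normal distribution per class and attribute; summing the per-attribute log-likelihoods is exactly where the independence assumption enters. The class name and the two-cluster data are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.log_priors = np.log([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # Small constant keeps variances strictly positive.
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # Per-class log-likelihood: sum over attributes of log N(x_j | mean, var).
        # Adding the terms (multiplying probabilities) assumes independence.
        diff = X[:, None, :] - self.means[None]                # (n, classes, attrs)
        log_lik = -0.5 * (np.log(2 * np.pi * self.vars)[None]
                          + diff ** 2 / self.vars[None]).sum(axis=2)
        return self.classes[np.argmax(self.log_priors + log_lik, axis=1)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(5, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))   # [0 1]
```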
31. A detailed survey of previous studies shows that ANN-based methods have
been widely adopted in medical diagnosis because of their ability to handle
complex linear and non-linear problems. Most of the studies that applied
ANNs to heart disease detection used the Levenberg-Marquardt (LM), scaled
conjugate gradient (SCG) and Polak-Ribière conjugate gradient (CGP)
algorithms to learn the parameter values (weights) from training data.
In this study, however, we used the optimization algorithms L-BFGS and
Adam. Moreover, earlier studies used ANNs with only one hidden layer,
whereas this paper uses a deep neural network with more than one hidden
layer. Deep neural networks are neural networks that use multiple hidden
layers and are trained using newer methods.
43. • Accuracy for training set for SVM = 0.9256198347107438
Accuracy for test set for SVM = 0.8032786885245902
• Accuracy for training set for Naive Bayes = 0.8677685950413223
Accuracy for test set for Naive Bayes = 0.7868852459016393
• Accuracy for training set for Logistic Regression = 0.8636363636363636
Accuracy for test set for Logistic Regression = 0.8032786885245902
44. • Accuracy for training set for Decision Tree = 1.0
Accuracy for test set for Decision Tree = 0.7868852459016393
• Accuracy for training set for Random Forest = 0.9834710743801653
Accuracy for test set for Random Forest = 0.8032786885245902
• Accuracy for training set for LightGBM = 0.9958677685950413
Accuracy for test set for LightGBM = 0.7704918032786885
• Accuracy for training set for XGBoost = 0.987603305785124
Accuracy for test set for XGBoost = 0.7540983606557377
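The reported accuracies can be turned into counts of correctly classified patients. Every training accuracy above is a whole number of patients out of 242, and every test accuracy a whole number out of 61; these set sizes are inferred from the values themselves rather than stated in the report. The sketch checks this and prints the train/test gap, which is largest for the tree-based models (for example 1.0 versus about 0.787 for the decision tree), suggesting they overfit the training data.

```python
# Accuracies copied from the results slides: (training, test).
reported = {
    "SVM": (0.9256198347107438, 0.8032786885245902),
    "Naive Bayes": (0.8677685950413223, 0.7868852459016393),
    "Logistic Regression": (0.8636363636363636, 0.8032786885245902),
    "Decision Tree": (1.0, 0.7868852459016393),
    "Random Forest": (0.9834710743801653, 0.8032786885245902),
    "LightGBM": (0.9958677685950413, 0.7704918032786885),
    "XGBoost": (0.987603305785124, 0.7540983606557377),
}
N_TRAIN, N_TEST = 242, 61  # inferred set sizes (assumption)

counts = {}
for name, (train_acc, test_acc) in reported.items():
    train_correct, test_correct = train_acc * N_TRAIN, test_acc * N_TEST
    # If the inferred sizes are right, both products are whole numbers.
    assert abs(train_correct - round(train_correct)) < 1e-6, name
    assert abs(test_correct - round(test_correct)) < 1e-6, name
    counts[name] = (round(train_correct), round(test_correct))
    print(f"{name:<20} train {counts[name][0]}/{N_TRAIN}  "
          f"test {counts[name][1]}/{N_TEST}  gap {train_acc - test_acc:.3f}")
```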
46. • Efficient Heart Disease Prediction System Using Decision Tree -
Purushottam, Kanak Saxena, Richa Sharma.
• An Automated Diagnostic System for Heart Disease Prediction Based on χ²
Statistical Model and Optimally Configured Deep Neural Network - Liaqat Ali,
Atiqur Rahman, Aurangzeb Khan, et al.
• An Optimized Stacked Support Vector Machines Based Expert System for
the Effective Prediction of Heart Failure - Liaqat Ali, Awais Niamat, Javed Ali
Khan, et al.
• An Intelligent Learning System Based on Random Search Algorithm and
Optimized Random Forest Model for Improved Heart Disease Detection - Ashir
Javeed, Shijie Zhou, Liao Yongjian.
• Machine Learning and End-to-End Deep Learning for the Detection of Chronic
Heart Failure From Heart Sounds - Martin Gjoreski, Anton Gradišek, et al.