1. Department of Computer Engineering,
Sinhgad Institute of Technology and Science, Pune
2019-20
Project Stage II SPPU Examination
Name                   PRN No.
Venkatesh T. Mainalli  71622707L
Gaurav J. Bade         71832932F
Akshay R. Dhole        71833043K
Shivraj Deshmukh       71833027H
Name of the Guide: Prof. G. S. Navale
Classification of Imbalanced Dataset
2. Agenda
• Introduction of Group Members
• Introduction of Project Topic
• Motivation
• Literature Survey
• Gap Analysis
• Problem Statement
• Objectives
• Activities per Objective
• Feasibility Study
• Schedule
• System Architecture
• Mathematical Model
• SRS
• UML/DFD
• Algorithms
• Testing
• Results
• Conclusion
• Future Scope
• References
25-04-2023 Classification of Imbalanced Dataset 2
3. Introduction of Group Member 1
Bio-Data: Personal Information
Roll No: B17COA77
Name of Student: Venkatesh Tukkappa Mainalli
Date of Birth: 25-03-1998
Address: Anil Villa, Kolhewadi, Sinhgad Road
City/District: Pune
State - PIN: Maharashtra - 411024
Email Id: venkateshmainalli55555@gmail.com
Mobile No.: 7798311297
4. Introduction of Group Member 1
Academic Track Record
Class  Sem  Year of Adm  % Marks  Remark
F. E.  I    2016-2017    62.0     Pass
F. E.  II                64.0     Pass
S. E.  I    2017-2018    54.0     Pass
S. E.  II                58.0     Pass
T. E.  I    2018-2019    60.0     Pass
T. E.  II                68.0     Pass
B. E.  I                 74.16    Pass
5. Introduction of Group Member 2
Bio-Data: Personal Information
Roll No: B17COA14
Name of Student: Shivraj Deshmukh
Date of Birth: 17-01-1999
Address: Room No. 201, Balaji Hostel, Narhe
City/District: Pune
State - PIN: Maharashtra - 411041
Email Id: deshmukh.shivraj95@gmail.com
Mobile No.: 7972326595
6. Introduction of Group Member 2
Academic Track Record
Class  Sem  Year of Adm  % Marks  Remark
S. E.  I    2017-2018    74.16    Pass
S. E.  II                70.46    Pass
T. E.  I    2018-2019    69.72    Pass
T. E.  II                64.24    Pass
B. E.  I                 70.16    Pass
7. Introduction of Group Member 3
Bio-Data: Personal Information
Roll No: B17COA17
Name of Student: Akshay Ramchandra Dhole
Date of Birth: 21-09-1998
Address: Flat No. 1, Shantiyuga Apartment, Manaji Nagar, Narhe
City/District: Pune
State - PIN: Maharashtra - 411041
Email Id: akshaydholead10@gmail.com
Mobile No.: 7768902644
8. Introduction of Group Member 3
Academic Track Record
Class  Sem  Year of Adm  % Marks  Remark
S. E.  I    2017-2018    72.68    Pass
S. E.  II                67.50    Pass
T. E.  I    2018-2019    61.06    Pass
T. E.  II                66.53    Pass
B. E.  I                 78.23    Pass
9. Introduction of Group Member 4
Bio-Data: Personal Information
Roll No: B17COA04
Name of Student: Gaurav Jayant Bade
Date of Birth: 03-01-1998
Address: Flat No. 420, E-Wing, Samarth Vishwa Society, Khandala
City/District: Khandala, Satara
State - PIN: Maharashtra - 412802
Email Id: Gauravbade11@gmail.com
Mobile No.: 9960503007
10. Introduction of Group Member 4
Academic Track Record
Class  Sem  Year of Adm  % Marks  Remark
S. E.  I    2017-2018    69.72    Pass
S. E.  II                69.12    Pass
T. E.  I    2018-2019    64.46    Pass
T. E.  II                64.09    Pass
B. E.  I    2019-2020    73.19    Pass
11. Introduction
• What is Machine Learning?
Machine learning is a branch of artificial intelligence concerned with the design and development of algorithms that allow computers to evolve behaviours based on empirical data. Since intelligence requires knowledge, it is necessary for computers to be able to acquire knowledge.
12. • What is an Imbalanced Dataset?
Handling an imbalanced data distribution is an important part of the machine learning workflow. A dataset is imbalanced when the number of observations is not the same for all the classes; in the binary case, instances of one class outnumber those of the other. This problem arises not only in binary-class data but also in multi-class data.
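The imbalance can be made concrete by simply counting class frequencies; a small Python sketch with made-up labels (a hypothetical 9:1 binary dataset, not the project's data):

```python
from collections import Counter

# Hypothetical binary labels: 90 majority-class and 10 minority-class samples.
labels = [0] * 90 + [1] * 10

counts = Counter(labels)                 # class frequencies
imbalance_ratio = counts[0] / counts[1]  # majority : minority

print(counts)           # Counter({0: 90, 1: 10})
print(imbalance_ratio)  # 9.0
```

A ratio of 9:1 means a classifier that always predicts the majority class already reaches 90% accuracy, which is why accuracy alone is misleading on imbalanced data.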
13. Motivation
Many real-world data sets have an imbalanced distribution of instances. Learning from such data sets biases the classifier towards the majority class, so it tends to misclassify minority-class samples.
For example: fraud detection, cancer detection, manufacturing defects, online-ad conversion, etc.
14. Literature Survey
Paper 1: "ARCID: A New Approach to Deal with Imbalanced Datasets Classification" (Abdellatif, Safa, et al., 2018)
Theme: A new approach to deal with the classification of imbalanced datasets.
Pros: ARCID performs slightly better than Fitcare and RIPPER, and it outperforms all standard rule-based and non-rule-based approaches by many ranks.
Cons: Managing the overwhelming number of CARs generated from real-life datasets, and removing redundant rules that convey the same information.
15. Literature Survey
Paper 2: "Applying Map-Reduce to imbalanced data classification" (Jedrzejowicz, Joanna, et al., 2017)
Theme: Using Map-Reduce to deal with imbalanced data classification.
Pros:
1. EDBC uses three methods to classify imbalanced data in a proper manner.
2. SplitBal works in parallel, so its processing time is lower than EDBC's.
Cons:
1. EDBC does not work in parallel, so its processing time is higher.
2. In SplitBal the dataset is divided into bins, and working on the bins is lengthy.
16. Literature Survey
Paper 3: "A study on classifying imbalanced datasets" (T. Lakshmi and S. Prasad, 2014)
Theme: A study of classifying imbalanced datasets.
Pros:
1. The combination of SMOTE + BAG + RF effectively improves the performance of classifying imbalanced datasets.
2. The use of ensemble and sampling techniques gives a drastic improvement in performance in terms of AUROC.
Cons:
1. Under-sampling reduced execution time, but the classification accuracy suffered.
2. Under-sampling caused a loss of necessary and useful information.
17. Literature Survey
Paper 4: "Improved SMOTEBagging and its application in imbalanced data classification" (Yongqing, Z., et al., 2013)
Theme: An approach to dealing with imbalanced data using oversampling and bagging.
Pros:
1. SMOTE generates synthetic minority examples to over-sample the minority class.
2. Bagging improves the performance of the overall system.
3. The Support Vector Machine is one of the most effective machine learning algorithms for many complex binary classification problems.
4. SMOTEBagging overcomes the over-fitting problems of traditional algorithms.
18. Literature Survey
Paper 4 (contd.): "Improved SMOTEBagging and its application in imbalanced data classification" (Yongqing, Z., et al., 2013)
Cons:
1. SMOTE works only on the minority class, not on the majority class.
2. Combining decisions in bagging sometimes causes overhead.
19. Literature Survey
Paper 5: "Class Imbalance Problem in Data Mining: Review" (M. R. Longadge, M. S. S. Dongre, D. L. Malik, et al., 2013)
Theme: Using data mining to deal with imbalanced data classification.
Pros: The hybrid approach provides a better solution for class imbalance; the paper suggests several algorithms and techniques that address imbalanced sample distributions.
Cons: The method improves the classification accuracy of the minority class, but because of infinite data streams and continuous concept drift, it is not suitable for skewed data-stream classification.
20. Literature Survey
Paper 6: "SkewBoost: An algorithm for classifying imbalanced datasets" (S. Hukerikar, A. Tumma, A. Nikam, and V. Attar, 2011)
Theme: An approach to handling imbalanced datasets using boosting techniques.
Pros: SkewBoost performs better than DataBoost-IM, Inverse Random UnderSampling (IRUS), EasyEnsemble, BalanceCascade, and Cost-Sensitive Boosting (CSB2).
Cons: For some datasets, SkewBoost lags behind some existing algorithms on metrics related to majority-class accuracy.
21. Literature Survey
Paper 7: "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance" (Seiffert, Chris, et al., 2010)
Theme: A hybrid approach to handling imbalanced datasets using boosting techniques.
Pros: RUSBoost presents a simpler, faster, and less complex alternative to SMOTEBoost for learning from imbalanced data.
Cons: Complexity and model-training time increase.
22. Literature Survey
Paper 8: "Learning pattern classification tasks with imbalanced datasets" (S. L. Phung, 2009)
Theme: Recognizing learning patterns in the classification of imbalanced data.
Pros: The approach proposed by the author can effectively improve the classification accuracy of minority classes while maintaining the overall classification performance.
Cons: The combination of supervised and unsupervised algorithms tends to have higher G-mean and F-measure values when compared with different training algorithms.
23. Literature Survey
Paper 9: "Z-SVM: An SVM for improved classification of imbalanced data" (Imam, Tasadduq, et al., 2006)
Theme: An SVM approach to handling imbalanced datasets.
Pros: Z-SVM aims at finding the discriminating hyperplane that maintains an optimal margin from the boundary examples, called support vectors.
Cons: It is not clear how a particular parameter value affects the SVM hyperplane and the generalization capability of the learned model.
24. Literature Survey
Paper 10: "Editorial: Special Issue on Learning from Imbalanced Data Sets" (Chawla, Nitesh, et al., 2004)
Theme: Information on learning from imbalanced datasets.
Pros: Cluster-based oversampling counters the effects of class imbalance and small disjuncts.
Cons: Feature selection can often be too expensive to apply for handling imbalanced data.
25. Literature Survey
Paper 11: "Learning from imbalanced data sets with boosting and data generation: The DataBoost-IM approach" (Guo, Hongyu & Viktor, Herna, 2004)
Theme: Handling imbalanced data using a boosting approach.
Pros: DataBoost-IM produces a series of high-quality classifiers that are better able to predict examples for which the previous classifier's performance is poor.
Cons: Execution gets complicated when a tedious approach is used.
26. Literature Survey
Paper 12: "Synthetic Minority Oversampling Technique" (N. Chawla, 2002)
Theme: An oversampling approach to dealing with imbalanced datasets.
Pros: For almost all ROC curves the SMOTE approach dominates; adhering to the definition of the ROC convex hull, most of the potentially optimal classifiers are the ones generated with SMOTE.
Cons: A minority-class sample may have a majority-class sample as its nearest neighbour rather than a minority-class one; this crowding will likely contribute to redrawing the decision surfaces in favour of the minority class.
27. Gap Analysis
• The surveyed papers gave an idea of how imbalanced data can lead to inappropriate results.
• Some papers also gave ideas about how imbalanced data can be handled.
28. Problem Statement
• To classify an imbalanced data set to predict the accuracy of the minority class from a given structured data set, using diversified ensemble and clustering methods.
29. Objectives
1. To develop a balanced data set from an imbalanced multiclass dataset by applying sampling techniques.
2. To improve the classification precision and achieve good balance among the class samples.
3. To develop an efficient method for solving the class imbalance problem in the classification process.
30. Activities per Objective
• Develop a balanced data set from an imbalanced multiclass dataset by applying sampling techniques.
• Activities:
1. Input the imbalanced dataset.
2. Apply the SMOTE oversampling algorithm.
3. Output a balanced dataset.
31. Activities per Objective
• To improve the classification precision and achieve good balance among the class samples.
• Activities:
1. Test with a single classification algorithm, then proceed with the combined algorithm.
2. Properly cleanse and pre-process the data before testing.
3. Use the learning-curve approach to measure the algorithm's performance.
32. Activities per Objective
• To develop an efficient method for solving the class imbalance problem in the classification process.
• Activities:
1. Evaluate classifiers in imbalanced domains.
2. Use statistical measures for evaluating classification.
33. Feasibility Study
The following points were analyzed to check the feasibility of classifying an imbalanced dataset:
1. Dataset for training and testing
2. SMOTE for synthetic oversampling
3. Random forest for ensemble classification
4. Tkinter for creating the GUI
5. XGBoost for performing boosting
34. Schedule
• - To develop a blockchain manager and integrate it with the
web portal.
• 2.1 Building a blockchain manager interface that can
support dynamic addition of blockchains.
• 2.2 Authorizing the blockchain dynamically to create
a private blockchain network.
• 2.3 Creating methods to support image checking and
uploading to the blockchain via blockchain
manager.
• 2.4 Creating method for restoration after the blockchain
failure.
• 2.5 Calling the methods from web interface.
36. Mathematical Model
• The math behind XGBoost is as follows. The following steps are involved in gradient boosting:
• The boosting algorithm is initialized with a constant model F0(x):
  F0(x) = argmin_γ Σ_{i=1..n} L(y_i, γ)
• The gradient of the loss function is then computed iteratively; at step m the pseudo-residuals are
  r_im = − [ ∂L(y_i, F(x_i)) / ∂F(x_i) ], evaluated at F = F_{m−1}
37. Mathematical Model
• Each weak learner h_m(x) is fit on the gradients (pseudo-residuals) obtained at each step.
• The multiplicative factor γ_m for each terminal node is derived, and the boosted model F_m(x) is defined as:
  F_m(x) = F_{m−1}(x) + γ_m · h_m(x)
38. Mathematical Model
• The math behind the working of the Random Forest Classifier (RFC) is as follows:
• RFC performs both row sampling and column sampling, with a decision tree as the base learner.
39. Mathematical Model
• As the number of base learners k increases, the variance decreases; when k is decreased, the variance increases, while the bias remains roughly constant throughout.
• It can be expressed as:
  Random forest = DT (base learner) + bagging (row sampling with replacement) + feature bagging (column sampling) + aggregation (mean/median or majority vote)
40. Mathematical Model
• The math behind RFC:
• Assume a training set of examples
  D = {(x1, y1), …, (xn, yn)}
• drawn randomly from a (possibly unknown) probability distribution:
  (xi, yi) ~ (X, Y)
41. Mathematical Model
• Goal: to build a classifier that predicts y from x based on the data set of examples D.
• Given: an ensemble of (possibly weak) classifiers
  h = {h1(x), …, hk(x)}
• If each hk(x) is a decision tree, then the ensemble is a random forest. We define the parameters of the decision tree for classifier hk(x) to be
  Θk = (θk1, θk2, …, θkp)
• (these parameters include the structure of the tree, which variables are split at which nodes, etc.)
51. Project Implementation: Algorithms
• The algorithms used are:
– Random Forest Classifier
– Extreme Gradient Boosting (XGBoost)
– SMOTE
52. Project Implementation: Algorithms
• Working of RFC
– The working of the Random Forest algorithm can be understood through the following steps:
– Step 1: Start with the selection of random samples from the given dataset.
– Step 2: The algorithm constructs a decision tree for every sample, then obtains a prediction result from every decision tree.
53. Project Implementation: Algorithms
• Working of RFC
– Step 3: Voting is performed over every predicted result.
– Step 4: The most-voted prediction is selected as the final prediction result.
The working of RFC is illustrated in Fig. 1.
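The four steps above map directly onto scikit-learn's RandomForestClassifier; a minimal sketch on a hypothetical imbalanced dataset (illustrative only, not the project's actual data or code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset (roughly 9:1), for illustration only.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Steps 1-2: each of the 100 trees is grown on a bootstrap sample of the rows.
# Steps 3-4: predict() aggregates the per-tree votes and returns the majority class.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_tr, y_tr)

accuracy = clf.score(X_te, y_te)
```

Note that on data this skewed, a high accuracy can hide poor minority-class recall, which is why the Results slides also report precision, recall, and F1.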
54. Project Implementation: Algorithms
Fig. 1 Working of Random Forest Classification
55. Project Implementation: Algorithms
• Working of XGBoost
– XGBoost is an implementation of gradient-boosted decision trees.
– It is a software library designed primarily to improve speed and model performance.
56. Project Implementation: Algorithms
• Working of XGBoost
– In this algorithm, decision trees are created sequentially.
– Weights play an important role in XGBoost. Weights are assigned to all the independent variables, which are then fed into the decision tree that predicts results.
– The weights of variables predicted wrongly by the tree are increased, and these variables are then fed to the second decision tree.
57. Project Implementation: Algorithms
• Working of XGBoost
– These individual classifiers/predictors are then ensembled to give a stronger, more precise model.
The working of XGBoost is illustrated in Fig. 2.
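The project uses the xgboost library itself; as a dependency-light sketch of the same sequential gradient-boosting idea, scikit-learn's GradientBoostingClassifier can stand in (a stand-in for illustration, not the project's code):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset (roughly 9:1), for illustration only.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Trees are added sequentially; each new tree is fit to the errors (gradients)
# of the ensemble built so far, mirroring the reweighting described above.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)
gbm.fit(X_tr, y_tr)

accuracy = gbm.score(X_te, y_te)
```

With the real xgboost package, `xgboost.XGBClassifier` exposes an equivalent fit/predict interface.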
58. Project Implementation: Algorithms
Fig. 2 Working of XGBoost
59. Project Implementation: Algorithms
• Working of SMOTE
– Step 1: For the minority class set A, for each x ∈ A the k nearest neighbours of x are obtained by calculating the Euclidean distance between x and every other sample in A.
– Step 2: The sampling rate N is set according to the imbalance proportion. For each x ∈ A, N examples (x1, x2, …, xN) are randomly selected from its k nearest neighbours; they construct the set A1.
60. Project Implementation: Algorithms
• Working of SMOTE
– Step 3: For each example xk ∈ A1 (k = 1, 2, …, N), the following formula is used to generate a new example:
  x' = x + rand(0, 1) · |x − xk|
  where rand(0, 1) is a random number between 0 and 1.
The working of SMOTE is illustrated in Fig. 3.
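The three steps above can be sketched from scratch with NumPy (a minimal illustration, not the project's code; note that the standard SMOTE formulation interpolates along (xk − x), i.e. generates a point on the segment between x and its neighbour, rather than using the absolute difference):

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=10, seed=0):
    """Generate n_new synthetic samples from the minority set A = X_min."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        x = X_min[i]
        # Step 1: k nearest neighbours of x within A (Euclidean distance).
        dist = np.linalg.norm(X_min - x, axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip x itself
        # Step 2: randomly pick one neighbour xk from the k-NN set.
        xk = X_min[rng.choice(neighbours)]
        # Step 3: x' = x + rand(0,1) * (xk - x), a point between x and xk.
        synthetic.append(x + rng.random() * (xk - x))
    return np.array(synthetic)

# Hypothetical minority class of 20 two-dimensional points.
A = np.random.default_rng(1).normal(size=(20, 2))
new_points = smote_sample(A, k=5, n_new=10)
print(new_points.shape)  # (10, 2)
```

In practice, the imbalanced-learn library's `SMOTE` class provides a production implementation of the same idea.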
61. Project Implementation: Algorithms
Fig. 3 Working of SMOTE
62. Project Implementation: Testing
Test case ID 1: Execute the implemented system on Windows 10
Test Case Summary To verify that the system executes on Windows 10.
Prerequisites User must have Python with packages installed.
Test Procedure
1. Open the system
2. Run the system
3. Execute the algorithm
Expected Result All the algorithms should execute successfully.
Actual Result All algorithms executed successfully.
Status Pass
63. Project Implementation: Testing
Test case ID 2: Execute the implemented system on Macintosh
Test Case Summary To verify that the system executes on a Mac.
Prerequisites User must have Python with packages installed.
Test Procedure
1. Open the system
2. Run the system
3. Execute the algorithm
Expected Result All the algorithms should execute successfully.
Actual Result All algorithms executed successfully.
Status Pass
64. Project Implementation: Testing
Test case ID 3: Do the UI components work as expected
Test Case Summary To verify that the UI components work on every system.
Prerequisites User must have Python with packages installed; user must have Ubuntu.
Test Procedure
1. Open the system
2. Run the system
3. Execute the algorithm
Expected Result All UI components should perform their assigned tasks.
Actual Result All UI components perform their assigned tasks.
Status Pass
65. Project Implementation: Results
• We evaluated several classifiers along with the proposed algorithm.
• The classifiers used are:
– Random Forest Classifier
– Extreme Gradient Boosting (XGBoost)
– Bagging Classifier
– SMOTE + XGBoost + RFC
66. Project Implementation: Results
• For RFC, we obtained:
– Recall: 0.788
– Precision: 0.947
– F1 Score: 0.860
– Average precision-recall: 0.838
67. Project Implementation: Results
• After tuning RFC, we obtained:
– Recall: 0.788
– Precision: 0.947
– F1 Score: 0.860
– Average precision-recall: 0.854
68. Project Implementation: Results
• Using XGB, we obtained:
– Recall: 0.805
– Precision: 0.948
– F1 Score: 0.871
– Average precision-recall: 0.868
69. Project Implementation: Results
• After tuning XGB, we obtained:
– Recall: 0.000
– Precision: NaN
– F1 Score: NaN
– Average precision-recall: 0.669
70. Project Implementation: Results
• Using XGB + SMOTE, we obtained:
– Recall: 1.000
– Precision: 0.999
– F1 Score: 1.000
– Average precision-recall: 1.000
71. Project Implementation: Results
• Using SMOTE + RFC, we obtained:
– Recall: 1.000
– Precision: 1.000
– F1 Score: 1.000
– Average precision-recall: 1.000
72. Project Implementation: Results
• From the evaluated performance of each algorithm, we conclude that SMOTE-resampled data combined with XGB and RFC gave better results than the other algorithms.
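The metrics reported above can be computed for any classifier with scikit-learn; a small sketch on hypothetical labels (not the project's actual predictions):

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical ground truth (4 positives) and predictions, for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two

print(recall, precision, f1)  # 0.75 0.75 0.75
```

On imbalanced data, these per-class metrics (and average precision from `sklearn.metrics.average_precision_score`) are far more informative than plain accuracy.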
73. Conclusion
• Classification is among the most common problems in machine learning, and imbalanced classes are one of the common issues found in classification datasets. Various researchers have put great effort into solving this issue, and the literature review yielded several useful approaches. The best solution is to use ensemble methods: combinations of multiple machine learning techniques in one model that decrease variance (bagging), decrease bias (boosting), or improve predictions (stacking). Other methods exist but do not perform as well for the classification of imbalanced datasets. The proposed work will be implemented in Python, targeting a performance above 95% so that the ensemble classifier can be used reliably.
74. References
[1] Abdellatif, Safa, Ben Hassine, Mohamed Ali, Ben Yahia, Sadok, and Bouzeghoub, Amel, "ARCID: A New Approach to Deal with Imbalanced Datasets Classification," SOFSEM 2018: Theory and Practice of Computer Science, pp. 569-580, 2018. doi:10.1007/978-3-319-73117-9_40.
[2] Jedrzejowicz, Joanna, Neumann, Jakub, Synowczyk, Piotr, and Zakrzewska, Magdalena, "Applying Map-Reduce to imbalanced data classification," 2017 IEEE International Conference on INnovations in Intelligent SysTems and Applications (INISTA), pp. 29-33, 2017. doi:10.1109/INISTA.2017.8001127.
[3] T. Lakshmi and S. Prasad, "A study on classifying imbalanced datasets," IEEE Conference Publication, 2014.
[4] Yongqing, Z., Min, Zhu, Danling, Zhang, M., Gang, and M. Daichuan, "Improved SMOTEBagging and its application in imbalanced data classification," IEEE Conference Anthology, 2013.
[5] M. R. Longadge, M. S. S. Dongre, and D. L. Malik et al., "Class Imbalance Problem in Data Mining: Review," International Journal of Computer Science and Network (IJCSN), vol. 2, no. 1, 2013. [Online]. Available: www.ijcsn.org
[6] S. Hukerikar, A. Tumma, A. Nikam, and V. Attar, "SkewBoost: An algorithm for classifying imbalanced datasets," 2011 2nd International Conference on Computer and Communication Technology (ICCCT-2011), 2011, pp. 46-52.
75. [7] Seiffert, Chris, Khoshgoftaar, Taghi, Van Hulse, Jason, and Napolitano, Amri, "RUSBoost: A Hybrid Approach to Alleviating Class Imbalance," IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 2010.
[8] S. L. Phung, "Learning pattern classification tasks with imbalanced datasets," in Pattern Recognition, InTech, 2009.
[9] Imam, Tasadduq, Ting, Kai, and Kamruzzaman, Joarder, "Z-SVM: An SVM for improved classification of imbalanced data," Advances in Artificial Intelligence, vol. 4304, 2006.
[10] Chawla, Nitesh, Japkowicz, Nathalie, and Kołcz, Aleksander, "Editorial: Special Issue on Learning from Imbalanced Data Sets," SIGKDD Explorations, vol. 6, no. 1, pp. 1-6, 2004. doi:10.1145/1007730.1007733.
[11] Guo, Hongyu, and Viktor, Herna, "Learning from imbalanced data sets with boosting and data generation: The DataBoost-IM approach," SIGKDD Explorations, vol. 6, no. 1, pp. 30-39, 2004. doi:10.1145/1007730.1007736.
[12] N. Chawla et al., "SMOTE: Synthetic Minority Over-sampling Technique," Journal of Artificial Intelligence Research, vol. 16, pp. 321-357, 2002.