The ANN is applicable to the classification and
recognition of data. However, for recognition and
classification, the ANN needs a large dataset. To
optimize this type of data and to generate a pattern or
feature, the ANN needs a special system or algorithm.
This study proposes the application of GA to improve
the ANN mechanism and to overcome this problem [8].
One of the most popular methods of machine
learning is the Support Vector Machine (SVM), which
has been applied to regression and classification
problems. Given a set of training examples, each
marked as belonging to one of two categories (Attack
or Normal), the SVM takes an input and predicts which
of the two possible classes it belongs to; for this
reason it is also known as a binary linear classifier.
In attack detection, the SVM is responsible for
predicting whether new data falls into the normal
category or the attack category [9].
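The binary decision described above can be sketched in a few lines. This is a minimal illustration, not the classifier used in this study: it assumes a linear SVM trained by sub-gradient descent on the regularised hinge loss, and the function names, the learning rate, the regularisation constant and the two-feature toy records are all hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, eta=0.1, epochs=50):
    """Fit weights w and bias b by sub-gradient descent on the
    L2-regularised hinge loss. Labels are +1 (attack) or -1 (normal)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # Regularisation step: shrink the weights slightly on every update.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Sample is misclassified or inside the margin: move the boundary.
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict(w, b, x):
    """Return the predicted class for one record."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return "attack" if score >= 0 else "normal"


# Hypothetical, already standardised two-feature records (assumption).
samples = [(-1.0, -1.0), (1.0, 1.0), (-2.0, -1.0),
           (2.0, 1.0), (-1.0, -2.0), (1.0, 2.0)]
labels = [-1, 1, -1, 1, -1, 1]

w, b = train_linear_svm(samples, labels)
new_record_class = predict(w, b, (3.0, 3.0))
```

A new record is assigned to whichever side of the learned boundary it falls on, which is the sense in which the SVM acts as a binary linear classifier.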
The SVM is helpful in recognizing and classifying
data. However, for classification and recognition, the
SVM requires a large dataset [10]. To optimize this
type of data and to generate a feature pattern, the SVM
requires a special algorithm or system. This study
proposes applying the GA to improve the SVM
mechanism and to overcome this problem [11].
GA is one of the most widely used algorithms in
machine learning. It is an adaptive, exploratory search
algorithm based on the evolutionary ideas of natural
genetics [12]. The GA starts from an initial population
of individuals and evolves it toward individuals of
high quality, and each of these individuals represents
a solution to the problem [13]. GA is known as a
parallel algorithm that can search for solutions in many
subsets of a problem simultaneously, which makes it a
suitable algorithm for IDS. Moreover, the GA requires
no mathematical derivation and is capable of reaching
proper solution sets for the problems. In addition, the
GA can propose a single solution whose value is
optimal. The GA is also capable of distinguishing new
data or attacks from previous ones, and it is considered
a suitable method for intrusion detection systems,
particularly for detecting attacks that are based upon
human behavior [14].
In the machine learning field, the process of
selecting a subset of relevant features for building a
solution model is known as feature selection. Feature
selection rests on the assumption that the data contains
redundant and irrelevant information. Thus,
researchers in machine learning use a feature selection
algorithm to select the relevant and useful information
[15].
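As a small illustration of what "irrelevant" and "redundant" can mean in practice, the sketch below drops columns that never change (they cannot help separate classes) and columns that exactly duplicate an earlier column. This is a deliberately simple filter chosen for illustration, not the GA-based selection proposed in this study; the function name and the toy rows are hypothetical.

```python
def select_features(rows):
    """Return the indices of columns that are neither constant
    (irrelevant) nor exact copies of an earlier kept column (redundant)."""
    columns = list(zip(*rows))
    kept = []
    seen = set()
    for idx, col in enumerate(columns):
        if len(set(col)) <= 1:
            continue  # constant column: carries no information
        if col in seen:
            continue  # duplicate of a column that was already kept
        seen.add(col)
        kept.append(idx)
    return kept


# Hypothetical records with four features each (assumption).
rows = [
    (0, 1.0, 5, 1.0),
    (0, 2.0, 3, 2.0),
    (0, 3.0, 5, 3.0),
]

kept = select_features(rows)
reduced = [tuple(row[i] for i in kept) for row in rows]
```

Here the first column is constant and the fourth repeats the second, so only the second and third columns survive.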
II. RELATED WORK
In previous studies, researchers have tried to solve
this problem by using different methods such as
LCFS, FFSA and MMIFS [16], fuzzy rule based
classifiers [17], SVM classification with GA
optimization [18], ANN classification with GA
optimization [19], and the four-angle-star approach
[20]. Table I summarizes these methods in brief.
TABLE I. PREVIOUS WORK

| Author | Method | Objective |
|---|---|---|
| Bin Luo et al. | Four-angle-star based visualized feature generation approach (FASVFG) | Evaluate the distance between samples in a 5-class classification problem |
| Abraham et al. | Fuzzy rule based classifiers | Framework for Distributed Intrusion Detection Systems (DIDS) |
| Amiri et al. | Forward feature selection algorithm (FFSA); Linear correlation feature selection (LCFS); Modified mutual information feature selection (MMIFS) | Propose a feature selection phase, which can be generally implemented on any intrusion detection |
| Li et al. | Ant colony algorithm and support vector machine (SVM) | Propose a desirable IDS model with high efficiency and accuracy |
| Dastanpour et al. | Feature selection based on the genetic algorithm and support vector machine | Improve detection rate with a smaller number of features |
| Dastanpour et al. | Genetic Algorithms (GA) with an Artificial Neural Network classifier to detect attacks in the network | Increase accuracy with the optimal number of features |
2014 IEEE Conference on Open Systems (ICOS), October 26-28, 2014, Subang, Malaysia
III. DATA ANALYSIS
In this study, the Knowledge Discovery and Data
Mining (KDD CUP 1999) dataset has been used
because of its comprehensiveness. It is also regarded
as the best dataset for investigating the performance of
an IDS. There are 22 attack types included in this
dataset [21], and they can be classified into 4 groups
[22]: probing, U2R, R2L, and DOS, with the following
details [23]:
Probing: surveillance and other probing. This
attack type scans the network to collect data about the
targeted host.
U2R: unauthorized access to the privileges of the
root (local super user). In this attack type, the attacker
has access to the system and exploits vulnerabilities to
gain root privileges.
R2L: unauthorized access from a remote machine.
In this attack type, the attacker sends packets over the
network to gain access as a known local user.
DOS: denial of service. In this attack type, the
attacker exhausts computing resources and memory so
that the system can no longer serve its legitimate
users.
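The grouping of the 22 attack types into the four categories above, and the reduction to the two classes used later (Attack or Normal), can be sketched as a lookup table. The attack names are the ones commonly listed for the KDD CUP 1999 training data and should be checked against the copy of the dataset in use; the helper names `category_of` and `binary_label` are hypothetical.

```python
# Attack types per category, as commonly listed for the KDD CUP 1999
# training data (assumption: verify against the dataset documentation).
ATTACK_CATEGORIES = {
    "DOS": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "Probing": ["ipsweep", "nmap", "portsweep", "satan"],
    "R2L": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "U2R": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
}

# Reverse lookup: attack type name -> category.
TYPE_TO_CATEGORY = {
    name: category
    for category, names in ATTACK_CATEGORIES.items()
    for name in names
}


def category_of(label):
    """Map a raw label such as 'smurf.' to its category, or 'Normal'."""
    name = label.strip().rstrip(".")  # raw labels may end with a dot
    if name == "normal":
        return "Normal"
    return TYPE_TO_CATEGORY[name]  # raises KeyError for unknown labels


def binary_label(label):
    """Collapse the four attack categories into the single class 'Attack'."""
    return "Normal" if category_of(label) == "Normal" else "Attack"


total_attack_types = len(TYPE_TO_CATEGORY)
```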
IV. METHODOLOGY
Fig. 1 illustrates the main idea of the study and the
entire method. First, the dataset is divided randomly
into two groups, the training set and the testing set. In
the training phase, the first task of the machine
learning is learning and selecting the most proper
features. In the testing phase, the learned knowledge
and the features selected in the training phase are
tested, after which the data is categorized into two
groups, attacks and normal data. In the machine
learning process, the SVM and ANN receive the data,
and both are used by the system to classify the
training data. Once trained, the SVM and ANN are
ready to be applied in the system. Finally, when each
of these algorithms has classified the testing data, the
result of the detection or classification is passed to the
GA for optimization of each algorithm to achieve high
recognition [24]. In other words, when the
classification by the SVM and ANN is finished, the
classification of each algorithm is improved by the GA
to achieve a high detection rate.
FIGURE 1. OVERALL METHOD OF THIS PAPER
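The random division into a training set and a testing set can be sketched as follows. The split ratio is not stated in this paper, so the 70/30 ratio, the seed and the function name `split_dataset` are assumptions made only for illustration.

```python
import random


def split_dataset(records, train_fraction=0.7, seed=42):
    """Shuffle the records with a fixed seed and cut them into
    a training set and a testing set."""
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(records)       # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for dataset rows (assumption).
records = list(range(10))
train, test = split_dataset(records)
```

Every record ends up in exactly one of the two sets, and repeating the call with the same seed reproduces the same split.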
The GA is a global optimization search method that
simulates the behavior and the process of evolution in
nature. Each possible solution is encoded as a vector
known as a chromosome, and each element of the
vector represents a gene. The whole set of
chromosomes forms a population, and the population
is evaluated by a fitness function [25]. A fitness value
is used to measure the fitness of a chromosome. The
initial population of the genetic process is generated
randomly. The GA applies three operators to create the
next generation from the current generation: mutation,
crossover, and reproduction. The GA discards the
chromosomes with lower fitness and preserves the
chromosomes with high fitness [26]. The whole
process is repeated, so that each next generation
receives more chromosomes with high fitness. The
process continues until a suitable individual
chromosome is found [27]. In this way, the GA turns an
initial set of individuals into individuals of high
quality, and each resulting individual can serve as one
solution. These individuals are known as
chromosomes, and they are made up of a
pre-determined number of genes [28].
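The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the configuration used in this study: chromosomes are bit strings in which a 1 means "feature selected", the fitness function is a toy one that rewards a hypothetical set of useful features and penalises every selected feature, and the names `run_ga`, `fitness`, `crossover` and `mutate` and all parameter values are invented for the example.

```python
import random


def fitness(chromosome, useful=(0, 3, 5)):
    """Toy fitness (assumption): reward each useful gene that is switched
    on, and charge one point for every selected gene."""
    return 10 * sum(chromosome[i] for i in useful) - sum(chromosome)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Flip each gene independently with a small probability."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def run_ga(n_genes=8, pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    # The initial population is generated randomly.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        # Reproduction: the fitter half survives unchanged, the rest is dropped.
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(mutate(crossover(p1, p2, rng), rng))
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = run_ga()
```

Because the fitter half of each generation is carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.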
V. EXPERIMENTAL RESULT
In this paper, the SVM and ANN are first used for
the classification and recognition of the data into two
groups: normal and attack. Then the genetic algorithm
was used to optimize the recognized data by SVM and
ANN. In this study, the GA optimization means the
improvement of the classification of each method for
the percentage of classification and recognition. The
GA and ANN results are shown in Fig.2 and the GA
and SVM results are indicated in Fig.3. In Fig.4, the
effectiveness of the GA on the classification methods
is illustrated. In Table II the comparison between the
effect of GA on SVM and ANN with the other
algorithms applied in the intrusion detection is
illustrated.
FIGURE 2. RESULT OF DETECTION RATE FOR ANN WITH GA (Detection Rate (%) versus Number of Features)
FIGURE 3. RESULT OF DETECTION RATE FOR SVM WITH GA (Detection Rate (%) versus Number of Features)
FIGURE 4. RESULT OF COMPARING ANN WITH GA AND SVM WITH GA (Detection Rate (%) versus Number of Features; series GA-ANN and GA-SVM)
TABLE II. COMPARISON OF GA ON ANN AND SVM WITH OTHER ALGORITHMS

| Name of algorithm | Detection rate | Number of features |
|---|---|---|
| LCFS | 100 % | 21 |
| FFSA | 100 % | 31 |
| MMIFS | 100 % | 24 |
| Fuzzy rule based | 100 % | 41 |
| FASVFG | 94 % | 20 |
| SVM with GA | 100 % | 24 |
| ANN with GA | 100 % | 18 |
The comparison indicates that the GA with ANN
results in better performance with a lower number of
features. When the GA with SVM is compared with
the GA with ANN, it can be seen that the effectiveness
of the GA is higher on the ANN than on the SVM.
Although high detection rates can be achieved by the
other algorithms, the GA with ANN reaches a high
detection rate with a lower number of features.
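The comparison can be reproduced directly from the values in Table II by ranking the methods first by detection rate and then by the number of features. The variable names below are illustrative; the numbers are those reported in the table.

```python
# (algorithm, detection rate in %, number of features), taken from Table II.
results = [
    ("LCFS", 100, 21),
    ("FFSA", 100, 31),
    ("MMIFS", 100, 24),
    ("fuzzy rule based", 100, 41),
    ("FASVFG", 94, 20),
    ("SVM with GA", 100, 24),
    ("ANN with GA", 100, 18),
]

# Higher detection rate first; among equal rates, fewer features first.
ranked = sorted(results, key=lambda row: (-row[1], row[2]))
best = ranked[0]

# Methods that reach a 100% detection rate, with their feature counts.
full_detection = {name: n for name, rate, n in results if rate == 100}
```

Six of the seven methods reach a 100% detection rate, so the number of features is what separates them, and the GA with ANN needs the fewest.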
VI. CONCLUSION
In this study, GA has been proposed for producing
the detection features. The SVM and ANN are then
used as the classifiers of the detection system and
compared with each other to show the effectiveness of
the GA on these methods. The outcomes show that, in
comparison with the other methods, the GA with ANN
obtains the highest detection rate with the lowest
number of features. A series of experiments was
conducted on the KDD CUP 99 dataset for the
detection of four categories of network attacks.
Feature selection based upon the GA with ANN
classification shows better detection rates in the
proposed intrusion detection system. To detect the
attacks efficiently, the GA with SVM requires 24
features and the GA with ANN needs 18 features to
achieve a 100% detection rate. In future work, other
classification methods are planned to be employed
with the GA, and their effectiveness in network attack
detection will be explored.
VII. ACKNOWLEDGEMENT
This research is funded by the Research University
grant of Universiti Teknologi Malaysia (UTM) under
Vot no. 08H28. The authors would like to thank the
Research Management Centre of UTM and the
Malaysian Ministry of Education for their support and
cooperation, as well as the students and other
individuals who were directly or indirectly involved in
this project.
VIII. REFERENCES
[1] S. X. Wu and W. Banzhaf, "The use of computational
intelligence in intrusion detection systems: A review,"
Applied Soft Computing, vol. 10, pp. 1-35, 2010.
[2] A. Simmonds, P. Sandilands, and L. Van Ekert, "An
ontology for network security attacks," in Applied
Computing, ed: Springer, 2004, pp. 317-323.
[3] A. Tamilarasan, S. Mukkamala, A. H. Sung, and K.
Yendrapalli, "Feature ranking and selection for intrusion
detection using artificial neural networks and statistical
methods," in Neural Networks, 2006. IJCNN'06.
International Joint Conference on, 2006, pp. 4754-4761.
[4] V. T. Goh, J. Zimmermann, and M. Looi, "Towards intrusion
detection for encrypted networks," in Availability, Reliability
and Security, 2009. ARES'09. International Conference on,
2009, pp. 540-545.
[5] M. S. Prasad, A. V. Babu, and M. K. B. Rao, "An Intrusion
Detection System Architecture Based on Neural Networks
and Genetic Algorithms," International Journal of Computer
Science and Management Research, vol. 2, 2013.
[6] E. Corchado and Á. Herrero, "Neural visualization of
network traffic data for intrusion detection," Applied Soft
Computing, vol. 11, pp. 2042-2056, 2011.
[7] O. Linda, T. Vollmer, and M. Manic, "Neural network based
intrusion detection system for critical infrastructures," in
Neural Networks, 2009. IJCNN 2009. International Joint
Conference on, 2009, pp. 1827-1834.
[8] A. Bivens, C. Palagiri, R. Smith, B. Szymanski, and M.
Embrechts, "Network-based intrusion detection using neural
networks," Intelligent Engineering Systems through Artificial
Neural Networks, vol. 12, pp. 579-584, 2002.
[9] S. Mukkamala, G. Janoski, and A. Sung, "Intrusion detection
using neural networks and support vector machines," in
Neural Networks, 2002. IJCNN'02. Proceedings of the 2002
International Joint Conference on, 2002, pp. 1702-1707.
[10] T. Shon, J. Seo, and J. Moon, "SVM approach with a genetic
algorithm for network intrusion detection," in Computer and
Information Sciences-ISCIS 2005, ed: Springer, 2005, pp.
224-233.
[11] D. S. Kim and J. S. Park, "Network-based intrusion detection
with support vector machines," in Information Networking,
2003, pp. 747-756.
[12] M. S. Hoque, M. Mukit, M. Bikas, and A. Naser, "An
implementation of intrusion detection system using genetic
algorithm," arXiv preprint arXiv:1204.1336, 2012.
[13] P. Gupta and S. K. Shinde, "Genetic algorithm technique
used to detect intrusion detection," in Advances in
Computing and Information Technology, ed: Springer, 2011,
pp. 122-131.
[14] W. Li, "Using genetic algorithm for network intrusion
detection," Proceedings of the United States Department of
Energy Cyber Security Group, pp. 1-8, 2004.
[15] G. G. Helmer, J. S. Wong, V. Honavar, and L. Miller,
"Intelligent agents for intrusion detection," in Information
Technology Conference, 1998. IEEE, 1998, pp. 121-124.
[16] F. Amiri, M. Rezaei Yousefi, C. Lucas, A. Shakery, and N.
Yazdani, "Mutual information-based feature selection for
intrusion detection systems," Journal of Network and
Computer Applications, vol. 34, pp. 1184-1199, 2011.
[17] A. Abraham, R. Jain, J. Thomas, and S. Y. Han, "D-SCIDS:
Distributed soft computing intrusion detection system,"
Journal of Network and Computer Applications, vol. 30, pp.
81-98, 2007.
[18] A. Dastanpour and R. A. R. Mahmood, "Feature Selection
Based on Genetic Algorithm and Support Vector Machine for
Intrusion Detection System," in The Second International
Conference on Informatics Engineering & Information
Science (ICIEIS2013), 2013, pp. 169-181.
[19] A. Dastanpour, S. Ibrahim, and R. Mashinchi, "Using
Genetic Algorithm to Supporting Artificial Neural Network
for Intrusion Detection System," in The International
Conference on Computer Security and Digital Investigation
(ComSec2014), 2014, pp. 1-13.
[20] B. Luo and J. Xia, "A novel intrusion detection system based
on feature generation with visualization strategy," Expert
Systems with Applications, 2014.
[21] M. K. Siddiqui and S. Naahid, "Analysis of KDD CUP 99
Dataset using Clustering based Data Mining," International
Journal of Database Theory & Application, vol. 6, 2013.
[22] L. M. L. de Campos, R. C. L. de Oliveira, and M.
Roisenberg, "Network Intrusion Detection System Using
Data Mining," in Engineering Applications of Neural
Networks, ed: Springer, 2012, pp. 104-113.
[23] I. Levin, "KDD-99 classifier learning contest: LLSoft's
results overview," SIGKDD explorations, vol. 1, pp. 67-75,
2000.
[24] S. Dhopte and M. Chaudhari, "Genetic Algorithm for
Intrusion Detection System."
[25] Y.-X. Meng, "The practice on using machine learning for
network anomaly intrusion detection," in Machine Learning
and Cybernetics (ICMLC), 2011 International Conference
on, 2011, pp. 576-581.
[26] H. Sarvari and M. M. Keikha, "Improving the accuracy of
intrusion detection systems by using the combination of
machine learning approaches," in Soft Computing and
Pattern Recognition (SoCPaR), 2010 International
Conference of, 2010, pp. 334-337.
[27] R. Sommer and V. Paxson, "Outside the closed world: On
using machine learning for network intrusion detection," in
Security and Privacy (SP), 2010 IEEE Symposium on, 2010,
pp. 305-316.
[28] M. H. Mashinchi, M. R. Mashinchi, and S. M. H.
Shamsuddin, "A Genetic Algorithm Approach for Solving
Fuzzy Linear and Quadratic Equations," World Academy of
Science, Engineering and Technology, vol. 28, 2007.