Information Systems and Networks are subjected to electronic attacks. When
network attacks hit, organizations are thrown into crisis mode. From the IT department to
call centers, to the board room and beyond, all are fraught with danger until the situation is
under control. Traditional methods used to counter these threats (e.g. firewalls,
antivirus software, password protection) do not provide complete security to the system.
This has encouraged researchers to develop Intrusion Detection Systems which are capable
of detecting and responding to such events. This review paper presents a comprehensive
study of Genetic Algorithm (GA) based Intrusion Detection Systems (IDS). It provides a
brief overview of rule-based IDS, elaborates the implementation issues of Genetic Algorithm
and also presents a comparative analysis of existing studies.
law firms and corporates. According to the report published by Symantec Corporation [14] for
November 2013, the number of targeted attacks increased, 438 new vulnerabilities were discovered
(bringing the total for the year to 5965), two zero-day vulnerabilities were discovered and 42 million
identities were exposed. A successful targeted attack on a large company can cost it $2.4 million in
direct financial losses and additional costs. For a medium-sized or small company, a targeted attack can mean
about $92,000 in damages, almost twice as much as an average attack [10]. Therefore, attention shifts to
Intrusion Detection Systems, which monitor network traffic to identify misuse, unauthorized use and
abuse of resources and perform actions as defined by security policies. Intrusion detection systems
perform the following functions:
Monitoring and analysis of user and system activity
Auditing of system configurations and vulnerabilities
Assessing the integrity of critical system and data files
Statistical analysis of activity patterns based on the matching to known attacks
Abnormal activity analysis and Operating system audit
The majority of existing IDSs face a number of challenges, such as low detection rates and high
false alarm rates, and can therefore obstruct legitimate users from accessing network resources. These
problems are due to the sophistication of the attacks and their intended similarity to normal behavior. To
overcome these problems, a Genetic Algorithm based Intrusion Detection System is
employed to enhance the performance of intrusion detection for rare and complicated attacks.
The rest of the paper is organized as follows: Section 2 provides a brief introduction to Intrusion Detection
System. Section 3 describes the implementation issues of Genetic Algorithm. Section 4 describes the
technique of applying Genetic Algorithm to Intrusion Detection System. Section 5 presents the related work
and a comparative analysis of existing studies. Finally, the discussion is concluded.
II. INTRUSION DETECTION SYSTEM
Intrusion detection is the process of identifying and responding to such events which violate the computer
security policies, acceptable use policies or standard security practices. An Intrusion Detection System (IDS)
is a security system which implements the process of intrusion detection and reports the intrusion accurately
to the appropriate authority. The IDS monitors packets from various network connections in order to detect
an intrusive activity [1]. If an intrusion is detected, the IDS either logs a message to the system audit file to
be analyzed later by network security experts, stops such connections to end the intruder's attack, or
performs some other action defined by the organization’s rules and practices to provide security, handle
the intrusion and recover from the damage caused by security breaches [1]. These systems do not always
react in the same way, and false alarms can occur.
A. Components of IDS
The basic architecture of an intrusion detection system is explained below [2] [16] and presented in figure 1:
Data Source: Data sources can be categorized into four categories namely Host-based monitors, Network-
based monitors, Application-based monitors and Target-based monitors.
Data gathering device (sensor): It is responsible for collecting data from the monitored system.
Analysis Engine (detector): This component takes information from the sensors and examines the data in
order to detect attacks. The analysis engine can use various analysis approaches e.g. misuse/signature
based detection or anomaly/statistical detection.
Knowledge base: It is a database which contains information collected by the sensors, but in a preprocessed
format (e.g. knowledge base of attacks and their signatures, filtered data, data profiles, etc.). This
information is usually provided by network and security experts.
Configuration device: It provides information about the current state of the intrusion detection system
(IDS).
Response Manager: The response manager only acts when an intrusion is detected and performs the
necessary action as defined by the security policies of the organization. These actions can be either
automated (active) or involve human interaction (inactive).
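Figure 1. Basic Architecture of Intrusion Detection System [diagram labels: Data Source (Monitored System),
Data gathering (sensors), Analysis Engine, Knowledge base, Configuration, Response Component; flows: raw data,
events, system state, actions]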
B. Characteristics of IDS
An IDS must have the following characteristics [2]:
Prediction performance: Typical measures for evaluating predictive performance of IDS include detection
rate and false alarm rate. Detection rate is defined as the ratio of the number of correctly detected attacks
to the total number of attacks. The false alarm (or false positive) rate is the ratio of the number of normal
connections that are incorrectly classified as attacks to the total number of normal connections. Therefore,
a good IDS must have a high detection rate and a low false positive rate.
Time performance: The total time taken by the IDS to generate an alarm should be as short as possible. The
processing time depends upon the processing speed of the IDS, which is the rate at which the IDS
processes audit events. If this rate is not sufficiently high, then the real time processing of security events
may not be feasible. The propagation time is the time needed for processed information to propagate to
the security analyst. Both times need to be as short as possible in order to allow the security analyst
sufficient time to react to an attack before much damage has been done, as well as to stop an attacker
from modifying audit information or altering the IDS itself.
Fault tolerance: An IDS should be robust, dependable and resistant to attacks and should be able to
recover quickly. This characteristic is very important for the proper functioning of IDSs, since most
commercial IDSs run on operating systems and networks that are vulnerable to different types of attacks.
In addition, an IDS should also be resistant to scenarios in which an adversary can cause the IDS to generate a
large number of false or misleading alarms. Such alarms may easily have a negative impact on the
availability of the system, and the IDS should be able to quickly overcome these obstacles.
Dynamic reconfiguration: The IDS must be dynamically reconfigurable so that the time spent on reconfiguring
the system is as short as possible.
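The two prediction-performance measures defined above can be computed directly from labelled results. The following is a minimal sketch; the function name `detection_metrics` and the toy data are illustrative assumptions, not part of any surveyed system.

```python
def detection_metrics(actual, predicted):
    """Return (detection_rate, false_alarm_rate) for parallel lists of booleans.

    actual[i] is True when connection i really is an attack;
    predicted[i] is True when the IDS flagged connection i as an attack.
    """
    pairs = list(zip(actual, predicted))
    attacks = sum(1 for a, _ in pairs if a)
    normals = len(pairs) - attacks
    detected = sum(1 for a, p in pairs if a and p)
    false_alarms = sum(1 for a, p in pairs if not a and p)
    # Guard against empty classes so the ratios are always defined.
    detection_rate = detected / attacks if attacks else 0.0
    false_alarm_rate = false_alarms / normals if normals else 0.0
    return detection_rate, false_alarm_rate


# Toy data (made up for illustration): 4 attacks and 6 normal connections.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
dr, far = detection_metrics(actual, predicted)
print(f"detection rate = {dr:.2f}, false alarm rate = {far:.2f}")
# prints: detection rate = 0.75, false alarm rate = 0.17
```

Here three of the four attacks are detected and one of the six normal connections raises a false alarm.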
C. Taxonomy of IDS’s
The IDSs are generally classified [9] as shown in figure 2:
Figure 2: Taxonomy of IDSs [diagram: IDS classification by location (host-based IDS, network-based IDS) and
by detection model (misuse detection, anomaly detection)]
By location (or by scope of protection):
Intrusion Detection Systems can be divided into the following two types depending on the location where they
look for intrusive actions:
Host-based IDS (HIDS): A host-based IDS loads a piece of software on the system to be monitored. This
software evaluates the information associated with the system, including the contents of the operating system,
system files and application files. If any critical file is deleted or modified, an alert message is sent to the
administrator for further investigation.
Network-based IDS (NIDS): A network-based IDS identifies intrusive activities by analyzing the stream of
packets which travel across the network.
By detection model:
Intrusion Detection Systems can also be classified into the following categories on the basis of the detection
approach:
Misuse detection (or signature based detection): These systems work by matching user activity with stored
signatures of known attacks. Such detection systems use a predefined knowledge base to check whether
a new network connection matches an entry in it. If it does, the IDS considers this connection a
possible attack and then blocks it.
Anomaly detection (or behavior detection): In this case, the system learns the characteristics of normal
user activities and then uses these characteristics to judge whether a new user's activity is normal or not.
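The difference between the two detection models can be illustrated with a short sketch. The signature fields, the single-feature profile and the three-standard-deviation threshold are all assumptions chosen for illustration; real systems use far richer signatures and profiles.

```python
from statistics import mean, stdev

# Misuse detection: compare a connection against stored signatures.
# The signature below is hypothetical.
SIGNATURES = [
    {"protocol": "tcp", "destination_port": 21, "flag": "S0"},
]


def matches_signature(connection, signatures=SIGNATURES):
    """True when every field of some signature equals the connection's field."""
    return any(
        all(connection.get(key) == value for key, value in signature.items())
        for signature in signatures
    )


# Anomaly detection: learn a profile of normal activity, then flag outliers.
def learn_profile(normal_values):
    """Summarise normal behaviour of one feature by its mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)


def is_anomalous(value, profile, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the learned mean."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma


# Hypothetical feature: connections per minute observed during normal operation.
profile = learn_profile([10, 12, 11, 9, 10, 13, 11, 10])
```

With this profile, `is_anomalous(11, profile)` is false while `is_anomalous(250, profile)` is true. Misuse detection can only recognise what is already in the signature list, whereas the anomaly check can flag unseen behaviour at the cost of possible false alarms.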
III. GENETIC ALGORITHM
The Genetic Algorithm is a probabilistic search algorithm that iteratively transforms a set (called population)
of mathematical objects (typically fixed-length binary character strings called chromosomes), each with an
associated fitness value, into a new population of offspring objects using operations that are patterned after
naturally occurring genetic operations, such as crossover and mutation [8]. The Genetic Algorithm is inspired
by the natural search and selection processes leading to the survival of the fittest [13]. In the last few years,
genetic algorithms have emerged as practical, robust optimization and search methods. Genetic Algorithms
represent an intelligent exploitation of a random search used to solve optimization problems. GAs, although
randomized, exploit historical information to direct the search into the region of better performance within
the search space.
A. Working Principle of GA:
The working principle of GA is explained as follows [17]. A Genetic Algorithm begins with a set of candidate
solutions to the problem. Each solution is represented by a chromosome-like data structure. Solutions from
one population are selected and used to generate a new population, in the hope that
the new population will be better than the old one. Solutions are selected to generate the new population
according to their fitness: the more suitable they are, the more chances they have to reproduce. This is repeated
until some condition (e.g. a fixed number of generations is reached, or a condition on the improvement of the
best solution is met) is satisfied. The pseudo-code for GA is shown below.
Pseudo-code:
BEGIN
  INITIALISE population with random candidate solutions;
  EVALUATE each candidate;
  REPEAT UNTIL (termination condition is satisfied) DO
    1. SELECT parents;
    2. RECOMBINE pairs of parents;
    3. MUTATE the resulting offspring;
    4. EVALUATE new candidates;
    5. SELECT individuals for the next generation;
  OD
END
B. Encoding of solutions as chromosomes:
Before using genetic algorithm to solve any problem it is necessary to encode the potential solutions to that
problem in a form which can be processed by a computer [17]. One common approach is to encode the
solutions as binary strings: sequences of 1’s and 0’s, where each digit represents the value of some aspect of
the solution. Each solution is represented in the form of a chromosome. Different positions in a chromosome
are referred to as genes and are changed randomly within a range during the process of evolution.
Example:
A gene may look like: 1101
A chromosome may look like:
Gene1  Gene2  Gene3  Gene4
1101   1001   1111   1011
Binary string representation of the above chromosome: 1101100111111011
Other methods of encoding include encoding values as integers or real numbers or any element (E11 E3
E7…E1 E15) or list of rules (R1 R2 R3…R22 R23) or any data structure. The selection of the encoding
method depends upon the attributes of the problem to be solved.
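The binary encoding in the example above can be reproduced with a few lines of code. This is a minimal sketch that assumes every gene is a non-negative integer stored in a fixed number of bits; the function names are illustrative.

```python
def encode_chromosome(gene_values, bits_per_gene=4):
    """Concatenate integer gene values into one fixed-length binary string."""
    for value in gene_values:
        if not 0 <= value < 2 ** bits_per_gene:
            raise ValueError(f"{value} does not fit in {bits_per_gene} bits")
    return "".join(format(value, f"0{bits_per_gene}b") for value in gene_values)


def decode_chromosome(bit_string, bits_per_gene=4):
    """Split a binary string back into its integer gene values."""
    if len(bit_string) % bits_per_gene:
        raise ValueError("bit string length is not a multiple of the gene size")
    return [
        int(bit_string[i:i + bits_per_gene], 2)
        for i in range(0, len(bit_string), bits_per_gene)
    ]


# The four genes from the example: 1101, 1001, 1111, 1011.
genes = [0b1101, 0b1001, 0b1111, 0b1011]
chromosome = encode_chromosome(genes)
print(chromosome)  # prints: 1101100111111011
```

Decoding the string with `decode_chromosome(chromosome)` recovers the gene values 13, 9, 15 and 11.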
C. Steps involved in basic Genetic Algorithm:
The various steps involved in GA are explained below [17] and the overall flow chart is presented in figure 3:
Step 1: [Start] Generate random population of ‘n’ chromosomes each representing a different solution to the
problem.
Step 2: [Fitness] Evaluate fitness f(x) of each chromosome ‘x’ in the population.
Step 3: [New population] Generate new population by repeating following steps until the new population is
complete
a. [Selection] Select two parent chromosomes from the population according to their fitness (the higher the
fitness, the greater the chance of selection).
b. [Crossover] With a crossover probability, cross over the parents to generate new offspring.
Crossover could be one-point or multi-point. If no crossover is performed, the offspring are
exact copies of the parents.
c. [Mutation] With a mutation probability, mutate new offspring (i.e. randomly flip some bits).
d. [Accepting] Place new offspring in the new population.
Step 4: [Replace] Use new population for further run of the algorithm.
Step 5: [Test] If the end condition is satisfied, stop and return the best solution in current population.
Step 6: [Loop] Go to Step 2.
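The steps above can be sketched as a small program. This is an illustrative sketch only: the toy fitness function `one_max` (number of 1 bits), the parameter values and the fixed random seed are assumptions, and the sketch remembers the best chromosome seen so far rather than only the best of the current population.

```python
import random


def one_max(chromosome):
    """Toy fitness (an assumption, not an IDS fitness): the number of 1 bits."""
    return sum(chromosome)


def roulette_select(population, fitnesses, rng):
    """Step 3a: fitness-proportionate selection of one parent."""
    if sum(fitnesses) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]


def crossover(parent_a, parent_b, rate, rng):
    """Step 3b: one-point crossover with probability `rate`, else copy the parents."""
    if rng.random() < rate and len(parent_a) > 1:
        point = rng.randint(1, len(parent_a) - 1)
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])
    return parent_a[:], parent_b[:]


def mutate(chromosome, rate, rng):
    """Step 3c: flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def run_ga(fitness, n=30, length=20, crossover_rate=0.8, mutation_rate=0.01,
           generations=100, seed=0):
    rng = random.Random(seed)
    # Step 1: random population of n chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Step 2: evaluate the fitness of every chromosome.
        fitnesses = [fitness(c) for c in population]
        # Step 3: build the new population.
        new_population = []
        while len(new_population) < n:
            a = roulette_select(population, fitnesses, rng)
            b = roulette_select(population, fitnesses, rng)
            child_a, child_b = crossover(a, b, crossover_rate, rng)
            # Step 3d: accept the mutated offspring.
            new_population.append(mutate(child_a, mutation_rate, rng))
            new_population.append(mutate(child_b, mutation_rate, rng))
        # Step 4: replace the old population.
        population = new_population[:n]
        # Step 5: stop when the (toy) optimum is reached.
        best = max(population + [best], key=fitness)
        if fitness(best) == length:
            break
    return best


best = run_ga(one_max)
print(one_max(best), best)
```

The crossover rate, mutation rate and population size used here are arbitrary; as noted below, suitable values depend on the problem being solved.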
Figure 3: Overall flow of GA [flowchart: Start; Generate random population; Apply fitness function; Optimization
criteria met? If yes: Result. If no: Selection, Crossover, Mutation, then the fitness function is applied again]
The basic genetic algorithm is straightforward, but applying it to a real problem can be complex. The values of
various parameters (for example, mutation rate, crossover rate, population size, chromosome size, number of
generations, and selection process) need to be selected by considering the attributes of the
problem being solved. A Genetic Algorithm is typically used when alternative solutions are too slow or
too complicated, when an exploratory tool is required to examine new approaches, or when the benefits of GA
meet key problem requirements. The advantages of using a Genetic Algorithm are [8]:
Always gives an answer
Answer gets better with time
Inherently parallel
Easily re-trainable
Multiple ways to speed up and improve a GA-based application as knowledge about the problem domain is
gained
Easy to exploit previous or alternate solutions
The different operators used in a genetic algorithm help it avoid getting stuck in local maxima
D. Limitations of Genetic Algorithm
Genetic algorithms are efficient, but in practice they have certain limitations:
It is not always easy to find a fitness function.
Representing a problem space in genetic algorithms is very complex.
It is a tough task to choose the optimal parameters for a genetic algorithm.
Genetic algorithms need a large number of fitness function evaluations.
It is not easy to configure a genetic algorithm based system.
IV. GENETIC ALGORITHM BASED SYSTEM MODEL
Genetic Algorithm can be used in different ways in intrusion detection systems. If the Intrusion Detection
System is modelled as a rule-based system, then GA can be considered as a tool to generate rules for the
rule-based IDS. The goal of the system is not to evolve a single best rule (global optimum), but to create a set of
rules which is good enough to detect attacks. The system works by analysing network connections.
Figure 4 describes the overall flow of GA based IDS. The system works in two phases: a training phase and a
testing phase.
A. Training phase
In this phase, a set of classification rules is generated from network audit data using Genetic Algorithm in an
offline environment. The training data set contains analysed connection logs in which normal connections and
attacks are clearly distinguished. Examples of such data sets include KDD Cup99 and
DARPA. The records from the training data set are represented in the form of chromosomes. Each
chromosome is a rule within which certain features of a connection are encoded as a fixed-length
vector. A fitness function is then applied to each chromosome in order to evaluate its goodness. If a
chromosome helps to identify an attack correctly, it is considered good (or fit); otherwise it is considered bad.
Crossover and mutation operations are applied to the good chromosomes in order to produce a new generation.
This entire process is repeated using the newly generated population. This process of evolution continues
until a solution is reached (i.e. a set of rules capable of detecting attacks is generated). The generated rules
are stored in a rule base in the following form:
if { condition } then { act }
For example, a rule can be defined as [1]:
if {the connection has following information: source IP address 124.12.5.18; destination IP address:
130.18.206.55; destination port number: 21; connection time: 10.1 seconds} then {stop the connection}
Explanation: if there is a network connection request with source IP address 124.12.5.18, destination IP
address 130.18.206.55, destination port number 21, and connection time 10.1 seconds, then stop the
connection establishment, since IP address 124.12.5.18 is recognized by the IDS as a blacklisted IP address.
Thus, a service request initiated from it is rejected.
The various steps involved in training phase are [1]:
1. Encoding of connections – Consider the following case [13], where six features of a network connection
are used to identify an attack. The dataset used in this case is the DARPA dataset, which contains 7 features
of a connection including the attack name. Normal connections contain no attack name. Each
chromosome is a rule within which the 7 features are encoded as a fixed-length vector, and each feature is
encoded as one or more genes of different types, as shown in Table I below.
TABLE I. CHROMOSOME REPRESENTATION OF A RULE
Sr. no.  Feature               Feature explanation                              Format   Number of genes
1.       Duration              Time period of the connection                    H:M:S    3
2.       Protocol              Protocol used for making connection              Numeric  1
3.       Source Port           Application that the attacker system is running  Numeric  1
4.       Destination Port      Application that the target system is running    Numeric  1
5.       Source IP             Attacker system’s IP address                     a.b.c.d  4
6.       Destination IP        Target system’s IP address                       a.b.c.d  4
7.       Attack name and type  Name of the attack                               String   1
Each rule uses an if-then clause with a “condition” and an “outcome” part. The first six features are connected
via logical AND to form the “condition” part, while the attack name is the “outcome”, which gives the
classification of a network record (during training) or of a connection (during intrusion detection) when the
rule is matched. For example, consider the following rule [13]:
if (duration=“0:0:1” and protocol=“finger” and source_port=18982 and destination_port=79 and
source_ip=“9.9.9.9” and destination_ip=“172.16.112.50”) then (attack_name=“neptune”)
The above rule expresses that if a network packet originates from IP address 9.9.9.9 and port 18982, is
sent to IP address 172.16.112.50 and port 79 using the protocol finger, and the connection duration is 1
second, then it is most likely a network attack of type neptune, which may eventually put the destination host
out of service. The above rule can be represented as follows:
{0, 0, 1, 2, 18982, 79, 9, 9, 9, 9, 172, 16, 112, 50, 1}
2. Evaluating each chromosome using fitness function – During the training phase, evaluation of
chromosomes is carried out in order to determine their goodness. If a chromosome correctly classifies an
attack, it is considered good; else, it is bad and is not selected for crossover to produce offspring. Thus, a
chromosome which detects more attacks has higher fitness value and has higher chances for selection. The
different fitness models proposed by various researchers are: support and confidence model, reward-penalty
model, weighted sum model etc.
3. Selection – Different selection methods are used to choose the chromosomes, e.g. fitness-proportionate
selection, roulette-wheel selection, rank selection, local selection, tournament selection and steady
state selection [6].
4. Crossover – With a crossover probability, cross over the parents to generate new offspring. Crossover can
be one-point or multi-point. If no crossover is performed, the offspring are exact copies of the parents.
5. Mutation – Each gene in a chromosome may or may not change, depending on the mutation
rate. Mutation improves the population diversity that the search needs.
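The encoding and fitness-evaluation steps can be sketched as follows. This is a minimal illustration that assumes the 15-gene layout of Table I and the support-confidence model, F = w1*support + w2*confidence with support = |A and B| / N and confidence = |A and B| / |A|. The lookup tables contain only the two codes that appear in the example rule (finger as 2, neptune as 1); the code 0 for normal records, the weight values and the toy training records are assumptions.

```python
# Hypothetical lookup tables: only finger -> 2 and neptune -> 1 come from the
# example rule in the text; any other code would be an assumption.
PROTOCOL_CODES = {"finger": 2}
ATTACK_CODES = {"neptune": 1}


def encode_rule(duration, protocol, source_port, destination_port,
                source_ip, destination_ip, attack_name):
    """Encode a rule as the 15-gene vector laid out in Table I."""
    genes = [int(part) for part in duration.split(":")]            # H, M, S
    genes.append(PROTOCOL_CODES[protocol])                         # protocol
    genes += [source_port, destination_port]                       # ports
    genes += [int(octet) for octet in source_ip.split(".")]        # a.b.c.d
    genes += [int(octet) for octet in destination_ip.split(".")]   # a.b.c.d
    genes.append(ATTACK_CODES[attack_name])                        # outcome
    return genes


def fitness(chromosome, training_records, w1=0.2, w2=0.8):
    """Support-confidence fitness; the weights w1 and w2 are arbitrary here."""
    condition, outcome = chromosome[:-1], chromosome[-1]
    n = len(training_records)
    # |A|: records whose first 14 genes match the rule's condition.
    matches_a = [r for r in training_records if r[:-1] == condition]
    # |A and B|: of those, records whose label matches the rule's outcome.
    matches_ab = [r for r in matches_a if r[-1] == outcome]
    support = len(matches_ab) / n if n else 0.0
    confidence = len(matches_ab) / len(matches_a) if matches_a else 0.0
    return w1 * support + w2 * confidence


rule = encode_rule("0:0:1", "finger", 18982, 79,
                   "9.9.9.9", "172.16.112.50", "neptune")
print(rule)
# prints: [0, 0, 1, 2, 18982, 79, 9, 9, 9, 9, 172, 16, 112, 50, 1]

# Toy training data: two matching attacks, one matching record labelled
# normal (code 0, an assumption) and one unrelated normal record.
records = [
    rule[:],
    rule[:],
    rule[:-1] + [0],
    [0, 0, 5, 2, 1024, 80, 10, 0, 0, 1, 172, 16, 112, 50, 0],
]
print(round(fitness(rule, records), 4))
```

For the toy records, N = 4, |A| = 3 and |A and B| = 2, so support is 0.5 and confidence is 2/3.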
B. Testing phase
In this phase, the rules stored in the rule base are used to detect whether a real-time network connection is a
normal connection or an intrusive attack. If the characteristics of a new connection match the ‘condition’
section of some pre-defined rule in the rule-base, then the connection is considered an attack; otherwise it is
considered a normal connection. If an attack is detected, the IDS performs the necessary actions defined
by the security policies of the organization. The algorithm for GA-based IDS is presented below.
Algorithm: Intrusion Detection [1]
Input: Inflowing network connection
Output: Decision if connection is intrusive or not
1: Loop Forever {fetch incoming packet}
2: for each rule in rule-base
3: Match rule with network connection (analysis console)
4: if rules match then
5: Mark current connection as an intrusion (and generate an alarm as per security policies)
6: end if
7: end for each
8: end loop forever.
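The detection loop above can be sketched as follows. This is a simplified illustration: it iterates over a finite list of connections instead of looping forever, it represents each rule as a dictionary of feature values paired with an action, and it stops at the first matching rule to avoid duplicate alarms. These choices, and the second connection in the toy traffic, are assumptions; the rule itself is the example rule given in the training phase.

```python
def detect(connections, rule_base):
    """Yield (connection, action) for every connection that matches a rule.

    Each rule is a (condition, action) pair; the condition is a dict of
    feature values that must all equal the connection's values.
    """
    for connection in connections:                  # fetch incoming connection
        for condition, action in rule_base:         # for each rule in rule-base
            if all(connection.get(key) == value
                   for key, value in condition.items()):
                yield connection, action             # mark as an intrusion
                break                                # one alarm per connection


RULE_BASE = [
    ({"source_ip": "124.12.5.18", "destination_ip": "130.18.206.55",
      "destination_port": 21, "connection_time": 10.1},
     "stop the connection"),
]

traffic = [
    {"source_ip": "124.12.5.18", "destination_ip": "130.18.206.55",
     "destination_port": 21, "connection_time": 10.1},
    {"source_ip": "10.0.0.7", "destination_ip": "130.18.206.55",
     "destination_port": 80, "connection_time": 0.4},
]

alarms = list(detect(traffic, RULE_BASE))
for connection, action in alarms:
    print(connection["source_ip"], "->", action)
```

Only the first connection matches the rule, so a single alarm is raised and the second connection is treated as normal.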
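Figure 4: Overall flow of GA based IDS [flowchart: Start; evolution of rules using Genetic Algorithm (input:
training dataset); analysis of new connections using rules from the rule-base (input: testing dataset); Attack
detected? If yes: Alert]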
V. RELATED WORK
The Intrusion Detection System has undergone rapid changes and now uses newly evolved techniques to
generate better results. Genetic Algorithm can be used in different ways in Intrusion Detection Systems.
The Genetic Algorithm based intrusion detection approach discussed in this review paper focuses on a rule
based Intrusion Detection System which uses only Genetic Algorithm to generate knowledge. For this
purpose network connections are analysed to describe the normal and abnormal behaviour in the network.
This section briefly summarizes some of the GA based IDSs and presents a comparative analysis of various
existing studies in Table II.
An early effort to use GAs for intrusion detection dates back to 1995, when Crosbie and Spafford
[12] applied multiple agent technology and GP (Genetic Programming) to detect network anomalies.
Each agent monitors one parameter of the network audit data and GP is used to find the set of agents that
collectively determine anomalous network behaviors. This method has the advantage of using many small
autonomous agents, but the communication among them is still a problem. Also, the training process can be
time consuming if the agents are not appropriately initialized.
Wei Li [11] proposes a GA-based method to detect anomalous network behaviors. This implementation of
genetic algorithm is unique as it considers both temporal and spatial information of network connections in
encoding the network connection information into rules in IDS. This may lead to increased detection rates.
However, no experimental results are available yet.
Ren Hui Gong, Mohammad Zulkernine and Purang Abolmaesumi [13] present a method of applying Genetic
Algorithm for intrusion detection. Seven network features including both categorical and quantitative data
fields are used when encoding and deriving the rules. A simple but efficient and flexible fitness function, i.e.
the support-confidence framework, is used to judge the quality of each rule. Depending on the selection of
fitness function weight values, the generated rules can be used either to detect network intrusions in general or
to classify the types of intrusions precisely. The method has been implemented using Java and the third party
package ECJ. The implementation has been tested using subsets of the 1998 DARPA dataset. Experimental
results show that the proposed method works efficiently and is flexible enough to be used in different ways.
However, some limitations of the method are also observed. First, the generated rules are biased towards the
training dataset. This issue may be resolved by carefully selecting either the number of generations in the
training phase or the number of top best-fit rules in the intrusion detection phase. Second, while the
support-confidence framework is simple to implement and improves the accuracy of the final rules, it requires the
whole training data to be loaded into memory before any computation. For large training datasets, this is neither
efficient nor feasible. Some form of caching may solve the problem.
Anup Goyal and Chetan Kumar [3] describe a GA based IDS to classify different types of network attacks
with very low false positive rate (at 0.2%) and almost 100% detection rate. The algorithm takes into
consideration different features of network connections such as type of protocol, network service on the
destination and status of the connection to generate a classification rule set. Each rule in the rule set identifies a
particular attack. The fitness function is designed to be biased towards individuals that
correctly classify only the attack connections. The experiments are performed on the KDDCup99 data set.
The generated rule set consists of six rules that can be applied to the IDS to identify and classify six different
types of attack connections that fall into two classes, namely Denial of Service (DoS) and Probing attacks.
The GALIB C++ library, which is especially suited to developing GAs, is used to implement the proposed system.
Bader and Nasereddin [5] discuss a technique of using Genetic Algorithm for Intrusion Detection System.
This implementation considers both temporal and spatial information of network connections in encoding the
network connection information into rules in IDS. The network traffic used for implementing GA is a
pre-classified data set that differentiates normal network connections from anomalous ones. This data set is
gathered using network sniffers (programs that record network traffic without doing anything harmful)
such as Tcpdump or Snort. The data set is manually classified based on the knowledge of experts. The rules
generated are good enough for filtering new network traffic. The attributes of network connections
which are used for generating rules are: source IP address, destination IP address, source port number,
destination port number, duration, state, protocol, number of bytes sent by the originator and number of bytes
sent by the responder.
B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj and T. Bharat [4] suggest an intrusion detection system
using genetic algorithm to generate a rule set for eight types of attacks belonging to four categories. The
proposed architecture uses the KDDCUP99 dataset. The dataset contains 41 features, of which only 3
have been used to specify each entry of the dataset. The architecture of the system and the software
implementation of the proposed technique are also discussed. The system created a specified set of rules and
achieved high detection rates for DoS (Denial of Service), R2L (Remote to Local), U2R (User to Root) and Probe
attacks. The average success rate achieved during experiments is 83.65%. The proposed system is flexible enough
for use in different application areas. The proposed system is implemented using C# in the .NET suite.
Firas Alabsi and Reyadh Naoum [7] recommend a new fitness function using the Reward-Penalty technique to
evaluate the chromosomes efficiently. A 5% subset of the KDDCUP’99 data has been used for the proposed
system. The proposed fitness function works on the principle that reward and penalty are proportionate to the
strength and weakness of chromosomes. In order to validate the new fitness function, the results
of the reward-penalty model based fitness function are compared with the results of the support-confidence
model based fitness function. The results closely match each other. The system has been built using
VB.NET 2010 and SQL Server 2008.
A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs) and F.O. Aghware [1] present a genetic algorithm
based approach which uses rules derived from network audit data for network intrusion detection. The fitness
function utilized is based on the support-confidence framework. The fitness function is simple, efficient and
flexible. The training and testing data set used is the 1998 DARPA data set from MIT Lincoln Laboratory. The study
implemented GA based IDS using the C programming language on the Linux operating system. However,
some limitations of the method are also noted. First, the generated rules are biased towards the training dataset.
This issue may be resolved by carefully selecting either the number of generations in the training phase or the
number of top best-fit rules in the intrusion detection phase. Second, while the support-confidence framework
is simple to implement and improves the accuracy of the final rules, it requires the whole training data to be
loaded into memory before any computation. For large training datasets, this is neither efficient nor feasible.
Some form of caching may solve the problem.
V. Moraveji Hashmei, Z. Muda and W. Yassin [15] present a genetic algorithm based intrusion detection
system. Software implementation of the proposed system is presented. The system is flexible enough to be
used in different application environments, if proper attack taxonomy and proper training dataset exist. High
detection rate and low false positive rates are the highlights of the proposed system. The proposed system can
be applied for intrusion detection without using any complementary technique that is commonly used with
other soft-computing techniques. KDDCUP’99 dataset is used for training phase.
Bharat S. Dhak and Shrikant Lade [6] present a genetic algorithm based intrusion detection technique to
detect malicious packets on the network and ultimately help to block the respective IP addresses. The Genetic
Algorithm process is discussed in detail. The training is done on the predefined data rules. The testing is done
on the entries generated by the machine's firewall system in the pfirewall.log file. The proposed system can be
integrated with any IDS to improve its efficiency and performance.
M. Sadiq Ali Khan [18] designed a rule-based Intrusion Detection System to detect DoS (Denial of Service)
or Probing attacks by formulating the contributing parameters in terms of rules. Genetic algorithm is used to
devise these rules. In this study, the KDD-99 data set is used with a reduced set of attributes. Principal Component
Analysis is used to reduce the data set. By running the GA more than 2000 times, the proposed system
managed to achieve 91% accuracy in detecting network attacks.
TABLE II. COMPARISON OF EXISTING STUDIES ON GA BASED IDS

Reference: A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs), F.O. Aghware [1]
Detection approach: Misuse analysis
Fitness function (F): Support and confidence model; F = w1*support + w2*confidence
Explanation of fitness function: For a rule "if A then B", support = |A and B| / N and confidence = |A and B| / |A|, where N = number of connections in the training data; |A| = number of connections matching condition A; |A and B| = number of connections matching the rule "if A then B"; w1, w2 = weights to balance/control the two terms.
Remarks: Uses 7 network features, so detecting millions of connections requires high processing speed and sufficient cache; 97% of the attacks detected correctly by this system.

Reference: B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj, T. Bharat [4]
Detection approach: Misuse analysis
Fitness function (F): Fitness = f(x) / f(sum)
Explanation of fitness function: f(x) is the fitness of entity x and f(sum) is the total fitness of all entities.
Remarks: Uses only 3 network features; 83.65% average success rate; the process is faster and can be applied to high speed networks.

Reference: Bharat S. Dhak, Shrikant Lade [6]
Detection approach: Misuse analysis
Fitness function (F): F = weight*packet_size
Explanation of fitness function: packet_size is the actual packet data size prescribed by the incoming packet data stream and weight is the vector which is applied to each chromosome.
Remarks: Scope of the experiment is limited to generating a list of vulnerable IP addresses; achieved 96% accuracy.

Reference: Firas Alabsi, Reyadh Naoum [7]
Detection approach: Misuse analysis
Fitness function (F): Reward-penalty model; F = 2 + ((AB - A)/(AB + A)) + (AB/X) - (A/Y)
Explanation of fitness function: For a rule "if A then B", ((AB - A)/(AB + A)) = strength of a record; AB/X = ratio of the strength of the record to the strength of the strongest record; A/Y = ratio of the weakness of a record to the weakness of the weakest record.
Remarks: Uses 5 network features; the fitness function rewards good chromosomes and penalizes bad chromosomes; a comparison between the newly proposed and other existing fitness functions is presented.

Reference: Wei Li [11]
Detection approach: Anomaly detection
Fitness function (F): Weighted sum model; F = 1 - penalty
Explanation of fitness function: The fitness function is determined by calculating the general outcome, absolute difference and penalty values.
Remarks: Considers both temporal and spatial features of a network connection to detect an attack; no experimental results reported.

Reference: V. Moraveji Hashmei, Z. Muda, W. Yassin [15]
Detection approach: Misuse analysis
Fitness function (F): F = (a/A) - (b/B)
Explanation of fitness function: a = number of correctly detected attacks; A = total number of attacks in the training dataset; b = number of normal connections that are falsely detected as attacks; B = total number of normal connections.
Remarks: Uses only 3 network features; fast processing, so it can be applied to high speed networks; high detection rate and low false positives (95.62% detection rate, 4.37% false alarm rate); can be used without any complementary technique.
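Three of the fitness functions compared in Table II are simple enough to sketch directly. The code below is an illustrative reading of the support and confidence model [1], the relative fitness f(x) / f(sum) [4] and the detection-rate model F = (a/A) - (b/B) [15]; the function names, the default weights and the toy connection records are assumptions and do not come from the cited studies.

```python
from typing import Callable, Dict, List, Sequence

Connection = Dict[str, object]
Predicate = Callable[[Connection], bool]


def support_confidence_fitness(cond_a: Predicate, cond_b: Predicate,
                               data: Sequence[Connection],
                               w1: float = 0.5, w2: float = 0.5) -> float:
    """F = w1*support + w2*confidence for the rule 'if A then B'.

    The default weights are arbitrary placeholders.
    """
    n = len(data)
    count_a = sum(1 for c in data if cond_a(c))
    count_ab = sum(1 for c in data if cond_a(c) and cond_b(c))
    support = count_ab / n if n else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return w1 * support + w2 * confidence


def relative_fitness(scores: Sequence[float]) -> List[float]:
    """f(x) / f(sum): each individual's share of the total fitness."""
    total = sum(scores)
    if total == 0:
        # Assumed convention: equal shares when no individual scores.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def detection_fitness(a: int, A: int, b: int, B: int) -> float:
    """F = (a/A) - (b/B): detection rate minus false alarm rate."""
    return a / A - b / B


# Hypothetical training records: which service was used, and whether
# the connection was labelled as an attack.
TOY = [
    {"service": "telnet", "attack": True},
    {"service": "telnet", "attack": True},
    {"service": "telnet", "attack": False},
    {"service": "http", "attack": False},
]

rule_score = support_confidence_fitness(
    lambda c: c["service"] == "telnet",  # condition A
    lambda c: bool(c["attack"]),         # consequence B
    TOY,
)
shares = relative_fitness([1.0, 3.0])
rate_score = detection_fitness(a=90, A=100, b=5, B=100)
```

On the toy records, condition A matches three connections and the whole rule matches two, so support is 2/4 and confidence is 2/3. The weights w1 and w2 shift the balance between rules that cover many connections and rules that are rarely wrong.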
VI. CONCLUSION
Three factors affect the effectiveness of a genetic algorithm: the selection of the fitness function, the
representation of individuals and the values of the GA parameters. The appropriate choice for each factor
often depends on the application. Designing an accurate fitness function is the major challenge in solving a
particular problem, and different models for designing fitness functions have been discussed in this paper. The
reviewed studies indicate that GA is a cost-effective approach to intrusion detection. A major advantage of the
technique is that, in the real world, the types of intrusion change and grow more complicated very rapidly,
and a GA based detection system can add new rules and update existing ones as new intrusions
become known. It is therefore both cost-effective and adaptive.
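The three factors named above can be seen side by side in a small sketch. The code below evolves a single detection threshold on toy data: the chromosome is a bit string (representation), the fitness is detection rate minus false alarm rate in the style of [15] (fitness function), and the constants at the top are the GA parameters. The data, the parameter values, and the choice of tournament selection, one-point crossover and elitism are all assumptions made for illustration, not a method taken from any of the reviewed studies.

```python
import random
from typing import List

# Hypothetical labelled data: (connections per second, is_attack).
DATA = [(5, False), (8, False), (12, False), (20, False),
        (150, True), (220, True), (300, True), (410, True)]

# Factor 3: GA parameters (illustrative values only).
POP_SIZE, GENERATIONS, MUTATION_RATE, BITS = 20, 30, 0.05, 9


def decode(chrom: List[int]) -> int:
    """Factor 2: representation. The bits encode a threshold 0..511."""
    return int("".join(str(bit) for bit in chrom), 2)


def fitness(chrom: List[int]) -> float:
    """Factor 1: fitness. Detection rate minus false alarm rate."""
    threshold = decode(chrom)
    attacks = [x for x, is_attack in DATA if is_attack]
    normals = [x for x, is_attack in DATA if not is_attack]
    detected = sum(1 for x in attacks if x >= threshold)
    false_alarms = sum(1 for x in normals if x >= threshold)
    return detected / len(attacks) - false_alarms / len(normals)


def evolve(seed: int = 0) -> List[int]:
    """Run the GA and return the fittest chromosome found."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(BITS)]
           for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:2]  # elitism: carry over the two best unchanged
        while len(next_pop) < POP_SIZE:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randrange(1, BITS)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability MUTATION_RATE per bit.
            child = [bit ^ 1 if rng.random() < MUTATION_RATE else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


best = evolve(seed=0)
best_threshold = decode(best)
```

Changing any one of the three factors, for example widening the chromosome to encode several attributes or swapping in a different fitness model from Table II, changes what the same loop converges to, which is why these choices are application dependent.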
REFERENCES
[1] A.A. Ojugo, A.O. Eboka, O.E. Okonta, R.E Yoro (Mrs), F.O. Aghware, “Genetic Algorithm Rule-Based Intrusion
Detection System (GAIDS)”, Journal of Emerging Trends in Computing and Information Sciences, Vol. 3,
pp. 1182-1194, Aug. 2012.
[2] Aleksandar Lazarevic, Vipin Kumar, Jaideep Srivastava, “Intrusion Detection: A Survey”, unpublished.
[3] Anup Goyal, Chetan Kumar, “GA-NIDS: A Genetic Algorithm based Network Intrusion Detection System”, 2008.
[4] B. Uppalaiah, K. Anand, B. Narsimha, S. Swaraj, T. Bharat, “Genetic Algorithm Approach to Intrusion Detection
System”, IJCST, Vol. 3, Issue 1, Jan.-March 2012.
[5] Bader and Nasereddin, “Using Genetic Algorithm in Network Security”, IJRRAS, Vol. 5, pp. 148-154, Nov. 2010.
[6] Bharat S. Dhak, Shrikant Lade, “An Evolutionary Approach to Intrusion Detection System using Genetic
Algorithm”, ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 2, Issue 12, Dec. 2012.
[7] Firas Alabsi, Reyadh Naoum, “Fitness Function for Genetic Algorithm used in Intrusion Detection
System”, International Journal of Applied Science and Technology, Vol. 2, pp. 632-637, April 2012.
[8] GA tutorial, available at:
http://www.vit.ac.in/academicresearch/res701/RES701DUMP/Evolutionary%20Algorithms/GATutorial.pdf
[9] Kamal Kishore Prasad, Samarjeet Borah, “Use of Genetic Algorithms in Intrusion Detection Systems: An Analysis”,
International Journal of Applied Research and Studies (iJARS), ISSN: 2278-9480, Volume 2, Issue 8, Aug. 2013.
[10] Kaspersky Lab, “Global Corporate IT Security Risks: 2013”, May 2013.
[11] Wei Li, “Using Genetic Algorithm for Network Intrusion Detection”, 2004.
[12] M. Crosbie, E. Spafford, “Applying Genetic Programming to Intrusion Detection”, Proceedings of the AAAI Fall
Symposium, 1995.
[13] Ren Hui Gong, Mohammad Zulkernine, Purang Abolmaesumi, “A Software Implementation of a Genetic Algorithm
Based Approach to Network Intrusion Detection”, Proceedings of the Sixth International Conference on Software
Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International
Workshop on Self-Assembling Wireless Networks (SNPD/SAWN’05), IEEE, 2005.
[14] Ben Nahorney, “Symantec Intelligence Report: November 2013”.
[15] V. Moraveji Hashmei, Z. Muda, W. Yassin, “Improving Intrusion Detection using Genetic Algorithm”,
Information Technology Journal, 12(11), pp. 2167-2173, 2013.
[16] Mohammad Sazzadul Hoque, Md. Abdul Mukit, Md. Abu Naser Bikas, “An Implementation of Intrusion
Detection System using Genetic Algorithm”, International Journal of Network Security & Its Applications (IJNSA),
Vol. 4, No. 2, March 2012.
[17] RC Chakraborty, “Fundamentals of Genetic Algorithm: AI Course”, June 2010, available at:
http://www.myreaders.info/09-Genetic_Algorithms.pdf
[18] M. Sadiq Ali Khan, “Rule based Network Intrusion Detection using Genetic Algorithm”, International Journal of
Computer Applications (0975 – 8887), Volume 18, No. 8, March 2011.