SlideShare uma empresa Scribd logo
1 de 15
Baixar para ler offline
IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                    VOL. 7,      NO. 1,    JANUARY-MARCH 2010                                      35




               Layered Approach Using Conditional
               Random Fields for Intrusion Detection
                             Kapil Kumar Gupta, Baikunth Nath, Senior Member, IEEE, and
                                       Ramamohanarao Kotagiri, Member, IEEE

       Abstract—Intrusion detection faces a number of challenges; an intrusion detection system must reliably detect malicious activities
       in a network and must perform efficiently to cope with the large amount of network traffic. In this paper, we address these two issues
       of Accuracy and Efficiency using Conditional Random Fields and Layered Approach. We demonstrate that high attack detection
       accuracy can be achieved by using Conditional Random Fields and high efficiency by implementing the Layered Approach.
       Experimental results on the benchmark KDD ’99 intrusion data set show that our proposed system based on Layered Conditional
       Random Fields outperforms other well-known methods such as the decision trees and the naive Bayes. The improvement in attack
       detection accuracy is very high, particularly, for the U2R attacks (34.8 percent improvement) and the R2L attacks (34.5 percent
       improvement). Statistical Tests also demonstrate higher confidence in detection accuracy for our method. Finally, we show that our
       system is robust and is able to handle noisy data without compromising performance.

       Index Terms—Intrusion detection, Layered Approach, Conditional Random Fields, network security, decision trees, naive Bayes.

                                                                                 Ç

1    INTRODUCTION
                                                                                     The signature-based systems are trained by extracting specific
I   NTRUSION detection as defined by the SysAdmin, Audit,
   Networking, and Security (SANS) Institute is the art of
detecting inappropriate, inaccurate, or anomalous activity
                                                                                     patterns (or signatures) from previously known attacks while
                                                                                     the anomaly-based systems learn from the normal data
[6]. Today, intrusion detection is one of the high priority and                      collected when there is no anomalous activity [11].
challenging tasks for network administrators and security                                Another approach for detecting intrusions is to consider
professionals. More sophisticated security tools mean that the                       both the normal and the known anomalous patterns for
attackers come up with newer and more advanced penetra-                              training a system and then performing classification on the
tion methods to defeat the installed security systems [4] and                        test data. Such a system incorporates the advantages of both
[24]. Thus, there is a need to safeguard the networks from                           the signature-based and the anomaly-based systems and is
known vulnerabilities and at the same time take steps to                             known as the Hybrid System. Hybrid systems can be very
detect new and unseen, but possible, system abuses by                                efficient, subject to the classification method used, and can
developing more reliable and efficient intrusion detection                           also be used to label unseen or new instances as they assign
systems. Any intrusion detection system has some inherent                            one of the known classes to every test instance. This is
requirements. Its prime purpose is to detect as many attacks                         possible because during training the system learns features
as possible with minimum number of false alarms, i.e., the                           from all the classes. The only concern with the hybrid method
system must be accurate in detecting attacks. However, an                            is the availability of labeled data. However, data requirement
accurate system that cannot handle large amount of network                           is also a concern for the signature- and the anomaly-based
traffic and is slow in decision making will not fulfill the                          systems as they require completely anomalous and attack-
purpose of an intrusion detection system. We desire a system                         free data, respectively, which are not easy to ensure.
that detects most of the attacks, gives very few false alarms,                           The rest of this paper is organized as follows: In Section 2,
copes with large amount of data, and is fast enough to make                          we discuss the related work with emphasis on various
real-time decisions.                                                                 methods and frameworks used for intrusion detection. We
   Intrusion detection started in around 1980s after the                             describe the use of Conditional Random Fields (CRFs) for
influential paper from Anderson [10]. Intrusion detection                            intrusion detection [23] in Section 3 and the Layered
systems are classified as network based, host based, or
                                                                                     Approach [22] in Section 4. We then describe how to
application based depending on their mode of deployment
                                                                                     integrate the Layered Approach and the CRFs in Section 5.
and data used for analysis [11]. Additionally, intrusion
                                                                                     In Section 6, we give our experimental results and compare
detection systems can also be classified as signature based or
                                                                                     our method with other approaches that are known to
anomaly based depending upon the attack detection method.
                                                                                     perform well. We observe that our proposed system,
                                                                                     Layered CRFs, performs significantly better than other
. The authors are with the Department of Computer Science and Software               systems. We study the robustness of our method in Section 7
  Engineering, and NICTA Victoria Research Laboratory, The University of             by introducing noise in the system. We discuss feature
  Melbourne, Parkville 3010, Australia.                                              selection in Section 8 and draw conclusions in Section 9.
  E-mail: {kgupta, bnath, rao}@csse.unimelb.edu.au.
Manuscript received 6 Mar. 2007; revised 11 Dec. 2007; accepted 28 Jan.
2008; published online 12 Mar. 2008.                                                 2     RELATED WORK
For information on obtaining reprints of this article, please send e-mail to:
tdsc@computer.org, and reference IEEECS Log Number TDSC-2007-03-0031.                The field of intrusion detection and network security has
Digital Object Identifier no. 10.1109/TDSC.2008.20.                                  been around since late 1980s. Since then, a number of
                                               1545-5971/10/$26.00 ß 2010 IEEE       Published by the IEEE Computer Society
                     Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
36                                   IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                     VOL. 7,   NO. 1, JANUARY-MARCH 2010


methods and frameworks have been proposed and many                          often hard to select the best possible architecture for a neural
systems have been built to detect intrusions. Various                       network. Support vector machines have also been used for
techniques such as association rules, clustering, naive Bayes               detecting intrusions [26]. Support vector machines map real-
classifier, support vector machines, genetic algorithms,                    valued input feature vector to a higher dimensional feature
artificial neural networks, and others have been applied to                 space through nonlinear mapping and can provide real-time
detect intrusions. In this section, we briefly discuss these                detection capability, deal with large dimensionality of
techniques and frameworks.                                                  data, and can be used for binary-class as well as multiclass
   Lee et al. introduced data mining approaches for                         classification. Other approaches for detecting intrusion
detecting intrusions in [30], [31], and [32]. Data mining                   include the use of genetic algorithm and autonomous and
approaches for intrusion detection include association rules                probabilistic agents for intrusion detection [1] and [5]. These
and frequent episodes, which are based on building                          methods are generally aimed at developing a distributed
classifiers by discovering relevant patterns of program                     intrusion detection system.
and user behavior. Association rules [8] and frequent                           To overcome the weakness of a single intrusion detection
episodes are used to learn the record patterns that describe                system, a number of frameworks have been proposed, which
user behavior. These methods can deal with symbolic data,                   describe the collaborative use of network-based and host-
and the features can be defined in the form of packet and                   based systems [45]. Systems that employ both signature-
connection details. However, mining of features is limited                  based and behavior-based techniques are discussed in [19]
to entry level of the packet and requires the number of                     and [41]. In [32], the authors describe a data mining frame-
records to be large and sparsely populated; otherwise, they                 work for building adaptive intrusion detection models. A
tend to produce a large number of rules that increase the                   distributed intrusion detection framework based on mobile
complexity of the system [7].                                               agents is discussed in [12].
   Data clustering methods such as the k-means and the fuzzy                    The most closely related work, to our work, is of Lee et al.
c-means have also been applied extensively for intrusion                    [30], [31], and [32]. They, however, consider a data mining
detection [36] and [39]. One of the main drawbacks of the                   approach for mining association rules and finding frequent
clustering technique is that it is based on calculating numeric             episodes in order to calculate the support and confidence of
distance between the observations, and hence, the observa-                  the rules separately. Instead, in our work, we define features
tions must be numeric. Observations with symbolic features                  from the observations as well as from the observations and
cannot be easily used for clustering, resulting in inaccuracy.              the previous labels and perform sequence labeling via the
In addition, the clustering methods consider the features                   CRFs to label every feature in the observation. This setting is
independently and are unable to capture the relationship                    sufficient for modeling the correlation between different
between different features of a single record, which further                features of an observation. We also compare our work with
degrades attack detection accuracy.                                         [21], which describes the use of maximum entropy principle
   Naive Bayes classifiers have also been used for intrusion                for detecting anomalies in the network traffic. The key
detection [9]. However, they make strict independence                       difference between [21] and our work is that the authors in
assumption between the features in an observation result-                   [21] use only the normal data during training and build a
ing in lower attack detection accuracy when the features are                baseline system, i.e., a behavior-based system, while we train
correlated, which is often the case for intrusion detection.                our system with both the normal and the anomalous data,
Bayesian network can also be used for intrusion detection                   i.e., we build a hybrid system. Second, the system in [21] fails
[28]. However, they tend to be attack specific and build a
                                                                            to model long-range dependencies in the observations,
decision network based on special characteristics of
                                                                            which can be easily represented in our model. We also
individual attacks. Thus, the size of a Bayesian network
                                                                            integrate the Layered Approach with the CRFs to gain the
increases rapidly as the number of features and the type of
attacks modeled by a Bayesian network increases.                            benefits of computational efficiency and high accuracy of
   To detect anomalous traces of system calls in privileged                 detection in a single system.
processes [20], hidden Markov models (HMMs) have been                           We compare the Layered Approach with the works in [18],
applied in [17], [42], and [43]. However, modeling the system               [25], and [41]. The authors in [18] describe the combination of
calls alone may not always provide accurate classification as               “strong” classifiers using stacking, where the decision tress,
in such cases various connection level features are ignored.                naive Bayes, and a number of other classification methods are
Further, HMMs are generative systems and fail to model                      used as base classifiers. The authors show that the output
long-range dependencies between the observations [29]. We                   from these classifiers can be combined to generate a better
further discuss this in detail in Section 3.                                classifier rather than selecting the best one. In [25], the authors
   Decision trees have also been used for intrusion                         use a combination of “weak” classifiers. The individual
detection [9]. The decision trees select the best features for              classification power of weak classifiers is slightly better than
each decision node during the construction of the tree based                random guessing. The authors show that a number of such
on some well-defined criteria. One such criterion is to use                 classifiers when combined using simple majority voting
the information gain ratio, which is used in C4.5. Decision                 mechanism, provide good classification. In [41], the authors
trees generally have very high speed of operation and high-                 apply a combination of anomaly and misuse detectors for
attack detection accuracy.                                                  better qualification of analyzed events. However, our work is
   Debar et al. [14] and Zhang et al. [46] discuss the use of               not based upon classifier combination. Combination of
artificial neural networks for network intrusion detection.                 classifiers is expensive with regard to the processing time
Though the neural networks can work effectively with noisy                  and decision making. The purpose of classifier combination is
data, they require large amount of data for training and it is              to improve accuracy. Rather, our system is based upon serial

                 Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION                                                                37


layering of multiple hybrid detectors. From our experiments                   edges in subgraph S. In addition, the features fk and gk
in Section 6, we show that the Layered CRFs perform better                    are assumed to be given and fixed. For example, a
than individual classifiers and they are more efficient and                   Boolean edge feature fk might be true if the observation Xi
accurate than a system based on classifier combination. The                   is “protocol ¼ tcp,” tag YiÀ1 is “normal,” and tag Yi is
results from individual classifiers at a layer are not combined               “normal.” Similarly, a Boolean vertex feature gk might be true
at any later stage in the Layered Approach, and hence, an                     if the observation Xi is “service ¼ ftp” and tag Yi is “attack.”
attack can be blocked at the layer where it is detected. There is             Further, the parameter estimation problem is to find the
no communication overhead among the layers and the central                    parameters  ¼ ð1 ; 2 ; . . . ; 1 ; 2 ; . . .Þ from the training data
                                                                                            N
decision-maker. In addition, since the layers are independent                 D ¼ ðxi ; yi Þi¼1 with the empirical distribution pðx; yÞ [29].
                                                                                                                                         ~
they can be trained separately and deployed at critical                           CRFs are undirected graphical models used for sequence
locations in a network depending upon the specific require-                   tagging. The prime difference between CRF and other
ments of a network. Using a stacked system will not give us                   graphical models such as the HMM is that the HMM, being
the advantage of reduced processing when an attack is                         generative, models the joint distribution pðy; xÞ, whereas the
detected at the initial layers in the sequential model.                       CRF are discriminative models and directly model the
   In this paper, we show the effectiveness of CRFs for                       conditional distribution pðyjxÞ, which is the distribution of
intrusion detection. Motivated by our results in [23], we                     interest for the task of classification and sequence labeling.
perform detailed analysis and show that CRFs are a strong                     Similar to HMM, the naive Bayes is also generative and
candidate for building robust intrusion detection systems.                    models the joint distribution. Modeling the joint distribution
We then show that high efficiency can be achieved by                          has two disadvantages. First, it is not the distribution of
implementing the Layered Approach. Finally, we integrate                      interest, since the observations are completely visible and the
the Layered Approach and the CRFs to develop a system                         interest is in finding the correct class for the observations,
that is accurate and performs efficiently.                                    which is the conditional distribution pðyjxÞ. Second, inferring
                                                                              the conditional probability pðyjxÞ from the modeled joint
                                                                              distribution, using the Bayes rule, requires the marginal
3    CONDITIONAL RANDOM FIELDS                     FOR INTRUSION
                                                                              distribution pðxÞ. To estimate this marginal distribution is
     DETECTION                                                                difficult since the amount of training data is often limited and
Conditional models are probabilistic systems that are used                    the observation x contains highly dependent features that are
to model the conditional distribution over a set of random                    difficult to model and therefore strong independence
variables. Such models have been extensively used in the                      assumptions are made among the features of an observation.
natural language processing tasks. Conditional models offer                   This results in reduced accuracy [40]. CRFs, however, predict
a better framework as they do not make any unwarranted                        the label sequence y given the observation sequence x. This
assumptions on the observations and can be used to model                      allows them to model arbitrary relationship among different
rich overlapping features among the visible observations.                     features in an observation x [15]. CRFs also avoid the
Maxent classifiers [37], maximum entropy Markov models                        observation bias and the label bias problem, which are
[34], and CRFs [29] are such conditional models. The                          present in other discriminative models, such as the maximum
advantage of CRFs is that they are undirected and are, thus,                  entropy Markov models. This is because the maximum
free from the Label Bias and the Observation Bias [27]. The                   entropy Markov models have a per-state exponential model
simplest conditional classifier is the Maxent classifier based                for the conditional probabilities of the next state given the
upon maximum entropy classification, which estimates the                      current state and the observation, whereas the CRFs have a
conditional distribution of every class given the observations                single exponential model for the joint probability of the entire
[37]. The training data is used to constrain this conditional                 sequence of labels given the observation sequence [29].
distribution while ensuring maximum entropy and hence                             The task of intrusion detection can be compared to many
maximum uniformity. We now give a brief description of the                    problems in machine learning, natural language processing,
CRFs, which is motivated from the work in [29].                               and bioinformatics. The CRFs have proven to be very
    Let X be the random variable over data sequence to be                     successful in such tasks, as they do not make any unwar-
labeled and Y the corresponding label sequence. In                            ranted assumptions about the data. Hence, we explore the
addition, let G ¼ ðV ; EÞ be a graph such that Y ¼ ðYv Þv2ðV Þ ,              suitability of CRFs for intrusion detection.
so that Y is indexed by the vertices of G. Then, ðX; Y Þ is a
CRF, when conditioned on X, the random variables Yv                           3.1 Motivating Example
obey the Markov property with respect to the graph:                           The data analyzed by the intrusion detection system for
pðYv jX; Yw ; w 6¼ vÞ ¼ pðYv jX; Yw ; w $ vÞ, where w $ v means               classification often has a number of features that are highly
that w and v are neighbors in G, i.e., a CRF is a random field                correlated and complex relationships exist between them.
globally conditioned on X. For a simple sequence (or chain)                   For example, when classifying network connections as
modeling, as in our case, the joint distribution over the label               either normal or as attack, a system may consider features
sequence Y given X has the following form:                                    such as “logged in” and “number of file creations.” When
                                                                !             these features are analyzed individually, they do not
                   X                        X                                 provide any information that can aid in detecting attacks.
  p ðyjxÞ / exp       k fk ðe; yje ; xÞ þ   k gk ðv; yjv ; xÞ ; ð1Þ
                                                                              However, when these features are analyzed together, they
                 e2E;k                    v2V ;k
                                                                              can provide meaningful information, which can be helpful
where x is the data sequence, y is a label sequence, and yjs is               for the classification task. Taking another example, the
the set of components of y associated with the vertices or                    connection level feature such as the “service invoked” at the

                   Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
38                                      IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                     VOL. 7,   NO. 1, JANUARY-MARCH 2010




                                                                               Fig. 2. Layered representation.

                                                                                  Our first goal is to improve the attack detection accuracy.
                                                                               We first compare the accuracy of CRFs for detecting attacks
Fig. 1. Graphical representation of a CRF.
                                                                               with other methods in Section 6. We consider all the
                                                                               41 features in the data set for each of the four attack groups
destination provides some information about the class label
                                                                               separately. As we shall observe, the CRFs outperform other
(in case an attacker sends request to a service that is not
                                                                               methods for detecting “Unauthorized access to Root” (U2R)
available). This information becomes more concrete and
                                                                               attacks. They are also effective in detecting the Probe,
aids in classification when analyzed with other features
                                                                               “Remote to Local” (R2L), and “Denial of Service” (DoS)
such as “protocol type” and “amount of data transferred”
                                                                               attacks. However, CRFs can be expensive during training
between source and destination (in case the client connects
                                                                               and testing. For a simple linear chain structure, the time
to an available service such as the ftp and performs data
                                                                               complexity for training a CRF is OðT L2 NIÞ, where T is the
transfer). These relationships, between different features in
                                                                               length of the sequence, L is the number of labels, N is the
the observed data, if considered during classification can
                                                                               number of training instances, and I is the number of
significantly decrease classification error. The CRFs do not
                                                                               iterations. During inference, the Viterbi algorithm is
consider features to be independent and hence perform
                                                                               employed, which has a complexity of OðT L2 Þ. The quad-
better when compared with other methods.
                                                                               ratic complexity is significant when the number of labels is
    The data set used in our experiments represents features
                                                                               large as in language tasks. However, for intrusion detection,
of every session in relational form with only one label for
                                                                               there are only two labels “normal” and “attack,” and thus,
the entire record. In this case, using a conditional model
                                                                               the system is very efficient. We further improve the overall
would result in a simple maximum entropy classifier [40].
                                                                               system performance by using the Layered Approach, which
However, we represent the data in the form of a sequence
                                                                               decreases T , i.e., the length of the sequence. The Layered
and assign a label to every feature in the sequence using the
                                                                               Approach is described next.
first-order Markov assumption instead of assigning a single
label to the entire observation. Though, this increases the
complexity but it also increases the attack detection accuracy.                4    LAYERED APPROACH FOR INTRUSION DETECTION
    Each record represents a separate connection, and hence,                   We now describe the Layer-based Intrusion Detection
we consider every record as a separate sequence. We aim to                     System (LIDS) in detail. The LIDS draws its motivation
model the relationships among features of individual                           from what we call as the Airport Security model, where a
connections using a CRF, as shown in Fig. 1. In the figure,                    number of security checks are performed one after the other
features such as duration, protocol, service, flag, and                        in a sequence. Similar to this model, the LIDS represents a
src_bytes take some possible value for every connection.                       sequential Layered Approach and is based on ensuring
During training, feature weights are learnt, and during                        availability, confidentiality, and integrity of data and (or)
testing, features are evaluated for the given observation,                     services over a network. Fig. 2 gives a generic representa-
which is then labeled accordingly.                                             tion of the framework.
    As it is evident from the figure, every label is connected                    The goal of using a layered model is to reduce computation
to every input feature, which indicates that all the features                  and the overall time required to detect anomalous events. The
in an observation help in labeling, and thus, a CRF can                        time required to detect an intrusive event is significant and
model dependencies among the features in an observation.                       can be reduced by eliminating the communication overhead
Present intrusion detection systems do not consider such                       among different layers. This can be achieved by making the
relationships among the features in the observations. They
                                                                               layers autonomous and self-sufficient to block an attack
either consider only one feature, such as in the case of
                                                                               without the need of a central decision-maker. Every layer in
system call modeling, or assume conditional independence
                                                                               the LIDS framework is trained separately and then deployed
among different features in the observation as in the case of
a naive Bayes classifier. As we will show from our                             sequentially. We define four layers that correspond to the
experimental results, the CRFs can effectively model such                      four attack groups mentioned in the data set. They are Probe
relationships among different features of an observation                       layer, DoS layer, R2L layer, and U2R layer. Each layer is then
resulting in higher attack detection accuracy. Another                         separately trained with a small set of relevant features.
advantage of using CRFs is that every element in the                           Feature selection is significant for Layered Approach and
sequence is labeled such that the probability of the entire                    discussed in the next section. In order to make the layers
labeling is maximized, i.e., all the features in the observa-                  independent, some features may be present in more than one
tion collectively determine the final labels. Hence, even if                   layer. The layers essentially act as filters that block any
some data is missing, the observation sequence can still be                    anomalous connection, thereby eliminating the need of
labeled with less number of features.                                          further processing at subsequent layers enabling quick

                    Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION                                                               39


response to intrusion. The effect of such a sequence of layers is            illegitimate requests. Hence, for the DoS layer, traffic
that the anomalous events are identified and blocked as soon                 features such as the “percentage of connections having same
as they are detected.                                                        destination host and same service” and packet level features
   Our second goal is to improve the speed of operation of                   such as the “source bytes” and “percentage of packets with
the system. Hence, we implement the LIDS and select a                        errors” are significant. To detect DoS attacks, it may not be
small set of features for every layer rather than using all the              important to know whether a user is “logged in or not.”
41 features. This results in significant performance im-
provement during both the training and the testing of the                    5.1.3 R2L Layer
system. In many situations, there is a trade-off between                     The R2L attacks are one of the most difficult to detect as
efficiency and accuracy of the system and there can be                       they involve the network level and the host level features.
various avenues to improve system performance. Methods                       We therefore selected both the network level features such
such as naive Bayes assume independence among the                            as the “duration of connection” and “service requested”
observed data. This certainly increases system efficiency,                   and the host level features such as the “number of failed
but it may severely affect the accuracy. To balance this                     login attempts” among others for detecting R2L attacks.
trade-off, we use the CRFs that are more accurate, though
expensive, but we implement the Layered Approach to                          5.1.4 U2R Layer
improve overall system performance. The performance of                       The U2R attacks involve the semantic details that are very
our proposed system, Layered CRFs, is comparable to that                     difficult to capture at an early stage. Such attacks are often
of the decision trees and the naive Bayes, and our system                    content based and target an application. Hence, for U2R
has higher attack detection accuracy.                                        attacks, we selected features such as “number of file
                                                                             creations” and “number of shell prompts invoked,” while
5   INTEGRATING LAYERED APPROACH                      WITH                   we ignored features such as “protocol” and “source bytes.”
    CONDITIONAL RANDOM FIELD                                                     We used domain knowledge together with the practical
                                                                             significance and the feasibility of each feature before
In Section 1, we discussed two main requirements for an                      selecting it for a particular layer. Thus, from the total
intrusion detection system; accuracy of detection and                        41 features, we selected only 5 features for Probe layer,
efficiency in operation. As discussed in Sections 3 and 4,                   9 features for DoS layer, 14 features for R2L layer, and
respectively, the CRFs can be effective in improving the                     8 features for U2R layer. Since each layer is independent of
attack detection accuracy by reducing the number of false                    every other layer, the feature set for the layers is not
alarms, while the Layered Approach can be implemented to                     disjoint. The selected features for all the four layers are
improve the overall system efficiency. Hence, a natural                      presented in Appendix A. We then use the CRFs for attack
choice is to integrate them to build a single system that is                 detection as discussed in Section 3. However, the difference
accurate in detecting attacks and efficient in operation. Given              is that we use only the selected features for each layer rather
the data, we first select four layers corresponding to the four              than using all the 41 features. We now give the algorithm
attack groups (Probe, DoS, R2L, and U2R) and perform                         for integrating CRFs with the Layered Approach.
feature selection for each layer, which is described next.                   Algorithm
5.1 Feature Selection                                                         Training
                                                                              Step 1: Select the number of layers, n, for the complete
Ideally, we would like to perform feature selection auto-
matically. However, as will be discussed later in Section 8,                           system.
the methods for automatic feature selection were not found                    Step 2: Separately perform features selection for each layer.
to be effective. In this section, we describe our approach for                Step 3: Train a separate model with CRFs for each layer
selecting features for every layer and why some features                               using the features selected from Step 2.
were chosen over others. In our system, every layer is                        Step 4: Plug in the trained models sequentially such that
separately trained to detect a single type of attack category.                         only the connections labeled as normal are passed
We observe that the attack groups are different in their                               to the next layer.
impact, and hence, it becomes necessary to treat them                         Testing
differently. Hence, we select features for each layer based                   Step 5: For each (next) test instance perform Steps 6
upon the type of attacks that the layer is trained to detect.                          through 9.
                                                                              Step 6: Test the instance and label it either as attack or
5.1.1 Probe Layer
                                                                                       normal.
The probe attacks are aimed at acquiring information about                    Step 7: If the instance is labeled as attack, block it and
the target network from a source that is often external to the
                                                                                       identify it as an attack represented by the layer
network. Hence, basic connection level features such as the
                                                                                       name at which it is detected and go to Step 5. Else
“duration of connection” and “source bytes” are significant
while features like “number of files creations” and “number                            pass the sequence to the next layer.
of files accessed” are not expected to provide information                    Step 8: If the current layer is not the last layer in the system,
for detecting probes.                                                                  test the instance and go to Step 7. Else go to Step 9.
                                                                              Step 9: Test the instance and label it either as normal or as
5.1.2 DoS Layer                                                                        an attack. If the instance is labeled as an attack,
The DoS attacks are meant to force the target to stop the                              block it and identify it as an attack corresponding
service(s) that is (are) provided by flooding it with                                  to the layer name.

                  Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
40                                    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                     VOL. 7,   NO. 1, JANUARY-MARCH 2010


                             TABLE 1
                                                                             desktop running with Intel(R) Core(TM) 2, CPU 2.4 GHz,
                             Data Set                                        and 2-Gbyte RAM under exactly the same conditions. We are
                                                                             mainly interested in the test time efficiency and not in the
                                                                             time required for training of the model as the real-time
                                                                             performance of the system depend upon the test efficiency
                                                                             alone. We note that our system is very efficient during testing.
                                                                             When we considered all the 41 features, the time taken to test
                                                                             all the 250,436 attacks was 57 seconds, which reduced to
                                                                             17 seconds when we performed feature selection and
                                                                             implemented the Layered Approach. More details will be
                                                                             presented when we give the detailed results for the
   Our final goal is to improve both the attack detection                    experiments.
accuracy and the efficiency of the system. Hence, we                            For our results, we give the Precision, Recall, and F -Value
integrate the CRFs and the Layered Approach to build a                       and not the accuracy alone as with the given data set, it is
single system. We perform detailed experiments and show                      easy to achieve very high accuracy by carefully selecting the
that our integrated system has dual advantage. First, as                     sample size. From Table 1, we note that the number of
expected, the efficiency of the system increases signifi-                    instances for the U2R, Probes, and R2L attacks is very low.
cantly. Second, since we select significant features for each                Hence, if we use accuracy as a measure for testing the
layer, the accuracy of the system further increases. This is
                                                                             performance of the system, the system can be biased and can
because all the 41 features are not required for detecting
                                                                             attain an accuracy of more than 99 percent for U2R attacks
attacks belonging to a particular attack group. Using more
                                                                             [16]. However, Precision, Recall, and F -Value are not
features than required can result in fitting irregularities in
                                                                             dependent on the size of the training and the test samples.
the data, which has a negative effect on the attack detection
                                                                             They are defined as follows:
accuracy of the system.
                                                                                                          TP
                                                                                          Precision ¼
6    EXPERIMENTS                                                                                       TP þ FP
                                                                                                           TP
For our experiments, we use the benchmark KDD ’99                                             Recall ¼
                                                                                                       TP þ FN
intrusion data set [3]. This data set is a version of the original
                                                                                                       ð1 þ
2 Þ Ã Recall à Precision
1998 DARPA intrusion detection evaluation program, which                                   F -Value ¼                                  ;
is prepared and managed by the MIT Lincoln Laboratory.
2 Ã ðRecall þ PrecisionÞ
The data set contains about five million connection records                  where TP, FP, and FN are the number of True Positives,
as the training data and about two million connection                        False Positives, and False Negatives, respectively, and
records as the test data. In our experiments, we use                         corresponds to the relative importance of precision versus
10 percent of the total training data and 10 percent of the                  recall and is usually set to 1.
test data (with corrected labels), which are provided                           We divide the training data into different groups;
separately. This leads to 494,020 training and 311,029 test                  Normal, Probe, DoS, R2L, and U2R. Similarly, we divide
instances. Each record in the data set represents a connection               the test data. We perform 10 experiments for each attack
between two IP addresses, starting and ending at some well-                  class by randomly selecting data corresponding to that
defined times with a well-defined protocol. Further, every                   attack class and normal data only. For example, to detect
record is represented by 41 different features. Each record                  Probe attacks, we train and test the system with Probe
represents a separate connection and is hence considered to                  attacks and normal data only. We do not add the DoS, R2L,
be independent of any other record.                                          and U2R data when detecting Probes. Not including these
   The training data is either labeled as normal or as one of                attacks while training allows the system to better learn the
the 24 different kinds of attack. These 24 attacks can be                    features for Probe attacks and normal events. When such a
grouped into four classes; Probing, DoS, R2L, and U2R.                       system is deployed online, other attacks such as DoS can
Similarly, the test data is also labeled as either normal or as
                                                                             either be seen as normal or as Probes. If DoS attacks are
one of the attacks belonging to the four attack groups. It is
                                                                             detected as normal, we expect them to be detected as attack
important to note that the test data is not from the same
                                                                             at other layers in the system. However, if the DoS attacks are
probability distribution as the training data, and it includes
                                                                             detected as Probe, it must be considered as an advantage
specific attack types not present in the training data. This
                                                                             since the attack is detected at an early stage. Similarly, if
makes the intrusion detection task more realistic [3]. Table 1
                                                                             some Probe attacks are not detected at the Probe layer, they
gives the number of instances for each group of attack in the
                                                                             may be detected at subsequent layers. Hence, for four attack
data set.
   For our experiments with CRFs, we use the CRF toolkit,                    classes, we have four independent models, which are
CRF++ [2]. We use the Weka tool [44] to perform experiments                  trained separately with specific features to detect attacks
with the decision trees and the naive Bayes classifier. We                   belonging to that particular group. For our experiments, we
develop python and shell scripts for data formatting and                     report the best, the average, and the worst cases.
implementing the Layered Approach. For all our experi-                          We represent a single layer, for example the Probe layer,
ments, we perform hybrid detection, as discussed in Section 1,               in Fig. 3. Other layers can be constructed similarly.
and use both the normal and the anomalous connections for                       In Section 6.1, we perform experiments with individual
training the model. We perform our experiments on a                          layers in the system. In Section 6.2, we represent how to

                  Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION                                                                  41


                                                                                                          TABLE 3
                                                                                           Normal and Probes (with Feature Selection)



Fig. 3. Representation of a single layer (e.g., probe layer).

implement the system in real scenario and compare our
results with other systems in Section 6.3. In Section 6.4, we
discuss the significance of our results.

6.1 Building Individual Layers of the System
We perform two sets of experiments. From the first
experiment, we wish to examine the accuracy of CRFs for
intrusion detection. The objective is to see how CRFs                           perform feature selection, our system achieves much higher
compare with other techniques, which are known to                               accuracy and there is significant improvement in efficiency.
perform well. We do not consider feature selection, and
the systems are trained using all the 41 features. From this                    6.1.2 Detecting Probe Attacks with Feature Selection
experiment, we observe that CRFs perform much better for                        We used the same set of instances for this experiment as
U2R attacks while the decision trees achieve higher attack                      used in the previous experiment. However, we perform
detection for Probes and R2L. The difference in attack                          feature selection for this experiment. Table 3 gives the
detection accuracy for DoS is not significant. We note that                     results for this experiment.
the reason for better performance of decision trees is that                        We observe that the system takes only 2.04 seconds to
they perform feature selection. This motivates us to perform                    label all the 64,759 test instances. The Layered CRFs
our second experiment where we perform feature selection                        perform better and faster than our previous experiment
by selecting a small set of features for every attack group                     and are the best choice for detecting Probes. We also note
instead of using all the 41 features. We perform the same                       that there is no significant advantage with respect to time
experiment with decision trees and naive Bayes and                              for the layered decision trees as the number of features used
compare the results. We call the integrated models as                           in normal decision trees and in the layered decision trees is
Layered CRFs, layered decision trees, and layered naive                         approximately the same, resulting in similar efficiency. We
Bayes, respectively. For better comparison and readability,                     further note that the Recall and hence the F -Value for the
we give the results for both the experiments together.                          layered naive Bayes decreases drastically. This can be
                                                                                explained as follows: The classification accuracy with naive
6.1.1 Detecting Probe Attacks with All 41 Features                              Bayes generally improves as the number of features
We randomly select about 10,000 normal records and all the                      increases. However, if the number of features increases to
                                                                                a very large extent, the estimation tends to become
Probe records from the training data as the training data for
                                                                                unreliable. As a result, when we use all the 41 features,
detecting Probe attacks. We then use all the normal and
                                                                                the naive Bayes performs well but when we decrease the
Probe records from the test data for testing. Hence, we have
                                                                                number of features to five, its classification accuracy
15,000 training instances and 64,759 test instances. Table 2                    decreases. From this experiment, we conclude that the
gives the results for the experiments.                                          Layered CRFs are a better choice for detecting Probe
   In Table 2, the testing time of 14.53 seconds represents                     attacks.
the total time taken to label 64,759 test instances. The results
show that the decision trees are more efficient than the CRFs                   6.1.3 Detecting DoS Attacks with All 41 Features
and the naive Bayes. This is because they have a small tree
                                                                                We randomly select about 20,000 normal records and about
structure, often with very few decision nodes, which is very
                                                                                4,000 DoS records from the training data as the training data
efficient. The attack detection accuracy is also higher for the
                                                                                for detecting DoS attacks. We then use all the normal and
decision trees as they select the best possible features during
tree construction. However, as we show next, once we                            DoS records from the test data for testing. Hence, we have
                                                                                24,000 training instances and 290,446 test instances. Table 4
                                                                                gives the results for the experiments.
                           TABLE 2                                                 In Table 4, the testing time of 64.42 seconds represents
               Normal and Probes (All 41 Features)                              the time taken to label all the 290,446 test instances. The
                                                                                results show that all the three methods considered have
                                                                                similar attack detection accuracy; however, decision trees
                                                                                give a slight advantage with regard to test time efficiency.

                                                                                6.1.4 Detecting DoS Attacks with Feature Selection
                                                                                We used the same data for this experiment as used in the
                                                                                previous experiment. However, we perform feature selec-
                                                                                tion. Table 5 gives the results.
                                                                                   We observe that the system now takes only 15.17 seconds
                                                                                to label all the 290,446 test instances. The results follow the

                     Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
42                                    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                     VOL. 7,   NO. 1, JANUARY-MARCH 2010


                         TABLE 4                                                                        TABLE 7
              Normal and DoS (All 41 Features)                                            Normal and R2L (with Feature Selection)




                         TABLE 5
           Normal and DoS (with Feature Selection)
                                                                             6.1.6 Detecting R2L Attacks with Feature Selection
                                                                             Table 7 gives the results when we performed feature
                                                                             selection for detecting R2L attacks.
                                                                                We observe that the time taken to test all the
                                                                             76,942 instances is only 5.96 seconds. Further, the
                                                                             Layered CRFs perform much better than the CRFs (an
                                                                             increase of about 60 percent), layered decision trees (an
                                                                             increase of about 125 percent), decision trees (an increase
                                                                             of about 17 percent), layered naive Bayes (an increase of
                                                                             about 250 percent), and naive Bayes (an increase of
                                                                             about 250 percent) and are the best choice for detecting
same trend as in the previous experiment with only a slight                  the R2L attacks. The Layered CRFs take slightly more
improvement. However, if we consider the testing time we                     time, which is acceptable since we achieve much higher
find that layered decision trees are a better choice. We also                detection accuracy.
note that there is slight increase in the detection accuracy
                                                                             6.1.7 Detecting U2R Attacks with All 41 Features
when we perform feature selection, but this increase is not
significant. The real advantage is seen in the reduced time                  We randomly select about 1,000 normal records and all the
for testing, which decreases four folds.                                     U2R records from the training data as the training data for
                                                                             detecting the User to Root attacks. We then use all the
6.1.5 Detecting R2L Attacks with All 41 Features                             normal and U2R records from the test data for testing.
We randomly select about 1,000 normal records and all the                    Hence, we have 1,000 training instances and 60,661 test
R2L records from the training data as the training data for                  instances. Table 8 gives the results.
detecting R2L attacks. We then use all the normal and                           In Table 8, the testing time of 13.45 seconds represents
R2L records from the test data for testing. Hence, we have                   the time taken to label all the 60,661 test instances. In this
2,000 training instances and 76,942 test instances. Table 6 gives            experiment, we find that the CRFs are far better than the
the results.                                                                 other two methods. The F -Value for CRFs is more than
   In Table 6, the testing time of 17.16 seconds represents                  150 percent with respect to the decision trees and more than
the time taken to label all the 76,942 test instances. We                    600 percent with respect to the naive Bayes. The U2R attacks
observe that the decision trees have a higher F -Value, but if               are very difficult to detect and most of the present intrusion
we look at the number of false alarms, we find that the CRFs                 detection systems fail to detect such attacks with acceptable
perform better and have high Precision compared to the                       reliability. We find that the CRFs can be used to reliably
decision trees and the naive Bayes.                                          detect such attacks.


                         TABLE 6                                                                         TABLE 8
              Normal and R2L (All 41 Features)                                                Normal and U2R (All 41 Features)




                  Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION                                                                 43


                          TABLE 9                                                                            TABLE 10
            Normal and U2R (with Feature Selection)                                                        Confusion Matrix




                                                                                                           TABLE 11
                                                                                            Attack Detection at Each Layer (Case 1)




6.1.8 Detecting U2R Attacks with Feature Selection
In this experiment, we used exactly the same set of instances
as we used in the previous experiment. We also perform
feature selection. Table 9 gives the results for this experiment.
    We observe that the system takes only 2.67 seconds to label
all the 60,661 test instances. The Layered CRFs are the best                   category once the system detects an event as anomalous.
choice for detecting the U2R attacks and are far better than                   Layered Approach not only improves the attack detection,
CRFs (an increase of about 8 percent), layered decision trees                  but it also helps identify the type of attack once detected
(an increase of about 30 percent), decision trees (an increase                 because every layer is trained to detect only a particular
of about 184 percent), layered naive Bayes (an increase of                     category of attack. Hence, if an attack is detected at the U2R
about 38 percent), and naive Bayes (an increase of about                       layer, it is very likely that the attack is of “U2R” type. This
675 percent). We observe that the attack detection capability                  enables to perform quick recovery and prevent similar
also increases for the decision trees and the naive Bayes.                     attacks. Fig. 4 gives the real-time system representation.
    It is evident from the results that the accuracy of Layered                   We integrate the four models (with feature selection) from
CRFs is significantly higher for the U2R, R2L, and the Probe                   Section 6.1 to develop the final system. In this experiment,
attacks. The difference in accuracy is, however, not sig-                      we use the same data for training the individual models as
nificant for the DoS attacks. Further, regardless of the                       used in our previous experiments. However, the data in the
method considered and particularly for the CRFs, the time                      test set is relabeled either as normal or as attack and all the
required for training and testing the system is drastically                    data from the test set is passed though the system starting
reduced once we perform feature selection. We also note                        from the first layer. If layer 1 detects any connection as an
that the increase in detection accuracy is not significant for                 attack, it is blocked and labeled as “Probe.” Only the events
the layered decision trees and the layered naive Bayes for                     labeled as “Normal” are allowed to go to the next layer. The
the DoS group of attack. Their accuracy of detection                           same process is repeated at the next layers where an attack is
decreases for the Probe and R2L attacks while it increases                     blocked and labeled as “DoS,” “R2L,” or “U2R” at layer 2,
for the U2R attacks. However, we find that in all the cases,                   layer 3, and layer 4, respectively. We perform all the
the Layered CRFs perform significantly better and can better                   experiments 10 times and report their average. We give the
learn a model when we use a small set of specific features for                 results for this experiment in Tables 10, 11, and 12. Table 10
training.                                                                      gives the confusion matrix where the values represent the
                                                                               percent detection with respect to each of the five classes.
6.2 Implementing the System in Real Life                                          From the table, we observe that our system can detect
In real scenario, we are not aware of the category of an                       most of the “Probe” (98.62 percent), “DoS” (97.40 percent),
attack. Rather, we are interested in identifying the attack                    and “U2R” (86.33 percent) attacks while giving very few
                                                                               false alarms at each layer. The system can also detect “R2L”
                                                                               attacks with much higher reliability (29.62 percent) when
                                                                               compared with the previously reported systems, as we will
                                                                               discuss later in Section 6.3. The confusion matrix shows that


                                                                                                           TABLE 12
                                                                                            Attack Detection at Each Layer (Case 2)




Fig. 4. Real-time representation of the system.

                    Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
44                                     IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                     VOL. 7,   NO. 1, JANUARY-MARCH 2010


only 71.90 percent of DoS attacks are labeled as DoS during                                               TABLE 13
testing. However, it is very important to note that the                                      Layered versus Nonlayered Approach
accuracy for detecting DoS attacks is not 71.90 percent, but
it is 25:50 þ 71:90 þ 0:00 þ 0:00 ¼ 97:40 percent. This is
because 25.50 percent of the DoS attacks are already
detected at the first layer, though our system identifies
them as probes since they are detected at the first layer. This
is acceptable because in the real environment it is critical to
detect an attack as early as possible to minimize its impact.
    It is also important to note that most of the “U2R” attacks
are detected in the third layer itself and hence labeled as
“R2L.” However, if we remove the third layer, the fourth
layer can detect these attacks with similar accuracy. Further,
looking at the “R2L” and “U2R” columns in Table 10, it is                     accurate in detecting attacks particularly the U2R, the R2L,
natural to think that the two layers can be merged. However,                  and the Probes.
this has two disadvantages. First, merging the two layers                        It is important to note that the time should be read in
results in increasing the number of features, which reduces                   relative terms rather than absolute, as for ease of experi-
efficiency. The merged layer performs poorly with regard to                   ments we used scripts for implementation. In real environ-
the total time taken when compared with both the                              ment, high speed can be achieved by implementing the
unmerged layers together. Second, when the layers are                         complete system in languages with efficient compilers such
merged, the “U2R” attacks are not detected effectively and
                                                                              as the “C Language.” Further, pipelining can be implemen-
their individual attack detection accuracy decreases. This is
                                                                              ted in multicore processors, where each core may represent a
because the number of “U2R” instances is very low in the
training data and the system simply learns the features that                  single layer, and due to pipelining, multiple I/O operations
are specific to the “R2L” attacks. Hence, we prefer separate                  can be replaced by a single I/O operation providing very
layers for the two attack groups. Using our approach, we can                  high speed of operation.
hope that any attack, even though its category is unknown,
                                                                              6.3 Comparison of Results
can be detected at any one of the layers in the system. We
can also increase or decrease the number of layers depend-                    In this section, we compare our work with other well-known
ing upon the environment where the system is deployed.                        methods based on the anomaly intrusion detection principle.
    We evaluate the performance of each layer in the                          The anomaly-based systems primarily detect deviations
system in Table 11. From the table, we observe that out of                    from the learnt normal data by using statistical methods,
all the 250,436 attack instances in the test data set, more                   machine learning, or data mining approaches. Standard
than 25 percent of the attacks are blocked at layer 1, and                    techniques such as the decision trees and naive Bayes are
more than 90 percent of all the attacks have been blocked                     known to perform well. However, our experiments show
by the end of layer 2. Thus, the Layered Approach is very                     that the Layered CRFs perform far better than these
effective in reducing the attack traffic at each layer in the                 techniques. The main reason for this is that the CRFs do not
system. The configuration takes 21 seconds to classify all                    consider the observation features to be independent. In [38],
the 250,436 attacks.                                                          the authors present a comparative study of various classifiers
    We can do further optimization by putting the DoS layer                   when applied to the KDD ’99 data set, and in [13], the authors
before the Probe layer. We can do this because the data is                    propose the use of Principle Component Analysis (PCA)
relational and each layer in our system is independent.                       before applying a machine learning algorithm. Use of
Putting the DoS layer before the Probe layer serves dual                      support vector machines is discussed in [26]. We compare
advantage as most of the attacks are detected at the first                    our results from the results presented in these papers in
layer itself and the overall system performs efficiently. This                Table 14. The table represents the Probability of Detection
optimization becomes significant in severe attack situations                  (PD) and False Alarm Rate (FAR) in percent for various
when the target is overwhelmed with illegitimate connec-                      methods including the KDD ’99 cup winners.
tions. The results are presented in Table 12.                                    From the table, we observe that the Layered CRFs
    We observe that the Layered Approach can be very                          perform significantly better than the previously reported
effective in restricting the attack traffic to the initial layers in          results including the winner of the KDD ’99 cup and
the system. We also performed experiments when we do                          various other methods applied to this data set. The most
not implement the Layered Approach, i.e., we consider only                    impressive part of the Layered CRFs is the margin of improve-
a single system that is trained with two classes (normal and                  ment as compared with other methods. Layered CRFs have very
attack). In this system, all the Probes, DoS, R2L, and U2R                    high attack detection of 98.6 percent for Probes (5.8 percent
attacks are labeled as “attack.” We perform experiments                       improvement) and 97.40 percent detection for DoS. They
both with and without feature selection. For feature                          outperform by a significant percentage for the R2L (34.5 percent
selection, we consider 21 features, which are selected by                     improvement) and the U2R (34.8 percent improvement) attacks.
applying the union operation on the feature sets of all the
four attack types. We compare these results with the                          6.4 Discussion and Issues
Layered Approach in Table 13.                                                 From our experiments and the comparison in Table 14, we
    We observe that a system implementing the Layered                         conclude that the Layered CRFs can be very effective in
Approach with feature selection is more efficient and more                    detecting the Probe, the U2R, and the R2L attacks as well as

                   Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION                                                               45


                        TABLE 14                                                                          TABLE 15
                    Comparison of Results                                                         Ranking for the Six Methods




                                                                             scores better. Our integrated system also has the advantage
                                                                             that any method can be used in the layers of the system. This
                                                                             gives flexibility to the user to decide between the time and
                                                                             accuracy trade-off. Furthermore, we can increase or decrease
                                                                             the number of layers in the system depending upon the task
                                                                             requirement. Finally, our system can be used for performing
                                                                             analysis on attacks because the attack category can be
                                                                             inferred from the layer at which the attack is detected.
                                                                                To determine the statistical significance of our results, we
                                                                             compare our proposed method (Layered CRFs) with others
                                                                             for detecting Probes, DoS, R2L, and U2R attacks. We use the
                                                                             Wilcoxon sum rank test with 95 percent confidence interval
                                                                             to discriminate the performance of these methods. Table 15
                                                                             gives the ranking for various methods compared, where a
                                                                             system with rank 1 is the best.
                                                                                The results of the Wilcoxon test indicate that the Layered
                                                                             CRFs are much better (or equal) for detecting attacks. Thus,
                                                                             we conclude that the Layered CRFs are a strong candidate
the DoS attacks. However, if we consider all the 41 features                 for building robust and efficient intrusion detection systems.
given in the data set, we find that the time required to train
and test the model is high. To address this, we performed                    7    EFFECT       OF   NOISE
experiments with our integrated system by implementing a
four-layer system. The four layers correspond to Probe,                      Ideally, we would like to perform similar experiments with
DoS, R2L, and U2R. For each layer, we then selected a set of                 a large number of data sets. However, given the domain of
features that is sufficient to detect attacks at that particular             the problem, there are no other data sets that are freely
layer. Feature selection for each layer enhances the                         available, which can be used for our experiments. To
performance of the entire system. The runtime (testing)                      ameliorate this problem to some extent, we add substantial
                                                                             amount of noise in the training data and perform similar
performance of our model is comparable with other
                                                                             experiments to study the robustness of these systems. By
methods; however, the time required to train the model is
                                                                             experimenting with noisy data, we want to determine the
slightly higher. We also observe that feature selection not
                                                                             sensitivity of the proposed scheme with respect to noise. If
only decreases the time required to test an instance, but it
                                                                             the system performs poorly with noisy data, the results
also increases the accuracy of attack detection. This is
                                                                             could be an artifact of the data set.
because using more features than required can generate
superfluous rules often resulting in fitting irregularities in               7.1 Addition of Noise to Data
the data, which can misguide classification. From our                        Addition of noise was controlled by two parameters, the
experimental results, we conclude that the main strength                     probability of adding noise to a feature, p, and the scaling
of our method lies in detecting the R2L and the U2R attacks,                 factor, s, for a feature. We performed four sets of experi-
which are not satisfactorily detected by other methods. Our                  ments with noisy data, separately, one for each layer. For
method gives slight improvement for detecting Probe                          each layer, we varied the parameter p between 0 and 0.95 (by
attacks and was similar in accuracy when compared with                       keeping it at values 0.10, 0.20, 0.33, 0.50, 0.75, 0.90, and 0.95)
other methods for detecting the DoS attacks.                                 and varied the parameter s between À1,000 and þ1,000. In
   The prime reason for better detection accuracy for the                    case when the original feature was “0,” noise was added to
CRFs is that they do not consider the observation features to                any feature by using an additive function (random value
be independent. CRFs evaluate all the rules together, which                  between À1,000 and þ1,000) instead of scaling. Figs. 5, 6, 7,
are applicable for a given observation. This results in                      and 8 represent the effect of noise on each layer separately.
capturing the correlation among different features of the                       We find that our integrated system is robust to noise in
observation resulting in higher accuracy. Considering both                   the training data and performs better than other methods
the accuracy and the time required for testing, our system                   for all of the four attack groups.

                  Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
46                                        IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING,                   VOL. 7,   NO. 1, JANUARY-MARCH 2010




Fig. 5. Effect of noise on Probe layer.                                        Fig. 8. Effect of noise on U2R layer.

                                                                                  We then used PCA for dimensionality reduction [13].
                                                                               However, the main drawback of using PCA in our task is
                                                                               that PCA transforms a large number of possibly correlated
                                                                               features into a small number of uncorrelated features known
                                                                               as the principle components. Hence, when we applied PCA
                                                                               followed by CRFs in the newly transformed feature space,
                                                                               the method did not provide significant advantage as the
                                                                               strength of our approach is to model correlation among
                                                                               features and the features in the new space are independent.
                                                                                  We also note that, to construct a decision tree, the
                                                                               C4.5 algorithm performs feature selection. We selected the
Fig. 6. Effect of noise on DoS layer.                                          same features as selected by the C4.5 algorithm and then
                                                                               performed experiments with only those features. However,
8    AUTOMATIC FEATURE SELECTION                                               there was no significant improvement in the results.
From our experiments in the previous sections, we showed                          We also performed experiments with the method
the advantages of performing feature selection and im-                         proposed in [33] for efficiently inducing features for a
plementing the Layered Approach for attack detection. We                       CRF. The method is based upon iteratively constructing
performed our experiments by manually selecting features                       feature conjunctions that would significantly increase the
for different layers. However, we want to compare the                          conditional log-likelihood if added to the model. We used
results of manual feature selection with the results of                        the Mallet tool [35] for performing these experiments and
automatic feature selection (and no feature selection) for all                 compare the results with our previous results based on
the layers. Hence, we investigated various methods for                         Layered CRFs with manual feature selection in Table 16. We
automatic feature selection.                                                   observed that both the systems, with automatic and manual
   We experimented with a feed forward neural network to                       feature selections, had similar test time performance, but
determine the weights for all the 41 features. Features with                   the accuracy of detection when features were induced
weights close to zero were discarded. As a result, only a                      automatically was significantly lower than our system
small set of features was selected for each layer. However,                    based upon manual feature selection.
when we performed the experiments on the reduced set of                           It was not surprising that manual feature selection
features and compared the results, there was no significant                    performed better than automatic feature selection. How-
improvement in the detection accuracy, though there was                        ever, we note that automatic feature selection for Layered
reduction in training and testing time.                                        CRFs performed better than the decision trees, particularly
                                                                               for the R2L and the U2R attacks. This suggests that Layered


                                                                                                              TABLE 16
                                                                                                           Feature Selection




Fig. 7. Effect of noise on R2L layer.

                    Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.

Mais conteúdo relacionado

Mais procurados

1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
Alexander Decker
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
Editor IJARCET
 
A technical review and comparative analysis of machine learning techniques fo...
A technical review and comparative analysis of machine learning techniques fo...A technical review and comparative analysis of machine learning techniques fo...
A technical review and comparative analysis of machine learning techniques fo...
IJECEIAES
 
International Journal of Computer Science and Security Volume (2) Issue (1)
International Journal of Computer Science and Security Volume (2) Issue (1)International Journal of Computer Science and Security Volume (2) Issue (1)
International Journal of Computer Science and Security Volume (2) Issue (1)
CSCJournals
 

Mais procurados (17)

1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
 
IDS - Fact, Challenges and Future
IDS - Fact, Challenges and FutureIDS - Fact, Challenges and Future
IDS - Fact, Challenges and Future
 
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
IRJET- Review on Intrusion Detection System using Recurrent Neural Network wi...
 
Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194Volume 2-issue-6-2190-2194
Volume 2-issue-6-2190-2194
 
COMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTION
COMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTIONCOMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTION
COMBINING NAIVE BAYES AND DECISION TREE FOR ADAPTIVE INTRUSION DETECTION
 
A technical review and comparative analysis of machine learning techniques fo...
A technical review and comparative analysis of machine learning techniques fo...A technical review and comparative analysis of machine learning techniques fo...
A technical review and comparative analysis of machine learning techniques fo...
 
Detecting Anomaly IDS in Network using Bayesian Network
Detecting Anomaly IDS in Network using Bayesian NetworkDetecting Anomaly IDS in Network using Bayesian Network
Detecting Anomaly IDS in Network using Bayesian Network
 
Finding Diversity In Remote Code Injection Exploits
Finding Diversity In Remote Code Injection ExploitsFinding Diversity In Remote Code Injection Exploits
Finding Diversity In Remote Code Injection Exploits
 
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMSAN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
 
1850 1854
1850 18541850 1854
1850 1854
 
IDS - Analysis of SVM and decision trees
IDS - Analysis of SVM and decision treesIDS - Analysis of SVM and decision trees
IDS - Analysis of SVM and decision trees
 
Icacci presentation-cnn intrusion
Icacci presentation-cnn intrusionIcacci presentation-cnn intrusion
Icacci presentation-cnn intrusion
 
Hybrid Technique for Detection of Denial of Service (DOS) Attack in Wireless ...
Hybrid Technique for Detection of Denial of Service (DOS) Attack in Wireless ...Hybrid Technique for Detection of Denial of Service (DOS) Attack in Wireless ...
Hybrid Technique for Detection of Denial of Service (DOS) Attack in Wireless ...
 
Review on Intrusion Detection in MANETs
Review on Intrusion Detection in MANETsReview on Intrusion Detection in MANETs
Review on Intrusion Detection in MANETs
 
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
 
International Journal of Computer Science and Security Volume (2) Issue (1)
International Journal of Computer Science and Security Volume (2) Issue (1)International Journal of Computer Science and Security Volume (2) Issue (1)
International Journal of Computer Science and Security Volume (2) Issue (1)
 
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
A SURVEY ON DIFFERENT MACHINE LEARNING ALGORITHMS AND WEAK CLASSIFIERS BASED ...
 

Semelhante a Layered approach

FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
FORTIFICATION OF HYBRID INTRUSION  DETECTION SYSTEM USING VARIANTS OF NEURAL ...FORTIFICATION OF HYBRID INTRUSION  DETECTION SYSTEM USING VARIANTS OF NEURAL ...
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
IJNSA Journal
 
COPYRIGHTThis thesis is copyright materials protected under the .docx
COPYRIGHTThis thesis is copyright materials protected under the .docxCOPYRIGHTThis thesis is copyright materials protected under the .docx
COPYRIGHTThis thesis is copyright materials protected under the .docx
voversbyobersby
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
eSAT Journals
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
eSAT Journals
 
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
ijcsit
 
Vol 6 No 1 - October 2013
Vol 6 No 1 - October 2013Vol 6 No 1 - October 2013
Vol 6 No 1 - October 2013
ijcsbi
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 

Semelhante a Layered approach (20)

FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
FORTIFICATION OF HYBRID INTRUSION  DETECTION SYSTEM USING VARIANTS OF NEURAL ...FORTIFICATION OF HYBRID INTRUSION  DETECTION SYSTEM USING VARIANTS OF NEURAL ...
FORTIFICATION OF HYBRID INTRUSION DETECTION SYSTEM USING VARIANTS OF NEURAL ...
 
COPYRIGHTThis thesis is copyright materials protected under the .docx
COPYRIGHTThis thesis is copyright materials protected under the .docxCOPYRIGHTThis thesis is copyright materials protected under the .docx
COPYRIGHTThis thesis is copyright materials protected under the .docx
 
Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques Review of Intrusion and Anomaly Detection Techniques
Review of Intrusion and Anomaly Detection Techniques
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
 
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
 
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
INTRUSION DETECTION USING FEATURE SELECTION AND MACHINE LEARNING ALGORITHM WI...
 
Intrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural NetworkIntrusion Detection Systems By Anamoly-Based Using Neural Network
Intrusion Detection Systems By Anamoly-Based Using Neural Network
 
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...
Classification Rule Discovery Using Ant-Miner Algorithm: An  Application Of N...Classification Rule Discovery Using Ant-Miner Algorithm: An  Application Of N...
Classification Rule Discovery Using Ant-Miner Algorithm: An Application Of N...
 
HYBRID ARCHITECTURE FOR DISTRIBUTED INTRUSION DETECTION SYSTEM IN WIRELESS NE...
HYBRID ARCHITECTURE FOR DISTRIBUTED INTRUSION DETECTION SYSTEM IN WIRELESS NE...HYBRID ARCHITECTURE FOR DISTRIBUTED INTRUSION DETECTION SYSTEM IN WIRELESS NE...
HYBRID ARCHITECTURE FOR DISTRIBUTED INTRUSION DETECTION SYSTEM IN WIRELESS NE...
 
Ijnsa050214
Ijnsa050214Ijnsa050214
Ijnsa050214
 
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
A Novel Classification via Clustering Method for Anomaly Based Network Intrus...
 
IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...
IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...
IRJET- An Intrusion Detection Framework based on Binary Classifiers Optimized...
 
IMPROVED IDS USING LAYERED CRFS WITH LOGON RESTRICTIONS AND MOBILE ALERTS BAS...
IMPROVED IDS USING LAYERED CRFS WITH LOGON RESTRICTIONS AND MOBILE ALERTS BAS...IMPROVED IDS USING LAYERED CRFS WITH LOGON RESTRICTIONS AND MOBILE ALERTS BAS...
IMPROVED IDS USING LAYERED CRFS WITH LOGON RESTRICTIONS AND MOBILE ALERTS BAS...
 
TAXONOMY BASED INTRUSION ATTACKS AND DETECTION MANAGEMENT SCHEME IN PEER-TOPE...
TAXONOMY BASED INTRUSION ATTACKS AND DETECTION MANAGEMENT SCHEME IN PEER-TOPE...TAXONOMY BASED INTRUSION ATTACKS AND DETECTION MANAGEMENT SCHEME IN PEER-TOPE...
TAXONOMY BASED INTRUSION ATTACKS AND DETECTION MANAGEMENT SCHEME IN PEER-TOPE...
 
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
1.[1 9]a genetic algorithm based elucidation for improving intrusion detectio...
 
IRJET - A Secure Approach for Intruder Detection using Backtracking
IRJET -  	  A Secure Approach for Intruder Detection using BacktrackingIRJET -  	  A Secure Approach for Intruder Detection using Backtracking
IRJET - A Secure Approach for Intruder Detection using Backtracking
 
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...
Intrusion Detection System (IDS): Anomaly Detection using Outlier Detection A...
 
Vol 6 No 1 - October 2013
Vol 6 No 1 - October 2013Vol 6 No 1 - October 2013
Vol 6 No 1 - October 2013
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 

Mais de ingenioustech

Design and evaluation of a proxy cache for
Design and evaluation of a proxy cache forDesign and evaluation of a proxy cache for
Design and evaluation of a proxy cache for
ingenioustech
 

Mais de ingenioustech (20)

Supporting efficient and scalable multicasting
Supporting efficient and scalable multicastingSupporting efficient and scalable multicasting
Supporting efficient and scalable multicasting
 
Monitoring service systems from
Monitoring service systems fromMonitoring service systems from
Monitoring service systems from
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization for
 
Measurement and diagnosis of address
Measurement and diagnosis of addressMeasurement and diagnosis of address
Measurement and diagnosis of address
 
Impact of le arrivals and departures on buffer
Impact of  le arrivals and departures on bufferImpact of  le arrivals and departures on buffer
Impact of le arrivals and departures on buffer
 
Exploiting dynamic resource allocation for
Exploiting dynamic resource allocation forExploiting dynamic resource allocation for
Exploiting dynamic resource allocation for
 
Efficient computation of range aggregates
Efficient computation of range aggregatesEfficient computation of range aggregates
Efficient computation of range aggregates
 
Dynamic measurement aware
Dynamic measurement awareDynamic measurement aware
Dynamic measurement aware
 
Design and evaluation of a proxy cache for
Design and evaluation of a proxy cache forDesign and evaluation of a proxy cache for
Design and evaluation of a proxy cache for
 
Throughput optimization in
Throughput optimization inThroughput optimization in
Throughput optimization in
 
Tcp
TcpTcp
Tcp
 
Privacy preserving
Privacy preservingPrivacy preserving
Privacy preserving
 
Phish market protocol
Phish market protocolPhish market protocol
Phish market protocol
 
Peering equilibrium multi path routing
Peering equilibrium multi path routingPeering equilibrium multi path routing
Peering equilibrium multi path routing
 
Peace
PeacePeace
Peace
 
Online social network
Online social networkOnline social network
Online social network
 
On the quality of service of crash recovery
On the quality of service of crash recoveryOn the quality of service of crash recovery
On the quality of service of crash recovery
 
It auditing to assure a secure cloud computing
It auditing to assure a secure cloud computingIt auditing to assure a secure cloud computing
It auditing to assure a secure cloud computing
 
Intrution detection
Intrution detectionIntrution detection
Intrution detection
 
Bayesian classifiers programmed in sql
Bayesian classifiers programmed in sqlBayesian classifiers programmed in sql
Bayesian classifiers programmed in sql
 

Último

Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
negromaestrong
 

Último (20)

Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Role Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptxRole Of Transgenic Animal In Target Validation-1.pptx
Role Of Transgenic Animal In Target Validation-1.pptx
 

Layered approach

  • 1. IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 35 Layered Approach Using Conditional Random Fields for Intrusion Detection Kapil Kumar Gupta, Baikunth Nath, Senior Member, IEEE, and Ramamohanarao Kotagiri, Member, IEEE Abstract—Intrusion detection faces a number of challenges; an intrusion detection system must reliably detect malicious activities in a network and must perform efficiently to cope with the large amount of network traffic. In this paper, we address these two issues of Accuracy and Efficiency using Conditional Random Fields and Layered Approach. We demonstrate that high attack detection accuracy can be achieved by using Conditional Random Fields and high efficiency by implementing the Layered Approach. Experimental results on the benchmark KDD ’99 intrusion data set show that our proposed system based on Layered Conditional Random Fields outperforms other well-known methods such as the decision trees and the naive Bayes. The improvement in attack detection accuracy is very high, particularly, for the U2R attacks (34.8 percent improvement) and the R2L attacks (34.5 percent improvement). Statistical Tests also demonstrate higher confidence in detection accuracy for our method. Finally, we show that our system is robust and is able to handle noisy data without compromising performance. Index Terms—Intrusion detection, Layered Approach, Conditional Random Fields, network security, decision trees, naive Bayes. Ç 1 INTRODUCTION The signature-based systems are trained by extracting specific I NTRUSION detection as defined by the SysAdmin, Audit, Networking, and Security (SANS) Institute is the art of detecting inappropriate, inaccurate, or anomalous activity patterns (or signatures) from previously known attacks while the anomaly-based systems learn from the normal data [6]. Today, intrusion detection is one of the high priority and collected when there is no anomalous activity [11]. challenging tasks for network administrators and security Another approach for detecting intrusions is to consider professionals. More sophisticated security tools mean that the both the normal and the known anomalous patterns for attackers come up with newer and more advanced penetra- training a system and then performing classification on the tion methods to defeat the installed security systems [4] and test data. Such a system incorporates the advantages of both [24]. Thus, there is a need to safeguard the networks from the signature-based and the anomaly-based systems and is known vulnerabilities and at the same time take steps to known as the Hybrid System. Hybrid systems can be very detect new and unseen, but possible, system abuses by efficient, subject to the classification method used, and can developing more reliable and efficient intrusion detection also be used to label unseen or new instances as they assign systems. Any intrusion detection system has some inherent one of the known classes to every test instance. This is requirements. Its prime purpose is to detect as many attacks possible because during training the system learns features as possible with minimum number of false alarms, i.e., the from all the classes. The only concern with the hybrid method system must be accurate in detecting attacks. However, an is the availability of labeled data. However, data requirement accurate system that cannot handle large amount of network is also a concern for the signature- and the anomaly-based traffic and is slow in decision making will not fulfill the systems as they require completely anomalous and attack- purpose of an intrusion detection system. We desire a system free data, respectively, which are not easy to ensure. that detects most of the attacks, gives very few false alarms, The rest of this paper is organized as follows: In Section 2, copes with large amount of data, and is fast enough to make we discuss the related work with emphasis on various real-time decisions. methods and frameworks used for intrusion detection. We Intrusion detection started in around 1980s after the describe the use of Conditional Random Fields (CRFs) for influential paper from Anderson [10]. Intrusion detection intrusion detection [23] in Section 3 and the Layered systems are classified as network based, host based, or Approach [22] in Section 4. We then describe how to application based depending on their mode of deployment integrate the Layered Approach and the CRFs in Section 5. and data used for analysis [11]. Additionally, intrusion In Section 6, we give our experimental results and compare detection systems can also be classified as signature based or our method with other approaches that are known to anomaly based depending upon the attack detection method. perform well. We observe that our proposed system, Layered CRFs, performs significantly better than other . The authors are with the Department of Computer Science and Software systems. We study the robustness of our method in Section 7 Engineering, and NICTA Victoria Research Laboratory, The University of by introducing noise in the system. We discuss feature Melbourne, Parkville 3010, Australia. selection in Section 8 and draw conclusions in Section 9. E-mail: {kgupta, bnath, rao}@csse.unimelb.edu.au. Manuscript received 6 Mar. 2007; revised 11 Dec. 2007; accepted 28 Jan. 2008; published online 12 Mar. 2008. 2 RELATED WORK For information on obtaining reprints of this article, please send e-mail to: tdsc@computer.org, and reference IEEECS Log Number TDSC-2007-03-0031. The field of intrusion detection and network security has Digital Object Identifier no. 10.1109/TDSC.2008.20. been around since late 1980s. Since then, a number of 1545-5971/10/$26.00 ß 2010 IEEE Published by the IEEE Computer Society Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 2. 36 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 methods and frameworks have been proposed and many often hard to select the best possible architecture for a neural systems have been built to detect intrusions. Various network. Support vector machines have also been used for techniques such as association rules, clustering, naive Bayes detecting intrusions [26]. Support vector machines map real- classifier, support vector machines, genetic algorithms, valued input feature vector to a higher dimensional feature artificial neural networks, and others have been applied to space through nonlinear mapping and can provide real-time detect intrusions. In this section, we briefly discuss these detection capability, deal with large dimensionality of techniques and frameworks. data, and can be used for binary-class as well as multiclass Lee et al. introduced data mining approaches for classification. Other approaches for detecting intrusion detecting intrusions in [30], [31], and [32]. Data mining include the use of genetic algorithm and autonomous and approaches for intrusion detection include association rules probabilistic agents for intrusion detection [1] and [5]. These and frequent episodes, which are based on building methods are generally aimed at developing a distributed classifiers by discovering relevant patterns of program intrusion detection system. and user behavior. Association rules [8] and frequent To overcome the weakness of a single intrusion detection episodes are used to learn the record patterns that describe system, a number of frameworks have been proposed, which user behavior. These methods can deal with symbolic data, describe the collaborative use of network-based and host- and the features can be defined in the form of packet and based systems [45]. Systems that employ both signature- connection details. However, mining of features is limited based and behavior-based techniques are discussed in [19] to entry level of the packet and requires the number of and [41]. In [32], the authors describe a data mining frame- records to be large and sparsely populated; otherwise, they work for building adaptive intrusion detection models. A tend to produce a large number of rules that increase the distributed intrusion detection framework based on mobile complexity of the system [7]. agents is discussed in [12]. Data clustering methods such as the k-means and the fuzzy The most closely related work, to our work, is of Lee et al. c-means have also been applied extensively for intrusion [30], [31], and [32]. They, however, consider a data mining detection [36] and [39]. One of the main drawbacks of the approach for mining association rules and finding frequent clustering technique is that it is based on calculating numeric episodes in order to calculate the support and confidence of distance between the observations, and hence, the observa- the rules separately. Instead, in our work, we define features tions must be numeric. Observations with symbolic features from the observations as well as from the observations and cannot be easily used for clustering, resulting in inaccuracy. the previous labels and perform sequence labeling via the In addition, the clustering methods consider the features CRFs to label every feature in the observation. This setting is independently and are unable to capture the relationship sufficient for modeling the correlation between different between different features of a single record, which further features of an observation. We also compare our work with degrades attack detection accuracy. [21], which describes the use of maximum entropy principle Naive Bayes classifiers have also been used for intrusion for detecting anomalies in the network traffic. The key detection [9]. However, they make strict independence difference between [21] and our work is that the authors in assumption between the features in an observation result- [21] use only the normal data during training and build a ing in lower attack detection accuracy when the features are baseline system, i.e., a behavior-based system, while we train correlated, which is often the case for intrusion detection. our system with both the normal and the anomalous data, Bayesian network can also be used for intrusion detection i.e., we build a hybrid system. Second, the system in [21] fails [28]. However, they tend to be attack specific and build a to model long-range dependencies in the observations, decision network based on special characteristics of which can be easily represented in our model. We also individual attacks. Thus, the size of a Bayesian network integrate the Layered Approach with the CRFs to gain the increases rapidly as the number of features and the type of attacks modeled by a Bayesian network increases. benefits of computational efficiency and high accuracy of To detect anomalous traces of system calls in privileged detection in a single system. processes [20], hidden Markov models (HMMs) have been We compare the Layered Approach with the works in [18], applied in [17], [42], and [43]. However, modeling the system [25], and [41]. The authors in [18] describe the combination of calls alone may not always provide accurate classification as “strong” classifiers using stacking, where the decision tress, in such cases various connection level features are ignored. naive Bayes, and a number of other classification methods are Further, HMMs are generative systems and fail to model used as base classifiers. The authors show that the output long-range dependencies between the observations [29]. We from these classifiers can be combined to generate a better further discuss this in detail in Section 3. classifier rather than selecting the best one. In [25], the authors Decision trees have also been used for intrusion use a combination of “weak” classifiers. The individual detection [9]. The decision trees select the best features for classification power of weak classifiers is slightly better than each decision node during the construction of the tree based random guessing. The authors show that a number of such on some well-defined criteria. One such criterion is to use classifiers when combined using simple majority voting the information gain ratio, which is used in C4.5. Decision mechanism, provide good classification. In [41], the authors trees generally have very high speed of operation and high- apply a combination of anomaly and misuse detectors for attack detection accuracy. better qualification of analyzed events. However, our work is Debar et al. [14] and Zhang et al. [46] discuss the use of not based upon classifier combination. Combination of artificial neural networks for network intrusion detection. classifiers is expensive with regard to the processing time Though the neural networks can work effectively with noisy and decision making. The purpose of classifier combination is data, they require large amount of data for training and it is to improve accuracy. Rather, our system is based upon serial Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 3. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 37 layering of multiple hybrid detectors. From our experiments edges in subgraph S. In addition, the features fk and gk in Section 6, we show that the Layered CRFs perform better are assumed to be given and fixed. For example, a than individual classifiers and they are more efficient and Boolean edge feature fk might be true if the observation Xi accurate than a system based on classifier combination. The is “protocol ¼ tcp,” tag YiÀ1 is “normal,” and tag Yi is results from individual classifiers at a layer are not combined “normal.” Similarly, a Boolean vertex feature gk might be true at any later stage in the Layered Approach, and hence, an if the observation Xi is “service ¼ ftp” and tag Yi is “attack.” attack can be blocked at the layer where it is detected. There is Further, the parameter estimation problem is to find the no communication overhead among the layers and the central parameters ¼ ð1 ; 2 ; . . . ; 1 ; 2 ; . . .Þ from the training data N decision-maker. In addition, since the layers are independent D ¼ ðxi ; yi Þi¼1 with the empirical distribution pðx; yÞ [29]. ~ they can be trained separately and deployed at critical CRFs are undirected graphical models used for sequence locations in a network depending upon the specific require- tagging. The prime difference between CRF and other ments of a network. Using a stacked system will not give us graphical models such as the HMM is that the HMM, being the advantage of reduced processing when an attack is generative, models the joint distribution pðy; xÞ, whereas the detected at the initial layers in the sequential model. CRF are discriminative models and directly model the In this paper, we show the effectiveness of CRFs for conditional distribution pðyjxÞ, which is the distribution of intrusion detection. Motivated by our results in [23], we interest for the task of classification and sequence labeling. perform detailed analysis and show that CRFs are a strong Similar to HMM, the naive Bayes is also generative and candidate for building robust intrusion detection systems. models the joint distribution. Modeling the joint distribution We then show that high efficiency can be achieved by has two disadvantages. First, it is not the distribution of implementing the Layered Approach. Finally, we integrate interest, since the observations are completely visible and the the Layered Approach and the CRFs to develop a system interest is in finding the correct class for the observations, that is accurate and performs efficiently. which is the conditional distribution pðyjxÞ. Second, inferring the conditional probability pðyjxÞ from the modeled joint distribution, using the Bayes rule, requires the marginal 3 CONDITIONAL RANDOM FIELDS FOR INTRUSION distribution pðxÞ. To estimate this marginal distribution is DETECTION difficult since the amount of training data is often limited and Conditional models are probabilistic systems that are used the observation x contains highly dependent features that are to model the conditional distribution over a set of random difficult to model and therefore strong independence variables. Such models have been extensively used in the assumptions are made among the features of an observation. natural language processing tasks. Conditional models offer This results in reduced accuracy [40]. CRFs, however, predict a better framework as they do not make any unwarranted the label sequence y given the observation sequence x. This assumptions on the observations and can be used to model allows them to model arbitrary relationship among different rich overlapping features among the visible observations. features in an observation x [15]. CRFs also avoid the Maxent classifiers [37], maximum entropy Markov models observation bias and the label bias problem, which are [34], and CRFs [29] are such conditional models. The present in other discriminative models, such as the maximum advantage of CRFs is that they are undirected and are, thus, entropy Markov models. This is because the maximum free from the Label Bias and the Observation Bias [27]. The entropy Markov models have a per-state exponential model simplest conditional classifier is the Maxent classifier based for the conditional probabilities of the next state given the upon maximum entropy classification, which estimates the current state and the observation, whereas the CRFs have a conditional distribution of every class given the observations single exponential model for the joint probability of the entire [37]. The training data is used to constrain this conditional sequence of labels given the observation sequence [29]. distribution while ensuring maximum entropy and hence The task of intrusion detection can be compared to many maximum uniformity. We now give a brief description of the problems in machine learning, natural language processing, CRFs, which is motivated from the work in [29]. and bioinformatics. The CRFs have proven to be very Let X be the random variable over data sequence to be successful in such tasks, as they do not make any unwar- labeled and Y the corresponding label sequence. In ranted assumptions about the data. Hence, we explore the addition, let G ¼ ðV ; EÞ be a graph such that Y ¼ ðYv Þv2ðV Þ , suitability of CRFs for intrusion detection. so that Y is indexed by the vertices of G. Then, ðX; Y Þ is a CRF, when conditioned on X, the random variables Yv 3.1 Motivating Example obey the Markov property with respect to the graph: The data analyzed by the intrusion detection system for pðYv jX; Yw ; w 6¼ vÞ ¼ pðYv jX; Yw ; w $ vÞ, where w $ v means classification often has a number of features that are highly that w and v are neighbors in G, i.e., a CRF is a random field correlated and complex relationships exist between them. globally conditioned on X. For a simple sequence (or chain) For example, when classifying network connections as modeling, as in our case, the joint distribution over the label either normal or as attack, a system may consider features sequence Y given X has the following form: such as “logged in” and “number of file creations.” When ! these features are analyzed individually, they do not X X provide any information that can aid in detecting attacks. p ðyjxÞ / exp k fk ðe; yje ; xÞ þ k gk ðv; yjv ; xÞ ; ð1Þ However, when these features are analyzed together, they e2E;k v2V ;k can provide meaningful information, which can be helpful where x is the data sequence, y is a label sequence, and yjs is for the classification task. Taking another example, the the set of components of y associated with the vertices or connection level feature such as the “service invoked” at the Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 4. 38 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 Fig. 2. Layered representation. Our first goal is to improve the attack detection accuracy. We first compare the accuracy of CRFs for detecting attacks Fig. 1. Graphical representation of a CRF. with other methods in Section 6. We consider all the 41 features in the data set for each of the four attack groups destination provides some information about the class label separately. As we shall observe, the CRFs outperform other (in case an attacker sends request to a service that is not methods for detecting “Unauthorized access to Root” (U2R) available). This information becomes more concrete and attacks. They are also effective in detecting the Probe, aids in classification when analyzed with other features “Remote to Local” (R2L), and “Denial of Service” (DoS) such as “protocol type” and “amount of data transferred” attacks. However, CRFs can be expensive during training between source and destination (in case the client connects and testing. For a simple linear chain structure, the time to an available service such as the ftp and performs data complexity for training a CRF is OðT L2 NIÞ, where T is the transfer). These relationships, between different features in length of the sequence, L is the number of labels, N is the the observed data, if considered during classification can number of training instances, and I is the number of significantly decrease classification error. The CRFs do not iterations. During inference, the Viterbi algorithm is consider features to be independent and hence perform employed, which has a complexity of OðT L2 Þ. The quad- better when compared with other methods. ratic complexity is significant when the number of labels is The data set used in our experiments represents features large as in language tasks. However, for intrusion detection, of every session in relational form with only one label for there are only two labels “normal” and “attack,” and thus, the entire record. In this case, using a conditional model the system is very efficient. We further improve the overall would result in a simple maximum entropy classifier [40]. system performance by using the Layered Approach, which However, we represent the data in the form of a sequence decreases T , i.e., the length of the sequence. The Layered and assign a label to every feature in the sequence using the Approach is described next. first-order Markov assumption instead of assigning a single label to the entire observation. Though, this increases the complexity but it also increases the attack detection accuracy. 4 LAYERED APPROACH FOR INTRUSION DETECTION Each record represents a separate connection, and hence, We now describe the Layer-based Intrusion Detection we consider every record as a separate sequence. We aim to System (LIDS) in detail. The LIDS draws its motivation model the relationships among features of individual from what we call as the Airport Security model, where a connections using a CRF, as shown in Fig. 1. In the figure, number of security checks are performed one after the other features such as duration, protocol, service, flag, and in a sequence. Similar to this model, the LIDS represents a src_bytes take some possible value for every connection. sequential Layered Approach and is based on ensuring During training, feature weights are learnt, and during availability, confidentiality, and integrity of data and (or) testing, features are evaluated for the given observation, services over a network. Fig. 2 gives a generic representa- which is then labeled accordingly. tion of the framework. As it is evident from the figure, every label is connected The goal of using a layered model is to reduce computation to every input feature, which indicates that all the features and the overall time required to detect anomalous events. The in an observation help in labeling, and thus, a CRF can time required to detect an intrusive event is significant and model dependencies among the features in an observation. can be reduced by eliminating the communication overhead Present intrusion detection systems do not consider such among different layers. This can be achieved by making the relationships among the features in the observations. They layers autonomous and self-sufficient to block an attack either consider only one feature, such as in the case of without the need of a central decision-maker. Every layer in system call modeling, or assume conditional independence the LIDS framework is trained separately and then deployed among different features in the observation as in the case of a naive Bayes classifier. As we will show from our sequentially. We define four layers that correspond to the experimental results, the CRFs can effectively model such four attack groups mentioned in the data set. They are Probe relationships among different features of an observation layer, DoS layer, R2L layer, and U2R layer. Each layer is then resulting in higher attack detection accuracy. Another separately trained with a small set of relevant features. advantage of using CRFs is that every element in the Feature selection is significant for Layered Approach and sequence is labeled such that the probability of the entire discussed in the next section. In order to make the layers labeling is maximized, i.e., all the features in the observa- independent, some features may be present in more than one tion collectively determine the final labels. Hence, even if layer. The layers essentially act as filters that block any some data is missing, the observation sequence can still be anomalous connection, thereby eliminating the need of labeled with less number of features. further processing at subsequent layers enabling quick Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 5. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 39 response to intrusion. The effect of such a sequence of layers is illegitimate requests. Hence, for the DoS layer, traffic that the anomalous events are identified and blocked as soon features such as the “percentage of connections having same as they are detected. destination host and same service” and packet level features Our second goal is to improve the speed of operation of such as the “source bytes” and “percentage of packets with the system. Hence, we implement the LIDS and select a errors” are significant. To detect DoS attacks, it may not be small set of features for every layer rather than using all the important to know whether a user is “logged in or not.” 41 features. This results in significant performance im- provement during both the training and the testing of the 5.1.3 R2L Layer system. In many situations, there is a trade-off between The R2L attacks are one of the most difficult to detect as efficiency and accuracy of the system and there can be they involve the network level and the host level features. various avenues to improve system performance. Methods We therefore selected both the network level features such such as naive Bayes assume independence among the as the “duration of connection” and “service requested” observed data. This certainly increases system efficiency, and the host level features such as the “number of failed but it may severely affect the accuracy. To balance this login attempts” among others for detecting R2L attacks. trade-off, we use the CRFs that are more accurate, though expensive, but we implement the Layered Approach to 5.1.4 U2R Layer improve overall system performance. The performance of The U2R attacks involve the semantic details that are very our proposed system, Layered CRFs, is comparable to that difficult to capture at an early stage. Such attacks are often of the decision trees and the naive Bayes, and our system content based and target an application. Hence, for U2R has higher attack detection accuracy. attacks, we selected features such as “number of file creations” and “number of shell prompts invoked,” while 5 INTEGRATING LAYERED APPROACH WITH we ignored features such as “protocol” and “source bytes.” CONDITIONAL RANDOM FIELD We used domain knowledge together with the practical significance and the feasibility of each feature before In Section 1, we discussed two main requirements for an selecting it for a particular layer. Thus, from the total intrusion detection system; accuracy of detection and 41 features, we selected only 5 features for Probe layer, efficiency in operation. As discussed in Sections 3 and 4, 9 features for DoS layer, 14 features for R2L layer, and respectively, the CRFs can be effective in improving the 8 features for U2R layer. Since each layer is independent of attack detection accuracy by reducing the number of false every other layer, the feature set for the layers is not alarms, while the Layered Approach can be implemented to disjoint. The selected features for all the four layers are improve the overall system efficiency. Hence, a natural presented in Appendix A. We then use the CRFs for attack choice is to integrate them to build a single system that is detection as discussed in Section 3. However, the difference accurate in detecting attacks and efficient in operation. Given is that we use only the selected features for each layer rather the data, we first select four layers corresponding to the four than using all the 41 features. We now give the algorithm attack groups (Probe, DoS, R2L, and U2R) and perform for integrating CRFs with the Layered Approach. feature selection for each layer, which is described next. Algorithm 5.1 Feature Selection Training Step 1: Select the number of layers, n, for the complete Ideally, we would like to perform feature selection auto- matically. However, as will be discussed later in Section 8, system. the methods for automatic feature selection were not found Step 2: Separately perform features selection for each layer. to be effective. In this section, we describe our approach for Step 3: Train a separate model with CRFs for each layer selecting features for every layer and why some features using the features selected from Step 2. were chosen over others. In our system, every layer is Step 4: Plug in the trained models sequentially such that separately trained to detect a single type of attack category. only the connections labeled as normal are passed We observe that the attack groups are different in their to the next layer. impact, and hence, it becomes necessary to treat them Testing differently. Hence, we select features for each layer based Step 5: For each (next) test instance perform Steps 6 upon the type of attacks that the layer is trained to detect. through 9. Step 6: Test the instance and label it either as attack or 5.1.1 Probe Layer normal. The probe attacks are aimed at acquiring information about Step 7: If the instance is labeled as attack, block it and the target network from a source that is often external to the identify it as an attack represented by the layer network. Hence, basic connection level features such as the name at which it is detected and go to Step 5. Else “duration of connection” and “source bytes” are significant while features like “number of files creations” and “number pass the sequence to the next layer. of files accessed” are not expected to provide information Step 8: If the current layer is not the last layer in the system, for detecting probes. test the instance and go to Step 7. Else go to Step 9. Step 9: Test the instance and label it either as normal or as 5.1.2 DoS Layer an attack. If the instance is labeled as an attack, The DoS attacks are meant to force the target to stop the block it and identify it as an attack corresponding service(s) that is (are) provided by flooding it with to the layer name. Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 6. 40 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 TABLE 1 desktop running with Intel(R) Core(TM) 2, CPU 2.4 GHz, Data Set and 2-Gbyte RAM under exactly the same conditions. We are mainly interested in the test time efficiency and not in the time required for training of the model as the real-time performance of the system depend upon the test efficiency alone. We note that our system is very efficient during testing. When we considered all the 41 features, the time taken to test all the 250,436 attacks was 57 seconds, which reduced to 17 seconds when we performed feature selection and implemented the Layered Approach. More details will be presented when we give the detailed results for the Our final goal is to improve both the attack detection experiments. accuracy and the efficiency of the system. Hence, we For our results, we give the Precision, Recall, and F -Value integrate the CRFs and the Layered Approach to build a and not the accuracy alone as with the given data set, it is single system. We perform detailed experiments and show easy to achieve very high accuracy by carefully selecting the that our integrated system has dual advantage. First, as sample size. From Table 1, we note that the number of expected, the efficiency of the system increases signifi- instances for the U2R, Probes, and R2L attacks is very low. cantly. Second, since we select significant features for each Hence, if we use accuracy as a measure for testing the layer, the accuracy of the system further increases. This is performance of the system, the system can be biased and can because all the 41 features are not required for detecting attain an accuracy of more than 99 percent for U2R attacks attacks belonging to a particular attack group. Using more [16]. However, Precision, Recall, and F -Value are not features than required can result in fitting irregularities in dependent on the size of the training and the test samples. the data, which has a negative effect on the attack detection They are defined as follows: accuracy of the system. TP Precision ¼ 6 EXPERIMENTS TP þ FP TP For our experiments, we use the benchmark KDD ’99 Recall ¼ TP þ FN intrusion data set [3]. This data set is a version of the original ð1 þ
  • 7. 2 Þ Ã Recall à Precision 1998 DARPA intrusion detection evaluation program, which F -Value ¼ ; is prepared and managed by the MIT Lincoln Laboratory.
  • 8. 2 Ã ðRecall þ PrecisionÞ The data set contains about five million connection records where TP, FP, and FN are the number of True Positives, as the training data and about two million connection False Positives, and False Negatives, respectively, and
  • 9. records as the test data. In our experiments, we use corresponds to the relative importance of precision versus 10 percent of the total training data and 10 percent of the recall and is usually set to 1. test data (with corrected labels), which are provided We divide the training data into different groups; separately. This leads to 494,020 training and 311,029 test Normal, Probe, DoS, R2L, and U2R. Similarly, we divide instances. Each record in the data set represents a connection the test data. We perform 10 experiments for each attack between two IP addresses, starting and ending at some well- class by randomly selecting data corresponding to that defined times with a well-defined protocol. Further, every attack class and normal data only. For example, to detect record is represented by 41 different features. Each record Probe attacks, we train and test the system with Probe represents a separate connection and is hence considered to attacks and normal data only. We do not add the DoS, R2L, be independent of any other record. and U2R data when detecting Probes. Not including these The training data is either labeled as normal or as one of attacks while training allows the system to better learn the the 24 different kinds of attack. These 24 attacks can be features for Probe attacks and normal events. When such a grouped into four classes; Probing, DoS, R2L, and U2R. system is deployed online, other attacks such as DoS can Similarly, the test data is also labeled as either normal or as either be seen as normal or as Probes. If DoS attacks are one of the attacks belonging to the four attack groups. It is detected as normal, we expect them to be detected as attack important to note that the test data is not from the same at other layers in the system. However, if the DoS attacks are probability distribution as the training data, and it includes detected as Probe, it must be considered as an advantage specific attack types not present in the training data. This since the attack is detected at an early stage. Similarly, if makes the intrusion detection task more realistic [3]. Table 1 some Probe attacks are not detected at the Probe layer, they gives the number of instances for each group of attack in the may be detected at subsequent layers. Hence, for four attack data set. For our experiments with CRFs, we use the CRF toolkit, classes, we have four independent models, which are CRF++ [2]. We use the Weka tool [44] to perform experiments trained separately with specific features to detect attacks with the decision trees and the naive Bayes classifier. We belonging to that particular group. For our experiments, we develop python and shell scripts for data formatting and report the best, the average, and the worst cases. implementing the Layered Approach. For all our experi- We represent a single layer, for example the Probe layer, ments, we perform hybrid detection, as discussed in Section 1, in Fig. 3. Other layers can be constructed similarly. and use both the normal and the anomalous connections for In Section 6.1, we perform experiments with individual training the model. We perform our experiments on a layers in the system. In Section 6.2, we represent how to Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 10. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 41 TABLE 3 Normal and Probes (with Feature Selection) Fig. 3. Representation of a single layer (e.g., probe layer). implement the system in real scenario and compare our results with other systems in Section 6.3. In Section 6.4, we discuss the significance of our results. 6.1 Building Individual Layers of the System We perform two sets of experiments. From the first experiment, we wish to examine the accuracy of CRFs for intrusion detection. The objective is to see how CRFs perform feature selection, our system achieves much higher compare with other techniques, which are known to accuracy and there is significant improvement in efficiency. perform well. We do not consider feature selection, and the systems are trained using all the 41 features. From this 6.1.2 Detecting Probe Attacks with Feature Selection experiment, we observe that CRFs perform much better for We used the same set of instances for this experiment as U2R attacks while the decision trees achieve higher attack used in the previous experiment. However, we perform detection for Probes and R2L. The difference in attack feature selection for this experiment. Table 3 gives the detection accuracy for DoS is not significant. We note that results for this experiment. the reason for better performance of decision trees is that We observe that the system takes only 2.04 seconds to they perform feature selection. This motivates us to perform label all the 64,759 test instances. The Layered CRFs our second experiment where we perform feature selection perform better and faster than our previous experiment by selecting a small set of features for every attack group and are the best choice for detecting Probes. We also note instead of using all the 41 features. We perform the same that there is no significant advantage with respect to time experiment with decision trees and naive Bayes and for the layered decision trees as the number of features used compare the results. We call the integrated models as in normal decision trees and in the layered decision trees is Layered CRFs, layered decision trees, and layered naive approximately the same, resulting in similar efficiency. We Bayes, respectively. For better comparison and readability, further note that the Recall and hence the F -Value for the we give the results for both the experiments together. layered naive Bayes decreases drastically. This can be explained as follows: The classification accuracy with naive 6.1.1 Detecting Probe Attacks with All 41 Features Bayes generally improves as the number of features We randomly select about 10,000 normal records and all the increases. However, if the number of features increases to a very large extent, the estimation tends to become Probe records from the training data as the training data for unreliable. As a result, when we use all the 41 features, detecting Probe attacks. We then use all the normal and the naive Bayes performs well but when we decrease the Probe records from the test data for testing. Hence, we have number of features to five, its classification accuracy 15,000 training instances and 64,759 test instances. Table 2 decreases. From this experiment, we conclude that the gives the results for the experiments. Layered CRFs are a better choice for detecting Probe In Table 2, the testing time of 14.53 seconds represents attacks. the total time taken to label 64,759 test instances. The results show that the decision trees are more efficient than the CRFs 6.1.3 Detecting DoS Attacks with All 41 Features and the naive Bayes. This is because they have a small tree We randomly select about 20,000 normal records and about structure, often with very few decision nodes, which is very 4,000 DoS records from the training data as the training data efficient. The attack detection accuracy is also higher for the for detecting DoS attacks. We then use all the normal and decision trees as they select the best possible features during tree construction. However, as we show next, once we DoS records from the test data for testing. Hence, we have 24,000 training instances and 290,446 test instances. Table 4 gives the results for the experiments. TABLE 2 In Table 4, the testing time of 64.42 seconds represents Normal and Probes (All 41 Features) the time taken to label all the 290,446 test instances. The results show that all the three methods considered have similar attack detection accuracy; however, decision trees give a slight advantage with regard to test time efficiency. 6.1.4 Detecting DoS Attacks with Feature Selection We used the same data for this experiment as used in the previous experiment. However, we perform feature selec- tion. Table 5 gives the results. We observe that the system now takes only 15.17 seconds to label all the 290,446 test instances. The results follow the Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 11. 42 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 TABLE 4 TABLE 7 Normal and DoS (All 41 Features) Normal and R2L (with Feature Selection) TABLE 5 Normal and DoS (with Feature Selection) 6.1.6 Detecting R2L Attacks with Feature Selection Table 7 gives the results when we performed feature selection for detecting R2L attacks. We observe that the time taken to test all the 76,942 instances is only 5.96 seconds. Further, the Layered CRFs perform much better than the CRFs (an increase of about 60 percent), layered decision trees (an increase of about 125 percent), decision trees (an increase of about 17 percent), layered naive Bayes (an increase of about 250 percent), and naive Bayes (an increase of about 250 percent) and are the best choice for detecting same trend as in the previous experiment with only a slight the R2L attacks. The Layered CRFs take slightly more improvement. However, if we consider the testing time we time, which is acceptable since we achieve much higher find that layered decision trees are a better choice. We also detection accuracy. note that there is slight increase in the detection accuracy 6.1.7 Detecting U2R Attacks with All 41 Features when we perform feature selection, but this increase is not significant. The real advantage is seen in the reduced time We randomly select about 1,000 normal records and all the for testing, which decreases four folds. U2R records from the training data as the training data for detecting the User to Root attacks. We then use all the 6.1.5 Detecting R2L Attacks with All 41 Features normal and U2R records from the test data for testing. We randomly select about 1,000 normal records and all the Hence, we have 1,000 training instances and 60,661 test R2L records from the training data as the training data for instances. Table 8 gives the results. detecting R2L attacks. We then use all the normal and In Table 8, the testing time of 13.45 seconds represents R2L records from the test data for testing. Hence, we have the time taken to label all the 60,661 test instances. In this 2,000 training instances and 76,942 test instances. Table 6 gives experiment, we find that the CRFs are far better than the the results. other two methods. The F -Value for CRFs is more than In Table 6, the testing time of 17.16 seconds represents 150 percent with respect to the decision trees and more than the time taken to label all the 76,942 test instances. We 600 percent with respect to the naive Bayes. The U2R attacks observe that the decision trees have a higher F -Value, but if are very difficult to detect and most of the present intrusion we look at the number of false alarms, we find that the CRFs detection systems fail to detect such attacks with acceptable perform better and have high Precision compared to the reliability. We find that the CRFs can be used to reliably decision trees and the naive Bayes. detect such attacks. TABLE 6 TABLE 8 Normal and R2L (All 41 Features) Normal and U2R (All 41 Features) Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 12. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 43 TABLE 9 TABLE 10 Normal and U2R (with Feature Selection) Confusion Matrix TABLE 11 Attack Detection at Each Layer (Case 1) 6.1.8 Detecting U2R Attacks with Feature Selection In this experiment, we used exactly the same set of instances as we used in the previous experiment. We also perform feature selection. Table 9 gives the results for this experiment. We observe that the system takes only 2.67 seconds to label all the 60,661 test instances. The Layered CRFs are the best category once the system detects an event as anomalous. choice for detecting the U2R attacks and are far better than Layered Approach not only improves the attack detection, CRFs (an increase of about 8 percent), layered decision trees but it also helps identify the type of attack once detected (an increase of about 30 percent), decision trees (an increase because every layer is trained to detect only a particular of about 184 percent), layered naive Bayes (an increase of category of attack. Hence, if an attack is detected at the U2R about 38 percent), and naive Bayes (an increase of about layer, it is very likely that the attack is of “U2R” type. This 675 percent). We observe that the attack detection capability enables to perform quick recovery and prevent similar also increases for the decision trees and the naive Bayes. attacks. Fig. 4 gives the real-time system representation. It is evident from the results that the accuracy of Layered We integrate the four models (with feature selection) from CRFs is significantly higher for the U2R, R2L, and the Probe Section 6.1 to develop the final system. In this experiment, attacks. The difference in accuracy is, however, not sig- we use the same data for training the individual models as nificant for the DoS attacks. Further, regardless of the used in our previous experiments. However, the data in the method considered and particularly for the CRFs, the time test set is relabeled either as normal or as attack and all the required for training and testing the system is drastically data from the test set is passed though the system starting reduced once we perform feature selection. We also note from the first layer. If layer 1 detects any connection as an that the increase in detection accuracy is not significant for attack, it is blocked and labeled as “Probe.” Only the events the layered decision trees and the layered naive Bayes for labeled as “Normal” are allowed to go to the next layer. The the DoS group of attack. Their accuracy of detection same process is repeated at the next layers where an attack is decreases for the Probe and R2L attacks while it increases blocked and labeled as “DoS,” “R2L,” or “U2R” at layer 2, for the U2R attacks. However, we find that in all the cases, layer 3, and layer 4, respectively. We perform all the the Layered CRFs perform significantly better and can better experiments 10 times and report their average. We give the learn a model when we use a small set of specific features for results for this experiment in Tables 10, 11, and 12. Table 10 training. gives the confusion matrix where the values represent the percent detection with respect to each of the five classes. 6.2 Implementing the System in Real Life From the table, we observe that our system can detect In real scenario, we are not aware of the category of an most of the “Probe” (98.62 percent), “DoS” (97.40 percent), attack. Rather, we are interested in identifying the attack and “U2R” (86.33 percent) attacks while giving very few false alarms at each layer. The system can also detect “R2L” attacks with much higher reliability (29.62 percent) when compared with the previously reported systems, as we will discuss later in Section 6.3. The confusion matrix shows that TABLE 12 Attack Detection at Each Layer (Case 2) Fig. 4. Real-time representation of the system. Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 13. 44 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 only 71.90 percent of DoS attacks are labeled as DoS during TABLE 13 testing. However, it is very important to note that the Layered versus Nonlayered Approach accuracy for detecting DoS attacks is not 71.90 percent, but it is 25:50 þ 71:90 þ 0:00 þ 0:00 ¼ 97:40 percent. This is because 25.50 percent of the DoS attacks are already detected at the first layer, though our system identifies them as probes since they are detected at the first layer. This is acceptable because in the real environment it is critical to detect an attack as early as possible to minimize its impact. It is also important to note that most of the “U2R” attacks are detected in the third layer itself and hence labeled as “R2L.” However, if we remove the third layer, the fourth layer can detect these attacks with similar accuracy. Further, looking at the “R2L” and “U2R” columns in Table 10, it is accurate in detecting attacks particularly the U2R, the R2L, natural to think that the two layers can be merged. However, and the Probes. this has two disadvantages. First, merging the two layers It is important to note that the time should be read in results in increasing the number of features, which reduces relative terms rather than absolute, as for ease of experi- efficiency. The merged layer performs poorly with regard to ments we used scripts for implementation. In real environ- the total time taken when compared with both the ment, high speed can be achieved by implementing the unmerged layers together. Second, when the layers are complete system in languages with efficient compilers such merged, the “U2R” attacks are not detected effectively and as the “C Language.” Further, pipelining can be implemen- their individual attack detection accuracy decreases. This is ted in multicore processors, where each core may represent a because the number of “U2R” instances is very low in the training data and the system simply learns the features that single layer, and due to pipelining, multiple I/O operations are specific to the “R2L” attacks. Hence, we prefer separate can be replaced by a single I/O operation providing very layers for the two attack groups. Using our approach, we can high speed of operation. hope that any attack, even though its category is unknown, 6.3 Comparison of Results can be detected at any one of the layers in the system. We can also increase or decrease the number of layers depend- In this section, we compare our work with other well-known ing upon the environment where the system is deployed. methods based on the anomaly intrusion detection principle. We evaluate the performance of each layer in the The anomaly-based systems primarily detect deviations system in Table 11. From the table, we observe that out of from the learnt normal data by using statistical methods, all the 250,436 attack instances in the test data set, more machine learning, or data mining approaches. Standard than 25 percent of the attacks are blocked at layer 1, and techniques such as the decision trees and naive Bayes are more than 90 percent of all the attacks have been blocked known to perform well. However, our experiments show by the end of layer 2. Thus, the Layered Approach is very that the Layered CRFs perform far better than these effective in reducing the attack traffic at each layer in the techniques. The main reason for this is that the CRFs do not system. The configuration takes 21 seconds to classify all consider the observation features to be independent. In [38], the 250,436 attacks. the authors present a comparative study of various classifiers We can do further optimization by putting the DoS layer when applied to the KDD ’99 data set, and in [13], the authors before the Probe layer. We can do this because the data is propose the use of Principle Component Analysis (PCA) relational and each layer in our system is independent. before applying a machine learning algorithm. Use of Putting the DoS layer before the Probe layer serves dual support vector machines is discussed in [26]. We compare advantage as most of the attacks are detected at the first our results from the results presented in these papers in layer itself and the overall system performs efficiently. This Table 14. The table represents the Probability of Detection optimization becomes significant in severe attack situations (PD) and False Alarm Rate (FAR) in percent for various when the target is overwhelmed with illegitimate connec- methods including the KDD ’99 cup winners. tions. The results are presented in Table 12. From the table, we observe that the Layered CRFs We observe that the Layered Approach can be very perform significantly better than the previously reported effective in restricting the attack traffic to the initial layers in results including the winner of the KDD ’99 cup and the system. We also performed experiments when we do various other methods applied to this data set. The most not implement the Layered Approach, i.e., we consider only impressive part of the Layered CRFs is the margin of improve- a single system that is trained with two classes (normal and ment as compared with other methods. Layered CRFs have very attack). In this system, all the Probes, DoS, R2L, and U2R high attack detection of 98.6 percent for Probes (5.8 percent attacks are labeled as “attack.” We perform experiments improvement) and 97.40 percent detection for DoS. They both with and without feature selection. For feature outperform by a significant percentage for the R2L (34.5 percent selection, we consider 21 features, which are selected by improvement) and the U2R (34.8 percent improvement) attacks. applying the union operation on the feature sets of all the four attack types. We compare these results with the 6.4 Discussion and Issues Layered Approach in Table 13. From our experiments and the comparison in Table 14, we We observe that a system implementing the Layered conclude that the Layered CRFs can be very effective in Approach with feature selection is more efficient and more detecting the Probe, the U2R, and the R2L attacks as well as Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 14. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 45 TABLE 14 TABLE 15 Comparison of Results Ranking for the Six Methods scores better. Our integrated system also has the advantage that any method can be used in the layers of the system. This gives flexibility to the user to decide between the time and accuracy trade-off. Furthermore, we can increase or decrease the number of layers in the system depending upon the task requirement. Finally, our system can be used for performing analysis on attacks because the attack category can be inferred from the layer at which the attack is detected. To determine the statistical significance of our results, we compare our proposed method (Layered CRFs) with others for detecting Probes, DoS, R2L, and U2R attacks. We use the Wilcoxon sum rank test with 95 percent confidence interval to discriminate the performance of these methods. Table 15 gives the ranking for various methods compared, where a system with rank 1 is the best. The results of the Wilcoxon test indicate that the Layered CRFs are much better (or equal) for detecting attacks. Thus, we conclude that the Layered CRFs are a strong candidate the DoS attacks. However, if we consider all the 41 features for building robust and efficient intrusion detection systems. given in the data set, we find that the time required to train and test the model is high. To address this, we performed 7 EFFECT OF NOISE experiments with our integrated system by implementing a four-layer system. The four layers correspond to Probe, Ideally, we would like to perform similar experiments with DoS, R2L, and U2R. For each layer, we then selected a set of a large number of data sets. However, given the domain of features that is sufficient to detect attacks at that particular the problem, there are no other data sets that are freely layer. Feature selection for each layer enhances the available, which can be used for our experiments. To performance of the entire system. The runtime (testing) ameliorate this problem to some extent, we add substantial amount of noise in the training data and perform similar performance of our model is comparable with other experiments to study the robustness of these systems. By methods; however, the time required to train the model is experimenting with noisy data, we want to determine the slightly higher. We also observe that feature selection not sensitivity of the proposed scheme with respect to noise. If only decreases the time required to test an instance, but it the system performs poorly with noisy data, the results also increases the accuracy of attack detection. This is could be an artifact of the data set. because using more features than required can generate superfluous rules often resulting in fitting irregularities in 7.1 Addition of Noise to Data the data, which can misguide classification. From our Addition of noise was controlled by two parameters, the experimental results, we conclude that the main strength probability of adding noise to a feature, p, and the scaling of our method lies in detecting the R2L and the U2R attacks, factor, s, for a feature. We performed four sets of experi- which are not satisfactorily detected by other methods. Our ments with noisy data, separately, one for each layer. For method gives slight improvement for detecting Probe each layer, we varied the parameter p between 0 and 0.95 (by attacks and was similar in accuracy when compared with keeping it at values 0.10, 0.20, 0.33, 0.50, 0.75, 0.90, and 0.95) other methods for detecting the DoS attacks. and varied the parameter s between À1,000 and þ1,000. In The prime reason for better detection accuracy for the case when the original feature was “0,” noise was added to CRFs is that they do not consider the observation features to any feature by using an additive function (random value be independent. CRFs evaluate all the rules together, which between À1,000 and þ1,000) instead of scaling. Figs. 5, 6, 7, are applicable for a given observation. This results in and 8 represent the effect of noise on each layer separately. capturing the correlation among different features of the We find that our integrated system is robust to noise in observation resulting in higher accuracy. Considering both the training data and performs better than other methods the accuracy and the time required for testing, our system for all of the four attack groups. Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 15. 46 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 7, NO. 1, JANUARY-MARCH 2010 Fig. 5. Effect of noise on Probe layer. Fig. 8. Effect of noise on U2R layer. We then used PCA for dimensionality reduction [13]. However, the main drawback of using PCA in our task is that PCA transforms a large number of possibly correlated features into a small number of uncorrelated features known as the principle components. Hence, when we applied PCA followed by CRFs in the newly transformed feature space, the method did not provide significant advantage as the strength of our approach is to model correlation among features and the features in the new space are independent. We also note that, to construct a decision tree, the C4.5 algorithm performs feature selection. We selected the Fig. 6. Effect of noise on DoS layer. same features as selected by the C4.5 algorithm and then performed experiments with only those features. However, 8 AUTOMATIC FEATURE SELECTION there was no significant improvement in the results. From our experiments in the previous sections, we showed We also performed experiments with the method the advantages of performing feature selection and im- proposed in [33] for efficiently inducing features for a plementing the Layered Approach for attack detection. We CRF. The method is based upon iteratively constructing performed our experiments by manually selecting features feature conjunctions that would significantly increase the for different layers. However, we want to compare the conditional log-likelihood if added to the model. We used results of manual feature selection with the results of the Mallet tool [35] for performing these experiments and automatic feature selection (and no feature selection) for all compare the results with our previous results based on the layers. Hence, we investigated various methods for Layered CRFs with manual feature selection in Table 16. We automatic feature selection. observed that both the systems, with automatic and manual We experimented with a feed forward neural network to feature selections, had similar test time performance, but determine the weights for all the 41 features. Features with the accuracy of detection when features were induced weights close to zero were discarded. As a result, only a automatically was significantly lower than our system small set of features was selected for each layer. However, based upon manual feature selection. when we performed the experiments on the reduced set of It was not surprising that manual feature selection features and compared the results, there was no significant performed better than automatic feature selection. How- improvement in the detection accuracy, though there was ever, we note that automatic feature selection for Layered reduction in training and testing time. CRFs performed better than the decision trees, particularly for the R2L and the U2R attacks. This suggests that Layered TABLE 16 Feature Selection Fig. 7. Effect of noise on R2L layer. Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.
  • 16. GUPTA ET AL.: LAYERED APPROACH USING CONDITIONAL RANDOM FIELDS FOR INTRUSION DETECTION 47 Approach using CRFs with automated feature selection is a A.2 Features Selected for DoS Layer feasible scheme for building reliable intrusion detection systems. 9 CONCLUSIONS In this paper, we have addressed the dual problem of Accuracy and Efficiency for building robust and efficient intrusion detection systems. Our experimental results in Section 6 show that CRFs are very effective in improving the attack detection rate and decreasing the FAR. Having a low FAR is very important for any intrusion detection system. Further, feature selection and implementing the Layered Approach significantly reduce the time required to A.3 Features Selected for R2L Layer train and test the model. Even though we used a relational data set for our experiments, we showed that the sequence labeling methods such as the CRFs can be very effective in detecting attacks and they outperform other methods that are known to work well with the relational data. We compared our approach with some well-known methods and found that most of the present methods for intrusion detection fail to reliably detect R2L and U2R attacks, while our integrated system can effectively and efficiently detect such attacks giving an improvement of 34.5 percent for the R2L and 34.8 percent for the U2R attacks. We also discussed how our system is implemented in real life. Our system can help in identifying an attack once it is detected at a particular layer, which expedites the intrusion response mechanism, thus minimizing the impact of an attack. We showed that our system is robust to noise and performs better than any other compared system even when the training data is noisy. Finally, our system has the advantage A.4 Features Selected for U2R Layer that the number of layers can be increased or decreased depending upon the environment in which the system is deployed, giving flexibility to the network administrators. The areas for future research include the use of our method for extracting features that can aid in the develop- ment of signatures for signature-based systems. The signature-based systems can be deployed at the periphery of a network to filter out attacks that are frequent and previously known, leaving the detection of new unknown attacks for anomaly and hybrid systems. Sequence analysis methods such as the CRFs when applied to relational data give us the opportunity to employ the Layered Approach, as shown in this paper. This can further be extended to ACKNOWLEDGMENTS implement pipelining of layers in multicore processors, The authors sincerely thank the anonymous reviewers which is likely to result in very high performance. whose comments have greatly helped clarify and improve this paper. APPENDIX A FEATURE SELECTION REFERENCES A.1 Features Selected for Probe Layer [1] Autonomous Agents for Intrusion Detection, http://www.cerias. purdue.edu/research/aafid/, 2010. [2] CRF++: Yet Another CRF Toolkit, http://crfpp.sourceforge.net/, 2010. [3] KDD Cup 1999 Intrusion Detection Data, http://kdd.ics.uci.edu/ databases/kddcup99/kddcup99.html, 2010. [4] Overview of Attack Trends, http://www.cert.org/archive/pdf/ attack_trends.pdf, 2002. [5] Probabilistic Agent Based Intrusion Detection, http://www.cse.sc. edu/research/isl/agentIDS.shtml, 2010. Authorized licensed use limited to: Asha Das. Downloaded on August 10,2010 at 06:59:52 UTC from IEEE Xplore. Restrictions apply.