Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 01 Issue: 01 June 2015 Page No.9-12
ISSN: 2278-2419
Study on Data Mining Suitability for Intrusion
Detection System (IDS)
S.Vijaya Peterraj, S.Pauline Precilla Mary
Department of MCA,Loyola College,Chennai
Email: vijayapeterraj@gmail.com, pricy2277@gmail.com
Abstract-An Intrusion Detection System (IDS) is used to discover
attacks against computers and network infrastructures. Many
techniques are used to build an IDS, such as outlier detection
schemes for anomaly detection, K-means clustering of
monitoring data, and classification.
Data mining approaches help to determine what qualifies as an
intrusion rather than normal traffic, whether a system
uses anomaly detection, misuse detection, target monitoring, or
stealth probes. This paper evaluates, categorizes, compares
and summarizes the performance of data mining
techniques for detecting intrusions.
Key Words- Data mining techniques, IDS techniques and
computer network security.
I. INTRODUCTION
An Intrusion Detection System (IDS) is one of the essential
tools of computer security. An IDS detects possible intrusions
and attacks on computers and alerts the appropriate individuals
upon detection. It monitors, detects and responds to
unauthorized activity; once such an event is detected, it raises
an alert. An IDS has been described as “the logical complement
to network firewalls”, and it is a vital and necessary element
of a complete information security infrastructure. Put simply,
intrusion detection is the process of monitoring computers or
networks for unauthorized entry, activity, or file modification;
an IDS can also be used to monitor network traffic.
There are two basic types of IDS, host-based and network-based,
and each takes a different approach to monitoring and securing
data. A main rationale for using data mining in intrusion
detection is to assist in the analysis of collections of
observations of behavior.
II. WHAT IS IDS?
An intrusion is the illegal act of entering, seizing, or taking
possession of another's computer system. It may take the form of
code that disrupts the proper flow of traffic on the network or
steals information from that traffic. In other words, an
intrusion can be defined as a violation of the security policy
of the system. An IDS raises an alarm when a possible intrusion
happens. Many tools are based on signatures of known attacks.
When the network is small and the signatures are kept up to
date, this human-analyst approach to intrusion detection works
well. In summary, intrusion detection is the process of
monitoring and analyzing the events taking place in a computer
system in order to detect signs of security problems.
III. INTRUSION DETECTION METHODOLOGIES
The signatures of some attacks are known, whereas other
attacks only show up as some deviation from normal patterns.
Two main methodologies have been devised to detect
intruders, and many intrusion detection systems make use of one
or both of them. They are:
Anomaly Detection
Misuse Detection
A. Anomaly Detection
Anomaly detection assumes that an intrusion will always
produce some deviation from normal patterns. It refers
to detecting patterns in a given data set that do not conform to
an established normal behavior. The patterns thus detected are
called anomalies and frequently translate to significant and
actionable information in several application domains. In
practice, features of a user's typical behavior are stored in a
database, and the user's present behavior is then compared with
the stored profile. The benefits of anomaly detection lie in
its independence from the specifics of the system, its strong
adaptability, and its ability to detect attacks that have never
been seen before. Anomaly-based intrusion detection is about
discriminating between malicious and legitimate patterns of
activity. Anomaly detection can be divided into static and
dynamic anomaly detection. Typically, static detectors only
address the software portion of a system and are based on the
assumption that the hardware need not be checked.
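The profile comparison described above can be sketched as follows. This is a minimal illustration, not a method taken from the paper: the feature (logins per hour), the sample values and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev

def build_profile(history):
    """Summarise past behaviour as (mean, sample standard deviation)."""
    return mean(history), stdev(history)

def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # A perfectly constant history: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical feature: logins per hour previously observed for one user.
logins_per_hour = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(logins_per_hour)

print(is_anomalous(5, profile))   # typical behaviour
print(is_anomalous(40, profile))  # far outside the stored profile
```

A real detector would track many features per user and would usually need a more robust model than a single mean and standard deviation.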
B. Misuse Detection
Misuse detection refers to techniques that use patterns of
known intrusions. It is based on knowledge of system
vulnerabilities. Ideally, a system security administrator
should be aware of all the known vulnerabilities and
eliminate them. The earliest misuse detection systems used
rules to describe events indicative of intrusive actions that a
security administrator looked for within the system.
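A rule-based misuse detector of this kind can be sketched as a table of patterns matched against each event. The two signatures below are hypothetical and written only for illustration; they are not taken from the paper or from any real rule set.

```python
import re

# Hypothetical signatures: names and patterns are invented for this example.
SIGNATURES = {
    "path_traversal": re.compile(r"\.\./"),
    "sql_injection": re.compile(r"\bunion\s+select\b", re.IGNORECASE),
}

def match_signatures(event):
    """Return the sorted names of all signatures found in the event text."""
    return sorted(name for name, pattern in SIGNATURES.items()
                  if pattern.search(event))

print(match_signatures("GET /index.html"))
print(match_signatures("GET /../../etc/passwd"))
```

The limitation noted in the text is visible here: an attack that matches none of the stored patterns passes unnoticed, which is why signatures must be kept up to date.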
IV. DATA MINING FOR IDS?
By definition, data mining is the process of extracting patterns
from data. It is used in a wide range of profiling practices. A
crucial reason for using data mining in intrusion detection is to
support the analysis of collections of observations of behavior.
Data mining technology is well suited to this task because:
It can process huge amounts of data
It can discover hidden and previously unseen information
Data Mining Tasks are:
Classification
Clustering
Regression
Association rule
A. Classification
It is the task of generalizing known structure to apply to
new data. For example, predicting tumor cells as benign or
malignant, or classifying credit card transactions as legitimate
or fraudulent. Common algorithms include decision tree learning,
nearest neighbor, naive Bayesian classification, neural
networks and support vector machines.
B. Clustering
It is the task of discovering groups and structures in the
data that are in some way or another “similar”, without using
known structures in the data.
C. Regression
It attempts to find a function which models the data with the
least error.
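For the simplest case, a straight line fitted by ordinary least squares, the idea can be sketched as below. The choice of a linear model and the sample points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative points lying exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```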
D. Association Rule Learning
It searches for relationships between variables. For example a
supermarket might gather data on customer purchasing habits.
Using association rule learning, the supermarket can determine
which products are frequently bought together and use this
information for marketing purposes.
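Two standard measures used to rank such rules are support (how often an itemset occurs) and confidence (how often the rule holds when its left-hand side occurs). A minimal sketch, with shopping baskets invented for illustration:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(baskets, {"bread", "butter"}))       # 2 of 4 baskets
print(confidence(baskets, {"bread"}, {"butter"}))  # 2 of the 3 bread baskets
```

In intrusion detection the same measures apply with audit records in place of baskets and record attributes in place of products.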
V. CLASSIFICATION TECHNIQUES
A. Decision Tree Algorithm
A decision tree performs the classification of a given data
sample through various levels of decisions to help reach a final
decision. Such a sequence of decisions is represented in a tree
structure. The tree structure is used in classifying unknown
data records. A decision tree with a range of discrete
(symbolic) class labels is called a classification tree, whereas a
decision tree with a range of continuous (numeric) values is
called a regression tree. CART (Classification and Regression
Trees) is a well-known program used in the design of
decision trees. Common decision tree algorithms include ID3,
C4.5 and CART.
a) ID3 Algorithm
The ID3 (Iterative Dichotomiser 3) decision tree algorithm was
introduced in 1986 by J. Ross Quinlan (Quinlan, 1986 and 1987).
It is based on Hunt’s algorithm and is serially implemented.
Like other decision tree algorithms, the tree is constructed in
two phases: tree growth and tree pruning. Data is sorted at
every node during the tree building phase in order to select the
single best splitting attribute (Shafer et al, 1996). ID3 uses
the information gain measure in choosing the splitting attribute.
It only accepts categorical attributes in building a tree model
(Quinlan, 1986 and 1987). ID3 does not give accurate results
when there is too much noise or detail in the training data set,
so intensive pre-processing of the data is carried out before
building a decision tree model with ID3.
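The information gain measure mentioned above is the reduction in class entropy obtained by splitting on an attribute. The sketch below computes it for a toy data set; the attribute names (`protocol`, `flag`) and the four records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the labels minus the weighted entropy after the split."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical connection records with categorical attributes.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "S0"},
]
labels = ["attack", "normal", "normal", "attack"]

print(information_gain(rows, labels, "flag"))      # separates the classes fully
print(information_gain(rows, labels, "protocol"))  # tells us nothing here
```

A tree builder in this style would split on `flag`, the attribute with the larger gain, and then recurse on each partition.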
b) C4.5 Algorithm
The C4.5 algorithm is an improvement on the ID3 algorithm,
developed by J. Ross Quinlan (1993). It is based on Hunt’s
algorithm and, like ID3, is serially implemented.
Pruning takes place in C4.5 by replacing an internal node with
a leaf node, thereby reducing the error rate (Podgorelec et al,
2002). Unlike ID3, C4.5 accepts both continuous and
categorical attributes in building the decision tree. It has an
enhanced method of tree pruning that reduces misclassification
errors due to noise or too much detail in the training data set.
Like ID3, the data is sorted at every node of the tree in order
to determine the best splitting attribute. It uses the gain ratio
impurity method to evaluate the splitting attribute (Quinlan,
1993).
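Gain ratio divides information gain by the entropy of the split itself, which penalises attributes that fragment the data into many small partitions. A minimal sketch for categorical attributes only, on hypothetical records in which `id` is unique per record:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain_ratio(rows, labels, attribute):
    """Information gain divided by the split information of the attribute."""
    total = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    weights = [len(p) / total for p in partitions.values()]
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    split_info = -sum(w * log2(w) for w in weights)
    # An attribute with a single value carries no split information.
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical records: "id" is unique per record, "flag" has two values.
rows = [
    {"id": 1, "flag": "S0"},
    {"id": 2, "flag": "SF"},
    {"id": 3, "flag": "SF"},
    {"id": 4, "flag": "S0"},
]
labels = ["attack", "normal", "normal", "attack"]

print(gain_ratio(rows, labels, "flag"))
print(gain_ratio(rows, labels, "id"))
```

Both attributes have the same information gain on this data, but the gain ratio of `id` is halved by its split information, so `flag` is preferred.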
c) CART Algorithm
CART (Classification and Regression Trees) was introduced by
Breiman (1984). It builds both classification and regression
trees. Classification tree construction in CART is based on
binary splitting of the attributes. It is also based on Hunt’s
model of decision tree construction and can be implemented
serially (Breiman, 1984). It uses the Gini index splitting
measure in selecting the splitting attribute. Pruning is done in
CART by using a portion of the training data set (Podgorelec et
al, 2002). CART uses both numeric and categorical attributes for
building the decision tree and has built-in features that deal
with missing attributes (Lewis, 200). CART is unique among
Hunt’s-based algorithms in that it is also used for regression
analysis with the help of regression trees. The regression
analysis feature is used in forecasting a dependent variable
(result) given a set of predictor variables over a given period
of time (Breiman, 1984). It uses many single-variable splitting
criteria such as the Gini index and symgini, and one
multi-variable criterion (linear combinations), in determining
the best split point, and data is sorted at every node to
determine the best splitting point. The linear combination
splitting criterion is used during regression analysis.
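The Gini-based binary split on a numeric attribute can be sketched as follows. Candidate thresholds are taken midway between consecutive sorted values, which is one common convention and an assumption here; the durations and labels are invented.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) of the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))  # data is sorted before scanning
    total = len(pairs)
    best = (None, float("inf"))
    for i in range(1, total):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = len(left) / total * gini(left) + len(right) / total * gini(right)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical connection durations (seconds) and their classes.
durations = [1, 2, 3, 10, 12, 15]
classes = ["normal", "normal", "normal", "attack", "attack", "attack"]

print(gini(classes))                          # 0.5 for two balanced classes
print(best_binary_split(durations, classes))  # (6.5, 0.0): a pure split
```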
B. Support Vector Machines (SVM)
An SVM first maps the input vector into a higher-dimensional
feature space and then obtains the optimal separating
hyperplane in that feature space. An SVM classifier is designed
for binary classification, that is, to separate a set of
training vectors which belong to two different classes. The
generalization in this approach usually depends on the
geometrical characteristics of the given training data, and not
on the specifications of the input space. This procedure
transforms the training data into a feature space of very high
dimension.
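The effect of mapping into a higher-dimensional feature space can be seen on a one-dimensional toy problem. The sketch below is not an SVM solver: the feature map x → (x, x²), the sample points, and the hyperplane (placed by hand midway between the two classes in feature space) are all assumptions made for illustration.

```python
def feature_map(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return (x, x * x)

def decide(x, w=(0.0, 1.0), b=-2.125):
    """Sign of w . phi(x) + b for a hand-placed hyperplane in feature space."""
    z = feature_map(x)
    score = w[0] * z[0] + w[1] * z[1] + b
    return 1 if score >= 0 else -1

# Class +1 lies on both sides of class -1, so no single threshold on x
# separates them; in the mapped space the line x^2 = 2.125 does.
samples = [(-3.0, 1), (-2.0, 1), (2.0, 1), (3.0, 1),
           (-0.5, -1), (0.0, -1), (0.5, -1)]

for x, label in samples:
    print(x, label, decide(x))
```

A trained SVM would choose the hyperplane by maximising the margin to the nearest training vectors, and would typically use a kernel function rather than an explicit feature map.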
C. Fuzzy Logic
In intrusion detection, fuzzy logic processes the input data
from the network and describes measures that are significant
for anomaly detection. Fuzzy logic is a form of many-valued
logic. It deals with reasoning that is approximate rather than
fixed and exact. In contrast with traditional logic theory,
where binary sets have two-valued logic (true or false), fuzzy
logic variables may have a truth value that ranges in degree
between 0 and 1. Fuzzy algorithms have been successfully applied
to a variety of industrial applications, including automobiles,
autonomous vehicles, chemical processes, and robotics. A fuzzy
system comprises a group of linguistic statements based on
expert knowledge. This knowledge is usually in the form of
if-then rules. A case or an object can be classified by applying
a set of fuzzy logic rules based on the linguistic values of its
attributes.
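A single if-then rule of this kind can be sketched as below. The linguistic term HIGH, its breakpoints, the two input features and the use of `min` for the fuzzy AND are assumptions chosen for illustration, not values from the paper.

```python
def rising(x, low, high):
    """Membership in a linguistic term such as HIGH:
    0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def rule_suspicious(failed_logins, connections_per_sec):
    """IF failed_logins is HIGH AND connection rate is HIGH THEN suspicious.
    The fuzzy AND is taken as the minimum of the two memberships."""
    high_failures = rising(failed_logins, 2, 10)
    high_rate = rising(connections_per_sec, 50, 200)
    return min(high_failures, high_rate)

print(rule_suspicious(1, 300))   # 0.0: the failure count is not high at all
print(rule_suspicious(6, 125))   # 0.5: both conditions are half satisfied
print(rule_suspicious(10, 200))  # 1.0: both conditions fully satisfied
```

The output is a degree of truth between 0 and 1 rather than a yes/no verdict, which is the contrast with two-valued logic drawn in the text.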
D. Naive Bayes
A naive Bayes classifier is a simple
probabilistic classifier based on applying Bayes' theorem with
strong (naive) independence assumptions.
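Under the independence assumption, the score of a class is its prior probability multiplied by the per-attribute conditional probabilities. A minimal sketch for categorical attributes follows; the add-one (Laplace) smoothing, the attribute names and the four training records are assumptions added for the example.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count classes and, per (class, attribute), the observed values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> value counts
    values_seen = defaultdict(set)       # attribute -> distinct values
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(label, attribute)][value] += 1
            values_seen[attribute].add(value)
    return class_counts, value_counts, values_seen

def predict(model, row):
    """Return the class with the highest prior-times-likelihood score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for attribute, value in row.items():
            # P(value | class) with add-one smoothing.
            numerator = value_counts[(label, attribute)][value] + 1
            denominator = count + len(values_seen[attribute])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training records.
rows = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "tcp", "flag": "SF"},
]
labels = ["attack", "attack", "normal", "normal"]
model = train(rows, labels)

print(predict(model, {"protocol": "tcp", "flag": "S0"}))
print(predict(model, {"protocol": "udp", "flag": "SF"}))
```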
E. Range Of Suitability
Sophia et al. compared the ability of three classification
techniques (k-means classifiers, neural networks and support
vector machines) to perform in network intrusion detection
applications. The results indicate that support vector machines
train in the shortest amount of time with acceptable
accuracy, whilst neural networks exhibit high accuracy at the
cost of long training times.
Phurivit Sangkatsanee et al. proposed a real-time intrusion
detection system (RT-IDS) using a decision tree approach with
efficient data preprocessing that uses only 13 features. They
evaluated RT-IDS performance, including detection rate and CPU
and memory consumption, in a real-time environment. In their
experimental results, RT-IDS achieved both a total detection
rate (TDR) and a normal detection rate (NTR) higher than 99%,
with a very low false alarm rate. They therefore concluded that
the decision tree classification algorithm is a suitable
approach for real-time intrusion detection.
Wenke Lee et al. suggested that association rules and frequent
episodes algorithms can be used to compute consistent patterns
from audit data. These frequent patterns form an abstract
summary of an audit trail, and can therefore be used to guide
the audit data gathering process, help with feature selection
and discover patterns of intrusions.
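Detection rate and false alarm rate, as quoted in the comparisons above, are usually derived from the counts of correctly and incorrectly classified records. The definitions below are the common textbook ones and are an assumption here; the cited papers may define their rates differently, and the counts are invented.

```python
def detection_rate(true_positives, false_negatives):
    """Fraction of actual attacks that were flagged as attacks."""
    return true_positives / (true_positives + false_negatives)

def false_alarm_rate(false_positives, true_negatives):
    """Fraction of normal records that were wrongly flagged as attacks."""
    return false_positives / (false_positives + true_negatives)

# Hypothetical evaluation on 100 attack records and 100 normal records.
print(detection_rate(true_positives=95, false_negatives=5))
print(false_alarm_rate(false_positives=2, true_negatives=98))
```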
VI. CLUSTERING TECHNIQUES
Clustering is the process of grouping a set of physical or
abstract objects into classes of similar objects. It is
commonly applied in statistical data analysis and is used in
many fields, for instance machine learning, data mining,
pattern recognition, image analysis and bioinformatics
(Yusufovna, 2008). Clustering is useful in intrusion detection
because malicious activity should cluster together, separating
itself from non-malicious activity. The major clustering
methods can be classified into the following categories:
partitioning, hierarchical, density-based, grid-based and
model-based methods, each of which has its own way of obtaining
cluster membership and representation. Clustering is an
effective way to find hidden patterns in data that humans might
otherwise miss. A significant advantage over classification
techniques is that it does not require a labelled data set for
training.
Generally, pattern clustering involves the following steps:
(i) Representation of the pattern
(ii) Definition of pattern proximity
(iii) Clustering
A. K-Means Clustering
K-means is a classical clustering algorithm.
After an initial random assignment of examples to K clusters,
the centres of the clusters are computed and the examples are
assigned to the clusters with the closest centres. The process
is repeated until the cluster centres no longer change
significantly. Once the cluster assignment is fixed, the mean
distance of an example to the cluster centres is used as the
score. Using the K-means clustering algorithm, different
clusters were specified and generated for each output class.
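The loop described above can be sketched as follows. The Euclidean distance, the fixed random seed, the tolerance, and the re-seeding of an empty cluster with a random example are assumptions added to make the sketch runnable; the sample points are invented.

```python
import random
from math import dist

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, seed=0, tol=1e-9, max_iter=100):
    rng = random.Random(seed)
    # Initial random assignment of each example to one of the K clusters.
    assignment = [rng.randrange(k) for _ in points]
    centres = []
    for _ in range(max_iter):
        previous = centres
        centres = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            # Assumption: an empty cluster is re-seeded with a random example.
            centres.append(centroid(members) if members else rng.choice(points))
        # Reassign every example to the cluster with the closest centre.
        assignment = [min(range(k), key=lambda j: dist(p, centres[j]))
                      for p in points]
        # Stop once no centre has moved by more than the tolerance.
        if previous and all(dist(a, b) <= tol for a, b in zip(previous, centres)):
            break
    return centres, assignment

# Two well-separated groups of hypothetical 2-D feature vectors.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centres, assignment = kmeans(points, k=2)
print(centres)
print(assignment)
```

Which group receives cluster index 0 depends on the random initial assignment, so only the grouping, not the numbering, is meaningful.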
B. Fuzzy C-Means Algorithm (FCM)
The fuzzy c-means algorithm is one of the well-known
clustering algorithms. It partitions the sample data for each
explanatory (input) variable into a number of clusters. These
clusters have “fuzzy” boundaries, in the sense that each data
value belongs to each cluster to some degree or other.
Membership is not certain, or “crisp”. Having decided upon the
number of such clusters to be used, some procedure is then
needed to locate their centres (or more generally, midpoints)
and to determine the associated membership functions and the
degree of membership for the data points. The advantage of
using fuzzy logic is that it allows one to represent concepts
where objects can fall into more than one category (or, from
another point of view, it allows representation of overlapping
categories).
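The degree of membership can be sketched with the standard fuzzy c-means membership formula, in which a value's membership in a cluster falls as its distance to that cluster's centre grows relative to its distances to the other centres. The one-dimensional data, the two centres and the fuzzifier m = 2 are assumptions for illustration, and the centres are assumed to be distinct.

```python
def memberships(value, centres, m=2.0):
    """Degree of membership of `value` in each cluster (the degrees sum to 1).
    Assumes distinct centres and a fuzzifier m greater than 1."""
    distances = [abs(value - c) for c in centres]
    if 0.0 in distances:
        # The value sits exactly on a centre: membership is crisp.
        return [1.0 if d == 0.0 else 0.0 for d in distances]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in distances)
            for d_i in distances]

centres = [0.0, 10.0]
print(memberships(2.5, centres))   # mostly the first cluster
print(memberships(5.0, centres))   # equally shared between both
print(memberships(10.0, centres))  # crisp membership in the second
```

A full fuzzy c-means run would alternate this step with recomputing each centre as a membership-weighted mean of the data until the centres stabilise.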
C. Range Of Suitability
Kingsly Leung et al. (2005) evaluated a new approach to
unsupervised anomaly detection in the application of network
intrusion detection. The new approach, fpMAFIA, is a
density-based and grid-based high-dimensional clustering
algorithm for large data sets. It has the advantage that it can
produce clusters of arbitrary shape and cover over 95% of the
data set with appropriate parameter values.
Yu Guan, Ali A. Ghorbani et al. (2003) presented a clustering
heuristic for intrusion detection, called Y-means.
VII. CONCLUSION
Data mining is a modern technique for network intrusion
detection. Ready-made data mining algorithms are available,
and large amounts of data can be handled with data mining
technology. In this paper, we summarized the results of
various studies on the suitability of data mining techniques
for intrusion detection.
REFERENCES
[1]. Amanpreet Chauhan, Gaurav Mishra, Gulshan Kumar, International
Journal of Scientific & Engineering Research, Volume 2, Issue 7, July 2011,
ISSN 2229-5518.
[2]. Adnan Mohsin Abdulazeez Brifcani and Adel Sabry Issa,
“Intrusion Detection and Attack Classifier Based on Three Techniques: A
Comparative Study”, 2011.
[3]. Kingsly Leung and Christopher Leckie, “Unsupervised Anomaly
Detection in Network Intrusion Detection Using Clusters”.
[4]. Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur,
Jaideep Srivastava, “A Comparative Study of Anomaly Detection
Schemes in Network Intrusion Detection”.
[5]. Yu Guan, Ali A. Ghorbani and Nabil Belacel (Institute for
Information Technology), “Y-Means: A Clustering Method
for Intrusion Detection”.
[6]. Safaa O. Al-Mamory and Hongli Zhang, “New data mining technique to
enhance IDS alarms quality”.
[7]. S. Manganaris, M. Christensen, D. Zerkle, and K. Hermiz, “A data
mining analysis of RTID alarms”, J. Comput. Netw. 34, 571–577 (2000).
[8]. E. Bloedorn, et al., “Data Mining for Network Intrusion Detection: How
to Get Started”, MITRE Technical Report, August 2001.
[9]. S. Manganaris, M. Christensen, D. Zerkle, and K. Hermiz, “A Data
Mining Analysis of RTID Alarms”.
[10]. Phurivit Sangkatsanee, Naruemon Wattanapongsakorn, and
Chalermpol Charnsripinyo (Computer Engineering Department, King
Mongkut’s University of Technology Thonburi), “Real-time Intrusion
Detection and Classification”.
[11]. Sophia Kaplantzis and Nallasamy Mani, “A Study on
Classification Techniques for Network Intrusion
Detection”.
[12]. Mrutyunjaya Panda and Manas Ranjan Patra, “Some Clustering
Algorithms to Enhance the Performance of the
Network Intrusion Detection System”.
[13]. Qiang Wang and Vasileios Megalooikonomou, “A Clustering
Algorithm for Intrusion Detection”.
[14]. W. Lee and S. Stolfo, “Data mining approaches for intrusion
detection”, in Proceedings of the 7th USENIX Security Symposium, San
Antonio, TX, 1998.
[15]. P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, and P.
N. Tan, “Data mining for network intrusion detection”, in Proceedings of the
NSF Workshop on Next Generation Data Mining, Nov. 2002.

Mais conteúdo relacionado

Semelhante a V1_I1_2012_Paper3.docx

Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
eSAT Journals
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
eSAT Journals
 
New Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection SystemNew Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection System
ijsrd.com
 
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
IJARIIE JOURNAL
 

Semelhante a V1_I1_2012_Paper3.docx (20)

International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...International Journal of Computer Science, Engineering and Information Techno...
International Journal of Computer Science, Engineering and Information Techno...
 
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAIN
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAINSURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAIN
SURVEY OF NETWORK ANOMALY DETECTION USING MARKOV CHAIN
 
Survey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chainSurvey of network anomaly detection using markov chain
Survey of network anomaly detection using markov chain
 
A Survey: Comparative Analysis of Classifier Algorithms for DOS Attack Detection
A Survey: Comparative Analysis of Classifier Algorithms for DOS Attack DetectionA Survey: Comparative Analysis of Classifier Algorithms for DOS Attack Detection
A Survey: Comparative Analysis of Classifier Algorithms for DOS Attack Detection
 
C3602021025
C3602021025C3602021025
C3602021025
 
Critical analysis of genetic algorithm based IDS and an approach for detecti...
Critical analysis of genetic algorithm based IDS and an approach  for detecti...Critical analysis of genetic algorithm based IDS and an approach  for detecti...
Critical analysis of genetic algorithm based IDS and an approach for detecti...
 
Supervised Machine Learning Algorithms for Intrusion Detection.pptx
Supervised Machine Learning Algorithms for Intrusion Detection.pptxSupervised Machine Learning Algorithms for Intrusion Detection.pptx
Supervised Machine Learning Algorithms for Intrusion Detection.pptx
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
 
Intrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern miningIntrusion detection and anomaly detection system using sequential pattern mining
Intrusion detection and anomaly detection system using sequential pattern mining
 
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMSAN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
 
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMSAN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
AN IMPROVED METHOD TO DETECT INTRUSION USING MACHINE LEARNING ALGORITHMS
 
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
A Study and Comparative analysis of Conditional Random Fields for Intrusion d...
 
Machine learning in network security using knime analytics
Machine learning in network security using knime analyticsMachine learning in network security using knime analytics
Machine learning in network security using knime analytics
 
Articles - International Journal of Network Security & Its Applications (IJNSA)
Articles - International Journal of Network Security & Its Applications (IJNSA)Articles - International Journal of Network Security & Its Applications (IJNSA)
Articles - International Journal of Network Security & Its Applications (IJNSA)
 
MACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICS
MACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICSMACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICS
MACHINE LEARNING IN NETWORK SECURITY USING KNIME ANALYTICS
 
New Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection SystemNew Fuzzy Logic Based Intrusion Detection System
New Fuzzy Logic Based Intrusion Detection System
 
A Survey On Genetic Algorithm For Intrusion Detection System
A Survey On Genetic Algorithm For Intrusion Detection SystemA Survey On Genetic Algorithm For Intrusion Detection System
A Survey On Genetic Algorithm For Intrusion Detection System
 
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
Synthesis of Polyurethane Solution (Castor oil based polyol for polyurethane)
 
IRJET - A Secure Approach for Intruder Detection using Backtracking
IRJET -  	  A Secure Approach for Intruder Detection using BacktrackingIRJET -  	  A Secure Approach for Intruder Detection using Backtracking
IRJET - A Secure Approach for Intruder Detection using Backtracking
 
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
Comparative Study on Machine Learning Algorithms for Network Intrusion Detect...
 

Mais de praveena06

V1_I2_2012_Paper2.docx
V1_I2_2012_Paper2.docxV1_I2_2012_Paper2.docx
V1_I2_2012_Paper2.docx
praveena06
 
V1_I1_2012_Paper4.doc
V1_I1_2012_Paper4.docV1_I1_2012_Paper4.doc
V1_I1_2012_Paper4.doc
praveena06
 

Mais de praveena06 (14)

Methods to Maximize the well being and Vitality of Moribund Communities
Methods to Maximize the well being and Vitality of Moribund CommunitiesMethods to Maximize the well being and Vitality of Moribund Communities
Methods to Maximize the well being and Vitality of Moribund Communities
 
Identification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning MethodIdentification of Differentially Expressed Genes by unsupervised Learning Method
Identification of Differentially Expressed Genes by unsupervised Learning Method
 
V1_I2_2012_Paper7.docx
V1_I2_2012_Paper7.docxV1_I2_2012_Paper7.docx
V1_I2_2012_Paper7.docx
 
V1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.docV1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.doc
 
V1_I2_2012_Paper5.docx
V1_I2_2012_Paper5.docxV1_I2_2012_Paper5.docx
V1_I2_2012_Paper5.docx
 
V1_I2_2012_Paper4.doc
V1_I2_2012_Paper4.docV1_I2_2012_Paper4.doc
V1_I2_2012_Paper4.doc
 
V1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docV1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.doc
 
V1_I2_2012_Paper2.docx
V1_I2_2012_Paper2.docxV1_I2_2012_Paper2.docx
V1_I2_2012_Paper2.docx
 
V1_I2_2012_Paper1.doc
V1_I2_2012_Paper1.docV1_I2_2012_Paper1.doc
V1_I2_2012_Paper1.doc
 
V1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.docV1_I2_2012_Paper6.doc
V1_I2_2012_Paper6.doc
 
V1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.docV1_I1_2012_Paper5.doc
V1_I1_2012_Paper5.doc
 
V1_I1_2012_Paper4.doc
V1_I1_2012_Paper4.docV1_I1_2012_Paper4.doc
V1_I1_2012_Paper4.doc
 
V1_I1_2012_Paper2.docx
V1_I1_2012_Paper2.docxV1_I1_2012_Paper2.docx
V1_I1_2012_Paper2.docx
 
V1_I1_2012_Paper1.doc
V1_I1_2012_Paper1.docV1_I1_2012_Paper1.doc
V1_I1_2012_Paper1.doc
 

Último

Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
chumtiyababu
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
MayuraD1
 

Último (20)

Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
Online electricity billing project report..pdf

V1_I1_2012_Paper3.docx

Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 01 Issue: 01 June 2015 Page No. 9-12
ISSN: 2278-2419
Study on Data Mining Suitability for Intrusion Detection System (IDS)
S.Vijaya Peterraj, S.Pauline Precilla Mary
Department of MCA, Loyola College, Chennai
Email: vijayapeterraj@gmail.com, pricy2277@gmail.com
Abstract- An Intrusion Detection System (IDS) is used to discover attacks against computers and network infrastructures. Many techniques are used for intrusion detection, such as outlier detection schemes for anomaly detection, K-means clustering of monitoring data, and classification-based detection. Data mining approaches help to determine what qualifies as an intrusion versus normal traffic, whether a system uses anomaly detection, misuse detection, target monitoring, or stealth probes. This paper evaluates, categorizes, compares and summarizes the performance of data mining techniques for detecting intrusions.
Key Words- Data mining techniques, IDS techniques and computer network security.
I. INTRODUCTION
An Intrusion Detection System (IDS) is one of the essential tools of computer security. IDSs detect possible intrusions and computer attacks and alert the proper individuals upon detection. They monitor, detect and respond to unauthorized activity; once an event is detected, an alert is raised. An IDS has been described as “the logical complement to network firewalls”. IDSs are a vital and necessary element of a complete information security infrastructure. Put simply, intrusion detection is the process of monitoring computers or networks for unauthorized entry, activity, or file modification; an IDS can also be used to monitor network traffic.
There are two basic types of IDS, host-based and network-based, and each takes a different approach to monitoring and securing data. A main rationale for using data mining in intrusion detection is to assist in the analysis of collections of observations of behavior.
II. WHAT IS IDS?
An intrusion is the illegal act of entering, seizing, or taking possession of another's computer system. It may take the form of code that disrupts the proper flow of traffic on the network or steals information from that traffic. An intrusion can also be defined as a violation of the security policy of the system. An IDS raises an alarm when a possible intrusion happens. Many IDS tools are based on signatures of known attacks. When the network is small and signatures are kept up to date, this human-analyst approach to intrusion detection works well. In summary, intrusion detection is the process of monitoring and analyzing the events taking place in a computer system in order to detect signs of security problems.
III. INTRUSION DETECTION METHODOLOGIES
The signatures of some attacks are known, whereas other attacks only show some deviation from normal patterns. Two main methodologies have been devised to detect intruders, and many intrusion detection systems make use of one or both of them. They are anomaly detection and misuse detection.
A. Anomaly Detection
Anomaly detection assumes that an intrusion will always show some deviation from normal patterns. It refers to detecting patterns in a given data set that do not conform to a recognized normal behavior. The patterns thus detected are called anomalies and frequently translate to significant and actionable information in several application domains. In practice, features of a user's typical behavior are stored in a database, and the user's present behavior is then compared with the stored behavior. The benefit of anomaly detection lies in its independence from the particular system, its strong adaptability and its ability to detect attacks that have never been seen before.
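The profile-comparison idea described above (store features of a user's typical behavior, then compare present behavior against them) can be sketched as follows. This is only a minimal illustration and not the method of any cited system: the feature (failed logins per hour), the sample values, the z-score test and the threshold of 3 standard deviations are all assumptions made for the example.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarise typical behaviour as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # With no variation in the history, any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical feature: failed logins per hour observed for one user.
history = [2, 3, 1, 2, 4, 3, 2, 3]
profile = build_profile(history)

typical = is_anomalous(3, profile)    # close to the stored profile
unusual = is_anomalous(40, profile)   # far outside the stored profile
print(typical, unusual)
```

A real detector would track many features per user and update the profile over time; the single-feature version only shows the store-then-compare step.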
Anomaly-based intrusion detection is about discriminating between malicious and legitimate patterns of activity. Anomaly detection can be divided into static and dynamic anomaly detection. Typically, static detectors only address the software portion of a system and are based on the assumption that the hardware need not be checked.
B. Misuse Detection
Misuse detection refers to techniques that use patterns of known intrusions. It is based on knowledge of system vulnerabilities. Ideally, a system security administrator should be aware of all the known vulnerabilities and eliminate them. The original misuse detection systems used rules to describe events indicative of intrusive actions that a security administrator looked for within the system.
IV. DATA MINING FOR IDS?
Data mining is, by definition, the process of extracting patterns from data. It is used in a wide range of profiling practices. A crucial reason for using data mining in intrusion detection is to support the analysis of collections of observations of behavior. Data mining technology is well suited to this task because it can process huge amounts of data and it can uncover hidden and unseen information. The data mining tasks are classification, clustering, regression and association rule learning.
A. Classification
Classification is the task of generalizing known structure to apply to new data. Examples include predicting tumor cells as benign or malignant and classifying credit card transactions as legitimate or fraudulent. Common algorithms include decision tree learning, nearest neighbor, Naive Bayesian classification, neural networks and support vector machines.
B. Clustering
Clustering is the task of discovering groups and structures in the data that are in some way or another “similar”, without using known structures in the data.
C. Regression
Regression attempts to find a function which models the data with the least error.
D. Association Rule Learning
Association rule learning searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes.
V. CLASSIFICATION TECHNIQUES
A. Decision Tree Algorithm
A decision tree classifies a given data sample through successive levels of decisions that lead to a final decision. Such a sequence of decisions is represented in a tree structure, which is then used to classify unknown data records. A decision tree with a range of discrete (symbolic) class labels is called a classification tree, whereas a decision tree with a range of continuous (numeric) values is called a regression tree.
CART (Classification and Regression Trees) is a well-known program used in the design of decision trees. Decision trees make use of the ID3, C4.5 and CART algorithms.
a) ID3 Algorithm
The ID3 (Iterative Dichotomiser 3) decision tree algorithm was introduced in 1986 by Ross Quinlan (Quinlan, 1986 and 1987). It is based on Hunt’s algorithm and is serially implemented. As with other decision tree algorithms, the tree is constructed in two phases: tree growth and tree pruning. Data is sorted at every node during the tree building phase in order to select the best single splitting attribute (Shafer et al, 1996). ID3 uses the information gain measure to choose the splitting attribute. It only accepts categorical attributes when building a tree model (Quinlan, 1986 and 1987). ID3 does not give accurate results when there is too much noise or detail in the training data set, so intensive pre-processing of the data is carried out before building a decision tree model with ID3.
b) C4.5 Algorithm
The C4.5 algorithm is an improvement on ID3, also developed by Ross Quinlan (1993). It is based on Hunt’s algorithm and, like ID3, is serially implemented. Pruning takes place in C4.5 by replacing an internal node with a leaf node, thereby reducing the error rate (Podgorelec et al, 2002). Unlike ID3, C4.5 accepts both continuous and categorical attributes when building the decision tree. It has an enhanced method of tree pruning that reduces misclassification errors due to noise or too much detail in the training data set. As in ID3, the data is sorted at every node of the tree in order to determine the best splitting attribute. It uses the gain ratio impurity method to evaluate the splitting attribute (Quinlan, 1993).
c) CART Algorithm
CART (Classification and Regression Trees) was introduced by Breiman (1984). It builds both classification and regression trees. The classification tree construction by CART is based on binary splitting of the attributes.
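The splitting measures named for these algorithms can be made concrete with a short sketch: information gain (used by ID3), gain ratio (used by C4.5) and the Gini index (the impurity measure CART uses). The formulas are the standard textbook definitions; the toy attribute ("protocol") and the labels are hypothetical values invented for the illustration and do not come from any data set discussed in this paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def partition(values, labels):
    """Group the labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after the split."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the attribute itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical records: one categorical attribute and a class label each.
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp"]
label = ["attack", "attack", "normal", "normal", "attack", "normal"]

print(information_gain(protocol, label))
print(gain_ratio(protocol, label))
print(gini(label))
```

In this toy case the attribute separates the two classes perfectly, so the split removes all of the label entropy; on real audit data the measures are compared across attributes and the best-scoring attribute is chosen for the node.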
CART is also based on Hunt’s model of decision tree construction and can be implemented serially (Breiman, 1984). It uses the Gini index as the splitting measure for selecting the splitting attribute. Pruning is done in CART by using a portion of the training data set (Podgorelec et al, 2002). CART uses both numeric and categorical attributes for building the decision tree and has built-in features that deal with missing attributes (Lewis, 200). CART differs from other algorithms based on Hunt’s model in that it is also used for regression analysis with the help of regression trees. The regression analysis feature is used to forecast a dependent variable (result) given a set of predictor variables over a given period of time (Breiman, 1984). It uses many single-variable splitting criteria, such as the Gini index and symgini, and one multi-variable criterion (linear combinations) to determine the best split point, and data is sorted at every node to determine the best splitting point. The linear combination splitting criterion is used during regression analysis.
B. Support Vector Machines (SVM)
An SVM first maps the input vector into a higher-dimensional feature space and then obtains the optimal separating hyperplane in that space. An SVM classifier is designed for binary classification, that is, to separate a set of training vectors which belong to two different classes. The generalization in this approach usually depends on the geometrical characteristics of the given training data, and not on the specifications of the input space. This procedure transforms the training data into a feature space of a huge dimension.
C. Fuzzy Logic
Fuzzy logic processes the input data from the network and describes measures that are significant to anomaly detection. Fuzzy logic is a form of many-valued logic. It deals with reasoning that is approximate rather than fixed and exact. In contrast with traditional logic theory, where binary sets have two-valued logic (true or false), fuzzy logic variables may have a truth value that ranges in degree between 0 and 1. Fuzzy algorithms have been successfully applied to a variety of industrial applications, including automobiles, autonomous vehicles, chemical processes, and robotics. A fuzzy system comprises a group of linguistic statements based on expert knowledge. This knowledge is usually in the form of if-then rules. A case or an object can be distinguished by applying a set of fuzzy logic rules based on the linguistic values of its attributes.
D. Naive Bayes
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions.
E. Range of Suitability
Sophia Kaplantzis et al. compared the ability of three classification techniques (k-means classifiers, neural networks and support vector machines) to perform network intrusion detection. The results indicate that Support Vector Machines train in the shortest amount of time with an acceptable accuracy, whilst Neural Networks exhibit high accuracy at the cost of long training times. Phurivit Sangkatsanee et al. proposed a new real-time intrusion detection system (RT-IDS) using a decision tree approach with efficient data preprocessing consisting of only 13 features.
They evaluated the RT-IDS performance, including detection rate, CPU, and memory consumption, under a real-time environment. In their experiments, the RT-IDS offered both a total detection rate (TDR) and a normal detection rate (NTR) higher than 99%, with a very low false alarm rate. They therefore concluded that the decision tree classification algorithm is a suitable approach for real-time intrusion detection. Wenke Lee et al. suggested that association rules and frequent episodes algorithms can be used to compute consistent patterns from audit data. These frequent patterns form an abstract summary of an audit trail, and can therefore be used to guide the audit data gathering process, provide help for feature selection and discover patterns of intrusions.
VI. CLUSTERING TECHNIQUES
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects. It is usually applied in statistical data analysis and is used in many fields, for instance machine learning, data mining, pattern recognition, image analysis and bioinformatics (Yusufovna, 2008). Clustering is useful in intrusion detection because malicious activity should cluster together, separating itself from non-malicious activity. The major clustering methods can be classified into the following categories: partitioning, hierarchical, density-based, grid-based and model-based methods, each of which has its own way of obtaining cluster membership and representation. Clustering is an effective way to find hidden patterns in data that humans might otherwise miss. It also has a significant advantage over classification techniques: it does not require a labelled data set for training. Generally, the pattern clustering activity involves the following steps:
(i) Representation of the pattern
(ii) Definition of pattern proximity
(iii) Clustering
A. K-Means Clustering
K-means is a classical clustering algorithm. After an initial random assignment of examples to K clusters, the centres of the clusters are computed and the examples are assigned to the clusters with the closest centres. The process is repeated until the cluster centres do not change significantly. Once the cluster assignment is fixed, the mean distance of an example to the cluster centres is used as the score. Using the K-means clustering algorithm, different clusters were specified and generated for each output class.
B. Fuzzy C-Means Algorithm (FCM)
The fuzzy c-means algorithm is one of the well-known relational clustering algorithms. It partitions the sample data for each explanatory (input) variable into a number of clusters. These clusters have “fuzzy” boundaries, in the sense that each data value belongs to each cluster to some degree or other. Membership is not certain, or “crisp”. Having decided upon the number of such clusters to be used, some procedure is then needed to locate their centres (or more generally, midpoints) and to determine the associated membership functions and the degree of membership for the data points. The advantage of using fuzzy logic is that it allows one to represent concepts where objects can fall into more than one category (or, from another point of view, it allows representation of overlapping categories).
C. Range of Suitability
Kingsly Leung et al. (2005) evaluated a new approach to unsupervised anomaly detection in the application of network intrusion detection. The new approach, fpMAFIA, is a density-based and grid-based high-dimensional clustering algorithm for large data sets. It has the advantage that it can produce clusters of arbitrary shape and cover over 95% of the data set with appropriate parameter values. Yu Guan and Ali A. Ghorbani et al. (2003) presented a clustering heuristic for intrusion detection, called Y-means.
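The K-means procedure described in Section VI.A (random initial assignment, recomputation of centres, reassignment to the closest centre, repetition until the centres stop moving) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the two-dimensional sample points are hypothetical, the convergence tolerance and the re-seeding of an empty cluster with a random example are choices made for the illustration, and `math.dist` requires Python 3.8 or later.

```python
import random
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def k_means(points, k, max_iter=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    # Initial random assignment of each example to one of the k clusters.
    assign = [rng.randrange(k) for _ in points]
    centres = None
    for _ in range(max_iter):
        new_centres = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            # Assumption: an empty cluster is re-seeded with a random example.
            new_centres.append(centroid(members) if members else rng.choice(points))
        # Reassign every example to the cluster with the closest centre.
        assign = [min(range(k), key=lambda j: dist(p, new_centres[j])) for p in points]
        converged = centres is not None and all(
            dist(a, b) < tol for a, b in zip(centres, new_centres)
        )
        centres = new_centres
        if converged:
            break
    return centres, assign


def score(point, centres):
    """Mean distance of an example to the cluster centres, as described in the text."""
    return sum(dist(point, c) for c in centres) / len(centres)


# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centres, assign = k_means(points, k=2)
print(assign)
print(score((1.0, 1.0), centres))
```

Fuzzy c-means replaces the hard assignment step with a degree of membership in every cluster, so the same loop structure applies with weighted centre updates.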
VII. CONCLUSION
Data mining is a modern technique for network intrusion detection. Ready-made data mining algorithms are available, and large amounts of data can be handled with data mining technology. In this paper, we summarized the results of various research studies on the suitability of data mining techniques for intrusion detection.
REFERENCES
[1]. Amanpreet Chauhan, Gaurav Mishra, Gulshan Kumar, International Journal of Scientific & Engineering Research, Volume 2, Issue 7, July 2011, ISSN 2229-5518.
[2]. Adnan Mohsin Abdulazeez Brifcani and Adel Sabry Issa, “Intrusion Detection and Attack Classifier Based on Three Techniques: A Comparative Study”, 2011.
[3]. Kingsly Leung, Christopher Leckie, “Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters”.
[4]. Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, Jaideep Srivastava, “A Comparative Study of Anomaly Detection Schemes in Network Intrusion Detection”.
[5]. Yu Guan, Ali A. Ghorbani and Nabil Belacel (Institute for Information Technology), “Y-MEANS: A CLUSTERING METHOD FOR INTRUSION DETECTION”.
[6]. Safaa O. Al-Mamory, Hongli Zhang, “New data mining technique to enhance IDS alarms quality”.
[7]. S. Manganaris, M. Christensen, D. Zerkle, K. Hermiz, “A data mining analysis of RTID alarms”, J. Comput. Netw 34, 571–577 (2000).
[8]. E. Bloedorn, et al., “Data Mining for Network Intrusion Detection: How to Get Started”, MITRE Technical Report, August 2001.
[9]. S. Manganaris, M. Christensen, D. Zerkle, and K. Hermiz, “A Data Mining Analysis of RTID Alarms”.
[10]. Phurivit Sangkatsanee, Naruemon Wattanapongsakorn, and Chalermpol Charnsripinyo (Computer Engineering Department, King Mongkut’s University of Technology Thonburi), “Real-time Intrusion Detection and Classification”.
[11]. Sophia Kaplantzis, Nallasamy Mani, “A STUDY ON CLASSIFICATION TECHNIQUES FOR NETWORK INTRUSION DETECTION”.
[12]. Mrutyunjaya Panda, Manas Ranjan Patra, “SOME CLUSTERING ALGORITHMS TO ENHANCE THE PERFORMANCE OF THE NETWORK INTRUSION DETECTION SYSTEM”.
[13]. Qiang Wang, Vasileios Megalooikonomou, “A Clustering Algorithm for Intrusion Detection”.
[14]. W. Lee and S. Stolfo, “Data mining approaches for intrusion detection”, in Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, 1998.
[15]. P. Dokas, L. Ertoz, V. Kumar, A. Lazarevic, J. Srivastava, and P. N. Tan, “Data mining for network intrusion detection”, in Proceedings of the NSF Workshop on Next Generation Data Mining, Nov. 2002.