FLORIDA INSTITUTE OF TECHNOLOGY 
USING THE DATA MINING CLASSIFIER SELECTION MODEL 
TO IMPROVE NETWORK INTRUSION DETECTION 
DATA MINING FOR CYBERSECURITY 
FINAL PROJECT 
SUBMITTED TO: 
DR. LIAM MAYRON 
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR 
CYB 5675: DATA MINING 
BY 
CRAIG CANNON 
MELBOURNE, FLORIDA 
JUNE 15TH 2014
USING THE DATA MINING CLASSIFIER SELECTION MODEL 
TO IMPROVE NETWORK INTRUSION DETECTION 
INTRODUCTION: 
Massive amounts of data are being collected and stored in databases all over the world, a trend that has continued for decades. Databases now reach into the terabyte range and could approach the petabyte range within a few years. Hidden inside these enormous data warehouses is invaluable information and knowledge that can only be extracted with creative data mining algorithms. These algorithms allow us to extract "nuggets of knowledge" from large sets of data; in essence, they turn data into information. Although they were created for this purpose and have performed it quite well, over the years these algorithms have proven even more useful in the area of network intrusion detection. 
In the last few decades, cyber attacks have been on the rise worldwide. Security breaches have plagued governments, corporations, and individuals for some time and show no sign of slowing. To defend against these attacks, systems such as firewalls, endpoint security servers, intrusion prevention systems, and intrusion detection systems have been deployed across the global internet infrastructure. Of all these defenses, network intrusion detection systems that employ data mining techniques are proving to be the most promising, because they can analyze massive amounts of audit data while at the same time improving detection efficiency. 
In the past few years, researchers have experimented extensively with various algorithms for network intrusion detection systems, trying different approaches to improve both the effectiveness and the efficiency of NIDS. As a result, detection has improved tremendously and continues to be a major weapon in the cyber war arsenal. For this reason, choosing the best data mining classification method has become even more vital for future NID system designs. 
Many NID system designers have begun to take a closer look at the results of this research. One particularly promising line of work involves discovering algorithms that work distinctly better on certain types of attacks; findings in this area could help immensely in enhancing intrusion detection performance. A Network Intrusion Detection System (NIDS) plays the vital role of identifying various types of attacks; its main role is to secure the network and its information infrastructure. 
The main purpose of a NIDS is to detect intrusions among normal audit data, which is in essence a classification problem. In classification we are given a collection of records known as a training set, where each record has a set of attributes, one of which is called the class. The goal is to assign a class to previously unseen records. To do this, the data is divided into a training set and a test set: the training set is used to build the model and the test set is used to validate it. (Kumar, 2002) Below is a typical framework for a classification model, giving a big-picture overview of the experiment undertaken here with the KDD99 dataset. Two random samples of the data will be taken; one will be used as the training set and the other as the test set. See Figure 1 below. 
Figure 1 
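The framework above can be sketched in a few lines of Python. This is an illustrative, simplified version (not the WEKA implementation used in this project): labeled records are shuffled and randomly partitioned into a training set for building the model and a test set for validating it.

```python
import random

def split_train_test(records, train_fraction=0.7, seed=42):
    """Randomly partition labeled records into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 100 toy (attributes, class) records; the split keeps the two sets disjoint
records = [({"duration": i % 5}, "normal" if i % 3 else "attack") for i in range(100)]
train, test = split_train_test(records)
print(len(train), len(test))  # 70 30
```

Fixing the seed makes the split reproducible across runs, which matters when comparing classifiers against the same baseline.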
The purpose of this project is evaluate a wide-ranging set of classifier algorithms against 
the KDD dataset in order to determine if they can demonstrate better performance in a particular 
attack category. A comparison of each algorithm will be completed and their performance results 
will be matched to their specific attack category. An attempt to detect attacks in Probe, DOS, 
U2R, and R2L categories will be used as the baseline foundation of this project. Each of these 
attack categories involve distinctive attack types that and are used commonly by attackers when 
hacking a network. Once we have identified which algorithm best detects which attack category, 
we will propose a model for designing an ideal network intrusion detection systems for future 
usage. 
BACKGROUND: 
It is a well known fact that learning classifier models is a critical problem in data mining. Audit data from real-world networks is often presented as a set of records with multiple attributes, one of which is a special category attribute called the class. Initially, a training set of records with known class labels is extracted; the idea is to learn a model for the class in terms of the other attributes. Once this model has been identified, the objective is to use it to predict the class of any given set of records, comparing predicted and actual classes in order to optimize the system overall. 
Traditionally, the goal has been to reduce the number of misclassified records, thereby maximizing accuracy. Many techniques have been attempted in the past to build efficient and effective classifier models; three methods in particular are especially popular in data mining and are reviewed below.
The first method is the Transductive Confidence Machines for K-Nearest Neighbors 
(TCM-KNN) technique, a commonly used machine learning and data mining algorithm for intrusion detection that is also highly effective in fraud detection, pattern recognition, and outlier detection. Historically, TCM-KNN was not considered an algorithm fit for intrusion detection. Nevertheless, that attitude has changed immensely over the past decade as research has proven the algorithm's validity, and results have been outstanding: TCM-KNN achieves high detection rates and low false positives even when trained on a limited data set. For instance, an average true positive rate of 99.6% and a false positive rate of 0.1% was reported on a random sampling subset of the KDD dataset. (Nguyen, 2008) TCM-KNN computes confidence based upon a measure of randomness, or strangeness. Unlike conventional machine learning methods, transduction can offer measures of reliability for individual points, and it requires no assumptions beyond the i.i.d. assumption (that the training points and the new, unlabeled points are independently and identically distributed). This makes TCM-KNN an ideal method for pattern recognition, fraud detection, and outlier detection. (Yang, 2007) 
The next method is the PNrule technique, which was assessed on the KDD testing dataset and performed quite well in its detection efforts. Returning a true positive rate of over 96% against denial of service (DOS) attacks and a false positive rate of less than 10% overall, PNrule proved itself a competent player in the data mining space. (Nguyen, 2008) PNrule is a two-stage, general-to-specific framework based on finding rules that predict the presence of a target class (P-rules) and the absence of a target class (N-rules). The main idea is to learn a set of P-rules that together cover most of the positive examples while maintaining high support. Initially, highly accurate rules are selected, but later accuracy is compromised in favor of support; this lessening of accuracy causes some negative examples, or false positives, to be covered. All the true positives and false positives collectively supported by the union of the P-rules are then combined, and N-rules are learned on this reduced data set to remove the false positives. This two-phase approach is what makes PNrule different. PNrule also uses a mechanism that allows it to avoid overfitting while still maintaining flexibility. (Agarwal, 2000) 
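The two-phase idea can be made concrete with a tiny sketch. The rules below are hand-written lambdas for illustration only, with hypothetical feature names (real PNrule learns its rules from data): P-rules vote a record into the target class, and N-rules, learned on the records the P-rules cover, strip out the false positives.

```python
def pnrule_predict(record, p_rules, n_rules):
    # Stage 1: at least one P-rule must predict the presence of the target class
    if not any(rule(record) for rule in p_rules):
        return False
    # Stage 2: any firing N-rule predicts absence, removing a false positive
    if any(rule(record) for rule in n_rules):
        return False
    return True

# Toy "DOS" detector: high connection counts look like DOS (P-rule),
# unless the traffic is an ordinary short http burst (N-rule)
p_rules = [lambda r: r["count"] > 100]
n_rules = [lambda r: r["service"] == "http" and r["duration"] < 1]

print(pnrule_predict({"count": 500, "service": "smtp", "duration": 30}, p_rules, n_rules))  # True
print(pnrule_predict({"count": 500, "service": "http", "duration": 0}, p_rules, n_rules))   # False
print(pnrule_predict({"count": 3, "service": "smtp", "duration": 30}, p_rules, n_rules))    # False
```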
The final method is the Multiclass Support Vector Machine (SVM). The SVM is technically a binary classifier, based on the idea of constructing optimal hyper-planes to improve generalization ability. Because most real-world pattern recognition applications are multi-class problems, multi-class SVM algorithms are needed to solve them. The multi-class SVM is a decomposition-based method that divides a multi-class problem into multiple binary problems, constructs multiple two-class SVM classifiers, and combines their classification results. There are three common ways to build a multi-class SVM from binary SVMs: one-vs-all, one-vs-one, and error-correcting output coding. Among these decomposition approaches, the one-vs-all strategy has often been regarded as a simple method with somewhat lower accuracy than other multi-class SVMs; in practice, however, it is powerful and can produce results that are usually at least as accurate as other methods. (Xu, 2006) When SVMs were evaluated on the KDD99 dataset in the past, the results were promising: false positives were maintained at a very low level, averaging about 0.6%, while true positives were in the range of 75% for DOS attacks and 82% for probe attacks on a small dataset of 10,000 records. (Nguyen, 2008)
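The one-vs-all decomposition is easy to sketch. The binary base learner below is a trivial nearest-centroid stand-in for a two-class SVM, and the data is a one-dimensional toy set (both are my simplifications): one binary "class c vs rest" model is trained per class, and the class whose model claims the point with the largest margin wins.

```python
from statistics import mean

def train_one_vs_all(data, classes):
    # one binary "class c vs rest" model per class; a (positive centroid,
    # negative centroid) pair stands in for a trained two-class SVM
    models = {}
    for c in classes:
        pos = mean(x for x, y in data if y == c)
        neg = mean(x for x, y in data if y != c)
        models[c] = (pos, neg)
    return models

def predict(models, x):
    # margin toward "positive" under each binary model; largest margin wins
    def score(c):
        pos, neg = models[c]
        return abs(x - neg) - abs(x - pos)
    return max(models, key=score)

data = [(0, "probe"), (1, "probe"), (10, "dos"), (11, "dos"), (30, "r2l"), (31, "r2l")]
models = train_one_vs_all(data, {"probe", "dos", "r2l"})
print(predict(models, 10.4))  # dos
```

One-vs-one and error-correcting output codes differ only in which binary problems are posed and how the binary votes are combined.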
SYSTEM DESIGN AND IMPLEMENTATION: 
The system design and evaluation environment consists of three steps. First, the 
environment is set up with the required hardware and software, the data mining software is selected and installed, and the dataset itself is chosen and preprocessed. Next, five discrete classifier algorithms are chosen from a wide-ranging set of accepted classifiers. They were chosen because they represent a number of different industries, they are well-known conventional classifier algorithms, and they are already included in the WEKA software. The five classifiers chosen are: Naïve Bayes, J48 (a C4.5 revision), Decision Table, JRip (RIPPER), and SMO. Finally, the experiments are run with each classifier, the results are recorded, and the performances are compared. Based on the information obtained, an ideal model is suggested. 
All experimental iterations were done on the following hardware and software: a Dell 
Latitude e6410 with an Intel Core i5 2.53GHz processor and 4GB RAM, running the Microsoft Windows 7 operating system. The open source data mining and machine learning package WEKA, version 3.6 for Windows, was used. WEKA is a data mining workbench containing a large number of algorithms for classification as well as for data preprocessing, feature selection, clustering, and finding association rules. It is an all-inclusive workbench, written in Java, that runs on multiple platforms. (Witten, 2011) 
The experiment uses the KDD99 dataset which came about as a direct result of the 1998 
DARPA Intrusion Detection Evaluation Program whose sole objective was to survey and 
evaluate research in intrusion detection. This standard set of pre audited data includes a wide 
variety of intrusions simulated in a military network environment. Nine weeks of raw TCP dump 
data for a local-area network (LAN) simulating a typical U.S. Air Force LAN was interspersed 
with multiple types of attacks. The raw training data contained seven weeks of TCP dump 
network traffic resulting in about four million connection records. (Stolfo, 1999) 
A connection is a series of TCP packets starting and ending at distinct times, between which data flows from a source IP address to a destination IP address. Each connection was labeled as either normal or as an attack, and each could contain exactly one specific attack type. Four main categories of attacks were identified: DOS (denial of service), R2L (remote to local), U2R (user to root), and Probing. (Stolfo, 1999) These are the four categories of attacks used in this classifier experiment's dataset. The actual attack types in the dataset, as well as their categories, can be seen in Figure 2 below. 
Figure 2
As our test dataset, the KDD99 dataset contains one type of normal data and 22 different types of attacks. Each instance contains 41 features plus one final feature for classifying. (Nguyen, 2008) To make the experiment feasible, we filtered the original KDD99 dataset down to about 10% of the data, randomly chosen to represent a cross-section of attack types. The original data set had 494,020 records, and we reduced our dataset to 49,402 records. To do this, we loaded the original dataset into WEKA, navigated Choose > Filters > Unsupervised > Instance, and selected the RemovePercentage filter. We then double-clicked the filter name and entered 90% in the percentage box, leaving only 10% of the data remaining. Figures 3 and 4 below show the before and after results of applying the filter to the original dataset. This new dataset will be used as the training set. A second random sampling removes 97% of the data, leaving 3%, or about 14,821 instances, to be used as the test set. 
Figure 3 
Figure 4
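The sampling step above can be sketched as follows. Note this is a simplified analogue: WEKA's RemovePercentage filter actually removes a contiguous slice of instances, so the shuffle here folds in the random selection the project describes.

```python
import random

def remove_percentage(instances, percentage, seed=1):
    """Randomly keep only (100 - percentage)% of the instances."""
    rng = random.Random(seed)
    shuffled = instances[:]
    rng.shuffle(shuffled)
    keep = round(len(shuffled) * (100 - percentage) / 100)
    return shuffled[:keep]

full = list(range(494020))                  # stand-in for the 494,020 KDD99 records
train = remove_percentage(full, 90)         # 10% kept -> 49,402 records
test = remove_percentage(full, 97, seed=2)  # 3% kept  -> 14,821 records
print(len(train), len(test))  # 49402 14821
```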
After the filter was applied, the different attack types had to be categorized into the four chosen categories: DOS, Probe, U2R, and R2L. This required tallying the totals of each attack type and placing them in their respective categories, which was done manually after WEKA identified each attack type. An attack-by-category breakdown was formed for each dataset, and a distribution-of-connection-type classifier evaluation table was then created in Excel for each dataset as well. Figures 5 and 6 below show screenshots of these Excel attack tables, which will be used later for calculations. 
Figure 5 
Figure 6
Once the training and test sets were created, it was time to select the classifiers and begin training. Cross validation was chosen as the test option because it avoids overlapping test sets: using one group for testing and the rest for training allows the model building process to be fine-tuned. The experiment was started with the cross validation option and continued until all the algorithms were tested. Keeping a consistent testing method across the board is really the only way to make a true comparison and establish a baseline standard before improving the results with further iterations and recommending a model. Once that is done, an accurate estimate can be obtained and the results of each algorithm compared. 
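The non-overlapping test sets that cross validation guarantees can be sketched directly. A minimal k-fold splitter (illustrative only; WEKA additionally stratifies its folds, which this sketch does not):

```python
def cross_validation_folds(data, k=10):
    # partition data into k folds; each record lands in exactly one test fold
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [record for j, fold in enumerate(folds) if j != i for record in fold]
        yield train, test

data = list(range(20))
for train, test in cross_validation_folds(data, k=5):
    assert len(train) == 16 and len(test) == 4
    assert sorted(train + test) == data   # every fold pair covers all the data
```

Averaging a classifier's accuracy over the k folds gives the single estimate used to compare the algorithms.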
The first classifier chosen was the NaiveBayes algorithm. NaiveBayes is a powerful 
probabilistic algorithm that learns the conditional probability of each attribute from the training data. It is called naïve because it relies on two simplifying assumptions: that the predictive attributes are conditionally independent given the class, and that no hidden attributes influence the prediction process. (Nguyen, 2008) NaïveBayes classifies by applying Bayes' rule to compute the probability of each class given the particular instance, and then predicts the class with the highest posterior probability. The goal of classification is to correctly forecast the value of a chosen discrete class variable given a vector of predictors, or attributes. NaïveBayes is very effective on many real-world datasets; nevertheless, attribute redundancy can sometimes reduce its predictive accuracy. (Baitharu, 2013) Figures 7 and 8 below show the process of running the NaïveBayes algorithm on the KDD99 training dataset using cross validation. 
Figure 7
Figure 8 
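A minimal categorical NaïveBayes can be written from scratch to show the mechanics described above. This is an illustrative sketch with hypothetical feature names, not WEKA's implementation: class priors and per-attribute conditional probabilities are counted from the training data, and prediction picks the class with the highest posterior, with Laplace smoothing so unseen attribute values do not zero out the product.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(records):
    # records: list of (feature_dict, label)
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)        # (label, attr) -> Counter of values
    for feats, label in records:
        for attr, value in feats.items():
            value_counts[(label, attr)][value] += 1
    return class_counts, value_counts, len(records)

def predict_nb(model, feats):
    class_counts, value_counts, n = model
    def log_posterior(c):
        # log P(c) + sum over attributes of log P(value | c), Laplace-smoothed
        score = log(class_counts[c] / n)
        for attr, value in feats.items():
            counts = value_counts[(c, attr)]
            score += log((counts[value] + 1) / (class_counts[c] + len(counts) + 1))
        return score
    return max(class_counts, key=log_posterior)

records = [({"proto": "tcp", "flag": "SF"}, "normal")] * 3 + \
          [({"proto": "icmp", "flag": "S0"}, "dos")] * 3
model = train_nb(records)
print(predict_nb(model, {"proto": "icmp", "flag": "S0"}))  # dos
```

Working in log space avoids the numeric underflow that multiplying 41 small probabilities would otherwise cause on the KDD features.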
The second classifier chosen was the J48 algorithm, an optimized implementation of C4.5 revision 8. (Nguyen, 2008) J48 can make precise forecasts from data as well as explain the patterns in it, and it deals effectively with many problems involving numeric attributes, such as missing values, pruning, estimating error rates, the complexity of decision tree induction, and generating rules from trees. (Baitharu, 2013) In terms of predictive accuracy, J48 performs slightly better than CART. The learning and classification steps of J48 are very fast; however, J48 lacks scalability and efficiency when applied to large datasets. This was clearly shown in the experiment: the system running the software came to a complete halt when attempting to process the entire KDD99 dataset of over 490,000 instances. Nevertheless, on both the training and test datasets J48 performed quite well. Figures 9 and 10 below show the process of running the J48 algorithm on the KDD99 dataset using cross validation. 
Figure 9
Figure 10 
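At the heart of C4.5/J48 is choosing tree splits by entropy reduction. A sketch of the plain information-gain criterion (C4.5 actually normalizes this into a gain ratio, which is omitted here, and the toy attribute names are mine):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(records, attr):
    # expected drop in class entropy from splitting the records on attr
    labels = [label for _, label in records]
    groups = {}
    for feats, label in records:
        groups.setdefault(feats[attr], []).append(label)
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

records = [({"proto": "tcp", "flag": "SF"}, "normal"),
           ({"proto": "tcp", "flag": "S0"}, "normal"),
           ({"proto": "icmp", "flag": "SF"}, "dos"),
           ({"proto": "icmp", "flag": "S0"}, "dos")]
print(information_gain(records, "proto"))  # 1.0: proto separates the classes perfectly
print(information_gain(records, "flag"))   # 0.0: flag tells us nothing
```

The tree builder recursively splits on the highest-gain attribute, which is why a single highly informative KDD feature can dominate the top of a J48 tree.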
The third classifier chosen was the Decision Table algorithm, which builds a simple decision table majority classifier. It summarizes the dataset with a 'decision table' containing the same number of attributes as the original dataset. A new data item is then assigned to a category by finding the line in the decision table that matches the non-class values of the data item. Decision Table uses the wrapper method to find a good subset of attributes for inclusion in the table; by eliminating attributes that contribute little or nothing to a model of the dataset, the algorithm creates a smaller, condensed decision table. (Kohavi, 1995) Figures 11 and 12 below show the process of running the Decision Table algorithm on the KDD99 dataset using cross validation. 
Figure 11
Figure 12 
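A decision table majority classifier reduces to a lookup keyed on the chosen attribute subset, with a majority-class fallback for unmatched rows. A minimal sketch (the attribute subset is fixed by hand here, whereas the real algorithm searches for it with the wrapper method):

```python
from collections import Counter, defaultdict

def build_decision_table(records, attrs):
    # one table row per distinct combination of the selected attributes;
    # each row predicts the majority label of the records that match it
    rows = defaultdict(Counter)
    overall = Counter()
    for feats, label in records:
        rows[tuple(feats[a] for a in attrs)][label] += 1
        overall[label] += 1
    table = {key: counts.most_common(1)[0][0] for key, counts in rows.items()}
    return table, overall.most_common(1)[0][0]

def classify(table, default, feats, attrs):
    # items matching no table row fall back to the global majority class
    return table.get(tuple(feats[a] for a in attrs), default)

records = [({"proto": "icmp", "srv": "ecr_i"}, "dos"),
           ({"proto": "icmp", "srv": "ecr_i"}, "dos"),
           ({"proto": "tcp", "srv": "http"}, "normal")]
table, default = build_decision_table(records, ["proto"])
print(classify(table, default, {"proto": "icmp", "srv": "eco_i"}, ["proto"]))  # dos
print(classify(table, default, {"proto": "udp", "srv": "domain"}, ["proto"]))  # dos (majority fallback)
```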
The fourth classifier chosen was the JRip algorithm, which implements the 
propositional rule learner Repeated Incremental Pruning to Produce Error Reduction (RIPPER). RIPPER builds a rule set by repeatedly adding rules to an empty rule set until all positive examples are covered. Rules are formed by greedily adding conditions to the antecedent of a rule until no negative examples are covered. JRip proceeds by treating all the examples of a particular class in the training data as one class and finding a set of rules that cover all the members of that class; it then proceeds to the next class and does the same, repeating until all classes have been covered. (Rajput, 2000) JRip is similar to the commercial rule learner RIPPER, which assembles a rule set in which all positive examples are covered and which performs proficiently on noisy datasets. Before building a rule, the current set of training examples is partitioned into two subsets, a growing set and a pruning set, and the rule is created from examples in the growing set. To prune a rule, RIPPER chooses the deletion that maximizes its pruning function. (Witten, 2005) 
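The greedy rule-growing step can be sketched in isolation. This is a simplification: pruning on the held-out pruning set is omitted, and the scoring is a plain coverage difference rather than RIPPER's information-gain heuristic. Conditions are added one at a time until the rule covers no negative examples.

```python
def matches(record, rule):
    return all(record.get(attr) == value for attr, value in rule.items())

def grow_rule(pos, neg, attrs):
    """Greedily add attr=value conditions until no negative example is covered.
    Assumes the toy data is separable by a conjunction of equality tests."""
    rule = {}
    while any(matches(r, rule) for r in neg):
        best = None
        for attr in attrs:
            for value in {r[attr] for r in pos if matches(r, rule)}:
                cand = dict(rule, **{attr: value})
                score = sum(matches(r, cand) for r in pos) - \
                        sum(matches(r, cand) for r in neg)
                if best is None or score > best[0]:
                    best = (score, cand)
        rule = best[1]
    return rule

pos = [{"proto": "icmp", "flag": "S0"}, {"proto": "icmp", "flag": "SF"}]
neg = [{"proto": "tcp", "flag": "SF"}, {"proto": "tcp", "flag": "S0"}]
print(grow_rule(pos, neg, ["proto", "flag"]))  # {'proto': 'icmp'}
```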
The fifth and final classifier is the SMO algorithm, a supervised classifier that can be either linear or non-linear and that uses two different datasets: a training set and a test set. In the best case the classes are linearly separable by a line that splits the two classes perfectly; in actuality there are many lines that separate the dataset, and the best one becomes the "separating line". The best line is found by maximizing the distance to the nearest points of both classes in the training set. The complexity of the model has to be balanced between fitting the training data and remaining generic; this is achieved by allowing models that can make errors. SMO tolerates some errors in order to avoid over-fitting while minimizing the number of errors that will be made. (Baitharu, 2013) 
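The maximum-margin "separating line" is easiest to see in one dimension. A sketch under the assumption that every point of one class lies below every point of the other (the general SMO optimization over hyper-planes and slack variables is far beyond this illustration):

```python
def best_separating_threshold(class_a, class_b):
    """Maximum-margin separator for two linearly separable 1-D classes:
    the midpoint between the closest points of the two classes.
    Assumes all of class_a lies below all of class_b."""
    lo, hi = max(class_a), min(class_b)
    threshold = (lo + hi) / 2          # the "separating line"
    margin = (hi - lo) / 2             # distance to the nearest point of either class
    return threshold, margin

threshold, margin = best_separating_threshold([0, 1, 2], [8, 9, 10])
print(threshold, margin)  # 5.0 3.0
```

Any threshold between 2 and 8 separates the toy classes, but 5.0 maximizes the distance to the nearest point of each class, which is exactly the criterion the paragraph describes.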
DISCUSSION: 
The five chosen algorithms were assessed on the KDD dataset, and Figure 13 gives the results of our findings. Both the true positive (TP) rate and the false positive (FP) rate of each algorithm were recorded; these are the factors used to identify the best algorithm for a particular category. The total time to build each model was also recorded, to give some real-world insight into whether these models are actually applicable under real-world time constraints. I must say I was somewhat surprised at the results. The J48 classifier had both the highest true positive rate and the lowest false positive rate for the DOS category. This in essence shows that certain algorithms do indeed work better for certain attack categories, which confirms the findings of other researchers such as Nguyen and Choi. Another important insight was the long total time taken to build the model by the JRip algorithm: JRip took 914.82 seconds, while NaiveBayes took only 10.55 seconds. The Decision Table algorithm had the second highest total time at 642.55 seconds, and it had only a .70, or 70%, true positive rate for DOS, indicating it would not be the best algorithm for finding denial of service attacks. Another interesting point is that my datasets contained few U2R and R2L attack records, which made it very difficult to identify which algorithm would be best for these attacks; as shown in Figure 13 below, most of the rates for these categories are zeroes. In essence this experiment was a great project, and crunching the numbers by building spreadsheets helped me make more sense of what was actually happening. Based on all of the findings, I feel that the J48 classifier is best for DOS attacks while Decision Table is best for Probe attacks, each having both the highest true positive rate and the lowest false positive rate in its respective attack category. The U2R and R2L findings were inconclusive due to a lack of data for these attacks. As a result, the proposed model would combine both the J48 and Decision Table classifiers into a NID system to improve its efficiency. If the system proves too slow in the real world, Decision Table would be replaced with the SMO algorithm, which is fairly good at identifying Probe attacks in less time than Decision Table. In essence, an IDS with these two algorithms working in parallel would be a well designed system. 
Figure 13
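The TP and FP rates compared in Figure 13 come from simple ratios over each category's confusion counts. A sketch with hypothetical counts (the numbers below are made up for illustration and are not from the experiment):

```python
def rates(tp, fn, fp, tn):
    # TP rate: fraction of this category's attacks correctly flagged;
    # FP rate: fraction of other records incorrectly flagged as this category
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# hypothetical DOS-category counts for one classifier
tpr, fpr = rates(tp=9400, fn=300, fp=50, tn=39652)
print(round(tpr, 3), round(fpr, 4))  # 0.969 0.0013
```

The zero-denominator guards matter for exactly the situation the discussion describes: when a category such as U2R has almost no records, its rates degenerate rather than dividing by zero.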
CONCLUSION: 
In this final report I have attempted to provide an understanding of research on data 
mining classifier selection models and their impact on intrusion detection systems. I have also tried to give an introductory understanding of the different classifier algorithms and techniques used to build efficient intrusion detection models. The WEKA data mining workbench was used as the tool of choice to demonstrate how these models would actually be designed and how they would work in identifying certain types of attacks. Background information was given describing different classifier methodologies and citing how researchers are progressing in this area of data mining. A system design and implementation methodology was presented outlining the project I chose, to give an understanding of some of the current research being done today at the intersection of data mining and cyber security. Overall, in my opinion, the research in this project clearly demonstrated that a correlation exists between specific data mining algorithms and specific attack types. Understanding these relationships, as well as other factors such as total time and average accuracy, can prove invaluable in future designs of intrusion detection systems. Nevertheless, there is much more research to be done in this area. From my findings I have learned that identifying the relationship between certain algorithms and certain attack types is an important topic for both future IDS designs and future security risk mitigation efforts. Having a clearer understanding of this topic has made me more aware of both the intricacies and the complexities of data mining algorithms when used in cyber security applications. As ubiquitous computing continues to change the technology infrastructure landscape, one thing is for sure: data mining techniques used in cyber security technologies will continue to gain even more importance in the ongoing cyber war for many years to come. 
ACKNOWLEDGEMENTS: 
This project has been one of the most challenging research projects that I have ever done, yet it has also been one of the most rewarding. I would like to thank my professor, Dr. Liam Mayron, whose encouragement, guidance, and support from beginning to end motivated me to do my very best and never accept anything less. His wisdom and insight have not only allowed me to develop a better understanding of this subject but have also taught me to never give up, and for that I am eternally grateful.
REFERENCES 
Agarwal, R., & Joshi, M. (2000). PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection). Retrieved from: http://www.siam.org/meetings/sdm01/pdf/sdm01_30.pdf 

Baitharu, T., & Pani, S. (2013). A Survey on Application of Machine Learning Algorithms on Data Mining. International Journal of Innovative Technology and Exploring Engineering. Retrieved from: http://www.ijitee.org/attachments/File/v3i7/G1363123713.pdf 

Kohavi, R. (1996). Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. KDD-96 Proceedings. Retrieved from: http://www.aaai.org/Papers/KDD/1996/KDD96-033.pdf 

Kumar, V. (2002). Data Mining Algorithms: Tutorial Presented at the IPAM 2002 Workshop on Mathematical Challenges in Scientific Data Mining. Retrieved from: http://www.ipam.ucla.edu/publications/sdm2002/sdm2002_4007.pdf 

Nguyen, H., & Choi, D. (2008). Application of Data Mining to Network Intrusion Detection: Classifier Selection Model. Retrieved from: http://arxiv.org/ftp/arxiv/papers/1007/1007.1268.pdf 

Rajput, A., & Aharwal, R. (2000). J48 and JRIP Rules for E-Governance Data. Retrieved from: http://www.cscjournals.org/csc/manuscript/Journals/IJCSS/volume5/Issue2/IJCSS-448.pdf 

Stolfo, S., Fan, W., Lee, W., & Prodromidis, A. (1999). Cost-based Modeling and Evaluation for Data Mining With Application to Fraud and Intrusion Detection: Results from the JAM Project. Retrieved from: https://kdd.ics.uci.edu/databases/kddcup99/task.html 

Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, San Francisco. 

Xu, X. (2006). Adaptive Intrusion Detection Based on Machine Learning: Feature Extraction, Classifier Construction and Sequential Pattern Prediction. Retrieved from: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CD0CFADEE68079885E5FD4F71BC4F7E?doi=10.1.1.107.9575&rep=rep1&type=pdf 

Yang, L., & Guo, L. (2007). An active learning based TCM-KNN algorithm for supervised network intrusion detection. Retrieved from: http://sourcedb.ict.cas.cn/cn/ictthesis/200907/P020090722621827712247.pdf
 
20170412 om patri pres 153pdf
20170412 om patri pres 153pdf20170412 om patri pres 153pdf
20170412 om patri pres 153pdf
 
Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...Research Inventy : International Journal of Engineering and Science is publis...
Research Inventy : International Journal of Engineering and Science is publis...
 
The role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha KrishnanThe role of NLP & ML in Cognitive System by Sunantha Krishnan
The role of NLP & ML in Cognitive System by Sunantha Krishnan
 
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKSSURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
SURVEY ON LINK LAYER ATTACKS IN COGNITIVE RADIO NETWORKS
 
MULTI-LAYER CLASSIFIER FOR MINIMIZING FALSE INTRUSION
MULTI-LAYER CLASSIFIER FOR MINIMIZING FALSE INTRUSIONMULTI-LAYER CLASSIFIER FOR MINIMIZING FALSE INTRUSION
MULTI-LAYER CLASSIFIER FOR MINIMIZING FALSE INTRUSION
 

Similar to Data Mining Classifier Selection Model Improves Network Intrusion Detection

rpaper
rpaperrpaper
rpaperimu409
 
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...ijceronline
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionIRJET Journal
 
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSBENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSIJNSA Journal
 
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...IJCNCJournal
 
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...IJCNCJournal
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...BASMAJUMAASALEHALMOH
 
Progress of Machine Learning in the Field of Intrusion Detection Systems
Progress of Machine Learning in the Field of Intrusion Detection SystemsProgress of Machine Learning in the Field of Intrusion Detection Systems
Progress of Machine Learning in the Field of Intrusion Detection Systemsijcisjournal
 
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...ijcisjournal
 
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERCSEIJJournal
 
Attack Detection Availing Feature Discretion using Random Forest Classifier
Attack Detection Availing Feature Discretion using Random Forest ClassifierAttack Detection Availing Feature Discretion using Random Forest Classifier
Attack Detection Availing Feature Discretion using Random Forest ClassifierCSEIJJournal
 
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...ijctcm
 
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...IJCSIS Research Publications
 

Similar to Data Mining Classifier Selection Model Improves Network Intrusion Detection (20)

rpaper
rpaperrpaper
rpaper
 
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...
New Hybrid Intrusion Detection System Based On Data Mining Technique to Enhan...
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONSBENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
BENCHMARKS FOR EVALUATING ANOMALY-BASED INTRUSION DETECTION SOLUTIONS
 
A45010107
A45010107A45010107
A45010107
 
A45010107
A45010107A45010107
A45010107
 
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
Intrusion Detection System (IDS) Development Using Tree-Based Machine Learnin...
 
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
Intrusion Detection System(IDS) Development Using Tree-Based Machine Learning...
 
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
Using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) to Classif...
 
1855 1860
1855 18601855 1860
1855 1860
 
1855 1860
1855 18601855 1860
1855 1860
 
A6
A6A6
A6
 
Progress of Machine Learning in the Field of Intrusion Detection Systems
Progress of Machine Learning in the Field of Intrusion Detection SystemsProgress of Machine Learning in the Field of Intrusion Detection Systems
Progress of Machine Learning in the Field of Intrusion Detection Systems
 
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...
11421ijcPROGRESS OF MACHINE LEARNING IN THE FIELD OF INTRUSION DETECTION SYST...
 
ICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptxICMCSI 2023 PPT 1074.pptx
ICMCSI 2023 PPT 1074.pptx
 
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIERATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER
ATTACK DETECTION AVAILING FEATURE DISCRETION USING RANDOM FOREST CLASSIFIER
 
Attack Detection Availing Feature Discretion using Random Forest Classifier
Attack Detection Availing Feature Discretion using Random Forest ClassifierAttack Detection Availing Feature Discretion using Random Forest Classifier
Attack Detection Availing Feature Discretion using Random Forest Classifier
 
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...
An Empirical Comparison and Feature Reduction Performance Analysis of Intrusi...
 
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...
Enhanced Intrusion Detection System using Feature Selection Method and Ensemb...
 
C3602021025
C3602021025C3602021025
C3602021025
 

Data Mining Classifier Selection Model Improves Network Intrusion Detection

data while simultaneously improving detection efficiency. In the past few years, researchers have experimented extensively with various algorithms on network intrusion detection systems, trying different approaches in an effort to improve both the effectiveness and efficiency of NIDS.
As a result, detection has improved tremendously over the years and continues to be a major weapon in the cyber-war arsenal. For this reason, choosing the best data mining classification method has become even more vital for future NIDS designs, and many NIDS designers have begun to take a closer look at the results of this research. One research direction in particular involves discovering algorithms that work distinctly better on certain types of attacks; findings in this area could help immensely in enhancing intrusion detection performance.

A Network Intrusion Detection System (NIDS) plays the vital role of identifying various types of attacks and securing the network and its information infrastructure. Its main purpose is to distinguish intrusions from normal audit data, which is in essence a classification problem. Classification is defined as follows: given a collection of records known as a training set, where each record has a set of attributes one of which is called the class, the goal is to assign a class to previously unseen records. To do this, the data is divided into a training set and a test set; the training set is used to build the model and the test set is used to validate it. (Kumar, 2002) Below is a typical framework of a classification model. This framework gives a big-picture overview of the experiment being
undertaken using the KDD99 dataset. Two random samples of the data will be taken: one will be used as the training dataset and the other as the test set. See Figure 1 below.

Figure 1

The purpose of this project is to evaluate a wide-ranging set of classifier algorithms against the KDD dataset in order to determine whether any of them demonstrates better performance in a particular attack category. Each algorithm will be compared and its performance results matched to a specific attack category. Attempting to detect attacks in the Probe, DOS, U2R, and R2L categories will serve as the baseline foundation of this project; each of these categories comprises distinctive attack types that are commonly used by attackers when hacking a network. Once we have identified which algorithm best detects which attack category, we will propose a model for designing an ideal network intrusion detection system for future use.

BACKGROUND:

It is well known that learning classifier models is a critical problem in data mining. Audit-type data from real-world networks is often presented as a set of records with multiple attributes, one of which is a special category attribute called the class. Initially, a training set of records with known class labels is extracted, and the idea is to learn a model for the class in terms of the other attributes. Once this model has been built, the objective is to use it to predict the class of any given set of records, comparing predicted against actual classes in order to optimize the system overall. Traditionally, the goal has been to reduce the number of misclassified records and thereby maximize accuracy. Many techniques have been attempted over the years to build efficient and effective classifier models. Three methods in particular, all of which learn rule-based models, are especially popular in data mining.
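The train-then-validate workflow described above can be sketched in a few lines. This is an illustrative scikit-learn example on a synthetic dataset, not the report's actual WEKA setup; the dataset and classifier here are stand-ins.

```python
# Sketch of the generic classification framework: build the model on a
# training set, then validate it on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic records stand in for audit data; one attribute is the class.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Two random samples: one to build the model, one to validate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```

The same pattern applies regardless of which of the five classifiers is plugged in; only the `model` line changes.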
The first method is the Transductive Confidence Machines for K-Nearest Neighbors (TCM-KNN) technique, a commonly used machine learning and data mining algorithm for intrusion detection that is also highly effective for fraud detection, pattern recognition, and outlier detection. Historically, TCM-KNN wasn't considered an algorithm fit for intrusion detection. Nevertheless, that attitude has changed immensely over the past decade as research has proven the algorithm's validity, and results have been outstanding: TCM-KNN achieves high detection rates and low false positives even when trained on a limited data set. For instance, an average true positive rate of 99.6% and a false positive rate of 0.1% were reported on a random sampling subset of the KDD dataset. (Nguyen, 2008) TCM-KNN computes confidence based on randomness. Unlike conventional machine learning methods, transduction can offer measures of reliability for individual points, and it requires no extensive assumptions beyond the i.i.d. assumption (that the training points and the new, unlabeled points are independently and identically distributed). (Yang, 2007)

The next method is the PNrule technique, which was assessed on the KDD testing dataset and performed quite well: returning a true positive rate of over 96% against denial-of-service (DOS) attacks and a false positive rate under 10% overall, PNrule proved itself a competent player in the data mining space. (Nguyen, 2008) PNrule is a two-stage, general-to-specific framework based on finding rules that predict the presence of a target class (P-rules) and the absence of a target class (N-rules).
The main idea is to learn a set of P-rules that together cover most of the positive examples while retaining statistical support. Initially, highly accurate rules are selected, but accuracy is later traded off in favor of support; this lessening of accuracy causes some negative examples (false positives) to be covered. All the true positives and false positives collectively covered by the union of the P-rules are then combined, and N-rules are learned on this reduced data set to remove the false positives. This two-phase approach is what distinguishes PNrule, which also uses a mechanism that lets it avoid overfitting while maintaining flexibility. (Agarwal, 2000)

The final method is the multiclass Support Vector Machine (SVM) method. SVM, technically a binary classification technique, is based on the idea of constructing optimal hyperplanes to improve generalization ability. Because most real-world pattern recognition applications are multi-class classification cases, multi-class SVM algorithms are ideal for solving these kinds of problems. A decomposition-based method divides a multi-class problem into multiple binary problems, constructs multiple two-class SVM classifiers, and combines their classification results. There are three ways multi-class SVMs can be implemented using binary SVM algorithms: one-vs-all, one-vs-one, and error-correcting output coding. Among the existing decomposition approaches, one-vs-all has been regarded as a simple strategy with somewhat lower accuracy than other multi-class SVMs, yet it is also extremely powerful and can produce results that are usually at least as accurate as other methods. (Xu, 2006) When SVMs were evaluated on the KDD99 dataset in the past, the results were promising.
False positives were maintained at a very low level, averaging about 0.6%, while true positives were in the range of 75% for DOS attacks and 82% for probe attacks using a small dataset of 10,000 records. (Nguyen, 2008)
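The one-vs-all decomposition described above can be sketched briefly. This is an illustrative scikit-learn analogue (a linear SVM wrapped in `OneVsRestClassifier`), not the implementation evaluated in the cited work:

```python
# Sketch of one-vs-all multiclass SVM: one binary SVM per class, each
# trained to separate its class from all the others.
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Four classes stand in for the four attack categories.
X, y = make_classification(n_samples=600, n_features=8, n_informative=6,
                           n_classes=4, random_state=0)

clf = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X, y)
print(len(clf.estimators_))  # one binary classifier per class
```

Prediction combines the binary classifiers' decision values and picks the class whose classifier is most confident.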
SYSTEM DESIGN AND IMPLEMENTATION:

The experimental evaluation environment consists of three steps. First, the environment is set up with the required hardware and software, the data mining software is selected and installed, and the dataset itself is chosen and preprocessed. Next, five discrete classifier algorithms are chosen from a wide-ranging set of accepted classifiers; these were chosen because they represent a number of different approaches, are well-known conventional classifier algorithms, and are already included in the WEKA software. The five classifiers chosen are Naïve Bayes, J48 (a C4.5 revision), Decision Table, JRip (RIPPER), and SMO. Finally, the experiments are run with each classifier, the results are recorded, and the performance of each algorithm is compared. Based on the information obtained, an ideal model is suggested.

All experimental iterations were done on the following hardware and software: a Dell Latitude e6410 with an Intel Core i5 2.53 GHz processor and 4 GB RAM, running the Microsoft Windows 7 operating system. The open-source data mining and machine learning package WEKA, version 3.6 for Windows, was used. WEKA is a data mining workbench containing a large number of algorithms for classification as well as many algorithms for data preprocessing, feature selection, clustering, and finding association rules. It is an all-inclusive workbench, written in Java, that runs on multiple platforms. (Witten, 2011)

The experiment uses the KDD99 dataset, which came about as a direct result of the 1998 DARPA Intrusion Detection Evaluation Program, whose objective was to survey and evaluate research in intrusion detection. This standard set of pre-audited data includes a wide variety of intrusions simulated in a military network environment. Nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S.
Air Force LAN was interspersed with multiple types of attacks. The raw training data contained seven weeks of TCP dump network traffic, resulting in about four million connection records. (Stolfo, 1999) A connection is a series of TCP packets starting and ending at distinct times, between which data flows from a source IP address to a destination IP address. Each connection is labeled either as normal or as exactly one specific attack type. Four main categories of attacks were identified: DOS (denial of service), R2L (remote to local), U2R (user to root), and Probing. (Stolfo, 1999) These are the four attack categories used in this classifier experiment's dataset. The actual attack types in the dataset, along with their categories, can be seen in Figure 2 below.

Figure 2
The KDD99 dataset contains one type of normal data and 22 different types of attacks. Each instance contains 41 features plus one final feature for classification. (Nguyen, 2008) To make the experiment tractable, we filtered the original KDD99 dataset down to about 10% of the data, randomly chosen to represent a cross-section of attack types. The original dataset had 494,020 records; the reduced dataset has 49,402. To do this we loaded the original dataset into WEKA, chose Choose > Filters > Unsupervised > Instance > RemovePercentage, then double-clicked the filter's name and set the percentage to 90%, leaving only 10% of the data. Figures 3 and 4 below show before-and-after screenshots of the filter being applied to the original dataset. This new dataset is used as the training set. A second random sampling removes 97% of the data, leaving 3%, or about 14,821 instances, to be used as the test set.

Figure 3

Figure 4
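The same sampling can be reproduced outside WEKA. A minimal plain-Python sketch of the 10% / 3% random splits (the record contents here are placeholders, not real KDD99 rows):

```python
# Sketch of the random-sampling step done in WEKA with the
# RemovePercentage filter (remove 90% -> keep 10%; remove 97% -> keep 3%).
import random

random.seed(1)
records = [f"record-{i}" for i in range(494020)]  # stand-in for KDD99 rows

train = random.sample(records, k=round(len(records) * 0.10))
test = random.sample(records, k=round(len(records) * 0.03))
print(len(train), len(test))  # 49402 14821
```

The resulting sizes match the record counts reported above (49,402 training and about 14,821 test instances).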
After the filter was applied, the different attack types had to be categorized into the four chosen categories: DOS, Probe, U2R, and R2L. This required tallying the total numbers of each attack and placing them in their respective category. This was done manually after WEKA identified each attack type, and an attack-by-category breakdown was formed for each dataset. Afterwards, a distribution-of-connection-type classifier evaluation table was created in Excel for each dataset as well. Figures 5 and 6 below show screenshots of these Excel attack tables, which are used later for calculations.

Figure 5

Figure 6
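The manual categorization step can be expressed as a simple lookup. This sketch uses the 22 attack labels as given in the standard KDD99 documentation; `categorize` is a hypothetical helper, not part of the report's workflow:

```python
# Sketch: mapping each KDD99 attack label to one of the four categories.
CATEGORY = {
    "dos":   {"back", "land", "neptune", "pod", "smurf", "teardrop"},
    "probe": {"ipsweep", "nmap", "portsweep", "satan"},
    "u2r":   {"buffer_overflow", "loadmodule", "perl", "rootkit"},
    "r2l":   {"ftp_write", "guess_passwd", "imap", "multihop",
              "phf", "spy", "warezclient", "warezmaster"},
}

def categorize(label: str) -> str:
    """Return the attack category for a label, or 'normal' if unlisted."""
    for cat, attacks in CATEGORY.items():
        if label in attacks:
            return cat
    return "normal"

print(categorize("smurf"))  # dos
print(categorize("nmap"))   # probe
```

Tallying labels through such a mapping produces the per-category counts that went into the Excel tables.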
Once the training and test sets were created, it was time to select the classifiers and begin training. Cross-validation was chosen as the test option because it avoids overlapping test sets: using one group for testing and the rest for training allows the model-building process to be fine-tuned. The experiment was started with the cross-validation option and continued until all the algorithms were tested. Keeping a consistent testing method across the board is the only way to obtain a true comparison and establish a baseline standard before improving the results through further iterations and recommending a model. Once that is done, an accurate estimate can be identified and the results of each algorithm compared.

The first classifier chosen was the NaïveBayes algorithm. NaïveBayes is a powerful probabilistic algorithm that learns from training data the conditional probability of each attribute. It is called naïve because it relies on two simplifying assumptions: that the predictive attributes are conditionally independent given the class, and that no hidden attributes influence the prediction process. (Nguyen, 2008) NaïveBayes performs classification by applying Bayes' rule to compute the probability of each class for a given instance, then predicting the class with the highest posterior probability. The goal of classification is to correctly forecast the value of a selected discrete class variable given a vector of predictors or attributes. NaïveBayes is very effective on many real-world datasets; nevertheless, attribute redundancy sometimes reduces its predictive accuracy. (Baitharu, 2013) Figures 7 and 8 below show the process of running the NaïveBayes algorithm on the KDD99 training dataset using cross-validation.

Figure 7
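The cross-validation procedure described above can be sketched as follows. This is an illustrative scikit-learn analogue of WEKA's cross-validation test option, with `GaussianNB` standing in for WEKA's NaiveBayes on a synthetic dataset:

```python
# Sketch of 10-fold cross-validation with a Naive Bayes classifier:
# each fold trains on 9/10 of the data and tests on the held-out 1/10.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=10, random_state=7)
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores.mean())  # average accuracy across the 10 folds
```

Because every instance is tested exactly once, the averaged score gives the consistent baseline the experiment relies on when comparing algorithms.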
Figure 8

The second classifier chosen was the J48 algorithm, WEKA's optimized implementation of C4.5 revision 8. (Nguyen, 2008) J48 can make precise forecasts from data as well as explain the patterns in it, and it deals effectively with many issues around numeric attributes, such as missing values, pruning, estimating error rates, the complexity of decision tree induction, and generating rules from trees. (Baitharu, 2013) In terms of predictive accuracy, J48 performs slightly better than CART. The learning and classification steps of J48 are very fast; however, J48 lacks scalability and efficiency when applied to large datasets. This was clearly shown in the experiment: the system running the software came to a complete halt when attempting to process the entire KDD99 dataset of over 490,000 instances. Nevertheless, J48 performed quite well on both the training and test datasets. Figures 9 and 10 below show the process of running the J48 algorithm on the KDD99 dataset using cross-validation.

Figure 9
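A decision-tree classifier of this family can be sketched briefly. Here scikit-learn's CART implementation stands in for WEKA's J48 (they are related but not identical algorithms), with a depth limit as a simple stand-in for J48's pruning:

```python
# Sketch of a pruned decision tree with its learned rules printed,
# illustrating how tree learners both predict and explain patterns.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth caps tree growth, roughly analogous to pruning in J48.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))  # human-readable if/else rules from the tree
```

The printed rule listing is what makes tree learners attractive for intrusion detection: analysts can read exactly which feature thresholds trigger an attack label.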
Figure 10

The third classifier chosen was the Decision Table algorithm, which builds a simple decision-table majority classifier. It summarizes the dataset with a decision table containing the same number of attributes as the original dataset. A new data item is then assigned to a category by finding the row in the decision table that matches the non-class values of the item. Decision Table uses the wrapper method to find a good subset of attributes for inclusion in the table; by eliminating attributes that contribute little or nothing to a model of the dataset, the algorithm creates a smaller, condensed decision table. (Kohavi, 1995) Figures 11 and 12 below show the process of running the Decision Table algorithm on the KDD99 dataset using cross-validation.

Figure 11
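The core lookup of a decision-table majority classifier can be sketched in plain Python. This omits the wrapper-based attribute selection; `build_table` and `classify` are hypothetical helpers for illustration only:

```python
# Sketch of a decision-table majority classifier: rows sharing the same
# selected-attribute values are assigned that cell's majority class.
from collections import Counter, defaultdict

def build_table(rows, labels, attrs):
    """Map each combination of selected attribute values to its majority class."""
    cells = defaultdict(list)
    for row, label in zip(rows, labels):
        cells[tuple(row[a] for a in attrs)].append(label)
    table = {key: Counter(vals).most_common(1)[0][0] for key, vals in cells.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback: global majority
    return table, default

def classify(table, default, row, attrs):
    return table.get(tuple(row[a] for a in attrs), default)

rows = [{"proto": "tcp", "flag": "SF"}, {"proto": "tcp", "flag": "S0"},
        {"proto": "udp", "flag": "SF"}, {"proto": "tcp", "flag": "S0"}]
labels = ["normal", "dos", "normal", "dos"]
table, default = build_table(rows, labels, ["proto", "flag"])
print(classify(table, default, {"proto": "tcp", "flag": "S0"}, ["proto", "flag"]))  # dos
```

Shrinking `attrs` to only informative attributes is exactly what makes the real algorithm's table smaller and more general.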
Figure 12

The fourth classifier chosen was the JRip algorithm, which implements the propositional rule learner Repeated Incremental Pruning to Produce Error Reduction (RIPPER). RIPPER builds a rule set by repeatedly adding rules to an empty rule set until all positive examples are covered; rules are formed by greedily adding conditions to the antecedent of a rule until no negative examples are covered. JRip proceeds by treating all the examples of a particular class in the training data as one class and finding a set of rules that covers all members of that class; it then proceeds to the next class and does the same, repeating until all classes have been covered. (Rajput, 2000) JRip is similar to the commercial rule learner RIPPER, whose algorithm assembles a rule set covering all positive examples and performs proficiently on noisy datasets. Before building a rule, the current set of training examples is partitioned into two subsets, a growing set and a pruning set; the rule is created from examples in the growing set. To prune a rule, RIPPER chooses the deletion that maximizes a pruning criterion. (Witten, 2005)

The fifth and final classifier is the SMO algorithm. SMO, a supervised classifier, can be either linear or non-linear and uses two datasets: a training set and a test set. In the best case the classes are linearly separable by a line that splits them perfectly; in practice there are many lines that separate the dataset, and the task is to find the best "separating line." The best line is found by maximizing the distance to the nearest points of both classes in the training set.
The complexity of the model has to be balanced between fitting the training data and staying generic; this can be achieved by allowing models that make errors. SMO avoids over-fitting by permitting some errors while minimizing the number of errors made. (Baitharu, 2013)

DISCUSSION:

The five chosen algorithms were assessed on the KDD dataset, and Figure 13 gives the results of our findings. Both the true positive (TP) and false positive (FP) rates of each algorithm were recorded; these are the factors used to identify the best algorithm for a particular category. The total time to build each model was also recorded, to gain some real-world insight into whether these models are actually applicable under real-world time constraints. I must say I was somewhat surprised at the results. The J48 classifier had both the highest true positive rate and the lowest false positive rate for the DOS category. This in essence proves that certain algorithms do indeed work
  • 12. better for certain attack categories which confirms the finds of other researchers like Nguyen and Choi. Another important insight which stood out involved the long total time taken to build the model by the JRip Algorithm. JRip took 914.82 seconds to build the model while NaiveBayes only took 10.55 seconds. The Decision Table algorithm had the second highest amount of Total time taking 642.55 seconds to build the model. Decision table also had only a .70 or 70% true positive rate for DOS indicating it wouldn’t be the best algorithm to use for finding denial of service attacks. Another interesting point is that based on my datasets there weren’t many U2R and R2L attack records included in either of the datasets. This made it very difficult to identify which algorithm would be best for these attacks. As shown in figure 13 below, most of the rates besides each of these attacks are zeroes. In essence this experiment was a great project. Crunching the numbers by building spreadsheets helped me to make more sense of what was actually happening with the numbers. Based of all of the findings, I feel that J48 classifier is best for DOS attacks while Decision table is best for PROBE attacks both having the highest true positive rates as well as the lowest false positive rates in their respective attack categories. U2R and R2L attack findings were inconclusive due to a lack of data for these attacks. As a results the propose model recommended would combine both J48 and Decision Table classifiers into a NID system to improve it efficiency. If the system is found to work too slow in the real world, Decision Table would be replace with the SMO algorithm which is fairly good at identifying PROBE attacks in less time than Decision table. In essence, designing an IDS system with these two algorithms working in parallel would be a well designed IDS system. Figure 13
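The TP and FP rates used to rank the classifiers above can be derived from per-category confusion-matrix counts. The sketch below shows the standard calculation; the counts are hypothetical values chosen for illustration, not the actual numbers behind Figure 13.

```python
# For one attack category (e.g. DOS), the per-classifier rates are:
#   TP rate = TP / (TP + FN)   -- fraction of real attacks caught
#   FP rate = FP / (FP + TN)   -- fraction of normal traffic misflagged
def rates(tp, fn, fp, tn):
    """Return (true positive rate, false positive rate) for one category."""
    tp_rate = tp / (tp + fn) if (tp + fn) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return tp_rate, fp_rate

# hypothetical DOS-category counts for two classifiers
j48_tp, j48_fp = rates(tp=970, fn=30, fp=10, tn=990)
dt_tp,  dt_fp  = rates(tp=700, fn=300, fp=40, tn=960)

print("J48:           TP rate", j48_tp, "FP rate", j48_fp)
print("DecisionTable: TP rate", dt_tp,  "FP rate", dt_fp)
```

With these made-up counts J48 scores 0.97/0.01 against Decision Table's 0.70/0.04, winning on both criteria for the category, which mirrors how the best per-category classifier was chosen from Figure 13.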
CONCLUSION:

In this final report I have attempted to provide an understanding of the research on data mining classifier selection models and their impact on intrusion detection systems. I have also tried to give some introductory understanding of the different classifier algorithms and techniques used to build efficient intrusion detection models. The WEKA data mining workbench was used as the tool of choice to demonstrate how these models would actually be designed and how they would work in identifying certain types of attacks. Background information was given describing different classifier methodologies and citing how researchers are progressing in this area of data mining. A system design and implementation methodology was presented outlining the project I chose, in order to give an understanding of some of the current research being done today at the intersection of data mining and cyber security. Overall, in my opinion, the research in this project clearly demonstrated that a correlation exists between specific data mining algorithms and specific attack types. Understanding these relationships, as well as other factors such as total build time and average accuracy, can prove invaluable in future designs of intrusion detection systems. Nevertheless, there is much more research that can be done in this area. From my findings I have learned that identifying the relationship between certain algorithms and certain attack types is an important topic for both future IDS designs and future security risk mitigation efforts. Having a clearer understanding of this topic has made me more aware of both the intricacies and complexities of data mining algorithms when used in cyber security applications. As ubiquitous computing continues to change the technology infrastructure landscape, one thing is for sure: data mining techniques used in cyber security technologies will continue to gain importance in the ongoing cyber war for many years to come.
ACKNOWLEDGEMENTS:

This project has been one of the most challenging research projects I have ever done, yet it has also been one of the most rewarding. I would like to thank my professor, Dr. Liam Mayron, whose encouragement, guidance, and support from beginning to end motivated me to do my very best and never accept anything less. His wisdom and insight have not only allowed me to develop a better understanding of this subject but have also taught me to never give up, and for that I am eternally grateful.
REFERENCES

Agarwal, R., Joshi, M. (2000) PNrule: A New Framework for Learning Classifier Models in Data Mining (A Case-Study in Network Intrusion Detection). Retrieved from: http://www.siam.org/meetings/sdm01/pdf/sdm01_30.pdf

Baitharu, T., Pani, S. (2013) A Survey on Application of Machine Learning Algorithms on Data Mining. International Journal of Innovative Technology and Exploring Engineering. Retrieved from: http://www.ijitee.org/attachments/File/v3i7/G1363123713.pdf

Kohavi, R. (1996) Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. KDD-96 Proceedings. Retrieved from: http://www.aaai.org/Papers/KDD/1996/KDD96-033.pdf

Kumar, V. (2002) Data Mining Algorithms: Tutorial Presented at the IPAM 2002 Workshop on Mathematical Challenges in Scientific Data Mining. Retrieved from: http://www.ipam.ucla.edu/publications/sdm2002/sdm2002_4007.pdf

Nguyen, H., Choi, D. (2008) Application of Data Mining to Network Intrusion Detection: Classifier Selection Model. Retrieved from: http://arxiv.org/ftp/arxiv/papers/1007/1007.1268.pdf

Rajput, A., Aharwal, R. (2000) J48 and JRIP Rules for E-Governance Data. Retrieved from: http://www.cscjournals.org/csc/manuscript/Journals/IJCSS/volume5/Issue2/IJCSS-448.pdf

Stolfo, S., Fan, W., Lee, W., Prodromidis, A. Cost-based Modeling and Evaluation for Data Mining with Application to Fraud and Intrusion Detection: Results from the JAM Project. Retrieved from: https://kdd.ics.uci.edu/databases/kddcup99/task.html

Witten, I.H., Frank, E. (2005) Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, San Francisco.

Xu, X. (2006) Adaptive Intrusion Detection Based on Machine Learning: Feature Extraction, Classifier Construction and Sequential Pattern Prediction. Retrieved from: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=8CD0CFADEE68079885E5FD4F71BC4F7E?doi=10.1.1.107.9575&rep=rep1&type=pdf

Yang, L., Guo, L. (2007) An active learning based TCM-KNN algorithm for supervised network intrusion detection. Retrieved from: http://sourcedb.ict.cas.cn/cn/ictthesis/200907/P020090722621827712247.pdf