`
Traffic Classification based on Machine Learning
using Flow-level Information
Jong Gun Lee (jglee@an.kaist.ac.kr)
Advanced Networking Lab.
`
Table of Contents
• Motivation of this work
• Background about machine learning
• Our approach using machine learning
• Experiment (dataset and result)
• Conclusion
`
Motivation
• Traffic from some newly emerging applications cannot be classified effectively,
– such as online games and streaming applications,
– because these flows carry no application-level identifier, such as a well-known port
number or a common byte sequence in the payload
We propose a methodology to classify Internet traffic
with supervised and unsupervised learning
`
Basic Terminologies of Machine Learning
• Classifier
maps unlabeled instances to classes
• Instance
is a single object of the world
• Attribute
is a quantity describing an instance
• Feature
is the specification of an attribute and its value
• Feature vector
is a list of features describing an instance
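The terms above can be made concrete with a small sketch. The attribute names, the values, and the two class labels are made up for illustration and do not come from the slides.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    attribute: str  # the quantity being described, e.g. "duration"
    value: float    # its value for one particular instance


# One instance (here: one flow) described by its feature vector.
# All names and numbers are hypothetical.
feature_vector = [
    Feature("protocol", 6),
    Feature("duration", 12.5),
    Feature("pps", 40.0),
]


def classify(vector):
    """Toy classifier: maps an unlabeled instance to a class label."""
    values = {f.attribute: f.value for f in vector}
    # Hypothetical rule and labels, only to show the instance -> class mapping.
    return "bulk" if values["pps"] > 20 else "interactive"


label = classify(feature_vector)
```

A learned classifier would replace the hand-written rule in `classify`, but its role (instance in, class out) stays the same.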
`
Unsupervised and Supervised Learning
• Supervised learning (with answer/teacher)
– Given a labeled training set, a classifier learns the characteristics of each
class. When a new instance arrives, the classifier predicts
the class of that instance.
• Unsupervised learning (without answer/teacher)
– Given only a set of unlabeled data (feature vectors), the learner groups the data into a set
of clusters.
`
K-Means
• One of the unsupervised learning methods
• The value K is the number of clusters and is given as
an initial parameter
• Procedure
– First, the algorithm randomly chooses K points as the centers of
K subspaces
– Second, it divides the overall vector space into K subspaces
by assigning each point to its nearest center
– Third, it computes a new center for each of the K subspaces
– Then it repeats the second and third steps until no center
changes, or until every center moves by less than the threshold value
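The procedure above can be sketched in plain Python. This is a minimal illustration, not the implementation used in this work; the function names, the default threshold, the iteration cap, and the example coordinates are assumptions.

```python
import math
import random


def assign(points, centers):
    """Step 2: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]


def kmeans(points, k, threshold=1e-6, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: choose K distinct points at random as the initial centers.
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centers)
        # Step 3: recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[c])  # empty cluster: keep old center
        # Stop once no center moved farther than the threshold.
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved <= threshold:
            break
    return centers, assign(points, centers)


# 8 instances and K = 2; the coordinates are made up.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centers, labels = kmeans(points, 2)
```

For this toy input the two centers settle at (1.5, 1.5) and (8.5, 8.5), in either order depending on the random start.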
`
Example of K-Means
• Number of instances: 8, K = 2
`
Overall Process of Our Method
[Figure: block diagram of the classification method. Recoverable labels: N packets, Feature Extraction, N feature vectors, Unsupervised Learning, K clusters, Supervised Learning, Classifier]
`
Flow-level Feature Information
• Protocol number: 6 (TCP) or 17 (UDP)
• Duration (in seconds)
• Number of packets per second (PPS)
• Mean size of all packets
• Mean size of non-ACK packets
• Rate of ACK packets
• Interaction Information
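A minimal sketch of how the scalar features in this list might be computed for one flow. The `Packet` record and its field names are hypothetical, "rate of ACK packets" is assumed to mean the fraction of packets that are pure ACKs, and the handling of zero-duration flows is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Packet:          # hypothetical record; field names are assumptions
    timestamp: float   # seconds
    size: int          # bytes
    is_ack: bool       # True for a pure ACK packet


def flow_features(packets, protocol):
    """Return the scalar flow-level features for one non-empty flow."""
    n = len(packets)
    times = [p.timestamp for p in packets]
    duration = max(times) - min(times)
    non_ack = [p.size for p in packets if not p.is_ack]
    return {
        "protocol": protocol,  # 6 = TCP, 17 = UDP
        "duration": duration,
        # Assumption: a zero-duration flow reports its packet count as PPS.
        "pps": n / duration if duration > 0 else float(n),
        "mean_size": sum(p.size for p in packets) / n,
        "mean_non_ack_size": sum(non_ack) / len(non_ack) if non_ack else 0.0,
        "ack_rate": (n - len(non_ack)) / n,
    }


# Made-up flow of four packets.
flow = [Packet(0.0, 40, True), Packet(0.5, 1500, False),
        Packet(1.0, 40, True), Packet(2.0, 500, False)]
features = flow_features(flow, protocol=6)
```

The interaction information is a histogram rather than a scalar, so it is not part of this sketch.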
`
Feature Extraction (Interaction Information)
• Interaction Information
– H: a 2-dimensional histogram, 16 x 16
– p1, p2, p3, …, pn
• the sequence of packet sizes of a flow and its partner flow,
ordered by timestamp
For i = 1 : n-1
    H[floor(p_i / 100)][floor(p_(i+1) / 100)]++
Sequence of packet sizes: 40, 80, 1500, …, 40, 1500
Pair-wise representation: [40, 80], [80, 1500], …, [40, 1500]
Histogram bins: [40/100, 80/100], [80/100, 1500/100], …, [40/100, 1500/100]
= [0, 0], [0, 15], …, [0, 15]
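The histogram update above can be sketched as a short Python function. The function name, the clamping of oversized packets into the last bin, and the shortened example sequence are assumptions for illustration.

```python
def interaction_histogram(sizes, bins=16, bin_width=100):
    """Count consecutive packet-size pairs in a bins x bins histogram."""
    hist = [[0] * bins for _ in range(bins)]
    for cur, nxt in zip(sizes, sizes[1:]):
        # Integer division gives the bin index. Clamping keeps packets of
        # bins * bin_width bytes or more in the last bin (an assumption).
        row = min(cur // bin_width, bins - 1)
        col = min(nxt // bin_width, bins - 1)
        hist[row][col] += 1
    return hist


# Shortened version of the example sequence (the elided middle is omitted).
sizes = [40, 80, 1500, 40, 1500]
hist = interaction_histogram(sizes)

# Flatten to a 256-element list so it can sit in a feature vector.
flat = [count for row in hist for count in row]
```

With this input the pairs fall into bins [0, 0], [0, 15], [15, 0], and [0, 15], so `hist[0][15]` ends up at 2.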
`
Guideline
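[Figure: process diagram annotated with open design questions. Recoverable labels:]
• Stages: Packets, Feature Extraction, N feature vectors, Unsupervised Learning, K clusters, Supervised Learning, Classifier
• Other labels: Unknown traffic, yes, no
• Open questions: Rx and Tx, Rx only, or Tx only; number of bins and bin size (dynamic or static); initial ?? packets; effective K estimation; efficient threshold; what kind of learning method; what kind of feature extraction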
`
Dataset
Counts per ARFF file (the slide shows two lists: the full set and a subset):

File               Full set    Subset
bittorrent.arff        6412      1500
clubbox.arff           4913      1500
edonkey.arff         101355      1500
fileguri.arff         21060      1500
ftp.arff                635         0
http.arff            200274      1500
https.arff             3611      1500
melon.arff               22         0
msnp.arff              4986      1500
nateon.arff            1565      1500
nntp.arff               169         0
pop3.arff                63         0
sayclub.arff            224         0
smtp.arff             40556      1500
ssh.arff                 67         0
total                385912     13500
`
Sum of Squared Error (SSE)
• How to get SSE
• Number of bins: 8 x 8
• Number of clusters: 1 to 20
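The formula from this slide did not survive in the transcript. The sketch below uses the standard definition of SSE for clustering (the sum of squared Euclidean distances from each point to the center of its cluster), which is assumed to be what the slide showed. The function name and the example data are made up.

```python
def sse(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(point, centers[label]))
    return total


points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]

# K = 1: every point belongs to one cluster centered on the global mean.
sse_k1 = sse(points, [0] * 8, [(5.0, 5.0)])

# K = 2: two clusters of four points each.
sse_k2 = sse(points, [0, 0, 0, 0, 1, 1, 1, 1], [(1.5, 1.5), (8.5, 8.5)])
```

Here `sse_k1` is 200.0 and `sse_k2` is 4.0, which shows the typical pattern: SSE falls as the number of clusters grows.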
`
Fitting of SSE
Y = 1.446e4 * X^(-1.194) + 755.8, where X is the number of clusters and Y is the SSE
`
Estimation of SSE
`
Decrease Rate of SSE
0.1% decrease
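One way to read the 0.1% figure is as a stopping rule: keep adding clusters until one more cluster lowers the fitted SSE by less than 0.1%. The sketch below applies that reading to the power-law fit from the "Fitting of SSE" slide. The definition of the decrease rate, the function names, and the search limit are assumptions, not taken from the slides.

```python
def fitted_sse(k):
    """Power-law fit quoted on the 'Fitting of SSE' slide."""
    return 1.446e4 * k ** -1.194 + 755.8


def decrease_rate(k):
    """Relative drop in fitted SSE when going from k to k + 1 clusters."""
    return (fitted_sse(k) - fitted_sse(k + 1)) / fitted_sse(k)


def smallest_k_below(threshold=0.001, k_max=10_000):
    """First k whose decrease rate is under the threshold, or None."""
    for k in range(1, k_max):
        if decrease_rate(k) < threshold:
            return k
    return None


k_star = smallest_k_below()  # threshold 0.001 corresponds to 0.1%
```

Because the fit was made on 1 to 20 clusters, any `k_star` beyond that range is an extrapolation and should be treated with caution.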
`
To do list
• Direction
– Rx and Tx, Rx only, and Tx only
• Dynamic bin size
• Initial N packets or all the packets
• Different (un)supervised learning methods
• Different feature extraction methods
