Because of the enormous growth in the number of peer-to-peer (P2P) applications in recent years, P2P traffic now constitutes a substantial proportion of Internet traffic. The ability to accurately identify different P2P applications from network traffic is essential for managing a number of network traffic issues, such as service differentiation and capacity planning. However, modern P2P applications often use proprietary protocols, dynamic port numbers, and packet encryption, which make traditional identification approaches such as port-based or signature-based identification less effective.
In this paper, we propose an approach for accurately recognizing P2P applications running on monitored hosts based on signaling behavior, which is regulated by the underlying P2P protocol; therefore, each application possesses a distinguishing characteristic. We consider that the signaling behavior of each P2P application can serve as a unique signature for application identification. Our approach is particularly useful for three reasons: 1) it does not need to access the packet payload; 2) it recognizes applications based purely on their signaling behavior; and 3) it can identify particular P2P applications. The performance evaluation shows that 92% of a real-life traffic trace can be correctly recognized within a 5-minute monitoring period.
3. Introduction
P2P traffic constitutes a substantial volume of Internet
traffic
Accurately identifying P2P applications from network
traffic is important
Network management, capacity planning, etc.
Conventional approaches: port numbers or payload
signatures
Dynamic ports, encrypted payload
ICC09 3
4. Fundamentals of Our Scheme
P2P applications generate two types of traffic
Data transfer traffic
File‐sharing or file‐redistribution
Signaling traffic
File information refreshment, peer discovery, control information
exchange, etc.
Signaling activity is regulated by the underlying P2P
protocol
Each P2P application may have a unique characteristic
5. Fundamentals of Our Scheme
Verify our conjecture
Compare the signaling activity patterns of BitTorrent,
eMule, and Skype
Traffic data
Capture the traffic of 3 hosts that execute BitTorrent,
eMule, or Skype
Assume packets with payload size smaller than 100 bytes
are signaling packets
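
The payload-size heuristic above can be sketched as a simple filter. This is a minimal sketch: the `Packet` record and its field names are hypothetical and chosen only for illustration; only the 100-byte threshold comes from the slide.

```python
from dataclasses import dataclass

SIGNALING_MAX_PAYLOAD = 100  # bytes; threshold stated on the slide


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet record; the field names are illustrative."""
    timestamp: float   # seconds since the start of the capture
    src: str           # source address
    dst: str           # destination address
    payload_len: int   # payload size in bytes


def is_signaling(pkt: Packet) -> bool:
    """Treat a packet with a payload smaller than 100 bytes as signaling."""
    return pkt.payload_len < SIGNALING_MAX_PAYLOAD


def signaling_packets(packets):
    """Keep only the packets assumed to be signaling traffic."""
    return [p for p in packets if is_signaling(p)]
```

A payload of exactly 100 bytes is not counted as signaling here, following the "smaller than" wording.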
6. Signaling Activity Patterns
Assign an ID to each host contacted by the monitored
host, in the order in which the hosts are first observed
BitTorrent
Intensive exchange of signaling packets
The BitTorrent client progressively discovers new hosts
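
The host numbering used for these activity patterns can be sketched as follows. This is an illustrative sketch: the function name and the `(timestamp, remote_host)` input layout are assumptions, not something specified on the slides.

```python
def assign_host_ids(contacts):
    """Number remote hosts in the order in which they are first observed.

    `contacts` is an iterable of (timestamp, remote_host) pairs, assumed
    to be sorted by timestamp. Returns the host-to-ID mapping and the
    list of (timestamp, host_id) points.
    """
    ids = {}
    points = []
    for ts, host in contacts:
        if host not in ids:
            ids[host] = len(ids) + 1  # IDs start at 1
        points.append((ts, ids[host]))
    return ids, points
```

Plotting the returned points (time against host ID) gives one dot per signaling packet, so a client that keeps discovering new hosts shows up as a steadily rising upper edge.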
7. Signaling Activity Patterns
eMule
The number of hosts increases rapidly in the first 10
minutes but increases slowly thereafter
Skype
Most signaling packets belong to probe traffic
8. Proposed Scheme
Identify P2P applications running on hosts based on the
signaling behavior
How to characterize signaling traffic?
9. Signaling Behavior Characterization
Keep track of signaling packets of a monitored host for a
period of time
Count the number of hosts contacted and the number of
packets sent and received every minute
Classify hosts contacted by the monitored host into two types
Exchanged packets with the monitored host within the past 5 minutes => old host
Otherwise => new host
Characterize the signaling behavior on two levels
Host level: based on the number of new or old hosts
Message level: based on the number of new or old packets
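
The per-minute bookkeeping described above can be sketched as below. Several points are assumptions rather than statements from the slides: the input layout, the reading of "within 5 minutes" as "seen in any of the 5 preceding minutes", and the reading of new/old packets as packets exchanged with new/old hosts. The sketch also ignores packet direction for brevity.

```python
from collections import defaultdict

WINDOW_MIN = 5  # minutes; window length stated on the slide


def per_minute_counts(events, window=WINDOW_MIN):
    """Count new/old hosts and packets for every minute that has traffic.

    `events` is an iterable of (minute, remote_host) pairs, one per
    signaling packet, where `minute` is a 1-based integer. A host is
    treated as old in minute m if it also appeared in any of the minutes
    m-window .. m-1 (an assumed reading of the 5-minute rule).
    """
    pkts = defaultdict(lambda: defaultdict(int))  # minute -> host -> packets
    for minute, host in events:
        pkts[minute][host] += 1

    rows = []
    for m in sorted(pkts):
        recent = set()
        for prev in range(m - window, m):
            recent.update(pkts.get(prev, {}))  # .get() does not add keys
        old_hosts = {h for h in pkts[m] if h in recent}
        new_hosts = set(pkts[m]) - old_hosts
        rows.append({
            "minute": m,
            "new_hosts": len(new_hosts),
            "old_hosts": len(old_hosts),
            "new_packets": sum(pkts[m][h] for h in new_hosts),
            "old_packets": sum(pkts[m][h] for h in old_hosts),
        })
    return rows
```

Running the same function separately on sent and on received packets would give per-direction counts.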
10. Signaling Behavior Features
Host level
Ratio of new / old hosts
Growth rate of new / old hosts
Correlation coefficient between the number of new and old hosts
Message level
Ratio of new / old packets
Growth rate of new / old packets
Correlation coefficient between the number of new and old packets
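
The three feature families listed above can be sketched for one level (host or message) as follows. The slides do not give exact formulas, so the definitions here are assumptions: the ratio is taken as a share of the total over the monitoring period, the growth rate as the average per-minute change, and the correlation coefficient as Pearson's.

```python
from math import sqrt


def ratio(part, rest):
    """Share of `part` in `part + rest`; 0.0 when both are zero."""
    total = part + rest
    return part / total if total else 0.0


def growth_rate(series):
    """Average per-minute change of a count series (assumed definition)."""
    if len(series) < 2:
        return 0.0
    return (series[-1] - series[0]) / (len(series) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def level_features(new_counts, old_counts):
    """Features for one level from per-minute new and old counts."""
    return {
        "ratio_new": ratio(sum(new_counts), sum(old_counts)),
        "ratio_old": ratio(sum(old_counts), sum(new_counts)),
        "growth_new": growth_rate(new_counts),
        "growth_old": growth_rate(old_counts),
        "corr_new_old": pearson(new_counts, old_counts),
    }
```

Calling `level_features` once with host counts and once with packet counts yields the host-level and message-level feature sets.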
11. Example
Host level ‐ ratio of new hosts
Keep track of hosts contacted by the monitored host
Incoming direction in the 6th min.: B and D are old hosts; A,
G, and H are new hosts
Ratio of new hosts in the 6th min. => 3/5
[Figure: hosts A to H contacted by the monitored host in each minute of a 6-minute monitoring period (x-axis: monitor time in minutes), shown separately for the incoming and outgoing directions, with each host marked as old or new]
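
The arithmetic of this example can be checked with a few lines of code. The set of hosts seen in the 6th minute and the old/new split come from the slide; the contents of `recent` are hypothetical, chosen only so that B and D count as old and A, G, and H as new.

```python
def new_host_ratio(current_hosts, recent_hosts):
    """Fraction of the hosts seen in this minute that are new."""
    current = set(current_hosts)
    if not current:
        return 0.0
    new = current - set(recent_hosts)
    return len(new) / len(current)


# 6th minute, incoming direction (host letters from the slide).
incoming_min6 = {"A", "B", "D", "G", "H"}
# Hypothetical set of hosts seen recently enough to count as old.
recent = {"B", "C", "D", "E", "F"}

print(new_host_ratio(incoming_min6, recent))  # 0.6, i.e. 3/5
```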
12. Identifier Design
Adopt support vector machine (SVM)
Training phase
Derive features from each training sample
Label each training sample with the name of its P2P application
Train the SVM classifier
Identification phase
Derive features from a signaling packet stream
Use the trained classifier to determine the P2P application
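
The two phases above can be sketched with a toy classifier. The slides do not say which SVM implementation or kernel is used, so this is a stand-in under stated assumptions: a linear SVM (L2-regularized hinge loss) trained by plain sub-gradient descent, combined one-vs-rest for multiple applications. All function names, hyperparameters, and the feature vectors in the usage example are made up for illustration.

```python
def train_binary_svm(samples, labels, epochs=1000, lr=0.05, lam=0.001):
    """Linear SVM via sub-gradient descent on the regularized hinge loss.

    `labels` are +1 or -1. Returns the weights, with the bias stored as
    the last element. Hyperparameter values are illustrative.
    """
    w = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]  # constant feature acts as the bias term
            margin = y * sum(wi * xi for wi, xi in zip(w, xb))
            # The regularization term shrinks the weights on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # A margin violation pulls the weights toward y * x.
            if margin < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, xb)]
    return w


def score(w, x):
    """Signed score of feature vector `x` under weights `w`."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_identifier(samples, app_names):
    """Training phase: one binary classifier per application."""
    models = {}
    for app in sorted(set(app_names)):
        labels = [1 if name == app else -1 for name in app_names]
        models[app] = train_binary_svm(samples, labels)
    return models


def identify(models, features):
    """Identification phase: return the application with the top score."""
    return max(models, key=lambda app: score(models[app], features))


# Made-up feature vectors, two training samples per application.
train_x = [
    (0.8, 0.1, 0.1), (0.9, 0.2, 0.1),
    (0.1, 0.9, 0.2), (0.2, 0.8, 0.1),
    (0.1, 0.2, 0.9), (0.2, 0.1, 0.8),
]
train_y = ["BitTorrent", "BitTorrent", "eMule", "eMule", "Skype", "Skype"]
models = train_identifier(train_x, train_y)
```

In practice the feature vectors would be the host-level and message-level features derived from a monitored signaling packet stream, and a library SVM would normally replace the hand-written trainer.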
13. Traffic Data
Category             Hosts      Packets
BitTorrent           110,711    104,722,150
eMule                 42,377     36,716,588
Skype                 61,777     34,076,328
World of Warcraft        218      2,528,359
TELNET                   362     21,118,522
HTTP                   4,448     28,264,360
15. Conclusion
Summary
Identify distinct P2P applications without examining
payload
Characterize signaling behavior possessed by P2P
applications
Future work
Consider the case where a host runs multiple P2P
applications
Short flows?