Results 1 - 10
of
64
BLINC: Multilevel Traffic Classification in the Dark
- In Proceedings of ACM SIGCOMM
, 2005
"... Abstract — We present a fundamentally different approach to classifying traffic flows according to the applications that generate them. In contrast to previous methods, our approach is based on observing and identifying patterns of host behavior at the transport layer. We analyze these patterns at t ..."
Abstract
-
Cited by 98 (3 self)
- Add to MetaCart
Abstract — We present a fundamentally different approach to classifying traffic flows according to the applications that generate them. In contrast to previous methods, our approach is based on observing and identifying patterns of host behavior at the transport layer. We analyze these patterns at three levels of increasing detail (i) the social, (ii) the functional and (iii) the application level. This multilevel approach of looking at traffic flow is probably the most important contribution of this paper. Furthermore, our approach has two important features. First, it operates in the dark, having (a) no access to packet payload, (b) no knowledge of port numbers and (c) no additional information other than what current flow collectors provide. These restrictions respect privacy, technological and practical constraints. Second, it can be tuned to balance the accuracy of the classification versus the number of successfully classified traffic flows. We demonstrate the effectiveness of our approach on three real traces. Our results show that we are able to classify 80%-90 % of the traffic with more than 95 % accuracy. I.
Transport Layer Identification of P2P Traffic
, 2004
"... Since the emergence of peer-to-peer (P2P) networking in the late '90s, P2P applications have multiplied, evolved and established themselves as the leading `growth app' of Internet traffic workload. In contrast to first-generation P2P networks which used well-defined port numbers, current P2P applica ..."
Abstract
-
Cited by 68 (1 self)
- Add to MetaCart
Since the emergence of peer-to-peer (P2P) networking in the late '90s, P2P applications have multiplied, evolved and established themselves as the leading `growth app' of Internet traffic workload. In contrast to first-generation P2P networks which used well-defined port numbers, current P2P applications have the ability to disguise their existence through the use of arbitrary ports. As a result, reliable estimates of P2P traffic require examination of packet payload, a methodological landmine from legal, privacy, technical, logistic, and fiscal perspectives. Indeed, access to user payload is often rendered impossible by one of these factors, inhibiting trustworthy estimation of P2P traffic growth and dynamics. In this paper, we develop a systematic methodology to identify P2P flows at the transport layer, i.e., based on connection patterns of P2P networks, and without relying on packet payload. We believe our approach is the first method for characterizing P2P traffic using only knowledge of network dynamics rather than any user payload. To evaluate our methodology, we also develop a payload technique for P2P traffic identification, by reverse engineering and analyzing the nine most popular P2P protocols, and demonstrate its efficacy with the discovery of P2P protocols in our traces that were previously unknown to us. Finally, our results indicate that P2P traffic continues to grow unabatedly, contrary to reports in the popular media.
Pollution in P2P File Sharing Systems
- IN IEEE INFOCOM
, 2005
"... One way to combat P2P file sharing of copyrighted content is to deposit into the file sharing systems large volumes of polluted files. Without taking sides in the file sharing debate, in this paper we undertake a measurement study of the nature and magnitude of pollution in KaZaA, currently the most ..."
Abstract
-
Cited by 59 (3 self)
- Add to MetaCart
One way to combat P2P file sharing of copyrighted content is to deposit into the file sharing systems large volumes of polluted files. Without taking sides in the file sharing debate, in this paper we undertake a measurement study of the nature and magnitude of pollution in KaZaA, currently the most popular P2P file sharing system. We develop a crawling platform which crawls the majority of the KaZaA 20,000+ supernodes in less than 60 minutes. From the raw data gathered by the crawler for popular audio content, we obtain statistics on the number of unique versions and copies available in a 24-hour period. We develop an automated procedure to detect whether a given version is polluted or not, and we show that the probabilities of false positives and negatives of the detection procedure are very small. We use the data from the crawler and our pollution detection algorithm to determine the fraction of versions and fraction of copies that are polluted for several recent and old songs. We observe that pollution is pervasive for recent popular songs. We also identify and describe a number of anti-pollution mechanisms.
The KaZaA Overlay: A Measurement Study
- Computer Networks Journal (Elsevier
, 2005
"... Both in terms of number of participating users and in traffic volume, KaZaA is one of the most important applications in the Internet today. Nevertheless, because KaZaA is proprietary and uses encryption, little is understood about KaZaA’s overlay structure and dynamics, its messaging protocol, and ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
Both in terms of number of participating users and in traffic volume, KaZaA is one of the most important applications in the Internet today. Nevertheless, because KaZaA is proprietary and uses encryption, little is understood about KaZaA’s overlay structure and dynamics, its messaging protocol, and its index management. We have built two measurement apparatus- the KaZaA Sniffing Platform and the KaZaA Probing Tool- to unravel many of the mysteries behind KaZaA. We deploy the apparatus to study KaZaA’s overlay structure and dynamics, its neighbor selection, its use of dynamic port numbers to circumvent firewalls, and its index management. Although this study does not fully solve the KaZaA puzzle, it nevertheless leads to a coherent description of KaZaA and its overlay. Furthermore, we leverage the measurement results to set forth a number of key principles for the design of a successful unstructured P2P overlay. The measurement results and resulting design principles in this paper should be useful for future architects of P2P overlay networks as well as for engineers managing ISPs. 1 1
ACAS: Automated construction of application signatures
- In SIGCOMM’05 MineNet Workshop
, 2005
"... An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increas ..."
Abstract
-
Cited by 38 (1 self)
- Add to MetaCart
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately deriving the signatures manually is very time consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signature for unencrypted handshakes negotiating the encryption parameters of a particular connection.
Traffic Classification Using Clustering Algorithms
- Proceedings of the ACM SIGCOMM Workshop on Mining Network Data (MineNet
, 2006
"... Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiti ..."
Abstract
-
Cited by 37 (5 self)
- Add to MetaCart
Classification of network traffic using port-based or payload-based analysis is becoming increasingly difficult with many peer-to-peer (P2P) applications using dynamic port numbers, masquerading techniques, and encryption to avoid detection. An alternative approach is to classify traffic by exploiting the distinctive characteristics of applications when they communicate on a network. We pursue this latter approach and demonstrate how cluster analysis can be used to effectively identify groups of traffic that are similar using only transport layer statistics. Our work considers two unsupervised clustering algorithms, namely K-Means and DBSCAN, that have previously not been used for network traffic classification. We evaluate these two algorithms and compare them to the previously used AutoClass algorithm, using empirical Internet traces. The experimental results show that both K-Means and DBSCAN work very well and much more quickly then AutoClass. Our results indicate that although DBSCAN has lower accuracy compared to K-Means and AutoClass, DBSCAN produces better clusters.
Detecting Botnets with Tight Command and Control
- Proceedings of the 31st IEEE Conference on Local Computer Networks (LCN
, 2006
"... Systems are attempting to detect botnets by examining traffic content for IRC commands or by setting up honeynets. Our approach for detecting botnets is to examine flow characteristics such as bandwidth, duration, and packet timing looking for evidence of botnet command and control activity. We have ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
Systems are attempting to detect botnets by examining traffic content for IRC commands or by setting up honeynets. Our approach for detecting botnets is to examine flow characteristics such as bandwidth, duration, and packet timing looking for evidence of botnet command and control activity. We have constructed an architecture that first eliminates traffic that is unlikely to be a part of a botnet, classifies the remaining traffic into a group that is likely to be part of a botnet, then correlates the likely traffic to find common communications patterns that would suggest the activity of a botnet. Our results show that botnet evidence can be extracted from a traffic trace containing almost 9 million flows. 1
Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines
- SIGCOMM '06
, 2006
"... Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Appr ..."
Abstract
-
Cited by 24 (4 self)
- Add to MetaCart
Many networking applications require fast state lookups in a concurrent state machine, which tracks the state of a large number of flows simultaneously. We consider the question of how to compactly represent such concurrent state machines. To achieve compactness, we consider data structures for Approximate Concurrent State Machines (ACSMs) that can return false positives, false negatives, or a “don’t know” response. We describe three techniques based on Bloom filters and hashing, and evaluate them using both theoretical analysis and simulation. Our analysis leads us to an extremely efficient hashing-based scheme with several parameters that can be chosen to trade off space, computation, and the impact of errors. Our hashing approach also yields a simple alternative structure with the same functionality as a counting Bloom filter that uses much less space. We show how ACSMs can be used for video congestion control. Using an ACSM, a router can implement sophisticated Active Queue Management (AQM) techniques for video traffic (without the need for standards changes to mark packets or change video formats), with a factor of four reduction in memory compared to full-state schemes and with very little error. We also show that ACSMs show promise for real-time detection of P2P traffic.
Finding peer-to-peer file-sharing using coarse network behaviors
- In Proceedings of the 11th European Symposium on Research in Computer Security
, 2006
"... Abstract. A user who wants to use a service forbidden by their site’s usage policy can masquerade their packets in order to evade detection. One masquerade technique sends prohibited traffic on TCP ports commonly used by permitted services, such as port 80. Users who hide their traffic in this way p ..."
Abstract
-
Cited by 20 (8 self)
- Add to MetaCart
Abstract. A user who wants to use a service forbidden by their site’s usage policy can masquerade their packets in order to evade detection. One masquerade technique sends prohibited traffic on TCP ports commonly used by permitted services, such as port 80. Users who hide their traffic in this way pose a special challenge, since filtering by port number risks interfering with legitimate services using the same port. We propose a set of tests for identifying masqueraded peer-to-peer file-sharing based on traffic summaries (flows). Our approach is based on the hypothesis that these applications have observable behavior that can be differentiated without relying on deep packet examination. We develop tests for these behaviors that, when combined, provide an accurate method for identifying these masqueraded services without relying on payload or port number. We test this approach by demonstrating that our integrated detection mechanism can identify BitTorrent with a 72 % true positive rate and virtually no observed false positives in control services (FTP-Data, HTTP, SMTP). 1
Internet Traffic Classification Demystified: The Myths, Caveats and Best Practices
- In Proc. ACM CoNEXT
, 2008
"... Recent research on Internet traffic classification algorithms has yielded a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Recent research on Internet traffic classification algorithms has yielded a flurry of proposed approaches for distinguishing types of traffic, but no systematic comparison of the various algorithms. This fragmented approach to traffic classification research leaves the operational community with no basis for consensus on what approach to use when, and how to interpret results. In this work we critically revisit traffic classification by conducting a thorough evaluation of three classification approaches, based on transport layer ports, host behavior, and flow features. A strength of our work is the broad range of data against which we test the three classification approaches: seven traces with payload collected in Japan, Korea, and the US. The diverse geographic locations, link characteristics and application traffic mix in these data allowed us to evaluate the approaches under a wide variety of conditions. We analyze the advantages and limitations of each approach, evaluate methods to overcome the limitations, and extract insights and recommendations for both the study and practical application of traffic classification. We make our software, classifiers, and data available for researchers interested in validating or extending this work. 1.

