GT: picking up the truth from the ground for Internet traffic
Cited by 6 (1 self)
Much of Internet traffic modeling, firewall, and intrusion detection research requires traces where some ground truth regarding application and protocol is associated with each packet or flow. This paper presents the design, development and experimental evaluation of gt, an open source software toolset for associating ground truth information with Internet traffic traces. By probing the monitored host’s kernel to obtain information on active Internet sessions, gt gathers ground truth at the application level. Preliminary experimental results show that gt’s effectiveness comes at little cost in terms of overhead on the hosting machines. Furthermore, when coupled with other packet inspection mechanisms, gt can derive ground truth not only in terms of applications (e.g., e-mail), but also in terms of protocols (e.g., SMTP vs. POP3).
Nfsight: NetFlow-based Network Awareness Tool
Cited by 2 (2 self)
Network awareness is critical for network and security administrators. It enables informed planning and management of network resources, as well as detection and a comprehensive understanding of malicious activity. It requires a set of tools to efficiently collect, process, and represent network data. While many such tools already exist, there is no flexible and practical solution for visualizing network activity at various granularities and quickly gaining insights about the status of network assets. To address this issue, we developed Nfsight, a NetFlow processing and visualization application designed to offer a comprehensive network awareness solution. Nfsight constructs bidirectional flows out of the unidirectional NetFlow flows and leverages these bidirectional flows to provide client/server identification and intrusion detection capabilities. We present in this paper the internal architecture of Nfsight, the evaluation of the service, and intrusion detection algorithms. We illustrate the contributions of Nfsight through several case studies conducted by security administrators on a large university network.
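The step of building bidirectional flows from unidirectional records can be illustrated with a minimal sketch. This is not Nfsight's actual algorithm, which the abstract does not detail; it only pairs records whose 5-tuples mirror each other. The `Flow` record layout and the `pair_flows` helper are hypothetical names introduced for this example, and the sketch assumes each 5-tuple appears at most once per direction.

```python
from collections import namedtuple

# Hypothetical record layout; real NetFlow records carry many more fields.
Flow = namedtuple("Flow", "src dst sport dport proto packets bytes")

def pair_flows(flows):
    """Pair unidirectional flows whose 5-tuples mirror each other.

    Assumes each 5-tuple occurs at most once per direction.
    Returns a list of (first_seen, reply) tuples; reply is None
    when no reverse-direction flow was observed.
    """
    pending = {}  # 5-tuple -> flow still waiting for its reverse
    merged = []
    for f in flows:
        key = (f.src, f.dst, f.sport, f.dport, f.proto)
        reverse = (f.dst, f.src, f.dport, f.sport, f.proto)
        if reverse in pending:
            merged.append((pending.pop(reverse), f))
        else:
            pending[key] = f
    # Flows without a reverse match stay one-sided.
    merged.extend((f, None) for f in pending.values())
    return merged

# Illustrative records using documentation address ranges.
flows = [
    Flow("10.0.0.5", "192.0.2.10", 51000, 80, 6, 12, 900),
    Flow("192.0.2.10", "10.0.0.5", 80, 51000, 6, 10, 14000),
    Flow("10.0.0.7", "192.0.2.99", 40000, 22, 6, 3, 180),
]
pairs = pair_flows(flows)
```

One-sided flows (a request with no reply) are kept rather than dropped, since unanswered traffic is often the interesting case for intrusion detection.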
Flexible Traffic and Host Profiling via DNS Rendezvous
Cited by 2 (0 self)
The ability to accurately classify network traffic and to perform timely detection of the presence of unwanted classes of traffic has important implications for network operations and security. In recent years, classification has become more challenging due to applications that use ports that are not well-known, that overload or masquerade with other applications’ well-known ports, and that may encrypt or otherwise obfuscate their payload. The goal of our work is to develop a method for traffic classification that is flexible, i.e., that can be used to create arbitrary organizations of traffic from coarse to fine-grained groups, and can identify encrypted traffic as well as new applications. In this paper, we present a novel method for classification based on analyzing rendezvous traffic (i.e., the traffic preamble in which a given host determines the remote IP address of a peer host or service) that usually precedes application traffic. Our approach exploits the most widely used rendezvous service, the Domain Name System (DNS). Specifically, through careful tracking of client IP addresses, alphanumeric domain names, and answer IP addresses in rendezvous traffic, we apply classification labels to end-hosts and their traffic reported by flow-export data. Additionally, we present the notion of host profiling as a method for expanding traffic classification in cases where there is not a direct match between rendezvous traffic and application traffic. To assess the feasibility of our method, we perform a focused case study on one day in the lives of two drastically different user end-host populations: office and residential. Our results demonstrate the efficacy and capability of a DNS rendezvous-based method of classification that performs well even in situations where application payload is encrypted (or unavailable) or when application traffic is monitored by packet sampling.
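The core idea of tracking (client IP, domain name, answer IP) triples and using them to label later flows can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the function names, the suffix-to-label table, and the sample addresses are all assumptions made for the example, and timing, TTLs, and host profiling are ignored.

```python
def build_rendezvous_map(dns_answers):
    """Map (client_ip, answer_ip) -> domain name from observed DNS answers."""
    table = {}
    for client_ip, domain, answer_ip in dns_answers:
        table[(client_ip, answer_ip)] = domain
    return table

def label_flow(flow, table, labels):
    """Return a class label for a (src_ip, dst_ip) flow, or 'unknown'.

    `labels` maps a domain suffix to a label (hypothetical taxonomy).
    """
    domain = table.get(flow)
    if domain is None:
        return "unknown"  # no preceding rendezvous traffic was seen
    for suffix, label in labels.items():
        if domain == suffix or domain.endswith("." + suffix):
            return label
    return "unknown"

# Illustrative data only (reserved documentation domains and addresses).
dns_answers = [
    ("10.0.0.5", "imap.mail.example.com", "192.0.2.25"),
    ("10.0.0.5", "www.example.org", "198.51.100.7"),
]
labels = {"mail.example.com": "email", "example.org": "web"}
table = build_rendezvous_map(dns_answers)
```

Because the label comes from the name the client resolved, not from the payload, a lookup like this still works when the subsequent flow is encrypted.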
KISS: Stochastic Packet Inspection Classifier for UDP Traffic
Cited by 1 (0 self)
This paper proposes KISS, a novel Internet classification engine. Motivated by the expected rise of UDP traffic, which stems from the momentum of P2P streaming applications, we propose a novel classification framework which leverages statistical characterization of payload. Statistical signatures are derived by means of a Chi-Square-like test, which extracts the protocol “format” but ignores the protocol “semantic” and “synchronization” rules. The signatures feed a decision process based either on the geometric distance among samples, or on Support Vector Machines. KISS is very accurate, and its signatures are intrinsically robust to packet sampling, reordering, and flow asymmetry, so that it can be used on almost any network. KISS is tested in different scenarios, considering traditional client-server protocols, VoIP and both traditional and new P2P Internet applications. Results are astonishing. The average True Positive percentage is 99.6%, with the worst case equal to 98.1%, while results are almost perfect when dealing with new P2P streaming applications. Index Terms: traffic classification, supervised learning algorithms.
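To give a feel for how a Chi-Square-like test can capture protocol "format", the sketch below computes, for each payload offset, how far the observed byte values deviate from a uniform distribution. It is a simplified byte-level variant written for illustration; the grouping of payload bits, window sizes, and decision process used by KISS itself are not reproduced here, and the function name and parameters are assumptions.

```python
def chi_square_signature(payloads, n_bytes=4):
    """Per-offset chi-square statistic of byte values vs. a uniform model.

    payloads: list of bytes objects, each at least n_bytes long.
    A large value means the offset looks like a fixed field; a value
    near zero means it looks uniformly spread, like a counter.
    """
    expected = len(payloads) / 256.0
    signature = []
    for offset in range(n_bytes):
        counts = [0] * 256
        for p in payloads:
            counts[p[offset]] += 1
        chi2 = sum((c - expected) ** 2 / expected for c in counts)
        signature.append(chi2)
    return signature

# Synthetic payloads: byte 0 is a constant, byte 1 cycles through all values.
payloads = [bytes([0x10, i]) for i in range(256)]
signature = chi_square_signature(payloads, n_bytes=2)
```

A vector of such per-offset values could then be compared against reference vectors with a distance measure or fed to a classifier, in the spirit of the decision process the abstract describes.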
On the Characteristics and Reasons of Long-lived Internet Flows
Prior studies of Internet traffic have considered traffic at different resolutions and time scales: packets and flows for hours or days, aggregate packet statistics for days or weeks, and hourly trends for months. However, little is known about the long-term behavior of individual flows. In this paper, we study individual flows (as defined by the 5-tuple of protocol, source and destination IP address and port) over days and weeks. While the vast majority of flows are short, and most bytes are in short flows, we find that about 20% of the overall bytes are carried in flows that last longer than 10 minutes, and flows lasting 100 minutes or longer make up 2% of traffic. We show that long-lived flows are qualitatively different from short flows: they are generally slower, less bursty, and are due to different applications and protocols. We investigate the causes of short- and long-lived flows, and show that the traffic mix varies significantly depending on duration time scale, with computer-to-computer traffic increasingly dominating at larger time scales.
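Statistics of the kind quoted above (the share of bytes carried by flows exceeding a duration threshold) reduce to simple arithmetic over per-flow records. The sketch below uses made-up numbers, not the paper's data; the function name and the `(duration_seconds, bytes)` record shape are assumptions for illustration.

```python
def byte_share_longer_than(flows, min_duration_s):
    """Fraction of all bytes carried by flows lasting longer than a threshold.

    flows: iterable of (duration_seconds, byte_count) pairs.
    """
    flows = list(flows)
    total = sum(nbytes for _, nbytes in flows)
    if total == 0:
        return 0.0
    long_bytes = sum(nbytes for dur, nbytes in flows if dur > min_duration_s)
    return long_bytes / total

# Synthetic flow records: (duration in seconds, bytes transferred).
sample = [(5, 500), (30, 250), (900, 200), (7200, 50)]
share_10min = byte_share_longer_than(sample, 10 * 60)
share_100min = byte_share_longer_than(sample, 100 * 60)
```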
Probabilistic Graphical Models for Semi-Supervised Traffic Classification
Traffic classification using machine learning continues to be an active research area. The majority of work in this area uses off-the-shelf machine learning tools and treats them as black-box classifiers. This approach turns all the modelling complexity into a feature selection problem. In this paper, we build a problem-specific solution to the traffic classification problem by designing a custom probabilistic graphical model. Graphical models are a modular framework to design classifiers which incorporate domain-specific knowledge. More specifically, our solution introduces semi-supervised learning which means we learn from both labelled and unlabelled traffic flows. We show that our solution performs competitively compared to previous approaches while using less data and simpler features.
An evaluation of automatic parameter tuning of a statistics-based anomaly detection algorithm
SUMMARY
A novel host behavior classification approach is proposed as a preliminary step toward traffic classification and anomaly detection in network communication. Although many attempts described in the literature were devoted to flow or application classifications, these approaches are not always adaptable to the operational constraints of traffic monitoring (expected to work even without packet payload, without bidirectionality, on high-speed networks or from flow reports only, etc.). Instead, the classification proposed here relies on the idea that traffic is best analyzed in terms of typical host behaviors: typical connection patterns of both legitimate applications (data sharing, downloading, etc.) and anomalous (possibly aggressive) behaviors are obtained by profiling traffic at the host level using unsupervised statistical classification. Classification at the host level is not reducible to flow or application classification, and neither is the contrary: they are different operations which might have complementary roles in network management. The proposed host classification is based on a nine-dimensional feature space evaluating host Internet connectivity, dispersion and exchanged traffic content. A minimum spanning tree (MST) clustering technique is developed that does not require any supervised learning step to produce a set of statistically established typical host behaviors. Not relying on a priori defined classes of known behaviors enables the procedure to discover new host behaviors, that potentially were never observed

