Outside the Closed World: On Using Machine Learning For Network Intrusion Detection
- In Proceedings of the IEEE Symposium on Security and Privacy, 2010
Cited by 11 (0 self)
In network intrusion detection research, one popular strategy for finding attacks is monitoring a network’s activity for anomalies: deviations from profiles of normality previously learned from benign traffic, typically identified using tools borrowed from the machine learning community. However, despite extensive academic research one finds a striking gap in terms of actual deployments of such systems: compared with other intrusion detection approaches, machine learning is rarely employed in operational “real world” settings. We examine the differences between the network intrusion detection problem and other areas where machine learning regularly finds much more success. Our main claim is that the task of finding attacks is fundamentally different from these other applications, making it significantly harder for the intrusion detection community to employ machine learning effectively. We support this claim by identifying challenges particular to network intrusion detection, and provide a set of guidelines meant to strengthen future research on anomaly detection. Keywords: anomaly detection; machine learning; intrusion detection; network security.
Exploiting Dynamicity in Graph-based Traffic Analysis: Techniques and Applications
Cited by 5 (1 self)
Network traffic can be represented by a Traffic Dispersion Graph (TDG) that contains an edge between two nodes that send a particular type of traffic (e.g., DNS) to one another. TDGs have recently been proposed as an alternative way to interpret and visualize network traffic. Previous studies have focused on static properties of TDGs using graph snapshots in isolation. In this work, we represent network traffic with a series of related graph instances that change over time. This representation facilitates the analysis of the dynamic nature of network traffic, providing additional descriptive power. For example, DNS and P2P graph instances can appear similar when compared in isolation, but the way the DNS and P2P TDGs change over time differs significantly. To quantify the changes over time, we introduce a series of novel metrics that capture changes both in the graph structure (e.g., the average degree) and the participants (i.e., IP addresses) of a TDG. We apply our new methodologies to improve graph-based traffic classification and to detect changes in the profile of legacy applications (e.g., e-mail).
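The two kinds of change named in this abstract, structural (e.g., average degree) and participant-level (IP addresses), can be illustrated with a minimal sketch. This is not the paper's own metric set: the function names, the use of Jaccard similarity for participant overlap, and the toy snapshots are assumptions made only for illustration. Each snapshot is assumed to be a set of undirected edges stored once per pair.

```python
def nodes_of(edges):
    """Return the set of endpoints (IP addresses) appearing in an edge set."""
    return {node for edge in edges for node in edge}


def average_degree(edges):
    """Average degree of an undirected graph given as a set of (u, v) edges.

    Assumes each unordered pair is stored only once.
    """
    nodes = nodes_of(edges)
    if not nodes:
        return 0.0
    return 2 * len(edges) / len(nodes)


def participant_overlap(edges_a, edges_b):
    """Jaccard similarity of the node sets of two snapshots (assumed metric)."""
    nodes_a, nodes_b = nodes_of(edges_a), nodes_of(edges_b)
    union = nodes_a | nodes_b
    if not union:
        return 1.0
    return len(nodes_a & nodes_b) / len(union)


# Hypothetical DNS-like snapshots: clients talking to one resolver.
snapshots = [
    {("10.0.0.1", "10.0.0.53"), ("10.0.0.2", "10.0.0.53")},
    {("10.0.0.1", "10.0.0.53"), ("10.0.0.3", "10.0.0.53"), ("10.0.0.4", "10.0.0.53")},
]

for earlier, later in zip(snapshots, snapshots[1:]):
    degree_change = average_degree(later) - average_degree(earlier)
    overlap = participant_overlap(earlier, later)
    print(f"average degree change: {degree_change:.3f}, participant overlap: {overlap:.2f}")
```

Tracking such values across a series of snapshots, rather than inspecting one snapshot in isolation, is the idea the abstract describes.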
METRICFORENSICS: A Multi-Level Approach for Mining Volatile Graphs ∗
Cited by 3 (1 self)
Advances in data collection and storage capacity have made it increasingly possible to collect highly volatile graph data for analysis. Existing graph analysis techniques are not appropriate for such data, especially in cases where streaming or near-real-time results are required. An example that has drawn significant research interest is the cyber-security domain, where internet communication traces are collected and real-time discovery of events, behaviors, patterns, and anomalies is desired. We propose MetricForensics, a scalable framework for analysis of volatile graphs. MetricForensics combines a multi-level “drill down” approach, a collection of user-selected graph metrics, and a collection of analysis techniques. At each successive level, more sophisticated metrics are computed and the graph is viewed at finer temporal resolutions. In this way, MetricForensics scales to highly volatile graphs by only allocating resources for computationally expensive analysis when an interesting event is discovered at a coarser resolution first. We test MetricForensics on three real-world graphs: an enterprise IP trace, a trace of legitimate and malicious network traffic from a research institution, and the MIT Reality Mining proximity sensor data. Our largest graph has ∼3M vertices and ∼32M edges, spanning 4.5 days. The results demonstrate the scalability and capability of MetricForensics in analyzing volatile graphs, and highlight four novel phenomena in such graphs: elbows, broken correlations, prolonged spikes, and lightweight stars. This work was performed under the auspices of the U.S.
On Community Outliers and their Efficient Detection in Information Networks
- KDD'10, 2010
Cited by 3 (2 self)
Linked or networked data are ubiquitous in many applications. Examples include web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation information, blog data, movie reviews and so on. In these datasets (called “information networks”), closely related objects that share the same properties or interests form a community. For example, a community in the blogosphere could be users mostly interested in cell phone reviews and news. Outlier detection in information networks can reveal important anomalous and interesting behaviors that are not obvious if community information is ignored. An example could be a low-income person being friends with many rich people even though his income is not anomalously low when considered over the entire population. This paper first introduces the concept of community outliers (interesting points or rising stars in a more positive sense), and then shows that well-known baseline approaches that do not consider links or community information cannot find these community outliers. We propose an efficient solution by modeling networked data as a mixture model composed of multiple normal communities and a set of randomly generated outliers. The probabilistic model characterizes both data and links simultaneously by defining their joint distribution based on hidden Markov random fields (HMRF). Maximizing the data likelihood and the posterior of the model gives the solution to the outlier inference problem. We apply the model on both
Enforcing Secure Object Initialization in Java
Cited by 2 (1 self)
Sun and CERT recommend that, for secure Java development, partially initialized objects not be accessible. CERT rates the severity of the risk of not following this recommendation as high. The solution currently used to enforce object initialization is to implement a coding pattern proposed by Sun, which is not formally checked. We propose a modular type system to formally specify the initialization policy of libraries or programs and a type checker to statically check at load time that all loaded classes respect the policy. This makes it possible to prove the absence of bugs that have allowed some famous privilege escalations in Java. Our experimental results show that our safe default policy proves 91% of the classes of java.lang, java.security and javax.security safe without any annotation, and by adding 57 simple annotations we proved all classes but four safe. The type system and its soundness theorem have been formalized and machine checked using Coq.
Histogram-Based Traffic Anomaly Detection
Cited by 1 (1 self)
Identifying network anomalies is essential in enterprise and provider networks for diagnosing events, like attacks or failures, that severely impact performance, security, and Service Level Agreements (SLAs). Feature-based anomaly detection models (ab)normal network traffic behavior by analyzing different packet header features, like IP addresses and port numbers. In this work, we describe a new approach to feature-based anomaly detection that constructs histograms of different traffic features, models histogram patterns, and identifies deviations from the created models. We assess the strengths and weaknesses of many design options, like the utility of different features, the construction of feature histograms, the modeling and clustering algorithms, and the detection of deviations. Compared to previous feature-based anomaly detection approaches, our work differs by constructing detailed histogram models, rather than using coarse entropy-based distribution approximations. We evaluate histogram-based anomaly detection and compare it to previous approaches using collected network traffic traces. Our results demonstrate the effectiveness of our technique in identifying a wide range of anomalies. The assessed technical details are generic and, therefore, we expect that the derived insights will be useful for similar future research efforts.
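The general pipeline this abstract outlines (build a histogram of a traffic feature, model it, flag deviations) can be sketched as follows. The abstract does not specify the modeling or distance function, so the choices here are assumptions for illustration only: destination port as the feature, a single baseline histogram as the "model", L1 distance as the deviation measure, and a hypothetical threshold of 0.5.

```python
from collections import Counter


def normalized_histogram(values, bins):
    """Relative frequency of each bin value among the observed feature values."""
    counts = Counter(values)
    total = len(values)
    return [counts.get(b, 0) / total if total else 0.0 for b in bins]


def l1_distance(hist_a, hist_b):
    """Sum of absolute per-bin differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical destination ports observed per time interval.
bins = [22, 53, 80, 443]
baseline = normalized_histogram([80, 80, 443, 443, 53, 22, 80, 443], bins)
normal = normalized_histogram([80, 443, 80, 443, 53, 80, 22, 443], bins)
scan = normalized_histogram([22] * 7 + [80], bins)

THRESHOLD = 0.5  # assumed value, would need tuning on real traces

for name, hist in [("normal", normal), ("scan", scan)]:
    distance = l1_distance(baseline, hist)
    print(name, "anomalous" if distance > THRESHOLD else "ok", round(distance, 3))
```

Unlike a single entropy value, the full histogram keeps per-bin detail, which is the contrast with coarse entropy-based approximations that the abstract draws.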
Dynamic Intrusion Detection Method for Mobile Ad Hoc Network Using CPDOD Algorithm
Cited by 1 (1 self)
Mobile ad hoc networks (MANETs) are susceptible to several types of attacks due to their open medium, lack of a centralized monitoring and management point, dynamic topology and other features. Many of the intrusion detection techniques developed for wired networks cannot be directly applied to MANETs due to the special characteristics of these networks. However, all such intrusion detection techniques suffer from performance penalties and high false alarm rates. In this paper, we propose a novel intrusion detection method that combines two anomaly detection methods, Conformal Predictor k-nearest neighbor and Distance-based Outlier Detection (the CPDOD algorithm). A series of experimental results demonstrate that the proposed method can effectively detect anomalies with a low false positive rate and a high detection rate, achieving higher detection accuracy.
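To make the two ingredients named in this abstract concrete, here is a minimal sketch of a k-nearest-neighbor distance score turned into a conformal p-value. It is not the CPDOD algorithm itself, whose details the abstract does not give: the scoring function, the leave-one-out calibration, the toy feature vectors, and all names are assumptions for illustration.

```python
import math


def knn_score(point, others, k):
    """Nonconformity score: mean Euclidean distance to the k nearest points."""
    distances = sorted(math.dist(point, other) for other in others)
    return sum(distances[:k]) / k


def conformal_p_value(point, training, k):
    """Share of scores (training plus the test point) at least as large as the test score.

    Training scores are computed leave-one-out; a small p-value suggests an outlier.
    """
    training_scores = [
        knn_score(sample, training[:i] + training[i + 1:], k)
        for i, sample in enumerate(training)
    ]
    test_score = knn_score(point, training, k)
    at_least_as_strange = sum(score >= test_score for score in training_scores)
    return (at_least_as_strange + 1) / (len(training) + 1)


# Hypothetical 2-D feature vectors describing benign node behavior.
training = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]

print(conformal_p_value((0.5, 0.4), training, k=2))  # close to the benign cluster
print(conformal_p_value((5.0, 5.0), training, k=2))  # far from the benign cluster
```

A detector built this way would flag a point when its p-value falls below a chosen significance level, which bounds the false alarm rate under the usual exchangeability assumption of conformal prediction.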
MultiAspectForensics: Pattern Mining on Large-scale Heterogeneous Networks with Tensor Analysis
Cited by 1 (1 self)
Modern applications such as web knowledge bases, network traffic monitoring and online social networks have made available an unprecedented amount of network data with rich types of interactions carrying multiple attributes, for instance, port number and time tick in the case of network traffic. The design of algorithms that leverage this structured relationship with the power of computing to help researchers and practitioners better understand, explore and navigate this space of information has become a challenging, albeit rewarding, topic in social network analysis and data mining. The constantly growing scale and enriching genres of network data always demand higher levels of efficiency, robustness and generalizability, where existing approaches with successes on small, homogeneous network data are likely to fall short. We introduce MultiAspectForensics, a handy tool to automatically detect and visualize novel subgraph patterns within a local community of nodes in a heterogeneous network, such as a set of vertices that form a dense bipartite graph whose edges share exactly the same set of attributes. We apply the proposed method to three data sets from distinct application domains, present empirical results and discuss insights derived from the discovered patterns. Our algorithm, built on scalable tensor analysis procedures, captures spectral properties of network data and reveals informative signals for subsequent domain-specific study and investigation, such as suspicious port-scanning activities in the scenario of cybersecurity monitoring.
Hierarchical Probabilistic Models for Group Anomaly Detection
Cited by 1 (1 self)
Statistical anomaly detection typically focuses on finding individual point anomalies. Often the most interesting or unusual things in a data set are not odd individual points, but rather larger scale phenomena that only become apparent when groups of points are considered. In this paper, we propose generative models for detecting such group anomalies. We evaluate our methods on synthetic data as well as astronomical data from the Sloan Digital Sky Survey. The empirical results show that the proposed models are effective in detecting group anomalies.
Unsupervised Transfer Classification: Application to Text Categorization
Cited by 1 (0 self)
We study the problem of building the classification model for a target class in the absence of any labeled training example for that class. To address this difficult learning problem, we extend the idea of transfer learning by assuming that the following side information is available: (i) a collection of labeled examples belonging to other classes in the problem domain, called the auxiliary classes; (ii) the class information including the prior of the target class and the correlation between the target class and the auxiliary classes. Our goal is to construct the classification model for the target class by leveraging the above data and information. We refer to this learning problem as unsupervised transfer classification. Our framework is based on the generalized maximum entropy model that is effective in transferring the label information of the auxiliary classes to the target class. A theoretical analysis shows that under certain assumptions, the classification model obtained by the proposed approach converges to the optimal model when it is learned from the labeled examples for the target class. Empirical study on text categorization over four different data sets verifies the effectiveness of the proposed approach.

