Results 1 - 10
of
95
Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule
, 2003
"... Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic ..."
Abstract
-
Cited by 84 (4 self)
- Add to MetaCart
Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the e#ciency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
Anomaly Detection: A Survey
, 2007
"... Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and c ..."
Abstract
-
Cited by 69 (1 self)
- Add to MetaCart
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the di®erent directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.
On the Capability of an SOM based Intrusion Detection system
- IEEE-INNS International Joint Conference on Neural Networks. Pp 1808-1813, 2003. and Ongun
, 1996
"... Abstract—An approach to network intrusion detection is investigated, based purely on a hierarchy of Self-Organizing Feature Maps. Our principle interest is to establish just how far such an approach can be taken in practice. To do so, the KDD benchmark dataset from the International Knowledge Discov ..."
Abstract
-
Cited by 23 (2 self)
- Add to MetaCart
Abstract—An approach to network intrusion detection is investigated, based purely on a hierarchy of Self-Organizing Feature Maps. Our principle interest is to establish just how far such an approach can be taken in practice. To do so, the KDD benchmark dataset from the International Knowledge Discovery and Data Mining Tools Competition is employed. This supplies a connection-based description of a factitious computer network in which each connection is described in terms of 41 features. Unlike previous approaches, only 6 of the most basic features are employed. The resulting system is capable of detection (false positive) rates of 89 % (4.6%), where this is at least as good as the alternative data-mining approaches that require all 41 features.
Learning intrusion detection: supervised or unsupervised?
- IN: IMAGE ANALYSIS AND PROCESSING, PROC. OF 13TH ICIAP CONFERENCE. (2005) 50–57
, 2005
"... Application and development of specialized machine learning techniques is gaining increasing attention in the intrusion detection community. A variety of learning techniques proposed for different intrusion detection problems can be roughly classified into two broad categories: supervised (classifi ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Application and development of specialized machine learning techniques is gaining increasing attention in the intrusion detection community. A variety of learning techniques proposed for different intrusion detection problems can be roughly classified into two broad categories: supervised (classification) and unsupervised (anomaly detection and clustering). In this contribution we develop an experimental framework for comparative analysis of both kinds of learning techniques. In our framework we cast unsupervised techniques into a special case of classification, for which training and model selection can be performed by means of ROC analysis. We then investigate both kinds of learning techniques with respect to their detection accuracy and ability to detect unknown attacks.
Language Models for Detection of Unknown Attacks in Network Traffic
, 2006
"... In this paper we propose a method for network intrusion detection based on language models. Our method proceeds by extracting language features such as n-grams and words from connection payloads and applying unsupervised anomaly detection – without prior learning phase or presence of labeled data. T ..."
Abstract
-
Cited by 18 (7 self)
- Add to MetaCart
In this paper we propose a method for network intrusion detection based on language models. Our method proceeds by extracting language features such as n-grams and words from connection payloads and applying unsupervised anomaly detection – without prior learning phase or presence of labeled data. The essential part of this procedure is linear-time computation of similarity measures between language models of connection payloads. Particular patterns in these models decisive for differentiation of attacks and normal data can be traced back to attack semantics and utilized for automatic generation of attack signatures. Results of experiments conducted on two datasets of network traffic demonstrate the importance of higher-order n-grams and variable-length language models for detection of unknown network attacks. An implementation of our system achieved detection accuracy of over 80 % with no false positives on instances of recent remote-to-local attacks in HTTP, FTP and SMTP traffic.
Feature Bagging for Outlier Detection
- In KDD ’05
, 2005
"... Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed. It combines results from multiple outlier detection algori ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different set of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find the better quality outliers. Experiments performed on several synthetic and real life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems
- In Proceedings of the IEEE International Conference on Data Mining (ICDM’06
, 2006
"... Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequ ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
Unsupervised or unlabeled learning approaches for network anomaly detection have been recently proposed. In particular, recent work on unlabeled anomaly detection focused on high speed classification based on simple payload statistics. For example, PAYL, an anomaly IDS, measures the occurrence frequency in the payload of n-grams. A simple model of normal traffic is then constructed according to this description of the packets ’ content. It has been demonstrated that anomaly detectors based on payload statistics can be “evaded ” by mimicry attacks using byte substitution and padding techniques. In this paper we propose a new approach to construct high speed payload-based anomaly IDS intended to be accurate and hard to evade. We propose a new technique to extract the features from the payload. We use a feature clustering algorithm originally proposed for text classification problems to reduce the dimensionality of the feature space. Accuracy and hardness of evasion are obtained by constructing our anomaly-based IDS using an ensemble of one-class SVM classifiers that work on different feature spaces. 1
A Behavior-based Approach To Securing Email Systems
- Mathematical Methods, Models and Architectures for Computer Networks Security
, 2003
"... The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an offline email archive data mining analysis system that is designed to assist computing models of malicious ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
The Malicious Email Tracking (MET) system, reported in a prior publication, is a behavior-based security system for email services. The Email Mining Toolkit (EMT) presented in this paper is an offline email archive data mining analysis system that is designed to assist computing models of malicious email behavior for deployment in an online MET system. EMT includes a variety of behavior models for email attachments, user accounts and groups of accounts. Each model computed is used to detect anomalous and errant email behaviors. We report on the set of features implemented in the current version of EMT, and describe tests of the system and our plans for extensions to the set of models.
Unsupervised Anomaly Detection in Network Intrusion Detection Using Clusters
- In Proc. 28th Australasian CS Conf., volume 38 of CRPITV
, 2005
"... Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labelled training data. This training data is typically expensive to produce. Moreover, these methods have difficulty in detecting new types of attack. Using unsupervised anomal ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
Most current network intrusion detection systems employ signature-based methods or data mining-based methods which rely on labelled training data. This training data is typically expensive to produce. Moreover, these methods have difficulty in detecting new types of attack. Using unsupervised anomaly detection techniques, however, the system can be trained with unlabelled data and is capable of detecting previously "unseen" attacks. In this paper, we present a new density-based and grid-based clustering algorithm that is suitable for unsupervised anomaly detection. We evaluated our methods using the 1999 KDD Cup data set. Our evaluation shows that the accuracy of our approach is close to that of existing techniques reported in the literature, and has several advantages in terms of computational complexity.
Robust Support Vector Machines for Anomaly Detection
- In Proc. 2003 International Conference on Machine Learning and Applications (ICMLA’03
, 2003
"... MIT’s Lincoln Labs to study intrusion detection systems, the performance of robust support vector machines (RVSMs) was compared with that of conventional support vector machines and nearest neighbor classifiers in separating normal usage profiles from intrusive profiles of computer programs. The res ..."
Abstract
-
Cited by 14 (0 self)
- Add to MetaCart
MIT’s Lincoln Labs to study intrusion detection systems, the performance of robust support vector machines (RVSMs) was compared with that of conventional support vector machines and nearest neighbor classifiers in separating normal usage profiles from intrusive profiles of computer programs. The results indicate the superiority of RSVMs not only in terms of high intrusion detection accuracy and low false positives but also in terms of their generalization ability in the presence of noise and running time. Keywords—Intrusion detection, computer security, robust support vector machines, noisy data. I.

