Results 11 - 20 of 414
Intrusion detection with unlabeled data using clustering
- In Proceedings of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), 2001
"... Abstract Intrusions pose a serious security risk in a network environment. Although systems can be hardened against many types of intrusions, often intrusions are successful making systems for detecting these intrusions critical to the security of these system. New intrusion types, of which detectio ..."
Abstract
-
Cited by 191 (6 self)
Abstract: Intrusions pose a serious security risk in a network environment. Although systems can be hardened against many types of intrusions, intrusions often succeed, making systems for detecting them critical to security. New intrusion types, of which detection systems are unaware, are the most difficult to detect. Current signature-based methods and learning algorithms that rely on labeled training data generally cannot detect these new intrusions. In addition, labeled training data for misuse and anomaly detection systems is typically very expensive to obtain. We present a new type of clustering-based intrusion detection algorithm, unsupervised anomaly detection, which trains on unlabeled data in order to detect new intrusions. In our system, no manually or otherwise classified data is necessary for training. Our method is able to detect many different types of intrusions while maintaining a low false positive rate, as verified on the KDD CUP 1999 dataset.

1 Introduction: A network intrusion attack can be any use of a network that compromises its stability or the security of information stored on computers connected to it. A very wide range of activity falls under this definition, including attempts to destabilize the network as a whole, gain unauthorized access to files or privileges, or simply mishandle and misuse software. Added security measures cannot stop all such attacks. The goal of intrusion detection is to build a system that automatically scans network activity and detects such attacks. Once an attack is detected, the system administrator is informed and can take corrective action.
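The core idea lends itself to a compact illustration. A minimal sketch, assuming scikit-learn and a numeric feature matrix; this is not the paper's exact clustering algorithm, and the cluster count and size threshold are arbitrary choices:

```python
# Sketch: cluster unlabeled connection records and flag points that
# fall in sparsely populated clusters as suspected intrusions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def detect_anomalies(X, n_clusters=20, small_fraction=0.01):
    X = StandardScaler().fit_transform(X)            # normalize features
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    sizes = np.bincount(labels, minlength=n_clusters)
    small = sizes < small_fraction * len(X)          # "small" clusters
    return small[labels]                             # True = suspected anomaly
```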
Activity Monitoring: Noticing interesting changes in behavior
- In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1999
"... We introduce a problem class which we term activity monitoring. Such problems involve monitoring the behavior of a large population of entities for interesting events requiring action. We present a framework within which each of the individual problems has a natural expression, as well as a methodol ..."
Abstract
-
Cited by 175 (9 self)
We introduce a problem class which we term activity monitoring. Such problems involve monitoring the behavior of a large population of entities for interesting events requiring action. We present a framework within which each of the individual problems has a natural expression, as well as a methodology for evaluating the performance of activity monitoring techniques. We show that two superficially different tasks, news story monitoring and intrusion detection, can be expressed naturally within the framework, and that key differences in solution methods can be compared.

1 Introduction: In this paper we introduce a problem class which we term activity monitoring. Such problems typically involve monitoring the behavior of a large population of entities for interesting events requiring action. Examples include the tasks of fraud detection, computer intrusion detection, network performance monitoring, crisis monitoring, some forms of fault detection, and news story monitoring. These appli...
Learning when Training Data are Costly: The Effect of Class Distribution on Tree Induction
2002
"... For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the data and/or the computational costs associated with learning from the data. One question of practical importance is: if n ..."
Abstract
-
Cited by 173 (9 self)
For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the data and/or the computational costs associated with learning from the data. One question of practical importance is: if n training examples are going to be selected, in what proportion should the classes be represented? In this article we analyze the relationship between the marginal class distribution of training data and the performance of classification trees induced from these data, when the size of the training set is fixed. We study twenty-six data sets and, for each, determine the best class distribution for learning. Our results show that, for a fixed number of training examples, it is often possible to obtain improved classifier performance by training with a class distribution other than the naturally occurring class distribution. For example, we show that to build a classifier robust to different misclassification costs, a balanced class distribution generally performs quite well. We also describe and evaluate a budget-sensitive progressive-sampling algorithm that selects training examples such that the resulting training set has a good (near-optimal) class distribution for learning.
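The central experimental question (fixed budget n, varying class proportion) can be sketched as follows; the helper names and the choice of AUC as the metric are illustrative assumptions, not the paper's code:

```python
# Sketch: fix the training budget n, vary the positive-class fraction,
# and compare ranking performance (AUC) of the induced trees.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

def sample_with_ratio(X, y, n, pos_frac, rng):
    pos = rng.choice(np.where(y == 1)[0], int(n * pos_frac), replace=False)
    neg = rng.choice(np.where(y == 0)[0], n - len(pos), replace=False)
    idx = np.concatenate([pos, neg])
    return X[idx], y[idx]

def auc_for_ratio(X_pool, y_pool, X_test, y_test, n, pos_frac, seed=0):
    rng = np.random.default_rng(seed)
    X_tr, y_tr = sample_with_ratio(X_pool, y_pool, n, pos_frac, rng)
    tree = DecisionTreeClassifier().fit(X_tr, y_tr)
    return roc_auc_score(y_test, tree.predict_proba(X_test)[:, 1])

# e.g. sweep pos_frac over {natural rate, 0.3, 0.5, 0.7} at fixed n
```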
Iterative classification of relational data
- Papers of the AAAI-2000 Workshop on Learning Statistical Models From Relational Data, 2000
"... Relational data offer a unique opportunity for improving the classification accuracy of statistical models. If two objects are related, inferring something about one object can aid inferences about the other. We present an iterative classification procedure that exploits this characteristic of relat ..."
Abstract
-
Cited by 162 (15 self)
Relational data offer a unique opportunity for improving the classification accuracy of statistical models. If two objects are related, inferring something about one object can aid inferences about the other. We present an iterative classification procedure that exploits this characteristic of relational data. This approach uses simple Bayesian classifiers in an iterative fashion, dynamically updating the attributes of some objects as inferences are made about related objects. Inferences made with high confidence in initial iterations are fed back into the data and are used to inform subsequent inferences about related objects. We evaluate the performance of this approach on a binary classification task. Experiments indicate that iterative classification significantly increases accuracy when compared to a single-pass approach.
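A minimal sketch of the iterative loop, assuming a fitted scikit-learn-style classifier and a single hand-rolled relational feature (the fraction of confidently positive neighbors); the paper's actual procedure and feature set differ in detail:

```python
# Sketch: repeatedly re-score objects, folding high-confidence
# positive inferences about neighbors back in as a relational feature.
import numpy as np

def iterative_classify(clf, X, neighbors, n_iter=10, threshold=0.9):
    # clf: classifier already fitted on [intrinsic features, rel feature]
    # X: (n, d) intrinsic features; neighbors: list of index lists
    accepted = np.zeros(len(X))              # confident positives so far
    proba = np.zeros(len(X))
    for _ in range(n_iter):
        rel = np.array([accepted[nb].mean() if len(nb) else 0.0
                        for nb in neighbors])
        proba = clf.predict_proba(np.column_stack([X, rel]))[:, 1]
        accepted = (proba >= threshold).astype(float)
    return proba
```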
Tree Induction for Probability-based Ranking
2002
"... Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., c ..."
Abstract
-
Cited by 161 (4 self)
Tree induction is one of the most effective and widely used methods for building classification models. However, many applications require cases to be ranked by the probability of class membership. Probability estimation trees (PETs) have the same attractive features as classification trees (e.g., comprehensibility, accuracy, and efficiency in high dimensions and on large data sets). Unfortunately, decision trees have been found to provide poor probability estimates. Several techniques have been proposed to build more accurate PETs, but, to our knowledge, there has not been a systematic experimental analysis of which techniques actually improve the probability-based rankings, and by how much. In this paper we first discuss why the decision-tree representation is not intrinsically inadequate for probability estimation. Inaccurate probabilities are partially the result of decision-tree induction algorithms that focus on maximizing classification accuracy and minimizing tree size (for example via reduced-error pruning). Larger trees can be better for probability estimation, even if the extra size is superfluous for accuracy maximization. We then present the results of a comprehensive set of experiments, testing some straightforward methods for improving probability-based rankings. We show that using a simple, common smoothing method, the Laplace correction, uniformly improves probability-based rankings. In addition, bagging substantially improves the rankings, and is even more effective for this purpose than for improving accuracy. We conclude that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required.
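The Laplace correction mentioned above is simple enough to state directly: at a leaf with k examples of class c among n examples and C classes, estimate p(c) as (k + 1)/(n + C) rather than the raw frequency k/n. A tiny sketch (the function name is ours):

```python
# Laplace-corrected class probability at a decision-tree leaf.
def laplace_leaf_probability(k, n, num_classes=2):
    return (k + 1) / (n + num_classes)

# A pure leaf with 3 of 3 positives yields 0.8 rather than an
# overconfident 1.0; 0 of 98 yields 0.01 rather than exactly 0.
print(laplace_leaf_probability(3, 3))    # 0.8
print(laplace_leaf_probability(0, 98))   # 0.01
```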
Learning Influence Probabilities In Social Networks
"... Recently, there has been tremendous interest in the phenomenon of influence propagation in social networks. The studies in this area assume they have as input to their problems a social graph with edges labeled with probabilities of influence between users. However, the question of where these proba ..."
Abstract
-
Cited by 148 (18 self)
Recently, there has been tremendous interest in the phenomenon of influence propagation in social networks. The studies in this area assume they have as input to their problems a social graph with edges labeled with probabilities of influence between users. However, the question of where these probabilities come from or how they can be computed from real social network data has been largely ignored until now. Thus it is interesting to ask whether from a social graph and a log of actions by its users, one can build models of influence. This is the main problem attacked in this paper. In addition to proposing models and algorithms for learning the model parameters and for testing the learned models to make predictions, we also develop techniques for predicting the time by which a user may be expected to perform an action. We validate our ideas and techniques using the Flickr data set consisting of a social graph with 1.3M nodes, 40M edges, and an action log consisting of 35M tuples referring to 300K distinct actions. Beyond showing that there is genuine influence happening in a real social network, we show that our techniques have excellent prediction performance.
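One natural baseline consistent with this setup is a Bernoulli maximum-likelihood estimate: p(u -> v) as the fraction of u's actions that neighbor v later performed too. A sketch under our own simplifying assumptions (field names and log format are assumed, not taken from the paper):

```python
# Sketch: estimate p(u -> v) from an action log as the fraction of
# u's actions that neighbor v performed at a strictly later time.
from collections import defaultdict

def learn_influence(action_log, edges):
    # action_log: iterable of (user, action, time); edges: set of (u, v)
    first_time = {}                          # (user, action) -> earliest t
    for user, action, t in action_log:
        key = (user, action)
        if key not in first_time or t < first_time[key]:
            first_time[key] = t
    actions_by = defaultdict(set)
    for user, action in first_time:
        actions_by[user].add(action)
    prob = {}
    for u, v in edges:
        later = sum(1 for a in actions_by[u]
                    if a in actions_by[v]
                    and first_time[(v, a)] > first_time[(u, a)])
        prob[(u, v)] = later / len(actions_by[u]) if actions_by[u] else 0.0
    return prob
```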
Anomaly Detection over Noisy Data using Learned Probability Distributions
- In Proceedings of the International Conference on Machine Learning, 2000
"... Traditional anomaly detection techniques focus on detecting anomalies in new data after training on normal (or clean) data. In this paper we present a technique for detecting anomalies without training on normal data. We present a method for detecting anomalies within a data set that contains a larg ..."
Abstract
-
Cited by 143 (9 self)
Traditional anomaly detection techniques focus on detecting anomalies in new data after training on normal (or clean) data. In this paper we present a technique for detecting anomalies without training on normal data. We present a method for detecting anomalies within a data set that contains a large number of normal elements and relatively few anomalies. We present a mixture model for explaining the presence of anomalies in the data. Motivated by the model, the approach uses machine learning techniques to estimate a probability distribution over the data and applies a statistical test to detect the anomalies. The anomaly detection technique is applied to intrusion detection by examining intrusions manifested as anomalies in UNIX system call traces.
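A minimal sketch of such a likelihood-based test, with simplifying assumptions not taken from the paper (majority distribution M fitted as a single Gaussian, anomaly distribution A taken as uniform, 1-D data):

```python
# Sketch: elements are assumed drawn from a mixture
# (1 - lam) * M + lam * A; flag x as an anomaly when its likelihood
# under the anomaly component beats the majority component by margin c.
import numpy as np
from scipy.stats import norm

def detect(x, lam=0.05, c=3.0, uniform_density=1e-3):
    # x: 1-D array; M fitted as a Gaussian, A taken as uniform
    mu, sigma = np.mean(x), np.std(x)
    ll_majority = np.log(1 - lam) + norm.logpdf(x, mu, sigma)
    ll_anomaly = np.log(lam) + np.log(uniform_density)
    return (ll_anomaly - ll_majority) > c    # True = anomaly
```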
SMOTEBoost: improving prediction of the minority class in boosting
- In Proceedings of the Principles of Knowledge Discovery in Databases (PKDD-2003), 2003
"... Abstract. Many real world data mining applications involve learning from imbalanced data sets, where the particular events of interest may be very few when compared to the other classes. Learning from data sets that contain rare events usually produces biased classifiers that have a higher predictiv ..."
Abstract
-
Cited by 121 (13 self)
Abstract: Many real-world data mining applications involve learning from imbalanced data sets, where the particular events of interest may be very few compared to the other classes. Learning from data sets that contain rare events usually produces biased classifiers that have higher predictive accuracy on the majority class(es) but poorer predictive accuracy on the minority class of interest. SMOTE (Synthetic Minority Over-sampling Technique) is a recent approach specifically designed for learning with minority classes. Boosting is a promising ensemble-based learning algorithm that can improve the classification performance of any weak classifier. This paper presents a novel approach for learning from rare classes, based on a combination of the SMOTE algorithm and the boosting procedure. Unlike standard boosting, where all misclassified examples are given equal weights, the SMOTEBoost approach creates synthetic examples from the minority class, thus indirectly changing the updating weights and compensating for skewed distributions. Applied to several highly and moderately imbalanced data sets, the SMOTEBoost algorithm shows improved prediction performance on the rare class compared to standard boosting and to the SMOTE algorithm used with a single classifier.
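SMOTE's synthetic-example step is easy to sketch in isolation (the boosting integration is omitted and parameter names are ours), assuming numpy and scikit-learn:

```python
# Sketch: generate synthetic minority examples by interpolating
# between a minority point and a random one of its k nearest
# minority-class neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_synthetic, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)            # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]   # a random true neighbor
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```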
The Effect of Class Distribution on Classifier Learning: An Empirical Study
2001
"... In this article we analyze the effect of class distribution on classifier learning. We begin by describing the different ways in which class distribution affects learning and how it affects the evaluation of learned classifiers. We then present the results of two comprehensive experimental studie ..."
Abstract
-
Cited by 107 (2 self)
In this article we analyze the effect of class distribution on classifier learning. We begin by describing the different ways in which class distribution affects learning and how it affects the evaluation of learned classifiers. We then present the results of two comprehensive experimental studies. The first study compares the performance of classifiers generated from unbalanced data sets with the performance of classifiers generated from balanced versions of the same data sets. This comparison allows us to isolate and quantify the effect that the training set's class distribution has on learning and to contrast the performance of the classifiers on the minority and majority classes. The second study assesses which distribution is "best" for training, with respect to two performance measures: classification accuracy and the area under the ROC curve (AUC). A tacit assumption behind much research on classifier induction is that the class distribution of the training data should match the "natural" distribution of the data. This study shows that the naturally occurring class distribution often is not best for learning, and that substantially better performance can often be obtained by using a different class distribution. Understanding how classifier performance is affected by class distribution can help practitioners choose training data: in real-world situations the number of training examples often must be limited due to computational costs or the costs associated with procuring and preparing the data.
Meta-Learning in Distributed Data Mining Systems: Issues and Approaches
- Advances of Distributed Data Mining, 2000
"... Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challeng ..."
Abstract
-
Cited by 103 (0 self)
Data mining systems aim to discover patterns and extract useful information from facts recorded in databases. A widely adopted approach to this objective is to apply various machine learning algorithms to compute descriptive models of the available data. Here, we explore one of the main challenges in this research area: the development of techniques that scale up to large and possibly physically distributed databases. Meta-learning is a technique that seeks to compute higher-level classifiers (or classification models), called meta-classifiers, that integrate in some principled fashion multiple classifiers computed separately over different databases. This study describes meta-learning and presents the JAM system (Java Agents for Meta-learning), an agent-based meta-learning system for large-scale data mining applications. Specifically, it identifies and addresses several important desiderata for distributed data mining systems that stem from their additional complexity co...
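A minimal sketch of the meta-learning idea in this spirit (illustrative only; JAM's agent-based machinery is not modeled), using scikit-learn: base classifiers trained on separate partitions, then a meta-classifier trained on their predictions over a shared validation set:

```python
# Sketch: train base classifiers separately per data partition, then
# combine them by training a meta-classifier on their predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

def meta_learn(partitions, X_val, y_val):
    # partitions: list of (X_i, y_i) pairs, one per site/database
    base = [DecisionTreeClassifier().fit(Xi, yi) for Xi, yi in partitions]
    meta_X = np.column_stack([b.predict_proba(X_val)[:, 1] for b in base])
    meta = LogisticRegression().fit(meta_X, y_val)
    return base, meta

def meta_predict(base, meta, X):
    feats = np.column_stack([b.predict_proba(X)[:, 1] for b in base])
    return meta.predict(feats)
```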