Anomaly Detection over Noisy Data using Learned Probability Distributions
- In Proceedings of the International Conference on Machine Learning, 2000
Cited by 81 (9 self)
Traditional anomaly detection techniques focus on detecting anomalies in new data after training on normal (or clean) data. In this paper we present a technique for detecting anomalies without training on normal data. We present a method for detecting anomalies within a data set that contains a large number of normal elements and relatively few anomalies. We present a mixture model for explaining the presence of anomalies in the data. Motivated by the model, the approach uses machine learning techniques to estimate a probability distribution over the data and applies a statistical test to detect the anomalies. The anomaly detection technique is applied to intrusion detection by examining intrusions manifested as anomalies in UNIX system call traces.
Improved smoothing for probabilistic suffix trees seen as variable order Markov chains
- In European Conference on Machine Learning (ECML), 2002
Cited by 5 (1 self)
In this paper, we compare the recently proposed Probabilistic Suffix Trees (PST) to a specific smoothing of Markov chains and show that they both induce the same model, namely a variable order Markov chain. We show a weakness of PST in terms of smoothing and propose to use an enhanced smoothing. We show that the model based on enhanced smoothing outperforms the PST while needing fewer parameters on a protein domain detection task on public databases.

