Results 1 - 10
of
59
Computer Immunology
- Communications of the ACM
, 1996
"... Natural immune systems protect animals from dangerous foreign pathogens, including bacteria, viruses, parasites, and toxins. Their role in the body is analogous to that of computer security systems in computing. Although there are many differences between living organisms and computer systems, this ..."
Abstract
-
Cited by 152 (7 self)
- Add to MetaCart
Natural immune systems protect animals from dangerous foreign pathogens, including bacteria, viruses, parasites, and toxins. Their role in the body is analogous to that of computer security systems in computing. Although there are many differences between living organisms and computer systems, this article argues that the similarities are compelling and could point the way to improved computer security. Improvements can be achieved by designing computer immune systems that have some of the important properties illustrated by natural immune systems. These include multi-layered protection, highly distributed detection and memory systems, diversity of detection ability across individuals, inexact matching strategies, and sensitivity to most new foreign patterns. We first give an overview of how the immune system relates to computer security. We then illustrate these ideas with two examples.
Probabilistic discovery of time series motifs
, 2003
"... Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of thi ..."
Abstract
-
Cited by 92 (19 self)
- Add to MetaCart
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care ” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Towards parameter-free data mining
- In: Proc. 10th ACM SIGKDD Intn’l Conf. Knowledge Discovery and Data Mining
, 2004
"... Most data mining algorithms require the setting of many input parameters. Two main dangers of working with parameter-laden algorithms are the following. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorit ..."
Abstract
-
Cited by 86 (15 self)
- Add to MetaCart
Most data mining algorithms require the setting of many input parameters. Two main dangers of working with parameter-laden algorithms are the following. First, incorrect settings may cause an algorithm to fail in finding the true patterns. Second, a perhaps more insidious problem is that the algorithm may report spurious patterns that do not really exist, or greatly overestimate the significance of the reported patterns. This is especially likely when the user fails to understand the role of parameters in the data mining process. Data mining algorithms should have as few parameters as possible, ideally none. A parameter-free algorithm would limit our ability to impose our prejudices, expectations, and presumptions on the problem at hand, and would let the data itself speak to us. In this work, we show that recent results in bioinformatics and computational theory hold great promise for a parameter-free datamining paradigm. The results are motivated by observations in Kolmogorov complexity theory. However, as a practical matter, they can be implemented using any off-the-shelf compression algorithm with the addition of just a dozen or so lines of code. We will show that this approach is competitive or superior to the stateof-the-art approaches in anomaly/interestingness detection, classification, and clustering with empirical tests on time series/DNA/text/video datasets.
A survey of outlier detection methodologies
- Artificial Intelligence Review
, 2004
"... Abstract. Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populat ..."
Abstract
-
Cited by 80 (3 self)
- Add to MetaCart
Abstract. Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.
Finding Surprising Patterns in a Time Series Database in Linear Time and Space
- In In proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2002
"... The problem of finding a specified pattern in a time series database (i.e. query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requir ..."
Abstract
-
Cited by 78 (4 self)
- Add to MetaCart
The problem of finding a specified pattern in a time series database (i.e. query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of "surprise", and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets. To overcome these lim- itations we introduce a novel technique that defines a pattern surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data. This notion has the advantage of not requiring an explicit definition of surprise, which may be impossible to elicit from a domain expert. Instead the user simply gives the algorithm a collection of previously observed normal data. Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed, a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation.
Online Novelty Detection on Temporal Sequences
, 2003
"... Novelty detection, or anomaly detection, on temporal sequences has increasingly attracted attention from researchers in different areas. In this paper, we present a new framework for online novelty detection on temporal sequences . This framework includes a mechanism for associating each detectio ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
Novelty detection, or anomaly detection, on temporal sequences has increasingly attracted attention from researchers in different areas. In this paper, we present a new framework for online novelty detection on temporal sequences . This framework includes a mechanism for associating each detection result with a confidence value. Based on this framework, we develop a concrete online detection algorithm, by modeling the temporal sequence using an online support vector regression algorithm. Experiments on both synthetic and real world data are performed to demonstrate the promising performance of our proposed detection algorithm.
Making Time-series Classification More Accurate Using Learned Constraints
- In proc. of SDM Int’l Conf
, 2004
"... It has long been known that Dynamic Time Warping (DTW) is superior to Euclidean distance for classification and clustering of time series. However, until lately, most research has utilized Euclidean distance because it is more efficiently calculated. A recently introduced technique that greatly miti ..."
Abstract
-
Cited by 42 (13 self)
- Add to MetaCart
It has long been known that Dynamic Time Warping (DTW) is superior to Euclidean distance for classification and clustering of time series. However, until lately, most research has utilized Euclidean distance because it is more efficiently calculated. A recently introduced technique that greatly mitigates DTWs demanding CPU time has sparked a flurry of research activity. However, the technique and its many extensions still only allow DTW to be applied to moderately large datasets. In addition, almost all of the research on DTW has focused exclusively on speeding up its calculation; there has been little work done on improving its accuracy. In this work, we target the accuracy aspect of DTW performance and introduce a new framework that learns arbitrary constraints on the warping path of the DTW calculation. Apart from improving the accuracy of classification, our technique as a side effect speeds up DTW by a wide margin as well. We show the utility of our approach on datasets from diverse domains and demonstrate significant gains in accuracy and efficiency.
Immune System Approaches to Intrusion Detection - A Review
- In Proc. of the 3rd International Conference on Artificial Immune Systems, LNCS 3239
, 2004
"... The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed manner. Secondly, current techniques ..."
Abstract
-
Cited by 42 (16 self)
- Add to MetaCart
The use of artificial immune systems in intrusion detection is an appealing concept for two reasons. Firstly, the human immune system provides the human body with a high level of protection from invading pathogens, in a robust, self-organised and distributed manner. Secondly, current techniques used in computer security are not able to cope with the dynamic and increasingly complex nature of computer systems and their security. It is hoped that biologically inspired approaches in this area, including the use of immune-based systems will be able to meet this challenge. Here we collate the algorithms used, the development of the systems and the outcome of their implementation. It provides an introduction and review of the key developments within this field, in addition to making suggestions for future research.
Immunity-Based Systems: A Survey
- Proceeding of the IEEE International Conference on Systems, Man and Cybernetics
, 1997
"... Biological systems such as human beings can be regarded as sophisticated information processing systems, and can be expected to provide inspiration for various ideas to science and engineering. Biologically motivated information processing systems can be classified into: brain-nervous systems (neura ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
Biological systems such as human beings can be regarded as sophisticated information processing systems, and can be expected to provide inspiration for various ideas to science and engineering. Biologically motivated information processing systems can be classified into: brain-nervous systems (neural networks), genetic systems (evolutionary algorithms), and immune systems (artificial immune systems). Among these, nervous systems and genetic systems have been widely applied to various fields. There have been a relative few applications of the immune system. This paper presents a survey of artificial immune systems and provides numerous insights of immunity-based systems applications in science and engineering.
Anomaly Detection Using Real-Valued Negative Selection
- Journal of Genetic Programming and Evolvable Machines
, 2004
"... This paper describes a real-valued representation for the negative selection algorithm and its applications to anomaly detection. In many anomaly detection applications, only positive (normal) samples are available for training purpose. However, conventional classification algorithms need samples fo ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
This paper describes a real-valued representation for the negative selection algorithm and its applications to anomaly detection. In many anomaly detection applications, only positive (normal) samples are available for training purpose. However, conventional classification algorithms need samples for all classes (e.g. normal and abnormal) during the training phase. This approach uses only normal samples to generate abnormal samples, which are used as input to a classification algorithm. This hybrid approach is compared against an anomaly detection technique that uses self-organizing maps to cluster the normal data sets (samples). Experiments are performed with di#erent data sets and some results are reported.

