Results 1 - 5 of 5
Classifier ensembles for changing environments
- In Multiple Classifier Systems, 2004
Cited by 16 (1 self)
Abstract. We consider strategies for building classifier ensembles for non-stationary environments where the classification task changes during the operation of the ensemble. Individual classifier models capable of online learning are reviewed. The concept of “forgetting” is discussed. Online ensembles and strategies suitable for changing environments are summarized.
Incremental local outlier detection for data streams
- In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining, 2007
Cited by 9 (1 self)
Abstract. Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (Local Outlier Factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to that of the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm also dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of their closest neighbors, and thus the number of updates per such insertion/deletion does not depend on the total number of points N in the data set. Our experiments performed on several simulated and real-life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
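For orientation, the static LOF score that the abstract uses as its baseline can be sketched as below. This is a brute-force illustration of the standard LOF definition, not the incremental algorithm the paper proposes. The function name `lof_scores` and the parameter `k` are illustrative assumptions, and the sketch assumes all points are distinct.

```python
import math

def lof_scores(points, k):
    """Brute-force static LOF; assumes distinct points and k < len(points)."""
    n = len(points)
    neighbors = []  # per point: list of (distance, index) in its k-neighborhood
    kdist = []      # per point: distance to its k-th nearest neighbor
    for i in range(n):
        ds = sorted((math.dist(points[i], points[j]), j)
                    for j in range(n) if j != i)
        kd = ds[k - 1][0]
        kdist.append(kd)
        # The k-neighborhood keeps every point within the k-distance (ties included).
        neighbors.append([(d, j) for d, j in ds if d <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], d) for d, j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for _, j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]

# Four points on a unit square plus one distant point.
scores = lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)
```

Scores near 1 indicate a point whose density matches that of its neighbors; markedly larger scores indicate outliers. Recomputing every score after each insertion, as this sketch would require, is the cost the incremental variant is meant to avoid.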
Online clustering of parallel data streams
- In press for Data & Knowledge Engineering, 2005
Cited by 8 (1 self)
In recent years, the management and processing of so-called data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. A data stream can roughly be thought of as a transient, continuously increasing sequence of time-stamped data. In this paper, we consider the problem of clustering parallel streams of real-valued data, that is to say, continuously evolving time series. In other words, we are interested in grouping data streams whose evolution over time is similar in a specific sense. In order to maintain an up-to-date clustering structure, it is necessary to analyze the incoming data in an online manner, tolerating no more than a constant time delay. For this purpose, we develop an efficient online version of the classical K-means clustering algorithm. Our method’s efficiency is mainly due to a scalable online transformation of the original data which allows for a fast computation of approximate distances between streams. Key words: data mining, clustering, data streams, fuzzy sets
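The problem setting above (grouping parallel streams by the similarity of their recent evolution) can be illustrated with a small sketch. It keeps a fixed-length sliding window per stream and runs plain K-means on the raw windows. The paper's online data transformation and approximate distances are not reproduced here; the class name `StreamClusterer`, the sliding-window choice, and the first-k initialization are all assumptions made for illustration.

```python
import math
from collections import deque

def kmeans(vectors, k, iters=20):
    """Plain Lloyd iterations; centers start at the first k vectors (an assumption)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

class StreamClusterer:
    """Hypothetical helper: one bounded window per stream, clustered on demand."""
    def __init__(self, n_streams, window, k):
        self.windows = [deque(maxlen=window) for _ in range(n_streams)]
        self.k = k

    def push(self, values):
        # One new reading per stream; deque(maxlen=...) drops the oldest value.
        for w, x in zip(self.windows, values):
            w.append(x)

    def cluster(self):
        return kmeans([list(w) for w in self.windows], self.k)

# Two rising and two falling streams, interleaved.
sc = StreamClusterer(n_streams=4, window=5, k=2)
for t in range(10):
    sc.push([t, -t, t + 0.1, -t - 0.1])
labels = sc.cluster()
```

Because each window has a fixed length, the memory per stream stays constant; the cost of each `cluster` call, however, grows with the window length, which is what a compressed representation of the streams would reduce.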
Maintaining Nonparametric Estimators over Data Streams
- In Proc. BTW, 2004
Cited by 4 (3 self)
Effective processing and analysis of data streams is of utmost importance for a plethora of emerging applications like network monitoring, traffic management, and financial tickers. In addition to the management of transient and potentially unbounded streams, their analysis with advanced data mining techniques has been identified as a research challenge. A well-established class of mining techniques is based on nonparametric statistics, where nonparametric density estimation in particular is among the essential building blocks. In this paper, we examine the maintenance of nonparametric estimators over data streams. We present a tailored framework that incrementally maintains a nonparametric estimator over a data stream while consuming only a fixed amount of memory. Our framework is memory-adaptive and therefore supports a fundamental requirement for an operator within a data stream management system. As an example, we apply our framework to selectivity estimation of range queries, which is a popular use case for statistical estimators. After providing an analysis of the processing cost, results of experimental comparisons are reported where synthetic data streams as well as real-world ones are considered. Our results demonstrate the accuracy of the results produced by estimators derived from our framework.
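To make the idea of a fixed-memory density estimator used for range selectivity concrete, here is a minimal sketch. It is not the framework described in the abstract: it swaps in reservoir sampling to bound memory and places a Gaussian kernel on each retained value. The class name `ReservoirKDE` and the parameters `capacity` and `bandwidth` are assumptions chosen for illustration.

```python
import math
import random

class ReservoirKDE:
    """Hypothetical fixed-memory estimator: reservoir sample + Gaussian kernels."""
    def __init__(self, capacity, bandwidth, seed=0):
        self.capacity = capacity
        self.h = bandwidth
        self.sample = []
        self.seen = 0
        self.rng = random.Random(seed)

    def update(self, x):
        # Reservoir sampling (Algorithm R): every value seen so far is
        # retained with equal probability capacity / seen.
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(x)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.sample[j] = x

    def selectivity(self, lo, hi):
        """Estimated fraction of stream values falling in [lo, hi]."""
        if not self.sample:
            return 0.0
        def cdf(z):  # standard normal CDF
            return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
        mass = sum(cdf((hi - x) / self.h) - cdf((lo - x) / self.h)
                   for x in self.sample)
        return mass / len(self.sample)

# Feed a uniform stream on [0, 1) and query a range.
rng = random.Random(1)
est = ReservoirKDE(capacity=200, bandwidth=0.05)
for _ in range(10000):
    est.update(rng.random())
estimate = est.selectivity(0.25, 0.75)
```

For a uniform stream the true selectivity of [0.25, 0.75] is 0.5; the estimate fluctuates around that value with an error governed by the reservoir size. Changing `capacity` trades memory for accuracy, which is the kind of adaptivity the abstract refers to, though the paper's own mechanism may differ.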
Wavelet Density Estimators over Data Streams
- In Proc. of SAC, 2005
Cited by 1 (1 self)
Many scientific and commercial applications rely on immediate processing of transient data streams. In addition to processing queries over streams, their continuous analysis has received more attention recently. Due to specific characteristics of data streams, common analysis techniques known from the area of data mining are not directly applicable to streams. One of the core operations in data analysis is density estimation, which is used for capturing an unknown distribution in various analysis tasks. Modern density estimators are based either on kernel functions or on wavelets; the wavelet-based ones benefit from their ability to identify discontinuities as well as local oscillations. In this paper, we present a new approach to computing wavelet density estimators over data streams. Our estimators allow continuous updates on arrival of new data and provide accurate analytical results, while consuming only a constant amount of memory. Moreover, our estimators adapt to both memory and CPU usage, i.e., the memory size and the computing overhead can be changed at runtime. An experimental evaluation demonstrates the feasibility of our approach and shows the superiority of wavelet density estimators compared to their kernel-based counterparts.

