An indexing scheme for fast similarity search in large time series databases
In Proceedings of the 11th International Conference on Scientific and Statistical Database Management
, 1999
Cited by 16 (2 self)
We address the problem of similarity search in large time series databases. We introduce a novel indexing algorithm that allows faster retrieval. The index is formed by creating bins that contain time series subsequences of approximately the same shape. For each bin, we can quickly calculate a lower bound on the distance between a given query and the most similar element of the bin. This bound allows us to search the bins in best-first order and to prune some bins from the search space without examining their contents. Additional speedup is obtained by optimizing the data within the bins so that the query need not be compared to every item in a bin. We call our approach STB-indexing and validate it experimentally on space telemetry, medical and synthetic data, demonstrating approximately an order-of-magnitude speedup.
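A minimal sketch of the bin-level lower bound and best-first search described above, assuming Euclidean distance and assuming each bin is summarized by a per-position minimum/maximum envelope of its members (an assumption; the abstract does not say how bins are summarized). The functions `envelope`, `lower_bound`, `euclidean` and `best_first_search` are hypothetical, and the in-bin data optimization mentioned in the abstract is not attempted.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple

Series = Sequence[float]


def envelope(members: List[Series]) -> Tuple[List[float], List[float]]:
    """Per-position minimum and maximum over all subsequences in one bin."""
    lower = [min(col) for col in zip(*members)]
    upper = [max(col) for col in zip(*members)]
    return lower, upper


def lower_bound(query: Series, lower: List[float], upper: List[float]) -> float:
    """Euclidean distance from the query to the closest point inside the envelope.

    Every member of the bin lies inside its envelope, so this value can never
    exceed the true distance from the query to any member.
    """
    total = 0.0
    for q, lo, hi in zip(query, lower, upper):
        if q > hi:
            total += (q - hi) ** 2
        elif q < lo:
            total += (lo - q) ** 2
    return math.sqrt(total)


def euclidean(a: Series, b: Series) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_first_search(
    query: Series, bins: Dict[str, List[Series]]
) -> Tuple[float, Optional[Series]]:
    """Return (distance, item) of the nearest bin member, visiting bins best-first."""
    # In a real index the envelopes would be precomputed once, at build time.
    heap = []
    for name, members in bins.items():
        lo, hi = envelope(members)
        heapq.heappush(heap, (lower_bound(query, lo, hi), name))
    best_dist, best_item = math.inf, None
    while heap:
        bound, name = heapq.heappop(heap)
        if bound >= best_dist:
            break  # this bin and all remaining ones are pruned unexamined
        for item in bins[name]:
            d = euclidean(query, item)
            if d < best_dist:
                best_dist, best_item = d, item
    return best_dist, best_item
```

Because bins are popped in increasing order of their bound, the loop can stop at the first bin whose bound is no better than the best match found so far.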
Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data
Cited by 15 (3 self)
Given the pervasiveness of time series data in all human endeavors, and the ubiquity of clustering as a data mining application, it is somewhat surprising that the problem of time series clustering from a single stream remains largely unsolved. Most work on time series clustering considers the clustering of individual time series, e.g., gene expression profiles, individual heartbeats or individual gait cycles. The few attempts at clustering time series streams have been shown to be objectively incorrect in some cases, and in other cases to work only on the most contrived datasets, and only after carefully adjusting a large set of parameters. In this work, we make two fundamental contributions. First, we show that the problem definition currently used for time series clustering from streams is inherently flawed, and that a new definition is necessary. Second, we show that the Minimum Description Length (MDL) framework offers an efficient, effective and essentially parameter-free method for time series clustering. We show that our method produces objectively correct results on a wide variety of datasets from medicine, zoology and industrial process analyses.
Keywords: time series; clustering; MDL
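A minimal sketch of the MDL intuition, assuming subsequences are z-normalized, discretized to 16 levels, and costed by their empirical entropy. The discretization, the cost function, and the names `znorm`, `discretize`, `description_length` and `conditional_dl` are illustrative assumptions, not the paper's exact encoding. The point it shows: a subsequence similar to a hypothesis (for example a cluster center) costs fewer bits when encoded as its difference from that hypothesis than when encoded on its own.

```python
import math
from collections import Counter
from typing import List, Sequence


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a subsequence (a constant subsequence maps to all zeros)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x)) or 1.0
    return [(v - mu) / sd for v in x]


def discretize(x: Sequence[float], cardinality: int = 16) -> List[int]:
    """Map z-normalized values, clipped to [-3, 3], to integer symbols 0..cardinality-1."""
    symbols = []
    for v in x:
        level = round((v + 3.0) / 6.0 * (cardinality - 1))
        symbols.append(min(max(level, 0), cardinality - 1))
    return symbols


def description_length(symbols: Sequence[int]) -> float:
    """Bits under an empirical-entropy code: length times entropy of the symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return n * entropy


def conditional_dl(a: Sequence[int], hypothesis: Sequence[int]) -> float:
    """Bits to encode `a` given a hypothesis (e.g. a cluster center): the cost of a - hypothesis."""
    return description_length([x - y for x, y in zip(a, hypothesis)])
```

Under these assumptions, a subsequence is worth assigning to a cluster only when `conditional_dl` against the center is smaller than its standalone `description_length`; subsequences that save no bits can be left unclustered, which fits the title's point that some data must be ignored.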
Clustering of streaming time series is meaningless
 In Proc. of the SIGMOD workshop in Data Mining and Knowledge Discovery
, 2003
Cited by 14 (0 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
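A quick experiment hints at why sliding-window clusters are so constrained: extract every subsequence with a sliding window and average them position by position. Each window position sees almost the same set of values, so the average window is nearly a flat line whatever the series looks like. For k-means in particular, each center is the mean of its members, so the size-weighted average of the cluster centers equals this column-wise mean. This is an illustration under our reading of the abstract, not the paper's theorem; `sliding_windows`, `column_means` and the test series are hypothetical.

```python
import math
from typing import List, Sequence


def sliding_windows(series: Sequence[float], w: int) -> List[List[float]]:
    """All length-w subsequences, extracted with a step of one."""
    return [list(series[i:i + w]) for i in range(len(series) - w + 1)]


def column_means(windows: List[List[float]]) -> List[float]:
    """Position-by-position mean over all windows."""
    n = len(windows)
    return [sum(col) / n for col in zip(*windows)]


series = [math.sin(0.3 * i) + (i % 7) for i in range(10_000)]  # arbitrary bounded series
windows = sliding_windows(series, 32)
means = column_means(windows)
print("spread of one window:", max(windows[0]) - min(windows[0]))
print("spread of the mean window:", max(means) - min(means))
```

Individual windows vary by several units, while the mean window varies by only a few hundredths.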
Knowledge Discovery from Series of Interval Events
 Journal of Intelligent Information Systems
, 2000
Cited by 14 (0 self)
Knowledge discovery from data sets can be extensively automated by using data mining software tools. Techniques for mining series of interval events, however, have not been considered. Such time series are common in many applications. In this paper, we propose mining techniques to discover temporal containment relationships in such series. Specifically, an item A is said to contain an item B if an event of type B occurs during the time span of an event of type A, and this is a frequent relationship in the data set. Mining such relationships provides insight into temporal relationships among various items. We implement the technique and analyze trace data collected from a real database application. Experimental results indicate that the proposed mining technique can discover interesting results. We also introduce a quantization technique as a preprocessing step to generalize the method to all time series.
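A brute-force sketch of the containment relation as defined above: event B is contained in event A when B's time span lies within A's. The `(type, start, end)` event format, the pairwise scan, counting each A event once per contained type, and the `min_count` threshold are assumptions for illustration; the paper's own algorithm and frequency definition are not reproduced.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, float, float]  # (event type, start time, end time)


def contains(a: Event, b: Event) -> bool:
    """True if event b occurs during the time span of event a."""
    return a[1] <= b[1] and b[2] <= a[2]


def frequent_containments(events: List[Event], min_count: int) -> Dict[Tuple[str, str], int]:
    """Count, per (A type, B type), the A events that contain at least one B event."""
    counts: Counter = Counter()
    for i, a in enumerate(events):
        contained_types = set()
        for j, b in enumerate(events):
            if i != j and contains(a, b):
                contained_types.add(b[0])
        for b_type in contained_types:
            counts[(a[0], b_type)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}
```

For example, with two `query` intervals that each enclose a `lock` interval, the pair `("query", "lock")` is reported with a count of 2.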
Multivariate Time Series Classification with Temporal Abstractions
Cited by 13 (5 self)
The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time series data. This work focuses on methods for multivariate time series classification. Time series classification is a challenging problem mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time series features suitable for classification tasks. We propose the STF-Mine algorithm, which automatically mines discriminative temporal abstraction patterns from the time series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real-world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time series datasets.
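Temporal abstraction typically turns a numeric series into labeled intervals such as increasing, steady and decreasing trends. The sketch below shows only that conversion step, assuming a slope threshold `eps` and the labels `I`, `S`, `D` of our choosing; it is not STF-Mine, which goes on to mine discriminative patterns over such intervals across several variables. `trend_abstraction` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[str, int, int]  # (label, start index, end index), both inclusive


def trend_abstraction(values: Sequence[float], eps: float = 0.1) -> List[Interval]:
    """Label each step as I (increasing), D (decreasing) or S (steady) and merge runs."""
    intervals: List[Interval] = []
    for t in range(1, len(values)):
        delta = values[t] - values[t - 1]
        label = "I" if delta > eps else "D" if delta < -eps else "S"
        if intervals and intervals[-1][0] == label:
            _, start, _ = intervals[-1]
            intervals[-1] = (label, start, t)  # extend the current run
        else:
            intervals.append((label, t - 1, t))  # start a new run
    return intervals
```

For multivariate data, the same function would be applied to each variable separately, giving one interval sequence per variable.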
RuleGrowth: Mining Sequential Rules Common to Several Sequences by Pattern-Growth
 SAC 2011
, 2011
Cited by 13 (6 self)
Mining sequential rules from large databases is an important topic in data mining, with wide applications. Most relevant studies have focused on finding sequential rules that appear in a single sequence of events; mining rules common to multiple sequences has been far less explored. In this paper, we present RuleGrowth, a novel algorithm for mining sequential rules common to several sequences. Unlike other algorithms, RuleGrowth uses a pattern-growth approach for discovering sequential rules, so that it can be much more efficient and scalable. We compare RuleGrowth's performance with current algorithms on three public datasets. The experimental results show that RuleGrowth clearly outperforms current algorithms on all three datasets under low support and confidence thresholds and scales much better.
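To make "sequential rules common to several sequences" concrete, here is a brute-force sketch of support and confidence for a rule X → Y. It assumes (our assumption, not quoted from the abstract) that X and Y are disjoint, non-empty, unordered itemsets and that a sequence of itemsets supports the rule when every item of X appears in an itemset strictly before every item of Y. RuleGrowth's contribution is precisely to avoid this exhaustive checking by growing rules pattern by pattern, which the sketch does not attempt; all names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Itemset = Set[str]


def items_of(seq: Sequence[Itemset]) -> Set[str]:
    return set().union(*seq)


def contains_rule(seq: Sequence[Itemset], x: Set[str], y: Set[str]) -> bool:
    """True if all items of x appear in itemsets strictly before all items of y.

    Equivalent to: the latest first-occurrence among x precedes the earliest
    last-occurrence among y. x and y are assumed non-empty and disjoint.
    """
    first, last = {}, {}
    for pos, itemset in enumerate(seq):
        for item in itemset:
            first.setdefault(item, pos)
            last[item] = pos
    if not all(i in first for i in x | y):
        return False
    return max(first[i] for i in x) < min(last[i] for i in y)


def rule_stats(db: List[Sequence[Itemset]], x: Set[str], y: Set[str]) -> Tuple[float, float]:
    """Support = share of sequences containing the rule; confidence = that count / sequences containing x."""
    with_x = [s for s in db if x <= items_of(s)]
    with_rule = [s for s in with_x if contains_rule(s, x, y)]
    support = len(with_rule) / len(db)
    confidence = len(with_rule) / len(with_x) if with_x else 0.0
    return support, confidence
```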
Sequential pattern mining in multiple streams
 In Proc. ICDM
, 2005
Cited by 12 (0 self)
In this paper, we deal with mining sequential patterns in multiple data streams. Building on PrefixSpan, a state-of-the-art sequential pattern mining algorithm for transaction databases, we propose MILE, an efficient algorithm to facilitate the mining process. MILE recursively utilizes the knowledge of existing patterns to avoid redundant data scanning, and can therefore effectively speed up the discovery of new patterns. Another unique feature of MILE is that it can incorporate some prior knowledge of the data distribution in data streams into the mining process to further improve performance. Extensive empirical results show that MILE is significantly faster than PrefixSpan. Because MILE consumes more memory than PrefixSpan, we also present a solution that balances memory usage and time efficiency in memory-constrained environments.
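As background for the algorithm MILE builds on, the sketch below is a minimal PrefixSpan for sequences of single items (no itemsets, no streams). It shows the prefix-projected recursion only; none of MILE's reuse of existing patterns or its use of prior distribution knowledge is included. `prefixspan` and `min_support` (an absolute sequence count) are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def prefixspan(db: List[Sequence[str]], min_support: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent sequential pattern with its support (number of sequences)."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[Sequence[str]]) -> None:
        # Count each item at most once per projected sequence.
        counts = Counter(item for seq in projected for item in set(seq))
        for item, support in counts.items():
            if support < min_support:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = support
            # Project: keep the suffix after the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, suffixes)

    mine((), db)
    return patterns
```

Each recursive call scans only the projected suffixes rather than the whole database, which is the redundancy PrefixSpan already removes and MILE aims to reduce further.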
Density-based Clustering of Time Series Subsequences
Cited by 11 (0 self)
Doubts have been raised that time series subsequences can be clustered in a meaningful way. This paper introduces a kernel-density-based algorithm that detects meaningful patterns in the presence of a vast number of random-walk-like subsequences. The value of density-based algorithms for noise elimination in general has long been demonstrated. The challenge of applying such techniques to time series data consists, first, in specifying which uninteresting sequences are to be considered noise and, second, in ensuring that those uninteresting sequences do not affect the clustering result. Both problems are addressed in this paper, and the success of the technique is demonstrated on several standard data sets.
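The general idea of density-based noise elimination can be sketched as follows: treat each subsequence as a point in w-dimensional space, estimate a Gaussian kernel density at each point from all the others, and discard points whose density falls below a threshold. The bandwidth `h`, the threshold, dropping the kernel's normalizing constant, and the function names are assumptions for illustration; the paper's actual kernel, noise model and cluster extraction are not reproduced.

```python
import math
from typing import List, Sequence


def gaussian_density(points: List[Sequence[float]], h: float) -> List[float]:
    """Unnormalized Gaussian kernel density at each point, excluding the point itself."""
    densities = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(p, q))
                total += math.exp(-d2 / (2 * h * h))
        densities.append(total)
    return densities


def dense_points(points: List[Sequence[float]], h: float, threshold: float) -> List[int]:
    """Indices of points whose density reaches the threshold; the rest are treated as noise."""
    return [i for i, d in enumerate(gaussian_density(points, h)) if d >= threshold]
```

Only the points that survive this filter would then be handed to a clustering step.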
F4: large-scale automated forecasting using fractals
 In: Proc. of CIKM’02. (2002) 2–9
, 2002
Cited by 11 (3 self)
Forecasting has attracted a lot of research interest, with very successful methods for periodic time series. Here, we propose a fast, automated method for nonlinear forecasting of both periodic and chaotic time series. We use the technique of delay coordinate embedding, which needs several parameters; our contribution is an automated way of setting these parameters, using the concept of ‘intrinsic dimensionality’. Our operational system has fast and scalable algorithms for preprocessing and, using R-trees, also has fast methods for forecasting. The result of this work is a black box which, given a time series as input, finds the best parameter settings and generates a prediction system. Tests on real and synthetic data show that our system achieves low error and can handle arbitrarily large datasets.
Categories and Subject Descriptors: H.2.8 [Database Applications]: Data Mining (time series forecasting)
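Delay coordinate embedding turns a scalar series into vectors of lagged values, after which forecasting reduces to a nearest-neighbour lookup. The sketch below fixes the lag `tau`, the embedding dimension `L` and the neighbour count `k` by hand and uses a linear scan; choosing these parameters automatically via intrinsic dimensionality and searching with R-trees, which is what F4 contributes, is not shown. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def embed(x: Sequence[float], L: int, tau: int) -> List[List[float]]:
    """Delay vectors [x[t-(L-1)*tau], ..., x[t-tau], x[t]] for every valid t."""
    start = (L - 1) * tau
    return [[x[t - (L - 1 - i) * tau] for i in range(L)] for t in range(start, len(x))]


def forecast_next(x: Sequence[float], L: int, tau: int, k: int) -> float:
    """Predict the next value by averaging what followed the k past delay vectors nearest the latest one."""
    vectors = embed(x, L, tau)
    query = vectors[-1]
    start = (L - 1) * tau
    candidates = []
    # Vector v ends at time start + v; its known successor is x[start + v + 1].
    for v, vec in enumerate(vectors[:-1]):
        candidates.append((math.dist(vec, query), x[start + v + 1]))
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    return sum(nxt for _, nxt in nearest) / len(nearest)
```

On a periodic series, the nearest delay vectors are exact repeats one period back, so their successors give the forecast directly.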
Recursive information granulation: aggregation and interpretation issues
IEEE Trans. Systems, Man and Cybernetics, Part B
Cited by 11 (5 self)
This paper contributes to the conceptual and algorithmic framework of information granulation. We revisit the role of information granules that are relevant to several main classes of technical pursuits involving temporal and spatial granulation. The paper provides a detailed algorithm of information granulation, regarded as an optimization problem that reconciles two conflicting design criteria, namely, the specificity of information granules and their experimental relevance (coverage of numeric data). The resulting information granules are formalized in the language of set theory (interval analysis). The uniform treatment of data points and data intervals (sets) allows for a recursive application of the algorithm. We assess the quality of information granules through the application of the fuzzy C-means (FCM) clustering algorithm. Numerical studies deal with two-dimensional (2-D) synthetic data and experimental traffic data.
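One common way to reconcile coverage and specificity, assumed here for illustration rather than taken from the paper, is to maximize their product over candidate intervals. The one-dimensional sketch below scores every interval whose endpoints are data points by coverage (share of points inside) times specificity (1 minus the interval's width relative to the data range). The paper's 2-D setting, its recursive application and its FCM-based assessment are not reproduced; `best_granule` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple


def best_granule(data: Sequence[float]) -> Tuple[float, float, float]:
    """Interval [a, b] with data-point endpoints maximizing coverage * specificity."""
    xs = sorted(data)
    n = len(xs)
    span = (xs[-1] - xs[0]) or 1.0
    best = (xs[0], xs[0], -1.0)
    for i in range(n):
        for j in range(i, n):
            a, b = xs[i], xs[j]
            coverage = (bisect_right(xs, b) - bisect_left(xs, a)) / n  # share of points in [a, b]
            specificity = 1.0 - (b - a) / span  # 1 for a single point, 0 for the whole range
            score = coverage * specificity
            if score > best[2]:
                best = (a, b, score)
    return best
```

A wide interval covers more data but says less; a narrow one is precise but covers little. The product rewards intervals that are both, so an isolated outlier is left outside the granule.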