Fast Time Series Classification Using Numerosity Reduction
In ICML’06, 2006
Cited by 17 (6 self)
Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however: it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations. Nonetheless, there is a limit to how much this can help. In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy. We test our ideas with a comprehensive set of experiments, and show that they can efficiently produce extremely fast, accurate classifiers.
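For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of one-nearest-neighbor classification under a band-constrained DTW distance. It is not the paper's numerosity reduction algorithm; the function names, the `window` parameter, and the squared-difference local cost are assumptions chosen for illustration.

```python
import math

def dtw_distance(a, b, window=None):
    """DTW distance between two numeric sequences.

    `window` is an assumed band half-width (Sakoe-Chiba style);
    None means the warping path is unconstrained.
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, or no warping path exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band around the diagonal are filled in.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])

def classify_1nn(query, train, window=None):
    """Return the label of the training series closest to `query` under DTW.

    `train` is a list of (series, label) pairs.
    """
    best_label, best_dist = None, math.inf
    for series, label in train:
        dist = dtw_distance(query, series, window)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

The cost of `classify_1nn` grows with both the number of training series and the band width, which is why shrinking the training set and narrowing the band are natural levers for speeding it up.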
Using Feature-based Fitness Evaluation in Symbolic Regression with Added Noise
2008
Cited by 3 (2 self)
Symbolic regression is a popular genetic programming (GP) application. Typically, the fitness function for this task is based on a sum-of-errors, involving the values of the dependent variable directly calculated from the candidate expression. While this approach is extremely successful in many instances, its performance can deteriorate in the presence of noise. In this paper, a feature-based fitness function is considered, in which the fitness scores are determined by comparing the statistical features of the sequence of values, rather than the actual values themselves. The set of features used in the fitness evaluation is customized according to the target, and is drawn from a wide set of features capable of characterizing a variety of behaviours. Experiments examining the performance of the feature-based and standard fitness functions are carried out for non-oscillating and oscillating targets in a GP system which introduces noise during the evaluation of candidate expressions. Results show strength in the feature-based fitness function, especially for the oscillating target.
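The contrast between the two fitness styles can be sketched as follows. The three features used here (mean, population standard deviation, mean absolute step) are hypothetical stand-ins; the abstract does not list the paper's actual feature set, and lower scores are assumed to be better.

```python
import statistics

def sum_of_errors(target, candidate):
    """Standard fitness: total pointwise absolute error."""
    return sum(abs(t - c) for t, c in zip(target, candidate))

def features(values):
    """Hypothetical feature vector for a sequence of at least two values."""
    steps = [abs(b - a) for a, b in zip(values, values[1:])]
    return (
        statistics.fmean(values),   # overall level
        statistics.pstdev(values),  # spread
        statistics.fmean(steps),    # average point-to-point change
    )

def feature_fitness(target, candidate):
    """Feature-based fitness: distance between feature vectors, not raw values."""
    return sum(abs(t - c) for t, c in zip(features(target), features(candidate)))
```

For an oscillating target such as `[0, 1, 0, -1] * 4`, a candidate that is merely shifted by one step is penalized at every point by `sum_of_errors`, while `feature_fitness` treats the two sequences as equivalent because their summary statistics match.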
Time Series Clustering: Complex is Simpler!
Cited by 3 (0 self)
Given a motion capture sequence, how can we identify the category of the motion? Classifying human motions is a critical task in motion editing and synthesizing, for which manual labeling is clearly inefficient for large databases. Here we study the general problem of time series clustering. We propose a novel method of clustering time series that can (a) learn joint temporal dynamics in the data; (b) handle time lags; and (c) produce interpretable features. We achieve this by developing complex-valued linear dynamical systems (CLDS), which include real-valued Kalman filters as a special case; our advantage is that the transition matrix is simpler (just diagonal), and the transmission one easier to interpret. We then present Complex-Fit, a novel EM algorithm to learn the parameters for the general model and its special case for clustering. Our approach produces significant improvement in clustering quality, 1.5 to 5 times better than well-known competitors on real motion capture sequences.
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare. Laboratory TIMC-IMAG, Faculté de médecine de
2004
Cited by 2 (0 self)
In recent years, time-series mining has become a challenging issue for researchers. An important application lies in monitoring, which requires analyzing large sets of time-series to learn usual patterns. Any deviation from this learned profile is then considered an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In this paper, we propose a method for mining heterogeneous multivariate time-series for learning meaningful patterns. The proposed approach allows for mixed time-series (containing both pattern and non-pattern data), as well as for imprecise matches, outliers, stretching, and global translation of pattern instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded through a set of sensors installed in the home.
Evolving Stochastic Processes Using Feature Tests and Genetic Programming
2009
Cited by 2 (2 self)
The synthesis of stochastic processes using genetic programming is investigated. Stochastic process behaviours take the form of time series data, in which quantities of interest vary over time in a probabilistic, and often noisy, manner. A suite of statistical feature tests is performed on time series plots from example processes, and the resulting feature values are used as targets during evolutionary search. A process algebra, the stochastic π-calculus, is used to denote processes. Investigations consider variations of GP representations for a subset of the stochastic π-calculus, for example, the use of channel unification, and various grammatical constraints. Target processes of varying complexity are studied. Results show that the use of grammatical GP with statistical feature tests can successfully synthesize stochastic processes. Success depends upon the selection of appropriate feature tests for characterizing the target behaviour, and on the complexity of the target process.
The Evolution of Higher-level Biochemical Reaction Models
2010
Cited by 1 (1 self)
Computational tools for analyzing biochemical phenomena are becoming increasingly important. Recently, high-level formal languages for modeling and simulating biochemical reactions have been proposed. These languages make the formal modeling of complex reactions accessible to domain specialists outside of theoretical computer science. This research explores the use of genetic programming to automate the construction of models written in one such language. Given a description of desired time-course data, the goal is for genetic programming to construct a model that might generate the data. The language investigated is Kahramanoğulları and Cardelli’s PIM language. The PIM syntax is defined in a grammar-guided genetic programming system. All time series generated during simulations are described by statistical feature tests, and the fitness evaluation compares feature proximity between the target and candidate solutions. PIM models of varying complexity are used as target expressions for genetic programming. Results were very successful in all cases. One reason for this success is the compositional nature of PIM, which is amenable to genetic programming search.
Early Prediction on Time Series: A Nearest Neighbor Approach
In this paper, we formulate the problem of early classification of time series data, which is important in some time-sensitive applications such as health informatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classification on Time Series), an effective 1-nearest neighbor classification method. ECTS makes early predictions and at the same time retains accuracy comparable to that of a 1NN classifier using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1NN classification is effective.
Characteristic-based Clustering for Time Series Data
With the growing importance of time series clustering research, particularly for similarity searches amongst long time series such as those arising in medicine or finance, it is critical for us to find a way to resolve the outstanding problems that make most clustering methods impractical under certain circumstances. When the time series is very long, some clustering algorithms may fail because the very notion of similarity is dubious in high-dimensional space; many methods cannot handle missing data when the clustering is based on a distance metric. This paper proposes a method for clustering of time series based on their structural characteristics. Unlike other alternatives, this method does not cluster point values using a distance metric; rather, it clusters based on global features extracted from the time series. The feature measures are obtained from each individual series and can be fed into arbitrary clustering algorithms, including an unsupervised neural network algorithm, self-organizing map, or hierarchical clustering algorithm. Global measures describing the time series are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity. Since the method clusters using extracted global measures, it reduces the dimensionality of the time series and is much less sensitive to missing or noisy data. We further provide a search mechanism to find the best selection from the feature set that should be used as the clustering inputs. The proposed technique has been tested using benchmark time series datasets previously reported for time series clustering and a set of time series datasets with known characteristics. The empirical results show that our approach is able to yield meaningful clusters. The resulting clusters are similar to those produced by other methods, but with some promising and interesting variations that can be intuitively explained with knowledge of the global characteristics of the time series.
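As a rough illustration of the idea of replacing each series with a fixed-length vector of global measures, the sketch below computes four of the characteristics named in the abstract using textbook definitions. The function name and the specific estimators (least-squares slope for trend, lag-1 autocorrelation for serial correlation, moment-based skewness and kurtosis) are assumptions, not necessarily the measures used in the paper.

```python
import math

def global_features(series):
    """Map a non-constant series of any length >= 2 to four global measures."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    var = sum(d * d for d in dev) / n          # population variance
    std = math.sqrt(var)
    # Moment-based shape measures.
    skew = sum(d ** 3 for d in dev) / (n * std ** 3)
    kurt = sum(d ** 4 for d in dev) / (n * var ** 2)
    # Lag-1 autocorrelation as a simple serial-correlation measure.
    acf1 = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n * var)
    # Least-squares slope of the series against its time index.
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * d for t, d in enumerate(dev))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return {
        "trend": slope,
        "serial_correlation": acf1,
        "skewness": skew,
        "kurtosis": kurt,
    }
```

Because every series is reduced to the same four numbers regardless of its length, the resulting vectors can be passed to any standard clustering routine.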
Invariant Time-Series Classification
Time-series classification is a field of machine learning that has attracted considerable focus during recent decades. The large number of time-series application areas ranges from medical diagnosis to financial econometrics. Support Vector Machines (SVMs) are reported to perform non-optimally in the domain of time series, because they struggle to detect similarities when training instances are not abundant. In this study we present a novel time-series transformation method which significantly improves the performance of SVMs. Our transformation method is used to enlarge the training set by creating new transformed instances from the support vector instances. The new transformed instances encapsulate the necessary intra-class variations required to redefine the maximum margin decision boundary. The proposed transformation method utilizes the variance distributions from the intra-class warping maps to build transformation fields, which are applied to series instances using the Moving Least Squares algorithm. Extensive experiments on 35 time series datasets demonstrate the superiority of the proposed method compared to both the Dynamic Time Warping version of the Nearest Neighbor and the SVM classifiers, outperforming them in the majority of the experiments.

