Results 1–10 of 16
Fast Time Series Classification Using Numerosity Reduction
In ICML’06, 2006
Abstract

Cited by 34 (10 self)
Many algorithms have been proposed for the problem of time series classification. However, it is clear that one-nearest-neighbor with Dynamic Time Warping (DTW) distance is exceptionally difficult to beat. This approach has one weakness, however: it is computationally too demanding for many real-time applications. One way to mitigate this problem is to speed up the DTW calculations. Nonetheless, there is a limit to how much this can help. In this work, we propose an additional technique, numerosity reduction, to speed up one-nearest-neighbor DTW. While the idea of numerosity reduction for nearest-neighbor classifiers has a long history, we show here that we can leverage an original observation about the relationship between dataset size and DTW constraints to produce an extremely compact dataset with little or no loss in accuracy. We test our ideas with a comprehensive set of experiments, and show that they can efficiently produce extremely fast, accurate classifiers.
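The 1-NN DTW baseline this abstract refers to can be sketched in a few lines. This is a generic textbook implementation (with an optional Sakoe–Chiba warping window, the kind of DTW constraint the abstract's observation concerns), not the paper's optimized code:

```python
import numpy as np

def dtw_distance(a, b, window=None):
    """Dynamic Time Warping distance between two 1-D sequences.

    `window` is an optional Sakoe-Chiba band half-width that constrains
    how far the warping path may stray from the diagonal.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def nn_classify(query, train_X, train_y, window=None):
    """1-nearest-neighbor classification under DTW distance."""
    dists = [dtw_distance(query, x, window) for x in train_X]
    return train_y[int(np.argmin(dists))]
```

Each query pays one DTW computation per training example, which is why shrinking `train_X` (numerosity reduction) attacks the dominant cost directly.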
Semi-Supervised Time Series Classification
Abstract

Cited by 18 (2 self)
The problem of time series classification has attracted great interest in the last decade. However, current research assumes the existence of large amounts of labeled training data. In reality, such data may be very difficult or expensive to obtain, since labeling may require the time and expertise of cardiologists, space launch technicians, or other domain specialists. As in many other domains, there are often copious amounts of unlabeled data available; for example, the PhysioBank archive contains gigabytes of ECG data. In this work we propose a semi-supervised technique for building time series classifiers. While such algorithms are well known in text domains, we will show that special considerations must be made to make them both efficient and effective for the time series domain. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, handwritten documents, manufacturing, and video datasets. The experimental results demonstrate that our approach requires only a handful of labeled examples to construct accurate classifiers.
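A minimal self-training loop of the kind this abstract builds on might look as follows. The fixed round count and Euclidean distance here are placeholder choices for illustration; the paper's contribution includes the time-series-specific considerations (such as when to stop adding examples) that this sketch omits:

```python
import numpy as np

def euclidean(a, b):
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def self_train(labeled, labels, unlabeled, rounds=10, dist=euclidean):
    """Generic self-training sketch: repeatedly move the unlabeled series
    closest to the labeled set into it, labeling it with its nearest
    labeled neighbor's class. A fixed `rounds` budget stands in for a
    principled stopping criterion.
    """
    labeled = [np.asarray(x, dtype=float) for x in labeled]
    labels = list(labels)
    pool = [np.asarray(x, dtype=float) for x in unlabeled]
    for _ in range(min(rounds, len(pool))):
        # find the closest (unlabeled, labeled) pair
        i, j, _ = min(
            ((i, j, dist(u, l)) for i, u in enumerate(pool)
             for j, l in enumerate(labeled)),
            key=lambda t: t[2])
        labeled.append(pool.pop(i))
        labels.append(labels[j])  # adopt the nearest neighbor's label
    return labeled, labels
```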
Time Series Clustering: Complex is Simpler!
Abstract

Cited by 8 (1 self)
Given a motion capture sequence, how do we identify the category of the motion? Classifying human motions is a critical task in motion editing and synthesis, for which manual labeling is clearly inefficient for large databases. Here we study the general problem of time series clustering. We propose a novel method of clustering time series that can (a) learn joint temporal dynamics in the data; (b) handle time lags; and (c) produce interpretable features. We achieve this by developing complex-valued linear dynamical systems (CLDS), which include real-valued Kalman filters as a special case; our advantage is that the transition matrix is simpler (just diagonal), and the transmission one easier to interpret. We then present Complex-Fit, a novel EM algorithm to learn the parameters for the general model and its special case for clustering. Our approach produces significant improvement in clustering quality, 1.5 to 5 times better than well-known competitors on real motion capture sequences.
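The structural simplification the abstract highlights, a diagonal complex-valued transition matrix, can be illustrated with a toy simulator; the function and parameter names are illustrative assumptions, not the paper's CLDS formulation, and the learning (EM) step is omitted entirely:

```python
import numpy as np

def clds_simulate(eigs, proj, z0, steps):
    """Noise-free forward simulation of a complex-valued linear dynamical
    system with a *diagonal* transition matrix:

        z_{t+1} = diag(eigs) @ z_t,   x_t = Re(proj @ z_t)

    A complex eigenvalue e^{i*theta} on the unit circle yields a sustained
    oscillation of angular frequency theta -- behaviour a real diagonal
    transition matrix cannot produce, which is why allowing complex values
    keeps the matrix simple yet expressive.
    """
    eigs = np.asarray(eigs, dtype=complex)
    proj = np.asarray(proj)
    z = np.asarray(z0, dtype=complex)
    out = []
    for _ in range(steps):
        out.append(np.real(proj @ z))  # observe the real part
        z = eigs * z                   # diagonal transition: elementwise
    return np.array(out)
```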
Using Feature-based Fitness Evaluation in Symbolic Regression with Added Noise
, 2008
Abstract

Cited by 3 (2 self)
Symbolic regression is a popular genetic programming (GP) application. Typically, the fitness function for this task is based on a sum of errors involving the values of the dependent variable directly calculated from the candidate expression. While this approach is extremely successful in many instances, its performance can deteriorate in the presence of noise. In this paper, a feature-based fitness function is considered, in which the fitness scores are determined by comparing the statistical features of the sequence of values, rather than the actual values themselves. The set of features used in the fitness evaluation is customized according to the target, and is drawn from a wide set of features capable of characterizing a variety of behaviours. Experiments examining the performance of the feature-based and standard fitness functions are carried out for non-oscillating and oscillating targets in a GP system which introduces noise during the evaluation of candidate expressions. Results show the strength of the feature-based fitness function, especially for the oscillating target.
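The core idea — scoring candidates by statistical features of their output rather than by point-by-point errors — can be sketched with a deliberately tiny feature set (mean, spread, lag-1 autocorrelation). The paper draws on a much wider, target-customized feature set; this is only a minimal stand-in:

```python
import numpy as np

def features(y):
    """A small illustrative feature vector: mean, standard deviation,
    and lag-1 autocorrelation."""
    y = np.asarray(y, dtype=float)
    ac1 = float(np.corrcoef(y[:-1], y[1:])[0, 1]) if len(y) > 2 else 0.0
    return np.array([y.mean(), y.std(), ac1])

def feature_fitness(candidate_vals, target_vals):
    """Distance between feature vectors instead of a sum of errors, so
    additive noise that leaves the statistical character of the series
    intact is penalized far less than a genuine behavioural mismatch."""
    return float(np.linalg.norm(features(candidate_vals) - features(target_vals)))
```

Note how a lightly perturbed copy of the target scores close to zero, while a rescaled copy (same shape, different spread) is penalized heavily.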
Early Prediction on Time Series: A Nearest Neighbor Approach
Abstract

Cited by 3 (1 self)
In this paper, we formulate the problem of early classification of time series data, which is important in time-sensitive applications such as health informatics. We introduce a novel concept of MPL (Minimum Prediction Length) and develop ECTS (Early Classification on Time Series), an effective 1-nearest-neighbor classification method. ECTS makes early predictions while retaining accuracy comparable to that of a 1-NN classifier using the full-length time series. Our empirical study using benchmark time series data sets shows that ECTS works well on the real data sets where 1-NN classification is effective.
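A simplified version of prefix-based 1-NN classification can be sketched as follows. Note the key difference from the paper: ECTS *learns* the minimum prediction length from the training data, whereas this sketch takes it as a user-supplied parameter:

```python
import numpy as np

def prefix_nn(train_X, train_y, stream, min_len):
    """Classify an incoming series from its prefix.

    Compares the prefix seen so far against the same-length prefix of each
    training series (assumed at least as long as the stream) and predicts
    at the earliest allowed time step. Returns (label, time_of_prediction).
    """
    for t in range(1, len(stream) + 1):
        if t < min_len:
            continue  # not enough of the stream has arrived yet
        prefix = np.asarray(stream[:t], dtype=float)
        dists = [float(np.linalg.norm(np.asarray(x[:t], dtype=float) - prefix))
                 for x in train_X]
        return train_y[int(np.argmin(dists))], t
    return None, len(stream)
```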
Mining Heterogeneous Multivariate Time-Series for Learning Meaningful Patterns: Application to Home Health Telecare. Laboratory TIMC-IMAG, Faculté de médecine de
, 2004
Abstract

Cited by 2 (0 self)
In recent years, time-series mining has become a challenging issue for researchers. An important application lies in monitoring, which requires analyzing large sets of time-series to learn usual patterns; any deviation from this learned profile is then considered an unexpected situation. Moreover, complex applications may involve the temporal study of several heterogeneous parameters. In this paper, we propose a method for mining heterogeneous multivariate time-series to learn meaningful patterns. The proposed approach allows for mixed time-series – containing both pattern and non-pattern data – as well as imprecise matches, outliers, and stretching and global translation of pattern instances in time. We present the early results of our approach in the context of monitoring the health status of a person at home. The purpose is to build a behavioral profile of a person by analyzing the time variations of several quantitative or qualitative parameters recorded by sensors installed in the home.
Evolving Stochastic Processes Using Feature Tests and Genetic Programming
, 2009
Abstract

Cited by 2 (2 self)
The synthesis of stochastic processes using genetic programming is investigated. Stochastic process behaviours take the form of time series data, in which quantities of interest vary over time in a probabilistic, and often noisy, manner. A suite of statistical feature tests is performed on time series plots from example processes, and the resulting feature values are used as targets during evolutionary search. A process algebra, the stochastic π-calculus, is used to denote processes. The investigations consider variations of GP representations for a subset of the stochastic π-calculus, for example, the use of channel unification and various grammatical constraints. Target processes of varying complexity are studied. Results show that grammatical GP with statistical feature tests can successfully synthesize stochastic processes. Success depends upon selecting feature tests appropriate for characterizing the target behaviour, and upon the complexity of the target process.
The Evolution of Higher-level Biochemical Reaction Models
, 2010
Abstract

Cited by 2 (1 self)
Computational tools for analyzing biochemical phenomena are becoming increasingly important. Recently, high-level formal languages for modeling and simulating biochemical reactions have been proposed. These languages make the formal modeling of complex reactions accessible to domain specialists outside of theoretical computer science. This research explores the use of genetic programming to automate the construction of models written in one such language. Given a description of desired time-course data, the goal is for genetic programming to construct a model that might generate the data. The language investigated is Kahramanoğulları's and Cardelli's PIM language. The PIM syntax is defined in a grammar-guided genetic programming system. All time series generated during simulations are described by statistical feature tests, and the fitness evaluation compares feature proximity between the target and candidate solutions. Target PIM models of varying complexity are used as target expressions for genetic programming. Results were very successful in all cases. One reason for this success is the compositional nature of PIM, which is amenable to genetic program search.
Finding structural similarity in time series data using Bag-of-Patterns representation
In M. Winslett (Ed.), Proceedings of the 21st International Conference on Scientific and Statistical Database Management (SSDBM 2009)
, 2009
Abstract

Cited by 1 (1 self)
For more than a decade, time series similarity search has been given a great deal of attention by data mining researchers. As a result, many time series representations and distance measures have been proposed. However, most existing work on time series similarity search focuses on finding shape-based similarity. While some of the existing approaches work well for short time series data, they typically fail to produce satisfactory results when the sequence is long. For long sequences, it is more appropriate to consider similarity based on higher-level structures. In this work, we present a histogram-based representation for time series data, similar to the “bag of words” approach that is widely accepted by the text mining and information retrieval communities. We show that our approach outperforms the existing methods in clustering, classification, and anomaly detection on several real datasets.
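The histogram representation can be sketched as follows. This is a hedged approximation: published Bag-of-Patterns work builds its words with SAX (breakpoints drawn from the Gaussian distribution), while this sketch uses plain equal-width binning for brevity; the window, word-length, and alphabet parameters are illustrative defaults:

```python
import numpy as np
from collections import Counter

def bag_of_patterns(series, win=8, word_len=4, alphabet=3):
    """Histogram of discretized subsequence 'words'.

    Slides a window over the series, z-normalizes each window (so the
    representation captures shape, not level), reduces it to `word_len`
    segment means, and discretizes each mean into one of `alphabet`
    symbols. Assumes `win` is a multiple of `word_len`.
    """
    series = np.asarray(series, dtype=float)
    bins = np.linspace(-1.5, 1.5, alphabet - 1)  # equal-width breakpoints
    words = []
    for start in range(len(series) - win + 1):
        sub = series[start:start + win]
        sub = (sub - sub.mean()) / (sub.std() + 1e-8)       # z-normalize
        segs = sub.reshape(word_len, win // word_len).mean(axis=1)
        words.append(''.join(chr(ord('a') + int(np.digitize(v, bins)))
                             for v in segs))
    return Counter(words)
```

Two series can then be compared by any vector distance over their word histograms, regardless of sequence length.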
Categorizing classes of signals by means of fuzzy gradual rules
Abstract

Cited by 1 (1 self)
This paper presents an approach to the approximate description of univariate real-valued functions in terms of precise or imprecise reference points and interpolation between these points. It is achieved by means of gradual rules, which express that the closer the variable is to the abscissa of a reference point, the closer the value of the function is to the ordinate of that reference point. Gradual rules enable us to specify sophisticated gauges, in the form of connected areas, inside which a function belonging to the class under consideration should remain. This provides a simple and efficient tool for categorizing signals. The tool can be further improved by making the gauge flexible by means of fuzzy gradual rules. This is illustrated on a benchmark example.
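The gauge idea — a connected area the signal must stay inside — can be approximated crudely with a fixed-tolerance envelope around the interpolated reference points. The paper derives the area from (fuzzy) gradual rules rather than a constant tolerance, so treat this purely as an illustration of the membership test:

```python
import numpy as np

def gauge_from_points(xs, ys, tol):
    """Build a simple gauge by linear interpolation between reference
    points (xs[i], ys[i]), widened by a constant tolerance `tol`.
    Returns a predicate inside(x, y) testing whether a signal sample
    falls within the gauge area.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)

    def inside(x, y):
        center = np.interp(x, xs, ys)      # interpolated reference value
        return center - tol <= y <= center + tol

    return inside

# Example gauge: a triangular reference shape rising to 1 at x = 1.
inside = gauge_from_points([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], tol=0.2)
```

A signal is accepted for the class when all of its samples satisfy the predicate; the fuzzy version would return a degree of membership instead of a boolean.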