Results 1–10 of 11
Fast subsequence matching in time-series databases
 PROCEEDINGS OF THE 1994 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 1994
Abstract

Cited by 528 (24 self)
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into subtrails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
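The window-to-trail-to-MBR pipeline described in this abstract can be sketched as follows. This is a simplified illustration, not the paper's method: it uses fixed-length segmentation in place of the paper's adaptive trail-splitting algorithm, and the function name and parameters are hypothetical.

```python
import numpy as np

def sliding_window_mbrs(seq, window, n_features=2, segment=8):
    # Slide a window over the sequence and keep the magnitudes of the
    # first few DFT coefficients as features; each window becomes a
    # point, and consecutive points form a "trail" in feature space.
    points = []
    for i in range(len(seq) - window + 1):
        coeffs = np.fft.fft(seq[i:i + window])[:n_features]
        points.append(np.abs(coeffs))
    trail = np.array(points)
    # Group consecutive trail points into fixed-length subtrails and
    # summarize each by its Minimum Bounding Rectangle (min/max corner).
    mbrs = []
    for start in range(0, len(trail), segment):
        chunk = trail[start:start + segment]
        mbrs.append((chunk.min(axis=0), chunk.max(axis=0)))
    return mbrs

mbrs = sliding_window_mbrs(np.sin(np.linspace(0, 10, 64)), window=16)
```

The resulting rectangles are what would be handed to a spatial index such as an R*-tree.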
Efficient similarity search in sequence databases
, 1994
Abstract

Cited by 505 (21 self)
We propose an indexing method for time sequences for processing similarity queries. We use the Discrete Fourier Transform (DFT) to map time sequences to the frequency domain, the crucial observation being that, for most sequences of practical interest, only the first few frequencies are strong. Another important observation is Parseval's theorem, which specifies that the Fourier transform preserves the Euclidean distance in the time or frequency domain. Having thus mapped sequences to a lower-dimensionality space by using only the first few Fourier coefficients, we use R-trees to index the sequences and efficiently answer similarity queries. We provide experimental results which show that our method is superior to search based on sequential scanning. Our experiments show that a few coefficients (1–3) are adequate to provide good performance. The performance gain of our method increases with the number and length of sequences.
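The key guarantee behind this abstract's approach is that, by Parseval's theorem, distance computed on a truncated set of DFT coefficients can never exceed the true Euclidean distance, so index lookups produce no false dismissals. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def dft_features(x, k=3):
    # Keep only the first k DFT coefficients, scaled by 1/sqrt(N) so
    # that, by Parseval's theorem, distances in the full frequency
    # domain equal distances in the time domain.
    return np.fft.fft(x)[:k] / np.sqrt(len(x))

rng = np.random.default_rng(0)
x, y = rng.standard_normal(64), rng.standard_normal(64)
true_dist = np.linalg.norm(x - y)
# Dropping coefficients can only shrink the distance, so this is a
# lower bound on true_dist: candidate matches may be false alarms
# (filtered afterwards) but never false dismissals.
lb = np.linalg.norm(dft_features(x) - dft_features(y))
```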
A nearest trajectory strategy for time series prediction
 Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling
Abstract

Cited by 26 (3 self)
A method of local modeling for predicting time series generated by nonlinear dynamic systems is proposed that incorporates a weighted Euclidean metric and a novel ρ-steps-ahead cross-validation error to assess model accuracy. The trade-off between the cost of computation and model accuracy is discussed in the context of optimizing model parameters. A fast nearest neighbor algorithm and a novel modification to find neighboring trajectory segments are described.
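The core search step in such a nearest-trajectory strategy can be sketched as a brute-force scan for the historical segment closest to the query under a weighted Euclidean metric. This is a plain illustration of the idea, not the paper's fast algorithm, and the weighting scheme here is an assumption:

```python
import numpy as np

def weighted_nearest_segment(history, query, weights):
    # Slide a window of len(query) over the history and return the
    # start index and distance of the segment closest to the query
    # under the weighted Euclidean metric d(s, q) = sqrt(sum w*(s-q)^2).
    m = len(query)
    best_i, best_d = None, np.inf
    for i in range(len(history) - m + 1):
        seg = history[i:i + m]
        d = np.sqrt(np.sum(weights * (seg - query) ** 2))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

history = np.arange(20.0)
idx, dist = weighted_nearest_segment(history, history[5:9], np.ones(4))
```

A local model would then be fit on the neighbors found this way to produce the ρ-steps-ahead forecast.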
Hidden Markov independent component analysis
 Advances in independent component analysis
, 2000
Support Vector Machines and Learning about Time
, 2003
Abstract

Cited by 6 (1 self)
The analysis of temporal data is an important issue in current research, because most real-world data either explicitly or implicitly contains some information about time. The key to successfully solving temporal learning tasks is to analyze the assumptions and prior knowledge one has about the temporal process underlying the learning problem, and to find a representation of the data and a learning algorithm that make effective use of this knowledge. This paper presents a concise overview of the application of Support Vector Machines to different temporal learning tasks and the corresponding temporal representations.
Products and Sums of Tree-Structured Gaussian Processes
Abstract

Cited by 3 (3 self)
Recently Hinton (1999) has introduced the Products of Experts (PoE) model. In this paper we consider a PoE model in which each expert is a Gaussian, giving rise to a product model that is also Gaussian. However, if we constrain each expert to be a tree-structured Gaussian process (TSGP), the product of these then has a more complex structure than the individual trees. By way of comparison we also consider the framework within which the resultant process is constructed from the sum of tree-structured Gaussian processes. The result of this method is also a Gaussian process. We investigate the approximation of various target stationary processes with these Product of Experts and Sum of Experts models. Our results show that the preferred choice between the two models depends on the type of target process. We also show that for AR(1) and MA(2) target processes, an exact representation of these processes using only two component TSGPs can be found.
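The closure property the abstract relies on, that a product of Gaussian experts is again Gaussian, is easy to verify in the scalar case: precisions add, and the mean is the precision-weighted average of the expert means. A minimal sketch:

```python
import numpy as np

def product_of_gaussians(means, variances):
    # Renormalized product of scalar Gaussian expert densities:
    # precision of the product = sum of expert precisions,
    # mean = precision-weighted average of expert means.
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    prec = np.sum(1.0 / variances)
    mean = np.sum(means / variances) / prec
    return mean, 1.0 / prec

# Two equally confident experts at 0 and 2: the product is centered
# between them and is sharper than either expert alone.
m, v = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
```

The sum-of-experts construction is Gaussian for a different reason: a sum of independent Gaussian processes is Gaussian with the covariances added.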
unknown title
, 1999
Abstract
Opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, DARPA, or other funding parties.
Prediction as a Knowledge Representation Problem: A Case Study in Model Design
, 2002
Abstract
The WITAS project aims to develop technologies to enable an Unmanned Aerial Vehicle (UAV) to operate autonomously and intelligently, in applications such as traffic surveillance and remote photogrammetry. Many of the necessary control and reasoning tasks, e.g. state estimation, re-identification, planning and diagnosis, involve prediction as an important component. Prediction relies on models, and such models can take a variety of forms. Model design involves many choices with many alternatives for each choice, and each alternative carries advantages and disadvantages that may be far from obvious. In spite of this, and of the important role of prediction in so many areas, the problem of predictive model design is rarely studied on its own.
© TÜBİTAK — Testing the Residuals of an ARIMA Model on the Çekerek Stream Watershed in Turkey
Abstract
In ARIMA modeling studies, the selection of the best model fit to historical data is directly related to whether residual analysis is performed well. Therefore, diagnostic checks of the independence, normality and homoscedasticity of residuals are the most important stage of ARIMA model building. This study is concerned with testing residuals from ARIMA models for monthly streamflow data from the Çekerek Stream watershed. Alternative tests were used, including the Ljung-Box Q statistic, runs test and turning point test for independence analysis of the residuals; Kolmogorov-Smirnov and Anderson-Darling tests for normality of residuals; and Goldfeld-Quandt, Breusch-Pagan and Spearman's rho approaches for homoscedasticity of residuals. The selected parsimonious model for each data set among the ARIMA models fulfilled the diagnostic checks, considering the Schwarz Bayesian criterion. Key words: ARIMA model, Monthly streamflow, Çekerek Stream, Diagnostic checks.
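The first of the independence diagnostics named above, the Ljung-Box Q statistic, is straightforward to compute directly: Q = n(n+2) Σ_{k=1..h} r_k²/(n−k), which under the white-noise null follows a chi-square distribution with h degrees of freedom. A minimal sketch (a from-scratch illustration, not the study's implementation):

```python
import numpy as np

def ljung_box_q(resid, h=10):
    # Ljung-Box portmanteau statistic on the first h residual
    # autocorrelations; large Q indicates dependent residuals.
    x = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(x)
    denom = np.sum(x ** 2)
    q = 0.0
    for k in range(1, h + 1):
        r_k = np.sum(x[:-k] * x[k:]) / denom  # lag-k autocorrelation
        q += r_k ** 2 / (n - k)
    return n * (n + 2) * q

rng = np.random.default_rng(1)
white = rng.standard_normal(200)
# For white-noise residuals Q should typically stay below the 5%
# chi-square critical value (about 18.31 for h = 10); a random walk,
# being strongly autocorrelated, should far exceed it.
q_white = ljung_box_q(white, h=10)
q_walk = ljung_box_q(np.cumsum(white), h=10)
```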
To Buy or Not to Buy: Mining Airfare Data to Minimize Ticket Purchase Price
Abstract
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enable consumers to predict price changes under these conditions? This paper reports on a pilot study in the domain of airline ticket prices where we recorded over 12,000 price observations over a 41 day period. When trained on this data, Hamlet, our multi-strategy data mining algorithm, generated a predictive model that saved 341 simulated passengers $198,074 by advising them when to buy and when to postpone ticket purchases. Remarkably, a clairvoyant algorithm with complete knowledge of future prices could save at most $320,572 in our simulation, thus Hamlet's savings were 61.8% of optimal. The algorithm's savings of $198,074 represents an average savings of 23.8% for the 341 passengers for whom savings are possible. Overall, Hamlet saved 4.4% of the ticket price averaged over the entire set of 4,488 simulated passengers. Our pilot study suggests that mining of price data available over the web has the potential to save consumers substantial sums of money per annum.
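The headline figure in this abstract, savings at 61.8% of the clairvoyant optimum, follows directly from the dollar amounts it reports:

```python
# Sanity check of the reported Hamlet savings ratio (figures taken
# from the abstract above).
hamlet_savings = 198_074
optimal_savings = 320_572
frac_of_optimal = hamlet_savings / optimal_savings  # ~0.618
```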