Results 1 - 9 of 9
Fast Subsequence Matching in Time-Series Databases
SIGMOD 94, 1994
Cited by 430 (21 self)
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into subtrails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
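The sliding-window idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`dft_features`, `trail`, `fixed_size_mbrs`), the choice of DFT coefficients as features, and the fixed-size grouping of trail points are all assumptions made for illustration. In particular, the abstract describes a dedicated algorithm for dividing trails into subtrails, which this sketch replaces with plain fixed-size chunks.

```python
import cmath
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
MBR = Tuple[Point, Point]  # (lower corner, upper corner)


def dft_features(window: Sequence[float], n_coeffs: int) -> Point:
    """Real and imaginary parts of the first n_coeffs DFT coefficients
    (feature choice is an assumption; the abstract only says 'features')."""
    w = len(window)
    feats: List[float] = []
    for k in range(n_coeffs):
        c = sum(x * cmath.exp(-2j * cmath.pi * k * t / w)
                for t, x in enumerate(window)) / (w ** 0.5)
        feats.extend((c.real, c.imag))
    return tuple(feats)


def trail(seq: Sequence[float], w: int, n_coeffs: int) -> List[Point]:
    """Slide a window of length w over seq; each position yields one
    point in feature space, and the points together form the trail."""
    return [dft_features(seq[i:i + w], n_coeffs)
            for i in range(len(seq) - w + 1)]


def fixed_size_mbrs(points: Sequence[Point], group_size: int) -> List[MBR]:
    """Cover consecutive runs of trail points with bounding rectangles.
    Fixed-size runs are a simplification, not the paper's subtrail algorithm."""
    boxes: List[MBR] = []
    for start in range(0, len(points), group_size):
        chunk = points[start:start + group_size]
        lo = tuple(min(col) for col in zip(*chunk))
        hi = tuple(max(col) for col in zip(*chunk))
        boxes.append((lo, hi))
    return boxes


if __name__ == "__main__":
    series = [float(i % 5) for i in range(20)]
    pts = trail(series, w=8, n_coeffs=2)
    for lo, hi in fixed_size_mbrs(pts, group_size=4):
        print(lo, hi)
```

The resulting rectangles are what a spatial access method such as an R*-tree would index. That indexing step is omitted here.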
Efficient Retrieval of Similar Time Sequences Under Time Warping
1997
Cited by 173 (3 self)
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of them use the Euclidean distance (L2), or some variation of Lp metrics. Lp metrics lead to efficient indexing, thanks to feature extraction (e.g., by keeping the first few DFT coefficients) and subsequent use of fast spatial access methods for the points in feature space. In this work we examine a popular, field-tested dissimilarity function, the "time warping" distance function, which permits local accelerations and decelerations in the rate of the signals or sequences. This function is natural and suitable for several applications, like matching of voice, audio and medical signals (e.g., electrocardiograms). However, from the indexing viewpoint it presents two major challenges: (a) it does not lead to any natural "features", precluding the use of spatial access methods; (b) it is quadratic (O(len1 * len2)) on the length of the sequences involved. Here we ...
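The quadratic cost mentioned in this abstract comes from the standard dynamic-programming formulation of the time warping distance. The sketch below shows that textbook recurrence, not the method proposed in the paper. The function name `time_warping_distance` and the use of absolute difference as the local cost are assumptions.

```python
from typing import Sequence


def time_warping_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic-programming time warping distance.

    Fills a len(a) x len(b) table one row at a time, hence the
    O(len1 * len2) running time. Local cost is |a[i] - b[j]| (assumed).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cost of aligning the first i-1 items of a
    # with the first j items of b; only the empty/empty cell starts at 0.
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Stretch a, stretch b, or advance both sequences.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


if __name__ == "__main__":
    # The second sequence is a slowed-down copy of the first.
    print(time_warping_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
```

Because each element may be matched against a run of elements in the other sequence, a signal and its locally stretched copy are at distance zero, which is the behaviour the abstract describes as permitting local accelerations and decelerations.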
High-dimensional similarity joins
In ICDE, 1997
Cited by 32 (2 self)
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence the proposed index structure scales to high-dimensional data. Empirical evaluation, using synthetic and real-life datasets, shows that similarity join using the ε-kdB tree is from 2 times to an order of magnitude faster than the R+ tree, with the performance gap increasing with the number of dimensions.
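To make the join operation itself concrete, the following Python sketch computes a similarity join: all pairs of points within distance eps of each other. It does not implement the index structure from the abstract. It swaps in a much simpler technique, sorting on the first coordinate and comparing only points whose first coordinates differ by at most eps. The function name `similarity_join` and the use of Euclidean distance are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def similarity_join(points: Sequence[Sequence[float]],
                    eps: float) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, with Euclidean distance <= eps.

    Simplified stand-in for an index-based join: points are sorted on the
    first coordinate, so a candidate partner can be ruled out as soon as
    its first coordinate is more than eps away.
    """
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    pairs: List[Tuple[int, int]] = []
    for pos, i in enumerate(order):
        for q in range(pos + 1, len(order)):
            j = order[q]
            if points[j][0] - points[i][0] > eps:
                break  # every later point is even farther along axis 0
            if math.dist(points[i], points[j]) <= eps:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (3.2, 3.1), (10.0, 10.0)]
    print(similarity_join(pts, eps=1.0))
```

Pruning on a single coordinate keeps the sketch short but leaves many false candidates in high dimensions, which is the problem that a dedicated index structure is meant to address.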
An efficient parallel algorithm for high dimensional similarity join
In IPPS: 11th International Parallel Processing Symposium. IEEE Computer, 1998
Cited by 7 (0 self)
Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other. The ε-kdB tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the ε-kdB tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-kdB tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into subtrails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
Ring Current Influence on Auroral Electrojet Predictions
1999
Geomagnetic storms and substorms develop under strong control of the solar wind. This is demonstrated by the fact that the geomagnetic activity indices Dst and AE can be predicted from the solar wind alone. A consequence of the strong control by a common source is that substorm and storm indices tend to be highly correlated. However, a part of this correlation is likely to be an effect of internal magnetospheric processes, such as a ring-current modulation of the solar wind-AE relation. The present work extends previous studies of nonlinear AE predictions from the solar wind. It is examined whether the AE predictions are modulated by the Dst index. This is accomplished by comparing neural network predictions from Dst and the solar wind, with predictions from the solar wind alone. Two conclusions are reached: (1) with an optimal set of solar-wind data available, the AE predictions are not markedly improved by the Dst input, but (2) the AE predictions are improved by Dst if less than, o...
The S²-Tree: An Index Structure for Subsequence Matching of Spatial Objects
In Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (Hong Kong
We present the S²-Tree, an indexing method for subsequence matching of spatial objects. The S²-Tree locates subsequences within a collection of spatial sequences, i.e., sequences made up of spatial objects, such that the subsequences match a given query pattern within a specified tolerance. Our method is based on (i) the string-searching techniques that locate substrings within a string of symbols drawn from a discrete alphabet (e.g., ASCII characters) and (ii) the spatial access methods that index (unsequenced) spatial objects. In particular, the S²-Tree can be applied to solve problems such as subsequence matching of time-series data, where features of subsequences are often extracted and mapped into spatial objects. Moreover, it supports queries such as "what is the longest common pattern of the two time series?", which previous subsequence matching algorithms find difficult to solve efficiently.
FastMap: A Fast Algorithm for Indexing, . . .
1995
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: ...
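The abstract is truncated before it describes the algorithm, so the following Python sketch relies on the commonly described FastMap projection rather than on anything stated above: pick two far-apart pivot objects, place every object on the line through them using the cosine law, then repeat on the residual distances. The function name `fastmap`, the pivot heuristic, and the clamping of small negative residuals to zero are assumptions.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def fastmap(objects: Sequence[T],
            dist: Callable[[T, T], float],
            k: int) -> List[List[float]]:
    """Map objects to k-dimensional points using only pairwise distances."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]
    if n == 0:
        return coords

    def d2(i: int, j: int, col: int) -> float:
        # Squared distance left over after the first `col` coordinates.
        rest = dist(objects[i], objects[j]) ** 2
        for c in range(col):
            rest -= (coords[i][c] - coords[j][c]) ** 2
        return max(rest, 0.0)  # clamp rounding noise (assumption)

    for col in range(k):
        # Pivot heuristic: farthest from object 0, then farthest from that.
        b = max(range(n), key=lambda j: d2(0, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # nothing left to separate; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line a-b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


if __name__ == "__main__":
    corners = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
    for point in fastmap(corners, math.dist, k=2):
        print(point)
```

For objects that really are points in a 2-dimensional Euclidean space, as in the usage example, a 2-dimensional mapping reproduces the pairwise distances up to rounding. For general distance functions the mapping is only approximate.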
MANY
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the ε-kdB tree and the R-tree family, and show that the ε-kdB tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life data sets, shows that similarity join using the ε-kdB tree is twice to an order of magnitude faster than the R+ tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the ε-kdB tree can be applied to the R-tree family. These biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the ε-kdB tree. Index Terms: Data mining, similar time sequences, similarity join.