Results 1 - 9 of 9
Fast Subsequence Matching in Time-Series Databases
- SIGMOD 94, 1994
Cited by 372 (18 self)
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
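The sliding-window idea in the abstract above can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the paper: the features are the first few DFT coefficients of each window, the trail is cut into fixed-size sub-trails (the paper proposes its own splitting algorithm, which is not reproduced here), and the function names `dft_features`, `feature_trail` and `sub_trail_mbrs` are hypothetical.

```python
import cmath
import math


def dft_features(window, n_coeffs=2):
    """Real and imaginary parts of the first n_coeffs DFT coefficients.

    The 1/sqrt(n) scaling makes the transform distance-preserving, so
    dropping coefficients can only shrink Euclidean distances.
    """
    n = len(window)
    feats = []
    for f in range(n_coeffs):
        c = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                for t, x in enumerate(window))
        c /= math.sqrt(n)
        feats.extend((c.real, c.imag))
    return tuple(feats)


def feature_trail(series, w, n_coeffs=2):
    """Slide a window of length w over the series; one feature point per offset."""
    return [dft_features(series[i:i + w], n_coeffs)
            for i in range(len(series) - w + 1)]


def sub_trail_mbrs(points, chunk):
    """Cut the trail into consecutive runs of `chunk` points (an assumed,
    simplistic split) and return (start_offset, low_corner, high_corner)."""
    boxes = []
    for start in range(0, len(points), chunk):
        sub = points[start:start + chunk]
        dims = range(len(sub[0]))
        low = tuple(min(p[d] for p in sub) for d in dims)
        high = tuple(max(p[d] for p in sub) for d in dims)
        boxes.append((start, low, high))
    return boxes


series = [math.sin(0.3 * t) for t in range(64)]
points = feature_trail(series, w=16)
boxes = sub_trail_mbrs(points, chunk=8)
```

Each box could then be handed to a spatial index such as an R*-tree; the index itself is outside the scope of this sketch.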
Efficient Retrieval of Similar Time Sequences Under Time Warping
- 1997
Cited by 156 (3 self)
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of them use the Euclidean distance (L_2), or some variation of L_p metrics. L_p metrics lead to efficient indexing, thanks to feature extraction (e.g., by keeping the first few DFT coefficients) and subsequent use of fast spatial access methods for the points in feature space. In this work we examine a popular, field-tested dissimilarity function, the "time warping" distance function, which permits local accelerations and decelerations in the rate of the signals or sequences. This function is natural and suitable for several applications, like matching of voice, audio and medical signals (e.g., electrocardiograms). However, from the indexing viewpoint it presents two major challenges: (a) it does not lead to any natural "features", precluding the use of spatial access methods; (b) it is quadratic (O(len_1 · len_2)) in the length of the sequences involved. Here we ...
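To make the quadratic cost mentioned in the abstract above concrete, here is a minimal sketch of the textbook dynamic-programming form of the time warping distance. It assumes the absolute difference as the per-element cost, which the abstract does not specify, and it is not the indexing method the paper goes on to propose; the name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Time warping distance between two numeric sequences.

    Fills a (len(a) + 1) x (len(b) + 1) table row by row, so the running
    time is O(len(a) * len(b)); only two rows are kept in memory.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix matched against empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # assumed base cost
            # Either sequence may stall (repeat an element) or both advance.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


# A locally slowed-down copy of a signal warps onto the original at zero cost.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

The nested loops are where the O(len_1 · len_2) term comes from: every element of one sequence is compared with every element of the other.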
High-dimensional similarity joins
- In ICDE, 1997
Cited by 30 (2 self)
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence the proposed index structure scales to high-dimensional data. Empirical evaluation, using synthetic and real-life datasets, shows that similarity join using the ε-kdB tree is 2 times to an order of magnitude faster than the R+ tree, with the performance gap increasing with the number of dimensions.
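For readers unfamiliar with the operation, the following sketch shows what a similarity join computes: all pairs of points within distance ε of each other. It deliberately uses a plain uniform grid with cells of side ε, not the ε-kdB tree from the paper, and the name `similarity_join` is hypothetical. The grid visits 3^d neighboring cells per cell, which illustrates why naive schemes degrade as the number of dimensions d grows.

```python
import math
from collections import defaultdict
from itertools import product


def similarity_join(points, eps):
    """Sorted index pairs (i, j), i < j, with Euclidean distance <= eps."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(x / eps) for x in p)].append(idx)

    dims = len(points[0]) if points else 0
    # Two points within eps of each other differ by at most one cell per axis.
    offsets = list(product((-1, 0, 1), repeat=dims))

    pairs = set()
    for cell, members in grid.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            # .get avoids creating empty cells while iterating over the grid.
            for j in grid.get(neighbor, ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        pairs.add((i, j))
    return sorted(pairs)


close_pairs = similarity_join(
    [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (1.0, 1.08)], eps=0.1)
```

The result is the same as a brute-force comparison of all pairs; only the number of candidate pairs examined differs.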
An efficient parallel algorithm for high dimensional similarity join
- In IPPS: 11th International Parallel Processing Symposium. IEEE Computer, 1998
Cited by 6 (0 self)
Multidimensional similarity join finds pairs of multi-dimensional points that are within some small distance of each other. The ε-k-d-B tree has been proposed as a data structure that scales better as the number of dimensions increases compared to previous data structures. We present a cost model of the ε-k-d-B tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-k-d-B tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former strategy for low skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
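The equi-depth histogram mentioned in the abstract above can be sketched as follows. This is only the generic partitioning idea (split points chosen so that each bucket receives about the same number of values), under the assumption that work is divided along a single dimension; the paper's parallel join and its weighted variant are not reproduced. The names `equi_depth_boundaries` and `assign_bucket` are hypothetical.

```python
import bisect


def equi_depth_boundaries(values, n_parts):
    """Return n_parts - 1 split values so that each bucket holds roughly
    len(values) / n_parts items (exactly balanced only without ties)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_parts] for k in range(1, n_parts)]


def assign_bucket(value, boundaries):
    """Bucket index in 0..len(boundaries); a value equal to a boundary
    goes to the bucket on its right."""
    return bisect.bisect_right(boundaries, value)


# Skewed input: equal-width buckets would be badly unbalanced here,
# while equi-depth buckets stay the same size.
skewed = [i * i for i in range(100)]
bounds = equi_depth_boundaries(skewed, 4)
counts = [0] * 4
for v in skewed:
    counts[assign_bucket(v, bounds)] += 1
```

In a parallel setting each bucket would go to one processor. Equal point counts do not guarantee equal join cost on skewed data, which is presumably the motivation for the weighted variant described in the abstract.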
Fast Subsequence Matching in Time-Series Databases
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search time from 3 times up to 100 times.
Ring Current Influence on Auroral Electrojet Predictions
- 1999
Geomagnetic storms and substorms develop under strong control of the solar wind. This is demonstrated by the fact that the geomagnetic activity indices Dst and AE can be predicted from the solar wind alone. A consequence of the strong control by a common source is that substorm and storm indices tend to be highly correlated. However, a part of this correlation is likely to be an effect of internal magnetospheric processes, such as a ring-current modulation of the solar wind-AE relation. The present work extends previous studies of nonlinear AE predictions from the solar wind. It is examined whether the AE predictions are modulated by the Dst index. This is accomplished by comparing neural network predictions from Dst and the solar wind, with predictions from the solar wind alone. Two conclusions are reached: (1) with an optimal set of solar-wind data available, the AE predictions are not markedly improved by the Dst input, but (2) the AE predictions are improved by Dst if less than, o...
The S²-Tree: An Index Structure for Subsequence Matching of Spatial Objects
- In Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) (Hong Kong
We present the S²-Tree, an indexing method for subsequence matching of spatial objects. The S²-Tree locates subsequences within a collection of spatial sequences, i.e., sequences made up of spatial objects, such that the subsequences match a given query pattern within a specified tolerance. Our method is based on (i) the string-searching techniques that locate substrings within a string of symbols drawn from a discrete alphabet (e.g., ASCII characters) and (ii) the spatial access methods that index (unsequenced) spatial objects. Particularly, the S²-Tree can be applied to solve problems such as subsequence matching of time-series data, where features of subsequences are often extracted and mapped into spatial objects. Moreover, it supports queries such as "what is the longest common pattern of the two time series?", which previous subsequence matching algorithms find difficult to solve efficiently.
FastMap: A Fast Algorithm for Indexing, . . .
- 1995
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: ...
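A compact sketch of how such a distance-only mapping can work is given below. It follows the commonly described FastMap scheme: pick two far-apart pivot objects, project every object onto the line through them using the cosine law, then repeat on the residual distances. The pivot heuristic, the iteration cap `pivot_iters`, and the function signature are assumptions made for this illustration and may differ from the paper's presentation.

```python
import math


def fastmap(n, dist, k, pivot_iters=5):
    """Map objects 0..n-1 to k-dimensional points, given only dist(i, j)."""
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates.
        rest = dist(i, j) ** 2
        for c in range(col):
            rest -= (coords[i][c] - coords[j][c]) ** 2
        return max(rest, 0.0)  # clamp rounding noise / non-Euclidean input

    for col in range(k):
        # Heuristic pivot choice: jump repeatedly to the farthest object.
        a = 0
        b = 0
        for _ in range(pivot_iters):
            b = max(range(n), key=lambda j: d2(a, j, col))
            farthest = max(range(n), key=lambda j: d2(b, j, col))
            if farthest == a:
                break
            a = farthest
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of i along the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords


corners = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
embedded = fastmap(len(corners),
                   lambda i, j: math.dist(corners[i], corners[j]), k=2)
```

In this usage example the input distances are Euclidean distances between 2-d points, so two output coordinates are enough to reproduce them up to rounding; for general distance functions the mapping is only approximate.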
MANY
Abstract: Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the ε-kdB tree and the R-tree family, and show that the ε-kdB tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life data sets, shows that similarity join using the ε-kdB tree is twice to an order of magnitude faster than the R+ tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the ε-kdB tree can be applied to the R-tree family. These biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the ε-kdB tree. Index Terms: Data mining, similar time sequences, similarity join.

