Fast subsequence matching in time-series databases
Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data
, 1994
Abstract

Cited by 529 (24 self)
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search by a factor of 3 to 100.
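The abstract does not say which features are extracted from each window or exactly how trails are split. The sketch below assumes, purely for illustration, that the features are the first k coefficients of an orthonormal DFT and that trails are cut into fixed-size runs (a stand-in for the adaptive splitting algorithm the abstract mentions). The names dft_features, trail and subtrail_mbrs are hypothetical.

```python
import cmath
import math


def dft_features(window, k):
    """Return the first k DFT coefficients of `window` as 2k real numbers.

    Assumption: features = first k coefficients of an orthonormal DFT (k <= n).
    The 1/sqrt(n) scaling makes the DFT orthonormal, so by Parseval's theorem
    the distance between full coefficient vectors equals the distance between
    windows; keeping only k coefficients can shrink distances, never grow them.
    """
    n = len(window)
    features = []
    for f in range(k):
        coeff = sum(x * cmath.exp(-2j * math.pi * f * t / n)
                    for t, x in enumerate(window)) / math.sqrt(n)
        features.extend([coeff.real, coeff.imag])
    return features


def trail(sequence, w, k):
    """Slide a length-w window one step at a time; each position yields one feature point."""
    return [dft_features(sequence[i:i + w], k) for i in range(len(sequence) - w + 1)]


def subtrail_mbrs(points, size):
    """Cut the trail into consecutive runs of `size` points and return each run's
    MBR as (low corner, high corner).

    Assumption: a fixed-size split, not the adaptive split described in the abstract.
    """
    mbrs = []
    for start in range(0, len(points), size):
        run = points[start:start + size]
        dims = range(len(run[0]))
        low = [min(p[d] for p in run) for d in dims]
        high = [max(p[d] for p in run) for d in dims]
        mbrs.append((low, high))
    return mbrs


seq = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
points = trail(seq, w=4, k=2)         # 5 window positions, 4 features each
mbrs = subtrail_mbrs(points, size=2)  # 3 rectangles; the last covers a single point
```

With k much smaller than the window length, each window becomes a low-dimensional point, and the rectangles are what would be handed to a spatial index such as the R*-tree. Consecutive windows overlap heavily, so their feature points tend to lie close together, which is why grouping them into one rectangle loses little precision.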
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
, 1995
Abstract

Cited by 499 (23 self)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [25]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [8]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient ret...
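The abstract states the goal (map objects to points in k-d space given only pairwise distances) but not the algorithm itself. Below is a minimal sketch of the pivot-and-project scheme generally associated with FastMap: for each axis, pick two far-apart pivot objects, place every object on the line through them using the cosine law, and repeat on the residual distances. The function name, the random starting object and the clamping of negative residuals are illustrative assumptions, not details taken from this abstract.

```python
import math
import random


def fastmap(objects, dist, k, seed=0):
    """Map `objects` to k-dimensional points using only the distance function `dist`.

    Illustrative sketch of the pivot/projection idea; pivot choice and clamping
    are assumptions.
    """
    rng = random.Random(seed)
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def resid_sq(i, j, axis):
        # Squared distance still unexplained after the first `axis` coordinates.
        d2 = dist(objects[i], objects[j]) ** 2
        for prev in range(axis):
            d2 -= (coords[i][prev] - coords[j][prev]) ** 2
        return max(d2, 0.0)  # clamp tiny negatives from rounding or non-Euclidean input

    for axis in range(k):
        # Pivot heuristic: start at a random object, jump to the farthest one,
        # then to the object farthest from that.
        a = rng.randrange(n)
        b = max(range(n), key=lambda j: resid_sq(a, j, axis))
        a = max(range(n), key=lambda j: resid_sq(b, j, axis))
        d_ab2 = resid_sq(a, b, axis)
        if d_ab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        d_ab = math.sqrt(d_ab2)
        for i in range(n):
            # Cosine law: coordinate of object i along the line from pivot a to pivot b.
            coords[i][axis] = (resid_sq(a, i, axis) + d_ab2 - resid_sq(b, i, axis)) / (2 * d_ab)
    return coords


pts = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (1.0, 1.0)]
emb = fastmap(pts, math.dist, k=2)
```

With k fixed, each axis costs a constant number of passes over the N objects (two for the pivots, one for the projections). When the objects really are points in a Euclidean space of dimension at most k, as in the example, the mapping reproduces every pairwise distance up to rounding.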
Efficient Retrieval of Similar Time Sequences Under Time Warping
, 1997
Abstract

Cited by 217 (5 self)
Fast similarity searching in large time-sequence databases has attracted a lot of research interest [1, 5, 2, 6, 3, 10]. All of these works use the Euclidean distance (L2), or some variation of L...
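The excerpt breaks off while describing the Euclidean (L2) distance, which compares two sequences sample by sample. Time warping instead lets one sequence be locally stretched or compressed to align with the other. A minimal sketch of the standard dynamic-programming recurrence follows; using |a - b| as the local cost is an assumption (squared differences are also common), and the function names are hypothetical.

```python
import math


def euclidean(x, y):
    """L2 distance; defined only for equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Dynamic time warping distance (assumption: local cost |a - b|).

    D[i][j] = |x[i-1] - y[j-1]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Repeat a sample of y, repeat a sample of x, or advance both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]  # same shape, shifted by one step
```

Here euclidean(x, y) is 2.0 while dtw(x, y) is 0.0: the warping path absorbs the shift that the point-by-point comparison penalizes. The O(nm) table is also why time warping is expensive to evaluate against every sequence in a large database.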
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. ACM SIGMOD Conference Proceedings
, 1995
Abstract

Cited by 12 (1 self)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before, and (b) visualization and data mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data mining is looking for. We introduce an older method from pattern recognition, namely Multi-Dimensional Scaling (MDS) [Tor52]; although unsuitable for indexing, we use it as a yardstick for our method. Then, we propose a much faster algorithm to solve the problem at hand, which in addition allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS (being linear, as opposed to quadratic, in the database size N), while it manages to preserve distances and the overall structure of the dataset.
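The abstract claims the mapping preserves distances and uses MDS as a yardstick, but does not say how preservation is measured. A common goodness-of-fit measure in the MDS literature is a normalized stress that compares original dissimilarities with distances after mapping. The sketch below assumes that form; the helper names pairwise and stress are hypothetical.

```python
import math


def pairwise(points, dist):
    """Full n x n matrix of distances under `dist`."""
    return [[dist(p, q) for q in points] for p in points]


def stress(d_orig, d_new):
    """Normalized stress (assumed form):
    sqrt( sum_{i<j} (d_new - d_orig)^2 / sum_{i<j} d_orig^2 ).

    0.0 means every pairwise distance is preserved exactly.
    """
    num = den = 0.0
    n = len(d_orig)
    for i in range(n):
        for j in range(i + 1, n):
            num += (d_new[i][j] - d_orig[i][j]) ** 2
            den += d_orig[i][j] ** 2
    if den == 0.0:
        raise ValueError("all original distances are zero")
    return math.sqrt(num / den)


# Crude 1-d "mapping" of three 2-d points: keep only the x coordinate.
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
d2 = pairwise(pts, math.dist)
d1 = pairwise([(x,) for x, _ in pts], math.dist)
print(stress(d2, d1))  # sqrt(20 / 50), about 0.632
```

A lower value means a more faithful mapping, so two mapping methods (for example, a fast approximate one and MDS) can be compared by the stress each achieves for the same k.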
An efficient parallel algorithm for high dimensional similarity join
 In IPPS: 11th International Parallel Processing Symposium. IEEE Computer
, 1998
Abstract

Cited by 9 (0 self)
Multidimensional similarity join finds pairs of multidimensional points that are within some small distance of each other. The ε-kdB tree has been proposed as a data structure that scales better than previous data structures as the number of dimensions increases. We present a cost model of the ε-kdB tree and use it to optimize the leaf size. We present novel parallel algorithms for the similarity join using the ε-kdB tree. A load-balancing strategy based on equi-depth histograms is shown to work well for uniform or low-skew situations, whereas another based on weighted equi-depth histograms works far better for high-skew datasets. The latter strategy is only slightly slower than the former for low-skew datasets. Further, its cost is proportional to the overall cost of the similarity join.
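The abstract names equi-depth and weighted equi-depth histograms as load-balancing strategies but gives no detail. As an assumption-laden sketch: an equi-depth split along one dimension gives each processor roughly the same number of points, while a weighted split balances a per-point weight instead, here a hypothetical estimate of the join work each point causes. The function names are hypothetical.

```python
import bisect


def partition_boundaries(values, parts, weights=None):
    """Return parts-1 cut points along one dimension.

    weights=None: every bucket gets about the same number of points (equi-depth).
    With weights: every bucket gets about the same total weight, e.g. a
    hypothetical per-point estimate of join work (a weighted variant).
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))
    target = sum(w for _, w in pairs) / parts
    cuts, acc = [], 0.0
    for v, w in pairs:
        acc += w
        # Close the current bucket once it has its share of the total weight.
        if len(cuts) < parts - 1 and acc >= target * (len(cuts) + 1):
            cuts.append(v)
    return cuts


def owner(value, cuts):
    """Bucket (processor) index for `value`; bucket i holds values in (cuts[i-1], cuts[i]]."""
    return bisect.bisect_left(cuts, value)


vals = list(range(1, 9))
plain = partition_boundaries(vals, 2)                               # [4]
skewed = partition_boundaries(vals, 2, [4, 4, 1, 1, 1, 1, 1, 1])    # [2]
```

When the first two points carry most of the estimated work, the weighted cut moves left so both processors receive similar total work rather than similar point counts. A complete parallel join would also have to send points lying within the join distance of a cut to the neighboring processor, so that pairs straddling a boundary are not missed; the sketch leaves that replication out.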
Ring Current Influence on Auroral Electrojet Predictions
, 1999
Abstract

Cited by 1 (0 self)
Geomagnetic storms and substorms develop under strong control of the solar wind. This is demonstrated by the fact that the geomagnetic activity indices Dst and AE can be predicted from the solar wind alone. A consequence of this strong control by a common source is that substorm and storm indices tend to be highly correlated. However, part of this correlation is likely to be an effect of internal magnetospheric processes, such as a ring-current modulation of the solar wind-AE relation. The present work extends previous studies of nonlinear AE predictions from the solar wind. It examines whether the AE predictions are modulated by the Dst index. This is accomplished by comparing neural network predictions from Dst and the solar wind with predictions from the solar wind alone. Two conclusions are reached: (1) with an optimal set of solar-wind data available, the AE predictions are not markedly improved by the Dst input, but (2) the AE predictions are improved by Dst if less than, o...
MANY
Abstract
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence, the proposed index structure scales to high-dimensional data. We analyze the cost of the join for the ε-kdB tree and the R-tree family, and show that the ε-kdB tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life data sets, shows that similarity join using the ε-kdB tree is twice to an order of magnitude faster than the R+-tree, with the performance gap increasing with the number of dimensions. We also discuss how some of the ideas of the ε-kdB tree can be applied to the R-tree family. These biased R-trees perform better than the corresponding traditional R-trees for high-dimensional similarity joins, but do not match the performance of the ε-kdB tree.
Index Terms: data mining, similar time sequences, similarity join.
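The abstract does not spell out how the index limits which leaves are joined. The ε-kdB tree is generally described as splitting one dimension per level into stripes of width ε, so two points within distance ε can only sit in the same or adjacent stripes. The sketch below is a flat, fixed-depth simplification of that idea (a hash grid over the first few dimensions rather than an adaptive tree); the function name and the levels parameter are hypothetical.

```python
import itertools
import math
from collections import defaultdict


def eps_stripe_join(points, eps, levels=2):
    """All index pairs (i, j), i < j, with Euclidean distance <= eps.

    Simplification (assumption): points are bucketed by floor(x_d / eps) on the
    first `levels` dimensions only, with no adaptive leaf splitting. Two points
    within eps differ by at most one stripe on each of those dimensions, so only
    neighboring buckets need to be compared.
    """
    levels = min(levels, len(points[0]))
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(math.floor(p[d] / eps) for d in range(levels))].append(idx)
    result = set()
    for key, members in cells.items():
        for offset in itertools.product((-1, 0, 1), repeat=levels):
            neighbor = tuple(c + o for c, o in zip(key, offset))
            for i in members:
                for j in cells.get(neighbor, ()):
                    # Exact distance test on the few surviving candidate pairs.
                    if i < j and math.dist(points[i], points[j]) <= eps:
                        result.add((i, j))
    return sorted(result)


pts = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (2.4, 2.3), (5.0, 5.0)]
pairs = eps_stripe_join(pts, 1.0)  # [(0, 1), (2, 3)]
```

The number of neighboring buckets grows as 3 to the power of levels, which is one reason a tree that splits only overflowing leaves, and only on as many dimensions as the data requires, scales better than a flat grid in high dimensions.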
FastMap: A Fast Algorithm for Indexing, . . .
, 1995
Abstract
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: ...
Fast Subsequence Matching in Time-Series Databases
Abstract
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. Then, these rectangles can be readily indexed using traditional spatial access methods, like the R*-tree [9]. In more detail, we use a sliding window over the data sequence and extract its features; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into sub-trails, which are subsequently represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths, and we show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements). We compared the method to sequential scanning, which is the only obvious competitor. The results were excellent: our method accelerated the search by a factor of 3 to 100.
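At query time, the rectangles act as a filter: a rectangle whose minimum distance to the query's feature point exceeds the tolerance cannot contain a match, provided (an assumption here) that distance in feature space never exceeds the true distance between subsequences. The sketch below shows that filter step with a linear scan standing in for the R*-tree; the names mindist and candidate_mbrs are hypothetical.

```python
import math


def mindist(point, low, high):
    """Smallest Euclidean distance from `point` to the axis-aligned box [low, high]."""
    total = 0.0
    for x, lo, hi in zip(point, low, high):
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
        # Coordinates inside [lo, hi] contribute nothing.
    return math.sqrt(total)


def candidate_mbrs(query_point, mbrs, eps):
    """Filter step: ids of rectangles that may hold a feature point within eps.

    Assumption: a linear scan replaces the R*-tree range search.
    """
    return [i for i, (low, high) in enumerate(mbrs) if mindist(query_point, low, high) <= eps]


mbrs = [([0.0, 0.0], [1.0, 1.0]), ([3.0, 3.0], [4.0, 4.0])]
hits = candidate_mbrs((1.5, 0.5), mbrs, 0.6)  # [0]
```

The subsequences behind each surviving rectangle would still be compared against the query in the original time domain to discard false alarms; the filter only guarantees that no rectangle holding a true match is skipped.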