Results 1 - 7 of 7
Fast retrieval of similar subsequences in long sequence databases
In 3rd IEEE Knowledge and Data Engineering Exchange Workshop, 1999
Cited by 24 (3 self)
Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer high-cost distance functions such as the time warping distance and the editing distance for wider applicability. However, if these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic in the average length of the data sequences. In this paper, we propose a novel subsequence matching scheme, called aligned subsequence matching, in which the number of subsequences to be compared with a query sequence is reduced to linear in that average length. We also present an indexing technique that speeds up aligned subsequence matching using the modified time warping distance as the similarity measure. Experiments on synthetic data sequences demonstrate the effectiveness of the proposed approach: it consistently outperformed sequential scanning and achieved up to a 6.5-fold speedup.
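To make the cost argument in this abstract concrete, here is a minimal sketch of the classic dynamic-programming time warping distance, together with a count of the contiguous subsequences of a sequence. It assumes an absolute-difference base cost and is not the paper's modified time warping distance or its aligned matching scheme; the function names are hypothetical.

```python
def time_warping_distance(s, q):
    """Classic time warping distance with |a - b| as the base cost (an assumption)."""
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] = cheapest warping cost between s[:i] and q[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # Each element may be stretched (repeated) or matched one-to-one.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def count_subsequences(length):
    """Number of contiguous, non-empty subsequences of a sequence of this length."""
    return length * (length + 1) // 2
```

Each distance evaluation already costs time proportional to the product of the two lengths, and the number of candidate subsequences grows quadratically with the sequence length, which is why reducing the candidate count matters.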
AIMS: An Immersidata Management System
2003
Cited by 22 (16 self)
We introduce a system to address the challenges involved in managing the multidimensional sensor data streams generated within immersive environments. We call this data type immersidata, which is defined as the data acquired from a user's interactions with an immersive environment. Management of immersidata is challenging because they are: 1) multidimensional, 2) spatiotemporal, 3) continuous data streams (CDS), 4) large in size and bandwidth requirements, and 5) noisy.
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
ACM SIGMOD Conference Proceedings, 1995
Cited by 12 (1 self)
A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions provided by a domain expert [Jag91]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs) to answer several types of queries, including the 'Query By Example' type (which translates to a range query); the 'all pairs' query (which translates to a spatial join [BKSS94]); the nearest-neighbor or best-match query, etc. However, designing feature-extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information, though, it is not obvious how to map objects into points. This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dissimilarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before, and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for. We introduce an older method from pattern recognition, namely Multi-Dimensional Scaling (MDS) [Tor52]; although unsuitable for indexing, we use it as a yardstick for our method. Then, we propose a much faster algorithm to solve the problem at hand, which in addition allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS (being linear, as opposed to quadratic, in the database size N), while it manages to preserve distances and the overall structure of the dataset.
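The mapping from a distance function to k-dimensional points can be sketched as follows. This follows the generally known FastMap recipe (project every object onto the line through two far-apart pivot objects, then repeat on the residual distances); the abstract itself does not give these steps, so the pivot heuristic, the function name `fastmap`, and its parameters are assumptions made for illustration.

```python
import math


def fastmap(n, dist, k):
    """Embed n objects into k-d points, given only a distance function dist(i, j)."""
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, col):
        # Squared distance left after removing the first `col` coordinates.
        d2 = dist(i, j) ** 2
        for c in range(col):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)  # clamp small negative values from rounding

    for col in range(k):
        # Assumed pivot heuristic: farthest object from object 0, then farthest from that.
        b = max(range(n), key=lambda j: residual_sq(0, j, col))
        a = max(range(n), key=lambda j: residual_sq(b, j, col))
        dab2 = residual_sq(a, b, col)
        if dab2 == 0.0:
            break  # no residual distance left to explain
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through pivots a and b.
            coords[i][col] = (residual_sq(a, i, col) + dab2 - residual_sq(b, i, col)) / (2 * dab)
    return coords
```

Each of the k passes evaluates a number of distances proportional to n, which is consistent with the linear dependence on database size stated in the abstract. For objects that really are points in a 2-d Euclidean space, two passes reproduce the pairwise distances up to rounding.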
Index Interpolation: An Approach to Subsequence Matching Supporting Normalization Transform in Time-Series Databases
In this paper, we propose a subsequence matching algorithm that supports normalization transform in time-series databases. Normalization transform enables finding sequences with similar fluctuation patterns even though they are not close to each other before the normalization transform. Applying the existing whole matching algorithm that supports normalization transform to subsequence matching is feasible, but it requires an index for every possible length of the query sequence, causing serious overhead in both storage space and update time. The proposed algorithm generates indexes only for a small number of different lengths of query sequences. For subsequence matching, it selects the most appropriate index among them. We can obtain better search performance by using more indexes. We call our approach index interpolation. We formally prove that the proposed algorithm does not cause false dismissal. For performance evaluation, we have conducted experiments using indexes for only five different lengths out of the query sequence lengths 256 to 512. The results show that the proposed algorithm outperforms the sequential scan by up to 14.6 times on average when the selectivity of the query is 10^-5.
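As a small illustration of why normalization matters here, the sketch below assumes that the normalization transform subtracts the mean and divides by the standard deviation, which the abstract does not spell out. The helper names are hypothetical, and the sketch does not implement index interpolation itself.

```python
import math


def normalize(seq):
    """Assumed normalization: shift to zero mean, scale to unit standard deviation."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    if std == 0.0:
        return [0.0] * len(seq)  # a constant sequence has no fluctuation
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 2.0, 3.0, 2.0]
b = [10.0, 20.0, 30.0, 20.0]
raw_distance = euclidean(a, b)                              # far apart as raw values
normalized_distance = euclidean(normalize(a), normalize(b))  # same fluctuation pattern
```

The two sequences differ by a factor of ten in scale, so their raw distance is large, yet they share one fluctuation pattern and become essentially identical after normalization.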
INFORMATION FILTERING AND RETRIEVAL: Overview, Issues and Directions BASIS FOR A PANEL DISCUSSION
This paper is intended to serve as a springboard for a panel discussion with audience participation on information filtering and retrieval. Medical informatics is an emerging specialty which links medicine and information technology. With the unprecedented availability of digital information, retrieval and filtering are becoming important aspects of medical informatics. An overview of medical applications for filtering and retrieval is provided, and important state-of-the-art techniques are introduced in a way that does not presuppose prior knowledge in this field. With this basis for understanding issues and research developments, the ensuing discussion will examine challenges and opportunities for workers in this field.
Parallel Algorithms for High-dimensional Proximity Joins
We consider the problem of parallelizing high-dimensional proximity joins. We present a parallel multidimensional join algorithm based on the epsilon-kdB tree and compare it with the more common approach of space partitioning. An evaluation of the algorithms on an IBM SP2 shared-nothing multiprocessor is presented using both synthetic and real-life datasets. We also examine the effectiveness of the algorithms in the context of a specific data-mining problem, that of finding similar time-series. The empirical results show that our algorithm exhibits good performance and scalability, as well as an ability to handle data skew.
High-dimensional Proximity Joins
Many emerging data mining applications require a proximity (similarity) join between points in a high-dimensional domain. We present a new algorithm that utilizes a new data structure, called the ε-kd tree, for fast spatial proximity joins on high-dimensional points. This data structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of finding appropriate branches in the internal nodes. The storage cost for internal nodes is independent of the number of dimensions. Hence the proposed data structure scales to high-dimensional data. We analyze the cost of the join for the ε-kd tree and the R-tree family, and show that the ε-kd tree will perform better for high-dimensional joins. Empirical evaluation, using synthetic and real-life datasets, shows that proximity join using the ε-kd tree is typically 2 to 40 times faster than the R+ tree, with the performance gap increasing with the number of dimensions. We also ...
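For readers unfamiliar with the operation, a proximity join reports every pair of points whose distance is at most a threshold. The sketch below is a deliberately simple baseline that sorts on the first coordinate and stops scanning once that coordinate alone differs by more than the threshold. It is not the tree structure described in the abstract; the function name and the Euclidean metric are assumptions.

```python
import math


def proximity_join(points, eps):
    """Return index pairs (i, j), i < j, whose Euclidean distance is at most eps."""
    # Visit points in order of their first coordinate.
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            # Later points are even farther along this axis, so stop early.
            if points[j][0] - points[i][0] > eps:
                break
            if math.dist(points[i], points[j]) <= eps:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

Pruning on a single axis becomes less selective as the number of dimensions grows, which is the kind of weakness that dedicated high-dimensional join structures aim to address.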