## An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases (2001)

Venue: In ICDE

Citations: 44 (2 self-citations)

### BibTeX

@INPROCEEDINGS{Kim01anindex-based,

author = {Sang-wook Kim and Sanghyun Park and Wesley W. Chu},

title = {An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases},

booktitle = {In ICDE},

year = {2001},

pages = {607--614}

}

### Abstract

This paper discusses the efficient processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping cannot use multidimensional indexes without false dismissal, because the time warping distance does not satisfy the triangle inequality. They therefore have to scan the entire database and suffer serious performance degradation in large databases. Another method, which uses the suffix tree and does not assume any distance function, also performs poorly because of the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without permitting any false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently unde...
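
The abstract's central idea is a cheap lower-bound distance that is checked before the expensive time warping distance. The Python sketch below makes two assumptions that the truncated abstract does not confirm: that the time warping distance sums |s_i - q_j| along the cheapest warping path (the formulation used by Yi et al. [25]), and that $D_{tw-lb}$ is the largest absolute difference between the 4-tuple features First, Last, Greatest and Smallest mentioned in the citation contexts below. The names `dtw`, `features` and `dtw_lb` are illustrative.

```python
from typing import Sequence, Tuple


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance, assuming the sum of |s_i - q_j| along the
    cheapest warping path (an assumption; the paper's base distance may differ)."""
    inf = float("inf")
    n, m = len(s), len(q)
    # d[i][j]: cost of the best warping path aligning s[:i] with q[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # advancing only i or only j lets one element match several
            # elements of the other sequence
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def features(s: Sequence[float]) -> Tuple[float, float, float, float]:
    """4-tuple feature vector (First, Last, Greatest, Smallest) of a non-empty sequence."""
    return (s[0], s[-1], max(s), min(s))


def dtw_lb(s: Sequence[float], q: Sequence[float]) -> float:
    """Assumed form of D_tw-lb: the largest per-feature absolute difference."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


s, q = [3, 1, 4, 1, 5], [2, 7, 1, 8]
print(dtw_lb(s, q), dtw(s, q))  # the lower bound never exceeds the true distance
```

Under this assumed formulation the bound is safe: the first elements are always aligned with each other, as are the last elements, and the greatest and smallest elements of each sequence are aligned with some element of the other. Each of the four feature differences is therefore bounded by a single term on the warping path, so `dtw_lb` never exceeds `dtw` and filtering on it causes no false dismissal.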

### Citations

2355 | R-Trees: A Dynamic Index Structure for Spatial Searching: Memorandum No
- Guttman, Stonebraker
- 1983
Citation Context ...to a point in 4-dimensional space since a 4-tuple feature vector is extracted from a sequence for indexing. For indexing a set of 4-dimensional points, any multidimensional index such as the R-tree [13], R+-tree [22], R*-tree [3], and X-tree [5] can be used. The index construction algorithm first makes an entry ⟨First(S), Last(S), Greatest(S), Smallest(S), ID(S)⟩ for each data sequence S and th...
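
The index construction step described in this context can be sketched as follows. This is a minimal illustration, not the authors' code: a plain Python list stands in for the R-tree family of indexes, and `IndexEntry`, `make_entry` and `build_entries` are hypothetical names. Each entry holds the 4-D point ⟨First(S), Last(S), Greatest(S), Smallest(S)⟩ plus the sequence identifier.

```python
from typing import Dict, List, NamedTuple, Sequence


class IndexEntry(NamedTuple):
    """One index entry: a 4-D feature point plus the sequence ID."""
    first: float
    last: float
    greatest: float
    smallest: float
    seq_id: str


def make_entry(seq_id: str, s: Sequence[float]) -> IndexEntry:
    # Extract the 4-tuple feature vector from a non-empty data sequence.
    return IndexEntry(s[0], s[-1], max(s), min(s), seq_id)


def build_entries(db: Dict[str, Sequence[float]]) -> List[IndexEntry]:
    # In a real system each entry would be inserted (or bulk-loaded) into a
    # multidimensional index; a list keeps this sketch self-contained.
    return [make_entry(sid, s) for sid, s in db.items()]


db = {"S1": [20, 21, 21, 20, 23], "S2": [5, 9, 2]}
for entry in build_entries(db):
    print(entry)
```

Every data sequence, whatever its length, becomes one fixed-size 4-D point, which is what allows sequences of different lengths to share a single index.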

1843 | Computational Geometry: An Introduction
- Preparata, Shamos
- 1985

1047 | The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles
- Beckmann, Kriegel, et al.
- 1990
Citation Context ...d in such applications as voice recognition [20] and electrocardiogram analysis. For efficient processing of similarity search, most previous approaches [1, 2, 11] employ multidimensional indexes [3, 5, 22]. Yi et al. [25] claimed that multidimensional indexes assuming the triangle inequality [19] directly or indirectly cause false dismissal in similarity search when their distance functions do n...
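
The claim that the time warping distance violates the triangle inequality can be checked with a small counterexample. The sketch assumes the sum-of-absolute-differences formulation of time warping (as in Yi et al. [25]); the three sequences `a`, `b` and `c` are hypothetical.

```python
from typing import Sequence


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    # Time warping distance: cheapest warping path, summing |s_i - q_j|.
    inf = float("inf")
    d = [[inf] * (len(q) + 1) for _ in range(len(s) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(q) + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(s)][len(q)]


a, b, c = [0], [0, 1], [1, 1, 1, 1]
print(dtw(a, c))              # 4.0: the single 0 is matched against four 1s
print(dtw(a, b) + dtw(b, c))  # 2.0: 1.0 + 1.0
```

Because 4.0 > 2.0, a pruning rule derived from the triangle inequality is not a safe lower bound for this distance and may discard sequences that actually qualify, which is exactly the false dismissal described above.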

538 | The X-Tree: An Index Structure for High-Dimensional Data
- Berchtold, Keim, et al.
- 1996

441 | Fast Subsequence Matching in Time-Series Databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context ...using synthetic data sequences with different lengths. 6 Concluding Remarks Similarity search is an operation that finds data sequences whose changing patterns are similar to that of a query sequence [1, 11], and is of growing importance in such applications as data mining and data warehousing [7, 21]. Time warping is a useful transformation in such situations where the Euclidean distance is not applicab...

439 | Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ...proach is suitable for practical applications. 1 Introduction A sequence database is a set of data sequences (hereafter simply called sequences), each of which is an ordered list of elements [1]. Sequences of stock prices, money exchange rates, temperature data, product sales data, and company growth rates are typical examples of sequence databases [2, 11]. Similarity search is an oper...

432 | FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets
- Faloutsos, Lin
- 1995
Citation Context ...y in large databases. To resolve this performance degradation, Yi et al. [25] also proposed a method that maps a sequence of arbitrary length n into a k-dimensional point (k < n) using FastMap [10], a feature extraction function, and then builds a multidimensional index on the set of mapped points. By using a multidimensional index, this method improves search performance significantly. However...

429 | Data mining: an overview from a database perspective
- Chen, Han, et al.
- 1996
Citation Context ...equences whose changing patterns are similar to that of a given query sequence [1, 2, 11]. Similarity search is of growing importance in many new applications such as data mining and data warehousing [7, 21]. Similarity search is classified into whole matching and subsequence matching [1]. Assuming that all the data and query sequences have the same length, whole matching searches for the data sequences ...
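
To make the distinction between whole matching and subsequence matching concrete, here is a brute-force sketch under the Euclidean distance. It is illustrative only: the function names and the tolerance `eps` are hypothetical, and real systems replace these linear scans with an index.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(s: Sequence[float], q: Sequence[float]) -> float:
    if len(s) != len(q):
        raise ValueError("Euclidean distance needs equal-length sequences")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, q)))


def whole_match(db: Dict[str, Sequence[float]], q: Sequence[float], eps: float) -> List[str]:
    # Whole matching: compare the query with each entire data sequence
    # (data and query sequences are assumed to have the same length).
    return [sid for sid, s in db.items() if len(s) == len(q) and euclidean(s, q) <= eps]


def subsequence_match(db: Dict[str, Sequence[float]], q: Sequence[float],
                      eps: float) -> List[Tuple[str, int]]:
    # Subsequence matching: slide a window of len(q) over every data sequence
    # and report (sequence ID, offset) for each window within eps.
    hits = []
    for sid, s in db.items():
        for off in range(len(s) - len(q) + 1):
            if euclidean(s[off:off + len(q)], q) <= eps:
                hits.append((sid, off))
    return hits


db = {"S1": [1, 2, 3], "S2": [5, 1, 2, 3, 9]}
print(whole_match(db, [1, 2, 3], 0.5))        # ['S1']
print(subsequence_match(db, [1, 2, 3], 0.5))  # [('S1', 0), ('S2', 1)]
```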

304 | The R+-Tree: A Dynamic Index for Multi-Dimensional Objects
- Sellis, Roussopoulos, et al.
- 1987
Citation Context ...d in such applications as voice recognition [20] and electrocardiogram analysis. For efficient processing of similarity search, most previous approaches [1, 2, 11] employ multidimensional indexes [3, 5, 22]. Yi et al. [25] claimed that multidimensional indexes assuming the triangle inequality [19] directly or indirectly cause false dismissal in similarity search when their distance functions do n...

226 | On Packing R-trees
- Kamel, Faloutsos
- 1993
Citation Context ... is the identifier of S. If there is a large number of data sequences at the stage of initial index construction, we can achieve high performance gains in construction by employing bulk loading methods [6, 14, 15]. 4.3.2 Query Processing. Algorithm 1 shows TW-Sim-Search, our query processing algorithm. Step-1 extracts a 4-tuple feature vector from the query sequence. Step-2 performs a square-range query on a fo...
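
The first two steps of TW-Sim-Search, followed by a refinement step, can be sketched as a filter-and-refine loop. The context is cut off after Step-2, so several parts of this sketch are assumptions: the refinement step (computing the exact time warping distance for each candidate and keeping those within the tolerance `eps`), the sum-based `dtw` definition, and the reading of the square-range query as a box of half-width `eps` around the query's feature point. A linear scan stands in for the four-dimensional index, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float, float]


def features(s: Sequence[float]) -> Point:
    # (First, Last, Greatest, Smallest) of a non-empty sequence
    return (s[0], s[-1], max(s), min(s))


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    # Time warping distance (assumed: sum of |s_i - q_j| on the cheapest path)
    inf = float("inf")
    d = [[inf] * (len(q) + 1) for _ in range(len(s) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(q) + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(s)][len(q)]


def square_range_query(points: Dict[str, Point], center: Point, eps: float) -> List[str]:
    # Stand-in for an index range query: every coordinate within eps of center.
    return [sid for sid, p in points.items()
            if all(abs(a - b) <= eps for a, b in zip(p, center))]


def tw_sim_search(db: Dict[str, Sequence[float]], q: Sequence[float], eps: float) -> List[str]:
    points = {sid: features(s) for sid, s in db.items()}  # index construction
    fq = features(q)                                      # Step-1
    candidates = square_range_query(points, fq, eps)      # Step-2
    # Assumed refinement: drop false alarms using the exact distance.
    return [sid for sid in candidates if dtw(db[sid], q) <= eps]


db = {"A": [20, 21, 21, 20, 20, 23, 23, 23], "B": [1, 2, 3],
      "C": [20, 22, 20, 23], "D": [20, 23, 20, 23]}
print(tw_sim_search(db, [20, 21, 20, 23], eps=1))  # ['A', 'C']
```

In this run B is pruned by the feature filter alone, while D passes the filter (its features equal the query's) but is removed during refinement because its time warping distance to the query is 2.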

139 | Similarity-Based Queries for Time Series Data
- Rafiei, Mendelzon
- 1997
Citation Context ...equences whose changing patterns are similar to that of a given query sequence [1, 2, 11]. Similarity search is of growing importance in many new applications such as data mining and data warehousing [7, 21]. Similarity search is classified into whole matching and subsequence matching [1]. Assuming that all the data and query sequences have the same length, whole matching searches for the data sequences ...

121 | STR: A simple and efficient algorithm for R-tree packing
- Leutenegger, Lopez, et al.
- 1997
Citation Context ... is the identifier of S. If there is a large number of data sequences at the stage of initial index construction, we can achieve high performance gains in construction by employing bulk loading methods [6, 14, 15]. 4.3.2 Query Processing. Algorithm 1 shows TW-Sim-Search, our query processing algorithm. Step-1 extracts a 4-tuple feature vector from the query sequence. Step-2 performs a square-range query on a fo...

117 | Efficient Similarity Search
- Agrawal, Faloutsos, et al.
- 1993

116 | Finding patterns in time series: a dynamic programming approach
- Berndt, Clifford
- 1996
Citation Context ...efore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For example, two sequences S = ⟨20, 21, 21, 20, 20, 23, 23, 23...
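
The replication idea can be illustrated with a short sketch. The query `q` below is hypothetical, since the example in the context is cut off; replicating its elements yields a longer sequence whose time warping distance to the query is zero, even though the lengths differ and an element-wise Euclidean comparison is undefined. `replicate` is a hypothetical helper, and `dtw` assumes the sum-of-absolute-differences formulation.

```python
from typing import List, Sequence


def dtw(s: Sequence[float], q: Sequence[float]) -> float:
    # Time warping distance: cheapest warping path, summing |s_i - q_j|.
    inf = float("inf")
    d = [[inf] * (len(q) + 1) for _ in range(len(s) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(s) + 1):
        for j in range(1, len(q) + 1):
            cost = abs(s[i - 1] - q[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(s)][len(q)]


def replicate(seq: Sequence[float], counts: Sequence[int]) -> List[float]:
    # Repeat seq[i] counts[i] times (every count must be at least 1).
    out: List[float] = []
    for x, k in zip(seq, counts):
        out.extend([x] * k)
    return out


q = [20, 21, 20, 23]
s = replicate(q, [1, 2, 2, 3])
print(s)          # [20, 21, 21, 20, 20, 23, 23, 23]
print(dtw(s, q))  # 0.0
```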

106 | On similarity queries for time-series data: Constraint specifications and implementation
- Goldin, Kanellakis
- 1995
Citation Context ...es for the subsequences, contained in data sequences, that are similar to a query sequence of arbitrary length. To measure the similarity of any two sequences of length n, most approaches [1, 8, 11, 12, 21] map the sequences into points in an n-dimensional space and compute the Euclidean distance between those points as a similarity measure. However, they often fail to search for the data sequences that...

104 | String Searching Algorithms
- Stephen
- 1994
Citation Context ...d is only applicable to restricted applications since it cannot avoid false dismissal. Park et al. [18] proposed an efficient method for subsequence search under time warping by using the suffix tree [24] as its index structure. This method guarantees no false dismissal since the suffix tree does not assume any distance function. However, this method fails to provide a systematic guideline to perform ...

89 | Finding Similar Time Series
- Das, Gunopulos, et al.
- 1997
Citation Context ...ly the Euclidean distance as a similarity measure. Therefore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For exa...
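
The transformations listed in this context have common textbook forms, sketched below for reference. These definitions (multiplicative scaling, additive shifting, z-score normalization and an unweighted k-point moving average) are assumptions about the usual meaning of the terms, not the exact formulations of the cited papers.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def scale(s: Sequence[float], a: float) -> List[float]:
    return [a * x for x in s]


def shift(s: Sequence[float], b: float) -> List[float]:
    return [x + b for x in s]


def normalize(s: Sequence[float]) -> List[float]:
    # z-score normalization; a constant sequence maps to all zeros
    mu, sigma = mean(s), pstdev(s)
    if sigma == 0:
        return [0.0] * len(s)
    return [(x - mu) / sigma for x in s]


def moving_average(s: Sequence[float], k: int) -> List[float]:
    # Unweighted k-point moving average; the output has len(s) - k + 1 values
    return [sum(s[i:i + k]) / k for i in range(len(s) - k + 1)]


s = [1, 2, 3, 4]
t = shift(scale(s, 2), 10)   # [12, 14, 16, 18]
print(moving_average(s, 2))  # [1.5, 2.5, 3.5]
print(max(abs(x - y) for x, y in zip(normalize(s), normalize(t))) < 1e-9)  # True
```

The last line shows why normalization is useful for similarity search: for a positive scale factor, scaled and shifted copies of a sequence become identical after z-score normalization.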

46 | Fast time-series searching with scaling and shifting
- Chu, Wong
- 1999
Citation Context ...es for the subsequences, contained in data sequences, that are similar to a query sequence of arbitrary length. To measure the similarity of any two sequences of length n, most approaches [1, 8, 11, 12, 21] map the sequences into points in an n-dimensional space and compute the Euclidean distance between those points as a similarity measure. However, they often fail to search for the data sequences that...

33 | High-dimensional Similarity Joins
- Shim, Srikant, et al.
- 1997
Citation Context ... function has been widely used to measure the similarity of two sequences S and Q. $L_1$ is the Manhattan distance, $L_2$ is the Euclidean distance, and $L_\infty$ is the maximum difference between any pair of corresponding elements [23]. The $L_p$ function requires that the two sequences being compared have the same length: $L_p(S, Q) = \left( \sum_{i=1}^{|S|} |s_i - q_i|^p \right)^{1/p}, \quad 1 \le p \le \infty$. Now, let us describe the time warping distance. Time war...
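
The $L_p$ family in this context translates directly into code. The sketch below is a straightforward transcription of the formula, with `p = math.inf` selecting the maximum element-wise difference; the function name `lp_distance` is hypothetical.

```python
import math
from typing import Sequence


def lp_distance(s: Sequence[float], q: Sequence[float], p: float) -> float:
    # L_p requires the two sequences to have the same length
    if len(s) != len(q):
        raise ValueError("L_p distance needs equal-length sequences")
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= inf")
    diffs = [abs(a - b) for a, b in zip(s, q)]
    if math.isinf(p):
        return max(diffs)  # L_inf: largest element-wise difference
    return sum(d ** p for d in diffs) ** (1.0 / p)


s, q = [1, 2, 3], [4, 6, 3]
print(lp_distance(s, q, 1))         # 7.0 (Manhattan)
print(lp_distance(s, q, 2))         # 5.0 (Euclidean)
print(lp_distance(s, q, math.inf))  # 4 (maximum difference)
```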

11 | A generic approach to bulk loading multidimensional index structures
- Bercken, Seeger, et al.
- 1997
Citation Context ... is the identifier of S. If there is a large number of data sequences at the stage of initial index construction, we can achieve high performance gains in construction by employing bulk loading methods [6, 14, 15]. 4.3.2 Query Processing. Algorithm 1 shows TW-Sim-Search, our query processing algorithm. Step-1 extracts a 4-tuple feature vector from the query sequence. Step-2 performs a square-range query on a fo...

11 | Efficient Searches for Similar
- Park, Chu, et al.
- 2000
Citation Context ...efore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For example, two sequences S = ⟨20, 21, 21, 20, 20, 23, 23, 23...

6 | Index interpolation: an approach to subsequence matching supporting normalization transform in time-series databases
- Loh, Kim, et al.
- 2000
Citation Context ... a similarity measure. Therefore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For example, two sequences S = ⟨...

2 | Index Interpolation: A Subsequence Matching Algorithm Supporting Moving Average Transform of Arbitrary Order in Time-Series Databases (submitted for publication)
- Loh, Kim, et al.
Citation Context ...ly the Euclidean distance as a similarity measure. Therefore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For exa...

2 | Efficient Retrieval of Similar Time Sequences Under Time Warping
- Yi, Jagadish, et al.
- 1998
Citation Context ...efore, recent work on similarity search tends to support various types of transformations such as scaling [2, 8], shifting [2, 8], normalization [9, 12, 16], moving average [17, 21], and time warping [4, 18, 25]. Time warping is a transformation that allows any sequence element to replicate itself as many times as needed without extra costs [25]. For example, two sequences S = ⟨20, 21, 21, 20, 20, 23, 23, 23...