## Robust and fast similarity search for moving object trajectories (2005)

Venue: SIGMOD

Citations: 93 (14 self)

### BibTeX

@INPROCEEDINGS{Chen05robustand,
  author = {Lei Chen and M. Tamer Özsu},
  title = {Robust and fast similarity search for moving object trajectories},
  booktitle = {SIGMOD},
  year = {2005},
  pages = {491--502}
}

### Abstract

An important consideration in similarity-based retrieval of moving object trajectories is the definition of a distance function. The existing distance functions are usually sensitive to noise, shifts and scaling of data that commonly occur due to sensor failures, errors in detection techniques, disturbance signals, and different sampling rates. Cleaning data to eliminate these is not always possible. In this paper, we introduce a novel distance function, Edit Distance on Real sequence (EDR), which is robust against these data imperfections. Analysis and comparison of EDR with other popular distance ...

### Citations

2334 | Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context ... Language signs, and it is a 10-class data set with 5 trajectories per class. For each data set, we take all possible pairs of classes and use the “complete linkage” hierarchical clustering algorithm [16], which was reported to produce the best clustering results [36], to partition them into two clusters. We draw the dendrogram of each clustered result to see whether it correctly partitions the trajec... |

1358 | Binary codes capable of correcting deletions, insertions and reversals
- Levenshtein
- 1966
Citation Context ...e previous section. EDR is more robust and accurate than the existing ones in measuring the similarity between two trajectories. 3.1 Edit Distance on Real Sequences EDR is based on Edit Distance (ED) [26], which is widely used in bio-informatics and speech recognition to measure the similarity between two strings. Given two strings A and B, ED(A, B) is the number of insert, delete, or replace operatio... |
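The quoted passage describes ED(A, B) as a count of insert, delete, and replace operations. A minimal sketch of that dynamic program follows. The names `edit_distance` and `within_eps` are hypothetical, and the threshold-based matching rule (two sample points match when every coordinate differs by at most `eps`) is an assumption about how an EDR-style distance could be built on top of ED; it is not taken from the text quoted here.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def edit_distance(
    a: Sequence[T],
    b: Sequence[T],
    match: Callable[[T, T], bool] = lambda x, y: x == y,
) -> int:
    """Minimum number of insert, delete, or replace operations turning a into b."""
    # prev[j] holds the distance between the first i-1 items of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items into an empty sequence costs i deletions
        for j, y in enumerate(b, start=1):
            replace = prev[j - 1] + (0 if match(x, y) else 1)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(replace, delete, insert))
        prev = curr
    return prev[len(b)]


def within_eps(eps: float) -> Callable[[Sequence[float], Sequence[float]], bool]:
    """Assumed matching rule: points match if every coordinate is within eps."""
    def match(p: Sequence[float], q: Sequence[float]) -> bool:
        return all(abs(pi - qi) <= eps for pi, qi in zip(p, q))
    return match


# Strings use exact equality; trajectories use the assumed threshold match.
clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
noisy = [(0.1, 0.0), (50.0, 50.0), (1.0, 1.1), (2.0, 2.0)]  # one outlier sample
print(edit_distance("kitten", "sitting"))
print(edit_distance(clean, noisy, match=within_eps(0.25)))
```

Under this assumed rule an outlier costs a single edit regardless of how far away it lies, which is one way to read the robustness to noise that the abstract claims.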

1086 | Features of similarity
- Tversky
- 1977
Citation Context ...namic Partial Function (DPF) [12], that do not follow triangle inequality. Furthermore, much work in psychology also suggests that human similarity judgements do not follow triangle inequality either [32]. Therefore, given a “good”, robust but non-metric distance function, the issue is how to improve the retrieval efficiency for similarity search. The computation cost of EDR by dynamic programming is ... |

536 | A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces
- Weber, Schek, et al.
- 1998
Citation Context ...ly applies to one-dimensional strings, and naive implementation of Q-grams on multi-dimensional trajectories will not only increase the space cost but may also suffer the dimensionality curse problem [38]. Finally, Theorem 1 applies only to range queries (searching strings with at most k edit operations to the query string). In most cases, users may not know the range a priori. In these situations, k-N... |

520 | Comparing images using the Hausdorff distance
- Huttenlocher, Klanderman, et al.
- 1993
Citation Context ...stance functions that are robust to noisy data will usually violate triangle inequality. Many robust distance functions have been proposed in the domain of image retrieval, such as Hausdorff distance [14] and Dynamic Partial Function (DPF) [12], that do not follow triangle inequality. Furthermore, much work in psychology also suggests that human similarity judgements do not follow triangle inequality ... |

443 | Efficient Similarity Search in Sequence Databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ...iderable research has been conducted on similarity-based retrieval on one-dimensional time series data, such as stock or commodity prices, sales volume, weather data and biomedical measurements (e.g. [1, 24, 20, 23, 40]). Unfortunately, the distance functions and indexing methods proposed for one-dimensional time series data can not be directly applied to moving object trajectories due to their unique characteristic... |

258 | Exact Indexing of Dynamic Time Warping
- Ratanamahatana, Keogh
- 2004
Citation Context ...roduce local shifts into trajectories (i.e., the trajectories follow similar paths, but certain sub-paths are shifted in time). Even though the similarity measures, such as Dynamic Time Warping (DTW) [41, 8, 19], and Edit distance with Real Penalty (ERP) [6], can be used to measure the similarity between trajectories with local shifts, they are sensitive to noise. Since existing similarity measures can not r... |

252 | Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
- Keogh, Chakrabarti, et al.
Citation Context ...iderable research has been conducted on similarity-based retrieval on one-dimensional time series data, such as stock or commodity prices, sales volume, weather data and biomedical measurements (e.g. [1, 24, 20, 23, 40]). Unfortunately, the distance functions and indexing methods proposed for one-dimensional time series data can not be directly applied to moving object trajectories due to their unique characteristic... |

237 | On the need for time series data mining benchmarks: a survey and empirical demonstration
- Keogh, Kasetty
- 2002
Citation Context ...ies of trajectories when the trajectories contain little or no noise. The second test uses classification of labelled data to evaluate the efficacy of a distance function, as proposed by Keogh et al. [21]. Specifically, each trajectory is assigned a class label. Then the “leave one out” prediction mechanism is applied to each trajectory 1 University of California, Irvine: http://kdd.ics.uci.edu. 2 The... |

217 | Efficient time series matching by wavelets
- Chan, Fu
- 1999
Citation Context ...iderable research has been conducted on similarity-based retrieval on one-dimensional time series data, such as stock or commodity prices, sales volume, weather data and biomedical measurements (e.g. [1, 24, 20, 23, 40]). Unfortunately, the distance functions and indexing methods proposed for one-dimensional time series data can not be directly applied to moving object trajectories due to their unique characteristic... |

189 | Efficient retrieval of similar time sequences under time warping
- Yi, Jagadish, et al.
- 1998
Citation Context ...roduce local shifts into trajectories (i.e., the trajectories follow similar paths, but certain sub-paths are shifted in time). Even though the similarity measures, such as Dynamic Time Warping (DTW) [41, 8, 19], and Edit distance with Real Penalty (ERP) [6], can be used to measure the similarity between trajectories with local shifts, they are sensitive to noise. Since existing similarity measures can not r... |

188 | Discovering similar multidimensional trajectories
- Vlachos, Kollios, et al.
- 2002
Citation Context ... in videos). Thus, due to sensor failures, disturbance signals or errors in detection techniques, many outliers may appear. Longest Common Subsequences (LCSS) has been applied to address this problem [36]; however, it does not consider various gaps between similar subsequences, which leads to inaccuracy. The gap refers to a sub-trajectory in between two identified similar components of two trajectories... |

174 | Approximate string joins in a database (almost) for free
- Gravano, Ipeirotis, et al.
Citation Context ...ure work. 4.1 Pruning by Mean Value Q-gram Given a string S, a Q-gram of S is defined as a substring of size q. Q-grams have been well studied as a solution to the approximate string matching problem [17, 31, 10], which is defined as follows: given a long text of length n and a pattern of length m (m ≤ n), retrieve all the segments of the text whose edit distance to the pattern is at most k. If a pattern and ... |
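The passage defines a Q-gram as a substring of size q and is cut off before stating how Q-grams are used for filtering. The sketch below assumes the usual counting lemma from the q-gram filtering literature: two strings within edit distance k share at least max(|P|, |S|) − q + 1 − kq Q-grams. The function names are hypothetical, and whether the paper states the bound in exactly this form is an assumption.

```python
from collections import Counter


def qgrams(s: str, q: int) -> list[str]:
    """All substrings of s with length q, in order of appearance."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def common_qgrams(a: str, b: str, q: int) -> int:
    """Number of shared Q-grams, counting repeated Q-grams with multiplicity."""
    shared = Counter(qgrams(a, q)) & Counter(qgrams(b, q))
    return sum(shared.values())


def may_be_within(a: str, b: str, q: int, k: int) -> bool:
    """Counting filter: False means the pair cannot be within edit distance k.

    True only means the pair survives the filter and still needs the
    full edit distance computation.
    """
    needed = max(len(a), len(b)) - q + 1 - k * q
    return common_qgrams(a, b, q) >= needed


print(qgrams("abcd", 2))
print(may_be_within("abcdef", "abcxef", q=2, k=1))
print(may_be_within("abcdef", "uvwxyz", q=2, k=1))
```

A filter of this kind can only discard candidates; pairs that pass must still be verified, so it introduces no false dismissals as long as the bound holds.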

157 | Fast time sequence indexing for arbitrary Lp norms
- Yi, Faloutsos
- 2000

124 | Novel Approaches in Query Processing for Moving Object Trajectories
- Pfoser, Jensen, et al.
- 2000
Citation Context ...ts can be ignored. This separates similarity-based retrieval from queries in spatio-temporal databases where time components of trajectories are important to answer time slice or time interval queries [28]. Considerable research has been conducted on similarity-based retrieval on one-dimensional time series data, such as stock or commodity prices, sales volume, weather data and biomedical measurements ... |

108 | Efficiently supporting ad hoc queries in large datasets of time sequences
- Korn, Jagadish, et al.
- 1997

106 | On similarity queries for time series data: Constraint specification and implementation
- Goldin, Kanellakis
- 1995
Citation Context ...d techniques can be extended to more than two dimensions. Given S, we can normalize its x and y position values using the corresponding means ($\mu_x$, $\mu_y$) and standard deviations ($\sigma_x$, $\sigma_y$), respectively [13]: $\mathrm{Norm}(S) = \left[\left(t_1, \left(\frac{s_{1,x}-\mu_x}{\sigma_x}, \frac{s_{1,y}-\mu_y}{\sigma_y}\right)\right), \ldots, \left(t_n, \left(\frac{s_{n,x}-\mu_x}{\sigma_x}, \frac{s_{n,y}-\mu_y}{\sigma_y}\right)\right)\right]$. Normalization is recommended so that the distance between two trajectories is invariant to spatial scaling a... |
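The normalization described in this passage can be sketched as below. The representation of a trajectory as a list of `(t, (x, y))` pairs, the function name `normalize`, the use of the population standard deviation, and the handling of a zero standard deviation are all assumptions made for illustration.

```python
from statistics import fmean, pstdev

Trajectory = list[tuple[float, tuple[float, float]]]


def normalize(trajectory: Trajectory) -> Trajectory:
    """Z-score each coordinate: subtract its mean, divide by its standard deviation."""
    xs = [pos[0] for _, pos in trajectory]
    ys = [pos[1] for _, pos in trajectory]
    mu_x, mu_y = fmean(xs), fmean(ys)
    # Population standard deviation is an assumption; the text does not say which.
    sd_x, sd_y = pstdev(xs), pstdev(ys)

    def z(value: float, mu: float, sd: float) -> float:
        # A constant coordinate has sd == 0; map it to 0.0 instead of dividing by zero.
        return (value - mu) / sd if sd > 0 else 0.0

    return [(t, (z(x, mu_x, sd_x), z(y, mu_y, sd_y))) for t, (x, y) in trajectory]


# A trajectory and a spatially scaled and shifted copy normalize to the same values.
traj = [(1, (0.0, 1.0)), (2, (2.0, 3.0)), (3, (4.0, 8.0))]
moved = [(t, (3.0 * x + 10.0, 0.5 * y - 7.0)) for t, (x, y) in traj]
print(normalize(traj))
print(normalize(moved))
```

Timestamps are passed through unchanged; only the spatial coordinates are rescaled.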

96 | Indexing multi-dimensional time-series with support for multiple distance measures
- Vlachos, Hadjieleftheriou, et al.
- 2003
Citation Context ...ly perform better than mean value Q-grams on removing false candidates. 5.4 Efficiency of Combined Methods We test the combination of methods proposed in Section 4.4 on NHL data [5], a mixed data set [34] and a random walk trajectory data set [6, 19]. The NHL data consists of 5000 two-dimensional trajectories of National Hockey League players and their trajectory lengths vary from 30 to 256. The mixed ... |

79 | Classification with Non-Metric Distances: Image Retrieval and Class Representation
- Jacobs, Weinshall, et al.
- 2000
Citation Context ..., making it non-metric and, thus, non-indexable by traditional distance-based indexing methods. However, this does not mean that EDR is not a “good” distance function. As pointed out by Jacobs et al. [15], it is not the poor selection of features or careless design that causes a distance function not to follow triangle inequality. Inherently, distance functions that are robust to noisy data will usuall... |

71 | Two algorithms for approximate string matching in static texts
- Jokinen, Ukkonen
- 1991
Citation Context ...ure work. 4.1 Pruning by Mean Value Q-gram Given a string S, a Q-gram of S is defined as a substring of size q. Q-grams have been well studied as a solution to the approximate string matching problem [17, 31, 10], which is defined as follows: given a long text of length n and a pattern of length m (m ≤ n), retrieve all the segments of the text whose edit distance to the pattern is at most k. If a pattern and ... |

61 | Making time-series classification more accurate using learned constraints
- Ratanamahatana, Keogh
- 2004
Citation Context ...aximum standard deviation of trajectories leads to the best clustering results, which is also confirmed by [33]. The same ε value is used for LCSS and EDR. In order to make a fair comparison with DTW [29], we also test DTW with different warping lengths and report the best results. Since Euclidean distance requires sequences with the same length, we apply the strategy used in [36], where the shorter o... |

60 | The string edit distance matching problem with moves
- Cormode, Muthukrishnan
- 2002
Citation Context ...or space. To avoid false dismissals, the distance in the embedded space is required to be the lower bound of the edit distance on strings. A number of embedding methods have been proposed for strings [2, 3, 9, 18]; however, only two of these [18, 2] avoid introducing false dismissals. Both of these take a similar approach in that they transform strings into a multidimensional integer space by mapping strings t... |

59 | Scaling up dynamic time warping for datamining applications
- Keogh, Pazzani
- 2000
Citation Context ...using four distance functions on two labelled data sets. The two labelled trajectory data sets are the “Cameramouse” (CM) [11] and the Australian Sign Language (ASL) data sets which were also used in [22, 36]. The “Cameramouse” data set contains 15 trajectories of 5 words (3 for each word) obtained by tracking the finger tips of people as they “write” various words. The ASL data set from UCI KDD data arch... |

51 | Variable length queries for time series data
- Kahveci, Singh
- 2001
Citation Context ...or space. To avoid false dismissals, the distance in the embedded space is required to be the lower bound of the edit distance on strings. A number of embedding methods have been proposed for strings [2, 3, 9, 18]; however, only two of these [18, 2] avoid introducing false dismissals. Both of these take a similar approach in that they transform strings into a multidimensional integer space by mapping strings t... |

51 | Similarity search for multidimensional data sequences
- Lee, Chun, et al.
- 2000
Citation Context ...dexing is based on exact equality match on real values, which is not suitable for similarity search. Our work focuses on trajectory retrieval and EDR is defined based on range value match. Lee et al. [25] use the distance between minimum bounding rectangles to compute the distance between two multidimensional sequences. Even though they can achieve very high recall, the distance function can not avoid ... |

44 | Filtration with q-samples in approximate string matching
- Sutinen, Tarhio
- 1996
Citation Context ...ure work. 4.1 Pruning by Mean Value Q-gram Given a string S, a Q-gram of S is defined as a substring of size q. Q-grams have been well studied as a solution to the approximate string matching problem [17, 31, 10], which is defined as follows: given a long text of length n and a pattern of length m (m ≤ n), retrieve all the segments of the text whose edit distance to the pattern is at most k. If a pattern and ... |

41 | Matching and indexing sequences of different lengths
- Bozkaya, Yazdani, et al.
- 1997
Citation Context ... studies. Most importantly, we show the three pruning methods can be combined to deliver superior retrieval efficiency. Limited work has been done on multidimensional time-series data. Bozkaya et al. [4] present a modified version of LCSS to compute the distance between two sequences. In order to answer the similarity-based queries efficiently, an index scheme is designed based on the lengths of the ... |

39 | A spatio temporal semantic model for multimedia presentations and multimedia database systems
- Chen, Kashyap
Citation Context ...roduce local shifts into trajectories (i.e., the trajectories follow similar paths, but certain sub-paths are shifted in time). Even though the similarity measures, such as Dynamic Time Warping (DTW) [41, 8, 19], and Edit distance with Real Penalty (ERP) [6], can be used to measure the similarity between trajectories with local shifts, they are sensitive to noise. Since existing similarity measures can not r... |

27 | Going metric: Denoising pairwise data
- Roth, Laub, et al.
Citation Context ...s). If all the trajectories have the same length, applying near triangle inequality will not remove any false candidates. We also investigate a general approach, called Constant Shift Embedding (CSE) [30], to convert a distance function that does not follow triangle inequality to another one that follows. The idea is as follows. Given a distance function dist that is defined on data space D and does n... |

22 | Shape-based similarity query for trajectory of mobile objects
- Yanagisawa, Akahani, et al.
- 2003
Citation Context ...ork guarantees that there are no false dismissals. Cai and Ng [5] propose an effective lower bound technique for indexing trajectories. However, Euclidean distances are used as the similarity measure [25, 39, 5], and, as argued earlier, this measure is not robust to noise or time shifting which often appear in trajectory data. Little and Gu [27] use path and speed curves to represent the motion trajectories ... |

20 | Rotation Invariant Distance Measures for Trajectories
- Vlachos, Gunopulos, et al.
- 2004
Citation Context ...ng which often appear in trajectory data. Little and Gu [27] use path and speed curves to represent the motion trajectories and measure the distance between two trajectories using DTW. Vlachos et al. [35] also use DTW on rotation invariant representation of trajectories, sequences of angle and arc-length pairs. However, DTW requires continuity along the warping path, which makes it sensitive to noise ... |

19 | A wavelet-based anytime algorithm for k-means clustering of time series
- Vlachos, Lin, et al.
- 2003
Citation Context ...ance functions to handle local time shifting and noise, we add to three data sets interpolated Gaussian noise (about 10-20% of the length of trajectories) and local time shifting using the program in [37]. To get average values over a number of data sets, we use each raw data set as a seed and generate 50 distinct data sets that include noise and time shifting. The results are shown in Table 2. For tw... |

10 | On the marriage of edit distance and Lp norms
- Chen, Ng
- 2004
Citation Context ...ng trajectories. The trajectory S of a moving object is defined as a sequence of pairs, $S = [(t_1, s_1), \ldots, (t_n, s_n)]$, which show the successive positions of the moving object over a period of time [5, 6]. Here, n, the number of sample timestamps in S, is defined as the length ... |

6 | The camera mouse: Preliminary investigation of automated visual tracking for computer access
- Gips, Betke, et al.
- 2000
Citation Context ...EDR using the approach in [36]. Specifically, we perform hierarchical clustering using four distance functions on two labelled data sets. The two labelled trajectory data sets are the “Cameramouse” (CM) [11] and the Australian Sign Language (ASL) data sets which were also used in [22, 36]. The “Cameramouse” data set contains 15 trajectories of 5 words (3 for each word) obtained by tracking the finger tip... |

5 | Video retrieval by spatial and temporal structure of trajectories
- Little, Gu
- 2001
Citation Context ..., Euclidean distances are used as the similarity measure [25, 39, 5], and, as argued earlier, this measure is not robust to noise or time shifting which often appear in trajectory data. Little and Gu [27] use path and speed curves to represent the motion trajectories and measure the distance between two trajectories using DTW. Vlachos et al. [35] also use DTW on rotation invariant representation of tr... |

4 | BFT: Bit Filtration Technique for Approximate String Join in Biological Databases
- Aghili, Agrawal, et al.
- 2003
Citation Context ...or space. To avoid false dismissals, the distance in the embedded space is required to be the lower bound of the edit distance on strings. A number of embedding methods have been proposed for strings [2, 3, 9, 18]; however, only two of these [18, 2] avoid introducing false dismissals. Both of these take a similar approach in that they transform strings into a multidimensional integer space by mapping strings t... |

3 | Lower bounds for embedding edit distance into normed spaces
- Andoni, Deza, et al.
- 2003

3 | Indexing spatio-temporal trajectories with Chebyshev polynomials
- Cai, Ng
- 2004
Citation Context ...ng trajectories. The trajectory S of a moving object is defined as a sequence of pairs, $S = [(t_1, s_1), \ldots, (t_n, s_n)]$, which show the successive positions of the moving object over a period of time [5, 6]. Here, n, the number of sample timestamps in S, is defined as the length ... |

3 | Similarity-based Search Over Time Series and Trajectory Data
- Chen
- 2005

3 | Dyndex: a dynamic and non-metric space indexer
- Chang
- 2002
Citation Context ...y data will usually violate triangle inequality. Many robust distance functions have been proposed in the domain of image retrieval, such as Hausdorff distance [14] and Dynamic Partial Function (DPF) [12], that do not follow triangle inequality. Furthermore, much work in psychology also suggests that human similarity judgements do not follow triangle inequality either [32]. Therefore, given a “good”, ... |