Abstract:
We study sequence nearest neighbors (SNN). Let D be a database of n sequences; we would like to preprocess D so that, given any on-line query sequence Q, we can quickly find a sequence S in D for which d(S, Q) ≤ d(T, Q) for any other sequence T in D. Here d(S, Q) denotes the distance between sequences S and Q, defined to be the minimum number of edit operations needed to transform one into the other (all edit operations will be reversible, so that d(S, T) = d(T, S) for any two sequences T and S). These operations correspond to the notion of similarity between sequences that we wish to capture in a given application. Natural edit operations include character edits (inserts, replacements, deletes, etc.), block edits (moves, copies, deletes, reversals) and block numerical transformations (scaling by an additive or a multiplicative constant). The SNN problem arises in many applications. We present the first known efficient algorithm for "approximate" nearest neighbor search for sequences with p...
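To make the problem statement concrete, the following is a minimal sketch of an exact, brute-force baseline. It is not the algorithm the abstract announces. It assumes the simplest edit model mentioned above, unit-cost character inserts, deletes and replacements, and ignores block edits and numerical transformations. It does no preprocessing and simply scans the whole database for each query. The names edit_distance and nearest_neighbor are illustrative and do not come from the paper.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Character edit distance with unit-cost insert, delete and replace.

    Assumption: only character edits are modeled; block operations
    (moves, copies, reversals) are not handled by this sketch.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # replace ca by cb, or match
            ))
        prev = curr
    return prev[-1]


def nearest_neighbor(database: Sequence[str], query: str) -> str:
    """Return a sequence S in the database minimizing d(S, query).

    Linear scan: one distance computation per database sequence.
    Ties are broken in favor of the earliest sequence.
    """
    if not database:
        raise ValueError("database must contain at least one sequence")
    return min(database, key=lambda s: edit_distance(s, query))
```

For example, with the database ["banana", "bandana", "cabana"] and the query "bananas", nearest_neighbor returns "banana", which is one insertion away from the query. Because each of the three edit operations has an inverse of the same cost, edit_distance is symmetric, matching the requirement d(S, T) = d(T, S) stated above.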