Results 1-10 of 58
Learning String Edit Distance
, 1997
Cited by 193 (2 self)
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with nearly one fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
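The untrained Levenshtein distance mentioned in this abstract can be sketched with the standard dynamic program below. This is a minimal illustration assuming unit cost for every insertion, deletion, and substitution; it is not the learned stochastic model of the report, and the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (all costs assumed to be 1)."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A learned edit distance would replace the fixed unit costs with costs estimated from example string pairs.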
Flexible Syntactic Matching of Curves and its Application to Automatic Hierarchical Classification of Silhouettes
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 113 (2 self)
Curve matching is one instance of the fundamental correspondence problem. Our flexible algorithm is designed to match curves under substantial deformations and arbitrary large scaling and rigid transformations. A syntactic representation is constructed for both curves, and an edit transformation which maps one curve to the other is found using dynamic programming. We present extensive...
On aligning curves
 IEEE TPAMI
, 2003
Cited by 94 (3 self)
We present a novel approach to finding a correspondence (alignment) between two curves. The correspondence is based on a notion of an alignment curve which treats both curves symmetrically. We then define a similarity metric based on the alignment curve using two intrinsic properties of the curve, namely, length and curvature. The optimal correspondence is found by an efficient dynamic-programming method both for aligning pairs of curve segments and pairs of closed curves, and is effective in the presence of a variety of transformations of the curve. Finally, the correspondence is shown in application to handwritten character recognition, prototype formation, and object recognition, and is potentially useful in other applications such as registration and tracking.
A new version of the Nearest-Neighbour Approximating and Eliminating Search Algorithm (AESA) with linear preprocessing time and memory requirements
 Pattern Recognition Letters 15 (1994) 9–17
, 1994
On map-matching vehicle tracking data
 In Proc. 31st VLDB Conference
, 2005
Cited by 47 (8 self)
Vehicle tracking data is an essential “raw” material for a broad range of applications such as traffic management and control, routing, and navigation. An important issue with this data is its accuracy. The method of sampling vehicular movement using GPS is affected by two error sources and consequently produces inaccurate trajectory data. To become useful, the data has to be related to the underlying road network by means of map-matching algorithms. We present three such algorithms that consider especially the trajectory nature of the data rather than simply the current position as in the typical map-matching case. An incremental algorithm is proposed that matches consecutive portions of the trajectory to the road network, effectively trading accuracy for speed of computation. In contrast, the two global algorithms compare the entire trajectory to candidate paths in the road network. The algorithms are evaluated in terms of (i) their running time and (ii) the quality of their matching result. Two novel quality measures utilizing the Fréchet distance are introduced and subsequently used in an experimental evaluation to assess the quality of matching real tracking data to a road network.
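To make the Fréchet distance named in this abstract more concrete, the sketch below computes the discrete Fréchet distance between two polylines by dynamic programming. The discrete variant is a simplification swapped in here for illustration and is not claimed to be either of the paper's quality measures; the function name `discrete_frechet` and the sample coordinates are hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two point sequences.

    Only the vertices of each polyline are considered, which makes this
    an approximation of the continuous Fréchet distance.
    """
    if not p or not q:
        raise ValueError("both curves need at least one point")
    n, m = len(p), len(q)
    # ca[i][j]: coupling distance between p[:i + 1] and q[:j + 1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance along p, along q, or along both, whichever
                # keeps the longest link so far as short as possible.
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


trajectory = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # hypothetical GPS track
road_path = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]   # hypothetical candidate path
print(discrete_frechet(trajectory, road_path))      # 1.0
```

Unlike a sum of pointwise errors, this measure respects the order in which both curves are traversed, which is why it suits trajectory data.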
A Tree-Edit-Distance Algorithm for Comparing Simple, Closed Shapes
 In Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2000
Cited by 23 (0 self)
We discuss a graph-algorithmic approach to comparing shapes. We focus in this paper on comparing simple closed curves in the plane. Our approach is to (1) represent such a shape by its skeleton, which is a tree embedded in the plane, and (2) compare two shapes by comparing their skeletons via tree edit-distance. In this paper, we define our version of tree edit-distance (it differs from that previously described in the literature), and give a polynomial-time algorithm to compute the distance between two trees.
1 Introduction
This paper arose out of a collaboration between a computer-vision researcher and an algorithms researcher. Kimia et al. [4] had previously compared shapes by comparing their graphs using a heuristic for general graph-comparison. The heuristic, due to Gold and Rangarajan [3], is based on finding a local minimum to a quadratic program. This approach had several disadvantages, however, and Kimia was searching for another approach. Klein suggested that the notion of ed...
Effective Proximity Retrieval by Ordering Permutations
, 2007
Cited by 21 (4 self)
We introduce a new probabilistic proximity search algorithm for range and K-nearest neighbor (KNN) searching in both coordinate and metric spaces. Although there exist solutions for these problems, they boil down to a linear scan when the space is intrinsically high-dimensional, as is the case in many pattern recognition tasks. This, for example, renders the KNN approach to classification rather slow in large databases. Our novel idea is to predict closeness between elements according to how they order their distances towards a distinguished set of anchor objects. Each element in the space sorts the anchor objects from closest to farthest to it, and the similarity between orders turns out to be an excellent predictor of the closeness between the corresponding elements. We present extensive experiments comparing our method against state-of-the-art exact and approximate techniques, both in synthetic and real, metric and nonmetric databases, measuring both CPU time and distance computations. The experiments demonstrate that our technique almost always improves upon the performance of alternative techniques, in some cases by a wide margin.
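The anchor-ordering idea in this abstract can be sketched as follows. This is a rough illustration under stated assumptions: the similarity between two orders is measured with the Spearman footrule (the sum of rank displacements), which is an assumption made here, and the names `anchor_order`, `footrule`, `approximate_nearest`, and the `budget` parameter are hypothetical.

```python
import math


def anchor_order(element, anchors, dist):
    """Anchor indices sorted from closest to farthest from `element`."""
    return sorted(range(len(anchors)), key=lambda k: dist(element, anchors[k]))


def footrule(order_a, order_b):
    """Spearman footrule: total displacement of each anchor's rank."""
    rank_in_b = {anchor: rank for rank, anchor in enumerate(order_b)}
    return sum(abs(rank - rank_in_b[anchor]) for rank, anchor in enumerate(order_a))


def approximate_nearest(query, database, anchors, dist, budget):
    """Compare the query directly against only the `budget` elements
    whose anchor orders are most similar to the query's order."""
    # In practice these orders would be precomputed once at index time.
    orders = [anchor_order(x, anchors, dist) for x in database]
    query_order = anchor_order(query, anchors, dist)
    candidates = sorted(range(len(database)),
                        key=lambda i: footrule(orders[i], query_order))
    promising = candidates[:budget]
    best = min(promising, key=lambda i: dist(query, database[i]))
    return database[best]


anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
database = [(1.0, 2.0), (9.0, 1.0), (1.0, 9.0), (4.0, 6.0)]
print(approximate_nearest((8.0, 0.0), database, anchors, math.dist, budget=1))
# (9.0, 1.0)
```

With `budget=1` only one true distance to a database element is computed per query; a larger budget trades more distance computations for a lower chance of missing the true nearest neighbor.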
Relaxing stylus typing precision by geometric pattern matching
 Proceedings of the International Conference on Intelligent User Interfaces (IUI '05)
, 2005
Cited by 21 (7 self)
Fitts' law models the inherent speed-accuracy tradeoff constraint in stylus typing. Users attempting to go beyond the Fitts' law speed ceiling will tend to land the stylus outside the targeted key, resulting in erroneous words and increasing users' frustration. We propose a geometric pattern matching technique to overcome this problem. Our solution can be used either as an enhanced spell checker or as a way to enable users to escape the Fitts' law constraint in stylus typing, potentially resulting in higher text entry speeds than what is currently theoretically modeled. We view the hit points on a stylus keyboard as a high resolution geometric pattern. This pattern can be matched against patterns formed by the letter key center positions of legitimate words in a lexicon. We present the development and evaluation of an “elastic” stylus keyboard capable of correcting words even if the user misses all the intended keys, as long as the user’s tapping pattern is close enough to the intended word.
Evaluation of string distance algorithms for dialectology
 Linguistic Distances
, 2006
Cited by 19 (6 self)
We examine various string distance measures for suitability in modeling dialect distance, especially its perception. We find that measures which do not normalize for word length, but which are sensitive to order, are superior. We likewise find evidence for the superiority of measures which incorporate a sensitivity to phonological context, realized in the form of n-grams, although we cannot identify which form of context (bigram, trigram, etc.) is best. However, we find no clear benefit in using gradual as opposed to binary segmental difference when calculating sequence distances.
Evidence Accumulation Clustering based on the K-Means Algorithm
 Structural, Syntactic, and Statistical Pattern Recognition, LNCS 2396:442–451
, 2002
Cited by 14 (3 self)
The idea of evidence accumulation for the combination of multiple clusterings was recently proposed [7]. Taking K-means as the basic algorithm for the decomposition of data into a large number, k, of compact clusters, evidence on pattern association is accumulated, by a voting mechanism, over multiple clusterings obtained by random initializations of the K-means algorithm. This produces a mapping of the clusterings into a new similarity measure between patterns. The final data partition is obtained by applying the single-link method over this similarity matrix. In this paper we further explore and extend this idea, by proposing: (a) the combination of multiple K-means clusterings using variable k; (b) using cluster lifetime as the criterion for extracting the final clusters; and (c) the adaptation of this approach to string patterns.
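The voting scheme described in this abstract can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: it assumes a fixed k, a plain Lloyd-style K-means, and a fixed co-association threshold of 0.5 for the single-link cut rather than the cluster-lifetime criterion proposed in the paper. All function names and the sample data are hypothetical.

```python
import random


def kmeans_labels(points, k, rng, iterations=10):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centroids = rng.sample(points, k)  # random initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):  # assign to the nearest centroid
            labels[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        for c in range(k):  # move each centroid to the mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def co_association(labelings):
    """Fraction of clusterings in which each pair shares a cluster."""
    n = len(labelings[0])
    votes = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                votes[i][j] += labels[i] == labels[j]
    return [[v / len(labelings) for v in row] for row in votes]


def single_link_partition(co, threshold):
    """Single-link cut at a fixed similarity threshold, computed as the
    connected components of the graph linking pairs with co > threshold."""
    labels = [-1] * len(co)
    next_label = 0
    for start in range(len(co)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(co)):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def evidence_accumulation(points, k, runs, threshold=0.5, seed=0):
    rng = random.Random(seed)
    labelings = [kmeans_labels(points, k, rng) for _ in range(runs)]
    return single_link_partition(co_association(labelings), threshold)


data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
print(evidence_accumulation(data, k=2, runs=10))  # [0, 0, 0, 1, 1, 1]
```

Pairs of patterns that are co-clustered in most runs end up linked, so the final partition reflects agreement across runs rather than any single K-means result.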