Results 1-10 of 42
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
, 1998
Cited by 1019 (40 self)
The nearest neighbor problem is the following: Given a set of n points P = {p_1, ..., p_n} in some metric space X, preprocess P so as to efficiently answer queries which require finding the point in P closest to a query point q ∈ X. We focus on the particularly interesting case of the d-dimensional Euclidean space where X = ℝ^d under some l_p norm. Despite decades of effort, the current solutions are far from satisfactory; in fact, for large d, in theory or in practice, they provide little improvement over the brute-force algorithm which compares the query point to each data point. Of late, there has been some interest in the approximate nearest neighbors problem, which is: Find a point p ∈ P that is an ε-approximate nearest neighbor of the query q in that for all p' ∈ P, d(p, q) ≤ (1 + ε) d(p', q). We present two algorithmic results for the approximate version that significantly improve the known bounds: (a) preprocessing cost polynomial in n and d, and a trul...
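The two definitions in this abstract can be made concrete with a minimal sketch. It shows only the brute-force baseline and the ε-approximation test under the Euclidean norm; it is not the paper's data structure, and the function names are hypothetical.

```python
import math

def brute_force_nn(points, q):
    """Exact nearest neighbor: compare the query q to every data point."""
    return min(points, key=lambda p: math.dist(p, q))

def is_eps_approx_nn(p, points, q, eps):
    """True if d(p, q) <= (1 + eps) * d(p', q) for every p' in points."""
    d_pq = math.dist(p, q)
    return all(d_pq <= (1 + eps) * math.dist(p2, q) for p2 in points)

points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
q = (0.6, 0.0)
nearest = brute_force_nn(points, q)
```

Any exact nearest neighbor passes the test for every ε ≥ 0; a larger ε admits more candidate answers.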
Verifying candidate matches in sparse and wildcard matching
 In Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC 2002)
, 2002
Cited by 52 (3 self)
This paper obtains the following results on pattern matching problems in which the text has length n and the pattern has length m. An O(n log m) time deterministic algorithm for the String Matching with Wildcards problems, even when the alphabet is large. An O(k log² m) time Las Vegas algorithm for the Sparse String Matching with Wildcards problem, where k << n is the number of nonzeros in the text. We also give Las Vegas algorithms for the higher dimensional version of this problem. As an application of the above, an O(n log² m) time Las Vegas algorithm for the Subset Matching and Tree Pattern Matching problems, and a Las Vegas algorithm for the Geometric Pattern Matching problem. Finally, an O(n log² m) time deterministic algorithm for Subset Matching and Tree Pattern Matching. The crucial new idea underlying the first three results above is that of confirming matches by convolving vectors obtained by coding characters in the alphabet with non-boolean (i.e., rational or even complex) entries; in contrast, almost all previous pattern matching algorithms consider only boolean codes for the alphabet. The crucial new idea underlying the fourth result is a simpler method of shifting characters which ensures that each character occurs as a singleton in some shift.
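The general idea of confirming wildcard matches with non-boolean character codes can be illustrated as follows. This sketch uses a well-known sum-of-squares formulation, which is not necessarily the construction in the paper: characters are coded as positive integers, wildcards as 0, and an alignment i matches exactly when the sum over j of p_j · t_{i+j} · (p_j − t_{i+j})² is zero. The correlations are computed naively here for self-containment; an FFT-based convolution would replace `correlate` to reach near-linear time. All names are hypothetical.

```python
def correlate(text_vals, pat_vals):
    """Naive sliding dot product; an FFT would compute this faster."""
    n, m = len(text_vals), len(pat_vals)
    return [sum(pat_vals[j] * text_vals[i + j] for j in range(m))
            for i in range(n - m + 1)]

def wildcard_match(text, pattern, wildcard="?"):
    """Return all alignments where pattern matches text, wildcards allowed in both."""
    if len(pattern) > len(text):
        return []
    # Integer codes: 0 for a wildcard, a positive code point otherwise.
    t = [0 if c == wildcard else ord(c) for c in text]
    p = [0 if c == wildcard else ord(c) for c in pattern]
    # Expand sum p*t*(p - t)^2 = sum p^3*t - 2*sum p^2*t^2 + sum p*t^3.
    a = correlate(t, [y ** 3 for y in p])
    b = correlate([x ** 2 for x in t], [y ** 2 for y in p])
    c = correlate([x ** 3 for x in t], p)
    # Every term is non-negative, so the sum is zero only if each position agrees.
    return [i for i in range(len(a)) if a[i] - 2 * b[i] + c[i] == 0]
```

Because the codes are non-negative integers, the arithmetic is exact and a zero sum certifies a match without a separate verification pass.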
4-Points Congruent Sets for Robust Pairwise Surface Registration
 INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS AND INTERACTIVE TECHNIQUES
, 2008
Cited by 48 (2 self)
We introduce 4PCS, a fast and robust alignment scheme for 3D point sets that uses wide bases, which are known to be resilient to noise and outliers. The algorithm allows registering raw noisy data, possibly contaminated with outliers, without prefiltering or denoising the data. Further, the method significantly reduces the number of trials required to establish a reliable registration between the underlying surfaces in the presence of noise, without any assumptions about starting alignment. Our method is based on a novel technique to extract all coplanar 4-points sets from a 3D point set that are approximately congruent, under rigid transformation, to a given set of coplanar 4-points. This extraction procedure runs in roughly O(n² + k) time, where n is the number of candidate points and k is the number of reported 4-points sets. In practice, when noise level is low and there is sufficient overlap, using local descriptors the time complexity reduces to O(n + k). We also propose an extension to handle similarity and affine transforms. Our technique achieves an order of magnitude asymptotic acceleration compared to common randomized alignment techniques. We demonstrate the robustness of our algorithm on several sets of multiple range scans with varying degree of noise, outliers, and extent of overlap.
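The notion of two 4-point sets being approximately congruent can be illustrated with a simple check. This is only a sketch of a necessary condition, not the paper's O(n² + k) extraction procedure: a rigid transformation preserves all pairwise distances, so the six distances of corresponding points must agree within a tolerance. The function names and the default tolerance are assumptions, the points are assumed to be listed in corresponding order, and a mirrored set also passes this test.

```python
import math
from itertools import combinations

def pairwise_distances(pts):
    """The six pairwise distances of a 4-point set, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(pts, 2)]

def approx_congruent(base, candidate, tol=1e-6):
    """True if corresponding pairwise distances differ by at most tol."""
    return all(abs(x - y) <= tol
               for x, y in zip(pairwise_distances(base),
                               pairwise_distances(candidate)))

square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
# The same square rotated 90 degrees about z and translated by (2, 3, 4).
moved = [(2, 3, 4), (2, 4, 4), (1, 4, 4), (1, 3, 4)]
stretched = [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0)]
```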
Tree pattern matching and subset matching in deterministic O(n log³ n)-time
, 1999
Cited by 37 (5 self)
Tree pattern matching and subset matching in deterministic O(n log
Pattern Matching for Spatial Point Sets
 PROC. 39TH ANNU. IEEE SYMPOS. FOUND. COMPUT. SCI
, 1998
Cited by 36 (0 self)
Two sets of points in d-dimensional space are given: a data set D consisting of N points, and a pattern set or probe P consisting of k points. We address the problem of determining whether there is a transformation, among a specified group of transformations of the space, carrying P into or near (meaning at a small directed Hausdorff distance of) D. The groups we consider are translations and rigid motions. Runtimes of approximately O(n log n) and O(n^d log n) respectively are obtained (letting n = max{N, k} and omitting the effects of several secondary parameters). For translations, a runtime of approximately O(n(ak + 1) log² n) is obtained for the case that a constant fraction a < 1 of the points of the probe is allowed to fail to match.
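The matching criterion in this abstract, a small directed Hausdorff distance from the transformed probe to the data set, can be sketched by brute force for translations. This is an illustration of the problem statement only, far slower than the runtimes quoted above. As a simplifying assumption, the candidate translations are limited to those that move the first probe point exactly onto a data point; the function names and the tolerance are hypothetical.

```python
import math

def directed_hausdorff(P, D):
    """Largest distance from a point of P to its nearest point in D."""
    return max(min(math.dist(p, d) for d in D) for p in P)

def translate(P, t):
    """Shift every point of P by the vector t."""
    return [tuple(pi + ti for pi, ti in zip(p, t)) for p in P]

def matching_translations(P, D, tol):
    """Translations anchoring P[0] on a data point that carry P within tol of D."""
    found = []
    for d in D:
        t = tuple(di - pi for di, pi in zip(d, P[0]))
        if directed_hausdorff(translate(P, t), D) <= tol:
            found.append(t)
    return found

D = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
P = [(0, 0), (1, 0), (0, 1)]
```

The distance is directed: every probe point must land near some data point, while data points with no nearby probe point are ignored.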
Comparing Graph Representations of Protein Structure for Mining Family-Specific Residue-Based Packing Motifs
 Journal of Computational Biology
, 2005
Cited by 36 (5 self)
We find recurring amino acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence present computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of those extracted from distance-based graphs. Key words: protein structure motifs, frequent subgraph mining, almost-Delaunay.
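The first of the three edge definitions, a contact distance threshold, can be sketched as follows. This is a minimal illustration, assuming each residue is represented by a single 3D coordinate; the function name, the sample coordinates, and the threshold values are hypothetical, and the Delaunay and almost-Delaunay variants are not shown.

```python
import math
from itertools import combinations

def contact_graph(coords, threshold):
    """Edges (i, j), i < j, between residues whose distance is at most threshold."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= threshold:
            edges.add((i, j))
    return edges

# Three residues placed along a line, purely for illustration.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
```

Raising the threshold adds edges, which is why distance-based graphs tend to be denser than the tessellation-based alternatives the abstract compares them with.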
Dense Point Sets Have Sparse Delaunay Triangulations
Cited by 35 (2 self)
Delaunay triangulations and Voronoi diagrams are among the most thoroughly studied objects in computational geometry, with numerous applications including nearest-neighbor searching, clustering, finite-element mesh generation, deformable surface modeling, and surface reconstruction. Many algorithms in these application domains begin by constructing the Delaunay triangulation or Voronoi diagram of a set of points in R³. Since three-dimensional Delaunay triangulations can have complexity Ω(n²) in the worst case, these algorithms have worst-case running time Ω(n²). However, this behavior is almost never observed in practice except for highly contrived inputs. For all practical purposes, three-dimensional Delaunay triangulations appear to have linear complexity. This frustrating
Approximate Nearest Neighbor Algorithms for Hausdorff Metrics via Embeddings
Cited by 33 (4 self)
Hausdorff metrics are used in geometric settings for measuring the distance between sets of points. They
Mining Protein Family Specific Residue Packing Patterns from Protein Structure Graphs
 In Proc. of Research in Computational Molecular Biology
, 2004
Cited by 31 (12 self)
Finding recurring residue packing patterns, or spatial motifs, that characterize protein structural families is an important problem in bioinformatics. To this end, we apply a novel frequent subgraph mining algorithm to three graph representations of protein three-dimensional (3D) structure. In each protein graph, a vertex represents an amino acid. Vertex residues are connected by edges using three approaches: first, based on a simple distance threshold between contact residues; second, using the Delaunay tessellation from computational geometry; and third, using the recently developed almost-Delaunay tessellation approach. Applying this approach to a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, we typically identify several hundred common subgraphs equivalent to common packing motifs found in the majority of proteins in the family. We also use the counts of motifs extracted from proteins in two different SCOP families as input variables in a binary classification experiment using Support Vector Machines. The resulting models are capable of predicting the protein family association with accuracy exceeding 90 percent. Our results indicate that graphs based on both almost-Delaunay and Delaunay tessellations are sparser than contact distance graphs, yet the former afford similar classification accuracy to the latter. The protein graph mining and classification approaches developed in this paper can be used for rapid and automated annotation of protein structures determined in structural genomics projects.
Rapid: Randomized pharmacophore identification for drug design
 In Proceedings of the International Symposium on Computational Geometry
, 1997
Cited by 27 (8 self)
This paper describes a randomized approach for finding invariants in a set of flexible ligands (drug molecules) that underlies an integrated software system called RAPID currently under development. An invariant is a collection of features embedded in ℝ³ which is present in one or more of the possible low-energy conformations of each ligand. Such invariants of chemically distinct molecules are useful for computational chemists since they may represent candidate pharmacophores. A pharmacophore contains the parts of the ligand that are primarily responsible for its interaction and binding with a specific receptor. It is regarded as an inverse image of a receptor and is used as a template for building more effective pharmaceutical drugs. The identification of pharmacophores is crucial in drug design since the structure of the targeted receptor is frequently unknown, but a number of molecules that interact with the receptor have been discovered by experiments. It is expected that our techniques and the results produced by our system will prove useful in other applications such as molecular database screening and comparative molecular field analysis.