Disk-covering, a fast-converging method for phylogenetic tree reconstruction
 JOURNAL OF COMPUTATIONAL BIOLOGY
, 1999
Cited by 92 (10 self)
The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern-day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic-length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes–Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic-length sequences suffice for complete accuracy with high probability, while polynomial-length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.
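As a hedged illustration of the input that a distance method consumes under the Jukes-Cantor model, the standard JC69 correction converts the observed fraction p of differing sites between two aligned sequences into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below is not taken from the paper; the function name `jukes_cantor_distance` and the toy sequences are assumptions made for illustration only.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Hypothetical helper: JC69-corrected distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    # Observed proportion of sites at which the two sequences differ.
    mismatches = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    p = mismatches / len(seq_a)
    # At p >= 3/4 the logarithm is undefined (saturation), so report infinity.
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Toy usage: one mismatch in four sites gives p = 0.25.
d = jukes_cantor_distance("ACGT", "ACGA")
```

Short sequences make p a noisy estimate, which is one way to see why the sequence length needed for accurate reconstruction is the quantity the abstract focuses on.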
Inferring Evolutionary Trees with Strong Combinatorial Evidence
 THEORETICAL COMPUTER SCIENCE
, 1997
Cited by 73 (13 self)
We consider the problem of inferring the evolutionary tree of a set of n species. We propose a quartet reconstruction method which specifically produces trees whose edges have strong combinatorial evidence. Let Q be a set of resolved quartets defined on the studied species; the method computes the unique maximum subset Q* of Q which is equivalent to a tree and outputs the corresponding tree as an estimate of the species' phylogeny. We use a characterization of the subset Q* due to [6] to provide an $O(n^4)$ incremental algorithm for this variant of the NP-hard quartet consistency problem. Moreover, when choosing the resolution of the quartets by the Four-Point Method (FPM) and considering the Cavender-Farris model of evolution, we show that the convergence rate of the Q* method is at worst polynomial when the maximum evolutive distance between two species is bounded. We complete these theoretical results by an experimental study on real and simulated data sets. The results ...
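The quartet-resolution step mentioned in the abstract can be sketched as follows. This is a minimal illustration of the usual four-point rule (pick the pairing of four taxa whose within-pair distance sum is smallest), not the authors' implementation; the function name `four_point_split`, the tie-handling choice of returning `None`, and the toy distance table are assumptions.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Optional, Tuple

Pair = Tuple[Hashable, Hashable]


def four_point_split(
    a: Hashable, b: Hashable, c: Hashable, d: Hashable,
    dist: Callable[[Hashable, Hashable], float],
) -> Optional[Tuple[Pair, Pair]]:
    """Return the split xy|zw minimizing dist(x, y) + dist(z, w), or None on a tie."""
    candidates = [
        (dist(a, b) + dist(c, d), ((a, b), (c, d))),
        (dist(a, c) + dist(b, d), ((a, c), (b, d))),
        (dist(a, d) + dist(b, c), ((a, d), (b, c))),
    ]
    candidates.sort(key=lambda item: item[0])
    # If the two smallest sums are equal, the quartet is left unresolved.
    if candidates[0][0] == candidates[1][0]:
        return None
    return candidates[0][1]


# Toy additive distances for the tree ab|cd: pendant edges 1, internal edge 2.
toy: Dict[FrozenSet[str], float] = {
    frozenset("ab"): 2, frozenset("cd"): 2,
    frozenset("ac"): 4, frozenset("ad"): 4,
    frozenset("bc"): 4, frozenset("bd"): 4,
}
split = four_point_split("a", "b", "c", "d", lambda x, y: toy[frozenset((x, y))])
```

With estimated rather than exact distances the smallest sum can point at the wrong pairing, which is why the set Q of resolved quartets may be inconsistent with any single tree.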
Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
, 1999
Cited by 65 (1 self)
We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log² n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Gröbner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.
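To make the connection between closest pairs and hierarchical clustering concrete, here is a deliberately naive sketch: agglomerative clustering as a loop of "find the closest pair, delete both, insert the merged cluster". The brute-force scan stands in for the dynamic closest-pair structure and does not implement the paper's data structures; the names `agglomerate` and `single_linkage`, and the choice of one-dimensional points with single linkage, are assumptions for illustration.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Cluster = Tuple[float, ...]


def single_linkage(x: Cluster, y: Cluster) -> float:
    """Assumed cluster distance: smallest gap between any two member points."""
    return min(abs(p - q) for p in x for q in y)


def agglomerate(
    points: Sequence[float],
    dist: Callable[[Cluster, Cluster], float] = single_linkage,
) -> List[Tuple[Cluster, Cluster]]:
    """Return the sequence of merges performed, closest pair first."""
    clusters: List[Cluster] = [(p,) for p in points]
    merges: List[Tuple[Cluster, Cluster]] = []
    while len(clusters) > 1:
        # Brute-force closest pair: this scan is the step a dynamic
        # closest-pair data structure would replace.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        x, y = clusters[i], clusters[j]
        merges.append((x, y))
        # Two deletions (j > i, so remove j first) followed by one insertion.
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(x + y)
    return merges


merges = agglomerate([0.0, 1.0, 5.0, 5.5])
```

Each merge costs a quadratic number of distance evaluations here, so the whole loop is cubic in that count; per-update bounds like those quoted in the abstract are what bring this cost down.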
Object Recognition as Many-to-Many Feature Matching
, 2006
Cited by 48 (4 self)
Object recognition can be formulated as matching image features to model features. When recognition is exemplar-based, feature correspondence is one-to-one. However, segmentation errors, articulation, scale difference, and within-class deformation can yield image and model features which don’t match one-to-one but rather many-to-many. Adopting a graph-based representation of a set of features, we present a matching algorithm that establishes many-to-many correspondences between the nodes of two noisy, vertex-labeled weighted graphs. Our approach reduces the problem of many-to-many matching of weighted graphs to that of many-to-many matching of weighted point sets in a normed vector space. This is accomplished by embedding the initial weighted graphs into a normed vector space with low distortion using a novel embedding technique based on a spherical encoding of graph structure. Many-to-many vector correspondences established by the Earth Mover’s Distance framework are mapped back into many-to-many correspondences between graph nodes. Empirical evaluation of the algorithm on an extensive set of recognition trials, including a comparison with two competing graph matching approaches, demonstrates both the robustness and efficacy of the overall approach.
A short proof that phylogenetic tree reconstruction by maximum likelihood is hard
 IEEE Trans Comput Biol and Bioinformatics
Cited by 48 (7 self)
Maximum likelihood is one of the most widely used techniques to infer evolutionary histories. Although it is thought to be intractable, a proof of its hardness has been lacking. Here, we give a short proof that computing the maximum likelihood tree is NP-hard by exploiting a connection between likelihood and parsimony observed by Tuffley and Steel.
Constructing a Tree from Homeomorphic Subtrees, with Applications to Computational Evolutionary Biology
Cited by 47 (2 self)
We are given a set $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ of rooted binary trees, each $T_i$ leaf-labeled by a subset $L(T_i) \subset \{1, 2, \ldots, n\}$. If $T$ is a tree on $\{1, 2, \ldots, n\}$, we let $T|L$ denote the minimal subtree of $T$ induced by the nodes of $L$ and all their ancestors. The consensus tree problem asks whether there exists a tree $T$ such that for every $i$, $T|L(T_i)$ is homeomorphic to $T_i$. We present algorithms which test if a given set of trees has a consensus tree and if so, construct one. The deterministic algorithm takes time $\min\{O(N n^{1/2}),\ O(N + n^2 \log n)\}$, where $N = \sum_i |T_i|$, and uses linear space. The randomized algorithm takes time $O(N \log^3 n)$ and uses linear space. The previous best for this problem was a 1981 $O(Nn)$ algorithm by Aho et al. Our faster deterministic algorithm uses a new efficient algorithm for the following interesting dynamic graph problem: Given a graph $G$ with $n$ nodes and $m$ edges and a sequence of $b$ batches of one or more edge deletions, then a...
Efficient Algorithms for Inverting Evolution
 Proceedings of the ACM Symposium on the Foundations of Computer Science
, 1999
Cited by 44 (4 self)
Evolution can be mathematically modelled by a stochastic process that operates on the DNA of species. Such models are based on the established theory that the DNA sequences, or genomes, of all extant species have been derived from the genome of the common ancestor of all species by a process of random mutation and natural selection. A stochastic model...
Constructing Big Trees from Short Sequences
 PROCEEDINGS OF THE 24TH INTERNATIONAL COLLOQUIUM ON AUTOMATA, LANGUAGES, AND PROGRAMMING
, 1997
Cited by 35 (6 self)
The construction of evolutionary trees is a fundamental problem in biology, and yet methods for reconstructing evolutionary trees are not reliable when it comes to inferring accurate topologies of large divergent evolutionary trees from realistic-length sequences. We address this problem and present a new polynomial-time algorithm for reconstructing evolutionary trees called the Short Quartets Method, which is consistent and which has greater statistical power than other polynomial-time methods, such as Neighbor-Joining and the 3-approximation algorithm by Agarwala et al. (and the "Double Pivot" variant of the Agarwala et al. algorithm by Cohen and Farach) for the L∞-nearest tree problem. Our study indicates that our method will produce the correct topology from shorter sequences than can be guaranteed using these other methods.
Many-to-Many Graph Matching via Metric Embedding
 In CVPR, pages I-850 to I-857, vol. 1
, 2003
Cited by 31 (4 self)
Graph matching is an important component in many object recognition algorithms. Although most graph matching algorithms seek a one-to-one correspondence between nodes, it is often the case that a more meaningful correspondence exists between a cluster of nodes in one graph and a cluster of nodes in the other. We present a matching algorithm that establishes many-to-many correspondences between nodes of noisy, vertex-labeled weighted graphs. The algorithm is based on recent developments in efficient low-distortion metric embedding of graphs into normed vector spaces. By embedding weighted graphs into normed vector spaces, we reduce the problem of many-to-many graph matching to that of computing a distribution-based distance measure between graph embeddings. We use a specific measure, the Earth Mover's Distance, to compute distances between sets of weighted vectors. Empirical evaluation of the algorithm on an extensive set of recognition trials demonstrates both the robustness and efficiency of the overall approach.