Results 1-10 of 45
Learning Latent Tree Graphical Models
 J. of Machine Learning Research
, 2011
Cited by 46 (10 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare ...
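The information distances mentioned in this abstract can be illustrated with a small sketch. The abstract does not state the formula; the code below assumes the common definition for jointly Gaussian variables, d(i, j) = -log|rho_ij|, where rho_ij is the correlation coefficient. The function names and the three-variable chain are hypothetical, chosen only to show that such distances add up along a path in a tree, which is the property that grouping procedures of this kind rely on.

```python
import math


def information_distance(rho):
    """Assumed Gaussian information distance: -log|rho| for correlation rho."""
    if rho == 0:
        return math.inf  # uncorrelated variables are treated as infinitely far apart
    return -math.log(abs(rho))


def distance_matrix(corr):
    """Turn a correlation matrix (list of lists) into a distance matrix."""
    n = len(corr)
    return [
        [0.0 if i == j else information_distance(corr[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical Markov chain x1 - x2 - x3: the end-to-end correlation is the
# product of the correlations along the path.
r12, r23 = 0.8, 0.5
corr = [
    [1.0, r12, r12 * r23],
    [r12, 1.0, r23],
    [r12 * r23, r23, 1.0],
]
d = distance_matrix(corr)

# Additivity along the path: d(x1, x3) equals d(x1, x2) + d(x2, x3).
additive = math.isclose(d[0][2], d[0][1] + d[1][2])
```

Because products of correlations become sums of distances, tests on sums and differences of pairwise distances can reveal which nodes are siblings or parent and child; the sketch checks only the additivity itself.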
Reconstructing Phylogenies from Gene-Content and Gene-Order Data
 MATHEMATICS OF EVOLUTION AND PHYLOGENY, OLIVIER GASCUEL (ED.)
Scaling up accurate phylogenetic reconstruction from gene-order data
, 2002
Cited by 36 (14 self)
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact:
Stougie L: Constructing level-2 phylogenetic networks from triplets. 2007. arXiv:0707.2890v1 [q-bio.PE]
Cited by 33 (9 self)
Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so, to construct such a network [24]. Here, we extend this work by showing that this problem is even polynomial-time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily non-tree-like. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data. Index Terms: phylogenetic networks, level-2, triplets, reticulations, polynomial time algorithms.
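To make the notion of a triplet being consistent with a phylogeny concrete, here is a minimal sketch for the simplest case only: a rooted tree, not the level-1 or level-2 networks the abstract is about. It assumes the usual reading of a rooted triplet ab|c, namely that a and b share a more recent common ancestor with each other than either does with c. The parent-map representation and all function names are hypothetical.

```python
def ancestors(parent, node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, a, b):
    """Lowest common ancestor of a and b in a rooted tree given as a parent map."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def displays_triplet(parent, a, b, c):
    """True if the tree is consistent with the triplet ab|c."""
    ab = lca(parent, a, b)
    ac = lca(parent, a, c)
    # ab|c holds when lca(a, b) lies strictly below lca(a, c).
    return ab != ac and ac in ancestors(parent, ab)


# Hypothetical tree ((a, b), c): u is the parent of a and b, r is the root.
tree = {"a": "u", "b": "u", "u": "r", "c": "r"}
consistent = displays_triplet(tree, "a", "b", "c")
inconsistent = displays_triplet(tree, "a", "c", "b")
```

In a network with reticulations, a triplet can be displayed along one of several paths, so this tree-only check would not be sufficient there.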
Performance of supertree methods on various dataset decompositions
 PHYLOGENETIC SUPERTREES: COMBINING INFORMATION TO REVEAL THE TREE OF LIFE, VOLUME 3 OF COMPUTATIONAL BIOLOGY
, 2004
Cited by 22 (10 self)
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems (such as Maximum Parsimony (MP) and Maximum Likelihood (ML)), but they are severely limited by the number of taxa that they can handle in a reasonable time frame. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the dataset into smaller subsets, solve the subsets (i.e., use MP or ML on each subset to obtain trees), then combine the solutions to the subsets into a solution to the original dataset. This last step, combining given trees into a single tree, is known as supertree construction in computational phylogenetics. The traditional application of supertree methods is to combine existing, published phylogenies into a single phylogeny. Here, we study supertree construction in the context of divide-and-conquer methods for large-scale tree reconstruction. We study several divide-and-conquer approaches and experimentally demonstrate their advantage over Matrix Representation Parsimony (MRP), a traditional supertree technique, and over global heuristics such as the parsimony ratchet. On the ten large biological datasets under investigation, our study shows that the techniques used for dividing the dataset into subproblems as well as those used for merging them into a single solution strongly influence the quality of the supertree construction. In most cases, our merging technique, the Strict Consensus Merger (SCM), outperforms MRP with respect to MP scores and running time. Divide-and-conquer techniques are also a highly competitive alternative to global heuristics such as the parsimony ratchet, especially on the more challenging datasets.
An Investigation of Phylogenetic Likelihood Methods
, 2003
Cited by 16 (0 self)
We analyze the performance of likelihood-based approaches used to reconstruct phylogenetic trees. Unlike other techniques such as Neighbor-Joining (NJ) and Maximum Parsimony (MP), relatively little is known regarding the behavior of algorithms founded on the principle of likelihood.
Catalog of small trees
, 2005
Cited by 14 (6 self)
This chapter is concerned with the description of the Small Trees website which can be found at the following web address:
Maximum likelihood on four taxa phylogenetic trees: analytic solutions
 Proceedings of the Seventh Annual Conference on Research in Computational Molecular Biology – RECOMB 2003
, 2003
Cited by 13 (7 self)
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees (Felsenstein, 1981), but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only for the simplest model: three taxa, two-state characters, under a molecular clock (MC). Quoting Ziheng Yang (2000), who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation". In this work, we give analytic solutions for four taxa, two-state characters under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, and the resulting system of polynomial equations. In full generality, it is infeasible to solve this system, therefore specialized tools for the MC case are developed. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of MC trees with the Hadamard conjugation (Hendy and Penny, 1993) to derive a number of topology-dependent identities. Employing these iden... Research supported by ISF grant 418/00.
Why neighbor-joining works
, 2006
Cited by 12 (0 self)
Abstract. We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson’s optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson’s criterion is not satisfied. We also provide a proof for Atteson’s conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic-time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining.
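For readers unfamiliar with neighbor-joining, the sketch below shows only its pair-selection step, using the standard Q-criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). It is not the quartet analysis from the paper, and it omits the joining and distance-update steps of the full algorithm. The function name and the four-taxon distance matrix are hypothetical; the matrix is built from an assumed tree ((A, B), (C, D)) with leaf edge lengths 1, 4, 1, 4 and an internal edge of length 1.

```python
def nj_pick_pair(dist):
    """Return the pair (i, j), i < j, that minimises the neighbor-joining Q-criterion."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:  # strict comparison keeps the first pair on ties
                best_pair, best_q = (i, j), q
    return best_pair


# Path lengths in the assumed tree; taxa are indexed A=0, B=1, C=2, D=3.
dist = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]

picked = nj_pick_pair(dist)

# For comparison: the pair with the smallest raw distance.
closest = min(
    ((i, j) for i in range(4) for j in range(i + 1, 4)),
    key=lambda p: dist[p[0]][p[1]],
)
```

Here the closest pair by raw distance is A and C, which are not siblings in the assumed tree, while the Q-criterion selects A and B, which are. Correcting for long branches in this way is what separates neighbor-joining from simply joining the nearest pair.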