Results 1-10 of 24
Scaling up accurate phylogenetic reconstruction from gene-order data
, 2002
Cited by 30 (14 self)
Motivation: Phylogenetic reconstruction from gene-order data has attracted increasing attention from both biologists and computer scientists over the last few years. Methods used in reconstruction include distance-based methods (such as neighbor-joining), parsimony methods using sequence-based encodings, Bayesian approaches, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach, but cannot handle more than about 15 genomes of limited size (e.g., organelles). Results: We report here on our successful efforts to scale up direct optimization through a two-step approach: the first step decomposes the dataset into smaller pieces and runs the direct optimization (GRAPPA) on the smaller pieces, while the second step builds a tree from the results obtained on the smaller pieces. We used the sophisticated disk-covering method (DCM) pioneered by Warnow and her group, suitably modified to take into account the computational limitations of GRAPPA. We find that DCM-GRAPPA scales gracefully to at least 1,000 genomes of a few hundred genes each and retains surprisingly high accuracy throughout the range: in our experiments, the topological error rate rarely exceeded a few percent. Thus, reconstruction based on gene-order data can now be accomplished with high accuracy on datasets of significant size. Availability: All of our software is available in source form under GPL at www.compbio.unm.edu Contact:
Learning Latent Tree Graphical Models
 J. of Machine Learning Research
, 2011
Cited by 19 (6 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare
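As a rough illustration of the information distances mentioned in this abstract, the sketch below assumes the Gaussian case, where the distance between two variables is taken to be the negative log of the absolute correlation coefficient. The abstract does not spell out this formula, so the definition, the function name `information_distance`, and the three-variable chain are all assumptions made for illustration. What it shows is additivity: along a path in a tree-structured Gaussian model the correlations multiply, so these distances add, which is the property distance-based grouping procedures rely on.

```python
import math


def information_distance(rho):
    """Assumed Gaussian information distance: -log|rho| for a correlation rho."""
    if not 0 < abs(rho) <= 1:
        raise ValueError("correlation must be nonzero and at most 1 in magnitude")
    return -math.log(abs(rho))


# Hypothetical chain X1 - X2 - X3: the end-to-end correlation is the
# product of the correlations on the two edges.
rho_12, rho_23 = 0.8, 0.5
rho_13 = rho_12 * rho_23

d_12 = information_distance(rho_12)
d_23 = information_distance(rho_23)
d_13 = information_distance(rho_13)  # equals d_12 + d_23 up to rounding
```

Because the distances are additive on the tree, any method that reconstructs trees from additive distances can in principle be applied to them.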
Performance of supertree methods on various dataset decompositions
 Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, volume 3 of Computational Biology
, 2004
Cited by 17 (9 self)
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems (such as Maximum Parsimony (MP) and Maximum Likelihood (ML)), but they are severely limited by the number of taxa that they can handle in a reasonable time frame. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the dataset into smaller subsets, solve the subsets (i.e., use MP or ML on each subset to obtain trees), then combine the solutions to the subsets into a solution to the original dataset. This last step, combining given trees into a single tree, is known as supertree construction in computational phylogenetics. The traditional application of supertree methods is to combine existing, published phylogenies into a single phylogeny. Here, we study supertree construction in the context of divide-and-conquer methods for large-scale tree reconstruction. We study several divide-and-conquer approaches and experimentally demonstrate their advantage over Matrix Representation Parsimony (MRP), a traditional supertree technique, and over global heuristics such as the parsimony ratchet. On the ten large biological datasets under investigation, our study shows that the techniques used for dividing the dataset into subproblems as well as those used for merging them into a single solution strongly influence the quality of the supertree construction. In most cases, our merging technique, the Strict Consensus Merger (SCM), outperforms MRP with respect to MP scores and running time. Divide-and-conquer techniques are also a highly competitive alternative to global heuristics such as the parsimony ratchet, especially on the more challenging datasets.
Stougie L: Constructing level-2 phylogenetic networks from triplets. 2007. arXiv:0707.2890v1 [q-bio.PE]
Cited by 17 (4 self)
Jansson and Sung showed that, given a dense set of input triplets T (representing hypotheses about the local evolutionary relationships of triplets of taxa), it is possible to determine in polynomial time whether there exists a level-1 network consistent with T, and if so, to construct such a network [24]. Here, we extend this work by showing that this problem is even polynomial-time solvable for the construction of level-2 networks. This shows that, assuming density, it is tractable to construct plausible evolutionary histories from input triplets even when such histories are heavily non-treelike. This further strengthens the case for the use of triplet-based methods in the construction of phylogenetic networks. We also implemented the algorithm and applied it to yeast data. Index Terms: phylogenetic networks, level-2, triplets, reticulations, polynomial-time algorithms.
An Investigation of Phylogenetic Likelihood Methods
, 2003
Cited by 13 (0 self)
We analyze the performance of likelihood-based approaches used to reconstruct phylogenetic trees. Unlike other techniques such as Neighbor-Joining (NJ) and Maximum Parsimony (MP), relatively little is known regarding the behavior of algorithms founded on the principle of likelihood.
Catalog of small trees
, 2005
Cited by 9 (2 self)
This chapter is concerned with the description of the Small Trees website which can be found at the following web address:
Why neighbor-joining works
, 2006
Cited by 8 (0 self)
We show that the neighbor-joining algorithm is a robust quartet method for constructing trees from distances. This leads to a new performance guarantee that contains Atteson’s optimal radius bound as a special case and explains many cases where neighbor-joining is successful even when Atteson’s criterion is not satisfied. We also provide a proof for Atteson’s conjecture on the optimal edge radius of the neighbor-joining algorithm. The strong performance guarantees we provide also hold for the quadratic-time fast neighbor-joining algorithm, thus providing a theoretical basis for inferring very large phylogenies with neighbor-joining.
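For readers unfamiliar with the algorithm this abstract analyzes, here is a minimal sketch of the textbook neighbor-joining selection rule (the Q-criterion), written as a plain cubic-time loop. It is not the paper's analysis and not the fast neighbor-joining variant; the function name `neighbor_joining`, the `frozenset`-keyed distance dictionary, and the nested-tuple output are assumptions chosen for brevity, and branch lengths are omitted.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return an unrooted topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    nodes = list(names)
    d = dict(dist)  # work on a copy so the caller's matrix is untouched
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums used by the Q-criterion.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (a, b)
        d_ab = d[frozenset((a, b))]
        # Distance from the new internal node to every remaining node.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d_ab
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)
```

On distances that are exactly additive on a tree, this rule recovers that tree; the guarantees discussed in the abstract concern how much error in the distances it tolerates.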
Absolute Convergence: True Trees From Short Sequences
, 2001
Cited by 7 (2 self)
Fast-converging methods for reconstructing phylogenetic trees require that the sequences characterizing the taxa be of only polynomial length, a major asset in practice, since real-life sequences are of bounded length. However, of the half-dozen such methods proposed over the last few years, only two fulfill this condition without requiring knowledge of typically unknown parameters, such as the evolutionary rate(s) used in the model; this additional requirement severely limits the applicability of the methods. We say that methods that need such knowledge demonstrate relative fast convergence, since they rely upon an oracle. We focus on the class of methods that do not require such knowledge and thus demonstrate absolute fast convergence. We give a very general construction scheme that not only turns any relative fast-converging method into an absolute fast-converging one, but also turns any statistically consistent method that converges from sequences of length $O(e^{O(\mathrm{diam}(T))})$ into an absolute fast-converging method.
Reconstructing optimal phylogenetic trees: a challenge in experimental algorithmics
 Experimental Algorithmics, volume 2547 of Lecture Notes in Computer Science
, 2002
Quartet methods for phylogeny reconstruction from gene orders
 Dept. CS and Engin., Univ. South Carolina
, 2005
Cited by 6 (1 self)
Phylogenetic reconstruction from gene-rearrangement data has attracted increasing attention from biologists and computer scientists. Methods used in reconstruction include distance-based methods, parsimony methods using sequence-based encodings, and direct optimization. The latter, pioneered by Sankoff and extended by us with the software suite GRAPPA, is the most accurate approach; however, its exhaustive approach means that it can be applied only to small datasets of fewer than 15 taxa. While we have successfully scaled it up to 1,000 genomes by integrating it with a disk-covering method (DCM-GRAPPA), the recursive decomposition may need many levels of recursion to handle datasets with 1,000 or more genomes. We thus investigated quartet-based approaches, which directly decompose the datasets into subsets of four taxa each; such approaches have been well studied for sequence data, but not for gene-rearrangement data. We give an optimization algorithm for the NP-hard problem of computing optimal trees for each quartet, present a variation of the dyadic method (using heuristics to choose suitable short quartets), and use both in simulation studies. We find that our quartet-based method can handle more genomes than the base version of GRAPPA, thus enabling us to reduce the number of levels of recursion in DCM-GRAPPA, but is more sensitive to the rate of evolution, with error rates rapidly increasing when saturation is approached.
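To make the idea of decomposing a dataset into four-taxon subsets concrete, the sketch below enumerates every quartet and picks one of its three possible splits. The paper's own quartet step is an optimization over gene orders and is not reproduced here; instead the sketch swaps in the simple four-point method from distance-based quartet work, which chooses the pairing with the smallest sum of within-pair distances. The function names and the `frozenset`-keyed distance dictionary are assumptions for illustration.

```python
from itertools import combinations


def quartet_split(quartet, dist):
    """Choose the split ab|cd of a quartet by the four-point method.

    dist maps frozenset({x, y}) to a pairwise distance (any estimate).
    """
    a, b, c, d = quartet
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    # The pairing with the smallest within-pair distance sum is taken
    # as the quartet topology.
    return min(
        pairings,
        key=lambda p: dist[frozenset(p[0])] + dist[frozenset(p[1])],
    )


def quartet_splits(taxa, dist):
    """Map every four-taxon subset of taxa to its inferred split."""
    return {q: quartet_split(q, dist) for q in combinations(taxa, 4)}
```

The number of quartets grows as the fourth power of the number of taxa, which is one reason quartet methods typically restrict attention to a chosen subset such as short quartets.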