Results 1-10 of 15
Learning Latent Tree Graphical Models
 J. of Machine Learning Research
, 2011
Cited by 44 (12 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare
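The preprocessing step described in this abstract (build a tree over the observed variables from pairwise distances) can be sketched roughly as follows. The abstract does not define the information distance or say how the initial tree is built, so two things here are assumptions made for illustration: the Gaussian-style distance d_ij = -log|rho_ij| computed from sample correlations, and the use of a minimum spanning tree as the tree over observed variables. The function names are hypothetical and this is not the authors' implementation.

```python
import numpy as np

def information_distances(samples):
    """Pairwise distances d_ij = -log|rho_ij| (assumed Gaussian-style form).

    samples: array of shape (n_samples, n_vars).
    """
    rho = np.corrcoef(samples, rowvar=False)
    # Clip to avoid log(0) for empirically uncorrelated pairs and to
    # absorb rounding that pushes |rho| slightly above 1.
    return -np.log(np.clip(np.abs(rho), 1e-12, 1.0))

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense symmetric distance matrix.

    Returns a sorted list of edges (i, j) with i < j.
    """
    n = dist.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < n:
        best = None
        for i in in_tree:
            for j in range(n):
                if j not in in_tree and (best is None or dist[i, j] < best[0]):
                    best = (dist[i, j], i, j)
        _, i, j = best
        edges.append((min(i, j), max(i, j)))
        in_tree.append(j)
    return sorted(edges)
```

Calling `minimum_spanning_tree(information_distances(x))` on an array `x` of samples gives a tree whose neighborhoods could then be handed to a local grouping procedure; that later step is not shown here.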
T.: Spectral methods for learning multivariate latent tree structure
 In: Advances in Neural Information Processing Systems 24
, 2011
Cited by 18 (4 self)
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
Fast and reliable reconstruction of phylogenetic trees with very short edges
 In SODA: ACM-SIAM Symposium on Discrete Algorithms
, 2008
Cited by 18 (4 self)
Phylogenetic reconstruction is the problem of reconstructing an evolutionary tree from sequences corresponding to leaves of that tree. A central goal in phylogenetic reconstruction is to be able to reconstruct the tree as accurately as possible from as short as possible input sequences. The sequence length required for correct topological reconstruction depends on certain properties of the tree, such as its depth and minimal edge weight. Fast converging reconstruction algorithms are considered state-of-the-art in this sense, as they require asymptotically minimal sequence length in order to guarantee (with high probability) correct topological reconstruction of the entire tree. However, when the original phylogenetic tree contains very short edges, this minimal sequence length is still too long for practical purposes. Short
Phylogenies without branch bounds: Contracting the short, pruning the deep
, 2009
Cited by 17 (6 self)
We introduce a new phylogenetic reconstruction algorithm which, unlike most previous rigorous inference techniques, does not rely on assumptions regarding the branch lengths or the depth of the tree. The algorithm returns a forest which is guaranteed to contain all edges that are: 1) sufficiently long and 2) sufficiently close to the leaves. How much of the true tree is recovered depends on the sequence length provided. The algorithm is distance-based and runs in polynomial time.
Alignment-Free Phylogenetic Reconstruction
, 2009
Cited by 15 (1 self)
We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution allowing insertions and deletions, or indels. Given appropriate assumptions, our algorithm requires sequence lengths growing polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.
Network delay inference from additive metrics. Preprint, available at arXiv: math.PR/0604367
, 2006
Cited by 12 (1 self)
We use computational phylogenetic techniques to solve a central problem in inferential network monitoring. More precisely, we design a novel algorithm for multicast-based delay inference, that is, the problem of reconstructing delay characteristics of a network from end-to-end delay measurements on network paths. Our inference algorithm is based on additive metric techniques used in phylogenetics. It runs in polynomial time and requires a sample of size only poly(log n). We also show how to recover the topology of the routing tree.
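To make "additive metric techniques" a little more concrete, here is a minimal sketch of the classical four-point test on a tree metric: among the three ways of pairing four leaves, the pairing with the smallest sum of within-pair distances matches the split induced by the tree. This is a standard property of additive (tree) metrics and is offered only as background; the abstract does not say that the paper's algorithm uses this exact test, and the function name `quartet_split` is hypothetical.

```python
def quartet_split(d, a, b, c, e):
    """Return the pairing of leaves {a, b, c, e} favored by the four-point test.

    d is a symmetric distance matrix (nested lists or a 2-D array).
    The result is a pair of pairs, e.g. ((a, b), (c, e)).
    """
    pairings = [((a, b), (c, e)), ((a, c), (b, e)), ((a, e), (b, c))]
    # On an exactly additive metric, the true split has the smallest sum
    # of within-pair distances; the other two sums are equal and larger.
    return min(pairings, key=lambda p: d[p[0][0]][p[0][1]] + d[p[1][0]][p[1][1]])
```

With noisy distance estimates the minimum can be wrong when the internal edge is short, which is one reason sample-size requirements matter in this line of work.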
Phase Transition in Distance-Based Phylogeny Reconstruction
, 2013
Cited by 3 (2 self)
We introduce a new distance-based phylogeny reconstruction technique which provably achieves, at sufficiently short branch lengths, a logarithmic sequence-length requirement, improving significantly over previous polynomial bounds for distance-based methods and matching existing results for general methods. The technique is based on an averaging procedure that implicitly reconstructs ancestral sequences. By the same token, we extend previous results on phase transitions in phylogeny reconstruction to general time-reversible models. More precisely, we show that in the so-called Kesten-Stigum zone (roughly, a region of the parameter space where ancestral sequences are well approximated by "linear combinations" of the observed sequences) sequences of length O(log n) suffice for reconstruction when branch lengths are discretized. Here n is the number of extant species. Our results challenge, to some extent, the conventional wisdom that estimates of evolutionary distances alone carry significantly less information about phylogenies than full sequence datasets.
DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics, 28: i274–i282
, 2012
Cited by 3 (1 self)
Motivation: While phylogenetic analyses of datasets containing 1000–5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale. Methods: We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé and maximum likelihood trees on estimated alignments using simulated and real datasets with 1000–27 643 taxa. Results: Our studies show that on average DACTAL yields more accurate trees than the two-phase methods we studied on very large datasets that are difficult to align, and has approximately the same accuracy on the easier datasets. The comparison to SATé shows that both have the same accuracy, but that DACTAL achieves this accuracy in a fraction of the time. Furthermore, DACTAL can analyze larger datasets than SATé, including a dataset with almost 28 000 sequences. Availability: DACTAL source code and results of dataset analyses are available at www.cs.utexas.edu/users/phylo/software/dactal. Contact:
DACTAL: . . . alignments
, 2012
Motivation: While phylogenetic analyses of datasets containing 1000–5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale. Methods: We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé (Liu et al., Science