Results 1-10 of 121
NeighborNet: An agglomerative method for the construction of planar phylogenetic networks
Cited by 315 (9 self)
We introduce NeighborNet, a network construction and data representation method that combines aspects of neighbor joining (NJ) and SplitsTree. Like NJ, NeighborNet uses agglomeration: taxa are combined into progressively larger and larger overlapping clusters. Like SplitsTree, NeighborNet constructs networks rather than trees, and so can be used to represent multiple phylogenetic hypotheses simultaneously, or to detect complex evolutionary processes like recombination, lateral transfer and hybridization. NeighborNet tends to produce networks that are substantially more resolved than those made with SplitsTree. The method is efficient ($O(n^3)$ time) and is well suited for the preliminary analyses of complex phylogenetic data. We report results of three case studies: one based on mitochondrial gene order data from early branching eukaryotes, another based on nuclear sequence data from New Zealand alpine buttercups (Ranunculi), and a third on poorly corrected synthetic data.
Reconstruction of Markov random fields from samples: Some easy observations and algorithms
, 2008
Cited by 53 (4 self)
Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on $n$ nodes and maximum degree $d$ given observations. We show that under mild nondegeneracy conditions it reconstructs the generating graph with high probability using $\Theta(d \log n)$ samples, which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit $O(n^{d+2} \log n)$ running time bound. In cases where the measure on the graph has correlation decay, the running time is $O(n^2 \log n)$ for all fixed $d$. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies nonidentifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.
Constructing a Tree from Homeomorphic Subtrees, with Applications to Computational Evolutionary Biology
Cited by 48 (3 self)
We are given a set $\mathcal{T} = \{T_1, T_2, \ldots, T_k\}$ of rooted binary trees, each $T_i$ leaf-labeled by a subset $L(T_i) \subseteq \{1, 2, \ldots, n\}$. If $T$ is a tree on $\{1, 2, \ldots, n\}$, we let $T|L$ denote the minimal subtree of $T$ induced by the nodes of $L$ and all their ancestors. The consensus tree problem asks whether there exists a tree $T$ such that for every $i$, $T|L(T_i)$ is homeomorphic to $T_i$. We present algorithms which test if a given set of trees has a consensus tree and if so, construct one. The deterministic algorithm takes time $\min\{O(N n^{1/2}), O(N + n^2 \log n)\}$, where $N = \sum_i |T_i|$, and uses linear space. The randomized algorithm takes time $O(N \log^3 n)$ and uses linear space. The previous best for this problem was a 1981 $O(Nn)$ algorithm by Aho et al. Our faster deterministic algorithm uses a new efficient algorithm for the following interesting dynamic graph problem: Given a graph $G$ with $n$ nodes and $m$ edges and a sequence of $b$ batches of one or more edge deletions, then a...
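The consensus condition in this abstract can be illustrated with a small checker. The sketch below is not one of the paper's algorithms: it only verifies whether a given candidate tree satisfies the condition for one input tree. It assumes trees are written as nested tuples with hashable leaf labels, and the helper names (`restrict`, `canonical`, `displays`) are hypothetical.

```python
def leaves(tree):
    """Collect the leaf labels of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def restrict(tree, labels):
    """Restrict `tree` to `labels`, suppressing nodes left with one child.

    Suppressing unary nodes is what makes the comparison a test of
    homeomorphism rather than of exact subtree equality.
    """
    if not isinstance(tree, tuple):
        return tree if tree in labels else None
    kept = [r for r in (restrict(c, labels) for c in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)

def canonical(tree):
    """Order-independent form, so that (1, 2) and (2, 1) compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset(canonical(c) for c in tree)

def displays(candidate, input_tree):
    """True if `candidate` restricted to the leaves of `input_tree` matches it."""
    restricted = restrict(candidate, leaves(input_tree))
    return canonical(restricted) == canonical(input_tree)

# Candidate tree on {1, 2, 3, 4} and two input trees on {1, 2, 3}.
T = ((1, 2), (3, 4))
T1 = ((1, 2), 3)   # agrees with T
T2 = ((1, 3), 2)   # conflicts with T
```

A candidate is a consensus tree for a set of inputs exactly when `displays` holds for every input tree; here it holds for `T1` and fails for `T2`.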
Learning Latent Tree Graphical Models
 J. of Machine Learning Research
, 2011
Cited by 46 (10 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare
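The abstract does not define information distances. As a rough illustration only, the sketch below assumes the Gaussian form d = -log|rho| for a correlation coefficient rho, which makes distances add up along tree paths, and uses a constant-difference check as a stand-in for the sibling-group idea. The function names and the toy distance matrix are hypothetical and are not taken from the paper.

```python
import math

def information_distance(rho: float) -> float:
    """Assumed Gaussian form: d = -log|rho| for a correlation coefficient rho."""
    if rho == 0.0 or abs(rho) > 1.0:
        raise ValueError("rho must be nonzero with |rho| <= 1")
    return -math.log(abs(rho))

def constant_difference(d, i, j, tol=1e-9):
    """True if d[i][k] - d[j][k] takes the same value for every other node k."""
    diffs = [d[i][k] - d[j][k] for k in range(len(d)) if k not in (i, j)]
    return max(diffs) - min(diffs) <= tol

# Additivity on a Gaussian chain X1 - X2 - X3, where rho_13 = rho_12 * rho_23.
rho_12, rho_23 = 0.8, 0.5
d_13 = information_distance(rho_12 * rho_23)
d_sum = information_distance(rho_12) + information_distance(rho_23)

# Toy additive distances for a tree in which observed nodes 0 and 1 share a
# hidden parent, as do nodes 2 and 3 (edge lengths chosen by hand).
D = [
    [0.0, 3.0, 5.0, 5.0],
    [3.0, 0.0, 6.0, 6.0],
    [5.0, 6.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]
```

In this toy matrix the difference is constant for the pairs (0, 1) and (2, 3) but not for (0, 2), which matches the grouping the tree was built with.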
Performance study of phylogenetic methods: (unweighted) quartet methods and neighbor-joining
, 2003
Learning Nonsingular Phylogenies and Hidden Markov Models
 Proceedings of the thirty-seventh annual ACM Symposium on Theory of Computing, Baltimore (STOC '05)
, 2005
Cited by 42 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. arXiv:0710.0262v2
, 2007
Cited by 37 (0 self)
We introduce a simple algorithm for reconstructing phylogenies from multiple gene trees in the presence of incomplete lineage sorting, that is, when the topology of the gene trees may differ from that of the species tree. We show that our technique is statistically consistent under standard stochastic assumptions, that is, it returns the correct tree given sufficiently many unlinked loci. We also show that it can tolerate moderate estimation errors.
Evolutionary Trees can be Learned in Polynomial Time in the Two-State General Markov Model
 SIAM Journal on Computing
, 1998
Cited by 35 (2 self)
The j-State General Markov Model of evolution (due to Steel) is a stochastic model concerned with the evolution of strings over an alphabet of size j. In particular, the Two-State General Markov Model of evolution generalises the well-known Cavender-Farris-Neyman model of evolution by removing the symmetry restriction (which requires that the probability that a '0' turns into a '1' along an edge is the same as the probability that a '1' turns into a '0' along the edge). Farach and Kannan showed how to PAC-learn Markov Evolutionary Trees in the Cavender-Farris-Neyman model provided that the target tree satisfies the additional restriction that all pairs of leaves have a sufficiently high probability of being the same. We show how to remove both restrictions and thereby obtain the first polynomial-time PAC-learning algorithm (in the sense of Kearns et al.) for the general class of Two-State Markov Evolutionary Trees. Research Report RR347, Department of Computer Science, University of Wa...
Phase transitions in phylogeny
 Trans. Amer. Math. Soc
, 2003
Cited by 33 (8 self)
We apply the theory of Markov random fields on trees to derive a phase transition in the number of samples needed in order to reconstruct phylogenies. We consider the Cavender-Farris-Neyman model of evolution on trees, where all the inner nodes have degree at least 3, and the net transition on each edge is bounded by $\epsilon$. Motivated by a conjecture by M. Steel, we show that if $2(1 - 2\epsilon)^2 > 1$, then for balanced trees, the topology of the underlying tree, having $n$ leaves, can be reconstructed from $O(\log n)$ samples (characters) at the leaves. On the other hand, we show that if $2(1 - 2\epsilon)^2 < 1$, then there exist topologies which require at least $n^{\Omega(1)}$ samples for reconstruction. Our results are the first rigorous results to establish the role of phase transitions for Markov random fields on trees, as studied in probability, statistical physics and information theory, for the study of phylogenies in mathematical biology.
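The threshold stated in this abstract can be made concrete with a little arithmetic. The sketch below solves 2(1 - 2ε)^2 = 1 for ε and reports which side of the transition a given ε falls on. The function names and regime labels are hypothetical, and the code only evaluates the inequality; the O(log n) guarantee in the abstract additionally assumes balanced trees.

```python
import math

def critical_epsilon() -> float:
    """Solve 2 * (1 - 2*eps)**2 = 1 for eps in [0, 1/2)."""
    return (1.0 - 1.0 / math.sqrt(2.0)) / 2.0

def regime(eps: float) -> str:
    """Classify eps by the sign of 2 * (1 - 2*eps)**2 - 1."""
    if not 0.0 <= eps < 0.5:
        raise ValueError("eps is assumed to lie in [0, 1/2)")
    value = 2.0 * (1.0 - 2.0 * eps) ** 2
    if value > 1.0:
        return "logarithmic"   # O(log n) samples suffice (balanced trees)
    if value < 1.0:
        return "polynomial"    # some topologies need n^Omega(1) samples
    return "critical"
```

The critical value is (1 - 1/√2)/2, roughly 0.1464, so an edge bound of 0.1 lies in the logarithmic regime while 0.2 lies in the polynomial one.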