Phylogenetic networks: modeling, reconstructibility, and accuracy
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
, 2004
Cited by 46 (12 self)
Abstract. Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones.
Index Terms: Phylogenetic networks, reticulate evolution, error metric, Robinson-Foulds, bipartitions, tripartitions.
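The index terms above name the Robinson-Foulds measure and bipartitions. As background only, here is a minimal Python sketch of the classic bipartition-based Robinson-Foulds distance between two trees. It is not the network metric the paper defines. The nested-tuple tree encoding and the function names `leaves`, `bipartitions`, and `rf_distance` are assumptions made for illustration.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxon names at the leaves of the tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the nontrivial splits induced by the edges of the tree.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split is represented identically in every tree.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - below if anchor in below else below
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(all_taxa) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Count the splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must be on the same taxon set")
    return len(bipartitions(t1) ^ bipartitions(t2))


t1 = (("a", "b"), ("c", ("d", "e")))
t2 = (("a", "c"), ("b", ("d", "e")))
print(rf_distance(t1, t2))
```

Some authors report half of this count; the sketch returns the unnormalized size of the symmetric difference.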
Estimating Recombination Rates from Population Genetic Data
, 2000
Cited by 42 (7 self)
We introduce a new method for estimating recombination rates from population genetic data. The method uses a computationally intensive statistical procedure (importance sampling) to calculate the likelihood under a coalescent-based model. Detailed comparisons of the new algorithm with two existing methods (one based on importance sampling and one based on MCMC) show it to be substantially more efficient. (The improvement over the existing importance sampling scheme is typically by four orders of magnitude.) The existing approaches not infrequently led to misleading results on the problems we investigated. We also performed a simulation study to look at the properties of the maximum likelihood estimator (MLE) of the recombination rate, and its robustness to misspecification of the demographic model.
A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters
- In Proc. Research in Computational Molecular Biology
, 2005
Network (reticulate) evolution: biology, models, and algorithms
- In The Ninth Pacific Symposium on Biocomputing (PSB)
, 2004
Improved recombination lower bounds for haplotype data
- In Proceedings of the Ninth Annual International Conference on Computational Molecular Biology
, 2005
Cited by 12 (0 self)
Abstract. Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this is to estimate the minimum number of recombination events in any history of the sample. Recently, Myers and Griffiths [30] proposed two measures, Rh and Rs, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing lower bounds on the minimum number of recombination events. Our principal results include:
• We show that computing the lower bound Rh is NP-hard, using a reduction from the minimum test collection problem [12]. We adapt the greedy algorithm for the set cover problem [24] to give a polynomial-time algorithm for computing a diversity-based bound, which we call Rg. This algorithm is several orders of magnitude faster than the Recmin program of Myers and Griffiths [30], and the bound Rg matches the bound Rh almost always.
• We also show that computing the lower bound Rs for a given matrix is NP-hard, using a reduction from MAX-2SAT. We give an O(m · 2^n) time algorithm for exactly computing Rs for a dataset with n haplotypes and m SNPs. We propose a new bound RI which extends the history-based bound Rs using the notion of intermediate haplotypes. This bound detects more recombination events than both the Rh and Rs bounds on many real datasets.
• We extend our algorithms for computing Rg and Rs to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [32] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.
• We apply our lower-bound methods to a real dataset [22] and demonstrate that these can provide a good indication of the presence and the location of recombination hotspots.
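The first result above adapts the greedy algorithm for the set cover problem. For readers unfamiliar with it, the following is a minimal Python sketch of the textbook greedy heuristic only. How the paper adapts it to compute Rg is not described in the abstract and is not reproduced here. The function name `greedy_set_cover` and the toy input are assumptions for illustration.

```python
from typing import Hashable, Iterable, List, Sequence, Set


def greedy_set_cover(universe: Iterable[Hashable],
                     subsets: Sequence[Set[Hashable]]) -> List[int]:
    """Return indices of subsets chosen by the textbook greedy heuristic.

    At each step, pick the subset that covers the most still-uncovered
    elements. Ties are broken in favor of the lowest index.
    """
    uncovered = set(universe)
    chosen: List[int] = []
    while uncovered:
        best = max(range(len(subsets)),
                   key=lambda i: len(subsets[i] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("the subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Toy instance (hypothetical data, not from the paper).
universe = {1, 2, 3, 4, 5}
subsets = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
print(greedy_set_cover(universe, subsets))
```

The heuristic runs in polynomial time and is not guaranteed to return a minimum cover; its well-known guarantee is a logarithmic approximation factor.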
Combinatorial problems arising in SNP and Haplotype Analysis
- Discrete Mathematics and Theoretical Computer Science. Proceedings of DMTCS 2003
, 2003
Cited by 8 (0 self)
Abstract. It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. This paper presents a number of algorithmic and combinatorial problems that arise when studying a very common form of genomic variation, single nucleotide polymorphisms (SNPs). We review recent results and present challenging open problems.
A comparison of three estimators of the population-scaled recombination rate: accuracy and robustness
- Genetics
, 2005
Cited by 5 (3 self)
Correspondence should be addressed to Paul Fearnhead.
Association mapping of complex diseases with ancestral recombination graphs: Models and efficient algorithms
- Proc. of RECOMB 2007: The 11th Ann. International Conference Research in Computational Molecular Biology
Cited by 5 (1 self)
Abstract. Association, or LD (linkage disequilibrium), mapping is an intensely studied approach to gene mapping (genome-wide or in candidate regions) that is widely hoped to be able to efficiently locate genes influencing both complex and Mendelian traits. The logic underlying association mapping implies that the best possible mapping results would be obtained if the genealogical history of the sampled individuals were explicitly known. Such a history would be in the form of an “ancestral recombination graph (ARG)”. But despite the conceptual importance of genealogical histories to association mapping, few practical association mapping methods have explicitly used derived genealogical aspects of ARGs. Two notable exceptions are [35] and [23]. In this paper we develop an association mapping method that explicitly constructs and samples minARGs (ARGs that minimize the number of recombinations). We develop an ARG sampling method that provably samples minARGs uniformly at random, and that is practical for moderate-sized datasets. We also develop a different, faster ARG sampling method that still samples from a well-defined subspace of ARGs, and that is practical for larger datasets. We present novel efficient algorithms on extensions of the “phenotype likelihood” problem, a key step in the method in [35]. We also prove that computing the phenotype likelihood for a different natural extension of the penetrance model in [35] is NP-hard, answering a question unresolved in that paper. Finally, we put all of these results into practice, and examine how well the implemented methods perform, compared to the results in [35]. The empirical results show great speedups and definite, though sometimes small, improvements in mapping accuracy. Speed is particularly important in genome-wide scans for causative mutations.
Introduction to phylogenetic networks
- in Evolutionary Studies, Molecular Biology and Evolution
, 2005
A concise necessary and sufficient condition for the existence of a galled-tree
- IEEE/ACM Transactions on Computational Biology and Bioinformatics
Cited by 3 (1 self)
Abstract. Galled-trees are a special class of graphical representation of evolutionary history that has proved amenable to efficient, polynomial-time algorithms. The goal of this paper is to construct a concise necessary and sufficient condition for the existence of a galled-tree for M, a set of binary sequences that purportedly have evolved in the presence of recombination. Both root-known and root-unknown cases are considered here.
Index Terms: Galled-trees, recombination, quadpartition, incompatibility
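The index terms above mention incompatibility. As background, pairwise incompatibility of binary sites is commonly checked with the four-gamete test, sketched below in Python. This is not the paper's necessary and sufficient condition for a galled-tree, which the abstract does not state. The sketch assumes sequences are rows of 0/1 integers and, for the root-known case, that the ancestral sequence is all zeros. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

ALL_GAMETES = {(0, 0), (0, 1), (1, 0), (1, 1)}


def sites_incompatible(matrix: Sequence[Sequence[int]], i: int, j: int,
                       root_known: bool = False) -> bool:
    """Four-gamete test for columns i and j of a binary matrix.

    Root-unknown: the sites conflict if all four of 00, 01, 10, 11 occur.
    Root-known (assumed all-zero ancestor): the ancestor contributes 00,
    so the sites conflict if 01, 10 and 11 all occur.
    """
    gametes = {(row[i], row[j]) for row in matrix}
    if root_known:
        gametes.add((0, 0))
    return ALL_GAMETES <= gametes


def incompatible_pairs(matrix: Sequence[Sequence[int]],
                       root_known: bool = False) -> List[Tuple[int, int]]:
    """List every pair of columns that fails the four-gamete test."""
    num_sites = len(matrix[0]) if matrix else 0
    return [(i, j)
            for i in range(num_sites)
            for j in range(i + 1, num_sites)
            if sites_incompatible(matrix, i, j, root_known)]


# Toy data (hypothetical): rows are sequences, columns are sites.
M = [[0, 0, 1],
     [0, 1, 0],
     [1, 0, 0],
     [1, 1, 0]]
print(incompatible_pairs(M))
```

The two modes can disagree: for the rows `[0, 1]`, `[1, 0]`, `[1, 1]`, the pair of sites passes the root-unknown test but fails once an all-zero ancestor is assumed.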

