Results 1-10 of 44
Estimating Recombination Rates from Population Genetic Data
, 2000
Cited by 63 (9 self)
We introduce a new method for estimating recombination rates from population genetic data. The method uses a computationally intensive statistical procedure (importance sampling) to calculate the likelihood under a coalescent-based model. Detailed comparisons of the new algorithm with two existing methods (one based on importance sampling and one based on MCMC) show it to be substantially more efficient. (The improvement over the existing importance sampling scheme is typically by four orders of magnitude.) The existing approaches not infrequently led to misleading results on the problems we investigated. We also performed a simulation study to look at the properties of the maximum likelihood estimator (MLE) of the recombination rate, and its robustness to misspecification of the demographic model.
Phylogenetic networks: modeling, reconstructibility, and accuracy
 IEEE/ACM Transactions on Computational Biology and Bioinformatics
, 2004
Cited by 62 (16 self)
Phylogenetic networks model the evolutionary history of sets of organisms when events such as hybrid speciation and horizontal gene transfer occur. In spite of their widely acknowledged importance in evolutionary biology, phylogenetic networks have so far been studied mostly for specific data sets. We present a general definition of phylogenetic networks in terms of directed acyclic graphs (DAGs) and a set of conditions. Further, we distinguish between model networks and reconstructible ones and characterize the effect of extinction and taxon sampling on the reconstructibility of the network. Simulation studies are a standard technique for assessing the performance of phylogenetic methods. A main step in such studies entails quantifying the topological error between the model and inferred phylogenies. While many measures of tree topological accuracy have been proposed, none exist for phylogenetic networks. Previously, we proposed the first such measure, which applied only to a restricted class of networks. In this paper, we extend that measure to apply to all networks, and prove that it is a metric on the space of phylogenetic networks. Our results allow for the systematic study of existing network methods, and for the design of new accurate ones. Index Terms: phylogenetic networks, reticulate evolution, error metric, Robinson-Foulds, bipartitions, tripartitions.
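For orientation on the Robinson-Foulds measure named in the index terms, here is a minimal sketch of the classical cluster-based distance between two rooted trees. It is not the network metric the paper defines, which the abstract does not spell out. The nested-tuple tree encoding and the names clusters and rf_distance are assumptions made for illustration.

```python
def clusters(tree):
    """Return the non-trivial clusters (leaf sets below internal nodes) of a rooted tree.

    Assumed encoding: a leaf is any non-tuple label, an internal node is a tuple of children.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):          # leaf label
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)                         # record the cluster of this internal node
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)                    # the root cluster is common to all trees
    return found


def rf_distance(tree_a, tree_b):
    """Size of the symmetric difference of the two cluster sets."""
    return len(clusters(tree_a) ^ clusters(tree_b))


balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(rf_distance(balanced, caterpillar))
```

The two example trees share the cluster {a, b} and differ in one cluster each, so the distance counts exactly those two mismatches.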
A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters
 In proc Research in Computational Molecular Biology
, 2005
Network (reticulate) evolution: biology, models, and algorithms
 In The Ninth Pacific Symposium on Biocomputing (PSB
, 2004
Improved recombination lower bounds for haplotype data
 In Proceedings of the Ninth Annual International Conference on Computational Molecular Biology
, 2005
Cited by 12 (0 self)
Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach towards this is to estimate the minimum number of recombination events in any history of the sample. Recently, Myers and Griffiths [30] proposed two measures, Rh and Rs, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing lower bounds on the minimum number of recombination events. Our principal results include:
• We show that computing the lower bound Rh is NP-hard using a reduction from the minimum test collection problem [12]. We adapt the greedy algorithm for the set cover problem [24] to give a polynomial-time algorithm for computing a diversity-based bound, which we call Rg. This algorithm is several orders of magnitude faster than the Recmin program of Myers and Griffiths [30], and the bound Rg matches the bound Rh almost always.
• We also show that computing the lower bound Rs for a given matrix is NP-hard using a reduction from MAX-2SAT. We give an O(m · 2^n) time algorithm for exactly computing Rs for a dataset with n haplotypes and m SNPs. We propose a new bound RI which extends the history-based bound Rs using the notion of intermediate haplotypes. This bound detects more recombination events than both the Rh and Rs bounds on many real datasets.
• We extend our algorithms for computing Rg and Rs to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [32] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.
• We apply our lower-bound methods to a real dataset [22] and demonstrate that these can provide a good indication of the presence and the location of recombination hotspots.
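The idea behind a diversity-based bound can be sketched as follows. Under the infinite-sites model, a set of s binary sites without recombination can show at most s + 1 distinct haplotypes, so the number of distinct haplotypes minus s minus 1 is a lower bound on the number of recombinations, and this holds for any subset of sites. The greedy site selection below is only an illustrative heuristic in that spirit; the abstract does not give the details of Rg, and the function names and input encoding (equal-length strings of 0/1) are assumptions.

```python
def haplotype_bound(haplotypes, sites):
    """Distinct haplotypes restricted to `sites`, minus len(sites), minus 1 (floored at 0)."""
    distinct = {tuple(h[s] for s in sites) for h in haplotypes}
    return max(0, len(distinct) - len(sites) - 1)


def greedy_bound(haplotypes):
    """Grow a site set greedily and return the best bound seen along the way.

    Assumes a non-empty list of equal-length haplotypes. Every prefix of the
    chosen sites yields a valid lower bound, so the maximum over prefixes is valid too.
    """
    chosen = []
    remaining = set(range(len(haplotypes[0])))
    best = 0
    while remaining:
        def distinct_with(site):
            # number of distinct haplotypes if `site` were added to the chosen set
            return len({tuple(h[i] for i in chosen + [site]) for h in haplotypes})

        pick = max(sorted(remaining), key=distinct_with)   # ties go to the leftmost site
        chosen.append(pick)
        remaining.remove(pick)
        best = max(best, haplotype_bound(haplotypes, chosen))
    return best


# Four gametes at two sites cannot arise without recombination under infinite sites.
print(greedy_bound(["00", "01", "10", "11"]))
```

Because each evaluated subset gives a valid bound on its own, a poor greedy choice can only weaken the result, never make it incorrect.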
Combinatorial problems arising in SNP and Haplotype Analysis
 Discrete Mathematics and Theoretical Computer Science. Proceedings of DMTCS 2003
, 2003
Cited by 11 (1 self)
It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. This paper presents a number of algorithmic and combinatorial problems that arise when studying a very common form of genomic variation, single nucleotide polymorphisms (SNPs). We review recent results and present challenging open problems.
Possible ancestral structure in human populations. PLoS Genet 2:972–979
, 2006
Cited by 9 (0 self)
Determining the evolutionary relationships between fossil hominid groups such as Neanderthals and modern humans has been a question of enduring interest in human evolutionary genetics. Here we present a new method for addressing whether archaic human groups contributed to the modern gene pool (called ancient admixture), using the patterns of variation in contemporary human populations. Our method improves on previous work by explicitly accounting for recent population history before performing the analyses. Using sequence data from the Environmental Genome Project, we find strong evidence for ancient admixture in both a European and a West African population (p ≈ 10^-7), with contributions to the modern gene pool of at least 5%. While Neanderthals form an obvious archaic source population candidate in Europe, there is not yet a clear source population candidate in West Africa.
Association mapping of complex diseases with ancestral recombination graphs: Models and efficient algorithms
 Proc. of RECOMB 2007: The 11th Ann. International Conference Research in Computational Molecular Biology
Cited by 8 (2 self)
Association, or LD (linkage disequilibrium), mapping is an intensely studied approach to gene mapping (genome-wide or in candidate regions) that is widely hoped to be able to efficiently locate genes influencing both complex and Mendelian traits. The logic underlying association mapping implies that the best possible mapping results would be obtained if the genealogical history of the sampled individuals were explicitly known. Such a history would be in the form of an “ancestral recombination graph (ARG)”. But despite the conceptual importance of genealogical histories to association mapping, few practical association mapping methods have explicitly used derived genealogical aspects of ARGs. Two notable exceptions are [35] and [23]. In this paper we develop an association mapping method that explicitly constructs and samples minARGs (ARGs that minimize the number of recombinations). We develop an ARG sampling method that provably samples minARGs uniformly at random, and that is practical for moderate-sized datasets. We also develop a different, faster ARG sampling method that still samples from a well-defined subspace of ARGs, and that is practical for larger datasets. We present novel efficient algorithms on extensions of the “phenotype likelihood” problem, a key step in the method in [35]. We also prove that computing the phenotype likelihood for a different natural extension of the penetrance model in [35] is NP-hard, answering a question unresolved in that paper. Finally, we put all of these results into practice, and examine how well the implemented methods perform, compared to the results in [35]. The empirical results show great speedups and definite, but sometimes small, improvements in mapping accuracy. Speed is particularly important in doing genome-wide scans for causative mutations.
Introduction to phylogenetic networks
 in Evolutionary Studies, Molecular Biology and Evolution
, 2005
Counting All Possible Ancestral Configurations of Sample Sequences in Population Genetics
Cited by 7 (1 self)
Given a set D of input sequences, a genealogy for D can be constructed backwards in time, using such evolutionary events as mutation, coalescence, and recombination. An ancestral configuration (AC) can be regarded as the multiset of all sequences present at a particular point in time in a possible genealogy for D. The complexity of computing the likelihood of observing D depends heavily on the total number of distinct ACs of D, and therefore it is of interest to estimate that number. For D consisting of binary sequences of finite length, we consider the problem of enumerating exactly all distinct ACs. We assume that the root sequence type is known and that the mutation process is governed by the infinite-sites model. When there is no recombination, we construct a general method of obtaining closed-form formulas for the total number of ACs. The enumeration problem becomes much more complicated when recombination is involved. In that case, we devise a method of enumeration based on counting contingency tables and construct a dynamic programming algorithm for the approach. Lastly, we describe a method of counting the number of ACs that can appear in genealogies with at most a given number R of recombinations. Of particular interest is the case in which R is close to the minimum number of recombinations for D. Index Terms: ancestral configurations, coalescent, recombination, contingency table, enumeration.