Results 1-10 of 92
Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions (Extended Abstract)
2002
Cited by 109 (10 self)
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7]. In this paper
Optimal, efficient reconstruction of phylogenetic networks with constrained recombination
J. Bioinformatics and Computational Biology, 2003
Cited by 94 (13 self)
A phylogenetic network is a generalization of a phylogenetic tree, allowing structural properties that are not tree-like. With the growth of genomic data, much of which does not fit ideal tree models, there is greater need to understand the algorithmics and combinatorics of phylogenetic networks [10, 11]. However, to date, very little has been published on this, with the notable exception of the paper by Wang et al. [12]. Other related papers include [4, 5, 7]. We consider the problem introduced in [12], of determining whether the sequences can be derived on a phylogenetic network where the recombination cycles are node disjoint. In this paper, we call such a phylogenetic network a “galled-tree”. By more deeply analysing the combinatorial constraints on cycle-disjoint phylogenetic networks, we obtain an efficient algorithm that is guaranteed to be both a necessary and sufficient test for the existence of a galled-tree for the data. If there is a galled-tree, the algorithm constructs one and obtains an implicit representation of all the galled-trees for the data, and can create these in linear time for each one. We also note two additional results related to galled-trees: first, any set of sequences that can be derived on a galled-tree can be derived on a true tree (without recombination cycles), where at most one back mutation is allowed per site; second, the site compatibility problem (which is NP-hard in general) can be solved in linear time for any set of sequences that can be derived on a galled-tree. The combinatorial constraints we develop apply (for the most part) to node-disjoint cycles in any phylogenetic network (not just galled-trees), and can be used for example to prove that a given site cannot be on a node-disjoint cycle in any phylogenetic network. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in phylogenetic networks that go beyond galled-trees.
Efficient reconstruction of haplotype structure via perfect phylogeny
Journal of Bioinformatics and Computational Biology, 2003
Cited by 68 (10 self)
Each person’s genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person’s genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. The determination of the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the human genome project, seeks to determine the common haplotypes in the human population. Since experimental determination of a person’s genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process: first, the human genome contains short blocks within which only a few different haplotypes occur; second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site, and no recombination occurred at the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals assuming a perfect phylogeny. Using a reduction to 2-SAT we extend this algorithm to handle constraints that apply when we have genotypes from both parents and child. We also present a hardness result for the problem of removing the minimum number of individuals from a population to ensure that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver
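The perfect phylogeny assumption described in this abstract can be illustrated on already-phased binary haplotypes with the classical four-gamete test: a set of binary haplotypes fits a perfect phylogeny (with the root state left free) exactly when no pair of sites shows all four combinations 00, 01, 10, 11. The sketch below is only that compatibility check, not the paper's genotype-to-haplotype inference algorithm; the function name `admits_perfect_phylogeny` and the 0/1 row encoding are assumptions made for illustration.

```python
from itertools import combinations


def admits_perfect_phylogeny(haplotypes):
    """Four-gamete test on binary haplotypes.

    `haplotypes` is a list of equal-length sequences of 0/1 values,
    one row per haplotype and one column per site (hypothetical encoding).
    Returns True when no pair of sites exhibits all four gametes.
    """
    if not haplotypes:
        return True
    n_sites = len(haplotypes[0])
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (site i, site j) state pairs seen in the data.
        gametes = {(h[i], h[j]) for h in haplotypes}
        if len(gametes) == 4:
            # Sites i and j conflict: explaining them needs either
            # recombination or a repeated mutation at one of the sites.
            return False
    return True


compatible = [(0, 0, 1), (0, 1, 1), (1, 0, 0)]
conflicting = compatible + [(1, 1, 0)]
print(admits_perfect_phylogeny(compatible))
print(admits_perfect_phylogeny(conflicting))
```

The check is quadratic in the number of sites, which is adequate for a single block; it says nothing about how to phase genotypes, which is the problem the abstract addresses.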
Parameterized Computational Feasibility
Feasible Mathematics II, 1994
Cited by 59 (20 self)
Many natural computational problems have input consisting of two or more parts. For example, the input might consist of a graph and a positive integer. For many natural problems we may view one of the inputs as a parameter and study how the complexity of the problem varies if the parameter is held fixed. For many applications of computational problems involving such a parameter, only a small range of parameter values is of practical significance, so that fixed-parameter complexity is a natural concern. In studying the complexity of such problems, it is therefore important to have a framework in which we can make qualitative distinctions about the contribution of the parameter to the complexity of the problem. In this paper we survey one such framework for investigating parameterized computational complexity and present a number of new results for this theory.
A Polynomial-Time Algorithm for the Perfect Phylogeny Problem when the Number of Character States Is Fixed
SIAM Journal on Computing, 1994
Cited by 49 (2 self)
We present a polynomial-time algorithm for determining whether a set of species, described by the characters they exhibit, has a perfect phylogeny, assuming the maximum number of possible states for a character is fixed. This solves a long-standing open problem. Our result should be contrasted with the proof by Steel and Bodlaender, Fellows, and Warnow that the perfect phylogeny problem is NP-complete in general.
Reconstructing reticulate evolution in species - theory and practice
In Proc. of 8th Annual International Conference on Computational Molecular Biology, 2004
Cited by 46 (7 self)
We present new methods for reconstructing reticulate evolution of species due to events such as horizontal transfer or hybrid speciation; both methods are based upon extensions of Wayne Maddison’s approach in his seminal 1997 paper. Our first method is a polynomial-time algorithm for constructing phylogenetic networks from two gene trees contained inside the network. We allow the network to have an arbitrary number of reticulations, but we limit the reticulation in the network so that the cycles in the network are node-disjoint (“galled”). Our second method is a polynomial-time algorithm for constructing networks with one reticulation, where we allow for errors in the estimated gene trees. Using simulations, we demonstrate improved performance of this method over both NeighborNet and Maddison’s method.
A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters
In Proc. Research in Computational Molecular Biology, 2005
A Fast Algorithm for the Computation and Enumeration of Perfect Phylogenies
SIAM Journal on Computing, 1995
Cited by 40 (7 self)
The Perfect Phylogeny Problem is a classical problem in computational evolutionary biology, in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed-parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernandez-Baca have developed an $O(2^{3r}(nk^3 + k^4))$ algorithm for the perfect phylogeny problem for $n$ species defined by $k$ $r$-state characters. Since commonly the character data is drawn from alignments of molecular sequences, $k$ is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of $k$. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time $O(2^{2r} k^2 n)$. We also show how it is possible to efficiently build a...
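To see why the dependence on $k$ matters, the two bounds quoted in this abstract can be divided term by term. This is only a rough comparison that ignores the constants hidden by the $O(\cdot)$ notation, and the numbers plugged in at the end are illustrative assumptions, not figures from the paper.

```latex
\[
\frac{2^{3r}\,(n k^{3} + k^{4})}{2^{2r}\,k^{2}\,n}
  \;=\; 2^{r}\left(k + \frac{k^{2}}{n}\right)
\]
% Illustrative values (assumed): r = 4 states, k = 1000 characters, n = 50 species
\[
2^{4}\left(1000 + \frac{1000^{2}}{50}\right) \;=\; 16 \cdot 21000 \;=\; 336000
\]
```

Under these assumed values the earlier bound is larger by a factor of roughly $3 \times 10^{5}$, and the gap grows at least linearly in $k$.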
Reconstructing a History of Recombinations From a Set of Sequences
Discrete Appl. Math., 1998
Cited by 38 (5 self)
One of the classic problems in computational biology is the reconstruction of evolutionary history. A recent trend in the area is to increase the explanatory power of the models that are considered by incorporating higher-order evolutionary events that more accurately reflect the mechanisms of mutation at the level of the chromosome. We take a step in this direction by considering the problem of reconstructing an evolutionary history for a set of genetic sequences that have evolved by recombination. Recombination is a non-tree-like event that produces a child sequence by crossing two parent sequences. We present polynomial-time algorithms for reconstructing a parsimonious history of such events for several models of recombination when all sequences, including those of ancestors, are present in the input. We also show that these models appear to be near the limit of what can be solved in polynomial time, in that several natural generalizations are NP-complete. Keywords Computational bio...
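As a small illustration of "crossing two parent sequences", the sketch below assumes the simplest model, a single crossover: the child takes a prefix from one parent and the matching suffix from the other. The abstract covers several recombination models and does not spell them out, so the function name `single_crossover` and the breakpoint convention are assumptions made here.

```python
def single_crossover(parent_a, parent_b, breakpoint):
    """Return the child formed by one crossover between two parents.

    The child copies positions [0, breakpoint) from `parent_a` and
    positions [breakpoint, end) from `parent_b`. Both parents must have
    the same length (assumed model: equal-length aligned sequences).
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    if not 0 <= breakpoint <= len(parent_a):
        raise ValueError("breakpoint out of range")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


child = single_crossover("00000", "11111", 2)
print(child)
```

A child produced this way has two parents, which is why a history containing such events forms a network rather than a tree.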
Perfect phylogeny and haplotype assignment
In Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB 04), 2004
Cited by 29 (1 self)
This paper is concerned with the reconstruction of perfect phylogenies from binary character data with missing values, and related problems of inferring complete haplotypes from haplotypes or genotypes with missing data. In cases where the problems considered are NP-hard we assume a rich data hypothesis under which they become tractable. Natural probabilistic models are introduced for the generation of character vectors, haplotypes or genotypes with missing data, and it is shown that these models support the rich data hypothesis. The principal results include:
• A near-linear-time algorithm for inferring a perfect phylogeny from binary character data (or haplotype data) with missing values, under the rich data hypothesis;
• A quadratic-time algorithm for inferring a perfect phylogeny from genotype data with missing values with high probability, under certain distributional assumptions;
• Demonstration that the problems of maximum-likelihood inference of complete haplotypes from partial haplotypes or partial genotypes can be cast as minimum-entropy disjoint set cover problems;
• In the case where the haplotypes come from a perfect phylogeny, a representation of the set cover problem as minimum-entropy covering of subtrees of a tree by nodes;