Results 1-10 of 124
Optimal, efficient reconstruction of phylogenetic networks with constrained recombination
J. Bioinformatics and Computational Biology, 2003
Abstract

Cited by 115 (14 self)
A phylogenetic network is a generalization of a phylogenetic tree that allows structural properties that are not tree-like. With the growth of genomic data, much of which does not fit ideal tree models, there is a greater need to understand the algorithmics and combinatorics of phylogenetic networks [10, 11]. However, very little has been published on this to date, with the notable exception of the paper by Wang et al. [12]. Other related papers include [4, 5, 7]. We consider the problem introduced in [12] of determining whether a given set of sequences can be derived on a phylogenetic network in which the recombination cycles are node-disjoint. In this paper, we call such a phylogenetic network a “galled-tree”. By analysing more deeply the combinatorial constraints on cycle-disjoint phylogenetic networks, we obtain an efficient algorithm that is guaranteed to be both a necessary and sufficient test for the existence of a galled-tree for the data. If a galled-tree exists, the algorithm constructs one and obtains an implicit representation of all the galled-trees for the data, each of which can be created in linear time. We also note two additional results related to galled-trees. First, any set of sequences that can be derived on a galled-tree can be derived on a true tree (without recombination cycles) in which at most one back mutation is allowed per site. Second, the site compatibility problem (which is NP-hard in general) can be solved in linear time for any set of sequences that can be derived on a galled-tree. The combinatorial constraints we develop apply (for the most part) to node-disjoint cycles in any phylogenetic network (not just galled-trees). They can be used, for example, to prove that a given site cannot be on a node-disjoint cycle in any phylogenetic network. Perhaps more important than the specific results about galled-trees, we introduce an approach that can be used to study recombination in phylogenetic networks that go beyond galled-trees.
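The combinatorial analysis of node-disjoint cycles starts from pairs of sites that cannot both sit on a tree without recombination. The sketch below builds that pairwise conflict (incompatibility) graph and its connected components. It covers only this preprocessing step, not the paper's galled-tree algorithm. It assumes binary sites and, as is common in this setting, a known all-zero ancestral sequence. Under that assumption two sites conflict when the pairs 01, 10 and 11 all occur; with `root_known=False` all four gametes are required instead. The function names are hypothetical.

```python
from itertools import combinations


def sites_conflict(matrix, p, q, root_known=True):
    """Return True if binary sites (columns) p and q are incompatible.

    With a known all-zero ancestral sequence, p and q conflict when the
    pairs (0,1), (1,0) and (1,1) all occur; otherwise all four gametes
    (including (0,0)) must occur.
    """
    pairs = {(row[p], row[q]) for row in matrix}
    required = {(0, 1), (1, 0), (1, 1)}
    if not root_known:
        required.add((0, 0))
    return required <= pairs


def incompatibility_components(matrix, root_known=True):
    """Group sites into connected components of the conflict graph."""
    m = len(matrix[0])
    # One node per site, one edge per conflicting pair of sites.
    adj = {s: [] for s in range(m)}
    for p, q in combinations(range(m), 2):
        if sites_conflict(matrix, p, q, root_known):
            adj[p].append(q)
            adj[q].append(p)
    seen, components = set(), []
    for start in range(m):
        if start in seen:
            continue
        # Iterative depth-first search collects one component.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


M = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 1],
]
print(incompatibility_components(M))  # [[0, 1, 3], [2]]
```

Sites in a singleton component conflict with no other site. Only the non-trivial components involve incompatibilities that a tree alone cannot explain.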
Haplotype reconstruction from genotype data using imperfect phylogeny
Bioinformatics, 2004
Abstract

Cited by 94 (7 self)
Critical to understanding the genetic basis of complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine each individual’s haplotypes, that is, which nucleotide base occurs at each of these common SNP positions on each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated “blocks”. In a few recent studies (see Daly et al. (2001); Patil et al. (2001)), considerable parts of the human genome were partitioned into blocks such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual’s haplotype. We evaluate our method on biological data. Our method predicts the common haplotypes perfectly and has a very low error rate on the data from Daly et al. (2001) when the predictions for the uncommon haplotypes are taken into account. Our method is extremely efficient compared to previous methods such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large data sets. Availability: The algorithm is available via a webserver at
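Within a single block, predicting an individual's haplotypes amounts to explaining that individual's genotype as a pair of haplotypes drawn from the block's common haplotypes. The sketch below shows only that consistency check. It assumes a 0/1/2 genotype encoding (0 and 1 homozygous, 2 heterozygous), and the function names are hypothetical. The paper's full method, which also handles uncommon haplotypes and missing data, is not reproduced here.

```python
from itertools import combinations_with_replacement


def explains(genotype, h1, h2):
    """True if haplotypes h1 and h2 together produce genotype.

    Assumed encoding: 0/1 = homozygous for that allele, 2 = heterozygous.
    """
    for g, a, b in zip(genotype, h1, h2):
        if g == 2:
            if a == b:  # a heterozygous site needs different alleles
                return False
        elif not (a == b == g):  # a homozygous site needs both alleles equal to g
            return False
    return True


def resolve_with_common(genotype, common):
    """List index pairs (i, j), i <= j, of common haplotypes explaining genotype."""
    return [
        (i, j)
        for i, j in combinations_with_replacement(range(len(common)), 2)
        if explains(genotype, common[i], common[j])
    ]


common = [(0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
print(resolve_with_common((2, 2, 2), common))  # [(0, 1), (2, 3)]
print(resolve_with_common((0, 2, 1), common))  # [(0, 2)]
```

A genotype that returns more than one pair is ambiguous with respect to the common haplotypes. An empty list suggests that the genotype involves an uncommon haplotype.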
Efficient reconstruction of haplotype structure via perfect phylogeny
Journal of Bioinformatics and Computational Biology, 2003
Abstract

Cited by 75 (12 self)
Each person’s genome contains two copies of each chromosome, one inherited from the father and the other from the mother. A person’s genotype specifies the pair of bases at each site, but does not specify which base occurs on which chromosome. The sequence of each chromosome separately is called a haplotype. Determining the haplotypes within a population is essential for understanding genetic variation and the inheritance of complex diseases. The haplotype mapping project, a successor to the Human Genome Project, seeks to determine the common haplotypes in the human population. Since experimentally determining a person’s genotype is less expensive than determining its component haplotypes, algorithms are required for computing haplotypes from genotypes. Two observations aid in this process. First, the human genome contains short blocks within which only a few different haplotypes occur. Second, as suggested by Gusfield, it is reasonable to assume that the haplotypes observed within a block have evolved according to a perfect phylogeny, in which at most one mutation event has occurred at any site and no recombination has occurred in the given region. We present a simple and efficient polynomial-time algorithm for inferring haplotypes from the genotypes of a set of individuals under the perfect phylogeny assumption. Using a reduction to 2-SAT, we extend this algorithm to handle the constraints that apply when genotypes are available for both parents and a child. We also present a hardness result for the problem of removing the minimum number of individuals from a population so that the genotypes of the remaining individuals are consistent with a perfect phylogeny. Our algorithms have been tested on real data and give biologically meaningful results. Our webserver
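The abstract mentions a reduction to 2-SAT but does not state the clauses, so the reduction itself is not reconstructed here. The sketch below shows the standard linear-time machinery such a reduction relies on: a 2-SAT solver built on the strongly connected components of the implication graph, found here with Kosaraju's algorithm. The literal encoding `(variable, polarity)` and the function name are our own choices.

```python
from typing import List, Optional, Tuple

Literal = Tuple[int, bool]  # (variable index, True for x, False for NOT x)


def solve_2sat(n: int, clauses: List[Tuple[Literal, Literal]]) -> Optional[List[bool]]:
    """Return a satisfying assignment for a 2-CNF formula, or None."""

    def node(lit: Literal) -> int:
        var, positive = lit
        return 2 * var if positive else 2 * var + 1  # node ^ 1 is the negation

    size = 2 * n
    graph = [[] for _ in range(size)]
    rgraph = [[] for _ in range(size)]
    # Clause (a OR b) yields implications NOT a -> b and NOT b -> a.
    for a, b in clauses:
        na, nb = node(a), node(b)
        for u, v in ((na ^ 1, nb), (nb ^ 1, na)):
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju pass 1: iterative DFS recording finishing order.
    order, visited = [], [False] * size
    for s in range(size):
        if visited[s]:
            continue
        visited[s] = True
        stack = [(s, 0)]
        while stack:
            u, i = stack[-1]
            if i < len(graph[u]):
                stack[-1] = (u, i + 1)
                v = graph[u][i]
                if not visited[v]:
                    visited[v] = True
                    stack.append((v, 0))
            else:
                stack.pop()
                order.append(u)

    # Pass 2: label components on the reversed graph, latest finisher first.
    # Labels then follow a topological order of the component graph.
    comp, label = [-1] * size, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = label
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rgraph[u]:
                if comp[v] == -1:
                    comp[v] = label
                    stack.append(v)
        label += 1

    assignment = []
    for var in range(n):
        t, f = comp[2 * var], comp[2 * var + 1]
        if t == f:
            return None  # x and NOT x imply each other: unsatisfiable
        assignment.append(t > f)  # x is true if its component comes later
    return assignment
```

Each variable and clause is touched a constant number of times, so the solver runs in time linear in the size of the formula. This is what makes a 2-SAT reduction attractive for keeping the extended problem polynomial.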
Haplotype inference by maximum parsimony
Bioinformatics
Abstract

Cited by 67 (4 self)
Motivation: Haplotypes have been attracting increasing attention because of their importance in the analysis of many fine-scale molecular-genetics data. Since direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, haplotype inference methods that infer haplotypes from genotype samples have become attractive alternatives. Results: (1) We design and implement an algorithm for an important computational model of haplotype inference that has been suggested before in several places. The model finds a minimum-size set of haplotypes that explains the genotype samples. (2) Computational results on both real and simulated data give strong support for this computational model. (3) We also present a comparative study, using our program, that shows the strengths and weaknesses of this computational model. Availability: The software HAPAR is free for noncommercial use and is available upon request (lwang@cs.cityu.edu.hk). Contact:
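To make the parsimony model concrete, the sketch below finds a smallest haplotype set that explains every genotype by exhaustive search. It is not HAPAR's algorithm, which the abstract does not describe. It is exponential and suitable only for toy inputs. It assumes a 0/1/2 genotype encoding (2 = heterozygous), and the function names are hypothetical.

```python
from itertools import combinations, product


def resolutions(genotype):
    """Yield every ordered haplotype pair (h1, h2) consistent with genotype.

    Assumed encoding: 0/1 = homozygous, 2 = heterozygous.
    """
    het = [i for i, g in enumerate(genotype) if g == 2]
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(genotype), list(genotype)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b  # heterozygous: opposite alleles
        yield tuple(h1), tuple(h2)


def min_haplotype_set(genotypes):
    """Exhaustively find a smallest haplotype set explaining all genotypes."""
    options = [list(resolutions(g)) for g in genotypes]
    candidates = sorted({h for opts in options for pair in opts for h in pair})
    # Try subsets of increasing size; the first feasible one is minimum.
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            chosen = set(subset)
            if all(any(a in chosen and b in chosen for a, b in opts)
                   for opts in options):
                return list(subset)
    return []


print(min_haplotype_set([(2, 2), (0, 0), (1, 1)]))  # [(0, 0), (1, 1)]
```

Searching subsets in order of increasing size guarantees that the first feasible subset found has minimum size. Practical tools must avoid this blow-up.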
Robot Analysis
Proc. of IEEE COMPSAC, 1999
Abstract

Cited by 60 (2 self)
In this paper, we develop a probabilistic model to address two realistic scenarios for the singular haplotype reconstruction problem. The first is the incompleteness and inconsistency that occur in the DNA sequencing process that generates the input haplotype fragments. The second is the common practice used to generate synthetic data in experimental algorithm studies. We design three algorithms in the model that can reconstruct the two unknown haplotypes from the given matrix of haplotype fragments with provably high probability and in time linear in the size of the input matrix. We also present experimental results that agree with the theoretically efficient performance of these algorithms. The software for our algorithms is available for public access and for real-time online demonstration.
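The abstract does not spell out the three algorithms, so the sketch below is not one of them. It is a simple majority-vote heuristic for the same task, with the following assumptions:

- Every column is a heterozygous SNP site, so the two haplotypes are bitwise complements.
- Fragments are strings over `0`, `1` and `-`, where `-` marks an uncovered site.
- The names and the `rounds` parameter are hypothetical.

Each round reads every matrix entry a constant number of times, so a round is linear in the size of the fragment matrix.

```python
def complement(h):
    """Flip every allele of a 0/1 haplotype string."""
    return "".join("1" if c == "0" else "0" for c in h)


def mismatches(fragment, h):
    """Count disagreements on the sites the fragment covers ('-' = gap)."""
    return sum(1 for f, c in zip(fragment, h) if f != "-" and f != c)


def reconstruct_pair(fragments, rounds=5):
    """Heuristic: recover complementary haplotypes h1, h2 by majority vote."""
    m = len(fragments[0])
    # Seed h1 from the first fragment, filling its gaps with '0'.
    h1 = "".join(c if c != "-" else "0" for c in fragments[0])
    for _ in range(rounds):
        h2 = complement(h1)
        votes = [0] * m  # positive total = evidence for '1' in h1
        for frag in fragments:
            # A fragment closer to h2 votes for the flipped value in h1.
            flip = mismatches(frag, h2) < mismatches(frag, h1)
            for j, c in enumerate(frag):
                if c == "-":
                    continue
                bit = (c == "1") != flip
                votes[j] += 1 if bit else -1
        # Ties keep the previous value of h1 at that site.
        new_h1 = "".join(
            "1" if v > 0 else "0" if v < 0 else h1[j] for j, v in enumerate(votes)
        )
        if new_h1 == h1:
            break
        h1 = new_h1
    return h1, complement(h1)


frags = ["01--", "-10-", "--01", "10--", "-0-0", "0111"]  # last one has an error
print(reconstruct_pair(frags))  # ('0101', '1010')
```

The complementarity assumption keeps the sketch short. Without it, each haplotype would need its own per-column vote.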
Large scale reconstruction of haplotypes from genotype data
In Proc. RECOMB’03, 2003
Abstract

Cited by 49 (3 self)
Critical to understanding the genetic basis of complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize an individual’s variation, we must determine the individual’s haplotypes, that is, which nucleotide base occurs at each of these common SNP positions on each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated “blocks”. The majority of individuals have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual’s haplotype. We evaluate our method on biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (0.47%) when the predictions for the uncommon haplotypes are taken into account. Our method is extremely efficient compared to previous methods (a matter of seconds where previous methods needed hours). Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data and to work with large data sets, such as genotypes for thousands of SNPs for hundreds of individuals. The algorithm is available via a webserver
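The abstract does not state the block-partitioning criterion, so the sketch below uses a hypothetical one. It allows each block at most `max_distinct` distinct haplotype patterns, echoing the "about four common haplotypes" observation. It also assumes already-phased haplotype strings as input, whereas the paper works from genotypes. The dynamic program over contiguous column intervals is a standard way to find an optimal segmentation under any per-block criterion.

```python
def partition_blocks(haplotypes, max_distinct):
    """Split SNP columns into the fewest contiguous blocks such that each
    block shows at most max_distinct distinct haplotype patterns.

    haplotypes: list of equal-length strings, one per chromosome.
    Returns a list of half-open (start, end) column intervals, or None.
    """
    m = len(haplotypes[0])
    INF = float("inf")
    best = [0] + [INF] * m  # best[j] = fewest blocks covering columns [0, j)
    prev = [-1] * (m + 1)   # back-pointer: start of the last block ending at j
    for j in range(1, m + 1):
        for i in range(j):
            if best[i] + 1 >= best[j]:
                continue  # cannot improve on the current best for j
            patterns = {h[i:j] for h in haplotypes}
            if len(patterns) <= max_distinct:
                best[j] = best[i] + 1
                prev[j] = i
    if best[m] == INF:
        return None  # some single column already violates the criterion
    blocks, j = [], m
    while j > 0:
        blocks.append((prev[j], j))
        j = prev[j]
    return blocks[::-1]


print(partition_blocks(["0000", "0011", "1100", "1111"], 2))  # [(0, 2), (2, 4)]
```

Only the validity test inside the inner loop would change to plug in a different scoring rule. A real method would score blocks on genotype data and tolerate rare haplotypes.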
Bayesian Haplotype Inference via the Dirichlet Process
In Proceedings of the 21st International Conference on Machine Learning, 2004
Abstract

Cited by 44 (10 self)
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for understanding genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model in which the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Methods for fitting the genotype mixture must therefore address the problem of estimating a mixture with an unknown number of components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship. We apply our approach to the analysis of both simulated and real genotype data, and compare it to extant methods.
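One way to picture how a Dirichlet process prior handles an unknown number of mixture components is its Chinese restaurant process representation. In that view each new chromosome either reuses an existing haplotype with probability proportional to its popularity, or introduces a new one. The sketch below draws only cluster assignments from that prior. It leaves out the paper's haplotype/genotype likelihood and error model. The concentration parameter `alpha` and the function name are illustrative.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Item i joins existing cluster k with probability count_k / (i + alpha)
    and starts a new cluster with probability alpha / (i + alpha), so the
    number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    counts, labels = [], []
    for _ in range(n):
        weights = counts + [alpha]  # existing clusters, then a new one
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster (a new haplotype)
        else:
            counts[k] += 1
        labels.append(k)
    return labels


labels = sample_crp(50, alpha=1.0, seed=1)
print(len(set(labels)), "clusters for 50 items")
```

Larger values of `alpha` favour more distinct clusters. In a full model, the labels would index haplotypes whose sequences are themselves inferred.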
A survey of computational methods for determining haplotypes
Lecture Notes in Computer Science (2983): Computational Methods for SNPs and Haplotype Inference, 2004
Abstract

Cited by 39 (4 self)
It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation. Haplotypes have been suggested as one means for reducing the complexity of studying SNPs. In this paper we review some of the computational approaches that have been taken for determining haplotypes and suggest new approaches.
Maximum Likelihood Haplotyping for General Pedigrees
2004
Abstract

Cited by 37 (2 self)
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The presented optimization algorithm also improves likelihood computations. We present experimental results for the new algorithms on a variety of real and semi-artificial data sets, and use our software to evaluate MCMC approximations for haplotyping.
Efficient Rule-Based Haplotyping Algorithms for Pedigree Data (Extended Abstract)
2003
Abstract

Cited by 36 (9 self)
Jing Li (jili@cs.ucr.edu); Tao Jiang (University of California, Riverside & Shanghai Center for Bioinform. Technology; jiang@cs.ucr.edu)
We study haplotype reconstruction on pedigree data under the Mendelian law of inheritance and the minimum recombination principle. We prove that the problem of finding a minimum recombinant haplotype configuration (MRHC) is in general NP-hard. To our knowledge, this is the first complexity result concerning the problem. We propose an iterative algorithm, called block-extension, based on blocks of consecutive resolved marker loci. It is very efficient and can be used for large pedigrees with a large number of markers, especially for data sets requiring few recombinants (or recombination events). We also present a polynomial-time exact algorithm for haplotype reconstruction without recombinants. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them as a system of linear equations over the cyclic group Z2. Using a simple method based on Gaussian elimination, we can obtain all possible feasible haplotype configurations. We have tested the block-extension algorithm on simulated data generated on three pedigree structures. The results show that the algorithm performs very well on both multiallelic and biallelic data, especially when the number of recombinants is small.
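The abstract does not list the Mendelian constraint equations, so they are not reconstructed here. The sketch below shows only the generic step of solving a linear system over Z2 by Gaussian elimination and describing the whole solution set. Every solution equals one particular solution XORed with some combination of null-space basis vectors, so a consistent system with k free variables has exactly 2^k solutions. The function name and input format are our own.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over GF(2) (the cyclic group Z2).

    rows: list of 0/1 lists (matrix A); rhs: list of 0/1 values (vector b).
    Returns (particular_solution, null_space_basis), or None if inconsistent.
    Every solution is the particular solution XOR any subset of the basis.
    """
    n = len(rows[0]) if rows else 0
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    pivots, r = [], 0
    for c in range(n):
        pivot = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if pivot is None:
            continue  # column c corresponds to a free variable
        aug[r], aug[pivot] = aug[pivot], aug[r]
        # Clear column c in every other row (reduced row echelon form).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A zero row with right-hand side 1 encodes 0 = 1: no solution.
    if any(not any(row[:n]) and row[n] for row in aug):
        return None
    particular = [0] * n
    for i, c in enumerate(pivots):
        particular[c] = aug[i][n]  # free variables set to 0
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        vec = [0] * n
        vec[f] = 1
        for i, c in enumerate(pivots):
            vec[c] = aug[i][f]  # over GF(2), subtraction equals addition
        basis.append(vec)
    return particular, basis


# x0 + x1 = 1 and x1 + x2 = 0 (mod 2): solutions [1,0,0] and [0,1,1]
print(solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0]))  # ([1, 0, 0], [[1, 1, 1]])
```

Enumerating all XOR combinations of the basis with the particular solution lists every solution of the system. Elimination itself takes polynomial time, consistent with the abstract's polynomial-time claim for the zero-recombinant case.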