Results 1-10 of 229
Haplotype reconstruction from genotype data using imperfect phylogeny
Bioinformatics, 2004
Cited by 94 (7 self)
Critical to understanding the genetic basis of complex diseases is the modeling of human variation. Most of this variation can be characterized by single nucleotide polymorphisms (SNPs), which are mutations at a single nucleotide position. To characterize the genetic variation between different people, we must determine an individual’s haplotype, that is, which nucleotide base occurs at each position of these common SNPs for each chromosome. In this paper, we present results for a highly accurate method for haplotype resolution from genotype data. Our method leverages a new insight into the underlying structure of haplotypes, which shows that SNPs are organized in highly correlated “blocks”. In a few recent studies (see Daly et al. (2001); Patil et al. (2001)), considerable parts of the human genome were partitioned into blocks such that the majority of the sequenced genotypes have one of about four common haplotypes in each block. Our method partitions the SNPs into blocks and, for each block, predicts the common haplotypes and each individual’s haplotype. We evaluate our method on biological data. Our method predicts the common haplotypes perfectly and has a very low error rate (less than [illegible] on the data from Daly et al. (2001)) when the predictions for the uncommon haplotypes are taken into account. Our method is extremely efficient compared to previous methods, such as PHASE and HAPLOTYPER. Its efficiency allows us to find the block partition of the haplotypes, to cope with missing data, and to work with large data sets. Availability: The algorithm is available via webserver at
Model-based inference of haplotype block variation
Proceedings of the Seventh Annual International Conference on Computational Molecular Biology (RECOMB 2003), 2003
Cited by 68 (6 self)
The uneven recombination structure of human DNA has been highlighted by several recent studies. Knowledge of the haplotype blocks generated by this phenomenon can be applied to dramatically increase the statistical power of genetic mapping. Several criteria have already been proposed for identifying these blocks, all of which require haplotypes as input. We propose a comprehensive statistical model of haplotype block variation and show how the parameters of this model can be learned from haplotypes and/or unphased genotype data. Using real-world SNP data, we demonstrate that our approach can be used to resolve genotypes into their constituent haplotypes with greater accuracy than previously known methods.
Haplotype inference by maximum parsimony
Bioinformatics
Cited by 67 (4 self)
Motivation: Haplotypes have been attracting increasing attention because of their importance in the analysis of many fine-scale molecular-genetics data. Since direct sequencing of haplotypes via experimental methods is both time-consuming and expensive, haplotype inference methods that infer haplotypes from genotype samples have become attractive alternatives. Results: (1) We design and implement an algorithm for an important computational model of haplotype inference that has been suggested before in several places. The model finds a minimum-size set of haplotypes that explains the genotype samples. (2) Strong support for this computational model is given, based on computational results on both real and simulated data. (3) We also carried out a comparative study to show the strengths and weaknesses of this computational model using our program. Availability: The software HAPAR is free for non-commercial use and is available upon request (lwang@cs.cityu.edu.hk). Contact:
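The parsimony model described in this abstract (find the smallest set of haplotypes that explains all genotype samples) can be illustrated with a minimal sketch. This is not the HAPAR algorithm; it is a plain exhaustive search that is only practical for tiny inputs. The genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites) and the function names are assumptions made for this illustration.

```python
from itertools import combinations, product

# Assumed encoding (not from the source): each genotype is a tuple over
# {0, 1, 2}; 0 and 1 mark homozygous sites, 2 marks a heterozygous site.
# Haplotypes are tuples over {0, 1}.

def resolves(h1, h2, g):
    """Return True if the haplotype pair (h1, h2) explains genotype g."""
    for a, b, s in zip(h1, h2, g):
        if s == 2:
            if a == b:          # heterozygous site needs two different alleles
                return False
        elif not (a == b == s):  # homozygous site needs both alleles equal to s
            return False
    return True

def candidate_haplotypes(genotypes):
    """All haplotypes that could take part in resolving some genotype."""
    cands = set()
    for g in genotypes:
        options = [(0, 1) if s == 2 else (s,) for s in g]
        cands.update(product(*options))
    return sorted(cands)

def min_haplotype_set(genotypes):
    """Exhaustively search for a smallest haplotype set explaining all genotypes."""
    cands = candidate_haplotypes(genotypes)
    for k in range(1, len(cands) + 1):
        for subset in combinations(cands, k):
            if all(
                any(resolves(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return []

if __name__ == "__main__":
    print(min_haplotype_set([(2, 2), (0, 0), (1, 1)]))
```

The search tries subsets in order of increasing size, so the first subset that resolves every genotype is a minimum one. Its cost grows exponentially with the number of heterozygous sites, which is why practical tools rely on techniques such as branch and bound.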
Bayesian Haplotype Inference via the Dirichlet Process
Proceedings of the 21st International Conference on Machine Learning, 2004
Cited by 44 (10 self)
The problem of inferring haplotypes from genotypes of single nucleotide polymorphisms (SNPs) is essential for the understanding of genetic variation within and among populations, with important applications to the genetic analysis of disease propensities and other complex traits. The problem can be formulated as a mixture model, where the mixture components correspond to the pool of haplotypes in the population. The size of this pool is unknown; indeed, knowing the size of the pool would correspond to knowing something significant about the genome and its history. Thus methods for fitting the genotype mixture must crucially address the problem of estimating a mixture with an unknown number of mixture components. In this paper we present a Bayesian approach to this problem based on a nonparametric prior known as the Dirichlet process. The model also incorporates a likelihood that captures statistical errors in the haplotype/genotype relationship. We apply our approach to the analysis of both simulated and real genotype data, and compare to extant methods.
A survey of computational methods for determining haplotypes
Lecture Notes in Computer Science (2983): Computational Methods for SNPs and Haplotype Inference, 2004
Cited by 39 (4 self)
It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation. Haplotypes have been suggested as one means for reducing the complexity of studying SNPs. In this paper we review some of the computational approaches that have been taken for determining haplotypes and suggest new approaches.
Efficient Inference of Haplotypes from Genotypes on a Pedigree
J Bioinfo Comp Biol, 2003
Cited by 36 (10 self)
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them using a system of linear equations over the cyclic group Z_2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large-scale haplotype inference projects.
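The zero-recombinant algorithm in this abstract reduces haplotyping to a linear system over Z_2 solved by Gaussian elimination. The sketch below shows only that generic last step, under the assumption that the constraints have already been written as a 0/1 matrix A and right-hand side b. How the Mendelian constraints are derived from a pedigree is not shown, and the function name solve_gf2 is hypothetical.

```python
def solve_gf2(A, b):
    """Solve A x = b over Z_2 by Gauss-Jordan elimination.

    A is a list of rows of 0/1 integers, b a list of 0/1 integers.
    Returns (particular_solution, null_space_basis), or None if the
    system is inconsistent. Every solution is the particular solution
    XORed with some subset of the basis vectors.
    """
    n_cols = len(A[0]) if A else 0
    rows = [list(r) + [v] for r, v in zip(A, b)]  # augmented matrix
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n_cols):
        # Look for a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue  # column c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        # Clear column c in every other row (addition mod 2 is XOR).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [x ^ y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    # Remaining rows have all-zero coefficients; a 1 on the right means 0 = 1.
    if any(row[-1] for row in rows[r:]):
        return None
    particular = [0] * n_cols
    for i, c in enumerate(pivots):
        particular[c] = rows[i][-1]
    basis = []
    for f in (c for c in range(n_cols) if c not in pivots):
        v = [0] * n_cols
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = rows[i][f]
        basis.append(v)
    return particular, basis

if __name__ == "__main__":
    # x0 + x1 = 1 and x1 + x2 = 0 (mod 2)
    print(solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0]))
```

Returning a particular solution together with a null-space basis describes the whole solution set compactly, which matches the abstract's goal of obtaining all feasible configurations: with k free variables there are 2^k solutions.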
Efficient Rule-Based Haplotyping Algorithms for Pedigree Data (Extended Abstract)
2003
Cited by 35 (9 self)
Jing Li (jili@cs.ucr.edu) and Tao Jiang (jiang@cs.ucr.edu), University of California, Riverside & Shanghai Center for Bioinform. Technology. ABSTRACT: We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them using a system of linear equations over the cyclic group Z_2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. We have tested the block-extension algorithm on simulated data generated on three pedigree structures. The results show that the algorithm performs very well on both multiallelic and biallelic data, especially when the number of recombinants is small.
An approximation algorithm for haplotype inference by maximum parsimony
Journal of Computational Biology, 2005
Cited by 33 (2 self)
This paper studies haplotype inference by maximum parsimony using population data. We define the optimal haplotype inference (OHI) problem as follows: given a set of genotypes and a set of related haplotypes, find a minimum subset of haplotypes that can resolve all the genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic programming (IQP) problem. To solve the IQP problem, we propose an iterative semidefinite-programming-based approximation algorithm (called SDPHapInfer). We show that this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n is the number of genotypes. This algorithm has been implemented and tested on a variety of simulated and biological data. We compare it with three other methods: (1) HAPAR, which is based on the branch-and-bound algorithm, (2) HAPLOTYPER, which is based on the Expectation-Maximization algorithm, and (3) PHASE, which combines the Gibbs sampling algorithm with an approximate coalescent prior. The experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error rates. In addition, the results generated by PHASE have lower error rates on some data but higher error rates on others. The error rates of HAPAR are higher than the others on biological data. In
The Haplotyping Problem: An Overview of Computational Models and Solutions
Journal of Computer Science and Technology, 2003
Cited by 32 (5 self)
The investigation of genetic differences among humans has given evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation is the one that involves only a single nucleotide of the DNA sequence, which is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in human populations is one of the primary goals of recent studies in human genomics. The construction of such a map requires determining the DNA sequences that form all chromosomes. In diploid organisms like humans, each chromosome consists of two sequences called haplotypes.
An Overview of Combinatorial Methods for Haplotype Inference
Lecture Notes in Computer Science (2983): Computational Methods for SNPs and Haplotype Inference, 2004
Cited by 29 (2 self)
A current high-priority phase of human genomics involves the development of a full Haplotype Map of the human genome [23]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A key, perhaps bottleneck, problem is to computationally infer haplotype pairs from genotype data. This paper follows the talk given at the DIMACS Conference on SNPs and Haplotypes held in November of 2002. It reviews several combinatorial approaches to the haplotype inference problem that we have investigated over the last several years. In addition, it updates some of the work presented earlier, and discusses the current state of our work.
1 Introduction to SNP’s, Genotypes and Haplotypes
In diploid organisms (such as humans) there are two (not completely identical) “copies” of each chromosome, and hence of each region of interest.