Haplotyping as Perfect Phylogeny: Conceptual Framework and Efficient Solutions (Extended Abstract)
2002

Cited by 109 (10 self)
The next high-priority phase of human genomics will involve the development of a full Haplotype Map of the human genome [12]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A prototype Haplotype Mapping strategy is presently being finalized by an NIH working group. The biological key to that strategy is the surprising fact that genomic DNA can be partitioned into long blocks where genetic recombination has been rare, leading to strikingly fewer distinct haplotypes in the population than previously expected [12, 6, 21, 7]. In this paper ...
A survey of computational methods for determining haplotypes
Lecture Notes in Computer Science (2983): Computational Methods for SNPs and Haplotype Inference, 2004

Cited by 33 (4 self)
It is widely anticipated that the study of variation in the human genome will provide a means of predicting risk of a variety of complex diseases. Single nucleotide polymorphisms (SNPs) are the most common form of genomic variation. Haplotypes have been suggested as one means for reducing the complexity of studying SNPs. In this paper we review some of the computational approaches that have been taken for determining haplotypes and suggest new approaches.
Efficient Rule-Based Haplotyping Algorithms for Pedigree Data (Extended Abstract)
2003

Cited by 32 (9 self)
Jing Li (jili@cs.ucr.edu), Tao Jiang (jiang@cs.ucr.edu). University of California, Riverside & Shanghai Center for Bioinform. Technology.
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. We have tested the block-extension algorithm on simulated data generated on three pedigree structures. The results show that the algorithm performs very well on both multiallelic and biallelic data, especially when the number of recombinants is small.
Efficient Inference of Haplotypes from Genotypes on a Pedigree
J Bioinfo Comp Biol, 2003

Cited by 30 (10 self)
We study haplotype reconstruction under the Mendelian law of inheritance and the minimum recombination principle on pedigree data. We prove that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. This is the first complexity result concerning the problem to our knowledge. An iterative algorithm based on blocks of consecutive resolved marker loci (called block-extension) is proposed. It is very efficient and can be used for large pedigrees with a large number of markers, especially for those data sets requiring few recombinants (or recombination events). A polynomial-time exact algorithm for haplotype reconstruction without recombinants is also presented. This algorithm first identifies all the necessary constraints based on the Mendelian law and the zero-recombinant assumption, and represents them using a system of linear equations over the cyclic group Z2. By using a simple method based on Gaussian elimination, we could obtain all possible feasible haplotype configurations. A C++ implementation of the block-extension algorithm, called PedPhase, has been tested on both simulated data and real data. The results show that the program performs very well on both types of data and will be useful for large-scale haplotype inference projects.
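The abstract above mentions solving a system of linear equations over Z2 by Gaussian elimination to enumerate feasible configurations. The following is only a generic sketch of that linear-algebra step, not the paper's algorithm or the PedPhase code: how pedigree constraints are turned into equations is not described here, and the function name `solve_gf2` and the input layout (rows of 0/1 coefficients plus a 0/1 right-hand side) are assumptions made for illustration.

```python
def solve_gf2(rows, rhs):
    """Solve A x = b over GF(2) (hypothetical helper, for illustration only).

    rows: list of equal-length lists of 0/1 coefficients.
    rhs:  list of 0/1 right-hand sides, one per row.
    Returns (particular, basis), where every solution is the particular
    solution XOR-ed with any subset of the basis vectors, or None if the
    system is inconsistent.
    """
    n_vars = len(rows[0]) if rows else 0
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]  # augmented matrix
    pivots = []  # pivots[i] is the pivot column of reduced row i
    r = 0
    for c in range(n_vars):
        # Find a row at or below r with a 1 in column c.
        p = next((i for i in range(r, len(aug)) if aug[i][c]), None)
        if p is None:
            continue  # column c is a free variable
        aug[r], aug[p] = aug[p], aug[r]
        # Clear column c in every other row (addition mod 2 is XOR).
        for i in range(len(aug)):
            if i != r and aug[i][c]:
                aug[i] = [x ^ y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    # A remaining all-zero row with right-hand side 1 means no solution.
    if any(aug[i][n_vars] for i in range(r, len(aug))):
        return None
    particular = [0] * n_vars
    for i, c in enumerate(pivots):
        particular[c] = aug[i][n_vars]
    basis = []
    for f in (c for c in range(n_vars) if c not in pivots):
        v = [0] * n_vars
        v[f] = 1
        for i, c in enumerate(pivots):
            v[c] = aug[i][f]
        basis.append(v)
    return particular, basis


# Toy system: x0 + x1 = 1 and x1 + x2 = 0 (mod 2).
solution = solve_gf2([[1, 1, 0], [0, 1, 1]], [1, 0])
```

With k basis vectors the system has 2^k solutions, which is what makes it possible to list every feasible assignment once the elimination is done.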
A linear-time algorithm for the perfect phylogeny haplotyping (PPH) problem
In International Conference on Research in Computational Molecular Biology (RECOMB), 2005

Cited by 30 (8 self)
Since the introduction of the Perfect Phylogeny Haplotyping (PPH) Problem in RECOMB 2002 (Gusfield, 2002), the problem of finding a linear-time (deterministic, worst-case) solution for it has remained open, despite broad interest in the PPH problem and a series of papers on various aspects of it. In this paper, we solve the open problem, giving a practical, deterministic linear-time algorithm based on a simple data structure and simple operations on it. The method is straightforward to program and has been fully implemented. Simulations show that it is much faster in practice than prior nonlinear methods. The value of a linear-time solution to the PPH problem is partly conceptual and partly for use in the inner loop of algorithms for more complex problems, where the PPH problem must be solved repeatedly. Key words: Perfect Phylogeny Haplotyping (PPH) Problem, Haplotype Inference Problem, linear-time algorithm, shadow tree.
An approximation algorithm for haplotype inference by maximum parsimony
Journal of Computational Biology, 2005

Cited by 28 (1 self)
This paper studies haplotype inference by maximum parsimony using population data. We define the optimal haplotype inference (OHI) problem as follows: given a set of genotypes and a set of related haplotypes, find a minimum subset of haplotypes that can resolve all the genotypes. We prove that OHI is NP-hard and can be formulated as an integer quadratic programming (IQP) problem. To solve the IQP problem, we propose an iterative semidefinite programming based approximation algorithm (called SDPHapInfer). We show that this algorithm finds a solution within a factor of O(log n) of the optimal solution, where n is the number of genotypes. This algorithm has been implemented and tested on a variety of simulated and biological data. In comparison with three other methods: (1) HAPAR, which was implemented based on the branching and bound algorithm, (2) HAPLOTYPER, which was implemented based on the Expectation-Maximization algorithm, and (3) PHASE, which combined the Gibbs sampling algorithm with an approximate coalescent prior, the experimental results indicate that SDPHapInfer and HAPLOTYPER have similar error rates. In addition, the results generated by PHASE have lower error rates on some data but higher error rates on others. The error rates of HAPAR are higher than the others on biological data. In ...
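To make the OHI problem statement above concrete, here is a small brute-force sketch. It is not SDPHapInfer or any method from the paper; it simply tries candidate subsets in increasing size, which takes exponential time and is only usable on toy inputs. The string encoding (haplotype sites as "0"/"1", genotype sites as "0"/"1" for homozygous and "2" for heterozygous) and the names `resolves` and `min_resolving_subset` are assumptions chosen for illustration.

```python
from itertools import combinations


def resolves(h1, h2, g):
    """True if the haplotype pair (h1, h2) explains genotype g.

    Assumed encoding: a genotype site "0" or "1" requires both haplotypes
    to carry that allele; a site "2" requires the two haplotypes to differ.
    """
    return all(
        (a == b == s) if s in "01" else (a != b)
        for a, b, s in zip(h1, h2, g)
    )


def min_resolving_subset(genotypes, haplotypes):
    """Smallest subset of haplotypes resolving every genotype, or None.

    Exhaustive search over subsets by increasing size (exponential time).
    A haplotype may pair with itself to resolve a fully homozygous genotype.
    """
    for k in range(1, len(haplotypes) + 1):
        for subset in combinations(haplotypes, k):
            if all(
                any(resolves(h1, h2, g) for h1 in subset for h2 in subset)
                for g in genotypes
            ):
                return list(subset)
    return None


# Toy instance with two SNP sites.
chosen = min_resolving_subset(["02", "21", "22"], ["00", "01", "11", "10"])
```

In this toy instance no two haplotypes suffice, because the three genotypes each need a different pair, so the search settles on three haplotypes.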
Practical algorithms and fixed-parameter tractability for the single individual SNP haplotyping problem
In Proceedings of the 2nd International Workshop on Algorithms in Bioinformatics (WABI)

Cited by 26 (4 self)
Single nucleotide polymorphisms (SNPs) are the most frequent form of human genetic variation, of foremost importance for a variety of applications including medical diagnostics, phylogenies and drug design. The complete SNPs sequence information from each of the two copies of a given chromosome in a diploid genome is called a haplotype. The Haplotyping Problem for a single individual is as follows: Given a set of fragments from one individual’s DNA, find a maximally consistent pair of SNPs haplotypes (one per chromosome copy) by removing data “errors” related to sequencing errors, repeats, and paralogous recruitment. Two versions of the problem, i.e. the Minimum Fragment Removal (MFR) and the Minimum SNP Removal (MSR), are considered. The Haplotyping Problem was introduced in [8], where it was proved that both MSR and MFR are polynomially solvable when each fragment covers a set of consecutive SNPs (i.e., it is a gapless fragment), and NP-hard ...
The Haplotyping Problem: An Overview of Computational Models and Solutions
Journal of Computer Science and Technology, 2003

Cited by 26 (5 self)
The investigation of genetic differences among humans has given evidence that mutations in DNA sequences are responsible for some genetic diseases. The most common mutation is the one that involves only a single nucleotide of the DNA sequence, which is called a single nucleotide polymorphism (SNP). As a consequence, computing a complete map of all SNPs occurring in human populations is one of the primary goals of recent studies in human genomics. The construction of such a map requires determining the DNA sequences that form all chromosomes. In diploid organisms like humans, each chromosome consists of two sequences called haplotypes.
An Overview of Combinatorial Methods for Haplotype Inference
Lecture Notes in Computer Science (2983): Computational Methods for SNPs and Haplotype Inference, 2004

Cited by 23 (2 self)
A current high-priority phase of human genomics involves the development of a full Haplotype Map of the human genome [23]. It will be used in large-scale screens of populations to associate specific haplotypes with specific complex genetic-influenced diseases. A key, perhaps bottleneck, problem is to computationally infer haplotype pairs from genotype data. This paper follows the talk given at the DIMACS Conference on SNPs and Haplotypes held in November of 2002. It reviews several combinatorial approaches to the haplotype inference problem that we have investigated over the last several years. In addition, it updates some of the work presented earlier, and discusses the current state of our work.
1 Introduction to SNP’s, Genotypes and Haplotypes
In diploid organisms (such as humans) there are two (not completely identical) “copies” of each chromosome, and hence of each region of interest.
Computing the Minimum Recombinant Haplotype Configuration from incomplete genotype data on a pedigree by integer linear programming
Journal of Computational Biology, 2005

Cited by 19 (9 self)
We study the problem of reconstructing haplotype configurations from genotypes on pedigree data with missing alleles under the Mendelian law of inheritance and the minimum recombination principle, which is important for the construction of haplotype maps and genetic linkage/association analyses. Our previous results show that the problem of finding a minimum-recombinant haplotype configuration (MRHC) is in general NP-hard. The existing algorithms for MRHC either are heuristic in nature and cannot guarantee optimality, or only work under some restrictions (on e.g. the size and structure of the input pedigree, the number of marker loci, the number of recombinants in the pedigree, etc.). In addition, most of them cannot handle data with missing alleles and, for those that do consider missing data, they usually do not perform well in terms of minimizing the number of recombinants when a significant fraction of alleles are missing. This paper presents an effective integer linear programming (ILP) formulation of the MRHC problem with missing data and a branch-and-bound strategy that utilizes a partial order relationship and some other special relationships among variables to decide the branching order. The partial order relationship is discovered in the preprocessing of constraints by considering ...