## BMC Bioinformatics BioMed Central Methodology article (2006)

### BibTeX

@MISC{Eronen06bmcbioinformatics,

author = {Lauri Eronen and Floris Geerts and Hannu Toivonen},

title = {HaploRec: efficient and accurate large-scale reconstruction of haplotypes},

year = {2006}

}

### Abstract

HaploRec: efficient and accurate large-scale reconstruction of haplotypes

### Citations

524 | A new statistical method for haplotype reconstruction from population data
- Stephens, Smith, et al.
- 2001
Citation Context ... of Clayton [19] is also based on the EM algorithm, but uses a sequential pruning strategy; HaploRec also uses a similar pruning approach. The PL strategy is also used in the current version of Phase [7,8]. PL-EM and Snphap are based on a multinomial haplotype probability model with a uniform Dirichlet prior. Phase, however, uses a prior distribution based on coalescent theory (see [20] for a review) a...

232 | Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population
- Excoffier, Slatkin
- 1995
Citation Context ...loRec are novel, some of the algorithmic principles are similar to earlier work. HaploRec follows a likelihood-based expectation-maximization (EM) haplotype inference strategy which was introduced in [4]. PL-EM [5,6] overcomes the computational complexity of the basic EM approach by using a pruning strategy on the possible haplotype resolutions, called Partition-Ligation (PL). The Snphap algorithm ...

198 | Generating Samples under the Wright-Fisher neutral model of genetic variation
- Hudson
- 2002
Citation Context ...t [23] is used to validate the general observations from simulations, and to test the method in slightly different settings, especially with small sample sizes. We used Hudson's coalescence simulator [24] to simulate data sets of 1000 genotypes. Our simulated settings range from 5 to 500 markers, with average marker spacings between 6.6 and 166 kb and map lengths between 166 kb and 16.6 Mb. These mark...

161 | A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase
- Scheet, Stephens
- 2006
Citation Context ... universal block boundaries across the whole population, our model averages over all possible segmentations for each haplotype, without any fixed block boundaries. Quite recently, Scheet and Stephens [21] introduced fastPhase, which models the population with a set of founder haplotypes (or clusters); the cluster memberships are allowed to change continuously along the chromosome, according to a hidde...

160 | Bayesian haplotype inference for multiple linked single-nucleotide polymorphisms
- Niu, Qin, et al.
- 2002
Citation Context ...ovel, some of the algorithmic principles are similar to earlier work. HaploRec follows a likelihood-based expectation-maximization (EM) haplotype inference strategy which was introduced in [4]. PL-EM [5,6] overcomes the computational complexity of the basic EM approach by using a pruning strategy on the possible haplotype resolutions, called Partition-Ligation (PL). The Snphap algorithm of Clayton [1...

76 | The power of amnesia
- Ron, Singer, et al.
- 1994
Citation Context ...ov model only uses a simple frequency threshold to determine the context lengths. The set of contexts could be pruned further using the accuracy of predicting the next allele as a selection criterion [30]. Another possibility for refinement is to smooth the probability over several context lengths simultaneously [31]. Ideas from the models of Phase and Gerbil could be used to better account for mutati...

74 | Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation
- Stephens, Scheet
Citation Context ... of Clayton [19] is also based on the EM algorithm, but uses a sequential pruning strategy; HaploRec also uses a similar pruning approach. The PL strategy is also used in the current version of Phase [7,8]. PL-EM and Snphap are based on a multinomial haplotype probability model with a uniform Dirichlet prior. Phase, however, uses a prior distribution based on coalescent theory (see [20] for a review) a...

69 | High-resolution haplotype structure in the human genome. Nat Genet
- Daly, Rioux, et al.
Citation Context ...notypes). Still, HaploRec is very competitive, even with its default parameter values chosen based on the simulated, larger data sets with sparser marker spacing. Experiments with the dense Daly data [26] (results not shown) indicate the same: among the tested methods, HaploRec was second only to Phase and fastPhase while being fastest of all methods. Future experiments with large real data sets, as P...

56 | On prediction using variable order Markov models
- Begleiter, El-Yaniv, et al.
- 2004
Citation Context ...Markov chain is thus individually adjusted for each position and each haplotype. Although both fixed and variable-order Markov models have been extensively studied and used for many applications (see [34] for a review of variable-order Markov models), we are not aware of any previous applications to haplotype reconstruction. There is also a subtle difference between these models and typical applicatio...

55 | Partition-ligation-expectation-maximization algorithm for haplotype inference with single-nucleotide polymorphisms
- Niu, Liu
Citation Context ...ovel, some of the algorithmic principles are similar to earlier work. HaploRec follows a likelihood-based expectation-maximization (EM) haplotype inference strategy which was introduced in [4]. PL-EM [5,6] overcomes the computational complexity of the basic EM approach by using a pruning strategy on the possible haplotype resolutions, called Partition-Ligation (PL). The Snphap algorithm of Clayton [1...

49 | Model-based inference of haplotype block variation
- Greenspan, Geiger
- 2003
Citation Context ...els scale naturally to the long and sparse marker maps often used in gene mapping. Our segmentation-based model bears some resemblance to methods which combine haplotype block finding and haplotyping [9,10]. However, whereas these models place universal block boundaries across the whole population, our model averages over all possible segmentations for each haplotype, without any fixed block boundaries....

46 | GERBIL: Genotype resolution and block identification using likelihood
- Kimmel, Shamir
- 2004
Citation Context ...els scale naturally to the long and sparse marker maps often used in gene mapping. Our segmentation-based model bears some resemblance to methods which combine haplotype block finding and haplotyping [9,10]. However, whereas these models place universal block boundaries across the whole population, our model averages over all possible segmentations for each haplotype, without any fixed block boundaries....

43 | Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 2005, 6:109–118
- Wang, Barratt, et al.
Citation Context ...ssary compromises when choosing the window size, and also close to linear in the number of genotypes to allow sample sizes of hundreds to thousands of individuals, as required by association analysis [18]. HaploRec produces accurate haplotype reconstructions, and scales to long marker maps (or windows) that span long genetic regions. While the statistical models of HaploRec are novel, some of the algo...

38 | Inference of haplotypes from PCR-amplified samples of diploid populations. Mol Biol Evol 7:111–122
- Clark
- 1990
Citation Context ...ss. We propose the following solution. When we estimate the probability of a haplotype H and consider the variable-order Markovian distribution at marker i, we find the longest observed fragment that (1) matches haplotype H and ends at marker i - 1, and (2) has a frequency exceeding some given threshold, minfr. The fragments whose frequency does not exceed this threshold are considered uninformative....

30 | A linear-time algorithm for the perfect phylogeny haplotyping problem
- Ding, Filkov, et al.
- 2005

26 | Haplotypes vs single marker linkage disequilibrium tests: what do we gain
- Akey
- 2001
Citation Context ...he disease and the population [18,32]. However, many methods for association-based gene mapping assume haplotype data. It has been shown, too, that haplotypes can be more powerful than single markers [33]. We presented models and methods for statistical haplotype reconstruction from genotypes of unrelated individuals, and specifically targeted large and sparse data sets, such as those needed in chromo...

9 | Little loss of information due to unknown phase for fine-scale linkage-disequilibrium mapping with single-nucleotide-polymorphism genotype data
- Morris, Whittaker, et al.
- 2004
Citation Context ...ruction tends to exaggerate linkage disequilibrium since haplotyping methods more or less directly aim to maximize it. It has been shown that estimated haplotypes can, indeed, lead to false positives [27,28]. On the other hand, this does not always have to be the case. A simulation study shows that in association analysis, the haplotypes produced by HaploRec can be equally powerful to the true haplotypes...

8 | SNPHAP - A program for estimating frequencies of large haplotypes of SNPs
- Clayton
- 2006
Citation Context ...6] overcomes the computational complexity of the basic EM approach by using a pruning strategy on the possible haplotype resolutions, called Partition-Ligation (PL). The Snphap algorithm of Clayton [19] is also based on the EM algorithm, but uses a sequential pruning strategy; HaploRec also uses a similar pruning approach. The PL strategy is also used in the current version of Phase [7,8]. PL-EM and...

7 | A survey of computational methods for determining haplotypes
- Halldórsson, Bafna, ..., Yooseph, Istrail

6 | Estimated haplotype counts from case-control samples cannot be treated as observed counts. Am J Hum Genet 78(4), 729–30; author reply 728–9
- Curtis, Sham
- 2006
Citation Context ...ruction tends to exaggerate linkage disequilibrium since haplotyping methods more or less directly aim to maximize it. It has been shown that estimated haplotypes can, indeed, lead to false positives [27,28]. On the other hand, this does not always have to be the case. A simulation study shows that in association analysis, the haplotypes produced by HaploRec can be equally powerful to the true haplotypes...

5 | A comprehensive literature review of haplotyping software and methods for use with unrelated individuals. Human Genomics 2005
- Salem, Wessel, Schork

5 | Gene genealogies and the coalescent process. In Oxford surveys in evolutionary biology, Volume 7. Edited by: Futuyma D, Antonovics J
- Hudson
- 1991
Citation Context ...rsion of Phase [7,8]. PL-EM and Snphap are based on a multinomial haplotype probability model with a uniform Dirichlet prior. Phase, however, uses a prior distribution based on coalescent theory (see [20] for a review) and uses Bayesian inference implemented with Gibbs sampling instead of EM. The underlying idea for our statistical models for haplotypes is that we derive an overall probability for a h...

5 | Improved smoothing for probabilistic suffix trees seen as variable order Markov chains
- Kermorvant, Dupont
- 2002
Citation Context ...ned further using the accuracy of predicting the next allele as a selection criterion [30]. Another possibility for refinement is to smooth the probability over several context lengths simultaneously [31]. Ideas from the models of Phase and Gerbil could be used to better account for mutations and genotyping errors. A possible approach would be to allow for a small number of mismatches between the hapl...

4 | A note on phasing long genomic regions using local haplotype predictions. Journal of bioinformatics and computational biology 2006, 4(3):639-648
- Eskin, Sharan, et al.
Citation Context ...mputationally haplotyping a long map is to first divide the map to small, overlapping windows, to reconstruct the haplotypes in each window separately, and then to combine haplotypes from the windows [16]. HaploRec is aimed to have the following important properties. First, increasing the window size should give relatively more accurate results since large windows contain more information, i.e., addin...

3 | Long-range polony haplotyping of individual human chromosome molecules. Nature genetics 2006
- Zhang, Zhu, ..., Aach, Mitra, Church
Citation Context ... proposed algorithm HaploRec, do [4-10]. For a review of these and other haplotyping methods we refer to [11-13]. Laboratory techniques are being developed for direct molecular haplotyping (see, e.g., [14,15]), but these techniques are not mature yet, and are currently time consuming and expensive. BMC Bioinformatics 2006, 7:542 http://www.biomedcentral...

3 | A Markov chain approach to reconstruction of long haplotypes
- Eronen, Geerts, Toivonen
Citation Context ...haplotype probability models based on this idea; full specifications are given in the Methods section. We introduced the ideas for two of them (the Markovian models) in a preliminary conference paper [22], one (the segmentation model) is completely novel. The actual haplotyping algorithm has also been greatly improved, resulting in significantly more accurate results and reduced running times, while s...

2 | Computation of haplotypes on SNPs subsets: advantage of the "global method"
- Coulonges, Delaneau, ..., Zagury
Citation Context ...ndows contain more information, i.e., adding markers should improve accuracy (in the phases between the markers that already were there), whether the new markers are added between the old ones or not [17]. Second, the time complexity of the algorithm should be close to linear in the number of markers, in order to avoid unnecessary compromises when choosing the window size, and also close to linear in ...

2 | An empirical comparison of case-control and trio based study designs in high throughput association mapping
- Hintsanen, Sevon, Onkamo, Eronen, Toivonen
Citation Context ... have to be the case. A simulation study shows that in association analysis, the haplotypes produced by HaploRec can be equally powerful to the true haplotypes, despite some inevitable phasing errors [29]. The locality of the statistical models has a subtle role here: in the case-control settings normally used in association studies, linkage disequilibrium is increased in the cases in the vicinity of ...

1 | Jiricny CR, Boland, Lynch HT, Chadwick RB, de la Chapelle
- Yan, Papadopoulos, et al.
Citation Context ... proposed algorithm HaploRec, do [4-10]. For a review of these and other haplotyping methods we refer to [11-13]. Laboratory techniques are being developed for direct molecular haplotyping (see, e.g., [14,15]), but these techniques are not mature yet, and are currently time consuming and expensive. ...