Learning in graphical models
2004
Cited by 469 (8 self)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing.
Key words and phrases: Probabilistic graphical models, junction tree algorithm, sum-product algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, error-control coding.
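To make the sum-product algorithm concrete, here is a minimal sketch for its simplest setting: a chain of discrete variables with pairwise potentials, where forward and backward messages give exact single-variable marginals. The chain restriction and the function names are assumptions for illustration; general graphs need the junction tree algorithm or approximations such as MCMC or variational inference. The brute-force enumerator exists only to check the messages on a small example.

```python
import itertools

import numpy as np


def chain_marginals(unary, pairwise):
    """Exact single-variable marginals of a chain model via sum-product.

    unary[i]    : length-K nonnegative potential for variable i
    pairwise[i] : K x K potential, entry [a, b] couples x_i = a and x_{i+1} = b
    """
    n = len(unary)
    # fwd[i] is the message arriving at variable i from the left
    fwd = [np.ones(len(unary[0]))]
    for i in range(n - 1):
        m = (fwd[i] * unary[i]) @ pairwise[i]
        fwd.append(m / m.sum())  # rescale to avoid underflow on long chains
    # bwd[i] is the message arriving at variable i from the right
    bwd = [np.ones(len(unary[0])) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        m = pairwise[i] @ (bwd[i + 1] * unary[i + 1])
        bwd[i] = m / m.sum()
    beliefs = [f * u * b for f, u, b in zip(fwd, unary, bwd)]
    return np.array([p / p.sum() for p in beliefs])


def brute_force_marginals(unary, pairwise):
    """Reference answer by summing over all K**n joint assignments."""
    n, k = len(unary), len(unary[0])
    marg = np.zeros((n, k))
    for x in itertools.product(range(k), repeat=n):
        w = np.prod([unary[i][x[i]] for i in range(n)])
        w *= np.prod([pairwise[i][x[i], x[i + 1]] for i in range(n - 1)])
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
unary = [rng.random(3) + 0.1 for _ in range(5)]
pairwise = [rng.random((3, 3)) + 0.1 for _ in range(4)]
print(np.abs(chain_marginals(unary, pairwise) - brute_force_marginals(unary, pairwise)).max())
```

Message passing costs O(nK^2) here, against O(K^n) for enumeration, which is the gap that makes large graphical models tractable.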
Optimizing Exact Genetic Linkage Computations
2003
Cited by 23 (2 self)
Genetic linkage analysis is a challenging application that requires Bayesian networks with thousands of vertices. Consequently, computing the likelihood of the data, which is needed for learning linkage parameters, with exact inference procedures calls for an extremely efficient implementation that carefully optimizes the order of conditioning and summation operations. In this paper we use stochastic greedy algorithms to optimize this order. Our algorithm has been incorporated into the newest version of SUPERLINK, which is currently the fastest genetic linkage program for exact likelihood computations in general pedigrees. We demonstrate an order-of-magnitude improvement in the run times of likelihood computations using our new optimization algorithm, and hence enlarge the class of problems that can be handled effectively by exact computations.
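The idea of searching over elimination orders with a randomized greedy rule can be sketched as follows. This is a hypothetical illustration, not SUPERLINK's implementation. It scores an order by the total size of the intermediate tables created during variable elimination and picks the next variable uniformly among those whose table size is within a `tolerance` factor of the cheapest. It keeps the best order found over several restarts. Only the summation order is modeled; the conditioning decisions mentioned in the abstract are not.

```python
import math
import random


def _eliminate(graph, v):
    """Remove v from an adjacency dict, connecting its neighbours (fill-in edges)."""
    nbrs = graph.pop(v)
    for a in nbrs:
        graph[a].discard(v)
        graph[a] |= nbrs - {a}


def elimination_cost(graph, domain, order):
    """Sum of table sizes prod(|D_u|) over the clique {v} | nbrs(v) formed at each step."""
    g = {v: set(nb) for v, nb in graph.items()}
    total = 0
    for v in order:
        total += math.prod(domain[u] for u in g[v] | {v})
        _eliminate(g, v)
    return total


def stochastic_greedy_order(graph, domain, iterations=50, tolerance=1.5, seed=None):
    """Randomized greedy search for a cheap elimination order (illustrative sketch)."""
    rng = random.Random(seed)
    best_order, best_cost = None, math.inf
    for _ in range(iterations):
        g = {v: set(nb) for v, nb in graph.items()}
        order = []
        while g:
            weight = {v: math.prod(domain[u] for u in g[v] | {v}) for v in g}
            cheapest = min(weight.values())
            # choose at random among near-cheapest candidates instead of always the minimum
            candidates = [v for v, w in weight.items() if w <= tolerance * cheapest]
            v = rng.choice(candidates)
            order.append(v)
            _eliminate(g, v)
        cost = elimination_cost(graph, domain, order)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


star = {"c": {"a", "b", "d", "e"}, "a": {"c"}, "b": {"c"}, "d": {"c"}, "e": {"c"}}
domain = dict.fromkeys(star, 2)
order, cost = stochastic_greedy_order(star, domain, seed=0)
print(order, cost)
```

On this toy star graph of binary variables, eliminating the hub first costs 62 table entries, whereas eliminating leaves first costs 18. The randomization matters on larger networks, where the deterministic greedy choice can lock into a poor order.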
Maximum Likelihood Haplotyping for General Pedigrees
2004
Cited by 22 (2 self)
Haplotype data is valuable in mapping disease-susceptibility genes in the study of Mendelian and complex diseases. We present algorithms for inferring a most likely haplotype configuration for general pedigrees, implemented in the newest version of the genetic linkage analysis system SUPERLINK. In SUPERLINK, genetic linkage analysis problems are represented internally using Bayesian networks. The use of Bayesian networks enables efficient maximum likelihood haplotyping for more complex pedigrees than was previously possible. Furthermore, to support efficient haplotyping for larger pedigrees, we have also incorporated a novel algorithm for determining a better elimination order for the variables of the Bayesian network. The same ordering algorithm also speeds up likelihood computations. We present experimental results for the new algorithms on a variety of real and semi-artificial data sets, and use our software to evaluate MCMC approximations for haplotyping.
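Finding a most likely configuration replaces the sums of likelihood computation with maximizations (max-product, or Viterbi on a chain). A minimal sketch on a chain-structured model is shown below, assuming pairwise potentials and log-space arithmetic. SUPERLINK applies the idea to general pedigree Bayesian networks, so the chain restriction and the function names here are illustrative assumptions.

```python
import itertools

import numpy as np


def chain_map(unary, pairwise):
    """Most probable joint assignment of a chain model via max-product (Viterbi)."""
    log_u = [np.log(u) for u in unary]
    log_p = [np.log(p) for p in pairwise]
    score = log_u[0]  # best log-score of any prefix ending in each state of x_0
    back = []
    for i in range(len(unary) - 1):
        # cand[a, b]: best prefix ending with x_i = a, extended to x_{i+1} = b
        cand = score[:, None] + log_p[i]
        back.append(cand.argmax(axis=0))
        score = cand.max(axis=0) + log_u[i + 1]
    # trace back-pointers from the best final state
    x = [int(score.argmax())]
    for ptr in reversed(back):
        x.append(int(ptr[x[-1]]))
    return x[::-1], float(score.max())


def brute_force_map(unary, pairwise):
    """Reference answer by scoring every joint assignment."""
    n, k = len(unary), len(unary[0])
    best_x, best = None, -np.inf
    for x in itertools.product(range(k), repeat=n):
        s = sum(np.log(unary[i][x[i]]) for i in range(n))
        s += sum(np.log(pairwise[i][x[i], x[i + 1]]) for i in range(n - 1))
        if s > best:
            best_x, best = list(x), float(s)
    return best_x, best


rng = np.random.default_rng(1)
unary = [rng.random(3) + 0.1 for _ in range(6)]
pairwise = [rng.random((3, 3)) + 0.1 for _ in range(5)]
print(chain_map(unary, pairwise))
```

Working in log space avoids the underflow that products of many small probabilities would cause on large pedigrees.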
Multilocus linkage analysis by blocked Gibbs sampling
Statistics and Computing, 2000
Cited by 8 (0 self)
The problem of multilocus linkage analysis is expressed as a graphical model, making explicit a previously implicit connection, and recent developments in the field are described in this context. A novel application of blocked Gibbs sampling for Bayesian networks is developed to generate inheritance matrices from an irreducible Markov chain. This is used as the basis for reconstruction of historical meiotic states and approximate calculation of the likelihood function for the location of an unmapped genetic trait. We believe this to be the only approach that currently makes fully informative multilocus linkage analysis possible on large extended pedigrees.
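Blocked Gibbs sampling updates a group of variables jointly from their exact conditional distribution, which helps when single-site updates mix slowly. The sketch below uses a toy model of binary variables on a cycle with a shared coupling, not the inheritance matrices of the paper. The model, the block choice, and the function names are assumptions for illustration. Each block is sampled by enumerating its configurations, which is feasible only for small blocks.

```python
import itertools

import numpy as np


def joint_weight(x, unary, coupling, edges):
    """Unnormalized probability: unary terms times `coupling` for each agreeing edge."""
    w = float(np.prod([unary[i][x[i]] for i in range(len(x))]))
    for i, j in edges:
        if x[i] == x[j]:
            w *= coupling
    return w


def blocked_gibbs(unary, coupling, edges, blocks, n_sweeps, seed=0):
    """Each sweep resamples every block jointly from its exact conditional."""
    rng = np.random.default_rng(seed)
    x = [0] * len(unary)
    samples = []
    for _ in range(n_sweeps):
        for block in blocks:
            configs = list(itertools.product((0, 1), repeat=len(block)))
            weights = []
            for c in configs:
                for v, val in zip(block, c):
                    x[v] = val
                # conditional is proportional to the joint with the other variables fixed
                weights.append(joint_weight(x, unary, coupling, edges))
            probs = np.array(weights) / sum(weights)
            chosen = configs[rng.choice(len(configs), p=probs)]
            for v, val in zip(block, chosen):
                x[v] = val
        samples.append(list(x))
    return np.array(samples)


def exact_marginals(unary, coupling, edges):
    """P(x_i = 1) for every i, by enumerating all 2**n states."""
    n = len(unary)
    total, p1 = 0.0, np.zeros(n)
    for x in itertools.product((0, 1), repeat=n):
        w = joint_weight(x, unary, coupling, edges)
        total += w
        p1 += w * np.array(x)
    return p1 / total


unary = [np.array([1.0, 2.0]), np.array([1.0, 0.5]), np.array([1.0, 1.0]), np.array([2.0, 1.0])]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
blocks = [[0, 1], [2, 3]]
samples = blocked_gibbs(unary, 3.0, edges, blocks, n_sweeps=20000, seed=1)
print(samples[1000:].mean(axis=0), exact_marginals(unary, 3.0, edges))
```

After discarding a burn-in, the empirical frequencies approximate the exact marginals; in pedigree applications the same estimates stand in for quantities too expensive to enumerate.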
Likelihood Computations Using Value Abstraction
In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, 2000
Cited by 7 (2 self)
In this paper, we use evidence-specific value abstraction to speed up Bayesian network inference.
Rapid Multipoint Linkage Analysis of Recessive Traits in Nuclear Families, including Homozygosity Mapping
Cited by 5 (0 self)
... this paper allows very rapid multipoint likelihood calculation in nuclear families (with or without parental consanguinity), and the accompanying software package makes multipoint mapping feasible in many experimental contexts.
Acknowledgments
We thank David Botstein and Michele Gschwend for many discussions concerning homozygosity mapping and for sharing unpublished data. We thank Daniel Kastner and his colleagues for sharing the pedigree and genotype data from their FMF studies. We thank Robert Elston, Michael Boehnke, Augustine Kong, and an anonymous referee for comments on the manuscript. This work was supported in part by a grant from the National Institutes of Health (HG00098) to E.S.L.
Appendix: Description of algorithm
Consider a fixed map of $M$ ordered marker loci with known recombination fractions $\theta_i$ between loci $i$ and $i+1$. We wish to compute the likelihood for a given pedigree. Following Lander and Green (1987), the inheritance pattern at each locus $i$ ($i = 1, 2, \ldots, M$) can be described by an $n$-bit vector $v_i$. Each bit describes the outcome of one of the $n$ meioses in the pedigree: the bit is 0 if the paternally derived allele is transmitted and 1 if the maternally derived allele is transmitted. The set of all possible $n$-bit vectors will be identified with $\mathbb{Z}_2^n$ ...
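Under the Lander-Green model, assuming no crossover interference, each of the $n$ meioses recombines independently between loci $i$ and $i+1$ with probability $\theta_i$. The transition probability from inheritance vector $v$ to $w$ is therefore $\theta_i^{d(v,w)}(1-\theta_i)^{n-d(v,w)}$, where $d$ is the Hamming distance. The sketch below builds this matrix densely, applies it in $O(n2^n)$ time by exploiting its Kronecker-product structure, and runs the forward recursion of the resulting hidden Markov model. This is one standard way to obtain such a speed-up and is not claimed to be the derivation in the (truncated) appendix. The per-locus emission likelihoods and function names are hypothetical inputs.

```python
import numpy as np


def transition_matrix(n, theta):
    """Dense 2^n x 2^n matrix P[v, w] = theta^d (1 - theta)^(n - d), d = Hamming(v, w)."""
    vecs = np.arange(2 ** n)
    xor = vecs[:, None] ^ vecs[None, :]
    d = sum((xor >> k) & 1 for k in range(n))  # number of meioses that switch
    return theta ** d * (1 - theta) ** (n - d)


def apply_transition(p, n, theta):
    """Same product as p @ transition_matrix(n, theta), in O(n 2^n) time.

    The matrix is the Kronecker product of n copies of one 2x2 matrix
    (one per meiosis), so it can be applied one bit-axis at a time.
    """
    t = np.array([[1 - theta, theta], [theta, 1 - theta]])
    q = p.reshape((2,) * n)
    for axis in range(n):
        q = np.moveaxis(np.tensordot(t, q, axes=([1], [axis])), 0, axis)
    return q.reshape(-1)


def multipoint_likelihood(emissions, thetas, n):
    """Forward recursion of the Lander-Green HMM.

    emissions[i][v] : P(marker data at locus i | inheritance vector v), hypothetical input
    thetas[i]       : recombination fraction between loci i and i + 1
    A uniform prior over the 2^n inheritance vectors is assumed at the first locus.
    """
    alpha = np.full(2 ** n, 2.0 ** -n) * emissions[0]
    for i in range(1, len(emissions)):
        alpha = apply_transition(alpha, n, thetas[i - 1]) * emissions[i]
    return float(alpha.sum())


rng = np.random.default_rng(2)
n = 4
p = rng.random(2 ** n)
print(np.allclose(apply_transition(p, n, 0.1), p @ transition_matrix(n, 0.1)))
emissions = [rng.random(2 ** n) for _ in range(3)]
print(multipoint_likelihood(emissions, [0.1, 0.2], n))
```

The dense matrix needs $4^n$ entries per step, while the factored update touches each of the $2^n$ states only $n$ times, which is what keeps nuclear-family pedigrees with many meioses tractable.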
Online system for faster multipoint linkage analysis via parallel execution on thousands of personal computers
American Journal of Human Genetics
Cited by 3 (1 self)
Computation of LOD scores is a valuable tool for mapping disease-susceptibility genes in the study of Mendelian and complex diseases. However, computation of exact multipoint likelihoods of large inbred pedigrees with extensive missing data is often beyond the capabilities of a single computer. We present a distributed system called “SUPERLINK-ONLINE” for the computation of multipoint LOD scores of large inbred pedigrees. It achieves high performance via the efficient parallelization of the algorithms in SUPERLINK, a state-of-the-art serial program for these tasks, and through the use of the idle cycles of thousands of personal computers. The main algorithmic challenge has been to efficiently split a large task for distributed execution in a highly dynamic, nondedicated running environment. Notably, the system is available online, which allows computationally intensive analyses to be performed with no need for either the installation of software or the maintenance of a complicated distributed environment. As the system was being developed, it was extensively tested by collaborating medical centers worldwide on a variety of real data sets, some of which are presented in this article. Computation of the LOD score, defined as $\log_{10}(L_{H_A}/L_{H_0})$, where $L_{H_0}$ ...
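The LOD score definition can be illustrated with the textbook two-point case. The sketch assumes phase-known, fully informative meioses, so that with $k$ recombinants among $N$ meioses $L_{H_A}(\theta) = \theta^k(1-\theta)^{N-k}$ and $L_{H_0} = 0.5^N$ (free recombination). This toy likelihood is an assumption for illustration, not SUPERLINK-ONLINE's multipoint computation, and the function names are hypothetical.

```python
import math


def lod_score(likelihood_ha, likelihood_h0):
    """LOD = log10(L_HA / L_H0); -inf if the alternative likelihood is zero."""
    if likelihood_ha == 0:
        return -math.inf
    return math.log10(likelihood_ha / likelihood_h0)


def two_point_lod(recombinants, meioses, theta):
    """Toy phase-known two-point model: L_HA = theta^k (1-theta)^(N-k), L_H0 = 0.5^N."""
    l_ha = theta ** recombinants * (1 - theta) ** (meioses - recombinants)
    l_h0 = 0.5 ** meioses
    return lod_score(l_ha, l_h0)


# Maximize over a grid of recombination fractions in [0, 0.5]
grid = [i / 100 for i in range(51)]
best_theta = max(grid, key=lambda t: two_point_lod(2, 10, t))
print(best_theta, two_point_lod(2, 10, best_theta))
```

For 10 nonrecombinant meioses the maximum LOD is $10\log_{10}2 \approx 3.01$, and with 2 recombinants in 10 the grid maximum falls at $\hat\theta = k/N = 0.2$. Multipoint LODs on large inbred pedigrees follow the same ratio, but each likelihood requires the expensive exact inference that the distributed system parallelizes.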
The Additive Genetic Gamma Frailty Model for Linkage Analysis of Age-of-Onset Variation
Cited by 3 (0 self)
... This paper extends the gamma frailty model by incorporating inheritance vector information and provides a semiparametric approach to linkage testing. For a given inheritance vector at the putative disease locus, we construct an additive genetic gamma frailty for each individual within a nuclear family and use the Cox proportional hazards model to model age of onset. We derive the conditional hazard ratio parameter for sib pairs and define a likelihood-ratio-based LOD score statistic under our model. The EM algorithm is used to obtain the parameter estimates and the maximized likelihoods. Simulated data sets are used to illustrate these new statistical methods.
Speeding up HMM algorithms for genetic linkage analysis via chain reductions of the state space
Bioinformatics, 2009
"... ..."

