Learning in graphical models
, 2004
Cited by 469 (8 self)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing. Key words and phrases: Probabilistic graphical models, junction tree algorithm, sum-product algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, error-control coding.
Hidden Markov models in computational biology: applications to protein modeling
- JOURNAL OF MOLECULAR BIOLOGY
, 1994
Cited by 436 (29 self)
Hidden Markov Models (HMMs) are applied to the problems of statistical modeling, database searching and multiple sequence alignment of protein families and protein domains. These methods are demonstrated on the globin family, the protein kinase catalytic domain, and the EF-hand calcium binding motif. In each case the parameters of an HMM are estimated from a training set of unaligned sequences. After the HMM is built, it is used to obtain a multiple alignment of all the training sequences. It is also used to search the SWISS-PROT 22 database for other sequences that are members of the given protein family, or contain the given domain. The HMM produces multiple alignments of good quality that agree closely with the alignments produced by programs that incorporate three-dimensional structural information. When employed in discrimination tests (by examining how closely the sequences in a database fit the globin, kinase and EF-hand HMMs), the HMM is able to distinguish members of these families from non-members with a high degree of accuracy. Both the HMM and PROFILESEARCH (a technique used to search for relationships between a protein sequence and multiply aligned sequences) perform better in these tests than PROSITE (a dictionary of sites and patterns in proteins). The HMM appears to have a slight advantage over PROFILESEARCH in terms of lower rates of false
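The discrimination test in this abstract scores each database sequence by how closely it fits a trained HMM. A minimal sketch of that idea follows, assuming a toy two-state model over a two-letter alphabet. The state names, all probabilities, and the uniform null model are hypothetical and are not taken from the paper; the forward algorithm itself is the standard one.

```python
import math

# Hypothetical toy HMM: state "M" prefers symbol "A", state "I" is uninformative.
START = {"M": 0.9, "I": 0.1}
TRANS = {"M": {"M": 0.9, "I": 0.1}, "I": {"M": 0.5, "I": 0.5}}
EMIT = {"M": {"A": 0.9, "B": 0.1}, "I": {"A": 0.5, "B": 0.5}}


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(seq, start, trans, emit):
    """Return log P(seq | HMM) using the forward algorithm in log space.

    Assumes a non-empty sequence and strictly positive probabilities.
    """
    states = list(start)
    # alpha[s] = log P(symbols so far, current state = s)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: math.log(emit[s][symbol])
            + log_sum_exp([alpha[r] + math.log(trans[r][s]) for r in states])
            for s in states
        }
    return log_sum_exp(list(alpha.values()))


def log_odds_score(seq, start, trans, emit, alphabet_size):
    """Log-odds of the HMM against a uniform null model (an assumption here)."""
    null = len(seq) * math.log(1.0 / alphabet_size)
    return forward_log_likelihood(seq, start, trans, emit) - null


if __name__ == "__main__":
    for seq in ("AAAA", "BBBB"):
        print(seq, log_odds_score(seq, START, TRANS, EMIT, alphabet_size=2))
```

A positive score means the sequence is better explained by the model than by the null; in this toy setup a run of "A" scores above zero and a run of "B" scores below it. A real search would pick the decision threshold from the score distribution of known non-members.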
Stochastic Context-Free Grammars for Modeling RNA
, 1993
Cited by 36 (4 self)
In this paper, we apply stochastic context-free grammars (SCFGs) to the problems of statistical modeling, database searching, multiple alignment, and prediction of the secondary structure of RNA families. This approach is highly related to our previous work on modeling protein families with HMMs [HKMS93, KBM
Graphical Models for Genetic Analyses
- STATISTICAL SCIENCE
, 2003
Cited by 22 (0 self)
This paper introduces graphical models as a natural environment in which to formulate and solve problems in genetics and related areas. Particular emphasis is given to the relationships among various local computation algorithms which have been developed within the hitherto mostly separate areas of graphical models and genetics. The potential of graphical models is explored and illustrated through a number of example applications where the genetic element is substantial or dominating.
CARTHAGENE: Constructing and Joining Maximum Likelihood Genetic Maps
, 1997
Cited by 14 (3 self)
Genetic mapping is an important step in the study of any organism. An accurate genetic map is extremely valuable for locating genes or, more generally, either qualitative or quantitative trait loci (QTL). This paper presents a new approach to two important problems in genetic mapping: automatically ordering markers to obtain a multipoint maximum likelihood map, and building a multipoint maximum likelihood map using pooled data from several crosses. The approach is embodied in a hybrid algorithm that mixes the statistical optimization algorithm EM with local search techniques developed in the artificial intelligence and operations research communities. An efficient implementation of the EM algorithm provides maximum likelihood recombination fractions, while the local search techniques look for orders that maximize this maximum likelihood. The specificity of the approach lies in the neighborhood structure used in the local search algorithms which has been inspired by an an...
Statistical issues in the search for genes affecting quantitative traits in experimental populations
- Statistical Science
, 1997
Cited by 10 (1 self)
This article reviews key contributions in the area of statistics as applied to the use of molecular marker technology and quantitative genetics in the search for genes affecting quantitative traits responsible for specific diseases and economically important agronomic traits. Since an exhaustive literature review is not possible, the limited scope of this work is to encourage further statistical work in this vast field by first reviewing human and domestic species literature, and then concentrating on the statistical developments for experimental breeding populations. Substantial gains have been made over the years by both plant and animal breeders toward a long-term goal of locating genes affecting quantitative traits (quantitative trait loci, QTLs) for the eventual characterization and manipulation of these genes in order to develop improved agronomically important traits. Our main concern is that the care and expense that are required in generating both genetic marker data and quantitative trait data should be accompanied by equal care in the statistical analysis of the data. Through an example using an F2 male genetic map of mouse chromosome 10, and quantitative trait values measured on weight gain, we implement much of the reviewed methodology for the purpose of detecting or locating a QTL having an effect on weight gain. Key words and phrases: Interval mapping; interval testing; multiple markers; mixture distribution; QTL; single markers.
The Application of Stochastic Context-Free Grammars to Folding, Aligning and Modeling Homologous RNA Sequences
, 1993
Cited by 9 (2 self)
Stochastic context-free grammars (SCFGs) are applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. The novel aspect of this work is that SCFG parameters are learned automatically from unaligned, unfolded training sequences. A generalization of the HMM forward-backward algorithm is introduced to do this. The new algorithm, Tree-Grammar EM, based on tree grammars and faster than the previously proposed SCFG inside-outside training algorithm, produced a model that we tested on the transfer RNA (tRNA) family. Results show that after having been trained on as few as 20 tRNA sequences from only two tRNA subfamilies (mitochondrial and cytoplasmic), the model can discern general tRNA from similar-length RNA sequences of other kinds, can find secondary structure of new tRNA sequences, and c...
On the complexity of fundamental computational problems in pedigree analysis
- Computer Science Department, University of California, Davis
, 1999
Cited by 6 (0 self)
Computational Methods for Complex Stochastic Systems: A Review of Some Alternatives to MCMC
Cited by 6 (2 self)
We consider analysis of complex stochastic models based upon partial information. MCMC and reversible jump MCMC are often the methods of choice for such problems, but in some situations they can be difficult to implement and can suffer from problems such as poor mixing and difficulty in diagnosing convergence. Here we review three alternatives to MCMC methods: importance sampling, the forward-backward algorithm, and sequential Monte Carlo (SMC). We discuss how to design good proposal densities for importance sampling, show some of the range of models for which the forward-backward algorithm can be applied, and show how resampling ideas from SMC can be used to improve the efficiency of the other two methods. We demonstrate these methods on a range of examples, including estimating the transition density of a diffusion and of a discrete-state continuous-time Markov chain; inferring structure in population genetics; and segmenting genetic divergence data.
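The role of the proposal density in importance sampling can be illustrated with a small, self-contained sketch. It is not drawn from the paper: the target (a standard normal tail probability), the shifted-normal proposal, and the function names are all assumptions chosen so that the exact answer is available for comparison.

```python
import math
import random


def tail_prob_importance_sampling(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the proposal N(threshold, 1), which places about
    half of its mass in the region of interest. Each sample is weighted by
    target density / proposal density; the normalising constants cancel.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            log_weight = -0.5 * x * x + 0.5 * (x - threshold) ** 2
            total += math.exp(log_weight)
    return total / n_samples


def tail_prob_exact(threshold):
    """Exact standard normal tail probability via the complementary error function."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    print("importance sampling:", tail_prob_importance_sampling(4.0, 50_000))
    print("exact:              ", tail_prob_exact(4.0))
```

Sampling directly from N(0, 1) would rarely produce a draw above 4, so a naive Monte Carlo estimate with the same budget would usually be zero or very noisy. Shifting the proposal toward the rare region is one simple instance of the design question the abstract raises.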

