Sequential Monte Carlo Methods for Dynamic Systems
- Journal of the American Statistical Association, 1998
Cited by 340 (4 self)
A general framework for using Monte Carlo methods in dynamic systems is provided and its wide applications indicated. Under this framework, several currently available techniques are studied and generalized to accommodate more complex features. All of these methods are partial combinations of three ingredients: importance sampling and resampling, rejection sampling, and Markov chain iterations. We deliver a guideline on how they should be used and under what circumstance each method is most suitable. Through the analysis of differences and connections, we consolidate these methods into a generic algorithm by combining desirable features. In addition, we propose a general use of Rao-Blackwellization to improve performances. Examples from econometrics and engineering are presented to demonstrate the importance of Rao-Blackwellization and to compare different Monte Carlo procedures. Keywords: Blind deconvolution; Bootstrap filter; Gibbs sampling; Hidden Markov model; Kalman filter; Markov...
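The "importance sampling and resampling" ingredient named in this abstract can be illustrated with a bootstrap filter. The following is only a minimal sketch under stated assumptions, not the generic algorithm of the paper: the model (a one-dimensional Gaussian random-walk state observed with Gaussian noise), the function name `bootstrap_filter`, and all parameter names and defaults are hypothetical choices made for illustration.

```python
import math
import random


def bootstrap_filter(observations, n_particles=500, state_sd=1.0, obs_sd=1.0, seed=0):
    """Return the filtered state mean after each observation.

    Assumed toy model (not taken from the paper):
        x_t = x_{t-1} + N(0, state_sd^2),   y_t = x_t + N(0, obs_sd^2)
    """
    rng = random.Random(seed)
    # Initial particles drawn from an assumed N(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in observations:
        # 1. Propagate each particle through the state equation.
        particles = [x + rng.gauss(0.0, state_sd) for x in particles]
        # 2. Importance weights: Gaussian likelihood of y given each particle
        #    (the normalising constant cancels after normalisation).
        weights = [math.exp(-0.5 * ((y - x) / obs_sd) ** 2) for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        weights = [w / total for w in weights]
        # Weighted mean is the estimate of E[x_t | y_1..y_t].
        means.append(sum(w * x for w, x in zip(weights, particles)))
        # 3. Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return means
```

Calling `bootstrap_filter([5.0] * 30)` returns 30 filtered means; with a constant observation the estimates should settle close to the observed value. Resampling at every step is the simplest choice; resampling only when the weights become very uneven is a common alternative.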
Bayesian Methods for Hidden Markov Models -- Recursive Computing in the 21st Century
- Journal of the American Statistical Association, 2002
Cited by 52 (8 self)
Markov chain Monte Carlo (MCMC) sampling strategies can be used to simulate hidden Markov model (HMM) parameters from their posterior distribution given observed data. Some MCMC methods (for computing likelihood, conditional probabilities of hidden states, and the most likely sequence of states) used in practice can be improved by incorporating established recursive algorithms. The most important is a set of forward-backward recursions calculating conditional distributions of the hidden states given observed data and model parameters. We show how to use the recursive algorithms in an MCMC context and demonstrate mathematical and empirical results showing a Gibbs sampler using the forward-backward recursions mixes more rapidly than another sampler often used for HMM's. We introduce an augmented variables technique for obtaining unique state labels in HMM's and finite mixture models. We show how recursive computing allows statistically efficient use of MCMC output when estimating the hidden states. We directly calculate the posterior distribution of the hidden chain's state space size by MCMC, circumventing asymptotic arguments underlying the Bayesian information criterion, which is shown to be inappropriate for a frequently analyzed data set in the HMM literature. The use of log-likelihood for assessing MCMC convergence is illustrated, and posterior predictive checks are used to investigate application specific questions of model adequacy.
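The forward-backward recursions mentioned in this abstract compute the conditional distribution of each hidden state given the observed data and fixed model parameters. The following is a minimal sketch for a discrete HMM, assuming known parameters; the function name `forward_backward` and the list-of-lists parameter layout are hypothetical, and the MCMC sampler built on top of the recursions in the paper is not reproduced here.

```python
import math


def forward_backward(init, trans, emit, obs):
    """Return (posteriors, log_likelihood) for a discrete HMM.

    init[i]     : P(state_0 = i)
    trans[i][j] : P(state_{t+1} = j | state_t = i)
    emit[i][y]  : P(obs_t = y | state_t = i)
    obs         : non-empty list of observation indices
    """
    n, T = len(init), len(obs)

    # Forward pass, rescaled at each step to avoid underflow.
    alpha, log_lik, prev = [], 0.0, None
    for t, y in enumerate(obs):
        if t == 0:
            a = [init[i] * emit[i][y] for i in range(n)]
        else:
            a = [sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][y]
                 for j in range(n)]
        c = sum(a)              # P(obs_t | obs_0..obs_{t-1})
        log_lik += math.log(c)  # the scaling constants multiply to the likelihood
        prev = [x / c for x in a]
        alpha.append(prev)

    # Backward pass, also rescaled; only proportionality matters here.
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        y = obs[t + 1]
        b = [sum(trans[i][j] * emit[j][y] * beta[t + 1][j] for j in range(n))
             for i in range(n)]
        s = sum(b)
        beta[t] = [x / s for x in b]

    # Posterior P(state_t = i | all observations) is proportional to alpha * beta.
    posteriors = []
    for t in range(T):
        g = [alpha[t][i] * beta[t][i] for i in range(n)]
        s = sum(g)
        posteriors.append([x / s for x in g])
    return posteriors, log_lik
```

Each row of `posteriors` sums to one, and `log_lik` is the log-likelihood of the full observation sequence, the quantity the abstract suggests monitoring when assessing MCMC convergence.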
Bayesian Segmentation of Protein Secondary Structure
- Journal of Computational Biology, 2000
Cited by 32 (6 self)
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for α-helices, β-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
Unified Gibbs Method For Biological Sequence Analysis
Cited by 2 (2 self)
The biotechnology revolution stems from rapid advances in the biological sciences. One important product of these advances is a large and rapidly growing data base of biopolymer (DNA, RNA, and protein) sequences, which has attracted much attention from researchers in different fields. The great majority of the techniques generated for studying these data have been designed to analyze a single sequence or for the comparison of a pair of sequences. Multiple sequence analysis has remained a difficult challenge. In recent years, formal statistical models have shown potential in one such problem, multiple sequence alignment. In this article we describe a general statistical paradigm, the unified Gibbs method, for the conversion of nearly any existing method for the analysis of a single sequence or for the comparison of a pair of sequences into a multiple sequence analysis method. Our previous successful experiences with the unified Gibbs include the development of the site sampler, the moti...
Bayesian methods in biological sequence analysis
- Handbook of Statistical Genetics, 2nd ed., 2003
Cited by 2 (0 self)
Hidden Markov models, the expectation–maximization algorithm, and the Gibbs sampler were introduced for biological sequence analysis in the early 1990s. Since then the use of formal statistical models and inference procedures has revolutionized the field of computational biology. This chapter reviews the hidden Markov and related models, as well as their Bayesian inference procedures and algorithms, for sequence alignments and gene regulatory binding motif discoveries. We emphasize that the combination of Markov chain Monte Carlo and dynamic-programming techniques often results in effective algorithms for NP-hard problems in sequence analysis. In the past decade, we have witnessed the development of the likelihood approach to pairwise sequence alignments (Bishop and Thompson, 1986; Thorne et al., 1991); probabilistic models for RNA secondary structure (Zuker, 1989; Lowe and Eddy, 1997);
Self-Organizing Maps and its Applications in Sleep Apnea Research and Molecular Genetics
- 2000
Cited by 1 (1 self)
This paper presents the application of special unsupervised neural networks (self-organizing maps) to different domains, such as sleep apnea discovery, protein sequence analysis, and tumor classification. An enhancement of the original algorithm, together with the introduction of several hierarchical levels, enables the discovery of the complex structures present in this type of application. Furthermore, an integration of unsupervised neural networks with hidden Markov models is proposed.
Keywords: Unsupervised Neural Networks, Hidden Markov Models, Sleep Apnea, Protein Sequences, Tumor Classification
1 Introduction
The development of more and more powerful computers in recent years has led to the recording of a great amount of data gathered from, for example, industrial processes, medical applications, meteorological phenomena, etc. Artificial neural networks (ANNs) and methods from statistics are particularly interesting for handling such noisy and inconsistent data. The application of...
Advance Monte Carlo Filters and their Applications in Nonlinear/Non-Gaussian Dynamic Systems
Stochastic systems are routinely used in science, engineering and economics. Many of these systems have a natural dynamic structure; others can often be built up dynamically. Except for a few special cases such as the linear Gaussian models or the discrete hidden Markov models, statistical analysis of these systems still presents major challenges to researchers. The sequential Monte
Mammalian Genomes Ease the Location of Human Transcription Factor Binding Sites but Do Not Ease Their Description
- 2004
Comparisons of multiple related genomes have already produced a number of interesting findings, and sequencing resources are available to obtain the genomes of many more species. For studies of human disease, there is naturally a strong interest in the genomes of vertebrates, especially mammals. Decisions concerning the particular species to sequence depend on a number of important factors. While much useful and constructive discussion about these choices has ensued, there have been few quantitative analyses addressing this issue. Here we consider two of these factors: 1) pattern discovery of functional elements, such as transcription factor binding site models, and 2) identification of unusually conserved sequence fragments. To address these issues, we examined data from seven mammals (dog, cow, pig, rat, cat, baboon, and chimpanzee) which are being sequenced in the NISC Comparative Sequencing Program. We find that, taken together, the data from human, mouse, and the seven additional mammals are only 1.5 times as effective for pattern identification as the data from human and mouse alone. Contrastingly, they are 3.5 times as effective for identification of conserved fragments. For many reasons, the sequencing of these mammalian genomes is, and will continue to be, a valuable endeavor, but our results suggest that its contribution to the identification of the patterns of functional sites in DNA sequence will be limited. Interestingly, our results are less pessimistic about its contribution to the identification of sequence conservation, and they suggest that the availability of additional sequences will contribute significantly to such an endeavor.
Journal Of Computational Biology
- Journal of Computational Biology, 2001
We consider the problem of inferring fold changes in gene expression from cDNA microarray data. Standard procedures focus on the ratio of measured fluorescent intensities at each spot on the microarray, but to do so is to ignore the fact that the variation of such ratios is not constant. Estimates of gene expression changes are derived within a simple hierarchical model that accounts for measurement error and fluctuations in absolute gene expression levels. Significant gene expression changes are identified by deriving the posterior odds of change within a similar model. The methods are tested via simulation and are applied to a panel of Escherichia coli microarrays.

