Results 1–10 of 24
Sequential Monte Carlo Methods for Dynamic Systems
Journal of the American Statistical Association, 1998
"... A general framework for using Monte Carlo methods in dynamic systems is provided and its wide applications indicated. Under this framework, several currently available techniques are studied and generalized to accommodate more complex features. All of these methods are partial combinations of three ..."
Abstract

Cited by 581 (10 self)
A general framework for using Monte Carlo methods in dynamic systems is provided and its wide applications indicated. Under this framework, several currently available techniques are studied and generalized to accommodate more complex features. All of these methods are partial combinations of three ingredients: importance sampling and resampling, rejection sampling, and Markov chain iterations. We provide guidelines on how they should be used and under what circumstances each method is most suitable. Through the analysis of differences and connections, we consolidate these methods into a generic algorithm by combining desirable features. In addition, we propose a general use of Rao-Blackwellization to improve performance. Examples from econometrics and engineering are presented to demonstrate the importance of Rao-Blackwellization and to compare different Monte Carlo procedures. Keywords: Blind deconvolution; Bootstrap filter; Gibbs sampling; Hidden Markov model; Kalman filter; Markov...
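As an illustration of how two of the three ingredients above combine, here is a minimal bootstrap-filter sketch for a toy AR(1)-plus-noise state-space model. The model, parameter values, and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def bootstrap_filter(y, n_particles=500, phi=0.9, sigma_x=1.0, sigma_y=1.0):
    """Bootstrap particle filter for the toy model
       x_t = phi * x_{t-1} + N(0, sigma_x^2),   y_t = x_t + N(0, sigma_y^2).
    Combines importance sampling (weighting by the likelihood) with
    resampling (to fight weight degeneracy)."""
    rng = np.random.default_rng(0)
    x = rng.normal(0.0, 1.0, n_particles)          # initial particle cloud
    filtered_means = []
    for yt in y:
        # Propagate particles through the transition density (the
        # "bootstrap" proposal is the prior itself).
        x = phi * x + rng.normal(0.0, sigma_x, n_particles)
        # Importance weights: likelihood of the new observation.
        logw = -0.5 * ((yt - x) / sigma_y) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Multinomial resampling.
        x = x[rng.choice(n_particles, n_particles, p=w)]
        filtered_means.append(x.mean())
    return np.array(filtered_means)
```

A Rao-Blackwellized variant would integrate out any conditionally linear-Gaussian part of the state with a Kalman filter rather than sampling it, which is the performance improvement the abstract refers to.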
Bayesian methods for hidden Markov models
Journal of the American Statistical Association
"... ..."
Bayesian Segmentation of Protein Secondary Structure
Journal of Computational Biology, 2000
"... We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of ..."
Abstract

Cited by 41 (7 self)
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for helices, strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
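The segment-level dynamic program the abstract alludes to can be sketched as a forward recursion over segmentation endpoints. `seg_score` and `max_len` are hypothetical names; in the actual model a segment's score would come from a class-specific probabilistic model rather than a generic callback:

```python
def segmentation_forward(n, seg_score, max_len):
    """Forward pass of a segment-level dynamic program: F[j] sums, over all
    ways to cut positions 0..j-1 into segments of length <= max_len, the
    product of the per-segment scores seg_score(i, j) for segment [i, j)."""
    F = [0.0] * (n + 1)
    F[0] = 1.0                      # the empty prefix has one (empty) segmentation
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            F[j] += F[i] * seg_score(i, j)   # last segment covers [i, j)
    return F
```

With `seg_score` identically 1 the recursion simply counts segmentations (2^(n-1) for n positions); combining this forward pass with a symmetric backward pass is what yields exact marginal posterior probabilities at each sequence position.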
Discovering frequent episodes and learning hidden markov models: A formal connection
IEEE TKDE, 2005
"... Abstract—This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete Hidd ..."
Abstract

Cited by 40 (12 self)
Abstract—This paper establishes a formal connection between two common, but previously unconnected methods for analyzing data streams: discovering frequent episodes in a computer science framework and learning generative models in a statistics framework. We introduce a special class of discrete Hidden Markov Models (HMMs), called Episode Generating HMMs (EGHs), and associate each episode with a unique EGH. We prove that, given any two episodes, the EGH that is more likely to generate a given data sequence is the one associated with the more frequent episode. To be able to establish such a relationship, we define a new measure of frequency of an episode, based on what we call nonoverlapping occurrences of the episode in the data. An efficient algorithm is proposed for counting the frequencies for a set of episodes. Through extensive simulations, we show that our algorithm is both effective and more efficient than current methods for frequent episode discovery. We also show how the association between frequent episodes and EGHs can be exploited to assess the significance of frequent episodes discovered and illustrate empirically how this idea may be used to improve the efficiency of the frequent episode discovery. Index Terms—Temporal data mining, sequential data, frequent episodes, Hidden Markov Models, statistical significance.
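For a single serial episode, the non-overlapping frequency described above can be computed in one left-to-right pass. The sketch below is a simplified reading of that counting scheme (names are hypothetical), under the convention that two occurrences are non-overlapping when one finishes before the next begins:

```python
def nonoverlapped_count(sequence, episode):
    """Count non-overlapping occurrences of a serial episode: a single
    automaton tracks how far into the episode the scan has matched and
    restarts after every completed match, so counted occurrences never
    interleave."""
    count, i = 0, 0                    # i: index of the next episode event awaited
    for event in sequence:
        if event == episode[i]:
            i += 1
            if i == len(episode):      # full occurrence completed
                count += 1
                i = 0                  # restart: occurrences may not share or interleave events
    return count
```

The paper's algorithm generalizes this idea to count a whole set of candidate episodes efficiently in one pass over the stream.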
Bayesian protein family classifier
In Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology, 1998
"... A Bayesian procedure for the simultaneous alignment and classification of sequences into subclasses is described. This Gibbs sampling algorithm iterates between an alignment step and a classification step. It employs Bayesian inference for the identification of the number of conserved columns, the n ..."
Abstract

Cited by 7 (2 self)
A Bayesian procedure for the simultaneous alignment and classification of sequences into subclasses is described. This Gibbs sampling algorithm iterates between an alignment step and a classification step. It employs Bayesian inference for the identification of the number of conserved columns, the number of motifs in each class, their sizes, and the sizes of the classes. Using Bayesian prediction, interclass differences in all these variables are brought to bear on the classification. Application to a superfamily of cyclic nucleotide-binding proteins identifies both similarities and differences in the sequence characteristics of the five subclasses identified by the procedure: 1) cNMP-dependent kinases, 2) prokaryotic cAMP-dependent regulatory proteins, CRP-type, 3) prokaryotic regulatory proteins, FNR-type, 4) cAMP-gated ion channel proteins of animals, and 5) cAMP-gated ion channels of plants.
A Theory for Dynamic Weighting in Monte Carlo Computation
2001
"... This article provides a first theoretical analysis of a new Monte Carlo approach, the dynamic weighting algorithm, proposed recently by Wong and Liang. In dynamic weighting Monte Carlo, one augments the original state space of interest by a weighting factor, which allows the resulting Markov chain t ..."
Abstract

Cited by 3 (0 self)
This article provides a first theoretical analysis of a new Monte Carlo approach, the dynamic weighting algorithm, proposed recently by Wong and Liang. In dynamic weighting Monte Carlo, one augments the original state space of interest by a weighting factor, which allows the resulting Markov chain to move more freely and to escape from local modes. It uses a new invariance principle to guide the construction of transition rules. We analyze the behavior of the weights resulting from such a process and provide detailed recommendations on how to use these weights properly. Our recommendations are supported by a renewal-theory-type analysis. Our theoretical investigations are further demonstrated by a simulation study and applications in neural network training and Ising model simulations.
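A sketch of one commonly cited form of the R-type dynamic-weighting move is given below. The exact update rules and the choice of θ are specified in the paper; treat this as an assumption-laden illustration, not the definitive algorithm:

```python
import math
import random

def dynamic_weighting_step(x, w, log_target, propose, theta=1.0, rng=random):
    """One R-type dynamic-weighting move (illustrative sketch): the sampler
    carries a weight w alongside the state x and preserves invariance with
    respect to importance weights rather than detailed balance, letting the
    chain climb out of local modes at the price of weight growth."""
    y = propose(x, rng)                           # symmetric proposal assumed
    r = math.exp(log_target(y) - log_target(x))   # Metropolis ratio
    a = w * r / (w * r + theta)                   # acceptance probability
    if rng.random() < a:
        return y, w * r + theta                   # accept: new weight = w*r/a
    return x, w * (w * r + theta) / theta         # reject: new weight = w/(1-a)
```

Expectations are then estimated by weighted averages, sum(w_i * f(x_i)) / sum(w_i); the heavy-tailed weight behavior this construction produces is exactly what the renewal-theory analysis addresses.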
Bayesian methods in biological sequence analysis
Handbook of Statistical Genetics, 2nd ed., 2003
"... Hidden Markov models, the expectation–maximization algorithm, and the Gibbs sampler were introduced for biological sequence analysis in early 1990s. Since then the use of formal statistical models and inference procedures has revolutionized the field of computational biology. This chapter reviews th ..."
Abstract

Cited by 3 (0 self)
Hidden Markov models, the expectation–maximization algorithm, and the Gibbs sampler were introduced for biological sequence analysis in the early 1990s. Since then, the use of formal statistical models and inference procedures has revolutionized the field of computational biology. This chapter reviews hidden Markov and related models, as well as their Bayesian inference procedures and algorithms, for sequence alignment and the discovery of gene-regulatory binding motifs. We emphasize that the combination of Markov chain Monte Carlo and dynamic-programming techniques often results in effective algorithms for NP-hard problems in sequence analysis. In the past decade, we have witnessed the development of the likelihood approach to pairwise sequence alignment (Bishop and Thompson, 1986; Thorne et al., 1991); probabilistic models for RNA secondary structure (Zuker, 1989; Lowe and Eddy, 1997);
Unified Gibbs Method For Biological Sequence Analysis
"... The biotechnology revolution stems from rapid advances in the biological sciences. One important product of these advances is a large and rapidly growing data base of biopolymer (DNA, RNA, and protein) sequences, which has attracted much attention from researchers in different fields. The great majo ..."
Abstract

Cited by 2 (2 self)
The biotechnology revolution stems from rapid advances in the biological sciences. One important product of these advances is a large and rapidly growing database of biopolymer (DNA, RNA, and protein) sequences, which has attracted much attention from researchers in different fields. The great majority of the techniques developed for studying these data are designed to analyze a single sequence or to compare a pair of sequences; multiple sequence analysis has remained a difficult challenge. In recent years, formal statistical models have shown potential in one such problem, multiple sequence alignment. In this article we describe a general statistical paradigm, the unified Gibbs method, for converting nearly any existing method for the analysis of a single sequence, or for the comparison of a pair of sequences, into a multiple sequence analysis method. Our previous successful experiences with the unified Gibbs method include the development of the site sampler, the moti...
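As a concrete (and much simplified) illustration of the site-sampler idea mentioned above, the sketch below holds one sequence out at a time, builds a pseudocount position-specific count matrix from the motif sites in the remaining sequences, and resamples the held-out site in proportion to its score. The names, the pseudocounts, and the omission of a background model are simplifying assumptions:

```python
import random
from collections import Counter

def gibbs_site_sampler(seqs, w, iters=200, seed=0):
    """Toy Gibbs site sampler for a single ungapped motif of width w."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    pos = [rng.randrange(len(s) - w + 1) for s in seqs]   # current site starts
    for _ in range(iters):
        for i, s in enumerate(seqs):
            # Position-specific counts from all sequences except i,
            # with a pseudocount of 1 per letter per column.
            counts = [Counter({a: 1 for a in alphabet}) for _ in range(w)]
            for j, other in enumerate(seqs):
                if j != i:
                    for k in range(w):
                        counts[k][other[pos[j] + k]] += 1
            # Score every candidate start in sequence i and sample one.
            scores = []
            for start in range(len(s) - w + 1):
                p = 1.0
                for k in range(w):
                    p *= counts[k][s[start + k]] / sum(counts[k].values())
                scores.append(p)
            pos[i] = rng.choices(range(len(scores)), weights=scores)[0]
    return pos
```

The unified Gibbs paradigm generalizes this pattern: any single-sequence or pairwise analysis method supplies the per-sequence scoring step inside the same hold-one-out sampling loop.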
Haplotype inference using a hidden Markov model with efficient Markov chain sampling
2007
"... Knowledge of haplotypes is useful for understanding block structures of the genome and finding genes associated with disease. Direct measurement of haplotypes in the absence of family data is presently impractical. Hence several methods have been developed previously for reconstructing haplotypes f ..."
Abstract

Cited by 1 (0 self)
Knowledge of haplotypes is useful for understanding block structures of the genome and finding genes associated with disease. Direct measurement of haplotypes in the absence of family data is presently impractical, so several methods have been developed for reconstructing haplotypes from population data. In this thesis, a new population-based method is developed using a Hidden Markov Model (HMM) for the source of ancestral haplotype segments. A higher-order Markov model is used to account for linkage disequilibrium in the ancestral haplotypes. The HMM includes parameters for the genotyping error rate, the mutation rate, and the recombination rate. Four mutation models with varying numbers of parameters are developed and compared. Parameters of the model are inferred by Bayesian methods, using Markov chain Monte Carlo (MCMC). Crucial to the efficiency of the Markov chain sampling is the use of a Forward-Backward algorithm for summing over all possible state sequences of the HMM. The model is tested by reconstructing the haplotypes of 129 children in the data set of Daly et al. (2001) and of 30 children in the CEU and YRI data of the HapMap project. For these data sets, family-based haplotype reconstructions found using MERLIN (Abecasis et al. 2002) are used to check the correctness of the population-based reconstructions. The results of this HMM method are quite close to the family-based reconstructions and comparable to the PHASE program (Stephens et
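The Forward-Backward summation that makes the Markov chain sampling efficient can be sketched for a generic discrete HMM as follows. This is the standard textbook version, not the thesis's exact parameterization:

```python
import numpy as np

def forward_backward(obs, A, B, pi):
    """Forward-Backward algorithm for a discrete HMM: exact posterior state
    marginals P(state_t | obs) by summing over all K^T state sequences in
    O(T * K^2) time.  A: K x K transitions, B: K x M emissions, pi: initial."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))          # alpha[t, k] = P(obs[:t+1], state_t = k)
    beta = np.zeros((T, K))           # beta[t, k]  = P(obs[t+1:] | state_t = k)
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)   # rows: P(state_t | obs)
```

For long sequences the alpha and beta recursions are rescaled at each step (or run in log space) to avoid numerical underflow; the normalized marginals are unchanged by such rescaling.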