Results 1–10 of 35
Parametric inference for biological sequence analysis
In: Proceedings of the National Academy of Sciences, 2004
Cited by 37 (4 self)
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models. The paper develops a new algorithm for graphical models based on the mathematical foundation for statistical models proposed in [18]. Its relevance for computational biology can be summarized as follows: (a) graphical models are a unifying statistical framework for biological sequence analysis; (b) parametric inference is important for obtaining biologically meaningful results.
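As a concrete instance of the sum-product algorithm this abstract refers to, here is a minimal forward-algorithm sketch for a two-state HMM chain; all probability values below are illustrative, not taken from the paper.

```python
import numpy as np

# Minimal sum-product (forward algorithm) on a 2-state HMM chain.
# All matrices are illustrative, not from the paper.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # transition probabilities
E = np.array([[0.7, 0.3],
              [0.1, 0.9]])       # emission probabilities, rows = hidden states
pi = np.array([0.5, 0.5])        # initial state distribution

def likelihood(obs):
    """P(obs): sum over all hidden paths in O(len(obs) * k^2) time."""
    alpha = pi * E[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]
    return alpha.sum()

p = likelihood([0, 0, 1])
```

The same message-passing recursion, with sum and product replaced by other semiring operations (or, in the paper's polytope propagation, by polytope convex hull and Minkowski sum), yields the other inference algorithms the abstract mentions.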
Variational Bayesian learning of directed graphical models with hidden variables
 In: Bayesian Analysis 1, 2006
Cited by 32 (4 self)
A key problem in statistics and machine learning is inferring a suitable structure for a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately, for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower-bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling-based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that guarantees it to be more accurate than the CS approximation.
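A minimal sketch of the EM-like coordinate-ascent updates the abstract describes, for the textbook conjugate-exponential case of a Gaussian with unknown mean and precision and a factorized posterior q(mu)q(tau). The toy model, its priors, and all numbers are illustrative assumptions, not the paper's bipartite-DAG case study.

```python
import numpy as np

# Coordinate-ascent VB for a Gaussian with unknown mean mu and
# precision tau, factorized q(mu) q(tau).  Textbook illustration of
# the EM-like alternating updates, not the paper's models.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=200)
N, xbar = len(x), x.mean()

# Broad conjugate priors: mu | tau ~ N(mu0, (beta0 tau)^-1), tau ~ Gamma(a0, b0)
mu0, beta0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

E_tau = 1.0                                   # initial guess for E[tau]
for _ in range(100):
    # update q(mu) = N(m, 1/lam), holding q(tau) fixed
    lam = (beta0 + N) * E_tau
    m = (beta0 * mu0 + N * xbar) / (beta0 + N)
    # update q(tau) = Gamma(a_n, b_n), holding q(mu) fixed
    a_n = a0 + (N + 1) / 2
    b_n = b0 + 0.5 * (((x - m) ** 2).sum() + N / lam
                      + beta0 * ((m - mu0) ** 2 + 1 / lam))
    E_tau = a_n / b_n

# m approaches the sample mean; E_tau approximates the true precision.
```

Each update maximizes the same lower bound on the marginal likelihood in one factor while the other is held fixed, which is the sense in which VB generalizes EM.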
Generalized measurement models
, 2004
Cited by 7 (4 self)
The hidden life of latent variables: Bayesian learning with mixed graph models
, 2008
Cited by 7 (3 self)
Directed acyclic graphs (DAGs) have been widely used as a representation of conditional independence in machine learning and statistics. Moreover, hidden or latent variables are often an important component of graphical models. However, DAG models suffer from an important limitation: the family of DAGs is not closed under marginalization of hidden variables. This means that in general we cannot use a DAG to represent the independencies over a subset of variables in a larger DAG. Directed mixed graphs (DMGs) are a representation that includes DAGs as a special case, and overcomes this limitation. This paper introduces algorithms for performing Bayesian inference in Gaussian and probit DMG models. An important requirement for inference is the characterization of the distribution over parameters of the models. We introduce a new distribution for covariance matrices of Gaussian DMGs. We discuss and illustrate how several Bayesian machine learning tasks can benefit from the principle presented here: the power to model dependencies that are generated from hidden variables, but without necessarily modelling such variables explicitly.
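The non-closure under marginalization can be made concrete in the Gaussian case. The following hypothetical three-variable example (coefficients chosen arbitrarily) marginalizes a latent variable and leaves a dependence between two observed variables that no directed edge between them represents, which is what a DMG encodes as a bidirected edge.

```python
import numpy as np

# Toy Gaussian model: hidden h ~ N(0, 1), observed x1 = a*h + e1 and
# x2 = b*h + e2 with independent noise.  Marginalizing h leaves x1 and
# x2 dependent although neither causes the other -- the dependence a
# directed mixed graph writes as x1 <-> x2.
a, b = 0.8, 0.5
cov_h = 1.0
noise = np.array([0.3, 0.4])

# Joint covariance over (h, x1, x2), then drop the h row and column.
Sigma = np.array([
    [cov_h,     a * cov_h,                 b * cov_h],
    [a * cov_h, a * a * cov_h + noise[0],  a * b * cov_h],
    [b * cov_h, a * b * cov_h,             b * b * cov_h + noise[1]],
])
Sigma_obs = Sigma[1:, 1:]

cov_x1_x2 = Sigma_obs[0, 1]   # = a * b * cov_h, nonzero
```

No DAG over x1 and x2 alone reproduces this covariance structure without adding an edge between them, which is the limitation the abstract describes.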
Variational Bayes Solution of Linear Neural Networks and its Generalization Performance
, 2007
Cited by 6 (5 self)
It is well-known that, in unidentifiable models, Bayes estimation provides much better generalization performance than maximum likelihood (ML) estimation. However, its accurate approximation by Markov chain Monte Carlo methods requires huge computational costs. As an alternative, a tractable approximation method, called the variational Bayes (VB) approach, has recently been proposed and has been attracting attention. Its advantage over the expectation-maximization (EM) algorithm, often used for realizing ML estimation, has been shown experimentally in many applications but has not yet been shown theoretically. In this paper, through the analysis of the simplest unidentifiable models, we theoretically show some properties of the VB approach. We first prove that, in three-layer linear neural networks, the VB approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation. Then, we theoretically clarify its free energy, generalization error, and training error. Comparing them with those of ML estimation and of Bayes estimation, we discuss the advantage of the VB approach. We also show that, unlike in Bayes estimation, the free energy and the generalization error are not simply related to each other, and that, in typical cases, the VB free energy approximates the Bayes free energy well, while the VB generalization error differs significantly from the Bayes one.
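For reference, a sketch of the classical positive-part James-Stein shrinkage estimator that the VB solution is compared to. This is the generic textbook form with the standard (d - 2) constant for a d-dimensional observation with unit noise, not an expression taken from the paper.

```python
import numpy as np

# Positive-part James-Stein shrinkage: shrink the observation y toward
# the origin, and clamp the shrinkage factor at zero so the estimate
# never flips sign.  Classical form with constant (d - 2).
def james_stein_pp(y):
    d = y.size
    shrink = 1.0 - (d - 2) / np.dot(y, y)
    return max(shrink, 0.0) * y   # "positive part"

y = np.array([3.0, 0.0, 0.0, 0.0])
est = james_stein_pp(y)           # shrunk toward zero
```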
Learning Coefficients of Layered Models when the True Distribution Mismatches the Singularities
, 2003
Cited by 6 (5 self)
Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so that the conventional learning theory of regular statistical models does not hold. Recently, it was proven that, if the parameter of the true distribution is contained in the singularities of the learning machine, then the generalization error in Bayes estimation is asymptotically equal to λ/n, where λ is smaller than the dimension of the parameter and n is the number of training samples. However, the constant λ strongly depends on the local geometrical structure of the singularities; hence the generalization error is not yet clarified when the true distribution is almost but not completely contained in the singularities. In this paper, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback distance of the true distribution from the distribution represented by the singularities is proportional to 1/n, and show two results. (1) If the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters such that the generalization error is larger than that of the corresponding regular model. (2) However, if the dimension from inputs to hidden units is larger than three, then for an arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.
Maximum likelihood estimation in latent class models for contingency table data
 In: Algebraic and Geometric Methods in Statistics, 2008
Marginal Likelihood Integrals for Mixtures of Independence Models
Cited by 6 (5 self)
Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independence models, or, in geometric language, secant varieties of Segre-Veronese varieties.
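The tractable base case behind such algorithms is the Dirichlet integral over a single independence (multinomial) model, which has a closed form. A sketch is below; the mixture (secant-variety) case the paper actually treats is the hard part and is not shown.

```python
from math import lgamma, exp

# Exact marginal likelihood of count data under a multinomial model
# with a Dirichlet(alpha) prior:
#   integral over the simplex of prod_i theta_i^{u_i} dDirichlet(alpha)
#   = B(alpha + u) / B(alpha),  with B the multivariate beta function.
def log_marginal(counts, alpha):
    def log_beta(a):
        return sum(lgamma(ai) for ai in a) - lgamma(sum(a))
    post = [a + u for a, u in zip(alpha, counts)]
    return log_beta(post) - log_beta(alpha)

# Uniform prior on two categories, counts (2, 1):
p = exp(log_marginal([2, 1], [1.0, 1.0]))   # = 2!*1!/4! = 1/12
```

Working in log space with `lgamma` keeps the computation stable even when counts are large enough that the gamma functions themselves would overflow.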
Automated analytic asymptotic evaluation of the marginal likelihood of latent models
 In: C. Meek & U. Kjorulff (Eds.), Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, 2003
Cited by 5 (0 self)
We present two algorithms for analytic asymptotic evaluation of the marginal likelihood of data given a Bayesian network with hidden nodes. As shown by previous work, this evaluation is particularly hard because for these models the asymptotic approximation of the marginal likelihood deviates from the standard BIC score. Our algorithms compute the regular dimensionality drop for latent models and compute the non-standard approximation formulas for singular statistics for these models. The presented algorithms are implemented in Matlab and Maple, and their usage is demonstrated on several examples. In this paper we address the problem of computing analytic asymptotic approximations of marginal likelihoods and present two computer programs that compute such approximations. Our algorithms are developed in the context of Bayesian networks with hidden variables, where the evaluation of marginal likelihood was shown to be particularly hard (Rusakov & Geiger, 2002). Consider the evaluation of the marginal likelihood given a Bayesian network model. Under some regularity conditions, the asymptotic form of the log marginal likelihood for Bayesian network models without hidden variables is specified by the standard BIC formula.
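For contrast with the non-standard approximations discussed here, a sketch of the standard BIC formula next to the exact log marginal likelihood in a regular model, a Bernoulli coin with uniform prior, where the -(d/2) log n penalty is valid; the counts used are illustrative.

```python
from math import lgamma, log

# Exact log marginal likelihood of k heads in n tosses under a
# uniform prior: integral of p^k (1-p)^(n-k) dp = Beta(k+1, n-k+1).
def exact_log_ml(k, n):
    return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

# Standard BIC approximation: max log-likelihood minus (d/2) log n,
# with d = 1 free parameter.
def bic(k, n):
    theta = k / n
    loglik = k * log(theta) + (n - k) * log(1 - theta)
    return loglik - 0.5 * log(n)

k, n = 60, 100
gap = bic(k, n) - exact_log_ml(k, n)   # O(1) error of the approximation
```

For the latent-variable models the abstract discusses, this standard penalty is wrong: the exponent of n is governed by singular statistics rather than the raw parameter count, which is what the paper's algorithms compute.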
Effective Dimensions of Hierarchical Latent Class Models
 In: Journal of Artificial Intelligence Research, 2002
Cited by 5 (2 self)
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are latent. There are no theoretically well-justified model selection criteria for HLC models in particular, or for Bayesian networks with latent nodes in general. Nonetheless, empirical studies suggest that the BIC score is a reasonable criterion to use in practice for learning HLC models. Empirical studies also suggest that model selection can sometimes be improved if the standard model dimension is replaced with the effective model dimension in the penalty term of the BIC score.
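A sketch of the substitution the abstract suggests: the same BIC score computed with the raw parameter count versus a smaller effective dimension. The dimensions and log-likelihood below are hypothetical, and computing the effective dimension itself (the rank of the Jacobian of the parameter-to-distribution map) is the hard part, not done here.

```python
from math import log

# BIC score: log-likelihood minus half the model dimension times log n.
def bic_score(loglik, dim, n):
    return loglik - 0.5 * dim * log(n)

loglik, n = -1234.5, 500                     # hypothetical fit
standard = bic_score(loglik, dim=17, n=n)    # raw parameter count
effective = bic_score(loglik, dim=13, n=n)   # smaller effective dimension

# A smaller effective dimension penalizes the latent model less, so a
# latent model rejected under the standard score may be kept.
```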