Parametric inference for biological sequence analysis
 In: Proceedings of the National Academy of Sciences. Volume
, 2004
Cited by 35 (3 self)
One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied towards these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.

1 Inference with Graphical Models for Biological Sequence Analysis

This paper develops a new algorithm for graphical models based on the mathematical foundation for statistical models proposed in [18]. Its relevance for computational biology can be summarized as follows:
(a) Graphical models are a unifying statistical framework for biological sequence analysis.
(b) Parametric inference is important for obtaining biologically meaningful results.
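The relationship between the sum-product algorithm and maximum a posteriori inference mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's polytope propagation algorithm; it only shows, for a toy hidden Markov model, that one recursion yields either quantity depending on which pair of operations is plugged in. The function name `hmm_inference` and all probabilities below are hypothetical.

```python
import operator


def hmm_inference(init, trans, emit, obs, add, mul):
    """Forward recursion for an HMM over a generic semiring (add, mul).

    init[s]     : probability of starting in state s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability that state s emits symbol o
    obs         : non-empty list of observed symbol indices
    """
    n_states = len(init)
    # alpha[s] accumulates, over all state paths ending in s, the
    # semiring "sum" of the path weights for the observations so far.
    alpha = [mul(init[s], emit[s][obs[0]]) for s in range(n_states)]
    for o in obs[1:]:
        new_alpha = []
        for t in range(n_states):
            acc = mul(alpha[0], trans[0][t])
            for s in range(1, n_states):
                acc = add(acc, mul(alpha[s], trans[s][t]))
            new_alpha.append(mul(acc, emit[t][o]))
        alpha = new_alpha
    total = alpha[0]
    for a in alpha[1:]:
        total = add(total, a)
    return total


# Hypothetical two-state model with two output symbols.
init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
obs = [0, 1]

# Sum-product: (+, *) gives the marginal probability of the observation.
p_obs = hmm_inference(init, trans, emit, obs, operator.add, operator.mul)
# Max-product: (max, *) gives the probability of the single best state path.
p_best = hmm_inference(init, trans, emit, obs, max, operator.mul)
print(p_obs, p_best)
```

Polytope propagation, as described in the abstract, keeps the same recursive structure but works with polytopes instead of numbers, using the convex hull of a union in place of `add` and the Minkowski sum in place of `mul`; that step is not implemented here.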
Variational Bayesian learning of directed graphical models with hidden variables, Bayesian Analysis 1
, 2006
Cited by 31 (3 self)
Abstract. A key problem in statistics and machine learning is inferring suitable structure of a model given some observed data. A Bayesian approach to model comparison makes use of the marginal likelihood of each candidate model to form a posterior distribution over models; unfortunately for most models of interest, notably those containing hidden or latent variables, the marginal likelihood is intractable to compute. We present the variational Bayesian (VB) algorithm for directed graphical models, which optimises a lower bound approximation to the marginal likelihood in a procedure similar to the standard EM algorithm. We show that for a large class of models, which we call conjugate exponential, the VB algorithm is a straightforward generalisation of the EM algorithm that incorporates uncertainty over model parameters. In a thorough case study using a small class of bipartite DAGs containing hidden variables, we compare the accuracy of the VB approximation to existing asymptotic-data approximations such as the Bayesian Information Criterion (BIC) and the Cheeseman-Stutz (CS) criterion, and also to a sampling-based gold standard, Annealed Importance Sampling (AIS). We find that the VB algorithm is empirically superior to CS and BIC, and much faster than AIS. Moreover, we prove that a VB approximation can always be constructed in such a way that guarantees it to be more accurate than the CS approximation.
Generalized measurement models
, 2004
Cited by 7 (4 self)
Given a set of random variables, it is often the case that their associations can be explained by hidden common causes. We present a set of well-defined assumptions and a provably correct algorithm that allow us to identify some of these hidden common causes. The assumptions are fairly general and sometimes weaker than those used in practice by, for instance, econometricians, psychometricians, social scientists and researchers in many other fields where latent variable models are important and tools such as factor analysis are applicable. The goal is automated knowledge discovery: identifying latent variables that can be used across different applications and causal models and that shed new light on a data generating process. Our approach is evaluated through simulations and three real-world cases.
The hidden life of latent variables: Bayesian learning with mixed graph models
, 2008
Cited by 7 (3 self)
Directed acyclic graphs (DAGs) have been widely used as a representation of conditional independence in machine learning and statistics. Moreover, hidden or latent variables are often an important component of graphical models. However, DAG models suffer from an important limitation: the family of DAGs is not closed under marginalization of hidden variables. This means that in general we cannot use a DAG to represent the independencies over a subset of variables in a larger DAG. Directed mixed graphs (DMGs) are a representation that includes DAGs as a special case, and overcomes this limitation. This paper introduces algorithms for performing Bayesian inference in Gaussian and probit DMG models. An important requirement for inference is the characterization of the distribution over parameters of the models. We introduce a new distribution for covariance matrices of Gaussian DMGs. We discuss and illustrate how several Bayesian machine learning tasks can benefit from the principle presented here: the power to model dependencies that are generated from hidden variables, but without necessarily modelling such variables explicitly.
Learning Coefficients of Layered Models when the True Distribution Mismatches the Singularities
, 2003
Cited by 6 (3 self)
Hierarchical learning machines such as layered neural networks have singularities in their parameter spaces. At singularities, the Fisher information matrix becomes degenerate, so the conventional learning theory of regular statistical models does not hold. Recently, it was proven that, if the parameter of the true distribution is contained in the singularities of the learning machine, then the generalization error in Bayes estimation is asymptotically equal to λ/n, where 2λ is smaller than the dimension of the parameter and n is the number of training samples. However, the constant λ strongly depends on the local geometrical structure of singularities, hence the generalization error is not yet clarified when the true distribution is almost but not completely contained in the singularities. In this paper, in order to analyze such cases, we study the Bayes generalization error under the condition that the Kullback distance of the true distribution from the distribution represented by singularities is in proportion to 1/n, and show two results. (1) If the dimension of the parameter from inputs to hidden units is not larger than three, then there exists a region of true parameters such that the generalization error is larger than that of the corresponding regular model. (2) However, if the dimension from inputs to hidden units is larger than three, then for arbitrary true distribution, the generalization error is smaller than that of the corresponding regular model.
Marginal Likelihood Integrals for Mixtures of Independence Models
Cited by 6 (5 self)
Inference in Bayesian statistics involves the evaluation of marginal likelihood integrals. We present algebraic algorithms for computing such integrals exactly for discrete data of small sample size. Our methods apply to both uniform priors and Dirichlet priors. The underlying statistical models are mixtures of independent distributions, or, in geometric language, secant varieties of Segre-Veronese varieties.
Variational Bayes Solution of Linear Neural Networks and its Generalization Performance
, 2007
Cited by 6 (4 self)
It is well known that, in unidentifiable models, Bayes estimation provides much better generalization performance than maximum likelihood (ML) estimation. However, its accurate approximation by Markov chain Monte Carlo methods requires huge computational costs. As an alternative, a tractable approximation method, called the variational Bayes (VB) approach, has recently been proposed and has been attracting attention. Its advantage over the expectation maximization (EM) algorithm, often used for realizing ML estimation, has been shown experimentally in many applications but has not yet been shown theoretically. In this paper, through the analysis of the simplest unidentifiable models, we theoretically show some properties of the VB approach. We first prove that, in three-layer linear neural networks, the VB approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation. Then, we theoretically clarify its free energy, generalization error, and training error. Comparing them with those of ML estimation and of Bayes estimation, we discuss the advantage of the VB approach. We also show that, unlike in Bayes estimation, the free energy and the generalization error are less simply related to each other, and that, in typical cases, the VB free energy approximates the Bayes one well, while the VB generalization error differs significantly from the Bayes one.
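For readers unfamiliar with the shrinkage estimator named in this abstract, the following is a minimal sketch of the textbook positive-part James-Stein estimator for the mean of a d-dimensional Gaussian observation with known variance. It is not the paper's VB estimator, which the abstract describes only as asymptotically equivalent to an estimator of this type; the function name and the example vectors are hypothetical.

```python
def positive_part_james_stein(x, sigma2=1.0):
    """Shrink the observation x toward the origin.

    x      : list of floats, a single observation of a d-dimensional mean
    sigma2 : assumed known noise variance per coordinate
    """
    d = len(x)
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return [0.0] * d
    # The plain James-Stein factor can become negative when x is close
    # to the origin; the positive-part variant clips it at zero.
    factor = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [factor * v for v in x]


# A large observation is shrunk only slightly ...
far = positive_part_james_stein([3.0, 4.0, 0.0])
# ... while a small one is set exactly to zero.
near = positive_part_james_stein([0.1, 0.1, 0.1])
print(far, near)
```

The clipping at zero is what makes the estimator discard components that are indistinguishable from noise, which is the behaviour the abstract attributes to VB in three-layer linear networks.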
Maximum likelihood estimation in latent class models for contingency table data
 In Algebraic and Geometric Methods in Statistics
, 2008
Effective Dimensions of Hierarchical Latent Class Models
 Journal of Artificial Intelligence Research
, 2002
Cited by 5 (2 self)
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are latent. There are no theoretically well-justified model selection criteria for HLC models in particular and Bayesian networks with latent nodes in general. Nonetheless, empirical studies suggest that the BIC score is a reasonable criterion to use in practice for learning HLC models. Empirical studies also suggest that sometimes model selection can be improved if standard model dimension is replaced with effective model dimension in the penalty term of the BIC score.
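The role of the dimension in the BIC penalty term can be made concrete with a small sketch. It uses the common form of the score, maximized log-likelihood minus (d / 2) log N, and compares a standard dimension with a smaller effective dimension. The function name and every number below are hypothetical and are not taken from the paper.

```python
import math


def bic_score(max_log_likelihood, dimension, n_samples):
    """BIC in its 'larger is better' form: log-likelihood minus penalty."""
    return max_log_likelihood - 0.5 * dimension * math.log(n_samples)


# Hypothetical HLC model: same fit, two ways of counting parameters.
log_lik = -1250.0        # maximized log-likelihood (made-up value)
standard_dim = 23        # number of free parameters (made-up value)
effective_dim = 19       # smaller effective dimension (made-up value)
n = 1000                 # sample size (made-up value)

score_standard = bic_score(log_lik, standard_dim, n)
score_effective = bic_score(log_lik, effective_dim, n)
print(score_standard, score_effective)
```

Because the effective dimension is never larger than the standard one, substituting it lowers the penalty, so a latent-variable model can rank higher against competitors than it would under the standard count.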
Asymptotic Approximation of Marginal Likelihood Integrals
Cited by 5 (1 self)
We study the asymptotics of marginal likelihood integrals for discrete models using resolution of singularities from algebraic geometry, a method introduced recently by Sumio Watanabe. We briefly describe the statistical and mathematical foundations of this method, and explore how Newton diagrams and toric modifications help solve the problem. The approximations are then compared with exact computations of the integrals.