Results 1 - 8 of 8
Learning with mixtures of trees
 Journal of Machine Learning Research, 2000
Abstract

Cited by 109 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
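The Chow–Liu construction that mixtures-of-trees generalize can be sketched in a few lines: weight each pair of variables by its empirical mutual information, then keep a maximum-weight spanning tree. A minimal sketch for binary data (function names are illustrative, not from the paper):

```python
import itertools
import math

def mutual_information(data, i, j):
    """Empirical mutual information between columns i and j of the data."""
    n = len(data)
    joint, pi, pj = {}, {}, {}
    for row in data:
        joint[(row[i], row[j])] = joint.get((row[i], row[j]), 0) + 1
    for (a, b), c in joint.items():
        pi[a] = pi.get(a, 0) + c
        pj[b] = pj.get(b, 0) + c
    # I(X;Y) = sum_{a,b} p(a,b) log( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b]))
               for (a, b), c in joint.items())

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree over pairwise mutual information (Prim's)."""
    weights = {(i, j): mutual_information(data, i, j)
               for i, j in itertools.combinations(range(n_vars), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < n_vars:
        # pick the heaviest edge crossing the cut between tree and non-tree nodes
        best = max(((i, j) for i, j in weights
                    if (i in in_tree) != (j in in_tree)),
                   key=lambda e: weights[e])
        edges.append(best)
        in_tree.update(best)
    return edges
```

For example, on data where columns 0 and 1 are perfectly correlated and column 2 is independent, the learned tree keeps the (0, 1) edge.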
An Introduction to Variational Methods for Graphical Models
 Machine Learning, 1998
Abstract

Cited by 8 (0 self)
This paper presents a tutorial introduction to the use of variational methods for inference and learning in graphical models (Bayesian networks and Markov random fields). We present a number of examples of graphical models, including the QMR-DT database, the sigmoid belief network, the Boltzmann machine, and several variants of hidden Markov models, in which it is infeasible to run exact inference algorithms. We then introduce variational methods, which exploit laws of large numbers to transform the original graphical model into a simplified graphical model in which inference is efficient. Inference in the simplified model provides bounds on probabilities of interest in the original model. We describe a general framework for generating variational transformations based on convex duality. Finally we return to the examples and demonstrate how variational algorithms can be formulated in each case.
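The flavor of the simplification described here can be seen in naive mean-field inference for a Boltzmann machine: the intractable joint is replaced by a fully factorized distribution whose parameters are found by fixed-point iteration. A toy sketch under simple assumptions (symmetric weights, {0,1} units), not the tutorial's full framework:

```python
import math

def mean_field_boltzmann(W, b, n_iters=100):
    """Naive mean-field for a Boltzmann machine with units s_i in {0, 1}:
    p(s) ∝ exp( sum_i b_i s_i + sum_{i<j} W_ij s_i s_j ), W symmetric.
    Iterates the fixed-point updates mu_i = sigmoid(b_i + sum_j W_ij mu_j)."""
    n = len(b)
    mu = [0.5] * n  # initialize all marginals at 1/2
    for _ in range(n_iters):
        for i in range(n):
            field = b[i] + sum(W[i][j] * mu[j] for j in range(n) if j != i)
            mu[i] = 1.0 / (1.0 + math.exp(-field))
    return mu
```

When the coupling matrix is zero the units really are independent, so the mean-field marginals are exact: each `mu[i]` equals `sigmoid(b[i])`.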
Incremental Bayesian Networks for Structure Prediction
Abstract

Cited by 5 (3 self)
We propose a class of graphical models appropriate for structure prediction problems where the model structure is a function of the output structure. Incremental Sigmoid Belief Networks (ISBNs) avoid the need to sum over the possible model structures by using directed arcs and incrementally specifying the model structure. Exact inference in such directed models is not tractable, but we derive two efficient approximations based on mean field methods, which prove effective in artificial experiments. We then demonstrate their effectiveness on a benchmark natural language parsing task, where they achieve state-of-the-art accuracy. Also, the model which is a closer approximation to an ISBN has better parsing accuracy, suggesting that ISBNs are an appropriate abstract model of structure prediction tasks.
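For intuition about the directed models involved: generation from a sigmoid belief network is plain ancestral sampling down the layers, each binary unit firing with probability sigmoid of its parents' weighted input. An illustrative sketch (not the ISBN model itself, whose structure is built incrementally):

```python
import math
import random

def sample_sbn(layers_W, layers_b, rng=random):
    """Ancestral sampling in a layered sigmoid belief network.
    layers_W[l][i] holds unit i's weights to the previous layer's units
    (empty for the top layer); layers_b[l][i] is its bias."""
    state = []
    parents = []  # sampled values of the previous layer
    for W, b in zip(layers_W, layers_b):
        layer = []
        for i in range(len(b)):
            field = b[i] + sum(W[i][j] * parents[j]
                               for j in range(len(parents)))
            p = 1.0 / (1.0 + math.exp(-field))  # P(unit i = 1 | parents)
            layer.append(1 if rng.random() < p else 0)
        parents = layer
        state.append(layer)
    return state
```

With extreme biases the sampling becomes effectively deterministic, which makes the mechanics easy to check: a top unit with bias +100 is on, and a child with bias −100 and zero weight is off.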
Recurrent Sampling Models for the Helmholtz Machine
, 1999
Abstract

Cited by 4 (3 self)
Many recent analysis-by-synthesis density estimation models of cortical learning and processing have made the crucial simplifying assumption that units within a single layer are mutually independent given the states of units in the layer below or the layer above. In this article, we suggest using either a Markov random field or an alternative stochastic sampling architecture to capture explicitly particular forms of dependence within each layer. We develop the architectures in the context of real and binary Helmholtz machines. Recurrent sampling can be used to capture correlations within layers in the generative or the recognition models, and we …
Non-conjugate Variational Message Passing for Multinomial and Binary Regression
Abstract

Cited by 4 (0 self)
Variational Message Passing (VMP) is an algorithmic implementation of the Variational Bayes (VB) method which applies only in the special case of conjugate exponential family models. We propose an extension to VMP, which we refer to as Non-conjugate Variational Message Passing (NCVMP), which aims to alleviate this restriction while maintaining modularity, allowing choice in how expectations are calculated, and integrating into an existing message-passing framework: Infer.NET. We demonstrate NCVMP on logistic binary and multinomial regression. In the multinomial case we introduce a novel variational bound for the softmax factor which is tighter than other commonly used bounds whilst maintaining computational tractability.
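The flavor of such softmax bounds can be illustrated with a classical, looser one: for any real a, log Σ_k e^{x_k} ≤ a + Σ_k log(1 + e^{x_k − a}), which follows from Σ u_k ≤ Π (1 + u_k) for u_k ≥ 0. This is a simple stand-in for illustration, not the novel bound introduced in the paper:

```python
import math

def log_sum_exp(xs):
    """Numerically stable log( sum_k exp(x_k) )."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def lse_upper_bound(xs, a):
    """Upper bound: log sum_k exp(x_k) <= a + sum_k log(1 + exp(x_k - a)),
    valid for any real a. Tightness depends on the choice of a."""
    return a + sum(math.log1p(math.exp(x - a)) for x in xs)
```

Checking the bound numerically for several choices of `a` shows it always sits above the exact log-sum-exp, with the gap shrinking as `a` approaches the dominant logit.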
Incremental sigmoid belief networks for grammar learning
 Journal of Machine Learning Research, 2010
Abstract

Cited by 2 (1 self)
We propose a class of Bayesian networks appropriate for structured prediction problems where the Bayesian network’s model structure is a function of the predicted output structure. These incremental sigmoid belief networks (ISBNs) make decoding possible because inference with partial output structures does not require summing over the unboundedly many compatible model structures, due to their directed edges and incrementally specified model structure. ISBNs are specifically targeted at challenging structured prediction problems such as natural language parsing, where learning the domain’s complex statistical dependencies benefits from large numbers of latent variables. While exact inference in ISBNs with large numbers of latent variables is not tractable, we propose two efficient approximations. First, we demonstrate that a previous neural network parsing model can be viewed as a coarse mean-field approximation to inference with ISBNs. We then derive a more accurate but still tractable variational approximation, which proves effective in artificial experiments. We compare the effectiveness of these models on a benchmark natural language parsing task, where they achieve accuracy competitive with the state-of-the-art. The model which is a closer approximation to an ISBN has better parsing accuracy, suggesting that ISBNs are an appropriate abstract model of natural language grammar learning.
Negative Tree Reweighted Belief Propagation
Abstract

Cited by 1 (0 self)
We introduce a new class of lower bounds on the log partition function of a Markov random field which makes use of a reversed Jensen’s inequality. In particular, our method approximates the intractable distribution using a linear combination of spanning trees with negative weights. This technique is a lower-bound counterpart to the tree-reweighted belief propagation algorithm, which uses a convex combination of spanning trees with positive weights to provide corresponding upper bounds. We develop algorithms to optimize and tighten the lower bounds over the non-convex set of valid parameter values. Our algorithm generalizes mean field approaches (including naïve and structured mean field approximations), which it includes as a limiting case.
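The mean-field limiting case mentioned here is easy to check numerically on a toy model: the fully factorized variational (ELBO) bound never exceeds the exact log partition function, for any choice of the factorized marginals. A small sketch, feasible only at toy sizes because it enumerates all states:

```python
import itertools
import math

def exact_log_Z(W, b):
    """Exact log partition function of a small Boltzmann machine
    p(s) ∝ exp( sum_i b_i s_i + sum_{i<j} W_ij s_i s_j ), by enumeration."""
    n = len(b)
    total = 0.0
    for s in itertools.product([0, 1], repeat=n):
        e = sum(b[i] * s[i] for i in range(n))
        e += sum(W[i][j] * s[i] * s[j]
                 for i in range(n) for j in range(i + 1, n))
        total += math.exp(e)
    return math.log(total)

def mean_field_lower_bound(W, b, mu):
    """Naive mean-field bound for any marginals mu_i in [0, 1]:
    log Z >= sum_i b_i mu_i + sum_{i<j} W_ij mu_i mu_j + sum_i H(mu_i)."""
    def H(p):  # binary entropy in nats
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log(p) - (1 - p) * math.log(1 - p)
    n = len(b)
    val = sum(b[i] * mu[i] for i in range(n))
    val += sum(W[i][j] * mu[i] * mu[j]
               for i in range(n) for j in range(i + 1, n))
    return val + sum(H(m) for m in mu)
```

On a two-node model, every choice of `mu` gives a value at or below `exact_log_Z`; the spanning-tree construction in the paper is designed to tighten precisely this kind of lower bound.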
In Cognitive Science 26:3, 2002
 Cognitive Science, 2002
Abstract
This paper summarizes our recent work in developing statistical models of language which are compatible with the kinds of linguistic structures posited by current linguistic theories. In a series of papers we have developed tools for estimating or “learning” such models from data (Johnson et al., 1999; Johnson and Riezler, 2000; Riezler et al., 2000), and this paper provides a high-level overview of both the general approach and the methods we developed.