Results 1–10 of 42
An Introduction to Variational Methods for Graphical Models
 To appear: M. I. Jordan (Ed.), Learning in Graphical Models
Text Classification from Labeled and Unlabeled Documents using EM
 Machine Learning
, 1999
Abstract

Cited by 859 (17 self)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However, these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
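The EM-plus-naive-Bayes loop described in this abstract can be sketched in a few lines. This is an illustrative minimal version, not the paper's implementation: function and variable names, the Laplace smoothing constant, and the fixed iteration count are my own choices. Documents are bag-of-words count vectors; labeled documents keep hard one-hot responsibilities, and the E-step re-estimates soft labels for the unlabeled pool only.

```python
import numpy as np

def em_naive_bayes(X_l, y_l, X_u, n_classes, n_iter=10, alpha=1.0):
    """Semi-supervised naive Bayes via EM (illustrative sketch).

    X_l, y_l : labeled word-count matrix and integer labels.
    X_u      : unlabeled word-count matrix.
    """
    # Responsibilities: one-hot for labeled docs, uniform start for unlabeled.
    R_l = np.eye(n_classes)[y_l]
    R_u = np.full((len(X_u), n_classes), 1.0 / n_classes)
    for _ in range(n_iter):
        R = np.vstack([R_l, R_u])
        X = np.vstack([X_l, X_u])
        # M-step: class priors and Laplace-smoothed per-class word distributions.
        prior = R.sum(0) / R.sum()
        counts = R.T @ X + alpha
        theta = counts / counts.sum(1, keepdims=True)
        # E-step: posterior class probabilities for the unlabeled docs only.
        log_p = X_u @ np.log(theta).T + np.log(prior)
        log_p -= log_p.max(1, keepdims=True)
        R_u = np.exp(log_p)
        R_u /= R_u.sum(1, keepdims=True)
    return prior, theta, R_u
```

Each iteration "probabilistically labels" the unlabeled pool and retrains on everything, exactly the basic procedure the abstract describes; the paper's extensions (down-weighting unlabeled data, multiple mixture components per class) are not shown here.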
EM procedures using mean field-like approximations for Markov model-based image segmentation
, 2001
Abstract

Cited by 47 (11 self)
This paper deals with Markov random field model-based image segmentation. This involves parameter estimation in hidden Markov models, for which one of the most widely used procedures is the EM algorithm. In practice, difficulties arise due to the dependence structure in the models, and approximations are required to make the algorithm tractable. We propose a class of algorithms in which the idea is to deal with systems of independent variables. This corresponds to approximations of the pixels' interactions similar to the mean field approximation. This yields algorithms that have the advantage of taking the Markovian structure into account while preserving the good features of EM. In addition, this class, which includes new and already known procedures, is presented in a unified framework, showing that apparently distant algorithms come from similar approximation principles. We illustrate the algorithms' performance on synthetic and real images. These experiments point out the ability of o...
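As a concrete illustration of the mean field idea behind this class of algorithms, here is a minimal fixed-point update for the pixel label marginals of a Potts-style MRF posterior. This is a generic sketch under my own assumptions (4-neighbour grid, known per-pixel class log-likelihoods, synchronous updates), not any of the paper's specific procedures.

```python
import numpy as np

def mean_field_labels(log_lik, beta=1.5, n_iter=30):
    """Fixed-point mean field updates for pixel label marginals.

    log_lik : (H, W, K) per-pixel class log-likelihoods (assumed given).
    beta    : Potts interaction strength encouraging neighbour agreement.
    Returns a (H, W, K) array of factorised approximate posteriors q.
    """
    H, W, K = log_lik.shape
    q = np.full((H, W, K), 1.0 / K)
    for _ in range(n_iter):
        # Sum of current neighbour marginals (4-connectivity, zero padding):
        # the "mean field" each pixel feels from its neighbours.
        nb = np.zeros_like(q)
        nb[1:, :] += q[:-1, :]
        nb[:-1, :] += q[1:, :]
        nb[:, 1:] += q[:, :-1]
        nb[:, :-1] += q[:, 1:]
        logits = log_lik + beta * nb
        logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=-1, keepdims=True)
    return q
```

A pixel whose own likelihood weakly favours the wrong class is pulled toward its neighbours' labels: the smoothing effect that taking the Markovian structure into account provides.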
Real-time particle filters
 Proceedings of the IEEE
, 2004
Abstract

Cited by 46 (2 self)
Particle filters estimate the state of dynamical systems from sensor information. In many real-time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. Thereby, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.
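To make the mixture-of-sample-sets idea concrete, here is a heavily simplified one-dimensional sketch. Each observation arriving within one filter update window gets its own sample set; the component weights below use a naive average-likelihood heuristic purely for illustration, not the KL-divergence-minimising weights the paper actually derives, and the Gaussian motion and observation models plus all names are my own assumptions.

```python
import numpy as np

def rtpf_window_update(particles, observations, motion_std, obs_std, rng):
    """Process one window of observations without discarding any of them.

    Returns (components, weights): one resampled sample set per observation,
    plus normalised mixture weights over those sets.
    """
    components, raw_weights = [], []
    for z in observations:
        # Propagate all particles through a random-walk motion model.
        prop = particles + rng.normal(0.0, motion_std, size=len(particles))
        # Importance-weight by a Gaussian observation model around z.
        lik = np.exp(-0.5 * ((z - prop) / obs_std) ** 2)
        idx = rng.choice(len(prop), size=len(prop), p=lik / lik.sum())
        components.append(prop[idx])
        # Heuristic component weight: how well this observation fits on average.
        raw_weights.append(lik.mean())
    weights = np.array(raw_weights)
    return components, weights / weights.sum()
```

An observation that is inconsistent with the current belief (e.g. an outlier) then contributes a low-weight mixture component instead of either being discarded or dominating a single fused update.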
Variational Approximations between Mean Field Theory and the Junction Tree Algorithm
 In Uncertainty in Artificial Intelligence
, 2000
Abstract

Cited by 45 (1 self)
Recently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalised mean field equations to optimise the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the structure and the free parameters of the approximating distribution. From the generalised mean field equations we derive rules to simplify the approximation in advance without affecting the potential accuracy of the model class. We also show how the method fits into some other variational approximations that are currently popular.
1 INTRODUCTION
Graphical models, such as Bayesian networks, Markov fields, and Bolt...
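For reference, the standard fully factorised mean field baseline that this cluster-potential method generalises can be written in a few lines for a Boltzmann machine p(x) ∝ exp(x'Wx/2 + b'x) over binary units. This is a generic sketch of that baseline only (names and iteration count are my own), not the paper's generalised mean field equations.

```python
import numpy as np

def mean_field_boltzmann(W, b, n_iter=100):
    """Standard fully factorised mean field for a Boltzmann machine
    p(x) ∝ exp(x'Wx/2 + b'x), x in {0,1}^n, W symmetric, zero diagonal.

    Fixed-point updates mu_i = sigmoid(sum_j W_ij mu_j + b_i); the paper's
    method replaces these single-site factors with cluster potentials.
    """
    mu = np.full(len(b), 0.5)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(W @ mu + b)))
    return mu
```

On a weakly coupled model these marginals are already close to the exact ones; as couplings strengthen, grouping strongly interacting nodes into clusters (as the paper proposes) recovers accuracy, with the exact junction tree algorithm as the limiting case.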
Model-Independent Mean Field Theory as a Local Method for Approximate Propagation of Information
 Computation in Neural Systems
, 2002
Abstract

Cited by 16 (1 self)
We present a systematic approach to mean field theory (MFT) in a general probabilistic setting without assuming a particular model. The mean field equations derived here may serve as a local and thus very simple method for approximate inference in probabilistic models such as Boltzmann machines or Bayesian networks. "Model-independent" means that we do not assume a particular type of dependencies; in a Bayesian network, for example, we allow arbitrary tables to specify conditional dependencies. In general, there are multiple solutions to the mean field equations. We show that improved estimates can be obtained by forming a weighted mixture of the multiple mean field solutions. Simple approximate expressions for the mixture weights are given. The general formalism derived so far is evaluated for the special case of Bayesian networks. The benefits of taking into account multiple solutions are demonstrated by using MFT for inference in a small and in a very large Bayesian network. The results are compared to the exact results.
Hierarchical Mixtures-of-Experts for Exponential Family Regression Models: Approximation and Maximum Likelihood Estimation
 Ann. Statistics
, 1999
Abstract

Cited by 12 (2 self)
In this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are widely used in statistical practice [McCullagh and Nelder (1989)]. One-parameter exponential family regression models [see Bickel and Doksum (1977), page 67] with generalized linear mean functions (GLM1) are special examples of the generalized linear models, where the probability distribution can be parameterized by the mean function. In the regression context, a GLM1 model proposes that the conditional expectation µ(x) of a real response variable y (the output) is related to a vector of predictors (or inputs)
Convex variational Bayesian inference for large scale generalized linear models
 In ICML
, 2009
Abstract

Cited by 9 (6 self)
We show how variational Bayesian inference can be implemented for very large generalized linear models. Our relaxation is proven to be a convex problem for any log-concave model. We provide a generic double loop algorithm for solving this relaxation on models with arbitrary super-Gaussian potentials. By iteratively decoupling the criterion, most of the work can be done by solving large linear systems, rendering our algorithm orders of magnitude faster than previously proposed solvers for the same problem. We evaluate our method on problems of Bayesian active learning for large binary classification models, and show how to address settings with many candidates and sequential inclusion steps.
Distributed constrained optimization with semi-coordinate transformations
, 2008
Abstract

Cited by 7 (7 self)
Recent work has shown how information theory extends conventional full-rationality game theory to allow bounded rational agents. The associated mathematical framework can be used to solve constrained optimization problems. This is done by translating the problem into an iterated game, where each agent controls a different variable of the problem, so that the joint probability distribution across the agents' moves gives an expected value of the objective function. The dynamics of the agents is designed to minimize a Lagrangian function of that joint distribution. Here we illustrate how the updating of the Lagrange parameters in the Lagrangian is a form of automated annealing, which focuses the joint distribution more and more tightly about the joint moves that optimize the objective function. We then investigate the use of “semi-coordinate” variable transformations. These separate the joint state of the agents from the variables of the optimization problem, with the two connected by an onto mapping. We present experiments illustrating the ability of such transformations to facilitate optimization. We focus on the special kind of transformation in which the statistically independent states of the agents induce a mixture distribution over the optimization variables. Computer experiments illustrate this for k-SAT constraint satisfaction problems and for unconstrained minimization of NK functions.
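The annealed-Lagrangian dynamics can be illustrated on a tiny two-agent problem where the conditional expectations are cheap to enumerate. In this sketch (all names and the cooling schedule are illustrative; real applications use many agents and sampled rather than enumerated expectations), each agent repeatedly plays a Boltzmann response to the other's mixed strategy while the temperature is annealed down.

```python
import numpy as np

def pc_minimize(G, T0=1.0, cool=0.85, n_iter=60):
    """Probability-collectives-style minimisation of G[i, j] by two agents.

    Each agent holds an independent distribution q over its moves; the product
    q1 * q2 defines E[G]. Boltzmann updates minimise the Lagrangian
    E[G] - T * entropy at the current temperature T, which is then cooled.
    """
    K1, K2 = G.shape
    q1, q2 = np.full(K1, 1.0 / K1), np.full(K2, 1.0 / K2)
    T = T0
    for _ in range(n_iter):
        e1 = G @ q2                        # E[G | agent 1 plays move i]
        q1 = np.exp(-(e1 - e1.min()) / T)
        q1 /= q1.sum()
        e2 = G.T @ q1                      # E[G | agent 2 plays move j]
        q2 = np.exp(-(e2 - e2.min()) / T)
        q2 /= q2.sum()
        T *= cool                          # annealing sharpens both distributions
    return q1, q2
```

As T shrinks, the joint distribution concentrates on the pair of moves minimising G, mirroring the automated-annealing behaviour the abstract describes.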
On the Identifiability of Mixtures-of-Experts
 Neural Networks
, 1999
Abstract

Cited by 6 (2 self)
In mixtures-of-experts (ME) models, "experts" of generalized linear models are combined according to a set of local weights called the "gating function". The invariant transformations of the ME probability density functions include the permutations of the expert labels and the translations of the parameters in the gating functions. Under certain conditions, we show that the ME systems are identifiable if the experts are ordered and the gating parameters are initialized. The conditions are validated for Poisson, gamma, normal and binomial experts.
Keywords: Generalized linear models, identifiability, invariant transformations, mixtures-of-experts.
1 INTRODUCTION
Mixtures-of-Experts (ME) (Jacobs et al. 1991) and Hierarchical Mixtures-of-Experts (HME) (Jordan and Jacobs 1994) originated from the neural network literature, and have had wide applications for examining relationships among variables [Cacciatore and Nowlan (1994), Meila and Jordan (1995), Ghahramani and Hinton (1996), Tip...