Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning, 1999
Cited by 632 (16 self)
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However, these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
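A minimal Python sketch of the basic EM loop described above, assuming a multinomial naive Bayes model over word counts with add-one (Laplace) smoothing, and a fixed number of iterations in place of a convergence test. The helper names (`train`, `posterior`, `em_naive_bayes`) and the toy documents are hypothetical, and the two extensions the abstract mentions are not reproduced.

```python
import math
from collections import Counter

def train(docs, weights, classes, vocab):
    """M-step: multinomial naive Bayes fitted to soft class weights, add-one smoothing."""
    prior = {c: 1.0 for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c in classes:
            pc = w.get(c, 0.0)  # P(class c | doc); 0 or 1 for labeled docs
            prior[c] += pc
            for word, n in doc.items():
                counts[c][word] += pc * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(counts[c].values()) + len(vocab)
        log_word[c] = {v: math.log((counts[c][v] + 1.0) / denom) for v in vocab}
    return log_prior, log_word

def posterior(doc, log_prior, log_word, classes):
    """E-step for one document: P(class | doc) under the current model."""
    scores = {c: log_prior[c] + sum(n * log_word[c][w] for w, n in doc.items() if w in log_word[c])
              for c in classes}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(scores[c] - top) / z for c in classes}

def em_naive_bayes(labeled, unlabeled, classes, n_iter=10):
    vocab = set()
    for doc, _ in labeled:
        vocab.update(doc)
    for doc in unlabeled:
        vocab.update(doc)
    docs_l = [doc for doc, _ in labeled]
    w_l = [{label: 1.0} for _, label in labeled]  # labeled docs keep hard labels
    model = train(docs_l, w_l, classes, vocab)  # start from the labeled data only
    for _ in range(n_iter):
        w_u = [posterior(doc, *model, classes) for doc in unlabeled]  # E-step
        model = train(docs_l + unlabeled, w_l + w_u, classes, vocab)  # M-step
    return model

# Hypothetical toy data: two labeled and two unlabeled documents.
classes = ["sport", "politics"]
labeled = [(Counter("goal match team".split()), "sport"),
           (Counter("vote election party".split()), "politics")]
unlabeled = [Counter("team goal win".split()), Counter("party vote law".split())]
model = em_naive_bayes(labeled, unlabeled, classes, n_iter=5)
print(posterior(Counter(["win"]), *model, classes))
```

The word "win" never occurs in a labeled document, so the labeled-only classifier is indifferent to it; after EM, its class estimate is inherited from the unlabeled document it co-occurs with.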
Variational Approximations between Mean Field Theory and the Junction Tree Algorithm
Uncertainty in Artificial Intelligence, 2000
Cited by 33 (1 self)
Recently, variational approximations such as the mean field approximation have received much interest. We extend the standard mean field method by using an approximating distribution that factorises into cluster potentials. This includes undirected graphs, directed acyclic graphs and junction trees. We derive generalised mean field equations to optimise the cluster potentials. We show that the method bridges the gap between the standard mean field approximation and the exact junction tree algorithm. In addition, we address the problem of how to choose the structure and the free parameters of the approximating distribution. From the generalised mean field equations we derive rules to simplify the approximation in advance without affecting the potential accuracy of the model class. We also show how the method fits into some other variational approximations that are currently popular.
1 INTRODUCTION
Graphical models, such as Bayesian networks, Markov fields, and Bolt...
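As a reference point for the two ends of the spectrum mentioned above, the sketch below implements the standard fully factorised mean field approximation for a small binary pairwise model (a Boltzmann machine with ±1 units) and checks it against exact marginals from brute-force enumeration, which stands in for exact inference here. The model form p(s) ∝ exp(Σ_i h_i s_i + Σ_{i<j} J_ij s_i s_j) and the function names are our assumptions; the paper's cluster-factorised approximation, which interpolates between the two, is not implemented.

```python
import itertools
import math

def exact_means(h, J):
    """Exact <s_i> by enumerating all 2^n spin states (feasible only for small n)."""
    n = len(h)
    z = 0.0
    m = [0.0] * n
    for s in itertools.product((-1, 1), repeat=n):
        energy = sum(h[i] * s[i] for i in range(n))
        energy += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        w = math.exp(energy)
        z += w
        for i in range(n):
            m[i] += w * s[i]
    return [mi / z for mi in m]

def mean_field_means(h, J, n_sweeps=200):
    """Fully factorised mean field: iterate m_i = tanh(h_i + sum_j J_ij m_j)."""
    n = len(h)
    m = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(n):  # sequential (coordinate-wise) updates
            field = h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i)
            m[i] = math.tanh(field)
    return m

# Hypothetical weakly coupled 3-unit model (J symmetric, zero diagonal).
h = [0.1, -0.2, 0.3]
J = [[0.0, 0.1, 0.1], [0.1, 0.0, 0.1], [0.1, 0.1, 0.0]]
print(exact_means(h, J), mean_field_means(h, J))
```

With weak couplings the two agree closely; as the couplings grow, the factorised approximation degrades, which is what motivates keeping some clusters of variables together.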
Real-time particle filters
Proceedings of the IEEE, 2004
Cited by 30 (2 self)
Particle filters estimate the state of dynamical systems from sensor information. In many real-time applications of particle filters, however, sensor information arrives at a significantly higher rate than the update rate of the filter. The prevalent approach to dealing with such situations is to update the particle filter as often as possible and to discard sensor information that cannot be processed in time. In this paper we present real-time particle filters, which make use of all sensor information even when the filter update rate is below the update rate of the sensors. This is achieved by representing posteriors as mixtures of sample sets, where each mixture component integrates one observation arriving during a filter update. The weights of the mixture components are set so as to minimize the approximation error introduced by the mixture representation. In this way, our approach focuses computational resources (samples) on valuable sensor information. Experiments using data collected with a mobile robot show that our approach yields strong improvements over other approaches.
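For readers unfamiliar with the baseline, a minimal bootstrap particle filter step (predict, weight, resample) is sketched below for a hypothetical one-dimensional state with a random-walk motion model and Gaussian sensor noise. This is the conventional filter that the real-time variant builds on, not the paper's mixture-of-sample-sets method; the noise levels, the simulated readings and the function names are assumptions.

```python
import math
import random

def likelihood(z, x, obs_std):
    """Unnormalised Gaussian sensor likelihood p(z | x)."""
    return math.exp(-0.5 * ((z - x) / obs_std) ** 2)

def particle_filter_step(particles, z, rng, process_std=0.1, obs_std=0.5):
    # 1. Predict: push each sample through an assumed random-walk motion model.
    moved = [p + rng.gauss(0.0, process_std) for p in particles]
    # 2. Weight: importance weight of each sample is the likelihood of reading z.
    weights = [likelihood(z, p, obs_std) for p in moved]
    if sum(weights) == 0.0:  # guard against total underflow
        weights = [1.0] * len(moved)
    # 3. Resample: draw an equally weighted sample set in proportion to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
for _ in range(20):
    z = 5.0 + rng.gauss(0.0, 0.5)  # hypothetical reading of a state that stays at 5
    particles = particle_filter_step(particles, z, rng)
estimate = sum(particles) / len(particles)
print(estimate)
```

In this baseline every reading costs one full update; when readings arrive faster than updates can run, the prevalent approach described in the abstract simply skips some of them.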
EM procedures using mean field-like approximations for Markov model-based image segmentation
2001
Cited by 26 (7 self)
This paper deals with Markov random field model-based image segmentation. This involves parameter estimation in hidden Markov models, for which one of the most widely used procedures is the EM algorithm. In practice, difficulties arise due to the dependence structure in the models, and approximations are required to make the algorithm tractable. We propose a class of algorithms in which the idea is to deal with systems of independent variables. This corresponds to approximations of the pixels' interactions similar to the mean field approximation. The resulting algorithms have the advantage of taking the Markovian structure into account while preserving the good features of EM. In addition, this class, which includes both new and previously known procedures, is presented in a unified framework, showing that apparently distant algorithms come from similar approximation principles. We illustrate the algorithms' performance on synthetic and real images. These experiments point out the ability of o...
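One simple member of this family of procedures can be sketched as EM for a hidden Potts model with Gaussian class intensities, where the intractable E-step is replaced by a mean-field update: each pixel's soft label is recomputed as if its neighbours were independent and fixed at their current expected labels. The interaction strength `beta`, the fixed noise scale `sigma`, the quantile initialisation and the toy image are all our assumptions, and only the class means are re-estimated in the M-step; this is an illustration of the idea, not the paper's specific algorithms.

```python
import math
import random

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connected grid

def mean_field_em(img, K, beta=1.0, sigma=1.0, n_iter=20):
    """EM for a hidden Potts model with a mean-field E-step (sketch)."""
    H, W = len(img), len(img[0])
    values = sorted(v for row in img for v in row)
    mu = [values[int((k + 0.5) * len(values) / K)] for k in range(K)]  # quantile init
    q = [[[1.0 / K] * K for _ in range(W)] for _ in range(H)]
    for _ in range(n_iter):
        # E-step: update q[i][j] given the neighbours' current expected labels.
        for i in range(H):
            for j in range(W):
                logits = []
                for k in range(K):
                    field = sum(q[i + di][j + dj][k] for di, dj in NEIGHBOURS
                                if 0 <= i + di < H and 0 <= j + dj < W)
                    logits.append(-0.5 * ((img[i][j] - mu[k]) / sigma) ** 2 + beta * field)
                top = max(logits)
                e = [math.exp(v - top) for v in logits]
                s = sum(e)
                q[i][j] = [v / s for v in e]
        # M-step: class means from the soft assignments (beta and sigma kept fixed).
        for k in range(K):
            num = sum(q[i][j][k] * img[i][j] for i in range(H) for j in range(W))
            den = sum(q[i][j][k] for i in range(H) for j in range(W))
            if den > 0:
                mu[k] = num / den
    labels = [[max(range(K), key=lambda k: q[i][j][k]) for j in range(W)] for i in range(H)]
    return labels, mu

# Hypothetical noisy two-region image: left half near 0, right half near 3.
rng = random.Random(1)
img = [[(0.0 if j < 4 else 3.0) + rng.gauss(0.0, 0.5) for j in range(8)] for i in range(6)]
labels, mu = mean_field_em(img, K=2)
print(mu)
```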
Model-Independent Mean Field Theory as a Local Method for Approximate Propagation of Information
Computation in Neural Systems, 2002
Cited by 14 (1 self)
We present a systematic approach to mean field theory (MFT) in a general probabilistic setting without assuming a particular model. The mean field equations derived here may serve as a local and thus very simple method for approximate inference in probabilistic models such as Boltzmann machines or Bayesian networks. "Model-independent" means that we do not assume a particular type of dependencies; in a Bayesian network, for example, we allow arbitrary tables to specify conditional dependencies. In general, there are multiple solutions to the mean field equations. We show that improved estimates can be obtained by forming a weighted mixture of the multiple mean field solutions. Simple approximate expressions for the mixture weights are given. The general formalism derived so far is evaluated for the special case of Bayesian networks. The benefits of taking into account multiple solutions are demonstrated by using MFT for inference in a small and in a very large Bayesian network. The results are compared to the exact results.
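The benefit of mixing several mean field solutions can be seen on a tiny symmetric model. The sketch below finds mean field fixed points from several starting points and mixes them with weights proportional to exp(-F), where F is the mean field variational free energy of each solution. That weighting is one natural choice and our assumption, not necessarily the approximate mixture weights derived in the paper; the ±1 pairwise model p(s) ∝ exp(Σ_i h_i s_i + Σ_{i<j} J_ij s_i s_j) and the function names are also assumptions.

```python
import math

def mf_solve(h, J, m0, n_sweeps=500):
    """Iterate the mean field equations m_i = tanh(h_i + sum_j J_ij m_j) from m0."""
    n = len(h)
    m = list(m0)
    for _ in range(n_sweeps):
        for i in range(n):
            m[i] = math.tanh(h[i] + sum(J[i][j] * m[j] for j in range(n) if j != i))
    return m

def entropy(mi):
    """Entropy of a single +-1 unit with mean mi."""
    s = 0.0
    for p in ((1 + mi) / 2, (1 - mi) / 2):
        if p > 0:
            s -= p * math.log(p)
    return s

def free_energy(h, J, m):
    """Variational free energy F = -E_q[log-potential] - H(q); -F lower-bounds log Z."""
    n = len(h)
    e = -sum(h[i] * m[i] for i in range(n))
    e -= sum(J[i][j] * m[i] * m[j] for i in range(n) for j in range(i + 1, n))
    return e - sum(entropy(mi) for mi in m)

def mixture_of_mean_field(h, J, inits):
    sols = []
    for m0 in inits:
        m = mf_solve(h, J, m0)
        if all(any(abs(a - b) > 1e-6 for a, b in zip(m, s)) for s in sols):  # keep distinct
            sols.append(m)
    F = [free_energy(h, J, m) for m in sols]
    fmin = min(F)
    w = [math.exp(-(f - fmin)) for f in F]
    total = sum(w)
    w = [x / total for x in w]
    mix = [sum(w[s] * sols[s][i] for s in range(len(sols))) for i in range(len(h))]
    return mix, sols, w

# Hypothetical strongly coupled pair with no external field: exact means are 0.
h = [0.0, 0.0]
J = [[0.0, 2.0], [2.0, 0.0]]
mix, sols, w = mixture_of_mean_field(h, J, inits=[[0.5, 0.5], [-0.5, -0.5]])
print(sols, w, mix)
```

Each single solution predicts strongly magnetised units, while the exact means are zero by symmetry; the weighted mixture of the two symmetric solutions recovers that.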
Hierarchical Mixtures-of-Experts for Exponential Family Regression Models: Approximation and Maximum Likelihood Estimation
Annals of Statistics, 1999
Cited by 7 (2 self)
In this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are widely used in statistical practice [McCullagh and Nelder (1989)]. One-parameter exponential family regression models [see Bickel and Doksum (1977), page 67] with generalized linear mean functions (GLM1) are special examples of the generalized linear models, where the probability distribution can be parameterized by the mean function. In the regression context, a GLM1 model proposes that the conditional expectation $\mu(x)$ of a real response variable y (the output) is related to a vector of predictors (or inputs)
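To make the GLM1 mean function concrete: writing the linear predictor as α + xᵀβ and the inverse link as h (our notation here), the mean response is h(α + xᵀβ). The sketch below uses the canonical inverse links for three one-parameter families; the dictionary and function names are hypothetical.

```python
import math

# Canonical inverse links h: mean = h(alpha + x^T beta).
INVERSE_LINKS = {
    "poisson": math.exp,                                  # log link
    "binomial": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logit link (success probability)
    "normal": lambda eta: eta,                            # identity link
}

def glm1_mean(family, alpha, beta, x):
    """Mean response of a one-parameter exponential family GLM at predictor vector x."""
    eta = alpha + sum(b * xi for b, xi in zip(beta, x))
    return INVERSE_LINKS[family](eta)

print(glm1_mean("poisson", math.log(3.0), [0.0], [1.0]))
```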
Distributed constrained optimization with semicoordinate transformations
Journal of Operations Research, 2005
Cited by 7 (7 self)
Recent work has shown how information theory extends conventional full-rationality game theory to allow bounded rational agents. The associated mathematical framework can be used to solve constrained optimization problems. This is done by translating the problem into an iterated game, where each agent controls a different variable of the problem, so that the joint probability distribution across the agents' moves gives an expected value of the objective function. The dynamics of the agents are designed to minimize a Lagrangian function of that joint distribution. Here we illustrate how the updating of the Lagrange parameters in the Lagrangian is a form of automated annealing, which focuses the joint distribution more and more tightly about the joint moves that optimize the objective function. We then investigate the use of “semicoordinate” variable transformations. These separate the joint state of the agents from the variables of the optimization problem, with the two connected by an onto mapping. We present experiments illustrating the ability of such transformations to facilitate optimization. We focus on the special kind of transformation in which the statistically independent states of the agents induce a mixture distribution over the optimization variables. Computer experiments illustrate this for k-SAT constraint satisfaction problems and for unconstrained minimization of NK functions.
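The core loop can be sketched on a tiny problem: each agent holds an independent distribution over its own variable, and repeatedly replaces it with the Boltzmann response q_i(a) ∝ exp(-E[G | x_i = a] / T), which minimizes E_q[G] - T·Σ_i S(q_i) in q_i with the other agents held fixed. Lowering T concentrates the product distribution on good joint moves. This is a sketch under our own assumptions: a fixed geometric-style temperature schedule stands in for the paper's Lagrange-parameter updates, no semicoordinate transformation is used, and the 3-variable clause set and function names are hypothetical.

```python
import itertools
import math

def conditional_expectation(i, a, q, G, domains):
    """E[G | x_i = domains[i][a]] when the other agents play independently from q."""
    others = [j for j in range(len(domains)) if j != i]
    total = 0.0
    for picks in itertools.product(*(range(len(domains[j])) for j in others)):
        x = [None] * len(domains)
        x[i] = domains[i][a]
        p = 1.0
        for j, b in zip(others, picks):
            x[j] = domains[j][b]
            p *= q[j][b]
        total += p * G(x)
    return total

def anneal(G, domains, temperatures, sweeps=5):
    q = [[1.0 / len(d)] * len(d) for d in domains]  # start from uniform mixed strategies
    for T in temperatures:
        for _ in range(sweeps):
            for i in range(len(domains)):
                e = [conditional_expectation(i, a, q, G, domains) for a in range(len(domains[i]))]
                best = min(e)
                w = [math.exp(-(v - best) / T) for v in e]  # Boltzmann response at temperature T
                s = sum(w)
                q[i] = [v / s for v in w]
    return q

def unsatisfied(x):
    # Hypothetical instance: clauses (x0), (not x0 or x1), (not x2).
    return int(x[0] == 0) + int(x[0] == 1 and x[1] == 0) + int(x[2] == 1)

q = anneal(unsatisfied, [[0, 1]] * 3, temperatures=[1.0, 0.5, 0.2, 0.1, 0.05, 0.01])
print(q)
```

Because the agents' distributions stay a product, each update only needs expectations over the other agents, which is what makes the scheme distributed.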
On the Identifiability of Mixtures-of-Experts
Neural Networks, 1999
Cited by 5 (2 self)
In mixtures-of-experts (ME) models, "experts" of generalized linear models are combined according to a set of local weights called the "gating function". The invariant transformations of the ME probability density functions include the permutations of the expert labels and the translations of the parameters in the gating functions. Under certain conditions, we show that the ME systems are identifiable if the experts are ordered and the gating parameters are initialized. The conditions are validated for Poisson, gamma, normal and binomial experts.
Keywords: generalized linear models, identifiability, invariant transformations, mixtures-of-experts.
1 INTRODUCTION
Mixtures-of-Experts (ME) (Jacobs et al. 1991) and Hierarchical Mixtures-of-Experts (HME) (Jordan and Jacobs 1994) originated in the neural network literature and have had wide applications for examining relationships among variables [Cacciatore and Nowlan (1994), Meila and Jordan (1995), Ghahramani and Hinton (1996), Tip...
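The two invariances named above can be checked numerically. The sketch below evaluates the log-density of a mixture of Poisson GLM experts with softmax gating, log Σ_k g_k(x) Pois(y; exp(α_k + xᵀβ_k)), where g_k(x) ∝ exp(v_k + xᵀu_k). The parameterisation, parameter values and function names are our assumptions. Relabelling the experts, or adding the same offset to every expert's gating parameters, leaves the density unchanged, which is why some ordering and normalisation convention is needed for identifiability.

```python
import math

def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]

def me_log_density(y, x, experts, gates):
    """log f(y | x) for a mixture of Poisson GLM experts.
    experts: list of (alpha_k, beta_k); gates: list of (v_k, u_k) with logits v_k + u_k . x."""
    g = softmax([v + sum(ui * xi for ui, xi in zip(u, x)) for v, u in gates])
    dens = 0.0
    for gk, (a, b) in zip(g, experts):
        mu = math.exp(a + sum(bi * xi for bi, xi in zip(b, x)))  # Poisson mean, log link
        dens += gk * math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return math.log(dens)

# Hypothetical parameters; the first expert's gate is fixed at zero.
experts = [(0.1, [0.5]), (1.0, [-0.3])]
gates = [(0.0, [0.0]), (0.4, [1.2])]
x, y = [0.7], 2
base = me_log_density(y, x, experts, gates)
permuted = me_log_density(y, x, experts[::-1], gates[::-1])
shifted = me_log_density(y, x, experts, [(v + 0.7, [u[0] - 0.2]) for v, u in gates])
print(base, permuted, shifted)
```

Fixing one expert's gating parameters at zero, as in the example, is one common way to remove the translation invariance.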
On the Asymptotic Normality of Hierarchical Mixtures-of-Experts for Generalized Linear Models
IEEE Transactions on Information Theory, 1999
Cited by 4 (0 self)
In the class of hierarchical mixtures-of-experts (HME) models, "experts" in the exponential family with generalized linear mean functions of the form $h(\alpha + x^T \beta)$ are mixed according to a set of local weights called the "gating functions", which depend on the predictor $x$. Here $h(\cdot)$ is the inverse link function. We provide regularity conditions on the experts and on the gating functions under which the maximum likelihood method in the large sample limit produces a consistent and asymptotically normal estimator of the mean response. The regularity conditions are validated for Poisson, gamma, normal and binomial experts.
Index Terms: hierarchical mixtures-of-experts, generalized linear models, maximum likelihood estimation, large sample theory, asymptotic normal distribution, regularity conditions, Fisher information, statistical inference.
1 Introduction
In Hierarchical Mixtures-of-Experts (HME) (Jordan and Jacobs 1994), experts of simple regression models are mixed in a tree-s...
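The mean response of a two-level HME can be sketched directly: a top-level softmax gate chooses among branches, a second softmax gate within each branch chooses among experts, and each expert contributes h(α_ij + xᵀβ_ij) for an inverse link h. The tree depth, the softmax gating form, the parameter layout and the function names are our assumptions for illustration; the default `inv_link=math.exp` corresponds to Poisson experts with a log link.

```python
import math

def softmax(z):
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]

def hme_mean(x, top_gates, sub_gates, experts, inv_link=math.exp):
    """mu(x) = sum_i g_i(x) * sum_j g_{j|i}(x) * h(alpha_ij + x^T beta_ij).
    Gates and experts are (intercept, coefficient-list) pairs."""
    def lin(a, b):
        return a + sum(bi * xi for bi, xi in zip(b, x))
    g_top = softmax([lin(v, u) for v, u in top_gates])
    mu = 0.0
    for i, gi in enumerate(g_top):
        g_sub = softmax([lin(v, u) for v, u in sub_gates[i]])
        for gij, (alpha, beta) in zip(g_sub, experts[i]):
            mu += gi * gij * inv_link(lin(alpha, beta))
    return mu

# Hypothetical tree: equal gates everywhere, identity-link experts with means 1, 2, 3, 4.
flat = (0.0, [0.0])
tree_mu = hme_mean([1.0], [flat, flat], [[flat, flat], [flat, flat]],
                   [[(1.0, [0.0]), (2.0, [0.0])], [(3.0, [0.0]), (4.0, [0.0])]],
                   inv_link=lambda eta: eta)
print(tree_mu)
```

With all gates equal, the mean is the plain average of the four expert means; non-zero gating coefficients make the mixing weights, and hence the mean response, depend on x.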
Incorporating Expressive Graphical Models in Variational Approximations: Chain-Graphs and Hidden Variables
UAI, 2001
Cited by 3 (1 self)
Global variational approximation methods in graphical

