Results 1 - 10 of 20
Bayesian reconstructions from emission tomography data using a modified EM algorithm
- IEEE Trans. Med. Imag
, 1990
- Cited by 162 (2 self)
A new method of reconstruction from SPECT data is proposed, which builds on the EM approach to maximum likelihood reconstruction from emission tomography data but aims instead at maximum posterior probability estimation that takes account of prior belief about “smoothness” in the isotope concentration. A novel modification to the EM algorithm yields a practical method. The method is illustrated by an application to data from brain scans.
Convergence results for the EM Approach to Mixtures of Experts Architectures
- Neural Networks
, 1995
- Cited by 89 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
Unsupervised Learning from Dyadic Data
, 1998
- Cited by 89 (9 self)
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This includes event co-occurrences, histogram data, and single stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both a parsimonious yet flexible parameterization of probability distributions, with good generalization performance on sparse data, and structural information about data-inherent grouping structure. We propose an annealed version of the standard Expectation Maximization algorithm for model fitting, which is empirically evaluated on a variety of data sets from different domains.
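As a rough illustration of what an annealed E-step can look like, the sketch below tempers the class posteriors with an inverse temperature `beta`. The abstract does not give the paper's exact formulation, so the choice of raising the whole joint (prior times likelihood) to the power `beta`, the function name `tempered_posteriors`, and the toy numbers are all assumptions made for illustration.

```python
import math

def tempered_posteriors(priors, likelihoods, beta):
    """Posterior over latent classes with the joint raised to the power beta.

    Assumes all priors and likelihoods are strictly positive.
    beta = 1 gives the ordinary EM posterior; beta = 0 gives a uniform one.
    """
    # Work in log space so that small likelihoods do not underflow.
    logs = [beta * (math.log(p) + math.log(l))
            for p, l in zip(priors, likelihoods)]
    shift = max(logs)
    unnorm = [math.exp(v - shift) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical three-class example for a single observation.
priors = [0.5, 0.3, 0.2]
likelihoods = [0.02, 0.10, 0.05]

for beta in (0.0, 0.5, 1.0):
    print(beta, tempered_posteriors(priors, likelihoods, beta))
```

In an annealing schedule, `beta` would typically be increased from a small value towards 1 over the course of fitting, so that early iterations use smoother posteriors than standard EM.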
Update rules for parameter estimation in Bayesian networks
, 1997
- Cited by 47 (2 self)
This paper re-examines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in on-line learning [12]. We provide a unified framework for parameter estimation that encompasses both on-line learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new on-line and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM. 1 Introduction Over the past few years, there has been a growing interest in the problem of le...
A Comparison of New and Old Algorithms for A Mixture Estimation Problem
- Machine Learning
, 1995
- Cited by 27 (12 self)
We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The square distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG_η). Curiously, when a second order Taylor expansion of the relative entropy is used, we arrive at an update EM_η which, for η = 1, gives the usual EM update. Experimentally, both the EM_η-update and the EG_η-update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG_η algorithm. 1. Introduction The problem of maximum-likelihood (ML) estimation of a mixture of de...
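The two parameterized updates named in this abstract can be sketched for a toy problem as follows. This is a minimal illustration under assumptions, not the authors' code: the update formulas are the commonly stated forms (a multiplicative factor η(∇_i − 1) + 1 for the EM-style update, and a normalized exponential of η∇_i for the exponentiated gradient update, where ∇ is the gradient of the average log-likelihood), and the density matrix `P`, the starting vector `w0`, and all function names are made up.

```python
import math

def mixture(w, row):
    """Mixture density at one sample: sum_i w[i] * row[i]."""
    return sum(wi * pi for wi, pi in zip(w, row))

def gradient(w, P):
    """Gradient of the average log-likelihood with respect to w.

    P[t][i] is the (given) density of component i at sample t.
    """
    g = [0.0] * len(w)
    for row in P:
        mix = mixture(w, row)
        for i, pi in enumerate(row):
            g[i] += pi / mix
    return [gi / len(P) for gi in g]

def em_eta_update(w, P, eta=1.0):
    """EM-style update; eta = 1 reduces to the usual EM step w[i] * g[i].

    For eta > 1 a weight can become negative; no safeguard is applied here.
    """
    g = gradient(w, P)
    return [wi * (eta * (gi - 1.0) + 1.0) for wi, gi in zip(w, g)]

def eg_eta_update(w, P, eta=1.0):
    """Exponentiated gradient update, renormalized to sum to one."""
    g = gradient(w, P)
    unnorm = [wi * math.exp(eta * gi) for wi, gi in zip(w, g)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def avg_log_likelihood(w, P):
    return sum(math.log(mixture(w, row)) for row in P) / len(P)

# Hypothetical data: 4 samples, 2 component densities evaluated at each.
P = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.6, 0.5]]
w0 = [0.5, 0.5]

print(avg_log_likelihood(w0, P))
print(avg_log_likelihood(em_eta_update(w0, P, eta=1.0), P))
print(avg_log_likelihood(eg_eta_update(w0, P, eta=2.0), P))
```

The EM-style update preserves the sum of the weights without renormalization because the weighted gradient components sum to one, whereas the exponentiated gradient update keeps the weights positive by construction and needs the explicit normalization.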
Self-Organizing Maps, Vector Quantization, and Mixture Modeling
- IEEE Transactions on Neural Networks
, 2001
- Cited by 20 (0 self)
Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive EM algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. As an illustration we apply a self-organizing map based on a multinomial distribution to market basket analysis. I. Introduction Self-organizing maps are popular tools for clustering and visualization of high-dimensional data [1], [2]. The well-known Kohonen learning algorithm can be interpreted as a variant of vector quantization with additional lateral interactions [3], [4]. The addition of lateral interaction between units introduces a sense of topology, such that neighboring units represent inputs that are close together in input space [...
Batch and on-line parameter estimation of Gaussian mixtures based on the joint entropy
- In Neural Information Processing Systems
, 1998
- Cited by 11 (1 self)
We describe a new iterative method for parameter estimation of Gaussian mixtures. The new method is based on a framework developed by Kivinen and Warmuth for supervised on-line learning. In contrast to gradient descent and EM, which estimate the mixture’s covariance matrices, the proposed method estimates the inverses of the covariance matrices. Furthermore, the new parameter estimation procedure can be applied in both on-line and batch settings. We show experimentally that it is typically faster than EM, and usually requires about half as many iterations as EM. We also describe experiments with digit recognition that demonstrate the merits of the on-line version.
Penalized Maximum Likelihood Estimator for Normal Mixtures
, 2000
- Cited by 6 (2 self)
The estimation of the parameters of a mixture of Gaussian densities is considered, within the framework of maximum likelihood. Due to unboundedness of the likelihood function, the maximum likelihood estimator fails to exist. We adopt a solution to likelihood function degeneracy which consists in penalizing the likelihood function. The resulting penalized likelihood function is then bounded over the parameter space and the existence of the penalized maximum likelihood estimator is guaranteed. As an original contribution, we provide asymptotic properties, and in particular a consistency proof, for the penalized maximum likelihood estimator. Numerical examples are provided in the finite data case, showing the performance of the penalized estimator compared to the standard one.
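The degeneracy mentioned in this abstract can be seen numerically in a small sketch: centering one component on a data point and shrinking its standard deviation makes the log-likelihood grow without bound, while a penalty on the variance pulls it back down. The abstract does not specify the penalty, so the inverse-gamma-style term used here, its parameters `alpha` and `beta`, the toy data, and the function names are all assumptions for illustration.

```python
import math

DATA = [0.0, 1.5, -1.0, 0.7]  # toy sample, made up for illustration

def normal_pdf(x, mean, sigma):
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(sigma1, data=DATA):
    """Two-component mixture with equal weights.

    Component 1 is centered on the data point 1.5 with standard deviation
    sigma1; component 2 is a fixed standard normal.
    """
    total = 0.0
    for x in data:
        mix = 0.5 * normal_pdf(x, 1.5, sigma1) + 0.5 * normal_pdf(x, 0.0, 1.0)
        total += math.log(mix)
    return total

def penalized_log_likelihood(sigma1, alpha=1.0, beta=1.0, data=DATA):
    """Log-likelihood plus an assumed inverse-gamma-style log-penalty on sigma1**2."""
    var = sigma1 ** 2
    penalty = -(alpha + 1.0) * math.log(var) - beta / var
    return log_likelihood(sigma1, data) + penalty

for s in (1.0, 1e-1, 1e-2, 1e-3):
    print(s, log_likelihood(s), penalized_log_likelihood(s))
```

As `sigma1` shrinks, the unpenalized value keeps increasing, whereas the `-beta / var` term dominates the penalized value and drives it towards minus infinity, so the penalized objective no longer rewards the degenerate solution.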
Inadequacy of Interval Estimates Corresponding to Variational Bayesian Approximations
- In AISTATS05 (eds R.G. Cowell and Z. Ghahramani), Society for Artificial Intelligence and Statistics
, 2005
- Cited by 6 (1 self)
In this paper we investigate the properties of the covariance matrices associated with variational Bayesian approximations, based on data from mixture models, and compare them with the true covariance matrices, corresponding to Fisher information matrices. It is shown
An Algorithm for Unsupervised Learning via Normal Mixture Models
- In ISIS: Information, Statistics and Induction in Science
, 1996
- Cited by 4 (2 self)
We consider the approach to unsupervised learning whereby a normal mixture model is fitted to the data by maximum likelihood. An algorithm called NMM is presented that enables the normal mixture model with either restricted or unrestricted component covariance matrices to be fitted to a given data set. The algorithm automatically handles the problem of the specification of initial values for the parameters in the iterative fitting of the model within the framework of the EM algorithm. The algorithm also has the provision to carry out a test for the number of components on the basis of the likelihood ratio statistic. Keywords: Mixture models, Maximum likelihood, EM algorithm, Likelihood ratio test. Area of Interest: Concept Formation and Classification. 1 Introduction In this paper we consider the development of an algorithm for the fitting of a normal mixture model in the absence of data on entities that have been classified with respect to the components of the mixture. This is usual...

