Results 1–10 of 23
Bayesian reconstructions from emission tomography data using a modified EM algorithm
IEEE Trans. Med. Imag.
, 1990
Abstract

Cited by 191 (3 self)
A new method of reconstruction from SPECT data is proposed, which builds on the EM approach to maximum likelihood reconstruction from emission tomography data, but aims instead at maximum posterior probability estimation, which takes account of prior belief about “smoothness” in the isotope concentration. A novel modification to the EM algorithm yields a practical method. The method is illustrated by an application to data from brain scans.
Unsupervised Learning from Dyadic Data
, 1998
Abstract

Cited by 100 (9 self)
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from each set. This includes event co-occurrences, histogram data, and single-stimulus preference data as special cases. Dyadic data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework for unsupervised learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures and unifies probabilistic modeling and structure discovery. Mixture models provide both a parsimonious yet flexible parameterization of probability distributions with good generalization performance on sparse data and structural information about data-inherent grouping structure. We propose an annealed version of the standard Expectation-Maximization algorithm for model fitting, which is empirically evaluated on a variety of data sets from different domains.
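The annealed EM variant described above tempers the E-step posteriors with an inverse temperature that is raised toward 1 during fitting. A minimal sketch of such a tempered E-step (the function name, argument layout, and annealing schedule are illustrative assumptions, not the paper's notation):

```python
import numpy as np

def annealed_e_step(joint, beta):
    """Tempered responsibilities for an annealed EM E-step.

    joint : (n, k) array of w_i * p(x_t | class i) for each datum and class.
    beta  : inverse temperature in (0, 1]; beta = 1 recovers the standard
            EM E-step, while smaller beta flattens the posteriors.
    """
    powered = joint ** beta
    return powered / powered.sum(axis=1, keepdims=True)

# During fitting, beta is typically raised toward 1 on a schedule,
# e.g. beta = min(1.0, 1.1 * beta) after each EM sweep.
```

Flattening the posteriors at small beta keeps early iterations from committing to a poor hard clustering, which is the usual motivation for deterministic annealing.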
Convergence results for the EM Approach to Mixtures of Experts Architectures
Neural Networks
, 1995
Abstract

Cited by 96 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm whose search direction has a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
Update rules for parameter estimation in Bayesian networks
, 1997
Abstract

Cited by 53 (2 self)
This paper reexamines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in online learning [12]. We provide a unified framework for parameter estimation that encompasses both online learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new online and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.

1 Introduction

Over the past few years, there has been a growing interest in the problem of le...
A Comparison of New and Old Algorithms for A Mixture Estimation Problem
 Machine Learning
, 1995
Abstract

Cited by 34 (13 self)
We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG_η). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EM_η which, for η = 1, gives the usual EM update. Experimentally, both the EM_η update and the EG_η update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG_η algorithm.

1. Introduction

The problem of maximum-likelihood (ML) estimation of a mixture of de...
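For a mixture with fixed, precomputed component densities, the two updates being compared can be sketched as follows (a minimal NumPy sketch under the stated setup; `P` holding the component densities p_i(x_t) and the exact per-step form are assumptions for illustration, not the paper's code):

```python
import numpy as np

def em_update(w, P):
    """One EM step for the proportion vector w with fixed component densities.

    P : (n, k) array with P[t, i] = p_i(x_t).
    """
    mix = P @ w                                  # p(x_t) = sum_i w_i p_i(x_t)
    return w * (P / mix[:, None]).mean(axis=0)   # posterior mass per component

def eg_update(w, P, eta=1.0):
    """One exponentiated-gradient (EG_eta) step for w."""
    grad = (P / (P @ w)[:, None]).mean(axis=0)   # gradient of avg log-likelihood
    w_new = w * np.exp(eta * grad)
    return w_new / w_new.sum()                   # re-normalize onto the simplex
```

Both updates keep `w` on the probability simplex; the EG step multiplies each proportion by an exponentiated gradient component instead of averaging posteriors, which is the source of the different convergence behavior the abstract reports.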
Self-Organizing Maps, Vector Quantization, and Mixture Modeling
 IEEE Transactions on Neural Networks
, 2001
Abstract

Cited by 25 (0 self)
Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive EM algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. As an illustration we apply a self-organizing map based on a multinomial distribution to market basket analysis.

I. Introduction

Self-organizing maps are popular tools for clustering and visualization of high-dimensional data [1], [2]. The well-known Kohonen learning algorithm can be interpreted as a variant of vector quantization with additional lateral interactions [3], [4]. The addition of lateral interaction between units introduces a sense of topology, such that neighboring units represent inputs that are close together in input space [...
Batch and online parameter estimation of Gaussian mixtures based on the joint entropy
 In Neural Information Processing Systems
, 1998
Abstract

Cited by 15 (1 self)
We describe a new iterative method for parameter estimation of Gaussian mixtures. The new method is based on a framework developed by Kivinen and Warmuth for supervised online learning. In contrast to gradient descent and EM, which estimate the mixture’s covariance matrices, the proposed method estimates the inverses of the covariance matrices. Furthermore, the new parameter estimation procedure can be applied in both online and batch settings. We show experimentally that it is typically faster than EM, usually requiring about half as many iterations. We also describe experiments with digit recognition that demonstrate the merits of the online version.
Penalized Maximum Likelihood Estimator for Normal Mixtures
, 2000
Abstract

Cited by 11 (3 self)
The estimation of the parameters of a mixture of Gaussian densities is considered within the framework of maximum likelihood. Due to the unboundedness of the likelihood function, the maximum likelihood estimator fails to exist. We adopt a solution to likelihood function degeneracy which consists in penalizing the likelihood function. The resulting penalized likelihood function is then bounded over the parameter space, and the existence of the penalized maximum likelihood estimator is guaranteed. As an original contribution we provide asymptotic properties, and in particular a consistency proof, for the penalized maximum likelihood estimator. Numerical examples are provided in the finite-data case, comparing the performance of the penalized estimator with that of the standard one.
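The degeneracy arises because collapsing one component's variance onto a single data point drives the likelihood to infinity; a penalty that tends to minus infinity as any variance shrinks restores boundedness. A minimal 1-D sketch (the inverse-gamma-style penalty form and the constants `a`, `b` are illustrative assumptions, not the paper's specific choice):

```python
import numpy as np

def penalized_loglik(x, w, mu, var, a=1.0, b=0.1):
    """Penalized log-likelihood of a 1-D Gaussian mixture.

    x : (n,) data;  w, mu, var : (k,) mixture weights, means, variances.
    The penalty -sum_i (b / var_i + a * log var_i) tends to -inf as any
    var_i -> 0, so the penalized objective stays bounded above.
    """
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    loglik = np.log(dens.sum(axis=1)).sum()
    penalty = -np.sum(b / var + a * np.log(var))
    return loglik + penalty
```

A degenerate solution gains only about -(1/2) log var_i from the collapsing spike, while the -b / var_i term diverges much faster, so the penalized objective rejects it.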
Maximum likelihood estimation of constellation vectors for blind separation of cochannel BPSK signals and its performance analysis
IEEE Trans. Sig. Proc.
, 1997
Abstract

Cited by 8 (0 self)
In this paper, we present a method for blind separation of co-channel BPSK signals arriving at an antenna array. This method consists of two parts: maximum likelihood constellation estimation and assignment. We show that at high SNR, the maximum likelihood constellation estimation is well approximated by the smallest-distance clustering algorithm, which we proposed earlier on heuristic grounds. We observe that both these methods for estimating the constellation vectors perform very well at high SNR and nearly attain the Cramér–Rao bounds. Using this fact, and noting that the assignment algorithm causes negligible error at high SNR, we derive upper bounds on the probability of bit error for the above method at high SNR. These upper bounds fall very rapidly with increasing SNR, showing that our constellation estimation-assignment approach is very efficient. Simulation results are given to demonstrate the usefulness of the bounds.