Results 1 - 10 of 32
Space-Alternating Generalized Expectation-Maximization Algorithm
- IEEE Trans. Signal Processing, 1994
Cited by 102 (21 self)
The expectation-maximization (EM) method can facilitate maximizing likelihood functions that arise in statistical estimation problems. In the classical EM paradigm, one iteratively maximizes the conditional loglikelihood of a single unobservable complete data space, rather than maximizing the intractable likelihood function for the measured or incomplete data. EM algorithms update all parameters simultaneously, which has two drawbacks: 1) slow convergence, and 2) difficult maximization steps due to coupling when smoothness penalties are used. This paper describes the space-alternating generalized EM (SAGE) method, which updates the parameters sequentially by alternating between several small hidden-data spaces defined by the algorithm designer. We prove that the sequence of estimates monotonically increases the penalized-likelihood objective, we derive asymptotic convergence rates, and we provide sufficient conditions for monotone convergence in norm. Two signal processing applicatio...
Convergence results for the EM Approach to Mixtures of Experts Architectures
- Neural Networks, 1995
Cited by 89 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
Convergence of a stochastic approximation version of the EM algorithm
- 1997
Cited by 47 (7 self)
The Expectation Maximization (EM) algorithm is a powerful computational technique for locating maxima of functions...
Accelerating EM for large databases
- Machine Learning, 2001
Cited by 27 (1 self)
The EM algorithm is a popular method for parameter estimation in a variety of problems involving missing data. However, the EM algorithm often requires significant computational resources and has been dismissed as impractical for large databases. We present two approaches that significantly reduce the computational cost of applying the EM algorithm to databases with a large number of cases, including databases with large dimensionality. Both approaches are based on partial E-steps for which we can use the results of Neal and Hinton (1998) to obtain the standard convergence guarantees of EM. The first approach is a version of the incremental EM, described in Neal and Hinton (1998), which cycles through data cases in blocks. The number of cases in each block dramatically affects the efficiency of the algorithm. We provide a method for selecting a near optimal block size. The second approach, which we call lazy EM, will, at scheduled iterations, evaluate the significance of each data case and then proceed for several iterations actively using only the significant cases. We demonstrate that both methods can significantly reduce computational costs through their application to high-dimensional real-world and synthetic mixture modeling problems for large databases. Keywords: Expectation Maximization Algorithm, incremental EM, lazy EM, online EM, data blocking, mixture models, clustering.
Learning Probabilistic Networks
- The Knowledge Engineering Review, 1998
Cited by 27 (1 self)
A probabilistic network is a graphical model that encodes probabilistic relationships between variables of interest. Such a model records qualitative influences between variables in addition to the numerical parameters of the probability distribution. As such it provides an ideal form for combining prior knowledge, which might be limited solely to experience of the influences between some of the variables of interest, and data. In this paper, we first show how data can be used to revise initial estimates of the parameters of a model. We then progress to showing how the structure of the model can be revised as data is obtained. Techniques for learning with incomplete data are also covered.
A Component-wise EM Algorithm for Mixtures
- 1999
Cited by 23 (2 self)
In some situations, EM algorithm shows slow convergence problems. One possible reason is that standard procedures update the parameters simultaneously. In this paper we focus on finite mixture estimation. In this framework, we propose a component-wise EM, which updates the parameters sequentially. We give an interpretation of this procedure as a proximal point algorithm and use it to prove the convergence. Illustrative numerical experiments show how our algorithm compares to EM and a version of the SAGE algorithm.
Accelerated Quantification of Bayesian Networks with Incomplete Data
- In Proceedings of First International Conference on Knowledge Discovery and Data Mining, 1995
Cited by 21 (1 self)
Probabilistic expert systems based on Bayesian networks (BNs) require initial specification of both a qualitative graphical structure and quantitative assessment of conditional probability tables. This paper considers statistical batch learning of the probability tables on the basis of incomplete data and expert knowledge. The EM algorithm with a generalized conjugate gradient acceleration method has been dedicated to quantification of BNs by maximum posterior likelihood estimation for a super-class of the recursive graphical models. This new class of models allows a great variety of local functional restrictions to be imposed on the statistical model, which hereby extends the control and applicability of the constructed method for quantifying BNs. Introduction The construction of probabilistic expert systems (Pearl 1988, Andreassen et al. 1989) based on Bayesian networks (BNs) is often a challenging process. It is typically divided into two parts: First the constructi...
On Stochastic Versions of the EM Algorithm
- 1995
Cited by 18 (0 self)
We compare three different stochastic versions of the EM algorithm: the SEM algorithm, the SAEM algorithm and the MCEM algorithm. We suggest that the most relevant contribution of the MCEM methodology is what we call the simulated annealing MCEM algorithm, which turns out to be very close to SAEM. We focus particularly on the mixture of distributions problem. In this context, we review the available theoretical results on the convergence of these algorithms and on the behavior of SEM as the sample size tends to infinity. The second part is devoted to intensive Monte Carlo numerical simulations and a real data study. We show that, for some particular mixture situations, the SEM algorithm is almost always preferable to the EM and simulated annealing versions SAEM and MCEM. For some very intricate mixtures, however, none of these algorithms can be confidently used. Then, SEM can be used as an efficient data exploratory tool for locating significant maxima of the likelihood function. In the real data case, we show that the SEM stationary distribution provides a contrasted view of the loglikelihood by emphasizing sensible maxima.
A Fast and Robust General Purpose Clustering Algorithm
- In Pacific Rim International Conference on Artificial Intelligence, 2000
Cited by 12 (2 self)
General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-Means has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-Means has several disadvantages derived from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable, multi-dimensional but is more robust to noise and outliers. We achieve this by using the discrete median rather than the mean as the estimator of the center of a cluster. Comparison with k-Means, Expectation Maximization and Gibbs sampling demonstrates the advantages of our algorithm.
Incremental Model-Based Clustering for Large Datasets with Small Clusters
- Journal of Computational and Graphical Statistics, 2003
Cited by 10 (5 self)
Clustering is often useful for analyzing and summarizing information within large datasets. Model-based clustering methods have been found to be effective for determining the number of clusters, dealing with outliers, and selecting the best clustering method in datasets that are small to moderate in size. For large datasets, current model-based clustering methods tend to be limited by memory and time requirements and the increasing difficulty of maximum likelihood estimation. They may fit too many clusters in some portions of the data and/or miss clusters containing relatively few observations.

