Results 1–6 of 6
Convergence results for the EM Approach to Mixtures of Experts Architectures
Neural Networks, 1995
Abstract

Cited by 96 (6 self)
The Expectation-Maximization (EM) algorithm is an iterative approach to maximum likelihood parameter estimation. Jordan and Jacobs recently proposed an EM algorithm for the mixture of experts architecture of Jacobs, Jordan, Nowlan and Hinton (1991) and the hierarchical mixture of experts architecture of Jordan and Jacobs (1992). They showed empirically that the EM algorithm for these architectures yields significantly faster convergence than gradient ascent. In the current paper we provide a theoretical analysis of this algorithm. We show that the algorithm can be regarded as a variable metric algorithm whose search direction has a positive projection on the gradient of the log likelihood. We also analyze the convergence of the algorithm and provide an explicit expression for the convergence rate. In addition, we describe an acceleration technique that yields a significant speedup in simulation experiments.
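The ascent property behind this convergence analysis can be shown with a minimal sketch: one EM pass for a 1-D two-component Gaussian mixture with a fixed shared variance (an illustrative assumption, not the paper's mixture-of-experts architecture), checking that each step never decreases the log likelihood.

```python
# Minimal sketch, assuming a 1-D two-component Gaussian mixture with a fixed,
# shared variance (illustrative; not the mixture-of-experts model analyzed in
# the paper). Each EM step should never decrease the log likelihood.
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def log_likelihood(data, params):
    w, mu1, mu2, s = params
    return sum(math.log(w * normal_pdf(x, mu1, s) + (1 - w) * normal_pdf(x, mu2, s))
               for x in data)

def em_step(data, params):
    w, mu1, mu2, s = params
    # E-step: posterior responsibility of component 1 for each point
    r = [w * normal_pdf(x, mu1, s) /
         (w * normal_pdf(x, mu1, s) + (1 - w) * normal_pdf(x, mu2, s)) for x in data]
    # M-step: re-estimate the mixing weight and the two means (sigma held fixed)
    n1 = sum(r)
    w_new = n1 / len(data)
    mu1_new = sum(ri * x for ri, x in zip(r, data)) / n1
    mu2_new = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
    return (w_new, mu1_new, mu2_new, s)

random.seed(0)
data = [random.gauss(-2, 1) for _ in range(200)] + [random.gauss(3, 1) for _ in range(200)]
params = (0.5, -1.0, 1.0, 1.0)
for _ in range(20):
    new_params = em_step(data, params)
    # EM's ascent property: the log likelihood is non-decreasing
    assert log_likelihood(data, new_params) >= log_likelihood(data, params) - 1e-9
    params = new_params
```

Because the M-step maximizes the expected complete-data log likelihood exactly over the free parameters, each iteration is guaranteed not to decrease the observed-data log likelihood, which is what the assertion checks.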
Update rules for parameter estimation in Bayesian networks
1997
Abstract

Cited by 53 (2 self)
This paper reexamines the problem of parameter estimation in Bayesian networks with missing values and hidden variables from the perspective of recent work in online learning [12]. We provide a unified framework for parameter estimation that encompasses both online learning, where the model is continuously adapted to new data cases as they arrive, and the more traditional batch learning, where a pre-accumulated set of samples is used in a one-time model selection process. In the batch case, our framework encompasses both the gradient projection algorithm [2, 3] and the EM algorithm [14] for Bayesian networks. The framework also leads to new online and batch parameter update schemes, including a parameterized version of EM. We provide both empirical and theoretical results indicating that parameterized EM allows faster convergence to the maximum likelihood parameters than does standard EM.

1 Introduction

Over the past few years, there has been a growing interest in the problem of le...
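The parameterized-EM idea can be sketched in the simplest possible setting, estimating only the mixing weight of two fixed, known densities: the new estimate moves from the old one toward the plain EM target by a step size eta, so eta = 1 recovers standard EM and eta > 1 overshoots. The two fixed normal components and the clipping are assumptions for illustration, not the paper's exact scheme for Bayesian networks.

```python
# Hedged sketch of a parameterized EM step for the mixing weight w of two
# fixed, known normal densities (means -2 and +2, unit variance). eta = 1
# recovers the standard EM update; eta > 1 overshoots toward the EM target.
import math
import random

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def em_target(w, data):
    # standard EM update: average posterior responsibility of component 1
    return sum(w * npdf(x, -2) / (w * npdf(x, -2) + (1 - w) * npdf(x, 2))
               for x in data) / len(data)

def parameterized_em_step(w, data, eta):
    w_new = w + eta * (em_target(w, data) - w)
    return min(max(w_new, 1e-6), 1 - 1e-6)  # keep w a valid probability

random.seed(1)
data = [random.gauss(-2, 1) for _ in range(150)] + [random.gauss(2, 1) for _ in range(50)]
w_std = w_over = 0.5
for _ in range(10):
    w_std = parameterized_em_step(w_std, data, eta=1.0)    # plain EM
    w_over = parameterized_em_step(w_over, data, eta=1.5)  # over-relaxed step
```

Both runs head to the same maximum-likelihood weight; the over-relaxed variant simply takes larger steps along the EM direction at each iteration.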
A Comparison of New and Old Algorithms for a Mixture Estimation Problem
Machine Learning, 1995
Abstract

Cited by 34 (13 self)
We investigate the problem of estimating the proportion vector which maximizes the likelihood of a given sample for a mixture of given densities. We adapt a framework developed for supervised learning and give simple derivations for many of the standard iterative algorithms like gradient projection and EM. In this framework, the distance between the new and old proportion vectors is used as a penalty term. The squared distance leads to the gradient projection update, and the relative entropy to a new update which we call the exponentiated gradient update (EG_η). Curiously, when a second-order Taylor expansion of the relative entropy is used, we arrive at an update EM_η which, for η = 1, gives the usual EM update. Experimentally, both the EM_η update and the EG_η update for η > 1 outperform the EM algorithm and its variants. We also prove a polynomial bound on the rate of convergence of the EG_η algorithm.

1. Introduction

The problem of maximum-likelihood (ML) estimation of a mixture of de...
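The EG_η update described above can be sketched for two fixed component densities: each proportion is multiplied by the exponential of η times its partial derivative of the average log likelihood, then the vector is renormalized. The two fixed normal components are an illustrative assumption, not the paper's experimental setup.

```python
# Hedged sketch of the exponentiated-gradient (EG_eta) update for mixture
# proportions: w_i <- w_i * exp(eta * dF/dw_i) / Z, where F is the average
# log likelihood. The two fixed unit-variance normals are assumptions.
import math
import random

def npdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def eg_step(w, data, eta):
    mus = (-2.0, 2.0)
    # gradient of the average log likelihood w.r.t. each proportion
    grad = [sum(npdf(x, mu) / sum(wj * npdf(x, muj) for wj, muj in zip(w, mus))
                for x in data) / len(data) for mu in mus]
    unnorm = [wi * math.exp(eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]  # renormalize onto the simplex

random.seed(2)
data = [random.gauss(-2, 1) for _ in range(120)] + [random.gauss(2, 1) for _ in range(40)]
w = [0.5, 0.5]
for _ in range(50):
    w = eg_step(w, data, eta=1.0)
```

The multiplicative form keeps the proportions positive and, after normalization, on the probability simplex, which is what distinguishes EG_η from an additive gradient-projection step.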
Constrained Nonparametric Estimation via Mixtures
2001
Abstract

Cited by 1 (0 self)
We present a general approach to estimating probability measures constrained to lie in a convex set. We represent constrained measures as mixtures of simple, known extreme measures, and so the problem of estimating a constrained measure becomes one of estimating an unconstrained mixing measure. Convex constraints arise in many modeling situations, such as estimation of the mean and estimation under stochastic ordering constraints. We describe mixture representation techniques for these and other situations, and discuss applications to maximum likelihood and Bayesian estimation.
Statistical Models for Co-occurrence Data
1998
Abstract
Modeling and predicting co-occurrences of events is a fundamental problem of unsupervised learning. In this contribution we develop a statistical framework for analyzing co-occurrence data in a general setting where elementary observations are joint occurrences of pairs of abstract objects from two finite sets. The main challenge for statistical models in this context is to overcome the inherent data sparseness and to estimate the probabilities for pairs which were rarely observed or even unobserved in a given sample set. Moreover, it is often of considerable interest to extract grouping structure or to find a hierarchical data organization. A novel family of mixture models is proposed which explain the observed data by a finite number of shared aspects or clusters. This provides a common framework for statistical inference and structure discovery and also includes several recently proposed models as special cases. Adopting the maximum likelihood principle, EM algorithms are d...
Fitting Finite Mixture Model to Exchange Rate Using Maximum Likelihood Estimation
Abstract
The exchange rate has a great influence on the inflation and economic growth of a country; the importance of a currency lies in the strong effect that changes in the exchange rate have on import and export prices. Thus, maximum likelihood estimation (MLE) is used to fit a finite mixture model. In this paper, a two-component mixture of normal distributions is used to analyze the returns of the nominal monthly exchange rates for Malaysia, Thailand and the Philippines using maximum likelihood estimation. The data collected for this paper cover July 2005 until September 2012.

Key words: Exchange rate, maximum likelihood estimation, finite
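A two-component normal mixture of the kind described can be fitted by maximum likelihood via EM. The synthetic "returns" below (a calm and a volatile regime) stand in for the exchange-rate data and are an assumption, not the paper's dataset.

```python
# Illustrative sketch: maximum-likelihood fit of a two-component normal
# mixture to a series of returns via EM. The synthetic calm/volatile
# regimes are assumptions standing in for the exchange-rate data.
import math
import random

def npdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

def em_fit(data, iters=100):
    # initial guesses: equal weights, zero means, one tight and one wide sigma
    w, mu1, mu2, s1, s2 = 0.5, 0.0, 0.0, 0.01, 0.05
    for _ in range(iters):
        # E-step: responsibility of component 1 for each return
        r = [w * npdf(x, mu1, s1) /
             (w * npdf(x, mu1, s1) + (1 - w) * npdf(x, mu2, s2)) for x in data]
        n1 = sum(r)
        n2 = len(data) - n1
        # M-step: weighted means, variances, and mixing weight
        mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / n2
        s1 = math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, data)) / n1)
        s2 = math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, data)) / n2)
        w = n1 / len(data)
    return w, mu1, mu2, s1, s2

random.seed(3)
returns = ([random.gauss(0.0, 0.01) for _ in range(300)] +
           [random.gauss(0.0, 0.05) for _ in range(100)])
w, mu1, mu2, s1, s2 = em_fit(returns)
```

With regimes that differ mainly in volatility, the fitted components separate by standard deviation rather than by mean, which is the usual motivation for mixture models of financial returns.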