Results 1 – 10 of 16
Inducing Features of Random Fields
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997
Cited by 579 (12 self)
We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the Kullback-Leibler divergence between the model and the empirical distribution of the training data. A greedy algorithm determines how features are incrementally added to the field, and an iterative scaling algorithm is used to estimate the optimal values of the weights. The random field models and techniques introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are non-Markovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration of the method, we describe its application to the problem of automatic word classification ...
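The weight-training step this abstract describes, iterative scaling toward empirical feature expectations, can be sketched on a toy log-linear model. The domain, features, and samples below are invented for illustration; the greedy feature-induction stage is omitted.

```python
import numpy as np

# Toy sketch of iterative scaling for the weights of a log-linear model
# p(x) proportional to exp(sum_i w_i f_i(x)) over a tiny discrete domain.
# Features and data here are illustrative assumptions, not from the paper.
F = np.array([[1.0, 0.0],   # rows: feature vector f(x) for x = 0..3
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
data = np.array([0, 1, 1, 2])            # observed samples
emp = F[data].mean(axis=0)               # empirical feature expectations

C = F.sum(axis=1).max()                  # scaling constant (max feature sum)
w = np.zeros(F.shape[1])
for _ in range(500):
    p = np.exp(F @ w)
    p /= p.sum()                         # current model distribution
    w += np.log(emp / (p @ F)) / C       # damped iterative-scaling update

p = np.exp(F @ w)
p /= p.sum()
print(np.round(p @ F, 4))               # model expectations match [0.75 0.75]
```

At the fixed point the model's feature expectations equal the empirical ones, which is exactly the maximum-likelihood condition for a log-linear model.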
Information Geometry of the EM and em Algorithms for Neural Networks
Neural Networks, 1995
Cited by 105 (9 self)
In order to realize an input-output relation given by noise-contaminated examples, it is effective to use a stochastic model of neural networks. A model network includes hidden units whose activation values are neither specified nor observed. It is useful to estimate the hidden variables from the observed or specified input-output data based on the stochastic model. Two algorithms, the EM- and em-algorithms, have so far been proposed for this purpose. The EM-algorithm is an iterative statistical technique using the conditional expectation, and the em-algorithm is a geometrical one given by information geometry. The em-algorithm iteratively minimizes the Kullback-Leibler divergence in the manifold of neural networks. These two algorithms are equivalent in most cases. The present paper gives a unified information-geometrical framework for studying stochastic models of neural networks, by focusing on the EM and em algorithms, and proves a condition which guarantees their equivalence ...
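A minimal concrete instance of the EM iteration the abstract discusses, sketched for a two-component 1-D Gaussian mixture. The data, seed, and initial values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# EM for a two-component 1-D Gaussian mixture: alternate between computing
# posterior responsibilities for the hidden component labels (E-step) and
# re-estimating parameters from them (M-step).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 400), rng.normal(3.0, 1.0, 400)])

mix = np.array([0.5, 0.5])               # mixing weights
mu = np.array([-1.0, 1.0])               # component means
sd = np.array([1.0, 1.0])                # component standard deviations
for _ in range(50):
    # E-step: responsibilities of each component for each sample
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = mix * dens
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from expected sufficient statistics
    n = r.sum(axis=0)
    mix = n / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n
    sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / n)
```

Each sweep cannot decrease the training-data likelihood; in the information-geometric view of the abstract, the two steps are the alternating e- and m-projections that the em-algorithm formalizes as KL-divergence minimizations.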
The Latent Maximum Entropy Principle
In Proc. of ISIT, 2002
Cited by 17 (3 self)
We present an extension to Jaynes' maximum entropy principle that handles latent variables. The principle of latent maximum entropy we propose is different from both Jaynes' maximum entropy principle and maximum likelihood estimation, but often yields better estimates in the presence of hidden variables and limited training data. We first show that solving for a latent maximum entropy model poses a hard nonlinear constrained optimization problem in general. However, we then show that feasible solutions to this problem can be obtained efficiently for the special case of log-linear models, which forms the basis for an efficient approximation to the latent maximum entropy principle. We derive an algorithm that combines expectation-maximization with iterative scaling to produce feasible log-linear solutions. This algorithm can be interpreted as an alternating minimization algorithm in the information divergence, and reveals an intimate connection between the latent maximum entropy and maximum likelihood principles.
Generalization And Maximum Likelihood From Small Data Sets
Proc. IEEE-SP Workshop on Neural Networks for Signal Processing, 1993
Cited by 8 (6 self)
INTRODUCTION: An often-encountered learning problem is maximum likelihood training of exponential models. When the state is only partially specified by the training data, iterative training algorithms are used to produce a sequence of models that assign increasing likelihood to the training data. Although the performance as measured on the training set continues to improve as the algorithms progress, performance on related data sets may eventually begin to deteriorate. The cause of this behavior can be seen when the training problem is stated in the Alternating Minimization framework [1]. A modified maximum likelihood training criterion is suggested to counter this behavior. It leads to a simple modification of the learning algorithms which relates generalization to learning speed. Training Boltzmann Machines [2] and Hidden Markov Models [3, 4, 5, 6] is discussed under this modified criterion. PROBLEM STATEMENT: A detailed presentation of this material is available ...
Elementary Function Generators for Neural Network Emulators
IEEE Transactions on Neural Networks, 2000
Cited by 7 (0 self)
Abstract—Piecewise first- and second-order approximations are employed to design commonly used elementary function generators for neural-network emulators. Three novel schemes are proposed for the first-order approximations. The first scheme requires one multiplication, one addition, and a 28-byte lookup table. The second scheme requires one addition, a 14-byte lookup table, and no multiplication. The third scheme needs a 14-byte lookup table, no multiplication, and no addition. A second-order approximation approach provides better function precision; it requires more hardware and involves the computation of one multiplication and two additions and access to a 28-byte lookup table. We consider bit-serial implementations of the schemes to reduce the hardware cost. The maximum delay for the four schemes ranges from 24 to 32 bit-serial machine cycles; the second-order approximation approach has the largest delay. The proposed approach can be applied to compute other elementary functions with proper consideration. Index Terms—Elementary function generators, hardwired neuroemulators, neural-network functions, piecewise approximation, square root implementation, trigonometric functions.
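The first-order lookup-table idea can be sketched in software. This illustrates only the general piecewise-linear scheme (one multiply, one add, one table read per evaluation); the segment count, the sigmoid target, and the use of floating point are assumptions of this sketch, not the paper's bit-serial hardware designs.

```python
import numpy as np

# Piecewise first-order (linear) approximation of the sigmoid on [-4, 4]:
# split the range into uniform segments and store a (slope, intercept)
# pair per segment, so evaluation is slope[i]*x + intercept[i].
SEGMENTS = 16
LO, HI = -4.0, 4.0
edges = np.linspace(LO, HI, SEGMENTS + 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Precompute the table: chord slope and intercept on each segment.
x0, x1 = edges[:-1], edges[1:]
slope = (sigmoid(x1) - sigmoid(x0)) / (x1 - x0)
intercept = sigmoid(x0) - slope * x0

def sigmoid_pwl(x):
    x = np.asarray(x, dtype=float)
    i = np.clip(((x - LO) / (HI - LO) * SEGMENTS).astype(int), 0, SEGMENTS - 1)
    return slope[i] * x + intercept[i]   # one multiply, one add, one lookup

xs = np.linspace(LO, HI, 1000)
err = np.max(np.abs(sigmoid_pwl(xs) - sigmoid(xs)))
```

With 16 segments of width 0.5, the chord-interpolation error is bounded by (h^2/8) max|f''|, roughly 3e-3 for the sigmoid, which is why small tables suffice in practice.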
Partial likelihood for realtime signal processing with finite normal mixtures
In Proc. IEEE Workshop on Neural Networks for Signal Processing VIII, 1998
Cited by 5 (2 self)
We introduce a unified framework for nonlinear signal processing with finite normal mixtures (FNM) by using maximum partial likelihood (MPL) theory. We show that the equivalence of MPL to accumulated relative entropy (ARE) minimization is valid for the FNM. Then, we define the information geometry of MPL and use the result to derive the em algorithm for distribution learning based on the FNM model. The superior convergence of the em algorithm as compared to the least relative entropy (LRE) and backpropagation algorithms is demonstrated by simulations. We also discuss the performance of the FNM-based equalizers with different numbers of mixtures and observation vector sizes.
IPF for discrete chain factor graphs
Uncertainty in Artificial Intelligence (UAI), 2002
Cited by 3 (1 self)
Iterative Proportional Fitting (IPF), combined with EM, is commonly used as an algorithm for likelihood maximization in undirected graphical models. In this paper, we present two iterative algorithms that generalize upon IPF. The first one is for likelihood maximization in discrete chain factor graphs, which we define as a wide class of discrete-variable models including undirected graphical models and Bayesian networks, but also chain graphs and sigmoid belief networks.
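Classical IPF, the algorithm the paper generalizes, can be sketched on a small contingency table. The starting table and target marginals below are illustrative assumptions.

```python
import numpy as np

# Iterative Proportional Fitting: alternately rescale rows and columns of a
# positive table so it matches target marginals, preserving the table's
# interaction (cross-product) structure.
T = np.array([[1.0, 2.0, 1.0],
              [2.0, 1.0, 3.0]])          # strictly positive starting table
row = np.array([0.2, 0.8])               # target row sums
col = np.array([0.3, 0.3, 0.4])          # target column sums

for _ in range(200):
    T *= (row / T.sum(axis=1))[:, None]  # rescale rows to match row sums
    T *= (col / T.sum(axis=0))[None, :]  # rescale columns to match col sums
```

Each rescaling is a minimum-KL (I-)projection onto one marginal constraint set, which is why IPF is an alternating minimization and why it composes naturally with EM when some variables are latent.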
Graphical Models: Parameter Learning
In Handbook of Brain Theory and Neural Networks, 2003
Information Geometry and Maximum Likelihood Criteria
Princeton University, 1996
Cited by 2 (1 self)
This paper presents a brief comparison of two information geometries as they are used to describe the EM algorithm for maximum likelihood estimation from incomplete data. The Alternating Minimization framework based on the I-geometry developed by Csiszár is presented first, followed by the em-algorithm of Amari. Following a comparison of these algorithms, a discussion of a variation in likelihood criterion is presented. The EM algorithm is usually formulated so as to improve the marginal likelihood criterion (as described in Section 2.1). Closely related algorithms also exist which are intended to maximize different likelihood criteria. The 1-Best criterion, for example, leads to the Viterbi training algorithm used in Hidden Markov Modeling. This criterion has an information-geometric description that results from a minor modification of the marginal likelihood formulation. The techniques discussed here are not given in rigorous detail, but rather at a level intended to allow comparison between them; the works cited in the bibliography should be consulted for complete and correct presentations of all methods discussed.
Interior Point Implementations of Alternating Minimization Training
1994
Cited by 2 (1 self)
This paper presents an alternating minimization algorithm used to train radial basis function networks. The algorithm is a modification of an interior point method used in solving primal linear programs. The resulting algorithm is shown to have a convergence rate on the order of √n L iterations, where n is a measure of the network size and L is a measure of the resulting solution's accuracy.