Results 1  10
of
72
Hierarchical mixtures of experts and the EM algorithm
 Neural Computation
, 1994
"... We present a treestructured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood ..."
Abstract

Cited by 723 (19 self)
 Add to MetaCart
We present a treestructured architecture for supervised learning. The statistical model underlying the architecture is a hierarchical mixture model in which both the mixture coefficients and the mixture components are generalized linear models (GLIM’s). Learning is treated as a maximum likelihood problem; in particular, we present an ExpectationMaximization (EM) algorithm for adjusting the parameters of the architecture. We also develop an online learning algorithm in which the parameters are updated incrementally. Comparative simulation results are presented in the robot dynamics domain. 1
Active Learning with Statistical Models
, 1995
"... For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statisticallybas ..."
Abstract

Cited by 529 (10 self)
 Add to MetaCart
For manytypes of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992# Cohn, 1994]. We then showhow the same principles may be used to select data for two alternative, statisticallybased learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.
Supervised learning from incomplete data via an EM approach
 Advances in Neural Information Processing Systems 6
, 1994
"... Realworld learning tasks may involve highdimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeal ..."
Abstract

Cited by 184 (2 self)
 Add to MetaCart
Realworld learning tasks may involve highdimensional data sets with arbitrary patterns of missing data. In this paper we present a framework based on maximum likelihood density estimation for learning from such data sets. We use mixture models for the density estimates and make two distinct appeals to the ExpectationMaximization (EM) principle (Dempster et al., 1977) in deriving a learning algorithmEM is used both for the estimation of mixture components and for coping with missing data. The resulting algorithm is applicable to a wide range of supervised as well as unsupervised learning problems. Results from a classification benchmarkthe iris data setare presented. 1 Introduction Adaptive systems generally operate in environments that are fraught with imperfections; nonetheless they must cope with these imperfections and learn to extract as much relevant information as needed for their particular goals. One form of imperfection is incompleteness in sensing information. Inc...
Image segmentation in video sequences: A probabilistic approach
, 1997
"... "Background subtraction" is an old technique for finding moving objects in a video sequencefor example, cars driving on a freeway. The idea is that subtracting the current image from a timeaveraged background image will leave only nonstationary objects. It is, however, a crude approximation to t ..."
Abstract

Cited by 178 (0 self)
 Add to MetaCart
"Background subtraction" is an old technique for finding moving objects in a video sequencefor example, cars driving on a freeway. The idea is that subtracting the current image from a timeaveraged background image will leave only nonstationary objects. It is, however, a crude approximation to the task of classifying each pixel of the current image
On Convergence Properties of the EM Algorithm for Gaussian Mixtures
 Neural Computation
, 1995
"... We build up the mathematical connection between the "ExpectationMaximization" (EM) algorithm and gradientbased approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P,andwe provide ..."
Abstract

Cited by 142 (13 self)
 Add to MetaCart
We build up the mathematical connection between the "ExpectationMaximization" (EM) algorithm and gradientbased approaches for maximum likelihood learning of finite Gaussian mixtures. We show that the EM step in parameter space is obtained from the gradient via a projection matrix P,andwe provide an explicit expression for the matrix. We then analyze the convergence of EM in terms of special properties of P and provide new results analyzing the effect that P has on the likelihood surface. Based on these mathematical results, we present a comparative discussion of the advantages and disadvantages of EM and other algorithms for the learning of Gaussian mixture models.
Neural network exploration using optimal experiment design
 Neural Networks
, 1994
"... We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We de ..."
Abstract

Cited by 124 (2 self)
 Add to MetaCart
We consider the question "How should one act when the only goal is to learn as much as possible?" Building on the theoretical results of Fedorov [1972] and MacKay [1992], we apply techniques from Optimal Experiment Design (OED) to guide the query/action selection of a neural network learner. We demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely.We conclude that, while not a panacea, OEDbased query/action has muchto offer, especially in domains where its high computational costs can be tolerated.
An Improved Adaptive Background Mixture Model for Realtime Tracking with Shadow Detection
, 2001
"... Realtime segmentation of moving regions in image sequences is a fundamental step in many vision systems including automated visual surveillance, humanmachine interface, and very lowbandwidth telecommunications. A typical method is background subtraction. Many background models have been introduce ..."
Abstract

Cited by 123 (3 self)
 Add to MetaCart
Realtime segmentation of moving regions in image sequences is a fundamental step in many vision systems including automated visual surveillance, humanmachine interface, and very lowbandwidth telecommunications. A typical method is background subtraction. Many background models have been introduced to deal with different problems. One of the successful solutions to these problems is to use a multicolour background model per pixel proposed by Grimson et al [1,2,3]. However, the method suffers from slow learning at the beginning, especially in busy environments. In addition, it can not distinguish between moving shadows and moving objects. This paper presents a method which improves this adaptive background mixture model. By reinvestigating the update equations, we utilise different equations at different phases. This allows our system learn faster and more accurately as well as adapt effectively to changing environments. A shadow detection scheme is also introduced in this paper. It is based on a computational colour space that makes use of our background model. A comparison has been made between the two algorithms. The results show the speed of learning and the accuracy of the model using our update algorithm over the Grimson et al's tracker. When incorporate with the shadow detection, our method results in far better segmentation than that of Grimson et al.
Convergence Properties of the KMeans Algorithms
"... This paper studies the convergence properties of the well known KMeans clustering algorithm. The KMeans algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the KMeans algorithm act ..."
Abstract

Cited by 83 (2 self)
 Add to MetaCart
This paper studies the convergence properties of the well known KMeans clustering algorithm. The KMeans algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the KMeans algorithm actually minimizes the quantization error using the very fast Newton algorithm.
Nonlinear Gated Experts for Time Series: Discovering Regimes and Avoiding Overfitting
, 1995
"... this paper: ftp://ftp.cs.colorado.edu/pub/TimeSeries/MyPapers/experts.ps.Z, ..."
Abstract

Cited by 81 (5 self)
 Add to MetaCart
this paper: ftp://ftp.cs.colorado.edu/pub/TimeSeries/MyPapers/experts.ps.Z,
Tracking many objects with many sensors
 In IJCAI99
, 1999
"... Keeping track of multiple objects over time is a problem that arises in many realworld domains. The problem is often complicated by noisy sensors and unpredictable dynamics. Previous work by Huang and Russell, drawing on the data association literature, provided a probabilistic analysis and a thres ..."
Abstract

Cited by 80 (5 self)
 Add to MetaCart
Keeping track of multiple objects over time is a problem that arises in many realworld domains. The problem is often complicated by noisy sensors and unpredictable dynamics. Previous work by Huang and Russell, drawing on the data association literature, provided a probabilistic analysis and a thresholdbased approximation algorithm for the case of multiple objects detected by two spatially separated sensors. This paper analyses the case in which large numbers of sensors are involved. We show that the approach taken by Huang and Russell, who used pairwise sensorbased appearance probabilities as the elementary probabilistic model, does not scale. When more than two observations are made, the objects ' intrinsic properties must be estimated. These provide the necessary conditional independencies to allow a spatial decomposition of the global probability model. We also replace Huang and Russell's threshold algorithm for object identification with a polynomialtime approximation scheme based on Markov chain Monte Carlo simulation. Using sensor data from a freeway traffic simulation, we show that this allows accurate estimation of longrange origin/destination information even when the individual links in the sensor chain are highly unreliable. 1