Results 1  10
of
14
Bayesian Mixture Modeling by Monte Carlo Simulation
, 1991
"... . It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture com ..."
Abstract

Cited by 34 (0 self)
 Add to MetaCart
(Show Context)
. It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a "correct" number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task. Introduction Mixture distributions [8, 20] are an appropriate tool for modeling processes whose output is thought to be generated by several different underlying mechanisms, or to come from several different populations. One aim of a mixture model analysis may be to identify and characterize these underlying "latent classes" [2, 7], either for some scient...
Data Filtering and Distribution Modeling Algorithms for Machine Learning
, 1993
"... vi Acknowledgments vii 1. Introduction 1 1.1 Boosting by majority : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 1.2 Query By Committee : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 1.3 Learning distributions of binary vectors : : ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
vi Acknowledgments vii 1. Introduction 1 1.1 Boosting by majority : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 1.2 Query By Committee : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 1.3 Learning distributions of binary vectors : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8 2. Boosting a weak learning algorithm by majority 10 2.1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10 2.2 The majorityvote game : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 14 2.2.1 Optimality of the weighting scheme : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 2.2.2 The representational power of majority gates : : : : : : : : : : : : : : : : : : : : : : 20 2.3 Boosting a weak learner using a majority vote : : : : : : : : : : : : : : : : : : : : : : : : : : 22 2.3.1 Preliminaries : : : : : : : : : : : : : : : : : : : : : : : : : :...
Modular Neural Networks for MAP Classification of Time Series and the Partition Algorithm
, 1996
"... We apply the Partition Algorithm to the problem of time series classification. We assume that the source that generates the time series belongs to a finite set of candidate sources. Classification is based on the computation of posterior probabilities. Prediction error is used to adaptively update t ..."
Abstract

Cited by 14 (7 self)
 Add to MetaCart
We apply the Partition Algorithm to the problem of time series classification. We assume that the source that generates the time series belongs to a finite set of candidate sources. Classification is based on the computation of posterior probabilities. Prediction error is used to adaptively update the posterior probability of each source. The algorithm is implemented by a hierarchical, modular, recurrent network. The bottom (partition) level of the network consists of neural modules, each one trained to predict the output of one candidate source. The top (decision) level consists of a decision module, which computes posterior probabilities and classifies the time series to the source of maximum posterior probability. The classifier network is formed fi'om the composition of the partition and decision levels. This method applies to deterministic as well as probabilistic time series. Source switching can also be accommodated. We give some examples of application to problems of signal detection, phoneme and enzyme classification. In conclusion, the algorithm presented here gives a systematic method for the design of modular classification networks. The method can be extended by various choices of the partition and decision components.
A Neural Model for MultiExpert Architectures
, 2002
"... We present a generalization of conventional artificial neural networks that allows for a functional equivalence to multiexpert systems. The new model provides an architectural freedom going beyond existing multiexpert models and an integrarive formalism to compare and combine various techniques of ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
We present a generalization of conventional artificial neural networks that allows for a functional equivalence to multiexpert systems. The new model provides an architectural freedom going beyond existing multiexpert models and an integrarive formalism to compare and combine various techniques of learning. (We consider gradient, EM, reinforcement, and unsupervised learn ing.) Its uniform representation aims at a simple netic encoding and evolutionary structure optimization of multiexpert systems. This paper contains a detailed description of the model and learning rules, empirically validates its functionality, and discusses future perspec tives.
Deep Narrow Sigmoid Belief Networks are Universal
, 2007
"... In this paper we show that exponentially deep belief networks [3, 7, 4] can approximate any distribution over binary vectors to arbitrary accuracy, even when the width of each layer is limited to the dimensionality of the data. This resolves an open the problem in [6]. We further show that such netw ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
(Show Context)
In this paper we show that exponentially deep belief networks [3, 7, 4] can approximate any distribution over binary vectors to arbitrary accuracy, even when the width of each layer is limited to the dimensionality of the data. This resolves an open the problem in [6]. We further show that such networks can be greedily learned in an easy yet impractical way.
COULD KNOWLEDGEBASED NEURAL LEARNING BE USEFUL IN DEVELOPMENTAL ROBOTICS? THECASEOFKBCC
, 2007
"... The new field of developmental robotics faces the formidable challenge of implementing effective learning mechanisms in complex, dynamic environments. We make a case that knowledgebased learning algorithms might help to meet this challenge. A constructive neural learning algorithm, knowledgebased ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
The new field of developmental robotics faces the formidable challenge of implementing effective learning mechanisms in complex, dynamic environments. We make a case that knowledgebased learning algorithms might help to meet this challenge. A constructive neural learning algorithm, knowledgebased cascadecorrelation (KBCC), autonomously recruits previouslylearned networks in addition to the single hidden units recruited by ordinary cascadecorrelation. This enables learning by analogy when adequate prior knowledge is available, learning by induction from examples when there is no relevant prior knowledge, and various combinations of analogy and induction. A review of experiments with KBCC indicates that recruitment of relevant existing knowledge typically speeds learning and sometimes enables learning of otherwise impossible problems. Some additional domains of interest to developmental robotics are identified in which knowledgebased learning seems essential. The characteristics of KBCC in relation to other knowledgebased neural learners and analogical reasoning are summarized as is the neurological basis for learning from knowledge. Current limitations of this approach and directions for future work are discussed. Keywords: Knowledgebased learning; neural networks; knowledge transfer; developmental robotics. 245 246 T. R. Shultz et al. 1.
Learning Stochastic Feedforward Neural Networks
"... Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
(Show Context)
Multilayer perceptrons (MLPs) or neural networks are popular models used for nonlinear regression and classification tasks. As regressors, MLPs model the conditional distribution of the predictor variables Y given the input variables X. However, this predictive distribution is assumed to be unimodal (e.g. Gaussian). For tasks involving structured prediction, the conditional distribution should be multimodal, resulting in onetomany mappings. By using stochastic hidden variables rather than deterministic ones, Sigmoid Belief Nets (SBNs) can induce a rich multimodal distribution in the output space. However, previously proposed learning algorithms for SBNs are not efficient and unsuitable for modeling realvalued data. In this paper, we propose a stochastic feedforward network with hidden layers composed of both deterministic and stochastic variables. A new Generalized EM training procedure using importance sampling allows us to efficiently learn complicated conditional distributions. Our model achieves superior performance on synthetic and facial expressions datasets compared to conditional Restricted Boltzmann Machines and Mixture Density Networks. In addition, the latent features of our model improves classification and can learn to generate colorful textures of objects. 1
NOTE Communicated by Yoshua Bengio Deep, Narrow Sigmoid Belief Networks Are Universal
"... In this note, we show that exponentially deep belief networks can approximate any distribution over binary vectors to arbitrary accuracy, even when the width of each layer is limited to the dimensionality of the data. We further show that such networks can be greedily learned in an easy yet impracti ..."
Abstract
 Add to MetaCart
In this note, we show that exponentially deep belief networks can approximate any distribution over binary vectors to arbitrary accuracy, even when the width of each layer is limited to the dimensionality of the data. We further show that such networks can be greedily learned in an easy yet impractical way. 1