Structurally Adaptive Modular Networks for Non-Stationary Environments
- IEEE Transactions on Neural Networks
Cited by 18 (5 self)
This paper introduces a neural network capable of dynamically adapting its architecture to realize time-variant non-linear input-output maps. This network has its roots in the mixture of experts framework but uses a localized model for the gating network. Modules or experts are grown or pruned depending on the complexity of the modeling problem. The structural adaptation procedure addresses the model selection problem and typically leads to much better parameter estimation. Batch mode learning equations are extended to obtain on-line update rules enabling the network to model time-varying environments. Simulation results are presented throughout the paper to support the proposed techniques. This research was supported in part by ARO contracts DAAH04-94-G-0417 and 04-95-10494 and NSF grant ECS 9307632.
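As a rough illustration of what a localized gating network can look like, the sketch below weights linear experts by normalized Gaussian kernels centred on regions of a one-dimensional input space. The Gaussian form, the scalar input, and every name here (`localized_gates`, `predict`, the parameter lists) are assumptions chosen for illustration; the abstract does not give the paper's actual equations, and the growing and pruning of experts is not shown.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def localized_gates(x, priors, means, stds):
    """Gate for expert j: prior_j * N(x; mean_j, std_j), normalized to sum to 1.

    Assumed form of a localized gate; each expert dominates near its own mean.
    """
    weighted = [a * gaussian_pdf(x, m, s) for a, m, s in zip(priors, means, stds)]
    total = sum(weighted)
    return [w / total for w in weighted]


def predict(x, priors, means, stds, experts):
    """Blend linear experts (slope, bias) using the localized gates."""
    gates = localized_gates(x, priors, means, stds)
    return sum(g * (slope * x + bias) for g, (slope, bias) in zip(gates, experts))


# Two hypothetical experts: one responsible for x near -5, one for x near +5.
priors, means, stds = [0.5, 0.5], [-5.0, 5.0], [1.0, 1.0]
experts = [(1.0, 0.0), (-1.0, 0.0)]
y_left = predict(-5.0, priors, means, stds, experts)
y_mid = predict(0.0, priors, means, stds, experts)
```

Near x = -5 the first expert's gate is close to 1, so the output follows that expert alone; midway between the two centres the gates are equal and the experts are averaged.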
Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Application to Speech Recognition
, 1995
Cited by 17 (6 self)
Machine classification of acoustic waveforms as speech events is often difficult due to context-dependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases wher...
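The point about obtaining a posterior sample of any functional of the parameters can be sketched as follows: once parameter draws are available, applying the functional to each draw yields draws of the functional itself, from which intervals can be read off without a normal approximation. In this sketch the draws are synthetic stand-ins generated with `random.gauss`, not output of the paper's sampler, and the logistic functional `class_probability` and the input value 0.8 are hypothetical choices.

```python
import math
import random


def posterior_of_functional(draws, functional):
    """Map each parameter draw through the functional of interest."""
    return [functional(theta) for theta in draws]


def credible_interval(values, mass=0.9):
    """Central interval read directly from the sorted sample."""
    ordered = sorted(values)
    last = len(ordered) - 1
    lo = ordered[int(round((1.0 - mass) / 2.0 * last))]
    hi = ordered[int(round((1.0 + mass) / 2.0 * last))]
    return lo, hi


def class_probability(theta, x=0.8):
    """Hypothetical functional: a logistic class probability at input x."""
    intercept, slope = theta
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Placeholder draws standing in for MCMC output (assumption, not real results).
rng = random.Random(0)
draws = [(rng.gauss(-1.0, 0.3), rng.gauss(2.0, 0.5)) for _ in range(2000)]
probs = posterior_of_functional(draws, class_probability)
low, high = credible_interval(probs)
```

The interval comes straight from the empirical distribution of `probs`, so it can be asymmetric, which a point estimate with a normal approximation would not capture.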
Specialized Mappings and the Estimation of Human Body Pose from a Single Image
, 2000
Cited by 14 (1 self)
We present an approach for recovering articulated body pose from single monocular images using the Specialized Mappings Architecture (SMA), a non-linear supervised learning architecture. SMAs consist of several specialized forward (input to output space) mapping functions and a feedback matching function, estimated automatically from data. Each of these forward functions maps certain areas (possibly disconnected) of the input space onto the output space. A probabilistic model for the architecture is first formalized along with a mechanism for learning its parameters. The learning problem is approached using a maximum likelihood estimation framework; we present Expectation Maximization (EM) algorithms for several different choices of the likelihood function. The performance of the presented solutions under these different likelihood functions is compared in the task of estimating human body posture from low-level visual features obtained from a single image, showing promising results. ...
Adaptive Sparse Grids
- ANZIAM J
, 2001
Cited by 13 (2 self)
Sparse grids, as studied by Zenger and Griebel over the last 10 years, have been very successful in the solution of partial differential equations, integral equations and classification problems. Adaptive sparse grid functions are elements of a function space lattice. It is seen that such lattices allow the generalisation of sparse grid techniques to the fitting of very high-dimensional functions with categorical and continuous variables. We have observed in first tests that these general adaptive sparse grids allow the identification of the ANOVA structure and can thus provide comprehensible models, which is very important for data mining applications. Perhaps the main advantage of these models is that they do not include any spurious interaction terms and thus can deal with very high-dimensional data.
Mining databases with different schemas: Integrating incompatible classifiers
- In Proc. 4th Intl Conf. Knowledge Discovery and Data Mining
, 1998
Cited by 12 (5 self)
Distributed data mining systems aim to discover (and combine) useful information that is distributed across multiple databases. The JAM system, for example, applies machine learning algorithms to compute models over distributed data sets and employs meta-learning techniques to combine the multiple models. Occasionally, however, these models (or classifiers) are induced from databases that have (moderately) different schemas and hence are incompatible. In this paper, we systematically investigate the problem of combining multiple models computed over distributed data sets with different schemas. Through experiments performed on actual credit card data provided by two different financial institutions, we evaluate the effectiveness of the proposed approaches and demonstrate their potential utility. Keywords: bridging agents, incompatible classifiers, database schema, distributed data mining, meta-learning. Contact Author: Andreas L. Prodromidis (andreas@cs.columbia.edu) This research ...
A hybrid projection based and radial basis function architecture: Initial values and global optimization
, 2001
Cited by 12 (6 self)
We introduce a mechanism for constructing and training a hybrid architecture of projection-based units and radial basis functions. In particular, we introduce an optimization scheme which includes several steps and assures convergence to a useful solution. During network architecture construction and training, it is determined whether a unit should be removed or replaced. The resulting architecture often has a smaller number of units than competing architectures. A specific form of overfitting resulting from shrinkage of the RBF radii is addressed by introducing a penalty on small radii. Classification and regression results are demonstrated on various benchmark data sets and compared with several variants of RBF networks. A striking performance improvement is achieved on the vowel data set. Keywords: Projection units, RBF Units, Hybrid Network Architecture, SMLP, Clustering, Regularization.
Robust Full Bayesian Learning for Neural Networks
, 1999
Cited by 11 (8 self)
In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than the ones reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...
SECRET: A scalable linear regression tree algorithm
- In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2002
Cited by 9 (0 self)
Recently there has been an increasing interest in developing regression models for large datasets that are both accurate and easy to interpret. Regressors that have these properties are regression trees with linear models in the leaves, but so far, the algorithms proposed for constructing them are not scalable. In this paper we propose a novel regression tree construction algorithm that is both accurate and can truly scale to very large datasets. The main idea is, for every intermediate node, to use the EM algorithm for Gaussian mixtures to find two clusters in the data and to locally transform the regression problem into a classification problem based on closeness to these clusters. Goodness-of-split measures, like the Gini gain, can then be used to determine the split variable and the split point much like in classification tree construction. Scalability of the algorithm can be enhanced by employing scalable versions of the EM and the classification tree construction algorithms. Tests on real and artificial data show that the proposed algorithm has accuracy comparable to other linear regression tree algorithms but requires orders of magnitude less computation time for large datasets.
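The split-selection step described above can be sketched once each point at a node carries a 0/1 cluster label. The sketch below assumes those labels already exist (the EM clustering step is not shown) and simply scans every feature and midpoint threshold for the largest Gini gain. The function names and the toy rows are hypothetical, and the exhaustive scan is a plain illustration rather than the scalable procedure the abstract refers to.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (gain, feature index, threshold) with the largest Gini gain.

    rows   : list of feature vectors (lists of floats)
    labels : 0/1 cluster assignment per row, assumed to come from a
             two-component Gaussian mixture fitted beforehand
    """
    n = len(rows)
    parent = gini(labels)
    best = (0.0, None, None)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            children = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - children
            if gain > best[0]:
                best = (gain, j, t)
    return best


# Toy node: feature 0 separates the two clusters cleanly, feature 1 does not.
rows = [[1.0, 7.0], [2.0, 3.0], [8.0, 6.0], [9.0, 2.0]]
labels = [0, 0, 1, 1]
gain, feature, threshold = best_split(rows, labels)
```

On this toy node the scan picks feature 0 with a threshold of 5.0, where both children are pure and the gain equals the parent impurity of 0.5.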
Bayesian Methods for Neural Networks
, 1999
Cited by 8 (0 self)
Summary: The application of the Bayesian learning paradigm to neural networks results in a flexible and powerful nonlinear modelling framework that can be used for regression, density estimation, prediction and classification. Within this framework, all sources of uncertainty are expressed and measured by probabilities. This formulation allows for a probabilistic treatment of our a priori knowledge, domain specific knowledge, model selection schemes, parameter estimation methods and noise estimation techniques. Many researchers have contributed towards the development of the Bayesian learning approach for neural networks. This thesis advances this research by proposing several novel extensions in the areas of sequential learning, model selection, optimisation and convergence assessment. The first contribution is a regularisation strategy for sequential learning based on extended Kalman filtering and noise estimation via evidence maximisation. Using the expectation maximisation (EM) algorithm, a similar algorithm is derived for batch learning. Much of the thesis is, however, devoted to Monte Carlo simulation methods. A robust Bayesian method is proposed to estimate,
Microarchitecture sensitive empirical models for compiler optimizations
- In CGO
, 2007
Cited by 8 (0 self)
This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, namely linear regression, adaptive regression splines and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for ’optimal’ settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (< 5% average error in prediction) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (with an average of 9.5%) over highly optimized binaries.

