Results 11–20 of 83
Bayesian Inference in Mixtures-of-Experts and Hierarchical Mixtures-of-Experts Models With an Application to Speech Recognition
, 1995
Abstract
Cited by 24 (6 self)
Machine classification of acoustic waveforms as speech events is often difficult due to context dependencies. A vowel recognition task with multiple speakers is studied in this paper via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases wher...
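The statistical model described above (mixing coefficients and mixture components both generalized linear models) can be sketched in a few lines. The parameter values, the choice of two experts, and the Gaussian components below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_density(y, x, gate_w, expert_w, sigma):
    """Density p(y | x) of a mixture of experts: the gating network,
    a multinomial GLM, weights Gaussian expert GLMs."""
    g = softmax(gate_w @ x)       # mixing coefficients, one per expert
    means = expert_w @ x          # each expert's linear prediction
    comps = np.exp(-0.5 * ((y - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(g @ comps)

# Hypothetical 2-expert, 2-feature example; all numbers are illustrative.
x = np.array([1.0, 0.5])
gate_w = np.array([[0.2, -0.1], [-0.3, 0.4]])
expert_w = np.array([[1.0, 0.0], [0.0, 2.0]])
p = moe_density(0.8, x, gate_w, expert_w, sigma=0.5)
```

A full Bayesian treatment would place priors on `gate_w`, `expert_w`, and `sigma` and sample them by MCMC rather than fixing them as done here.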
Specialized Mappings and the Estimation of Human Body Pose from a Single Image
 In: Proceedings of the Workshop on Human Motion
Abstract
Cited by 20 (1 self)
We present an approach for recovering articulated body pose from single monocular images using the Specialized Mappings Architecture (SMA), a nonlinear supervised learning architecture. SMAs consist of several specialized forward (input to output space) mapping functions and a feedback matching function, estimated automatically from data. Each of these forward functions maps certain areas (possibly disconnected) of the input space onto the output space. A probabilistic model for the architecture is first formalized along with a mechanism for learning its parameters. The learning problem is approached using a maximum likelihood estimation framework; we present Expectation Maximization (EM) algorithms for several different choices of the likelihood function. The performance of the presented
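A toy sketch of the forward-and-feedback inference loop the abstract describes: each specialized forward map proposes an output, a matching function renders proposals back to the input space, and the proposal that best reconstructs the observed input wins. The maps below are invented stand-ins, not the paper's learned functions:

```python
import math

def render(y):
    """Feedback matching function: maps a proposed output back to the
    input space (here an invented non-invertible 'rendering' y -> y**2)."""
    return y * y

# Two imperfect specialized forward maps, one per branch of the inverse.
forward_maps = [
    lambda x: math.sqrt(x) + 0.1,   # expert for the positive branch
    lambda x: -math.sqrt(x) - 0.5,  # expert for the negative branch
]

def sma_infer(x):
    proposals = [f(x) for f in forward_maps]
    # Score each proposal by how well it reconstructs the observed input.
    errors = [abs(render(y) - x) for y in proposals]
    return proposals[errors.index(min(errors))]

out = sma_infer(4.0)   # the positive-branch expert reconstructs x best
```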
Structurally Adaptive Modular Networks for Non-Stationary Environments
 IEEE Transactions on Neural Networks
Abstract
Cited by 19 (6 self)
This paper introduces a neural network capable of dynamically adapting its architecture to realize time-variant nonlinear input-output maps. This network has its roots in the mixture-of-experts framework but uses a localized model for the gating network. Modules or experts are grown or pruned depending on the complexity of the modeling problem. The structural adaptation procedure addresses the model selection problem and typically leads to much better parameter estimation. Batch-mode learning equations are extended to obtain online update rules enabling the network to model time-varying environments. Simulation results are presented throughout the paper to support the proposed techniques. This research was supported in part by ARO contracts DAAH0494G0417 and 049510494 and NSF grant ECS 9307632.
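One way to picture the grow/prune step. The criterion below is a hypothetical simplification, not the paper's rule: prune experts whose average gating activation is negligible, and grow a new expert when the worst local modeling error exceeds a tolerance:

```python
def adapt_structure(avg_activations, local_errors, prune_thresh=0.05, grow_tol=0.5):
    """Illustrative structural-adaptation rule for a modular network.
    avg_activations: mean gating weight per expert over recent data.
    local_errors: modeling error attributed to each expert's region."""
    keep = [i for i, a in enumerate(avg_activations) if a >= prune_thresh]
    grow = max(local_errors) > grow_tol   # spawn a new expert if error is high
    return keep, grow

# Expert 1 is almost never selected, and its region is poorly modeled:
keep, grow = adapt_structure([0.40, 0.02, 0.58], [0.1, 0.9, 0.2])
```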
Network performance assessment for Neurofuzzy data modeling
 Intell. Data Anal. 1997
Abstract
Cited by 18 (4 self)
This paper evaluates the performance of ten significance measures applied to the problem of determining an appropriate network structure for data modelling with neurofuzzy systems. The advantages of neurofuzzy systems are demonstrated with application to both real and synthetic data interpretation problems.
Adaptive Sparse Grids
 ANZIAM J
, 2001
Abstract
Cited by 15 (2 self)
Sparse grids, as studied by Zenger and Griebel over the last 10 years, have been very successful in the solution of partial differential equations, integral equations, and classification problems. Adaptive sparse grid functions are elements of a function space lattice. It is seen that such lattices allow the generalisation of sparse grid techniques to the fitting of very high-dimensional functions with categorical and continuous variables. We have observed in first tests that these general adaptive sparse grids allow the identification of the ANOVA structure and can thus provide comprehensible models, which is very important for data mining applications. Perhaps the main advantage of these models is that they do not include any spurious interaction terms and thus can deal with very high-dimensional data.
A hybrid projection based and radial basis function architecture: Initial values and global optimization
, 2001
Abstract
Cited by 13 (6 self)
We introduce a mechanism for constructing and training a hybrid architecture of projection-based units and radial basis functions. In particular, we introduce an optimization scheme which includes several steps and assures convergence to a useful solution. During network architecture construction and training, it is determined whether a unit should be removed or replaced. The resulting architecture often has a smaller number of units compared with competing architectures. A specific form of overfitting resulting from shrinkage of the RBF radii is addressed by introducing a penalty on small radii. Classification and regression results are demonstrated on various benchmark data sets and compared with several variants of RBF networks [?, ?]. A striking performance improvement is achieved on the vowel data set [?]. Keywords: Projection units, RBF Units, Hybrid Network Architecture, SMLP, Clustering, Regularization.
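One plausible reading of the radius penalty mentioned above (the exact functional form in the paper may differ): add a loss term that grows as an RBF radius shrinks, discouraging units that overfit by collapsing onto individual points.

```python
def rbf_loss(squared_error, radii, lam=0.1):
    """Data-fit term plus an assumed 1/r^2 penalty on each RBF radius.
    lam controls how strongly collapsing radii are discouraged."""
    penalty = lam * sum(1.0 / (r * r) for r in radii)
    return squared_error + penalty

loose = rbf_loss(1.0, [1.0, 2.0])    # moderate radii, small penalty
tight = rbf_loss(1.0, [0.05, 2.0])   # a collapsing radius dominates the loss
```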
Mining databases with different schemas: Integrating incompatible classifiers
 In Proc. 4th Intl Conf. Knowledge Discovery and Data Mining
, 1998
Abstract
Cited by 12 (5 self)
Distributed data mining systems aim to discover (and combine) useful information that is distributed across multiple databases. The JAM system, for example, applies machine learning algorithms to compute models over distributed data sets and employs meta-learning techniques to combine the multiple models. Occasionally, however, these models (or classifiers) are induced from databases that have (moderately) different schemas and hence are incompatible. In this paper, we systematically investigate the problem of combining multiple models computed over distributed data sets with different schemas. Through experiments performed on actual credit card data provided by two different financial institutions, we evaluate the effectiveness of the proposed approaches and demonstrate their potential utility. Keywords: bridging agents, incompatible classifiers, database schema, distributed data mining, meta-learning. Contact Author: Andreas L. Prodromidis (andreas@cs.columbia.edu) This research ...
Robust Full Bayesian Learning for Neural Networks
, 1999
Abstract
Cited by 12 (9 self)
In this paper, we propose a hierarchical full Bayesian model for neural networks. This model treats the model dimension (number of neurons), model parameters, regularisation parameters and noise parameters as random variables that need to be estimated. We develop a reversible jump Markov chain Monte Carlo (MCMC) method to perform the necessary computations. We find that the results obtained using this method are not only better than the ones reported previously, but also appear to be robust with respect to the prior specification. In addition, we propose a novel and computationally efficient reversible jump MCMC simulated annealing algorithm to optimise neural networks. This algorithm enables us to maximise the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We show that by calibrating the full hierarchical ...
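The dimension-jumping move at the heart of reversible-jump MCMC can be illustrated on a toy target where the "model" is just the number of basis functions k. A real implementation would also propose the new basis parameters and include the likelihood and proposal ratios; this sketch keeps only the birth/death acceptance step, under an assumed Poisson(3) prior:

```python
import numpy as np
from math import factorial, log

rng = np.random.default_rng(0)

def log_prior(k, lam=3.0):
    """Log of an assumed Poisson(lam) prior on the number of basis functions."""
    return k * log(lam) - lam - log(factorial(k))

k, k_max = 3, 15
samples = []
for _ in range(2000):
    # Birth/death proposal on the model dimension k.
    k_new = k + 1 if (rng.random() < 0.5 and k < k_max) else max(k - 1, 1)
    # With a flat likelihood, the Metropolis ratio reduces to the prior ratio.
    log_alpha = log_prior(k_new) - log_prior(k)
    if log(rng.random() + 1e-300) < log_alpha:
        k = k_new
    samples.append(k)

mean_k = sum(samples) / len(samples)   # settles near the prior mean of 3
```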
Microarchitecture sensitive empirical models for compiler optimizations
 In CGO
, 2007
Abstract
Cited by 12 (1 self)
This paper proposes the use of empirical modeling techniques for building microarchitecture-sensitive models for compiler optimizations. The models we build relate program performance to settings of compiler optimization flags, associated heuristics, and key microarchitectural parameters. Unlike traditional analytical modeling methods, this relationship is learned entirely from data obtained by measuring performance at a small number of carefully selected compiler/microarchitecture configurations. We evaluate three different learning techniques in this context, viz. linear regression, adaptive regression splines, and radial basis function networks. We use the generated models to a) predict program performance at arbitrary compiler/microarchitecture configurations, b) quantify the significance of complex interactions between optimizations and the microarchitecture, and c) efficiently search for 'optimal' settings of optimization flags and heuristics for any given microarchitectural configuration. Our evaluation using benchmarks from the SPEC CPU2000 suite suggests that accurate models (<5% average prediction error) can be generated using a reasonable number of simulations. We also find that using compiler settings prescribed by a model-based search can improve program performance by as much as 19% (with an average of 9.5%) over highly optimized binaries.
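The predict-then-search loop can be sketched with a plain linear surrogate over four hypothetical binary flags, in place of the paper's regression splines and RBF networks; the `measure` function below stands in for an actual benchmark run, with invented flag effects:

```python
import itertools
import numpy as np

def measure(flags):
    """Hypothetical stand-in for running the benchmark: lower is better.
    The per-flag effects below are invented, not measured."""
    w = np.array([-2.0, 1.0, -0.5, 0.3])
    return 10.0 + w @ flags

# All 16 configurations of 4 binary flags; "measure" only the first 10.
configs = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
X = configs[:10]
y = np.array([measure(c) for c in X])

# Fit a linear surrogate model, then rank every configuration with it.
A = np.c_[np.ones(len(X)), X]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
preds = np.c_[np.ones(len(configs)), configs] @ coef
best = configs[int(np.argmin(preds))]   # predicted-fastest flag setting
```

Because the toy "true" performance is itself linear, the surrogate recovers it exactly from 10 measurements and ranks all 16 configurations correctly.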
SECRET: A scalable linear regression tree algorithm
 In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2002
Abstract
Cited by 11 (0 self)
Recently there has been increasing interest in developing regression models for large datasets that are both accurate and easy to interpret. Regressors that have these properties are regression trees with linear models in the leaves, but so far the algorithms proposed for constructing them are not scalable. In this paper we propose a novel regression tree construction algorithm that is both accurate and can truly scale to very large datasets. The main idea is, for every intermediate node, to use the EM algorithm for Gaussian mixtures to find two clusters in the data and to locally transform the regression problem into a classification problem based on closeness to these clusters. Goodness-of-split measures, like the Gini gain, can then be used to determine the split variable and the split point, much as in classification tree construction. Scalability of the algorithm can be enhanced by employing scalable versions of the EM and classification tree construction algorithms. Tests on real and artificial data show that the proposed algorithm has accuracy comparable to other linear regression tree algorithms but requires orders of magnitude less computation time for large datasets.
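The per-node split step reads as: cluster the node's data into two groups, relabel each point by its cluster, then split as in classification trees. A simplified sketch, substituting plain 2-means for the paper's Gaussian-mixture EM and using a single numeric feature:

```python
def two_means(xs, iters=20):
    """Plain 2-means stand-in for the paper's Gaussian-mixture EM step."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        a = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        b = [x for x in xs if abs(x - c0) > abs(x - c1)]
        if not a or not b:
            break
        c0, c1 = sum(a) / len(a), sum(b) / len(b)
    return [0 if abs(x - c0) <= abs(x - c1) else 1 for x in xs]

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, labels):
    """Pick the threshold with maximal Gini gain on the cluster labels."""
    base, best_gain, best_t = gini(labels), -1.0, None
    for t in sorted(set(xs))[:-1]:
        left = [l for x, l in zip(xs, labels) if x <= t]
        right = [l for x, l in zip(xs, labels) if x > t]
        g = base - (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if g > best_gain:
            best_gain, best_t = g, t
    return best_t

xs = [0.1, 0.2, 0.3, 5.0, 5.1, 5.2]   # toy node data with two clear clusters
labels = two_means(xs)
t = best_split(xs, labels)            # threshold falls between the clusters
```

In the full algorithm each child node would then get its own linear model, and the EM and tree-growing steps would be replaced by their scalable variants.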