Results 1–10 of 20
On Combining Artificial Neural Nets
 Connection Science
, 1996
"... This paper reviews research on combining artificial neural nets, and provides an overview of, and an introduction to, the papers contained this Special Issue, and its companion (Connection Science, 9, 1). Two main approaches, ensemblebased, and modular, are identified and considered. An ensembl ..."
Abstract

Cited by 80 (3 self)
This paper reviews research on combining artificial neural nets, and provides an overview of, and an introduction to, the papers contained in this Special Issue and its companion (Connection Science, 9, 1). Two main approaches, ensemble-based and modular, are identified and considered. An ensemble, or committee, is made up of a set of nets, each of which is a general function approximator. The members of the ensemble are combined in order to obtain better generalisation performance than would be achieved by any of the individual nets. The main issues considered here under the heading of ensemble-based approaches are (a) how to combine the outputs of the ensemble members, (b) how to create candidate ensemble members, and (c) which methods lead to the most effective ensembles. Under the heading of modular approaches we begin by considering a divide-and-conquer approach by which a function is automatically decomposed into a number of subfunctions which are treated by specialis...
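The ensemble idea summarised above can be sketched in a few lines; this is a toy illustration, assuming simple output averaging as the combination rule, with "members" that are noisy approximators of a known target rather than actual trained nets:

```python
import numpy as np

# Toy sketch of ensemble combination by output averaging.
# Each "member" is an imperfect approximator: the true function plus
# independent noise (stand-ins for independently trained nets).

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2.0 * np.pi * x)

members = [target + rng.normal(scale=0.3, size=x.shape) for _ in range(10)]

# Combine the member outputs by simple averaging.
ensemble = np.mean(members, axis=0)

mse_members = [np.mean((m - target) ** 2) for m in members]
mse_ensemble = np.mean((ensemble - target) ** 2)

# With independent member errors, averaging reduces error variance, so the
# combined predictor generalises better than any individual member here.
print(mse_ensemble < min(mse_members))
```

This illustrates issue (a) from the abstract (how to combine outputs); issues (b) and (c), creating and selecting members, are where the surveyed methods differ most.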
Hierarchical Mixtures-of-Experts for Exponential Family Regression Models: Approximation and Maximum Likelihood Estimation
 Ann. Statistics
, 1999
"... this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and oneparameter exponential family HIERARCHICAL MIXTURESOFEXPERTS 3 ..."
Abstract

Cited by 10 (2 self)
In this paper we consider the denseness and consistency of these models in the generalized linear model context. Before proceeding we present some notation regarding mixtures and hierarchical mixtures of generalized linear models and one-parameter exponential family regression models. Generalized linear models are widely used in statistical practice [McCullagh and Nelder (1989)]. One-parameter exponential family regression models [see Bickel and Doksum (1977), page 67] with generalized linear mean functions (GLM1) are special examples of the generalized linear models, where the probability distribution can be parameterized by the mean function. In the regression context, a GLM1 model proposes that the conditional expectation μ(x) of a real response variable y (the output) is related to a vector of predictors (or inputs)
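As a toy illustration of the GLM1 form just described (not taken from the paper): a conditional mean μ(x) given by an inverse link applied to a linear predictor, here with the logistic link for Bernoulli responses and invented coefficients:

```python
import numpy as np

# Hypothetical GLM1 mean function: mu(x) = psi(alpha + x . beta),
# with psi the inverse logit link (Bernoulli case). Coefficients are
# illustrative, not from the paper.

def glm1_mean(x, alpha, beta):
    eta = alpha + np.dot(x, beta)     # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))  # inverse logit link

mu = glm1_mean(np.array([0.5, -1.0]), alpha=0.2, beta=np.array([1.0, 0.3]))
print(0.0 < mu < 1.0)  # a valid Bernoulli mean
```

The one-parameter structure means the whole conditional distribution of y given x is determined once μ(x) is specified.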
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
 ControlShell: A Software Architecture for Complex Electromechanical Systems; International Journal for Robotics Research (IJRR)
, 1998
"... This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the like ..."
Abstract

Cited by 5 (1 self)
This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be directly obtained without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is consistently handled in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally just...
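The paper's extended EM for uncertain (indirect) observations is not reproduced here, but the classical EM update for an ordinary finite mixture, which that extension builds on, can be sketched as follows (all values illustrative):

```python
import numpy as np

# Classical EM for a two-component 1-D Gaussian mixture with exact
# observations. The paper generalises this to uncertain training examples;
# this sketch shows only the standard baseline.

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 300)])

# Initial guesses (hypothetical values).
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
pi = np.array([0.5, 0.5])

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    dens = pi * normal_pdf(data[:, None], mu, sigma)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means and variances from responsibilities.
    nk = resp.sum(axis=0)
    pi = nk / len(data)
    mu = (resp * data[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (data[:, None] - mu) ** 2).sum(axis=0) / nk)

print(np.round(np.sort(mu), 1))  # component means recovered near -2 and 3
```

In the extended version described in the abstract, each training example contributes a likelihood function rather than a point value, and the E-step integrates the responsibilities against that uncertainty.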
Visual Development and the Acquisition of Motion Velocity Sensitivities
 Neural Computation
, 2002
"... We consider the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training. ..."
Abstract

Cited by 4 (1 self)
We consider the hypothesis that systems learning aspects of visual perception may benefit from the use of suitably designed developmental progressions during training.
Approximation of conditional densities by smooth mixtures of regressions
 Discussion paper
, 2009
"... This paper shows that large nonparametric classes of conditional multivariate densities can be approximated in the Kullback–Leibler distance bydifferentspecifications of finite mixtures of normal regressions in which normal means and variances and mixing probabilities can depend on variables in the ..."
Abstract

Cited by 4 (3 self)
This paper shows that large nonparametric classes of conditional multivariate densities can be approximated in the Kullback–Leibler distance by different specifications of finite mixtures of normal regressions in which normal means and variances and mixing probabilities can depend on variables in the conditioning set (covariates). These models are a special case of models known as “mixtures of experts” in the statistics and computer science literature. Flexible specifications include models in which only mixing probabilities, modeled by multinomial logit, depend on the covariates and, in the univariate case, models in which only means of the mixed normals depend flexibly on the covariates. Modeling the variance of the mixed normals by flexible functions of the covariates can weaken restrictions on the class of the approximable densities. The obtained results can be generalized to mixtures of general location-scale densities. Rates of convergence and easy-to-interpret bounds are also obtained for different model specifications. These approximation results can be useful for proving consistency of Bayesian and maximum likelihood density estimators based on these models. The results also have interesting implications for applied researchers.
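The specification described above can be pictured by evaluating the implied conditional density directly; the sketch below uses made-up parameters, multinomial-logit mixing weights, and normal components whose means are linear in the covariate:

```python
import numpy as np

# Hypothetical conditional density for a finite mixture of normal regressions
# with covariate-dependent multinomial-logit mixing weights (the
# "mixture of experts" form). All parameter values are illustrative.

def conditional_density(y, x, gate_w, gate_b, beta0, beta1, sigma):
    # Multinomial-logit mixing probabilities as a function of the covariate x.
    logits = gate_w * x + gate_b
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Component densities: normals whose means are linear in x.
    means = beta0 + beta1 * x
    comp = np.exp(-0.5 * ((y - means) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(np.sum(probs * comp))

# Two components with illustrative parameters.
gate_w = np.array([1.0, -1.0]); gate_b = np.array([0.0, 0.5])
beta0 = np.array([0.0, 2.0]);   beta1 = np.array([1.0, -0.5])
sigma = np.array([0.5, 1.0])

# Numerical check: p(y | x) integrates to 1 over y for a fixed x.
ys = np.linspace(-10.0, 10.0, 4001)
vals = [conditional_density(y, 0.7, gate_w, gate_b, beta0, beta1, sigma) for y in ys]
integral = float(np.sum(vals) * (ys[1] - ys[0]))
print(round(integral, 2))  # ≈ 1.0
```

Letting the component variances also depend on x, as the abstract notes, enlarges the class of densities this form can approximate.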
Decomposition Methodology for Classification Tasks – A Meta Decomposer Framework
 PATTERN ANALYSIS AND APPLICATIONS
, 2006
"... The idea of decomposition methodology for classification tasks is to break down a complex classification task into several simpler and more manageable subtasks that are solvable by using existing induction methods, then joining their solutions together in order to solve the original problem. In thi ..."
Abstract

Cited by 3 (2 self)
The idea of decomposition methodology for classification tasks is to break down a complex classification task into several simpler and more manageable subtasks that are solvable by using existing induction methods, and then to join their solutions together in order to solve the original problem. In this paper we provide an overview of very popular but diverse decomposition methods and introduce a related taxonomy to categorize them. Subsequently we suggest using this taxonomy to create a novel meta-decomposer framework to automatically select the appropriate decomposition method for a given problem. The experimental study validates the effectiveness of the proposed meta-decomposer on a set of benchmark datasets.
On the Asymptotic Normality of Hierarchical Mixtures-of-Experts for Generalized Linear Models
 IEEE Trans. on Information Theory
, 1999
"... In the class of hierarchical mixturesofexperts (HME) models, "experts" in the exponential family with generalized linear mean functions of the form /(ff + x T fi) are mixed, according to a set of local weights called the "gating functions" depending on the predictor x. Here /(\Delta) is the inve ..."
Abstract

Cited by 3 (0 self)
In the class of hierarchical mixtures-of-experts (HME) models, "experts" in the exponential family with generalized linear mean functions of the form ψ(α + xᵀβ) are mixed according to a set of local weights, called the "gating functions", depending on the predictor x. Here ψ(·) is the inverse link function. We provide regularity conditions on the experts and on the gating functions under which the maximum likelihood method in the large sample limit produces a consistent and asymptotically normal estimator of the mean response. The regularity conditions are validated for Poisson, gamma, normal and binomial experts.

Index Terms — Hierarchical mixtures-of-experts, generalized linear models, maximum likelihood estimation, large sample theory, asymptotic normal distribution, regularity conditions, Fisher information, statistical inference.

1 Introduction

In Hierarchical Mixtures-of-Experts (HME) (Jordan and Jacobs 1994), experts of simple regression models are mixed in a trees...
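The mean response of such a model can be sketched directly; the toy example below (parameters invented, not from the paper) mixes Poisson experts with means exp(α_k + xᵀβ_k), i.e. ψ = exp, via a softmax gate:

```python
import numpy as np

# Hypothetical HME-style mean response: experts with generalized-linear means
# psi(alpha_k + x^T beta_k), mixed by softmax gating weights that depend on x.
# Here psi = exp (Poisson experts); all parameter values are illustrative.

def hme_mean(x, gate_v, alpha, beta):
    x = np.asarray(x, dtype=float)
    # Gating network: softmax over linear scores in x.
    scores = gate_v @ x
    g = np.exp(scores - scores.max())
    g /= g.sum()
    # Expert means: inverse link psi = exp applied to linear predictors.
    expert_means = np.exp(alpha + beta @ x)
    # Mean response is the gate-weighted combination of expert means.
    return float(g @ expert_means)

gate_v = np.array([[0.5, -0.2], [-0.5, 0.2]])  # 2 experts, 2 predictors
alpha = np.array([0.1, 1.0])
beta = np.array([[0.3, 0.0], [0.0, -0.4]])

m = hme_mean([1.0, 2.0], gate_v, alpha, beta)
print(m > 0.0)  # a convex combination of positive expert means is positive
```

The paper's consistency and asymptotic-normality results concern the maximum likelihood estimator of exactly this kind of mean response as the sample size grows.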
Bayesian modeling of joint and conditional distributions. Unpublished manuscript
, 2009
"... In this paper, we study a Bayesian approach to flexible modeling of conditional distributions. The approach uses a flexible model for the joint distribution of the dependent and independent variables and then extracts the conditional distributions of interest from the estimated joint distribution. W ..."
Abstract

Cited by 3 (0 self)
In this paper, we study a Bayesian approach to flexible modeling of conditional distributions. The approach uses a flexible model for the joint distribution of the dependent and independent variables and then extracts the conditional distributions of interest from the estimated joint distribution. We use a finite mixture of multivariate normals (FMMN) to estimate the joint distribution. The conditional distributions can then be assessed analytically or through simulations. The discrete variables are handled through the use of latent variables. The estimation procedure employs an MCMC algorithm. We provide a characterization of the Kullback–Leibler closure of FMMN and show that the joint and conditional predictive densities implied by the FMMN model are consistent estimators for a large class of data generating processes with continuous and discrete observables. The method can be used as a robust regression model with discrete and continuous dependent and independent variables and as a Bayesian alternative to semi- and nonparametric models such as quantile and kernel regression. In experiments, the method compares favorably with classical nonparametric and alternative Bayesian methods.
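The extraction step described above can be sketched with standard Gaussian conditioning: conditioning a joint mixture of bivariate normals on x yields another normal mixture with reweighted components. The example below uses invented parameters and omits the MCMC estimation step entirely:

```python
import numpy as np

# Hypothetical sketch: extracting p(y | x) from a joint finite mixture of
# bivariate normals (FMMN). Each component is conditioned via the standard
# Gaussian formulas; its weight is rescaled by the marginal density of x.
# Parameters are illustrative.

weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 2.0]])          # (x, y) means per component
covs = np.array([[[1.0, 0.5], [0.5, 1.0]],
                 [[1.0, -0.3], [-0.3, 0.5]]])

def conditional_mixture(x, weights, means, covs):
    new_w, new_mu, new_var = [], [], []
    for w, m, c in zip(weights, means, covs):
        sxx, sxy, syy = c[0, 0], c[0, 1], c[1, 1]
        # Marginal density of x under this component reweights the mixture.
        px = np.exp(-0.5 * (x - m[0]) ** 2 / sxx) / np.sqrt(2 * np.pi * sxx)
        new_w.append(w * px)
        # Gaussian conditioning: mean and variance of y given x.
        new_mu.append(m[1] + sxy / sxx * (x - m[0]))
        new_var.append(syy - sxy ** 2 / sxx)
    new_w = np.array(new_w)
    return new_w / new_w.sum(), np.array(new_mu), np.array(new_var)

w, mu, var = conditional_mixture(0.0, weights, means, covs)
print(round(float(w @ mu), 3))  # E[y | x = 0] under the conditional mixture
```

This is why, as the abstract notes, the conditionals can be assessed analytically once the joint mixture has been estimated.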
On the Approximation Rate of Hierarchical Mixtures-of-Experts for Generalized Linear Models
"... We investigate a class of hierarchical mixturesofexperts (HME) models where generalized linear models with nonlinear mean functions of the form /(ff + x T fi) are mixed. Here /(\Delta) is the inverse link function. It is shown that mixtures of such mean functions can approximate a class of smoot ..."
Abstract

Cited by 2 (1 self)
We investigate a class of hierarchical mixtures-of-experts (HME) models where generalized linear models with nonlinear mean functions of the form ψ(α + xᵀβ) are mixed. Here ψ(·) is the inverse link function. It is shown that mixtures of such mean functions can approximate a class of smooth functions of the form ψ(h(x)), where h(·) ∈ W^∞_{2,K} (a Sobolev class over [0, 1]^s), as the number of experts m in the network increases. An upper bound of the approximation rate is given as O(m^{-2/s}) in the L_p norm. This rate can be achieved within the family of HME structures with no more than s layers, where s is the dimension of the predictor x.

1 Introduction

Hierarchical Mixtures-of-Experts (HME) (Jordan and Jacobs 1994) have received considerable attention due to flexibility in modeling, appealing interpretation, and the availability of convenient computational algorithms. HME is the hierarchical extension of the Mixtures-of-Experts (ME) model introduced by Jacobs ...