Results 1-10 of 199
Bagging Predictors
Machine Learning, 1996
Cited by 2479 (1 self)
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
1. Introduction. A learning set $\mathcal{L}$ consists of data $\{(y_n, x_n),\ n = 1, \ldots, N\}$ where the $y$'s are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor $\varphi(x, \mathcal{L})$; if the input is $x$ we ...
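The procedure described in the abstract above (bootstrap replicates of the learning set, one predictor per replicate, then averaging or plurality voting) can be sketched roughly as follows. This is a minimal illustration and not the paper's code: the function names, the `(x, y)` tuple layout, the default of 25 replicates, and the one-split regression stump used as the unstable base learner are all assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_replicate(learning_set, rng):
    """Draw N cases with replacement from a learning set of N cases."""
    n = len(learning_set)
    return [learning_set[rng.randrange(n)] for _ in range(n)]


def fit_stump(learning_set):
    """Hypothetical base learner: a one-split regression stump on scalar x.

    learning_set is a list of (x, y) pairs; returns a function x -> prediction.
    """
    xs = sorted({x for x, _ in learning_set})
    ys = [y for _, y in learning_set]
    mean_all = sum(ys) / len(ys)
    best_sse, best_t, best_left, best_right = float("inf"), None, mean_all, mean_all
    # Try a split halfway between each pair of adjacent distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in learning_set if x <= t]
        right = [y for x, y in learning_set if x > t]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if sse < best_sse:
            best_sse, best_t, best_left, best_right = sse, t, m_left, m_right
    if best_t is None:  # only one distinct x value: predict the overall mean
        return lambda x: mean_all
    return lambda x: best_left if x <= best_t else best_right


def bagged_predictors(learning_set, fit, n_replicates=25, seed=0):
    """Fit one predictor per bootstrap replicate of the learning set."""
    rng = random.Random(seed)
    return [fit(bootstrap_replicate(learning_set, rng)) for _ in range(n_replicates)]


def aggregate_numeric(predictors, x):
    """Numerical outcome: average over the versions."""
    return sum(p(x) for p in predictors) / len(predictors)


def aggregate_class(predictors, x):
    """Class outcome: plurality vote over the versions."""
    votes = Counter(p(x) for p in predictors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy step-function data, purely illustrative.
    data = [(x, 0.0 if x < 5 else 1.0) for x in range(10)]
    versions = bagged_predictors(data, fit_stump)
    print(aggregate_numeric(versions, 2), aggregate_numeric(versions, 8))
```

Any fitting routine that returns a callable can be passed as `fit`. Per the abstract, the gain is expected mainly for unstable procedures, which is why a stump is used here instead of a smoother base learner.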
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
1996
Nonparametric regression using Bayesian variable selection
Journal of Econometrics, 1996
Cited by 136 (10 self)
This paper estimates an additive model semiparametrically, while automatically selecting the significant independent variables and the appropriate power transformation of the dependent variable. The nonlinear variables are modeled as regression splines, with significant knots selected from a large number of candidate knots. The estimation is made robust by modeling the errors as a mixture of normals. A Bayesian approach is used to select the significant knots, the power transformation, and to identify outliers using the Gibbs sampler to carry out the computation. Empirical evidence is given that the sampler works well on both simulated and real examples and that in the univariate case it compares favorably with a kernel-weighted local linear smoother. The variable selection algorithm in the paper is substantially faster than previous Bayesian variable selection algorithms. Key words: Additive model; Power transformation; Robust estimation
Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
Journal of Machine Learning Research, 2004
Cited by 117 (26 self)
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional “effective subspace” for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
Flexible Discriminant Analysis by Optimal Scoring
Journal of the American Statistical Association, 1993
Cited by 112 (12 self)
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are "optimal" for separating the groups. With two such functions one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This paper is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multiresponse linear regression using optimal scorings to represent the groups. We obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multiresponse regression technique (such as MARS or neural networks) can be postprocessed to improve its classification performance.
Linear smoothers and additive models
The Annals of Statistics, 1989
Cited by 70 (2 self)
We study linear smoothers and their use in building nonparametric regression models. In part of this paper we examine certain aspects of linear smoothers for scatterplots; examples of these are the running mean and running line, kernel, and cubic spline smoothers. The eigenvalue and singular value decompositions of the corresponding smoother matrix are used to qualitatively describe a smoother, and several other topics such as the number of degrees of freedom of a smoother are discussed. In the second part of the paper we describe how linear smoothers can be used to estimate the additive model, a powerful nonparametric regression model, using the "backfitting algorithm". We study the convergence of the backfitting algorithm and prove its convergence for a class of smoothers that includes cubic ... least squares. ... Key words: Nonparametric, semiparametric, regression, Gauss-Seidel algorithm, ...
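As a rough illustration of how backfitting uses a linear smoother to estimate an additive model, the sketch below cycles over the predictors and smooths the partial residuals against each one in turn. It is an assumption-laden toy and not the paper's algorithm statement: the function names, the symmetric running-mean window `k`, the fixed iteration count, and the centring of each component are choices made for this example, and no convergence check is performed.

```python
def running_mean_smoother(x, r, k=2):
    """Linear smoother: average r over the k nearest neighbours on each
    side of every point, with neighbours taken in x-order."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    fitted = [0.0] * len(x)
    for pos, i in enumerate(order):
        window = order[max(0, pos - k): pos + k + 1]  # truncated at the ends
        fitted[i] = sum(r[j] for j in window) / len(window)
    return fitted


def backfit(columns, y, smoother=running_mean_smoother, n_iter=20):
    """Estimate y ~ alpha + f_1(x_1) + ... + f_p(x_p) by backfitting.

    columns is a list of p predictor columns, each of length len(y).
    Returns the intercept and the fitted component values at the data points.
    """
    n, p = len(y), len(columns)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in range(p)]
    for _ in range(n_iter):  # fixed number of sweeps (an assumption)
        for j, xj in enumerate(columns):
            # Partial residuals: remove the intercept and all other components.
            partial = [
                y[i] - alpha - sum(f[m][i] for m in range(p) if m != j)
                for i in range(n)
            ]
            fj = smoother(xj, partial)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre each component
    return alpha, f


if __name__ == "__main__":
    # Toy data, purely illustrative.
    x1 = [float(i) for i in range(20)]
    x2 = [float((7 * i) % 20) for i in range(20)]
    y = [0.05 * a * a + 0.3 * b for a, b in zip(x1, x2)]
    alpha, (f1, f2) = backfit([x1, x2], y)
    print(alpha, f1[:3], f2[:3])
```

Swapping in a different `smoother` with the same `(x, r)` signature changes the fit but not the loop.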
Bayesian P-Splines
Journal of Computational and Graphical Statistics, 2004
Cited by 67 (21 self)
P-splines are an attractive approach for modelling nonlinear smooth effects of covariates within the generalized additive and varying coefficient models framework. In this paper we propose a Bayesian version for P-splines and generalize the approach for one dimensional curves to two dimensional surface fitting for modelling interactions between metrical covariates. A Bayesian approach to P-splines has the advantage of allowing for simultaneous estimation of smooth functions and smoothing parameters. Moreover, it can easily be extended to more complex formulations, for example to mixed models with random effects for serially or spatially correlated response. Additionally, the assumption of constant smoothing parameters can be replaced by allowing the smoothing parameters to be locally adaptive. This is particularly useful in situations with changing curvature of the underlying smooth function or where the function is highly oscillating. Inference is fully Bayesian and uses recent MCMC techniques for drawing random samples from the posterior. In a couple of simulation studies the performance of Bayesian P-splines is studied and compared to other approaches in the literature. We illustrate the approach by a complex application on rents for flats in Munich.
Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals"
Journal of Computational and Graphical Statistics, 1992
Cited by 44 (17 self)
We study a multivariate smoothing spline estimate of a function of several variables, based on an ANOVA decomposition as sums of main effect functions (of one variable), two-factor interaction functions (of two variables), etc. We derive the Bayesian "confidence intervals" for the components of this decomposition and demonstrate that, even with multiple smoothing parameters, they can be efficiently computed using the publicly available code RKPACK, which was originally designed just to compute the estimates. We carry out a small Monte Carlo study to see how closely the actual properties of these componentwise confidence intervals match their nominal confidence levels. Lastly, we analyze some lake acidity data as a function of calcium concentration, latitude, and longitude, using both polynomial and thin plate spline main effects in the same model. KEY WORDS: Bayesian "confidence intervals"; Multivariate function estimation; RKPACK; Smoothing spline ANOVA. Chong Gu chong@pop.stat.pur...
Bayesian model averaging
Stat. Sci., 1999
Cited by 42 (0 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
Mixtures of g-priors for Bayesian variable selection
Journal of the American Statistical Association, 2008
Cited by 36 (4 self)
Zellner’s g-prior remains a popular conventional prior for use in Bayesian variable selection, despite several undesirable consistency issues. In this paper, we study mixtures of g-priors as an alternative to default g-priors that resolve many of the problems with the original formulation, while maintaining the computational tractability that has made the g-prior so popular. We present theoretical properties of the mixture g-priors and provide real and simulated examples to compare the mixture formulation with fixed g-priors, Empirical Bayes approaches and other default procedures.