Bagging Predictors
- Machine Learning
, 1996
Cited by 1998 (1 self)
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy. 1. Introduction. A learning set $\mathcal{L}$ consists of data $\{(y_n, x_n),\ n = 1, \ldots, N\}$, where the $y$'s are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor $\varphi(x, \mathcal{L})$; if the input is $x$ we ...
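The procedure described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's own code: the base learner `fit_stump` (a one-split regression stump), the helper names, and the default of 25 replicates are all assumptions chosen for brevity, whereas the paper's experiments use classification and regression trees and subset selection.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a one-split regression stump on 1-D inputs (a hypothetical, unstable base learner)."""
    best = (np.inf, None, y.mean(), y.mean())
    for t in np.unique(x)[:-1]:          # candidate thresholds; both sides stay non-empty
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    if t is None:                        # all inputs identical: predict the mean
        return lambda q: np.full(np.shape(q), lo, dtype=float)
    return lambda q: np.where(np.asarray(q) <= t, lo, hi)

def bagged_predictor(x, y, fit, n_replicates=25, seed=0):
    """Train `fit` on bootstrap replicates of (x, y) and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(x)
    models = []
    for _ in range(n_replicates):
        idx = rng.integers(0, n, size=n)  # bootstrap replicate: n draws with replacement
        models.append(fit(x[idx], y[idx]))
    return lambda q: np.mean([m(q) for m in models], axis=0)

def plurality_vote(votes):
    """Aggregate class labels: votes has shape (n_models, n_queries), non-negative ints."""
    votes = np.asarray(votes)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

For a numerical response the aggregate is the average returned by `bagged_predictor`; for class labels the per-model predictions would instead be passed through `plurality_vote`.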
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
Nonparametric regression using Bayesian variable selection
- Journal of Econometrics
, 1996
Cited by 107 (8 self)
This paper estimates an additive model semiparametrically, while automatically selecting the significant independent variables and the appropriate power transformation of the dependent variable. The nonlinear variables are modeled as regression splines, with significant knots selected from a large number of candidate knots. The estimation is made robust by modeling the errors as a mixture of normals. A Bayesian approach is used to select the significant knots, the power transformation, and to identify outliers using the Gibbs sampler to carry out the computation. Empirical evidence is given that the sampler works well on both simulated and real examples and that in the univariate case it compares favorably with a kernel-weighted local linear smoother. The variable selection algorithm in the paper is substantially faster than previous Bayesian variable selection algorithms. Key words: Additive model; Power transformation; Robust estimation
Flexible Discriminant Analysis by Optimal Scoring
- Journal of the American Statistical Association
, 1993
Cited by 80 (12 self)
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are "optimal" for separating the groups. With two such functions one can produce a classification map that partitions the reduced space into regions that are identified with group membership, and the decision boundaries are linear. This paper is about richer nonlinear classification schemes. Linear discriminant analysis is equivalent to multi-response linear regression using optimal scorings to represent the groups. We obtain nonparametric versions of discriminant analysis by replacing linear regression by any nonparametric regression method. In this way, any multi-response regression technique (such as MARS or neural networks) can be post-processed to improve its classification performance.
Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
- Journal of Machine Learning Research
, 2004
Cited by 79 (23 self)
We propose a novel method of dimensionality reduction for supervised learning problems. Given a regression or classification problem in which we wish to predict a response variable Y from an explanatory variable X, we treat the problem of dimensionality reduction as that of finding a low-dimensional "effective subspace" for X which retains the statistical relationship between X and Y. We show that this problem can be formulated in terms of conditional independence. To turn this formulation into an optimization problem we establish a general nonparametric characterization of conditional independence using covariance operators on reproducing kernel Hilbert spaces. This characterization allows us to derive a contrast function for estimation of the effective subspace. Unlike many conventional methods for dimensionality reduction in supervised learning, the proposed method requires neither assumptions on the marginal distribution of X, nor a parametric model of the conditional distribution of Y. We present experiments that compare the performance of the method with conventional methods.
Linear smoothers and additive models
- The Annals of Statistics
, 1989
Cited by 55 (3 self)
We study linear smoothers and their use in building non-parametric regression models. In part of this paper we examine certain aspects of linear smoothers for scatterplots; examples of these are the running mean and running line, kernel, and cubic spline smoothers. The eigenvalue and singular value decompositions of the corresponding smoother matrix are used to qualitatively describe a smoother, and several other topics such as the number of degrees of freedom of a smoother are discussed. In the second part of the paper we describe how linear smoothers can be used to estimate the additive model, a powerful non-parametric regression model, using the "backfitting algorithm". We study the convergence of the backfitting algorithm and prove its convergence for a class of smoothers that includes cubic [illegible] least squares. [illegible] Key words: Non-parametric, semi-parametric, regression, Gauss-Seidel algorithm,
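The backfitting idea named in this abstract can be illustrated with a short sketch: each component function is updated in turn by applying a linear smoother to the partial residuals, in the manner of a Gauss-Seidel sweep. The Gaussian kernel smoother, the bandwidth, the centring step, and the stopping rule below are assumptions made for the example and are not taken from the paper, whose convergence results concern a specific class of smoothers.

```python
import numpy as np

def kernel_smoother_matrix(x, bandwidth):
    """Smoother matrix S of a Gaussian kernel smoother: fitted values are S @ r."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return w / w.sum(axis=1, keepdims=True)   # each row sums to one

def backfit(X, y, bandwidth=0.2, n_iter=50, tol=1e-8):
    """Estimate y ~ alpha + sum_j f_j(X[:, j]) by cycling over the coordinates."""
    n, p = X.shape
    alpha = y.mean()
    S = [kernel_smoother_matrix(X[:, j], bandwidth) for j in range(p)]
    f = np.zeros((n, p))                      # f[:, j] holds f_j at the data points
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(p):
            # partial residual: remove the intercept and every component except f_j
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            f[:, j] = S[j] @ partial
            f[:, j] -= f[:, j].mean()         # centre each component (assumed convention)
        if np.max(np.abs(f - f_old)) < tol:
            break
    return alpha, f
```

The fitted values are `alpha + f.sum(axis=1)`; precomputing the smoother matrices makes explicit that every update is a linear operation on the current residuals.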
Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals"
- Journal of Computational and Graphical Statistics
, 1992
Cited by 37 (16 self)
We study a multivariate smoothing spline estimate of a function of several variables, based on an ANOVA decomposition as sums of main effect functions (of one variable), two-factor interaction functions (of two variables), etc. We derive the Bayesian "confidence intervals" for the components of this decomposition and demonstrate that, even with multiple smoothing parameters, they can be efficiently computed using the publicly available code RKPACK, which was originally designed just to compute the estimates. We carry out a small Monte Carlo study to see how closely the actual properties of these component-wise confidence intervals match their nominal confidence levels. Lastly, we analyze some lake acidity data as a function of calcium concentration, latitude, and longitude, using both polynomial and thin plate spline main effects in the same model. KEY WORDS: Bayesian "confidence intervals"; Multivariate function estimation; RKPACK; Smoothing spline ANOVA. Chong Gu chong@pop.stat.pur...
Bayesian P-Splines
- Journal of Computational and Graphical Statistics
, 2004
Cited by 35 (10 self)
P-splines are an attractive approach for modelling nonlinear smooth effects of covariates within the generalized additive and varying coefficient models framework. In this paper we propose a Bayesian version for P-splines and generalize the approach for one dimensional curves to two dimensional surface fitting for modelling interactions between metrical covariates. A Bayesian approach to P-splines has the advantage of allowing for simultaneous estimation of smooth functions and smoothing parameters. Moreover, it can easily be extended to more complex formulations, for example to mixed models with random effects for serially or spatially correlated response. Additionally, the assumption of constant smoothing parameters can be replaced by allowing the smoothing parameters to be locally adaptive. This is particularly useful in situations with changing curvature of the underlying smooth function or where the function is highly oscillating. Inference is fully Bayesian and uses recent MCMC techniques for drawing random samples from the posterior. In a couple of simulation studies the performance of Bayesian P-splines is studied and compared to other approaches in the literature. We illustrate the approach by a complex application on rents for flats in Munich.
Bayesian model averaging
- Statistical Science
, 1999
Cited by 29 (0 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-confident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
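The averaging that this abstract refers to is commonly written as below. The notation is an assumption introduced here for illustration and does not appear in the listing: $\Delta$ is the quantity of interest, $D$ the data, and $M_1, \ldots, M_K$ the candidate models.

```latex
% Posterior for Delta, averaged over models and weighted by posterior model probability
p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D),
\qquad
p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)}
```

Selecting a single model corresponds to putting all of the weight $p(M_k \mid D)$ on one $k$, which is why it understates the uncertainty.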
Block-relaxation Algorithms in Statistics
, 1994
Cited by 20 (1 self)
In this paper we discuss four such classes of algorithms. Or, more precisely, we discuss a single class of algorithms, and we show how some well-known classes of statistical algorithms fit in this common class. The subclasses are, in logical order: block-relaxation methods, augmentation methods, majorization methods, Expectation-Maximization, Alternating Least Squares, Alternating Conditional Expectations.

