Results 11–20 of 72
A scalable bootstrap for massive data
 Journal of the Royal Statistical Society
, 2014
Abstract

Cited by 9 (0 self)
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving large datasets, which are increasingly prevalent, the computation of bootstrap-based quantities can be prohibitively demanding. While variants such as subsampling and the m out of n bootstrap can in principle reduce the cost of bootstrap computations, we find that these methods are generally not robust to the specification of hyperparameters (such as the number of subsampled data points), and they often require more prior information (such as rates of convergence of estimators) than the bootstrap. As an alternative, we introduce the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to yield a robust, computationally efficient means of assessing the quality of estimators.
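The procedure described in this abstract lends itself to a compact sketch. The following is a minimal illustration, not the paper's reference implementation: the subset size b = n^γ, the number of subsets, and the number of inner resamples (γ = 0.7, 5 subsets, 50 resamples) are illustrative choices, and the quality measure is taken to be a standard error.

```python
import numpy as np

def blb_stderr(data, estimator, gamma=0.7, n_subsets=5, n_boot=50, seed=0):
    """Bag of Little Bootstraps sketch: estimate the standard error of
    `estimator` (a function of a 1-D sample). Each subset has size
    b = n**gamma; within a subset, resamples of the FULL size n are drawn
    cheaply as multinomial weights over the b distinct points."""
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** gamma)
    subset_ses = []
    for _ in range(n_subsets):
        subset = rng.choice(data, size=b, replace=False)
        stats = []
        for _ in range(n_boot):
            # Multinomial counts give an n-point resample supported on b points.
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(np.repeat(subset, counts)))
        subset_ses.append(np.std(stats, ddof=1))
    # Average the per-subset quality assessments.
    return float(np.mean(subset_ses))
```

The step that keeps each inner iteration cheap is that a size-n resample is represented by counts over only b distinct points, so the subset never needs to be expanded beyond n values held once.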
A statistical perspective on algorithmic leveraging
, 2013
Abstract

Cited by 8 (1 self)
One popular method for dealing with large-scale data sets is sampling. Using the empirical statistical leverage scores as an importance sampling distribution, the method of algorithmic leveraging samples and rescales data matrices to reduce the data size before performing computations on the subproblem. Existing work has focused on algorithmic issues, but none of it addresses the statistical aspects of this method. Here, we provide an effective framework to evaluate the statistical properties of algorithmic leveraging in the context of estimating parameters in a linear regression model. In particular, for several versions of leverage-based sampling, we derive results for the bias and variance. We show that from the statistical perspective of bias and variance, neither leverage-based sampling nor uniform sampling dominates the other. This result is particularly striking, given the well-known result that, from the algorithmic perspective of worst-case analysis, leverage-based sampling provides uniformly superior worst-case algorithmic results when compared with uniform sampling. Based on these theoretical results, we propose and analyze two new leveraging algorithms: one constructs a smaller least-squares problem with “shrinked” leverage scores (SLEV), and the other solves a smaller and unweighted (or biased) least-squares problem (LEVUNW). The empirical results indicate that our theory is a good predictor of the practical performance of existing and new leverage-based algorithms, and that the new algorithms achieve improved performance.
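The sampling-and-rescaling step, including the shrinkage toward uniform probabilities mentioned for SLEV, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the shrinkage weight 0.9, and the subsample size r are choices made for the example.

```python
import numpy as np

def leveraging_ls(X, y, r, shrink=0.9, seed=0):
    """Leverage-based subsampled least squares: sample r rows with
    probabilities that mix ("shrink") the leverage scores with the
    uniform distribution, rescale, and solve the smaller problem."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    lev = np.sum(U ** 2, axis=1)               # leverage scores (sum to p)
    probs = shrink * lev / lev.sum() + (1 - shrink) / n
    idx = rng.choice(n, size=r, replace=True, p=probs)
    w = 1.0 / np.sqrt(r * probs[idx])          # importance-sampling rescaling
    beta, *_ = np.linalg.lstsq(w[:, None] * X[idx], w * y[idx], rcond=None)
    return beta
```

Setting `shrink=1.0` recovers pure leverage-based sampling and `shrink=0.0` recovers uniform sampling, the two regimes the abstract compares.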
Changepoint in stochastic design regression and the bootstrap
 Ann. Statist
, 2011
Abstract

Cited by 8 (2 self)
In this paper we study the consistency of different bootstrap procedures for constructing confidence intervals (CIs) for the unique jump discontinuity (change-point) in an otherwise smooth regression function in a stochastic design setting. This problem exhibits nonstandard asymptotics, and we argue that the standard bootstrap procedures in regression fail to provide valid confidence intervals for the change-point. We propose a version of the smoothed bootstrap, illustrate its remarkable finite sample performance in our simulation study, and prove the consistency of the procedure. The m out of n bootstrap procedure is also considered and shown to be consistent. We also provide sufficient conditions for any bootstrap procedure to be consistent in this scenario.
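For concreteness, a simplified change-point estimator of the kind being bootstrapped can be sketched as follows. This stand-in fits piecewise-constant segments rather than the general smooth regression function of the paper, and the function name is invented for the example.

```python
import numpy as np

def lsq_changepoint(x, y):
    """Least-squares change-point sketch: fit a piecewise-constant mean
    with a single jump and return the covariate value at the split that
    minimizes the residual sum of squares."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_rss, best_tau = np.inf, xs[0]
    for k in range(1, len(ys)):
        left, right = ys[:k], ys[k:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best_rss:
            best_rss, best_tau = rss, xs[k - 1]
    return best_tau
```

A bootstrap CI for the change-point would then resample (x, y) pairs (or smoothed versions of them, per the abstract) and collect this estimator's values over resamples.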
Penalized Log-likelihood Estimator for Partly Linear Transformation Models with Current Status Data
 Annals of Statistics
, 2005
Richardson extrapolation and the bootstrap
 J. Amer. Statist. Assoc
, 1988
Abstract

Cited by 7 (3 self)
SUMMARY. The m out of n bootstrap, with or without replacement, where m → ∞ and m/n → 0, has been proposed on two grounds: (i) as a way of ensuring consistency when the classical bootstrap is not consistent; (ii) when it is consistent, then in conjunction with extrapolation, as a way of obtaining behaviour equivalent to that of the classical bootstrap, to second or higher order, with reduced computation time. In this paper we shall discuss a partial taxonomy of the higher-order behaviour of the m out of n bootstrap and introduce a general form of extrapolation.
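The extrapolation idea can be made concrete with a toy Richardson-type step. The m^(-1/2) error rate assumed below is purely illustrative; the paper's taxonomy covers the general form and rate.

```python
def extrapolate(a1, m1, a2, m2):
    """Richardson-type extrapolation sketch: given two m-out-of-n
    bootstrap estimates a1, a2 computed at resample sizes m1, m2,
    cancel a leading error term assumed to decay like C * m**(-0.5)."""
    w1, w2 = m1 ** -0.5, m2 ** -0.5
    c = (a1 - a2) / (w1 - w2)   # estimated leading-error coefficient
    return a1 - c * w1          # estimate with the leading term removed
```

On a synthetic sequence A(m) = 2 + 3·m^(-1/2), extrapolating from m = 100 and m = 400 recovers the limit 2 exactly, which is the sense in which extrapolation buys back accuracy lost to the smaller resample size.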
On the Choice of m in the m Out of n Bootstrap and its Application to Confidence Bounds for Extreme Percentiles
, 2005
Abstract

Cited by 7 (1 self)
The m out of n bootstrap (Bickel et al. [1997]; Politis and Romano [1994]) is a modification of the ordinary bootstrap which can rectify bootstrap failure when the bootstrap sample size is n. The modification is to take bootstrap samples of size m, where m → ∞ and m/n → 0. The choice of m is, in general, an important matter. In this paper we consider an adaptive rule proposed by Bickel, Götze and van Zwet (personal communication) to pick m. We give general sufficient conditions for the validity of the rule and then examine its behavior in the problem of setting...
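The basic m out of n procedure itself (not the adaptive rule for choosing m discussed in the abstract) can be sketched for a statistic where the ordinary bootstrap fails: the sample maximum. Function name and defaults are illustrative.

```python
import numpy as np

def m_out_of_n_ci(data, m, n_boot=500, alpha=0.05, seed=0):
    """m out of n bootstrap percentile CI for the sample maximum.
    Resamples of size m (with replacement), m << n, restore
    consistency when m -> infinity and m/n -> 0."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        np.max(rng.choice(data, size=m, replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(stats, alpha / 2), np.quantile(stats, 1 - alpha / 2)
```

Running the same code with m = n reproduces the classical bootstrap, whose resampled maxima sit on the observed maximum with probability about 1 − e^(-1) ≈ 0.63, the degeneracy the smaller m avoids.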
Sharp Bounds on the Distribution of the Treatment Effects and Their Statistical Inference
Abstract

Cited by 7 (2 self)
In this paper, we propose nonparametric estimators of sharp bounds on the distribution of the treatment effect of a binary treatment and establish their asymptotic distributions. We note the possible failure of the standard bootstrap with the same sample size and apply the fewer-than-n bootstrap to making inferences on these bounds. The finite sample performances of the confidence intervals for the bounds based on normal critical values, the standard bootstrap, and the fewer-than-n bootstrap are investigated via a simulation study. Finally, we establish sharp bounds on the treatment effect distribution when covariates are available. We thank Jinyong Hahn and two anonymous referees for their valuable suggestions that greatly improved the paper.
A SURVEY OF LIMIT LAWS FOR BOOTSTRAPPED SUMS
, 2003
Abstract

Cited by 7 (1 self)
Concentrating mainly on independent and identically distributed (i.i.d.) real-valued parent sequences, we give an overview of first-order limit theorems available for bootstrapped sample sums for Efron's bootstrap. As a light unifying theme, we expose by elementary means the relationship between corresponding conditional and unconditional bootstrap limit laws. Some open problems are also posed.
Properties of bagged nearest neighbour classifiers
 Journal of the Royal Statistical Society B
Abstract

Cited by 7 (2 self)
Summary. It is shown that bagging, a computationally intensive method, asymptotically improves the performance of nearest neighbour classifiers provided that the resample size is less than 69% of the actual sample size, in the case of with-replacement bagging, or less than 50% of the sample size, for without-replacement bagging. However, for larger sampling fractions there is no asymptotic difference between the risk of the regular nearest neighbour classifier and its bagged version. In particular, neither achieves the large-sample performance of the Bayes classifier. In contrast, when the sampling fractions converge to 0 but the resample sizes diverge to ∞, the bagged classifier converges to the optimal Bayes rule and its risk converges to the risk of the latter. These results are most readily seen when the two populations have well-defined densities, but they may also be derived in other cases, where densities exist only in a relative sense. Cross-validation can be used effectively to choose the sampling fraction. Numerical calculation is used to illustrate these theoretical properties.
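A bagged 1-nearest-neighbour classifier with an explicit sampling fraction can be sketched as follows; the names and defaults here (100 bags, fraction 0.5, without replacement, matching the below-50% regime of the abstract) are illustrative choices, not the paper's.

```python
import numpy as np

def bagged_1nn(X_train, y_train, x, n_bags=100, frac=0.5, replace=False, seed=0):
    """Bagged 1-NN sketch for binary labels in {0, 1}: apply the 1-NN
    rule on many resamples of size frac * n and majority-vote."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    m = max(1, int(frac * n))
    votes = []
    for _ in range(n_bags):
        idx = rng.choice(n, size=m, replace=replace)
        d = np.linalg.norm(X_train[idx] - x, axis=1)   # distances to query
        votes.append(y_train[idx][np.argmin(d)])       # subsample's 1-NN label
    return int(np.mean(votes) > 0.5)                   # majority vote over bags
```

Varying `frac` toward 0 (while m still grows with n) is the regime in which the abstract's convergence to the Bayes rule applies.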