Results 1-10 of 23
Adaptive Regression by Mixing
 Journal of the American Statistical Association
Abstract

Cited by 39 (7 self)
Adaptation over different procedures is of practical importance. Different procedures perform well under different conditions. In many practical situations, it is rather hard to assess which conditions are (approximately) satisfied so as to identify the best procedure for the data at hand. Thus automatic adaptation over various scenarios is desirable. A practically feasible method, named Adaptive Regression by Mixing (ARM), is proposed to convexly combine general candidate regression procedures. Under mild conditions, the resulting estimator is theoretically shown to perform optimally in rates of convergence without knowing which of the original procedures works the best. Simulations are conducted in several settings, including comparing a parametric model with nonparametric alternatives, comparing a neural network with projection pursuit in multidimensional regression, and combining bandwidths in kernel regression. The results clearly support the theoretical property of ARM. The ARM ...
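The core idea of ARM can be pictured with a short sketch: split the data, fit each candidate procedure on one half, and weight the candidates by an exponential function of their held-out squared error. This is a simplified illustration only; the function name `arm_combine` and the fixed scale in the exponent are assumptions of this sketch, and the actual ARM procedure also estimates the noise level and averages over many random splits.

```python
import numpy as np

def arm_combine(fit_fns, X, y, X_new, rng=None):
    """Hypothetical helper illustrating ARM-style convex combination.

    fit_fns: list of functions f(X_train, y_train) -> predict_fn.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2 :]
    # Fit each candidate procedure on the first half of the data.
    preds = [fit(X[train], y[train]) for fit in fit_fns]
    # Score each candidate by its squared error on the held-out half.
    sse = np.array([np.sum((y[test] - p(X[test])) ** 2) for p in preds])
    # Exponential weights: better held-out performance -> larger weight.
    # (The scale 2.0 stands in for a noise-level estimate.)
    w = np.exp(-(sse - sse.min()) / 2.0)
    w /= w.sum()
    # Convex combination of the candidates' predictions at new points.
    return sum(wi * p(X_new) for wi, p in zip(w, preds))
```

On noiseless linear data, a linear candidate receives essentially all the weight, so the combined prediction tracks it; neither candidate has to be declared "selected".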
Aggregation by exponential weighting and sharp oracle inequalities
Abstract

Cited by 23 (2 self)
In the present paper, we study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp oracle inequalities for convex aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We show how these results can be applied to derive a sparsity oracle inequality.
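As a concrete illustration of aggregation by exponential weighting with deterministic design: each fixed candidate function is weighted in proportion to exp(-RSS/beta) times a prior weight. The function name `ew_aggregate`, the uniform prior, and the default temperature below are assumptions of this sketch, not the paper's exact estimator.

```python
import numpy as np

def ew_aggregate(F, y, beta=4.0, prior=None):
    """Exponentially weighted aggregate of fixed candidate functions.

    F: (n, M) array whose columns are the M candidates evaluated at the
    n design points; y: (n,) observations. beta is a temperature
    parameter (the theory ties it to the noise level; the default here
    is only a placeholder).
    """
    n, M = F.shape
    prior = np.full(M, 1.0 / M) if prior is None else prior
    # Residual sum of squares of each candidate on the observed data.
    rss = np.sum((y[:, None] - F) ** 2, axis=0)
    # Exponential weights, shifted by the minimum for numerical stability.
    w = prior * np.exp(-(rss - rss.min()) / beta)
    w /= w.sum()
    # The aggregate is a convex combination of the candidates.
    return F @ w, w
```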
Can the Strengths of AIC and BIC Be Shared?
 Biometrika
, 2003
Abstract

Cited by 15 (1 self)
It is well known that AIC and BIC have different properties in model selection. BIC is consistent in the sense that if the true model is among the candidates, the probability of selecting the true model approaches 1. On the other hand, AIC is minimax-rate optimal for both parametric and nonparametric cases for estimating the regression function. There have been several successful results on constructing new model selection criteria to share some strengths of AIC and BIC. However, we show that in a rigorous sense, even in the setting that the true model is included in the candidates, the above-mentioned main strengths of AIC and BIC cannot be shared. That is, for any model selection criterion to be consistent, it must behave suboptimally compared to AIC in terms of mean average squared error.
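For reference, the two criteria compared here take the standard forms (these definitions are supplied for context; they do not appear in the abstract):

```latex
\mathrm{AIC} = -2\log\hat{L} + 2k, \qquad
\mathrm{BIC} = -2\log\hat{L} + k\log n,
```

where \(\hat{L}\) is the maximized likelihood of a candidate model with \(k\) parameters and \(n\) is the sample size. BIC's heavier \(\log n\) penalty is what drives its selection consistency, while AIC's lighter penalty underlies its minimax-rate optimality for estimating the regression function.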
Consistency of cross validation for comparing regression procedures. Annals of Statistics, Accepted paper
Abstract

Cited by 15 (1 self)
Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about the consistency of cross validation when applied to compare parametric with nonparametric methods, or to compare within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property.
Combining forecasting procedures: some theoretical results
 Econometric Theory
, 2004
Abstract

Cited by 14 (2 self)
We study some methods of combining procedures for forecasting a continuous random variable. Statistical risk bounds under the squared error loss are obtained under mild distributional assumptions on the future given the current outside information and the past observations. The risk bounds show that the combined forecast automatically achieves the best performance among the candidate procedures up to a constant factor and an additive penalty term. In terms of the rate of convergence, the combined forecast performs as well as if one knew in advance which candidate forecasting procedure is the best. Empirical studies suggest that combining procedures can sometimes improve forecasting accuracy compared to the original procedures. Risk bounds are derived to theoretically quantify the potential gain from, and price of, linearly combining forecasts for improvement. The result supports the empirical finding that it is not automatically a good idea to combine forecasts. Blind combining can degrade performance dramatically due to the undesirably large variability in estimating the best combining weights. An automated combining method is shown in theory to achieve a balance between the potential gain and the complexity penalty (the price of combining); to take advantage (if any) of sparse combining; and to maintain the best performance (in rate) among the candidate forecasting procedures if linear or sparse combining does not help.
Combining Time Series Models for Forecasting
, 2002
Abstract

Cited by 10 (0 self)
Statistical models (e.g., ARIMA models) have been commonly used in time series data analysis and forecasting. Typically one model is selected based on a selection criterion (e.g., AIC), hypothesis testing, and/or graphical inspection. The selected model is then used to forecast future values. However, model selection is often unstable and may cause unnecessarily high variability in the final estimation/prediction. In this work, we propose the use of the algorithm AFTER to convexly combine the models for better prediction performance. The weights are sequentially updated after each additional observation. Simulations and real data examples are used to compare the performance of our approach with model selection methods. The results show the advantage of combining by AFTER over selection in terms of forecasting accuracy in several settings.
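A minimal sketch of sequential weight updating in the spirit of AFTER: each model's weight is multiplied by a factor that decays with its most recent squared forecast error, so models that forecast well accumulate weight over time. The function name `after_weights` and the Gaussian-style update with a known variance `sigma2` are assumptions of this illustration; the actual algorithm also updates variance estimates.

```python
import numpy as np

def after_weights(forecasts, y, sigma2=1.0):
    """Sequentially updated convex combining weights, AFTER-style sketch.

    forecasts: (T, M) array of one-step-ahead forecasts from M candidate
    models; y: (T,) realized values. The weight used at time t depends
    only on forecast errors observed before t.
    """
    T, M = forecasts.shape
    w = np.full(M, 1.0 / M)          # start from uniform weights
    combined = np.empty(T)
    for t in range(T):
        combined[t] = forecasts[t] @ w
        # Downweight each model in proportion to its squared error.
        w = w * np.exp(-(y[t] - forecasts[t]) ** 2 / (2.0 * sigma2))
        w /= w.sum()
    return combined, w
```

With one model always correct and one always wrong, the weights concentrate on the correct model within a few observations, while the first combined forecast is still the uniform average.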
Adaptive Estimation in Pattern Recognition by Combining Different Procedures
 Statistica Sinica
Abstract

Cited by 6 (3 self)
We study a problem of adaptive estimation of a conditional probability function in a pattern recognition setting. In many applications, for more flexibility, one may want to consider various estimation procedures targeted at different scenarios and/or under different assumptions. For example, when the feature dimension is high, to overcome the familiar curse of dimensionality, one may seek a good parsimonious model among a number of candidates such as CART, neural nets, additive models, and others. For such a situation, one wishes to have an automated final procedure that always performs as well as the best candidate. In this work, we propose a method to combine a countable collection of procedures for estimating the conditional probability. We show that the combined procedure has the property that its statistical risk is bounded above by that of any of the procedures being considered plus a small penalty. Thus, in an asymptotic sense, the strengths of the different estimation procedures ...
Prediction/estimation With Simple Linear Models: Is It Really That Simple?
, 2004
Abstract

Cited by 5 (0 self)
Consider the simple normal linear regression model for estimation/prediction at a new design point. When the slope parameter is not obviously nonzero, hypothesis testing and model selection methods can be used for identifying the right model. We compare the performance of such methods both theoretically and empirically from different perspectives for more insight. The testing approach, in spite of being the "standard approach", performs poorly. We also found that the frequently told story "BIC is good when the true model is finite-dimensional and AIC is good when the true model is infinite-dimensional" is far from accurate. In addition, despite some successes in the effort to go beyond the debate between AIC and BIC by adaptive model selection, it turns out that their most essential properties cannot be shared by any model selection method. When model selection methods have difficulty in selection, model combining is seen to be a better alternative.
Segmentation of the mean of heteroscedastic data via cross-validation
, 2010
Abstract

Cited by 5 (3 self)
This paper tackles the problem of detecting abrupt changes in the mean of a heteroscedastic signal by model selection, without knowledge of the variations of the noise. A new family of change-point detection procedures is proposed, showing that cross-validation methods can be successful in the heteroscedastic framework, whereas most existing procedures are not robust to heteroscedasticity. The robustness to heteroscedasticity of the proposed procedures is supported by an extensive simulation study, together with recent theoretical results. An application to Comparative Genomic Hybridization (CGH) data is provided, showing that robustness to heteroscedasticity can indeed be required for their analysis.
From local polynomial approximation to pointwise shape-adaptive transforms: an evolutionary nonparametric regression perspective
 PROC. 2006 INT. TICSP WORKSHOP SPECTRAL METH. MULTIRATE SIGNAL PROCESS., SMMSP 2006
, 2006
Abstract

Cited by 4 (4 self)
In this paper we review and discuss some of the theoretical and practical aspects, the problems, and the considerations that pushed our research from the one-dimensional LPA-ICI (local polynomial approximation - intersection of confidence intervals) algorithm [27] to the development of powerful transform-based methods for anisotropic image ...