Results 1–10 of 11
Self-concordant analysis for logistic regression
Abstract

Cited by 23 (10 self)
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
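The ℓ2-regularized logistic loss discussed in this abstract has no closed-form minimizer, but it is smooth and convex, so a minimal gradient-descent sketch illustrates the estimator. The function name and toy data below are illustrative, not from the paper:

```python
import numpy as np

def fit_logistic_l2(X, y, lam=1.0, lr=0.1, n_iter=2000):
    """Minimize the l2-regularized logistic loss by gradient descent.

    Objective: (1/n) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) ||w||^2,
    with labels y_i in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)
        s = 1.0 / (1.0 + np.exp(margins))            # sigmoid(-margin)
        grad = -(X * (y * s)[:, None]).mean(axis=0) + lam * w
        w -= lr * grad
    return w

# Toy data: the sign of the first feature determines the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sign(X[:, 0])
w = fit_logistic_l2(X, y, lam=0.1)
acc = float(np.mean(np.sign(X @ w) == y))
```

An ℓ1-penalized variant would instead require a proximal (soft-thresholding) step, since the penalty is non-smooth at zero.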
Bootstrap estimate of Kullback-Leibler information for model selection
 Statistica Sinica
, 1997
Abstract

Cited by 13 (0 self)
Estimation of the Kullback-Leibler amount of information is a crucial part of deriving a statistical model selection procedure based on the likelihood principle, such as AIC. To discriminate nested models, we have to estimate it up to the order of a constant, while the Kullback-Leibler information itself is of the order of the number of observations. The correction term employed in AIC is one example that fulfills this requirement, but it is a simple-minded bias correction to the log maximum likelihood. Therefore there is no assurance that such a bias correction yields a good estimate of Kullback-Leibler information. In this paper, as an alternative, bootstrap-type estimation is considered. We will first show that the bootstrap estimates proposed by Efron (1983, 1986, 1993) and Cavanaugh and Shumway (1994) are at least asymptotically equivalent, and that there exist many other equivalent bootstrap estimates. We also show that all such methods are asymptotically equivalent to a non-bootstrap method known as TIC (Takeuchi's Information Criterion), which is a generalization of AIC.
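The TIC mentioned above replaces AIC's parameter count k with tr(J⁻¹K), where J is the average negative Hessian of the log-likelihood and K the average outer product of per-observation scores. A minimal sketch for an i.i.d. normal model, with illustrative names and toy data not taken from the paper:

```python
import numpy as np

def aic_tic_normal(x):
    """AIC and TIC for an i.i.d. normal model fitted by maximum likelihood.

    AIC penalizes -2*loglik with 2k; TIC penalizes it with 2*tr(J^{-1} K),
    where J is the average negative Hessian of the log-likelihood and K the
    average outer product of per-observation scores.  When the model is
    correct, tr(J^{-1} K) -> k and the two criteria agree asymptotically.
    """
    n = len(x)
    mu, s2 = x.mean(), x.var()                      # MLEs of mean and variance
    loglik = -0.5 * n * (np.log(2 * np.pi * s2) + 1.0)
    # Per-observation score vectors evaluated at the MLE.
    d_mu = (x - mu) / s2
    d_s2 = -0.5 / s2 + 0.5 * (x - mu) ** 2 / s2**2
    scores = np.stack([d_mu, d_s2], axis=1)
    K = scores.T @ scores / n
    # Average negative Hessian at the MLE (analytic for the normal model).
    J = np.array([[1.0 / s2, 0.0],
                  [0.0, 0.5 / s2**2]])
    k_eff = np.trace(np.linalg.solve(J, K))         # tr(J^{-1} K)
    aic = -2.0 * loglik + 2.0 * 2                   # k = 2 parameters
    tic = -2.0 * loglik + 2.0 * k_eff
    return aic, tic, k_eff

# With correctly specified (normal) data, TIC stays close to AIC.
x = np.random.default_rng(1).normal(size=5000)
aic, tic, k_eff = aic_tic_normal(x)
```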
Optimal design of regularization term and regularization parameter by subspace information criterion
 Neural Networks
, 2000
Abstract

Cited by 12 (6 self)
The problem of designing the regularization term and regularization parameter for linear regression models is discussed. Previously, we derived an approximation to the generalization error called the subspace information criterion (SIC), which is an unbiased estimator of the generalization error with finite samples under certain conditions. In this paper, we apply SIC to regularization learning and use it for (a) choosing the optimal regularization term and regularization parameter from given candidates, and (b) obtaining the closed form of the optimal regularization parameter for a fixed regularization term. The effectiveness of SIC is demonstrated through computer simulations with artificial and real data.

Keywords: supervised learning, generalization error, linear regression, regularization learning, ridge regression, model selection, regularization parameter, subspace information criterion

Nomenclature:
f(x): learning target function
D: domain of f(x)
x_m: m-th sample point
y_m: m-th sample value
ε_m: m-th noise
(x_m, y_m): m-th training example
M: number of training examples
y: M-dimensional vector consisting of {y_m}_{m=1}^M
ε: M-dimensional vector consisting of {ε_m}_{m=1}^M
φ_p(x): p-th basis function
θ_p: p-th coefficient
µ: number of basis functions
J_G: generalization error
J_TE: training error
J_R: regularized training error
T: regularization matrix
α: regularization parameter
A: design matrix
X_{T,α}: regularization learning matrix
U: µ-dimensional matrix
θ: true parameter
θ̂_{T,α}: regularization estimate
θ̂_u: unbiased estimate
σ²: noise variance
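In the notation of the nomenclature above, the regularization estimate has the closed form θ̂_{T,α} = (AᵀA + αT)⁻¹Aᵀy. A minimal sketch with a ridge penalty (T = I) and hypothetical synthetic data; note that SIC itself scores candidate α by an unbiased estimate of the generalization error, whereas this sketch scores them against the known true parameter, which is available only because the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: M = 100 samples, mu = 10 polynomial basis functions.
M, mu = 100, 10
x = rng.uniform(-1, 1, size=M)
A = np.vander(x, mu, increasing=True)            # design matrix A
theta_true = rng.normal(size=mu)                 # true parameter theta
y = A @ theta_true + 0.5 * rng.normal(size=M)    # noisy sample values

T = np.eye(mu)                                   # ridge: T = identity matrix

def ridge_estimate(A, y, T, alpha):
    """Closed-form regularization estimate (A^T A + alpha T)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + alpha * T, A.T @ y)

# Choose alpha from a grid of candidates.
alphas = np.logspace(-3, 2, 30)
errors = [float(np.sum((ridge_estimate(A, y, T, a) - theta_true) ** 2))
          for a in alphas]
best_alpha = float(alphas[int(np.argmin(errors))])
```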
Model Selection for Variable Length Markov Chains and Tuning the Context Algorithm
, 2000
Abstract

Cited by 10 (3 self)
We consider the model selection problem in the class of stationary variable length Markov chains (VLMC) on a finite space. The processes in this class are still Markovian of higher order, but with memory of variable length. Various aims in selecting a VLMC can be formalized with different non-equivalent risks, such as final prediction error or expected Kullback-Leibler information. We consider the asymptotic behavior of different risk functions and show how they can be generally estimated with the same resampling strategy. Such estimated risks then yield new model selection criteria. In particular, we obtain a data-driven tuning of Rissanen's tree structured context algorithm, which is a computationally feasible procedure for selection and estimation of a VLMC. Key words and phrases: Bootstrap, zero-one loss, final prediction error, finite-memory source, FSMX model, Kullback-Leibler information, L2 loss, optimal tree pruning, resampling, tree model.
Dynamic Adaptive Partitioning for Nonlinear Time Series
, 1998
Abstract

Cited by 2 (0 self)
Introduction. Nonparametric methods which are able to adapt to local sparseness of the data are often substantially better than non-adaptive procedures because of the curse of dimensionality, and estimation of the mean as a function of predictor variables with adaptive partitioning schemes has attracted much attention (Breiman et al., 1984; Friedman, 1991; Gersho & Gray, 1992). Some of these schemes have also been studied in the case of stationary time series (Lewis & Stevens, 1991; Nobel, 1997), but none of the schemes use the simple fact that, in the case of a time series, the partition cells themselves typically have a dynamic characteristic. Consider a stationary real-valued p-th-order Markov chain Y_t (t ∈ ℤ) with state vector S_{t−1} = (Y_{t−1}, …, Y_{t−p}) being the first p ...
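The state vector S_{t−1} = (Y_{t−1}, …, Y_{t−p}) can be built from an observed series by stacking lagged copies; a small illustrative sketch (the function name is hypothetical, not from the paper):

```python
import numpy as np

def state_vectors(y, p):
    """Stack the lagged state vectors S_{t-1} = (y_{t-1}, ..., y_{t-p}).

    Row i holds the predictors for observation y[p + i], so the pair
    (states[i], y[p + i]) matches the Markov-chain notation above.
    """
    y = np.asarray(y)
    return np.column_stack([y[p - k: len(y) - k] for k in range(1, p + 1)])

y = np.arange(6.0)          # toy series 0, 1, 2, 3, 4, 5
S = state_vectors(y, p=2)   # predictors for y_2, ..., y_5
```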
Information and Posterior Probability Criteria for Model Selection in Local Likelihood Estimation
 J. Amer. Stat. Assoc.
, 1998
Abstract

Cited by 2 (0 self)
In this paper we propose a modification to the methods used to motivate many information and posterior probability criteria for the weighted likelihood case. We derive weighted versions of two of the most widely known criteria, namely the AIC and BIC. Via a simple modification, the criteria are also made useful for window span selection. The usefulness of the weighted versions of these criteria is demonstrated through a simulation study and an application to three data sets. KEY WORDS: Information Criteria; Posterior Probability Criteria; Model Selection; Local Likelihood.

1. INTRODUCTION. Local regression has become a popular method for smoothing scatterplots and for nonparametric regression in general. It has proven to be a useful tool in finding structure in datasets (Cleveland and Devlin 1988). Local regression estimation is a method for smoothing scatterplots (x_i, y_i), i = 1, …, n, in which the fitted value at x_0 is the value of a polynomial fit to the data using weighted least squares, where the weight given to (x_i, y_i) is related to the distance between x_i and x_0. Stone (1977) shows that estimates obtained using local regression methods have desirable theoretical properties. Recently, Fan (1993) has studied minimax properties of local linear regression. Tibshirani and Hastie (1987) extend the ideas of local regression to a local likelihood procedure. This procedure is designed for nonparametric regression modeling in situations where weighted least squares is inappropriate as an estimation method, for example binary data. Local regression may be viewed as a special case of local likelihood estimation. Tibshirani and Hastie (1987), Staniswalis (1989), and Loader (1999) apply local likelihood estimation to several types of data where local regressio...
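Local linear regression as described here fits a degree-1 polynomial at each x_0 by weighted least squares. A minimal sketch using a tricube kernel, a common weight choice in the local-regression literature (e.g. loess); the bandwidth and toy data are illustrative:

```python
import numpy as np

def local_linear_fit(x, y, x0, h):
    """Fitted value at x0: a degree-1 polynomial fit by weighted least
    squares, with tricube weights on the distances |x_i - x0| (bandwidth h)."""
    u = np.abs(x - x0) / h
    w = np.where(u < 1, (1 - u**3) ** 3, 0.0)       # tricube kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])  # local linear basis
    XtW = X.T * w                                   # apply weights to rows
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[0]                                  # intercept = fit at x0

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=200)
fit = local_linear_fit(x, y, x0=0.25, h=0.2)        # true value sin(pi/2) = 1
```

Local likelihood replaces the weighted least-squares step with a locally weighted likelihood, which is what makes the method applicable to binary responses.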
Penalized quadratic inference functions for variable selection in longitudinal research
, 2006
Abstract

Cited by 1 (0 self)
For decades, much research has been devoted to developing and comparing variable selection methods, but primarily for the classical case of independent observations. Existing variable-selection methods can be adapted to cluster-correlated observations, but some adaptation is required. For example, classical model fit statistics such as AIC and BIC are undefined if the likelihood function is unknown (Pan, 2001). Little research has been done on variable selection for generalized estimating equations (GEE; Liang and Zeger, 1986) and similar correlated-data approaches. This thesis will review existing work on model selection for GEE and propose new model selection options for GEE, as well as for a more sophisticated marginal modeling approach based on quadratic inference functions (QIF; Qu, Lindsay, and Li, 2000), which has better asymptotic properties than classic GEE. The focus is on selection using continuous penalties such as LASSO (Tibshirani, 1996) or SCAD (Fan and Li, 2001) rather than the older discrete penalties such as AIC and BIC. The ...
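The continuous LASSO penalty mentioned above can be minimized by coordinate descent with a soft-thresholding update, which sets small coefficients exactly to zero and thereby performs selection. A minimal sketch on standardized predictors, for the uncorrelated i.i.d. case rather than the clustered setting of the thesis; names and toy data are illustrative:

```python
import numpy as np

def soft_threshold(z, t):
    """LASSO proximal step: shrink z toward zero by t, zeroing small values."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam ||b||_1."""
    n, d = X.shape
    b = np.zeros(d)
    for _ in range(n_sweeps):
        for j in range(d):
            r = y - X @ b + X[:, j] * b[j]           # partial residual
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)
    return b

# Toy problem: only the first two of five predictors matter.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize columns
b_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ b_true + 0.1 * rng.normal(size=200)
b_hat = lasso_cd(X, y, lam=0.2)                      # zeros out null predictors
```

SCAD replaces the fixed shrinkage t with a penalty whose derivative tapers off for large coefficients, reducing the bias that LASSO induces on large effects.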
SUMMARY
Abstract
The problem of evaluating the goodness of statistical models is investigated from an information-theoretic point of view. Information criteria are proposed for evaluating models constructed by various estimation procedures when the specified family of probability distributions does not contain the distribution generating the data. The proposed criteria are applied to the evaluation of models estimated by maximum likelihood, robust, penalised likelihood, Bayes procedures, etc. We also discuss the use of the bootstrap in model evaluation problems and present a variance reduction technique in the bootstrap simulation.
Information Complexity-Based Regularization Parameter Selection for Solution of Ill-Conditioned Inverse Problems
, 2002
Abstract
We propose an information complexity-based regularization parameter selection method for the solution of ill-conditioned inverse problems. The regularization parameter is selected to be the minimizer of the Kullback-Leibler (KL) distance between the unknown data-generating distribution and the fitted distribution. The KL distance is approximated by an information complexity (ICOMP) criterion developed by Bozdogan (1988, 1990, 1994, 2000). The method is not limited to the white Gaussian noise case; it can be extended to correlated and non-Gaussian noise, and it can also account for possible model misspecification. We demonstrate the performance of the proposed method on a test problem from Hansen's (1994) Regularization Tools.
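Tikhonov regularization of an ill-conditioned system, x̂_α = (AᵀA + αI)⁻¹Aᵀb, makes the role of the regularization parameter concrete. A minimal sketch on a Hilbert matrix, a standard ill-conditioned test case; the paper instead uses a problem from Hansen's Regularization Tools and selects α by the ICOMP criterion, which this sketch replaces with an oracle score against the known solution:

```python
import numpy as np

# Ill-conditioned test problem: a Hilbert matrix.
n = 12
i = np.arange(n)
A = 1.0 / (i[:, None] + i[None, :] + 1.0)            # Hilbert matrix
x_true = np.ones(n)
rng = np.random.default_rng(4)
b = A @ x_true + 1e-4 * rng.normal(size=n)           # noisy right-hand side

def tikhonov(A, b, alpha):
    """Tikhonov-regularized solution: argmin ||A x - b||^2 + alpha ||x||^2."""
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ b)

# Sweep candidate regularization parameters and pick the best.
alphas = np.logspace(-12, 0, 40)
errs = [float(np.linalg.norm(tikhonov(A, b, a) - x_true)) for a in alphas]
best_alpha = float(alphas[int(np.argmin(errs))])

# For comparison: the unregularized solve amplifies the noise enormously.
naive_err = float(np.linalg.norm(np.linalg.solve(A, b) - x_true))
```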
Journal of Statistical Planning and Inference
, 2000
Abstract
Improving predictive inference under covariate shift by weighting the log-likelihood function