Results 1–10 of 17
Sufficient dimension reduction and prediction in regression
 Philosophical Transactions of the Royal Society A
Cited by 15 (2 self)
Abstract:
Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which the number of predictors is considerably larger than in the past. While several methods have been proposed to deal with such regressions, principal components still seem to be the most widely used across the applied sciences. We give a broad overview of ideas underlying a particular class of methods for dimension reduction that includes principal components, along with an introduction to the corresponding methodology. New methods are proposed for prediction in regressions with many predictors.
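As a concrete illustration of the principal-components approach this abstract discusses, here is a minimal sketch of principal components regression with more predictors than observations. The synthetic data and the choice of k = 3 retained components are assumptions for illustration, not taken from the paper:

```python
import numpy as np

# Illustrative sketch: principal components regression when p > n.
# Synthetic data; k = 3 is an arbitrary choice for demonstration.
rng = np.random.default_rng(0)
n, p, k = 50, 200, 3

X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                      # only a few predictors matter
y = X @ beta + 0.1 * rng.standard_normal(n)

# Center, then take the top-k right singular vectors as component directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                   # n x k component scores

# Ordinary least squares on the k scores instead of the p predictors.
gamma, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)
y_hat = y.mean() + Z @ gamma
```

Replacing the p-dimensional design with k component scores makes the least-squares problem well posed even though p ≫ n; whether the leading components actually carry the predictive signal is exactly the issue the abstracts below revisit.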
Coordinate-independent sparse sufficient dimension reduction and variable selection
 The Annals of Statistics
Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks
, 2011
Cited by 7 (2 self)
Abstract:
Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data, along with decision rules defined in the low-dimensional space, when the probability density of the measurements and class labels is not given but a training set of samples from this distribution is. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although coordinate descent is not guaranteed to find the globally optimal solution, its alternating structure crucially enables us to extend it to sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting but also reduces power consumption due to communication. The learned reduced-dimensional space and decision rule are shown to be consistent, and their Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches.
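The optimization over a projection with orthonormal columns that this abstract describes can be sketched in miniature: one hinge-loss gradient step on the projection matrix, followed by a QR retraction back onto the Stiefel manifold. The data, step size, and loss below are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

# Toy sketch: one update of a p x d projection A with orthonormal columns,
# minimizing an average hinge loss, then a QR retraction onto the Stiefel
# manifold. Synthetic data; eta and the dimensions are arbitrary choices.
rng = np.random.default_rng(2)
n, p, d, eta = 100, 20, 2, 0.1
X = rng.standard_normal((n, p))
labels = np.sign(X @ rng.standard_normal(p))      # synthetic +/-1 labels

A, _ = np.linalg.qr(rng.standard_normal((p, d)))  # start on the manifold
theta = rng.standard_normal(d)                    # classifier in reduced space

margins = labels * (X @ A @ theta)
active = margins < 1                              # hinge loss max(0, 1 - m)
# Gradient of the average hinge loss with respect to A.
g = -np.outer((labels[active, None] * X[active]).sum(axis=0), theta) / n

A_new, _ = np.linalg.qr(A - eta * g)              # retraction restores A^T A = I
```

The retraction is the key manifold ingredient: an unconstrained gradient step leaves the set of orthonormal matrices, and the QR factorization maps the update back onto it.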
A Review on Dimension Reduction
 INTERNATIONAL STATISTICAL REVIEW (2013), 81, 1, 134–150
, 2013
Cited by 3 (0 self)
Abstract:
Summarizing the effect of many covariates through a few linear combinations is an effective way of reducing covariate dimension and is the backbone of (sufficient) dimension reduction. Because the replacement of high-dimensional covariates by low-dimensional linear combinations is performed with minimal assumptions on the specific regression form, it enjoys attractive advantages, as well as encounters unique challenges, in comparison with the variable selection approach. We review the current literature on dimension reduction with an emphasis on the two most popular models, where the dimension reduction affects the conditional distribution and the conditional mean, respectively. We discuss various estimation and inference procedures at different levels of detail, with the intention of focusing on their underlying ideas instead of technicalities. We also discuss some unsolved problems in this area for potential future research.
Internet Accessible
 Lanzhou University, Lanzhou
, 1998
Cited by 2 (2 self)
Abstract:
Genetic variants of p27 and p21 as predictors for risk of second primary malignancy in patients with index squamous cell carcinoma of head and neck
Predictordependent shrinkage for linear regression via partial factor modeling
Cited by 1 (0 self)
Abstract:
In prediction problems with more predictors than observations, it can sometimes be helpful to use a joint probability model, π(Y, X), rather than a purely conditional model, π(Y | X), where Y is a scalar response variable and X is a vector of predictors. This approach is motivated by the fact that in many situations the marginal predictor distribution π(X) can provide useful information about the parameter values governing the conditional regression. However, under very mild misspecification, this marginal distribution can also lead conditional inferences astray. Here, we explore these ideas in the context of linear factor models, to understand how they play out in a familiar setting. The resulting Bayesian model performs well across a wide range of covariance structures, on real and simulated data.
Gradient-based kernel method for feature extraction and variable selection
 In NIPS
, 2012
Bayesian partial factor regression
Cited by 1 (1 self)
Abstract:
A Bayesian linear regression model is developed that cleanly addresses a long-recognized and fundamental difficulty of factor analytic regression: the response variable could be closely associated with the least important principal component. The model possesses inherent robustness to the choice of the number of factors and provides a natural framework for variable selection among highly correlated predictors in high-dimensional problems. In terms of out-of-sample prediction, the model is demonstrated to be competitive with partial least squares, ridge regression, and standard factor models under data regimes for which each of those methods excels, thus representing a promising default regression tool. By incorporating point-mass priors on key parameters, this model permits variable selection in the presence of highly correlated predictors, as well as estimation of the sufficient dimension, in the p ≫ n setting.
SNP Set Analysis for Detecting Disease Association Using Exon Sequence Data
Cited by 1 (1 self)
Abstract:
Rare variants are believed to play important roles in disease etiology. Recent advances in high-throughput sequencing technology enable one to systematically characterize the genetic effects of both common and rare variants. In this paper, we introduce several approaches which simultaneously test the effects of common and rare variants within a SNP set, based on logistic regression models and logistic kernel machine models. Gene-environment interactions and SNP-SNP interactions are also considered in some of these models. We illustrate the performance of these methods using the unrelated-individual data from Genetic Analysis Workshop 17. Three true disease genes, FLT1, PIK3C3, and KDR, have been consistently selected by the proposed methods. In addition, compared to logistic regression models, the logistic kernel machine models are more powerful, presumably because the latter reduce the effective number of parameters through regularization. Our results also suggest that a screening step is effective in decreasing the number of false positive findings, which is often a big concern for association studies.
Xin Chen: Research Statement
Abstract:
My research interests span the areas of dimension reduction, variable selection and statistical computing. In an era of high-throughput technologies and fast computing, it is essential to reduce redundancy in large data sets. Contemporary statistical theories and methodologies are quickly evolving to adapt to the change, with rapid developments in areas such as dimension reduction, sparse variable selection via regularization, and “large-p-small-n” problems.

Background. Consider the regression of a univariate response y on p random predictors x = (x1, ..., xp)^T ∈ R^p, with the general goal of inferring about the conditional distribution of y | x. Sufficient dimension reduction (SDR), introduced by Cook (1998a), is important in both theory and practice. It strives to reduce the dimension of x by replacing it with a minimal set of linear combinations of x, without loss of information on the conditional distribution of y | x. If a predictor subspace S ⊆ R^p satisfies y ⊥ x | P_S x, where ⊥ stands for independence and P_(·) represents the projection matrix with respect to the standard inner product, then S is called a dimension reduction space. The central subspace S_{y|x}, which is the intersection of all dimension reduction spaces, is an essential concept of SDR.
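A minimal numerical sketch may make the central-subspace idea concrete: sliced inverse regression (SIR, Li 1991) is one standard estimator of central-subspace directions. The synthetic single-index model and the number of slices H below are illustrative assumptions, not material from the statement:

```python
import numpy as np

# Minimal SIR sketch: estimate a central-subspace direction by slicing on y
# and eigen-decomposing the covariance of within-slice predictor means.
rng = np.random.default_rng(1)
n, p, H = 2000, 6, 10               # H slices (arbitrary choice)

X = rng.standard_normal((n, p))
beta_true = np.array([1.0, 0.5, 0, 0, 0, 0])
y = np.exp(X @ beta_true) + 0.1 * rng.standard_normal(n)  # 1-dim central subspace

# Standardize: find C with C^T cov C = I, then Z = (X - mean) @ C.
cov = np.cov(X, rowvar=False)
C = np.linalg.cholesky(np.linalg.inv(cov))
Z = (X - X.mean(axis=0)) @ C

# Average Z within slices of sorted y; form the slice-mean kernel matrix.
idx = np.argsort(y)
means = np.array([Z[s].mean(axis=0) for s in np.array_split(idx, H)])
M = means.T @ means / H
vals, vecs = np.linalg.eigh(M)

beta_hat = C @ vecs[:, -1]          # back to the original predictor scale
beta_hat /= np.linalg.norm(beta_hat)
```

Because y depends on x only through x^T beta_true, the within-slice means of the standardized predictors concentrate along that single direction, and the leading eigenvector recovers it up to sign.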