Results 1–8 of 8
The Degrees of Freedom of Partial Least Squares Regression
Journal of the American Statistical Association, Vol. 106, No. 494, pp. 697–705, 2011
Abstract

The derivation of statistical properties for Partial Least Squares regression can be a challenging task. The reason is that the construction of latent components from the predictor variables also depends on the response variable. While this typically leads to good performance and interpretable models in practice, it makes the statistical analysis more involved. In this work, we study the intrinsic complexity of Partial Least Squares Regression. Our contribution is an unbiased estimate of its Degrees of Freedom, defined as the trace of the first derivative of the fitted values, seen as a function of the response. We establish two equivalent representations that rely on the close connection of Partial Least Squares to matrix decompositions and Krylov subspace techniques. We show that the Degrees of Freedom depend on the collinearity of the predictor variables: the lower the collinearity, the higher the Degrees of Freedom. In particular, they are typically higher than the naive estimate that takes the Degrees of Freedom to be the number of components. Further, we illustrate how our Degrees of Freedom estimate can be used for the comparison of different regression methods. In the experimental section, we show that our Degrees of Freedom estimate, in combination with information criteria, is useful for model selection.
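The definition above (trace of the derivative of the fitted values with respect to the response) can be probed numerically. The sketch below is a minimal illustration, not the authors' closed-form representations: it fits a basic NIPALS PLS1 on synthetic data and estimates the Degrees of Freedom by one-sided finite differences.

```python
import numpy as np

def pls1_fit(X, y, k):
    # Minimal NIPALS PLS1: returns fitted values for k latent components.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    Xr, yr, scores = Xc.copy(), yc.copy(), []
    for _ in range(k):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w                                   # latent component
        scores.append(t)
        Xr = Xr - np.outer(t, Xr.T @ t / (t @ t))    # deflate predictors
        yr = yr - t * (t @ yr) / (t @ t)             # deflate response
    T = np.column_stack(scores)
    return y.mean() + T @ np.linalg.lstsq(T, yc, rcond=None)[0]

def pls_dof(X, y, k, eps=1e-6):
    # Degrees of Freedom = trace of d(yhat)/dy, here via finite differences.
    yhat = pls1_fit(X, y, k)
    dof = 0.0
    for i in range(len(y)):
        yp = y.copy()
        yp[i] += eps
        dof += (pls1_fit(X, yp, k)[i] - yhat[i]) / eps
    return dof
```

With all components used, PLS coincides with ordinary least squares, so the estimate recovers the OLS trace (number of predictors plus one for the intercept); with fewer components and low collinearity, it typically exceeds the component count, consistent with the abstract.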
Using basis expansions for estimating functional PLS regression. Applications with chemometric data
2010
Abstract

There are many chemometric applications, such as spectroscopy, where the objective is to explain a scalar response from a functional variable (the spectrum) whose observations are functions of wavelengths rather than vectors. In this paper, PLS regression is considered for estimating the linear model when the predictor is a functional random variable. Due to the infinite dimension of the space to which the predictor observations belong, they are usually approximated by curves/functions within a finite dimensional space spanned by a basis of functions. We show that PLS regression with a functional predictor is equivalent to finite multivariate PLS regression using expansion basis coefficients as the predictor, in the sense that, at each step of the PLS iteration, the same prediction is obtained. In addition, from the linear model estimated using the basis coefficients, we derive the expression of the PLS estimate of the regression coefficient function from the model with a functional predictor. The results provided by this functional PLS approach are compared with those given by functional PCR and discrete PLS and PCR using different sets of simulated and spectrometric data.
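The pipeline this abstract describes can be sketched compactly. The example below is synthetic and simplified (Legendre basis, basic NIPALS PLS1, both my choices): curves observed on a grid are replaced by their least-squares basis coefficients, multivariate PLS is run on those coefficients, and the coefficient-function estimate is recovered on the grid. The paper's formal equivalence additionally accounts for the Gram matrix of the basis, which is omitted here.

```python
import numpy as np

def pls1_coef(X, y, k):
    # NIPALS PLS1 on a multivariate predictor; returns regression coefficients.
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    Xr, yr, W, P, q = Xc.copy(), yc.copy(), [], [], []
    for _ in range(k):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)
        t = Xr @ w
        P.append(Xr.T @ t / (t @ t))
        q.append(t @ yr / (t @ t))
        Xr = Xr - np.outer(t, P[-1])
        yr = yr - t * q[-1]
        W.append(w)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(q))

# Synthetic curves (e.g. spectra) observed on a wavelength grid.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 100)
Phi = np.polynomial.legendre.legvander(2 * grid - 1, 5)   # (100, 6) basis matrix
curves = rng.standard_normal((40, 6)) @ Phi.T + 0.05 * rng.standard_normal((40, 100))
y = curves @ (Phi @ rng.standard_normal(6)) / 100 + 0.01 * rng.standard_normal(40)

# Basis expansion: least-squares coefficients replace the functional predictor.
A = np.linalg.lstsq(Phi, curves.T, rcond=None)[0].T       # (40, 6)
b = pls1_coef(A, y, 3)        # multivariate PLS on the coefficients
beta_fun = Phi @ b            # coefficient-function estimate on the grid
```

The last line mirrors the paper's point that the functional coefficient estimate is obtained from the model fitted on the basis coefficients.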
Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression
Abstract

The runtime for Kernel Partial Least Squares (KPLS) to compute the fit is quadratic in the number of examples. However, the necessity of obtaining sensitivity measures such as degrees of freedom for model selection, or confidence intervals for more detailed analysis, requires cubic runtime, and thus constitutes a computational bottleneck in real-world data analysis. We propose a novel algorithm for KPLS which not only computes (a) the fit, but also (b) its approximate degrees of freedom and (c) error bars in quadratic runtime. The algorithm exploits a close connection between Kernel PLS and the Lanczos algorithm for approximating the eigenvalues of symmetric matrices, and uses this approximation to compute the trace of powers of the kernel matrix in quadratic runtime.
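The core trick, approximating the trace of powers of a symmetric kernel matrix from a few Lanczos steps, can be illustrated in isolation. This is a generic sketch of that idea, not the paper's full KPLS algorithm: since each Lanczos step costs one matrix-vector product, m steps cost O(mn^2) rather than the O(n^3) of a full eigendecomposition.

```python
import numpy as np

def ritz_values(K, m, rng):
    # m Lanczos steps with full reorthogonalization on the symmetric matrix K;
    # the eigenvalues of the resulting tridiagonal matrix approximate those of K.
    n = K.shape[0]
    Q = np.zeros((n, m))
    alpha, beta = np.zeros(m), np.zeros(max(m - 1, 0))
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    for j in range(m):
        Q[:, j] = q
        v = K @ q
        alpha[j] = q @ v
        v -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ v)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(v)
            q = v / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return np.linalg.eigvalsh(T)

def trace_power(K, p, m, rng):
    # trace(K^p) ~= sum of Ritz values**p; accurate when the spectrum of K
    # decays quickly, as is typical for kernel matrices.
    return float(np.sum(ritz_values(K, m, rng) ** p))
```

Because trace(K^p) is dominated by the largest eigenvalues for p >= 2, and Lanczos converges to the extreme eigenvalues first, a modest number of steps already gives a good approximation on rapidly decaying spectra.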
Reduced rank regression in Bayesian FDA
2010
Abstract
In functional data analysis (FDA) it is of interest to generalize techniques of multivariate analysis, like canonical correlation analysis or regression, to functions which are often observed with noise. In the proposed Bayesian approach to FDA two tools are combined: (i) a special Demmler-Reinsch-like basis of interpolation splines to represent functions parsimoniously and flexibly; (ii) latent variable models for probabilistic principal components analysis or canonical correlation analysis of the corresponding coefficients. In this way partial curves and non-Gaussian measurement error schemes can be handled. Bayesian inference is based on a variational algorithm, so that computations are straightforward and fast, in keeping with the idea of FDA as a toolbox for exploratory data analysis. The performance of the approach is illustrated with synthetic and real data sets. As detailed in the table of contents, the paper has a “vertical” structure corresponding to topics in data analysis, defining the sequence of chapters, and a “horizontal” structure referring to the most important special cases of the proposed model: FCCA, functional regression, scalar prediction, classification. Within chapters the special cases are addressed in turn, so that a reader interested only in a special application of the model may skip the other sections.
Functional time series forecasting
Abstract
We propose forecasting functional time series using weighted functional principal component regression and weighted functional partial least squares regression. These approaches allow for smooth functions, assign higher weights to more recent data, and provide a modeling scheme that is easily adapted to allow for constraints and other information. We illustrate our approaches using age-specific French female mortality rates from 1816 to 2006 and age-specific Australian fertility rates from 1921 to 2006, and show that these weighted methods improve forecast accuracy in comparison to their unweighted counterparts. We also propose two new bootstrap methods to construct prediction intervals, and evaluate and compare their empirical coverage probabilities.
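The weighting idea can be sketched as follows. This is a minimal, assumed version of weighted functional principal components (geometrically decaying weights on curves ordered in time; the decay constant and the forecasting step are simplifications, not the paper's exact scheme):

```python
import numpy as np

def weighted_fpca(Y, kappa, ncomp):
    # Rows of Y are curves ordered in time (most recent last). Geometrically
    # decaying weights make recent curves dominate the mean and covariance.
    n = Y.shape[0]
    w = kappa ** np.arange(n - 1, -1, -1)       # largest weight on the last curve
    w /= w.sum()
    mu = w @ Y                                   # weighted mean curve
    Z = Y - mu
    C = (Z * w[:, None]).T @ Z                   # weighted covariance matrix
    _, vecs = np.linalg.eigh(C)
    phi = vecs[:, ::-1][:, :ncomp]               # leading weighted components
    return mu, phi, Z @ phi                      # mean, basis, score series
```

A forecast would then model each score series with a univariate time-series method and reconstruct the curve as `mu + phi @ forecast_scores`; the weighting ensures the components themselves reflect recent dynamics.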
Partial Least-Squares Functional Mode Analysis: Application to the Membrane Proteins AQP1, Aqy1, and CLC-ec1
Biophysical Journal, Vol. 103, pp. 786–796, August 2012
Abstract
We introduce an approach based on the recently introduced functional mode analysis to identify collective modes of internal dynamics that maximally correlate to an external order parameter of functional interest. Input structural data can be either experimentally determined structure ensembles or simulated ensembles, such as molecular dynamics trajectories. Partial least-squares regression is shown to yield a robust solution to the multidimensional optimization problem, with a minimal and controllable risk of overfitting, as shown by extensive cross-validation. Several examples illustrate that the partial least-squares-based functional mode analysis successfully reveals the collective dynamics underlying the fluctuations in selected functional order parameters. Applications to T4 lysozyme, the Trp-cage, the aquaporin channels Aqy1 and hAQP1, and the CLC-ec1 chloride antiporter are presented in which the active site geometry, the hydrophobic solvent-accessible surface, channel gating dynamics, water permeability (pf), and a dihedral angle are defined as functional order parameters. The Aqy1 case reveals a gating mechanism that connects the inner channel gating residues with the protein surface, thereby providing an explanation of how the membrane may affect the channel. hAQP1 shows how the pf correlates with structural changes around the aromatic/arginine region of the pore. The CLC-ec1 application shows how local motions of the gating Glu148 couple to a collective motion that affects ion affinity in the pore.
A Review of Feature Extraction Software for Microarray Gene Expression Data
Abstract
When gene expression data are too large to be processed, they are transformed into a reduced representation set of genes. Transforming large-scale gene expression data into a set of genes is called feature extraction. If the genes extracted are carefully chosen, this gene set can capture the relevant information from the large-scale gene expression data, allowing further analysis by using this reduced representation instead of the full-size data. In this paper, we review numerous software applications that can be used for feature extraction. The software reviewed is mainly for Principal Component Analysis (PCA), Independent Component
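As a hypothetical illustration of the core operation that such PCA packages implement, the sketch below reduces an expression matrix (samples by genes) to a handful of features via an SVD of the centered data; the matrix shapes and function name are mine, not from any reviewed package.

```python
import numpy as np

def pca_extract(X, k):
    # Reduce an (n_samples, n_genes) expression matrix to k features via SVD
    # of the centered data; returns the sample scores and the gene loadings.
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T, Vt[:k]
```

The k score columns, ordered by decreasing variance explained, are the reduced representation that downstream analysis would use in place of the full gene set.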