Results 1–10 of 149
Overview and recent advances in partial least squares
In ‘Subspace, Latent Structure and Feature Selection Techniques’, Lecture Notes in Computer Science, 2006
Cited by 124 (4 self)
Partial Least Squares (PLS) is a wide class of methods for modeling relations between sets of observed variables by means of latent variables. It comprises regression and classification tasks as well as dimension-reduction techniques and modeling tools. The underlying assumption of all PLS methods is that the …
Kernel methods for measuring independence
Journal of Machine Learning Research, 2005
Cited by 58 (19 self)
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
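The constrained covariance has a simple finite-sample form: with centred Gram matrices of the two samples, it is (1/n) times the square root of the largest eigenvalue of their product. A minimal NumPy sketch, assuming a Gaussian RBF kernel (one choice of universal kernel) on 1-D data; the exact normalisation and regularisation used in the paper may differ:

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Gram matrix of a Gaussian RBF kernel (a universal kernel) on 1-D data."""
    d = x[:, None] - x[None, :]
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))

def coco(x, y, sigma=0.3):
    """Empirical constrained covariance of the paired samples (x, y):
    (1/n) * sqrt(largest eigenvalue of the product of the centred
    Gram matrices).  Zero in the limit iff x and y are independent,
    for universal kernels."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    K = H @ gaussian_gram(x, sigma) @ H
    L = H @ gaussian_gram(y, sigma) @ H
    lam = np.max(np.real(np.linalg.eigvals(K @ L)))
    return float(np.sqrt(max(lam, 0.0)) / n)
```

As a sanity check, a strongly dependent pair should score higher than a permuted copy of the same data, which simulates independence.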
Two view learning: SVM2K, theory and practice
Advances in Neural Information Processing Systems, 2006
Cited by 57 (7 self)
Kernel methods make it relatively easy to define complex high-dimensional feature spaces. This raises the question of how we can identify the relevant subspaces for a particular learning task. When two views of the same phenomenon are available, kernel Canonical Correlation Analysis (KCCA) has been shown to be an effective preprocessing step that can improve the performance of classification algorithms such as the Support Vector Machine (SVM). This paper takes this observation to its logical conclusion and proposes a method that combines this two-stage learning (KCCA followed by SVM) into a single optimisation termed SVM2K. We present both experimental and theoretical analysis of the approach, showing encouraging results and insights.
Multi-Label Informed Latent Semantic Indexing
2005
Cited by 51 (2 self)
Latent semantic indexing (LSI) is a well-known unsupervised approach for dimensionality reduction in information retrieval. However, if output information (i.e., category labels) is available, it is often beneficial to derive the indexing not only from the inputs but also from the target values in the training data set. This is of particular importance in applications with multiple labels, where each document can belong to several categories simultaneously. In this paper we introduce the multi-label informed latent semantic indexing (MLSI) algorithm, which preserves the information of the inputs while capturing the correlations between the multiple outputs. The recovered “latent semantics” thus incorporate the human-annotated category information and can be used to greatly improve prediction accuracy. An empirical study on two data sets, Reuters-21578 and RCV1, demonstrates very encouraging results.
Kernel PLS-SVC for Linear and Nonlinear Classification
Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 36 (10 self)
A new method for classification is proposed, based on kernel orthonormalized partial least squares (PLS) dimensionality reduction of the original data space followed by a support vector classifier. Unlike principal component analysis (PCA), which has previously served as a dimension-reduction step for discrimination problems, orthonormalized PLS is closely related to Fisher’s approach to linear discrimination, or equivalently to canonical correlation analysis. For this reason orthonormalized PLS is preferable to PCA for discrimination. Good behavior of the proposed method is demonstrated on 13 different benchmark data sets and on the real-world problem of classifying finger-movement periods from non-movement periods based on electroencephalograms.
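The two-stage idea (supervised dimension reduction, then a classifier) can be sketched in the linear case: the orthonormalized PLS directions maximising covariance with the labels are the top left singular vectors of XᵀY. A minimal NumPy sketch under that assumption, with an ordinary least-squares hyperplane standing in for the support vector classifier used in the paper:

```python
import numpy as np

def opls_directions(X, Y, k):
    """Orthonormalized PLS, linear case: the k orthonormal input
    directions maximising covariance with Y are the top left
    singular vectors of X^T Y (X and Y assumed column-centred)."""
    U, _, _ = np.linalg.svd(X.T @ Y, full_matrices=False)
    return U[:, :k]

def fit_hyperplane(Z, y):
    # Least-squares separating hyperplane on the reduced features
    # (a stand-in for the SVM of the paper, not the paper's method).
    A = np.hstack([Z, np.ones((len(Z), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(Z, w):
    A = np.hstack([Z, np.ones((len(Z), 1))])
    return np.sign(A @ w)
```

On linearly separable two-class data, projecting onto a single PLS direction and classifying in that one-dimensional space already recovers the class structure.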
Regression Error Characteristic Curves
Proceedings of the 20th International Conference on Machine Learning, 2003
Cited by 34 (0 self)
Receiver Operating Characteristic (ROC) curves provide a powerful tool for visualizing and comparing classification results. Regression Error Characteristic (REC) curves generalize ROC curves to regression. REC curves plot the error tolerance on the x-axis versus the percentage of points predicted within the tolerance on the y-axis. The resulting curve estimates the cumulative distribution function of the error. The REC curve visually presents commonly used statistics: the area over the curve (AOC) is a biased estimate of the expected error, and the R² value can be estimated as the ratio of the AOC for a given model to the AOC for the null model. Users can quickly assess the relative merits of many regression functions by examining the relative positions of their REC curves. The shape of the curve reveals additional information that can be used to guide modeling.
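The construction just described is direct to implement: sort the absolute residuals, and the REC curve is their empirical CDF; the area over the curve then estimates the expected error. A short sketch (trapezoidal integration is one of several possible discretisations):

```python
import numpy as np

def rec_curve(y_true, y_pred):
    """REC curve: error tolerance (x-axis) vs. fraction of points
    predicted within that tolerance (y-axis), i.e. the empirical
    CDF of the absolute residuals."""
    errors = np.sort(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    n = len(errors)
    tolerance = np.concatenate(([0.0], errors))
    accuracy = np.concatenate(([0.0], np.arange(1, n + 1) / n))
    return tolerance, accuracy

def area_over_curve(tolerance, accuracy):
    # Area between the curve and accuracy = 1, up to the largest
    # observed error: a (biased) estimate of the expected error.
    gaps = 1.0 - 0.5 * (accuracy[:-1] + accuracy[1:])
    return float(np.sum(np.diff(tolerance) * gaps))
```

Curves of competing models plotted on the same axes can then be compared by eye, with the model whose curve rises fastest dominating.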
Using string kernels to identify famous performers from their playing style
In Proceedings of the 15th European Conference on Machine Learning (ECML 2004), 2004
Cited by 33 (9 self)
In this chapter we show a novel application of string kernels: recognising famous pianists from their style of playing. The characteristics of performers playing the same piece are obtained from changes in beat-level tempo and beat-level loudness, which over the course of the piece form a performance worm. From such worms, general performance alphabets can be derived, and pianists’ performances can then be represented as strings. We show that when the string kernel is used on this data, both kernel partial least squares and Support Vector Machines outperform the current best results. Furthermore, we suggest a new method of obtaining feature directions from the Kernel Partial Least Squares algorithm and show that, used in conjunction with a Support Vector Machine, it can deliver better performance than methods previously used in the literature.
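Once performances are encoded as strings over a performance alphabet, any string kernel applies. As an illustration (the specific kernel used in the chapter may differ), the p-spectrum kernel, the inner product of substring-count vectors, is a standard minimal example:

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum string kernel: the inner product of the vectors
    counting every contiguous length-p substring of s and t."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(count * ct[sub] for sub, count in cs.items())
```

The resulting Gram matrix over the performance strings can be fed to any kernel method, such as the kernel PLS and SVM classifiers discussed above.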
Estimating the sentence-level quality of Machine Translation systems
In EAMT, 2009
Cited by 31 (11 self)
We investigate the problem of predicting the quality of sentences produced by machine translation systems when reference translations are not available. The problem is addressed as a regression task, and a method that takes into account the contribution of different features is proposed. We experiment with this method on translations produced by various MT systems and different language pairs, annotated with quality scores both automatically and manually. Results show that our method yields good estimates and that identifying a reduced set of relevant features plays an important role. The experiments also highlight a number of features that were consistently selected as the most relevant and could be used in different ways to improve MT performance or to enhance MT evaluation.
The Geometry of Kernel Canonical Correlation Analysis
2003
Cited by 27 (0 self)
Canonical correlation analysis (CCA) is a classical multivariate method concerned with describing linear dependencies between sets of variables. After a short exposition of the linear sample CCA problem and its analytical solution, the article proceeds with a detailed characterization of its geometry. Projection operators are used to illustrate the relations between canonical vectors and variates. The article then addresses the problem of CCA between spaces spanned by objects mapped into kernel feature spaces. An exact solution for this kernel canonical correlation (KCCA) problem is derived from a geometric point of view. It shows that the expansion coefficients of the canonical vectors in their respective feature spaces can be found by linear CCA in the basis induced by kernel principal component analysis. The effect of mappings into higher-dimensional feature spaces is considered critically, since in general it simplifies the CCA problem. Two regularized variants of KCCA are then discussed. Relations to other methods are illustrated, e.g., multi-category kernel Fisher discriminant analysis, kernel principal component regression, and possible applications thereof in blind source separation.
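The regularised variants can be written entirely in terms of the two centred Gram matrices: the first canonical correlation solves a generalised eigenproblem in the dual coefficients. A NumPy sketch of one common regularisation, adding κI to the squared Gram matrices (the exact regulariser used in the article may differ):

```python
import numpy as np

def kcca_first_correlation(K, L, kappa=1e-3):
    """First canonical correlation of regularised kernel CCA.
    Solves  (K^2 + kI)^-1 K L (L^2 + kI)^-1 L K  a = rho^2 a
    for the dual coefficients a, with K, L centred Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    K, L = H @ K @ H, H @ L @ H
    Rk = K @ K + kappa * np.eye(n)               # regularised variances
    Rl = L @ L + kappa * np.eye(n)
    M = np.linalg.solve(Rk, K @ L) @ np.linalg.solve(Rl, L @ K)
    rho2 = np.max(np.real(np.linalg.eigvals(M)))
    return float(np.sqrt(max(rho2, 0.0)))
```

With identical views and small κ the correlation approaches 1, illustrating the degeneracy that motivates regularisation in the first place.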
Learning via Linear Operators: Maximum Margin Regression
In Proceedings of the 2001 IEEE International Conference on Data Mining, 2005
Cited by 27 (8 self)
We introduce a maximum margin framework that realizes regression-type learning in an arbitrary Hilbert space, while the corresponding dual problem preserves the structure, and therefore the complexity, of the binary Support Vector Machine (SVM). We demonstrate via examples that this learning framework is broadly applicable to several seemingly different problems. One example is the multi-class classification problem, which in this way can be implemented with the complexity of a binary SVM. The reduction in complexity does not diminish performance; in some cases this approach can even improve classification accuracy. Multi-class classification is realized with vector-valued output labels. Other examples implement multi-view learning problems.