Results 1 - 10 of 558
Canonical correlation analysis: An overview with application to learning methods
, 2007
Abstract

Cited by 337 (17 self)
We present a general method using kernel Canonical Correlation Analysis to learn a semantic representation of web images and their associated text. The semantic space provides a common representation and enables a comparison between the text and images. In the experiments we look at two approaches to retrieving images based only on their content from a text query. We compare the approaches against a standard cross-representation retrieval technique known as the Generalised Vector Space Model.
Online learning for matrix factorization and sparse coding
, 2010
Abstract

Cited by 317 (31 self)
Sparse coding, that is, modelling data vectors as sparse linear combinations of basis elements, is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the large-scale matrix factorization problem that consists of learning the basis set in order to adapt it to specific data. Variations of this problem include dictionary learning in signal processing, non-negative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large data sets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to state-of-the-art performance in terms of speed and optimization for both small and large data sets.
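As a reading aid, "modelling data vectors as sparse linear combinations of basis elements" is commonly written as an l1-penalized matrix factorization. The abstract does not state its objective, so the form below is an assumed, standard textbook formulation; the symbols x_i (data vectors), D (basis set or dictionary), alpha_i (sparse codes), lambda (sparsity weight) and the constraint set C are notation introduced here for illustration only.

```latex
\min_{D \in \mathcal{C},\; \alpha_1, \dots, \alpha_n}
  \frac{1}{n} \sum_{i=1}^{n}
  \left( \frac{1}{2} \lVert x_i - D \alpha_i \rVert_2^2
         + \lambda \lVert \alpha_i \rVert_1 \right),
\qquad
\mathcal{C} = \left\{ D : \lVert d_j \rVert_2 \le 1 \text{ for every column } d_j \right\}
```

The norm constraint on the columns of D is one usual way to stop the dictionary from growing without bound while the codes shrink; other constraint sets give variants such as non-negative matrix factorization.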
Convex multitask feature learning
 MACHINE LEARNING
, 2007
Abstract

Cited by 250 (25 self)
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.
Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2007
Abstract

Cited by 126 (11 self)
We address the problem of comparing sets of images for object recognition, where the sets may represent variations in an object’s appearance due to changing camera pose and lighting conditions. Canonical Correlations (also known as principal or canonical angles), which can be thought of as the angles between two d-dimensional subspaces, have recently attracted attention for image set matching. Canonical correlations offer many benefits in accuracy, efficiency, and robustness compared to the two main classical methods: parametric distribution-based and nonparametric sample-based matching of sets. Here, this is first demonstrated experimentally for reasonably sized data sets using existing methods exploiting canonical correlations. Motivated by their proven effectiveness, a novel discriminative learning method over sets is proposed for set classification. Specifically, inspired by classical Linear Discriminant Analysis (LDA), we develop a linear discriminant function that maximizes the canonical correlations of within-class sets and minimizes the canonical correlations of between-class sets. Image sets transformed by the discriminant function are then compared by the canonical correlations. The classical orthogonal subspace method (OSM) is also investigated for a similar purpose and compared with the proposed method. The proposed method is evaluated on various object recognition problems using face image sets with arbitrary motion captured under different illuminations and image sets of 500 general objects taken at different views. The method is also applied to object category recognition using the ETH-80 database. The proposed method is shown to outperform the state-of-the-art methods in terms of accuracy and efficiency. Index Terms: Object recognition, face recognition, image sets, canonical correlation, principal angles, canonical correlation analysis, linear discriminant analysis, orthogonal subspace method.
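To make the notion of canonical correlations as angles between subspaces concrete, here is a minimal NumPy sketch. It is not the paper's discriminative method; it only computes the cosines of the principal angles between two subspaces with the standard QR-plus-SVD recipe. The function name `canonical_correlations` and the toy matrices are hypothetical, and the inputs are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(A, B):
    """Return the cosines of the principal angles between span(A) and span(B).

    A and B hold basis vectors as columns (assumed full column rank).
    """
    # Orthonormal bases for the two column spaces.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # The singular values of Qa^T Qb are the cosines of the principal angles,
    # returned in descending order (smallest angle first).
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Guard against round-off pushing values slightly outside [0, 1].
    return np.clip(s, 0.0, 1.0)

# Toy example: the xy-plane and the xz-plane in R^3 share exactly one direction.
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
cos_theta = canonical_correlations(A, B)
```

For the two planes above, one canonical correlation is 1 (the shared x-axis) and the other is 0 (the remaining directions are orthogonal). In an image-set setting, the columns of A and B would be basis vectors fitted to each set, for example by PCA.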
On the Early History of the Singular Value Decomposition
, 1992
Abstract

Cited by 122 (1 self)
This paper surveys the contributions of five mathematicians, Eugenio Beltrami (1835-1899), Camille Jordan (1838-1921), James Joseph Sylvester (1814-1897), Erhard Schmidt (1876-1959), and Hermann Weyl (1885-1955), who were responsible for establishing the existence of the singular value decomposition and developing its theory.
Basic Properties of Strong Mixing Conditions. A Survey and Some Open Questions
 PROBABILITY SURVEYS
, 2005
Abstract

Cited by 121 (0 self)
This is an update of, and a supplement to, the author’s earlier survey paper [18] on basic properties of strong mixing conditions. That paper appeared in 1986 in a book containing survey papers on various types of dependence conditions and the limit theory under them. The survey here will include part (but not all) of the material in [18], and will also describe some relevant material that was not in that paper, especially some new discoveries and developments that have occurred since that paper was published. (Much of the new material described here involves “interlaced” strong mixing conditions, in which the index sets are not restricted to “past” and “future.”) At various places in this survey, open problems will be posed. There is a large literature on basic properties of strong mixing conditions. A survey such as this cannot do full justice to it. Here are a few references on important topics not covered in this survey. For the approximation of mixing sequences by martingale differences, see e.g. the book by Hall and Heyde [80]. For the direct approximation of mixing random variables by independent ones,
Learning over Sets using Kernel Principal Angles
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 105 (2 self)
We consider the problem of learning with instances defined over a space of sets of vectors. We derive a new positive definite kernel f(A, B) defined over pairs of matrices A, B based on the concept of principal angles between two linear subspaces. We show that the principal angles can be recovered using only inner products between pairs of column vectors of the input matrices, thereby allowing the original column vectors of A, B to be mapped onto arbitrarily high-dimensional feature spaces.
A Multistage Representation of the Wiener Filter Based on Orthogonal Projections
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 102 (5 self)
The Wiener filter is analyzed for stationary complex Gaussian signals from an information-theoretic point of view. A dual-port analysis of the Wiener filter leads to a decomposition based on orthogonal projections and results in a new multistage method for implementing the Wiener filter using a nested chain of scalar Wiener filters. This new representation of the Wiener filter provides the capability to perform an information-theoretic analysis of previous, basis-dependent, reduced-rank Wiener filters. This analysis demonstrates that the recently introduced cross-spectral metric is optimal in the sense that it maximizes mutual information between the observed and desired processes. A new reduced-rank Wiener filter is developed based on this new structure which evolves a basis using successive projections of the desired signal onto orthogonal, lower dimensional subspaces. The performance is evaluated using a comparative computer analysis model and it is demonstrated that the low-complexity multistage reduced-rank Wiener filter is capable of outperforming the more complex eigendecomposition-based methods.
A probabilistic interpretation of canonical correlation analysis
, 2005
Abstract

Cited by 102 (1 self)
We give a probabilistic interpretation of canonical correlation analysis (CCA) as a latent variable model for two Gaussian random vectors. Our interpretation is similar to the probabilistic interpretation of principal component analysis (Tipping and Bishop, 1999; Roweis, 1998). In addition, we can interpret Fisher linear discriminant analysis (LDA) as CCA between appropriately defined vectors.
A Review of Kernel Methods in Machine Learning
, 2006
Abstract

Cited by 95 (4 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.