
## Kernel methods on the Riemannian manifold of symmetric positive definite matrices (2013)


Venue: CVPR

Citations: 32 (10 self)

### Citations

3734 | Histograms of oriented gradients for human detection
- Dalal, Triggs
- 2005
Citation Context: ...he 100 selected subwindows. At test time, detection is achieved in a sliding window manner followed by a non-maxima suppression step. To evaluate our approach, we made use of the INRIA person dataset [6]. Its training set consists of 2,416 positive windows and 1,280 person-free negative images, and its test set of 1,237 positive windows and 453 negative images. Negative windows are generated by sampl...

1572 | Nonlinear Component Analysis as a Kernel Eigenvalue Problem
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...and k as K^(j)_pq = k(g_j(x_p), g_j(x_q)). The combined kernel can be expressed as K* = Σ_j λ_j K^(j), where λ_j ≥ 0 for j = 1...N guarantees the positive definiteness of K*. The weights λ can be learned using a min-max optimization procedure with an L1 regularizer on λ to obtain a sparse combination of kernels. For more details, we refer the reader to [22] and [21]. Note that convergence of MKL is only guaranteed if all the kernels are positive definite. 5.3. Kernel PCA on Sym_d^+ We now describe the key concepts of kernel PCA on Sym_d^+. Kernel PCA is a non-linear dimensionality reduction method [17]. Since it works in feature space, kernel PCA may, however, extract a number of dimensions that exceeds the dimensionality of the input space. Kernel PCA proceeds as follows: All points X_i ∈ Sym_d^+ of a given dataset {X_i}_{i=1}^m are mapped to feature vectors in H, thus yielding the transformed set {φ(X_i)}_{i=1}^m. The covariance matrix of this transformed set is then computed, which really amounts to computing the kernel matrix of the original data using the function k. An l-dimensional representation of the data is obtained by computing the eigenvectors of the kernel matrix. This representation can...
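The kernel PCA procedure outlined in this excerpt can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes a log-Euclidean Gaussian kernel k(X_i, X_j) = exp(−‖log X_i − log X_j‖_F² / (2σ²)) on toy random SPD matrices, and uses the standard double-centering of the kernel matrix.

```python
import numpy as np

def logm_spd(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def log_euclidean_gaussian_kernel(mats, sigma):
    """k(Xi, Xj) = exp(-||log Xi - log Xj||_F^2 / (2 sigma^2))."""
    logs = [logm_spd(X) for X in mats]
    m = len(mats)
    K = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            d2 = np.sum((logs[i] - logs[j]) ** 2)
            K[i, j] = np.exp(-d2 / (2.0 * sigma ** 2))
    return K

def kernel_pca(K, l):
    """Project training points onto the top-l principal directions in the RKHS."""
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    Kc = J @ K @ J                        # double-centering = centering phi(Xi) in H
    w, V = np.linalg.eigh(Kc)             # ascending eigenvalues
    idx = np.argsort(w)[::-1][:l]         # keep the top-l eigenpairs
    alphas = V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
    return Kc @ alphas                    # l-dimensional representation of each point

# toy data: 10 random 3x3 SPD matrices
rng = np.random.default_rng(0)
mats = [A @ A.T + 3 * np.eye(3) for A in rng.standard_normal((10, 3, 3))]
K = log_euclidean_gaussian_kernel(mats, sigma=1.0)
Y = kernel_pca(K, l=2)                    # 10 x 2 embedding
```

Unlike linear PCA, the number of extractable dimensions here is bounded by the number of samples rather than the input dimensionality, which matches the remark in the excerpt.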

1508 | Fast training of support vector machines using sequential minimal optimization
- Platt
- 1999
Citation Context: ...te only for some values of the Gaussian bandwidth parameter σ [9]. For all kernel methods, the optimal choice of σ largely depends on the data distribution and hence constraints on σ are not desirable. Moreover, many popular automatic model selection methods require σ to be continuously variable [5]. Other than for satisfying Mercer’s theorem to generate a valid RKHS, positive definiteness of the kernel is a required condition for the convergence of many kernel based algorithms. For instance, the Support Vector Machine (SVM) learning problem is convex only when the kernel is positive definite [14]. Similarly, positive definiteness of all participating kernels is required to guarantee the convexity in Multiple Kernel Learning (MKL) [22]. Although theories have been proposed to exploit non-positive definite kernels [12, 23], they have not experienced a widespread success. Many of these methods first enforce positive definiteness of the kernel by flipping or shifting its negative eigenvalues [23]. As a consequence, they result in a loss of information and become inapplicable with large sized kernels that are not uncommon in learning problems. Recently, mean-shift clustering with a positiv...
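The eigenvalue "flipping or shifting" mentioned in this excerpt is easy to sketch. The snippet below is a generic illustration of the idea, not the specific methods of [12, 23]; it repairs an indefinite symmetric similarity matrix at the cost of altering it, which is exactly the information loss the excerpt criticizes.

```python
import numpy as np

def clip_spectrum(K, mode="shift"):
    """Force a symmetric similarity matrix to be positive semi-definite.
    'shift' subtracts the most negative eigenvalue from the whole spectrum;
    'flip' replaces each eigenvalue by its absolute value.
    Either way, information in the original matrix is lost."""
    w, V = np.linalg.eigh(K)
    w = w - min(w.min(), 0.0) if mode == "shift" else np.abs(w)
    return (V * w) @ V.T

# an indefinite symmetric "kernel" matrix (one negative eigenvalue)
K = np.array([[1.0, 0.9, -0.8],
              [0.9, 1.0, 0.9],
              [-0.8, 0.9, 1.0]])
K_psd = clip_spectrum(K, "shift")
```

Both repairs require a full eigendecomposition, O(m³) in the number of samples, which is why the excerpt notes they become impractical for large kernel matrices.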

469 | Choosing multiple parameters for support vector machines
- Chapelle, Vapnik, et al.
- 2002
Citation Context: ...timal choice of σ largely depends on the data distribution and hence constraints on σ are not desirable. Moreover, many popular automatic model selection methods require σ to be continuously variable [5]. Other than for satisfying Mercer’s theorem to generate a valid RKHS, positive definiteness of the kernel is a required condition for the convergence of many kernel based algorithms. For instance, th...

368 | Filtering for Texture Classification: A Comparative Study
- Randen, Husoy
- 1999
Citation Context: ...r mean [7] was used to compute the centroid. The results of the different methods are summarized in Table 2. Manifold kernel k-means with the log-Euclidean metric performs significantly better than all other methods in all test cases. These results also outperform the results with the heat kernel reported in [4]. Note, however, that [4] only considered 3 and 4 classes without mentioning which classes were used. 6.3. Texture Recognition We then utilized our Riemannian kernel to demonstrate the effectiveness of manifold kernel PCA on texture recognition. To this end, we used the Brodatz dataset [15], which consists of 111 different 640×640 texture images. Each image was divided into four subimages of equal size, two of which were used for training and the other two for testing. For each training image, covariance descriptors of 50 randomly chosen 128×128 windows were computed from the feature vector [I, |Ix|, |Iy|, |Ixx|, |Iyy|]. Kernel PCA on Sym_5^+ with our Riemannian kernel was then used to extract the top l principal directions in the RKHS, and project the training data along those directions. Given a test image, we computed 100 covariance descriptors from random windows and pr... [19]
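The covariance region descriptor used in this experiment can be sketched as below. This is a minimal version assuming a grayscale patch and simple finite-difference derivatives (the paper's exact filtering may differ); a small ridge is added so the result is strictly positive definite rather than merely positive semi-definite.

```python
import numpy as np

def covariance_descriptor(patch):
    """5x5 covariance descriptor of [I, |Ix|, |Iy|, |Ixx|, |Iyy|] over a patch."""
    I = patch.astype(float)
    Iy, Ix = np.gradient(I)          # derivatives along rows (y) and columns (x)
    Iyy, _ = np.gradient(Iy)         # second derivative in y
    _, Ixx = np.gradient(Ix)         # second derivative in x
    feats = np.stack([I.ravel(),
                      np.abs(Ix).ravel(), np.abs(Iy).ravel(),
                      np.abs(Ixx).ravel(), np.abs(Iyy).ravel()])
    C = np.cov(feats)                # 5x5 symmetric positive semi-definite
    return C + 1e-6 * np.eye(5)      # small ridge keeps it strictly SPD

# toy usage on a random 128x128 "window"
rng = np.random.default_rng(0)
C = covariance_descriptor(rng.random((128, 128)))
```

Each 128×128 window is thus summarized by a single 5×5 SPD matrix, i.e. a point on Sym_5^+, regardless of the window size.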

286 | A Riemannian Framework for Tensor Computing.
- Pennec, Fillard, et al.
- 2006
Citation Context: ...re analysis, 2D motion segmentation and Diffusion Tensor Imaging (DTI) segmentation. 1. Introduction Many mathematical entities in computer vision do not form vector spaces, but reside on non-linear manifolds. For instance, 3D rotation matrices form the SO(3) group, linear subspaces of the Euclidean space form the Grassmann manifold, and normalized histograms form the unit n-sphere S^n. Symmetric positive definite (SPD) matrices are another class of entities lying on a Riemannian manifold. Examples of SPD matrices in computer vision include covariance region descriptors [19], diffusion tensors [13] and structure tensors [8]. Despite the abundance of such manifold-valued data, computer vision algorithms are still primarily developed for data points lying in Euclidean space (R^n). Applying these algorithms directly to points on non-linear manifolds, and thus neglecting the geometry of the manifold, often yields poor accuracy and undesirable effects, such ... (Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.)

278 | Region covariance: A fast descriptor for detection and classification
- Tuzel, Porikli, et al.
- 2006
Citation Context: ...ct categorization, texture analysis, 2D motion segmentation and Diffusion Tensor Imaging (DTI) segmentation. 1. Introduction Many mathematical entities in computer vision do not form vector spaces, but reside on non-linear manifolds. For instance, 3D rotation matrices form the SO(3) group, linear subspaces of the Euclidean space form the Grassmann manifold, and normalized histograms form the unit n-sphere S^n. Symmetric positive definite (SPD) matrices are another class of entities lying on a Riemannian manifold. Examples of SPD matrices in computer vision include covariance region descriptors [19], diffusion tensors [13] and structure tensors [8]. Despite the abundance of such manifold-valued data, computer vision algorithms are still primarily developed for data points lying in Euclidean space (R^n). Applying these algorithms directly to points on non-linear manifolds, and thus neglecting the geometry of the manifold, often yields poor accuracy and un... (Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.)

227 | Learning the discriminative power-invariance trade-off.
- Varma, Ray
- 2007
Citation Context: ...a distribution and hence constraints on σ are not desirable. Moreover, many popular automatic model selection methods require σ to be continuously variable [5]. Other than for satisfying Mercer’s theorem to generate a valid RKHS, positive definiteness of the kernel is a required condition for the convergence of many kernel based algorithms. For instance, the Support Vector Machine (SVM) learning problem is convex only when the kernel is positive definite [14]. Similarly, positive definiteness of all participating kernels is required to guarantee the convexity in Multiple Kernel Learning (MKL) [22]. Although theories have been proposed to exploit non-positive definite kernels [12, 23], they have not experienced a widespread success. Many of these methods first enforce positive definiteness of the kernel by flipping or shifting its negative eigenvalues [23]. As a consequence, they result in a loss of information and become inapplicable with large sized kernels that are not uncommon in learning problems. Recently, mean-shift clustering with a positive definite heat kernel on Riemannian manifolds was introduced [4]. However, due to the mathematical complexity of the kernel function, comput...

219 | Analyzing appearance and contour based methods for object categorization
- Leibe, Schiele
- 2003
Citation Context: ...tradeoff (DET) curves of our approach and state-of-the-art methods. The curve for our method was generated by continuously varying the decision threshold of the final MKL classifier. We also evaluated our MKL framework with a Euclidean kernel. Note that the proposed MKL method with a Riemannian kernel outperforms MKL with a Euclidean kernel, as well as LogitBoost on the manifold. This suggests the importance of accounting for the geometry of the manifold. 6.2. Visual Object Categorization We next tackle the problem of unsupervised object categorization. To this end, we used the ETH-80 dataset [11], which contains 8 categories with 10 objects each and 41 images per object. We used 21 randomly chosen images from each object to compute the parameter σ and the rest to evaluate clustering accuracy. For each image, we used a single 5×5 covariance descriptor calculated from the features [x, y, I, |Ix|, |Iy|], where x, y are pixel locations and I, Ix, Iy are intensity and derivatives. To obtain object categories, the kernel k-means algorithm on Sym_5^+ described in Section 5.4 was employed to perform clustering. One drawback of k-means and its kernel counterpart is their sensitivity to initia...
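Kernel k-means, used above for categorization, needs only a precomputed kernel matrix, since squared feature-space distances to cluster means expand entirely in kernel evaluations. The sketch below is a generic Lloyd-style implementation, not the paper's; for brevity the toy usage employs a Euclidean RBF kernel on 2-D points, whereas in the paper the kernel would come from the Riemannian metric on Sym_d^+.

```python
import numpy as np

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K."""
    m = K.shape[0]
    labels = np.random.default_rng(seed).integers(n_clusters, size=m)
    for _ in range(n_iter):
        dist = np.full((m, n_clusters), np.inf)
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue                      # empty cluster: never assigned
            # ||phi(x_i) - mu_c||^2, dropping the K_ii term (constant over c)
            dist[:, c] = -2.0 * K[:, idx].mean(axis=1) + K[np.ix_(idx, idx)].mean()
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# toy usage: RBF kernel on two well-separated point clouds
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
labels = kernel_kmeans(np.exp(-D2 / 2.0), n_clusters=2)
```

As the excerpt notes, results depend on the random initialization of the labels, which is why multiple restarts are common in practice.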

217 | Log-Euclidean metrics for fast and simple calculus on diffusion tensors. Magnetic Resonance in Medicine
- Arsigny, Pennec, et al.
- 2006
Citation Context: ...ugh the ICT Centre of Excellence program. This work was supported in part by an ARC grant. ...poor accuracy and undesirable effects, such as the swelling of diffusion tensors in the case of SPD matrices [2, 13]. Recently, many attempts have been made to generalize algorithms developed for R^n to Riemannian manifolds [20, 8]. The most common approach consists in computing the tangent space to the manifold at ...

203 | Harmonic Analysis on Semigroups
- Berg, Christensen, et al.
- 1984
Citation Context: ...ψ : M → V such that d(x_i, x_j) = ‖ψ(x_i) − ψ(x_j)‖_V. Proof. The proof of Theorem 4.1 follows a number of steps detailed below. We start with the definition of positive and negative definite functions [3]. Definition 4.2. Let X be a nonempty set. A function f : (X × X) → R is called a positive (resp. negative) definite kernel if and only if f is symmetric and Σ_{i,j=1}^m c_i c_j f(x_i, x_j) ≥ 0 (resp. ≤ 0) fo...

194 | Metric spaces and positive definite functions.
- Schoenberg
- 1938
Citation Context: ...ψ : M → V such that d(x_i, x_j) = ‖ψ(x_i) − ψ(x_j)‖_V. Proof. The proof of Theorem 4.1 follows a number of steps detailed below. We start with the definition of positive and negative definite functions [3]. Definition 4.2. Let X be a nonempty set. A function f : (X × X) → R is called a positive (resp. negative) definite kernel if and only if f is symmetric and Σ_{i,j=1}^m c_i c_j f(x_i, x_j) ≥ 0 (resp. ≤ 0) for all m ∈ N, {x_1, ..., x_m} ⊆ X and {c_1, ..., c_m} ⊆ R, with Σ_{i=1}^m c_i = 0 in the negative definite case. Given this definition, we make use of the following important theorem due mainly to Schoenberg [16]. Theorem 4.3. Let X be a nonempty set and f : (X × X) → R be a function. The kernel exp(−t f(x_i, x_j)) is positive definite for all t > 0 if and only if f is negative definite. Proof. We refer the reader to Chapter 3, Theorem 2.2 of [3] for a detailed proof of this theorem. Although the origin of this theorem dates back to 1938 [16], it has received little attention in the computer vision community. Theorem 4.3 implies that positive definiteness of the Gaussian kernel induced by a distance is equivalent to negative definiteness of the squared distance function. Therefore, to prove the positive de...
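Theorem 4.3 can be illustrated numerically. The squared log-Euclidean distance on SPD matrices is negative definite (it is a squared Euclidean distance between matrix logarithms), so exp(−t d²) should be positive definite for every t > 0. The sketch below checks both properties on random SPD matrices; it is illustrative only, not a proof.

```python
import numpy as np

def logm_spd(X):
    """Matrix logarithm of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

rng = np.random.default_rng(1)
mats = [A @ A.T + 2 * np.eye(4) for A in rng.standard_normal((8, 4, 4))]
logs = np.array([logm_spd(X) for X in mats])

# squared log-Euclidean distance matrix: d^2(X_i, X_j) = ||log X_i - log X_j||_F^2
D2 = ((logs[:, None] - logs[None, :]) ** 2).sum(axis=(2, 3))

# negative definiteness (Definition 4.2): c^T D2 c <= 0 whenever sum(c) = 0
c = rng.standard_normal(8)
c -= c.mean()
assert c @ D2 @ c <= 1e-9

# hence exp(-t * D2) is positive (semi-)definite for every t > 0 (Theorem 4.3)
for t in (0.1, 1.0, 10.0):
    assert np.linalg.eigvalsh(np.exp(-t * D2)).min() > -1e-9
```

The sum-to-zero check mirrors the extra condition in the negative definite case of Definition 4.2; for squared Euclidean distances it holds exactly, since cᵀD2c = −2‖Σᵢ cᵢ log Xᵢ‖² when Σᵢ cᵢ = 0.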

140 | Pedestrian detection via classification on Riemannian manifolds
- Tuzel, Porikli, et al.
- 2008
Citation Context: ...R^n). Applying these algorithms directly to points on non-linear manifolds, and thus neglecting the geometry of the manifold, often yields poor accuracy and undesirable effects, such as the swelling of diffusion tensors in the case of SPD matrices [2, 13]. (Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.) Recently, many attempts have been made to generalize algorithms developed for R^n to Riemannian manifolds [20, 8]. The most common approach consists in computing the tangent space to the manifold at the mean of the data points to obtain a Euclidean approximation of the manifold [20]. The logarithmic and exponential maps are then iteratively used to map points from the manifold to the tangent space, and vice-versa. Unfortunately, the resulting algorithms suffer from two drawbacks: The iterative use of the logarithmic and exponential maps makes them computationally expensive, and, more importantly, they only approximate true distances on the manifold by Euclidean distances on the tangent space. To overcome...

80 | More generality in efficient multiple kernel learning
- Varma, Babu
- 2009
Citation Context: ...makes use of a rich high dimensional feature space. As will be shown in our experiments, this yields better classification results. 5.2. Multiple Kernel Learning on Sym_d^+ The core idea of Multiple Kernel Learning (MKL) is to combine kernels computed from different descriptors (e.g., image features) to obtain a kernel that optimally separates two classes for a given classifier. Here, we follow the formulation of [22], and make use of an SVM classifier. As a feature selection method, MKL has proven more effective than conventional feature selection methods such as wrappers, filters and boosting [21]. More specifically, given training examples {(x_i, y_i)}_{i=1}^m, where x_i ∈ X, y_i ∈ {−1, 1}, and a set of descriptor generating functions {g_j}_{j=1}^N where g_j : X → Sym_d^+, we seek to learn a binary classifier f : X → {−1, 1} by selecting and optimally combining the different descriptors generated by g_1, ..., g_N. Let K^(j) be the kernel matrix generated by g_j and k as K^(j)_pq = k(g_j(x_p), g_j(x_q)). The combined kernel can be expressed as K* = Σ_j λ_j K^(j), where λ_j ≥ 0 for j = 1...N guarantees the positive definiteness of K*. The weights λ can be learned using a min-max optimization procedure wit...
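The combined kernel K* = Σ_j λ_j K^(j) in this excerpt is easy to illustrate: any conic combination (λ_j ≥ 0) of positive definite kernels remains positive definite. The toy sketch below combines RBF kernels built from two hypothetical descriptor sets for the same samples; the weights are fixed by hand here, whereas MKL would learn them by the min-max procedure the excerpt mentions.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian RBF kernel matrix for row vectors in X."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-D2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(2)

# two hypothetical descriptor sets g1(x_i), g2(x_i) for the same 12 samples
g1 = rng.standard_normal((12, 5))
g2 = rng.standard_normal((12, 3))
kernels = [rbf_kernel(g1, 1.0), rbf_kernel(g2, 0.5)]

lam = np.array([0.7, 0.3])                 # nonnegative weights; MKL would learn these
K_star = sum(l * K for l, K in zip(lam, kernels))
```

Since each base kernel has a unit diagonal and the weights sum to one here, K* also has a unit diagonal; more importantly, it stays positive semi-definite for any choice of nonnegative weights.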

55 | Learning with nonpositive kernels
- Ong, Mary, et al.
- 2004
Citation Context: ...automatic model selection methods require σ to be continuously variable [5]. Other than for satisfying Mercer’s theorem to generate a valid RKHS, positive definiteness of the kernel is a required condition for the convergence of many kernel based algorithms. For instance, the Support Vector Machine (SVM) learning problem is convex only when the kernel is positive definite [14]. Similarly, positive definiteness of all participating kernels is required to guarantee the convexity in Multiple Kernel Learning (MKL) [22]. Although theories have been proposed to exploit non-positive definite kernels [12, 23], they have not experienced a widespread success. Many of these methods first enforce positive definiteness of the kernel by flipping or shifting its negative eigenvalues [23]. As a consequence, they result in a loss of information and become inapplicable with large sized kernels that are not uncommon in learning problems. Recently, mean-shift clustering with a positive definite heat kernel on Riemannian manifolds was introduced [4]. However, due to the mathematical complexity of the kernel function, computing it is not tractable and hence only an approximation of the true kernel was used in t...

40 | Non-Euclidean Statistics for Covariance Matrices, with Applications to
- Dryden, Koloydenko, et al.
- 2009
Citation Context: ...log-Euclidean distance [2]. The main reason for their popularity is that they are true geodesic distances induced by Riemannian metrics. For a review of metrics on Sym_d^+, the reader is referred to [7]. 3.2. Kernel Methods on Non-linear Manifolds Kernel methods in R^n have proven extremely effective in machine learning and computer vision to explore non-linear patterns in data. The fundamental idea...

34 | Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach
- Harandi, Sanderson, et al.
- 2012
Citation Context: ...has been shown to define a true geodesic distance on Sym_d^+. We demonstrate the benefits of our manifold-based kernel by exploiting it in four different algorithms. Our experiments show that the resulting manifold kernel methods outperform the corresponding Euclidean kernel methods, as well as the manifold methods that use tangent space approximations. 2. Related Work SPD matrices find a variety of applications in computer vision. For instance, covariance region descriptors are used in object detection [20], texture classification [19], object tracking, action recognition and face recognition [9]. Diffusion Tensor Imaging (DTI) was one of the pioneering fields for the development of non-linear algorithms on Sym_d^+ [13, 2]. In optical flow estimation and motion segmentation, structure tensors are often employed to encode important image features, such as texture and motion [8]. In recent years, several optimization algorithms on manifolds have been proposed for Sym_d^+. In particular, LogitBoost on a manifold was introduced for binary classification [20]. This algorithm has the drawbacks of approximating the manifold by tangent spaces and not scaling with the number of training samples d...

28 | Clustering and Dimensionality Reduction on Riemannian Manifolds.
- Goh, Vidal
- 2008
Citation Context: ...entation and Diffusion Tensor Imaging (DTI) segmentation. 1. Introduction Many mathematical entities in computer vision do not form vector spaces, but reside on non-linear manifolds. For instance, 3D rotation matrices form the SO(3) group, linear subspaces of the Euclidean space form the Grassmann manifold, and normalized histograms form the unit n-sphere S^n. Symmetric positive definite (SPD) matrices are another class of entities lying on a Riemannian manifold. Examples of SPD matrices in computer vision include covariance region descriptors [19], diffusion tensors [13] and structure tensors [8]. Despite the abundance of such manifold-valued data, computer vision algorithms are still primarily developed for data points lying in Euclidean space (R^n). Applying these algorithms directly to points on non-linear manifolds, and thus neglecting the geometry of the manifold, often yields poor accuracy and undesirable effects, such as the swelling of diffusi... (Footnote: NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the ARC through the ICT Centre of Excellence program. This work was supported in part by an ARC grant.)

21 | An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines
- Wu, Chang, et al.
- 2005
Citation Context: ...automatic model selection methods require σ to be continuously variable [5]. Other than for satisfying Mercer’s theorem to generate a valid RKHS, positive definiteness of the kernel is a required condition for the convergence of many kernel based algorithms. For instance, the Support Vector Machine (SVM) learning problem is convex only when the kernel is positive definite [14]. Similarly, positive definiteness of all participating kernels is required to guarantee the convexity in Multiple Kernel Learning (MKL) [22]. Although theories have been proposed to exploit non-positive definite kernels [12, 23], they have not experienced a widespread success. Many of these methods first enforce positive definiteness of the kernel by flipping or shifting its negative eigenvalues [23]. As a consequence, they result in a loss of information and become inapplicable with large sized kernels that are not uncommon in learning problems. Recently, mean-shift clustering with a positive definite heat kernel on Riemannian manifolds was introduced [4]. However, due to the mathematical complexity of the kernel function, computing it is not tractable and hence only an approximation of the true kernel was used in t...

16 | Kernel analysis over Riemannian manifolds for visual recognition of actions, pedestrians and textures
- Harandi, Sanderson, et al.
- 2012
Citation Context: ...cation algorithms on non-linear manifolds. Dimensionality reduction and clustering on Sym_d^+ was demonstrated in [8] with Riemannian versions of the Laplacian Eigenmaps (LE), Locally Linear Embedding (LLE) and Hessian LLE (HLLE). Clustering was performed in a low dimensional space after dimensionality reduction, which does not necessarily preserve all the information in the original data distribution. We instead utilize our kernels to perform clustering in a higher dimensional RKHS that embeds Sym_d^+. The use of kernels on Sym_d^+ has previously been advocated for locality preserving projections [10] and sparse coding [9]. In the first case, the kernel, derived from the affine-invariant distance, is not positive definite in general [10]. In the second case, the kernel uses the Stein divergence, which is not a true geodesic distance, as the distance measure and is positive definite only for some values of the Gaussian bandwidth parameter σ [9]. For all kernel methods, the optimal choice of σ largely depends on the data distribution and hence constraints on σ are not desirable. Moreover, many popular automatic model selection methods require σ to be continuously variable [5]. Other than for...

13 | Positive Definite Matrices and the Symmetric Stein Divergence
- Sra
- 2012
Citation Context: ...product. A number of other metrics have been proposed for Sym_d^+ [7]. The definitions and properties of these metrics are summarized in Table 1. Note that only some of them were derived by considering the Riemannian geometry of the manifold and hence define true geodesic distances. Similar to the log-Euclidean metric, from Theorem 4.1, it directly follows that the Cholesky and power-Euclidean metrics also define positive definite Gaussian kernels for all values of σ. Note that some metrics may yield a positive definite Gaussian kernel for some value of σ only. This, for instance, was shown in [18] for the root Stein divergence metric. No such result is known for the affine-invariant metric. Constraints on σ are nonetheless undesirable, since σ should reflect the data distribution and automatic model selection algorithms require σ to be continuously variable [5]. 5. Kernel-based Algorithms on Sym_d^+ A major advantage of being able to compute positive definite kernels on a Riemannian manifold is that it directly allows us to make use of algorithms developed for R^n, while still accounting for the geometry of the manifold. In this section, we discuss the use of four kernel-based algorith...
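The Cholesky and power-Euclidean metrics mentioned in this excerpt (reviewed in [7]) are straightforward to compute. Below is a sketch assuming the usual definitions d_C(X, Y) = ‖chol(X) − chol(Y)‖_F and d_α(X, Y) = (1/α)‖X^α − Y^α‖_F; these are illustrative implementations, not taken from the paper.

```python
import numpy as np

def powm_spd(M, a):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w ** a) @ V.T

def cholesky_distance(X, Y):
    """Cholesky metric: Frobenius distance between lower Cholesky factors."""
    return np.linalg.norm(np.linalg.cholesky(X) - np.linalg.cholesky(Y))

def power_euclidean_distance(X, Y, alpha=0.5):
    """Power-Euclidean metric; as alpha -> 0 it approaches the log-Euclidean metric."""
    return np.linalg.norm(powm_spd(X, alpha) - powm_spd(Y, alpha)) / alpha

# toy usage on two random 3x3 SPD matrices
rng = np.random.default_rng(4)
A, B = (M @ M.T + np.eye(3) for M in rng.standard_normal((2, 3, 3)))
d_chol = cholesky_distance(A, B)
d_pow = power_euclidean_distance(A, B)
```

By Theorem 4.1 (as the excerpt states), plugging either distance into a Gaussian kernel exp(−d²/(2σ²)) yields a positive definite kernel for every σ, unlike kernels built on the root Stein divergence.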

7 | Semi-Intrinsic Mean-Shift on Riemannian Manifolds.
- Caseiro, Henriques, et al.
- 2012
Citation Context: ...s is required to guarantee the convexity in Multiple Kernel Learning (MKL) [22]. Although theories have been proposed to exploit non-positive definite kernels [12, 23], they have not experienced a widespread success. Many of these methods first enforce positive definiteness of the kernel by flipping or shifting its negative eigenvalues [23]. As a consequence, they result in a loss of information and become inapplicable with large sized kernels that are not uncommon in learning problems. Recently, mean-shift clustering with a positive definite heat kernel on Riemannian manifolds was introduced [4]. However, due to the mathematical complexity of the kernel function, computing it is not tractable and hence only an approximation of the true kernel was used in the algorithm. Here, we introduce a family of provably positive definite kernels on Sym_d^+, and show their benefits in various kernel-based algorithms and on several computer vision tasks. 3. Background In this section, we introduce some notions of Riemannian geometry on the manifold of SPD matrices, and discuss the use of kernel methods on non-linear manifolds. 3.1. The Riemannian Manifold of SPD Matrices A differentiable manifold M ...