Results 1–10 of 26
Shared Kernel Information Embedding for Discriminative Inference
"... Latent Variable Models (LVM), like the SharedGPLVM and the Spectral Latent Variable Model, help mitigate overfitting when learning discriminative methods from small or moderately sized training sets. Nevertheless, existing methods suffer from several problems: 1) complexity; 2) the lack of explicit ..."
Cited by 16 (3 self)
Abstract
Latent Variable Models (LVM), like the SharedGPLVM and the Spectral Latent Variable Model, help mitigate overfitting when learning discriminative methods from small or moderately sized training sets. Nevertheless, existing methods suffer from several problems: 1) complexity; 2) the lack of explicit mappings to and from the latent space; 3) an inability to cope with multimodality; and 4) the lack of a well-defined density over the latent space. We propose an LVM called the Shared Kernel Information Embedding (sKIE). It defines a coherent density over a latent space and multiple input/output spaces (e.g., image features and poses), and it is easy to condition on a latent state, or on combinations of the input/output states. Learning is quadratic, and it works well on small datasets. With datasets too large to learn a coherent global model, one can use sKIE to learn local online models. sKIE permits missing data during inference, and partially labelled data during learning. We use sKIE for human pose inference.
Latent Spaces for Dynamic Movement Primitives
"... Abstract — Dynamic movement primitives (DMPs) have been proposed as a powerful, robust and adaptive tool for planning robot trajectories based on demonstrated example movements. Adaptation of DMPs to new task requirements becomes difficult when demonstrated trajectories are only available in joint s ..."
Cited by 16 (3 self)
Abstract
Dynamic movement primitives (DMPs) have been proposed as a powerful, robust and adaptive tool for planning robot trajectories based on demonstrated example movements. Adaptation of DMPs to new task requirements becomes difficult when demonstrated trajectories are only available in joint space, because their parameters do not in general correspond to variables meaningful for the task. This problem becomes more severe with an increasing number of degrees of freedom and hence is particularly an issue for humanoid movements. It has been shown that DMP parameters can directly relate to task variables when DMPs are learned in latent spaces resulting from dimensionality reduction of demonstrated trajectories. As we show here, however, standard dimensionality reduction techniques do not in general provide adequate latent spaces, which need to be highly regular. In this work we concentrate on learning discrete (point-to-point) movements and propose a modification of a powerful nonlinear dimensionality reduction technique (the Gaussian Process Latent Variable Model). Our modification makes the GPLVM more suitable for use with DMPs by favouring latent spaces with highly regular structure. Even though in this case the user has to provide a structure hypothesis, we show that its precise choice is not important for achieving good results. Additionally, with this modification we can overcome one of the main disadvantages of the GPLVM: its dependence on the initialisation of the latent space. We motivate our approach on data from a 7-DoF robotic arm and demonstrate its feasibility on a high-dimensional human motion capture data set.
Distributed optimization of deeply nested systems
, 2012
"... In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate c ..."
Cited by 11 (5 self)
Abstract
In science and engineering, intelligent processing of complex signals such as images, sound or language is often performed by a parameterized hierarchy of nonlinear processing layers, sometimes biologically inspired. Hierarchical systems (or, more generally, nested systems) offer a way to generate complex mappings using simple stages. Each layer performs a different operation and achieves an ever more sophisticated representation of the input, as, for example, in a deep artificial neural network, an object recognition cascade in computer vision or a speech front-end. Joint estimation of the parameters of all the layers and selection of an optimal architecture is widely considered to be a difficult numerical non-convex optimization problem, difficult to parallelize for execution in a distributed computation environment, and requiring significant human expert effort, which leads to suboptimal systems in practice. We describe a general mathematical strategy to learn the parameters and, to some extent, the architecture of nested systems, called the method of auxiliary coordinates (MAC). This replaces the original problem involving a deeply nested function with a constrained problem involving a different function in an augmented space without nesting. The constrained problem may be solved with penalty-based methods using alternating optimization over the parameters and the auxiliary coordinates. MAC has provable convergence, is easy to implement reusing existing algorithms for single layers, can be parallelized trivially and massively, applies even when parameter derivatives are not available or not desirable, and is competitive with state-of-the-art nonlinear optimizers even in the serial computation setting, often providing reasonable models within a few iterations.
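The auxiliary-coordinate idea can be illustrated on a toy two-layer model. The sketch below is not the paper's algorithm; it is a minimal quadratic-penalty version with hypothetical names, using linear layers so that every alternating step has a closed-form least-squares solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a two-layer nested model y ~ f2(f1(x)); both layers are linear
# so each MAC step below reduces to an exact least-squares problem.
X = rng.normal(size=(60, 4))
Y = X @ rng.normal(size=(4, 2)) @ rng.normal(size=(2, 1)) + 0.01 * rng.normal(size=(60, 1))

W1 = rng.normal(size=(4, 2)) * 0.1   # first layer:  x -> z
W2 = rng.normal(size=(2, 1)) * 0.1   # second layer: z -> y
Z = X @ W1                            # auxiliary coordinates, one row per sample
mu = 10.0                             # quadratic-penalty weight

def penalised_objective(W1, W2, Z):
    fit = np.sum((Y - Z @ W2) ** 2)       # output error of the top layer
    cons = np.sum((Z - X @ W1) ** 2)      # violation of the constraint z = f1(x)
    return fit + mu * cons

obj0 = penalised_objective(W1, W2, Z)
for _ in range(50):
    # W-step: with Z fixed, the layers decouple into independent least-squares fits.
    W2 = np.linalg.lstsq(Z, Y, rcond=None)[0]
    W1 = np.linalg.lstsq(X, Z, rcond=None)[0]
    # Z-step: with the layers fixed, the coordinates have a closed form
    # from setting the gradient of the penalised objective to zero.
    A = W2 @ W2.T + mu * np.eye(W2.shape[0])
    Z = (Y @ W2.T + mu * X @ W1) @ np.linalg.inv(A)
obj_final = penalised_objective(W1, W2, Z)
```

Because every step exactly minimises the penalised objective over its block of variables, the objective decreases monotonically; driving `mu` upward would enforce the nesting constraint ever more tightly.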
Kernel Information Embeddings
 In ICML
, 2006
"... We describe a family of embedding algorithms that are based on nonparametric estimates of mutual information (MI). Using Parzen window estimates of the distribution in the joint (input, embedding)space, we derive a MIbased objective function for dimensionality reduction that can be optimized direc ..."
Cited by 11 (3 self)
Abstract
We describe a family of embedding algorithms that are based on nonparametric estimates of mutual information (MI). Using Parzen window estimates of the distribution in the joint (input, embedding) space, we derive an MI-based objective function for dimensionality reduction that can be optimized directly with respect to a set of latent data representatives. Various types of supervision signal can be introduced within the framework by replacing plain MI with several forms of conditional MI. Examples of the semi-(un)supervised algorithms that we obtain this way are a new model for manifold alignment, and a new type of embedding method that performs 'conditional dimensionality reduction'.
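A Parzen-window MI estimate of this flavour can be sketched in a few lines. The helper names below are hypothetical and the exact objective differs from the paper's; the sketch only shows that a latent assignment aligned with the data's neighbourhood structure scores higher than a shuffled one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Points on a 1-D curve in 3-D; t is the "true" latent coordinate.
t = np.linspace(0, 1, 40)
X = np.stack([np.cos(4 * t), np.sin(4 * t), t], axis=1)

def gauss_gram(A, h):
    """Parzen (Gaussian) kernel matrix between all pairs of rows of A."""
    d2 = np.sum((A[:, None, :] - A[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * h * h))

def mi_estimate(Z, X, hx=0.5, hz=0.1):
    """Parzen-window MI estimate: mean log-ratio of the joint kernel density
    to the product of the marginal kernel densities."""
    Kx, Kz = gauss_gram(X, hx), gauss_gram(Z, hz)
    joint = (Kx * Kz).sum(axis=1)
    marg = Kx.sum(axis=1) * Kz.sum(axis=1)
    return float(np.mean(np.log(len(Z) * joint / marg)))

good = mi_estimate(t[:, None], X)                    # latent aligned with the curve
bad = mi_estimate(rng.permutation(t)[:, None], X)    # same values, shuffled
```

In an actual embedding method the latent representatives would be optimised directly against such an estimate, rather than compared post hoc as here.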
Dimensionality Reduction and Principal Surfaces via Kernel Map Manifolds
"... We present a manifold learning approach to dimensionality reduction that explicitly models the manifold as a mapping from low to high dimensional space. The manifold is represented as a parametrized surface represented by a set of parameters that are defined on the input samples. The representation ..."
Cited by 9 (4 self)
Abstract
We present a manifold learning approach to dimensionality reduction that explicitly models the manifold as a mapping from low- to high-dimensional space. The manifold is represented as a parametrized surface defined by a set of parameters attached to the input samples. The representation also provides a natural mapping from high- to low-dimensional space, and a concatenation of these two mappings induces a projection operator onto the manifold. The explicit projection operator allows for a clearly defined objective function in terms of projection distance and reconstruction error. A formulation of the mappings in terms of kernel regression permits a direct optimization of the objective function, and the extremal points converge to principal surfaces as the number of data points increases. Principal surfaces have the desirable property that they, informally speaking, pass through the middle of a distribution. We provide a proof of convergence to principal surfaces and illustrate the effectiveness of the proposed approach on synthetic and real data sets.
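The two mappings and the induced projection operator can be sketched with plain Nadaraya-Watson kernel regression. This is an illustrative toy (hypothetical names, fixed bandwidths, no optimisation of the objective), not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy samples around a 1-D manifold (a parabola) in 2-D.
t = np.linspace(-1, 1, 100)
X = np.stack([t, t ** 2], axis=1) + 0.02 * rng.normal(size=(100, 2))
T = t[:, None]                     # latent parameters attached to the samples

def nw_map(Q, P, V, h):
    """Nadaraya-Watson kernel regression: estimate values V (given at
    sample locations P) at the query points Q."""
    d2 = np.sum((Q[:, None, :] - P[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2 * h * h))
    return (W @ V) / W.sum(axis=1, keepdims=True)

def project(Y, hx=0.15, ht=0.1):
    """Projection onto the manifold: high -> low mapping followed by the
    low -> high mapping (the parametrized surface)."""
    lat = nw_map(Y, X, T, hx)
    return nw_map(lat, T, X, ht)

Xp = project(X)                    # projections of the samples onto the surface
recon_err = float(np.mean(np.sum((X - Xp) ** 2, axis=1)))
```

The concatenation `project` is exactly the projection operator whose residual, summed over the data, would serve as the objective to optimise.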
H.: Variants of Unsupervised Kernel Regression: General loss functions
 In: Proc. European Symposium on Artificial Neural Networks
, 2006
"... Abstract. We present an extension to a recent method for learning of nonlinear manifolds, which allows to incorporate general cost functions. We focus on the ɛinsensitive loss and visually demonstrate our method on both toy and real data. 1 ..."
Cited by 6 (4 self)
Abstract
We present an extension to a recent method for learning nonlinear manifolds that makes it possible to incorporate general cost functions. We focus on the ε-insensitive loss and visually demonstrate our method on both toy and real data.
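The ε-insensitive loss has a one-line form; the sketch below plugs it into a deliberately simplified, unregularised UKR-style reconstruction objective with a pluggable loss. Names and simplifications are ours, not the authors':

```python
import numpy as np

def eps_insensitive(r, eps=0.1):
    """Epsilon-insensitive loss: zero inside the +/-eps tube, linear outside."""
    return np.maximum(np.abs(r) - eps, 0.0)

def ukr_objective(Z, X, h=0.3, loss=eps_insensitive):
    """UKR-style reconstruction error under a pluggable loss: every sample is
    reconstructed by kernel regression over the latent points.
    (Regularisation, e.g. leave-one-out, is omitted in this sketch.)"""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    K = np.exp(-d2 / (2 * h * h))
    Xhat = (K @ X) / K.sum(axis=1, keepdims=True)
    return float(np.sum(loss(X - Xhat)))

t = np.linspace(0, 1, 25)
Z = t[:, None]                               # candidate latent coordinates
X = np.stack([t, np.cos(2 * t)], axis=1)     # observed data on a smooth curve
obj = ukr_objective(Z, X)
```

Swapping `loss` for a squared-error function recovers the standard UKR objective, which is the point of the generalisation.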
A Leave-K-Out Cross-Validation Scheme for Unsupervised Kernel Regression
"... We show how to employ leaveKout crossvalidation in Unsupervised Kernel Regression, a recent method for learning of nonlinear manifolds. We thereby generalize an already present regularization method, yielding more flexibility without additional computational cost. We demonstrate our method on bot ..."
Cited by 5 (5 self)
Abstract
We show how to employ leave-K-out cross-validation in Unsupervised Kernel Regression, a recent method for learning nonlinear manifolds. We thereby generalize an existing regularization method, yielding more flexibility without additional computational cost. We demonstrate our method on both toy and real data.
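One way to realise leave-K-out weighting in a kernel-regression reconstruction is to zero the K largest kernel weights in each row before normalising, with leave-one-out as the K = 1 special case. The sketch below uses hypothetical names and a fixed latent assignment, purely to show the mechanism:

```python
import numpy as np

def leave_k_out_reconstruction(Z, X, K=3, h=0.2):
    """Kernel-regression reconstruction of each sample with its K nearest
    latent neighbours (the sample itself included) excluded from the
    weights; K = 1 is plain leave-one-out regularisation."""
    d2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2 * h * h))
    # Zero the K largest weights in every row (the self-weight is always among them).
    idx = np.argsort(-W, axis=1)[:, :K]
    np.put_along_axis(W, idx, 0.0, axis=1)
    return (W @ X) / W.sum(axis=1, keepdims=True)

t = np.linspace(0, 1, 30)
Z = t[:, None]                                # latent coordinates
X = np.stack([t, np.sin(2 * t)], axis=1)      # data on a smooth curve
X1 = leave_k_out_reconstruction(Z, X, K=1)    # leave-one-out
X3 = leave_k_out_reconstruction(Z, X, K=3)    # stronger regularisation
err1 = float(np.mean(np.sum((X - X1) ** 2, axis=1)))
err3 = float(np.mean(np.sum((X - X3) ** 2, axis=1)))
```

Excluding more neighbours forces each point to be explained by farther-away samples, so the reconstruction error grows with K; that is what makes larger K a stronger regulariser.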
Improving dimensionality reduction with spectral gradient descent
 NEURAL NETWORKS
, 2005
"... We introduce spectral gradient descent, a way of improving iterative dimensionality reduction techniques. 1 The method uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradientbased optimization procedure. We show that the approach i ..."
Cited by 4 (1 self)
Abstract
We introduce spectral gradient descent, a way of improving iterative dimensionality reduction techniques. The method uses information contained in the leading eigenvalues of a data affinity matrix to modify the steps taken during a gradient-based optimization procedure. We show that the approach is able to speed up the optimization and to help dimensionality reduction methods find better local minima of their objective functions. We also provide an interpretation of our approach in terms of the power method for finding the leading eigenvalues of a symmetric matrix, and verify the usefulness of the approach in some simple experiments.
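The power-method building block referred to in the abstract can be made concrete in a few lines. This is the generic textbook iteration (illustrative names, not the paper's code): repeated multiplication and normalisation pulls a random vector toward the eigenvector of largest eigenvalue, whose Rayleigh quotient recovers that eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(4)

def power_method(A, iters=500):
    """Power iteration for the leading eigenpair of a symmetric matrix A."""
    v = rng.normal(size=A.shape[0])
    for _ in range(iters):
        v = A @ v                     # amplify the leading eigen-direction
        v /= np.linalg.norm(v)        # renormalise to avoid overflow
    return float(v @ A @ v), v        # Rayleigh quotient and eigenvector

B = rng.normal(size=(6, 6))
A = B @ B.T                            # symmetric positive semidefinite test matrix
lam, v = power_method(A)
```

In spectral-style optimisation schemes the affinity matrix plays the role of `A`, and the leading eigen-directions inform how the gradient steps are modified.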
Does Dimensionality Reduction Improve the Quality of Motion Interpolation?
"... Abstract. In recent years nonlinear dimensionality reduction has frequently been suggested for the modelling of highdimensional motion data. While it is intuitively plausible to use dimensionality reduction to recover low dimensional manifolds which compactly represent a given set of movements, the ..."
Cited by 3 (1 self)
Abstract
In recent years nonlinear dimensionality reduction has frequently been suggested for the modelling of high-dimensional motion data. While it is intuitively plausible to use dimensionality reduction to recover low-dimensional manifolds which compactly represent a given set of movements, there is a lack of critical investigation into the quality of the resulting representations, in particular with respect to generalisability. Furthermore, it is unclear how consistently particular methods can achieve good results. Here we use a set of robotic motion data for which we know the ground truth to evaluate a range of nonlinear dimensionality reduction methods with respect to the quality of motion interpolation. We show that results are extremely sensitive to parameter settings and to the data set used, but that dimensionality reduction can potentially improve the quality of linear motion interpolation, in particular in the presence of noise.
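As a minimal illustration of latent-space motion interpolation (a synthetic linear toy with made-up data, not the paper's evaluation protocol), the sketch below embeds a trajectory with PCA, interpolates two frames in latent coordinates, and maps the result back to joint space. When the trajectory lies exactly on a linear manifold, the round trip reproduces joint-space interpolation:

```python
import numpy as np

rng = np.random.default_rng(5)

# A toy trajectory in a 10-D "joint space" that lies on a 2-D linear manifold.
t = np.linspace(0, 1, 30)
basis = rng.normal(size=(2, 10))
traj = np.stack([np.sin(np.pi * t), t]).T @ basis

# Linear dimensionality reduction via PCA (SVD of the centred data).
mean = traj.mean(axis=0)
U, S, Vt = np.linalg.svd(traj - mean, full_matrices=False)
W = Vt[:2]                                 # top-2 principal directions
latent = (traj - mean) @ W.T               # 2-D latent coordinates per frame

# Interpolate between the first and last frame at the midpoint, in latent
# space, then map back to joint space.
z_mid = 0.5 * (latent[0] + latent[-1])
x_mid = z_mid @ W + mean
```

With nonlinear methods (as studied in the paper) the decoder is no longer an exact inverse on the data, which is precisely why interpolation quality must be evaluated empirically.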