Learning the Kernel with Hyperkernels
2003
Cited by 79 (2 self)
This paper addresses the problem of choosing a kernel suitable for estimation with a Support Vector Machine, hence further automating machine learning. This goal is achieved by defining a Reproducing Kernel Hilbert Space on the space of kernels itself. Such a formulation leads to a statistical estimation problem very much akin to the problem of minimizing a regularized risk functional.
Matrix exponentiated gradient updates for online learning and Bregman projections
Journal of Machine Learning Research, 2005
Cited by 47 (9 self)
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: online learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the nondiagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
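As a rough illustration of the kind of update this abstract describes, the sketch below takes one gradient step in the matrix-logarithm domain, maps back with the matrix exponential, and renormalizes to trace one. The square-loss form `(tr(W X) - y)^2`, the step size `eta`, and the helper names `sym_fun` and `meg_step` are assumptions made for this example, not details taken from the listing.

```python
import numpy as np

def sym_fun(a, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    vals, vecs = np.linalg.eigh(a)
    return (vecs * fun(vals)) @ vecs.T

def meg_step(w, x, y, eta=0.1):
    """One MEG-style step for the assumed square loss (tr(W X) - y)^2."""
    x_sym = 0.5 * (x + x.T)                      # symmetrize the instance matrix
    grad = 2.0 * (np.trace(w @ x_sym) - y) * x_sym
    m = sym_fun(w, np.log) - eta * grad          # gradient step in the log domain
    m = 0.5 * (m + m.T)                          # guard against rounding asymmetry
    w_new = sym_fun(m, np.exp)                   # exp of a symmetric matrix is positive definite
    return w_new / np.trace(w_new)               # renormalize to trace one

# Hypothetical data: start from the maximally mixed matrix I/3.
u = np.array([1.0, 0.5, -0.5])
x = np.outer(u, u)
y = 1.0
w = np.eye(3) / 3.0
w_next = meg_step(w, x, y)

loss_before = (np.trace(w @ x) - y) ** 2
loss_after = (np.trace(w_next @ x) - y) ** 2
```

Because the step is taken on the logarithm and undone with the exponential, the iterate stays symmetric positive definite without any explicit projection, which is the property the abstract emphasizes.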
Fluid registration of diffusion tensor images using information theory
IEEE Trans. Med. Imaging, 2008
Cited by 12 (2 self)
We apply an information-theoretic cost metric, the symmetrized Kullback-Leibler (sKL) divergence, or J-divergence, to fluid registration of diffusion tensor images. The difference between diffusion tensors is quantified based on the sKL-divergence of their associated probability density functions (PDFs). Three-dimensional DTI data from 34 subjects were fluidly registered to an optimized target image. To allow large image deformations but preserve image topology, we regularized the flow with a large-deformation diffeomorphic mapping based on the kinematics of a Navier-Stokes fluid. A driving force was developed to minimize the J-divergence between the deforming source and target diffusion functions, while reorienting the flowing tensors to preserve fiber topography. In initial experiments, we showed that the sKL-divergence based on full diffusion PDFs is adaptable to higher-order diffusion models, such as high angular resolution diffusion imaging (HARDI). The sKL-divergence was sensitive to subtle differences between two diffusivity profiles, showing promise for nonlinear registration applications and multisubject statistical analysis of HARDI data. Index Terms: diffusion tensor imaging (DTI), fluid registration, high angular resolution diffusion imaging (HARDI), Kullback-Leibler divergence.
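For intuition about the symmetrized Kullback-Leibler (sKL) divergence between two diffusion tensors, the sketch below assumes each tensor defines a zero-mean Gaussian displacement PDF whose covariance is proportional to the tensor, and it uses the sum convention KL(p||q) + KL(q||p); some papers include an extra factor of 1/2. Under those assumptions the log-determinant terms cancel and a closed form in matrix traces remains. The function name `skl_divergence` and the example tensors are hypothetical.

```python
import numpy as np

def skl_divergence(a, b):
    """sKL divergence between zero-mean Gaussians with SPD covariances a and b.

    Closed form under the sum convention:
        0.5 * (tr(B^-1 A) + tr(A^-1 B) - 2 n)
    """
    n = a.shape[0]
    t_ba = np.trace(np.linalg.solve(b, a))   # tr(B^-1 A)
    t_ab = np.trace(np.linalg.solve(a, b))   # tr(A^-1 B)
    return 0.5 * (t_ba + t_ab - 2 * n)

# Hypothetical tensors: same shape, principal axes along x and y respectively.
d1 = np.diag([1.7, 0.3, 0.3])
d2 = np.diag([0.3, 1.7, 0.3])

same = skl_divergence(d1, d1)       # identical tensors
forward = skl_divergence(d1, d2)    # differently oriented tensors
backward = skl_divergence(d2, d1)   # arguments swapped
```

The value is zero for identical tensors and unchanged when the arguments are swapped, which is what makes the measure usable as a symmetric matching cost.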
Learning to learn and collaborative filtering
In Neural Information Processing Systems Workshop on Inductive Transfer: 10 Years Later, 2005
Cited by 11 (1 self)
This paper reviews several recent multitask learning algorithms in a general framework. Interestingly, the framework establishes a connection to recent collaborative filtering algorithms using low-rank matrix approximation. This connection suggests building a more general nonparametric approach to collaborative preference learning that additionally explores the content features of items.
Kernelizing the output of tree-based methods
In International Conference on Machine Learning, 2006
Cited by 10 (5 self)
We extend tree-based methods to the prediction of structured outputs using a kernelization of the algorithm that allows one to grow trees as soon as a kernel can be defined on the output space. The resulting algorithm, called output kernel trees (OK3), generalizes classification and regression trees as well as tree-based ensemble methods in a principled way. It inherits several features of these methods such as interpretability, robustness to irrelevant variables, and input scalability. When only the Gram matrix over the outputs of the learning sample is given, it learns the output kernel as a function of inputs. We show that the proposed algorithm works well on an image reconstruction task and on a biological network inference problem.
Bayesian Inference for Transductive Learning of Kernel Matrix Using the Tanner-Wong Data Augmentation Algorithm
In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 9 (2 self)
In kernel methods, an interesting recent development seeks to learn a good kernel from empirical data automatically. In this paper, by regarding the transductive learning of the kernel matrix as a missing data problem, we propose a Bayesian hierarchical model for the problem and devise the Tanner-Wong data augmentation algorithm for making inference on the model. The Tanner-Wong algorithm is closely related to Gibbs sampling, and it also bears a strong resemblance to the expectation-maximization (EM) algorithm. For an efficient implementation, we propose a simplified Bayesian hierarchical model and the corresponding Tanner-Wong algorithm. We express the relationship between the kernel on the input space and the kernel on the output space as a symmetric-definite generalized eigenproblem.
Model-based transductive learning of the kernel matrix
Machine Learning, 2006
Cited by 7 (1 self)
This paper addresses the problem of transductive learning of the kernel matrix from a probabilistic perspective. We define the kernel matrix as a Wishart process prior and construct a hierarchical generative model for kernel matrix learning. Specifically, we consider the target kernel matrix as a random matrix following the Wishart distribution with a positive definite parameter matrix and a degree of freedom. This parameter matrix, in turn, has the inverted Wishart distribution (with a positive definite hyperparameter matrix) as its conjugate prior, and the degree of freedom is equal to the dimensionality of the feature space induced by the target kernel. Resorting to a missing data problem, we devise an expectation-maximization (EM) algorithm to infer the missing data, parameter matrix and feature dimensionality in a maximum a posteriori (MAP) manner. Using different settings for the target kernel and hyperparameter matrices, our model can be applied to different types of learning problems. In particular, we consider its application in a semisupervised learning setting and present two classification methods. Classification experiments are reported on some benchmark data sets with encouraging results. In addition, we also devise the EM algorithm for kernel matrix completion.
Enhanced Protein Fold Recognition through a Novel Data Integration Approach
Cited by 7 (1 self)
Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold discriminatory data sources which use physicochemical and structural properties as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources. Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multiclass classification and multitask learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to efficiently solve MKLdiv-dc by a difference of convex (DC) programming method
Protein functional class prediction with a combined graph
Proceedings of the Korean Data Mining Conference, 2004
Cited by 7 (0 self)
In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semisupervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.
Classification using nonstandard metrics
2005
Cited by 5 (2 self)
A large variety of supervised or unsupervised learning algorithms is based on a metric or similarity measure of the patterns in input space. Often, the standard Euclidean metric is not sufficient, and much more efficient and powerful approximators can be constructed based on more complex similarity calculations such as kernels or learning metrics. This procedure is beneficial for data in Euclidean space, and it is crucial for more complex data structures such as occur in bioinformatics or natural language processing. In this article, we review similarity-based methods and their combination with similarity measures which go beyond the standard Euclidean metric. Thereby, we focus on general unifying principles of learning using nonstandard metrics and metric adaptation.