Results 1–10 of 52
Learning the Kernel with Hyperkernels
, 2003
Abstract
Cited by 113 (2 self)
This paper addresses the problem of choosing a kernel suitable for estimation with a Support Vector Machine, hence further automating machine learning. This goal is achieved by defining a Reproducing Kernel Hilbert Space on the space of kernels itself. Such a formulation leads to a statistical estimation problem very much akin to the problem of minimizing a regularized risk functional. We state the
Matrix exponentiated gradient updates for online learning and Bregman projections
 Journal of Machine Learning Research
, 2005
Abstract
Cited by 71 (11 self)
We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated by the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: online learning with a simple square loss, and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.
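A minimal numerical sketch of the trace-one update described above. The random stand-in gradient and step size are illustrative assumptions, not the paper's square-loss setting; the point is that updating in the matrix-log domain and renormalizing keeps the parameter symmetric positive definite with unit trace:

```python
import numpy as np

def sym_logm(A):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def sym_expm(A):
    """Matrix exponential of a symmetric matrix (always SPD)."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def meg_update(W, grad, eta=0.1):
    """One Matrix Exponentiated Gradient step: move in the matrix-log
    domain, exponentiate back, and renormalize to trace one."""
    G = 0.5 * (grad + grad.T)            # symmetrize the loss gradient
    M = sym_expm(sym_logm(W) - eta * G)
    return M / np.trace(M)

rng = np.random.default_rng(0)
W = np.eye(3) / 3.0                      # "uniform" density matrix
for _ in range(20):
    W = meg_update(W, rng.standard_normal((3, 3)))  # stand-in gradients

print(np.trace(W), np.linalg.eigvalsh(W).min())
```

With a probability vector in place of `W`, the same update reduces to the ordinary Exponentiated Gradient step, which is exactly the diagonal special case the abstract mentions.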
DTI segmentation using an information theoretic tensor dissimilarity measure
 IEEE Transactions on Medical Imaging
, 2005
Abstract
Cited by 51 (2 self)
Abstract—In recent years, diffusion tensor imaging (DTI) has become a popular in vivo diagnostic imaging technique in the radiological sciences. In order for this imaging technique to be more effective, proper image analysis techniques suited for analyzing these high-dimensional data need to be developed. In this paper, we present a novel definition of tensor “distance” grounded in concepts from information theory and incorporate it in the segmentation of DTI. In a DTI, the symmetric positive definite (SPD) diffusion tensor at each voxel can be interpreted as the covariance matrix of a local Gaussian distribution. Thus, a natural measure of dissimilarity between SPD tensors would be the Kullback-Leibler (KL) divergence or one of its relatives. We propose the square root of the J-divergence (symmetrized KL) between the two Gaussian distributions corresponding to the diffusion tensors being compared, which leads to a novel closed-form expression for the “distance” as well as for the mean value of a DTI. Unlike the traditional Frobenius norm-based tensor distance, our “distance” is affine invariant, a desirable property in segmentation and many other applications. We then incorporate this new tensor “distance” into a region-based active contour model for DTI segmentation. Synthetic and real data experiments are shown to depict the performance of the proposed model. Index Terms—Diffusion tensor MRI, image segmentation, Kullback-Leibler divergence, J-divergence, Mumford-Shah functional, active contour.
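For two zero-mean Gaussians with SPD covariances A and B, the log-determinant terms of the two KL divergences cancel when symmetrized, which is why a closed form of the kind described above exists. A sketch under that reading (helper names are mine), including a numerical check of the affine invariance the abstract claims:

```python
import numpy as np

def kl_tensor_distance(A, B):
    """sqrt(J-divergence) between zero-mean Gaussians with SPD
    covariances A and B: 0.5 * sqrt(tr(A^-1 B + B^-1 A) - 2n)."""
    n = A.shape[0]
    t = np.trace(np.linalg.solve(A, B) + np.linalg.solve(B, A))
    return 0.5 * np.sqrt(max(t - 2 * n, 0.0))   # guard tiny negatives

def random_spd(n, rng):
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

rng = np.random.default_rng(1)
A, B = random_spd(3, rng), random_spd(3, rng)
M = rng.standard_normal((3, 3)) + 3 * np.eye(3)     # invertible affine map
d1 = kl_tensor_distance(A, B)
d2 = kl_tensor_distance(M @ A @ M.T, M @ B @ M.T)   # same distance
print(d1, d2)
```

The invariance follows from tr((M A Mᵀ)⁻¹ (M B Mᵀ)) = tr(A⁻¹B); a Frobenius-norm distance between the tensors does not share this property.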
Fluid registration of diffusion tensor images using information theory
 IEEE Trans. Med. Imaging
, 2008
Abstract
Cited by 22 (3 self)
Abstract—We apply an information-theoretic cost metric, the symmetrized Kullback-Leibler (sKL) divergence, or J-divergence, to fluid registration of diffusion tensor images. The difference between diffusion tensors is quantified based on the sKL-divergence of their associated probability density functions (PDFs). Three-dimensional DTI data from 34 subjects were fluidly registered to an optimized target image. To allow large image deformations but preserve image topology, we regularized the flow with a large-deformation diffeomorphic mapping based on the kinematics of a Navier-Stokes fluid. A driving force was developed to minimize the J-divergence between the deforming source and target diffusion functions, while reorienting the flowing tensors to preserve fiber topography. In initial experiments, we showed that the sKL-divergence based on full diffusion PDFs is adaptable to higher-order diffusion models, such as high angular resolution diffusion imaging (HARDI). The sKL-divergence was sensitive to subtle differences between two diffusivity profiles, showing promise for nonlinear registration applications and multi-subject statistical analysis of HARDI data. Index Terms—Diffusion tensor imaging (DTI), fluid registration, high angular resolution diffusion imaging (HARDI), Kullback-Leibler divergence.
Enhanced Protein Fold Recognition through a Novel Data Integration Approach
Abstract
Cited by 16 (1 self)
Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold-discriminatory data sources which use physicochemical and structural properties, as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold-discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources. Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix, so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations, which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to solve MKLdiv-dc efficiently by a difference of convex (DC) programming method.
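A KL objective between kernel matrices can be written by viewing each (regularized) kernel matrix as the covariance of a zero-mean Gaussian. The sketch below scores convex combinations of two toy input kernels against an ideal label-derived output kernel on a coarse weight grid; the data, kernel choices, and grid search are my illustrative assumptions, not the paper's MKLdiv solvers:

```python
import numpy as np

def kernel_kl(K1, K2, eps=1e-6):
    """KL divergence between zero-mean Gaussians with covariances K1
    and K2 (kernel matrices regularized to be strictly PD):
    0.5 * (tr(K2^-1 K1) - log det(K2^-1 K1) - n)."""
    n = K1.shape[0]
    K1 = K1 + eps * np.eye(n)
    K2 = K2 + eps * np.eye(n)
    S = np.linalg.solve(K2, K1)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (np.trace(S) - logdet - n)

rng = np.random.default_rng(2)
X1, X2 = rng.standard_normal((20, 5)), rng.standard_normal((20, 5))
K_in1, K_in2 = X1 @ X1.T, X2 @ X2.T              # two input kernels
y = rng.integers(0, 2, 20)
K_out = np.equal.outer(y, y).astype(float)       # ideal output kernel

# score each convex combination of the input kernels
weights = np.linspace(0.0, 1.0, 11)
scores = [kernel_kl(K_out, w * K_in1 + (1 - w) * K_in2) for w in weights]
best = weights[int(np.argmin(scores))]
print(best)
```

Swapping the two arguments of `kernel_kl` changes the objective's convexity properties, which is the distinction behind the two formulations named above.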
Learning to learn and collaborative filtering
 In Neural Information Processing Systems Workshop on Inductive Transfer: 10 Years Later
, 2005
Abstract
Cited by 14 (1 self)
This paper reviews several recent multi-task learning algorithms in a general framework. Interestingly, the framework establishes a connection to recent collaborative filtering algorithms based on low-rank matrix approximation. This connection suggests building a more general nonparametric approach to collaborative preference learning that additionally exploits the content features of items.
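The low-rank matrix approximation underlying that connection can be illustrated with a truncated SVD of a toy (dense) user-item rating matrix; the synthetic rank-2 data is an assumption for illustration, not the paper's nonparametric method:

```python
import numpy as np

rng = np.random.default_rng(3)
# toy ratings generated from a true rank-2 user/item factor model
U_true = rng.standard_normal((30, 2))
V_true = rng.standard_normal((2, 12))
R = U_true @ V_true                      # 30 users x 12 items

# best rank-k approximation via truncated SVD (Eckart-Young)
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_hat = (U[:, :k] * s[:k]) @ Vt[:k]

err = np.linalg.norm(R - R_hat) / np.linalg.norm(R)
print(err)
```

Because the toy matrix is exactly rank 2, the rank-2 reconstruction recovers it to machine precision; with real, noisy ratings the truncated factors play the role of latent user and item features.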
Kernelizing the output of tree-based methods
 In International conference on machine learning
, 2006
Abstract
Cited by 13 (5 self)
We extend tree-based methods to the prediction of structured outputs using a kernelization of the algorithm that allows one to grow trees as soon as a kernel can be defined on the output space. The resulting algorithm, called output kernel trees (OK3), generalizes classification and regression trees as well as tree-based ensemble methods in a principled way. It inherits several features of these methods, such as interpretability, robustness to irrelevant variables, and input scalability. When only the Gram matrix over the outputs of the learning sample is given, it learns the output kernel as a function of the inputs. We show that the proposed algorithm works well on an image reconstruction task and on a biological network inference problem.
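The reason only the output Gram matrix is needed is that the variance of a set of (implicit) output embeddings, and hence a variance-reduction split criterion, can be computed from kernel evaluations alone. A minimal sketch under that reading, with my own toy data and function names:

```python
import numpy as np

def output_variance(K, idx):
    """Variance of the implicit output embeddings for sample indices
    idx, from the output Gram matrix K alone: mean squared norm
    minus squared norm of the mean embedding."""
    Ks = K[np.ix_(idx, idx)]
    m = len(idx)
    return np.trace(Ks) / m - Ks.sum() / m**2

def split_score(K, left, right):
    """Variance reduction achieved by splitting left+right into the
    two subsets -- the impurity score a tree greedily maximizes."""
    n_l, n_r = len(left), len(right)
    n = n_l + n_r
    return (output_variance(K, left + right)
            - (n_l / n) * output_variance(K, left)
            - (n_r / n) * output_variance(K, right))

rng = np.random.default_rng(4)
# two tight clusters of 3-D outputs; a linear kernel over them
Y = np.vstack([rng.normal(0, 0.1, (5, 3)), rng.normal(5, 0.1, (5, 3))])
K = Y @ Y.T
good = split_score(K, list(range(5)), list(range(5, 10)))  # separates clusters
bad = split_score(K, [0, 5, 1, 6, 2], [3, 7, 4, 8, 9])     # mixes them
print(good, bad)
```

Substituting a kernel for the Gram-matrix entries recovers ordinary regression trees when the outputs are scalars with the linear kernel.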
Analytical Kernel Matrix Completion with Incomplete MultiView Data
 Proc. Int’l Conf. Machine Learning Workshop on Learning with Multiple Views
, 2005
Abstract
Cited by 10 (5 self)
In multi-view remote sensing applications, incomplete data can result when only a subset of sensors is deployed at certain regions. We derive a closed-form expression for computing a Gaussian kernel when faced with incomplete data. This expression is obtained by analytically integrating out the missing data. The result can subsequently be used in conjunction with any kernel-based classifier. The superiority of the proposed method over two common imputation schemes is demonstrated on one benchmark data set and three real (measured) multi-view land mine data sets.
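A one-dimensional illustration of the kind of closed form such an integration yields (the paper's exact expression may differ): if a missing feature is modeled as N(mu, s2), its contribution to a Gaussian kernel factor integrates out as a Gaussian convolution, which the Monte Carlo check below confirms:

```python
import numpy as np

def gauss_kernel_missing(y, mu, s2, sigma2):
    """E_x[exp(-(x - y)^2 / (2*sigma2))] for x ~ N(mu, s2):
    a Gaussian convolution, so the bandwidth simply widens to
    sigma2 + s2 (with a matching normalization factor)."""
    return np.sqrt(sigma2 / (sigma2 + s2)) * np.exp(
        -(y - mu) ** 2 / (2.0 * (sigma2 + s2)))

# Monte Carlo check of the closed form
rng = np.random.default_rng(5)
mu, s2, sigma2, y = 0.7, 0.4, 1.3, -0.2
x = rng.normal(mu, np.sqrt(s2), 200_000)
mc = np.exp(-(x - y) ** 2 / (2.0 * sigma2)).mean()
exact = gauss_kernel_missing(y, mu, s2, sigma2)
print(mc, exact)
```

Since the full Gaussian kernel factorizes over dimensions, each missing coordinate contributes one such factor while the observed coordinates keep the ordinary kernel form.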
On classification with incomplete data
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract
Cited by 9 (2 self)
Abstract—We address the incomplete-data problem, in which feature vectors to be classified are missing data (features). A (supervised) logistic regression algorithm for the classification of incomplete data is developed. Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the observed data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both Expectation-Maximization (EM) and Variational Bayesian EM (VB-EM). The proposed supervised algorithm is then extended to the semi-supervised case by incorporating graph-based regularization. The semi-supervised algorithm utilizes all available data: incomplete and complete, labeled and unlabeled. Experimental results of the proposed classification algorithms are shown. Index Terms—Classification, incomplete data, missing data, supervised learning, semi-supervised learning, imperfect labeling.
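The conditioning step that replaces explicit imputation is, per mixture component, the standard partitioned-Gaussian formula. A minimal sketch (toy parameters and a single component; the paper works with a full GMM):

```python
import numpy as np

def gaussian_conditional(mu, Sigma, obs_idx, mis_idx, x_obs):
    """Mean and covariance of the missing features given the observed
    ones, for one Gaussian component:
      mu_m|o  = mu_m + S_mo S_oo^-1 (x_o - mu_o)
      S_m|o   = S_mm - S_mo S_oo^-1 S_om
    """
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    S_mo = Sigma[np.ix_(mis_idx, obs_idx)]
    S_mm = Sigma[np.ix_(mis_idx, mis_idx)]
    gain = S_mo @ np.linalg.inv(S_oo)
    mu_c = mu[mis_idx] + gain @ (x_obs - mu[obs_idx])
    Sigma_c = S_mm - gain @ S_mo.T
    return mu_c, Sigma_c

mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
# features 0 and 2 observed, feature 1 missing
mu_c, Sigma_c = gaussian_conditional(mu, Sigma, [0, 2], [1],
                                     np.array([1.4, 0.1]))
print(mu_c, Sigma_c)
```

Averaging the per-component conditionals with the posterior mixture weights gives the conditional density the abstract integrates against; note the conditional variance is strictly smaller than the marginal one whenever the features are correlated.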
Classification using nonstandard metrics
, 2005
Abstract
Cited by 9 (3 self)
A large variety of supervised and unsupervised learning algorithms is based on a metric or similarity measure over the patterns in input space. Often, the standard Euclidean metric is not sufficient, and much more efficient and powerful approximators can be constructed based on more complex similarity calculations such as kernels or learning metrics. This procedure is beneficial for data in Euclidean space, and it is crucial for more complex data structures such as those occurring in bioinformatics or natural language processing. In this article, we review similarity-based methods and their combination with similarity measures that go beyond the standard Euclidean metric. We focus on general unifying principles of learning with non-standard metrics and on metric adaptation.
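A minimal illustration of why a non-standard metric can matter: nearest-neighbour classification under a Mahalanobis-style metric d(x, z)² = (x − z)ᵀ M (x − z), where M = I recovers the Euclidean case. The toy data and the hand-picked metric matrix are assumptions for illustration, standing in for a metric learned by adaptation:

```python
import numpy as np

def nn_accuracy(X_train, y_train, X_test, y_test, M):
    """1-NN accuracy under d(x, z)^2 = (x - z)^T M (x - z)."""
    diff = X_test[:, None, :] - X_train[None, :, :]
    d2 = np.einsum('tij,jk,tik->ti', diff, M, diff)
    return (y_train[d2.argmin(axis=1)] == y_test).mean()

rng = np.random.default_rng(6)
# classes differ only in feature 0; feature 1 is high-variance noise
def sample(n, shift):
    return np.c_[rng.normal(shift, 0.5, n), rng.normal(0, 20.0, n)]

X_tr = np.vstack([sample(50, 0), sample(50, 2)])
y_tr = np.repeat([0, 1], 50)
X_te = np.vstack([sample(50, 0), sample(50, 2)])
y_te = np.repeat([0, 1], 50)

euclid = nn_accuracy(X_tr, y_tr, X_te, y_te, np.eye(2))
adapted = nn_accuracy(X_tr, y_tr, X_te, y_te, np.diag([1.0, 0.0]))  # suppress noise dim
print(euclid, adapted)
```

The Euclidean metric is dominated by the irrelevant high-variance feature, while the adapted (rank-deficient) metric suppresses it; metric-adaptation schemes learn such a weighting from data instead of fixing it by hand.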