Results 11–20 of 53
Bayesian Inference for Transductive Learning of Kernel Matrix Using the Tanner-Wong Data Augmentation Algorithm
 In Proceedings of the Twenty-First International Conference on Machine Learning
, 2004
"... In kernel methods, an interesting recent development seeks to learn a good kernel from empirical data automatically. In this paper, by regarding the transductive learning of the kernel matrix as a missing data problem, we propose a Bayesian hierarchical model for the problem and devise the Ta ..."
Abstract

Cited by 9 (2 self)
In kernel methods, an interesting recent development seeks to learn a good kernel from empirical data automatically. In this paper, by regarding the transductive learning of the kernel matrix as a missing data problem, we propose a Bayesian hierarchical model for the problem and devise the Tanner-Wong data augmentation algorithm for making inference on the model. The Tanner-Wong algorithm is closely related to Gibbs sampling, and it also bears a strong resemblance to the expectation-maximization (EM) algorithm. For an efficient implementation, we propose a simplified Bayesian hierarchical model and the corresponding Tanner-Wong algorithm. We express the relationship between the kernel on the input space and the kernel on the output space as a symmetric-definite generalized eigenproblem.
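The two-step structure of a Tanner-Wong-style data augmentation sampler can be sketched on a toy problem. The model below (a Gaussian mean with a few missing observations, flat prior) is purely illustrative and is not the paper's kernel-matrix setup; it only shows the alternation between imputing missing data and drawing the parameter, which is what makes the scheme a Gibbs sampler on (parameter, missing data):

```python
import random

random.seed(0)

# Toy setup: observations from N(mu, 1); three values are missing.
observed = [1.2, 0.8, 1.0, 1.1, 0.9]
n_missing = 3
n_total = len(observed) + n_missing

mu = 0.0                 # current parameter draw
mu_samples = []
for _ in range(5000):
    # I-step (imputation): draw the missing data given the current parameter.
    imputed = [random.gauss(mu, 1.0) for _ in range(n_missing)]
    # P-step (posterior): draw the parameter given the completed data
    # (flat prior on mu => posterior N(sample mean, 1/n)).
    full = observed + imputed
    mu = random.gauss(sum(full) / n_total, (1.0 / n_total) ** 0.5)
    mu_samples.append(mu)

# Discard burn-in and summarize the chain.
tail = mu_samples[500:]
posterior_mean = sum(tail) / len(tail)
```

The marginal distribution of `mu` along the chain matches the posterior that uses only the observed data, i.e. the missing values are integrated out by the alternation rather than estimated.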
Protein functional class prediction with a combined graph
 Proceedings of the Korean Data Mining Conference
, 2004
"... Abstract. In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as proteinprotein interactions, genetic interactions, or coparticipation in a protein complex, etc. ..."
Abstract

Cited by 8 (0 self)
Abstract. In bioinformatics, there exist multiple graph descriptions of the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to the network output and with respect to the combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.
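The alternation described in the abstract can be sketched as follows. The toy node set, the two graphs, the regularization constant, and the smoothed weight update (weight proportional to the inverse of the output's smoothness on each graph) are illustrative assumptions, not the paper's exact objective or update rule:

```python
import numpy as np

np.random.seed(0)

def laplacian(W):
    return np.diag(W.sum(axis=1)) - W

# Two graphs over 6 nodes: A is informative (edges within the two true
# clusters {0,1,2} and {3,4,5}); B is a noisy cross-cluster graph.
A = np.zeros((6, 6)); B = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
for i, j in [(0, 3), (1, 4), (2, 5)]:
    B[i, j] = B[j, i] = 1.0
L = [laplacian(A), laplacian(B)]

y = np.array([1.0, 0, 0, -1.0, 0, 0])    # labels on nodes 0 and 3 only
M = np.diag([1.0, 0, 0, 1.0, 0, 0])      # labeled-node mask
c, eps = 10.0, 1e-8
alpha = np.array([0.5, 0.5])             # combining weights

for _ in range(20):
    # Step 1: minimize over the network output f with the weights fixed
    # (smoothness on the combined graph plus a squared loss on labels).
    K = alpha[0] * L[0] + alpha[1] * L[1] + c * M
    f = np.linalg.solve(K, c * M @ y)
    # Step 2: minimize over the combining weights with f fixed
    # (smoothed closed form: weight ~ 1 / smoothness on each graph).
    s = np.array([f @ Lk @ f + eps for Lk in L])
    alpha = (1.0 / s) / (1.0 / s).sum()
```

On this toy data the informative within-cluster graph receives the larger weight, and the unlabeled nodes inherit the sign of their cluster's label.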
Model-based transductive learning of the kernel matrix
 Machine Learning
, 2006
"... This paper addresses the problem of transductive learning of the kernel matrix from a probabilistic perspective. We define the kernel matrix as a Wishart process prior and construct a hierarchical generative model for kernel matrix learning. Specifically, we consider the target kernel matrix as a r ..."
Abstract

Cited by 7 (1 self)
This paper addresses the problem of transductive learning of the kernel matrix from a probabilistic perspective. We place a Wishart process prior on the kernel matrix and construct a hierarchical generative model for kernel matrix learning. Specifically, we consider the target kernel matrix as a random matrix following the Wishart distribution with a positive definite parameter matrix and a degree of freedom. This parameter matrix, in turn, has the inverted Wishart distribution (with a positive definite hyperparameter matrix) as its conjugate prior, and the degree of freedom is equal to the dimensionality of the feature space induced by the target kernel. Casting this as a missing data problem, we devise an expectation-maximization (EM) algorithm to infer the missing data, parameter matrix and feature dimensionality in a maximum a posteriori (MAP) manner. Using different settings for the target kernel and hyperparameter matrices, our model can be applied to different types of learning problems. In particular, we consider its application in a semi-supervised learning setting and present two classification methods. Classification experiments are reported on some benchmark data sets with encouraging results. In addition, we also devise the EM algorithm for kernel matrix completion.
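The Wishart/inverted-Wishart conjugacy the model relies on can be checked numerically on a small example. The dimensions, hyperparameters, and the Bartlett-style sampler below are illustrative choices, not the paper's settings:

```python
import numpy as np

np.random.seed(0)

p, nu = 2, 5                       # dimension and Wishart degrees of freedom
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])     # Wishart scale (the "parameter matrix")
C = np.linalg.cholesky(Sigma)

# Draw K ~ Wishart(Sigma, nu) as C Z Z^T C^T with Z a (p x nu) standard normal.
def draw_wishart():
    Z = np.random.randn(p, nu)
    return C @ Z @ Z.T @ C.T

# Sanity check of the sampler: E[K] = nu * Sigma.
mean_K = sum(draw_wishart() for _ in range(20000)) / 20000

# Conjugate update: with Sigma ~ InvWishart(Psi, mu), observing K gives
# Sigma | K ~ InvWishart(Psi + K, mu + nu), whose mode (the MAP estimate) is
# (Psi + K) / (mu + nu + p + 1).
Psi, mu_df = np.eye(p), 4
K = draw_wishart()
Sigma_map = (Psi + K) / (mu_df + nu + p + 1)
```

Because Psi is positive definite and K is positive semidefinite, the MAP estimate is always a valid (positive definite) parameter matrix.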
Kronecker factorization for speeding up kernel machines
 In SIAM International Conference on Data Mining (SDM)
, 2005
"... In kernel machines, such as kernel principal component analysis (KPCA), Gaussian Processes (GPs), and Support Vector Machines (SVMs), the computational complexity of finding a solution is O(n 3), where n is the number of training instances. To reduce this expensive computational complexity, we propo ..."
Abstract

Cited by 7 (2 self)
In kernel machines, such as kernel principal component analysis (KPCA), Gaussian Processes (GPs), and Support Vector Machines (SVMs), the computational complexity of finding a solution is O(n^3), where n is the number of training instances. To reduce this expensive computational complexity, we propose using Kronecker factorization, which approximates a positive definite kernel matrix by the Kronecker product of two smaller positive definite matrices. This approximation can speed up the calculation of the kernel-matrix inverse or eigendecomposition involved in kernel machines. When the two factorized matrices have about the same dimensions, the computational complexity is improved from O(n^3) to O(n^2). Furthermore, if n is very large, Kronecker factorization can be recursively applied to further reduce the computational complexity. We propose two methods to carry out Kronecker factorization and apply them to speed up KPCA and GPs. In addition, we propose an effective approximate method for Gaussian process classification by integrating the surrogate maximization algorithm and the Kronecker factorization. Experiments show that our methods can drastically reduce the computation time of kernel machines without any significant degradation in their effectiveness.
Learning Discriminative Models with Incomplete Data
, 2006
"... practical problems in pattern recognition require making inferences using multiple modalities, e.g. sensor data from video, audio, physiological changes etc. Often in realworld scenarios there can be incompleteness in the training data. There can be missing channels due to sensor failures in multi ..."
Abstract

Cited by 6 (1 self)
Practical problems in pattern recognition require making inferences using multiple modalities, e.g. sensor data from video, audio, physiological changes, etc. Often in real-world scenarios there can be incompleteness in the training data: there can be missing channels due to sensor failures in multi-sensory data, and many data points in the training set might be unlabeled. Further, instead of having exact labels we might have easy-to-obtain coarse labels that correlate with the task. Also, there can be labeling errors; for example, human annotation can lead to incorrect labels in the training data. The discriminative paradigm of classification aims to model the classification boundary directly by conditioning on the data points; however, discriminative models cannot easily handle incompleteness since the distribution of the observations is never explicitly modeled. We present a unified Bayesian framework that extends the discriminative paradigm to handle four different kinds of incompleteness. First, a solution based on a mixture of Gaussian processes is proposed for achieving sensor fusion under the problematic conditions of missing channels. Second, the framework
A nonparametric Bayesian model for kernel matrix completion
 In ICASSP, 2010
"... We present a nonparametric Bayesian model for completing lowrank, positive semidefinite matrices. Given an N × N matrix with underlying rank r, and noisy measured values and missing values with a symmetric pattern, the proposed Bayesian hierarchical model nonparametrically uncovers the underlying r ..."
Abstract

Cited by 5 (0 self)
We present a nonparametric Bayesian model for completing low-rank, positive semidefinite matrices. Given an N × N matrix with underlying rank r, with noisy measured values and missing values in a symmetric pattern, the proposed Bayesian hierarchical model nonparametrically uncovers the underlying rank from all positive semidefinite matrices, and completes the matrix by approximating the missing values. We analytically derive all posterior distributions for the fully conjugate model hierarchy and discuss variational Bayes and MCMC Gibbs sampling for inference, as well as an efficient measurement selection procedure. We present results on a toy problem and a music recommendation problem, where we complete the kernel matrix of 2,250 pieces of music. Index Terms — kernel matrix completion, Bayesian nonparametrics, music recommendation
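The core completion idea can be illustrated with a much simpler, non-Bayesian sketch: fit a low-rank factorization K ≈ X X^T to the observed entries, which keeps every iterate positive semidefinite by construction and fills in the missing entries from the fitted factors. This is not the paper's nonparametric hierarchical model, only the underlying low-rank-PSD intuition; the sizes, rank, and learning rate below are arbitrary:

```python
import numpy as np

np.random.seed(0)

# Ground truth: a rank-2, 5 x 5 positive semidefinite matrix.
U = np.random.randn(5, 2)
K_true = U @ U.T

# Symmetric observation pattern with a few missing entries.
mask = np.ones((5, 5), dtype=bool)
for i, j in [(0, 4), (1, 3)]:
    mask[i, j] = mask[j, i] = False

# Fit K ~ X X^T on the observed entries by gradient descent.
X = 0.1 * np.random.randn(5, 2)

def observed_loss(X):
    resid = np.where(mask, X @ X.T - K_true, 0.0)
    return (resid ** 2).sum()

loss_init = observed_loss(X)
for _ in range(3000):
    resid = np.where(mask, X @ X.T - K_true, 0.0)
    X -= 0.01 * 4.0 * resid @ X    # gradient of the masked Frobenius loss
loss_final = observed_loss(X)

K_completed = X @ X.T              # missing entries filled in by the model
```

The Bayesian model in the paper replaces this point estimate with full posterior distributions over the factors and, crucially, infers the rank rather than fixing it.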
Large margin transformation learning
, 2009
"... With the current explosion of data coming from many scientific fields and industry, machine learning algorithms are more important than ever to help make sense of this data in an automated manner. Support vector machine (SVMs) have been a very successful learning algorithm for many applied settings. ..."
Abstract

Cited by 3 (2 self)
With the current explosion of data coming from many scientific fields and industry, machine learning algorithms are more important than ever to help make sense of this data in an automated manner. Support vector machines (SVMs) have been a very successful learning algorithm for many applied settings. However, the support vector machine only finds linear classifiers, so data often needs to be preprocessed with appropriately chosen nonlinear mappings in order to find a model with good predictive properties. These mappings can either take the form of an explicit transformation or be defined implicitly with a kernel function. Automatically choosing these mappings has been studied under the name of kernel learning. These methods typically optimize a cost function to find a kernel made up of a combination of base kernels, thus implicitly learning mappings. This dissertation investigates methods for choosing explicit transformations automatically. This setting differs from the kernel learning framework by learning a combination of base transformations rather than base kernels. This allows prior knowledge to be exploited in the functional form of the transformations which may not be easily encoded as kernels, such as when learning monotonic