Large Scale Max-Margin Multi-Label Classification with Priors
Cited by 12 (2 self)
We propose a max-margin formulation for the multi-label classification problem where the goal is to tag a data point with a set of pre-specified labels. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Existing solutions take either of two approaches. The first assumes, a priori, that there are no label correlations and independently trains a classifier for each label (as is done in the 1-vs-All heuristic). This reduces the problem complexity from exponential to linear and such methods can scale to large problems. The second approach explicitly models correlations by pairwise label interactions. However, the complexity remains exponential unless one assumes that label correlations are sparse. Furthermore, the learnt correlations reflect the training set biases. We take a middle approach that assumes labels are correlated but does not incorporate pairwise label terms in the prediction function. We show that the complexity can still be reduced from exponential to linear while modelling dense pairwise label correlations. By incorporating correlation priors we can overcome training set biases and improve prediction accuracy. We provide a principled interpretation of the 1-vs-All method and show
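A minimal sketch of why a prediction function without pairwise label terms is linear in L: assuming the score of a label vector y in {-1, +1}^L decomposes as a sum of per-label terms y_l * f_l(x), the best of the 2^L candidates is found by taking the sign of each f_l(x) independently. The function names and the toy scores are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def predict_brute_force(scores):
    """Search all 2^L vectors y in {-1, +1}^L for the maximiser of sum_l y_l * scores[l]."""
    best_y, best_val = None, float("-inf")
    for y in product((-1, 1), repeat=len(scores)):
        val = sum(y_l * s_l for y_l, s_l in zip(y, scores))
        if val > best_val:
            best_y, best_val = y, val
    return best_y


def predict_per_label(scores):
    """Same maximiser in O(L): each label is decided by the sign of its own score."""
    return tuple(1 if s > 0 else -1 for s in scores)


# Hypothetical per-label classifier outputs f_l(x) for L = 4 labels.
scores = [0.7, -1.2, 0.1, -0.3]
assert predict_brute_force(scores) == predict_per_label(scores)
```

The brute-force search is only there to check the shortcut on a small L; it becomes infeasible as L grows, which is the point of the decomposition.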
On Multiple Kernel Learning with Multiple Labels
Cited by 6 (1 self)
For classification with multiple labels, a common approach is to learn a classifier for each label. With a kernel-based classifier, there are two options to set up kernels: select a specific kernel for each label or the same kernel for all labels. In this work, we present a unified framework for multi-label multiple kernel learning, in which the above two approaches can be considered as two extreme cases. Moreover, our framework allows kernels to be partially shared among multiple labels, enabling flexible degrees of label commonality. We systematically study how the sharing of kernels among multiple labels affects performance through extensive experiments on various benchmark data including images and microarray data. Interesting findings concerning efficacy and efficiency are reported.
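To make the two extreme cases concrete, here is a small sketch assuming the combined kernel for a label is a weighted sum of base kernel matrices. The helper `combine_kernels`, the toy matrices and the weight values are hypothetical; the paper's actual framework and its partial-sharing mechanism are not reproduced here.

```python
def combine_kernels(kernels, weights):
    """Return sum_k weights[k] * kernels[k] for equally sized square matrices (lists of lists)."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical base kernel matrices on three data points.
K_a = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
K_b = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
base = [K_a, K_b]

# Extreme 1: the same kernel weights for every label.
shared = {"label_1": [0.5, 0.5], "label_2": [0.5, 0.5]}
# Extreme 2: each label selects its own weights.
independent = {"label_1": [1.0, 0.0], "label_2": [0.25, 0.75]}

shared_kernels = {lab: combine_kernels(base, w) for lab, w in shared.items()}
independent_kernels = {lab: combine_kernels(base, w) for lab, w in independent.items()}
```

Partial sharing would sit between these two dictionaries, for example by tying the weights of some labels while leaving others free.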
Accelerated gradient methods for stochastic optimization and online learning
- Advances in Neural Information Processing Systems 22, 2009
Cited by 5 (0 self)
Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1-regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization that preserves the computational simplicity and scalability of gradient methods. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems.
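For readers unfamiliar with composite optimization, the sketch below shows one plain (non-accelerated) stochastic proximal-gradient step for an objective of the form loss(w) + lam * ||w||_1, where the non-smooth ℓ1 term is handled by soft-thresholding. This is a standard building block and is not the SAGE algorithm; the function names and parameters are assumptions made for illustration.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each coordinate towards zero by tau."""
    out = []
    for x in v:
        if x > tau:
            out.append(x - tau)
        elif x < -tau:
            out.append(x + tau)
        else:
            out.append(0.0)  # coordinates within [-tau, tau] become exactly zero
    return out


def prox_sgd_step(w, grad, step, lam):
    """One stochastic proximal-gradient step for loss(w) + lam * ||w||_1.

    grad is a stochastic (sub)gradient of the smooth loss part only.
    """
    moved = [w_i - step * g_i for w_i, g_i in zip(w, grad)]
    return soft_threshold(moved, step * lam)


# Hypothetical weights and stochastic gradient.
w_next = prox_sgd_step([1.0, 0.0], [-4.0, 1.0], step=0.5, lam=1.0)
```

The second coordinate is set exactly to zero by the thresholding, which is how ℓ1 regularization produces sparse iterates.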
Efficient Max-Margin Multi-Label Classification with Applications to Zero-Shot Learning
- Mach Learn manuscript, 2010
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We
Abstract
Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster than the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.
Enhanced Protein Fold Recognition through a Novel Data Integration Approach
Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold discriminatory data sources which use physicochemical and structural properties as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources. Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to efficiently solve MKLdiv-dc by a difference of convex (DC) programming method
Multi-task Learning in Square Integrable Space
Several kernel-based methods for multi-task learning have been proposed, which leverage relations among tasks as regularization to enhance the overall learning accuracies. These methods assume that the tasks share the same kernel, which could limit their applications because in practice different tasks may need different kernels. The main challenge of introducing multiple kernels into multiple tasks is that models from different Reproducing Kernel Hilbert Spaces (RKHSs) are not comparable, making it difficult to exploit relations among tasks. This paper addresses the challenge by formalizing the problem in the Square Integrable Space (SIS). Specifically, it proposes a kernel-based method which makes use of a regularization term defined in the SIS to represent task relations. We prove a new representer theorem for the proposed approach in SIS. We further derive a practical method for solving the learning problem and conduct consistency analysis of the method. We discuss the relations between our method and an existing method. We also give an SVM-based implementation of our method for multi-label classification. Experiments on two real-world data sets show that the proposed method performs better than the existing method.
Multi-label Multiple Kernel Learning by Stochastic Approximation: Application to Visual Object Recognition
Recent studies have shown that multiple kernel learning is very effective for object recognition, leading to the popularity of kernel learning in computer vision problems. In this work, we develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. Although several algorithms have been developed for ML-MKL, their computational cost is linear in the number of classes, making them unscalable when the number of classes is large, a challenge frequently encountered in visual object recognition. We address this computational challenge by developing a framework for ML-MKL that combines the worst-case analysis with stochastic approximation. Our analysis shows that the complexity of our algorithm is O(m^(1/3) √(ln m)), where m is the number of classes. Empirical studies with object recognition show that while achieving similar classification accuracy, the proposed method is significantly more efficient than the state-of-the-art algorithms for ML-MKL.
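As a rough illustration of the stated bound, the snippet below evaluates m^(1/3) * sqrt(ln m) next to m for a few class counts. It only compares growth rates: the constants hidden by the O-notation are unknown, so the numbers are not running times, and the function name is a hypothetical choice.

```python
import math


def stated_bound(m):
    """Evaluate m^(1/3) * sqrt(ln m), the growth term in the stated complexity (constants omitted)."""
    return m ** (1.0 / 3.0) * math.sqrt(math.log(m))


# Compare the sub-linear growth term with linear growth for hypothetical class counts.
for m in (10, 1000, 100000):
    print(m, round(stated_bound(m), 1), round(m / stated_bound(m), 1))
```

The last printed column is the ratio of linear growth to the stated term, which widens as m increases.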
MKBoost: A Framework of Multiple Kernel Boosting
Multiple kernel learning (MKL) has been shown to be a promising machine learning technique for data mining tasks by integrating multiple diverse kernel functions. Traditional MKL methods often formulate the problem as an optimization task of learning both the optimal combination of kernels and the classifiers, and attempt to resolve the challenging optimization task by various techniques. Unlike the existing MKL methods, in this paper, we investigate a boosting framework for exploring multiple kernel learning for classification tasks. In particular, we present a novel framework of Multiple Kernel Boosting (MKBoost), which applies boosting techniques for learning kernel-based classifiers with multiple kernels. Based on the proposed framework, we develop several variants of MKBoost algorithms and examine their empirical performance in comparison with several state-of-the-art MKL algorithms on classification tasks. Experimental results show that the proposed method is more effective and efficient than the existing MKL techniques.
Efficient Max-Margin Multi-Label Classification with
- Mach Learn manuscript
Received: 30 September 2010 / Accepted: date
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We
B. Hariharan, University of California at Berkeley

