Results 1 - 10 of 24
Reverse multi-label learning
- Advances in Neural Information Processing Systems 23, 2010
"... Multi-label classification is the task of predicting potentially multiple labels for a given instance. This is common in several applications such as image annotation, document classification and gene function prediction. In this paper we present a formulation for this problem based on reverse predi ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
(Show Context)
Multi-label classification is the task of predicting potentially multiple labels for a given instance. This is common in several applications such as image annotation, document classification and gene function prediction. In this paper we present a formulation for this problem based on reverse prediction: we predict sets of instances given the labels. By viewing the problem from this perspective, the most popular quality measures for assessing the performance of multi-label classification admit relaxations that can be efficiently optimised. We optimise these relaxations with standard algorithms and compare our results with several state-of-the-art methods, showing excellent performance.
Transduction with Matrix Completion: Three Birds with One Stone
"... We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are un ..."
Abstract
-
Cited by 22 (0 self)
- Add to MetaCart
(Show Context)
We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.
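The low-rank completion idea above can be illustrated with a related, simpler solver than the paper's fixed-point continuation method: iterative SVD soft-thresholding (the Soft-Impute scheme), which also minimizes a nuclear-norm objective. A minimal numpy sketch on a synthetic rank-1 matrix; the function name and toy data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def soft_impute(M, mask, tau=0.1, n_iters=200):
    """Nuclear-norm-style matrix completion by iterative SVD
    soft-thresholding. M's entries are trusted only where mask is True."""
    X = np.where(mask, M, 0.0)
    for _ in range(n_iters):
        # keep observed entries, fill missing ones with the current estimate
        filled = np.where(mask, M, X)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - tau, 0.0)   # shrink singular values -> low rank
        X = (U * s) @ Vt
    return X

rng = np.random.default_rng(0)
A = np.outer(rng.normal(size=8), rng.normal(size=6))   # rank-1 ground truth
mask = rng.random(A.shape) > 0.3                       # ~70% of entries observed
A_hat = soft_impute(A, mask)
```

Because the ground truth is genuinely low rank, the recovered missing entries stay close to the true values, up to the small bias introduced by the threshold `tau`.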
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
- IJCV
"... This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping vis ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features
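The CCA starting point above can be sketched in a few lines of numpy: whiten each view, then take the SVD of the cross-correlation between the whitened views. This is plain two-view CCA on synthetic data; the paper's third semantic view and its scalability machinery are omitted, and the variable names are illustrative:

```python
import numpy as np

def cca(X, Y, k):
    """Two-view CCA via whitening + SVD of the cross-correlation.
    Returns k-dimensional projections of both views and the
    canonical correlations."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    Ux, sx, Vxt = np.linalg.svd(X, full_matrices=False)  # whiten view 1
    Uy, sy, Vyt = np.linalg.svd(Y, full_matrices=False)  # whiten view 2
    U, s, Vt = np.linalg.svd(Ux.T @ Uy)                  # correlate whitened views
    Wx = Vxt.T @ np.diag(1.0 / sx) @ U[:, :k]
    Wy = Vyt.T @ np.diag(1.0 / sy) @ Vt.T[:, :k]
    return X @ Wx, Y @ Wy, s[:k]

# synthetic "visual" and "textual" views driven by a shared 2-d semantic factor
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 2))
visual = z @ rng.normal(size=(2, 10)) + 0.1 * rng.normal(size=(500, 10))
textual = z @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))
Pv, Pt, corrs = cca(visual, textual, k=2)
```

Since both views are noisy projections of the same latent factor, the leading canonical correlation comes out close to 1, i.e. the two views land near each other in the shared latent space.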
Multi-Label Output Codes using Canonical Correlation Analysis
"... Traditional error-correctingoutput codes (E-COCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively createsissues related to: the validity of the encoding, the efficiency of the decoding ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
(Show Context)
Traditional error-correcting output codes (ECOCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
Learning and inference in probabilistic classifier chains with beam search
- In Proceedings of the European, 2012
"... Abstract. Multilabel learning is an extension of binary classification that is both challenging and practically important. Recently, a method for multilabel learning called probabilistic classifier chains (PCCs) was proposed with numerous appealing properties, such as conceptual sim-plicity, flexibi ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
(Show Context)
Multilabel learning is an extension of binary classification that is both challenging and practically important. Recently, a method for multilabel learning called probabilistic classifier chains (PCCs) was proposed with numerous appealing properties, such as conceptual simplicity, flexibility, and theoretical justification. However, PCCs suffer from the computational issue of having inference that is exponential in the number of tags, and the practical issue of being sensitive to the suitable ordering of the tags while training. In this paper, we show how the classical technique of beam search may be used to solve both these problems. Specifically, we show how to use beam search to perform tractable test-time inference, and how to integrate beam search with training to determine a suitable tag ordering. Experimental results on a range of multilabel datasets show that these proposed changes dramatically extend the practical viability of PCCs.
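The inference half of the idea above is easy to sketch: instead of scoring all 2^L label vectors of a classifier chain, keep only the B most probable label prefixes at each step. The toy conditional model below is an illustrative assumption standing in for a trained chain of classifiers:

```python
def beam_search_chain(cond_prob, L, x, beam_width=3):
    """Approximate MAP inference in a probabilistic classifier chain.
    cond_prob(x, prefix) -> P(y_k = 1 | x, y_1..y_{k-1}) for k = len(prefix)+1.
    Keeps the `beam_width` most probable label prefixes instead of
    enumerating all 2**L label vectors."""
    beam = [((), 1.0)]                          # (prefix, joint probability)
    for _ in range(L):
        expanded = []
        for prefix, p in beam:
            p1 = cond_prob(x, prefix)
            expanded.append((prefix + (1,), p * p1))
            expanded.append((prefix + (0,), p * (1 - p1)))
        expanded.sort(key=lambda t: -t[1])      # prune to the best prefixes
        beam = expanded[:beam_width]
    return beam[0]

# toy conditional model: label k tends to copy label k-1 (illustrative only)
def cond_prob(x, prefix):
    if not prefix:
        return 0.9 if x > 0 else 0.1
    return 0.8 if prefix[-1] == 1 else 0.2

labels, prob = beam_search_chain(cond_prob, L=4, x=1.0, beam_width=3)
# -> labels (1, 1, 1, 1) with joint probability 0.9 * 0.8**3 = 0.4608
```

Each chain step costs O(beam_width) model evaluations rather than doubling the candidate set, which is what makes test-time inference tractable for large label sets.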
Sparse Bayesian Multi-Task Learning
"... We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity in ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show that the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods.
Maximum Margin Multi-Label Structured Prediction
- In NIPS, 2011
"... We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in com-putational biology, and graph matching with symmetries. Conventional multi-label classification techniques are typically not applicable ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
(Show Context)
We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multi-label classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in the case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working-set training, and PAC-Bayesian generalization bounds.
Exploiting Tag and Word Correlations for Improved Webpage Clustering
"... Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, webpage clustering algorithms only use features extracted from the page-text. However, the advent of soci ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
(Show Context)
Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, and introducing diversity in search results. Typically, webpage clustering algorithms only use features extracted from the page text. However, the advent of social-bookmarking websites, such as StumbleUpon and Delicious, has led to a huge amount of user-generated content, such as the tag information that is associated with webpages. In this paper, we present a subspace-based feature extraction approach which leverages tag information to complement the page contents of a webpage to extract highly discriminative features, with the goal of improved clustering performance. In our approach, we consider page text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace-based approach with a number of baselines that use tag information in various other ways, and show that the subspace-based approach leads to improved performance on the webpage clustering task. Although our results here are on the webpage clustering task, the same approach can be used for webpage classification as well. Finally, we suggest possible future work for leveraging tag information in webpage clustering, especially when tag information is present not for all, but only for a small number of webpages.
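The two-view pipeline described above (learn a shared subspace from page text and tags, then cluster in it) can be sketched end to end. The sketch simplifies the correlation-maximizing subspace to the top cross-covariance directions between the two views, and the synthetic "pages" are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
c = np.repeat([0, 1], n // 2)                   # true (hidden) cluster of each page
centers_text = rng.normal(size=(2, 20))
centers_tags = rng.normal(size=(2, 12))
text = centers_text[c] + 0.3 * rng.normal(size=(n, 20))   # page-text view
tags = centers_tags[c] + 0.3 * rng.normal(size=(n, 12))   # tag view

# shared subspace: top directions of the cross-covariance between the views
# (a simplification of the paper's correlation-maximizing subspace)
Tc = text - text.mean(0)
Gc = tags - tags.mean(0)
U, _, _ = np.linalg.svd(Tc.T @ Gc, full_matrices=False)
Z = Tc @ U[:, :2]                               # pages in the shared subspace

# any clustering algorithm can now run in the subspace; a minimal k-means:
cent = Z[[0, -1]]                               # init from two distinct pages
for _ in range(20):
    assign = np.linalg.norm(Z[:, None] - cent[None], axis=2).argmin(1)
    cent = np.array([Z[assign == k].mean(0) for k in range(2)])

accuracy = max((assign == c).mean(), (assign != c).mean())  # up to label swap
```

Because the cluster structure is visible in both views, the cross-covariance directions emphasize it, and k-means in the shared subspace recovers the true grouping almost perfectly.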
The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models
"... We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model al-lows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve neighbourhood structure of the data in a sense that object ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables and their values. The latent variables preserve the neighbourhood structure of the data in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood-preserving infinite latent feature space.
Semi-supervised multi-label classification - a simultaneous large-margin, subspace learning approach
- In: ECML/PKDD
"... Abstract. Labeled data is often sparse in common learning scenarios, either because it is too time consuming or too expensive to obtain, while unlabeled data is almost always plentiful. This asymmetry is exacerbated in multi-label learning, where the labeling process is more complex than in the sing ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Labeled data is often sparse in common learning scenarios, either because it is too time-consuming or too expensive to obtain, while unlabeled data is almost always plentiful. This asymmetry is exacerbated in multi-label learning, where the labeling process is more complex than in the single-label case. Although it is important to consider semi-supervised methods for multi-label learning, as it is in other learning scenarios, surprisingly few proposals have been investigated for this particular problem. In this paper, we present a new semi-supervised multi-label learning method that combines large-margin multi-label classification with unsupervised subspace learning. We propose an algorithm that learns a subspace representation of the labeled and unlabeled inputs, while simultaneously training a supervised large-margin multi-label classifier on the labeled portion. Although joint training of these two interacting components might appear intractable, we exploit recent developments in induced matrix norm optimization to show that these two problems can be solved jointly, globally and efficiently. In particular, we develop an efficient training procedure based on subgradient search and a simple coordinate descent strategy. An experimental evaluation demonstrates that semi-supervised subspace learning can improve the performance of corresponding supervised multi-label learning methods.