Results 1–10 of 24
Reverse Multi-Label Learning
 Advances in Neural Information Processing Systems 23
, 2010
Abstract

Cited by 29 (2 self)
Multi-label classification is the task of predicting potentially multiple labels for a given instance. This is common in several applications such as image annotation, document classification and gene function prediction. In this paper we present a formulation for this problem based on reverse prediction: we predict sets of instances given the labels. By viewing the problem from this perspective, the most popular quality measures for assessing the performance of multi-label classification admit relaxations that can be efficiently optimised. We optimise these relaxations with standard algorithms and compare our results with several state-of-the-art methods, showing excellent performance.
Transduction with Matrix Completion: Three Birds with One Stone
Abstract

Cited by 22 (0 self)
We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.
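As a rough illustration of the optimisation machinery this abstract describes, here is a minimal matrix-completion sketch: proximal gradient descent with singular value soft-thresholding, the proximal operator of the nuclear norm. It assumes a single squared loss on the observed entries rather than the paper's separate feature/label losses, and the names `svt` and `complete` are illustrative, not from the paper.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of the
    nuclear norm, the core step in fixed-point continuation methods."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(X, mask, tau=0.1, step=1.0, iters=300):
    """Minimise 0.5 * ||mask * (Z - X)||_F^2 + tau * ||Z||_* by proximal
    gradient descent; mask is 1 on observed entries, 0 elsewhere."""
    Z = np.zeros_like(X)
    for _ in range(iters):
        grad = mask * (Z - X)              # gradient of the loss on observed entries
        Z = svt(Z - step * grad, step * tau)
    return Z
```

With a low-rank ground truth and enough observed entries, the unobserved entries get filled in close to their true values. That is the "three birds" trick: stacking features and labels into one matrix lets a single completion handle multi-label prediction, transduction, and missing features at once.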
A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics
 IJCV
Abstract

Cited by 20 (1 self)
This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation analysis (CCA), a popular and successful approach for mapping visual and textual features to the same latent space, and incorporate a third view capturing high-level image semantics, represented either by a single category or multiple non-mutually-exclusive concepts. We present two ways to train the three-view embedding: supervised, with the third view coming from ground-truth labels or search keywords; and unsupervised, with semantic themes automatically obtained by clustering the tags. To ensure high accuracy for retrieval tasks while keeping the learning process scalable, we combine multiple strong visual features ...
Multi-Label Output Codes using Canonical Correlation Analysis
Abstract

Cited by 20 (1 self)
Traditional error-correcting output codes (ECOCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
Learning and inference in probabilistic classifier chains with beam search
 In Proceedings of the European
, 2012
Abstract

Cited by 8 (1 self)
Multi-label learning is an extension of binary classification that is both challenging and practically important. Recently, a method for multi-label learning called probabilistic classifier chains (PCCs) was proposed with numerous appealing properties, such as conceptual simplicity, flexibility, and theoretical justification. However, PCCs suffer from the computational issue of having inference that is exponential in the number of tags, and the practical issue of being sensitive to the suitable ordering of the tags while training. In this paper, we show how the classical technique of beam search may be used to solve both these problems. Specifically, we show how to use beam search to perform tractable test-time inference, and how to integrate beam search with training to determine a suitable tag ordering. Experimental results on a range of multi-label datasets show that these proposed changes dramatically extend the practical viability of PCCs.
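The test-time inference described above is easy to sketch. The snippet below assumes a hypothetical interface `cond_prob(j, prefix)` returning P(y_j = 1 | x, y_1..y_{j-1}), i.e. one trained classifier per chain position; beam search keeps only the `beam_width` most probable partial labellings instead of all 2^j of them.

```python
import numpy as np

def pcc_beam_search(cond_prob, n_labels, beam_width=3):
    """Approximate MAP inference for a probabilistic classifier chain.
    cond_prob(j, prefix) -> P(y_j = 1 | x, y_1..y_{j-1}) is an assumed
    interface wrapping one trained classifier per chain position."""
    beam = [((), 0.0)]                      # (partial labelling, log-probability)
    for j in range(n_labels):
        candidates = []
        for prefix, logp in beam:
            p1 = cond_prob(j, prefix)
            candidates.append((prefix + (1,), logp + np.log(p1)))
            candidates.append((prefix + (0,), logp + np.log(1.0 - p1)))
        # keep only the beam_width most probable partial labellings
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beam[0]                          # best full labelling found
```

With beam_width of at least 2^L this recovers exact exhaustive inference; small widths trade a little accuracy for cost linear in the number of labels, which is the tractability gain the paper exploits.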
Sparse Bayesian Multi-Task Learning
Abstract

Cited by 7 (0 self)
We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group-sparsity-inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods.
Maximum Margin Multi-Label Structured Prediction
 In NIPS
, 2011
Abstract

Cited by 6 (1 self)
We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multi-label classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy, or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds.
Exploiting Tag and Word Correlations for Improved Webpage Clustering
Abstract

Cited by 4 (1 self)
Automatic clustering of webpages helps a number of information retrieval tasks, such as improving user interfaces, collection clustering, introducing diversity in search results, etc. Typically, webpage clustering algorithms only use features extracted from the page text. However, the advent of social-bookmarking websites, such as StumbleUpon and Delicious, has led to a huge amount of user-generated content such as the tag information that is associated with the webpages. In this paper, we present a subspace-based feature extraction approach which leverages tag information to complement the page contents of a webpage to extract highly discriminative features, with the goal of improved clustering performance. In our approach, we consider page text and tags as two separate views of the data, and learn a shared subspace that maximizes the correlation between the two views. Any clustering algorithm can then be applied in this subspace. We compare our subspace-based approach with a number of baselines that use tag information in various other ways, and show that the subspace-based approach leads to improved performance on the webpage clustering task. Although our results here are on the webpage clustering task, the same approach can be used for webpage classification as well. In the end, we also suggest possible future work for leveraging tag information in webpage clustering, especially when tag information is present not for all, but only for a small number of webpages.
The Supervised IBP: Neighbourhood Preserving Infinite Latent Feature Models
Abstract

Cited by 3 (0 self)
We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables, and their values. The latent variables preserve the neighbourhood structure of the data in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood-preserving infinite latent feature space.
Semi-supervised multi-label classification: a simultaneous large-margin, subspace learning approach
 In: ECML/PKDD
Abstract

Cited by 2 (0 self)
Labeled data is often sparse in common learning scenarios, either because it is too time consuming or too expensive to obtain, while unlabeled data is almost always plentiful. This asymmetry is exacerbated in multi-label learning, where the labeling process is more complex than in the single-label case. Although it is important to consider semi-supervised methods for multi-label learning, as it is in other learning scenarios, surprisingly, few proposals have been investigated for this particular problem. In this paper, we present a new semi-supervised multi-label learning method that combines large-margin multi-label classification with unsupervised subspace learning. We propose an algorithm that learns a subspace representation of the labeled and unlabeled inputs, while simultaneously training a supervised large-margin multi-label classifier on the labeled portion. Although joint training of these two interacting components might appear intractable, we exploit recent developments in induced matrix norm optimization to show that these two problems can be solved jointly, globally and efficiently. In particular, we develop an efficient training procedure based on subgradient search and a simple coordinate descent strategy. An experimental evaluation demonstrates that semi-supervised subspace learning can improve the performance of corresponding supervised multi-label learning methods.