Results 1–10 of 41
Reverse multilabel learning
In Advances in Neural Information Processing Systems 23, 2010
Abstract

Cited by 17 (2 self)
Multilabel classification is the task of predicting potentially multiple labels for a given instance. This is common in several applications such as image annotation, document classification and gene function prediction. In this paper we present a formulation for this problem based on reverse prediction: we predict sets of instances given the labels. By viewing the problem from this perspective, the most popular quality measures for assessing the performance of multilabel classification admit relaxations that can be efficiently optimised. We optimise these relaxations with standard algorithms and compare our results with several state-of-the-art methods, showing excellent performance.
Multilabel classification on tree and DAGstructured hierarchies
In ICML, 2011
Abstract

Cited by 13 (1 self)
Many real-world applications involve multilabel classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multilabel classification algorithm which can be used on both tree- and DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multilabel as finding the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can be readily used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both tree- and DAG-structured hierarchies.
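The hierarchy-consistency constraint described in this abstract can be illustrated with a minimal sketch (this is not the paper's exact algorithm; the label names, scores, and the positive-score stopping rule are assumptions for illustration). A label is selected only if its parent is already selected, so the greedy procedure always grows a consistent subtree:

```python
def greedy_consistent_labels(scores, parent):
    """Greedily grow a hierarchy-consistent label set.

    scores: dict label -> real-valued classifier score (assumed given).
    parent: dict label -> parent label, or None for top-level labels.
    A label is eligible only if its parent is already selected, so the
    returned set is always consistent with the tree hierarchy.
    """
    selected = set()
    while True:
        eligible = [l for l in scores
                    if l not in selected
                    and (parent[l] is None or parent[l] in selected)
                    and scores[l] > 0]
        if not eligible:
            return selected
        # Add the highest-scoring eligible label.
        selected.add(max(eligible, key=lambda l: scores[l]))

scores = {"animal": 2.0, "dog": 1.5, "cat": -0.5, "poodle": 0.8}
parent = {"animal": None, "dog": "animal", "cat": "animal", "poodle": "dog"}
print(greedy_consistent_labels(scores, parent))  # {'animal', 'dog', 'poodle'}
```

Note that "cat" is excluded despite being a child of a selected label, because its own score is negative; "poodle" becomes eligible only after "dog" is selected.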
MultiLabel Output Codes using Canonical Correlation Analysis
Abstract

Cited by 6 (1 self)
Traditional error-correcting output codes (ECOCs) decompose a multiclass classification problem into many binary problems. Although it seems natural to use ECOCs for multilabel problems as well, doing so naively creates issues related to the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multilabel classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to a better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
Submodular MultiLabel Learning
Abstract

Cited by 5 (0 self)
In this paper we present an algorithm to learn a multilabel classifier which attempts to directly optimise the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multilabel datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.
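The effect of assortative pairwise interactions on prediction can be sketched as follows (an illustration only: the paper minimises the equivalent submodular objective exactly via graph cuts, whereas this toy brute-forces tiny label sets; the unary/pairwise values are assumptions):

```python
from itertools import product

def predict(unary, pairwise):
    """Exhaustively maximise a score with assortative pairwise terms.

    unary[i]: score for switching label i on.
    pairwise[(i, j)]: non-negative bonus when labels i and j co-occur
    (non-negativity is what makes the equivalent minimisation submodular).
    Only feasible for small label sets; stands in for a graph-cut solver.
    """
    L = len(unary)
    best, best_score = None, float("-inf")
    for y in product([0, 1], repeat=L):
        score = sum(unary[i] * y[i] for i in range(L))
        score += sum(b for (i, j), b in pairwise.items() if y[i] and y[j])
        if score > best_score:
            best, best_score = y, score
    return best

# Label 1 is weakly negative on its own, but co-occurrence with label 0
# flips the joint prediction to (1, 1).
print(predict([1.0, -0.3], {(0, 1): 0.5}))  # (1, 1)
print(predict([1.0, -0.3], {}))             # (1, 0)
```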
Transduction with Matrix Completion: Three Birds with One Stone
Abstract

Cited by 4 (0 self)
We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has low rank, our formulation is able to handle three problems simultaneously: i) multilabel learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low-rank assumption may not be as restrictive as it seems. Our method allows different loss functions to be applied to the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.
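Fixed-point continuation methods for nuclear norm minimization are built around singular-value soft-thresholding, the proximal operator of the nuclear norm. A minimal NumPy sketch of that single shrinkage step (not the authors' full solver, whose loss terms and continuation schedule are out of scope here):

```python
import numpy as np

def svt(Z, tau):
    """Singular-value soft-thresholding: prox of tau * ||Z||_*.

    Shrinks each singular value of Z toward zero by tau, zeroing the
    small ones, which is what drives iterates toward low rank.
    """
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Shrinking a rank-2 matrix with singular values (3, 1) by tau = 1.5
# kills the smaller singular value entirely, leaving a rank-1 result.
Z = np.array([[3.0, 0.0], [0.0, 1.0]])
print(np.round(svt(Z, 1.5), 3))  # [[1.5 0.] [0. 0.]]
```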
From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World
Abstract

Cited by 4 (2 self)
Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other. Finding such redescriptions, a task known as niche-finding, is of much importance in biology. But current redescription mining methods cannot handle anything other than Boolean data. This restricts the range of possible applications or makes discretization a prerequisite, entailing a possibly harmful loss of information. In niche-finding, while the fauna can be naturally represented as Boolean presence/absence data, the weather cannot. In this paper, we extend redescription mining to real-valued data using a surprisingly simple and efficient approach. We provide an extensive experimental evaluation to study the behaviour of the proposed algorithm. Furthermore, we show the statistical significance of our results using recent innovations in randomization methods.
Regret analysis for performance metrics in multilabel classification: The case of Hamming and subset zero-one loss
 In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Abstract

Cited by 4 (0 self)
In multilabel classification (MLC), each instance is associated with a subset of labels instead of a single class, as in conventional classification, and this generalization enables the definition of a multitude of loss functions. Indeed, a large number of losses have already been proposed and are commonly applied as performance metrics in experimental studies. However, even though these loss functions are of a quite different nature, a concrete connection between the type of multilabel classifier used and the loss to be minimized is rarely established, implicitly giving the misleading impression that the same method can be optimal for different loss functions. In this paper, we elaborate on risk minimization and the connection between loss functions in MLC, both theoretically and empirically. In particular, we compare two important loss functions, namely the Hamming loss and the subset 0/1 loss. We perform a regret analysis, showing how poor a classifier intended to minimize the subset 0/1 loss can become in terms of Hamming loss, and vice versa. The theoretical results are corroborated by experimental studies, and their implications for MLC methods are discussed in a broader context.
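The two losses compared in this paper are easy to contrast on a single instance: Hamming loss counts label-wise errors, while subset 0/1 loss is all-or-nothing, so a near-miss is penalised as heavily as a total miss (the label vectors below are made up for illustration):

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label positions where prediction and truth disagree."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def subset_zero_one_loss(y_true, y_pred):
    """1 unless the entire label vector is predicted exactly right."""
    return int(y_true != y_pred)

y_true   = [1, 0, 1, 1]
y_almost = [1, 0, 1, 0]   # one label wrong
y_wrong  = [0, 1, 0, 0]   # every label wrong

print(hamming_loss(y_true, y_almost), subset_zero_one_loss(y_true, y_almost))  # 0.25 1
print(hamming_loss(y_true, y_wrong), subset_zero_one_loss(y_true, y_wrong))    # 1.0 1
```

Under Hamming loss the two predictions differ by a factor of four; under subset 0/1 loss they are indistinguishable, which is the tension the paper's regret analysis quantifies.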
LEARNING FROM MULTILABEL DATA
2009
Abstract

Cited by 3 (1 self)
This volume contains research papers accepted for presentation at the 1st International Workshop on Learning from Multi-Label Data (MLD’09), which will be held in Bled, Slovenia, on September 7, 2009 in conjunction with ECML/PKDD 2009. MLD’09 is devoted to multilabel learning, which is an emerging and promising research topic in machine learning. In multilabel learning, each example is associated with multiple labels simultaneously; it therefore encompasses traditional supervised (single-label) learning as a special case. Multilabel learning is related to various machine learning paradigms, such as classification, ranking, semi-supervised learning, active learning, multi-instance learning, dimensionality reduction, etc. Initial attempts at multilabel learning date back to 1999 with work on multilabel text categorization. In recent years, the task of learning from multilabel data has been addressed by a number of methods adapted from various popular learning techniques, such as neural networks, decision trees, k-nearest neighbors, kernel methods, ensemble methods, etc. More impressively, multilabel learning has demonstrated its effectiveness in a diversity of real-world applications, such as image/video annotation, bioinformatics,
A Composite Likelihood View for MultiLabel Classification
Abstract

Cited by 2 (0 self)
Given limited training samples, learning to classify multiple labels is challenging. Problem decomposition [24] is widely used in this case, where the original problem is decomposed into a set of easier-to-learn subproblems, and predictions from subproblems are combined to make the final decision. In this paper we show the connection between composite likelihoods [17] and many multilabel decomposition methods, e.g., one-vs-all, one-vs-one, calibrated label ranking, and probabilistic classifier chains. This connection holds promise for improving problem decomposition in both the choice of subproblems and the combination of subproblem decisions. As an attempt to exploit this connection, we design a composite marginal method that improves pairwise decomposition. Pairwise label comparisons, which seem to be a natural choice for subproblems, are replaced by bivariate label densities, which are more informative and natural components in a composite likelihood. For combining subproblem decisions, we propose a new mean-field approximation that minimizes the notion of composite divergence and is potentially more robust to inaccurate estimations in subproblems. Empirical studies on five data sets show that, given limited training samples, the proposed method outperforms many alternatives.
Learning and Inference in Probabilistic Classifier Chains with Beam Search
Abstract

Cited by 1 (1 self)
Multilabel learning is an extension of binary classification that is both challenging and practically important. Recently, a method for multilabel learning called probabilistic classifier chains (PCCs) was proposed with numerous appealing properties, such as conceptual simplicity, flexibility, and theoretical justification. However, PCCs suffer from the computational issue of having inference that is exponential in the number of tags, and the practical issue of being sensitive to the ordering of the tags during training. In this paper, we show how the classical technique of beam search may be used to solve both of these problems. Specifically, we show how to use beam search to perform tractable test-time inference, and how to integrate beam search with training to determine a suitable tag ordering. Experimental results on a range of multilabel datasets show that these proposed changes dramatically extend the practical viability of PCCs.
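The beam-search inference this abstract describes can be sketched in a few lines (a minimal illustration, not the authors' implementation: `cond` stands in for the chain of trained per-label classifiers, and the toy conditional probabilities are assumptions). Instead of enumerating all 2^L label vectors, only the `beam_width` most probable partial assignments are kept at each position in the chain:

```python
def beam_search_pcc(cond, n_labels, beam_width=2):
    """Approximate MAP inference for a probabilistic classifier chain.

    cond(i, prefix): returns P(y_i = 1 | x, y_1..y_{i-1}) for the given
    prefix of earlier label decisions (assumed to come from trained
    chain classifiers). Returns the best (labels, probability) found.
    """
    beams = [((), 1.0)]
    for i in range(n_labels):
        candidates = []
        for prefix, prob in beams:
            p1 = cond(i, prefix)
            candidates.append((prefix + (1,), prob * p1))
            candidates.append((prefix + (0,), prob * (1.0 - p1)))
        # Keep only the beam_width most probable partial assignments.
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return beams[0]

# Toy conditionals: label 1 is likely only when label 0 is on.
def cond(i, prefix):
    if i == 0:
        return 0.6
    return 0.9 if prefix[0] == 1 else 0.2

print(beam_search_pcc(cond, 2))  # labels (1, 1), probability ~0.54
```

With `beam_width` equal to 2^L this reduces to exact (exponential) inference; shrinking the beam trades accuracy for tractability, which is the trade-off the paper studies.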