Results 1-10 of 92
A Review on Multi-Label Learning Algorithms
Cited by 41 (7 self)
Multi-label learning studies the problem where each example is represented by a single instance while being associated with a set of labels simultaneously. During the past decade, a significant amount of progress has been made towards this emerging machine learning paradigm. This paper aims to provide a timely review of this area, with emphasis on state-of-the-art multi-label learning algorithms. First, fundamentals of multi-label learning, including the formal definition and evaluation metrics, are given. Second and primarily, eight representative multi-label learning algorithms are scrutinized under common notations, with relevant analyses and discussions. Third, several related learning settings are briefly summarized. In conclusion, online resources and open research problems in multi-label learning are outlined for reference purposes.
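Two of the evaluation metrics this review covers can be illustrated concretely. Below is a minimal sketch (the function names are our own) of Hamming loss and subset accuracy, with each example's labels represented as a Python set over a fixed label space:

```python
# Minimal sketches of two standard multi-label evaluation metrics.
# Each example's labels are given as a set drawn from a fixed label space.

def hamming_loss(true_sets, pred_sets, num_labels):
    """Fraction of (example, label) pairs that are misclassified."""
    errors = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return errors / (len(true_sets) * num_labels)

def subset_accuracy(true_sets, pred_sets):
    """Fraction of examples whose predicted label set matches exactly."""
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if t == p)
    return exact / len(true_sets)

true_sets = [{0, 1}, {2}, {0, 2}]
pred_sets = [{0, 1}, {1}, {0}]
print(hamming_loss(true_sets, pred_sets, num_labels=3))  # (0 + 2 + 1) / 9
print(subset_accuracy(true_sets, pred_sets))             # 1 / 3
```

Hamming loss credits partially correct predictions, while subset accuracy demands an exact match, which is why the two can rank the same classifier very differently.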
Active learning by querying informative and representative examples
In Advances in Neural Information Processing Systems (NIPS'10), 2010
Cited by 34 (4 self)
Most active learning approaches select either informative or representative unlabeled instances to query their labels. Although several active learning algorithms have been proposed to combine the two criteria for query selection, they are usually ad hoc in finding unlabeled instances that are both informative and representative. We address this challenge with a principled approach, termed QUIRE, based on the min-max view of active learning. The proposed approach provides a systematic way of measuring and combining the informativeness and representativeness of an instance. Extensive experimental results show that the proposed QUIRE approach outperforms several state-of-the-art active learning approaches.
Reverse multi-label learning
In Advances in Neural Information Processing Systems 23, 2010
Cited by 29 (2 self)
Multi-label classification is the task of predicting potentially multiple labels for a given instance. This is common in several applications such as image annotation, document classification and gene function prediction. In this paper we present a formulation for this problem based on reverse prediction: we predict sets of instances given the labels. By viewing the problem from this perspective, the most popular quality measures for assessing the performance of multi-label classification admit relaxations that can be efficiently optimised. We optimise these relaxations with standard algorithms and compare our results with several state-of-the-art methods, showing excellent performance.
Multi-label classification on tree- and DAG-structured hierarchies
In ICML, 2011
Cited by 24 (2 self)
Many real-world applications involve multi-label classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multi-label classification algorithm that can be used on both tree- and DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multi-label as finding the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can readily be used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both tree- and DAG-structured hierarchies.
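The consistency constraint at the heart of this line of work (a label may be predicted only if all its ancestors are) can be shown on a toy tree. The sketch below enforces only that constraint; it is not the paper's subgraph-search algorithm, and the hierarchy, scores, and threshold are invented for illustration:

```python
# Toy illustration of hierarchy-consistent multi-label prediction on a tree:
# a label may be predicted only if its entire ancestor chain would also be
# predicted. This enforces the consistency constraint only; the paper's
# actual algorithm searches for an optimal consistent subgraph.

def consistent_labels(scores, parent, threshold=0.5):
    """Keep labels above threshold whose every ancestor also qualifies."""
    def chain_ok(label):
        while label is not None:
            if scores[label] < threshold:
                return False
            label = parent[label]
        return True
    return {lbl for lbl in scores if chain_ok(lbl)}

# Hypothetical hierarchy: root -> {a, b}, a -> {a1}
parent = {"root": None, "a": "root", "b": "root", "a1": "a"}
scores = {"root": 0.9, "a": 0.8, "b": 0.3, "a1": 0.7}
print(sorted(consistent_labels(scores, parent)))  # ['a', 'a1', 'root']
```

Label b is dropped despite having a score, and a1 survives only because both a and root clear the threshold.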
Multi-dimensional classification with Bayesian networks
In International Journal of Approximate Reasoning, 2011
Cited by 24 (7 self)
Multi-dimensional classification aims at finding a function that assigns a vector of class values to a given vector of features. In this paper, this problem is tackled by a general family of models, called multi-dimensional Bayesian network classifiers (MBCs). This probabilistic graphical model organizes class and feature variables into three different subgraphs: a class subgraph, a feature subgraph, and a bridge (from classes to features) subgraph. Under the standard 0-1 loss function, the most probable explanation (MPE) must be computed, for which we provide theoretical results for both general MBCs and MBCs decomposable into maximal connected components. Moreover, when computing the MPE, the vector of class values is traversed following a special ordering (a Gray code). For other loss functions defined in accordance with a decomposable structure, we derive theoretical results on how to minimize the expected loss. Besides these inference issues, the paper presents flexible algorithms for learning MBC structures from data based on filter, wrapper and hybrid approaches. The cardinality of the search space is also given. New performance evaluation metrics adapted from the single-class setting are introduced. Experimental results with three benchmark data sets are encouraging, and they outperform state-of-the-art algorithms for multi-label classification.
Transduction with Matrix Completion: Three Birds with One Stone
Cited by 22 (0 self)
We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low-rank assumption may not be as restrictive as it seems. Our method allows different loss functions to apply to the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.
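The low-rank completion idea behind this formulation can be conveyed with a much simpler stand-in: fit a rank-1 model to the observed entries of a toy matrix by alternating least squares and read off the missing entry. This is not the paper's nuclear-norm method, and the toy matrix is invented for illustration:

```python
# Toy illustration of low-rank matrix completion: fit a rank-1 model
# u * v^T to the observed entries by alternating least squares, then
# read off the missing entry. The paper instead minimizes the nuclear
# norm with a fixed-point continuation method; this sketch only conveys
# the underlying low-rank idea.

def complete_rank1(entries, n_rows, n_cols, iters=100):
    """entries: dict mapping (i, j) -> observed value. Returns factors u, v."""
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        for i in range(n_rows):  # least-squares update of each row factor
            num = sum(val * v[j] for (r, j), val in entries.items() if r == i)
            den = sum(v[j] ** 2 for (r, j) in entries if r == i)
            if den:
                u[i] = num / den
        for j in range(n_cols):  # least-squares update of each column factor
            num = sum(val * u[i] for (i, c), val in entries.items() if c == j)
            den = sum(u[i] ** 2 for (i, c) in entries if c == j)
            if den:
                v[j] = num / den
    return u, v

# 2x2 matrix [[1, 2], [2, ?]] with the (1, 1) entry unobserved.
observed = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0}
u, v = complete_rank1(observed, 2, 2)
print(round(u[1] * v[1], 3))  # the exact rank-1 completion is 4.0
```

Because the observed entries are consistent with a rank-1 matrix, the alternating updates converge to the exact completion; on noisy data they instead yield a least-squares fit.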
Multi-Label Output Codes using Canonical Correlation Analysis
Cited by 20 (1 self)
Traditional error-correcting output codes (ECOCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to a better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
Automated Topic Naming: Supporting Cross-project Analysis of Software Maintenance Activities
In Empirical Software Engineering
Cited by 17 (7 self)
Software repositories provide a deluge of software artifacts to analyze. Researchers have attempted to summarize, categorize, and relate these artifacts by using semi-unsupervised machine-learning algorithms such as Latent Dirichlet Allocation (LDA), used for concept and topic analysis to suggest candidate word lists or topics that describe and relate software artifacts. However, these word lists and topics are difficult to interpret in the absence of meaningful summary labels. Current topic modeling techniques assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for the developers. We propose a solution: automated labelled topic extraction. Topics are extracted using LDA from commit-log comments recovered from source control systems. These topics are given labels from a generalizable cross-project taxonomy, consisting of non-functional ...
Feature-aware label space dimension reduction for multi-label classification
2012
Cited by 16 (0 self)
Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing approaches to LSDR across many real-world datasets.
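The SVD-based compression idea in this family of methods can be sketched in one dimension: find the top principal direction of the binary label matrix by power iteration, encode each label vector as a single scalar projection, and decode by rounding the reconstruction. This is only the basic principal-label-space idea, not the paper's feature-conditioned variant, and the toy data is invented:

```python
# Toy sketch of SVD-based label space dimension reduction: find the top
# right singular direction of the binary label matrix by power iteration,
# encode each label vector as one scalar projection, and decode by
# rounding the reconstruction. Real PLST/CPLST uses a full (and, for
# CPLST, feature-conditioned) SVD with several retained dimensions.

def top_direction(Y, iters=100):
    """Power iteration on Y^T Y to approximate the leading right singular vector."""
    L = len(Y[0])
    v = [1.0] * L
    for _ in range(iters):
        Yv = [sum(row[j] * v[j] for j in range(L)) for row in Y]        # Y v
        w = [sum(Y[i][j] * Yv[i] for i in range(len(Y))) for j in range(L)]  # Y^T (Y v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

Y = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]  # label vectors lie along one direction
v = top_direction(Y)
codes = [sum(y[j] * v[j] for j in range(3)) for y in Y]        # encode: L -> 1 dim
decoded = [[round(c * vj) for vj in v] for c in codes]         # decode and round
print(decoded)  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```

Here one dimension reconstructs the labels exactly because they are perfectly correlated; with more label patterns, more singular directions must be retained.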
Submodular Multi-Label Learning
Cited by 16 (1 self)
In this paper we present an algorithm to learn a multi-label classifier that attempts to directly optimise the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a non-trivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.