Results 1 - 10
of
38
Random k-Labelsets: An Ensemble Method for Multilabel Classification
"... Abstract. This paper proposes an ensemble method for multilabel classification. The RAndom k-labELsets (RAKEL) algorithm constructs each member of the ensemble by considering a small random subset of labels and learning a single-label classifier for the prediction of each element in the powerset of ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
Abstract. This paper proposes an ensemble method for multilabel classification. The RAndom k-labELsets (RAKEL) algorithm constructs each member of the ensemble by considering a small random subset of labels and learning a single-label classifier for the prediction of each element in the powerset of this subset. In this way, the proposed algorithm aims to take into account label correlations using single-label classifiers that are applied on subtasks with manageable number of labels and adequate number of examples per label. Experimental results on common multilabel domains involving protein, document and scene classification show that better performance can be achieved compared to popular multilabel classification approaches. 1
Mining multi-label data
- In Data Mining and Knowledge Discovery Handbook
, 2010
"... A large body of research in supervised learning deals with the analysis of singlelabel data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such d ..."
Abstract
-
Cited by 20 (3 self)
- Add to MetaCart
A large body of research in supervised learning deals with the analysis of singlelabel data, where training examples are associated with a single label λ from a set of disjoint labels L. However, training examples in several application domains are often associated with a set of labels Y ⊆ L. Such data are called multi-label.
Extracting Shared Subspace for Multi-label Classification
"... Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multipl ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We have conducted extensive experiments on eleven multitopic web page categorization tasks, and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.
Database-text alignment via structured multilabel classification
- In Proc. of the International Joint Conference on Artificial Intelligence
, 2007
"... This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
This paper addresses the task of aligning a database with a corresponding text. The goal is to link individual database entries with sentences that verbalize the same information. By providing explicit semantics-to-text links, these alignments can aid the training of natural language generation and information extraction systems. Beyond these pragmatic benefits, the alignment problem is appealing from a modeling perspective: the mappings between database entries and text sentences exhibit rich structural dependencies, unique to this task. Thus, the key challenge is to make use of as many global dependencies as possible without sacrificing tractability. To this end, we cast text-database alignment as a structured multilabel classification task where each sentence is labeled with a subset of matching database entries. In contrast to existing multilabel classifiers, our approach operates over arbitrary global features of inputs and proposed labels. We compare our model with a baseline classifier that makes locally optimal decisions. Our results show that the proposed model yields a 15% relative reduction in error, and compares favorably with human performance. 1
Combining Instance-Based Learning and Logistic Regression for Multilabel Classification
"... Abstract. Multilabel classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Recent research has shown that, just like for standard classification, instance-based learning algorithms relying on the nearest neighbor estimation p ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Abstract. Multilabel classification is an extension of conventional classification in which a single instance can be associated with multiple labels. Recent research has shown that, just like for standard classification, instance-based learning algorithms relying on the nearest neighbor estimation principle can be used quite successfully in this context. However, since hitherto existing algorithms do not take correlations and interdependencies between labels into account, their potential has not yet been fully exploited. In this paper, we propose a new approach to multilabel classification, which is based on a framework that unifies instancebased learning and logistic regression, comprising both methods as special cases. This approach allows one to capture interdependencies between labels and, moreover, to combine model-based and similarity-based inference for multilabel classification. As will be shown by experimental studies, our approach is able to improve predictive accuracy in terms of several evaluation criteria for multilabel prediction. 1
Semi-supervised Multi-label Learning by Constrained Non-negative Matrix Factorization
, 2006
"... We present a novel framework for multi-label learning that explicitly addresses the challenge arising from the large number of classes and a small size of training data. The key assumption behind this work is that two examples tend to have large overlap in their assigned class memberships if they sh ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
We present a novel framework for multi-label learning that explicitly addresses the challenge arising from the large number of classes and a small size of training data. The key assumption behind this work is that two examples tend to have large overlap in their assigned class memberships if they share high similarity in their input patterns. We capitalize this assumption by first computing two sets of similarities, one based on the input patterns of examples, and the other based on the class memberships of the examples. We then search for the optimal assignment of class memberships to the unlabeled data that minimizes the difference between these two sets of similarities. The optimization problem is formulated as a constrained Non-negative Matrix Factorization (NMF) problem, and an algorithm is presented to efficiently find the solution. Compared to the existing approaches for multi-label learning, the proposed approach is advantageous in that it is able to explore both the unlabeled data and the correlation among different classes simultaneously. Experiments with text categorization show that our approach performs significantly better than several state-of-the-art classification techniques when the number of classes is large and the size of training data is small.
Semi-supervised Multi-label Learning by Solving a Sylvester Equation
"... Multi-label learning refers to the problems where an instance can be assigned to more than one category. In this paper, we present a novel Semi-supervised algorithm for Multi-label learning by solving a Sylvester Equation (SMSE). Two graphs are first constructed on instance level and category level ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Multi-label learning refers to the problems where an instance can be assigned to more than one category. In this paper, we present a novel Semi-supervised algorithm for Multi-label learning by solving a Sylvester Equation (SMSE). Two graphs are first constructed on instance level and category level respectively. For instance level, a graph is defined based on both labeled and unlabeled instances, where each node represents one instance and each edge weight reflects the similarity between corresponding pairwise instances. Similarly, for category level, a graph is also built based on
Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains
"... In the realm of multilabel classification (MLC), it has become an opinio communis that optimal predictive performance can only be achieved by learners that explicitly take label dependence into account. The goal of this paper is to elaborate on this postulate in a critical way. To this end, we forma ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
In the realm of multilabel classification (MLC), it has become an opinio communis that optimal predictive performance can only be achieved by learners that explicitly take label dependence into account. The goal of this paper is to elaborate on this postulate in a critical way. To this end, we formalize and analyze MLC within a probabilistic setting. Thus, it becomes possible to look at the problem from the point of view of risk minimization and Bayes optimal prediction. Moreover, inspired by our probabilistic setting, we propose a new method for MLC that generalizes and outperforms another approach, called classifier chains, that was recently introduced in the literature. 1.
Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora
"... We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level la ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation in web search queries respectively. 1
On Search Guide Phrase Compilation for Recommending Home Medical Products
"... Abstract — To help people find desired home medical products (HMPs), we developed an intelligent personal health record (iPHR) system that can automatically recommend HMPs based on users ’ health issues. Using nursing knowledge, we pre-compile a set of “search guide ” phrases that provides semantic ..."
Abstract
-
Cited by 4 (4 self)
- Add to MetaCart
Abstract — To help people find desired home medical products (HMPs), we developed an intelligent personal health record (iPHR) system that can automatically recommend HMPs based on users ’ health issues. Using nursing knowledge, we pre-compile a set of “search guide ” phrases that provides semantic translation from words describing health issues to their underlying medical meanings. Then iPHR automatically generates queries from those phrases and uses them and a search engine to retrieve HMPs. To avoid missing relevant HMPs during retrieval, the compiled search guide phrases need to be comprehensive. Such compilation is a challenging task because nursing knowledge updates frequently and contains numerous details scattered in many sources. This paper presents a semi-automatic tool facilitating such compilation. Our idea is to formulate the phrase compilation task as a multi-label classification problem. For each newly obtained search guide phrase, we first use nursing knowledge and information retrieval techniques to identify a small set of potentially relevant classes with corresponding hints. Then a nurse makes the final decision on assigning this phrase to proper classes based on those hints. We demonstrate the effectiveness of our techniques by compiling search guide phrases from an occupational therapy textbook. I.

